WO2005072053A2 - Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer - Google Patents

Info

Publication number
WO2005072053A2
WO2005072053A2 PCT/IB2005/000928 IB2005000928W WO2005072053A2 WO 2005072053 A2 WO2005072053 A2 WO 2005072053A2 IB 2005000928 W IB2005000928 W IB 2005000928W WO 2005072053 A2 WO2005072053 A2 WO 2005072053A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
amino acid
amino acids
acid sequence
homologous
Prior art date
Application number
PCT/IB2005/000928
Other languages
French (fr)
Other versions
WO2005072053A9 (en
Inventor
Rotem Sorek
Michal Ayalon-Soffer
Dvir Dahary
Amit Novik
Amir Toporik
Alexander Diber
Guy Kol
Sarah Pollock
Zurit Levine
Gad S. Cojocaru
Ronen Shemesh
Yossi Cohen
Osnat Sella-Tavor
Shirley Sameah-Greenwald
Shira Walach
Original Assignee
Compugen Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Compugen Ltd.
Priority to EP05718397A (EP1749025A2)
Priority to AU2005207883A (AU2005207883A1)
Priority to CA002554623A (CA2554623A1)
Priority claimed from US11/043,788 (US20060014166A1)
Publication of WO2005072053A2
Publication of WO2005072053A9

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification

Definitions

  • The present invention relates to novel nucleotide and protein sequences that are diagnostic markers for colon cancer, and assays and methods of use thereof.
  • Colon and rectal cancers are malignant conditions which occur in the corresponding segments of the large intestine. These cancers are sometimes referred to jointly as "colorectal cancer", and, in many respects, the diseases are considered identical. The major differences between them are the sites where the malignant growths occur and the fact that treatments may differ based on the location of the tumors. More than 95 percent of cancers of the colon and rectum are adenocarcinomas, which develop in glandular cells lining the inside (lumen) of the colon and rectum.
  • In addition to adenocarcinomas, there are other rarer types of cancers of the large intestine: these include carcinoid tumors usually found in the appendix and rectum; gastrointestinal stromal tumors found in connective tissue in the wall of the colon and rectum; and lymphomas, which are malignancies of immune cells in the colon, rectum and lymph nodes.
  • A number of genetic abnormalities have been associated with colon tumors (Bos et al., (1987) Nature 327:293-297; Baker et al., (1989) 244:217-221; Nishisho et al., (1991) 253:665-669).
  • Colorectal cancer is the second most common cause of cancer death in the United States and the third most prevalent cancer in both men and women. Approximately 100,000 patients every year suffer from colon cancer and approximately half that number die of the disease. In large part this death rate is due to the inability to diagnose the disease at an early stage (Wanebo (1993) Colorectal Cancer, Mosby, St. Louis Mo.). In fact, the prognosis for a case of colon cancer is vastly enhanced when malignant tissue is detected at the early stage known as polyps. Polyps are usually benign growths protruding from the mucous membrane. Nearly all cases of colorectal cancer arise from adenomatous polyps, some of which mature into large polyps, undergo abnormal growth and development, and ultimately progress into cancer.
  • a number of hereditary and nonhereditary conditions have also been linked to a heightened risk of developing colorectal cancer, including familial adenomatous polyposis (FAP), hereditary nonpolyposis colorectal cancer (Lynch syndrome or HNPCC), a personal and/or family history of colorectal cancer or adenomatous polyps, inflammatory bowel disease, diabetes mellitus, and obesity.
  • The tumor suppressor gene APC (adenomatous polyposis coli), located at 5q21, has been either mutationally inactivated or deleted (Alberts et al., Molecular Biology of the Cell 1288 (3d e
  • the APC protein plays a role in a number of functions, including cell adhesion, apoptosis, and repression of the c-myc oncogene.
  • tumor suppressor gene HNPCC, but only about 15% of tumors contain the mutated gene.
  • a host of other genes have also been implicated in colorectal cancer, including the K-ras, c-Ki-ras, N-ras, H-ras and c-myc oncogenes, and the tumor suppressor genes DCC (deleted in colon carcinoma), Wg/Wnt signal transduction pathway components and p53.
  • Some tyrosine kinases have been shown to be up-regulated in colorectal tumor tissues or in cell lines such as HT29.
  • Focal adhesion kinase (FAK) and its upstream kinases c-src and c-yes in colonic epithelial cells may play an important role in the promotion of colorectal cancers through the extracellular matrix (ECM) and integrin-mediated signaling pathways.
  • c-src/FAK complexes may coordinately deregulate VEGF expression and apoptosis inhibition.
  • Recent evidence suggests that a specific signal-transduction pathway for cell survival, which involves integrin engagement, leads to FAK activation and thus activates PI-3 kinase and akt. In turn, akt phosphorylates BAD and blocks apoptosis in epithelial cells.
  • VEGF: vascular endothelial growth factor
  • Cox enzymes (Ota, S. et al., Aliment. Pharmacol. Ther. 16 (Suppl 2): 102-106 (2002))
  • estrogen (Al-Azzawi, F. and Wahab, M., Climacteric 5: 3-14 (2002))
  • peroxisome proliferator-activated receptor-γ (PPAR-γ)
  • IGF-I (Giovannucci (2001))
  • TDG: thymine DNA glycosylase
  • Procedures used for detecting, diagnosing, monitoring, staging, and prognosticating colon cancer are of critical importance to the outcome of the patient. For example, patients diagnosed with early colon cancer generally have a much greater five-year survival rate as compared to the survival rate for patients diagnosed with distant metastasized colon cancer. Because colon cancer is highly treatable when detected at an early, localized stage, screening should be a part of routine care for all adults starting at age 50, especially those with first-degree relatives with colorectal cancer.
  • One major advantage of colorectal cancer screening over its counterparts in other types of cancer is its ability to not only detect precancerous lesions, but to remove them as well.
  • the key colorectal cancer screening tests in use today are fecal occult blood test, sigmoidoscopy, colonoscopy, double-contrast barium enema, and the carcinoembryonic antigen (CEA) test.
  • Visual examination of the colon for abnormalities can be performed through endoscopic or radiographic techniques such as rigid proctosigmoidoscopy, flexible sigmoidoscopy, colonoscopy, and barium-contrast enema. These methods enable one to detect, biopsy, and remove adenomatous polyps. Despite the advantages of these procedures, there are accompanying downsides: they are expensive and uncomfortable, and they carry a risk of complications. Sigmoidoscopy, by definition, is limited to the sigmoid colon and below, colonoscopy is a relatively expensive procedure, and both share the risk of possible bowel perforation and hemorrhaging.
  • Double-contrast barium enema enables detection of lesions better than the fecal occult blood test (FOBT), and almost as well as colonoscopy, but it may be limited in evaluating the winding rectosigmoid region.
  • Another method of colon cancer diagnosis is the detection of carcinoembryonic antigen (CEA) in a blood sample from a subject, which when present at high levels, may indicate the presence of advanced colon cancer. But CEA levels may also be abnormally high when no cancer is present. Thus, this test is not selective for colon cancer, which limits the test's value as an accurate and reliable diagnostic tool.
  • elevated CEA levels are not detectable until late-stage colon cancer, when the cure rate is low, treatment options limited, and patient prognosis poor.
  • "Dukes A" and "Dukes B" colon cancers are neoplasias that have invaded into the wall of the colon but have not spread into other tissues.
  • Dukes A colon cancers are cancers that have not invaded beyond the submucosa.
  • Dukes B colon cancers are subdivided into two groups: Dukes B1 and Dukes B2.
  • "Dukes B1" colon cancers are neoplasias that have invaded up to but not through the muscularis basement.
  • Dukes B2 colon cancers are cancers that have breached completely through the muscularis basement. Over a five-year period, patients with Dukes A cancer who receive surgical treatment (i.e. removal of the affected tissue) have a greater than 90% survival rate.
  • Dukes A, B1 and B2 cancers are also referred to as T1, T2 and T3-T4 cancers, respectively.
  • "Dukes C" colon cancers are cancers that have spread to the regional lymph nodes, such as the lymph nodes of the gut. Patients with Dukes C cancer who receive surgical treatment alone have a 35% survival rate over a five year period, but this survival rate is increased to 60% in patients that receive chemotherapy.
  • "Dukes D" colon cancers are cancers that have metastasized to other organs. The liver is the most common organ in which metastatic colon cancer is found.
  • In the TNM system, stage 0 is in situ carcinoma (Tis): the cancer has not spread to the regional lymph nodes (N0), and there is no distant metastasis (M0).
  • In stage 1, there is still no spread of the cancer to the regional lymph nodes and no distant metastasis, but the tumor has invaded the submucosa (T1) or has progressed further to invade the muscularis propria (T2).
  • Stage 2 also involves no spread of the cancer to the regional lymph nodes and no distant metastasis, but the tumor has invaded the subserosa, or the nonperitonealized pericolic or perirectal tissues (T3), or has progressed to invade other organs or structures, and/or has perforated the visceral peritoneum (T4).
  • Stage 3 is characterized by any of the T substages, no distant metastasis, and either metastasis in 1 to 3 regional lymph nodes (N1) or metastasis in four or more regional lymph nodes (N2).
  • Stage 4 involves any of the T or N substages, as well as distant metastasis.
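The TNM stage groupings described above can be sketched as a small lookup. This is only an illustrative sketch: the function name `tnm_stage` is hypothetical, and the label "M1" for distant metastasis is an assumption, since the text names the metastasis only in words.

```python
def tnm_stage(t: str, n: str, m: str) -> int:
    """Map a (T, N, M) triple to a stage group 0-4, following the groupings in the text.

    Assumed labels (not all spelled out in the text): "Tis" for in situ
    carcinoma, "N0"/"M0" for no nodal or distant spread, "M1" for distant
    metastasis.
    """
    t, n, m = t.strip().upper(), n.strip().upper(), m.strip().upper()

    if m == "M1":
        return 4  # distant metastasis, with any T or N substage
    if m != "M0":
        raise ValueError(f"unrecognized M value: {m!r}")

    if n in ("N1", "N2"):
        return 3  # regional lymph node metastasis, no distant metastasis
    if n != "N0":
        raise ValueError(f"unrecognized N value: {n!r}")

    # From here on: no nodal spread and no distant metastasis.
    if t == "TIS":
        return 0  # in situ carcinoma
    if t in ("T1", "T2"):
        return 1  # submucosa or muscularis propria invaded
    if t in ("T3", "T4"):
        return 2  # subserosa, adjacent tissues or organs, or peritoneum involved
    raise ValueError(f"unrecognized T value: {t!r}")


if __name__ == "__main__":
    for triple in [("Tis", "N0", "M0"), ("T2", "N0", "M0"), ("T3", "N1", "M0")]:
        print(triple, "-> stage", tnm_stage(*triple))
```

The checks run from the most advanced finding downward, so a distant metastasis decides the stage regardless of the T and N values, as the text states for stage 4.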
  • pathological staging of colon cancer is preferable over clinical staging as pathological staging provides a more accurate prognosis.
  • Pathological staging typically involves examination of the resected colon section, along with surgical examination of the abdominal cavity.
  • The present invention overcomes the deficiencies of the background art by providing novel markers for colon cancer that are both sensitive and accurate. Furthermore, these markers are able to distinguish between different stages of colon cancer, such as adenocarcinoma (mucinous or signet ring cell originating); leiomyosarcomas; carcinoid.
  • The markers are able to distinguish, alone or in combination, between colon cancer and non-cancerous polyps. These markers are overexpressed in colon cancer specifically, as opposed to normal colon tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of colon cancer.
  • the markers of the present invention alone or in combination, show a high degree of differential detection between colon cancer and non-cancerous states.
  • suitable biological samples include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, colon tissue or mucous and any human organ or tissue.
  • the biological sample comprises colon tissue and/or a serum sample and/or a urine sample and/or a stool sample and/or any other tissue or liquid sample.
  • the sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.
  • "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor.
  • T -> C means that the SNP results in a change at the position given in the table from T to C.
  • M -> Q means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-).
  • a stop codon is indicated with an asterisk at the right hand side (*).
  • a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP.
  • the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on amino acid sequence.
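The SNP notation described above can be read mechanically. The following is a minimal sketch, assuming one entry per string; the names `SnpChange` and `parse_snp` are hypothetical, and the FTId value in the usage lines is made up for illustration. It does not distinguish nucleotide entries from amino acid entries.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpChange:
    original: str            # letter on the left-hand side
    changed: Optional[str]   # letter or "*" on the right-hand side; None for a frameshift
    kind: str                # "substitution", "frameshift" or "stop"
    comment: Optional[str]   # text found in trailing parentheses, if any


# Accepts both "T -> C" and the spaced-out form "T - > C".
_ARROW = re.compile(r"^\s*(?P<orig>[A-Za-z])\s*-\s*>(?P<rest>.*)$")
_COMMENT = re.compile(r"\((.*)\)\s*$")


def parse_snp(text: str) -> SnpChange:
    match = _ARROW.match(text)
    if match is None:
        raise ValueError(f"not an SNP description: {text!r}")
    original = match.group("orig").upper()
    rest = match.group("rest")

    # An optional comment (for example an FTId) may follow in parentheses.
    comment = None
    comment_match = _COMMENT.search(rest)
    if comment_match is not None:
        comment = comment_match.group(1).strip()
        rest = rest[:comment_match.start()]

    right = rest.strip()
    if right in ("", "-"):
        # A blank or a hyphen on the right-hand side marks a frameshift.
        return SnpChange(original, None, "frameshift", comment)
    if right == "*":
        # An asterisk on the right-hand side marks a stop codon.
        return SnpChange(original, "*", "stop", comment)
    if len(right) == 1 and right.isalpha():
        return SnpChange(original, right.upper(), "substitution", comment)
    raise ValueError(f"unrecognized right-hand side: {right!r}")


if __name__ == "__main__":
    print(parse_snp("T -> C"))
    print(parse_snp("M -> Q (FTId=VAR_000001)"))  # hypothetical FTId
    print(parse_snp("T -> "))
```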
  • SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker.
  • Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.
  • A key to the p values with regard to the analysis of such overexpression is as follows:
      - library-based statistics: P-value without including the level of expression in cell-lines (P1)
      - library-based statistics: P-value including the level of expression in cell-lines (P2)
      - EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
      - EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
      - EST clone statistics: P-value including the level of expression in cell-lines (SP2)
      - EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)
  • Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.
  • the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured.
  • There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those from microarrays using Affymetrix technology.
  • the probe name begins with the name of the cluster (gene), followed by an identifying number.
  • Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A
  • the probe names follow the Affymetrix naming convention.
  • the data is available from NCBI Gene Expression Omnibus
  • TAA Tumor Associated Antigen
  • nucleic acid sequences of the present invention refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below.
  • oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.
  • colon cancer refers to cancers of the colon or colorectal cancers.
  • marker in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having colon cancer as compared to a comparable sample taken from subjects who do not have colon cancer.
  • differentially present refers to differences in the quantity of a marker present in a sample taken from patients having colon cancer as compared to a comparable sample taken from patients who do not have colon cancer.
  • a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays.
  • a polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.
  • Diagnosis means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives."
  • The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
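The definitions of sensitivity and specificity given above translate directly into arithmetic on the four outcome counts. The sketch below is illustrative only; the function names and the example counts are hypothetical and are not data from the source.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased individuals who test positive."""
    diseased = true_pos + false_neg
    if diseased == 0:
        raise ValueError("no diseased individuals in the sample")
    return true_pos / diseased


def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Proportion of those without the disease who test positive."""
    healthy = false_pos + true_neg
    if healthy == 0:
        raise ValueError("no non-diseased individuals in the sample")
    return false_pos / healthy


def specificity(false_pos: int, true_neg: int) -> float:
    """Specificity is 1 minus the false positive rate."""
    return 1.0 - false_positive_rate(false_pos, true_neg)


if __name__ == "__main__":
    # Hypothetical counts: 90 true positives, 10 false negatives,
    # 20 false positives, 80 true negatives.
    print("sensitivity:", sensitivity(90, 10))
    print("specificity:", specificity(20, 80))
```

Keeping the false positive rate as its own function mirrors the wording of the definition, where specificity is derived from that rate rather than computed independently.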
  • diagnosis refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery.
  • detecting may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below. As used herein, the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention.
  • the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).
  • tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made.
  • Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside, to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues.
  • a "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of colon cancer.
  • a test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • a "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker.
  • a control amount of a marker can be the amount of a marker in a patient with colon cancer or a person without colon cancer.
  • a control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
  • Detect refers to identifying the presence, absence or amount of the object to be detected.
  • label includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means.
  • useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target.
  • the label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample.
  • the label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin.
  • the label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly.
  • the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize.
  • the binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule.
  • the binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.
  • Exemplary detectable labels include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads.
  • the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.
  • Immunoassay is an assay that uses an antibody to specifically bind an antigen.
  • the immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.
  • the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample.
  • Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein.
  • polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species.
  • immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
  • solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity).
  • a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
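The "at least twice background" criterion stated above can be expressed as a simple fold-over-background check. This is a hedged sketch: the function name `is_specific_signal`, the parameter `min_fold`, and the example readings are hypothetical, and real assays would normally average replicate measurements before comparing.

```python
def is_specific_signal(signal: float, background: float, min_fold: float = 2.0) -> bool:
    """Return True when the signal is at least `min_fold` times the background.

    The default of 2.0 follows the "at least twice background" wording;
    a stricter cutoff such as 10.0 can be passed for the more typical case.
    """
    if background <= 0:
        raise ValueError("background must be a positive reading")
    return signal / background >= min_fold


if __name__ == "__main__":
    # Hypothetical optical-density readings.
    print(is_specific_signal(signal=0.45, background=0.10))
    print(is_specific_signal(signal=0.45, background=0.10, min_fold=10.0))
```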
  • an isolated polynucleotide comprising a transcript SEQ ID NOs: 1 and 2.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 and 99.
  • an isolated polypeptide comprising SEQ ID NOs 534 and 535.
  • an isolated polynucleotide comprising a transcript SEQ ID NOs: 3, 4, 5 and 6.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 100, 101, 102, 103, 104, 105, 106 and 107.
  • an isolated polypeptide comprising SEQ ID NOs 536, 537, 538 and 539.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 7.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121 and 122.
  • an isolated polypeptide comprising SEQ ID NOs 540.
  • an isolated polynucleotide comprising a transcript selected from the group consisting of SEQ ID NO. 8 and 9.
  • an isolated polynucleotide comprising a segment selected from the group consisting of SEQ ID NOs: 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141 and 142.
  • an isolated polypeptide comprising SEQ ID NOs 541, 542.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 10.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 143, 144, 145, 146, 147, 148 and 149.
  • an isolated polypeptide comprising SEQ ID NOs 543.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 11, 12, 13 and 14.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166 and 167.
  • an isolated polypeptide comprising SEQ ID NOs 544, 545, 546 and 547.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 15.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184.
  • an isolated polypeptide comprising SEQ ID NO. 548.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 16.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195 and 196.
  • an isolated polypeptide comprising SEQ ID NOs 549.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 17 and 18.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210 and 211.
  • an isolated polypeptide comprising SEQ ID NOs 550 and 551.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 19, 20, 21 and 22.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 212, 213, 214, 215, 216, 217, 218 and 219.
  • an isolated polypeptide comprising SEQ ID NOs 552, 553, 554 and 555.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 23, 24, 25, 26 and 27.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239 and 240.
  • an isolated polypeptide comprising SEQ ID NOs 556, 557, 558 and 559.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 28, 29, 30, 31 and 32.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 241, 242, 243, 244, 245, 246, 247, 248, 249, 250 and 251.
  • an isolated polypeptide comprising SEQ ID NOs 560, 561, 562 and 563.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 33, 34, and 35.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284.
  • an isolated polypeptide comprising SEQ ID NOs 564, 565, and 566.
  • an isolated polynucleotide comprising a transcript SEQ ID NO.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305 and 306.
  • an isolated polypeptide comprising SEQ ID NOs 567, 568, 569, 570, 571, 572, 573 and 574.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 44.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 307, 308, 309, 310, 311, 312, 313, 314, 315 and 316.
  • an isolated polypeptide comprising SEQ ID NO. 575.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 45, 46, 47 and 48.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 317, 318, 319, 320, 321, 322, 323,
  • an isolated polypeptide comprising SEQ ID NOs 576, 577, 578 and 579.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 49.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 363, 364 and 365.
  • an isolated polypeptide comprising SEQ ID NO. 580.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 50, 51, 52, 53, 54, 55 and 56.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 366, 367, 368, 369, 370, 371, 372,
  • an isolated polypeptide comprising SEQ ID NOs 581, 582, 583, 584, 585, 586 and 587.
  • an isolated polynucleotide comprising a transcript SEQ ID NO.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448 and 449.
  • an isolated polypeptide comprising SEQ ID NOs 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601 and 602.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 75, 76, 77, 78, 79 and 80.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474 and 475.
  • an isolated polypeptide comprising SEQ ID NOs 603, 604, 605, 606 and 607.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 81, 82, 83 and 84.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503 and 504.
  • an isolated polypeptide comprising SEQ ID NOs 608, 609, 610 and 611.
  • an isolated polynucleotide comprising a transcript SEQ ID NO. 85, 86, 87 and 88.
  • an isolated polynucleotide comprising a segment SEQ ID NOs: 505-532 and 533.
  • an isolated polypeptide comprising SEQ ID NOs: 612, 613, 614 and 615.
  • an isolated chimeric polypeptide encodings from clusters M85491, T10888, H14624, H53626,
  • HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299,
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 608, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of SEQ ID NO. 608, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 208 - 214 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 608, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 208 - 214 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 609 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of SEQ ID NO. 609.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 610 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 181 of SSRA_HUMAN, which also corresponds to amino acids 1 - 181 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 610 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 182 - 192 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 611 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 93 of SSRA_HUMAN, which also corresponds to amino acids 1 - 93 of SEQ ID NO. 611, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 94 - 104 of SEQ ID NO. 611, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 611 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 94 - 104 in SEQ ID NO. 611.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 604 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 110 of
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 604, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of SEQ ID NO. 604, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 84 - 222 of SEQ ID NO. 604, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 604 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 84 - 222 in SEQ ID NO. 604.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 604 comprising a first amino acid sequence being at least 90 % homologous to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of SEQ ID NO. 605, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 65 - 93 of SEQ ID NO. 605, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 605 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of SEQ ID NO. 605, a second amino acid sequence being at least 90 % homologous to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 5 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 606, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 606, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 84 of SEQ ID NO. 606, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 607 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 607 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 607 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 607, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO. 607, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 607 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 607 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 5 of SEQ ID NO. 607, a second amino acid sequence being at least 90 % homologous to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of SEQ ID NO. 607, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 607 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 5 of SEQ ID NO. 607.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 607 comprising a first amino acid sequence being at least 90 % homologous to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 607 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 588 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 588, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 588, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 588 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence amino acids 1 - 26 of SEQ ID NO. 588.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 588 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 588, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 356 of SEQ ID NO. 588, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 588 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 588.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 588 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 356 in SEQ ID NO. 588.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 588 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 588, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 588, and a second amino acid sequence being at least 90 % homologous to amino acids 130 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 356 of SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to amino acids 1 - 26 of SEQ ID NO. 589, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO.
  • a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 589, a third amino acid sequence being at least 90 % homologous to amino acids 189 - 297 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 311 of SEQ ID NO. 589, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 589.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO. 589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 589.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 315 in SEQ ID NO. 589.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 589, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 589, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 311 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 311 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 311 of Q9UJZ1, which also corresponds to amino acids 1 - 311 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 315 in SEQ ID NO. 589.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO.
  • a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 589, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 311 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 311 of SEQ ID NO. 589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 589 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 311 of Q9UJZ1, which also corresponds to amino acids 1 - 311 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 589 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 590 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 590, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 590, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
  • a third amino acid sequence being at least 90 % homologous to amino acids 189 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 268 of SEQ ID NO. 590
  • a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 590 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 590.
  • polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 590 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 590, and a second amino acid sequence being at least 90 % homologous to amino acids 1 - 181 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 590 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 590.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 590 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO.
  • a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 590, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO. 590, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 590 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ ID NO. 590.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 590 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of SEQ ID NO. 590, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 590 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ ID NO. 590.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 591 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 591, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 591, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 591, a third amino acid sequence being at least 90 % homologous to amino acids 189 - 226 of SEQ ID NO.
  • a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, and a fifth amino acid sequence being at least 90 % homologous to amino acids 227 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 282 - 397 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 591 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 591.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 591 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 1 - 131 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 240 of SEQ ID NO. 591
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 132 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 282 - 309 of SEQ ID NO.
  • a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 397 of SEQ ID NO. 591, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 591 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 591.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 591.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 591 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 397 in SEQ ID NO. 591.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 591 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO.
  • a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 591, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 240 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 240 of SEQ ID NO. 591, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, and a fourth amino acid sequence being at least 90 % homologous to amino acids 241 - 356 of SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 591.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 591 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of SEQ ID NO. 591, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 592 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 592, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
  • a third amino acid sequence being at least 90 % homologous to amino acids 189 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 268 of SEQ ID NO. 592
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 298 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 592.
  • a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 592.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 592 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 592.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 592 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 313 in SEQ ID NO. 592.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 592 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 592, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO.
  • a third amino acid sequence being at least 90 % homologous to amino acids 312 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 592 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 592.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 592 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 313 in SEQ ID NO. 592.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 592 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 592, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO.
  • a third amino acid sequence being at least 90 % homologous to amino acids 312 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 593, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 593, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 593, a third amino acid sequence being at least 90 % homologous to amino acids 189 - 226 of SEQ ID NO.
  • a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, a fifth amino acid sequence being at least 90 % homologous to amino acids 227 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 282 - 309 of SEQ ID NO.
  • a sixth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence, fourth amino acid sequence, fifth amino acid sequence and sixth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 593 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 593.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 593 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 593, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 131 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 240 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, and a fourth amino acid sequence being at least 90 % homologous to amino acids 132 - 181 of SEQ ID NO. 640, which also corresponds to amino acids 282 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 593 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 593.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 593 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 593, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 130 - 240 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 240 of SEQ ID NO. 593
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 241 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 282 - 309 of SEQ ID NO.
  • a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 593 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 593 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of SEQ ID NO. 593, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO.
  • a third amino acid sequence being at least 90 % homologous to amino acids 241 - 268 of Q9UJZ1, which also corresponds to amino acids 282 - 309 of SEQ ID NO. 593
  • a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 593 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 594 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 594, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 134 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 148 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 594 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 594 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 594 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO.
  • a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 594, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 148 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 148 of SEQ ID NO. 594, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 149 - 183 of SEQ ID NO. 594, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 594 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 594 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 148 of Q9UJZ1, which also corresponds to amino acids 1 - 148 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 149 - 183 of SEQ ID NO. 594, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • polypeptide 594 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 595 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 13 - 180 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 194 of SEQ ID NO. 595
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 595 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 595.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 595 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 195 - 220 in SEQ ID NO. 595.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 595 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 595, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 595, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 194 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 194 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 595 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 195 - 220 in SEQ ID NO. 595.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 595 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 194 of Q9UJZ1, which also corresponds to amino acids 1 - 194 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 596 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 13 - 134 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 148 of SEQ ID NO. 596
  • a third amino acid sequence being at least 90 % homologous to amino acids 180 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 149 - 156 of SEQ ID NO. 596
  • a bridging amino acid A corresponding to amino acid 157 of SEQ ID NO. 596
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 189 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 311 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 596 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 596.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 596, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 39 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 148 of SEQ ID NO. 596, a third amino acid sequence being at least 90 % homologous to amino acids 85 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 149 - 223 of SEQ ID NO.
  • a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 224 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 596.
  • a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 596.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 596 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 596, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 130 - 148 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 148 of SEQ ID NO. 596
  • a third amino acid sequence being at least 90 % homologous to amino acids 194 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 149 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 597, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 143 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 157 of SEQ ID NO. 597, and a third amino acid sequence being at least 90 % homologous to amino acids 295 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 205 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 597 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 597.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 597, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 597, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 157 of SEQ ID NO. 639, which also corresponds to amino acids 130 - 157 of SEQ ID NO. 597, and a third amino acid sequence being at least 90 % homologous to amino acids 309 - 356 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 205 of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2.
  • a first amino acid sequence being at least 70%
  • a second amino acid sequence being at least 90 % homologous to amino acids 13 - 128 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 142 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 598 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 598.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 598 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 598 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 598, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 598, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 142 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 142 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • polypeptide 598 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 598 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 142 of Q9UJZ1, which also corresponds to amino acids 1 - 142 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • polypeptide 598 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 600 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 61 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 102 of SEQ ID NO. 600, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 600 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 102 of SEQ ID NO. 600, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • a second amino acid sequence being at least 90 % homologous to amino acids 13 - 47 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 61 of SEQ ID NO. 601, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 72 of SEQ ID NO. 601, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 601, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q96FY2, which also corresponds to amino acids 1 - 61 of SEQ ID NO. 601, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 72 of SEQ ID NO. 601, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 72 in SEQ ID NO. 601.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 601, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 72 in SEQ ID NO. 601.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 602 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 602, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 80 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 94 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 95 - 111 of SEQ ID NO. 602, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 602 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 95 - 111 in SEQ ID NO. 602.
  • a first amino acid sequence being at least 90 % homologous to amino acids 1 - 94 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 94 of SEQ ID NO. 602, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 95 - 111 of SEQ ID NO. 602, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 581 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of SEQ ID NO. 581, and a second amino acid sequence being at least 90 % homologous to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of SEQ ID NO. 581, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 581 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 582 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of SEQ ID NO. 582, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 428 - 432 of SEQ ID NO. 582, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 582 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 428 - 432 in SEQ ID NO. 582.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 584 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 68 - 98 of SEQ ID NO. 584, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • polypeptide 584 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 68 - 98 in SEQ ID NO. 584.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 585 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 585 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 184 - 200 in SEQ ID NO. 585, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 585 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 184 - 200 in SEQ
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 586 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of SEQ ID NO. 586, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 206 - 217 of SEQ ID NO. 586, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 586 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 206 - 217 in SEQ ID NO. 586.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 587 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of SEQ ID NO.
  • a second amino acid sequence bridging amino acid sequence comprising of L, a third amino acid sequence being at least 90 % homologous to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of SEQ ID NO. 587, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 132 - 148 of SEQ ID NO. 587, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 587 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise FLK having a structure as follows (numbering according to SEQ ID NO. 587): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 587 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 148 in SEQ ID NO. 587.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 576 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 1056 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 1056 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1057 - 1081 of SEQ ID NO. 576, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • polypeptide 576 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1057 - 1081 in SEQ ID NO. 576.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 577 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 714 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 714 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 715 - 729 of SEQ ID NO. 577, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 578 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 648 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 648 of SEQ ID NO. 578, a second amino acid sequence being at least 90 % homologous to amino acids 667 - 714 of SEQ ID NO.
  • n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a tail of SEQ ID NO.
  • polypeptide 578 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 697 - 738 in SEQ ID NO. 578.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 579 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 260 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 260 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 261 - 273 of SEQ ID NO. 579, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • polypeptide 579 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 261 - 273 in SEQ ID NO. 579.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 575 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 13 of GFR2_HUMAN, which also corresponds to amino acids 1 - 13 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 575 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 14 - 30 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 567 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 567, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 567 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 156 in SEQ ID NO. 567.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 73 of SEQ ID NO. 567, and a second amino acid sequence being at least 90 % homologous to amino acids 1799 - 1881 of SEQ ID NO. 629, which also corresponds to amino acids 74 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 567 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 567, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 124 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 124 of SEQ ID NO. 567, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 125 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 568 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 169 of SEQ ID NO. 568, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 568 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 568, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 122 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 122 of SEQ ID NO. 568
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 123 - 136 of SEQ ID NO. 568
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 123 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 137 - 169 of SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 568 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 123 - 136, corresponding to SEQ ID NO. 568.
  • a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 569, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 180 of SEQ ID NO. 569, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 569 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 569, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 123 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 123 of SEQ ID NO. 569
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 148 of SEQ ID NO. 569
  • a fourth amino acid sequence being at least 90 % homologous to amino acids 124 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 149 - 180 of SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 569 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 124 - 148, corresponding to SEQ ID NO. 569.
  • a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 570, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 145 of SEQ ID NO. 570, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 570 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 570, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 124 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 124 of SEQ ID NO. 570, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 125 - 145 of SEQ ID NO. 570, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 570 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 125 - 145 in SEQ ID NO. 570.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 571 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 101 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 101 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 122 of SEQ ID NO. 571, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 571 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 571, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 101 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 101 of SEQ ID NO. 571, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 122 of SEQ ID NO. 571, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 571 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 122 in SEQ ID NO. 571.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 572 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO.
  • a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 572, a second amino acid sequence being at least 90 % homologous to amino acids 64 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 64 - 123 of SEQ ID NO. 572, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 155 of SEQ ID NO. 572, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 572 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 155 in SEQ ID NO. 572.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 572 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO.
  • a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 572, a second amino acid sequence being at least 90 % homologous to LSDDEETIS corresponding to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO. 572, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 572, and a third amino acid sequence being at least 90 % homologous to amino acids 64 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 64 - 155 of SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 573 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO. 573, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 573, a second amino acid sequence being at least 90 % homologous to amino acids 64 - 101 of SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 573 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 573, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO. 573, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 573, a third amino acid sequence being at least 90 % homologous to amino acids 64 -
  • SEQ ID NO. 630, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 573, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 109 of SEQ ID NO. 573, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 574 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO. 574, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 64 - 101 of SEQ ID NO. 631, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 574, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 574 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 133 in
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 574 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 574, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 574, a second amino acid sequence being at least 90 % homologous to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO.
  • a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 574, a third amino acid sequence being at least 90 % homologous to amino acids 64 - 101 of SEQ ID NO. 630, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 574, and a fourth amino acid sequence being at least 90 % homologous to amino acids 124 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 102 - 133 of SEQ ID NO. 574, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 574 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KV, having a structure as follows: a sequence starting from any of amino acid numbers 101-x to 101; and ending at any of amino acid numbers 102+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 564 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 1617 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 1617 of SEQ ID NO. 564, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1618 - 1645 of SEQ ID NO. 564, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 564 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1618 - 1645 in SEQ ID NO. 564.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 565 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 2062 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 2062 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 565 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 2063 - 2074 in
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 566 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 587 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 587 of SEQ ID NO. 566, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 588 - 603 of SEQ ID NO. 566, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 566 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 588 - 603 in SEQ ID NO. 566.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 560 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 131 of SEQ ID NO.
  • polypeptide 560 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 139 in SEQ ID NO. 560.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 561 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 131 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 131 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 561, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 156 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 562 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 81 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 81 of SEQ ID NO. 562, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 82 - 89 of SEQ ID NO. 562, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 563 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 82 of SEQ ID NO. 625 which also corresponds to amino acids 1 - 82 of SEQ ID NO. 563.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 552 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 552, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 215 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 552 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 215 in SEQ ID NO. 552.
  • a first amino acid sequence being at least 90 % homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 552, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 215 of SEQ ID NO. 552, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 178 of SEQ ID NO. 553, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID NO. 553,
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID NO. 553.
  • first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 554, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 126 of SEQ ID NO. 554, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 554 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 126 in SEQ ID NO. 554.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 554, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 126 in SEQ ID NO.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 555 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of SEQ ID NO. 555, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 25 - 35 of SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 555 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 25 - 35 corresponding to SEQ ID NO. 555.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 555 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of SEQ ID NO. 555, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 25 - 35 of SEQ ID NO. 555, and a third amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO.
  • an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 25 - 35 corresponding to SEQ ID NO. 555.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 534 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 476 of EPB2_HUMAN, which also corresponds to amino acids 1 - 476 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 477 - 496 of SEQ ID NO. 534, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 535 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 270 of EPB2_HUMAN, which also corresponds to amino acids 1 - 270 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 271 - 301 of SEQ ID NO. 535, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • polypeptide 535 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 271 - 301 in SEQ ID NO. 535.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 536 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 320 - 324 of SEQ ID NO. 536, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 537 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 235 - 256 of SEQ ID NO. 537, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 537 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 537 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 235 - 256 in SEQ ID NO. 537, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 537 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 235 - 256 in SEQ ID
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 538 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of SEQ ID NO. 538, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 321 - 390 of SEQ ID NO. 538, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 538 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 321 - 390 in SEQ ID NO. 538.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 539 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 142 - 183 of SEQ ID NO. 539, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 540 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 167 of Q9HAP5, which also corresponds to amino acids 1 - 167 of SEQ ID NO.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 540 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 168 - 180 in SEQ ID NO.
  • an isolated polypeptide encoding for an edge portion of SEQ ID NO. 541, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 358 - 437 corresponding to SEQ ID NO. 541.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 542 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 269 of Q9H4D7, which also corresponds to amino acids 1 - 269 of SEQ ID NO. 542, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 270 - 490 of SEQ ID NO. 542, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 542 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 270 - 490 in SEQ ID NO. 542.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 542 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 269 of Q8N441, which also corresponds to amino acids 1 - 269 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 270 - 490 of SEQ ID NO. 542, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 543 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 81 of SZ05_HUMAN, which also corresponds to amino acids 1 - 81 of SEQ ID NO. 543.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 545 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 103 of MI2B_HUMAN, which also corresponds to amino acids 1 - 103 of SEQ ID NO. 545.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 547 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 103 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 547 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 103 of SEQ ID NO. 547.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 548 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548, and a second amino acid sequence being at least 90 % homologous to amino acids 151 - 461 of DCOR_HUMAN, which also corresponds to amino acids 30 - 340 of SEQ ID NO. 548, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 548 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548, and a second amino acid sequence being at least 90 % homologous to amino acids 40 - 350 of AAA59968, which also corresponds to amino acids 30 - 340 of SEQ ID NO. 548, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548
  • a second amino acid sequence being at least 90 % homologous to amino acids 40 - 350 of AAA59968, which also corresponds to amino acids 30
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 548 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 548 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 549 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 44 of SEQ ID NO. 549, a second amino acid sequence being at least 90 % homologous to amino acids 74 - 191 of Q9NWT9, which also corresponds to amino acids 45 - 162 of SEQ ID NO.
  • a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 163 - 238 of SEQ ID NO. 549, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
  • polypeptide 549 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 44 of SEQ ID NO. 549.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 549 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 163 - 238 in SEQ ID NO. 549.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 549 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 44 of SEQ ID NO. 549, and a second amino acid sequence being at least 90 % homologous to amino acids 21 - 214 of TESC_HUMAN, which also corresponds to amino acids 45 - 238 of SEQ ID NO. 549, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 549 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 44 of SEQ ID NO. 549.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 130 of SEQ ID NO. 550, and a second amino acid sequence being at least 90 % homologous to amino acids 1 - 172 of Q96C98, which also corresponds to amino acids 131 - 302 of SEQ ID NO. 550, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 550 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 74 of SEQ ID NO.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 550 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 74 of SEQ ID NO. 550.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 551 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 34 of SEQ ID NO. 551, and a second amino acid sequence being at least 90 % homologous to amino acids 60 - 172 of Q96C98, which also corresponds to amino acids 35 - 147 of SEQ ID NO. 551, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 551 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 34 of SEQ ID NO. 551.
  • a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 34 of SEQ ID NO. 551, and a second amino acid sequence being at least 90 % homologous to amino acids 168 - 280 of Q9BVA2, which also corresponds to amino acids 35 - 147 of SEQ ID NO. 551, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 548 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 34 of SEQ ID NO. 551.
  • an isolated polypeptide encoding for a head of SEQ ID NO. 548 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 556 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of SEQ ID NO. 556, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 442 - 464 of SEQ ID NO. 556, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of SEQ ID NO. 556 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 442 - 464 in SEQ ID NO. 556.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 557 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 428 of SM02_HUMAN, which also corresponds to amino acids 1 - 428 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 429 - 434 of SEQ ID NO. 557, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 558 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of SEQ ID NO.
  • a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 442 - 454 of SEQ ID NO. 558, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for SEQ ID NO. 559 comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 170 of SM02_HUMAN, which also corresponds to amino acids 1 - 170 of SEQ ID NO.
  • a second amino acid sequence being at least 90 % homologous to amino acids 188 - 446 of SM02_HUMAN, which also corresponds to amino acids 171 - 429 of SEQ ID NO. 559, wherein said first and second amino acid sequences are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 559 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171+ ((n-2) - x), in which x varies from 0 to n-2.
  • an antibody capable of specifically binding to an epitope of an amino acid sequence from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCA1XIA, HSSIOOPCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR.
  • said amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion.
  • kits for detecting colon cancer comprising a kit detecting overexpression of a splice variant from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCAIXIA,
  • the kit comprises a NAT-based technology.
  • the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence.
  • the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence.
  • the kit comprises an antibody.
  • the kit further comprises at least one reagent for performing an ELISA or a Western blot.
  • a method for detecting colon cancer comprising detecting overexpression of a splice variant from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCAIXIA, HSSIOOPCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR.
  • detecting overexpression is performed with a NAT-based technology.
  • said detecting overexpression is performed with an immunoassay.
  • the immunoassay comprises an antibody.
  • a biomarker capable of detecting colon cancer comprising nucleic acid sequences or a fragment thereof, or amino acid sequences or a fragment thereof from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCAIXIA, HSSIOOPCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR.
  • a method for screening for colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay.
  • a method for diagnosing colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay.
  • a method for monitoring disease progression of colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay.
  • a method of selecting a therapy for colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay and selecting a therapy according to said detection.
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence selected from the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name AI684092_PEA_1_T2 AI684092_PEA_1_T3
  • a nucleic acid sequence comprising a sequence in the table below: Segment Name AI684092_PEA_1_node_0 AI684092_PEA_1_node_2 AI684092_PEA_1_node_4 AI684092_PEA_1_node_5 AI684092_PEA_1_node_6 AI684092_PEA_1_node_7 AI684092_PEA_1_node_8 AI684092_PEA_1_node_9
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name HUMCEA_PEA_1_T30
  • a nucleic acid sequence comprising a sequence in the table below: Segment Name HUMCEA_PEA_1_node_0 HUMCEA_PEA_1_node_2 HUMCEA_PEA_1_node_6 HUMCEA_PEA_1_node_33 HUMCEA_PEA_1_node_34 HUMCEA_PEA_1_node_35 HUMCEA_PEA_1_node_45 HUMCEA_PEA_1_node_
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • an isolated polypeptide comprising an amino acid sequence in the table below: Protein Name M78035_P2 M78035_P4 M78035_P6 M78035_P8 M78035_P18 M78035_P19
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or: Transcript Name T23657_T35 T23657_T37 T23657_T38
  • a nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
  • nucleic acid sequence comprising a sequence in the table below:
  • an isolated polypeptide comprising an amino acid sequence in the table below:
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P17, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCPQCITQIGETSCGFFKCPLCKTSVR RDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYV corresponding to amino acids 1 - 218 of TM31_HUMAN, which also corresponds to amino acids 1 - 218 of HSHCGI_PEA_3_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EIPLMPTVERSQEARCYP in HSHCGI_PEA_3_P17.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P19 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P19 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWRKNSVKQNQDTTPSQGA in HSHCGI_PEA_3_P19.
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YDGPPQMYFAY in HSHCGI_PEA_3_P4.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P6 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P6 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PTPG in HSHCGI_PEA_3_P6.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P7 comprising a first amino acid sequence being at least 90% homologous to
  • TM31_HUMAN_V1 which also corresponds to amino acids 1 - 257 of HSHCGI_PEA_3_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA corresponding to amino acids 258 - 310 of HSHCGI_PEA_3_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P7 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA in HSHCGI_PEA_3_P7.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P8 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P9 comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P9 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TGEKTQ in HSHCGI_PEA_3_P9.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P12 comprising a first amino acid sequence being at least 90% homologous to MNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAPSHSLFRASSAG KVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEWLTAIRAWFCEVPSS corresponding to amino acids 312 - 425 of TM31_HUMAN, which also corresponds to amino acids 1 - 114 of HSHCGI_PEA_3_P12.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P14 comprising a first amino acid sequence being at least 90% homologous to
  • HSHCGI_PEA_3_P14 wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P16 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRKTPSHDLWKQKHLCQSSWNPLLH in HSHCGI_PEA_3_P16.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P21 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ corresponding to amino acids 1 - 28 of HSHCGI_PEA_3_P21, and a second amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a head of HSHCGI_PEA_3_P21 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ of HSHCGI_PEA_3_P21.
  • an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P22 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GMGWGGLCCTGSGGPRRLSPCTQPLCTEHGTEAIFVAAVGIRPSHHAAAQS in T51958_PEA_1_P5.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P6 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P28 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P28 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P28 comprising a first amino acid sequence being at least 90% homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARWLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P28 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P28 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P28 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P30 comprising a first amino acid sequence being at least 90% homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIK corresponding to amino acids 1 - 122 of PTK7_HUMAN_V13, which also corresponds to amino acids 1 - 122 of T51958_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CESQGGCAQSPCQTLND corresponding to amino acids 123 - 139 of T51958_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P30 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CESQGGCAQSPCQTLND in T51958_PEA_1_P30.
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P34 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for T51958_PEA_1_P35 comprising a first amino acid sequence being at least 90% homologous to
  • T51958_PEA_1_P35 and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEPGVGAEGMR corresponding to amino acids 221 - 231 of T51958_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T51958_PEA_1_P35 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEPGVGAEGMR in T51958_PEA_1_P35.
  • an isolated chimeric polypeptide encoding for T23657_P2 comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDF
  • S21C_HUMAN which also corresponds to amino acids 1 - 675 of T23657_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FQLPEVHHSLNVLNRKFQKQTVHNL corresponding to amino acids 676 - 700 of T23657_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T23657_P2 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FQLPEVHHSLNVLNRKFQKQTVHNL in T23657_P2.
  • an isolated chimeric polypeptide encoding for T23657_P3 comprising a first amino acid sequence being at least 90% homologous to
  • S21C_HUMAN which also corresponds to amino acids 1 - 675 of T23657_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TIKHKAF corresponding to amino acids 676 - 682 of T23657_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of T23657_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TIKHKAF in T23657_P3.
  • an isolated chimeric polypeptide encoding for T23657_P4 comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYNSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKT
  • an isolated polypeptide encoding for an edge portion of T23657_P4 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P4.
  • an isolated polypeptide encoding for a tail of T23657_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TIKHKAF in T23657_P4.
  • an isolated chimeric polypeptide encoding for T23657_P5 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T23657_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMPLQGNALQL VRESPSFWFSYSL in T23657_P6.
  • an isolated chimeric polypeptide encoding for T23657_P7 comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASN
  • an isolated polypeptide encoding for a tail of T23657_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QHSCTNGNSTMCP in T23657_P8.
  • an isolated chimeric polypeptide encoding for T23657_P10 comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFLNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDF
  • an isolated polypeptide encoding for an edge portion of T23657_P10 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P10.
  • an isolated chimeric polypeptide encoding for T23657_P11 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T23657_P11 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ASCPKAT in T23657_P11.
  • an isolated chimeric polypeptide encoding for T23657_P12 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of T23657_P12 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEENEFRRL in T23657_P12.
  • an isolated chimeric polypeptide encoding for T23657_P16 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC corresponding to amino acids 1 - 30 of T23657_P16, and a second amino acid sequence being at least 90% homologous to SLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVY RDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALTATLRCVRDPQ RSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILI MGLLYKVLGVLFFAI
  • an isolated polypeptide encoding for a head of T23657_P16 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC of T23657_P16.
  • an isolated chimeric polypeptide encoding for T23657_P17 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated chimeric polypeptide encoding for T23657_P21 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWTAR corresponding to amino acids 1 - 5 of T23657_P21, and a second amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a head of T23657_P21 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWTAR of T23657_P21.
  • an isolated chimeric polypeptide encoding for T23657_P23 comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFLNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDF
  • an isolated polypeptide encoding for a tail of T23657_P23 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMHCREMHFNL SEKAPPSGFHIRCNFLYIPQQHSCTNGNSTVSWGRVCACPELSLQHPEAELCRS in T23657_P23.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLLNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVID
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P4 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 91 of R30650_PEA_2_P4, and a second amino acid sequence being at least 90% homologous to VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLLKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD of R30650_PEA_2_P4.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P4 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLLNCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDK
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSI
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P5 comprising a first amino acid sequence being at least 90% homologous to MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYS
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P5 comprising a first amino acid sequence being at least 90% homologous to
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCLNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVS
  • an isolated chimeric polypeptide encoding for R30650JPEA_2_P5 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%), more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD conesponding to amino acids 1 - 199 of R30650JPEA 2 P5, and a second amino acid sequence being at least 90 % homologous to
  • KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL conesponding to amino acids 8 - 804 of Q9NPN9, which also conesponds to amino acids 200 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKC YPYRNHICNFFDFDTFGGHIKF ALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD ofR30650_PEA_2_P5.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P5 comprising a first amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRD
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P8 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVINHVIDPKSGTVIHSDRFDTYRSKKESER VQYLNAVPDGPvILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSS
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVrVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDH
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P8 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • R30650_PEA_2_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 8 - 579 of Q9NPN9, which also corresponds to amino acids 565 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P8 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIE
  • NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 2 - 275 of Q9H1K5, which also corresponds to amino acids 863 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERS WGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILS VA V NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P15 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FT ⁇ LYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGV ⁇ VHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIE
  • RTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 1 - 788 of Q9ULM1, which also corresponds to amino acids 349 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
  • an isolated polypeptide encoding for a head of R30650_PEA_2_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVrVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIE
  • GEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 1 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 977 of R30650_PEA_2_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P15 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCLNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG in R30650_PEA_2_P15.
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P15 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVRVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEY
  • an isolated chimeric polypeptide encoding for R30650_PEA_2_P15 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
  • an isolated polypeptide encoding for a head of R30650_PEA_2 S comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGV ⁇ VHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDH
  • R30650_PEA_2_P15 there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P17, comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of R30650_PEA_2_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEEFQTIW in R30650_PEA_2_P17.
  • an isolated chimeric polypeptide encoding for M78035_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for M78035_P6 comprising a first amino acid sequence being at least 90 % homologous to MILDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVPAINVNDSVT KSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEI DPLNALQAAMEGYEVTTMDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCNIGHFD VEIDVKWLNENAVEKVNIKPQVDRYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNS FTNQVMAQIELWTHPDKYPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLG MSCDGPFKPDHYRY corresponding to amino acids 127 - 432 of SAHH_HUMAN, which also
  • an isolated chimeric polypeptide encoding for M78035_P8 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSDKLPYKV corresponding to amino acids 1 - 9 of M78035_P8, and a second amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a head of M78035_P8 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSDKLPYKV of M78035_P8.
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P4 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKN RRGGAASVLGGSGSTPYDGRNR in HUMCEA_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI ⁇ YPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDA PTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQ
  • an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P7 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VS, having a structure as follows: a sequence starting from any of amino acid numbers 674-x to 674; and ending at any of amino acid numbers 675+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P10 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SV, having a structure as follows: a sequence starting from any of amino acid numbers 228-x to 228; and ending at any of amino acid numbers 229+ ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P19 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P20 comprising a first amino acid sequence being at least 90 % homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYP corresponding to amino acids 1 - 142 of CEA5_HUMAN, which also corresponds to amino acids 1 - 142 of HUMCEA_PEA_1_P20, and a second amino acid sequence being at least 90 % homologous to ELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLT LFNVTRNDARAYVCGIQNSVSANRS
  • an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142-x to 142; and ending at any of amino acid numbers 143+ ((n-2) - x), in which x varies from 0 to n-2.
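The edge-portion definition above amounts to enumerating every window of length n that contains both residues flanking a junction. The following Python sketch illustrates that arithmetic for the HUMCEA_PEA_1_P20 case (junction between residues 142 and 143); the function name `edge_windows` is hypothetical and was chosen for this example only, and coordinates are treated as 1-based and inclusive, which is an assumption about the numbering convention.

```python
from typing import List, Tuple


def edge_windows(junction_left: int, n: int) -> List[Tuple[int, int]]:
    """Enumerate (start, end) pairs of length-n windows spanning a junction.

    junction_left is the last residue before the junction (e.g. 142), so the
    first residue after it is junction_left + 1. Following the wording of the
    text, for each x from 0 to n-2 the window starts at junction_left - x and
    ends at (junction_left + 1) + ((n - 2) - x).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    junction_right = junction_left + 1
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2 inclusive
        start = junction_left - x
        end = junction_right + ((n - 2) - x)
        if start >= 1:  # skip windows that would begin before residue 1
            windows.append((start, end))
    return windows


# Example: n = 10 around the 142/143 junction quoted above.
for start, end in edge_windows(142, 10):
    print(start, end)
```

Each window produced this way has exactly n residues, because end - start + 1 simplifies to n for every value of x, and every window contains both residue 142 and residue 143.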
  • an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P7 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated polypeptide encoding for an edge portion of HUMCACH1A_PEA_1_P7 comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WCWWRRRGAAKAGPSGCRRWG, corresponding to HUMCACH1A_PEA_1_P7.
  • a bridge portion of HUMCACH1A_PEA_1_P7 comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise L, having a structure as follows (numbering according to HUMCACH1A_PEA_1_P7): a sequence starting from any of amino acid numbers 492-x to 492; and ending at any of amino acid numbers 28 + ((n-2) - x), in which x varies from 0 to n-2.
  • an isolated polypeptide encoding for a head of HUMCACH1A_PEA_1_P13 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRPRCLLRRTAHPPHSAPAPAPARSKCLGSWSNVLIRESSVWSLRL of HUMCACH1A_PEA_1_P13.
  • an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P14 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P17 comprising a first amino acid sequence being at least 90 % homologous to MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTVLSWQ AAIDAARQAKAAQTMSTSAPPP VGSLSQRKRQQYAKSKKQGNSSNSRPARALFCLSLN NPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFPEDDSNSTNHNLEKVEYAFLIIFTVET FLKIIAYGLLLHPNAYVRNGWNLLDFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDV KALRAFRVLRPLRLVSGVPSLQWLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMH KTCFFADSDIVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNFAF
  • an isolated polypeptide encoding for a tail of HUMCACH1A_PEA_1_P17 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HGGSRL in HUMCACH1A_PEA_1_P17.
  • an isolated chimeric polypeptide encoding for AA583399_PEA_1_P2 comprising a first amino acid sequence being at least 90 % homologous to MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRERNKGDKG AQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSARWGVAHLGEAFRV GVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPD YERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGP TMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 59 - 313 of MYEO_HUMAN_V1, which also corresponds to amino acids 1 - 255 of AA583399_PEA_1_P2.
  • an isolated chimeric polypeptide encoding for AA583399_PEA_1_P4 comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG corresponding to amino acids 1 - 27 of AA583399_PEA_1_P4, and a second amino acid sequence being at least 90 % homologous to RNSGSQSARWGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSP GRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTV VTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 150 - 313 of MYEO_HUMAN_V
  • an isolated polypeptide encoding for a head of AA583399_PEA_1_P4 comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG of AA583399_PEA_1_P4.
  • an isolated chimeric polypeptide encoding for AA583399_PEA_1_P5 comprising a first amino acid sequence being at least 90 % homologous to
  • an isolated chimeric polypeptide encoding for AA583399_PEA_1_P10 comprising a first amino acid sequence being at least 90 % homologous to
  • an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein.
  • the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein.
  • the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
  • a kit for detecting colon cancer comprising a kit detecting overexpression of a splice variant as described herein.
  • the kit comprises a NAT-based technology.
  • the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein.
  • the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein.
  • the kit optionally comprises an antibody as described herein.
  • the kit optionally further comprises at least one reagent for performing an ELISA or a Western blot.
  • a method for detecting colon cancer comprising detecting overexpression of a splice variant as described herein. Detecting overexpression is optionally performed with a NAT-based technology.
  • detecting overexpression is performed with an immunoassay, optionally wherein said immunoassay comprises an antibody as described herein.
  • a biomarker capable of detecting colon cancer comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
  • a method for screening for colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • a method for diagnosing colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • a method for monitoring disease progression and/or treatment efficacy and/or relapse of colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein.
  • a method of selecting a therapy for colon cancer comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein and selecting a therapy according to said detection.
  • any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto.
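The tiered homology thresholds recited above (70%, 80%, 90%, 95%) can be illustrated with a small calculation. The sketch below is only an approximation for illustration: it measures position-by-position identity between two already-aligned, equal-length sequences, which is a simplifying assumption and not necessarily how homology is determined in this document. The names `percent_identity` and `highest_tier` are hypothetical, and the second sequence in the example is an invented single-substitution variant of the KQRTISWR tail quoted earlier.

```python
from typing import Optional

# Thresholds recited in the text, checked from most to least stringent.
TIERS = (95, 90, 80, 70)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions holding the same residue.

    Both inputs must already be aligned to the same length; '-' marks a gap
    and never counts as a match.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def highest_tier(pct: float) -> Optional[int]:
    """Return the most stringent recited threshold that pct satisfies."""
    for tier in TIERS:
        if pct >= tier:
            return tier
    return None


# Hypothetical comparison: the quoted tail versus a one-residue variant.
pct = percent_identity("KQRTISWR", "KQRTLSWR")
print(pct, highest_tier(pct))
```

In practice, sequences of different lengths would first need a proper alignment step, which this sketch deliberately leaves out.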
  • all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).
  • nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably. Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs.
  • Figure 1 is a schematic summary of the cancer biomarkers selection engine and the wet validation stages.
  • Figure 2 is a schematic illustration depicting the grouping of transcripts of a given cluster based on the presence or absence of unique sequence regions.
  • Figure 3 is a schematic summary of quantitative real-time PCR analysis.
  • Figure 4 is a schematic presentation of the oligonucleotide-based microarray fabrication.
  • Figure 5 is a schematic summary of the oligonucleotide-based microarray experimental flow.
  • Figure 6 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster M85491.
  • Figure 7 is a histogram showing expression of the Ephrin type-B receptor 2 precursor M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in normal and cancerous colon tissues.
  • Figure 8 is a histogram showing the expression of M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 in different normal tissues.
  • Figure 9 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster T10888, demonstrating overexpression in colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.
  • Figure 10 is a histogram showing expression of the CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 (T10888) transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in normal and cancerous colon tissues.
  • Figure 11 is a histogram showing the expression of T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in different normal tissues.
  • Figure 12 is a histogram showing Cancer and cell-line vs. normal tissue expression for T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in different normal tissues.
  • Figure 13 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H53626, demonstrating overexpression in the epithelial malignant tumors, a mixture of malignant tumors from different tissues and myosarcoma.
  • Figure 14 is a histogram showing expression of the above-indicated Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 junc24-27FlR3, in normal and cancerous colon tissues.
  • Figure 15 shows the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626seg25, in normal and cancerous colon tissues.
  • Figure 16 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSENA78, demonstrating overexpression in the epithelial malignant tumors and lung malignant tumors.
  • Figure 17 is a histogram, showing Cancer and cell-line vs. normal tissue expression for the Cluster HUMODCA, demonstrating overexpression in the brain malignant tumors, colorectal cancer, epithelial malignant tumors and a mixture of malignant tumors from different tissues.
  • Figure 18 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R00299, demonstrating overexpression in the lung malignant tumors.
  • Figure 19 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z44808, demonstrating overexpression in the colorectal cancer, lung cancer and pancreas carcinoma.
  • Figure 20 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z25299, demonstrating overexpression in the brain malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma.
  • Figure 21 is a histogram showing expression of Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299seg20, in normal and cancerous colon tissues.
  • Figure 22 is a histogram showing the expression of Secretory leukocyte protease inhibitor (an acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G, which may prevent elastase-mediated damage to oral and possibly other mucosal tissues) Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299seg20, in different normal tissues.
  • Figure 23 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMANK, demonstrating overexpression in epithelial malignant tumors.
  • Figure 24 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMCAIXIA, demonstrating overexpression in the bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
  • Figure 25 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HSSIOOPCB, demonstrating overexpression in the mixture of malignant tumors from different tissues.
  • Figure 26 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster D11853, demonstrating overexpression in the brain malignant tumors, colorectal cancer and a mixture of malignant tumors from different tissues.
  • Figure 27 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R11723, demonstrating overexpression in the epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
  • Figure 28 is a histogram showing expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723seg13, in normal and cancerous colon tissues.
  • Figure 29 is a histogram showing expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723junc11-18, in normal and cancerous colon tissues.
  • Figure 30 is a histogram showing the expression of R11723 transcripts, detectable by amplicon depicted in sequence name R11723seg13, in different normal tissues.
  • Figure 31 is a histogram showing the expression of R11723 transcripts, detectable by amplicon in sequence name R11723junc11-18, in different normal tissues.
  • Figure 32 is a histogram showing overexpression of the SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808junc8-11, in cancerous colon samples relative to the normal samples.
  • Figure 33 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster M77903, demonstrating overexpression in ovarian carcinoma and uterine malignancies.
  • Figure 34 is a histogram showing expression of the SSR-alpha M77903 transcripts, which are detectable by amplicon as depicted in sequence name M77903seg18, in normal and cancerous colon tissues.
  • Figure 35 is a histogram showing low overexpression for amplicon M77903junc20-34-35 in the experiment carried out with colon.
  • Figure 36 is a histogram showing low overexpression for amplicon M77903junc20-28 in the experiment carried out with colon.
  • Figures 37-38 are histograms showing differential expression of 6 sequences:
  • Figure 39 is a histogram showing the expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808junc8-11, in different normal tissues.
  • Figure 40 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster AA583399, demonstrating overexpression in brain malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and gastric carcinoma.
  • Figure 41 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg30-32, in normal and cancerous colon tissues.
  • Figure 42 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg17, in normal and cancerous colon tissues.
  • Figure 43 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg1, in normal and cancerous colon tissues.
  • Figure 44 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster AI684092, demonstrating overexpression in brain malignant tumors, epithelial malignant tumors and a mixture of malignant tumors from different tissues.
  • Figure 45 is a histogram showing expression of the AA5315457 transcripts, which are detectable by amplicon as depicted in sequence name AA5315457seg8, in normal and cancerous colon tissues.
  • Figure 46 is the histogram showing Cancer and cell-line vs.
  • FIG. 47 is the histogram showing expression of the Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts, which are detectable by seg 113, 35, 109, 125, in normal and cancerous colon tissues.
  • Figure 48 is the histogram showing expression of the HUMCACH1A transcripts, which are detectable by amplicon as depicted in sequence name HUMCACH1Aseg101, in normal and cancerous colon tissues.
  • Figure 49 is the histogram showing Cancer and cell-line vs.
  • FIG. 50 is the histogram showing expression of the HUMCEA transcripts which are detectable by seg12 and seg9 in normal and cancerous colon tissues.
  • Figure 51 is the histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg31 in normal and cancerous colon tissues.
  • Figure 52 is the histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg33 in normal and cancerous colon tissues.
  • Figure 53 is the histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg35 in normal and cancerous colon tissues.
  • Figure 54 is the histogram showing Cancer and cell-line vs.
  • FIG. 55 is the histogram showing expression of the S-adenosylhomocysteine hydrolase (AHCY) M78035 transcripts, which are detectable by amplicon as depicted in sequence name M78035seg42, in normal and cancerous colon tissues.
  • Figure 56 is the histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R30650, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
  • Figure 57 is the histogram showing expression of the R30650 transcripts which are detectable by amplicon as depicted in sequence name R30650 seg76 in normal and cancerous colon tissues.
  • Figure 58 is the histogram showing Cancer and cell-line vs. normal tissue expression for the cluster T23657, demonstrating overexpression in epithelial malignant tumors.
  • Figure 59 is the histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg17-18, in normal and cancerous colon tissues.
  • Figure 60 is the histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg22, in normal and cancerous colon tissues.
  • Figure 61 is the histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg29-32, in normal and cancerous colon tissues.
  • Figure 62 is the histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg41, in normal and cancerous colon tissues.
  • Figure 63 is the histogram showing Cancer and cell-line vs. normal tissue expression for the cluster T51958, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
  • Figure 64 is the histogram showing expression of PTK7 protein tyrosine kinase 7 (PTK7) T51958 transcripts which are detectable by amplicon as depicted in sequence name T51958seg38 in normal and cancerous colon tissues.
  • Figure 65 is the histogram showing expression of PTK7 protein tyrosine kinase 7
  • FIG. 66 is the histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z17877, demonstrating overexpression in brain malignant tumors and malignant tumors involving the bone marrow.
  • Figure 67 is the histogram showing expression of c-myc-P64 mRNA, initiating from promoter P0 Z17877 transcripts, which are detectable by amplicon as depicted in sequence name Z17877seg8, in normal and cancerous colon tissues.
  • Figure 68 is the histogram showing combined expression of 19 sequences (T23657seg 29, T23657seg 22, T23657seg 41, T23657seg17-18, AA315457seg8, R30650seg76, HUM-
  • Figure 69 is the histogram showing expression of TRIM31 tripartite motif HSHCGI transcripts which are detectable by amplicon as depicted in sequence name HSHCGI seg20 in normal and cancerous colon tissues.
  • Figure 70 is the histogram showing expression of TRIM31 tripartite motif HSHCGI transcripts which are detectable by amplicon as depicted in sequence name HSHCGI seg35 in normal and cancerous colon tissues.
  • Figure 71 is a histogram showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon(s) and H53626 seg25F and H53626 seg25R in different normal tissues.
  • Figure 72 is a histogram showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon(s) and H53626 seg25F and H53626 junc24-27FlR3 in different normal tissues.
  • Figure 73 is a histogram showing over expression of the Matrix metalloproteinase 11
  • FIG. 74 is a histogram showing over expression of the Matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg25, in cancerous colon samples relative to the normal samples.
  • Figure 75 is the histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HSSTROL3, demonstrating overexpression in transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
  • Figure 76 is a histogram showing the expression of Stromelysin-3 HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24, in different normal tissues.
  • the present invention is of novel markers for colon cancer that are both sensitive and accurate.
  • Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease.
  • These markers are specifically released to the bloodstream under conditions of colon cancer and/or other colon pathology, and/or are otherwise expressed at a much higher level and/or specifically expressed in colon cancer tissue or cells.
  • the measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of colon cancer and/or pathology.
  • the present invention therefore also relates to diagnostic assays for colon cancer and/or colon pathology, and methods of use of such markers for detection of colon cancer and/or colon pathology, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
  • the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples.
  • a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention.
  • a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.
  • a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention.
  • a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.
  • an edge portion refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein.
  • An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein.
  • a "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant.
  • a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant.
  • the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between).
  • bridges cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below.
  • a bridge between two edges may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2.
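  • The sliding-window description above can be illustrated with a minimal Python sketch. The function name `bridge_windows`, its parameters, and the optional clamping to a sequence length are assumptions introduced here for illustration; the edge positions 49 and 50 are simply the example values used in the text.

```python
def bridge_windows(n, left_edge=49, right_edge=50, seq_len=None):
    """Enumerate (start, end) positions, 1-based and inclusive, of every
    length-n bridge that contains both edge residues.

    Hypothetical helper: start = left_edge - x and
    end = right_edge + ((n - 2) - x), with x running from 0 to n - 2.
    """
    if right_edge != left_edge + 1:
        raise ValueError("the two edge residues must be adjacent")
    if n < 2:
        raise ValueError("a bridge needs at least the two edge residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left_edge - x
        end = right_edge + ((n - 2) - x)
        # A bridge cannot extend beyond the sequence in either direction.
        if start < 1 or (seq_len is not None and end > seq_len):
            continue
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in bridge_windows(10):
        print(start, end)
```

  • Every window returned has length n and contains both edge residues; passing `seq_len` drops windows that would run past the end of the sequence, in keeping with the statement that bridges do not extend beyond the sequence itself.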
  • this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
  • this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
  • this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
  • this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention.
  • this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.
  • this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize corresponding known proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.
  • this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
  • the splice variants described herein are non-limiting examples of markers for diagnosing colon cancer and/or colon pathology.
  • Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of colon cancer and/or colon pathology.
  • any marker according to the present invention may optionally be used alone or in combination.
  • Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker.
  • such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker.
  • the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene.
  • a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof may be featured as a biomarker for detecting colon cancer and/or colon pathology, such that a biomarker may optionally comprise any of the above.
  • the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein.
  • any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges.
  • the present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.
  • the present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application. Non-limiting examples of methods or assays are described below.
  • the present invention also relates to kits based upon such diagnostic methods or assays.
  • Nucleic acid sequences and Oligonucleotides
  • Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
  • the present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion.
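  • As an illustration of the percent-identity thresholds listed above, the following Python sketch computes identity between two sequences that have already been aligned to equal length. The function name `percent_identity`, the gap character, and the choice of denominator (all aligned columns except those where both sequences have a gap) are assumptions made here; the text does not specify an alignment program or an identity convention.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Hypothetical convention: columns where both sequences carry a gap are
    ignored; a column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

  • Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so any threshold comparison should state which convention was used.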
  • the present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention.
  • the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove.
  • a "nucleic acid fragment" or an "oligonucleotide" or a "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids.
  • a polynucleotide sequence of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above).
  • the phrase "complementary polynucleotide sequence" refers to a sequence, which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase.
  • genomic polynucleotide sequence refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome.
  • composite polynucleotide sequence refers to a sequence, which is composed of genomic and cDNA sequences.
  • a composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween.
  • the intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements.
  • Preferred embodiments of the present invention encompass oligonucleotide probes.
  • An example of an oligonucleotide probe which can be utilized by the present invention is a single stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
  • an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
  • Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid phase synthesis.
  • Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases.
  • the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40, bases specifically hybridizable with the biomarkers of the present invention.
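  • A probe complementary to a unique region, such as the junction between two segments, can be sketched in Python as follows. The helper names `reverse_complement` and `junction_probe`, the `flank` parameter, and the example sequences are hypothetical and introduced only for illustration; the default length bounds follow the broadest range stated above (about 10 to about 200 bases).

```python
# Map each base to its DNA complement; U (RNA) pairs with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_probe(upstream, downstream, flank=10, min_len=10, max_len=200):
    """Build a single-stranded probe complementary to a junction.

    Takes `flank` bases on each side of the join between `upstream` and
    `downstream` (both written 5'->3') and returns the reverse complement.
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("segments are shorter than the requested flank")
    target = upstream[-flank:] + downstream[:flank]
    if not (min_len <= len(target) <= max_len):
        raise ValueError("probe length outside the allowed range")
    return reverse_complement(target)


if __name__ == "__main__":
    # Made-up sequences, not taken from any sequence listing.
    print(junction_probe("GATTACAGAT", "CCGGAATTCA", flank=5))
```

  • Because the probe takes bases from both sides of the join, it is complementary to a region present only in a transcript in which the two segments are contiguous; specificity in practice would still need to be checked experimentally.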
  • the oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage.
  • oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder.
  • Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages.
  • Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat.
  • Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'.
  • modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages.
  • morpholino linkages (formed in part from the sugar portion of a nucleoside);
  • siloxane backbones; sulfide, sulfoxide and sulfone backbones;
  • formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones;
  • alkene containing backbones; sulfamate backbones;
  • sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos.
  • oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target.
  • An example for such an oligonucleotide mimetic includes peptide nucleic acid (PNA).
  • United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference.
  • Other backbone modifications, which can be used in the present invention are disclosed in U.S. Pat.
  • Oligonucleotides of the present invention may also include base modifications or substitutions.
  • "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U).
  • Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cyto
  • Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine.
  • 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications.
  • another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates, which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide.
  • moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl
  • oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability, therapeutic efficacy and reduce cytotoxicity.
  • a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis acting regulatory element.
  • cis acting regulatory element refers to a polynucleotide sequence, preferably a promoter, which binds a trans acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention.
  • the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed.
  • cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275]; in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins; [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci.
  • the nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up regulating the transcription therefrom.
  • the nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication.
  • the nucleic acid construct utilized is a shuttle vector, which can propagate both in E.
  • the construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome.
  • suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1
  • retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter.
  • Vectors derived from Mo-MuLV are also included such as pBabe, where the transgene will be transcribed from the 5' LTR promoter.
  • Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems.
  • Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)].
  • the most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses.
  • a viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger.
  • Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct.
  • such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed.
  • the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention.
  • the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence.
  • by way of example, such a construct will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof.
  • Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
  • Hybridization assays
  • Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described).
  • Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75).
  • Other detection methods include kits containing probes on a dipstick setup and the like.
  • Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long.
  • the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions.
  • Moderate to stringent hybridization conditions are characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and a final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and a final wash at 50 °C.
  • hybridization of short nucleic acids can be effected using the following exemplary hybridization protocols which can be modified according to the desired stringency;
  • hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected.
  • labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art.
  • a label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample.
  • Probes can be labeled according to numerous well known methods.
  • Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S.
  • detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies.
  • oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTPs, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes.
  • radioactive nucleotides can be incorporated into probes of the invention by several methods.
  • Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
  • RNA ribonucleic acid
  • DNA deoxyribonucleic acid
  • NAT-based assays: Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR (or variations thereof such as real-time PCR).
  • a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for DNA synthesis under suitable conditions.
  • Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol.
  • amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra).
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • SDA strand displacement amplification
  • amplification pair refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction.
  • amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below.
  • the oligos are designed to bind to a complementary sequence under selected conditions.
  • a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid.
  • RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA.
  • the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences.
  • the nucleic acid (i.e., DNA or RNA) for practicing the present invention may be obtained according to well known methods.
  • Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed.
  • the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system.
  • the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility.
  • Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity.
  • the polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below).
  • the pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C.
  • Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification.
  • This technology provides one approach to the problems of low target sequence concentration.
  • PCR can be used to directly increase the concentration of the target to an easily detectable level.
  • This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers which are complementary to their respective strands of the double-stranded target sequence to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize.
  • the primers are extended with polymerase so as to form complementary strands.
  • the steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence.
  • the length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter.
  • Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as “Ligase Amplification Reaction” (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids.
  • in LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed and DNA ligase is added to the mixture.
  • ligase will covalently link each set of hybridized molecules.
  • two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation, and ligation amplify a short segment of DNA.
  • LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990).
  • because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal.
  • the use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
  • Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest.
  • the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest.
  • the use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
  • Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase.
  • thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high temperature as a means of achieving specificity as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere.
  • a successful diagnostic method must be very specific.
  • a straight-forward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction.
  • a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency.
  • a reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product.
  • routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield.
  • at 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive.
  • any background products that amplify with a better mean efficiency than the intended target will become the dominant products.
  • PCR has yet to penetrate the clinical market in a significant way.
  • LCR must also be optimized to use different oligonucleotide sequences for each target sequence.
  • both methods require expensive equipment, capable of precise temperature cycling.
  • nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences.
  • One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer.
  • An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence.
  • This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect.
  • a similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification.
  • the direct detection method may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis.
  • CPR cycling probe reaction
  • a method that does not amplify the signal exponentially is more amenable to quantitative analysis.
  • Cycling probe reaction (CPR): Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process.
  • the signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.
  • Branched DNA involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.
  • the detection of at least one sequence change may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF).
  • RFLP analysis restriction fragment length polymorphism
  • ASO allele specific oligonucleotide
  • DGGE/TGGE Denaturing/Temperature Gradient Gel Electrophoresis
  • SSCP Single-Strand Conformation Polymorphism
  • ddF Dideoxy fingerprinting
  • a number of methods exist for scanning nucleic acid segments for mutations.
  • One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intense and expensive to be practical and effective in the clinical setting.
  • a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel.
  • a more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map.
  • the presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
  • Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing.
  • a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis).
  • RFLP restriction fragment length polymorphism
  • Single point mutations have also been detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches.
  • Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC).
  • MCC Mismatch Chemical Cleavage
  • when RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered.
  • Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASOs) also has been applied to the detection of specific point mutations.
  • the method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles.
  • the ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations. With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test.
  • Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities.
  • the fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands.
  • the attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature.
  • TGGE uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
  • Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility.
  • the complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations.
  • the SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
  • Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations.
  • the ddF technique combines components of Sanger dideoxy sequencing with SSCP.
  • a dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis.
  • ddF is an improvement over SSCP in terms of increased sensitivity
  • ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).
  • all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed.
  • sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment.
  • SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments.
  • although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection rate drops to less than 50 % for 400 base-pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs.
  • the ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.
  • the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.
  • Detection may also optionally be performed with a chip or other such device.
  • the nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group.
  • This reporter group can be a fluorescent group such as phycoerythrin.
  • the labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station; the fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip.
  • the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
  • polypeptide amino acid sequences and peptides
  • polypeptide amino acid residues.
  • polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins.
  • polypeptide include glycoproteins, as well as non-glycoproteins.
  • Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation, and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry.
  • Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Syntheses (2nd
  • Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing. In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al., (1987) Methods in Enzymol. 153:516-
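
By way of non-limiting illustration, the PCR yield figures quoted in the amplification discussion above (21 % of the ideal product at 85 % efficiency, less than 1 % at 50 % efficiency, and about 34 cycles for a million-fold amplification at 50 % efficiency) can be reproduced with a short calculation. The sketch below assumes a simple model, not stated in the text, in which each cycle multiplies the target by (1 + efficiency) and the comparison is made after 20 cycles; the function names are hypothetical.

```python
import math


def relative_yield(efficiency: float, cycles: int = 20) -> float:
    """Product obtained relative to an ideal reaction after `cycles` cycles.

    Assumed model: each cycle multiplies the target by (1 + efficiency),
    so an ideal reaction (efficiency = 1.0) doubles the target every cycle.
    """
    return ((1.0 + efficiency) / 2.0) ** cycles


def cycles_for_fold(efficiency: float, fold: float) -> float:
    """Cycles needed to reach `fold` amplification (may be fractional)."""
    return math.log(fold) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    for eff in (1.0, 0.85, 0.50):
        print(f"efficiency {eff:.0%}: relative yield = {relative_yield(eff):.2%}")
    # Million-fold amplification is taken here as 2**20, the ideal 20-cycle yield.
    print(f"cycles at 50 % efficiency: {cycles_for_fold(0.50, 2 ** 20):.1f}")
```

Under this model the 85 % case gives roughly 0.21 and the 50 % case roughly 0.003, consistent with the figures in the text.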
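
The requirement stated above that an amplification pair have compatible melting temperatures (differing by less than 7 °C, preferably less than 5 °C, 4 °C or 3 °C) can be sketched as a simple check. The text does not prescribe a Tm formula; the sketch assumes the Wallace rule of thumb for short oligonucleotides (2 °C per A or T, 4 °C per G or C), and the primer sequences and function names are invented for illustration only.

```python
def wallace_tm(primer: str) -> int:
    """Approximate Tm in °C using the Wallace rule (an assumption, see lead-in)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_compatible(forward: str, reverse: str, max_diff: float = 7.0) -> bool:
    """True if the two primers' estimated Tm values differ by less than max_diff."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) < max_diff


if __name__ == "__main__":
    fwd = "ACGTACGTACGTACGTAC"    # hypothetical 18-mer
    rev_a = "ACGTACGTACGTACGTTT"  # hypothetical 18-mer, similar composition
    rev_b = "AATTAATTGGCCAATTAA"  # hypothetical 18-mer, A/T-rich
    for rev in (rev_a, rev_b):
        print(wallace_tm(fwd), wallace_tm(rev), tm_compatible(fwd, rev, max_diff=3.0))
```

Passing a smaller `max_diff` corresponds to the progressively more preferred thresholds listed in the text.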
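
The RFLP discussion above notes that a single base change can create or destroy a restriction site, and that enzymes with 4 to 6 base-pair recognition sequences cut far more often than 8 base-pair cutters. Both points can be illustrated with a minimal sketch. The EcoRI recognition sequence (GAATTC, cut after the first G) is used as an example; the two test sequences are invented, and the spacing estimate assumes random sequence with equal base frequencies, which real genomes do not strictly satisfy.

```python
from typing import List


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the cut falls.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def expected_site_spacing(site_length: int) -> int:
    """Average spacing between sites in random sequence with equal base frequencies."""
    return 4 ** site_length


if __name__ == "__main__":
    wild_type = "ACGTACGTAC" + "GAATTC" + "TTGACCATGG"  # hypothetical, one EcoRI site
    mutant = "ACGTACGTAC" + "GAGTTC" + "TTGACCATGG"     # single A->G change destroys it
    print(fragment_lengths(wild_type, "GAATTC", 1))  # [11, 15]
    print(fragment_lengths(mutant, "GAATTC", 1))     # [26]
    for n in (4, 6, 8):
        print(n, expected_site_spacing(n))
```

The change from two fragments to one is the kind of altered digestion pattern that RFLP analysis reads out on a gel.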

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)

Abstract

Novel markers for colon cancer that are both sensitive and accurate. These markers are overexpressed in colon cancer specifically, as opposed to normal colon tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of colon cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between colon cancer and non-cancerous states.

Description

NOVEL NUCLEOTIDE AND AMINO ACID SEQUENCES, AND ASSAYS AND METHODS OF USE THEREOF FOR DIAGNOSIS OF COLON CANCER
FIELD OF THE INVENTION
The present invention is related to novel nucleotide and protein sequences that are diagnostic markers for colon cancer, and assays and methods of use thereof.
BACKGROUND OF THE INVENTION
Colon and rectal cancers are malignant conditions which occur in the corresponding segments of the large intestine. These cancers are sometimes referred to jointly as "colorectal cancer", and, in many respects, the diseases are considered identical. The major differences between them are the sites where the malignant growths occur and the fact that treatments may differ based on the location of the tumors. More than 95 percent of cancers of the colon and rectum are adenocarcinomas, which develop in glandular cells lining the inside (lumen) of the colon and rectum. In addition to adenocarcinomas, there are other rarer types of cancers of the large intestine: these include carcinoid tumors usually found in the appendix and rectum; gastrointestinal stromal tumors found in connective tissue in the wall of the colon and rectum; and lymphomas, which are malignancies of immune cells in the colon, rectum and lymph nodes. As with other malignant conditions, a number of genetic abnormalities have been associated with colon tumors (Bos et al., (1987) Nature 327:293-297; Baker et al., (1989) 244:217-221; Nishisho et al., (1991) 253:665-669). Colorectal cancer is the second most common cause of cancer death in the United States and the third most prevalent cancer in both men and women. Approximately 100,000 patients every year suffer from colon cancer and approximately half that number die of the disease. In large part this death rate is due to the inability to diagnose the disease at an early stage (Wanebo (1993) Colorectal Cancer, Mosby, St. Louis Mo.). In fact, the prognosis for a case of colon cancer is vastly enhanced when malignant tissue is detected at the early stage known as polyps. Polyps are usually benign growths protruding from the mucous membrane. Nearly all cases of colorectal cancer arise from adenomatous polyps, some of which mature into large polyps, undergo abnormal growth and development, and ultimately progress into cancer.
This progression would appear to take at least 10 years in most patients, rendering it a readily treatable form of cancer if diagnosed early, when the cancer is localized. Simple removal of malignant polyps (polypectomy) through colonoscopy is now routine, and curing the condition from this procedure is effectively guaranteed. However, early detection of polyps and tumors depends on diligent and ongoing examination of patients at risk. The most reliable detection procedures to date include fecal occult blood tests, sigmoidoscopy, barium enema X-ray, digital rectal exam, and colonoscopy. Normally a malignant colon cancer will not cause noticeable symptoms (e.g., bowel obstruction, abdominal pain, anemia) until it has reached an advanced and far more serious stage of malignancy. At these stages, only risky, traumatic and/or invasive procedures are available, including chemotherapy, radiation therapy, and colonectomy. Although current understanding of the etiology of colon cancer is undergoing continual refinement, extensive research in this area points to a combination of factors, including age, hereditary and non-hereditary conditions, and environmental/dietary factors. Age is a key risk factor in the development of colorectal cancer, since men and women over 40 years of age become increasingly susceptible to that cancer. Incidence rates increase considerably in each subsequent decade of life. A number of hereditary and nonhereditary conditions have also been linked to a heightened risk of developing colorectal cancer, including familial adenomatous polyposis (FAP), hereditary nonpolyposis colorectal cancer (Lynch syndrome or HNPCC), a personal and/or family history of colorectal cancer or adenomatous polyps, inflammatory bowel disease, diabetes mellitus, and obesity.
In the case of FAP, the tumor suppressor gene APC (adenomatous polyposis coli), located at 5q21, has been either mutationally inactivated or deleted (Alberts et al., Molecular Biology of the Cell 1288 (3d ed. 1994)). The APC protein plays a role in a number of functions, including cell adhesion, apoptosis, and repression of the c-myc oncogene. Of those patients with colorectal cancer who have normal APC genes, over 65% have such mutations in the cancer cells but not in other tissues. In the case of HNPCC, patients manifest abnormalities in the tumor suppressor gene HNPCC, but only about 15% of tumors contain the mutated gene. A host of other genes have also been implicated in colorectal cancer, including the K-ras, c-Ki-ras, N-ras, H-ras and c-myc oncogenes, and the tumor suppressor genes DCC (deleted in colon carcinoma), Wg/Wnt signal transduction pathway components and p53. Some tyrosine kinases have been shown to be up-regulated in colorectal tumor tissues or cell lines such as HT29. Focal adhesion kinase (FAK) and its upstream kinases c-src and c-yes in colonic epithelial cells may play an important role in the promotion of colorectal cancers through the extracellular matrix (ECM) and integrin-mediated signaling pathways. The formation of c-src/FAK complexes may coordinately deregulate VEGF expression and apoptosis inhibition. Recent evidence suggests that a specific signal-transduction pathway for cell survival that implicates integrin engagement leads to FAK activation and thus activates PI-3 kinase and akt. In turn, akt phosphorylates BAD and blocks apoptosis in epithelial cells. The activation of c-src in colon cancer may induce VEGF expression through the hypoxia pathway. Other genes that may be implicated in colorectal cancer include Cox enzymes (Ota, S. et al. Aliment Pharmacol. Ther. 16 (Suppl 2): 102-106 (2002)), estrogen (al-Azzawi, F. and Wahab, M. Climacteric 5: 3-14 (2002)), peroxisome proliferator-activated receptor-γ (PPAR-γ) (Gelman, L. et al. Cell Mol. Life Sci. 55: 932-943 (1999)), IGF-I (Giovannucci (2001)), thymine DNA glycosylase (TDG) (Hardeland, U. et al. Prog. Nucleic Acid Res. Mol. Biol. 68: 235-253 (2001)) and EGF (Mendelsohn, J. Endocrine-Related Cancer 8: 3-9 (2001)).
Procedures used for detecting, diagnosing, monitoring, staging, and prognosticating colon cancer are of critical importance to the outcome of the patient. For example, patients diagnosed with early colon cancer generally have a much greater five-year survival rate as compared to the survival rate for patients diagnosed with distant metastasized colon cancer. Because colon cancer is highly treatable when detected at an early, localized stage, screening should be a part of routine care for all adults starting at age 50, especially those with first-degree relatives with colorectal cancer. One major advantage of colorectal cancer screening over its counterparts in other types of cancer is its ability to not only detect precancerous lesions, but to remove them as well. The key colorectal cancer screening tests in use today are fecal occult blood test, sigmoidoscopy, colonoscopy, double-contrast barium enema, and the carcinoembryonic antigen (CEA) test. New diagnostic methods which are more sensitive and specific for detecting early colon cancer are clearly needed.
Visual examination of the colon for abnormalities can be performed through endoscopic or radiographic techniques such as rigid proctosigmoidoscopy, flexible sigmoidoscopy, colonoscopy, and barium-contrast enema. These methods enable one to detect, biopsy, and remove adenomatous polyps. Despite the advantages of these procedures, there are accompanying downsides: they are expensive and uncomfortable, and they carry a risk of complications. Sigmoidoscopy, by definition, is limited to the sigmoid colon and below, colonoscopy is a relatively expensive procedure, and both share the risk of possible bowel perforation and hemorrhaging. Double-contrast barium enema (DCBE) enables detection of lesions better than FOBT, and almost as well as colonoscopy, but it may be limited in evaluating the winding rectosigmoid region. Another method of colon cancer diagnosis is the detection of carcinoembryonic antigen (CEA) in a blood sample from a subject, which, when present at high levels, may indicate the presence of advanced colon cancer. But CEA levels may also be abnormally high when no cancer is present. Thus, this test is not selective for colon cancer, which limits the test's value as an accurate and reliable diagnostic tool. In addition, elevated CEA levels are not detectable until late-stage colon cancer, when the cure rate is low, treatment options limited, and patient prognosis poor.
Several classification systems have been devised to stage the extent of colorectal cancer, including the Dukes' system and the more detailed International Union against Cancer-American Joint Committee on Cancer TNM staging system, which is considered by many in the field to be a more useful staging system. These most widely used staging systems generally use at least one of the following characteristics for staging: the extent of tumor penetration into the colon wall, with greater penetration generally correlating with a more dangerous tumor; the extent of invasion of the tumor through the colon wall and into other neighboring tissues, with greater invasion generally correlating with a more dangerous tumor; the extent of invasion of the tumor into the regional lymph nodes, with greater invasion generally correlating with a more dangerous tumor; and the extent of metastatic invasion into more distant tissues, such as the liver, with greater metastatic invasion generally correlating with a more dangerous disease state. "Dukes A" and "Dukes B" colon cancers are neoplasias that have invaded into the wall of the colon but have not spread into other tissues. Dukes A colon cancers are cancers that have not invaded beyond the submucosa. Dukes B colon cancers are subdivided into two groups: Dukes B1 and Dukes B2. "Dukes B1" colon cancers are neoplasias that have invaded up to but not through the muscularis propria. Dukes B2 colon cancers are cancers that have breached completely through the muscularis propria. Over a five year period, patients with Dukes A cancer who receive surgical treatment (i.e. removal of the affected tissue) have a greater than 90% survival rate. Over the same period, patients with Dukes B1 and Dukes B2 cancer receiving surgical treatment have a survival rate of about 85% and 75% respectively. Dukes A, B1 and B2 cancers are also referred to as T1, T2 and T3-T4 cancers, respectively.
"Dukes C" colon cancers are cancers that have spread to the regional lymph nodes, such as the lymph nodes of the gut. Patients with Dukes C cancer who receive surgical treatment alone have a 35% survival rate over a five year period, but this survival rate is increased to 60% in patients that receive chemotherapy. "Dukes D" colon cancers are cancers that have metastasized to other organs. The liver is the most common organ in which metastatic colon cancer is found. Patients with Dukes D colon cancer have a survival rate of less than 5% over a five year period, regardless of the treatment regimen. The TNM system, which is used for either clinical or pathological staging, is divided into four stages, each of which evaluates the extent of cancer growth with respect to primary tumor (T), regional lymph nodes (N), and distant metastasis (M). The system focuses on the extent of tumor invasion into the intestinal wall, invasion of adjacent structures, the number of regional lymph nodes that have been affected, and whether distant metastasis has occurred. Stage 0 is characterized by in situ carcinoma (Tis), in which the cancer cells are located inside the glandular basement membrane (intraepithelial) or lamina propria (intramucosal). In this stage, the cancer has not spread to the regional lymph nodes (N0), and there is no distant metastasis (M0). In stage 1, there is still no spread of the cancer to the regional lymph nodes and no distant metastasis, but the tumor has invaded the submucosa (T1) or has progressed further to invade the muscularis propria (T2). Stage 2 also involves no spread of the cancer to the regional lymph nodes and no distant metastasis, but the tumor has invaded the subserosa, or the nonperitonealized pericolic or perirectal tissues (T3), or has progressed to invade other organs or structures, and/or has perforated the visceral peritoneum (T4).
Stage 3 is characterized by any of the T substages, no distant metastasis, and either metastasis in 1 to 3 regional lymph nodes (N1) or metastasis in four or more regional lymph nodes (N2). Lastly, stage 4 involves any of the T or N substages, as well as distant metastasis. Currently, pathological staging of colon cancer is preferable over clinical staging as pathological staging provides a more accurate prognosis. Pathological staging typically involves examination of the resected colon section, along with surgical examination of the abdominal cavity.
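The stage grouping described above can be written as a small lookup. The following is a minimal sketch only, assuming the simplified grouping given in this background section (stages 0 to 4 keyed on the T, N and M categories); the function name `tnm_stage` and the string encoding of the categories are illustrative assumptions, not part of any staging standard or of the invention.

```python
# Minimal sketch of the stage grouping described in the text above.
# The function name and the string encoding ("Tis", "T1", "N0", "M1", ...)
# are hypothetical and chosen only for illustration.

VALID_T = {"TIS", "T1", "T2", "T3", "T4"}
VALID_N = {"N0", "N1", "N2"}
VALID_M = {"M0", "M1"}


def tnm_stage(t: str, n: str, m: str) -> int:
    """Return the stage (0-4) for the given T, N and M categories."""
    t, n, m = t.strip().upper(), n.strip().upper(), m.strip().upper()
    if t not in VALID_T or n not in VALID_N or m not in VALID_M:
        raise ValueError(f"unrecognized TNM categories: {t!r}, {n!r}, {m!r}")
    if m == "M1":
        return 4  # any T, any N, distant metastasis
    if n in {"N1", "N2"}:
        return 3  # any T, regional lymph node metastasis, no distant metastasis
    if t in {"T3", "T4"}:
        return 2  # deeper invasion, no nodal spread, no distant metastasis
    if t in {"T1", "T2"}:
        return 1  # submucosa or muscularis propria, no spread
    return 0      # Tis, N0, M0 (in situ carcinoma)
```

For example, `tnm_stage("T3", "N0", "M0")` falls through the metastasis and lymph node checks and is assigned stage 2, matching the description of stage 2 above.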
SUMMARY OF THE INVENTION
The background art does not teach or suggest markers for colon cancer that are sufficiently sensitive and/or accurate, alone or in combination. From the foregoing, it is clear that procedures used for detecting, diagnosing, monitoring, staging, prognosticating, and preventing the recurrence of colorectal cancer are of critical importance to the outcome of the patient. Moreover, current procedures, while helpful in each of these analyses, are limited by their specificity, sensitivity, invasiveness, and/or their cost. It would therefore be desirable to provide more sensitive and accurate methods and reagents for the early diagnosis, staging, prognosis, monitoring, and treatment of diseases associated with colon cancer, or to indicate a predisposition to such for preventative measures, as well as to determine whether or not such cancer has metastasized and for monitoring the progress of colon cancer in a human which has not metastasized for the onset of metastasis. The present invention overcomes the deficiencies of the background art by providing novel markers for colon cancer that are both sensitive and accurate. Furthermore, these markers are able to distinguish between different stages of colon cancer, such as adenocarcinoma (mucinous or signet ring cell originating); leiomyosarcomas; carcinoid. Furthermore, at least some of these markers are able to distinguish, alone or in combination, between colon cancer and non-cancerous polyps. These markers are overexpressed in colon cancer specifically, as opposed to normal colon tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of colon cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between colon cancer and non-cancerous states.
According to preferred embodiments of the present invention, examples of suitable biological samples include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, colon tissue or mucous and any human organ or tissue. In a preferred embodiment, the biological sample comprises colon tissue and/or a serum sample and/or a urine sample and/or a stool sample and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay. Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/SignalP/background/prediction.php) for signal peptide prediction. The terms "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor.
In some cases, for the manual inspection of cellular localization prediction, inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik; (2004) "Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis." Cell Biology International 2004;28(3):171-8.], which predicts protein localization based on various parameters including protein domains (e.g., prediction of trans-membranous regions and localization thereof within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain organelle (such as nuclear localization signal, NLS, mitochondria localization signal), signal peptide and anchor modeling and using unique domains from Pfam that are specific to a single compartment. Information is given in the text with regard to SNPs (single nucleotide polymorphisms). A description of the abbreviations is as follows. "T -> C", for example, means that the SNP results in a change at the position given in the table from T to C. Similarly, "M -> Q", for example, means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk at the right hand side (*). As part of the description of an SNP, a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP.
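The SNP abbreviations described above follow a regular pattern and can be read mechanically. The following is a minimal sketch, assuming a description consists only of a single reference letter, an arrow, and an optional right hand side (a letter, an asterisk, a hyphen, or nothing); the function name `parse_snp` and the returned dictionary layout are hypothetical, and a trailing parenthesized comment is not handled.

```python
import re

# One reference letter, an arrow (tolerating stray spaces such as "- >"),
# then an optional single character: a letter, "*" (stop codon) or "-" (frameshift).
_SNP_PATTERN = re.compile(r"^\s*([A-Za-z])\s*-\s*>\s*([A-Za-z*-]?)\s*$")


def parse_snp(description: str) -> dict:
    """Classify an SNP description such as 'T -> C', 'M -> *' or 'A -> -'."""
    match = _SNP_PATTERN.match(description)
    if match is None:
        raise ValueError(f"not an SNP description: {description!r}")
    ref, alt = match.group(1).upper(), match.group(2).upper()
    if alt in ("", "-"):
        # A blank or a hyphen on the right hand side denotes a frameshift.
        return {"ref": ref, "alt": None, "kind": "frameshift"}
    if alt == "*":
        # An asterisk on the right hand side denotes a stop codon.
        return {"ref": ref, "alt": "*", "kind": "stop"}
    return {"ref": ref, "alt": alt, "kind": "substitution"}
```

Whether a substitution refers to a nucleotide or an amino acid cannot be told from the abbreviation alone; it depends on the table in which the description appears.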
An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, as follows: FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number. In the table of the amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on amino acid sequence. SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker. Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein. Information given in the text with regard to the Homology to the known proteins was determined by Smith-Waterman version 5.1.2 using special (non default) parameters as follows:
-model=sw.model
-GAPEXT=0
-GAPOP=100.0
-MATRIX=blosum100
Information is given with regard to overexpression of a cluster in cancer based on ESTs.
A key to the p values with regard to the analysis of such overexpression is as follows:
- library-based statistics: P-value without including the level of expression in cell-lines (P1)
- library-based statistics: P-value including the level of expression in cell-lines (P2)
- EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
- EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
- EST clone statistics: P-value including the level of expression in cell-lines (SP2)
- EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)
Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer. Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in Materials and Experimental Procedures section herein; and those results from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number.
Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx). The probe names follow the Affymetrix naming convention. The data is available from NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1 207-210). The dataset (including results) is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133 for the Series GSE1133 database (published on March 2004); a reference to these results is as follows: Su et al (Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 09). A list of probes is given below.
>M85491_0_0_25999
GACATCTTTGCATATCATGTCAGAGCTATAACATCATTGTGGAGAAGCTC
>M85491_0_14_0
GTCATGAAAATCAACACCGAGGTGCGGAGCTTCGGACCTGTGTCCCGCAG
>H53626_0_16_0
ATGCGGGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTC
>H53626_0_0_8391
GGGTCTGGGGTGCTCTCCTGGTCTTTGTGTCGGCGTTCCCCTCCCTACCT
>HSENA78_0_1_0
TGAAGAGTGTGAGGAAAACCTATGTTTGCCGCTTAAGCTTTCAGCTCAGC
>HUMGROG5_0_0_16626
GCAGAAACTTTGCAGTAACACCTTCAGTGAGTTCAAGGCTAGGATCCCTG
>R00299_0_8_0
CCAAGGCTCGTCTGCGCACCTTGTGTCTTGTAGGGTATGGTATGTGGGAC
>S67314_0_0_741
CACAGAGCCAGGATGTTCTTCTGACCTCAGTATCTACTCCAGCTCCAGCT
>S67314_0_0_744
TGGCATGCTGGAACATGGACTCTAGCTAGCAAGAAGGGCTCAAGGAGGTG
>Z44808_0_8_0
AAAAGCATGAGTTTCTGACCAGCGTTCTGGACGCGCTGTCCACGGACATG
>Z44808_0_0_72347
ATGTTCTTAGGAGGCAAGCCAGGAGAAGCCGGGTCTGACTTTTCAGCTCA
>Z44808_0_0_72349
TCCTCCAGACCCAAAGCCACAACCCATCGCAAGTCAAGAACACTTTCCAG
>Z25299_0_3_0
AACTCTGGCACCTTGGGCTGTGGAAGGCTCTGGAAAGTCCTTCAAAGCTG
>HUMCA1XIA_0_0_14909
GCTGCAATCTAAGTTTCGGAATACTTATACCACTCCAGAAATAATCCTCG
>HUMCA1XIA_0_18_0
TTCAGAACTGTTAACATCGCTGACGGGAAGTGGCATCGGGTAGCAATCAG
>HSS100PCB_0_0_12280
CTCAAAATGAAACTCCCTCTCGCAGAGCACAATTCCAATTCGCTCTAAAA
>HUMPHOSLIP_0_0_18458
AAGGAAGCAGGACCAGTGGATGTGAGGCGTGGTCGAAGAACAACAGAAAG
>HUMPHOSLIP_0_0_18487
ACAGGGGCCAGATGGTGACCCATGACCCAGCCTAAAAGGCAGCCAGAGGG
>D11853_0_0_0
GAGGCCCCTGGGTGGGAATGGGGACAGGAATTGACAGTGGAAGGGGTTCT
>D11853_0_0_3085
TGACTCCCTACATACTCCAGGACTAGCTTAGGTCCCAACCCAATAGTTCC
>D11853_0_0_3082
TGGTCCCCATGTGATTCTCCGAGGATCCTGAGGGTCGTGGTTTATGGAGA
>M77903_0_0_21402
ACGTGATGGTTGGAACGCGTACCTTAGAGCTTCCAGTTCCGTCTTAGGAC
>AA583399_0_12_0
ATCCCCACTGAACCCAGTGCTTTCACCAGCCATATTAGCTCCCACTCACC
>AA583399_0_0_1681
CACCGCATGCTGCCAATCTGATGGTGGAGACAGAACAGCAGTCCCGGATG
>AA583399_0_1_1687
TTTCCACACTCAGTGCCACGAAGTGCAGCTCTAAGCTGGGGATTTCTGTG
>HUMCACH1A_0_12_0
ACCCAGCTCCATGTGCGTTCTCAGGGAATGGACGCCAGTGTACTGCCAAT
>HUMCACH1A_0_3_14917
AGAGAATATCACTCCGATGGTCGGTTTCTGACTGTCACGCTAAGGGCAAC
>HUMCACH1A_0_0_14922
GAACACAGAGAACGTCAGCGGTGAAGGCGAGAACCGAGGCTGCTGTGGAA
>HUMCACH1A_0_0_14913
GACTCAGGAGATGAACAGCTCCCAACTATTTGCCGGGAAGACCCAGAGAT
>HUMCACH1A_0_0_14924
GGCCCAGCATTGGGAACCTTGAGCATGTGTCTGAAAATGGGCATCATTCT
>HUMCEA_0_0_96
CAAGAGGGGTTTGGCTGAGACTTTAGGATTGTGATTCAGCTTAGAGGGAC
>HUMCEA_0_0_15183
CCTGGTGGGAGCCCATGAGAAGCGAGTTCTCTGTGCAACGGACTTAGTAA
>HUMCEA_0_0_15182
GCTCCCTGGAGCATCAGCATCATATTCTGGGGTGGAGTCTATCTGGTTCT
>HUMCEA_0_0_15168
TCCTGCCTGTCACCTGAAGTTCTAGATCATTCCCTGGACTCCACTCTATC
>HUMCEA_0_0_15180
TTTAACACAGGATTGGGACAGGATTCAGAGGGACACTGTGGCCCTTCTAC
>M78035_0_0_21693
CCATCCACATTTATGGAAACACTTGCTGTATATCTGGTGATTTACGTGTG
>M78035_0_0_21691
CCTTTCACCACTGTGTGCAAGCGAATACACGCGGAACAATCCTAGTGAAT
>M78035_0_1_21707
TTTGCTAGAAATCTGGTGTGGTGCAGGAGCGACTCCAGGATTCACTCTGT
>T23657_0_18_0
TCCGTGACCCTCAGAGATCCTTTGCCCTGGGAATCCAGTGGATTGTAGTT
>T51958_0_0_50903
CCCATGGTGGCCAGAGTGTCAGGTCTCATCGTGACGCTCTTGTCCTCCTC
>T51958_0_0_50916
GGGGCTGTGCCCAGTCCCCCTGTCAGACCCTCAATGACTGAGGCCTGGGG
>Z17877_0_4_0
ACTTTGCACTGGAACTTACAACACCCGAGCAAGGACGCGACTCTCCCGAC
>HSHCGI_0_0_10611
GCCTACTGATTCATCCACATACAATTCTCAGCGTATATCCAAATGCAGTC
>HSHCGI_0_0_10620
GGACCTCTAAGTCTACAGGTGGTCAAAATGCTGTATCCACCCAATTCCAC
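A probe list in the above form (a name line beginning with ">" followed by the probe sequence) can be read programmatically. The following is a minimal sketch, assuming that the cluster name is the portion of the probe name before the first underscore, consistent with the statement that the probe name begins with the name of the cluster; the function names `parse_probes` and `cluster_of` are hypothetical.

```python
def parse_probes(text: str) -> dict:
    """Read '>name' / sequence records into a {probe_name: sequence} mapping."""
    probes = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            probes[name] = ""
        elif name is None:
            raise ValueError("sequence line found before any probe name")
        else:
            # Sequences may span several lines; concatenate them.
            probes[name] += line.upper()
    return probes


def cluster_of(probe_name: str) -> str:
    """Return the cluster (gene) name, assumed to precede the first underscore."""
    return probe_name.split("_", 1)[0]


EXAMPLE = """\
>M85491_0_0_25999
GACATCTTTGCATATCATGTCAGAGCTATAACATCATTGTGGAGAAGCTC
>M85491_0_14_0
GTCATGAAAATCAACACCGAGGTGCGGAGCTTCGGACCTGTGTCCCGCAG
"""

if __name__ == "__main__":
    for probe_name, sequence in parse_probes(EXAMPLE).items():
        print(cluster_of(probe_name), probe_name, len(sequence))
```

Grouping the parsed probes by `cluster_of` gives the set of probes designed for each cluster.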
The following list of abbreviations for tissues was used in the TAA histograms. The term "TAA" stands for "Tumor Associated Antigen", and the TAA histograms, given in the text, represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below: "BONE" for "bone"; "COL" for "colon"; "EPI" for "epithelial"; "GEN" for "general"; "LIVER" for "liver"; "LUN" for "lung"; "LYMPH" for "lymph nodes"; "MARROW" for "bone marrow"; "OVA" for "ovary"; "PANCREAS" for "pancreas"; "PRO" for "prostate"; "STOMACH" for "stomach"; "TCELL" for "T cells"; "THYROID" for "Thyroid"; "MAM" for "breast"; "BRAIN" for "brain"; "UTERUS" for "uterus"; "SKIN" for "skin"; "KIDNEY" for "kidney"; "MUSCLE" for "muscle"; "ADREN" for "adrenal"; "HEAD" for "head and neck"; "BLADDER" for "bladder".
It should be noted that the terms "segment", "seg" and "node" are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below. Optionally and preferably, they are examples of oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use. As used herein the phrase "colon cancer" refers to cancers of the colon or colorectal cancers. The term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having colon cancer as compared to a comparable sample taken from subjects who do not have colon cancer. The phrase "differentially present" refers to differences in the quantity of a marker present in a sample taken from patients having colon cancer as compared to a comparable sample taken from patients who do not have colon cancer. For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.
As used herein the phrase "diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis. As used herein the phrase "diagnosing" refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term "detecting" may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below. As used herein, the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention. Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).
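The definitions of sensitivity and specificity given above translate directly into simple ratios over the four outcome counts of an assay. The following is a minimal sketch under those definitions; the function names and the example counts are illustrative assumptions and do not represent measured data.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of diseased individuals who test positive."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no diseased individuals in the sample")
    return true_positives / diseased


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of individuals without the disease who test positive."""
    non_diseased = false_positives + true_negatives
    if non_diseased == 0:
        raise ValueError("no non-diseased individuals in the sample")
    return false_positives / non_diseased


def specificity(true_negatives: int, false_positives: int) -> float:
    """1 minus the false positive rate, as defined in the text above."""
    return 1.0 - false_positive_rate(false_positives, true_negatives)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print("sensitivity:", sensitivity(true_positives=90, false_negatives=10))
    print("specificity:", specificity(true_negatives=80, false_positives=20))
```

Multiplying either value by 100 gives the percentage form used in the definition of sensitivity.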
Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made. Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside, to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues. A "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of colon cancer. A test amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals). A "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a patient with colon cancer or a person without colon cancer. A control amount can be either in absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals). "Detect" refers to identifying the presence, absence or amount of the object to be detected. A "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target.
The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize. The binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule. The binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry. Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads.
Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture. "Immunoassay" is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen. The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NOs: 1 and 2. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 89, 90, 91, 92, 93, 94, 95, 96, 97, 98 and 99. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 534 and 535. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NOs: 3, 4, 5 and 6. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 100, 101, 102, 103, 104, 105, 106 and 107. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 536, 537, 538 and 539. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 7. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121 and 122. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO. 540.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript selected from the group consisting of SEQ ID NO. 8 and 9. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment selected from the group consisting of SEQ ID NOs: 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141 and 142. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 541, 542. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 10. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 143, 144, 145, 146, 147, 148 and 149. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 543. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 11, 12, 13 and 14. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166 and 167. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 544, 545, 546 and 547. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 15. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183 and 184. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO. 548. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 16. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195 and 196. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 549. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 17 and 18. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210 and 211. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 550 and 551. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 19, 20, 21 and 22. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 212, 213, 214, 215, 216, 217, 218 and 219. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 552, 553, 554 and 555. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 23, 24, 25, 26 and 27. 
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239 and 240. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 556, 557, 558 and 559. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 28, 29, 30, 31 and 32. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 241, 242, 243, 244, 245, 246, 247, 248, 249, 250 and 251. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 560, 561, 562 and 563. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 33, 34, and 35. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 267, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 564, 565, and 566. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 36, 37, 38, 39, 40, 41, 42 and 43. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305 and 306. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 567, 568, 569, 570, 571, 572, 573 and 574. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 44. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 307, 308, 309, 310, 311, 312, 313, 314, 315 and 316. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO. 575. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 45, 46, 47 and 48. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 317, 318, 319, 320, 321, 322, 323,
324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361 and 362. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 576, 577, 578 and 579. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 49. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 363, 364 and 365. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO. 580. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 50, 51, 52, 53, 54, 55 and 56. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417 and 418. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 581, 582, 583, 584, 585, 586 and 587. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73 and 74. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448 and 449. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601 and 602. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 75, 76, 77, 78, 79 and 80. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474 and 475. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 603, 604, 605, 606 and 607. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 81, 82, 83 and 84.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503 and 504. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs 608, 609, 610 and 611. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a transcript SEQ ID NO. 85, 86, 87 and 88. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a segment SEQ ID NOs: 505-532 and 533. According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 612, 613, 614 and 615. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encodings from clusters M85491, T10888, H14624, H53626,
HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCA1XIA, HSS100PCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 608, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of SEQ ID NO. 608, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 208 - 214 of SEQ ID NO. 608, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 608, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 208 - 214 in SEQ ID
NO. 608. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 609, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of SEQ ID NO. 609. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 610, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 181 of SSRA_HUMAN, which also corresponds to amino acids 1 - 181 of SEQ ID NO. 610, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 182 - 192 of SEQ ID NO. 610, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 610, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 182 - 192 in SEQ ID NO. 610. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 611, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 93 of SSRA_HUMAN, which also corresponds to amino acids 1 - 93 of SEQ ID NO. 611, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 94 - 104 of SEQ ID NO.
611, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 611, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 94 - 104 in SEQ ID NO. 611. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 604, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 1 - 110 of SEQ ID NO. 604, and a second amino acid sequence being at least 90 % homologous to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of SEQ ID NO. 604, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 604, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 110 of
SEQ ID NO. 604. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 604, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of SEQ ID NO. 604, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 84 - 222 of SEQ ID NO. 604, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 604, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 84 - 222 in SEQ ID NO. 604.
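The head/tail structure recited above (a first segment at least 90 % homologous to a known protein, followed contiguously by a second segment at least 70% homologous to the unique tail of the variant) can be sketched as a simple check. This is only an illustrative simplification: it assumes "homology" is approximated by percent identity over an ungapped, equal-length comparison, whereas an alignment tool would normally be used. The function names and the toy sequences are hypothetical and are not any sequence of the present listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based inclusive coordinates."""
    return seq[start - 1:end]


def is_chimeric_match(candidate: str, known: str, variant: str, head_end: int,
                      head_min: float = 90.0, tail_min: float = 70.0) -> bool:
    """Check a contiguous head + tail against the two identity thresholds.

    The head (residues 1..head_end) is compared with the known protein and
    the tail (head_end+1..end) with the corresponding tail of the variant.
    """
    if len(candidate) != len(variant):
        return False
    head = segment(candidate, 1, head_end)
    tail = segment(candidate, head_end + 1, len(candidate))
    head_ok = percent_identity(head, segment(known, 1, head_end)) >= head_min
    tail_ok = percent_identity(tail, segment(variant, head_end + 1, len(variant))) >= tail_min
    return head_ok and tail_ok


# Toy sequences (hypothetical): the variant shares residues 1-10 with the
# known protein and carries a unique 6-residue tail.
known = "MKTAYIAKQRQISFVK"
variant = known[:10] + "GGSLPW"

close_candidate = "MKTAYIAKQRGGSLPA"   # tail differs at one of six positions
far_candidate = "MKTAYIAKQRAAALPA"     # tail differs at four of six positions

print(is_chimeric_match(close_candidate, known, variant, head_end=10))
print(is_chimeric_match(far_candidate, known, variant, head_end=10))
```

Because the head and tail are sliced from one candidate string, contiguity and sequential order hold by construction; only the two identity thresholds are actually tested.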
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 604, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of SEQ ID NO. 604, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 84 - 222 of SEQ ID NO. 604, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 604, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 84 - 222 in SEQ ID NO. 604. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 604, comprising a first amino acid sequence being at least 90 % homologous to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of SEQ ID NO. 604, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 84 - 222 of SEQ ID NO. 604, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
604, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 84 - 222 in SEQ ID NO. 604. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of SEQ ID NO. 605, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 65 - 93 of SEQ ID NO. 605, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of SEQ ID NO. 605, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide corresponding to amino acids 65 - 93 of SEQ ID NO. 605, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of SEQ ID NO. 605, a second amino acid sequence being at least 90 % homologous to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of SEQ ID NO. 605, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 65 - 93 of SEQ ID NO. 605, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 5 of SEQ ID NO. 605. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of SEQ ID NO. 605, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 65 - 93 of SEQ ID NO. 605, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 605, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 65 - 93 in SEQ ID NO. 605. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 605, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 606, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 84 of SEQ ID NO. 606, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 606, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 84 in SEQ ID NO. 606. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 607, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 607, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO. 607, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 607, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 607, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 607, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO.
607, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 607, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 607, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 5 of SEQ ID NO. 607, a second amino acid sequence being at least 90 % homologous to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of SEQ ID NO. 607, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO. 607, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 607, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 5 of SEQ ID NO. 607. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
607, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 607, comprising a first amino acid sequence being at least 90 % homologous to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of SEQ ID NO. 607, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 64 - 90 of SEQ ID NO. 607, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 607, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 64 - 90 in SEQ ID NO. 607. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 588, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 588, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 588, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
588, and a third amino acid sequence being at least 90 % homologous to amino acids 189 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 356 of SEQ ID NO. 588, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 588, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 588. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 588, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 588, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO. 588, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 356 of SEQ ID NO. 588, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO.
588, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 588.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 588, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 356 in SEQ ID NO. 588.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 588, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 588, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 588, and a second amino acid sequence being at least 90% homologous to amino acids 130 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 356 of SEQ ID NO. 588, wherein said first amino acid sequence, bridging amino acid and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 589, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to amino acids 1 - 26 of SEQ ID NO. 589, a second amino acid sequence being at least 90% homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 589, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
589, a third amino acid sequence being at least 90% homologous to amino acids 189 - 297 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 311 of SEQ ID NO. 589, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 589, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 589, a second amino acid sequence being at least 90% homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO.
589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 589, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 589, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 589, a second amino acid sequence being at least 90% homologous to amino acids 130 - 311 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 311 of SEQ ID NO. 589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO.
589, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 589, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 311 of Q9UJZ1, which also corresponds to amino acids 1 - 311 of SEQ ID NO. 589, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
589, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 589, a second amino acid sequence being at least 90% homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO. 589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of
SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 589, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 589, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 589, a second amino acid sequence being at least 90% homologous to amino acids 130 - 311 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 311 of SEQ ID NO. 589, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
589, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 311 of Q9UJZ1, which also corresponds to amino acids 1 - 311 of SEQ ID NO. 589, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 312 - 315 of SEQ ID NO. 589, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 589, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 312 - 315 in SEQ ID NO. 589.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 590, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 590, a second amino acid sequence being at least 90% homologous to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 590, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 590, a third amino acid sequence being at least 90% homologous to amino acids 189 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 268 of SEQ ID NO. 590, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO.
590, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 590, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 590.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 590, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ
ID NO. 590.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 590, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 590, and a second amino acid sequence being at least 90% homologous to amino acids 1 - 181 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 590, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 590.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 590, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 590, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 590, a second amino acid sequence being at least 90% homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO. 590, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO.
590, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 590, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ ID NO. 590.
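The chimeric embodiments above describe each variant as a run of segments that are "contiguous and in a sequential order", where some segments map onto a known protein and others (heads, tails, bridging amino acids) do not. The following Python sketch is one way to sanity-check such a description: it confirms that the segments tile the variant without gaps or overlaps and that each mapped segment has the same length in both sequences. The `Segment` type, the `check_tiling` helper, and the total length of 290 residues for SEQ ID NO. 590 (inferred here from its tail ending at amino acid 290) are illustrative assumptions, not part of the specification.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """One segment of a chimeric variant (hypothetical helper type).

    Coordinates are 1-based and inclusive, as in the text.
    source_* is None for segments with no counterpart in the known
    protein (novel head or tail, or a bridging amino acid).
    """
    target_start: int
    target_end: int
    source_start: Optional[int] = None
    source_end: Optional[int] = None


def check_tiling(segments: List[Segment], total_length: int) -> List[str]:
    """Return a list of problems; an empty list means the segments tile
    positions 1..total_length contiguously and mapped lengths agree."""
    problems = []
    expected_start = 1
    for i, seg in enumerate(segments, start=1):
        if seg.target_end < seg.target_start:
            problems.append(f"segment {i}: end precedes start")
        if seg.target_start != expected_start:
            problems.append(
                f"segment {i}: starts at {seg.target_start}, expected {expected_start}"
            )
        if seg.source_start is not None and seg.source_end is not None:
            target_len = seg.target_end - seg.target_start
            source_len = seg.source_end - seg.source_start
            if target_len != source_len:
                problems.append(f"segment {i}: mapped lengths differ")
        expected_start = seg.target_end + 1
    if expected_start != total_length + 1:
        problems.append("segments do not cover the full sequence")
    return problems


# SEQ ID NO. 590 described against SEQ ID NO. 638 (coordinates from the text).
SEQ590_VS_638 = [
    Segment(1, 128, 1, 128),      # first sequence
    Segment(129, 129),            # bridging amino acid L
    Segment(130, 268, 130, 268),  # second sequence
    Segment(269, 290),            # tail with no counterpart in SEQ ID NO. 638
]

# SEQ ID NO. 590 described against SEQ ID NO. 639 (coordinates from the text).
SEQ590_VS_639 = [
    Segment(1, 26),               # head
    Segment(27, 201, 13, 187),    # second sequence
    Segment(202, 202),            # bridging amino acid A
    Segment(203, 268, 189, 254),  # third sequence
    Segment(269, 290),            # tail
]

if __name__ == "__main__":
    # Assumed total length of 290, taken from the end of the tail segment.
    print(check_tiling(SEQ590_VS_638, 290))
    print(check_tiling(SEQ590_VS_639, 290))
```

Both descriptions pass the check under the assumed length, which is consistent with the two alignments describing the same 290-residue variant from different reference proteins.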
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 590, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of SEQ ID NO. 590, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 290 of SEQ ID NO. 590, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 590, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 290 in SEQ ID NO. 590.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 591, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 591, a second amino acid sequence being at least 90% homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 591, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 591, a third amino acid sequence being at least 90% homologous to amino acids 189 - 226 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 240 of SEQ ID NO.
591, a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, and a fifth amino acid sequence being at least 90% homologous to amino acids 227 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 282 - 397 of SEQ ID NO. 591, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 591, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding amino acids 241 - 281 corresponding to SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 591, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 591, a second amino acid sequence being at least 90% homologous to amino acids 1 - 131 of SEQ ID NO.
640, which also corresponds to amino acids 110 - 240 of SEQ ID NO. 591, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, a fourth amino acid sequence being at least 90% homologous to amino acids 132 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 282 - 309 of SEQ ID NO. 591, and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 397 of SEQ ID NO. 591, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 591, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
591, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 397 in SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 591, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 591, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 591, a second amino acid sequence being at least 90% homologous to amino acids 130 - 240 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 240 of SEQ ID NO. 591, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, and a fourth amino acid sequence being at least 90% homologous to amino acids 241 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 282 - 397 of SEQ ID NO. 591, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
591, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of SEQ ID NO. 591, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 591, and a third amino acid sequence being at least 90% homologous to amino acids 241 - 356 of Q9UJZ1, which also corresponds to amino acids 282 - 397 of SEQ ID NO. 591, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 591, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 591.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 592, a second amino acid sequence being at least 90% homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 592, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO. 592, a third amino acid sequence being at least 90% homologous to amino acids 189 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 268 of SEQ ID NO.
592, and a fourth amino acid sequence being at least 90% homologous to amino acids 298 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 592, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 592.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 592, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 592, a second amino acid sequence being at least 90% homologous to amino acids 1 - 159 of SEQ ID NO.
640, which also corresponds to amino acids 110 - 268 of SEQ ID NO. 592, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 592, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 592.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 592, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 313 in SEQ ID NO. 592.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 592, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 592, a second amino acid sequence being at least 90% homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO. 592, and a third amino acid sequence being at least 90% homologous to amino acids 312 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 269 - 313 of SEQ ID NO.
592, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 592, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of SEQ ID NO. 592, and a second amino acid sequence being at least 90% homologous to amino acids 312 - 356 of Q9UJZ1, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
592, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 592, a second amino acid sequence being at least 90% homologous to amino acids 1 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 268 of SEQ ID NO. 592, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 592, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 592.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 592, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 269 - 313 in SEQ ID NO. 592. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 592, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 592, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 268 of SEQ ID NO. 592, and a third amino acid sequence being at least 90 % homologous to amino acids 312 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 592, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2.
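The edge-portion definition above can be read as a sliding window of length n that always contains both residues flanking the junction (here positions 268 and 269 of SEQ ID NO. 592). The following is a minimal Python sketch of that reading, assuming 1-based inclusive numbering. The function name `edge_windows` and the optional clipping to the sequence length are illustrative assumptions and are not part of the specification.

```python
def edge_windows(junction_left, n, seq_len=None):
    """Enumerate (start, end) ranges of length n that span a junction.

    junction_left is the last residue before the junction (e.g. 268);
    the first residue after it is junction_left + 1 (e.g. 269).
    Positions are 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        # Assumption: discard windows that fall outside the sequence.
        if start < 1:
            continue
        if seq_len is not None and end > seq_len:
            continue
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Junction 268/269 of SEQ ID NO. 592 (313 residues per the ranges above), n = 10.
    for start, end in edge_windows(268, 10, seq_len=313):
        print(start, end)
```

Every range returned has exactly n residues, because (269 + (n-2) - x) - (268 - x) + 1 = n, and every range includes both junction residues.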
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 592, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of SEQ ID NO. 592, and a second amino acid sequence being at least 90 % homologous to amino acids 312 - 356 of Q9UJZ1, which also corresponds to amino acids 269 - 313 of SEQ ID NO. 592, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 592, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 593, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 593, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 187 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 201 of SEQ ID NO. 593, a bridging amino acid A corresponding to amino acid 202 of SEQ ID NO.
593, a third amino acid sequence being at least 90 % homologous to amino acids 189 - 226 of SEQ ID NO. 639, which also corresponds to amino acids 203 - 240 of SEQ ID NO. 593, a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, a fifth amino acid sequence being at least 90 % homologous to amino acids 227 - 254 of SEQ ID NO. 639, which also corresponds to amino acids 282 - 309 of SEQ ID NO. 593, and a sixth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence, fourth amino acid sequence, fifth amino acid sequence and sixth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 593, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 593, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 593.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 593, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 593, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 593, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 131 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 240 of SEQ ID NO. 593, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, and a fourth amino acid sequence being at least 90 % homologous to amino acids 132 - 181 of SEQ ID NO. 640, which also corresponds to amino acids 282 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 593, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 593.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 593, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 593, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 593, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 593, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 240 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 240 of SEQ ID NO. 593, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, a fourth amino acid sequence being at least 90 % homologous to amino acids 241 - 268 of SEQ ID NO. 638, which also corresponds to amino acids 282 - 309 of SEQ ID NO. 593, and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.
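The requirement that the listed sequences be "contiguous and in a sequential order" can be checked mechanically: the variant ranges should tile the polypeptide without gaps or overlaps, and each variant range paired with a reference range should have the same length. The following is a minimal Python sketch of such a check, using the ranges stated above for SEQ ID NO. 593 against SEQ ID NO. 638. The names `Segment` and `check_segments` are hypothetical, and the total length of 331 is an assumption taken from the last listed range.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    label: str
    variant: Tuple[int, int]              # 1-based inclusive range in the variant
    reference: Optional[Tuple[int, int]]  # paired reference range, or None if unpaired


def check_segments(segments: List[Segment], variant_length: int) -> List[str]:
    """Return a list of problems; an empty list means the segments tile cleanly."""
    problems = []
    expected_start = 1
    for seg in segments:
        v_start, v_end = seg.variant
        if v_start != expected_start:
            problems.append(f"{seg.label}: starts at {v_start}, expected {expected_start}")
        if seg.reference is not None:
            r_start, r_end = seg.reference
            if (r_end - r_start) != (v_end - v_start):
                problems.append(f"{seg.label}: variant and reference lengths differ")
        expected_start = v_end + 1
    if expected_start - 1 != variant_length:
        problems.append(f"segments end at {expected_start - 1}, expected {variant_length}")
    return problems


# Ranges as stated in the text for SEQ ID NO. 593 versus SEQ ID NO. 638.
SEGMENTS_593_VS_638 = [
    Segment("first", (1, 128), (1, 128)),
    Segment("bridging L", (129, 129), None),
    Segment("second", (130, 240), (130, 240)),
    Segment("third (unique)", (241, 281), None),
    Segment("fourth", (282, 309), (241, 268)),
    Segment("fifth (unique tail)", (310, 331), None),
]

if __name__ == "__main__":
    print(check_segments(SEGMENTS_593_VS_638, 331))
```

The same check can be applied to any of the other segment lists in this section by substituting the stated ranges.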
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 593, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 593, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 593, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of SEQ ID NO. 593, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 241 - 281 of SEQ ID NO. 593, a third amino acid sequence being at least 90 % homologous to amino acids 241 - 268 of Q9UJZ1, which also corresponds to amino acids 282 - 309 of SEQ ID NO. 593, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 310 - 331 of SEQ ID NO. 593, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 593, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 241 - 281 corresponding to SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 593, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 310 - 331 in SEQ ID NO. 593. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 594, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 594, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 134 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 148 of SEQ ID NO. 594, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 149 - 183 of SEQ ID NO. 594, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO.
594, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of
SEQ ID NO. 594. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 594, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 594, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 594, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 594, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 148 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 148 of SEQ ID NO. 594, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 149 - 183 of SEQ ID NO. 594, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 594, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
594, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 148 of Q9UJZ1, which also corresponds to amino acids 1 - 148 of SEQ ID NO. 594, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 149 - 183 of SEQ ID NO. 594, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 594, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 149 - 183 in SEQ ID NO. 594. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 595, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 595, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 180 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 194 of SEQ ID NO. 595, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 595, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 595. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 595, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 195 - 220 in SEQ ID NO. 595. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 595, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 595, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 595, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 194 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 194 of SEQ ID NO. 595, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
595, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 195 - 220 in SEQ ID NO. 595. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 595, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 194 of Q9UJZ1, which also corresponds to amino acids 1 - 194 of SEQ ID NO. 595, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 195 - 220 of SEQ ID NO. 595, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 595, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 195 - 220 in SEQ ID NO. 595. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 596, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 596, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 134 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 148 of SEQ ID NO. 596, a third amino acid sequence being at least 90 % homologous to amino acids 180 - 187 of SEQ ID NO.
639, which also corresponds to amino acids 149 - 156 of SEQ ID NO. 596, a bridging amino acid A corresponding to amino acid 157 of SEQ ID NO. 596, and a fourth amino acid sequence being at least 90 % homologous to amino acids 189 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, bridging amino acid and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 596, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 596. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 596, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
596, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 109 of SEQ ID NO. 596, a second amino acid sequence being at least 90 % homologous to amino acids 1 - 39 of SEQ ID NO. 640, which also corresponds to amino acids 110 - 148 of SEQ ID NO. 596, a third amino acid sequence being at least 90 % homologous to amino acids 85 - 159 of SEQ ID NO. 640, which also corresponds to amino acids 149 - 223 of SEQ ID NO. 596, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 224 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 596, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 109 of SEQ ID NO. 596. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
596, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 596, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 224 - 311 in SEQ ID NO. 596. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 596, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 596, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 596, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 148 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 148 of SEQ ID NO. 596, and a third amino acid sequence being at least 90 % homologous to amino acids 194 - 356 of SEQ ID NO. 638, which also corresponds to amino acids 149 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 596, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 596, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 148 of Q9UJZ1, which also corresponds to amino acids 1 - 148 of SEQ ID NO. 596, and a second amino acid sequence being at least 90 % homologous to amino acids 194 - 356 of Q9UJZ1, which also corresponds to amino acids 149 - 311 of SEQ ID NO. 596, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 
596, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 597, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 597, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 143 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 157 of SEQ ID NO. 597, and a third amino acid sequence being at least 90 % homologous to amino acids 295 - 342 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 205 of SEQ ID NO. 597, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 597, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 597. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
597, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 597, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 597, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 597, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 157 of SEQ ID NO. 639, which also corresponds to amino acids 130 - 157 of SEQ ID NO. 597, and a third amino acid sequence being at least 90 % homologous to amino acids 309 - 356 of SEQ ID NO. 639, which also corresponds to amino acids 158 - 205 of SEQ ID NO. 597, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO.
597, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 597, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 157 of Q9UJZ1, which also corresponds to amino acids 1 - 157 of SEQ ID NO. 597, and a second amino acid sequence being at least 90 % homologous to amino acids 309 - 356 of Q9UJZ1, which also corresponds to amino acids 158 - 205 of SEQ ID NO. 597, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 597, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
598, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 598, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 128 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 142 of SEQ ID NO. 598, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 598, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 598. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 598, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 598, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 128 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 128 of SEQ ID NO. 598, a bridging amino acid L corresponding to amino acid 129 of SEQ ID NO. 
598, a second amino acid sequence being at least 90 % homologous to amino acids 130 - 142 of SEQ ID NO. 638, which also corresponds to amino acids 130 - 142 of SEQ ID NO. 598, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 598, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 598, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 142 of Q9UJZ1, which also corresponds to amino acids 1 - 142 of SEQ ID NO. 598, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 143 - 161 of SEQ ID NO. 598, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
598, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 143 - 161 in SEQ ID NO. 598. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 600, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 61 of SEQ ID NO. 600, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 102 of SEQ ID NO. 600, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 600, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence amino acids 62 - 102 in SEQ ID NO. 600. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 600, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of SEQ ID NO. 600, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 102 of SEQ ID NO. 
600, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 600, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 102 in SEQ ID NO. 600. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 601, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 601, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 47 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 61 of SEQ ID NO. 601, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 72 of SEQ ID NO. 601, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 601. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 72 in SEQ ID NO. 601. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 601, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q96FY2, which also corresponds to amino acids 1 - 61 of SEQ ID NO. 601, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 72 of SEQ ID NO. 601, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 72 in SEQ ID NO. 601. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 601, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of SEQ ID NO. 601, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 72 of SEQ ID NO. 601, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 601, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 72 in SEQ ID NO. 601. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 602, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 26 of SEQ ID NO. 602, a second amino acid sequence being at least 90 % homologous to amino acids 13 - 80 of SEQ ID NO. 639, which also corresponds to amino acids 27 - 94 of SEQ ID NO. 602, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 95 - 111 of SEQ ID NO. 602, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 602, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 26 of SEQ ID NO. 602. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
602, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 95 - 111 in SEQ ID NO. 602. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 602, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 94 of SEQ ID NO. 638, which also corresponds to amino acids 1 - 94 of SEQ ID NO. 602, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 95 - 111 of SEQ ID NO. 602, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 602, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 95 - 111 in SEQ ID NO. 602. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 602, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 94 of Q9UJZ1, which also corresponds to amino acids 1 - 94 of SEQ ID NO. 602, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 95 - 111 of SEQ ID NO. 602, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 602, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 95 - 111 in SEQ ID NO. 602. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 581, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of SEQ ID NO. 581, and a second amino acid sequence being at least 90 % homologous to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of SEQ ID NO. 581, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 581, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 582, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of SEQ ID NO. 
582, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 428 - 432 of SEQ ID NO. 582, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 582, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 428 - 432 in SEQ ID NO. 582. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 584, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of SEQ ID NO. 584, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 68 - 98 of SEQ ID NO. 584, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 584, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 68 - 98 in SEQ ID NO. 584. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 
585, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of SEQ ID NO. 585, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 184 - 200 of SEQ ID NO. 585, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 585, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 184 - 200 in SEQ
ID NO. 585. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 586, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of SEQ ID NO. 586, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 206 - 217 of SEQ ID NO. 586, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 586, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 206 - 217 in SEQ ID NO. 586. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 587, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of SEQ ID NO. 587, a second amino acid sequence being a bridging amino acid sequence comprising L, a third amino acid sequence being at least 90 % homologous to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of SEQ ID NO. 587, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 132 - 148 of SEQ ID NO. 
587, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 587, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise FLK, having a structure as follows (numbering according to SEQ ID NO. 587): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 587, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 148 in SEQ ID NO. 587. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 576, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 1056 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 1056 of SEQ ID NO. 576, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1057 - 1081 of SEQ ID NO. 576, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
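The edge-portion wording used in the preceding paragraphs reduces to index arithmetic on residue positions. The following is a minimal sketch only, assuming 1-based inclusive residue numbering; the function name `edge_portion_windows` and its parameter names are hypothetical and do not appear in the application. It enumerates the (start, end) pairs obtained by letting x vary from 0 to n-2, for a junction whose last residue before the splice is `last_before` and whose first residue after it is `first_after`.

```python
def edge_portion_windows(n, last_before, first_after):
    """Enumerate (start, end) residue ranges, 1-based and inclusive.

    Implements the wording literally: a sequence starting at
    last_before - x and ending at first_after + ((n - 2) - x),
    with x varying from 0 to n - 2.
    All names here are illustrative assumptions, not terms from the text.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2 inclusive
        start = last_before - x
        end = first_after + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Junction between residues 157 and 158 (SEQ ID NO. 597), with n = 10.
windows_597 = edge_portion_windows(10, 157, 158)

# Junction written as 109 / 111 (SEQ ID NO. 587), with n = 10.
windows_587 = edge_portion_windows(10, 109, 111)
```

Every range returned for the 157/158 junction contains both residues 157 and 158 and spans exactly n positions. Applied literally to the 109/111 numbering given for SEQ ID NO. 587, each range spans n + 1 positions, because the bridging residue at position 110 lies between the two anchor positions.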
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 576, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1057 - 1081 in SEQ ID NO. 576. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 577, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 714 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 714 of SEQ ID NO. 577, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 715 - 729 of SEQ ID NO. 577, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 577, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 715 - 729 in SEQ ID NO. 577. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 578, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 648 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 648 of SEQ ID NO. 578, a second amino acid sequence being at least 90 % homologous to amino acids 667 - 714 of SEQ ID NO. 634, which also corresponds to amino acids 649 - 696 of SEQ ID NO. 
578, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 697 - 738 of SEQ ID NO. 578, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 578, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 578, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 697 - 738 in SEQ ID NO. 578. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 579, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 260 of SEQ ID NO. 634, which also corresponds to amino acids 1 - 260 of SEQ ID NO. 
579, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 261 - 273 of SEQ ID NO. 579, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 579, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 261 - 273 in SEQ ID NO. 579. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 575, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 13 of GFR2_HUMAN, which also corresponds to amino acids 1 - 13 of SEQ ID NO. 575, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 14 - 30 of SEQ ID NO. 575, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 575, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 14 - 30 in SEQ ID NO. 575. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 
567, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 567, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 567, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 156 in SEQ ID NO. 567. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 567, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 73 of SEQ ID NO. 567, and a second amino acid sequence being at least 90 % homologous to amino acids 1799 - 1881 of SEQ ID NO. 629, which also corresponds to amino acids 74 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 
567, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence amino acids 1 - 73 of SEQ ID NO. 567. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 567, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 567, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 567, a second amino acid sequence being at least 90 % homologous to amino acids 54 - 124 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 124 of SEQ ID NO. 567, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 125 - 156 of SEQ ID NO. 567, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 567, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 125 - 156 in SEQ ID NO. 567. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 568, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 
568, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 169 of SEQ ID NO. 568, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 568, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 169 in SEQ ID NO. 568. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 568, comprising a first amino acid sequence being at least 90 % homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 568, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 568, a second amino acid sequence being at least 90 % homologous to amino acids 54 - 122 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 122 of SEQ ID NO. 568, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 123 - 136 of SEQ ID NO. 568, and a fourth amino acid sequence being at least 90 % homologous to amino acids 123 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 137 - 169 of SEQ ID NO. 568, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 
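The requirement that the recited sequences be "contiguous and in a sequential order" can be checked mechanically from the residue ranges alone. The sketch below is illustrative only: the `Segment` type, the `check_segments` function and the use of position 169 as the final residue (the last position named in the paragraph above for SEQ ID NO. 568) are assumptions introduced here, not definitions from the application. It verifies that the variant ranges tile the polypeptide without gaps or overlaps and that each segment mapped to a known protein covers the same number of residues on both sides.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """One recited piece of a chimeric polypeptide (1-based, inclusive)."""
    variant_start: int
    variant_end: int
    source_start: Optional[int] = None  # None for bridging or novel segments
    source_end: Optional[int] = None


def check_segments(segments, last_residue):
    """Return True if the segments tile 1..last_residue in order and
    every mapped segment has equal length in variant and source."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start:
            return False  # gap or overlap between consecutive segments
        if seg.variant_end < seg.variant_start:
            return False  # empty or reversed range
        if seg.source_start is not None and seg.source_end is not None:
            variant_len = seg.variant_end - seg.variant_start
            source_len = seg.source_end - seg.source_start
            if variant_len != source_len:
                return False  # mapped ranges differ in length
        expected_start = seg.variant_end + 1
    return expected_start == last_residue + 1


# Ranges as recited for SEQ ID NO. 568 against SEQ ID NO. 630.
seq_568 = [
    Segment(1, 52, 1, 52),        # first amino acid sequence
    Segment(53, 53),              # bridging amino acid G
    Segment(54, 122, 54, 122),    # second amino acid sequence
    Segment(123, 136),            # third (unique) amino acid sequence
    Segment(137, 169, 123, 155),  # fourth amino acid sequence
]
result_568 = check_segments(seq_568, 169)

# A deliberately broken variant (gap at residue 53) for comparison.
result_gap = check_segments([Segment(1, 52, 1, 52), Segment(54, 169)], 169)
```

A check of this kind only confirms that the coordinates are internally consistent; it says nothing about the homology percentages, which depend on the actual sequences.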
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 568, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 123 - 136, corresponding to SEQ ID NO. 568. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 569, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 569, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 180 of SEQ ID NO. 569, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 569, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence amino acids 124 - 180 in SEQ ID NO. 569. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 569, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 569, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 
569, a second amino acid sequence being at least 90% homologous to amino acids 54 - 123 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 123 of SEQ ID NO. 569, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 148 of SEQ ID NO. 569, and a fourth amino acid sequence being at least 90% homologous to amino acids 124 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 149 - 180 of SEQ ID NO. 569, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 569, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 124 - 148, corresponding to SEQ ID NO. 569. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 570, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 123 of SEQ ID NO. 570, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 145 of SEQ ID NO. 570, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 570, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 148 in SEQ ID NO. 570. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 570, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 570, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 570, a second amino acid sequence being at least 90% homologous to amino acids 54 - 124 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 124 of SEQ ID NO. 570, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 125 - 145 of SEQ ID NO. 570, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 570, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 125 - 145 in SEQ ID NO. 570. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 571, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 101 of SEQ ID NO. 
631, which also corresponds to amino acids 1 - 101 of SEQ ID NO. 571, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 122 of SEQ ID NO. 571, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 571, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 122 in SEQ ID NO. 571. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 571, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 571, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 571, a second amino acid sequence being at least 90% homologous to amino acids 54 - 101 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 101 of SEQ ID NO. 571, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 122 of SEQ ID NO. 571, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
571, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 122 in SEQ ID NO. 571. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 572, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO. 572, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 572, a second amino acid sequence being at least 90% homologous to amino acids 64 - 123 of SEQ ID NO. 631, which also corresponds to amino acids 64 - 123 of SEQ ID NO. 572, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 124 - 155 of SEQ ID NO. 572, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 572, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 124 - 155 in SEQ ID NO. 572. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 572, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 572, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 
572, a second amino acid sequence being at least 90% homologous to LSDDEETIS corresponding to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO. 572, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 572, and a third amino acid sequence being at least 90% homologous to amino acids 64 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 64 - 155 of SEQ ID NO. 572, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 573, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO. 573, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 573, a second amino acid sequence being at least 90% homologous to amino acids 64 - 101 of SEQ ID NO. 631, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 573, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 109 of SEQ ID NO. 573, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 573, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 109 in SEQ ID NO. 573. 
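To make the percentage thresholds concrete for a short segment such as LSDDEETIS (amino acids 54 - 62), the sketch below computes a plain ungapped percent identity between two equal-length peptides. This is an assumption made for illustration: this section does not say how homology is to be determined, and position-by-position identity is only a crude stand-in for an alignment-based measure. The function name `percent_identity` and the mutated peptide are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length peptides.

    Assumption: identity is used here as a simple proxy for "homologous";
    the text of this section does not define the comparison method.
    """
    if not a or len(a) != len(b):
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


SEGMENT = "LSDDEETIS"                 # quoted in the text for positions 54 - 62
assert len(SEGMENT) == 62 - 54 + 1    # nine residues, consistent with the range

one_mismatch = "LSDDEETIA"            # hypothetical peptide with a single substitution
print(percent_identity(SEGMENT, SEGMENT))
print(round(percent_identity(SEGMENT, one_mismatch), 1))
```

Under this simplified measure, a single substitution in a nine-residue segment already falls below a 90% threshold, which shows how strict the higher thresholds become for short stretches.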
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 573, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 573, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 573, a second amino acid sequence being at least 90% homologous to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO. 573, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 573, a third amino acid sequence being at least 90% homologous to amino acids 64 -
101 of SEQ ID NO. 630, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 573, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 102 - 109 of SEQ ID NO. 573, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 573, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 109 in SEQ ID NO. 573. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 574, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 62 of SEQ ID NO. 631, which also corresponds to amino acids 1 - 62 of SEQ ID NO. 574, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 574, a second amino acid sequence being at least 90% homologous to amino acids 64 - 101 of SEQ ID NO. 631, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 574, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids
102 - 133 of SEQ ID NO. 574, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 574, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 102 - 133 in
SEQ ID NO. 574. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 574, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 52 of SEQ ID NO. 630, which also corresponds to amino acids 1 - 52 of SEQ ID NO. 574, a bridging amino acid G corresponding to amino acid 53 of SEQ ID NO. 574, a second amino acid sequence being at least 90% homologous to amino acids 54 - 62 of SEQ ID NO. 630, which also corresponds to amino acids 54 - 62 of SEQ ID NO. 574, a bridging amino acid P corresponding to amino acid 63 of SEQ ID NO. 574, a third amino acid sequence being at least 90% homologous to amino acids 64 - 101 of SEQ ID NO. 630, which also corresponds to amino acids 64 - 101 of SEQ ID NO. 574, and a fourth amino acid sequence being at least 90% homologous to amino acids 124 - 155 of SEQ ID NO. 630, which also corresponds to amino acids 102 - 133 of SEQ ID NO. 574, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 574, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KV, having a structure as follows: a sequence starting from any of amino acid numbers 101-x to 101; and ending at any of amino acid numbers 102+ ((n-2) - x), in which x varies from 0 to n-2. 
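The edge-portion formula above describes every window of length n that spans the junction residues K (position 101) and V (position 102): a window starts at 101 - x and ends at 102 + ((n - 2) - x), with x running from 0 to n - 2. The sketch below enumerates those windows. The function name `edge_windows` is hypothetical, and the optional `seq_len` bound of 133 is an assumption inferred from the statement that the last segment corresponds to amino acids 102 - 133 of SEQ ID NO. 574.

```python
from typing import List, Optional, Tuple


def edge_windows(left: int, n: int, seq_len: Optional[int] = None) -> List[Tuple[int, int]]:
    """List (start, end) windows of length n covering positions left and left + 1.

    Positions are 1-based and inclusive. Windows that would run past either
    end of the sequence are skipped (the upper bound only when seq_len is given).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    right = left + 1
    windows = []
    for x in range(n - 1):              # x = 0 .. n-2, as in the text
        start = left - x
        end = right + (n - 2) - x
        if start < 1:
            continue
        if seq_len is not None and end > seq_len:
            continue
        windows.append((start, end))
    return windows


# Junction at K101/V102; 133 is the assumed length of SEQ ID NO. 574.
for start, end in edge_windows(101, 10, seq_len=133):
    print(start, end)
```

Each window has exactly n residues because the end minus the start plus one reduces to n for every x, so the choice of x only slides the window across the junction.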
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 564, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 1617 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 1617 of SEQ ID NO. 564, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1618 - 1645 of SEQ ID NO. 564, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 564, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1618 - 1645 in SEQ ID NO. 564. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 565, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 2062 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 2062 of SEQ ID NO. 565, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 2063 - 2074 of SEQ ID NO. 565, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
565, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 2063 - 2074 in SEQ ID NO. 565.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 566, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 587 of SEQ ID NO. 627, which also corresponds to amino acids 1 - 587 of SEQ ID NO. 566, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 588 - 603 of SEQ ID NO. 566, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 566, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 588 - 603 in SEQ ID NO. 566. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 560, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 131 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 131 of SEQ ID NO. 560, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 132 - 139 of SEQ ID NO. 560, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
560, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 139 in SEQ ID NO. 560. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 561, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 131 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 131 of SEQ ID NO. 561, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 132 - 156 of SEQ ID NO. 561, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 561, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 132 - 156 in SEQ ID NO. 561. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 562, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 81 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 81 of SEQ ID NO. 562, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 82 - 89 of SEQ ID NO. 562, wherein said first and second amino acid sequences are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 562, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 82 - 89 in SEQ ID NO. 562. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 563, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 82 of SEQ ID NO. 625, which also corresponds to amino acids 1 - 82 of SEQ ID NO. 563. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 552, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 552, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 215 of SEQ ID NO. 552, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 552, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 215 in SEQ ID NO. 552. 
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 552, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 552, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 215 of SEQ ID NO. 552, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 552, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 215 in SEQ ID NO. 552. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 178 of SEQ ID NO. 553, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 
553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID NO. 553. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 178 of SEQ ID NO. 553, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID
NO. 553. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids
117 - 178 of SEQ ID NO. 553, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ
ID NO. 553. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 553, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 553, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least
90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 178 of SEQ ID NO. 553, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 553, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 178 in SEQ ID NO. 553. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 554, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 554, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 126 of SEQ ID NO. 554, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 554, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 126 in SEQ ID NO. 554. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 
554, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of SEQ ID NO. 554, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 117 - 126 of SEQ ID NO. 554, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 554, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 117 - 126 in SEQ ID NO. 554. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 555, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of SEQ ID NO. 555, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 25 - 35 of SEQ ID NO. 555, and a third amino acid sequence being at least 90% homologous to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of SEQ ID NO. 555, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 
555, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 25 - 35 corresponding to SEQ ID NO. 555. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 555, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of SEQ ID NO. 555, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 25 - 35 of SEQ ID NO. 555, and a third amino acid sequence being at least 90% homologous to
GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of SEQ ID NO. 555, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 555, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 25 - 35 corresponding to SEQ ID NO. 555. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 534, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 476 of EPB2_HUMAN, which also corresponds to amino acids 1 - 476 of SEQ ID NO. 534, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 477 - 496 of SEQ ID NO. 534, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 534, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 477 - 496 in SEQ ID NO. 534. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
535, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 270 of EPB2_HUMAN, which also corresponds to amino acids 1 - 270 of SEQ ID NO. 535, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 271 - 301 of SEQ ID NO. 535, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 535, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 271 - 301 in SEQ ID NO. 535. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 536, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of SEQ ID NO. 536, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 320 - 324 of SEQ ID NO. 536, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 536, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 320 - 324 in SEQ ID NO. 536.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 537, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of SEQ ID NO. 537, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 235 - 256 of SEQ ID NO. 537, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 537, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 235 - 256 in SEQ ID NO. 537. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 537, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of SEQ ID NO. 537, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 235 - 256 of SEQ ID NO. 537, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
537, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 235 - 256 in SEQ ID
NO. 537. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 538, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of SEQ ID NO. 538, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 321 - 390 of SEQ ID NO. 538, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 538, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 321 - 390 in SEQ ID NO. 538. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 539, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of SEQ ID NO. 539, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 142 - 183 of SEQ ID NO. 539, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO.
539, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 142 - 183 in SEQ ID NO. 539. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 540, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 167 of Q9HAP5, which also corresponds to amino acids 1 - 167 of SEQ ID NO. 540, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 168 - 180 of SEQ ID NO. 540, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 540, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 168 - 180 in SEQ ID NO. 540. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 541, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 357 of Q8N441, which also corresponds to amino acids 1 - 357 of SEQ ID NO. 541, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 358 - 437 of SEQ ID NO. 541, and a third amino acid sequence being at least 90% homologous to amino acids 358 - 504 of Q8N441, which also corresponds to amino acids 438 - 584 of SEQ ID NO.
541, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of SEQ ID NO. 541, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for amino acids 358 - 437 corresponding to SEQ ID NO. 541. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 542, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 269 of Q9H4D7, which also corresponds to amino acids 1 - 269 of SEQ ID NO. 542, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 270 - 490 of SEQ ID NO. 542, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 542, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 270 - 490 in SEQ ID NO. 542. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 542, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 269 of Q8N441, which also corresponds to amino acids 1 - 269 of SEQ ID NO.
542, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 270 - 490 of SEQ ID NO. 542, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 542, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 270 - 490 in SEQ ID NO. 542. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 543, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 81 of SZ05_HUMAN, which also corresponds to amino acids 1 - 81 of SEQ ID NO. 543. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 544, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 74 of MI2B_HUMAN, which also corresponds to amino acids 1 - 74 of SEQ ID NO. 544. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 545, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 103 of MI2B_HUMAN, which also corresponds to amino acids 1 - 103 of SEQ ID NO. 545. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 546, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 61 of MI2B_HUMAN, which also corresponds to amino acids 1 - 61 of SEQ ID NO.
546, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 62 - 98 of SEQ ID NO. 546, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 546, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 62 - 98 in SEQ ID NO. 546. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 547, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 103 of SEQ ID NO. 547, and a second amino acid sequence being at least 90% homologous to amino acids 34 - 107 of MI2B_HUMAN, which also corresponds to amino acids 104 - 177 of SEQ ID NO. 547, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 547, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 103 of SEQ ID NO. 547. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
548, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548, and a second amino acid sequence being at least 90% homologous to amino acids 151 - 461 of DCOR_HUMAN, which also corresponds to amino acids 30 - 340 of SEQ ID NO. 548, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 548, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 548, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548, and a second amino acid sequence being at least 90% homologous to amino acids 40 - 350 of AAA59968, which also corresponds to amino acids 30 - 340 of SEQ ID NO. 548, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 548, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 548, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 29 of SEQ ID NO. 548, and a second amino acid sequence being at least 90% homologous to amino acids 86 - 396 of AAH14562, which also corresponds to amino acids 30 - 340 of SEQ ID NO. 548, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 548, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 549, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 44 of SEQ ID NO. 549, a second amino acid sequence being at least 90% homologous to amino acids 74 - 191 of Q9NWT9, which also corresponds to amino acids 45 - 162 of SEQ ID NO. 549, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 163 - 238 of SEQ ID NO. 549, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 549, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 44 of SEQ ID NO. 549. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 549, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 163 - 238 in SEQ ID NO. 549. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 549, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 44 of SEQ ID NO. 549, and a second amino acid sequence being at least 90% homologous to amino acids 21 - 214 of TESC_HUMAN, which also corresponds to amino acids 45 - 238 of SEQ ID NO. 549, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 549, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 44 of SEQ ID NO. 549. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
550, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 130 of SEQ ID NO. 550, and a second amino acid sequence being at least 90% homologous to amino acids 1 - 172 of Q96C98, which also corresponds to amino acids 131 - 302 of SEQ ID NO. 550, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 550, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 130 of SEQ ID NO. 550. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 550, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 74 of SEQ ID NO. 550, and a second amino acid sequence being at least 90% homologous to amino acids 53 - 280 of Q9BVA2, which also corresponds to amino acids 75 - 302 of SEQ ID NO. 550, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 550, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 74 of SEQ ID NO. 550.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 551, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 34 of SEQ ID NO. 551, and a second amino acid sequence being at least 90% homologous to a sequence corresponding to amino acids 60 - 172 of Q96C98, which also corresponds to amino acids 35 - 147 of SEQ ID NO. 551, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 551, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 34 of SEQ ID NO. 551. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 551, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 1 - 34 of SEQ ID NO. 551, and a second amino acid sequence being at least 90% homologous to a sequence corresponding to amino acids 168 - 280 of Q9BVA2, which also corresponds to amino acids 35 - 147 of SEQ ID NO. 551, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO.
551, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 34 of SEQ ID NO. 551. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of SEQ ID NO. 548, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 1 - 29 of SEQ ID NO. 548. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 556, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of SEQ ID NO. 556, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 442 - 464 of SEQ ID NO. 556, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 556, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 442 - 464 in SEQ ID NO. 556. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 557, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 428 of SM02_HUMAN, which also corresponds to amino acids 1 - 428 of SEQ ID NO.
557, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 429 - 434 of SEQ ID NO. 557, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 557, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 429 - 434 in SEQ ID NO. 557. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO. 558, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of SEQ ID NO. 558, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide sequence corresponding to amino acids 442 - 454 of SEQ ID NO. 558, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of SEQ ID NO. 558, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to amino acids 442 - 454 in SEQ ID NO. 558. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for SEQ ID NO.
559, comprising a first amino acid sequence being at least 90% homologous to amino acids 1 - 170 of SM02_HUMAN, which also corresponds to amino acids 1 - 170 of SEQ ID NO. 559, and a second amino acid sequence being at least 90% homologous to amino acids 188 - 446 of SM02_HUMAN, which also corresponds to amino acids 171 - 429 of SEQ ID NO. 559, wherein said first and second amino acid sequences are contiguous and in a sequential order.
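As an illustration only, the coordinate bookkeeping in the chimeric polypeptide definition for SEQ ID NO. 559 above can be checked with a short sketch. The `Segment` type and both helper functions are hypothetical names introduced here, not part of the specification; the sketch assumes 1-based inclusive coordinates and covers only segments that map to a known reference protein.

```python
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    """One segment of a variant mapped onto a known protein (1-based, inclusive)."""
    ref_start: int  # first residue in the known (reference) protein
    ref_end: int    # last residue in the known (reference) protein
    var_start: int  # first residue in the variant (SEQ ID NO.) sequence
    var_end: int    # last residue in the variant (SEQ ID NO.) sequence


def is_consistent(segments: Sequence[Segment]) -> bool:
    """Check that segments tile the variant contiguously from residue 1
    and that each segment spans the same number of residues in both
    coordinate systems."""
    expected_start = 1
    for seg in segments:
        if seg.var_start != expected_start:
            return False
        ref_len = seg.ref_end - seg.ref_start + 1
        var_len = seg.var_end - seg.var_start + 1
        if ref_len != var_len:
            return False
        expected_start = seg.var_end + 1
    return True


def skipped_reference_residues(left: Segment, right: Segment) -> int:
    """Number of reference residues absent between two adjacent segments."""
    return right.ref_start - left.ref_end - 1


# Coordinates as stated in the text for SEQ ID NO. 559.
SEQ_559 = [
    Segment(ref_start=1, ref_end=170, var_start=1, var_end=170),
    Segment(ref_start=188, ref_end=446, var_start=171, var_end=429),
]

print(is_consistent(SEQ_559))
print(skipped_reference_residues(SEQ_559[0], SEQ_559[1]))
```

Under these assumptions the two segments are contiguous in the variant, and the junction between variant residues 170 and 171 corresponds to reference residues 171 - 187 being skipped.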
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of SEQ ID NO. 559, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCA1XIA, HSSIOOPCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR. Optionally said amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion. Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein. According to preferred embodiments of the present invention, there is provided a kit for detecting colon cancer, comprising a kit detecting overexpression of a splice variant from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCA1XIA,
HSSIOOPCB, HUMPHOSLIP, DI 1853, RI 1723, M77903 and HSKITCR. Optionally the kit comprises a NAT-based technology. Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence. Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence. Optionally the kit comprises an antibody. Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot. According to prefened embodiments of the present invention, there is provided an method for detecting colon cancer, comprising detecting overexpression of a splice variant from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCAIXIA, HSSIOOPCB, HUMPHOSLIP, DI 1853, RI 1723, M77903 and HSKITCR. Optionally detecting overexpression is performed with a NAT-based technology. Optionally said detecting overexpression is performed with an immunoassay. Optionally the immunoassay comprises an antibody. According to prefened embodiments of the present invention, there is provided a biomarker capable of detecting colon cancer, comprising nucleic acid sequences or a fragment thereof, or amino acid sequences or a fragment thereof from clusters of M85491, T10888, H14624, H53626, HSENA78, HUMGROG5, HUMODCA, R00299, Z19178, S67314, Z44808, Z25299, HUMF5A, HUMANK, Z39818, HUMCAIXIA, HSSIOOPCB, HUMPHOSLIP, D11853, R11723, M77903 and HSKITCR. According to prefened embodiments of the present invention, there is provided a method for screening for colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay. According to prefened embodiments of the present invention, there is provided a method for diagnosing colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay. 
According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression of colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay.
According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay and selecting a therapy according to said detection.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
AA583399_PEA_1_T0
AA583399_PEA_1_T1
AA583399_PEA_1_T2
AA583399_PEA_1_T3
AA583399_PEA_1_T4
AA583399_PEA_1_T5
AA583399_PEA_1_T6
AA583399_PEA_1_T7
AA583399_PEA_1_T8
AA583399_PEA_1_T9
AA583399_PEA_1_T10
AA583399_PEA_1_T11
AA583399_PEA_1_T12
AA583399_PEA_1_T15
AA583399_PEA_1_T16
AA583399_PEA_1_T17
a nucleic acid sequence comprising a sequence selected from the table below:
Segment Name
AA583399_PEA_1_node_0
AA583399_PEA_1_node_3
AA583399_PEA_1_node_9
AA583399_PEA_1_node_10
AA583399_PEA_1_node_12
AA583399_PEA_1_node_14
AA583399_PEA_1_node_21
AA583399_PEA_1_node_24
AA583399_PEA_1_node_25
AA583399_PEA_1_node_29
AA583399_PEA_1_node_1
AA583399_PEA_1_node_2
AA583399_PEA_1_node_4
AA583399_PEA_1_node_5
AA583399_PEA_1_node_6
AA583399_PEA_1_node_7
AA583399_PEA_1_node_8
AA583399_PEA_1_node_11
AA583399_PEA_1_node_9
AA583399_PEA_1_node_27
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
AA583399_PEA_1_P3
AA583399_PEA_1_P2
AA583399_PEA_1_P4
AA583399_PEA_1_P5
AA583399_PEA_1_P6
AA583399_PEA_1_P8
AA583399_PEA_1_P10
AA583399_PEA_1_P11
AA583399_PEA_1_P12
AA583399_PEA_1_P14
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
AI684092_PEA_1_T2
AI684092_PEA_1_T3
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
AI684092_PEA_1_node_0
AI684092_PEA_1_node_2
AI684092_PEA_1_node_4
AI684092_PEA_1_node_5
AI684092_PEA_1_node_6
AI684092_PEA_1_node_7
AI684092_PEA_1_node_8
AI684092_PEA_1_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
AI684092_PEA_1_P1
AI684092_PEA_1_P3
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMCACH1A_PEA_1_T0
HUMCACH1A_PEA_1_T1
HUMCACH1A_PEA_1_T2
HUMCACH1A_PEA_1_T3
HUMCACH1A_PEA_1_T4
HUMCACH1A_PEA_1_T6
HUMCACH1A_PEA_1_T7
HUMCACH1A_PEA_1_T8
HUMCACH1A_PEA_1_T12
HUMCACH1A_PEA_1_T13
HUMCACH1A_PEA_1_T14
HUMCACH1A_PEA_1_T15
HUMCACH1A_PEA_1_T16
HUMCACH1A_PEA_1_T17
HUMCACH1A_PEA_1_T18
HUMCACH1A_PEA_1_T19
HUMCACH1A_PEA_1_T20
HUMCACH1A_PEA_1_T22
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMCACH1A_PEA_1_node_2
HUMCACH1A_PEA_1_node_5
HUMCACH1A_PEA_1_node_9
HUMCACH1A_PEA_1_node_11
HUMCACH1A_PEA_1_node_14
HUMCACH1A_PEA_1_node_16
HUMCACH1A_PEA_1_node_27
HUMCACH1A_PEA_1_node_30
HUMCACH1A_PEA_1_node_33
HUMCACH1A_PEA_1_node_41
HUMCACH1A_PEA_1_node_43
HUMCACH1A_PEA_1_node_45
HUMCACH1A_PEA_1_node_47
HUMCACH1A_PEA_1_node_55
HUMCACH1A_PEA_1_node_57
HUMCACH1A_PEA_1_node_70
HUMCACH1A_PEA_1_node_72
HUMCACH1A_PEA_1_node_74
Figure imgf000104_0001
HUMCACH1A_PEA_1_node_64
HUMCACH1A_PEA_1_node_66
HUMCACH1A_PEA_1_node_68
HUMCACH1A_PEA_1_node_76
HUMCACH1A_PEA_1_node_77
HUMCACH1A_PEA_1_node_79
HUMCACH1A_PEA_1_node_81
HUMCACH1A_PEA_1_node_84
HUMCACH1A_PEA_1_node_88
HUMCACH1A_PEA_1_node_90
HUMCACH1A_PEA_1_node_96
HUMCACH1A_PEA_1_node_98
HUMCACH1A_PEA_1_node_100
HUMCACH1A_PEA_1_node_101
HUMCACH1A_PEA_1_node_107
HUMCACH1A_PEA_1_node_111
HUMCACH1A_PEA_1_node_117
HUMCACH1A_PEA_1_node_124
HUMCACH1A_PEA_1_node_126
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HUMCACH1A_PEA_1_P2
HUMCACH1A_PEA_1_P3
HUMCACH1A_PEA_1_P4
HUMCACH1A_PEA_1_P5
HUMCACH1A_PEA_1_P7
HUMCACH1A_PEA_1_P8
HUMCACH1A_PEA_1_P9
HUMCACH1A_PEA_1_P10
HUMCACH1A_PEA_1_P11
HUMCACH1A_PEA_1_P12
HUMCACH1A_PEA_1_P13
HUMCACH1A_PEA_1_P14
HUMCACH1A_PEA_1_P15
HUMCACH1A_PEA_1_P17
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMCEA_PEA_1_T8
HUMCEA_PEA_1_T9
HUMCEA_PEA_1_T12
HUMCEA_PEA_1_T14
HUMCEA_PEA_1_T16
HUMCEA_PEA_1_T20
HUMCEA_PEA_1_T25
HUMCEA_PEA_1_T26
HUMCEA_PEA_1_T29
HUMCEA_PEA_1_T30
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMCEA_PEA_1_node_0
HUMCEA_PEA_1_node_2
HUMCEA_PEA_1_node_6
Figure imgf000107_0001
HUMCEA_PEA_1_node_33
HUMCEA_PEA_1_node_34
HUMCEA_PEA_1_node_35
HUMCEA_PEA_1_node_45
HUMCEA_PEA_1_node_49
HUMCEA_PEA_1_node_50
HUMCEA_PEA_1_node_51
HUMCEA_PEA_1_node_56
HUMCEA_PEA_1_node_57
HUMCEA_PEA_1_node_58
HUMCEA_PEA_1_node_60
HUMCEA_PEA_1_node_61
HUMCEA_PEA_1_node_62
HUMCEA_PEA_1_node_64
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HUMCEA_PEA_1_P4
HUMCEA_PEA_1_P5
HUMCEA_PEA_1_P7
HUMCEA_PEA_1_P10
HUMCEA_PEA_1_P14
HUMCEA_PEA_1_P19
HUMCEA_PEA_1_P20
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
M78035_T0
M78035_T3
M78035_T4
M78035_T7
M78035_T9
M78035_T11
M78035_T17
M78035_T18
M78035_T19
M78035_T20
M78035_T27
M78035_T28
a nucleic acid sequence comprising a sequence in the table below:
Figure imgf000109_0001
Figure imgf000110_0001
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
M78035_P2
M78035_P4
M78035_P6
M78035_P8
M78035_P18
M78035_P19
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Figure imgf000111_0001
a nucleic acid sequence comprising a sequence in the table below:
Figure imgf000111_0002
Figure imgf000112_0001
R30650_PEA_2_node_17
R30650_PEA_2_node_28
R30650_PEA_2_node_31
R30650_PEA_2_node_48
R30650_PEA_2_node_53
R30650_PEA_2_node_58
R30650_PEA_2_node_58
R30650_PEA_2_node_77
R30650_PEA_2_node_82
R30650_PEA_2_node_85
R30650_PEA_2_node_88
R30650_PEA_2_node_90
R30650_PEA_2_node_91
R30650_PEA_2_node_92
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
R30650_PEA_2_P4
R30650_PEA_2_P5
R30650_PEA_2_P8
R30650_PEA_2_P12
R30650_PEA_2_P13
R30650_PEA_2_P15
R30650_PEA_2_P17
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T23657_T0
T23657_T1
T23657_T2
T23657_T3
T23657_T4
T23657_T5
T23657_T6
T23657_T7
T23657_T8
T23657_T9
T23657_T10
T23657_T11
T23657_T12
T23657_T13
T23657_T14
T23657_T15
T23657_T16
T23657_T17
T23657_T19
T23657_T20
T23657_T21
T23657_T22
T23657_T23
T23657_T24
T23657_T28
T23657_T30
T23657_T31
T23657_T32
T23657_T35
T23657_T37
T23657_T38
a nucleic acid sequence comprising a sequence in the table below:
Figure imgf000115_0001
T23657_node_26
T23657_node_28
T23657_node_30
T23657_node_31
T23657_node_32
T23657_node_41
T23657_node_42
T23657_node_43
T23657_node_44
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Figure imgf000116_0001
T23657_P19
T23657_P21
T23657_P22
T23657_P23
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T51958_PEA_1_T4
T51958_PEA_1_T5
T51958_PEA_1_T6
T51958_PEA_1_T8
T51958_PEA_1_T12
T51958_PEA_1_T16
T51958_PEA_1_T33
T51958_PEA_1_T35
T51958_PEA_1_T37
T51958_PEA_1_T39
T51958_PEA_1_T40
T51958_PEA_1_T41
a nucleic acid sequence comprising a sequence in the table below:
Segment Name T51958JPEAJ. _node_ .0 T51958JPEAJ. _node_ 1 T51958_PEA_1_ _node_ .8 T51958JPEAJ. _node_ .9 T51958JΕAJ. _node_ -1 T51958 ?EA_ l αode_ 6
T51958JPEA_ l_node_ 8
T51958JPEA_ l_node_ - 1
T51958 ?EA_ l ode .22
T51958_PEA_ l_node_ .24
T51958 ?EA_ ljαode .27
T51958JPEA_ l aode. _29
T51958JPEA. l_node_ .33
T51958 ?EA^ ljnode _40
T51958 ?EA_ ljnode -41
T51958_PEA_ l_node_46
T51958 ?EA_ ljnode .51
T51958JPEA_ l_node_ _55
T51958JΕA. ljnode -67
T51958 ?EA_ l_node O
T51958 PEA_ l_node_74
T51958_PEA_ l_node_ .78
T51958 ?EA_ ljnode. -11
T51958_PEA_ ljnode. -15
T51958_PEA_ ljnode -20
T51958 ?EA_ l_node_26
T51958_PEA_ l_node_ -35
Figure imgf000118_0001
T51958_PEA_1_node_38
T51958_PEA_1_node_39
T51958_PEA_1_node_42
T51958_PEA_1_node_43
T51958_PEA_1_node_44
T51958_PEA_1_node_45
T51958_PEA_ _ljnode_47 T51958_PEA_ l_node_ .48 T51958JPEA_ l_node_49 T51958JΕA. ljnode_ - 0 T51958JPEA_ l_node -54 T51958_PEA_ l_node - 1 T51958JPEA_ l_node_ Jl T51958 ?EA_ l_node_ .72 T51958_PEA_ l_node_75 T51958_PEA_ l_node_ -76 T51958 ?EA_ ljnode. 11 T51958 ?EA_ l_node. -80 T51958JPEA_ l_node. _82 T51958JPEA_ l_node _84
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
T51958_PEA_1_P5
T51958_PEA_1_P6
T51958_PEA_1_P28
T51958_PEA_1_P30
T51958_PEA_1_P34
T51958_PEA_1_P35
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
Z17877_PEA_1_T0
Z17877_PEA_1_T2
Z17877_PEA_1_T3
Z17877_PEA_1_T4
Z17877_PEA_1_T6
Z17877_PEA_1_T7
Z17877_PEA_1_T8
Z17877_PEA_1_T11
Z17877_PEA_1_T12
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
Z17877_PEA_1_node_0
Z17877_PEA_1_node_3
Z17877_PEA_1_node_8
Z17877_PEA_1_node_9
Z17877_PEA_1_node_10
Z17877_PEA_1_node_11
Z17877_PEA_1_node_13
Z17877_PEA_1_node_15
Z17877_PEA_1_node_16
Z17877_PEA_1_node_18
Z17877_PEA_1_node_1
Z17877_PEA_1_node_2
Z17877_PEA_1_node_4
Z17877_PEA_1_node_5
Z17877_PEA_1_node_6
Z17877_PEA_1_node_14
Z17877_PEA_1_node_17
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
Z17877_PEA_1_P1
Z17877_PEA_1_P2
Z17877_PEA_1_P3
Z17877_PEA_1_P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSHCGI_PEA_3_T0
HSHCGI_PEA_3_T1
HSHCGI_PEA_3_T2
HSHCGI_PEA_3_T3
HSHCGI_PEA_3_T4
HSHCGI_PEA_3_T5
HSHCGI_PEA_3_T6
HSHCGI_PEA_3_T7
HSHCGI_PEA_3_T8
HSHCGI_PEA_3_T9
HSHCGI_PEA_3_T10
HSHCGI_PEA_3_T11
HSHCGI_PEA_3_T12
HSHCGI_PEA_3_T13
HSHCGI_PEA_3_T14
HSHCGI_PEA_3_T15
HSHCGI_PEA_3_T17
HSHCGI_PEA_3_T18
HSHCGI_PEA_3_T19
HSHCGI_PEA_3_T20
HSHCGI_PEA_3_T21
HSHCGI_PEA_3_T22
HSHCGI_PEA_3_T23
HSHCGI_PEA_3_T24
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSHCGIJPEAJ. node_0
HSHCGIJΕAJ. _node_2
HSHCGI_PEAJ. _node_7
HSHCGI JPEA_3_ _node_8
HSHCGI_PEA_3_ node J 4
HSHCGIJPEAJ. node J 6
HSHCGIJPEAJ. jnode J 8
HSHCGI_PEA_3. _node_20
HSHCGI JPEA_3. _node_26
HSHCGI ΕA_3_ _node_28
HSHCGI_PEAJ. nodeJO
HSHCGI JPEA_3. _nodeJ2
HSHCGI_PEA_3. _node_33
HSHCGIJPEAJ. _nodeJ4
HSHCGIJPEA_3_ node J 6
HSHCGI_PEA_3_node_l
HSHCGIJPEA_3. _node_4
HSHCGIJPEA_3_node_6 HSHCGIJΕAJ. node_ .9
HSHCGIJPEAJ. node_ .11
HSHCGIJΕAJ. _node_ 3
HSHCGI JPEAJ. node_ -19
HSHCGI_PEA_3_ node_21
HSHCGI JPEAJ. node. .22
HSHCGIJΕAJ. node - 3
HSHCGIJPEAJ. _node_24
HSHCGI_PEAJ. node. -27
HSHCGIJΕAJ. node _31
HSHCGI JPEAJ. _nodeJ5
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HSHCGI_PEA_3_P17
HSHCGI_PEA_3_P18
HSHCGI_PEA_3_P19
HSHCGI_PEA_3_P1
HSHCGI_PEA_3_P4
HSHCGI_PEA_3_P6
HSHCGI_PEA_3_P7
HSHCGI_PEA_3_P8
HSHCGI_PEA_3_P9
HSHCGI_PEA_3_P12
HSHCGI_PEA_3_P13
HSHCGI_PEA_3_P14
HSHCGI_PEA_3_P15
HSHCGI_PEA_3_P16
HSHCGI_PEA_3_P20
HSHCGI_PEA_3_P21
HSHCGI_PEA_3_P22
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P17, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCPQCITQIGETSCGFFKCPLCKTSVR RDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYV corresponding to amino acids 1 - 218 of TM31_HUMAN, which also corresponds to amino acids 1 - 218 of HSHCGI_PEA_3_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EIPLMPTVERSQEARCYP corresponding to amino acids 219 - 236 of HSHCGI_PEA_3_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EIPLMPTVERSQEARCYP in HSHCGI_PEA_3_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P19, comprising a first amino acid sequence being at least 90% homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR RDAIRENSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLE corresponding to amino acids 1 - 248 of TM31_HUMAN_V2, which also corresponds to amino acids 1 - 248 of HSHCGI_PEA_3_P19, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NWRKNSVKQNQDTTPSQGA corresponding to amino acids 249 - 267 of HSHCGI_PEA_3_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWRKNSVKQNQDTTPSQGA in HSHCGI_PEA_3_P19.
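To give a feel for what the graded thresholds (70%, 80%, 85%, 90%, 95%) mean for a short tail such as the 19-residue sequence NWRKNSVKQNQDTTPSQGA, the sketch below computes a plain ungapped percent identity and the minimum number of matching positions each threshold requires. This is an assumption-laden simplification: the measure of homology that governs these embodiments is whatever the specification defines elsewhere, and the function names here are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences.
    A simplification: no alignment, gaps, or substitution scoring."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def min_matches(length, pct):
    """Smallest number of identical positions giving at least pct percent
    identity over `length` positions (integer ceiling, no float rounding)."""
    return -(-pct * length // 100)


TAIL_P19 = "NWRKNSVKQNQDTTPSQGA"  # tail sequence quoted in the text above

if __name__ == "__main__":
    for pct in (70, 80, 85, 90, 95):
        print(pct, min_matches(len(TAIL_P19), pct))
    # A hypothetical candidate with a single substitution at the first position.
    candidate = "A" + TAIL_P19[1:]
    print(round(percent_identity(TAIL_P19, candidate), 2))
```

Under this simplified measure, a single substitution in a 19-residue tail already falls below the 95% tier while still satisfying the 90% tier, which is why the shortest tails are the most sensitive to the choice of threshold.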
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P4, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA_3_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YDGPPQMYFAY corresponding to amino acids 257 - 267 of HSHCGI_PEA_3_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YDGPPQMYFAY in HSHCGI_PEA_3_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P6, comprising a first amino acid sequence being at least 90% homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR conesponding to amino acids 1 - 256 of TM31 JTUMANJVl, which also conesponds to amino acids 1 - 256 of HSHCGI_PEAJ_P6, and a second amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PTPG conesponding to amino acids 257 - 260 of HSHCGI PEA J_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI PEA JP6, comprising a polypeptide being at least 70%o, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PTPG in HSHCGI J?EAJJ>6. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGIJPEA J JP7, comprising a first amino acid sequence being at least 90 %> homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRS conesponding to amino acids 1 - 257 of
TM31_HUMAN_V1, which also conesponds to amino acids 1 - 257 of HSHCGIJPEA JJP7, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%o homologous to a polypeptide having the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA conesponding to amino acids 258 - 310 of HSHCGIJPEA JP7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI J>EAJ_P7, comprising a polypeptide being at least 70%>, optionally at least about 80%o, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA in HSHCGI_PEAJJP7. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI JPEAJJP 8, comprising a first amino acid sequence being at least 90 % homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQ LQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAG corresponding to amino acids 1 - 342 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 342 of HSHCGI_PEA_3_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KSPVSEY corresponding to amino acids 343 - 349 of HSHCGI_PEA_3_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KSPVSEY in HSHCGI_PEA_3_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P9, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA_3_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TGEKTQ corresponding to amino acids 257 - 262 of HSHCGI_PEA_3_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TGEKTQ in HSHCGI_PEA_3_P9.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P12, comprising a first amino acid sequence being at least 90% homologous to MNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAPSHSLFRASSAG KVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEWLTAIRAWFCEVPSS corresponding to amino acids 312 - 425 of TM31_HUMAN, which also corresponds to amino acids 1 - 114 of HSHCGI_PEA_3_P12.
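The head-plus-tail layouts recited above follow a simple arithmetic pattern: a tail of length L appended after a head of H residues occupies positions H + 1 through H + L of the variant. The sketch below checks that pattern against the coordinates quoted in the text for several HSHCGI_PEA_3 variants. The table and function names are hypothetical and exist only for illustration; the head lengths, tail sequences and ranges are copied from the passages above.

```python
# (head length, tail sequence, tail range as stated in the text)
STATED = {
    "HSHCGI_PEA_3_P17": (218, "EIPLMPTVERSQEARCYP", (219, 236)),
    "HSHCGI_PEA_3_P19": (248, "NWRKNSVKQNQDTTPSQGA", (249, 267)),
    "HSHCGI_PEA_3_P4": (256, "YDGPPQMYFAY", (257, 267)),
    "HSHCGI_PEA_3_P6": (256, "PTPG", (257, 260)),
    "HSHCGI_PEA_3_P8": (342, "KSPVSEY", (343, 349)),
    "HSHCGI_PEA_3_P9": (256, "TGEKTQ", (257, 262)),
}


def tail_range(head_len, tail_seq):
    """1-based inclusive coordinates of a tail appended after head_len residues."""
    return (head_len + 1, head_len + len(tail_seq))


def check_all(table):
    """Return the names whose stated tail range disagrees with the arithmetic."""
    return [name for name, (head, tail, stated) in table.items()
            if tail_range(head, tail) != stated]


if __name__ == "__main__":
    print(check_all(STATED))  # an empty list means every stated range is consistent
```

The same bookkeeping applies to a variant that starts inside the known protein: residues 312 - 425 of the known protein span 425 - 312 + 1 = 114 positions, matching amino acids 1 - 114 of HSHCGI_PEA_3_P12.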
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P14, comprising a first amino acid sequence being at least 90% homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQ LQADRKKDENRFFKSMNKNDMKS corresponding to amino acids 1 - 319 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 319 of HSHCGI_PEA_3_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CK corresponding to amino acids 320 - 321 of
HSHCGI_PEA_3_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P16, comprising a first amino acid sequence being at least 90% homologous to
MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFT conesponding to amino acids 1 - 171 of TM31JHUMAN V1, which also conesponds to amino acids 1 - 171 of HSHCGIJPEAJJP16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence VRKTPSHDLWKQKHLCQSSWNPLLH conesponding to amino acids 172 - 196 of HSHCGI PEA 3 P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSHCGI_PEAJ_P16, comprising a polypeptide being at least 70%>, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90%> and most preferably at least about 95%> homologous to the sequence VRKTPSHDLWKQKHLCQSSWNPLLH in HSHCGIJPEA JP16. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGIJPEAJJP21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90%> and most preferably at least 95%> homologous to a polypeptide having the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ conesponding to amino acids 1 - 28 of HSHCGI JPE A JJP21, and a second amino acid sequence being at least 90 % homologous to
FLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDV FTDQVEHEKQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDL KKLVDSLKTKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITG SLKKFKDQLQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGR TTSGPPNHHSSAPSHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEEL HAALSEWLTAIRAWFCEVPSS conesponding to amino acids 112 - 425 of TM31_HUMAN, which also conesponds to amino acids 29 - 342 of HSHCGIJPEAJ P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSHCGI_PEAJ_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ of HSHCGI JPEAJ JP21. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSHCGIJPEA J JP22, comprising a first amino acid sequence being at least 90 % homologous to
MPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD RKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAP SHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEWLTAIRA WFCEVPSS conesponding to amino acids 241 - 425 of TM31JHUMAN, which also conesponds to amino acids 1 - 185 of HSHCGI PEAJJP22. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958JPEAJ JP5, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARWLAPQDW VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR QDVNITVATVP SWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQMLISEDSRFEVFKNGTLRΓNS VEVYDGTWYRCMSSTPAGSIEAQARVQVLEKLKFTPPPQPQQCMEFDKEATVPCSATG REKPTIKWERADGSSLPEWVTDNAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQL TVAVFITFKVEPERTTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRMHIFQ
NGSLVIHDVAPEDSGRYTCIAGNSCNIKHTEAPLYVV conesponding to amino acids 1 - 682 of PTK7_HUMAN_V4, which also conesponds to amino acids 1 - 682 of T51958JPEAJ JP5, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GMGWGGLCCTGSGGPRRLSPCTQPLCTEHGTEAIFVAAVGIRPSHHAAAQS corresponding to amino acids 683 - 733 of T51958_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GMGWGGLCCTGSGGPRRLSPCTQPLCTEHGTEAIFVAAVGIRPSHHAAAQS in T51958_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGPvRALLRCEVEAPGP VHVYWLLDGAPVQDTERPvFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVATVP SWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQMLISEDSRFEVFKNGTLRTNS VEVYDGTWYRCMSSTPAGSIEAQARVQVLEKLKFTPPPQPQQCMEFDKEATVPCSATG REKPTIKWERADGSSLPEWVTDNAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQL TVAVFITFKVEPERTTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRM conesponding to amino acids 1 - 641 of PTK7_HUMAN_V4, which also conesponds to amino acids 1 - 641 of T51958JPEAJ JP6, and a second amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence APW conesponding to amino acids 642 - 644 of T51958_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958 JΕAJ JP28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQPvRQDVNITVA conesponding to amino acids 1 - 409 of PTK7_HUMAN_V11, which also conesponds to amino acids 1 - 409 of T51958JPEAJ J>28, and a second amino acid sequence being at least 70%), optionally at least 80%>, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV conesponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958 PEA 1 P28. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958JPEAJ J>28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRPvLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA5, which also conesponds to amino acids 1 - 409 of T51958 JΕAJJP28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%) homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV conesponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958 JPEAJ J>28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90%> and most preferably at least about 95%> homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958JPEAJJP28. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958 JPEAJ JP28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPAPJ>PJ? PLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VFTvYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARWLAPQDW VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA6, which also corresponds to amino acids 1 - 409 of T51958JPEAJ JP28, and a second amino acid sequence being at least 70%), optionally at least 80%, preferably at least 85%, more preferably at least 90%) and most preferably at least 95% homologous to a polypeptide having the sequence
SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P28, comprising a first amino acid sequence being at least 90% homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA corresponding to amino acids 1 - 409 of Q8NFA7, which also corresponds to amino acids 1 - 409 of T51958_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
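The tiered homology percentages recited for the tail of T51958_PEA_1_P28 can be illustrated with a small calculation. The following is a minimal sketch only. It assumes that "homologous" can be approximated by ungapped percent identity between two equal-length sequences, which is a simplification and not necessarily the alignment procedure intended by this application. The function name `percent_identity` and the single-substitution `candidate` sequence are hypothetical and introduced purely for illustration.

```python
# Tail of T51958_PEA_1_P28 (amino acids 410 - 459), as recited in the text.
TAIL_P28 = "SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV"


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: no gaps or substitution matrix are considered; this is
    only a rough stand-in for a real alignment-based homology score.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical candidate: the tail with its first residue (S) replaced by A.
candidate = "A" + TAIL_P28[1:]

if __name__ == "__main__":
    print(len(TAIL_P28))                          # length implied by 410 - 459
    print(percent_identity(candidate, TAIL_P28))  # one mismatch in 50 residues
```

Under these assumptions, a candidate differing at one of the 50 tail positions would still satisfy the most stringent 95% tier.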
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P28, comprising a first amino acid sequence being at least 90% homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLR ATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA8, which also conesponds to amino acids 1 - 409 of T51958_PEA_1_P28, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence
SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P28, comprising a first amino acid sequence being at least 90% homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESD AGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of AAN04862, which also conesponds to amino acids 1 - 409 of T51958JPEAJJP28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV conesponding to amino acids 410 - 459 of T51958JPEAJ JP28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958JPEAJJP28. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958 JPEAJ JP30, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIK corresponding to amino acids 1 - 122 of PTK7_HUMAN_V13, which also corresponds to amino acids 1 - 122 of T51958_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CESQGGCAQSPCQTLND corresponding to amino acids 123 - 139 of T51958_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CESQGGCAQSPCQTLND in T51958_PEA_1_P30.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPR corresponding to amino acids 1 - 157 of PTK7JHUMANJ 3, which also corresponds to amino acids 1 - 157 of T51958_PEA_1_P34.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T51958_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIA corresponding to amino acids 1 - 220 of PTK7_HUMAN_V11, which also corresponds to amino acids 1 - 220 of
T51958_PEAJJP35, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEPGVGAEGMR conesponding to amino acids 221 - 231 of T51958_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T51958 J>EAJ JP35, comprising a polypeptide being at least 70%, optionally at least about 80%o, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95%> homologous to the sequence GEPGVGAEGMR in T51958_PEA_l_P35. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P2, comprising a first amino acid sequence being at least 90 % homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVS AGQS VACGWWAF APPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRPvTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFF TFLSSIPALTATLRCVRDPQRSFALGIQWTVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLVYQNSAMSRYILIMGLLYK conesponding to amino acids 1 - 675 of
S21C_HUMAN, which also corresponds to amino acids 1 - 675 of T23657_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FQLPEVHHSLNVLNRKFQKQTVHNL corresponding to amino acids 676 - 700 of T23657_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FQLPEVHHSLNVLNRKFQKQTVHNL in T23657_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P3, comprising a first amino acid sequence being at least 90% homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKBGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLVYQNSAMSRYILIMGLLYK conesponding to amino acids 1 - 675 of
S21C_HUMAN, which also conesponds to amino acids 1 - 675 of T23657_P3, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TIKHKAF conesponding to amino acids 676 - 682 of T23657_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657JP3, comprising a polypeptide being at least 70%), optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence TIKHKAF in T23657JP3. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657J , comprising a first amino acid sequence being at least 90 %> homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYNSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIWRIL conesponding to amino acids 1 - 625 of S21CJHUMAN, which also conesponds to amino acids 1 - 625 of T23657JP4, a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence 
GTVQCEEAMVSCTVCSLHKGM corresponding to amino acids 626 - 646 of T23657_P4, a third amino acid sequence being at least 90% homologous to GGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYK corresponding to amino acids 626 - 675 of S21C_HUMAN, which also corresponds to amino acids 647 - 696 of T23657_P4, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TIKHKAF corresponding to amino acids 697 - 703 of T23657_P4, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of T23657_P4, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P4.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TIKHKAF in T23657_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P5, comprising a first amino acid sequence being at least 90% homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSPvRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIF WIFF TFLSSIPALTATLR conesponding to amino acids 1 - 604 of S21C_HUMAN, which also conesponds to amino acids 1 - 604 of T23657JP5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657JP6, comprising a first amino acid sequence being at least 90 %> homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLEPvRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKV conesponding to amino acids 1 - 547 of S21 CJHUMAN, which also conesponds to amino acids 1 - 547 of T23657JP6, and a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMPLQGNALQL VRESPSFWFSYSL conesponding to amino acids 548 - 620 of T23657_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657JP6, comprising a polypeptide being at least 70%), optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMPLQGNALQL VRESPSFWFSYSL in T23657_P6. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657JP7, comprising a first amino acid sequence being at least 90 % homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSPvRASPGTPLSPGSLRS AAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQK conesponding to amino acids 1 - 546 of S21C HUMAN, which also conesponds to amino acids 1 - 546 of T23657JP7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCP conesponding to amino acids 547 - 549 of T23657JP7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657JP8, comprising a first amino acid sequence being at least 90 %> homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQK conesponding to amino acids 1 - 546 of S21CJHUMAN, which also conesponds to amino acids 1 - 546 of T23657JP8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence QHSCTNGNSTMCP conesponding to amino acids 547 - 559 of T23657_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657JP8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QHSCTNGNSTMCP in T23657JP8. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P10, comprising a first amino acid sequence being at least 90 % homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFLNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GR TELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIPvDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRIL conesponding to amino acids 1 - 625 of S21CJXUMAN, which also conesponds to amino acids 1 - 625 of T23657JP10, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence GTVQCEEAMVSCTVCSLHKGM conesponding to amino acids 626 - 646 of T23657JP10, and a third amino acid sequence being at least 90 % homologous to GGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAI ACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV conesponding to amino acids 626 - 722 of S21CJHUMAN, which also conesponds to amino acids 647 - 743 of T23657JP10, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 
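The coordinate bookkeeping recited for T23657_P10 (a first segment shared with S21C_HUMAN, a unique bridging segment, and a third shared segment) can be checked mechanically. The following is a minimal sketch under stated assumptions: the `Segment` type and the `is_contiguous` helper are hypothetical names introduced here for illustration, and the check covers only the numeric ranges given in the text, not the sequences themselves.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One segment of a chimeric polypeptide (hypothetical helper type)."""
    variant_start: int                       # first residue in the variant
    variant_end: int                         # last residue in the variant
    source_range: Optional[Tuple[int, int]]  # range in S21C_HUMAN, or None if unique


# Ranges as recited in the text for T23657_P10.
SEGMENTS_P10: List[Segment] = [
    Segment(1, 625, (1, 625)),      # first sequence, shared with S21C_HUMAN
    Segment(626, 646, None),        # second sequence, unique edge portion
    Segment(647, 743, (626, 722)),  # third sequence, shared with S21C_HUMAN
]

# Edge portion of T23657_P10 as recited in the text.
EDGE_P10 = "GTVQCEEAMVSCTVCSLHKGM"


def is_contiguous(segments: List[Segment]) -> bool:
    """Return True if segments tile the variant from residue 1 without gaps
    or overlaps, and every shared segment has equal length in both proteins."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start or seg.variant_end < seg.variant_start:
            return False
        if seg.source_range is not None:
            src_start, src_end = seg.source_range
            if src_end - src_start != seg.variant_end - seg.variant_start:
                return False
        expected_start = seg.variant_end + 1
    return True


if __name__ == "__main__":
    print(is_contiguous(SEGMENTS_P10))
    print(len(EDGE_P10) == 646 - 626 + 1)
```

The same check could be applied to the two-segment and four-segment variants recited elsewhere in this section by listing their ranges in the same form.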
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of T23657_P10, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P11, comprising a first amino acid sequence being at least 90% homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFLNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLF conesponding to amino acids 1 - 425 of S21 C_HUMAN, which also conesponds to amino acids 1 - 425 of T23657JP11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%o, more preferably at least 90%> and most preferably at least 95%. homologous to a polypeptide having the sequence ASCPKAT conesponding to amino acids 426 - 432 of T23657JP11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%) and most preferably at least about 95% homologous to the sequence ASCPKAT in T23657JP11. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657JP12, comprising a first amino acid sequence being at least 90 % homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRS AAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLVYQNSAMSRYILIMGLLYK conesponding to amino acids 1 - 675 of S21CJHUMAN, which also conesponds to amino acids 1 - 675 of T23657JP12, and a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence EEENEFRRL conesponding to amino acids 676 - 684 of T23657JP12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657JP12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%) and most preferably at least about 95% homologous to the sequence EEENEFRRL in T23657JP12. 
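The recurring phrase "at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%" defines a ladder of progressively stricter thresholds. The following is a minimal sketch of how a measured percentage could be mapped to the strictest tier it satisfies. The function name `strictest_tier` is hypothetical, and how the percentage itself is measured is left open here.

```python
from typing import Optional

# Homology tiers recited in the text, from most to least stringent.
TIERS = (95, 90, 85, 80, 70)


def strictest_tier(percent: float) -> Optional[int]:
    """Return the highest recited tier that `percent` meets, or None if it
    falls below the 70% floor."""
    for tier in TIERS:
        if percent >= tier:
            return tier
    return None


if __name__ == "__main__":
    for value in (98.0, 87.5, 70.0, 69.9):
        print(value, strictest_tier(value))
```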
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC corresponding to amino acids 1 - 30 of T23657_P16, and a second amino acid sequence being at least 90% homologous to SLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVY RDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALTATLRCVRDPQ RSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILI MGLLYKVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV corresponding to amino acids 491 - 722 of S21C_HUMAN, which also corresponds to amino acids 31 - 262 of T23657_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T23657_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC of T23657_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P17, comprising a first amino acid sequence being at least 90% homologous to
MYFSLCHAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLV FIFVVIFFTFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLL WQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCL PSQSSAPDSATDSQLQSSV corresponding to amino acids 525 - 722 of S21C_HUMAN, which also corresponds to amino acids 1 - 198 of T23657_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWTAR corresponding to amino acids 1 - 5 of T23657_P21, and a second amino acid sequence being at least 90% homologous to
RCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSA MSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV conesponding to amino acids 604 - 722 of S21 CJHUMAN, which also conesponds to amino acids 6 - 124 of T23657_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T23657JP21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWTAR of T23657JP21. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T23657JP23, comprising a first amino acid sequence being at least 90 % homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFLNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRPvTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKV conesponding to amino acids 1 - 547 of S21CJΪUMAN, which also conesponds to amino acids 1 - 547 of T23657_P23, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMHCREMHFNL SEKAPPSGFHIRCNFLYIPQQHSCTNGNSTVSWGRVCACPELSLQHPEAELCRS conesponding to amino acids 548 - 661 of T23657JP23, wherein said first amino 
acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T23657JP23, comprising a polypeptide being at least 70%), optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMHCREMHFNL SEKAPPSGFHIRCNFLYIPQQHSCTNGNSTVSWGRVCACPELSLQHPEAELCRS in T23657JP23. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 90 % homologous to
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLLNCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPIN IQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWF NQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYA QMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPWTLQKGYTIHWDQT APAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKV EQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIKALΓPKNAGVSDCT ATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFFHLWNDFAYIEVD GKKYPSSEDGIQNWIDGNQGRVVSHTSFRNSILQGIPWQLFNYVATIPDNSIVLMASKG RYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKGSFRPΠVNTLDTEDHKAKLFQVVPI PVVKKKKL conesponding to amino acids 126 - 1013 of Q9ULM1, which also corresponds to amino acids 1 - 888 of R30650JPEAJ2JP4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650JPEA JP4, comprising a first amino acid sequence being at least 90 % homologous to
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPIN IQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWF NQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND conesponding to amino acids 474 - 977 of Q8WUJ3, which also conesponds to amino acids 1 - 504 of R30650JPEA _2JP4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQWVIDGNQGRWSHTSFRNSIL QGIPWQLFNYVATffDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVNPIPVVKKKKL conesponding to amino acids 505 - 888 of R30650JPEAJ P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
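The coordinate correspondences recited above (for example, amino acids 474 - 977 of Q8WUJ3 corresponding to amino acids 1 - 504 of R30650_PEA_2_P4) amount to a fixed offset between the numbering of the variant and the numbering of the known protein. The following is a minimal Python sketch of that arithmetic only; the class name `MappedSegment` and its method names are hypothetical and are not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedSegment:
    """A run of variant residues that matches a run of a known protein.

    All positions are 1-based and inclusive, as in the text.
    """
    ref_start: int  # first residue in the known protein
    ref_end: int    # last residue in the known protein
    var_start: int  # first residue in the variant
    var_end: int    # last residue in the variant

    def __post_init__(self) -> None:
        # Both ranges must cover the same number of residues.
        if self.ref_end - self.ref_start != self.var_end - self.var_start:
            raise ValueError("reference and variant ranges differ in length")

    @property
    def length(self) -> int:
        return self.var_end - self.var_start + 1

    def to_reference(self, var_pos: int) -> int:
        """Convert a variant position to the matching reference position."""
        if not self.var_start <= var_pos <= self.var_end:
            raise ValueError(f"position {var_pos} is outside the segment")
        return self.ref_start + (var_pos - self.var_start)


# Amino acids 474 - 977 of Q8WUJ3 <-> amino acids 1 - 504 of R30650_PEA_2_P4.
q8wuj3_part = MappedSegment(ref_start=474, ref_end=977, var_start=1, var_end=504)
```

With these values, `q8wuj3_part.to_reference(504)` gives 977, the last Q8WUJ3 residue shared with the variant; the remaining residues 505 - 888 of the variant form the tail that has no counterpart in Q8WUJ3.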
According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEA_2_P4, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLLNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL in R30650_PEA J_P4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650JPEAJ2JP4, comprising a first amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90%) and most preferably at least 95% homologous to a polypeptide having the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD conesponding to amino acids 1 - 91 of R30650_PEA_2_P4, and a second amino acid sequence being at least 90 %> homologous to VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLLKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL INCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND NWLVRHPDCLNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLLNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERTKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF 
LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQWVIDGNQGRWSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYNSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL conesponding to amino acids 8 - 804 of Q9ΝPΝ9, which also conesponds to amino acids 92 - 888 of R30650JPEA_2JP4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650JPEA_2JP4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95%> homologous to the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD ofR30650_PEAJ_P4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEEPvNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLLNCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH conesponding to amino acids 1 - 389 of R30650JPEA_2JP4, and a second amino acid sequence being at least 90 %> homologous to SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPWTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH 
NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFC SMKGCERjΕIKALIPKNAGVSDCTATAYPKFTERAVNDWMPKKLFGSQLKTKDHFLEV KMESSKQHFFHLWΝDFAYIEVDGKKYPSSEDGIQWVIDGΝQGRVVSHTSFRΝSILQGI PWQLFΝYVATIPDΝSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKG SFRPIWVTLDTEDHKAKIFQV 1TVVKKKKL conesponding to amino acids 2 - 500 of Q9H1K5, which also conesponds to amino acids 390 - 888 of R30650_PEA_2_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650_PEA JP4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%o and most preferably at least about 95%> homologous to the sequence MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH of R30650 »EA_2 JP4. 
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650JPEA JP5, comprising a first amino acid sequence being at least 90 %> homologous to MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS EHIPLGKFYNNP^HSNYPAGMIIDNGVKTTEASAIΦKRPFLSIISARYSPHQDADPLKPR EPAIIPVHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPTNIQNCTFRKF VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGD KTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCLNVPDWRGAICSGCYAQMYIQAYK TSNLR KΠKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWL ΓNFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSH YYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERΓKIKALIPKNAGVSDCTATAYPKFTE
RAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSED GIQVVVIDGNQGRVVSHTSFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPW
TRVLEKLGADRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 18 - 1013 of Q9ULM1, which also corresponds to amino acids 1 - 996 of R30650_PEA_2_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 90% homologous to
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE
GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDS YPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPΓNIQNCTFRKF VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGD
KTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 366 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 612 of R30650_PEA_2_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF
KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 613 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL in R30650_PEA_2_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 199 of R30650_PEA_2_P5, and a second amino acid sequence being at least 90% homologous to
VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDWGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL ΓNCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAITRHFIAYKNQDHGAWLRGGDV WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPWTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQWVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSRVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF
KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL conesponding to amino acids 8 - 804 of Q9NPN9, which also conesponds to amino acids 200 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKC YPYRNHICNFFDFDTFGGHIKF ALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD ofR30650_PEA_2_P5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDH conesponding to amino acids 1 - 497 of R30650_PEA_2_P5, and a second amino acid sequence being at least 90 % homologous to
SGRTLPIGQNFPIRGIQLYDGP iQNCTFPJ FVALEGRHTSALAFR NNAWQSCPFINNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPWTLQKGYTIHWDQTAPAELAIWLLNFNKGDWIRVGLCYPRGTTFSILSDVH NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFC SMKGCEiyKIKALIPKNAGVSDCTATAYPKFTERAVVDWMPKiaFGSQLKTKDFlTLEV KMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGI PWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKG SFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL conesponding to amino acids 2 - 500 of Q9H1K5, which also conesponds to amino acids 498 - 996 of R30650_PEAJ_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650JPEAJ JP5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDH ofR30650_PEAJ_P5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%>, optionally at least 80%>, preferably at least 85%>, more preferably at least 90%) and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVINHVIDPKSGTVIHSDRFDTYRSKKESER VQYLNAVPDGPvILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK conesponding to amino acids 1 - 348 of R30650JPEA_2J?8, a second amino acid sequence being at least 90 % homologous to AHPGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKP VRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSI HHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPS DRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIF HHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLS IISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTL ASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIR GIQLYDGPINIQNCTFRKFVALEGRHTS ALAFRLNNAWQSCPHNNVTGIAFEDVPITSRV 
FFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCTNVPDWR GAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKG YTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFV RTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 1 - 788 of Q9ULM1, which also conesponds to amino acids 349 - 1136 of R30650JPEAJJP8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90%> and most preferably at least 95%> homologous to a polypeptide having the sequence KQRTISWR conesponding to amino acids 1137 - 1144 of R30650JPEA_2JP8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650_PEA_2_P8, comprising a polypeptide being at least 70%>, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVrVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK of R30650 PEA 2 P8. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650JΕAJ2JP8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence KQRTISWR in R30650JPEA_2_P8. 
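The graded thresholds recited throughout (at least 70%, 80%, 85%, 90% and 95% homologous) can be illustrated with the eight-residue tail KQRTISWR. The sketch below is a simplification and rests on two assumptions that are not taken from this passage: it counts position-by-position identity over two sequences of equal length, with no gaps, whereas homology would ordinarily be determined with alignment software; and the function names and the candidate sequence are hypothetical.

```python
from typing import List

# Percentage tiers recited in the text.
TIERS = (70, 80, 85, 90, 95)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def tiers_met(pct: float) -> List[int]:
    """Return every recited tier that the given percentage satisfies."""
    return [tier for tier in TIERS if pct >= tier]


TAIL = "KQRTISWR"       # tail sequence as given in the text
candidate = "KQRTISWK"  # hypothetical candidate with one substitution
pct = percent_identity(candidate, TAIL)
```

For a sequence this short, a single substitution leaves 7 of 8 residues identical, i.e. 87.5%, so the candidate meets the 70%, 80% and 85% tiers but not the 90% or 95% tiers.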
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 90% homologous to
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTKHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRPLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGI QLYDGPΓNIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPFΓNNVTGIAFEDVPITSRVFF GEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND conesponding to amino acids 1 - 977 of Q8WUJ3, which also conesponds to amino acids 1 - 977 of R30650JPEAJJP8, and a second amino acid sequence being at least 70%o, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NWLVRHPDCINWDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPWTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGKQRTISWR conesponding to amino acids 978 - 1144 of R30650 JPEAJ- JP8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGKQRTISWR in
R30650_PEA_2_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTΠLYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIΓVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD conesponding to amino acids 1 - 564 of R30650JPEAJJP8, a second amino acid sequence being at least 90 % homologous to
VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDWGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL RNCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND NWLVRHPDCΓNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS
DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 8 - 579 of Q9NPN9, which also corresponds to amino acids 565 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTI1 YGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRΉPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIΓVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD OFR30650JPEAJ_P8. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEAJ_P8, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90%> and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEAJ_P8. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90%) and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNI MGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR 
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLΓNCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNΎRAGMΠDNGVKTTEASAKDKRPFLSΠS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS
GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH corresponding to amino acids 1 - 862 of R30650_PEA_2_P8, a second amino acid sequence being at least 90% homologous to
SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH
NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 2 - 275 of Q9H1K5, which also conesponds to amino acids 863 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR conesponding to amino acids 1137 - 1144 of R30650JPEAJJ>8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650 PEAJ ?8, comprising a polypeptide being at least 70%o, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERS WGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILS VA V NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLiNCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH of R30650JPEA_2JP8. 
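The requirement that the first, second and third amino acid sequences be contiguous and in a sequential order can be checked mechanically from the recited coordinates: each segment must begin one residue after the previous one ends, and a segment mapped to a known protein must span the same number of residues in both numberings. The following is a minimal sketch of that check for the three-part description above (amino acids 1 - 862, 863 - 1136 and 1137 - 1144 of R30650_PEA_2_P8, the middle part corresponding to amino acids 2 - 275 of Q9H1K5); the helper names are hypothetical.

```python
from typing import Sequence, Tuple

# A 1-based, inclusive (start, end) residue range.
Range = Tuple[int, int]


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def is_contiguous(ranges: Sequence[Range]) -> bool:
    """True if the ranges start at residue 1 and abut with no gap or overlap."""
    if not ranges or ranges[0][0] != 1:
        return False
    if any(start > end for start, end in ranges):
        return False
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ranges, ranges[1:]))


# Head, Q9H1K5-derived middle part, and tail of R30650_PEA_2_P8.
p8_segments = [(1, 862), (863, 1136), (1137, 1144)]
# Range of Q9H1K5 to which the middle part corresponds.
q9h1k5_range = (2, 275)

total_length = sum(span(r) for r in p8_segments)
```

Here the middle part spans 274 residues in both numberings and the three parts together account for all 1144 residues of the variant.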
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK corresponding to amino acids 1 - 348 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90% homologous to
AHPGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKP VRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSI HHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPS DRDSKMCKMITEDSYPGYΓPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIF HHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLS IISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTL ASGGTFPYDDGSKQEΓKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIR GIQLYDGPΓNIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRV FFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWR GAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKG YTIHWDQTAPAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFV
RTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 1 - 788 of Q9ULM1, which also conesponds to amino acids 349 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650JPEA J_P15, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVrVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK of R30650_PEA_2_P15. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650jPEAJJP15, comprising a first amino acid sequence being at least 90 % homologous to
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVRVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIΓVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH
TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGI QLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFF
GEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 1 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 977 of R30650_PEA_2_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least
85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS
DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 978 - 1136 of R30650_PEAJ_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650JPEA_J_P15, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVRHPDCLNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG in R30650JPEA_2J»15. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEAJ_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVRVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIΓVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 564 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90% homologous to
VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL ΓNCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG LDHSGRTLPIGQNFPIRGIQLYDGPΓNIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 8 - 579 of Q9NPN9, which also conesponds to amino acids 565 - 1136 of R30650JPEAJJP15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650JPEA _2 JP 15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD OF R30650JPEA_2JP15. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650JPEA_2JP15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%>, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTPVHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVΓVΉVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMΠDNGVKTTEASAKDKRPFLSΠS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH conesponding to amino acids 1 - 862 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90 %> homologous to
SGRTLPIGQNFPIRGIQLYDGP1NIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPWTLQKGYTIHWDQTAPAELAIWLLNFNKGDWIRVGLCYPRGTTFSILSDVH NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG conesponding to amino acids 2 - 275 of Q9H1K5, which also conesponds to amino acids 863 - 1136 of R30650JPEA_2JP15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R30650_PEA_2 S, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95%> homologous to the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNΠVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLΓNCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH OF
R30650_PEA_2_P15. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R30650_PEA_2_P17, comprising a first amino acid sequence being at least 90% homologous to
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQ corresponding to amino acids 1 - 321 of Q8WUJ3, which also corresponds to amino acids 1 - 321 of R30650_PEA_2_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEEFQTIW corresponding to amino acids 322 - 329 of R30650_PEA_2_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R30650_PEA_2_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEEFQTIW in R30650_PEA_2_P17. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78035_P4, comprising a first amino acid sequence being at least 90% homologous to
MPGLMRMRERYSASKPLKGARIAGCLHMTVETAVLIETLVTLGAEVQWSSCNIFSTQD HAAAAIAKAGIPVYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYP QLLPGIRGISEETTTGVHNLYKMMANGILKVPAINVNDSVTKSKFDNLYGCRESLIDGIK RATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEIDPLNALQAAMEGYEVTT MDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVN IKPQVDRYRLKNGRPJILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHPDK YPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY conesponding to amino acids 29 - 432 of S AHH_HUMAN, which also conesponds to amino acids 1 - 404 of M78035JP4. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78035 _JP6, comprising a first amino acid sequence being at least 90 % homologous to MILDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVPAINVNDSVT KSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEI DPLNALQAAMEGYEVTTMDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCNIGHFD VEIDVKWLNENAVEKVNIKPQVDRYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNS FTNQVMAQIELWTHPDKYPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLG MSCDGPFKPDHYRY conesponding to amino acids 127 - 432 of SAHH_HUMAN, which also conesponds to amino acids 1 - 306 of M78035JP6. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M78035_P8, comprising a first amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%>, more preferably at least 90%) and most preferably at least 95%> homologous to a polypeptide having the sequence MSDKLPYKV conesponding to amino acids 1 - 9 of M78035JP8, and a second amino acid sequence being at least 90 %> homologous to
VYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYPQLLPGIRGISEET TTGVFINLYiαVlMANGILKVPAINVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKV AWAGYGDVGKGCAQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFV TTTGCIDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVDRYRLKN GRPJILLAEGRXVNLGCAMGHPSFVMSNSFTNQVMAQIELWTFiPDKYPVGVFIFLPKKL DEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY corresponding to amino acids 99 - 432 of S AHHJHUMAN, which also conesponds to amino acids 10 - 343 of M78035JP8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M78035JP8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%o, more preferably at least about 90%) and most preferably at least about 95% homologous to the sequence MSDKLPYKV of M78035JP8. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEAJPEAJ JM, comprising a first amino acid sequence being at least 90 % homologous to
MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVL conesponding to amino acids 1 - 234 of CEA5 JHUMAN, which also conesponds to amino acids 1 - 234 of HUMCEAJPEAJ JM, and a second amino acid sequence being at least 70%>, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence
CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKN RRGGAASVLGGSGSTPYDGRNR corresponding to amino acids 235 - 315 of HUMCEA_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKN RRGGAASVLGGSGSTPYDGRNR in HUMCEA_PEA_1_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to
MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDA PTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTC QAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWV NNQSLPVSPRLQLSNDNRTLTLLS VTRNDVGP YECGIQNELS VDHSDPVILNVLYGPDD PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQ ANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVN GQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTP IISPPDSSYLSGANLNLSCHSASNPSPQYSWRTNGIPQQHTQVLFIAKITPNNNGTYACFV SNLATGRNNSIVKSITVS conesponding to amino acids 1 - 675 of CEA5 JHUMAN, which also conesponds to amino acids 1 - 675 of HUMCEA_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%>, preferably at least 85%>, more preferably at least 90% and most preferably at least 95%> homologous to a polypeptide having the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS conesponding to amino acids 676 - 719 of HUMCEA_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%>, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS in HUMCEAJPEAJ JP5. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEAJPEAJ JP7, comprising a first amino acid sequence being at least 90 % homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIΓYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDA PTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTC QAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWV NNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDD PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQ ANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVN GQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTP IISPPDSSYLSGANLNLSCHSASNPSPQYSWPJNGIPQQHTQVLFIAKITPNNNGTYACFV SNLATGRNNSIVKSITV conesponding to amino acids 1 - 674 of CEA5 JHUMAN, which also conesponds to amino acids 1 - 674 of HUMCEA_PEA_1_P7, and a second amino acid sequence being at least 90 % homologous to SAGATVGIMIGVLVGVALI conesponding to amino acids 684 - 702 of CEA5 JHUMAN, which also conesponds to amino acids 675 - 693 of HUMCEAJPEAJ JP7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
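The coordinate correspondence stated above for HUMCEA_PEA_1_P7 (amino acids 1 - 674 and 684 - 702 of CEA5_HUMAN corresponding to amino acids 1 - 674 and 675 - 693 of the variant) can be checked with a minimal Python sketch. The function name `map_segments` is hypothetical and is not part of the disclosure; the sketch assumes only that the retained segments of the known protein are joined contiguously and in sequential order, with 1-based inclusive numbering.

```python
def map_segments(parent_segments):
    """Map retained parent segments onto variant coordinates.

    parent_segments: list of (start, end) pairs, 1-based and inclusive,
    given in the order in which they appear in the variant.
    Returns a list of (variant_start, variant_end) pairs, assuming the
    segments are concatenated with no residues in between.
    """
    mapped = []
    next_start = 1  # variant numbering begins at amino acid 1
    for start, end in parent_segments:
        if end < start:
            raise ValueError("segment end precedes segment start")
        length = end - start + 1
        mapped.append((next_start, next_start + length - 1))
        next_start += length
    return mapped


# Segments of CEA5_HUMAN retained in HUMCEA_PEA_1_P7, as stated in the text.
print(map_segments([(1, 674), (684, 702)]))
```

Under these assumptions the second segment, which is 19 amino acids long, lands on positions 675 - 693 of the variant, in agreement with the numbering given in the text.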
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VS, having a structure as follows: a sequence starting from any of amino acid numbers 674-x to 674; and ending at any of amino acid numbers 675+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to
MESPSAPPHRWCIPWQP LLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDS conesponding to amino acids 1 - 228 of CEA5 JHUMAN, which also conesponds to amino acids 1 - 228 of HUMCEAJΕAJ JP10, and a second amino acid sequence being at least 90 % homologous to VILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNI TEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEA QNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPV TLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRTNGIPQQHTQVLFIAKITP NNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI conesponding to amino acids 407 - 702 of CEA5_HUMAN, which also conesponds to amino acids 229 - 524 of HUMCEAJΕAJ J* 10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEAJΕAJ JP 10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SV, having a structure as follows: a sequence starting from any of amino acid numbers 228-x to 228; and ending at any of amino acid numbers 229+ ((n-2) - x), in which x varies from 0 to n-2. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEAJPEAJ JP19, comprising a first amino acid sequence being at least 90 % homologous to
MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILN conesponding to amino acids 1 - 232 of CEA5_HUMAN, which also conesponds to amino acids 1 - 232 of HUMCEAJPEAJ JP 19, and a second amino acid sequence being at least 90 % homologous to
VLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 589 - 702 of CEA5_HUMAN, which also corresponds to amino acids 233 - 346 of HUMCEA_PEA_1_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P19, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 232-x to 232; and ending at any of amino acid numbers 233+ ((n-2) - x), in which x varies from 0 to n-2.
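The edge-portion structure recited above (a sequence of length n starting at amino acid 232-x and ending at amino acid 233+((n-2)-x), with x varying from 0 to n-2) can be illustrated with a short Python sketch. The function name `edge_portion_windows` and the optional `total_length` parameter are hypothetical and introduced only for illustration; the sketch simply enumerates the start and end positions that the formula produces for a given junction and length.

```python
def edge_portion_windows(last_of_first, n, total_length=None):
    """Enumerate (start, end) windows of length n that span a junction.

    last_of_first: number of the last amino acid of the first segment
                   (232 for HUMCEA_PEA_1_P19 in the text above).
    n:             window length; must be at least 2 so that both
                   junction residues are included.
    total_length:  optional length of the full polypeptide; windows that
                   would run past either end are skipped (illustrative
                   assumption, not recited in the text).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = last_of_first - x
        end = last_of_first + 1 + ((n - 2) - x)
        if start < 1:
            continue
        if total_length is not None and end > total_length:
            continue
        windows.append((start, end))
    return windows


# All 10-residue windows across the 232/233 junction of a 346-residue variant.
for start, end in edge_portion_windows(232, 10, total_length=346):
    print(start, end)
```

Every window produced this way has length n and contains both amino acid 232 and amino acid 233, which is why the two junction residues (here NV) are always present.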
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCEAJΕAJ JP20, comprising a first amino acid sequence being at least 90 % homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYP conesponding to amino acids 1 - 142 of CEA5_HUMAN, which also conesponds to amino acids 1 - 142 of HUMCEA_PEA_1_P20, and a second amino acid sequence being at least 90 % homologous to ELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLT LFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHS ASNPSPQYSWPJNGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASG TSPGLSAGATVGIMIGVLVGVALI conesponding to amino acids 499 - 702 of CEA5 JHUMAN, which also conesponds to amino acids 143 - 346 of HUMCEAJPEAJ JP20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142-x to 142; and ending at any of amino acid numbers 143+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MPTSETESVNTENVSGEGENRGCCGSL conesponding to amino acids 466 - 492 of CCAD_HUMAN_V3, which also conesponds to amino acids 1 - 27 of HUMCACHl AJPEAJ JP7, a second amino acid sequence being at least 70%, optionally at least 80%), preferably at least 85%, more preferably at least 90%> and most preferably at least 95%> homologous to a polypeptide having the sequence WCWWRRRGAAKAGPSGCRRWG conesponding to amino acids 28 - 48 of HUMCACHl AJPEAJ JP7, and a third amino acid sequence being at least 90 % homologous to QAISKSKLSRRWRRWNRFNRRRCRAAVKSVTFYWLVIVLVFLNTLTISSEHYNQPDWL TQIQDIANKVLLALFTCEMLVKMYSLGLQAYFVSLFNRFDCFVVCGGITETILVELEIMS PLGISVFRCVRLLRIFKVTRHWTSLSNLVASLLNSMKSIASLLLLLFLFΠΓFSLLGMQLFG GKFNFDETQTKRSTFDNFPQALLTVFQILTGEDWNAVMYDGIMAYGGPSSSGMIVCIYF
IILFICGNYILLNVFLAIAVDNLADAESLNTAQKEEAEEKERKKIARKESLENKKNNKPE VNQIANSDNKVTIDDYREEDEDKDPYPPCDVPVGEEEEEEEEDEPEVPAGPRPRRISELN MKEKIAPIPEGSAFFILSKTNPIRVGCHKLINHHIFTNLILVFIMLSSAALAAEDPIRSHSFR NTILGYFDYAFTAIFTVEILLKMTTFGAFLHKGAFCRNYFNLLDMLVVGVSLVSFGIQSS
AISWKΓLRVLRVLRPLRAΓNRAKGLKHWQCVFVAIRTIGNIMIVTTLLQFMFACIGVQ LFKGKFYRCTDEAKSNPEECRGLFILYKDGDVDSPVVRERIWQNSDFNFDNVLSAMMA LFTVSTFEGWPALLYKAIDSNGENIGPIYNHRVEISIFFIIYIIIVAFFMMNIFVGFVIVTFQE QGEKEYKNCELDKNQRQCVEYALKARPLRRYIPKNPYQYKFWYWNSSPFEYMMFVL IMLNTLCLAMQHYEQSKMFNDAMDILNMVFTGVFTVEMVLKVIAFKPKGYFSDAWNT FDSLIVIGSIIDVAI..SEADPTESENWWTATPGNSEESNPJSITFFRLFRVMRLVKLLSRGE GIRTLLWTFIKSFQALPYVALLIAMLFFIYAVIGMQMFGKVAMRDNNQΓNRNNNFQTFP QAVLLLFRCATGEAWQEIMLACLPGKLCDPESDYNPGEEYTCGSNFAIVYFISFYMLCA FLIΓNLFVAVIMDNFDYLTRDWSILGPHHLDEFKRIWSEYDPEAKGRIKHLDWTLLRRI QPPLGFGKLCPHRVACKRLVAMNMPLNSDGTVMFNATLFALVRTALKIKTEGNLEQA NEELRAVIKKIWKKTSMKLLDQVVPPAGDDEVTVGKFYATFLIQDYFPvKFKKRKEQGL VGKYPAKNTTIALQAGLRTLHDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLG NHVNHVNSDRRDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNSI GKQVPTSTNANLNNANMSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQRRSSVKRT RYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSRQN YGYYSRYPGRNIDSERPRGYHHPQGFLEDDDSPVCYDSRRSPRRRLLPPTPASHRRSSFN FECLRRQSSQEEVPSSPIFPHRTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATP PATPPYRDWTPCYTPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVP SSFRNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAA STLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL conesponding to amino acids 494 - 2161 of CCAD JHUMAN J 3, which also conesponds to amino acids 49 - 1716 of HUMCACHIAJPEAJ _P7, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMCACH1A_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WCWWRRRGAAKAGPSGCRRWG, conesponding to HUMCACH1A_PEA_1_P7. 
According to preferred embodiments of the present invention, there is provided a bridge portion of HUMCACH1A_PEA_1_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise L, having a structure as follows (numbering according to HUMCACH1A_PEA_1_P7): a sequence starting from any of amino acid numbers 492-x to 492; and ending at any of amino acid numbers 28 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P13, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRPRCLLRRTAHPPHSAPAPAPARSKCLGSWSNVLIRESSVWSLRL corresponding to amino acids 1 - 47 of HUMCACH1A_PEA_1_P13, and a second amino acid sequence being at least 90% homologous to
DDEVTVGKFYATFLIQDYFRKFKKRKEQGLVGKYPAKNTTIALQAGLRTLHDIGPEIRR AISCDLQDDEPEETKREEEDDVFKRNGALLGNHVNHVNSDRRDSLQQTNTTHRPLHVQ RPSIPPASDTEKPLFPPAGNSVCHNHHNHNSIGKQVPTSTNANLNNANMSKAAHGKRPS IGNLEHVSENGHHSSHKHDREPQRRSSVKRTRYYETYIRSDSGDEQLPTICREDPEIHGY FRDPHCLGEQEYFSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLE DDDSPVCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRTALPLHL MQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLIQVEQSEALDQ VNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSLVEAVLISEGL GRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVRPRANGDVGPLSHRQDYE LQDFGPGYSDEEPDPGRDEEDLADEMICITTL conesponding to amino acids 1598 - 2161 of CCAD JHUMAN, which also conesponds to amino acids 48 - 611 of HUMCACHIAJPEAJ J* 13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HUMCACHl AJPEA J JP 13, comprising a polypeptide being at least 70%>, optionally at least about 80%>, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRPRCLLRRTAHPPHSAPAPAPARSKCLGSWSNVLIRESSVWSLRL of HUMCACH1A_PEA_1_P13. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to
MSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQRRSSVKRTRYYETYIRSDSGDEQLP TICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERP RGYHHPQGFLEDDDSPVCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPI FPHRTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLI QVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSL VEAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVRPRANGDVG PLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL conesponding to amino acids 1763 - 2161 of CCADJHUMAN, which also conesponds to amino acids 1 - 399 of HUMCACHIAJPEAJ JP14. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMCACHl A_PEA_1_P17, comprising a first amino acid sequence being at least 90 % homologous to MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTVLSWQ AAIDAARQAKAAQTMSTSAPPP VGSLSQRKRQQYAKSKKQGNSSNSRPARALFCLSLN NPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFPEDDSNSTNHNLEKVEYAFLIIFTVET FLKIIAYGLLLHPNAYVRNGWNLLDFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDV KALRAFRVLRPLRLVSGVPSLQWLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMH KTCFFADSDIVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNFAFAMLT QCITMEGWTDVLYWMNDAMGFELPWVYFVSLVIFGSFFVLNLVLGVLSG conesponding to amino acids 1 - 407 of CCADJHUMAN, which also conesponds to amino acids 1 - 407 of HUMCACHIAJPEAJ JP17, and a second amino acid sequence being at least 70%), optionally at least 80%>, preferably at least 85%>, more preferably at least 90%> and most preferably at least 95% homologous to a polypeptide having the sequence HGGSRL conesponding to amino acids 408 - 413 of HUMCACHIAJPEAJ JP17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCACH1A_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HGGSRL in HUMCACH1A_PEA_1_P17.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA583399_PEA_1_P2, comprising a first amino acid sequence being at least 90 % homologous to MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRERNKGDKG AQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSARWGVAHLGEAFRV GVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPD YERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGP TMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 59 - 313 of MYEO_HUMAN_V1, which also corresponds to amino acids 1 - 255 of AA583399_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA583399_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG corresponding to amino acids 1 - 27 of AA583399_PEA_1_P4, and a second amino acid sequence being at least 90 % homologous to RNSGSQSARWGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSP GRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTV VTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 150 - 313 of MYEO_HUMAN_V1, which also corresponds to amino acids 28 - 191 of AA583399_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of AA583399_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG of AA583399_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA583399_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to
MEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCS TWGLPLRVAGSWLTWTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLL LLIIILTC corresponding to amino acids 192 - 313 of MYEO_HUMAN_V2, which also corresponds to amino acids 1 - 122 of AA583399_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for AA583399_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to
MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRERNKGDKG AQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSARAVGVAHLGEAFRV GVEQAISSCPEEVHGRHGLSMEIMWAQMDVALRSPGRGLLAGAGALCMTLAESSCPD YERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGRWRMGVRRTGQVGPT MHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 59 - 313 of MYEO_HUMAN_V3, which also corresponds to amino acids 1 - 255 of AA583399_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein. Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
According to preferred embodiments of the present invention, there is provided a kit for detecting colon cancer, comprising a kit detecting overexpression of a splice variant as described herein. Optionally the kit comprises a NAT-based technology. Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein. The kit optionally comprises an antibody as described herein. The kit optionally further comprises at least one reagent for performing an ELISA or a Western blot.
There is optionally provided a method for detecting colon cancer, comprising detecting overexpression of a splice variant as described herein. Detecting overexpression is optionally performed with a NAT-based technology. Optionally detecting overexpression is performed with an immunoassay, optionally wherein said immunoassay comprises an antibody as described herein.
A biomarker capable of detecting colon cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof. A method for screening for colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein. A method for diagnosing colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein. A method for monitoring disease progression and/or treatment efficacy and/or relapse of colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein. A method of selecting a therapy for colon cancer, comprising detecting colon cancer cells with a biomarker or an antibody or a method or assay as described herein and selecting a therapy according to said detection.
According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto. Unless otherwise noted, all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described). All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins).
It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably. Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic summary of the cancer biomarker selection engine and the wet validation stages.
Figure 2 is a schematic illustration depicting the grouping of transcripts of a given cluster based on the presence or absence of unique sequence regions.
Figure 3 is a schematic summary of quantitative real-time PCR analysis.
Figure 4 is a schematic presentation of the oligonucleotide-based microarray fabrication.
Figure 5 is a schematic summary of the oligonucleotide-based microarray experimental flow.
Figure 6 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster M85491.
Figure 7 is a histogram showing expression of the Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in normal and cancerous colon tissues.
Figure 8 is a histogram showing the expression of M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24, in different normal tissues.
Figure 9 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster T10888, demonstrating overexpression in colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.
Figure 10 is a histogram showing expression of the CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 (T10888) transcripts, which are detectable by amplicon as depicted in sequence name T10888 junc11-17, in normal and cancerous colon tissues.
Figure 11 is a histogram showing the expression of T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888junc11-17, in different normal tissues.
Figure 12 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H14624.
Figure 13 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H53626, demonstrating overexpression in the epithelial malignant tumors, a mixture of malignant tumors from different tissues and myosarcoma.
Figure 14 is a histogram showing expression of the above-indicated Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 junc24-27F1R3, in normal and cancerous colon tissues.
Figure 15 shows the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626seg25, in normal and cancerous colon tissues.
Figure 16 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSENA78, demonstrating overexpression in the epithelial malignant tumors and lung malignant tumors.
Figure 17 is a histogram, showing Cancer and cell-line vs. normal tissue expression for the Cluster HUMODCA, demonstrating overexpression in the brain malignant tumors, colorectal cancer, epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 18 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R00299, demonstrating overexpression in the lung malignant tumors.
Figure 19 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z44808, demonstrating overexpression in the colorectal cancer, lung cancer and pancreas carcinoma.
Figure 20 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z25299, demonstrating overexpression in the brain malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma.
Figure 21 is a histogram showing expression of Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299seg20, in normal and cancerous colon tissues.
Figure 22 is a histogram showing the expression of Secretory leukocyte protease inhibitor (an acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G, which may prevent elastase-mediated damage to oral and possibly other mucosal tissues) Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299seg20, in different normal tissues.
Figure 23 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMANK, demonstrating overexpression in epithelial malignant tumors.
Figure 24 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMCA1XIA, demonstrating overexpression in the bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.
Figure 25 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HSS100PCB, demonstrating overexpression in the mixture of malignant tumors from different tissues.
Figure 26 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster D11853, demonstrating overexpression in the brain malignant tumors, colorectal cancer and a mixture of malignant tumors from different tissues.
Figure 27 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R11723, demonstrating overexpression in the epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
Figure 28 is a histogram showing expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 seg13, in normal and cancerous colon tissues.
Figure 29 is a histogram showing expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 junc11-18, in normal and cancerous colon tissues.
Figure 30 is a histogram showing the expression of R11723 transcripts, detectable by amplicon depicted in sequence name R11723seg13, in different normal tissues.
Figure 31 is a histogram showing the expression of R11723 transcripts, detectable by amplicon in sequence name R11723 junc11-18, in different normal tissues.
Figure 32 is a histogram showing overexpression of the SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808junc8-11, in cancerous colon samples relative to the normal samples.
Figure 33 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster M77903, demonstrating overexpression in ovarian carcinoma and uterine malignancies.
Figure 34 is a histogram showing expression of the SSR-alpha M77903 transcripts, which are detectable by amplicon as depicted in sequence name M77903seg18, in normal and cancerous colon tissues.
Figure 35 is a histogram showing low overexpression for amplicon M77903 junc20-34-35 in the experiment carried out with colon.
Figure 36 is a histogram showing low overexpression for amplicon M77903 junc20-28 in the experiment carried out with colon.
Figures 37-38 are histograms showing differential expression of 6 sequences (M85491seg24, M77903 seg18, M77903junc20-28, Z44808 junc8-11, Z25299 seg20 and HSKITCR seg3) in normal and cancerous colon tissues, in different combinations.
Figure 39 is a histogram showing the expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808 junc8-11, in different normal tissues.
Figure 40 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster AA583399, demonstrating overexpression in brain malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and gastric carcinoma.
Figure 41 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg30-32, in normal and cancerous colon tissues.
Figure 42 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg17, in normal and cancerous colon tissues.
Figure 43 is a histogram showing expression of the AA583399 transcripts, which are detectable by amplicon as depicted in sequence name AA583399seg1, in normal and cancerous colon tissues.
Figure 44 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster AI684092, demonstrating overexpression in brain malignant tumors, epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 45 is a histogram showing expression of the AA5315457 transcripts, which are detectable by amplicon as depicted in sequence name AA5315457seg8, in normal and cancerous colon tissues.
Figure 46 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMCACH1A, demonstrating overexpression in a mixture of malignant tumors from different tissues.
Figure 47 is a histogram showing expression of the Voltage-dependent L-type calcium channel alpha-1D subunit (Calcium channel, L type, alpha-1 polypeptide, isoform 2) transcripts, which are detectable by seg 113, 35, 109, 125, in normal and cancerous colon tissues.
Figure 48 is a histogram showing expression of the HUMCACH1A transcripts, which are detectable by amplicon as depicted in sequence name HUMCACH1Aseg101, in normal and cancerous colon tissues.
Figure 49 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HUMCEA, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Figure 50 is a histogram showing expression of the HUMCEA transcripts, which are detectable by seg12 and seg9, in normal and cancerous colon tissues.
Figure 51 is a histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts, which are detectable by amplicon as depicted in sequence name HUMCEA seg31, in normal and cancerous colon tissues.
Figure 52 is a histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts, which are detectable by amplicon as depicted in sequence name HUMCEA seg33, in normal and cancerous colon tissues.
Figure 53 is a histogram showing expression of the Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts, which are detectable by amplicon as depicted in sequence name HUMCEA seg35, in normal and cancerous colon tissues.
Figure 54 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster M78035, demonstrating overexpression in brain malignant tumors, colorectal cancer, epithelial malignant tumors, a mixture of malignant tumors from different tissues, malignant tumors involving the lymph nodes and pancreas carcinoma.
Figure 55 is a histogram showing expression of the S-adenosylhomocysteine hydrolase (AHCY) M78035 transcripts, which are detectable by amplicon as depicted in sequence name M78035seg42, in normal and cancerous colon tissues.
Figure 56 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster R30650, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 57 is a histogram showing expression of the R30650 transcripts, which are detectable by amplicon as depicted in sequence name R30650 seg76, in normal and cancerous colon tissues.
Figure 58 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster T23657, demonstrating overexpression in epithelial malignant tumors.
Figure 59 is a histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLC04A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg17-18, in normal and cancerous colon tissues.
Figure 60 is a histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLC04A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg22, in normal and cancerous colon tissues.
Figure 61 is a histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLC04A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg29-32, in normal and cancerous colon tissues.
Figure 62 is a histogram showing expression of solute carrier organic anion transporter family, member 4A1 (SLC04A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg41, in normal and cancerous colon tissues.
Figure 63 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster T51958, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 64 is a histogram showing expression of PTK7 protein tyrosine kinase 7 (PTK7) T51958 transcripts, which are detectable by amplicon as depicted in sequence name T51958seg38, in normal and cancerous colon tissues.
Figure 65 is a histogram showing expression of PTK7 protein tyrosine kinase 7 (PTK7) T51958 transcripts, which are detectable by amplicon as depicted in sequence name T51958seg7, in normal and cancerous colon tissues.
Figure 66 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster Z17877, demonstrating overexpression in brain malignant tumors and malignant tumors involving the bone marrow.
Figure 67 is a histogram showing expression of c-myc-P64 mRNA, initiating from promoter P0 Z17877 transcripts, which are detectable by amplicon as depicted in sequence name Z17877seg8, in normal and cancerous colon tissues.
Figure 68 is a histogram showing combined expression of 19 sequences (T23657seg 29, T23657seg 22, T23657seg 41, T23657seg17-18, AA315457seg8, R30650seg76, HUMCEASeg 33, CEA-Seg35, CEA-Seg31, AA583399seg1, AA583399seg17, AA583399seg30-32, HUMCACH1Aseg101, HSHCGI seg20, HSHCGI seg35, M78035seg 42, T51958seg7, T51958 seg3 and Z17877 seg8) in normal and cancerous colon tissues.
Figure 69 is a histogram showing expression of TRIM31 tripartite motif HSHCGI transcripts, which are detectable by amplicon as depicted in sequence name HSHCGI seg20, in normal and cancerous colon tissues.
Figure 70 is a histogram showing expression of TRIM31 tripartite motif HSHCGI transcripts, which are detectable by amplicon as depicted in sequence name HSHCGI seg35, in normal and cancerous colon tissues.
Figure 71 is a histogram showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon(s) and H53626 seg25F and H53626 seg25R in different normal tissues.
Figure 72 is a histogram showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon(s) and H53626 seg25F and H53626 junc24-27F1R3 in different normal tissues.
Figure 73 is a histogram showing overexpression of the Matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 junc21-27, in cancerous colon samples relative to the normal samples.
Figure 74 is a histogram showing overexpression of the Matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg25, in cancerous colon samples relative to the normal samples.
Figure 75 is a histogram showing Cancer and cell-line vs. normal tissue expression for the cluster HSSTROL3, demonstrating overexpression in transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Figure 76 is a histogram showing the expression of Stromelysin-3 HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24, in different normal tissues.
DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention is of novel markers for colon cancer that are both sensitive and accurate. Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. These markers are specifically released to the bloodstream under conditions of colon cancer and/or other colon pathology, and/or are otherwise expressed at a much higher level and/or specifically expressed in colon cancer tissue or cells. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of colon cancer and/or pathology. The present invention therefore also relates to diagnostic assays for colon cancer and/or colon pathology, and methods of use of such markers for detection of colon cancer and/or colon pathology, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples.
As used herein a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.
As used herein a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.
As used herein "an edge portion" refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein.
A "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant. Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between).
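As an illustration only, the bridge definition above can be sketched in Python: given a variant sequence and the position of a join, list every window of a chosen length that contains at least one amino acid from each side of the join and stays within the sequence. The function name, the `junction` parameter (the 1-based position of the last residue before the join) and the toy sequence are assumptions made for this sketch and do not appear in the specification.

```python
def bridge_windows(sequence, junction, n):
    """List every length-n window of `sequence` that spans a join.

    `junction` (hypothetical parameter) is the 1-based position of the last
    residue before the join, so each window must contain both residue
    `junction` and residue `junction + 1`.
    Returns (start, end, peptide) tuples with 1-based inclusive coordinates.
    """
    if n < 2:
        raise ValueError("a bridge needs at least one residue from each side")
    if not 1 <= junction < len(sequence):
        raise ValueError("junction must lie inside the sequence")
    windows = []
    for x in range(n - 1):  # x = number of residues taken before `junction`
        start = junction - x
        end = start + n - 1
        if start < 1 or end > len(sequence):
            continue  # a bridge may not extend beyond the sequence itself
        windows.append((start, end, sequence[start - 1:end]))
    return windows


if __name__ == "__main__":
    # Toy sequence (not from the specification): join between residues 12 and 13.
    toy = "A" * 12 + "X" * 12
    for start, end, peptide in bridge_windows(toy, junction=12, n=10):
        print(start, end, peptide)
```

Calling the function once per length from 10 to 40 would enumerate the whole family of candidate bridge peptides for a given join under these assumptions.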
It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself. Furthermore, bridges are described with regard to a sliding window in certain contexts below. For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2. In this example, it should also be read as including bridges in which n is any number of amino acids between 10-50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49-x (for example) is not less than 1, nor 50 + ((n-2) - x) (for example) greater than the total sequence length.
In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention.
Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention. In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.
In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.
In another embodiment, this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing colon cancer and/or colon pathology. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of colon cancer and/or colon pathology.
According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker. With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene.
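The ratio between a splice variant marker and its known protein could, purely as a sketch, be computed per sample as shown below. The function name, the dictionary layout and the handling of missing or non-positive reference values are assumptions for illustration; the specification does not prescribe a particular calculation or any decision threshold.

```python
def marker_ratios(variant_levels, known_levels):
    """Return the variant-to-known-marker ratio for each sample.

    Both arguments map a sample identifier to a quantitative (or
    semi-quantitative) measurement. Samples lacking a known-marker value,
    or with a non-positive one, are skipped (an assumption of this sketch).
    """
    ratios = {}
    for sample, variant in variant_levels.items():
        known = known_levels.get(sample)
        if known is None or known <= 0:
            continue  # no usable reference measurement for this sample
        ratios[sample] = variant / known
    return ratios


if __name__ == "__main__":
    # Made-up numbers for illustration only; not experimental data.
    variant = {"sample_1": 6.0, "sample_2": 1.0}
    known = {"sample_1": 2.0, "sample_2": 4.0}
    print(marker_ratios(variant, known))
```

How such ratios are interpreted (for example, against values measured in normal samples) is left open here, as it is in the text.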
According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting colon cancer and/or colon pathology, such that a biomarker may optionally comprise any of the above. According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides. The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application. Non-limiting examples of methods or assays are described below. The present invention also relates to kits based upon such diagnostic methods or assays.
Nucleic acid sequences and Oligonucleotides
Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion. The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention. In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove. A "nucleic acid fragment" or an "oligonucleotide" or a "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids.
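By way of illustration only, the percent-identity thresholds listed above for homologous sequences can be checked with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length (with "-" for gaps) and assuming identity is computed over all aligned columns, which is only one of several conventions; the text here does not fix a particular method. The function names are hypothetical.

```python
from typing import Optional

# Identity thresholds (percent) enumerated in the text above.
THRESHOLDS = (50, 55, 60, 65, 70, 75, 80, 85, 95, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same nucleotide.

    Assumption: inputs are pre-aligned, equal-length strings using '-' for
    gaps. Columns where both sequences have a gap are ignored; a gap opposite
    a nucleotide counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float) -> Optional[int]:
    """Return the largest listed threshold that the identity satisfies."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None
```

For example, two aligned 10-mers differing at one position give 90 % identity, which satisfies the "at least 85 %" threshold but not the "at least 95 %" one.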
A polynucleotide sequence of the present invention refers to a single- or double-stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above). As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA-dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA-dependent DNA polymerase. As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome. As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis-acting expression regulatory elements. Preferred embodiments of the present invention encompass oligonucleotide probes.
An example of an oligonucleotide probe which can be utilized by the present invention is a single-stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein). Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid-phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed.
(1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988) and "Oligonucleotide Synthesis" Gait, M. J., ed. (1984) utilizing solid-phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC. Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases. Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40 bases specifically hybridizable with the biomarkers of the present invention. The oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3' to 5' phosphodiester linkage. Preferably used oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder. Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. Nos: 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.
Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms can also be used. Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,602,240; 5,608,046; 5,610,289; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.
Other oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No: 6,303,374. Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine.
Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications. Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No: 6,303,374. It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified, and in fact more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide. It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and reduce cytotoxicity.
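By way of illustration only, the unmodified probe described earlier in this section (a single-stranded oligonucleotide complementary to a unique sequence region, with a length in the stated ranges of about 10 to about 200 bases) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the "about" bounds are treated as exact limits, the probe is emitted as DNA, and all function and tier names are hypothetical rather than taken from the specification.

```python
from typing import Optional, Tuple

# Complement table; U (RNA input) pairs with A, and the probe is emitted as DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

# Length ranges from the text, narrowest first ("about" treated as exact bounds).
LENGTH_TIERS = [
    ("most preferred", 20, 50),
    ("more preferred", 20, 100),
    ("preferred", 15, 150),
    ("allowed", 10, 200),
]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence as DNA."""
    seq = seq.upper()
    if set(seq) - set("ACGTU"):
        raise ValueError("sequence may contain only A, C, G, T or U")
    return seq.translate(_COMPLEMENT)[::-1]


def length_tier(n_bases: int) -> Optional[str]:
    """Return the narrowest length range from the text that contains n_bases."""
    for name, low, high in LENGTH_TIERS:
        if low <= n_bases <= high:
            return name
    return None


def design_probe(unique_region: str) -> Tuple[str, Optional[str]]:
    """Probe complementary to a unique region, plus its length tier."""
    probe = reverse_complement(unique_region)
    return probe, length_tier(len(probe))
```

A region such as a bridge, tail, head or insertion coding sequence would be passed as `unique_region`; a result whose tier is `None` falls outside the stated length range.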
To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis-acting regulatory element. As used herein, the phrase "cis-acting regulatory element" refers to a polynucleotide sequence, preferably a promoter, which binds a trans-acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention. Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275]; in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up-regulating the transcription therefrom. The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E.
coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome. Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1
(+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto, pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention.
Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Hybridization assays
Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described). Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like. Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long. Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions. Moderate to stringent hybridization conditions are exemplified as follows: stringent hybridization is characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and a final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P-labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and a final wash at 50 °C. More generally, hybridization of short nucleic acids (below 200 bp in length, e.g.
17-40 bp in length) can be effected using the following exemplary hybridization protocols which can be modified according to the desired stringency; (i) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 1 - 1.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm; (ii) hybridization solution of 6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 2 - 2.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm, final wash solution of 6 x SSC, and final wash at 22 °C; (iii) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature. The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample. Probes can be labeled according to numerous well known methods. Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies.
Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radio-nucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe. For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTPs, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes. It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNAse A prior to hybridization, to assess false hybridization. Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation. Probes can be labeled according to numerous well known methods. As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods.
Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes. It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
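By way of illustration only, the exemplary short-oligonucleotide protocols above set the hybridization temperature a fixed offset below the Tm (1 - 1.5 °C for protocol (i), 2 - 2.5 °C for protocol (ii)), but the text does not state how the Tm itself is estimated. The following is a minimal sketch, assuming the Wallace rule (2 °C per A or T plus 4 °C per G or C), which is a rough approximation for short oligonucleotides and is an assumption of this example rather than the method of the specification. The function names are hypothetical.

```python
from typing import Tuple

# Offsets below Tm (in °C) for the exemplary protocols (i) and (ii) in the text.
PROTOCOL_OFFSETS = {"i": (1.0, 1.5), "ii": (2.0, 2.5)}


def wallace_tm(oligo: str) -> float:
    """Rough Tm estimate in °C: 2 per A/T plus 4 per G/C (Wallace rule).

    Assumption: a short DNA oligonucleotide containing only A, C, G and T.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be non-empty and contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def hybridization_window(tm: float, protocol: str) -> Tuple[float, float]:
    """Return the (lowest, highest) hybridization temperature for a protocol."""
    near, far = PROTOCOL_OFFSETS[protocol]
    return (tm - far, tm - near)
```

For an 18-mer with ten A/T and eight G/C bases, this estimate gives a Tm of 52 °C, so protocol (i) would hybridize at 50.5 to 51 °C and protocol (ii) at 49.5 to 50 °C. More accurate Tm models (for example nearest-neighbor methods with salt correction) would shift these windows.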
NAT Assays
Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example). As used herein, a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double-stranded region which can serve as an initiation point for DNA synthesis under suitable conditions. Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra). The terminology "amplification pair" (or "primer pair") refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions.
In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA. In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences. The nucleic acid (i.e. DNA or RNA) for practicing the present invention may be obtained according to well known methods. Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences.
Thus, low activity of the antisense oligonucleotide is indicative of splicing activity. The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below). The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C. Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers which are complementary to their respective strands of the double-stranded target sequence to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter.
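By way of illustration only, two points made above lend themselves to a short calculation: the melting-temperature compatibility of a primer pair (a difference of less than 7, 5, 4 or 3 °C) and the dependence of the amplified segment length on the relative primer positions. The following is a minimal sketch; the Tm values are taken as inputs because the text does not prescribe how they are computed, positions are assumed to be 0-based coordinates on one strand of the target, and the function and tier names are hypothetical.

```python
from typing import Optional

# Upper limits on the Tm difference (°C) from the text, strictest first.
TM_TIERS = [
    (3.0, "most preferred"),
    (4.0, "more preferred"),
    (5.0, "preferred"),
    (7.0, "compatible"),
]


def tm_compatibility(tm_forward: float, tm_reverse: float) -> Optional[str]:
    """Classify a primer pair by the absolute difference of its two Tm values.

    Returns None when the difference is 7 °C or more.
    """
    difference = abs(tm_forward - tm_reverse)
    for limit, label in TM_TIERS:
        if difference < limit:
            return label
    return None


def amplicon_length(forward_start: int, reverse_end: int) -> int:
    """Length of the segment bounded by the outer ends of the two primers.

    Assumption: forward_start is the first base of the forward primer and
    reverse_end is the last target base covered by the reverse primer, both
    as 0-based positions on the same strand.
    """
    if reverse_end < forward_start:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_end - forward_start + 1
```

For instance, primers with Tm values of 60 °C and 58.5 °C fall in the strictest tier, and primers whose outer ends sit at positions 100 and 299 bound a 200-base segment; moving either primer changes that length accordingly.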
Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified." Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids. In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions. Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest.
In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).
Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high temperature as a means of achieving specificity as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere.
A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., > 55 °C). Therefore the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.
The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as:
$$(1+X)^n = y,$$
where "X" is the mean efficiency (percent copied in each cycle), "n" is the number of cycles, and "y" is the overall efficiency, or yield of the reaction. If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is 100 %. If 20 cycles of PCR are performed, then the yield will be $2^{20}$, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85 %, then the yield in those 20 cycles will be only $1.85^{20}$, or 220,513 copies of the starting material. In other words, a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency. A reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product. In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products. Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator.
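The yield relation above lends itself to a quick numerical check. The following sketch simply evaluates y = (1 + X)^n for the efficiencies quoted in the text; the function names are hypothetical, and the efficiency X is passed as a fraction (0.85 for 85 %), which is an assumption of this example.

```python
# Illustrative sketch only; function names are hypothetical.

def pcr_yield(mean_efficiency: float, cycles: int) -> float:
    """Fold amplification y = (1 + X)^n, with X given as a fraction (1.0 = 100 %)."""
    return (1.0 + mean_efficiency) ** cycles


def cycles_for_fold(mean_efficiency: float, target_fold: float) -> int:
    """Smallest whole number of cycles n for which (1 + X)^n >= target_fold."""
    if mean_efficiency <= 0:
        raise ValueError("mean efficiency must be positive")
    n, y = 0, 1.0
    while y < target_fold:
        y *= 1.0 + mean_efficiency
        n += 1
    return n


ideal = pcr_yield(1.0, 20)
for x in (1.0, 0.85, 0.5):
    y = pcr_yield(x, 20)
    print(f"X = {x:.2f}: yield after 20 cycles = {y:,.0f} ({100 * y / ideal:.2f} % of ideal)")
```

With these definitions, 34 cycles at 50 % mean efficiency give a fold amplification of roughly 9.7 × 10^5, which is the approximately million-fold figure referred to above.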
The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way. The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment, capable of precise temperature cycling. Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect. A similar 3 '-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory. 
The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis. When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the "Cycling Probe Reaction" (CPR), and "Branched DNA" (bDNA).
Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.
Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes).
While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.
The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF). The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing. A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting. In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map.
The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). Single point mutations have also been detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory. RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number.
Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASO) also has been applied to the detection of specific point mutations. The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations.
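The limitation of RFLP analysis noted above, namely that only base changes falling within a restriction recognition sequence alter the digestion pattern, can be made concrete with a small sketch. It is illustrative only: the sequences are invented for the example, the function names are hypothetical, and EcoRI (recognition sequence GAATTC, cutting after the first G) is used merely as a familiar enzyme that the text itself does not name.

```python
# Illustrative sketch only; sequences and function names are hypothetical.

def find_sites(seq: str, site: str) -> list:
    """Start indices (0-based) of every occurrence of the recognition sequence."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after complete digestion of a linear molecule.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def rflp_detectable(wild_type: str, mutant: str, site: str, cut_offset: int) -> bool:
    """True if the mutation changes the pattern of fragment lengths."""
    return fragment_lengths(wild_type, site, cut_offset) != fragment_lengths(mutant, site, cut_offset)


ECORI_SITE, ECORI_CUT = "GAATTC", 1      # G^AATTC
wild_type = "ATATGAATTCGCGC"
inside_site = "ATATGAGTTCGCGC"           # A->G within the recognition sequence
outside_site = "ACATGAATTCGCGC"          # T->C outside the recognition sequence

print(fragment_lengths(wild_type, ECORI_SITE, ECORI_CUT))
print(rflp_detectable(wild_type, inside_site, ECORI_SITE, ECORI_CUT))
print(rflp_detectable(wild_type, outside_site, ECORI_SITE, ECORI_CUT))
```

The substitution outside the recognition sequence leaves the fragment pattern unchanged, which is why such a change would call for a different approach, such as the allele-specific oligonucleotides described above.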
With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation within a gene or sequence of interest.
Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature. Modifications of the technique have been developed, using temperature gradients, and the method can also be applied to RNA:RNA duplexes. Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis.
The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of mutations. A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations. The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP.
A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations). In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments. Although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50 % for 400 base pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened. 
According to a presently preferred embodiment of the present invention, the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.
Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station; the fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip. Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
Amino acid sequences and peptides
The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins.
Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Synthesis (2nd Ed., Pierce Chemical Company, 1984). Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing. In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463.
The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein. The present invention also encompasses homologues of these polypeptides; such homologues can be at least 50 %, at least 55 %, at least 60 %, at least 65 %, at least 70 %, at least 75 %, at least 80 %, at least 85 %, at least 95 % or more, say 100 %, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center of Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50. Optionally and preferably, nucleic acid sequence homology/identity may be determined by using BlastN software of the National Center of Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11.
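The percentage thresholds recited above can be illustrated with a simplified calculation. The sketch below is not BlastP: it assumes the two sequences have already been aligned (with "-" marking gaps) and merely counts identical columns over the alignment length, which is one common convention and an assumption of this example. The function names and the sample sequences are hypothetical.

```python
# Illustrative sketch only; not a substitute for BlastP. Names are hypothetical.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Inputs are two rows of an existing pairwise alignment of equal length,
    with '-' marking gaps. Gap columns count toward the alignment length
    but never count as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, min_percent: float) -> bool:
    """True if the aligned pair reaches at least min_percent identity."""
    return percent_identity(aligned_a, aligned_b) >= min_percent


reference = "MKTA-IAK"
candidate = "MKTAYIAR"
print(percent_identity(reference, candidate))
print(meets_threshold(reference, candidate, 70.0))
```

In practice the alignment itself, and therefore the resulting percentage, depends on the scoring matrix, gap costs and filtering options listed above.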
Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides as well as peptidomimetics, typically, synthetic peptides and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH or CF=CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art and are specified. Further details in this respect are provided hereinunder.
Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-), ester bonds (-C(R)H-C-O-O-C(R)-N-), ketomethylene bonds (-CO-CH2-), α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl, e.g., methyl, carba bonds (-CH2-NH-), hydroxyethylene bonds (-CH(OH)-CH2-), thioamide bonds (-CS-NH-), olefinic double bonds (-CH=CH-), retro amide bonds (-NH-CO-), peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time. Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted for synthetic non-natural acid such as Phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr.
In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g. fatty acids, complex carbohydrates etc). As used herein in the specification and in the claims section below the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
Table 1
[Table 1 is reproduced in the original publication as images imgf000217_0001 through imgf000221_0001.]
Since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain. The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized. The peptides of the present invention can be biochemically synthesized such as by using standard solid phase techniques. These methods include exclusive solid phase synthesis well known in the art, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Synthetic peptides can be purified by preparative high performance liquid chromatography, and their composition can be confirmed via amino acid sequencing. In cases where large amounts of the peptides of the present invention are desired, the peptides of the present invention can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 and also as described above.
Antibodies
"Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.
The functional fragments of antibodies, such as Fab, F(ab')2, and Fv that are capable of binding to macrophages, are described as follows:
(1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain;
(2) Fab', the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule;
(3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide bonds;
(4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and
(5) Single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule.
Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (See for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference). Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g. Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods.
For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, an enzymatic cleavage using pepsin produces two monovalent Fab' fragments and an Fc fragment directly. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R. R. [Biochem. J. 73: 119-126 (1959)]. Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody. Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. 
Methods for producing sFvs are described, for example, by Whitlow and Filpula, Methods 2: 97-105 (1991); Bird et al., Science 242:423-426 (1988); Pack et al., Bio/Technology 11:1271-77 (1993); and U.S. Pat. No. 4,946,778, which is hereby incorporated by reference in its entirety. Another form of an antibody fragment is a peptide coding for a single complementarity-determining region (CDR). CDR peptides ("minimal recognition units") can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2: 106-10 (1991)]. Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity-determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence. 
The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)]. Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies. Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et al., J. Immunol., 147(1):86-95 (1991)]. 
Similarly, human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos. 5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10: 779-783 (1992); Lonberg et al., Nature 368: 856-859 (1994); Morrison, Nature 368: 812-13 (1994); Fishwild et al., Nature Biotechnology 14: 845-51 (1996); Neuberger, Nature Biotechnology 14: 826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13: 65-93 (1995). Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term "epitope" refers to any antigenic determinant on an antigen to which the paratope of an antibody binds. Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics. Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site. 
An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination. One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
Immunoassays In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample. To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art. After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays. Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker. Optionally, the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead. Antibodies can also be attached to a solid support. After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent. 
Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture. Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like. Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10 °C to 40 °C. The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal. Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example. Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below. 
Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled "Antibodies". Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired substrate, as in the methods detailed hereinbelow, with a specific antibody and radiolabelled antibody binding protein (e.g., protein A labeled with I¹²⁵) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate. In an alternate version of the RIA, a labeled substrate and an unlabelled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample. Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy. Western blot: This method involves separation of a substrate from other protein by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents. 
Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis. Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required. Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
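The ELISA description above notes that, within the linear range, color is proportional to the amount of substrate and that a substrate standard improves accuracy. The following is a minimal sketch of that idea, assuming a straight-line standard curve fitted by ordinary least squares; the function names and the numeric standards are hypothetical illustrations and are not taken from the specification.

```python
def fit_standard_curve(concentrations, signals):
    """Least-squares line signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_signal(signal, slope, intercept):
    """Invert the fitted line; only meaningful inside the linear range."""
    return (signal - intercept) / slope


# Hypothetical standards (arbitrary concentration units vs. absorbance).
std_conc = [0.0, 1.0, 2.0, 4.0]
std_abs = [0.1, 0.3, 0.5, 0.9]
slope, intercept = fit_standard_curve(std_conc, std_abs)
unknown = concentration_from_signal(0.7, slope, intercept)
```

Readings outside the range spanned by the standards should not be extrapolated with this line, since the proportionality is stated to hold only within the linear range of response.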
Radio-imaging Methods These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.
Display Libraries According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention. Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young AC, et al., "The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes" J Mol Biol 1997 Dec 12;274(4):622-34; Giebel LB et al. "Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities" Biochemistry 1995 Nov 28;34(47):15430-5; Davies EL et al., "Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes" J Immunol Methods 1995 Oct 12;186(1):125-35; Jones C et al. "Current trends in molecular recognition and bioseparation" J Chromatogr A 1995 Jul 14;707(1):3-22; Deng SJ et al. "Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries" Proc Natl Acad Sci U S A 1995 May 23;92(11):4992-6; and Deng SJ et al. "Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display" J Biol Chem 1994 Apr 1;269(13):9533-8, which are incorporated herein by reference.
The following sections relate to Candidate Marker Examples (first section) and to Experimental Data for these Marker Examples (second section). It should be noted that Table numbering is restarted within each section.
CANDIDATE MARKER EXAMPLES SECTION This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof. Description of the methodology undertaken to uncover the biomolecular sequences of the present invention Human ESTs and cDNAs were obtained from GenBank version 136 (June 15, 2003; ftp.ncbi.nih.gov/genbank/release.notes/gb136.release.notes); NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); Human
Genome from NCBI (Build 34) (from Oct 2003); and RefSeq sequences from December 2003; and from the LifeSeq library of Incyte Corporation (Wilmington, DE, USA; ESTs only). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html and for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in
GenBank, may be found in Boguski et al., Nat Genet. 1993 Aug;4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein). Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced.
Genome Res 12, 1060-7 (2002); US patent No: 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome taking alternative splicing into account and clusters overlapping expressed sequences into
"clusters" that represent genes or partial genes. These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The
GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as
SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports. A brief explanation is provided with regard to the method of selecting the candidates.
However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below. The cancer biomarkers selection engine and the following wet validation stages are schematically summarized in Figure 1. EXAMPLE 1 Identification of differentially expressed gene products - Algorithm In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., housekeeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over expressed in cancer is described hereinbelow. Dry analysis Library annotation - EST libraries are manually classified according to: (i) Tissue origin
(ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc. is given above.
(iii) Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated. The following rules are followed: EST libraries originating from identical biological samples are considered as a single library. EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted. If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H.M. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details). Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest were analyzed. Splice variants were identified by using the LEADS software package as described above.
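The contamination rule above (a library is discarded when its fraction of uncontained unspliced ESTs is at least 4 standard deviations above the average over all libraries) can be sketched as follows. This is an illustrative reading only: the function name, the input mapping and the use of the population standard deviation are assumptions, since the specification does not state which estimator was used.

```python
from statistics import mean, pstdev


def flag_contaminated(unspliced_fraction, n_sd=4.0):
    """Return names of libraries whose unspliced-EST fraction is at least
    n_sd standard deviations above the mean over all libraries.

    unspliced_fraction: dict mapping library name -> fraction (0..1) of
    unspliced ESTs not fully contained within spliced sequences.
    """
    values = list(unspliced_fraction.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return set()  # all libraries identical, nothing stands out
    cutoff = mu + n_sd * sd
    return {name for name, frac in unspliced_fraction.items() if frac >= cutoff}


# Hypothetical data: 29 typical libraries and one suspicious library.
libraries = {f"lib_{i}": 0.02 for i in range(29)}
libraries["suspect"] = 0.60
contaminated = flag_contaminated(libraries)
```

Note that with few libraries a single outlier cannot reach 4 standard deviations at all, so a threshold of this kind presupposes a large collection of libraries.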
EXAMPLE 2
Identification of genes over expressed in cancer.
Two different scoring algorithms were developed. Libraries score - candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers. The basic algorithm - for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries. Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster. Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression. The algorithm - Clone counting: For counting EST clones each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels: (i) non-normalized : 1 (ii) normalized : 0.2 (iii) all other classes : 0.1 Clones number score - The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes to the majority of the score, the contribution of the library that gives most clones for a given cluster was limited to 2 clones. The score was computed as
Figure imgf000234_0001
where: c - weighted number of "cancer" clones in the cluster. C - weighted number of clones in all "cancer" libraries. n - weighted number of "normal" clones in the cluster. N - weighted number of clones in all "normal" libraries. Clones number score significance - Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries. Two search approaches were used to find either general cancer-specific candidates or tumor specific candidates. • Libraries/sequences originating from tumor tissues are counted as well as libraries originating from cancer cell-lines ("normal" cell-lines were ignored). • Only libraries/sequences originating from tumor tissues are counted.
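Two ingredients of Example 2, the protocol-weighted clone count and the Fisher exact test for over-representation, can be sketched as below. This is a hedged illustration and does not reproduce the score formula itself, which appears only in the figure. The assumptions are: the cap of 2 clones is applied to the weighted contribution of the largest contributing library; the test is one-sided; and the 2x2 table is laid out as (cancer in cluster, normal in cluster; cancer elsewhere, normal elsewhere). All function and variable names are hypothetical.

```python
from math import comb

# Protocol weights as listed in the text; any other class gets 0.1.
PROTOCOL_WEIGHT = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1


def weighted_clone_count(clones_per_library, protocol_of, cap=2.0):
    """Sum protocol-weighted clone counts for one cluster, limiting the
    largest single-library contribution to `cap` (assumed interpretation)."""
    contributions = [
        n * PROTOCOL_WEIGHT.get(protocol_of[lib], OTHER_WEIGHT)
        for lib, n in clones_per_library.items()
    ]
    if not contributions:
        return 0.0
    top = max(contributions)
    return sum(contributions) - top + min(top, cap)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the table [[a, b], [c, d]]:
    probability of seeing at least `a` in the top-left cell given the margins."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(total - col1, row1 - k) / denom
    return p


# Hypothetical cluster: three cancer libraries with different protocols.
clones = {"libA": 10, "libB": 5, "libC": 10}
protocols = {"libA": "non-normalized", "libB": "normalized", "libC": "subtracted"}
weighted = weighted_clone_count(clones, protocols)

# Hypothetical library counts: 3 cancer / 0 normal in the cluster,
# 0 cancer / 3 normal outside it.
p_value = fisher_exact_greater(3, 0, 0, 3)
```

Because the Fisher exact test operates on integer counts, the sketch applies it to library counts; how the weighted (non-integer) clone counts were handled in the significance test is not specified in the text.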
EXAMPLE 3 Identification of tissue specific genes For detection of tissue specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue". The algorithm - for each tested tissue T and for each tested cluster the following were examined: 1. Each cluster includes at least 2 libraries from the tissue T. At least 3 clones
(weighted - as described above) from tissue T in the cluster; and 2. Clones from the tissue T are at least 40 % of all the clones participating in the tested cluster. Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
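The threshold part of the tissue-specificity test above can be sketched as a simple predicate. The function and parameter names are hypothetical, and the sketch covers only the three numeric criteria (at least 2 libraries, at least 3 weighted clones, at least 40 % of the cluster's clones); the accompanying Fisher exact test is not repeated here.

```python
def passes_tissue_thresholds(libs_from_tissue, weighted_clones_tissue,
                             weighted_clones_total):
    """Check the numeric criteria for tissue T in one cluster:
    >= 2 libraries, >= 3 weighted clones, and >= 40 % of all clones."""
    if weighted_clones_total <= 0:
        return False
    fraction = weighted_clones_tissue / weighted_clones_total
    return (libs_from_tissue >= 2
            and weighted_clones_tissue >= 3
            and fraction >= 0.40)


# Hypothetical clusters.
ok = passes_tissue_thresholds(2, 4.0, 8.0)            # 50 % of clones
too_few_libs = passes_tissue_thresholds(1, 5.0, 6.0)  # only one library
too_diluted = passes_tissue_thresholds(2, 3.0, 10.0)  # 30 % of clones
```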
EXAMPLE 4
Identification of splice variants over expressed in cancer of clusters which are not over expressed in cancer Cancer-specific splice variants containing a unique region were identified. Identification of unique sequence regions in splice variants A Region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant. A "segment" (sometimes referred to also as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside. Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if: (i) Unspliced; (ii) Not covered by RNA; (iii) Not covered by spliced ESTs; and (iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch. Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if: (i) Aligned to the genome; and (ii) Regions supported by more than 2 ESTs. The algorithm Each unique sequence region divides the set of transcripts into 2 groups: (i) Transcripts containing this region (group TA). (ii) Transcripts not containing this region (group TB). The set of EST clones of every cluster is divided into 3 groups: (i) Supporting (originating from) transcripts of group TA (S1). (ii) Supporting transcripts of group TB (S2). (iii) Supporting transcripts from both groups (S3). Library and clones number scores described above were given to S1 group. Fisher Exact Test P-values were used to check if: S1 is significantly enriched by cancer EST clones compared to S2; and S1 is significantly enriched by cancer EST clones compared to cluster background
(S1+S2+S3). Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
Region 1: common to all transcripts, thus it is not considered; Region 2: specific to Transcript 1: T1 unique regions (2+6) against T2+3 unique regions (3+4); Region 3: specific to Transcripts 2+3: T2+3 unique regions (3+4) against T1 unique regions (2+6); Region 4: specific to Transcript 3: T3 unique regions (4) against T1+2 unique regions (2+5+6); Region 5: specific to Transcripts 1+2: T1+2 unique regions (2+5+6) against T3 unique regions (4); Region 6: specific to Transcript 1: same as region 2.
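The division of EST clones into the three support groups described in Example 4 can be sketched as follows, assuming each clone has already been assigned the set of transcripts it is compatible with. The data structures and names are hypothetical; the example mirrors Region 4 of Figure 2, where group TA is Transcript 3 and group TB is Transcripts 1 and 2.

```python
def partition_clones(clone_to_transcripts, transcripts_with_region):
    """Split clones into S1 (support only transcripts containing the region,
    group TA), S2 (support only transcripts lacking it, group TB) and
    S3 (support transcripts from both groups)."""
    s1, s2, s3 = set(), set(), set()
    for clone, transcripts in clone_to_transcripts.items():
        in_ta = any(t in transcripts_with_region for t in transcripts)
        in_tb = any(t not in transcripts_with_region for t in transcripts)
        if in_ta and in_tb:
            s3.add(clone)
        elif in_ta:
            s1.add(clone)
        elif in_tb:
            s2.add(clone)
    return s1, s2, s3


# Hypothetical clones for a three-transcript cluster; Region 4 is in T3 only.
support = {
    "est1": {"T3"},
    "est2": {"T1"},
    "est3": {"T1", "T2"},
    "est4": {"T2", "T3"},
}
s1, s2, s3 = partition_clones(support, transcripts_with_region={"T3"})
```

The cancer versus normal counts within S1 would then be compared against S2, and against S1+S2+S3, with the Fisher exact test as stated in the text.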
EXAMPLE 5 Identification of cancer specific splice variants of genes over expressed in cancer A search was performed for EST supported (no mRNA) regions for genes of: (i) known cancer markers; and (ii) genes shown to be over-expressed in cancer in published microarray experiments. Reliable EST supported-regions were defined as supported by a minimum of one of the following: (i) 3 spliced ESTs; or (ii) 2 spliced ESTs from 2 libraries; or (iii) 10 unspliced ESTs from 2 libraries; or (iv) 3 libraries.
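The reliability rule of Example 5 is a disjunction of four support criteria, which can be written as a small predicate. This is one possible reading: in particular, criterion (iv) is taken here to mean support from at least 3 distinct libraries of any kind, and all thresholds are treated as minimums. The parameter names are hypothetical.

```python
def is_reliable_region(spliced_ests, spliced_libs,
                       unspliced_ests, unspliced_libs, total_libs):
    """Return True if an EST-supported region meets at least one criterion:
    (i) >= 3 spliced ESTs; (ii) >= 2 spliced ESTs from >= 2 libraries;
    (iii) >= 10 unspliced ESTs from >= 2 libraries; (iv) >= 3 libraries."""
    return (spliced_ests >= 3
            or (spliced_ests >= 2 and spliced_libs >= 2)
            or (unspliced_ests >= 10 and unspliced_libs >= 2)
            or total_libs >= 3)


# Hypothetical regions.
by_spliced = is_reliable_region(3, 1, 0, 0, 1)     # criterion (i)
by_two_libs = is_reliable_region(2, 2, 0, 0, 2)    # criterion (ii)
by_unspliced = is_reliable_region(0, 0, 10, 2, 2)  # criterion (iii)
weak = is_reliable_region(2, 1, 5, 1, 2)           # none met
```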
Actual Marker Examples The following examples relate to specific actual marker examples.
EXPERIMENTAL EXAMPLES SECTION This Section relates to Examples describing experiments involving these sequences, and illustrative, non-limiting examples of methods, assays and uses thereof. The materials and experimental procedures are explained first, as all experiments used them as a basis for the work that was performed.
The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the panel is provided in Table 1 below. A description of the samples used in the normal tissue panel is provided in Table 2 below. Tests were then performed as described in the "Materials and
Experimental Procedures" section below.
Table 1: Tissue samples in testing panel
Figure imgf000238_0001
H5-CG-Adeno D-A | CG-235 | Rectum | Ichilov | Adenocarcinoma intramucosal Duke's A | F/66
Table 2: Tissue samples in normal panel: Lot no. Source Tissue Pathology Sex/Age
Figure imgf000239_0001
Figure imgf000240_0001
Figure imgf000241_0001
Materials and Experimental Procedures RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA www.biochain.com), ABS (Wilmington, DE 19801, USA, http://www.absbioreagents.com) or Ambion (Austin, TX 78744 USA, http://www.ambion.com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNaseI (Ambion) and purified using RNeasy columns (Qiagen). RT PCR - Purified RNA (1 μg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 μM dNTP in a total volume of 15.6 μl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice. Thereafter, 5 μl of 5X SuperScript II first strand buffer (Invitrogen), 2.4 μl 0.1 M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25 °C, followed by further incubation at 42 °C for 2 min. Then, 1 μl (200 units) of SuperScript II (Invitrogen) was added and the reaction (final volume of 25 μl) was incubated for 50 min at 42 °C and then inactivated at 70 °C for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8). Real-Time RT-PCR analysis - cDNA (5 μl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystem) with specific primers and UNG Enzyme (Eurogentec or ABI or Roche). The amplification was effected as follows: 50 °C for 2 min, 95 °C for 10 min, and then 40 cycles of 95 °C for 15 sec, followed by 60 °C for 1 min. Detection was performed by using the PE Applied Biosystem SDS 7000. The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. 
The relative quantity was calculated using the equation Q = efficiency^(-Ct). The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions. To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to the geometric mean of the relative quantities of several housekeeping (HSKP) genes. A schematic summary of the quantitative real-time PCR analysis is presented in Figure 3. As shown, the x-axis shows the cycle number. The Ct (Threshold Cycle) point is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the geometric/exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
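The relative quantification described above may be illustrated by the following minimal sketch. It assumes a per-assay efficiency taken from a standard curve and one Ct value per reaction; the function names and the numeric values are hypothetical and are given for illustration only, not as the procedure actually used.

```python
from math import prod


def relative_quantity(efficiency: float, ct: float) -> float:
    """Relative transcript quantity, Q = efficiency^(-Ct)."""
    return efficiency ** (-ct)


def geometric_mean(values):
    """Geometric mean of a non-empty collection of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def normalized_quantity(target, housekeeping):
    """Normalize a target quantity to the geometric mean of housekeeping genes.

    target       -- (efficiency, Ct) pair for the amplicon of interest
    housekeeping -- list of (efficiency, Ct) pairs, one per housekeeping gene
    """
    q_target = relative_quantity(*target)
    q_hskp = [relative_quantity(eff, ct) for eff, ct in housekeeping]
    return q_target / geometric_mean(q_hskp)


# Hypothetical values: efficiency 2.0 (perfect doubling) for every assay.
example = normalized_quantity((2.0, 25.0), [(2.0, 20.0), (2.0, 22.0)])
```

With these illustrative inputs the housekeeping geometric mean corresponds to a Ct of 21, so the target, at Ct 25, is four cycles later and its normalized quantity is 2^(-4).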
The sequences of the housekeeping genes measured in all the examples on the tissue testing panel were as follows:
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO:529): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO:530): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO:531): TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO:532): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO:533): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO:612): TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
G6PD (GenBank Accession No. NM_000402)
G6PD Forward primer (SEQ ID NO:613): gaggccgtcaccaagaacat
G6PD Reverse primer (SEQ ID NO:614): ggacagccggtcagagctc
G6PD-amplicon (SEQ ID NO:615): gaggccgtcaccaagaacattcacgagtcctgcatgagccagataggctggaaccgcatcatcgtggagaagcccttcgggagggacctgcagagctctgaccggctgtcc
RPS27A (GenBank Accession No. NM_002954)
RPS27A Forward primer (SEQ ID NO:642): CTGGCAAGCAGCTGGAAGAT
RPS27A Reverse primer (SEQ ID NO:1260): TTTCTTAGCACCACCACGAAGTC
RPS27A-amplicon (SEQ ID NO:1261): CTGGCAAGCAGCTGGAAGATGGACGTACTTTGTCTGACTACAATATTCAAAAGGAGTCTACTCTTCATCTTGTGTTGAGACTTCGTGGTGGTGCTAAGAAA
The sequences of the housekeeping genes measured in all the examples on the normal tissue panel were as follows:
RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer (SEQ ID NO:1262): TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer (SEQ ID NO:1263): TGATCAGCCCATCTTTGATGAG
RPL19-amplicon (SEQ ID NO:1264): TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA
TATA box (GenBank Accession No. NM_003194)
TATA box Forward primer (SEQ ID NO:1265): CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer (SEQ ID NO:1266): TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon (SEQ ID NO:1267): CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTTCAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTGGCAGCAAGAAA
Ubiquitin (GenBank Accession No. BC000449)
Ubiquitin Forward primer (SEQ ID NO:1268): ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer (SEQ ID NO:1269): TGCCTTGACATTCTCGATGGT
Ubiquitin-amplicon (SEQ ID NO:1270): ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO:1271): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO:1272): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO:1273): TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
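The primer and amplicon listings above can be sanity-checked by confirming that each amplicon begins with its forward primer and ends with the reverse complement of its reverse primer. The following is a minimal sketch of such a check, shown for the SDHA entries; the function names are hypothetical, and the check ignores spaces that may have been introduced into a sequence by line wrapping.

```python
# Translation table for complementing upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer (case-insensitive)."""
    amplicon = amplicon.replace(" ", "").upper()
    return (amplicon.startswith(forward.upper())
            and amplicon.endswith(reverse_complement(reverse.upper())))


# SDHA primers and amplicon as listed above (SEQ ID NOs: 1271-1273).
sdha_ok = primers_flank_amplicon(
    "TGGGAACAAGAGGGCATCTG",
    "CCACCACTGCATCAAATTCATG",
    "TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGT"
    "AGTGGATCATGAATTTGATGCAGTGGTGG",
)
```

The same function may be applied to the other housekeeping genes by substituting the corresponding primer and amplicon sequences.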
Oligonucleotide-based micro-array experiment protocol
Microarray fabrication
Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al, "Optical technologies and informatics", Proceedings of SPIE. Vol 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US) and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or were attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, NJ, US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866-1A Kibbutz Beit-Haemek, Israel) to a concentration of 50μM. Before printing the slides, the oligonucleotides were resuspended in 300mM sodium phosphate (pH 8.5) to final concentration of 150mM and printed at 35-40% relative humidity at 21 °C. Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel. Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes which are commercially available in the Array Control product (Array control - sense oligo spots, Ambion Inc. Austin, TX. Cat #1781, Lot #112K06).
Post-coupling processing of printed slides
After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%). Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50°C for 15 minutes (10 ml/slide of buffer containing 0.1M Tris, 50mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide; 4X SSC, 0.1% SDS) at 50°C for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm.
Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried and slides were then incubated for three hours at 20 mmHg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc. Redding CA).
The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described with regard to the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems, Sunnyvale, CA) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit.
Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson AZ). The slides were then scanned with GenePix 4000B dual laser scanner from Axon Instruments Inc, and analyzed by GenePix Pro 5.0 software.
A schematic summary of the oligonucleotide-based microarray fabrication and the experimental flow is presented in Figures 4 and 5. Briefly, as shown in Figure 4, DNA oligonucleotides at 25uM were deposited (printed) onto Amersham 'CodeLink' glass slides generating a well defined 'spot'. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the DNA oligonucleotides' 5'-end via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA and also allows lower background, high sensitivity and reproducibility.
Figure 5 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right-hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue). Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a "chip" (microarray), as for example "prostate" for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed. In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.
DESCRIPTION FOR CLUSTER M85491
Cluster M85491 features 2 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000248_0001
Figure imgf000249_0001
These sequences are variants of the known protein Ephrin type-B receptor 2 [precursor] (SwissProt accession identifier EPB2_HUMAN; known also according to the synonyms EC 2.7.1.112; Tyrosine-protein kinase receptor EPH-3; DRT; Receptor protein-tyrosine kinase HEK5; ERK), SEQ ID NO:616, referred to herein as the previously known protein. Protein Ephrin type-B receptor 2 [precursor] is known or believed to have the following function(s): Receptor for members of the ephrin-B family. The sequence for protein Ephrin type-B receptor 2 [precursor] is given at the end of the application, as "Ephrin type-B receptor 2 [precursor] amino acid sequence" (SEQ ID NO:616). Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Figure imgf000249_0002
Protein Ephrin type-B receptor 2 [precursor] localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein amino acid phosphorylation; transmembrane receptor protein tyrosine kinase signaling pathway; neurogenesis, which are annotation(s) related to Biological Process; protein tyrosine kinase; receptor; transmembrane-ephrin receptor; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster M85491 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 6 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues. Table 5 - Normal tissue distribution
Figure imgf000250_0001
Figure imgf000251_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000251_0002
As noted above, cluster M85491 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ephrin type-B receptor 2 [precursor]. A description of each variant protein according to the present invention is now provided.
Variant protein M85491_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T16. An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M85491_PEA_1_P13 and EPB2_HUMAN:
1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGATNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDSGGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLAHTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVDSITLSWSQPDQPNGVILDYELQYYEK corresponding to amino acids 1 - 476 of EPB2_HUMAN, which also corresponds to amino acids 1 - 476 of M85491_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPIGWVLSPSPTSLRAPLPG corresponding to amino acids 477 - 496 of M85491_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG in M85491_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M85491_PEA_1_P13 is encoded by the following transcript(s): M85491_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T16 is shown in bold; this coding portion starts at position 143 and ends at position 1630. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Figure imgf000253_0001
Figure imgf000254_0001
Variant protein M85491_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T20. An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M85491_PEA_1_P14 and EPB2_HUMAN:
1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCR corresponding to amino acids 1 - 270 of EPB2_HUMAN, which also corresponds to amino acids 1 - 270 of M85491_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL corresponding to amino acids 271 - 301 of M85491_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL in M85491_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M85491_PEA_1_P14 is encoded by the following transcript(s): M85491_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T20 is shown in bold; this coding portion starts at position 143 and ends at position 1045. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Figure imgf000255_0001
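As a consistency check on the coding-portion coordinates stated above, the number of codons spanned by each coding portion can be compared with the length of the encoded variant protein. The sketch below assumes that the stated positions are 1-based and inclusive and that the stop codon is not counted within the coding portion; both are assumptions, and the function names are hypothetical.

```python
def coding_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end, both positions inclusive."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the coding portion.

    Raises ValueError if the span is not a multiple of three nucleotides.
    """
    length = coding_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


# Coordinates as stated for the two transcripts of cluster M85491.
t16_codons = codon_count(143, 1630)   # transcript M85491_PEA_1_T16
t20_codons = codon_count(143, 1045)   # transcript M85491_PEA_1_T20
```

Under these assumptions the two spans give 496 and 301 codons, which agrees with the amino acid ranges (1 - 496 and 1 - 301) given in the comparison reports for M85491_PEA_1_P13 and M85491_PEA_1_P14.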
As noted above, cluster M85491 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster M85491_PEA_1_node_0 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Figure imgf000256_0001
Segment cluster M85491_PEA_1_node_13 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T20. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Figure imgf000256_0002
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 11. Table 11 - Oligonucleotides related to this segment
Figure imgf000257_0001
Segment cluster M85491_PEA_1_node_21 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Figure imgf000257_0002
Segment cluster M85491_PEA_1_node_23 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000257_0003
Segment cluster M85491_PEA_1_node_24 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000257_0004
Figure imgf000258_0001
Segment cluster M85491_PEA_1_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000258_0002
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment with regard to colon cancer, shown in Table 16. Table 16 - Oligonucleotides related to this segment
Figure imgf000258_0003
Segment cluster M85491_PEA_1_node_9 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000258_0004
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster M85491_PEA_1_node_10 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000259_0001
Segment cluster M85491_PEA_1_node_18 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000259_0002
Segment cluster M85491_PEA_1_node_19 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000260_0001
Segment cluster M85491_PEA_1_node_6 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 and M85491_PEA_1_T20. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000260_0002
Variant protein alignment to the previously known protein:
Sequence name: /tmp/qfmsU9VtxS/DylcLC9j8v:EPB2_HUMAN
Sequence documentation:
Alignment of: M85491_PEA_1_P13 x EPB2_HUMAN
Alignment segment 1/1:
Quality: 4726.00
Escore: 0
Matching length: 476
Total length: 476
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
251 IGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 IGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGA 300
301 TNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDS 350
351 GGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLA 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 GGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLA 400
401 HTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVD 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 HTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVD 450
451 SITLSWSQPDQPNGVILDYELQYYEK 476
    ||||||||||||||||||||||||||
451 SITLSWSQPDQPNGVILDYELQYYEK 476
Sequence name: /tmp/rmnzuDbot6/GiHbjeU8iR:EPB2_HUMAN
Sequence documentation:
Alignment of: M85491_PEA_1_P14 x EPB2_HUMAN
Alignment segment 1/1:
Quality: 2673.00
Escore: 0
Matching length: 270
Total length: 270
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYD 50
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSI 100
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 PSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQV 150
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRI 200
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVP 250
251 IGRCMCKAGFEAVENGTVCR 270
    ||||||||||||||||||||
251 IGRCMCKAGFEAVENGTVCR 270
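The "Percent Identity" figures reported with the alignments above can be illustrated by a minimal sketch that counts identical positions between two aligned rows of equal length. The gap character "-" and the function name are assumptions made for illustration; the alignment program actually used may define its statistics differently.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned positions holding the same residue in both rows.

    Both rows must have the same length; positions where either row has a
    gap ("-") are counted in the length but never as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


# Last row (positions 251 - 270) of the M85491_PEA_1_P14 x EPB2_HUMAN alignment.
row = "IGRCMCKAGFEAVENGTVCR"
identity = percent_identity(row, row)
```

Because the variant and the known protein are identical over the aligned region, every row yields 100 percent identity, in agreement with the reported values.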
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 in normal and cancerous colon tissues
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by or according to seg24, M85491seg24 amplicon and M85491seg24F and M85491seg24R primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261) - was measured similarly. For each RT (RT-PCR) sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 7 is a histogram showing over expression of the above-indicated Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts in cancerous colon samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 7, the expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, over-expression of at least 3 fold was found in 13 out of 37 adenocarcinoma samples.
Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by the above amplicon(s) in colon cancer samples versus the normal tissue samples was determined by T test as 6.83E-04. A threshold of 3 fold over expression was found to differentiate between cancer and normal samples with a P value of 2.66E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M85491seg24F forward primer; and M85491seg24R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M85491seg24.
M85491seg24F (SEQ ID NO: 1274) - GGCGTCTTTCTCCCTCTGAAC
M85491seg24R (SEQ ID NO: 1275) - GTCCCATTCTGGGTGCTGTG
M85491seg24 (SEQ ID NO: 1276) - GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCCCTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGGAC
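The 3-fold threshold comparison reported above can be illustrated with a one-sided Fisher's exact test on a 2x2 table (samples at or above the threshold versus below it, in cancer versus normal). The following is a minimal sketch only. The function name is hypothetical, and the number of normal samples exceeding the threshold is not stated in the text, so the value used here (none of the 11 normal PM samples) is an assumption. The result is therefore not expected to reproduce the reported P value exactly.

```python
from math import comb


def fisher_exact_greater(pos_a, neg_a, pos_b, neg_b):
    """One-sided Fisher's exact test for a 2x2 table.

    Returns the probability of seeing at least `pos_a` positives in
    group A, given fixed row and column totals (hypergeometric tail).
    """
    n_a = pos_a + neg_a
    n_b = pos_b + neg_b
    total_pos = pos_a + pos_b
    denom = comb(n_a + n_b, total_pos)
    p = 0.0
    for k in range(pos_a, min(n_a, total_pos) + 1):
        # comb() returns 0 when total_pos - k exceeds n_b.
        p += comb(n_a, k) * comb(n_b, total_pos - k) / denom
    return p


# 13 of 37 adenocarcinoma samples reached the 3-fold threshold (from the text).
# ASSUMPTION: none of the 11 normal PM samples reached it (not stated).
p_value = fisher_exact_greater(13, 24, 0, 11)
print(f"one-sided P = {p_value:.3g}")
```

A two-sided variant, or a different count of normal samples above the threshold, would give a different value.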
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 in different normal tissues.
Expression of Ephrin type-B receptor 2 precursor transcripts detectable by or according to M85491seg24 amplicon(s) and M85491seg24F and M85491seg24R was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the lung samples. The results are described in Figure 8, presenting the histogram showing the expression of M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 in different normal tissues.
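The normalization described above (division by the geometric mean of the housekeeping gene quantities, then division by the median of a reference group) can be sketched as follows. This is an illustrative sketch only: the function names, sample labels and all quantities are hypothetical and are not data from the application.

```python
from statistics import geometric_mean, median


def normalize(target_qty, housekeeping_qtys):
    """Normalize a target quantity to the geometric mean of housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def relative_expression(normalized, reference_ids):
    """Divide each normalized quantity by the median of the reference samples."""
    ref = median(normalized[s] for s in reference_ids)
    return {s: v / ref for s, v in normalized.items()}


# Hypothetical raw quantities: (target amplicon, four housekeeping genes).
raw = {
    "lung_15": (4.0, [1.0, 4.0, 2.0, 2.0]),
    "lung_16": (2.0, [2.0, 2.0, 2.0, 2.0]),
    "lung_17": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "colon_x": (12.0, [2.0, 2.0, 2.0, 2.0]),
}
normalized = {s: normalize(t, hk) for s, (t, hk) in raw.items()}
relative = relative_expression(normalized, ["lung_15", "lung_16", "lung_17"])
for sample, value in relative.items():
    print(sample, round(value, 2))
```

The same two steps apply to the cancer panel, with the normal PM samples taking the place of the lung samples as the reference group.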
Forward primer (SEQ ID NO: 1274): GGCGTCTTTCTCCCTCTGAAC
Reverse primer (SEQ ID NO: 1275): GTCCCATTCTGGGTGCTGTG
Amplicon (SEQ ID NO: 1276): GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCCCTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGGAC
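As a consistency check on the sequences listed above, an amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The short sketch below tests this for the M85491seg24 primer pair; the helper names are hypothetical, and the sequences are copied from the listing above with line-wrap spaces removed.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


FORWARD = "GGCGTCTTTCTCCCTCTGAAC"
REVERSE = "GTCCCATTCTGGGTGCTGTG"
AMPLICON = (
    "GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCC"
    "CTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGG"
    "AC"
)


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


print(primers_flank(AMPLICON, FORWARD, REVERSE))
```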
DESCRIPTION FOR CLUSTER T10888
Cluster T10888 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000267_0001
Table 3 - Proteins of interest
Figure imgf000268_0001
These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor (SwissProt accession identifier CEA6_HUMAN; known also according to the synonyms Normal cross-reacting antigen; Nonspecific crossreacting antigen; CD66c antigen), SEQ ID NO: 617, referred to herein as the previously known protein. The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 6 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000268_0002
Protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor localization is believed to be: attached to the membrane by a GPI-anchor. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; cell-cell signaling, which are annotation(s) related to Biological Process; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T10888 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
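The "parts per million" figure described above is a ratio scaled by one million. A minimal sketch follows, assuming a plain (unweighted) ratio of EST counts; the weighting scheme is not given in this passage, and the function name and counts are hypothetical.

```python
def expression_ppm(cluster_ests, total_ests):
    """ESTs for one cluster as parts per million of all ESTs in a category.

    ASSUMPTION: a plain count ratio; any weighting applied in the
    application is not specified here and is not modeled.
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return cluster_ests / total_ests * 1_000_000


# Hypothetical counts: 25 cluster ESTs among 100,000 ESTs in a tissue category.
print(round(expression_ppm(25, 100_000), 1))
```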
Overall, the following results were obtained as shown with regard to the histograms in Figure 9 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, a mixture of malignant tumors from different tissues, pancreas carcinoma and gastric carcinoma.
Table 5 - Normal tissue distribution
Figure imgf000269_0001
Figure imgf000270_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000270_0002
As noted above, cluster T10888 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 6 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T10888_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T1. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P2 and CEA6_HUMAN:
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS YMCQAHNSATGLNRTTVTMITVS corresponding to amino acids 1 - 319 of CEA6_HUMAN, which also corresponds to amino acids 1 - 319 of T10888_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DWTRP corresponding to amino acids 320 - 324 of T10888_PEA_1_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DWTRP in T10888_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf000272_0001
Variant protein T10888_PEA_1_P2 is encoded by the following transcript(s): T10888_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T1 is shown in bold; this coding portion starts at position 151 and ends at position 1122. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf000272_0002
Figure imgf000273_0001
Variant protein T10888_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T4. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P4 and CEA6_HUMAN:
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of CEA6_HUMAN, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
Comparison report between T10888_PEA_1_P4 and Q13774 (SEQ ID NO:1382):
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL corresponding to amino acids 1 - 234 of Q13774, which also corresponds to amino acids 1 - 234 of T10888_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLLSSQLWPPSASRLECWPGWL corresponding to amino acids 235 - 256 of T10888_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLLSSQLWPPSASRLECWPGWL in T10888_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T10888_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Figure imgf000275_0001
Variant protein T10888_PEA_1_P4 is encoded by the following transcript(s): T10888_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T4 is shown in bold; this coding portion starts at position 151 and ends at position 918. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf000276_0001
Variant protein T10888_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T5. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T10888_PEA_1_P5 and CEA6_HUMAN:
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVYPELPKPSISSNNSNPVEDKDAVAFTCEPEVQNTTYL WWVNGQSLPVSPRLQLSNGNMTLTLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLY GPDVPTISPSKANYRPGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGS YMCQAHNSATGLNRTTVTMITVSG corresponding to amino acids 1 - 320 of CEA6_HUMAN, which also corresponds to amino acids 1 - 320 of T10888_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF VVFCFLISHV corresponding to amino acids 321 - 390 of T10888_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWIHEALASHFQVESGSQRRARKKFSFPTCVQGAHANPKFSPEPSQFTSADSFPLVFLFF VVFCFLISHV in T10888_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein T10888_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf000278_0001
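The localization reasoning stated for the variant proteins combines the calls of two signal-peptide predictors and two trans-membrane predictors. The sketch below encodes only the two cases the text actually describes (all predictors in agreement); the function name and the "undetermined" fallback for mixed calls are assumptions added for illustration, not rules taken from the application.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine predictor calls into a localization label.

    Each argument is a list of booleans, one per prediction program.
    Only the unanimous cases described in the text are decided here.
    """
    has_signal = all(signal_peptide_calls)
    if has_signal and all(transmembrane_calls):
        # Signal peptide plus a trans-membrane region downstream of it.
        return "membrane"
    if has_signal and not any(transmembrane_calls):
        # Signal peptide and no trans-membrane region.
        return "secreted"
    # ASSUMPTION: mixed or negative calls are left undecided.
    return "undetermined"


print(predict_localization([True, True], [False, False]))  # case stated for P2, P4, P6
print(predict_localization([True, True], [True, True]))    # case stated for P5
```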
Variant protein T10888_PEA_1_P5 is encoded by the following transcript(s): T10888_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T5 is shown in bold; this coding portion starts at position 151 and ends at position 1320. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf000278_0002
Figure imgf000279_0001
Variant protein T10888_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T10888_PEA_1_T6. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 6 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application.
Comparison report between T10888_PEA_1_P6 and CEA6_HUMAN:
1. An isolated chimeric polypeptide encoding for T10888_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLAHNLP QNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRETIYPNASLLIQNVTQNDTG FYTLQVIKSDLVNEEATGQFHVY corresponding to amino acids 1 - 141 of CEA6_HUMAN, which also corresponds to amino acids 1 - 141 of T10888_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI corresponding to amino acids 142 - 183 of T10888_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T10888_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence REYFHMTSGCWGSVLLPTYGIVRPGLCLWPSLHYILYQGLDI in T10888_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted.
The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T10888_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf000280_0001
Variant protein T10888_PEA_1_P6 is encoded by the following transcript(s): T10888_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T10888_PEA_1_T6 is shown in bold; this coding portion starts at position 151 and ends at position 699. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T10888_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf000281_0001
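The coding-portion coordinates given for the four transcripts can be cross-checked against the protein lengths stated in the comparison reports. The sketch below assumes that positions are 1-based and inclusive (an assumption, since the text does not say so); under that reading each span is exactly three nucleotides per amino acid, which suggests that the stated end positions do not include a stop codon. The dictionary and function names are hypothetical.

```python
# (coding start, coding end, protein length in amino acids), as stated in the text.
VARIANTS = {
    "T10888_PEA_1_P2": (151, 1122, 324),
    "T10888_PEA_1_P4": (151, 918, 256),
    "T10888_PEA_1_P5": (151, 1320, 390),
    "T10888_PEA_1_P6": (151, 699, 183),
}


def codons_in_span(start, end):
    """Number of codons in a span, ASSUMING 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of 3")
    return length // 3


for name, (start, end, protein_len) in VARIANTS.items():
    print(name, codons_in_span(start, end) == protein_len)
```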
As noted above, cluster T10888 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T10888_PEA_1_node_11 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Figure imgf000281_0002
Figure imgf000282_0001
Segment cluster T10888_PEA_1_node_12 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf000282_0002
Segment cluster T10888_PEA_1_node_17 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1 and T10888_PEA_1_T4. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf000282_0003
Segment cluster Tl 0888 JPEA J_node l according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf000282_0004
Figure imgf000283_0001
Segment cluster T10888_PEA_1_node_6 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4, T10888_PEA_1_T5 and T10888_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf000283_0002
Segment cluster T10888_PEA_1_node_7 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000283_0003
Segment cluster T10888_PEA_1_node_9 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T1, T10888_PEA_1_T4 and T10888_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf000284_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T10888_PEA_1_node_15 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T10888_PEA_1_T4. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf000284_0002
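The segment descriptions above amount to a mapping from each segment to the transcripts that contain it. A small sketch of that mapping follows, with a lookup for the transcripts that carry every segment in a given set (as would be needed, for example, for an amplicon spanning a junction between two nodes). The node numbers 11, 12 and 15 are read from garbled identifiers in the source and should be treated as assumptions; the segment whose number is unreadable is omitted. The names used are hypothetical shorthand.

```python
# Segment -> transcripts, using T1/T4/T5/T6 as shorthand for T10888_PEA_1_T1 etc.
# ASSUMPTION: node_11, node_12 and node_15 are reconstructed from garbled text.
SEGMENT_TO_TRANSCRIPTS = {
    "node_11": {"T1", "T5"},
    "node_12": {"T5"},
    "node_17": {"T1", "T4"},
    "node_6": {"T1", "T4", "T5", "T6"},
    "node_7": {"T6"},
    "node_9": {"T1", "T4", "T5"},
    "node_15": {"T4"},
}


def transcripts_with_all(nodes):
    """Transcripts that contain every one of the given segments."""
    sets = [SEGMENT_TO_TRANSCRIPTS[n] for n in nodes]
    return sorted(set.intersection(*sets))


print(transcripts_with_all(["node_11", "node_17"]))
```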
Variant protein alignment to the previously known protein:
Sequence name: /tmp/tM4EgaoKvm/vuztUrlRc7:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P2 x CEA6_HUMAN ..
Alignment segment 1/1:
Quality: 3163.00 Escore: 0 Matching length: 319 Total length: 319
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVS 319
    |||||||||||||||||||
301 AHNSATGLNRTTVTMITVS 319
Sequence name: /tmp/Yjllgj7TCe/PgdufzL01W:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x CEA6_HUMAN ..
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/Yjllgj7TCe/PgdufzL01W:Q13774
Sequence documentation:
Alignment of: T10888_PEA_1_P4 x Q13774 ..
Alignment segment 1/1:
Quality: 2310.00 Escore: 0 Matching length: 234 Total length: 234
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVL 234
Sequence name: /tmp/x5xDBacdpj/rTXRGepv3y:CEA6_HUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P5 x CEA6_HUMAN ..
Alignment segment 1/1:
Quality: 3172.00 Escore: 0 Matching length: 320 Total length: 320
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPELPKPSIS 150
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSNPVEDKDAVAFTCEPEVQNTTYLWWVNGQSLPVSPRLQLSNGNMTL 200
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLLSVKRNDAGSYECEIQNPASANRSDPVTLNVLYGPDVPTISPSKANYR 250
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 PGENLNLSCHAASNPPAQYSWFINGTFQQSTQELFIPNITVNNSGSYMCQ 300
301 AHNSATGLNRTTVTMITVSG 320
    ||||||||||||||||||||
301 AHNSATGLNRTTVTMITVSG 320
Sequence name: /tmp/VAhvYFeatq/QNEM573uCo :CEA6JHUMAN
Sequence documentation:
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1: Quality: 1393.00 Escore: 0 Matching length: 143 Total length: 143 Matching Percent Similarity: 99.30 Matching Percent Identity: 99.30 Total Percent Similarity: 99.30 Total Percent Identity: 99.30 Gaps :
Alignment :
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYRE 143
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYPE 143
Alignment of: T10888_PEA_1_P6 x CEA6_HUMAN
Alignment segment 1/1
Quality: 101.00 Escore: Matching length: 141 Total length: 183 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 77.05 Total Percent Identity: 77.05 Gaps : 1
Alignment :
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
1 MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
51 VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET 100
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVYREYFHMTSG 150
101 IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY 141
151 C GSVLLPTYGIVRPGLCLWPSLHYILYQGLDI 183
141 141
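The identity figures reported for the two T10888_PEA_1_P6 alignments above can be recomputed from the aligned rows. The following is a minimal sketch, assuming that percent identity is the number of identical aligned positions divided by the alignment length; the function name is hypothetical, and the 42-residue unique tail of the variant is replaced by an "X" placeholder because one of its residues is illegible in the listing above.

```python
def percent_identity(row_a, row_b):
    """Percent of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)

# Residues 1-141, shared by both rows of both alignments above.
COMMON_141 = (
    "MGPPSAPPCRLHVPWKEVLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE"
    "VLLLAHNLPQNRIGYSWYKGERVDGNSLIVGYVIGTQQATPGPAYSGRET"
    "IYPNASLLIQNVTQNDTGFYTLQVIKSDLVNEEATGQFHVY"
)

# First alignment: 143 columns, rows differ only at position 142 (R versus P).
first = percent_identity(COMMON_141 + "RE", COMMON_141 + "PE")

# Second alignment: 183 columns, the last 42 are a gap in the known protein.
# "X" * 42 is a placeholder for the variant's unique tail, not real sequence.
second = percent_identity(COMMON_141 + "X" * 42, COMMON_141 + "-" * 42)

print(round(first, 2), round(second, 2))
```

Under these assumptions the two values agree with the reported 99.30 (142 of 143 positions) and 77.05 (141 of 183 positions).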
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 (T10888) transcripts which are detectable by amplicon as depicted in sequence name T10888 junc11-17 in normal and cancerous colon tissues.
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to junc11-17 [node(s)/edge], T10888 junc11-17 amplicon (SEQ ID NO: 1279) and junc11-17 primers (SEQ ID NO: 1277-1278) was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 10 is a histogram showing over-expression of the above-indicated CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 10, the expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel", above). Notably, an over-expression of at least 3 fold was found in 15 out of 36 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
The P value for the difference in the expression levels of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by the above amplicon(s) in colon cancer samples versus the normal tissue samples was determined by T test as 5.36E-03. A threshold of 3 fold over-expression was found to differentiate between cancer and normal samples with a P value of 7.41E-03, as checked by the Fisher exact test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T10888junc11-17F forward primer; and T10888junc11-17R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T10888junc11-17.
Forward primer (SEQ ID NO: 1277): CCAGCAATCCACACAAGAGCT
Reverse primer (SEQ ID NO: 1278): CAGGGTCTGGTCCAATCAGAG
Amplicon (SEQ ID NO: 1279): CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGATCCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACGATGATCACAGTCTCTGATTGGACCAGACCCTG
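The Fisher exact test on the 3-fold threshold can be illustrated with a short calculation. This is a sketch under stated assumptions, not the procedure actually used: it takes 15 of the 36 adenocarcinoma samples as above threshold (as reported), counts 11 normal PM samples from the listed sample numbers (41, 52, 62-67, 69-71), and assumes that none of those normal samples reached the threshold, which the text does not state explicitly. The function name is hypothetical and the test is computed one-sided.

```python
from math import comb

def fisher_one_sided(pos_a, n_a, pos_b, n_b):
    """One-sided Fisher exact P value for a 2x2 table.

    Probability, with all margins fixed, of observing pos_a or more
    positives in group A (hypergeometric upper tail).
    """
    total = n_a + n_b
    positives = pos_a + pos_b
    upper = min(n_a, positives)
    tail = sum(
        comb(n_a, k) * comb(n_b, positives - k)
        for k in range(pos_a, upper + 1)
    )
    return tail / comb(total, positives)

# Assumed table: 15/36 cancer samples and 0/11 normal samples above 3 fold.
p_value = fisher_one_sided(pos_a=15, n_a=36, pos_b=0, n_b=11)
print(f"{p_value:.2e}")
```

With these assumed counts the result is close to the reported 7.41E-03, which suggests, but does not prove, that the reported value was obtained from a table of this shape.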
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888 junc11-17, in different normal tissues.
Expression of CEA6_HUMAN Carcinoembryonic antigen-related cell adhesion molecule 6 transcripts detectable by or according to T10888 junc11-17 amplicon (SEQ ID NO: 1282) and T10888 junc11-17F (SEQ ID NO: 1280) and T10888 junc11-17R (SEQ ID NO: 1281) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 11, presenting the histogram showing the expression of T10888 transcripts, which are detectable by amplicon as depicted in sequence name T10888 junc11-17, in different normal tissues.
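The two-step normalization described above (geometric mean of the housekeeping genes, then division by the median of a reference group) can be sketched as follows. The function names, sample identifiers and quantities are hypothetical illustration values, not data from the experiment.

```python
from statistics import geometric_mean, median

def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity divided by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)

def relative_expression(samples, reference_ids):
    """Divide every normalized quantity by the median of the reference group.

    samples: {sample_id: (target_qty, [housekeeping_qty, ...])}
    reference_ids: ids of the reference samples (for example the normal PM
    samples, or the ovary samples in the normal-tissue panel).
    """
    norm = {sid: normalized_quantity(t, hk) for sid, (t, hk) in samples.items()}
    ref_median = median(norm[sid] for sid in reference_ids)
    return {sid: qty / ref_median for sid, qty in norm.items()}

# Hypothetical quantities: three reference samples and one test sample.
samples = {
    "ref_1": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "ref_2": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "ref_3": (6.0, [1.0, 1.0, 4.0, 4.0]),
    "test_1": (24.0, [2.0, 2.0, 2.0, 2.0]),
}
fold = relative_expression(samples, ["ref_1", "ref_2", "ref_3"])
print({sid: round(value, 2) for sid, value in fold.items()})
```

Using the geometric rather than arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor, and the median makes the reference level robust to an outlying reference sample.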
Forward primer (SEQ ID NO: 1280): CCAGCAATCCACACAAGAGCT
Reverse primer (SEQ ID NO: 1281): CAGGGTCTGGTCCAATCAGAG
Amplicon (SEQ ID NO: 1282): CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGATCCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACGATGATCACAGTCTCTGATTGGACCAGACCCTG
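As a consistency check on the primer and amplicon sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below uses only the sequences given in the text; the helper name is hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

FORWARD = "CCAGCAATCCACACAAGAGCT"
REVERSE = "CAGGGTCTGGTCCAATCAGAG"
AMPLICON = (
    "CCAGCAATCCACACAAGAGCTCTTTATCCCCAACATCACTGTGAATAATAGCGGAT"
    "CCTATATGTGCCAAGCCCATAACTCAGCCACTGGCCTCAATAGGACCACAGTCACG"
    "ATGATCACAGTCTCTGATTGGACCAGACCCTG"
)

# The forward primer anneals at the 5' end of the amplicon's sense strand;
# the reverse primer anneals to the opposite strand at the 3' end.
starts_ok = AMPLICON.startswith(FORWARD)
ends_ok = AMPLICON.endswith(reverse_complement(REVERSE))
print(starts_ok, ends_ok, len(AMPLICON))
```

Both checks hold for the sequences as listed, so the three SEQ ID entries are mutually consistent.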
DESCRIPTION FOR CLUSTER H14624
Cluster H14624 features 1 transcript(s) and 15 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000295_0001
Table 2 - Segments of interest
Figure imgf000296_0001
Table 3 - Proteins of interest
Protein Name    SEQ ID NO:
H14624_P15      540
Cluster H14624 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 12 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors and pancreas carcinoma.
Table 4 - Normal tissue distribution
Figure imgf000297_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000297_0002
Figure imgf000298_0001
As noted above, cluster H14624 features 1 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided. Variant protein H14624_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H14624_T20. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between H14624_P15 and Q9HAP5 (SEQ ID NO: 1384):
1. An isolated chimeric polypeptide encoding for H14624_P15, comprising a first amino acid sequence being at least 90% homologous to MLQGPGSLLLLFLASHCCLGSARGLFLFGQPDFSYKRSNCKPIPANLQLCHGIEYQNMR LPNLLGHETMKEVLEQAGAWIPLVMKQCHPDTKKFLCSLFAPVCLDDLDETIQPCHSLC VQVKDRCAPVMSAFGFPWPDMLECDRFPQDNDLCIPLASSDHLLPATEE corresponding to amino acids 1 - 167 of Q9HAP5, which also corresponds to amino acids 1 - 167 of H14624_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKPSLLLPHSLLG corresponding to amino acids 168 - 180 of H14624_P15, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H14624_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKPSLLLPHSLLG in H14624_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein H14624_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H14624_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000299_0001
Variant protein H14624_P15 is encoded by the following transcript(s): H14624_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H14624_T20 is shown in bold; this coding portion starts at position 857 and ends at position 1396. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H14624_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000300_0001
As noted above, cluster H14624 features 15 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster H14624_node_0 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 8 below describes the starting and ending position of this segment on each transcript.
Figure imgf000301_0001
Segment cluster H14624_node_16 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Figure imgf000301_0002
Segment cluster H14624_nodeJ according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Figure imgf000301_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster H14624_node_10 according to the present invention can be found in the following transcript(s): H14624_T20. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
H14624_T20         1070                         1079
Segment cluster H14624_node_11 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Figure imgf000302_0001
Segment cluster H14624_node_12 according to the present invention can be found in the following transcript(s): H14624_T20. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Figure imgf000302_0002
Segment cluster H14624_node_13 according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 14 below describes the starting and ending position of this segment on each transcript.
Figure imgf000303_0001
Segment cluster H14624_node_14 according to the present invention is supported by 114 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
H14624_T20         1228                         1287
Segment cluster H14624_node_15 according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf000303_0002
Segment cluster H14624_nodeJ according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf000304_0001
Segment cluster H14624_node according to the present invention can be found in the following transcript(s): H14624_T20. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf000304_0002
Segment cluster H14624_node_6 according to the present invention can be found in the following transcript(s): H14624_T20. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf000304_0003
Segment cluster H14624_nodeJ according to the present invention can be found in the following transcript(s): H14624_T20. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000304_0004
Segment cluster H14624_nodeJ according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf000305_0001
Segment cluster H14624_node_9 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf000305_0002
Variant protein alignment to the previously known protein: Sequence name: /tmp/UpblSbFkrj/N4PrGQAB2V:Q9HAP5 Sequence documentation:
Alignment of: H14624_P15 x Q9HAP5
Alignment segment 1/1: Quality: 1702.00 Escore: 0 Matching length: 167 Total length: 167 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MLQGPGSLLLLFLASHCCLGSARGLFLFGQPDFSYKRSNCKPIPANLQLC 50
1 MLQGPGSLLLLFLASHCCLGSARGLFLFGQPDFSYKRSNCKPIPANLQLC 50
51 HGIEYQNMRLPNLLGHETMKEVLEQAGAWIPLVMKQCHPDTKKFLCSLFA 100
51 HGIEYQNMRLPNLLGHETMKEVLEQAGAWIPLVMKQCHPDTKKFLCSLFA 100
101 PVCLDDLDETIQPCHSLCVQVKDRCAPVMSAFGFPWPDMLECDRFPQDND 150
101 PVCLDDLDETIQPCHSLCVQVKDRCAPVMSAFGFPWPDMLECDRFPQDND 150
151 LCIPLASSDHLLPATEE 167
151 LCIPLASSDHLLPATEE 167
Figure imgf000307_0001
DESCRIPTION FOR CLUSTER H53626
Cluster H53626 features 2 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000308_0001
Figure imgf000309_0001
Cluster H53626 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 13 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and myosarcoma.
Table 4 - Normal tissue distribution Name of Tissue Number
Figure imgf000310_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000310_0002
Figure imgf000311_0001
As noted above, cluster H53626 features 2 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein H53626_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53626_PEA_1_T15. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between H53626_PEA_1_P4 and Q8N441 (SEQ ID NO: 1385):
1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKWPRQVARLGRTVRLQCPVEGDPPP LTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVV LDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVAS GHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGAINAT YKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCKVRSDVKPVIQWLKRVEYGAEGR HNSTIDVGGQKFWLPTGDVWSRPDGSYLNKLLITRARQDDAGMYICLGANTMGYSFR SAFLTVLP corresponding to amino acids 1 - 357 of Q8N441, which also corresponds to amino acids 1 - 357 of H53626_PEA_1_P4, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GARLPRHATPCWCPDPPPGPGVPPTGWGPTLPSRAVLARSSAEGGQPRGTVSTAPGMG LGCSPGLCVGVPLPTSFPLALA corresponding to amino acids 358 - 437 of H53626_PEA_1_P4, and a third amino acid sequence being at least 90% homologous to DPKPPGPPVASSSSATSLPWPVVIGIPAGAVFILGTLLLWLCQAQKKPCTPAPAPPLPGH RPPGTARDRSGDKDLPSLAALSAGPGVGLCEEHGSPAAPQHLLGPGPVAGPKLYPKLY TDIHTHTHTHSHTHSHVEGKVHQHIHYQC corresponding to amino acids 358 - 504 of Q8N441, which also corresponds to amino acids 438 - 584 of H53626_PEA_1_P4, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of H53626_PEA_1_P4, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GARLPRHATPCWCPDPPPGPGVPPTGWGPTLPSRAVLARSSAEGGQPRGTVSTAPGMG LGCSPGLCVGVPLPTSFPLALA, corresponding to H53626_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein H53626_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000313_0001
Variant protein H53626_PEA_1_P4 is encoded by the following transcript(s): H53626_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53626_PEA_1_T15 is shown in bold; this coding portion starts at position 17 and ends at position 1768. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000313_0002
Figure imgf000314_0001
Variant protein H53626_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53626_PEA_1_T16. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between H53626_PEA_1_P5 and Q9H4D7 (SEQ ID NO: 1386):
1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKWPRQVARLGRTVRLQCPVEGDPPP LTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVV LDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVAS GHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGAINAT YKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCK corresponding to amino acids 1 - 269 of Q9H4D7, which also corresponds to amino acids 1 - 269 of H53626_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLG TARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNS TQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKA GDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP corresponding to amino acids 270 - 490 of H53626_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H53626_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLG TARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNS TQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKA GDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP in H53626_PEA_1_P5.
Comparison report between H53626_PEA_1_P5 and Q8N441 (SEQ ID NO: 1385):
1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKWPRQVARLGRTVRLQCPVEGDPPP LTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVV LDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVAS GHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGATNAT YKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCK corresponding to amino acids 1 - 269 of Q8N441, which also corresponds to amino acids 1 - 269 of H53626_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLG TARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNS TQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKA GDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP corresponding to amino acids 270 - 490 of H53626_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of H53626_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLG TARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNS TQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKA GDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP in H53626_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein H53626_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf000316_0001
Figure imgf000317_0001
Variant protein H53626_PEA_1_P5 is encoded by the following transcript(s): H53626_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53626_PEA_1_T16 is shown in bold; this coding portion starts at position 17 and ends at position 1486. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000317_0002
Figure imgf000318_0001
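The coordinates quoted above for the two H53626 transcripts can be cross-checked against the amino acid ranges given in the comparison reports. The sketch below assumes 1-based, inclusive nucleotide positions and a coding portion that is an exact multiple of three nucleotides; the function names and the dictionary layout are hypothetical, while the numbers are those stated in the text.

```python
def coding_codons(start, end):
    """Number of codons in a coding portion given 1-based inclusive bounds."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3

def segments_tile(segments):
    """True if amino acid ranges start at 1 and are contiguous, in order."""
    expected_start = 1
    for start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True

# Coding-portion bounds and chimeric-polypeptide ranges as stated above.
VARIANTS = {
    "H53626_PEA_1_P4": {"cds": (17, 1768),
                        "segments": [(1, 357), (358, 437), (438, 584)]},
    "H53626_PEA_1_P5": {"cds": (17, 1486),
                        "segments": [(1, 269), (270, 490)]},
}

report = {}
for name, info in VARIANTS.items():
    codons = coding_codons(*info["cds"])
    last_residue = info["segments"][-1][1]
    report[name] = (codons, segments_tile(info["segments"]), codons == last_residue)
print(report)
```

For both variants the codon count equals the last amino acid position (584 and 490), which is consistent with the stated coding portion not counting a stop codon; that reading is an inference from the arithmetic rather than something the text states.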
As noted above, cluster H53626 features 20 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster H53626_PEA_1_node_15 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Figure imgf000318_0002
Segment cluster H53626_PEA_1_node_22 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Figure imgf000319_0001
Segment cluster H53626JPEAJ iode 5 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Figure imgf000319_0002
Segment cluster H53626_PEA_1_node_26 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Figure imgf000319_0003
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides (related to colon cancer) were found to hit this segment, shown in Table 15.
Table 15 - Oligonucleotides related to this segment
Figure imgf000319_0004
Segment cluster H53626_PEA_1_node_27 according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000320_0001
Segment cluster H53626_PEA_1_node_34 according to the present invention is supported by 121 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000320_0002
Segment cluster H53626_PEA_1_node_35 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000320_0003
Segment cluster H53626_PEA_1_node_36 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000321_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster H53626_PEA_1_node_11 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000321_0002
Segment cluster H53626_PEA_1_node_12 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000321_0003
Segment cluster H53626_PEA_1_node_16 according to the present invention can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf000322_0001
Segment cluster H53626_PEA_1_node_19 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000322_0002
Segment cluster H53626_PEA_1_node_20 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000322_0003
Segment cluster H53626_PEA_1_node_24 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000323_0001
Segment cluster H53626_PEA_1_node_28 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000323_0002
Segment cluster H53626_PEA_1_node_29 according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000323_0003
Segment cluster H53626_PEA_1_node_30 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000323_0004
Figure imgf000324_0001
Segment cluster H53626_PEA_1_node_31 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000324_0002
Segment cluster H53626_PEA_1_node_32 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000324_0003
Segment cluster H53626_PEA_1_node_33 according to the present invention can be found in the following transcript(s): H53626_PEA_1_T15 and H53626_PEA_1_T16. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000324_0004
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 junc24-27F1R3, in normal and cancerous colon tissues.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to junc24-27, H53626 junc24-27F1R3 amplicon (SEQ ID NO: 1285) and H53626 junc24-27F1 (SEQ ID NO: 1283) and H53626 junc24-27R3 (SEQ ID NO: 1284) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3 above, "Tissue samples in colon cancer testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 14 is a histogram showing overexpression of the above-indicated Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 14, the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, "Tissue samples in colon cancer testing panel"). Notably, an overexpression of at least 5 fold was found in 13 out of 36 adenocarcinoma samples.
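The normalization described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the analysis code behind Figure 14: the function names, the dictionary layout, the sample labels and all quantities below are hypothetical and chosen purely to make the arithmetic easy to follow.

```python
from statistics import geometric_mean, median


def fold_upregulation(target, housekeeping, normal_ids):
    """Fold up-regulation of each sample relative to the median normal sample.

    target:       {sample_id: amplicon quantity}
    housekeeping: {gene_name: {sample_id: quantity}}
    normal_ids:   ids of the normal reference samples
    (These layouts are assumptions made for this sketch.)
    """
    normalized = {}
    for sample, quantity in target.items():
        hk = [per_sample[sample] for per_sample in housekeeping.values()]
        # Normalize to the geometric mean of the housekeeping-gene quantities.
        normalized[sample] = quantity / geometric_mean(hk)
    # Divide by the median normalized quantity of the normal samples.
    baseline = median(normalized[s] for s in normal_ids)
    return {s: value / baseline for s, value in normalized.items()}


def count_overexpressed(folds, sample_ids, threshold=5.0):
    """Count samples whose fold up-regulation reaches the threshold."""
    return sum(1 for s in sample_ids if folds[s] >= threshold)


# Invented quantities, for illustration only.
target = {"N1": 2.0, "N2": 4.0, "N3": 8.0, "T1": 40.0, "T2": 6.0}
housekeeping = {
    "HK_A": {"N1": 1.0, "N2": 1.0, "N3": 1.0, "T1": 1.0, "T2": 1.0},
    "HK_B": {"N1": 4.0, "N2": 4.0, "N3": 4.0, "T1": 4.0, "T2": 4.0},
}
folds = fold_upregulation(target, housekeeping, normal_ids=["N1", "N2", "N3"])
n_over = count_overexpressed(folds, ["T1", "T2"])
```

With these invented numbers the housekeeping geometric mean is 2 for every sample, the normal median is 2, and only the hypothetical sample T1 reaches the 5-fold threshold.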
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53626 junc24-27F1 forward primer; and H53626 junc24-27R3 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53626 junc24-27F1R3.
Forward primer (SEQ ID NO: 1283): GTCCTTCCAGTGCAAGACCCA
Reverse primer (SEQ ID NO: 1284): TGGGCCTGGCAAAGCC
Amplicon (SEQ ID NO: 1285):
GTCCTTCCAGTGCAAGACCCAAAACCGCCAGGGCCACCTGTGGCCTCCTCGTCCTCGGCCACTAGCCTGCCGTGGCCCGTGGTCATCGGCATCCCAGCCGGCGCTGTCTTCATCCTGGGCACCCTGCTCCTGTGGCTTTGCCAGGCCCA
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 seg25, in normal and cancerous colon tissues.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to seg25, H53626 seg25 amplicon (SEQ ID NO: 1288) and H53626 seg25F (SEQ ID NO: 1286) and H53626 seg25R (SEQ ID NO: 1287) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3 above, "Tissue samples in colon cancer testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 15 is a histogram showing overexpression of the above-indicated Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 15, the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by the above amplicon was higher in a few cancer samples than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, "Tissue samples in colon cancer testing panel"). Notably, an overexpression of at least 5 fold was found in 6 out of 36 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53626 seg25F forward primer; and H53626 seg25R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53626 seg25.
Forward primer (SEQ ID NO: 1286): CCGACGGCTCCTACCTCAA
Reverse primer (SEQ ID NO: 1287): GGAAGCTGTAGCCCATGGTGT
Amplicon (SEQ ID NO: 1288):
CCGACGGCTCCTACCTCAATAAGCTGCTCATCACCCGTGCCCGCCAGGACGATGCGGGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTCC
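As a consistency check on the listed sequences, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following Python sketch illustrates that check for the seg25 sequences quoted above; the helper names are hypothetical and the check is purely textual, so it says nothing about primer performance in an actual PCR.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    # Remove any whitespace left over from line wrapping in the listing.
    amplicon = "".join(amplicon.split()).upper()
    return (amplicon.startswith(forward.upper())
            and amplicon.endswith(reverse_complement(reverse)))


# Sequences as listed for SEQ ID NOs: 1286, 1287 and 1288.
seg25_forward = "CCGACGGCTCCTACCTCAA"
seg25_reverse = "GGAAGCTGTAGCCCATGGTGT"
seg25_amplicon = (
    "CCGACGGCTCCTACCTCAATAAGCTGCTCATCACCCGTGCCCGCCAGGACGATGCG"
    "GGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTCC"
)

seg25_ok = primers_flank_amplicon(seg25_forward, seg25_reverse, seg25_amplicon)
```

The same function can be applied to the junc24-27 primer pair and amplicon (SEQ ID NOs: 1283 to 1285).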
It should be noted that the variant expression pattern was found to be similar to the expression pattern of the wild-type (previously known) transcript. However, in some cases (as for colon cancer), overexpression of the variant (for example H53626_FGF-RL_T16 transcript) seems to be higher than that of the previously known transcript.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 seg25, in different normal tissues.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon (SEQ ID NO: 1288) and H53626 seg25F (SEQ ID NO: 1286) and H53626 seg25R (SEQ ID NO: 1287) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2 above, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the lung samples.
Forward primer (SEQ ID NO: 1286): CCGACGGCTCCTACCTCAA
Reverse primer (SEQ ID NO: 1287): GGAAGCTGTAGCCCATGGTGT
Amplicon (SEQ ID NO: 1288):
CCGACGGCTCCTACCTCAATAAGCTGCTCATCACCCGTGCCCGCCAGGACGATGCGGGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTCC
The results are presented in Figure 71, showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon(s) and H53626 seg25F and H53626 seg25R in different normal tissues.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 junc24-27F1R3, in different normal tissues.
Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 junc24-27F1R3 amplicon (SEQ ID NO: 1285) and H53626 junc24-27F1 (SEQ ID NO: 1283) and H53626 junc24-27R3 (SEQ ID NO: 1284) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2 above, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the lung samples.
Forward primer (SEQ ID NO: 1283): GTCCTTCCAGTGCAAGACCCA
Reverse primer (SEQ ID NO: 1284): TGGGCCTGGCAAAGCC
Amplicon (SEQ ID NO: 1285):
GTCCTTCCAGTGCAAGACCCAAAACCGCCAGGGCCACCTGTGGCCTCCTCGTCCTCGGCCACTAGCCTGCCGTGGCCCGTGGTCATCGGCATCCCAGCCGGCGCTGTCTTCATCCTGGGCACCCTGCTCCTGTGGCTTTGCCAGGCCCA
The results are presented in Figure 72, showing the expression of fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 junc24-27F1R3 amplicon(s) and H53626 junc24-27F1 and H53626 junc24-27R3 in different normal tissues.
Variant protein alignment to the previously known protein:
Sequence name: /tmp/KlMec2ReKO/eglEUS2AXY:Q8N441
Sequence documentation :
Alignment of: H53626_PEA_1_P4 x Q8N441
Alignment segment 1/1:
Quality: 4882.00
Escore: 0
Matching length: 504
Total length: 584
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 86.30
Total Percent Identity: 86.30
Gaps: 1
Alignment:
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50

 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100

101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150

151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200

201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250

251 THPVNTTVDFGGTTSFQCKVRSDVKPVIQWLKRVEYGAEGRHNSTIDVGG 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 THPVNTTVDFGGTTSFQCKVRSDVKPVIQWLKRVEYGAEGRHNSTIDVGG 300

301 QKFVVLPTGDVWSRPDGSYLNKLLITRARQDDAGMYICLGANTMGYSFRS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 QKFVVLPTGDVWSRPDGSYLNKLLITRARQDDAGMYICLGANTMGYSFRS 350

351 AFLTVLPGARLPRHATPCWCPDPPPGPGVPPTGWGPTLPSRAVLARSSAE 400
    |||||||
351 AFLTVLP........................................... 357

401 GGQPRGTVSTAPGMGLGCSPGLCVGVPLPTSFPLALADPKPPGPPVASSS 450
                                         |||||||||||||
358 .....................................DPKPPGPPVASSS 370

451 SATSLPWPVVIGIPAGAVFILGTLLLWLCQAQKKPCTPAPAPPLPGHRPP 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
371 SATSLPWPVVIGIPAGAVFILGTLLLWLCQAQKKPCTPAPAPPLPGHRPP 420

501 GTARDRSGDKDLPSLAALSAGPGVGLCEEHGSPAAPQHLLGPGPVAGPKL 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
421 GTARDRSGDKDLPSLAALSAGPGVGLCEEHGSPAAPQHLLGPGPVAGPKL 470

551 YPKLYTDIHTHTHTHSHTHSHVEGKVHQHIHYQC 584
    ||||||||||||||||||||||||||||||||||
471 YPKLYTDIHTHTHTHSHTHSHVEGKVHQHIHYQC 504
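The summary figures reported for the H53626_PEA_1_P4 x Q8N441 alignment are mutually consistent, as the short Python sketch below illustrates. It assumes, as an inference from the reported numbers rather than from any documented definition, that the "matching" percentages are computed over the matching length and the "total" percentages over the total length; the function name is hypothetical.

```python
def alignment_percentages(identical, matching_length, total_length):
    """Percent identity over the aligned (matching) columns and over the
    total alignment length, rounded to two decimals.

    The two denominators are an assumption inferred from the reported values.
    """
    return {
        "matching_percent_identity": round(100.0 * identical / matching_length, 2),
        "total_percent_identity": round(100.0 * identical / total_length, 2),
    }


# Reported values: 504 matching positions, all identical, in a 584-column alignment.
p4_stats = alignment_percentages(identical=504, matching_length=504, total_length=584)
# The unmatched columns correspond to the single gap in the alignment.
gap_columns = 584 - 504
```

Under these assumptions the 80 unmatched columns correspond to the stretch of the variant (positions 358 to 437) that has no counterpart in Q8N441.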
Sequence name: /tmp/oSUZaRW3WK/oSh3fN5ZtO:Q9H4D7
Sequence documentation:
Alignment of: H53626_PEA_1_P5 x Q9H4D7
Alignment segment 1/1:
Quality: 2644.00
Escore: 0
Matching length: 269
Total length: 269
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50

 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100

101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150

151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200

201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250

251 THPVNTTVDFGGTTSFQCK 269
    |||||||||||||||||||
251 THPVNTTVDFGGTTSFQCK 269
Sequence name: /tmp/oSUZaRW3WK/oSh3fN5ZtO:Q8N441
Sequence documentation:
Alignment of: H53626_PEA_1_P5 x Q8N441
Alignment segment 1/1:
Quality: 2644.00
Escore: 0
Matching length: 269
Total length: 269
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQ 50

 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCK 100

101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFT 150

151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPR 200

201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTG 250

251 THPVNTTVDFGGTTSFQCK 269
    |||||||||||||||||||
251 THPVNTTVDFGGTTSFQCK 269

DESCRIPTION FOR CLUSTER HSENA78
Cluster HSENA78 features 1 transcript(s) and 7 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000335_0001
These sequences are variants of the known protein Small inducible cytokine B5 precursor (SwissProt accession identifier SZ05_HUMAN; known also according to the synonyms CXCL5; Epithelial-derived neutrophil activating protein 78; Neutrophil-activating peptide ENA-78), SEQ ID NO: 618, referred to herein as the previously known protein. Protein Small inducible cytokine B5 precursor is known or believed to have the following function(s): Involved in neutrophil activation. The sequence for protein Small inducible cytokine B5 precursor is given at the end of the application, as "Small inducible cytokine B5 precursor amino acid sequence". Protein Small inducible cytokine B5 precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemotaxis; signal transduction; cell-cell signaling; positive control of cell proliferation, which are annotation(s) related to Biological Process; and chemokine, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster HSENA78 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 16 and Table 4.
This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and lung malignant tumors.
Table 4 - Normal tissue distribution
Figure imgf000336_0001
Figure imgf000337_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000337_0002
As noted above, cluster HSENA78 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Small inducible cytokine B5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSENA78_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSENA78_T5. An alignment is given to the known protein (Small inducible cytokine B5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSENA78_P2 and SZ05_HUMAN:
1. An isolated chimeric polypeptide encoding for HSENA78_P2, comprising a first amino acid sequence being at least 90 % homologous to MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCVCLQTTQGVHPKMISNLQVFAIGPQCSKVEVV corresponding to amino acids 1 - 81 of SZ05_HUMAN, which also corresponds to amino acids 1 - 81 of HSENA78_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSENA78_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Figure imgf000338_0001
Variant protein HSENA78_P2 is encoded by the following transcript(s): HSENA78_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSENA78_T5 is shown in bold; this coding portion starts at position 149 and ends at position 391. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Figure imgf000338_0002
Figure imgf000339_0001
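The coding-portion coordinates quoted above for the HSENA78 transcript (positions 149 to 391) can be checked against the stated length of the variant protein (amino acids 1 - 81). The Python sketch below is illustrative only; it assumes the positions are 1-based and inclusive, which the text does not state explicitly, and the function name is hypothetical.

```python
def coding_region_summary(start, end):
    """Return (length in nucleotides, number of codons) for a coding region
    given by 1-based inclusive start and end positions (an assumption)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding region length is not a multiple of three")
    return length, length // 3


cds_length, cds_codons = coding_region_summary(149, 391)
```

Under that assumption the coding portion spans 243 nucleotides, that is 81 codons, which agrees with the 81 residues reported for the variant protein if the quoted end position does not include a stop codon.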
As noted above, cluster HSENA78 features 7 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSENA78_node_0 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Figure imgf000339_0002
Segment cluster HSENA78_node_2 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Figure imgf000340_0001
Segment cluster HSENA78_node_6 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Figure imgf000340_0002
Segment cluster HSENA78_node_9 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Figure imgf000340_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSENA78_nodeJ according to the present invention is supported by 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Figure imgf000340_0004
Segment cluster HSENA78_nodeJ according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000341_0001
Segment cluster HSENA78_node_8 according to the present invention can be found in the following transcript(s): HSENA78_T5. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000341_0002
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment with regard to colon cancer, shown in Table 15. Table 15 - Oligonucleotides related to this gene
Figure imgf000341_0003
Variant protein alignment to the previously known protein:
Sequence name: /tmp/5kiQY6MxWx/pLnTrxsCqk:SZ05_HUMAN
Sequence documentation:
Alignment of: HSENA78_P2 x SZ05_HUMAN
Alignment segment 1/1:
Quality: 767.00
Escore: 0
Matching length: 81
Total length: 81
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCV 50

 51 CLQTTQGVHPKMISNLQVFAIGPQCSKVEVV 81
    |||||||||||||||||||||||||||||||
 51 CLQTTQGVHPKMISNLQVFAIGPQCSKVEVV 81
DESCRIPTION FOR CLUSTER HUMGROG5
Cluster HUMGROG5 features 4 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000343_0001
Figure imgf000344_0001
These sequences are variants of the known protein Macrophage inflammatory protein-2-beta precursor (SwissProt accession identifier MI2B_HUMAN; known also according to the synonyms MIP2-beta; CXCL3; Growth regulated protein gamma; GRO-gamma), SEQ ID NO: 619, referred to herein as the previously known protein. Protein Macrophage inflammatory protein-2-beta precursor is known or believed to have the following function(s): May play a role in inflammation and exert its effects on endothelial cells in an autocrine fashion. The sequence for protein Macrophage inflammatory protein-2-beta precursor is given at the end of the application, as "Macrophage inflammatory protein-2-beta precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Figure imgf000344_0002
Protein Macrophage inflammatory protein-2-beta precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemokine, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMGROG5 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Macrophage inflammatory protein-2-beta precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMGROG5_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGROG5_PEA_1_T3. An alignment is given to the known protein (Macrophage inflammatory protein-2-beta precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMGROG5_PEA_1_P2 and MI2B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMGROG5_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQGIHLKNIQSVNVRSPGPHCAQTEV corresponding to amino acids 1 - 74 of MI2B_HUMAN, which also corresponds to amino acids 1 - 74 of HUMGROG5_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGROG5_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Figure imgf000346_0001
Variant protein HUMGROG5_PEA_1_P2 is encoded by the following transcript(s): HUMGROG5_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGROG5_PEA_1_T3 is shown in bold; this coding portion starts at position 196 and ends at position 420. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Figure imgf000346_0002
Variant protein HUMGROG5_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGROG5_PEA_1_T4. An alignment is given to the known protein (Macrophage inflammatory protein-2-beta precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMGROG5_PEA_1_P3 and MI2B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMGROG5_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQGIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKKACLNPASPMVQKIIEKILNK corresponding to amino acids 1 - 103 of MI2B_HUMAN, which also corresponds to amino acids 1 - 103 of HUMGROG5_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGROG5_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Figure imgf000347_0001
Variant protein HUMGROG5_PEA_1_P3 is encoded by the following transcript(s): HUMGROG5_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGROG5_PEA_1_T4 is shown in bold; this coding portion starts at position 196 and ends at position 504. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Figure imgf000348_0001
Variant protein HUMGROG5_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGROG5_PEA_1_T9. An alignment is given to the known protein (Macrophage inflammatory protein-2-beta precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMGROG5_PEA_1_P7 and MI2B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMGROG5_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQGIHLKNIQSVN corresponding to amino acids 1 - 61 of MI2B_HUMAN, which also corresponds to amino acids 1 - 61 of HUMGROG5_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHTQEWEESLSQPRIPHGSENHRKDTEQGEHQLTGEK corresponding to amino acids 62 - 98 of HUMGROG5_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMGROG5_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHTQEWEESLSQPRIPHGSENHRKDTEQGEHQLTGEK in HUMGROG5_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMGROG5_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Figure imgf000349_0001
Figure imgf000350_0001
Variant protein HUMGROG5_PEA_1_P7 is encoded by the following transcript(s): HUMGROG5_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGROG5_PEA_1_T9 is shown in bold; this coding portion starts at position 196 and ends at position 489. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Figure imgf000350_0002
Variant protein HUMGROG5_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGROG5_PEA_1_T6. An alignment is given to the known protein (Macrophage inflammatory protein-2-beta precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMGROG5_PEA_1_P12 and MI2B_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMGROG5_PEA_1_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MHKKGSPILGSHTARVAGTSPPALPLLAQLPDASAEPHGPRHALRRPQQSPAPAGGAAAPAPGGRQPARSRWVPAPWGPRAGRGWGGRPAPTAPLNQRVYSSL corresponding to amino acids 1 - 103 of HUMGROG5_PEA_1_P12, and a second amino acid sequence being at least 90% homologous to GASVVTELRCQCLQTLQGIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKKACLNPASPMVQKIIEKILNKGSTN corresponding to amino acids 34 - 107 of MI2B_HUMAN, which also corresponds to amino acids 104 - 177 of HUMGROG5_PEA_1_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HUMGROG5_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MHKKGSPILGSHTARVAGTSPPALPLLAQLPDASAEPHGPRHALRRPQQSPAPAGGAAAPAPGGRQPARSRWVPAPWGPRAGRGWGGRPAPTAPLNQRVYSSL of HUMGROG5_PEA_1_P12.
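The coordinate correspondence stated in the comparison report (amino acids 34 - 107 of MI2B_HUMAN corresponding to amino acids 104 - 177 of the variant) can be expressed as a fixed offset between the two numbering systems. The following is a minimal sketch only, assuming 1-based inclusive positions and a gap-free shared block; the class name `SharedBlock` and its methods are hypothetical and are not part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedBlock:
    """A gap-free stretch shared by a variant protein and the known protein.

    All positions are 1-based and inclusive (an assumption of this sketch).
    """
    variant_start: int
    variant_end: int
    known_start: int
    known_end: int

    def __post_init__(self) -> None:
        # A gap-free block must span the same number of residues on both sides.
        if self.variant_end - self.variant_start != self.known_end - self.known_start:
            raise ValueError("block lengths differ between variant and known protein")

    @property
    def length(self) -> int:
        return self.variant_end - self.variant_start + 1

    def to_known(self, variant_pos: int) -> int:
        """Map a position on the variant to the matching position on the known protein."""
        if not self.variant_start <= variant_pos <= self.variant_end:
            raise ValueError("position lies outside the shared block")
        return variant_pos - self.variant_start + self.known_start


# Coordinates taken from the comparison report for HUMGROG5_PEA_1_P12.
P12_SHARED = SharedBlock(variant_start=104, variant_end=177, known_start=34, known_end=107)

print(P12_SHARED.length)         # number of shared residues
print(P12_SHARED.to_known(104))  # first shared residue on MI2B_HUMAN
print(P12_SHARED.to_known(177))  # last shared residue on MI2B_HUMAN
```

Positions 1 - 103 of the variant fall outside the block, which is consistent with the report describing them as a unique head.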
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein HUMGROG5_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Figure imgf000351_0001
Figure imgf000352_0001
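The localization rationale given for the variants above (a signal peptide predicted by one or both signal-peptide programs, and no trans-membrane region predicted) can be summarized as a simple decision rule. The sketch below is illustrative only: how the actual analysis combines the program outputs is not stated in the text, so the rule, the function name `predict_localization`, and the "undetermined" fallback are all assumptions.

```python
from typing import Iterable


def predict_localization(signal_peptide_calls: Iterable[bool],
                         transmembrane_calls: Iterable[bool]) -> str:
    """Combine boolean predictor outputs into a localization call.

    Assumed rule (not taken from the application): call "secreted" when at
    least one signal-peptide predictor is positive and no trans-membrane
    predictor is positive; otherwise return "undetermined".
    """
    has_signal_peptide = any(signal_peptide_calls)
    has_transmembrane = any(transmembrane_calls)
    if has_signal_peptide and not has_transmembrane:
        return "secreted"
    return "undetermined"


# Both signal-peptide programs positive, neither trans-membrane program positive.
print(predict_localization([True, True], [False, False]))

# One of two signal-peptide programs positive (HMM: yes, NN: no); the
# trans-membrane calls here are placeholders, since the text does not give them.
print(predict_localization([True, False], [False, False]))
```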
Variant protein HUMGROG5_PEA_1_P12 is encoded by the following transcript(s): HUMGROG5_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGROG5_PEA_1_T6 is shown in bold; this coding portion starts at position 84 and ends at position 614. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGROG5_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Figure imgf000352_0002
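As a consistency check on the coding coordinates quoted above for the four HUMGROG5 transcripts, the length of each coding portion can be converted to a codon count and compared with the length of the encoded variant. The following is a minimal sketch, assuming that positions are 1-based and inclusive and that the quoted end position excludes the stop codon (neither is stated explicitly in the text); the function name `coding_length` is hypothetical.

```python
def coding_length(start: int, end: int) -> tuple[int, int]:
    """Return (nucleotides, codons) for a 1-based, inclusive coding span."""
    if end < start:
        raise ValueError("end position precedes start position")
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return nucleotides, nucleotides // 3


# Start and end positions of the coding portion, as quoted for each transcript.
CODING_SPANS = {
    "HUMGROG5_PEA_1_T3": (196, 420),
    "HUMGROG5_PEA_1_T4": (196, 504),
    "HUMGROG5_PEA_1_T9": (196, 489),
    "HUMGROG5_PEA_1_T6": (84, 614),
}

for transcript, (start, end) in CODING_SPANS.items():
    nucleotides, codons = coding_length(start, end)
    print(f"{transcript}: {nucleotides} nt, {codons} codons")
```

Under these assumptions the spans correspond to 75, 103, 98 and 177 codons, which agrees with the lengths of variants P2, P3, P7 and P12 implied by the amino acid ranges and alignments in this section.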
As noted above, cluster HUMGROG5 features 18 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMGROG5_PEA_1_node_18 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3 and HUMGROG5_PEA_1_T4. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000353_0001
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment with regard to colon cancer, shown in Table 14. Table 14 - Oligonucleotides related to this segment
Segment cluster HUMGROG5_PEA_1_node_19 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000353_0003
Segment cluster HUMGROG5_PEA_1_node_21 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000354_0001
supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000354_0002
Segment cluster HUMGROG5_PEA_1_node_6 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000354_0003
Figure imgf000355_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMGROG5_PEA_1_node_10 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000355_0002
Segment cluster HUMGROG5_PEA_1_node_11 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000355_0003
Figure imgf000356_0001
Segment cluster HUMGROG5_PEA_1_node_12 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4 and HUMGROG5_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000356_0002
Segment cluster HUMGROG5_PEA_1_node_13 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4 and HUMGROG5_PEA_1_T6. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf000356_0003
Segment cluster HUMGROG5_PEA_1_node_14 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000356_0004
Segment cluster HUMGROG5_PEA_1_node_15 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3. Table 24 below describes the starting and ending position of this segment on each transcript.
Figure imgf000357_0001
Segment cluster HUMGROG5_PEA_1_node_16 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMGROG5_PEA_1_T3 | 500 | 532
Segment cluster HUMGROG5_PEA_1_node_17 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000357_0002
Segment cluster HUMGROG5_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000358_0001
Segment cluster HUMGROG5_PEA_1_node_22 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000358_0002
Segment cluster HUMGROG5_PEA_1_node_7 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000358_0003
Figure imgf000359_0001
Segment cluster HUMGROG5_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HUMGROG5_PEA_1_T3, HUMGROG5_PEA_1_T4, HUMGROG5_PEA_1_T6 and HUMGROG5_PEA_1_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000359_0002
Segment cluster HUMGROG5_PEA_1_node_9 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGROG5_PEA_1_T6. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000359_0003
Variant protein alignment to the previously known protein:

Sequence name: /tmp/2xn09xcDbu/OFuYQZgnpt:MI2B_HUMAN
Sequence documentation:
Alignment of: HUMGROG5_PEA_1_P2 x MI2B_HUMAN
Alignment segment 1/1:
Quality: 701.00
Escore: 0
Matching length: 75
Total length: 75
Matching Percent Similarity: 100.00
Matching Percent Identity: 98.67
Total Percent Similarity: 100.00
Total Percent Identity: 98.67
Gaps: 0
Alignment:
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
 51 GIHLKNIQSVNVRSPGPHCAQTEVM 75
    ||||||||||||||||||||||||:
 51 GIHLKNIQSVNVRSPGPHCAQTEVI 75

Sequence name: /tmp/PMlNwtDTrf/oTkbZ2ktxi:MI2B_HUMAN
Sequence documentation:
Alignment of: HUMGROG5_PEA_1_P3 x MI2B_HUMAN
Alignment segment 1/1:
Quality: 979.00
Escore: 0
Matching length: 103
Total length: 103
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
 51 GIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKKACLNPASPMVQKIIEKI 100
 51 GIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKKACLNPASPMVQKIIEKI 100
101 LNK 103
101 LNK 103

Sequence name: /tmp/H0ryq4X077/k 3t8ORy6X:MI2B_HUMAN
Sequence documentation:
Alignment of: HUMGROG5_PEA_1_P7 x MI2B_HUMAN
Alignment segment 1/1:
Quality: 567.00
Escore: 0
Matching length: 61
Total length: 61
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps:
Alignment:
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
  1 MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ 50
 51 GIHLKNIQSVN 61
 51 GIHLKNIQSVN 61

Sequence name: /tmp/eJBNVFGEc7/N3fotcYJ07:MI2B_HUMAN
Sequence documentation:
Alignment of: HUMGROG5_PEA_1_P12 x MI2B_HUMAN
Alignment segment 1/1:
Quality: 721.00
Escore: 0
Matching length: 74
Total length: 74
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps:
Alignment:
104 GASVVTELRCQCLQTLQGIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKK 153
 34 GASVVTELRCQCLQTLQGIHLKNIQSVNVRSPGPHCAQTEVIATLKNGKK 83
154 ACLNPASPMVQKIIEKILNKGSTN 177
    ||||||||||||||||||||||||
 84 ACLNPASPMVQKIIEKILNKGSTN 107
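The identity figure reported for the HUMGROG5_PEA_1_P2 alignment (98.67% over 75 positions) can be reproduced by counting identical residues in the two aligned sequences. The sketch below assumes a gap-free alignment of equal-length sequences, as in that alignment; the function name `percent_identity` is hypothetical, and the similarity score, which depends on a substitution matrix, is not reproduced.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical residues in a gap-free alignment."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 1 - 75 of the variant and of MI2B_HUMAN, copied from the alignment.
P2_VARIANT = (
    "MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ"
    "GIHLKNIQSVNVRSPGPHCAQTEVM"
)
MI2B_KNOWN = (
    "MAHATLSAAPSNPRLLRVALLLLLLVAASRRAAGASVVTELRCQCLQTLQ"
    "GIHLKNIQSVNVRSPGPHCAQTEVI"
)

print(f"{percent_identity(P2_VARIANT, MI2B_KNOWN):.2f}")
```

The single differing position is residue 75 (M in the variant, I in the known protein), so 74 of 75 positions are identical.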
DESCRIPTION FOR CLUSTER HUMODCA
Cluster HUMODCA features 1 transcript(s) and 17 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000365_0001
Table 3 - Proteins of interest
Figure imgf000366_0001
These sequences are variants of the known protein Ornithine decarboxylase (SwissProt accession identifier DCOR_HUMAN; known also according to the synonyms EC 4.1.1.17; ODC), SEQ ID NO: 620, referred to herein as the previously known protein. Protein Ornithine decarboxylase is known or believed to have the following function(s): Polyamine biosynthesis; first (rate-limiting) step. The sequence for protein Ornithine decarboxylase is given at the end of the application, as "Ornithine decarboxylase amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Figure imgf000366_0002
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: polyamine biosynthesis, which are annotation(s) related to Biological Process; and ornithine decarboxylase; lyase, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMODCA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 17 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, colorectal cancer, epithelial malignant tumors and a mixture of malignant tumors from different tissues. Table 5 - Normal tissue distribution
Figure imgf000367_0001
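The "parts per million" figure described above is the ratio of ESTs assigned to a cluster to all ESTs in a tissue category, scaled by one million. The sketch below shows only that arithmetic; the weighting applied in the application is not specified here and is omitted, the function name `ests_ppm` is hypothetical, and the counts in the example are invented for illustration rather than taken from Table 5.

```python
def ests_ppm(cluster_ests: int, total_ests: int) -> float:
    """Expression of a cluster in a category, in ESTs per million ESTs."""
    if total_ests <= 0:
        raise ValueError("total EST count must be positive")
    if cluster_ests < 0 or cluster_ests > total_ests:
        raise ValueError("cluster EST count must lie between 0 and the total")
    return cluster_ests * 1_000_000 / total_ests


# Hypothetical counts: 12 cluster ESTs among 400,000 ESTs in one category.
print(ests_ppm(12, 400_000))
```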
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000368_0001
As noted above, cluster HUMODCA features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ornithine decarboxylase. A description of each variant protein according to the present invention is now provided.
Variant protein HUMODCA_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMODCA_T17. An alignment is given to the known protein (Ornithine decarboxylase) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMODCA_P9 and DCOR_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMODCA_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL corresponding to amino acids 1 - 29 of HUMODCA_P9, and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 151 - 461 of DCOR_HUMAN, which also corresponds to amino acids 30 - 340 of HUMODCA_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HUMODCA_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL of HUMODCA_P9.
Comparison report between HUMODCA_P9 and AAA59968 (SEQ ID NO:1387): 1. An isolated chimeric polypeptide encoding for HUMODCA_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL corresponding to amino acids 1 - 29 of HUMODCA_P9, and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 40 - 350 of AAA59968, which also corresponds to amino acids 30 - 340 of HUMODCA_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HUMODCA_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL of HUMODCA_P9.
Comparison report between HUMODCA_P9 and AAH14562 (SEQ ID NO:1388): 1. An isolated chimeric polypeptide encoding for HUMODCA_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL corresponding to amino acids 1 - 29 of HUMODCA_P9, and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 86 - 396 of AAH14562, which also corresponds to amino acids 30 - 340 of HUMODCA_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HUMODCA_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL of HUMODCA_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMODCA_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMODCA_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Figure imgf000371_0001
Figure imgf000372_0001
Variant protein HUMODCA_P9 is encoded by the following transcript(s): HUMODCA_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMODCA_T17 is shown in bold; this coding portion starts at position 528 and ends at position 1547. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMODCA_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
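As a quick consistency check on the coding-portion coordinates given above, the span can be converted to a codon count. The sketch assumes the positions are 1-based and inclusive (the text does not state this explicitly); the function names are hypothetical.

```python
# Illustrative sketch only: converts a coding-portion span on a transcript
# into a nucleotide length and a codon count.
# Assumption: start and end positions are 1-based and inclusive.

def coding_length(start, end):
    """Number of nucleotides in the span [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def codon_count(start, end):
    """Return (whole codons, leftover nucleotides) for the span."""
    return divmod(coding_length(start, end), 3)


if __name__ == "__main__":
    # Coding portion of HUMODCA_T17 as stated in the text: 528 to 1547
    print(coding_length(528, 1547), codon_count(528, 1547))
```

Under that assumption the span 528 to 1547 is 1020 nucleotides, that is 340 whole codons, which agrees with the 340 amino acids reported for HUMODCA_P9.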
Figure imgf000372_0002
Figure imgf000373_0001
Figure imgf000374_0001
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMODCA_node_1 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Figure imgf000375_0001
Segment cluster HUMODCA_nodeJ5 according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Figure imgf000375_0002
Segment cluster HUMODCA_node J2 according to the present invention is supported by 249 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Figure imgf000375_0003
Segment cluster HUMODCA_nodeJ6 according to the present invention is supported by 348 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Figure imgf000376_0001
Segment cluster HUMODCA_node_39 according to the present invention is supported by 297 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000376_0002
Segment cluster HUMODCA iode ll according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000376_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMODCA_node_0 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000377_0001
Segment cluster HUMODCA iode JO according to the present invention is supported by 107 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000377_0002
Segment cluster HUMODCA iodej 2 according to the present invention is supported by 132 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000377_0003
Segment cluster HUMODCA_node_13 according to the present invention is supported by 126 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000377_0004
Figure imgf000378_0001
Segment cluster HUMODCA_node_2 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000378_0002
Segment cluster HUMODCA_node_27 according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000378_0003
Segment cluster HUMODCA_node_3 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000378_0004
Segment cluster HUMODCA iodeJO according to the present invention is supported by 196 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 22 below describes the starting and ending position of this segment on each transcript.
Figure imgf000379_0001
Segment cluster HUMODCA_node J4 according to the present invention is supported by 259 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000379_0002
Segment cluster HUMODCA_node 8 according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000379_0003
Segment cluster HUMODCA_node_40 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000380_0001
Variant protein alignment to the previously known protein: Sequence name: /tmp/y03EwE6i01/dRQ512K6e2 : DCOR_HUMAN
Sequence documentation:
Alignment of: HUMODCA_P9 x DCOR_HUMAN
Alignment segment 1/1 Quality: 3056.00 Escore: 0 Matching length: 311 Total length: 311 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment:
30 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 79
151 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 200

80 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 129
201 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 250

130 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 179
251 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 300

180 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 229
301 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 350

230 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 279
351 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 400

280 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 329
401 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 450

330 RAACASASINV 340
451 RAACASASINV 461
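The percent-identity figures reported in the alignment summary can be illustrated with a minimal sketch. It assumes two already-aligned strings of equal length with `-` as the gap character and computes identity over all aligned columns; the precise definitions the alignment software uses for "Matching" versus "Total" percentages are not given in the text, so this is not a reimplementation of that software. The function name is hypothetical.

```python
# Illustrative sketch only: percent identity between two pre-aligned sequences.
# Assumptions: equal-length strings, '-' marks a gap, and identity is computed
# over all aligned columns (gap columns count as mismatches).

def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Last alignment block of HUMODCA_P9 (330 - 340) against DCOR_HUMAN (451 - 461)
    print(percent_identity("RAACASASINV", "RAACASASINV"))
```

For a gap-free alignment of identical residues, as reported here, the function returns 100.0.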
Sequence name: /tmp/y03EwE6i01/dRQ512K6e2 : AAA59968
Sequence documentation:
Alignment of: HUMODCA_P9 x AAA59968
Alignment segment 1/1:
Quality: 3056.00 Escore: 0 Matching length: 311 Total length: 311 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent
Identity: 100.00 Gaps : 0
Alignment:
30 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 79
40 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 89

80 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 129
90 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 139

130 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 179
140 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 189

180 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 229
190 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 239

230 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 279
240 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 289

280 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 329
290 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 339

330 RAACASASINV 340
340 RAACASASINV 350
Sequence name: /tmp/y03EwE6i01/dRQ512K6e2 : AAH14562
Sequence documentation:
Alignment of: HUMODCA_P9 x AAH14562
Alignment segment 1/1: Quality: 3056.00
Escore: 0 Matching length: 311 Total length: 311 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent
Identity: 100.00 Gaps: 0
Alignment:
30 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 79
86 LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGS 135

80 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 129
136 GCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEE 185

130 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 179
186 ITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQ 235

180 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 229
236 TGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKY 285

230 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 279
286 YSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGF 335

280 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 329
336 QRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRH 385

330 RAACASASINV 340
386 RAACASASINV 396
DESCRIPTION FOR CLUSTER R00299
Cluster R00299 features 1 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000386_0001
These sequences are variants of the known protein Tescalcin (SwissProt accession identifier TESC_HUMAN; known also according to the synonyms TSC), SEQ ID NO: 621, referred to herein as the previously known protein. Protein Tescalcin is known or believed to have the following function: Binds calcium. The sequence for protein Tescalcin is given at the end of the application, as "Tescalcin amino acid sequence". The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: calcium binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster R00299 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
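The "parts per million" ratio described above can be sketched as follows. The sketch uses raw EST counts; the weighting scheme mentioned in the text is not specified in this section and is therefore not modeled. The function name and the example counts are hypothetical and are not data from the application.

```python
# Illustrative sketch only: expression of a cluster in one tissue category,
# expressed as parts per million of all ESTs in that category.
# Assumption: plain (unweighted) EST counts are used.

def parts_per_million(cluster_est_count, category_est_count):
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    if cluster_est_count < 0:
        raise ValueError("cluster_est_count must not be negative")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(parts_per_million(3, 200_000))
```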
Overall, the following results were obtained as shown with regard to the histograms in Figure 18 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: lung malignant tumors. Table 4 - Normal tissue distribution
Figure imgf000387_0001
Figure imgf000388_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000388_0002
As noted above, cluster R00299 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Tescalcin. A description of each variant protein according to the present invention is now provided.
Variant protein R00299_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R00299_T2. An alignment is given to the known protein (Tescalcin) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R00299_P3 and Q9NWT9 (SEQ ID NO: 1389):
1. An isolated chimeric polypeptide encoding for R00299_P3, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV corresponding to amino acids 1 - 44 of R00299_P3, a second amino acid sequence being at least 90% homologous to SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLA DEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNV corresponding to amino acids 74 - 191 of Q9NWT9, which also corresponds to amino acids 45 - 162 of R00299_P3, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIE TKMHVRFLNMETMALCH corresponding to amino acids 163 - 238 of R00299_P3, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R00299_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV of R00299_P3.
3. An isolated polypeptide encoding for a tail of R00299_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIE TKMHVRFLNMETMALCH in R00299_P3.
Comparison report between R00299_P3 and TESC_HUMAN:
1. An isolated chimeric polypeptide encoding for R00299_P3, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV corresponding to amino acids 1 - 44 of R00299_P3, and a second amino acid sequence being at least 90% homologous to SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLA DEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNVVE ELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETK MHVRFLNMETMALCH corresponding to amino acids 21 - 214 of TESC_HUMAN, which also corresponds to amino acids 45 - 238 of R00299_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R00299_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV of R00299_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM:Signal peptide,NN:NO) predicts that this protein has a signal peptide.
Variant protein R00299_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R00299_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Figure imgf000391_0001
Variant protein R00299_P3 is encoded by the following transcript(s): R00299_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R00299_T2 is shown in bold; this coding portion starts at position 142 and ends at position 855. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R00299_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Figure imgf000391_0002
As noted above, cluster R00299 features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R00299_node_2 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Figure imgf000392_0001
Segment cluster R00299_nodeJ0 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Figure imgf000392_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
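The length criterion for short segments can be illustrated with a small sketch. It assumes 1-based, inclusive start and end positions and treats "up to about 120 bp" as a fixed cutoff of 120, which is a simplification; the function names and the constant are hypothetical. The example positions are those reported in this section for segment R00299_node_10 on transcript R00299_T2 (346 to 422).

```python
# Illustrative sketch only: segment length from transcript coordinates and a
# simple "short segment" test.
# Assumptions: positions are 1-based and inclusive; the approximate limit in
# the text ("up to about 120 bp") is modeled as a hard cutoff.

SHORT_SEGMENT_MAX_BP = 120  # assumed cutoff


def segment_length(start, end):
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def is_short_segment(start, end, max_bp=SHORT_SEGMENT_MAX_BP):
    return segment_length(start, end) <= max_bp


if __name__ == "__main__":
    # R00299_node_10 on R00299_T2: positions 346 to 422
    print(segment_length(346, 422), is_short_segment(346, 422))
```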
Segment cluster R00299_node_10 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 | 346 | 422
Segment cluster R00299 ιodeJ4 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Figure imgf000393_0001
Segment cluster R00299 iode J 5 according to the present invention can be found in the following transcript(s): R00299_T2. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Figure imgf000393_0002
Segment cluster R00299_node JO according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000393_0003
Segment cluster R00299_node_23 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000394_0001
Segment cluster R00299_nodeJ5 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000394_0002
Segment cluster R00299_node J8 according to the present invention can be found in the following transcript(s): R00299_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000394_0003
Segment cluster R00299_nodejl according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000395_0001
Segment cluster R00299_nodeJ according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000395_0002
Segment cluster R00299_node_9 according to the present invention can be found in the following transcript(s): R00299_T2. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000395_0003
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotide was found to hit this segment with regard to colon cancer, shown in Table 20. Table 20 - Oligonucleotide related to this gene
Figure imgf000395_0004
Variant protein alignment to the previously known protein:
Sequence name: /tmp/OleVDhrKQO/EjblgLomjM: Q9NWT9
Sequence documentation:
Alignment of: R00299_P3 x Q9NWT9
Alignment segment 1/1:
Quality: 1162.00 Escore: 0 Matching length: 118 Total length: 118 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
45 SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNR 94
74 SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNR 123

95 NLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFH 144
124 NLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFH 173

145 MYDSDSDGRITLEEYRNV 162
174 MYDSDSDGRITLEEYRNV 191
Sequence name: /tmp/OleVDhrKQ0/EjblgLomjM:TESC_HUMAN
Sequence documentation:
Alignment of: R00299_P3 x TESC_HUMAN
Alignment segment 1/1:
Quality: 1920.00
Escore: 0 Matching length: 194 Total length: 194 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
45 SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNR 94
21 SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNR 70
95 NLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFH 144
71 NLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFH 120
145 MYDSDSDGRITLEEYRNVVEELLSGNPHIEKESARSIADGAMMEAASVCM 194
121 MYDSDSDGRITLEEYRNVVEELLSGNPHIEKESARSIADGAMMEAASVCM 170
195 GQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH 238
171 GQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH 214
DESCRIPTION FOR CLUSTER Z19178
Cluster Z19178 features 2 transcript(s) and 15 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3. Table 1 - Transcripts of interest
Figure imgf000398_0001
Figure imgf000399_0001
These sequences are variants of the known protein Skeletal muscle LIM-protein 2 (SwissProt accession identifier SLI2_HUMAN; known also according to the synonyms SLIM 2; Four and a half LIM domains protein 3; FHL-3), SEQ ID NO: 622, referred to herein as the previously known protein. The sequence for protein Skeletal muscle LIM-protein 2 is given at the end of the application, as "Skeletal muscle LIM-protein 2 amino acid sequence". The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: muscle development, which are annotation(s) related to Biological Process. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster Z19178 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Skeletal muscle LIM-protein 2. A description of each variant protein according to the present invention is now provided.
Variant protein Z19178_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z19178_PEA_1_T5. An alignment is given to the known protein (Skeletal muscle LIM-protein 2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z19178_PEA_1_P5 and Q96C98 (SEQ ID NO:1390):
1. An isolated chimeric polypeptide encoding for Z19178_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGGGRADWRPKGRWGRGLAPAAGWGAGVRGPGGAGPRSLPRGGVGAALPLAHTVR LQSAASPAARSAPAWPGPQELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDC YCSAFSSQCSACGETV corresponding to amino acids 1 - 130 of Z19178_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to MPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVPCYENKFAPRCARCS KTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELFAPKCSSC KRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDGDQVLCQGCSQAGP corresponding to amino acids 1 - 172 of Q96C98, which also corresponds to amino acids 131 - 302 of Z19178_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of Z19178_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGGGRADWRPKGRWGRGLAPAAGWGAGVRGPGGAGPRSLPRGGVGAALPLAHTVR LQSAASPAARSAPAWPGPQELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDC YCSAFSSQCSACGETV of Z19178_PEA_1_P5.
Comparison report between Z19178_PEA_1_P5 and Q9BVA2 (SEQ ID NO:1391):
1. An isolated chimeric polypeptide encoding for Z19178_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGGGRADWRPKGRWGRGLAPAAGWGAGVRGPGGAGPRSLPRGGVGAALPLAHTVR LQSAASPAARSAPAWPGPQ corresponding to amino acids 1 - 74 of Z19178_PEA_1_P5, and a second amino acid sequence being at least 90% homologous to ELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDCYCSAFSSQCSACGETVMPG SRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVPCYENKFAPRCARCSKTL TQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELFAPKCSSCKRP IVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDGDQVLCQGCSQAGP corresponding to amino acids 53 - 280 of Q9BVA2, which also corresponds to amino acids 75 - 302 of Z19178_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of Z19178_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGGGRADWRPKGRWGRGLAPAAGWGAGVRGPGGAGPRSLPRGGVGAALPLAHTVR LQSAASPAARSAPAWPGPQ of Z19178_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM:Signal peptide,NN:NO) predicts that this protein has a signal peptide.
Variant protein Z19178_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z19178_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 4 - Amino acid mutations
Figure imgf000402_0001
Variant protein Z19178_PEA_1_P5 is encoded by the following transcript(s): Z19178_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z19178_PEA_1_T5 is shown in bold; this coding portion starts at position 1 and ends at position 907. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z19178_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
Figure imgf000402_0002
Figure imgf000403_0001
Variant protein Z19178_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z19178_PEA_1_T9. An alignment is given to the known protein (Skeletal muscle LIM-protein 2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z19178_PEA_1_P6 and Q96C98 (SEQ ID NO:1390): 1. An isolated chimeric polypeptide encoding for Z19178_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MNPSPARTVSCSAMTATAVRFPRSAPLVGRLSCL corresponding to amino acids 1 - 34 of Z19178_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDGDQVLCQGCSQAGP corresponding to amino acids 60 - 172 of Q96C98, which also corresponds to amino acids 35 - 147 of Z19178_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of Z19178_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MNPSPARTVSCSAMTATAVRFPRSAPLVGRLSCL of Z19178_PEA_1_P6.
Comparison report between Z19178_PEA_1_P6 and Q9BVA2 (SEQ ID NO:1391): 1. An isolated chimeric polypeptide encoding for Z19178_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MNPSPARTVSCSAMTATAVRFPRSAPLVGRLSCL corresponding to amino acids 1 - 34 of Z19178_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDGDQVLCQGCSQAGP corresponding to amino acids 168 - 280 of Q9BVA2, which also corresponds to amino acids 35 - 147 of Z19178_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of Z19178_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MNPSPARTVSCSAMTATAVRFPRSAPLVGRLSCL of Z19178_PEA_1_P6.
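The residue ranges quoted in the two comparison reports above differ only by a constant offset between the known protein and the variant. The following is a minimal sketch of that bookkeeping, not a method described in the application; the helper name `to_variant_position` is hypothetical, and it assumes the shared region is ungapped, as the "contiguous and in a sequential order" wording suggests.

```python
def to_variant_position(known_pos, known_start, variant_start):
    """Map a 1-based residue position on the known protein onto the variant.

    Assumes an ungapped shared region that begins at `known_start` on the
    known protein and at `variant_start` on the variant (illustrative helper).
    """
    return known_pos - known_start + variant_start


# Q96C98 residues 60-172 are reported as residues 35-147 of Z19178_PEA_1_P6.
q96c98_range = (to_variant_position(60, 60, 35), to_variant_position(172, 60, 35))

# Q9BVA2 residues 168-280 are reported as the same residues 35-147.
q9bva2_range = (to_variant_position(168, 168, 35), to_variant_position(280, 168, 35))

print(q96c98_range, q9bva2_range)
```

Both calls reproduce the 35 - 147 range stated in the reports, which is a quick consistency check on the quoted coordinates.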
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein Z19178_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z19178_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000405_0001
Variant protein Z19178_PEA_1_P6 is encoded by the following transcript(s): Z19178_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z19178_PEA_1_T9 is shown in bold; this coding portion starts at position 379 and ends at position 819. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z19178_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000405_0002
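The coding-portion coordinates quoted above can be checked against the stated protein length. The sketch below is illustrative only: the helper `coding_length` is hypothetical, and it assumes the quoted positions are 1-based and inclusive.

```python
def coding_length(start, end):
    """Return (nucleotides, codons) for a 1-based inclusive coding span.

    Raises ValueError if the span is not a whole number of codons.
    """
    nucleotides = end - start + 1
    codons, remainder = divmod(nucleotides, 3)
    if remainder:
        raise ValueError("coding span is not a whole number of codons")
    return nucleotides, codons


# Coding portion of transcript Z19178_PEA_1_T9: positions 379-819.
nucleotides, codons = coding_length(379, 819)
print(nucleotides, codons)  # 441 147
```

The 147 codons agree with the 147 amino acids given for Z19178_PEA_1_P6 in the comparison reports, which suggests (as an inference, not a statement from the application) that the quoted span does not count a stop codon.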
As noted above, cluster Z19178 features 15 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z19178_PEA_1_node_15 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Figure imgf000406_0001
Segment cluster Z19178_PEA_1_node_17 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Figure imgf000406_0002
Segment cluster Z19178_PEA_1_node_2 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Figure imgf000407_0001
Segment cluster Z19178_PEA_1_nodeJ2 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Figure imgf000407_0002
Segment cluster Z19178_PEA_1_node_23 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Figure imgf000407_0003
Segment cluster Z19178_PEA_1_node_24 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Figure imgf000408_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z19178_PEA_1_node_10 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Figure imgf000408_0002
Segment cluster Z19178_PEA_1_node_11 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Figure imgf000408_0003
Segment cluster Z19178_PEA_1_node_14 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf000409_0001
Segment cluster Z19178_PEA_1_node_18 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf000409_0002
Segment cluster Z19178_PEA_1_node_19 according to the present invention can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf000409_0003
Segment cluster Zl 9178JPEAJ jnode J according to the present invention can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf000410_0001
Segment cluster Z19178_PEA_1_node_4 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T9. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000410_0002
Segment cluster Z19178_PEA_1_node_5 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T9. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf000410_0003
Segment cluster Z19178 ΕAJ iode according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z19178_PEA_1_T5 and Z19178_PEA_1_T9. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf000411_0001
Variant protein alignment to the previously known protein:
Sequence name: /tmp/HCEUPaHO0b/Molk3qa5mK:Q96C98
Sequence documentation:
Alignment of: Z19178_PEA_1_P5 x Q96C98
Alignment segment 1/1:
Quality: 1799.00 Escore: 0 Matching length: 172 Total length: 172 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
131 MPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVPCYENKF 180
  1 MPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVPCYENKF 50
181 APRCARCSKTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPY 230
 51 APRCARCSKTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPY 100
231 CVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSL 280
101 CVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSL 150
281 VGQGFVPDGDQVLCQGCSQAGP 302
151 VGQGFVPDGDQVLCQGCSQAGP 172
Sequence name: /tmp/HCEUPaHO0b/Molk3qa5mK:SLI2_HUMAN
Sequence documentation:
Alignment of: Z19178_PEA_1_P5 x SLI2_HUMAN
Alignment segment 1/1:
Quality: 2249.00 Escore: 0 Matching length: 228 Total length: 228 Matching Percent Similarity: 96.49 Matching Percent Identity: 94.74 Total Percent Similarity: 96.49 Total Percent Identity: 94.74 Gaps: 0
Alignment:
 75 ELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDCYCSAFSSQCS 124
 53 ELFYEDRHFHEGCFRCCRCQRSLADEPFTRQDSELLCNDCYCSAFSSQCS 102
125 ACGETVMPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVP 174
103 ACGETVMPGSRKLEYGGQTWHEHCFLCIGCEQPLGSRPFVPDKGAHYCVP 152
175 CYENKFAPRCARCSKTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTS 224
153 CYENNFAPRCARCTKTLTQGGLTYRDLPWHPKCLVCTGCQTPLAGQQFTS 202
225 RDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCA 274
203 RDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFTCD 252
275 RCSTSLVGQGFVPDGDQVLCQGCSQAGP 302
253 RCSNSLVGQGFVPDGDQVLCQGCSQAGP 280
Sequence name: /tmp/HCEUPaHO0b/Molk3qa5mK:Q9BVA2
Sequence documentation:
Alignment of: Z19178_PEA_1_P5 x Q9BVA2
Alignment segment 1/1:
Quality: 2394.00 Escore: 0 Matching length: 228 Total length: 228 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
 75 ELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDCYCSAFSSQCS 124
 53 ELFYEDRHFHEGCFRCCRCQRSLADEPFTCQDSELLCNDCYCSAFSSQCS 102
125 ACGETVMPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVP 174
103 ACGETVMPGSRKLEYGGQTWHEHCFLCSGCEQPLGSRSFVPDKGAHYCVP 152
175 CYENKFAPRCARCSKTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTS 224
153 CYENKFAPRCARCSKTLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTS 202
225 RDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCA 274
203 RDEDPYCVACFGELFAPKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCA 252
275 RCSTSLVGQGFVPDGDQVLCQGCSQAGP 302
253 RCSTSLVGQGFVPDGDQVLCQGCSQAGP 280
Sequence name: /tmp/nlVRxocMJO/rZHvyWGjFT:Q96C98
Sequence documentation:
Alignment of: Z19178_PEA_1_P6 x Q96C98
Alignment segment 1/1:
Quality: 1169.00 Escore: 0 Matching length: 113 Total length: 113 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
 35 TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 84
 60 TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 109
 85 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG 134
110 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG 159
135 DQVLCQGCSQAGP 147
160 DQVLCQGCSQAGP 172
Sequence name: /tmp/nlVRxocMJO/rZHvyWGjFT:SLI2_HUMAN
Sequence documentation:
Alignment of: Z19178_PEA_1_P6 x SLI2_HUMAN
Alignment segment 1/1:
Quality: 1090.00 Escore: 0 Matching length: 113 Total length: 113 Matching Percent Similarity: 96.46 Matching Percent Identity: 93.81 Total Percent Similarity: 96.46 Total Percent Identity: 93.81 Gaps: 0
Alignment:
 35 TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 84
168 TLTQGGLTYRDLPWHPKCLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 217
 85 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG 134
218 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFTCDRCSNSLVGQGFVPDG 267
135 DQVLCQGCSQAGP 147
268 DQVLCQGCSQAGP 280
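The "Percent Identity" figure reported for the Z19178_PEA_1_P6 x SLI2_HUMAN alignment can be reproduced by counting identical columns in the two aligned rows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the alignment is ungapped (Gaps: 0), and the W residues that the extracted alignment rows show as blanks are restored from the sequences given in the comparison reports. Percent similarity is not computed here because it depends on a scoring matrix that this section does not name.

```python
def percent_identity(row_a, row_b):
    """Percentage of identical columns in two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


# Residues 35-147 of Z19178_PEA_1_P6 (W residues restored; see lead-in).
variant = (
    "TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF"
    "APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG"
    "DQVLCQGCSQAGP"
)
# Residues 168-280 of SLI2_HUMAN as shown in the alignment.
known = (
    "TLTQGGLTYRDLPWHPKCLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF"
    "APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFTCDRCSNSLVGQGFVPDG"
    "DQVLCQGCSQAGP"
)

identity = percent_identity(variant, known)
print(f"{identity:.2f}")  # 93.81
```

Seven of the 113 columns differ, giving 106/113, which rounds to the 93.81 reported in the alignment header.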
Sequence name: /tmp/nlVRxocMJO/rZHvyWGjFT:Q9BVA2
Sequence documentation:
Alignment of: Z19178_PEA_1_P6 x Q9BVA2
Alignment segment 1/1:
Quality: 1169.00 Escore: 0 Matching length: 113 Total length: 113 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
 35 TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 84
168 TLTQGGVTYRDQPWHRECLVCTGCQTPLAGQQFTSRDEDPYCVACFGELF 217
 85 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG 134
218 APKCSSCKRPIVGLGGGKYVSFEDRHWHHNCFSCARCSTSLVGQGFVPDG 267
135 DQVLCQGCSQAGP 147
268 DQVLCQGCSQAGP 280
DESCRIPTION FOR CLUSTER S67314
Cluster S67314 features 4 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000420_0001
Figure imgf000421_0001
These sequences are variants of the known protein Fatty acid-binding protein, heart (SwissProt accession identifier FABH_HUMAN; known also according to the synonyms H-FABP; Muscle fatty acid-binding protein; M-FABP; Mammary-derived growth inhibitor; MDGI), SEQ ID NO: 623, referred to herein as the previously known protein. Protein Fatty acid-binding protein, heart is known or believed to have the following function(s): FABP are thought to play a role in the intracellular transport of long-chain fatty acids and their acyl-CoA esters. The sequence for protein Fatty acid-binding protein, heart is given at the end of the application, as "Fatty acid-binding protein, heart amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000421_0002
Protein Fatty acid-binding protein, heart localization is believed to be Cytoplasmic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: negative control of cell proliferation, which are annotation(s) related to Biological Process; and lipid binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster S67314 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Fatty acid-binding protein, heart. A description of each variant protein according to the present invention is now provided.
Variant protein S67314_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T4. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P4 and FABH_HUMAN: 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
Comparison report between S67314_PEA_1_P4 and AAP35373 (SEQ ID NO:1392): 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL corresponding to amino acids 117 - 215 of S67314_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRWATLELYLIGYYYCSFSQACSKKPSPPLRAVEAGTREWLWVRVVSGGNFLCSGFGLTQAGTQILPYRLHDCGQITFSKCNCKTGINNTNLVGLLGSL in S67314_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Figure imgf000424_0001
Variant protein S67314_PEA_1_P4 is encoded by the following transcript(s): S67314_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T4 is shown in bold; this coding portion starts at position 925 and ends at position 1569. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Figure imgf000424_0002
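Because the SNP tables give positions on the nucleotide sequence for transcripts and on the amino acid sequence for proteins, it can help to convert between the two. The sketch below is illustrative and not taken from the application: the helper `residue_for_nucleotide` is hypothetical, and it assumes 1-based inclusive coordinates with the reading frame starting at the first position of the stated coding portion (925 - 1569 for S67314_PEA_1_T4).

```python
def residue_for_nucleotide(nt_pos, cds_start, cds_end):
    """Return (residue number, position within codon) for a transcript position.

    Both results are 1-based. Raises ValueError outside the coding portion.
    """
    if not cds_start <= nt_pos <= cds_end:
        raise ValueError("position lies outside the coding portion")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Boundaries of the coding portion of S67314_PEA_1_T4 as stated above.
first = residue_for_nucleotide(925, 925, 1569)
last = residue_for_nucleotide(1569, 925, 1569)
print(first, last)  # (1, 1) (215, 3)
```

The last coding nucleotide falls on the third base of codon 215, which matches the 215 residues given for S67314_PEA_1_P4.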
Variant protein S67314_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T5. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P5 and FABH_HUMAN: 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
Comparison report between S67314_PEA_1_P5 and AAP35373 (SEQ ID NO:1392): 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV corresponding to amino acids 117 - 178 of S67314_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVLTAWPSIYRRQVKVLREDEITILPWHLQWSREKATKLLRPTLPSYNNHGWEELRVGKSIV in S67314_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf000426_0001
Variant protein S67314_PEA_1_P5 is encoded by the following transcript(s): S67314_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T5 is shown in bold; this coding portion starts at position 925 and ends at position 1458. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf000427_0001
Variant protein S67314_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T6. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P6 and FABH_HUMAN: 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of FABH_HUMAN, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
Comparison report between S67314_PEA_1_P6 and AAP35373 (SEQ ID NO:1392): 1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLIL corresponding to amino acids 1 - 116 of AAP35373, which also corresponds to amino acids 1 - 116 of S67314_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MEKLQLRNVK corresponding to amino acids 117 - 126 of S67314_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S67314_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MEKLQLRNVK in S67314_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Figure imgf000429_0001
Variant protein S67314_PEA_1_P6 is encoded by the following transcript(s): S67314_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T6 is shown in bold; this coding portion starts at position 925 and ends at position 1302. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf000429_0002
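The coordinates quoted above can be cross-checked against the stated protein length. The following is a minimal sketch, assuming the positions are 1-based and inclusive (the text does not say so explicitly); the function name is illustrative only and not part of the application.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Return the number of whole codons in a coding portion.

    Assumes 1-based, inclusive nucleotide coordinates (an assumption;
    the source only gives the start and end positions).
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3


# Transcript S67314_PEA_1_T6: coding portion 925..1302, variant P6 spans amino acids 1..126.
p6_codons = cds_codon_count(925, 1302)
print(p6_codons)
```

Under that assumption the coding portion is 378 nucleotides, which is 126 codons and matches the 126 amino acids described for S67314_PEA_1_P6, suggesting the quoted end position does not include a stop codon.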
Variant protein S67314_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S67314_PEA_1_T7. An alignment is given to the known protein (Fatty acid-binding protein, heart) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S67314_PEA_1_P7 and FABH_HUMAN:
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of FABH_HUMAN, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of FABH_HUMAN, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
Comparison report between S67314_PEA_1_P7 and AAP35373 (SEQ ID NO: 1392):
1. An isolated chimeric polypeptide encoding for S67314_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MVDAFLGTWKLVDSKNFDDYMKSL corresponding to amino acids 1 - 24 of AAP35373, which also corresponds to amino acids 1 - 24 of S67314_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AHILITFPLPS corresponding to amino acids 25 - 35 of S67314_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA corresponding to amino acids 25 - 133 of AAP35373, which also corresponds to amino acids 36 - 144 of S67314_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S67314_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AHILITFPLPS, corresponding to S67314_PEA_1_P7.
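The three-part structure described in the comparison reports above (a shared head, a unique inserted region, and a shared tail) can be illustrated with a short sketch. The sequences are copied from the reports; the helper name `assemble` and the segment labels are hypothetical and used only for illustration.

```python
HEAD = "MVDAFLGTWKLVDSKNFDDYMKSL"  # shared with the known protein, residues 1-24
UNIQUE = "AHILITFPLPS"             # variant-specific edge portion
TAIL = (                            # residues 25-133 of the known protein
    "GVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSI"
    "VTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA"
)


def assemble(segments):
    """Concatenate (label, sequence) pairs in order.

    Returns the full sequence plus 1-based inclusive (label, start, end) spans.
    """
    spans = []
    position = 1
    for label, seq in segments:
        spans.append((label, position, position + len(seq) - 1))
        position += len(seq)
    return "".join(seq for _, seq in segments), spans


variant, spans = assemble([("head", HEAD), ("unique", UNIQUE), ("tail", TAIL)])
for label, start, end in spans:
    print(label, start, end)
```

The resulting spans reproduce the coordinates in the report: 1 - 24, 25 - 35 and 36 - 144 of the variant.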
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein S67314_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf000431_0001
Variant protein S67314_PEA_1_P7 is encoded by the following transcript(s): S67314_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S67314_PEA_1_T7 is shown in bold; this coding portion starts at position 925 and ends at position 1356. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S67314_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf000432_0001
As noted above, cluster S67314 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S67314_PEA_1_node_0 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Figure imgf000433_0001
Segment cluster S67314_PEA_1_node_11 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Figure imgf000433_0002
Segment cluster S67314_PEA_1_node_13 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Figure imgf000434_0001
Segment cluster S67314_PEA_1_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf000434_0002
Segment cluster S67314_PEA_1_node_17 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf000434_0003
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment with regard to colon cancer, shown in Table 18.
Table 18 - Oligonucleotides related to this segment
Figure imgf000435_0001
As a general note, oligonucleotide S67314_0_0J41 was overexpressed in colon cancer; this oligonucleotide maps to at least one part of this cluster.
Segment cluster S67314_PEA_1_nodeJ according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf000435_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S67314_PEA_1_node_10 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T4, S67314_PEA_1_T5, S67314_PEA_1_T6 and S67314_PEA_1_T7. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000435_0003
Figure imgf000436_0001
Segment cluster S67314_PEA_1_nodeJ according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S67314_PEA_1_T7. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf000436_0002
Variant protein alignment to the previously known protein:
Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: 0 Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/EQ0nMn6tqU/R73CUVKUk5:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P4 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: 0 Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/ql4YPIBbdQ/SeofJfCmJW:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P5 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1095.00 Escore: 0 Matching length: 115 Total length: 115 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 51
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILT 50
 52 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQ 100
102 ETTLVRELIDGKLIL 116
    |||||||||||||||
101 ETTLVRELIDGKLIL 115
Sequence name: /tmp/PXra2DxLlv/Q8GTrzNMVX:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P6 x AAP35373
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDIL 50
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDG 100
101 QETTLVRELIDGKLIL 116
    ||||||||||||||||
101 QETTLVRELIDGKLIL 116
Sequence name: /tmp/xYz yViDom/twDu3T69pd:FABH_HUMAN
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x FABH_HUMAN
Alignment segment 1/1:
Quality: 1160.00 Escore: 0 Matching length: 132 Total length: 143 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 92.31 Total Percent Identity: 92.31 Gaps: 1
Alignment:
  2 VDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKPT 51
    |||||||||||||||||||||||           ||||||||||||||||
  1 VDAFLGTWKLVDSKNFDDYMKSL           GVGFATRQVASMTKPT 39
 52 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDGG 89
102 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    |||||||||||||||||||||||||||||||||||||||||||
 90 KLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 132
Sequence name: /tmp/xYz yViDom/twDu3T69pd:AAP35373
Sequence documentation:
Alignment of: S67314_PEA_1_P7 x AAP35373
Alignment segment 1/1:
Quality: 1172.00 Escore: 0 Matching length: 133 Total length: 144 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 92.36 Total Percent Identity: 92.36 Gaps: 1
Alignment:
  1 MVDAFLGTWKLVDSKNFDDYMKSLAHILITFPLPSGVGFATRQVASMTKP 50
    ||||||||||||||||||||||||           |||||||||||||||
  1 MVDAFLGTWKLVDSKNFDDYMKSL           GVGFATRQVASMTKP 39
 51 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 40 TTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKVKSIVTLDG 89
101 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 144
    ||||||||||||||||||||||||||||||||||||||||||||
 90 GKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA 133
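The "Total Percent Identity" values reported for the S67314_PEA_1_P7 alignments are consistent with dividing the matching length by the total alignment length, which includes the 11-residue gap. The sketch below assumes that definition; the alignment program's own formula is not given in the text, and the function name is illustrative.

```python
def total_percent(matching_length: int, total_length: int) -> float:
    """Percent of the total alignment length covered by matched positions.

    Assumed definition: 100 * matching / total, rounded to two decimals.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return round(100.0 * matching_length / total_length, 2)


# Values taken from the two S67314_PEA_1_P7 alignment headers.
print(total_percent(132, 143))  # P7 x FABH_HUMAN
print(total_percent(133, 144))  # P7 x AAP35373
```

With these inputs the sketch gives 92.31 and 92.36, the figures reported in the alignment headers; the difference between total and matching length in both cases is 11, the length of the unique AHILITFPLPS region.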
DESCRIPTION FOR CLUSTER Z44808
Cluster Z44808 features 5 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000447_0001
Figure imgf000448_0001
These sequences are variants of the known protein SPARC related modular calcium-binding protein 2 precursor (SwissProt accession identifier SM02_HUMAN; known also according to the synonyms Secreted modular calcium-binding protein 2; SMOC-2; Smooth muscle-associated protein 2; SMAP-2; MSTP117), SEQ ID NO: 624, referred to herein as the previously known protein. Protein SPARC related modular calcium-binding protein 2 precursor is known or believed to have the following function(s): calcium binding. The sequence for protein SPARC related modular calcium-binding protein 2 precursor is given at the end of the application, as "SPARC related modular calcium-binding protein 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000448_0002
Figure imgf000449_0001
Protein SPARC related modular calcium-binding protein 2 precursor localization is believed to be Secreted (Probable). Cluster Z44808 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
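The "parts per million" ratio defined above can be written out as a one-line calculation. This is a minimal sketch only: the function name and the example counts are hypothetical, and any library-specific weighting applied by the previously described methods is not reproduced here.

```python
def expression_ppm(cluster_est_count: float, category_est_count: float) -> float:
    """Ratio of cluster ESTs to all ESTs in a tissue category, in parts per million."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical counts for illustration: 12 cluster ESTs among 400,000 ESTs in a category.
example_ppm = expression_ppm(12, 400_000)
print(example_ppm)
```

Expressing the ratio per million ESTs makes categories with very different library sizes comparable on the same axis.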
Overall, the following results were obtained as shown with regard to the histograms in Figure 19 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, lung cancer and pancreas carcinoma.
Table 5 - Normal tissue distribution
Figure imgf000449_0002
Figure imgf000450_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000450_0002
As noted above, cluster Z44808 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein SPARC related modular calcium-binding protein 2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z44808_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T4. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P5 and SM02_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR corresponding to amino acids 442 - 464 of Z44808_PEA_1_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR in Z44808_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
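The reasoning used for localization in this application (agreement between two signal-peptide predictors and two trans-membrane predictors) can be summarized as a small decision rule. The sketch below is an illustrative reading of that reasoning, not the actual software used; the function name, the boolean inputs and the "undetermined" fallback are assumptions.

```python
def call_localization(signal_peptide_calls, transmembrane_calls):
    """Combine predictor outputs into a localization call.

    signal_peptide_calls: iterable of bool, one per signal-peptide predictor.
    transmembrane_calls: iterable of bool, one per trans-membrane predictor.
    """
    sp = list(signal_peptide_calls)
    tm = list(transmembrane_calls)
    if any(tm):
        return "undetermined"   # assumption: TM predictions are not resolved in this sketch
    if sp and all(sp):
        return "secreted"       # signal peptide predicted by every program, no TM region
    if not any(sp):
        return "intracellular"  # no signal peptide and no TM region predicted
    return "undetermined"       # predictors disagree


print(call_localization([True, True], [False, False]))
print(call_localization([False, False], [False, False]))
```

The first call mirrors the rationale given for the Z44808 variants (secreted); the second mirrors the rationale given earlier for the S67314 variants (intracellular).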
Variant protein Z44808_PEA_1_P5 is encoded by the following transcript(s): Z44808_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T4 is shown in bold; this coding portion starts at position 586 and ends at position 1977. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000452_0001
Variant protein Z44808_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T5. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P6 and SM02_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1 - 428 of SM02_HUMAN, which also corresponds to amino acids 1 - 428 of Z44808_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL corresponding to amino acids 429 - 434 of Z44808_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL in Z44808_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z44808_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf000454_0001
Variant protein Z44808_PEA_1_P6 is encoded by the following transcript(s): Z44808_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T5 is shown in bold; this coding portion starts at position 586 and ends at position 1887. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000454_0002
Figure imgf000455_0001
Variant protein Z44808_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T9. An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z44808_PEA_1_P7 and SM02_HUMAN:
1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGR TFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDD GTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAA APALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKN DNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPA KARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEE RVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQ ELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1 - 441 of SM02_HUMAN, which also corresponds to amino acids 1 - 441 of Z44808_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLWLRGKVSFYCF corresponding to amino acids 442 - 454 of Z44808_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF in Z44808_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z44808_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf000456_0001
Variant protein Z44808_PEA_1_P7 is encoded by the following transcript(s): Z44808_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T9 is shown in bold; this coding portion starts at position 586 and ends at position 1947. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf000457_0001
Variant protein Z44808_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T11. The identification of this transcript was performed using a non-EST based method for identification of alternative splicing, described in the following reference: "Sorek R et al., Genome Res. (2004) 14:1617-23." An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between Z44808_PEA_1_P11 and SM02_HUMAN: 1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1 - 170 of SM02_HUMAN, which also corresponds to amino acids 1 - 170 of Z44808_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to DIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to amino acids 188 - 446 of SM02_HUMAN, which also corresponds to amino acids 171 - 429 of Z44808_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170-x to 170; and ending at any of amino acid numbers 171 + ((n-2) - x), in which x varies from 0 to n-2.
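The edge-portion definition above can be illustrated with a minimal sketch that enumerates every window of length n containing both junction residues (positions 170 and 171 of Z44808_PEA_1_P11). The function name `edge_windows`, its parameters, and the bounds check against a protein length of 429 are assumptions made for illustration only; they are not part of the application.

```python
def edge_windows(n, junction_left=170, protein_length=429):
    """List (start, end) for every length-n window spanning the junction.

    Positions are 1-based and inclusive. The window start is
    junction_left - x and the end is (junction_left + 1) + (n - 2) - x,
    with x running from 0 to n - 2, as in the text above.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    junction_right = junction_left + 1
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = junction_right + (n - 2) - x
        # Illustrative bounds check: keep windows that lie inside the protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end)
```

Every window returned has length n and includes both position 170 (T) and position 171 (D), so each one covers the junction that distinguishes the variant from the known protein.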
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z44808_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Amino acid mutations
Figure imgf000459_0001
Variant protein Z44808_PEA_1_P11 is encoded by the following transcript(s): Z44808_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T11 is shown in bold; this coding portion starts at position 586 and ends at position 1872. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Figure imgf000459_0002
Figure imgf000460_0001
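As a worked check of the coding-portion coordinates, the sketch below converts a start and end position into a nucleotide count and a codon count. It assumes the positions are 1-based and inclusive, which the application does not state explicitly; the function name `coding_length` is hypothetical. Under that assumption, positions 586 to 1872 of Z44808_PEA_1_T11 give 429 codons, which agrees with the 429 amino acids reported for Z44808_PEA_1_P11.

```python
def coding_length(start, end):
    """Return (nucleotides, codons, remainder) for a coding portion.

    Assumes 1-based, inclusive coordinates (an assumption, not stated
    in the application).
    """
    if end < start:
        raise ValueError("end must not precede start")
    nucleotides = end - start + 1
    codons, remainder = divmod(nucleotides, 3)
    return nucleotides, codons, remainder


if __name__ == "__main__":
    # Coordinates quoted in the text for two Z44808 transcripts.
    for name, start, end in [
        ("Z44808_PEA_1_T9", 586, 1947),
        ("Z44808_PEA_1_T11", 586, 1872),
    ]:
        print(name, coding_length(start, end))
```

A remainder of zero indicates that the quoted span is a whole number of codons.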
As noted above, cluster Z44808 features 21 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z44808_PEA_1_node J) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000460_0002
Segment cluster Z44808_PEA_1_node_16 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000460_0003
Figure imgf000461_0001
Segment cluster Z44808_PEA_1_nodeJ according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000461_0002
Segment cluster Z44808_PEA_1_node_24 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000461_0003
Figure imgf000462_0001
Segment cluster Z44808_PEA_1_node J2 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4 and Z44808_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000462_0002
Segment cluster Z44808_PEA_1_nodeJ3 according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000462_0003
Segment cluster Z44808_PEA_1_nodeJ6 according to the present invention is supported by 117 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000463_0001
Segment cluster Z44808_PEA_1_node_37 according to the present invention is supported by 120 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf000463_0002
Segment cluster Z44808_PEA_1_node_41 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000463_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster Z44808_PEA_1_nodeJ 1 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000464_0001
Segment cluster Z44808_PEA_1_node_13 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000464_0002
Segment cluster Z44808_PEA_1_node_18 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000465_0001
Segment cluster Z44808_PEA_1_nodeJ2 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000465_0002
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 28. Table 28 - Oligonucleotides related to this segment
Figure imgf000466_0001
Segment cluster Z44808_PEA_1_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T5. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000466_0002
Segment cluster Z44808_PEA_1_nodeJ0 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000466_0003
Figure imgf000467_0001
Segment cluster Z44808_PEA_1_nodeJ4 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf000467_0002
Segment cluster Z44808_PEA_1_nodeJ5 according to the present invention can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4 and Z44808_PEA_1_T5. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf000467_0003
Segment cluster Z44808_PEA_1_nodeJ9 according to the present invention is supported by 1 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf000468_0001
Segment cluster Z44808_PEA_1_node_4 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf000468_0002
Segment cluster Z44808_PEA_1_node_6 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000468_0003
Figure imgf000469_0001
Segment cluster Z44808_PEA_1_node according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11, Z44808_PEA_1_T4, Z44808_PEA_1_T5, Z44808_PEA_1_T8 and Z44808_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000469_0002
Variant protein alignment to the previously known protein:
Sequence name: /tmp/vϋqLu6eAVZ/K3JDuPvaLo:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P5 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4440.00 Escore: 0 Matching length: 441 Total length: 441 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
    |||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
Sequence name: /tmp/QSUNfTsJ5y/kL0w5Vb6SD:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P6 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4310.00 Escore: 0 Matching length: 428 Total length: 428 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
401 DKSISVQELMGCLGVAKEDGKADTKKRH 428
    ||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRH 428
Sequence name: /tmp/MZVdR4PVdM/5uN8RwViJl:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P7 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4440.00 Escore: 0 Matching length: 441 Total length: 441 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
    |||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ 441
Sequence name: /tmp/3fGVxqLloe/J5mQduAdOF:SM02_HUMAN
Sequence documentation:
Alignment of: Z44808_PEA_1_P11 x SM02_HUMAN
Alignment segment 1/1:
Quality: 4228.00 Escore: 0 Matching length: 429 Total length: 446 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 96.19 Total Percent Identity: 96.19 Gaps: 1
Alignment:
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQK 50
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQ 100
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 ARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKT 150
151 PRCPGSVNEKLPQREGTGKT.................DIASRYPTLWTEQ 183
    ||||||||||||||||||||                 |||||||||||||
151 PRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQ 200
184 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQ 250
234 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 283
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 CHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQ 300
284 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 333
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 LQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEER 350
334 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 383
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 VVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNN 400
384 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 429
    ||||||||||||||||||||||||||||||||||||||||||||||
401 DKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG 446
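The relationship between Z44808_PEA_1_P11 and SM02_HUMAN in the alignment above can be sketched as a coordinate mapping. The segment boundaries (1 - 170 and 171 - 429 of the variant against 1 - 170 and 188 - 446 of the known protein) are taken from the comparison report; the names `SEGMENTS`, `to_known_position` and `percent` are hypothetical. The application does not define how "Total Percent Identity" is calculated; the sketch assumes it is the matching length divided by the total length, which reproduces the reported 96.19.

```python
# (variant_start, variant_end, known_start, known_end), 1-based inclusive.
SEGMENTS = [
    (1, 170, 1, 170),
    (171, 429, 188, 446),
]


def to_known_position(variant_pos, segments=SEGMENTS):
    """Map a position on the variant protein to the known protein."""
    for v_start, v_end, k_start, _k_end in segments:
        if v_start <= variant_pos <= v_end:
            return k_start + (variant_pos - v_start)
    raise ValueError(f"position {variant_pos} is outside the variant protein")


def percent(matching, total):
    """Percentage rounded to two decimals (assumed definition)."""
    return round(100.0 * matching / total, 2)


# Residues of the variant that align to the known protein.
MATCHING_LENGTH = sum(v_end - v_start + 1 for v_start, v_end, _, _ in SEGMENTS)
TOTAL_LENGTH = 446  # total alignment length reported above

if __name__ == "__main__":
    print(to_known_position(171))
    print(MATCHING_LENGTH, percent(MATCHING_LENGTH, TOTAL_LENGTH))
```

The 17-residue difference between the two coordinate systems after position 170 corresponds to the single gap shown in the alignment.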
Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 in normal and cancerous colon tissues
Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to junc8-11, Z44808junc8-11 amplicon (SEQ ID NO: 1291) and primers Z44808junc8-11F (SEQ ID NO: 1289) and Z44808junc8-11R (SEQ ID NO: 1290) was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 32 is a histogram showing overexpression of the above-indicated SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 32, the expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by the above amplicon in cancer samples was higher in a few samples than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"). Notably, an overexpression of at least 5-fold was found in 4 out of 36 adenocarcinoma samples.
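The normalization described above can be sketched as follows: each sample's amplicon quantity is divided by the geometric mean of its housekeeping-gene quantities, and the result is divided by the median normalized quantity of the normal samples. The function names and the sample values below are hypothetical and chosen only to make the arithmetic visible; they are not data from the application.

```python
from statistics import geometric_mean, median


def normalize(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_up_regulation(samples, normal_ids):
    """Return fold change of every sample relative to the median of the
    normalized quantities of the normal samples.

    samples: {sample_id: (target_quantity, [housekeeping quantities])}
    normal_ids: ids of the normal samples used as the reference.
    """
    normalized = {sid: normalize(t, hk) for sid, (t, hk) in samples.items()}
    reference = median(normalized[sid] for sid in normal_ids)
    return {sid: value / reference for sid, value in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities, for illustration only.
    samples = {
        "normal_1": (2.0, [1.0, 1.0, 1.0, 1.0]),
        "normal_2": (4.0, [2.0, 2.0, 2.0, 2.0]),
        "normal_3": (6.0, [1.0, 4.0, 2.0, 2.0]),
        "tumor_1": (40.0, [2.0, 2.0, 2.0, 2.0]),
    }
    normals = ["normal_1", "normal_2", "normal_3"]
    for sid, fold in fold_up_regulation(samples, normals).items():
        print(sid, round(fold, 2))
```

Using the geometric mean rather than the arithmetic mean keeps one highly expressed housekeeping gene from dominating the reference, and using the median of the normal samples limits the influence of a single outlying normal sample.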
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z44808junc8-11F forward primer; and Z44808junc8-11R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z44808junc8-11.
Primers:
Forward primer Z44808junc8-11F (SEQ ID NO: 1289): GAAGGCACAGGAAAAACAGATATTG
Reverse primer Z44808junc8-11R (SEQ ID NO: 1290): TGGTGCTCTTGGTCACAGGAT
Amplicon Z44808junc8-11 (SEQ ID NO: 1291): GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACAGGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAGAGCACCA
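As a consistency check on the primer and amplicon sequences listed above, the following sketch verifies that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer, and translates the amplicon in its first reading frame using the standard genetic code. The helper names are hypothetical. The translated peptide contains the T/D pair that the comparison report places at positions 170 - 171 of Z44808_PEA_1_P11; that the amplicon was designed to target this junction is an inference from the sequences, not a statement made in the application.

```python
FORWARD = "GAAGGCACAGGAAAAACAGATATTG"
REVERSE = "TGGTGCTCTTGGTCACAGGAT"
AMPLICON = (
    "GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACA"
    "GGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAG"
    "AGCACCA"
)

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate complete codons of seq starting at the given frame."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    print(AMPLICON.startswith(FORWARD))
    print(AMPLICON.endswith(reverse_complement(REVERSE)))
    print(translate(AMPLICON))
```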
Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 in different normal tissues
Expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to Z44808junc8-11 amplicon (SEQ ID NO: 1291) and primers Z44808junc8-11F (SEQ ID NO: 1289) and Z44808junc8-11R (SEQ ID NO: 1290) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples.
Primers:
Forward primer Z44808junc8-11F (SEQ ID NO: 1289): GAAGGCACAGGAAAAACAGATATTG
Reverse primer Z44808junc8-11R (SEQ ID NO: 1290): TGGTGCTCTTGGTCACAGGAT
Amplicon Z44808junc8-11 (SEQ ID NO: 1291): GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACAGGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAGAGCACCA
The results are shown in Figure 39, demonstrating the expression of SM02_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 in different normal tissues.
Figure imgf000481_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000481_0002
As noted above, cluster Z25299 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Antileukoproteinase 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z25299_PEA_2_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T1. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between Z25299_PEA_2_P2 and ALK1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P2, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH corresponding to amino acids 132 - 139 of Z25299_PEA_2_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH in Z25299_PEA_2_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Figure imgf000483_0001
Variant protein Z25299_PEA_2_P2 is encoded by the following transcript(s): Z25299_PEA_2_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T1 is shown in bold; this coding portion starts at position 124 and ends at position 540. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Figure imgf000483_0002
Figure imgf000484_0001
Variant protein Z25299_PEA_2_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T2. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between Z25299_PEA_2_P3 and ALK1_HUMAN: 1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P3, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1 - 131 of ALK1_HUMAN, which also corresponds to amino acids 1 - 131 of Z25299_PEA_2_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG corresponding to amino acids 132 - 156 of Z25299_PEA_2_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG in Z25299_PEA_2_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf000485_0001
Variant protein Z25299_PEA_2_P3 is encoded by the following transcript(s): Z25299_PEA_2_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T2 is shown in bold; this coding portion starts at position 124 and ends at position 591. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000486_0001
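The coding-portion coordinates quoted for each transcript can be related to the length of the encoded variant protein. The sketch below assumes that the positions are 1-based and inclusive and that the stated coding portion does not include the stop codon; both are assumptions inferred from the numbers in the text rather than stated by the application. The function name `protein_length_from_cds` is hypothetical.

```python
def protein_length_from_cds(start, end):
    """Return the number of codons in a coding portion.

    Assumes 1-based inclusive nucleotide positions and a coding
    portion that excludes the stop codon (assumptions, see lead-in).
    """
    n_nucleotides = end - start + 1
    if n_nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return n_nucleotides // 3


# Transcript Z25299_PEA_2_T2: coding portion 124-591, encoding Z25299_PEA_2_P3.
print(protein_length_from_cds(124, 591))
```

For Z25299_PEA_2_T2 the result can be compared with the last amino acid position given for Z25299_PEA_2_P3 in its comparison report (amino acids 132 - 156 for the tail).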
Variant protein Z25299_PEA_2_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T6. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25299_PEA_2_P7 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P7, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1 - 81 of ALK1_HUMAN, which also corresponds to amino acids 1 - 81 of Z25299_PEA_2_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ corresponding to amino acids 82 - 89 of Z25299_PEA_2_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ in Z25299_PEA_2_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf000487_0001
Figure imgf000488_0001
Variant protein Z25299_PEA_2_P7 is encoded by the following transcript(s): Z25299_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T6 is shown in bold; this coding portion starts at position 124 and ends at position 390. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf000488_0002
Variant protein Z25299_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T9. An alignment is given to the known protein (Antileukoproteinase 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25299_PEA_2_P10 and ALK1_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCP GKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1 - 82 of ALK1_HUMAN, which also corresponds to amino acids 1 - 82 of Z25299_PEA_2_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein Z25299_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Figure imgf000490_0001
Variant protein Z25299_PEA_2_P10 is encoded by the following transcript(s): Z25299_PEA_2_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T9 is shown in bold; this coding portion starts at position 124 and ends at position 369. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf000490_0002
Figure imgf000491_0001
As noted above, cluster Z25299 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z25299_PEA_2_node O according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Figure imgf000491_0002
Segment cluster Z25299_PEA_2_node_21 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Figure imgf000491_0003
Segment cluster Z25299_PEA_2_node J3 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
Z25299_PEA_2_T2    518                          707
Segment cluster Z25299_PEA_2_node 4 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf000492_0001
Segment cluster Z25299_PEA_2_node_8 according to the present invention is supported by 218 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf000492_0002
Figure imgf000493_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
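The distinction between the segments described above and the short segments can be sketched in Python from the starting and ending positions given in the segment location tables. The sketch assumes 1-based inclusive positions and treats the "about 120 bp" limit as a hard cutoff, which is a simplification; the function names are hypothetical.

```python
SHORT_SEGMENT_LIMIT_BP = 120  # "up to about 120 bp" in the text; treated as exact here


def segment_length(start, end):
    """Length in bp of a segment given 1-based inclusive positions (assumed)."""
    return end - start + 1


def is_short_segment(start, end, limit=SHORT_SEGMENT_LIMIT_BP):
    """True if the segment would fall under the separately described short segments."""
    return segment_length(start, end) <= limit


# Table 16: the segment on transcript Z25299_PEA_2_T2 spans positions 518-707.
print(segment_length(518, 707), is_short_segment(518, 707))
```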
Segment cluster Z25299_PEA_2_node_12 according to the present invention is supported by 228 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000493_0002
Segment cluster Z25299_PEA_2_node_13 according to the present invention is supported by 246 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf000493_0003
Figure imgf000494_0001
Segment cluster Z25299_PEA_2_nodeJ4 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3, Z25299_PEA_2_T6 and Z25299_PEA_2_T9. Table 23 below describes the starting and ending position of this segment on each transcript.
Figure imgf000494_0002
Segment cluster Z25299_PEA_2_node_17 according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2 and Z25299_PEA_2_T3. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Figure imgf000494_0003
Segment cluster Z25299_PEA_2_node_18 according to the present invention is supported by 221 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf000495_0001
Segment cluster Z25299_PEA_2_nodeJ9 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1, Z25299_PEA_2_T2, Z25299_PEA_2_T3 and Z25299_PEA_2_T6. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf000495_0002
Variant protein alignment to the previously known protein:
Sequence name: /tmp/oXgeQ4MeyL/K6VqblMQu2 :ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P2 x ALK1_HUMAN
Alignment segment 1/1: Quality: 1371.00
Escore: 0 Matching length: 131 Total length: 131 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
Sequence name: /tmp/rbf314VLIm/yR43i4SbP4 :ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P3 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 1371.00
Escore: 0 Matching length: 131 Total length: 131 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLN 100
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
101 PPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK 131
Sequence name: /tmp/KCtSXACZXe/rK4T6LKeRX:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P7 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 835.00 Escore: 0 Matching length: Total length: 81 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps :
Alignment :
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP
Sequence name: /tmp/LcBlcAxB6c/NSI9pqfxoU:ALK1_HUMAN
Sequence documentation:
Alignment of: Z25299_PEA_2_P10 x ALK1_HUMAN
Alignment segment 1/1:
Quality: 844.00 Escore: 0 Matching length: 82 Total length: 82 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE 50
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT
    ||||||||||||||||||||||||||||||||
 51 CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT
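The identity figures in the alignment reports above can be approximated with a short Python sketch. The application does not define how "Matching" and "Total" percentages are computed, so the definitions below (identity over all aligned columns, with `-` as the gap character) are assumptions, and `alignment_stats` is a hypothetical helper rather than the tool that produced the reports.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Summarize a pairwise alignment given as two equal-length rows.

    Assumed definitions: total length is the number of aligned columns,
    gaps counts columns with a gap in either row, and percent identity
    is identical non-gap columns divided by total length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = len(row_a)
    gaps = sum(1 for a, b in zip(row_a, row_b) if a == gap or b == gap)
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return {
        "total_length": total,
        "gaps": gaps,
        "percent_identity": 100.0 * identical / total,
    }


# The two rows of the Z25299_PEA_2_P10 x ALK1_HUMAN alignment (amino acids 1-82).
P10_ROW = (
    "MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPE"
    "CQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT"
)
ALK1_ROW = P10_ROW  # the report shows the two rows as identical over this span

print(alignment_stats(P10_ROW, ALK1_ROW))
```

The same function could be used to test a candidate sequence against the 70% to 95% homology thresholds recited in the comparison reports, bearing in mind that identity is only one possible reading of "homologous".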
Expression of Secretory leukocyte protease inhibitor (acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G) Z25299 transcripts, which are detectable by the amplicon depicted in sequence name Z25299 seg20, was examined in normal and cancerous colon tissues.
Expression of transcripts detectable by or according to seg20, that is, by the Z25299 seg20 amplicon (SEQ ID NO: 1294) and the Z25299 seg20F (SEQ ID NO: 1292) and Z25299 seg20R (SEQ ID NO: 1293) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1 above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 21 is a histogram showing over expression of the above-indicated variant. Transcript expression in cancerous colon samples relative to the normal samples is shown. As is evident from Figure 21, the expression of transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 7 out of 36 adenocarcinoma samples.
Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 6.98E-02. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.33E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
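The normalization described above (geometric mean of the housekeeping genes, then division by the median of the normal samples) can be sketched in Python as follows. All sample names and quantities are hypothetical placeholders, not data from the application, and the function names are illustrative only.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity by the geometric mean of the housekeeping genes."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_relative_to_normal(normalized, normal_sample_ids):
    """Divide each normalized quantity by the median of the normal samples."""
    reference = median(normalized[s] for s in normal_sample_ids)
    return {sample: q / reference for sample, q in normalized.items()}


# Hypothetical quantities: (target amplicon, [four housekeeping genes]).
raw = {
    "normal_A": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_B": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_C": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_X": (24.0, [2.0, 2.0, 2.0, 2.0]),
}

normalized = {s: normalize_to_housekeeping(t, hk) for s, (t, hk) in raw.items()}
folds = fold_relative_to_normal(normalized, ["normal_A", "normal_B", "normal_C"])

# Samples at or above the 5 fold over-expression threshold used in the text.
over_expressed = [s for s, f in folds.items() if f >= 5.0]
print(folds, over_expressed)
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeping gene does not dominate the normalization factor.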
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg20F forward primer; and Z25299 seg20R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg20.
Forward primer (SEQ ID NO: 1292): CTCCTGAACCCTACTCCAAGCA
Reverse primer (SEQ ID NO: 1293): CAGGCGATCCTATGGAAATCC
Amplicon (SEQ ID NO: 1294):
CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAA CTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG
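As a consistency check on the sequences listed above, the following Python sketch tests whether the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The helper names are hypothetical, and the amplicon is written without the line-wrap space that appears in the listing.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD_PRIMER = "CTCCTGAACCCTACTCCAAGCA"   # SEQ ID NO: 1292
REVERSE_PRIMER = "CAGGCGATCCTATGGAAATCC"    # SEQ ID NO: 1293
AMPLICON = (                                # SEQ ID NO: 1294
    "CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAA"
    "CTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG"
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


print(primers_flank_amplicon(AMPLICON, FORWARD_PRIMER, REVERSE_PRIMER))
```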
Expression of Secretory leukocyte protease inhibitor (acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G; may prevent elastase-mediated damage to oral and possibly other mucosal tissues) Z25299 transcripts, which are detectable by the amplicon depicted in sequence name Z25299seg20, in different normal tissues
Expression of Secretory leukocyte protease inhibitor (acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G; may prevent elastase-mediated damage to oral and possibly other mucosal tissues) transcripts detectable by or according to the Z25299seg20 amplicon (SEQ ID NO: 1294) and primers Z25299seg20F (SEQ ID NO: 1292) and Z25299seg20R (SEQ ID NO: 1293) was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon), TATA box (GenBank Accession No. NM_003194; TATA amplicon), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2, "Tissue samples on normal panel"), to obtain a value of relative expression of each sample relative to the median of the ovary samples.
Forward primer (SEQ ID NO: 1292): CTCCTGAACCCTACTCCAAGCA
Reverse primer (SEQ ID NO: 1293): CAGGCGATCCTATGGAAATCC
Amplicon (SEQ ID NO: 1294): CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAA CTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG
The results are demonstrated in Figure 22, showing the expression of Secretory leukocyte protease inhibitor (acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G; may prevent elastase-mediated damage to oral and possibly other mucosal tissues) Z25299 transcripts, which are detectable by the amplicon depicted in sequence name Z25299seg20, in different normal tissues.
DESCRIPTION FOR CLUSTER HUMF5A
Cluster HUMF5A features 3 transcript(s) and 33 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000503_0001
Table 2 - Segments of interest
Figure imgf000504_0001
Figure imgf000505_0001
Table 3 - Proteins of interest
Figure imgf000505_0002
These sequences are variants of the known protein Coagulation factor V precursor (SwissProt accession identifier FA5_HUMAN; known also according to the synonym Activated protein C cofactor), SEQ ID NO: 626, referred to herein as the previously known protein. Protein Coagulation factor V precursor is known or believed to have the following function(s): Coagulation factor V is a cofactor that participates with factor Xa to activate prothrombin to thrombin. The sequence for protein Coagulation factor V precursor is given at the end of the application, as "Coagulation factor V precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000505_0003
Figure imgf000506_0001
following annotation(s) were found: cell adhesion; blood coagulation, which are annotation(s) related to Biological Process; and blood coagulation factor; copper binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster HUMF5A features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Coagulation factor V precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMF5A_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMF5A_PEA_1_T1. An alignment is given to the known protein (Coagulation factor V precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMF5A_PEA_1_P3 and FA5_HUMAN_V1 (SEQ ID NO: 627):
1. An isolated chimeric polypeptide encoding for HUMF5A_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPTNSSLNLS VTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKVHFKNKADKPLSIHPQGIR YSKLSEGASYLDHTFPAEKMDDAVAPGREYTYEWSISEDSGPTHDDPPCLTHIYYSHEN LIEDFNSGLIGPLLICKKGTLTEGGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGY VNGTMPDITVCAHDHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTA NMTVGPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRWEYFI AAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYEDESFTKHTVNP NMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGVTFSPYEDEVNSSFTSGRNNTM IRAVQPGETYTYKWNILEFDEPTENDAQCLTRPYYSDVDIMRDIASGLIGLLLICKSRSL DRRGIQRAADIEQQAVFAVFDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTIN GYVPESITTLGFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPESTVMATRK MHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEEFNLTALALENGTEFVSS NTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSHQQATTAGSPLRHLIGKNSVLNSSTAEHS SPYSEDPIEDPLQPDVTGIRLLSLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLA HKVGRHLSQDTGSPSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVG RWHLASEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPKFPR VRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHPLRSEAYNTFSER RLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDHNQNSSNDTGQASCPPGLYQTV PPEEHYQTFPIQDPDQMHSTSDPSHRSSSPELSEMLEYDRSHKSFPTDISQMSPSSEHEV WQTVISPDLSQVTLSPELSQTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPD LSHTTLSLDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSHMTLS PELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSPALGQMPLSPDPSHTT LSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQIPLTPDLDQMTLSPDLGETDLSPNFGQ MSLSPDLSQVTLSPDISDTTLLPDLSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQ MPSPSSPTLNDTFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY KTDVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRETDIEDSDDIP EDTTYKK corresponding to amino acids 1 - 1617 of FA5_HUMAN_V1, which also corresponds to amino acids 1 - 1617 of HUMF5A_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSMKSISEFLVLLSELKWMMLSKFVLKI corresponding to amino acids 1618 - 1645 of HUMF5A_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMF5A_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GSMKSISEFLVLLSELKWMMLSKFVLKI in HUMF5A_PEA_1_P3.
It should be noted that the known protein sequence (FA5_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for FA5_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 5 - Changes to FA5_HUMAN_V1
Figure imgf000508_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMF5A_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000509_0001
Figure imgf000510_0001
Variant protein HUMF5A_PEA_1_P3 is encoded by the following transcript(s): HUMF5A_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMF5A_PEA_1_T1 is shown in bold; this coding portion starts at position 183 and ends at position 5117. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000510_0002
Figure imgf000511_0001
Figure imgf000512_0001
Variant protein HUMF5A_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMF5A_PEA_1_T3. An alignment is given to the known protein (Coagulation factor V precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMF5A_PEA_1_P4 and FA5_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMF5A_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPTNSSLNLS VTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKVHFKNKADKPLSIHPQGIR YSKLSEGASYLDHTFPAEKMDDAVAPGREYTYEWSISEDSGPTHDDPPCLTHIYYSHEN LIEDFNSGLIGPLLICKKGTLTEGGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGY VNGTMPDITVCAHDHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTA NMTVGPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRWEYFI AAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYEDESFTKHTVNP NMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGVTFSPYEDEVNSSFTSGRNNTM IRAVQPGETYTYKWNILEFDEPTENDAQCLTRPYYSDVDIMRDIASGLIGLLLICKSRSL DRRGIQRAADIEQQAVFAVFDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTIN GYVPESITTLGFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPESTVMATRK MHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEEFNLTALALENGTEFVSS NTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSHQQATTAGSPLRHLIGKNSVLNSSTAEHS SPYSEDPIEDPLQPDVTGIRLLSLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLA HKVGRHLSQDTGSPSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVG RWHLASEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPKFPR VRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHPLRSEAYNTFSER RLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDHNQNSSNDTGQASCPPGLYQTV PPEEHYQTFPIQDPDQMHSTSDPSHRSSSPELSEMLEYDRSHKSFPTDISQMSPSSEHEV WQTVISPDLSQVTLSPELSQTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPD LSHTTLSLDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSHMTLS PELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSPALGQMPLSPDPSHTT LSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQIPLTPDLDQMTLSPDLGETDLSPNFGQ MSLSPDLSQVTLSPDISDTTLLPDLSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQ MPSPSSPTLNDTFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY KTDVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRETDIEDSDDIP EDTTYKKVVFRKYLDSTFTKRDPRGEYEEHLGILGPIIRAEVDDVIQVRFKNLASRPYSL HAHGLSYEKSSEGKTYEDDSPEWFKEDNAVQPNSSYTYVWHATERSGPESPGSACRA WAYYSAVNPEKDIHSGLIGPLLICQKGILHKDSNMPVDMREFVLLFMTFDEKKSWYYE KKSRSSWRLTSSEMKKSHEFHAINGMIYSLPGLKMYEQEWVRLHLLNIGGSQDIHVVH FHGQTLLENGNKQHQLGVWPLLPGSFKTLEMKASKPGWWLLNTEVGENQRAGMQTP FLIMDRDCRMPMGLSTGIISDSQIKASEFLGYWEPRLARLNNGGSYNAWSVEKLAAEFA SKPWIQVDMQKEVIITGIQTQGAKHYLKSCYTTEFYVAYSSNQINWQIFKGNSTRNVMY FNGNSDASTIKENQFDPPIVARYIRISPTRAYNRPTLRLELQGCE corresponding to amino acids 1 - 2062 of FA5_HUMAN_V1, which also corresponds to amino acids 1 - 2062 of HUMF5A_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DVPHPWVWKMER corresponding to amino acids 2063 - 2074 of HUMF5A_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMF5A_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DVPHPWVWKMER in HUMF5A_PEA_1_P4.
It should be noted that the known protein sequence (FA5_HUMAN) has one or more changes compared to the sequence given at the end of the application and named as being the amino acid sequence for FA5_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 8 - Changes to FA5_HUMAN_V1
Figure imgf000514_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMF5A_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Figure imgf000515_0001
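The localization reasoning stated before Table 9 (a signal peptide predicted by both signal-peptide programs, and no trans-membrane region predicted by either trans-membrane program) amounts to a simple decision rule. The Python sketch below is a hedged illustration: the function name and the boolean inputs are assumptions, since the text does not give the actual program outputs or name the programs other than SignalP.

```python
def is_predicted_secreted(signal_peptide_calls, transmembrane_calls):
    """Apply the stated rule: secreted if every signal-peptide predictor
    reports a signal peptide and no trans-membrane predictor reports a
    trans-membrane region. Inputs are lists of booleans (hypothetical)."""
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# Two signal-peptide programs agree; neither trans-membrane program fires.
print(is_predicted_secreted([True, True], [False, False]))  # True
```

Any disagreement between the predictors makes this rule return False, which mirrors the conservative wording ("both ... and neither ...") of the passage.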
Variant protein HUMF5A_PEA_1_P4 is encoded by the following transcript(s): HUMF5A_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMF5A_PEA_1_T3 is shown in bold; this coding portion starts at position 183 and ends at position 6404. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Figure imgf000516_0001
Figure imgf000517_0001
Figure imgf000518_0001
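The coding-portion coordinates given for transcript HUMF5A_PEA_1_T3 (positions 183 to 6404) can be checked against the length of variant protein HUMF5A_PEA_1_P4 (amino acids 1 - 2074). The Python sketch below is a worked arithmetic check with hypothetical function names; it assumes 1-based inclusive positions, which the text does not state explicitly.

```python
def coding_length_nt(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive, 1-based coding span."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the span; raises if not a multiple of 3."""
    length = coding_length_nt(start, end)
    if length % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return length // 3


T3_START, T3_END = 183, 6404   # coding portion of HUMF5A_PEA_1_T3 (from the text)
P4_LENGTH = 2074               # residues 1 - 2074 of HUMF5A_PEA_1_P4

print(coding_length_nt(T3_START, T3_END))  # 6222
print(codon_count(T3_START, T3_END))       # 2074
assert codon_count(T3_START, T3_END) == P4_LENGTH
```

The span contains exactly 2074 codons, matching the 2074 residues of the variant protein; this would suggest that the stated end position does not include a stop codon, although the text does not say so.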
Variant protein HUMF5A_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMF5A_PEA_1_T7. An alignment is given to the known protein (Coagulation factor V precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMF5A_PEA_1_P8 and FA5_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMF5A_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPTNSSLNLS VTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKVHFKNKADKPLSIHPQGIR YSKLSEGASYLDHTFPAEKMDDAVAPGREYTYEWSISEDSGPTHDDPPCLTHIYYSHEN LIEDFNSGLIGPLLICKKGTLTEGGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGY VNGTMPDITVCAHDHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTA NMTVGPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRWEYFI AAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYEDESFTKHTVNP NMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGVTFSPYEDEVNSSFTSGRNNTM IRAVQPGETYTYKWNILEFDEPTENDAQCLTRPYYSDVDIMRDIASGLIGLLLICKSRSL DRRGIQRAADIEQQAVFAVFDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMS corresponding to amino acids 1 - 587 of FA5_HUMAN, which also corresponds to amino acids 1 - 587 of HUMF5A_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SKSEYYFCSSVFHSCG corresponding to amino acids 588 - 603 of HUMF5A_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMF5A_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SKSEYYFCSSVFHSCG in HUMF5A_PEA_1_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMF5A_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Figure imgf000519_0001
The glycosylation sites of variant protein HUMF5A_PEA_1_P8, as compared to the known protein Coagulation factor V precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Figure imgf000520_0001
Figure imgf000521_0001
The phosphorylation sites of variant protein HUMF5A_PEA_1_P8, as compared to the known protein Coagulation factor V precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 13 - Phosphorylation site(s)
Figure imgf000521_0002
Variant protein HUMF5A_PEA_1_P8 is encoded by the following transcript(s): HUMF5A_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMF5A_PEA_1_T7 is shown in bold; this coding portion starts at position 183 and ends at position 1991. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMF5A_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Figure imgf000522_0001
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMF5AJPEAJ ιodeJ) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000523_0001
Segment cluster HUMF5AJPEAJjnodeJ according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000523_0002
Segment cluster HUMF5A_PEA_1_node_6 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000523_0003
Figure imgf000524_0001
Segment cluster HUMF5A_PEA_l_nodeJ according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000524_0002
Segment cluster HUMF5A_PEA_1_node_10 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000524_0003
Segment cluster HUMF5A_PEA_1_node_12 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf000525_0001
Segment cluster HUMF5A_PEA_1_node_14 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000525_0002
Segment cluster HUMF5A_PEA_1_node_18 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf000526_0001
Segment cluster HUMF5AJPEAJ_nodeJl according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000526_0002
Segment cluster HUMF5A_PEA_1_node_22 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T7. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000527_0001
Segment cluster HUMF5A_PEA_l_nodeJ4 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000527_0002
Segment cluster HUMF5A_PEA_1_node_26 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000527_0003
Segment cluster HUMF5A_PEA_1_node_27 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000528_0001
Segment cluster HUMF5A_PEA_1_node_29 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000528_0002
Segment cluster HUMF5A_PEA_1_node_35 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000529_0001
Segment cluster HUMF5A_PEA_l_nodeJ7 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000529_0002
Segment cluster HUMF5A_PEA_l_nodeJ9 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000529_0003
Segment cluster HUMF5A_PEA_l_nodeJ7 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf000530_0001
Segment cluster HUMF5A_PEA_l_nodeJ0 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf000530_0002
Segment cluster HUMF5A_PEA_l_nodeJ3 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf000531_0001
Segment cluster HUMF5A_PEA_1_node_56 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf000531_0002
Segment cluster HUMF5A_PEA_1_node_60 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000531_0003
Figure imgf000532_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMF5A ΕAJjnodeJ according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000532_0002
Segment cluster HUMF5A_PEA_1_node_16 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1, HUMF5A_PEA_1_T3 and HUMF5A_PEA_1_T7. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf000533_0001
Segment cluster HUMF5 A_PEA_1 jnode l according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf000533_0002
Segment cluster HUMF5A_PEA_l_nodeJ2 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T3. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf000533_0003
Segment cluster HUMF5A_PEA_l_nodeJ3 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf000534_0001
Segment cluster HUMF5A_PEA_1_node_41 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf000534_0002
Segment cluster HUMF5A_PEA_1_node_43 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf000535_0001
Segment cluster HUMF5AJPEAJ_nodeJ5 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf000535_0002
Segment cluster HUMF5A_PEA_1_node_51 according to the present invention can be found in the following transcript(s): HUMF5A_PEA_1_T1. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf000535_0003
Segment cluster HUMF5A_PEA_l_nodeJ7 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Figure imgf000536_0001
Segment cluster HUMF5A_PEA_l_nodeJ9 according to the present invention can be found in the following transcript(s): HUMF5A_PEA_1_T1 and HUMF5A_PEA_1_T3. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Figure imgf000536_0002
Variant protein alignment to the previously known protein:
Sequence name: FA5_HUMAN_V1
Sequence documentation:
Alignment of: HUMF5A_PEA_1_P3 x FA5_HUMAN_V1
Alignment segment 1/1:
Quality: 16060.00 Escore: 0 Matching length: 1617 Total length: 1617 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
   1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50

  51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100

 101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150

 151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200

 201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250

 251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300

 301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350

 351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400

 401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450

 451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500

 501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550

 551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTINGYVPESITTL 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTINGYVPESITTL 600

 601 GFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 GFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES 650

 651 VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPE 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPE 700

 701 STVMATRKMHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 STVMATRKMHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEE 750

 751 FNLTALALENGTEFVSSNTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSH 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 FNLTALALENGTEFVSSNTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSH 800

 801 QQATTAGSPLRHLIGKNSVLNSSTAEHSSPYSEDPIEDPLQPDVTGIRLL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 QQATTAGSPLRHLIGKNSVLNSSTAEHSSPYSEDPIEDPLQPDVTGIRLL 850

 851 SLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLAHKVGRHLSQDTGS 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 SLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLAHKVGRHLSQDTGS 900

 901 PSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVGRWHLA 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 PSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVGRWHLA 950

 951 SEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPK 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 SEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPK 1000

1001 FPRVRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHP 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 FPRVRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHP 1050

1051 LRSEAYNTFSERRLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDH 1100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1051 LRSEAYNTFSERRLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDH 1100

1101 NQNSSNDTGQASCPPGLYQTVPPEEHYQTFPIQDPDQMHSTSDPSHRSSS 1150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1101 NQNSSNDTGQASCPPGLYQTVPPEEHYQTFPIQDPDQMHSTSDPSHRSSS 1150

1151 PELSEMLEYDRSHKSFPTDISQMSPSSEHEVWQTVISPDLSQVTLSPELS 1200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1151 PELSEMLEYDRSHKSFPTDISQMSPSSEHEVWQTVISPDLSQVTLSPELS 1200

1201 QTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPDLSHTTLS 1250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1201 QTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPDLSHTTLS 1250

1251 LDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSH 1300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1251 LDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSH 1300

1301 MTLSPELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSP 1350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1301 MTLSPELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSP 1350

1351 ALGQMPLSPDPSHTTLSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQI 1400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1351 ALGQMPLSPDPSHTTLSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQI 1400

1401 PLTPDLDQMTLSPDLGETDLSPNFGQMSLSPDLSQVTLSPDISDTTLLPD 1450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1401 PLTPDLDQMTLSPDLGETDLSPNFGQMSLSPDLSQVTLSPDISDTTLLPD 1450

1451 LSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQMPSPSSPTLND 1500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1451 LSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQMPSPSSPTLND 1500

1501 TFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY 1550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1501 TFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY 1550

1551 KTDVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRE 1600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1551 KTDVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRE 1600

1601 TDIEDSDDIPEDTTYKK 1617
     |||||||||||||||||
1601 TDIEDSDDIPEDTTYKK 1617

Sequence name: FA5_HUMAN_V1
Sequence documentation:
Alignment of: HUMF5A_PEA_1_P4 x FA5_HUMAN_V1
Alignment segment 1/1:
Quality: 20532.00 Escore: 0 Matching length: 2062 Total length: 2062 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
   1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50

  51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100

 101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150

 151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200

 201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250

 251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300

 301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350

 351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400

 401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450

 451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500

 501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550

 551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTINGYVPESITTL 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSTINGYVPESITTL 600

 601 GFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 GFCFDDTVQWHFCSVGTQNEILTIHFTGHSFIYGKRHEDTLTLFPMRGES 650

 651 VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPE 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 VTVTMDNVGTWMLTSMNSSPRSKKLRLKFRDVKCIPDDDEDSYEIFEPPE 700

 701 STVMATRKMHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEE 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 STVMATRKMHDRLEPEDEESDADYDYQNRLAAALGIRSFRNSSLNQEEEE 750

 751 FNLTALALENGTEFVSSNTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSH 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 FNLTALALENGTEFVSSNTDIIVGSNYSSPSNISKFTVNNLAEPQKAPSH 800

 801 QQATTAGSPLRHLIGKNSVLNSSTAEHSSPYSEDPIEDPLQPDVTGIRLL 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 QQATTAGSPLRHLIGKNSVLNSSTAEHSSPYSEDPIEDPLQPDVTGIRLL 850

 851 SLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLAHKVGRHLSQDTGS 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 SLGAGEFRSQEHAKRKGPKVERDQAAKHRFSWMKLLAHKVGRHLSQDTGS 900

 901 PSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVGRWHLA 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 PSGMRPWEDLPSQDTGSPSRMRPWKDPPSDLLLLKQSNSSKILVGRWHLA 950

 951 SEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPK 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 SEKGSYEIIQDTDEDTAVNNWLISPQNASRAWGESTPLANKPGKQSGHPK 1000

1001 FPRVRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHP 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 FPRVRHKSLQVRQDGGKSRLKKSQFLIKTRKKKKEKHTHHAPLSPRTFHP 1050

1051 LRSEAYNTFSERRLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDH 1100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1051 LRSEAYNTFSERRLKHSLVLHKSNETSLPTDLNQTLPSMDFGWIASLPDH 1100

1101 NQNSSNDTGQASCPPGLYQTVPPEEHYQTFPIQDPDQMHSTSDPSHRSSS 1150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1101 NQNSSNDTGQASCPPGLYQTVPPEEHYQTFPIQDPDQMHSTSDPSHRSSS 1150

1151 PELSEMLEYDRSHKSFPTDISQMSPSSEHEVWQTVISPDLSQVTLSPELS 1200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1151 PELSEMLEYDRSHKSFPTDISQMSPSSEHEVWQTVISPDLSQVTLSPELS 1200

1201 QTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPDLSHTTLS 1250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1201 QTNLSPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPDLSHTTLS 1250

1251 LDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSH 1300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1251 LDLSQTNLSPELSQTNLSPALGQMPLSPDLSHTTLSLDFSQTNLSPELSH 1300

1301 MTLSPELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSP 1350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1301 MTLSPELSQTNLSPALGQMPISPDLSHTTLSLDFSQTNLSPELSQTNLSP 1350
1351 ALGQMPLSPDPSHTTLSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQI 1400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1351 ALGQMPLSPDPSHTTLSLDLSQTNLSPELSQTNLSPDLSEMPLFADLSQI 1400
1401 PLTPDLDQMTLSPDLGETDLSPNFGQMSLSPDLSQVTLSPDISDTTLLPD 1450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1401 PLTPDLDQMTLSPDLGETDLSPNFGQMSLSPDLSQVTLSPDISDTTLLPD 1450
1451 LSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQMPSPSSPTLND 1500 I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 1451 LSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQMPSPSSPTLND 1500
1501 TFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY 1550 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
1501 TFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPY 1550 . . . . .
1551 KTDVRTNINSSRDPDNIAA YLRSNNGNRRNYYIAAEEISWDYSEFVQRE 1600 I I I I I I I I I I I I I I I 1 I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I II I I
1551 KTDVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRE 1600
1601 TDIEDSDDIPEDTTYKKWFRKYLDSTFTKRDPRGEYEEHLGILGPIIRA 1650 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I
1601 TDIEDSDDIPEDTTYKKWFRKYLDSTFTKRDPRGEYEEHLGILGPIIRA 1650
1651 EVDDVIQVRFKNLASRPYSLHAHGLSYEKSSEGKTYEDDSPEWFKEDNAV 1700 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1651 EVDDVIQVRFKNLASRPYSLHAHGLSYEKSSEGKTYEDDSPE FKEDNAV 1700 1701 QPNSSYTYV HATERSGPESPGSACRAWAYYSAVNPEKDIHSGLIGPLLI 1750 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1701 QPNSSYTYV HATERSGPESPGSACRA AYYSAVNPEKDIHSGLIGPLLI 1750 . . . . .
1751 CQKGILHKDSNMPVDMREFVLLFMTFDEKKS YYEKKSRSS RLTSSEMK 1800 I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1751 CQKGILHKDSNMPVDMREFVLLFMTFDEKKSWYYEKKSRSS RLTSSEMK 1800
1801 KSHEFHAINGMIYSLPGLKMYEQE VRLHLLNIGGSQDIHVVHFHGQTLL 1850 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I
1801 KSHEFHAINGMIYSLPGLKMYEQEWVRLHLLNIGGSQDIHVVHFHGQTLL 1850
1851 ENGNKQHQLGV PLLPGSFKTLEMKASKPG WLLNTEVGENQRAGMQTPF 1900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
1851 ENGNKQHQLGVWPLLPGSFKTLEMKASKPG WLLNTEVGENQRAGMQTPF 1900
1901 LIMDRDCRMPMGLSTGIISDSQIKASEFLGY EPRLARLNNGGSYNAWSV 1950 I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I 1901 LIMDRDCRMPMGLSTGIISDSQIKASEFLGY EPRLARLNNGGSYNA SV 1950
1951 EKLAAEFASKPWIQVDMQKEVIITGIQTQGAKHYLKSCYTTEFYVAYSSN 2000 I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 1951 EKLAAEFASKPWIQVDMQKEVIITGIQTQGAKHYLKSCYTTEFYVAYSSN 2000
2001 QINWQIFKGNSTRNVMYFNGNSDASTIKENQFDPPIVARYIRISPTRAYN 2050 I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 2001 QINWQIFKGNSTRNVMYFNGNSDASTIKENQFDPPIVARYIRISPTRAYN 2050
2051 RPTLRLELQGCE 2062 2051 RPTLRLELQGCE 2062
Sequence name: FA5_HUMAN
Sequence documentation:
Alignment of: HUMF5A_PEA_1_P8 x FA5_HUMAN
Alignment segment 1/1:
Quality: 5863.00 Escore: 0 Matching length: 588 Total length: 588 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.83 Total Percent Similarity: 100.00 Total Percent Identity: 99.83 Gaps :
Alignment:
  1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPT 50

 51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKV 100

101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 HFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTY 150

151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 EWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTE 200

201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 GGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAH 250

251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 DHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTV 300

301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 GPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRW 350

351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYE 400

401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 DESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGV 450

451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC 500

501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 LTRPYYSDVDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAV 550

551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSS 588
    |||||||||||||||||||||||||||||||||||||:
551 FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMST 588
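The identity figure reported in the alignment header can be reproduced from the alignment itself. The following is a minimal sketch, assuming an ungapped alignment of 588 columns (matching length equals total length in the header) in which only the final column differs (S versus T); the function name `percent_identity` is illustrative and is not part of the application, and the W residues that the extraction dropped from the last row are restored here as an assumption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of columns with identical residues in an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Last alignment row (positions 551-588); W at position 557 restored by assumption.
variant_row = "FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMSS"
known_row = "FDENKSWYLEDNINKFCENPDEVKRDDPKFYESNIMST"

# Count the mismatching columns in this row (the S/T pair at position 588).
mismatches_in_row = sum(x != y for x, y in zip(variant_row, known_row))

# Whole alignment: 588 columns, all other rows identical.
TOTAL_COLUMNS = 588
total_identity = round(100.0 * (TOTAL_COLUMNS - mismatches_in_row) / TOTAL_COLUMNS, 2)
print(total_identity)
```

Under these assumptions the computed value agrees with the "Total Percent Identity" of 99.83 stated in the header; the similarity of 100.00 reflects that the single S/T difference is scored as a conservative substitution.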
PBGD-amplicon, SEQ ID NO: 531; HPRT1-amplicon, SEQ ID NO: 612; HPRT1-amplicon, SEQ ID NO: 615; RPS27A amplicon, SEQ ID NO: 1261
DESCRIPTION FOR CLUSTER HUMANK
Cluster HUMANK features 8 transcript(s) and 22 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000551_0001
Figure imgf000552_0001
These sequences are variants of the known protein Ankyrin 1 (SwissProt accession identifier ANK1_HUMAN; known also according to the synonyms Erythrocyte ankyrin; Ankyrin R), SEQ ID NO: 628, referred to herein as the previously known protein. Protein Ankyrin 1 is known or believed to have the following function(s): Attach integral membrane proteins to cytoskeletal elements; bind to the erythrocyte membrane protein band 4.2, to Na-K ATPase, to the lymphocyte membrane protein GP85, and to the cytoskeletal proteins fodrin, tubulin, vimentin and desmin. Erythrocyte ankyrins also link spectrin (beta chain) to the cytoplasmic domain of the erythrocytes anion exchange protein; they retain most or all of these binding functions. The sequence for protein Ankyrin 1 is given at the end of the application, as "Ankyrin 1 amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Figure imgf000553_0001
Protein Ankyrin 1 localization is believed to be the cytoplasmic surface of the erythrocytic plasma membrane. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: exocytosis; cytoskeleton organization and biogenesis; signal transduction, which are annotation(s) related to Biological Process; structural protein; structural protein of cytoskeleton; cytoskeletal adaptor, which are annotation(s) related to Molecular Function; and cytoskeleton; plasma membrane; actin cytoskeleton; basolateral plasma membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMANK can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
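The "parts per million" figure described above is a simple ratio, which may be sketched as follows. This is an illustration only: the function name and the EST counts are hypothetical, and any library-level weighting applied by the previously described methods is not reproduced here.

```python
def expression_ppm(cluster_ests: int, category_ests: int) -> float:
    """ESTs of one cluster in a tissue category, expressed per million ESTs of that category."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster EST count must lie between 0 and the category total")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts, chosen only to illustrate the arithmetic.
normal_ppm = expression_ppm(cluster_ests=3, category_ests=400_000)
tumor_ppm = expression_ppm(cluster_ests=12, category_ests=400_000)

# A ratio above 1 would indicate higher EST representation in the tumor category.
print(normal_ppm, tumor_ppm, tumor_ppm / normal_ppm)
```

Normalizing by the category total in this way lets tissue categories with very different EST library sizes be compared on the same scale.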
Overall, the following results were obtained as shown with regard to the histograms in Figure 23 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors. Table 5 - Normal tissue distribution
Figure imgf000554_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000554_0002
Figure imgf000555_0001
As noted above, cluster HUMANK features 8 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ankyrin 1. A description of each variant protein according to the present invention is now provided.
Variant protein HUMANK_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T13. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMANK_P12 and AAH07930 (SEQ ID NO: 631):
1. An isolated chimeric polypeptide encoding for HUMANK_P12, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 1 - 123 of AAH07930, which also corresponds to amino acids 1 - 123 of HUMANK_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP corresponding to amino acids 124 - 156 of HUMANK_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP in HUMANK_P12.
Comparison report between HUMANK_P12 and ANK1_HUMAN_V1 (SEQ ID NO: 629):
1. An isolated chimeric polypeptide encoding for HUMANK_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLK corresponding to amino acids 1 - 73 of HUMANK_P12, and a second amino acid sequence being at least 90% homologous to GNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEEVTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP corresponding to amino acids 1799 - 1881 of ANK1_HUMAN_V1, which also corresponds to amino acids 74 - 156 of HUMANK_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HUMANK_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLK of HUMANK_P12.
It should be noted that the known protein sequence (ANK1_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for ANK1_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to ANK1_HUMAN_V1
Figure imgf000557_0001
Comparison report between HUMANK_P12 and Q8N604 (SEQ ID NO: 630):
1. An isolated chimeric polypeptide encoding for HUMANK_P12, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P12, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P12, a second amino acid sequence being at least 90% homologous to LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEEV corresponding to amino acids 54 - 124 of Q8N604, which also corresponds to amino acids 54 - 124 of HUMANK_P12, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP corresponding to amino acids 125 - 156 of HUMANK_P12, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP in HUMANK_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
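The three comparison reports for HUMANK_P12 describe the same 156-residue protein in terms of different segment boundaries. The sketch below concatenates the segments quoted in each report and checks that all three assemblies agree. It is an illustration only: the variable names are not part of the application, line-wrap spaces inside the quoted sequences have been removed, and garbled residues are read as in the matching segments of the other reports (an assumption).

```python
# Segments as quoted in the comparison report against AAH07930.
AAH07930_FIRST = (  # amino acids 1-123 of HUMANK_P12
    "MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEE"
    "TISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQ"
    "EHEE"
)
AAH07930_TAIL = "VTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP"  # amino acids 124-156

# Segments as quoted in the comparison report against ANK1_HUMAN_V1.
ANK1_HEAD = (  # amino acids 1-73
    "MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEE"
    "TISTRVVRRRVFLK"
)
ANK1_SECOND = (  # amino acids 74-156
    "GNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEEVTVEGPLEDP"
    "SELEVDIDYFMKHSKDHTSTPNP"
)

# Segments as quoted in the comparison report against Q8N604.
Q8N604_FIRST = "MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE"  # 1-52
BRIDGE = "G"  # amino acid 53
Q8N604_SECOND = (  # amino acids 54-124
    "LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSS"
    "ADAAQEHEEV"
)
Q8N604_TAIL = "TVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP"  # amino acids 125-156


def assemble(*segments: str) -> str:
    """Join segments that are 'contiguous and in a sequential order'."""
    return "".join(segments)


via_aah07930 = assemble(AAH07930_FIRST, AAH07930_TAIL)
via_ank1 = assemble(ANK1_HEAD, ANK1_SECOND)
via_q8n604 = assemble(Q8N604_FIRST, BRIDGE, Q8N604_SECOND, Q8N604_TAIL)

all_agree = via_aah07930 == via_ank1 == via_q8n604
print(len(via_aah07930), all_agree)
```

A check of this kind is a convenient way to catch transcription errors in the quoted segments, since any dropped or substituted residue changes either the total length or the equality of the three assemblies.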
Variant protein HUMANK_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf000558_0001
Variant protein HUMANK_P12 is encoded by the following transcript(s): HUMANK_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T13 is shown in bold; this coding portion starts at position 2053 and ends at position 2520. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000559_0001
Variant protein HUMANK_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T26. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMANK_P21 and AAH07930 (SEQ ID NO: 631):
1. An isolated chimeric polypeptide encoding for HUMANK_P21, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 1 - 123 of AAH07930, which also corresponds to amino acids 1 - 123 of HUMANK_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTVEGPLEDPSELEVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 124 - 169 of HUMANK_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTVEGPLEDPSELEVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ in HUMANK_P21.
Comparison report between HUMANK_P21 and Q8N604:
1. An isolated chimeric polypeptide encoding for HUMANK_P21, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P21, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P21, a second amino acid sequence being at least 90% homologous to LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHE corresponding to amino acids 54 - 122 of Q8N604, which also corresponds to amino acids 54 - 122 of HUMANK_P21, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EVTVEGPLEDPSEL corresponding to amino acids 123 - 136 of HUMANK_P21, and a fourth amino acid sequence being at least 90% homologous to EVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 123 - 155 of Q8N604, which also corresponds to amino acids 137 - 169 of HUMANK_P21, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMANK_P21, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for EVTVEGPLEDPSEL, corresponding to HUMANK_P21.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMANK_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf000562_0001
Variant protein HUMANK_P21 is encoded by the following transcript(s): HUMANK_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T26 is shown in bold; this coding portion starts at position 2053 and ends at position 2559. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf000562_0002
Figure imgf000563_0001
Variant protein HUMANK_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T27. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMANK_P22 and AAH07930:
1. An isolated chimeric polypeptide encoding for HUMANK_P22, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 1 - 123 of AAH07930, which also corresponds to amino acids 1 - 123 of HUMANK_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTVEGPLEDPSELEVDIDYFMKHSKVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 124 - 180 of HUMANK_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTVEGPLEDPSELEVDIDYFMKHSKVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ in HUMANK_P22.
Comparison report between HUMANK_P22 and Q8N604:
1. An isolated chimeric polypeptide encoding for HUMANK_P22, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P22, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P22, a second amino acid sequence being at least 90% homologous to LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 54 - 123 of Q8N604, which also corresponds to amino acids 54 - 123 of HUMANK_P22, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTVEGPLEDPSELEVDIDYFMKHSK corresponding to amino acids 124 - 148 of HUMANK_P22, and a fourth amino acid sequence being at least 90% homologous to VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 124 - 155 of Q8N604, which also corresponds to amino acids 149 - 180 of HUMANK_P22, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMANK_P22, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for VTVEGPLEDPSELEVDIDYFMKHSK, corresponding to HUMANK_P22.
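In the Q8N604 comparison reports for HUMANK_P21 and HUMANK_P22, the edge portion lies between two segments shared with Q8N604, so positions of Q8N604 downstream of the edge are shifted in the variant by the length of the edge. The sketch below illustrates that coordinate mapping, treating the edge portion as an insertion relative to Q8N604 (an interpretive assumption); the function name and parameters are hypothetical and are not part of the application.

```python
def known_to_variant(pos: int, insert_after: int, insert_len: int) -> int:
    """Map a 1-based position in the known protein to the variant protein.

    insert_after: last known-protein position before the variant-specific edge.
    insert_len:   number of residues in the variant-specific edge portion.
    """
    if pos < 1:
        raise ValueError("positions are 1-based")
    return pos if pos <= insert_after else pos + insert_len


# Edge portions as quoted in the comparison reports.
P21_EDGE = "EVTVEGPLEDPSEL"             # amino acids 123-136 of HUMANK_P21
P22_EDGE = "VTVEGPLEDPSELEVDIDYFMKHSK"  # amino acids 124-148 of HUMANK_P22

# HUMANK_P21: edge follows Q8N604 position 122; Q8N604 123-155 maps to 137-169.
p21_range = (
    known_to_variant(123, insert_after=122, insert_len=len(P21_EDGE)),
    known_to_variant(155, insert_after=122, insert_len=len(P21_EDGE)),
)

# HUMANK_P22: edge follows Q8N604 position 123; Q8N604 124-155 maps to 149-180.
p22_range = (
    known_to_variant(124, insert_after=123, insert_len=len(P22_EDGE)),
    known_to_variant(155, insert_after=123, insert_len=len(P22_EDGE)),
)

print(p21_range, p22_range)
```

Positions at or before the edge map to themselves, which corresponds to the statements that amino acids 1 - 52 and 54 - 122 (or 54 - 123) of Q8N604 carry the same numbering in the variant.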
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMANK_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Figure imgf000565_0001
Variant protein HUMANK_P22 is encoded by the following transcript(s): HUMANK_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T27 is shown in bold; this coding portion starts at position 2053 and ends at position 2592. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf000566_0001
Figure imgf000567_0001
Variant protein HUMANK_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T28. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMANK_P23 and AAH07930:
1. An isolated chimeric polypeptide encoding for HUMANK_P23, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 1 - 123 of AAH07930, which also corresponds to amino acids 1 - 123 of HUMANK_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTVEGPLEDPSELEDHTSTPNP corresponding to amino acids 124 - 145 of HUMANK_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTVEGPLEDPSELEDHTSTPNP in HUMANK_P23.
Comparison report between HUMANK_P23 and Q8N604:
1. An isolated chimeric polypeptide encoding for HUMANK_P23, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P23, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P23, a second amino acid sequence being at least 90% homologous to LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEEV corresponding to amino acids 54 - 124 of Q8N604, which also corresponds to amino acids 54 - 124 of HUMANK_P23, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TVEGPLEDPSELEDHTSTPNP corresponding to amino acids 125 - 145 of HUMANK_P23, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMANK_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TVEGPLEDPSELEDHTSTPNP in HUMANK_P23.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
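The reasoning stated for the "secreted" localization call can be written as a small decision rule. The sketch below is illustrative only: it takes the yes/no outputs of the two signal-peptide predictors and the two trans-membrane predictors as booleans and does not invoke SignalP or any other program. The "undetermined" outcome for all other combinations is an assumption, since the text states only the rule leading to "secreted".

```python
from typing import Sequence


def predicted_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Return 'secreted' when every signal-peptide predictor is positive and
    no trans-membrane predictor is positive; otherwise 'undetermined'."""
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: the handling of other combinations is not given in the text.
    return "undetermined"


# Case described for the HUMANK variants: both signal-peptide programs positive,
# neither trans-membrane program positive.
call = predicted_localization([True, True], [False, False])
print(call)
```

Requiring agreement between independent predictors, as described above, reduces the chance that a single program's false positive determines the call.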
Variant protein HUMANK_P23 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
Figure imgf000569_0001
Variant protein HUMANK_P23 is encoded by the following transcript(s): HUMANK_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T28 is shown in bold; this coding portion starts at position 2053 and ends at position 2487. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Figure imgf000569_0002
Figure imgf000570_0001
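The coding-portion coordinates stated for the HUMANK variants in this section can be cross-checked against the protein lengths given in the comparison reports. The sketch below is illustrative only: the coordinates and lengths are copied from the text, the helper name codon_count is hypothetical, and it assumes the positions are 1-based and inclusive.

```python
# (transcript, protein, coding start, coding end, protein length in amino acids)
# Values are copied from this section; positions are ASSUMED 1-based, inclusive.
RECORDS = [
    ("HUMANK_T28", "HUMANK_P23", 2053, 2487, 145),
    ("HUMANK_T35", "HUMANK_P27", 2053, 2418, 122),
    ("HUMANK_T3", "HUMANK_P29", 2053, 2517, 155),
    ("HUMANK_T23", "HUMANK_P33", 2053, 2379, 109),
    ("HUMANK_T24", "HUMANK_P34", 2053, 2451, 133),
]


def codon_count(start, end):
    """Return (whole codons, leftover nucleotides) for an inclusive coding span."""
    return divmod(end - start + 1, 3)


for transcript, protein, start, end, aa_len in RECORDS:
    codons, leftover = codon_count(start, end)
    status = "consistent" if (codons, leftover) == (aa_len, 0) else "check"
    print(f"{transcript} -> {protein}: {end - start + 1} nt, "
          f"{codons} codons, {aa_len} aa: {status}")
```

Under the stated assumption, each span is exactly three times the protein length, which suggests that the stated coding portion does not count a stop codon; this is an inference from the arithmetic rather than something the application states.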
Variant protein HUMANK_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T35. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMANK_P27 and AAH07930: 1. An isolated chimeric polypeptide encoding for HUMANK_P27, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 1 - 101 of AAH07930, which also corresponds to amino acids 1 - 101 of HUMANK_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGAECSPLCWGEAGGLEAKRW corresponding to amino acids 102 - 122 of HUMANK_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGAECSPLCWGEAGGLEAKRW in HUMANK_P27.
Comparison report between HUMANK_P27 and Q8N604: 1. An isolated chimeric polypeptide encoding for HUMANK_P27, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P27, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P27, a second amino acid sequence being at least 90% homologous to LSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 54 - 101 of Q8N604, which also corresponds to amino acids 54 - 101 of HUMANK_P27, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VGAECSPLCWGEAGGLEAKRW corresponding to amino acids 102 - 122 of HUMANK_P27, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VGAECSPLCWGEAGGLEAKRW in HUMANK_P27. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMANK_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Amino acid mutations
Figure imgf000572_0001
Variant protein HUMANK_P27 is encoded by the following transcript(s): HUMANK_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T35 is shown in bold; this coding portion starts at position 2053 and ends at position 2418. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Nucleic acid SNPs
Figure imgf000573_0001
Variant protein HUMANK_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T3. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMANK_P29 and AAH07930: 1. An isolated chimeric polypeptide encoding for HUMANK_P29, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETIS corresponding to amino acids 1 - 62 of AAH07930, which also corresponds to amino acids 1 - 62 of HUMANK_P29, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P29, a second amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEE corresponding to amino acids 64 - 123 of AAH07930, which also corresponds to amino acids 64 - 123 of HUMANK_P29, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 124 - 155 of HUMANK_P29, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P29, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ in HUMANK_P29.
Comparison report between HUMANK_P29 and Q8N604: 1. An isolated chimeric polypeptide encoding for HUMANK_P29, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P29, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P29, a second amino acid sequence being at least 90% homologous to LSDDEETIS corresponding to amino acids 54 - 62 of Q8N604, which also corresponds to amino acids 54 - 62 of HUMANK_P29, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P29, and a third amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHEEVELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 64 - 155 of Q8N604, which also corresponds to amino acids 64 - 155 of HUMANK_P29, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMANK_P29 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 18 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Amino acid mutations
Figure imgf000575_0001
Variant protein HUMANK_P29 is encoded by the following transcript(s): HUMANK_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T3 is shown in bold; this coding portion starts at position 2053 and ends at position 2517. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Figure imgf000575_0002
Figure imgf000576_0001
Variant protein HUMANK_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T23. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMANK_P33 and AAH07930: 1. An isolated chimeric polypeptide encoding for HUMANK_P33, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETIS corresponding to amino acids 1 - 62 of AAH07930, which also corresponds to amino acids 1 - 62 of HUMANK_P33, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P33, a second amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 64 - 101 of AAH07930, which also corresponds to amino acids 64 - 101 of HUMANK_P33, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DHTSTPNP corresponding to amino acids 102 - 109 of HUMANK_P33, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DHTSTPNP in HUMANK_P33.
Comparison report between HUMANK_P33 and Q8N604: 1. An isolated chimeric polypeptide encoding for HUMANK_P33, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P33, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P33, a second amino acid sequence being at least 90% homologous to LSDDEETIS corresponding to amino acids 54 - 62 of Q8N604, which also corresponds to amino acids 54 - 62 of HUMANK_P33, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P33, a third amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 64 - 101 of Q8N604, which also corresponds to amino acids 64 - 101 of HUMANK_P33, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DHTSTPNP corresponding to amino acids 102 - 109 of HUMANK_P33, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DHTSTPNP in HUMANK_P33.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMANK_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
Figure imgf000579_0001
Variant protein HUMANK_P33 is encoded by the following transcript(s): HUMANK_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T23 is shown in bold; this coding portion starts at position 2053 and ends at position 2379. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Nucleic acid SNPs
Figure imgf000579_0002
Figure imgf000580_0001
Variant protein HUMANK_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMANK_T24. An alignment is given to the known protein (Ankyrin 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMANK_P34 and AAH07930: 1. An isolated chimeric polypeptide encoding for HUMANK_P34, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETIS corresponding to amino acids 1 - 62 of AAH07930, which also corresponds to amino acids 1 - 62 of HUMANK_P34, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P34, a second amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 64 - 101 of AAH07930, which also corresponds to amino acids 64 - 101 of HUMANK_P34, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 102 - 133 of HUMANK_P34, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMANK_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ in HUMANK_P34.
Comparison report between HUMANK_P34 and Q8N604: 1. An isolated chimeric polypeptide encoding for HUMANK_P34, comprising a first amino acid sequence being at least 90% homologous to MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESE corresponding to amino acids 1 - 52 of Q8N604, which also corresponds to amino acids 1 - 52 of HUMANK_P34, a bridging amino acid G corresponding to amino acid 53 of HUMANK_P34, a second amino acid sequence being at least 90% homologous to LSDDEETIS corresponding to amino acids 54 - 62 of Q8N604, which also corresponds to amino acids 54 - 62 of HUMANK_P34, a bridging amino acid P corresponding to amino acid 63 of HUMANK_P34, a third amino acid sequence being at least 90% homologous to RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK corresponding to amino acids 64 - 101 of Q8N604, which also corresponds to amino acids 64 - 101 of HUMANK_P34, and a fourth amino acid sequence being at least 90% homologous to VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ corresponding to amino acids 124 - 155 of Q8N604, which also corresponds to amino acids 102 - 133 of HUMANK_P34, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HUMANK_P34, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KV, having a structure as follows: a sequence starting from any of amino acid numbers 101-x to 101; and ending at any of amino acid numbers 102 + ((n-2) - x), in which x varies from 0 to n-2.
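The edge-portion definition above (start at 101 - x, end at 102 + ((n-2) - x), with x from 0 to n-2) describes every window of length n that spans the KV junction at amino acids 101 and 102 of HUMANK_P34. A minimal sketch of that enumeration follows; the HUMANK_P34 sequence is assembled from the segments recited in the comparison reports, and the function name edge_windows is a hypothetical helper, not part of the application.

```python
# HUMANK_P34 assembled from the segments recited in the comparison reports:
# residues 1-62, bridging P at 63, residues 64-101, and the tail 102-133.
P34 = (
    "MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGESEGLSDDEETIS"
    "P"
    "RVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTKK"
    "VELRGSGLQPDLIEGRKGAQIVKRASLKRGKQ"
)


def edge_windows(junction_left, n):
    """Return 1-based inclusive (start, end) pairs for all length-n windows
    that contain both junction_left and junction_left + 1."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = junction_left + 1 + (n - 2) - x
        windows.append((start, end))
    return windows


windows = edge_windows(101, 10)
for start, end in windows:
    # Convert the 1-based inclusive range to a Python slice.
    print(start, end, P34[start - 1:end])
```

Every window produced this way has length n and includes both junction residues, which is what the phrase "wherein at least two amino acids comprise KV" requires.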
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMANKJP34 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 22, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Amino acid mutations
Figure imgf000582_0001
Variant protein HUMANK_P34 is encoded by the following transcript(s): HUMANK_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMANK_T24 is shown in bold; this coding portion starts at position 2053 and ends at position 2451. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMANK_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Nucleic acid SNPs
Figure imgf000583_0001
Figure imgf000584_0001
As noted above, cluster HUMANK features 22 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMANK_node_91 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000584_0002
Segment cluster HUMANK_node_92 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000585_0001
Segment cluster HUMANK_node_93 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000585_0002
Figure imgf000586_0001
Segment cluster HUMANK_node_100 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T35. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000586_0002
Segment cluster HUMANK_node_108 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T24, HUMANK_T26 and HUMANK_T27. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000586_0003
Figure imgf000587_0001
Segment cluster HUMANK_node_113 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000587_0002
Segment cluster HUMANK_node_115 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000587_0003
Figure imgf000588_0001
Segment cluster HUMANK_node_117 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000588_0002
Segment cluster HUMANK_node_119 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf000589_0001
Segment cluster HUMANK_node_120 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf000589_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMANK_node_94 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf000590_0001
Segment cluster HUMANK_node_95 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf000591_0001
Segment cluster HUMANK_node_98 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000591_0002
Segment cluster HUMANK_node_99 according to the present invention can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27, HUMANK_T28 and HUMANK_T35. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000592_0001
Segment cluster HUMANK_node_102 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf000592_0002
Figure imgf000593_0001
Segment cluster HUMANK_node_103 according to the present invention can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf000593_0002
Segment cluster HUMANK_node_104 according to the present invention can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf000593_0003
Segment cluster HUMANK_node_105 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T13, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf000594_0001
Segment cluster HUMANK_node_106 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T13 and HUMANK_T27. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf000594_0002
Segment cluster HUMANK_node_112 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf000595_0001
Segment cluster HUMANK_node_114 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMANK_T3, HUMANK_T13, HUMANK_T23, HUMANK_T24, HUMANK_T26, HUMANK_T27 and HUMANK_T28. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf000595_0002
Segment cluster HUMANK_node_116 according to the present invention can be found in the following transcript(s): HUMANK_T3. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf000596_0001
Variant protein alignment to the previously known protein: Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P12 x AAH07930
Alignment segment 1/1: Quality: 1185.00 Escore: 0 Matching length: 123 Total length: 123 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEE 123
101 KIIRKVVRQIDLSSADAAQEHEE 123
Sequence name: ANK1_HUMAN_V1
Sequence documentation:
Alignment of: HUMANK_P12 x ANK1_HUMAN_V1
Alignment segment 1/1: Quality: 815.00
Escore: 0 Matching length: 84 Total length: 84 Matching Percent Similarity: 100.00 Matching Percent Identity: 98.81 Total Percent Similarity: 100.00 Total Percent Identity: 98.81 Gaps : 0
Alignment :
73 KGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHE 122
1798 QGNEFQNIPGEQVTEEQFTDEQGNIVTKKIIRKVVRQIDLSSADAAQEHE 1847
123 EVTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP 156
1848 EVTVEGPLEDPSELEVDIDYFMKHSKDHTSTPNP 1881
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P12 x Q8N604
Alignment segment 1/1: Quality: 1184.00
Escore: 0 Matching length: 128 Total length: 128 Matching Percent Similarity: 97.66 Matching Percent Identity: 96.88 Total Percent Similarity: 97.66 Total Percent Identity: 96.88 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEEVTVEG 128
101 KIIRKVVRQIDLSSADAAQEHEEVELRG 128
Sequence name: AAH07930 Sequence documentation:
Alignment of: HUMANK_P21 x AAH07930
Alignment segment 1/1
Quality: 1185.00 Escore: 0 Matching length: 123 Total length: 123 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEE 123
101 KIIRKVVRQIDLSSADAAQEHEE 123
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P21 x Q8N604
Alignment segment 1/1
Quality: 1370.00 Escore: 0 Matching length: 155 Total length: 169 Matching Percent Similarity: 99.35 Matching Percent Identity: 99.35 Total Percent Similarity: 91.12 Total Percent Identity: 91.12 Gaps : 1
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEEVTVEGPLEDPSELEVELRGSGLQPDLI 150
101 KIIRKVVRQIDLSSADAAQEHE..............EVELRGSGLQPDLI 136
151 EGRKGAQIVKRASLKRGKQ 169
137 EGRKGAQIVKRASLKRGKQ 155
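The alignment headers report each percentage twice, as a "Matching" value and a "Total" value. The figures for HUMANK_P21 x Q8N604 are consistent with the matching value being taken over the aligned columns only and the total value over the full alignment length including the gap. That reading is inferred from the numbers, not stated in the text, and the count of 154 identical positions below is likewise an inference (a single G/D mismatch at position 53). A minimal sketch of the arithmetic:

```python
def percent(identical: int, length: int) -> float:
    """Percentage of identical positions over a given length, to 2 decimals."""
    return round(100.0 * identical / length, 2)

# Header values for HUMANK_P21 x Q8N604.
matching_length = 155   # aligned columns, excluding the gap
total_length = 169      # full alignment length, including the 14-residue gap
identical = 154         # assumption: inferred from the alignment rows

matching_pct = percent(identical, matching_length)
total_pct = percent(identical, total_length)
print(matching_pct, total_pct)
```

Under the same assumption, the HUMANK_P22 x Q8N604 header (matching length 155, total length 180) gives 85.56 for the total percentage.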
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P22 x AAH07930
Alignment segment 1/1
Quality: 1185.00 Escore: 0 Matching length: 123 Total length: 123 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEE 123
101 KIIRKVVRQIDLSSADAAQEHEE 123
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P22 x Q8N604
Alignment segment 1/1:
Quality: 1370.00 Escore: 0 Matching length: 155 Total length: 180 Matching Percent Similarity: 99.35 Matching Percent Identity: 99.35 Total Percent Similarity: 85.56 Total Percent Identity: 85.56 Gaps : 1
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEEVTVEGPLEDPSELEVDIDYFMKHSKVE 150
101 KIIRKVVRQIDLSSADAAQEHEE.........................VE 125
151 LRGSGLQPDLIEGRKGAQIVKRASLKRGKQ 180
126 LRGSGLQPDLIEGRKGAQIVKRASLKRGKQ 155
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P23 x AAH07930
Alignment segment 1/1:
Quality: 1185.00 Escore: 0 Matching length: 123 Total length: 123 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEE 123
101 KIIRKVVRQIDLSSADAAQEHEE 123
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P23 x Q8N604
Alignment segment 1/1: Quality: 1184.00
Escore: 0 Matching length: 128 Total length: 128 Matching Percent Similarity: 97.66 Matching Percent Identity: 96.88 Total Percent Similarity: 97.66 Total Percent Identity: 96.88 Gaps : 0
Alignment:
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEEVTVEG 128
101 KIIRKVVRQIDLSSADAAQEHEEVELRG 128
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P27 x AAH07930
Alignment segment 1/1:
Quality: 984.00
Escore: 0 Matching length: 102 Total length: 102 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.02 Total Percent Similarity: 100.00 Total Percent Identity: 99.02 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KV 102
101 KI 102
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P27 x Q8N604
Alignment segment 1/1:
Quality: 971.00
Escore: 0 Matching length: 102 Total length: 102 Matching Percent Similarity: 99.02 Matching Percent Identity: 98.04 Total Percent Similarity: 99.02 Total Percent Identity: 98.04 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KV 102
101 KI 102
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P29 x AAH07930 Alignment segment 1/1:
Quality: 1172.00
Escore: 0 Matching length: 123 Total length: 123 Matching Percent Similarity: 99.19 Matching Percent Identity: 99.19 Total Percent Similarity: 99.19 Total Percent Identity: 99.19 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEE 123
101 KIIRKVVRQIDLSSADAAQEHEE 123
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P29 x Q8N604
Alignment segment 1/1:
Quality: 1457.00 Escore: 0 Matching length: 155 Total length: 155 Matching Percent Similarity: 98.71 Matching Percent Identity: 98.71 Total Percent Similarity: 98.71 Total Percent Identity: 98.71 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KIIRKVVRQIDLSSADAAQEHEEVELRGSGLQPDLIEGRKGAQIVKRASL 150
101 KIIRKVVRQIDLSSADAAQEHEEVELRGSGLQPDLIEGRKGAQIVKRASL 150
151 KRGKQ 155
151 KRGKQ 155
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P33 x AAH07930
Alignment segment 1/1
Quality: 967.00 Escore: 0 Matching length: 101 Total length: 101 Matching Percent Similarity: 99.01 Matching Percent Identity: 99.01 Total Percent Similarity: 99.01 Total Percent Identity: 99.01 Gaps : 0
Alignment:
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 K 101
101 K 101
Sequence name: Q8N604
Sequence documentation:
Alignment of: HUMANK_P33 x Q8N604
Alignment segment 1/1: Quality: 954.00
Escore: 0 Matching length: 101 Total length: 101 Matching Percent Similarity: 98.02 Matching Percent Identity: 98.02 Total Percent Similarity: 98.02 Total Percent Identity: 98.02 Gaps : 0
Alignment:
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 K 101
101 K 101
Sequence name: AAH07930
Sequence documentation:
Alignment of: HUMANK_P34 x AAH07930
Alignment segment 1/1: Quality: 971.00
Escore: 0 Matching length: 102 Total length: 102 Matching Percent Similarity: 99.02 Matching Percent Identity: 98.04 Total Percent Similarity: 99.02 Total Percent Identity: 98.04 Gaps : 0
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEGLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 KV 102
101 KI 102
Sequence name: Q8N604 Sequence documentation:
Alignment of: HUMANK_P34 x Q8N604
Alignment segment 1/1:
Quality: 1152.00
Escore: 0 Matching length: 133 Total length: 155 Matching Percent Similarity: 98.50 Matching Percent Identity: 98.50 Total Percent Similarity: 84.52 Total Percent Identity: 84.52 Gaps: 1
Alignment :
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
1 MWTFVTQLLVTLVLLSFFLVSCQNVMHIVRGSLCFVLKHIHQELDKELGE 50
51 SEGLSDDEETISPRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
51 SEDLSDDEETISTRVVRRRVFLKGNEFQNIPGEQVTEEQFTDEQGNIVTK 100
101 K......................VELRGSGLQPDLIEGRKGAQIVKRASL 128
101 KIIRKVVRQIDLSSADAAQEHEEVELRGSGLQPDLIEGRKGAQIVKRASL 150
129 KRGKQ 133
151 KRGKQ 155
DESCRIPTION FOR CLUSTER Z39819
Cluster Z39819 features 1 transcript(s) and 10 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000617_0001
Table 3 - Proteins of interest
Figure imgf000618_0001
These sequences are variants of the known protein GDNF family receptor alpha 2 precursor (SwissProt accession identifier GFR2_HUMAN; known also according to the synonyms GFR-alpha 2; Neurturin receptor alpha; NTNR-alpha; NRTNR-alpha; TGF-beta related neurotrophic factor receptor 2; GDNF receptor beta; GDNFR-beta; RET ligand 2), SEQ ID NO:632, referred to herein as the previously known protein. Protein GDNF family receptor alpha 2 precursor is known or believed to have the following function(s): Receptor for neurturin. Mediates the NRTN-induced autophosphorylation and activation of the RET receptor. Also able to mediate GDNF signaling through the RET tyrosine kinase receptor. The sequence for protein GDNF family receptor alpha 2 precursor is given at the end of the application, as "GDNF family receptor alpha 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Figure imgf000618_0002
Protein GDNF family receptor alpha 2 precursor localization is believed to be attached to the membrane by a GPI-anchor (By similarity). The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: transmembrane receptor protein tyrosine kinase signaling pathway, which are annotation(s) related to Biological Process; and receptor; glial cell line-derived neurotrophic factor receptor, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster Z39819 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein GDNF family receptor alpha 2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z39819_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z39819_PEA_1_T2. An alignment is given to the known protein (GDNF family receptor alpha 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between Z39819_PEA_1_P6 and GFR2_HUMAN:
1. An isolated chimeric polypeptide encoding for Z39819_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MILANVFCLFFFL corresponding to amino acids 1 - 13 of GFR2_HUMAN, which also corresponds to amino acids 1 - 13 of Z39819_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GPRAPRLAPPSGLCPGQ corresponding to amino acids 14 - 30 of Z39819_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z39819_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GPRAPRLAPPSGLCPGQ in Z39819_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
The glycosylation sites of variant protein Z39819_PEA_1_P6, as compared to the known protein GDNF family receptor alpha 2 precursor, are described in Table 5 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 5 - Glycosylation site(s)
Figure imgf000620_0001
Variant protein Z39819_PEA_1_P6 is encoded by the following transcript(s): Z39819_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z39819_PEA_1_T2 is shown in bold; this coding portion starts at position 715 and ends at position 804. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z39819_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Figure imgf000620_0002
Figure imgf000621_0001
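The stated coding portion of transcript Z39819_PEA_1_T2 (positions 715 to 804) can be checked against the stated length of variant protein Z39819_PEA_1_P6 (amino acids 1 - 13 plus 14 - 30). The sketch below assumes that the positions are 1-based and inclusive and that the stated end position excludes the stop codon; neither assumption is spelled out in the text, and the helper name is illustrative only.

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive positions."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3

# Head and tail peptides quoted in the comparison report for Z39819_PEA_1_P6.
head = "MILANVFCLFFFL"        # amino acids 1 - 13
tail = "GPRAPRLAPPSGLCPGQ"    # amino acids 14 - 30

codons = codon_count(715, 804)
protein_length = len(head + tail)
print(codons, protein_length)
```

Both values come out equal, which is consistent with the coding positions and the protein coordinates quoted for this variant.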
As noted above, cluster Z39819 features 10 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z39819_PEA_1_node_2 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 7 below describes the starting and ending position of this segment on each transcript. Table 7 - Segment location on transcripts
Figure imgf000621_0002
Segment cluster Z39819_PEA_1_node_6 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Figure imgf000622_0001
Segment cluster Z39819_PEA_1_node_10 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Figure imgf000622_0002
Segment cluster Z39819_PEA_1_node_14 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Figure imgf000622_0003
Segment cluster Z39819_PEA_1_node_16 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Figure imgf000623_0001
Segment cluster Z39819_PEA_1_node_21 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Figure imgf000623_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z39819_PEA_1_nodeJ according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Figure imgf000624_0001
Segment cluster Z39819_PEA_1_node_8 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Figure imgf000624_0002
Segment cluster Z39819_PEA_1_node_12 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000624_0003
Segment cluster Z39819_PEA_1_node_19 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z39819_PEA_1_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000625_0001
Variant protein alignment to the previously known protein: Sequence name: GFR2_HUMAN
Sequence documentation:
Alignment of: Z39819_PEA_1_P6 x GFR2_HUMAN Alignment segment 1/1:
Quality: 146.00 Escore: 0 Matching length: 26 Total length: 26 Matching Percent Similarity: 69.23 Matching Percent Identity: 69.23 Total Percent Similarity: 69.23 Total Percent Identity: 69.23 Gaps :
Alignment : 1 MILANVFCLFFFLGPRAPRLAPPSGL 26
1 MILANVFCLFFFLDETLRSLASPSSL 26
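The reported identity for this alignment can be reproduced directly from the two rows above. The sketch below counts identical positions in the ungapped 26-residue alignment; the function name is illustrative, and the similarity score (which depends on a substitution matrix not given here) is not recomputed.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, float]:
    """Return (identical positions, percent identity) for two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return matches, round(100.0 * matches / len(row_a), 2)

variant_row = "MILANVFCLFFFLGPRAPRLAPPSGL"   # Z39819_PEA_1_P6, positions 1 - 26
known_row = "MILANVFCLFFFLDETLRSLASPSSL"     # GFR2_HUMAN, positions 1 - 26

matches, identity = percent_identity(variant_row, known_row)
print(matches, identity)
```

The first 13 positions are identical, matching the shared head MILANVFCLFFFL described in the comparison report; the remaining matches fall within the variant-specific tail.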
DESCRIPTION FOR CLUSTER HUMCA1XIA
Cluster HUMCA1XIA features 4 transcript(s) and 46 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Figure imgf000626_0001
Figure imgf000627_0001
Figure imgf000628_0001
Table 3 - Proteins of interest
Figure imgf000628_0002
These sequences are variants of the known protein Collagen alpha 1 (SwissProt accession identifier CA1B_HUMAN; known also according to the synonyms XI), SEQ ID NO: 633, referred to herein as the previously known protein. Protein Collagen alpha 1 is known or believed to have the following function(s): May play an important role in fibrillogenesis by controlling lateral growth of collagen II fibrils. The sequence for protein Collagen alpha 1 is given at the end of the application, as "Collagen alpha 1 amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Figure imgf000628_0003
Figure imgf000629_0001
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cartilage condensation; vision; hearing; cell-cell adhesion; extracellular matrix organization and biogenesis, which are annotation(s) related to Biological Process; extracellular matrix structural protein; extracellular matrix protein, adhesive, which are annotation(s) related to Molecular Function; and extracellular matrix; collagen; collagen type XI, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMCA1XIA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
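The "parts per million" scale described above can be illustrated with a small calculation. This is a sketch only: the weighting scheme applied to the EST counts is described elsewhere in the application and is not reproduced here, and the counts used below are hypothetical values chosen for illustration, not data from the tables.

```python
def ests_ppm(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster in a tissue category, in parts per million.

    cluster_ests:  (weighted) EST count for the cluster in the category
    category_ests: (weighted) EST count for all clusters in the category
    """
    if category_ests <= 0:
        raise ValueError("category EST count must be positive")
    return 1_000_000 * cluster_ests / category_ests

# Hypothetical counts, for illustration only.
example_ppm = ests_ppm(cluster_ests=12, category_ests=400_000)
print(example_ppm)
```

Expressing each category on the same per-million scale is what allows normal and cancerous tissue categories with very different library sizes to be compared on one axis.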
Overall, the following results were obtained as shown with regard to the histograms in Figure 24 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors. Table 5 - Normal tissue distribution
Figure imgf000630_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000630_0002
Figure imgf000631_0001
As noted above, cluster HUMCA1XIA features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Collagen alpha 1. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCA1XIA_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T16. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P14 and CA1B_HUMAN_V5 (SEQ ID NO 634):
1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P14, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG PIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPPGPQGPIGYPGPRGVK GADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGEVGQIGPRGEDGPEGPKGRAGPT GDPGPSGQAGEKGKLGVPGLPGYPGRQGPKGSTGFPGFPGANGEKGARGVAGKPGPR GQRGPTGPRGSRGARGPTGKPGPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGP PGKDGLPGHPGQRGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQG LPGAAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQGPPGP V corresponding to amino acids 1 - 1056 of CA1B_HUMAN_V5, which also corresponds to amino acids 1 - 1056 of HUMCA1XIA_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSMMIINSQTIMVVNYSSSFITLML corresponding to amino acids 1057 - 1081 of HUMCA1XIA_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMIINSQTIMVVNYSSSFITLML in HUMCA1XIA_P14.
It should be noted that the known protein sequence (CA1B_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for CA1B_HUMAN_V5. These changes were previously known to occur and are listed in the table below. Table 7 - Changes to CA1B_HUMAN_V5
Figure imgf000632_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Figure imgf000633_0001
Variant protein HUMCA1XIA_P14 is encoded by the following transcript(s): HUMCA1XIA_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T16 is shown in bold; this coding portion starts at position 319 and ends at position 3561. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Figure imgf000634_0001
Variant protein HUMCA1XIA_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T17. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P15 and CA1B_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P15, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAG PRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQG PIGPPGEK corresponding to amino acids 1 - 714 of CA1B_HUMAN, which also corresponds to amino acids 1 - 714 of HUMCA1XIA_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCCNLSFGILIPLQK corresponding to amino acids 715 - 729 of HUMCA1XIA_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK in HUMCA1XIA_P15.
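The percentage thresholds recited for the tail sequence can be illustrated with a short sketch. This section does not restate how homology is measured, so the example below assumes the simplest possible reading: an ungapped, position-by-position identity fraction between two equal-length sequences. The function names and the single-substitution candidate are hypothetical and serve only as illustration.

```python
# Tail of HUMCA1XIA_P15 (amino acids 715 - 729), as given in the text.
TAIL_P15 = "MCCNLSFGILIPLQK"

# Thresholds recited in the text, in percent.
THRESHOLDS = (70, 80, 85, 90, 95)


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent.

    Assumption: plain positional identity stands in for "homologous";
    the specification's own measure may differ.
    """
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def thresholds_met(candidate: str, reference: str = TAIL_P15) -> list:
    """Return the recited thresholds that the candidate satisfies."""
    identity = percent_identity(candidate, reference)
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical candidate with one substitution at the last position.
    candidate = "MCCNLSFGILIPLQR"
    print(round(percent_identity(candidate, TAIL_P15), 2))
    print(thresholds_met(candidate))
```

With a 15-residue tail, a single substitution leaves 14 of 15 positions identical, so under this simplified measure the candidate would satisfy every threshold except the 95% one.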
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf000636_0001
The glycosylation sites of variant protein HUMCA1XIA_P15, as compared to the known protein Collagen alpha 1, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 11 - Glycosylation site(s)
Figure imgf000636_0002
Variant protein HUMCA1XIA_P15 is encoded by the following transcript(s): HUMCA1XIA_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T17 is shown in bold; this coding portion starts at position 319 and ends at position 2505. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf000636_0003
Figure imgf000637_0001
Variant protein HUMCA1XIA_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T19. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P16 and CA1B_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P16, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQT EANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSED TLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSIN GHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPG RPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPM GLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMP GEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA corresponding to amino acids 1 - 648 of CA1B_HUMAN, which also corresponds to amino acids 1 - 648 of HUMCA1XIA_P16, a second amino acid sequence being at least 90% homologous to GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEK corresponding to amino acids 667 - 714 of CA1B_HUMAN, which also corresponds to amino acids 649 - 696 of HUMCA1XIA_P16, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE corresponding to amino acids 697 - 738 of HUMCA1XIA_P16, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649 + ((n-2) - x), in which x varies from 0 to n-2.
3. An isolated polypeptide encoding for a tail of HUMCA1XIA_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE in HUMCA1XIA_P16.
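The edge-portion definition for this variant is a small piece of coordinate arithmetic: for a chosen length n, the window starts at position 648 - x and ends at position 649 + ((n-2) - x), with x running from 0 to n-2. The sketch below enumerates those windows; it assumes 1-based, inclusive amino acid positions, and the function name is hypothetical.

```python
def edge_windows(n: int, junction: int = 648) -> list:
    """Enumerate (start, end) windows of length n that span the junction.

    Follows the formula in the text: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), for x = 0 .. n-2.
    Assumption: positions are 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("a window must cover both junction residues")
    windows = []
    for x in range(n - 1):  # x takes the values 0, 1, ..., n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end, end - start + 1)
```

Every window produced this way has length n and contains both position 648 and position 649, which is where the two residues AG named in the text fall. For n = 10 there are nine windows, running from 648 - 657 down to 640 - 649.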
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf000639_0001
The glycosylation sites of variant protein HUMCA1XIA_P16, as compared to the known protein Collagen alpha 1, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 14 - Glycosylation site(s)
Figure imgf000639_0002
Variant protein HUMCA1XIA_P16 is encoded by the following transcript(s): HUMCA1XIA_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T19 is shown in bold; this coding portion starts at position 319 and ends at position 2532. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf000639_0003
Figure imgf000640_0001
Variant protein HUMCA1XIA_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T20. An alignment is given to the known protein (Collagen alpha 1) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMCA1XIA_P17 and CA1B_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P17, comprising a first amino acid sequence being at least 90% homologous to MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTT GFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIY NEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTM IVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEH YSPDCDSSAPKAAQAQEPQIDE corresponding to amino acids 1 - 260 of CA1B_HUMAN, which also corresponds to amino acids 1 - 260 of HUMCA1XIA_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRSTRPEKVFVFQ corresponding to amino acids 261 - 273 of HUMCA1XIA_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEKVFVFQ in HUMCA1XIA_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCA1XIA_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Amino acid mutations
Figure imgf000641_0001
The glycosylation sites of variant protein HUMCA1XIA_P17, as compared to the known protein Collagen alpha 1, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 17 - Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein?
Figure imgf000642_0001
Variant protein HUMCA1XIA_P17 is encoded by the following transcript(s): HUMCA1XIA_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T20 is shown in bold; this coding portion starts at position 319 and ends at position 1137. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf000642_0002
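The coding-portion coordinates stated for transcripts HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20 can be cross-checked against the amino acid ranges given in the comparison reports (ending at 729, 738 and 273 respectively). The sketch below is a consistency check only. It assumes the positions are 1-based and inclusive, and that the stated interval holds exactly one codon per residue; both are readings inferred from the arithmetic, not statements made in the text. The names are hypothetical.

```python
# Coding-portion coordinates as stated in the text (assumed 1-based, inclusive).
CODING_PORTION = {
    "HUMCA1XIA_T17": (319, 2505),  # encodes HUMCA1XIA_P15
    "HUMCA1XIA_T19": (319, 2532),  # encodes HUMCA1XIA_P16
    "HUMCA1XIA_T20": (319, 1137),  # encodes HUMCA1XIA_P17
}

# Last amino acid position named in each comparison report.
STATED_PROTEIN_LENGTH = {
    "HUMCA1XIA_T17": 729,
    "HUMCA1XIA_T19": 738,
    "HUMCA1XIA_T20": 273,
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide interval."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"interval of {length} nt is not a whole number of codons")
    return length // 3


def check_lengths() -> dict:
    """Map each transcript to (codons, stated length, agreement flag)."""
    report = {}
    for transcript, (start, end) in CODING_PORTION.items():
        codons = codon_count(start, end)
        stated = STATED_PROTEIN_LENGTH[transcript]
        report[transcript] = (codons, stated, codons == stated)
    return report


if __name__ == "__main__":
    for transcript, row in check_lengths().items():
        print(transcript, row)
```

Under these assumptions the three intervals span 2187, 2214 and 819 nucleotides, which correspond to 729, 738 and 273 codons and agree with the stated protein lengths.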
2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCA1XIA_node_0 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf000643_0001
Segment cluster HUMCA1XIA_node_2 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf000643_0002
Segment cluster HUMCA1XIA_node_4 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf000644_0001
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to colon cancer), shown in Table 22.
Table 22 - Oligonucleotides related to this segment
Figure imgf000644_0002
Segment cluster HUMCA1XIA_node_6 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Figure imgf000644_0003
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 24.
Table 24 - Oligonucleotides related to this segment
Figure imgf000645_0001
Segment cluster HUMCA1XIA_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17, HUMCA1XIA_T19 and HUMCA1XIA_T20. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf000645_0002
Segment cluster HUMCA1XIA_node_9 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T20. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf000646_0001
Segment cluster HUMCA1XIA_node_18 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf000646_0002
Segment cluster HUMCA1XIA_node_54 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T19. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf000646_0003
Segment cluster HUMCA1XIA_node_55 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf000647_0001
Segment cluster HUMCA1XIA_node_92 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf000647_0002
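The segment descriptions above list, for each segment, the transcripts that contain it. When working with these listings it can be convenient to invert the relation and ask which segments a given transcript contains. The sketch below does this for a handful of the segments named above; the dictionary holds only memberships stated in the text, start and end positions are omitted because the location tables are not reproduced here, and the function name is hypothetical.

```python
from collections import defaultdict

# Segment -> transcripts, copied from the segment descriptions in the text.
SEGMENT_TRANSCRIPTS = {
    "HUMCA1XIA_node_6": ["HUMCA1XIA_T16", "HUMCA1XIA_T17",
                         "HUMCA1XIA_T19", "HUMCA1XIA_T20"],
    "HUMCA1XIA_node_8": ["HUMCA1XIA_T16", "HUMCA1XIA_T17",
                         "HUMCA1XIA_T19", "HUMCA1XIA_T20"],
    "HUMCA1XIA_node_9": ["HUMCA1XIA_T20"],
    "HUMCA1XIA_node_55": ["HUMCA1XIA_T17", "HUMCA1XIA_T19"],
    "HUMCA1XIA_node_92": ["HUMCA1XIA_T16"],
}


def segments_by_transcript(mapping: dict) -> dict:
    """Invert a segment -> transcripts mapping into transcript -> segments."""
    inverted = defaultdict(list)
    for segment, transcripts in mapping.items():
        for transcript in transcripts:
            inverted[transcript].append(segment)
    return dict(inverted)


if __name__ == "__main__":
    for transcript, segments in sorted(segments_by_transcript(SEGMENT_TRANSCRIPTS).items()):
        print(transcript, segments)
```

Segments that appear in only one transcript, such as HUMCA1XIA_node_9 in HUMCA1XIA_T20 or HUMCA1XIA_node_92 in HUMCA1XIA_T16, stand out immediately in the inverted view.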
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMCA1XIA_node_11 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf000648_0001
Segment cluster HUMCA1XIA_node_15 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Figure imgf000648_0002
Segment cluster HUMCA1XIA_node_19 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Figure imgf000649_0001
Segment cluster HUMCA1XIA_node_21 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Figure imgf000649_0002
Segment cluster HUMCA1XIA_node_23 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Figure imgf000650_0001
Segment cluster HUMCA1XIA_node_25 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Figure imgf000650_0002
Segment cluster HUMCA1XIA_node_27 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Figure imgf000650_0003
Segment cluster HUMCA1XIA_node_29 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Figure imgf000651_0001
Segment cluster HUMCA1XIA_node_31 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Figure imgf000651_0002
Segment cluster HUMCA1XIA_node_33 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Figure imgf000652_0001
Segment cluster HUMCA1XIA_node_35 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf000652_0002
Segment cluster HUMCA1XIA_node_37 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf000653_0001
Segment cluster HUMCA1XIA_node_39 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf000653_0002
Segment cluster HUMCA1XIA_node_41 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf000653_0003
Figure imgf000654_0001
Segment cluster HUMCA1XIA_node_43 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf000654_0002
Segment cluster HUMCA1XIA_node_45 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 and HUMCA1XIA_T17. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf000654_0003
Segment cluster HUMCA1XIA_node_47 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf000655_0001
Segment cluster HUMCA1XIA_node_49 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf000655_0002
Segment cluster HUMCA1XIA_node_51 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16, HUMCA1XIA_T17 and HUMCA1XIA_T19. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf000656_0001
Segment cluster HUMCA1XIA_node_57 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf000656_0002
Segment cluster HUMCA1XIA_node_59 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf000656_0003
Segment cluster HUMCA1XIA_node_62 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf000657_0001
Segment cluster HUMCA1XIA_node_64 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf000657_0002
Segment cluster HUMCA1XIA_node_66 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf000657_0003
Segment cluster HUMCA1XIA_node_68 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf000658_0001
Segment cluster HUMCA1XIA_node_70 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf000658_0002
Segment cluster HUMCA1XIA_node_72 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf000658_0003
Segment cluster HUMCA1XIA_node_74 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf000659_0001
Segment cluster HUMCA1XIA_node_76 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf000659_0002
Segment cluster HUMCA1XIA_node_78 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf000659_0003
Figure imgf000660_0001
Segment cluster HUMCA1XIA_node_81 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf000660_0002
Segment cluster HUMCA1XIA_node_83 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf000660_0003
Segment cluster HUMCA1XIA_node_85 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf000661_0001
Segment cluster HUMCA1XIA_node_87 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Figure imgf000661_0002
Segment cluster HUMCA1XIA_node_89 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Figure imgf000661_0003
Segment cluster HUMCA1XIA_node_91 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Figure imgf000662_0001
Variant protein alignment to the previously known protein:
Sequence name: CA1B_HUMAN_V5
Sequence documentation:
Alignment of: HUMCA1XIA_P14 x CA1B_HUMAN_V5
Alignment segment 1/1:
Quality: 10456.00
Escore: 0
Matching length: 1058
Total length: 1058
Matching Percent Similarity: 99.91
Matching Percent Identity: 99.91
Total Percent Similarity: 99.91
Total Percent Identity: 99.91
Gaps: 0
Alignment:

   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50

  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100

 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150

 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200

 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250

 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300

 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350

 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400

 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450

 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500

 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550

 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600

 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650

 651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700

 701 LPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPP 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 701 LPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPP 750

 751 GPQGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGE 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 751 GPQGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGE 800

 801 VGQIGPRGEDGPEGPKGRAGPTGDPGPSGQAGEKGKLGVPGLPGYPGRQG 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 801 VGQIGPRGEDGPEGPKGRAGPTGDPGPSGQAGEKGKLGVPGLPGYPGRQG 850

 851 PKGSTGFPGFPGANGEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKP 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 851 PKGSTGFPGFPGANGEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKP 900

 901 GPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGPPGKDGLPGHPGQ 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 901 GPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGPPGKDGLPGHPGQ 950

 951 RGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPG 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 951 RGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPG 1000

1001 AAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQ 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
1001 AAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQ 1050

1051 GPPGPVVS 1058
     |||||| |
1051 GPPGPVGS 1058
Sequence name: CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P15 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 7073.00
Escore: 0
Matching length: 714
Total length: 714
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50

  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100

 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150

 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200

 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250

 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300

 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350

 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400

 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450

 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500

 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550

 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600

 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650

 651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700

 701 LPGPQGPIGPPGEK 714
     ||||||||||||||
 701 LPGPQGPIGPPGEK 714
Sequence name: CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P16 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 6795.00
Escore: 0
Matching length: 696
Total length: 714
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 97.48
Total Percent Identity: 97.48
Gaps: 1
Alignment:

   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50

  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100

 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150

 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200

 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250

 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 AQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEA 300

 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 NIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDS 350

 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 QRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEE 400

 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 FGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPA 450

 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 GIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYG 500

 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 GDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSG 550

 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 551 AKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDR 600

 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA.. 648
     ||||||||||||||||||||||||||||||||||||||||||||||||
 601 GFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGP 650

 649 ................GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 682
                     ||||||||||||||||||||||||||||||||||
 651 RGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQG 700

 683 LPGPQGPIGPPGEK 696
     ||||||||||||||
 701 LPGPQGPIGPPGEK 714
Sequence name: CA1B_HUMAN
Sequence documentation:
Alignment of: HUMCA1XIA_P17 x CA1B_HUMAN
Alignment segment 1/1:
Quality: 2561.00
Escore: 0
Matching length: 260
Total length: 260
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNS 50

  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 PEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFS 100

 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 ILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPED 150

 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 YPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDT 200

 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 NGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKA 250

 251 AQAQEPQIDE 260
     ||||||||||
 251 AQAQEPQIDE 260
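The summary figures printed with each alignment above appear to follow directly from the aligned rows. The sketch below is an illustration only: it assumes that "Matching length" counts the columns in which neither sequence has a gap, that "Total length" counts all columns, and that "." marks a gap. The function names are hypothetical, and similarity scoring is not modelled.

```python
def percent(matches, length):
    """Percentage rounded to two decimals, as printed in the alignment headers."""
    return round(100.0 * matches / length, 2)


def alignment_stats(top, bottom, gap="."):
    """Summarise two aligned rows of equal length (assumed definitions, see lead-in)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same number of columns")
    # Columns where both sequences carry a residue.
    aligned = [(a, b) for a, b in zip(top, bottom) if a != gap and b != gap]
    identical = sum(1 for a, b in aligned if a == b)
    return {
        "matching_length": len(aligned),
        "total_length": len(top),
        "matching_percent_identity": percent(identical, len(aligned)),
        "total_percent_identity": percent(identical, len(top)),
    }


# Last row of the HUMCA1XIA_P14 alignment: one mismatch in eight columns.
print(alignment_stats("GPPGPVVS", "GPPGPVGS"))

# Header values quoted above, recomputed from the lengths alone.
print(percent(1057, 1058))  # P14: one mismatch in 1058 columns
print(percent(696, 714))    # P16: 696 matched columns out of 714
```

Under these assumptions the two recomputed values agree with the 99.91 and 97.48 reported for HUMCA1XIA_P14 and HUMCA1XIA_P16.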
DESCRIPTION FOR CLUSTER HSS100PCB
Cluster HSS100PCB features 1 transcript(s) and 3 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000673_0001
Table 2 - Segments of interest
Figure imgf000673_0002
Table 3 - Proteins of interest
Figure imgf000673_0003
These sequences are variants of the known protein S-100P protein (SwissProt accession identifier S10P_HUMAN), SEQ ID NO: 635, referred to herein as the previously known protein. The sequence for protein S-100P protein is given at the end of the application, as "S-100P protein amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000673_0004
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: calcium binding; protein binding, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSS100PCB can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 25 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
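The "parts per million" scale described above can be illustrated with a short calculation. This is a minimal sketch, assuming plain EST counts; the weighting applied to the counts is described elsewhere in the application and is not reproduced here, and the function name and the example counts are hypothetical.

```python
def parts_per_million(cluster_ests, category_ests):
    """Expression of one cluster relative to all ESTs in a tissue category, in ppm."""
    if category_ests <= 0:
        raise ValueError("the category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts: 12 ESTs from the cluster among 400,000 ESTs in the category.
print(parts_per_million(12, 400_000))
```

With these illustrative counts the cluster would be plotted at 30 ppm for that category.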
Table 5 - Normal tissue distribution
Figure imgf000674_0001
Figure imgf000675_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf000675_0002
As noted above, cluster HSS100PCB features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein S-100P protein. A description of each variant protein according to the present invention is now provided.
Variant protein HSS100PCB_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSS100PCB_T1. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSS100PCB_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCB_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf000676_0001
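The localization reasoning given above for variant protein HSS100PCB_P3 (secreted when both signal-peptide predictors report a signal peptide and neither trans-membrane predictor reports a trans-membrane region) can be written as a simple rule. The sketch below is illustrative only: it takes the predictor outputs as plain booleans and does not call SignalP or any other program, and the function name is hypothetical.

```python
def is_predicted_secreted(signal_peptide_calls, transmembrane_calls):
    """Apply the rule stated in the text to the outputs of the prediction programs.

    signal_peptide_calls: one boolean per signal-peptide predictor.
    transmembrane_calls: one boolean per trans-membrane region predictor.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    # Every signal-peptide predictor must agree, and no TM predictor may fire.
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# The case described in the text: two signal-peptide calls, no trans-membrane call.
print(is_predicted_secreted([True, True], [False, False]))
```

A disagreement between the signal-peptide predictors, or a single trans-membrane call, makes the rule return False; how such cases were resolved is not stated in this passage.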
Variant protein HSS100PCB_P3 is encoded by the following transcript(s): HSS100PCB_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSS100PCB_T1 is shown in bold; this coding portion starts at position 1057 and ends at position 1533. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCB_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf000677_0001
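The coding-portion coordinates quoted above for transcript HSS100PCB_T1 (positions 1057 to 1533) can be turned into a length and a codon count. This is a small sketch assuming 1-based, inclusive positions; the function name is hypothetical, and the text does not say whether the stop codon is included in the stated range.

```python
def coding_span(start, end):
    """Return (nucleotides, codons) for a 1-based, inclusive coding range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding range is not a whole number of codons")
    return length, length // 3


# Coordinates stated for HSS100PCB_T1.
print(coding_span(1057, 1533))  # (477, 159)
```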
As noted above, cluster HSS100PCB features 3 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSSIOOPCB jnode J according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Figure imgf000678_0001
Segment cluster HSSlOOPCB iodeJ according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Figure imgf000678_0002
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (related to colon cancer), shown in Table 11.
Table 11 - Oligonucleotides related to this segment
Figure imgf000678_0003
Segment cluster HSSlOOPCBjiodeJ according to the present invention is supported by 141 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Figure imgf000679_0001
DESCRIPTION FOR CLUSTER HUMPHOSLIP
Cluster HUMPHOSLIP features 7 transcript(s) and 53 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000679_0002
Figure imgf000680_0001
Figure imgf000681_0001
Table 3 - Proteins of interest
Figure imgf000682_0001
These sequences are variants of the known protein Phospholipid transfer protein precursor (SwissProt accession identifier PLTP_HUMAN; known also according to the synonyms Lipid transfer protein II), SEQ ID NO: 636, referred to herein as the previously known protein. Protein Phospholipid transfer protein precursor is known or believed to have the following function(s): Converts HDL into larger and smaller particles. May play a key role in extracellular phospholipid transport and modulation of HDL particles. The sequence for protein Phospholipid transfer protein precursor is given at the end of the application, as "Phospholipid transfer protein precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000682_0002
Protein Phospholipid transfer protein precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: lipid metabolism; lipid transport, which are annotation(s) related to Biological Process; lipid binding, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster (in relation to colon cancer) but not other segments/transcripts below, shown in Table 5.
Table 5 - Oligonucleotides related to this cluster
Figure imgf000683_0001
As noted above, cluster HUMPHOSLIP features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Phospholipid transfer protein precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMPHOSLIP_PEA_2_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T17. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P10 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P10, and a second amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEWTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163 - 493 of PLTP_HUMAN, which also corresponds to amino acids 68 - 398 of HUMPHOSLIP_PEA_2_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67-x to 67; and ending at any of amino acid numbers 68+ ((n-2) - x), in which x varies
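The coordinates in the comparison report above imply a simple position mapping between PLTP_HUMAN and HUMPHOSLIP_PEA_2_P10, and the edge-portion formula defines a window that straddles the junction between residues 67 (E) and 68 (K). The sketch below illustrates both under stated assumptions: the function and constant names are hypothetical, and because the permitted range of x is cut off in the text, the bound used here (0 <= x <= n-2) is only the condition under which the window contains both junction residues.

```python
# Coordinates quoted in the comparison report for HUMPHOSLIP_PEA_2_P10.
HEAD = (1, 67)        # PLTP_HUMAN residues kept unchanged at the start
TAIL = (163, 493)     # PLTP_HUMAN residues that follow directly in the variant
JUNCTION_LEFT = 67    # variant residue E, last residue of the first segment
JUNCTION_RIGHT = 68   # variant residue K, first residue of the second segment


def known_to_variant(pos):
    """Map a PLTP_HUMAN position to the variant, or None if it is skipped."""
    if HEAD[0] <= pos <= HEAD[1]:
        return pos
    if TAIL[0] <= pos <= TAIL[1]:
        return pos - TAIL[0] + JUNCTION_RIGHT
    return None


def edge_window(n, x):
    """Return (start, end) on the variant for an n-residue edge portion."""
    # Assumed bound on x; the range is truncated in the source text.
    if n < 2 or not 0 <= x <= n - 2:
        raise ValueError("window would not contain both junction residues")
    start = JUNCTION_LEFT - x
    end = JUNCTION_RIGHT + (n - 2) - x
    if start < 1:
        raise ValueError("window would start before residue 1")
    return start, end


print(known_to_variant(163), known_to_variant(493))  # 68 398
print(edge_window(10, 0), edge_window(10, 8))        # (67, 76) (59, 68)
```

Every window returned by `edge_window` has exactly n residues, since (68 + (n-2) - x) - (67 - x) + 1 = n.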
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMPHOSLIP_PEA_2_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000685_0001
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P10, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Glycosylation site(s)
Figure imgf000686_0001
Variant protein HUMPHOSLIP_PEA_2_P10 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T17 is shown in bold; this coding portion starts at position 276 and ends at position 1469. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf000686_0002
Figure imgf000687_0001
Figure imgf000688_0001
Variant protein HUMPHOSLIP_PEA_2_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T19. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPHOSLIP_PEA_2_P12 and PLTP_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLN corresponding to amino acids 1 - 427 of PLTP_HUMAN, which also corresponds to amino acids 1 - 427 of HUMPHOSLIP_PEA_2_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV corresponding to amino acids 428 - 432 of HUMPHOSLIP_PEA_2_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV in HUMPHOSLIP_PEA_2_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMPHOSLIP_PEA_2_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Figure imgf000689_0001
Figure imgf000690_0001
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P12, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 10 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 10 - Glycosylation site(s)
Figure imgf000690_0002
Variant protein HUMPHOSLIP_PEA_2_P12 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T19 is shown in bold; this coding portion starts at position 276 and ends at position 1571. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf000691_0001
Figure imgf000692_0001
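The coding-portion coordinates given for the HUMPHOSLIP transcripts can be cross-checked against the protein lengths implied by the comparison reports (398 residues for HUMPHOSLIP_PEA_2_P10, and 427 + 5 = 432 residues for HUMPHOSLIP_PEA_2_P12). The sketch below is only a consistency check on numbers quoted in the text, assuming 1-based, inclusive positions; the dictionary layout and function name are hypothetical.

```python
# Values quoted in the text: coding range on the transcript and the last
# variant residue number named in the corresponding comparison report.
VARIANTS = {
    "HUMPHOSLIP_PEA_2_P10": {"cds": (276, 1469), "protein_length": 398},
    "HUMPHOSLIP_PEA_2_P12": {"cds": (276, 1571), "protein_length": 432},
}


def codon_count(start, end):
    """Number of codons in a 1-based, inclusive coding range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding range is not a whole number of codons")
    return length // 3


for name, info in VARIANTS.items():
    codons = codon_count(*info["cds"])
    print(name, codons, codons == info["protein_length"])
```

In both cases the codon count equals the residue count, which suggests, without confirming, that the stated coding portion does not include the stop codon.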
Variant protein HUMPHOSLIP_PEA_2_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMPHOSLIP_PEA_2_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Figure imgf000693_0001
Variant protein HUMPHOSLIP_PEA_2_P30 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T6 is shown in bold; this coding portion starts at position 276 and ends at position 431. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Figure imgf000693_0002
Figure imgf000694_0001
Figure imgf000695_0001
Variant protein HUMPHOSLIP_PEA_2_P31 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T7. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P31 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1 - 67 of PLTP_HUMAN, which also corresponds to amino acids 1 - 67 of HUMPHOSLIP_PEA_2_P31, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPWGGSSLFLALDLTLRPPVG corresponding to amino acids 68 - 98 of HUMPHOSLIP_PEA_2_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPWGGSSLFLALDLTLRPPVG in HUMPHOSLIP_PEA_2_P31. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P31 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
Figure imgf000696_0001
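The localization reasoning stated above (secreted when both signal-peptide predictors report a signal peptide and neither trans-membrane predictor reports a trans-membrane region) can be expressed as a small decision rule. The following is only an illustrative sketch: the function name and the "undetermined" fallback are assumptions, since the text describes only the secreted outcome and does not name the individual programs beyond SignalP.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the rule stated in the text to boolean predictor outputs.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls: one boolean per trans-membrane prediction program.
    Returns "secreted" only when every signal-peptide program predicts a
    signal peptide and no trans-membrane program predicts a TM region.
    Any other combination is reported as "undetermined" (an assumption;
    the text does not describe those cases).
    """
    signal_peptide_calls = list(signal_peptide_calls)
    transmembrane_calls = list(transmembrane_calls)
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree, two trans-membrane programs find nothing.
call = predict_localization([True, True], [False, False])
```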
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P31, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 15 - Glycosylation site(s)
Figure imgf000696_0002
Figure imgf000697_0001
Variant protein HUMPHOSLIP_PEA_2_P31 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T7 is shown in bold; this coding portion starts at position 276 and ends at position 569. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Figure imgf000697_0002
Figure imgf000698_0001
Variant protein HUMPHOSLIP_PEA_2_P33 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T14. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P33 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1 - 183 of PLTP_HUMAN, which also corresponds to amino acids 1 - 183 of HUMPHOSLIP_PEA_2_P33, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 184 - 200 of HUMPHOSLIP_PEA_2_P33, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P33.
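The head-plus-tail structure described in the comparison report above can be checked with a short sketch. It joins the first amino acid sequence (amino acids 1 - 183) to the unique tail (amino acids 184 - 200) and confirms that the stated 1-based, inclusive ranges are consistent with the sequence lengths. The variable and function names are illustrative assumptions; the sequences are copied from the text, and the sketch assumes 100% identity to the listed sequences rather than the broader homology ranges recited.

```python
# First amino acid sequence (amino acids 1 - 183), as listed in the text.
P33_HEAD = (
    "MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH"
    "FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINAS"
    "AEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRF"
    "LLNQQ"
)
# Second amino acid sequence, the unique tail (amino acids 184 - 200).
P33_TAIL = "VWAATGRRVARVGMLSL"

# "Contiguous and in a sequential order": head immediately followed by tail.
p33 = P33_HEAD + P33_TAIL


def residues(sequence, first, last):
    """Return residues first..last using 1-based, inclusive numbering."""
    return sequence[first - 1:last]


head_length = len(P33_HEAD)
total_length = len(p33)
tail_from_positions = residues(p33, 184, 200)
```

Under these assumptions the head is 183 residues, the assembled sequence is 200 residues, and positions 184 - 200 return the tail unchanged.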
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P33 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Figure imgf000700_0001
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P33, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 18 - Glycosylation site(s)
Figure imgf000700_0002
Figure imgf000701_0001
Variant protein HUMPHOSLIP_PEA_2_P33 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T14 is shown in bold; this coding portion starts at position 276 and ends at position 875. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Figure imgf000701_0002
Figure imgf000702_0001
Variant protein HUMPHOSLIP_PEA_2_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T16. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P34 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1 - 205 of PLTP_HUMAN, which also corresponds to amino acids 1 - 205 of HUMPHOSLIP_PEA_2_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS corresponding to amino acids 206 - 217 of HUMPHOSLIP_PEA_2_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS in HUMPHOSLIP_PEA_2_P34.
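The percentage thresholds recited above (at least 70%, 80%, 85%, 90% and 95% homologous) can be illustrated with a deliberately simplified measure. The sketch below computes ungapped, position-by-position percent identity between two equal-length sequences; this is an assumption made for illustration only, since this section does not define how homology is calculated and the application may rely on an alignment program for that purpose. The candidate sequence is hypothetical.

```python
def percent_identity(reference, candidate):
    """Ungapped percent identity between two equal-length sequences.

    A simplified stand-in for the homology measure; not an alignment.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length for this sketch")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(reference, candidate) if a == b)
    return 100.0 * matches / len(reference)


# Unique tail of HUMPHOSLIP_PEA_2_P34, as listed in the text.
P34_TAIL = "LWTSLLALTIPS"
# Hypothetical candidate differing from the tail at a single position.
hypothetical_candidate = "LWTSLLALTIPA"

identity = percent_identity(P34_TAIL, hypothetical_candidate)
# Thresholds recited in the text, checked against the computed identity.
thresholds_met = [t for t in (70, 80, 85, 90, 95) if identity >= t]
```

With one mismatch in twelve positions, the hypothetical candidate meets the 70%, 80%, 85% and 90% thresholds under this simplified measure, but not the 95% threshold.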
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 20 - Amino acid mutations
Figure imgf000704_0001
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P34, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 21 - Glycosylation site(s)
Figure imgf000704_0002
Variant protein HUMPHOSLIP_PEA_2_P34 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T16 is shown in bold; this coding portion starts at position 276 and ends at position 926. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
Figure imgf000705_0001
Figure imgf000706_0001
Variant protein HUMPHOSLIP_PEA_2_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T18. An alignment is given to the known protein (Phospholipid transfer protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMPHOSLIP_PEA_2_P35 and PLTP_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35, comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1 - 109 of PLTP_HUMAN, which also corresponds to amino acids 1 - 109 of HUMPHOSLIP_PEA_2_P35, a second, bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163 - 183 of PLTP_HUMAN, which also corresponds to amino acids 111 - 131 of HUMPHOSLIP_PEA_2_P35, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL corresponding to amino acids 132 - 148 of HUMPHOSLIP_PEA_2_P35, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK, having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35): a sequence starting from any of amino acid numbers 109-x to 109; and ending at any of amino acid numbers 111 + ((n-2) - x), in which x varies from 0 to n-2. 3. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL in HUMPHOSLIP_PEA_2_P35.
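The edge-portion definition above can be made concrete with a short sketch. It assembles the four listed sequences of HUMPHOSLIP_PEA_2_P35 in order and then enumerates the start and end positions given by the stated rule (start at 109 - x, end at 111 + ((n-2) - x), with x from 0 to n-2). The function and variable names are illustrative assumptions, and the sketch assumes exact identity to the listed sequences. It applies the formula exactly as written; note that, taken literally with inclusive numbering, each resulting span covers end - start + 1 = n + 1 positions.

```python
# The four contiguous sequences of HUMPHOSLIP_PEA_2_P35, as listed in the text.
P35_FIRST = (
    "MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGH"
    "FYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF"
)                                   # amino acids 1 - 109
P35_BRIDGE = "L"                    # amino acid 110
P35_THIRD = "KVYDFLSTFITSGMRFLLNQQ"  # amino acids 111 - 131
P35_TAIL = "VWAATGRRVARVGMLSL"      # amino acids 132 - 148

p35 = P35_FIRST + P35_BRIDGE + P35_THIRD + P35_TAIL


def edge_windows(n, junction_first=109, junction_last=111):
    """Yield (start, end) pairs, 1-based and inclusive, per the stated rule.

    start = 109 - x and end = 111 + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for x in range(0, n - 1):
        yield junction_first - x, junction_last + ((n - 2) - x)


def window_sequence(sequence, start, end):
    """Return residues start..end (1-based, inclusive)."""
    return sequence[start - 1:end]


windows = list(edge_windows(10))
junction = window_sequence(p35, 109, 111)
edge_portions = [window_sequence(p35, s, e) for s, e in windows]
```

For n = 10 the rule yields nine windows, from positions 109 - 119 down to 101 - 111, and every window contains the FLK junction at positions 109 - 111. For larger n, some windows produced by the formula may extend past position 148 and would need to be discarded.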
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPHOSLIP_PEA_2_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Amino acid mutations
Figure imgf000708_0001
The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P35, as compared to the known protein Phospholipid transfer protein precursor, are described in Table 24 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 24 - Glycosylation site(s)
Figure imgf000709_0001
Variant protein HUMPHOSLIP_PEA_2_P35 is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T18 is shown in bold; this coding portion starts at position 276 and ends at position 719. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Nucleic acid SNPs
Figure imgf000709_0002
Figure imgf000710_0001
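The coding-portion coordinates given above for each transcript can be cross-checked against the amino acid ranges stated in the comparison reports. The sketch below assumes that positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; both are assumptions inferred from the arithmetic, not statements in the text. Protein lengths are taken from the last amino acid number given for each variant (98, 200, 217 and 148).

```python
# (coding start, coding end, protein length from the comparison reports)
CODING_PORTIONS = {
    "HUMPHOSLIP_PEA_2_T7": (276, 569, 98),    # encodes HUMPHOSLIP_PEA_2_P31
    "HUMPHOSLIP_PEA_2_T14": (276, 875, 200),  # encodes HUMPHOSLIP_PEA_2_P33
    "HUMPHOSLIP_PEA_2_T16": (276, 926, 217),  # encodes HUMPHOSLIP_PEA_2_P34
    "HUMPHOSLIP_PEA_2_T18": (276, 719, 148),  # encodes HUMPHOSLIP_PEA_2_P35
}


def codon_count(start, end):
    """Number of codons in a coding portion with inclusive coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("coding portion length is not a positive multiple of 3")
    return length // 3


# True where the codon count equals the stated protein length.
consistency = {
    transcript: codon_count(start, end) == protein_length
    for transcript, (start, end, protein_length) in CODING_PORTIONS.items()
}
```

Under these assumptions all four transcripts are consistent with the stated protein lengths. The same calculation gives 52 codons for HUMPHOSLIP_PEA_2_T6 (positions 276 to 431) and 432 codons for HUMPHOSLIP_PEA_2_T19 (positions 276 to 1571), for which this section does not state an amino acid range.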
As noted above, cluster HUMPHOSLIP features 53 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster HUMPHOSLIP JPEA iode ) according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000711_0001
Segment cluster HUMPHOSLIP JPEA jnode J 9 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16 and HUMPHOSLIP_PEA_2_T19. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000712_0001
Segment cluster HUMPHOSLIP PEA 2_nodeJ4 according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000712_0002
Segment cluster HUMPHOSLIPJPEAJ node δ according to the present invention is supported by 131 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000713_0001
Segment cluster HUMPHOSLIP_PEA_2_node_70 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000713_0002
Figure imgf000714_0001
Segment cluster HUMPHOSLIP_PEA_2_nodeJ5 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000714_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
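The distinction drawn above between short segments (up to about 120 bp) and the other segments can be sketched as a simple length test on a segment's start and end positions on a transcript. The actual positions are given in the segment location tables, which are not reproduced in this text, so the coordinates and transcript labels below are purely hypothetical placeholders. Treating the positions as 1-based and inclusive, and using exactly 120 bp as the cutoff for "about 120 bp", are also assumptions.

```python
SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp", taken here as a hard cutoff


def segment_length(start, end):
    """Length in bp of a segment with 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end position must not precede start position")
    return end - start + 1


def is_short_segment(start, end, max_bp=SHORT_SEGMENT_MAX_BP):
    """True when the segment length does not exceed the cutoff."""
    return segment_length(start, end) <= max_bp


# Hypothetical segment locations on two placeholder transcripts.
hypothetical_locations = {
    "transcript_A": (1, 95),     # 95 bp
    "transcript_B": (201, 450),  # 250 bp
}

classification = {
    name: is_short_segment(start, end)
    for name, (start, end) in hypothetical_locations.items()
}
```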
Segment cluster HUMPHOSLIP JPEA jnode J according to the present invention is supported by 159 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf000715_0001
Segment cluster HUMPHOSLIP JΕA_2_node J according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf000716_0001
Segment cluster HUMPHOSLIP_PEA_2_node_4 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf000716_0002
Segment cluster HUMPHOSLIP_PEA_2_node_6 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf000717_0001
Segment cluster HUMPHOSLIP_PEA_2_node_7 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000717_0002
Segment cluster HUMPHOSLIP_PEA_2_node_8 according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000718_0001
Segment cluster HUMPHOSLIP_PEA_2_node_9 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf000718_0002
Figure imgf000719_0001
Segment cluster HUMPHOSLIP_PEA_2_node_14 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf000719_0002
Segment cluster HUMPHOSLIP_PEA_2_node_15 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf000719_0003
Figure imgf000720_0001
Segment cluster HUMPHOSLIP_PEA_2_node_16 according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf000720_0002
Segment cluster HUMPHOSLIP_PEA_2_node_17 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf000721_0001
Segment cluster HUMPHOSLIP_PEA_2_node_23 according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf000721_0002
Segment cluster HUMPHOSLIP_PEA_2_node_24 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf000722_0001
Segment cluster HUMPHOSLIP_PEA_2_node_25 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T14 and HUMPHOSLIP_PEA_2_T18. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf000722_0002
Segment cluster HUMPHOSLIP_PEA_2_node_26 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Figure imgf000723_0001
Segment cluster HUMPHOSLIP_PEA_2_node_29 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Figure imgf000723_0002
Segment cluster HUMPHOSLIP_PEA_2_node_30 according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Figure imgf000724_0001
Segment cluster HUMPHOSLIP_PEA_2_node_33 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Figure imgf000724_0002
Figure imgf000725_0001
Segment cluster HUMPHOSLIP_PEA_2_node_36 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Figure imgf000725_0002
Segment cluster HUMPHOSLIP_PEA_2_node_37 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Figure imgf000726_0001
Segment cluster HUMPHOSLIP_PEA_2_node_39 according to the present invention is supported by 166 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Figure imgf000727_0001
Segment cluster HUMPHOSLIP_PEA_2_node_40 according to the present invention is supported by 199 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HUMPHOSLIP_PEA_2_node_41 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Figure imgf000728_0001
Segment cluster HUMPHOSLIP_PEA_2_node_42 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Figure imgf000728_0002
Segment cluster HUMPHOSLIP_PEA_2_node_44 according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Figure imgf000729_0001
Segment cluster HUMPHOSLIP_PEA_2_node_45 according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Figure imgf000729_0002
Figure imgf000730_0001
Segment cluster HUMPHOSLIP_PEA_2_node_47 according to the present invention is supported by 223 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Figure imgf000730_0002
Segment cluster HUMPHOSLIP_PEA_2_node_51 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Figure imgf000731_0001
Segment cluster HUMPHOSLIP_PEA_2_node_52 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Figure imgf000731_0002
Segment cluster HUMPHOSLIP_PEA_2_node_53 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T19. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Figure imgf000732_0001
Segment cluster HUMPHOSLIP_PEA_2_node_54 according to the present invention is supported by 236 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Figure imgf000732_0002
Segment cluster HUMPHOSLIP_PEA_2_node_55 according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Figure imgf000733_0001
Segment cluster HUMPHOSLIP_PEA_2_node_58 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Figure imgf000733_0002
Figure imgf000734_0001
Segment cluster HUMPHOSLIP_PEA_2_node_59 according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Figure imgf000734_0002
Segment cluster HUMPHOSLIP_PEA_2_node_60 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Figure imgf000735_0001
Segment cluster HUMPHOSLIP_PEA_2_node_61 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Figure imgf000735_0002
Segment cluster HUMPHOSLIP_PEA_2_node_62 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Figure imgf000736_0001
Segment cluster HUMPHOSLIP_PEA_2_node_63 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Figure imgf000736_0002
Figure imgf000737_0001
Segment cluster HUMPHOSLIP_PEA_2_node_64 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Figure imgf000737_0002
Segment cluster HUMPHOSLIP_PEA_2_node_65 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Figure imgf000737_0003
Figure imgf000738_0001
Segment cluster HUMPHOSLIP_PEA_2_node_66 according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Figure imgf000738_0002
Segment cluster HUMPHOSLIP_PEA_2_node_67 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Figure imgf000739_0001
Segment cluster HUMPHOSLIP_PEA_2_node_69 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Figure imgf000739_0002
Segment cluster HUMPHOSLIP_PEA_2_node_71 according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Figure imgf000740_0001
Segment cluster HUMPHOSLIP_PEA_2_node_72 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Figure imgf000740_0002
Figure imgf000741_0001
Segment cluster HUMPHOSLIP_PEA_2_node_73 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Figure imgf000741_0002
Segment cluster HUMPHOSLIP_PEA_2_node_74 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6, HUMPHOSLIP_PEA_2_T7, HUMPHOSLIP_PEA_2_T14, HUMPHOSLIP_PEA_2_T16, HUMPHOSLIP_PEA_2_T17, HUMPHOSLIP_PEA_2_T18 and HUMPHOSLIP_PEA_2_T19. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Figure imgf000742_0001
Variant protein alignment to the previously known protein:
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P10 x PLTP_HUMAN
Alignment segment 1/1: Quality: 3716.00 Escore: 0 Matching length: 398 Total length: 493 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 80.73 Total Percent Identity: 80.73 Gaps: 1
Alignment:
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISE................................. 67
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
67 .................................................. 67
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
68 ............KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 105
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
106 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 155
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250
156 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 205
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300
206 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 255
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350
256 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 305
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400
306 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 355
401 ALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVT 450
356 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 398
451 NHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV 493
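The percentages in the alignment header for HUMPHOSLIP_PEA_2_P10 can be reproduced from the matching length and total length it reports. The following is a minimal sketch, not the program that generated the alignment; the function name and the counts of identical and similar positions (398 each, inferred here from the reported 100.00 matching percentages) are assumptions made for illustration.

```python
def alignment_percentages(identical, similar, matching_length, total_length):
    """Recompute the four percentages shown in an alignment header.

    identical / similar: assumed counts of identical and similar aligned
    positions (not stated directly in the source; inferred from the header).
    matching_length: aligned positions excluding the gap.
    total_length: aligned positions including the gap.
    """
    return {
        "matching_percent_similarity": round(100.0 * similar / matching_length, 2),
        "matching_percent_identity": round(100.0 * identical / matching_length, 2),
        "total_percent_similarity": round(100.0 * similar / total_length, 2),
        "total_percent_identity": round(100.0 * identical / total_length, 2),
    }


# Values reported for HUMPHOSLIP_PEA_2_P10 x PLTP_HUMAN:
# matching length 398, total length 493.
p10 = alignment_percentages(identical=398, similar=398,
                            matching_length=398, total_length=493)
print(p10)
```

Under these assumptions the total percentages come out at 80.73, in agreement with the header: the 95 positions of the known protein that have no counterpart in the variant lower the total figures while leaving the matching figures at 100.00.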
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P12 x PLTP_HUMAN
Alignment segment 1/1: Quality: 4101.00 Escore: 0 Matching length: 427 Total length: 427 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250
201 DTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPN 250
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300
251 RAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLD 300
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350
301 MLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASV 350
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400
351 TIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHS 400
401 ALESLALIPLQAPLKTMLQIGVMPMLN 427
401 ALESLALIPLQAPLKTMLQIGVMPMLN 427
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P31 x PLTP_HUMAN
Alignment segment 1/1: Quality: 639.00 Escore: 0 Matching length: 67 Total length: 67 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISE 67
51 IPDLRGKEGHFYYNISE 67
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P33 x PLTP_HUMAN
Alignment segment 1/1: Quality: 1767.00 Escore: 0 Matching length: 184 Total length: 184 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.46 Total Percent Similarity: 100.00 Total Percent Identity: 99.46 Gaps: 0
Alignment :
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQV 184
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P34 x PLTP_HUMAN
Alignment segment 1/1
Quality: 1971.00 Escore: 0 Matching length: 205 Total length: 205 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLL 200
201 DTVPV 205
201 DTVPV 205
Sequence name: PLTP_HUMAN
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P35 x PLTP_HUMAN
Alignment segment 1/1: Quality: 1158.00 Escore: 0 Matching length: 132 Total length: 184 Matching Percent Similarity: 100.00 Matching Percent Identity: 98.48 Total Percent Similarity: 71.74 Total Percent Identity: 70.65 Gaps: 1
Alignment:
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
1 MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETIT 50
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
51 IPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLR 100
101 FRRQLLYWFL........................................ 110
101 FRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASV 150
111 ............KVYDFLSTFITSGMRFLLNQQV 132
151 SRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQI 184
DESCRIPTION FOR CLUSTER D11853
Cluster D11853 features 18 transcript(s) and 31 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000752_0001
Figure imgf000753_0001
Figure imgf000754_0001
These sequences are variants of the known protein Membrane associated protein SLP-2 (SwissProt accession identifier Q9UIZ1; known also according to the synonyms Stomatin-like protein 2; Stomatin-like 2; Hypothetical protein FLJ14499), SEQ ID NO: 637, referred to herein as the previously known protein. The sequence for protein Membrane associated protein SLP-2 is given at the end of the application, as "Membrane associated protein SLP-2 amino acid sequence". The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ligand, which are annotation(s) related to Molecular Function; and cytoskeleton; membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster D11853 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 26 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, colorectal cancer and a mixture of malignant tumors from different tissues.
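The "parts per million" measure defined above can be illustrated as a simple ratio. The following is a minimal sketch only: the function name and the EST counts are hypothetical, and any library weighting applied by the previously described methods is not modelled here.

```python
def expression_ppm(cluster_est_count, category_est_count):
    """Ratio of a cluster's ESTs to all ESTs in a tissue category, in ppm.

    Both arguments are hypothetical raw counts; weighting of individual
    libraries, as used for the reported values, is deliberately left out.
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical example: 25 ESTs from the cluster among 500,000 ESTs
# sequenced from one tissue category.
example_ppm = expression_ppm(25, 500_000)
print(example_ppm)
```

Expressing the ratio per million makes categories with very different numbers of sequenced ESTs comparable on the same axis.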
Table 4 - Normal tissue distribution
Figure imgf000755_0001
Figure imgf000756_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000756_0002
Figure imgf000757_0001
As noted above, cluster D11853 features 18 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Membrane associated protein SLP-2. A description of each variant protein according to the present invention is now provided.
Variant protein D11853_PEA_1_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T1. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P1 and Q9P042 (SEQ ID NO: 639):
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P1, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P1, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P1, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P1, and a third amino acid sequence being at least 90% homologous to TVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 189 - 342 of Q9P042, which also corresponds to amino acids 203 - 356 of D11853_PEA_1_P1, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P1, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P1.
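The coordinate ranges in the comparison report with Q9P042 can be checked for internal consistency: each range on the variant should be the same length as the range it is mapped to on the aligned protein, and the ranges on the variant should follow one another without gaps or overlaps. The following is a minimal sketch of such a check; the `Segment` type and `check_segments` function are hypothetical names introduced for illustration, and only the coordinates are taken from the report above.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """One part of a chimeric polypeptide (1-based, inclusive coordinates)."""
    variant_start: int
    variant_end: int
    known_start: Optional[int] = None  # None when the report gives no
    known_end: Optional[int] = None    # coordinates on the aligned protein


def check_segments(segments):
    """Return the variant length if segments are contiguous and consistent."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start:
            raise ValueError(f"gap or overlap before position {seg.variant_start}")
        if seg.known_start is not None:
            variant_len = seg.variant_end - seg.variant_start + 1
            known_len = seg.known_end - seg.known_start + 1
            if variant_len != known_len:
                raise ValueError(f"length mismatch at {seg.variant_start}")
        expected_start = seg.variant_end + 1
    return expected_start - 1


# Coordinates from the D11853_PEA_1_P1 vs Q9P042 comparison report.
P1_VS_Q9P042 = [
    Segment(1, 26),              # first sequence (head), unique to the variant
    Segment(27, 201, 13, 187),   # second sequence
    Segment(202, 202),           # bridging amino acid A
    Segment(203, 356, 189, 342), # third sequence
]
variant_length = check_segments(P1_VS_Q9P042)
print(variant_length)
```

With the coordinates as reported, the four parts tile positions 1 to 356 of the variant, and both mapped ranges are offset by 14 positions relative to Q9P042.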
Comparison report between D11853_PEA_1_P1 and BAC85377 (SEQ ID NO: 640):
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P1, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P1, a second amino acid sequence being at least 90% homologous to MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 1 - 159 of BAC85377, which also corresponds to amino acids 110 - 268 of D11853_PEA_1_P1, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 269 - 356 of D11853_PEA_1_P1, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P1, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI of D11853_PEA_1_P1.
3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P1, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS in D11853_PEA_1_P1.
Comparison report between D11853_PEA_1_P1 and Q96FY2 (SEQ ID NO: 638):
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P1, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P1, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P1, and a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 130 - 356 of Q96FY2, which also corresponds to amino acids 130 - 356 of D11853_PEA_1_P1, wherein said first amino acid sequence, bridging amino acid and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM:Signal peptide,NN:NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P1 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
Figure imgf000760_0001
Figure imgf000761_0001
Variant protein D11853_PEA_1_P1 is encoded by the following transcript(s): D11853_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T1 is shown in bold; this coding portion starts at position 108 and ends at position 1175. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Figure imgf000761_0002
Figure imgf000762_0001
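The coding portion stated above for transcript D11853_PEA_1_T1 (positions 108 to 1175) can be cross-checked against the length of variant protein D11853_PEA_1_P1, whose last amino acid is numbered 356 in the comparison reports. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and inclusive 1-based coordinates are assumed because the source does not state the convention.

```python
def coding_region_summary(start: int, end: int) -> tuple[int, int, int]:
    """Return (length in nucleotides, whole codons, leftover nucleotides).

    Assumes 1-based, inclusive start and end positions on the transcript.
    """
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding portion of D11853_PEA_1_T1 as stated in the text.
length, codons, remainder = coding_region_summary(108, 1175)
print(length, codons, remainder)
```

Under the inclusive-coordinate assumption the coding portion spans 1068 nucleotides, that is 356 complete codons with no remainder, which matches the 356 amino acids of the variant protein.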
Variant protein D11853_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T3. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P2 and Q9P042 (SEQ ID NO: 639): 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P2, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVNLFWQQEAWVVERMGRFHRILEPGLNILIPVLD RYVQSLKΕIVTNVP EQSAVTLDNVTLQIDGVLYLRIMDPYKAS YGVEDPEYAVTQLAQTTMRSELGKLSLDK VFRERESLNASIVDATNQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P2, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P2, a third amino acid sequence being at least 90% homologous to TVLESEGTRESAINVAEGKKQAQILASEAEK\AEQ1NQAAGEASAVLAKAKA.KAEAIRIL AAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQ corresponding to amino acids 189 - 297 of Q9P042, which also corresponds to amino acids 203 - 311 of D11853_PEA_1_P2, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRAL corresponding to amino acids 312 - 315 of D11853_PEA_1_P2, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P2. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRAL in D11853_PEA_1_P2.
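The coordinate bookkeeping in the comparison report above can be checked mechanically. The following is a minimal sketch and not part of the original disclosure: `Segment`, `span_length` and `check_segments` are hypothetical names introduced for illustration, and the coordinates are those quoted above for D11853_PEA_1_P2 against Q9P042. It verifies that the variant-side segments cover positions 1 to 315 without gaps or overlaps, and that each aligned segment spans the same number of residues in both proteins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass(frozen=True)
class Segment:
    """One piece of the chimeric polypeptide (hypothetical helper type)."""
    label: str
    variant: Span                 # coordinates on the variant protein
    known: Optional[Span] = None  # coordinates on the known protein, if aligned


def span_length(span: Span) -> int:
    """Number of residues in an inclusive span."""
    return span[1] - span[0] + 1


def check_segments(segments: List[Segment], variant_length: int) -> List[str]:
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    expected_start = 1
    for seg in segments:
        if seg.variant[0] != expected_start:
            problems.append(
                f"{seg.label}: starts at {seg.variant[0]}, expected {expected_start}"
            )
        if seg.known is not None and span_length(seg.known) != span_length(seg.variant):
            problems.append(f"{seg.label}: aligned spans differ in length")
        expected_start = seg.variant[1] + 1
    if expected_start - 1 != variant_length:
        problems.append(
            f"segments end at {expected_start - 1}, expected {variant_length}"
        )
    return problems


# Coordinates as quoted in the comparison report with Q9P042.
P2_VS_Q9P042 = [
    Segment("first (head)", (1, 26)),
    Segment("second", (27, 201), known=(13, 187)),
    Segment("bridging amino acid A", (202, 202)),
    Segment("third", (203, 311), known=(189, 297)),
    Segment("fourth (tail, VRAL)", (312, 315)),
]

if __name__ == "__main__":
    issues = check_segments(P2_VS_Q9P042, variant_length=315)
    print("consistent" if not issues else "\n".join(issues))
```

The same check can be applied to the other comparison reports by substituting their quoted coordinates; it examines only the numbering and says nothing about the degree of homology.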
Comparison report between D11853_PEA_1_P2 and BAC85377 (SEQ ID NO: 640): 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI
LEPGLNILPVLDRIRYVQSLKEIVΓNVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P2, a second amino acid sequence being at least 90% homologous to
MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDAΓNQAADC WGIRCLRYEIKDIH PRVKESMQMQVEAERRKRATVLESEGTRESAΓNVAEGKKQAQI
LASEAEKAEQΓNQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 1 - 159 of BAC85377, which also corresponds to amino acids 110 - 268 of D11853_PEA_1_P2, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQVRAL corresponding to amino acids 269 - 315 of D11853_PEA_1_P2, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPPVNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVΓNVPEQSAVTLDNVTLQIDGVLYLRI of
D11853_PEA_1_P2. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQVRAL in D11853_PEA_1_P2.
Comparison report between D11853_PEA_1_P2 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P2, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P2, a second amino acid sequence being at least 90% homologous to
AQTTMRSELGKLSLDKVFRERESLNASIVDAΓNQAADCWGIRCLRYEIKDIHVPPRVKES
MQMQVEAERIXKA .TVLESEGTRESAINVAEGKKQAQILASEAEKAEQRNQAAGEASAV LAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVT
SMVAQ corresponding to amino acids 130 - 311 of Q96FY2, which also corresponds to amino acids 130 - 311 of D11853_PEA_1_P2, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRAL corresponding to amino acids 312 - 315 of D11853_PEA_1_P2, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRAL in D11853_PEA_1_P2.
Comparison report between D11853_PEA_1_P2 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEA VVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIK DIHVPPRVKESMQMQVEAERRIOIATVLESEGTRESAINVAEGKKQAQILASEAEKAEQI NQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSN TILLPSNPGDVTSMVAQ corresponding to amino acids 1 - 311 of Q9UJZ1, which also corresponds to amino acids 1 - 311 of D11853_PEA_1_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRAL corresponding to amino acids 312 - 315 of D11853_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRAL in D11853_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Figure imgf000767_0001
Variant protein D11853_PEA_1_P2 is encoded by the following transcript(s): D11853_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T3 is shown in bold; this coding portion starts at position 108 and ends at position 1052. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Figure imgf000768_0001
Figure imgf000769_0001
Variant protein D11853_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T10. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P7 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P7, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWWERMGRFHRILEPGLNILPVLDRIRYVQSLKEIVLNVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFllERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEALRRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P7, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P7, a third amino acid sequence being at least 90% homologous to
TVLESEGTRESAΓNVAEGKKQAQILASEAEKAEQΓNQAAGEASAVLAKAKAKAEAIRIL
AAALTQH corresponding to amino acids 189 - 254 of Q9P042, which also corresponds to amino acids 203 - 268 of D11853_PEA_1_P7, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 269 - 290 of D11853_PEA_1_P7, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P7. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P7.
Comparison report between D11853_PEA_1_P7 and BAC85377: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNIL VLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P7, and a second amino acid sequence being at least 90% homologous to
MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDALNQAADC WGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRES AINVAEGKKQAQI LASEAEKAEQTNQAAGEASAVLAKAKAKAEAIRILAAALTQHVRGPWVGMGTGIDSGR GSLIYA corresponding to amino acids 1 - 181 of BAC85377, which also corresponds to amino acids 110 - 290 of D11853_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWWERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRI of
D11853_PEA_1_P7.
Comparison report between D11853_PEA_1_P7 and Q96FY2 (SEQ ID NO: 638): 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to
MLARAARGTGALLLRGSLLASGRAPR1 A.SSGLPRNTVVLFWQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEΓVΓNVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV
EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P7, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P7, a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIKDIHVPPRVKES MQMQVEAERJIKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQΓNQAAGEASAV
LAKAKAKAEAIRILAAALTQH corresponding to amino acids 130 - 268 of Q96FY2, which also corresponds to amino acids 130 - 268 of D11853_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 269 - 290 of D11853_PEA_1_P7, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P7.
Comparison report between D11853_PEA_1_P7 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWWERMGRFHRI LEPGLNILPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLPJMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIK DIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQI NQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of D11853_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 269 - 290 of D11853_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Figure imgf000773_0001
Figure imgf000774_0001
Variant protein D11853_PEA_1_P7 is encoded by the following transcript(s): D11853_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T10 is shown in bold; this coding portion starts at position 108 and ends at position 977. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Figure imgf000774_0002
Figure imgf000775_0001
Variant protein D11853_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T13. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P9 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P9, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVNLFVPQQEAWVVEPJVIGRFHRILEPGLNILPVLDRIRYVQSLKEIVINVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P9, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P9, a third amino acid sequence being at least 90% homologous to TVLESEGTRESAiNVAEGKKQAQILASEAEKAEQTNQA corresponding to amino acids 189 - 226 of Q9P042, which also corresponds to amino acids 203 - 240 of D11853_PEA_1_P9, a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P9, and a fifth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIWLAAΛLTQHNGDAAASLTVAJEQYVSAFSKLAKDSNTILL PSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKM S corresponding to amino acids 227 - 342 of Q9P042, which also corresponds to amino acids 282 - 397 of D11853_PEA_1_P9, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P9. 3. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P9, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P9.
Comparison report between D11853_PEA_1_P9 and BAC85377: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P9, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILPVLDRIRYVQSLKEIVΓNVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P9, a second amino acid sequence being at least 90% homologous to
MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADC WGIRCLRYEIKDIHWPRVKESMQMQVEAERRKRATVLESEGTRESAΓNVAEGKKQAQI
LASEAEKAEQINQA corresponding to amino acids 1 - 131 of BAC85377, which also corresponds to amino acids 110 - 240 of D11853_PEA_1_P9, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P9, a fourth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 132 - 159 of BAC85377, which also corresponds to amino acids 282 - 309 of D11853_PEA_1_P9, and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVP GTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 310 - 397 of D11853_PEA_1_P9, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRI of
D11853_PEA_1_P9. 3. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P9, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P9. 4. An isolated polypeptide encoding for a tail of D11853_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGNYGALTKAPVP GTPDSLSSGSSRDVQGTDASLDEELDRVKMS in D11853_PEA_1_P9.
Comparison report between D11853_PEA_1_P9 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MLAIlAARGTGALLLRGSLLASGRAPPvRASSGLPRΝTVVLFVPQQEAWNVΕRMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P9, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P9, a second amino acid sequence being at least 90% homologous to
AQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIKDIHVPPRVKES MQMQVEAERl^J lATVLESEGTRESAmVAEGKKQAQILASEAEKAEQrNQA corresponding to amino acids 130 - 240 of Q96FY2, which also corresponds to amino acids 130 - 240 of D11853_PEA_1_P9, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P9, and a fourth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILL PSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKM S corresponding to amino acids 241 - 356 of Q96FY2, which also corresponds to amino acids 282 - 397 of D11853_PEA_1_P9, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P9, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to
D11853_PEA_1_P9.
Comparison report between D11853_PEA_1_P9 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIK DIHWPRVKESMQMQVEAERRKRATVLESEGTRESATNVAEGKKQAQILASEAEKAEQI NQA corresponding to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of D11853_PEA_1_P9, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P9, and a third amino acid sequence being at least 90% homologous to
AGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILL PSNPGDVTSMVAQAMGWGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKM S corresponding to amino acids 241 - 356 of Q9UJZ1, which also corresponds to amino acids 282 - 397 of D11853_PEA_1_P9, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P9, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to
D11853_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Amino acid mutations
Figure imgf000781_0001
Figure imgf000782_0001
Variant protein D11853_PEA_1_P9 is encoded by the following transcript(s): D11853_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T13 is shown in bold; this coding portion starts at position 108 and ends at position 1298. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Figure imgf000782_0002
Figure imgf000783_0001
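The coding-portion coordinates quoted above for the three transcripts can be cross-checked against the lengths of the variant proteins given in the comparison reports (315, 290 and 397 amino acids). The following is an illustrative sketch only: `codon_count` and the two dictionaries are hypothetical names introduced here, and positions are assumed to be 1-based and inclusive. In each case the codon count equals the protein length, which is consistent with the quoted range excluding the stop codon, although the text does not state this explicitly.

```python
# Coding-portion coordinates as quoted in the text (assumed 1-based, inclusive).
CODING_REGIONS = {
    "D11853_PEA_1_T3": (108, 1052),
    "D11853_PEA_1_T10": (108, 977),
    "D11853_PEA_1_T13": (108, 1298),
}

# Last amino acid position quoted for each encoded variant in the comparison reports.
PROTEIN_LENGTHS = {
    "D11853_PEA_1_T3": 315,   # D11853_PEA_1_P2
    "D11853_PEA_1_T10": 290,  # D11853_PEA_1_P7
    "D11853_PEA_1_T13": 397,  # D11853_PEA_1_P9
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for transcript, (start, end) in CODING_REGIONS.items():
        codons = codon_count(start, end)
        expected = PROTEIN_LENGTHS[transcript]
        status = "matches" if codons == expected else "differs from"
        print(f"{transcript}: {codons} codons, {status} protein length {expected}")
```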
Variant protein D11853_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T14. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P10 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P10, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILPVLDRIRYVQSLKEIVTNVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P10, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P10, a third amino acid sequence being at least 90% homologous to TVLESEGTRESAINVAEGKKQAQILASEAEKAEQTNQAAGEASAVLAKAKAKAEAIRIL AAALTQH corresponding to amino acids 189 - 254 of Q9P042, which also corresponds to amino acids 203 - 268 of D11853_PEA_1_P10, and a fourth amino acid sequence being at least 90% homologous to AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 298 - 342 of Q9P042, which also corresponds to amino acids 269 - 313 of D11853_PEA_1_P10, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P10. 3. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
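The edge-portion definition above amounts to every window of length n that contains both residue 268 (H) and residue 269 (A). The sketch below enumerates those windows under one reading of the formula, namely that a window starts at 268 - x and ends at 269 + ((n-2) - x) for the same x. The function name `edge_windows` is hypothetical, and discarding windows that would run past the 313-residue variant is an assumption added here, not something stated in the text.

```python
from typing import List, Tuple


def edge_windows(
    n: int,
    junction_left: int = 268,   # last residue before the junction (H)
    protein_length: int = 313,  # length of D11853_PEA_1_P10 per the comparison report
) -> List[Tuple[int, int]]:
    """Enumerate 1-based inclusive (start, end) windows of length n across the junction.

    For each x in 0..n-2 the window starts at junction_left - x and ends at
    (junction_left + 1) + ((n - 2) - x), so it always has length n and always
    contains both junction residues.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        # Assumption: windows extending beyond the protein are discarded.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for n in (10, 20, 30, 40, 50):
        wins = edge_windows(n)
        print(f"n={n}: {len(wins)} windows, first {wins[0]}, last {wins[-1]}")
```

For n = 10, x ranges over nine values, giving windows from 268-277 down to 260-269. For n = 50 the four windows with the highest end positions would run past residue 313 and are dropped under the clipping assumption.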
Comparison report between D11853_PEA_1_P10 and BAC85377: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADC WGIRCLRYEIKDIHVPPRVKΈSMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI LASEAEKAEQΓNQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 1 - 159 of BAC85377, which also corresponds to amino acids 110 - 268 of D11853_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 269 - 313 of D11853_PEA_1_P10, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI of D11853_PEA_1_P10. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS in D11853_PEA_1_P10.
Comparison report between D11853_PEA_1_P10 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P10, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P10, a second amino acid sequence being at least 90% homologous to
AQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIKDIHVPPRVKES MQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAV LAKAKAKAEAIRILAAALTQH corresponding to amino acids 130 - 268 of Q96FY2, which also corresponds to amino acids 130 - 268 of D11853_PEA_1_P10, and a third amino acid sequence being at least 90% homologous to AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 312 - 356 of Q96FY2, which also corresponds to amino acids 269 - 313 of D11853_PEA_1_P10, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between D11853_PEA_1_P10 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVTNVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIK DIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQI NQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 1 - 268 of Q9UJZ1, which also corresponds to amino acids 1 - 268 of D11853_PEA_1_P10, and a second amino acid sequence being at least 90% homologous to AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 312 - 356 of Q9UJZ1, which also corresponds to amino acids 269 - 313 of D11853_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HA, having a structure as follows: a sequence starting from any of amino acid numbers 268-x to 268; and ending at any of amino acid numbers 269 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
Figure imgf000788_0001
Variant protein D11853_PEA_1_P10 is encoded by the following transcript(s): D11853_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T14 is shown in bold; this coding portion starts at position 108 and ends at position 1046. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf000789_0001
Figure imgf000790_0001
Variant protein D11853_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T15. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P11 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P11, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P11, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKR corresponding to amino acids 13 - 187 of Q9P042, which also corresponds to amino acids 27 - 201 of D11853_PEA_1_P11, a bridging amino acid A corresponding to amino acid 202 of D11853_PEA_1_P11, a third amino acid sequence being at least 90% homologous to TVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQA corresponding to amino acids 189 - 226 of Q9P042, which also corresponds to amino acids 203 - 240 of D11853_PEA_1_P11, a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P11, a fifth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 227 - 254 of Q9P042, which also corresponds to amino acids 282 - 309 of D11853_PEA_1_P11, and a sixth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 310 - 331 of D11853_PEA_1_P11, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence, fourth amino acid sequence, fifth amino acid sequence and sixth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P11. 3. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P11, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P11. 4. An isolated polypeptide encoding for a tail of D11853_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P11.
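The graded thresholds used throughout these comparison reports (70%, 80%, 85%, 90%, 95%) can be illustrated with a small calculation. The sketch below uses simple ungapped percent identity between two equal-length sequences as a stand-in; this is a simplifying assumption, since homology in practice is normally assessed with alignment software, and the candidate sequence and all function names here are hypothetical.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # tiers named in the text


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences.

    This is a simplification (an assumption for illustration); it does
    not handle gaps or conservative substitutions.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_tier_met(pct):
    """Return the highest threshold that pct reaches, or None."""
    met = [t for t in THRESHOLDS if pct >= t]
    return max(met) if met else None


# Head sequence as given in the text (amino acids 1 - 26).
HEAD = "MLARAARGTGALLLRGSLLASGRAPR"
# Hypothetical candidate with two substitutions (positions 23 and 26).
CANDIDATE = "MLARAARGTGALLLRGSLLASGKAPK"

if __name__ == "__main__":
    pct = percent_identity(HEAD, CANDIDATE)
    print(round(pct, 1), highest_tier_met(pct))
```

With 24 of 26 positions matching, the hypothetical candidate clears the 90% tier but not the 95% tier.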
Comparison report between D11853_PEA_1_P11 and BAC85377: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P11, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P11, a second amino acid sequence being at least 90% homologous to
MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADC WGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI LASEAEKAEQTNQA corresponding to amino acids 1 - 131 of BAC85377, which also corresponds to amino acids 110 - 240 of D11853_PEA_1_P11, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P11, and a fourth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQHVRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 132 - 181 of BAC85377, which also corresponds to amino acids 282 - 331 of D11853_PEA_1_P11, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI of
D11853_PEA_1_P11. 3. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P11, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P11.
Comparison report between D11853_PEA_1_P11 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P11, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P11, a second amino acid sequence being at least 90% homologous to
AQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKES MQMQVEAERRKRATVLESEGTRESATNVAEGKKQAQILASEAEKAEQTNQA corresponding to amino acids 130 - 240 of Q96FY2, which also corresponds to amino acids 130 - 240 of D11853_PEA_1_P11, a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P11, a fourth amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 241 - 268 of Q96FY2, which also corresponds to amino acids 282 - 309 of D11853_PEA_1_P11, and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 310 - 331 of D11853_PEA_1_P11, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P11, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for
AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P11. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P11.
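The "contiguous and in a sequential order" condition in the comparison of D11853_PEA_1_P11 with Q96FY2 can be checked mechanically from the coordinates stated in the text. The following is a minimal sketch of such a check; the data structure, labels and function names are illustrative assumptions, and only the coordinate pairs come from the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    label: str
    variant: Tuple[int, int]          # 1-based inclusive range on the variant
    known: Optional[Tuple[int, int]]  # matching range on Q96FY2, or None


# Coordinates as stated for D11853_PEA_1_P11 versus Q96FY2.
SEGMENTS = [
    Segment("first", (1, 128), (1, 128)),
    Segment("bridging L", (129, 129), None),
    Segment("second", (130, 240), (130, 240)),
    Segment("third (unique)", (241, 281), None),
    Segment("fourth", (282, 309), (241, 268)),
    Segment("fifth (unique tail)", (310, 331), None),
]


def span(r: Tuple[int, int]) -> int:
    """Number of residues in an inclusive range."""
    return r[1] - r[0] + 1


def check_tiling(segments: List[Segment], protein_length: int) -> List[str]:
    """Return a list of problems; an empty list means the segments tile
    the variant from residue 1 to protein_length without gaps or overlaps
    and every mapped segment has equal length on both proteins."""
    problems = []
    expected_start = 1
    for seg in segments:
        start, end = seg.variant
        if start != expected_start:
            problems.append(f"{seg.label}: starts at {start}, expected {expected_start}")
        if seg.known is not None and span(seg.known) != span(seg.variant):
            problems.append(f"{seg.label}: length mismatch between proteins")
        expected_start = end + 1
    if expected_start - 1 != protein_length:
        problems.append(f"segments end at {expected_start - 1}, not {protein_length}")
    return problems


if __name__ == "__main__":
    print(check_tiling(SEGMENTS, 331) or "segments are contiguous")
```

The same check can be applied to any of the other comparison reports by substituting the coordinate pairs given there.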
Comparison report between D11853_PEA_1_P11 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIK DIHVPPRVKESMQMQVEAERRKRATVLESEGTRESATNVAEGKKQAQILASEAEKAEQI NQA corresponding to amino acids 1 - 240 of Q9UJZ1, which also corresponds to amino acids 1 - 240 of D11853_PEA_1_P11, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL corresponding to amino acids 241 - 281 of D11853_PEA_1_P11, a third amino acid sequence being at least 90% homologous to AGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 241 - 268 of Q9UJZ1, which also corresponds to amino acids 282 - 309 of D11853_PEA_1_P11, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGPWVGMGTGIDSGRGSLIYA corresponding to amino acids 310 - 331 of D11853_PEA_1_P11, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of D11853_PEA_1_P11, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for AGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQASSVPSL, corresponding to D11853_PEA_1_P11. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGPWVGMGTGIDSGRGSLIYA in D11853_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Amino acid mutations
Figure imgf000795_0001
Figure imgf000796_0001
Variant protein D11853_PEA_1_P11 is encoded by the following transcript(s): D11853_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T15 is shown in bold; this coding portion starts at position 108 and ends at position 1100. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
Figure imgf000796_0002
Figure imgf000797_0001
Figure imgf000798_0001
Variant protein D11853_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T16. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P12 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P12, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P12, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVTNVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFR corresponding to amino acids 13 - 134 of Q9P042, which also corresponds to amino acids 27 - 148 of D11853_PEA_1_P12, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP corresponding to amino acids 149 - 183 of D11853_PEA_1_P12, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P12. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP in D11853_PEA_1_P12.
Comparison report between D11853_PEA_1_P12 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P12, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P12, a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLSLDKVFR corresponding to amino acids 130 - 148 of Q96FY2, which also corresponds to amino acids 130 - 148 of D11853_PEA_1_P12, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP corresponding to amino acids 149 - 183 of D11853_PEA_1_P12, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP in D11853_PEA_1_P12.
Comparison report between D11853_PEA_1_P12 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI
LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFR corresponding to amino acids 1 - 148 of Q9UJZ1, which also corresponds to amino acids 1 - 148 of D11853_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP corresponding to amino acids 149 - 183 of D11853_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSRSEPELGFEDTNLTLLIFSEGQDQSQALLSVGP in D11853_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P12 is encoded by the following transcript(s): D11853_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T16 is shown in bold; this coding portion starts at position 108 and ends at position 656. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf000800_0001
Figure imgf000801_0001
Figure imgf000802_0001
Variant protein D11853_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T19. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P14 and Q9P042: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P14, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P14, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVTNVP EQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDK VFRERESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQV corresponding to amino acids 13 - 180 of Q9P042, which also corresponds to amino acids 27 - 194 of D11853_PEA_1_P14, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG corresponding to amino acids 195 - 220 of D11853_PEA_1_P14, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of D11853_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P14. 3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG in D11853_PEA_1_P14.
Comparison report between D11853_PEA_1_P14 and Q96FY2: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P14, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P14, a second amino acid sequence being at least 90% homologous to
AQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIKDIHVPPRVKES MQMQV corresponding to amino acids 130 - 194 of Q96FY2, which also corresponds to amino acids 130 - 194 of D11853_PEA_1_P14, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG corresponding to amino acids 195 - 220 of D11853_PEA_1_P14, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG in D11853_PEA_1_P14.
Comparison report between D11853_PEA_1_P14 and Q9UJZ1: 1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRI LEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGV EDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVDATNQAADCWGIRCLRYEIK DIHVPPRVKESMQMQV corresponding to amino acids 1 - 194 of Q9UJZ1, which also corresponds to amino acids 1 - 194 of D11853_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG corresponding to amino acids 195 - 220 of D11853_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GAKEGWEKGLRAPVPGGSRLPSCYDG in D11853_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Amino acid mutations
Figure imgf000805_0001
Variant protein D11853_PEA_1_P14 is encoded by the following transcript(s): D11853_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T19 is shown in bold; this coding portion starts at position 108 and ends at position 767. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Figure imgf000805_0002
Figure imgf000806_0001
Figure imgf000807_0001
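The coding-portion coordinates stated above for transcripts T14, T15, T16 and T19 can be cross-checked against the lengths of the variant proteins they encode (the last amino acid number given in each comparison report). The sketch below assumes positions are 1-based and inclusive and that the stated span does not include a stop codon; both are assumptions, although the arithmetic is consistent with them. The dictionary layout and function name are illustrative.

```python
# transcript: (coding start, coding end, protein length from the comparison reports)
CODING_PORTIONS = {
    "D11853_PEA_1_T14": (108, 1046, 313),  # encodes D11853_PEA_1_P10
    "D11853_PEA_1_T15": (108, 1100, 331),  # encodes D11853_PEA_1_P11
    "D11853_PEA_1_T16": (108, 656, 183),   # encodes D11853_PEA_1_P12
    "D11853_PEA_1_T19": (108, 767, 220),   # encodes D11853_PEA_1_P14
}


def codon_count(start, end):
    """Number of whole codons in a 1-based inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    for name, (start, end, protein_len) in CODING_PORTIONS.items():
        codons = codon_count(start, end)
        status = "consistent" if codons == protein_len else "MISMATCH"
        print(name, codons, protein_len, status)
```

A mismatch in such a check would point to a transcription error in either the coordinates or the amino acid numbering.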
Variant protein D11853_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T24. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P16 and Q9P042:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P16, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR corresponding to amino acids 13 - 134 of Q9P042, which also corresponds to amino acids 27 - 148 of D11853_PEA_1_P16, a third amino acid sequence being at least 90% homologous to VEAERRKR corresponding to amino acids 180 - 187 of Q9P042, which also corresponds to amino acids 149 - 156 of D11853_PEA_1_P16, a bridging amino acid A corresponding to amino acid 157 of D11853_PEA_1_P16, and a fourth amino acid sequence being at least 90% homologous to TVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 189 - 342 of Q9P042, which also corresponds to amino acids 158 - 311 of D11853_PEA_1_P16, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, bridging amino acid and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P16.
3. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149 + ((n-2) - x), in which x varies from 0 to n-2.
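The edge-portion rule above can be read as a sliding window of length n that always contains both junction residues (148 and 149 for D11853_PEA_1_P16). The following is a minimal sketch of that reading, not part of the application itself; the function name `edge_portion_windows` and the clipping of windows to the protein boundaries (1 to 311) are assumptions added for illustration.

```python
def edge_portion_windows(n, left=148, right=149, protein_length=311):
    """List (start, end) positions, 1-based inclusive, of every length-n
    window that spans the junction between residues `left` and `right`.

    Follows the stated rule: start = left - x, end = right + ((n - 2) - x),
    with x running from 0 to n - 2. Windows that would fall outside the
    protein are skipped (an assumption, not stated in the text).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + ((n - 2) - x)
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_portion_windows(10):
        print(start, end, end - start + 1)
```

Each window has length exactly n, because end - start + 1 = (149 + n - 2 - x) - (148 - x) + 1 = n.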
Comparison report between D11853_PEA_1_P16 and BAC85377:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI corresponding to amino acids 1 - 109 of D11853_PEA_1_P16, a second amino acid sequence being at least 90% homologous to MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR corresponding to amino acids 1 - 39 of BAC85377, which also corresponds to amino acids 110 - 148 of D11853_PEA_1_P16, a third amino acid sequence being at least 90% homologous to VEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQH corresponding to amino acids 85 - 159 of BAC85377, which also corresponds to amino acids 149 - 223 of D11853_PEA_1_P16, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 224 - 311 of D11853_PEA_1_P16, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRI of D11853_PEA_1_P16.
3. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149 + ((n-2) - x), in which x varies from 0 to n-2.
4. An isolated polypeptide encoding for a tail of D11853_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS in D11853_PEA_1_P16.
Comparison report between D11853_PEA_1_P16 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P16, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P16, a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLSLDKVFR corresponding to amino acids 130 - 148 of Q96FY2, which also corresponds to amino acids 130 - 148 of D11853_PEA_1_P16, and a third amino acid sequence being at least 90% homologous to VEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 194 - 356 of Q96FY2, which also corresponds to amino acids 149 - 311 of D11853_PEA_1_P16, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between D11853_PEA_1_P16 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR corresponding to amino acids 1 - 148 of Q9UJZ1, which also corresponds to amino acids 1 - 148 of D11853_PEA_1_P16, and a second amino acid sequence being at least 90% homologous to VEAERRKRATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 194 - 356 of Q9UJZ1, which also corresponds to amino acids 149 - 311 of D11853_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P16, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RV, having a structure as follows: a sequence starting from any of amino acid numbers 148-x to 148; and ending at any of amino acid numbers 149 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Amino acid mutations
Figure imgf000811_0001
Figure imgf000812_0001
Variant protein D11853_PEA_1_P16 is encoded by the following transcript(s): D11853_PEA_1_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T24 is shown in bold; this coding portion starts at position 108 and ends at position 1040. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
Figure imgf000812_0002
Figure imgf000813_0001
Figure imgf000814_0001
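The coding-portion coordinates stated for transcript D11853_PEA_1_T24 (positions 108 to 1040) can be checked against the 311-residue length given for D11853_PEA_1_P16 in the comparison reports. The sketch below assumes 1-based inclusive coordinates; the helper name `coding_length` is hypothetical. That the stated end position excludes a stop codon is an inference from the arithmetic, not something the text says.

```python
def coding_length(start, end):
    """Return (nucleotides, whole codons, leftover nucleotides) for a coding
    portion given as 1-based inclusive transcript positions."""
    if end < start:
        raise ValueError("end must not precede start")
    nucleotides = end - start + 1
    codons, remainder = divmod(nucleotides, 3)
    return nucleotides, codons, remainder


if __name__ == "__main__":
    # Coordinates as stated for D11853_PEA_1_T24.
    nt, codons, remainder = coding_length(108, 1040)
    print(nt, codons, remainder)
```

For 108 to 1040 this gives 933 nucleotides, which is 311 whole codons with no remainder, matching amino acids 1 - 311 of the variant.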
Variant protein D11853_PEA_1_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T26. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P18 and Q9P042:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P18, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P18, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASI corresponding to amino acids 13 - 143 of Q9P042, which also corresponds to amino acids 27 - 157 of D11853_PEA_1_P18, and a third amino acid sequence being at least 90% homologous to VAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 295 - 342 of Q9P042, which also corresponds to amino acids 158 - 205 of D11853_PEA_1_P18, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P18, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P18.
3. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between D11853_PEA_1_P18 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P18, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P18, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P18, a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLSLDKVFRERESLNASI corresponding to amino acids 130 - 157 of Q96FY2, which also corresponds to amino acids 130 - 157 of D11853_PEA_1_P18, and a third amino acid sequence being at least 90% homologous to VAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 309 - 356 of Q96FY2, which also corresponds to amino acids 158 - 205 of D11853_PEA_1_P18, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158 + ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between D11853_PEA_1_P18 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P18, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASI corresponding to amino acids 1 - 157 of Q9UJZ1, which also corresponds to amino acids 1 - 157 of D11853_PEA_1_P18, and a second amino acid sequence being at least 90% homologous to VAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS corresponding to amino acids 309 - 356 of Q9UJZ1, which also corresponds to amino acids 158 - 205 of D11853_PEA_1_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11853_PEA_1_P18, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise IV, having a structure as follows: a sequence starting from any of amino acid numbers 157-x to 157; and ending at any of amino acid numbers 158 + ((n-2) - x), in which x varies from 0 to n-2.
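The comparison reports for the P18 variant describe it as a chain of segments, each mapped to a range of the known protein (or unique to the variant, such as a head or a bridging amino acid), that are "contiguous and in a sequential order". A minimal sketch of a consistency check on those stated ranges follows; the `Segment` type and `check_segments` function are hypothetical names, and the ranges are copied from the three P18 reports above.

```python
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    known: Optional[Tuple[int, int]]  # range in the known protein, None if variant-only
    variant: Tuple[int, int]          # range in the variant, 1-based inclusive


def check_segments(segments, variant_length):
    """True if the variant ranges tile 1..variant_length without gaps or
    overlaps and every mapped segment has equal length on both sides."""
    expected_start = 1
    for seg in segments:
        v_start, v_end = seg.variant
        if v_start != expected_start or v_end < v_start:
            return False
        if seg.known is not None:
            k_start, k_end = seg.known
            if k_end - k_start != v_end - v_start:
                return False
        expected_start = v_end + 1
    return expected_start - 1 == variant_length


# Ranges as stated in the P18 comparison reports.
P18_VS_Q9P042 = [Segment(None, (1, 26)),
                 Segment((13, 143), (27, 157)),
                 Segment((295, 342), (158, 205))]
P18_VS_Q96FY2 = [Segment((1, 128), (1, 128)),
                 Segment(None, (129, 129)),  # bridging amino acid L
                 Segment((130, 157), (130, 157)),
                 Segment((309, 356), (158, 205))]
P18_VS_Q9UJZ1 = [Segment((1, 157), (1, 157)),
                 Segment((309, 356), (158, 205))]

if __name__ == "__main__":
    for name, segs in [("Q9P042", P18_VS_Q9P042),
                       ("Q96FY2", P18_VS_Q96FY2),
                       ("Q9UJZ1", P18_VS_Q9UJZ1)]:
        print(name, check_segments(segs, 205))
```

The same check can be applied to the other variants by substituting their stated ranges and lengths.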
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 23 - Amino acid mutations
Figure imgf000817_0001
Variant protein D11853_PEA_1_P18 is encoded by the following transcript(s): D11853_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T26 is shown in bold; this coding portion starts at position 108 and ends at position 722. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 24 - Nucleic acid SNPs
Figure imgf000817_0002
Figure imgf000818_0001
Variant protein D11853_PEA_1_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T27. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P19 and Q9P042:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P19, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P19, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLS corresponding to amino acids 13 - 128 of Q9P042, which also corresponds to amino acids 27 - 142 of D11853_PEA_1_P19, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRLTLQWEQQRCPGYRCKS corresponding to amino acids 143 - 161 of D11853_PEA_1_P19, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P19.
3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRLTLQWEQQRCPGYRCKS in D11853_PEA_1_P19.
Comparison report between D11853_PEA_1_P19 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P19, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQ corresponding to amino acids 1 - 128 of Q96FY2, which also corresponds to amino acids 1 - 128 of D11853_PEA_1_P19, a bridging amino acid L corresponding to amino acid 129 of D11853_PEA_1_P19, a second amino acid sequence being at least 90% homologous to AQTTMRSELGKLS corresponding to amino acids 130 - 142 of Q96FY2, which also corresponds to amino acids 130 - 142 of D11853_PEA_1_P19, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRLTLQWEQQRCPGYRCKS corresponding to amino acids 143 - 161 of D11853_PEA_1_P19, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRLTLQWEQQRCPGYRCKS in D11853_PEA_1_P19.
Comparison report between D11853_PEA_1_P19 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P19, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLS corresponding to amino acids 1 - 142 of Q9UJZ1, which also corresponds to amino acids 1 - 142 of D11853_PEA_1_P19, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRLTLQWEQQRCPGYRCKS corresponding to amino acids 143 - 161 of D11853_PEA_1_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRLTLQWEQQRCPGYRCKS in D11853_PEA_1_P19.
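The tail items above recite tiered thresholds (at least 70%, 80%, 85%, 90%, 95% homologous). How homology is computed is not stated in this part of the application, so the sketch below substitutes plain percent identity over two equal-length, ungapped sequences as a stand-in; that substitution, the function names, and the one-residue-changed candidate are all assumptions made for illustration.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # tiers recited in the text


def percent_identity(a, b):
    """Percent of positions with identical residues in two equal-length,
    ungapped sequences (a simplification, not a full alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def thresholds_met(a, b):
    """Return the recited tiers that the pair reaches under this measure."""
    pid = percent_identity(a, b)
    return [t for t in THRESHOLDS if pid >= t]


if __name__ == "__main__":
    tail = "SRLTLQWEQQRCPGYRCKS"       # P19 tail as given in the text
    candidate = "SRLTLQWEQQRCPGYRCKA"  # hypothetical: last residue changed
    print(round(percent_identity(tail, candidate), 1))
    print(thresholds_met(tail, candidate))
```

With a 19-residue tail, a single substitution leaves 18 of 19 positions identical, which clears the 90% tier but not the 95% tier under this simplified measure.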
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Amino acid mutations
Figure imgf000821_0001
Variant protein D11853_PEA_1_P19 is encoded by the following transcript(s): D11853_PEA_1_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T27 is shown in bold; this coding portion starts at position 108 and ends at position 590. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 26 - Nucleic acid SNPs
Figure imgf000821_0002
Figure imgf000822_0001
Variant protein D11853_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T7, D11853_PEA_1_T17 and D11853_PEA_1_T25. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
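The localization reasoning above amounts to a simple rule: the protein is called secreted when at least one of the two signal-peptide predictors (HMM, NN) reports a signal peptide. A minimal sketch of that rule follows; the function name and the "undetermined" fallback for the case where neither predictor reports a signal peptide are assumptions, since the text only describes the positive case.

```python
def predicted_localization(hmm_signal_peptide, nn_signal_peptide):
    """Apply the rule described in the text: one positive signal-peptide
    prediction out of two is taken as support for secretion."""
    if hmm_signal_peptide or nn_signal_peptide:
        return "secreted"
    # Fallback label is an assumption; the text does not cover this case.
    return "undetermined"


if __name__ == "__main__":
    # Result pattern reported in the text: HMM: Signal peptide, NN: NO.
    print(predicted_localization(hmm_signal_peptide=True, nn_signal_peptide=False))
```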
Variant protein D11853_PEA_1_P20 is encoded by the following transcript(s): D11853_PEA_1_T7, D11853_PEA_1_T17 and D11853_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T7 is shown in bold; this coding portion starts at position 108 and ends at position 287. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Nucleic acid SNPs
Figure imgf000822_0002
Figure imgf000823_0001
Figure imgf000824_0001
The coding portion of transcript D11853_PEA_1_T17 is shown in bold; this coding portion starts at position 108 and ends at position 287. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 28 - Nucleic acid SNPs
Figure imgf000824_0002
Figure imgf000825_0001
The coding portion of transcript D11853_PEA_1_T25 is shown in bold; this coding portion starts at position 108 and ends at position 287. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 29 - Nucleic acid SNPs
Figure imgf000826_0001
Figure imgf000827_0001
Variant protein D11853_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T8. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P21 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP corresponding to amino acids 1 - 61 of Q96FY2, which also corresponds to amino acids 1 - 61 of D11853_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRNLFCPPWASQMTNPSRHAMSGGLPLGLPALLAPDSVGQT corresponding to amino acids 62 - 102 of D11853_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRNLFCPPWASQMTNPSRHAMSGGLPLGLPALLAPDSVGQT in D11853_PEA_1_P21.
Comparison report between D11853_PEA_1_P21 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP corresponding to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of D11853_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRNLFCPPWASQMTNPSRHAMSGGLPLGLPALLAPDSVGQT corresponding to amino acids 62 - 102 of D11853_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRNLFCPPWASQMTNPSRHAMSGGLPLGLPALLAPDSVGQT in D11853_PEA_1_P21.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 30 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 30 - Amino acid mutations
Figure imgf000829_0001
Variant protein D11853_PEA_1_P21 is encoded by the following transcript(s): D11853_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T8 is shown in bold; this coding portion starts at position 108 and ends at position 413. The transcript also has the following SNPs as listed in Table 31 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 31 - Nucleic acid SNPs
Figure imgf000830_0001
Figure imgf000831_0001
Variant protein D11853_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T9. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P22 and Q9P042:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P22, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P22, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP corresponding to amino acids 13 - 47 of Q9P042, which also corresponds to amino acids 27 - 61 of D11853_PEA_1_P22, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELLLFWACSMC corresponding to amino acids 62 - 72 of D11853_PEA_1_P22, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P22.
3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELLLFWACSMC in D11853_PEA_1_P22.
Comparison report between D11853_PEA_1_P22 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP corresponding to amino acids 1 - 61 of Q96FY2, which also corresponds to amino acids 1 - 61 of D11853_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELLLFWACSMC corresponding to amino acids 62 - 72 of D11853_PEA_1_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELLLFWACSMC in D11853_PEA_1_P22.
Comparison report between D11853_PEA_1_P22 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP corresponding to amino acids 1 - 61 of Q9UJZ1, which also corresponds to amino acids 1 - 61 of D11853_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELLLFWACSMC corresponding to amino acids 62 - 72 of D11853_PEA_1_P22, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELLLFWACSMC in D11853_PEA_1_P22.
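The coordinates stated in the comparison report with Q9P042 can be checked by concatenating the three portions in order. The following is a minimal sketch, not part of the application: `segment_coordinates` is a hypothetical helper name, and the shared portion is spelled as residues 27 - 61 of the 61-residue sequence given in the other two comparison reports.

```python
# Portions of D11853_PEA_1_P22 as named in the comparison report with Q9P042.
HEAD = "MLARAARGTGALLLRGSLLASGRAPR"              # amino acids 1 - 26 of the variant
SHARED = "RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEP"   # 27 - 61 of the variant (13 - 47 of Q9P042)
TAIL = "ELLLFWACSMC"                             # amino acids 62 - 72 of the variant


def segment_coordinates(segments):
    """Return 1-based inclusive (start, end) pairs for contiguous segments.

    Hypothetical helper: each segment starts right after the previous one,
    which is what "contiguous and in a sequential order" implies.
    """
    coords = []
    start = 1
    for seq in segments:
        end = start + len(seq) - 1
        coords.append((start, end))
        start = end + 1
    return coords


variant = HEAD + SHARED + TAIL
coords = segment_coordinates([HEAD, SHARED, TAIL])
print(coords, len(variant))
```

Under these assumptions the computed ranges agree with the ranges stated in the report, and the assembled length agrees with the last residue number given for the tail.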
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein D11853_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 32 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 32 - Amino acid mutations
Figure imgf000833_0001
Variant protein D11853_PEA_1_P22 is encoded by the following transcript(s): D11853_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T9 is shown in bold; this coding portion starts at position 108 and ends at position 323. The transcript also has the following SNPs as listed in Table 33 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 33 - Nucleic acid SNPs
Figure imgf000834_0001
Figure imgf000835_0001
Variant protein D11853_PEA_1_P24 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11853_PEA_1_T21. An alignment is given to the known protein (Membrane associated protein SLP-2) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between D11853_PEA_1_P24 and Q9P042:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P24, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLARAARGTGALLLRGSLLASGRAPR corresponding to amino acids 1 - 26 of D11853_PEA_1_P24, a second amino acid sequence being at least 90% homologous to RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL corresponding to amino acids 13 - 80 of Q9P042, which also corresponds to amino acids 27 - 94 of D11853_PEA_1_P24, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTGVPECQHCGCHQPSC corresponding to amino acids 95 - 111 of D11853_PEA_1_P24, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of D11853_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLARAARGTGALLLRGSLLASGRAPR of D11853_PEA_1_P24.
3. An isolated polypeptide encoding for a tail of D11853_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTGVPECQHCGCHQPSC in D11853_PEA_1_P24.
Comparison report between D11853_PEA_1_P24 and Q96FY2:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL corresponding to amino acids 1 - 94 of Q96FY2, which also corresponds to amino acids 1 - 94 of D11853_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTGVPECQHCGCHQPSC corresponding to amino acids 95 - 111 of D11853_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTGVPECQHCGCHQPSC in D11853_PEA_1_P24.
Comparison report between D11853_PEA_1_P24 and Q9UJZ1:
1. An isolated chimeric polypeptide encoding for D11853_PEA_1_P24, comprising a first amino acid sequence being at least 90% homologous to MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL corresponding to amino acids 1 - 94 of Q9UJZ1, which also corresponds to amino acids 1 - 94 of D11853_PEA_1_P24, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTGVPECQHCGCHQPSC corresponding to amino acids 95 - 111 of D11853_PEA_1_P24, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11853_PEA_1_P24, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTGVPECQHCGCHQPSC in D11853_PEA_1_P24.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
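The localization call above rests on two signal-peptide predictions, of which one is positive. A minimal sketch of such a decision rule is given below; it is illustrative only. The function name `predicted_localization` and the "undetermined" fallback are assumptions, since the text states only the outcome when one of the two programs predicts a signal peptide.

```python
def predicted_localization(hmm_predicts_signal_peptide, nn_predicts_signal_peptide):
    """Return a localization call from two signal-peptide predictions.

    Assumption: a single positive prediction is enough for a "secreted" call,
    as in the case described in the text (HMM: Signal peptide, NN: NO).
    The "undetermined" fallback is hypothetical and not taken from the source.
    """
    calls = (hmm_predicts_signal_peptide, nn_predicts_signal_peptide)
    return "secreted" if any(calls) else "undetermined"


# Case described for the variant: HMM predicts a signal peptide, NN does not.
call = predicted_localization(True, False)
print(call)
```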
Variant protein D11853_PEA_1_P24 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 34 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 34 - Amino acid mutations
Figure imgf000838_0001
Variant protein D11853_PEA_1_P24 is encoded by the following transcript(s): D11853_PEA_1_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11853_PEA_1_T21 is shown in bold; this coding portion starts at position 108 and ends at position 440. The transcript also has the following SNPs as listed in Table 35 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11853_PEA_1_P24 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 35 - Nucleic acid SNPs
Figure imgf000838_0002
Figure imgf000839_0001
Figure imgf000840_0001
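The coding-portion coordinates given for the transcripts above can be converted into a length in nucleotides and a codon count. The sketch below is illustrative; `coding_portion` is a hypothetical helper, and positions are assumed to be 1-based and inclusive. The codon counts for T9 and T21 agree with the last residue numbers given for P22 (72) and P24 (111) in the comparison reports, which suggests that the stated end position does not include a stop codon, although the text does not say so explicitly.

```python
def coding_portion(start, end):
    """Return (length in nucleotides, number of codons) for a coding portion.

    Assumes 1-based inclusive positions, as the text appears to use.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return length_nt, length_nt // 3


# Start and end positions as stated in the text for each transcript.
CODING_PORTIONS = {
    "D11853_PEA_1_T8": (108, 413),
    "D11853_PEA_1_T9": (108, 323),
    "D11853_PEA_1_T21": (108, 440),
}

SUMMARY = {name: coding_portion(*pos) for name, pos in CODING_PORTIONS.items()}
for name, (length_nt, codons) in SUMMARY.items():
    print(name, length_nt, codons)
```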
As noted above, cluster D11853 features 31 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided. Segment cluster Dll 853 JPEA J iode according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T7, D11853_PEA_1_T17 and D11853_PEA_1_T25. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000840_0002
Segment cluster D11853_PEA_1_node_6 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T17 and D11853_PEA_1_T25. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000841_0001
Segment cluster Dl l 853 JΕAJ jnode according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T25. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf000841_0002
Segment cluster Dl 1853 JΕAJ jnode J 7 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T16 and D11853_PEA_1_T23. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf000841_0003
Figure imgf000842_0001
Segment cluster Dl 1853 JPEAJ iode l according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T19 and D11853_PEA_1_T23. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf000842_0002
Segment cluster Dl 1853 JPEA J ιode -2 according to the present invention is supported by 287 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24 and D11853_PEA_1_T25. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf000842_0003
Figure imgf000843_0001
Segment cluster D11853_PEA_1_node_23 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T13 and D11853_PEA_1_T15. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf000843_0002
Segment cluster D11853_PEA_1_node_25 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T10, D11853_PEA_1_T15 and D11853_PEA_1_T25. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf000844_0001
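The segment descriptions list, for each segment, the transcripts that contain it. For working with these lists it can help to invert the relation, so that each transcript maps to its segments. The sketch below is illustrative only: the dictionary holds three of the segments described above with shortened transcript labels, `invert` is a hypothetical helper, and the reading of T10 for node_25 is an assumption based on a partly illegible identifier. Positions are not included because the location tables are not reproduced in the text.

```python
from collections import defaultdict

# Segment -> transcripts, copied from three of the segment descriptions above.
# "T10" for node_25 is an assumed reading of a garbled identifier.
SEGMENT_TO_TRANSCRIPTS = {
    "D11853_PEA_1_node_6": ["T7", "T8", "T17", "T25"],
    "D11853_PEA_1_node_23": ["T13", "T15"],
    "D11853_PEA_1_node_25": ["T10", "T15", "T25"],
}


def invert(segment_to_transcripts):
    """Build a transcript -> segments index from a segment -> transcripts map."""
    index = defaultdict(list)
    for segment, transcripts in segment_to_transcripts.items():
        for transcript in transcripts:
            index[transcript].append(segment)
    return dict(index)


TRANSCRIPT_TO_SEGMENTS = invert(SEGMENT_TO_TRANSCRIPTS)
print(TRANSCRIPT_TO_SEGMENTS["T25"])
```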
Segment cluster Dl 1853JPEAJ ιodeJ6 according to the present invention is supported by 290 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24 and D11853_PEA_1_T25. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf000844_0002
Figure imgf000845_0001
Segment cluster Dl 1853 J>EAJ jnode 7 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T3 and D11853_PEA_1_T25. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf000845_0002
Segment cluster Dl l 853 JΕAJ jiode JO according to the present invention is supported by 249 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Figure imgf000845_0003
Figure imgf000846_0001
Segment cluster Dl 1853 JPEA J ιodeJ2 according to the present invention is supported by 215 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Figure imgf000847_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Dll 853 JPEA J_nodeJ3 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Figure imgf000848_0001
Segment cluster D11853JPEAJ_nodeJ according to the present invention is supported by 158 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Figure imgf000849_0001
Segment cluster D11853_PEA_1_node_2 according to the present invention is supported by 247 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Figure imgf000850_0001
Segment cluster Dl l 853 JPEAJ _nodeJ according to the present invention is supported by 258 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Figure imgf000851_0001
Segment cluster Dll 853 JPEA J_nodeJ according to the present invention is supported by 291 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Figure imgf000852_0001
Figure imgf000853_0001
Segment cluster D11853_PEA_l_nodeJ according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T17 and D11853_PEA_1_T25. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Figure imgf000853_0002
Segment cluster D11853_PEA_1_node_8 according to the present invention is supported by 304 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Figure imgf000853_0003
Figure imgf000854_0001
Segment cluster Dl 1853 JΕAJ jnode JO according to the present invention is supported by 237 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Figure imgf000855_0001
Segment cluster Dl 1853JPEAJ jnodeJ2 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Figure imgf000856_0001
Segment cluster D11853JPEAJ_nodeJ3 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Figure imgf000857_0001
Segment cluster D11853_PEA_1_node_14 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Figure imgf000858_0001
Segment cluster D11853JΕAJ_nodeJ5 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25, D11853_PEA_1_T26 and D11853_PEA_1_T27. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Figure imgf000859_0001
Segment cluster D11853_PEA_1_node_16 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25 and D11853_PEA_1_T26. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Figure imgf000860_0001
Segment cluster Dl 1853 JPEAJ ιodeJ8 according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T25 and D11853_PEA_1_T26. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Figure imgf000861_0001
Segment cluster D11853_PEA_1_node_19 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23 and D11853_PEA_1_T25. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Figure imgf000862_0001
Segment cluster D11853_PEA_1_node_20 according to the present invention is supported by 257 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23 and D11853_PEA_1_T25. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Figure imgf000863_0001
Segment cluster Dl 1853 JPEA J ιodeJ4 according to the present invention is supported by 254 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24 and D11853_PEA_1_T25. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Figure imgf000864_0001
Segment cluster Dl l 853 JPEA J ιode -8 according to the present invention can be found in the following transcript(s): D11853_PEA_1_T3, D11853_PEA_1_T25 and D11853_PEA_1_T26. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Figure imgf000864_0002
Figure imgf000865_0001
Segment cluster Dl 1853 J>EA Jjiode 9 according to the present invention is supported by 248 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11853_PEA_1_T1, D11853_PEA_1_T3, D11853_PEA_1_T7, D11853_PEA_1_T8, D11853_PEA_1_T9, D11853_PEA_1_T10, D11853_PEA_1_T13, D11853_PEA_1_T14, D11853_PEA_1_T15, D11853_PEA_1_T16, D11853_PEA_1_T17, D11853_PEA_1_T19, D11853_PEA_1_T21, D11853_PEA_1_T23, D11853_PEA_1_T24, D11853_PEA_1_T25 and D11853_PEA_1_T26. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Figure imgf000865_0002
Figure imgf000866_0001
Variant protein alignment to the previously known protein:
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P1 x Q9P042
Alignment segment 1/1:
Quality: 3115.00 Escore: 0 Matching length: 330 Total length: 330 Matching Percent Similarity: 99.70 Matching Percent Identity: 99.70 Total Percent Similarity: 99.70 Total Percent Identity: 99.70 Gaps: 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62

77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112

127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162

177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212

227 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 276
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 262

277 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 326
263 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 312

327 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 356
313 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 342
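The identity figure in the alignment header can be related to the aligned residues. The sketch below is illustrative and not part of the application: it takes the block covering positions 177 - 226 of the variant and 163 - 212 of Q9P042, with one residue dropped in extraction restored from the aligned partner (an assumption), lists the mismatching positions, and recomputes the overall percentage on the assumption that this is the only mismatch in the 330 aligned positions. The function names are hypothetical.

```python
# Aligned block: variant positions 177 - 226, Q9P042 positions 163 - 212.
QUERY_177_226 = "IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI"
SUBJECT_163_212 = "IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI"


def mismatches(query, subject, query_start, subject_start):
    """List (query position, subject position, query residue, subject residue)
    for every position where two gap-free aligned sequences differ."""
    if len(query) != len(subject):
        raise ValueError("aligned blocks must have equal length")
    return [
        (query_start + i, subject_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]


def percent_identity(matching, length):
    """Percent identity rounded to two decimals, as in the alignment header."""
    return round(100.0 * matching / length, 2)


block_mismatches = mismatches(QUERY_177_226, SUBJECT_163_212, 177, 163)
overall = percent_identity(330 - len(block_mismatches), 330)
print(block_mismatches, overall)
```

With a single mismatch over 330 aligned positions the recomputed value matches the 99.70 reported in the header.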
Sequence name : BAC85377
Sequence documentation : 867
Alignment of: D11853 PEA 1 PI x BAC85377
Alignment segment 1/1: Quality: 1512.00
Escore: 0 Matching length: 159 Total length: 159 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADC GIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADC GIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 150
260 ILAAALTQH 268
151 ILAAALTQH 159
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1__P1 x Q96FY2
Alignment segment 1/1: Quality: 3343.00
Escore: 0 Matching length: 356 Total length: 356 Matching Percent Similarity: 99.72 Matching Percent Identity: 99.72 Total Percent Similarity: 99.72 Total Percent
Identity: 99.72 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
351 DRVKMS 356
351 DRVKMS 356
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P2 x Q9P042
Alignment segment 1/1: Quality: 2691.00
Escore: 0 Matching length: 285 Total length: 285 Matching Percent Similarity: 99.65 Matching Percent Identity: 99.65 Total Percent Similarity: 99.65 Total Percent Identity: 99.65 Gaps : 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
227 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 276
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 262
277 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQ 311
263 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQ 297
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P2 x BAC85377
Alignment segment 1/1:
Quality: 1512.00
Escore: 0 Matching length: 159 Total length: 159 Matching Percent Similarity: 100.00 Matching Percent
Identity: 100.00 Total Percent Similarity: 100.00 Total Percent
Identity: 100.00 Gaps : 0
Alignment :
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 150
260 ILAAALTQH 268
151 ILAAALTQH 159
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P2 x Q96FY2
Alignment segment 1/1:
Quality: 2919.00
Escore: 0 Matching length: 311 Total length: 311 Matching Percent Similarity: 99.68 Matching Percent Identity: 99.68 Total Percent Similarity: 99.68 Total Percent
Identity: 99.68 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
301 NPGDVTSMVAQ 311
301 NPGDVTSMVAQ 311
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P2 x Q9UJZ1
Alignment segment 1/1 :
Quality: 2934.00 Escore: 0 Matching length: 311 Total length: 311 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent
Identity: 100.00 Gaps : 0
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
301 NPGDVTSMVAQ 311
301 NPGDVTSMVAQ 311
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P7 x Q9P042
Alignment segment 1/1:
Quality: 2290.00 Escore: 0 Matching length: 242 Total length: 242 Matching Percent Similarity: 99.59 Matching Percent Identity: 99.59 Total Percent Similarity: 99.59 Total Percent Identity: 99.59 Gaps : 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
227 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQH 268
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQH 254
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P7 x BAC85377
Alignment segment 1/1:
Quality: 1724.00 Escore: 0 Matching length: 181 Total length: 181 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent
Identity: 100.00 Gaps : 0
Alignment:
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 150
260 ILAAALTQHVRGPWVGMGTGIDSGRGSLIYA 290
151 ILAAALTQHVRGPWVGMGTGIDSGRGSLIYA 181
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P7 x Q96FY2
Alignment segment 1/1:
Quality: 2518.00 Escore: 0 Matching length: 268 Total length: 268 Matching Percent Similarity: 99.63 Matching Percent Identity: 99.63 Total Percent Similarity: 99.63 Total Percent
Identity: 99.63 Gaps : 0
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQH 268
251 AKAKAEAIRILAAALTQH 268
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P7 x Q9UJZ1
Alignment segment 1/1: Quality: 2533.00 Escore: 0 Matching length: 268 Total length: 268 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps :
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQH 268
251 AKAKAEAIRILAAALTQH 268
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P9 x Q9P042
Alignment segment 1/1:
Quality: 3015.00 Escore: 0 Matching length: 330 Total length: 371 Matching Percent Similarity: 99.70 Matching Percent Identity: 99.70 Total Percent Similarity: 88.68 Total Percent Identity: 88.68 Gaps : 1
Alignment :
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
227 LASEAEKAEQINQAAGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQAS 276
213 LASEAEKAEQINQA 226
277 SVPSLAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSA 326
227 AGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSA 271
327 FSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGS 376
272 FSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGS 321
377 SRDVQGTDASLDEELDRVKMS 397
322 SRDVQGTDASLDEELDRVKMS 342
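The percentage figures reported for each alignment segment appear to follow a simple relationship: "Matching Percent Identity" is the number of identical residues divided by the matching length, while "Total Percent Identity" divides the same count by the total length, which includes the gapped region. The sketch below illustrates that reading only. The function name `percent_identity` is hypothetical, and the identical-residue counts are inferred from the reported percentages rather than taken from the alignment program's output.

```python
from typing import Tuple


def percent_identity(identical: int, matching_length: int, total_length: int) -> Tuple[float, float]:
    """Return (matching percent identity, total percent identity), rounded to 2 decimals.

    identical        -- assumed number of identical aligned residue pairs
    matching_length  -- aligned columns excluding the gapped region
    total_length     -- aligned columns including the gapped region
    """
    if not 0 <= identical <= matching_length <= total_length:
        raise ValueError("expected 0 <= identical <= matching_length <= total_length")
    matching = round(100.0 * identical / matching_length, 2)
    total = round(100.0 * identical / total_length, 2)
    return matching, total


# D11853_PEA_1_P9 x Q9P042: matching length 330, total length 371,
# with an assumed 329 identical residues (one A/P mismatch).
print(percent_identity(329, 330, 371))
```

Under these assumptions the values agree with the header of the segment above (99.70 and 88.68). The difference between total and matching length, 41 positions, corresponds to the stretch of the variant that has no counterpart in the known protein.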
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P9 x BAC85377
Alignment segment 1/1:
Quality: 1412.00 Escore: 0 Matching length: 159 Total length: 200 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 79.50 Total Percent Identity: 79.50 Gaps : 1
Alignment:
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEGGARHGPLKI 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQA 131
260 GAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIRILAAALTQH 309
132 AGEASAVLAKAKAKAEAIRILAAALTQH 159
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P9 x Q96FY2
Alignment segment 1/1:
Quality: 3243.00 Escore: 0 Matching length: 356 Total length: 397 Matching Percent Similarity: 99.72 Matching Percent Identity: 99.72 Total Percent Similarity: 89.42 Total Percent Identity: 89.42 Gaps : 1
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEG 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQA 240
251 GARHGPLKIGAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIR 300
241 AGEASAVLAKAKAKAEAIR 259
301 ILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMV 350
260 ILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMV 309
351 AQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS 397
310 AQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS 356
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P9 x Q9UJZ1
Alignment segment 1/1:
Quality: 3258.00 Escore: 0 Matching length: 356 Total length: 397 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 89.67 Total Percent Identity: 89.67 Gaps : 1
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEG 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQA 240
251 GARHGPLKIGAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIR 300
241 AGEASAVLAKAKAKAEAIR 259
301 ILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMV 350
260 ILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMV 309
351 AQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS 397
310 AQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEELDRVKMS 356
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P10 x Q9P042
Alignment segment 1/1:
Quality: 2614.00 Escore: 0 Matching length: 287 Total length: 330 Matching Percent Similarity: 99.65 Matching Percent Identity: 99.65 Total Percent Similarity: 86.67 Total Percent Identity: 86.67 Gaps : 1
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
227 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQH 268
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 262
269 AMGVYGALTKAPVPG 283
263 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 312
284 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 313
313 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 342
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P10 x BAC85377
Alignment segment 1/1: Quality: 1515.00
Escore: 0 Matching length: 162 Total length: 162 Matching Percent Similarity: 98.77 Matching Percent Identity: 98.77 Total Percent Similarity: 98.77 Total Percent
Identity: 98.77 Gaps : 0
Alignment:
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 150
260 ILAAALTQHAMG 271
151 ILAAALTQHVRG 162
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P10 x Q96FY2
Alignment segment 1/1: Quality: 2842.00
Escore: 0 Matching length: 313 Total length: 356 Matching Percent Similarity: 99.68 Matching Percent Identity: 99.68 Total Percent Similarity: 87.64 Total Percent
Identity: 87.64 Gaps : 1
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQH 268
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
269 AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 307
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
308 DRVKMS 313
351 DRVKMS 356
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P10 x Q9UJZ1
Alignment segment 1/1: Quality: 2857.00
Escore: 0 Matching length: 313 Total length: 356 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 87.92 Total Percent Identity: 87.92 Gaps : 1
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
251 AKAKAEAIRILAAALTQH 268
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
269 AMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 307
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
308 DRVKMS 313
351 DRVKMS 356
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P11 x Q9P042
Alignment segment 1/1: Quality: 2190.00
Escore: 0 Matching length: 242 Total length: 283 Matching Percent Similarity: 99.59 Matching Percent Identity: 99.59 Total Percent Similarity: 85.16 Total Percent
Identity: 85.16 Gaps : 1
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVEAERRKRATVLESEGTRESAINVAEGKKQAQI 226
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
227 LASEAEKAEQINQAAGQERVEAEGGARHGPLKIGAGAGSLGYFDFMGQAS 276
213 LASEAEKAEQINQA 226
277 SVPSLAGEASAVLAKAKAKAEAIRILAAALTQH 309
227 .AGEASAVLAKAKAKAEAIRILAAALTQH 254
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P11 x BAC85377
Alignment segment 1/1:
Quality: 1624.00 Escore: 0 Matching length: 181 Total length: 222 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 81.53 Total Percent Identity: 81.53 Gaps : 1
Alignment:
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 159
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
160 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 209
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
210 TRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEGGARHGPLKI 259
101 TRESAINVAEGKKQAQILASEAEKAEQINQA 131
260 GAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIRILAAALTQH 309
132 AGEASAVLAKAKAKAEAIRILAAALTQH 159
310 VRGPWVGMGTGIDSGRGSLIYA 331
160 VRGPWVGMGTGIDSGRGSLIYA 181
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P11 x Q96FY2
Alignment segment 1/1:
Quality: 2418.00
Escore: 0 Matching length: 268 Total length: 309 Matching Percent Similarity: 99.63 Matching Percent Identity: 99.63 Total Percent Similarity: 86.41 Total Percent
Identity: 86.41 Gaps: 1
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEG 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQA 240
251 GARHGPLKIGAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIR 300
241 AGEASAVLAKAKAKAEAIR 259
301 ILAAALTQH 309
260 ILAAALTQH 268
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P11 x Q9UJZ1
Alignment segment 1/1:
Quality: 2433.00 Escore: 0 Matching length: 268 Total length: 309 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 86.73 Total Percent
Identity: 86.73 Gaps : 1
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGQERVEAEG 250
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQA 240
251 GARHGPLKIGAGAGSLGYFDFMGQASSVPSLAGEASAVLAKAKAKAEAIR 300
241 AGEASAVLAKAKAKAEAIR 259
301 ILAAALTQH 309
260 ILAAALTQH 268
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P12 x Q9P042
Alignment segment 1/1:
Quality: 1167.00 Escore: 0 Matching length: 122 Total length: 122 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFR 148
113 TQLAQTTMRSELGKLSLDKVFR 134
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P12 x Q96FY2
Alignment segment 1/1: Quality: 1385.00
Escore: 0 Matching length: 148 Total length: 148 Matching Percent Similarity: 99.32 Matching Percent Identity: 99.32 Total Percent Similarity: 99.32 Total Percent
Identity: 99.32 Gaps : 0
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR 148
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFR 148
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P12 x Q9UJZ1
Alignment segment 1/1: Quality: 1400.00 Escore: 0 Matching length: 148 Total length: 148 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR 148
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR 148
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P14 x Q9P042
Alignment segment 1/1:
Quality: 1628.00 Escore: 0 Matching length: 170 Total length: 170 Matching Percent Similarity: 99.41 Matching Percent Identity: 99.41 Total Percent Similarity: 99.41 Total Percent Identity: 99.41 Gaps: 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 176
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
177 IKDIHVPPRVKESMQMQVGA 196
163 IKDIHVPPRVKESMQMQVEA 182
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P1 x Q96FY2
Alignment segment 1/1:
Quality: 1846.00 Escore: 0 Matching length: 196 Total length: 196 Matching Percent Similarity: 98.98 Matching Percent Identity: 98.98 Total Percent Similarity: 98.98 Total Percent Identity: 98.98 Gaps : 0
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVGA 196
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEA 196
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P1 x Q9UJZ1
Alignment segment 1/1:
Quality: 1861.00 Escore: 0 Matching length: 196 Total length: 196 Matching Percent Similarity: 99.49 Matching Percent Identity: 99.49 Total Percent Similarity: 99.49 Total Percent Identity: 99.49 Gaps : 0
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVGA 196
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEA 196
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P16 x Q9P042
Alignment segment 1/1:
Quality: 2564.00 Escore: 0 Matching length: 285 Total length: 330 Matching Percent Similarity: 99.65 Matching Percent Identity: 99.65 Total Percent Similarity: 86.06 Total Percent Identity: 86.06 Gaps : 1
Alignment :
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFR............................ 148
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
149 .................VEAERRKRATVLESEGTRESAINVAEGKKQAQI 181
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
182 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 231
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 262
232 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 281
263 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 312
282 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 311
313 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 342
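The "Matching" and "Total" percentages reported for each alignment segment differ only in their denominator: the matching figures are taken over the aligned (ungapped) columns, the total figures over the full alignment length including gap columns. The following Python sketch illustrates that relationship. It is not the alignment software used to produce the listings; the function name, the use of "." as the gap character, and the count of 284 identical residues (inferred from the reported 99.65%) are assumptions made for illustration.

```python
def alignment_stats(query_row: str, subject_row: str, gap_char: str = ".") -> dict:
    """Summarise a pairwise alignment given as two equal-length rows."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    total_length = len(query_row)
    # Keep only the columns in which neither row carries a gap character.
    matched = [(q, s) for q, s in zip(query_row, subject_row)
               if q != gap_char and s != gap_char]
    matching_length = len(matched)
    if matching_length == 0:
        raise ValueError("no aligned columns")
    identical = sum(1 for q, s in matched if q == s)
    return {
        "matching_length": matching_length,
        "total_length": total_length,
        "matching_percent_identity": round(100 * identical / matching_length, 2),
        "total_percent_identity": round(100 * identical / total_length, 2),
    }


# Toy rows (hypothetical, not taken from the listings): one gap of two
# columns in the query and one mismatch (I versus V).
toy = alignment_stats("ACDEF..GHIK", "ACDEFLMGHVK")

# Reproducing the figures reported above from the lengths alone, assuming
# 284 identical residues over 285 matching columns and 330 total columns.
reported_matching = round(100 * 284 / 285, 2)
reported_total = round(100 * 284 / 330, 2)
```

Under these assumptions the two rounded values agree with the 99.65 and 86.06 reported for this alignment segment.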
Sequence name: BAC85377
Sequence documentation:
Alignment of: D11853_PEA_1_P16 x BAC85377
Alignment segment 1/1:
Quality: 961.00 Escore: 0 Matching length: 114 Total length: 159 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 71.70 Total Percent Identity: 71.70 Gaps : 1
Alignment :
110 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR........... 148
1 MDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRERESLNASIVD 50
149 ..................................VEAERRKRATVLESEG 164
51 AINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRKRATVLESEG 100
165 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 214
101 TRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAKAKAKAEAIR 150
215 ILAAALTQH 223
151 ILAAALTQH 159
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P16 x Q96FY2
Alignment segment 1/1:
Quality: 2792.00 Escore: 0 Matching length: 311 Total length: 356 Matching Percent Similarity: 99.68 Matching Percent Identity: 99.68 Total Percent Similarity: 87.08 Total Percent Identity: 87.08 Gaps : 1
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR.. 148
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
149 ...........................................VEAERRK 155
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
156 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 205
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
206 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 255
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
256 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 305
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
306 DRVKMS 311
351 DRVKMS 356
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P16 x Q9UJZ1
Alignment segment 1/1:
Quality: 2807.00 Escore: 0 Matching length: 311 Total length: 356 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 87.36 Total Percent Identity: 87.36 Gaps : 1
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFR.. 148
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
149 ...........................................VEAERRK 155
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
156 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 205
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
206 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 255
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
256 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 305
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
306 DRVKMS 311
351 DRVKMS 356
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P18 x Q9P042
Alignment segment 1/1: Quality: 1601.00
Escore: 0 Matching length: 179 Total length: 330 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 54.24 Total Percent Identity: 54.24 Gaps : 1
Alignment :
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLSLDKVFRERESLNASI................... 157
113 TQLAQTTMRSELGKLSLDKVFRERESLNASIVDAINQAADCWGIRCLRYE 162
157 .................................................. 157
163 IKDIHVPPRVKESMQMQVEAERRKRPTVLESEGTRESAINVAEGKKQAQI 212
157 .................................................. 157
213 LASEAEKAEQINQAAGEASAVLAKAKAKAEAIRILAAALTQHNGDAAASL 262
158 ................................VAQAMGVYGALTKAPVPG 175
263 TVAEQYVSAFSKLAKDSNTILLPSNPGDVTSMVAQAMGVYGALTKAPVPG 312
176 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 205
313 TPDSLSSGSSRDVQGTDASLDEELDRVKMS 342
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P18 x Q96FY2
Alignment segment 1/1:
Quality: 1819.00 Escore: 0 Matching length: 205 Total length: 356 Matching Percent Similarity: 99.51 Matching Percent Identity: 99.51 Total Percent Similarity: 57.30 Total Percent Identity: 57.30 Gaps : 1
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLSLDKVFRER 150
151 ESLNASI........................................... 157
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
157 .................................................. 157
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
157 .................................................. 157
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
158 ........VAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 199
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
200 DRVKMS 205
351 DRVKMS 356
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P18 x Q9UJZ1
Alignment segment 1/1:
Quality: 1834.00 Escore: 0 Matching length: 205 Total length: 356 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 57.58 Total Percent Identity: 57.58 Gaps :
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLSLDKVFRER 150
151 ESLNASI........................................... 157
151 ESLNASIVDAINQAADCWGIRCLRYEIKDIHVPPRVKESMQMQVEAERRK 200
157 .................................................. 157
201 RATVLESEGTRESAINVAEGKKQAQILASEAEKAEQINQAAGEASAVLAK 250
157 .................................................. 157
251 AKAKAEAIRILAAALTQHNGDAAASLTVAEQYVSAFSKLAKDSNTILLPS 300
158 ........VAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 199
301 NPGDVTSMVAQAMGVYGALTKAPVPGTPDSLSSGSSRDVQGTDASLDEEL 350
200 DRVKMS 205
351 DRVKMS 356
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P19 x Q9P042
Alignment segment 1/1:
Quality: 1110.00 Escore: 0 Matching length: 116 Total length: 116 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 126
63 QSLKEIVINVPEQSAVTLDNVTLQIDGVLYLRIMDPYKASYGVEDPEYAV 112
127 TQLAQTTMRSELGKLS 142
113 TQLAQTTMRSELGKLS 128
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P19 x Q96FY2
Alignment segment 1/1
Quality: 1328.00 Escore: 0 Matching length: 142 Total length: 142 Matching Percent Similarity: 99.30 Matching Percent Identity: 99.30 Total Percent Similarity: 99.30 Total Percent Identity: 99.30 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLS 142
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQPAQTTMRSELGKLS 142
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P19 x Q9UJZ1
Alignment segment 1/1:
Quality: 1343.00 Escore: 0 Matching length: 142 Total length: 142 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTLDNVTLQ 100
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLS 142
101 IDGVLYLRIMDPYKASYGVEDPEYAVTQLAQTTMRSELGKLS 142
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P21 x Q96FY2
Alignment segment 1/1:
Quality: 587.00 Escore: 0 Matching length: 68 Total length: 68 Matching Percent Similarity: 95.59 Matching Percent Identity: 92.65 Total Percent Similarity: 95.59 Total Percent Identity: 92.65 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPVRNLFCP 68
51 RMGRFHRILEPGLNILIP 68
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P21 x Q9UJZ1
Alignment segment 1/1: Quality: 587.00
Escore: 0 Matching length: 68 Total length: 68 Matching Percent Similarity: 95.59 Matching Percent Identity: 92.65 Total Percent Similarity: 95.59 Total Percent Identity: 92.65 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPVRNLFCP 68
51 RMGRFHRILEPGLNILIP 68
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P22 x Q9P042
Alignment segment 1/1:
Quality: 348.00 Escore: 0 Matching length: 37 Total length: 37 Matching Percent Similarity: 97.30 Matching Percent Identity: 97.30 Total Percent Similarity: 97.30 Total Percent Identity: 97.30 Gaps : 0
Alignment :
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPEL 63
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGL 49
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P22 x Q96FY2
Alignment segment 1/1: Quality: 581.00
Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 98.41 Matching Percent Identity: 98.41 Total Percent Similarity: 98.41 Total Percent Identity: 98.41 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPEL 63
51 RMGRFHRILEPGL 63
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P22 x Q9UJZ1
Alignment segment 1/1:
Quality: 581.00
Escore: 0 Matching length: 63 Total length: 63 Matching Percent Similarity: 98.41 Matching Percent Identity: 98.41 Total Percent Similarity: 98.41 Total Percent Identity: 98.41 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPEL 63
51 RMGRFHRILEPGL 63
Sequence name: Q9P042
Sequence documentation:
Alignment of: D11853_PEA_1_P24 x Q9P042
Alignment segment 1/1: Quality: 650.00
Escore: 0 Matching length: 68 Total length: 68 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
27 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 76
13 RASSGLPRNTVVLFVPQQEAWVVERMGRFHRILEPGLNILIPVLDRIRYV 62
77 QSLKEIVINVPEQSAVTL 94
63 QSLKEIVINVPEQSAVTL 80
Sequence name: Q96FY2
Sequence documentation:
Alignment of: D11853_PEA_1_P24 x Q96FY2
Alignment segment 1/1:
Quality: 883.00 Escore: 0 Matching length: 94 Total length: 94 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps:
Alignment:
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL 94
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL 94
Sequence name: Q9UJZ1
Sequence documentation:
Alignment of: D11853_PEA_1_P2 x Q9UJZ1
Alignment segment 1/1:
Quality: 883.00 Escore: 0 Matching length: 94 Total length: 94 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
1 MLARAARGTGALLLRGSLLASGRAPRRASSGLPRNTVVLFVPQQEAWVVE 50
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL 94
51 RMGRFHRILEPGLNILIPVLDRIRYVQSLKEIVINVPEQSAVTL 94
DESCRIPTION FOR CLUSTER R11723
Cluster R11723 features 6 transcript(s) and 26 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000935_0001
Figure imgf000936_0001
Cluster R11723 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
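The "parts per million" figure described above is a ratio scaled by one million. A minimal Python sketch of that ratio follows. The weighting applied to the EST counts is not specified in this passage, so the sketch uses plain counts; the function name and the example counts are hypothetical.

```python
def ests_ppm(cluster_ests: int, category_ests: int) -> float:
    """Expression of a cluster in one tissue category, in parts per million.

    Unweighted simplification: cluster ESTs divided by all ESTs in the
    category, scaled by 1,000,000.
    """
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    if cluster_ests < 0 or cluster_ests > category_ests:
        raise ValueError("cluster count must lie between 0 and the category total")
    return cluster_ests * 1_000_000 / category_ests


# Hypothetical counts: 3 ESTs of the cluster among 60,000 ESTs in a category.
example_ppm = ests_ppm(3, 60_000)
```

With these hypothetical counts the cluster would be reported at 50 parts per million for that category.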
Overall, the following results were obtained as shown with regard to the histograms in Figure 27 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.
Table 4 - Normal tissue distribution
Figure imgf000937_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf000937_0002
Figure imgf000938_0001
As noted above, cluster R11723 features 6 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein R11723_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf000939_0001
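The localization reasoning given for this variant (both signal-peptide prediction programs positive, neither trans-membrane region prediction program positive) can be expressed as a simple decision rule. The Python sketch below illustrates only that rule; the function name, the boolean inputs and the "undetermined" fallback are assumptions, and the sketch does not call SignalP or any other prediction program.

```python
from typing import Sequence


def call_localization(signal_peptide_calls: Sequence[bool],
                      transmembrane_calls: Sequence[bool]) -> str:
    """Apply the 'secreted' rule to per-program prediction results.

    Each sequence holds one boolean per prediction program (True means the
    program predicted the feature). The rule returns "secreted" only when
    every signal-peptide program is positive and no trans-membrane program
    is positive; any other combination is left as "undetermined" here.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one result of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# The situation described for the variant: two positive signal-peptide
# calls and two negative trans-membrane calls.
variant_call = call_localization([True, True], [False, False])
```

How conflicting predictions are resolved is not stated in this passage, which is why the sketch declines to classify them.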
Variant protein R11723_PEA_1_P2 is encoded by the following transcript(s): R11723_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T6 is shown in bold; this coding portion starts at position 1716 and ends at position 2051. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf000939_0002
Variant protein R11723_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T15. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P6 and Q8IXM0 (SEQ ID NO: 1393):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR corresponding to amino acids 1 - 110 of R11723_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1 - 112 of Q8IXM0, which also corresponds to amino acids 111 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR of R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q96AC2 (SEQ ID NO: 1394):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q96AC2, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and Q8N2G4 (SEQ ID NO: 1395):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1 - 83 of Q8N2G4, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
Comparison report between R11723_PEA_1_P6 and BAC85518 (SEQ ID NO: 1396):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24 - 106 of BAC85518, which also corresponds to amino acids 1 - 83 of R11723_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 84 - 222 of R11723_PEA_1_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ in R11723_PEA_1_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R11723_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf000943_0001
Variant protein R11723_PEA_1_P6 is encoded by the following transcript(s): R11723_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T15 is shown in bold; this coding portion starts at position 434 and ends at position 1099. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000944_0001
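The coding-portion coordinates quoted for each transcript can be checked for internal consistency: the span should be a whole number of codons. The Python sketch below assumes the positions are 1-based and inclusive, which the text does not state explicitly; the function names are hypothetical.

```python
def coding_length(start: int, end: int) -> int:
    """Length in nucleotides of a coding portion given 1-based inclusive positions."""
    if start < 1 or end < start:
        raise ValueError("positions must satisfy 1 <= start <= end")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the coding portion; rejects partial codons."""
    length = coding_length(start, end)
    if length % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return length // 3


# Transcript R11723_PEA_1_T15: positions 434 to 1099.
t15_codons = codon_count(434, 1099)
# Transcript R11723_PEA_1_T6: positions 1716 to 2051.
t6_codons = codon_count(1716, 2051)
```

Under this assumption the T15 coding portion spans 666 nucleotides, or 222 codons, which agrees with the 222 amino acids attributed to R11723_PEA_1_P6 in the comparison reports above; whether a stop codon is counted within the quoted positions is not stated.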
Variant protein R11723_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T17. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P7 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q96AC2, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and Q8N2G4:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1 - 64 of Q8N2G4, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85273:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P7, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22 - 80 of BAC85273, which also corresponds to amino acids 6 - 64 of R11723_PEA_1_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P7.
3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
Comparison report between R11723_PEA_1_P7 and BAC85518:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24 - 87 of BAC85518, which also corresponds to amino acids 1 - 64 of R11723_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT corresponding to amino acids 65 - 93 of R11723_PEA_1_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT in R11723_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Figure imgf000947_0001
Variant protein R11723_PEA_1_P7 is encoded by the following transcript(s): R11723_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T17 is shown in bold; this coding portion starts at position 434 and ends at position 712. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Figure imgf000947_0002
Variant protein R11723_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T19 and R11723_PEA_1_T5. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P13 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P13 is encoded by the following transcript(s): R11723_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T19 is shown in bold; this coding portion starts at position 434 and ends at position 685. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Figure imgf000949_0001
Variant protein R11723_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T20. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R11723_PEA_1_P10 and Q96AC2:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and Q8N2G4:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q8N2G4, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and BAC85273 (SEQ ID NO: 1397):
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG corresponding to amino acids 1 - 5 of R11723_PEA_1_P10, a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22 - 79 of BAC85273, which also corresponds to amino acids 6 - 63 of R11723_PEA_1_P10, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG of R11723_PEA_1_P10.
3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
Comparison report between R11723_PEA_1_P10 and BAC85518:
1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24 - 86 of BAC85518, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK corresponding to amino acids 64 - 90 of R11723_PEA_1_P10, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK in R11723_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R11723_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Figure imgf000952_0001
Variant protein R11723_PEA_1_P10 is encoded by the following transcript(s): R11723_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T20 is shown in bold; this coding portion starts at position 434 and ends at position 703. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Figure imgf000952_0002
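The coding-portion coordinates and the amino acid ranges quoted above for the three variants can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the application: it assumes the stated start and end positions are 1-based and inclusive, and it observes (as an inference from the numbers, not a statement in the text) that the coding portion length divided by three equals the stated protein length, which suggests the stop codon is not counted. The dictionary layout and function names are illustrative only.

```python
# Sequences and coordinates are copied from the variant descriptions above.
# Names such as VARIANTS and codons_in_coding_portion are hypothetical helpers.
SHARED_64 = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG"

VARIANTS = {
    "R11723_PEA_1_P7": {
        "transcript": "R11723_PEA_1_T17",
        "coding": (434, 712),
        "shared": SHARED_64,          # amino acids 1 - 64
        "tail": "SHCVTRLECSGTISAHCNLCLPGSNDHPT",  # amino acids 65 - 93
    },
    "R11723_PEA_1_P13": {
        "transcript": "R11723_PEA_1_T19",
        "coding": (434, 685),
        "shared": SHARED_64[:63],     # amino acids 1 - 63
        "tail": "DTKRTNTLLFEMRHFAKQLTT",  # amino acids 64 - 84
    },
    "R11723_PEA_1_P10": {
        "transcript": "R11723_PEA_1_T20",
        "coding": (434, 703),
        "shared": SHARED_64[:63],     # amino acids 1 - 63
        "tail": "DRVSLCHEAGVQWNNFSTLQPLPPRLK",  # amino acids 64 - 90
    },
}


def codons_in_coding_portion(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3


def assembled_length(variant: dict) -> int:
    """Protein length implied by the shared region plus the unique tail."""
    return len(variant["shared"]) + len(variant["tail"])


if __name__ == "__main__":
    for name, v in VARIANTS.items():
        codons = codons_in_coding_portion(*v["coding"])
        print(name, v["transcript"], codons, assembled_length(v))
```

A check of this kind is a quick way to catch transcription errors in coordinates or sequences; it says nothing about the biological validity of the variants.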
As noted above, cluster R11723 features 26 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R11723_PEA_1_node_13 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T19, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Figure imgf000953_0001
Segment cluster R11723_PEA_1_node_16 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T17, R11723_PEA_1_T19 and R11723_PEA_1_T20. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Figure imgf000953_0002
Segment cluster R11723_PEA_1_node_19 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf000954_0001
Segment cluster R11723_PEA_1_node_2 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf000954_0002
Segment cluster R11723_PEA_1_node_22 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf000954_0003
Figure imgf000955_0001
Segment cluster R11723_PEA_1_node_31 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 20 below describes the starting and ending position of this segment on each transcript (it should be noted that these transcripts show alternative polyadenylation). Table 20 - Segment location on transcripts
Figure imgf000955_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster R11723_PEA_1_node_10 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf000955_0003
Figure imgf000956_0001
Segment cluster R11723_PEA_1_node_11 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf000956_0002
Segment cluster R11723_PEA_1_node_15 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T20. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf000956_0003
Segment cluster R11723_PEA_1_node_18 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf000957_0001
Segment cluster R11723_PEA_1_node_20 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf000957_0002
Segment cluster R11723_PEA_1_node_21 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf000957_0003
Segment cluster R11723_PEA_1_node_23 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf000958_0001
Segment cluster R11723_PEA_1_node_24 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf000958_0002
Segment cluster R11723_PEA_1_node_25 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf000958_0003
Figure imgf000959_0001
Segment cluster R11723_PEA_1_node_26 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf000959_0002
Segment cluster R11723_PEA_1_node_27 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf000959_0003
Segment cluster R11723_PEA_1_node_28 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf000960_0001
Segment cluster R11723_PEA_1_node_29 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf000960_0002
Segment cluster R11723_PEA_1_node_3 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf000960_0003
Figure imgf000961_0001
Segment cluster R11723_PEA_1_node_30 according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf000961_0002
Segment cluster R11723_PEA_1_node_4 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf000961_0003
Figure imgf000962_0001
Segment cluster R11723_PEA_1_node_5 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf000962_0002
Segment cluster R11723_PEA_1_node_6 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf000962_0003
Figure imgf000963_0001
Segment cluster R11723_PEA_1_node_7 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15, R11723_PEA_1_T17, R11723_PEA_1_T19, R11723_PEA_1_T20, R11723_PEA_1_T5 and R11723_PEA_1_T6. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf000963_0002
Segment cluster R11723_PEA_1_node_8 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T6. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf000963_0003
It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as "PSEC"). Furthermore, use of the known protein (WT protein) for detection of ovarian cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention. It should be noted that the nucleotide transcript sequence of the known protein (PSEC, also referred to herein as the "wild type" or WT protein) features at least one SNP that appears to affect the coding region, in addition to certain silent SNPs (this SNP does not have an effect on the R11723_PEA_1_T5 splice variant sequence): "G-> " resulting in a missing nucleotide (affects amino acids from position 91 onwards). The missing nucleotide creates a frame shift, resulting in a new protein. This SNP was not previously identified and is supported by 5 ESTs out of ~70 ESTs in this exon.
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 seg13 in normal and cancerous colon tissues.
Expression of transcripts detectable by or according to seg13, R11723 seg13 amplicon (SEQ ID NO: 1297) and R11723 seg13F (SEQ ID NO: 1295) and R11723 seg13R (SEQ ID NO: 1296) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1-amplicon, SEQ ID NO:615), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, above: "Tissue samples in colon cancer testing panel"), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. Figure 28 is a histogram showing differential expression of the above-indicated transcripts in cancerous colon samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from Figure 28, the expression of transcripts detectable by the above amplicon in a few cancer samples was higher by more than 5 fold than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3: "Tissue samples in colon cancer testing panel"). However, the expression of transcripts detectable by the above amplicon in several other cancer samples was lower than in the non-cancerous samples.
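The normalization described above (divide each sample's amplicon quantity by the geometric mean of its housekeeping-gene quantities, then divide by the median of the normal samples) can be sketched as follows. This is an illustrative sketch only: the function names, sample labels and all numeric quantities are hypothetical placeholders and are not data from the application.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_qty: float, housekeeping_qtys: list) -> float:
    """Divide the amplicon quantity by the geometric mean of the housekeeping quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_differential_expression(normalized: dict, normal_ids: list) -> dict:
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[s] for s in normal_ids)
    return {sample: qty / baseline for sample, qty in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities: (amplicon quantity, four housekeeping quantities).
    raw = {
        "normal_A": (4.0, [2.0, 8.0, 4.0, 4.0]),
        "normal_B": (8.0, [4.0, 4.0, 4.0, 4.0]),
        "normal_C": (12.0, [4.0, 4.0, 4.0, 4.0]),
        "tumor_X": (48.0, [4.0, 4.0, 4.0, 4.0]),
    }
    normalized = {s: normalize_to_housekeeping(t, hk) for s, (t, hk) in raw.items()}
    folds = fold_differential_expression(normalized, ["normal_A", "normal_B", "normal_C"])
    for sample, fold in folds.items():
        print(f"{sample}: {fold:.2f}-fold relative to normal median")
```

The geometric mean is used here because the text specifies it; it dampens the influence of a single housekeeping gene with an unusually high or low quantity compared with an arithmetic mean.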
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 seg13F forward primer; and R11723 seg13R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 seg13.
R11723seg13F (SEQ ID NO: 1295) - ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO: 1296) - TCCTCAGAAGGCACATGAAAGA
R11723seg13 amplicon (SEQ ID NO: 1297):
ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA
Expression of RI 1723 tr-anscripts, which are detectable by amplicon as depicted in sequence name RI 1723 juncl 1-18 in normal and cancerous colon tissues. Expression of transcripts detectable by or according to juncl 1-18, R1 1723 juncl 1-18 amplicon (SEQ ID NO: 1300) and RI 1723 juncl 1-18F (SEQ ID NO: 1298) and RI 1723 juncl 1- 18R (SEQ ID NO: 1299) primers was measured by real time PCR. In parallel the expression of four housekeeping genes -PBGD (GenBank Accession No. BC019323; amplicon - PBGD- amplicon, SEQ ID NO:531), HPRTl (GenBank Accession No. NM 00194; amplicon - HPRTl-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1- amplicon, SEQ ID NO:615), and RPS27A (GenBank Accession No. NM 02954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, above: "Tissue samples in colon cancer testing panel"), to obtain a value of fold differential expression for each sample relative to median of the normal PM samples. Figure 29 is a histogram showing differential expression of the above-indicated transcripts in a few cancerous colon samples relative to the normal samples (Sample Nos. 41, 52, 62-67, 69-71 Table 3: "Tissue samples in colon cancer testing panel"). As is evident from Figure 29, the expression of transcripts detectable by the above amplicon in a few cancer samples was higher by more than 5 fold than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71 Table 1: "Tissue samples in colon cancer testing panel"). 
However, the expression of transcripts detectable by the above amplicon in several other cancer samples was lower than in the non-cancerous samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 junc11-18F forward primer; and R11723 junc11-18R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 junc11-18.
R11723 junc11-18F (SEQ ID NO: 1298) - AGTGATGGAGCAAAGTGCCG
R11723 junc11-18R (SEQ ID NO: 1299) - CAGCAGCTGATGCAAACTGAG
R11723 junc11-18 amplicon (SEQ ID NO: 1300) -
AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG
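The normalization described for the real time PCR measurements (division by the geometric mean of the housekeeping genes, then division by the median of the reference samples) can be sketched as follows. This is an illustrative sketch only: the function names, sample labels and quantities are hypothetical and are not taken from the experiments reported here.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_qty, housekeeping_qtys):
    """Divide a sample's target-amplicon quantity by the geometric mean
    of its housekeeping-gene quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_vs_reference(normalized, reference_samples):
    """Divide every normalized quantity by the median of the reference
    samples (for example, the normal post-mortem samples)."""
    reference_median = median(normalized[s] for s in reference_samples)
    return {sample: qty / reference_median for sample, qty in normalized.items()}


# Hypothetical data: sample -> (target quantity, [four housekeeping quantities]).
raw = {
    "normal_1": (1.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_2": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_3": (3.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_1": (24.0, [2.0, 2.0, 2.0, 2.0]),
}
normalized = {s: normalize_to_housekeeping(t, hk) for s, (t, hk) in raw.items()}
fold = fold_vs_reference(normalized, ["normal_1", "normal_2", "normal_3"])
print(fold)
```

Using the geometric mean rather than the arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor.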
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723seg13, in different normal tissues.
Expression of R11723 transcripts detectable by or according to R11723seg13 amplicon (SEQ ID NO: 1297) and R11723seg13F (SEQ ID NO: 1295), R11723seg13R (SEQ ID NO: 1296) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1270) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 30, presenting the histogram showing the expression of R11723 transcripts, detectable by amplicon depicted in sequence name R11723seg13, in different normal tissues.
R11723seg13F (SEQ ID NO: 1295) - ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO: 1296) - TCCTCAGAAGGCACATGAAAGA
R11723seg13 - amplicon (SEQ ID NO: 1297):
ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA
Expression of R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 junc11-18, in different normal tissues.
Expression of R11723 transcripts detectable by or according to R11723 junc11-18 amplicon (SEQ ID NO: 1300) and R11723 junc11-18F (SEQ ID NO: 1298), R11723 junc11-18R (SEQ ID NO: 1299) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), UBC (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1270) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1273) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples to obtain a value of relative expression of each sample relative to the median of the ovary samples. The results are described in Figure 31, presenting the histogram showing the expression of R11723 transcripts, detectable by amplicon depicted in sequence name R11723 junc11-18, in different normal tissues.
R11723 junc11-18F (SEQ ID NO: 1298) - AGTGATGGAGCAAAGTGCCG
R11723 junc11-18R (SEQ ID NO: 1299) - CAGCAGCTGATGCAAACTGAG
R11723 junc11-18 amplicon (SEQ ID NO: 1300) -
AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG
It was found that the known protein (wild type) transcript expression pattern for the above cluster (PSEC) is similar to the variant expression pattern, except that in some cases (such as ovarian cancer) the variant overexpression in cancer was found to be higher.
Variant protein alignment to the previously known protein:
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8IXM0
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8IXM0
Alignment segment 1/1:
Quality: 1128.00 Escore: 0
Matching length: 112 Total length: 112
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
111 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE 50
161 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE 100
211 RQRKEKHSMRTQ 222
    ||||||||||||
101 RQRKEKHSMRTQ 112
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q96AC2
Alignment segment 1/1:
Quality: 835.00 Escore: 0
Matching length: 83 Total length: 83
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x Q8N2G4
Alignment segment 1/1:
Quality: 835.00 Escore: 0
Matching length: 83 Total length: 83
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
Sequence name: /tmp/gp6eQTLWqk/mFtjUpUzhb:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P6 x BAC85518
Alignment segment 1/1:
Quality: 835.00 Escore: 0
Matching length: 83 Total length: 83
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 83
    |||||||||||||||||||||||||||||||||
 74 QDMCQKEVMEQSAGIMYRKSCASSAACLIASAG 106
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q96AC2
Alignment segment 1/1:
Quality: 654.00 Escore: 0
Matching length: 64 Total length: 64
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x Q8N2G4
Alignment segment 1/1:
Quality: 654.00 Escore: 0
Matching length: 64 Total length: 64
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 51 QDMCQKEVMEQSAG 64
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85273
Alignment segment 1/1:
Quality: 600.00 Escore: 0
Matching length: 59 Total length: 59
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
 56 KEVMEQSAG 64
    |||||||||
 72 KEVMEQSAG 80
Sequence name: /tmp/VXjdFlzdBX/bexTxThOTh:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P7 x BAC85518
Alignment segment 1/1:
Quality: 654.00 Escore: 0
Matching length: 64 Total length: 64
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSAG 64
    ||||||||||||||
 74 QDMCQKEVMEQSAG 87
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q96AC2
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q96AC2
Alignment segment 1/1:
Quality: 645.00 Escore: 0
Matching length: 63 Total length: 63
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:Q8N2G4
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x Q8N2G4
Alignment segment 1/1:
Quality: 645.00 Escore: 0
Matching length: 63 Total length: 63
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85273
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85273
Alignment segment 1/1:
Quality: 591.00 Escore: 0
Matching length: 58 Total length: 58
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  6 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 22 IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQ 71
 56 KEVMEQSA 63
    ||||||||
 72 KEVMEQSA 79
Sequence name: /tmp/OLMSexEmIh/pc7Z7XmlYR:BAC85518
Sequence documentation:
Alignment of: R11723_PEA_1_P10 x BAC85518
Alignment segment 1/1:
Quality: 645.00 Escore: 0
Matching length: 63 Total length: 63
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 24 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 73
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 74 QDMCQKEVMEQSA 86
Alignment of: R11723_PEA_1_P13 x Q96AC2
Alignment segment 1/1:
Quality: 645.00 Escore: 0
Matching length: 63 Total length: 63
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV 50
 51 QDMCQKEVMEQSA 63
    |||||||||||||
 51 QDMCQKEVMEQSA 63
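The "Matching Percent Identity" figures reported in the alignments above can be illustrated for the simplest case of a gapless alignment, where identity is the fraction of aligned positions carrying the same residue. The sketch below is an assumption-labeled illustration, not the alignment program used to produce the reports; it does not compute the Quality or Similarity scores, and the function name is hypothetical.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of identical residues in a gapless, equal-length alignment."""
    if len(query) != len(subject):
        raise ValueError("a gapless alignment requires equal-length segments")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# Aligned residues 1-63, as printed in the alignments of the 63-residue
# segments above (both rows are identical in those reports).
ALIGNED_1_63 = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV"
    "QDMCQKEVMEQSA"
)

print(len(ALIGNED_1_63), percent_identity(ALIGNED_1_63, ALIGNED_1_63))
```

A single substituted residue in a 63-residue segment would lower the identity to 62/63, which is why a reported identity of 100.00 with 0 gaps implies that the two aligned rows are the same string.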
DESCRIPTION FOR CLUSTER M77903
Cluster M77903 features 4 transcript(s) and 29 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf000983_0001
Table 2 - Segments of interest
Figure imgf000983_0002
Figure imgf000984_0001
Table 3 - Proteins of interest
Figure imgf000984_0002
Figure imgf000985_0001
These sequences are variants of the known protein Translocon-associated protein, alpha subunit precursor (SwissProt accession identifier SSRA_HUMAN; known also according to the synonyms TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha), SEQ ID NO: 641, referred to herein as the previously known protein.
Protein Translocon-associated protein, alpha subunit precursor is known or believed to have the following function(s): TRAP proteins are part of a complex whose function is to bind calcium to the ER membrane and thereby regulate the retention of ER resident proteins. May be involved in the recycling of the translocation apparatus after completion of the translocation process or may function as a membrane-bound chaperone facilitating folding of translocated proteins. The sequence for protein Translocon-associated protein, alpha subunit precursor is given at the end of the application, as "Translocon-associated protein, alpha subunit precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf000985_0002
Protein Translocon-associated protein, alpha subunit precursor localization is believed to be Type I membrane protein. Endoplasmic reticulum.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: co-translational membrane targeting; positive control of cell proliferation, which are annotation(s) related to Biological Process; signal sequence receptor; calcium binding, which are annotation(s) related to Molecular Function; and endoplasmic reticulum; integral membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster M77903 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
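The "parts per million" figure described above is a ratio of the ESTs assigned to the cluster to all ESTs in the tissue category, scaled by one million. A minimal sketch of that ratio follows; the weighting applied to the EST counts is not specified in this section, so the sketch uses plain, unweighted counts as a stated assumption, and the function name and numbers are hypothetical.

```python
def ests_per_million(cluster_ests: int, total_ests: int) -> float:
    """ESTs of one cluster per million ESTs in a tissue category.

    Assumption: unweighted counts; the weighting scheme referred to in
    the text is not reproduced here.
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return cluster_ests * 1_000_000 / total_ests


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in a category.
print(ests_per_million(5, 100_000))
```

Expressing the value per million ESTs makes categories with very different library sizes comparable on a single axis.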
Overall, the following results were obtained as shown with regard to the histograms in Figure 33 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: ovarian carcinoma and uterine malignancies.
Table 5 - Normal tissue distribution
Figure imgf000986_0001
Figure imgf000987_0001
Table 6-P values and ratios for expression in cancerous tissue
Figure imgf000987_0002
Figure imgf000988_0001
As noted above, cluster M77903 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Translocon-associated protein, alpha subunit precursor. A description of each variant protein according to the present invention is now provided.
Variant protein M77903_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77903_T11. An alignment is given to the known protein (Translocon-associated protein, alpha subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M77903_P4 and SSRA_HUMAN:
1. An isolated chimeric polypeptide encoding for M77903_P4, comprising a first amino acid sequence being at least 90 % homologous to MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDEDDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNIVKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQRQATFEYSFIPAEPMGGRPFGLVTNLNYKDLNGNVFQDAVFNQTVTVIEREDGLDGET corresponding to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of M77903_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRDPYRK corresponding to amino acids 208 - 214 of M77903_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77903_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRDPYRK in M77903_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M77903_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf000989_0001
Figure imgf000990_0001
The glycosylation sites of variant protein M77903_P4, as compared to the known protein Translocon-associated protein, alpha subunit precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 8 - Glycosylation site(s)
Figure imgf000990_0002
Variant protein M77903_P4 is encoded by the following transcript(s): M77903_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77903_T11 is shown in bold; this coding portion starts at position 200 and ends at position 841. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf000991_0001
Figure imgf000992_0001
Figure imgf000993_0001
Variant protein M77903_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77903_T12. An alignment is given to the known protein (Translocon-associated protein, alpha subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M77903_P5 and SSRA_HUMAN:
1. An isolated chimeric polypeptide encoding for M77903_P5, comprising a first amino acid sequence being at least 90 % homologous to MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDEDDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNIVKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQRQATFEYSFIPAEPMGGRPFGLVTNLNYKDLNGNVFQDAVFNQTVTVIEREDGLDGET corresponding to amino acids 1 - 207 of SSRA_HUMAN, which also corresponds to amino acids 1 - 207 of M77903_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M77903_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf000994_0001
Figure imgf000995_0001
The glycosylation sites of variant protein M77903_P5, as compared to the known protein Translocon-associated protein, alpha subunit precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 11 - Glycosylation site(s)
Figure imgf000995_0002
Variant protein M77903_P5 is encoded by the following transcript(s): M77903_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77903_T12 is shown in bold; this coding portion starts at position 200 and ends at position 820. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf000996_0001
Figure imgf000997_0001
Figure imgf000998_0001
Variant protein M77903_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77903_T34. An alignment is given to the known protein (Translocon-associated protein, alpha subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M77903_P15 and SSRA_HUMAN:
1. An isolated chimeric polypeptide encoding for M77903_P15, comprising a first amino acid sequence being at least 90 % homologous to MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDEDDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNIVKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQRQATFEYSFIPAEPMGGRPFGLVTNLNYKDLN corresponding to amino acids 1 - 181 of SSRA_HUMAN, which also corresponds to amino acids 1 - 181 of M77903_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRSSKPSFCLS corresponding to amino acids 182 - 192 of M77903_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77903_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSSKPSFCLS in M77903_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M77903_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf001000_0001
The glycosylation sites of variant protein M77903_P15, as compared to the known protein Translocon-associated protein, alpha subunit precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 14 - Glycosylation site(s)
Figure imgf001000_0002
Variant protein M77903_P15 is encoded by the following transcript(s): M77903_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77903_T34 is shown in bold; this coding portion starts at position 200 and ends at position 775. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf001001_0001
Variant protein M77903_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M77903_T36. An alignment is given to the known protein (Translocon-associated protein, alpha subunit precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M77903_P16 and SSRA_HUMAN:
1. An isolated chimeric polypeptide encoding for M77903_P16, comprising a first amino acid sequence being at least 90 % homologous to MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDEDDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGE corresponding to amino acids 1 - 93 of SSRA_HUMAN, which also corresponds to amino acids 1 - 93 of M77903_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GNTEVLVLIQM corresponding to amino acids 94 - 104 of M77903_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of M77903_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GNTEVLVLIQM in M77903_P16.
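The chimeric structure described in the comparison report (a head shared with the known protein followed by a unique tail, contiguous and in sequential order) can be illustrated by assembling the two parts and checking that the stated residue ranges hold. The sketch below uses the shortest variant, M77903_P16, with the head and tail copied from the text; the helper names are hypothetical, and the sketch does not implement the percent-homology determination referred to in the claims.

```python
# Head: amino acids 1 - 93 shared with SSRA_HUMAN, as printed in the text.
HEAD = (
    "MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDEDDEAEVEEDE"
    "PTDLVEDKEEEDVSGEPEASPSADTTILFVKGE"
)
# Tail: amino acids 94 - 104, unique to the variant.
TAIL = "GNTEVLVLIQM"


def assemble_chimera(head: str, tail: str) -> str:
    """Join head and tail contiguously and in sequential order."""
    return head + tail


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive positions."""
    return seq[start - 1:end]


variant = assemble_chimera(HEAD, TAIL)
print(len(HEAD), len(variant), residues(variant, 94, 104))
```

The 1-based, inclusive slicing helper mirrors the way residue ranges are written in the comparison reports, which avoids off-by-one errors when checking positions such as 94 - 104.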
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein M77903_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Amino acid mutations
Figure imgf001003_0001
The glycosylation sites of variant protein M77903_P16, as compared to the known protein Translocon-associated protein, alpha subunit precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 17 - Glycosylation site(s)
Figure imgf001003_0002
Variant protein M77903_P16 is encoded by the following transcript(s): M77903_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M77903_T36 is shown in bold; this coding portion starts at position 200 and ends at position 511. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M77903_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf001004_0001
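The coding-portion coordinates given for the four transcripts can be cross-checked against the lengths of the variant proteins stated in the comparison reports (214, 207, 192 and 104 amino acids). The following sketch assumes that the positions are 1-based and inclusive and that the stated end position excludes the stop codon; both are assumptions made for illustration, as is the dictionary layout, and the transcript names are written with underscores.

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive bounds."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding portion length is not a multiple of 3")
    return length_nt // 3


# Coding-portion coordinates as stated in the text for each transcript.
CODING_PORTIONS = {
    "M77903_T11": (200, 841),
    "M77903_T12": (200, 820),
    "M77903_T34": (200, 775),
    "M77903_T36": (200, 511),
}

for transcript, (start, end) in CODING_PORTIONS.items():
    print(transcript, codon_count(start, end))
```

Under these assumptions each codon count equals the last residue number given for the corresponding variant protein, which is a quick consistency check on both the coordinates and the residue ranges.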
As noted above, cluster M77903 features 29 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster M77903_node_2 according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf001005_0001
Segment cluster M77903_node_13 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T36. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf001005_0002
Segment cluster M77903_node_16 according to the present invention is supported by 149 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12 and M77903_T34. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf001005_0003
Figure imgf001006_0001
Segment cluster M77903_node_18 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T34. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf001006_0002
Segment cluster M77903_nodeJ5 according to the present invention is supported by 145 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Figure imgf001006_0003
Segment cluster M77903_node_36 according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Figure imgf001007_0001
Segment cluster M77903_node_37 according to the present invention is supported by 128 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf001007_0002
Segment cluster M77903_node_38 according to the present invention is supported by 152 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001007_0003
Segment cluster M77903_node_40 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf001008_0001
Segment cluster M77903_node_44 according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001008_0002
Segment cluster M77903_node_46 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001008_0003
Figure imgf001009_0001
Segment cluster M77903_node_47 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001009_0002
Segment cluster M77903_node_48 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001009_0003
Segment cluster M77903_node_49 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf001010_0001
Segment cluster M77903_node l according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Figure imgf001010_0002
Segment cluster M77903_node_52 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Figure imgf001010_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster M77903_node_1 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Figure imgf001011_0001
Segment cluster M77903_nodeJ according to the present invention is supported by 154 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Figure imgf001011_0002
Figure imgf001012_0001
Segment cluster M77903_node_9 according to the present invention can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Figure imgf001012_0002
Segment cluster M77903_node_10 according to the present invention is supported by 148 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Figure imgf001012_0003
Segment cluster M77903_node_11 according to the present invention can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Figure imgf001013_0001
Segment cluster M77903_node_12 according to the present invention can be found in the following transcript(s): M77903_T11, M77903_T12, M77903_T34 and M77903_T36. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Figure imgf001013_0002
Segment cluster M77903_node_15 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12 and M77903_T34. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Figure imgf001014_0001
Segment cluster M77903_node_17 according to the present invention is supported by 141 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11, M77903_T12 and M77903_T34. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf001014_0002
Segment cluster M77903_node_20 according to the present invention is supported by 134 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf001014_0003
Segment cluster M77903_node_28 according to the present invention is supported by 134 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001015_0001
Segment cluster M77903_node_34 according to the present invention is supported by 134 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf001015_0002
Segment cluster M77903_node_41 according to the present invention is supported by 119 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf001016_0001
Segment cluster M77903_node_42 according to the present invention is supported by 123 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M77903_T11 and M77903_T12. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001016_0002
Variant protein alignment to the previously known protein:
Sequence name: SSRA_HUMAN
Sequence documentation:
Alignment of: M77903_P4 x SSRA_HUMAN
Alignment segment 1/1:
Quality: 1991.00
Escore: 0
Matching length: 208
Total length: 208
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.52
Total Percent Similarity: 100.00
Total Percent Identity: 99.52
Gaps: 0
Alignment:
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLNGNVFQDAVFNQTVTVIERE 200
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLNGNVFQDAVFNQTVTVIERE 200
201 DGLDGETV 208
201 DGLDGETI 208
Sequence name: SSRA_HUMAN
Sequence documentation:
Alignment of: M77903_P5 x SSRA_HUMAN
Alignment segment 1/1:
Quality: 1987.00
Escore: 0
Matching length: 207
Total length: 207
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLNGNVFQDAVFNQTVTVIERE 200
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLNGNVFQDAVFNQTVTVIERE 200
201 DGLDGET 207
201 DGLDGET 207
Sequence name: SSRA_HUMAN
Sequence documentation:
Alignment of: M77903_P15 x SSRA_HUMAN
Alignment segment 1/1:
Quality: 1741.00
Escore: 0
Matching length: 181
Total length: 181
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI 100
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
101 VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR 150
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLN 181
151 QATFEYSFIPAEPMGGRPFGLVINLNYKDLN 181
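The identity values reported for the alignments above can be reproduced directly from the aligned sequences. The following is a minimal sketch only, assuming an ungapped alignment of two equal-length sequences (as reported above, Gaps: 0); the function names `percent_identity` and `mismatches` are illustrative and are not part of the present application. It uses the M77903_P4 x SSRA_HUMAN alignment, in which the only difference is the final residue.

```python
# Rows 1-200 are identical in both aligned sequences (copied from the alignment above).
COMMON = (
    "MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE"
    "DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGEDFPANNI"
    "VKFLVGFTNKGTEDFIVESLDASFRYPQDYQFYIQNFTALPLNTVVPPQR"
    "QATFEYSFIPAEPMGGRPFGLVINLNYKDLNGNVFQDAVFNQTVTVIERE"
)
M77903_P4 = COMMON + "DGLDGETV"         # variant protein, residues 1-208
SSRA_ALIGNED = COMMON + "DGLDGETI"      # aligned part of the known protein


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment requires equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def mismatches(seq_a: str, seq_b: str) -> list:
    """1-based positions where the two aligned sequences differ."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


identity = round(percent_identity(M77903_P4, SSRA_ALIGNED), 2)
differences = mismatches(M77903_P4, SSRA_ALIGNED)
print(identity, differences)
```

With 207 of 208 positions matching, the rounded identity agrees with the reported 99.52; the reported 100.00 similarity is consistent with the single V/I difference being scored as a conservative substitution.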
Sequence name: SSRA_HUMAN
Sequence documentation:
Alignment of: M77903_P16 x SSRA_HUMAN
Alignment segment 1/1:
Quality: 869.00
Escore: 0
Matching length: 93
Total length: 93
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
1 MRLLPRLLLLLLLVFPATVLFRGGPRGLLAVAQDLTEDEETVEDSIIEDE 50
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGE 93
51 DDEAEVEEDEPTDLVEDKEEEDVSGEPEASPSADTTILFVKGE 93
Expression of SSRA_HUMAN: SSR-alpha M77903 transcripts, which are detectable by amplicon as depicted in sequence name M77903seg18, in normal and cancerous colon tissues
Transcripts detectable by or according to the M77903seg18 amplicon (SEQ ID NO: 1303) and the M77903seg18F (SEQ ID NO: 1301) and M77903seg18R (SEQ ID NO: 1302) primers were measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1-amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 34 is a histogram showing over expression of the above-indicated SSRA_HUMAN: SSR-alpha transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 34, the expression of SSRA_HUMAN: SSR-alpha transcripts detectable by the above amplicon(s) in a few cancer samples was higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 5 out of 37 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M77903seg18F forward primer; and M77903seg18R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M77903seg18.
M77903seg18F (SEQ ID NO: 1301): CGGTGACGTTGTTTAATAGAATATATCTGT
M77903seg18R (SEQ ID NO: 1302): AAGAAACGTGCAATTTATCTTTGCT
M77903seg18 amplicon (SEQ ID NO: 1303): CGGTGACGTTGTTTAATAGAATATATCTGTTCATTCAGTTGCCTGTTTTGTGGTTGAACCTGTGATAGCCACCAGGGAAGCAAAGATAAATTGCACGTTTCTT
As can be seen from Figures 35 and 36, for cluster M77903, amplicons M77903junc20-28 and M77903junc20-34-35, respectively, low over expression was observed in one experiment carried out with colon.
Expression of SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) M77903 transcripts which are detectable by amplicon as depicted in sequence name M77903junc20-28 in normal and cancerous colon tissues
Expression of SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts detectable by or according to junc20-28, M77903junc20-28 amplicon (SEQ ID NO: 1306) and primers M77903junc20-28F (SEQ ID NO: 1304) and M77903junc20-28R (SEQ ID NO: 1305) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1-amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 35 is a histogram showing over expression of the above-indicated SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 35, the expression of the above-indicated SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts detectable by the above amplicon in cancer samples was higher in a few samples than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 4 out of 36 adenocarcinoma samples.
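The normalization described above (division by the geometric mean of the housekeeping gene quantities, then division by the median of the normal PM samples) can be sketched as follows. This is an illustrative example only: the function name `fold_up_regulation`, the sample labels and all quantities are hypothetical and are not data from the present application.

```python
from math import isclose
from statistics import geometric_mean, median


def fold_up_regulation(amplicon_qty, housekeeping_qty, normal_samples):
    """Fold up-regulation of each sample relative to the median normal sample.

    amplicon_qty:      sample -> measured quantity of the amplicon
    housekeeping_qty:  sample -> quantities of the housekeeping genes
    normal_samples:    samples used as the normal (PM) reference
    """
    # Step 1: normalize to the geometric mean of the housekeeping genes.
    normalized = {
        sample: qty / geometric_mean(housekeeping_qty[sample])
        for sample, qty in amplicon_qty.items()
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    reference = median(normalized[sample] for sample in normal_samples)
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical quantities, for illustration only.
amplicon = {"normal_a": 1.0, "normal_b": 2.0, "normal_c": 4.0, "tumor_x": 40.0}
housekeeping = {
    "normal_a": [1.0, 1.0, 1.0, 1.0],
    "normal_b": [1.0, 1.0, 1.0, 1.0],
    "normal_c": [2.0, 2.0, 2.0, 2.0],
    "tumor_x": [4.0, 1.0, 4.0, 1.0],
}
folds = fold_up_regulation(
    amplicon, housekeeping, ["normal_a", "normal_b", "normal_c"]
)
over_expressed = [s for s, f in folds.items() if f >= 5.0]
print(folds, over_expressed)
```

In this hypothetical data set only `tumor_x` passes the 5 fold threshold used in the text. For an amplicon scored by down-regulation, the reciprocal of each fold value would be taken instead.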
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M77903junc20-28F forward primer; and M77903junc20-28R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M77903junc20-28.
Primers:
Forward primer M77903junc20-28F (SEQ ID NO: 1304): GGCAATGTATTCCAAGATGCAG
Reverse primer M77903junc20-28R (SEQ ID NO: 1305): TCTGTATGGGTCTCTTACGGTTTCT
Amplicon M77903junc20-28 (SEQ ID NO: 1306): GGCAATGTATTCCAAGATGCAGTCTTCAATCAAACAGTTACAGTTATTGAAAGAGAGGATGGGTTAGATGGAGAAACCGTAAGAGACCCATACAGA
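The relationship between a primer pair and its amplicon can be checked computationally: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following is a minimal sketch using the M77903junc20-28 sequences listed above, assuming the amplicon is written 5' to 3' on the forward strand; the helper names are illustrative and are not part of the present application.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed above (SEQ ID NOs: 1304, 1305 and 1306).
FORWARD = "GGCAATGTATTCCAAGATGCAG"
REVERSE = "TCTGTATGGGTCTCTTACGGTTTCT"
AMPLICON = (
    "GGCAATGTATTCCAAGATGCAGTCTTCAATCAAACAGTTACAGTTATTGAAAGAGA"
    "GGATGGGTTAGATGGAGAAACCGTAAGAGACCCATACAGA"
)

consistent = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
print(consistent)
```

The same check can be applied to the other primer pairs and amplicons listed in this section.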
Expression of SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) M77903 transcripts which are detectable by amplicon as depicted in sequence name M77903junc20-34-35 in normal and cancerous colon tissues
Expression of SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts detectable by or according to junc20-34-35, M77903junc20-34-35 amplicon (SEQ ID NO: 1309) and primers M77903junc20-34-35F (SEQ ID NO: 1307) and M77903junc20-34-35R (SEQ ID NO: 1308) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1-amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 36 is a histogram showing over expression of the above-indicated SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 36, the expression of SSRA_HUMAN: Translocon-associated protein, alpha subunit (TRAP-alpha; Signal sequence receptor alpha subunit; SSR-alpha) transcripts detectable by the above amplicon in cancer samples was higher in a few samples than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"). Notably, an over-expression of at least 10 fold was found in 7 out of 36 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M77903junc20-34-35F forward primer; and M77903junc20-34-35R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M77903junc20-34-35.
Primers:
Forward primer M77903junc20-34-35F (SEQ ID NO: 1307): ATGGGTTAGATGGAGAAACATAAAGCT
Reverse primer M77903junc20-34-35R (SEQ ID NO: 1308): TGCACAAAGGAACATTTACTCATCA
Amplicon M77903junc20-34-35 (SEQ ID NO: 1309): ATGGGTTAGATGGAGAAACATAAAGCTTCACCAAGAAGGTTGCCCAGGAAACGGGCACAGAAGAGATCAGTGGGATCTGATGAGTAAATGTTCCTTTGTGCA
Combined expression of 6 sequences (M85491seg24, M77903seg18, M77903junc20-28, Z44808 junc8-11, Z25299 seg 20 and HSKITCR seg3) in normal and cancerous colon tissues
Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3), SSRA_HUMAN, SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2), Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor and KIT_HUMAN; mast/stem cell growth factor receptor SCFR; Proto-oncogene tyrosine-protein kinase Kit; v-kit; CD117 antigen transcripts detectable by or according to M85491seg24 (SEQ ID NO: 1276), M77903seg18 (SEQ ID NO: 1303), M77903junc20-28 (SEQ ID NO: 1306), Z44808 junc8-11 (SEQ ID NO: 1291), Z25299 seg 20 (SEQ ID NO: 1294) and HSKITCR seg3 (SEQ ID NO: 1309) amplicons and M85491seg24F (SEQ ID NO: 1274), M85491seg24R (SEQ ID NO: 1275), M77903seg18F (SEQ ID NO: 1301), M77903seg18R (SEQ ID NO: 1302), M77903junc20-28F (SEQ ID NO: 1304), M77903junc20-28R (SEQ ID NO: 1305), Z44808 junc8-11F (SEQ ID NO: 1289), Z44808 junc8-11R (SEQ ID NO: 1290), Z25299 seg 20F (SEQ ID NO: 1292), Z25299 seg 20R (SEQ ID NO: 1293), HSKITCR seg3F (SEQ ID NO: 1307) and HSKITCR seg3R (SEQ ID NO: 1308) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; HPRT1-amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicons was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample of each amplicon was then divided by the median of the quantities of the normal post-mortem (PM) samples detected for the same amplicon (Sample Nos. 41, 52, 62-67, 69-71, Table 3, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. The reciprocal of this ratio was calculated for HSKITCR seg3, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples. The expression of HSKITCR transcripts which can be detected by HSKITCR seg3 is also described in the patent application "NOVEL NUCLEOTIDE AND AMINO ACID SEQUENCES, AND ASSAYS AND METHODS OF USE THEREOF FOR DIAGNOSIS", attorney reference number XXXXX, by the same inventors, filed on the same date and incorporated herein by reference.
Figures 37-38 are histograms showing differential expression of the above-indicated transcripts in cancerous colon samples relative to the normal samples, in different combinations. The number and percentage of samples that exhibit at least 5 fold differential expression of at least one of the sequences, out of the total number of samples tested, is indicated at the bottom. As is evident from Figures 37-38, differential expression of at least 5 fold in at least one of the sequences was found in 29 out of 36 adenocarcinoma samples in the combinations of 6 transcripts, and in 13 out of 36 adenocarcinoma samples in the combinations of 5 transcripts. Statistical analysis was applied to verify the significance of these results, as described below. A threshold of 5 fold differential expression of at least one of the amplicons was found to differentiate between cancer and normal samples, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
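The kind of test referred to above can be sketched as a one-sided Fisher's exact test on a 2 x 2 table of samples above or below the 5 fold threshold. This is an illustrative sketch only. The count of 29 out of 36 adenocarcinoma samples is taken from the text; the number of normal samples (11, the count of the listed sample numbers) is inferred, and the number of normal samples above the threshold is not stated in the text, so the value 0 used here is purely an assumption. The function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a value
    of at least `a` in the top-left cell (hypergeometric upper tail).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return tail / total


# Rows: cancer, normal. Columns: at or above threshold, below threshold.
cancer_above, cancer_below = 29, 36 - 29   # from the text
normal_above, normal_below = 0, 11         # ASSUMPTION: not stated in the text

p_value = fisher_exact_greater(
    cancer_above, cancer_below, normal_above, normal_below
)
print(p_value)
```

Under these assumed counts the p-value is far below conventional significance levels; a different number of normal samples above the threshold would change the result.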
DESCRIPTION FOR CLUSTER HSSTROL3
Cluster HSSTROL3 features 6 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001027_0001
Figure imgf001028_0001
Table 2 - Segments of interest
Figure imgf001028_0002
Table 3 - Proteins of interest
Figure imgf001028_0003
Figure imgf001029_0001
These sequences are variants of the known protein Stromelysin-3 precursor (SwissProt accession identifier MM11_HUMAN; known also according to the synonyms EC 3.4.24.-; Matrix metalloproteinase-11; MMP-11; ST3; SL-3), SEQ ID NO: 523, referred to herein as the previously known protein. Protein Stromelysin-3 precursor is known or believed to have the following function(s): May play an important role in the progression of epithelial malignancies. The sequence for protein Stromelysin-3 precursor is given at the end of the application, as "Stromelysin-3 precursor amino acid sequence".
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis; developmental processes; morphogenesis, which are annotation(s) related to Biological Process; stromelysin 3; calcium binding; zinc binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HSSTROL3 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 75 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 4 - Normal tissue distribution
Figure imgf001030_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf001030_0002
Figure imgf001031_0001
As noted above, cluster HSSTROL3 features 6 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Stromelysin-3 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSSTROL3_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T5. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P4 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P4, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P4, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P4, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKIYFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG corresponding to amino acids 165 - 445 of MM11_HUMAN, which also corresponds to amino acids 165 - 445 of HSSTROL3_P4, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG corresponding to amino acids 446 - 496 of HSSTROL3_P4, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG in HSSTROL3_P4.
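The residue ranges recited in the comparison report above can be checked for internal consistency. The following is a minimal sketch, not part of the original disclosure: the names `P4_LAYOUT`, `is_contiguous` and `total_length` are hypothetical, and residue positions are assumed to be 1-based and inclusive. It confirms that the four parts tile the variant without gaps or overlaps and that the tail sequence, as written in item 2, has the 51 residues implied by positions 446 - 496.

```python
# Layout of variant HSSTROL3_P4 as recited in the comparison report.
# Each entry: (label, first residue, last residue) on the variant protein.
# Positions are assumed to be 1-based and inclusive (an assumption).
P4_LAYOUT = [
    ("first sequence (MM11_HUMAN 1-163)", 1, 163),
    ("bridging amino acid H", 164, 164),
    ("second sequence (MM11_HUMAN 165-445)", 165, 445),
    ("third sequence (unique tail)", 446, 496),
]

# Tail sequence as given in item 2 of the comparison report.
P4_TAIL = "ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG"


def is_contiguous(layout):
    """Return True if every part starts right after the previous part ends."""
    expected_start = 1
    for _label, start, end in layout:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True


def total_length(layout):
    """Length of the assembled polypeptide: last residue of the last part."""
    return layout[-1][2]


if __name__ == "__main__":
    print("contiguous:", is_contiguous(P4_LAYOUT))
    print("total length:", total_length(P4_LAYOUT))
    print("tail length:", len(P4_TAIL))
```

The same check can be applied to the other variants by substituting their recited ranges.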
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf001033_0001
Variant protein HSSTROL3_P4 is encoded by the following transcript(s): HSSTROL3_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T5 is shown in bold; this coding portion starts at position 24 and ends at position 1511. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf001033_0002
Figure imgf001034_0001
Variant protein HSSTROL3_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T8 and HSSTROL3_T9. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P5 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P5, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P5, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P5, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQ corresponding to amino acids 165 - 358 of MM11_HUMAN, which also corresponds to amino acids 165 - 358 of HSSTROL3_P5, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELGFPSSTGRDESLEHCRCQGLHK corresponding to amino acids 359 - 382 of HSSTROL3_P5, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK in HSSTROL3_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf001036_0001
Variant protein HSSTROL3_P5 is encoded by the following transcript(s): HSSTROL3_T8 and HSSTROL3_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T8 is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf001036_0002
Figure imgf001037_0001
The coding portion of transcript HSSTROL3_T9 is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001037_0002
Figure imgf001038_0001
Variant protein HSSTROL3_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T10. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P7 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P7, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P7, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P7, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 165 - 359 of HSSTROL3_P7, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 360 - 370 of HSSTROL3_P7, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf001040_0001
Variant protein HSSTROL3_P7 is encoded by the following transcript(s): HSSTROL3_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T10 is shown in bold; this coding portion starts at position 24 and ends at position 1133. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf001040_0002
Figure imgf001041_0001
Variant protein HSSTROL3_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T11. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P8 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P8, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1 - 163 of MM11_HUMAN, which also corresponds to amino acids 1 - 163 of HSSTROL3_P8, a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P8, a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE corresponding to amino acids 165 - 286 of MM11_HUMAN, which also corresponds to amino acids 165 - 286 of HSSTROL3_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRPCLPVPLLLCWPL corresponding to amino acids 287 - 301 of HSSTROL3_P8, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSSTROL3_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL in HSSTROL3_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf001042_0001
Variant protein HSSTROL3_P8 is encoded by the following transcript(s): HSSTROL3_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T11 is shown in bold; this coding portion starts at position 24 and ends at position 926. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf001043_0001
Figure imgf001044_0001
Variant protein HSSTROL3_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T12. An alignment is given to the known protein (Stromelysin-3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSSTROL3_P9 and MM11_HUMAN:
1. An isolated chimeric polypeptide encoding for HSSTROL3_P9, comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1 - 96 of MM11_HUMAN, which also corresponds to amino acids 1 - 96 of HSSTROL3_P9, a second amino acid sequence being at least 90% homologous to RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 113 - 163 of MM11_HUMAN, which also corresponds to amino acids 97 - 147 of HSSTROL3_P9, a bridging amino acid H corresponding to amino acid 148 of HSSTROL3_P9, a third amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165 - 359 of MM11_HUMAN, which also corresponds to amino acids 149 - 343 of HSSTROL3_P9, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV corresponding to amino acids 344 - 354 of HSSTROL3_P9, wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96-x to 96; and ending at any of amino acid numbers 97 + ((n-2) - x), in which x varies from 0 to n-2.
3. An isolated polypeptide encoding for a tail of HSSTROL3_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV in HSSTROL3_P9.
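The edge-portion formula recited above (start at 96 - x, end at 97 + ((n-2) - x), with x from 0 to n-2) describes every window of n residues that contains both residue 96 (K) and residue 97 (R) of the variant. The following is a minimal sketch of that enumeration; the function name `edge_windows` and its parameters are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
def edge_windows(n, junction_left=96):
    """Enumerate (start, end) residue ranges of length n spanning the junction.

    The junction lies between residue `junction_left` and the next residue
    (96 and 97 in the text). Follows the recited formula:
    start = 96 - x, end = 97 + ((n - 2) - x), for x = 0 .. n-2.
    """
    if n < 2:
        raise ValueError("a window must hold both junction residues (n >= 2)")
    junction_right = junction_left + 1
    windows = []
    for x in range(0, n - 1):  # x runs from 0 to n-2 inclusive
        start = junction_left - x
        end = junction_right + (n - 2) - x
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # For the minimum recited length of about 10 residues:
    for start, end in edge_windows(10):
        print(start, end, "length", end - start + 1)
```

For a given n the formula yields n - 1 windows, each exactly n residues long, which is why x is bounded by n-2.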
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSSTROL3_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Figure imgf001045_0001
Variant protein HSSTROL3_P9 is encoded by the following transcript(s): HSSTROL3_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T12 is shown in bold; this coding portion starts at position 24 and ends at position 1085. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Figure imgf001046_0001
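The coding-portion positions stated for each transcript can be cross-checked against the protein lengths recited in the comparison reports. The sketch below is illustrative only: the table `CODING_PORTIONS` and the function `codons_in_span` are hypothetical names, and nucleotide positions are assumed to be 1-based and inclusive. Under that assumption each span works out to exactly three nucleotides per recited residue, which would suggest that the stated end position is the last nucleotide of the final sense codon rather than of a stop codon.

```python
# transcript: (coding start, coding end, last residue recited for the protein)
# Positions are assumed to be 1-based and inclusive (an assumption).
CODING_PORTIONS = {
    "HSSTROL3_T5": (24, 1511, 496),   # encodes HSSTROL3_P4
    "HSSTROL3_T8": (24, 1169, 382),   # encodes HSSTROL3_P5
    "HSSTROL3_T9": (24, 1169, 382),   # encodes HSSTROL3_P5
    "HSSTROL3_T10": (24, 1133, 370),  # encodes HSSTROL3_P7
    "HSSTROL3_T11": (24, 926, 301),   # encodes HSSTROL3_P8
    "HSSTROL3_T12": (24, 1085, 354),  # encodes HSSTROL3_P9
}


def codons_in_span(start, end):
    """Number of whole codons in an inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for name, (start, end, residues) in CODING_PORTIONS.items():
        codons = codons_in_span(start, end)
        status = "consistent" if codons == residues else "MISMATCH"
        print(f"{name}: {codons} codons vs {residues} residues -> {status}")
```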
As noted above, cluster HSSTROL3 features 16 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSSTROL3 jiodejo according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf001047_0001
Segment cluster HSSTROL3 jnode JO according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf001048_0001
Segment cluster HSSTROL3_node_13 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf001048_0002
Segment cluster HSSTROL3_node_15 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf001049_0001
Segment cluster HSSTROL3 ιodeJ9 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf001049_0002
Segment cluster HSSTROL3 jnode .1 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf001050_0001
Segment cluster HSSTROL3 jnode J4 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8 and HSSTROL3_T9. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Figure imgf001050_0002
Segment cluster HSSTROL3_node_25 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Figure imgf001051_0001
Segment cluster HSSTROL3jnodeJ6 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9 and HSSTROL3_T11. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf001051_0002
Segment cluster HSSTROL3 jnode δ according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T9 and HSSTROL3_T10. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001051_0003
Figure imgf001052_0001
Segment cluster HSSTROL3 jιodeJ9 according to the present invention is supported by 109 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf001052_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSSTROL3_node J 1 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10 and HSSTROL3_T11. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001053_0001
Segment cluster HSSTROL3_node_17 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001053_0002
Segment cluster HSSTROL3 jnodeJ 8 according to the present invention can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001054_0001
Segment cluster HSSTROL3_node JO according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T11. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001054_0002
Segment cluster HSSTROL3_nodeJ7 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5, HSSTROL3_T8, HSSTROL3_T9, HSSTROL3_T10, HSSTROL3_T11 and HSSTROL3_T12. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf001055_0001
Variant protein alignment to the previously known protein:
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P4 x MM11_HUMAN
Alignment segment 1/1:
Quality: 4444.00 Escore: 0 Matching length: 445 Total length: 445 Matching Percent Similarity: 99.78 Matching Percent Identity: 99.78 Total Percent Similarity: 99.78 Total Percent Identity: 99.78 Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
    ||||||||||||| ||||||||||||||||||||||||||||||||||||
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
351 QGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKI 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKI 400
401 YFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG 445
    |||||||||||||||||||||||||||||||||||||||||||||
401 YFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG 445
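The identity figure reported for this alignment follows from a single mismatch over 445 aligned positions: the bridging amino acid H in the variant against D in MM11_HUMAN. The sketch below illustrates that arithmetic; the function names are hypothetical, and it assumes, as the comparison report indicates, that all other aligned positions are identical.

```python
# Row 151-200 of the HSSTROL3_P4 x MM11_HUMAN alignment (variant first).
VARIANT_ROW = "GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT"
KNOWN_ROW = "GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT"
ROW_START = 151  # residue number of the first column in this row


def mismatch_positions(a, b, start=1):
    """Residue numbers at which two equal-length aligned rows differ."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return [start + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(matching_length, mismatches):
    """Percent identity over the aligned length, rounded to two decimals."""
    return round(100.0 * (matching_length - mismatches) / matching_length, 2)


if __name__ == "__main__":
    positions = mismatch_positions(VARIANT_ROW, KNOWN_ROW, ROW_START)
    print("mismatch at residue(s):", positions)
    # Assumes every other row of the alignment is identical.
    print("identity:", percent_identity(445, len(positions)))
```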
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P5 x MM11_HUMAN
Alignment segment 1/1:
Quality: 3566.00 Escore: 0 Matching length: 358 Total length: 358 Matching Percent Similarity: 99.72 Matching Percent Identity: 99.72 Total Percent Similarity: 99.72 Total Percent Identity: 99.72 Gaps: 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
    ||||||||||||| ||||||||||||||||||||||||||||||||||||
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
351 QGHIWFFQ 358
    ||||||||
351 QGHIWFFQ 358
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P7 x MM11 TOMAN
Alignment segment 1/1:
Quality: 3575.00 Escore: 0 Matching length: 359 Total length: 359 Matching Percent Similarity: 99.72 Matching Percent Identity: 99.72 Total Percent Similarity: 99.72 Total Percent Identity: 99.72 Gaps : 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
    ||||||||||||| ||||||||||||||||||||||||||||||||||||
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
351 QGHIWFFQG 359
    |||||||||
351 QGHIWFFQG 359
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P8 x MM11_HUMAN
Alignment segment 1/1:
Quality: 2838.00 Escore: 0 Matching length: 286 Total length: 286 Matching Percent Similarity: 99.65 Matching Percent Identity: 99.65 Total Percent Similarity: 99.65 Total Percent Identity: 99.65 Gaps : 0
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
151 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
    ||||||||||||| ||||||||||||||||||||||||||||||||||||
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE 286
    ||||||||||||||||||||||||||||||||||||
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE 286
Sequence name: MM11_HUMAN
Sequence documentation:
Alignment of: HSSTROL3_P9 x MM11_HUMAN
Alignment segment 1/1: Quality: 3316.00
Escore: 0 Matching length: 343 Total length: 359 Matching Percent Similarity: 99.71 Matching Percent Identity: 99.71 Total Percent Similarity: 95.26 Total Percent Identity: 95.26 Gaps : 1
Alignment:
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQP 50
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK.... 96
    ||||||||||||||||||||||||||||||||||||||||||||||
 51 WHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVL 100
 97 ............RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 134
                ||||||||||||||||||||||||||||||||||||||
101 SGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHE 150
135 GRADIMIDFARYWHGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 184
    ||||||||||||| ||||||||||||||||||||||||||||||||||||
151 GRADIMIDFARYWDGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWT 200
185 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 234
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 IGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDC 250
235 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 284
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDA 300
285 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 334
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDA 350
335 QGHIWFFQG 343
    |||||||||
351 QGHIWFFQG 359
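The percentages reported with each alignment above are consistent with simple ratios of identical residues to the matching length and to the total length. The following is a minimal sketch under that assumption; the function name is illustrative, and the count of identical residues is inferred from the reported values rather than stated in the text.

```python
def identity_percentages(identical, matching_length, total_length):
    """Return (matching % identity, total % identity), rounded to 2 decimals.

    Assumption: both figures are plain ratios of the number of identical
    residues to the matching length and to the total length, respectively.
    """
    matching_pct = round(100.0 * identical / matching_length, 2)
    total_pct = round(100.0 * identical / total_length, 2)
    return matching_pct, total_pct


# HSSTROL3_P9 x MM11_HUMAN: matching length 343, total length 359.
# 342 identical residues is an inferred value (one mismatch), not a quoted one.
p9 = identity_percentages(342, 343, 359)
print(p9)
```

Under the same assumption, the single gap in the HSSTROL3_P9 alignment lowers only the total figures, because the 16 unmatched positions count toward the total length but not toward the matching length.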
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 junc21-27 in normal and cancerous colon tissues
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by or according to junc21-27, HSSTROL3 junc21-27 amplicon (SEQ ID NO:1312) and primers HSSTROL3 junc21-27F (SEQ ID NO:1310) and HSSTROL3 junc21-27R (SEQ ID NO:1311), was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 73 is a histogram showing overexpression of the above-indicated Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 73, the expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by the above amplicon(s) in cancer samples was higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 6 fold was found in 14 out of 36 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 junc21-27F forward primer; and HSSTROL3 junc21-27R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 junc21-27.
Primers:
Forward primer HSSTROL3 junc21-27F (SEQ ID NO:1310): ACATTTGGTTCTTCCAAGGGACTAC
Reverse primer HSSTROL3 junc21-27R (SEQ ID NO:1311): TCGATCTCAGAGGGCACCC
Amplicon HSSTROL3 junc21-27 (SEQ ID NO:1312): ACATTTGGTTCTTCCAAGGGACTACTGGCGTTTCCACCCCAGCACCCGGCGTGTAGACAGTCCCGTGCCCCGCAGGGCCACTGACTGGAGAGGGGTGCCCTCTGAGATCGA
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg25 in normal and cancerous colon tissues
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by or according to seg25, HSSTROL3 seg25 amplicon (SEQ ID NO: 1315) and primers HSSTROL3 seg25F (SEQ ID NO: 1313) and HSSTROL3 seg25R (SEQ ID NO: 1314), was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel", above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 74 is a histogram showing overexpression of the above-indicated Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts in cancerous colon samples relative to the normal samples. As is evident from Figure 74, the expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by the above amplicon(s) was higher in a few cancer samples than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 5 fold was found in 5 out of 36 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 seg25F forward primer; and HSSTROL3 seg25R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 seg25.
Primers:
Forward primer HSSTROL3 seg25F (SEQ ID NO: 1313): CACTGCCCCAGCTTATCCC
Reverse primer HSSTROL3 seg25R (SEQ ID NO: 1314): CTCTCCCAGCCTCAGTTTCCT
Amplicon HSSTROL3 seg25 (SEQ ID NO: 1315): CACTGCCCCAGCTTATCCCAGGCCTCCCGCTTCCCTCTGCGGGTGGGGTGCTGAGCAGGCATTATTGGCCTGCATGTTTTACTGATGAGGAAACTGAGGCTGGGAGAG
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 in normal and cancerous colon tissues
Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by or according to seg24, HSSTROL3 seg24 amplicon (SEQ ID NO: 1318) and primers HSSTROL3 seg24F (SEQ ID NO: 1316) and HSSTROL3 seg24R (SEQ ID NO: 1317), was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples. In one experiment that was carried out, no differential expression in the cancerous samples relative to the normal PM samples was observed.
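The two-step normalization described above (geometric mean of the housekeeping genes, then the median of the normal PM samples) can be sketched as follows. This is a minimal illustration only: the function names and all quantities are hypothetical and are not data from the experiments.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Divide the amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_relative_to_normal(normalized_by_sample, normal_sample_ids):
    """Divide every normalized quantity by the median of the normalized
    quantities of the normal post-mortem (PM) samples."""
    reference = median(normalized_by_sample[s] for s in normal_sample_ids)
    return {s: q / reference for s, q in normalized_by_sample.items()}


# Hypothetical normalized quantities, for illustration only.
normalized = {"normal_a": 1.0, "normal_b": 2.0, "normal_c": 3.0, "tumor_x": 12.0}
folds = fold_relative_to_normal(normalized, ["normal_a", "normal_b", "normal_c"])
print(folds["tumor_x"])
```

The median is used as the reference so that a single atypical normal sample does not shift the fold values of every other sample.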
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 seg24F forward primer; and HSSTROL3 seg24R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 seg24.
Primers:
Forward primer HSSTROL3 seg24F (SEQ ID NO: 1316): ATTTCCATCCTCAACTGGCAGA
Reverse primer HSSTROL3 seg24R (SEQ ID NO: 1317): TGCCCTGGAACCCACG
Amplicon HSSTROL3 seg24 (SEQ ID NO: 1318): ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA
Expression of Stromelysin-3 precursor (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 in different normal tissues
Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by or according to HSSTROL3 seg24 amplicon (SEQ ID NO: 1318) and primers HSSTROL3 seg24F (SEQ ID NO: 1316) and HSSTROL3 seg24R (SEQ ID NO: 1317) was measured by real time PCR. In parallel, the expression of four housekeeping genes, UBC (GenBank Accession No. BC000449; Ubiquitin amplicon, SEQ ID NO: 1270), SDHA (GenBank Accession No. NM_004168; SDHA amplicon, SEQ ID NO: 1273), RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1264) and TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1267), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17 above), to obtain a value of relative expression of each sample relative to the median of the lung samples.
Primers:
Forward primer HSSTROL3 seg24F (SEQ ID NO: 1316): ATTTCCATCCTCAACTGGCAGA
Reverse primer HSSTROL3 seg24R (SEQ ID NO: 1317): TGCCCTGGAACCCACG
Amplicon HSSTROL3 seg24 (SEQ ID NO: 1318): ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA
The results are presented in Figure 76, showing the expression of Stromelysin-3 HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 in different normal tissues.
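As a consistency check on the primer and amplicon sequences listed above, the forward primer should match the start of the amplicon and the reverse complement of the reverse primer should match its end. The sketch below applies that check to the HSSTROL3 seg24 sequences; the helper names are illustrative, and the check assumes the amplicon is written 5' to 3' on the forward strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primers_flank_amplicon(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed for HSSTROL3 seg24 (SEQ ID NOs: 1316, 1317, 1318).
SEG24_F = "ATTTCCATCCTCAACTGGCAGA"
SEG24_R = "TGCCCTGGAACCCACG"
SEG24_AMPLICON = (
    "ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGAC"
    "TTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA"
)

print(primers_flank_amplicon(SEG24_AMPLICON, SEG24_F, SEG24_R))
```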
DESCRIPTION FOR CLUSTER AA583399
Cluster AA583399 features 16 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001070_0001
Figure imgf001071_0001
Table 3 - Proteins of interest
Figure imgf001071_0002
Figure imgf001072_0001
These sequences are variants of the known protein Myeloma overexpressed gene protein (SwissProt accession identifier MYEO_HUMAN; known also according to the synonyms Oncogene in multiple myeloma), SEQ ID NO: 679, referred to herein as the previously known protein. The sequence for protein Myeloma overexpressed gene protein is given at the end of the application, as "Myeloma overexpressed gene protein amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001072_0002
Cluster AA583399 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 40 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and gastric carcinoma.
Table 5 - Normal tissue distribution
Figure imgf001073_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001074_0001
As noted above, cluster AA583399 features 16 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Myeloma overexpressed gene protein. A description of each variant protein according to the present invention is now provided.
Variant protein AA583399_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T1. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.
Variant protein AA583399_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf001075_0001
Variant protein AA583399_PEA_1_P3 is encoded by the following transcript(s): AA583399_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T1 is shown in bold; this coding portion starts at position 587 and ends at position 1525. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001075_0002
Figure imgf001076_0001
Variant protein AA583399_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T3. An alignment is given to the known protein (Myeloma overexpressed gene protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA583399_PEA_1_P2 and MYEO_HUMAN_V1 (SEQ ID NO: 680):
1. An isolated chimeric polypeptide encoding for AA583399_PEA_1_P2, comprising a first amino acid sequence being at least 90% homologous to MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRERNKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSARWGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMETMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 59 - 313 of MYEO_HUMAN_V1, which also corresponds to amino acids 1 - 255 of AA583399_PEA_1_P2.
It should be noted that the known protein sequence (MYEO_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYEO_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 9 - Changes to MYEO_HUMAN_V1
Figure imgf001077_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein AA583399_PEA_1_P2 is encoded by the following transcript(s): AA583399_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T3 is shown in bold; this coding portion starts at position 689 and ends at position 1453. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001077_0002
Figure imgf001078_0001
Variant protein AA583399_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T7. An alignment is given to the known protein (Myeloma overexpressed gene protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA583399_PEA_1_P4 and MYEO_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for AA583399_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG corresponding to amino acids 1 - 27 of AA583399_PEA_1_P4, and a second amino acid sequence being at least 90% homologous to RNSGSQSARWGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 150 - 313 of MYEO_HUMAN_V1, which also corresponds to amino acids 28 - 191 of AA583399_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of AA583399_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSDLFIGFLVCSLSPLGTGTRCSCSPG of AA583399_PEA_1_P4.
It should be noted that the known protein sequence (MYEO_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYEO_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 11 - Changes to MYEO_HUMAN_V1
Figure imgf001079_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.
Variant protein AA583399_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Figure imgf001079_0002
Figure imgf001080_0001
Variant protein AA583399_PEA_1_P4 is encoded by the following transcript(s): AA583399_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T7 is shown in bold; this coding portion starts at position 789 and ends at position 1361. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf001080_0002
Variant protein AA583399_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T8. An alignment is given to the known protein (Myeloma overexpressed gene protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA583399_PEA_1_P5 and MYEO_HUMAN_V2 (SEQ ID NO: 681):
1. An isolated chimeric polypeptide encoding for AA583399_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 192 - 313 of MYEO_HUMAN_V2, which also corresponds to amino acids 1 - 122 of AA583399_PEA_1_P5.
It should be noted that the known protein sequence (MYEO_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYEO_HUMAN_V2. These changes were previously known to occur and are listed in the table below.
Table 14 - Changes to MYEO_HUMAN_V2
Figure imgf001081_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein AA583399_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Figure imgf001082_0001
Variant protein AA583399_PEA_1_P5 is encoded by the following transcript(s): AA583399_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T8 is shown in bold; this coding portion starts at position 849 and ends at position 1214. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Figure imgf001082_0002
Figure imgf001083_0001
Variant protein AA583399_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein AA583399_PEA_1_P6 is encoded by the following transcript(s): AA583399_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T12 is shown in bold; this coding portion starts at position 39 and ends at position 371. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
Figure imgf001083_0002
Figure imgf001084_0001
Variant protein AA583399_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T17. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein AA583399_PEA_1_P8 is encoded by the following transcript(s): AA583399_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T17 is shown in bold; this coding portion starts at position 191 and ends at position 400.
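The coding-portion coordinates quoted for each transcript can be cross-checked against the stated protein lengths. The sketch below assumes that the positions are 1-based and inclusive and that the quoted range excludes the stop codon; both assumptions are inferred from the numbers (for example, positions 689 to 1453 give 255 codons, matching amino acids 1 - 255 of AA583399_PEA_1_P2) and are not stated in the text. The function name is illustrative.

```python
def coding_length_in_codons(start, end):
    """Number of codons in a coding portion given 1-based inclusive
    start and end nucleotide positions (assumed convention)."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return nucleotides // 3


# Coordinates as quoted in the text for four of the transcripts.
coding_portions = {
    "AA583399_PEA_1_T3": (689, 1453),   # variant P2, amino acids 1 - 255
    "AA583399_PEA_1_T7": (789, 1361),   # variant P4, amino acids 1 - 191
    "AA583399_PEA_1_T8": (849, 1214),   # variant P5, amino acids 1 - 122
    "AA583399_PEA_1_T17": (191, 400),   # variant P8, length not quoted in the text
}
lengths = {name: coding_length_in_codons(*pos) for name, pos in coding_portions.items()}
print(lengths)
```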
Variant protein AA583399_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T0. An alignment is given to the known protein (Myeloma overexpressed gene protein) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between AA583399_PEA_1_P10 and MYEO_HUMAN_V3 (SEQ ID NO: 682):
1. An isolated chimeric polypeptide encoding for AA583399_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRERNKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSARAVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWAQMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAGSWLTVVTVEALGRWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLIIILTC corresponding to amino acids 59 - 313 of MYEO_HUMAN_V3, which also corresponds to amino acids 1 - 255 of AA583399_PEA_1_P10.
It should be noted that the known protein sequence (MYEO_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for MYEO_HUMAN_V3. These changes were previously known to occur and are listed in the table below.
Table 18 - Changes to MYEO_HUMAN_V3
Figure imgf001085_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein AA583399_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Amino acid mutations
Figure imgf001086_0001
Variant protein AA583399_PEA_1_P10 is encoded by the following transcript(s): AA583399_PEA_1_T0, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T0 is shown in bold; this coding portion starts at position 857 and ends at position 1621. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Figure imgf001086_0002
Figure imgf001087_0001
Variant protein AA583399_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein AA583399_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
Figure imgf001087_0002
Variant protein AA583399_PEA_1_P11 is encoded by the following transcript(s): AA583399_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T2 is shown in bold; this coding portion starts at position 493 and ends at position 1431. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Figure imgf001088_0001
Variant protein AA583399_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T10 and AA583399_PEA_1_T11. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein AA583399_PEA_1_P12 is encoded by the following transcript(s): AA583399_PEA_1_T10 and AA583399_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T10 is shown in bold; this coding portion starts at position 191 and ends at position 367. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Nucleic acid SNPs
Figure imgf001089_0001
The coding portion of transcript AA583399_PEA_1_T11 is shown in bold; this coding portion starts at position 191 and ends at position 367. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA583399_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Figure imgf001089_0002
Variant protein AA583399_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA583399_PEA_1_T15. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein AA583399_PEA_1_P14 is encoded by the following transcript(s): AA583399_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA583399_PEA_1_T15 is shown in bold; this coding portion starts at position 43 and ends at position 210.
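The coding-portion coordinates stated above can be cross-checked with simple arithmetic. The following is a minimal sketch only, assuming the stated positions are 1-based and inclusive (the text does not say so explicitly, nor whether a stop codon falls inside the range); the function name and the dictionary are illustrative and not part of the application. Under that assumption the range given for AA583399_PEA_1_T0 spans 765 nucleotides, i.e. 255 codons, which agrees with the 255-residue length reported for AA583399_PEA_1_P10.

```python
def coding_length(start: int, end: int) -> tuple[int, int]:
    """Return (nucleotides, whole codons) for a coding range.

    Assumption: positions are 1-based and inclusive.
    """
    nucleotides = end - start + 1
    if nucleotides <= 0 or nucleotides % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return nucleotides, nucleotides // 3


# Coding-portion start/end positions as stated in the text above.
CODING_RANGES = {
    "AA583399_PEA_1_T17": (191, 400),
    "AA583399_PEA_1_T0": (857, 1621),
    "AA583399_PEA_1_T2": (493, 1431),
    "AA583399_PEA_1_T10": (191, 367),
    "AA583399_PEA_1_T11": (191, 367),
    "AA583399_PEA_1_T15": (43, 210),
}

LENGTHS = {name: coding_length(*rng) for name, rng in CODING_RANGES.items()}

for name, (nt, codons) in LENGTHS.items():
    print(f"{name}: {nt} nt, {codons} codons")
```

A range that is not a multiple of three raises an error, which is a quick way to catch a transcription mistake in a start or end position.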
As noted above, cluster AA583399 features 20 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster AA583399_PEA_1_node_0 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7, AA583399_PEA_1_T8, AA583399_PEA_1_T9, AA583399_PEA_1_T10, AA583399_PEA_1_T11 and AA583399_PEA_1_T17. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf001090_0001
Figure imgf001091_0001
Segment cluster AA583399JPEAJ jiodej according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T4. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001091_0002
Segment cluster AA583399_PEA_1_node_9 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T8 and AA583399_PEA_1_T9. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf001092_0001
Segment cluster AA583399_PEA_1_node_10 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7, AA583399_PEA_1_T8 and AA583399_PEA_1_T9. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001092_0002
Figure imgf001093_0001
Segment cluster AA583399JPEAJ_nodeJ2 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7, AA583399_PEA_1_T8 and AA583399_PEA_1_T9. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001093_0002
Segment cluster AA583399JPEAJ jιodeJ4 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T12 and AA583399_PEA_1_T16. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001094_0001
Segment cluster AA583399JPEAJ jnode .1 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T10, AA583399_PEA_1_T11, AA583399_PEA_1_T12, AA583399_PEA_1_T15, AA583399_PEA_1_T16 and AA583399_PEA_1_T17. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001094_0002
Segment cluster AA583399_PEA_1_node_24 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T11, AA583399_PEA_1_T12, AA583399_PEA_1_T15 and AA583399_PEA_1_T16. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf001095_0001
Segment cluster AA583399JPEAJ _nodeJ5 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T16. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Figure imgf001095_0002
Segment cluster AA583399_PEA_1_node_29 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T11, AA583399_PEA_1_T12 and AA583399_PEA_1_T15. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Figure imgf001096_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster AA583399_PEA_1_node_1 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7, AA583399_PEA_1_T8, AA583399_PEA_1_T9, AA583399_PEA_1_T10 and AA583399_PEA_1_T11. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Figure imgf001096_0002
Figure imgf001097_0001
Segment cluster AA583399_PEA_l_nodeJ according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7, AA583399_PEA_1_T8, AA583399_PEA_1_T9, AA583399_PEA_1_T10, AA583399_PEA_1_T11 and AA583399_PEA_1_T17. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Figure imgf001097_0002
Figure imgf001098_0001
Segment cluster AA583399_PEA_1_node_4 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T4, AA583399_PEA_1_T6, AA583399_PEA_1_T7 and AA583399_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Figure imgf001098_0002
Segment cluster AA583399_PEA_l_nodeJ according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7 and AA583399_PEA_1_T9. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Figure imgf001099_0001
Segment cluster AA583399_PEA_1_node_6 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7 and AA583399_PEA_1_T9. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Figure imgf001099_0002
Figure imgf001100_0001
Segment cluster AA583399_PEA_1_node_7 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5 and AA583399_PEA_1_T7. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Figure imgf001100_0002
Segment cluster AA583399_PEA_1_node_8 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7 and AA583399_PEA_1_T9. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Figure imgf001101_0001
Segment cluster AA583399_PEA_1_node_11 according to the present invention can be found in the following transcript(s): AA583399_PEA_1_T0, AA583399_PEA_1_T1, AA583399_PEA_1_T2, AA583399_PEA_1_T3, AA583399_PEA_1_T4, AA583399_PEA_1_T5, AA583399_PEA_1_T6, AA583399_PEA_1_T7 and AA583399_PEA_1_T8. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf001101_0002
Figure imgf001102_0001
Segment cluster AA583399_PEA_1_node_19 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T15. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf001102_0002
Segment cluster AA583399_PEA_1_node_27 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA583399_PEA_1_T11, AA583399_PEA_1_T12 and AA583399_PEA_1_T15. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001102_0003
Variant protein alignment to the previously known protein:
Sequence name: MYEO_HUMAN_V1
Sequence documentation:
Alignment of: AA583399_PEA_1_P2 x MYEO_HUMAN_V1
Alignment segment 1/1:
Quality: 2473.00 Escore: 0 Matching length: 255 Total length: 255 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRER 50
59 MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRER 108
51 NKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSAR 100
109 NKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSAR 158
101 VVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSPGR 150
159 VVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARMDVALRSPGR 208
151 GLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAG 200
209 GLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAG 258
201 SWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLI 250
259 SWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLI 308
251 IILTC 255
309 IILTC 313
Sequence name: MYEO_HUMAN_V1
Sequence documentation:
Alignment of: AA583399_PEA_1_P x MYEO_HUMAN_V1
Alignment segment 1/1:
Quality: 1607.00 Escore: 0 Matching length: 164 Total length: 164 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
28 RNSGSQSARVVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARM 77
150 RNSGSQSARVVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWARM 199
78 DVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCST 127
200 DVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCST 249
128 WGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLL 177
250 WGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPVSGASPLLL 299
178 HHLLLLLLIIILTC 191
300 HHLLLLLLIIILTC 313
Sequence name: MYEO_HUMAN_V2
Sequence documentation:
Alignment of: AA583399_PEA_1_P5 x MYEO_HUMAN_V2
Alignment segment 1/1:
Quality: 1206.00 Escore: 0 Matching length: 122 Total length: 122 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHR 50
192 MEIMWARMDVALRSPGRGLLAGAGALCMTLAESSCPDYERGRRACLTLHR 241
51 HPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPV 100
242 HPTPHCSTWGLPLRVAGSWLTVVTVEALGGWRMGVRRTGQVGPTMHPPPV 291
101 SGASPLLLHHLLLLLLIIILTC 122
292 SGASPLLLHHLLLLLLIIILTC 313
Sequence name: MYEO_HUMAN_V3
Sequence documentation:
Alignment of: AA583399_PEA_1_P10 x MYEO_HUMAN_V3
Alignment segment 1/1:
Quality: 2475.00 Escore: 0 Matching length: 255 Total length: 255 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
1 MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRER 50
59 MFTRQAGHFVEGSKAGRSRGRLCLSQALRVAVRGAFVSLWFAAGAGDRER 108
51 NKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSAR 100
109 NKGDKGAQTGAGLSQEAEDVDVSRARRVTDAPQGTLCGTGNRNSGSQSAR 158
101 AVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWAQMDVALRSPGR 150
159 AVGVAHLGEAFRVGVEQAISSCPEEVHGRHGLSMEIMWAQMDVALRSPGR 208
151 GLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAG 200
209 GLLAGAGALCMTLAESSCPDYERGRRACLTLHRHPTPHCSTWGLPLRVAG 258
201 SWLTVVTVEALGRWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLI 250
259 SWLTVVTVEALGRWRMGVRRTGQVGPTMHPPPVSGASPLLLHHLLLLLLI 308
251 IILTC 255
309 IILTC 313
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) AA583399 transcripts which are detectable by amplicon as depicted in sequence name AA583399seg30-32 in normal and cancerous colon tissues
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by or according to seg30-32, AA583399seg30-32 amplicon (SEQ ID NO: 1321) and AA583399seg30-32F (SEQ ID NO: 1319) and AA583399seg30-32R (SEQ ID NO: 1320) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 41 is a histogram showing over-expression of the above-indicated myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom.
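The normalization described above can be sketched as follows. This is an illustrative outline only, not the applicant's analysis code: the function name, the sample identifiers and all quantities are hypothetical toy values, and it assumes that the median is taken over the normalized quantities of the normal PM samples.

```python
from statistics import geometric_mean, median


def fold_up_regulation(target, housekeeping, normal_ids):
    """Return fold up-regulation per sample.

    target:       {sample_id: quantity of the amplicon of interest}
    housekeeping: {sample_id: [quantities of the housekeeping genes]}
    normal_ids:   sample ids of the normal (PM) samples used as reference
    """
    # Step 1: normalize each sample to the geometric mean of its housekeeping genes.
    normalized = {s: q / geometric_mean(housekeeping[s]) for s, q in target.items()}
    # Step 2: divide by the median normalized quantity of the normal samples
    # (assumption: the median is taken after normalization).
    reference = median(normalized[s] for s in normal_ids)
    return {s: value / reference for s, value in normalized.items()}


# Hypothetical toy quantities: three normal samples and one cancer sample.
target = {"N1": 2.0, "N2": 4.0, "N3": 6.0, "C1": 40.0}
housekeeping = {
    "N1": [1.0, 1.0, 1.0, 1.0],
    "N2": [2.0, 2.0, 2.0, 2.0],
    "N3": [1.0, 1.0, 1.0, 1.0],
    "C1": [1.0, 4.0, 2.0, 2.0],
}
folds = fold_up_regulation(target, housekeeping, ["N1", "N2", "N3"])

# Samples at or above the 5-fold threshold used in the text.
over_expressed = [s for s, f in folds.items() if f >= 5.0]
print(folds, over_expressed)
```

The geometric mean is used for the housekeeping genes, as stated in the text, so that a single highly expressed reference gene does not dominate the normalization factor.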
As is evident from Figure 41, the expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 27 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 6.50E-05. A threshold of 5-fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.56E-05 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA583399seg30-32F forward primer; and AA583399seg30-32R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA583399seg30-32.
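The threshold comparison above rests on a 2x2 contingency table (cancer versus normal, at or above versus below 5 fold). The sketch below shows how such a table can be evaluated with a two-sided Fisher's exact test using only the Python standard library. It is an assumption-laden illustration: the text does not state whether the reported test was one- or two-sided, nor how many normal samples reached the threshold, so the count of normal samples above threshold used here is hypothetical, and the total of 11 normal samples is simply the number of sample numbers listed (41, 52, 62-67, 69-71).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# From the text: 27 of 37 adenocarcinoma samples showed >= 5-fold over-expression.
cancer_above, cancer_total = 27, 37
# Assumption: all 11 listed normal samples were tested.
normal_total = 11
# HYPOTHETICAL: the text does not give the number of normal samples above threshold.
normal_above = 0

p_value = fisher_exact_two_sided(
    cancer_above,
    cancer_total - cancer_above,
    normal_above,
    normal_total - normal_above,
)
print(f"two-sided P value under the stated assumptions: {p_value:.3g}")
```

Because the normal-sample count is a placeholder, the printed value is not expected to reproduce the P value reported in the text.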
Forward primer (SEQ ID NO: 1319): TGGAGATTCCTGGTTTAAAGCATT
Reverse primer (SEQ ID NO: 1320): CCCCAGCTTAGAGCTGCACT
Amplicon (SEQ ID NO: 1321): TGGAGATTCCTGGTTTAAAGCATTTAAAGCCTCTGTGAAAATTTGCCCAGGCCAACAACTTCACTTTCCACACTCAGTGCCACGAAGTGCAGCTCTAAGCTGGGG
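A primer pair and its amplicon can be checked for internal consistency: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following is a minimal sketch using the seg30-32 sequences listed above; the function names are illustrative, and it assumes that the break in the printed amplicon is a layout artifact rather than a gap in the sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed for SEQ ID NOs: 1319, 1320 and 1321.
FORWARD = "TGGAGATTCCTGGTTTAAAGCATT"
REVERSE = "CCCCAGCTTAGAGCTGCACT"
AMPLICON = (
    "TGGAGATTCCTGGTTTAAAGCATTTAAAGCCTCTGTGAAAATTTGCCCAGGCCAACA"
    "ACTTCACTTTCCACACTCAGTGCCACGAAGTGCAGCTCTAAGCTGGGG"
)

consistent = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
print(consistent)
```

The same check can be applied to the other primer pairs and amplicons given in this section.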
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) AA583399 transcripts which are detectable by amplicon as depicted in sequence name AA583399seg17 in normal and cancerous colon tissues
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by or according to seg17, AA583399seg17 amplicon (SEQ ID NO: 1324) and AA583399seg17F (SEQ ID NO: 1322) and AA583399seg17R (SEQ ID NO: 1323) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 42 is a histogram showing over-expression of the above-indicated myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 42, the expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 22 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 2.37E-04. A threshold of 5-fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.42E-04 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA583399seg17F forward primer; and AA583399seg17R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA583399seg17.
Forward primer (SEQ ID NO: 1322): CATTCTCCACGCATCAGATGA
Reverse primer (SEQ ID NO: 1323): ACCATCAGATTGGCAGCATG
Amplicon (SEQ ID NO: 1324): CATTCTCCACGCATCAGATGATCCTGTGGCCCCTCAGTGCCAGGCCCCACTGGCCCTCTGCGCACATCAGTGACTCTGATGTTCTCCCCCACCGCATGCTGCCAATCTGATGGT
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) AA583399 transcripts which are detectable by amplicon as depicted in sequence name AA583399seg1 in normal and cancerous colon tissues
Expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by or according to seg1, AA583399seg1 amplicon (SEQ ID NO: 1327) and AA583399seg1F (SEQ ID NO: 1325) and AA583399seg1R (SEQ ID NO: 1326) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 43 is a histogram showing over-expression of the above-indicated myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 43, the expression of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 23 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.55E-05. A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.97E-04 as checked by the exact Fisher test. The above values demonstrate statistical significance of the results.
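The threshold test mentioned above compares how many cancer and normal samples exceed the fold-change cut-off. A minimal sketch of a two-sided Fisher exact test on such a 2x2 table is given below. It is an assumption that a two-sided test was used, and the count of normal samples above the threshold is not stated in the text, so the value in the usage line is hypothetical and the result is not expected to reproduce the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer / normal. Columns: above threshold / not above threshold.
    The P value is the sum of the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# 23 of 37 cancer samples above threshold (from the text); 0 of 11 normal
# samples above threshold is a HYPOTHETICAL value chosen for illustration.
p_value = fisher_exact_two_sided(23, 14, 0, 11)
```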
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA583399seg1F forward primer; and AA583399seg1R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA583399seg1.
Forward primer (SEQ ID NO: 1325): GAATCAGCCCAAAGCCAGG
Reverse primer (SEQ ID NO: 1326): GCTGGTGAAAGCACTGGGTT
Amplicon (SEQ ID NO: 1327): GAATCAGCCCAAAGCCAGGCGTCCAGGGTCTCCCTCACCTGAAGCTGACTTTTTCCCCACCTTGGACAGAGGGCGGGAGATGCCATCCCCACTGAACCCAGTGCTTTCACCAGC
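As a consistency check on the sequences listed above, an amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The sketch below applies that check to SEQ ID NOs: 1325-1327 as printed; the helper names are illustrative and not part of the application.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "GAATCAGCCCAAAGCCAGG"    # SEQ ID NO: 1325
REVERSE = "GCTGGTGAAAGCACTGGGTT"   # SEQ ID NO: 1326
AMPLICON = (                       # SEQ ID NO: 1327, joined across line breaks
    "GAATCAGCCCAAAGCCAGGCGTCCAGGGTCTCCCTCACCTGAAGCTGACTTTTTCCC"
    "CACCTTGGACAGAGGGCGGGAGATGCCATCCCCACTGAACCCAGTGCTTTCACCAG"
    "C"
)

consistent = primers_flank_amplicon(AMPLICON, FORWARD, REVERSE)
```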
DESCRIPTION FOR CLUSTER AI684092
Cluster AI684092 features 2 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001114_0001
Table 2 - Segments of interest
Figure imgf001115_0001
Table 3 - Proteins of interest
Figure imgf001115_0002
Cluster AI684092 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
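The "parts per million" ratio defined above can be illustrated with a short sketch. The weighting applied to EST counts is not specified in this passage, so the function below assumes plain (unweighted) counts, and the numbers in the usage line are hypothetical.

```python
def parts_per_million(cluster_est_count, category_est_count):
    """ESTs of one cluster in a tissue category, expressed per million
    ESTs of that category (unweighted counts are an assumption)."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return cluster_est_count * 1_000_000 / category_est_count


# Hypothetical example: 3 cluster ESTs among 200,000 ESTs in a category.
example_ppm = parts_per_million(3, 200_000)
```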
Overall, the following results were obtained as shown with regard to the histograms in Figure 44 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 4 - Normal tissue distribution
Figure imgf001116_0001
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf001116_0002
Figure imgf001117_0001
As noted above, cluster AI684092 features 2 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein AI684092_PEA_1_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AI684092_PEA_1_T2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein AI684092_PEA_1_P1 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AI684092_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf001117_0002
Figure imgf001118_0001
Variant protein AI684092_PEA_1_P1 is encoded by the following transcript(s): AI684092_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AI684092_PEA_1_T2 is shown in bold; this coding portion starts at position 1480 and ends at position 2058. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AI684092_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf001118_0002
Variant protein AI684092_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AI684092_PEA_1_T3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein AI684092_PEA_1_P3 is encoded by the following transcript(s): AI684092_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AI684092_PEA_1_T3 is shown in bold; this coding portion starts at position 28 and ends at position 279. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AI684092_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001119_0001
Figure imgf001120_0001
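The coding-portion coordinates quoted above for transcripts AI684092_PEA_1_T2 (positions 1480 to 2058) and AI684092_PEA_1_T3 (positions 28 to 279) can be converted into a codon count with a short sketch. It assumes the positions are 1-based and inclusive; whether the stated end position includes a stop codon is not specified in this passage, so the count is given in codons rather than amino acids.

```python
def coding_codon_count(start, end):
    """Number of codons in a coding portion given 1-based, inclusive
    start and end nucleotide positions (an assumed convention)."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("coding portion length must be a positive multiple of 3")
    return length // 3


codons_t2 = coding_codon_count(1480, 2058)  # AI684092_PEA_1_T2
codons_t3 = coding_codon_count(28, 279)     # AI684092_PEA_1_T3
```

Both coordinate pairs span a whole number of codons, which is consistent with an inclusive reading of the positions.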
As noted above, cluster AI684092 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster AI684092_PEA_1_node_0 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Figure imgf001120_0002
Segment cluster AI684092_PEA_1_node_2 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Figure imgf001121_0001
Segment cluster AI684092_PEA_1_node_4 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Figure imgf001121_0002
Segment cluster AI684092_PEA_1_node_5 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Figure imgf001121_0003
Segment cluster AI684092_PEA_1_node_6 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Figure imgf001122_0001
Segment cluster AI684092_PEA_1_node_7 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Figure imgf001122_0002
Segment cluster AI684092_PEA_1_node_8 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Figure imgf001123_0001
Segment cluster AI684092_PEA_1_node_9 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI684092_PEA_1_T2 and AI684092_PEA_1_T3. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf001123_0002
Example 1
Expression of AA5315451 transcripts which are detectable by amplicon as depicted in sequence name AA5315451seg8 in normal and cancerous colon tissues
Expression of AA5315457 transcripts detectable by or according to seg8, AA5315451seg8 amplicon (SEQ ID NO: 1330) and AA5315457F (SEQ ID NO: 1328) and AA5315457R (SEQ ID NO: 1329) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 45 is a histogram showing over-expression of the above-indicated AA5315457 transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 45, the expression of AA5315457 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1 above, "Tissue samples in testing panel").
Notably, an over-expression of at least 3 fold was found in 10 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of AA5315457 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.66E-05. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 5.33E-02 as checked by the exact Fisher test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA5315457F forward primer; and AA5315457R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA5315457.
Forward primer (SEQ ID NO: 1328): CATGGACCCCAGGCAAGTC
Reverse primer (SEQ ID NO: 1329): CTGTTTAGGGTCGAGGCTGTG
Amplicon (SEQ ID NO: 1330): CATGGACCCCAGGCAAGTCCCCCCACCCACGCATTTCTAATCATCTGCCCTGGTTTTGCCTCCTGAGTCTGTTAAGGCTGTGTGCCCCTCATCGAGGCCCGTCACAGCCTCGACCCTAAACAG
DESCRIPTION FOR CLUSTER HUMCACH1A
Cluster HUMCACH1A features 18 transcript(s) and 67 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001125_0001
Figure imgf001126_0001
Figure imgf001127_0001
Figure imgf001128_0001
Table 3 - Proteins of interest
Figure imgf001128_0002
Figure imgf001129_0001
These sequences are variants of the known protein Voltage-dependent L-type calcium channel alpha-1D subunit (SwissProt accession identifier CCAD_HUMAN; known also according to the synonyms Calcium channel, L type, alpha-1 polypeptide, isoform 2), SEQ ID NO: 790, referred to herein as the previously known protein. Protein Voltage-dependent L-type calcium channel alpha-1D subunit is known or believed to have the following function(s): Voltage-sensitive calcium channels (VSCC) mediate the entry of calcium ions into excitable cells and are also involved in a variety of calcium-dependent processes, including muscle contraction, hormone or neurotransmitter release, gene expression, cell motility, cell division and cell death. The isoform alpha-1D gives rise to L-type calcium currents. Long-lasting (L-type) calcium channels belong to the "high-voltage activated" (HVA) group. They are blocked by dihydropyridines (DHP), phenylalkylamines, benzothiazepines, and by omega-agatoxin-IIIA (omega-aga-IIIA). They are however insensitive to omega-conotoxin-GVIA (omega-CTx-GVIA) and omega-agatoxin-IVA (omega-aga-IVA). The sequence for protein Voltage-dependent L-type calcium channel alpha-1D subunit is given at the end of the application, as "Voltage-dependent L-type calcium channel alpha-1D subunit amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001130_0001
Protein Voltage-dependent L-type calcium channel alpha-1D subunit localization is believed to be Integral membrane protein.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: transport; cation transport; calcium ion transport, which are annotation(s) related to Biological Process; calcium binding; dihydropyridine-sensitive calcium channel, which are annotation(s) related to Molecular Function; and voltage-gated calcium channel; integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein Knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMCACH1A can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
Overall, the following results were obtained as shown with regard to the histograms in Figure 46 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Figure imgf001131_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001131_0002
Figure imgf001132_0001
As noted above, cluster HUMCACH1A features 18 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Voltage-dependent L-type calcium channel alpha-1D subunit. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCACH1A_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3 and HUMCACH1A_PEA_1_T4. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf001133_0001
Variant protein HUMCACH1A_PEA_1_P2 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3 and HUMCACH1A_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T0 is shown in bold; this coding portion starts at position 512 and ends at position 7054. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001133_0002
Figure imgf001134_0001
The coding portion of transcript HUMCACH1A_PEA_1_T1 is shown in bold; this coding portion starts at position 89 and ends at position 6631. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf001134_0002
Figure imgf001135_0001
The coding portion of transcript HUMCACH1A_PEA_1_T2 is shown in bold; this coding portion starts at position 512 and ends at position 7054. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001135_0002
Figure imgf001136_0001
The coding portion of transcript HUMCACH1A_PEA_1_T3 is shown in bold; this coding portion starts at position 512 and ends at position 7054. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf001137_0001
The coding portion of transcript HUMCACH1A_PEA_1_T4 is shown in bold; this coding portion starts at position 512 and ends at position 7054. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf001137_0002
Figure imgf001138_0001
Variant protein HUMCACH1A_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T6. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf001139_0001
Variant protein HUMCACH1A_PEA_1_P3 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T6 is shown in bold; this coding portion starts at position 512 and ends at position 6157. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf001139_0002
Figure imgf001140_0001
Variant protein HUMCACH1A_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T7. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Figure imgf001141_0001
Variant protein HUMCACH1A_PEA_1_P4 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T7 is shown in bold; this coding portion starts at position 512 and ends at position 7027. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Figure imgf001141_0002
Figure imgf001142_0001
Variant protein HUMCACH1A_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T8. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Figure imgf001143_0001
Variant protein HUMCACH1A_PEA_1_P5 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T8 is shown in bold; this coding portion starts at position 512 and ends at position 6994. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf001143_0002
Figure imgf001144_0001
Variant protein HUMCACH1A_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T12. An alignment is given to the known protein (Voltage-dependent L-type calcium channel alpha-1D subunit) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCACH1A_PEA_1_P7 and CCAD_HUMAN_V3 (SEQ ID NO:791):
1. An isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MPTSETESVNTENVSGEGENRGCCGSL corresponding to amino acids 466 - 492 of CCAD_HUMAN_V3, which also corresponds to amino acids 1 - 27 of HUMCACH1A_PEA_1_P7, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WCWWRPvRGAAKAGPSGCRRWG corresponding to amino acids 28 - 48 of HUMCACH1A_PEA_1_P7, and a third amino acid sequence being at least 90% homologous to
QAISKSKLSRRWRRWNRFNIJIRCRAAVKSVTFYWLVIVLVFLNTLTISSEHYNQPDWL TQIQDIANKVLLALFTCEMLVKMYSLGLQAYFVSLFNRFDCFVVCGGITETILVELEIMS PLGISVFRCVRLLRIFKVTRHWTSLSNLVASLLNSMKSIASLLLLLFLFIIIFSLLGMQLFG GKFNFDETQTKRSTFDNFPQALLTVFQILTGEDWNAVMYDGIMAYGGPSSSGMIVCIYF IILFICGNYILLNVFLAIAVDNLADAESLNTAQIa5EAEEKERKi ARl^SLENKKNNKPE VNQIANSDNKVTIDDYREEDEDKDPYPPCDVPVGEEEEEEEEDEPEVPAGPRPRRISELN MKEKIAPIPEGSAFFILSKTNPIRVGCHKLINHHIFTNLILVFIMLSSAALAAEDPIRSHSFR NTILGYFDYAFTAIFTVEILLKMTTFGAFLHKGAFCRNYFNLLDMLWGVSLVSFGIQSS AIS VVKILRVLRVLRPLRATNRAKGLKHVVQCVFVAIRTIGNIMIVTTLLQFMFACIGVQ LFKGKFYRCTDEAKSNPEECRGLFILYKDGDVDSPVVRERIWQNSDFNFDNVLSAMMA LFTVSTFEGWPALLYKAIDSNGENIGPIYNHRVEISIFFIIYIIIVAFFMMNIFVGFVIVTFQE QGEI^YK^CELDKNQRQCVEYALKARPLPJRYIPKNPYQYKFWYVVNSSPFEYMMFVL IMLNTLCLAMQHYEQSKMFNDAMDILNMVFTGVFTVEMVLKVIAFKPKGYFSDAWNT FDSLIVIGSIIDVALSEADPTESENVPVPTATPGNSEESNRISITFFRLFRVMRLVKLLSRGE GIRTLLWTFIKSFQALPYVALLIAMLFFIYAVIGMQMFGKVAMRDNNQINRNNNFQTFP QAVLLLFRCATGEAWQEIMLACLPGKLCDPESDYNPGEEYTCGSNFAIVYFISFYMLCA FLIINLFVAVIMDNFDYLTRDWSILGPHHLDEFI<aJWSEYDPEAKGRIKHLDVNTLLRRI QPPLGFGI<XCPHRVACKRLVAMΝMPLΝSDGTVMFΝATLFALVRTALKIKTEGΝLEQA NEELRAVIKΩWKKTSMKLLD
VGKYPAK^TTIALQAGLRTLHDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLG NHVNHVNSDRRDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNSI GKQ TSTNANLNNANMSI<^AHGI<ΠJPSIGNLEHVSENGHHSSHKHDREPQPVRSSVKRT RYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSRQN YGYYSRYPGRMDSERPRGYHHPQGFLEDDDSPVCYDSPP.SPPPPLLPPTPASHRRSSFN FECLRRQSSQEEVPSSPIFPHRTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATP PATPPYRDWTPCYTPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVP SSFRNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAA STLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL corresponding to amino acids 494 - 2161 of CCAD_HUMAN_V3, which also corresponds to amino acids 49 - 1716 of HUMCACH1A_PEA_1_P7, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMCACH1A_PEA_1_P7, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for WCWWRTPJ-GAAKAGPSGCRRWG, corresponding to HUMCACH1A_PEA_1_P7.
3. A bridge portion of HUMCACH1A_PEA_1_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise L, having a structure as follows (numbering according to HUMCACH1A_PEA_1_P7): a sequence starting from any of amino acid numbers 492-x to 492; and ending at any of amino acid numbers 28 + ((n-2) - x), in which x varies from 0 to n-2.
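The coordinate ranges given in the comparison report above can be checked for internal consistency with a short sketch. It uses only the amino acid positions quoted in the report; the data layout and function names are illustrative assumptions.

```python
# Each entry maps a range on the variant protein to the corresponding range
# on CCAD_HUMAN_V3 (None where the report gives no counterpart).
SEGMENTS = [
    {"variant": (1, 27), "known": (466, 492)},     # first sequence
    {"variant": (28, 48), "known": None},          # second (unique) sequence
    {"variant": (49, 1716), "known": (494, 2161)}, # third sequence
]


def span(position_range):
    """Length of a 1-based, inclusive (start, end) range."""
    start, end = position_range
    return end - start + 1


def is_contiguous(segments):
    """True if each variant range starts right after the previous one ends."""
    return all(
        nxt["variant"][0] == prev["variant"][1] + 1
        for prev, nxt in zip(segments, segments[1:])
    )


def lengths_agree(segments):
    """True if every mapped range has the same length on both proteins."""
    return all(
        seg["known"] is None or span(seg["variant"]) == span(seg["known"])
        for seg in segments
    )


total_length = sum(span(seg["variant"]) for seg in SEGMENTS)
```

With the positions as quoted, the three ranges are contiguous, the mapped ranges have equal lengths on both proteins, and the unique middle portion spans 21 amino acids.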
It should be noted that the known protein sequence (CCAD_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CCAD_HUMAN_V3. These changes were previously known to occur and are listed in the table below.
Table 19 - Changes to CCAD_HUMAN_V3
Figure imgf001146_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
Figure imgf001147_0001
Variant protein HUMCACH1A_PEA_1_P7 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T12 is shown in bold; this coding portion starts at position 240 and ends at position 5387. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Figure imgf001148_0001
Variant protein HUMCACH1A_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T13. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 22 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Amino acid mutations
Figure imgf001149_0001
Variant protein HUMCACH1A_PEA_1_P8 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T13 is shown in bold; this coding portion starts at position 512 and ends at position 88889. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Nucleic acid SNPs
Figure imgf001149_0002
Figure imgf001150_0001
Variant protein HUMCACH1A_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T14. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 24 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Amino acid mutations
Figure imgf001150_0002
Figure imgf001151_0001
Variant protein HUMCACH1A_PEA_1_P9 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T14 is shown in bold; this coding portion starts at position 512 and ends at position 5386. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Nucleic acid SNPs
Figure imgf001151_0002
Variant protein HUMCACH1A_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T15. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Amino acid mutations
Figure imgf001152_0001
Variant protein HUMCACH1A_PEA_1_P10 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T15 is shown in bold; this coding portion starts at position 512 and ends at position 88889. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Figure imgf001153_0001
Variant protein HUMCACH1A_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T16. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 28 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Amino acid mutations
Figure imgf001154_0001
Variant protein HUMCACH1A_PEA_1_P11 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T16 is shown in bold; this coding portion starts at position 512 and ends at position 88889. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Nucleic acid SNPs
Figure imgf001154_0002
Figure imgf001155_0001
Variant protein HUMCACH1A_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T17. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein.
Variant protein HUMCACH1A_PEA_1_P12 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T17 is shown in bold; this coding portion starts at position 1 and ends at position 2644. The transcript also has the following SNPs as listed in Table 30 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 30 - Nucleic acid SNPs
Figure imgf001155_0002
Figure imgf001156_0001
Variant protein HUMCACH1A_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T18. An alignment is given to the known protein (Voltage-dependent L-type calcium channel alpha-1D subunit) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCACH1A_PEA_1_P13 and CCAD_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P13, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRPRCLLRRTAHPPHSAPAPAPARSKCLGSWSNVLIRESSVWSLRL corresponding to amino acids 1 - 47 of HUMCACH1A_PEA_1_P13, and a second amino acid sequence being at least 90% homologous to DDEVTVGI<Π^YATFLIQDYFRI^KSJJ ΈQGLVGKYPAKNTTIALQAGLRTLHDIGPEIRR AISCDLQDDEPEETI<ΠJ5EEDDVFKIWGALLGNHVNHVNSDPJPJ)SLQQTNTTHRPLHVQ RPSIPPASDTEIAPLFPPAGNSVCHNHHNHNSIGKQWTSTNANLNNANMSKAAHGKRPS IGNLEHVSENGHHSSHKHDREPQRRSSVKRTRYΎETYIRSDSGDEQLPTICREDPEIHGY FRDPHCLGEQEYFSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLE DDDSPVCYDSP ISPPJJRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRTALPLHL MQQQTMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLIQVEQSEALDQ VNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSLVEAVLISEGL GRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVRPRANGDVGPLSHRQDYE LQDFGPGYSDEEPDPGRDEEDLADEMICITTL corresponding to amino acids 1598 - 2161 of CCAD_HUMAN, which also corresponds to amino acids 48 - 611 of HUMCACH1A_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HUMCACH1A_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRPRCLLRRTAHPPHSAPAPAPARSKCLGSWSNVLIRESSVWSLRL of HUMCACH1A_PEA_1_P13.
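The comparison report above pairs a residue range on the known protein with a range on the variant (amino acids 1598 - 2161 of CCAD_HUMAN with amino acids 48 - 611 of HUMCACH1A_PEA_1_P13). A minimal sketch of that coordinate mapping follows; the `SharedRegion` type and `known_to_variant` helper are hypothetical names introduced only for illustration, and positions are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Optional


class SharedRegion(NamedTuple):
    """A stretch shared by the known protein and a variant (1-based, inclusive)."""
    known_start: int
    known_end: int
    variant_start: int
    variant_end: int

    def is_consistent(self) -> bool:
        # Both ranges must cover the same number of residues.
        return (self.known_end - self.known_start) == (self.variant_end - self.variant_start)


def known_to_variant(region: SharedRegion, known_pos: int) -> Optional[int]:
    """Map a position on the known protein to the variant, or None if outside the region."""
    if not (region.known_start <= known_pos <= region.known_end):
        return None
    return known_pos - region.known_start + region.variant_start


# Ranges as stated in the comparison report for HUMCACH1A_PEA_1_P13.
P13_SHARED = SharedRegion(known_start=1598, known_end=2161, variant_start=48, variant_end=611)

if __name__ == "__main__":
    print(P13_SHARED.is_consistent())
    print(known_to_variant(P13_SHARED, 1598), known_to_variant(P13_SHARED, 2161))
```

The same structure could hold the ranges reported for the other variants, for example 1763 - 2161 of CCAD_HUMAN against 1 - 399 of HUMCACH1A_PEA_1_P14.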
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
The glycosylation sites of variant protein HUMCACH1A_PEA_1_P13, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 31 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 31 - Glycosylation site(s)
Figure imgf001157_0001
Figure imgf001158_0001
The phosphorylation sites of variant protein HUMCACH1A_PEA_1_P13, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 32 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 32 - Phosphorylation site(s)
Figure imgf001158_0002
Variant protein HUMCACH1A_PEA_1_P13 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T18 is shown in bold; this coding portion starts at position 63 and ends at position 1895. The transcript also has the following SNPs as listed in Table 33 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 33 - Nucleic acid SNPs
Figure imgf001158_0003
Figure imgf001159_0001
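The coding-portion coordinates quoted for each transcript can be cross-checked against the stated protein length. The sketch below assumes that positions are 1-based and inclusive and that the quoted range excludes the stop codon; under those assumptions the range 63 to 1895 given for HUMCACH1A_PEA_1_T18 spans 1833 nucleotides, or 611 codons, which agrees with the 611 amino acids reported for HUMCACH1A_PEA_1_P13. The function name is hypothetical.

```python
def coding_length_aa(start: int, end: int) -> int:
    """Number of codons in a coding portion given 1-based inclusive nucleotide positions.

    Assumes the range holds whole codons only (no stop codon, no partial codon).
    """
    n_nt = end - start + 1
    if n_nt <= 0:
        raise ValueError("end must not precede start")
    if n_nt % 3 != 0:
        raise ValueError(f"{n_nt} nucleotides is not a whole number of codons")
    return n_nt // 3


if __name__ == "__main__":
    # Coordinates as quoted in the text for transcripts T18, T19 and T22.
    for name, start, end in [("T18", 63, 1895), ("T19", 1820, 3016), ("T22", 512, 1750)]:
        print(name, coding_length_aa(start, end))
```

A range that raises `ValueError` here would simply indicate that one of the stated assumptions does not hold for that transcript, or that a quoted coordinate deserves a second look.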
Variant protein HUMCACH1A_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T19. An alignment is given to the known protein (Voltage-dependent L-type calcium channel alpha-1D subunit) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCACH1A_PEA_1_P14 and CCAD_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MSKAAHGKPJPSIGNLEHVSENGHHSSHi DREPQRRSSVKRTRYYETYIRSDSGDEQLP TICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERP RGYHHPQGFLEDDDSPVCYDSPJlSPRPvRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPI FPHRTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLI QVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSL VEAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVRPRANGDVG PLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL corresponding to amino acids 1763 - 2161 of CCAD_HUMAN, which also corresponds to amino acids 1 - 399 of HUMCACH1A_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
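The localization statements in this section follow a simple pattern: "membrane" when both trans-membrane region prediction programs report a trans-membrane region, and "intracellular" when neither does and both signal-peptide prediction programs call the protein non-secreted. The sketch below encodes only those two stated cases; the function name, the boolean inputs and the "undetermined" fallback for mixed results are assumptions made for illustration and are not described in the text.

```python
from enum import Enum
from typing import Tuple


class Localization(Enum):
    MEMBRANE = "membrane"
    INTRACELLULAR = "intracellular"
    UNDETERMINED = "undetermined"  # assumed fallback for cases the text does not cover


def call_localization(
    tm_predicted: Tuple[bool, bool],
    secreted_predicted: Tuple[bool, bool],
) -> Localization:
    """Combine two trans-membrane predictions and two signal-peptide predictions.

    tm_predicted: one flag per trans-membrane predictor, True if a region was found.
    secreted_predicted: one flag per signal-peptide predictor, True if secretion was predicted.
    """
    if all(tm_predicted):
        return Localization.MEMBRANE
    if not any(tm_predicted) and not any(secreted_predicted):
        return Localization.INTRACELLULAR
    return Localization.UNDETERMINED


if __name__ == "__main__":
    print(call_localization((True, True), (False, False)).value)
    print(call_localization((False, False), (False, False)).value)
```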
The glycosylation sites of variant protein HUMCACH1A_PEA_1_P14, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 34 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 34 - Glycosylation site(s)
Figure imgf001160_0001
The phosphorylation sites of variant protein HUMCACH1A_PEA_1_P14, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 35 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 35 - Phosphorylation site(s)
Figure imgf001160_0002
Variant protein HUMCACH1A_PEA_1_P14 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T19 is shown in bold; this coding portion starts at position 1820 and ends at position 3016. The transcript also has the following SNPs as listed in Table 36 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 36 - Nucleic acid SNPs
Figure imgf001161_0001
Variant protein HUMCACH1A_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T20. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein.
Variant protein HUMCACH1A_PEA_1_P15 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T20 is shown in bold; this coding portion starts at position 512 and ends at position 1732. The transcript also has the following SNPs as listed in Table 37 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 37 - Nucleic acid SNPs
Figure imgf001162_0001
Variant protein HUMCACH1A_PEA_1_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCACH1A_PEA_1_T22. An alignment is given to the known protein (Voltage-dependent L-type calcium channel alpha-1D subunit) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCACH1A_PEA_1_P17 and CCAD_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCACH1A_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTVLSWQ AAIDAARQAKAAQTMSTSAPPP VGSLSQRKRQQYAKSKKQGNSSNSRPARALFCLSLN NPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFPEDDSNSTNHNLEKVEYAFLIIFTVET FLKIIAYGLLLHPNAYVRNGWNLLDFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDV KALRAFRVLRPLRLVSGVPSLQVVLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMH KTCFFADSDTVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNFAFAMLT VFQCITMEGWTDVLYWMNDAMGFELPWVYFVSLVIFGSFFVLNLVLGVLSG corresponding to amino acids 1 - 407 of CCAD_HUMAN, which also corresponds to amino acids 1 - 407 of HUMCACH1A_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HGGSRL corresponding to amino acids 408 - 413 of HUMCACH1A_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCACH1A_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HGGSRL in HUMCACH1A_PEA_1_P17.
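The tiered percentages in these comparison reports (70%, 80%, 85%, 90%, 95%) can be illustrated with a simple calculation. The sketch below is not the homology measure used by the application, which is presumably defined elsewhere in the specification; it assumes plain position-by-position identity over two ungapped sequences of equal length, and the function names are hypothetical. For the six-residue tail HGGSRL, a single substitution leaves 5 of 6 residues identical (about 83.3%), which clears the 70% and 80% tiers but not 85%.

```python
from typing import List

THRESHOLDS = (70, 80, 85, 90, 95)  # percentage tiers named in the comparison reports


def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of identical residues between two equal-length, ungapped sequences."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def thresholds_met(candidate: str, reference: str) -> List[int]:
    """Return the tiers that the candidate reaches against the reference."""
    pid = percent_identity(candidate, reference)
    return [t for t in THRESHOLDS if pid >= t]


if __name__ == "__main__":
    tail = "HGGSRL"  # tail sequence of HUMCACH1A_PEA_1_P17 as given in the text
    print(thresholds_met("HGGSRL", tail))
    print(thresholds_met("HGGSKL", tail))  # hypothetical single-substitution candidate
```

A real comparison would normally use a gapped alignment and a substitution matrix, so this should be read only as an illustration of how the tiers nest.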
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein HUMCACH1A_PEA_1_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 38 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 38 - Amino acid mutations
Figure imgf001164_0001
The glycosylation sites of variant protein HUMCACH1A_PEA_1_P17, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 39 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 39 - Glycosylation site(s)
Figure imgf001164_0002
The phosphorylation sites of variant protein HUMCACH1A_PEA_1_P17, as compared to the known protein Voltage-dependent L-type calcium channel alpha-1D subunit, are described in Table 40 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 40 - Phosphorylation site(s)
Figure imgf001165_0001
Variant protein HUMCACH1A_PEA_1_P17 is encoded by the following transcript(s): HUMCACH1A_PEA_1_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCACH1A_PEA_1_T22 is shown in bold; this coding portion starts at position 512 and ends at position 1750. The transcript also has the following SNPs as listed in Table 41 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCACH1A_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 41 - Nucleic acid SNPs
Figure imgf001165_0002
Figure imgf001166_0001
As noted above, cluster HUMCACH1A features 67 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCACH1A_PEA_1_node_2 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf001166_0002
Figure imgf001167_0001
Segment cluster HUMCACH lA_PEA_l_nodeJ according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf001167_0002
Figure imgf001168_0001
Segment cluster HUMCACH1A_PEA_1_node_9 according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001168_0002
Figure imgf001169_0001
Segment cluster HUMCACH1A_PEA_1_node_11 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf001169_0002
Figure imgf001170_0001
Segment cluster HUMCACH1A_PEA_1_node_14 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf001170_0002
Segment cluster HUMCACH1A_PEA_1_node_16 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001171_0001
Segment cluster HUMCACHl A JPEAJ jnode 7 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T20. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf001172_0001
Segment cluster HUMCACHl AJPEAJ jiode JO according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T22. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf001172_0002
Segment cluster HUMCACHl AJPEAJ_nodeJ3 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf001173_0001
Segment cluster HUMCACH1A_PEA_1_node_41 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf001173_0002
Figure imgf001174_0001
Segment cluster HUMCACH1A_PEA_1_node_43 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf001174_0002
Figure imgf001175_0001
Segment cluster HUMCACH1A_PEA_1_node_45 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf001175_0002
Figure imgf001176_0001
Segment cluster HUMCACH1A_PEA_1_node_47 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf001176_0002
Segment cluster HUMCACHl AJPEAJ jiode 5 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf001177_0001
Segment cluster HUMCACHl A_PEA_l_nodeJ7 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf001178_0001
Segment cluster HUMCACH1A_PEA_1_node_70 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001179_0001
Segment cluster HUMCACH1A_PEA_1_node_72 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001180_0001
Segment cluster HUMCACH1A_PEA_1_node_74 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001180_0002
Figure imgf001181_0001
Segment cluster HUMCACHl AJPEAJ jnode 6 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001181_0002
Figure imgf001182_0001
Segment cluster HUMCACH1A_PEA_1_node_92 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001182_0002
Segment cluster HUMCACH1A_PEA_1_node_94 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001183_0001
Segment cluster HUMCACH1A_PEA_1_node_103 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T18. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf001184_0001
Segment cluster HUMCACH1A_PEA_1_node_104 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T17 and HUMCACH1A_PEA_1_T18. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf001184_0002
Figure imgf001185_0001
Segment cluster HUMCACH1A_PEA_1_node_106 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T19. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Figure imgf001185_0002
Segment cluster HUMCACH1A_PEA_1_node_109 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Figure imgf001185_0003
Figure imgf001186_0001
Segment cluster HUMCACH1A_PEA_1_node_113 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Figure imgf001186_0002
Figure imgf001187_0001
Segment cluster HUMCACH1A_PEA_1_node_114 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T6. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Figure imgf001187_0002
Segment cluster HUMCACH1A_PEA_1_node_116 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
Figure imgf001187_0003
Figure imgf001188_0001
Segment cluster HUMCACH1A_PEA_1_node_119 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Figure imgf001188_0002
Figure imgf001189_0001
Segment cluster HUMCACH1A_PEA_1_node_121 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Figure imgf001189_0002
Figure imgf001190_0001
Segment cluster HUMCACH1A_PEA_1_node_123 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Figure imgf001190_0002
Segment cluster HUMCACH1A_PEA_1_node_125 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Figure imgf001191_0001
Segment cluster HUMCACH1A_PEA_1_node_128 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T2. Table 74 below describes the starting and ending position of this segment on each transcript.
Table 74 - Segment location on transcripts
Figure imgf001192_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are described separately below.
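The separation of short segments from longer ones can be illustrated with a minimal sketch. It assumes that a segment's length is its inclusive span on a transcript and that "about 120 bp" is applied as a hard cutoff. The `SegmentLocation` type, the `split_by_length` helper, and the example names and coordinates are hypothetical and are not taken from the tables of this application.

```python
from dataclasses import dataclass

# "Up to about 120 bp" in the text; treated here as a hard cutoff (assumption).
SHORT_SEGMENT_MAX_BP = 120


@dataclass(frozen=True)
class SegmentLocation:
    """Location of one segment on one transcript (hypothetical record type)."""
    segment: str     # segment (node) name
    transcript: str  # transcript name
    start: int       # starting position on the transcript (assumed 1-based)
    end: int         # ending position on the transcript (assumed inclusive)

    @property
    def length(self) -> int:
        # Inclusive span between the starting and ending positions.
        return self.end - self.start + 1


def split_by_length(locations, max_short_bp=SHORT_SEGMENT_MAX_BP):
    """Return (long_segments, short_segments) using the length cutoff."""
    long_segments, short_segments = [], []
    for loc in locations:
        if loc.length <= max_short_bp:
            short_segments.append(loc)
        else:
            long_segments.append(loc)
    return long_segments, short_segments


# Hypothetical placeholder data, not values from the application.
EXAMPLE = [
    SegmentLocation("node_A", "transcript_X", 1, 300),
    SegmentLocation("node_B", "transcript_X", 301, 350),
    SegmentLocation("node_C", "transcript_X", 351, 470),
]

long_segments, short_segments = split_by_length(EXAMPLE)
```

With these placeholder coordinates, `node_A` spans 300 positions and is treated as a long segment, while `node_B` (50 positions) and `node_C` (exactly 120 positions) fall into the short group.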
Segment cluster HUMCACHIAJPEA JjnodeJ) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T1. Table 75 below describes the starting and ending position of this segment on each transcript.
Table 75 - Segment location on transcripts
Figure imgf001192_0002
Segment cluster HUMCACH1A_PEA_1_node_3 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 76 below describes the starting and ending position of this segment on each transcript.
Table 76 - Segment location on transcripts
Figure imgf001193_0001
Segment cluster HUMCACH1A_PEA_1_node_7 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16, HUMCACH1A_PEA_1_T20 and HUMCACH1A_PEA_1_T22. Table 77 below describes the starting and ending position of this segment on each transcript.
Table 77 - Segment location on transcripts
Figure imgf001194_0001
Segment cluster HUMCACH1A_PEA_1_node_23 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T8 and HUMCACH1A_PEA_1_T22. Table 78 below describes the starting and ending position of this segment on each transcript.
Table 78 - Segment location on transcripts
Figure imgf001194_0002
Segment cluster HUMCACH 1 A JΕAJ_nodeJ6 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15, HUMCACH1A_PEA_1_T16 and HUMCACH1A_PEA_1_T20. Table 79 below describes the starting and ending position of this segment on each transcript.
Table 79 - Segment location on transcripts
Figure imgf001195_0001
Segment cluster HUMCACHl A_PEA_l_nodeJ2 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T12. Table 80 below describes the starting and ending position of this segment on each transcript.
Table 80 - Segment location on transcripts
Figure imgf001196_0001
Segment cluster HUMCACHl A_PEA_l_nodeJ5 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 81 below describes the starting and ending position of this segment on each transcript.
Table 81 - Segment location on transcripts
Figure imgf001196_0002
Segment cluster HUMCACHl AJPEAJ _nodeJ 7 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 82 below describes the starting and ending position of this segment on each transcript.
Table 82 - Segment location on transcripts
Figure imgf001197_0001
Segment cluster HUMCACHl A_PEA_l_node 9 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 83 below describes the starting and ending position of this segment on each transcript.
Table 83 - Segment location on transcripts
Figure imgf001198_0001
Segment cluster HUMCACHl A JPEA Jjiode -9 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 84 below describes the starting and ending position of this segment on each transcript.
Table 84 - Segment location on transcripts
Figure imgf001199_0001
Segment cluster HUMCACH1A_PEA_1_node_51 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 85 below describes the starting and ending position of this segment on each transcript.
Table 85 - Segment location on transcripts
Figure imgf001199_0002
Figure imgf001200_0001
Segment cluster HUMCACH1A_PEA_1_node_53 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14, HUMCACH1A_PEA_1_T15 and HUMCACH1A_PEA_1_T16. Table 86 below describes the starting and ending position of this segment on each transcript.
Table 86 - Segment location on transcripts
Figure imgf001200_0002
Figure imgf001201_0001
Segment cluster HUMCACHl AJPEAJ jiodej 8 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T16. Table 87 below describes the starting and ending position of this segment on each transcript.
Table 87 - Segment location on transcripts
Figure imgf001201_0002
Segment cluster HUMCACHl AJΕAJ jiode O according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 88 below describes the starting and ending position of this segment on each transcript.
Table 88 - Segment location on transcripts
Figure imgf001202_0001
Segment cluster HUMCACH 1 AJPEA ljnode j52 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 89 below describes the starting and ending position of this segment on each transcript.
Table 89 - Segment location on transcripts
Figure imgf001202_0002
Figure imgf001203_0001
Segment cluster HUMCACHIAJPEAJ _nodeJ>4 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 90 below describes the starting and ending position of this segment on each transcript.
Table 90 - Segment location on transcripts
Figure imgf001203_0002
Figure imgf001204_0001
Segment cluster HUMCACHl AJΕA Jjnode >6 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 91 below describes the starting and ending position of this segment on each transcript.
Table 91 - Segment location on transcripts
Figure imgf001204_0002
Segment cluster HUMCACH1A_PEA_1_node_68 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 92 below describes the starting and ending position of this segment on each transcript.
Table 92 - Segment location on transcripts
Figure imgf001205_0001
Segment cluster HUMCACHIAJΕAJ _nodeJ6 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T15. Table 93 below describes the starting and ending position of this segment on each transcript.
Table 93 - Segment location on transcripts
Figure imgf001206_0001
Segment cluster HUMCACH1A_PEA_1_node_77 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T15. Table 94 below describes the starting and ending position of this segment on each transcript.
Table 94 - Segment location on transcripts
Figure imgf001206_0002
Figure imgf001207_0001
Segment cluster HUMCACH1A_PEA_1_node_79 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13 and HUMCACH1A_PEA_1_T14. Table 95 below describes the starting and ending position of this segment on each transcript.
Table 95 - Segment location on transcripts
Figure imgf001207_0002
Segment cluster HUMCACH1A_PEA_1_node_81 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T17. Table 96 below describes the starting and ending position of this segment on each transcript.
Table 96 - Segment location on transcripts
Figure imgf001208_0001
Segment cluster HUMCACH1A_PEA_1_node_84 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 97 below describes the starting and ending position of this segment on each transcript.
Table 97 - Segment location on transcripts
Figure imgf001208_0002
Figure imgf001209_0001
Segment cluster HUMCACH1A_PEA_1_node_88 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 98 below describes the starting and ending position of this segment on each transcript.
Table 98 - Segment location on transcripts
Figure imgf001209_0002
Segment cluster HUMCACH1A_PEA_1_node_90 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 99 below describes the starting and ending position of this segment on each transcript.
Table 99 - Segment location on transcripts
Figure imgf001210_0001
Segment cluster HUMCACH1A_PEA_1_node_96 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 100 below describes the starting and ending position of this segment on each transcript.
Table 100 - Segment location on transcripts
Figure imgf001211_0001
Segment cluster HUMCACH1A_PEA_1_node_98 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 101 below describes the starting and ending position of this segment on each transcript.
Table 101 - Segment location on transcripts
Figure imgf001212_0001
Segment cluster HUMCACH1A_PEA_1_node_100 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T14 and HUMCACH1A_PEA_1_T17. Table 102 below describes the starting and ending position of this segment on each transcript.
Table 102 - Segment location on transcripts
Figure imgf001212_0002
Figure imgf001213_0001
Segment cluster HUMCACH1A_PEA_1_node_101 according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T14. Table 103 below describes the starting and ending position of this segment on each transcript.
Table 103 - Segment location on transcripts
Figure imgf001213_0002
Segment cluster HUMCACH1A_PEA_1_node_107 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T13, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 104 below describes the starting and ending position of this segment on each transcript.
Table 104 - Segment location on transcripts
Figure imgf001214_0001
Segment cluster HUMCACH1A_PEA_1_node_111 according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 105 below describes the starting and ending position of this segment on each transcript.
Table 105 - Segment location on transcripts
Figure imgf001214_0002
Figure imgf001215_0001
Segment cluster HUMCACH1A_PEA_1_node_117 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T13. Table 106 below describes the starting and ending position of this segment on each transcript.
Table 106 - Segment location on transcripts
Figure imgf001215_0002
Segment cluster HUMCACH1A_PEA_1_node_124 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACH1A_PEA_1_T1, HUMCACH1A_PEA_1_T2, HUMCACH1A_PEA_1_T3, HUMCACH1A_PEA_1_T4, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 107 below describes the starting and ending position of this segment on each transcript.
Table 107 - Segment location on transcripts
Figure imgf001216_0001
Segment cluster HUMCACH1A_PEA_1_node_126 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCACH1A_PEA_1_T0, HUMCACHlAJΕAJ π, HUMCACH1A_PEA_1_T6, HUMCACH1A_PEA_1_T7, HUMCACH1A_PEA_1_T8, HUMCACH1A_PEA_1_T12, HUMCACH1A_PEA_1_T17, HUMCACH1A_PEA_1_T18 and HUMCACH1A_PEA_1_T19. Table 108 below describes the starting and ending position of this segment on each transcript.
Table 108 - Segment location on transcripts
Figure imgf001216_0002
Figure imgf001217_0001
Variant protein alignment to the previously known protein: Sequence name: CCAD_HUMAN_V3 Sequence documentation:
Alignment of: HUMCACH1A_PEA_1_P7 x CCAD_HUMAN_V3
Alignment segment 1/1
Quality: 16625.00 Escore: 0 Matching length: 1696 Total length: 1716 Matching Percent Similarity: 99.94 Matching Percent Identity: 99.94 Total Percent Similarity: 98.78 Total Percent Identity: 98.78 Gaps : 1
Alignment:
1 MPTSETESVNTENVSGEGENRGCCGSLWCWWRRRGAAKAGPSGCRRWGQA 50
466 MPTSETESVNTENVSGEGENRGCCGSL....................CQA 495

51 ISKSKLSRRWRRWNRFNRRRCRAAVKSVTFYWLVIVLVFLNTLTISSEHY 100
496 ISKSKLSRRWRRWNRFNRRRCRAAVKSVTFYWLVIVLVFLNTLTISSEHY 545

101 NQPDWLTQIQDIANKVLLALFTCEMLVKMYSLGLQAYFVSLFNRFDCFVV 150
546 NQPDWLTQIQDIANKVLLALFTCEMLVKMYSLGLQAYFVSLFNRFDCFVV 595

151 CGGITETILVELEIMSPLGISVFRCVRLLRIFKVTRHWTSLSNLVASLLN 200
596 CGGITETILVELEIMSPLGISVFRCVRLLRIFKVTRHWTSLSNLVASLLN 645

201 SMKSIASLLLLLFLFIIIFSLLGMQLFGGKFNFDETQTKRSTFDNFPQAL 250
646 SMKSIASLLLLLFLFIIIFSLLGMQLFGGKFNFDETQTKRSTFDNFPQAL 695

251 LTVFQILTGEDWNAVMYDGIMAYGGPSSSGMIVCIYFIILFICGNYILLN 300
696 LTVFQILTGEDWNAVMYDGIMAYGGPSSSGMIVCIYFIILFICGNYILLN 745

301 VFLAIAVDNLADAESLNTAQKEEAEEKERKKIARKESLENKKNNKPEVNQ 350
746 VFLAIAVDNLADAESLNTAQKEEAEEKERKKIARKESLENKKNNKPEVNQ 795

351 IANSDNKVTIDDYREEDEDKDPYPPCDVPVGEEEEEEEEDEPEVPAGPRP 400
796 IANSDNKVTIDDYREEDEDKDPYPPCDVPVGEEEEEEEEDEPEVPAGPRP 845

401 RRISELNMKEKIAPIPEGSAFFILSKTNPIRVGCHKLINHHIFTNLILVF 450
846 RRISELNMKEKIAPIPEGSAFFILSKTNPIRVGCHKLINHHIFTNLILVF 895

451 IMLSSAALAAEDPIRSHSFRNTILGYFDYAFTAIFTVEILLKMTTFGAFL 500
896 IMLSSAALAAEDPIRSHSFRNTILGYFDYAFTAIFTVEILLKMTTFGAFL 945

501 HKGAFCRNYFNLLDMLVVGVSLVSFGIQSSAISVVKILRVLRVLRPLRAI 550
946 HKGAFCRNYFNLLDMLVVGVSLVSFGIQSSAISVVKILRVLRVLRPLRAI 995

551 NRAKGLKHVVQCVFVAIRTIGNIMIVTTLLQFMFACIGVQLFKGKFYRCT 600
996 NRAKGLKHVVQCVFVAIRTIGNIMIVTTLLQFMFACIGVQLFKGKFYRCT 1045

601 DEAKSNPEECRGLFILYKDGDVDSPVVRERIWQNSDFNFDNVLSAMMALF 650
1046 DEAKSNPEECRGLFILYKDGDVDSPVVRERIWQNSDFNFDNVLSAMMALF 1095

651 TVSTFEGWPALLYKAIDSNGENIGPIYNHRVEISIFFIIYIIIVAFFMMN 700
1096 TVSTFEGWPALLYKAIDSNGENIGPIYNHRVEISIFFIIYIIIVAFFMMN 1145

701 IFVGFVIVTFQEQGEKEYKNCELDKNQRQCVEYALKARPLRRYIPKNPYQ 750
1146 IFVGFVIVTFQEQGEKEYKNCELDKNQRQCVEYALKARPLRRYIPKNPYQ 1195

751 YKFWYVVNSSPFEYMMFVLIMLNTLCLAMQHYEQSKMFNDAMDILNMVFT 800
1196 YKFWYVVNSSPFEYMMFVLIMLNTLCLAMQHYEQSKMFNDAMDILNMVFT 1245

801 GVFTVEMVLKVIAFKPKGYFSDAWNTFDSLIVIGSIIDVALSEADPTESE 850
1246 GVFTVEMVLKVIAFKPKGYFSDAWNTFDSLIVIGSIIDVALSEADPTESE 1295

851 NVPVPTATPGNSEESNRISITFFRLFRVMRLVKLLSRGEGIRTLLWTFIK 900
1296 NVPVPTATPGNSEESNRISITFFRLFRVMRLVKLLSRGEGIRTLLWTFIK 1345

901 SFQALPYVALLIAMLFFIYAVIGMQMFGKVAMRDNNQINRNNNFQTFPQA 950
1346 SFQALPYVALLIAMLFFIYAVIGMQMFGKVAMRDNNQINRNNNFQTFPQA 1395

951 VLLLFRCATGEAWQEIMLACLPGKLCDPESDYNPGEEYTCGSNFAIVYFI 1000
1396 VLLLFRCATGEAWQEIMLACLPGKLCDPESDYNPGEEYTCGSNFAIVYFI 1445

1001 SFYMLCAFLIINLFVAVIMDNFDYLTRDWSILGPHHLDEFKRIWSEYDPE 1050
1446 SFYMLCAFLIINLFVAVIMDNFDYLTRDWSILGPHHLDEFKRIWSEYDPE 1495

1051 AKGRIKHLDVVTLLRRIQPPLGFGKLCPHRVACKRLVAMNMPLNSDGTVM 1100
1496 AKGRIKHLDVVTLLRRIQPPLGFGKLCPHRVACKRLVAMNMPLNSDGTVM 1545

1101 FNATLFALVRTALKIKTEGNLEQANEELRAVIKKIWKKTSMKLLDQVVPP 1150
1546 FNATLFALVRTALKIKTEGNLEQANEELRAVIKKIWKKTSMKLLDQVVPP 1595

1151 AGDDEVTVGKFYATFLIQDYFRKFKKRKEQGLVGKYPAKNTTIALQAGLR 1200
1596 AGDDEVTVGKFYATFLIQDYFRKFKKRKEQGLVGKYPAKNTTIALQAGLR 1645

1201 TLHDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLGNHVNHVNS 1250
1646 TLHDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLGNHVNHVNS 1695

1251 DRRDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNS 1300
1696 DRRDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNS 1745

1301 IGKQVPTSTNANLNNANMSKAAHGKRPSIGNLEHVSENGHHSSHKHDREP 1350
1746 IGKQVPTSTNANLNNANMSKAAHGKRPSIGNLEHVSENGHHSSHKHDREP 1795

1351 QRRSSVKRTRYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEY 1400
1796 QRRSSVKRTRYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEY 1845

1401 FSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDD 1450
1846 FSSEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDD 1895

1451 SPVCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPH 1500
1896 SPVCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPH 1945

1501 RTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTP 1550
1946 RTALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTP 1995

1551 CYTPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPS 1600
1996 CYTPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPS 2045

1601 SFRNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLT 1650
2046 SFRNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLT 2095

1651 IDEMESAASTLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPG 1700
2096 IDEMESAASTLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPG 2145

1701 RDEEDLADEMICITTL 1716
2146 RDEEDLADEMICITTL 2161
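The identity percentages reported for this alignment can be reproduced from the reported lengths. The Python sketch below rests on assumptions that the text does not state: that "Matching Percent Identity" is the number of identical positions divided by the matching length, that "Total Percent Identity" is the same count divided by the total length, and that the aligned region contains exactly one mismatched pair. The function name is illustrative only.

```python
def percent_identity(identical: int, length: int) -> float:
    """Percentage of identical positions over a given length, to 2 decimals."""
    return round(100.0 * identical / length, 2)

matching_length = 1696  # aligned residue pairs, as reported
total_length = 1716     # alignment length including the gap, as reported
identical = matching_length - 1  # assumption: exactly one mismatched pair

print(percent_identity(identical, matching_length))  # 99.94
print(percent_identity(identical, total_length))     # 98.78
```

Under these assumptions both reported values (99.94 and 98.78) are recovered, and the difference between the two lengths (20 positions) is accounted for by the single reported gap.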
Sequence name: CCAD_HUMAN
Sequence documentation:
Alignment of: HUMCACH1A_PEA_1_P13 x CCAD_HUMAN
Alignment segment 1/1:
Quality: 5658.00 Escore: 0 Matching length: 564 Total length: 564 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
48 DDEVTVGKFYATFLIQDYFRKFKKRKEQGLVGKYPAKNTTIALQAGLRTL 97
1598 DDEVTVGKFYATFLIQDYFRKFKKRKEQGLVGKYPAKNTTIALQAGLRTL 1647

98 HDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLGNHVNHVNSDR 147
1648 HDIGPEIRRAISCDLQDDEPEETKREEEDDVFKRNGALLGNHVNHVNSDR 1697

148 RDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNSIG 197
1698 RDSLQQTNTTHRPLHVQRPSIPPASDTEKPLFPPAGNSVCHNHHNHNSIG 1747

198 KQVPTSTNANLNNANMSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQR 247
1748 KQVPTSTNANLNNANMSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQR 1797

248 RSSVKRTRYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFS 297
1798 RSSVKRTRYYETYIRSDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFS 1847

298 SEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDDSP 347
1848 SEECYEDDSSPTWSRQNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDDSP 1897

348 VCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRT 397
1898 VCYDSRRSPRRRLLPPTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRT 1947

398 ALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCY 447
1948 ALPLHLMQQQIMAVAGLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCY 1997

448 TPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSF 497
1998 TPLIQVEQSEALDQVNGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSF 2047

498 RNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLTID 547
2048 RNKNSDKQRSADSLVEAVLISEGLGRYARDPKFVSATKHEIADACDLTID 2097

548 EMESAASTLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRD 597
2098 EMESAASTLLNGNVRPRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRD 2147

598 EEDLADEMICITTL 611
2148 EEDLADEMICITTL 2161
Sequence name: CCAD_HUMAN
Sequence documentation:
Alignment of: HUMCACH1A_PEA_1_P14 x CCAD_HUMAN
Alignment segment 1/1:
Quality: 4021.00 Escore: 0 Matching length: 399 Total length: 399 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQRRSSVKRTRYYETYIR 50
1763 MSKAAHGKRPSIGNLEHVSENGHHSSHKHDREPQRRSSVKRTRYYETYIR 1812

51 SDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSR 100
1813 SDSGDEQLPTICREDPEIHGYFRDPHCLGEQEYFSSEECYEDDSSPTWSR 1862

101 QNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDDSPVCYDSRRSPRRRLLP 150
1863 QNYGYYSRYPGRNIDSERPRGYHHPQGFLEDDDSPVCYDSRRSPRRRLLP 1912

151 PTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRTALPLHLMQQQIMAVA 200
1913 PTPASHRRSSFNFECLRRQSSQEEVPSSPIFPHRTALPLHLMQQQIMAVA 1962

201 GLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLIQVEQSEALDQV 250
1963 GLDSSKAQKYSPSHSTRSWATPPATPPYRDWTPCYTPLIQVEQSEALDQV 2012

251 NGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSLV 300
2013 NGSLPSLHRSSWYTDEPDISYRTFTPASLTVPSSFRNKNSDKQRSADSLV 2062

301 EAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVR 350
2063 EAVLISEGLGRYARDPKFVSATKHEIADACDLTIDEMESAASTLLNGNVR 2112

351 PRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL 399
2113 PRANGDVGPLSHRQDYELQDFGPGYSDEEPDPGRDEEDLADEMICITTL 2161

Sequence name: CCAD_HUMAN
Sequence documentation:
Alignment of: HUMCACH1A_PEA_1_P17 x CCAD_HUMAN
Alignment segment 1/1:
Quality: 3976.00 Escore: 0 Matching length: 407 Total length: 407 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
1 MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTV 50
1 MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTV 50

51 LSWQAAIDAARQAKAAQTMSTSAPPPVGSLSQRKRQQYAKSKKQGNSSNS 100
51 LSWQAAIDAARQAKAAQTMSTSAPPPVGSLSQRKRQQYAKSKKQGNSSNS 100

101 RPARALFCLSLNNPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFP 150
101 RPARALFCLSLNNPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFP 150

151 EDDSNSTNHNLEKVEYAFLIIFTVETFLKIIAYGLLLHPNAYVRNGWNLL 200
151 EDDSNSTNHNLEKVEYAFLIIFTVETFLKIIAYGLLLHPNAYVRNGWNLL 200

201 DFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDVKALRAFRVLRPLRL 250
201 DFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDVKALRAFRVLRPLRL 250

251 VSGVPSLQVVLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMHKT 300
251 VSGVPSLQVVLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMHKT 300

301 CFFADSDIVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNF 350
301 CFFADSDIVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNF 350

351 AFAMLTVFQCITMEGWTDVLYWMNDAMGFELPWVYFVSLVIFGSFFVLNL 400
351 AFAMLTVFQCITMEGWTDVLYWMNDAMGFELPWVYFVSLVIFGSFFVLNL 400

401 VLGVLSG 407
401 VLGVLSG 407

Expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts which are detectable by seg 113, 35, 109, 125 in normal and cancerous colon tissues
Expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts detectable by or according to segments 113, 35, 109, 125 was measured with oligonucleotide-based micro-arrays. The results of image intensities for each feature were normalized according to the ninetieth percentile of the image intensities of all the features on the chip. Then, feature image intensities for replicates of the same oligonucleotide on the chip and replicates of the same sample were averaged. Outlying results were discarded. For every oligonucleotide (HUMCACH1A_0_0_14917, HUMCACH1A_0_0_14922, HUMCACH1A_0_0_14924 and HUMCACH1A_0_0_14913; SEQ ID NOs: 1331, 1332, 1333 and 1334, respectively), the averaged intensity determined for every sample was divided by the averaged intensity of all the normal samples (Sample Nos. 62-66 and 69, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the averaged normal samples. These data are presented in a histogram below (Fig. 47). As is evident from the histogram, the expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts detectable with the above oligonucleotides in cancer samples was higher than in the normal samples.
HUMCACH1A_0_0_14917 (SEQ ID NO: 1331):
AGAGAATATCACTCCGATGGTCGGTTTCTGACTGTCACGCTAAGGGCAAC
HUMCACH1A_0_0_14922 (SEQ ID NO: 1332):
GAACACAGAGAACGTCAGCGGTGAAGGCGAGAACCGAGGCTGCTGTGGAA
HUMCACH1A_0_0_14924 (SEQ ID NO: 1333):
GGCCCAGCATTGGGAACCTTGAGCATGTGTCTGAAAATGGGCATCATTCT
HUMCACH1A_0_0_14913 (SEQ ID NO: 1334):
GACTCAGGAGATGAACAGCTCCCAACTATTTGCCGGGAAGACCCAGAGAT
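The micro-array processing described above (ninetieth-percentile normalization per chip, then division by the average of the normal samples) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: the text does not say which percentile method was applied (a nearest-rank percentile is assumed here), outlier removal and replicate averaging are not modelled, and all function names, sample labels and intensity values are hypothetical.

```python
from statistics import mean

def percentile_90(values):
    """Nearest-rank 90th percentile (assumed method; not specified in the text)."""
    ordered = sorted(values)
    rank = max(1, -(-9 * len(ordered) // 10))  # ceil(0.9 * n)
    return ordered[rank - 1]

def normalize_chip(intensities):
    """Divide every feature intensity on one chip by that chip's 90th percentile."""
    p90 = percentile_90(intensities.values())
    return {feature: value / p90 for feature, value in intensities.items()}

def fold_up_regulation(sample_values, normal_ids):
    """Divide each sample's averaged intensity by the mean over the normal samples."""
    baseline = mean(sample_values[s] for s in normal_ids)
    return {s: v / baseline for s, v in sample_values.items()}

# Hypothetical averaged, normalized intensities for a single oligonucleotide.
averaged = {"normal_a": 0.5, "normal_b": 1.5, "tumor_a": 3.0}
folds = fold_up_regulation(averaged, ["normal_a", "normal_b"])
print(folds["tumor_a"])  # 3.0
```

A fold value above 1 for a cancer sample indicates higher signal than the averaged normal samples for that oligonucleotide.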
Expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 HUMCACH1A transcripts which are detectable by amplicon as depicted in sequence name HUMCACH1Aseg101 in normal and cancerous colon tissues
Expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts detectable by or according to seg101, HUMCACH1Aseg101 amplicon (SEQ ID NO: 1337) and HUMCACH1Aseg101F (SEQ ID NO: 1335), HUMCACH1Aseg101R (SEQ ID NO: 1336) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 48 is a histogram showing over-expression of the above-indicated Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
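The two normalization steps described for the real time PCR data (division by the geometric mean of the housekeeping genes, then division by the median of the normal samples) can be sketched as below. The function names, sample labels and quantities are hypothetical and serve only to illustrate the arithmetic; the real quantities would come from the PCR run.

```python
from statistics import geometric_mean, median

def normalized_quantity(target_qty, housekeeping_qtys):
    """Normalize the amplicon quantity of one RT sample to the geometric
    mean of the housekeeping-gene quantities measured in the same sample."""
    return target_qty / geometric_mean(housekeeping_qtys)

def fold_over_normal(normalized, normal_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[s] for s in normal_ids)
    return {sample: qty / baseline for sample, qty in normalized.items()}

# Hypothetical quantities: target amplicon, then four housekeeping genes.
normalized = {
    "normal_1": normalized_quantity(1.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_2": normalized_quantity(2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_3": normalized_quantity(3.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_1": normalized_quantity(16.0, [2.0, 2.0, 2.0, 2.0]),
}
folds = fold_over_normal(normalized, ["normal_1", "normal_2", "normal_3"])
over_expressed = [s for s, f in folds.items() if f >= 3.0]
print(over_expressed)
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with a very large quantity, which is presumably why it is used for the reference.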
As is evident from Figure 48, the expression of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 3 fold was found in 11 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.02E-03. A threshold of 3 fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.78E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
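The threshold P value can be illustrated with a one-sided Fisher's exact test on a 2x2 table. The sketch below assumes, which the text does not state, that none of the 11 normal samples listed (Sample Nos. 41, 52, 62-67, 69-71) reached the 3 fold threshold; the function name is illustrative. Under that assumption the result agrees with the reported value of 3.78E-02.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]:
    the probability of at least `a` positives in the first row, given the
    fixed row and column totals (hypergeometric upper tail)."""
    row1, row2 = a + b, c + d
    positives = a + c
    denom = comb(row1 + row2, positives)
    upper = min(row1, positives)
    tail = sum(comb(row1, k) * comb(row2, positives - k)
               for k in range(a, upper + 1))
    return tail / denom

# Rows: cancer, normal. Columns: at or above 3 fold, below 3 fold.
# 11 of 37 adenocarcinoma samples are reported; 0 of 11 normals is assumed.
p_value = fisher_exact_greater(11, 26, 0, 11)
print(f"{p_value:.2E}")  # 3.78E-02
```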
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMCACH1Aseg101F forward primer; and HUMCACH1Aseg101R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMCACH1Aseg101.
Forward primer (SEQ ID NO: 1335): CAGCAGGAAATTCGGTGTGTC
Reverse primer (SEQ ID NO: 1336): TCAAGGTTCCCAATGCTGG
Amplicon (SEQ ID NO: 1337): CAGCAGGAAATTCGGTGTGTCATAACCATCATAACCATAATTCCATAGGAAAGCAAGTTCCCACCTCAACAAATGCCAATCTCAATAATGCCAATATGTCCAAAGCTGCCCATGGAAAGCGGCCCAGCATTGGGAACCTTGA
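As a consistency check on the three sequences above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The short Python sketch below performs that check; the helper name is illustrative and the sequences are copied from the listing above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

forward = "CAGCAGGAAATTCGGTGTGTC"   # SEQ ID NO: 1335
reverse = "TCAAGGTTCCCAATGCTGG"     # SEQ ID NO: 1336
amplicon = (                         # SEQ ID NO: 1337
    "CAGCAGGAAATTCGGTGTGTCATAACCATCATAACCATAATTCCATAGGAAAGCAA"
    "GTTCCCACCTCAACAAATGCCAATCTCAATAATGCCAATATGTCCAAAGCTGCCCAT"
    "GGAAAGCGGCCCAGCATTGGGAACCTTGA"
)

print(amplicon.startswith(forward))                     # True
print(amplicon.endswith(reverse_complement(reverse)))   # True
```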
DESCRIPTION FOR CLUSTER HUMCEA
Cluster HUMCEA features 10 transcript(s) and 47 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001232_0001
Figure imgf001233_0001
Figure imgf001234_0001
Table 3 - Proteins of interest
Figure imgf001234_0002
These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SwissProt accession identifier CEA5_HUMAN; known also according to the synonyms Carcinoembryonic antigen; CEA; Meconium antigen 100; CD66e antigen), SEQ ID NO: 863, referred to herein as the previously known protein. The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor is given at the end of the application, as "Carcinoembryonic antigen-related cell adhesion molecule 5 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001235_0001
The localization of protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor is believed to be as follows: attached to the membrane by a GPI-anchor.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: integral plasma membrane protein; membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster HUMCEA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
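The "parts per million" figure defined above is a simple ratio scaled by one million. The Python sketch below shows only that arithmetic; the weighting applied to the EST counts is described elsewhere in the application and is not modelled, and the function name and counts are hypothetical.

```python
def expression_ppm(cluster_ests, category_ests):
    """ESTs of one cluster in a tissue category, per million ESTs of that category."""
    if category_ests <= 0:
        raise ValueError("category must contain at least one EST")
    return 1_000_000 * cluster_ests / category_ests

# Hypothetical counts: 25 cluster ESTs among 500,000 ESTs in one category.
print(expression_ppm(25, 500_000))  # 50.0
```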
Overall, the following results were obtained as shown with regard to the histograms in Figure 49 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.
Table 5 - Normal tissue distribution
Figure imgf001236_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001236_0002
Figure imgf001237_0001
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 7.
Table 7 - Oligonucleotides related to this cluster
Figure imgf001237_0002
As noted above, cluster HUMCEA features 10 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCEA_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T8. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P4 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVL corresponding to amino acids 1 - 234 of CEA5_HUMAN, which also corresponds to amino acids 1 - 234 of HUMCEA_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR corresponding to amino acids 235 - 315 of HUMCEA_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR in HUMCEA_PEA_1_P4.
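The coordinates in the comparison report can be checked against the tail sequence, and a candidate sequence can be compared with the tail using a simple identity percentage. The Python sketch below is illustrative only: it uses ungapped, position-by-position identity as a stand-in for "homologous", which is an assumption (the application defines homology elsewhere and may rely on alignment software), and the function name and the single-substitution candidate are hypothetical.

```python
TAIL = ("CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKN"
        "RRGGAASVLGGSGSTPYDGRNR")  # amino acids 235 - 315 of HUMCEA_PEA_1_P4

def percent_identity(candidate, reference):
    """Ungapped percent identity between two sequences of equal length."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)

# The tail spans positions 235 to 315, so it should be 81 residues long,
# and the head (1 - 234) plus the tail should give 315 residues in total.
print(len(TAIL) == 315 - 235 + 1)  # True
print(234 + len(TAIL))             # 315

# Hypothetical candidate differing from the tail at its first residue only.
candidate = "A" + TAIL[1:]
print(percent_identity(candidate, TAIL) >= 95.0)  # True
```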
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Figure imgf001239_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P4, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Figure imgf001239_0002
Figure imgf001240_0001
Variant protein HUMCEA_PEA_1_P4 is encoded by the following transcript(s): HUMCEA_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T8 is shown in bold; this coding portion starts at position 115 and ends at position 1059. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001241_0001
Figure imgf001242_0001
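The coding coordinates given for transcript HUMCEA_PEA_1_T8 (positions 115 to 1059) can be related to the length of the encoded variant protein with a short calculation. The Python sketch below assumes 1-based, inclusive positions, which the text does not state explicitly; the function name is illustrative.

```python
def codon_count(start, end):
    """Number of complete codons in a coding portion given by 1-based,
    inclusive nucleotide positions (assumed convention)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return length // 3

# Coding portion of HUMCEA_PEA_1_T8 as stated in the text.
print(codon_count(115, 1059))  # 315
```

The result, 315 codons, equals the length of variant protein HUMCEA_PEA_1_P4 (amino acids 1 - 315), which suggests that the stated coding portion does not include the stop codon.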
Variant protein HUMCEA_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T9. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P5 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVS corresponding to amino acids 1 - 675 of CEA5_HUMAN, which also corresponds to amino acids 1 - 675 of HUMCEA_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS corresponding to amino acids 676 - 719 of HUMCEA_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS in HUMCEA_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf001243_0001
Figure imgf001244_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P5, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Glycosylation site(s)
Figure imgf001244_0002
Figure imgf001245_0001
Variant protein HUMCEA_PEA_1_P5 is encoded by the following transcript(s): HUMCEA_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T9 is shown in bold; this coding portion starts at position 115 and ends at position 2271. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf001245_0002
Figure imgf001246_0001
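As a consistency check on the coordinates stated above, the coding portion of the transcript encoding the P5 variant (positions 115 to 2271) spans 2157 nucleotides, or 719 codons, which matches the 719 amino acids described for that variant. The following minimal Python sketch illustrates the arithmetic. The function name is hypothetical, and the 1-based inclusive coordinate convention is an assumption, since the application does not state it explicitly.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of whole codons in a coding range.

    Assumes 1-based, inclusive nucleotide coordinates (an assumption;
    the application does not state the convention explicitly).
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


# Coding portion stated for the transcript that encodes the P5 variant.
t9_codons = codon_count(115, 2271)
print(t9_codons)
```

Under this convention the stated range contains exactly as many codons as the variant has amino acids, which suggests the range does not include a stop codon; that reading is an inference, not a statement from the application.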
Variant protein HUMCEA_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T12. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P7 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDA PTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTC QAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWV NNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDD PTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQ ANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVN GQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTP IISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFV SNLATGRNNSIVKSITV corresponding to amino acids 1 - 674 of CEA5_HUMAN, which also corresponds to amino acids 1 - 674 of HUMCEA_PEA_1_P7, and a second amino acid sequence being at least 90% homologous to SAGATVGIMIGVLVGVALI corresponding to amino acids 684 - 702 of CEA5_HUMAN, which also corresponds to amino acids 675 - 693 of HUMCEA_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise VS, having a structure as follows: a sequence starting from any of amino acid numbers 674-x to 674; and ending at any of amino acid numbers 675 + ((n-2) - x), in which x varies from 0 to n-2.
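The edge-portion definition above can be read as a family of length-n windows that all span the junction between amino acids 674 and 675 (the residues V and S). The following Python sketch enumerates those windows for a given n. It is an illustration only: the function name is hypothetical, positions are taken as 1-based and inclusive, and no check is made that a window stays within the bounds of the protein.

```python
from typing import Iterator, Tuple


def edge_windows(left: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) windows of length n that span the junction.

    `left` is the last residue before the junction (674 in the text), so
    every window contains both `left` and `left + 1`. Positions are
    1-based and inclusive, following the numbering used in the text.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    for x in range(n - 1):            # x varies from 0 to n-2
        start = left - x              # any of 674-x to 674
        end = left + 1 + (n - 2) - x  # 675 + ((n-2) - x)
        yield start, end


windows = list(edge_windows(674, 10))
print(windows[0], windows[-1], len(windows))
```

For n = 10 this gives nine windows, from positions 674 to 683 (x = 0) through positions 666 to 675 (x = 8), each exactly ten residues long and each containing the two junction residues.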
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure. Variant protein HUMCEA_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
Figure imgf001248_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P7, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 15 - Glycosylation site(s)
Figure imgf001248_0002
Figure imgf001249_0001
Variant protein HUMCEA_PEA_1_P7 is encoded by the following transcript(s): HUMCEA_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T12 is shown in bold; this coding portion starts at position 115 and ends at position 2193. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Figure imgf001250_0001
Variant protein HUMCEA_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T16. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P10 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDS corresponding to amino acids 1 - 228 of CEA5_HUMAN, which also corresponds to amino acids 1 - 228 of HUMCEA_PEA_1_P10, and a second amino acid sequence being at least 90% homologous to VILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNI TEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEA QNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPV TLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITP NNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 407 - 702 of CEA5_HUMAN, which also corresponds to amino acids 229 - 524 of HUMCEA_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SV, having a structure as follows: a sequence starting from any of amino acid numbers 228-x to 228; and ending at any of amino acid numbers 229 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HUMCEA_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Figure imgf001252_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P10, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Glycosylation site(s)
Figure imgf001253_0001
Variant protein HUMCEA_PEA_1_P10 is encoded by the following transcript(s): HUMCEA_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T16 is shown in bold; this coding portion starts at position 115 and ends at position 1686. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Figure imgf001254_0001
Figure imgf001255_0001
Variant protein HUMCEA_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T20. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMCEA_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
Figure imgf001255_0002
Figure imgf001256_0001
Variant protein HUMCEA_PEA_1_P14 is encoded by the following transcript(s): HUMCEA_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T20 is shown in bold; this coding portion starts at position 115 and ends at position 1821. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Figure imgf001256_0002
Variant protein HUMCEA_PEA_1_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T25. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P19 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P19, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWV NNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILN corresponding to amino acids 1 - 232 of CEA5_HUMAN, which also corresponds to amino acids 1 - 232 of HUMCEA_PEA_1_P19, and a second amino acid sequence being at least 90% homologous to VLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 589 - 702 of CEA5_HUMAN, which also corresponds to amino acids 233 - 346 of HUMCEA_PEA_1_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P19, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 232-x to 232; and ending at any of amino acid numbers 233 + ((n-2) - x), in which x varies from 0 to n-2.
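The comparison report above defines two aligned blocks between the P19 variant and the known protein: variant amino acids 1 - 232 align to known amino acids 1 - 232, and variant amino acids 233 - 346 align to known amino acids 589 - 702. The following Python sketch converts a variant position into the corresponding known-protein position under that block structure. The function and constant names are hypothetical, and the sketch assumes the blocks align residue for residue without internal gaps, which the application's "at least 90% homologous" language does not strictly guarantee.

```python
from typing import List, Optional, Tuple

# (variant_start, variant_end, known_start) for each aligned block, taken
# from the comparison report for the P19 variant versus the known protein.
P19_BLOCKS: List[Tuple[int, int, int]] = [
    (1, 232, 1),
    (233, 346, 589),
]


def variant_to_known(pos: int,
                     blocks: List[Tuple[int, int, int]] = P19_BLOCKS
                     ) -> Optional[int]:
    """Map a 1-based variant position to the known-protein position.

    Returns None when the position lies outside every aligned block.
    Assumes gap-free, residue-for-residue alignment within each block.
    """
    for v_start, v_end, k_start in blocks:
        if v_start <= pos <= v_end:
            return k_start + (pos - v_start)
    return None


# The two residues on either side of the junction (N at 232, V at 233).
print(variant_to_known(232), variant_to_known(233))
```

The junction residues at variant positions 232 and 233 map to known positions 232 and 589 respectively, which makes explicit that the variant skips known amino acids 233 - 588.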
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure. Variant protein HUMCEA_PEA_1_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 22 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Amino acid mutations
Figure imgf001258_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P19, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 23 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 23 - Glycosylation site(s)
Figure imgf001258_0002
Figure imgf001259_0001
Variant protein HUMCEA_PEA_1_P19 is encoded by the following transcript(s): HUMCEA_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T25 is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Figure imgf001260_0001
Variant protein HUMCEA_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T26. An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCEA_PEA_1_P20 and CEA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQ HLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYT LHVIKSDLVNEEATGQFRVYP corresponding to amino acids 1 - 142 of CEA5_HUMAN, which also corresponds to amino acids 1 - 142 of HUMCEA_PEA_1_P20, and a second amino acid sequence being at least 90% homologous to ELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLT LFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHS ASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASG TSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 499 - 702 of CEA5_HUMAN, which also corresponds to amino acids 143 - 346 of HUMCEA_PEA_1_P20, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142-x to 142; and ending at any of amino acid numbers 143 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure. Variant protein HUMCEA_PEA_1_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
Figure imgf001262_0001
The glycosylation sites of variant protein HUMCEA_PEA_1_P20, as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor, are described in Table 26 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 26 - Glycosylation site(s)
Figure imgf001263_0001
Figure imgf001264_0001
Variant protein HUMCEA_PEA_1_P20 is encoded by the following transcript(s): HUMCEA_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T26 is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Figure imgf001264_0002
Figure imgf001265_0001
As noted above, cluster HUMCEA features 47 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCEA_PEA_1_node_0 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001265_0002
Segment cluster HUMCEA_PEA_1_node_2 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001266_0001
Segment cluster HUMCEAJPEAJ jiode > according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T14. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001266_0002
Segment cluster HUMCEA_PEA_1_node_11 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001267_0001
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 32.
Table 32 - Oligonucleotides related to this segment
Figure imgf001267_0002
Segment cluster HUMCEA_PEA_1_node_12 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Figure imgf001267_0003
Figure imgf001268_0001
Segment cluster HUMCEA JΕA Jjnode J 1 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T20. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Figure imgf001268_0002
Segment cluster HUMCEA_PEA_1_node_36 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T26. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Figure imgf001269_0001
Segment cluster HUMCEA_PEA_1_node_42 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T29. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Figure imgf001269_0002
Segment cluster HUMCEA_PEA_1_node_43 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T29. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Figure imgf001269_0003
Segment cluster HUMCEA_PEA_1_node_44 according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26 and HUMCEA_PEA_1_T29. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Figure imgf001270_0001
Segment cluster HUMCEA_PEA_1_node_46 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Figure imgf001270_0002
Segment cluster HUMCEA_PEA_1_node_48 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T30. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Figure imgf001271_0001
Segment cluster HUMCEA_PEA_1_node_63 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Figure imgf001271_0002
Segment cluster HUMCEA_PEA_1_node_65 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf001272_0001
Segment cluster HUMCEA_PEA_1_node_67 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T20. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf001272_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMCEAJPEAJ jnode J according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20, HUMCEA_PEA_1_T25 and HUMCEA_PEA_1_T26. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001273_0001
Segment cluster HUMCEA_PEA_1_node_7 according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf001274_0001
Segment cluster HUMCEA_PEA_1_node_8 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf001274_0002
Segment cluster HUMCEA_PEA_1_node_9 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001275_0001
Segment cluster HUMCEA_PEA_1_node_10 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T20 and HUMCEA_PEA_1_T25. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf001275_0002
Figure imgf001276_0001
Segment cluster HUMCEA_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf001276_0002
Segment cluster HUMCEA_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf001276_0003
Figure imgf001277_0001
Segment cluster HUMCEA_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf001277_0002
Segment cluster HUMCEA_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf001277_0003
Figure imgf001278_0001
Segment cluster HUMCEA_PEA_1_node_19 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf001278_0002
Segment cluster HUMCEA_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf001278_0003
Figure imgf001279_0001
Segment cluster HUMCEA_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf001279_0002
Segment cluster HUMCEA_PEA_1_node_22 according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf001279_0003
Figure imgf001280_0001
Segment cluster HUMCEA_PEA_1_node_23 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001280_0002
Segment cluster HUMCEA_PEA_1_node_24 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14 and HUMCEA_PEA_1_T20. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001281_0001
Segment cluster HUMCEA_PEA_1_node_27 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T20. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001281_0002
Segment cluster HUMCEA_PEA_1_node_29 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T20. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001282_0001
Segment cluster HUMCEA_PEA_1_node_30 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T20. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001282_0002
Segment cluster HUMCEA_PEA_1_node_33 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T26. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001283_0001
Segment cluster HUMCEA_PEA_1_node_34 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T26. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf001283_0002
Segment cluster HUMCEA_PEA_1_node_35 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T9, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16 and HUMCEA_PEA_1_T26. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf001284_0001
Segment cluster HUMCEA_PEA_1_node_45 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Figure imgf001284_0002
Segment cluster HUMCEA_PEA_1_node_49 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T30. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Figure imgf001285_0001
Segment cluster HUMCEA_PEA_1_node_50 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Figure imgf001285_0002
Segment cluster HUMCEA_PEA_1_node_51 according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Figure imgf001286_0001
Segment cluster HUMCEA_PEA_1_node_56 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
Figure imgf001286_0002
Figure imgf001287_0001
Segment cluster HUMCEA_PEA_1_node_57 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Figure imgf001287_0002
Segment cluster HUMCEA_PEA_1_node_58 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Figure imgf001288_0001
Segment cluster HUMCEA_PEA_1_node_60 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Figure imgf001288_0002
Figure imgf001289_0001
Segment cluster HUMCEA_PEA_1_node_61 according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Figure imgf001289_0002
Segment cluster HUMCEA_PEA_1_node_62 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 74 below describes the starting and ending position of this segment on each transcript.
Table 74 - Segment location on transcripts
Figure imgf001290_0001
Segment cluster HUMCEA_PEA_1_node_64 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8, HUMCEA_PEA_1_T12, HUMCEA_PEA_1_T14, HUMCEA_PEA_1_T16, HUMCEA_PEA_1_T25, HUMCEA_PEA_1_T26, HUMCEA_PEA_1_T29 and HUMCEA_PEA_1_T30. Table 75 below describes the starting and ending position of this segment on each transcript.
Table 75 - Segment location on transcripts
Figure imgf001290_0002
Figure imgf001291_0001
Variant protein alignment to the previously known protein:
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P4 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 2320.00
Escore: 0
Matching length: 234
Total length: 234
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVL 234
    ||||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVL 234
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P5 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 6692.00
Escore: 0
Matching length: 675
Total length: 675
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

651 GTYACFVSNLATGRNNSIVKSITVS 675
    |||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVS 675
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P7 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 6745.00
Escore: 0
Matching length: 693
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 98.72
Total Percent Identity: 98.72
Gaps: 1
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

651 GTYACFVSNLATGRNNSIVKSITVS.........AGATVGIMIGVLVGVA 691
    |||||||||||||||||||||||||         ||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

692 LI 693
    ||
701 LI 702
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P10 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 5057.00
Escore: 0
Matching length: 524
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 74.64
Total Percent Identity: 74.64
Gaps: 1
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDS...................... 228
    ||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

228 .................................................. 228
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

228 .................................................. 228
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

228 .................................................. 228
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

229 ......VILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 272
          ||||||||||||||||||||||||||||||||||||||||||||
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

273 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 322
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

323 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 372
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

373 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 422
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

423 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 472
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

473 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 522
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

523 LI 524
    ||
701 LI 702
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P19 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 3298.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

201 TLFNVTRNDTASYKCETQNPVSARRSDSVILN.................. 232
    ||||||||||||||||||||||||||||||||
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

232 .................................................. 232
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

232 .................................................. 232
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

232 .................................................. 232
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

232 .................................................. 232
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

232 .................................................. 232
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

232 .................................................. 232
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

233 ......................................VLYGPDTPIISP 244
                                          ||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

245 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 294
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

295 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 344
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

345 LI 346
    ||
701 LI 702
Sequence name: CEA5_HUMAN
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P20 x CEA5_HUMAN
Alignment segment 1/1:
Quality: 3294.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:

  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKE 50

 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 VLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREI 100

101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYP........ 142
    ||||||||||||||||||||||||||||||||||||||||||
101 IYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSIS 150

142 .................................................. 142
151 SNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTL 200

142 .................................................. 142
201 TLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYR 250

142 .................................................. 142
251 SGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQ 300

142 .................................................. 142
301 AHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQ 350

142 .................................................. 142
351 NTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELS 400

142 .................................................. 142
401 VDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWL 450

143 ................................................EL 144
                                                    ||
451 IDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAEL 500

145 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 194
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 PKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLS 550

195 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 244
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISP 600

245 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 294
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 PDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNN 650

295 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 344
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 GTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVA 700

345 LI 346
    ||
701 LI 702
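The summary statistics reported with the alignments above appear to be related by simple arithmetic: the total percent identity is consistent with the matching percent identity scaled by the ratio of matching length to total length. The following is a minimal sketch of that relationship, not a description of the alignment software itself; the function name and the dictionary are illustrative assumptions, and only the lengths and percentages are taken from the text.

```python
def total_percent_identity(matching_length: int, total_length: int,
                           matching_identity: float = 100.0) -> float:
    """Identity over the whole alignment, in percent.

    Assumed relationship (inferred from the reported figures):
    total identity = matching identity * matching length / total length.
    """
    return matching_identity * matching_length / total_length


# (matching length, total length) as reported for each variant above.
reported_lengths = {
    "HUMCEA_PEA_1_P4": (234, 234),
    "HUMCEA_PEA_1_P7": (693, 702),
    "HUMCEA_PEA_1_P10": (524, 702),
    "HUMCEA_PEA_1_P19": (346, 702),
}

totals = {
    name: round(total_percent_identity(matching, total), 2)
    for name, (matching, total) in reported_lengths.items()
}

for name, value in totals.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption the computed values agree with the reported total percent identities of 100.00, 98.72, 74.64 and 49.29.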
Expression of Carcinoembryonic antigen-related cell adhesion molecule 5 transcripts which are detectable by seg12 and seg9, in normal and cancerous colon tissues
Expression of Carcinoembryonic antigen-related cell adhesion molecule 5 transcripts detectable by or according to seg12 and seg9 was measured with oligonucleotide-based micro-arrays. The image intensities for each feature were normalized according to the ninetieth percentile of the image intensities of all the features on the chip. Then, feature image intensities for replicates of the same oligonucleotide on the chip and replicates of the same sample were averaged. Outlying results were discarded.
For every oligonucleotide, HUMCEA_0_0_96 (seg12, SEQ ID NO: 1338) and HUMCEA_0_0_15168 (seg9, SEQ ID NO: 1339), the averaged intensity determined for every sample was divided by the averaged intensity of all the normal samples (Sample Nos. 62-66 and 69, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the averaged normal samples. These data are presented in a histogram below, in Figure 50. As is evident from the histogram (Figure 50), the expression of Carcinoembryonic antigen-related cell adhesion molecule 5 transcripts detectable with the above oligonucleotides in cancer samples was higher than in the normal samples.
HUMCEA_0_0_96 (SEQ ID NO: 1338): CAAGAGGGGTTTGGCTGAGACTTTAGGATTGTGATTCAGCTTAGAGGGAC
HUMCEA_0_0_15168 (SEQ ID NO: 1339): TCCTGCCTGTCACCTGAAGTTCTAGATCATTCCCTGGACTCCACTCTATC
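The micro-array processing described above (scaling by the chip's ninetieth percentile, averaging replicates, then dividing by the average of the normal samples) can be sketched as follows. This is an illustrative sketch only: the function names, the linear-interpolation percentile convention and the toy sample identifiers are assumptions, and the outlier-discarding step is omitted because its criterion is not stated in the text.

```python
from statistics import mean


def percentile(values, pct):
    """Percentile by linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def normalize_chip(intensities):
    """Scale every feature on one chip by that chip's 90th-percentile intensity."""
    p90 = percentile(list(intensities.values()), 90)
    return {feature: value / p90 for feature, value in intensities.items()}


def average_replicates(replicate_values):
    """Average replicate measurements of one oligonucleotide in one sample."""
    return mean(replicate_values)


def fold_up_regulation(sample_means, normal_ids):
    """Divide each sample's averaged intensity by the mean over normal samples."""
    baseline = mean(sample_means[s] for s in normal_ids)
    return {sample: value / baseline for sample, value in sample_means.items()}


# Toy data with hypothetical sample names: two normal samples and one tumor sample.
sample_means = {
    "normal_a": average_replicates([0.9, 1.1]),  # 1.0
    "normal_b": average_replicates([3.0, 3.0]),  # 3.0
    "tumor_a": average_replicates([6.0, 6.0]),   # 6.0
}
folds = fold_up_regulation(sample_means, ["normal_a", "normal_b"])
print(folds["tumor_a"])
```

In the toy data the normal baseline is 2.0, so the tumor sample shows a three-fold up-regulation.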
Expression of Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg31 in normal and cancerous colon tissues
Expression of CEACAM5 transcripts detectable by or according to seg31, HUMCEA (ver 3.4 T10888) seg31 amplicon (SEQ ID NO: 1342) and HUMCEA seg31-F (SEQ ID NO: 1340) and HUMCEA seg31-R (SEQ ID NO: 1341) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 51 is a histogram showing over-expression of the above-indicated CEACAM5 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 51, the expression of CEACAM5 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 3 fold was found in 9 out of 37 adenocarcinoma samples.
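The real time PCR normalization described above can be illustrated with a short sketch: each sample's amplicon quantity is divided by the geometric mean of its four housekeeping-gene quantities, the result is divided by the median of the normal post-mortem samples, and samples reaching the 3 fold threshold are counted. The function names, sample identifiers and quantities below are hypothetical and serve only to show the arithmetic; they are not data from the experiment.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity relative to the geometric mean of housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_changes(normalized, normal_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[s] for s in normal_ids)
    return {sample: qty / baseline for sample, qty in normalized.items()}


def count_over_expressed(folds, sample_ids, threshold=3.0):
    """Number of the given samples with fold change at or above the threshold."""
    return sum(1 for s in sample_ids if folds[s] >= threshold)


# Hypothetical quantities: (target amplicon, [PBGD, HPRT1, G6PD, RPS27A]).
raw = {
    "normal_1": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_2": (4.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_3": (6.0, [1.0, 4.0, 2.0, 2.0]),
    "tumor_1": (24.0, [1.0, 4.0, 2.0, 2.0]),
    "tumor_2": (8.0, [1.0, 4.0, 2.0, 2.0]),
}
normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in raw.items()}
folds = fold_changes(normalized, ["normal_1", "normal_2", "normal_3"])
n_over = count_over_expressed(folds, ["tumor_1", "tumor_2"])
print(folds, n_over)
```

The median, rather than the mean, is used for the normal baseline here because that is what the text specifies; it makes the baseline less sensitive to a single atypical normal sample.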
Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CEACAM5 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 6.24E-04. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 7.42E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
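The threshold test above can be illustrated with a small one-tailed Fisher's exact test. The text does not state how many normal samples exceeded the 3 fold threshold or whether the test was one- or two-tailed; the sketch below assumes a one-tailed test and that none of the 11 normal PM samples exceeded the threshold. Under those assumptions the 9 out of 37 count gives a value close to the reported 7.42E-02. The function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cancer samples over the threshold,  b: cancer samples under it
    c: normal samples over the threshold,  d: normal samples under it
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    upper = min(row1, col1)
    favorable = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return favorable / comb(total, col1)

# 9 of 37 cancer samples over threshold; 0 of 11 normal samples (assumption).
p_value = fisher_exact_greater(9, 28, 0, 11)
```

A two-tailed variant, or a different count of normal samples over the threshold, would give a different value.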
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMCEA seg31 F forward primer; and HUMCEA seg31 R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMCEA seg31.
Forward primer (SEQ ID NO: 1340): CGGCCTCCCAAAGTGCT
Reverse primer (SEQ ID NO: 1341): GGGAAGCTCCTGATTGTAGAAGG
Amplicon (SEQ ID NO: 1342):
CGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCACCCGGCCGATTTGGACTTTTTAACACAGGATTGGGACAGGATTCAGAGGGACACTGTGGCCCTTCTACAATCAGGAGCTTCCC
Expression of Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg33 in normal and cancerous colon tissues
Expression of CEACAM5 transcripts detectable by or according to seg33, HUMCEA (ver 3.4 T10888) seg33 amplicon (SEQ ID NO: 1345) and HUMCEA seg33 F (SEQ ID NO: 1343) and HUMCEA seg33 R (SEQ ID NO: 1344) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 52 is a histogram showing over expression of the above-indicated CEACAM5 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 52, the expression of CEACAM5 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel").
Notably an over-expression of at least 3 fold was found in 11 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CEACAM5 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 4.01E-04. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 3.78E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMCEA seg33 F forward primer; and HUMCEA seg33 R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMCEA seg33.
Forward primer (SEQ ID NO: 1343): CTGGAGCATCAGCATCATATTCTG
Reverse primer (SEQ ID NO: 1344): GAGAGTTGGCCGAGATGGAG
Amplicon (SEQ ID NO: 1345):
CTGGAGCATCAGCATCATATTCTGGGGTGGAGTCTATCTGGTTCTCACCAAAGAGCC AAGAAGACATTTTCTTTCCCAGTCTGTGTTCCATGGGCACAAGGAAATCCCAAATTC
TATCCTGAGCCCCCTCACTCCATCTCGGCCAACTCTC
Expression of Carcinoembryonic antigen-related cell adhesion molecule 5 CEACAM5 HUMCEA transcripts which are detectable by amplicon as depicted in sequence name HUMCEA seg35 in normal and cancerous colon tissues
Expression of CEACAM5 transcripts detectable by or according to seg35, HUMCEA (ver 3.4 T10888) seg35 amplicon (SEQ ID NO: 1348) and HUMCEA seg35 F (SEQ ID NO: 1346) and HUMCEA seg35 R (SEQ ID NO: 1347) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 53 is a histogram showing over expression of the above-indicated CEACAM5 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 53, the expression of CEACAM5 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably an over-expression of at least 3 fold was found in 15 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of CEACAM5 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 8.96E-04. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.27E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMCEA seg35 F forward primer; and HUMCEA seg35 R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMCEA seg35.
Forward primer (SEQ ID NO: 1346): GAAGCAGAGTCCCCCAGAACT
Reverse primer (SEQ ID NO: 1347): AAGGCCCAGGCTAGTGCATT
Amplicon (SEQ ID NO: 1348):
GAAGCAGAGTCCCCCAGAACTGGGCTTTTCATTCCCCTGGTGGGAGCCCATGAGAAGCGAGTTCTCTGTGCAACGGACTTAGTAAATACAGAATGCACTAGCCTGGGCCTT
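As a consistency check on the listed sequences, an amplicon is expected to begin with the forward primer and end with the reverse complement of the reverse primer. The following minimal sketch applies that check to the seg35 primers and amplicon given above; the helper names are hypothetical and the check is an illustration, not part of the described method.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def amplicon_matches_primers(amplicon, forward, reverse):
    """True if the amplicon is flanked by the forward primer and by the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))

# HUMCEA seg35 sequences as listed (SEQ ID NOs: 1346, 1347, 1348).
forward = "GAAGCAGAGTCCCCCAGAACT"
reverse = "AAGGCCCAGGCTAGTGCATT"
amplicon = (
    "GAAGCAGAGTCCCCCAGAACTGGGCTTTTCATTCCCCTGGTGGGAGCCCATGAGAA"
    "GCGAGTTCTCTGTGCAACGGACTTAGTAAATACAGAATGCACTAGCCTGGGCCTT"
)
is_consistent = amplicon_matches_primers(amplicon, forward, reverse)
```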
DESCRIPTION FOR CLUSTER M78035
Cluster M78035 features 12 transcript(s) and 39 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001312_0001
Table 2 - Segments of interest
Figure imgf001313_0001
Figure imgf001314_0001
Table 3 - Proteins of interest
Figure imgf001314_0002
These sequences are variants of the known protein Adenosylhomocysteinase (SwissProt accession identifier SAHH_HUMAN; known also according to the synonyms EC 3.3.1.1; S-adenosyl-L-homocysteine hydrolase; AdoHcyase), SEQ ID NO: 922, referred to herein as the previously known protein. Protein Adenosylhomocysteinase is known or believed to have the following function(s): Adenosylhomocysteine is a competitive inhibitor of S-adenosyl-L-methionine-dependent methyl transferase reactions; therefore adenosylhomocysteinase may play a key role in the control of methylations via regulation of the intracellular concentration of adenosylhomocysteine. The sequence for protein Adenosylhomocysteinase is given at the end of the application, as "Adenosylhomocysteinase amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001315_0001
Protein Adenosylhomocysteinase localization is believed to be Cytoplasmic.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: one-carbon compound metabolism, which are annotation(s) related to Biological Process; adenosylhomocysteinase; hydrolase, which are annotation(s) related to Molecular Function; and cytoplasm, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster M78035 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
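The "parts per million" figure described above is a simple ratio scaled by one million. The sketch below shows only that ratio; the function name and the EST counts are hypothetical, and any additional weighting applied by the previously described methods is not reproduced here.

```python
def expression_ppm(cluster_ests, category_ests):
    """ESTs assigned to one cluster, expressed per million ESTs in the category."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    # Multiply before dividing to keep the arithmetic exact for integer counts.
    return cluster_ests * 1_000_000 / category_ests

# Hypothetical counts: 25 cluster ESTs among 500,000 ESTs in a tissue category.
ppm = expression_ppm(25, 500_000)
```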
Overall, the following results were obtained as shown with regard to the histograms in Figure 54 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, colorectal cancer, epithelial malignant tumors, a mixture of malignant tumors from different tissues, malignant tumors involving the lymph nodes and pancreas carcinoma.
Table 5 - Normal tissue distribution
Figure imgf001316_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001317_0001
As noted above, cluster M78035 features 12 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Adenosylhomocysteinase. A description of each variant protein according to the present invention is now provided.
Variant protein M78035_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T0, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M78035_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf001318_0001
Figure imgf001319_0001
Variant protein M78035_P2 is encoded by the following transcript(s): M78035_T0, M78035_T17, M78035_T18, M78035_T19 and M78035_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T0 is shown in bold; this coding portion starts at position 132 and ends at position 1427. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001320_0001
Figure imgf001321_0001
Figure imgf001322_0001
The coding portion of transcript M78035_T17 is shown in bold; this coding portion starts at position 132 and ends at position 1427. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf001322_0002
Figure imgf001323_0001
Figure imgf001324_0001
The coding portion of transcript M78035_T18 is shown in bold; this coding portion starts at position 132 and ends at position 1427. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001324_0002
Figure imgf001325_0001
Figure imgf001326_0001
The coding portion of transcript M78035_T19 is shown in bold; this coding portion starts at position 132 and ends at position 1427. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf001326_0002
Figure imgf001327_0001
The coding portion of transcript M78035_T20 is shown in bold; this coding portion starts at position 132 and ends at position 1427. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf001328_0001
Figure imgf001329_0001
Variant protein M78035_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T3 and M78035_T4. An alignment is given to the known protein (Adenosylhomocysteinase) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78035_P4 and SAHH_HUMAN:
1. An isolated chimeric polypeptide encoding for M78035_P4, comprising a first amino acid sequence being at least 90 % homologous to MPGLMRMRERYSASKPLKGARIAGCLHMTVETAVLIETLVTLGAEVQWSSCNIFSTQD HAAAAIAKAGΓPVYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYP QLLPGIRGISEETTTGVTTNLYKMMANGILKVPAINVNDSVTKSKFDNLYGCRESLIDGIK RATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEIDPTNALQAAMEGYEVTT MDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVN IKPQVDRYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHPDK YPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY corresponding to amino acids 29 - 432 of SAHH_HUMAN, which also corresponds to amino acids 1 - 404 of M78035_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M78035_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Figure imgf001330_0001
Figure imgf001331_0001
Variant protein M78035_P4 is encoded by the following transcript(s): M78035_T3 and M78035_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T3 is shown in bold; this coding portion starts at position 301 and ends at position 1512. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf001331_0002
Figure imgf001332_0001
Figure imgf001333_0001
Figure imgf001334_0001
The coding portion of transcript M78035_T4 is shown in bold; this coding portion starts at position 897 and ends at position 2108. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf001334_0002
Figure imgf001335_0001
Figure imgf001336_0001
Figure imgf001337_0001
Variant protein M78035_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T7 and M78035_T9. An alignment is given to the known protein (Adenosylhomocysteinase) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78035_P6 and SAHH_HUMAN:
1. An isolated chimeric polypeptide encoding for M78035_P6, comprising a first amino acid sequence being at least 90 % homologous to MILDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYT DMANGILKWATNVNDSVT KSKFDNLYGCRESLIDGIKRATDVMIAGKVAWAGYGDVGKGCAQALRGFGARVIITEI DPΓNALQAAMEGYEVTTMDEACQEGNIFVTTTGCIDΠLGRHFEQMKDDAIVCNIGHFD VΈIDVKWLNENAVEKVNIKPQVDRYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNS FTNQVMAQIELWTHPDKYPVGVHFLPKKLDEAVAEAHLGIANVKLTKLTEKQAQYLG MSCDGPFKPDHYRY corresponding to amino acids 127 - 432 of SAHH_HUMAN, which also corresponds to amino acids 1 - 306 of M78035_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M78035_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Amino acid mutations
Figure imgf001338_0001
Figure imgf001339_0001
Variant protein M78035_P6 is encoded by the following transcript(s): M78035_T7 and M78035_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T7 is shown in bold; this coding portion starts at position 556 and ends at position 1473. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
Figure imgf001339_0002
Figure imgf001340_0001
Figure imgf001341_0001
Figure imgf001342_0001
The coding portion of transcript M78035_T9 is shown in bold; this coding portion starts at position 768 and ends at position 1685. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf001342_0002
Figure imgf001343_0001
Figure imgf001344_0001
Figure imgf001345_0001
Variant protein M78035_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T11. An alignment is given to the known protein (Adenosylhomocysteinase) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between M78035_P8 and SAHH_HUMAN:
1. An isolated chimeric polypeptide encoding for M78035_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSDKLPYKV corresponding to amino acids 1 - 9 of M78035_P8, and a second amino acid sequence being at least 90 % homologous to VYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYPQLLPGIRGISEET TTGVHNLYKMMANGILKVPAINVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKV AVVAGYGDVGKGCAQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFV TTTGCIDIILGRHFEQMKTJDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVDRYRLKN GRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHPDKYPVGVHFLPKKL DEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY corresponding to amino acids 99 - 432 of SAHH_HUMAN, which also corresponds to amino acids 10 - 343 of M78035_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of M78035_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSDKLPYKV of M78035_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M78035_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Amino acid mutations
Figure imgf001346_0001
Figure imgf001347_0001
Variant protein M78035_P8 is encoded by the following transcript(s): M78035_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T11 is shown in bold; this coding portion starts at position 132 and ends at position 1160. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Figure imgf001347_0002
Figure imgf001348_0001
Figure imgf001349_0001
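The coding-portion coordinates quoted above can be cross-checked against the variant protein lengths stated in the comparison reports (404 amino acids for M78035_P4, 306 for M78035_P6 and 343 for M78035_P8). The sketch below assumes that positions are 1-based and inclusive and that the quoted coding portion does not include a stop codon; under those assumptions every quoted span divides evenly into the stated number of codons. The function name is hypothetical.

```python
def codon_count(start, end):
    """Number of codons in a coding portion given 1-based inclusive positions."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("coding portion length is not a positive multiple of 3")
    return length // 3

# (transcript, start, end, stated protein length) taken from the text above.
coding_portions = [
    ("M78035_T3", 301, 1512, 404),
    ("M78035_T4", 897, 2108, 404),
    ("M78035_T7", 556, 1473, 306),
    ("M78035_T9", 768, 1685, 306),
    ("M78035_T11", 132, 1160, 343),
]
all_consistent = all(codon_count(s, e) == aa for _, s, e, aa in coding_portions)
```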
Variant protein M78035_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T27. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein M78035_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
Figure imgf001350_0001
Variant protein M78035_P18 is encoded by the following transcript(s): M78035_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T27 is shown in bold; this coding portion starts at position 132 and ends at position 617. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Figure imgf001351_0001
Variant protein M78035_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78035_T28. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein M78035_P19 is encoded by the following transcript(s): M78035_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78035_T28 is shown in bold; this coding portion starts at position 585 and ends at position 902. The transcript also has the following SNPs as listed in Table 23 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78035_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Nucleic acid SNPs
Figure imgf001352_0001
As noted above, cluster M78035 features 39 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster M78035_node_4 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T7, M78035_T11, M78035_T17, M78035_T18, M78035_T19, M78035_T20 and M78035_T27. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Figure imgf001353_0001
Segment cluster M78035_node_6 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T4. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf001353_0002
Segment cluster M78035_node_10 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T3 and M78035_T9. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001353_0003
Figure imgf001354_0001
Segment cluster M78035_node_17 according to the present invention is supported by 189 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T17, M78035_T18, M78035_T19, M78035_T20 and M78035_T27. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf001354_0002
Segment cluster M78035_node_18 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T9 and M78035_T27. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001355_0001
Segment cluster M78035_node_21 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T27. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001355_0002
Segment cluster M78035_node_25 according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001355_0003
Figure imgf001356_0001
Segment cluster M78035 jiode J3 according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001356_0002
Segment cluster M78035_node_55 according to the present invention is supported by 238 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf001357_0001
Segment cluster M78035_node_58 according to the present invention is supported by 273 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Figure imgf001357_0002
Segment cluster M78035_node_60 according to the present invention is supported by 268 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Figure imgf001358_0001
Segment cluster M78035_node_62 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T19 and M78035_T20. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Figure imgf001358_0002
Segment cluster M78035_node_63 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T19, M78035_T20 and M78035_T28. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Figure imgf001359_0001
Segment cluster M78035_node_64 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T19 and M78035_T28. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Figure imgf001359_0002
Segment cluster M78035_node_65 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T19, M78035_T20 and M78035_T28. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Figure imgf001359_0003
Figure imgf001360_0001
Segment cluster M78035_node_69 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T18. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Figure imgf001360_0002
Segment cluster M78035_node_71 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T17 and M78035_T18. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Figure imgf001360_0003
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster M78035_node_14 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T28. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Figure imgf001361_0001
Segment cluster M78035_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T28. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Figure imgf001361_0002
Segment cluster M78035_node_20 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T17, M78035_T18, M78035_T19, M78035_T20 and M78035_T27. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Figure imgf001361_0003
Figure imgf001362_0001
Segment cluster M78035_node_24 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T7. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001362_0002
Segment cluster M78035_node_26 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf001362_0003
Figure imgf001363_0001
Segment cluster M78035_node_28 according to the present invention is supported by 161 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf001363_0002
Segment cluster M78035_node_29 according to the present invention is supported by 157 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001364_0001
Segment cluster M78035_node_30 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf001364_0002
Figure imgf001365_0001
Segment cluster M78035_node_31 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf001365_0002
Segment cluster M78035_node_34 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf001366_0001
Segment cluster M78035_node_35 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf001366_0002
Figure imgf001367_0001
Segment cluster M78035_node_37 according to the present invention is supported by 177 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf001367_0002
Segment cluster M78035_node_40 according to the present invention is supported by 194 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf001368_0001
Segment cluster M78035_node_48 according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf001368_0002
Figure imgf001369_0001
Segment cluster M78035_node_49 according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf001369_0002
Segment cluster M78035_node_50 according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf001370_0001
Segment cluster M78035_node_52 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001370_0002
Figure imgf001371_0001
Segment cluster M78035_node_53 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001371_0002
Segment cluster M78035_node_54 according to the present invention is supported by 213 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9, M78035_T11, M78035_T17, M78035_T18, M78035_T19 and M78035_T20. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001372_0001
Segment cluster M78035_node_56 according to the present invention can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001372_0002
Figure imgf001373_0001
Segment cluster M78035_node_57 according to the present invention is supported by 225 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001373_0002
Segment cluster M78035_node_59 according to the present invention is supported by 251 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78035_T0, M78035_T3, M78035_T4, M78035_T7, M78035_T9 and M78035_T11. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001374_0001
Variant protein alignment to the previously known protein:
Sequence name: SAHH_HUMAN
Sequence documentation:
Alignment of: M78035_P4 x SAHH_HUMAN
Alignment segment 1/1:
Quality: 3949.00 Escore: 0 Matching length: 404 Total length: 404 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment :
  1 MPGLMRMRERYSASKPLKGARIAGCLHMTVETAVLIETLVTLGAEVQWSS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 29 MPGLMRMRERYSASKPLKGARIAGCLHMTVETAVLIETLVTLGAEVQWSS 78

 51 CNIFSTQDHAAAAIAKAGIPVYAWKGETDEEYLWCIEQTLYFKDGPLNMI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 79 CNIFSTQDHAAAAIAKAGIPVYAWKGETDEEYLWCIEQTLYFKDGPLNMI 128

101 LDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVPAI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
129 LDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVPAI 178

151 NVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGKGC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
179 NVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGKGC 228

201 AQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFVTTTGC 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
229 AQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFVTTTGC 278

251 IDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVDRY 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
279 IDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVDRY 328

301 RLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHPDK 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
329 RLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHPDK 378

351 YPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPD 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
379 YPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFKPD 428

401 HYRY 404
    ||||
429 HYRY 432
Sequence name: SAHH_HUMAN
Sequence documentation:
Alignment of: M78035_P6 x SAHH_HUMAN
Alignment segment 1/1:
Quality: 2982.00 Escore: 0 Matching length: 306 Total length: 306 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment :
  1 MILDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
127 MILDDGGDLTNLIHTKYPQLLPGIRGISEETTTGVHNLYKMMANGILKVP 176

 51 AINVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
177 AINVNDSVTKSKFDNLYGCRESLIDGIKRATDVMIAGKVAVVAGYGDVGK 226

101 GCAQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFVTTT 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
227 GCAQALRGFGARVIITEIDPINALQAAMEGYEVTTMDEACQEGNIFVTTT 276

151 GCIDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVD 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
277 GCIDIILGRHFEQMKDDAIVCNIGHFDVEIDVKWLNENAVEKVNIKPQVD 326

201 RYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHP 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
327 RYRLKNGRRIILLAEGRLVNLGCAMGHPSFVMSNSFTNQVMAQIELWTHP 376

251 DKYPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFK 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
377 DKYPVGVHFLPKKLDEAVAEAHLGKLNVKLTKLTEKQAQYLGMSCDGPFK 426

301 PDHYRY 306
    ||||||
427 PDHYRY 432
Sequence name: SAHH_HUMAN
Sequence documentation:
Alignment of: M78035_P8 x SAHH_HUMAN
Alignment segment 1/1:
Quality: 3275.00 Escore: 0 Matching length: 334 Total length: 334 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
 10 VYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYPQLLP 59
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 99 VYAWKGETDEEYLWCIEQTLYFKDGPLNMILDDGGDLTNLIHTKYPQLLP 148

 60 GIRGISEETTTGVHNLYKMMANGILKVPAINVNDSVTKSKFDNLYGCRES 109
    ||||||||||||||||||||||||||||||||||||||||||||||||||
149 GIRGISEETTTGVHNLYKMMANGILKVPAINVNDSVTKSKFDNLYGCRES 198

110 LIDGIKRATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEIDPIN 159
    ||||||||||||||||||||||||||||||||||||||||||||||||||
199 LIDGIKRATDVMIAGKVAVVAGYGDVGKGCAQALRGFGARVIITEIDPIN 248

160 ALQAAMEGYEVTTMDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCN 209
    ||||||||||||||||||||||||||||||||||||||||||||||||||
249 ALQAAMEGYEVTTMDEACQEGNIFVTTTGCIDIILGRHFEQMKDDAIVCN 298

210 IGHFDVEIDVKWLNENAVEKVNIKPQVDRYRLKNGRRIILLAEGRLVNLG 259
    ||||||||||||||||||||||||||||||||||||||||||||||||||
299 IGHFDVEIDVKWLNENAVEKVNIKPQVDRYRLKNGRRIILLAEGRLVNLG 348

260 CAMGHPSFVMSNSFTNQVMAQIELWTHPDKYPVGVHFLPKKLDEAVAEAH 309
    ||||||||||||||||||||||||||||||||||||||||||||||||||
349 CAMGHPSFVMSNSFTNQVMAQIELWTHPDKYPVGVHFLPKKLDEAVAEAH 398

310 LGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY 343
    ||||||||||||||||||||||||||||||||||
399 LGKLNVKLTKLTEKQAQYLGMSCDGPFKPDHYRY 432
Expression of S-adenosylhomocysteine hydrolase (AHCY) M78035 transcripts which are detectable by amplicon as depicted in sequence name M78035seg42 in normal and cancerous colon tissues
Expression of S-adenosylhomocysteine hydrolase (AHCY) transcripts detectable by or according to seg42, M78035seg42 amplicon (SEQ ID NO: 1351) and M78035seg42F (SEQ ID NO: 1349) and M78035seg42R (SEQ ID NO: 1350) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
Figure 55 is a histogram showing over-expression of the above-indicated S-adenosylhomocysteine hydrolase (AHCY) transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 55, the expression of S-adenosylhomocysteine hydrolase (AHCY) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel").
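The normalization steps described above can be sketched as follows. This is only an illustration of the arithmetic stated in the text (geometric mean of the housekeeping genes, then division by the median of the normal samples); the function names, sample labels and quantities are hypothetical and are not data from the application.

```python
from math import prod
from statistics import median


def geometric_mean(values):
    """Geometric mean of a list of positive quantities."""
    return prod(values) ** (1.0 / len(values))


def fold_upregulation(target_qty, housekeeping_qty, normal_samples):
    """Fold up-regulation of each sample relative to the median normal sample.

    target_qty:       sample -> quantity of the amplicon of interest
    housekeeping_qty: sample -> list of housekeeping-gene quantities
    normal_samples:   samples used as the normal reference
    """
    # Step 1: normalize each sample to the geometric mean of its housekeeping genes.
    normalized = {
        s: q / geometric_mean(housekeeping_qty[s]) for s, q in target_qty.items()
    }
    # Step 2: divide by the median of the normalized normal samples.
    baseline = median([normalized[s] for s in normal_samples])
    return {s: v / baseline for s, v in normalized.items()}


# Hypothetical quantities, for illustration only.
target = {"N1": 2.0, "N2": 4.0, "N3": 6.0, "T1": 24.0}
# One list per sample, in the order PBGD, HPRT1, G6PD, RPS27A (values invented).
housekeeping = {s: [2.0, 2.0, 8.0, 8.0] for s in target}
folds = fold_upregulation(target, housekeeping, ["N1", "N2", "N3"])

# Samples at or above the 3 fold threshold used in the text.
over_expressed = [s for s, f in folds.items() if f >= 3.0]
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeping gene does not dominate the normalization factor.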
Notably, an over-expression of at least 3 fold was found in 11 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of S-adenosylhomocysteine hydrolase (AHCY) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.03E-04. The threshold of 3 fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.76E-02 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M78035seg42F forward primer; and M78035seg42R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M78035seg42.
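As a rough illustration of the threshold test mentioned above, a one-sided Fisher's exact test on a 2x2 table can be written with the standard library alone. The 11-of-37 count comes from the text, and the 11 normal samples are counted from the sample list given earlier (Nos. 41, 52, 62-67, 69-71). The number of normal samples reaching the threshold is not stated; the sketch assumes it is zero, and also assumes a one-sided test, so the result is indicative only.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing `a` or more in the top-left cell
    with row and column totals fixed (hypergeometric upper tail).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Rows: cancer, normal. Columns: at/above 3 fold, below 3 fold.
# 11 of 37 adenocarcinoma samples reached the threshold (from the text).
# ASSUMPTION: none of the 11 normal PM samples reached the threshold.
p_value = fisher_exact_one_sided(11, 26, 0, 11)
```

Under these assumptions the P value falls below 0.05, which is in line with the significance reported in the text.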
Forward primer (SEQ ID NO: 1349): TGGTCTGGACTCAATCCCG
Reverse primer (SEQ ID NO: 1350): GGAGTCTGAGTCCAAGCAGCC
Amplicon (SEQ ID NO: 1351): TGGTCTGGACTCAATCCCGGGACTTTAGGACTTTTGCTAGAAATCTGGTGTGGTGCAGGAGCGACTCCAGGATTCACTCTGTGGGCTGCTTGGACTCAGACTCC
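The relationship between the primers and the amplicon listed above can be checked with a short script: the forward primer should match the start of the amplicon, and the reverse complement of the reverse primer should match its end. The sequences are copied from the listing above; the helper name and variable names are illustrative only.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


FORWARD = "TGGTCTGGACTCAATCCCG"    # M78035seg42F, SEQ ID NO: 1349
REVERSE = "GGAGTCTGAGTCCAAGCAGCC"  # M78035seg42R, SEQ ID NO: 1350
AMPLICON = (                       # M78035seg42, SEQ ID NO: 1351
    "TGGTCTGGACTCAATCCCGGGACTTTAGGACTTTTGCTAGAAATCTGGTGTGGTGCA"
    "GGAGCGACTCCAGGATTCACTCTGTGGGCTGCTTGGACTCAGACTCC"
)

# The forward primer anneals at the 5' end of the amplicon's sense strand.
forward_ok = AMPLICON.startswith(FORWARD)
# The reverse primer anneals to the opposite strand at the 3' end.
reverse_ok = AMPLICON.endswith(reverse_complement(REVERSE))
```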
DESCRIPTION FOR CLUSTER R30650
Cluster R30650 features 8 transcript(s) and 49 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001381_0001
Figure imgf001382_0001
Figure imgf001383_0001
Table 3 - Proteins of interest
Figure imgf001383_0002
Figure imgf001384_0001
These sequences are variants of the known protein Protein KIAA1199 precursor (SwissProt accession identifier K1199_HUMAN), SEQ ID NO: 986, referred to herein as the previously known protein. Protein Protein KIAA1199 precursor is known or believed to have the following function(s): May be involved in hearing. The sequence for protein Protein KIAA1199 precursor is given at the end of the application, as "Protein KIAA1199 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001384_0002
Cluster R30650 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
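The "parts per million" ratio described above amounts to the following calculation. This is a minimal sketch: the weighting applied to the EST counts is not specified in this passage, so the example uses plain counts, and both the function name and the numbers are hypothetical.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Expression of a cluster in a tissue category, in parts per million.

    cluster_est_count: ESTs of the cluster observed in the category
    total_est_count:   all ESTs observed in that category
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count * 1_000_000 / total_est_count


# Hypothetical counts: 25 cluster ESTs among 500,000 ESTs in a category.
ppm = parts_per_million(25, 500_000)
```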
Overall, the following results were obtained as shown with regard to the histograms in Figure 56 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Figure imgf001385_0001
Figure imgf001386_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001386_0002
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 7.
Table 7 - Oligonucleotides related to this cluster
Figure imgf001387_0001
As noted above, cluster R30650 features 8 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Protein KIAA1199 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein R30650_PEA_2_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T2. An alignment is given to the known protein (Protein KIAA1199 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between R30650_PEA_2_P4 and Q9ULM1 (SEQ ID NO: 989): 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 90 % homologous to MYLHIGEEIDGVDMRAEVGLLSP^IIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLTKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPiπ'RQDCNAVSTFWMANPNNNLrNCAAAGSEETGFWFIFHHVPTGPSN GMYSPGYSEHIPLGKFYNNlP^vHSNYPvAGMIIDNGVKTTEASAKI)KTPFLSπSARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPTN IQNCTFRl^VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWF NQLDMDGDKTS HDVDGSVSEYPGSYLTKNDNWLVRHPDC1NVPDWRGAICSGCYA QMYIQAYKTSNLRMKITKNDFPSHPLYLEGALTRSTHYQQYQPNVTLQKGYTIHWDQT APAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKV
EQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIKALIPKNAGVSDCT
ATAYPK^TERAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFFHLλVNDFAYIEVD
GKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGIPWQLFNYVATTPDNSIVLMASKG
RYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPI
PVVKKKKL corresponding to amino acids 126 - 1013 of Q9ULM1, which also corresponds to amino acids 1 - 888 of R30650_PEA_2_P4.
Comparison report between R30650_PEA_2_P4 and Q8WUJ3 (SEQ ID NO: 987): 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 90 % homologous to
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL
GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT
VHGSNGLLIKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK
MITEDSYPGYΓPKPRQDCNAVSTFWMANPNNNLLNCAAAGSEETGFWFIFHHVPTGPSV
GMYSPGYSEHIPLGKFYL^RAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ
DADPLKTPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD
DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPTN
IQNCTFRI^VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWF
NQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 474 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 504 of R30650_PEA_2_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
N LVRHPDCLL PDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT
RSTHYQQYQPVVTLQKGYTIHWDQTAJPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS
DVΉNRLLKQTSKTG VRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF
AFCSMKGCEWKLT ALIPKNAGVSDCTATAYPKFTEPAVVDWMPIA FGSQLKTKDHF
LEVKMESSKQFIFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRNVSHTSFRNSIL
QG1PWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF
KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 505 - 888 of R30650_PEA_2_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R30650_PEA_2_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NWLVRHPDCΓNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT
RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILS
DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF
AFCSMKGCERIK KALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF
LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL
QGIPWQLFNYVATIPDNSIVLMASKGRTVSRGPWTRVLEKLGADRGLKLKEQMAFVGF
KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL IN R30650_PEA_2_P4.
Comparison report between R30650_PEA_2_P4 and Q9NPN9 (SEQ ID NO: 988): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 91 of R30650_PEA_2_P4, and a second amino acid sequence being at least 90 % homologous to VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL INCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV
WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG
LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH
NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT
RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 8 - 804 of Q9NPN9, which also corresponds to amino acids 92 - 888 of R30650_PEA_2_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD of R30650_PEA_2_P4.
Comparison report between R30650_PEA_2_P4 and Q9H1K5 (SEQ ID NO:990): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL
GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT
VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK
MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSV
GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ
DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD
DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH corresponding to amino acids 1 - 389 of R30650_PEA_2_P4, and a second amino acid sequence being at least 90 % homologous to
SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFC SMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEV KMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGI PWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKG SFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 2 - 500 of Q9H1K5, which also corresponds to amino acids 390 - 888 of R30650_PEA_2_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFAL GFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVT VHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCK MITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSV GMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQ DADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYD DGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH of R30650_PEA_2_P4.
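Each comparison report above describes the variant as contiguous segments, each with a coordinate range on the variant and, where the segment is shared, a range on the known protein. The following is a minimal sketch of a consistency check on those coordinates, using the ranges quoted above for R30650_PEA_2_P4. The `Segment` type and the `check_segments` helper are hypothetical names introduced only for illustration; they are not part of the application, and the check assumes that all ranges are 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # assumed 1-based, inclusive (start, end)


@dataclass(frozen=True)
class Segment:
    variant_range: Range          # position on the variant protein
    known_range: Optional[Range]  # position on the known protein; None for a unique head or tail


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def check_segments(segments: List[Segment], variant_length: int) -> bool:
    """Return True if the segments tile the variant contiguously, in order,
    and every shared segment covers the same number of residues on both proteins."""
    next_start = 1
    for seg in segments:
        start, end = seg.variant_range
        if start != next_start or end < start:
            return False
        if seg.known_range is not None and span(seg.known_range) != span(seg.variant_range):
            return False
        next_start = end + 1
    return next_start == variant_length + 1


# Coordinates as quoted in the comparison reports for R30650_PEA_2_P4 (888 residues).
P4_REPORTS: Dict[str, List[Segment]] = {
    "Q8WUJ3": [Segment((1, 504), (474, 977)), Segment((505, 888), None)],
    "Q9NPN9": [Segment((1, 91), None), Segment((92, 888), (8, 804))],
    "Q9H1K5": [Segment((1, 389), None), Segment((390, 888), (2, 500))],
}

if __name__ == "__main__":
    for accession, segments in P4_REPORTS.items():
        print(accession, check_segments(segments, 888))
```

A check of this kind only confirms that the quoted ranges are mutually consistent; it says nothing about the degree of homology between the sequences themselves.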
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein R30650_PEA_2_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Figure imgf001392_0001
Variant protein R30650_PEA_2_P4 is encoded by the following transcript(s): R30650_PEA_2_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T2 is shown in bold; this coding portion starts at position 1369 and ends at position 4032. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Figure imgf001392_0002
Figure imgf001393_0001
Variant protein R30650_PEA_2_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T3. An alignment is given to the known protein (Protein KIAA1199 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R30650_PEA_2_P5 and Q9ULM1 (SEQ ID NO:989): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 90 % homologous to MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKF VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGD KTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYK TSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWL INFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSH YYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTE RAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSED GIQVVVIDGNQGRVVSHTSFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPW TRVLEKLGADRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 18 - 1013 of Q9ULM1, which also corresponds to amino acids 1 - 996 of R30650_PEA_2_P5.
Comparison report between R30650_PEA_2_P5 and Q8WUJ3 (SEQ ID NO:987): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 90 % homologous to
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI
LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE
IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE
GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL
IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG
YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS
EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR
EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN
SLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKF
VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGD
KTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 366 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 612 of R30650_PEA_2_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 613 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of R30650_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL in R30650_PEA_2_P5.
Comparison report between R30650_PEA_2_P5 and Q9NPN9 (SEQ ID NO:988): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 199 of R30650_PEA_2_P5, and a second amino acid sequence being at least 90 % homologous to VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL INCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKF AFCSMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHF LEVKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSIL QGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGF KGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 8 - 804 of Q9NPN9, which also corresponds to amino acids 200 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGD of R30650_PEA_2_P5.
Comparison report between R30650_PEA_2_P5 and Q9H1K5 (SEQ ID NO:990): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS
EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN SLFVGESGNVGTEMMDNRIWGPGGLDH corresponding to amino acids 1 - 497 of R30650_PEA_2_P5, and a second amino acid sequence being at least 90 % homologous to
SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV
TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL
VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH
YQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH
NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFC
SMKGCERIKIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEV
KMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGI
PWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMAFVGFKG
SFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL corresponding to amino acids 2 - 500 of Q9H1K5, which also corresponds to amino acids 498 - 996 of R30650_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTIDTNVNSTI
LNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEE
IDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLE
GTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLL
IKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPG
YIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYS
EHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPLKPR
EPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKN
SLFVGESGNVGTEMMDNRIWGPGGLDH of R30650_PEA_2_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein R30650_PEA_2_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Figure imgf001398_0001
Variant protein R30650_PEA_2_P5 is encoded by the following transcript(s): R30650_PEA_2_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T3 is shown in bold; this coding portion starts at position 532 and ends at position 3519. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Figure imgf001398_0002
Figure imgf001399_0001
Variant protein R30650_PEA_2_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T6. An alignment is given to the known protein (Protein KIAA1199 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R30650_PEA_2_P8 and Q9ULM1 (SEQ ID NO:989): 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK corresponding to amino acids 1 - 348 of R30650_PEA_2_P8, a second amino acid sequence being at least 90 % homologous to AHPGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKP
VRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN
QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT
FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSI
HHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPS
DRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIF
HHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLS
IISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTL
ASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIR
GIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRV
FFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWR
GAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKG
YTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFV
RTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 1 - 788 of Q9ULM1, which also corresponds to amino acids 349 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK of R30650_PEA_2_P8. 3.An isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
Comparison report between R30650_PEA_2_P8 and Q8WUJ3: 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 90 % homologous to
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGI QLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFF GEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 1 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 977 of R30650_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGKQRTISWR corresponding to amino acids 978 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSGKQRTISWR in R30650_PEA_2_P8.
Comparison report between R30650_PEA_2_P8 and Q9NPN9: 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 564 of R30650_PEA_2_P8, a second amino acid sequence being at least 90 % homologous to
VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNT FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNL INCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN
GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDV
WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG
LDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILS
DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 8 - 579 of Q9NPN9, which also corresponds to amino acids 565 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD of R30650_PEA_2_P8. 3.An isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
Comparison report between R30650_PEA_2_P8 and Q9H1K5: 1.An isolated chimeric polypeptide encoding for R30650_PEA_2_P8, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH
TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH
VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS
ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS
GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH corresponding to amino acids 1 - 862 of R30650_PEA_2_P8, a second amino acid sequence being at least 90 % homologous to
SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL
VRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH
NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 2 - 275 of Q9H1K5, which also corresponds to amino acids 863 - 1136 of R30650_PEA_2_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KQRTISWR corresponding to amino acids 1137 - 1144 of R30650_PEA_2_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a head of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH
TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHH
VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS
ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS
GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH of R30650_PEA_2_P8. 3.An isolated polypeptide encoding for a tail of R30650_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KQRTISWR in R30650_PEA_2_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R30650_PEA_2_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Amino acid mutations
Figure imgf001406_0001
Variant protein R30650_PEA_2_P8 is encoded by the following transcript(s): R30650_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T6 is shown in bold; this coding portion starts at position 265 and ends at position 3696. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Figure imgf001406_0002
Figure imgf001407_0001
Variant protein R30650_PEA_2_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T14. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein R30650_PEA_2_P12 is encoded by the following transcript(s): R30650_PEA_2_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T14 is shown in bold; this coding portion starts at position 1543 and ends at position 1719. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Figure imgf001408_0001
Variant protein R30650_PEA_2_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T15 and R30650_PEA_2_T21. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein R30650_PEA_2_P13 is encoded by the following transcript(s): R30650_PEA_2_T15 and R30650_PEA_2_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T15 is shown in bold; this coding portion starts at position 1543 and ends at position 1713. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Figure imgf001409_0001
The coding portion of transcript R30650_PEA_2_T21 is shown in bold; this coding portion starts at position 1543 and ends at position 1713. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Figure imgf001409_0002
Figure imgf001410_0001
Variant protein R30650_PEA_2_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T18. An alignment is given to the known protein (Protein KIAA1199 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R30650_PEA_2_P15 and Q9ULM1: 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPΓVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVTVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKHFLHLGFPVHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK corresponding to amino acids 1 - 348 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90 % homologous to AHPGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKP
VRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN
QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT
FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSI
HHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPS
DRDSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIF
HHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMΠDNGVKTTEASAKΌKRPFLS
IISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTL
ASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIR
GIQLYDGPINIQNCTFR ^VALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRV
FFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWR
GAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPWTLQKG
YTIHWDQTAPAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFV
RTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 1 - 788 of Q9ULM1, which also corresponds to amino acids 349 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R30650_PEA_2_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTI1LYGTADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRL^DDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWK of R30650_PEA_2_P15.
Comparison report between R30650_PEA_2_P15 and Q8WUJ3: 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P15, comprising a first amino acid sequence being at least 90 % homologous to MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVIVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNIIPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH
TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLΓNCAAAGSEETGFWFIFHH TGPSVGMYSPGYSEHIPLGKFYN PVAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS
ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS
GGTFPYDDGSKQEΓKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNFPIRGI
QLYDGPMIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFF
GEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND corresponding to amino acids 1 - 977 of Q8WUJ3, which also corresponds to amino acids 1 - 977 of R30650_PEA_2_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NWLVRHPDCTNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALT RSTHYQQYQPWTLQKGYTIHWDQTAPAELAIWLTNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTG VRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 978 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R30650_PEA_2_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWLVPvHPDCliNVTDWRGAICSGCYAQMYIQAYKTSNLRMiaiKNDFPSHPLYLEGALT RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLΓNFNKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG in R30650_PEA_2_P15.
Comparison report between R30650_PEA_2_P15 and Q9NPN9: 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PIΑ,TVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIRVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD corresponding to amino acids 1 - 564 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90 % homologous to
VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNT
FDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYΓPKPRQDCNAVSTFWMANPNNNL
TNCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDN
GVKTTEASAKDKRPFLSIISARYSPHQDADPLKPREPAΠRHFIAYKNQDHGAWLRGGDV
WLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGG
LDHSGRTLPIGQNFP]RGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPH
NNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKND
NWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMK TKNDFPSHPLYLEGALT
RSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLII^NKGDWIRVGLCYPRGTTFSILS DVHNRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 8 - 579 of Q9NPN9, which also corresponds to amino acids 565 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R30650_PEA_2_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVΓVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV
KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGD of R30650_PEA_2_P15.
Comparison report between R30650_PEA_2_P15 and Q9H1K5: 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P15, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI
GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN
FTΠLYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG
YFFERSWGHRGVIVFINIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV
NDEGSRNLDDMARKAMTKLGSKJTFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG
SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH
PGKTCNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR
PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDTFG
GHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH
TFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR
DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLΓNCAAAGSEETGFWFIFHH
VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS
ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS
GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH corresponding to amino acids 1 - 862 of R30650_PEA_2_P15, and a second amino acid sequence being at least 90 % homologous to SGRTLPIGQNFPIRGIQLYDGPD^IQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNV
TGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWL VRHPDCΓNVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTH YQQYQPWTLQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVH
NRLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG corresponding to amino acids 2 - 275 of Q9H1K5, which also corresponds to amino acids 863 - 1136 of R30650_PEA_2_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of R30650_PEA_2_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVTVHVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDIVLA.RKAMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH PGKTCNPJ'IDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVR PKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPNQV KVAGKPMYLHIGEEIDGVDMRAEVGLLSRNirVMGEMEDKCYPYRNHICNFFDFDTFG GHKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHH TFSRCVTVHGSNGLLIKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDR DSKMCKMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLTNCAAAGSEETGFWFIFHH VPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIIS ARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLAS GGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDH of R30650_PEA_2_P15.
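The coordinate ranges quoted in the four comparison reports for R30650_PEA_2_P15 can be cross-checked mechanically. The following is a minimal sketch, not part of the application: it assumes the ranges are 1-based and inclusive, takes 1136 as the length of R30650_PEA_2_P15 because that is the last residue number quoted in the reports, and uses hypothetical names (`Comparison`, `is_consistent`) chosen only for illustration. It checks that the shared region has the same length on the known protein and on the variant, and that the shared and unique regions tile the variant without gap or overlap.

```python
from dataclasses import dataclass

# Assumed total length of R30650_PEA_2_P15: the last residue quoted in the reports.
P15_LENGTH = 1136


@dataclass(frozen=True)
class Comparison:
    """One comparison report; all ranges are 1-based and inclusive (assumption)."""
    known: str              # accession of the previously published protein
    known_range: tuple      # residues of the known protein that are shared
    variant_shared: tuple   # the same region, in variant coordinates
    variant_unique: tuple   # the head or tail that is unique to the variant


# Ranges as quoted in the comparison reports above.
COMPARISONS = [
    Comparison("Q9ULM1", (1, 788), (349, 1136), (1, 348)),
    Comparison("Q8WUJ3", (1, 977), (1, 977), (978, 1136)),
    Comparison("Q9NPN9", (8, 579), (565, 1136), (1, 564)),
    Comparison("Q9H1K5", (2, 275), (863, 1136), (1, 862)),
]


def span(r):
    """Number of residues in an inclusive range."""
    return r[1] - r[0] + 1


def is_consistent(c, total=P15_LENGTH):
    """True if the shared region has equal length in both coordinate systems
    and the two variant regions are contiguous and cover residues 1..total."""
    same_length = span(c.known_range) == span(c.variant_shared)
    first, second = sorted([c.variant_shared, c.variant_unique])
    contiguous = first[0] == 1 and first[1] + 1 == second[0] and second[1] == total
    return same_length and contiguous


if __name__ == "__main__":
    for c in COMPARISONS:
        print(c.known, is_consistent(c))
```

Under these assumptions each report is internally consistent: for example, residues 8 - 579 of Q9NPN9 and residues 565 - 1136 of the variant both span 572 positions.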
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein R30650_PEA_2_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Figure imgf001416_0001
Variant protein R30650_PEA_2_P15 is encoded by the following transcript(s): R30650_PEA_2_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T18 is shown in bold; this coding portion starts at position 265 and ends at position 3672. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
Figure imgf001417_0001
Variant protein R30650_PEA_2_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R30650_PEA_2_T23. An alignment is given to the known protein (Protein KIAA1199 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between R30650_PEA_2_P17 and Q8WUJ3: 1. An isolated chimeric polypeptide encoding for R30650_PEA_2_P17, comprising a first amino acid sequence being at least 90 % homologous to MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGHDQDHHVHI GQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHILIDNGGELHAGSALCPFQGN FTIILYGRADEGIQPDPYYGLKYIGVGKGGALELHGQKKLSWTFLNKTLHPGGMAEGG YFFERSWGHRGVWITVIDPKSGTVIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAV NDEGSRNLDDMARKAMTKLGSKTTFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRG SAAARVFKLFQTEHGEYFNVSLSSEWVQ corresponding to amino acids 1 - 321 of Q8WUJ3, which also corresponds to amino acids 1 - 321 of R30650_PEA_2_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEEFQTIW corresponding to amino acids 322 - 329 of R30650_PEA_2_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of R30650_PEA_2_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEEFQTIW in R30650_PEA_2_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein R30650_PEA_2_P17 is encoded by the following transcript(s): R30650_PEA_2_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R30650_PEA_2_T23 is shown in bold; this coding portion starts at position 265 and ends at position 1251. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R30650_PEA_2_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
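The localization calls in this section follow a simple rule over the outputs of two signal-peptide predictors and two trans-membrane region predictors. The sketch below restates that rule as stated in the text; it does not call SignalP or any other real program, the function name `call_localization` is hypothetical, and the "undetermined" outcome for mixed predictions is an assumption, since the text only describes the two unanimous cases.

```python
def call_localization(signal_peptide_calls, transmembrane_calls):
    """Combine boolean predictor outputs into a localization call.

    signal_peptide_calls: one bool per signal-peptide program (True = signal peptide predicted)
    transmembrane_calls:  one bool per trans-membrane program (True = TM region predicted)
    """
    no_tm = not any(transmembrane_calls)
    if no_tm and all(signal_peptide_calls):
        # Every program predicts a signal peptide and none predicts a TM region.
        return "secreted"
    if no_tm and not any(signal_peptide_calls):
        # No TM region and every program predicts a non-secreted protein.
        return "intracellularly"
    # Assumption: cases not described in the text are left unresolved.
    return "undetermined"


if __name__ == "__main__":
    # Pattern described for R30650_PEA_2_P8, P15 and P17.
    print(call_localization([True, True], [False, False]))
    # Pattern described for R30650_PEA_2_P12 and P13.
    print(call_localization([False, False], [False, False]))
```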
Figure imgf001418_0001
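The coding-portion coordinates given for each transcript determine the length of the encoded protein. The following sketch assumes that the start and end positions are 1-based and inclusive and that the quoted range ends at the last sense codon (no stop codon counted); under that reading, the results agree with the residue numbering in the comparison reports (1136 residues for R30650_PEA_2_P15 and 329 for R30650_PEA_2_P17). The dictionary and function names are hypothetical.

```python
# Coding-portion start and end positions as quoted in the text for each transcript.
CODING_PORTIONS = {
    "R30650_PEA_2_T6": (265, 3696),
    "R30650_PEA_2_T14": (1543, 1719),
    "R30650_PEA_2_T15": (1543, 1713),
    "R30650_PEA_2_T21": (1543, 1713),
    "R30650_PEA_2_T18": (265, 3672),
    "R30650_PEA_2_T23": (265, 1251),
}


def protein_length(start, end):
    """Residue count for a coding portion given as inclusive 1-based positions.

    Assumes the range excludes the stop codon; raises if it is not a whole
    number of codons.
    """
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError(f"coding portion of {nucleotides} nt is not a multiple of 3")
    return nucleotides // 3


if __name__ == "__main__":
    for transcript, (start, end) in CODING_PORTIONS.items():
        print(transcript, protein_length(start, end))
```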
As noted above, cluster R30650 features 49 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster R30650_PEA_2_node_0 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T14, R30650_PEA_2_T15, R30650_PEA_2_T18, R30650_PEA_2_T21 and R30650_PEA_2_T23. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf001419_0001
Segment cluster R30650_PEA_2_node_1 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14, R30650_PEA_2_T15 and R30650_PEA_2_T21. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf001420_0001
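Each segment description lists the transcripts that contain that segment. Inverting this relationship gives, for each transcript, the segments it is built from. The sketch below uses only the two segments described so far (with identifiers written with their underscores restored); the names `SEGMENT_TO_TRANSCRIPTS` and `segments_by_transcript` are hypothetical and the structure is illustrative only.

```python
from collections import defaultdict

# Segment membership as listed in the text for the first two segments.
SEGMENT_TO_TRANSCRIPTS = {
    "R30650_PEA_2_node_0": [
        "R30650_PEA_2_T6", "R30650_PEA_2_T14", "R30650_PEA_2_T15",
        "R30650_PEA_2_T18", "R30650_PEA_2_T21", "R30650_PEA_2_T23",
    ],
    "R30650_PEA_2_node_1": [
        "R30650_PEA_2_T14", "R30650_PEA_2_T15", "R30650_PEA_2_T21",
    ],
}


def segments_by_transcript(segment_map):
    """Invert a segment -> transcripts mapping into transcript -> segments."""
    inverted = defaultdict(list)
    for segment, transcripts in segment_map.items():
        for transcript in transcripts:
            inverted[transcript].append(segment)
    return dict(inverted)


if __name__ == "__main__":
    for transcript, segments in segments_by_transcript(SEGMENT_TO_TRANSCRIPTS).items():
        print(transcript, segments)
```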
Segment cluster R30650_PEA_2_node_3 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14, R30650_PEA_2_T15 and R30650_PEA_2_T21. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf001420_0002
Segment cluster R30650_PEA_2_node_5 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14, R30650_PEA_2_T15 and R30650_PEA_2_T21. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf001420_0003
Figure imgf001421_0001
Segment cluster R30650_PEA_2_node_9 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T21. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf001421_0002
Segment cluster R30650_PEA_2_node_11 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf001421_0003
Segment cluster R30650_PEA_2_node_14 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14 and R30650_PEA_2_T15. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001422_0001
Segment cluster R30650_PEA_2_node_20 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf001422_0002
Segment cluster R30650_PEA_2_node_22 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf001422_0003
Figure imgf001423_0001
Segment cluster R30650_PEA_2_node_24 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf001423_0002
Segment cluster R30650_PEA_2_node_26 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf001423_0003
Segment cluster R30650_PEA_2_node_32 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T23. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf001424_0001
Segment cluster R30650_PEA_2_node_34 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf001424_0002
Segment cluster R30650_PEA_2_node_36 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T3. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf001424_0003
Segment cluster R30650_PEA_2_node_37 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf001425_0001
Segment cluster R30650_PEA_2_node_39 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf001425_0002
Segment cluster R30650 JPEAJ jnode JT according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf001426_0001
Segment cluster R30650_PEA_2_node_42 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf001426_0002
Segment cluster R30650_PEA_2_node_44 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf001427_0001
Segment cluster R30650_PEA_2_node_46 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf001427_0002
Segment cluster R30650_PEA_2_node_50 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf001428_0001
Segment cluster R30650_PEA_2_node_56 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf001428_0002
Segment cluster R30650_PEA_2_node_60 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf001429_0001
Segment cluster R30650_PEA_2_node_63 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf001429_0002
Segment cluster R30650_PEA_2_node_67 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf001430_0001
Segment cluster R30650_PEA_2_node_70 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf001430_0002
Segment cluster R30650_PEA_2_node_72 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Figure imgf001431_0001
Segment cluster R30650_PEA_2_node_73 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T18. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001431_0002
Segment cluster R30650_PEA_2_node_75 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2 and R30650_PEA_2_T3. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf001431_0003
Segment cluster R30650_PEA_2_node_79 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf001432_0001
Segment cluster R30650_PEA_2_node_86 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf001432_0002
Segment cluster R30650_PEA_2_node_87 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf001433_0001
Segment cluster R30650_PEA_2_node_89 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf001433_0002
Segment cluster R30650_PEA_2_node_93 according to the present invention is supported by 108 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf001434_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
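The length criterion used to separate short segments from the longer ones can be sketched as follows. This is a minimal illustration only: it assumes that the start and end positions in the segment location tables are 1-based and inclusive, and the `SegmentLocation` class, the `is_short_segment` helper, the exact cutoff value, and the coordinates shown are hypothetical and not taken from the tables.

```python
from dataclasses import dataclass

# The text says "up to about 120 bp"; using exactly 120 is an assumption.
SHORT_SEGMENT_MAX_BP = 120


@dataclass(frozen=True)
class SegmentLocation:
    """Position of one segment on one transcript (assumed 1-based, inclusive)."""
    transcript: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates: a segment from 5 to 5 is 1 bp long.
        return self.end - self.start + 1


def is_short_segment(loc: SegmentLocation, max_bp: int = SHORT_SEGMENT_MAX_BP) -> bool:
    """Return True when the segment would fall in the 'short segments' group."""
    return loc.length <= max_bp


# Hypothetical coordinates, for illustration only.
example_long = SegmentLocation("R30650_PEA_2_T2", 1001, 1250)
example_short = SegmentLocation("R30650_PEA_2_T14", 301, 380)
```

With these made-up coordinates, `example_long` spans 250 bp and would be listed with the longer segments, while `example_short` spans 80 bp and would be listed with the short ones.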
Segment cluster R30650_PEA_2_node_8 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T14, R30650_PEA_2_T15 and R30650_PEA_2_T21. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf001434_0002
Segment cluster R30650_PEA_2_node_17 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf001435_0001
Segment cluster R30650_PEA_2_nodeJ8 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf001435_0002
Segment cluster R30650_PEA_2_node_31 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T6, R30650_PEA_2_T18 and R30650_PEA_2_T23. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001435_0003
Figure imgf001436_0001
Segment cluster R30650_PEA_2_node_48 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001436_0002
Segment cluster R30650_PEA_2_node_53 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001436_0003
Figure imgf001437_0001
Segment cluster R30650_PEA_2_node_58 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001437_0002
Segment cluster R30650_PEA_2_node_68 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3, R30650_PEA_2_T6 and R30650_PEA_2_T18. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001437_0003
Figure imgf001438_0001
Segment cluster R30650_PEA_2_node_77 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001438_0002
Segment cluster R30650_PEA_2_node_82 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf001438_0003
Segment cluster R30650_PEA_2_node_85 according to the present invention can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf001439_0001
Segment cluster R30650_PEA_2_node_88 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Figure imgf001439_0002
Segment cluster R30650_PEA_2_node_90 according to the present invention can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Figure imgf001440_0001
Segment cluster R30650_PEA_2_node_91 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Figure imgf001440_0002
Segment cluster R30650_PEA_2_node_92 according to the present invention can be found in the following transcript(s): R30650_PEA_2_T2, R30650_PEA_2_T3 and R30650_PEA_2_T6. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Figure imgf001440_0003
Figure imgf001441_0001
Variant protein alignment to the previously known protein:
Sequence name: Q9ULM1
Sequence documentation:
Alignment of: R30650_PEA_2_P4 x Q9ULM1
Alignment segment 1/1:
Quality: 8887.00
Escore: 0
Matching length: 888
Total length: 888
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
   1 MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 126 MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT 175
  51 FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDP 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 176 FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDP 225
 101 PTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERN 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 226 PTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERN 275
 151 TFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTF 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 276 TFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTF 325
 201 WMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGK 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 326 WMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGK 375
 251 FYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPL 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 376 FYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPL 425
 301 KPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTF 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 426 KPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTF 475
 351 PYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNF 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 476 PYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNF 525
 401 PIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVT 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 526 PIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVT 575
 451 GIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYL 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 576 GIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYL 625
 501 TKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDF 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 626 TKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLRMKIIKNDF 675
 551 PSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFN 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 676 PSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAELAIWLINFN 725
 601 KGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKVEQSY 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 726 KGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQMDKVEQSY 775
 651 PGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIKALIPKNAG 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 776 PGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIKALIPKNAG 825
 701 VSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFF 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 826 VSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVKMESSKQHFF 875
 751 HLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGIP 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 876 HLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFRNSILQGIP 925
 801 WQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMA 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 926 WQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRGLKLKEQMA 975
 851 FVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 888
     ||||||||||||||||||||||||||||||||||||||
 976 FVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 1013
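Because the alignment of R30650_PEA_2_P4 to Q9ULM1 is reported with 0 gaps, the residue numbering of the two sequences differs by a constant offset (position 1 of the variant pairs with position 126 of Q9ULM1, and position 888 with position 1013). The following is a minimal sketch of that coordinate mapping; the function name `target_position` and its parameters are illustrative assumptions, and the approach is only valid for gapless alignment segments.

```python
def target_position(query_pos: int, query_start: int, target_start: int, length: int) -> int:
    """Map a 1-based query position to the target in a gapless alignment segment.

    query_start and target_start are the first aligned positions of each
    sequence, and length is the number of aligned residues. These parameter
    names are hypothetical; they are not taken from the alignment program.
    """
    if not (query_start <= query_pos < query_start + length):
        raise ValueError("query position lies outside the aligned segment")
    # With no gaps, both sequences advance together, so the offset is constant.
    return target_start + (query_pos - query_start)


# Values taken from the alignment header and its first and last rows.
first = target_position(1, query_start=1, target_start=126, length=888)
last = target_position(888, query_start=1, target_start=126, length=888)
```

Here `first` evaluates to 126 and `last` to 1013, matching the numbering printed at the two ends of the alignment.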
Sequence name: Q8WUJ3
Sequence documentation:
Alignment of: R30650_PEA_2_P4 x Q8WUJ3
Alignment segment 1/1:
Quality: 5070.00
Escore: 0
Matching length: 506
Total length: 506
Matching Percent Similarity: 99.80
Matching Percent Identity: 99.80
Total Percent Similarity: 99.80
Total Percent Identity: 99.80
Gaps: 0
Alignment:
   1 MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 474 MYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHICNFFDFDT 523
  51 FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDP 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 524 FGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDVDERGGYDP 573
 101 PTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERN 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 574 PTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFTEDGPEERN 623
 151 TFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTF 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 624 TFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQDCNAVSTF 673
 201 WMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGK 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 674 WMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGYSEHIPLGK 723
 251 FYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPL 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 724 FYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYSPHQDADPL 773
 301 KPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTF 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 774 KPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGLTLASGGTF 823
 351 PYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNF 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 824 PYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGRTLPIGQNF 873
 401 PIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVT 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 874 PIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQSCPHNNVT 923
 451 GIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYL 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 924 GIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSVSEYPGSYL 973
 501 TKNDNW 506
     |||| |
 974 TKNDKW 979
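The identity figure reported for this alignment (99.80 over a length of 506) is consistent with a single mismatched position out of 506. The sketch below shows one simple way to compute such a figure from two aligned strings of equal length. It is an illustration under stated assumptions, not the scoring method of the alignment program used in this document: the function name `alignment_stats` is hypothetical, "-" is assumed to mark a gap, and identity is taken as identical positions divided by total aligned length.

```python
def alignment_stats(query: str, target: str) -> dict:
    """Compute simple statistics for two aligned sequences of equal length.

    Assumes '-' denotes a gap. Percent identity is defined here as
    identical non-gap columns / total columns, which is an assumption.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have the same length")
    total = len(query)
    pairs = list(zip(query, target))
    gaps = sum(1 for q, t in pairs if q == "-" or t == "-")
    identical = sum(1 for q, t in pairs if q == t and q != "-")
    return {
        "total_length": total,
        "gaps": gaps,
        "identical": identical,
        "percent_identity": round(100.0 * identical / total, 2),
    }


# Last six aligned residues of the alignment above: one N/K mismatch.
tail = alignment_stats("TKNDNW", "TKNDKW")

# Whole-alignment figure implied by one mismatch in 506 positions.
whole_alignment_identity = round(100.0 * 505 / 506, 2)
```

For the six-residue tail this gives 5 identical positions and no gaps, and the whole-alignment calculation rounds to 99.8, in line with the reported 99.80.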
Sequence name: Q9NPN9
Sequence documentation:
Alignment of: R30650_PEA_2_P4 x Q9NPN9
Alignment segment 1/1:
Quality: 7975.00
Escore: 0
Matching length: 797
Total length: 797
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  92 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 141
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   8 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 57
 142 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 191
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  58 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 107
 192 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 241
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 108 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 157
 242 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 291
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 158 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 207
 292 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 341
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 208 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 257
 342 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 391
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 258 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 307
 392 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 441
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 308 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 357
 442 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 491
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 358 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 407
 492 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 541
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 408 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 457
 542 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 591
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 458 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 507
 592 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 641
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 508 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 557
 642 QMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKI 691
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 558 QMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKI 607
 692 KALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVK 741
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 608 KALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVK 657
 742 MESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSF 791
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 658 MESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSF 707
 792 RNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADR 841
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 708 RNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADR 757
 842 GLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 888
     |||||||||||||||||||||||||||||||||||||||||||||||
 758 GLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 804
Sequence name: Q9H1K5
Sequence documentation:
Alignment of: R30650_PEA_2_P4 x Q9H1K5
Alignment segment 1/1:
Quality: 4983.00
Escore: 0
Matching length: 499
Total length: 499
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 390 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 439
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   2 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 51
 440 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 489
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  52 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 101
 490 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 539
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 102 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 151
 540 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 589
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 152 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 201
 590 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 639
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 202 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 251
 640 TLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERI 689
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 252 TLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERI 301
 690 KIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLE 739
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 302 KIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLE 351
 740 VKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHT 789
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 352 VKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHT 401
 790 SFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGA 839
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 402 SFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGA 451
 840 DRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 888
     |||||||||||||||||||||||||||||||||||||||||||||||||
 452 DRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 500
Sequence name: Q9ULM1
Sequence documentation:
Alignment of: R30650_PEA_2_P5 x Q9ULM1
Alignment segment 1/1:
Quality: 9960.00
Escore: 0
Matching length: 996
Total length: 996
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
   1 MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTI 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  18 MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTI 67
  51 DTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  68 DTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN 117
 101 QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHI 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 118 QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHI 167
 151 CNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDV 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 168 CNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDV 217
 201 DERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFT 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 218 DERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFT 267
 251 EDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQ 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 268 EDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQ 317
 301 DCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGY 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 318 DCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGY 367
 351 SEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYS 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 368 SEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYS 417
 401 PHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGL 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 418 PHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGL 467
 451 TLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 468 TLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGR 517
 501 TLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQ 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 518 TLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQ 567
 551 SCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSV 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 568 SCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSV 617
 601 SEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLR 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 618 SEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNLR 667
 651 MKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAEL 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 668 MKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAEL 717
 701 AIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQ 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 718 AIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTLQ 767
 751 MDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIK 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 768 MDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKIK 817
 801 ALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVKM 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 818 ALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVKM 867
 851 ESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFR 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 868 ESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSFR 917
 901 NSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRG 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 918 NSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADRG 967
 951 LKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 996
     ||||||||||||||||||||||||||||||||||||||||||||||
 968 LKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 1013
Sequence name: Q8WUJ3
Sequence documentation:
Alignment of: R30650_PEA_2_P5 x Q8WUJ3
Alignment segment 1/1:
Quality: 6143.00
Escore: 0
Matching length: 614
Total length: 614
Matching Percent Similarity: 99.84
Matching Percent Identity: 99.84
Total Percent Similarity: 99.84
Total Percent Identity: 99.84
Gaps: 0
Alignment:
   1 MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTI 50
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 366 MDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVRFLCGKPVRPKLTVTI 415
  51 DTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN 100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 416 DTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQAEEFQVLPCRSCAPN 465
 101 QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHI 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 466 QVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIVMGEMEDKCYPYRNHI 515
 151 CNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDV 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 516 CNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQLVGQYPIHFHLAGDV 565
 201 DERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFT 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 566 DERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFFT 615
 251 EDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQ 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 616 EDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPRQ 665
 301 DCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGY 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 666 DCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPGY 715
 351 SEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYS 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 716 SEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARYS 765
 401 PHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGL 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 766 PHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIGL 815
 451 TLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGR 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 816 TLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSGR 865
 501 TLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQ 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 866 TLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAWQ 915
 551 SCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSV 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 916 SCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGSV 965
 601 SEYPGSYLTKNDNW 614
     |||||||||||| |
 966 SEYPGSYLTKNDKW 979
Sequence name: Q9NPN9
Sequence documentation:
Alignment of: R30650_PEA_2_P5 x Q9NPN9
Alignment segment 1/1:
Quality: 7975.00
Escore: 0
Matching length: 797
Total length: 797
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 200 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 249
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   8 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 57
 250 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 299
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  58 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 107
 300 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 349
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 108 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 157
 350 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 399
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 158 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 207
 400 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 449
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 208 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 257
 450 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 499
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 258 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 307
 500 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 549
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 308 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 357
 550 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 599
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 358 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 407
 600 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 649
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 408 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 457
 650 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 699
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 458 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 507
 700 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 749
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 508 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 557
 750 QMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKI 799
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 558 QMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERIKI 607
 800 KALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVK 849
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 608 KALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLEVK 657
 850 MESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSF 899
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 658 MESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVVVIDGNQGRVVSHTSF 707
 900 RNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADR 949
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 708 RNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGADR 757
 950 GLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 996
     |||||||||||||||||||||||||||||||||||||||||||||||
 758 GLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 804
Sequence name: Q9H1K5
Sequence documentation:
Alignment of: R30650_PEA_2_P5 x Q9H1K5
Alignment segment 1/1: Quality: 4983.00
Escore: 0 Matching length: 499 Total length: 499 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment: 498 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 547 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 2 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 51
548 A QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 597 I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 52 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 101
598 GSVSEYPGSYLTKNDNWLVRHPDCINVPD RGAICSGCYAQMYIQAYKTS 647 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
102 GSVSEYPGSYLTKNDN LVRHPDCINVPD RGAICSGCYAQMYIQAYKTS 151
648 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 697 I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 152 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 201
698 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 747
202 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 251
748 TLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERI 797 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I
252 TLQMDKVEQSYPGRSHYYWDEDSGLLFLKLKAQNEREKFAFCSMKGCERI 301
798 KIKALIPKNAGVSDCTATAYPKFTERAWDVPMPKKLFGSQLKTKDHFLE 847 I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I
302 KIKALIPKNAGVSDCTATAYPKFTERAVVDVPMPKKLFGSQLKTKDHFLE 351
848 VKMESSKQHFFHL NDFAYIEVDGKKYPSSEDGIQWVIDGNQGRWSHT 897 I I I I I 1 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
352 VKMESSKQHFFHLWNDFAYIEVDGKKYPSSEDGIQVWIDGNQGRWSHT 401 898 SFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGPWTRVLEKLGA 947
402 SFRNSILQGIPWQLFNYVATIPDNSIVLMASKGRYVSRGP TRVLEKLGA 451
948 DRGLKLKEQMAFVGFKGSFRPI VTLDTEDHKAKIFQVVPIPV¥ KKKL 996
452 DRGLKLKEQMAFVGFKGSFRPIWVTLDTEDHKAKIFQVVPIPVVKKKKL 500
Sequence name: Q9ULM1
Sequence documentation:
Alignment of: R30650_PEA_2_P8 x Q9ULM1
Alignment segment 1/1:
Quality: 7919.00 Escore: 0 Matching length: 788 Total length: 788 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment :
349 AHPGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYR 398 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 AHPGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYR 50 399 VRFLCGKPVRPKLTVTIDTNVNSTILNLEDNVQS KPGDTLVIASTDYSM 448 I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 VRFLCGKPVRPKLTVTIDTNVNSTILNLEDNVQS KPGDTLVIASTDYSM 100
449 YQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNI 498
101 YQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNI 150 . . . . . 499 IVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMG 548
151 IVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMG 200 549 QQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGL 598 I I I I I I I II II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 201 QQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGL 250
599 LIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMC 648
251 LIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMC 300
649 KMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFI 698 I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 301 KMITEDSYPGYIPKPRQDCNAVSTF MANPNNNLINCAAAGSEETGF FI 350 699 FHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEA 748 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 351 FHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEA 400 749 SA DKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGA LRG 798 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 401 SAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRG 450 799 GDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTE 848 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 451 GDV LDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTE 500
849 MMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVAL 898 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 501 MMDNRI GPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVAL 550 899 EGRHTSALAFRLNNA QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQL 948 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 551 EGRHTSALAFRLNNA QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQL 600 . . . . . 949 DMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICS 998 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 601 DMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICS 650 999 GCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVT 1048 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I 651 GCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPWT 700
1049 LQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHN 1098 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 701 LQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHN 750 1099 RLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYY DEDSG 1136 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 751 RLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYY DEDSG 788
Sequence name: Q8WUJ3
Sequence documentation:
Alignment of: R30650_PEA_2_P8 x Q8WUJ3
Alignment segment 1/1:
Quality: 9764.00 Escore: 0 Matching length: 979 Total length: 979 Matching Percent Similarity: 99.90 Matching Percent Identity: 99.90 Total Percent Similarity: 99.90 Total Percent Identity: 99.90 Gaps : 0
Alignment : . . . . . 1 MGAAGRQDFLFKAMLTIS LTLTCFPGATSTVAAGCPDQSPELQP NPGH 50 MGAAGRQDFLF AMLTIS LTLTCFPGATSTVAAGCPDQSPELQP NPGH 50
DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHIL 100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHIL 100
IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150
LELHGQKKLSWTFLNKTLHPGGMAEGGYFFERSWGHRGVIVHVIDPKSGT 200
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I LELHGQKKLS TFLNKTLHPGGMAEGGYFFERSWGHRGVIVH¥IDPKSGT 200 . . . . . VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250
AMTKLGSKHFLHLGFRHP SFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300
I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I AMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300
KLFQTEHGEYFNVSLSSE VQDVEWTEWFDHDKVSQTKGGEKISDLWKAH 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I LFQTEHGEYFNVSLSSEWVQDVE TEWFDHDKVSQTKGGEKISDLWKAH 350
PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVR 400
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1.1 I I I I I I I I I I I I I I II I I PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVR 400 401 FLCGKPVRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQ 450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
401 FLCG PVRPKLTVTIDTNVNSTILNLEDNVQS KPGDTLVIASTDYSMYQ 450
451 AEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIV 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
451 AEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIV 500
501 MGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQ 550 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
501 MGEMEDKCYPYRNHICNFFDFDTFGGHI FALGFKAAHLEGTELKHMGQQ 550
551 LVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLI 600 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 551 LVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLI 600
601 KDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKM 650 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
601 KDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLV SGTLLPSDRDSKMCKM 650 . . . . .
651 ITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFH 700 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 651 ITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFH 700
701 HVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASA 750 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
701 HVPTGPSVGMYSPGYSEHIPLG FYNNRAHSNYRAGMIIDNGVKTTEASA 750
751 KDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGA LRGGD 800
751 KDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGD 800 801 V LDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMM 850 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 801 V LDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMM 850 . . . . . 851 DNRI GPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEG 900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 851 DNRI GPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEG 900
- 901 RHTSALAFRLNNA QSCPHNNVTGIAFEDVPITSRVFFGEPGP FNQLDM 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RHTSALAFRLNNA QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDM 950 951 DGDKTSVFHDVDGSVSEYPGSYLTKNDN 979
951 DGDKTSVFHDVDGSVSEYPGSYLTKNDKW 979
Sequence name: Q9NPN9 Sequence documentation:
Alignment of: R30650_PEA_2_P8 x Q9NPN9
Alignment segment 1/1: Quality: 5764.00 Escore: 0 Matching length: 572 Total length: 572 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
565 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 614 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I 8 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDWGYNSLGHCFF 57
615 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIP PR 664 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 58 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIP PR 107 . . . . . 665 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 714 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 108 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 157 715 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 764 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 158 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 207 765 SPHQDADPLKPREPAIIRHFIAYKNQDHGA LRGGDV LDSCRFADNGIG 814 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 208 SPHQDADPLKPREPAIIRHFIAYKNQDHGA LRGGDVWLDSCRFADNGIG 257 815 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRI GPGGLDHSG 864 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I 258 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 307 . . . . . 865 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 914 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 308 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 357 915 QSCPHNNVTGIAFEDVPITSRVFFGEPGP FNQLDMDGDKTSVFHDVDGS 964 I I I I I I I I I I I I II I I II I I I I I I I 1 I I I I I I I I I I I II I I I I I I I I I I I 358 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 407 965 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 1014 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 408 VSEYPGSYLTKNDN LVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 457
1015 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIH DQTAPAE 1064 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 458 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 507
1065 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 1114
508 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 557
1115 QMDKVEQSYPGRSHYYWDEDSG 1136 I I I I I I I I I I I I I I I I I I I I I I 558 QMDKVEQSYPGRSHYYWDEDSG 579 Sequence name: Q9H1K5
Sequence documentation:
Alignment of: R30650_PEA_2_P8 x Q9H1K5
Alignment segment 1/1:
Quality: 2772.00 Escore: 0 Matching length: 274 Total length: 274 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
863 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 912 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 2 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 51 913 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 962 I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 52 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 101 963 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 1012 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 102 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 151 1013 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 1062 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 152 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 201 1063 AELAIWLINFNKGD IRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 1112 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 202 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 251
1113 TLQMDKVEQSYPGRSHYY DEDSG 1136 I I I I I I I I I II II I I I I I I I I I I I 252 TLQMDKVEQSYPGRSHYYWDEDSG 275
Sequence name: Q9ULM1
Sequence documentation:
Alignment of: R30650_PEA_2_P15 x Q9ULM1
Alignment segment 1/1: Quality: 7919.00
Escore: 0 Matching length: 788 Total length: 788 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment : . . . . . 349 AHPGKICNRPIDIQATTMDGVNLSTEWYKKGQDYRFACYDRGRACRSYR 398 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 AHPGKICNRPIDIQATTMDGVNLSTEWYKK"GQDYRFACYDRGRACRSYR 50 399 VRFLCGKPVRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSM 448 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 VRFLCGKPVRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSM 100 449 YQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNI 498 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I 101 YQAEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNI 150 499 IVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMG 548 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 IVMGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMG 200 549 QQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGL 598 I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I 201 QQLVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGL 250
599 LIKDWGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMC 648 251 LIKDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMC 300
649 KMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFI 698 I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
301 KMITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFI 350
699 FHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEA 748 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I 351 FHHVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEA 400
749 SAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRG 798 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
401 SAKDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRG 450
799 GDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTE 848
451 GDVWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTE 500
849 MMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVAL 898
501 MMDNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVAL 550
899 EGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQL 948 I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I Ii I II I I I I
551 EGRHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQL 600
949 DMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICS 998 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 601 DMDGDKTSVFHDVDGSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICS 650 999 GCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVT 1048 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 651 GCYAQMYIQAYKTSNLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVT 700
1049 LQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHN 1098 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 701 LQKGYTIHWDQTAPAELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHN 750
1099 RLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG 1136 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 751 RLLKQTSKTGVFVRTLQMDKVEQSYPGRSHYYWDEDSG 788
Sequence name: Q8WUJ3
Sequence documentation:
Alignment of: R30650_PEA_2_P15 x Q8WUJ3
Alignment segment 1/1:
Quality: 9764.00 Escore: 0 Matching length: 979 Total length: 979 Matching Percent Similarity: 99.90 Matching Percent Identity: 99.90 Total Percent Similarity: 99.90 Total Percent Identity: 99.90 Gaps : 0
Alignment:
1 MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGH 50 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGH 50 51 DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPI¥LRTRHIL 100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I 51 DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHIL 100 101 IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150 151 LELHGQKKLSWTFLNKTLHPGGMAEGGYFFERSWGHRGVIVHVIDPKSGT 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 LELHGQKKLSWTFLNKTLHPGGMAEGGYFFERSWGHRGVIVHVIDPKSGT 200 201 VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250 II I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I 201 VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250 251 AMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300
251 AMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300
301 KLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH 350 301 KLFQTEHGEYFNVSLSSEWVQDVEWTEWFDHDKVSQTKGGEKISDLWKAH 350
351 PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVR 400 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I
351 PGKICNRPIDIQATTMDGVNLSTEVVYKKGQDYRFACYDRGRACRSYRVR 400
401 FLCGKPVRPKLTVTIDTNVNSTILNLEDNVQSWKPGDTLVIASTDYSMYQ 450 I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 401 FLCGKPVRPKLTVTIDTN¥NSTILNLEDNVQSWKPGDTLVIASTDYSMYQ 450
451 AEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIV 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
451 AEEFQVLPCRSCAPNQVKVAGKPMYLHIGEEIDGVDMRAEVGLLSRNIIV 500 . . . . .
501 MGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQ 550 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
501 MGEMEDKCYPYRNHICNFFDFDTFGGHIKFALGFKAAHLEGTELKHMGQQ 550
551 LVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLI 600 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
551 LVGQYPIHFHLAGDVDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLI 600
601 KDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKM 650
601 KDVVGYNSLGHCFFTEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKM 650
651 ITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFH 700 I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I II I 651 ITEDSYPGYIPKPRQDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFH 700 701 HVPTGPSVGMYSPGYSEHI PLGKFYNNRAHSNYRAGMI IDNGVKTTEASA 750 I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 701 HVPTGPSVGMYSPGYSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASA 750 751 KDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGD 800 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 751 KDKRPFLSIISARYSPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGD 800 801 V LDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMM 850 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 801 VWLDSCRFADNGIGLTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMM 850
851 DNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEG 900 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 851 DNRIWGPGGLDHSGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEG 900 901 RHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDM 950 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 901 RHTSALAFRLNNAWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDM 950
951 DGDKTSVFHDVDGSVSEYPGSYLTKNDNW 979
951 DGDKTSVFHDVDGSVSEYPGSYLTKNDKW 979
Sequence name: Q9NPN9 Sequence documentation:
Alignment of: R30650_PEA_2_P15 x Q9NPN9
Alignment segment 1/1:
Quality: 5764.00 Escore: 0 Matching length: 572 Total length: 572 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment :
565 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDWGYNSLGHCFF 614 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 8 VDERGGYDPPTYIRDLSIHHTFSRCVTVHGSNGLLIKDVVGYNSLGHCFF 57
615 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 664 I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I II I I I I I 58 TEDGPEERNTFDHCLGLLVKSGTLLPSDRDSKMCKMITEDSYPGYIPKPR 107 665 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 714 II I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I 108 QDCNAVSTFWMANPNNNLINCAAAGSEETGFWFIFHHVPTGPSVGMYSPG 157
715 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 764 158 YSEHIPLGKFYNNRAHSNYRAGMIIDNGVKTTEASAKDKRPFLSIISARY 207
765 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 814 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 208 SPHQDADPLKPREPAIIRHFIAYKNQDHGAWLRGGDVWLDSCRFADNGIG 257 815 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 864 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 258 LTLASGGTFPYDDGSKQEIKNSLFVGESGNVGTEMMDNRIWGPGGLDHSG 307
865 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 914 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 308 RTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNNAW 357 . . . . . 915 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 964 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 358 QSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVDGS 407 965 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 1014 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 408 VSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTSNL 457
1015 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 1064 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 458 RMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAPAE 507
1065 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 1114 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 508 LAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVRTL 557 1115 QMDKVEQSYPGRSHYYWDEDSG 1136
558 QMDKVEQSYPGRSHYYWDEDSG 579
Sequence name: Q9H1K5
Sequence documentation:
Alignment of: R30650_PEA_2_P15 x Q9H1K5
Alignment segment 1/1:
Quality: 2772.00 Escore: 0 Matching length: 274 Total length: 274 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
863 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 912 2 SGRTLPIGQNFPIRGIQLYDGPINIQNCTFRKFVALEGRHTSALAFRLNN 51 913 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 962 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 52 AWQSCPHNNVTGIAFEDVPITSRVFFGEPGPWFNQLDMDGDKTSVFHDVD 101 963 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 1012 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 102 GSVSEYPGSYLTKNDNWLVRHPDCINVPDWRGAICSGCYAQMYIQAYKTS 151
1013 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 1062 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 152 NLRMKIIKNDFPSHPLYLEGALTRSTHYQQYQPVVTLQKGYTIHWDQTAP 201 1063 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 1112 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 202 AELAIWLINFNKGDWIRVGLCYPRGTTFSILSDVHNRLLKQTSKTGVFVR 251
1113 TLQMDKVEQSYPGRSHYYWDEDSG 1136
252 TLQMDKVEQSYPGRSHYYWDEDSG 275
Sequence name: Q8WUJ3
Sequence documentation: Alignment of: R30650_PEA_2_P17 x Q8WUJ3
Alignment segment 1/1: Quality: 3170.00
Escore: 0 Matching length: 324 Total length: 324 Matching Percent Similarity: 99.38 Matching Percent Identity: 99.38 Total Percent Similarity: 99.38 Total Percent Identity: 99.38 Gaps : 0
Alignment:
1 MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGH 50 I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MGAAGRQDFLFKAMLTISWLTLTCFPGATSTVAAGCPDQSPELQPWNPGH 50 . . . . . 51 DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHIL 100 I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 DQDHHVHIGQGKTLLLTSSATVYSIHISEGGKLVIKDHDEPIVLRTRHIL 100 101 IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 IDNGGELHAGSALCPFQGNFTIILYGRADEGIQPDPYYGLKYIGVGKGGA 150 151 LELHGQKKLSWTFLNKTLHPGGMAEGGYFFERSWGHRGVIVHVIDPKSGT 200 I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 LELHGQKKLSWTFLNKTLHPGGMAEGGYFFERSWGHRGVIVHVIDPKSGT 200 201 VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I 201 VIHSDRFDTYRSKKESERLVQYLNAVPDGRILSVAVNDEGSRNLDDMARK 250 251 AMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 AMTKLGSKHFLHLGFRHPWSFLTVKGNPSSSVEDHIEYHGHRGSAAARVF 300 301 KLFQTEHGEYFNVSLSSEWVQGEE 324 I I I I I I I I I I I I I I I I I I I I I I 301 KLFQTEHGEYFNVSLSSEWVQDVE 324
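The percent identity reported for an ungapped alignment is the share of aligned positions carrying the same residue. The following is a minimal sketch of that calculation for the R30650_PEA_2_P17 x Q8WUJ3 alignment above. It assumes, as the alignment shows, that residues 1-300 are identical and that only the final block (residues 301-324) contains differences; the function and variable names are illustrative and are not taken from the alignment software used in the application.

```python
def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length, ungapped sequences differ."""
    if len(a) != len(b):
        raise ValueError("ungapped alignment requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


# Final alignment block (residues 301-324), copied from the alignment above.
variant_tail = "KLFQTEHGEYFNVSLSSEWVQGEE"  # R30650_PEA_2_P17
known_tail = "KLFQTEHGEYFNVSLSSEWVQDVE"    # Q8WUJ3

# Assumption: residues 1-300 match exactly, as displayed in the alignment.
total_length = 300 + len(variant_tail)
mismatches = count_mismatches(variant_tail, known_tail)

percent_identity = 100.0 * (total_length - mismatches) / total_length
print(f"{mismatches} mismatches over {total_length} positions: "
      f"{percent_identity:.2f}% identity")
```

Two differing positions out of 324 give 99.38%, which agrees with the "Matching Percent Identity" value listed for this alignment.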
Expression of R30650 transcripts which are detectable by amplicon as depicted in sequence name R30650 seg76 in normal and cancerous colon tissues

Expression of R30650 transcripts detectable by or according to seg76, the R30650 amplicon (SEQ ID NO: 1354) and the R30650 F (SEQ ID NO: 1352) and R30650 R (SEQ ID NO: 1353) primers, was measured by real-time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 57 is a histogram showing over-expression of the above-indicated R30650 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 5-fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 57, the expression of R30650 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 5 fold was found in 18 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below.
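The normalization described above can be sketched as follows. This is an illustration only: the sample names and quantities are hypothetical and are not data from the experiment, and the helper names are assumptions introduced for the example. Each sample's amplicon quantity is divided by the geometric mean of its four housekeeping gene quantities, and the result is then divided by the median of the normalized normal samples to give fold up-regulation.

```python
from statistics import geometric_mean, median


def normalize(target_qty, housekeeping_qtys):
    """Divide the amplicon quantity by the geometric mean of the housekeeping genes."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_up_regulation(normalized, normal_ids):
    """Express every sample relative to the median of the normal samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {sample: qty / reference for sample, qty in normalized.items()}


# Hypothetical quantities: (amplicon, [PBGD, HPRT1, G6PD, RPS27A]).
raw = {
    "normal_1": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_2": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_3": (6.0, [1.0, 1.0, 4.0, 4.0]),
    "cancer_1": (40.0, [2.0, 2.0, 2.0, 2.0]),
    "cancer_2": (6.0, [2.0, 2.0, 2.0, 2.0]),
}
normal_ids = ["normal_1", "normal_2", "normal_3"]

normalized = {s: normalize(qty, hk) for s, (qty, hk) in raw.items()}
folds = fold_up_regulation(normalized, normal_ids)

# Samples reaching the 5-fold over-expression threshold used in the text.
over_threshold = sorted(s for s, f in folds.items() if f >= 5.0)
print(over_threshold)
```

The geometric mean is used for the housekeeping genes so that a single highly expressed reference gene does not dominate the normalization factor.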
The P value for the difference in the expression levels of R30650 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.86E-05. A threshold of 5-fold over-expression was found to differentiate between cancer and normal samples with a P value of 2.42E-03, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.
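As an illustration of how a threshold-based comparison of this kind can be evaluated, the sketch below computes a one-sided Fisher's exact test from a 2x2 table using only the Python standard library. The cancer counts (18 of 37 samples at or above 5 fold) come from the text. The normal counts are an assumption made for the example: the text lists 11 normal PM samples but does not state how many exceeded the threshold, so zero is assumed here. Whether the reported value was computed as a one-sided or two-sided test is likewise not stated.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count of at least
    `a` in the top-left cell, with all row and column totals held fixed.
    """
    total = a + b + c + d
    row1 = a + b   # e.g. number of cancer samples
    col1 = a + c   # e.g. number of samples at or above the threshold
    denom = comb(total, row1)
    return sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / denom


# From the text: 18 of 37 adenocarcinoma samples showed >= 5-fold over-expression.
cancer_over, cancer_under = 18, 37 - 18
# Assumption (not stated in the text): none of the 11 normal samples did.
normal_over, normal_under = 0, 11

p_value = fisher_exact_greater(cancer_over, cancer_under, normal_over, normal_under)
print(f"one-sided P = {p_value:.2e}")
```

Under the stated assumption the one-sided value is approximately 2.42E-03, in line with the figure reported above; a different split of the normal samples would give a different value.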
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R30650 F forward primer; and R30650 R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R30650.
Forward primer (SEQ ID NO: 1352): CTTCTTGTCCACGGTTTTGTTG
Reverse primer (SEQ ID NO: 1353): AACATTCCTGGCCACCTGAA
Amplicon (SEQ ID NO: 1354): CTTCTTGTCCACGGTTTTGTTGAGTTTTCACTCTTCTAATGCAAGGGTCTCACACTGTGAACCACTTAGGATGTGATCACTTTCAGGTGGCCAGGAATGTT
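The relationship between the primer pair and the amplicon can be checked with a short sketch: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The sequences below are copied from the listing above, with the line break inside the amplicon removed; the helper name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


forward_primer = "CTTCTTGTCCACGGTTTTGTTG"   # SEQ ID NO: 1352
reverse_primer = "AACATTCCTGGCCACCTGAA"     # SEQ ID NO: 1353
amplicon = (                                 # SEQ ID NO: 1354
    "CTTCTTGTCCACGGTTTTGTTGAGTTTTCACTCTTCTAATGCAAGGGTCTCACACTGT"
    "GAACCACTTAGGATGTGATCACTTTCAGGTGGCCAGGAATGTT"
)

starts_with_forward = amplicon.startswith(forward_primer)
ends_with_reverse = amplicon.endswith(reverse_complement(reverse_primer))
print(starts_with_forward, ends_with_reverse)
```

Both checks hold for the sequences as listed, which is consistent with the amplicon being the product bounded by this primer pair.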
DESCRIPTION FOR CLUSTER T23657
Cluster T23657 features 31 transcript(s) and 33 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001486_0001
Figure imgf001487_0001
Figure imgf001488_0001
Table 3 - Proteins of interest
Figure imgf001488_0002
Figure imgf001489_0001
These sequences are variants of the known protein Solute carrier family 21 member 12 (SwissProt accession identifier S21C_HUMAN; known also according to the synonyms Sodium-independent organic anion transporter E; Organic anion transporting polypeptide E; OATP-E; Colon organic anion transporter; Organic anion transporter polypeptide-related protein 1; OATP-RP1; OATPRP1; POAT), SEQ ID NO: 1062, referred to herein as the previously known protein. Protein Solute carrier family 21 member 12 is known or believed to have the following function(s): Mediates the Na(+)-independent transport of organic anions such as the thyroid hormones T3 (triiodo-L-thyronine), T4 (thyroxine) and rT3, and of estrone-3-sulfate and taurocholate. The sequence for protein Solute carrier family 21 member 12 is given at the end of the application, as "Solute carrier family 21 member 12 amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001490_0001
Protein Solute carrier family 21 member 12 localization is believed to be Integral membrane protein.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ion transport, which are annotation(s) related to Biological Process; transporter, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T23657 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 58 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors.
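As a non-limiting illustration only, the "parts per million" ratio described above can be sketched as follows. The sketch assumes a plain, unweighted ratio of EST counts; the weighting applied according to the previously described methods is not reproduced here. The function name `ests_per_million` and the counts used in the example are hypothetical and are not taken from Table 5.

```python
# Sketch: express the ESTs of one cluster in one tissue category as
# parts per million of all ESTs in that category.
# Assumption: simple unweighted counts; any weighting used in the
# application is deliberately omitted.


def ests_per_million(cluster_est_count: int, category_est_count: int) -> float:
    """Return cluster ESTs per million ESTs of the tissue category."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    if cluster_est_count < 0:
        raise ValueError("cluster_est_count must not be negative")
    # Multiply before dividing to limit floating-point rounding.
    return cluster_est_count * 1_000_000 / category_est_count


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(ests_per_million(12, 400_000))
```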
Table 5 - Normal tissue distribution
Figure imgf001491_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001491_0002
Figure imgf001492_0001
As noted above, cluster T23657 features 31 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Solute carrier family 21 member 12. A description of each variant protein according to the present invention is now provided.
Variant protein T23657_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T0, T23657_T1 and T23657_T8. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P1 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf001493_0001
Variant protein T23657_P1 is encoded by the following transcript(s): T23657_T0, T23657_T1 and T23657_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T0 is shown in bold; this coding portion starts at position 212 and ends at position 2377. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001493_0002
Figure imgf001494_0001
The coding portion of transcript T23657_T1 is shown in bold; this coding portion starts at position 212 and ends at position 2377. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf001494_0002
Figure imgf001495_0001
The coding portion of transcript T23657_T8 is shown in bold; this coding portion starts at position 212 and ends at position 2377. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001495_0002
Variant protein T23657_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T2, T23657_T7, T23657_T16 and T23657_T20. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P2 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P2, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLVYQNSAMSRYILIMGLLYK corresponding to amino acids 1 - 675 of S21C_HUMAN, which also corresponds to amino acids 1 - 675 of T23657_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FQLPEVHHSLNVLNRKFQKQTVHNL corresponding to amino acids 676 - 700 of T23657_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T23657_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FQLPEVHHSLNVLNRKFQKQTVHNL in T23657_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf001497_0001
The glycosylation sites of variant protein T23657_P2, as compared to the known protein Solute carrier family 21 member 12, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Glycosylation site(s)
Figure imgf001497_0002
Figure imgf001498_0001
Variant protein T23657_P2 is encoded by the following transcript(s): T23657_T2, T23657_T7, T23657_T16 and T23657_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T2 is shown in bold; this coding portion starts at position 212 and ends at position 2311. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf001498_0002
Figure imgf001499_0001
The coding portion of transcript T23657_T7 is shown in bold; this coding portion starts at position 212 and ends at position 2311. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf001499_0002
Figure imgf001500_0001
The coding portion of transcript T23657_T16 is shown in bold; this coding portion starts at position 212 and ends at position 2311. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf001500_0002
Figure imgf001501_0001
The coding portion of transcript T23657_T20 is shown in bold; this coding portion starts at position 212 and ends at position 2311. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Figure imgf001501_0002
Figure imgf001502_0001
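As a non-limiting illustration only, the coding portion coordinates given above can be related to the length of the encoded variant protein. The sketch below assumes 1-based, inclusive nucleotide positions; the function name `coding_length_in_codons` is hypothetical. For the transcripts encoding T23657_P2 (positions 212 to 2311) the result is 700 codons, which agrees with the 700 amino acids of T23657_P2 described in the comparison report (amino acids 1 - 675 plus the tail at 676 - 700). This agreement suggests, as an inference rather than a statement of the application, that the listed end position is the last nucleotide of the final amino acid codon.

```python
# Sketch: derive the number of codons from coding-portion coordinates.
# Assumption: start and end are 1-based and inclusive.


def coding_length_in_codons(start: int, end: int) -> int:
    """Return the number of complete codons between start and end."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return length_nt // 3


if __name__ == "__main__":
    # Coordinates stated for the transcripts encoding T23657_P2.
    print(coding_length_in_codons(212, 2311))
```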
Variant protein T23657_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T3, T23657_T9 and T23657_T21. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P3 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P3, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVTKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLWQNSAMSRYILIMGLLYK corresponding to amino acids 1 - 675 of S21C_HUMAN, which also corresponds to amino acids 1 - 675 of T23657_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TIKHKAF corresponding to amino acids 676 - 682 of T23657_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T23657_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TIKHKAF in T23657_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Figure imgf001503_0001
The glycosylation sites of variant protein T23657_P3, as compared to the known protein Solute carrier family 21 member 12, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Glycosylation site(s)
Figure imgf001504_0001
Variant protein T23657_P3 is encoded by the following transcript(s): T23657_T3, T23657_T9 and T23657_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T3 is shown in bold; this coding portion starts at position 212 and ends at position 2257. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Figure imgf001504_0002
Figure imgf001505_0001
The coding portion of transcript T23657_T9 is shown in bold; this coding portion starts at position 212 and ends at position 2257. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Figure imgf001505_0002
Figure imgf001506_0001
The coding portion of transcript T23657_T21 is shown in bold; this coding portion starts at position 212 and ends at position 2257. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Figure imgf001506_0002
Figure imgf001507_0001
Variant protein T23657_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T4. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P4 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P4, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVNIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRIL corresponding to amino acids 1 - 625 of S21C_HUMAN, which also corresponds to amino acids 1 - 625 of T23657_P4, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTVQCEEAMVSCTVCSLHKGM corresponding to amino acids 626 - 646 of T23657_P4, a third amino acid sequence being at least 90% homologous to GGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYK corresponding to amino acids 626 - 675 of S21C_HUMAN, which also corresponds to amino acids 647 - 696 of T23657_P4, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TIKHKAF corresponding to amino acids 697 - 703 of T23657_P4, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of T23657_P4, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P4.
3. An isolated polypeptide encoding for a tail of T23657_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TIKHKAF in T23657_P4.
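As a non-limiting illustration only, the statement that the four amino acid sequences of T23657_P4 are "contiguous and in a sequential order" can be checked against the stated positions. The sketch below uses the positions and the three short sequences given in the comparison report; the first segment is checked by position only. The names `Segment`, `P4_SEGMENTS` and `check_segments` are hypothetical.

```python
# Sketch: verify that the segments of a chimeric polypeptide tile the
# variant protein without gaps or overlaps, and that each listed
# sequence has the length implied by its stated positions.
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    start: int                 # first amino acid position in the variant
    end: int                   # last amino acid position in the variant
    sequence: Optional[str]    # None when only the positions are checked


P4_SEGMENTS = [
    Segment(1, 625, None),  # shared with S21C_HUMAN amino acids 1 - 625
    Segment(626, 646, "GTVQCEEAMVSCTVCSLHKGM"),  # edge portion
    Segment(647, 696,
            "GGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYK"),
    Segment(697, 703, "TIKHKAF"),  # tail
]


def check_segments(segments: List[Segment]) -> int:
    """Return the total protein length if the segments are contiguous and
    consistent with their sequences; raise ValueError otherwise."""
    expected_start = 1
    for seg in segments:
        if seg.start != expected_start:
            raise ValueError(f"gap or overlap before position {seg.start}")
        length = seg.end - seg.start + 1
        if seg.sequence is not None and len(seg.sequence) != length:
            raise ValueError(f"length mismatch at {seg.start}-{seg.end}")
        expected_start = seg.end + 1
    return expected_start - 1


if __name__ == "__main__":
    print(check_segments(P4_SEGMENTS))
```

The resulting length of 703 amino acids is consistent with the coding portion stated for transcript T23657_T4 (positions 212 to 2320, that is, 2109 nucleotides or 703 codons).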
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 22 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Amino acid mutations
Figure imgf001509_0001
The glycosylation sites of variant protein T23657_P4, as compared to the known protein Solute carrier family 21 member 12, are described in Table 23 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 23 - Glycosylation site(s)
Figure imgf001509_0002
Variant protein T23657_P4 is encoded by the following transcript(s): T23657_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T4 is shown in bold; this coding portion starts at position 212 and ends at position 2320. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Figure imgf001510_0001
Variant protein T23657_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T5 and T23657_T6. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P5 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P5, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFF TFLSSIPALTATLR corresponding to amino acids 1 - 604 of S21C_HUMAN, which also corresponds to amino acids 1 - 604 of T23657_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
Figure imgf001512_0001
The glycosylation sites of variant protein T23657_P5, as compared to the known protein Solute carrier family 21 member 12, are described in Table 26 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 26 - Glycosylation site(s)
Figure imgf001512_0002
Variant protein T23657_P5 is encoded by the following transcript(s): T23657_T5 and T23657_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T5 is shown in bold; this coding portion starts at position 212 and ends at position 2023. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Figure imgf001512_0003
Figure imgf001513_0001
The coding portion of transcript T23657_T6 is shown in bold; this coding portion starts at position 212 and ends at position 2023. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Nucleic acid SNPs
Figure imgf001514_0001
Variant protein T23657_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T10. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P6 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P6, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKV corresponding to amino acids 1 - 547 of S21C_HUMAN, which also corresponds to amino acids 1 - 547 of T23657_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMPLQGNALQL VRESPSFWFSYSL corresponding to amino acids 548 - 620 of T23657_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T23657_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMPLQGNALQL VRESPSFWFSYSL in T23657_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 29 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Amino acid mutations
Figure imgf001516_0001
The glycosylation sites of variant protein T23657_P6, as compared to the known protein Solute carrier family 21 member 12, are described in Table 30 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 30 - Glycosylation site(s)
Figure imgf001516_0002
Variant protein T23657_P6 is encoded by the following transcript(s): T23657_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T10 is shown in bold; this coding portion starts at position 212 and ends at position 2071. The transcript also has the following SNPs as listed in Table 31 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 31 - Nucleic acid SNPs
Figure imgf001517_0001
Figure imgf001518_0001
Variant protein T23657_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T12, T23657_T17 and T23657_T22. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T23657_P7 and S21C_HUMAN:
1. An isolated chimeric polypeptide encoding for T23657_P7, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQK corresponding to amino acids 1 - 546 of S21C_HUMAN, which also corresponds to amino acids 1 - 546 of T23657_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCP corresponding to amino acids 547 - 549 of T23657_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location ofthe variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657JP7 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 32, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is Icnown or not; the presence of Icnown SNPs in variant protein T23657JP7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 32 - Amino acid mutations
Figure imgf001519_0001
The glycosylation sites of variant protein T23657_P7, as compared to the known protein Solute carrier family 21 member 12, are described in Table 33 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 33 - Glycosylation site(s)
Figure imgf001519_0002
Variant protein T23657_P7 is encoded by the following transcript(s): T23657_T12, T23657_T17 and T23657_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T12 is shown in bold; this coding portion starts at position 212 and ends at position 1858. The transcript also has the following SNPs as listed in Table 34 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 34 - Nucleic acid SNPs
Figure imgf001520_0001
The coding portion of transcript T23657_T17 is shown in bold; this coding portion starts at position 212 and ends at position 1858. The transcript also has the following SNPs as listed in Table 35 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 35 - Nucleic acid SNPs
Figure imgf001521_0001
The coding portion of transcript T23657_T22 is shown in bold; this coding portion starts at position 212 and ends at position 1858. The transcript also has the following SNPs as listed in Table 36 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 36 - Nucleic acid SNPs
Figure imgf001522_0001
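The coding-portion coordinates given above for the transcripts encoding T23657_P7 can be cross-checked against the stated length of the variant protein (amino acids 1 - 546 plus 547 - 549, that is 549 residues). The following is a minimal sketch only; it assumes that the positions are 1-based and inclusive and that the stated end position is the last nucleotide of the final sense codon (stop codon not counted). Neither assumption is stated in the text, and the function names are hypothetical.

```python
# Sketch: check that a coding portion's length matches a protein length.
# Assumption (not stated in the source): positions are 1-based, inclusive,
# and the end position excludes the stop codon.

def coding_length(start: int, end: int) -> int:
    """Number of nucleotides between start and end, both inclusive."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the coding portion."""
    length = coding_length(start, end)
    if length % 3 != 0:
        raise ValueError("coding portion is not a multiple of three")
    return length // 3


# Coordinates as listed for the three transcripts encoding T23657_P7.
P7_TRANSCRIPTS = {
    "T23657_T12": (212, 1858),
    "T23657_T17": (212, 1858),
    "T23657_T22": (212, 1858),
}
P7_LENGTH = 549  # amino acids 1 - 546 plus 547 - 549

for name, (start, end) in P7_TRANSCRIPTS.items():
    codons = codon_count(start, end)
    print(name, codons, "consistent" if codons == P7_LENGTH else "mismatch")
```

Under these assumptions, each of the three coding portions spans 1647 nucleotides, which corresponds to 549 codons.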
Variant protein T23657_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T13, T23657_T19 and T23657_T28. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P8 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P8, comprising a first amino acid sequence being at least 90% homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIATFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTWSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQK corresponding to amino acids 1 - 546 of S21C_HUMAN, which also corresponds to amino acids 1 - 546 of T23657_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence QHSCTNGNSTMCP corresponding to amino acids 547 - 559 of T23657_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T23657_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence QHSCTNGNSTMCP in T23657_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 37 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 37 - Amino acid mutations
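For short unique segments such as the 13-residue tail QHSCTNGNSTMCP, the graded homology thresholds (70%, 80%, 85%, 90%, 95%) translate into a small number of permitted differences. The sketch below illustrates the arithmetic under a simplifying assumption that is not taken from the text: homology is treated as ungapped percent identity over two peptides of equal length. The function names are hypothetical, and an actual homology determination would typically rely on an alignment program.

```python
# Sketch: how many residues must match to reach each homology threshold.
# Assumption (illustrative only): "homologous" is approximated by ungapped
# percent identity over peptides of equal length.

THRESHOLDS = (70, 80, 85, 90, 95)


def min_identical(length: int, percent: int) -> int:
    """Smallest number of identical residues giving at least `percent`."""
    return -(-length * percent // 100)  # integer ceiling of length*percent/100


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


TAIL_P8 = "QHSCTNGNSTMCP"  # amino acids 547 - 559 of T23657_P8

for threshold in THRESHOLDS:
    needed = min_identical(len(TAIL_P8), threshold)
    print(f"at least {threshold}%: {needed} of {len(TAIL_P8)} residues identical")
```

Under this assumption, a 13-residue tail tolerates at most three substitutions at the 70% level and none at the 95% level.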
Figure imgf001524_0001
The glycosylation sites of variant protein T23657_P8, as compared to the known protein Solute carrier family 21 member 12, are described in Table 38 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 38 - Glycosylation site(s)
Figure imgf001524_0002
Variant protein T23657_P8 is encoded by the following transcript(s): T23657_T13, T23657_T19 and T23657_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T13 is shown in bold; this coding portion starts at position 212 and ends at position 1888. The transcript also has the following SNPs as listed in Table 39 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 39 - Nucleic acid SNPs
Figure imgf001525_0001
The coding portion of transcript T23657_T19 is shown in bold; this coding portion starts at position 212 and ends at position 1888. The transcript also has the following SNPs as listed in Table 40 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 40 - Nucleic acid SNPs
Figure imgf001526_0001
The coding portion of transcript T23657_T28 is shown in bold; this coding portion starts at position 212 and ends at position 1888. The transcript also has the following SNPs as listed in Table 41 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 41 - Nucleic acid SNPs
Figure imgf001527_0001
Variant protein T23657_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T14. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein T23657_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 42 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 42 - Amino acid mutations
Figure imgf001528_0001
Variant protein T23657_P9 is encoded by the following transcript(s): T23657_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T14 is shown in bold; this coding portion starts at position 573 and ends at position 1772. The transcript also has the following SNPs as listed in Table 43 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 43 - Nucleic acid SNPs
Figure imgf001528_0002
Variant protein T23657_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T15. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P10 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P10, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVNSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVNIFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRIL corresponding to amino acids 1 - 625 of S21C_HUMAN, which also corresponds to amino acids 1 - 625 of T23657_P10, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTVQCEEAMVSCTVCSLHKGM corresponding to amino acids 626 - 646 of T23657_P10, and a third amino acid sequence being at least 90% homologous to GGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAI ACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV corresponding to amino acids 626 - 722 of S21C_HUMAN, which also corresponds to amino acids 647 - 743 of T23657_P10, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of T23657_P10, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GTVQCEEAMVSCTVCSLHKGM, corresponding to T23657_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 44 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 44 - Amino acid mutations
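The comparison report for T23657_P10 describes three contiguous segments: two that correspond to stretches of the known protein and one unique insertion between them. The coordinate bookkeeping can be checked as sketched below. This is an illustrative sketch only; the `Segment` type and `check_segments` function are hypothetical names, and coordinates are assumed to be 1-based and inclusive.

```python
# Sketch: verify that the segments of a chimeric polypeptide are contiguous
# and that each aligned segment has the same length on both proteins.
# Assumption: coordinates are 1-based and inclusive.
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    variant_start: int
    variant_end: int
    known_start: Optional[int]  # None for a segment unique to the variant
    known_end: Optional[int]


# Coordinates as stated in the comparison report for T23657_P10.
P10_SEGMENTS = [
    Segment(1, 625, 1, 625),         # first sequence, shared with S21C_HUMAN
    Segment(626, 646, None, None),   # unique sequence GTVQCEEAMVSCTVCSLHKGM
    Segment(647, 743, 626, 722),     # third sequence, shared with S21C_HUMAN
]


def check_segments(segments: List[Segment]) -> int:
    """Return the total variant length, raising if the layout is inconsistent."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start:
            raise ValueError(f"gap or overlap before position {seg.variant_start}")
        variant_len = seg.variant_end - seg.variant_start + 1
        if seg.known_start is not None and seg.known_end is not None:
            known_len = seg.known_end - seg.known_start + 1
            if known_len != variant_len:
                raise ValueError("aligned segment lengths differ")
        expected_start = seg.variant_end + 1
    return expected_start - 1


print(check_segments(P10_SEGMENTS))
```

The unique segment spans 21 positions, which matches the length of GTVQCEEAMVSCTVCSLHKGM, and the returned total agrees with the span 1 - 743 stated in the comparison report.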
Figure imgf001530_0001
The glycosylation sites of variant protein T23657_P10, as compared to the known protein Solute carrier family 21 member 12, are described in Table 45 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 45 - Glycosylation site(s)
Figure imgf001531_0001
Variant protein T23657_P10 is encoded by the following transcript(s): T23657_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T15 is shown in bold; this coding portion starts at position 212 and ends at position 2440. The transcript also has the following SNPs as listed in Table 46 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 46 - Nucleic acid SNPs
Figure imgf001531_0002
Figure imgf001532_0001
Variant protein T23657_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T23. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P11 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P11, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLF corresponding to amino acids 1 - 425 of S21C_HUMAN, which also corresponds to amino acids 1 - 425 of T23657_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ASCPKAT corresponding to amino acids 426 - 432 of T23657_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T23657_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ASCPKAT in T23657_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 47 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 47 - Amino acid mutations
Figure imgf001533_0001
The glycosylation sites of variant protein T23657_P11, as compared to the known protein Solute carrier family 21 member 12, are described in Table 48 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 48 - Glycosylation site(s)
Figure imgf001534_0001
Variant protein T23657_P11 is encoded by the following transcript(s): T23657_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T23 is shown in bold; this coding portion starts at position 212 and ends at position 1507. The transcript also has the following SNPs as listed in Table 49 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 49 - Nucleic acid SNPs
Figure imgf001534_0002
Figure imgf001535_0001
Variant protein T23657_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T24. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P12 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P12, comprising a first amino acid sequence being at least 90% homologous to MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFINTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSPKFLES QFSLSASEAATLFGYLV AGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWTFF TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG QQGSCLVYQNSAMSRYILIMGLLYK corresponding to amino acids 1 - 675 of S21C_HUMAN, which also corresponds to amino acids 1 - 675 of T23657_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEENEFRRL corresponding to amino acids 676 - 684 of T23657_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T23657_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEENEFRRL in T23657_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 50 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 50 - Amino acid mutations
Figure imgf001536_0001
The glycosylation sites of variant protein T23657_P12, as compared to the known protein Solute carrier family 21 member 12, are described in Table 51 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 51 - Glycosylation site(s)
Figure imgf001537_0001
Variant protein T23657_P12 is encoded by the following transcript(s): T23657_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T24 is shown in bold; this coding portion starts at position 212 and ends at position 2263. The transcript also has the following SNPs as listed in Table 52 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 52 - Nucleic acid SNPs
Figure imgf001537_0002
Figure imgf001538_0001
Variant protein T23657_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T30. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P16 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P16, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC corresponding to amino acids 1 - 30 of T23657_P16, and a second amino acid sequence being at least 90% homologous to SLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVY RDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFWIFFTFLSSIPALTATLRCVRDPQ RSFALGIQWIWRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILI MGLLYKVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV corresponding to amino acids 491 - 722 of S21C_HUMAN, which also corresponds to amino acids 31 - 262 of T23657_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of T23657_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGTSPMADPVPAGRQHGSGLDPTTRLSPLC of T23657_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 53 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 53 - Amino acid mutations
Figure imgf001539_0001
The glycosylation sites of variant protein T23657_P16, as compared to the known protein Solute carrier family 21 member 12, are described in Table 54 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 54 - Glycosylation site(s)
Figure imgf001539_0002
Variant protein T23657_P16 is encoded by the following transcript(s): T23657_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T30 is shown in bold; this coding portion starts at position 184 and ends at position 969. The transcript also has the following SNPs as listed in Table 55 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 55 - Nucleic acid SNPs
Figure imgf001540_0001
Variant protein T23657_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T31 and T23657_T32. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P17 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P17, comprising a first amino acid sequence being at least 90% homologous to MYFSLCHAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLV FIFVVTFFTFLSSIPALTATLRCVTDPQRSFALGIQWIVVMLGGIPGPIAFGWVIDKACLL WQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCL PSQSSAPDSATDSQLQSSV corresponding to amino acids 525 - 722 of S21C_HUMAN, which also corresponds to amino acids 1 - 198 of T23657_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein.
The glycosylation sites of variant protein T23657_P17, as compared to the known protein Solute carrier family 21 member 12, are described in Table 56 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 56 - Glycosylation site(s)
Figure imgf001541_0001
Variant protein T23657_P17 is encoded by the following transcript(s): T23657_T31 and T23657_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T31 is shown in bold; this coding portion starts at position 216 and ends at position 809. The transcript also has the following SNPs as listed in Table 57 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 57 - Nucleic acid SNPs
Figure imgf001542_0001
The coding portion of transcript T23657_T32 is shown in bold; this coding portion starts at position 174 and ends at position 767. The transcript also has the following SNPs as listed in Table 58 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 58 - Nucleic acid SNPs
Figure imgf001542_0002
Variant protein T23657_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T35. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein T23657_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 59 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 59 - Amino acid mutations
Figure imgf001543_0001
Variant protein T23657_P19 is encoded by the following transcript(s): T23657_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T35 is shown in bold; this coding portion starts at position 184 and ends at position 663. The transcript also has the following SNPs as listed in Table 60 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 60 - Nucleic acid SNPs
Figure imgf001544_0001
Variant protein T23657_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T37. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P21 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWTAR corresponding to amino acids 1 - 5 of T23657_P21, and a second amino acid sequence being at least 90% homologous to RCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSA MSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV corresponding to amino acids 604 - 722 of S21C_HUMAN, which also corresponds to amino acids 6 - 124 of T23657_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of T23657_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWTAR of T23657_P21.
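The coordinate bookkeeping in the comparison report above can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and the two sequence strings are copied from the report (the head MWTAR, and the segment stated to correspond to amino acids 604 - 722 of S21C_HUMAN). Positions are assumed to be 1-based and inclusive.

```python
# Illustrative check of the segment coordinates given in the comparison report.
# Assumption: positions are 1-based and inclusive, as the report's ranges suggest.

HEAD = "MWTAR"  # stated to correspond to amino acids 1 - 5 of T23657_P21

# Stated to correspond to amino acids 604 - 722 of S21C_HUMAN
# and to amino acids 6 - 124 of T23657_P21.
SHARED = (
    "RCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSA"
    "MSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV"
)


def span_length(first: int, last: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return last - first + 1


def assemble(head: str, shared: str) -> str:
    """Join the two segments contiguously and in sequential order."""
    return head + shared


variant = assemble(HEAD, SHARED)

# The segment has the same length whether counted on the known protein
# (604 - 722) or on the variant (6 - 124).
print(len(SHARED), span_length(604, 722), span_length(6, 124))
print(len(variant))
```

Both ranges describe a segment of the same length, and joining the head to it yields a polypeptide whose length equals the last variant position named in the report.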
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although one of the signal-peptide prediction programs predicts that this protein has a signal peptide (HMM: Signal peptide / NN: NO), both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
The glycosylation sites of variant protein T23657_P21, as compared to the known protein Solute carrier family 21 member 12, are described in Table 61 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 61 - Glycosylation site(s)
Figure imgf001545_0001
Figure imgf001546_0001
Variant protein T23657_P21 is encoded by the following transcript(s): T23657_T37, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T37 is shown in bold; this coding portion starts at position 223 and ends at position 594. The transcript also has the following SNPs as listed in Table 62 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 62 - Nucleic acid SNPs
Figure imgf001546_0002
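The start and end positions quoted for each coding portion can be sanity-checked with a minimal sketch such as the following. The helper names are hypothetical. The sketch assumes that positions are 1-based and inclusive and that the stated coding portion excludes the stop codon; the second assumption is an inference from the 124 amino acids described for T23657_P21, not something the text states.

```python
# Illustrative arithmetic on coding-portion coordinates.
# Assumptions: 1-based inclusive positions; stop codon not included in the span.

def coding_length(start: int, end: int) -> int:
    """Length in nucleotides of a coding portion from start to end, inclusive."""
    return end - start + 1


def codon_count(start: int, end: int) -> tuple[int, int]:
    """Return (whole codons, leftover nucleotides) for a coding portion.

    A non-zero remainder suggests the quoted positions should be re-checked
    against the sequence listing.
    """
    return divmod(coding_length(start, end), 3)


# Positions as quoted in the text for each transcript.
quoted = {
    "T23657_T31": (216, 809),
    "T23657_T32": (174, 767),
    "T23657_T35": (184, 663),
    "T23657_T37": (223, 594),
}

for name, (start, end) in quoted.items():
    codons, leftover = codon_count(start, end)
    print(name, coding_length(start, end), codons, leftover)
```

For transcript T23657_T37 the span is 372 nucleotides, that is, 124 whole codons, which agrees with the amino acid range 1 - 124 given for T23657_P21 in the comparison report.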
Variant protein T23657_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T38. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: unknown. Variant protein T23657_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 63 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 63 - Amino acid mutations
Figure imgf001547_0001
Variant protein T23657_P22 is encoded by the following transcript(s): T23657_T38, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T38 is shown in bold; this coding portion starts at position 55 and ends at position 88889. The transcript also has the following SNPs as listed in Table 64 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 64 - Nucleic acid SNPs
Figure imgf001547_0002
Variant protein T23657_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23657_T11. An alignment is given to the known protein (Solute carrier family 21 member 12) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T23657_P23 and S21C_HUMAN: 1. An isolated chimeric polypeptide encoding for T23657_P23, comprising a first amino acid sequence being at least 90% homologous to
MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHSPLDTSKQPLC QLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLNTPKGILFFLCAAAFLQG MTVNGFTNTVITSLERRYDLHSYQSGLIASSYDIAACLCLTFVSYFGGSGHKPRWLGWG VLLMGTGSLVFALPHFTAGRYEVELDAGVRTCPANPGAVCADSTSGLSRYQLVFMLG QFLHGVGATPLYTLGVTYLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEM GRRTELTTESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLITGMSTFSP FLES QFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLRGSAVIKFCLFCTVVSLLGILVF SLHCPSVPMAGVTASYGGSLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLC HAGCPAATETNVDGQKV corresponding to amino acids 1 - 547 of S21C_HUMAN, which also corresponds to amino acids 1 - 547 of T23657_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMHCREMHFNL SEKAPPSGFHIRCNFLYIPQQHSCTNGNSTVSWGRVCACPELSLQHPEAELCRS corresponding to amino acids 548 - 661 of T23657_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T23657_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
SGAAAYRPCPPLDPGKGPPCLPLVIGAIVGLPRCTETVAVSLRIFPLVLAMHCREMHFNL
SEKAPPSGFHIRCNFLYIPQQHSCTNGNSTVSWGRVCACPELSLQHPEAELCRS in T23657_P23.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. Variant protein T23657_P23 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 65 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 65 - Amino acid mutations
Figure imgf001549_0001
The glycosylation sites of variant protein T23657_P23, as compared to the known protein Solute carrier family 21 member 12, are described in Table 66 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 66 - Glycosylation site(s)
Figure imgf001549_0002
Variant protein T23657_P23 is encoded by the following transcript(s): T23657_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23657_T11 is shown in bold; this coding portion starts at position 212 and ends at position 2195. The transcript also has the following SNPs as listed in Table 67 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23657_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 67 - Nucleic acid SNPs
Figure imgf001550_0001
As noted above, cluster T23657 features 33 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T23657_nodeJ according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Figure imgf001551_0001
Figure imgf001552_0001
Segment cluster T23657_node J according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Figure imgf001552_0002
Figure imgf001553_0001
Segment cluster T23657_node_8 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Figure imgf001554_0001
Segment cluster T23657_nodeJ6 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657JT9, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Figure imgf001555_0001
Figure imgf001556_0001
Segment cluster T23657_node_18 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T24 and T23657_T28. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Figure imgf001556_0002
Figure imgf001557_0001
Segment cluster T23657_node_23 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T30 and T23657_T35. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Figure imgf001557_0002
Segment cluster T23657_node_24 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657JT3, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24, T23657_T28, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T37. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Figure imgf001558_0001
Figure imgf001559_0001
Segment cluster T23657_nodeJ7 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657JT0, T23657_T11, T23657_T14, T23657JT5, T23657_T16, T23657_T20, T23657_T21, T23657_T24, T23657_T30, T23657_T31, T23657_T32 and T23657_T35. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Figure imgf001559_0002
Figure imgf001560_0001
Segment cluster T23657_nodeJ9 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T5, T23657_T6, T23657JT0, T23657_T11 and T23657_T35. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Figure imgf001560_0002
Segment cluster T23657_node_34 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24, T23657_T28, T23657_T30, T23657_T31, T23657_T32, T23657_T35, T23657_T37 and T23657_T38. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Figure imgf001561_0001
Figure imgf001562_0001
Segment cluster T23657_nodeJ7 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T3, T23657_T4, T23657_T6, T23657_T9, T23657_T13 and T23657_T21. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Figure imgf001562_0002
Segment cluster T23657_node_38 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T3, T23657_T4, T23657_T6, T23657_T9 and T23657_T13. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Figure imgf001563_0001
Segment cluster T23657_nodeJ9 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T9, T23657_T10, T23657_T12, T23657_T13, T23657_T16, T23657_T20, T23657_T22 and T23657_T35. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Figure imgf001563_0002
Figure imgf001564_0001
Segment cluster T23657_nodeJ0 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T9, T23657_T10, T23657_T12, T23657_T13, T23657_T16 and T23657_T35. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Figure imgf001564_0002
Figure imgf001565_0001
Segment cluster T23657_node_45 according to the present invention is supported by 91 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657JT5, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T37. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Figure imgf001566_0001
Segment cluster T23657_node_46 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T1, T23657_T7 and T23657_T38. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Figure imgf001566_0002
Segment cluster T23657_node_49 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T24 and T23657_T28. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Figure imgf001567_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T23657_node_0 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657JT9, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24, T23657_T28 and T23657_T31. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Figure imgf001567_0002
Figure imgf001568_0001
Segment cluster T23657_node_4 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 86 below describes the starting and ending position of this segment on each transcript. Table 86 - Segment location on transcripts
Figure imgf001569_0001
Figure imgf001570_0001
Segment cluster T23657_node_6 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657 N0, T23657_T11, T23657_T12, T23657_T13, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 87 below describes the starting and ending position of this segment on each transcript. Table 87 - Segment location on transcripts
Figure imgf001570_0002
Figure imgf001571_0001
Segment cluster T23657_node_11 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24 and T23657_T28. Table 88 below describes the starting and ending position of this segment on each transcript. Table 88 - Segment location on transcripts
Figure imgf001571_0002
Figure imgf001572_0001
Segment cluster T23657_node_20 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T32 and T23657_T37. Table 89 below describes the starting and ending position of this segment on each transcript. Table 89 - Segment location on transcripts
Figure imgf001572_0002
Segment cluster T23657_node_22 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T30, T23657_T35 and T23657_T38. Table 90 below describes the starting and ending position of this segment on each transcript. Table 90 - Segment location on transcripts
Figure imgf001573_0001
Segment cluster T23657_nodeJ5 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657JT7, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24, T23657_T28, T23657_T30, T23657_T31, T23657_T32, T23657_T35, T23657_T37 and T23657_T38. Table 91 below describes the starting and ending position of this segment on each transcript. Table 91 - Segment location on transcripts
Figure imgf001573_0002
Figure imgf001574_0001
Segment cluster T23657_node_26 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T10, T23657_T11 and T23657_T35. Table 92 below describes the starting and ending position of this segment on each transcript. Table 92 - Segment location on transcripts
Figure imgf001575_0001
Segment cluster T23657_node J8 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T19, T23657_T20, T23657_T21, T23657_T24, T23657_T28, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T38. Table 93 below describes the starting and ending position of this segment on each transcript. Table 93 - Segment location on transcripts
Figure imgf001575_0002
Figure imgf001576_0001
Segment cluster T23657_nodeJ0 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T5, T23657_T6, T23657JT0, T23657_T11 and T23657_T35. Table 94 below describes the starting and ending position of this segment on each transcript. Table 94 - Segment location on transcripts
Figure imgf001576_0002
Figure imgf001577_0001
Segment cluster T23657_nodeJl according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T24, T23657_T28, T23657_T30, T23657_T31, T23657_T32, T23657_T35, T23657_T37 and T23657_T38. Table 95 below describes the starting and ending position of this segment on each transcript. Table 95 - Segment location on transcripts
Figure imgf001577_0002
Figure imgf001578_0001
Segment cluster T23657_nodeJ2 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T4, T23657_T6 and T23657_T15. Table 96 below describes the starting and ending position of this segment on each transcript. Table 96 - Segment location on transcripts
Figure imgf001578_0002
Figure imgf001579_0001
Segment cluster T23657_node_41 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657J7, T23657_T9, T23657_T10, T23657JT2, T23657_T13, T23657_T16 and T23657_T35. Table 97 below describes the starting and ending position of this segment on each transcript. Table 97 - Segment location on transcripts
Figure imgf001579_0002
Segment cluster T23657_node_42 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657JT3, T23657_T14, T23657_T15, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T37. Table 98 below describes the starting and ending position of this segment on each transcript. Table 98 - Segment location on transcripts
Figure imgf001580_0001
Figure imgf001581_0001
Segment cluster T23657_node_43 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657_T15, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T37. Table 99 below describes the starting and ending position of this segment on each transcript. Table 99 - Segment location on transcripts
Figure imgf001581_0002
Figure imgf001582_0001
Segment cluster T23657_node_44 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23657_T0, T23657_T1, T23657_T2, T23657_T3, T23657_T4, T23657_T5, T23657_T6, T23657_T7, T23657_T8, T23657_T9, T23657_T10, T23657_T11, T23657_T12, T23657_T13, T23657_T14, T23657JT5, T23657_T16, T23657_T17, T23657_T19, T23657_T20, T23657_T21, T23657_T22, T23657_T23, T23657_T30, T23657_T31, T23657_T32, T23657_T35 and T23657_T37. Table 100 below describes the starting and ending position of this segment on each transcript. Table 100 - Segment location on transcripts
Figure imgf001582_0002
Figure imgf001583_0001
Variant protein alignment to the previously known protein:
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P2 x S21C_HUMAN
Alignment segment 1/1:
Quality: 6620.00
Escore: 0
Matching length: 675
Total length: 675
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650
651 QQGSCLVYQNSAMSRYILIMGLLYK 675
    |||||||||||||||||||||||||
651 QQGSCLVYQNSAMSRYILIMGLLYK 675
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P3 x S21C_HUMAN
Alignment segment 1/1:
Quality: 6621.00 Escore: 0 Matching length: 677 Total length: 677 Matching Percent Similarity: 99.85 Matching Percent Identity: 99.70 Total Percent Similarity: 99.85 Total Percent Identity: 99.70 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550

551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600

601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650

651 QQGSCLVYQNSAMSRYILIMGLLYKTI 677
    ||||||||||||||||||||||||| :
651 QQGSCLVYQNSAMSRYILIMGLLYKVL 677
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P4 x S21C_HUMAN
Alignment segment 1/1:
Quality: 6521.00
Escore : 0 Matching length: 677 Total length: 698 Matching Percent Similarity: 99.85 Matching Percent Identity: 99.70 Total Percent Similarity: 96.85 Total Percent Identity: 96.70 Gaps : 1
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550

551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600

601 ATLRCVRDPQRSFALGIQWIVVRILGTVQCEEAMVSCTVCSLHKGMGGIP 650
    |||||||||||||||||||||||||                     ||||
601 ATLRCVRDPQRSFALGIQWIVVRIL                     GGIP 629

651 GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKTI 698
    |||||||||||||||||||||||||||||||||||||||||||||| :
630 GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVL 677
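The summary figures reported for the T23657_P4 alignment (matching length 677, total length 698, one gap) can be reproduced by counting alignment columns. The following is only a minimal sketch: the function names are hypothetical, the gap is written as `-`, and treating the I/L pair as "similar" is an assumption taken from the `:` mark on the final column rather than from a stated scoring matrix. Only the last 98 columns are spelled out; the first 600 columns are identical in both sequences.

```python
def alignment_stats(query, subject, similar_pairs=frozenset()):
    """Count columns of a gapped pairwise alignment ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    matching = identical = similar = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            continue  # gap columns count toward the total length only
        matching += 1
        if q == s:
            identical += 1
            similar += 1
        elif frozenset((q, s)) in similar_pairs:
            similar += 1
    return {"total": len(query), "matching": matching,
            "identical": identical, "similar": similar}


def percent(part, whole):
    """Percentage rounded to two decimals, as in the reported statistics."""
    return round(100.0 * part / whole, 2)


# Columns 601-698 of the T23657_P4 x S21C_HUMAN alignment.
query_tail = ("ATLRCVRDPQRSFALGIQWIVVRILGTVQCEEAMVSCTVCSLHKGMGGIP"
              + "GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKTI")
subject_tail = ("ATLRCVRDPQRSFALGIQWIVVRIL" + "-" * 21 + "GGIP"
                + "GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVL")

tail = alignment_stats(query_tail, subject_tail,
                       similar_pairs={frozenset("IL")})  # assumed similar pair

# Add the 600 leading columns, which are identical in both sequences.
identical = tail["identical"] + 600
similar = tail["similar"] + 600
matching = tail["matching"] + 600
total = tail["total"] + 600

print(percent(identical, matching), percent(similar, matching))  # matching %
print(percent(identical, total), percent(similar, total))        # total %
```

Under these assumptions the "matching" percentages use the ungapped columns as the denominator, while the "total" percentages use the full alignment length, which is why a single 21-residue insertion lowers only the latter.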
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P5 x S21C_HUMAN
Alignment segment 1/1:
Quality: 5909.00 Escore: 0 Matching length: 604 Total length: 604 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550

551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600

601 ATLR 604
    ||||
601 ATLR 604
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P6 x S21C_HUMAN
Alignment segment 1/1:
Quality: 5354.00 Escore: 0 Matching length: 547 Total length: 547 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKV 547
    |||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKV 547
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P7 x S21C_HUMAN
Alignment segment 1/1:
Quality: 5346.00 Escore: 0 Matching length: 546 Total length: 546 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQK 546
    ||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQK 546
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P8 x S21C_HUMAN
Alignment segment 1/1:
Quality: 5346.00 Escore: 0 Matching length: 546 Total length: 546 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQK 546
    ||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQK 546
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P10 x S21C_HUMAN
Alignment segment 1/1:
Quality: 6968.00 Escore: 0 Matching length: 722 Total length: 743 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 97.17 Total Percent Identity: 97.17 Gaps : 1
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550

551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600

601 ATLRCVRDPQRSFALGIQWIVVRILGTVQCEEAMVSCTVCSLHKGMGGIP 650
    |||||||||||||||||||||||||                     ||||
601 ATLRCVRDPQRSFALGIQWIVVRIL                     GGIP 629

651 GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGV 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
630 GPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGV 679

701 LFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV 743
    |||||||||||||||||||||||||||||||||||||||||||
680 LFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV 722
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P11 x S21C_HUMAN
Alignment segment 1/1:
Quality: 4156.00 Escore: 0 Matching length: 425 Total length: 425 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLF 425
    |||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLF 425
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P12 x S21C_HUMAN
Alignment segment 1/1:
Quality: 6620.00 Escore: 0 Matching length: 675 Total length: 675 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKVYRD 550

551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 CSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFFTFLSSIPALT 600

601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 ATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCG 650

651 QQGSCLVYQNSAMSRYILIMGLLYK 675
    |||||||||||||||||||||||||
651 QQGSCLVYQNSAMSRYILIMGLLYK 675
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P16 x S21C_HUMAN
Alignment segment 1/1:
Quality: 2296.00 Escore: 0 Matching length: 232 Total length: 232 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
 31 SLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATET 80
    ||||||||||||||||||||||||||||||||||||||||||||||||||
491 SLLPEGHLNLTAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATET 540

 81 NVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFF 130
    ||||||||||||||||||||||||||||||||||||||||||||||||||
541 NVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTCQRKPLLLVFIFVVIFF 590

131 TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDK 180
    ||||||||||||||||||||||||||||||||||||||||||||||||||
591 TFLSSIPALTATLRCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDK 640

181 ACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYK 230
    ||||||||||||||||||||||||||||||||||||||||||||||||||
641 ACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYK 690

231 PLSESSDGLETCLPSQSSAPDSATDSQLQSSV 262
    ||||||||||||||||||||||||||||||||
691 PLSESSDGLETCLPSQSSAPDSATDSQLQSSV 722
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P17 x S21C_HUMAN
Alignment segment 1/1:
Quality: 1947.00 Escore: 0 Matching length: 198 Total length: 198 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MYFSLCHAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTC 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
525 MYFSLCHAGCPAATETNVDGQKVYRDCSCIPQNLSSGFGHATAGKCTSTC 574

 51 QRKPLLLVFIFVVIFFTFLSSIPALTATLRCVRDPQRSFALGIQWIVVRI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
575 QRKPLLLVFIFVVIFFTFLSSIPALTATLRCVRDPQRSFALGIQWIVVRI 624

101 LGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLY 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
625 LGGIPGPIAFGWVIDKACLLWQDQCGQQGSCLVYQNSAMSRYILIMGLLY 674

151 KVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV 198
    ||||||||||||||||||||||||||||||||||||||||||||||||
675 KVLGVLFFAIACFLYKPLSESSDGLETCLPSQSSAPDSATDSQLQSSV 722
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P21 x S21C_HUMAN
Alignment segment 1/1:
Quality: 1169.00 Escore: 0 Matching length: 119 Total length: 119 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  6 RCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQG 55
    ||||||||||||||||||||||||||||||||||||||||||||||||||
604 RCVRDPQRSFALGIQWIVVRILGGIPGPIAFGWVIDKACLLWQDQCGQQG 653

 56 SCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCL 105
    ||||||||||||||||||||||||||||||||||||||||||||||||||
654 SCLVYQNSAMSRYILIMGLLYKVLGVLFFAIACFLYKPLSESSDGLETCL 703

106 PSQSSAPDSATDSQLQSSV 124
    |||||||||||||||||||
704 PSQSSAPDSATDSQLQSSV 722
Sequence name: S21C_HUMAN
Sequence documentation:
Alignment of: T23657_P23 x S21C_HUMAN
Alignment segment 1/1: Quality: 5354.00
Escore: 0 Matching length: 547 Total length: 547 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLHQLGDKPLTFPSPNSAMENGLDHTPPSRRASPGTPLSPGSLRSAAHS 50

 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PLDTSKQPLCQLWAEKHGARGTHEVRYVSAGQSVACGWWAFAPPCLQVLN 100

101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TPKGILFFLCAAAFLQGMTVNGFINTVITSLERRYDLHSYQSGLIASSYD 150

151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 IAACLCLTFVSYFGGSGHKPRWLGWGVLLMGTGSLVFALPHFTAGRYEVE 200

201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LDAGVRTCPANPGAVCADSTSGLSRYQLVFMLGQFLHGVGATPLYTLGVT 250

251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 YLDENVKSSCSPVYIAIFYTAAILGPAAGYLIGGALLNIYTEMGRRTELT 300

301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 TESPLWVGAWWVGFLGSGAAAFFTAVPILGYPRQLPGSQRYAVMRAAEMH 350

351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 QLKDSSRGEASNPDFGKTIRDLPLSIWLLLKNPTFILLCLAGATEATLIT 400

401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 GMSTFSPKFLESQFSLSASEAATLFGYLVVPAGGGGTFLGGFFVNKLRLR 450

451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 GSAVIKFCLFCTVVSLLGILVFSLHCPSVPMAGVTASYGGSLLPEGHLNL 500

501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKV 547
    |||||||||||||||||||||||||||||||||||||||||||||||
501 TAPCNAACSCQPEHYSPVCGSDGLMYFSLCHAGCPAATETNVDGQKV 547
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts, which are detectable by amplicon as depicted in sequence name T23657 seg17-18, in normal and cancerous colon tissues
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by or according to seg17-18, T23657 amplicon (SEQ ID NO: 1357) and T23657 seg17-18F (SEQ ID NO: 1355) and T23657 seg17-18R (SEQ ID NO: 1356) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 59 is a histogram showing over-expression of the above-indicated solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 4 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
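The normalization described above (division by the geometric mean of the housekeeping-gene quantities, then by the median of the normal samples) can be sketched as follows. This is an illustration only: the function name, the sample labels and every quantity are hypothetical and are not data from the experiment.

```python
from statistics import geometric_mean, median


def fold_upregulation(target, housekeeping, normal_ids):
    """Return fold up-regulation per sample relative to the median normal.

    target:        {sample_id: quantity of the amplicon of interest}
    housekeeping:  {sample_id: [quantities of the housekeeping amplicons]}
    normal_ids:    sample ids that make up the normal reference group
    """
    # Step 1: normalize each sample to the geometric mean of its
    # housekeeping-gene quantities.
    normalized = {sid: qty / geometric_mean(housekeeping[sid])
                  for sid, qty in target.items()}
    # Step 2: divide by the median normalized quantity of the normal samples.
    baseline = median(normalized[sid] for sid in normal_ids)
    return {sid: value / baseline for sid, value in normalized.items()}


# Hypothetical quantities for three normal samples and one tumor sample,
# each with four housekeeping-gene measurements.
target = {"N1": 2.0, "N2": 4.0, "N3": 12.0, "T1": 32.0}
housekeeping = {
    "N1": [1.0, 1.0, 4.0, 4.0],
    "N2": [4.0, 4.0, 4.0, 4.0],
    "N3": [2.0, 2.0, 8.0, 8.0],
    "T1": [2.0, 8.0, 2.0, 8.0],
}
folds = fold_upregulation(target, housekeeping, normal_ids=["N1", "N2", "N3"])
over_expressed = [sid for sid, fold in folds.items() if fold >= 4.0]
print(folds, over_expressed)
```

Using the median of the normal group as the baseline keeps a single outlying normal sample (here the hypothetical N3) from shifting the reference level.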
As is evident from Figure 59, the expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 4 fold was found in 28 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 7.22E-04. A threshold of 4 fold over-expression was found to differentiate between cancer and normal samples with a P value of 7.43E-06 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
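The threshold test can be illustrated with a small one-sided Fisher's exact test written against the hypergeometric distribution. This is a sketch under stated assumptions, not the procedure actually used: the text gives 28 of 37 adenocarcinoma samples above the 4 fold cutoff and lists 11 normal samples (Nos. 41, 52, 62-67, 69-71), but it does not state how many normal samples exceed the cutoff. The example assumes that none do; under that assumption the result is about 7.43E-06, which agrees with the reported value. The function name is introduced here for illustration.

```python
from math import comb


def fisher_exact_one_sided(cancer_over, cancer_total, normal_over, normal_total):
    """One-sided Fisher's exact test for a 2x2 table.

    Returns the probability, under the hypergeometric null, that at
    least `cancer_over` of the over-threshold samples are cancer samples.
    """
    total = cancer_total + normal_total
    over = cancer_over + normal_over
    denominator = comb(total, over)
    p_value = 0.0
    for k in range(cancer_over, min(cancer_total, over) + 1):
        # comb() returns 0 when more normal samples are requested than exist.
        p_value += comb(cancer_total, k) * comb(normal_total, over - k) / denominator
    return p_value


# 28 of 37 cancer samples above the cutoff (from the text);
# 0 of 11 normal samples above the cutoff (assumption, not stated in the text).
p = fisher_exact_one_sided(28, 37, 0, 11)
print(f"{p:.2E}")
```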
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T23657seg17-18F forward primer; and T23657seg17-18R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T23657seg17-18.
Forward primer (SEQ ID NO: 1355): CTGCTGGGCATCCTCGTCT
Reverse primer (SEQ ID NO: 1356): CGTACCCAGGTGCCATCTG
Amplicon (SEQ ID NO: 1357): CTGGGCATCCTCGTCTTCTCACTGCACTGCCCCAGTGTGCCCATGGCGGGCGTCACAGCCAGCTACGGCGGGAGGTGAGGGCCAGATGGCACCTGGGTACG
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts which are detectable by amplicon as depicted in sequence name T23657 seg22 in normal and cancerous colon tissues
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by or according to seg22, T23657 amplicon (SEQ ID NO: 1360) and T23657 seg22F (SEQ ID NO: 1358) and T23657 seg22R (SEQ ID NO: 1359) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 60 is a histogram showing over-expression of the above-indicated solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 4 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 60, the expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 4 fold was found in 20 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 3.62E-03. A threshold of 4 fold over-expression was found to differentiate between cancer and normal samples with a P value of 9.50E-04 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T23657seg22F forward primer; and T23657seg22R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T23657seg22.
Forward primer (SEQ ID NO: 1358): TGGCAAGTTTGTAGACCCGAA
Reverse primer (SEQ ID NO: 1359): GGTAGGGTCCAGGCCAGAG
Amplicon (SEQ ID NO: 1360): TGGCAAGTTTGTAGACCCGAAATGCAGGCTGCATGGGGACGAGCCCCATGGCTGACCCTGTGCCTGCTGGGCGCCAGCATGGCTCTGGCCTGGACCCTACC
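As a consistency check on a listed primer pair and amplicon, one can confirm that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sketch below does this for the seg22 sequences given above; the function names are assumptions introduced for illustration and are not part of the application.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primers_flank_amplicon(forward, reverse, amplicon):
    """True when the amplicon starts with the forward primer and
    ends with the reverse complement of the reverse primer."""
    amplicon = amplicon.replace(" ", "").upper()
    return (amplicon.startswith(forward.upper())
            and amplicon.endswith(reverse_complement(reverse)))


# Sequences as listed for T23657 seg22 (SEQ ID NOs: 1358, 1359, 1360).
FORWARD = "TGGCAAGTTTGTAGACCCGAA"
REVERSE = "GGTAGGGTCCAGGCCAGAG"
AMPLICON = ("TGGCAAGTTTGTAGACCCGAAATGCAGGCTGCATGGGGACGAGCCCCATGGCTGAC"
            "CCTGTGCCTGCTGGGCGCCAGCATGGCTCTGGCCTGGACCCTACC")

flanked = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
print(flanked)
```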
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts which are detectable by amplicon as depicted in sequence name T23657 seg29-32 in normal and cancerous colon tissues
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by or according to seg29-32, T23657 amplicon (SEQ ID NO: 1363) and T23657 seg29-32F (SEQ ID NO: 1361) and T23657 seg29-32R (SEQ ID NO: 1362) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 61 is a histogram showing over-expression of the above-indicated solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 61, the expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 3 fold was found in 23 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.39E-07. A threshold of 3 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.97E-04 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T23657seg29-32F forward primer; and T23657seg29-32R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T23657seg29-32.
Forward primer (SEQ ID NO: 1361): CCTTTGCCCTGGGAATCC
Reverse primer (SEQ ID NO: 1362): GCCCTGCTGGCCACAC
Amplicon (SEQ ID NO: 1363): CCTTTGCCCTGGGAATCCAGTGGATTGTAGTTAGAATACTAGGGGGCATCCCGGGGCCCATCGCCTTCGGCTGGGTGATCGACAAGGCCTGTCTGCTGTGGCAGGACCAGTGTGGCCAGCAGGGC
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) T23657 transcripts which are detectable by amplicon as depicted in sequence name T23657 seg41 in normal and cancerous colon tissues
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by or according to seg41, T23657 amplicon (SEQ ID NO: 1366) and T23657 seg41F (SEQ ID NO: 1364) and T23657 seg41R (SEQ ID NO: 1365) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 62 is a histogram showing over-expression of the above-indicated solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 4 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 62, the expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an over-expression of at least 4 fold was found in 6 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of solute carrier organic anion transporter family, member 4A1 (SLCO4A1) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 3.02E-03. A threshold of 4 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.89E-01 as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T23657seg41F forward primer; and T23657seg41R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T23657seg41.
Forward primer (SEQ ID NO: 1364): CCGTGATGGATGTGGAGTCTC
Reverse primer (SEQ ID NO: 1365): GCATCGGAAGCAAATGCATT
Amplicon (SEQ ID NO: 1366): CCGTGATGGATGTGGAGTCTCGGCTTTCTGACAACGTCTTCCAGAGCAGGCTTTCTCTAGAGGGTGGACTGCCTGTGTTCTCCTGGGAGAGAATGCATTTGCTTCCGATGC
DESCRIPTION FOR CLUSTER T51958
Cluster T51958 features 12 transcript(s) and 48 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001622_0001
Figure imgf001623_0001
Figure imgf001624_0001
Table 3 - Proteins of interest
Figure imgf001625_0001
These sequences are variants of the known protein Tyrosine-protein kinase-like 7 precursor (SwissProt accession identifier PTK7_HUMAN; known also according to the synonyms Colon carcinoma kinase-4; CCK-4), SEQ ID NO: 1141, referred to herein as the previously known protein. Protein Tyrosine-protein kinase-like 7 precursor is known or believed to have the following function(s): MAY FUNCTION AS A CELL ADHESION MOLECULE. LACKS PROBABLY THE CATALYTIC ACTIVITY OF TYROSINE KINASE. MAY BE CONNECTED TO THE PATHOPHYSIOLOGY OF COLON CARCINOMAS AND/OR MAY REPRESENT A TUMOR PROGRESSION MARKER. The sequence for protein Tyrosine-protein kinase-like 7 precursor is given at the end of the application, as "Tyrosine-protein kinase-like 7 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Figure imgf001625_0002
Figure imgf001626_0001
Protein Tyrosine-protein kinase-like 7 precursor localization is believed to be Type I membrane protein.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein amino acid phosphorylation; cell adhesion; signal transduction, which are annotation(s) related to Biological Process; protein tyrosine kinase; transmembrane receptor protein tyrosine kinase; receptor; protein binding; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T51958 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 63 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Figure imgf001627_0001
Table 6 - P values and ratios for expression in cancerous tissue
Figure imgf001627_0002
Figure imgf001628_0001
As noted above, cluster T51958 featares 12 franscript(s), which were listed in Table 1 above. These franscript(s) encode for protein(s) which are variant(s) of protein Tyrosine-protein kinase-like 7 precursor. A description of each variant protein according to the present invention is now provided. Variant protein T51958_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end ofthe application; it is encoded by transcript(s) T51958_PEA_1_T4. An alignment is given to the Icnown protein (Tyrosine-protein kinase-like 7 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end ofthe application. A brief description of the relationship ofthe variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T51958_PEA_1_P5 and PTK7JXUMANJ 4 (SEQ ID NO: 1143): l.An isolated chimeric polypeptide encoding for T51958_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVATVP SWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQMLISEDSRFEVFKNGTLRTNS VEVYDGTWYRCMSSTPAGSIEAQARVQVLEKLKFTPPPQPQQCMEFDKEATVPCSATG REKPTTKWERADGSSLPEWVTDNAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQL TVAVFITFKVEPERTTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRMHIFQ NGSLVIHDVAPEDSGRYTCIAGNSCNIKHTEAPLYW conesponding to amino acids 1 - 682 of PTK7_HUMAN_V4, which also conesponds to amino acids 1 - 682 of T51958JPEAJ JP5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more 
preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GMGWGGLCCTGSGGPRRLSPCTQPLCTEHGTEAIFNAAVGIRPSHHAAAQS corresponding to amino acids 683 - 733 of T51958_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GMGWGGLCCTGSGGPRRLSPCTQPLCTEHGTEAIFVAAVGIRPSHHAAAQS in T51958_PEA_1_P5.
It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V4. These changes were previously known to occur and are listed in the table below.
Table 7 - Changes to PTK7_HUMAN_V4
Figure imgf001630_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T51958_PEA_1_P5 is encoded by the following transcript(s): T51958_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T4 is shown in bold; this coding portion starts at position 209 and ends at position 2407. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T51958_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001631_0001
Variant protein T51958_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end ofthe application; it is encoded by transcript(s) T51958JPEAJ JT5. An alignment is given to the known protein (Tyrosine-protein kinase-like 7 precursor) at the end ofthe application. One or more alignments to one or more previously published protein sequences are given at the end ofthe application. A brief description of the relationship ofthe variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T51958 JΕAJ JP6 and PTK7_HUMAN_V4: l.An isolated chimeric polypeptide encoding for T51958_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFΓKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARWLAPQDW VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVATVP SWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQMLISEDSRFEVFKNGTLRTNS VEVYDGTWYRCMSSTPAGSIEAQARVQVLEKLKFTPPPQPQQCMEFDKEATVPCSATG REKPTIKWERADGSSLPEWVTDNAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQL TVAVFITFKVEPERTTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRM conesponding to amino acids 1 - 641 of PTK7 JHUMAN JV4, which also conesponds to amino acids 1 - 641 of T51958_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence APW conesponding to amino acids 642 - 644 of T51958_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V4. These changes were previously known to occur and are listed in the table below.
Table 9 - Changes to PTK7_HUMAN_V4
Figure imgf001632_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T51958_PEA_1_P6 is encoded by the following transcript(s): T51958_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T5 is shown in bold; this coding portion starts at position 209 and ends at position 2140. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T51958_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001633_0001
Variant protein T51958_PEA_1_P28 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T51958_PEA_1_T37. An alignment is given to the known protein (Tyrosine-protein kinase-like 7 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T51958_PEA_1_P28 and PTK7_HUMAN_V11 (SEQ ID NO: 1144): 1. An isolated chimeric polypeptide encoding for T51958_PEA_1_P28, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA corresponding to amino acids 1 - 409 of PTK7_HUMAN_V11, which also corresponds to amino acids 1 - 409 of T51958_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28. It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V11. These changes were previously known to occur and are listed in the table below.
Table 11 - Changes to PTK7_HUMAN_V11
Figure imgf001635_0001
Comparison report between T51958j?EA_1 _P28 and Q8NFA5 (SEQ ID NO:1147): 1.An isolated chimeric polypeptide encoding for T51958 JPEAJ JP28, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQ WLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA5, which also conesponds to amino acids 1 - 409 of T51958JPEAJJP28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV conesponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of T51958JPEAJ JP28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958 PEA 1 P28. Comparison report between T51958_PEAJ J?28 and Q8NFA6 (SEQ ID NO.1149): 1.An isolated chimeric polypeptide encoding for T51958JPEAJ J>28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDVV VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA6, which also conesponds to amino acids 1 - 409 of T51958 JPEAJ JP28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWES VHYWESV conesponding to amino acids 410 - 459 of T51958JPEAJJP28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of T51958JPEAJ JP28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEAJ_P28.
Comparison report between T51958_PEA_1_P28 and Q8NFA7 (SEQ ID NO: 1148): 1. An isolated chimeric polypeptide encoding for T51958_PEA_1_P28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFTKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVNLAPQDVN VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA7, which also conesponds to amino acids 1 - 409 of T51958_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
Comparison report between T51958_PEA_1_P28 and Q8NFA8 (SEQ ID NO:1146): l.An isolated chimeric polypeptide encoding for T51958JΕAJ JP28, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPAP >RI?J PLLSVLLLPLLGGTQTAιVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPWLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDW VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA conesponding to amino acids 1 - 409 of Q8NFA8, which also conesponds to amino acids 1 - 409 of T51958 JPEA JJP28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV conesponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2.An isolated polypeptide encoding for a tail of T51958JPEAJ JP28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_l_P28. Comparison report between T51958JPEAJ JP28 and AAN04862 (SEQ ID NO: 1150): l.An isolated chimeric polypeptide encoding for T51958JΕAJJP28, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIADESFARVVLAPQDW VARYEEAMFHCQFSAQPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPR NAGIYRCIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLPEPSVW WEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQRRQDVNITVA corresponding to amino acids 1 - 409 of AAN04862, which also corresponds to amino acids 1 - 409 of T51958_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV corresponding to amino acids 410 - 459 of T51958_PEA_1_P28, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P28, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SEHLCPEGQGEVEGNTGLGVMDRGFPGTHLRSSQFWALQAWESVHYWESV in T51958_PEA_1_P28.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T51958_PEA_1_P28 is encoded by the following transcript(s): T51958_PEA_1_T37, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T37 is shown in bold; this coding portion starts at position 209 and ends at position 1585.
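The coding-portion coordinates can be cross-checked against the stated protein length with a short calculation. The sketch below assumes that positions are 1-based and inclusive and that the quoted span does not include a stop codon; neither assumption is stated explicitly in the text, and the function name is illustrative only.

```python
def coding_length_in_codons(start: int, end: int) -> int:
    """Return the number of whole codons in a coding span.

    Assumes 1-based, inclusive nucleotide positions (an assumption,
    not something the application states).
    """
    length_nt = end - start + 1
    if length_nt <= 0:
        raise ValueError("end must not precede start")
    if length_nt % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return length_nt // 3


# Coding portion of transcript T51958_PEA_1_T37: positions 209 to 1585.
print(coding_length_in_codons(209, 1585))  # 459
```

Under these assumptions the span holds 459 codons, which agrees with the tail of T51958_PEA_1_P28 ending at amino acid 459.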
Variant protein T51958_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T51958_PEA_1_T40. An alignment is given to the known protein (Tyrosine-protein kinase-like 7 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T51958_PEA_1_P30 and PTK7_HUMAN_V13 (SEQ ID NO: 1145):
1. An isolated chimeric polypeptide encoding for T51958_PEA_1_P30, comprising a first amino acid sequence being at least 90 % homologous to
MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIK corresponding to amino acids 1 - 122 of PTK7_HUMAN_V13, which also corresponds to amino acids 1 - 122 of T51958_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CESQGGCAQSPCQTLND corresponding to amino acids 123 - 139 of T51958_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CESQGGCAQSPCQTLND in T51958_PEA_1_P30.
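The tiered thresholds on the tail sequence can be illustrated with a simplified calculation. The sketch below uses ungapped, position-by-position percent identity between equal-length sequences as a stand-in for "homologous"; this is an assumption made only for illustration, since the application defines homology elsewhere and may rely on alignment software. The function names and the substituted test sequence are hypothetical.

```python
# Tail of T51958_PEA_1_P30, amino acids 123 - 139, as quoted in the text.
TAIL_P30 = "CESQGGCAQSPCQTLND"


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    A simplified stand-in for homology (an assumption); it does not
    handle insertions, deletions, or conservative substitutions.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str = TAIL_P30,
                    threshold: float = 70.0) -> bool:
    """True if the candidate reaches the given percent-identity threshold."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical candidate: the tail with its first residue substituted.
candidate = "A" + TAIL_P30[1:]
print(meets_threshold(candidate, threshold=90.0))
print(meets_threshold(candidate, threshold=95.0))
```

With one substitution in 17 residues the candidate matches at 16 positions, so it clears the 90% tier but not the 95% tier under this simplified measure.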
It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V13. These changes were previously known to occur and are listed in the table below. Table 12 - Changes to PTK7_HUMAN_V13
Figure imgf001640_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
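The localization reasoning stated above amounts to a simple decision rule, sketched below. The text gives only the condition for a "secreted" call; the "undetermined" fallback and the function name are assumptions added for illustration, and the boolean inputs stand for the outputs of prediction programs that are not modeled here.

```python
from typing import Iterable


def predicted_localization(signal_peptide_calls: Iterable[bool],
                           transmembrane_calls: Iterable[bool]) -> str:
    """Apply the rule described in the text.

    "secreted" when every signal-peptide predictor reports a signal
    peptide and no trans-membrane predictor reports a trans-membrane
    region. Any other combination returns "undetermined", which is a
    placeholder (an assumption), not a category from the application.
    """
    sp = list(signal_peptide_calls)
    tm = list(transmembrane_calls)
    if sp and all(sp) and not any(tm):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree; neither trans-membrane program fires.
print(predicted_localization([True, True], [False, False]))  # secreted
```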
Variant protein T51958_PEA_1_P30 is encoded by the following transcript(s): T51958_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T40 is shown in bold; this coding portion starts at position 209 and ends at position 625.
Variant protein T51958_PEA_1_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T51958_PEA_1_T8. An alignment is given to the known protein (Tyrosine-protein kinase-like 7 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T51958_PEA_1_P34 and PTK7_HUMAN_V3 (SEQ ID NO: 1142):
1. An isolated chimeric polypeptide encoding for T51958_PEA_1_P34, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPR corresponding to amino acids 1 - 157 of PTK7_HUMAN_V3, which also corresponds to amino acids 1 - 157 of T51958_PEA_1_P34.
It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V3. These changes were previously known to occur and are listed in the table below. Table 13 - Changes to PTK7_HUMAN_V3
Figure imgf001641_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T51958_PEA_1_P34 is encoded by the following transcript(s): T51958_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T8 is shown in bold; this coding portion starts at position 209 and ends at position 679. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T51958_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Figure imgf001642_0001
Variant protein T51958_PEA_1_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T51958_PEA_1_T6. An alignment is given to the known protein (Tyrosine-protein kinase-like 7 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T51958_PEA_1_P35 and PTK7_HUMAN_V11:
1. An isolated chimeric polypeptide encoding for T51958_PEA_1_P35, comprising a first amino acid sequence being at least 90 % homologous to MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRALLRCEVEAPGP VHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQCVARDDVTGEEARSANA SFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRCHIDGHPRPTYQWFRDGTPLSDGQSNH TVSSKERNLTLRPAGPEHSGLYSCCAHSAFGQACSSQNFTLSIA corresponding to amino acids 1 - 220 of PTK7_HUMAN_V11, which also corresponds to amino acids 1 - 220 of T51958_PEA_1_P35, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEPGVGAEGMR corresponding to amino acids 221 - 231 of T51958_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T51958_PEA_1_P35, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEPGVGAEGMR in T51958_PEA_1_P35.
It should be noted that the known protein sequence (PTK7_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PTK7_HUMAN_V11. These changes were previously known to occur and are listed in the table below. Table 15 - Changes to PTK7_HUMAN_V11
Figure imgf001643_0001
Figure imgf001644_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T51958_PEA_1_P35 is encoded by the following transcript(s): T51958_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T51958_PEA_1_T6 is shown in bold; this coding portion starts at position 209 and ends at position 901. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T51958_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Figure imgf001644_0002
As noted above, cluster T51958 features 48 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
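The segment descriptions in this section each list the transcripts that contain a given segment. One way to work with such lists is to invert them into a transcript-to-segment index, sketched below. The two entries are copied from the descriptions of segments node_8 and node_46 in this section; the dictionary layout, variable names, and helper function are assumptions for illustration and are not part of the application.

```python
from collections import defaultdict

PREFIX = "T51958_PEA_1_"

# Transcript lists for two segments, as given in their descriptions.
SEGMENT_TO_TRANSCRIPTS = {
    PREFIX + "node_8": ["T4", "T5", "T6", "T8", "T12", "T16",
                        "T33", "T35", "T37", "T39", "T40", "T41"],
    PREFIX + "node_46": ["T12", "T16", "T33"],
}


def invert(segment_to_transcripts):
    """Build a transcript -> list-of-segments index from the segment lists."""
    transcript_to_segments = defaultdict(list)
    for segment, transcripts in segment_to_transcripts.items():
        for transcript in transcripts:
            transcript_to_segments[PREFIX + transcript].append(segment)
    return dict(transcript_to_segments)


index = invert(SEGMENT_TO_TRANSCRIPTS)
print(index[PREFIX + "T33"])  # both segments
print(index[PREFIX + "T40"])  # node_8 only
```

The same structure extends to all segments of the cluster once their transcript lists are entered.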
Segment cluster T51958JPEAJ jiodej) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39, T51958_PEA_1_T40 and T51958_PEA_1_T41. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Figure imgf001645_0001
Segment cluster T51958JPEAJ jiodej according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39, T51958_PEA_1_T40 and T51958_PEA_1_T41. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Figure imgf001646_0001
Segment cluster T51958_PEA_1_node_8 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39, T51958_PEA_1_T40 and T51958_PEA_1_T41. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Figure imgf001647_0001
Segment cluster T51958JPEAJ jiodej according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T40. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Figure imgf001647_0002
Segment cluster T51958JPEAJ jιodeJ4 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Figure imgf001648_0001
Segment cluster T51958_PEA_1_node_16 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Figure imgf001649_0001
Segment cluster T51958_PEA_1_node_18 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Figure imgf001649_0002
Figure imgf001650_0001
Segment cluster T51958JPEAJ jiode l according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958 JΕAJ JTO, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Figure imgf001650_0002
Segment cluster T51958JPEAJ ιodeJ2 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T37 and T51958_PEA_1_T39. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Figure imgf001651_0001
Segment cluster T51958JPEAJ jnodeJ4 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Figure imgf001651_0002
Figure imgf001652_0001
Segment cluster T51958_PEA_1_node_27 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Figure imgf001652_0002
Segment cluster T51958_PEA_1_node_29 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T35. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Figure imgf001653_0001
Segment cluster T51958JPEAJ jιodeJ3 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Figure imgf001653_0002
Figure imgf001654_0001
Segment cluster T51958 JPEA Jjnode JO according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T35. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Figure imgf001654_0002
Segment cluster T51958JΕAJ iodeJT according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T33. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Figure imgf001654_0003
Figure imgf001655_0001
Segment cluster T51958_PEA_1_node_46 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T33. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Figure imgf001655_0002
Segment cluster T51958 JPEA Jjiode l according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T33. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Figure imgf001655_0003
Segment cluster T51958 ΕAJ ιodeJ5 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12 and T51958_PEA_1_T41. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Figure imgf001656_0001
Segment cluster T51958 ΕAJ jιodeJ>7 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Figure imgf001656_0002
Figure imgf001657_0001
Segment cluster T51958_PEA_1_node_70 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Figure imgf001657_0002
Segment cluster T51958JPEAJ jnode J4 according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Figure imgf001658_0001
Segment cluster T51958 JΕA Jjiode 8 according to the present invention is supported by 115 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Figure imgf001658_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
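The length criterion for short segments can be expressed as a small check on a segment's start and end positions. The sketch below treats "about 120 bp" as a hard cutoff and assumes 1-based, inclusive coordinates; both are simplifying assumptions, and the coordinates in the example are hypothetical rather than taken from the location tables.

```python
SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp"; treated here as a hard cutoff


def segment_length(start: int, end: int) -> int:
    """Length in bp of a segment given 1-based, inclusive positions (assumed)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def is_short_segment(start: int, end: int,
                     max_bp: int = SHORT_SEGMENT_MAX_BP) -> bool:
    """True if the segment would fall in the short-segment group."""
    return segment_length(start, end) <= max_bp


# Hypothetical coordinates, for illustration only.
print(is_short_segment(501, 560))   # 60 bp  -> True
print(is_short_segment(501, 900))   # 400 bp -> False
```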
Segment cluster T51958JΕAJ jiodej 1 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Figure imgf001659_0001
Segment cluster T51958_PEA_1_node_15 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T6 and T51958_PEA_1_T41. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Figure imgf001660_0001
Segment cluster T51958_PEA_1_node_20 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35, T51958_PEA_1_T37, T51958_PEA_1_T39 and T51958_PEA_1_T41. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Figure imgf001660_0002
Figure imgf001661_0001
Segment cluster T51958_PEA_1_node_26 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Figure imgf001661_0002
Segment cluster T51958JPEAJ _nodeJ5 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Figure imgf001662_0001
Segment cluster T51958_PEA_1_node_36 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Figure imgf001662_0002
Figure imgf001663_0001
Segment cluster T51958_PEA_l_node 8 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Figure imgf001663_0002
Segment cluster T51958_PEA_l_node 9 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33, T51958_PEA_1_T35 and T51958_PEA_1_T41. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Figure imgf001664_0001
Segment cluster T51958JPEAJ jnode J2 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Figure imgf001664_0002
Segment cluster T51958_PEA_l_nodeJ3 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Figure imgf001665_0001
Segment cluster T51958_PEA_1_node_44 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Figure imgf001665_0002
Figure imgf001666_0001
Segment cluster T51958JΕAJ jιodeJ5 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Figure imgf001666_0002
Segment cluster T51958 PEA J_nodeJ7 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Figure imgf001667_0001
Segment cluster T51958_PEA_1_node_48 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Figure imgf001667_0002
Figure imgf001668_0001
Segment cluster T51958JPEAJ _nodeJ9 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Figure imgf001668_0002
Segment cluster T51958_PEA_l_nodeJ0 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16, T51958_PEA_1_T33 and T51958_PEA_1_T41. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Figure imgf001669_0001
Segment cluster T51958_PEA_l_nodeJ4 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12 and T51958_PEA_1_T41. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Figure imgf001669_0002
Segment cluster T51958JPEAJ iode jl according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Figure imgf001670_0001
Segment cluster T51958 ΕAJ iode l according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001670_0002
Figure imgf001671_0001
Segment cluster T51958_PEA_1_node_72 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001671_0002
Segment cluster T51958_PEA_1_nodeJ5 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001672_0001
Segment cluster T51958 PEAJ jιode 6 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001672_0002
Segment cluster T51958_PEA_1_nodeJ7 according to the present invention can be found in the following transcript(s): T51958_PEA_1_T4, T51958_PEA_1_T5, T51958_PEA_1_T6, T51958_PEA_1_T8, T51958_PEA_1_T12, T51958_PEA_1_T16 and T51958_PEA_1_T41. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001673_0001
Segment cluster T51958_PEA_1_node_80 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T35. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001673_0002
Segment cluster T51958_PEA_1_node_82 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T35. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf001674_0001
Segment cluster T51958_PEA_1_node_84 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T51958_PEA_1_T35. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf001674_0002
Variant protein alignment to the previously known protein:
Sequence name: PTK7_HUMAN_V4
Sequence documentation:
Alignment of: T51958_PEA_1_P5 x PTK7_HUMAN_V4
Alignment segment 1/1
Quality: 6749.00 Escore : Matching length: 682 Total length: 682 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVATVPSWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQM 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 RQDVNITVATVPSWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQM 450

451 LISEDSRFEVFKNGTLRINSVEVYDGTWYRCMSSTPAGSIEAQARVQVLE 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 LISEDSRFEVFKNGTLRINSVEVYDGTWYRCMSSTPAGSIEAQARVQVLE 500

501 KLKFTPPPQPQQCMEFDKEATVPCSATGREKPTIKWERADGSSLPEWVTD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 KLKFTPPPQPQQCMEFDKEATVPCSATGREKPTIKWERADGSSLPEWVTD 550

551 NAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQLTVAVFITFKVEPER 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQLTVAVFITFKVEPER 600

601 TTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRMHIFQNGSLV 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 TTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRMHIFQNGSLV 650

651 IHDVAPEDSGRYTCIAGNSCNIKHTEAPLYVV 682
    ||||||||||||||||||||||||||||||||
651 IHDVAPEDSGRYTCIAGNSCNIKHTEAPLYVV 682
Sequence name: PTK7_HUMAN_V4
Sequence documentation:
Alignment of: T51958_PEA_1_P6 x PTK7_HUMAN_V4
Alignment segment 1/1:
Quality: 6343.00 Escore: 0 Matching length: 641 Total length: 641 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVATVPSWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQM 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 RQDVNITVATVPSWLKKPQDSQLEEGKPGYLDCLTQATPKPTVVWYRNQM 450

451 LISEDSRFEVFKNGTLRINSVEVYDGTWYRCMSSTPAGSIEAQARVQVLE 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 LISEDSRFEVFKNGTLRINSVEVYDGTWYRCMSSTPAGSIEAQARVQVLE 500

501 KLKFTPPPQPQQCMEFDKEATVPCSATGREKPTIKWERADGSSLPEWVTD 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 KLKFTPPPQPQQCMEFDKEATVPCSATGREKPTIKWERADGSSLPEWVTD 550

551 NAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQLTVAVFITFKVEPER 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 NAGTLHFARVTRDDAGNYTCIASNGPQGQIRAHVQLTVAVFITFKVEPER 600

601 TTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRM 641
    |||||||||||||||||||||||||||||||||||||||||
601 TTVYQGHTALLQCEAQGDPKPLIQWKGKDRILDPTKLGPRM 641
Sequence name: PTK7_HUMAN_V11
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x PTK7_HUMAN_V11
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 410 Total length: 410 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.76 Total Percent Similarity: 100.00 Total Percent Identity: 99.76 Gaps: 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVAS 410
    |||||||||:
401 RQDVNITVAT 410
Sequence name: Q8NFA5
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x Q8NFA5
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 410 Total length: 410 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.76 Total Percent Similarity: 100.00 Total Percent Identity: 99.76 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVAS 410
    |||||||||:
401 RQDVNITVAT 410
Sequence name: Q8NFA6
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x Q8NFA6
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 410 Total length: 410 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.76 Total Percent Similarity: 100.00 Total Percent Identity: 99.76 Gaps: 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVAS 410
    |||||||||:
401 RQDVNITVAT 410
Sequence name: Q8NFA7
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x Q8NFA7
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 416 Total length: 416 Matching Percent Similarity: 99.04 Matching Percent Identity: 98.80 Total Percent Similarity: 99.04 Total Percent Identity: 98.80 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVASEHLCPE 416
    |||||||||:    ||
401 RQDVNITVANGSSLPE 416
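The identity and similarity percentages reported for the Q8NFA7 alignment above can be checked by hand. The following is a minimal sketch, assuming that percent identity is the number of identical aligned positions divided by the alignment length, and that percent similarity additionally counts positions marked ":". The alignment program is not named in this section, so this illustrates the arithmetic rather than that program's actual method; the function and variable names are hypothetical.

```python
def count_identical(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Residues 1-400 are identical in both rows of the alignment,
# so only the final block (401-416) is compared explicitly.
IDENTICAL_PREFIX = 400
tail_variant = "RQDVNITVASEHLCPE"  # T51958_PEA_1_P28, residues 401-416
tail_known = "RQDVNITVANGSSLPE"    # Q8NFA7, residues 401-416
similar_only = 1                   # the single ':' position (S vs N at 410)

total_length = IDENTICAL_PREFIX + len(tail_variant)
identical = IDENTICAL_PREFIX + count_identical(tail_variant, tail_known)

percent_identity = round(100 * identical / total_length, 2)
percent_similarity = round(100 * (identical + similar_only) / total_length, 2)
print(percent_identity, percent_similarity)
```

Under these assumptions, 411 of 416 positions are identical and 412 are identical or similar, which agrees with the reported 98.80 and 99.04.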
Sequence name: Q8NFA8
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x Q8NFA8
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 410 Total length: 410 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.76 Total Percent Similarity: 100.00 Total Percent Identity: 99.76 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVAS 410
    |||||||||:
401 RQDVNITVAT 410
Sequence name: AAN04862
Sequence documentation:
Alignment of: T51958_PEA_1_P28 x AAN04862
Alignment segment 1/1: Quality: 4023.00 Escore: 0 Matching length: 410 Total length: 410 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.76 Total Percent Similarity: 100.00 Total Percent Identity: 99.76 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CAHSAFGQACSSQNFTLSIADESFARVVLAPQDVVVARYEEAMFHCQFSA 250

251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 QPPPSLQWLFEDETPITNRSRPPHLRRATVFANGSLLLTQVRPRNAGIYR 300

301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 CIGQGQRGPPIILEATLHLAEIEDMPLFEPRVFTAGSEERVTCLPPKGLP 350

351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EPSVWWEHAGVRLPTHGRVYQKGHELVLANIAESDAGVYTCHAANLAGQR 400

401 RQDVNITVAS 410
    |||||||||:
401 RQDVNITVAT 410
Sequence name: PTK7_HUMAN_V13
Sequence documentation:
Alignment of: T51958_PEA_1_P30 x PTK7_HUMAN_V13
Alignment segment 1/1: Quality: 1164.00 Escore: 0 Matching length: 122 Total length: 122 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIK 122
    ||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIK 122
Sequence name: PTK7_HUMAN_V3
Sequence documentation:
Alignment of: T51958_PEA_1_P34 x PTK7_HUMAN_V3
Alignment segment 1/1:
Quality: 1518.00 Escore: 0 Matching length: 157 Total length: 157 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPR 157
    |||||||
151 HIDGHPR 157
Sequence name: PTK7_HUMAN_V11
Sequence documentation:
Alignment of: T51958_PEA_1_P35 x PTK7_HUMAN_V11
Alignment segment 1/1:
Quality: 2160.00 Escore: 0 Matching length: 222 Total length: 222 Matching Percent Similarity: 99.55 Matching Percent Identity: 99.55 Total Percent Similarity: 99.55 Total Percent Identity: 99.55 Gaps : 0
Alignment:

  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MGAARGSPARPRRLPLLSVLLLPLLGGTQTAIVFIKQPSSQDALQGRRAL 50

 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 LRCEVEAPGPVHVYWLLDGAPVQDTERRFAQGSSLSFAAVDRLQDSGTFQ 100

101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVARDDVTGEEARSANASFNIKWIEAGPVVLKHPASEAEIQPQTQVTLRC 150

151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 HIDGHPRPTYQWFRDGTPLSDGQSNHTVSSKERNLTLRPAGPEHSGLYSC 200

201 CAHSAFGQACSSQNFTLSIAGE 222
    |||||||||||||||||||| |
201 CAHSAFGQACSSQNFTLSIADE 222
Expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) T51958 transcripts which are detectable by amplicon as depicted in sequence name T51958seg38 in normal and cancerous colon tissues

Expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by or according to seg38, the T51958seg38 amplicon (SEQ ID NO: 1369) and the T51958seg38F (SEQ ID NO: 1367) and T51958seg38R (SEQ ID NO: 1368) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 64 is a histogram showing overexpression of the above-indicated Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 3 fold overexpression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 64, the expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 3 fold was found in 23 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 4.58E-04. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.97E-04, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T51958seg38F forward primer; and T51958seg38R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T51958seg38.
Forward primer (SEQ ID NO: 1367): GCTTGCCCTTTCATGTGGA
Reverse primer (SEQ ID NO: 1368): TCACGATGAGACCTGACACTCTG
Amplicon (SEQ ID NO: 1369): GCTTGCCCTTTCATGTGGAGCACTGTGATTGGACCCAAGTTGGCAAGAGTGGAAGACCAGGGGACAGAACAGAAATCCCCATGGTGGCCAGAGTGTCAGGTCTCATCGTGA
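The normalization described for this experiment (division by the geometric mean of the housekeeping gene quantities, then division by the median of the normal samples) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the sample names and all quantities are invented, and the function names are hypothetical.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity divided by the geometric mean of the housekeeping quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_vs_normal_median(normalized, normal_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {s: q / reference for s, q in normalized.items()}


# Invented quantities: sample -> (target, [PBGD, HPRT1, G6PD, RPS27A])
raw = {
    "normal_1": (2.0, [1.0, 4.0, 1.0, 4.0]),
    "normal_2": (4.0, [1.0, 4.0, 1.0, 4.0]),
    "normal_3": (6.0, [1.0, 4.0, 1.0, 4.0]),
    "tumor_1": (16.0, [1.0, 4.0, 1.0, 4.0]),
}

normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in raw.items()}
folds = fold_vs_normal_median(normalized, ["normal_1", "normal_2", "normal_3"])

# Samples showing at least 3 fold overexpression relative to the normal median.
over_threshold = [s for s, f in folds.items() if f >= 3.0]
print(over_threshold)
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, which is presumably why it is used for this kind of normalization.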
Expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) T51958 transcripts which are detectable by amplicon as depicted in sequence name T51958seg7 in normal and cancerous colon tissues

Expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by or according to seg7, the T51958seg7 amplicon (SEQ ID NO: 1372) and the T51958seg7F (SEQ ID NO: 1370) and T51958seg7R (SEQ ID NO: 1371) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO: 531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO: 612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 65 is a histogram showing overexpression of the above-indicated Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts in cancerous colon samples relative to the normal samples. (Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.) The number and percentage of samples that exhibit at least 3 fold overexpression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 65, the expression of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel"). Notably, an overexpression of at least 3 fold was found in 19 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 1.74E-05. A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.53E-03, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
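The Fisher's exact test on the 3 fold threshold can be illustrated with a 2x2 table of samples above and below the threshold. The text reports how many of the 37 adenocarcinoma samples exceed the threshold (23 for seg38, 19 for seg7) but not how many normal samples do, nor whether the test was one-sided. The sketch below therefore assumes that none of the 11 listed normal PM samples exceed the threshold and that a one-sided test was used; both are assumptions, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a value
    of at least `a` in the top-left cell (hypergeometric upper tail).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    k_max = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, k_max + 1))
    return tail / comb(total, col1)


CANCER_SAMPLES = 37   # adenocarcinoma samples, from the text
NORMAL_SAMPLES = 11   # Sample Nos. 41, 52, 62-67, 69-71
NORMAL_OVER = 0       # assumption: no normal sample exceeds the threshold

# Rows: cancer, normal. Columns: at least 3 fold, below 3 fold.
p_seg38 = fisher_exact_one_sided(23, CANCER_SAMPLES - 23, NORMAL_OVER, NORMAL_SAMPLES - NORMAL_OVER)
p_seg7 = fisher_exact_one_sided(19, CANCER_SAMPLES - 19, NORMAL_OVER, NORMAL_SAMPLES - NORMAL_OVER)
print(f"{p_seg38:.2e} {p_seg7:.2e}")
```

Under these assumptions the computed values are approximately 1.97E-04 for seg38 and 1.53E-03 for seg7, in line with the P values reported above. This is consistent with, but does not prove, that the reported test was configured this way.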
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: T51958seg7F forward primer; and T51958seg7R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: T51958seg7.
Forward primer (SEQ ID NO: 1370): GTGCCCAGTCCCCCTGTC
Reverse primer (SEQ ID NO: 1371): CCTGGCCCGTTTAACTGGA
Amplicon (SEQ ID NO: 1372): GTGCCCAGTCCCCCTGTCAGACCCTCAATGACTGAGGCCTGGGGGATCCCTCCCTTACCTCAGCTTCTCCCATTTCCAGTTAAACGGGCCAGG
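As a consistency check on the sequences listed above, the following minimal sketch verifies that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The function names are hypothetical and are not part of the application.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return (amplicon.startswith(forward)
            and amplicon.endswith(reverse_complement(reverse)))


# T51958seg7 sequences as listed above (SEQ ID NOs: 1370, 1371, 1372).
FORWARD = "GTGCCCAGTCCCCCTGTC"
REVERSE = "CCTGGCCCGTTTAACTGGA"
AMPLICON = ("GTGCCCAGTCCCCCTGTCAGACCCTCAATGACTGAGGCCTGGGGGATCCCTCCCTTA"
            "CCTCAGCTTCTCCCATTTCCAGTTAAACGGGCCAGG")

flanked = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
```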
DESCRIPTION FOR CLUSTER Z17877
Cluster Z17877 features 9 transcript(s) and 17 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001699_0001
Figure imgf001700_0001
Table 2 - Segments of interest
Figure imgf001700_0002
Table 3 - Proteins of interest
Figure imgf001700_0003
Figure imgf001701_0001
Cluster Z17877 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure below refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in Figure 66 and Table 4. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors and malignant tumors involving the bone marrow.
Table 4 - Normal tissue distribution
Figure imgf001701_0002
Figure imgf001702_0001
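The "parts per million" figure described above can be illustrated with a minimal sketch. The weighting scheme applied to the ESTs is not specified in this section, so the sketch assumes a plain unweighted ratio; the function name and the example counts are hypothetical.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Expression of a cluster's ESTs in one tissue category, expressed as
    parts per million of all ESTs in that category (unweighted; the
    weighting used in the application is not reproduced here)."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


# Hypothetical example: 5 ESTs of the cluster among 250,000 ESTs in a category.
ppm = parts_per_million(5, 250_000)
```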
Table 5 - P values and ratios for expression in cancerous tissue
Figure imgf001702_0002
Figure imgf001703_0001
As noted above, cluster Z17877 features 9 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein Z17877_PEA_1_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z17877_PEA_1_T0. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because of manual inspection of known protein localization and/or gene structure.
Variant protein Z17877_PEA_1_P1 is encoded by the following transcript(s): Z17877_PEA_1_T0, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z17877_PEA_1_T0 is shown in bold; this coding portion starts at position 1206 and ends at position 2522. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Figure imgf001704_0001
Figure imgf001705_0001
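The coding-portion coordinates given above for transcript Z17877_PEA_1_T0 (positions 1206 to 2522) can be sanity-checked with a short sketch. It assumes the positions are 1-based and inclusive, which the text does not state explicitly; the function name is hypothetical.

```python
def coding_summary(start, end):
    """Return (length in nucleotides, number of whole codons, leftover
    nucleotides) for a coding portion given by 1-based inclusive
    start and end positions (an assumption about the convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding portion of Z17877_PEA_1_T0 as stated above.
length, codons, remainder = coding_summary(1206, 2522)
```

Under that assumption the coding portion spans 1317 nucleotides, a whole number (439) of codons. Whether the final codon is a stop codon is not stated in this section.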
Variant protein Z17877_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z17877_PEA_1_T6 and Z17877_PEA_1_T11. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because of manual inspection of known protein localization and/or gene structure. Variant protein Z17877_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Figure imgf001705_0002
Figure imgf001706_0001
Variant protein Z17877_PEA_1_P2 is encoded by the following transcript(s): Z17877_PEA_1_T6 and Z17877_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z17877_PEA_1_T6 is shown in bold; this coding portion starts at position 1206 and ends at position 2270. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Figure imgf001706_0002
Figure imgf001707_0001
Figure imgf001708_0001
The coding portion of transcript Z17877_PEA_1_T11 is shown in bold; this coding portion starts at position 602 and ends at position 1666. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Figure imgf001708_0002
Figure imgf001709_0001
Variant protein Z17877_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z17877_PEA_1_T12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because of manual inspection of known protein localization and/or gene structure. Variant protein Z17877_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Figure imgf001710_0001
Variant protein Z17877_PEA_1_P3 is encoded by the following transcript(s): Z17877_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z17877_PEA_1_T12 is shown in bold; this coding portion starts at position 602 and ends at position 1945. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
Figure imgf001711_0001
Variant protein Z17877_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z17877_PEA_1_T2, Z17877_PEA_1_T4 and Z17877_PEA_1_T8. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: unknown. Variant protein Z17877_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Figure imgf001712_0001
Variant protein Z17877_PEA_1_P6 is encoded by the following transcript(s): Z17877_PEA_1_T2, Z17877_PEA_1_T4 and Z17877_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z17877_PEA_1_T2 is shown in bold; this coding portion starts at position 40 and ends at position 381. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Figure imgf001713_0001
Figure imgf001714_0001
The coding portion of transcript Z17877_PEA_1_T4 is shown in bold; this coding portion starts at position 40 and ends at position 381. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Figure imgf001714_0002
Figure imgf001715_0001
Figure imgf001716_0001
The coding portion of transcript Z17877_PEA_1_T8 is shown in bold; this coding portion starts at position 40 and ends at position 381. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z17877_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf001716_0002
Figure imgf001717_0001
Figure imgf001718_0001
As noted above, cluster Z17877 features 17 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z17877_PEA_1_node_0 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Figure imgf001718_0002
Segment cluster Z17877JPEAJ jnode J according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6 and Z17877_PEA_1_T8. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Figure imgf001719_0001
Segment cluster Z17877_PEA_1_node_8 according to the present invention is supported by 100 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6 and Z17877_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Figure imgf001719_0002
Figure imgf001720_0001
Segment cluster Z17877_PEA_1_node_9 according to the present invention is supported by 110 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Figure imgf001720_0002
Segment cluster Z17877_PEA_1_node_10 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T2 and Z17877_PEA_1_T4. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Figure imgf001721_0001
Segment cluster Z17877_PEA_1_node_11 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T2 and Z17877_PEA_1_T4. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Figure imgf001721_0002
Segment cluster Z17877_PEA_1_node_13 according to the present invention is supported by 108 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Figure imgf001721_0003
Figure imgf001722_0001
Segment cluster Z17877_PEA_1_node_15 according to the present invention is supported by 139 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Figure imgf001722_0002
Segment cluster Z17877_PEA_1_node_16 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T6 and Z17877_PEA_1_T11. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Figure imgf001723_0001
Segment cluster Z17877_PEA_1_node_18 according to the present invention is supported by 263 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Figure imgf001723_0002
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z17877JΕAJ jiodej according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Figure imgf001724_0001
Segment cluster Z17877JΕAJ jnodeJ according to the present invention can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Figure imgf001725_0001
Segment cluster Z17877_PEA_1_node_4 according to the present invention can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6 and Z17877_PEA_1_T8. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Figure imgf001725_0002
Segment cluster Z17877JPEAJ jiodej according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T6 and Z17877_PEA_1_T8. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Figure imgf001726_0001
Segment cluster Z 17877 JPEA Jjiode S according to the present invention can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6 and Z17877_PEA_1_T8. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Figure imgf001726_0002
Segment cluster Z17877_PEA_1_node_14 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T0, Z17877_PEA_1_T2, Z17877_PEA_1_T3, Z17877_PEA_1_T4, Z17877_PEA_1_T6, Z17877_PEA_1_T7, Z17877_PEA_1_T8, Z17877_PEA_1_T11 and Z17877_PEA_1_T12. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Figure imgf001727_0001
Segment cluster Z17877_PEA_1_node_17 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z17877_PEA_1_T6, Z17877_PEA_1_T8 and Z17877_PEA_1_T12. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Figure imgf001727_0002
Figure imgf001728_0001
Expression of c-myc-P64 mRNA, initiating from promoter P0 Z17877 transcripts which are detectable by amplicon as depicted in sequence name Z17877seg8 in normal and cancerous colon tissues
Expression of c-myc-P64 mRNA, initiating from promoter P0 transcripts detectable by or according to seg8, Z17877seg8 amplicon (SEQ ID NO: 1375) and Z17877seg8F (SEQ ID NO: 1373) and Z17877seg8R (SEQ ID NO: 1374) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615), RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 67 is a histogram showing over-expression of the above-indicated c-myc-P64 mRNA, initiating from promoter P0 transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 67, the expression of c-myc-P64 mRNA, initiating from promoter P0 transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing panel").
Notably, an over-expression of at least 3 fold was found in 13 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of c-myc-P64 mRNA, initiating from promoter P0 transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 6.27E-05. A threshold of 3 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.85E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z17877seg8F forward primer; and Z17877seg8R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z17877seg8.
Forward primer (SEQ ID NO: 1373): AGCAAGGACGCGACTCTCC
Reverse primer (SEQ ID NO: 1374): AATCCAGCGTCTAAGCAGCTG
Amplicon (SEQ ID NO: 1375): AGCAAGGACGCGACTCTCCCGACGCGGGGAGGCTATTCTGCCCATTTGGGGACACTTCCCCGCCGCTGCCAGGACCCGCTTCTCTGAAAGGCTCTCCTTGCAGCTGCTTAGACGCTGGATT
Combined expression of 19 sequences (T23657seg29, T23657seg22, T23657seg41, T23657seg17-18, AA315457seg8, R30650seg76, HUM-CEASeg33, CEA-Seg35, CEA-Seg31, AA583399seg1, AA583399seg17, AA583399-seg30-32, HUMCACH1Aseg101, HSHCGIseg20, HSHCGIseg35, M78035seg42, T51958seg7, T51958seg38 and Z17877seg8) in normal and cancerous colon tissues.
Expression of solute carrier organic anion transporter family, member 4A1 (SLCO4A1), Carcinoembryonic antigen-related cell adhesion molecule 5 [Precursor], myeloma overexpressed gene (in a subset of t(11;14) positive multiple myelomas) (MYEOV), Voltage-dependent L-type calcium channel alpha-1D subunit Calcium channel, L type, alpha-1 polypeptide, isoform 2, TRIM31 tripartite motif, S-adenosylhomocysteine hydrolase (AHCY), Homo sapiens PTK7 protein tyrosine kinase 7 (PTK7) and c-myc-P64 mRNA, initiating from promoter P0 transcripts detectable by or according to T23657seg29-32 (SEQ ID NO: 1363); T23657seg22 (SEQ ID NO: 1360); T23657seg41 (SEQ ID NO: 1366); T23657seg17-18 (SEQ ID NO: 1357); AA315457seg8 (SEQ ID NO: 1383); R30650seg76 (SEQ ID NO: 1354); HUM-CEASeg33 (SEQ ID NO: 1345); CEA-Seg35 (SEQ ID NO: 1348); CEA-Seg31 (SEQ ID NO: 1342); AA583399seg1 (SEQ ID NO: 1327); AA583399seg17 (SEQ ID NO: 1324); AA583399-seg30-32 (SEQ ID NO: 1321); HUMCACH1Aseg101 (SEQ ID NO: 1337); HSHCGIseg20 (SEQ ID NO: 1378); HSHCGIseg35 (SEQ ID NO: 1381); M78035seg42 (SEQ ID NO: 1351); T51958seg7 (SEQ ID NO: 1372); T51958seg38 (SEQ ID NO: 1369); Z17877seg8 (SEQ ID NO: 1375) amplicons was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615), RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly.
For each RT sample, the expression of the above amplicons was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample of each amplicon was then divided by the median of the quantities of the normal post-mortem (PM) samples detected for the same amplicon (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing panel"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
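The two-step normalization described above (division by the geometric mean of the housekeeping-gene quantities, then division by the median of the normal PM samples) can be sketched as follows. The function name, the dictionary layout and the example quantities are hypothetical and are given only to make the arithmetic concrete.

```python
from statistics import geometric_mean, median


def fold_upregulation(target_qty, housekeeping_qty, normal_ids):
    """Compute fold up-regulation per sample for one amplicon.

    target_qty: {sample_id: quantity of the amplicon of interest}
    housekeeping_qty: {sample_id: [quantities of the housekeeping genes]}
    normal_ids: sample ids of the normal PM samples used as reference
    """
    # Step 1: normalize to the geometric mean of the housekeeping genes.
    normalized = {s: q / geometric_mean(housekeeping_qty[s])
                  for s, q in target_qty.items()}
    # Step 2: divide by the median normalized quantity of the normal samples.
    reference = median(normalized[s] for s in normal_ids)
    return {s: v / reference for s, v in normalized.items()}


# Hypothetical quantities for two normal samples and one cancer sample.
target = {"N1": 4.0, "N2": 8.0, "C1": 48.0}
housekeeping = {s: [2.0, 2.0, 2.0, 2.0] for s in target}
folds = fold_upregulation(target, housekeeping, ["N1", "N2"])
```

In this made-up example the cancer sample comes out at about 8 fold relative to the median of the two normal samples.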
Figure 68 is a histogram showing over-expression of the above-indicated transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression of at least one of the sequences, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 68, an over-expression of at least 5 fold in at least one of the sequences was found in 37 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. A threshold of 5 fold over-expression of at least one of the amplicons was found to differentiate between cancer and normal samples with a P value of 5.31E-10, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
Figure 68 shows combined results for the colon panel marker, as a non-limiting example of a combination of markers according to the present invention.
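The combined-marker criterion described for Figure 68 (a sample counts as positive when at least one amplicon shows at least 5 fold over-expression) can be sketched as below. The function names and the example fold values are hypothetical; only the "at least one amplicon at or above the threshold" rule is taken from the text.

```python
def positive_for_panel(fold_by_amplicon, threshold=5.0):
    """True if at least one amplicon reaches the fold over-expression threshold."""
    return any(fold >= threshold for fold in fold_by_amplicon.values())


def count_positive(samples, threshold=5.0):
    """Count samples that are positive for the panel.

    samples: {sample_id: {amplicon_name: fold up-regulation}}
    """
    return sum(positive_for_panel(folds, threshold)
               for folds in samples.values())


# Hypothetical fold values for three samples and two of the 19 amplicons.
example = {
    "C1": {"Z17877seg8": 6.2, "T51958seg7": 1.1},
    "C2": {"Z17877seg8": 0.9, "T51958seg7": 5.0},
    "N1": {"Z17877seg8": 1.0, "T51958seg7": 1.3},
}
n_positive = count_positive(example)
```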
DESCRIPTION FOR CLUSTER HSHCGI
Cluster HSHCGI features 24 transcript(s) and 29 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Figure imgf001731_0001
Figure imgf001732_0001
Figure imgf001733_0001
Table 3 - Proteins of interest
Figure imgf001733_0002
Figure imgf001734_0001
As noted above, cluster HSHCGI features 24 transcript(s), which were listed in Table 1 above. A description of each variant protein according to the present invention is now provided.
Variant protein HSHCGI_PEA_3_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T13. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P17 and TM31_HUMAN (SEQ ID NO: 1242):
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P17, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCPQCITQIGETSCGFFKCPLCKTSVRRDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYV corresponding to amino acids 1 - 218 of TM31_HUMAN, which also corresponds to amino acids 1 - 218 of HSHCGI_PEA_3_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EIPLMPTVERSQEARCYP corresponding to amino acids 219 - 236 of HSHCGI_PEA_3_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EIPLMPTVERSQEARCYP in HSHCGI_PEA_3_P17.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs.
The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 4 - Amino acid mutations
Figure imgf001736_0001
Variant protein HSHCGI_PEA_3_P17 is encoded by the following transcript(s): HSHCGI_PEA_3_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T13 is shown in bold; this coding portion starts at position 111 and ends at position 814. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
Figure imgf001736_0002
Figure imgf001737_0001
Variant protein HSHCGI_PEA_3_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T0. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Figure imgf001738_0001
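The localization reasoning stated for these variant proteins (no trans-membrane region predicted by either program, and a non-secreted call from both signal-peptide programs) can be sketched as a simple rule. This is only an illustrative sketch: the function name, the boolean inputs and the "undetermined" fallback are assumptions, since the text describes only the intracellular outcome and does not name the programs other than SignalP.

```python
def predict_localization(tm_region_calls, secreted_calls):
    """Combine per-program calls into a localization label.

    tm_region_calls: one bool per trans-membrane prediction program,
        True if that program predicted a trans-membrane region.
    secreted_calls: one bool per signal-peptide prediction program,
        True if that program predicted a secreted protein.
    """
    # Rule stated in the text: no program predicts a trans-membrane region
    # and no program predicts secretion -> intracellular.
    if not any(tm_region_calls) and not any(secreted_calls):
        return "intracellular"
    # Any other combination is not described in the text (assumption).
    return "undetermined"


# Two trans-membrane programs and two signal-peptide programs, all negative.
print(predict_localization([False, False], [False, False]))
```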
Variant protein HSHCGI_PEA_3_P18 is encoded by the following transcript(s): HSHCGI_PEA_3_T0, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T0 is shown in bold; this coding portion starts at position 111 and ends at position 1385. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Figure imgf001738_0002
Figure imgf001739_0001
Variant protein HSHCGI_PEA_3_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T11. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P19 and TM31_HUMAN_V2 (SEQ ID NO:1241):
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P19, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR TDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLE corresponding to amino acids 1 - 248 of TM31_HUMAN_V2, which also corresponds to amino acids 1 - 248 of HSHCGI_PEA_3_P19, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NWRKNSVKQNQDTTPSQGA corresponding to amino acids 249 - 267 of HSHCGI_PEA_3_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NWRKNSVKQNQDTTPSQGA in HSHCGI_PEA_3_P19.
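The head and tail arrangement described in the comparison report can be checked with a short sketch that joins the two quoted segments and reports the residue range each one occupies. The function name and the 1-based inclusive numbering are assumptions made for illustration; the sequences are the ones quoted above for HSHCGI_PEA_3_P19, with the line-wrap spaces removed.

```python
# Head (shared with TM31_HUMAN_V2) and unique tail as quoted in the report.
HEAD = (
    "MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR"
    "TDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES"
    "KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE"
    "KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK"
    "TKQNMPPRQLLE"
)
TAIL = "NWRKNSVKQNQDTTPSQGA"


def chimera_segments(head, tail):
    """Return the joined sequence and the 1-based inclusive range of each part."""
    full = head + tail  # contiguous and in sequential order
    head_range = (1, len(head))
    tail_range = (len(head) + 1, len(full))
    return full, head_range, tail_range


full, head_range, tail_range = chimera_segments(HEAD, TAIL)
print(head_range, tail_range, len(full))
```

With the sequences as quoted, the head spans residues 1 to 248 and the tail residues 249 to 267, which agrees with the ranges given in the report.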
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V2. These changes were previously known to occur and are listed in the table below.
Table 8 - Changes to TM31_HUMAN_V2
Figure imgf001740_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Figure imgf001741_0001
Variant protein HSHCGI_PEA_3_P19 is encoded by the following transcript(s): HSHCGI_PEA_3_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T11 is shown in bold; this coding portion starts at position 111 and ends at position 911. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Figure imgf001741_0002
Figure imgf001742_0001
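The coding-portion coordinates quoted for each transcript can be related to the length of the encoded variant protein by simple arithmetic. The sketch below assumes that the positions are 1-based and inclusive and that the quoted span covers only the amino-acid-coding codons; these are assumptions for illustration, not statements from the application, and the function name is hypothetical.

```python
def coding_portion(start, end):
    """Return (nucleotides, whole codons, leftover nucleotides) for a
    1-based inclusive coding span."""
    length = end - start + 1
    codons, leftover = divmod(length, 3)
    return length, codons, leftover


# Coding portion quoted for transcript HSHCGI_PEA_3_T11 (variant HSHCGI_PEA_3_P19).
length, codons, leftover = coding_portion(111, 911)
print(length, codons, leftover)
```

Under these assumptions the span 111 to 911 covers 801 nucleotides, that is 267 codons with none left over, which matches the 267 amino acids (248 plus 19) described in the comparison report for HSHCGI_PEA_3_P19.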
Variant protein HSHCGI_PEA_3_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P1 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Figure imgf001743_0001
Variant protein HSHCGI_PEA_3_P1 is encoded by the following transcript(s): HSHCGI_PEA_3_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T3 is shown in bold; this coding portion starts at position 139 and ends at position 1413. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Figure imgf001743_0002
Figure imgf001744_0001
Variant protein HSHCGI_PEA_3_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T5. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P4 and TM31_HUMAN_V1 (SEQ ID NO: 1240):
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P4, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA_3_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YDGPPQMYFAY corresponding to amino acids 257 - 267 of HSHCGI_PEA_3_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YDGPPQMYFAY in HSHCGI_PEA_3_P4.
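The graded homology levels recited for the tail (70%, 80%, 85%, 90%, 95%) can be illustrated with a minimal sketch. It assumes, purely for illustration, that homology is approximated by ungapped percent identity between two sequences of equal length; the application's own definition of homology is not given in this passage and may differ. The function names and the single-substitution test peptide are hypothetical.

```python
TAIL_P4 = "YDGPPQMYFAY"          # tail quoted for HSHCGI_PEA_3_P4
THRESHOLDS = (70, 80, 85, 90, 95)  # levels recited in the comparison report


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length sequences (assumed proxy)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def thresholds_met(pct):
    """Return the recited thresholds that a given percentage reaches."""
    return [t for t in THRESHOLDS if pct >= t]


# Hypothetical candidate differing from the tail at one of 11 positions.
candidate = "YDGPPQMYFAW"
pct = percent_identity(TAIL_P4, candidate)
print(round(pct, 1), thresholds_met(pct))
```

A single substitution in an 11-residue tail leaves 10 of 11 positions identical, so under this proxy the candidate reaches the 90% level but not the 95% level.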
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 13 - Changes to TM31_HUMAN_V1
Figure imgf001745_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
Figure imgf001746_0001
Variant protein HSHCGI_PEA_3_P4 is encoded by the following transcript(s): HSHCGI_PEA_3_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T5 is shown in bold; this coding portion starts at position 139 and ends at position 939. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
Figure imgf001746_0002
Figure imgf001747_0001
Figure imgf001748_0001
Variant protein HSHCGI_PEA_3_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T7. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P6 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P6, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA_3_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PTPG corresponding to amino acids 257 - 260 of HSHCGI_PEA_3_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PTPG in HSHCGI_PEA_3_P6.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 16 - Changes to TM31_HUMAN_V1
Figure imgf001749_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Figure imgf001749_0002
Figure imgf001750_0001
Variant protein HSHCGI_PEA_3_P6 is encoded by the following transcript(s): HSHCGI_PEA_3_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T7 is shown in bold; this coding portion starts at position 139 and ends at position 918. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Figure imgf001750_0002
Figure imgf001751_0001
Variant protein HSHCGI_PEA_3_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T8. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P7 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P7, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRS corresponding to amino acids 1 - 257 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 257 of HSHCGI_PEA_3_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA corresponding to amino acids 258 - 310 of HSHCGI_PEA_3_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SFSHTSSPDLTNQLNHIFLEVKSFSFSTQPLFLWNWRKNSVKQNQDTTPSQGA in HSHCGI_PEA_3_P7.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 19 - Changes to TM31_HUMAN_V1
Figure imgf001752_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
Figure imgf001752_0002
Figure imgf001753_0001
Variant protein HSHCGI_PEA_3_P7 is encoded by the following transcript(s): HSHCGI_PEA_3_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T8 is shown in bold; this coding portion starts at position 139 and ends at position 1068. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Figure imgf001753_0002
Figure imgf001754_0001
Variant protein HSHCGI_PEA_3_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T9. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P8 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P8, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQ LQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAG corresponding to amino acids 1 - 342 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 342 of HSHCGI_PEA_3_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KSPVSEY corresponding to amino acids 343 - 349 of HSHCGI_PEA_3_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KSPVSEY in HSHCGI_PEA_3_P8.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 22 - Changes to TM31_HUMAN_V1
Figure imgf001755_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Figure imgf001756_0001
Variant protein HSHCGI_PEA_3_P8 is encoded by the following transcript(s): HSHCGI_PEA_3_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T9 is shown in bold; this coding portion starts at position 139 and ends at position 1185. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Figure imgf001756_0002
Figure imgf001757_0001
Variant protein HSHCGI_PEA_3_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T10. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P9 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P9, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCR corresponding to amino acids 1 - 256 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 256 of HSHCGI_PEA_3_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TGEKTQ corresponding to amino acids 257 - 262 of HSHCGI_PEA_3_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TGEKTQ in HSHCGI_PEA_3_P9.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 25 - Changes to TM31_HUMAN_V1
Figure imgf001758_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSHCGI_PEA_3_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 26 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Amino acid mutations
Figure imgf001759_0001
Variant protein HSHCGI_PEA_3_P9 is encoded by the following transcript(s): HSHCGI_PEA_3_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T10 is shown in bold; this coding portion starts at position 139 and ends at position 924. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Figure imgf001759_0002
Figure imgf001760_0001
Variant protein HSHCGI_PEA_3_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T14. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P12 and TM31_HUMAN:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P12, comprising a first amino acid sequence being at least 90% homologous to MNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAPSHSLFRASSAG KVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEWLTAIRAWFCEVPSS corresponding to amino acids 312 - 425 of TM31_HUMAN, which also corresponds to amino acids 1 - 114 of HSHCGI_PEA_3_P12.
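Because HSHCGI_PEA_3_P12 is described as matching amino acids 312 - 425 of TM31_HUMAN over its own amino acids 1 - 114, positions on the variant can be converted to positions on the known protein by a fixed offset. The sketch below shows that conversion; the function and constant names are hypothetical, and it assumes 1-based numbering and an ungapped correspondence over the whole stated range.

```python
REF_START = 312        # first TM31_HUMAN residue matched by the variant
REF_END = 425          # last TM31_HUMAN residue matched by the variant
VARIANT_LENGTH = 114   # residues 1 - 114 of HSHCGI_PEA_3_P12


def variant_to_reference(position):
    """Map a 1-based position on the variant to the TM31_HUMAN position."""
    if not 1 <= position <= VARIANT_LENGTH:
        raise ValueError("position outside the variant protein")
    return position + REF_START - 1


print(variant_to_reference(1), variant_to_reference(VARIANT_LENGTH))
```

The two ends of the variant map to positions 312 and 425, consistent with the stated range, and the range length (425 - 312 + 1) equals the 114 residues of the variant.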
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P12 is encoded by the following transcript(s): HSHCGI_PEA_3_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T14 is shown in bold; this coding portion starts at position 1795 and ends at position 2136. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Nucleic acid SNPs
Figure imgf001761_0001
Figure imgf001762_0001
Variant protein HSHCGI_PEA_3_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T17. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellularly because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P13 is encoded by the following transcript(s): HSHCGI_PEA_3_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T17 is shown in bold; this coding portion starts at position 585 and ends at position 914. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Nucleic acid SNPs
Figure imgf001763_0001
Variant protein HSHCGI_PEA_3_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T18. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P14 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P14, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFTDQVEHE KQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLK TKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQ LQADRKKDENRFFKSMNKNDMKS corresponding to amino acids 1 - 319 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 319 of HSHCGI_PEA_3_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CK corresponding to amino acids 320 - 321 of HSHCGI_PEA_3_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes compared with the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 30 - Changes to TM31_HUMAN_V1
Figure imgf001764_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 31 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 31 - Amino acid mutations
Figure imgf001765_0001
Variant protein HSHCGI_PEA_3_P14 is encoded by the following transcript(s): HSHCGI_PEA_3_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T18 is shown in bold; this coding portion starts at position 139 and ends at position 1101. The transcript also has the following SNPs as listed in Table 32 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Nucleic acid SNPs
Figure imgf001765_0002
Figure imgf001766_0001
Variant protein HSHCGI_PEA_3_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T21. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P15 is encoded by the following transcript(s): HSHCGI_PEA_3_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T21 is shown in bold; this coding portion starts at position 338 and ends at position 505. The transcript also has the following SNPs as listed in Table 33 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 33 - Nucleic acid SNPs
Figure imgf001767_0001
Variant protein HSHCGI_PEA_3_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T23. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P16 and TM31_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P16, comprising a first amino acid sequence being at least 90% homologous to MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGFFKCPLCKTSVR KNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQEMFHYFCEDDGKFLCFVCRES KDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDVFT corresponding to amino acids 1 - 171 of TM31_HUMAN_V1, which also corresponds to amino acids 1 - 171 of HSHCGI_PEA_3_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRKTPSHDLWKQKHLCQSSWNPLLH corresponding to amino acids 172 - 196 of HSHCGI_PEA_3_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSHCGI_PEA_3_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRKTPSHDLWKQKHLCQSSWNPLLH in HSHCGI_PEA_3_P16.
It should be noted that the known protein sequence (TM31_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for TM31_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 34 - Changes to TM31_HUMAN_V1
Figure imgf001768_0001
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 35 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 35 - Amino acid mutations
Figure imgf001769_0001
Variant protein HSHCGI_PEA_3_P16 is encoded by the following transcript(s): HSHCGI_PEA_3_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T23 is shown in bold; this coding portion starts at position 139 and ends at position 726. The transcript also has the following SNPs as listed in Table 36 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 36 - Nucleic acid SNPs
Figure imgf001769_0002
Figure imgf001770_0001
Variant protein HSHCGI_PEA_3_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T1 and HSHCGI_PEA_3_T2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 37 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 37 - Amino acid mutations
Figure imgf001770_0002
Figure imgf001771_0001
Variant protein HSHCGI_PEA_3_P20 is encoded by the following transcript(s): HSHCGI_PEA_3_T1 and HSHCGI_PEA_3_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T1 is shown in bold; this coding portion starts at position 139 and ends at position 1413. The transcript also has the following SNPs as listed in Table 38 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 38 - Nucleic acid SNPs
Figure imgf001771_0002
Figure imgf001772_0001
The coding portion of transcript HSHCGI_PEA_3_T2 is shown in bold; this coding portion starts at position 112 and ends at position 1386. The transcript also has the following SNPs as listed in Table 39 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 39 - Nucleic acid SNPs
Figure imgf001772_0002
Figure imgf001773_0001
Variant protein HSHCGI_PEA_3_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T4. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P21 and TM31_HUMAN:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P21, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ corresponding to amino acids 1 - 28 of HSHCGI_PEA_3_P21, and a second amino acid sequence being at least 90% homologous to FLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKETVQVKAQGVHRVDV FTDQVEHEKQRILTEFELLHQVLEEEKNFLLSRIYWLGHEGTEAGKHYVASTEPQLNDL KKLVDSLKTKQNMPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITG SLKKFKDQLQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGR TTSGPPNHHSSAPSHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEEL HAALSEWLTAIRAWFCEVPSS corresponding to amino acids 112 - 425 of TM31_HUMAN, which also corresponds to amino acids 29 - 342 of HSHCGI_PEA_3_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSHCGI_PEA_3_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MHHSDWGNIMWIFQMSPLQNFRKEERNQ of HSHCGI_PEA_3_P21.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 40 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 40 - Amino acid mutations
Figure imgf001774_0001
Variant protein HSHCGI_PEA_3_P21 is encoded by the following transcript(s): HSHCGI_PEA_3_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T4 is shown in bold; this coding portion starts at position 252 and ends at position 1277. The transcript also has the following SNPs as listed in Table 41 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 41 - Nucleic acid SNPs
Figure imgf001775_0001
Variant protein HSHCGI_PEA_3_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSHCGI_PEA_3_T6. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSHCGI_PEA_3_P22 and TM31_HUMAN:
1. An isolated chimeric polypeptide encoding for HSHCGI_PEA_3_P22, comprising a first amino acid sequence being at least 90% homologous to MPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD RKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAP SHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEWLTAIRA WFCEVPSS corresponding to amino acids 241 - 425 of TM31_HUMAN, which also corresponds to amino acids 1 - 185 of HSHCGI_PEA_3_P22.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSHCGI_PEA_3_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 42 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 42 - Amino acid mutations
Figure imgf001776_0001
Figure imgf001777_0001
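The comparison reports above state which stretch of each variant corresponds to which stretch of TM31_HUMAN (amino acids 29 - 342 of HSHCGI_PEA_3_P21 to 112 - 425, and 1 - 185 of HSHCGI_PEA_3_P22 to 241 - 425). The following is a minimal Python sketch of that position mapping, assuming the shared regions are ungapped as the equal range lengths suggest. The names SHARED_REGIONS and to_known_position are hypothetical and are not part of the application.

```python
from typing import Optional

# (variant_start, variant_end, known_start, known_end), 1-based inclusive,
# copied from the comparison reports against TM31_HUMAN.
SHARED_REGIONS = {
    "HSHCGI_PEA_3_P21": (29, 342, 112, 425),
    "HSHCGI_PEA_3_P22": (1, 185, 241, 425),
}


def to_known_position(variant: str, pos: int) -> Optional[int]:
    """Map a variant amino acid position to its TM31_HUMAN position.

    Returns None when the position lies outside the shared region
    (for example, in the unique 28-residue head of P21).
    Assumes an ungapped correspondence between the two ranges.
    """
    v_start, v_end, k_start, k_end = SHARED_REGIONS[variant]
    if (v_end - v_start) != (k_end - k_start):
        raise ValueError("shared ranges differ in length; mapping is not ungapped")
    if not v_start <= pos <= v_end:
        return None
    return pos - v_start + k_start


if __name__ == "__main__":
    print(to_known_position("HSHCGI_PEA_3_P21", 29))
    print(to_known_position("HSHCGI_PEA_3_P22", 185))
```

Such a mapping is only a convenience for relating positions in the amino acid mutation tables to the known protein; it does not replace the alignments given at the end of the application.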
Variant protein HSHCGI_PEA_3_P22 is encoded by the following transcript(s): HSHCGI_PEA_3_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSHCGI_PEA_3_T6 is shown in bold; this coding portion starts at position 413 and ends at position 967. The transcript also has the following SNPs as listed in Table 43 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSHCGI_PEA_3_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 43 - Nucleic acid SNPs
Figure imgf001777_0002
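The coding-portion coordinates quoted for the transcripts above can be cross-checked against the amino acid ranges given in the comparison reports. The following is a minimal Python sketch of that arithmetic. It assumes the positions are 1-based and inclusive and that the quoted coding portion does not include a stop codon; both assumptions are inferred from the numbers themselves rather than stated in the text. The names CODING_PORTIONS, codon_count and check_all are hypothetical.

```python
# Coding-portion coordinates (start, end) quoted in the text, paired with the
# protein length implied by the amino acid ranges in the comparison reports.
CODING_PORTIONS = {
    "HSHCGI_PEA_3_T18": (139, 1101, 321),  # P14: amino acids 1-319 plus 320-321
    "HSHCGI_PEA_3_T23": (139, 726, 196),   # P16: amino acids 1-171 plus 172-196
    "HSHCGI_PEA_3_T4": (252, 1277, 342),   # P21: amino acids 1-28 plus 29-342
    "HSHCGI_PEA_3_T6": (413, 967, 185),    # P22: amino acids 1-185
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


def check_all(portions: dict) -> dict:
    """Return {transcript: True/False}: does the codon count match the length?"""
    return {
        name: codon_count(start, end) == expected
        for name, (start, end, expected) in portions.items()
    }


if __name__ == "__main__":
    for name, ok in check_all(CODING_PORTIONS).items():
        print(name, "consistent" if ok else "inconsistent")
```

Under the same assumptions, the two transcripts given for HSHCGI_PEA_3_P20 (positions 139 to 1413 and 112 to 1386) span the same number of codons, as expected for two transcripts encoding one protein.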
As noted above, cluster HSHCGI features 29 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSHCGI_PEA_3_node_0 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T21 and HSHCGI_PEA_3_T22. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Figure imgf001778_0001
Segment cluster HSHCGI_PEA_3_node_2 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T21 and HSHCGI_PEA_3_T22. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Figure imgf001778_0002
Segment cluster HSHCGI_PEA J_node J according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T18 and HSHCGI_PEA_3_T23. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Figure imgf001779_0001
Segment cluster HSHCGI_PEA_3_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T18 and HSHCGI_PEA_3_T23. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Figure imgf001780_0001
Segment cluster HSHCGI_PEA_3_node_14 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T23. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Figure imgf001781_0001
Segment cluster HSHCGI_PEA_3_node_16 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13 and HSHCGI_PEA_3_T18. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Figure imgf001781_0002
Figure imgf001782_0001
Segment cluster HSHCGI_PEA_3_node_18 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T14. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Figure imgf001782_0002
Segment cluster HSHCGI_PEA_3_node_20 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T12 and HSHCGI_PEA_3_T14. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Figure imgf001782_0003
Segment cluster HSHCGI_PEA_3_node_26 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T19, HSHCGI_PEA_3_T20 and HSHCGI_PEA_3_T24. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Figure imgf001783_0001
Segment cluster HSHCGI_PEA_3_node_28 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T18 and HSHCGI_PEA_3_T24. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Figure imgf001783_0002
Segment cluster HSHCGI_PEA_3_node_30 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T17. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Figure imgf001784_0001
Segment cluster HSHCGI_PEA_3_node_32 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T17 and HSHCGI_PEA_3_T19. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Figure imgf001784_0002
Segment cluster HSHCGI_PEA_3_node_33 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T17, HSHCGI_PEA_3_T19 and HSHCGI_PEA_3_T20. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Figure imgf001785_0001
Segment cluster HSHCGI_PEA_3_node_34 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T17 and HSHCGI_PEA_3_T19. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Figure imgf001786_0001
Segment cluster HSHCGI_PEA_3_node_36 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T17, HSHCGI_PEA_3_T19 and HSHCGI_PEA_3_T20. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Figure imgf001787_0001
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSHCGI_PEA J jnode J according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T11 and HSHCGI_PEA_3_T13. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Figure imgf001788_0001
Segment cluster HSHCGI_PEA_3_node_4 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T2. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Figure imgf001788_0002
Segment cluster HSHCGI_PEA_3_node_6 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T12, HSHCGIJPEAJJTl! and HSHCGI_PEA_3_T23. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Figure imgf001789_0001
Segment cluster HSHCGIJPEA J jiode according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T18 and HSHCGI_PEA_3_T23. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Figure imgf001790_0001
Segment cluster HSHCGI_PEA_3_node_11 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T6. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Figure imgf001790_0002
Segment cluster HSHCGI_PEA_3_node_13 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T18 and HSHCGI_PEA_3_T23. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Figure imgf001791_0001
Segment cluster HSHCGI_PEA_3_node_19 according to the present invention can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T14 and HSHCGI_PEA_3_T18. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Figure imgf001792_0001
Segment cluster HSHCGI_PEA_3_node_21 according to the present invention can be found in the following transcript(s): HSHCGI_PEA_3_T13. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Figure imgf001793_0001
Segment cluster HSHCGI_PEA_3_node_22 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T12 and HSHCGI_PEA_3_T14. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Figure imgf001793_0002
Segment cluster HSHCGI_PEA_3_node_23 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14 and HSHCGI_PEA_3_T18. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Figure imgf001794_0001
Segment cluster HSHCGI_PEA_3_node_24 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14 and HSHCGI_PEA_3_T18. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
Figure imgf001794_0002
Figure imgf001795_0001
Segment cluster HSHCGI_PEA_3_node_27 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T18, HSHCGI_PEA_3_T19, HSHCGI_PEA_3_T20 and HSHCGI_PEA_3_T24. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Figure imgf001795_0002
Figure imgf001796_0001
Segment cluster HSHCGI_PEA_3_node_31 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T17, HSHCGI_PEA_3_T19 and HSHCGI_PEA_3_T20. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Figure imgf001797_0001
Segment cluster HSHCGI_PEA_3_node_15 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSHCGI_PEA_3_T0, HSHCGI_PEA_3_T1, HSHCGI_PEA_3_T2, HSHCGI_PEA_3_T3, HSHCGI_PEA_3_T4, HSHCGI_PEA_3_T5, HSHCGI_PEA_3_T6, HSHCGI_PEA_3_T7, HSHCGI_PEA_3_T8, HSHCGI_PEA_3_T9, HSHCGI_PEA_3_T10, HSHCGI_PEA_3_T11, HSHCGI_PEA_3_T12, HSHCGI_PEA_3_T13, HSHCGI_PEA_3_T14, HSHCGI_PEA_3_T15, HSHCGI_PEA_3_T17, HSHCGI_PEA_3_T19 and HSHCGI_PEA_3_T20. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Figure imgf001798_0001
Variant protein alignment to the previously known protein:
Sequence name: TM31_HUMAN
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P17 x TM31_HUMAN
Alignment segment 1/1:
Quality: 2185.00 Escore: 0 Matching length: 218 Total length: 218 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCPQCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCPQCITQIGETSCGF 50
 51 FKCPLCKTSVRRDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRRDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYV 218
    ||||||||||||||||||
201 SRIYWLGHEGTEAGKHYV 218
Sequence name: TM31_HUMAN_V2
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P19 x TM31_HUMAN_V2
Alignment segment 1/1:
Quality: 2474.00 Escore: 0 Matching length: 249 Total length: 249 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.60 Total Percent Similarity: 100.00 Total Percent Identity: 99.60 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRRDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRRDAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEN 249
    ||||||||||||||||||||||||||||||||||||||||||||||||:
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLED 249
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P4 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 2550.00 Escore: 0 Matching length: 256 Total length: 256 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCR 256
    ||||||
251 KVVLCR 256
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P6 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 2550.00 Escore: 0 Matching length: 256 Total length: 256 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCR 256
    ||||||
251 KVVLCR 256
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P7 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 2559.00 Escore: 0 Matching length: 257 Total length: 257 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCRS 257
    |||||||
251 KVVLCRS 257
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P8 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 3409.00 Escore: 0 Matching length: 342 Total length: 342 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 KVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD 300
301 RKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAG 342
    ||||||||||||||||||||||||||||||||||||||||||
301 RKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAG 342
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P9 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 2556.00 Escore: 0 Matching length: 259 Total length: 259 Matching Percent Similarity: 99.61 Matching Percent Identity: 99.23 Total Percent Similarity: 99.61 Total Percent Identity: 99.23 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCRTGE 259
    ||||||: |
251 KVVLCRSEE 259
Sequence name: TM31_HUMAN
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P12 x TM31_HUMAN
Alignment segment 1/1:
Quality: 1130.00 Escore: 0 Matching length: 114 Total length: 114 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAPSHSL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
312 MNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSAPSHSL 361
 51 FRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
362 FRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHAALSEW 411
101 LTAIRAWFCEVPSS 114
    ||||||||||||||
412 LTAIRAWFCEVPSS 425
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P14 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 3175.00 Escore: 0 Matching length: 319 Total length: 319 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLL 200
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 SRIYWLGHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDI 250
251 KVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 KVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQAD 300
301 RKKDENRFFKSMNKNDMKS 319
    |||||||||||||||||||
301 RKKDENRFFKSMNKNDMKS 319
Sequence name: TM31_HUMAN_V1
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P16 x TM31_HUMAN_V1
Alignment segment 1/1:
Quality: 1714.00 Escore: 0 Matching length: 171 Total length: 171 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MASGQFVNKLQEEVICPICLDILQKPVTIDCGHNFCLKCITQIGETSCGF 50
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FKCPLCKTSVRKNAIRFNSLLRNLVEKIQALQASEVQSKRKEATCPRHQE 100
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 MFHYFCEDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQ 150
151 QKEKETVQVKAQGVHRVDVFT 171
    |||||||||||||||||||||
151 QKEKETVQVKAQGVHRVDVFT 171
Sequence name: TM31_HUMAN
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P21 x TM31_HUMAN
Alignment segment 1/1:
Quality: 3106.00 Escore: 0 Matching length: 319 Total length: 319 Matching Percent Similarity: 99.37 Matching Percent Identity: 98.75 Total Percent Similarity: 99.37 Total Percent Identity: 98.75 Gaps: 0
Alignment:
 24 EERNQFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKET 73
107 EDDGKFLCFVCRESKDHKSHNVSLIEEAAQNYQGQIQEQIQVLQQKEKET 156
 74 VQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLLSRIYWL 123
    ||||||||||||||||||||||||||||||||||||||||||||||||||
157 VQVKAQGVHRVDVFTDQVEHEKQRILTEFELLHQVLEEEKNFLLSRIYWL 206
124 GHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDIKVVLCR 173
    ||||||||||||||||||||||||||||||||||||||||||||||||||
207 GHEGTEAGKHYVASTEPQLNDLKKLVDSLKTKQNMPPRQLLEDIKVVLCR 256
174 SEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQADRKKDEN 223
    ||||||||||||||||||||||||||||||||||||||||||||||||||
257 SEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSLKKFKDQLQADRKKDEN 306
224 RFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSA 273
    ||||||||||||||||||||||||||||||||||||||||||||||||||
307 RFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSSAGGRTTSGPPNHHSSA 356
274 PSHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHA 323
    ||||||||||||||||||||||||||||||||||||||||||||||||||
357 PSHSLFRASSAGKVTFPVCLLASYDEISGQGASSQDTKTFDVALSEELHA 406
324 ALSEWLTAIRAWFCEVPSS 342
    |||||||||||||||||||
407 ALSEWLTAIRAWFCEVPSS 425
Sequence name: TM31_HUMAN
Sequence documentation:
Alignment of: HSHCGI_PEA_3_P22 x TM31_HUMAN
Alignment segment 1/1:
Quality: 1829.00 Escore: 0 Matching length: 185 Total length: 185 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
241 MPPRQLLEDIKVVLCRSEEFQFLNPTPVPLELEKKLSEAKSRHDSITGSL 290
 51 KKFKDQLQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
291 KKFKDQLQADRKKDENRFFKSMNKNDMKSWGLLQKNNHKMNKTSEPGSSS 340
101 AGGRTTSGPPNHHSSAPSHSLFRASSAGKVTFPVCLLASYDEISGQGASS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
341 AGGRTTSGPPNHHSSAPSHSLFRASSAGKVTFPVCLLASYDEISGQGASS 390
151 QDTKTFDVALSEELHAALSEWLTAIRAWFCEVPSS 185
    |||||||||||||||||||||||||||||||||||
391 QDTKTFDVALSEELHAALSEWLTAIRAWFCEVPSS 425
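The percent identity and percent similarity figures reported with the alignments above can be checked by simple counting over a gap-free alignment. The following is a minimal sketch only, using the HSHCGI_PEA_3_P9 x TM31_HUMAN_V1 alignment as an example: the first 250 positions are identical in both sequences and only the last nine residues differ. The function name and the set of "similar" residue pairs are assumptions made for illustration; the scoring scheme actually used to produce the reported values is not described in this section.

```python
# Illustrative sketch: recompute percent identity / similarity for a
# gap-free alignment. SIMILAR_PAIRS is an assumption (only the S/T pair
# is needed for this example); it is not the patent's scoring matrix.
SIMILAR_PAIRS = {frozenset("ST")}


def count_matches(query, subject, similar_pairs=SIMILAR_PAIRS):
    """Return (identical, identical_or_similar) counts for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    positives = 0
    for q, s in zip(query, subject):
        if q == s:
            identical += 1
            positives += 1
        elif frozenset((q, s)) in similar_pairs:
            positives += 1
    return identical, positives


# Positions 1-250 are identical in the P9 alignment; positions 251-259 differ.
shared_prefix_length = 250
tail_query = "KVVLCRTGE"    # HSHCGI_PEA_3_P9, residues 251-259
tail_subject = "KVVLCRSEE"  # TM31_HUMAN_V1, residues 251-259

tail_identical, tail_positives = count_matches(tail_query, tail_subject)
total_length = shared_prefix_length + len(tail_query)

percent_identity = round(100 * (shared_prefix_length + tail_identical) / total_length, 2)
percent_similarity = round(100 * (shared_prefix_length + tail_positives) / total_length, 2)

print(percent_identity, percent_similarity)
```

Under these assumptions the counts are 257 identical and 258 identical-or-similar positions out of 259, which agrees with the reported 99.23 and 99.61.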
Expression of TRIM31 tripartite motif HSHCGI transcripts which are detectable by amplicon as depicted in sequence name HSHCGIseg20 in normal and cancerous colon tissues

Expression of TRIM31 tripartite motif transcripts detectable by or according to seg20, HSHCGIseg20 amplicon (SEQ ID NO: 1378) and HSHCGIseg20F (SEQ ID NO: 1376) and HSHCGIseg20R (SEQ ID NO: 1377) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing samples"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 69 is a histogram showing over-expression of the above-indicated TRIM31 tripartite motif transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 69, the expression of TRIM31 tripartite motif transcripts detectable by the above amplicon in cancer samples was higher than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, "Tissue samples in testing samples"). Notably, an over-expression of at least 3 fold was found in 6 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of TRIM31 tripartite motif transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 6.58E-02.
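The normalization described above (geometric mean of the housekeeping gene quantities, then division by the median of the normal post-mortem samples, then counting samples with at least 3 fold over-expression) can be sketched as follows. This is an illustrative sketch only: the function names, sample labels and all numeric quantities are hypothetical and are not data from the experiments reported here.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity by the geometric mean of the housekeeping genes."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_over_normal(normalized, normal_sample_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    reference = median(normalized[s] for s in normal_sample_ids)
    return {sample: value / reference for sample, value in normalized.items()}


def count_overexpressed(folds, sample_ids, threshold=3.0):
    """Count samples whose fold up-regulation is at least `threshold`."""
    return sum(1 for s in sample_ids if folds[s] >= threshold)


# Hypothetical quantities: (amplicon, [PBGD, HPRT1, G6PD, RPS27A]) per RT sample.
raw = {
    "N1": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "N2": (4.0, [1.0, 4.0, 2.0, 2.0]),
    "N3": (6.0, [1.0, 4.0, 2.0, 2.0]),
    "C1": (20.0, [1.0, 4.0, 2.0, 2.0]),
    "C2": (8.0, [1.0, 4.0, 2.0, 2.0]),
}
normal_ids = ["N1", "N2", "N3"]   # stand-ins for the normal PM samples
cancer_ids = ["C1", "C2"]         # stand-ins for the cancer samples

normalized = {s: normalize_to_housekeeping(q, hk) for s, (q, hk) in raw.items()}
folds = fold_over_normal(normalized, normal_ids)
n_over = count_overexpressed(folds, cancer_ids)

print(folds)
print(f"{n_over} of {len(cancer_ids)} cancer samples show >= 3 fold over-expression")
```

The geometric mean is used here simply because the text specifies it; it keeps a single highly expressed housekeeping gene from dominating the normalization factor.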
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSHCGIseg20F forward primer; and HSHCGIseg20R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSHCGIseg20.
Forward primer (SEQ ID NO: 1376): TGCCTACTGATTCATCCACATACA
Reverse primer (SEQ ID NO: 1377): GCATTCCCCGGCTGC
Amplicon (SEQ ID NO: 1378): TGCCTACTGATTCATCCACATACAATTCTCAGCGTATATCCAAATGCAGTCAACATTCCTCTCTCAGAAATACCCACCCACCTCTAACTCTGCATTCATACATTTAGGCTGCAGCCGGGGAATGC
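As a consistency check on the sequences listed above, an amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The short sketch below performs that check for the HSHCGIseg20 primers and amplicon; the helper names are illustrative assumptions, and the sequences are copied from the listing above with line-break spaces removed.

```python
# Illustrative check that a primer pair flanks its amplicon.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


forward_primer = "TGCCTACTGATTCATCCACATACA"   # SEQ ID NO: 1376
reverse_primer = "GCATTCCCCGGCTGC"            # SEQ ID NO: 1377
amplicon = (                                  # SEQ ID NO: 1378
    "TGCCTACTGATTCATCCACATACAATTCTCAGCGTATATCCAAATGCAGTCAACATT"
    "CCTCTCTCAGAAATACCCACCCACCTCTAACTCTGCATTCATACATTTAGGCTGCAG"
    "CCGGGGAATGC"
)

print(primers_flank_amplicon(forward_primer, reverse_primer, amplicon))
```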
Expression of TRIM31 tripartite motif HSHCGI transcripts which are detectable by amplicon as depicted in sequence name HSHCGIseg35 in normal and cancerous colon tissues

Expression of TRIM31 tripartite motif transcripts detectable by or according to seg35, HSHCGIseg35 amplicon (SEQ ID NO: 1381) and HSHCGIseg35F (SEQ ID NO: 1379) and HSHCGIseg35R (SEQ ID NO: 1380) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323; PBGD amplicon, SEQ ID NO:531), HPRT1 (GenBank Accession No. NM_000194; HPRT1 amplicon, SEQ ID NO:612), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO:615) and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO:1261), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 1, above, "Tissue samples in testing samples"), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. Figure 70 is a histogram showing over-expression of the above-indicated TRIM31 tripartite motif transcripts in cancerous colon samples relative to the normal samples. The number and percentage of samples that exhibit at least 3 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from Figure 70, the expression of TRIM31 tripartite motif transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 41-45, 49-52, 62-67, 69-71, Table 1, "Tissue samples in testing samples"). Notably, an over-expression of at least 3 fold was found in 8 out of 37 adenocarcinoma samples. Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of TRIM31 tripartite motif transcripts detectable by the above amplicon in colon cancer samples versus the normal tissue samples was determined by T test as 7.56E-03.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSHCGIseg35F forward primer; and HSHCGIseg35R reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSHCGIseg35.
Forward primer (SEQ ID NO: 1379): TAAGTCTACAGGTGGTCAAAATGCTG
Reverse primer (SEQ ID NO: 1380): GGAGCGCCCTCTGTTTCC
Amplicon (SEQ ID NO: 1381): TAAGTCTACAGGTGGTCAAAATGCTGTATCCACCCAATTCCACTAAATGGAATAAATGAATAAATGAATGAATTCATTTATTCCATTTCCTCAGTTCCTCCCCAAATTACACCTCTGCCAGGAAACAGAGGGCGCTCC

It should be noted that for R30650_PEA_2-seg73, no differential expression was observed in one Q-PCR experiment carried out with the colon panel. For HUMCEA_PEA_1 seg 6, no differential expression was observed in one Q-PCR experiment carried out with the colon panel.

Therapeutic applications of splice variants of the present invention

Splice variants described herein (including any polynucleotide, oligonucleotide, polypeptide, peptide or fragments thereof) or antibodies that specifically bind thereto may optionally be used for therapeutic applications, for example to treat the diseases described herein with regard to diagnostic applications thereof. A "variant-treatable" disease refers to any disease that is treatable by using a splice variant of any of the therapeutic proteins according to the present invention. "Treatment" also encompasses prevention, amelioration, elimination and control of the disease and/or pathological condition. The diseases for which such variants may be useful therapeutic agents are described in greater detail below for each of the variants. The variants themselves are described by "cluster" or by gene, as these variants are splice variants of known proteins. Therefore, a "cluster-related disease" or a "variant-related disease" refers to a disease that may be treated by a particular protein, with regard to the description of such diseases below a therapeutic protein variant according to the present invention. The term "biologically active", as used herein, refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule.
Likewise, "immunologically active" refers to the capability of the natural, recombinant, or synthetic ligand, or any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies. The term "modulate", as used herein, refers to a change in the activity of at least one receptor mediated activity. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional or immunological properties of a ligand.
METHODS OF TREATMENT

As mentioned hereinabove, the novel therapeutic protein variants of the present invention and compositions derived therefrom (i.e., peptides, oligonucleotides) can be used to treat cluster-related diseases. Thus, according to an additional aspect of the present invention there is provided a method of treating cluster-related disease in a subject. The subject according to the present invention is a mammal, preferably a human, which has at least one type of the cluster-related diseases described hereinabove. As mentioned hereinabove, the biomolecular sequences of the present invention can be used to treat subjects with the above-described diseases. The subject according to the present invention is a mammal, preferably a human, which is diagnosed with one of the diseases described hereinabove, or alternatively is predisposed to having one of the diseases described hereinabove. As used herein the term "treating" refers to preventing, curing, reversing, attenuating, alleviating, minimizing, suppressing or halting the deleterious effects of the above-described diseases.

Treating, according to the present invention, can be effected by specifically upregulating or alternatively downregulating the expression of at least one of the polypeptides of the present invention in the subject. Optionally, upregulation may be effected by administering to the subject at least one of the polypeptides of the present invention (e.g., recombinant or synthetic) or an active portion thereof, as described herein. However, since the bioavailability of large polypeptides may potentially be relatively small due to high degradation rate and low penetration rate, administration of polypeptides is preferably confined to small peptide fragments (e.g., about 100 amino acids). The polypeptide or peptide may optionally be administered in a pharmaceutical composition, described in more detail below.
It will be appreciated that treatment of the above-described diseases according to the present invention may be combined with other treatment methods known in the art (i.e., combination therapy). Thus, treatment of malignancies using the agents of the present invention may be combined with, for example, radiation therapy, antibody therapy and/or chemotherapy. Alternatively or additionally, an upregulating method may optionally be effected by specifically upregulating the amount (optionally expression) in the subject of at least one of the polypeptides of the present invention or active portions thereof.

As is mentioned hereinabove and in the Examples section which follows, the biomolecular sequences of this aspect of the present invention may be used as valuable therapeutic tools in the treatment of diseases in which altered activity or expression of the wild-type gene product is known to contribute to disease onset or progression. For example, in case a disease is caused by overexpression of a membrane bound receptor, a soluble variant thereof may be used as an antagonist which competes with the receptor for binding the ligand, to thereby terminate signaling from the receptor. Examples of such diseases are listed in the Examples section which follows.

It will be appreciated that the polypeptides of the present invention may also have agonistic properties. These include increasing the stability of the ligand (e.g., IL-4), protection from proteolysis and modification of the pharmacokinetic properties of the ligand (i.e., increasing the half-life of the ligand, while decreasing the clearance thereof). As such, the biomolecular sequences of this aspect of the present invention may be used to treat conditions or diseases in which the wild-type gene product plays a favorable role, for example, increasing angiogenesis in cases of diabetes or ischemia.
Upregulating expression of the therapeutic protein variants of the present invention may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention, ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells), as described above. Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed, and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy). Nucleic acid constructs are described in greater detail above.

It will be appreciated that the present methodology may also be effected by specifically upregulating the expression of the variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220].
For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma. Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes the intact membrane-bound receptor, while the shorter form encodes a secreted soluble non-functional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239.

Upregulating expression of the polypeptides of the present invention in a subject may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells). Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.) and an expression system as needed, and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy).
Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin, which is liver specific [Pinkert et al., (1987) Genes Dev. 1:268-277]; lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740]; neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477]; pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916]; or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Patent Application No. EP 264,166). Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, pZeoSV2 (+/-), pDisplay, pEF/myc/cyto, and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV), and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses.
A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Treatment can preferably be effected by agents which are capable of specifically downregulating expression (or activity) of at least one of the polypeptide variants of the present invention. Downregulating the expression of the therapeutic protein variants of the present invention may be achieved using oligonucleotide agents such as those described in greater detail below.
SiRNA molecules - Small interfering RNA (siRNA) molecules can be used to downregulate expression of the therapeutic protein variants of the present invention. RNA interference is a two-step process. In the first step, which is termed the initiation step, input dsRNA is digested into 21-23 nucleotide (nt) small interfering RNAs (siRNA), probably by the action of Dicer, a member of the RNase III family of dsRNA-specific ribonucleases, which processes (cleaves) dsRNA (introduced directly or via a transgene or a virus) in an ATP-dependent manner. Successive cleavage events degrade the RNA to 19-21 bp duplexes (siRNA), each with 2-nucleotide 3' overhangs [Hutvagner and Zamore Curr. Opin.
Genetics and Development 12:225-232 (2002); and Bernstein Nature 409:363-366 (2001)]. In the effector step, the siRNA duplexes bind to a nuclease complex to form the RNA-induced silencing complex (RISC). An ATP-dependent unwinding of the siRNA duplex is required for activation of the RISC. The active RISC then targets the homologous transcript by base pairing interactions and cleaves the mRNA into 12 nucleotide fragments from the 3' terminus of the siRNA [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); and Sharp Genes. Dev. 15:485-90 (2001)]. Although the mechanism of cleavage is still to be elucidated, research indicates that each RISC contains a single siRNA and an RNase [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. Because of the remarkable potency of RNAi, an amplification step within the RNAi pathway has been suggested. Amplification could occur by copying of the input dsRNAs, which would generate more siRNAs, or by replication of the siRNAs formed. Alternatively or additionally, amplification could be effected by multiple turnover events of the RISC [Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. For more information on RNAi see the following reviews: Tuschl ChemBiochem. 2:239-245 (2001); Cullen Nat. Immunol. 3:597-599 (2002); and Brantl Biochem. Biophys. Act. 1575:15-25 (2002). Synthesis of RNAi molecules suitable for use with the present invention can be effected as follows. First, the mRNA sequence is scanned downstream of the AUG start codon for AA dinucleotide sequences. Occurrence of each AA and the 3' adjacent 19 nucleotides is recorded as a potential siRNA target site. Preferably, siRNA target sites are selected from the open reading frame, as untranslated regions (UTRs) are richer in regulatory protein binding sites.
UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNA endonuclease complex [Tuschl ChemBiochem. 2:239-245]. It will be appreciated, though, that siRNAs directed at untranslated regions may also be effective, as demonstrated for GAPDH, wherein siRNA directed at the 5' UTR mediated about a 90 % decrease in cellular GAPDH mRNA and completely abolished the protein level (www.ambion.com/techlib/tn/91/912.html). Second, potential target sites are compared to an appropriate genomic database (e.g., human, mouse, rat etc.) using any sequence alignment software, such as the BLAST software available from the NCBI server (www.ncbi.nlm.nih.gov/BLAST/). Putative target sites which exhibit significant homology to other coding sequences are filtered out. Qualifying target sequences are selected as templates for siRNA synthesis. Preferred sequences are those including low G/C content, as these have proven to be more effective in mediating gene silencing as compared to those with G/C content higher than 55 %. Several target sites are preferably selected along the length of the target gene for evaluation. Target sites are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. For better evaluation of the selected siRNAs, a negative control is preferably used in conjunction. Negative control siRNAs preferably include the same nucleotide composition as the siRNAs but lack significant homology to the genome. Thus, a scrambled nucleotide sequence of the siRNA is preferably used, provided it does not display any significant homology to any other gene.
DNAzyme molecules - Another agent capable of downregulating expression of the polypeptides of the present invention is a DNAzyme molecule capable of specifically cleaving an mRNA transcript or DNA sequence of the polynucleotides of the present invention.
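By way of a non-limiting illustration of the siRNA target-site selection steps described above (scanning downstream of the AUG start codon for AA dinucleotides, recording each AA with its 3' adjacent 19 nucleotides, preferring G/C content not higher than 55 %, and preparing a scrambled negative control), a minimal Python sketch is given here. The function names, the `start_codon_index` parameter and the fixed random seed are hypothetical choices made for this example only. The homology filtering step (e.g., BLAST against a genomic database) is not implemented and would still have to be applied both to the candidate sites and to the scrambled control.

```python
import random


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def find_sirna_targets(mrna, start_codon_index, max_gc=0.55):
    """Scan downstream of the AUG start codon for AA + 19 nt candidate sites.

    mrna              -- mRNA (or cDNA) sequence; T is converted to U
    start_codon_index -- 0-based index of the A of the AUG (assumed known)
    max_gc            -- upper limit on G/C fraction of the 19-nt portion
    Returns a list of (index, 21-nt site, G/C fraction) tuples.
    """
    seq = mrna.upper().replace("T", "U")
    targets = []
    # Start just after the AUG; stop where a full 21-nt window no longer fits.
    for i in range(start_codon_index + 3, len(seq) - 20):
        if seq[i:i + 2] == "AA":
            site = seq[i:i + 21]          # AA plus the 3' adjacent 19 nt
            gc = gc_fraction(site[2:])    # G/C content of the 19-nt portion
            if gc <= max_gc:
                targets.append((i, site, gc))
    return targets


def scrambled_control(site, seed=0):
    """Shuffle a site to obtain a control with the same nucleotide composition.

    The result must still be checked for lack of homology to other genes.
    """
    rng = random.Random(seed)
    bases = list(site)
    rng.shuffle(bases)
    return "".join(bases)
```

In this sketch the whole sequence downstream of the start codon is scanned; restricting the scan to the open reading frame, as preferred above, would only require passing a sequence truncated at the stop codon.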
DNAzymes are single-stranded polynucleotides which are capable of cleaving both single and double stranded target sequences (Breaker, R.R. and Joyce, G. Chemistry and Biology 1995;2:655; Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 1997;94:4262). A general model (the "10-23" model) for the DNAzyme has been proposed. "10-23" DNAzymes have a catalytic domain of 15 deoxyribonucleotides, flanked by two substrate-recognition domains of seven to nine deoxyribonucleotides each. This type of DNAzyme can effectively cleave its substrate RNA at purine:pyrimidine junctions (Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 199; for a review of DNAzymes see Khachigian, LM [Curr Opin Mol Ther 4:119-21 (2002)]). Target sites for DNAzymes are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. Examples of construction and amplification of synthetic, engineered DNAzymes recognizing single and double-stranded target cleavage sites have been disclosed in U.S. Pat. No. 6,326,174 to Joyce et al. DNAzymes of similar design directed against the human Urokinase receptor were recently observed to inhibit Urokinase receptor expression, and successfully inhibit colon cancer cell metastasis in vivo (Itoh et al., 2002, Abstract 409, Ann Meeting Am Soc Gen Ther www.asgt.org). In another application, DNAzymes complementary to bcr-abl oncogenes were successful in inhibiting expression of the oncogenes in leukemia cells, and in lessening relapse rates in autologous bone marrow transplant in cases of CML and ALL.
Antisense molecules - Downregulation of the polynucleotides of the present invention can also be effected by using an antisense polynucleotide capable of specifically hybridizing with an mRNA transcript encoding the polypeptide variants of the present invention.
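With regard to the "10-23" DNAzyme model described above, the layout of a 15-deoxyribonucleotide catalytic domain flanked by two substrate-recognition domains of seven to nine deoxyribonucleotides, positioned at a purine:pyrimidine junction of the target RNA, can be sketched as follows. This is a minimal, non-authoritative example. The catalytic core sequence used (GGCTAGCTACAACGA) is the commonly cited 10-23 core and is not recited in the present text, and the function names, the equal arm lengths and the convention of leaving the purine unpaired are assumptions made for illustration.

```python
# Commonly cited 10-23 catalytic core (15 nt); an assumption, not taken
# from the surrounding text.
CORE_10_23 = "GGCTAGCTACAACGA"

PURINES = set("AG")
PYRIMIDINES = set("CU")
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_reverse_complement(rna):
    """Return the DNA strand (5'->3') complementary to an RNA segment."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def design_10_23_dnazymes(target_rna, arm_length=8):
    """List candidate 10-23 DNAzymes for purine:pyrimidine junctions.

    Returns (purine_index, dnazyme_sequence) tuples. Each DNAzyme is
    5'-[arm pairing with the pyrimidine and downstream bases]-[core]-
    [arm pairing with the bases upstream of the purine]-3'.
    """
    if not 7 <= arm_length <= 9:
        raise ValueError("arm_length must be between 7 and 9")
    rna = target_rna.upper().replace("T", "U")
    designs = []
    # Only junctions with a full arm available on both sides are considered.
    for i in range(arm_length, len(rna) - arm_length):
        if rna[i] in PURINES and rna[i + 1] in PYRIMIDINES:
            upstream = rna[i - arm_length:i]            # 5' of the purine
            downstream = rna[i + 1:i + 1 + arm_length]  # pyrimidine onward
            dnazyme = (dna_reverse_complement(downstream)
                       + CORE_10_23
                       + dna_reverse_complement(upstream))
            designs.append((i, dnazyme))
    return designs
```

Candidate junctions found this way would still need to be restricted to the unique nucleotide sequences of the target polynucleotide, as stated above, and evaluated experimentally.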
The term "antisense", as used herein, refers to any composition containing nucleotide sequences which are complementary to a specific DNA or RNA sequence. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. Antisense molecules also include peptide nucleic acids and may be produced by any method including synthesis or transcription. Once introduced into a cell, the complementary nucleotides combine with natural sequences produced by the cell to form duplexes and block either transcription or translation. The designation "negative" is sometimes used in reference to the antisense strand, and "positive" is sometimes used in reference to the sense strand. Antisense oligonucleotides are also used for modulation of alternative splicing in vivo and for diagnostics in vivo and in vitro (Khelifi C. et al., 2002, Current Pharmaceutical Design 8:451-1466; Sazani, P., and Kole, R. Progress in Molecular and Subcellular Biology, 2003, 31:217-239). Design of antisense molecules which can be used to efficiently downregulate expression of the polypeptides of the present invention must be effected while considering two aspects important to the antisense approach. The first aspect is delivery of the oligonucleotide into the cytoplasm of the appropriate cells, while the second aspect is design of an oligonucleotide which specifically binds the designated mRNA within cells in a way which inhibits translation thereof. The prior art teaches a number of delivery strategies which can be used to efficiently deliver oligonucleotides into a wide variety of cell types [see, for example, Luft J Mol Med 76: 75-6 (1998); Kronenwett et al. Blood 91: 852-62 (1998); Rajur et al. Bioconjug Chem 8: 935-40 (1997); Lavigne et al. Biochem Biophys Res Commun 237: 566-71 (1997) and Aoki et al. Biochem Biophys Res Commun 231: 540-5 (1997)].
In addition, algorithms for identifying those sequences with the highest predicted binding affinity for their target mRNA, based on a thermodynamic cycle that accounts for the energetics of structural alterations in both the target mRNA and the oligonucleotide, are also available [see, for example, Walton et al. Biotechnol Bioeng 65: 1-9 (1999)]. Such algorithms have been successfully used to implement an antisense approach in cells. For example, the algorithm developed by Walton et al. enabled scientists to successfully design antisense oligonucleotides for rabbit beta-globin (RBG) and mouse tumor necrosis factor-alpha (TNF alpha) transcripts. The same research group has more recently reported that the antisense activity of rationally selected oligonucleotides against three model target mRNAs (human lactate dehydrogenase A and B and rat gp130) in cell culture, as evaluated by a kinetic PCR technique, proved effective in almost all cases, including tests against three different targets in two cell types with phosphodiester and phosphorothioate oligonucleotide chemistries. In addition, several approaches for designing and predicting efficiency of specific oligonucleotides using an in vitro system were also published [Matveeva et al., Nature Biotechnology 16: 1374-1375 (1998)]. Several clinical trials have demonstrated safety, feasibility and activity of antisense oligonucleotides. For example, antisense oligonucleotides suitable for the treatment of cancer have been successfully used [Holmlund et al., Curr Opin Mol Ther 1:372-85 (1999)], while treatment of hematological malignancies via antisense oligonucleotides targeting the c-myb gene, p53 and Bcl-2 had entered clinical trials and had been shown to be tolerated by patients [Gewirtz Curr Opin Mol Ther 1:297-306 (1999)].
More recently, antisense-mediated suppression of human heparanase gene expression has been reported to inhibit pleural dissemination of human cancer cells in a mouse model [Uno et al., Cancer Res 61:7855-60 (2001)]. Thus, the current consensus is that recent developments in the field of antisense technology which, as described above, have led to the generation of highly accurate antisense design algorithms and a wide variety of oligonucleotide delivery systems, enable an ordinarily skilled artisan to design and implement antisense approaches suitable for downregulating expression of known sequences without having to resort to undue trial and error experimentation. Target sites for antisense molecules are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated.
Ribozymes - Another agent capable of downregulating expression of the polypeptides of the present invention is a ribozyme molecule capable of specifically cleaving an mRNA transcript encoding the polypeptide variants of the present invention. Ribozymes are being increasingly used for the sequence-specific inhibition of gene expression by the cleavage of mRNAs encoding proteins of interest [Welch et al., Curr Opin Biotechnol. 9:486-96 (1998)]. The possibility of designing ribozymes to cleave any specific target RNA has rendered them valuable tools in both basic research and therapeutic applications. In the therapeutics area, ribozymes have been exploited to target viral RNAs in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders [Welch et al., Clin Diagn Virol. 10:163-71 (1998)]. Most notably, several ribozyme gene therapy protocols for HIV patients are already in Phase 1 trials. More recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. Several ribozymes are in various stages of clinical trials.
ANGIOZYME was the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other firms, have demonstrated the importance of anti-angiogenesis therapeutics in animal models. HEPTAZYME, a ribozyme designed to selectively destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, Incorporated - WEB home page). Alternatively, downregulation of the polypeptide variants of the present invention may be achieved at the polypeptide level using downregulating agents such as antibodies or antibody fragments capable of specifically binding the polypeptides of the present invention and inhibiting the activity thereof (i.e., neutralizing antibodies). Such antibodies can be directed, for example, to the heterodimerizing domain on the variant, or to a putative ligand binding domain. Further description of antibodies and methods of generating same is provided below.
PHARMACEUTICAL COMPOSITIONS AND DELIVERY THEREOF
The present invention features a pharmaceutical composition comprising a therapeutically effective amount of a therapeutic agent according to the present invention, which is preferably a therapeutic protein variant as described herein. Optionally and alternatively, the therapeutic agent could be an antibody or an oligonucleotide that specifically recognizes and binds to the therapeutic protein variant, but not to the corresponding full length known protein. Alternatively, the pharmaceutical composition of the present invention includes a therapeutically effective amount of at least an active portion of a therapeutic protein variant polypeptide. The pharmaceutical composition according to the present invention is preferably used for the treatment of cluster-related diseases. "Treatment" refers to both therapeutic treatment and prophylactic or preventative measures. Those in need of treatment include those already with the disorder as well as those in which the disorder is to be prevented. Hence, the mammal to be treated herein may have been diagnosed as having the disorder or may be predisposed or susceptible to the disorder. "Mammal" for purposes of treatment refers to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. Preferably, the mammal is human. A "disorder" is any condition that would benefit from treatment with the agent according to the present invention. This includes chronic and acute disorders or diseases including those pathological conditions which predispose the mammal to the disorder in question. Non-limiting examples of disorders to be treated herein are described with regard to specific examples given herein. The term "therapeutically effective amount" refers to an amount of agent according to the present invention that is effective to treat a disease or disorder in a mammal.
In the case of cancer, the therapeutically effective amount of the agent may reduce the number of cancer cells; reduce the tumor size; inhibit (i.e., slow to some extent and preferably stop) cancer cell infiltration into peripheral organs; inhibit (i.e., slow to some extent and preferably stop) tumor metastasis; inhibit, to some extent, tumor growth; and/or relieve to some extent one or more of the symptoms associated with the cancer. To the extent the agent may prevent growth and/or kill existing cancer cells, it may be cytostatic and/or cytotoxic. For cancer therapy, efficacy can, for example, be measured by assessing the time to disease progression (TTP) and/or determining the response rate (RR). The therapeutic agents of the present invention can be provided to the subject per se, or as part of a pharmaceutical composition where they are mixed with a pharmaceutically acceptable carrier. As used herein a "pharmaceutical composition" refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients. The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism. Herein the term "active ingredient" refers to the preparation accountable for the biological effect. Hereinafter, the phrases "physiologically acceptable carrier" and "pharmaceutically acceptable carrier", which may be used interchangeably, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases. One of the ingredients included in the pharmaceutically acceptable carrier can be, for example, polyethylene glycol (PEG), a biocompatible polymer with a wide range of solubility in both organic and aqueous media (Mutter et al. (1979)).
Herein the term "excipient" refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils and polyethylene glycols. Techniques for formulation and administration of drugs may be found in "Remington's Pharmaceutical Sciences," Mack Publishing Co., Easton, PA, latest edition, which is incorporated herein by reference. Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal or parenteral delivery, including intramuscular, subcutaneous and intramedullary injections as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections. Alternately, one may administer a preparation in a local rather than systemic manner, for example, via injection of the preparation directly into a specific region of a patient's body. Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes. Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients into preparations which can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen. For injection, the active ingredients of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art. For oral administration, the compounds can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art.
Such carriers enable the compounds of the invention to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient. Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries if desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carbomethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate. Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses. Pharmaceutical compositions which can be used orally include push-fit capsules made of gelatin as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate and, optionally, stabilizers.
In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration. For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner. For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichloro-tetrafluoroethane or carbon dioxide. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch. The preparations described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water based injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters such as ethyl oleate, triglycerides or liposomes.
Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol or dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the active ingredients to allow for the preparation of highly concentrated solutions. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water based solution, before use. The preparation of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, e.g., conventional suppository bases such as cocoa butter or other glycerides. Pharmaceutical compositions suitable for use in the context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a therapeutically effective amount means an amount of active ingredients effective to prevent, alleviate or ameliorate symptoms of disease or prolong the survival of the subject being treated. Determination of a therapeutically effective amount is well within the capability of those skilled in the art. For any preparation used in the methods of the invention, the therapeutically effective amount or dose can be estimated initially from in vitro assays. For example, a dose can be formulated in animal models and such information can be used to more accurately determine useful doses in humans. Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage may vary depending upon the dosage form employed and the route of administration utilized.
The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition. (See e.g., Fingl, et al., 1975, in "The Pharmacological Basis of Therapeutics", Ch. 1 p.1). Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with a course of treatment lasting from several days to several weeks or until cure is effected or diminution of the disease state is achieved. The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc. Compositions including the preparation of the present invention formulated in a compatible pharmaceutical carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition. Pharmaceutical compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration. The pack or dispenser may also be accommodated by a notice associated with the container in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions or human or veterinary administration. Such notice, for example, may be of labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert.
IMMUNOGENIC COMPOSITIONS
A therapeutic agent according to the present invention may optionally be a molecule which promotes a specific immunogenic response against at least one of the polypeptides of the present invention in the subject. The molecule can be polypeptide variants of the present invention, a fragment derived therefrom or a nucleic acid sequence encoding thereof. Although such a molecule can be provided to the subject per se, the agent is preferably administered with an immunostimulant in an immunogenic composition. An immunostimulant may be any substance that enhances or potentiates an immune response (antibody and/or cell-mediated) to an exogenous antigen. Examples of immunostimulants include adjuvants, biodegradable microspheres (e.g., polylactic galactide) and liposomes into which the compound is incorporated (see e.g., U.S. Pat. No. 4,235,877). Vaccine preparation is generally described in, for example, M. F. Powell and M. J. Newman, eds., "Vaccine Design (the subunit and adjuvant approach)," Plenum Press (NY, 1995). Illustrative immunogenic compositions may contain DNA encoding one or more of the polypeptides as described above, such that the polypeptide is generated in situ. The DNA may be present within any of a variety of delivery systems known to those of ordinary skill in the art, including nucleic acid expression systems (see below), bacteria and viral expression systems. Numerous gene delivery techniques are well known in the art, such as those described by Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998, and references cited therein.
Appropriate nucleic acid expression systems contain the necessary DNA sequences for expression in the subject (such as a suitable promoter and terminating signal). Bacterial delivery systems involve the administration of a bacterium (such as Bacillus Calmette-Guérin) that expresses an immunogenic portion of the polypeptide on its cell surface or secretes such an epitope. In a preferred embodiment, the DNA may be introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, or adenovirus), which may involve the use of a non-pathogenic (defective), replication competent virus. Suitable systems are disclosed, for example, in Fisher-Hoch et al., Proc. Natl. Acad. Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103, 1989; Flexner et al., Vaccine 8:17-21, 1990; U.S. Pat. Nos. 4,603,112, 4,769,330, and 5,017,487; WO 89/01973; U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345,242; WO 91/02805; Berkner, Biotechniques 6:616-627, 1988; Rosenfeld et al., Science 252:431-434, 1991; Kolls et al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass-Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, 1993; Guzman et al., Circulation 88:2838-2848, 1993; and Guzman et al., Cir. Res. 73:1202-1207, 1993. Techniques for incorporating DNA into such expression systems are well known to those of ordinary skill in the art. The DNA may also be "naked," as described, for example, in Ulmer et al., Science 259:1745-1749, 1993 and reviewed by Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may be increased by coating the DNA onto biodegradable beads, which are efficiently transported into the cells. It will be appreciated that an immunogenic composition may comprise both a polynucleotide and a polypeptide component. Such immunogenic compositions may provide for an enhanced immune response. Any of a variety of immunostimulants may be employed in the immunogenic compositions of this invention. For example, an adjuvant may be included.
Most adjuvants contain a substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or Mycobacterium tuberculosis derived proteins. Suitable adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, Mich.); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, N.J.); AS-2 (SmithKline Beecham, Philadelphia, Pa.); aluminum salts such as aluminum hydroxide gel (alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A. Cytokines, such as GM-CSF or interleukin-2, -7 or -12, may also be used as adjuvants. The adjuvant composition may be designed to induce an immune response predominantly of the Th1 type. High levels of Th1-type cytokines (e.g., IFN-γ, TNF-α, IL-2 and IL-12) tend to favor the induction of cell mediated immune responses to an administered antigen. In contrast, high levels of Th2-type cytokines (e.g., IL-4, IL-5, IL-6 and IL-10) tend to favor the induction of humoral immune responses. Following application of an immunogenic composition as provided herein, the subject will support an immune response that includes Th1- and Th2-type responses. The levels of these cytokines may be readily assessed using standard assays. For a review of the families of cytokines, see Mosmann and Coffman, Ann. Rev. Immunol. 7:145-173, 1989. Preferred adjuvants for use in eliciting a predominantly Th1-type response include, for example, a combination of monophosphoryl lipid A, preferably 3-de-O-acylated monophosphoryl lipid A (3D-MPL), together with an aluminum salt. MPL adjuvants are available from Corixa Corporation (Seattle, Wash.; see U.S. Pat. Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094). CpG-containing oligonucleotides (in which the CpG dinucleotide is unmethylated) also induce a predominantly Th1 response. Such oligonucleotides are well known and are described, for example, in WO 96/02555, WO 99/33488 and U.S. Pat. Nos. 6,008,200 and 5,856,462.
Immunostimulatory DNA sequences are also described, for example, by Sato et al., Science 273:352, 1996. Another preferred adjuvant is a saponin, preferably QS21 (Aquila Biopharmaceuticals Inc., Framingham, Mass.), which may be used alone or in combination with other adjuvants. For example, an enhanced system involves the combination of a monophosphoryl lipid A and saponin derivative, such as the combination of QS21 and 3D-MPL as described in WO 94/00153, or a less reactogenic composition where the QS21 is quenched with cholesterol, as described in WO 96/33739. Other preferred formulations comprise an oil-in-water emulsion and tocopherol. A particularly potent adjuvant formulation involving QS21, 3D-MPL and tocopherol in an oil-in-water emulsion is described in WO 95/17210. Other preferred adjuvants include Montanide ISA 720 (Seppic, France), SAF (Chiron, Calif., United States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of adjuvants (e.g., SBAS-2 or SBAS-4, available from SmithKline Beecham, Rixensart, Belgium), Detox (Corixa, Hamilton, Mont.), RC-529 (Corixa, Hamilton, Mont.) and other aminoalkyl glucosaminide 4-phosphates (AGPs), such as those described in pending U.S. patent application Ser. Nos. 08/853,826 and 09/074,720. A delivery vehicle may be employed within the immunogenic composition of the present invention to facilitate production of an antigen-specific immune response that targets tumor cells. Delivery vehicles include antigen presenting cells (APCs), such as dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered to be efficient APCs. Such cells may be genetically modified to increase the capacity for presenting the antigen, to improve activation and/or maintenance of the T cell response, to have anti-tumor effects per se and/or to be immunologically compatible with the receiver (i.e., matched HLA haplotype).
APCs may generally be isolated from any of a variety of biological fluids and organs, including tumor and peritumoral tissues, and may be autologous, allogeneic, syngeneic or xenogeneic cells. Dendritic cells are highly potent APCs (Banchereau and Steinman, Nature 392:245-251, 1998) and have been shown to be effective as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor immunity (see Timmerman and Levy, Ann. Rev. Med. 50:507-529, 1999). In general, dendritic cells may be identified based on their typical shape (stellate in situ, with marked cytoplasmic processes (dendrites) visible in vitro), their ability to take up, process and present antigens with high efficiency and their ability to activate naive T cell responses. Dendritic cells may, of course, be engineered to express specific cell-surface receptors or ligands that are not commonly found on dendritic cells in vivo or ex vivo, and such modified dendritic cells are contemplated by the present invention. As an alternative to dendritic cells, vesicles secreted by antigen-loaded dendritic cells (called exosomes) may be used within an immunogenic composition (see Zitvogel et al., Nature Med. 4:594-600, 1998). Dendritic cells and progenitors may be obtained from peripheral blood, bone marrow, tumor-infiltrating cells, peritumoral tissues-infiltrating cells, lymph nodes, spleen, skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, IL-13 and/or TNF-α to cultures of monocytes harvested from peripheral blood. Alternatively, CD34 positive cells harvested from peripheral blood, umbilical cord blood or bone marrow may be differentiated into dendritic cells by adding to the culture medium combinations of GM-CSF, IL-3, TNF-α, CD40 ligand, LPS, flt3 ligand and/or other compound(s) that induce differentiation, maturation and proliferation of dendritic cells.
Dendritic cells are categorized as "immature" and "mature" cells, which allows a simple way to discriminate between two well characterized phenotypes. Immature dendritic cells are characterized as APC with a high capacity for antigen uptake and processing, which correlates with the high expression of Fcγ receptor and mannose receptor. The mature phenotype is typically characterized by a lower expression of these markers, but a high expression of cell surface molecules responsible for T cell activation such as class I and class II MHC, adhesion molecules (e.g., CD54 and CD11) and costimulatory molecules (e.g., CD40, CD80, CD86 and 4-1BB). APCs may generally be transfected with at least one polynucleotide encoding a polypeptide of the present invention, such that variant II, or an immunogenic portion thereof, is expressed on the cell surface. Such transfection may take place ex vivo, and a composition comprising such transfected cells may then be used for therapeutic purposes, as described herein. Alternatively, a gene delivery vehicle that targets a dendritic or other antigen presenting cell may be administered to the subject, resulting in transfection that occurs in vivo. In vivo and ex vivo transfection of dendritic cells, for example, may generally be performed using any methods known in the art, such as those described in WO 97/24447, or the gene gun approach described by Mahvi et al., Immunology and Cell Biology 75:456-460, 1997. Antigen loading of dendritic cells may be achieved by incubating dendritic cells or progenitor cells with a polypeptide of the present invention, DNA (naked or within a plasmid vector) or RNA; or with antigen-expressing recombinant bacterium or viruses (e.g., vaccinia, fowlpox, adenovirus or lentivirus vectors). Prior to loading, the polypeptide may be covalently conjugated to an immunological partner that provides T cell help (e.g., a carrier molecule) such as described above. Alternatively, a dendritic cell may be pulsed with a non-conjugated immunological partner, separately or in the presence of the polypeptide.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

Claims

WHAT IS CLAIMED IS: 1. An isolated polynucleotide comprising a polynucleotide having a sequence of R11723_PEA_1_T5.
2. The isolated polynucleotide of claim 1, comprising a node having a sequence of: R11723_PEA_1_node_13.
3. An isolated polypeptide comprising a polypeptide having a sequence of: R11723_PEA_1_P13.
4. The isolated polypeptide of claim 3, comprising a chimeric polypeptide encoding for R11723_PEA_1_P13, comprising a first amino acid sequence being at least 95% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFINNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1 - 63 of Q96AC2, which also corresponds to amino acids 1 - 63 of R11723_PEA_1_P13, and a second amino acid sequence being at least about 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT corresponding to amino acids 64 - 84 of R11723_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
4. The isolated polypeptide of claim 4, comprising a tail of R11723_PEA_1_P13, comprising a polypeptide being at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT in R11723_PEA_1_P13.
5. The isolated oligonucleotide of claim 1, comprising an amplicon according to SEQ ID NO: 1297.
6. A primer pair, comprising a pair of isolated oligonucleotides capable of amplifying said amplicon of claim 5.
7. The primer pair of claim 6, comprising a pair of isolated oligonucleotides: SEQ ID NOs: 1295 and 1296.
8. An antibody capable of specifically binding to an epitope of an amino acid sequence of claim 3.
9. The antibody of claim 8, wherein said amino acid sequence comprises said tail of claim 4.
10. The antibody of claim 8, wherein said antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein PSEC.
11. A kit for detecting colon cancer, comprising a kit detecting overexpression of a splice variant according to claim 1.
12. The kit of claim 11, wherein said kit comprises a NAT-based technology.
13. The kit of claim 11, wherein said kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence according to claim 1.
14. The kit of claim 11, wherein said kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence according to claim 1.
12. A kit for detecting colon cancer, comprising a kit detecting overexpression of a splice variant according to claim 3, said kit comprising an antibody according to claim 8.
13. The kit of claim 12, wherein said kit further comprises at least one reagent for performing an ELISA or a Western blot.
14. A method for detecting colon cancer, comprising detecting overexpression of a splice variant according to claim 1.
15. The method of claim 14, wherein said detecting overexpression is performed with a NAT-based technology.
16. A method for detecting colon cancer, comprising detecting overexpression of a splice variant according to claim 3, wherein said detecting overexpression is performed with an immunoassay.
17. The method of claim 16, wherein said immunoassay comprises an antibody according to claim 8.
18. A biomarker capable of detecting colon cancer, comprising a nucleic acid sequence according to claim 1 or a fragment thereof, or an amino acid sequence according to claim 3 or a fragment thereof.
19. A method for screening for colon cancer, comprising detecting colon cancer cells with a biomarker according to claim 18.
20. A method for diagnosing colon cancer, comprising detecting colon cancer cells with a biomarker according to claim 18.
21. A method for monitoring disease progression and/or treatment efficacy and/or relapse of colon cancer, comprising detecting colon cancer cells with a biomarker according to claim 18.
22. A method of selecting a therapy for colon cancer, comprising detecting colon cancer cells with a biomarker according to claim 18 and selecting a therapy according to said detection.
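As an illustration only, the segment arithmetic recited in claim 4 (a 63-residue first sequence followed contiguously by a 21-residue tail, giving an 84-residue chimera) and a 95% threshold can be sketched in Python. This is a minimal sketch under stated assumptions: it uses simple position-by-position identity over equal-length, ungapped sequences as a stand-in for "homologous", which is not necessarily how the specification determines homology, and the names `FIRST_SEGMENT`, `TAIL_SEGMENT`, `percent_identity`, `meets_threshold` and `build_chimera` are hypothetical and do not come from the application.

```python
# Sequences copied from claim 4; all helper names are hypothetical.
FIRST_SEGMENT = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFINNCTVNVQDMCQKEVMEQSA"
TAIL_SEGMENT = "DTKRTNTLLFEMRHFAKQLTT"


def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions with identical residues (ungapped, equal length).

    Assumption: plain identity is used as a simplified proxy for homology.
    """
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, reference: str, threshold: float = 95.0) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


def build_chimera() -> str:
    """Join the two segments contiguously and in sequential order."""
    return FIRST_SEGMENT + TAIL_SEGMENT


if __name__ == "__main__":
    chimera = build_chimera()
    # Residues 1-63 are the first segment; residues 64-84 are the tail.
    print(len(FIRST_SEGMENT), len(TAIL_SEGMENT), len(chimera))
    print(chimera[63:84] == TAIL_SEGMENT)
```

Under this simplified measure, a single substitution in the 21-residue tail still leaves 20 of 21 positions identical (about 95.2%), whereas two substitutions fall below 95%.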
PCT/IB2005/000928 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer WO2005072053A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP05718397A EP1749025A2 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
AU2005207883A AU2005207883A1 (en) 2005-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
CA002554623A CA2554623A1 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer

Applications Claiming Priority (57)

Application Number Priority Date Filing Date Title
US53912804P 2004-01-27 2004-01-27
US53912904P 2004-01-27 2004-01-27
US60/539,129 2004-01-27
US60/539,128 2004-01-27
US62097504P 2004-10-22 2004-10-22
US62097404P 2004-10-22 2004-10-22
US62065604P 2004-10-22 2004-10-22
US62091604P 2004-10-22 2004-10-22
US62092404P 2004-10-22 2004-10-22
US62091804P 2004-10-22 2004-10-22
US62091704P 2004-10-22 2004-10-22
US62087404P 2004-10-22 2004-10-22
US62085304P 2004-10-22 2004-10-22
US60/620,916 2004-10-22
US60/620,975 2004-10-22
US60/620,656 2004-10-22
US60/620,874 2004-10-22
US60/620,974 2004-10-22
US60/620,917 2004-10-22
US60/620,924 2004-10-22
US60/620,853 2004-10-22
US60/620,918 2004-10-22
US62105304P 2004-10-25 2004-10-25
US62113104P 2004-10-25 2004-10-25
US60/621,131 2004-10-25
US60/621,053 2004-10-25
US62232004P 2004-10-27 2004-10-27
US60/622,320 2004-10-27
US62812304P 2004-11-17 2004-11-17
US62811204P 2004-11-17 2004-11-17
US62819004P 2004-11-17 2004-11-17
US62810104P 2004-11-17 2004-11-17
US62814504P 2004-11-17 2004-11-17
US62818904P 2004-11-17 2004-11-17
US62817804P 2004-11-17 2004-11-17
US62815604P 2004-11-17 2004-11-17
US62811104P 2004-11-17 2004-11-17
US62813404P 2004-11-17 2004-11-17
US62825104P 2004-11-17 2004-11-17
US62823104P 2004-11-17 2004-11-17
US60/628,190 2004-11-17
US60/628,156 2004-11-17
US60/628,178 2004-11-17
US60/628,251 2004-11-17
US60/628,134 2004-11-17
US60/628,189 2004-11-17
US60/628,145 2004-11-17
US60/628,231 2004-11-17
US60/628,111 2004-11-17
US60/628,101 2004-11-17
US60/628,112 2004-11-17
US60/628,123 2004-11-17
US63055904P 2004-11-26 2004-11-26
US60/630,559 2004-11-26
US63407504P 2004-12-08 2004-12-08
US11/043,788 US20060014166A1 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of endometriosis
USNONE 2005-06-20

Publications (2)

Publication Number Publication Date
WO2005072053A2 true WO2005072053A2 (en) 2005-08-11
WO2005072053A9 WO2005072053A9 (en) 2010-03-04

Family

ID=37637446

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/000928 WO2005072053A2 (en) 2004-01-27 2005-01-27 Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer

Country Status (2)

Country Link
EP (1) EP1749025A2 (en)
WO (1) WO2005072053A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006131928A2 (en) * 2005-06-08 2006-12-14 Compugen Ltd. Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
US20160279214A1 (en) * 2015-03-27 2016-09-29 Immatics Biotechnologies Gmbh Novel peptides and combination of peptides for use in immunotherapy against various tumors
US9920123B2 (en) 2008-12-09 2018-03-20 Genentech, Inc. Anti-PD-L1 antibodies, compositions and articles of manufacture
US10745460B2 (en) 2015-03-27 2020-08-18 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006131928A2 (en) * 2005-06-08 2006-12-14 Compugen Ltd. Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
WO2006131928A3 (en) * 2005-06-08 2007-06-07 Compugen Ltd Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
US9920123B2 (en) 2008-12-09 2018-03-20 Genentech, Inc. Anti-PD-L1 antibodies, compositions and articles of manufacture
US20160279214A1 (en) * 2015-03-27 2016-09-29 Immatics Biotechnologies Gmbh Novel peptides and combination of peptides for use in immunotherapy against various tumors
US9802997B2 (en) 2015-03-27 2017-10-31 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9840548B2 (en) 2015-03-27 2017-12-12 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9862756B2 (en) 2015-03-27 2018-01-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9932384B2 (en) 2015-03-27 2018-04-03 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9951119B2 (en) 2015-03-27 2018-04-24 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9982030B2 (en) 2015-03-27 2018-05-29 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9982031B2 (en) 2015-03-27 2018-05-29 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9988432B2 (en) 2015-03-27 2018-06-05 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US9994628B2 (en) 2015-03-27 2018-06-12 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10000547B2 (en) 2015-03-27 2018-06-19 immatics biotechnology GmbH Peptides and combination of peptides for use in immunotherapy against various tumors
US10005828B2 (en) 2015-03-27 2018-06-26 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10059755B2 (en) 2015-03-27 2018-08-28 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10066003B1 (en) 2015-03-27 2018-09-04 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10072063B2 (en) 2015-03-27 2018-09-11 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10081665B2 (en) 2015-03-27 2018-09-25 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10081664B2 (en) 2015-03-27 2018-09-25 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10093715B2 (en) 2015-03-27 2018-10-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10106593B2 (en) 2015-03-27 2018-10-23 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10106594B2 (en) 2015-03-27 2018-10-23 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10131703B2 (en) 2015-03-27 2018-11-20 Inmatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10138288B2 (en) 2015-03-27 2018-11-27 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10155801B1 (en) 2015-03-27 2018-12-18 immatics biotechnology GmbH Peptides and combination of peptides for use in immunotherapy against various tumors
US10183982B2 (en) 2015-03-27 2019-01-22 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10202436B2 (en) 2015-03-27 2019-02-12 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10370429B2 (en) 2015-03-27 2019-08-06 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10450362B2 (en) 2015-03-27 2019-10-22 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10479823B2 (en) 2015-03-27 2019-11-19 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10487131B2 (en) 2015-03-27 2019-11-26 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10501522B2 (en) 2015-03-27 2019-12-10 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10519215B2 (en) 2015-03-27 2019-12-31 Immatics Biotechnologies Gmbh RELAXIN1 derived peptides for use in immunotherapy against various tumors
US10723781B2 (en) 2015-03-27 2020-07-28 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10745460B2 (en) 2015-03-27 2020-08-18 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10766944B2 (en) 2015-03-27 2020-09-08 Inmatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10934338B2 (en) 2015-03-27 2021-03-02 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10947293B2 (en) 2015-03-27 2021-03-16 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US10947294B2 (en) 2015-03-27 2021-03-16 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11155597B2 (en) 2015-03-27 2021-10-26 Immatics Biotechnologies Gmbh Relaxin1 derived peptides for use in immunotherapy
US11332512B2 (en) 2015-03-27 2022-05-17 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11365235B2 (en) 2015-03-27 2022-06-21 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11365234B2 (en) 2015-03-27 2022-06-21 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11407808B2 (en) 2015-03-27 2022-08-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11407810B2 (en) 2015-03-27 2022-08-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11407807B2 (en) 2015-03-27 2022-08-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11407809B2 (en) 2015-03-27 2022-08-09 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11434274B2 (en) 2015-03-27 2022-09-06 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11434273B2 (en) 2015-03-27 2022-09-06 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11440947B2 (en) 2015-03-27 2022-09-13 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11459371B2 (en) 2015-03-27 2022-10-04 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11466072B2 (en) 2015-03-27 2022-10-11 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11702460B2 (en) 2015-03-27 2023-07-18 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11873329B2 (en) 2015-03-27 2024-01-16 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11897934B2 (en) 2015-03-27 2024-02-13 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors
US11965013B2 (en) 2015-03-27 2024-04-23 Immatics Biotechnologies Gmbh Peptides and combination of peptides for use in immunotherapy against various tumors

Also Published As

Publication number Publication date
WO2005072053A9 (en) 2010-03-04
EP1749025A2 (en) 2007-02-07

Similar Documents

Publication Publication Date Title
US7368548B2 (en) Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer
EP2740742B1 (en) Fusion gene of kif5b gene and ret gene, and method for determining effectiveness of cancer treatment targeting fusion gene
EP2300041B1 (en) Method for determining risk of recurrence of prostate cancer
US20060046257A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of lung cancer
US20060147946A1 (en) Novel calcium channel variants and methods of use thereof
EP2420575B1 (en) Marker for prognosis of liver cancer
EP1851543A2 (en) Novel diagnostic markers, especially for in vivo imaging, and assays and methods of use thereof
US20060263786A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
KR20060105139A (en) Multiple snp for diagnosing colorectal cancer, microarray and kit comprising the same, and method for diagnosing colorectal cancer using the same
KR20190087106A (en) Biomarkers for predicting the response of anticancer drugs to gastric cancer and their uses
US20090215046A1 (en) Novel nucleotide and amino acid sequences, and assays methods of use thereof for diagnosis of colon cancer
JP2009165473A (en) Cancer
EP1749025A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of colon cancer
EP1774046A2 (en) Novel nucleotide and amino acid sequences and assays and methods of use thereof for diagnosis of lung cancer
WO2005116850A2 (en) Differential expression of markers in ovarian cancer
WO2010061393A1 (en) He4 variant nucleotide and amino acid sequences, and methods of use thereof
US7528243B2 (en) Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer
WO2005072050A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer
KR20210134551A (en) Biomarkers for predicting the recurrence possibility and survival prognosis of papillary renal cell carcinoma and uses thereof
WO2006043271A1 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis
WO2005107364A9 (en) Polynucleotide, polypeptides, and diagnostic methods
WO2006021874A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer
EP1735468A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of prostate cancer
KR102051737B1 (en) Markers for diagnosing gastric marginal zone lymphoma and uses thereof
EP1732943A2 (en) Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of breast cancer

Legal Events

| Date | Code | Title | Description |
| | AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | WWE | WIPO information: entry into national phase | Ref document number: 2554623; Country of ref document: CA |
| | WWE | WIPO information: entry into national phase | Ref document number: 2005207883; Country of ref document: AU |
| | NENP | Non-entry into the national phase in: | Ref country code: DE |
| | WWW | WIPO information: withdrawn in national office | Country of ref document: DE |
| | ENP | Entry into the national phase in: | Ref document number: 2005207883; Country of ref document: AU; Date of ref document: 20050127; Kind code of ref document: A |
| | WWP | WIPO information: published in national office | Ref document number: 2005207883; Country of ref document: AU |
| | WWE | WIPO information: entry into national phase | Ref document number: 2005718397; Country of ref document: EP |
| | WWP | WIPO information: published in national office | Ref document number: 2005718397; Country of ref document: EP |