WO2009002175A1 - A method of typing a sample comprising colorectal cancer cells - Google Patents

A method of typing a sample comprising colorectal cancer cells

Info

Publication number
WO2009002175A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
genes
rna
sample
preferred
Application number
PCT/NL2008/050426
Other languages
French (fr)
Inventor
Iris Simon
Ryan Van Laar
Laura Johanna Van 't Veer
Robertus Alexandre Eduard Mattheus TOLLENAAR
Original Assignee
Agendia B.V.
Application filed by Agendia B.V.
Publication of WO2009002175A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/16 Primer sets for multiplex assays

Definitions

  • the invention relates to the field of oncology. More specifically, the invention relates to a method for typing colorectal cancer cells.
  • the invention provides means and methods for differentiating between colorectal cancer cells with a low metastatic potential and those with a high metastatic potential.
  • stage IV metastatic or locally inoperable primary cancer
  • VEGF vascular endothelial growth factor
  • EGFR epidermal growth factor receptor
  • the invention provides a method for typing an RNA sample of an individual suffering from colorectal cancer or suspected of suffering therefrom, the method comprising providing an RNA sample that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells; determining RNA levels for a set of genes in said RNA sample; and typing said RNA sample on the basis of the RNA levels determined for said set of genes; wherein said set of genes comprises at least five of the genes listed in Table 1 and/or Table 4.
  • Colorectal cancer is a type of cancer that originates in the large intestine or bowel, comprising the colon and the rectum. Colon cancer and rectal cancer have many features in common. The majority of colorectal cancers are adenocarcinomas. These are cancers of the cells that line the inside layer of the wall of the colon and rectum.
  • tumors may also develop in the colon and rectum, such as carcinoid tumors, which develop from specialized hormone-producing cells of the intestine; gastrointestinal stromal tumors or leiomyosarcomas, which develop from smooth muscle cells in the wall of the intestine; and lymphomas, which are cancers of immune system cells that typically develop in lymph nodes but also may start in the colon and rectum or other organs.
  • Adenocarcinomas usually start as a colorectal polyp, a hyperplasia which is defined as a visible protrusion above the surface of the surrounding normal large bowel mucosa.
  • Colorectal polyps are classified as either neoplastic (adenomatous polyps) or non-neoplastic, comprising hyperplastic, mucosal, inflammatory, and hamartomatous polyps which have no malignant potential.
  • Adenomatous polyps, or adenomas, are attached to the bowel wall by a stalk (pedunculated) or by a broad, flat base (sessile).
  • a colorectal hyperplasia or polyp can develop into a malignant adenocarcinoma.
  • RNA isolated from a training set of colorectal samples comprising colorectal cancers that did not give rise to metastases in patients within the length of follow-up time of each patient; and colorectal cancers that gave rise to metastases in patients within the length of follow-up time of each patient
  • genes were selected using a multivariate Cox Regression based method (Simon et al., Design and Analysis of DNA Microarray Investigations, Springer-Verlag New York, (2003); Korn et al., Journal of Statistical Planning and Inference 124, 379-398 (2004)). Genes were selected whose RNA levels were significantly related to survival of the patient, independent of patient stage, where survival is defined as being free of cancer recurrence.
  • Each of the genes listed in Table 1 and/or Table 4 was shown to be predictive of survival and to meet a minimum significance threshold of 0.001.
  • a set of at least five of the genes listed in Table 1 and/or Table 4 can be used in a method according to the invention for typing of an RNA sample of an individual suffering from colorectal cancer or suspected of suffering therefrom. The individual preferably has not been treated for said cancer, for example by neo-adjuvant chemotherapy and/or radiotherapy.
  • a set of genes according to the invention comprises at least six of the genes listed in Table 1 and/or Table 4, more preferred at least seven of the genes listed in Table 1 and/or Table 4, more preferred at least eight of the genes listed in Table 1 and/or Table 4, more preferred at least nine of the genes listed in Table 1 and/or Table 4, more preferred at least ten of the genes listed in Table 1 and/or Table 4, more preferred at least fifteen of the genes listed in Table 1 and/or Table 4, more preferred at least twenty of the genes listed in Table 1 and/or Table 4, more preferred at least twenty-five of the genes listed in Table 1 and/or Table 4, more preferred at least thirty of the genes listed in Table 1 and/or Table 4, more preferred at least forty of the genes listed in Table 1 and/or Table 4, more preferred at least fifty of the genes listed in Table 1 and/or Table 4, more preferred at least sixty of the genes listed in Table 1 and/or Table 4, more preferred at least seventy of the genes listed in Table 1 and/or Table 4, more preferred at least eighty of the genes listed in Table 1
  • the genes listed in Table 1 are rank-ordered. Ranking can be based on a correlation or significance of association with typing of an RNA sample from the tissue sample. Ranking can be based on a correlation with overall survival time, or on a correlation with recurrence-free survival time, or on a correlation with differential expression between tumor samples from low-risk and high-risk patients, or based on the selection percentages of the genes during the multiple samples approach (Michiels et al., Lancet 365: 488-92 (2005)), as is known to a skilled person. Rank-ordering of the genes listed in Table 1 was performed according to their respective univariate p-value, which is a measure for association between the RNA level of the gene and disease recurrence.
  • a preferred set of genes for use in a method of the invention comprises the first five rank-ordered genes listed in Table 1, more preferred the first six rank-ordered genes, more preferred the first seven rank-ordered genes, more preferred the first eight rank-ordered genes, more preferred the first ten rank-ordered genes, more preferred the first fifteen rank-ordered genes, more preferred the first twenty rank-ordered genes, more preferred the first thirty rank-ordered genes, more preferred the first forty rank-ordered genes, more preferred the first fifty rank-ordered genes, more preferred the first sixty rank-ordered genes, more preferred the first seventy rank-ordered genes, more preferred the first eighty rank-ordered genes, more preferred the first ninety rank-ordered genes, more preferred the first hundred rank-ordered genes, more preferred the first hundred-fifty rank-ordered genes, more preferred the first two-hundred rank-ordered genes, more preferred all two hundred forty-one rank-ordered genes listed in Table 1.
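The rank-ordering described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `top_ranked_genes`, the gene identifiers, and the p-values are hypothetical, and treating the 0.001 significance threshold as a strict upper bound on the p-value is an assumption.

```python
def top_ranked_genes(p_values, n, threshold=0.001):
    """Return the n gene identifiers with the smallest univariate p-values.

    p_values maps a gene identifier to its univariate p-value.
    Only genes whose p-value is below `threshold` are eligible
    (assumption: the threshold is applied as a strict upper bound).
    """
    eligible = {gene: p for gene, p in p_values.items() if p < threshold}
    # Sort by p-value; ties are broken by identifier so the order is stable.
    ranked = sorted(eligible, key=lambda gene: (eligible[gene], gene))
    return ranked[:n]


# Hypothetical p-values, invented purely for illustration.
example_p_values = {
    "gene_A": 0.00002,
    "gene_B": 0.0004,
    "gene_C": 0.02,     # not eligible: above the 0.001 threshold
    "gene_D": 0.00009,
    "gene_E": 0.0007,
}

signature = top_ranked_genes(example_p_values, n=3)
print(signature)
```

Choosing a larger `n` simply extends the signature further down the same ranked list, which mirrors the nested "first five", "first six", ... sets described in the text.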
  • Said rank-ordered genes from Table 1 are preferably combined with at least one of the genes listed in Table 4, more preferred at least two of the genes listed in Table 4, more preferred at least five of the genes listed in Table 4, more preferred at least ten of the genes listed in Table 4, more preferred at least twenty of the genes listed in Table 4, more preferred at least fifty of the genes listed in Table 4, more preferred at least one hundred of the genes listed in Table 4, more preferred all of the genes listed in Table 4.
  • a highly preferred signature comprises genes referred to in Table 1 as SEQ ID NO 1, SEQ ID NO 2, SEQ ID NO 3, SEQ ID NO 4, SEQ ID NO 5, SEQ ID NO 6, SEQ ID NO 7, SEQ ID NO 8, SEQ ID NO 9, SEQ ID NO 12, SEQ ID NO 13, SEQ ID NO 14, SEQ ID NO 15, SEQ ID NO 16, SEQ ID NO 17, SEQ ID NO 18, SEQ ID NO 19, SEQ ID NO 20, SEQ ID NO 21, SEQ ID NO 22, SEQ ID NO 23, SEQ ID NO 24, SEQ ID NO 25, SEQ ID NO 27, SEQ ID NO 28, SEQ ID NO 30, SEQ ID NO 31, SEQ ID NO 33, SEQ ID NO 35, SEQ ID NO 36, SEQ ID NO 37, SEQ ID NO 38, SEQ ID NO 39, SEQ ID NO 40, SEQ ID NO 41, SEQ ID NO 42, SEQ ID NO 43, SEQ ID NO 44, SEQ ID NO 47, SEQ ID NO 48, SEQ ID NO 49, SEQ ID NO 50, SEQ ID NO 52, SEQ ID NO 53, SEQ ID NO
  • a cell sample is a clinically relevant sample that comprises a colorectal cancer cell or an expression product comprising a nucleic acid from a colorectal cancer cell.
  • a cell sample according to the invention is obtained directly from the large intestine during surgery.
  • the cell sample is prepared from a biopsy sample that is taken during colonoscopy.
  • the biopsies have a depth of at most 10 millimeters, more preferred at most 5 millimeters, with a preferred diameter of about 2 millimeters, more preferred about 3 millimeters, more preferred about 4 millimeters, more preferred about 5 millimeters, more preferred about 6 millimeters, more preferred about 7 millimeters, more preferred about 8 millimeters, more preferred about 9 millimeters, more preferred about 10 millimeters.
  • the tissue sample comprises stool or blood voided by a patient suffering from colorectal cancer, said tissue sample comprising a colorectal cancer cell or a gene expression product such as a nucleic acid product from a colorectal cancer cell.
  • Samples can be processed in numerous ways, as is known to a skilled person. For example, they can be freshly prepared from cells or tissues at the moment of harvesting, or they can be prepared from samples that are stored at -70 °C until processed for sample preparation. Alternatively, tissues, biopsies, stool or blood samples can be stored under conditions that preserve the quality of the protein or RNA. Examples of these preservative conditions are fixation using e.g.
  • RNAsin
  • RNasecure
  • aqueous solutions such as RNAlater (Assuragen; US06204375), Hepes-Glutamic acid buffer mediated Organic solvent Protection Effect (HOPE; DE10021390), and RCL2 (Alphelys; WO04083369)
  • non-aqueous solutions such as Universal Molecular Fixative (Sakura Finetek USA Inc.; US7138226).
  • RNA level of at least five of the genes listed in Table 1 and/or Table 4 can be determined by any method known in the art.
  • Methods for determining RNA levels of genes are known to a skilled person and include, but are not limited to, Northern blotting, quantitative PCR, and microarray analysis.
  • Northern blotting comprises the quantification of the nucleic acid expression product of a specific gene by hybridizing a labeled probe that specifically interacts with said nucleic acid expression product, after separation of nucleic acid expression products by gel electrophoresis. Quantification of the labeled probe that has interacted with said nucleic acid expression product serves as a measure for determining the level of expression.
  • the determined level of expression can be normalized for differences in the total amounts of nucleic acid expression products between two separate samples by comparing the level of expression of a gene that is known not to differ in expression level between samples.
  • Quantitative Polymerase Chain Reaction provides an alternative method to quantify the level of expression of nucleic acids.
  • qPCR can be performed by real-time PCR (rtPCR), in which the amount of product is monitored during the reaction, or by end-point measurements, in which the amount of a final product is determined.
  • rtPCR can be performed by either the use of a nucleic acid intercalator, such as for example ethidium bromide or SYBR® Green I dye, which interacts with all generated double stranded products resulting in an increase in fluorescence during amplification, or by the use of labeled probes that react specifically with the generated double stranded product of the gene of interest.
  • Alternative detection methods that can be used are provided by dendrimer signal amplification, hybridization signal amplification, and molecular beacons.
  • amplification methods known to a skilled artisan can be employed for qPCR, including but not limited to PCR, rolling circle amplification, nucleic acid sequence-based amplification, transcription mediated amplification, and linear RNA amplification.
  • qPCR methods such as reverse transcriptase-multiplex ligation-dependent amplification (rtMLPA), which accurately quantifies up to 45 transcripts of interest in a one-tube assay (Eldering et al., Nucleic Acids Res 2003; 31: e153), can be employed.
  • a microarray usually comprises nucleic acid molecules, termed probes, which are able to hybridize to nucleic acid expression products.
  • the probes are exposed to labeled sample nucleic acid, hybridized, and the abundance of nucleic acid expression products in the sample that are complementary to a probe is determined.
  • the probes on a microarray may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA.
  • the probes may also comprise DNA and/or RNA analogues such as, for example, nucleotide analogues or peptide nucleic acid molecules (PNA), or combinations thereof.
  • the sequences of the probes may be full or partial fragments of genomic DNA.
  • the sequences may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences.
  • RNA levels are determined simultaneously.
  • Simultaneous analyses can be performed, for example, by multiplex qPCR and microarray analysis.
  • Microarray analyses allow the simultaneous determination of the nucleic acid levels of expression of a large number of genes, such as more than 50 genes, more than 100 genes, more than 1000 genes, or even more than 10,000 genes, allowing the use of a large number of gene expression data for normalization of the genes comprising the colorectal expression profile.
  • RNA levels are determined by microarray analysis.
  • a probe is specific for a gene listed in Table 1 and/or Table 4.
  • a probe can be specific when it comprises a continuous stretch of nucleotides that are completely complementary to a nucleotide sequence of an RNA product of said gene, or a cDNA product thereof.
  • a probe can also be specific when it comprises a continuous stretch of nucleotides that are partially complementary to a nucleotide sequence of an RNA product of said gene, or a cDNA product thereof. Partially means that a maximum of 5% of the nucleotides in a continuous stretch of at least 20 nucleotides differs from the corresponding nucleotide sequence of an RNA product of said gene.
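As a rough illustration of the 5% rule, the sketch below checks whether a probe contains a 20-nucleotide window with at most one mismatch (5% of 20) against a target. The function name `is_specific` is hypothetical, and the sketch makes simplifying assumptions: both strings are already aligned, gap-free, of equal length, and written in the same orientation, so that a perfectly matching probe would be identical to the `target` string.

```python
def is_specific(probe, target, min_stretch=20, max_mismatch_fraction=0.05):
    """Return True if some window of `min_stretch` positions has few enough mismatches.

    Assumptions (not from the source): `probe` and `target` are pre-aligned,
    gap-free, equal-length strings in the same orientation.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must be pre-aligned and of equal length")
    # Number of mismatches tolerated per window (1 for a 20-nt window at 5%).
    allowed = int(max_mismatch_fraction * min_stretch + 1e-9)
    mismatches = [a != b for a, b in zip(probe, target)]
    # Slide a window of exactly `min_stretch` positions along the alignment.
    for start in range(len(probe) - min_stretch + 1):
        if sum(mismatches[start:start + min_stretch]) <= allowed:
            return True
    return False


perfect = "ACGT" * 5            # 20-nt illustrative sequence
one_off = "T" + perfect[1:]     # one mismatch at the first position
two_off = "TT" + perfect[2:]    # two mismatches at the first two positions

print(is_specific(perfect, perfect))   # completely matching stretch
print(is_specific(one_off, perfect))   # 1 mismatch in 20 nt
print(is_specific(two_off, perfect))   # 2 mismatches in 20 nt
```

A real probe design would work on proper alignments of the probe against the transcript sequence; the fixed-position comparison here only demonstrates the mismatch arithmetic.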
  • the term complementary is known in the art and refers to a sequence that is related by base-pairing rules to the sequence that is to be detected. It is preferred that the sequence of the probe is carefully designed to minimize nonspecific hybridization to said probe. It is preferred that the probe is or mimics a single stranded nucleic acid molecule.
  • the length of said complementary continuous stretch of nucleotides can vary between 15 bases and several kilo bases, and is preferably between 20 bases and 1 kilobase, more preferred between 40 and 100 bases, and most preferred 60 nucleotides.
  • the RNA sample is preferably labeled, either directly or indirectly, and contacted with probes on the array under conditions that favor duplex formation between a probe and a complementary molecule in the labeled RNA sample.
  • the amount of label that remains associated with a probe after washing of the microarray can be determined and is used as a measure for the level of RNA of a nucleic acid molecule that is complementary to said probe.
  • the determined RNA levels for genes listed in Table 1 and/or Table 4 can be normalized to correct for systemic bias.
  • Systemic bias results in variation by inter-array differences in overall performance, which can be due, for example, to inconsistencies in array fabrication, staining and scanning, and in variation between labeled RNA samples, which can be due, for example, to variations in purity.
  • Systemic bias can be introduced during the handling of the sample in a microarray experiment.
  • the determined RNA levels are preferably corrected for background non-specific hybridization and normalized using, for example, Feature Extraction software (Agilent Technologies).
  • the array may comprise specific probes that are used for normalization. These probes preferably detect RNA products from housekeeping genes such as glyceraldehyde-3-phosphate dehydrogenase and 18S rRNA levels, of which the RNA level is thought to be constant in a given cell and independent from the developmental stage or prognosis of said cell.
  • a preferred method according to the invention further comprises normalizing the determined RNA levels of said set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
  • genes are selected of which the RNA levels are largely constant between different tissue samples comprising colorectal cells from one individual, and between tissue samples comprising colorectal cells from different individuals. It is furthermore preferred that RNA levels of said set of normalization genes differ between the genes. For example, it is preferred to select genes with a low RNA level in said tissue sample, and genes with a high RNA level. More preferred is to select genes with a low RNA level in said tissue sample, genes with a moderate RNA level, and genes with a high RNA level. It will be clear to a skilled artisan that the RNA levels of said set of normalization genes preferably allow normalization over the whole range of RNA levels.
  • a preferred method of the invention comprises determining the RNA level of at least five of the genes listed in Table 2 in an RNA sample of an individual suffering from colorectal cancer, and using the determined RNA levels for normalizing the determined RNA levels of the set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
  • the invention also provides a method according to the invention, whereby the RNA levels of at least five of the genes listed in Table 2 are used for normalizing the determined RNA levels of the set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
  • a method of the invention comprises determining the RNA levels of at least six of the genes listed in Table 2, more preferred at least seven of the genes listed in Table 2, more preferred at least eight of the genes listed in Table 2, more preferred at least nine of the genes listed in Table 2, more preferred at least ten of the genes listed in Table 2, more preferred at least fifteen of the genes listed in Table 2, more preferred at least twenty of the genes listed in Table 2, more preferred at least twenty-five of the genes listed in Table 2, more preferred at least thirty of the genes listed in Table 2, more preferred at least forty of the genes listed in Table 2, more preferred at least fifty of the genes listed in Table 2, more preferred at least sixty of the genes listed in Table 2, more preferred at least seventy of the genes listed in Table 2, more preferred at least eighty of the genes listed in Table 2, more preferred one hundred of the genes listed in Table 2, more preferred one hundred fifty of the genes listed in Table 2, more preferred two hundred of the genes listed in Table 2, more preferred all of the genes listed in Table 2.
  • RNA level of the individual genes listed in Table 2 ranges from low to high.
  • said at least five genes from Table 2 are selected to include genes of which the RNA levels largely cover the range of RNA levels from low to high.
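One simple way to use a set of normalization genes is sketched below. The text does not specify a formula, so the approach shown (subtracting the mean log10 level of the normalization genes from each signature gene's log10 level) is an assumption, as are the function name `normalize`, the gene names, and all numeric values.

```python
import math


def normalize(signature_levels, normalization_levels):
    """Return log10 signature levels centred on the mean log10 normalization level.

    Both arguments map a gene identifier to a positive raw intensity.
    The centring formula is an illustrative assumption, not the method
    stated in the source.
    """
    logs = [math.log10(value) for value in normalization_levels.values()]
    offset = sum(logs) / len(logs)
    return {gene: math.log10(value) - offset
            for gene, value in signature_levels.items()}


# Hypothetical normalization genes spanning low, moderate and high levels.
normalization_genes = {"norm_low": 10.0, "norm_mid": 100.0, "norm_high": 1000.0}
# Hypothetical signature genes measured in the same sample.
signature_genes = {"gene_A": 1000.0, "gene_B": 10.0}

normalized = normalize(signature_genes, normalization_genes)
print(normalized)
```

Including normalization genes from the low, moderate and high end of the intensity range, as the text recommends, keeps the offset from being dominated by one part of the range.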
  • the method further comprises multiplying each of said determined values with a predetermined constant for said gene to obtain a weighted value for the relative RNA level of said gene, and thereby a set of weighted values for said set of genes, said method further comprising typing said sample on the basis of said set of weighted values.
  • Said set of weighted values can be summed and compared to a summed set of weighted values from a reference sample. It is preferred that said summed set of weighted values is compared to a classification threshold that is determined by the values obtained from RNA samples of which the typing is known.
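The weighting and summing step can be sketched as follows. The gene names, constants, levels and threshold are hypothetical, and because the text does not state which side of the classification threshold corresponds to which type, the sketch only reports the side on which a sample falls.

```python
def weighted_score(levels, weights):
    """Sum of per-gene relative RNA levels, each multiplied by its constant."""
    return sum(weights[gene] * levels[gene] for gene in weights)


def side_of_threshold(levels, weights, threshold):
    """Report whether the summed weighted values exceed the threshold."""
    score = weighted_score(levels, weights)
    return "above threshold" if score > threshold else "at or below threshold"


# Hypothetical predetermined constants and relative RNA levels.
weights = {"gene_A": 2.0, "gene_B": 1.0, "gene_C": -0.5}
levels = {"gene_A": 0.5, "gene_B": -1.0, "gene_C": 2.0}

score = weighted_score(levels, weights)   # 2.0*0.5 + 1.0*(-1.0) + (-0.5)*2.0
print(score, side_of_threshold(levels, weights, threshold=0.0))
```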
  • stage A adenocarcinoma is defined as a cancer that penetrates into the mucosa of the bowel wall but no further.
  • Stage B1 defines a cancer that penetrates into, but not through the muscularis propria (the muscular layer) of the bowel wall, while a B2 defines a cancer that penetrates into and through the muscularis propria of the bowel wall.
  • Stage C1 defines a cancer that penetrates into, but not through the muscularis propria of the bowel wall, combined with pathologic evidence of colon cancer in the lymph nodes
  • stage C2 defines a cancer which penetrates into and through the muscularis propria of the bowel wall, combined with pathologic evidence of colon cancer in the lymph nodes
  • stage D defines a cancer, which has spread beyond the confines of the lymph nodes to organs such as the liver, lung or bone.
  • TNM Staging System which combines data about the Tumor (T), the spread to lymph nodes (N), and the existence of distant metastases (M).
  • TNM stage I colorectal cancer is defined as a cancer that began to spread and has invaded the submucosa or the muscularis propria.
  • a TNM stage I is equal to a Dukes' stage A.
  • TNM stage II defines a cancer that has invaded through the muscularis propria into the subserosa, or into the pericolic or perirectal tissues, but has not reached the lymph nodes.
  • a stage III defines a cancer that has spread to the lymph nodes in the absence of distant metastases.
  • a stage IV defines a cancer that has spread to distant sites.
  • Although undoubtedly real, the benefit from chemotherapy for stage II patients is small, with the proportional reduction in the risk of death being 18% (CI 5% to 30%), which translates to an absolute improvement in five-year survival of about 3.6% (1.0% to 6.0%) for a stage II patient with five-year mortality of 20%. It is clear that some TNM stage II patients have a reasonable prognosis, and the balance of pros and cons of chemotherapy for these individuals might result in favor of not having adjuvant therapy. Current recommendations are that stage II patients with a higher than average risk of tumor recurrence (e.g. T4 stage or vascular invasion), comprising about 15% of the population, should be offered chemotherapy.
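The absolute figures quoted above follow from multiplying the baseline five-year mortality by the proportional risk reduction. The short check below reproduces that arithmetic; the function name `absolute_benefit` is introduced here for illustration only.

```python
def absolute_benefit(baseline_mortality, relative_reduction):
    """Absolute improvement in survival = baseline mortality x proportional reduction."""
    return baseline_mortality * relative_reduction


baseline = 0.20                              # five-year mortality of 20%
point = absolute_benefit(baseline, 0.18)     # 18% proportional reduction
low = absolute_benefit(baseline, 0.05)       # lower bound of the interval
high = absolute_benefit(baseline, 0.30)      # upper bound of the interval

print(f"{point:.1%} ({low:.1%} to {high:.1%})")  # prints "3.6% (1.0% to 6.0%)"
```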
  • the invention provides a method of typing an individual suffering from colorectal cancer, wherein said colorectal cancer comprises a TNM stage II or TNM stage III colorectal cancer as determined by the TNM Staging System.
  • said typing in a method according to the invention allows differentiating cancer cells with a low metastasizing potential or risk of cancer recurrence and cancer cells with a high metastatic potential or risk of cancer recurrence.
  • RNA levels of at least five of the genes listed in Table 1 and/or Table 4 can be compared to RNA levels of said genes in a reference sample.
  • the reference sample can be an RNA sample isolated from a colorectal tissue from a healthy individual, or an RNA sample from a relevant cell line or mixture of cell lines. Said reference sample can also be an RNA sample from a cancerous growth of an individual suffering from colorectal cancer. Said individual suffering from colorectal cancer can have an increased risk of cancer recurrence, or a low risk of cancer recurrence.
  • said reference sample is an RNA sample from an individual suffering from colorectal cancer and having a low risk of cancer recurrence.
  • said reference sample is a pooled RNA sample from multiple tissue samples comprising colorectal cells from individuals suffering from colorectal cancer and having a low risk of cancer recurrence. It is preferred that said multiple tissue sample comprises more than 10 tissue samples, more preferred more than 20 tissue samples, more preferred more than 30 tissue samples, more preferred more than 40 tissue samples, most preferred more than 50 tissue samples.
  • the reference sample could also be RNA isolated and pooled from colon tissue from healthy individuals, or from so called normal adjacent tissue from colon cancer patients or RNA from a generic cell line or cell line mixture.
  • the RNA from a cell line or cell line mixture can be produced in-house or obtained from a commercial source such as, for example, Stratagene Human Reference RNA.
  • a coefficient is determined that is a measure of the similarity or dissimilarity of a sample with said reference sample.
  • a number of different coefficients can be used for determining a correlation between the RNA expression level in an RNA sample from an individual and a reference sample.
  • Preferred methods are parametric methods which assume a normal distribution of the data. One of these methods is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations.
  • Preferred methods comprise cosine-angle, uncentered correlation and, more preferred, cosine correlation (Fan et al., Conf Proc IEEE Eng Med Biol Soc. 5:4810-3 (2005)).
  • said correlation with a reference sample is used to produce an overall similarity score for the set of genes that are used.
  • a similarity score is a measure of the average correlation of RNA levels of a set of genes in an RNA sample from an individual and a reference sample.
  • Said similarity score can be a numerical value between +1, indicative of a high correlation between the RNA expression level of the set of genes in an RNA sample of said individual and said reference sample, and -1, which is indicative of an inverse correlation and therefore indicative of having an increased risk of cancer recurrence (van 't Veer et al., Nature 415: 484-5 (2002)).
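Two of the coefficients mentioned above, the Pearson product-moment correlation and the cosine of the angle between two expression vectors, can be computed as in the sketch below. Both return a value between -1 and +1. The expression vectors are hypothetical, and the sketch does not claim to reproduce the exact scoring used in the cited references.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations.

    The 1/n factors of the covariance and of the standard deviations cancel,
    so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cosine(x, y):
    """Cosine of the angle between two vectors (no mean-centring)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical relative RNA levels for three genes.
sample = [1.0, 2.0, 3.0]
reference = [2.0, 4.0, 6.0]
inverse_reference = [-2.0, -4.0, -6.0]

print(pearson(sample, reference), cosine(sample, reference))
print(pearson(sample, inverse_reference), cosine(sample, inverse_reference))
```

The two measures differ only in whether the vectors are centred on their means first, which matters when expression values are not already expressed relative to a common baseline.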
  • the invention provides a method of classifying an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis by a method comprising determining a similarity value between RNA levels from a set of at least five genes listed in Table 1 and/or Table 4 in an RNA sample from said individual and a level of expression from said set of genes in an RNA sample from a patient having no recurrent disease within five years of initial diagnosis, and classifying said individual as having a poor prognosis if said similarity value is below a similarity threshold value, and classifying said individual as having a good prognosis if said similarity value exceeds a similarity threshold value.
  • a preferred method of classifying samples as either high or low risk for disease recurrence involves the use of a classification template, derived from Support Vector Machine (SVM) training using all genes identified as being correlated with disease progression.
  • SVM Support Vector Machine
  • Each gene in the template has a corresponding weighting factor, as determined by the SVM implementation by Chang & Lin (Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm).
  • This algorithm analyses the information contained in the 241 genes across all training set samples and constructs a classification template that best separates patients with recurrence from those without.
  • LIBSVM, developed by Chih-Chung Chang and Chih-Jen Lin, is an integrated software for analyzing many problems in supervised classification or regression frameworks. By multiplying the log10 expression value for each gene in the template
  • a similarity threshold value is an arbitrary value that allows discriminating between RNA samples from patients with a high risk of cancer recurrence, and RNA samples from patients with a low risk of cancer recurrence.
  • Said similarity threshold value is set at a value at which an acceptable number of patients with known metastasis formation within five years after initial diagnosis would score as false negatives above the threshold value, and an acceptable number of patients without known metastasis formation within five years after initial diagnosis would score as false positives below the threshold value.
  • a similarity score and/or a resultant of said score, which is a measurement of a high risk or a low risk of cancer recurrence, is preferably displayed or outputted to a user interface device, a computer readable storage medium, or a local or remote computer system.
  • the invention provides a method of classifying an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis by a method comprising (a) providing an RNA sample from said individual that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells; (b) determining a level of RNA for a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4 in said sample; (c) determining a similarity value between a level of expression from the set of genes in said individual and a level of expression from said set of genes in a patient having no recurrent disease within five years of initial diagnosis; and (d) classifying said individual as having a poor prognosis if said similarity value is below a first similarity threshold value, and classifying said individual as having a good prognosis if said similarity value exceeds said first similarity threshold value.
  • the level of RNA for a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4 in said sample is normalized. Normalization can be performed by any method known to a skilled artisan, including global analysis and the use of specific probes. In a preferred embodiment, the RNA levels of at least 5 of the genes listed in Table 2 are used for normalization.
  • the invention provides a method of assigning treatment to an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis with a method according to the invention, and assigning adjuvant chemotherapy if said individual is classified as having said poor prognosis.
  • Chemotherapy comprises the use of natural or non-natural compounds to eliminate fast-dividing, and therefore susceptible, cancer cells.
  • Chemotherapeutic compounds comprise alkylating agents such as dacarbazine and cyclophosphamide; DNA crosslinking agents such as cisplatin and carboplatin; antimetabolitic agents such as methotrexate, 5-fluorouracil (5FU), and mercaptopurine; alkaloidic agents such as taxanes such as paclitaxel and docetaxel, vincristine and vinblastine; topoisomerase inhibitors such as camptothecins and amsacrine; antibiotics such as anthracycline glycosides such as doxorubicin, daunorubicin, idarubicin, pirarubicin, and epirubicin; mitomycin; polyamine biosynthesis inhibitors such as eflor
  • The current standard surgical adjuvant treatment for colorectal cancer comprising modified TNM Stage III or higher is FOLFOX 4.
  • FOLFOX combines oxaliplatin, leucovorin, and infusional 5FU.
  • Leucovorin is a drug that is used to enhance the anti-cancer effect of chemotherapy, and especially 5FU.
  • Other therapies used are XELOX, a combination therapy comprising oxaliplatin and capecitabine; and FOLFIRI, which combines 5-FU, leucovorin, and irinotecan, a topoisomerase 1 inhibitor.
  • antibody-based therapeutics including but not limited to bevacizumab, which inhibits angiogenesis, cetuximab, an Epidermal Growth Factor Receptor inhibitor, and panitumumab, an Epidermal Growth Factor receptor inhibitor.
  • a cancer that originates in the colon or rectum is termed a colorectal cancer, bowel cancer, colon cancer, or rectal cancer.
  • treatment approaches for colon and rectal cancers involve the use of neoadjuvant therapy, the kind of surgery and the use of chemotherapy alone or chemotherapy plus radiation.
  • a colorectal cancer according to the invention relates to a colon cancer.
  • a colorectal cancer according to the invention relates to a rectal cancer.
  • the colon and rectum are part of the digestive tract.
  • the colon absorbs water, electrolytes, and nutrients from food and transports them into the bloodstream. It is about 6 feet in length and consists of the cecum (connects to the small intestine at the cecal valve), the ascending colon (the vertical segment located on the right side of the abdomen), the transverse colon (extends across the abdomen), the descending colon (leads vertically down the left side of the abdomen), and the sigmoid colon (extends to the rectum).
  • the rectum is the last segment of the large intestine. It is 8 to 10 inches in length and leads to the anus, which is the opening to the outside of the body. Waste material (fecal matter) is stored in the rectum until it is eliminated from the body through the anus.
  • the invention provides a method for typing colorectal cancer cells according to the invention to select patients having an increased chance of responding to therapy.
  • Said method can help to obtain an appropriate definition of a patient population by revealing potential disease subtypes that may differ in etiology, pathogenesis, and response to treatments.
  • said method can be instrumental for identifying subsets of colorectal cancer patients who are at risk for certain complications or who preferentially benefit from specific treatments.
  • Information about colorectal subtypes could also substantially improve the design of future colorectal clinical studies by improving patient selection, reducing variability, and focusing on relevant outcome measures.
  • the invention relates to an array, comprising between 5 and 12.000 nucleic acid molecules comprising a first set of nucleic acid molecules wherein each nucleic acid molecule of said first set comprises a nucleotide sequence that is able to hybridize to a different gene selected from the genes listed in Table 1 and/or Table 4.
  • Said first set of nucleic acid molecules comprises at least five of the genes listed in Table 1, more preferred at least six of the genes listed in Table 1 and/or Table 4, more preferred at least seven of the genes listed in Table 1 and/or Table 4, more preferred at least eight of the genes listed in Table 1 and/or Table 4, more preferred at least nine of the genes listed in Table 1 and/or Table 4, more preferred at least ten of the genes listed in Table 1 and/or Table 4, more preferred at least fifteen of the genes listed in Table 1 and/or Table 4, more preferred at least twenty of the genes listed in Table 1 and/or Table 4, more preferred at least twenty-five of the genes listed in Table 1 and/or Table 4, more preferred at least thirty of the genes listed in Table 1 and/or Table 4, more preferred at least forty of the genes listed in Table 1 and/or Table 4, more preferred at least fifty of the genes listed in Table 1 and/or Table 4.
  • an array according to the invention further comprises a second set of nucleic acid molecules wherein each nucleic acid molecule of said second set comprises a nucleotide sequence that is able to hybridize to a normalization gene, whereby it is preferred that the RNA levels of said normalization genes are dissimilar.
  • said second set of nucleic acid molecules comprises nucleic acid molecules that are able to hybridize to at least 5 of the genes listed in Table 2.
  • the invention provides the use of an array according to the invention for obtaining a colorectal expression profile.
  • Figure 1 Hierarchical clustering of the 241 genes used for prognostic classification in the training set of patients
  • Figure 2 (a) Scatter plot of prediction indices for samples in the training set. Example of optimal classification (over trained) as this dataset was used to train the algorithm itself. (b) Scatter plot of prediction indices vs months followup time for the validation set. Patients with recurrence in the time they have been followed up are shown in red. Patients without recurrence in the length of followup time are shown as blue squares. The orange dashed line at -0.107 indicates the classification threshold used to determine if a sample belongs to the high or low risk category.
  • Figure 3 Kaplan Meier analysis of time to recurrence for all stage II and III patients in a training set as classified by 241 gene SVM classifier.
  • Log rank test p-value < 0.0001
  • Figure 4 Kaplan Meier analysis of time to recurrence for stage II patients in training set, as classified by 241 gene SVM classifier.
  • Log rank test p-value 0.056
  • Example 1 Generation of classifier. Patients: Clinical and pathological information documented at the time of surgery included stage, grade, size and location of tumors. Additionally, the number of lymph nodes assessed for nodal involvement was described in 95% of cases. Tumors were staged according to the TNM staging system. All tissue samples were collected from patients with appropriate informed consent. The study was carried out in accordance with the ethical standards of the Helsinki Declaration and was approved by the Medical Ethical Board of the participating medical centers and hospitals. Patients were monitored for survival and recurrence for up to 270 months. Detailed patient information can be found in Table 3.
  • RNA samples were available for this study. Two hundred nanograms of total RNA was amplified using the Low RNA Input Fluorescent Labeling Kit (Agilent Technologies). Cyanine 3-CTP or Cyanine 5-CTP (GE Health Care) was directly incorporated into the cRNA during in vitro transcription. A total of 750 ng of Cyanine-labeled RNA was co-hybridized with a standard reference to Agilent 44k oligonucleotide microarrays at 60 degrees Celsius for 17 hrs and subsequently washed according to the Agilent standard hybridization protocol (Agilent Oligo Microarray Kit, Agilent Technologies).
  • Dyeswap Cy 5, Reference RNA. Cy 3, Sample RNA
  • the normalised gene expression ratios from each hybridisation were combined to produce a single gene expression profile, per patient, using Agendia XPrint software (version 1.5.1).
  • an error-weighted mean value was calculated for the probes belonging to the same gene as log10 ratios.
  • the Rosetta error model was used, which corrects for the uncertainties in individual probe measurements (Weng L et al, Bioinformatics 22:1111-21 (2006)).
  • a text file containing normalised, error-weighted log ratios was generated, which was then used for further analysis. The data were then loaded into BRB ArrayTools (Simon et al., Cancer
  • Probes for normalisation were selected by selecting all probes with fewer than 5% missing values across the training set and with a coefficient of variation (CV, i.e. standard deviation of log ratio / mean log ratio) < 0.01.
  • Genes that have low variability were filtered out using the minimum fold-change filter.
  • the criterion for filtering out a gene is based upon the percentage of expression values for that gene which have at least a minimum 1.5 fold-change from the median expression value for that gene. If less than a specified percentage of expression values meet the minimum fold-change requirement, then the gene is filtered out.
  • Log expression variation filter: Filtering based on the variance for the gene across the arrays was applied in ArrayTools. A statistical significance criterion based on the variance was used, whereby the variance of the log-ratios for each gene is compared to the median of all the variances. Those genes not significantly more variable (p < 0.01) than the median gene are filtered out.
  • the quantity $(n-1)\,\mathrm{Var}_i / \mathrm{Var}_{\mathrm{med}}$ is computed for each gene $i$, wherein $\mathrm{Var}_i$ is the variance of the log intensity for gene $i$ across the entire set of $n$ arrays and $\mathrm{Var}_{\mathrm{med}}$ is the median of these gene-specific variances. This quantity is compared to a percentile of the chi-square distribution with $n-1$ degrees of freedom. This is an approximate test of the hypothesis that gene $i$ has the same variance as the median variance. (iii) Percent missing filter
  • the criterion for filtering out a gene is based upon the percentage of expression values that are not missing and not filtered out by any of the previous spot filters. A threshold of no more than 25% missing values was applied.
  • Cox regression is a model in which the hazard function for an individual is a function of predictor variables.
  • the predictor variables are log expression levels.
  • the hazard function is the instantaneous force of mortality at any time conditional on having survived until that time.
  • the proportional hazards model postulates that the logarithm of the hazard of death is a linear function of the predictor variables, linked by unknown regression coefficients. For more details, see Cox DR, Journal of the Royal Statistical Society B 34:187-220 (1972).
  • SVM means "Support Vector Machine", a general-purpose machine learning algorithm.
  • SVMs deliver state-of-the-art performance in real-world applications such as text categorization, hand-written character recognition, image classification, and bioinformatics.
  • SRM Structural Risk Minimization
  • ERM Empirical Risk Minimization
  • SVM was used to analyse the information contained in the 241 genes across all training set samples and to construct a classification template that best separates patients with recurrence from those without (see Figure 2). Based on these analyses, a threshold was set at -0.107 Index units.
  • LOOCV leave-one-out cross-validation
  • the class labels were randomly permuted and the entire LOOCV process was repeated. The significance level is the proportion of the random permutations that gave a cross-validated error rate no greater than the cross-validated error rate obtained with the real data. 1000 random permutations were used.
  • the classifier was observed to produce the following sensitivity, specificity, positive (PPV) and negative (NPV) predictive values for prediction of disease recurrence in those samples included in the training process.
  • a global test whether the predictor is picking up the random noise in the data and the outcome classes do not differ at all with regard to expression profiles was performed.
  • a permutation analysis was used for the computation of the p value for this global test.
  • Class labels of the samples are randomly permuted 1,000 times.
  • samples were classified and the 10x CV misclassification rate of the classifier was computed as the proportion of incorrectly predicted samples.
  • the p-value of the predictor is the proportion of permutations with misclassification rate smaller than the misclassification rate of the original labelling.
  • a global p-value less than 0.05 is considered significant. Since one global hypothesis was tested, stringent control for multiple comparisons was not necessary for the global test. Based on 1,000 random permutations, a probability that the support vector machines classifier is classifying on random noise was calculated to be 0.004. Another way of stating this is that we can be 99.6% sure that the classifier is based on true biological information.
  • Example 4 Cross validation of the 241 signature genes in the training set.
  • a series of stage II and stage III tumors that were used in the gene selection and algorithm training process was typed using the 241 gene model.
  • a typing for each sample in the training set was determined by leave-one-out cross validation. The resulting typing is presented in Figures 3 and 4.
  • stage II tumors not used in the gene selection or algorithm training process was typed using the 241 gene model. These samples are described in the column titled 'validation series' in Table 3. Each prediction was compared to the current known status of the patient and Kaplan Meier analysis (see Figure 5) was used to evaluate the survival difference between risk groups.
  • the normalised gene expression ratio for each gene in the signature was multiplied by a weight, determined by the SVM algorithm.
  • the weighted log10 expression ratios are then summed and a classification of high- or low-risk is determined by comparison of the summed value to a classification threshold.
  • the prediction rule is defined by the inner sum of the weights ($w_i$) and expression ($x_i$) of significant genes. A sample is classified to the class low risk if the sum is greater than the threshold; that is, $\sum_i w_i x_i > -0.107$
  • a sample is classified to the class high risk if the sum is less than the threshold; that is, $\sum_i w_i x_i < -0.107$
  • weights ($w_i$) for each gene in the classifier are provided in Table 1.
  • A_24_P931583 A_24_P931583 AAAGCCTGGCTCCCATGCCAGGTGTTGATGCTGTCCTTCCACGCTTCTCTCCTCCTAAAG 288
  • BC009800 BC009800 TCTAGTT ⁇ GTAMTCACATTTGGCGTTTGTAGATCACTCCTTCCCT ⁇ TAGTGGCATTCT 446
  • NP109393 CAACCTCTCCTTCTTGGACCTCTGTTTCACCACGAGTTGTGTTCCCCAAATGCTGGCCAA 493
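The weighted-sum prediction rule described in the list above (SVM-derived weights $w_i$, log10 expression ratios $x_i$, threshold -0.107) can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names, weights and ratios below are made up and are not values from Table 1, and because the text does not say how an index exactly equal to the threshold is handled, this sketch arbitrarily assigns it to the high-risk class.

```python
# Classification threshold reported in the text (index units).
THRESHOLD = -0.107


def prediction_index(weights, log10_ratios):
    """Sum of weight * log10 expression ratio over all signature genes."""
    missing = set(weights) - set(log10_ratios)
    if missing:
        raise ValueError(f"missing expression values for: {sorted(missing)}")
    return sum(w * log10_ratios[gene] for gene, w in weights.items())


def classify(weights, log10_ratios, threshold=THRESHOLD):
    """Return 'low risk' if the index exceeds the threshold, else 'high risk'.

    Assumption: an index exactly equal to the threshold is called high risk;
    the source does not specify this case.
    """
    index = prediction_index(weights, log10_ratios)
    return "low risk" if index > threshold else "high risk"


# Hypothetical weights and samples for three made-up genes.
example_weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 0.1}
sample_1 = {"GENE_A": -0.2, "GENE_B": 0.4, "GENE_C": 0.3}
sample_2 = {"GENE_A": 0.2, "GENE_B": 0.4, "GENE_C": 0.3}

print(classify(example_weights, sample_1))
print(classify(example_weights, sample_2))
```

In practice the weights would be those reported for each gene in Table 1, and the expression values would be the normalised, error-weighted log10 ratios described in Example 1.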

Abstract

The invention relates to a method of typing colorectal cancer cells by determining the RNA levels of a set of signature genes. Said typing can be used for predicting a risk for recurrence of said colorectal cancer. The invention further relates to a set of genes that can be used for normalizing the RNA levels of said set of signature genes, and to a micro-array comprising said set of signature genes. In particular, the typing allows the distinction of stage II and III cancers.

Description

Title: A method of typing a sample comprising colorectal cancer cells.
The invention relates to the field of oncology. More specifically, the invention relates to a method for typing colorectal cancer cells. The invention provides means and methods for differentiating colorectal cancer cells with a low metastasizing potential and with a high metastatic potential.
Worldwide over a million new cases of colorectal cancer were diagnosed in 2002, accounting for more than 9% of all new cancer cases (Parkin et al. 2005. CA Cancer J Clin 55: 74-108). It is the third most common cancer worldwide after lung and breast, with two-thirds of all colorectal cancers occurring in the more developed regions. As with all cancers, chances of survival are good for patients when the cancer is detected in an early stage. Stage I patients have a survival rate of ~85% while the 5-year survival rate drops to ~65-75% in stage II patients and to 35-50% in stage III patients (Coleman et al. 2004. Br J Cancer 90: 1367-73). According to international recommendations, chemotherapy should be made available to patients following surgery for stage III if they are well enough to tolerate it; patients with metastatic or locally inoperable primary cancer (stage IV) require careful evaluation, and may be appropriate for palliative chemotherapy and/or radiotherapy. Whether to use chemotherapy in stage II tumors should be discussed between patients and their oncologists but is still a subject of debate (Sobrero and Koehne 2006. Lancet Oncol 7: 515-517).
Despite numerous clinical trials, the benefit of adjuvant chemotherapy for stage II colon cancer patients has never been proven in a randomized study. Three-fourths of patients are cured by surgery alone and therefore less than 25% of patients would benefit from additional chemotherapy. The identification of the sub-group of patients that are more likely to suffer from a recurrent disease would therefore allow the identification of patients who are more likely to benefit from adjuvant chemotherapy and should be treated after surgery.
Over the past ten years the number of treatment options for colorectal cancer has increased with many more therapeutic agents in clinical development. New targeted therapies have been designed whose action is directed against vascular endothelial growth factor (VEGF) or epidermal growth factor receptor (EGFR). These include biological therapies that are being tested in combination with chemotherapy for both early stage and advanced disease.
Current pathological prediction factors are not sufficient to identify "high risk" patients, who have an increased risk for recurrent disease. It is therefore an object of the present invention to provide methods and means to allow identification of high risk patients and low risk patients suffering from colorectal cancer.
Therefore, the invention provides a method for typing an RNA sample of an individual suffering from colorectal cancer or suspected of suffering therefrom, the method comprising providing an RNA sample that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells; determining RNA levels for a set of genes in said RNA sample; and typing said RNA sample on the basis of the RNA levels determined for said set of genes; wherein said set of genes comprises at least five of the genes listed in Table 1 and/or Table 4.
Colorectal cancer is a type of cancer that originates in the large intestine or bowel, comprising the colon and the rectum. Colon cancer and rectal cancer have many features in common. The majority of colorectal cancers are adenocarcinomas. These are cancers of the cells that line the inside layer of the wall of the colon and rectum. Other less common types of tumors may also develop in the colon and rectum, such as carcinoid tumors, which develop from specialized hormone-producing cells of the intestine; gastrointestinal stromal tumors or leiomyosarcomas, which develop from smooth muscle cells in the wall of the intestine; and lymphomas, which are cancers of immune system cells that typically develop in lymph nodes but also may start in the colon and rectum or other organs.
Adenocarcinomas usually start as a colorectal polyp, a hyperplasia which is defined as a visible protrusion above the surface of the surrounding normal large bowel mucosa. Colorectal polyps are classified as either neoplastic (adenomatous polyps) or non-neoplastic, comprising hyperplastic, mucosal, inflammatory, and hamartomatous polyps which have no malignant potential. Adenomatous polyps, or adenomas, are attached to the bowel wall by a stalk (pedunculated) or by a broad, flat base (sessile). A colorectal hyperplasia or polyp can develop into a malignant adenocarcinoma.
Using RNA isolated from a training set of colorectal samples, comprising colorectal cancers that did not give rise to metastases in patients within the length of followup time of each patient, and colorectal cancers that gave rise to metastases in patients within the length of followup time of each patient, genes were selected using a multivariate Cox Regression based method (Simon et al., Design and Analysis of DNA Microarray Investigations, Springer-Verlag New York, (2003); Korn et al., Journal of Statistical Planning and Inference 124, 379-398 (2004)). Genes were selected of which the RNA levels were significantly related to survival of the patient, independent of patient stage, where survival is defined as being free of cancer recurrence. Each of the genes listed in Table 1 and/or Table 4 was shown to be predictive of survival and to have a minimum significance threshold of 0.001. A set of at least five of the genes listed in Table 1 and/or Table 4 can be used in a method according to the invention for typing of an RNA sample of an individual suffering from colorectal cancer or suspected of suffering therefrom. The individual preferably has not been treated for said cancer, for example by neo-adjuvant chemotherapy and/or radiotherapy.
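To make the Cox regression step more concrete, the following minimal sketch evaluates the partial log-likelihood of a proportional hazards model for a single gene. It is not the procedure implemented in the software used for the study: it handles one covariate only, assumes no tied event times, and all variable names and example values are hypothetical. Here an "event" stands for a recurrence and a zero in `events` marks a censored patient.

```python
import math


def cox_partial_log_likelihood(beta, times, events, x):
    """Partial log-likelihood of a one-covariate Cox model (no tied event times).

    beta   : regression coefficient for the covariate
    times  : follow-up time per patient
    events : 1 if recurrence was observed, 0 if the patient was censored
    x      : covariate per patient (e.g. a log expression ratio of one gene)
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            # Censored patients contribute only through the risk sets of others.
            continue
        # Risk set: everyone still under observation at time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        log_denominator = math.log(sum(math.exp(beta * x[j]) for j in risk_set))
        total += beta * x[i] - log_denominator
    return total


# Hypothetical follow-up times (months), event indicators and log ratios.
times = [1, 2, 3]
events = [1, 1, 1]
log_ratios = [0.5, -0.2, 0.1]

print(cox_partial_log_likelihood(0.0, times, events, log_ratios))
```

Maximising this function over `beta`, and testing whether the fitted coefficient differs from zero, is one common way to obtain a per-gene p-value of the kind compared against the 0.001 significance threshold mentioned above.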
In a preferred embodiment, a set of genes according to the invention comprises at least six of the genes listed in Table 1 and/or Table 4, more preferred at least seven of the genes listed in Table 1 and/or Table 4, more preferred at least eight of the genes listed in Table 1 and/or Table 4, more preferred at least nine of the genes listed in Table 1 and/or Table 4, more preferred at least ten of the genes listed in Table 1 and/or Table 4, more preferred at least fifteen of the genes listed in Table 1 and/or Table 4, more preferred at least twenty of the genes listed in Table 1 and/or Table 4, more preferred at least twenty-five of the genes listed in Table 1 and/or Table 4, more preferred at least thirty of the genes listed in Table 1 and/or Table 4, more preferred at least forty of the genes listed in Table 1 and/or Table 4, more preferred at least fifty of the genes listed in Table 1 and/or Table 4, more preferred at least sixty of the genes listed in Table 1 and/or Table 4, more preferred at least seventy of the genes listed in Table 1 and/or Table 4, more preferred at least eighty of the genes listed in Table 1 and/or Table 4, more preferred hundred of the genes listed in Table 1 and/or Table 4, more preferred hundred-fifty of the genes listed in Table 1 and/or Table 4, more preferred two-hundred of the genes listed in Table 1 and/or Table 4, more preferred all of the genes listed in Table 1 and/or Table 4.
The genes listed in Table 1 are rank-ordered. Ranking can be based on a correlation or significance of association with typing of an RNA sample from the tissue sample. Ranking can be based on a correlation with overall survival time, or on a correlation with recurrence free survival time, or on a correlation with differential expression between tumor samples from low-risk and high-risk patients, or based on the selection percentages of the genes during the multiple samples approach (Michiels et al., Lancet 365: 488-92 (2005)), as is known to a skilled person. Rank-ordering of the genes listed in Table 1 was performed according to their respective univariate p-value, which is a measure for association between the RNA level of the gene and disease recurrence.
A preferred set of genes for use in a method of the invention comprises the first five rank-ordered genes listed in Table 1, more preferred the first six rank-ordered genes, more preferred the first seven rank-ordered genes, more preferred the first eight rank-ordered genes, more preferred the first ten rank-ordered genes, more preferred the first fifteen rank-ordered genes, more preferred the first twenty rank-ordered genes, more preferred the first thirty rank-ordered genes, more preferred the first forty rank-ordered genes, more preferred the first fifty rank-ordered genes, more preferred the first sixty rank-ordered genes, more preferred the first seventy rank-ordered genes, more preferred the first eighty rank-ordered genes, more preferred the first ninety rank-ordered genes, more preferred the first hundred rank-ordered genes, more preferred the first hundred-fifty rank-ordered genes, more preferred the first two-hundred rank-ordered genes, more preferred all two hundred forty-one rank-ordered genes listed in Table 1.
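As an illustration of how a "first N rank-ordered genes" subset could be derived from univariate p-values, consider the sketch below. The gene names and p-values are invented for the example and are not taken from Table 1; reading the 0.001 significance threshold as a strict inequality is likewise an assumption.

```python
def top_ranked_genes(p_values, n, max_p=0.001):
    """Return up to n gene names with the smallest p-values below max_p.

    p_values : mapping of gene name -> univariate p-value
    n        : number of top-ranked genes wanted
    max_p    : significance cut-off (assumed strict: p < max_p)
    """
    significant = [(p, gene) for gene, p in p_values.items() if p < max_p]
    significant.sort()  # smallest p-value first; ties broken by gene name
    return [gene for _, gene in significant[:n]]


# Hypothetical p-values for four made-up genes.
example_p_values = {
    "GENE_A": 0.0004,
    "GENE_B": 0.00001,
    "GENE_C": 0.02,
    "GENE_D": 0.0009,
}

print(top_ranked_genes(example_p_values, 2))
```

Asking for more genes than pass the cut-off simply returns all significant genes in rank order.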
Said rank-ordered genes from Table 1 are preferably combined with at least one of the genes listed in Table 4, more preferred at least two of the genes listed in Table 4, more preferred at least five of the genes listed in Table 4, more preferred at least ten of the genes listed in Table 4, more preferred at least twenty of the genes listed in Table 4, more preferred at least fifty of the genes listed in Table 4, more preferred at least hundred of the genes listed in Table 4, more preferred all of the genes listed in Table 4. A highly preferred signature comprises genes referred to in Table 1 as SEQ ID NO 1, SEQ ID NO 2, SEQ ID NO 3, SEQ ID NO 4, SEQ ID NO 5, SEQ ID NO 6, SEQ ID NO 7, SEQ ID NO 8, SEQ ID NO 9, SEQ ID NO 12, SEQ ID NO 13, SEQ ID NO 14, SEQ ID NO 15, SEQ ID NO 16, SEQ ID NO 17, SEQ ID NO 18, SEQ ID NO 19, SEQ ID NO 20, SEQ ID NO 21, SEQ ID NO 22, SEQ ID NO 23, SEQ ID NO 24, SEQ ID NO 25, SEQ ID NO 27, SEQ ID NO 28, SEQ ID NO 30, SEQ ID NO 31, SEQ ID NO 33, SEQ ID NO 35, SEQ ID NO 36, SEQ ID NO 37, SEQ ID NO 38, SEQ ID NO 39, SEQ ID NO 40, SEQ ID NO 41, SEQ ID NO 42, SEQ ID NO 43, SEQ ID NO 44, SEQ ID NO 47, SEQ ID NO 48, SEQ ID NO 49, SEQ ID NO 50, SEQ ID NO 52, SEQ ID NO 53, SEQ ID NO 55, SEQ ID NO 56, SEQ ID NO 57, SEQ ID NO 58, SEQ ID NO 60, SEQ ID NO 61, SEQ ID NO 62, SEQ ID NO 63, SEQ ID NO 66, SEQ ID NO 67, SEQ ID NO 69, SEQ ID NO 72, SEQ ID NO 73, SEQ ID NO 74, SEQ ID NO 75, SEQ ID NO 76, SEQ ID NO 78, SEQ ID NO 79, SEQ ID NO 82, SEQ ID NO 83, SEQ ID NO 85, SEQ ID NO 86, SEQ ID NO 87, SEQ ID NO 89, SEQ ID NO 90, SEQ ID NO 91, SEQ ID NO 94, SEQ ID NO 95, SEQ ID NO 96, SEQ ID NO 98, SEQ ID NO 100, SEQ ID NO 101, SEQ ID NO 102, SEQ ID NO 103, SEQ ID NO 104, SEQ ID NO 106, SEQ ID NO 109, SEQ ID NO 110, SEQ ID NO 111, SEQ ID NO 114, SEQ ID NO 115, SEQ ID NO 117, SEQ ID NO 118, SEQ ID NO 119, SEQ ID NO 120, SEQ ID NO 123, SEQ ID NO 125, SEQ ID NO 128, SEQ ID NO 130, SEQ ID NO 133, SEQ ID NO 134, SEQ ID NO 135, SEQ ID NO 137, SEQ ID NO 138, SEQ ID NO 139, SEQ ID NO 140, SEQ ID NO 141, SEQ ID NO 142, SEQ ID NO 143, SEQ ID NO 147, SEQ ID NO 149, SEQ ID NO 151, SEQ ID NO 155, SEQ ID NO 157, SEQ ID NO 158, SEQ ID NO 159, SEQ ID NO 160, SEQ ID NO 165, SEQ ID NO 168, SEQ ID NO 173, SEQ ID NO 175, SEQ ID NO 178, SEQ ID NO 180, SEQ ID NO 181, SEQ ID NO 183, SEQ ID NO 185, SEQ ID NO 186, SEQ ID NO 189, SEQ ID NO 191, SEQ ID NO 193, SEQ ID NO 195, SEQ ID NO 196, SEQ ID NO 197, SEQ ID NO 199, SEQ ID NO 202, SEQ ID NO 204, SEQ ID NO 205, SEQ ID NO 206, SEQ ID NO 207, SEQ ID NO 209, SEQ ID NO 212, SEQ ID NO 216, SEQ ID NO 217, SEQ ID NO 218, SEQ ID NO 219, SEQ ID NO 220, SEQ ID NO 222, SEQ ID NO 223, SEQ ID NO 224, SEQ ID NO 226, SEQ ID NO 228, SEQ ID NO 229, SEQ ID NO 230, SEQ ID NO 231, SEQ ID NO 232, SEQ ID NO 235, and SEQ ID NO 241.
Even more preferred is a combination of genes referred to in Table 1 as SEQ ID NO 1, SEQ ID NO 2, SEQ ID NO 3, SEQ ID NO 4, SEQ ID NO 5, SEQ ID NO 6, SEQ ID NO 7, SEQ ID NO 8, SEQ ID NO 9, SEQ ID NO 12, SEQ ID NO 13, SEQ ID NO 14, SEQ ID NO 15, SEQ ID NO 16, SEQ ID NO17, SEQ ID NO18, SEQ ID NO19, SEQ ID NO20, SEQ ID NO 21, SEQ ID NO 22, SEQ ID NO 23, SEQ ID NO24, SEQ ID NO 25, SEQ ID NO 27, SEQ ID NO 28, SEQ ID NO 30, SEQ ID NO 31, SEQ ID NO 33, SEQ ID NO35, 36, SEQ ID NO 37,
SEQ ID NO 38, SEQ ID NO39, SEQ ID NO 40, SEQ ID NO 41, SEQ ID NO 42, SEQ ID NO 43, SEQ ID NO 44, SEQ ID NO 47, SEQ ID NO 48, SEQ ID NO 49, SEQ ID NO 50, SEQ ID NO 52, SEQ ID NO 53, SEQ ID NO 55, SEQ ID NO 56, SEQ ID NO 57, SEQ ID NO 58, SEQ ID NO 60, SEQ ID NO 61, SEQ ID NO 62, SEQ ID NO 63, SEQ ID NO 66, SEQ ID NO 67, SEQ ID NO 69, SEQ ID NO 72, SEQ ID NO 73, SEQ ID NO 74, SEQ ID NO 75, SEQ ID NO 76, SEQ ID NO 78, SEQ ID NO 79, SEQ ID NO 82, SEQ ID NO 83, SEQ ID NO 85, SEQ ID NO 86, SEQ ID NO 87, SEQ ID NO 89, SEQ ID NO 90, SEQ ID NO 91, SEQ ID NO 94, SEQ ID NO 95, SEQ ID NO 96, SEQ ID NO 98, SEQ ID NO 100, SEQ ID NO 101, SEQ ID NO 102, SEQ ID NO 103, SEQ ID NO 104, SEQ ID NO 106, SEQ ID NO 109, SEQ ID NO 110, SEQ ID NO 111, SEQ ID NO 114, SEQ ID NO 115, SEQ ID NO 117, SEQ ID NO 118, SEQ ID NO 119, SEQ ID NO 120, SEQ ID NO 123, SEQ ID NO 125, SEQ ID NO 128, SEQ ID NO 130, SEQ ID NO 133, SEQ ID NO 134, SEQ ID NO 135, SEQ ID NO 137, SEQ ID NO 138, SEQ ID NO 139, SEQ ID NO 140, SEQ ID NO 141, SEQ ID NO 142, SEQ ID NO 143, SEQ ID NO 147, SEQ ID NO 149, SEQ ID NO 151, SEQ ID NO 155, SEQ ID NO 157, SEQ ID NO 158, SEQ ID NO 159, SEQ ID NO 160, SEQ ID NO 165, SEQ ID NO 168, SEQ ID NO 173, SEQ ID NO 175, SEQ ID NO 178, SEQ ID NO 180, SEQ ID NO 181, SEQ ID NO 183, SEQ ID NO 185, SEQ ID NO 186, SEQ ID NO 189, SEQ ID NO 191, SEQ ID NO 193, SEQ ID NO 195, SEQ ID NO 196, SEQ ID NO 197, SEQ ID NO 199, SEQ ID NO 202, SEQ ID NO 204, SEQ ID NO 205, SEQ ID NO 206, SEQ ID NO 207, SEQ ID NO 209, SEQ ID NO 212, SEQ ID NO 216, SEQ ID NO 217, SEQ ID NO 218, SEQ ID NO 219, SEQ ID NO 220, SEQ ID NO 222, SEQ ID NO 223, SEQ ID NO 224, SEQ ID NO 226, SEQ ID NO 228, SEQ ID NO 229, SEQ ID NO 230, SEQ ID NO 231, SEQ ID NO 232, SEQ ID NO 235, and SEQ ID NO 241, with at least ten of the genes listed in Table 4, more preferred more preferred at least twenty of the genes listed in Table 4, more preferred at least fifty of the genes listed in Table 4, more preferred at least hundred of the genes 
listed in Table 4, more preferred all of the genes listed in Table 4.
A cell sample is a clinically relevant sample that comprises a colorectal cancer cell or an expression product comprising a nucleic acid from a colorectal cancer cell.
In a preferred embodiment, a cell sample according to the invention is obtained directly from the large intestine during surgery. In an alternative embodiment, the cell sample is prepared from a biopsy sample that is taken during colonoscopy.
It is further preferred that the biopsies have a depth of at most 10 millimeter, more preferred at most 5 millimeter, with a preferred diameter of about 2 millimeter, more preferred about 3 millimeter, more preferred about 4 millimeter, more preferred about 5 millimeter, more preferred about 6 millimeter, more preferred about 7 millimeter, more preferred about 8 millimeter, more preferred about 9 millimeter, more preferred about 10 millimeter. However, other forms that are equal in size and total volume are also possible. In another preferred embodiment, the tissue sample comprises stool or blood voided by a patient suffering from colorectal cancer, said tissue sample comprising a colorectal cancer cell or a gene expression product such as a nucleic acid product from a colorectal cancer cell. Methods to purify cells or gene expression products such as RNA from human stool or blood samples are known in the art and have been described for example in patent application WO199820355, WO2003068788, and Yang et al. 2005. Cancer Lett 226: 55-63.
Samples can be processed in numerous ways, as is known to a skilled person. For example, they can be freshly prepared from cells or tissues at the moment of harvesting, or they can be prepared from samples that are stored at -70°C until processed for sample preparation. Alternatively, tissues, biopsies, stool or blood samples can be stored under conditions that preserve the quality of the protein or RNA. Examples of these preservative conditions are fixation using e.g. formalin, RNase inhibitors such as RNAsin (Pharmingen) or RNasecure (Ambion), aqueous solutions such as RNAlater (Assuragen; US06204375), Hepes-Glutamic acid buffer mediated Organic solvent Protection Effect (HOPE; DE10021390), and RCL2 (Alphelys; WO04083369), and non-aqueous solutions such as Universal Molecular Fixative (Sakura Finetek USA Inc.; US7138226).
The RNA level of at least five of the genes listed in Table 1 and/or Table 4 can be determined by any method known in the art.
Methods to determine RNA levels of genes are known to a skilled person and include, but are not limited to, Northern blotting, quantitative PCR, and microarray analysis.
Northern blotting comprises the quantification of the nucleic acid expression product of a specific gene by hybridizing a labeled probe that specifically interacts with said nucleic acid expression product, after separation of nucleic acid expression products by gel electrophoresis. Quantification of the labeled probe that has interacted with said nucleic acid expression product serves as a measure for determining the level of expression. The determined level of expression can be normalized for differences in the total amounts of nucleic acid expression products between two separate samples by comparing the level of expression of a gene that is known not to differ in expression level between samples.
Quantitative Polymerase Chain Reaction (qPCR) provides an alternative method to quantify the level of expression of nucleic acids. qPCR can be performed by real-time PCR (rtPCR), in which the amount of product is monitored during the reaction, or by end-point measurements, in which the amount of a final product is determined. As is known to a skilled person, rtPCR can be performed either by the use of a nucleic acid intercalator, such as for example ethidium bromide or SYBR® Green I dye, which interacts with all generated double-stranded products, resulting in an increase in fluorescence during amplification, or by the use of labeled probes that react specifically with the generated double-stranded product of the gene of interest. Alternative detection methods that can be used are provided by dendrimer signal amplification, hybridization signal amplification, and molecular beacons.
Different amplification methods, known to a skilled artisan, can be employed for qPCR, including but not limited to PCR, rolling circle amplification, nucleic acid sequence-based amplification, transcription mediated amplification, and linear RNA amplification.
For the simultaneous detection of multiple nucleic acid gene expression products, qPCR methods such as reverse transcriptase-multiplex ligation-dependent amplification (rtMLPA), which accurately quantifies up to 45 transcripts of interest in a one-tube assay (Eldering et al., Nucleic Acids Res 2003; 31: e153), can be employed.
Microarray analyses involve the use of selected biomolecules that are immobilized on a surface. A microarray usually comprises nucleic acid molecules, termed probes, which are able to hybridize to nucleic acid expression products. The probes are exposed to labeled sample nucleic acid, hybridized, and the abundance of nucleic acid expression products in the sample that are complementary to a probe is determined. The probes on a microarray may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The probes may also comprise DNA and/or RNA analogues such as, for example, nucleotide analogues or peptide nucleic acid molecules (PNA), or combinations thereof. The sequences of the probes may be full or partial fragments of genomic DNA. The sequences may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences.
It is preferred that said RNA levels are determined simultaneously. Simultaneous analyses can be performed, for example, by multiplex qPCR and microarray analysis. Microarray analyses allow the simultaneous determination of the nucleic acid levels of expression of a large number of genes, such as more than 50 genes, more than 100 genes, more than 1000 genes, or even more than 10,000 genes, allowing the use of a large number of gene expression data for normalization of the genes comprising the colorectal expression profile.
In a preferred embodiment, therefore, said RNA levels are determined by microarray analysis.
Said probe is specific for a gene listed in Table 1 and/or Table 4. A probe can be specific when it comprises a continuous stretch of nucleotides that are completely complementary to a nucleotide sequence of an RNA product of said gene, or a cDNA product thereof. A probe can also be specific when it comprises a continuous stretch of nucleotides that are partially complementary to a nucleotide sequence of an RNA product of said gene, or a cDNA product thereof. Partially means that a maximum of 5% of the nucleotides in a continuous stretch of at least 20 nucleotides differs from the corresponding nucleotide sequence of an RNA product of said gene. The term complementary is known in the art and refers to a sequence that is related by base-pairing rules to the sequence that is to be detected. It is preferred that the sequence of the probe is carefully designed to minimize nonspecific hybridization to said probe. It is preferred that the probe is or mimics a single stranded nucleic acid molecule. The length of said complementary continuous stretch of nucleotides can vary between 15 bases and several kilobases, and is preferably between 20 bases and 1 kilobase, more preferred between 40 and 100 bases, and most preferred 60 nucleotides.
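By way of illustration only, the specificity criterion described above can be sketched in Python. The function names, the choice of a DNA probe read against an RNA target, and the example sequences are assumptions made for this sketch; only the 5% mismatch limit and the minimum stretch of 20 nucleotides are taken from the text.

```python
# Watson-Crick partners of a DNA probe base on an RNA target (assumed alphabet).
_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mismatch_fraction(probe_dna: str, target_rna: str) -> float:
    """Fraction of probe positions that do not base-pair with the target.

    Both sequences are written 5'->3'; because hybridization is antiparallel,
    the probe is compared against the reversed target segment.
    """
    if len(probe_dna) != len(target_rna):
        raise ValueError("probe and target segment must have equal length")
    mismatches = sum(
        1
        for p, t in zip(probe_dna.upper(), reversed(target_rna.upper()))
        if _PAIR.get(p) != t
    )
    return mismatches / len(probe_dna)


def is_specific(probe_dna: str, target_rna: str,
                min_len: int = 20, max_mismatch: float = 0.05) -> bool:
    """True if the stretch is long enough and at most 5% of positions differ."""
    if len(probe_dna) < min_len:
        return False
    return mismatch_fraction(probe_dna, target_rna) <= max_mismatch


# Hypothetical 20-nucleotide target and probes, for illustration only.
target = "ACGUACGUACGUACGUACGU"
perfect_probe = "ACGTACGTACGTACGTACGT"       # fully complementary
one_mismatch_probe = "TCGTACGTACGTACGTACGT"  # 1 of 20 positions differs (5%)
two_mismatch_probe = "TGGTACGTACGTACGTACGT"  # 2 of 20 positions differ (10%)
```

Under these assumptions, the first two probes would count as specific for the target and the third would not.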
To determine the RNA level of at least two of the genes listed in Table 1 and/or Table 4, the RNA sample is preferably labeled, either directly or indirectly, and contacted with probes on the array under conditions that favor duplex formation between a probe and a complementary molecule in the labeled RNA sample. The amount of label that remains associated with a probe after washing of the microarray can be determined and is used as a measure for the level of RNA of a nucleic acid molecule that is complementary to said probe.
The determined RNA levels for genes listed in Table 1 and/or Table 4 can be normalized to correct for systemic bias. Systemic bias results in variation by inter-array differences in overall performance, which can be due, for example, to inconsistencies in array fabrication, staining and scanning, and variation between labeled RNA samples, which can be due, for example, to variations in purity. Systemic bias can be introduced during the handling of the sample in a microarray experiment. To reduce systemic bias, the determined RNA levels are preferably corrected for background non-specific hybridization and normalized using, for example, Feature Extraction software (Agilent Technologies). Other methods that are or will be known to a person of ordinary skill in the art, such as a dye swap experiment (Martin-Magniette et al., Bioinformatics 21:1995-2000 (2005)), can also be applied to normalize differences introduced by dye bias.
Conventional methods for normalization of array data include global analysis, which is based on the assumption that the majority of genetic markers on an array are not differentially expressed between samples [Yang et al., Nucl Acids Res 30: 15 (2002)]. Alternatively, the array may comprise specific probes that are used for normalization. These probes preferably detect RNA products from housekeeping genes such as glyceraldehyde-3-phosphate dehydrogenase and 18S rRNA levels, of which the RNA level is thought to be constant in a given cell and independent from the developmental stage or prognosis of said cell.
Therefore, a preferred method according to the invention further comprises normalizing the determined RNA levels of said set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
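As a non-limiting illustration of normalization with dedicated probes, the following Python sketch expresses each log10 signal relative to the mean log10 signal of a set of normalization genes. The subtraction scheme, the function name and the gene labels are assumptions chosen for the example; this is not the procedure of any particular software package.

```python
import math


def normalize_to_reference_genes(intensities, reference_genes):
    """Return log10 intensities shifted by the mean log10 of the reference genes.

    intensities: dict mapping a gene label to a positive raw signal.
    reference_genes: labels of genes assumed to have a constant RNA level.
    """
    if not reference_genes:
        raise ValueError("at least one reference gene is required")
    logs = {gene: math.log10(value) for gene, value in intensities.items()}
    offset = sum(logs[gene] for gene in reference_genes) / len(reference_genes)
    return {gene: value - offset for gene, value in logs.items()}


# Hypothetical signals; the labels are placeholders, not measured data.
raw = {"GAPDH": 1000.0, "RNA18S": 10.0, "GENE_X": 100.0}
normalized = normalize_to_reference_genes(raw, ["GAPDH", "RNA18S"])
```

In this sketch, a gene whose signal equals the geometric mean of the reference signals obtains a normalized value of zero.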
In a preferred embodiment, genes are selected of which the RNA levels are largely constant between different tissue samples comprising colorectal cells from one individual, and between tissue samples comprising colorectal cells from different individuals. It is furthermore preferred that RNA levels of said set of normalization genes differ between the genes. For example, it is preferred to select genes with a low RNA level in said tissue sample, and genes with a high RNA level. More preferred is to select genes with a low RNA level in said tissue sample, genes with a moderate RNA level, and genes with a high RNA level. It will be clear to a skilled artisan that the RNA levels of said set of normalization genes preferably allow normalization over the whole range of RNA levels.
A preferred method of the invention comprises determining the RNA level of at least five of the genes listed in Table 2 in an RNA sample of an individual suffering from colorectal cancer, and using the determined RNA levels for normalizing the determined RNA levels of the set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
Thus, the invention also provides a method according to the invention, whereby the RNA level of at least five of the genes listed in Table 2 are used for normalizing the determined RNA levels of the set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
The genes listed in Table 2 were identified because their expression level was largely constant when comparing different RNA samples from one individual, and when comparing relevant RNA samples from different individuals.
In a preferred embodiment, a method of the invention comprises determining the RNA levels of at least six of the genes listed in Table 2, more preferred at least seven of the genes listed in Table 2, more preferred at least eight of the genes listed in Table 2, more preferred at least nine of the genes listed in Table 2, more preferred at least ten of the genes listed in Table 2, more preferred at least fifteen of the genes listed in Table 2, more preferred at least twenty of the genes listed in Table 2, more preferred at least twenty-five of the genes listed in Table 2, more preferred at least thirty of the genes listed in Table 2, more preferred at least forty of the genes listed in Table 2, more preferred at least fifty of the genes listed in Table 2, more preferred at least sixty of the genes listed in Table 2, more preferred at least seventy of the genes listed in Table 2, more preferred at least eighty of the genes listed in Table 2, more preferred hundred of the genes listed in Table 2, more preferred hundred-fifty of the genes listed in Table 2, more preferred two-hundred of the genes listed in Table 2, more preferred all of the genes listed in Table 2.
The RNA level of the individual genes listed in Table 2 ranges from low to high. In a preferred embodiment, said at least five genes from Table 2 are selected to include genes of which the RNA levels largely cover the range of RNA levels from low to high.
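One conceivable way of selecting normalization genes whose RNA levels cover the range from low to high is sketched below in Python. The even-spacing rule, the function name and the example values are assumptions for illustration; the text does not prescribe a particular selection procedure.

```python
def pick_spanning_genes(mean_levels, k=5):
    """Pick k genes spread evenly over the ranked range of mean RNA levels.

    mean_levels: dict mapping a gene label to its mean (e.g. log10) RNA level.
    The lowest-ranked and highest-ranked genes are always included.
    """
    n = len(mean_levels)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of candidate genes")
    ranked = sorted(mean_levels, key=mean_levels.get)
    # Integer arithmetic yields k distinct, evenly spaced rank positions.
    positions = [(i * (n - 1)) // (k - 1) for i in range(k)]
    return [ranked[p] for p in positions]


# Hypothetical candidate genes with hypothetical mean levels.
candidates = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8, "i": 9}
selected = pick_spanning_genes(candidates, k=5)
```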
In a preferred method of the invention, the method further comprises multiplying each of said determined values with a predetermined constant for said gene to obtain a weighted value for the relative RNA level of said gene, and thereby a set of weighted values for said set of genes, said method further comprising typing said sample on the basis of said set of weighted values.
Said set of weighted values can be summed and compared to a summed set of weighted values from a reference sample. It is preferred that said summed set of weighted values is compared to a classification threshold that is determined by the values obtained from RNA samples of which the typing is known.
Colorectal cancers such as adenocarcinomas are staged dependent on the visible invasiveness of the surrounding tissue. According to the Modified Duke Staging System, stage A adenocarcinoma is defined as a cancer that penetrates into the mucosa of the bowel wall but no further. Stage B1 defines a cancer that penetrates into, but not through the muscularis propria (the muscular layer) of the bowel wall, while stage B2 defines a cancer that penetrates into and through the muscularis propria of the bowel wall. Stage C1 defines a cancer that penetrates into, but not through the muscularis propria of the bowel wall, combined with pathologic evidence of colon cancer in the lymph nodes, while stage C2 defines a cancer which penetrates into and through the muscularis propria of the bowel wall, combined with pathologic evidence of colon cancer in the lymph nodes. Finally, stage D defines a cancer which has spread beyond the confines of the lymph nodes to organs such as the liver, lung or bone.
An alternative staging system is provided by the TNM Staging System, which combines data about the Tumor (T), the spread to lymph nodes (N), and the existence of distant metastases (M). A TNM stage I colorectal cancer is defined as a cancer that began to spread and has invaded the submucosa or the muscularis propria. A TNM stage I is equal to a Duke's stage A. TNM stage II defines a cancer that has invaded through the muscularis propria into the subserosa, or into the pericolic or perirectal tissues, but has not reached the lymph nodes. A stage III defines a cancer that has spread to the lymph nodes in the absence of distant metastases. A stage IV defines a cancer that has spread to distant sites.
It has been estimated that the absolute benefit from adjuvant chemotherapy for a TNM stage III patient is about 8.5%, while the absolute benefit from adjuvant chemotherapy for a TNM stage II patient is 3.6 %. Given that many of the treatments are very toxic, with perhaps a toxic death rate of between 0.5% and 1%, then accurately defining prognosis for an individual is an essential part of that patient's management.
Although undoubtedly real, the benefit from chemotherapy for stage II patients is small, with the proportional reduction in the risk of death being 18% (CI 5% to 30%), which translates to an absolute improvement in five-year survival of about 3.6% (1.0% to 6.0%) for a stage II patient with five-year mortality of 20%. It is clear that some TNM stage II patients have a reasonable prognosis, and the balance of pros and cons of chemotherapy for these individuals might result in favor of not having adjuvant therapy. Current recommendations are that stage II patients with a higher than average risk of tumor recurrence, e.g. T4 stage or vascular invasion, comprising about 15% of the population, should be offered chemotherapy. However, such prognostic markers have not been proven to increase treatment benefit (Benson AB, Schrag D, Sommerfield MR et al. (2004) American Society of Clinical Oncology recommendations on adjuvant chemotherapy for stage II colon cancer. J Clin Oncol 22(16):3408-19).
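The relation between the proportional and the absolute figures quoted above can be checked with a short calculation, sketched here in Python. The function name is an assumption; the inputs (five-year mortality of 20% and proportional reductions of 5%, 18% and 30%) are the values given in the text.

```python
def absolute_benefit(baseline_mortality, proportional_reduction):
    """Absolute reduction in mortality = baseline mortality x proportional reduction."""
    return baseline_mortality * proportional_reduction


baseline = 0.20  # five-year mortality of the stage II patient in the text

# Lower confidence bound, point estimate and upper confidence bound.
for reduction in (0.05, 0.18, 0.30):
    benefit = absolute_benefit(baseline, reduction)
    print(f"{reduction:.0%} proportional reduction -> {benefit:.1%} absolute benefit")
```

With these inputs the calculation reproduces the quoted absolute improvement of about 3.6%, with bounds of 1.0% and 6.0%.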
Therefore, in a preferred embodiment, the invention provides a method of typing an individual suffering from colorectal cancer, wherein said colorectal cancer comprises a TNM stage II or TNM stage III colorectal cancer as determined by the TNM Staging System.
It is preferred that said typing in a method according to the invention allows differentiating cancer cells with a low metastasizing potential or risk of cancer recurrence and cancer cells with a high metastatic potential or risk of cancer recurrence.
To differentiate cancer cells with a low metastasizing potential and cancer cells with a high metastatic potential, the RNA levels of at least five of the genes listed in Table 1 and/or Table 4 can be compared to RNA levels of said genes in a reference sample.
The reference sample can be an RNA sample isolated from a colorectal tissue from a healthy individual, or an RNA sample from a relevant cell line or mixture of cell lines. Said reference sample can also be an RNA sample from a cancerous growth of an individual suffering from colorectal cancer. Said individual suffering from colorectal cancer can have an increased risk of cancer recurrence, or a low risk of cancer recurrence.
It is preferred that said reference sample is an RNA sample from an individual suffering from colorectal cancer and having a low risk of cancer recurrence. In a more preferred embodiment, said reference sample is a pooled RNA sample from multiple tissue samples comprising colorectal cells from individuals suffering from colorectal cancer and having a low risk of cancer recurrence. It is preferred that said multiple tissue sample comprises more than 10 tissue samples, more preferred more than 20 tissue samples, more preferred more than 30 tissue samples, more preferred more than 40 tissue samples, most preferred more than 50 tissue samples.
The reference sample could also be RNA isolated and pooled from colon tissue from healthy individuals, or from so called normal adjacent tissue from colon cancer patients or RNA from a generic cell line or cell line mixture. The RNA from a cell line or cell line mixture can be produced in-house or obtained from a commercial source such as, for example, Stratagene Human Reference RNA.
Typing of a sample can be performed in various ways. In one method, a coefficient is determined that is a measure of the similarity or dissimilarity of a sample with said reference sample. A number of different coefficients can be used for determining a correlation between the RNA expression level in an RNA sample from an individual and a reference sample. Preferred methods are parametric methods which assume a normal distribution of the data. One of these methods is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Preferred methods comprise cosine-angle, uncentered correlation and, more preferred, cosine correlation (Fan et al., Conf Proc IEEE Eng Med Biol Soc. 5:4810-3 (2005)).
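The difference between the Pearson coefficient and a cosine (uncentered) correlation can be illustrated with the following Python sketch. The function names and the example vectors are assumptions; the formulas are the standard textbook definitions and are not taken from the cited reference.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # The 1/n factors of covariance and standard deviations cancel.
    return cov / (sd_x * sd_y)


def cosine(x, y):
    """Uncentered correlation: the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical RNA levels of three genes in a sample and in a reference.
sample = [1.0, 2.0, 3.0]
reference = [3.0, 2.0, 1.0]
r_pearson = pearson(sample, reference)  # profiles are mean-centered first
r_cosine = cosine(sample, reference)    # profiles are used as measured
```

Because the cosine correlation does not subtract the mean, two profiles can be inversely correlated by the Pearson measure and still have a positive cosine correlation, as in this example.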
Preferably, said correlation with a reference sample is used to produce an overall similarity score for the set of genes that are used. A similarity score is a measure of the average correlation of RNA levels of a set of genes in an RNA sample from an individual and a reference sample. Said similarity score can be a numerical value between +1, indicative of a high correlation between the RNA expression level of the set of genes in a RNA sample of said individual and said reference sample, and -1, which is indicative of an inverse correlation and therefore indicative of having an increased risk of cancer recurrence (van 't Veer et al., Nature 415: 484-5 (2002)).
In another aspect, the invention provides a method of classifying an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis by a method comprising determining a similarity value between RNA levels from a set of at least five genes listed in Table 1 and/or Table 4 in a RNA sample from said individual and a level of expression from said set of genes in a RNA sample from a patient having no recurrent disease within five years of initial diagnosis, and classifying said individual as having a poor prognosis if said similarity value is below a similarity threshold value, and classifying said individual as having a good prognosis if said similarity value exceeds a similarity threshold value.
A preferred method of classifying samples as either high or low risk for disease recurrence involves the use of a classification template, derived from Support Vector Machine (SVM) training using all genes identified as being correlated with disease progression. Each gene in the template (signature) has a corresponding weighting factor, as determined by the SVM implementation by Chang & Lin (Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm). This algorithm analyses the information contained in the 241 genes across all training set samples and constructs a classification template that best separates patients with recurrence from those without.
LIBSVM, developed by Chih-Chung Chang and Chih-Jen Lin, is an integrated software for analyzing many problems in supervised classification or regression frameworks. By multiplying the log10 expression value for each gene in the template (n=241) by its corresponding weighting factor, a weighted log ratio is obtained. The sum of these weighted ratios is then calculated to produce a classification index for the sample being tested. The index is then compared to a classification threshold, also determined by the SVM algorithm to achieve maximum separation between high and low risk groups in the training set. If the index of the test sample is greater than the threshold (-0.107) the sample is classified as low risk. If the index is lower than the threshold (-0.107) the sample is classified as high risk.
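The calculation of the classification index can be sketched in Python as follows. Only the threshold of -0.107 and the rule "greater than the threshold is low risk" are taken from the text; the gene labels, weighting factors and expression values are hypothetical placeholders, and assigning an index exactly equal to the threshold to the high risk group is an assumption, since the text does not address that case.

```python
CLASSIFICATION_THRESHOLD = -0.107  # value stated in the text


def classification_index(log10_values, weights):
    """Sum over the template genes of weighting factor x log10 expression value."""
    missing = set(weights) - set(log10_values)
    if missing:
        raise KeyError(f"no expression value for genes: {sorted(missing)}")
    return sum(weights[gene] * log10_values[gene] for gene in weights)


def classify(index, threshold=CLASSIFICATION_THRESHOLD):
    """Low risk above the threshold; otherwise high risk (tie handling assumed)."""
    return "low risk" if index > threshold else "high risk"


# Hypothetical two-gene template; the real template comprises 241 genes.
weights = {"gene_1": 0.5, "gene_2": -1.0}
sample_a = {"gene_1": 0.2, "gene_2": 0.4}  # index = 0.1 - 0.4 = -0.3
sample_b = {"gene_1": 0.2, "gene_2": 0.1}  # index = 0.1 - 0.1 = 0.0

call_a = classify(classification_index(sample_a, weights))
call_b = classify(classification_index(sample_b, weights))
```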
A similarity threshold value is an arbitrary value that allows discriminating between RNA samples from patients with a high risk of cancer recurrence, and RNA samples from patients with a low risk of cancer recurrence.
Said similarity threshold value is set at a value at which an acceptable number of patients with known metastasis formation within five years after initial diagnosis would score as false negatives above the threshold value, and an acceptable number of patients without known metastasis formation within five years after initial diagnosis would score as false positives below the threshold value.
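How a candidate similarity threshold value could be evaluated against samples of known outcome is sketched below in Python. The function name, the similarity scores and the outcomes are hypothetical; the sketch merely counts the false negatives and false positives as defined above, and scores exactly equal to the threshold are counted as neither, which is an assumption.

```python
def error_counts(scores, metastasis, threshold):
    """Count misclassified patients for a candidate similarity threshold.

    scores: similarity scores, one per patient.
    metastasis: True if metastasis formation within five years is known.
    Returns (false_negatives, false_positives), where a false negative is a
    patient with metastasis scoring above the threshold and a false positive
    is a patient without metastasis scoring below it.
    """
    false_negatives = sum(
        1 for s, m in zip(scores, metastasis) if m and s > threshold
    )
    false_positives = sum(
        1 for s, m in zip(scores, metastasis) if not m and s < threshold
    )
    return false_negatives, false_positives


# Hypothetical scores and outcomes for four patients.
scores = [0.8, 0.2, -0.1, 0.5]
metastasis = [False, True, True, True]

counts_low = error_counts(scores, metastasis, threshold=0.3)
counts_high = error_counts(scores, metastasis, threshold=0.9)
```

Raising the threshold in this example removes the false negative at the cost of one false positive, which is the trade-off the choice of threshold has to balance.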
A similarity score and/or a resultant of said score, which is a measurement of a high risk or a low risk of cancer recurrence, is preferably displayed or outputted to a user interface device, a computer readable storage medium, or a local or remote computer system.
In an alternative embodiment the invention provides a method of classifying an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis by a method comprising (a) providing an RNA sample from said individual that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells; (b) determining a level of RNA for a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4 in said sample; (c) determining a similarity value between a level of expression from the set of genes in said individual and a level of expression from said set of genes in a patient having no recurrent disease within five years of initial diagnosis; and (d) classifying said individual as having a poor prognosis if said similarity value is below a first similarity threshold value, and classifying said individual as having a good prognosis if said similarity value exceeds said first similarity threshold value.
In a preferred method of the invention, the level of RNA for a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4 in said sample is normalized. Normalization can be performed by any method known to a skilled artisan, including global analysis and the use of specific probes. In a preferred embodiment, the RNA levels of at least 5 of the genes listed in Table 2 are used for normalization.
In yet another aspect, the invention provides a method of assigning treatment to an individual suffering from colorectal cancer, comprising classifying said individual as having a poor prognosis or a good prognosis with a method according to the invention, and assigning adjuvant chemotherapy if said individual is classified as having said poor prognosis.
The routine treatment for colorectal cancer is surgery, which may be followed by additional treatment, such as adjuvant chemotherapy and radiotherapy. Chemotherapy comprises the use of natural or non-natural compounds to eliminate fast-dividing, and therefore susceptible, cancer cells. Chemotherapeutic compounds comprise alkylating agents such as dacarbazine and cyclophosphamide; DNA crosslinking agents such as cisplatin and carboplatin; antimetabolite agents such as methotrexate, 5-fluorouracil (5FU), and mercaptopurine; alkaloidic agents such as taxanes such as paclitaxel and docetaxel, vincristine and vinblastine; topoisomerase inhibitors such as camptothecins and amsacrine; antibiotics such as anthracycline glycosides such as doxorubicin, daunorubicin, idarubicin, pirarubicin, and epirubicin; mitomycin; polyamine biosynthesis inhibitors such as eflornithine; mycophenolic acid and other inosine monophosphate dehydrogenase inhibitors; and anthrapyrazoles such as mitoxantrone, piroxantrone, and losoxantrone.
The current standard surgical adjuvant treatment for colorectal cancer comprising modified TNM Stage III or higher is FOLFOX 4. FOLFOX combines oxaliplatin, leucovorin, and infusional 5FU. Leucovorin is a drug that is used to enhance the anti-cancer effect of chemotherapy, and especially 5FU. Other therapies used are XELOX, a combination therapy comprising oxaliplatin and capecitabine; and FOLFIRI, which combines 5FU, leucovorin, and irinotecan, a topoisomerase 1 inhibitor. These can be combined with antibody-based therapeutics including but not limited to bevacizumab, which inhibits angiogenesis, cetuximab, an Epidermal Growth Factor Receptor inhibitor, and panitumumab, an Epidermal Growth Factor Receptor inhibitor.
A cancer that originates in the colon or rectum is termed a colorectal cancer, bowel cancer, colon cancer, or rectal cancer. There are differences in treatment approaches for colon and rectal cancers. These differences involve the use of neoadjuvant therapy, the kind of surgery and the use of chemotherapy alone or chemotherapy plus radiation.
In a preferred embodiment, a colorectal cancer according to the invention relates to a colon cancer.
In another preferred embodiment, a colorectal cancer according to the invention relates to a rectal cancer.
The colon and rectum are part of the digestive tract. The colon absorbs water, electrolytes, and nutrients from food and transports them into the bloodstream. It is about 6 feet in length and consists of the cecum (connects to the small intestine at the cecal valve), the ascending colon (the vertical segment located on the right side of the abdomen), the transverse colon (extends across the abdomen), the descending colon (leads vertically down the left side of the abdomen), and the sigmoid colon (extends to the rectum).
The rectum is the last segment of the large intestine. It is 8 to 10 inches in length and leads to the anus, which is the opening to the outside of the body. Waste material (fecal matter) is stored in the rectum until it is eliminated from the body through the anus.
In yet another aspect, the invention provides a method for typing colorectal cancer cells according to the invention to select patients having an increased chance of responding to therapy. Said method can help to obtain an appropriate definition of a patient population by revealing potential disease subtypes that may differ in etiology, pathogenesis, and response to treatments. For example, said method can be instrumental for identifying subsets of colorectal cancer patients who are at risk for certain complications or who preferentially benefit from specific treatments. Information about colorectal subtypes could also substantially improve the design of future colorectal clinical studies by improving patient selection, reducing variability, and focusing on relevant outcome measures.
In yet another aspect, the invention relates to an array, comprising between 5 and 12,000 nucleic acid molecules comprising a first set of nucleic acid molecules wherein each nucleic acid molecule of said first set comprises a nucleotide sequence that is able to hybridize to a different gene selected from the genes listed in Table 1 and/or Table 4.
Said first set of nucleic acid molecules comprises at least five of the genes listed in Table 1, more preferred at least six of the genes listed in Table 1 and/or Table 4, more preferred at least seven of the genes listed in Table 1 and/or Table 4, more preferred at least eight of the genes listed in Table 1 and/or Table 4, more preferred at least nine of the genes listed in Table 1 and/or Table 4, more preferred at least ten of the genes listed in Table 1 and/or Table 4, more preferred at least fifteen of the genes listed in Table 1 and/or Table 4, more preferred at least twenty of the genes listed in Table 1 and/or Table 4, more preferred at least twenty-five of the genes listed in Table 1 and/or Table 4, more preferred at least thirty of the genes listed in Table 1 and/or Table 4, more preferred at least forty of the genes listed in Table 1 and/or Table 4, more preferred at least fifty of the genes listed in Table 1 and/or Table 4.
In a preferred embodiment, an array according to the invention further comprises a second set of nucleic acid molecules wherein each nucleic acid molecule of said second set comprises a nucleotide sequence that is able to hybridize to a normalization gene, whereby it is preferred that the RNA levels of said normalization genes are dissimilar.
It is preferred that said second set of nucleic acid molecules comprises nucleic acid molecules that are able to hybridize to at least 5 of the genes listed in Table 2.
In yet another aspect, the invention provides the use of an array according to the invention for obtaining a colorectal expression profile.
Figure legends
Figure 1: Hierarchical clustering of the 241 genes used for prognostic classification in the training set of patients
Figure 2: (a) Scatter plot of prediction indices for samples in the training set. Example of optimal classification (overtrained), as this dataset was used to train the algorithm itself. (b) Scatter plot of prediction indices vs. months of follow-up time for the validation set. Patients with recurrence in the time they have been followed up are shown in red. Patients without recurrence in the length of follow-up time are shown as blue squares. The orange dashed line at -0.107 indicates the classification threshold used to determine if a sample belongs to the high or low risk category.
Figure 3: Kaplan Meier analysis of time to recurrence for all stage II and III patients in a training set as classified by 241 gene SVM classifier. Log rank test p-value = <0.0001
Figure 4: Kaplan Meier analysis of time to recurrence for stage II patients in training set, as classified by 241 gene SVM classifier. Log rank test p-value = 0.056
Figure 5: Kaplan Meier analysis of time to recurrence for stage II patients in validation set, as classified by 241 gene SVM classifier. Log rank test p-value = 0.098
Examples
Example 1 Generation of classifier
Patients
Clinical and pathological information documented at the time of surgery included stage, grade, size and location of tumors. Additionally, the number of lymph nodes assessed for nodal involvement was described in 95% of cases. Tumors were staged according to the TNM staging system. All tissue samples were collected from patients with appropriate informed consent. The study was carried out in accordance with the ethical standards of the Helsinki Declaration and was approved by the Medical Ethical Board of the participating medical centers and hospitals. Patients were monitored for survival and recurrence for up to 270 months. Detailed patient information can be found in Table 3.
Microarray hybridization
Aliquots of total RNA from frozen tumor samples were available for this study. Two hundred nanograms of total RNA were amplified using the Low RNA Input Fluorescent Labeling Kit (Agilent Technologies). Cyanine 3-CTP or Cyanine 5-CTP (GE Health Care) was directly incorporated into the cRNA during in vitro transcription. A total of 750 ng of Cyanine-labeled RNA was co-hybridized with a standard reference to Agilent 44k oligonucleotide microarrays at 60 degrees Celsius for 17 hrs and subsequently washed according to the Agilent standard hybridization protocol (Agilent Oligo Microarray Kit, Agilent Technologies).
StrataGene Universal Human Reference RNA (Catalog #740000) was used in the reference channel of each microarray.
Microarray image analysis
Fluorescence intensities on scanned images were quantified, values corrected for background non-specific hybridization, and normalized using Agilent Feature Extraction software (Version 8.5.1) according to the manufacturer's recommended settings for the Agilent Whole Genome 44k microarray. The default normalisation procedure for this microarray includes a linear and loess component, which corrects for any difference in Cy3/5 dye incorporation and centers the final profile at 0 (log10 scale, Cy5/Cy3). This process is described in the Agilent Feature Extraction 8.5 Reference Guide.
Data pre-processing
For each tumor specimen, two microarray hybridisations were carried out.
Straight: Cy5, Sample RNA; Cy3, Reference RNA
Dyeswap: Cy5, Reference RNA; Cy3, Sample RNA
The normalised gene expression ratios from each hybridisation were combined to produce a single gene expression profile, per patient, using Agendia XPrint software (version 1.5.1). To obtain a single expression ratio value for each of the signature genes on the array, an error-weighted mean value was calculated for the probes belonging to the same gene as log10 ratios. To establish appropriate relative weights, the Rosetta error model was used, which corrects for the uncertainties in individual probe measurements (Weng L et al., Bioinformatics 22:1111-21 (2006)). A text file containing normalised, error-weighted log ratios was generated, which was then used for further analysis. The data were then loaded into BRB ArrayTools (Simon et al., Cancer Informatics 2: 11-17 (2007)). To obtain a single expression ratio value for each unique probe on the array, a mean ratio value was calculated for all probes present more than one time. To correct for any systematic variation between training and validation sets, all data were normalised to the median expression of a list of 300 normalisation probes, selected from the training set. (ColoPrint_Normalisation300_ProbeIDs.xls)
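As a rough illustration of the error-weighted averaging described above, the following sketch combines several probe log10 ratios into one value per gene. The exact weighting used by the Rosetta error model is not given in this text, so the inverse-variance weights (1/error²), the function name and the example numbers are assumptions made for illustration only.

```python
def error_weighted_mean(log_ratios, errors):
    """Combine probe log10 ratios into one value.

    Assumption: each probe is weighted by 1 / error**2 (inverse variance).
    Returns the weighted mean and the propagated error of that mean.
    """
    if len(log_ratios) != len(errors) or not log_ratios:
        raise ValueError("log_ratios and errors must be non-empty and equally long")
    weights = [1.0 / (e * e) for e in errors]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, log_ratios)) / total_weight
    combined_error = (1.0 / total_weight) ** 0.5
    return mean, combined_error


# Hypothetical example: two probes of one gene with equal measurement error.
mean, combined_error = error_weighted_mean([0.20, 0.40], [0.1, 0.1])
print(round(mean, 3), round(combined_error, 4))
```

With equal errors the result reduces to the ordinary mean; a probe with a larger error contributes proportionally less.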
Probes for normalisation were selected by taking all probes with fewer than 5% missing values across the training set and with a coefficient of variation (CV, i.e. standard deviation of log ratio / mean log ratio) < 0.01.
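The selection rule above can be sketched as follows. This is a minimal illustration, not the software actually used: the data layout (a dictionary of probe identifiers mapping to log ratios, with `None` marking a missing value), the function name, and the use of the absolute value of the mean in the CV are assumptions.

```python
import statistics


def select_normalisation_probes(data, max_missing=0.05, max_cv=0.01):
    """Return probe ids with < max_missing missing values and CV < max_cv.

    data: dict mapping probe id -> list of log ratios over the training
    arrays, with None for a missing value (assumed layout).
    """
    selected = []
    for probe_id, values in data.items():
        present = [v for v in values if v is not None]
        missing_fraction = 1 - len(present) / len(values)
        if missing_fraction >= max_missing or len(present) < 2:
            continue
        mean = statistics.mean(present)
        if mean == 0:
            continue  # CV is undefined when the mean log ratio is zero
        cv = statistics.stdev(present) / abs(mean)  # abs() is an assumption
        if cv < max_cv:
            selected.append(probe_id)
    return selected


# Hypothetical toy data: one stable probe, one variable probe, one sparse probe.
toy = {
    "stable": [1.0, 1.001, 0.999, 1.0],
    "variable": [1.0, 2.0, 0.5, 1.5],
    "sparse": [1.0, None, 1.0, 1.0],
}
print(select_normalisation_probes(toy))
```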
Unsupervised data reduction
To exclude microarray probes with negligible differential expression from further analysis, a weak unsupervised filter was applied to the normalised, straight/dye swap combined dataset. Features passing the following filtering criteria in ArrayTools were selected:
(i) Minimum fold-change filter
Genes that have low variability were filtered out using the minimum fold-change filter. Here the criterion for filtering out a gene is based upon the percentage of expression values for that gene which have at least a minimum 1.5 fold-change from the median expression value for that gene. If less than a specified percentage of expression values meet the minimum fold-change requirement, then the gene is filtered out.
(ii) Log expression variation filter
Filtering based on the variance for the gene across the arrays was applied in ArrayTools. A statistical significance criterion based on the variance was used, whereby the variance of the log-ratios for each gene is compared to the median of all the variances. Those genes not significantly more variable (p<0.01) than the median gene are filtered out. Specifically, the quantity $(n-1)\,\mathrm{Var}_i / \mathrm{Var}_{med}$ is computed for each gene i, wherein $\mathrm{Var}_i$ is the variance of the log intensity for gene i across the entire set of n arrays and $\mathrm{Var}_{med}$ is the median of these gene-specific variances. This quantity is compared to a percentile of the chi-square distribution with n-1 degrees of freedom. This is an approximate test of the hypothesis that gene i has the same variance as the median variance.
(iii) Percent missing filter
Here the criterion for filtering out a gene is based upon the percentage of expression values that are not missing and not filtered out by any of the previous spot filters. A threshold of no more than 25% missing values was applied.
After applying these filters, 15,075 probes remained for further analysis.
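The three filters can be sketched in a few lines. This is not the ArrayTools implementation: the function names are hypothetical, missing values are represented as `None`, and the minimum fraction of arrays required by the fold-change filter (0.2 here) is an assumed placeholder because the text only refers to "a specified percentage". The variance statistic is returned as a number; comparing it to a chi-square percentile with n-1 degrees of freedom is left to a statistics library.

```python
import math
import statistics


def passes_fold_change(values, min_fold=1.5, min_fraction=0.2):
    """Filter (i): enough arrays differ >= min_fold from the gene's median.

    values are log10 ratios; min_fraction is an assumed placeholder.
    """
    present = [v for v in values if v is not None]
    if not present:
        return False
    median = statistics.median(present)
    cutoff = math.log10(min_fold)
    n_changed = sum(1 for v in present if abs(v - median) >= cutoff)
    return n_changed / len(present) >= min_fraction


def variance_statistic(values, median_variance):
    """Filter (ii): the quantity (n-1) * Var_i / Var_med for one gene."""
    present = [v for v in values if v is not None]
    n = len(present)
    return (n - 1) * statistics.variance(present) / median_variance


def passes_missing(values, max_missing=0.25):
    """Filter (iii): no more than max_missing of the values are missing."""
    n_missing = sum(1 for v in values if v is None)
    return n_missing / len(values) <= max_missing


# Hypothetical log10 ratios for one gene across five arrays.
gene = [0.0, 0.0, 0.0, 0.3, -0.3]
print(passes_fold_change(gene), passes_missing(gene))
```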
Prognostic gene selection
To identify genes suitable to predict the likelihood of disease recurrence, we identified genes whose expression was significantly related to survival of the patient, independent of patient stage, where the survival endpoint is defined as disease recurrence.
Cox regression is a model in which the hazard function for an individual is a function of predictor variables. In our case, the predictor variables are log expression levels. The hazard function is the instantaneous force of mortality at any time conditional on having survived until that time. The proportional hazards model postulates that the logarithm of the hazard of death is a linear function of the predictor variables, linked by unknown regression coefficients. For more details, see Cox DR, Journal of the Royal Statistical Society B 34:187-220 (1972).
We computed a statistical significance level for each gene based on univariate proportional hazards models including patient stage as a clinical covariate. This ensures that genes selected by the algorithm are significantly predictive of survival, over and above the predictive ability of clinical staging. These p-values were then used in a multivariate permutation test, in which the survival times (i.e. time to disease recurrence) and censoring indicators (i.e. recurrence = 1, no recurrence = 0) were randomly permuted among arrays. We used the multivariate permutation test to provide 90% confidence that the false discovery rate was less than 10%. The false discovery rate is the proportion of the list of genes claimed to be differentially expressed that are false positives. The multivariate permutation test is non-parametric and does not require the assumption of Gaussian distributions.
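Written out, the proportional hazards model described above has the following general form. The symbols are chosen here for illustration and are not taken from the source: $h_0(t)$ is the unspecified baseline hazard, $x_j$ are the predictor variables and $\beta_j$ the unknown regression coefficients.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} \beta_j x_j\Big)
\quad\Longleftrightarrow\quad
\log h(t \mid x_1, \dots, x_p) = \log h_0(t) + \sum_{j=1}^{p} \beta_j x_j
```

Under this form, testing whether a predictor is related to the outcome amounts to testing whether its coefficient $\beta_j$ differs from zero.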
Using a minimum significance threshold of 0.001, this approach identified 241 probes ("genes") from the previously unsupervised-filtered list of 15,075. These genes are presented in Table 1. See also Figure 1.
Example 2 Classifier training
We developed a model for utilizing these 241 gene expression ratios to predict the class of future samples, based on Support Vector Machines with a linear kernel (Ramaswamy et al., Proceedings of the National Academy of Sciences USA 98:15149-54, (2001)). The models incorporated all genes that were selected by the previous Cox Regression analysis.
SVM means "Support Vector Machine", a general-purpose machine learning algorithm. SVMs deliver state-of-the-art performance in real-world applications such as text categorization, hand-written character recognition, image classification, and bioinformatics. Developed by Vladimir Vapnik (Boser, B., I. Guyon, and V. Vapnik (1992). In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152. ACM Press; Cortes, C. and V. Vapnik (1995). Machine Learning 20, 273-297), SVMs perform the Structural Risk Minimization (SRM) principle, which minimizes an upper bound on the generalization error, as opposed to Empirical Risk Minimization (ERM) which minimizes the error on training data. It is this difference which equips SVMs with a greater potential to generalize.
SVM was used to analyse the information contained in the 241 genes across all training set samples and to construct a classification template that best separates patients with recurrence from those without (see Figure 2). Based on these analyses, a threshold was set at -0.107 Index units. We estimated the prediction error of each model using leave-one-out cross-validation (LOOCV) as previously described (Simon et al., Journal of the National Cancer Institute 95:14-18, (2003)). We evaluated whether the cross-validated error rate estimate for a model was significantly less than one would expect from random prediction. The class labels were randomly permuted and the entire LOOCV process was repeated. The significance level is the proportion of the random permutations that gave a cross-validated error rate no greater than the cross-validated error rate obtained with the real data. 1000 random permutations were used.
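The permutation procedure for the cross-validated error rate can be sketched as below. To keep the example self-contained, a simple nearest-centroid rule stands in for the linear SVM, and no gene selection step is repeated inside the loop; both are simplifications, and all function names and data are hypothetical.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): pick the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))

    return min(sorted(centroids), key=squared_distance)


def loocv_error(xs, ys):
    """Leave-one-out cross-validated misclassification rate."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


def permutation_p_value(xs, ys, n_permutations=1000, seed=0):
    """Proportion of label permutations with LOOCV error <= the observed error."""
    rng = random.Random(seed)
    observed = loocv_error(xs, ys)
    hits = 0
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loocv_error(xs, shuffled) <= observed:
            hits += 1
    return observed, hits / n_permutations


# Hypothetical one-feature data with two well-separated classes.
xs = [[0.0], [0.1], [0.2], [0.3], [2.0], [2.1], [2.2], [2.3]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
observed_error, p_value = permutation_p_value(xs, ys, n_permutations=200)
print(observed_error, p_value)
```

A small p-value indicates that an error rate as low as the observed one is rarely reached when the class labels carry no information.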
The classifier was observed to produce the following sensitivity, specificity, positive (PPV) and negative (NPV) predictive values for prediction of disease recurrence in those samples included in the training process.
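For reference, the four measures named above are defined from confusion-matrix counts as sketched here, taking "positive" to mean a predicted or observed recurrence. The function name is hypothetical, and the counts in the example are invented for illustration; they are not results of this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp: recurrences predicted as high risk   fn: recurrences predicted as low risk
    tn: non-recurrences predicted low risk   fp: non-recurrences predicted high risk
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=18, fn=2))
```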
Example 3 Classifier evaluation
In-silico validation of prognostic gene selection
A global test was performed of whether the predictor is merely picking up random noise in the data, i.e. whether the outcome classes do not differ at all with regard to expression profiles. A permutation analysis was used for the computation of the p-value for this global test. Class labels of the samples are randomly permuted 1,000 times. For each permutation, samples were classified and the 10x CV misclassification rate of the classifier was computed as the proportion of incorrectly predicted samples. The p-value of the predictor is the proportion of permutations with misclassification rate smaller than the misclassification rate of the original labelling. A global p-value less than 0.05 is considered significant. Since one global hypothesis was tested, stringent control for multiple comparisons was not necessary for the global test. Based on 100 random permutations, a probability that the support vector machines classifier is classifying on random noise was calculated to be 0.004. Another way of stating this is that we can be 99.6% sure that the classifier is based on true biological information.
Example 4 Cross validation of the 241 signature genes in the training set.
A series of stage II and stage III tumors that were used in the gene selection and algorithm training process was typed using the 241 gene model. A typing for each sample in the training set was determined by leave-one-out cross validation. The resulting typing is presented in Figures 3 and 4.
Application of 241 gene predictor to independent validation samples
A series of stage II tumors, not used in the gene selection or algorithm training process, was typed using the 241 gene model. These samples are described in the column titled 'validation series' in Table 3. Each prediction was compared to the current known status of the patient and Kaplan Meier analysis (see Figure 5) was used to evaluate the survival difference between risk groups.
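The Kaplan Meier estimate used for such comparisons can be sketched as follows. This is a generic product-limit estimator written for illustration, not the software used for the figures; the function name and the follow-up data are hypothetical, and the log rank test is not included.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times:  follow-up time per patient (e.g. months)
    events: 1 = recurrence observed at that time, 0 = censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up data for five patients.
for time, prob in kaplan_meier([5, 10, 10, 20, 30], [1, 0, 1, 0, 1]):
    print(time, round(prob, 3))
```

Computing one such curve per risk group and plotting them together gives the kind of comparison shown in the Kaplan Meier figures.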
Method of typing new samples
To generate a prediction for a test sample, the normalised gene expression ratio for each gene in the signature was multiplied by a weight determined by the SVM algorithm. The weighted log10 expression ratios are then summed and a classification of high- or low-risk is determined by comparison of the summed value to a classification threshold.
Samples which result in a prediction score (calculated from log10 expression ratios) of less than -0.107 were classified as high risk, whereas samples that scored above or equal to -0.107 were scored as low risk for disease recurrence.
I.e., the prediction rule is defined by the weighted sum of the expression values ($x_i$) of the significant genes, using the weights ($w_i$). A sample is classified to the class low risk if the sum is greater than or equal to the threshold; that is, $\sum_i w_i x_i \geq -0.107$
A sample is classified to the class high risk if the sum is less than the threshold; that is, $\sum_i w_i x_i < -0.107$
The weights (wi) for each gene in the classifier are provided in Table 1.
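The typing rule can be sketched in a few lines. The threshold of -0.107 and the three weights are taken from the text and from the first three rows of Table 1; the function names and the log10 expression ratios are hypothetical, and a real prediction would use all signature genes.

```python
THRESHOLD = -0.107  # classification threshold stated in the text


def risk_index(weights, log_ratios):
    """Weighted sum of log10 expression ratios over the signature genes."""
    if len(weights) != len(log_ratios):
        raise ValueError("one weight is needed per expression ratio")
    return sum(w * x for w, x in zip(weights, log_ratios))


def classify(weights, log_ratios, threshold=THRESHOLD):
    """High risk if the index is below the threshold, otherwise low risk."""
    if risk_index(weights, log_ratios) < threshold:
        return "high risk"
    return "low risk"


# SVM weights of SEQ ID NO 1-3 from Table 1; the ratios are hypothetical.
weights = [0.014, 0.005, -0.04]
print(classify(weights, [0.5, 0.2, 3.0]))  # index is about -0.112
print(classify(weights, [0.5, 0.2, 0.0]))  # index is about 0.008
```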
Table 1
List of signature genes that are rank-ordered according to the univariate p-value. The determined SVM-weight is included for each of the genes.
Order Univariate p-value SVM weight Sequence GeneName SystematicName SEQ ID NO
1 0.000000273 0.014 AGTTCAACGCAGGTTGTGGGTCTGTCTTGAACACAAATGGTGAGGTTGAAATGGATGCTA ASRGL1 NM_025080 1
2 0.00000093 0.005 CAATGTAGTGCCTTCATTTCATTCTTGAGGAATCAACATTACATTTAGGTGGTGAAATAC T40959 T40959 2
3 0.0000012 -0.04 AGCAGCTATGGGGGTTCCAGAGATTCCTGGAGAAAAACTGGTGACAGAGAGAAACAAAAA ASRGL1 NM_025080 3
4 0.00000194 0.007 TGGCTCCCAGTCGGGTATCAGTGCAGATGTGGAAAAACCAAGTGCTACTGACGGCGTTCC SALL4 NM_020436 4
5 0.00000275 0.01 CACAGGAGGAGTTACAGAAACAAGAGAGAGAATCTGCAAAGTCAGAACTTACAGAATCTT AKAP12 NM_144497 5
6 0.00000571 -0.043 CCATTATGCCTTCCGCATCTTTGACTTTGATGATGACGGAACCTTGAACAGAGAAGACCT CIB1 NM_006384 6
7 0.00000584 0 CCCAGTTCTATGTTTGTGTGTTGAGAGAGAGCCAGAGTATGTGTATAAGCTGTAGCAATC THC2383841 THC2383841 7
8 0.00000673 -0.052 CATTATGCCTTCCGCATCTTTGACTTTGATGATGACGGAACCTTGAACAGAGAAGACCTG NM_006384 NM_006384 8
9 0.00000678 0.036 CTGTCCITGATGTACAGCAGTCCAGCΓGTTGTTTTGCATTTATTAAAATATTCTCTGCTA AA854957 AA854957 9
10 0.00000854 -0.121 CTCTGATTAAACATAAGAAAATTCATATGGAGAGAGACCCTATAAATGCAGTGTCTGTGA ZNF673 NM_017776 10
11 0.0000133 -0.071 AAAGAACTTAAGGATTTCTGCAAGTATGAGTGAAGACCCACAGTGCACCAGAGCACAGCT NEURL2 NM_080749 11
12 0.0000143 0.077 CCTGTTTACACTGTGAAGTGGATATTGTTACAGAAAACACACCAGTGGCTTTCTCACTGT A_32_P232647 A_32_P232647 12
13 0.0000179 0.081 GTAGATAGATATAGCTGTGGGATAAAGAACTTCTGCTGAATGATGATTTGCAAAAGTATG AK092379 AK092379 13
14 0.0000212 -0.022 ACCAGGAGAATCAGGACTATCTTTTGGGAAATATAAGCTGGGTCCTTATGATGGAGTCGG AK124352 AK124352 14
15 0.0000213 0.077 ACAACATTTACCTAATGTCATTCACTAACATGGAAGAGTTGTGAAAATTCTAGAGTGCTG AL137342 AL137342 15
16 0.0000269 -0.086 ACACCATΠTAC I I I I ATTTCGGCAAGACAGTGTGATGAGAGAGAAAGAACACTGGGGTA A_32_P183656 A_32_P183656 16
17 0.0000335 0.083 AAAGTCACTCTAGGTTGTACAAAGGTTCTATGTATACGTCTGTTACAAGTAACAAAGCTA A_32_P162443 A_32_P162443 17
18 0.0000376 -0.072 AAGTGAGCTTCTACAAGATGTCAGAGTCAACCAGAGCAGGTGTGAGATGAAGACTTTTGT TRIM15 NM_033229 18
19 0.000043 0.078 AAGGATGCTTAGATGCTGGATTGCAAAAAATGTACAGCTGTCTACTACAACCCCAAATCT BX108121 BX108121 19
20 0.0000434 -0.065 CACGGAGTGTGCCAAAACTAAAAAGCATΠTGAAACATACAGAATGTTCTATTGTCATTG NM_005100 NM_005100 20
21 0.0000506 -0.124 TCCAACTAGAAGTGACAGAAAATAGAAAGAGTCCAGTAATGTGAGGTCTTGAAGTGTTGG THC2351317 THC2351317 21
22 0.0000551 0.033 ACACGTTTTTGGAGGATACCAAGAAGCTGTACCACTCAGAAGCCTCTTCCATCAACTTCA ENST00000238576 ENST00000238576 22
23 0.0000566 -0.038 TCCGTGGCAGGAACCAGAGGCCACATGTGGCTGCTCGTATTTAAGTTAATTAAAATGGAA AK023376 AK023376 23
24 0.0000573 -0.067 GCGACTGAATATACACCTGTAAACGAGTAGCATGTATACATTGATTTTGATTACAAATGG USP36 NM_025090 24
25 0.0000597 -0.048 GAGAGATAGAGGAGAGTTCATTAGCATCAAACCACCATGGATTTCATAATCAATTCCTAA LOC286272 AK000939 25
26 0.0000636 0.024 GAGC I I I I GTGGCAGTG I I I I I ACTAGCAATGTTCTATGAAGGACTCAAGATAGCCCGAG SLC31A1 NM_001859 26
27 0.0000665 0.073 TTGAGCTCAGTGGAGCCATGAACACTACTGCCTGCAGTCTGATGAAGATAGCAAATGATA FH NM_000143 27
28 0.0000725 0.08 AGCATTCACTAGTTGGCAACTATGAATTTATTCCATGTCATTCTGTTTACTTAGCACTTG AL137342 AL137342 28
29 0.0000741 -0.004 GAGGATCTTTATCCTTTAATATGTTGGCAAI I I I I I CATATTCCGGAGCAAACTGCTTGC BF195626 BF195626 29
30 0.0000778 0.008 AGAGAGAAAGCTATGATTCTGATGAGTATGTATACGTGTGTAATCCCAGAGAAGTGAACG FLJ10404 NM_019057 30
31 0.0000802 -0.048 TAAGTTMGAGCTGTGGAACTTTAGATACATTCCTTAACATTACTGTATCRCAGTTTGCC THC2263651 THC2263651 31
32 0.0000826 0.007 GCAAGTTTCACACTGAGAGAGAAATGTAGATAACCAATAATAAAATGAAATGACCTCTGG FLJ14327 AK024389 32
33 0.0000861 -0.036 GTGCCTACATΠTGTTAI I I I I GGCATTACTACAGAGCCATGTACAATAGAAAGCAATGC AI806221 AI806221 33
34 0.0000867 -0.005 TACTACTATGTCCTATCATAACATTCCATACATACTTAAAACCAAGCAAAGGGTGGAGTT BE962960 BE962960 34
35 0.0000897 0.1 TGGGCCAGACGCTGGTGTGGTGTCTGCACAAGGAGTGACCTTCTCATGCTGATTTGCAGA GCHFR NM_005258 35
36 0.0000938 -0.013 AGTACAGTTGGATTAATCTTGCTATTTAAAAAGTTGGTGGGGCAAAGCCGAGTCTAGGAT PRKCBP1 AL137703 36
37 0.0000961 0.181 GAATCTCTGATAAATATTTTCTGATGTTACTAGCTATGGGAAATTAGAACTGGCACAACC WDR72 NM_182758 37
38 0.000102 -0.119 CACGGGATTTTTGTTGATGGCTGAATTCTTGTGGATTCATAAGAGGATCATGCCCTTAGC FLJ20273 NM_019027 38
39 0.000106 -0.105 I I I I I C I I I I AGGAGGAAACCATTGGCTTCTGGTTAGAGTTCTAAGAAGGGAAACTGGCG AF074986 AF074986 39
40 0.000106 -0.074 GTTAACAGAGAAACAGCAGCATAGATAAGTAAGCCAATTTAATGTAGGGAGCAACCACTA THC2415708 THC2415708 40
41 0.000113 -0.098 AAAGAGAACΓAGACAGAGAGMTΓTGTTGCCTΠTGACCAATGCTTCCATGAGAGAGCTG THC2439183 THC2439183 41
42 0.000113 0.017 CCTTTCACAAAGACTGTAAGCCTTACCCAAGCATTAATTTTGCTGCATAGGCGGCCTGTT THC2316688 THC2316688 42
43 0.000113 0.033 CTGGAAAAATTTATCAAGTTCCTAAACCGCAATGCATACATCATGATCGCCATCTACGGG C6orf29 NM_025257 43
44 0.000118 -0.075 ACCTTGAACTTAGC I I I C I I I GAAGGGTAGGAAACCTGGAAGACGTACAAAGGAGGGAGA HM13 NM_178582 44
45 0.000122 0.072 TCCTTGGTGCGACTTATGCCAAGTAATTATGACGAI I I I I I GTTTTGTTTTGTTTCCTTG THC2455550 THC2455550 45
46 0.000123 0.019 CATTGCTTAACGATGGGGATACATTCTTAGAAATGTGTCACTAGGCAATTCTGTCATTGT L3MBTL NM_015478 46
47 0.000134 -0.042 CTGAATGGAAAATAGCATGACAGTTATAACAGAAGTAAAGAAAGTCACATGAGAGTCCAC THC2406792 THC2406792 47
48 0.000137 -0.018 CAGGTGATGAAATCTTCCCACCTTATGACTGGGGAATGTCAGGTCTAGTAGCAAAGCAAA FLJ22955 AK022223 48
49 0.000141 -0.076 CCAAGGTGATGGAAGCTGTTTTATTGTCCAGCAAGACTTAGACTATGTCACTGAGCTCAC WDR71 NM_025155 49
50 0.000143 0.057 GGTCCTTAGCTGCCCAAACAAATCAGGATTCTGTTGGCAAGCGAAAAAGAGAGTGGCCGC THC2378401 THC2378401 50
51 0.000143 -0.017 TGGATATTTGGGGATTAi I I I IGATTGTTGATATTCTCΠTTGGTTTTATTGTTGTGGTT ELN BC065566 51
52 0.000144 -0.027 ACCGCGTTTTACAAACATTTGTGTTCTCCTAACATAGTGATTAATTACTAGGACAGGTTT AK022204 AK022204 52
53 0.000145 -0.033 ACTGCATAGTCACTACTTTTAGTGAGTTTGAAATCTGTTTGGAGAGCTATGTAAGTACCA AK024937 AK024937 53
54 0.000146 -0.15 ACTCATTATATAGCAAACATTCAGCAATTATTAGAATGGTTGCTTCTATCGACAGGTAGG HNF4A AK096973 54
55 0.000154 0.049 TGGAAAGTGAAGTGAAGGATTTTTGTCATACAGCCAGTAAGTGCCAGAACTGACTTGAAC HSPC054 AF161539 55
56 0.000157 -0.116 TGACACAATTCCATTAAATACGCTATCTAGTG I I I I I I CTGGTAGACTTTATTGCACTGC THC2303175 THC2303175 56
57 0.000159 -0.045 AGAGAGAGACGGTGATGGATAAATTGACAACTCTGTAGGATTTACTAGCAAGCTAATGGA RPA4 NM_013347 57
58 0.000161 -0.052 TGGGCATTGCAGTAGGTACCAGTGAGAAAAAAGGGGAAAATCTGCATATTGAGTATTTAT CAPN1 AK022319 58
59 0.000161 0.121 GGCAGCAAG I I I I IATAAAGGCAAACTGCTTATTCAGAAGTGATCACGGAACAACTCAAA AI918632 AI918632 59
60 0.000162 -0.006 TAAGCCTTAACA I I I I I I AGATAAACGGTCCCTGAAAAGAATAAAAAGTATCAGTACTTC AK023338 AK023338 60
61 0.000164 -0.018 CCAGGGGTGCATACTAGGGTAAAGAAMATTTTGTAATAGCAACAGTGGTTTGGGATTTT AF090939 AF090939 61
62 0.000166 -0.033 ATTTAGATATTGCCAAAATGAAATTAAGAATCTTAAACAAGGACCAGCTAAGACCTGCGG FNTA AB209689 62
63 0.000167 0.059 GGGCAACTAGTCATCTACTAGTTAGCTTAGTAAGCTAAGCATTAAATCTAAGAAATAGCA RSAFD1 AL833452 63
64 0.000176 -0.133 ATCATCAACATTCTTGAAGCGCATΠTACACTATTCTGCATTAGCAATTGTGTTGAGGGA AA902595 AA902595 64
65 0.000177 -0.135 TGAAATGGTTAGGTTCCTAGAAAGATTAAAATGTCAACATTGGTACGAAAAGCAACAGAG KIAA1833 AK093563 65
66 0.000177 -0.042 ATCTGTGTGGTGTTTTGTACCGGCACGGGATATGGAACGAAAACTGCTTTGTAATGCAGT CMKOR1 NM_020311 66
67 0.000177 0.062 GAGAGTTCTGTACTTACGGTGAGCTCGGCACACAGGCCCACACTGGGCAGCTGCAGTCCA CRYBA2 NM_005209 67
68 0.000183 0.092 TTGAAGCAGGTGTGCAGTCCGTGTGAAAGCCTTCCCTTTAGCTATTAGGTATTGAGTCAA THC2377764 THC2377764 68
69 0.000187 -0.053 GAACATTCTTCTTCAGGTACA I M i l I GTTAAATTATTGTTTCATGCATAAAAGTTCACC THC2369196 THC2369196 69
70 0.00019 0.089 ACCTCTTCTGAAGTAACTGGTTAAATTAATAACTGGTCCCATAAGTCAGGGTCAAGCCAC AK026192 AK026192 70
71 0.000195 0.179 TCTGTTTTCTCATGCACAGGCAATACACAAATTTAAAATGAGTTGTGAGCCAATTGTTTC UBQLN1 NM_013438 71
72 0.000197 -0.121 GCCTCAATCACATTTGTGATGGCCAACAAATTTACCTAAGCATTTTGCTAACCTGAAGAA THC2280343 THC2280343 72
73 0.000197 0.057 TGGGGTTAGAAGTGAAGTCAGTAAACAAATTGCCTGCTATTAAGAAACTTAAATACATGC A_23_P83991 A_23_P83991 73
74 0.000198 0.074 TGGTTTGAAACAGCTTCACAAGGCAGAACAGTATCAAAAGAAGAATCGGAAGATGCTTGT STX16 NM_001001433 74
75 0.000199 0.006 AAATGCCAC I I I I AAAGCTGTTAATAGACTTTGCACC I I I I CTTTGACAAGGATGTGTCA FLJ31951 NM_144726 75
76 2.00E-04 -0.122 TTGGTGGATTTTRGTGTTTTGTAAGTTGTCTATTTTGATAATGTATTATTTTTATAACTG ORAOV1 NM_153451 76
77 0.000203 -0.069 TTTCCCATTTGGMGGGCTTTCTCCATCCTGAMGCTCCTGTAGACΓTGGTGCTCΓTTTT AK074696 AK074696 77
78 0.000212 0.032 AGTGGAAGTTTTGTTGTAGGAATTATAGTAATCACACCACATTACTTGGCCTTTCGGTAA SLC25A36 NM_018155 78
79 0.000216 -0.003 AAGCCAGGAATAATΠTCTATCTCATGCTACCTAACTCCTGGAAGTTATCATGAGACCCT A_32_P90468 A_32_P90468 79
80 0.000221 -0.089 GTTGATTGCCAGTTTGTGTACTGTTGCTTGGATGCGGCACAGTGGTTGGTAATGGAATAA ENST00000258884 ENST00000258884 80
81 0.000228 -0.095 AATGCAGAGGGAACACCAGAGGACG 1 1 I I I CTTCAACTCTGCACAGCTATTGACTCTATT AK5 NM_174858 81
82 0.000233 0.059 GAACCAATGGGGAAAGAATAAGCTAAATGTGTTAACGGATTCCAAATTACTTAAAGCCTT TSEN54 AK094466 82
83 0.000234 0.017 TGCAACCTTTTGAAAATGGATGGGTGATAATGAAAGACCATTCCAGAGAAGAAACACCAT LOC254128 BC036624 83
84 0.000235 0.053 TTTGCTTGCCAAACTTAGCTTTGCCAGTGATAGTCAATATTAAAGTGTAC I 1 1 1 1 I CCCC L3MBTL NM_032107 84
85 0.000237 -0.022 TTGTTCTCACCAGATAATTACACAAC I I I I I CTTCCTATTCCAAAGTAACAGTGTTGTGG THC2439581 THC2439581 85
86 0.000237 -0.017 CTGTTGGGATTAGGAGTGGAATGAGGAATAGTTTAGGACCACAAGAGATTTGGCCTCCTT NAG8 NM_014411 86
87 0.000238 -0.071 ACTGTGTAGTGATACTΓCATTTTGAAAGTTGGGAAAAGCGTCTTTTTCACTTTAGTACAG RPL23 BC034378 87
88 0.00024 -0.166 ACCTGGTAGGAAAGAAGACCC I 1 1 1 1 I GCAATGCCTCTAAGGAGTGGGATAACACCACTA C4BPB NM_000716 88
89 0.000249 -0.023 GAGATTGTAGTTATC I I I I I I GAGATGAGATACTTTCACGACTTTGGTATCATCTGTCAG SF1 NM_201997 89
90 0.00025 -0.069 GAGCTCAGTGCAAGACAAGATTCTGGCGTAAGAACTGCGGTAGCAAGGACAAGAGGGCGT LOC441776 XM_497520 90
91 0.000252 -0.092 TCΓAGATTTGCACAGTAATAGAGGAATTAGAAGTACCTAACTATACACTTTGATTCAGCC C14orf129 NM_016472 91
92 0.000254 0.152 TCTAGATGCTTCTACTGTTATGTΠTATCTGCCCATTTATC I I I C I I AGTTACCAGGAGA EPDR1 NM_017549 92
93 0.000255 0.08 ATCACGAGTGCACAGAGAGAGAGATCAGCTGCATTTAGTGAAACGCCAAGTTGGGAAAAT THC2277609 THC2277609 93
94 0.000258 0.119 TGGTCTAGAAATAAGCATAGCACAGTTAATGGACATTACCACAATGGAATCCTCAATGGC NM_003360 NM_003360 94
95 0.000262 -0.13 ACAGGAAGATCATTTTCTGTTTTCCTTCTAACCAGGGTGAGAAAATACAAAGTAAAGAAC THC2335352 THC2335352 95
96 0.000263 0.009 CCATCGCTGTTTGACATAACCTCCTGATTCTATTATTGTCACAGCATTAACCTCCACAGT A_24_P615462 A_24_P615462 96
97 0.000276 -0.009 GTCAGAGCTCAAACCTTAGTCAACACCAGAGAATTCACATGAGGGAAAACCTATTAATGT ZNF165 NM_003447 97
98 0.000278 -0.122 AATTATCAGTTTACAAACAGAGTTTTTGGATATGTTGAGGAGGTAGATCGCATGCCATCT THC2334047 THC2334047 98
99 0.000294 -0.101 GAAATATGTTGATAAGAACACACCAGGACTTGAAAGATGTTACTAATAATGTCCACTACG ENST00000327086 ENST00000327086 99
100 0.000295 -0.078 CTTGAATACATTGGCAGAGGTGCTAATCACATCTTCCCTAAGGCACCTGGAAGAATTATT HOOK3 NM_032410 100
101 0.000298 -0.073 I I I I I I ATTTCTGAAAATAAAAAGTACATTTATTCTCAACTTTGTAATACAGTCCATTTA NLN BC001644 101
102 3.00E-04 -0.099 CAATTCATTGCCAGACTTCATTGGAATGCTTTGTTTGATGATGTATGTTCATTCTCAGCT C14orf129 NM_016472 102
103 0.000306 0.011 GAAAGAAGCATTGTATATAAGCCTATGTATTTCTGTAATGCTGCTACAGGGGGATACAAA A_32_P210193 A_32_P210193 103
104 0.00031 0.011 AAGGTGCΠTATTTGTGGGCCAAATAGTAGCAGGTATTAGATACGTTGGTGGGCCAAGAA THC2381061 THC2381061 104
105 0.000312 -0.03 GGCCTACTTCCAGAGGAACTGACACCATTGATTTTGGCAACTCAGAAGCAGTTCAATTAC ETFA NM_000126 105
106 0.00032 -0.021 TGAAGAGGATCTTATTAMCTGCΓGGTCTGACTTTATGGATTGACACTGTTCATTTCTTT AK3L1 NM_001005353 106
107 0.000323 -0.047 CAAACAAAATGACGAGAGAGATGTGGGACTAAATAATGATCGTTCAACTACTAGCCAAGG ATXN3L AB050195 107
108 0.000329 -0.29 TATTTTGGTATAATGTACTAAAGTCTTGTAGTATTTAGTCATAATAGAGTTTAACAGTGG X15674 X15674 108
109 0.000336 0.138 CTATAAGGTTGTACTGCTGGGAAAATACAATGCACAGGGCTTAGGTTCAGATCATGAATT FLJ22028 NM_024854 109
110 0.000336 0.033 GCGTCGCAGTGGCACAGAATGATGACAGGTGGAAATTTGGACTCCCAGGGAGACCCTCTT PLEKHA4 NM_020904 110
111 0.000337 -0.035 CCACCAGTTAATCTCACATACTCAGCAAACTCACCCGTGGGTCGCTAGGGTGGGGTATGG LOC124220 NM_145252 111
112 0.00034 -0.055 GTTTATCTTTGTCTGCMTTACGATTCTTCTGGGTTTATTTTGCCAGCTCATTATCCAGC BG118529 BG118529 112
113 0.000341 -0.099 AAACATTAGGTAGCAGCTTGTAGAGGATATATTTAGGGTCATGATGTCCTTCTTGTTGGC THC2340734 THC2340734 113
114 0.00035 0.175 ACCTTTATATAI I I I I I GAAGCCAGTACTGTGCTCTGCATATAACAAAGCTGCTTCAAGG GSR BC035691 114
115 0.000351 0.064 ATAGTTTACATGTGCGACATGAGCTAGTGTTTGTACATGTGACAGACCTTGTTTCTGGAT AI911989 AI911989 115
116 0.000355 -0.017 GATTTAi I ici ΓCTAATCAAAGATGCATAACAGCTATTATCTAGGGGACCACCAAATGTG ENST00000265341 ENST00000265341 116
117 0.000358 -0.031 ATGCATTCTACCACTACATTTTGGTGCTATTTAAGGTGTGCAATTTTCTATAGGTGACTT SLC6A14 NM_007231 117
118 0.000358 0.145 TACGTGTCTCATCTGTTCATCTAGACTATTTGCTATGTGCTACACAAACCTAAAATATAC THC2279497 THC2279497 118
119 0.00036 0.025 GCAGAAGTACAAGCTTTAGGGTGTATCTATTCATCTATTCCTAGTACATAAAATTTAGCC THC2284174 THC2284174 119
120 0.000363 -0.136 TTTGTTCTTTAAAGAACGTTTTACTTAACTTAGTATTTCATTTTTCATCTATATTATGAG SMA4 NM_021652 120
121 0.000365 -0.071 GAACAACATTTTGTGTTGCAGTTGACTCCCTCAATGGATTGGTTTACATAGGTCAAAGAG LOC387921 NM_001017370 121
122 0.000366 0.034 TTGGTACTTCAAGAAGGACAAAGCAGTTTGGTACTTTTCCAGGCAACTATGTAAAACCTT SORBS1 NM_006434 122
123 0.00038 -0.191 AGGTAAACCCCCAAGAATCAGGAACTAGAGCCAAGTAAAGAATAAGCCCCGAGAAGCTTT THC2376737 THC2376737 123
124 0.000388 -0.131 CGTCCCTTCTGAATTTAATTTGCACTAAGTCATTTGCACTGGTTGGAGTTGTGGAGACGG H19 NR_002196 124
125 0.000394 0.033 AGCTGTTTTCTATTAACACTGAAGTACTCTGAGAGCTTGGAAATTTTCAAGTGCAAAATC THC2314215 THC2314215 125
126 0.000395 -0.066 ACAACCCACTGAGTCCCAGGCTAATGATATTGGATTTAAGAAGGTGTTTAAGTTTGTTGG AKAP12 NM_144497 126
127 0.000395 0.018 TCTCCATCATCTTAGAGCCAAGTTATATGTTCTTGTCTAATCCATGTAGC I I I I I GTTCA SLC31A1 NM_001859 127
128 0.000397 0.005 ACACTGCGACCAGTGTGAGCAAACGTTCAGAAAGCACATTTAAACACTGTTACTTAAAAT LOC142937 BC008131 128
129 0.000404 -0.087 AAAGTTATCAATATTCTΠTGAGAGTGGGAGGACCTGAGTCACAGTΠTAAATGTCAACT THC2446936 THC2446936 129
130 0.000409 0.011 AAGTCATGATGTCAGACAAGAACTTGGATTTTGGAGACACGGGTTTGAATTTCAGTCATT AF289610 AF289610 130
131 0.000409 -0.005 TGGATATTAAGATGAGTAAAATAAGAGACTTCCCAGAAATAACTGGTTAGCTGTTTCCTG UNQ9433 NM_207413 131
132 0.000411 0.056 ATATGAAAAGTGAAGAAAGTGATCATAGGAGAAAACCATTTCAGATGACAAGAGCACCTC C21orf34 NM_001005733 132
133 0.000413 -0.064 GAATAAGAATTAAACTGAATGTGCATGTCATCATACTTCTGAAACTGACTTTGGTCAGAG THC2443199 THC2443199 133
134 0.000415 -0.026 GTTCACAAAAACACCTAGTAGGTATTCAGTTCATATTGGAATGAATGAGAAAATGAGCAG A_32_P211048 A_32_P211048 134
135 0.000419 0.042 CTATGGCCAAAGTTTGGTNTCTCAACACTGTCTAAATTTGGATTAAAACTTTGAACTTT PLEKHA4 NM_020904 135
136 0.000421 -0.077 TCAACTTAAGCATAAAGGGCTCTGAACTTTTCCACTTTAGAGTGACCGTCATTTCAGGAG C14orf161 NM_024764 136
137 0.000424 0.015 TGTCATTCTGAAAACATCCTATGCGATGGAATGGAGAAGGAAGTGATGACTCAGAGTGTG THC2323620 THC2323620 137
138 0.000425 -0.149 CCCGATTGTGAACTGCTTGAAAGAAAAACGAAACTTCTAAGATGTTTGTCCTTTCATGTC BC012894 BC012894 138
139 0.000426 -0.031 GGGTGTGGGTAAGACACAATGTCCTTGAAATTATATTTCACTGGGTTAATGAAATTGGCT AA858394 AA858394 139
140 0.000443 0.009 GATTTTAACCCTCAAGGGCAATTTGATACTGAGAGGTTAAGGGCACAGACTTCTGAATCA SFXN4 AK095295 140
141 0.000443 -0.041 TTGCAAATTTTAGGGTCCTGAGCCAAGTATGGATGGTTCAGAATTTG I I I C I I I CCTGGA MGC11271 NM_024323 141
142 0.000446 0.086 TGAGGTAAGCAGGAATGTAAATGGAGTTCAAGGCCTTAACA I I I I I GAGCACTGCTACTA ZNF198 AL136621 142
143 0.000446 -0.117 CCCATTTTCGTTGGCTGGCAGGTTGAGATG I I I I I C I I AAACACTGCCTGTCAGTGTGAA ZNF502 NM_033210 143
144 0.000448 -0.109 ATTACΓAGCCAACAGAGTTTTACΓATTTTGATTGTCTGGTTGGTTTAACAAAGAGCCTAG ENST00000258884 ENST00000258884 144
145 0.000452 -0.061 AATMTGACΓAAGGCCACTGCTTCCTCAATTGAGTCTTTGATATAAAAGCCTTCACTTTT THC2406017 THC2406017 145
146 0.000457 0.023 CAATAGCAACAGAAATATAAATTATATTCCATTCCCAGAGAGAGAATGCGCTTTGGATTG RNF150 NM_020724 146
147 0.000473 -0.053 TTATCTAGGAAATGWACRCATATCRTAATTGTAGTCATTGGCTTGAGTGACGGGTTTTG AI791206 AI791206 147
148 0.000483 -0.021 CCCCCAAAGTGAATTTTAAACTTGACTTATTTATGCCGTTCTCATAGCAACAGGAAAACT THC2315602 THC2315602 148
149 0.000493 -0.011 TTGTTTCTGTGACAGGACGATTACTGAGAGCCAATCAGATTAGTTTCAACTGGGACAGGC KIDINS220 NM_020738 149
150 0.000494 -0.05 TGTTTGGATATAAATGTGTATGTGTCCTTGTAAATGTTTCTATCAAGCAAGAATGCCACG FLJ14054 NM_024563 150
151 0.000499 0.058 GTAAGCACTCATTATAATTTGTTGATGATAGTTTAAGTATATGGGGTATGCCCATTAGCC THC2314058 THC2314058 151
152 5.00E-04 0.034 ATTATCTGTTTGTTCATTTAGTCACAATATTGATCAATATAGAACAAAGCAGTATATTTT BC033250 BC033250 152
153 0.000504 -0.055 CCTTCTCTCTGGAAAGAATTTGCTTAACTTGACATTCCATGTGCCGCTAATAAAATATAT ASRGL1 BC006267 153
154 0.000513 -0.006 AGCGAGTTTGATAGCCCAATATTTTCCCAGGTAAGTATCTAGAATACCAGCTTCCAGCAT AI686776 AI686776 154
155 0.000518 -0.013 CGATACAATATTGTTAAGCTGTATTATAAGTATTGTTACACAGGGTTATGCAATTCCCGG NM_000627 NM_000627 155
156 0.000519 0.108 ACCTTTTCTGTTTTTCATCCGACATAATCCTACAGGTGCTGTGTTATTCATGGGGCAGAT SERPINE2 NM_006216 156
157 0.00052 0.05 TTCTCCAGGAAGCAGATCAAGGACATGGAGAAGATGTTCAAGCAGTATGATGCCGGGCGG EFHD2 NM_024329 157
158 0.000526 -0.01 GGGGAGGAATAAGCAAGAAGAGATCATGCAGGAAACGGGCCAAACTACATCAAATGATTT AK023542 AK023542 158
159 0.000529 0.07 CAATGTAAAGCCAGAATATCAACGTCC I I I I GTCAAGATTTTCAAACCTATTTGGCTGAT THC2318614 THC2318614 159
160 0.000533 0.029 ACAGAAACCATCGCTAACACTGATTACAGCCACATAAAGGCAGGAGGTTAGAGAGAATCC THC2356115 THC2356115 160
161 0.000541 0.024 AAAGTTTACCCTACATAGGGAATTGTAGAAGTGAGTTAGTAGAAGTGAGTTCGTGTGAAA AB052759 AB052759 161
162 0.000549 -0.078 GTGAATTGTGTTTCGGAAGACAAAATGACCCATTTACAAATGTGACTGCTTCATTACCCC RAB11FIP2 NM_014904 162
163 0.000571 0.048 CAATCTTCGGGGTGTGATGAATAGCAAATCATCTCAAATCCTTGAGCACTCAGTCTAGTG HK2 NM_000189 163
164 0.000571 0.155 TTAGTGTTGCATCTGATΠTCAGGTGTACATTTA I I I I I GACTGGGCAGATAGGGGATTT DATF1 NM_022105 164
165 0.000579 -0.186 ACAGTAACTTAACAAAACCTCAAGTCTGGATAGTGAGTTCAAGAGTGTTACTTTATATGG A_32_P100475 A_32_P100475 165
166 0.000584 -0.075 ACTGGCAGCTGTGACTTATCATGGAGTTGATAACAAGAAGAATAAAGGGCAGCTCACTAA ENST00000327086 ENST00000327086 166
167 0.000598 0.061 AGACATCTTTCTGAGGAATGTTACGGCATCCGGAGTCATCCCTCAGATTTCTCTGATCAT PCCB NM_000532 167
168 6.00E-04 -0.036 ATGGGAAGTTACTACCCAGGCTTACCAAAAGGTCAGGTTTATATAAAGTGGCGTTCCTTT AK022346 AK022346 168
169 0.000616 0.025 ATGGTGGTATTGTGACCACTGAATTCACTCCAGTCAACAGTTTCAGAATGAGAATGGGAC A_23_P14432 A_23_P14432 169
170 0.000634 0.042 GGACAGACTGTCAACTTGAAATTTACTTATGTAAAAAGCTTAGGTGATTCTTAGGGTTTC LOC389677 NM_203390 170
171 0.000639 0.034 GGGACCTGTATAGCCGTTAAACTATAAATCAGGGCCAAAAAGGAAAGATAAATTATAAGT THC2280557 THC2280557 171
172 0.000642 0.011 CGG I I I I I AAAAGTCTCTTCTACCCACTACATCAGTTGTGTCTCTTGGGATGGACTTTAG THC2446045 THC2446045 172
173 0.000649 0.04 TTCGCTGGTTTCCAACTAGAAGTGACAGAAAATAGAAAGAGTCCAGTAATGTGAGGTCTT THC2351317 THC2351317 173
174 0.00065 -0.015 AATATGGTAATTGGAATTAACCCCACACCATAGTATGCATTGTTATACATACTGTGTACC SRP9 NM_003133 174
175 0.000654 0.028 GGAGTTCAGAGTCAAGAGAGAAGAAAGAAACACAACATACGTTAAAGTTAAAATATAAGA THC2410524 THC2410524 175
176 0.000664 -0.052 GAGAAAAGCAAAGCTC I I I CI I ATTTTCCTCATAATCAGCTACCCTGGAGGGGAGGGAGA PLA1A NM_015900 176
177 0.000665 0.109 CATTGTCAGTGCTACAGGAGTTACACCAAATGTAGAACCTTTTCTCCATGGTAACAGTTT FLJ22028 NM_024854 177
178 0.000671 0.056 CCACGTGTCAAGTAATCCTTAAAAGAATATCTTGGAAAAGGAAACAGA I I I I I I CCTGTG A_24_P164815 A_24_P164815 178
179 0.000704 0.155 TTCTCATTGTAGATGTCATCTCTCACATTTATATCAGTGAGGTTTGAAATTCTGTGTAGC AK023647 AK023647 179
180 0.000704 0.037 TGAAGCAATATTGAGGAATCCTGAAAATGTTGATACTCAAATTCTGATCCTGATTTTGGG THC2453189 THC2453189 180
181 0.000705 -0.079 TGTGAAATGGAACCATGGATTTATGTCTGGATCATCCATACAGAACCAACAATTTTATTC ENST00000242848 ENST00000242848 181
182 0.000713 0.038 TATTCCATGTATCGGGAATTCTGGGCAAAACCTAAGCCTTAGAAGAAGAGATGCTGTCTT HIG1 NM_014056 182
183 0.00072 0.019 CCGGGAAGTGCACAAAACCAATCACCCTGAGTGTGAATTTGTCCGCCCCTGCATCGCCAA A_24_P101742 A_24_P101742 183
184 0.000721 0.064 GCAAGAACTCTGGGCTTGGGTAATGAGCAGGAAGAAAATTTTCTGATCTTAAGCCCAGCT NM_003793 NM_003793 184
185 0.000729 0.047 AGGTAATTGGGGGTATGACTTCAGTCACTTTTGAAATATTGGGAACTAAATTCTCTCATT THC2344420 THC2344420 185
186 0.00073 0.024 CTGTCTCCAGGTAGGTGGACCAGAGAACTTGAGCGAAGCTCAAGCCTTCTCAACTCAAGG THC2342624 THC2342624 186
187 0.000734 0.196 GTTCTGGTGTCATAGATGTCCCATTTTGTGAGGTAGAGCTGTGCATTAAACTTGCACATG IDH1 NM_005896 187
188 0.000736 -0.092 GGACCATAATTCCTGAATCTTGGAATTATGAAATTTTCACCATTAGATGTGTGATATTTT THC2379364 THC2379364 188
189 0.000737 -0.128 ACCAGCGTGTGAAAAGGGAACGAGAGAAGGAAGATATTGCACTAAATAAGCATCGCTCAA RHOBTB1 NM_014836 189
190 0.000745 -0.043 GGAGAGAGATGAAAAATTATCCAAGTCAATCAGTTTTACCAGTGAATCAATTAGTCGGGT DKFZp434C0328 NM_017577 190
191 0.000746 -0.028 GCATGAGATGAAAGATGATATTTTAATTCTATCATTGAGGCATAGTCTTTCCAACACACC PRKCZ AB007974 191
192 0.000746 0.034 AACCCCATCTCTATTAAAAATACAAAAAATTAGCCTGGTGTGGTGGCAGGCGCACGAAAT ALPKl AK026323 192
193 0.000747 -0.035 ATCACTTCATTGTTTGAAGGACAGTACACTATCCAGGATTCATTTATTGTTCAGGTGTCT AK021848 AK021848 193
194 0.000751 0.021 GGCTTTGTTGTAGGAGCAATGACTGTTGTTATATCATGTATCGAGAATTCTGGGCAAAAC A_23_P120644 A_23_P120644 194
195 0.000751 -0.058 AGATGGGCATAATCTCGGGGAGCTAACGATCTAGTGGTGAAAACCAGGTAACCAGATAAT SATB2 AK025127 195
196 0.000752 0.284 ATATGATCTTAAGAGTCTAAACATTCAAGAGACGAGGGCAAGAAAGCCAGTCACATGTAG N52413 N52413 196
197 0.000753 0.035 AGAGAGGGAGAGGGAGAGAGATTAGAGTTGGAAAATACTCATCAGAGATGCTAATTTGTT THC2408471 THC2408471 197
198 0.000775 -0.014 GTTCTCCTATTTGGTAACTGCAACAACTGCTGTGGGTGTCACATATGCTGCCAAGAATGT ENST00000328135 ENST00000328135 198
199 0.000779 -0.029 CTGTCTGTGGTTTAAGTCTTTGCAGTCAAGTACTGATGCATCCAAGCCAGGCCCATGCCT GTPBP5 NM_015666 199
200 0.000783 -0.031 ATTTCTGTGTCTCAGCCAAAAATGCGTTCATGCTACTCATGCGAAACATTGTCAGGGTGG C6orf29 NM_025257 200
201 0.000786 -0.049 GGGCTGAGGTTCTAAAGGAATTGAAAAGAGACGATAAACATTGAATGAGGGTGAGGGCAT SEMA5A NM_003966 201
202 0.000786 0.031 TTCACCACACTCATTTAGACATCAAGGCCTTCACCATGTACCGAGAAGTGCGCAAAATCA A_32_P78488 A_32_P78488 202
203 0.000796 -0.154 AAAGAAAATCATTCTGTTTCCTGCAGTGGGAACTGGCCATGTAGTCTTAACAGTCTTTTG SCRN1 NM_014766 203
204 0.000803 0.027 GAAATCTTACTGAACCAAGAAAGTTTTGCAGAATTTCTGAAGGCCGAATATTCAGACTTA THC2415754 THC2415754 204
205 0.000805 -0.048 TTCCATATAGGCAACAGTTCTGTTTCCATAGATCTAAACTTTTCCACTTTTCATGCAATG THC2403644 THC2403644 205
206 0.000806 -0.18 GATTCCACAGAGATCAATGTTCCACAGGACATTTTGGGAATTTGGGTGGACCCCATAGAT INPP1 NM_002194 206
207 0.000806 0.028 GCAAGCACCATCCTTTACCTCGCTCATTTAGACACCAAGGCCTTCACCATGGGCCAGGAA A_24_P306814 A_24_P306814 207
208 0.000811 0.094 TACAGGGACTTAGAGAACAGTCTCTTTTCTGCCTTTAAAATGAGAGTTCCTCCATTTACC RAB3B NM_002867 208
209 0.000815 0.009 ATAAGAAGCTTGATAAAGAGCCTGGGCAACATAGTGAGATCCCGTCTGCACCAAAAAATT A_23_P50320 A_23_P50320 209
210 0.000842 0.126 ACAAACTGTTGTGCTATTGGATACTTAGGTGGTTTCTTCACTGACAATACTGAATAAACA NM_003014 NM_003014 210
211 0.000844 0.023 GTGTGTCTGGAAGATAGAATTCTAGGCGTAGAATTGATAGGTTAAATGTATTTATAGGGA THC2261399 THC2261399 211
212 0.000845 0.052 TCCCTGGAGGATGCCTGAATTCTACAACCGGTTCAAGGGCCGCAATGACCTGATGGAGTA ASS NM_054012 212
213 0.000845 -0.134 TTTTAGCAGTTATTGCGAGGCTACCATTTTAGGTAGTATGAGAGAAAAATCCTAATATGC WDSOF1 AF161549 213
214 0.000847 -0.055 TTGAGATGAAGGTCAAAGAAAAATCAAAACTGAAGAACTCTGAAGCTGAGCTCCAGCGGC A_24_P409824 A_24_P409824 214
215 0.00086 0.038 TGTTTGAGGCACCATTGACTATGATTTGGGGTCATAGGAGAGCAGGTATACTGATGGAAC BM979049 BM979049 215
216 0.00086 0.063 GAAGTGCACAAAACCAATCAAGGCCTGGGCTTGAAATTTGCTGAGCTGATATACACCAGT A_24_P358606 A_24_P358606 216
217 0.000868 -0.026 GTTTAAGATTATTAAGATGCTTTTATTTTTTATGTAATGAATATGGCTATAAAAGGTATA THC2380397 THC2380397 217
218 0.000876 0.073 AAAGAAGATATCGAATAACTTGGAAAAATGGGTACTTAGTGCGGTGGCAAAAGCCAAACA GGTL3 NM_178025 218
219 0.00088 0.085 TATGTTTCCTTGGATTTTTGAAATTTGAATTTGTTGTGTAACTTTGCTAGTTAAAAGATT AK021576 AK021576 219
220 0.000881 0.153 TGTGTACAGTTATTTATGCCTCTGTATTTAAAAAACTAACACCCAGTCTGTTCCCCATGG NOS2A NM_000625 220
221 0.000887 0.028 TTGCTCAGATCTCCAAGCAAGCATTTCTTTTCTTTTAGGGATGTCTGAAAGTCACATCCA SS18L1 NM_198935 221
222 0.000888 0.141 CCGTCACCCCATGCAGTCAGCATCACCTTGGTCACCCCATGCAGTCGGCATCACCGAAGT ENST00000334207 ENST00000334207 222
223 0.000895 0.005 TTCAAGAGCTCAAAGTGCAAATACAGCAAGTCAGTTCACAAAGTTTCAAGATACAGTTTT AI955364 AI955364 223
224 0.000905 -0.085 AATCTGTGTGATTGTTTGCAGTATGAAGACACATTTCTACTTATGCAGTATTCTCATGAC KDELC1 NM_024089 224
225 0.000906 -0.155 AGCCTCCAACCACACTAACAAGATTCAAGATTACTTGCAACAGCTCACAGGAGCGAGAAT ENST00000309178 ENST00000309178 225
226 0.000907 0.029 CTCATGCACATCAGCTACGAAGCTGGAATCCTAGAGCCCAAGAACCAAGCGCCTCCAGGT A_24_P33055 A_24_P33055 226
227 0.000907 -0.254 GTTTACCTAATATTACCTGTTTTGTATACCTGAGAGCCTGCTATGTTCTTCTTTTGTTGA PDGFA X06374 227
228 0.000916 -0.081 TGGGTAAAGCTGAATATTTCATAATCTTTGTAACCAAGAGATGTTGGGTTCTTTCCACAA THC2283850 THC2283850 228
229 0.000925 0.012 GCACAGATCATGAATAACTCAGAAACCATTAGCATATTGTGTGCTTAATTAAGGCTGGAA BC022398 BC022398 229
230 0.000933 0.214 ACTCAGGGATGATGGTGTTTATTGCAAATGCACAATCTTTTTCCCATTGAAATGTCATCA KIAA0980 NM_025176 230
231 0.000943 0.034 TCTAAATATTCTCAAGTCATGTTCAATGTTTCCTAAACCTTCAATTTTGGCCAAAGTCCC RPESP NM_153225 231
232 0.000947 -0.098 GGCAGAACATTTGACTGGCACTGATTTGCAATAAGCTAAGCTCAGAAACTTTCCTACTGT HS3ST1 NM_005114 232
233 0.000956 0.053 CCCAAGACGTCAGGATAATAAAGCTCTGTATTTATAATCTTTTATATGTCCTATTGTGGC BX641010 BX641010 233
234 0.00096 0.193 AGCTTCAAAGCTCTTGGAGGCTTTAAAGTTCTTTCTGTTGGGTGTGCATTACAGTTTACT D8S2298E NM_005671 234
235 0.000964 0.054 TTAAGCCAAGGAATATTTGAACCCTACCGATCAAAAGTACTCTTCAAGAAGCATTATTTG THC2441641 THC2441641 235
236 0.000969 0.163 GGAGTTTGGGGCAACTGGTTGGAGGGAAGGTGAAGTTCAATGATGCTCTTGATTTTAATC POU5F1 NM_002701 236
237 0.000974 -0.141 GCTGAAACATGCTAGTGATATCTAGAAAGGGCTAATTAGGTCTCATCCTTTAATGCCCCT KL NM_153683 237
238 0.000974 0.039 TTGTTGGACGAGCTTGCTTTAGTTAGACGTCTCATTATTGAAGTTACTATTATTGTTGGA A_24_P246636 A_24_P246636 238
239 0.000974 0.055 ATTTATATACACGATGGGGCAGCCTAATGCACATCATAACATTAACTTTTCCAAAGAAGT AI097304 AI097304 239
240 0.000989 0.074 CCTCTCCATGTCCAGGAAACTTGTAACCACCCTTTTCTAACAGCAATAAAGAGGTGTCCT CTSF NM_003793 240
241 0.001 0.024 GGCTGTATTGACATTATGGAGAACTGCTTCATTAGAATAAAGTCCAGAGGTATCTACAAG ENST00000249376 ENST00000249376 241
Table 2
List of normalization genes.
Gene number Gene name Systematic name Sequence SEQ ID NO
1 OR4D2 NM_001004707 TCTATTGTCCAGCTGGCTCTGATGCTCCCACTGCCCTTCTGTGGCCCCAACATTTTGGAT 242
2 A_32_P18630 A_32_P18630 CTCAGTGTTGATTGTCCTTTTTTGTAATCCTCCTGCTGTTCTGTTCTGTTGTGCTGTGCA 243
3 FLJ14346 BC017694 CTCTCTTGCTGCCTGTTCTCTCCTTTTGCAGGTTGCGTTTATTGGCTTATCTCTGGGGGT 244
4 A_23_P21862 A_23_P21862 ATATATATCTTCTGATTTTGGCCTCTCTTATACATGCTGTTCCTCTCCTGCTCTGTGCAG 245
5 PRDM11 NM_020229 CAGGTGCTCCATGTGTTGCCCTTGTATCCTCCTTGTCAATAAAGGAAGTTCCGCTGCAGA 246
6 LYPLA3 NM_012320 CAGGACTGAAGCTGCCTCCCTTCACCCTGGGACTGTGGTTCCAAGGATGAGAGCAGGGGT 247
7 NEK8 NM_178170 CCTGTGGGGACTTCTTCACTGCCTGCCTGACTGACAGAGGCATCATCATGACATTCGGCA 248
8 KIF1A NM_004321 TTAATGCTTTATACTGCCGAGTCTGGGGGCTTGTTTTGGTTTGGGGGCAGCCATCCTCCA 249
9 TM6SF1 NM_023003 ACCTGCTGAGAAGACCATTTGATTTAATGTTGGTTGTGTGTCTCCTCCTGGCAACTGGAT 250
10 FLJ23754 NM_152675 CTCACTACCTGCTCCTGTCGTTCTAGTCCCATTGCTGCTCTCAGGACAAGCTGTGACTGA 251
11 THC2278244 THC2278244 TGGGACAGAAGGGGTGTTTTTCCTCCTGTCTTCCTCTTCCCATTCTCCTCTTTTGGGGAG 252
12 PRDM11 NM_020229 TCATGAACTTCCCACCACCTCTTTTTGCCCTAACTGTATTCGCCTAAAGAAGAAGGTTCG 253
13 LTBP2 NM_000428 GTGATGACTTGAACGGGCCTGCTGTGCTCTGTGTCCATGGTTACTGCGAGAACACAGAGG 254
14 CORT NM_001302 CAGACCTGAATAAAATGTATTAAGCAGCAGTGATCTTTCCTCTCCTCCTTCCCAAGTCAT 255
15 BM564463 BM564463 ATCAACTCTCCCTTTCCCTGCCAACAAAGAAAAATCCAATTTGGGATGTAACCAAACTTA 256
16 GZMM NM_005317 GAGTCCTGTCCTTCAGCTCCAGGGTCTGCACTGACATCTTCAAGCCTCCCGTGGCCACCG 257
17 CRHR2 NM_001883 TATTTCAACTCCTTCCTGCAGTCGTTCCAGGGTTTCTTCGTGTCTGTCTTCTACTGCTTC 258
18 TIGD1 NM_145702 TTGGTAAAAACCTTCCTCTTCCTCCAGTGCGGGACGCACTCTCTGGTATCTCTTTTGACC 259
19 KLK14 NM_022046 GTGAGAAAGCAAGACATCAAGGAGGGACCTGTGCCCTGCTCCACATCCTCCCACCTGCCG 260
20 HKR2 NM_181846 AGCTTCTGTTCATTACTGTTAGTGTATCATTCTTCATGCCTCTGTTTCTGTGATGGTCTC 261
21 LOC440523 AK090827 CACCAATACCTTTCTGAAGTTTTCTAGTCCCTCCTTTTTGTTTGTGCTCCTTAAAGCCCA 262
22 LRCH3 NM_032773 CTGCCTCATTCTTCTGCCTTCACGCCTCTTAAGAGTGATGACAGACCTAATGCTCTATTA 263
23 COX6B2 NM_144613 CACCAGCCTTCGAGTTCCTTGTTTCCCTTGCTCTGGTCTCCACGTGTATGATGGGGTTCT 264
24 PRIMA1 NM_178013 TATGCTGTGCCTCCCTGGTGTTTCTGACTGTGCTTGTCATCATTTGCTACAAAGCCATAA 265
25 AK054939 AK054939 TAAGTGTGAATGTGCCTGCCTCATTGCCTTGTGTTCCAAACACAGTACTGAATGCGTTGT 266
26 GTPBP2 NM_019096 CACCTAAACCCTCGTTTTAGTAATTTGTAGTGACTGTTCCCTTCCCTCTGTTGCAGGGAA 267
27 FLJ12788 NM_022492 AGATTACTCACAGCTCCTCATGCCATTTCCTGTCCAGATTGCTATGTATGACTCTGACCT 268
28 A_23_P142197 A_23_P142197 GGTAAGGGACGTAGCTACACCACAGACCCACTCTTGTCCCTGCTCCTGTTAGCTCTGCAT 269
29 AK098749 AK098749 TGTTTGTGGGTCCATTATGACCAGGGTTATTGGTGTCCTTGCTGTTCTTTTCTCCCATGT 270
30 LOC440455 CR594811 AAAGGCCACCAGCATCCTCTGCCTCAATCTCAATCTTTGGAACl I I I I AGCTCCAAAGTG 271
31 CDK5R2 NM_003936 ATTTTCCGTTCCTTGGGTTTGTGTGTCTGCATCTCCATCTTACCCCTTGCCTGACTGTAC 272
32 STMN4 NM_030795 TGGCCAAGGGGCGTTTCCTCTGCTTTTGGTGTTTGTACATGTTAAGAATTGACCAGTGAA 273
33 C10orf49 NM_145314 ACGCATGCCTTTCCTTCCAGTGTGTGAAAGTGTCCTGACTTTCACCTCTTTGCAGACCAT 274
34 OPRL1 NM_182647 CTGTTCACAAAGTGGAGGCCTCGTTTTCCTGGTCTTGACTGCTCTGTTTGGGTGGGAGAA 275
35 KRTAP10-10 NM_181688 TGTCTGCTCTAAGTCCGTCTGCTATGTGCCTGTGTGCTCTGGGGCTTCCACTTCATGCTG 276
36 DKFZp762C2414 NM_178542 TTGCCTTTCCACCCCTCTCAGCTTGAGGTCCTACCTGGTGTCCTCCATGAGAATCATACA 277
37 FBS1 NM_022452 GCCTCTACGGTCTGGAACCTGCTCACCCCTTGCTCTACAGCCGCTTGGCTCCTCCACCAC 278
38 YIPF2 NM_024029 TCTTTGTCTTCATCCCCATGGTGGTCCTGTGGCTCATCCCTGTGCCTTGGCTGCAGTGGC 279
39 HRLP5 NM_054108 TGCTTCCTTCCCTTGCCTTCTCTCAATGGCTCCGATTAAGAAAATTGTGAACTTCTGGAA 280
40 TIAM1 NM_003253 CCGTAGAGAATGTGTGTAGATACTTCCTGCCCTAACTCTGCCCACCCTCCTGTACCGTCG 281
41 CABP5 NM_019855 GTATCAAAAGGTTCAGAACGTAGTCTCTACTCTTCCCCTGCCTGCATCAACTGTTGCAAA 282
42 KRTHA2 NM_002278 AACTGTTCTCTTTGGTGATGTTTCTGGTTGTCTGTGCTGCCTCAAAGAGCGTGTGTTCTT 283
43 ENST00000339692 ENST00000339692 CTGTCCTGAGAGCCCCAGGTTCCACTTTGACCTCTAAACAGATCCTCCTCTTCTCGGAGG 284
44 A_23_P207049 A_23_P207049 CTCAAGTAGATGTCGTTCTGTGGCATCTCTTCTTCCTCCTGCCTTGTGCCCTCATGTTCA 285
45 GABBR1 NM_001470 TTTGCACACGTCCATGTTTATCCATGTACTTTCCCTGTGTACCCTCCATGTACCTTGTGT 286
46 LPHN1 NM_001008701 ATGTCTTTCTGGCTGTCTCTATGTTCCTCTTCTCTTATCCTCAACTTTCTGTCCATTCGG 287
47 A_24_P931583 A_24_P931583 AAAGCCTGGCTCCCATGCCAGGTGTTGATGCTGTCCTTCCACGCTTCTCTCCTCCTAAAG 288
48 CXorf48 NM_017863 TCTCACCCACATAACTTGCGGTGTTTGTTGTTTTACTTGTTCCTTCTTCCCCATGCTAGG 289
49 OR6N1 NM_001005185 GTGTCCAGATTTATCTCTTCCTCTTGTTGCTTCTCATTTACCTCATGACTGTGTTGGGAA 290
50 AK026131 AK026131 AATCCTGAGCTGCAAGGTTCTGCCTCACCTGTCTCTCCACTCTGACATCACGAAATCAGC 291
51 A_23_P972 A_23_P972 GTTATTCAGACCATAAAAGCCTCTTCTTTCTCTATTTTTCCATTTGCACCATCACACCAG 292
52 RHEBL1 NM_144593 TTCATCATTGGGGTCCATGGTTATGTGCTTGTGTATTCTGTCACCTCTCTGCATAGCTTC 293
53 ICOSLG NM_015259 TATCACCTGGTGGCCACAGTCCCCCTTCTCACCTCAGCAATGATCCCCAAAGTGAGAGGT 294
54 AF113013 AF113013 AGCCAGCCCAGGACAGTCTCTGTGCCTAGGCCCTGCCAGCCTGGCTGCCCTGCCTCCAGC 295
55 SLIC1 NM_153337 CTGTGTGCTCTCCAGCTCCTAAGTCGTGGGCCTGGTTTCAGCTTTGATTGCCCACTCTTG 296
56 A_24_P925010 A_24_P925010 CATCCTTTCTGTGTGTCTCCTCTTGTGGCTACACTTGACGGGCCATATTATAAAAGAATA 297
57 PIP5K2C NM_024779 TCTCATAATGTTGAGAACCCTGATGAGATACATTGTCTTCCTCTCCCTACAATGCCTCTG 298
58 AI732974 AI732974 CCTGTTCTTCCCCMCTTGGCTTTCC 1 1 I I L 1 1 I 1 1 GGTCATGGGCTCTCAGAGTCTGGG 299
59 AI791380 AI791380 CCTCATCGCTCTCCTAGGCTCCTGGCCTGCTGGACTCTGGGCTGCAGGTCCTTCTTGAAA 300
60 A_32_P158543 A_32_P158543 CTCTTCATCTTCCTCTTCATCATCTTCATCATCTGAATCGCTCAGGGTGTCCTCTCCGGA 301
61 SYNGR1 NM_004711 AATTCCCAGGGGACTGACGTTAGTTCCCTACTCCATCCTTCCCTGGTGATTCTTGATCTT 302
62 KRTHB4 NM_033045 GAGATGGCTTTTCTCCCAGTGGCTTCTCTCCGGCTGTTTCTCTTCCTGGGTTGTTGGTGT 303
63 KIAA1001 NM_014960 AGATTACACTCAGGACCCTTCAGTAACTCCCTGCTGTAATCCCTACCAAATTGCCTGCCG 304
64 BU570396 BU570396 AGGTCTCTTTTGCTCTGGCTTCTTCAGTTTAGCCTAGTCAACACCACCCTTCTTAACAAA 305
65 ENST00000290997 ENST00000290997 ACAGCCAACTGGAAAGATATAAAAGTTTGGGTCTGTCTCCTCTCCTTCAGAAATGAAATA 306
66 AL080233 AL080233 TGAATTACTCCTCTGCCAAGAACCCTCTGCTGCACCTTATCCAAGGCCCACCATAAAATG 307
67 OR3A3 NM_012373 TGTGGCACCCTTGGTCTTCATCAGTGTGTCCTATGCCCATGTGGTAGCTGCTGTGCTGCA 308
68 THC2285742 THC2285742 AACAGAGGCTTCCTTTCTGCCATGCCTGTGGGATGATATTCTTATTGTTCCATTTTTTAG 309
69 SMAP-1 NM_017979 TCCTCGAGATCTTTCCCTACTGCTCTTGCCTGGAGACAGTGGCCTCATGGGTGCTGACAG 310
70 RARA NM_000964 AGTTCTCCTCCTCAGCCTTTTCCTCCTCAGTTTTCTCTTTAAAACTGTGAAGTACTAACT 311
71 C2orf13 BC030711 TCTCCAAGACACTTTCATATCCTAAGCCCTGTTCTGTTTGTTCTTGTGTAGTAAATTGGC 312
72 ADAM33 NM_025220 GGCCTCTGCAAACAAACATAATTTTGGGGACCTTCCTTCCTGTTTCTTCCCACCCTGTCT 313
73 KCNH4 NM_012285 CTACTTTCTCCCCAGGACCTGGAGGCAGGCTGTCTCTCAAGGTGGTCTTTTCTGCTGCTA 314
74 THC2313031 THC2313031 ACCTCAATTGAACCTGTCACCACGGATTTCCTTCTGCTCTGTTTCTATAATATGACATAA 315
75 AK027178 AK027178 ACGAGGCTTGCTTACTTCTATTGTTGGTTGGATTCTTTCCTCTTTTTCTTCTTTTAAAAA 316
76 ENST00000331736 ENST00000331736 AATTTTTGGCCATCTGTCACCCCCTGCACTACCCAGTCATCGTGAATCCTCACTTCTGTG 317
77 DKFZp434F142 AL136837 CCAAAACTTCCTCAAGTCAAGCTCTTTAGGCCCACCTTCTGCCTTGCAGTGGCCTGTACA 318
78 HB-1 NM_021182 AGAGTAAAATTAAGCAAGTGGAACATATGCCCTTTGCCTCTGCTCTGCACAGTGAAATGA 319
79 H16080 H16080 TTGCTGCAAGGAAGTAAAGAGTCCCTTCTACCTTTGCCAGTGCTACTTACCAGTTAAAAA 320
80 KLC2 NM_022822 ACAGCCCCTGTCTTTTCTGTTCAATCTCAGGGTAACCTTCTCCCTTGTCATCTCAGCCTG 321
81 A_24_P333077 A_24_P333077 TGGTCTTCCTCCTCAGCATCGTGCTCCATAACATGTCCTGCATTTCTAAGCGCTTGACTG 322
82 CR619805 CR619805 CCCACAGCCGCTGCATGGCTGTGCTTTTCCTTGGCACTCTGCCCTTGTGTCCTGTGACCA 323
83 THC2414210 THC2414210 AACTTAAATCCCTTCATCTTCCCCTTCATTCAAGTGTATATTCTGCTAGCAAAGTGAGAC 324
84 FOXE3 NM_012186 CCCTTTCACGTAGACACACCTGGCTGCCTTCTTCACGCCCTGAGGACACTTCTTGGAGAT 325
85 THC2377542 THC2377542 ACCAACCCTCTTCCTAGCTCCTCCATAATTCCTTTTTTTCCTGCACGTGATTACAGTGAG 326
86 AF085829 AF085829 AGGTCTGGTTATGACTTCAGGCCTCAGCACCTGGTTGTCTCTTCTGCAAAGAATGCTTTC 327
87 FLJ35785 NM_173613 AGGGCAGTCCTGTTTCTTGCTTCCTGCCTCTGACTTTTAAAGGTGGGTAGCCCTGGGCTC 328
88 OR4C15 NM_001001920 TCTTTTTACTTTCCAGCTTCCCTTTTGTGGCCCCAATGTCATCAATCACTTTATGTGTGA 329
89 SLC16A4 NM_004696 TATTTACTTGGTCTTTTCTCCTCAGTCAGTTAGCATACTTCATCCCTACCTTTCACCTGG 330
90 C14orf152 NM_138344 TGGGATGTTGCCCTTCCTTGCCTTGCAGTCACCCCAGAAGCCAGAGAGACGCATCTGTTT 331
91 MGC26717 NM_173824 TCGTGAGCCCTCGTTTGCTGTCACTAGTAAAAGAAGAATTTCTTTTTCTCAGCCCCAACC 332
92 ENST00000318886 ENST00000318886 ATTTCTGACMTTCCTGTGGCCCCACTCCATTTG 1 1 1 1 1 I AGACTTCTCCCTCC I 1 1 H G 333
93 HRG NM_000412 TCTATGCCACTCAGAATCTCCTTCTTTCCTGGACTTAACTCTAATTCTAGAGTCTCTGTT 334
94 FGD1 NM_004463 ACAGATCCTGCCTCCACCCAGTTCCCCACAAAGCACCAGAGGTAGGAGACCTGGATTCAA 335
95 THC2306982 THC2306982 TGGGAGAGCGTTGGTGGATGTGTTCTGCATGTTCCTTTCTGTACAGTAACTTCTGCATTT 336
96 A_32_P158253 A_32_P158253 GGTCACTCTGGTCCAGTCAGTCCTTCCTCCAGGGACATAACGAATCTGGCTCTGCCTGAA 337
97 A_32_P151087 A_32_P151087 TCCACCCTCCCTAGAAGCTGTGCCACCAACGAAAGCTTTTCCCTGCACCTGTAGTAGAGG 338
98 A_24_P456802 A_24_P456802 GTCCTGTTTGGCCCACCACTTTCTTGTTGGGTATATTAGGGTGATTCTTAAGGAATTCCC 339
99 POLD2 AK025276 CTGGCATCCTTCTTGTCTTGTTTTGAGTTGCTCGCCTCTGTCTGCTCCCTAGGGCGTAGA 340
100 A_32_P113646 A_32_P113646 CAGTCTCTTCCTCTCTGGGAGCTGGCTGGAGCTGGGATGGACACCTGACAGAAGGAAATT 341
101 MYADML NM_207329 GGCCTGCCTCATCTTCGTGTTCATCAATAGCCCCTACGTGTACCACAACCGGCCGGCCCT 342
102 HTR3A NM_213621 GTGTGCATGGCTCTGCTGGTGATAAGTTTGGCCGAGACCATCTTCATTGTGCGGCTGGTG 343
103 C14orf162 NM_020181 TGAAAGCATTTCACCCTCTCAATGCCTCACCTTTCCCGTTCGCCAGATGGAACCATCAGT 344
104 THC2348993 THC2348993 ATATTGACGATACCTCTCTTCTGCCTCCATTTTGTATTCCCACTGCCACTGGGAACATGA 345
105 C21orf89 AF426268 GTTTCCTTGATTTCTCCTCCTTTGGATGTATTGGGCTTCCTGACTCTTTGAAGAATCTAT 346
106 BI829861 BI829861 AGAAGGTTTTCCGTGATCCTGCAAGACCTGTGTCCCATCCTGGTGATTCTGTCTTCAACT 347
107 MEF2D NM_005920 ACACAGATTACCAGTTGACCAGTGCAGAGCTCTCCTCCTTACCAGCCTTTAGTTCACCTG 348
108 ENST00000338302 ENST00000338302 TGGGGGCACTTCTGGCTTCTTCTGCCCCAGCACCCATTCTTGCCAAAATGGAGGTGTCTT 349
109 TAAR8 NM_053278 CTCTTCACAGTTGCTGTGATGTGGCATTΓΓGWACTCTTCTGTCCTCCACTTGTGCTTCA 350
110 ENST00000355104 ENST00000355104 AAGTCTCTTCATCTACGCCTCTTCAGCCACAGCTCTTGCCTGGTGGGCTTCTCACAGCAG 351
111 STAT5B BC020868 TAATTGTGCAGCTTCCTCTCATTCCCTGCTACTTGTCTCATGTCCTGGCAACAGGAAAAA 352
112 ENST00000360151 ENST00000360151 TCTCGTAGGGCTTCCATGGGTGTCTCTGGTGAAATTTGCTTTCTGTTTCATGGGCTGCTG 353
113 BX105952 BX105952 GGCAATCCTTTGCCCCTTCCCTCCTTGAGGTGACTAAGATGGTCATGCAGACAAGCCACC 354
114 BC032064 BC032064 AAAGAGAAAGCCTGTCAAGTCTCTCAGCCTCCCGGGCCTTCGTGCCCATCTTAAGGCTGA 355
115 EPM2A AF454494 CCCAACTCCGCCTTGGTTGGGTTTTGTGTCCTCATTTTCCTGCTCTAATTCACTAAAAAA 356
116 KCNK7 NM_033347 CCTGCGCCATTGCCTGCTGCCTGTGCTCAGCCGCCCACGTGCCTGGGTAGCGGTCCACTG 357
117 SGCA BC014215 CTGCAGATGGGGTGACTGATAGGCAGGTTACGCAGTCCTTGTCCCTTCCTCTCTCTCTCT 358
118 THC2249515 THC2249515 AAATTTCTTACACCCACAGGGCCTTCCTCTCACATGAATGTACTAAATGTATCCCCAATG 359
119 SNX15 NM_013306 AGAAAAATAATGAATTCTTAGCTCCCTGATTACACCTGCCACCTTGGAATCCAGGACTCA 360
120 LOC441120 NM_001013718 ACAGAATCCTTGGGGTATCTGTCATCCCTCAGCTCCTCTCAGCCACCAGAGCCTTTGCGT 361
121 A_32_P64025 A_32_P64025 TGATGTCCTCATGAGTCTGGTGTTCAACATGGGCCTGCTGTCCACCTAGGGCCTTGGTGT 362
122 ZSCAN1 NM_182572 GGAGTTTCATTCGCGTCCCATCTTTCAGAAGCCTTGTCCAGGTCTCCCTGTTAGCACCTG 363
123 PEMT BC007572 CTGTGAAGAAGGATCCATTTCCGTCAGCACCCTGCCTTTCATGCTCCTCCCTGCTGTTGA 364
124 CCIN NM_005893 TGGACTGTCCTGCCTGCTGTCTAGCCAAGCTACCTTGCAAGATTCTTCAAAGGATTTAAA 365
125 C20orf81 NM_022760 TGGTGATCATCAACTCCTGCCTCTGGGATCTCTCCAGATATGGTCGCTGCTCAATGGAGA 366
126 CRNN NM_016190 GGACTATTTTTATCTCTGACATCTCTCTATTGCCCCATCTACCCTAATGCATCAATAAAA 367
127 ADRBK1 NM_001619 GTGCCTGATTCGGCTGTCTCAGACTCTTTTTGTACCTGGTGACCCCTTTTCAGCTTCTGC 368
128 MGC25181 BC071598 TCTGGAAAATTCTTCTCTGATATCCACATGCCTTTGCATTGCCTCCCTGTCTCGCCCTGA 369
129 THC2269888 THC2269888 AAGCCTAGTGTTTTGCAGTTACTTCCΓTCCCTGTCAMATTTTGCCTCCΠΓTGATTTTCA 370
130 ENST00000300398 ENST00000300398 AGGAAATACCTCACTGCCTTTGGTGCTCAAAAAGCTTCTCCTCTCTAGAGATGAAACCCT 371
131 LOC51252 NM_016490 CCTTCCTCTGTCTCCCTGTCACTAATGTGAGGTTTCTTTGTGCACATTAAAGTCTTCTTT 372
132 ENST00000321130 ENST00000321130 TTTCTTCTCMGTATGCACTTAMTATAATTACTGATCCTTGGTCCTCTAGCAGATTTC 373
133 THC2436815 THC2436815 GTTTGGGCATCAGGCTGGATCTTCACACTCTGACCTCACTCTGCTCTATCCTGCCCAGTG 374
134 AK056734 AK056734 CTGCARRCCTTCCACTGTTGTTGATTCAGCTCATTTTACATACCAAGGCACCTTTCCTGA 375
135 ENST00000355855 ENST00000355855 ACAAGGCCCGTGCCACCTCTTCCCACAAGGTCCGTGCCACCTCTGCCCACAAGGCCCGTG 376
136 A_23_P75192 A_23_P75192 TGGTAAAATGTGCAGCCTCCCTTCCAAATTTCCATTGCGCGGTGGCTTTTGGTTTATTTT 377
137 CLDN11 NM_005602 TGTCTGGCATTTTGTAGTCTTAACTTCTCCCCATTTCCCCCATCTTTTGGTTGCCTTAAA 378
138 LGI3 NM_139278 CTGTAGCTGTCGCTTTCTCAGGCTTGGACTGTCCCCTGGTTCACCCTGTCTTTGGGATGT 379
139 SLIT3 AL122074 AGCTCCCTGCTACCTCTAGCCAGGCTCATCTATCTGTCTGTCCACTTAAAAACTCGTCAG 380
140 C1orf82 NM_024813 CTGGGCTTCTGGTTCCTCTTCAGATTACATTGGGAGATATTTACACACAACTTAAAAATC 381
141 A_24_P919727 A_24_P919727 AGGGTCCTTTCTCCCCTTACTCTGACCTGTGGGTTTTCCAGGTGCCAGGAACTTCGGTCA 382
142 ENST00000323051 ENST00000323051 GGAGTTACACTCAAAGTCTTCCCCCTTCTTATCACTCCTGCCGATTTCCCTCTTCCTTTT 383
143 C8orf31 NM_173687 TCTGGTTTCTGCTGGGCTGCCCTCTCATGGGAAGCCTTTAATCACTCTAGCTCAATATAT 384
144 BF149382 BF149382 GGATCTCCTCAGGGCTCTTCTCCAGCCCCTCCTGGCTCAGGAAGTCTGTGAAGGGGATGT 385
145 FLJ20694 NM_017928 TGCAGCCTCCTTTTCCCTATCTATAAAATAAAAATGACCCTGCTCTATCTCACTGGGCTG 386
146 THC2392835 THC2392835 AAGCATATTCCTCTTTGGCATTAACGATGTCAGTTTTGGTGGTTGCCTGCTCCAGATGTT 387
147 THC2287450 THC2287450 AGTGTTGGTTCATAGCAGTTTCCTTGTCCCTTTTCTTTTTAAAATTCCCAACTGATTATG 388
148 PACRG NM_152410 GCAAGGCCTTGGTGCCTTATTACCGTCAAATCCTCCCTGTCCTGAACATCTTTAAGAATA 389
149 A_32_P141211 A_32_P141211 CATGAATCTCATCATGTTACTCCTCCATTTACATCACTTCTCCTTGCCTCAGGGATTAAG 390
150 ESRRB NM_004452 CCATGATGGAAAATGCCCCTTCCAATCAGCTGCCTTCACAAGCAGGGATCAGAGCAACTC 391
151 A_32_P198601 A_32_P198601 ATGTO^ATCTCTGGACCTGCCTTCTGGACGCCATTTATCTAAMTTC^CTGTTTGTTTTT 392
152 RTN4RL2 NM_178570 TCCAACCTGCTCACCCTGTGGCTCTTCTCCAACAACCTCTCCACCATCTACCCGGGCACT 393
153 C6orf106 NM_024294 TTGGCAAGTTAAATGTCCCAAGCACCTTGTCTCCCTTCCCAAGAACAAACCATTTCTGTG 394
154 AK024371 AK024371 TCCCACAGCTCACTTCCCTCTCTACTGAATCTGCTGAAGATCATGCAGCTGCTGTTCCTG 395
155 OR10H4 NM_001004465 TCATGTGCTTTCCCTCTTGAAGTTGGCCTGTGAAAACAAGACATCATCTGTCATCATGGG 396
156 SSTR3 NM_001051 ACCCCATCCTTTATGGCTTCCTCTCCTACCGCTTCAAGCAGGGCTTCCGCAGGGTCCTGC 397
157 GPR144 NM_182611 ACCGCCTGCTTCTGCAACCACAGCACCAGCTTTGCCATCCTGCTGCAAATCTATGAAGTA 398
158 CR610954 CR610954 ACCTTCCTTATCCCAGACCTCCTTCCTAGCTCTTGCTCAGAGTTGAGGCCTTGGTCGGGT 399
159 INSL3 NM_005543 CCTGCATGTGTAACACCCCTTCTTGCTGTCTCTTAGTAAATAAACGACCCAAAGCAGCTT 400
160 A_23_P130639 A_23_P130639 CTCCCTCAAGGACCTTTCCTCTTCCCTCCGTCCAGAGTGTCTTCTTTTTTATTTTTATTT 401
161 RAB1B NM_030981 AGGATCCTAGTCCCCTGCCCTCTGGCACGGCTGCTTCCTGCAAGAAAGTAAGTCTTTGGT 402
162 LOC284001 NM_198082 GCCGAGGTTCCTGCCACAGAAACCGTCCTTGCCTGTGCTCTGCCAGCGTTGCTGTCTGTA 403
163 ENST00000316517 ENST00000316517 TGACATGGGTCTGGTCTTTGCTTCCTATTCTTTGATTATTCACTCAGTGCTGAAGCTGAA 404
164 THC2283890 THC2283890 CAGGTTGAGACTCTGGTTTCTTTCCTCGTTTGATATTTTCTGCTTCCCAAATTTCAGAGA 405
165 KRTAP2-1 AJ296345 TGAAAAAGTCACACTCCCATTTCTCTACTTTAACAAACCTCCTGCCCTGAGCCCACAAAA 406
166 THC2320455 THC2320455 CCCAGAGCTGGAACGTCATGCCTGGACTCCCTCACTATTTGGAAAGACTTCTTTGGAGTC 407
167 CR609588 CR609588 CGGTGATTGGTGGTTGGCTGTTCCGTGAGTTCTGCATTGATCTTGCTGGCTGAGCGTCCA 408
168 LMAN1L NM_021819 ACCTCTCCATGTCACTCAATAAGGACTCTGCCAAGGTCGGTGCCCTGCTCCATGGACAGT 409
169 ENST00000321619 ENST00000321619 CTGGTTAGCCTCTGTTTCCCAGACATTGACTCGAGGCGCCTCCAGTCCTCTCAACCCCCT 410
170 THC2455655 THC2455655 GCTTCTAGCAGCCTCATCTCCCTGGTATGACCTAGCTCCAGGGACTGGATTTCCCCAAAT 411
171 THC2343350 THC2343350 CACCTCTGCTGCCTGTCTTTTAACATTGATTCTAAGCTGCATACTAATTTCAAAAATCTC 412
172 BTN2A3 NM_024018 TCAGCCCCTGTTCACCTCCTCATTAGGTGTGGCTTTAGTAGTTCCTTTGGTTGTAACTAT 413
173 AA292852 AA292852 ACAGTACAGTGACCCTACACGCTTGTTACCAGCCTCACCTGTCACCCACATGGAGTCCTT 414
174 ETV3 NM_005240 TCTTGACTTTCCTGGTTATAGCTTTCCATCACAGCTCCCCACATTCTCTCTTGATGTTGA 415
175 X78261 X78261 GCAAGTGCCTTCTCCGTAGACCATGCTTCAGTCTC I I l e i IΆGTTCTCCTTCTTACTGAA 416
176 NOVA1 NM_002515 TCCATCCTAGCAAGCTAAATGGCATCCCAGCTGCTCCTTTCTGTGCAACCAATTAAAGAA 417
177 AVPR2 NM_000054 CTCCTCATTCTCTCCCTAATAAAAATTGGAGCTCTTTTCCACATGGCAAGGGGTCTCCTT 418
178 THC2315820 THC2315820 CGCGACCTTCTCCCTTTCAGCCTCTTCCATAACCACACTACTCCTTTCCTCCATCGGCTT 419
179 TSPAN14 NM_030927 AAGTACCTCCTTTTCAGCTACAACATCATCTTCTGGTTGGCTGGAGTTGTCTTCCTTGGA 420
180 A_23_P213468 A_23_P213468 GACCATTTTGCAGCCGGCACGTTGATGGACACTGATGCCACCTCTTCTCCCTCAGGTGTT 421
181 DKFZP434A0131 NM_018991 TGTGATGTCACAGTTACTGTCAGTTCACAGCGAACCTTCCCTCCTTTTCCTGTTGACTTT 422
182 A_24_P928705 A_24_P928705 CTGGTGTGCAGTTCTTGGTCTGCAGTGAGGGTAGTCTTGTAGGGGCTATTCCCTGAAGCT 423
183 SPATA1 NM_022354 CTACCAGATCACCCTTCACTTCCTTGTCAACCTGTTCTTTCTTCAGGAATAACTGATATA 424
184 LOC157740 AJ291676 ATTACTCCTTTGCTTGTTCCTTTCCCACCGCCCACTTCTGACATTTTTATTCAAAAGTAG 425
185 AA442488 AA442488 TTGCTTGTTTAATATAGAAGTCCCCCTTGTTCCTTGGGAGATCATGGCCTTTGAATATGT 426
186 SLC5A2 NM_003041 TTCTCCCGGCCTTCCTCTGCCTGGGGCCCACTGCATCTGATTGGCAGTCACTTCCCATGA 427
187 BF761218 BF761218 AGTAGCTTCTTCTGGACTTCCAAAAAGATTGCTCGATTCTCCTCCTGGGGGCTTCATGGC 428
188 PRKACA NM_002730 AATCCTCTCTGCCAATCCTGCGAGGGTCTAGGCCCCTTTAGGAAGCCTCCGCTCTCTTTT 429
189 TBL1XR1 NM_024665 CTGGTTGG 1 1 1 1 I CGTTTTG 1 1 1 1 C I 1 1 G I 1 1 1 1 CCCCCTTCTCCTGAATCAGCAGGGA 430
190 FKSG43 AF334945 TGCCCTTCTATGGGTGTGCCTTCTTCCACGGTGAGGTTGACAAGCCAGCCCAAGGCTTTT 431
191 AA282192 AA282192 TGCCCCAGCTTCTCATTCCTCTCTGCCCAGGCCAGTGATCTGAGGCTTTCTGTGGGAAAT 432
192 FLJ35740 NM_147195 TTCTAGCTTCACTGTTGGGCCAGATTTCATCCCCACCGTGGCTCTTCATTGCGGCTGCTT 433
193 AK054756 AK054756 TACCAACCTTCACGTTCTAATAATCTGTCCCGATGTCAAGGCCCTGGCTCTGTCATTAAA 434
194 DRF1 NM_025104 AGTGAGGATTCTGCACGTGGATGGTACCCTTTCTGTGCTGGGCTCCTGTGGAGGGAGAAT 435
195 SLC35B4 NM_032826 CCCTRCCAΑTCCGGGTΠCGTCTTCTTGGCTTCTGATATTTATGACCATGCAGTTCTAT 436
196 MGC57359 NM_001004351 ATTCATTTCTTCCTGGCTCTCTACCTGGCCAATGACATGGAGGAGGACGACGAGGACCCC 437
197 ENST00000272643 ENST00000272643 CAACTGCCTTAACCGCTTTCTC I l I I1 GTAGCTCTCAGACTTCTCAG I Il 1 1 I GAGGAATC 438
198 THC2317675 THC2317675 TTAATTAAACAAGAGACAGTGAGGCTCTCAATCCCCTTCCCGCATCTACAGGTGTTGCTA 439
199 THC2321018 THC2321018 GCATATTTCCTTACTTTTCTTATCCCTGCTCAACTGTAAGCTCCTTATGGGTATTTCTAG 440
200 A_32_P221437 A_32_P221437 TTTGGGCCACGCCTGCCTTCTGTGCCGGCCTTGGATTATGTTCAAATTCATCTCATGCTA 441
201 THC2443434 THC2443434 TGAGAATTTGTCCTAATGCCTCTGGTCTTTGCCTGTGCTTCAGCTTATTTGGGGGAACAA 442
202 A_32_P56726 A_32_P56726 TGTGTGAATCATTCCCTCCTCTTTGTTTTACTCAGCACACATACAAGGACATATGCCCAT 443
203 THC2281165 THC2281165 CCCTTGCCCTGCCTCTCCTTCTGTTGAGGTAGGAGTGAGAATTCTGCAGCCATTTCAGAC 444
204 C21orf124 NM_032920 TGGCCAGTATTCCCAGTGGCCCCTTTGAGAGGATTTCCTGGAGCTCCCTGTCTGAGCTTC 445
205 BC009800 BC009800 TCTAGTTΓGTAMTCACATTTGGCGTTTGTAGATCACTCCTTCCCTΠTAGTGGCATTCT 446
206 SLC29A2 NM_001532 CTCTGCCCTGGGGTACTTTATCACGCCCTGTGTGGGCATCCTCATGTCCATCGTGTGTTA 447
207 C20orf67 NM_022104 AGTTGATCCTGCCTGCCTTTGAGCATGAGTACCGCAGTGGCTCCCAGCACATCTGCAAGA 448
208 THC2317029 THC2317029 TGCATGTAACTAGTATGTGACTGCTTCTGCATTCCCTTCGATCTATCCTTGGCTTTCTCA 449
209 CB852325 CB852325 ATATTCCTTGCCTGTGCCACATCAGTGTGCTATGGACAGCCTGATCTCCAACTGCCAGCA 450
210 HDAC5 NM_001015053 CCCACTCCTTGCTTTGTCTCCCTGGATATGGATTTCAGTTAAGTATTTTGTAACCCGTTA 451
211 EGFL8 NM_030652 TCCAGGGCTGAGCTGTGCACTCTCTTAGGCGGATTCTCCTTCCTCCTGCTACTGATACCA 452
212 AF085968 AF085968 CTAACCTGGGACCTGCCAAAAAACACACTATATGTGCCCCTGTAGGCATGCTGGCTGACT 453
213 AJ005814 AJ005814 CAAAGCTGACTTACGTACTTTCCCCATCTTCCCAGGCATCTTGTACGTCGTTACATTTAT 454
214 AK026465 AK026465 TTCCAATCTTCCATCCTGAGGTCCTGCTAGAATGGGCTGTTAGAATATTCCCTCACTAGA 455
215 ZNF646 NM_014699 CCCATATCTTCTCCTCTCCCCTTGTGAAGAGGACCCAGATCTGGCTTCTTTCCCAAGGAG 456
216 LOC222159 AK027340 TGAGTCATGGGGCGCCTAGTCAAAATGCTAATTTGGGGCTCACATCTCCTTCCTCTGGCC 457
217 A_24_P267686 A_24_P267686 TCACGGCCTTCCCACACTTTGTCCTCACTTGCAACAGGGACTTCTTGTCTGCCTGTTGTA 458
218 KRTHA8 NM_006771 ATGGGTAGAGATTTGCTAAGTATGATGTGCTTCCACCTCTTCTCTTCATACTCTCTACCT 459
219 AK055306 AK055306 GAGCCCCTGTTCTCTATTCTCCCTAACATTATTCCCTTTGCATCTCAGGAGCCTACTTCT 460
220 A_24_P238666 A_24_P238666 CCAGCTTAGTGATCACAAATGATCCTGTCTCTTCCCTGTCTGTAAAAGGTTGTTTTGAAC 461
221 THC2282164 THC2282164 GTGACAGTGCTTTTGTAATCTTGTACTCCCACTTCTCATCATCCCTTTTTTGGTTTGTCC 462
222 AK094860 AK094860 ATGACATCCCCTCTTGTTTTTGCCTCTCTTTCTCCTGATGCAATGGCCAAAATGCTGGAA 463
223 ENST00000315293 ENST00000315293 AATTCCCCTTTGTACTCCTATCTTTATCTCTAACTGACAGCATGAAGGTAGCCCGAAGTC 464
224 THC2282130 THC2282130 ATGCTCTTCAACAAAGATTTCCTATCTCTGTATCTTTCCTCTCACCTTCTGCATGGCTTG 465
225 AK093639 AK093639 TGGTGTATTTCATATTATATTCTTCTGCTTTCTTGTCTGTCTCGTGCATCTGGACCTCTG 466
226 A_32_P8653 A_32_P8653 AGAGTACTTTTCCTACCTCTTTCGCCTGCATCCTTAGAAAACTCCCGTGGATGAGAAGAT 467
227 OR4A15 NM_001005275 AGGAGCGATTTGTGCTGTCACCTTCTTCACTATCCTGCTTTCCTATGGGGTCATATTACA 468
228 THC2350529 THC2350529 AAGTACTGCTTTTAGGAATGAAATCTAATGGTGGCCTTGTCCCTGGTGCGGGCATAGTAT 469
229 A_24_P927553 A_24_P927553 TTCTTCCTGGTCTTTCTCACACACTGCTGTAACCTCCCTCACATTTCTTATTAGAAGGTT 470
230 BI830189 BI830189 GGTTCAGCATGTTTAGGGCGAGGGTGTGCTCCATGGGGACCTGCCTTCTGATCCGTTTTA 471
231 CLCN1 NM_000083 TCCATCTTCCAGTCCCTGCTTCACTGCTTGCTGGGCAGAGCTCGCCCCACAAAGAAGAAA 472
232 KRTHA5 NM_002280 CCTGGGTCAGGTTTTCCTTCTAGGTGCTGTTCCGGTGGATTCTGAAATGCAGTAGAGGGC 473
233 THC2339964 THC2339964 TGTGTGTGTCCTAGCCTTGTTCCCAAAGATGTCAGAGGTGATCTGCATTCAGACATCTTA 474
234 A_32_P73184 A_32_P73184 ATAAGAATCCAAATGTGGATATTTCAGGTTGCTCCAGGGGGTCTGATGGCCAGTGTGACA 475
235 SCC-112 NM_015200 TGCATTGATAGGGACCTTTGTCTCTTCCTCCCTTTGATTAATTGCCCGGCATCACAGTTT 476
236 KCNC3 NM_004977 GGGGGTGCTGACCATCGCCATGCCTGTGCCCGTCATTGTCAACAACTTTGGCATGTACTA 477
237 FAM54B NM_019557 TAGTTTAAAGGGTAAGAGAGAAGTTGTTTCTGGTTTTTCCTTGCCCCTGTGTGAAAATAG 478
238 POLL NM_013274 ACCGGCGCCTGGACATCATCGTGGTGCCCTATAGCGAGTTTGCCTGTGCCCTGCTCTACT 479
239 PANX3 NM_052959 GCAAGAAAGCTTGTGGAAAGTCTCTCTCCTTCCTCATAAGACATGCACACTAATACACAT 480
240 THC2287766 THC2287766 TCCAACCCCTCTTGTAGTTTTTTCTTCCTCACTGTCACTGTAACCATAAAAATACTCAGT 481
241 RASGRP4 NM_052949 CCTGACAGATGCTTGTCTGATGCCTGAATGTCTCCCTACCCATGCCCACGGCACAGGATA 482
242 ALS2CL NM_147129 CTGTTGGGCCTCAGTTTCTCTTCCCCACACAGTTTATCTTCCGTCACATTGTGCCGGGTG 483
243 ARL3 NM_004311 TTTCCTTCTCGGGGACCAGTTarTACTTCCTTTTATTTTTAGCTCTGCACTCCATGTGGT 484
244 CYP21A2 NM_000500 AGAACTCGAAGCCCTTCCAGTGGTACCAGCTCACTCCCTGGGAAAGGGGTTGTCAAGAGA 485
245 THC2438980 THC2438980 TATATAAAATATATCCTAGGCCCCTTTCAGTCATCACATTCTTAACTTTTAGGAAGAGCC 486
246 LOC143903 AK096925 ccTTCCTTCTGTCCCAGTTAGAATACGTCAATGTCACATCTAACACAC 11 1 c ri c ΓTAG A 487
247 WBSCR17 NM_022479 AGGGCTGAAGCAAAGGCTGCCAGGACCCTTGAAGATGCTTTTGGCTCACCTCATTTCACC 488
248 KCNAB2 NM_003636 CGGGGGTCTCCACTCAGTCCTGCTGCCTGCTTCACCAGAAGCAGCCCTGTGAGTGTGGGG 489
249 OR4X2 NM_001004727 GGAGGCTTCATGCATTCCTTTGCACAAATCCTTCTCATCTTCCACCTGCTCTTCTGTGGC 490
250 KCNQ5 NM_019842 AGCATTTGTAAGGCAGGAGAAAGTACAGATGCCCTCAGCTTGCCTCATGTCAAACTGAAA 491
251 GPDl NM_005276 TCATGCCACCACATTCGCCAGAAATGCAGTTGCCCTGTCCCTCTCCAGATGTGGGGCTTT 492
252 NP109393 NP109393 CAACCTCTCCTTCTTGGACCTCTGTTTCACCACGAGTTGTGTTCCCCAAATGCTGGCCAA 493
253 GHSR NM_004122 TGTGGGTGTCCAGCATCTTCTTCTTCCTTCCTGTCTTCTGTCTCACGGTCCTCTACAGTC 494
254 LOC56964 NM_020212 GGCCTGCCCACTCCTGGAAAATATCTCAAAAAATTGTACCATTCCTCAAAGGGACTTGGA 495
255 MEGF10 NM_032446 GCTGTGGCTGTAAAAATGATGCAGTCTGCTCTCCTGTGGACGGGTCTTGTACTTGCAAGG 496
256 AK025430 AK025430 GCCCACCTGTCACATCTTTTTGTTTCTTCTATACTGCCTTATTTTGTAGAAAGTAGCTAT 497
257 KRTAP19-1 AJ457067 ACCTCCAAATGTGTTTGGTCCTGTCCCGTGCTTTCATTCCAAAAATCCATTCTATTGCCT 498
258 ENST00000330284 ENST00000330284 TTTTTTGATGTTTCTAAAGGTTCTCTTATCTATGGTTTTCCTCCTTCCTCCTTCTCGTCC 499
259 SULT4A1 NM_014351 CTTTAGAACGTGCAGCCTCTCCATGTCTGATTACAAACAGTCTCCACATTGCAGTTCCAA 500
260 LASS5 NM_147190 ATTGGGCTTATCTCCTTCTCCTACATCAACAATATGGTTCGAGTGGGAACTCTGATCATG 501
261 THC2361983 THC2361983 TCTTCCTCGAGCTTCCTTATCTCCTCCTGTTGAATCATTTTAAGATGCTCGAACTTGTCC 502
262 TAAR2 NM_014626 AATAGAGGGCTATGACATCTTGGTTGCTTGTTCCAGTTCCTGCCCAGTGATGTTCAACAA 503
263 FLJ12681 NM_022773 GGAGTTGAGACAATGGCAATCCTGACACCTTCCTCCACTACAGCCCTGACCATAGACCCA 504
264 MFN1 NM_033540 TGCAGTCACCAAGTAAAACAACAAATAGCTACCACTTTTGCCCGCCTGTGCCAACAAGTT 505
265 A_32_P139012 A_32_P139012 GAAAGGACTTGCTTTACAGGTGGTCCCTCTTCTGGCTGGGTTTCAGTTAATTCTGAATTA 506
266 THC2334567 THC2334567 GGAGAATAACTCATTCTGGTTCTTGTTTCCCCTTTTCCACATAAAAGTATATTTGTCTTG 507
267 A_23_P77245 A_23_P77245 ACATGGTGGTCCTTCTCACCATGGTGTTCTTGTCTCCACAGCTCTTTGAATCACTGAATT 508
268 TUG1 NR_002323 TGACTACCTTCCCTGTGCTATTCCATCAGCCTACAGACCTGGTACCTGGATTTTTGCCCG 509
269 CR617352 CR617352 CTGGGGTCTGGGACTTGCTCCTTTGGCGAATTGAGAAGTAAGGATATAGAAGGCATTTTA 510
270 FLJ31164 NM_145003 CTGCTTGTCATCATCATCATCATCGCCACCTCTGTCCGAAAGTGATGCTACCCGTGCATC 511
271 LOC146853 NM_145272 TGACAAACCTGCCTCTTTCCCTGAAACCATTCAATAAAACCTCTGATCAAAGTGCCAAAA 512
272 THC2407386 THC2407386 GGGCTGTTGCTGATACGTGGTTGACTGTCATTGCTAGACTGTGGCTTTACCGGGGGCATT 513
273 MFSD2 AK027396 CCCAGAGATGACCCCAGAAATCTGGGAAACTCCCCTTGGTTCCCCATCTCTCATCCCCTA 514
274 TAS1R1 NM_138697 CTTCCTCTACAATGGCCTCCTCTCCATCAGTGCCTTTGCCTGCAGCTACCTGGGTAAGGA 515
275 RASL12 NM_016563 GGTTTGGGTGCCTGTTTTTCGAGGTCTCTGCCTGTCTGGACTTTGAGCACGTGCAGCATG 516
276 LOC375759 NM_199350 ACACATGGGCCAGGCAATGAGGAGCGTCCTTCCTTGCCCCACAAATAAAGGCACATCCCA 517
277 HXMA NM_182617 TGCGCAGTGAGGCGTCTAGGAGACATTCATTTGGATTCCCCTCTTCTTTCTCTTTCTTTT 518
278 UNC13D AK024474 TTCCTCACTGAGGTTGAGAGGTGTGTTGGATAGGACTGATCCCACCTGCCCCTTGCTGGT 519
279 THC2408500 THC2408500 AGTGTTGTGTACTGGGATGTTGGGGCACTCGGCTCTAGGGTCTGGGATAACAGGTGACTT 520
280 PRPS1 NM_002764 AGATTAACTGCTGGACCTCCTACCTGCATTATCTCATTCTGGCTTCCTTGATAATTCTGT 521
281 ELN NM_000501 TGGACTTGGAGTTGGTGCTGGTGTTCCTGGACTTGGAGTTGGTGCTGGTGTTCCTGGCTT 522
282 RPS6KL1 NM_031464 GGTGGTGTCAGCAAACTCAAGTCCCATCCCTTTTTCAGTACCATCCAATGGAGCAAGCTG 523
283 TMEM30B NM_001017970 AACATCCTTGCAAACTCTTCCCACCTCCTTCACGACACTGAGTTGCCATGTGAGGTTCTT 524
284 A_24_P67632 A_24_P67632 CTCTTCCCTCTTGGATGTCTATAGCTTCTGTCTCAAGTCTGCTAAGTCCCCAAAGCAAAA 525
285 HRH2 NM_022304 GGATGGGCTGGTCACCTTCTACCTCCCGCTACTGATCATGTGCATCACCTACTACCGCAT 526
286 AF161342 AF161342 AGCTGGGGTGGGCACAAAGCAGGTCACATGGGGGCTGCCTGGGGCCAGGAGGATAAGGCA 527
287 PDE4C NM_000923 CATGTGGCTCCTGCTTCACTTTCCCACCCATTTAGGGAGACAATCAAGCTCTTAGTTATA 528
288 MLL5 NM_018682 AGGACCAAACAGTATTCCAACACCTACTGCTTCAGGGTTCTGTCCTCATCCTGGCTCTGT 529
289 PSORS1C2 NM_014069 CCTCTTCCTCATTCCCTCGGTTTTATTCTGAACCCGTAAGGTGGTGTTCTCAATATTTCC 530
290 A_24_P933927 A_24_P933927 GAGTACACCATGTACAGGATCTCCAGCATCTCCAAGATCTTTCCTGTCCTCATGCTGTAC 531
291 BF909544 BF909544 TCCTGTGCTGAGTACAGTAACTCATAAATCCCACTCACTCTGTCCCCAAGCATGAATGAA 532
292 BCL7A NM_020993 CTTGTGCCCTTGAGGTGACCTCTGGCATGTATCCTGGTGGTTCTTACATCCCCCTCTGCA 533
293 FLJ11000 NM_018295 CCCAGGCCTTACTCATCCTCTTGCTTATAGCCATGGCTGTGTTCCCTCTGAGGGCTGAGA 534
294 TNNI1 NM_003281 TGTCTCACTGCCACCATGCCGGAAGTCGAGAGAAAACCCAAGATCACTGCCTCCCGCAAA 535
295 APCS NM_001639 GTGCCTACAGCCTCTTCTCCTACAATACCCAAGGCAGGGATAATGAGCTACTAGTTTATA 536
296 ARIH1 NM_005744 AAGAAGGCATGGGTCACACTATTTCGTGTCCTGCTCATGGTTGTGATATCTTAGTGGATG 537
297 AI823874 AI823874 ACATCTTGGACATCCTGCCATGTTATCTCTGTATCTTCTTGGGGATTCAGCACTTCCTCA 538
298 L12052 L12052 GCCTTTTCCTCACACTTTCTACATCCAAATACAGCTGTTTATAACCAGTTATCTGCAGTA 539
299 CENTB5 NM_030649 CTGGTGCAGGCCGTGCTAGGGGGCTCCTTGATCGTCTGTGAGTTCCTGCTGCAAAACGGA 540
300 THC2444653 THC2444653 TCAGTGGTTGAGGACTAGAAAAACCAAAGAGCCTCGTCAAGTCCTGCCTGCTAACTCCTC 541
Table 3
Overview of the characteristics of the samples.
Figure imgf000052_0001
Table 4
Figure imgf000053_0001
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
Figure imgf000059_0001
Figure imgf000060_0001
Figure imgf000061_0001
Figure imgf000062_0001

Claims
1. A method for typing an RNA sample of an individual suffering from colorectal cancer or suspected of suffering therefrom, the method comprising:
a. providing an RNA sample that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells;
b. determining RNA levels for a set of genes in said RNA sample; and
c. typing said RNA sample on the basis of the RNA levels determined for said set of genes;
wherein said set of genes comprises at least five of the genes listed in Table 1 and/or Table 4.
2. The method of claim 1, wherein said set of genes comprises at least ten of the genes listed in Table 1 and/or Table 4.
3. The method of claim 1, whereby said set of genes comprises five of the genes listed in Table 1 and/or Table 4, which genes are rank-ordered 1-5.
4. Method according to any of claims 1-3, whereby said RNA levels are determined by microarray analysis.
5. Method according to any of the previous claims, further comprising normalizing the determined RNA levels of said set of genes in said sample.
6. Method according to claim 5, whereby the RNA levels of at least five of the genes listed in Table 2 are used for normalizing the determined RNA levels of the set of at least five of the genes listed in Table 1 and/or Table 4 in said sample.
7. Method according to any of the previous claims, further comprising multiplying each of said determined values with a predetermined constant for said gene to obtain a weighted value for the relative RNA level of said gene, and thereby a set of weighted values for said set of genes, said method further comprising typing said sample on the basis of said set of weighted values.
8. Method according to any of the previous claims, whereby said colorectal cancer comprises a TNM stage II or TNM stage III cancer according to the TNM Staging System.
9. Method according to any of the previous claims, whereby said typing differentiates cancer cells with a low metastasizing potential and cancer cells with a high metastatic potential.
10. A method of classifying an individual suffering from colorectal cancer, comprising: classifying said individual as having a poor prognosis or a good prognosis by a method comprising:
(a) providing an RNA sample from said individual that is prepared from a tissue sample from said individual, said tissue sample comprising colorectal cancer cells or suspected to comprise colorectal cancer cells;
(b) determining a level of RNA for a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4 in said sample;
(c) determining a similarity value between a level of expression from the set of genes in said individual and a level of expression from said set of genes in a patient having no recurrent disease within five years of initial diagnosis; and
(d) classifying said individual as having a poor prognosis if said similarity value is below a first similarity threshold value, and classifying said individual as having a good prognosis if said similarity value exceeds said first similarity threshold value.
11. The method of claim 10, whereby the determined level of RNA for said set of genes is normalized.
12. A method of assigning treatment to an individual suffering from colorectal cancer, comprising:
(a) classifying said individual as having a good prognosis or a poor prognosis according to claim 10 or claim 11; and
(b) assigning chemotherapy if said individual is classified as having said poor prognosis.
13. Method according to any of the previous claims, whereby said colorectal cancer comprises a colon cancer.
14. A nucleotide microarray, comprising a total of between 100 and 12,000 nucleic acid molecules, at least five of said nucleic acid molecules being able to hybridize to a set of genes comprising at least five of the genes listed in Table 1 and/or Table 4.
15. The microarray of claim 14, further comprising at least five nucleic acid molecules being able to hybridize to a set of genes comprising at least five of the genes listed in Table 2.
16. The use of an array according to claim 14 or claim 15 to differentiate colorectal cancer cells with a low versus a high metastasizing potential.
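
The typing steps recited in claims 5 to 7 and 10 to 12 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The gene names (GENE_A, REF_1 and so on), the weights, the template profile, and the threshold of 0.5 are hypothetical placeholders. Normalizing by subtracting the mean level of the reference genes and using cosine similarity as the similarity value are assumptions made for illustration only; the claims do not prescribe either choice.

```python
import math


def normalize(levels, reference_genes):
    """Claims 5-6 (assumed form): subtract the mean level of the
    normalization genes from every signature gene level."""
    ref_values = [levels[g] for g in reference_genes]
    offset = sum(ref_values) / len(ref_values)
    return {g: v - offset for g, v in levels.items() if g not in reference_genes}


def weighted_values(levels, weights):
    """Claim 7: multiply each determined value by a predetermined
    constant for that gene."""
    return {g: levels[g] * w for g, w in weights.items()}


def similarity(profile, template):
    """Claim 10(c): similarity value between a sample profile and a
    good-prognosis template. Cosine similarity is an assumption."""
    genes = sorted(template)
    a = [profile[g] for g in genes]
    b = [template[g] for g in genes]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(profile, good_template, threshold):
    """Claim 10(d): good prognosis if the similarity value exceeds the
    threshold, poor prognosis otherwise. The claim does not address
    equality; treating it as poor is an assumption."""
    if similarity(profile, good_template) > threshold:
        return "good"
    return "poor"


def assign_chemotherapy(prognosis):
    """Claim 12(b): chemotherapy is assigned for a poor prognosis."""
    return prognosis == "poor"


if __name__ == "__main__":
    # Hypothetical log-scale RNA levels: five signature genes, two reference genes.
    raw = {"GENE_A": 8.0, "GENE_B": 6.0, "GENE_C": 7.0, "GENE_D": 5.0,
           "GENE_E": 9.0, "REF_1": 6.5, "REF_2": 7.5}
    weights = {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 1.0,
               "GENE_D": 1.5, "GENE_E": 1.0}
    template = {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_C": 0.0,
                "GENE_D": -2.0, "GENE_E": 2.0}

    profile = weighted_values(normalize(raw, ["REF_1", "REF_2"]), weights)
    prognosis = classify(profile, template, threshold=0.5)
    print(prognosis, assign_chemotherapy(prognosis))
```

In practice the set would contain at least five genes from Table 1 and/or Table 4, the normalization genes would come from Table 2, and the threshold would be set on a training cohort; none of those values are reproduced here.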
PCT/NL2008/050426 2007-06-28 2008-06-27 A method of typing a sample comprising colorectal cancer cells WO2009002175A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07111327.8 2007-06-28
EP07111327 2007-06-28

Publications (1)

Publication Number Publication Date
WO2009002175A1 (en) 2008-12-31

Family

ID=38941903

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2008/050426 WO2009002175A1 (en) 2007-06-28 2008-06-27 A method of typing a sample comprising colorectal cancer cells

Country Status (1)

Country Link
WO (1) WO2009002175A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120157341A1 (en) * 2009-08-24 2012-06-21 Shuichi Kaneko Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
US20130296191A1 (en) * 2010-12-23 2013-11-07 Paul Roepman Methods and Means for Molecular Classification of Colorectal Cancers
WO2015044495A1 (en) * 2013-09-26 2015-04-02 Servicio Andaluz De Salud Method for predicting the response to chemotherapy treatment in patients suffering from colorectal cancer
EP2622099B1 (en) * 2010-09-28 2017-11-08 Agendia N.V. Methods and means for typing a sample comprising cancer cells based on oncogenic signal transduction pathways
CN109988708A (en) * 2019-02-01 2019-07-09 碳逻辑生物科技(中山)有限公司 A kind of system for carrying out parting to the patient with colorectal cancer

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020142981A1 (en) * 2000-06-14 2002-10-03 Horne Darci T. Gene expression profiles in liver cancer
WO2004063709A2 (en) * 2003-01-08 2004-07-29 Bristol-Myers Squibb Company Biomarkers and methods for determining sensitivity to epidermal growth factor receptor modulators

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
BERTUCCI FRANCOIS ET AL: "Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters", ONCOGENE, BASINGSTOKE, HANTS, GB, vol. 23, no. 7, 19 February 2004 (2004-02-19), pages 1377 - 1391, XP002414422, ISSN: 0950-9232 *
CHEN Y ET AL: "Identification of differently expressed genes in human colorectal adenocarcinoma", WORLD JOURNAL OF GASTROENTEROLOGY 21 FEB 2006 CHINA, vol. 12, no. 7, 21 February 2006 (2006-02-21), pages 1025 - 1032, XP009095009, ISSN: 1007-9327 *
CHOI MOON-CHANG ET AL: "AKAP12/Gravin is inactivated by epigenetic mechanism in human gastric carcinoma and shows growth suppressor activity", ONCOGENE, vol. 23, no. 42, 16 September 2004 (2004-09-16), pages 7095 - 7103, XP002466070, ISSN: 0950-9232 *
CRONER ROLAND S ET AL: "Microarray versus conventional prediction of lymph node metastasis in colorectal carcinoma", CANCER, AMERICAN CANCER SOCIETY, PHILADELPHIA, PA, US, vol. 104, no. 2, 10 June 2005 (2005-06-10), pages 395 - 404, XP009088697, ISSN: 0008-543X *
DATABASE EBI [online] EBI; 9 August 2001 (2001-08-09), ROSEN CA ET AL: "Nucleic acids encoding human immune/hematopoietic antigen polypeptides, useful for preventing, diagnosing and/or treating cancers and metastasis", XP002500538, Database accession no. AAK61866 *
DATABASE SRS [online] EBI; 5 October 2000 (2000-10-05), QIAN B ET AL: "Homo sapiens cDNA clone:GLCGFC09, 5'end, expressed in corresponding non cancerous liver tissue.", XP002500537, Database accession no. AV719591 *
EVTIMOVA V ET AL: "IDENTIFICATION OF CRASH, A GENE DEREGULATED IN GYNECOLOGICAL TUMORS", INTERNATIONAL JOURNAL OF ONCOLOGY, EDITORIAL ACADEMY OF THE INTERNATIONAL JOURNAL OF ONCOLOGY,, GR, vol. 24, no. 1, January 2004 (2004-01-01), pages 33 - 41, XP009038071, ISSN: 1019-6439 *
FREDERIKSEN CASPER MOLLER ET AL: "Classification of Dukes' B and C colorectal cancers using expression arrays", JOURNAL OF CANCER RESEARCH AND CLINICAL ONCOLOGY, SPRINGER INTERNATIONAL, BERLIN, DE, vol. 129, no. 5, 15 May 2003 (2003-05-15), pages 263 - 271, XP009088733, ISSN: 0171-5216 *
FRIEDERICHS JAN ET AL: "Gene expression profiles of different clinical stages of colorectal carcinoma: toward a molecular genetic understanding of tumor progression", INTERNATIONAL JOURNAL OF COLORECTAL DISEASE, vol. 20, no. 5, September 2005 (2005-09-01), pages 391 - 402, XP002466147, ISSN: 0179-1958 *
GROENE JOERN ET AL: "Transcriptional census of 36 microdissected colorectal cancers yields a gene signature to distinguish UICCII and III", INTERNATIONAL JOURNAL OF CANCER, vol. 119, no. 8, October 2006 (2006-10-01), pages 1829 - 1836, XP002466068, ISSN: 0020-7136 *
HABANO WATARU ET AL: "Novel approach for detecting global epigenetic alterations associated with tumor cell aneuploidy", INTERNATIONAL JOURNAL OF CANCER, vol. 121, no. 7, October 2007 (2007-10-01), pages 1487 - 1493, XP002466069, ISSN: 0020-7136 *
IRIZARRY RAFAEL A ET AL: "Exploration, normalization, and summaries of high density oligonucleotide array probe level data.", BIOSTATISTICS (OXFORD, ENGLAND) APR 2003, vol. 4, no. 2, April 2003 (2003-04-01), pages 249 - 264, XP002466228, ISSN: 1465-4644 *
KIDD ELIZABETH A ET AL: "Variance in the expression of 5-Fluorouracil pathway genes in colorectal cancer", CLINICAL CANCER RESEARCH, THE AMERICAN ASSOCIATION FOR CANCER RESEARCH, US, vol. 11, no. 7, 1 April 2005 (2005-04-01), pages 2612 - 2619, XP002392018, ISSN: 1078-0432 *
VAN GEELEN C M M ET AL: "Lessons from TRAIL-resistance mechanisms in colorectal cancer cells: paving the road to patient-tailored therapy", DRUG RESISTANCE UPDATES, CHURCHILL LIVINGSTONE, EDINBURGH, GB, vol. 7, no. 6, 1 December 2004 (2004-12-01), pages 345 - 358, XP004777707, ISSN: 1368-7646 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120157341A1 (en) * 2009-08-24 2012-06-21 Shuichi Kaneko Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
US8932990B2 (en) * 2009-08-24 2015-01-13 National University Corporation Kanazawa University Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
US9441276B2 (en) 2009-08-24 2016-09-13 National University Corporation Kanazawa University Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
US9512490B2 (en) 2009-08-24 2016-12-06 Kubix Inc. Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
US9512491B2 (en) 2009-08-24 2016-12-06 Kubix Inc. Detection of digestive organ cancer, gastric cancer, colorectal cancer, pancreatic cancer, and biliary tract cancer by gene expression profiling
EP2622099B1 (en) * 2010-09-28 2017-11-08 Agendia N.V. Methods and means for typing a sample comprising cancer cells based on oncogenic signal transduction pathways
US20130296191A1 (en) * 2010-12-23 2013-11-07 Paul Roepman Methods and Means for Molecular Classification of Colorectal Cancers
US10036070B2 (en) * 2010-12-23 2018-07-31 Agendia N.V. Methods and means for molecular classification of colorectal cancers
WO2015044495A1 (en) * 2013-09-26 2015-04-02 Servicio Andaluz De Salud Method for predicting the response to chemotherapy treatment in patients suffering from colorectal cancer
CN109988708A (en) * 2019-02-01 2019-07-09 碳逻辑生物科技(中山)有限公司 A kind of system for carrying out parting to the patient with colorectal cancer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08766849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2010)

122 Ep: pct application non-entry in european phase

Ref document number: 08766849

Country of ref document: EP

Kind code of ref document: A1