US20150368724A1 - Methods and materials for classification of tissue of origin of tumor samples - Google Patents

Publication number
US20150368724A1
Authority
US
United States
Prior art keywords
mir
hsa
seq
group
nos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/746,487
Inventor
Ranit Aharonov
Nitzan Rosenfeld
Shai Rosenwald
Nir DROMI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rosetta Genomics Ltd
Original Assignee
Rosetta Genomics Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/IL2008/000396 (WO2008117278A2)
Priority claimed from PCT/IL2009/001212 (WO2010073248A2)
Priority claimed from US13/167,489 (US8802599B2)
Priority claimed from PCT/IL2011/000849 (WO2012070037A2)
Application filed by Rosetta Genomics Ltd
Priority to US14/746,487 (US20150368724A1)
Publication of US20150368724A1
Assigned to ROSETTA GENOMICS LTD. Assignment of assignors' interest (see document for details). Assignors: ROSENFELD, NITZAN; DROMI, NIR; ROSENWALD, SHAI; AHARONOV, RANIT
Priority to US15/909,145 (US20190032142A1)
Legal status: Abandoned

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • the present invention relates to methods and materials for classification of cancers and the identification of their tissue of origin. Specifically, the invention relates to microRNA molecules associated with specific cancers, as well as various nucleic acid molecules relating thereto or derived therefrom.
  • microRNAs are a novel class of non-coding, regulatory RNA genes [1-3] which are involved in oncogenesis [4] and show remarkable tissue specificity [5-7]. They have emerged as highly tissue-specific biomarkers [2,5,6] postulated to play important roles in encoding developmental decisions of differentiation. Various studies have tied microRNAs to the development of specific malignancies [4]. MicroRNAs are also stable in tissue, stored frozen or as formalin-fixed, paraffin-embedded (FFPE) samples, and in serum.
  • FFPE: formalin-fixed, paraffin-embedded
  • the patient may undergo a wide range of costly, time consuming, and at times inefficient tests, including physical examination of the patient, histopathology analysis of the biopsy, imaging methods such as chest X-ray, CT and PET scans, in order to identify the primary origin of the metastasis.
  • CUP: cancer of unknown primary
  • the present invention provides specific nucleic acid sequences for use in the identification, classification and diagnosis of specific cancers and tumor tissue of origin.
  • the nucleic acid sequences can also be used as prognostic markers for prognostic evaluation and determination of appropriate treatment of a subject based on the abundance of the nucleic acid sequences in a biological sample.
  • the present invention provides a method for accurate identification of tumor tissue origin.
  • microRNA expression levels were measured in 1300 primary and metastatic tumor paraffin-embedded samples.
  • microRNAs were profiled using a custom array platform. Using the custom array platform, a set of over 300 microRNAs was identified for the normalization of the array data and 65 microRNAs were used for the accurate classification of over 40 different tumor types. The accuracy of the assay exceeds 85%.
  • the findings demonstrate the utility of microRNAs as novel biomarkers for the tissue of origin of a metastatic tumor.
  • the classifier has wide biological as well as diagnostic applications.
  • the present invention provides a method of identifying a tissue of origin of a cancer, the method comprising obtaining a biological sample from a subject, measuring the relative abundance in said sample of nucleic acid sequences selected from the group consisting of SEQ ID NOS: 1-390, any combinations thereof, or a sequence having at least about 80% identity thereto; and comparing the measurement to a reference abundance of the nucleic acid by using a classifier algorithm, wherein the relative abundance of said nucleic acid sequences allows for the identification of the tissue of origin of said sample.
  • the classifier algorithm is selected from the group consisting of decision tree classifier, K-nearest neighbor classifier (KNN), logistic regression classifier, nearest neighbor classifier, neural network classifier, Gaussian mixture model (GMM), Support Vector Machine (SVM) classifier, nearest centroid classifier, linear regression classifier and random forest classifier.
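As a minimal sketch of how one of the listed classifiers might compare a measured abundance profile to reference abundances, the following uses a K-nearest neighbor vote. The marker names, abundance values, and tissue labels are hypothetical placeholders introduced for illustration; they are not data from the application, and the application does not specify this implementation.

```python
from collections import Counter
import math

# Hypothetical reference profiles: (abundance per marker, tissue label).
# All values are invented for illustration only.
REFERENCE = [
    ({"marker_a": 9.1, "marker_b": 2.0}, "liver"),
    ({"marker_a": 8.7, "marker_b": 2.4}, "liver"),
    ({"marker_a": 3.2, "marker_b": 7.9}, "brain"),
    ({"marker_a": 2.8, "marker_b": 8.3}, "brain"),
    ({"marker_a": 3.0, "marker_b": 7.5}, "brain"),
]


def distance(p, q):
    """Euclidean distance between two profiles over the markers of p."""
    return math.sqrt(sum((p[m] - q[m]) ** 2 for m in p))


def knn_classify(sample, reference=REFERENCE, k=3):
    """Return the majority tissue label among the k nearest reference profiles."""
    nearest = sorted(reference, key=lambda item: distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify({"marker_a": 8.9, "marker_b": 2.1}))
```

An odd value of k is used here so that a two-label vote cannot tie; a real assay would also need the normalization step described earlier before distances are meaningful.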
  • the sample is obtained from a subject with cancer of unknown primary (CUP), with a primary cancer or with a metastatic cancer.
  • the cancer is selected from the group consisting of adrenocortical carcinoma; anus or skin squamous cell carcinoma; biliary tract adenocarcinoma; Ewing sarcoma; gastrointestinal stromal tumor (GIST); gastrointestinal tract carcinoid; renal cell carcinoma: chromophobe, clear cell and papillary; pancreatic islet cell tumor; pheochromocytoma; urothelial cell carcinoma (TCC); lung, head & neck, or esophagus squamous cell carcinoma (SCC); brain: astrocytic tumor, oligodendroglioma; breast adenocarcinoma; uterine cervix squamous cell carcinoma; chondrosarcoma; germ cell cancer; sarcoma; colorectal adenocarcinoma; liposarcoma; hepatocellular carcinoma (HCC); lung large cell or adenocarcinoma;
  • the invention further provides a method for identifying a cancer of germ cell origin, comprising measuring the relative abundance of SEQ ID NO: 55 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of germ cell origin.
  • the germ cell is selected from the group consisting of an ovarian primitive cell and a testis cell.
  • the group of nucleic acid further consists of SEQ ID NOS: 29, 62 or a sequence having at least about 80% identity thereto, and the abundance of said nucleic acid sequence is indicative of a testis cell cancer origin selected from the group consisting of seminomatous testicular germ cell and non-seminomatous testicular germ cell.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of biliary tract adenocarcinoma and hepatocellular carcinoma, comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 9, 29 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of biliary tract adenocarcinoma and hepatocellular carcinoma.
  • the invention further provides a method for identifying a cancer of brain origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 156, 66, 68 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of brain origin.
  • the group of nucleic acid further consists of SEQ ID NOS: 40, 60 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a brain cancer origin selected from the group consisting of oligodendroglioma and astrocytoma.
  • the invention further provides a method for identifying a cancer of prostate adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of prostate adenocarcinoma origin.
  • the invention further provides a method for identifying a cancer of breast adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 50, 11, 148, 4, 49, 67 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of breast adenocarcinoma origin.
  • the invention further provides a method for identifying a cancer of ovarian carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 4, 39, 50, 11, 148, 49, 67, 57, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of an ovarian carcinoma origin.
  • the invention further provides a method for identifying a cancer of thyroid carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of thyroid carcinoma origin.
  • the group of nucleic acid further consists of SEQ ID NOS: 17, 34 or a sequence having at least about 80% identity thereto, and wherein said thyroid carcinoma origin is selected from the group consisting of follicular and papillary.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of lung large cell and lung adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4, 49, 67, 57, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of lung large cell and lung adenocarcinoma.
  • the invention further provides a method for identifying a cancer of thymic carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of a thymic carcinoma origin.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of a urothelial cell carcinoma and squamous cell carcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3, 34, 69, 24, 44 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of urothelial cell carcinoma and squamous cell carcinoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 1, 5, 54 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of squamous cell carcinoma origin selected from the group consisting of uterine cervix squamous cell carcinoma and non-uterine cervix squamous cell carcinoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 11, 23 or a sequence having at least about 80% identity thereto in said sample, and wherein the abundance of said nucleic acid sequence is indicative of a non-uterine cervix squamous cell carcinoma origin selected from the group consisting of anus or skin squamous cell carcinoma; and lung, head & neck, and esophagus squamous cell carcinoma.
  • the invention further provides a method for identifying a cancer origin selected from melanoma and lymphoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 47, 50 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of melanoma and lymphoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 35, 48 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a lymphoma cancer origin selected from the group consisting of B-cell lymphoma and T-cell lymphoma.
  • the invention further provides a method for identifying a cancer of lung small cell carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of lung small cell carcinoma origin.
  • the invention further provides a method for identifying a cancer of medullary thyroid carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of medullary thyroid carcinoma origin.
  • the invention further provides a method for identifying a cancer of lung carcinoid origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53, 37 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of lung carcinoid origin.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53, 37, 34, 18 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of gastric and esophageal adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of gastric and esophageal adenocarcinoma.
  • the invention further provides a method for identifying a cancer of colorectal adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20, 43 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of colorectal adenocarcinoma origin.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of pancreatic adenocarcinoma and biliary tract adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20, 43, 51, 49, 16 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of pancreatic adenocarcinoma and biliary tract adenocarcinoma.
  • the invention further provides a method for identifying a cancer of renal cell carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of renal cell carcinoma origin.
  • the group of nucleic acid further consists of SEQ ID NOS: 36, 147 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a chromophobe renal cell carcinoma origin.
  • the group of nucleic acid further consists of SEQ ID NOS: 49, 9 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a renal cell carcinoma origin selected from the group consisting of clear cell and papillary.
  • the invention further provides a method for identifying a cancer of pheochromocytoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of pheochromocytoma origin.
  • the invention further provides a method for identifying a cancer of adrenocortical origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of adrenocortical origin.
  • the invention further provides a method for identifying a cancer of gastrointestinal stromal tumor origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14, 45 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of gastrointestinal stromal tumor origin.
  • the invention further provides a method for identifying a cancer origin selected from the group consisting of pleural mesothelioma and sarcoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14, 45, 35, 10, 5 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of pleural mesothelioma and sarcoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is synovial sarcoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is chondrosarcoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is liposarcoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 25, 49 or a sequence having at least about 80% identity thereto and wherein said sarcoma is selected from the group consisting of Ewing sarcoma and osteosarcoma.
  • the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 59, 39, 33 or a sequence having at least about 80% identity thereto and wherein said sarcoma is selected from the group consisting of rhabdomyosarcoma; and malignant fibrous histiocytoma and fibrosarcoma.
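The sarcoma sub-type groups listed above are largely nested: each group repeats most or all of the previous one and appends further SEQ ID NOS. A short sketch can make the increments explicit. The SEQ ID NOS are copied from the paragraphs above; the dictionary keys and helper names are shorthand introduced here for illustration.

```python
# SEQ ID NOS copied from the sarcoma paragraphs above; keys are shorthand labels.
GROUPS = {
    "synovial sarcoma": [3, 40, 15],
    "chondrosarcoma": [3, 40, 15, 12, 58],
    "liposarcoma": [3, 40, 15, 12, 58, 36, 26],
    "Ewing sarcoma / osteosarcoma": [3, 40, 15, 12, 58, 36, 26, 21, 25, 49],
    "rhabdomyosarcoma / MFH and fibrosarcoma": [3, 40, 15, 12, 58, 36, 26, 21, 59, 39, 33],
}


def shared_prefix(a, b):
    """Return the longest common leading run of two lists."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


def increments(groups):
    """For each group, list the SEQ ID NOS that follow the prefix shared with the previous group."""
    result = {}
    previous = []
    for name, ids in groups.items():
        prefix = shared_prefix(previous, ids)
        result[name] = ids[len(prefix):]
        previous = ids
    return result


if __name__ == "__main__":
    for name, added in increments(GROUPS).items():
        print(f"{name}: adds {added}")
```

Note that the last two groups share only the first eight entries, so the final group diverges from the one before it rather than strictly extending it.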
  • the present invention provides a method of distinguishing between cancers of different origins, said method comprising:
  • the measurement of the relative abundance of SEQ ID NOS: 372, 233, 55, 200, 201 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a germ-cell tumor and a cancer originating from the group consisting of non-germ-cell tumors.
  • the measurement of the relative abundance of SEQ ID NOS: 6, 30, 13 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from hepatobiliary tumors and a cancer originating from the group consisting of non-germ-cell non-hepatobiliary tumors.
  • the measurement of the relative abundance of SEQ ID NOS: 28, 29, 231, 9 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from liver tumors and a cancer originating from biliary-tract carcinomas.
  • the measurement of the relative abundance of SEQ ID NOS: 46, 5, 12, 30, 29, 28, 32, 13, 152, 49 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of tumors from an epithelial origin and a cancer originating from the group consisting of tumors from a non-epithelial origin.
  • the measurement of the relative abundance of SEQ ID NOS: 164, 168, 170, 16, 198, 50, 176, 186, 11, 158, 20, 155, 231, 4, 8, 46, 3, 2, 7 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of melanoma and lymphoma and a cancer originating from the group consisting of all other non-epithelial tumors.
  • the measurement of the relative abundance of SEQ ID NOS: 159, 66, 225, 187, 162, 161, 68, 232, 173, 11, 8, 174, 155, 231, 4, 182, 181, 37 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from brain tumors and a cancer originating from the group consisting of all non-brain, non-epithelial tumors.
  • the measurement of the relative abundance of SEQ ID NOS: 40, 208, 60, 153, 230, 228, 147, 34, 206, 35, 52, 25, 229, 161, 187, 179 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from astrocytoma and a cancer originating from oligodendroglioma.
  • measurement of the relative abundance of SEQ ID NOS: 56, 65, 25, 175, 152, 155, 32, 49, 35, 181 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of neuroendocrine tumors and a cancer originating from the group consisting of all non-neuroendocrine, epithelial tumors.
  • measurement of the relative abundance of SEQ ID NOS: 27, 177, 4, 32, 35 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of gastrointestinal epithelial tumors and a cancer originating from the group consisting of non-gastrointestinal epithelial tumors.
  • measurement of the relative abundance of SEQ ID NOS: 56, 199, 14, 15, 165, 231, 36, 154, 21, 49 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from prostate tumors and a cancer originating from the group consisting of all other non-gastrointestinal epithelial tumors.
  • measurement of the relative abundance of SEQ ID NOS: 222, 62, 29, 28, 211, 214, 227, 215, 218, 152, 216, 212, 224, 13, 194, 192, 221, 217, 205, 219, 32, 193, 223, 220, 210, 209, 213, 163, 30 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from seminoma and a cancer originating from the group consisting of non-seminoma testis-tumors.
  • measurement of the relative abundance of SEQ ID NOS: 42, 32, 36, 178, 243, 242, 49, 240, 57, 11, 46, 17, 47, 51, 7, 8, 154, 190, 157, 196, 197 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma and thymoma, and a cancer originating from the group consisting of non-gastrointestinal adenocarcinoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 56, 46, 25, 152, 50, 45, 191, 181, 179, 49, 32, 42, 184, 40, 147, 236, 57, 203, 36 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from breast adenocarcinoma, and a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma, thymomas and ovarian carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 253, 32, 4, 39, 10, 46, 5, 226, 2, 195, 32, 185, 11, 168, 184, 16, 242, 12, 237, 243, 250, 49, 246, 167 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from ovarian carcinoma, and a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma and thymomas.
  • measurement of the relative abundance of SEQ ID NOS: 11, 147, 17, 157, 40, 8, 49, 9, 191, 205, 207, 195, 51, 46, 45, 52, 234, 231, 21, 169, 43, 3, 196, 154, 390, 171, 255, 197, 190, 189, 39, 7, 48, 47, 32, 36, 4, 178, 37, 181, 25, 183, 182, 35, 240, 57, 242, 204, 236, 176, 158, 148, 206, 50, 20, 34, 186, 239, 251, 244, 24, 188, 172, 238 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from thyroid carcinoma, and a cancer originating from the group consisting of breast adenocarcinoma, lung large cell carcinoma, lung adenocarcinoma and ovarian carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 249, 180, 65, 235, 241, 248, 254, 247, 160, 243, 245, 252, 17, 49, 166, 225, 168, 34 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from follicular thyroid carcinoma and a cancer originating from papillary thyroid carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 32, 56, 50, 45, 25, 253, 152, 9, 46, 191, 178, 49, 40, 10, 147, 4, 36, 228, 236, 230, 189, 240, 67, 202, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from breast adenocarcinoma and a cancer originating from the group consisting of lung adenocarcinoma and ovarian carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 56, 11, 168, 16, 237, 21, 52, 12, 154, 279, 9, 39, 47, 23, 50, 167, 383, 34, 35, 388, 5, 359, 245, 254, 10, 240, 236, 202, 4, 25, 203, 231, 20, 158, 186, 258, 244, 172, 2, 235, 256, 28, 277, 296, 374, 153, 181 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung adenocarcinoma and a cancer originating from ovarian carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 161, 164, 22, 53, 285, 3, 152, 191, 154, 21, 206, 174, 19, 45, 171, 179, 8, 296, 284, 18, 51, 258, 49, 184, 35, 34, 37, 42, 228, 15, 14, 242, 230, 253, 36, 182, 293, 292, 4, 294, 297, 354, 377, 189, 30, 386, 249, 5, 274 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from thymic carcinoma and a cancer originating from the group consisting of transitional cell carcinoma and squamous cell carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 69, 28, 280, 13, 191, 152, 29, 175, 30, 204, 4, 24, 5, 329, 273, 170, 184, 26, 231, 368, 37, 16, 169, 155, 35, 40, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from transitional cell carcinoma and a cancer originating from the group consisting of squamous cell carcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 164, 5, 231, 54, 1, 242, 372, 249, 167, 254, 354, 381, 380, 245, 358, 364, 240, 11, 378 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between squamous cell carcinoma cancers originating from the uterine cervix, and squamous cell carcinoma cancers originating from the group consisting of anus and skin, lung, head & neck and esophagus.
  • measurement of the relative abundance of SEQ ID NOS: 305, 184, 41, 183, 49, 382, 235, 291, 181, 5, 296, 289, 206, 338, 334, 25, 11, 19, 198, 23 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between squamous cell carcinoma cancers originating from the group consisting of anus and skin, and squamous cell carcinoma cancers originating from the group consisting of lung, head & neck and esophagus.
  • measurement of the relative abundance of SEQ ID NOS: 4, 11, 46, 8, 274, 169, 36, 47, 363, 231, 303, 349, 10, 7, 3, 16, 164, 170, 168, 198, 50, 245, 365, 45, 382, 259, 296, 364, 314, 12 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from melanoma and a cancer originating from lymphoma.
  • measurement of the relative abundance of SEQ ID NOS: 11, 191, 48, 35, 228 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from B-cell lymphoma and a cancer originating from T-cell lymphoma.
  • measurement of the relative abundance of SEQ ID NOS: 158, 20, 176, 186, 148, 36, 51, 172, 260, 265, 67, 188, 277, 284, 302, 68, 168, 242, 204, 162, 177, 27, 65, 263, 155, 191, 190, 45, 59, 43, 56, 266, 14, 15, 8, 7, 39, 189, 249, 231, 293, 2 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung small cell carcinoma and a cancer originating from the group consisting of lung carcinoid, medullary thyroid carcinoma, gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • measurement of the relative abundance of SEQ ID NOS: 159, 40, 147, 11, 311, 4, 8, 231, 301, 297, 68, 67, 265, 36 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from medullary thyroid carcinoma and a cancer originating from other neuroendocrine tumors selected from the group consisting of lung carcinoid, gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • measurement of the relative abundance of SEQ ID NOS: 331, 162, 59, 326, 306, 350, 317, 155, 325, 318, 339, 264, 332, 262, 336, 324, 322, 330, 321, 263, 309, 53, 320, 275, 352, 312, 355, 367, 269, 64, 308, 175, 190, 54, 302, 152, 301, 266, 47, 313, 359, 65, 307, 191, 242, 4, 147, 40, 372, 168, 16, 182, 167, 356, 148, 382, 37, 364, 35 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung carcinoid tumors, and a cancer originating from gastrointestinal neuroendocrine tumors selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • measurement of the relative abundance of SEQ ID NOS: 263, 288, 18, 286, 162, 225, 287, 206, 205, 296, 258, 313, 377, 373, 256, 153, 259, 265, 303, 268, 267, 165, 15, 272, 14, 202, 236, 203, 4, 168, 310, 298, 27, 29, 34, 228, 3, 349, 35, 26 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pancreatic islet cell tumors and a gastrointestinal neuroendocrine carcinoid cancer originating from the group consisting of small intestine and duodenum; appendix, stomach and pancreas.
  • measurement of the relative abundance of SEQ ID NOS: 36, 267, 268, 165, 15, 14, 356, 167, 372, 272, 370, 42, 41, 146 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between adenocarcinoma tumors of the gastrointestinal system originating from gastric or esophageal adenocarcinoma, and adenocarcinoma tumors originating from the group consisting of cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract, pancreatic adenocarcinoma and colorectal adenocarcinoma.
  • measurement of the relative abundance of SEQ ID NOS: 42, 184, 67, 158, 20, 186, 284, 389, 203, 240, 236, 146, 204, 43, 176, 202, 49, 46, 38, 363 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from colorectal adenocarcinoma and a cancer originating from the group consisting of adenocarcinoma of biliary tract or pancreas.
  • measurement of the relative abundance of SEQ ID NOS: 49, 11, 13, 373, 154, 5, 30, 45, 178, 147, 274, 16, 40, 21, 43, 253, 245, 256, 12, 374, 379, 180, 153, 51, 52, 1, 295, 257, 385, 293, 294 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pancreatic adenocarcinoma, and a cancer originating from the group consisting of cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract.
  • renal cell tumors selected from the group consisting of chromophobe renal cell carcinoma, clear cell renal cell carcinoma and papillary renal cell carcinoma, and
  • sarcomas the group consisting of sarcomas, adrenal tumors and pleural mesothelioma.
  • measurement of the relative abundance of SEQ ID NOS: 65, 56, 11, 162, 59, 331, 350, 155, 335, 159, 336, 332, 263, 306, 339, 337, 275, 301, 276, 330, 317, 309, 45, 318, 324, 352, 191, 262, 269, 313, 19, 367, 326, 325, 322, 327, 190, 261, 321, 360, 353, 312, 371, 5, 328, 205, 183, 38, 181, 37, 40, 182, 147, 17, 42, 382, 34, 18, 3 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pheochromocytoma, and a cancer originating from the group consisting of all sarcoma, adrenal carcinoma and mesothelioma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 61, 333, 31, 347, 346, 344, 345, 387, 334, 351, 324, 326, 269, 155, 320, 322, 59, 318, 325, 245, 254, 331, 275, 180, 355, 370, 323, 312, 178, 249, 183, 181, 38, 182, 37, 3, 25 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from adrenal carcinoma and a cancer originating from the group consisting of mesothelioma and sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 165, 14, 15, 333, 272, 270, 45, 301, 191, 46, 195, 266, 190, 19, 334, 155, 25, 147, 40, 34 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a gastrointestinal stromal tumor and a cancer originating from the group consisting of mesothelioma and sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 13, 30, 361, 280, 362, 147, 40, 291, 387, 290, 299, 152, 178, 303, 242, 49, 11, 35, 34, 36, 206, 16, 170, 177, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a chromophobe renal cell carcinoma tumor and a cancer originating from the group consisting of clear cell and papillary renal cell carcinoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 344, 382, 9, 338, 29, 49, 28, 195, 46, 4, 11, 254 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a renal carcinoma cancer originating from a clear cell tumor and a cancer originating from a papillary tumor.
  • measurement of the relative abundance of SEQ ID NOS: 49, 35, 17, 34, 25, 36, 168, 170, 26, 4, 190, 46, 10, 240, 43, 39, 385, 63, 202, 181, 37, 5, 183, 182, 38, 206, 296, 1 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pleural mesothelioma and a cancer originating from the group consisting of sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 152, 29, 159, 28, 339, 275, 352, 19, 320, 155, 262, 38, 37, 182, 331, 317, 323, 355, 3, 282, 312, 181, 269, 318, 59, 266, 322, 8, 324, 10, 40, 147, 169, 205, 34, 168, 14, 15, 12, 46, 255, 39, 23, 190, 236, 386, 379, 202 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a synovial sarcoma and a cancer originating from the group consisting of other sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 12, 271, 206, 333, 11, 58, 36, 18, 178, 293, 189, 382, 381, 240, 249, 5, 377, 235, 17, 20, 385, 384, 46, 283 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from chondrosarcoma and a cancer originating from the group consisting of other non-synovial sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 295, 205, 25, 26, 231, 183, 42, 254, 168, 64, 14, 178, 15, 39, 36, 154, 265, 174, 384, 67 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from liposarcoma and a cancer originating from the group consisting of other non chondrosarcoma and non synovial sarcoma tumors.
  • measurement of the relative abundance of SEQ ID NOS: 22, 154, 21, 174, 205, 158, 186, 148, 20, 59, 8, 183, 231 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from:
  • measurement of the relative abundance of SEQ ID NOS: 155, 179, 43, 208, 278, 17, 385, 174, 5, 52, 257, 366, 48, 49, 12, 25, 169, 34, 35, 23, 384, 189, 377, 265, 294, 293, 292 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from Ewing sarcoma and a cancer originating from osteosarcoma.
  • measurement of the relative abundance of SEQ ID NOS: 33, 268, 267, 333, 276, 319, 306, 320, 334, 323, 300, 281, 59, 339, 316, 176, 348, 352, 349, 67, 357, 315, 343, 342, 355, 340, 344, 10, 341, 331, 20, 277, 318, 158, 265, 284, 36, 183, 40, 63, 147, 43, 289, 52, 190, 4, 5, 39, 169, 208 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from rhabdomyosarcoma and a cancer originating from the group consisting of malignant fibrous histiocytoma and fibrosarcoma.
  • the biological sample is selected from the group consisting of bodily fluid, a cell line, a tissue sample, a biopsy sample, a needle biopsy sample, a fine-needle aspiration (FNA) sample, a surgically removed sample, and a sample obtained by tissue-sampling procedures such as endoscopy, bronchoscopy, or laparoscopic methods.
  • the tissue is a fresh, frozen, fixed, wax-embedded or formalin-fixed paraffin-embedded (FFPE) tissue.
  • the nucleic acid sequence relative abundance is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
  • the nucleic acid hybridization is performed using a solid-phase nucleic acid biochip array or in situ hybridization.
  • the nucleic acid amplification method is real-time PCR.
  • the real-time PCR comprises forward and reverse primers.
  • the real-time PCR method further comprises a probe.
  • the probe comprises a sequence selected from the group consisting of a sequence that is complementary to a sequence selected from SEQ ID NOS: 1-390; a fragment thereof and a sequence having at least about 80% identity thereto.
  • the present invention provides a kit for cancer origin identification, the kit comprising a probe comprising a sequence selected from the group consisting of a sequence that is complementary to a sequence selected from SEQ ID NOS: 1-390; a fragment thereof and a sequence having at least about 80% identity thereto.
  • FIGS. 1A-1F demonstrate the structure of the binary decision-tree classifier, with 45 nodes and 46 leaves. Each node is a binary decision between two sets of samples, those to the left and right of the node. A series of binary decisions, starting at node #1 and moving downwards, leads to one of the possible tumor types, which are the “leaves” of the tree. A sample which is classified to the right branch at node #1 continues to node #2; otherwise it continues to node #11. A sample which is classified to the right branch at node #2 continues to node #4; otherwise it continues to node #3.
  • A sample that reaches node #3 is further classified either to the left branch at node #3, and is assigned to the “hepatocellular carcinoma” class, or to the right branch at node #3, and is assigned to the “biliary tract adenocarcinoma” class.
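The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TREE` table contains just the three nodes whose branches are spelled out in the text, and `walk_tree`, `goes_right` and the `decisions` dictionary are hypothetical names introduced here, not part of the described classifier.

```python
from typing import Callable, Dict, List, Tuple, Union

Target = Union[int, str]  # int = next node number, str = leaf (tumor class)

# (left_target, right_target) for each node spelled out in the text.
# The remaining nodes of the tree are not described here and are omitted.
TREE: Dict[int, Tuple[Target, Target]] = {
    1: (11, 2),   # right -> node #2, left -> node #11
    2: (3, 4),    # right -> node #4, left -> node #3
    3: ("hepatocellular carcinoma", "biliary tract adenocarcinoma"),
}


def walk_tree(
    sample: dict,
    goes_right: Callable[[int, dict], bool],
    tree: Dict[int, Tuple[Target, Target]] = TREE,
    start: int = 1,
) -> Tuple[str, List[Tuple[int, str]]]:
    """Follow binary decisions from `start` until a leaf is reached."""
    node: Target = start
    path: List[Tuple[int, str]] = []
    while isinstance(node, int):
        if node not in tree:
            # This sketch does not know the structure below this node.
            return f"unresolved at node #{node}", path
        left, right = tree[node]
        right_branch = goes_right(node, sample)
        path.append((node, "right" if right_branch else "left"))
        node = right if right_branch else left
    return node, path


# Hypothetical per-node outcomes for one sample (not measured data).
decisions = {1: True, 2: False, 3: False}
label, path = walk_tree({}, lambda node, _sample: decisions[node])
print(label, path)
```

In practice the `goes_right` callback would be replaced by whatever per-node decision rule is trained for that node.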
  • FIGS. 2A-2D demonstrate binary decisions at node #1 of the decision-tree.
  • Tumors of the “non germ cell” classes (right branch at node #1) are easily distinguished from tumors of the “germ cell” classes (left branch at node #1) using the expression levels of hsa-miR-373 (SEQ ID NO: 233, FIG. 2A), hsa-miR-372 (SEQ ID NO: 55, FIG. 2B), hsa-miR-371-3p (SEQ ID NO: 200, FIG. 2C), and hsa-miR-371-5p (SEQ ID NO: 201, FIG. 2D).
  • The boxplots compare the distribution of expression of the statistically significant miRs in tumor samples from the “germ cell” classes (left box) and the “non germ cell” classes (right box).
  • The line in the box indicates the median value.
  • The box contains 50% of the data, and the horizontal lines and crosses (outliers) show the full range of signals in this group.
  • FIG. 3 demonstrates binary decisions at node #3 of the decision-tree.
  • Tumors of hepatocellular carcinoma (HCC) origin (left branch at node #3, marked by squares) are easily distinguished from tumors of biliary tract adenocarcinoma origin (right branch at node #3, marked by diamonds) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis) and hsa-miR-126 (SEQ ID NO: 9, x-axis).
  • FIG. 4 demonstrates binary decisions at node #4 of the decision-tree.
  • Tumors of epithelial origin are easily distinguished from tumors of non-epithelial origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis) and hsa-miR-200c (SEQ ID NO: 30, x-axis).
  • FIG. 5 demonstrates binary decisions at node #5 of the decision-tree.
  • Tumors of lymphoma or melanoma origin are easily distinguished from tumors of non-epithelial, non-lymphoma/melanoma origin (squares) using the expression levels of hsa-miR-146a (SEQ ID NO: 16, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-let-7e (SEQ ID NO: 2, z-axis).
  • FIG. 6 demonstrates binary decisions at node #6 of the decision-tree.
  • Tumors originating in the brain are easily distinguished from tumors of non-epithelial, non-brain origin (right branch at node #6, marked by squares) using the expression levels of hsa-miR-9* (SEQ ID NO: 66, y-axis) and hsa-miR-92b (SEQ ID NO: 68, x-axis).
  • FIG. 7 demonstrates binary decisions at node #7 of the decision-tree.
  • Tumors originating in astrocytoma (right branch at node #7, marked by diamonds) are easily distinguished from tumors of oligodendroglioma origins (left branch at node #7, marked by squares) using the expression levels of hsa-miR-497 (SEQ ID NO: 60, y-axis) and hsa-miR-222 (SEQ ID NO: 40, x-axis).
  • FIG. 8 demonstrates binary decisions at node #8 of the decision-tree.
  • Tumors of neuroendocrine origin are easily distinguished from tumors of epithelial origin (squares) using the expression levels of hsa-miR-193a-3p (SEQ ID NO: 181, y-axis), hsa-miR-7 (SEQ ID NO: 65, x-axis) and hsa-miR-375 (SEQ ID NO: 56, z-axis).
  • FIG. 9 demonstrates binary decisions at node #9 of the decision-tree.
  • Tumors of gastrointestinal (GI) origin are easily distinguished from tumors of non-GI origin (right branch at node #9, marked by squares) using the expression levels of hsa-miR-21* (SEQ ID NO: 35, y-axis) and hsa-miR-194 (SEQ ID NO: 27, x-axis).
  • FIG. 10 demonstrates binary decisions at node #10 of the decision-tree.
  • Tumors originating in prostate adenocarcinoma are easily distinguished from tumors of non prostate origins (right branch at node #10, marked by squares) using the expression levels of hsa-miR-181a (SEQ ID NO: 21, y-axis) and hsa-miR-143 (SEQ ID NO: 14, x-axis).
  • FIG. 11 demonstrates binary decisions at node #12 of the decision-tree.
  • Tumors of seminomatous testicular germ cell origin are easily distinguished from tumors of non-seminomatous origin (right branch at node #12, marked by squares) using the expression levels of hsa-miR-516a-5p (SEQ ID NO: 62, y-axis) and hsa-miR-200b (SEQ ID NO: 29, x-axis).
  • FIG. 12 demonstrates binary decisions at node #16 of the decision-tree.
  • Tumors originating in thyroid carcinoma are easily distinguished from tumors of adenocarcinoma of the lung, breast and ovarian origin (squares) using the expression levels of hsa-miR-93 (SEQ ID NO: 148, y-axis), hsa-miR-138 (SEQ ID NO: 11, x-axis) and hsa-miR-10a (SEQ ID NO: 4, z-axis).
  • FIG. 13 demonstrates binary decisions at node #17 of the decision-tree.
  • Tumors originating in follicular thyroid carcinoma are easily distinguished from tumors of papillary thyroid carcinoma origins (right branch at node #17, marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-146b-5p (SEQ ID NO: 17, x-axis).
  • FIG. 14 demonstrates binary decisions at node #18 of the decision-tree.
  • Tumors originating in breast are easily distinguished from tumors of lung and ovarian origin (squares) using the expression levels of hsa-miR-92a (SEQ ID NO: 67, y-axis), hsa-miR-193a-3p (SEQ ID NO: 25, x-axis) and hsa-miR-31 (SEQ ID NO: 49, z-axis).
  • FIG. 15 demonstrates binary decisions at node #19 of the decision-tree.
  • Tumors originating in lung adenocarcinoma are easily distinguished from tumors of ovarian carcinoma origin (squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis), hsa-miR-378 (SEQ ID NO: 57, x-axis) and hsa-miR-138 (SEQ ID NO: 11, z-axis).
  • FIG. 16 demonstrates binary decisions at node #20 of the decision-tree.
  • Tumors originating in thymic carcinoma are easily distinguished from tumors of urothelial carcinoma, transitional cell carcinoma (TCC) and squamous cell carcinoma (SCC) origins (right branch at node #20, marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-100 (SEQ ID NO: 3, x-axis).
  • FIG. 17 demonstrates binary decisions at node #22 of the decision-tree.
  • Tumors originating in SCC of the uterine cervix are easily distinguished from tumors of other SCC origin (squares) using the expression levels of hsa-miR-361-5p (SEQ ID NO: 54, y-axis), hsa-let-7c (SEQ ID NO: 1, x-axis) and hsa-miR-10b (SEQ ID NO: 5, z-axis).
  • FIG. 18 demonstrates binary decisions at node #24 of the decision-tree.
  • Tumors originating in melanoma are easily distinguished from tumors of lymphoma origin (squares) using the expression levels of hsa-miR-342-3p (SEQ ID NO: 50, y-axis) and hsa-miR-30d (SEQ ID NO: 47, x-axis).
  • FIG. 19 demonstrates binary decisions at node #27 of the decision-tree.
  • Tumors originating in medullary thyroid carcinoma (diamonds) are easily distinguished from tumors of other neuroendocrine origin (squares) using the expression levels of hsa-miR-92b (SEQ ID NO: 68, y-axis), hsa-miR-222 (SEQ ID NO: 40, x-axis) and hsa-miR-92a (SEQ ID NO: 67, z-axis).
  • FIG. 20 demonstrates binary decisions at node #30 of the decision-tree.
  • Tumors originating in gastric or esophageal adenocarcinoma are easily distinguished from tumors of other GI adenocarcinoma origin (squares) using the expression levels of hsa-miR-1201 (SEQ ID NO: 146, y-axis), hsa-miR-224 (SEQ ID NO: 42, x-axis) and hsa-miR-210 (SEQ ID NO: 36, z-axis).
  • FIG. 21 demonstrates binary decisions at node #31 of the decision-tree.
  • Tumors originating in colorectal adenocarcinoma are easily distinguished from tumors of adenocarcinoma of biliary tract or pancreas origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis), hsa-miR-17 (SEQ ID NO: 20, x-axis) and hsa-miR-29a (SEQ ID NO: 43, z-axis).
  • FIG. 22 demonstrates binary decisions at node #33 of the decision-tree.
  • Tumors originating in kidney are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-miR-149 (SEQ ID NO: 19, z-axis).
  • FIG. 23 demonstrates binary decisions at node #34 of the decision-tree.
  • Tumors originating in pheochromocytoma are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-375 (SEQ ID NO: 56, y-axis) and hsa-miR-7 (SEQ ID NO: 65, x-axis).
  • FIG. 24 demonstrates binary decisions at node #44 of the decision-tree.
  • Tumors originating in Ewing sarcoma are easily distinguished from tumors of osteosarcoma origin (squares) using the expression levels of hsa-miR-31 (SEQ ID NO: 49, y-axis) and hsa-miR-193a-3p (SEQ ID NO: 25, x-axis).
  • FIG. 25 demonstrates binary decisions at node #45 of the decision-tree.
  • Tumors originating in rhabdomyosarcoma are easily distinguished from tumors of malignant fibrous histiocytoma (MFH) or fibrosarcoma origin (squares) using the expression levels of hsa-miR-206 (SEQ ID NO: 33, y-axis), hsa-miR-22 (SEQ ID NO: 39, x-axis) and hsa-miR-487b (SEQ ID NO: 59, z-axis).
  • the present invention is based in part on the discovery that specific nucleic acid sequences can be used for the identification of the tissue-of-origin of a tumor.
  • the present invention provides a sensitive, specific and accurate method which can be used to distinguish between different tumor origins.
  • A new microRNA-based classifier was developed for determining the tissue of origin of tumors based on 65 microRNA markers. The classifier uses a specific algorithm and allows a clear interpretation of the specific biomarkers.
  • each node in the classification tree may be used as an independent differential diagnosis tool, for example in the identification of different types of lung cancers.
  • the possibility to distinguish between different tumor origins facilitates providing the patient with the best and most suitable treatment.
  • The present invention provides diagnostic assays and methods, both quantitative and qualitative, for detecting, diagnosing, monitoring, staging and prognosticating cancers by comparing the levels of the specific microRNA molecules of the invention. Such levels are preferably measured in at least one of biopsies, tumor samples, fine-needle aspiration (FNA), cells, tissues and/or bodily fluids.
  • the methods provided in the present invention are particularly useful for discriminating between different cancers.
  • All the methods of the present invention may optionally further include measuring levels of additional cancer markers.
  • the cancer markers measured in addition to said microRNA molecules depend on the cancer being tested and are known to those of skill in the art.
  • Assay techniques can be used to determine levels of gene expression, such as genes encoding the nucleic acids of the present invention in a sample derived from a patient.
  • Such assay methods include, but are not limited to, nucleic acid microarrays and biochip analysis, reverse transcriptase PCR (RT-PCR) assays, immunohistochemistry assays, in situ hybridization assays, competitive-binding assays, northern blot analyses and ELISA assays.
  • the assay is based on expression level of 65 microRNAs in RNA extracted from FFPE metastatic tumor tissue.
  • the expression levels are used to infer the sample origin using analysis techniques such as, but not limited to, decision-tree classifier, K nearest neighbors classifier, logistic regression classifier, linear regression classifier, nearest neighbor classifier, neural network classifier and nearest centroid classifier.
  • the expression levels are used to make binary decisions (at each relevant node) following the pre-defined structure of the binary decision-tree (defined using a training set).
  • P_TH: a probability threshold level.
  • The classification continues to the left or right branch according to whether P is larger or smaller than the P_TH for that node. This continues until an end-point (“leaf”) of the tree is reached.
  • P_TH = 0.5 for all nodes, and the value of β0 is adjusted accordingly.
  • β0, β1, β2, . . . are adjusted so that the slope of the log of the odds ratio function is limited.
  • Training the tree algorithm means determining the tree structure (which nodes there are and what is on each side) and, for each node, which miRs are used, the values of β0, β1, β2, . . . and the P_TH. These are determined by a combination of machine learning, optimization algorithms, and trial and error by experts in machine learning and diagnostic algorithms.
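A minimal sketch of a single node decision follows, assuming the node uses a standard logistic regression form in which the log-odds are a linear function of the expression levels with an intercept `beta0` and per-miR coefficients `betas`. The function names, the example coefficients and the choice of which branch corresponds to P above the threshold are assumptions made for illustration; the text does not fix them. The sketch also shows why a node-specific threshold can be replaced by a threshold of 0.5 once the intercept is shifted by the logit of that threshold.

```python
import math


def node_probability(expr, beta0, betas):
    """Logistic model: P = 1 / (1 + exp(-(beta0 + sum(beta_i * expr_i))))."""
    log_odds = beta0 + sum(b * x for b, x in zip(betas, expr))
    return 1.0 / (1.0 + math.exp(-log_odds))


def branch(expr, beta0, betas, p_th=0.5, above_goes="right"):
    """Pick a branch by comparing P with the node threshold.

    Which side corresponds to P > p_th is an assumption of this sketch.
    """
    other = "left" if above_goes == "right" else "right"
    return above_goes if node_probability(expr, beta0, betas) > p_th else other


def shift_intercept(beta0, p_th):
    """Intercept that makes 'P > 0.5' equivalent to the original 'P > p_th'."""
    return beta0 - math.log(p_th / (1.0 - p_th))


# Hypothetical coefficients and expression values (not taken from the text).
expr, beta0, betas = [2.0, 1.0], -1.0, [0.5, 0.25]
print(branch(expr, beta0, betas, p_th=0.6))
print(branch(expr, shift_intercept(beta0, 0.6), betas, p_th=0.5))
```

Both calls select the same branch, which is the sense in which the intercept can be "adjusted accordingly" when the threshold is fixed at 0.5.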
  • An arbitrary threshold of the expression level of one or more nucleic acid sequences can be set for assigning a sample to one of two groups.
  • expression levels of one or more nucleic acid sequences of the invention are combined by a method such as logistic regression to define a metric which is then compared to previously measured samples or to a threshold.
  • the threshold is treated as a parameter that can be used to quantify the confidence with which samples are assigned to each class.
  • the threshold can be scaled to favor sensitivity or specificity, depending on the clinical scenario.
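To illustrate how moving the threshold trades sensitivity against specificity, here is a small sketch. The scores and labels are invented toy values, and `sensitivity_specificity` is a hypothetical helper; a sample is called positive when its score is at or above the threshold.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: four samples of the class of interest (1) and four others (0).
scores = [0.9, 0.8, 0.55, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.35, 0.5, 0.7):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold captures more true positives at the cost of more false positives, and raising it does the opposite, so the appropriate setting depends on the clinical scenario.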
  • the correlation value to the reference data generates a continuous score that can be scaled and provides diagnostic information on the likelihood that a sample belongs to a certain class of cancer origin or type. In multivariate analysis the microRNA signature provides a high level of prognostic information.
  • expression level of nucleic acids is used to classify a test sample by comparison to a training set of samples.
  • the test sample is compared in turn to each one of the training set samples.
  • the comparison is performed by comparing the expression levels of one or multiple nucleic acids between the test sample and the specific training sample.
  • Each such pairwise comparison generates a combined metric for the multiple nucleic acids, which can be calculated by various numeric methods such as correlation, cosine, Euclidean distance, mean square distance, or other methods known to those skilled in the art.
  • the training samples are then ranked according to this metric, and the samples with the highest values of the metric (or lowest values, according to the type of metric) are identified, indicating those samples that are most similar to the test sample.
  • By choosing a parameter K, this generates a list that includes the K training samples that are most similar to the test sample.
  • Various methods can then be applied to identify from this list the predicted class of the test sample.
  • The test sample is predicted to belong to the class that has the highest number of representatives in the list of K most-similar training samples (this method is known as the K Nearest Neighbors method).
  • Other embodiments may provide a list of predictions including all or part of the classes represented in the list, those classes that are represented more than a given minimum number of times, or other voting schemes whereby classes are grouped together.
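The ranking-and-voting procedure described above can be sketched as follows. This is a simplified illustration that uses Euclidean distance as the pairwise metric and a plain majority vote; the training profiles, class labels "A" and "B", and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(test, training, k, distance=euclidean):
    """Rank training samples by distance to `test` and vote among the K nearest.

    `training` is a list of (profile, class_label) pairs. With a distance
    metric the lowest values are the most similar; ties in the vote are
    resolved in favor of the class seen first among the neighbors.
    """
    ranked = sorted(training, key=lambda item: distance(test, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], votes


# Hypothetical two-marker profiles for two classes.
training = [
    ([1.0, 1.0], "A"),
    ([1.2, 0.9], "A"),
    ([0.9, 1.1], "A"),
    ([5.0, 5.0], "B"),
    ([5.1, 4.8], "B"),
]
predicted, votes = knn_predict([1.1, 1.0], training, k=3)
print(predicted, dict(votes))
```

Returning the full vote count alongside the prediction makes it straightforward to implement the alternative schemes mentioned above, such as reporting every class that appears at least a minimum number of times.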
  • For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated.
  • For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9 and 7.0 are explicitly contemplated.
  • “Attached” or “immobilized”, as used herein, to refer to a probe and a solid support means that the binding between the probe and the solid support is sufficient to be stable under conditions of binding, washing, analysis, and removal.
  • the binding may be covalent or non-covalent. Covalent bonds may be formed directly between the probe and the solid support or may be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe or both molecules.
  • Non-covalent binding may be one or more of electrostatic, hydrophilic, and hydrophobic interactions.
  • Included in non-covalent binding is the covalent attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of a biotinylated probe to the streptavidin. Immobilization may also involve a combination of covalent and non-covalent interactions.
  • Baseline means the initial cycles of PCR, in which there is little change in fluorescence signal.
  • Biological sample means a sample of biological tissue or fluid that comprises nucleic acids. Such samples include, but are not limited to, tissue or fluid isolated from subjects. Biological samples may also include sections of tissues such as biopsy and autopsy samples, FFPE samples, frozen sections taken for histological purposes, blood, blood fraction, plasma, serum, sputum, stool, tears, mucus, hair, skin, urine, effusions, ascitic fluid, amniotic fluid, saliva, cerebrospinal fluid, cervical secretions, vaginal secretions, endometrial secretions, gastrointestinal secretions, bronchial secretions, cell line, tissue sample, or secretions from the breast.
  • a biological sample may be provided by fine-needle aspiration (FNA), pleural effusion or bronchial brushing.
  • A biological sample may be provided by removing a sample of cells from a subject, but can also be accomplished by using previously isolated cells (e.g., isolated by another person, at another time, and/or for another purpose), or by performing the methods described herein in vivo.
  • Archival tissues, such as those having treatment or outcome history, may also be used.
  • Biological samples also include explants and primary and/or transformed cell cultures derived from animal or human tissues.
  • cancer is meant to include all types of cancerous growths or oncogenic processes, metastatic tissues or malignantly transformed cells, tissues, or organs, irrespective of histopathologic type or stage of invasiveness.
  • cancers include, but are not limited to, solid tumors and leukemias, including: apudoma, choristoma, branchioma, malignant carcinoid syndrome, carcinoid heart disease, carcinoma (e.g., Walker, basal cell, basosquamous, Brown-Pearce, ductal, Ehrlich tumor, non-small cell lung (e.g., lung squamous cell carcinoma, lung adenocarcinoma and lung undifferentiated large cell carcinoma), oat cell, papillary, bronchiolar, bronchogenic, squamous cell, and transitional cell), histiocytic disorders, leukemia (e.g., B cell, mixed cell, null cell, T cell, T-cell chronic, HTLV-II
  • classification refers to a procedure and/or algorithm in which individual items are placed into groups or classes based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, features, etc.) and based on a statistical model and/or a training set of previously labeled items.
  • a “classification tree” places categorical variables into classes.
  • “Complement” or “complementary”, as used herein to refer to a nucleic acid, may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
  • a full complement or fully complementary means 100% complementary base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
  • the complementary sequence has a reverse orientation (5′-3′).
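The definition of a full complement can be illustrated with a minimal sketch. It covers only Watson-Crick pairing (Hoogsteen pairing and nucleotide analogs are not modeled), and the function names `reverse_complement` and `is_fully_complementary` are hypothetical names chosen for this illustration.

```python
# Watson-Crick partners for RNA; DNA input is handled by treating T as U.
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the complementary strand of seq, written 5'->3' as RNA."""
    rna = seq.upper().replace("T", "U")
    # Reading the partner strand 5'->3' means walking the input backwards.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def is_fully_complementary(first, second):
    """True when every position pairs, i.e. 100% complementary base pairing."""
    return reverse_complement(first) == second.upper().replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("AUGC"))
    print(is_fully_complementary("AUGC", "GCAT"))
```

Treating T and U as interchangeable lets the same check compare a DNA probe with an RNA target.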
  • Ct signals represent the first cycle of PCR where amplification crosses a threshold (cycle threshold) of fluorescence. Accordingly, low values of Ct represent high abundance or expression levels of the microRNA.
  • the PCR Ct signal is normalized such that the normalized Ct remains inversely related to the expression level. In other embodiments the PCR Ct signal may be normalized and then inverted such that low normalized-inverted Ct represents low abundance or low expression levels of the microRNA.
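The relationship between Ct and abundance described above can be sketched in a few lines. The normalization rule (subtracting the mean Ct of reference measurements) and the constant `INVERSION_OFFSET` are assumptions chosen for illustration; this section does not state which normalization or inversion is used.

```python
from statistics import mean

# Hypothetical constant used to flip the scale; not specified in the text.
INVERSION_OFFSET = 50.0


def normalize_ct(raw_ct, reference_cts):
    """Shift a raw Ct by the mean Ct of reference measurements (assumed scheme).

    The result is still inversely related to expression: lower means more abundant.
    """
    return raw_ct - mean(reference_cts)


def invert_ct(normalized_ct, offset=INVERSION_OFFSET):
    """Invert a normalized Ct so that low values mean low abundance."""
    return offset - normalized_ct


if __name__ == "__main__":
    reference = [24.0, 26.0]
    abundant = normalize_ct(22.0, reference)  # crosses the threshold early
    scarce = normalize_ct(31.0, reference)    # crosses the threshold late
    print(abundant, scarce)
    print(invert_ct(abundant), invert_ct(scarce))
```

After inversion the abundant microRNA has the larger value, which makes the signal easier to read alongside array-style expression data.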
  • a “data processing routine” refers to a process that can be embodied in software that determines the biological significance of acquired data (i.e., the ultimate results of an assay or analysis). For example, the data processing routine can determine a tissue of origin based upon the data collected. In the systems and methods herein, the data processing routine can also control the data collection routine based upon the results determined. The data processing routine and the data collection routines can be integrated and provide feedback to operate the data acquisition, and hence provide assay-based judging methods.
  • data set refers to numerical values obtained from the analysis. These numerical values associated with analysis may be values such as peak height and area under the curve.
  • data structure refers to a combination of two or more data sets, an application of one or more mathematical manipulations to one or more data sets to obtain one or more new data sets, or a manipulation of two or more data sets into a form that provides a visual illustration of the data in a new way.
  • An example of a data structure prepared from manipulation of two or more data sets would be a hierarchical cluster.
  • Detection means detecting the presence of a component in a sample. Detection also means detecting the absence of a component. Detection also means determining the level of a component, either quantitatively or qualitatively.
  • differential expression means qualitative or quantitative differences in the temporal and/or spatial gene expression patterns within and among cells and tissue.
  • a differentially expressed gene may qualitatively have its expression altered, including an activation or inactivation, in, e.g., normal versus diseased tissue. Genes may be turned on or turned off in a particular state, relative to another state, thus permitting comparison of two or more states.
  • a qualitatively regulated gene may exhibit an expression pattern within a state or cell type which may be detectable by standard techniques. Some genes may be expressed in one state or cell type, but not in both.
  • the difference in expression may be quantitative, e.g., in that expression is modulated: up-regulated, resulting in an increased amount of transcript, or down-regulated, resulting in a decreased amount of transcript.
  • the degree to which expression differs needs only to be large enough to quantify via standard characterization techniques such as expression arrays, quantitative reverse transcriptase PCR, northern blot analysis, real-time PCR, in situ hybridization and RNase protection.
  • Epithelial tumors is meant to include all types of tumors from epithelial origin.
  • epithelial tumors include, but are not limited to cholangioca or adenoca of extrahepatic biliary tract, urothelial carcinoma, adenocarcinoma of the breast, lung large cell or adenocarcinoma, lung small cell carcinoma, carcinoid, lung, ovarian carcinoma, pancreatic adenocarcinoma, prostatic adenocarcinoma, gastric or esophageal adenocarcinoma, thymoma/thymic carcinoma, follicular thyroid carcinoma, papillary thyroid carcinoma, medullary thyroid carcinoma, anus or skin squamous cell carcinoma, lung, head&neck, or esophagus squamous cell carcinoma, uterine cervix squamous cell carcinoma, gastrointestinal tract carcinoid, pancreatic islet cell tumor and colorectal aden
  • Non epithelial tumors is meant to include all types of tumors from non epithelial origin.
  • examples of non epithelial tumors include, but are not limited to adrenocortical carcinoma, chromophobe renal cell carcinoma, clear cell renal cell carcinoma, papillary renal cell carcinoma, pleural mesothelioma, astrocytic tumor, oligodendroglioma, pheochromocytoma, B-cell lymphoma, T-cell lymphoma, melanoma, gastrointestinal stromal tumor (GIST), Ewing Sarcoma, chondrosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma, osteosarcoma, rhabdomyosarcoma, synovial sarcoma and liposarcoma.
  • expression profile is used broadly to include a genomic expression profile, e.g., an expression profile of microRNAs. Profiles may be generated by any convenient means for determining a level of a nucleic acid sequence, e.g., quantitative hybridization of microRNA, labeled microRNA, amplified microRNA, cDNA, etc., quantitative PCR, ELISA for quantitation, and the like, and allow the analysis of differential gene expression between two samples.
  • a subject or patient tumor sample e.g., cells or collections thereof, e.g., tissues, is assayed. Samples are collected by any convenient method, as known in the art.
  • Nucleic acid sequences of interest are nucleic acid sequences that are found to be predictive, including the nucleic acid sequences provided above, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of the nucleic acid sequences, including all of the listed nucleic acid sequences.
  • expression profile means measuring the relative abundance of the nucleic acid sequences in the measured samples.
  • “Expression ratio”, as used herein, refers to relative expression levels of two or more nucleic acids as determined by detecting the relative expression levels of the corresponding nucleic acids in a biological sample.
  • “Fragment” is used herein to indicate a non-full-length part of a nucleic acid. Thus, a fragment is itself also a nucleic acid.
  • gastrointestinal tumors is meant to include all types of tumors from gastrointestinal origin.
  • examples of gastrointestinal tumors include, but are not limited to, cholangioca or adenoca of extrahepatic biliary tract, pancreatic adenocarcinoma, gastric or esophageal adenocarcinoma, and colorectal adenocarcinoma.
  • Gene may be a natural (e.g., genomic) or synthetic gene comprising transcriptional and/or translational regulatory sequences and/or a coding region and/or non-translated sequences (e.g., introns, 5′- and 3′-untranslated sequences).
  • the coding region of a gene may be a nucleotide sequence coding for an amino acid sequence or a functional RNA, such as tRNA, rRNA, catalytic RNA, siRNA, miRNA or antisense RNA.
  • a gene may also be an mRNA or cDNA corresponding to the coding regions (e.g., exons and miRNA) optionally comprising 5′- or 3′-untranslated sequences linked thereto.
  • a gene may also be an amplified nucleic acid molecule produced in vitro, comprising all or a part of the coding region and/or 5′- or 3′-untranslated sequences linked thereto.
  • “Germ cell tumors”, as used herein, include, but are not limited to, non-seminomatous testicular germ cell tumors, seminomatous testicular germ cell tumors and ovarian primitive germ cell tumors.
  • “Groove binder” and/or “minor groove binder” may be used interchangeably and refer to small molecules that fit into the minor groove of double-stranded DNA, typically in a sequence-specific manner.
  • Minor groove binders may be long, flat molecules that can adopt a crescent-like shape and thus fit snugly into the minor groove of a double helix, often displacing water.
  • Minor groove binding molecules may typically comprise several aromatic rings connected by bonds with torsional freedom such as furan, benzene, or pyrrole rings.
  • Minor groove binders may be antibiotics such as netropsin, distamycin, berenil, pentamidine and other aromatic diamidines, Hoechst 33258, SN 6999, aureolic anti-tumor drugs such as chromomycin and mithramycin, CC-1065, dihydrocyclopyrroloindole tripeptide (DPI3), 1,2-dihydro-(3H)-pyrrolo[3,2-e]indole-7-carboxylate (CDPI3), and related compounds and analogues, including those described in Nucleic Acids in Chemistry and Biology, 2nd ed., Blackburn and Gait, eds., Oxford University Press, 1996, and PCT Published Application No.
  • a minor groove binder may be a component of a primer, a probe, a hybridization tag complement, or combinations thereof. Minor groove binders may increase the Tm of the primer or probe to which they are attached, allowing such primers or probes to effectively hybridize at higher temperatures.
  • “High expression miR-205 tumors”, as used herein, include, but are not limited to, urothelial carcinoma (TCC), thymoma/thymic carcinoma, anus or skin squamous cell carcinoma, lung, head&neck, or esophagus squamous cell carcinoma and uterine cervix squamous cell carcinoma.
  • Low expression miR-205 tumors include, but are not limited to, lung large cell or adenocarcinoma, follicular thyroid carcinoma and papillary thyroid carcinoma.
  • Host cell may be a naturally occurring cell or a transformed cell that may contain a vector and may support replication of the vector.
  • Host cells may be cultured cells, explants, cells in vivo, and the like.
  • Host cells may be prokaryotic cells, such as E. coli, or eukaryotic cells, such as yeast, insect, amphibian, or mammalian cells, such as CHO and HeLa cells.
  • “Identical” or “identity”, as used herein in the context of two or more nucleic acids or polypeptide sequences, means that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity.
  • the residues of a single sequence are included in the denominator but not the numerator of the calculation.
  • thymine (T) and uracil (U) may be considered equivalent.
  • Identity may be determined manually or by using a computer sequence algorithm such as BLAST or BLAST 2.0.
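The percentage calculation described above can be sketched as follows. The sketch assumes the two sequences have already been optimally aligned (for example by BLAST) and are supplied as equal-length strings with `-` marking gaps; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a pre-aligned region.

    Positions where only one sequence has a residue (a gap in the other)
    count in the denominator but not in the numerator. T and U are
    treated as equivalent.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned region is empty")
    a = aligned_a.upper().replace("U", "T")
    b = aligned_b.upper().replace("U", "T")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Three matches over five aligned positions, one of them a staggered end.
    print(percent_identity("AUGCA", "ATGG-"))
```

The alignment step itself is deliberately left out; only the counting rule from the definition is shown.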
  • “In situ detection”, as used herein, means the detection of expression or expression levels in the original site, here meaning in a tissue sample such as a biopsy.
  • K-nearest neighbor refers to a classification method that classifies a point by calculating the distances between it and points in the training data set. It then assigns the point to the class that is most common among its K-nearest neighbors (where K is an integer).
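A minimal sketch of the K-nearest neighbor rule follows. Euclidean distance is an assumption (the definition does not name a distance measure), and the training points and the labels `"A"` and `"B"` are made-up illustration data, not values from this document.

```python
import math
from collections import Counter


def knn_classify(point, training_points, training_labels, k=3):
    """Assign point to the most common class among its k nearest neighbors."""
    # Distance from the query point to every labeled training point.
    distances = sorted(
        (math.dist(point, p), label)
        for p, label in zip(training_points, training_labels)
    )
    # Majority vote among the k closest training points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(knn_classify((0.3, 0.3), points, labels, k=3))
```

In an expression-profile setting each coordinate would be the measured level of one microRNA, and an odd K helps avoid tied votes between two classes.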
  • a leaf as used herein, is the terminal group in a classification or decision tree.
  • Label means a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means.
  • useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and other entities which can be made detectable.
  • a label may be incorporated into nucleic acids and proteins at any position.
  • Logistic regression is part of a category of statistical models called generalized linear models. Logistic regression can allow one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. The dependent or response variable can be dichotomous, for example, one of two possible types of cancer. Logistic regression models the natural log of the odds, i.e., the ratio of the probability of belonging to the first group (P) over the probability of belonging to the second group (1-P), as a linear combination of the different expression levels (in log-space). The logistic regression output can be used as a classifier by prescribing that a case or sample will be classified into the first type if P is greater than 0.5 or 50%. Alternatively, the calculated probability P can be used as a variable in other contexts, such as a 1D or 2D threshold classifier.
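The logistic regression classifier described above can be sketched as follows. The weights and intercept in the usage example are made-up numbers, not fitted coefficients from this document, and the function names are hypothetical.

```python
import math


def logistic_probability(features, weights, intercept):
    """Probability P of the first group, where ln(P / (1 - P)) is linear in the features."""
    log_odds = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify(features, weights, intercept, cutoff=0.5):
    """Classify into the first type when P exceeds the cutoff, else the second."""
    p = logistic_probability(features, weights, intercept)
    return "first" if p > cutoff else "second"


if __name__ == "__main__":
    # Two hypothetical log-space expression levels with illustrative weights.
    weights, intercept = [1.0, -0.5], 0.0
    print(logistic_probability([2.0, 1.0], weights, intercept))
    print(classify([2.0, 1.0], weights, intercept))
```

The probability returned by `logistic_probability` can also be passed on as a single variable to a threshold classifier instead of being cut at 0.5.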
  • Metastasis means the process by which cancer spreads from the place at which it first arose as a primary tumor to other locations in the body.
  • the metastatic progression of a primary tumor reflects multiple stages, including dissociation from neighboring primary tumor cells, survival in the circulation, and growth in a secondary location.
  • Neuroendocrine tumors is meant to include all types of tumors from neuroendocrine origin.
  • Examples of neuroendocrine tumors include, but are not limited to lung small cell carcinoma, lung carcinoid, gastrointestinal tract carcinoid, pancreatic islet cell tumor and medullary thyroid carcinoma.
  • a “node” is a decision point in a classification (i.e., decision) tree. Also, a point in a neural net that combines input from other nodes and produces an output through application of an activation function.
  • “Nucleic acid” or “oligonucleotide” or “polynucleotide”, as used herein, means at least two nucleotides covalently linked together.
  • the depiction of a single strand also defines the sequence of the complementary strand.
  • a nucleic acid also encompasses the complementary strand of a depicted single strand.
  • Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid.
  • a nucleic acid also encompasses substantially identical nucleic acids and complements thereof.
  • a single strand provides a probe that may hybridize to a target sequence under stringent hybridization conditions.
  • a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.
  • Nucleic acids may be single-stranded or double-stranded, or may contain portions of both double-stranded and single-stranded sequences.
  • the nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine.
  • Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
  • a nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages.
  • Other analog nucleic acids include those with positive backbones, non-ionic backbones and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated herein by reference.
  • Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids.
  • the modified nucleotide analog may be located for example at the 5′-end and/or the 3′-end of the nucleic acid molecule.
  • Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides.
  • nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, such as uridine or cytidine modified at the 5-position, e.g., 5-(2-amino)propyl uridine, 5-bromo uridine; adenosine and guanosine modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; and O- and N-alkylated nucleotides, e.g., N6-methyl adenosine, are suitable.
  • the 2′-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I.
  • Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al., Nature 2005; 438:685-689, Soutschek et al., Nature 2004; 432:173-178, and U.S. Patent Publication No. 20050107325, which are incorporated herein by reference.
  • Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip.
  • the backbone modification may also enhance resistance to degradation, such as in the harsh endocytic environment of cells.
  • the backbone modification may also reduce nucleic acid clearance by hepatocytes, such as in the liver and kidney. Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs may be made.
  • Probe means an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. There may be any number of base pair mismatches which will interfere with hybridization between the target sequence and the single-stranded nucleic acids described herein. However, if the number of mutations is so great that no hybridization can occur under even the least stringent of hybridization conditions, the sequence is not a complementary target sequence.
  • a probe may be single-stranded or partially single- and partially double-stranded. The strandedness of the probe is dictated by the structure, composition, and properties of the target sequence. Probes may be directly labeled or indirectly labeled such as with biotin to which a streptavidin complex may later bind.
  • “Reference value” or “reference expression profile” refers to a criterion expression value to which measured values are compared in order to identify a specific cancer.
  • the reference value may be based on the abundance of the nucleic acids, or may be based on a combined metric score thereof.
  • the reference value is determined from statistical analysis of studies that compare microRNA expression with known clinical outcomes.
  • Sarcoma is meant to include all types of tumors from sarcoma origin.
  • sarcoma tumors include, but are not limited to gastrointestinal stromal tumor (GIST), Ewing sarcoma, chondrosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma, osteosarcoma, rhabdomyosarcoma, synovial sarcoma and liposarcoma.
  • “Sensitivity”, as used herein, may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct class out of two possible classes.
  • the sensitivity for class A is the proportion of cases that are determined to belong to class “A” by the test out of the cases that are in class “A”, as determined by some absolute or gold standard.
  • Specificity may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct class out of two possible classes.
  • the specificity for class A is the proportion of cases that are determined to belong to class “not A” by the test out of the cases that are in class “not A”, as determined by some absolute or gold standard.
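The two proportions defined above can be computed directly from paired gold-standard and test labels. This is a minimal sketch; the function name and the example labels are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="A"):
    """Return (sensitivity, specificity) for class `positive`.

    Sensitivity: cases called `positive` out of cases truly `positive`.
    Specificity: cases called not-`positive` out of cases truly not-`positive`.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the gold standard")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    gold = ["A", "A", "A", "A", "B", "B"]
    test = ["A", "A", "A", "B", "B", "A"]
    print(sensitivity_specificity(gold, test, positive="A"))
```

For a two-class problem, the specificity for class A equals the sensitivity for the other class.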
  • Stringent hybridization conditions mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • the Tm may be the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
  • Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., about 10-50 nucleotides) and at least about 60° C. for long probes (e.g., greater than about 50 nucleotides).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • a positive signal may be at least 2 to 10 times background hybridization.
  • Exemplary stringent hybridization conditions include the following: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or, 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C.
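The temperature rules quoted above reduce to simple arithmetic, sketched here. The Tm itself is taken as an input because no formula for it is given in this section; the function names are hypothetical, and the hard cut at 50 nucleotides is a simplification of the approximate ranges in the text.

```python
def stringent_temperature_window(tm_celsius):
    """Return the (low, high) range lying about 5-10 degrees C below the Tm."""
    return tm_celsius - 10.0, tm_celsius - 5.0


def minimum_temperature(probe_length):
    """Approximate lower temperature bound by probe length.

    About 30 degrees C for short probes (about 10-50 nucleotides) and
    about 60 degrees C for long probes (more than about 50 nucleotides).
    """
    if probe_length < 10:
        raise ValueError("probes shorter than about 10 nt are not covered")
    return 30.0 if probe_length <= 50 else 60.0


if __name__ == "__main__":
    print(stringent_temperature_window(65.0))
    print(minimum_temperature(22), minimum_temperature(80))
```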
  • “Substantially complementary”, as used herein, means that a first sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical to the complement of a second sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides, or that the two sequences hybridize under stringent hybridization conditions.
  • “Substantially identical”, as used herein, means that a first and a second sequence are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides or amino acids, or with respect to nucleic acids, if the first sequence is substantially complementary to the complement of the second sequence.
  • the term “subject” refers to a mammal, including both human and other mammals.
  • the methods of the present invention are preferably applied to human subjects.
  • Target nucleic acid means a nucleic acid or variant thereof that may be bound by another nucleic acid.
  • a target nucleic acid may be a DNA sequence.
  • the target nucleic acid may be RNA.
  • the target nucleic acid may comprise a mRNA, tRNA, shRNA, siRNA or Piwi-interacting RNA, or a pri-miRNA, pre-miRNA, miRNA, or anti-miRNA.
  • the target nucleic acid may comprise a target miRNA binding site or a variant thereof.
  • One or more probes may bind the target nucleic acid.
  • the target binding site may comprise 5-100 or 10-60 nucleotides.
  • the target binding site may comprise a total of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-40, 40-50, 50-60, 61, 62 or 63 nucleotides.
  • the target site sequence may comprise at least 5 nucleotides of the sequence of a target miRNA binding site disclosed in U.S. patent application Ser. Nos. 11/384,049, 11/418,870 or 11/429,720, the contents of which are incorporated herein.
  • “1D/2D threshold classifier”, as used herein, may mean an algorithm for classifying a case or sample such as a cancer sample into one of two possible types such as two types of cancer.
  • for a 1D threshold classifier, the decision is based on one variable and one predetermined threshold value; the sample is assigned to one class if the variable exceeds the threshold and to the other class if the variable is less than the threshold.
  • a 2D threshold classifier is an algorithm for classifying into one of two types based on the values of two variables.
  • a threshold may be calculated as a function (usually a continuous or even a monotonic function) of the first variable; the decision is then reached by comparing the second variable to the calculated threshold, similar to the 1D threshold classifier.
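The 1D and 2D threshold classifiers described above can be sketched as follows. The class names `"type 1"` and `"type 2"` and the linear threshold function `example_threshold` are assumptions made for illustration only.

```python
def classify_1d(value, threshold, above="type 1", below="type 2"):
    """Assign `above` when the variable exceeds the threshold, else `below`.

    A value exactly equal to the threshold is assigned to `below`; the text
    does not say how ties are handled.
    """
    return above if value > threshold else below


def classify_2d(first, second, threshold_fn, above="type 1", below="type 2"):
    """Compute a threshold from the first variable, then compare the second to it."""
    return classify_1d(second, threshold_fn(first), above, below)


def example_threshold(first):
    """Hypothetical monotonic threshold: a straight line in the first variable."""
    return 0.5 * first + 1.0


if __name__ == "__main__":
    print(classify_1d(3.0, 2.0))
    print(classify_2d(4.0, 3.5, example_threshold))
```

The 2D case reuses the 1D rule, so any continuous or monotonic function of the first variable can serve as the threshold.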
  • tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts.
  • the phrase “suspected of being cancerous”, as used herein, means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
  • Tumor refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
  • variant, as used herein to refer to a nucleic acid, means (i) a portion of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that is substantially identical to a referenced nucleic acid or the complement thereof; or (iv) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, complement thereof, or a sequence substantially identical thereto.
  • wild-type sequence refers to a coding, a non-coding or an interface sequence which is an allelic form of sequence that performs the natural or normal function for that sequence. Wild-type sequences include multiple allelic forms of a cognate sequence, for example, multiple alleles of a wild type sequence may encode silent or conservative changes to the protein sequence that a coding sequence encodes.
  • the present invention employs miRNAs for the identification, classification and diagnosis of specific cancers and the identification of their tissues of origin.
  • a gene coding for microRNA may be transcribed leading to production of a miRNA primary transcript known as the pri-miRNA.
  • the pri-miRNA may comprise a hairpin with a stem and loop structure.
  • the stem of the hairpin may comprise mismatched bases.
  • the pri-miRNA may comprise several hairpins in a polycistronic structure.
  • the hairpin structure of the pri-miRNA may be recognized by Drosha, which is an RNase III endonuclease. Drosha may recognize terminal loops in the pri-miRNA and cleave approximately two helical turns into the stem to produce a 60-70 nt precursor known as the pre-miRNA. Drosha may cleave the pri-miRNA with a staggered cut typical of RNase III endonucleases yielding a pre-miRNA stem loop with a 5′ phosphate and ~2 nucleotide 3′ overhang. Approximately one helical turn of stem (~10 nucleotides) extending beyond the Drosha cleavage site may be essential for efficient processing. The pre-miRNA may then be actively transported from the nucleus to the cytoplasm by Ran-GTP and the export receptor Exportin-5.
  • the pre-miRNA may be recognized by Dicer, which is also an RNase III endonuclease. Dicer may recognize the double-stranded stem of the pre-miRNA. Dicer may also cut off the terminal loop two helical turns away from the base of the stem loop, leaving an additional 5′ phosphate and a ~2 nucleotide 3′ overhang.
  • the resulting siRNA-like duplex which may comprise mismatches, comprises the mature miRNA and a similar-sized fragment known as the miRNA*.
  • the miRNA and miRNA* may be derived from opposing arms of the pri-miRNA and pre-miRNA. MiRNA* sequences may be found in libraries of cloned miRNAs, but typically at lower frequency than the miRNAs.
  • RISC means the RNA-induced silencing complex.
  • when the miRNA strand of the miRNA:miRNA* duplex is loaded into the RISC, the miRNA* may be removed and degraded.
  • the strand of the miRNA:miRNA* duplex that is loaded into the RISC may be the strand whose 5′ end is less tightly paired. In cases where both ends of the miRNA:miRNA* have roughly equivalent 5′ pairing, both miRNA and miRNA* may have gene silencing activity.
  • the RISC may identify target nucleic acids based on high levels of complementarity between the miRNA and the mRNA, especially by nucleotides 2-7 of the miRNA. Only one case has been reported in animals where the interaction between the miRNA and its target was along the entire length of the miRNA. This was shown for miR-196 and Hox B8 and it was further shown that miR-196 mediates the cleavage of the Hox B8 mRNA (Yekta et al. Science 2004; 304:594-596). Otherwise, such interactions are known only in plants (Bartel & Bartel 2003; 132:709-717).
  • the target sites in the mRNA may be in the 5′ UTR, the 3′ UTR or in the coding region.
  • multiple miRNAs may regulate the same mRNA target by recognizing the same or multiple sites.
  • the presence of multiple miRNA binding sites in most genetically identified targets may indicate that the cooperative action of multiple RISCs provides the most efficient translational inhibition.
  • miRNAs may direct the RISC to down-regulate gene expression by either of two mechanisms: mRNA cleavage or translational repression.
  • the miRNA may specify cleavage of the mRNA if the mRNA has a certain degree of complementarity to the miRNA. When a miRNA guides cleavage, the cut may be between the nucleotides pairing to residues 10 and 11 of the miRNA.
  • the miRNA may repress translation if the mRNA does not have the requisite degree of complementarity to the miRNA. Translational repression may be more prevalent in animals since animals may have a lower degree of complementarity between the miRNA and binding site.
  • there may be variability in the 5′ and 3′ ends of any pair of miRNA and miRNA*. This variability may be due to variability in the enzymatic processing of Drosha and Dicer with respect to the site of cleavage. Variability at the 5′ and 3′ ends of miRNA and miRNA* may also be due to mismatches in the stem structures of the pri-miRNA and pre-miRNA. The mismatches of the stem strands may lead to a population of different hairpin structures. Variability in the stem structures may also lead to variability in the products of cleavage by Drosha and Dicer.
  • Nucleic acids are provided herein.
  • the nucleic acids comprise the sequences of SEQ ID NOS: 1-390 or variants thereof.
  • the variant may be a complement of the referenced nucleotide sequence.
  • the variant may also be a nucleotide sequence that is substantially identical to the referenced nucleotide sequence or the complement thereof.
  • the variant may also be a nucleotide sequence which hybridizes under stringent conditions to the referenced nucleotide sequence, complements thereof, or nucleotide sequences substantially identical thereto.
  • the nucleic acid may have a length of from about 10 to about 250 nucleotides.
  • the nucleic acid may have a length of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200 or 250 nucleotides.
  • the nucleic acid may be synthesized or expressed in a cell (in vitro or in vivo) using a synthetic gene described herein.
  • the nucleic acid may be synthesized as a single-strand molecule and hybridized to a substantially complementary nucleic acid to form a duplex.
  • the nucleic acid may be introduced to a cell, tissue or organ in a single- or double-stranded form or capable of being expressed by a synthetic gene using methods well known to those skilled in the art, including as described in U.S. Pat. No. 6,506,559, which is incorporated herein by reference.
  • SEQ ID NOS: 1-34 are in accordance with Sanger database version 10; SEQ ID NOS: 35-390 are in accordance with Sanger database version 11.
  • the nucleic acid may further comprise one or more of the following: a peptide, a protein, a RNA-DNA hybrid, an antibody, an antibody fragment, a Fab fragment, and an aptamer.
  • the nucleic acid may comprise a sequence of a pri-miRNA or a variant thereof.
  • the pri-miRNA sequence may comprise from 45-30,000, 50-25,000, 100-20,000, 1,000-1,500 or 80-100 nucleotides.
  • the sequence of the pri-miRNA may comprise a pre-miRNA, miRNA and miRNA*, as set forth herein, and variants thereof.
  • the sequence of the pri-miRNA may comprise any of the sequences of SEQ ID NOS: 1-390 or variants thereof.
  • the pri-miRNA may comprise a hairpin structure.
  • the hairpin may comprise a first and a second nucleic acid sequence that are substantially complementary.
  • the first and second nucleic acid sequence may be from 37-50 nucleotides.
  • the first and second nucleic acid sequence may be separated by a third sequence of from 8-12 nucleotides.
  • the hairpin structure may have a free energy of less than −25 kcal/mole, as calculated by the Vienna algorithm with default parameters, as described in Hofacker et al. (Monatshefte f. Chemie 1994; 125:167-188), the contents of which are incorporated herein by reference.
  • the hairpin may comprise a terminal loop of 4-20, 8-12 or 10 nucleotides.
  • the pri-miRNA may comprise at least 19% adenosine nucleotides, at least 16% cytosine nucleotides, at least 23% thymine nucleotides and at least 19% guanine nucleotides.
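The base-composition bounds above can be checked mechanically. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, and counting uracil as thymine (so that RNA and DNA spellings of the same sequence give the same answer) is an assumption made here for convenience.

```python
from collections import Counter

# Minimum base fractions quoted in the text for a pri-miRNA.
MIN_FRACTION = {"A": 0.19, "C": 0.16, "T": 0.23, "G": 0.19}


def base_fractions(seq):
    """Return the fraction of A, C, G and T in seq (U is counted as T, an assumption)."""
    seq = seq.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def meets_composition(seq):
    """True if every base reaches the minimum fraction listed in MIN_FRACTION."""
    fractions = base_fractions(seq)
    return all(fractions[base] >= minimum for base, minimum in MIN_FRACTION.items())
```

A sequence with equal proportions of the four bases passes the check, while a homopolymer does not.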
  • the nucleic acid may also comprise a sequence of a pre-miRNA or a variant thereof.
  • the pre-miRNA sequence may comprise from 45-90, 60-80 or 60-70 nucleotides.
  • the sequence of the pre-miRNA may comprise a miRNA and a miRNA* as set forth herein.
  • the sequence of the pre-miRNA may also be that of a pri-miRNA excluding from 0-160 nucleotides from the 5′ and 3′ ends of the pri-miRNA.
  • the sequence of the pre-miRNA may comprise the sequence of SEQ ID NOS: 1-390 or variants thereof.
  • the nucleic acid may also comprise a sequence of a miRNA (including miRNA*) or a variant thereof.
  • the miRNA sequence may comprise from 13-33, 18-24 or 21-23 nucleotides.
  • the miRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleotides.
  • the sequence of the miRNA may be the first 13-33 nucleotides of the pre-miRNA.
  • the sequence of the miRNA may also be the last 13-33 nucleotides of the pre-miRNA.
  • the sequence of the miRNA may comprise the sequence of SEQ ID NOS: 1-69, 146-148, 152-390 or variants thereof.
  • a probe comprising a nucleic acid described herein is also provided. Probes may be used for screening and diagnostic methods, as outlined below. The probe may be attached or immobilized to a solid substrate, such as a biochip.
  • the probe may have a length of from 8 to 500, 10 to 100 or 20 to 60 nucleotides.
  • the probe may also have a length of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280 or 300 nucleotides.
  • the probe may further comprise a linker sequence of from 10-60 nucleotides.
  • the probe may comprise a nucleic acid that is complementary to a sequence selected from the group consisting of SEQ ID NOS: 1-390 or variants thereof.
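As an illustration of a probe that is complementary to a target sequence, the sketch below builds the reverse complement of an RNA target as a DNA oligonucleotide and optionally appends a linker. The function name `probe_for`, the choice of a DNA probe, and the placement of the linker at the 3′ end are assumptions for the example; the toy input is not one of the listed SEQ ID NOS.

```python
# Map each RNA base of the target to the complementary DNA base of the probe.
_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def probe_for(target_rna, linker=""):
    """Return a DNA probe (5'->3') complementary to target_rna (5'->3').

    linker is an optional sequence appended to the 3' end of the complement.
    """
    target = target_rna.upper().replace("T", "U")  # accept DNA spelling as well
    complement = target.translate(_COMPLEMENT)[::-1]  # reverse complement
    return complement + linker
```

For the toy target `AUGGCU`, the call `probe_for("AUGGCU")` yields `AGCCAT`.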
  • a biochip is also provided.
  • the biochip may comprise a solid substrate comprising an attached probe or plurality of probes described herein.
  • the probes may be capable of hybridizing to a target sequence under stringent hybridization conditions.
  • the probes may be attached at spatially defined addresses on the substrate. More than one probe per target sequence may be used, with either overlapping probes or probes to different sections of a particular target sequence.
  • the probes may be capable of hybridizing to target sequences associated with a single disorder appreciated by those in the art.
  • the probes may either be synthesized first, with subsequent attachment to the biochip, or may be directly synthesized on the biochip.
  • the solid substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the probes and is amenable to at least one detection method.
  • substrates include glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics.
  • the substrates may allow optical detection without appreciably fluorescing.
  • the substrate may be planar, although other configurations of substrates may be used as well. For example, probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume.
  • the substrate may be flexible, such as flexible foam, including closed cell foams made of particular plastics.
  • the biochip and the probe may be derivatized with chemical functional groups for subsequent attachment of the two.
  • the biochip may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups.
  • the probes may be attached using functional groups on the probes either directly or indirectly using a linker.
  • the probes may be attached to the solid support by either the 5′ terminus, 3′ terminus, or via an internal nucleotide.
  • the probe may also be attached to the solid support non-covalently.
  • biotinylated oligonucleotides can be made, which may bind to surfaces covalently coated with streptavidin, resulting in attachment.
  • probes may be synthesized on the surface using techniques such as photopolymerization and photolithography.
  • diagnosis refers to classifying pathology, or a symptom, determining a severity of the pathology (e.g., grade or stage), monitoring pathology progression, forecasting an outcome of pathology and/or prospects of recovery.
  • the phrase “subject in need thereof” refers to an animal or human subject who is known to have cancer, at risk of having cancer (e.g., a genetically predisposed subject, a subject with medical and/or family history of cancer, a subject who has been exposed to carcinogens, occupational hazard, environmental hazard) and/or a subject who exhibits suspicious clinical signs of cancer (e.g., blood in the stool or melena, unexplained pain, sweating, unexplained fever, unexplained loss of weight up to anorexia, changes in bowel habits (constipation and/or diarrhea), tenesmus (sense of incomplete defecation, for rectal cancer specifically), anemia and/or general weakness).
  • the subject in need thereof can be a healthy human subject undergoing a routine well-being check up.
  • Analyzing presence of malignant or pre-malignant cells can be effected in vivo or ex vivo, whereby a biological sample (e.g., biopsy, blood) is retrieved.
  • biopsy samples comprise cells and may be an incisional or excisional biopsy. Alternatively, the cells may be retrieved from a complete resection.
  • treatment regimen refers to a treatment plan that specifies the type of treatment, dosage, follow-up plans, schedule and/or duration of a treatment provided to a subject in need thereof (e.g., a subject diagnosed with a pathology).
  • the selected treatment regimen can be an aggressive one which is expected to result in the best clinical outcome (e.g., complete cure of the pathology) or a more moderate one which may relieve symptoms of the pathology yet results in incomplete cure of the pathology. It will be appreciated that in certain cases the treatment regimen may be associated with some discomfort to the subject or adverse side effects (e.g., damage to healthy cells or tissue).
  • the type of treatment can include a surgical intervention (e.g., removal of lesion, diseased cells, tissue, or organ), a cell replacement therapy, an administration of a therapeutic drug (e.g., receptor agonists, antagonists, hormones, chemotherapy agents) in a local or a systemic mode, an exposure to radiation therapy using an external source (e.g., external beam) and/or an internal source (e.g., brachytherapy) and/or any combination thereof.
  • the dosage, schedule and duration of treatment can vary, depending on the severity of pathology and the selected type of treatment, and those
  • a method of diagnosis comprises detecting an expression level of a specific cancer-associated nucleic acid in a biological sample.
  • the sample may be derived from a patient. Diagnosis of a specific cancer state in a patient may allow for prognosis and selection of therapeutic strategy. Further, the developmental stage of cells may be classified by determining temporally expressed specific cancer-associated nucleic acids.
  • In situ hybridization of labeled probes to tissue arrays may be performed.
  • the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It is further understood that the nucleic acid sequences which indicate the diagnosis may differ from those which indicate the prognosis and molecular profiling of the condition of the cells or exosomes may lead to distinctions between responsive or refractory conditions or may be predictive of outcomes.
  • kits may comprise a nucleic acid described herein together with any or all of the following: assay reagents, buffers, probes and/or primers, and sterile saline or another pharmaceutically acceptable emulsion and suspension base.
  • the kits may include instructional materials containing directions (e.g., protocols) for the practice of the methods described herein.
  • the kit may further comprise a software package for data analysis of expression profiles.
  • the kit may be a kit for the amplification, detection, identification or quantification of a target nucleic acid sequence.
  • the kit may comprise a poly (T) primer, a forward primer, a reverse primer, and a probe.
  • kits may comprise, in suitable container means, an enzyme for labeling the miRNA by incorporating labeled nucleotide or unlabeled nucleotides that are subsequently labeled. It may also include one or more buffers, such as reaction buffer, labeling buffer, washing buffer, or a hybridization buffer, compounds for preparing the miRNA probes, components for in situ hybridization and components for isolating miRNA.
  • Other kits of the invention may include components for making a nucleic acid array comprising miRNA, and thus may include, for example, a solid support.
  • Tumor samples were obtained from several sources. Institutional review approvals were obtained for all samples in accordance with each institute's institutional review board or IRB equivalent guidelines. Samples included primary tumors and metastases of defined origins, according to clinical records. Tumor content was at least 50% for >95% of samples, as determined by a pathologist based on hematoxylin-eosin (H&E) stained slides.
  • Custom microarrays (Agilent Technologies, Santa Clara, Calif.) were produced by printing DNA oligonucleotide probes to: 982 miR sequences, 17 negative controls, 23 spikes, and 10 positive controls (total of 1032 probes). Each probe, printed in triplicate, carried a linker of up to 28 nucleotides (nt) at the 3′ end of the microRNA's complement sequence. The 17 negative control probes were designed using sequences which do not match the genome.
  • Two groups of positive control probes were designed to hybridize to the miR array: (i) synthetic small RNAs were spiked into the RNA before labeling to verify the labeling efficiency; and (ii) probes for abundant small RNAs, e.g., small nuclear RNAs (U43, U24, Z30, U6, U48, U44) and 5.8S and 5S ribosomal RNA, were spotted on the array to verify RNA quality.
  • RNA-linker p-rCrU-Cy/dye (Eurogentec or equivalent)
  • the labeling reaction contained total RNA, spikes (0.1-100 fmoles), 400 ng RNA-linker-dye, 15% DMSO, 1× ligase buffer and 20 units of T4 RNA ligase (NEB), and proceeded at 4° C. for 1 h, followed by 1 h at 37° C., followed by 4° C. up to 40 min.
  • the labeled RNA was mixed with 30 μl hybridization mixture (mixture of 45 μL of the 10× GE Agilent Blocking Agent and 246 μL of 2× Hi-RPM Hybridization).
  • the labeling mixture was incubated at 100° C. for 5 minutes, followed by incubation in an ice-water bath for 5 minutes. Slides were hybridized at 54° C. for 16-20 hours, followed by two washes.
  • the first wash was conducted at room temperature with Agilent GE Wash Buffer 1 for 5 min, followed by a second wash with Agilent GE Wash Buffer 2 at 37° C. for 5 min.
  • Arrays were scanned using an Agilent Microarray Scanner Bundle G2565BA (resolution of 5 μm at XDR Hi 100%, XDR Lo 5%). Array images were analyzed using Feature Extraction 10.7 software (Agilent).
  • Triplicate spots were combined to produce one signal for each probe by taking the logarithmic mean of reliable spots. All data were log2-transformed and the analysis was performed in log2-space.
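One way to read "logarithmic mean of reliable spots" is the arithmetic mean of the log2 signals of the spots flagged as reliable. The sketch below follows that reading. The function name is hypothetical, and because the text does not say how reliability is decided, the reliability flags are simply taken as an input.

```python
import math


def combine_spots(signals, reliable):
    """Combine replicate spot signals for one probe into a single log2 value.

    signals:  raw intensities of the replicate spots.
    reliable: parallel list of booleans marking which spots to use
              (the reliability criterion itself is not specified in the text).
    Returns the mean of log2(signal) over the reliable spots, or None if
    no usable spot remains.
    """
    logs = [math.log2(s) for s, ok in zip(signals, reliable) if ok and s > 0]
    if not logs:
        return None
    return sum(logs) / len(logs)
```

For example, spots with signals 4 and 16 flagged as reliable and a third spot flagged as unreliable combine to a log2 signal of 3.0.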
  • a reference data vector for normalization R was calculated by taking the median expression level for each probe across all samples.
  • a 2nd degree polynomial F was found so as to provide the best fit between the sample data and the reference data, such that R≈F(S).
  • Remote data points (“outliers”) were not used for fitting the polynomial F.
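The normalization steps above can be sketched as follows, assuming an ordinary least-squares fit for the polynomial. The function names and the outlier rule (dropping probes whose sample value differs from the reference by more than a hypothetical `max_dev` in log2 units) are illustrative assumptions; the text does not state how outliers were identified.

```python
import statistics


def reference_vector(samples):
    """Median log2 expression of each probe across all samples (the vector R)."""
    return [statistics.median(values) for values in zip(*samples)]


def fit_quadratic(x, y):
    """Least-squares coefficients (a, b, c) of y ~ a*x^2 + b*x + c.

    Solves the 3x3 normal equations by Gauss-Jordan elimination; needs at
    least three distinct x values.
    """
    basis = [[xi * xi, xi, 1.0] for xi in x]
    m = [[sum(r[i] * r[j] for r in basis) for j in range(3)]
         + [sum(r[i] * yi for r, yi in zip(basis, y))] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def normalize(sample, reference, max_dev=3.0):
    """Map one sample S onto the reference scale with a fitted quadratic F.

    Probes with |S - R| > max_dev (a hypothetical cutoff) are excluded from
    the fit, but F is still applied to every probe.
    """
    keep = [(s, r) for s, r in zip(sample, reference) if abs(s - r) <= max_dev]
    a, b, c = fit_quadratic([s for s, _ in keep], [r for _, r in keep])
    return [a * s * s + b * s + c for s in sample]
```

If a sample is simply shifted by a constant relative to the reference, the fitted F reduces to a straight line that removes the shift.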
  • the aim of a logistic regression model is to use several features, such as expression levels of several microRNAs, to assign a probability of belonging to one of two possible groups, such as two branches of a node in a binary decision-tree.
  • Logistic regression models the natural log of the odds ratio, i.e., the ratio of the probability of belonging to the first group, for example, the left branch in a node of a binary decision-tree (P) over the probability of belonging to the second group, for example, the right branch in such a node (1-P), as a linear combination of the different expression levels (in log-space).
  • the logistic regression assumes that:
    $$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{i} \beta_i M_i$$
  • β0 is the bias
  • Mi is the expression level (normalized, in log2-space) of the i-th microRNA used in the decision node
  • βi is its corresponding coefficient.
  • βi>0 indicates that the probability to take the left branch (P) increases when the expression level of this microRNA (Mi) increases, and the opposite for βi<0. If a node uses only a single microRNA (M), then solving for P results in:
    $$P = \frac{e^{\beta_0 + \beta_1 M}}{1 + e^{\beta_0 + \beta_1 M}}$$
  • the regression error on each sample is the difference between the assigned probability P and the true “probability” of this sample, i.e., 1 if this sample is in the left branch group and 0 otherwise.
  • the training and optimization of the logistic regression model calculates the parameters β and the p-values [for each microRNA by the Wald statistic and for the overall model by the χ2 (chi-square) difference], maximizing the likelihood of the data given the model and minimizing the total regression error.
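The quantities described above can be written out directly. The sketch below evaluates the left-branch probability from the standard logistic form ln(P/(1−P)) = β0 + Σ βi·Mi, the per-sample regression error, and the log-likelihood that training would maximize. The function names are hypothetical, and the optimizer itself is not shown.

```python
import math


def left_branch_probability(beta0, betas, expression):
    """P obtained from ln(P / (1 - P)) = beta0 + sum_i beta_i * M_i."""
    log_odds = beta0 + sum(b * m for b, m in zip(betas, expression))
    return 1.0 / (1.0 + math.exp(-log_odds))


def regression_error(p, in_left_branch):
    """Assigned probability minus the true label (1 = left branch, 0 = right)."""
    return p - (1.0 if in_left_branch else 0.0)


def log_likelihood(beta0, betas, samples):
    """Log-likelihood of labelled samples [(expression, in_left_branch), ...]."""
    total = 0.0
    for expression, in_left in samples:
        p = left_branch_probability(beta0, betas, expression)
        total += math.log(p if in_left else 1.0 - p)
    return total
```

With a positive coefficient, raising the expression level raises P, which matches the sign convention described for the coefficients.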
  • the probability output of the logistic model is here converted to a binary decision by comparing P to a threshold, denoted by P_TH, i.e., if P>P_TH then the sample belongs to the left branch (“first group”) and vice versa.
  • a modification which adjusts the probability threshold (P_TH) was used.
  • β0 was modified such that the threshold will be shifted back to 0.5.
  • β0, β1, β2, . . . were adjusted so that the slope of the log of the odds ratio function is limited.
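Shifting the threshold back to 0.5 by modifying the bias follows from the monotonicity of the log-odds: deciding on P > P_TH is equivalent to deciding on log-odds > ln(P_TH/(1−P_TH)), so subtracting that constant from the bias gives the same decisions at a threshold of 0.5. The sketch below shows this for a single-microRNA node. The function names are hypothetical, and the slope-limiting adjustment is not shown because the text does not specify it.

```python
import math


def probability(beta0, beta1, m):
    """Left-branch probability of a single-microRNA logistic node."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * m)))


def shift_bias(beta0, p_th):
    """Bias that reproduces the decision P > p_th with a threshold of 0.5."""
    return beta0 - math.log(p_th / (1.0 - p_th))
```

For any expression level, `probability(beta0, beta1, m) > p_th` and `probability(shift_bias(beta0, p_th), beta1, m) > 0.5` give the same branch.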
  • the original data contain the expression levels of multiple microRNAs for each sample, i.e., multiple data features.
  • When training the classifier for each node, only a small subset of these features was selected and used for optimizing a logistic regression model. In the initial training this was done using a forward stepwise scheme. The features were sorted in order of decreasing log-likelihoods, and the logistic model was started off and optimized with the first feature. The second feature was then added, and the model re-optimized. The regression error of the two models was compared: if the addition of the feature did not provide a significant advantage (a χ2 difference less than 7.88, p-value of 0.005), the new feature was discarded. Otherwise, the added feature was kept.
  • Adding a new feature may make a previous feature redundant (e.g., if they are very highly correlated). To check for this, the process iteratively checks if the feature with lowest likelihood can be discarded (without losing χ2 difference as above). After ensuring that the current set of features is compact in this sense, the process continues to test the next feature in the sorted list, until features are exhausted. No limitation on the number of features was inserted into the algorithm.
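The forward stepwise scheme with its redundancy check can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used: `log_likelihood` is a hypothetical callable that returns the maximized log-likelihood of a logistic model fitted on a given tuple of features, the χ2 difference is taken as twice the log-likelihood gain, and "feature with lowest likelihood" is read as the feature with the weakest single-feature log-likelihood. The helper `chi2_pvalue_1dof` only confirms that 7.88 corresponds to p ≈ 0.005 for one degree of freedom.

```python
import math

CHI2_THRESHOLD = 7.88  # chi-square difference quoted in the text


def chi2_pvalue_1dof(x):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def stepwise_select(features, log_likelihood):
    """Forward stepwise selection with a backward redundancy check.

    features:       candidate names, pre-sorted by decreasing log-likelihood.
    log_likelihood: hypothetical callable, tuple of features -> fitted log-likelihood.
    """
    selected = []
    for feat in features:
        if not selected:
            selected.append(feat)  # the model starts with the first feature
            continue
        gain = 2.0 * (log_likelihood(tuple(selected + [feat]))
                      - log_likelihood(tuple(selected)))
        if gain < CHI2_THRESHOLD:
            continue  # no significant advantage: discard the new feature
        selected.append(feat)
        # Backward check: drop earlier features that have become redundant.
        pruned = True
        while pruned and len(selected) > 1:
            pruned = False
            full = log_likelihood(tuple(selected))
            for old in sorted(selected, key=lambda f: log_likelihood((f,))):
                rest = tuple(f for f in selected if f != old)
                if 2.0 * (full - log_likelihood(rest)) < CHI2_THRESHOLD:
                    selected.remove(old)
                    pruned = True
                    break
    return selected
```

In practice `log_likelihood` would wrap a logistic regression fit; any fitting routine that reports the maximized log-likelihood could be plugged in.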
  • the stepwise logistic regression method was used on subsets of the training set samples by re-sampling the training set with repetition (“bootstrap”), so that each of the 20 runs contained somewhat different training set. All the features that took part in one of the 20 models were collected. A robust set of 1-3 features per each node was selected by comparing features that were repeatedly chosen in the bootstrap sets to previous evidence, and considering their signal strengths and reliability. When using these selected features to construct the classifier, the stepwise process was not used and the training optimized the logistic regression model parameters only.
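The bootstrap step can be illustrated with a short sketch that resamples the training set with repetition and counts how often each feature is selected. The function name and the `select_features` callable (standing in for the stepwise selection) are hypothetical. The final choice of 1-3 features per node also drew on previous evidence, signal strength and reliability, which is a manual judgment that the code does not attempt to reproduce.

```python
import random
from collections import Counter


def bootstrap_feature_counts(samples, select_features, runs=20, seed=0):
    """Count how often each feature is selected across bootstrap resamples.

    samples:         the training set (any list of per-sample records).
    select_features: hypothetical callable, resampled training set -> feature names.
    runs:            number of bootstrap runs (20 in the text).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        resample = rng.choices(samples, k=len(samples))  # sampling with repetition
        counts.update(set(select_features(resample)))
    return counts
```

Features with counts close to the number of runs are the ones that were repeatedly chosen across the bootstrap sets.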
  • the decision-tree and KNN each return a predicted tissue of origin and histological type where applicable.
  • the tissue of origin and histological type may be one of the exact origins and types in the training or a variant thereof. For example, where the training includes brain oligodendroglioma and brain astrocytoma, the answer may simply be brain carcinoma.
  • the KNN and decision-tree each return a confidence measure.
  • the KNN returns the number of samples within the K nearest neighbors that agreed with the answer reported by the KNN (denoted by V), and the decision-tree returns the probability of the result (P), which is the multiplication of the probabilities at each branch point made on the way to that answer.
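The two confidence measures can be sketched as follows. The value of K and the distance metric are not stated in this passage, so K=5 and Euclidean distance in log2-expression space are assumptions made only for the example, as are the function names.

```python
import math
from collections import Counter


def knn_predict(query, training, k=5):
    """Return (predicted class, V) for one sample.

    training: list of (expression_vector, class_label) pairs.
    V is the number of the k nearest neighbors that agree with the prediction.
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, v = votes.most_common(1)[0]
    return label, v


def path_probability(branch_probabilities):
    """P of a decision-tree answer: the product of the probabilities at each branch point."""
    return math.prod(branch_probabilities)
```

A tree answer reached through branch probabilities 0.9 and 0.8, for instance, has P of about 0.72.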
  • the classifier returns the two different predictions, or a single prediction if the predictions concur, if they can be unified into a single answer (for example into the prediction brain if the KNN returned brain oligodendroglioma and the decision-tree brain astrocytoma), or if, based on V and P, one answer is chosen to override the other.
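The combination logic can be sketched as below. The override rule is not specified in the text, so the cutoffs `v_min` and `p_min` are purely hypothetical placeholders, as are the function name and the `parent` mapping from specific types to a broader class.

```python
def combine_predictions(knn_label, v, tree_label, p, parent, v_min=4, p_min=0.9):
    """Return a list holding one or two predicted origins.

    parent: dict mapping a specific type to a broader class, used to unify
            answers such as two brain tumor types into "brain".
    v_min, p_min: hypothetical confidence cutoffs for the override rule.
    """
    if knn_label == tree_label:
        return [knn_label]  # the predictions concur
    shared = parent.get(knn_label)
    if shared is not None and shared == parent.get(tree_label):
        return [shared]  # unified into a single broader answer
    knn_confident = v >= v_min
    tree_confident = p >= p_min
    if tree_confident and not knn_confident:
        return [tree_label]  # decision-tree overrides the KNN
    if knn_confident and not tree_confident:
        return [knn_label]  # KNN overrides the decision-tree
    return [tree_label, knn_label]  # report both predictions
```

With `parent = {"brain oligodendroglioma": "brain", "brain astrocytoma": "brain"}`, disagreeing brain predictions are reported as the single answer `brain`.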
  • A tumor classifier was built using the microRNA expression levels by applying a binary tree classification scheme (FIGS. 1A-F).
  • This framework is set up to utilize the specificity of microRNAs in tissue differentiation and embryogenesis: different microRNAs are involved in various stages of tissue specification, and are used by the algorithm at different decision points or “nodes”.
  • the tree breaks up the complex multi-tissue classification problem into a set of simpler binary decisions.
  • classes which branch out earlier in the tree are not considered, reducing interference from irrelevant samples and further simplifying the decision.
  • the decision at each node can then be accomplished using only a small number of microRNA biomarkers, which have well-defined roles in the classification (Table 2).
  • the structure of the binary tree was based on a hierarchy of tissue development and morphological similarity 18 , which was modified by prominent features of the microRNA expression patterns.
  • the expression patterns of microRNAs indicated a significant difference between germ cell tumors and tumors of non-germ cell origin, and these are therefore distinguished at node #1 ( FIG. 2 ) into separate branches ( FIG. 1A ).
  • Class | miRs
    Germ cell cancer | hsa-miR-372 (SEQ ID NO: 55)
    Biliary tract adenocarcinoma | hsa-miR-372, hsa-miR-122 (SEQ ID NO: 6), hsa-miR-126 (SEQ ID NO: 9), hsa-miR-200b (SEQ ID NO: 29)
    Hepatocellular carcinoma (HCC) | hsa-miR-372, hsa-miR-122, hsa-miR-126, hsa-miR-200b
    Brain tumor | hsa-miR-372, hsa-miR-122, hsa-miR-200c (SEQ ID NO: 30), hsa-miR-30a (SEQ ID NO: 46), hsa-miR-146a (SEQ ID NO: 16), h
  • hsa-miR-372 (SEQ ID NO: 55) is used at node 1 of the binary-tree-classifier detailed in the invention to distinguish between germ-cell tumors and all other tumors.
  • FIGS. 2A-D are boxplot presentations comparing distribution of the expression of the statistically significant miRs in tumor samples from the “germ cell” class (left box) and “non germ cell” class (right box).
  • expression (fluorescence units) | fold-change | p-value | SEQ ID NO. | miR name
    1.0e+005-5.0e+001 | 2024.31 (+) | 1.1e-123 | 6 | hsa-miR-122
    7.4e+001-8.1e+003 | 109.63 (−) | 3.6e-010 | 30 | hsa-miR-200c
    5.0e+001-1.4e+003 | 27.92 (−) | 4.8e-010 | 13 | hsa-miR-141
    (+) the higher expression of this miR is in tumors from a hepatobiliary origin
    (−) the higher expression of this miR is in tumors from a non germ-cell, non-hepatobiliary origin
  • hsa-miR-122 (SEQ ID NO: 6) is used at node 2 of the binary-tree-classifier detailed in the invention to distinguish between hepatobiliary tumors and non germ-cell non-hepatobiliary tumors.
  • hsa-miR-126 (SEQ ID NO: 9) and hsa-miR-200b (SEQ ID NO: 29) are used at node 3 of the binary-tree-classifier detailed in the invention to distinguish between liver tumors and biliary-tract carcinoma.
  • FIG. 3 demonstrates that tumors of hepatocellular carcinoma (HCC) origin (marked by squares) are easily distinguished from tumors of biliary tract adenocarcinoma origin (marked by diamonds) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis) and hsa-miR-126 (SEQ ID NO: 9, x-axis).
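The first three nodes described above can be assembled into a small runnable sketch of the tree walk. Only the node order, the microRNAs used at each node and the class names come from the text; every coefficient, including its sign, is an arbitrary illustrative value, and the `Node` class and `classify` function are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    beta0: float
    betas: Dict[str, float]    # miR name -> coefficient
    left: Union["Node", str]   # taken when P > 0.5
    right: Union["Node", str]  # taken otherwise


def classify(node, expression):
    """Walk the tree; return (class label, product of branch probabilities)."""
    confidence = 1.0
    while isinstance(node, Node):
        log_odds = node.beta0 + sum(b * expression[m] for m, b in node.betas.items())
        p_left = 1.0 / (1.0 + math.exp(-log_odds))
        if p_left > 0.5:
            confidence *= p_left
            node = node.left
        else:
            confidence *= 1.0 - p_left
            node = node.right
    return node, confidence


# Arbitrary illustrative coefficients (not taken from the text).
node3 = Node(-1.0, {"hsa-miR-126": 0.5, "hsa-miR-200b": -0.5},
             "hepatocellular carcinoma", "biliary tract adenocarcinoma")
node2 = Node(-12.0, {"hsa-miR-122": 1.0}, node3, "non germ-cell, non-hepatobiliary")
node1 = Node(-10.0, {"hsa-miR-372": 1.0}, "germ cell", node2)
```

Calling `classify(node1, expression)` with a dictionary of log2 expression levels returns the leaf reached and the path probability. Classes that branched off earlier are never evaluated, so only the microRNAs on the path taken need to be present.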
  • a combination of the expression level of any of the miRs detailed in table 6 with the expression level of any of hsa-miR-30a (SEQ ID NO: 46), hsa-miR-10b (SEQ ID NO: 5) and hsa-miR-140-3p (SEQ ID NO: 12) also provides for distinguishing between tumors from epithelial origins and tumors from non-epithelial origins. This is demonstrated at node 4 of the binary-tree-classifier detailed in the invention with hsa-miR-200c (SEQ ID NO: 30) and hsa-miR-30a (SEQ ID NO: 46) ( FIG. 4 ).
  • Tumors of epithelial origin are easily distinguished from tumors of non-epithelial origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis) and hsa-miR-200c (SEQ ID NO: 30, x-axis).
  • hsa-miR-146a (SEQ ID NO: 16), hsa-let-7e (SEQ ID NO: 2) and hsa-miR-30a (SEQ ID NO: 46) are used at node 5 of the binary-tree-classifier detailed in the invention to distinguish between the group consisting of melanoma and lymphoma, and the group consisting of all other non-epithelial tumors.
  • Tumors of lymphoma or melanoma origin are easily distinguished from tumors of non-epithelial, non-lymphoma/melanoma origin (squares) using the expression levels of hsa-miR-146a (SEQ ID NO: 16, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-let-7e (SEQ ID NO: 2, z-axis).
  • hsa-miR-9* (SEQ ID NO: 66) and hsa-miR-92b (SEQ ID NO: 68) are used at node 6 of the binary-tree-classifier detailed in the invention to distinguish between brain tumors and the group consisting of all non-brain, non-epithelial tumors.
  • FIG. 6 demonstrates that tumors originating in the brain (marked by diamonds) are easily distinguished from tumors of non epithelial, non brain origin (marked by squares) using the expression levels of hsa-miR-9* (SEQ ID NO: 66, y-axis) and hsa-miR-92b (SEQ ID NO: 68, x-axis).
  • a combination of the expression level of any of the miRs detailed in table 9 with the expression level of hsa-miR-497 (SEQ ID NO: 208) or hsa-let-7d (SEQ ID NO: 153) also provides for classification of brain tumors as astrocytic tumors or oligodendrogliomas. This is demonstrated at node 7 of the binary-tree-classifier detailed in the invention with hsa-miR-222 (SEQ ID NO: 40) and hsa-miR-497 (SEQ ID NO: 208).
  • FIG. 7 demonstrates that tumors originating in astrocytoma (marked by diamonds) are easily distinguished from tumors of oligodendroglioma origins (marked by squares) using the expression levels of hsa-miR-497 (SEQ ID NO: 208, y-axis) and hsa-miR-222 (SEQ ID NO: 40, x-axis).
  • hsa-miR-375 (SEQ ID NO: 56), hsa-miR-7 (SEQ ID NO: 65) and hsa-miR-193a-3p (SEQ ID NO: 25) are used at node 8 of the binary-tree-classifier detailed in the invention to distinguish between the group consisting of neuroendocrine tumors and the group consisting of all non-neuroendocrine, epithelial tumors.
  • Tumors of neuroendocrine origin are easily distinguished from tumors of epithelial origin (squares) using the expression levels of hsa-miR-193a-3p (SEQ ID NO: 25, y-axis), hsa-miR-7 (SEQ ID NO: 65, x-axis) and hsa-miR-375 (SEQ ID NO: 56, z-axis).
  • hsa-miR-194 (SEQ ID NO: 27) and hsa-miR-21* (SEQ ID NO: 35) are used at node 9 of the binary-tree-classifier detailed in the invention to distinguish between GI epithelial tumors and non-GI epithelial tumors.
  • FIG. 9 demonstrates that tumors of gastro-intestinal (GI) origin (marked by diamonds) are easily distinguished from tumors of non-GI origins (marked by squares) using the expression levels of hsa-miR-21* (SEQ ID NO: 35, y-axis) and hsa-miR-194 (SEQ ID NO: 27, x-axis).
  • hsa-miR-143 (SEQ ID NO: 14) and hsa-miR-181a (SEQ ID NO: 21) are used at node 10 of the binary-tree-classifier detailed in the invention to distinguish between prostate tumors and all other non-GI epithelial tumors.
  • FIG. 10 demonstrates that tumors originating in prostate adenocarcinoma (marked by diamonds) are easily distinguished from tumors of non prostate origins (marked by squares) using the expression levels of hsa-miR-181a (SEQ ID NO: 21, y-axis) and hsa-miR-143 (SEQ ID NO: 14, x-axis).
  • hsa-miR-516a-5p (SEQ ID NO: 211) and hsa-miR-200b (SEQ ID NO: 29) are used at node 12 of the binary-tree-classifier detailed in the invention to distinguish between seminoma and non-seminoma testis-tumors.
  • FIG. 11 demonstrates that tumors originating in seminomatous testicular germ cell (marked by diamonds) are easily distinguished from tumors of non seminomatous origins (marked by squares) using the expression levels of hsa-miR-516a-5p (SEQ ID NO: 211, y-axis) and hsa-miR-200b (SEQ ID NO: 29, x-axis).
  • Node 13 of the binary-tree-classifier separates tissues with high expression of miR-205 (SCC marker) such as SCC, TCC and thymomas from adenocarcinomas.
  • a combination of the expression level of any of the miRs detailed in table 14 with the expression level of hsa-miR-331-3p also provides for this classification.
  • hsa-miR-205 (SEQ ID NO: 32), hsa-miR-345 (SEQ ID NO: 51) and hsa-miR-125a-5p (SEQ ID NO: 7) are used at node 13 of the binary-tree-classifier detailed in the invention.
  • hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-375 (SEQ ID NO: 56) and hsa-miR-342-3p (SEQ ID NO: 50) are used at node 14 of the binary-tree-classifier detailed in the invention.
  • hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-375 (SEQ ID NO: 56) and hsa-miR-224 (SEQ ID NO: 42) may be used at node 14 of the binary-tree-classifier detailed in the invention.
  • hsa-miR-205 (SEQ ID NO: 32), hsa-miR-10a (SEQ ID NO: 4) and hsa-miR-22 (SEQ ID NO: 39) are used at node 15 of the binary-tree-classifier detailed in the invention.
  • FIG. 12 demonstrates binary decisions at node #16 of the decision-tree.
  • Tumors originating in thyroid carcinoma are easily distinguished from tumors of adenocarcinoma of the lung, breast and ovarian origin (squares) using the expression levels of hsa-miR-93 (SEQ ID NO: 148, y-axis), hsa-miR-138 (SEQ ID NO: 11, x-axis) and hsa-miR-10a (SEQ ID NO: 4, z-axis).
  • FIG. 13 demonstrates binary decisions at node #17 of the decision-tree.
  • Tumors originating in follicular thyroid carcinoma are easily distinguished from tumors of papillary thyroid carcinoma origins (marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-146b-5p (SEQ ID NO: 17, x-axis).
  • FIG. 14 demonstrates binary decisions at node #18 of the decision-tree.
  • Tumors originating in breast are easily distinguished from tumors of lung and ovarian origin (squares) using the expression levels of hsa-miR-92a (SEQ ID NO: 67, y-axis), hsa-miR-193a-3p (SEQ ID NO: 25, x-axis) and hsa-miR-31 (SEQ ID NO: 49, z-axis).
  • FIG. 15 demonstrates binary decisions at node #19 of the decision-tree.
  • Tumors originating in lung adenocarcinoma are easily distinguished from tumors of ovarian carcinoma origin (squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis), hsa-miR-378 (SEQ ID NO: 202, x-axis) and hsa-miR-138 (SEQ ID NO: 11, z-axis).
  • FIG. 16 demonstrates binary decisions at node #20 of the decision-tree.
  • Tumors originating in thymic carcinoma are easily distinguished from tumors of urothelial carcinoma, transitional cell carcinoma (TCC) and squamous cell carcinoma (SCC) origins (marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-100 (SEQ ID NO: 3, x-axis).
  • hsa-miR-934 (SEQ ID NO: 69), hsa-miR-191 (SEQ ID NO: 24) and hsa-miR-29c (SEQ ID NO: 191) are used at node #21 of the binary-tree-classifier detailed in the invention to distinguish between TCC and SCC.
  • FIG. 17 demonstrates binary decisions at node #22 of the decision-tree.
  • Tumors originating in SCC of the uterine cervix are easily distinguished from tumors of other SCC origin (squares) using the expression levels of hsa-miR-361-5p (SEQ ID NO: 54, y-axis), hsa-let-7c (SEQ ID NO: 1, x-axis) and hsa-miR-10b (SEQ ID NO: 5, z-axis).
  • hsa-miR-10b (SEQ ID NO: 5), hsa-miR-138 (SEQ ID NO: 11) and hsa-miR-185 (SEQ ID NO: 23) are used at node 23 of the binary-tree-classifier detailed in the invention to distinguish between anus or skin SCC and upper SCC tumors.
  • FIG. 18 demonstrates binary decisions at node #24 of the decision-tree.
  • Tumors originating in melanoma are easily distinguished from tumors of lymphoma origin (squares) using the expression levels of hsa-miR-342-3p (SEQ ID NO: 50, y-axis) and hsa-miR-30d (SEQ ID NO: 47, x-axis).
  • hsa-miR-30e (SEQ ID NO: 48) and hsa-miR-21* (SEQ ID NO: 35) are used at node 25 of the binary-tree-classifier detailed in the invention to distinguish between B-cell lymphoma and T-cell lymphoma.
  • hsa-miR-17 (SEQ ID NO: 20) and hsa-miR-29c* (SEQ ID NO: 45) are used at node #26 of the binary-tree-classifier detailed in the invention to distinguish between lung small cell carcinoma and other neuroendocrine tumors.
  • FIG. 19 demonstrates binary decisions at node #27 of the decision-tree.
  • Tumors originating in medullary thyroid carcinoma are easily distinguished from tumors of other neuroendocrine origin (squares) using the expression levels of hsa-miR-92b (SEQ ID NO: 68, y-axis), hsa-miR-222 (SEQ ID NO: 40, x-axis) and hsa-miR-92a (SEQ ID NO: 67, z-axis).
  • hsa-miR-652 (SEQ ID NO: 64), hsa-miR-34c-5p (SEQ ID NO: 53) and hsa-miR-214 (SEQ ID NO: 37) are used at node 28 of the binary-tree-classifier detailed in the invention to distinguish between lung carcinoid tumors and GI neuroendocrine tumors.
  • hsa-miR-21 (SEQ ID NO: 34), and hsa-miR-148a (SEQ ID NO: 18) are used at node 29 of the binary-tree-classifier detailed in the invention to distinguish between pancreatic islet cell tumors and GI neuroendocrine carcinoid tumors.
  • miR expression in fluorescence units distinguishing between gastric or esophageal adenocarcinoma and other adenocarcinoma tumors of the gastrointestinal system selected from the group consisting of cholangiocarcinoma or adenocarcinoma of extrahepatic biliary tract, pancreatic adenocarcinoma and colorectal adenocarcinoma (table columns: miR name; SEQ ID NO.; fold-)
  • FIG. 20 demonstrates binary decisions at node #30 of the decision-tree.
  • Tumors originating in gastric or esophageal adenocarcinoma are easily distinguished from tumors of other GI adenocarcinoma origin (squares) using the expression levels of hsa-miR-1201 (SEQ ID NO: 146, y-axis), hsa-miR-224 (SEQ ID NO: 42, x-axis) and hsa-miR-210 (SEQ ID NO: 36, z-axis).
  • FIG. 21 demonstrates binary decisions at node #31 of the decision-tree.
  • Tumors originating in colorectal adenocarcinoma are easily distinguished from tumors of cholangiocarcinoma or adenocarcinoma of biliary tract or pancreas origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis), hsa-miR-17 (SEQ ID NO: 20, x-axis) and hsa-miR-29a (SEQ ID NO: 43, z-axis).
  • hsa-miR-345 (SEQ ID NO: 51), hsa-miR-31 (SEQ ID NO: 49) and hsa-miR-146a (SEQ ID NO: 16) are used at node #32 of the binary-tree-classifier detailed in the invention to distinguish between cholangiocarcinoma or adenocarcinoma of extrahepatic biliary tract and pancreatic adenocarcinoma.
  • FIG. 22 demonstrates binary decisions at node #33 of the decision-tree.
  • Tumors originating in kidney are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-miR-149 (SEQ ID NO: 19, z-axis).
  • FIG. 23 demonstrates binary decisions at node #34 of the decision-tree.
  • Tumors originating in pheochromocytoma are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-375 (SEQ ID NO: 56, y-axis) and hsa-miR-7 (SEQ ID NO: 65, x-axis).
  • hsa-miR-202 (SEQ ID NO: 31), hsa-miR-509-3p (SEQ ID NO: 61) and hsa-miR-214* (SEQ ID NO: 38) are used at node 35 of the binary-tree-classifier detailed in the invention to distinguish between adrenal carcinoma and sarcoma or mesothelioma tumors.
  • hsa-miR-29c* (SEQ ID NO: 45) and hsa-miR-143 (SEQ ID NO: 14) are used at node 36 of the binary-tree-classifier detailed in the invention to distinguish between GIST and sarcoma or mesothelioma tumors.
  • hsa-miR-210 (SEQ ID NO: 36) and hsa-miR-221 (SEQ ID NO: 147) are used at node #37 of the binary-tree-classifier detailed in the invention to distinguish between chromophobe renal cell carcinoma tumors and clear cell or papillary renal cell carcinoma tumors.
  • hsa-miR-31 (SEQ ID NO: 49) and hsa-miR-126 (SEQ ID NO: 9) are used at node 38 of the binary-tree-classifier detailed in the invention to distinguish between renal clear cell and papillary cell carcinoma tumors.
  • hsa-miR-21* (SEQ ID NO: 35), hsa-miR-130a (SEQ ID NO: 10) and hsa-miR-10b (SEQ ID NO: 5) are used at node 39 of the binary-tree-classifier detailed in the invention to distinguish between pleural mesothelioma tumors and sarcoma tumors.
  • hsa-miR-100 (SEQ ID NO: 3), hsa-miR-145 (SEQ ID NO: 15) and hsa-miR-222 (SEQ ID NO: 40) are used at node 40 of the binary-tree-classifier detailed in the invention to distinguish between synovial sarcoma tumors and other sarcoma tumors.
  • hsa-miR-140-3p (SEQ ID NO: 12) and hsa-miR-455-5p (SEQ ID NO: 58) are used at node 41 of the binary-tree-classifier detailed in the invention to distinguish between chondrosarcoma tumors and other non-synovial sarcoma tumors.
  • hsa-miR-210 (SEQ ID NO: 36) and hsa-miR-193a-5p (SEQ ID NO: 26) are used at node 42 of the binary-tree-classifier detailed in the invention to distinguish between liposarcoma tumors and other non-chondrosarcoma and non-synovial sarcoma tumors.
  • hsa-miR-181a (SEQ ID NO: 21) is used at node 43 of the binary-tree-classifier detailed in the invention to distinguish between Ewing sarcoma or osteosarcoma tumors and rhabdomyosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma tumors.
  • FIG. 24 demonstrates binary decisions at node #44 of the decision-tree.
  • Tumors originating in Ewing sarcoma are easily distinguished from tumors of osteosarcoma origin (squares) using the expression levels of hsa-miR-31 (SEQ ID NO: 49, y-axis) and hsa-miR-193a-3p (SEQ ID NO: 25, x-axis).
  • FIG. 25 demonstrates binary decisions at node #45 of the decision-tree.
  • Tumors originating in rhabdomyosarcoma are easily distinguished from tumors of malignant fibrous histiocytoma (MFH) or fibrosarcoma origin (squares) using the expression levels of hsa-miR-206 (SEQ ID NO: 33, y-axis), hsa-miR-22 (SEQ ID NO: 39, x-axis) and hsa-miR-487b (SEQ ID NO: 59, z-axis).
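The node descriptions above pair each binary decision with a small set of microRNAs. As a minimal sketch of how one such node could be represented, the Python below stores the microRNAs named in the text for three nodes and applies a linear threshold to log2 expression values. The decision rule, the weights, the threshold, the sample values and the assignment of classes to the two sides are all hypothetical; this section does not state the rule actually used at each node.

```python
from dataclasses import dataclass
from typing import Dict, List

# microRNAs named in the text for a few nodes (node number -> miR names).
NODE_MIRS: Dict[int, List[str]] = {
    21: ["hsa-miR-934", "hsa-miR-191", "hsa-miR-29c"],  # TCC vs SCC
    25: ["hsa-miR-30e", "hsa-miR-21*"],                 # B-cell vs T-cell lymphoma
    43: ["hsa-miR-181a"],                               # Ewing/osteosarcoma vs others
}


@dataclass
class BinaryNode:
    """One binary decision; weights and threshold are illustrative only."""
    node_id: int
    weights: Dict[str, float]   # hypothetical per-miR weights
    threshold: float            # hypothetical cut-off on the weighted sum
    below: str                  # class returned when score < threshold
    at_or_above: str            # class returned otherwise

    def decide(self, log2_expr: Dict[str, float]) -> str:
        missing = [m for m in self.weights if m not in log2_expr]
        if missing:
            raise KeyError(f"missing measurements: {missing}")
        score = sum(w * log2_expr[m] for m, w in self.weights.items())
        return self.below if score < self.threshold else self.at_or_above


# Hypothetical node 25: which class sits on which side is an arbitrary choice.
node25 = BinaryNode(
    node_id=25,
    weights={"hsa-miR-30e": 1.0, "hsa-miR-21*": -1.0},
    threshold=0.0,
    below="B-cell lymphoma",
    at_or_above="T-cell lymphoma",
)
sample = {"hsa-miR-30e": 8.0, "hsa-miR-21*": 10.0}  # made-up log2 values
call = node25.decide(sample)
```

A linear rule is used here only because it is short to write; any two-class rule over the same microRNAs would fit the same structure.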
  • FNA fine-needle aspiration
  • pleural effusion or bronchial brushing for the identification of cancer tissue of origin
  • Class identified | Biopsy Site | Histological Type | Sampling Method
    lung-small | Lymph Node | Neuroendocrine; Small | percutaneous FNA
    UpperSCC | Lung | Non-small; squamous | percutaneous FNA
    UpperSCC | Lung | Non-small; adenocarcinoma | percutaneous FNA
    lung-small | Lung | Neuroendocrine; Small | percutaneous FNA
    lung-adeno | Lung | Non-small; adenocarcinoma | percutaneous FNA
    UpperSCC | Lung | Non-small; squamous | percutaneous FNA
    lung-small | Lymph Node | Neuroendocrine; Small | transbronchial FNA
    lung-small | Lung | Neuroendocrine; Small | transbronchial FNA
    lung-adeno | Lung pleura | Non-small; adenocarcinoma | Pleural effusion
    lung-adeno | Lung pleura | Non-small; adenocarcinoma | Pleural effusion
    Lung, small L

Abstract

The present invention provides a process for classification of cancers and tissues of origin through the analysis of the expression patterns of specific microRNAs and nucleic acid molecules relating thereto. Classification according to a microRNA tree-based expression framework allows optimization of treatment, and determination of specific therapy.

Description

    FIELD OF THE INVENTION
  • The present invention relates to methods and materials for classification of cancers and the identification of their tissue of origin. Specifically, the invention relates to microRNA molecules associated with specific cancers, as well as various nucleic acid molecules relating thereto or derived therefrom.
  • BACKGROUND OF THE INVENTION
  • microRNAs (miRs, miRNAs) are a novel class of non-coding, regulatory RNA genes1-3 which are involved in oncogenesis4 and show remarkable tissue-specificity5-7. They have emerged as highly tissue-specific biomarkers2,5,6 postulated to play important roles in encoding developmental decisions of differentiation. Various studies have tied microRNAs to the development of specific malignancies4. MicroRNAs are also stable in tissue, stored frozen or as formalin-fixed, paraffin-embedded (FFPE) samples, and in serum.
  • Hundreds of thousands of patients in the U.S. are diagnosed each year with a cancer that has already metastasized, without a clearly identified primary site. Oncologists and pathologists are constantly faced with a diagnostic dilemma when trying to identify the primary origin of a patient's metastasis. As metastases need to be treated according to their primary origin, accurate identification of the metastases' primary origin can be critical for determining appropriate treatment.
  • Once a metastatic tumor is found, the patient may undergo a wide range of costly, time consuming, and at times inefficient tests, including physical examination of the patient, histopathology analysis of the biopsy, imaging methods such as chest X-ray, CT and PET scans, in order to identify the primary origin of the metastasis.
  • Metastatic cancer of unknown primary (CUP) accounts for 3-5% of all new cancer cases, and as a group is usually a very aggressive disease with a poor prognosis10. The concept of CUP comes from the limitation of present methods to identify cancer origin, despite an often complicated and costly process which can significantly delay proper treatment of such patients. Recent studies revealed a high degree of variation in clinical management, in the absence of evidence-based treatment for CUP11. Many protocols were evaluated12 but have shown relatively small benefit13. Determining tumor tissue of origin is thus an important clinical application of molecular diagnostics9.
  • Molecular classification studies for tumor tissue origin14-17 have generally used classification algorithms that did not utilize domain-specific knowledge: tissues were treated as a-priori equivalents, ignoring underlying similarities between tissue types with a common developmental origin in embryogenesis. An exception of note is the study by Shedden and co-workers18, which was based on a pathology classification tree. These studies used machine-learning methods that average effects of biological features (e.g., mRNA expression levels), an approach which is more amenable to automated processing but does not use or generate mechanistic insights.
  • Various markers have been proposed to indicate specific types of cancers and tumor tissue of origin. However, the diagnostic accuracy of tumor markers has not yet been defined. There is thus a need for a more efficient and effective method for diagnosing and classifying specific types of cancers.
  • SUMMARY OF THE INVENTION
  • The present invention provides specific nucleic acid sequences for use in the identification, classification and diagnosis of specific cancers and tumor tissue of origin. The nucleic acid sequences can also be used as prognostic markers for prognostic evaluation and determination of appropriate treatment of a subject based on the abundance of the nucleic acid sequences in a biological sample. The present invention provides a method for accurate identification of tumor tissue origin.
  • The invention is based in part on the development of a microRNA-based classifier for tumor classification. microRNA expression levels were measured in 1300 primary and metastatic tumor paraffin-embedded samples. microRNAs were profiled using a custom array platform. Using the custom array platform, a set of over 300 microRNAs was identified for the normalization of the array data and 65 microRNAs were used for the accurate classification of over 40 different tumor types. The accuracy of the assay exceeds 85%.
  • The findings demonstrate the utility of microRNAs as novel biomarkers for the tissue of origin of a metastatic tumor. The classifier has wide biological as well as diagnostic applications.
  • According to a first aspect, the present invention provides a method of identifying a tissue of origin of a cancer, the method comprising obtaining a biological sample from a subject, measuring the relative abundance in said sample of nucleic acid sequences selected from the group consisting of SEQ ID NOS: 1-390, any combinations thereof, or a sequence having at least about 80% identity thereto; and comparing the measurement to a reference abundance of the nucleic acid by using a classifier algorithm, wherein the relative abundance of said nucleic acid sequences allows for the identification of the tissue of origin of said sample.
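The recurring phrase "at least about 80% identity" implies a sequence comparison step. The sketch below is one simple way to compute it, assuming ungapped, position-by-position comparison of two sequences of equal length; this section does not specify the alignment method, and both sequences are hypothetical placeholders rather than any of SEQ ID NOS: 1-390.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical bases (ungapped, equal length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity(seq_a: str, seq_b: str, minimum: float = 80.0) -> bool:
    """True when identity reaches the stated minimum (80% by default)."""
    return percent_identity(seq_a, seq_b) >= minimum


# Hypothetical 20-nt reference and a variant with 3 substitutions.
reference = "ACGUACGUACGUACGUACGU"
variant = "UCGUAGGUACCUACGUACGU"
identity = percent_identity(reference, variant)  # 17 of 20 positions match
```

Sequences of different lengths would need an alignment step first, which is outside this sketch.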
  • According to one aspect, the classifier algorithm is selected from the group consisting of decision tree classifier, K-nearest neighbor classifier (KNN), logistic regression classifier, nearest neighbor classifier, neural network classifier, Gaussian mixture model (GMM), Support Vector Machine (SVM) classifier, nearest centroid classifier, linear regression classifier and random forest classifier. According to one aspect, the sample is obtained from a subject with cancer of unknown primary (CUP), with a primary cancer or with a metastatic cancer.
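Of the classifier algorithms listed, K-nearest neighbor is among the simplest to illustrate. The following is a minimal sketch in plain Python, assuming each sample is a vector of expression values and using Euclidean distance with a majority vote; the training vectors and the labels "A" and "B" are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Label of the majority class among the k nearest training samples."""
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set (e.g. expression of two microRNAs).
training = [
    ([1.0, 5.0], "A"), ([1.2, 4.8], "A"), ([0.9, 5.1], "A"),
    ([5.0, 1.0], "B"), ([5.2, 0.9], "B"), ([4.9, 1.1], "B"),
]
predicted = knn_predict(training, [1.1, 5.0], k=3)
```

With an odd `k` and two classes the vote cannot tie; for more classes a tie-breaking rule would have to be chosen.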
  • According to certain embodiments, the cancer is selected from the group consisting of adrenocortical carcinoma; anus or skin squamous cell carcinoma; biliary tract adenocarcinoma; Ewing sarcoma; gastrointestinal stromal tumor (GIST); gastrointestinal tract carcinoid; renal cell carcinoma: chromophobe, clear cell and papillary; pancreatic islet cell tumor; pheochromocytoma; urothelial cell carcinoma (TCC); lung, head & neck, or esophagus squamous cell carcinoma (SCC); brain: astrocytic tumor, oligodendroglioma; breast adenocarcinoma; uterine cervix squamous cell carcinoma; chondrosarcoma; germ cell cancer; sarcoma; colorectal adenocarcinoma; liposarcoma; hepatocellular carcinoma (HCC); lung large cell or adenocarcinoma; lung carcinoid; pleural mesothelioma; lung small cell carcinoma; B-cell lymphoma; T-cell lymphoma; melanoma; malignant fibrous histiocytoma (MFH) or fibrosarcoma; osteosarcoma; ovarian primitive germ cell tumor; ovarian carcinoma; pancreatic adenocarcinoma; prostate adenocarcinoma; rhabdomyosarcoma; gastric or esophageal adenocarcinoma; synovial sarcoma; non-seminomatous testicular germ cell tumor; seminomatous testicular germ cell tumor; thymoma/thymic carcinoma; follicular thyroid carcinoma; medullary thyroid carcinoma; and papillary thyroid carcinoma.
  • The invention further provides a method for identifying a cancer of germ cell origin, comprising measuring the relative abundance of SEQ ID NO: 55 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of germ cell origin. According to some embodiments the germ cell is selected from the group consisting of an ovarian primitive cell and a testis cell. According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 29, 62 or a sequence having at least about 80% identity thereto, and the abundance of said nucleic acid sequence is indicative of a testis cell cancer origin selected from the group consisting of seminomatous testicular germ cell and non-seminomatous testicular germ cell.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of biliary tract adenocarcinoma and hepatocellular carcinoma, comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 9, 29 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of biliary tract adenocarcinoma and hepatocellular carcinoma.
  • The invention further provides a method for identifying a cancer of brain origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 156, 66, 68 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of brain origin.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 40, 60 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a brain cancer origin selected from the group consisting of oligodendroglioma and astrocytoma.
  • The invention further provides a method for identifying a cancer of prostate adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of prostate adenocarcinoma origin.
  • The invention further provides a method for identifying a cancer of breast adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 50, 11, 148, 4, 49, 67 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of breast adenocarcinoma origin.
  • The invention further provides a method for identifying a cancer of ovarian carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 4, 39, 50, 11, 148, 49, 67, 57, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of an ovarian carcinoma origin.
  • The invention further provides a method for identifying a cancer of thyroid carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of thyroid carcinoma origin.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 17, 34 or a sequence having at least about 80% identity thereto, and wherein said thyroid carcinoma origin is selected from the group consisting of follicular and papillary.
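The SEQ ID lists in these embodiments are cumulative: the follicular/papillary embodiment adds SEQ ID NOS: 17 and 34 to the thyroid carcinoma list, and the lists for different origins begin with the same identifiers. The sketch below builds the extended panel and extracts the shared leading identifiers of the brain and thyroid lists as quoted above. Reading that shared prefix as a shared upstream portion of the classifier is an inference, not something this section states, and the helper names are hypothetical.

```python
from typing import List

# SEQ ID lists as quoted in the embodiments above.
THYROID = [55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4]
THYROID_SUBTYPE_EXTRA = [17, 34]          # follicular vs papillary
BRAIN = [55, 6, 30, 46, 16, 156, 66, 68]


def extend_panel(base: List[int], extra: List[int]) -> List[int]:
    """Append extra SEQ IDs to a base list, keeping order and skipping repeats."""
    panel = list(base)
    seen = set(base)
    for seq_id in extra:
        if seq_id not in seen:
            panel.append(seq_id)
            seen.add(seq_id)
    return panel


def common_prefix(a: List[int], b: List[int]) -> List[int]:
    """Leading SEQ IDs shared, in the same order, by two lists."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


thyroid_subtype_panel = extend_panel(THYROID, THYROID_SUBTYPE_EXTRA)
shared_start = common_prefix(THYROID, BRAIN)
```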
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of lung large cell and lung adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4, 49, 67, 57, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of lung large cell and lung adenocarcinoma.
  • The invention further provides a method for identifying a cancer of thymic carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3, 34 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of a thymic carcinoma origin.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of a urothelial cell carcinoma and squamous cell carcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3, 34, 69, 24, 44 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of urothelial cell carcinoma and squamous cell carcinoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 1, 5, 54 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of squamous-cell-carcinoma origin selected from the group consisting of uterine cervix squamous-cell-carcinoma and non uterine cervix squamous cell carcinoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 11, 23 or a sequence having at least about 80% identity thereto in said sample, and wherein the abundance of said nucleic acid sequence is indicative of a non-uterine cervix squamous cell carcinoma origin selected from the group consisting of anus or skin squamous cell carcinoma; and lung, head & neck, and esophagus squamous cell carcinoma.
  • The invention further provides a method for identifying a cancer origin selected from melanoma and lymphoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 47, 50 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of melanoma and lymphoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 35, 48 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a lymphoma cancer origin selected from the group consisting of B-cell lymphoma and T-cell lymphoma.
  • The invention further provides a method for identifying a cancer of lung small cell carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of lung small cell carcinoma origin.
  • The invention further provides a method for identifying a cancer of medullary thyroid carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of medullary thyroid carcinoma origin.
  • The invention further provides a method for identifying a cancer of lung carcinoid origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53, 37 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of lung carcinoid origin.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53, 37, 34, 18 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of gastric and esophageal adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of gastric and esophageal adenocarcinoma.
  • The invention further provides a method for identifying a cancer of colorectal adenocarcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20, 43 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of colorectal adenocarcinoma origin.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of pancreatic adenocarcinoma and biliary tract adenocarcinoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20, 43, 51, 49, 16 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of pancreatic adenocarcinoma or biliary tract adenocarcinoma.
  • The invention further provides a method for identifying a cancer of renal cell carcinoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of renal cell carcinoma origin.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 36, 147 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a chromophobe renal cell carcinoma origin.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 49, 9 or a sequence having at least about 80% identity thereto, and wherein the abundance of said nucleic acid sequence is indicative of a renal cell carcinoma origin selected from the group consisting of clear cell and papillary.
  • The invention further provides a method for identifying a cancer of pheochromocytoma origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of pheochromocytoma origin.
  • The invention further provides a method for identifying a cancer of adrenocortical origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of adrenocortical origin.
  • The invention further provides a method for identifying a cancer of gastrointestinal stromal tumor origin, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14, 45 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer of gastrointestinal stromal tumor origin.
  • The invention further provides a method for identifying a cancer origin selected from the group consisting of pleural mesothelioma and sarcoma, the method comprising measuring the relative abundance of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14, 45, 35, 10, 5 or a sequence having at least about 80% identity thereto in said sample; wherein the abundance of said nucleic acid sequence is indicative of a cancer origin selected from the group consisting of pleural mesothelioma and sarcoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is synovial sarcoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is chondrosarcoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26 or a sequence having at least about 80% identity thereto, and wherein said sarcoma is liposarcoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 25, 49 or a sequence having at least about 80% identity thereto and wherein said sarcoma is selected from the group consisting of Ewing sarcoma and osteosarcoma.
  • According to some embodiments the group of nucleic acid further consists of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 59, 39, 33 or a sequence having at least about 80% identity thereto and wherein said sarcoma is selected from the group consisting of rhabdomyosarcoma; and malignant fibrous histiocytoma and fibrosarcoma.
  • According to another aspect, the present invention provides a method of distinguishing between cancers of different origins, said method comprising:
  • (a) obtaining a biological sample from a subject;
  • (b) measuring the relative abundance in said sample of nucleic acid sequences selected from the group consisting of SEQ ID NOS: 1-390 or a sequence having at least about 80% identity thereto; and
  • (c) comparing said measurement to a reference abundance of said nucleic acid by using a classifier algorithm;
  • wherein the relative abundance of said nucleic acid sequence in said sample allows for distinguishing between cancers of different origins.
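Steps (a) to (c) can be sketched as a small pipeline: convert raw signals to relative abundances, then compare them with reference abundances using a classifier. The example below uses a nearest-reference (nearest centroid style) comparison, one of the classifier types listed in this summary. The normalization by total signal, the marker names, the reference profiles and the origin labels are all hypothetical stand-ins.

```python
import math
from typing import Dict


def relative_abundance(raw_signal: Dict[str, float]) -> Dict[str, float]:
    """Step (b), simplified: scale each raw signal by the total signal."""
    total = sum(raw_signal.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {name: value / total for name, value in raw_signal.items()}


def nearest_reference(sample: Dict[str, float],
                      references: Dict[str, Dict[str, float]]) -> str:
    """Step (c), simplified: pick the reference profile closest to the sample."""
    def distance(profile: Dict[str, float]) -> float:
        markers = sorted(profile)
        return math.dist([sample[m] for m in markers],
                         [profile[m] for m in markers])
    return min(references, key=lambda origin: distance(references[origin]))


# Hypothetical measurements for two markers and two reference origins.
raw = {"marker_1": 800.0, "marker_2": 200.0}
reference_profiles = {
    "origin X": {"marker_1": 0.75, "marker_2": 0.25},
    "origin Y": {"marker_1": 0.20, "marker_2": 0.80},
}
sample_profile = relative_abundance(raw)
assigned_origin = nearest_reference(sample_profile, reference_profiles)
```

Any of the other listed classifiers could replace `nearest_reference` without changing the first step.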
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 372, 233, 55, 200, 201 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a germ-cell tumor and a cancer originating from the group consisting of non-germ-cell tumors.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 6, 30, 13 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from hepatobiliary tumors and a cancer originating from the group consisting of non-germ-cell non-hepatobiliary tumors.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 28, 29, 231, 9 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from liver tumors and a cancer originating from biliary-tract carcinomas.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 46, 5, 12, 30, 29, 28, 32, 13, 152, 49 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of tumors from an epithelial origin and a cancer originating from the group consisting of tumors from a non-epithelial origin.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 164, 168, 170, 16, 198, 50, 176, 186, 11, 158, 20, 155, 231, 4, 8, 46, 3, 2, 7 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of melanoma and lymphoma and a cancer originating from the group consisting of all other non-epithelial tumors.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 159, 66, 225, 187, 162, 161, 68, 232, 173, 11, 8, 174, 155, 231, 4, 182, 181, 37 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from brain tumors and a cancer originating from the group consisting of all non-brain, non-epithelial tumors.
  • According to some embodiments the measurement of the relative abundance of SEQ ID NOS: 40, 208, 60, 153, 230, 228, 147, 34, 206, 35, 52, 25, 229, 161, 187, 179 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from astrocytoma and a cancer originating from oligodendroglioma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 56, 65, 25, 175, 152, 155, 32, 49, 35, 181, or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of neuroendocrine tumors and a cancer originating from the group consisting of all non-neuroendocrine, epithelial tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 27, 177, 4, 32, 35 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of gastrointestinal epithelial tumors and a cancer originating from the group consisting of non-gastrointestinal epithelial tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 56, 199, 14, 15, 165, 231, 36, 154, 21, 49 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from prostate tumors and a cancer originating from the group consisting of all other non-gastrointestinal epithelial tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 222, 62, 29, 28, 211, 214, 227, 215, 218, 152, 216, 212, 224, 13, 194, 192, 221, 217, 205, 219, 32, 193, 223, 220, 210, 209, 213, 163, 30 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from seminoma and a cancer originating from the group consisting of non-seminoma testis-tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 42, 32, 36, 178, 243, 242, 49, 240, 57, 11, 46, 17, 47, 51, 7, 8, 154, 190, 157, 196, 197, or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma and thymoma, and a cancer originating from the group consisting of non gastrointestinal adenocarcinoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 56, 46, 25, 152, 50, 45, 191, 181, 179, 49, 32, 42, 184, 40, 147, 236, 57, 203, 36, or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from breast adenocarcinoma, and a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma, thymomas and ovarian carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 253, 32, 4, 39, 10, 46, 5, 226, 2, 195, 32, 185, 11, 168, 184, 16, 242, 12, 237, 243, 250, 49, 246, 167 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from ovarian carcinoma, and a cancer originating from the group consisting of squamous cell carcinoma, transitional cell carcinoma and thymomas.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 11, 147, 17, 157, 40, 8, 49, 9, 191, 205, 207, 195, 51, 46, 45, 52, 234, 231, 21, 169, 43, 3, 196, 154, 390, 171, 255, 197, 190, 189, 39, 7, 48, 47, 32, 36, 4, 178, 37, 181, 25, 183, 182, 35, 240, 57, 242, 204, 236, 176, 158, 148, 206, 50, 20, 34, 186, 239, 251, 244, 24, 188, 172, 238 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from thyroid carcinoma, and a cancer originating from the group consisting of breast adenocarcinoma, lung large cell carcinoma, lung adenocarcinoma and ovarian carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 249, 180, 65, 235, 241, 248, 254, 247, 160, 243, 245, 252, 17, 49, 166, 225, 168, 34 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from follicular thyroid carcinoma and a cancer originating from papillary thyroid carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 32, 56, 50, 45, 25, 253, 152, 9, 46, 191, 178, 49, 40, 10, 147, 4, 36, 228, 236, 230, 189, 240, 67, 202, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from breast adenocarcinoma and a cancer originating from the group consisting of lung adenocarcinoma and ovarian carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 56, 11, 168, 16, 237, 21, 52, 12, 154, 279, 9, 39, 47, 23, 50, 167, 383, 34, 35, 388, 5, 359, 245, 254, 10, 240, 236, 202, 4, 25, 203, 231, 20, 158, 186, 258, 244, 172, 2, 235, 256, 28, 277, 296, 374, 153, 181 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung adenocarcinoma and a cancer originating from ovarian carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 161, 164, 22, 53, 285, 3, 152, 191, 154, 21, 206, 174, 19, 45, 171, 179, 8, 296, 284, 18, 51, 258, 49, 184, 35, 34, 37, 42, 228, 15, 14, 242, 230, 253, 36, 182, 293, 292, 4, 294, 297, 354, 377, 189, 30, 386, 249, 5, 274 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from thymic carcinoma and a cancer originating from the group consisting of transitional cell carcinoma and squamous cell carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 69, 28, 280, 13, 191, 152, 29, 175, 30, 204, 4, 24, 5, 329, 273, 170, 184, 26, 231, 368, 37, 16, 169, 155, 35, 40, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from transitional cell carcinoma and a cancer originating from the group consisting of squamous cell carcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 164, 5, 231, 54, 1, 242, 372, 249, 167, 254, 354, 381, 380, 245, 358, 364, 240, 11, 378 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between squamous cell carcinoma cancers originating from the uterine cervix, and squamous cell carcinoma cancers originating from the group consisting of anus and skin, lung, head & neck and esophagus.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 305, 184, 41, 183, 49, 382, 235, 291, 181, 5, 296, 289, 206, 338, 334, 25, 11, 19, 198, 23 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between squamous cell carcinoma cancers originating from the group consisting of anus and skin, and squamous cell carcinoma cancers originating from the group consisting of lung, head & neck and esophagus.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 4, 11, 46, 8, 274, 169, 36, 47, 363, 231, 303, 349, 10, 7, 3, 16, 164, 170, 168, 198, 50, 245, 365, 45, 382, 259, 296, 364, 314, 12 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from melanoma and a cancer originating from lymphoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 11, 191, 48, 35, 228 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from B-cell lymphoma and a cancer originating from T-cell lymphoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 158, 20, 176, 186, 148, 36, 51, 172, 260, 265, 67, 188, 277, 284, 302, 68, 168, 242, 204, 162, 177, 27, 65, 263, 155, 191, 190, 45, 59, 43, 56, 266, 14, 15, 8, 7, 39, 189, 249, 231, 293, 2 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung small cell carcinoma and a cancer originating from the group consisting of lung carcinoid, medullary thyroid carcinoma, gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 159, 40, 147, 11, 311, 4, 8, 231, 301, 297, 68, 67, 265, 36 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from medullary thyroid carcinoma and a cancer originating from other neuroendocrine tumors selected from the group consisting of lung carcinoid, gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 331, 162, 59, 326, 306, 350, 317, 155, 325, 318, 339, 264, 332, 262, 336, 324, 322, 330, 321, 263, 309, 53, 320, 275, 352, 312, 355, 367, 269, 64, 308, 175, 190, 54, 302, 152, 301, 266, 47, 313, 359, 65, 307, 191, 242, 4, 147, 40, 372, 168, 16, 182, 167, 356, 148, 382, 37, 364, 35 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from lung carcinoid tumors, and a cancer originating from gastrointestinal neuroendocrine tumors selected from the group consisting of gastrointestinal tract carcinoid and pancreatic islet cell tumor.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 263, 288, 18, 286, 162, 225, 287, 206, 205, 296, 258, 313, 377, 373, 256, 153, 259, 265, 303, 268, 267, 165, 15, 272, 14, 202, 236, 203, 4, 168, 310, 298, 27, 29, 34, 228, 3, 349, 35, 26 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pancreatic islet cell tumors and a gastrointestinal neuroendocrine carcinoid cancer originating from the group consisting of small intestine and duodenum, appendix, stomach and pancreas.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 36, 267, 268, 165, 15, 14, 356, 167, 372, 272, 370, 42, 41, 146 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between adenocarcinoma tumors of the gastrointestinal system originating from:
  • the group consisting of gastric and esophageal adenocarcinoma, and
  • the group consisting of cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract, pancreatic adenocarcinoma and colorectal adenocarcinoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 42, 184, 67, 158, 20, 186, 284, 389, 203, 240, 236, 146, 204, 43, 176, 202, 49, 46, 38, 363 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from colorectal adenocarcinoma and a cancer originating from the group consisting of adenocarcinoma of biliary tract or pancreas.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 49, 11, 13, 373, 154, 5, 30, 45, 178, 147, 274, 16, 40, 21, 43, 253, 245, 256, 12, 374, 379, 180, 153, 51, 52, 1, 295, 257, 385, 293, 294 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pancreatic adenocarcinoma, and a cancer originating from the group consisting of cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 29, 28, 30, 46, 49, 195, 152, 175, 47, 4, 387, 196, 177, 375, 27, 304, 40, 191, 147, 35, 16, 34, 5, 155, 181, 312, 183, 182, 320, 59, 38, 324, 323, 37, 322, 325, 19, 42, 334, 265, 22 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from:
  • renal cell tumors selected from the group consisting of chromophobe renal cell carcinoma, clear cell renal cell carcinoma and papillary renal cell carcinoma, and
  • the group consisting of sarcomas, adrenal tumors and pleural mesothelioma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 65, 56, 11, 162, 59, 331, 350, 155, 335, 159, 336, 332, 263, 306, 339, 337, 275, 301, 276, 330, 317, 309, 45, 318, 324, 352, 191, 262, 269, 313, 19, 367, 326, 325, 322, 327, 190, 261, 321, 360, 353, 312, 371, 5, 328, 205, 183, 38, 181, 37, 40, 182, 147, 17, 42, 382, 34, 18, 3 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pheochromocytoma, and a cancer originating from the group consisting of all sarcoma, adrenal carcinoma and mesothelioma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 61, 333, 31, 347, 346, 344, 345, 387, 334, 351, 324, 326, 269, 155, 320, 322, 59, 318, 325, 245, 254, 331, 275, 180, 355, 370, 323, 312, 178, 249, 183, 181, 38, 182, 37, 3, 25 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from adrenal carcinoma and a cancer originating from the group consisting of mesothelioma and sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 165, 14, 15, 333, 272, 270, 45, 301, 191, 46, 195, 266, 190, 19, 334, 155, 25, 147, 40, 34 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a gastrointestinal stromal tumor and a cancer originating from the group consisting of mesothelioma and sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 13, 30, 361, 280, 362, 147, 40, 291, 387, 290, 299, 152, 178, 303, 242, 49, 11, 35, 34, 36, 206, 16, 170, 177, 17 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a chromophobe renal cell carcinoma tumor and a cancer originating from the group consisting of clear cell and papillary renal cell carcinoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 344, 382, 9, 338, 29, 49, 28, 195, 46, 4, 11, 254 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a renal carcinoma cancer originating from a clear cell tumor and a cancer originating from a papillary tumor.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 49, 35, 17, 34, 25, 36, 168, 170, 26, 4, 190, 46, 10, 240, 43, 39, 385, 63, 202, 181, 37, 5, 183, 182, 38, 206, 296, 1 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from pleural mesothelioma and a cancer originating from the group consisting of sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 152, 29, 159, 28, 339, 275, 352, 19, 320, 155, 262, 38, 37, 182, 331, 317, 323, 355, 3, 282, 312, 181, 269, 318, 59, 266, 322, 8, 324, 10, 40, 147, 169, 205, 34, 168, 14, 15, 12, 46, 255, 39, 23, 190, 236, 386, 379, 202 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from a synovial sarcoma and a cancer originating from the group consisting of other sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 12, 271, 206, 333, 11, 58, 36, 18, 178, 293, 189, 382, 381, 240, 249, 5, 377, 235, 17, 20, 385, 384, 46, 283 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from chondrosarcoma and a cancer originating from the group consisting of other non-synovial sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 295, 205, 25, 26, 231, 183, 42, 254, 168, 64, 14, 178, 15, 39, 36, 154, 265, 174, 384, 67 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from liposarcoma and a cancer originating from the group consisting of other non chondrosarcoma and non synovial sarcoma tumors.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 22, 154, 21, 174, 205, 158, 186, 148, 20, 59, 8, 183, 231 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from:
  • the group consisting of Ewing sarcoma and osteosarcoma, and
  • the group consisting of rhabdomyosarcoma, malignant fibrous histiocytoma and fibrosarcoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 155, 179, 43, 208, 278, 17, 385, 174, 5, 52, 257, 366, 48, 49, 12, 25, 169, 34, 35, 23, 384, 189, 377, 265, 294, 293, 292 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from Ewing sarcoma and a cancer originating from osteosarcoma.
  • According to some embodiments, measurement of the relative abundance of SEQ ID NOS: 33, 268, 267, 333, 276, 319, 306, 320, 334, 323, 300, 281, 59, 339, 316, 176, 348, 352, 349, 67, 357, 315, 343, 342, 355, 340, 344, 10, 341, 331, 20, 277, 318, 158, 265, 284, 36, 183, 40, 63, 147, 43, 289, 52, 190, 4, 5, 39, 169, 208 or a sequence having at least about 80% identity thereto in said sample allows for distinguishing between a cancer originating from rhabdomyosarcoma and a cancer originating from the group consisting of malignant fibrous histiocytoma and fibrosarcoma.
  • According to some aspects of the invention the biological sample is selected from the group consisting of bodily fluid, a cell line, a tissue sample, a biopsy sample, a needle biopsy sample, a fine needle aspiration (FNA) sample, a surgically removed sample, and a sample obtained by tissue-sampling procedures such as endoscopy, bronchoscopy, or laparoscopic methods. According to some embodiments, the tissue is a fresh, frozen, fixed, wax-embedded or formalin-fixed paraffin-embedded (FFPE) tissue.
  • According to additional aspects of the invention the nucleic acid sequence relative abundance is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification. According to some embodiments, the nucleic acid hybridization is performed using a solid-phase nucleic acid biochip array or in situ hybridization. According to some embodiments, the nucleic acid amplification method is real-time PCR. According to some embodiments, the real-time PCR comprises forward and reverse primers. According to additional embodiments, the real-time PCR method further comprises a probe. According to additional embodiments, the probe comprises a sequence selected from the group consisting of a sequence that is complementary to a sequence selected from SEQ ID NOS: 1-390; a fragment thereof and a sequence having at least about 80% identity thereto.
  • According to another aspect, the present invention provides a kit for cancer origin identification, the kit comprising a probe comprising a sequence selected from the group consisting of a sequence that is complementary to a sequence selected from SEQ ID NOS: 1-390; a fragment thereof and a sequence having at least about 80% identity thereto.
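  • The two sequence relationships recited above, complementarity of a probe to its target and "at least about 80% identity", can be sketched as follows. The target sequence is a made-up 10-mer, not one of SEQ ID NOS: 1-390, and the identity function is a simple ungapped comparison of equal-length sequences, which is an assumption; identity as used in practice is normally computed over an alignment.

```python
# Base-pairing table used to build a DNA probe against an RNA (or DNA) target.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}

def complementary_probe(target):
    """DNA reverse complement of the target, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))

def percent_identity(a, b):
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("this sketch handles equal-length sequences only")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

target = "AUGGCUACGU"    # hypothetical target, for illustration only
variant = "AUGGCUACGA"   # one mismatch relative to the target

print(complementary_probe(target))
print(percent_identity(target, variant) >= 80.0)
```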
  • These and other embodiments of the present invention will become apparent in conjunction with the figures, description and claims that follow.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIGS. 1A-1F demonstrate the structure of the binary decision-tree classifier, with 45 nodes and 46 leaves. Each node is a binary decision between two sets of samples, those to the left and right of the node. A series of binary decisions, starting at node #1 and moving downwards, leads to one of the possible tumor types, which are the “leaves” of the tree. A sample which is classified to the right branch at node #1 continues to node #2, otherwise it continues to node #11. A sample which is classified to the right branch at node #2 continues to node #4, otherwise it continues to node #3. A sample that reaches node #3 is classified either to the left branch at node #3, and assigned to the “hepatocellular carcinoma” class, or to the right branch at node #3, and assigned to the “biliary tract adenocarcinoma” class.
  • Decisions are made at consecutive nodes using microRNA expression levels, until an end-point (“leaf” of the tree) is reached, indicating the predicted class for this sample. In specifying the tree structure, clinico-pathological considerations were combined with properties observed in the training set data.
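  • The traversal described for FIGS. 1A-1F can be sketched in a few lines. Only the links spelled out above (nodes #1, #2 and #3) are encoded; nodes #4 and #11 are left as unexpanded placeholders, and the `decide` callback stands in for whatever per-node decision algorithm is used, which this sketch does not attempt to reproduce.

```python
# Partial tree taken from the figure description: node -> branch -> (kind, target).
TREE = {
    1: {"left": ("node", 11), "right": ("node", 2)},
    2: {"left": ("node", 3), "right": ("node", 4)},
    3: {"left": ("leaf", "hepatocellular carcinoma"),
        "right": ("leaf", "biliary tract adenocarcinoma")},
}

def walk(decide, start=1):
    """Follow binary decisions from `start` until a leaf or an unexpanded node.

    `decide(node)` must return "left" or "right" for every node it is asked about.
    Returns the final label and the list of (node, branch) steps taken.
    """
    node = start
    path = []
    while node in TREE:
        branch = decide(node)
        path.append((node, branch))
        kind, target = TREE[node][branch]
        if kind == "leaf":
            return target, path
        node = target
    return f"unexpanded node #{node}", path

# Hypothetical per-node outcomes for one sample.
decisions = {1: "right", 2: "left", 3: "left"}
label, path = walk(decisions.get)
print(label, path)
```

  • Recording the path alongside the label keeps every intermediate decision visible, which matches the idea that each node can be read as a differential diagnosis in its own right.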
  • FIGS. 2A-2D demonstrate binary decisions at node #1 of the decision-tree. When training a decision algorithm for a given node, only samples from classes which are possible outcomes of this node are used for training. The “non germ cell” classes (right branch at node #1) are easily distinguished from tumors of the “germ cell” classes (left branch at node #1) using the expression levels of hsa-miR-373 (SEQ ID NO: 233, 2A), hsa-miR-372 (SEQ ID NO: 55, 2B), hsa-miR-371-3p (SEQ ID NO: 200, 2C), and hsa-miR-371-5p (SEQ ID NO: 201, 2D). The boxplots compare the distribution of the expression of the statistically significant miRs in tumor samples from the “germ cell” classes (left box) and “non germ cell” classes (right box). The line in the box indicates the median value. The box contains 50% of the data, and the horizontal lines and crosses (outliers) show the full range of signals in this group.
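  • Two ideas from this paragraph lend themselves to a short sketch: restricting the training set of a node to classes that are possible outcomes of that node, and summarizing each branch by the median and the quartiles that bound the boxplot box. The example uses node #3 from FIGS. 1A-1F; the expression values are hypothetical, and the "inclusive" quartile method is an assumption, since the text does not say how the box edges were computed.

```python
import statistics

def samples_for_node(samples, left_classes, right_classes):
    """Keep only samples whose class is a possible outcome of the node."""
    left = [s["level"] for s in samples if s["cls"] in left_classes]
    right = [s["level"] for s in samples if s["cls"] in right_classes]
    return left, right

def box_summary(values):
    """Median (line in the box) and quartiles (box edges, holding 50% of the data)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}

# Hypothetical expression levels of a single microRNA, one value per sample.
SAMPLES = [
    {"cls": "hepatocellular carcinoma", "level": 5.0},
    {"cls": "hepatocellular carcinoma", "level": 6.0},
    {"cls": "hepatocellular carcinoma", "level": 7.0},
    {"cls": "hepatocellular carcinoma", "level": 8.0},
    {"cls": "hepatocellular carcinoma", "level": 9.0},
    {"cls": "biliary tract adenocarcinoma", "level": 10.0},
    {"cls": "biliary tract adenocarcinoma", "level": 11.0},
    {"cls": "biliary tract adenocarcinoma", "level": 12.0},
    {"cls": "seminoma", "level": 3.0},  # not an outcome of node #3, so excluded
]

left, right = samples_for_node(
    SAMPLES, {"hepatocellular carcinoma"}, {"biliary tract adenocarcinoma"}
)
print(box_summary(left))
print(box_summary(right))
```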
  • FIG. 3 demonstrates binary decisions at node #3 of the decision-tree. Tumors of hepatocellular carcinoma (HCC) origin (left branch at node #3, marked by squares) are easily distinguished from tumors of biliary tract adenocarcinoma origin (right branch at node #3, marked by diamonds) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis) and hsa-miR-126 (SEQ ID NO: 9, x-axis).
  • FIG. 4 demonstrates binary decisions at node #4 of the decision-tree. Tumors of epithelial origin (diamonds) are easily distinguished from tumors of non-epithelial origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis) and hsa-miR-200c (SEQ ID NO: 30, x-axis).
  • FIG. 5 demonstrates binary decisions at node #5 of the decision-tree. Tumors of lymphoma or melanoma origin (diamonds) are easily distinguished from tumors of non epithelial, non lymphoma/melanoma origin (squares) using the expression levels of hsa-miR-146a (SEQ ID NO: 16, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-let-7e (SEQ ID NO: 2, z-axis).
  • FIG. 6 demonstrates binary decisions at node #6 of the decision-tree. Tumors originating in the brain (left branch at node #6, marked by diamonds) are easily distinguished from tumors of non epithelial, non brain origin (right branch at node #6, marked by squares) using the expression levels of hsa-miR-9* (SEQ ID NO: 66, y-axis) and hsa-miR-92b (SEQ ID NO: 68, x-axis).
  • FIG. 7 demonstrates binary decisions at node #7 of the decision-tree. Tumors originating in astrocytoma (right branch at node #7, marked by diamonds) are easily distinguished from tumors of oligodendroglioma origins (left branch at node #7, marked by squares) using the expression levels of hsa-miR-497 (SEQ ID NO: 60, y-axis) and hsa-miR-222 (SEQ ID NO: 40, x-axis).
  • FIG. 8 demonstrates binary decisions at node #8 of the decision-tree. Tumors of neuroendocrine origin (diamonds) are easily distinguished from tumors of epithelial origin (squares) using the expression levels of hsa-miR-193a-3p (SEQ ID NO: 25, y-axis), hsa-miR-7 (SEQ ID NO: 65, x-axis) and hsa-miR-375 (SEQ ID NO: 56, z-axis).
  • FIG. 9 demonstrates binary decisions at node #9 of the decision-tree. Tumors of gastro-intestinal (GI) origin (left branch at node #9, marked by diamonds) are easily distinguished from tumors of non GI origins (right branch at node #9, marked by squares) using the expression levels of hsa-miR-21* (SEQ ID NO: 35, y-axis) and hsa-miR-194 (SEQ ID NO: 27, x-axis).
  • FIG. 10 demonstrates binary decisions at node #10 of the decision-tree. Tumors originating in prostate adenocarcinoma (left branch at node #10, marked by diamonds) are easily distinguished from tumors of non prostate origins (right branch at node #10, marked by squares) using the expression levels of hsa-miR-181a (SEQ ID NO: 21, y-axis) and hsa-miR-143 (SEQ ID NO: 14, x-axis).
  • FIG. 11 demonstrates binary decisions at node #12 of the decision-tree. Tumors originating in seminomatous testicular germ cell (left branch at node #12, marked by diamonds) are easily distinguished from tumors of non seminomatous origins (right branch at node #12, marked by squares) using the expression levels of hsa-miR-516a-5p (SEQ ID NO: 62, y-axis) and hsa-miR-200b (SEQ ID NO: 29, x-axis).
  • FIG. 12 demonstrates binary decisions at node #16 of the decision-tree. Tumors originating in thyroid carcinoma (diamonds) are easily distinguished from tumors of adenocarcinoma of the lung, breast and ovarian origin (squares) using the expression levels of hsa-miR-93 (SEQ ID NO: 148, y-axis), hsa-miR-138 (SEQ ID NO: 11, x-axis) and hsa-miR-10a (SEQ ID NO: 4, z-axis).
  • FIG. 13 demonstrates binary decisions at node #17 of the decision-tree. Tumors originating in follicular thyroid carcinoma (left branch at node #17, marked by diamonds) are easily distinguished from tumors of papillary thyroid carcinoma origins (right branch at node #17, marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-146b-5p (SEQ ID NO: 17, x-axis).
  • FIG. 14 demonstrates binary decisions at node #18 of the decision-tree. Tumors originating in breast (diamonds) are easily distinguished from tumors of lung and ovarian origin (squares) using the expression levels of hsa-miR-92a (SEQ ID NO: 67, y-axis), hsa-miR-193a-3p (SEQ ID NO: 25, x-axis) and hsa-miR-31 (SEQ ID NO: 49, z-axis).
  • FIG. 15 demonstrates binary decisions at node #19 of the decision-tree. Tumors originating in lung adenocarcinoma (diamonds) are easily distinguished from tumors of ovarian carcinoma origin (squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis), hsa-miR-378 (SEQ ID NO: 57, x-axis) and hsa-miR-138 (SEQ ID NO: 11, z-axis).
  • FIG. 16 demonstrates binary decisions at node #20 of the decision-tree. Tumors originating in thymic carcinoma (left branch at node #20, marked by diamonds) are easily distinguished from tumors of urothelial carcinoma, transitional cell carcinoma (TCC) and squamous cell carcinoma (SCC) origins (right branch at node #20, marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-100 (SEQ ID NO: 3, x-axis).
  • FIG. 17 demonstrates binary decisions at node #22 of the decision-tree. Tumors originating in SCC of the uterine cervix (diamonds) are easily distinguished from tumors of other SCC origin (squares) using the expression levels of hsa-miR-361-5p (SEQ ID NO: 54, y-axis), hsa-let-7c (SEQ ID NO: 1, x-axis) and hsa-miR-10b (SEQ ID NO: 5, z-axis).
  • FIG. 18 demonstrates binary decisions at node #24 of the decision-tree. Tumors originating in melanoma (diamonds) are easily distinguished from tumors of lymphoma origin (squares) using the expression levels of hsa-miR-342-3p (SEQ ID NO: 50, y-axis) and hsa-miR-30d (SEQ ID NO: 47, x-axis).
  • FIG. 19 demonstrates binary decisions at node #27 of the decision-tree. Tumors originating in medullary thyroid carcinoma (diamonds) are easily distinguished from tumors of other neuroendocrine origin (squares) using the expression levels of hsa-miR-92b (SEQ ID NO: 68, y-axis), hsa-miR-222 (SEQ ID NO: 40, x-axis) and hsa-miR-92a (SEQ ID NO: 67, z-axis).
  • FIG. 20 demonstrates binary decisions at node #30 of the decision-tree. Tumors originating in gastric or esophageal adenocarcinoma (diamonds) are easily distinguished from tumors of other GI adenocarcinoma origin (squares) using the expression levels of hsa-miR-1201 (SEQ ID NO: 146, y-axis), hsa-miR-224 (SEQ ID NO: 42, x-axis) and hsa-miR-210 (SEQ ID NO: 36, z-axis).
  • FIG. 21 demonstrates binary decisions at node #31 of the decision-tree. Tumors originating in colorectal adenocarcinoma (diamonds) are easily distinguished from tumors of adenocarcinoma of biliary tract or pancreas origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis), hsa-miR-17 (SEQ ID NO: 20, x-axis) and hsa-miR-29a (SEQ ID NO: 43, z-axis).
  • FIG. 22 demonstrates binary decisions at node #33 of the decision-tree. Tumors originating in kidney (diamonds) are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-miR-149 (SEQ ID NO: 19, z-axis).
  • FIG. 23 demonstrates binary decisions at node #34 of the decision-tree. Tumors originating in pheochromocytoma (diamonds) are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-375 (SEQ ID NO: 56, y-axis) and hsa-miR-7 (SEQ ID NO: 65, x-axis).
  • FIG. 24 demonstrates binary decisions at node #44 of the decision-tree. Tumors originating in Ewing sarcoma (diamonds) are easily distinguished from tumors of osteosarcoma origin (squares) using the expression levels of hsa-miR-31 (SEQ ID NO: 49, y-axis) and hsa-miR-193a-3p (SEQ ID NO: 25, x-axis).
  • FIG. 25 demonstrates binary decisions at node #45 of the decision-tree. Tumors originating in rhabdomyosarcoma (diamonds) are easily distinguished from tumors of malignant fibrous histiocytoma (MFH) or fibrosarcoma origin (squares) using the expression levels of hsa-miR-206 (SEQ ID NO: 33, y-axis), hsa-miR-22 (SEQ ID NO: 39, x-axis) and hsa-miR-487b (SEQ ID NO: 59, z-axis).
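  • Each of the figures above shows a single node separating two groups on the expression levels of two or three microRNAs. As an illustration only, such a node can be written as a linear rule passed through a logistic function. The figure descriptions do not state the decision algorithm or any fitted parameters, so the rule type, the weights, the bias and the threshold below are all assumptions, and the branch direction is arbitrary.

```python
import math

# Hypothetical parameters for one node using the two microRNAs of FIG. 3.
WEIGHTS = {"hsa-miR-200b": 0.8, "hsa-miR-126": -0.6}
BIAS = -1.0

def p_right(expr):
    """Probability-like score for the right branch from a logistic function."""
    z = BIAS + sum(w * expr[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def branch(expr, threshold=0.5):
    """Binary decision at the node: 'right' if the score exceeds the threshold."""
    return "right" if p_right(expr) > threshold else "left"

# Hypothetical normalized expression levels for two samples.
print(branch({"hsa-miR-200b": 10.0, "hsa-miR-126": 5.0}))
print(branch({"hsa-miR-200b": 4.0, "hsa-miR-126": 12.0}))
```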
  • DETAILED DESCRIPTION OF THE INVENTION
  • Identification of the tissue-of-origin of a tumor is vital to its management. The present invention is based in part on the discovery that specific nucleic acid sequences can be used for the identification of the tissue-of-origin of a tumor. The present invention provides a sensitive, specific and accurate method which can be used to distinguish between different tumor origins. A new microRNA-based classifier was developed for determining tissue origin of tumors based on 65 microRNA markers. The classifier uses a specific algorithm and allows a clear interpretation of the specific biomarkers.
  • According to the present invention each node in the classification tree may be used as an independent differential diagnosis tool, for example in the identification of different types of lung cancers. The possibility to distinguish between different tumor origins facilitates providing the patient with the best and most suitable treatment.
  • The present invention provides diagnostic assays and methods, both quantitative and qualitative for detecting, diagnosing, monitoring, staging and prognosticating cancers by comparing the levels of the specific microRNA molecules of the invention. Such levels are preferably measured in at least one of biopsies, tumor samples, fine-needle aspiration (FNA), cells, tissues and/or bodily fluids. The methods provided in the present invention are particularly useful for discriminating between different cancers.
  • All the methods of the present invention may optionally further include measuring levels of additional cancer markers. The cancer markers measured in addition to said microRNA molecules depend on the cancer being tested and are known to those of skill in the art.
  • Assay techniques can be used to determine levels of gene expression, such as genes encoding the nucleic acids of the present invention in a sample derived from a patient. Such assay methods, which are well known to those of skill in the art, include, but are not limited to, nucleic acid microarrays and biochip analysis, reverse transcriptase PCR (RT-PCR) assays, immunohistochemistry assays, in situ hybridization assays, competitive-binding assays, northern blot analyses and ELISA assays.
  • According to one embodiment, the assay is based on expression level of 65 microRNAs in RNA extracted from FFPE metastatic tumor tissue.
  • The expression levels are used to infer the sample origin using analysis techniques such as, but not limited to, decision-tree classifier, K nearest neighbors classifier, logistic regression classifier, linear regression classifier, nearest neighbor classifier, neural network classifier and nearest centroid classifier.
  • When the decision-tree classifier is used, the expression levels are used to make binary decisions (at each relevant node) following the pre-defined structure of the binary decision-tree (defined using a training set).
  • At each node, the expression levels of one or more microRNAs are combined using a function of the form P=exp (β0+β1*miR1+β2*miR2+β3*miR3 . . . )/(1+exp (β0+β1*miR1+β2*miR2+β3*miR3 . . . )), where the values of β0, β1, β2 . . . and the identities of the microRNAs have been pre-determined (using a training set). The resulting P is compared to a probability threshold level (PTH, which was also determined using the training set), and the classification continues to the left or right branch according to whether P is larger or smaller than the PTH for that node. This continues until an end-point (“leaf”) of the tree is reached. According to some embodiments, PTH=0.5 for all nodes, and the value of β0 is adjusted accordingly. According to further embodiments, β0, β1, β2, . . . are adjusted so that the slope of the log of the odds ratio function is limited.
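  • The node-by-node procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the standard logistic form exp(z)/(1+exp(z)), which is the form consistent with the log-odds definition of logistic regression given in the Definitions, and the `Node` class, the coefficient values and the two-node toy tree are hypothetical and are not taken from the trained classifier.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Node:
    """One binary decision node (all numeric values here are hypothetical)."""
    beta0: float                 # intercept
    betas: Dict[str, float]      # coefficient for each microRNA used at the node
    threshold: float             # probability threshold (PTH) for this node
    above: Union["Node", str]    # followed when P > PTH (sub-node or leaf label)
    below: Union["Node", str]    # followed otherwise (sub-node or leaf label)


def node_probability(node: Node, expr: Dict[str, float]) -> float:
    """Return P = exp(z) / (1 + exp(z)), z = beta0 + sum(beta_i * miR_i)."""
    z = node.beta0 + sum(b * expr[name] for name, b in node.betas.items())
    if z >= 0:  # equivalent form that avoids overflow for large positive z
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(root: Union[Node, str], expr: Dict[str, float]) -> str:
    """Walk the tree from the root until a leaf (a plain string) is reached."""
    current = root
    while isinstance(current, Node):
        p = node_probability(current, expr)
        current = current.above if p > current.threshold else current.below
    return current


# Toy two-node tree; microRNA names come from the figures, coefficients do not.
inner = Node(-8.0, {"hsa-miR-7": 1.0}, 0.5, "leaf B", "leaf C")
root = Node(-10.0, {"hsa-miR-375": 1.0}, 0.5, "leaf A", inner)
sample = {"hsa-miR-375": 12.0, "hsa-miR-7": 6.0}
print(classify(root, sample))
```

  • Representing leaves as plain strings keeps the traversal loop short; a production implementation would more likely use an explicit leaf type and validate that every microRNA required by a node is present in the sample.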
  • Training the tree algorithm means determining the tree structure (which nodes there are and what is on each side) and, for each node, which miRs are used, the values of β0, β1, β2 . . . and the PTH. These are determined by a combination of machine learning, optimization algorithms, and trial and error by experts in machine learning and diagnostic algorithms.
  • An arbitrary threshold of the expression level of one or more nucleic acid sequences can be set for assigning a sample to one of two groups. Alternatively, in a preferred embodiment, expression levels of one or more nucleic acid sequences of the invention are combined by a method such as logistic regression to define a metric which is then compared to previously measured samples or to a threshold. The threshold is treated as a parameter that can be used to quantify the confidence with which samples are assigned to each class. The threshold can be scaled to favor sensitivity or specificity, depending on the clinical scenario. The correlation value to the reference data generates a continuous score that can be scaled and provides diagnostic information on the likelihood that a sample belongs to a certain class of cancer origin or type. In multivariate analysis the microRNA signature provides a high level of prognostic information.
  • In another preferred embodiment, expression level of nucleic acids is used to classify a test sample by comparison to a training set of samples. In this embodiment, the test sample is compared in turn to each one of the training set samples. The comparison is performed by comparing the expression levels of one or multiple nucleic acids between the test sample and the specific training sample. Each such pairwise comparison generates a combined metric for the multiple nucleic acids, which can be calculated by various numeric methods such as correlation, cosine, Euclidean distance, mean square distance, or other methods known to those skilled in the art. The training samples are then ranked according to this metric, and the samples with the highest values of the metric (or lowest values, according to the type of metric) are identified, indicating those samples that are most similar to the test sample. By choosing a parameter K, this generates a list that includes the K training samples that are most similar to the test sample. Various methods can then be applied to identify from this list the predicted class of the test sample. In a favored embodiment, the test sample is predicted to belong to the class that has the highest number of representatives in the list of K most-similar training samples (this method is known as the K Nearest Neighbors method). Other embodiments may provide a list of predictions including all or part of the classes represented in the list, those classes that are represented more than a given minimum number of times, or other voting schemes whereby classes are grouped together.
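  • The ranking-and-voting procedure described above can be sketched as follows. The sketch assumes Euclidean distance as the pairwise metric (one of the options named in the text) and a simple majority vote; the function names and the toy expression vectors are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(test: Sequence[float],
                training: List[Tuple[Sequence[float], str]],
                k: int) -> Tuple[str, Counter]:
    """Rank training samples by distance to the test sample and vote among the K closest."""
    ranked = sorted(training, key=lambda item: euclidean(test, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    # most_common orders tied classes by first appearance, i.e. nearest first
    return votes.most_common(1)[0][0], votes


# Toy training set: (expression vector, class label); values are illustrative only.
training = [
    ([1.0, 1.0], "A"), ([1.2, 0.9], "A"), ([0.9, 1.1], "A"),
    ([5.0, 5.0], "B"), ([5.1, 4.8], "B"),
]
label, votes = knn_predict([1.0, 1.05], training, k=3)
print(label, dict(votes))
```

  • Returning the full vote count alongside the winning class makes it straightforward to implement the alternative embodiments mentioned above, such as reporting every class that appears at least a minimum number of times among the K neighbors.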
  • 1. DEFINITIONS
  • It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.
  • For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9 and 7.0 are explicitly contemplated.
  • About
  • As used herein, the term “about” refers to +/−10%.
  • Attached
  • “Attached” or “immobilized”, as used herein to refer to a probe and a solid support, means that the binding between the probe and the solid support is sufficient to be stable under conditions of binding, washing, analysis, and removal. The binding may be covalent or non-covalent. Covalent bonds may be formed directly between the probe and the solid support or may be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe or both molecules. Non-covalent binding may be one or more of electrostatic, hydrophilic, and hydrophobic interactions. Included in non-covalent binding is the covalent attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of a biotinylated probe to the streptavidin. Immobilization may also involve a combination of covalent and non-covalent interactions.
  • Baseline
  • “Baseline”, as used herein, means the initial cycles of PCR, in which there is little change in fluorescence signal.
  • Biological Sample
  • “Biological sample”, as used herein, means a sample of biological tissue or fluid that comprises nucleic acids. Such samples include, but are not limited to, tissue or fluid isolated from subjects. Biological samples may also include sections of tissues such as biopsy and autopsy samples, FFPE samples, frozen sections taken for histological purposes, blood, blood fraction, plasma, serum, sputum, stool, tears, mucus, hair, skin, urine, effusions, ascitic fluid, amniotic fluid, saliva, cerebrospinal fluid, cervical secretions, vaginal secretions, endometrial secretions, gastrointestinal secretions, bronchial secretions, cell line, tissue sample, or secretions from the breast. A biological sample may be provided by fine-needle aspiration (FNA), pleural effusion or bronchial brushing. A biological sample may be obtained by removing a sample of cells from a subject, but previously isolated cells (e.g., isolated by another person, at another time, and/or for another purpose) may also be used, or the methods described herein may be performed in vivo. Archival tissues, such as those having treatment or outcome history, may also be used. Biological samples also include explants and primary and/or transformed cell cultures derived from animal or human tissues.
  • Cancer
  • The term “cancer” is meant to include all types of cancerous growths or oncogenic processes, metastatic tissues or malignantly transformed cells, tissues, or organs, irrespective of histopathologic type or stage of invasiveness. Examples of cancers include, but are not limited to, solid tumors and leukemias, including: apudoma, choristoma, branchioma, malignant carcinoid syndrome, carcinoid heart disease, carcinoma (e.g., Walker, basal cell, basosquamous, Brown-Pearce, ductal, Ehrlich tumor, non-small cell lung (e.g., lung squamous cell carcinoma, lung adenocarcinoma and lung undifferentiated large cell carcinoma), oat cell, papillary, bronchiolar, bronchogenic, squamous cell, and transitional cell), histiocytic disorders, leukemia (e.g., B cell, mixed cell, null cell, T cell, T-cell chronic, HTLV-II-associated, lymphocytic acute, lymphocytic chronic, mast cell, and myeloid), histiocytosis malignant, Hodgkin disease, immunoproliferative small, non-Hodgkin lymphoma, plasmacytoma, reticuloendotheliosis, melanoma, chondroblastoma, chondroma, chondrosarcoma, fibroma, fibrosarcoma, giant cell tumors, histiocytoma, lipoma, liposarcoma, mesothelioma, myxoma, myxosarcoma, osteoma, osteosarcoma, Ewing sarcoma, synovioma, adenofibroma, adenolymphoma, carcinosarcoma, chordoma, craniopharyngioma, dysgerminoma, hamartoma, mesenchymoma, mesonephroma, myosarcoma, ameloblastoma, cementoma, odontoma, teratoma, thymoma, trophoblastic tumor, adenocarcinoma, adenoma, cholangioma, cholesteatoma, cylindroma, cystadenocarcinoma, cystadenoma, granulosa cell tumor, gynandroblastoma, hepatoma, hidradenoma, islet cell tumor, Leydig cell tumor, papilloma, Sertoli cell tumor, theca cell tumor, leiomyoma, leiomyosarcoma, myoblastoma, myosarcoma, rhabdomyoma, rhabdomyosarcoma, ependymoma, ganglioneuroma, glioma, medulloblastoma, meningioma, neurilemmoma, neuroblastoma, neuroepithelioma, neurofibroma, neuroma, paraganglioma, paraganglioma nonchromaffin, angiokeratoma, angiolymphoid hyperplasia with eosinophilia, angioma sclerosing, angiomatosis, glomangioma, hemangioendothelioma, hemangioma, hemangiopericytoma, hemangiosarcoma, lymphangioma, lymphangiomyoma, lymphangiosarcoma, pinealoma, carcinosarcoma, chondrosarcoma, cystosarcoma phyllodes, fibrosarcoma, hemangiosarcoma, leiomyosarcoma, leukosarcoma, liposarcoma, lymphangiosarcoma, myosarcoma, myxosarcoma, ovarian carcinoma, rhabdomyosarcoma, sarcoma (e.g., Ewing, experimental, Kaposi, and mast cell), neurofibromatosis, and cervical dysplasia, and other conditions in which cells have become immortalized or transformed.
  • Classification
  • The term classification refers to a procedure and/or algorithm in which individual items are placed into groups or classes based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, features, etc.) and based on a statistical model and/or a training set of previously labeled items. A “classification tree” places categorical variables into classes.
  • Complement
  • “Complement” or “complementary”, as used herein to refer to a nucleic acid, may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. A full complement or fully complementary means 100% complementary base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. In some embodiments, the complementary sequence has a reverse orientation (5′-3′).
  • Ct
  • Ct signals represent the first cycle of PCR where amplification crosses a threshold (cycle threshold) of fluorescence. Accordingly, low values of Ct represent high abundance or expression levels of the microRNA.
  • In some embodiments the PCR Ct signal is normalized such that the normalized Ct remains inversely related to the expression level. In other embodiments the PCR Ct signal may be normalized and then inverted such that low normalized-inverted Ct represents low abundance or low expression levels of the microRNA.
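  • The two conventions can be illustrated with a short sketch. The text does not specify the normalization or the inversion, so both are assumptions here: normalization subtracts the mean Ct of a set of reference microRNAs, and inversion subtracts the normalized Ct from a fixed constant. The constant 50, the function names and the microRNA labels are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable

INVERSION_CONSTANT = 50.0  # hypothetical constant, chosen only for illustration


def normalize_ct(ct: Dict[str, float], reference_mirs: Iterable[str]) -> Dict[str, float]:
    """Subtract the mean Ct of the reference microRNAs (assumed normalization)."""
    ref = mean(ct[name] for name in reference_mirs)
    return {name: value - ref for name, value in ct.items()}


def invert_ct(normalized: Dict[str, float]) -> Dict[str, float]:
    """Flip the scale so that higher values mean higher abundance."""
    return {name: INVERSION_CONSTANT - value for name, value in normalized.items()}


raw = {"miR-x": 20.0, "miR-y": 30.0}          # lower raw Ct = more abundant
normalized = normalize_ct(raw, ["miR-x", "miR-y"])
inverted = invert_ct(normalized)
print(normalized, inverted)
```

  • After normalization alone, the more abundant microRNA still has the lower value; after inversion, it has the higher value, which is often more convenient when the numbers are fed into a classifier.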
  • Data Processing Routine
  • As used herein, a “data processing routine” refers to a process that can be embodied in software that determines the biological significance of acquired data (i.e., the ultimate results of an assay or analysis). For example, the data processing routine can determine a tissue of origin based upon the data collected. In the systems and methods herein, the data processing routine can also control the data collection routine based upon the results determined. The data processing routine and the data collection routines can be integrated and provide feedback to operate the data acquisition, and hence provide assay-based judging methods.
  • Data Set
  • As used herein, the term “data set” refers to numerical values obtained from the analysis. These numerical values associated with analysis may be values such as peak height and area under the curve.
  • Data Structure
  • As used herein, the term “data structure” refers to a combination of two or more data sets, an application of one or more mathematical manipulations to one or more data sets to obtain one or more new data sets, or a manipulation of two or more data sets into a form that provides a visual illustration of the data in a new way. An example of a data structure prepared from manipulation of two or more data sets would be a hierarchical cluster.
  • Detection
  • “Detection” means detecting the presence of a component in a sample. Detection also means detecting the absence of a component. Detection also means determining the level of a component, either quantitatively or qualitatively.
  • Differential Expression
  • “Differential expression” means qualitative or quantitative differences in the temporal and/or spatial gene expression patterns within and among cells and tissue. Thus, a differentially expressed gene may qualitatively have its expression altered, including an activation or inactivation, in, e.g., normal versus diseased tissue. Genes may be turned on or turned off in a particular state, relative to another state, thus permitting comparison of two or more states. A qualitatively regulated gene may exhibit an expression pattern within a state or cell type which may be detectable by standard techniques. Some genes may be expressed in one state or cell type, but not in both. Alternatively, the difference in expression may be quantitative, e.g., in that expression is modulated: up-regulated, resulting in an increased amount of transcript, or down-regulated, resulting in a decreased amount of transcript. The degree to which expression differs needs only to be large enough to quantify via standard characterization techniques such as expression arrays, quantitative reverse transcriptase PCR, northern blot analysis, real-time PCR, in situ hybridization and RNase protection.
  • Epithelial Tumors
  • “Epithelial tumors” is meant to include all types of tumors of epithelial origin. Examples of epithelial tumors include, but are not limited to, cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract, urothelial carcinoma, adenocarcinoma of the breast, lung large cell or adenocarcinoma, lung small cell carcinoma, lung carcinoid, ovarian carcinoma, pancreatic adenocarcinoma, prostatic adenocarcinoma, gastric or esophageal adenocarcinoma, thymoma/thymic carcinoma, follicular thyroid carcinoma, papillary thyroid carcinoma, medullary thyroid carcinoma, anus or skin squamous cell carcinoma, lung, head&neck, or esophagus squamous cell carcinoma, uterine cervix squamous cell carcinoma, gastrointestinal tract carcinoid, pancreatic islet cell tumor and colorectal adenocarcinoma.
  • Non Epithelial Tumors
  • “Non epithelial tumors” is meant to include all types of tumors from non epithelial origin. Examples of non epithelial tumors include, but are not limited to adrenocortical carcinoma, chromophobe renal cell carcinoma, clear cell renal cell carcinoma, papillary renal cell carcinoma, pleural mesothelioma, astrocytic tumor, oligodendroglioma, pheochromocytoma, B-cell lymphoma, T-cell lymphoma, melanoma, gastrointestinal stromal tumor (GIST), Ewing Sarcoma, chondrosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma, osteosarcoma, rhabdomyosarcoma, synovial sarcoma and liposarcoma.
  • Expression Profile
  • The term “expression profile” is used broadly to include a genomic expression profile, e.g., an expression profile of microRNAs. Profiles may be generated by any convenient means for determining a level of a nucleic acid sequence, e.g., quantitative hybridization of microRNA, labeled microRNA, amplified microRNA, cDNA, etc., quantitative PCR, ELISA for quantitation, and the like, and allow the analysis of differential gene expression between two samples. A subject or patient tumor sample, e.g., cells or collections thereof, e.g., tissues, is assayed. Samples are collected by any convenient method, as known in the art. Nucleic acid sequences of interest are nucleic acid sequences that are found to be predictive, including the nucleic acid sequences provided above, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of the nucleic acid sequences, including all of the listed nucleic acid sequences. According to some embodiments, the term “expression profile” means measuring the relative abundance of the nucleic acid sequences in the measured samples.
  • Expression Ratio
  • “Expression ratio”, as used herein, refers to relative expression levels of two or more nucleic acids as determined by detecting the relative expression levels of the corresponding nucleic acids in a biological sample.
  • FDR (False Discovery Rate)
  • When performing multiple statistical tests, for example in comparing between the signal of two groups in multiple data features, there is an increasingly high probability of obtaining false positive results, by random differences between the groups that can reach levels that would otherwise be considered statistically significant. In order to limit the proportion of such false discoveries, statistical significance is defined only for data features in which the differences reached a p-value (by two-sided t-test) below a threshold, which is dependent on the number of tests performed and the distribution of p-values obtained in these tests.
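  • One widely used procedure that matches this description, in that its cutoff depends on both the number of tests and the observed p-value distribution, is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure for illustration; the text does not name the specific method used, and the function name and example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is significant at the given FDR.

    The largest rank r with p_(r) <= fdr * r / m is found, and the r smallest
    p-values are declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Illustrative p-values, e.g. from two-sided t-tests on four data features.
flags = benjamini_hochberg([0.04, 0.001, 0.5, 0.01], fdr=0.05)
print(flags)
```

  • In this example the p-value 0.04 would pass an uncorrected 0.05 cutoff but is not declared significant, because its rank-adjusted threshold (0.05 × 3/4 = 0.0375) is lower.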
  • Fragment
  • “Fragment” is used herein to indicate a non-full-length part of a nucleic acid. Thus, a fragment is itself also a nucleic acid.
  • Gastrointestinal Tumors
  • “Gastrointestinal tumors” is meant to include all types of tumors of gastrointestinal origin. Examples of gastrointestinal tumors include, but are not limited to, cholangiocarcinoma or adenocarcinoma of the extrahepatic biliary tract, pancreatic adenocarcinoma, gastric or esophageal adenocarcinoma, and colorectal adenocarcinoma.
  • Gene
  • “Gene”, as used herein, may be a natural (e.g., genomic) or synthetic gene comprising transcriptional and/or translational regulatory sequences and/or a coding region and/or non-translated sequences (e.g., introns, 5′- and 3′-untranslated sequences). The coding region of a gene may be a nucleotide sequence coding for an amino acid sequence or a functional RNA, such as tRNA, rRNA, catalytic RNA, siRNA, miRNA or antisense RNA. A gene may also be an mRNA or cDNA corresponding to the coding regions (e.g., exons and miRNA) optionally comprising 5′- or 3′-untranslated sequences linked thereto. A gene may also be an amplified nucleic acid molecule produced in vitro, comprising all or a part of the coding region and/or 5′- or 3′-untranslated sequences linked thereto.
  • Germ Cell Tumors
  • “Germ cell tumors”, as used herein, include, but are not limited to, non-seminomatous testicular germ cell tumors, seminomatous testicular germ cell tumors and ovarian primitive germ cell tumors.
  • Groove Binder/Minor Groove Binder (MGB)
  • “Groove binder” and/or “minor groove binder” may be used interchangeably and refer to small molecules that fit into the minor groove of double-stranded DNA, typically in a sequence-specific manner. Minor groove binders may be long, flat molecules that can adopt a crescent-like shape and thus fit snugly into the minor groove of a double helix, often displacing water. Minor groove binding molecules may typically comprise several aromatic rings connected by bonds with torsional freedom such as furan, benzene, or pyrrole rings. Minor groove binders may be antibiotics such as netropsin, distamycin, berenil, pentamidine and other aromatic diamidines, Hoechst 33258, SN 6999, aureolic anti-tumor drugs such as chromomycin and mithramycin, CC-1065, dihydrocyclopyrroloindole tripeptide (DPI3), 1,2-dihydro-(3H)-pyrrolo[3,2-e]indole-7-carboxylate (CDPI3), and related compounds and analogues, including those described in Nucleic Acids in Chemistry and Biology, 2nd ed., Blackburn and Gait, eds., Oxford University Press, 1996, and PCT Published Application No. WO 03/078450, the contents of which are incorporated herein by reference. A minor groove binder may be a component of a primer, a probe, a hybridization tag complement, or combinations thereof. Minor groove binders may increase the Tm of the primer or a probe to which they are attached, allowing such primers or probes to effectively hybridize at higher temperatures.
  • High Expression miR-205 Tumors
  • “High expression miR-205 tumors”, as used herein, include, but are not limited to, urothelial carcinoma (TCC), thymoma/thymic carcinoma, anus or skin squamous cell carcinoma, lung, head&neck, or esophagus squamous cell carcinoma and uterine cervix squamous cell carcinoma.
  • Low Expression miR-205 Tumors
  • “Low expression miR-205 tumors”, as used herein, include, but are not limited to, lung large cell or adenocarcinoma, follicular thyroid carcinoma and papillary thyroid carcinoma.
  • Host Cell
  • “Host cell”, as used herein, may be a naturally occurring cell or a transformed cell that may contain a vector and may support replication of the vector. Host cells may be cultured cells, explants, cells in vivo, and the like. Host cells may be prokaryotic cells, such as E. coli, or eukaryotic cells, such as yeast, insect, amphibian, or mammalian cells, such as CHO and HeLa cells.
  • Identity
  • “Identical” or “identity”, as used herein in the context of two or more nucleic acid or polypeptide sequences, means that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity. In cases where the two sequences are of different lengths or the alignment produces one or more staggered ends and the specified region of comparison includes only a single sequence, the residues of the single sequence are included in the denominator but not the numerator of the calculation. When comparing DNA and RNA sequences, thymine (T) and uracil (U) may be considered equivalent. Identity may be determined manually or by using a computer sequence algorithm such as BLAST or BLAST 2.0.
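  • The calculation described above can be sketched for two sequences that have already been aligned. The sketch assumes that the alignment is supplied as two equal-length strings in which “-” marks a position present in only one sequence; the function name and the example sequences are hypothetical, and the alignment step itself (e.g., by BLAST) is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an aligned region, treating T and U as equivalent.

    Positions where only one sequence has a residue ('-' in the other) count
    toward the denominator but never toward the numerator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned region is empty")

    def canon(base: str) -> str:
        base = base.upper()
        return "T" if base == "U" else base

    matches = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x != "-" and y != "-" and canon(x) == canon(y)
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("AUGCUA", "ATGCTA"))   # RNA versus DNA, same sequence
print(percent_identity("AUGC--", "ATGCTA"))   # staggered end on the first sequence
```

  • In the second call, the two unpaired residues at the staggered end are counted in the six-position denominator but not among the four matches.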
  • In Situ Detection
  • “In situ detection”, as used herein, means the detection of expression or expression levels in the original site, here meaning in a tissue sample such as a biopsy.
  • K-Nearest Neighbor
  • The phrase “K-nearest neighbor” refers to a classification method that classifies a point by calculating the distances between it and points in the training data set. It then assigns the point to the class that is most common among its K-nearest neighbors (where K is an integer).
  • Leaf
  • A leaf, as used herein, is the terminal group in a classification or decision tree.
  • Label
  • “Label”, as used herein, means a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and other entities which can be made detectable. A label may be incorporated into nucleic acids and proteins at any position.
  • Logistic Regression
  • Logistic regression is part of a category of statistical models called generalized linear models. Logistic regression can allow one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. The dependent or response variable can be dichotomous, for example, one of two possible types of cancer. Logistic regression models the natural log of the odds ratio, i.e., the ratio of the probability of belonging to the first group (P) over the probability of belonging to the second group (1-P), as a linear combination of the different expression levels (in log-space). The logistic regression output can be used as a classifier by prescribing that a case or sample will be classified into the first type if P is greater than 0.5 or 50%. Alternatively, the calculated probability P can be used as a variable in other contexts, such as a 1D or 2D threshold classifier.
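  • Written out, the relationship described above between the log of the odds and the probability P is as follows. The symbols x_i (the expression levels in log-space) and β_i (the fitted coefficients) are notation introduced here for illustration.

```latex
\ln\frac{P}{1-P} \;=\; \beta_0 + \sum_i \beta_i x_i
\quad\Longrightarrow\quad
P \;=\; \frac{\exp\!\left(\beta_0 + \sum_i \beta_i x_i\right)}
             {1 + \exp\!\left(\beta_0 + \sum_i \beta_i x_i\right)}
```

  • It follows that P is greater than 0.5 exactly when the linear combination β_0 + Σ β_i x_i is positive, so the rule “classify into the first type if P is greater than 0.5” is equivalent to checking the sign of the linear combination.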
  • Metastasis
  • “Metastasis” means the process by which cancer spreads from the place at which it first arose as a primary tumor to other locations in the body. The metastatic progression of a primary tumor reflects multiple stages, including dissociation from neighboring primary tumor cells, survival in the circulation, and growth in a secondary location.
  • Neuroendocrine Tumors
  • “Neuroendocrine tumors” is meant to include all types of tumors from neuroendocrine origin. Examples of neuroendocrine tumors include, but are not limited to lung small cell carcinoma, lung carcinoid, gastrointestinal tract carcinoid, pancreatic islet cell tumor and medullary thyroid carcinoma.
  • Node
  • A “node” is a decision point in a classification (i.e., decision) tree. The term also refers to a point in a neural net that combines input from other nodes and produces an output through application of an activation function.
  • Nucleic Acid
  • “Nucleic acid” or “oligonucleotide” or “polynucleotide”, as used herein, mean at least two nucleotides covalently linked together. The depiction of a single strand also defines the sequence of the complementary strand. Thus, a nucleic acid also encompasses the complementary strand of a depicted single strand. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. A single strand provides a probe that may hybridize to a target sequence under stringent hybridization conditions. Thus, a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.
  • Nucleic acids may be single-stranded or double-stranded, or may contain portions of both double-stranded and single-stranded sequences. The nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
  • A nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones, non-ionic backbones and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated herein by reference. Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids. The modified nucleotide analog may be located for example at the 5′-end and/or the 3′-end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, are also suitable, such as uridine or cytidine modified at the 5-position, e.g., 5-(2-amino) propyl uridine, 5-bromo uridine; adenosine and guanosine modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; and O- and N-alkylated nucleotides, e.g., N6-methyl adenosine. The 2′-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al., Nature 2005; 438:685-689, Soutschek et al., Nature 2004; 432:173-178, and U.S. Patent Publication No. 20050107325, which are incorporated herein by reference. Additional modified nucleotides and nucleic acids are described in U.S. Patent Publication No. 20050182005, which is incorporated herein by reference. Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip. The backbone modification may also enhance resistance to degradation, such as in the harsh endocytic environment of cells. The backbone modification may also reduce nucleic acid clearance by hepatocytes, such as in the liver and kidney. Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs may be made.
  • Probe
  • “Probe”, as used herein, means an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. There may be any number of base pair mismatches which will interfere with hybridization between the target sequence and the single-stranded nucleic acids described herein. However, if the number of mutations is so great that no hybridization can occur under even the least stringent of hybridization conditions, the sequence is not a complementary target sequence. A probe may be single-stranded or partially single- and partially double-stranded. The strandedness of the probe is dictated by the structure, composition, and properties of the target sequence. Probes may be directly labeled or indirectly labeled such as with biotin to which a streptavidin complex may later bind.
  • Reference Value
  • As used herein, the term “reference value” or “reference expression profile” refers to a criterion expression value to which measured values are compared in order to identify a specific cancer. The reference value may be based on the abundance of the nucleic acids, or may be based on a combined metric score thereof.
  • In preferred embodiments the reference value is determined from statistical analysis of studies that compare microRNA expression with known clinical outcomes.
  • Sarcoma
  • Sarcoma is meant to include all types of tumors of sarcoma origin. Examples of sarcoma tumors include, but are not limited to, gastrointestinal stromal tumor (GIST), Ewing sarcoma, chondrosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma, osteosarcoma, rhabdomyosarcoma, synovial sarcoma and liposarcoma.
  • Sensitivity
  • “Sensitivity”, as used herein, may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct class out of two possible classes. The sensitivity for class A is the proportion of cases that are determined to belong to class “A” by the test out of the cases that are in class “A”, as determined by some absolute or gold standard.
  • Specificity
  • “Specificity”, as used herein, may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct class out of two possible classes. The specificity for class A is the proportion of cases that are determined to belong to class “not A” by the test out of the cases that are in class “not A”, as determined by some absolute or gold standard.
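  • The two definitions above can be illustrated with a minimal Python sketch. The function name, the label values and the example data are hypothetical and chosen only for illustration; the "true" labels stand in for whatever absolute or gold standard is used.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="A"):
    """Return (sensitivity, specificity) of a binary test for class `positive`.

    true_labels: classes as determined by the gold standard.
    predicted_labels: classes as determined by the test.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # case in class A, called A
            else:
                fn += 1  # case in class A, called not A
        else:
            if predicted == positive:
                fp += 1  # case in class not A, called A
            else:
                tn += 1  # case in class not A, called not A
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: 4 cases truly in class A, 6 truly in class B ("not A").
truth = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
called = ["A", "A", "A", "B", "B", "B", "B", "B", "A", "B"]
sens, spec = sensitivity_specificity(truth, called, positive="A")
```

  • In this hypothetical example three of the four class "A" cases are called "A" (sensitivity 3/4) and five of the six "not A" cases are called "not A" (specificity 5/6).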
  • Stringent Hybridization Conditions
  • “Stringent hybridization conditions”, as used herein, mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm may be the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., about 10-50 nucleotides) and at least about 60° C. for long probes (e.g., greater than about 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal may be at least 2 to 10 times background hybridization. Exemplary stringent hybridization conditions include the following: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or, 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C.
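  • As a rough illustration, the numeric ranges recited above can be written as a simple check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the "about" values in the text are treated as hard cutoffs, and the effect of destabilizing agents such as formamide is not modeled.

```python
def meets_stringent_conditions(probe_length_nt, sodium_molar, ph, temperature_c):
    """Check a set of conditions against the ranges recited in the text.

    Assumption: the approximate limits are applied as exact cutoffs.
    """
    salt_ok = 0.01 <= sodium_molar <= 1.0          # about 0.01-1.0 M sodium ion
    ph_ok = 7.0 <= ph <= 8.3                       # pH 7.0 to 8.3
    # at least about 30 C for short probes (about 10-50 nt), 60 C for longer ones
    min_temperature_c = 30.0 if probe_length_nt <= 50 else 60.0
    return salt_ok and ph_ok and temperature_c >= min_temperature_c


def stringent_temperature_window(tm_c):
    """Return the (low, high) temperature range lying 5-10 C below a given Tm."""
    return (tm_c - 10.0, tm_c - 5.0)


# Hypothetical values: a 22-nt probe hybridized at 42 C in 0.3 M sodium, pH 7.5.
short_probe_ok = meets_stringent_conditions(22, 0.3, 7.5, 42.0)
long_probe_ok = meets_stringent_conditions(80, 0.3, 7.5, 42.0)
window = stringent_temperature_window(70.0)
```

  • The Tm itself is taken as an input here, since the text defines it operationally rather than by formula.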
  • Substantially Complementary
  • “Substantially complementary”, as used herein, means that a first sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical to the complement of a second sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides, or that the two sequences hybridize under stringent hybridization conditions.
  • Substantially Identical
  • “Substantially identical”, as used herein, means that a first and a second sequence are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides or amino acids, or with respect to nucleic acids, if the first sequence is substantially complementary to the complement of the second sequence.
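  • The percent-identity criterion in the two definitions above can be sketched as follows. This is an illustrative sketch only: it compares two equal-length sequences position by position without gaps, does not test the alternative hybridization criterion, and uses hypothetical function names and made-up sequences.

```python
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _as_rna(seq):
    """Upper-case the sequence and write T as U so DNA and RNA compare equally."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq):
    """Return the reverse complement of a sequence, written as RNA."""
    return "".join(_COMPLEMENT[base] for base in reversed(_as_rna(seq)))


def percent_identity(first, second):
    """Ungapped, position-by-position identity of two equal-length sequences."""
    first, second = _as_rna(first), _as_rna(second)
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def is_substantially_identical(first, second, min_identity=60.0, min_region=8):
    """True if the sequences are at least min_identity % identical."""
    if len(first) < min_region:
        return False
    return percent_identity(first, second) >= min_identity


def is_substantially_complementary(first, second, min_identity=60.0, min_region=8):
    """True if `first` is at least min_identity % identical to the complement of `second`."""
    return is_substantially_identical(
        first, reverse_complement(second), min_identity, min_region
    )


target = "AUGGCUAGCUAG"        # made-up 12-nt sequence
perfect = "CUAGCUAGCCAU"       # its exact reverse complement
one_mismatch = "CUAGCUAGCCAA"  # 11 of 12 positions match the complement
```

  • With these made-up sequences, `one_mismatch` is about 91.7% identical to the complement of `target`, so it passes a 90% threshold but not a 95% threshold.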
  • Subject
  • As used herein, the term “subject” refers to a mammal, including both humans and other mammals. The methods of the present invention are preferably applied to human subjects.
  • Target Nucleic Acid
  • “Target nucleic acid”, as used herein, means a nucleic acid or variant thereof that may be bound by another nucleic acid. A target nucleic acid may be a DNA sequence. The target nucleic acid may be RNA. The target nucleic acid may comprise an mRNA, tRNA, shRNA, siRNA or Piwi-interacting RNA, or a pri-miRNA, pre-miRNA, miRNA, or anti-miRNA.
  • The target nucleic acid may comprise a target miRNA binding site or a variant thereof. One or more probes may bind the target nucleic acid. The target binding site may comprise 5-100 or 10-60 nucleotides. The target binding site may comprise a total of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-40, 40-50, 50-60, 61, 62 or 63 nucleotides. The target site sequence may comprise at least 5 nucleotides of the sequence of a target miRNA binding site disclosed in U.S. patent application Ser. Nos. 11/384,049, 11/418,870 or 11/429,720, the contents of which are incorporated herein.
  • 1D/2D Threshold Classifier
  • “1D/2D threshold classifier”, as used herein, may mean an algorithm for classifying a case or sample such as a cancer sample into one of two possible types such as two types of cancer. For a 1D threshold classifier, the decision is based on one variable and one predetermined threshold value; the sample is assigned to one class if the variable exceeds the threshold and to the other class if the variable is less than the threshold. A 2D threshold classifier is an algorithm for classifying into one of two types based on the values of two variables. A threshold may be calculated as a function (usually a continuous or even a monotonic function) of the first variable; the decision is then reached by comparing the second variable to the calculated threshold, similar to the 1D threshold classifier.
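  • A minimal Python sketch of the two classifiers described above follows. The class labels, the example values and the linear threshold function are hypothetical; the text does not say how a value exactly equal to the threshold is handled, so assigning ties to the second class is an assumption.

```python
def classify_1d(value, threshold, above="class_1", below="class_2"):
    """1D threshold classifier: one variable, one predetermined threshold.

    Assumption: a value exactly equal to the threshold goes to `below`.
    """
    return above if value > threshold else below


def classify_2d(first_value, second_value, threshold_fn,
                above="class_1", below="class_2"):
    """2D threshold classifier.

    The threshold is calculated as a function of the first variable, and the
    second variable is then compared to it as in the 1D case.
    """
    return classify_1d(second_value, threshold_fn(first_value), above, below)


def linear_threshold(x):
    """Hypothetical monotonic threshold function, used only for illustration."""
    return 0.5 * x + 2.0


call_1d = classify_1d(7.2, threshold=6.0)
call_2d_high = classify_2d(4.0, 5.0, linear_threshold)  # threshold is 4.0
call_2d_low = classify_2d(4.0, 3.0, linear_threshold)
```

  • Geometrically, the 2D classifier splits the plane of the two variables along the curve defined by the threshold function, with one class on each side.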
  • Tissue Sample
  • As used herein, a tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts. The phrase “suspected of being cancerous”, as used herein, means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
  • Tumor
  • “Tumor”, as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
  • Variant
  • “Variant”, as used herein, referring to a nucleic acid means (i) a portion of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that is substantially identical to a referenced nucleic acid or the complement thereof; or (iv) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, complement thereof, or a sequence substantially identical thereto.
  • Wild Type
  • As used herein, the term “wild-type” sequence refers to a coding, a non-coding or an interface sequence which is an allelic form of sequence that performs the natural or normal function for that sequence. Wild-type sequences include multiple allelic forms of a cognate sequence, for example, multiple alleles of a wild type sequence may encode silent or conservative changes to the protein sequence that a coding sequence encodes.
  • The present invention employs miRNAs for the identification, classification and diagnosis of specific cancers and the identification of their tissues of origin.
  • 1. microRNA Processing
  • A gene coding for microRNA (miRNA) may be transcribed leading to production of a miRNA primary transcript known as the pri-miRNA. The pri-miRNA may comprise a hairpin with a stem and loop structure. The stem of the hairpin may comprise mismatched bases. The pri-miRNA may comprise several hairpins in a polycistronic structure.
  • The hairpin structure of the pri-miRNA may be recognized by Drosha, which is an RNase III endonuclease. Drosha may recognize terminal loops in the pri-miRNA and cleave approximately two helical turns into the stem to produce a 60-70 nt precursor known as the pre-miRNA. Drosha may cleave the pri-miRNA with a staggered cut typical of RNase III endonucleases yielding a pre-miRNA stem loop with a 5′ phosphate and ˜2 nucleotide 3′ overhang. Approximately one helical turn of stem (˜10 nucleotides) extending beyond the Drosha cleavage site may be essential for efficient processing. The pre-miRNA may then be actively transported from the nucleus to the cytoplasm by Ran-GTP and the export receptor Exportin-5.
  • The pre-miRNA may be recognized by Dicer, which is also an RNase III endonuclease. Dicer may recognize the double-stranded stem of the pre-miRNA. Dicer may also cut off the terminal loop two helical turns away from the base of the stem loop, leaving an additional 5′ phosphate and a ˜2 nucleotide 3′ overhang. The resulting siRNA-like duplex, which may comprise mismatches, comprises the mature miRNA and a similar-sized fragment known as the miRNA*. The miRNA and miRNA* may be derived from opposing arms of the pri-miRNA and pre-miRNA. MiRNA* sequences may be found in libraries of cloned miRNAs, but typically at lower frequency than the miRNAs.
  • Although initially present as a double-stranded species with miRNA*, the miRNA may eventually become incorporated as a single-stranded RNA into a ribonucleoprotein complex known as the RNA-induced silencing complex (RISC). Various proteins can form the RISC, which can lead to variability in specificity for miRNA/miRNA* duplexes, binding site of the target gene, activity of miRNA (repress or activate), and which strand of the miRNA/miRNA* duplex is loaded into the RISC.
  • When the miRNA strand of the miRNA:miRNA* duplex is loaded into the RISC, the miRNA* may be removed and degraded. The strand of the miRNA:miRNA* duplex that is loaded into the RISC may be the strand whose 5′ end is less tightly paired. In cases where both ends of the miRNA:miRNA* have roughly equivalent 5′ pairing, both miRNA and miRNA* may have gene silencing activity.
  • The RISC may identify target nucleic acids based on high levels of complementarity between the miRNA and the mRNA, especially by nucleotides 2-7 of the miRNA. Only one case has been reported in animals where the interaction between the miRNA and its target was along the entire length of the miRNA. This was shown for miR-196 and Hox B8 and it was further shown that miR-196 mediates the cleavage of the Hox B8 mRNA (Yekta et al. Science 2004; 304:594-596). Otherwise, such interactions are known only in plants (Bartel & Bartel 2003; 132:709-717).
  • A number of studies have looked at the base-pairing requirement between miRNA and its mRNA target for achieving efficient inhibition of translation (reviewed by Bartel 2004; 116:281-297). In mammalian cells, the first 8 nucleotides of the miRNA may be important (Doench & Sharp Genes Dev 2004; 18:504-511). However, other parts of the microRNA may also participate in mRNA binding. Moreover, sufficient base pairing at the 3′ end can compensate for insufficient pairing at the 5′ end (Brennecke et al., PLoS Biol 2005; 3:e85). Computational studies analyzing miRNA binding on whole genomes have suggested a specific role for bases 2-7 at the 5′ end of the miRNA in target binding, but the role of the first nucleotide, usually found to be “A”, was also recognized (Lewis et al. Cell 2005; 120:15-20). Similarly, nucleotides 1-7 or 2-8 were used to identify and validate targets by Krek et al. (Nat Genet 2005; 37:495-500).
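  • The role of nucleotides 2-7 (or 1-7, or 2-8) of the miRNA in target recognition can be illustrated by a simple search for perfectly complementary sites in an mRNA. This is a minimal sketch, not the method of any of the cited studies: it requires exact Watson-Crick pairing over the chosen window, and the function names and sequences are made up.

```python
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _as_rna(seq):
    """Upper-case the sequence and write T as U."""
    return seq.upper().replace("T", "U")


def seed(mirna, start=2, end=7):
    """Return nucleotides start..end of the miRNA (1-based, inclusive, from the 5' end)."""
    return _as_rna(mirna)[start - 1:end]


def seed_match_positions(mirna, mrna, start=2, end=7):
    """Return 0-based mRNA positions where a site perfectly pairs with the seed."""
    # The mRNA site is the reverse complement of the miRNA seed.
    site = "".join(_COMPLEMENT[base] for base in reversed(seed(mirna, start, end)))
    mrna = _as_rna(mrna)
    positions = []
    index = mrna.find(site)
    while index != -1:
        positions.append(index)
        index = mrna.find(site, index + 1)  # allow overlapping matches
    return positions


mirna = "UACGGAUCCAGUUCGAAUGCAA"    # made-up 22-nt miRNA-like sequence
mrna = "GGCAUCCGUAAACCCAUCCGUGG"    # made-up mRNA fragment with two seed sites
sites_2_7 = seed_match_positions(mirna, mrna)          # window 2-7
sites_2_8 = seed_match_positions(mirna, mrna, 2, 8)    # stricter window 2-8
```

  • Widening the window from 2-7 to 2-8 makes the required site one nucleotide longer, so fewer positions qualify.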
  • The target sites in the mRNA may be in the 5′ UTR, the 3′ UTR or in the coding region. Interestingly, multiple miRNAs may regulate the same mRNA target by recognizing the same or multiple sites. The presence of multiple miRNA binding sites in most genetically identified targets may indicate that the cooperative action of multiple RISCs provides the most efficient translational inhibition.
  • miRNAs may direct the RISC to down-regulate gene expression by either of two mechanisms: mRNA cleavage or translational repression. The miRNA may specify cleavage of the mRNA if the mRNA has a certain degree of complementarity to the miRNA. When a miRNA guides cleavage, the cut may be between the nucleotides pairing to residues 10 and 11 of the miRNA. Alternatively, the miRNA may repress translation if the mRNA does not have the requisite degree of complementarity to the miRNA. Translational repression may be more prevalent in animals since animals may have a lower degree of complementarity between the miRNA and binding site.
  • It should be noted that there may be variability in the 5′ and 3′ ends of any pair of miRNA and miRNA*. This variability may be due to variability in the enzymatic processing of Drosha and Dicer with respect to the site of cleavage. Variability at the 5′ and 3′ ends of miRNA and miRNA* may also be due to mismatches in the stem structures of the pri-miRNA and pre-miRNA. The mismatches of the stem strands may lead to a population of different hairpin structures. Variability in the stem structures may also lead to variability in the products of cleavage by Drosha and Dicer.
  • 2. Nucleic Acids
  • Nucleic acids are provided herein. The nucleic acids comprise the sequences of SEQ ID NOS: 1-390 or variants thereof. The variant may be a complement of the referenced nucleotide sequence. The variant may also be a nucleotide sequence that is substantially identical to the referenced nucleotide sequence or the complement thereof. The variant may also be a nucleotide sequence which hybridizes under stringent conditions to the referenced nucleotide sequence, complements thereof, or nucleotide sequences substantially identical thereto.
  • The nucleic acid may have a length of from about 10 to about 250 nucleotides. The nucleic acid may have a length of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200 or 250 nucleotides. The nucleic acid may be synthesized or expressed in a cell (in vitro or in vivo) using a synthetic gene described herein. The nucleic acid may be synthesized as a single-strand molecule and hybridized to a substantially complementary nucleic acid to form a duplex. The nucleic acid may be introduced to a cell, tissue or organ in a single- or double-stranded form or capable of being expressed by a synthetic gene using methods well known to those skilled in the art, including as described in U.S. Pat. No. 6,506,559, which is incorporated herein by reference.
  • SEQ ID NOS: 1-34 are in accordance with Sanger database version 10; SEQ ID NOS: 35-390 are in accordance with Sanger database version 11.
  • TABLE 1
    SEQ ID NOS of sequences used in the invention
    miR name miR SEQ ID NO hairpin SEQ ID NO
    hsa-let-7c 1 70
    hsa-let-7e 2, 156 71
    hsa-miR-100 3 72
    hsa-miR-10a 4 73
    hsa-miR-10b 5 74
    hsa-miR-122 6 75
    hsa-miR-125a-5p 7 76
    hsa-miR-125b 8 77, 78
    hsa-miR-126 9 79
    hsa-miR-130a 10 80
    hsa-miR-138 11 81, 82
    hsa-miR-140-3p 12 83
    hsa-miR-141 13 84
    hsa-miR-143 14 85
    hsa-miR-145 15 86
    hsa-miR-146a 16 87
    hsa-miR-146b-5p 17 88
    hsa-miR-148a 18 89
    hsa-miR-149 19 90
    hsa-miR-17 20 91
    hsa-miR-181a 21 92, 93
    hsa-miR-181a* 22 92, 93
    hsa-miR-185 23 94
    hsa-miR-191 24 95
    hsa-miR-193a-3p 25 96
    hsa-miR-193a-5p 26 96
    hsa-miR-194 27 97, 98
    hsa-miR-200a 28 99
    hsa-miR-200b 29 100
    hsa-miR-200c 30 101
    hsa-miR-202 31 102
    hsa-miR-205 32 103
    hsa-miR-206 33 104
    hsa-miR-21 34 105
    hsa-miR-21* 35 105
    hsa-miR-210 36 106
    hsa-miR-214 37 107
    hsa-miR-214* 38 107
    hsa-miR-22 39 108
    hsa-miR-222 40 109
    hsa-miR-223 41 110
    hsa-miR-224 42 111
    hsa-miR-29a 43 112
    hsa-miR-29c 44, 191 113
    hsa-miR-29c* 45 113
    hsa-miR-30a 46 114
    hsa-miR-30d 47 115
    hsa-miR-30e 48 116
    hsa-miR-31 49 117
    hsa-miR-342-3p 50 118
    hsa-miR-345 51 119
    hsa-miR-34a 52 120
    hsa-miR-34c-5p 53 121
    hsa-miR-361-5p 54 122
    hsa-miR-372 55 123
    hsa-miR-375 56 124
    hsa-miR-378 57, 202 125
    hsa-miR-455-5p 58 126
    hsa-miR-487b 59 127
    hsa-miR-497 60, 208 128
    hsa-miR-509-3p 61 129, 130, 131
    hsa-miR-516a-5p 62, 211 132, 133
    hsa-miR-574-5p 63 134
    hsa-miR-652 64 135
    hsa-miR-7 65 136, 137, 138
    hsa-miR-9* 66 139, 140, 141
    hsa-miR-92a 67 142, 143
    hsa-miR-92b 68 144
    hsa-miR-934 69 145
    hsa-miR-1201 146 149
    hsa-miR-221 147 150
    hsa-miR-93 148 151
    hsa-miR-182 152
    hsa-let-7d 153
    hsa-miR-181b 154
    hsa-miR-127-3p 155
    hsa-let-7i 157
    hsa-miR-106a 158
    hsa-miR-124 159
    hsa-miR-1248 160
    hsa-miR-128 161
    hsa-miR-129-3p 162
    hsa-miR-1323 163
    hsa-miR-142-5p 164
    hsa-miR-143* 165
    hsa-miR-146b-3p 166
    hsa-miR-149* 167
    hsa-miR-150 168
    hsa-miR-152 169
    hsa-miR-155 170
    hsa-miR-15a 171
    hsa-miR-15b 172
    hsa-miR-181c 173
    hsa-miR-181d 174
    hsa-miR-183 175
    hsa-miR-18a 176
    hsa-miR-192 177
    hsa-miR-193b 178
    hsa-miR-195 179
    hsa-miR-1973 180
    hsa-miR-199a-3p 181
    hsa-miR-199a-5p 182
    hsa-miR-199b-5p 183
    hsa-miR-203 184
    hsa-miR-205* 185
    hsa-miR-20a 186
    hsa-miR-219-2-3p 187
    hsa-miR-25 188
    hsa-miR-27b 189
    hsa-miR-29b 190
    hsa-miR-302a 192
    hsa-miR-302a* 193
    hsa-miR-302d 194
    hsa-miR-30a* 195
    hsa-miR-30c 196
    hsa-miR-331-3p 197
    hsa-miR-342-5p 198
    hsa-miR-363 199
    hsa-miR-371-3p 200
    hsa-miR-371-5p 201
    hsa-miR-422a 203
    hsa-miR-425 204
    hsa-miR-451 205
    hsa-miR-455-3p 206
    hsa-miR-486-5p 207
    hsa-miR-498 209
    hsa-miR-512-5p 210
    hsa-miR-516b 212
    hsa-miR-517a 213
    hsa-miR-517c 214
    hsa-miR-518a-3p 215
    hsa-miR-518e 216
    hsa-miR-518f* 217
    hsa-miR-519a 218
    hsa-miR-519d 219
    hsa-miR-520a-5p 220
    hsa-miR-520c-3p 221
    hsa-miR-520d-5p 222
    hsa-miR-524-5p 223
    hsa-miR-527 224
    hsa-miR-551b 225
    hsa-miR-625 226
    hsa-miR-767-5p 227
    hsa-miR-886-3p 228
    hsa-miR-9 229
    hsa-miR-886-5p 230
    hsa-miR-99a 231
    hsa-miR-99a* 232
    hsa-miR-373 233
    hsa-miR-1977 234
    hsa-miR-1978 235
    MID-00689 236
    MID-15684 237, 369
    MID-15867 238
    MID-15907 239
    MID-15965 240
    MID-16318 241
    MID-16489 242
    MID-16869 243
    MID-17144 244
    MID-18336 245
    MID-18422 246
    MID-19340 247
    MID-19533 248
    MID-20524 249
    MID-20703 250
    MID-21271 251
    MID-22664 252
    MID-23256 253
    MID-23291 254
    MID-23794 255
    MID-00405 390
    hsa-let-7a 256
    hsa-let-7b 257
    hsa-let-7f 258
    hsa-let-7g 259
    hsa-miR-106b 260
    hsa-miR-1180 261
    hsa-miR-127-5p 262
    hsa-miR-129* 263
    hsa-miR-129-5p 264
    hsa-miR-130b 265
    hsa-miR-132 266
    hsa-miR-133a 267
    hsa-miR-133b 268
    hsa-miR-134 269
    hsa-miR-139-5p 270
    hsa-miR-140-5p 271
    hsa-miR-145* 272
    hsa-miR-148b 273
    hsa-miR-151-3p 274
    hsa-miR-154 275
    hsa-miR-154* 276
    hsa-miR-17* 277
    hsa-miR-181a-2* 278
    hsa-miR-1826 279
    hsa-miR-187 280
    hsa-miR-188-5p 281
    hsa-miR-196a 282
    hsa-miR-1979 283
    hsa-miR-19b 284
    hsa-miR-20b 285
    hsa-miR-216a 286
    hsa-miR-216b 287
    hsa-miR-217 288
    hsa-miR-22* 289
    hsa-miR-221* 290
    hsa-miR-222* 291
    hsa-miR-23a 292
    hsa-miR-23b 293
    hsa-miR-24 294
    hsa-miR-26a 295
    hsa-miR-26b 296
    hsa-miR-27a 297
    hsa-miR-28-3p 298
    hsa-miR-296-5p 299
    hsa-miR-299-3p 300
    hsa-miR-29b-2* 301
    hsa-miR-301a 302
    hsa-miR-30b 303
    hsa-miR-30e* 304
    hsa-miR-31* 305
    hsa-miR-323-3p 306
    hsa-miR-324-5p 307
    hsa-miR-328 308
    hsa-miR-329 309
    hsa-miR-330-3p 310
    hsa-miR-335 311
    hsa-miR-337-5p 312
    hsa-miR-338-3p 313
    hsa-miR-361-3p 314
    hsa-miR-362-3p 315
    hsa-miR-362-5p 316
    hsa-miR-369-5p 317
    hsa-miR-370 318
    hsa-miR-376a 319
    hsa-miR-376c 320
    hsa-miR-377* 321
    hsa-miR-379 322
    hsa-miR-381 323
    hsa-miR-382 324
    hsa-miR-409-3p 325
    hsa-miR-409-5p 326
    hsa-miR-410 327
    hsa-miR-411 328
    hsa-miR-425* 329
    hsa-miR-431* 330
    hsa-miR-432 331
    hsa-miR-433 332
    hsa-miR-483-3p 333
    hsa-miR-483-5p 334
    hsa-miR-485-3p 335
    hsa-miR-485-5p 336
    hsa-miR-487a 337
    hsa-miR-494 338
    hsa-miR-495 339
    hsa-miR-500 340
    hsa-miR-500* 341
    hsa-miR-501-3p 342
    hsa-miR-502-3p 343
    hsa-miR-503 344
    hsa-miR-506 345
    hsa-miR-509-3-5p 346
    hsa-miR-513a-5p 347
    hsa-miR-532-3p 348
    hsa-miR-532-5p 349
    hsa-miR-539 350
    hsa-miR-542-5p 351
    hsa-miR-543 352
    hsa-miR-598 353
    hsa-miR-612 354
    hsa-miR-654-3p 355
    hsa-miR-658 356
    hsa-miR-660 357
    hsa-miR-665 358
    hsa-miR-708 359
    hsa-miR-873 360
    hsa-miR-874 361
    hsa-miR-891a 362
    hsa-miR-99b 363
    MID-00064 364
    MID-00078 365
    MID-00144 366
    MID-00465 367
    MID-00672 368
    MID-15986 370
    MID-16270 371
    MID-16469 372
    MID-16582 373
    MID-16748 374
    MID-17356 (3651) 389
    MID-17375 375
    MID-17576 376
    MID-17866 377
    MID-18307 378
    MID-18395 379
    MID-19898 380
    MID-19962 381
    MID-22331 382
    MID-22912 383
    MID-23017 384
    MID-23168 385
    MID-23178 386
    MID-23751 387
    hsa-miR-423-5p 388
  • 3. Nucleic Acid Complexes
  • The nucleic acid may further comprise one or more of the following: a peptide, a protein, an RNA-DNA hybrid, an antibody, an antibody fragment, a Fab fragment, and an aptamer.
  • 4. Pri-miRNA
  • The nucleic acid may comprise a sequence of a pri-miRNA or a variant thereof. The pri-miRNA sequence may comprise from 45-30,000, 50-25,000, 100-20,000, 1,000-1,500 or 80-100 nucleotides. The sequence of the pri-miRNA may comprise a pre-miRNA, miRNA and miRNA*, as set forth herein, and variants thereof. The sequence of the pri-miRNA may comprise any of the sequences of SEQ ID NOS: 1-390 or variants thereof.
  • The pri-miRNA may comprise a hairpin structure. The hairpin may comprise a first and a second nucleic acid sequence that are substantially complementary. The first and second nucleic acid sequence may be from 37-50 nucleotides. The first and second nucleic acid sequence may be separated by a third sequence of from 8-12 nucleotides. The hairpin structure may have a free energy of less than −25 kcal/mole, as calculated by the Vienna algorithm with default parameters, as described in Hofacker et al. (Monatshefte f. Chemie 1994; 125:167-188), the contents of which are incorporated herein by reference. The hairpin may comprise a terminal loop of 4-20, 8-12 or 10 nucleotides. The pri-miRNA may comprise at least 19% adenosine nucleotides, at least 16% cytosine nucleotides, at least 23% thymine nucleotides and at least 19% guanine nucleotides.
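  • The nucleotide-composition criterion above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, uracil in an RNA sequence is counted as thymine, and the free-energy criterion is not evaluated because it would require an RNA folding program.

```python
# Minimum fractions recited in the text (A, C, T, G).
MIN_FRACTION = {"A": 0.19, "C": 0.16, "T": 0.23, "G": 0.19}


def base_fractions(sequence):
    """Return the fraction of A, C, G and T in the sequence (U is counted as T)."""
    sequence = sequence.upper().replace("U", "T")
    if not sequence:
        raise ValueError("sequence must not be empty")
    return {base: sequence.count(base) / len(sequence) for base in "ACGT"}


def meets_composition(sequence):
    """True if every base reaches its minimum fraction."""
    fractions = base_fractions(sequence)
    return all(fractions[base] >= minimum for base, minimum in MIN_FRACTION.items())


balanced = "ACGU" * 25                                  # 25% of each base
u_poor = "A" * 50 + "C" * 20 + "G" * 20 + "U" * 10      # only 10% U/T
balanced_ok = meets_composition(balanced)
u_poor_ok = meets_composition(u_poor)
```

  • Because the four minima sum to 77%, a sequence can satisfy the criterion while still being moderately biased toward any one base.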
  • 5. Pre-miRNA
  • The nucleic acid may also comprise a sequence of a pre-miRNA or a variant thereof. The pre-miRNA sequence may comprise from 45-90, 60-80 or 60-70 nucleotides. The sequence of the pre-miRNA may comprise a miRNA and a miRNA* as set forth herein. The sequence of the pre-miRNA may also be that of a pri-miRNA excluding from 0-160 nucleotides from the 5′ and 3′ ends of the pri-miRNA. The sequence of the pre-miRNA may comprise the sequence of SEQ ID NOS: 1-390 or variants thereof.
  • 6. miRNA
  • The nucleic acid may also comprise a sequence of a miRNA (including miRNA*) or a variant thereof. The miRNA sequence may comprise from 13-33, 18-24 or 21-23 nucleotides. The miRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleotides. The sequence of the miRNA may be the first 13-33 nucleotides of the pre-miRNA. The sequence of the miRNA may also be the last 13-33 nucleotides of the pre-miRNA. The sequence of the miRNA may comprise the sequence of SEQ ID NOS: 1-69, 146-148, 152-390 or variants thereof.
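  • The statement that the miRNA may be the first or the last 13-33 nucleotides of the pre-miRNA can be expressed as a simple slicing operation. The function name and the example sequence below are hypothetical, and the sketch does not predict which arm or which exact length is used in a cell.

```python
def candidate_mirnas(pre_mirna, length=22):
    """Return the (5' end, 3' end) candidate miRNA sequences of a given length.

    length must lie in the 13-33 nucleotide range recited in the text.
    """
    if not 13 <= length <= 33:
        raise ValueError("length must be between 13 and 33 nucleotides")
    if len(pre_mirna) < length:
        raise ValueError("pre-miRNA is shorter than the requested length")
    return pre_mirna[:length], pre_mirna[-length:]


pre_mirna = "ACGU" * 15  # made-up 60-nt sequence, not a real hairpin
five_prime, three_prime = candidate_mirnas(pre_mirna, length=22)
```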
  • 7. Probes
  • A probe comprising a nucleic acid described herein is also provided. Probes may be used for screening and diagnostic methods, as outlined below. The probe may be attached or immobilized to a solid substrate, such as a biochip.
  • The probe may have a length of from 8 to 500, 10 to 100 or 20 to 60 nucleotides. The probe may also have a length of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280 or 300 nucleotides. The probe may further comprise a linker sequence of from 10-60 nucleotides. The probe may comprise a nucleic acid that is complementary to a sequence selected from the group consisting of SEQ ID NOS: 1-390 or variants thereof.
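  • A probe complementary to a given sequence can be derived by taking the reverse complement, as in the following minimal sketch. Several points are assumptions made for illustration: the probe is written as a DNA oligonucleotide, the linker is appended at the 3′ end, the 8-500 nucleotide limit is applied to the complementary portion only, and the function name and sequences are made up.

```python
_DNA_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def complementary_probe(target, linker=""):
    """Return a DNA probe whose core is the reverse complement of `target`.

    An optional linker of 10-60 nucleotides is appended after the core.
    """
    core = "".join(_DNA_COMPLEMENT[base] for base in reversed(target.upper()))
    if not 8 <= len(core) <= 500:
        raise ValueError("complementary portion must be 8 to 500 nucleotides")
    if linker and not 10 <= len(linker) <= 60:
        raise ValueError("linker must be 10 to 60 nucleotides")
    return core + linker.upper()


target = "UACGGAUCCAGUUCGAAUGCAA"   # made-up 22-nt RNA target
probe = complementary_probe(target)
probe_with_linker = complementary_probe(target, linker="T" * 10)
```

  • Accepting both U and T in the input lets the same function be used for RNA targets and for their DNA representations.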
  • 8. Biochip
  • A biochip is also provided. The biochip may comprise a solid substrate comprising an attached probe or plurality of probes described herein. The probes may be capable of hybridizing to a target sequence under stringent hybridization conditions. The probes may be attached at spatially defined addresses on the substrate. More than one probe per target sequence may be used, with either overlapping probes or probes to different sections of a particular target sequence. The probes may be capable of hybridizing to target sequences associated with a single disorder appreciated by those in the art. The probes may either be synthesized first, with subsequent attachment to the biochip, or may be directly synthesized on the biochip.
  • The solid substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the probes and is amenable to at least one detection method. Representative examples of substrates include glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. The substrates may allow optical detection without appreciably fluorescing.
  • The substrate may be planar, although other configurations of substrates may be used as well. For example, probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as flexible foam, including closed cell foams made of particular plastics.
  • The biochip and the probe may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the biochip may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the probes may be attached using functional groups on the probes either directly or indirectly using a linker. The probes may be attached to the solid support by either the 5′ terminus, 3′ terminus, or via an internal nucleotide.
  • The probe may also be attached to the solid support non-covalently. For example, biotinylated oligonucleotides can be made, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, probes may be synthesized on the surface using techniques such as photopolymerization and photolithography.
  • 9. Diagnostics
  • As used herein, the term “diagnosing” refers to classifying pathology, or a symptom, determining a severity of the pathology (e.g., grade or stage), monitoring pathology progression, forecasting an outcome of pathology and/or prospects of recovery.
  • As used herein, the phrase “subject in need thereof” refers to an animal or human subject who is known to have cancer, at risk of having cancer (e.g., a genetically predisposed subject, a subject with medical and/or family history of cancer, a subject who has been exposed to carcinogens, occupational hazard, environmental hazard) and/or a subject who exhibits suspicious clinical signs of cancer (e.g., blood in the stool or melena, unexplained pain, sweating, unexplained fever, unexplained loss of weight up to anorexia, changes in bowel habits (constipation and/or diarrhea), tenesmus (sense of incomplete defecation, for rectal cancer specifically), anemia and/or general weakness). Additionally or alternatively, the subject in need thereof can be a healthy human subject undergoing a routine well-being check up.
  • Analyzing presence of malignant or pre-malignant cells can be effected in vivo or ex vivo, whereby a biological sample (e.g., biopsy, blood) is retrieved. Such biopsy samples comprise cells and may be an incisional or excisional biopsy. Alternatively, the cells may be retrieved from a complete resection.
  • While employing the present teachings, additional information may be gleaned pertaining to the determination of treatment regimen, treatment course and/or to the measurement of the severity of the disease.
  • As used herein the phrase “treatment regimen” refers to a treatment plan that specifies the type of treatment, dosage, follow-up plans, schedule and/or duration of a treatment provided to a subject in need thereof (e.g., a subject diagnosed with a pathology). The selected treatment regimen can be an aggressive one which is expected to result in the best clinical outcome (e.g., complete cure of the pathology) or a more moderate one which may relieve symptoms of the pathology yet results in incomplete cure of the pathology. It will be appreciated that in certain cases the treatment regimen may be associated with some discomfort to the subject or adverse side effects (e.g., damage to healthy cells or tissue). The type of treatment can include a surgical intervention (e.g., removal of lesion, diseased cells, tissue, or organ), a cell replacement therapy, an administration of a therapeutic drug (e.g., receptor agonists, antagonists, hormones, chemotherapy agents) in a local or a systemic mode, an exposure to radiation therapy using an external source (e.g., external beam) and/or an internal source (e.g., brachytherapy) and/or any combination thereof. The dosage, schedule and duration of treatment can vary, depending on the severity of pathology and the selected type of treatment, and those of skill in the art are capable of adjusting the type of treatment with the dosage, schedule and duration of treatment.
  • A method of diagnosis is also provided. The method comprises detecting an expression level of a specific cancer-associated nucleic acid in a biological sample. The sample may be derived from a patient. Diagnosis of a specific cancer state in a patient may allow for prognosis and selection of therapeutic strategy. Further, the developmental stage of cells may be classified by determining temporally expressed specific cancer-associated nucleic acids.
  • In situ hybridization of labeled probes to tissue arrays may be performed. When comparing the fingerprints between individual samples the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It is further understood that the nucleic acid sequences which indicate the diagnosis may differ from those which indicate the prognosis and molecular profiling of the condition of the cells or exosomes may lead to distinctions between responsive or refractory conditions or may be predictive of outcomes.
  • 10. Kits
  • A kit is also provided and may comprise a nucleic acid described herein together with any or all of the following: assay reagents, buffers, probes and/or primers, and sterile saline or another pharmaceutically acceptable emulsion and suspension base. In addition, the kits may include instructional materials containing directions (e.g., protocols) for the practice of the methods described herein. The kit may further comprise a software package for data analysis of expression profiles.
  • For example, the kit may be a kit for the amplification, detection, identification or quantification of a target nucleic acid sequence. The kit may comprise a poly (T) primer, a forward primer, a reverse primer, and a probe.
  • Any of the compositions described herein may be comprised in a kit. In a non-limiting example, reagents for isolating miRNA, labeling miRNA, and/or evaluating a miRNA population using an array are included in a kit. The kit may further include reagents for creating or synthesizing miRNA probes. The kits will thus comprise, in suitable container means, an enzyme for labeling the miRNA by incorporating labeled nucleotide or unlabeled nucleotides that are subsequently labeled. It may also include one or more buffers, such as reaction buffer, labeling buffer, washing buffer, or a hybridization buffer, compounds for preparing the miRNA probes, components for in situ hybridization and components for isolating miRNA. Other kits of the invention may include components for making a nucleic acid array comprising miRNA, and thus may include, for example, a solid support.
  • The following examples are presented in order to more fully illustrate some embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.
  • EXAMPLES
  • Methods
  • 1. Tumor Samples
  • 1300 primary and metastatic formalin-fixed paraffin-embedded (FFPE) tumor samples were used in the study. Tumor samples were obtained from several sources. Institutional review approvals were obtained for all samples in accordance with each institute's institutional review board or IRB equivalent guidelines. Samples included primary tumors and metastases of defined origins, according to clinical records. Tumor content was at least 50% for >95% of samples, as determined by a pathologist based on hematoxylin-eosin (H&E) stained slides.
  • 2. RNA Extraction
  • For FFPE samples, total RNA was isolated from seven to ten 10-μm-thick tissue sections using the miR extraction protocol developed at Rosetta Genomics. Briefly, the sample was incubated a few times in xylene at 57° C. to remove excess paraffin, followed by ethanol washes. Proteins were degraded by proteinase K solution at 45° C. for a few hours. The RNA was extracted with acid phenol:chloroform followed by ethanol precipitation and DNAse digestion. Total RNA quantity and quality were checked by spectrophotometer (Nanodrop ND-1000).
  • 3. miR Array Platform
  • Custom microarrays (Agilent Technologies, Santa Clara, Calif.) were produced by printing DNA oligonucleotide probes to: 982 miR sequences, 17 negative controls, 23 spikes, and 10 positive controls (total of 1032 probes). Each probe, printed in triplicate, carried a linker of up to 28 nucleotides (nt) at the 3′ end of the microRNA's complement sequence. The 17 negative control probes were designed using sequences which do not match the genome. Two groups of positive control probes were designed to hybridize to the miR array: (i) synthetic small RNAs were spiked into the RNA before labeling to verify the labeling efficiency; and (ii) probes for abundant small RNAs (e.g., small nuclear RNAs (U43, U24, Z30, U6, U48, U44), 5.8s and 5s ribosomal RNA) were spotted on the array to verify RNA quality.
  • 4. Cy-Dye Labeling of miRNA for miR Array
  • One μg of total RNA was labeled by ligation (Thomson et al. Nature Methods 2004; 1:47-53) of an RNA-linker, p-rCrU-Cy/dye (Eurogentec or equivalent), to the 3′ end with Cy3 or Cy5. The labeling reaction contained total RNA, spikes (0.1-100 fmoles), 400 ng RNA-linker-dye, 15% DMSO, 1× ligase buffer and 20 units of T4 RNA ligase (NEB), and proceeded at 4° C. for 1 h, followed by 1 h at 37° C., followed by 4° C. up to 40 min.
  • The labeled RNA was mixed with 30 μL of hybridization mixture (a mixture of 45 μL of the 10× GE Agilent Blocking Agent and 246 μL of 2× Hi-RPM Hybridization). The labeling mixture was incubated at 100° C. for 5 minutes, followed by incubation in an ice-water bath for 5 minutes. Slides were hybridized at 54° C. for 16-20 hours, followed by two washes. The first wash was conducted at room temperature with Agilent GE Wash Buffer 1 for 5 min, followed by a second wash with Agilent GE Wash Buffer 2 at 37° C. for 5 min.
  • Arrays were scanned using an Agilent Microarray Scanner Bundle G2565BA (resolution of 5 μm at XDR Hi 100%, XDR Lo 5%). Array images were analyzed using Feature Extraction 10.7 software (Agilent).
  • 5. Array Signal Calculation and Normalization
  • Triplicate spots were combined to produce one signal for each probe by taking the logarithmic mean of reliable spots. All data were log 2-transformed and the analysis was performed in log 2-space. A reference data vector for normalization R was calculated by taking the median expression level for each probe across all samples. For each sample data vector S, a 2nd degree polynomial F was found so as to provide the best fit between the sample data and the reference data, such that R≈F(S). Remote data points (“outliers”) were not used for fitting the polynomial F. For each probe in the sample (element Si in the vector S), the normalized value (in log-space) Mi was calculated from the initial value Si by transforming it with the polynomial function F, so that Mi=F(Si).
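  • The signal calculation and normalization described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, "logarithmic mean" is assumed to mean the average of the log2 signals of the reliable spots, and the outlier exclusion step is omitted because the text does not say how outliers were identified.

```python
import math
from statistics import median

def combine_triplicate(spots):
    """Assumed reading of 'logarithmic mean': average of the log2 spot signals."""
    return sum(math.log2(s) for s in spots) / len(spots)

def reference_vector(samples):
    """Reference R: median per probe across all samples (lists of equal length)."""
    return [median(col) for col in zip(*samples)]

def fit_quadratic(x, y):
    """Least-squares fit y ~ a*x^2 + b*x + c using the 3x3 normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix for the unknowns (c, b, a).
    A = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (A[r][n] - tail) / A[r][r]
    c, b, a = coef
    return a, b, c

def normalize(sample, reference):
    """Fit F so that R ~ F(S), then return M_i = F(S_i) for every probe."""
    a, b, c = fit_quadratic(sample, reference)
    return [a * v * v + b * v + c for v in sample]
```

  • Here `sample` and `reference` are log2-space vectors with one entry per probe; a production version would first drop the remote data points before fitting.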
  • 6. Logistic Regression
  • The aim of a logistic regression model is to use several features, such as expression levels of several microRNAs, to assign a probability of belonging to one of two possible groups, such as two branches of a node in a binary decision-tree. Logistic regression models the natural log of the odds ratio, i.e., the ratio of the probability of belonging to the first group, for example, the left branch in a node of a binary decision-tree (P) over the probability of belonging to the second group, for example, the right branch in such a node (1-P), as a linear combination of the different expression levels (in log-space). The logistic regression assumes that:
  • $$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{i=1}^{N} \beta_i \cdot M_i = \beta_0 + \beta_1 \cdot M_1 + \beta_2 \cdot M_2 + \cdots,$$
  • where β0 is the bias, Mi is the expression level (normalized, in log 2-space) of the i-th microRNA used in the decision node, and βi is its corresponding coefficient. βi>0 indicates that the probability to take the left branch (P) increases when the expression level of this microRNA (Mi) increases, and the opposite for βi<0. If a node uses only a single microRNA (M), then solving for P results in:
  • $$P = \frac{e^{\beta_0 + \beta_1 \cdot M}}{1 + e^{\beta_0 + \beta_1 \cdot M}}.$$
  • The regression error on each sample is the difference between the assigned probability P and the true “probability” of this sample, i.e., 1 if this sample is in the left branch group and 0 otherwise. The training and optimization of the logistic regression model calculates the parameters β and the p-values [for each microRNA by the Wald statistic and for the overall model by the χ2 (chi-square) difference], maximizing the likelihood of the data given the model and minimizing the total regression error
  • $$\sum_{j \,\in\, \text{first group}} (1 - P_j) \;+ \sum_{j \,\in\, \text{second group}} P_j.$$
  • The probability output of the logistic model is here converted to a binary decision by comparing P to a threshold, denoted by PTH, i.e., if P≥PTH then the sample belongs to the left branch (“first group”), and otherwise to the right branch (“second group”). Choosing at each node the branch which has a probability >0.5, i.e., using a probability threshold of 0.5, leads to a minimization of the sum of the regression errors. However, as the goal was the minimization of the overall number of misclassifications (and not of their probability), a modification which adjusts the probability threshold (PTH) was used in order to minimize the overall number of mistakes at each node (Table 2). For each node, the threshold was optimized to a new probability threshold PTH such that the number of classification errors was minimized. This change of probability threshold is equivalent (in terms of classifications) to a modification of the bias β0, which may reflect a change in the prior frequencies of the classes. Once the threshold was chosen, β0 was modified such that the threshold was shifted back to 0.5. In addition, β0, β1, β2, . . . were adjusted so that the slope of the log of the odds ratio function is limited.
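  • As an illustration of the node model and the threshold adjustment, the sketch below computes P for a node, the total regression error, and the probability threshold that minimizes misclassifications, and then shifts the bias so that the chosen threshold maps back to 0.5. All function names are hypothetical, and scanning the observed probabilities as candidate thresholds is an assumption; the text does not state how the optimization was carried out. The bias shift follows from P≥PTH being equivalent to ln(P/(1-P))≥ln(PTH/(1-PTH)).

```python
import math

def logistic_probability(beta0, betas, levels):
    """P(left branch) from ln(P/(1-P)) = beta0 + sum(beta_i * M_i)."""
    z = beta0 + sum(b * m for b, m in zip(betas, levels))
    return 1.0 / (1.0 + math.exp(-z))  # same as e^z / (1 + e^z)

def total_regression_error(probs, in_first_group):
    """Sum of (1 - P_j) over the first group plus P_j over the second group."""
    return sum((1.0 - p) if first else p
               for p, first in zip(probs, in_first_group))

def count_errors(probs, in_first_group, p_th):
    """Number of samples whose decision (P >= p_th) disagrees with the truth."""
    return sum((p >= p_th) != first
               for p, first in zip(probs, in_first_group))

def best_threshold(probs, in_first_group):
    """Assumed search: try each observed probability as the threshold."""
    candidates = sorted(set(probs))
    return min(candidates,
               key=lambda t: count_errors(probs, in_first_group, t))

def shift_bias(beta0, p_th):
    """Bias that gives, at threshold 0.5, the same decisions as beta0 at p_th."""
    return beta0 - math.log(p_th / (1.0 - p_th))
```

  • With the shifted bias, every node can be evaluated against the same fixed threshold of 0.5 while keeping the decisions found by the optimization.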
  • 7. Stepwise Logistic Regression and Feature Selection
  • The original data contain the expression levels of multiple microRNAs for each sample, i.e., multiple data features. In training the classifier for each node, only a small subset of these features was selected and used for optimizing a logistic regression model. In the initial training this was done using a forward stepwise scheme. The features were sorted in order of decreasing log-likelihoods, and the logistic model was started off and optimized with the first feature. The second feature was then added, and the model re-optimized. The regression error of the two models was compared: if the addition of the feature did not provide a significant advantage (a χ2 difference less than 7.88, p-value of 0.005), the new feature was discarded. Otherwise, the added feature was kept. Adding a new feature may make a previous feature redundant (e.g., if they are very highly correlated). To check for this, the process iteratively checks if the feature with lowest likelihood can be discarded (without losing χ2 difference as above). After ensuring that the current set of features is compact in this sense, the process continues to test the next feature in the sorted list, until features are exhausted. No limitation on the number of features was inserted into the algorithm.
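  • The acceptance test for an added feature can be illustrated as follows. This sketch assumes that the χ2 difference is the likelihood-ratio statistic, i.e., twice the gain in log-likelihood between the nested models, with one degree of freedom for the single added feature; the function names are hypothetical. It also checks that a χ2 value of 7.88 corresponds to a p-value of about 0.005.

```python
import math

CHI2_THRESHOLD = 7.88  # value quoted in the text (p-value of 0.005)

def chi2_pvalue_1df(x):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def keep_added_feature(loglik_without, loglik_with, threshold=CHI2_THRESHOLD):
    """Keep the new feature only if the chi-square difference reaches the threshold.

    Assumption: chi-square difference = 2 * (log-likelihood gain).
    """
    chi2_diff = 2.0 * (loglik_with - loglik_without)
    return chi2_diff >= threshold
```

  • For example, a log-likelihood gain of 5.0 gives a χ2 difference of 10.0 and the feature is kept, whereas a gain of 3.0 gives 6.0 and the feature is discarded.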
  • The stepwise logistic regression method was used on subsets of the training set samples by re-sampling the training set with repetition (“bootstrap”), so that each of the 20 runs contained a somewhat different training set. All the features that took part in one of the 20 models were collected. A robust set of 1-3 features per node was selected by comparing features that were repeatedly chosen in the bootstrap sets to previous evidence, and considering their signal strengths and reliability. When using these selected features to construct the classifier, the stepwise process was not used and the training optimized the logistic regression model parameters only.
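  • The bootstrap step can be sketched as below. The callable `select_features` is a hypothetical stand-in for the stepwise procedure described above; the sketch only shows the resampling with repetition and the tally of how often each feature is chosen across the 20 runs. The final choice of 1-3 features per node also relied on prior evidence and signal quality, which is not modeled here.

```python
import random
from collections import Counter

def bootstrap_feature_counts(samples, select_features, n_runs=20, seed=0):
    """Count in how many bootstrap runs each feature is selected.

    samples: list of training samples (any objects).
    select_features: hypothetical callable returning the features chosen
        for one resampled training set.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        # Resample the training set with repetition, keeping its size.
        resampled = [rng.choice(samples) for _ in samples]
        counts.update(set(select_features(resampled)))
    return counts
```

  • Features with high counts are the candidates that were "repeatedly chosen in the bootstrap sets".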
  • 8. K-Nearest-Neighbors (KNN) Classification Algorithm
  • The KNN algorithm (see e.g., Ma et al., Arch Pathol Lab Med 2006; 130:465-73) calculates the distance (Pearson correlation) of any sample to all samples in the training set, and classifies the sample by the majority vote of the k samples which are most similar (k being a parameter of the classifier). The correlation is calculated on the pre-defined set of microRNAs (the microRNAs that were used by the decision-tree). KNN algorithms with k=1; 10 were compared, and the optimal performer was selected, using k=5. The KNN was based on comparing the expression of all 65 microRNAs in each sample to all other samples in the training database.
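  • A minimal sketch of such a KNN classifier is given below, assuming that similarity is ranked by the Pearson correlation (highest correlation = nearest) and that ties in the vote are broken arbitrarily. The function names and the data layout are hypothetical. The function also returns the number of agreeing neighbors, which corresponds to the confidence measure V.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def knn_classify(sample, training, k=5):
    """Classify by majority vote of the k most correlated training samples.

    training: list of (expression_vector, label) pairs.
    Returns (label, V), where V is the number of the k neighbors that agree.
    """
    ranked = sorted(training,
                    key=lambda item: pearson(sample, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    label, v = votes.most_common(1)[0]
    return label, v
```

  • In this layout each expression vector would hold the normalized log2 signals of the pre-defined microRNA set, in the same order for every sample.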
  • 9. Reporting a Final Answer (Prediction):
  • The decision-tree and KNN each return a predicted tissue of origin and histological type where applicable. The tissue of origin and histological type may be one of the exact origins and types in the training or a variant thereof. For example, whereas the training includes brain oligodendroglioma and brain astrocytoma, the answer may simply be brain carcinoma. In addition to the tissue of origin and histological type, the KNN and decision-tree each return a confidence measure. The KNN returns the number of samples within the K nearest neighbors that agreed with the answer reported by the KNN (denoted by V), and the decision-tree returns the probability of the result (P), which is the multiplication of the probabilities at each branch point made on the way to that answer. The classifier returns either the two different predictions or a single prediction. A single prediction is returned if the predictions concur, if they can be unified into a single answer (for example into the prediction brain if the KNN returned brain oligodendroglioma and the decision-tree brain astrocytoma), or if, based on V and P, one answer is chosen to override the other.
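  • The decision-tree confidence measure P can be illustrated with a toy tree. Everything numeric in this sketch is hypothetical: the coefficients, the tree shape, and the assignment of classes to the left and right branches are invented for illustration and are not the trained classifier. The override rule based on V and P is not specified in the text and is therefore not implemented; the `report` helper only covers the case in which the two predictions concur.

```python
import math

# Hypothetical two-node tree; beta values are illustrative only.
TOY_TREE = {
    "beta0": -10.0, "betas": {"hsa-miR-372": 1.0},
    "left": "germ cell",
    "right": {
        "beta0": -10.0, "betas": {"hsa-miR-122": 1.0},
        "left": "hepatobiliary",
        "right": "other",
    },
}

def node_probability(node, expr):
    """P(left branch) at one node from the logistic model."""
    z = node["beta0"] + sum(b * expr[m] for m, b in node["betas"].items())
    return 1.0 / (1.0 + math.exp(-z))

def classify(tree, expr):
    """Walk the tree; return (class label, product of branch probabilities)."""
    node, path_p = tree, 1.0
    while isinstance(node, dict):
        p_left = node_probability(node, expr)
        if p_left >= 0.5:
            node, path_p = node["left"], path_p * p_left
        else:
            node, path_p = node["right"], path_p * (1.0 - p_left)
    return node, path_p

def report(tree_answer, knn_answer):
    """Single answer if the two predictions concur, otherwise both."""
    if tree_answer == knn_answer:
        return [tree_answer]
    return [tree_answer, knn_answer]
```

  • For a sample with log2 expression of 5.6 for hsa-miR-372 and 16.6 for hsa-miR-122, this toy tree takes the right branch at the first node and the left branch at the second, and P is the product of the two branch probabilities.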
  • Example 1 Decision-Tree Classification Algorithm
  • A tumor classifier was built using the microRNA expression levels by applying a binary tree classification scheme (FIGS. 1A-F). This framework is set up to utilize the specificity of microRNAs in tissue differentiation and embryogenesis: different microRNAs are involved in various stages of tissue specification, and are used by the algorithm at different decision points or “nodes”. The tree breaks up the complex multi-tissue classification problem into a set of simpler binary decisions. At each node, classes which branch out earlier in the tree are not considered, reducing interference from irrelevant samples and further simplifying the decision. The decision at each node can then be accomplished using only a small number of microRNA biomarkers, which have well-defined roles in the classification (Table 2). The structure of the binary tree was based on a hierarchy of tissue development and morphological similarity18, which was modified by prominent features of the microRNA expression patterns. For example, the expression patterns of microRNAs indicated a significant difference between germ cell tumors and tumors of non-germ cell origin, and these are therefore distinguished at node #1 (FIG. 2) into separate branches (FIG. 1A).
  • For each of the individual nodes logistic regression models were used, a robust family of classifiers which are frequently used in epidemiological and clinical studies to combine continuous data features into a binary decision (FIGS. 2-25 and Methods). Since gene expression classifiers have an inherent redundancy in selecting the gene features, bootstrapping was used on the training sample set as a method to select a stable microRNA set for each node (Methods). This resulted in a small number (usually 2-3) of microRNA features per node, totaling 65 microRNAs for the full classifier (Table 2). This approach provides a systematic process for identifying new biomarkers for differential expression.
  • TABLE 2
    microRNAs used per class in the tree classifier
    miR List: Class
    hsa-miR-372 (SEQ ID NO: 55) Germ cell cancer
    hsa-miR-372, hsa-miR-122 (SEQ ID NO: 6), hsa-miR-126 (SEQ ID Biliary tract
    NO: 9), hsa-miR-200b (SEQ ID NO: 29) adenocarcinoma
    hsa-miR-372, hsa-miR-122, hsa-miR-126, hsa-miR-200b Hepatocellular
    carcinoma (HCC)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c (SEQ ID NO: 30), hsa-miR- Brain tumor
    30a (SEQ ID NO: 46), hsa-miR-146a (SEQ ID NO: 16), hsa-let-7e (SEQ
    ID NO: 156), hsa-miR-9* (SEQ ID NO: 66), hsa-miR-92b (SEQ ID
    NO: 68)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Brain -
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-222 (SEQ ID oligodendroglioma
    NO: 40), hsa-miR-497 (SEQ ID NO: 60)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Brain - astrocytoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-222, hsa-miR-497
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375 Prostate
    (SEQ ID NO: 56), hsa-miR-7 (SEQ ID NO: 65), hsa-miR-193a-3p (SEQ Adenocarcinoma
    ID NO: 25), hsa-miR-194 (SEQ ID NO: 27), hsa-miR-21* (SEQ ID
    NO: 35), hsa-miR-143 (SEQ ID NO: 14), hsa-miR-181a (SEQ ID
    NO: 21)
    hsa-miR-372 Ovarian primitive
    germ cell tumor
    hsa-miR-372 Testis
    hsa-miR-372, hsa-miR-200b, hsa-miR-516a-5p (SEQ ID NO: 62) Seminomatous
    testicular germ cell
    tumor
    hsa-miR-372, hsa-miR-200b, hsa-miR-516a-5p Non seminomatous
    testicular germ cell
    tumor
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-7, Breast
    hsa-miR-194, hsa-miR-21*, hsa-miR-143, hsa-miR-181a, hsa-miR-205 adenocarcinoma
    (SEQ ID NO: 32), hsa-miR-345 (SEQ ID NO: 51), hsa-miR-125a-5p
    (SEQ ID NO: 7), hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-375, hsa-
    miR-342-3p (SEQ ID NO: 50)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-7, Ovarian carcinoma
    hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-143, hsa-miR-
    181a, hsa-miR-345, hsa-miR-125a-5p, hsa-miR-193a-3p, hsa-miR-375,
    hsa-miR-342-3p, hsa-miR-205 (SEQ ID NO: 32), hsa-miR-10a (SEQ ID
    NO: 4), hsa-miR-22 (SEQ ID NO: 39)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Thyroid carcinoma
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-138 (SEQ ID NO: 11), hsa-miR-93 (SEQ ID NO: 148), hsa-miR-
    10a (SEQ ID NO: 4)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Thyroid carcinoma
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- follicular
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-138, hsa-miR-93, hsa-miR-10a, hsa-miR-146b-5p (SEQ ID
    NO: 17), hsa-miR-21 (SEQ ID NO: 34)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Thyroid carcinoma
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- papillary
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-138, hsa-miR-93, hsa-miR-10a, hsa-miR-146b-5p, hsa-miR-21
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Breast
    hsa-miR-7, hsa-miR-194, hsa-miR-21*, hsa-miR-143, hsa-miR-181a, adenocarcinoma
    hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-miR-138, hsa-miR-
    93, hsa-miR-10a, hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-31 (SEQ
    ID NO: 49), hsa-miR-92a (SEQ ID NO: 67)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Lung large cell or
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- adenocarcinoma
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-93, hsa-miR-10a, hsa-miR-193a-3p, hsa-miR-31, hsa-miR-92a,
    hsa-miR-138 (SEQ ID NO: 11), hsa-miR-378 (SEQ ID NO: 57), hsa-
    miR-21 (SEQ ID NO: 34)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Ovarian carcinoma
    hsa-miR-7, hsa-miR-194, hsa-miR-21*, hsa-miR-143, hsa-miR-181a,
    hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-miR-93, hsa-miR-
    10a, hsa-miR-193a-3p, hsa-miR-31, hsa-miR-92a, hsa-miR-138, hsa-
    miR-378, hsa-miR-21
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Thymoma
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-342-3p, hsa-miR-10a, hsa-miR-22, hsa-miR-100, hsa-miR-21
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Urothelial carcinoma
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- (TCC)
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-342-3p, hsa-miR-205, hsa-miR-10a, hsa-miR-22, hsa-miR-100,
    hsa-miR-21, hsa-miR-934 (SEQ ID NO: 69), hsa-miR-191 (SEQ ID
    NO: 24), hsa-miR-29c (SEQ ID NO: 44)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Squamous cell
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- carcinoma (SCC)
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-342-3p, hsa-miR-10a, hsa-miR-22, hsa-miR-100, hsa-miR-21, hsa-
    miR-934, hsa-miR-191, hsa-miR-29c
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Uterine cervix SCC
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-342-3p, hsa-miR-10a, hsa-miR-22, hsa-miR-100, hsa-miR-21, hsa-
    miR-934, hsa-miR-191, hsa-miR-29c, hsa-miR-10b (SEQ ID NO: 5),
    hsa-let-7c (SEQ ID NO: 1), hsa-miR-361-5p (SEQ ID NO: 54)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Anus or Skin SCC
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-193a-3p, hsa-miR-375, hsa-miR-342-3p, hsa-miR-205, hsa-miR-
    10a, hsa-miR-22, hsa-miR-100, hsa-miR-21, hsa-miR-934, hsa-miR-
    191, hsa-miR-29c, hsa-miR-10b, hsa-let-7c, hsa-miR-361-5p, hsa-miR-
    138, hsa-miR-185 (SEQ ID NO: 23)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Lung, Head & Neck or
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- Esophagus SCC
    143, hsa-miR-181a, hsa-miR-205, hsa-miR-345, hsa-miR-125a-5p, hsa-
    miR-342-3p, hsa-miR-10a, hsa-miR-22, hsa-miR-100, hsa-miR-21, hsa-
    miR-934, hsa-miR-191, hsa-miR-29c, hsa-let-7c, hsa-miR-361-5p, hsa-
    miR-10b, hsa-miR-138, hsa-miR-185
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-146a Melanoma
    (SEQ ID NO: 16), hsa-let-7e (SEQ ID NO: 2), hsa-miR-30d (SEQ ID
    NO: 47), hsa-miR-342-3p
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Lymphoma
    146a, hsa-let-7e, hsa-miR-30d, hsa-miR-342-3p
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- B cell lymphoma
    146a, hsa-let-7e, hsa-miR-30d, hsa-miR-342-3p, hsa-miR-21*, hsa-
    miR-30e (SEQ ID NO: 48)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- T cell lymphoma
    146a, hsa-let-7e, hsa-miR-30a, hsa-miR-30d, hsa-miR-342-3p, hsa-
    miR-21*, hsa-miR-30e
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Lung small cell
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-17 (SEQ ID NO: 20), hsa-miR- carcinoma
    29c* (SEQ ID NO: 45)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Medullary thyroid
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-17, hsa-miR-29c*, hsa-miR-222 carcinoma
    (SEQ ID NO: 40), hsa-miR-92a (SEQ ID NO: 67), hsa-miR-92b (SEQ
    ID NO: 68)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Lung carcinoid
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-17, hsa-miR-29c*, hsa-miR-
    222, hsa-miR-92a, hsa-miR-92b, hsa-miR-652 (SEQ ID NO: 64), hsa-
    miR-34c-5p (SEQ ID NO: 53), hsa-miR-214 (SEQ ID NO: 37)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Gastrointestinal (GI)
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-17, hsa-miR-29c*, hsa-miR- tract carcinoid
    222, hsa-miR-92a, hsa-miR-92b, hsa-miR-652, hsa-miR-34c-5p, hsa-
    miR-214, hsa-miR-21 (SEQ ID NO: 34), hsa-miR-148a (SEQ ID
    NO: 18)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Pancreas islet cell
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-17, hsa-miR-29c*, hsa-miR- tumor
    222, hsa-miR-92a, hsa-miR-92b, hsa-miR-652, hsa-miR-34c-5p, hsa-
    miR-214, hsa-miR-21, hsa-miR-148a
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Gastric or Esophageal
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194 (SEQ ID NO: 27), hsa-miR- Adenocarcinoma
    21*(SEQ ID NO: 35), hsa-miR-224 (SEQ ID NO: 42), hsa-miR-210
    (SEQ ID NO: 36), hsa-miR-1201 (SEQ ID NO: 146)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Colorectal
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- Adenocarcinoma
    224, hsa-miR-210, hsa-miR-1201, hsa-miR-17 (SEQ ID NO: 20), hsa-
    miR-29a (SEQ ID NO: 43)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Pancreas or bile
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR-
    224, hsa-miR-210, hsa-miR-1201, hsa-miR-17, hsa-miR-29a
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Pancreatic
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- adenocarcinoma
    224, hsa-miR-210, hsa-miR-1201, hsa-miR-17, hsa-miR-29a, hsa-miR-
    345 (SEQ ID NO: 51), hsa-miR-31 (SEQ ID NO: 49), hsa-miR-146a
    (SEQ ID NO: 16)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-375, Biliary tract
    hsa-miR-7, hsa-miR-193a-3p, hsa-miR-194, hsa-miR-21*, hsa-miR- adenocarcinoma
    224, hsa-miR-210, hsa-miR-1201, hsa-miR-17, hsa-miR-29a, hsa-miR-
    345, hsa-miR-31, hsa-miR-146a
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-146a Renal cell carcinoma
    (SEQ ID NO: 16), hsa-let-7e, hsa-miR-9* (SEQ ID NO: 66), hsa-miR- chromophobe
    92b (SEQ ID NO: 68), hsa-miR-149 (SEQ ID NO: 19), hsa-miR-200b
    (SEQ ID NO: 29)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Pheochromocytoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-30a, hsa-miR-149,
    hsa-miR-200b, hsa-miR-7 (SEQ ID NO: 65), hsa-miR-375
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Adrenocortical
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202 (SEQ ID NO: 31), hsa-
    miR-214* (SEQ ID NO: 38), hsa-miR-509-3p (SEQ ID NO: 61)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Gastrointestinal
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR- stromal tumor (GIST)
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143 (SEQ ID NO: 14), hsa-miR-29c*
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Renal cell carcinoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR- chromophobe
    200b, hsa-miR-210 (SEQ ID NO: 36), hsa-miR-221 (SEQ ID NO: 147)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Renal cell carcinoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR- clear cell
    200b, hsa-miR-210, hsa-miR-221, hsa-miR-31 (SEQ ID NO: 49), hsa-
    miR-126 (SEQ ID NO: 9)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Renal cell carcinoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR- papillary
    200b, hsa-miR-210, hsa-miR-221, hsa-miR-31, hsa-miR-126
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Pleural mesothelioma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7 (SEQ ID NO: 65), hsa-miR-375, hsa-miR-202 (SEQ
    ID NO: 31), hsa-miR-214* (SEQ ID NO: 38), hsa-miR-509-3p (SEQ ID
    NO: 61), hsa-miR-143 (SEQ ID NO: 14), hsa-miR-29c*, hsa-miR-21*
    (SEQ ID NO: 35), hsa-miR-130a (SEQ ID NO: 10), hsa-miR-10b (SEQ
    ID NO: 5)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Sarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Synovial sarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b, hsa-miR-100 (SEQ ID NO: 3), hsa-miR-222 (SEQ ID NO: 40),
    hsa-miR-145 (SEQ ID NO: 15)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Chondrosarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b, hsa-miR-100, hsa-miR-222, hsa-miR-145, hsa-miR-140-3p
    (SEQ ID NO: 12), hsa-miR-455-5p (SEQ ID NO: 58)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Liposarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b, hsa-miR-100, hsa-miR-222, hsa-miR-145, hsa-miR-140-3p,
    hsa-miR-455-5p, hsa-miR-210 (SEQ ID NO: 36), hsa-miR-193a-5p
    (SEQ ID NO: 26)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Ewing sarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b, hsa-miR-100, hsa-miR-222, hsa-miR-145, hsa-miR-140-3p,
    hsa-miR-455-5p, hsa-miR-210, hsa-miR-193a-5p, hsa-miR-181a, hsa-
    miR-193a-3p (SEQ ID NO: 25), hsa-miR-31 (SEQ ID NO: 49)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Osteosarcoma
    146a, hsa-let-7e, hsa-miR-9*, hsa-miR-92b, hsa-miR-149, hsa-miR-
    200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, hsa-miR-214*, hsa-miR-
    509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-21*, hsa-miR-130a, hsa-
    miR-10b, hsa-miR-100, hsa-miR-222, hsa-miR-145, hsa-miR-140-3p,
    hsa-miR-455-5p, hsa-miR-210, hsa-miR-193a-5p, hsa-miR-181a, hsa-
    miR-193a-3p, hsa-miR-31
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR-146a, Rhabdomyosarcoma
    hsa-let-7e, hsa-miR-30a, hsa-miR-9*, hsa-miR-92b, hsa-miR-30a,
    hsa-miR-149, hsa-miR-200b, hsa-miR-7, hsa-miR-375, hsa-miR-202,
    hsa-miR-214*, hsa-miR-509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-
    21*, hsa-miR-130a, hsa-miR-10b, hsa-miR-100, hsa-miR-222, hsa-
    miR-145, hsa-miR-140-3p, hsa-miR-455-5p, hsa-miR-210, hsa-miR-
    193a-5p, hsa-miR-181a, hsa-miR-487b (SEQ ID NO: 59), hsa-miR-22
    (SEQ ID NO: 39), hsa-miR-206 (SEQ ID NO: 33)
    hsa-miR-372, hsa-miR-122, hsa-miR-200c, hsa-miR-30a, hsa-miR- Malignant fibrous
    146a, hsa-let-7e, hsa-miR-30a, hsa-miR-9*, hsa-miR-92b, hsa-miR-30a, histiocytoma (MFH)
    hsa-miR-149, hsa-miR-200b, hsa-miR-7, hsa-miR-375, hsa-miR-202, or fibrosarcoma
    hsa-miR-214*, hsa-miR-509-3p, hsa-miR-143, hsa-miR-29c*, hsa-miR-
    21*, hsa-miR-130a, hsa-miR-10b, hsa-miR-100, hsa-miR-222, hsa-
    miR-145, hsa-miR-140-3p, hsa-miR-455-5p, hsa-miR-210, hsa-miR-
    193a-5p, hsa-miR-181a, hsa-miR-487b, hsa-miR-22, hsa-miR-206
  • Example 2 Expression of miRs Provides for Distinguishing Between Tumors
  • TABLE 3
    miR expression (in fluorescence units) distinguishing
    between the group consisting of germ-cell tumors
    and the group consisting of all other tumors
    median values        fold-change   p-value    SEQ ID NO.   miR name
    2.7e+004-5.0e+001    545.73 (+)    <e−240     233          hsa-miR-373
    1.8e+004-5.0e+001    365.93 (+)    <e−240     55           hsa-miR-372
    8.6e+003-5.0e+001    171.72 (+)    <e−240     200          hsa-miR-371-3p
    5.9e+003-5.1e+001    115.94 (+)    7.3e−249   201          hsa-miR-371-5p
    (+) for all the listed miRs, the higher expression is in tumors from a germ-cell origin.
  • hsa-miR-372 (SEQ ID NO: 55) is used at node 1 of the binary-tree-classifier detailed in the invention to distinguish between germ-cell tumors and all other tumors.
  • FIGS. 2A-D are boxplot presentations comparing distribution of the expression of the statistically significant miRs in tumor samples from the “germ cell” class (left box) and “non germ cell” class (right box).
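  • As an illustrative aside, the fold-change column in the tables of this Example appears to be the ratio of the larger group median to the smaller one. The following Python sketch recomputes it from the rounded medians printed in Tables 3 and 4. The function name `fold_change` is hypothetical, and because the printed medians are rounded, the recomputed ratios only approximate the tabulated values (for example, 540 versus 545.73 for hsa-miR-373). The meaning of the (+) and (−) markers is defined by each table's own legend and is therefore not derived here.

```python
from typing import Tuple


def fold_change(median_first: float, median_second: float) -> Tuple[float, str]:
    """Return (ratio >= 1, which group has the higher median).

    The second element is "first" or "second", referring to the order in
    which the two medians are printed in the table row.
    """
    if median_first <= 0 or median_second <= 0:
        raise ValueError("medians must be positive fluorescence values")
    if median_first >= median_second:
        return median_first / median_second, "first"
    return median_second / median_first, "second"


# Rounded medians as printed in the tables (fluorescence units).
ratio_373, higher_373 = fold_change(2.7e4, 5.0e1)    # hsa-miR-373, Table 3
ratio_200c, higher_200c = fold_change(7.4e1, 8.1e3)  # hsa-miR-200c, Table 4

print(f"hsa-miR-373: {ratio_373:.2f}-fold, higher in {higher_373} group")
print(f"hsa-miR-200c: {ratio_200c:.2f}-fold, higher in {higher_200c} group")
```

    Returning the position of the higher group, rather than a sign, keeps the sketch independent of the per-table (+)/(−) convention.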
  • TABLE 4
    miR expression (in fluorescence units) distinguishing between
    the group consisting of hepatobiliary tumors and the group
    consisting of non germ-cell non-hepatobiliary tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    1.0e+005-5.0e+001 2024.31 (+) 1.1e−123 6 hsa-miR-122
    7.4e+001-8.1e+003  109.63 (−) 3.6e−010 30 hsa-miR-200c
    5.0e+001-1.4e+003  27.92 (−) 4.8e−010 13 hsa-miR-141
    (+) the higher expression of this miR is in tumors from a hepatobiliary origin
    (−) the higher expression of this miR is in tumors from a non germ-cell, non-hepatobiliary origin
  • hsa-miR-122 (SEQ ID NO: 6) is used at node 2 of the binary-tree-classifier detailed in the invention to distinguish between hepatobiliary tumors and non germ-cell non-hepatobiliary tumors.
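  • For readers who wish to process these tables programmatically, the sketch below parses one row of the layout used in Tables 3 to 13 and 20 (median values, fold-change, direction marker, p-value, SEQ ID NO., miR name). The names `MirRow` and `parse_row` are hypothetical helpers and are not part of the disclosure. The p-value is kept as text because entries such as "<e−240" are bounds rather than numbers, and the typographic minus sign used in the tables is normalized to an ASCII hyphen before matching.

```python
import re
from typing import NamedTuple


class MirRow(NamedTuple):
    median_first: float   # median of the first group, fluorescence units
    median_second: float  # median of the second group, fluorescence units
    fold_change: float
    direction: str        # "+" or "-", interpreted via the table legend
    p_value: str          # kept as text, e.g. "<e-240"
    seq_id: int
    name: str


_NUM = r"\d+(?:\.\d+)?e[+-]\d+"
ROW = re.compile(
    rf"^(?P<a>{_NUM})-(?P<b>{_NUM})\s+"
    r"(?P<fold>\d+(?:\.\d+)?)\s+\((?P<dir>[+-])\)\s+"
    r"(?P<p><?\d*(?:\.\d+)?e-\d+)\s+(?P<seq>\d+)\s+(?P<name>\S+)$"
)


def parse_row(line: str) -> MirRow:
    """Parse one table row; raise ValueError if the layout does not match."""
    text = line.replace("\u2212", "-").strip()  # typographic minus -> hyphen
    m = ROW.match(text)
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return MirRow(float(m["a"]), float(m["b"]), float(m["fold"]),
                  m["dir"], m["p"], int(m["seq"]), m["name"])


example = parse_row("7.4e+001-8.1e+003  109.63 (\u2212) 3.6e\u2212010 30 hsa-miR-200c")
bound = parse_row("2.7e+004-5.0e+001 545.73 (+) <e\u2212240 233 hsa-miR-373")
print(example)
print(bound)
```

    Tables 14 to 19 list the same columns in the reverse order, so this pattern would need to be mirrored before it could be applied to them.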
  • TABLE 5
    miR expression (in fluorescence units) distinguishing between
    the group consisting of liver tumors and the group consisting of biliary-
    tract carcinomas (cholangiocarcinoma or gallbladder adenocarcinoma)
    median values    fold-change    p-value    SEQ ID NO.    miR name
    6.1e+003-4.1e+002 14.74 (+) 5.5e−005 28 hsa-miR-200a
    9.7e+003-9.0e+002 10.74 (+) 2.4e−004 29 hsa-miR-200b
    1.9e+003-7.0e+003  3.67 (−) 8.5e−004 231 hsa-miR-99a
    3.3e+003-7.5e+003  2.28 (−) 6.2e−004 9 hsa-miR-126
    (+) the higher expression of this miR is in biliary tract carcinomas
    (−) the higher expression of this miR is in liver tumors
  • hsa-miR-126 (SEQ ID NO: 9) and hsa-miR-200b (SEQ ID NO: 29) are used at node 3 of the binary-tree-classifier detailed in the invention to distinguish between liver tumors and biliary-tract carcinoma.
  • FIG. 3 demonstrates that tumors of hepatocellular carcinoma (HCC) origin (marked by squares) are easily distinguished from tumors of biliary tract adenocarcinoma origin (marked by diamonds) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis) and hsa-miR-126 (SEQ ID NO: 9, x-axis).
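  • The first three nodes described above can be sketched as a small binary tree. This is a minimal illustration only: the class `Node`, the function `classify`, the cut-off `HYPOTHETICAL_CUTOFF` and the comparison rule at node 3 are assumptions made for the example, not the decision rules or thresholds of the classifier detailed in the invention. The example profiles are built from the group medians printed in Tables 3 to 5.

```python
from typing import Callable, Dict, List, Tuple, Union

Profile = Dict[str, float]  # miR name -> expression in fluorescence units

# Placeholder cut-off lying between the group medians of Tables 3 and 4;
# it is NOT a value taken from the disclosure.
HYPOTHETICAL_CUTOFF = 1.0e3


class Node:
    """One binary decision; each branch is a leaf label or another Node."""

    def __init__(self, number: int, test: Callable[[Profile], bool],
                 if_true: Union["Node", str],
                 if_false: Union["Node", str]) -> None:
        self.number = number
        self.test = test
        self.if_true = if_true
        self.if_false = if_false


def classify(root: Node, profile: Profile) -> Tuple[str, List[int]]:
    """Walk the tree; return the leaf label and the node numbers visited."""
    path: List[int] = []
    current: Union[Node, str] = root
    while isinstance(current, Node):
        path.append(current.number)
        current = current.if_true if current.test(profile) else current.if_false
    return current, path


# Node 3: hsa-miR-200b is higher in biliary-tract carcinomas and hsa-miR-126
# is higher in liver tumors (Table 5); comparing them is an illustrative rule.
node3 = Node(3, lambda p: p["hsa-miR-200b"] > p["hsa-miR-126"],
             "biliary-tract carcinoma", "liver tumor")
node2 = Node(2, lambda p: p["hsa-miR-122"] > HYPOTHETICAL_CUTOFF,
             node3, "non germ-cell, non-hepatobiliary tumor")
TREE = Node(1, lambda p: p["hsa-miR-372"] > HYPOTHETICAL_CUTOFF,
            "germ-cell tumor", node2)

liver_like = {"hsa-miR-372": 5.0e1, "hsa-miR-122": 1.0e5,
              "hsa-miR-126": 7.5e3, "hsa-miR-200b": 9.0e2}
biliary_like = {"hsa-miR-372": 5.0e1, "hsa-miR-122": 1.0e5,
                "hsa-miR-126": 3.3e3, "hsa-miR-200b": 9.7e3}
germ_like = {"hsa-miR-372": 1.8e4}
other_like = {"hsa-miR-372": 5.0e1, "hsa-miR-122": 5.0e1}

for sample in (liver_like, biliary_like, germ_like, other_like):
    print(classify(TREE, sample))
```

    The leaf labels "germ-cell tumor" and "non germ-cell, non-hepatobiliary tumor" stand in for the later nodes of the tree, which are described in the remainder of this Example.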
  • TABLE 6
    miR expression (in fluorescence units) distinguishing between
    the group consisting of tumors from an epithelial origin and
    the group consisting of tumors from a non-epithelial origin
    median values    fold-change    p-value    SEQ ID NO.    miR name
    1.5e+004-7.7e+001 196.43 (+)  1.5e−300 30 hsa-miR-200c
    9.0e+003-5.0e+001 180.07 (+)  1.3e−208 29 hsa-miR-200b
    3.9e+003-5.0e+001 78.09 (+) 2.2e−187 28 hsa-miR-200a
    2.7e+003-5.0e+001 54.64 (+) 7.0e−078 32 hsa-miR-205
    2.6e+003-5.0e+001 51.98 (+) 1.2e−265 13 hsa-miR-141
    5.4e+002-9.2e+001  5.90 (+) 6.3e−048 152 hsa-miR-182
    1.1e+003-2.5e+002  4.35 (+) 4.8e−022 49 hsa-miR-31
    (+) for all the listed miRs, the higher expression is in tumors from epithelial origins
  • A combination of the expression level of any of the miRs detailed in Table 6 with the expression level of any of hsa-miR-30a (SEQ ID NO: 46), hsa-miR-10b (SEQ ID NO: 5) and hsa-miR-140-3p (SEQ ID NO: 12) also provides for distinguishing between tumors from epithelial origins and tumors from non-epithelial origins. This is demonstrated at node 4 of the binary-tree-classifier detailed in the invention with hsa-miR-200c (SEQ ID NO: 30) and hsa-miR-30a (SEQ ID NO: 46) (FIG. 4). Tumors of epithelial origin (diamonds) are easily distinguished from tumors of non-epithelial origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis) and hsa-miR-200c (SEQ ID NO: 30, x-axis).
  • TABLE 7
    miR expression (in fluorescence units) distinguishing between
    the group consisting of melanoma and lymphoma (B-cell, T-cell),
    and the group consisting of all other non-epithelial tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    2.0e+003-7.0e+001 28.25 (−)  1.9e−074 164 hsa-miR-142-5p
    1.2e+004-6.3e+002 18.86 (−)  6.0e−061 168 hsa-miR-150
    5.4e+003-3.1e+002 17.29 (−)  5.6e−060 170 hsa-miR-155
    4.2e+003-3.5e+002 12.03 (−)  8.4e−068 16 hsa-miR-146a
    5.9e+002-1.4e+002 4.25 (−) 8.2e−048 198 hsa-miR-342-5p
    7.5e+003-1.9e+003 4.02 (−) 4.8e−056 50 hsa-miR-342-3p
    8.9e+002-2.5e+002 3.53 (−) 6.0e−035 176 hsa-miR-18a
    4.4e+003-1.4e+003 3.28 (−) 8.0e−038 186 hsa-miR-20a
    7.9e+002-2.6e+002 3.03 (−) 7.3e−005 11 hsa-miR-138
    6.6e+003-2.3e+003 2.82 (−) 4.0e−039 158 hsa-miR-106a
    4.1e+003-1.4e+003 2.82 (−) 2.4e−037 20 hsa-miR-17
    6.2e+001-5.9e+002 9.53 (+) 3.7e−027 155 hsa-miR-127-3p
    1.2e+003-7.0e+003 5.71 (+) 1.5e−047 231 hsa-miR-99a
    3.9e+002-1.7e+003 4.25 (+) 6.6e−022 4 hsa-miR-10a
    1.0e+004-4.1e+004 3.91 (+) 3.2e−037 8 hsa-miR-125b
    6.5e+002-2.2e+003 3.37 (+) 2.4e−023 46 hsa-miR-30a
    1.9e+003-5.6e+003 2.98 (+) 1.0e−025 3 hsa-miR-100
    2.5e+003-7.1e+003 2.89 (+) 1.8e−051 2 hsa-let-7e
    2.9e+003-8.4e+003 2.86 (+) 8.1e−047 7 hsa-miR-125a-5p
    (+) the higher expression of this miR is in the group of non-epithelial tumors excluding melanoma and lymphoma
    (−) the higher expression of this miR is in the group consisting of melanoma and lymphoma
  • hsa-miR-146a (SEQ ID NO: 16), hsa-let-7e (SEQ ID NO: 2) and hsa-miR-30a (SEQ ID NO: 46) are used at node 5 of the binary-tree-classifier detailed in the invention to distinguish between the group consisting of melanoma and lymphoma, and the group consisting of all other non-epithelial tumors. FIG. 5 demonstrates that lymphoma and melanoma tumors (diamonds) are easily distinguished from tumors of non-epithelial, non-lymphoma/melanoma origin (squares) using the expression levels of hsa-miR-146a (SEQ ID NO: 16, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-let-7e (SEQ ID NO: 2, z-axis).
  • TABLE 8
    miR expression (in fluorescence units) distinguishing
    between the group consisting of brain tumors (astrocytic
    tumor and oligodendroglioma) and the group consisting
    of all non-brain, non-epithelial tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    9.1e+003-5.0e+001 182.94 (+)  3.8e−059 159 hsa-miR-124
    4.4e+003-5.0e+001 88.33 (+)  1.1e−125 66 hsa-miR-9*
    2.1e+003-6.0e+001 34.97 (+)  6.0e−035 225 hsa-miR-551b
    9.9e+002-5.0e+001 19.73 (+)  3.0e−116 187 hsa-miR-219-2-3p
    6.5e+002-5.0e+001 12.95 (+)  1.8e−021 162 hsa-miR-129-3p
    1.1e+003-1.0e+002 10.52 (+)  2.0e−034 161 hsa-miR-128
    2.3e+003-2.5e+002 9.45 (+) 2.2e−052 68 hsa-miR-92b
    5.2e+002-6.8e+001 7.61 (+) 6.7e−019 232 hsa-miR-99a*
    6.9e+002-9.2e+001 7.45 (+) 5.5e−023 173 hsa-miR-181c
    2.2e+003-3.5e+002 6.34 (+) 7.4e−007 11 hsa-miR-138
    1.2e+005-2.4e+004 4.78 (+) 1.7e−014 8 hsa-miR-125b
    1.8e+003-3.9e+002 4.70 (+) 7.2e−014 174 hsa-miR-181d
    8.5e+002-1.8e+002 4.64 (+) 2.2e−002 155 hsa-miR-127-3p
    1.6e+004-3.5e+003 4.60 (+) 2.4e−010 231 hsa-miR-99a
    8.5e+001-1.1e+003 13.55 (−)  2.4e−014 4 hsa-miR-10a
    7.7e+002-6.6e+003 8.58 (−) 8.4e−017 182 hsa-miR-199a-5p
    5.7e+002-4.7e+003 8.12 (−) 1.8e−013 181 hsa-miR-199a-3p
    2.8e+002-1.9e+003 6.81 (−) 1.4e−012 37 hsa-miR-214
    (+) the higher expression of this miR is in the group consisting of brain tumors
    (−) the higher expression of this miR is in the group consisting of all non-brain, non-epithelial tumors
  • hsa-miR-9* (SEQ ID NO: 66) and hsa-miR-92b (SEQ ID NO: 68) are used at node 6 of the binary-tree-classifier detailed in the invention to distinguish between brain tumors and the group consisting of all non-brain, non-epithelial tumors. FIG. 6 demonstrates that tumors originating in the brain (marked by diamonds) are easily distinguished from tumors of non-epithelial, non-brain origin (marked by squares) using the expression levels of hsa-miR-9* (SEQ ID NO: 66, y-axis) and hsa-miR-92b (SEQ ID NO: 68, x-axis).
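  • One simple way to combine two miRs at a single node, such as hsa-miR-9* and hsa-miR-92b at node 6, is a linear score on log2-transformed expression. The sketch below is an assumption made for illustration: the text above does not state the functional form of the node rule, and the names `brain_score` and `is_brain` are hypothetical. The decision boundary is placed at the midpoint of the two group medians of Table 8 in log2 space, with equal weight on each miR.

```python
import math
from typing import Dict, Tuple

# Group medians from Table 8 as (brain, non-brain non-epithelial),
# in fluorescence units.
MEDIANS: Dict[str, Tuple[float, float]] = {
    "hsa-miR-9*": (4.4e3, 5.0e1),
    "hsa-miR-92b": (2.3e3, 2.5e2),
}


def midpoint_offset(medians: Dict[str, Tuple[float, float]]) -> float:
    """Sum over miRs of the log2 midpoint between the two group medians."""
    return sum((math.log2(brain) + math.log2(other)) / 2.0
               for brain, other in medians.values())


def brain_score(profile: Dict[str, float]) -> float:
    """Equal-weight log2 score, shifted so that 0 is the midpoint boundary."""
    total = sum(math.log2(profile[name]) for name in MEDIANS)
    return total - midpoint_offset(MEDIANS)


def is_brain(profile: Dict[str, float]) -> bool:
    """Illustrative rule: a positive score is called a brain tumor."""
    return brain_score(profile) > 0.0


brain_like = {"hsa-miR-9*": 4.4e3, "hsa-miR-92b": 2.3e3}
other_like = {"hsa-miR-9*": 5.0e1, "hsa-miR-92b": 2.5e2}
print(is_brain(brain_like), round(brain_score(brain_like), 2))
print(is_brain(other_like), round(brain_score(other_like), 2))
```

    A rule fitted to the individual sample values, rather than to the two medians, would generally be needed in practice; the midpoint is used here only because the medians are the sole values given in the table.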
  • TABLE 9
    miR expression (in fluorescence units) distinguishing
    between astrocytic tumors and oligodendrogliomas
    median values    fold-change    p-value    SEQ ID NO.    miR name
    2.5e+003-2.3e+002 11.10 (+)  5.1e−011 230 hsa-miR-886-5p
    4.4e+003-4.9e+002 9.06 (+) 1.1e−009 228 hsa-miR-886-3p
    1.0e+004-1.7e+003 5.99 (+) 7.7e−008 147 hsa-miR-221
    1.3e+004-2.6e+003 5.03 (+) 2.6e−006 40 hsa-miR-222
    3.3e+004-7.3e+003 4.54 (+) 3.9e−004 34 hsa-miR-21
    8.4e+002-2.2e+002 3.78 (+) 3.7e−006 206 hsa-miR-455-3p
    6.0e+002-1.8e+002 3.30 (+) 1.3e−002 35 hsa-miR-21*
    5.8e+003-1.8e+003 3.15 (+) 2.4e−005 52 hsa-miR-34a
    1.1e+003-3.5e+002 3.04 (+) 1.0e−003 25 hsa-miR-193a-3p
    1.6e+002-8.2e+002 5.17 (−) 1.2e−004 229 hsa-miR-9
    4.6e+002-2.3e+003 5.09 (−) 7.1e−003 161 hsa-miR-128
    4.1e+002-1.8e+003 4.43 (−) 1.3e−002 187 hsa-miR-219-2-3p
    3.8e+003-1.3e+004 3.31 (−) 1.9e−002 179 hsa-miR-195
    (+) the higher expression of this miR is in astrocytic tumors
    (−) the higher expression of this miR is in oligodendrogliomas
  • A combination of the expression level of any of the miRs detailed in table 9 with the expression level of hsa-miR-497 (SEQ ID NO: 208) or hsa-let-7d (SEQ ID NO: 153) also provides for classification of brain tumors as astrocytic tumors or oligodendrogliomas. This is demonstrated at node 7 of the binary-tree-classifier detailed in the invention with hsa-miR-222 (SEQ ID NO: 40) and hsa-miR-497 (SEQ ID NO: 208). In another embodiment of the invention, the expression levels of hsa-miR-222 (SEQ ID NO: 40) and hsa-let-7d (SEQ ID NO: 153) are combined to distinguish between astrocytic tumors and oligodendrogliomas.
  • FIG. 7 demonstrates that astrocytic tumors (marked by diamonds) are easily distinguished from oligodendrogliomas (marked by squares) using the expression levels of hsa-miR-497 (SEQ ID NO: 208, y-axis) and hsa-miR-222 (SEQ ID NO: 40, x-axis).
  • TABLE 10
    miR expression (in fluorescence units) distinguishing between
    the group consisting of neuroendocrine tumors and the group
    consisting of all non-neuroendocrine, epithelial tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    3.8e+004-1.5e+002 259.47 (+)  5.3e−086 56 hsa-miR-375
    3.6e+003-5.2e+001 70.47 (+)  4.4e−145 65 hsa-miR-7
    1.3e+003-1.8e+002 6.89 (+) 4.7e−044 175 hsa-miR-183
    1.9e+003-4.4e+002 4.42 (+) 3.5e−025 152 hsa-miR-182
    1.2e+003-3.0e+002 4.16 (+) 5.5e−028 155 hsa-miR-127-3p
    5.6e+001-7.0e+003 124.66 (−)  1.4e−023 32 hsa-miR-205
    1.5e+002-1.4e+003 9.25 (−) 1.8e−019 49 hsa-miR-31
    3.4e+002-1.4e+003 4.12 (−) 9.5e−032 35 hsa-miR-21*
    (+) the higher expression of this miR is in the group consisting of neuroendocrine tumors
    (−) the higher expression of this miR is in the group consisting of all non-neuroendocrine, epithelial tumors
  • hsa-miR-375 (SEQ ID NO: 56), hsa-miR-7 (SEQ ID NO: 65) and hsa-miR-193a-3p (SEQ ID NO: 25) are used at node 8 of the binary-tree-classifier detailed in the invention to distinguish between the group consisting of neuroendocrine tumors and the group consisting of all non-neuroendocrine, epithelial tumors. FIG. 8 demonstrates that neuroendocrine tumors (diamonds) are easily distinguished from tumors of epithelial origin (squares) using the expression levels of hsa-miR-193a-3p (SEQ ID NO: 25, y-axis), hsa-miR-7 (SEQ ID NO: 65, x-axis) and hsa-miR-375 (SEQ ID NO: 56, z-axis).
  • TABLE 11
    miR expression (in fluorescence units) distinguishing between
    the group consisting of gastrointestinal (GI) epithelial tumors
    and the group consisting of non-GI epithelial tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    2.6e+003-7.1e+001 36.09 (+) 2.5e−127 27 hsa-miR-194
    3.9e+003-1.2e+002 33.26 (+) 1.6e−117 177 hsa-miR-192
    2.6e+003-6.7e+002  3.88 (+) 3.3e−021 4 hsa-miR-10a
    5.0e+001-2.1e+004 411.76 (−)  6.5e−045 32 hsa-miR-205
    (+) the higher expression of this miR is in the group consisting of GI epithelial tumors
    (−) the higher expression of this miR is in the group consisting of non-GI epithelial tumors
  • hsa-miR-194 (SEQ ID NO: 27) and hsa-miR-21* (SEQ ID NO: 35) are used at node 9 of the binary-tree-classifier detailed in the invention to distinguish between GI epithelial tumors and non-GI epithelial tumors.
  • FIG. 9 demonstrates that tumors of gastrointestinal (GI) origin (marked by diamonds) are easily distinguished from tumors of non-GI origin (marked by squares) using the expression levels of hsa-miR-21* (SEQ ID NO: 35, y-axis) and hsa-miR-194 (SEQ ID NO: 27, x-axis).
  • TABLE 12
    miR expression (in fluorescence units) distinguishing between
    prostate tumors and all other non-GI epithelial tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    5.1e+003-5.2e+001 96.76 (+)  3.7e−016 56 hsa-miR-375
    1.0e+003-5.5e+001 18.27 (+)  4.0e−025 199 hsa-miR-363
    6.8e+004-7.2e+003 9.41 (+) 1.0e−025 14 hsa-miR-143
    1.2e+005-1.4e+004 8.14 (+) 7.8e−022 15 hsa-miR-145
    2.8e+003-3.5e+002 7.89 (+) 1.5e−012 165 hsa-miR-143*
    2.1e+004-4.4e+003 4.76 (+) 2.2e−011 231 hsa-miR-99a
    4.6e+002-2.1e+003 4.58 (−) 8.0e−007 36 hsa-miR-210
    2.7e+002-1.1e+003 3.84 (−) 7.8e−017 154 hsa-miR-181b
    1.2e+003-4.3e+003 3.76 (−) 1.2e−014 21 hsa-miR-181a
    5.5e+002-2.0e+003 3.63 (−) 2.3e−002 49 hsa-miR-31
    (+) the higher expression of this miR is in prostate tumors
    (−) the higher expression of this miR is in the group consisting of all other non-GI epithelial tumors
  • hsa-miR-143 (SEQ ID NO: 14) and hsa-miR-181a (SEQ ID NO: 21) are used at node 10 of the binary-tree-classifier detailed in the invention to distinguish between prostate tumors and all other non-GI epithelial tumors.
  • FIG. 10 demonstrates that tumors originating in prostate adenocarcinoma (marked by diamonds) are easily distinguished from tumors of non prostate origins (marked by squares) using the expression levels of hsa-miR-181a (SEQ ID NO: 21, y-axis) and hsa-miR-143 (SEQ ID NO: 14, x-axis).
  • TABLE 13
    miR expression (in fluorescence units) distinguishing between
    seminomatous and non-seminomatous testicular tumors
    median values    fold-change    p-value    SEQ ID NO.    miR name
    4.3e+003-7.6e+002 5.63 (+) 6.6e−004 152 hsa-miR-182
    1.0e+002-2.1e+003 20.46 (−)  6.2e−005 216 hsa-miR-518e
    7.8e+001-1.2e+003 15.29 (−)  4.5e−005 212 hsa-miR-516b
    6.8e+001-8.2e+002 11.94 (−)  2.2e−005 224 hsa-miR-527
    2.1e+002-2.2e+003 10.40 (−)  1.9e−006 13 hsa-miR-141
    5.3e+002-5.0e+003 9.48 (−) 5.0e−004 194 hsa-miR-302d
    1.4e+002-1.3e+003 8.97 (−) 4.1e−006 192 hsa-miR-302a
    2.7e+002-2.3e+003 8.78 (−) 2.9e−003 221 hsa-miR-520c-3p
    1.3e+002-1.2e+003 8.65 (−) 8.3e−004 217 hsa-miR-518f*
    3.4e+003-2.9e+004 5.98 (−) 2.6e−007 205 hsa-miR-451
    2.8e+002-1.7e+003 5.98 (−) 1.1e−002 219 hsa-miR-519d
    2.0e+002-1.2e+003 5.90 (−) 6.8e−005 32 hsa-miR-205
    2.0e+002-1.1e+003 5.59 (−) 5.8e−006 193 hsa-miR-302a*
    1.9e+002-1.0e+003 5.27 (−) 6.7e−003 223 hsa-miR-524-5p
    1.5e+002-8.0e+002 5.22 (−) 5.4e−003 220 hsa-miR-520a-5p
    2.2e+002-1.1e+003 5.21 (−) 4.1e−003 210 hsa-miR-512-5p
    3.2e+002-1.4e+003 4.57 (−) 9.2e−003 209 hsa-miR-498
    7.2e+002-3.2e+003 4.51 (−) 3.1e−002 213 hsa-miR-517a
    6.4e+002-2.9e+003 4.47 (−) 2.9e−002 163 hsa-miR-1323
    9.5e+002-4.1e+003 4.29 (−) 1.3e−004 30 hsa-miR-200c
    (+) the higher expression of this miR is in seminoma tumors
    (−) the higher expression of this miR is in non-seminoma tumors
  • A combination of the expression level of any of the miRs detailed in table 13 with the expression level of hsa-miR-200b (SEQ ID NO: 29), hsa-miR-200a (SEQ ID NO: 28), hsa-miR-516a-5p (SEQ ID NO: 211), hsa-miR-767-5p (SEQ ID NO: 227), hsa-miR-518a-3p (SEQ ID NO: 215), hsa-miR-520d-5p (SEQ ID NO: 222), hsa-miR-519a (SEQ ID NO: 218) and hsa-miR-517c (SEQ ID NO: 214) also provides for classification of seminoma and non-seminoma testis-tumors.
  • hsa-miR-516a-5p (SEQ ID NO: 211) and hsa-miR-200b (SEQ ID NO: 29) are used at node 12 of the binary-tree-classifier detailed in the invention to distinguish between seminoma and non-seminoma testis-tumors.
  • FIG. 11 demonstrates that seminomatous testicular germ-cell tumors (marked by diamonds) are easily distinguished from tumors of non-seminomatous origin (marked by squares) using the expression levels of hsa-miR-516a-5p (SEQ ID NO: 211, y-axis) and hsa-miR-200b (SEQ ID NO: 29, x-axis).
  • TABLE 14
    miR expression (in fluorescence units) distinguishing between
    the group consisting of squamous cell carcinoma (SCC), transitional
    cell carcinoma (TCC), thymoma and the group consisting of
    non gastrointestinal (GI) adenocarcinoma tumors
    miR name    SEQ ID NO.    p-value    fold-change    median values
    hsa-miR-205 32 1.6e−059 321.76 (+)  4.6e+004-1.4e+002
    hsa-miR-210 36 8.6e−015 5.96 (+) 2.9e+003-4.9e+002
    hsa-miR-193b 178 2.6e−016 3.82 (+) 2.5e+003-6.6e+002
    MID-16869 243 1.8e−008 3.67 (+) 2.5e+003-6.8e+002
    MID-16489 242 2.2e−011 3.53 (+) 3.4e+003-9.7e+002
    hsa-miR-31 49 8.2e−004 2.82 (+) 3.7e+003-1.3e+003
    MID-15965 240 1.7e−010 2.78 (+) 6.4e+003-2.3e+003
    hsa-miR-378 57 2.7e−017 2.71 (+) 1.4e+003-5.2e+002
    hsa-miR-138 11 2.9e−023 8.05 (−) 2.8e+002-2.2e+003
    hsa-miR-30a 46 1.5e−018 3.70 (−) 7.3e+002-2.7e+003
    hsa-miR-146b-5p 17 4.0e−013 2.60 (−) 8.6e+002-2.2e+003
    hsa-miR-30d 47 1.5e−021 2.44 (−) 1.8e+003-4.3e+003
    hsa-miR-345 51 2.6e−019 2.38 (−) 4.5e+002-1.1e+003
    hsa-miR-125a-5p 7 3.8e−014 2.30 (−) 4.2e+003-9.6e+003
    hsa-miR-125b 8 3.0e−009 2.26 (−) 1.9e+004-4.3e+004
    hsa-miR-181b 154 3.2e−008 2.24 (−) 9.4e+002-2.1e+003
    hsa-miR-29b 190 3.3e−010 2.13 (−) 7.4e+002-1.6e+003
    hsa-let-7i 157 4.6e−014 2.04 (−) 6.6e+003-1.3e+004
    hsa-miR-30c 196 7.9e−010 2.04 (−) 2.4e+003-4.8e+003
    (+) the higher expression of this miR is in SCC, TCC and thymoma
    (−) the higher expression of this miR is in non GI adenocarcinoma
  • Node 13 of the binary-tree-classifier separates tissues with high expression of miR-205 (SCC marker) such as SCC, TCC and thymomas from adenocarcinomas.
  • Breast adenocarcinoma and ovarian carcinoma are excluded from this separation due to a wide range of expression of miR-205.
  • A combination of the expression level of any of the miRs detailed in table 14 with the expression level of hsa-miR-331-3p (SEQ ID NO: 197) also provides for this classification.
  • hsa-miR-205 (SEQ ID NO: 32), hsa-miR-345 (SEQ ID NO: 51) and hsa-miR-125a-5p (SEQ ID NO: 7) are used at node 13 of the binary-tree-classifier detailed in the invention.
  • TABLE 15
    miR expression (in fluorescence units) distinguishing between
    the group consisting of breast adenocarcinoma and the group
    consisting of SCC, TCC, thymomas and ovarian carcinoma
    miR name    SEQ ID NO.    p-value    fold-change    median values
    hsa-miR-375 56 4.1e−029 25.95 (+)  1.3e+003-5.0e+001
    hsa-miR-30a 46 1.6e−014 3.25 (+) 2.7e+003-8.3e+002
    hsa-miR-193a-3p 25 9.5e−022 3.09 (+) 4.4e+003-1.4e+003
    hsa-miR-182 152 2.1e−009 2.94 (+) 1.2e+003-4.1e+002
    hsa-miR-342-3p 50 7.3e−014 2.48 (+) 5.5e+003-2.2e+003
    hsa-miR-29c* 45 6.3e−008 2.48 (+) 6.6e+002-2.7e+002
    hsa-miR-29c 191 5.8e−007 2.26 (+) 5.0e+002-2.2e+002
    hsa-miR-199a-3p 181 1.3e−004 2.19 (+) 7.3e+003-3.3e+003
    hsa-miR-195 179 2.0e−006 2.05 (+) 2.2e+003-1.1e+003
    hsa-miR-31 49 9.6e−014 13.81 (−)  2.2e+002-3.1e+003
    hsa-miR-205 32 9.3e−008 6.32 (−) 6.3e+003-4.0e+004
    hsa-miR-224 42 5.3e−010 5.72 (−) 8.9e+001-5.1e+002
    hsa-miR-203 184 6.7e−007 4.05 (−) 1.5e+002-6.3e+002
    hsa-miR-222 40 5.8e−018 2.64 (−) 5.7e+003-1.5e+004
    hsa-miR-221 147 1.0e−018 2.41 (−) 3.8e+003-9.2e+003
    MID-00689 236 4.9e−010 2.39 (−) 4.8e+002-1.1e+003
    hsa-miR-378 57 1.2e−010 2.37 (−) 6.0e+002-1.4e+003
    hsa-miR-422a 203 2.2e−008 2.24 (−) 2.3e+002-5.2e+002
    hsa-miR-210 36 1.5e−007 2.22 (−) 1.2e+003-2.6e+003
    (+) the higher expression of this miR is in breast adenocarcinoma
    (−) the higher expression of this miR is in SCC, TCC, thymomas and ovarian carcinoma
  • hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-375 (SEQ ID NO: 56) and hsa-miR-342-3p (SEQ ID NO: 50) are used at node 14 of the binary-tree-classifier detailed in the invention. According to another embodiment, hsa-miR-193a-3p (SEQ ID NO: 25), hsa-miR-375 (SEQ ID NO: 56) and hsa-miR-224 (SEQ ID NO: 42) may be used at node 14 of the binary-tree-classifier detailed in the invention.
  • TABLE 16
    miR expression (in fluorescence units) distinguishing
    between the group consisting of ovarian carcinoma
    and the group consisting of SCC, TCC and thymomas
    miR name    SEQ ID NO.    p-value    fold-change    median values
    hsa-miR-10a 4 1.2e−012 5.57 (+) 3.1e+003-5.5e+002
    hsa-miR-130a 10 1.7e−014 3.41 (+) 5.1e+003-1.5e+003
    hsa-miR-30a* 195 1.3e−014 3.39 (+) 2.5e+002-7.5e+001
    hsa-miR-10b 5 5.5e−009 2.68 (+) 2.4e+003-8.8e+002
    hsa-miR-625 226 2.5e−012 2.48 (+) 2.9e+002-1.2e+002
    hsa-let-7e 2 7.7e−012 2.28 (+) 8.8e+003-3.9e+003
    hsa-miR-30a 46 3.6e−007 2.20 (+) 1.6e+003-7.3e+002
    hsa-miR-205 32 1.0e−033 37.52 (−)  1.2e+003-4.6e+004
    hsa-miR-205* 185 4.2e−018 5.42 (−) 5.0e+001-2.7e+002
    hsa-miR-138 11 1.1e−009 4.63 (−) 6.0e+001-2.8e+002
    hsa-miR-150 168 5.2e−010 4.18 (−) 5.7e+002-2.4e+003
    hsa-miR-203 184 2.5e−003 2.74 (−) 2.9e+002-8.0e+002
    hsa-miR-146a 16 2.1e−006 2.62 (−) 2.9e+002-7.6e+002
    MID-16489 242 1.9e−007 2.49 (−) 1.4e+003-3.4e+003
    hsa-miR-140-3p 12 2.3e−015 2.42 (−) 9.5e+002-2.3e+003
    MID-15684 237 5.2e−006 2.37 (−) 7.6e+002-1.8e+003
    MID-16869 243 9.8e−005 2.23 (−) 1.1e+003-2.5e+003
    MID-20703 250 5.9e−004 2.18 (−) 2.1e+003-4.6e+003
    hsa-miR-22 39 6.5e−012 2.13 (−) 2.7e+003-5.8e+003
    MID-23256 253 1.0e−003 2.13 (−) 4.3e+002-9.2e+002
    hsa-miR-31 49 1.4e−002 2.12 (−) 1.7e+003-3.7e+003
    MID-18422 246 1.3e−003 2.06 (−) 1.7e+003-3.6e+003
    hsa-miR-149* 167 1.4e−007 2.04 (−) 2.1e+003-4.4e+003
    (+) the higher expression of this miR is in ovarian carcinoma
    (−) the higher expression of this miR is in SCC, TCC and thymomas
  • hsa-miR-205 (SEQ ID NO: 32), hsa-miR-10a (SEQ ID NO: 4) and hsa-miR-22 (SEQ ID NO: 39) are used at node 15 of the binary-tree-classifier detailed in the invention.
  • TABLE 17
    miR expression (in fluorescence units) distinguishing between
    the group consisting of thyroid carcinoma (follicular and papillary)
    and the group consisting of breast adenocarcinoma, lung large
    cell carcinoma, lung adenocarcinoma and ovarian carcinoma
    miR name    SEQ ID NO.    p-value    fold-change    median values
    hsa-miR-138 11 7.4e−033 33.86 (+)  4.1e+003-1.2e+002
    hsa-miR-221 147 1.4e−009 5.03 (+) 3.4e+004-6.7e+003
    hsa-miR-146b-5p 17 1.0e−006 4.74 (+) 4.9e+003-1.0e+003
    hsa-let-7i 157 2.8e−027 3.71 (+) 2.2e+004-5.8e+003
    hsa-miR-222 40 7.9e−009 3.63 (+) 3.9e+004-1.1e+004
    hsa-miR-125b 8 1.5e−014 2.78 (+) 5.4e+004-1.9e+004
    hsa-miR-31 49 5.0e−003 2.78 (+) 1.3e+003-4.9e+002
    hsa-miR-126 9 1.3e−008 2.48 (+) 6.9e+003-2.8e+003
    hsa-miR-29c 191 4.8e−007 2.36 (+) 8.1e+002-3.4e+002
    hsa-miR-451 205 3.8e−003 2.33 (+) 1.3e+004-5.7e+003
    hsa-miR-486-5p 207 1.3e−003 2.16 (+) 4.8e+002-2.2e+002
    hsa-miR-30a* 195 8.0e−006 2.12 (+) 4.9e+002-2.3e+002
    hsa-miR-345 51 2.7e−011 2.11 (+) 1.4e+003-6.6e+002
    hsa-miR-30a 46 1.5e−006 2.10 (+) 3.5e+003-1.7e+003
    hsa-miR-29c* 45 1.5e−006 1.97 (+) 6.4e+002-3.2e+002
    hsa-miR-34a 52 3.4e−007 1.88 (+) 7.4e+003-4.0e+003
    hsa-miR-1977 234 5.9e−008 1.88 (+) 6.1e+003-3.3e+003
    hsa-miR-99a 231 1.5e−005 1.85 (+) 7.5e+003-4.1e+003
    hsa-miR-181a 21 2.7e−004 1.85 (+) 7.6e+003-4.1e+003
    hsa-miR-152 169 5.1e−008 1.82 (+) 1.0e+003-5.5e+002
    hsa-miR-29a 43 3.7e−008 1.79 (+) 1.1e+004-6.0e+003
    hsa-miR-100 3 2.9e−005 1.77 (+) 5.4e+003-3.0e+003
    hsa-miR-30c 196 7.3e−007 1.73 (+) 5.9e+003-3.4e+003
    hsa-miR-181b 154 4.6e−004 1.69 (+) 2.1e+003-1.2e+003
    MID-00405 390 6.4e−003 1.66 (+) 4.4e+002-2.6e+002
    hsa-miR-15a 171 5.3e−006 1.65 (+) 5.5e+002-3.3e+002
    MID-23794 255 6.0e−003 1.60 (+) 1.3e+003-8.1e+002
    hsa-miR-331-3p 197 8.3e−007 1.57 (+) 2.3e+003-1.4e+003
    hsa-miR-29b 190 1.4e−005 1.57 (+) 1.8e+003-1.1e+003
    hsa-miR-27b 189 1.4e−003 1.56 (+) 3.6e+003-2.3e+003
    hsa-miR-22 39 3.6e−005 1.52 (+) 6.6e+003-4.3e+003
    hsa-miR-125a-5p 7 2.2e−006 1.52 (+) 9.9e+003-6.5e+003
    hsa-miR-30e 48 6.2e−006 1.51 (+) 7.8e+002-5.2e+002
    hsa-miR-30d 47 1.2e−002 1.51 (+) 4.3e+003-2.8e+003
    hsa-miR-205 32 2.2e−005 22.05 (−)  1.0e+002-2.2e+003
    hsa-miR-210 36 4.8e−020 8.83 (−) 2.1e+002-1.8e+003
    hsa-miR-10a 4 8.7e−011 4.34 (−) 3.2e+002-1.4e+003
    hsa-miR-193b 178 4.6e−014 3.59 (−) 4.9e+002-1.8e+003
    hsa-miR-214 37 8.3e−006 2.74 (−) 1.0e+003-2.8e+003
    hsa-miR-199a-3p 181 4.4e−005 2.67 (−) 2.0e+003-5.5e+003
    hsa-miR-193a-3p 25 3.1e−011 2.65 (−) 1.1e+003-2.8e+003
    hsa-miR-199b-5p 183 1.3e−004 2.63 (−) 2.7e+002-7.2e+002
    hsa-miR-199a-5p 182 1.6e−005 2.57 (−) 3.1e+003-8.1e+003
    hsa-miR-21* 35 4.7e−006 2.41 (−) 5.1e+002-1.2e+003
    MID-15965 240 9.3e−003 2.39 (−) 2.0e+003-4.8e+003
    hsa-miR-378 57 3.0e−006 2.35 (−) 3.4e+002-8.0e+002
    MID-16489 242 1.2e−003 2.25 (−) 8.2e+002-1.9e+003
    hsa-miR-425 204 1.4e−011 2.18 (−) 6.0e+002-1.3e+003
    MID-00689 236 5.3e−006 2.06 (−) 3.0e+002-6.1e+002
    hsa-miR-18a 176 2.0e−006 1.96 (−) 2.3e+002-4.5e+002
    hsa-miR-106a 158 3.5e−006 1.85 (−) 2.3e+003-4.3e+003
    hsa-miR-93 148 1.9e−010 1.79 (−) 2.4e+003-4.4e+003
    hsa-miR-455-3p 206 1.9e−007 1.79 (−) 2.9e+002-5.2e+002
    hsa-miR-342-3p 50 7.8e−005 1.78 (−) 1.3e+003-2.3e+003
    hsa-miR-17 20 6.0e−006 1.75 (−) 1.4e+003-2.5e+003
    hsa-miR-21 34 5.1e−006 1.75 (−) 2.8e+004-4.8e+004
    hsa-miR-20a 186 1.1e−004 1.74 (−) 1.4e+003-2.4e+003
    MID-15907 239 8.4e−004 1.72 (−) 2.4e+002-4.1e+002
    MID-21271 251 9.9e−003 1.62 (−) 3.0e+002-4.8e+002
    MID-17144 244 4.3e−002 1.60 (−) 2.2e+003-3.6e+003
    hsa-miR-191 24 4.7e−006 1.59 (−) 3.8e+003-6.1e+003
    hsa-miR-25 188 2.0e−005 1.59 (−) 1.0e+003-1.6e+003
    hsa-miR-15b 172 1.9e−002 1.57 (−) 2.1e+003-3.2e+003
    MID-15867 238 9.5e−003 1.56 (−) 2.6e+003-4.0e+003
    (+) the higher expression of this miR is in thyroid carcinoma
    (−) the higher expression of this miR is in breast adenocarcinoma, lung large cell carcinoma, lung adenocarcinoma and ovarian carcinoma
  • FIG. 12 demonstrates binary decisions at node #16 of the decision-tree. Tumors originating in thyroid carcinoma (diamonds) are easily distinguished from tumors of adenocarcinoma of the lung, breast and ovarian origin (squares) using the expression levels of hsa-miR-93 (SEQ ID NO: 148, y-axis), hsa-miR-138 (SEQ ID NO: 11, x-axis) and hsa-miR-10a (SEQ ID NO: 4, z-axis).
  • TABLE 18
    miR expression (in fluorescence units) distinguishing between
    follicular thyroid carcinoma and papillary thyroid carcinoma
    miR name    SEQ ID NO.    p-value    fold-change    median values
    MID-20524 249 4.5e−011 9.34 (+) 6.6e+003-7.1e+002
    hsa-miR-1973 180 1.9e−008 7.80 (+) 1.7e+003-2.2e+002
    hsa-miR-7 65 8.3e−005 7.58 (+) 4.5e+002-5.9e+001
    hsa-miR-1978 235 4.8e−007 6.52 (+) 2.5e+003-3.8e+002
    MID-16318 241 1.5e−008 6.14 (+) 2.2e+003-3.6e+002
    MID-19533 248 3.0e−004 6.00 (+) 4.2e+002-7.1e+001
    MID-23291 254 1.6e−008 5.76 (+) 9.6e+002-1.7e+002
    MID-19340 247 3.2e−005 5.33 (+) 9.9e+002-1.9e+002
    hsa-miR-1248 160 6.8e−009 5.17 (+) 6.4e+002-1.2e+002
    MID-16869 243 1.1e−006 4.97 (+) 1.5e+003-3.0e+002
    MID-18336 245 1.4e−010 4.48 (+) 2.7e+003-6.1e+002
    MID-22664 252 7.0e−004 4.00 (+) 5.0e+002-1.2e+002
    hsa-miR-146b-5p 17 6.7e−011 62.88 (−)  4.0e+002-2.5e+004
    hsa-miR-31 49 2.5e−008 18.72 (−)  4.4e+002-8.2e+003
    hsa-miR-146b-3p 166 5.0e−012 18.69 (−)  5.0e+001-9.3e+002
    hsa-miR-551b 225 4.8e−006 10.86 (−)  7.6e+001-8.3e+002
    hsa-miR-150 168 3.2e−007 10.71 (−)  3.1e+002-3.3e+003
    hsa-miR-21 34 3.4e−007 4.40 (−) 1.1e+004-4.7e+004
    (+) the higher expression of this miR is in follicular thyroid carcinoma
    (−) the higher expression of this miR is in papillary thyroid carcinoma
  • FIG. 13 demonstrates binary decisions at node #17 of the decision-tree. Tumors originating in follicular thyroid carcinoma (marked by diamonds) are easily distinguished from tumors of papillary thyroid carcinoma origins (marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-146b-5p (SEQ ID NO: 17, x-axis).
  • TABLE 19
    miR expression (in fluorescence units) distinguishing between
    the group consisting of breast adenocarcinoma and the group
    consisting of lung adenocarcinoma and ovarian carcinoma
    miR name    SEQ ID NO.    p-value    fold-change    median values
    hsa-miR-205 32 8.8e−005 10.55 (+)  6.3e+003-6.0e+002
    hsa-miR-375 56 7.9e−006 8.43 (+) 1.3e+003-1.5e+002
    hsa-miR-342-3p 50 7.7e−012 3.17 (+) 5.5e+003-1.7e+003
    hsa-miR-29c* 45 2.2e−008 2.52 (+) 6.6e+002-2.6e+002
    hsa-miR-193a-3p 25 2.2e−012 2.23 (+) 4.4e+003-2.0e+003
    MID-23256 253 7.9e−005 2.20 (+) 9.5e+002-4.3e+002
    hsa-miR-182 152 4.9e−005 2.15 (+) 1.2e+003-5.6e+002
    hsa-miR-126 9 5.3e−005 1.94 (+) 3.7e+003-1.9e+003
    hsa-miR-30a 46 2.9e−004 1.90 (+) 2.7e+003-1.4e+003
    hsa-miR-29c 191 1.1e−004 1.78 (+) 5.0e+002-2.8e+002
    hsa-miR-193b 178 1.5e−006 1.72 (+) 2.4e+003-1.4e+003
    hsa-miR-31 49 7.7e−006 5.79 (−) 2.2e+002-1.3e+003
    hsa-miR-222 40 3.0e−010 2.69 (−) 5.7e+003-1.5e+004
    hsa-miR-130a 10 1.6e−006 2.50 (−) 1.4e+003-3.4e+003
    hsa-miR-221 147 1.2e−009 2.41 (−) 3.8e+003-9.3e+003
    hsa-miR-10a 4 2.2e−002 2.33 (−) 9.3e+002-2.2e+003
    hsa-miR-210 36 1.5e−005 2.09 (−) 1.2e+003-2.5e+003
    hsa-miR-886-3p 228 9.9e−005 2.01 (−) 1.3e+003-2.6e+003
    MID-00689 236 4.6e−004 1.95 (−) 4.8e+002-9.4e+002
    hsa-miR-886-5p 230 6.3e−004 1.92 (−) 5.3e+002-1.0e+003
    hsa-miR-27b 189 7.1e−005 1.86 (−) 1.8e+003-3.3e+003
    MID-15965 240 6.1e−003 1.79 (−) 3.6e+003-6.4e+003
    hsa-miR-92a 67 3.5e−005 1.75 (−) 2.7e+003-4.7e+003
    hsa-miR-378 202 3.0e−004 1.73 (−) 6.0e+002-1.0e+003
    hsa-miR-146b-5p 17 2.9e−004 1.71 (−) 8.0e+002-1.4e+003
    (+) the higher expression of this miR is in breast adenocarcinoma
    (−) the higher expression of this miR is in lung adenocarcinoma and ovarian carcinoma
  • FIG. 14 demonstrates binary decisions at node #18 of the decision-tree. Tumors originating in breast (diamonds) are easily distinguished from tumors of lung and ovarian origin (squares) using the expression levels of hsa-miR-92a (SEQ ID NO: 67, y-axis), hsa-miR-193a-3p (SEQ ID NO: 25, x-axis) and hsa-miR-31 (SEQ ID NO: 49, z-axis).
  • TABLE 20
    miR expression (in fluorescence units) distinguishing
    between lung adenocarcinoma and ovarian carcinoma
    median values    fold-change    p-value    SEQ ID NO.    miR name
    1.4e+003-5.2e+001 27.96 (+)  3.5e−008 56 hsa-miR-375
    5.5e+002-6.0e+001 9.19 (+) 8.1e−009 11 hsa-miR-138
    3.2e+003-5.7e+002 5.65 (+) 5.6e−004 168 hsa-miR-150
    9.7e+002-2.9e+002 3.35 (+) 2.2e−004 16 hsa-miR-146a
    2.3e+003-7.6e+002 2.96 (+) 3.2e−003 237 MID-15684
    8.4e+003-2.9e+003 2.88 (+) 1.7e−010 21 hsa-miR-181a
    7.0e+003-2.6e+003 2.69 (+) 9.2e−008 52 hsa-miR-34a
    2.4e+003-9.5e+002 2.58 (+) 3.2e−007 12 hsa-miR-140-3p
    2.1e+003-8.8e+002 2.39 (+) 2.1e−007 154 hsa-miR-181b
    1.4e+005-6.3e+004 2.28 (+) 1.9e−003 279 hsa-miR-1826
    3.2e+003-1.4e+003 2.25 (+) 6.1e−003 9 hsa-miR-126
    6.1e+003-2.7e+003 2.24 (+) 2.3e−006 39 hsa-miR-22
    4.2e+003-2.2e+003 1.93 (+) 8.9e−005 47 hsa-miR-30d
    1.9e+003-1.0e+003 1.90 (+) 3.5e−006 23 hsa-miR-185
    2.8e+003-1.5e+003 1.88 (+) 1.4e−003 50 hsa-miR-342-3p
    3.6e+003-2.1e+003 1.69 (+) 1.8e−002 167 hsa-miR-149*
    9.7e+002-5.9e+002 1.66 (+) 8.7e−005 383 MID-22912
    6.8e+004-4.2e+004 1.64 (+) 5.4e−004 34 hsa-miR-21
    1.8e+003-1.2e+003 1.55 (+) 5.1e−004 35 hsa-miR-21*
    1.4e+003-8.9e+002 1.54 (+) 1.7e−004 388 hsa-miR-423-5p
    4.4e+002-2.4e+003 5.38 (−) 1.0e−007 5 hsa-miR-10b
    1.9e+002-7.2e+002 3.77 (−) 3.3e−006 359 hsa-miR-708
    6.4e+002-2.1e+003 3.27 (−) 5.8e−003 245 MID-18336
    1.8e+002-5.4e+002 2.96 (−) 6.6e−003 254 MID-23291
    1.7e+003-5.1e+003 2.95 (−) 3.4e−005 10 hsa-miR-130a
    2.8e+003-8.1e+003 2.86 (−) 3.7e−003 240 MID-15965
    5.1e+002-1.3e+003 2.65 (−) 3.7e−004 236 MID-00689
    6.1e+002-1.6e+003 2.63 (−) 1.8e−004 202 hsa-miR-378
    1.3e+003-3.1e+003 2.39 (−) 1.2e−002 4 hsa-miR-10a
    1.0e+003-2.3e+003 2.30 (−) 1.8e−006 25 hsa-miR-193a-3p
    2.6e+002-5.6e+002 2.15 (−) 4.1e−004 203 hsa-miR-422a
    3.0e+003-6.1e+003 2.04 (−) 1.8e−002 231 hsa-miR-99a
    2.0e+003-3.9e+003 2.01 (−) 3.5e−005 20 hsa-miR-17
    3.3e+003-6.4e+003 1.96 (−) 3.5e−005 158 hsa-miR-106a
    1.8e+003-3.5e+003 1.88 (−) 3.1e−005 186 hsa-miR-20a
    3.2e+003-5.9e+003 1.85 (−) 1.2e−004 258 hsa-let-7f
    2.4e+003-4.4e+003 1.85 (−) 2.0e−003 244 MID-17144
    2.0e+003-3.5e+003 1.72 (−) 8.7e−004 172 hsa-miR-15b
    5.2e+003-8.8e+003 1.69 (−) 5.3e−004 2 hsa-let-7e
    4.1e+002-6.8e+002 1.66 (−) 1.2e−002 235 hsa-miR-1978
    3.3e+004-5.4e+004 1.66 (−) 3.3e−005 256 hsa-let-7a
    2.8e+003-4.7e+003 1.65 (−) 3.5e−003 28 hsa-miR-200a
    5.2e+002-8.5e+002 1.62 (−) 1.6e−004 277 hsa-miR-17*
    3.7e+002-5.8e+002 1.57 (−) 1.9e−003 296 hsa-miR-26b
    1.0e+005-1.6e+005 1.56 (−) 3.7e−002 374 MID-16748
    6.0e+003-9.1e+003 1.53 (−) 2.1e−005 153 hsa-let-7d
    3.8e+003-5.7e+003 1.50 (−) 4.1e−003 181 hsa-miR-199a-3p
    (+) the higher expression of this miR is in lung adenocarcinoma
    (−) the higher expression of this miR is in ovarian carcinoma
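The relation between the "median values" and "fold-change" columns can be sketched as follows. This is a minimal illustration that assumes the fold-change is the ratio of the two group medians, reported as a value of at least 1 with a (+) or (−) marker for direction; the printed values are consistent with that reading, but the table does not state the formula. The function name `fold_change` is hypothetical.

```python
def fold_change(median_a: float, median_b: float) -> tuple[float, str]:
    """Return (ratio, marker) for two group medians in fluorescence units.

    Assumption: the fold-change is the larger median divided by the smaller.
    The marker is "(+)" when the first-named group has the higher median
    and "(-)" when the second-named group does.
    """
    if median_a <= 0 or median_b <= 0:
        raise ValueError("medians must be positive")
    if median_a >= median_b:
        return median_a / median_b, "(+)"
    return median_b / median_a, "(-)"


# hsa-miR-10b row of Table 20: medians 4.4e+002 (lung) and 2.4e+003 (ovarian).
ratio, marker = fold_change(4.4e2, 2.4e3)
print(f"{ratio:.2f} {marker}")
```

Because the medians are printed to two significant figures, a ratio recomputed this way only approximates the tabulated fold-change (5.38 for this row).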
  • FIG. 15 demonstrates binary decisions at node #19 of the decision-tree. Tumors originating in lung adenocarcinoma (diamonds) are easily distinguished from tumors of ovarian carcinoma origin (squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis), hsa-miR-378 (SEQ ID NO: 202, x-axis) and hsa-miR-138 (SEQ ID NO: 11, z-axis).
  • TABLE 21
    miR expression (in fluorescence units) distinguishing
    between the group consisting of thymic carcinoma
    and the group consisting of TCC and SCC
    median values fold-change p-value SEQ ID NO. miR name
    5.3e+002-5.9e+001 9.00 (+) 5.7e−026 161 hsa-miR-128
    7.4e+002-9.2e+001 8.04 (+) 2.2e−007 164 hsa-miR-142-5p
    6.8e+002-8.8e+001 7.82 (+) 2.6e−021 22 hsa-miR-181a*
    7.1e+002-1.2e+002 6.09 (+) 1.2e−006 53 hsa-miR-34c-5p
    9.1e+002-1.8e+002 5.06 (+) 2.2e−008 285 hsa-miR-20b
    1.3e+004-2.8e+003 4.59 (+) 7.5e−014 3 hsa-miR-100
    1.6e+003-3.6e+002 4.39 (+) 7.1e−007 152 hsa-miR-182
    8.7e+002-2.0e+002 4.37 (+) 6.4e−010 191 hsa-miR-29c
    3.7e+003-9.1e+002 4.09 (+) 2.6e−014 154 hsa-miR-181b
    1.5e+004-3.8e+003 3.82 (+) 4.8e−009 21 hsa-miR-181a
    2.1e+003-6.4e+002 3.25 (+) 6.4e−006 206 hsa-miR-455-3p
    8.7e+002-2.7e+002 3.23 (+) 1.5e−010 174 hsa-miR-181d
    9.4e+002-2.9e+002 3.23 (+) 1.9e−004 19 hsa-miR-149
    7.3e+002-2.6e+002 2.80 (+) 2.5e−008 45 hsa-miR-29c*
    8.7e+002-3.2e+002 2.69 (+) 2.2e−007 171 hsa-miR-15a
    2.7e+003-1.0e+003 2.66 (+) 5.1e−005 179 hsa-miR-195
    4.4e+004-1.8e+004 2.46 (+) 8.2e−008 8 hsa-miR-125b
    7.3e+002-3.2e+002 2.26 (+) 2.8e−004 296 hsa-miR-26b
    2.7e+003-1.2e+003 2.22 (+) 2.4e−003 284 hsa-miR-19b
    5.7e+002-2.6e+002 2.17 (+) 7.8e−005 18 hsa-miR-148a
    9.1e+002-4.4e+002 2.06 (+) 1.4e−006 51 hsa-miR-345
    7.5e+003-3.8e+003 2.00 (+) 1.6e−002 258 hsa-let-7f
    1.8e+002-4.4e+003 24.66 (−)  3.9e−008 49 hsa-miR-31
    5.8e+001-1.0e+003 17.65 (−)  1.0e−007 184 hsa-miR-203
    2.2e+002-1.6e+003 7.43 (−) 4.5e−018 35 hsa-miR-21*
    1.1e+004-5.5e+004 4.97 (−) 1.5e−032 34 hsa-miR-21
    6.2e+002-2.5e+003 4.06 (−) 2.0e−008 37 hsa-miR-214
    1.6e+002-5.8e+002 3.69 (−) 6.3e−005 42 hsa-miR-224
    6.9e+002-2.5e+003 3.58 (−) 2.3e−009 228 hsa-miR-886-3p
    4.8e+003-1.7e+004 3.47 (−) 3.9e−009 15 hsa-miR-145
    2.7e+003-8.2e+003 3.08 (−) 2.7e−008 14 hsa-miR-143
    1.3e+003-3.7e+003 2.93 (−) 6.7e−005 242 MID-16489
    2.5e+002-7.4e+002 2.91 (−) 4.3e−006 230 hsa-miR-886-5p
    3.5e+002-1.0e+003 2.90 (−) 5.6e−004 253 MID-23256
    1.1e+003-3.0e+003 2.82 (−) 7.9e−006 36 hsa-miR-210
    2.2e+003-5.8e+003 2.63 (−) 1.2e−007 182 hsa-miR-199a-5p
    5.0e+003-1.2e+004 2.48 (−) 3.8e−006 293 hsa-miR-23b
    7.5e+003-1.8e+004 2.44 (−) 1.1e−008 292 hsa-miR-23a
    2.7e+002-6.6e+002 2.43 (−) 5.4e−003 4 hsa-miR-10a
    9.0e+003-2.2e+004 2.43 (−) 5.5e−015 294 hsa-miR-24
    3.8e+003-8.8e+003 2.35 (−) 3.0e−004 297 hsa-miR-27a
    2.3e+002-5.2e+002 2.28 (−) 1.6e−002 354 hsa-miR-612
    1.8e+003-3.9e+003 2.22 (−) 1.1e−006 377 MID-17866
    1.6e+003-3.3e+003 2.11 (−) 2.4e−004 189 hsa-miR-27b
    6.9e+003-1.4e+004 2.08 (−) 1.5e−004 30 hsa-miR-200c
    3.8e+004-7.9e+004 2.05 (−) 1.8e−008 386 MID-23178
    1.5e+003-3.1e+003 2.04 (−) 2.6e−002 249 MID-20524
    4.6e+002-9.3e+002 2.03 (−) 5.6e−002 5 hsa-miR-10b
    2.5e+002-5.0e+002 2.03 (−) 3.2e−005 274 hsa-miR-151-3p
    (+) the higher expression of this miR is in thymic carcinoma
    (−) the higher expression of this miR is in TCC and SCC
  • FIG. 16 demonstrates binary decisions at node #20 of the decision-tree. Tumors originating in thymic carcinoma (marked by diamonds) are easily distinguished from tumors of urothelial carcinoma (transitional cell carcinoma, TCC) and squamous cell carcinoma (SCC) origins (marked by squares) using the expression levels of hsa-miR-21 (SEQ ID NO: 34, y-axis) and hsa-miR-100 (SEQ ID NO: 3, x-axis).
  • TABLE 22
    miR expression (in fluorescence units) distinguishing
    between TCC and SCC (of anus, skin, lung, head & neck,
    esophagus or uterine cervix)
    median values fold-change p-value SEQ ID NO. miR name
    2.5e+002-5.0e+001 5.05 (+) 7.4e−036 69 hsa-miR-934
    9.1e+003-2.2e+003 4.14 (+) 1.2e−012 28 hsa-miR-200a
    2.1e+002-5.3e+001 3.87 (+) 8.4e−007 280 hsa-miR-187
    6.0e+003-1.9e+003 3.19 (+) 9.8e−008 13 hsa-miR-141
    5.6e+002-1.8e+002 3.15 (+) 4.7e−013 191 hsa-miR-29c
    9.4e+002-3.0e+002 3.13 (+) 8.1e−008 152 hsa-miR-182
    1.8e+004-6.2e+003 2.99 (+) 3.9e−010 29 hsa-miR-200b
    3.2e+002-1.1e+002 2.81 (+) 8.8e−009 175 hsa-miR-183
    3.1e+004-1.2e+004 2.65 (+) 8.1e−005 30 hsa-miR-200c
    2.1e+003-8.1e+002 2.63 (+) 6.2e−020 204 hsa-miR-425
    1.2e+003-4.8e+002 2.41 (+) 6.3e−007 4 hsa-miR-10a
    8.5e+003-3.7e+003 2.30 (+) 2.4e−024 24 hsa-miR-191
    1.8e+003-8.4e+002 2.14 (+) 2.7e−006 5 hsa-miR-10b
    3.5e+002-1.7e+002 2.09 (+) 2.0e−006 329 hsa-miR-425*
    3.4e+002-1.6e+002 2.08 (+) 8.0e−005 273 hsa-miR-148b
    3.1e+002-8.1e+002 2.60 (−) 1.6e−005 170 hsa-miR-155
    5.0e+002-1.2e+003 2.49 (−) 1.0e−002 184 hsa-miR-203
    1.5e+002-3.5e+002 2.39 (−) 3.0e−011 26 hsa-miR-193a-5p
    1.9e+003-4.5e+003 2.35 (−) 1.3e−008 231 hsa-miR-99a
    1.2e+002-2.7e+002 2.28 (−) 1.6e−003 368 MID-00672
    1.4e+003-3.2e+003 2.25 (−) 1.3e−005 37 hsa-miR-214
    3.9e+002-8.7e+002 2.23 (−) 1.5e−004 16 hsa-miR-146a
    3.5e+002-7.6e+002 2.15 (−) 4.2e−005 169 hsa-miR-152
    1.5e+002-3.3e+002 2.15 (−) 4.4e−004 155 hsa-miR-127-3p
    8.4e+002-1.7e+003 2.08 (−) 1.6e−005 35 hsa-miR-21*
    7.7e+003-1.6e+004 2.07 (−) 5.3e−012 40 hsa-miR-222
    4.4e+002-9.0e+002 2.06 (−) 1.5e−005 17 hsa-miR-146b-5p
    (+) the higher expression of this miR is in TCC
    (−) the higher expression of this miR is in SCC
  • hsa-miR-934 (SEQ ID NO: 69), hsa-miR-191 (SEQ ID NO: 24) and hsa-miR-29c (SEQ ID NO: 191) are used at node #21 of the binary-tree-classifier detailed in the invention to distinguish between TCC and SCC.
  • TABLE 23
    miR expression (in fluorescence units) distinguishing between SCC of the uterine
    cervix and other SCC tumors (anus, skin, lung, head & neck or esophagus)
    median values auROC fold-change p-value SEQ ID NO. miR name
    2.4e+002-9.2e+001 0.65 2.57 (+) 2.0e−002 164 hsa-miR-142-5p
    1.6e+003-7.6e+002 0.85 2.13 (+) 1.7e−005 5 hsa-miR-10b
    8.9e+003-4.4e+003 0.74 2.01 (+) 2.1e−004 231 hsa-miR-99a
    1.2e+003-9.8e+002 0.71 1.24 (+) 1.2e−002 54 hsa-miR-361-5p
    3.4e+004-2.7e+004 0.71 1.24 (+) 3.9e−004 1 hsa-let-7c
    1.3e+003-4.3e+003 0.81 3.39 (−) 9.9e−006 242 MID-16489
    3.9e+002-1.2e+003 0.74 3.10 (−) 2.1e−003 372 MID-16469
    1.1e+003-3.3e+003 0.84 3.09 (−) 1.3e−008 249 MID-20524
    1.7e+003-5.2e+003 0.78 3.01 (−) 2.4e−005 167 hsa-miR-149*
    2.7e+002-8.0e+002 0.79 2.97 (−) 1.4e−004 254 MID-23291
    2.2e+002-6.2e+002 0.76 2.77 (−) 1.7e−004 354 hsa-miR-612
    5.7e+002-1.5e+003 0.76 2.65 (−) 7.8e−006 381 MID-19962
    2.3e+002-6.0e+002 0.79 2.63 (−) 2.1e−005 380 MID-19898
    9.8e+002-2.4e+003 0.78 2.44 (−) 2.8e−005 245 MID-18336
    1.2e+002-2.8e+002 0.73 2.34 (−) 5.3e−003 358 hsa-miR-665
    6.1e+002-1.4e+003 0.70 2.31 (−) 6.1e−003 364 MID-00064
    2.9e+003-6.7e+003 0.81 2.30 (−) 8.8e−008 240 MID-15965
    1.2e+002-2.8e+002 0.66 2.26 (−) 1.5e−002 11 hsa-miR-138
    1.0e+002-2.3e+002 0.77 2.24 (−) 8.9e−005 378 MID-18307
    (+) the higher expression of this miR is in SCC of the uterine cervix
    (−) the higher expression of this miR is in other SCC tumors
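The auROC column reports how well a single miR separates the two groups. A minimal sketch of one standard way to compute it is given below, using the pairwise (Mann-Whitney) formulation. The function names are hypothetical, and reporting the value as at least 0.5 regardless of direction is an assumption based on the tabulated values, not a stated convention of the source.

```python
def au_roc(group_a, group_b):
    """Area under the ROC curve for separating group_a from group_b.

    Computed as the fraction of (a, b) pairs in which a > b, with ties
    counted as one half (the Mann-Whitney U statistic over the pair count).
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def direction_free_au_roc(group_a, group_b):
    """Report the auROC as a value >= 0.5, whichever group is higher."""
    auc = au_roc(group_a, group_b)
    return max(auc, 1.0 - auc)


# Hypothetical expression values for one miR in two small groups.
print(direction_free_au_roc([3.0, 5.0, 8.0], [1.0, 4.0, 6.0]))
```

A value of 1.00 means every sample of one group lies above every sample of the other; 0.5 means the miR carries no separating information on its own.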
  • FIG. 17 demonstrates binary decisions at node #22 of the decision-tree. Tumors originating in SCC of the uterine cervix (diamonds) are easily distinguished from tumors of other SCC origin (squares) using the expression levels of hsa-miR-361-5p (SEQ ID NO: 54, y-axis), hsa-let-7c (SEQ ID NO: 1, x-axis) and hsa-miR-10b (SEQ ID NO: 5, z-axis).
  • TABLE 24
    miR expression (in fluorescence units) distinguishing between anus
    or skin SCC and upper SCC tumors (lung, head & neck or esophagus)
    median values auROC fold-change p-value SEQ ID NO. miR name
    3.2e+002-5.0e+001 0.78 6.38 (+) 3.0e−006 305 hsa-miR-31*
    4.3e+003-8.0e+002 0.80 5.39 (+) 1.8e−006 184 hsa-miR-203
    8.6e+002-2.5e+002 0.78 3.49 (+) 1.8e−006 41 hsa-miR-223
    1.7e+003-5.4e+002 0.80 3.12 (+) 3.5e−006 183 hsa-miR-199b-5p
    9.4e+003-3.5e+003 0.70 2.73 (+) 2.4e−003 49 hsa-miR-31
    8.7e+003-3.2e+003 0.86 2.71 (+) 3.6e−007 382 MID-22331
    1.9e+003-7.1e+002 0.87 2.68 (+) 1.7e−008 235 hsa-miR-1978
    2.4e+002-9.2e+001 0.83 2.55 (+) 9.6e−009 291 hsa-miR-222*
    6.8e+003-2.9e+003 0.74 2.31 (+) 7.4e−004 181 hsa-miR-199a-3p
    1.5e+003-6.7e+002 0.88 2.28 (+) 7.1e−007 5 hsa-miR-10b
    5.3e+002-2.4e+002 0.75 2.21 (+) 1.4e−004 296 hsa-miR-26b
    3.4e+002-1.6e+002 0.74 2.19 (+) 7.7e−005 289 hsa-miR-22*
    1.3e+003-6.0e+002 0.71 2.13 (+) 1.2e−003 206 hsa-miR-455-3p
    7.9e+003-3.8e+003 0.84 2.11 (+) 4.2e−006 338 hsa-miR-494
    2.9e+002-1.4e+002 0.73 2.08 (+) 1.1e−004 334 hsa-miR-483-5p
    2.8e+003-1.3e+003 0.82 2.07 (+) 4.5e−006 25 hsa-miR-193a-3p
    1.1e+002-3.3e+002 0.77 3.03 (−) 2.3e−005 11 hsa-miR-138
    1.3e+002-3.1e+002 0.65 2.29 (−) 1.5e−002 19 hsa-miR-149
    9.7e+001-2.1e+002 0.75 2.16 (−) 4.0e−005 198 hsa-miR-342-5p
    1.1e+003-1.8e+003 0.83 1.63 (−) 1.1e−006 23 hsa-miR-185
    (+) the higher expression of this miR is in anus or skin SCC
    (−) the higher expression of this miR is in upper SCC tumors
  • hsa-miR-10b (SEQ ID NO: 5), hsa-miR-138 (SEQ ID NO: 11) and hsa-miR-185 (SEQ ID NO: 23) are used at node #23 of the binary-tree-classifier detailed in the invention to distinguish between anus or skin SCC and upper SCC tumors.
  • TABLE 25
    miR expression (in fluorescence units) distinguishing
    between melanoma and lymphoma (B-cell or T-cell) tumors
    median values auROC fold-change p-value SEQ ID NO. miR name
    1.7e+003-3.0e+002 0.89 5.81 (+) 2.8e−010 4 hsa-miR-10a
    1.9e+003-6.0e+002 0.80 3.13 (+) 7.9e−005 11 hsa-miR-138
    1.7e+003-5.7e+002 0.94 2.98 (+) 2.3e−011 46 hsa-miR-30a
    2.5e+004-8.8e+003 0.87 2.83 (+) 1.1e−009 8 hsa-miR-125b
    6.2e+002-2.3e+002 0.94 2.74 (+) 9.2e−011 274 hsa-miR-151-3p
    9.2e+002-3.4e+002 0.87 2.70 (+) 1.9e−007 169 hsa-miR-152
    1.6e+003-6.0e+002 0.77 2.60 (+) 2.0e−004 36 hsa-miR-210
    4.8e+003-1.9e+003 0.90 2.56 (+) 2.1e−011 47 hsa-miR-30d
    1.2e+003-5.5e+002 0.88 2.26 (+) 2.5e−008 363 hsa-miR-99b
    2.4e+003-1.1e+003 0.85 2.24 (+) 1.4e−006 231 hsa-miR-99a
    6.5e+003-3.0e+003 0.80 2.17 (+) 2.2e−005 303 hsa-miR-30b
    6.4e+002-3.0e+002 0.86 2.14 (+) 2.9e−008 349 hsa-miR-532-5p
    2.1e+003-1.0e+003 0.86 2.08 (+) 1.6e−006 10 hsa-miR-130a
    5.4e+003-2.6e+003 0.81 2.06 (+) 8.3e−006 7 hsa-miR-125a-5p
    3.6e+003-1.8e+003 0.82 2.05 (+) 2.5e−006 3 hsa-miR-100
    7.9e+003-3.9e+003 0.69 2.04 (+) 1.5e−002 16 hsa-miR-146a
    1.6e+002-2.2e+003 0.93 13.84 (−)  1.1e−014 164 hsa-miR-142-5p
    7.2e+002-7.5e+003 0.93 10.40 (−)  1.1e−013 170 hsa-miR-155
    2.0e+003-1.4e+004 0.90 7.18 (−) 2.2e−010 168 hsa-miR-150
    1.7e+002-7.0e+002 0.91 4.14 (−) 5.2e−011 198 hsa-miR-342-5p
    2.2e+003-8.3e+003 0.97 3.83 (−) 6.2e−019 50 hsa-miR-342-3p
    9.1e+002-2.6e+003 0.86 2.87 (−) 1.3e−008 245 MID-18336
    2.3e+002-6.4e+002 0.77 2.74 (−) 2.4e−004 365 MID-00078
    1.9e+002-5.2e+002 0.78 2.68 (−) 4.7e−004 45 hsa-miR-29c*
    2.8e+003-6.6e+003 0.74 2.34 (−) 1.6e−003 382 MID-22331
    3.4e+003-7.9e+003 0.85 2.30 (−) 1.2e−004 259 hsa-let-7g
    3.8e+002-8.5e+002 0.75 2.25 (−) 4.3e−003 296 hsa-miR-26b
    7.5e+002-1.6e+003 0.78 2.16 (−) 2.3e−004 364 MID-00064
    6.3e+002-1.3e+003 0.79 2.08 (−) 7.9e−005 314 hsa-miR-361-3p
    2.7e+003-5.4e+003 0.80 2.05 (−) 7.5e−007 12 hsa-miR-140-3p
    (+) the higher expression of this miR is in melanoma
    (−) the higher expression of this miR is in lymphoma
  • FIG. 18 demonstrates binary decisions at node #24 of the decision-tree. Tumors originating in melanoma (diamonds) are easily distinguished from tumors of lymphoma origin (squares) using the expression levels of hsa-miR-342-3p (SEQ ID NO: 50, y-axis) and hsa-miR-30d (SEQ ID NO: 47, x-axis).
  • TABLE 26
    miR expression (in fluorescence units) distinguishing
    between B-cell lymphoma and T-cell lymphoma
    median values auROC fold-change p-value SEQ ID NO. miR name
    8.3e+002-2.8e+002 0.74 2.96 (+) 3.7e−005 11 hsa-miR-138
    6.7e+002-2.8e+002 0.72 2.37 (+) 2.2e−003 191 hsa-miR-29c
    1.2e+003-5.9e+002 0.76 2.02 (+) 1.4e−003 48 hsa-miR-30e
    6.7e+002-1.8e+003 0.79 2.77 (−) 1.1e−006 35 hsa-miR-21*
    1.5e+003-3.9e+003 0.68 2.68 (−) 2.6e−003 228 hsa-miR-886-3p
    (+) the higher expression of this miR is in B-cell lymphoma
    (−) the higher expression of this miR is in T-cell lymphoma
  • hsa-miR-30e (SEQ ID NO: 48) and hsa-miR-21* (SEQ ID NO: 35) are used at node #25 of the binary-tree-classifier detailed in the invention to distinguish between B-cell lymphoma and T-cell lymphoma.
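How two miRs with opposite directions can drive a binary decision is illustrated below. This is only a sketch: it assumes a simple log2-ratio rule with a cut-off of 0, neither of which is given in this section, so the rule, the threshold and the function name `classify_lymphoma` are hypothetical. The inputs in the usage lines are the group medians from Table 26.

```python
import math

# Hypothetical cut-off on the log2 ratio; the source gives no threshold.
THRESHOLD = 0.0


def classify_lymphoma(mir_30e: float, mir_21_star: float,
                      threshold: float = THRESHOLD) -> str:
    """Assign a sample from hsa-miR-30e and hsa-miR-21* expression.

    hsa-miR-30e is higher in B-cell lymphoma and hsa-miR-21* is higher in
    T-cell lymphoma (Table 26), so a large log2 ratio points to B-cell.
    """
    if mir_30e <= 0 or mir_21_star <= 0:
        raise ValueError("expression values must be positive")
    score = math.log2(mir_30e) - math.log2(mir_21_star)
    return "B-cell lymphoma" if score > threshold else "T-cell lymphoma"


# Table 26 group medians (fluorescence units).
print(classify_lymphoma(1.2e3, 6.7e2))  # B-cell medians
print(classify_lymphoma(5.9e2, 1.8e3))  # T-cell medians
```

Using a ratio of two miRs rather than a single absolute level makes the rule less sensitive to overall signal intensity, which is one reason such a form is a plausible illustration here.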
  • TABLE 27
    miR expression (in fluorescence units) distinguishing between lung
    small cell carcinoma and other neuroendocrine tumors selected from
    the group consisting of lung carcinoid, medullary thyroid carcinoma,
    gastrointestinal tract carcinoid and pancreatic islet cell tumor
    median values auROC fold-change p-value SEQ ID NO. miR name
    1.2e+004-1.2e+003 0.99 9.68 (+) 3.3e−021 158 hsa-miR-106a
    7.3e+003-7.9e+002 1.00 9.17 (+) 3.4e−022 20 hsa-miR-17
    1.4e+003-1.6e+002 0.99 8.53 (+) 8.2e−022 176 hsa-miR-18a
    5.8e+003-7.0e+002 1.00 8.38 (+) 7.4e−021 186 hsa-miR-20a
    1.1e+004-1.5e+003 0.98 7.71 (+) 1.7e−022 148 hsa-miR-93
    4.7e+003-6.7e+002 0.89 6.99 (+) 1.0e−008 36 hsa-miR-210
    2.2e+003-3.7e+002 0.95 5.87 (+) 2.8e−016 51 hsa-miR-345
    8.9e+003-1.8e+003 0.95 4.96 (+) 1.6e−010 172 hsa-miR-15b
    8.2e+003-1.8e+003 0.98 4.68 (+) 6.3e−020 260 hsa-miR-106b
    1.1e+003-2.4e+002 0.91 4.62 (+) 7.7e−010 265 hsa-miR-130b
    8.0e+003-1.8e+003 0.94 4.33 (+) 2.7e−013 67 hsa-miR-92a
    4.1e+003-9.8e+002 0.98 4.15 (+) 2.6e−019 188 hsa-miR-25
    1.1e+003-3.4e+002 0.98 3.40 (+) 7.9e−016 277 hsa-miR-17*
    2.5e+003-8.3e+002 0.99 2.96 (+) 1.8e−011 284 hsa-miR-19b
    5.1e+002-1.8e+002 0.74 2.84 (+) 6.1e−004 302 hsa-miR-301a
    7.9e+002-2.9e+002 0.91 2.78 (+) 8.7e−010 68 hsa-miR-92b
    9.9e+002-4.3e+002 0.69 2.28 (+) 5.1e−002 168 hsa-miR-150
    2.5e+003-1.1e+003 0.70 2.24 (+) 4.5e−003 242 MID-16489
    1.4e+003-6.6e+002 0.91 2.12 (+) 1.1e−009 204 hsa-miR-425
    5.0e+001-1.6e+003 0.91 31.23 (−)  4.5e−009 162 hsa-miR-129-3p
    1.1e+002-1.6e+003 0.91 14.13 (−)  8.6e−009 177 hsa-miR-192
    7.6e+001-7.9e+002 0.91 10.42 (−)  1.5e−008 27 hsa-miR-194
    5.5e+002-5.0e+003 0.92 9.14 (−) 1.7e−009 65 hsa-miR-7
    7.1e+001-5.7e+002 0.78 8.02 (−) 5.8e−005 263 hsa-miR-129*
    2.5e+002-1.6e+003 0.80 6.30 (−) 3.5e−005 155 hsa-miR-127-3p
    1.5e+002-9.1e+002 0.96 6.05 (−) 3.5e−015 191 hsa-miR-29c
    3.3e+002-2.0e+003 0.93 5.99 (−) 3.3e−013 190 hsa-miR-29b
    1.7e+002-9.9e+002 0.99 5.76 (−) 1.3e−020 45 hsa-miR-29c*
    1.2e+002-6.6e+002 0.75 5.60 (−) 8.0e−004 59 hsa-miR-487b
    1.8e+003-8.0e+003 0.90 4.44 (−) 1.6e−012 43 hsa-miR-29a
    1.3e+004-4.9e+004 0.88 3.87 (−) 3.3e−006 56 hsa-miR-375
    1.6e+002-5.5e+002 0.95 3.37 (−) 9.6e−011 266 hsa-miR-132
    4.0e+003-1.2e+004 0.82 2.98 (−) 9.4e−006 14 hsa-miR-143
    7.8e+003-2.3e+004 0.85 2.89 (−) 6.1e−006 15 hsa-miR-145
    1.2e+004-3.4e+004 0.79 2.83 (−) 4.3e−005 8 hsa-miR-125b
    4.5e+003-1.2e+004 0.97 2.70 (−) 1.7e−014 7 hsa-miR-125a-5p
    1.9e+003-5.0e+003 0.89 2.67 (−) 3.6e−010 39 hsa-miR-22
    2.5e+003-5.7e+003 0.79 2.25 (−) 8.7e−004 189 hsa-miR-27b
    1.1e+003-2.4e+003 0.64 2.18 (−) 4.1e−002 249 MID-20524
    2.2e+003-4.8e+003 0.72 2.14 (−) 8.5e−003 231 hsa-miR-99a
    9.6e+003-2.0e+004 0.82 2.12 (−) 1.3e−003 293 hsa-miR-23b
    5.1e+003-1.0e+004 0.80 2.01 (−) 6.3e−005 2 hsa-let-7e
    (+) the higher expression of this miR is in lung small cell carcinoma
    (−) the higher expression of this miR is in other neuroendocrine tumors
  • hsa-miR-17 (SEQ ID NO: 20) and hsa-miR-29c* (SEQ ID NO: 45) are used at node #26 of the binary-tree-classifier detailed in the invention to distinguish between lung small cell carcinoma and other neuroendocrine tumors.
  • TABLE 28
    miR expression (in fluorescence units) distinguishing between medullary thyroid
    carcinoma and other neuroendocrine tumors selected from the group consisting of
    lung carcinoid, gastrointestinal tract carcinoid and pancreatic islet cell tumor
    median values auROC fold-change p-value SEQ ID NO. miR name
    4.4e+003-5.5e+001 0.84 79.70 (+)  1.5e−007 159 hsa-miR-124
    4.0e+004-4.9e+003 0.98 8.07 (+) 1.6e−015 40 hsa-miR-222
    1.9e+004-2.8e+003 0.98 6.85 (+) 4.8e−016 147 hsa-miR-221
    1.1e+003-2.0e+002 0.70 5.55 (+) 1.1e−003 11 hsa-miR-138
    3.2e+002-7.8e+001 0.83 4.12 (+) 7.6e−007 311 hsa-miR-335
    5.8e+003-1.5e+003 0.86 3.91 (+) 1.3e−006 4 hsa-miR-10a
    6.3e+004-1.7e+004 0.83 3.61 (+) 3.9e−006 8 hsa-miR-125b
    1.1e+004-3.2e+003 0.79 3.43 (+) 5.5e−005 231 hsa-miR-99a
    4.3e+002-2.0e+002 0.78 2.10 (+) 2.8e−004 301 hsa-miR-29b-2*
    7.9e+003-3.8e+003 0.82 2.06 (+) 4.4e−005 297 hsa-miR-27a
    1.4e+002-4.0e+002 0.95 2.95 (−) 7.5e−011 68 hsa-miR-92b
    1.1e+003-2.8e+003 0.87 2.50 (−) 3.2e−006 67 hsa-miR-92a
    1.8e+002-3.7e+002 0.76 2.07 (−) 2.0e−003 265 hsa-miR-130b
    4.4e+002-9.0e+002 0.75 2.04 (−) 2.1e−003 36 hsa-miR-210
    (+) the higher expression of this miR is in medullary thyroid carcinoma
    (−) the higher expression of this miR is in other neuroendocrine tumors
  • FIG. 19 demonstrates binary decisions at node #27 of the decision-tree. Tumors originating in medullary thyroid carcinoma (diamonds) are easily distinguished from tumors of other neuroendocrine origin (squares) using the expression levels of hsa-miR-92b (SEQ ID NO: 68, y-axis), hsa-miR-222 (SEQ ID NO: 40, x-axis) and hsa-miR-92a (SEQ ID NO: 67, z-axis).
  • TABLE 29
    miR expression (in fluorescence units) distinguishing between lung carcinoid
    tumors and GI neuroendocrine tumors selected from the group consisting
    of gastrointestinal tract carcinoid and pancreatic islet cell tumor
    median values auROC fold-change p-value SEQ ID NO. miR name
    4.0e+003-9.9e+001 0.90 40.08 (+)  1.9e−010 331 hsa-miR-432
    6.0e+003-1.5e+002 0.86 39.24 (+)  4.6e−008 162 hsa-miR-129-3p
    6.3e+003-1.9e+002 0.87 34.16 (+)  7.8e−009 59 hsa-miR-487b
    1.3e+003-5.5e+001 0.88 23.36 (+)  2.9e−010 326 hsa-miR-409-5p
    1.1e+003-5.0e+001 0.88 21.14 (+)  5.2e−010 306 hsa-miR-323-3p
    1.0e+003-5.5e+001 0.87 18.59 (+)  1.5e−009 350 hsa-miR-539
    7.9e+002-5.6e+001 0.84 14.25 (+)  1.4e−008 317 hsa-miR-369-5p
    1.0e+004-7.2e+002 0.86 13.95 (+)  3.2e−007 155 hsa-miR-127-3p
    1.7e+003-1.2e+002 0.86 13.60 (+)  2.1e−008 325 hsa-miR-409-3p
    1.6e+003-1.2e+002 0.88 13.10 (+)  4.2e−009 318 hsa-miR-370
    9.5e+002-7.3e+001 0.81 13.03 (+)  3.1e−006 339 hsa-miR-495
    9.5e+002-7.4e+001 0.84 12.92 (+)  5.7e−007 264 hsa-miR-129-5p
    6.4e+002-5.0e+001 0.91 12.84 (+)  1.6e−013 332 hsa-miR-433
    6.5e+002-5.7e+001 0.88 11.52 (+)  5.1e−011 262 hsa-miR-127-5p
    5.6e+002-5.2e+001 0.90 10.76 (+)  2.7e−012 336 hsa-miR-485-5p
    2.0e+003-1.9e+002 0.86 10.44 (+)  4.2e−008 324 hsa-miR-382
    7.9e+002-7.8e+001 0.83 10.20 (+)  1.3e−007 322 hsa-miR-379
    6.0e+002-5.9e+001 0.89 10.15 (+)  9.6e−012 330 hsa-miR-431*
    4.7e+002-5.0e+001 0.90 9.41 (+) 6.1e−012 321 hsa-miR-377*
    1.3e+003-1.4e+002 0.80 9.40 (+) 1.5e−005 263 hsa-miR-129*
    4.7e+002-5.0e+001 0.86 9.35 (+) 1.8e−008 309 hsa-miR-329
    4.9e+002-5.3e+001 0.79 9.24 (+) 3.1e−005 53 hsa-miR-34c-5p
    1.1e+003-1.2e+002 0.83 9.05 (+) 6.4e−007 320 hsa-miR-376c
    1.1e+003-1.2e+002 0.86 8.81 (+) 2.3e−008 275 hsa-miR-154
    6.5e+002-8.4e+001 0.83 7.73 (+) 8.1e−007 352 hsa-miR-543
    9.9e+002-1.3e+002 0.82 7.49 (+) 3.2e−007 312 hsa-miR-337-5p
    6.2e+002-8.8e+001 0.86 7.10 (+) 3.0e−008 355 hsa-miR-654-3p
    3.5e+002-5.0e+001 0.91 7.05 (+) 3.2e−013 367 MID-00465
    6.0e+002-1.0e+002 0.84 5.76 (+) 5.9e−007 269 hsa-miR-134
    3.2e+003-8.5e+002 0.91 3.84 (+) 1.2e−011 64 hsa-miR-652
    3.2e+002-1.1e+002 0.83 2.84 (+) 2.3e−005 308 hsa-miR-328
    2.6e+003-9.4e+002 0.74 2.78 (+) 1.1e−003 175 hsa-miR-183
    2.8e+003-1.0e+003 0.87 2.73 (+) 3.0e−006 190 hsa-miR-29b
    3.9e+003-1.6e+003 0.88 2.49 (+) 6.9e−010 54 hsa-miR-361-5p
    4.1e+002-1.7e+002 0.67 2.44 (+) 2.1e−002 302 hsa-miR-301a
    4.0e+003-1.7e+003 0.79 2.41 (+) 5.9e−004 152 hsa-miR-182
    4.0e+002-1.7e+002 0.88 2.39 (+) 4.7e−007 301 hsa-miR-29b-2*
    8.7e+002-3.7e+002 0.77 2.36 (+) 6.8e−005 266 hsa-miR-132
    7.7e+003-3.3e+003 0.82 2.34 (+) 5.4e−006 47 hsa-miR-30d
    3.7e+002-1.6e+002 0.70 2.32 (+) 5.8e−003 313 hsa-miR-338-3p
    3.3e+002-1.5e+002 0.66 2.16 (+) 1.3e−002 359 hsa-miR-708
    5.5e+003-2.5e+003 0.68 2.16 (+) 4.2e−002 65 hsa-miR-7
    2.1e+003-9.9e+002 0.78 2.13 (+) 7.0e−005 307 hsa-miR-324-5p
    1.2e+003-5.9e+002 0.81 2.02 (+) 1.6e−004 191 hsa-miR-29c
    3.5e+002-1.9e+003 0.88 5.36 (−) 1.0e−007 242 MID-16489
    6.5e+002-1.9e+003 0.76 2.96 (−) 4.9e−004 4 hsa-miR-10a
    1.3e+003-3.6e+003 0.84 2.79 (−) 1.9e−006 147 hsa-miR-221
    2.2e+003-5.9e+003 0.81 2.75 (−) 8.5e−006 40 hsa-miR-222
    2.6e+002-6.8e+002 0.76 2.56 (−) 7.4e−004 372 MID-16469
    3.5e+002-8.9e+002 0.71 2.56 (−) 4.7e−003 168 hsa-miR-150
    1.5e+002-3.7e+002 0.83 2.55 (−) 4.8e−005 16 hsa-miR-146a
    1.9e+003-4.7e+003 0.82 2.40 (−) 1.7e−005 182 hsa-miR-199a-5p
    1.3e+003-3.0e+003 0.79 2.35 (−) 1.2e−004 167 hsa-miR-149*
    2.1e+002-4.8e+002 0.84 2.26 (−) 3.1e−005 356 hsa-miR-658
    1.2e+003-2.8e+003 0.74 2.25 (−) 1.4e−003 148 hsa-miR-93
    1.4e+003-3.1e+003 0.70 2.23 (−) 1.9e−002 382 MID-22331
    8.0e+002-1.8e+003 0.83 2.21 (−) 2.9e−005 37 hsa-miR-214
    4.4e+002-8.9e+002 0.79 2.01 (−) 2.1e−004 364 MID-00064
    2.1e+002-4.2e+002 0.78 2.01 (−) 1.1e−003 35 hsa-miR-21*
    (+) the higher expression of this miR is in lung carcinoid tumors
    (−) the higher expression of this miR is in GI neuroendocrine tumors
  • hsa-miR-652 (SEQ ID NO: 64), hsa-miR-34c-5p (SEQ ID NO: 53) and hsa-miR-214 (SEQ ID NO: 37) are used at node #28 of the binary-tree-classifier detailed in the invention to distinguish between lung carcinoid tumors and GI neuroendocrine tumors.
  • TABLE 30
    miR expression (in fluorescence units) distinguishing between pancreatic islet cell
    tumors and GI neuroendocrine carcinoid tumors selected from the group consisting
    of small intestine and duodenum; appendix, stomach and pancreas
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-129* 263 2.8e−004 20.91 (+)  0.80 2.3e+003 1.1e+002
    hsa-miR-217 288 6.6e−003 9.61 (+) 0.72 4.8e+002 5.0e+001
    hsa-miR-148a 18 6.8e−006 8.54 (+) 0.90 1.6e+003 1.9e+002
    hsa-miR-216a 286 2.7e−002 8.34 (+) 0.68 4.3e+002 5.2e+001
    hsa-miR-129-3p 162 4.4e−003 7.22 (+) 0.74 1.8e+003 2.5e+002
    hsa-miR-551b 225 2.3e−003 6.65 (+) 0.74 6.6e+002 9.9e+001
    hsa-miR-216b 287 5.4e−003 6.04 (+) 0.75 3.0e+002 5.0e+001
    hsa-miR-455-3p 206 7.3e−007 3.75 (+) 0.92 7.1e+002 1.9e+002
    hsa-miR-451 205 2.5e−003 3.65 (+) 0.79 1.3e+004 3.4e+003
    hsa-miR-26b 296 2.8e−004 3.43 (+) 0.83 8.9e+002 2.6e+002
    hsa-let-7f 258 3.6e−004 3.29 (+) 0.91 8.7e+003 2.6e+003
    hsa-miR-338-3p 313 3.2e−003 3.25 (+) 0.78 5.2e+002 1.6e+002
    MID-17866 377 5.0e−005 2.71 (+) 0.85 6.6e+003 2.4e+003
    MID-16582 373 1.2e−005 2.45 (+) 0.88 1.9e+004 7.6e+003
    hsa-let-7a 256 1.0e−003 2.42 (+) 0.80 6.4e+004 2.7e+004
    hsa-let-7d 153 1.8e−004 2.36 (+) 0.89 1.1e+004 4.7e+003
    hsa-let-7g 259 2.2e−003 2.28 (+) 0.86 6.4e+003 2.8e+003
    hsa-miR-130b 265 1.0e−002 2.11 (+) 0.69 4.8e+002 2.3e+002
    hsa-miR-30b 303 7.5e−003 2.09 (+) 0.75 6.4e+003 3.1e+003
    hsa-miR-133b 268 5.4e−004 9.40 (−) 0.81 1.0e+002 9.7e+002
    hsa-miR-133a 267 5.4e−004 9.22 (−) 0.80 1.1e+002 1.0e+003
    hsa-miR-143* 165 2.1e−006 8.37 (−) 0.93 2.3e+002 1.9e+003
    hsa-miR-145 15 3.7e−008 8.18 (−) 0.94 1.1e+004 9.2e+004
    hsa-miR-145* 272 7.0e−006 8.05 (−) 0.91 6.6e+001 5.3e+002
    hsa-miR-143 14 3.7e−009 7.30 (−) 0.96 5.2e+003 3.8e+004
    hsa-miR-378 202 6.2e−006 6.35 (−) 0.88 3.1e+002 2.0e+003
    MID-00689 236 8.1e−006 4.99 (−) 0.88 2.9e+002 1.4e+003
    hsa-miR-422a 203 7.9e−006 4.74 (−) 0.88 1.4e+002 6.4e+002
    hsa-miR-10a 4 2.4e−004 3.91 (−) 0.82 9.2e+002 3.6e+003
    hsa-miR-150 168 4.4e−003 3.78 (−) 0.78 3.0e+002 1.1e+003
    hsa-miR-330-3p 310 3.4e−004 3.23 (−) 0.81 1.1e+002 3.6e+002
    hsa-miR-28-3p 298 4.6e−007 3.16 (−) 0.95 2.1e+002 6.7e+002
    hsa-miR-194 27 1.4e−002 3.09 (−) 0.74 7.2e+002 2.2e+003
    hsa-miR-200b 29 7.6e−005 2.72 (−) 0.91 7.5e+003 2.1e+004
    hsa-miR-21 34 2.8e−006 2.57 (−) 0.87 8.5e+003 2.2e+004
    hsa-miR-886-3p 228 8.0e−003 2.56 (−) 0.74 6.7e+002 1.7e+003
    hsa-miR-100 3 3.6e−003 2.50 (−) 0.77 1.5e+003 3.8e+003
    hsa-miR-532-5p 349 4.3e−007 2.14 (−) 0.94 2.5e+002 5.3e+002
    hsa-miR-21* 35 8.5e−004 2.06 (−) 0.82 2.4e+002 5.0e+002
    hsa-miR-193a-5p 26 5.7e−003 2.01 (−) 0.77 1.6e+002 3.3e+002
    (+) the higher expression of this miR is in pancreatic islet cell tumors
    (−) the higher expression of this miR is in GI neuroendocrine carcinoid tumors
  • hsa-miR-21 (SEQ ID NO: 34) and hsa-miR-148a (SEQ ID NO: 18) are used at node #29 of the binary-tree-classifier detailed in the invention to distinguish between pancreatic islet cell tumors and GI neuroendocrine carcinoid tumors.
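The group definitions in Tables 27 to 30 suggest that nodes #26 to #29 form a chain in which each decision splits one tumor type off from the remaining neuroendocrine tumors. A minimal sketch of walking such a chain is shown below. The nesting is inferred from the table titles, and the `Node` class, the function names and the placeholder deciders are all hypothetical; the actual decision rule at each node is not described in this section and must be supplied by the caller.

```python
from typing import Callable, List, Mapping, Tuple, Union

Profile = Mapping[str, float]        # miR name -> expression (fluorescence units)
Decider = Callable[[Profile], bool]  # True selects the first-named class at a node


class Node:
    """One binary decision with a leaf label or another Node on each side."""

    def __init__(self, name: str, decide: Decider,
                 if_true: Union["Node", str],
                 if_false: Union["Node", str]) -> None:
        self.name = name
        self.decide = decide
        self.if_true = if_true
        self.if_false = if_false


def classify(root: Union[Node, str], profile: Profile) -> Tuple[str, List[str]]:
    """Walk from root to a leaf; return the leaf label and the nodes visited."""
    path: List[str] = []
    current = root
    while isinstance(current, Node):
        path.append(current.name)
        current = current.if_true if current.decide(profile) else current.if_false
    return current, path


def build_neuroendocrine_branch(d26: Decider, d27: Decider,
                                d28: Decider, d29: Decider) -> Node:
    """Chain nodes #26 to #29; the four deciders are supplied by the caller."""
    n29 = Node("node #29", d29, "pancreatic islet cell tumor",
               "GI neuroendocrine carcinoid")
    n28 = Node("node #28", d28, "lung carcinoid", n29)
    n27 = Node("node #27", d27, "medullary thyroid carcinoma", n28)
    return Node("node #26", d26, "lung small cell carcinoma", n27)


# Placeholder deciders that ignore the profile, used only to show the traversal.
tree = build_neuroendocrine_branch(
    d26=lambda p: False,
    d27=lambda p: False,
    d28=lambda p: True,
    d29=lambda p: False,
)
label, path = classify(tree, {})
print(label, path)
```

With these placeholders the walk visits nodes #26, #27 and #28 and stops at the lung carcinoid leaf. Keeping the deciders as plain callables lets each node use its own small set of miRs without changing the traversal code.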
  • TABLE 31
    miR expression (in fluorescence units) distinguishing between gastric or esophageal
    adenocarcinoma and other adenocarcinoma tumors of the gastrointestinal system
    selected from the group consisting of cholangiocarcinoma or adenocarcinoma of
    extrahepatic biliary tract, pancreatic adenocarcinoma and colorectal adenocarcinoma
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-133a 267 4.6e−008 9.14 (+) 0.74 6.2e+002 6.7e+001
    hsa-miR-133b 268 3.9e−008 8.73 (+) 0.74 5.5e+002 6.3e+001
    hsa-miR-143* 165 3.9e−007 4.26 (+) 0.75 2.5e+003 5.9e+002
    hsa-miR-145 15 4.5e−004 2.82 (+) 0.71 7.9e+004 2.8e+004
    hsa-miR-143 14 1.3e−003 2.55 (+) 0.68 3.2e+004 1.3e+004
    hsa-miR-658 356 8.2e−004 2.53 (+) 0.71 1.3e+003 5.1e+002
    hsa-miR-149* 167 2.2e−004 2.33 (+) 0.72 7.2e+003 3.1e+003
    MID-17576 376 7.2e−004 2.22 (+) 0.69 3.1e+003 1.4e+003
    MID-16469 372 3.0e−004 2.20 (+) 0.71 1.4e+003 6.5e+002
    hsa-miR-145* 272 3.0e−004 2.14 (+) 0.69 3.2e+002 1.5e+002
    MID-15986 370 3.8e−004 2.11 (+) 0.74 2.9e+003 1.4e+003
    hsa-miR-224 42 5.4e−008 6.57 (−) 0.83 5.5e+001 3.6e+002
    hsa-miR-223 41 1.1e−004 2.61 (−) 0.73 1.5e+002 4.0e+002
    hsa-miR-1201 146 1.2e−002 1.28 (−) 0.67 9.0e+002 1.2e+003
    (+) the higher expression of this miR is in gastric or esophageal adenocarcinoma
    (−) the higher expression of this miR is in other adenocarcinoma tumors of the gastrointestinal system
  • FIG. 20 demonstrates binary decisions at node #30 of the decision-tree. Tumors originating in gastric or esophageal adenocarcinoma (diamonds) are easily distinguished from tumors of other GI adenocarcinoma origin (squares) using the expression levels of hsa-miR-1201 (SEQ ID NO: 146, y-axis), hsa-miR-224 (SEQ ID NO: 42, x-axis) and hsa-miR-210 (SEQ ID NO: 36, z-axis).
  • TABLE 32
    miR expression (in fluorescence units) distinguishing between colorectal
    adenocarcinoma and cholangiocarcinoma or adenocarcinoma of biliary tract or pancreas
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-224 42 4.0e−003 2.55 (+) 0.69 5.4e+002 2.1e+002
    hsa-miR-203 184 1.2e−003 2.28 (+) 0.70 4.2e+002 1.8e+002
    hsa-miR-92a 67 5.1e−007 1.91 (+) 0.77 6.2e+003 3.2e+003
    hsa-miR-106a 158 4.6e−007 1.81 (+) 0.81 5.6e+003 3.1e+003
    hsa-miR-17 20 1.3e−007 1.81 (+) 0.81 3.2e+003 1.8e+003
    hsa-miR-20a 186 7.9e−005 1.80 (+) 0.76 3.2e+003 1.8e+003
    hsa-miR-19b 284 1.4e−005 1.75 (+) 0.76 1.9e+003 1.1e+003
    MID-17356 389 3.0e−003 1.67 (+) 0.70 2.6e+003 1.6e+003
    hsa-miR-422a 203 2.1e−005 1.63 (+) 0.75 5.1e+002 3.1e+002
    MID-15965 240 5.6e−003 1.60 (+) 0.67 7.2e+003 4.5e+003
    MID-00689 236 1.7e−005 1.59 (+) 0.76 1.1e+003 6.9e+002
    hsa-miR-1201 146 2.5e−003 1.53 (+) 0.68 1.6e+003 1.1e+003
    hsa-miR-425 204 5.2e−004 1.49 (+) 0.69 1.4e+003 9.1e+002
    hsa-miR-29a 43 1.2e−005 1.44 (+) 0.77 9.3e+003 6.5e+003
    hsa-miR-18a 176 7.3e−006 1.44 (+) 0.75 6.4e+002 4.5e+002
    hsa-miR-378 202 1.4e−004 1.41 (+) 0.72 1.3e+003 9.1e+002
    hsa-miR-31 49 2.0e−003 3.39 (−) 0.69 5.3e+002 1.8e+003
    hsa-miR-30a 46 2.2e−008 2.39 (−) 0.82 8.2e+002 2.0e+003
    hsa-miR-214* 38 1.3e−002 1.47 (−) 0.66 2.5e+002 3.7e+002
    hsa-miR-99b 363 2.2e−003 1.41 (−) 0.73 9.0e+002 1.3e+003
    (+) the higher expression of this miR is in colorectal adenocarcinoma
    (−) the higher expression of this miR is in cholangiocarcinoma or adenocarcinoma of biliary tract or pancreas
  • FIG. 21 demonstrates binary decisions at node #31 of the decision-tree. Tumors originating in colorectal adenocarcinoma (diamonds) are easily distinguished from tumors of cholangiocarcinoma or adenocarcinoma of biliary tract or pancreas origin (squares) using the expression levels of hsa-miR-30a (SEQ ID NO: 46, y-axis), hsa-miR-17 (SEQ ID NO: 20, x-axis) and hsa-miR-29a (SEQ ID NO: 43, z-axis).
  • TABLE 33
    miR expression (in fluorescence units) distinguishing between cholangiocarcinoma or
    adenocarcinoma of extrahepatic biliary tract and pancreatic adenocarcinoma
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-31 49 1.5e−003 3.06 (+) 0.81 3.4e+003 1.1e+003
    hsa-miR-138 11 1.1e−002 2.36 (+) 0.71 3.3e+002 1.4e+002
    hsa-miR-141 13 1.7e−002 1.77 (+) 0.70 3.0e+003 1.7e+003
    MID-16582 373 1.5e−002 1.65 (+) 0.70 1.8e+004 1.1e+004
    hsa-miR-181b 154 9.6e−002 1.63 (+) 0.69 1.4e+003 8.4e+002
    hsa-miR-10b 5 5.1e−001 1.62 (+) 0.69 7.0e+002 4.3e+002
    hsa-miR-200c 30 7.4e−002 1.61 (+) 0.68 1.5e+004 9.3e+003
    hsa-miR-29c* 45 1.3e−002 1.58 (+) 0.72 4.2e+002 2.7e+002
    hsa-miR-193b 178 1.1e−001 1.47 (+) 0.66 1.5e+003 1.0e+003
    hsa-miR-221 147 1.2e−002 1.36 (+) 0.75 9.0e+003 6.6e+003
    hsa-miR-151-3p 274 4.0e−002 1.36 (+) 0.70 6.4e+002 4.7e+002
    hsa-miR-146a 16 2.4e−002 1.34 (+) 0.66 7.3e+002 5.4e+002
    hsa-miR-222 40 3.7e−002 1.32 (+) 0.71 1.5e+004 1.1e+004
    hsa-miR-181a 21 8.4e−002 1.30 (+) 0.71 4.9e+003 3.8e+003
    hsa-miR-29a 43 6.3e−002 1.14 (+) 0.66 6.8e+003 6.0e+003
    MID-23256 253 2.1e−002 1.81 (−) 0.74 3.3e+002 5.9e+002
    MID-18336 245 8.0e−002 1.70 (−) 0.66 1.1e+003 1.9e+003
    hsa-let-7a 256 7.4e−003 1.68 (−) 0.73 2.7e+004 4.5e+004
    hsa-miR-140-3p 12 9.2e−002 1.51 (−) 0.65 1.8e+003 2.7e+003
    MID-16748 374 5.4e−003 1.47 (−) 0.75 9.3e+004 1.4e+005
    MID-18395 379 2.9e−002 1.45 (−) 0.66 6.1e+004 8.9e+004
    hsa-miR-1973 180 6.6e−002 1.41 (−) 0.69 3.3e+002 4.7e+002
    hsa-let-7d 153 2.6e−002 1.40 (−) 0.68 4.3e+003 6.0e+003
    hsa-miR-345 51 7.1e−002 1.39 (−) 0.75 3.2e+002 4.4e+002
    hsa-miR-34a 52 3.9e−002 1.38 (−) 0.70 4.4e+003 6.1e+003
    hsa-let-7c 1 1.4e−002 1.37 (−) 0.73 2.4e+004 3.3e+004
    hsa-miR-26a 295 2.8e−002 1.36 (−) 0.68 1.5e+004 2.0e+004
    hsa-let-7b 257 3.3e−003 1.35 (−) 0.77 2.9e+004 3.9e+004
    MID-23168 385 6.5e−002 1.26 (−) 0.66 4.8e+003 6.1e+003
    hsa-miR-23b 293 9.0e−002 1.21 (−) 0.67 1.0e+004 1.3e+004
    hsa-miR-24 294 2.6e−002 1.18 (−) 0.68 2.1e+004 2.4e+004
    (+) the higher expression of this miR is in pancreatic adenocarcinoma
    (−) the higher expression of this miR is in cholangiocarcinoma or adenocarcinoma of extrahepatic biliary tract
  • hsa-miR-345 (SEQ ID NO: 51), hsa-miR-31 (SEQ ID NO: 49) and hsa-miR-146a (SEQ ID NO: 16) are used at node #32 of the binary-tree-classifier detailed in the invention to distinguish between cholangiocarcinoma or adenocarcinoma of extrahepatic biliary tract and pancreatic adenocarcinoma.
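The fold-change column in tables such as Table 33 is consistent with the ratio of the two median values, with (+) or (−) marking which group has the higher median. This relationship is inferred from the printed numbers, not stated in this passage. Because the medians are printed to two significant figures, a recomputed ratio only approximates the tabulated one. A minimal Python sketch, using the hypothetical helper name `fold_change`:

```python
from typing import Tuple


def fold_change(median_first: float, median_second: float) -> Tuple[float, str]:
    """Return (ratio >= 1, sign); sign is "(+)" when the first group is higher."""
    if median_first <= 0 or median_second <= 0:
        raise ValueError("medians must be positive fluorescence values")
    if median_first >= median_second:
        return median_first / median_second, "(+)"
    return median_second / median_first, "(-)"


# Median values copied from Table 33 as printed (two significant figures).
ratio_31, sign_31 = fold_change(3.4e3, 1.1e3)        # hsa-miR-31, tabulated 3.06 (+)
ratio_let7a, sign_let7a = fold_change(2.7e4, 4.5e4)  # hsa-let-7a, tabulated 1.68 (-)
print(f"hsa-miR-31: {ratio_31:.2f} {sign_31}")
print(f"hsa-let-7a: {ratio_let7a:.2f} {sign_let7a}")
```

The small differences from the tabulated fold-changes are expected if the table was computed from unrounded medians.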
  • TABLE 34
    miR expression (in fluorescence units) distinguishing between kidney tumors
    selected from the group consisting of chromophobe renal cell carcinoma, clear cell
    renal cell carcinoma and papillary renal cell carcinoma and other tumors selected from
    the group consisting of sarcoma, adrenal (pheochromocytoma, adrenocortical
    carcinoma) and mesothelioma (pleural mesothelioma)
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-200b 29 7.6e−042 96.12 (+) 0.94 4.8e+003 5.0e+001
    hsa-miR-200a 28 3.3e−044 45.03 (+) 0.94 2.3e+003 5.0e+001
    hsa-miR-200c 30 1.1e−015 15.36 (+) 0.82 7.7e+002 5.0e+001
    hsa-miR-30a 46 8.6e−041  9.73 (+) 0.96 1.1e+004 1.2e+003
    hsa-miR-31 49 1.1e−008  9.21 (+) 0.74 1.1e+003 1.2e+002
    hsa-miR-30a* 195 8.6e−039  8.87 (+) 0.94 1.7e+003 1.9e+002
    hsa-miR-182 152 1.1e−009  6.58 (+) 0.74 5.0e+002 7.5e+001
    hsa-miR-183 175 9.8e−011  5.07 (+) 0.76 2.5e+002 5.0e+001
    hsa-miR-30d 47 5.0e−033  3.81 (+) 0.92 8.3e+003 2.2e+003
    hsa-miR-10a 4 2.5e−016  3.52 (+) 0.83 5.1e+003 1.5e+003
    MID-23751 387 6.4e−011  3.15 (+) 0.75 2.3e+002 7.3e+001
    hsa-miR-30c 196 1.1e−025  2.95 (+) 0.89 9.5e+003 3.2e+003
    hsa-miR-192 177 2.1e−012  2.80 (+) 0.76 4.0e+002 1.4e+002
    MID-17375 375 1.7e−015  2.52 (+) 0.79 2.5e+002 9.8e+001
    hsa-miR-194 27 2.1e−012  2.43 (+) 0.75 2.2e+002 9.0e+001
    hsa-miR-30e* 304 6.5e−013  2.40 (+) 0.76 2.8e+002 1.2e+002
    hsa-miR-222 40 1.6e−012  2.37 (+) 0.75 1.6e+004 6.7e+003
    hsa-miR-29c 191 9.5e−006  2.33 (+) 0.69 5.9e+002 2.5e+002
    hsa-miR-221 147 1.4e−011  2.20 (+) 0.74 9.8e+003 4.5e+003
    hsa-miR-21* 35 7.6e−003  2.19 (+) 0.61 9.8e+002 4.5e+002
    hsa-miR-146a 16 3.7e−007  2.15 (+) 0.71 6.1e+002 2.8e+002
    hsa-miR-21 34 1.2e−003  2.07 (+) 0.64 4.9e+004 2.4e+004
    hsa-miR-10b 5 5.4e−004  2.06 (+) 0.66 4.7e+003 2.3e+003
    hsa-miR-127-3p 155 2.1e−018  9.53 (−) 0.85 1.2e+002 1.2e+003
    hsa-miR-199a-3p 181 4.9e−023  7.57 (−) 0.89 1.6e+003 1.2e+004
    hsa-miR-337-5p 312 1.2e−019  7.45 (−) 0.86 5.0e+001 3.7e+002
    hsa-miR-199b-5p 183 1.7e−015  7.21 (−) 0.82 1.4e+002 1.0e+003
    hsa-miR-199a-5p 182 1.4e−017  6.48 (−) 0.86 2.6e+003 1.7e+004
    hsa-miR-376c 320 3.5e−019  5.73 (−) 0.86 5.0e+001 2.9e+002
    hsa-miR-487b 59 2.6e−016  5.23 (−) 0.86 6.5e+001 3.4e+002
    hsa-miR-214* 38 9.7e−016  5.18 (−) 0.82 9.4e+001 4.9e+002
    hsa-miR-382 324 2.2e−017  4.83 (−) 0.86 5.0e+001 2.4e+002
    hsa-miR-381 323 6.6e−017  4.27 (−) 0.83 5.0e+001 2.1e+002
    hsa-miR-214 37 2.7e−013  4.22 (−) 0.81 1.2e+003 5.0e+003
    hsa-miR-379 322 8.5e−018  4.21 (−) 0.86 5.0e+001 2.1e+002
    hsa-miR-409-3p 325 1.2e−015  4.14 (−) 0.83 5.0e+001 2.1e+002
    hsa-miR-149 19 2.8e−016  3.76 (−) 0.86 6.7e+001 2.5e+002
    hsa-miR-224 42 2.8e−007  3.51 (−) 0.71 7.5e+001 2.6e+002
    hsa-miR-483-5p 334 1.3e−011  3.25 (−) 0.79 9.9e+001 3.2e+002
    hsa-miR-130b 265 9.5e−012  2.08 (−) 0.79 1.5e+002 3.0e+002
    hsa-miR-181a* 22 4.8e−009  2.00 (−) 0.76 1.0e+002 2.0e+002
    (+) the higher expression of this miR is in kidney tumors
    (−) the higher expression of this miR is in sarcoma, adrenal and mesothelioma tumors
  • FIG. 22 demonstrates binary decisions at node #33 of the decision-tree. Tumors originating in kidney (diamonds) are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-200b (SEQ ID NO: 29, y-axis), hsa-miR-30a (SEQ ID NO: 46, x-axis) and hsa-miR-149 (SEQ ID NO: 19, z-axis).
  • TABLE 35
    miR expression (in fluorescence units) distinguishing between pheochromocytoma
    (neuroendocrine tumor of the adrenal) and all sarcoma, adrenal carcinoma and
    mesothelioma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-7 65 6.7e−067 295.36 (+) 0.96 1.5e+004 5.0e+001
    hsa-miR-375 56 5.0e−036 196.58 (+) 0.91 9.8e+003 5.0e+001
    hsa-miR-138 11 3.2e−009  29.73 (+) 0.85 4.0e+003 1.3e+002
    hsa-miR-129-3p 162 1.5e−021  20.53 (+) 0.94 1.0e+003 5.0e+001
    hsa-miR-487b 59 3.0e−008  15.11 (+) 0.84 4.0e+003 2.7e+002
    hsa-miR-432 331 7.4e−008  14.54 (+) 0.81 2.2e+003 1.5e+002
    hsa-miR-539 350 9.4e−011  12.45 (+) 0.84 8.7e+002 7.0e+001
    hsa-miR-127-3p 155 8.3e−005  12.36 (+) 0.80 1.2e+004 9.6e+002
    hsa-miR-485-3p 335 1.2e−008  11.61 (+) 0.80 6.8e+002 5.8e+001
    hsa-miR-124 159 2.7e−008  11.48 (+) 0.87 5.7e+002 5.0e+001
    hsa-miR-485-5p 336 2.3e−014  10.67 (+) 0.86 5.6e+002 5.3e+001
    hsa-miR-433 332 1.2e−012  10.38 (+) 0.83 5.2e+002 5.0e+001
    hsa-miR-129* 263 1.1e−029  10.28 (+) 0.94 5.1e+002 5.0e+001
    hsa-miR-323-3p 306 9.6e−008  9.55 (+) 0.82 4.8e+002 5.0e+001
    hsa-miR-495 339 1.0e−005  9.22 (+) 0.79 1.2e+003 1.2e+002
    hsa-miR-487a 337 4.4e−006  9.01 (+) 0.80 5.2e+002 5.8e+001
    hsa-miR-154 275 4.8e−006  8.75 (+) 0.80 1.4e+003 1.6e+002
    hsa-miR-29b-2* 301 6.0e−013  8.56 (+) 0.90 6.1e+002 7.1e+001
    hsa-miR-154* 276 3.7e−005  8.54 (+) 0.78 4.5e+002 5.3e+001
    hsa-miR-431* 330 4.7e−009  7.77 (+) 0.83 4.1e+002 5.3e+001
    hsa-miR-369-5p 317 1.3e−006  7.56 (+) 0.81 7.4e+002 9.8e+001
    hsa-miR-329 309 1.4e−007  7.56 (+) 0.80 4.8e+002 6.4e+001
    hsa-miR-29c* 45 1.4e−009  7.28 (+) 0.90 1.8e+003 2.4e+002
    hsa-miR-370 318 1.5e−005  7.24 (+) 0.79 1.1e+003 1.5e+002
    hsa-miR-382 324 1.5e−005  6.74 (+) 0.80 1.5e+003 2.2e+002
    hsa-miR-543 352 3.1e−004  6.53 (+) 0.76 8.6e+002 1.3e+002
    hsa-miR-29c 191 2.6e−008  6.44 (+) 0.89 1.5e+003 2.3e+002
    hsa-miR-127-5p 262 1.1e−005  6.40 (+) 0.79 6.2e+002 9.6e+001
    hsa-miR-134 269 2.3e−004  6.39 (+) 0.77 9.6e+002 1.5e+002
    hsa-miR-338-3p 313 2.1e−012  6.03 (+) 0.90 3.3e+002 5.4e+001
    hsa-miR-149 19 4.8e−008  5.80 (+) 0.84 1.3e+003 2.2e+002
    MID-00465 367 7.5e−012  5.32 (+) 0.82 2.7e+002 5.0e+001
    hsa-miR-409-5p 326 5.3e−005  5.27 (+) 0.78 5.6e+002 1.1e+002
    hsa-miR-409-3p 325 3.1e−004  5.26 (+) 0.76 9.6e+002 1.8e+002
    hsa-miR-379 322 8.2e−004  5.25 (+) 0.76 9.2e+002 1.8e+002
    hsa-miR-410 327 4.6e−008  5.05 (+) 0.79 2.5e+002 5.0e+001
    hsa-miR-29b 190 3.4e−011  4.95 (+) 0.97 4.0e+003 8.1e+002
    hsa-miR-1180 261 7.6e−019  4.85 (+) 0.93 3.7e+002 7.6e+001
    hsa-miR-377* 321 1.0e−005  4.43 (+) 0.79 2.8e+002 6.4e+001
    hsa-miR-873 360 2.9e−009  4.09 (+) 0.81 2.0e+002 5.0e+001
    hsa-miR-598 353 1.2e−012  4.08 (+) 0.88 2.1e+002 5.3e+001
    hsa-miR-337-5p 312 3.0e−003  4.01 (+) 0.73 1.3e+003 3.3e+002
    MID-16270 371 8.8e−005  3.96 (+) 0.77 2.6e+002 6.6e+001
    hsa-miR-10b 5 6.2e−005  3.51 (+) 0.85 7.2e+003 2.1e+003
    hsa-miR-411 328 2.6e−002  3.40 (+) 0.68 2.3e+002 6.7e+001
    hsa-miR-451 205 4.3e−004  3.17 (+) 0.77 2.2e+004 6.9e+003
    hsa-miR-199b-5p 183 2.9e−008  12.33 (−) 0.90 1.1e+002 1.3e+003
    hsa-miR-214* 38 2.4e−006  4.75 (−) 0.86 1.1e+002 5.5e+002
    hsa-miR-199a-3p 181 2.9e−005  4.48 (−) 0.84 3.2e+003 1.5e+004
    hsa-miR-214 37 6.8e−005  4.47 (−) 0.85 1.3e+003 5.9e+003
    hsa-miR-222 40 5.9e−004  4.23 (−) 0.80 1.8e+003 7.8e+003
    hsa-miR-199a-5p 182 9.0e−006  4.13 (−) 0.87 4.5e+003 1.8e+004
    hsa-miR-221 147 6.4e−004  4.11 (−) 0.80 1.3e+003 5.2e+003
    hsa-miR-146b-5p 17 4.7e−007  3.72 (−) 0.85 1.9e+002 7.0e+002
    hsa-miR-224 42 2.3e−003  3.48 (−) 0.72 8.9e+001 3.1e+002
    MID-22331 382 3.8e−005  3.42 (−) 0.81 8.0e+002 2.7e+003
    hsa-miR-21 34 1.1e−004  3.35 (−) 0.79 7.9e+003 2.6e+004
    hsa-miR-148a 18 1.3e−003  3.08 (−) 0.74 1.4e+002 4.2e+002
    hsa-miR-100 3 2.6e−005  3.03 (−) 0.83 2.1e+003 6.3e+003
    (+) the higher expression of this miR is in pheochromocytoma
    (−) the higher expression of this miR is in sarcoma, adrenal carcinoma and mesothelioma tumors
  • FIG. 23 demonstrates binary decisions at node #34 of the decision-tree. Tumors originating in pheochromocytoma (diamonds) are easily distinguished from tumors of adrenal, mesothelioma and sarcoma origin (squares) using the expression levels of hsa-miR-375 (SEQ ID NO: 56, y-axis) and hsa-miR-7 (SEQ ID NO: 65, x-axis).
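The separation shown for node #34 can be illustrated with a simple nearest-median rule on the two plotted miRs. This is a stand-in technique chosen for illustration, not the classifier of the invention: the sketch assigns a sample to whichever group has the closer pair of median values in log2 space. The medians are copied from Table 35; the function name `nearest_median_call` and the sample values are hypothetical.

```python
import math
from typing import Dict

# Median fluorescence values from Table 35: (pheochromocytoma, other tumors).
# "Other" = sarcoma, adrenal carcinoma and mesothelioma tumors.
MEDIANS = {
    "hsa-miR-7": (1.5e4, 5.0e1),
    "hsa-miR-375": (9.8e3, 5.0e1),
}


def nearest_median_call(sample: Dict[str, float]) -> str:
    """Assign the sample to the group whose log2 medians are closer (Euclidean)."""
    dist_pheo = 0.0
    dist_other = 0.0
    for mir, (median_pheo, median_other) in MEDIANS.items():
        x = math.log2(sample[mir])
        dist_pheo += (x - math.log2(median_pheo)) ** 2
        dist_other += (x - math.log2(median_other)) ** 2
    return "pheochromocytoma" if dist_pheo < dist_other else "other"


# Hypothetical sample values, for illustration only.
example = {"hsa-miR-7": 8.0e3, "hsa-miR-375": 4.0e3}
print(nearest_median_call(example))
```

Working in log2 space keeps a miR with very large fluorescence values from dominating the distance.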
  • TABLE 36
    miR expression (in fluorescence units) distinguishing between adrenal carcinoma and
    mesothelioma or sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-509-3p 61 1.3e−040 51.10 (+) 0.98 2.6e+003 5.0e+001
    hsa-miR-483-3p 333 4.9e−007 24.55 (+) 0.76 1.3e+003 5.4e+001
    hsa-miR-202 31 8.1e−066 24.01 (+) 0.99 1.2e+003 5.0e+001
    hsa-miR-513a-5p 347 2.6e−024 21.83 (+) 0.95 1.4e+003 6.4e+001
    hsa-miR-509-3-5p 346 9.3e−030 12.08 (+) 0.96 6.0e+002 5.0e+001
    hsa-miR-503 344 2.2e−016 11.82 (+) 0.92 2.2e+003 1.9e+002
    hsa-miR-506 345 3.8e−033 10.25 (+) 0.98 5.1e+002 5.0e+001
    MID-23751 387 1.2e−026  9.70 (+) 0.96 5.9e+002 6.1e+001
    hsa-miR-483-5p 334 6.0e−005  8.66 (+) 0.71 2.7e+003 3.1e+002
    hsa-miR-542-5p 351 1.1e−015  7.79 (+) 0.91 1.1e+003 1.4e+002
    hsa-miR-382 324 8.5e−005  5.77 (+) 0.72 1.2e+003 2.0e+002
    hsa-miR-409-5p 326 3.1e−007  5.44 (+) 0.75 5.5e+002 1.0e+002
    hsa-miR-134 269 2.7e−004  5.31 (+) 0.73 7.2e+002 1.4e+002
    hsa-miR-127-3p 155 8.9e−003  4.98 (+) 0.69 3.9e+003 7.9e+002
    hsa-miR-376c 320 6.4e−003  4.93 (+) 0.68 1.0e+003 2.1e+002
    hsa-miR-379 322 2.6e−003  4.84 (+) 0.69 7.8e+002 1.6e+002
    hsa-miR-487b 59 4.9e−005  4.53 (+) 0.72 1.0e+003 2.2e+002
    hsa-miR-370 318 1.3e−003  4.49 (+) 0.69 6.6e+002 1.5e+002
    hsa-miR-409-3p 325 2.9e−004  4.45 (+) 0.71 7.8e+002 1.7e+002
    MID-18336 245 3.9e−011  4.19 (+) 0.92 4.7e+003 1.1e+003
    MID-23291 254 1.8e−007  3.79 (+) 0.84 1.1e+003 2.9e+002
    hsa-miR-432 331 1.5e−004  3.71 (+) 0.70 5.0e+002 1.4e+002
    hsa-miR-154 275 1.6e−003  3.60 (+) 0.69 5.3e+002 1.5e+002
    hsa-miR-1973 180 7.2e−011  3.48 (+) 0.90 1.1e+003 3.1e+002
    hsa-miR-654-3p 355 6.9e−002  3.22 (+) 0.63 5.4e+002 1.7e+002
    MID-15986 370 8.6e−009  3.14 (+) 0.86 3.5e+003 1.1e+003
    hsa-miR-381 323 4.5e−002  3.07 (+) 0.64 5.4e+002 1.8e+002
    hsa-miR-337-5p 312 5.8e−002  3.07 (+) 0.63 8.9e+002 2.9e+002
    hsa-miR-193b 178 1.0e−008  3.03 (+) 0.88 5.6e+003 1.8e+003
    MID-20524 249 1.6e−006  3.02 (+) 0.81 4.2e+003 1.4e+003
    hsa-miR-199b-5p 183 1.3e−015 18.32 (−) 0.96 9.5e+001 1.7e+003
    hsa-miR-199a-3p 181 9.7e−014 10.80 (−) 0.95 1.7e+003 1.9e+004
    hsa-miR-214* 38 2.6e−016 10.75 (−) 0.97 6.1e+001 6.5e+002
    hsa-miR-199a-5p 182 1.9e−015  9.43 (−) 0.97 2.5e+003 2.4e+004
    hsa-miR-214 37 1.2e−011  7.89 (−) 0.96 9.0e+002 7.1e+003
    hsa-miR-100 3 1.8e−012  4.87 (−) 0.90 1.5e+003 7.5e+003
    hsa-miR-193a-3p 25 3.6e−006  3.37 (−) 0.83 7.6e+002 2.5e+003
    hsa-miR-152 169 2.5e−006  3.05 (−) 0.80 4.3e+002 1.3e+003
    (+) the higher expression of this miR is in adrenal carcinoma
    (−) the higher expression of this miR is in sarcoma and mesothelioma tumors
  • hsa-miR-202 (SEQ ID NO: 31), hsa-miR-509-3p (SEQ ID NO: 61) and hsa-miR-214* (SEQ ID NO: 38) are used at node 35 of the binary-tree-classifier detailed in the invention to distinguish between adrenal carcinoma and sarcoma or mesothelioma tumors.
  • TABLE 37
    miR expression (in fluorescence units) distinguishing between GIST and mesothelioma
    or sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-143* 165 4.2e−033 23.39 (+) 0.97 5.6e+003 2.4e+002
    hsa-miR-143 14 1.7e−025 21.41 (+) 0.97 1.0e+005 4.9e+003
    hsa-miR-145 15 4.2e−026 18.42 (+) 0.99 1.5e+005 8.1e+003
    hsa-miR-483-3p 333 1.5e−010 15.77 (+) 0.87 7.9e+002 5.0e+001
    hsa-miR-145* 272 1.5e−037 13.54 (+) 0.98 8.4e+002 6.2e+001
    hsa-miR-139-5p 270 2.7e−024 9.58 (+) 0.99 1.6e+003 1.6e+002
    hsa-miR-29c* 45 2.7e−019 9.49 (+) 0.96 1.8e+003 1.8e+002
    hsa-miR-29b-2* 301 7.9e−028 9.48 (+) 0.95 5.8e+002 6.1e+001
    hsa-miR-29c 191 1.9e−015 7.89 (+) 0.94 1.5e+003 1.9e+002
    hsa-miR-30a 46 8.8e−014 5.12 (+) 0.96 6.3e+003 1.2e+003
    hsa-miR-30a* 195 6.6e−013 3.84 (+) 0.93 7.3e+002 1.9e+002
    hsa-miR-132 266 1.8e−008 3.66 (+) 0.92 8.7e+002 2.4e+002
    hsa-miR-29b 190 4.2e−008 3.52 (+) 0.91 2.2e+003 6.1e+002
    hsa-miR-149 19 1.5e−006 3.50 (+) 0.82 6.5e+002 1.9e+002
    hsa-miR-483-5p 334 4.8e−005 3.47 (+) 0.82 9.0e+002 2.6e+002
    hsa-miR-127-3p 155 2.1e−003 5.00 (−) 0.70 1.9e+002 9.3e+002
    hsa-miR-193a-3p 25 2.3e−007 4.74 (−) 0.88 7.1e+002 3.4e+003
    hsa-miR-221 147 5.5e−004 3.64 (−) 0.79 1.9e+003 7.0e+003
    hsa-miR-222 40 1.1e−003 3.54 (−) 0.78 2.8e+003 9.8e+003
    hsa-miR-21 34 1.1e−003 3.26 (−) 0.75 9.8e+003 3.2e+004
    (+) the higher expression of this miR is in GIST
    (−) the higher expression of this miR is in sarcoma and mesothelioma tumors
  • hsa-miR-29c* (SEQ ID NO: 45) and hsa-miR-143 (SEQ ID NO: 14) are used at node 36 of the binary-tree-classifier detailed in the invention to distinguish between GIST and sarcoma or mesothelioma tumors.
  • TABLE 38
    miR expression (in fluorescence units) distinguishing between chromophobe renal cell
    carcinoma tumors and clear cell or papillary renal cell carcinoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-141 13 4.7e−017 23.68 (+) 0.99 2.1e+003 8.8e+001
    hsa-miR-200c 30 8.4e−012 18.81 (+) 0.99 5.7e+003 3.0e+002
    hsa-miR-874 361 7.5e−019 14.85 (+) 0.99 9.8e+002 6.6e+001
    hsa-miR-187 280 1.0e−014 14.80 (+) 0.97 7.4e+002 5.0e+001
    hsa-miR-891a 362 4.7e−018 14.47 (+) 0.98 7.2e+002 5.0e+001
    hsa-miR-221 147 5.3e−017 13.97 (+) 0.98 7.4e+004 5.3e+003
    hsa-miR-222 40 1.4e−015 11.89 (+) 0.97 9.0e+004 7.6e+003
    hsa-miR-222* 291 1.2e−017 9.66 (+) 0.98 5.1e+002 5.3e+001
    MID-23751 387 2.7e−010 8.01 (+) 0.94 1.1e+003 1.4e+002
    hsa-miR-221* 290 4.4e−015 7.32 (+) 0.97 5.4e+002 7.4e+001
    hsa-miR-296-5p 299 3.2e−010 4.97 (+) 0.93 5.6e+002 1.1e+002
    hsa-miR-182 152 3.1e−007 4.90 (+) 0.90 1.5e+003 3.2e+002
    hsa-miR-193b 178 6.3e−003 3.89 (+) 0.73 3.3e+003 8.4e+002
    hsa-miR-30b 303 2.2e−007 3.26 (+) 0.92 1.7e+004 5.4e+003
    MID-16489 242 8.1e−003 3.20 (+) 0.74 3.5e+003 1.1e+003
    hsa-miR-31 49 4.3e−006 18.53 (−) 0.85 3.3e+002 6.2e+003
    hsa-miR-138 11 2.1e−007 12.13 (−) 0.90 5.0e+001 6.1e+002
    hsa-miR-21* 35 6.7e−014 8.38 (−) 0.98 2.2e+002 1.8e+003
    hsa-miR-21 34 3.1e−013 7.54 (−) 0.95 1.0e+004 7.9e+004
    hsa-miR-210 36 3.1e−009 6.79 (−) 0.92 5.4e+002 3.7e+003
    hsa-miR-455-3p 206 3.1e−013 5.71 (−) 0.97 1.7e+002 9.8e+002
    hsa-miR-146a 16 4.7e−008 4.07 (−) 0.91 2.5e+002 1.0e+003
    hsa-miR-155 170 7.1e−007 3.64 (−) 0.89 1.7e+002 6.0e+002
    hsa-miR-192 177 1.6e−003 3.48 (−) 0.78 2.1e+002 7.5e+002
    hsa-miR-146b-5p 17 6.6e−006 3.39 (−) 0.86 2.5e+002 8.6e+002
    (+) the higher expression of this miR is in chromophobe renal cell carcinoma tumors
    (−) the higher expression of this miR is in clear cell or papillary renal cell carcinoma tumors
  • hsa-miR-210 (SEQ ID NO: 36) and hsa-miR-221 (SEQ ID NO: 147) are used at node #37 of the binary-tree-classifier detailed in the invention to distinguish between chromophobe renal cell carcinoma tumors and clear cell or papillary renal cell carcinoma tumors.
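The auROC column reports how well a single miR separates the two groups, on the usual reading of auROC as area under the ROC curve. As a rough illustration, that area can be computed as the fraction of cross-group sample pairs in which the first group's value is the larger one. The sketch below assumes this pairwise definition; the function name `auroc` and the expression values are hypothetical and are not taken from the patent data.

```python
from typing import Sequence


def auroc(first: Sequence[float], second: Sequence[float]) -> float:
    """Fraction of (first, second) pairs where the first value is larger; ties count half."""
    if not first or not second:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for a in first:
        for b in second:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(first) * len(second))


# Hypothetical expression values for one miR in two tumor groups.
group_first = [900.0, 1200.0, 700.0, 1500.0]
group_second = [100.0, 800.0, 300.0, 200.0]
print(auroc(group_first, group_second))
```

The tabulated auROC values stay above 0.5 for (−) rows as well, which suggests they are oriented toward whichever group expresses the miR more highly. That reading is an inference from the tables.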
  • TABLE 39
    miR expression (in fluorescence units) distinguishing between clear cell and papillary renal
    cell carcinoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-503 344 2.3e−005 4.81 (+) 0.89 5.7e+002 1.2e+002
    MID-22331 382 5.8e−003 3.65 (+) 0.81 5.9e+003 1.6e+003
    hsa-miR-126 9 1.1e−005 3.54 (+) 0.94 6.4e+003 1.8e+003
    hsa-miR-494 338 3.0e−003 3.45 (+) 0.82 5.7e+003 1.7e+003
    hsa-miR-200b 29 3.1e−004 8.35 (−) 0.87 1.3e+003 1.1e+004
    hsa-miR-31 49 3.0e−002 6.61 (−) 0.81 1.3e+003 8.7e+003
    hsa-miR-200a 28 5.0e−005 5.30 (−) 0.92 9.5e+002 5.1e+003
    hsa-miR-30a* 195 1.1e−009 4.10 (−) 1.00 5.1e+002 2.1e+003
    hsa-miR-30a 46 4.5e−010 3.70 (−) 1.00 5.0e+003 1.9e+004
    hsa-miR-10a 4 6.9e−004 3.39 (−) 0.86 1.6e+003 5.3e+003
    hsa-miR-138 11 2.0e−002 3.23 (−) 0.76 2.3e+002 7.6e+002
    MID-23291 254 7.4e−003 3.17 (−) 0.79 2.0e+002 6.4e+002
    (+) the higher expression of this miR is in clear cell renal cell carcinoma tumors
    (−) the higher expression of this miR is in papillary renal cell carcinoma tumors
  • hsa-miR-31 (SEQ ID NO: 49) and hsa-miR-126 (SEQ ID NO: 9) are used at node 38 of the binary-tree-classifier detailed in the invention to distinguish between clear cell and papillary renal cell carcinoma tumors.
  • TABLE 40
    miR expression (in fluorescence units) distinguishing between
    pleural mesothelioma and sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-31 49 1.7e−006 13.97 (+)  0.78 1.7e+003 1.2e+002
    hsa-miR-21* 35 2.0e−011 5.01 (+) 0.89 2.1e+003 4.3e+002
    hsa-miR-146b-5p 17 2.1e−008 2.75 (+) 0.84 1.6e+003 5.9e+002
    hsa-miR-21 34 4.8e−010 2.71 (+) 0.89 6.9e+004 2.5e+004
    hsa-miR-193a-3p 25 2.3e−005 2.57 (+) 0.77 6.1e+003 2.4e+003
    hsa-miR-210 36 5.9e−005 2.49 (+) 0.75 3.1e+003 1.2e+003
    hsa-miR-150 168 9.7e−004 2.33 (+) 0.70 1.1e+003 4.6e+002
    hsa-miR-155 170 1.4e−004 2.33 (+) 0.75 6.8e+002 2.9e+002
    hsa-miR-193a-5p 26 1.4e−005 2.25 (+) 0.76 7.3e+002 3.2e+002
    hsa-miR-10a 4 1.7e−004 2.13 (+) 0.76 2.4e+003 1.1e+003
    hsa-miR-29b 190 1.1e−003 2.03 (+) 0.70 1.0e+003 5.1e+002
    hsa-miR-30a 46 1.5e−005 1.99 (+) 0.77 1.8e+003 9.0e+002
    hsa-miR-130a 10 8.9e−003 1.90 (+) 0.71 4.9e+003 2.6e+003
    MID-15965 240 1.6e−003 1.88 (+) 0.69 5.2e+003 2.8e+003
    hsa-miR-29a 43 1.5e−002 1.71 (+) 0.67 7.3e+003 4.3e+003
    hsa-miR-22 39 1.1e−004 1.65 (+) 0.72 8.2e+003 5.0e+003
    MID-23168 385 2.9e−002 1.57 (+) 0.64 5.9e+003 3.8e+003
    hsa-miR-574-5p 63 1.4e−002 1.55 (+) 0.66 1.5e+003 9.9e+002
    hsa-miR-378 202 2.8e−002 1.53 (+) 0.66 1.2e+003 7.5e+002
    hsa-miR-199a-3p 181 3.3e−007 3.62 (−) 0.84 7.4e+003 2.7e+004
    hsa-miR-214 37 3.2e−006 3.45 (−) 0.81 3.1e+003 1.1e+004
    hsa-miR-10b 5 6.7e−008 3.36 (−) 0.85 8.3e+002 2.8e+003
    hsa-miR-199b-5p 183 2.0e−004 3.19 (−) 0.74 9.1e+002 2.9e+003
    hsa-miR-199a-5p 182 6.0e−005 2.84 (−) 0.80 1.1e+004 3.0e+004
    hsa-miR-214* 38 5.7e−005 2.22 (−) 0.78 3.8e+002 8.4e+002
    hsa-miR-455-3p 206 2.8e−004 1.69 (−) 0.75 7.0e+002 1.2e+003
    hsa-miR-26b 296 2.9e−003 1.61 (−) 0.75 4.4e+002 7.0e+002
    hsa-let-7c 1 4.3e−003 1.58 (−) 0.71 3.2e+004 5.0e+004
    (+) the higher expression of this miR is in pleural mesothelioma tumors
    (−) the higher expression of this miR is in sarcoma tumors
  • hsa-miR-21* (SEQ ID NO: 35), hsa-miR-130a (SEQ ID NO: 10) and hsa-miR-10b (SEQ ID NO: 5) are used at node 39 of the binary-tree-classifier detailed in the invention to distinguish between pleural mesothelioma tumors and sarcoma tumors.
  • TABLE 41
    miR expression (in fluorescence units) distinguishing between synovial sarcoma and other
    sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-182 152 2.9e−009 25.03 (+) 0.89 1.3e+003 5.1e+001
    hsa-miR-200b 29 9.2e−009 21.59 (+) 0.92 1.1e+003 5.0e+001
    hsa-miR-124 159 5.9e−007 19.35 (+) 0.88 9.7e+002 5.0e+001
    hsa-miR-200a 28 1.5e−008 12.81 (+) 0.92 6.4e+002 5.0e+001
    hsa-miR-495 339 8.2e−005  7.00 (+) 0.88 8.9e+002 1.3e+002
    hsa-miR-154 275 1.6e−005  6.94 (+) 0.89 1.1e+003 1.6e+002
    hsa-miR-543 352 2.8e−004  6.53 (+) 0.87 7.3e+002 1.1e+002
    hsa-miR-149 19 6.1e−006  6.51 (+) 0.91 8.8e+002 1.4e+002
    hsa-miR-376c 320 9.6e−005  6.22 (+) 0.86 2.0e+003 3.3e+002
    hsa-miR-127-3p 155 1.5e−003  6.05 (+) 0.84 6.2e+003 1.0e+003
    hsa-miR-127-5p 262 1.7e−004  5.40 (+) 0.84 5.2e+002 9.7e+001
    hsa-miR-214* 38 2.5e−004  5.33 (+) 0.86 4.2e+003 7.8e+002
    hsa-miR-214 37 7.3e−003  5.29 (+) 0.84 5.0e+004 9.5e+003
    hsa-miR-199a-5p 182 4.7e−003  4.90 (+) 0.87 1.4e+005 2.9e+004
    hsa-miR-432 331 1.9e−003  4.56 (+) 0.81 6.4e+002 1.4e+002
    hsa-miR-369-5p 317 2.7e−004  4.43 (+) 0.84 5.0e+002 1.1e+002
    hsa-miR-381 323 9.6e−003  4.19 (+) 0.78 8.9e+002 2.1e+002
    hsa-miR-654-3p 355 2.7e−003  3.96 (+) 0.81 7.7e+002 1.9e+002
    hsa-miR-100 3 9.6e−006  3.79 (+) 0.91 2.1e+004 5.6e+003
    hsa-miR-196a 282 2.8e−004  3.76 (+) 0.86 6.5e+002 1.7e+002
    hsa-miR-337-5p 312 6.0e−003  3.55 (+) 0.80 1.5e+003 4.1e+002
    hsa-miR-199a-3p 181 7.4e−003  3.52 (+) 0.86 9.1e+004 2.6e+004
    hsa-miR-134 269 6.5e−003  3.41 (+) 0.80 5.5e+002 1.6e+002
    hsa-miR-370 318 8.3e−003  3.32 (+) 0.79 6.4e+002 1.9e+002
    hsa-miR-487b 59 7.6e−003  3.08 (+) 0.78 1.0e+003 3.2e+002
    hsa-miR-132 266 7.1e−007  3.02 (+) 0.87 6.4e+002 2.1e+002
    hsa-miR-379 322 1.1e−002  2.92 (+) 0.78 6.3e+002 2.2e+002
    hsa-miR-125b 8 1.6e−004  2.83 (+) 0.92 1.2e+005 4.4e+004
    hsa-miR-382 324 8.4e−003  2.58 (+) 0.78 6.1e+002 2.4e+002
    hsa-miR-130a 10 1.8e−003  2.45 (+) 0.80 6.0e+003 2.4e+003
    hsa-miR-222 40 2.1e−010 10.95 (−) 0.96 8.4e+002 9.2e+003
    hsa-miR-221 147 7.9e−010 10.19 (−) 0.96 6.7e+002 6.8e+003
    hsa-miR-152 169 2.1e−005  4.92 (−) 0.88 3.6e+002 1.8e+003
    hsa-miR-451 205 2.8e−002  4.72 (−) 0.72 1.7e+003 7.9e+003
    hsa-miR-21 34 4.4e−005  4.59 (−) 0.84 5.9e+003 2.7e+004
    hsa-miR-150 168 9.2e−004  4.32 (−) 0.83 1.4e+002 5.9e+002
    hsa-miR-143 14 2.2e−007  4.15 (−) 0.92 1.3e+003 5.5e+003
    hsa-miR-145 15 2.7e−007  3.30 (−) 0.93 2.9e+003 9.5e+003
    hsa-miR-140-3p 12 1.6e−003  3.30 (−) 0.91 1.0e+003 3.5e+003
    hsa-miR-30a 46 6.1e−003  2.92 (−) 0.78 3.5e+002 1.0e+003
    MID-23794 255 4.1e−004  2.86 (−) 0.82 3.6e+002 1.0e+003
    hsa-miR-22 39 1.9e−005  2.82 (−) 0.88 2.0e+003 5.8e+003
    hsa-miR-185 23 3.9e−003  2.74 (−) 0.77 4.0e+002 1.1e+003
    hsa-miR-29b 190 6.8e−003  2.45 (−) 0.80 2.4e+002 5.8e+002
    MID-00689 236 4.9e−003  2.27 (−) 0.79 3.1e+002 7.1e+002
    MID-23178 386 2.2e−004  2.10 (−) 0.85 3.2e+004 6.7e+004
    MID-18395 379 2.9e−003  2.08 (−) 0.80 3.7e+004 7.7e+004
    hsa-miR-378 202 7.4e−003  2.07 (−) 0.78 3.8e+002 7.8e+002
    (+) the higher expression of this miR is in synovial sarcoma tumors
    (−) the higher expression of this miR is in other sarcoma tumors
  • hsa-miR-100 (SEQ ID NO: 3), hsa-miR-145 (SEQ ID NO: 15) and hsa-miR-222 (SEQ ID NO: 40) are used at node 40 of the binary-tree-classifier detailed in the invention to distinguish between synovial sarcoma tumors and other sarcoma tumors.
  • TABLE 42
    miR expression (in fluorescence units) distinguishing between chondrosarcoma and other
    non-synovial sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-140-3p 12 2.1e−022 75.69 (+) 1.00 2.2e+005 2.9e+003
    hsa-miR-140-5p 271 8.5e−015 35.23 (+) 0.91 5.1e+003 1.5e+002
    hsa-miR-455-3p 206 6.1e−015 14.49 (+) 0.98 1.6e+004 1.1e+003
    hsa-miR-483-3p 333 3.1e−003 11.03 (+) 0.71 5.5e+002 5.0e+001
    hsa-miR-138 11 1.2e−006 11.01 (+) 0.88 1.1e+003 9.5e+001
    hsa-miR-455-5p 58 6.3e−012  8.87 (+) 0.87 8.2e+002 9.2e+001
    hsa-miR-210 36 1.5e−006  4.37 (+) 0.91 4.7e+003 1.1e+003
    hsa-miR-148a 18 3.1e−004  3.98 (+) 0.83 1.4e+003 3.6e+002
    hsa-miR-193b 178 2.3e−002  2.36 (+) 0.72 3.6e+003 1.5e+003
    hsa-miR-23b 293 1.5e−004  2.13 (+) 0.84 2.8e+004 1.3e+004
    hsa-miR-27b 189 5.8e−004  2.05 (+) 0.80 5.5e+003 2.7e+003
    MID-22331 382 1.1e−004  5.01 (−) 0.70 6.7e+002 3.4e+003
    MID-19962 381 1.2e−004  3.91 (−) 0.81 1.7e+002 6.6e+002
    MID-15965 240 1.9e−004  3.76 (−) 0.83 8.5e+002 3.2e+003
    MID-20524 249 8.0e−004  3.47 (−) 0.79 4.2e+002 1.5e+003
    hsa-miR-10b 5 1.3e−005  3.27 (−) 0.85 9.0e+002 2.9e+003
    MID-17866 377 6.9e−005  2.92 (−) 0.78 1.0e+003 2.9e+003
    hsa-miR-1978 235 1.3e−003  2.62 (−) 0.75 2.7e+002 7.1e+002
    hsa-miR-146b-5p 17 4.4e−005  2.48 (−) 0.81 2.8e+002 7.0e+002
    hsa-miR-17 20 2.7e−002  2.36 (−) 0.71 5.7e+002 1.3e+003
    MID-23168 385 8.2e−003  2.36 (−) 0.73 1.9e+003 4.5e+003
    MID-23017 384 4.8e−003  2.16 (−) 0.74 5.0e+003 1.1e+004
    hsa-miR-30a 46 3.2e−004  2.04 (−) 0.79 5.4e+002 1.1e+003
    hsa-miR-1979 283 3.0e−004  2.02 (−) 0.83 8.1e+003 1.6e+004
    (+) the higher expression of this miR is in chondrosarcoma tumors
    (−) the higher expression of this miR is in other non-synovial sarcoma tumors
  • hsa-miR-140-3p (SEQ ID NO: 12) and hsa-miR-455-5p (SEQ ID NO: 58) are used at node 41 of the binary-tree-classifier detailed in the invention to distinguish between chondrosarcoma tumors and other non-synovial sarcoma tumors.
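Rows of the expression tables share one layout: miR name, SEQ ID NO., p-value, fold-change with a (+) or (−) marker, auROC and two median values. The following is a minimal sketch of reading one such row into typed fields. `MirRow` and `parse_row` are hypothetical names introduced here. The only non-obvious step is replacing the typographic minus sign (U+2212) printed in the exponents and markers with an ASCII hyphen before numeric conversion.

```python
from typing import NamedTuple


class MirRow(NamedTuple):
    name: str
    seq_id: int
    p_value: float
    fold_change: float
    higher_in_first: bool  # True for "(+)", False for "(-)"
    auroc: float
    median_first: float
    median_second: float


def parse_row(line: str) -> MirRow:
    """Parse one whitespace-separated table row in the layout described above."""
    text = line.replace("\u2212", "-")  # typographic minus -> ASCII hyphen
    name, seq_id, p_value, fold, sign, auroc, m1, m2 = text.split()
    return MirRow(
        name=name,
        seq_id=int(seq_id),
        p_value=float(p_value),
        fold_change=float(fold),
        higher_in_first=(sign == "(+)"),
        auroc=float(auroc),
        median_first=float(m1),
        median_second=float(m2),
    )


# First row of Table 42, copied as printed.
row = parse_row("hsa-miR-140-3p 12 2.1e\u2212022 75.69 (+) 1.00 2.2e+005 2.9e+003")
print(row)
```

Rows whose columns appear in a different order would need their fields rearranged before parsing.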
  • TABLE 43
    miR expression (in fluorescence units) distinguishing between liposarcoma and other
    non-chondrosarcoma and non-synovial sarcoma tumors
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-26a 295 1.6e−011 6.18 (+) 0.93 1.2e+005 1.9e+004
    hsa-miR-451 205 8.1e−003 4.20 (+) 0.73 1.8e+004 4.2e+003
    hsa-miR-193a-3p 25 6.5e−006 3.94 (+) 0.84 5.9e+003 1.5e+003
    hsa-miR-193a-5p 26 7.5e−007 3.70 (+) 0.88 8.8e+002 2.4e+002
    hsa-miR-99a 231 2.2e−005 3.24 (+) 0.88 2.0e+004 6.1e+003
    hsa-miR-199b-5p 183 1.9e−003 2.60 (+) 0.75 5.9e+003 2.3e+003
    hsa-miR-224 42 1.7e−004 2.54 (+) 0.79 7.9e+002 3.1e+002
    MID-23291 254 9.9e−003 2.54 (+) 0.71 7.4e+002 2.9e+002
    hsa-miR-150 168 1.5e−002 2.38 (+) 0.71 1.0e+003 4.2e+002
    hsa-miR-652 64 1.1e−004 2.36 (+) 0.77 7.7e+002 3.2e+002
    hsa-miR-143 14 5.4e−006 2.27 (+) 0.84 1.1e+004 4.8e+003
    hsa-miR-193b 178 2.7e−004 2.20 (+) 0.76 3.0e+003 1.4e+003
    hsa-miR-145 15 1.1e−004 2.13 (+) 0.78 1.7e+004 7.9e+003
    hsa-miR-22 39 9.8e−004 2.12 (+) 0.79 9.7e+003 4.6e+003
    hsa-miR-210 36 1.8e−004 4.49 (−) 0.79 3.1e+002 1.4e+003
    hsa-miR-181b 154 1.2e−002 2.60 (−) 0.71 9.0e+002 2.4e+003
    hsa-miR-130b 265 4.0e−003 2.29 (−) 0.75 2.6e+002 5.9e+002
    hsa-miR-181d 174 3.2e−003 2.16 (−) 0.75 3.0e+002 6.5e+002
    MID-23017 384 2.0e−004 2.14 (−) 0.79 5.6e+003 1.2e+004
    hsa-miR-92a 67 6.6e−004 2.04 (−) 0.80 1.6e+003 3.3e+003
    (+) the higher expression of this miR is in liposarcoma tumors
    (−) the higher expression of this miR is in other non-chondrosarcoma and non-synovial sarcoma tumors
  • hsa-miR-210 (SEQ ID NO: 36) and hsa-miR-193a-5p (SEQ ID NO: 26) are used at node 42 of the binary-tree-classifier detailed in the invention to distinguish between liposarcoma tumors and other non-chondrosarcoma and non-synovial sarcoma tumors.
  • TABLE 44
    miR expression (in fluorescence units) distinguishing between Ewing
    sarcoma or osteosarcoma; and rhabdomyosarcoma, malignant fibrous histiocytoma
    (MFH) or fibrosarcoma
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-181a* 22 1.1e−006 6.62 (+) 0.87 1.2e+003 1.9e+002
    hsa-miR-181b 154 8.7e−009 5.68 (+) 0.91 6.4e+003 1.1e+003
    hsa-miR-181a 21 2.9e−010 5.67 (+) 0.93 2.1e+004 3.7e+003
    hsa-miR-181d 174 3.5e−006 4.19 (+) 0.85 1.8e+003 4.2e+002
    hsa-miR-451 205 1.2e−002 3.27 (+) 0.72 9.4e+003 2.9e+003
    hsa-miR-106a 158 2.9e−003 2.63 (+) 0.78 4.7e+003 1.8e+003
    hsa-miR-20a 186 2.9e−003 2.52 (+) 0.78 2.8e+003 1.1e+003
    hsa-miR-93 148 9.2e−005 2.45 (+) 0.81 4.9e+003 2.0e+003
    hsa-miR-17 20 5.1e−003 2.32 (+) 0.77 2.6e+003 1.1e+003
    hsa-miR-487b 59 1.1e−002 4.54 (−) 0.71 1.3e+002 6.0e+002
    hsa-miR-125b 8 2.9e−005 2.86 (−) 0.84 1.7e+004 4.9e+004
    hsa-miR-199b-5p 183 9.4e−003 2.70 (−) 0.72 1.3e+003 3.4e+003
    hsa-miR-99a 231 1.1e−003 2.34 (−) 0.76 3.5e+003 8.1e+003
    (+) the higher expression of this miR is in Ewing sarcoma or osteosarcoma tumors
    (−) the higher expression of this miR is in rhabdomyosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma tumors
  • hsa-miR-181a (SEQ ID NO: 21) is used at node 43 of the binary-tree-classifier detailed in the invention to distinguish between Ewing sarcoma or osteosarcoma tumors and rhabdomyosarcoma, malignant fibrous histiocytoma (MFH) or fibrosarcoma tumors.
  • TABLE 45
    miR expression (in fluorescence units) distinguishing between Ewing sarcoma and
    osteosarcoma
    miR name SEQ ID NO. p-value fold-change auROC median values
    hsa-miR-127-3p 155 3.7e−006  6.60 (+) 1.00 1.1e+003 1.6e+002
    hsa-miR-195 179 8.9e−004  5.85 (+) 0.97 8.5e+003 1.4e+003
    hsa-miR-29a 43 1.4e−002  4.90 (+) 0.86 1.4e+004 2.8e+003
    hsa-miR-497 208 1.1e−004  4.58 (+) 1.00 6.5e+003 1.4e+003
    hsa-miR-181a-2* 278 1.0e−003  4.42 (+) 0.88 7.6e+002 1.7e+002
    hsa-miR-146b-5p 17 6.0e−003  4.05 (+) 0.86 1.6e+003 4.0e+002
    MID-23168 385 1.4e−002  2.64 (+) 0.81 8.9e+003 3.4e+003
    hsa-miR-181d 174 1.5e−002  2.60 (+) 0.77 2.1e+003 8.0e+002
    hsa-miR-10b 5 1.3e−002  2.55 (+) 0.82 4.1e+003 1.6e+003
    hsa-miR-34a 52 7.1e−003  2.19 (+) 0.84 4.9e+003 2.2e+003
    hsa-let-7b 257 2.7e−004  2.16 (+) 0.97 5.4e+004 2.5e+004
    MID-00144 366 2.1e−003  2.12 (+) 0.88 5.2e+002 2.5e+002
    hsa-miR-30e 48 6.2e−003  2.06 (+) 0.84 9.4e+002 4.5e+002
    hsa-miR-31 49 7.9e−005 25.44 (−) 0.96 5.0e+001 1.3e+003
    hsa-miR-140-3p 12 1.4e−003  5.72 (−) 0.89 2.0e+003 1.2e+004
    hsa-miR-193a-3p 25 5.2e−005  4.92 (−) 0.94 7.6e+002 3.8e+003
    hsa-miR-152 169 3.3e−003  4.09 (−) 0.89 4.4e+002 1.8e+003
    hsa-miR-21 34 3.2e−003  3.00 (−) 0.89 1.2e+004 3.7e+004
    hsa-miR-21* 35 1.7e−003  2.96 (−) 0.83 2.7e+002 8.1e+002
    hsa-miR-185 23 4.2e−003  2.55 (−) 0.88 6.7e+002 1.7e+003
    MID-23017 384 1.7e−002  2.53 (−) 0.82 8.2e+003 2.1e+004
    hsa-miR-27b 189 3.8e−003  2.52 (−) 0.84 1.7e+003 4.3e+003
    MID-17866 377 3.0e−002  2.18 (−) 0.80 2.3e+003 5.1e+003
    hsa-miR-130b 265 3.0e−002  2.17 (−) 0.78 4.4e+002 9.6e+002
    hsa-miR-24 294 3.3e−003  2.07 (−) 0.82 1.8e+004 3.7e+004
    hsa-miR-23b 293 9.0e−003  2.03 (−) 0.86 8.8e+003 1.8e+004
    hsa-miR-23a 292 1.6e−002  2.02 (−) 0.80 1.5e+004 3.0e+004
    (+) the higher expression of this miR is in Ewing sarcoma tumors
    (−) the higher expression of this miR is in osteosarcoma tumors
  • FIG. 24 demonstrates binary decisions at node #44 of the decision-tree. Tumors originating in Ewing sarcoma (diamonds) are easily distinguished from tumors of osteosarcoma origin (squares) using the expression levels of hsa-miR-31 (SEQ ID NO: 49, y-axis) and hsa-miR-193a-3p (SEQ ID NO: 25, x-axis).
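Nodes 39 through 44 read as successive splits of the mesothelioma and sarcoma branch. The sketch below records, for each node, the miRs named in the text and the two outcomes from the table legends, and then walks a sequence of (+)/(−) decisions. The routing between nodes (`NEXT`) is an assumption inferred from the table titles, and `walk` is a hypothetical helper. The sketch does not compute the decisions themselves.

```python
from typing import Dict, List, Optional, Tuple

# node -> (miRs named in the text, "(+)" outcome, "(-)" outcome)
NODES: Dict[int, Tuple[List[str], str, str]] = {
    39: (["hsa-miR-21*", "hsa-miR-130a", "hsa-miR-10b"],
         "pleural mesothelioma", "sarcoma"),
    40: (["hsa-miR-100", "hsa-miR-145", "hsa-miR-222"],
         "synovial sarcoma", "other sarcoma"),
    41: (["hsa-miR-140-3p", "hsa-miR-455-5p"],
         "chondrosarcoma", "other non-synovial sarcoma"),
    42: (["hsa-miR-210", "hsa-miR-193a-5p"],
         "liposarcoma", "other non-chondrosarcoma, non-synovial sarcoma"),
    43: (["hsa-miR-181a"],
         "Ewing sarcoma or osteosarcoma",
         "rhabdomyosarcoma, MFH or fibrosarcoma"),
    44: (["hsa-miR-31", "hsa-miR-193a-3p"],  # the miRs plotted in FIG. 24
         "Ewing sarcoma", "osteosarcoma"),
}

# Assumed routing: which node refines a non-terminal outcome.
# The node that splits the "(-)" outcome of node 43 is not modeled here.
NEXT: Dict[Tuple[int, str], int] = {
    (39, "-"): 40, (40, "-"): 41, (41, "-"): 42, (42, "-"): 43, (43, "+"): 44,
}


def walk(decisions: str, start: int = 39) -> Tuple[List[int], Optional[str]]:
    """Follow '+'/'-' decisions from `start`; return visited nodes and final outcome."""
    node: Optional[int] = start
    visited: List[int] = []
    outcome: Optional[str] = None
    for decision in decisions:
        if decision not in "+-":
            raise ValueError("decisions must contain only '+' or '-'")
        if node is None:
            raise ValueError("decisions continue past a terminal outcome")
        _, plus_outcome, minus_outcome = NODES[node]
        visited.append(node)
        outcome = plus_outcome if decision == "+" else minus_outcome
        node = NEXT.get((node, decision))
    return visited, outcome


print(walk("----++"))
```

Keeping the routing in a separate table makes it easy to correct if the actual tree connects these nodes differently.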
  • TABLE 46
    miR expression (in fluorescence units) distinguishing between rhabdomyosarcoma and
    malignant fibrous histiocytoma (MFH) or fibrosarcoma
    fold- SEQ
    median values auROC change p-value ID NO. miR name
    5.0e+001 4.1e+003 0.96 81.34 (+) 1.9e−007 33 hsa-miR-206
    5.7e+001 4.3e+003 0.89 74.89 (+) 1.8e−004 268 hsa-miR-133b
    5.9e+001 3.9e+003 0.88 66.65 (+) 3.2e−004 267 hsa-miR-133a
    5.0e+001 1.3e+003 0.89 25.89 (+) 3.9e−006 333 hsa-miR-483-3p
    5.3e+001 5.2e+002 0.85  9.90 (+) 1.3e−004 276 hsa-miR-154*
    5.8e+001 5.6e+002 0.85  9.63 (+) 1.2e−004 319 hsa-miR-376a
    5.7e+001 5.1e+002 0.86  9.00 (+) 4.8e−005 306 hsa-miR-323-3p
    2.5e+002 1.8e+003 0.84  7.01 (+) 2.8e−003 320 hsa-miR-376c
    2.6e+002 1.7e+003 0.82  6.52 (+) 3.9e−003 334 hsa-miR-483-5p
    3.1e+002 1.9e+003 0.87  6.22 (+) 5.1e−004 323 hsa-miR-381
    1.0e+002 6.3e+002 0.85  6.19 (+) 5.4e−004 300 hsa-miR-299-3p
    1.3e+002 7.9e+002 0.82  6.18 (+) 1.4e−003 281 hsa-miR-188-5p
    4.1e+002 2.3e+003 0.86  5.73 (+) 1.4e−003 59 hsa-miR-487b
    1.5e+002 8.4e+002 0.85  5.68 (+) 8.1e−004 339 hsa-miR-495
    3.7e+002 1.7e+003 0.79  4.57 (+) 3.1e−002 316 hsa-miR-362-5p
    2.0e+002 9.2e+002 0.80  4.49 (+) 2.4e−003 176 hsa-miR-18a
    2.9e+002 1.3e+003 0.82  4.39 (+) 1.4e−003 348 hsa-miR-532-3p
    1.8e+002 7.8e+002 0.85  4.27 (+) 4.0e−004 352 hsa-miR-543
    4.0e+002 1.7e+003 0.81  4.18 (+) 2.3e−002 349 hsa-miR-532-5p
    1.9e+003 7.8e+003 0.87  4.14 (+) 4.9e−004 67 hsa-miR-92a
    5.7e+002 2.4e+003 0.86  4.13 (+) 9.2e−004 357 hsa-miR-660
    1.3e+002 5.6e+002 0.78  4.13 (+) 4.2e−003 315 hsa-miR-362-3p
    2.3e+002 8.6e+002 0.81  3.73 (+) 2.8e−003 343 hsa-miR-502-3p
    2.0e+002 7.2e+002 0.84  3.64 (+) 1.5e−003 342 hsa-miR-501-3p
    2.3e+002 8.5e+002 0.82  3.62 (+) 6.7e−003 355 hsa-miR-654-3p
    1.9e+002 6.7e+002 0.79  3.56 (+) 1.3e−002 340 hsa-miR-500
    2.4e+002 8.4e+002 0.80  3.56 (+) 7.9e−003 344 hsa-miR-503
    2.2e+003 7.6e+003 0.78  3.53 (+) 7.2e−003 10 hsa-miR-130a
    2.6e+002 8.8e+002 0.80  3.35 (+) 3.7e−003 341 hsa-miR-500*
    2.6e+002 7.9e+002 0.79  3.06 (+) 7.3e−003 331 hsa-miR-432
    9.3e+002 2.7e+003 0.77  2.90 (+) 1.4e−002 20 hsa-miR-17
    4.3e+002 1.2e+003 0.86  2.90 (+) 1.0e−003 277 hsa-miR-17*
    2.4e+002 6.7e+002 0.83  2.77 (+) 7.0e−003 318 hsa-miR-370
    1.6e+003 4.5e+003 0.78  2.75 (+) 1.4e−002 158 hsa-miR-106a
    4.3e+002 1.1e+003 0.83  2.67 (+) 3.0e−003 265 hsa-miR-130b
    1.0e+003 2.7e+003 0.86  2.63 (+) 7.1e−004 284 hsa-miR-19b
    8.6e+002 2.1e+003 0.82  2.43 (+) 8.6e−003 36 hsa-miR-210
    6.1e+003 6.8e+002 0.90  8.92 (−) 1.8e−004 183 hsa-miR-199b-5p
    1.9e+004 4.5e+003 0.83  4.15 (−) 8.0e−004 40 hsa-miR-222
    1.1e+003 3.1e+002 0.90  3.55 (−) 5.6e−005 63 hsa-miR-574-5p
    1.1e+004 3.2e+003 0.82  3.52 (−) 2.2e−003 147 hsa-miR-221
    5.9e+003 1.8e+003 0.80  3.25 (−) 2.0e−003 43 hsa-miR-29a
    5.2e+002 1.6e+002 0.82  3.19 (−) 5.4e−003 289 hsa-miR-22*
    5.1e+003 1.7e+003 0.82  3.04 (−) 7.0e−003 52 hsa-miR-34a
    8.1e+002 2.9e+002 0.76  2.81 (−) 1.4e−002 190 hsa-miR-29b
    1.2e+003 4.5e+002 0.86  2.67 (−) 4.7e−003 4 hsa-miR-10a
    3.7e+003 1.5e+003 0.86  2.43 (−) 1.3e−003 5 hsa-miR-10b
    7.0e+003 2.9e+003 0.85  2.39 (−) 1.5e−003 39 hsa-miR-22
    1.6e+003 6.9e+002 0.78  2.25 (−) 1.5e−002 169 hsa-miR-152
    2.9e+003 1.3e+003 0.76  2.19 (−) 2.8e−002 208 hsa-miR-497
    (+) the higher expression of this miR is in rhabdomyosarcoma tumors
    (−) the higher expression of this miR is in MFH or fibrosarcoma tumors
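  • As a rough check on how the columns of Table 46 relate to one another, the listed fold-change appears to track the ratio of the two rounded median values (for hsa-miR-206, 4.1e+003 / 5.0e+001 = 82, against a listed 81.34). The sketch below assumes that fold-change is the ratio of the group medians, that the second median column is the rhabdomyosarcoma group, and that auROC is the standard pairwise-ranking statistic. The function names and the sample values passed to `au_roc` are hypothetical and are not taken from the specification.

```python
def fold_change(median_first, median_second):
    """Return (fold, sign) for two group medians.

    Assumption: fold-change is the ratio of the larger median to the
    smaller one; sign is "+" when the second group is higher, "-" otherwise.
    """
    if median_first <= 0 or median_second <= 0:
        raise ValueError("medians must be positive")
    if median_second >= median_first:
        return median_second / median_first, "+"
    return median_first / median_second, "-"


def au_roc(group_pos, group_neg):
    """Area under the ROC curve as a pairwise-ranking statistic.

    Counts the fraction of (pos, neg) pairs in which the positive-group
    value is higher; ties count as one half.
    """
    wins = 0.0
    for p in group_pos:
        for n in group_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(group_pos) * len(group_neg))


# Rounded medians for hsa-miR-206 and hsa-miR-199b-5p as listed in Table 46.
fold_206, sign_206 = fold_change(5.0e1, 4.1e3)
fold_199b, sign_199b = fold_change(6.1e3, 6.8e2)

# Hypothetical expression values, for illustration only.
example_auroc = au_roc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])
```

    Because the listed medians are rounded to two significant figures, ratios computed this way agree with the tabulated fold-change only approximately.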
  • FIG. 25 illustrates the binary decision at node #45 of the decision tree. Rhabdomyosarcoma tumors (diamonds) are easily distinguished from malignant fibrous histiocytoma (MFH) or fibrosarcoma tumors (squares) using the expression levels of hsa-miR-206 (SEQ ID NO: 33, y-axis), hsa-miR-22 (SEQ ID NO: 39, x-axis) and hsa-miR-487b (SEQ ID NO: 59, z-axis).
  • TABLE 47
    β values of the decision tree classifier
    The classification at node 11 is based on the gender of the subject rather than on β values;
    accordingly, no data are provided for this node. PTH = 0.5 for all nodes.
    Node  β0 (intercept)  miR 1 (hsa-)  SEQ ID NO  β1  miR 2 (hsa-)  SEQ ID NO  β2  miR 3 (hsa-)  SEQ ID NO  β3
    1 −23.3111 miR-372 55 2.3127
    2 −26.9408 miR-122 6 2.3127
    3 −3.8519 miR-200b 29 1.8567 miR-126 9 −1.379
    4 −8.2646 miR-200c 30 1.9582 miR-30a 46 −1.2306
    5 17.4706 miR-146a 16 1.1979 let-7e 2 −1.7697 miR-30a 46 −0.88435
    6 −32.5621 miR-9* 66 1.5475 miR-92b 68 1.7188
    7 −9.5521 miR-222 40 −1.1606 miR-497 208 2.0005
    8 −23.053 miR-193a-3p 25 −1.0267 miR-7 65 1.2404 miR-375 56 1.6602
    9 −29.3207 miR-194 27 2.0115 miR-21* 35 1.1414
    10 1.244 miR-181a 21 −1.5458 miR-143 14 0.9879
    12 21.3416 miR-200b 29 −1.942 miR-516a-5p 211 −1.256
    13 10.3775 miR-125a-5p 7 −1.1455 miR-205 32 1.1064 miR-345 51 −1.0128
    14 −40.666 miR-193a-3p 25 1.9505 miR-342-3p 50 0.93196 miR-375 56 0.82076
    15 26.2937 miR-22 39 −1.8153 miR-10a 4 0.61098 miR-205 32 −0.91632
    16 9.4008 miR-93 148 −1.3023 miR-138 11 1.5494 miR-10a 4 −1.119
    17 42.5529 miR-21 34 −1.801 miR-146b-5p 17 −1.4509
    18 0.52521 miR-193a-3p 25 1.7974 miR-31 49 −0.63021 miR-92a 67 −1.3119
    19 −20.7179 miR-138 11 0.9662 miR-378 202 −1.3077 miR-21 34 1.6447
    20 15.0039 miR-100 3 1.0814 miR-21 34 −2.0444
    21 −31.6015 miR-191 24 1.5137 miR-29c 191 0.22547 miR-934 69 1.734
    22 −44.3141 miR-10b 5 1.41 let-7c 1 0.86212 miR-361-5p 54 1.6178
    23 7.6168 miR-138 11 −0.32773 miR-10b 5 1.3275 miR-185 23 −1.8652
    24 2.4904 miR-342-3p 50 −1.7146 miR-30d 47 1.5521
    26 −10.0563 miR-17 20 1.9063 miR-29c* 45 −1.3096
    27 −2.3904 miR-222 40 1.5531 miR-92b 68 −1.5907 miR-92a 67 −0.63749
    28 −22.027 miR-652 64 1.9688 miR-214 37 −0.65807 miR-34c-5p 53 1.0197
    29 −11.4697 miR-21 34 1.8457 miR-148a 18 −1.3936
    30 21.7628 miR-224 42 −1.3059 miR-210 36 −0.79749 1201 146 −0.50909
    31 −17.747 miR-17 20 0.95763 miR-29a 43 1.6268 miR-30a 46 −1.3361
    32 −2.3716 miR-31 49 1.0661 miR-146a 16 0.62041 miR-345 51 −1.8214
    33 −4.226 miR-200b 29 0.48415 miR-149 19 −2.0172 miR-30a 46 1.0224
    34 −29.6828 miR-7 65 2.1394 miR-375 56 0.87847
    35 −23.6445 miR-202 31 2.1832 miR-509-3p 61 0.76095 miR-214* 38 −0.057027
    36 −41.4047 miR-29c* 45 1.2571 miR-143 14 1.9413
    37 −25.1227 miR-221 147 2.2247 miR-210 36 −0.63202
    38 −24.5409 miR-31 49 −0.19797 miR-126 9 2.3043
    39 −20.7495 miR-130a 10 1.014 miR-10b 5 −1.0484 miR-21* 35 1.7948
    40 −6.0971 miR-100 3 1.9198 miR-222 40 −1.0289 miR-145 15 −0.77759
    41 −38.5059 miR-140-3p 12 1.6462 miR-455-5p 58 1.6244
    42 −10.7873 miR-210 36 −0.84091 miR-193a-5p 26 1.9298
    43 −30.4778 miR-181a 21 2.3127
    44 31.0975 miR-193a-3p 25 −2.0358 miR-31 49 −1.0974
    45 −17.5516 miR-22 39 −0.91078 miR-487b 59 1.0201 miR-206 33 1.8651
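  • One way to read Table 47, assuming each node applies a logistic regression to log2-transformed expression levels and compares the resulting probability P with PTH = 0.5, is sketched below for node 44. The log2 transform, the function names, and the use of the Table 45 median values as stand-in samples are assumptions made for illustration. Which branch of the tree corresponds to P > PTH is not stated in this section.

```python
import math

# Node 44 coefficients as listed in Table 47.
NODE_44 = {
    "intercept": 31.0975,
    "betas": {"hsa-miR-193a-3p": -2.0358, "hsa-miR-31": -1.0974},
}
P_TH = 0.5  # probability threshold given for all nodes


def node_probability(node, expression):
    """Logistic-regression probability from raw expression values.

    Assumption: the model is applied to log2 of the fluorescence signal.
    """
    score = node["intercept"]
    for mir, beta in node["betas"].items():
        score += beta * math.log2(expression[mir])
    return 1.0 / (1.0 + math.exp(-score))


def node_decision(node, expression, p_th=P_TH):
    """Return True when the node probability exceeds the threshold."""
    return node_probability(node, expression) > p_th


# Median fluorescence values from Table 45, used here as stand-in samples.
ewing_like = {"hsa-miR-31": 5.0e1, "hsa-miR-193a-3p": 7.6e2}
osteosarcoma_like = {"hsa-miR-31": 1.3e3, "hsa-miR-193a-3p": 3.8e3}

p_ewing = node_probability(NODE_44, ewing_like)
p_osteo = node_probability(NODE_44, osteosarcoma_like)
```

    Under these assumptions the two median profiles fall on opposite sides of the 0.5 threshold, which is consistent with the separation shown in FIG. 24.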
  • TABLE 48
    Using fine-needle aspiration (FNA), pleural effusion or bronchial
    brushing for the identification of cancer tissue of origin
    Class identified  Biopsy Site  Histological Type          Sampling Method
    lung-small        Lymph Node   Neuroendocrine; Small      percutaneous FNA
    UpperSCC          Lung         Non-small; squamous        percutaneous FNA
    UpperSCC          Lung         Non-small; adenocarcinoma  percutaneous FNA
    lung-small        Lung         Neuroendocrine; Small      percutaneous FNA
    lung-adeno        Lung         Non-small; adenocarcinoma  percutaneous FNA
    UpperSCC          Lung         Non-small; squamous        percutaneous FNA
    lung-small        Lymph Node   Neuroendocrine; Small      transbronchial FNA
    lung-small        Lung         Neuroendocrine; Small      transbronchial FNA
    lung-adeno        Lung pleura  Non-small; adenocarcinoma  Pleural effusion
    lung-adeno        Lung pleura  Non-small; adenocarcinoma  Pleural effusion
    Lung, small       Lung         Neuroendocrine; Small      bronchial brushing
    Lung, small       Lung         Neuroendocrine; Small      bronchial brushing
    Lung, small       Lung         Neuroendocrine; Small      bronchial brushing
    Lung, small       Lung         Neuroendocrine; Small      bronchial brushing
    Lung, small       Lung         Neuroendocrine; Small      bronchial brushing
  • The foregoing description of the specific embodiments so fully reveals the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without undue experimentation and without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
  • It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
  • REFERENCES
    • 1. Bentwich, I. et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet (2005).
    • 2. Farh, K. K. et al. The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution. Science (2005).
    • 3. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A. & Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-4 (2006).
    • 4. He, L. et al. A microRNA polycistron as a potential human oncogene. Nature 435, 828-33 (2005).
    • 5. Baskerville, S. & Bartel, D. P. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. Rna 11, 241-7 (2005).
    • 6. Landgraf, P. et al. A Mammalian microRNA Expression Atlas Based on Small RNA Library Sequencing. Cell 129, 1401-14 (2007).
    • 7. Volinia, S. et al. A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci USA (2006).
    • 8. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834-8 (2005).
    • 9. Varadhachary, G. R., Abbruzzese, J. L. & Lenzi, R. Diagnostic strategies for unknown primary cancer. Cancer 100, 1776-85 (2004).
    • 10. Pimiento, J. M., Teso, D., Malkan, A., Dudrick, S. J. & Palesty, J. A. Cancer of unknown primary origin: a decade of experience in a community-based hospital. Am J Surg 194, 833-7; discussion 837-8 (2007).
    • 11. Shaw, P. H., Adams, R., Jordan, C. & Crosby, T. D. A clinical review of the investigation and management of carcinoma of unknown primary in a single cancer network. Clin Oncol (R Coll Radiol) 19, 87-95 (2007).
    • 12. Hainsworth, J. D. & Greco, F. A. Treatment of patients with cancer of an unknown primary site. N Engl J Med 329, 257-63 (1993).
    • 13. Blaszyk, H., Hartmann, A. & Bjornsson, J. Cancer of unknown primary: clinicopathologic correlations. Apmis 111, 1089-94 (2003).
    • 14. Bloom, G. et al. Multi-platform, multi-site, microarray-based human tumor classification. Am J Pathol 164, 9-16 (2004).
    • 15. Ma, X. J. et al. Molecular classification of human cancers using a 92-gene real-time quantitative polymerase chain reaction assay. Arch Pathol Lab Med 130, 465-73 (2006).
    • 16. Talantov, D. et al. A quantitative reverse transcriptase-polymerase chain reaction assay to identify metastatic carcinoma tissue of origin. J Mol Diagn 8, 320-9 (2006).
    • 17. Tothill, R. W. et al. An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin. Cancer Res 65, 4031-40 (2005).
    • 18. Shedden, K. A. et al. Accurate molecular classification of human cancers based on gene expression using a simple classifier with a pathological tree-based framework. Am J Pathol 163, 1985-95 (2003).
    • 19. Raver-Shapira, N. et al. Transcriptional Activation of miR-34a Contributes to p53-Mediated Apoptosis. Mol Cell (2007).
    • 20. Xiao, C. et al. MiR-150 Controls B Cell Differentiation by Targeting the Transcription Factor c-Myb. Cell 131, 146-59 (2007).

Claims (19)

1.-95. (canceled)
96. A method of identifying a tissue of origin of a cancer sample, said method comprising:
(a) obtaining a biological sample from a subject in need thereof, wherein the sample is of a cancer selected from the group consisting of cancer of unknown primary (CUP), primary cancer, and metastatic cancer;
(b) measuring the level of nucleic acids comprising SEQ ID NOS: 1, 2 or 156, 3-7, 9-12, 14-21, 23-27, 29-40, 42, 43, 44 or 191, 45-51, 53-56, 57 or 202, 58, 59, 60 or 208, 61, 62 or 211, 64-69, 146-148, and optionally at least one control nucleic acid in the biological sample and applying a classifier algorithm to said level of nucleic acids measured; and
(c) identifying the tissue of origin of the sample based on the classification provided by the classifier algorithm.
97. The method of claim 96, wherein the classifier algorithm is selected from the group consisting of: decision tree classifier, K-nearest neighbor classifier (KNN), logistic regression classifier, nearest neighbor classifier, neural network classifier, Gaussian mixture model (GMM), Support Vector Machine (SVM) classifier, nearest centroid classifier, linear regression classifier and random forest classifier.
98. The method of claim 96, wherein the cancer is selected from the group consisting of adrenocortical carcinoma; anus or skin squamous cell carcinoma; biliary tract adenocarcinoma; Ewing sarcoma; gastrointestinal stromal tumor (GIST); gastrointestinal tract carcinoid; renal cell carcinoma: chromophobe, clear cell and papillary; pancreatic islet cell tumor; pheochromocytoma; urothelial cell carcinoma (TCC); lung, head & neck, or esophagus squamous cell carcinoma (SCC); brain: astrocytic tumor, oligodendroglioma; breast adenocarcinoma; uterine cervix squamous cell carcinoma; chondrosarcoma; germ cell cancer; sarcoma; colorectal adenocarcinoma; liposarcoma; hepatocellular carcinoma (HCC); lung large cell or adenocarcinoma; lung carcinoid; pleural mesothelioma; lung small cell carcinoma; B-cell lymphoma; T-cell lymphoma; melanoma; malignant fibrous histiocytoma (MFH) or fibrosarcoma; osteosarcoma; ovarian primitive germ cell tumor; ovarian carcinoma; pancreatic adenocarcinoma; prostate adenocarcinoma; rhabdomyosarcoma; gastric or esophageal adenocarcinoma; synovial sarcoma; non-seminomatous testicular germ cell tumor; seminomatous testicular germ cell tumor; thymoma; thymic carcinoma; follicular thyroid carcinoma; medullary thyroid carcinoma; and papillary thyroid carcinoma.
99. The method of claim 98, wherein a level of SEQ ID NOS: 55 above the reference threshold indicates a cancer of germ cell origin selected from the group consisting of an ovarian primitive cell and a testis cell, and further wherein a level of SEQ ID NOS: 29 and 62 above the reference threshold indicates a testis cell cancer origin selected from the group consisting of seminomatous testicular germ cell and non-seminomatous testicular germ cell.
100. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 9 and 29 above the reference threshold indicates a cancer origin selected from the group consisting of biliary tract adenocarcinoma and hepatocellular carcinoma.
101. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 156, 66 and 68 above the reference threshold indicates a cancer of brain origin, and further wherein a level of SEQ ID NOS: 40 and 60 above the reference threshold indicates a brain cancer origin selected from the group consisting of oligodendroglioma and astrocytoma.
102. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14 and 21 above the reference threshold indicates a cancer of prostate adenocarcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 50, 11, 148, 4, 49 and 67 above the reference threshold indicates a cancer of breast adenocarcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 27, 35, 14, 21, 32, 51, 7, 25, 4, 39, 50, 11, 148, 49, 67, 57 and 34 above the reference threshold indicates a cancer of an ovarian carcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148, 4, 49, 67, 57 and 34 above the reference threshold indicates a cancer of lung large cell or lung adenocarcinoma origin; and wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20 and 45 above the reference threshold indicates a cancer of lung small cell carcinoma origin.
103. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 11, 148 and 4 above the reference threshold indicates a cancer of thyroid carcinoma origin, and further wherein a level of SEQ ID NOS: 17 and 34 above the threshold indicates that the thyroid carcinoma origin is follicular or papillary.
104. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3 and 34 above the reference threshold indicates a cancer of a thymic carcinoma origin; or wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 14, 21, 32, 51, 7, 50, 4, 39, 3, 34, 69, 24 and 44 above the reference threshold indicates a cancer of urothelial cell carcinoma or squamous cell carcinoma origin, and further wherein a level of SEQ ID NOS: 1, 5 and 54 above the reference threshold indicates that the squamous cell carcinoma origin is uterine cervix squamous cell carcinoma or non-uterine cervix squamous cell carcinoma; or further wherein a level of SEQ ID NOS: 11 and 23 above the reference threshold indicates that the non-uterine cervix squamous cell carcinoma origin is selected from the group consisting of: a) anus or skin squamous cell carcinoma, and b) lung, head & neck, and esophagus squamous cell carcinoma.
105. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 47 and 50 above the reference threshold indicates a cancer of melanoma or lymphoma origin, and further wherein a level of SEQ ID NOS: 35 and 48 above the reference threshold indicates that the lymphoma cancer origin is selected from the group consisting of B-cell lymphoma and T-cell lymphoma.
106. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67 and 68 above the reference threshold indicates a cancer of medullary thyroid carcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53 and 37 above the reference threshold indicates a cancer of lung carcinoid origin; and wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 20, 45, 40, 67, 68, 64, 53, 37, 34 and 18 above the reference threshold indicates a cancer of gastrointestinal tract carcinoid or pancreatic islet cell tumor origin.
107. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36 and 146 above the reference threshold indicates a cancer of gastric or esophageal adenocarcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20 and 43 above the reference threshold indicates a cancer of colorectal adenocarcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 56, 65, 25, 27, 35, 42, 36, 146, 20, 43, 51, 49 and 16 above the reference threshold indicates a cancer of pancreatic adenocarcinoma or biliary tract adenocarcinoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19 and 29 above the reference threshold indicates a cancer of renal cell carcinoma origin, and further wherein a level of SEQ ID NOS: 36 and 147 above the reference threshold indicates a chromophobe renal cell carcinoma origin, or further wherein a level of SEQ ID NOS: 49 and 9 above the reference threshold indicates that the renal cell carcinoma origin is clear cell or papillary.
108. The method of claim 98, wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65 and 56 above the reference threshold indicates a cancer of pheochromocytoma origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38 and 61 above the reference threshold indicates a cancer of adrenocortical origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14 and 45 above the reference threshold indicates a cancer of gastrointestinal stromal tumor origin; wherein a level of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 55, 6, 30, 46, 16, 2, 66, 68, 19, 29, 65, 56, 31, 38, 61, 14, 45, 35, 10 and 5 above the reference threshold indicates a cancer of pleural mesothelioma or sarcoma origin, and further wherein a level of SEQ ID NOS: 3, 40 and 15 above the reference threshold indicates that the sarcoma is synovial sarcoma, or further wherein a level of SEQ ID NOS: 3, 40, 15, 12 and 58 above the reference threshold indicates that the sarcoma is chondrosarcoma, or further wherein a level of SEQ ID NOS: 3, 40, 15, 12, 58, 36 and 26 above the reference threshold indicates that the sarcoma is liposarcoma, or further wherein a level of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 25 and 49 above the reference threshold indicates that the sarcoma is Ewing sarcoma or osteosarcoma; or further wherein a level of SEQ ID NOS: 3, 40, 15, 12, 58, 36, 26, 21, 59, 39 and 33 above the reference threshold indicates that the sarcoma is selected from the group consisting of: a) rhabdomyosarcoma, and b) malignant fibrous histiocytoma and fibrosarcoma.
109. The method of claim 96, wherein the biological sample is selected from the group consisting of a bodily fluid, a cell line, a tissue sample, a biopsy sample, a needle biopsy sample, a fine needle biopsy (FNA) sample, a surgically removed sample, and a sample obtained by tissue-sampling procedures such as endoscopy, bronchoscopy, or laparoscopic methods.
110. The method of claim 109, wherein the tissue is a fresh, frozen, fixed, wax-embedded or formalin-fixed paraffin-embedded (FFPE) tissue.
111. The method of claim 96, wherein the level of the nucleic acid sequence is determined by a method selected from the group consisting of nucleic acid hybridization and nucleic acid amplification.
112. The method of claim 111, wherein nucleic acid hybridization is performed using a solid-phase nucleic acid biochip array or in situ hybridization and wherein nucleic acid amplification is real-time PCR comprising forward and reverse primers and a probe comprising a sequence selected from the group consisting of a sequence that is complementary to a sequence selected from SEQ ID NOS: 1, 2 or 156, 3-7, 9-12, 14-21, 23-27, 29-40, 42, 43, 44 or 191, 45-51, 53-56, 57 or 202, 58, 59, 60 or 208, 61, 62 or 211, 64-69, 146-148, and optionally at least one control nucleic acid and a fragment thereof.
113. A kit for performing the method of claim 96 comprising probes, wherein the probes comprise (i) DNA equivalents of nucleic acids comprising SEQ ID NOS: 1-7, 9-12, 14-21, 23-27, 29-40, 42-51, 53-57, 59-62, 64-69, 146-148, and 156, (ii) the complements thereof, or (iii) sequences at least 90% identical to (i) or (ii).

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/746,487 US20150368724A1 (en) 2007-03-27 2015-06-22 Methods and materials for classification of tissue of origin of tumor samples
US15/909,145 US20190032142A1 (en) 2007-03-27 2018-03-01 Methods and materials for classification of tissue of origin of tumor samples

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US90726607P 2007-03-27 2007-03-27
US92924407P 2007-06-19 2007-06-19
US2456508P 2008-01-30 2008-01-30
PCT/IL2008/000396 WO2008117278A2 (en) 2007-03-27 2008-03-20 Gene expression signature for classification of cancers
US14064208P 2008-12-24 2008-12-24
US53294009A 2009-09-24 2009-09-24
PCT/IL2009/001212 WO2010073248A2 (en) 2008-12-24 2009-12-23 Gene expression signature for classification of tissue of origin of tumor samples
US41587510P 2010-11-22 2010-11-22
US13/167,489 US8802599B2 (en) 2007-03-27 2011-06-23 Gene expression signature for classification of tissue of origin of tumor samples
PCT/IL2011/000849 WO2012070037A2 (en) 2010-11-22 2011-11-01 Methods and materials for classification of tissue of origin of tumor samples
US13/856,190 US9096906B2 (en) 2007-03-27 2013-04-03 Gene expression signature for classification of tissue of origin of tumor samples
US14/746,487 US20150368724A1 (en) 2007-03-27 2015-06-22 Methods and materials for classification of tissue of origin of tumor samples

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/856,190 Continuation US9096906B2 (en) 2007-03-27 2013-04-03 Gene expression signature for classification of tissue of origin of tumor samples

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US201715654168A Continuation 2007-03-27 2017-07-19

Publications (1)

Publication Number Publication Date
US20150368724A1 (en) 2015-12-24

Family

ID=49235334

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/856,190 Expired - Fee Related US9096906B2 (en) 2007-03-27 2013-04-03 Gene expression signature for classification of tissue of origin of tumor samples
US14/746,487 Abandoned US20150368724A1 (en) 2007-03-27 2015-06-22 Methods and materials for classification of tissue of origin of tumor samples
US15/909,145 Abandoned US20190032142A1 (en) 2007-03-27 2018-03-01 Methods and materials for classification of tissue of origin of tumor samples

Country Status (1)

Country Link
US (3) US9096906B2 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10114924B2 (en) 2008-11-17 2018-10-30 Veracyte, Inc. Methods for processing or analyzing sample of thyroid tissue
US10407731B2 (en) 2008-05-30 2019-09-10 Mayo Foundation For Medical Education And Research Biomarker panels for predicting prostate cancer outcomes
US10422009B2 (en) 2009-03-04 2019-09-24 Genomedx Biosciences Inc. Compositions and methods for classifying thyroid nodule disease
US10446272B2 (en) 2009-12-09 2019-10-15 Veracyte, Inc. Methods and compositions for classification of samples
US10494677B2 (en) 2006-11-02 2019-12-03 Mayo Foundation For Medical Education And Research Predicting cancer outcome
US10513737B2 (en) 2011-12-13 2019-12-24 Decipher Biosciences, Inc. Cancer diagnostics using non-coding transcripts
US10526655B2 (en) 2013-03-14 2020-01-07 Veracyte, Inc. Methods for evaluating COPD status
US10731223B2 (en) 2009-12-09 2020-08-04 Veracyte, Inc. Algorithms for disease diagnostics
US10865452B2 (en) 2008-05-28 2020-12-15 Decipher Biosciences, Inc. Systems and methods for expression-based discrimination of distinct clinical disease states in prostate cancer
US10934587B2 (en) 2009-05-07 2021-03-02 Veracyte, Inc. Methods and compositions for diagnosis of thyroid conditions
WO2021060311A1 (en) * 2019-09-27 2021-04-01 国立大学法人東海国立大学機構 Method for detecting brain tumor
US11011265B2 (en) 2018-06-28 2021-05-18 Case Western Reserve University Predicting prostate cancer risk of progression with multiparametric magnetic resonance imaging using machine learning and peritumoral radiomics
US11035005B2 (en) 2012-08-16 2021-06-15 Decipher Biosciences, Inc. Cancer diagnostics using biomarkers
US11078542B2 (en) 2017-05-12 2021-08-03 Decipher Biosciences, Inc. Genetic signatures to predict prostate cancer metastasis and identify tumor aggressiveness
US11208697B2 (en) 2017-01-20 2021-12-28 Decipher Biosciences, Inc. Molecular subtyping, prognosis, and treatment of bladder cancer
US11217329B1 (en) 2017-06-23 2022-01-04 Veracyte, Inc. Methods and systems for determining biological sample integrity
US11414708B2 (en) 2016-08-24 2022-08-16 Decipher Biosciences, Inc. Use of genomic signatures to predict responsiveness of patients with prostate cancer to post-operative radiation therapy
US11639527B2 (en) 2014-11-05 2023-05-02 Veracyte, Inc. Methods for nucleic acid sequencing
US11873532B2 (en) 2017-03-09 2024-01-16 Decipher Biosciences, Inc. Subtyping prostate cancer to predict response to hormone therapy

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102471803B (en) * 2009-07-14 2015-11-25 森永乳业株式会社 Produce the screening method with the food of the breast of immunoregulation effect
US8831327B2 (en) * 2011-08-30 2014-09-09 General Electric Company Systems and methods for tissue classification using attributes of a biomarker enhanced tissue network (BETN)
US20130280720A1 (en) * 2012-03-15 2013-10-24 The University Of Kansas Tissue biomarkers for indication of progression from barrett's esophagus to esophageal adenocarcinoma
PL405648A1 (en) * 2013-10-15 2015-04-27 Warszawski Uniwersytet Medyczny Method for diagnosing primary hepatic carcinoma, application of microRNA marker for diagnosing the lesion within liver, assessment of the progression of disease and evaluation of patient's vulnerability and/or disease to the proposed treatment and diagnostic set containing such same markers
BR102014003033B8 (en) * 2014-02-07 2020-12-22 Fleury S/A process and classification system for tumor samples of unknown and / or uncertain origin; quality control process of biological tumor samples of known origin and quality control process of biological samples of unknown and / or uncertain origin
GB2554572B (en) 2015-03-26 2021-06-23 Dovetail Genomics Llc Physical linkage preservation in DNA storage
IL262946B2 (en) * 2016-05-13 2023-03-01 Dovetail Genomics Llc Recovering long-range linkage information from preserved samples
US11884979B2 (en) 2016-09-16 2024-01-30 Takeda Pharmaceutical Company Limited RNA biomarkers for hereditary angioedema
JP2022518702A (en) * 2019-01-16 2022-03-16 オスペダーレ・サン・ラッファエーレ・エッセエッレエッレ Biomarker for renal cell carcinoma
CN111944901A (en) * 2020-08-04 2020-11-17 佛山科学技术学院 Characteristic mRNA expression profile combination and renal papillary cell carcinoma early prediction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100178653A1 (en) * 2007-03-27 2010-07-15 Rosetta Genomics Ltd. Gene expression signature for classification of cancers
US20110312530A1 (en) * 2007-03-27 2011-12-22 Rosetta Genomics Ltd Gene expression signature for classification of tissue of origin of tumor samples
US20120219958A1 (en) * 2009-11-09 2012-08-30 Yale University MicroRNA Signatures Differentiating Uterine and Ovarian Papillary Serous Tumors

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030225526A1 (en) 2001-11-14 2003-12-04 Golub Todd R. Molecular cancer diagnosis using tumor gene expression signature
AU2005250432B2 (en) 2004-05-28 2011-09-15 Asuragen, Inc. Methods and compositions involving microRNA
US20080269072A1 (en) 2004-10-21 2008-10-30 Hart Ronald P Rational Probe Optimization for Detection of MicroRNAs
AU2006208134A1 (en) 2005-01-25 2006-08-03 Rosetta Inpharmatics Llc Methods for quantitating small RNA molecules
US20070065844A1 (en) 2005-06-08 2007-03-22 Massachusetts Institute Of Technology Solution-based methods for RNA expression profiling
US20070092882A1 (en) 2005-10-21 2007-04-26 Hui Wang Analysis of microRNA
WO2007073737A1 (en) 2005-12-29 2007-07-05 Exiqon A/S Detection of tissue origin of cancer
US7670840B2 (en) 2006-01-05 2010-03-02 The Ohio State University Research Foundation Micro-RNA expression abnormalities of pancreatic, endocrine and acinar tumors
EP2038432B1 (en) 2006-06-30 2017-02-08 Rosetta Genomics Ltd Method for detecting and quantifying a target nucleic acid generated by rt-pcr of mirna
US20100273172A1 (en) * 2007-03-27 2010-10-28 Rosetta Genomics Ltd. Micrornas expression signature for determination of tumors origin
WO2012070037A2 (en) * 2010-11-22 2012-05-31 Rosetta Genomics Ltd. Methods and materials for classification of tissue of origin of tumor samples
WO2010073248A2 (en) 2008-12-24 2010-07-01 Rosetta Genomics Ltd. Gene expression signature for classification of tissue of origin of tumor samples
US20100249213A1 (en) 2007-09-06 2010-09-30 The Ohio State University Research Foundation MicroRNA Signatures in Human Ovarian Cancer

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Bentwich et al. (Identification of hundreds of conserved and nonconserved human microRNAs, Nature Genetics 37, 766 - 770 (2005), Published online: 19 June 2005) *
Lawrie et al. (Expression of microRNAs in diffuse large B cell lymphoma is associated with immunophenotype, survival and transformation from follicular lymphoma, Journal of Cellular and Molecular Medicine, Volume 13, Issue 7, July 2009, Pages 1248-1260) *
Rosenfeld et al. (MicroRNAs accurately identify cancer tissue origin, Nat Biotechnol. 2008 Apr;26(4):462-9. Epub 2008 Mar 23) *
Stratagene (Gene Characterization Kits; 1988) *
Weiner et al. (Kits and their unique role in molecular biology: a brief retrospective, BioTechniques 44:701-704 (25th Anniversary Issue, April 2008)) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10494677B2 (en) 2006-11-02 2019-12-03 Mayo Foundation For Medical Education And Research Predicting cancer outcome
US10865452B2 (en) 2008-05-28 2020-12-15 Decipher Biosciences, Inc. Systems and methods for expression-based discrimination of distinct clinical disease states in prostate cancer
US10407731B2 (en) 2008-05-30 2019-09-10 Mayo Foundation For Medical Education And Research Biomarker panels for predicting prostate cancer outcomes
US10236078B2 (en) 2008-11-17 2019-03-19 Veracyte, Inc. Methods for processing or analyzing a sample of thyroid tissue
US10672504B2 (en) 2008-11-17 2020-06-02 Veracyte, Inc. Algorithms for disease diagnostics
US10114924B2 (en) 2008-11-17 2018-10-30 Veracyte, Inc. Methods for processing or analyzing sample of thyroid tissue
US10422009B2 (en) 2009-03-04 2019-09-24 Genomedx Biosciences Inc. Compositions and methods for classifying thyroid nodule disease
US10934587B2 (en) 2009-05-07 2021-03-02 Veracyte, Inc. Methods and compositions for diagnosis of thyroid conditions
US10446272B2 (en) 2009-12-09 2019-10-15 Veracyte, Inc. Methods and compositions for classification of samples
US10731223B2 (en) 2009-12-09 2020-08-04 Veracyte, Inc. Algorithms for disease diagnostics
US10513737B2 (en) 2011-12-13 2019-12-24 Decipher Biosciences, Inc. Cancer diagnostics using non-coding transcripts
US11035005B2 (en) 2012-08-16 2021-06-15 Decipher Biosciences, Inc. Cancer diagnostics using biomarkers
US10526655B2 (en) 2013-03-14 2020-01-07 Veracyte, Inc. Methods for evaluating COPD status
US11639527B2 (en) 2014-11-05 2023-05-02 Veracyte, Inc. Methods for nucleic acid sequencing
US11414708B2 (en) 2016-08-24 2022-08-16 Decipher Biosciences, Inc. Use of genomic signatures to predict responsiveness of patients with prostate cancer to post-operative radiation therapy
US11208697B2 (en) 2017-01-20 2021-12-28 Decipher Biosciences, Inc. Molecular subtyping, prognosis, and treatment of bladder cancer
US11873532B2 (en) 2017-03-09 2024-01-16 Decipher Biosciences, Inc. Subtyping prostate cancer to predict response to hormone therapy
US11078542B2 (en) 2017-05-12 2021-08-03 Decipher Biosciences, Inc. Genetic signatures to predict prostate cancer metastasis and identify tumor aggressiveness
US11217329B1 (en) 2017-06-23 2022-01-04 Veracyte, Inc. Methods and systems for determining biological sample integrity
US11017896B2 (en) * 2018-06-28 2021-05-25 Case Western Reserve University Radiomic features of prostate bi-parametric magnetic resonance imaging (BPMRI) associate with decipher score
US11011265B2 (en) 2018-06-28 2021-05-18 Case Western Reserve University Predicting prostate cancer risk of progression with multiparametric magnetic resonance imaging using machine learning and peritumoral radiomics
JP7394441B2 (en) 2019-09-27 2023-12-08 Craif株式会社 How to test for brain tumors
WO2021060311A1 (en) * 2019-09-27 2021-04-01 国立大学法人東海国立大学機構 Method for detecting brain tumor

Also Published As

Publication number Publication date
US9096906B2 (en) 2015-08-04
US20190032142A1 (en) 2019-01-31
US20130259839A1 (en) 2013-10-03

Similar Documents

Publication Publication Date Title
US20190032142A1 (en) Methods and materials for classification of tissue of origin of tumor samples
US20190241966A1 (en) Gene Expression Signature for Classification of Tissue of Origin of Tumor Samples
EP2643479B1 (en) Methods and materials for classification of tissue of origin of tumor samples
US20100178653A1 (en) Gene expression signature for classification of cancers
US20190017122A1 (en) Mirnas as diagnostic biomarkers to distinguish benign from malignant thyroid tumors
US9822416B2 (en) miRNA in the diagnosis of ovarian cancer
US20150099665A1 (en) Methods for distinguishing between specific types of lung cancers
US9133522B2 (en) Compositions and methods for the diagnosis and prognosis of mesothelioma
US20150292020A1 (en) miRNA Fingerprint in the Diagnosis of COPD
US20090186353A1 (en) Cancer-related nucleic acids
TW201303026A (en) MicroRNAs (miRNA) as biomarkers for the identification of familial and non-familial colorectal cancer
US9068232B2 (en) Gene expression signature for classification of kidney tumors
US9834821B2 (en) Diagnosis and prognosis of various types of cancers
US20140243230A1 (en) Gene expression signature for classification of kidney tumors

Legal Events

Date Code Title Description
AS Assignment

Owner name: ROSETTA GENOMICS LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROSENFELD, NITZAN;DROMI, NIR;AHARONOV, RANIT;AND OTHERS;SIGNING DATES FROM 20140213 TO 20140617;REEL/FRAME:037858/0229

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION