WO2015175660A1 - MiRNA expression signature in the classification of thyroid tumors

Info

Publication number
WO2015175660A1
WO2015175660A1 PCT/US2015/030564 US2015030564W WO2015175660A1 WO 2015175660 A1 WO2015175660 A1 WO 2015175660A1 US 2015030564 W US2015030564 W US 2015030564W WO 2015175660 A1 WO2015175660 A1 WO 2015175660A1
Authority
WO
WIPO (PCT)
Prior art keywords
mir
hsa
nucleic acid
thyroid
classifier
Prior art date
Application number
PCT/US2015/030564
Other languages
French (fr)
Inventor
Zohar BARNETT-ITZHAKI
Gila Lithwick Yanai
Eti Meiri
Yael Spector
Hila Benjamin
Original Assignee
Rosetta Genomics, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rosetta Genomics, Ltd.
Priority to JP2016567582A (patent JP6216470B2)
Priority to BR112016026575A (patent BR112016026575A2)
Priority to CN201580024961.9A (patent CN106460053A)
Priority to EP15792258.4A (patent EP3143162A4)
Priority to CA2945531A (patent CA2945531C)
Publication of WO2015175660A1
Priority to US15/237,364 (patent US9708667B2)
Priority to IL248639A (patent IL248639A0)
Priority to US15/625,645 (patent US20170356055A1)
Priority to US16/192,221 (patent US20190300963A1)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00 Antineoplastic agents
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/158 Expression markers
    • C12Q2600/178 Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Definitions

  • The present invention relates to methods for classification of thyroid tumors. Specifically, the invention relates to microRNA molecules associated with specific thyroid tumors.
  • MicroRNAs are an important class of regulatory RNAs, which have a profound impact on a wide array of biological processes. These small (typically 18-24 nucleotides long) non-coding RNA molecules can modulate protein expression patterns by promoting RNA degradation, inhibiting mRNA translation, and also by affecting gene transcription.
  • miRs play pivotal roles in diverse processes such as development and differentiation, control of cell proliferation, stress response and metabolism.
  • The expression of many miRs was found to be altered in numerous types of human cancer, in some cases suggesting that such alterations may play a causative role in tumor progression.
  • The thyroid gland is formed of two main types of cells: the follicular cells and the C or parafollicular cells.
  • Follicular cells produce thyroid hormones, which are regulators of human metabolism. Overproduction of thyroid hormone (hyperthyroidism) causes rapid or irregular heartbeat, trouble sleeping, nervousness, hunger, weight loss, and a feeling of being too warm. Conversely, hypothyroidism causes metabolism slowdown, tiredness, and weight gain. Thyroid hormone release is regulated by the thyroid-stimulating hormone (TSH), produced by the pituitary gland.
  • The C cells produce calcitonin, a hormone responsible for the use of calcium. Lymphocytes and stromal cells are also found in the thyroid.
  • Thyroid cancer is the eighth most common cancer in the United States, and the most rapidly increasing cancer in the US, with more than 60,000 new cases diagnosed every year and about 1,800 deaths in 2014. Thyroid cancer usually presents itself as a palpable thyroid nodule. Different types of thyroid tumors develop from different cell types, which determines the severity and the optimal treatment. Most of the growths and tumors in the thyroid gland are benign (non-cancerous) but others are malignant (cancerous).
  • DTC differentiated thyroid carcinomas
  • PTC papillary thyroid carcinoma
  • FTC follicular thyroid carcinoma
  • FNA fine-needle aspiration
  • US 7,319,011 describes measuring the expression of any one of the genes DDIT3, ARG2, ITM1, C1orf24, TARSH, and ACO1 in a test follicular thyroid specimen for distinguishing follicular adenoma (FA) from follicular carcinoma (FC).
  • US 7,670,775 describes the analysis of the expression of CCND2, PCSK2, and PLAB for identifying malignant thyroid tissue.
  • US 6,723,506 describes the molecular characterization of PAX8-PPAR1 molecules in connection with diagnosis and treatment of thyroid follicular carcinomas.
  • US 7,378,233 describes the occurrence of the T1796A mutation of the BRAF gene in 24 (69%) of papillary thyroid carcinomas.
  • A novel integrated technology platform was developed by the inventors for profiling and characterizing microRNAs in thyroid clinical samples, including biopsies, generally surgically-obtained resections, and cytological specimens, generally obtained by fine-needle aspiration (FNA), and was applied to classify thyroid lesions as benign or malignant neoplasms, as well as their sub-types. Novel microRNAs are disclosed as potential biomarkers.
  • the present invention provides a method for classifying a thyroid lesion sample, the method comprising the steps of:
  • a. obtaining a thyroid lesion sample from a subject in need thereof;
  • b. measuring the expression level of at least four nucleic acids in the sample, said nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
  • step (b) or (c) further comprising a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (d) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
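The ratio step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the classifier of the invention: the function name `build_features`, the dictionary input format, the use of a log2 scale (on which a ratio becomes a difference), and the numeric expression values are all hypothetical; only the microRNA names are taken from the text.

```python
import math

def build_features(expr, singles=(), ratio_pairs=()):
    """Build a flat feature list from single levels and pair ratios.

    expr: dict mapping a microRNA name to a positive, normalized
          expression level on a linear scale (hypothetical format).
    """
    # Single microRNA features, moved to a log2 scale (an assumption).
    features = [math.log2(expr[name]) for name in singles]
    # On a log2 scale the ratio a:b is the difference of the two logs.
    features += [math.log2(expr[a]) - math.log2(expr[b])
                 for a, b in ratio_pairs]
    return features

# Made-up expression values, chosen only to make the arithmetic visible.
sample = {
    "hsa-miR-146b-5p": 4096.0,
    "hsa-miR-342-3p": 512.0,
    "hsa-miR-551b-3p": 64.0,
}
features = build_features(
    sample,
    singles=["hsa-miR-551b-3p"],
    ratio_pairs=[("hsa-miR-146b-5p", "hsa-miR-342-3p")],
)
```

The resulting list mixes one single-microRNA feature with one ratio feature, which corresponds to the "combination thereof" option; passing only `singles` or only `ratio_pairs` gives the other two options.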
  • said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-37, variants thereof or a sequence having at least about 80% identity thereto. In a further embodiment of the method of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-25, variants thereof or a sequence having at least about 80% identity thereto.
  • said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy.
  • said sample is a smear from a FNA biopsy.
  • said thyroid lesion is a nodule of less than 1 cm.
  • said algorithm is a machine-learning algorithm.
  • said algorithm further combines the nucleic acid expression profile with clinical or genetic data from said sample.
  • following step (b), if at least one of said nucleic acid expression levels is below or above a threshold for thyroid cells, said sample is discarded based on the expression level of said nucleic acid.
  • said sample has less than 50 thyroid cells.
  • said measuring is performed by hybridization, amplification or next generation sequencing method.
  • said hybridization comprises contacting the sample with probes, wherein the probes comprise (i) DNA equivalents of the microRNAs, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii) or (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-25.
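To make the "at least 80% identical" criterion concrete, the following Python sketch computes a simple percent identity. It is a simplification under stated assumptions: it compares two sequences of equal length position by position, without gaps or alignment, which may differ from the identity measure intended in the specification. The function name `percent_identity` and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity of two equal-length sequences.

    RNA 'U' is mapped to DNA 'T' so that a microRNA and its DNA
    equivalent compare as identical (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Made-up 10-mers that differ at one position.
identity = percent_identity("ACGUACGUAC", "ACGTACGTAA")
meets_cutoff = identity >= 80.0
```

A real comparison of sequences of different lengths would need an alignment step, which this sketch deliberately leaves out.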
  • said probes are attached to a solid substrate.
  • amplification is real-time polymerase chain reaction (RT-PCR), said RT-PCR amplification method comprising forward and reverse primers, and optionally further comprising hybridization with a probe.
  • said method further comprises the step of administering a differential treatment to said subject if said thyroid lesion is benign or malignant.
  • said lesion is malignant and said treatment is any one of surgery, chemotherapy, radiotherapy, hormone therapy, or any other recommended treatment.
  • the present invention provides a protocol for classifying a thyroid lesion sample comprising the steps of:
  • nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
  • (i) the expression level of at least one nucleic acid that is a non-thyroid cell marker above a threshold determines that the sample is discarded; or (ii) expression levels of non-thyroid cell markers below a threshold determine that the sample proceeds to step (e) for further analysis;
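The sample-acceptance rule described above can be sketched as a small Python check. This is an illustrative sketch only: the function name `passes_marker_qc`, the marker names, the threshold and the expression values are hypothetical, and the handling of a level exactly equal to the threshold (treated here as a failure) is an assumption, since the text does not specify it.

```python
def passes_marker_qc(expr, non_thyroid_markers, threshold):
    """Return True when the sample may proceed to classification.

    expr: dict mapping a marker name to its measured expression level.
    The sample is discarded (False) if any non-thyroid cell marker
    reaches or exceeds the threshold; it proceeds (True) only when
    all such markers are below the threshold.
    """
    return all(expr[marker] < threshold for marker in non_thyroid_markers)

# Hypothetical marker names and values, for illustration only.
markers = ["blood_marker_1", "blood_marker_2"]
clean_sample = {"blood_marker_1": 120.0, "blood_marker_2": 80.0}
contaminated_sample = {"blood_marker_1": 120.0, "blood_marker_2": 900.0}

clean_ok = passes_marker_qc(clean_sample, markers, threshold=300.0)
contaminated_ok = passes_marker_qc(contaminated_sample, markers, threshold=300.0)
```

Using a single shared threshold keeps the sketch short; a per-marker threshold would be a straightforward extension.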
  • step (d) determining a nucleic acid expression profile
  • step (b) further comprising a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (f) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
  • said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-37, variants thereof or a sequence having at least about 80% identity thereto. In another embodiment of the protocol of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-25, variants thereof or a sequence having at least about 80% identity thereto.
  • said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy.
  • said sample is a smear from a FNA biopsy.
  • said thyroid lesion is a nodule of less than 1 cm.
  • said sample has less than 50 thyroid cells.
  • said algorithm is a machine-learning algorithm.
  • the measuring is performed by hybridization, amplification or next generation sequencing method.
  • The present invention provides a kit for thyroid tumor classification, said kit comprising:
  • probes for performing thyroid tumor classification comprise any one of (i) DNA equivalents of microRNAs comprising at least one of SEQ ID NOs 1-308, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii), (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-182, or (v) a nucleic acid sequence that hybridizes with RT-PCR products; and optionally b. an instruction manual for using said probes.
  • said kit further comprises forward and reverse PCR primers.
  • The kit of the invention may comprise forward and reverse primers. In another embodiment, the kit of the invention may further comprise reagents for performing in situ hybridization analysis.
  • the kit for thyroid tumor classification comprises:
  • At least one forward RT-PCR primer such as for example at least one of the primers comprising SEQ ID NO. 270-293;
  • said probe is a general probe. In another embodiment said probe is a microRNA sequence-specific probe.
  • the present invention provides an isolated nucleic acid, said nucleic acid comprising at least 12 contiguous nucleotides at least 80% identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308.
  • the present invention provides a pharmaceutical composition comprising as active agent the isolated nucleic acids described herein, and optionally adjuvants, carriers, diluents and excipients.
  • said nucleic acid molecules may be comprised as an active agent in a pharmaceutical composition, a formulation or a medicament.
  • the present invention provides a vector comprising the isolated nucleic acid described herein.
  • The present invention provides a probe comprising the isolated nucleic acid described herein.
  • the present invention provides a biochip comprising the isolated nucleic acid described herein.
  • the present invention provides the use of an isolated nucleic acid as described herein in the preparation of a medicament.
  • Figure 1 Expression of microRNAs in Giemsa-stained papillary carcinoma (Pap-carc.) and non-papillary carcinoma (N-Pap-carc.) smears.
  • the data are shown in normalized fluorescence units, as measured by microarray.
  • The parallel lines describe a 1.5-fold change between the samples in either direction.
  • Gray crosses represent untested (NT) control probes or median signal < 300 in both samples.
  • microRNAs (hsa-miR-146b-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-21-5p and hsa-miR-31-5p) are up-regulated in the papillary carcinoma smear.
  • FIG.2A shows the predicted secondary structure of two novel microRNAs, MD2-495 (top) and MD2-437 (bottom) detected in thyroid tissue.
  • Fig.2B shows the expression of the two novel microRNAs in each one of 11 resected thyroid samples.
  • Figures 3A-3B MicroRNA expression in malignant versus benign samples.
  • The scatter plot shows the median expression levels of microRNAs, including miR-125b-5p, miR-222-3p and miR-146b-5p (highlighted), in malignant nodules (y-axis) versus benign nodules (x-axis).
  • Each cross represents a microRNA, and includes control sequences, microRNAs with low expression and non-reliable probes (NT).
  • the dashed line represents 1.5 fold.
  • Fig.3A shows the analysis in cohort I.
  • Fig.3B shows the analysis in cohort II.
  • FIG. 4 MiR-375 expression in medullary lesions.
  • Figures 5A-5B Samples stained with different dyes can be processed and microRNA can be detected. The plots show the median expression levels of miR-146b-5p in malignant (M) or benign (B) samples stained with different dyes.
  • Fig.5A shows miR expression in samples stained with May-Grünwald Giemsa compared with DiffQuik.
  • Figures 6A-6B Hurthle cell marker.
  • the dashed factor line x1.5.
  • Bl. blood. NT, not tested.
  • the dashed factor line + 0.6.
  • Figure 7 Profiling of malignant and benign samples with Thyroid assay set of 15 microRNAs.
  • microRNA median expression levels for hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-424-3p (SEQ ID NO.16), and hsa-miR-375 (SEQ ID NO.8) are highlighted.
  • Figures 8A-8C A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using microRNA expression values.
  • Fig.8A The normalized values of two microRNAs (hsa-miR-551b-3p and hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 84.8% and the specificity is 68.9%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.8B The normalized values of three microRNAs (hsa-miR-551b-3p, hsa-miR-146b-5p, and hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 82.9% and the specificity is 72.2%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.8C The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 83.5% and the specificity is 81.5%.
  • Figures 9A-9C A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA expression ratios.
  • Fig.9A The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p and hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification.
  • the sensitivity of this classifier is 78% and the specificity is 79.5%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.9C The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 74.4% and the specificity is 84.1%.
  • Figure 10A-10C A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of a combination of microRNAs and microRNA ratios.
  • Fig.10A Normalized values of one microRNA ratio and one microRNA (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p) were used as the features for the classification.
  • the sensitivity of this classifier is 82.9% and the specificity is 82.8%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.10B The normalized values of one microRNA ratio and two microRNAs (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 82.9% and the specificity is 82.8%.
  • Fig.10C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification.
  • the sensitivity of this classifier is 93.3% and the specificity is 42.4%.
  • FIG. 11A-11C A K-nearest neighbor (KNN) classifier was used to classify malignant (M) from benign (B) samples using normalized values of microRNAs.
  • Fig.11A The normalized values of 6 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p) were used as the features for the classification.
  • the figure shows a confusion matrix where the x-axis shows the classifier answer (Clas.
  • Fig.11B The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 82.9% and the specificity is 74.2%.
  • Fig.11C The normalized values of 12 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p; hsa-miR-486-5p; hsa-miR-424-3p; hsa-miR-200c-3p; hsa-miR-346) were used as the features for the classification.
  • the sensitivity of this classifier is 81.1% and the specificity is 68.9%.
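A K-nearest neighbor classifier of the kind named in Figures 11A-11C can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the classifier that produced the figures: the Euclidean distance, the value of k, the function name `knn_predict` and the two-feature training points are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote of its k nearest neighbors."""
    # Pair each training point's Euclidean distance with its label.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up normalized values of two features; "B" benign, "M" malignant.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["B", "B", "B", "M", "M", "M"]

near_benign = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
near_malignant = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
```

An odd k avoids tied votes in a two-class problem, which is why k=3 is used here.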
  • Figure 12A-12B A KNN classifier was used to classify malignant (M) from benign (B) samples using normalized values of microRNA ratios.
  • Fig.12A The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 78% and the specificity is 58.9%.
  • FIG. 13A-13C A KNN classifier was used to classify malignant (M) from benign (B) samples using normalized values of a combination of microRNAs and microRNA ratios.
  • Fig.13A The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p) were used as the features for the classification.
  • Fig.13B The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification.
  • the sensitivity of this classifier is 83.5% and the specificity is 70.9%.
  • Fig.13C The normalized values of 7 microRNAs and 5 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-152-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 83.5% and the specificity is 66.9%.
  • Figure 14A-14C A Support Vector Machine (SVM) classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples using normalized microRNA values.
  • Fig.14A The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 82.3% and the specificity is 68.2%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.14B The normalized values of 6 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 83.5% and the specificity is 75.5%.
  • Fig.14C The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-3p) were used as the features for the classification.
  • the sensitivity of this classifier is 86% and the specificity is 75.5%.
  • Figure 15A-15C A SVM classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios.
  • Fig.15A The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 83.5% and the specificity is 80.8%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.15B The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 83.5% and the specificity is 80.1%.
  • Fig.15C The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 82.9% and the specificity is 80.8%.
  • Figure 16A-16C A SVM classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of a combination of microRNA values and microRNA ratios.
  • Fig.16A The normalized values of 2 microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 82.9% and the specificity is 83.4%.
  • Misclassified samples are represented by a dot.
  • Fig.16B The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p) were used as the features for the classification.
  • Fig.16C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification.
  • the sensitivity of this classifier is 86.6% and the specificity is 79.5%.
  • Figure 17A-17C A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNAs.
  • Fig.17A The normalized values of two microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 84.8% and the specificity is 64.2%.
  • the grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.17B The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 84.1% and the specificity is 65.6%.
  • Misclassified samples are represented by a dot.
  • Fig.17C The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-3p) were used as the features for the classification.
  • Figure 18A-18C A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios.
  • Fig.18A The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification.
  • the sensitivity of this classifier is 83.5% and the specificity is 73.5%.
  • the grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.18B The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 86% and the specificity is 79.5%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.18C The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • FIG. 19A-19C A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios.
  • Fig.19A The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 85.4% and the specificity is 78.8%.
  • the grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.19B The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • the sensitivity of this classifier is 85.4% and the specificity is 78.1%.
  • Misclassified samples (miscl.) are represented by a dot.
  • Fig.19C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification.
  • The sensitivity of this classifier is 86% and the specificity is 82.8%.
  • Figure 20 The normalized levels of hsa-miR-375 expression (Exp.) are shown as a dot plot for Medullary ("Med."), non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
  • Figure 21 The normalized levels of hsa-miR-146b-5p expression (Exp.) are shown as a dot plot for non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
  • Figure 22 The normalized expression (Exp.) levels of the miR ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
  • Figure 23A-23C A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNAs.
  • Fig.23A The normalized values of two microRNAs (hsa-miR-146b-5p; hsa-miR- 551b-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 56.3%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.23B The normalized values of three microRNAs (hsa-miR-146b-5; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification.
  • the sensitivity of this classifier is 82.6% and the specificity is 59.5%.
  • Misclassified samples are represented by a dot.
  • Fig.23C The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer and the y-axis shows the true diagnosis. The sensitivity of this classifier is 81.7% and the specificity is 71.4%.
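  • The sensitivity and specificity quoted for each classifier follow from the confusion matrix of classifier answer against true diagnosis. The sketch below shows the standard calculation, treating malignant ("M") as the positive class; the function name and the example labels are assumptions made for illustration and are not data from the source.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="M"):
    """Compute sensitivity and specificity from paired labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # malignant sample called malignant
            else:
                fn += 1  # malignant sample called benign
        else:
            if predicted == positive:
                fp += 1  # benign sample called malignant
            else:
                tn += 1  # benign sample called benign
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels, for illustration only: 5 malignant and 4 benign samples.
truth = ["M", "M", "M", "M", "M", "B", "B", "B", "B"]
answer = ["M", "M", "M", "M", "B", "B", "B", "B", "M"]
sens, spec = sensitivity_specificity(truth, answer)
```

In this made-up example four of five malignant samples and three of four benign samples are classified correctly.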
  • Figure 24A-24C A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios.
  • Fig.24A The normalized values of two microRNA ratios (hsa-miR-146b-5p - hsa-miR-342-3p,hsa-miR-31-5p - hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 80% and the specificity is 72.2%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.24B The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 69%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.24C The normalized values of 8 microRNA ratios were used as the features for classifying indeterminate malignant from benign samples.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer and the y-axis shows the true diagnosis.
  • The sensitivity of this classifier is 80% and the specificity is 66.7%.
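  • The legend of Fig.24A writes its ratio features as differences between two microRNAs, which is what a ratio becomes when the normalized values are on a logarithmic scale. The sketch below assumes log2-scale normalized values; that scale, the function name and the example numbers are assumptions made for illustration and are not stated in this passage.

```python
def log_ratio(normalized, numerator, denominator):
    """Ratio feature for one sample, assuming log-scale normalized values.

    On a log scale, log(a / b) = log(a) - log(b), so the ratio of two
    microRNAs is the difference of their normalized values.
    """
    return normalized[numerator] - normalized[denominator]


# Made-up normalized values for a single sample, for illustration only.
sample = {
    "hsa-miR-146b-5p": 12.0,
    "hsa-miR-342-3p": 9.5,
    "hsa-miR-31-5p": 8.0,
}
features = [
    log_ratio(sample, "hsa-miR-146b-5p", "hsa-miR-342-3p"),
    log_ratio(sample, "hsa-miR-31-5p", "hsa-miR-342-3p"),
]
```

A feature built this way is unaffected by any sample-wide additive shift in the log-scale values, which may be one reason ratios are used alongside single microRNAs.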
  • Figure 25A-25C A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios.
  • Fig.25A The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 80% and the specificity is 73.8%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.25B The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 79.1% and the specificity is 73%.
  • Fig.25C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 87.8% and the specificity is 67.5%.
  • Figure 26A-26C A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNAs.
  • Fig.26A The normalized values of 6 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375) were used as the features for the classification.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas.
  • Fig.26B The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 82.6% and the specificity is 73%.
  • Fig.26C The normalized values of 12 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p; hsa-miR-424-3p; hsa-miR-486-5p; hsa-miR-200c-3p; hsa-miR-346) were used as the features for the classification.
  • The sensitivity of this classifier is 73.9% and the specificity is 68.3%.
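  • A K-nearest-neighbours (KNN) classifier of the kind named in Figure 26 assigns a sample the majority label of its closest training samples in feature space. The sketch below is a generic illustration, not the classifier used in the source: the Euclidean distance, the value k=3, the function name and the two-feature training data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    Distance is Euclidean; k=3 is an illustrative assumption.
    """
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature training data, for illustration only.
train = [(10.0, 4.0), (9.5, 4.5), (9.0, 5.0), (6.0, 7.0), (5.5, 7.5), (6.5, 6.5)]
labels = ["M", "M", "M", "B", "B", "B"]

call_a = knn_predict(train, labels, (9.2, 4.8))
call_b = knn_predict(train, labels, (6.1, 7.1))
```

With more features (6, 8 or 12 microRNAs, as in Fig.26A-26C) the same function applies unchanged, since the distance is computed over tuples of any length.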
  • Figure 27A-27B A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNA ratios.
  • Fig.27A The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 80.9% and the specificity is 65.9%.
  • Fig.27B The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 76.5% and the specificity is 62.7%.
  • Figure 28A-28C A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNAs and microRNA ratios.
  • Fig.28A The normalized values of 3 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 76.5% and the specificity is 57.9%.
  • Fig.28B The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification.
  • Fig.28C The normalized values of 12 microRNAs and microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-375; hsa-miR
  • The sensitivity of this classifier is 80.9% and the specificity is 67.5%.
  • Figure 29A-29C A SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs.
  • Fig.29A The normalized values of three microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 82.6% and the specificity is 54.8%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.29B The normalized values of 6 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375) were used as the features for the classification.
  • The sensitivity of this classifier is 82.6% and the specificity is 59.5%.
  • Fig.29C The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • FIG. 30A-30C A SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios.
  • Fig.30A The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 81.7% and the specificity is 67.5%.
  • Fig.30B The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 88.7% and the specificity is 63.5%.
  • The sensitivity of this classifier is 87.8% and the specificity is 58.7%.
  • Figure 31A-31C A SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the combination of normalized values of microRNAs and microRNA ratios.
  • Fig. 31A The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 80% and the specificity is 71.4%.
  • Fig. 31B The normalized values of 4 microRNAs and two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas.
  • Fig. 31C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 84.3% and the specificity is 68.3%.
  • Figure 32A-32C A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using the normalized values of microRNAs.
  • Fig.32A The normalized values of two microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 85.2% and the specificity is 45.2%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.32B The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 84.3% and the specificity is 45.2%.
  • Misclassified samples are represented by a dot.
  • Fig.32C The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification.
  • Figure 33A-33C A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using the normalized values of microRNA ratios.
  • Fig.33A The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 86.1% and the specificity is 61.1%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.33B The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 87% and the specificity is 57.1%.
  • Misclassified samples (miscl.) are represented by a dot.
  • Fig.33C The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.6% and the specificity is 65.1%.
  • Figure 34A-34C A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using a combination of normalized values of microRNAs and microRNA ratios.
  • Fig.34A The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 83.5% and the specificity is 58.7%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.34B The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 85.2% and the specificity is 65.9%. Misclassified samples (miscl.) are represented by a dot.
  • Fig.34C The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 87.8% and the specificity is 62.7%.
  • Figure 35 Normalized expression (Exp.) levels of hsa-miR-146b-5p are shown as a dot plot for Indeterminate non-medullary malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
  • Figure 36 The normalized expression levels (Exp.) of the miR ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for Indeterminate non-medullary malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
  • Figure 37A-37C A Discriminant analysis classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs.
  • Fig.37A The normalized values of two microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 42.9%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.37B The normalized values of three microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 39.7%.
  • Misclassified samples are represented by a dot.
  • Fig.37C The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.4% and the specificity is 47.6%.
  • Figure 38A-38C A Discriminant analysis classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios.
  • Fig.38A The normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.4% and the specificity is 28.6%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.38B The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 30.2%.
  • Misclassified samples (miscl.) are represented by a dot.
  • Fig.38C The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 80.9% and the specificity is 57.1%.
  • Figure 39A-39C A Discriminant analysis classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs and microRNA ratios.
  • Fig. 39A The normalized values of one microRNA and one microRNA ratio (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 93.6% and the specificity is 33.3%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig. 39C The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 87.2% and the specificity is 46%.
  • Figure 40A-40C A KNN classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNAs.
  • Fig.40A The normalized values of 6 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 72.3% and the specificity is 39.7%.
  • Fig.40B The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 66% and the specificity is 61.9%.
  • Fig.40C The normalized values of 12 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p; hsa-miR-200c-3p; MID-16582; hsa-miR-346; hsa-miR-152-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 66% and the specificity is 61.9%.
  • Figure 41A-41B A KNN classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNA ratios.
  • Fig.41A The normalized values of 6 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 78.7% and the specificity is 61.9%.
  • Fig.41B The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 80.9% and the specificity is 50.8%.
  • Figure 42A-42C A KNN classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNAs and microRNA ratios.
  • Fig.42A The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 63.8% and the specificity is 46%.
  • Fig.42B The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • Fig.42C The normalized values of 6 microRNAs and 6 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-375; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-181c-5p; MID-16582:hsa-miR-181c-5p; MID-16582:hsa-miR-181
  • Figure 43A-43C A SVM classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNAs.
  • Fig.43B The normalized values of 6 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.4% and the specificity is 38.1%.
  • Fig.43C The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 55.6%.
  • Figure 44A-44C A SVM classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios.
  • Fig.44A The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 100%.
  • Fig.44B The normalized values of 6 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas.
  • Fig.44C The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486
  • The sensitivity of this classifier is 93.6% and the specificity is 31.7%.
  • Figure 45A-45C A SVM classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios.
  • Fig.45A The normalized values of one microRNA and two microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 93.6% and the specificity is 22.2%.
  • Misclassified samples are represented by a dot.
  • Fig.45B The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p) were used as the features for the classification.
  • The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas.
  • Fig.45C The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 36.5%.
  • Figure 46A-46C A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNAs.
  • Fig.46A The normalized values of two microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 39.7%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.46B The normalized values of three microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.4% and the specificity is 39.7%.
  • Fig.46C The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 93.6% and the specificity is 46%.
  • Figure 47A-47C A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios.
  • Fig.47A The normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 93.6% and the specificity is 19%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.47B The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 93.6% and the specificity is 17.5%.
  • Misclassified samples (miscl.) are represented by a dot.
  • Fig.47C The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification.
  • FIG. 48A-48C A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios.
  • Fig.48A The normalized values of one microRNA and one microRNA ratio (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 33.3%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Fig.48B The normalized values of one microRNA and two microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 89.4% and the specificity is 36.5%.
  • Misclassified samples (miscl.) are represented by a dot.
  • Fig.48C The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification.
  • The sensitivity of this classifier is 91.5% and the specificity is 34.9%.
  • Figure 49 The normalized expression (Exp.) levels of hsa-miR-146b-5p are shown as a dot plot for Bethesda IV non-medullary malignant ("Mal.") and for benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis.
  • Figure 50 The normalized expression (Exp.) levels of the microRNA ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for Bethesda IV non-medullary malignant ("Mal.") and for benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis.
  • Figure 51 A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, wherein the malignant group included samples of medullary tumor.
  • The normalized values of two microRNAs (hsa-miR-222-3p; hsa-miR-551b-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 85.2% and the specificity is 53.6%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Figure 52 A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, wherein the malignant group included samples of medullary tumor.
  • The normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification.
  • The sensitivity of this classifier is 84.7% and the specificity is 80.8%.
  • The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
  • Figure 53 The expression pattern of hsa-miR-486-5p and hsa-miR-200c-3p is a determinant of the quality of the sample.
  • Four samples of blood smears (BS) were analyzed for the expression of hsa-miR-486-5p (SEQ ID NO.22) and hsa-miR-200c-3p (SEQ ID NO.23-24) in comparison with their expression in malignant (M) and benign (B) thyroid samples. Normalized values for the two miRs are shown (normalized using all normalizers).
  • FIG. 54 Sub-typing of Benign Thyroid Tumors.
  • Each cross represents a microRNA or a microRNA ratio.
  • The ratio hsa-miR-125b-5p:hsa-miR-200c-3p correlated to FA, while expression of hsa-miR-342-3p and hsa-miR-31-5p correlated with Hashimoto.
  • Diamonds represent normalizers.
  • Significant microRNAs (p-value for t-test < 0.05) are represented by circles.
  • FIG. 55 Sub-typing of Malignant Thyroid Tumors.
  • Each cross represents a microRNA or a microRNA ratio.
  • Diamonds are normalizers.
  • Significant microRNAs (p-value for t-test < 0.05) are represented by circles.
  • Only normalized microRNA values are labeled. Unlabeled circles represent significant ratios.
  • Figure 56 Flowchart representing the protocol for diagnosis of indeterminate thyroid nodule samples obtained through FNA.
  • The present invention provides a sensitive, specific and accurate methodology for distinguishing between malignant and benign thyroid tumors, as well as particular subtypes of thyroid tumors. Distinguishing between different subtypes of thyroid tumors is essential for providing the patient with the best and most suitable treatment.
  • The present invention provides a significant improvement of the technologies currently available in the field of thyroid tumor classification and diagnosis.
  • The present inventors have developed an integrative platform for the classification of thyroid lesions, by profiling and characterizing microRNA expression in thyroid clinical samples obtained by FNA biopsies, while also overcoming hindrances such as a low number of cells in the sample and the amount of blood in the sample by microRNA profiling.
  • This technological platform was applied to stratify thyroid lesions into benign or malignant neoplasms, as well as subtypes of thyroid tumors, as an adjunctive tool in the pre-operative management of thyroid nodules.
  • the inventors have exceptionally developed a method for classification of benign and malignant thyroid lesions, and specific subtypes of thyroid cancer and follicular lesions, while 10 integrating steps for filtering out sub-optimal samples, by implementing specific algorithms based on microRNA profiling.
  • the method is part of an overall protocol, in which existing or available clinical cytological slides having smears from FNA samples may be used, without the need to generate or collect additional material from the patients.
  • the present method further incorporates the analysis of microRNAs in minute amounts 15 of RNA material from cytological samples. Once an FNA sample is collected, between one and several passes of material are smeared onto slides. Currently available methods usually require the use of several passes for having enough material for analysis. The present inventors developed a method in which even only one FNA slide provides sufficient material for microRNA detection. Furthermore, the present inventors were able to measure microRNA 20 abundance in FNA samples obtained from thyroid nodules as small as 0.1 cm. This is particularly relevant considering that approximately 50% of thyroid lesions are smaller than 1cm [Jung et al. (2014)/ Clin Endocrinol Metab 99: E276-E285]. In addition, the method developed by the inventors allows for the aanalysis of samples having very small amounts of cells, such as samples having 50 cells, up to 120 cells and over. 25
  • the present method includes steps for eliminating or disqualifying samples that lack thyroid cells and/or in which non-thyroid cells, such as blood cells, are over-represented.
  • the present inventors have identified a unique microRNA expression signature for thyroid lesions through profiling the expression of the microRNAs denoted by SEQ ID NOs. 1-308.
  • the present inventors have developed a platform for classification of thyroid clinical samples based on the levels of expression of a set of microRNAs, comprising at least two microRNAs, selected from the group consisting of hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-424-3p (SEQ ID NO.16), hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), MID-16582 (SEQ ID NO.25), hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-375 (SEQ ID NO.8), hsa-miR-486-5p (SEQ ID
  • the present invention is particularly useful for the 25% of the cases in which FNA specimens present inconclusive results in cytopathology, usually referred to as "indeterminate", and which include thyroid lesion samples classified in Bethesda categories III, IV and V.
  • In current medical practice, patients with specimens falling within this category undergo a repeat FNA procedure and surgery, including lobectomy and thyroidectomy.
  • the present invention provides a method of classification for thyroid lesion samples that fall into the "indeterminate" cases, classified in categories III, IV and V of the Bethesda System (described further herein).
  • the present invention provides a method of classification for thyroid lesion samples classified in category IV of the Bethesda System, which relates to "Follicular Neoplasm" or "Suspicious of a Follicular Neoplasm", which is known to be the most difficult category to classify.
  • the present invention presents primarily a protocol for management of thyroid lesion samples which failed to be classified by cytopathological analysis.
  • Particular samples that are of interest are those obtained by FNA.
  • routine smears from FNA samples are used.
  • FNA samples in preservative solutions may be used.
  • Total RNA is extracted from the FNA samples, and the expression of microRNAs is measured.
  • the expression of about 2200 microRNAs is measured.
  • the expression of 182 microRNAs, comprising the sequences of SEQ ID NO. 1-182, is measured.
  • the expression of the microRNAs comprising the sequences of SEQ ID NO.1-37 is measured.
  • classification of the thyroid sample as malignant or benign comprises measuring the expression levels of hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-375 (SEQ ID NO.8), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-424-3p (SEQ ID NO.16), hsa-miR-342-3p (SEQ ID NO.17-18), hsa-m-miR-
  • the present invention provides a method for distinguishing between malignant and benign thyroid tumor lesions in a subject in need, said method comprising obtaining a thyroid tumor lesion sample from said subject, or providing a biological sample obtained from said subject, determining an expression profile in said sample of one or more, or at least four microRNAs comprising SEQ ID NOS: 1-25, or a sequence at least 80%, at least 85%, or at least 90% identical thereto, or any combination of said microRNAs, by hybridization or by amplification, comparing said expression profile to a reference threshold value by using a classifier algorithm; and determining whether the thyroid lesion is malignant or benign.
  • the method of the invention is for distinguishing sub-types of malignant or benign thyroid tumor lesions.
  • the method of the invention comprises measuring the expression of at least four of the microRNAs comprising SEQ ID NOS: 1-25, obtaining the microRNA expression profile value of said sample, and using a classifier to establish, based on said value, whether the thyroid lesion is malignant or benign, and optionally further classifying the sample into one of the malignant or benign subtypes.
  • said determining an expression profile by hybridization comprises contacting the sample with probes that hybridize to each of SEQ ID NOS: 1-25, or to a sequence at least 80%, at least 85%, or at least 90% identical thereto.
  • said determining an expression profile by hybridization comprises contacting the sample with probes that hybridize with at least eight, at least ten, at least twelve, at least fourteen, or at least sixteen contiguous nucleotides of said microRNA comprising SEQ ID NOS: 1-25.
  • the present invention further provides a method of classifying a sample as malignant or benign, and/or sub-typing said sample, whereby, further to measuring the expression levels of microRNAs in the sample, obtaining an expression profile and optionally calculating microRNA ratios, a multi-step analysis of the expression data is applied.
  • Said multi-step analysis comprises applying one or more algorithms, in parallel or sequentially, to at least one of the microRNA expression profiles, microRNA ratios, or a combination thereof.
  • Said multi-step analysis may also further include analyzing the expression of one or more single microRNA levels which may be indicative of the overall quality of the sample.
  • the sample may be classified as benign, and further sub-typed as being Hashimoto.
  • If the expression of hsa-miR-342-3p (SEQ ID NO.17-18) is very high compared to the threshold established in the data set, e.g. the training data set, the sample may be disqualified for lack of sufficient thyroid cells.
  • Another further optional step may relate to the level of expression of MID-16582 (SEQ ID NO.25), which may be used to determine whether the sample may be discarded, or analyzed using a classifier specific for these samples in which MID-16582 (SEQ ID NO.25) is high (compared to the threshold established in the training set).
  • said non-thyroid cell marker is a blood cell marker.
  • said cell marker is an epithelial cell marker.
  • said cell marker is a blood cell marker, a white blood cell marker or an epithelial cell marker.
  • blood cell markers are hsa-miR-486-5p (SEQ ID NO.22), hsa-miR-320a (SEQ ID NO.173), hsa-miR-106a-5p (SEQ ID NO.150), hsa-miR-93-5p (SEQ ID NO.182), hsa-miR-17-3p (SEQ ID NO.160), hsa-let-7d-5p (SEQ ID NO.144), hsa-miR-107 (SEQ ID NO.152), hsa-miR-103a-3p (SEQ ID NO.149), hsa-miR-17-5p (SEQ ID NO.161), hsa-miR-191-5p (SEQ ID NO.163), hsa-miR-25-3p (SEQ ID N
  • white blood cell markers are hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-146a-5p and hsa-miR-150-5p (SEQ ID NO.59).
  • epithelial markers are hsa-miR-200c-3p (SEQ ID NO.23-24), hsa-miR-138-5p (SEQ ID NO.19-21), hsa-miR-3648 (SEQ ID NO.174), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-125a-5p (SEQ ID NO.153), hsa-miR-192-3p (SEQ ID NO.164), hsa-miR-4324 (SEQ ID NO.178), hsa-miR-376a-3p (SEQ ID NO.175).
  • said microRNA ratio is the ratio between the normalized expression level of a pair of microRNAs, wherein the normalized expression level of one microRNA is used as the numerator and the normalized expression level of a second microRNA is the denominator.
  • said determining an expression profile comprises contacting the sample with RT-PCR reagents, including forward and reverse primers as exemplified herein in the Examples, and generating RT-PCR products.
  • said method comprises contacting RT-PCR products with specific or general probes, or a combination thereof, as exemplified herein in the Examples, detecting and measuring the PCR products.
  • said determining an expression profile comprises measuring microRNA expression by hybridization, using microarrays and the like. In another further embodiment, said determining an expression profile comprises measuring microRNA expression by next-generation sequencing.
  • said method comprises optionally further determining the expression profile of at least one microRNA to be used as normalizer.
  • any microRNA as described in Table 1 may be used as a normalizer.
  • any of the microRNAs comprising SEQ ID NO. 26-37, or a sequence at least 80%, 85%, 90%, or 95% identical thereto, are used as normalizers.
  • Said markers may be any one of malignant markers, secondary markers and cell-type markers, or any combination thereof, comprising SEQ ID NOS:1-25, or a sequence at least 80%, 85%, 90%, or 95% identical thereto.
  • the full set of markers may be used.
  • any combination of malignant, secondary and cell-type markers may be used.
  • the method may comprise at least one malignant marker, in association with at least one secondary marker and/or at least one cell-type marker.
  • each of the cell type markers may be used in the form of raw or normalized signals.
  • the cell type markers may be used as a preliminary test prior to performing the classification, in order to determine whether the sample has sufficient relevant material to perform classification, or whether the sample should be discarded.
  • the cell-type markers may be used as part of the final classifier, where the signal of the cell type marker is used by the classifier.
  • the cell-type markers may be used as the denominator of a miR ratio optionally used by the classifier. For example, the expression level of a malignant or a secondary marker may be divided by the expression level of a cell-type marker, and the resulting miR ratio used in the classifier.
  • said classifier may be any one of a single classifier, a multi-step classifier, a classifier which uses all the malignant markers, a classifier which uses a subset of the malignant markers, a classifier which uses all the malignant markers and the secondary markers, a classifier which uses a subset of the malignant markers and a subset of the secondary markers, a classifier which uses all the malignant markers and the secondary markers and the cell type markers, a classifier which employs a subset of all the malignant markers and the secondary markers and the cell type markers, a classifier which uses all or a subset of the malignant markers and all or a subset of the cell type markers.
  • the performance of the classification may be improved by further combining the result from the algorithm classifier with additional clinical or molecular data available for the thyroid sample being analyzed.
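The preliminary use of cell-type markers described above can be sketched as a simple gate applied before classification. This is a minimal illustration under stated assumptions, not the inventors' algorithm: the function name, the dictionary layout and the numeric limits are hypothetical. Only the marker names (hsa-miR-342-3p as a white blood cell marker, hsa-miR-486-5p as a blood cell marker) come from the text.

```python
def passes_cell_type_gate(expression, max_allowed):
    """Return (ok, reasons); ok is False if any cell-type marker is too high.

    expression  -- dict: marker name -> normalized expression (higher = more abundant)
    max_allowed -- dict: marker name -> upper limit (hypothetical, training-set derived)
    """
    reasons = []
    for marker, limit in max_allowed.items():
        level = expression.get(marker)
        if level is None:
            reasons.append(f"{marker}: not measured")
        elif level > limit:
            reasons.append(f"{marker}: {level} exceeds {limit}")
    return (not reasons, reasons)


# Hypothetical limits; real values would be established on a training data set.
LIMITS = {"hsa-miR-342-3p": 12.0, "hsa-miR-486-5p": 14.0}

good_sample = {"hsa-miR-342-3p": 9.5, "hsa-miR-486-5p": 11.0}
bloody_sample = {"hsa-miR-342-3p": 9.5, "hsa-miR-486-5p": 15.2}

print(passes_cell_type_gate(good_sample, LIMITS))
print(passes_cell_type_gate(bloody_sample, LIMITS))
```

A sample that fails the gate would be discarded or routed to a dedicated classifier; only samples that pass would proceed to the malignant/benign classification.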
  • Additional data available may be related to the thyroid lesion, such as the size of the nodule, the number of nodules; it may relate to other clinical information available for the subject from whom the sample was obtained, such as molecular test results, like the expression of other molecular markers, genetic markers, biochemical test results, blood test results, urine test results, recurrence, prognosis data, family history, patient medical history, and the like.
  • Other data that may also be combined is thyroid genetic data, such as mutation analysis, gene fusions, chromosomal rearrangements, gene expression, protein expression, and the like.
  • Therapeutic indications may vary according to the diagnosis obtained with the method or protocol of the invention. Typically there are five types of therapy that may be administered to a thyroid cancer patient: surgery, radiation therapy, chemotherapy, thyroid hormone therapy and targeted therapy.
  • Lobectomy: removal of the lobe in which thyroid cancer is found. Biopsies of lymph nodes in the area may be done to see if they contain cancer.
  • Lymphadenectomy: removal of lymph nodes in the neck that contain cancer.
  • Thyroidectomy is a surgical procedure that has several potential complications or sequelae including: temporary or permanent change in voice, temporarily or permanently low calcium, need for lifelong thyroid hormone replacement, bleeding, infection, and the remote possibility of airway obstruction due to bilateral vocal cord paralysis. Therefore, accurate diagnosis which would prevent the unnecessary removal of the thyroid gland is very desirable.
  • Radioactive iodine (RAI) is administered orally and collects in any remaining thyroid tissue, including thyroid cancer cells that have spread to other places in the body.
  • Chemotherapy is another option for thyroid cancer treatment.
  • Chemotherapy may be administered orally or by injection, intravenous or intramuscular.
  • Chemotherapy may also be administered directly into the cancer affected area instead of systemically. The choice of administration will depend on the type and stage of the cancer.
  • a few examples of drugs that have been approved for thyroid cancer treatment are: Adriamycin PFS (Doxorubicin Hydrochloride), Adriamycin RDF (Doxorubicin Hydrochloride), Cabozantinib-S-Malate, Caprelsa (Vandetanib), Cometriq (Cabozantinib-S-Malate), Doxorubicin Hydrochloride, Nexavar (Sorafenib Tosylate), Sorafenib Tosylate and Vandetanib.
  • Thyroid hormone therapy is a cancer treatment that removes hormones or blocks their action and inhibits cancer cell proliferation.
  • drugs may be given to prevent thyroid-stimulating hormone (TSH) production, so that the hormone does not induce the growth or recurrence of the thyroid cancer.
  • Since thyroid cancer treatment specifically targets thyroid cells, the thyroid is not able to make enough thyroid hormone. Patients are given thyroid hormone replacement pills.
  • Targeted therapy uses drugs or other substances to identify and attack specific cancer cells without harming normal cells.
  • Tyrosine kinase inhibitor (TKI) therapy blocks signal transduction in thyroid cancer cells, inhibiting their growth.
  • Vandetanib is a TKI used to treat thyroid cancer.
  • Dosage and duration of any therapy will depend on individual evaluation of the patient and on standard practice known by the health care provider.
  • the duration of treatment is the period of time during which doses of a pharmaceutical agent or pharmaceutical composition are administered.
  • the identification and differentiation of the thyroid tumor, firstly as benign or malignant, and subsequently its classification into the various subtypes through the analysis of differentially expressed microRNAs can provide further clues to the biological differences between the subtypes, their diverging oncogenetic processes and possible new targets for type-specific target therapy.
  • the present invention provides diagnostic assays and methods, both quantitative and qualitative, for detecting, diagnosing, monitoring, staging and prognosticating thyroid cancers by comparing levels of the specific microRNA molecules as described herein. Such levels are measured in a patient sample, which may be from a biopsy, tumor samples, cells, tissues and/or bodily fluids.
  • the method of the invention is particularly useful for discriminating between different subtypes of malignant thyroid tumors, such types being follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), medullary carcinoma, anaplastic thyroid cancer, poorly differentiated thyroid cancer, and for determining the therapeutic course to be followed after diagnosis.
  • the present invention provides a method for classifying sub-types of benign thyroid tumor, e.g. follicular adenoma, Hashimoto thyroiditis, hyperplasia (Goiter).
  • the present invention also provides a method of treatment of thyroid cancer, said method comprising the method of distinguishing between benign or malignant thyroid tumor as described herein, optionally subtyping the thyroid tumor type, and administering the treatment according to the diagnosis provided by the present method.
  • All the methods of the present invention may optionally further include measuring levels of other cancer markers.
  • Other cancer markers in addition to said microRNA molecules useful in the present invention, will depend on the cancer being tested and are known to those of skill in the art.
  • Assay techniques that can be used to determine levels of gene expression, such as the nucleic acid sequence of the present invention, in a sample derived from a patient are well known to those of skill in the art.
  • Such assay methods include, but are not limited to, radioimmunoassays, reverse transcriptase PCR (RT-PCR) assays, immunohistochemistry assays, in situ hybridization assays, competitive-binding assays, northern blot analyses, ELISA assays, nucleic acid microarrays and biochip analysis.
  • An arbitrary threshold on the expression level of one or more nucleic acid sequences can be set for assigning a sample or tumor sample to one of two groups.
  • expression levels of one or more nucleic acid sequences of the invention are combined by taking ratios of expression levels of two nucleic acid sequences and/or by a method such as logistic regression to define a metric which is then compared to previously measured samples or to a threshold.
  • the threshold for assignment is treated as a parameter, which can be used to quantify the confidence with which samples are assigned to each class.
  • the threshold for assignment can be scaled to favor sensitivity or specificity, depending on the clinical scenario.
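The combination of expression levels into a single metric that is then compared to a tunable threshold can be illustrated as follows. This is a minimal sketch assuming a logistic-regression-style score; the weights, the intercept and the feature values are hypothetical and are not taken from the application.

```python
import math


def logistic_score(features, weights, intercept):
    """Combine feature values (e.g. miR ratios) into a score between 0 and 1."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def assign_class(score, threshold=0.5):
    """Assign a class by comparing the score to the chosen threshold."""
    return "malignant" if score >= threshold else "benign"


# Hypothetical model parameters and one hypothetical sample (two miR ratios).
WEIGHTS = [0.8, -0.5]
INTERCEPT = -0.2
sample = [1.5, 0.4]

score = logistic_score(sample, WEIGHTS, INTERCEPT)
print(round(score, 3))
print(assign_class(score, threshold=0.5))  # lower threshold favors sensitivity
print(assign_class(score, threshold=0.8))  # higher threshold favors specificity
```

Lowering the threshold calls more samples malignant, which raises sensitivity at the cost of specificity; raising it has the opposite effect. The distance of the score from the threshold can also serve as a rough indication of confidence in the assignment.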
  • the correlation value to the reference data generates a continuous score that can be scaled and provides diagnostic information on the likelihood that a sample belongs to a certain class of thyroid subtype. In multivariate analysis, the microRNA signature provides a high level of prognostic information.
  • the present invention also provides novel microRNA molecules, comprising nucleic acids denoted by SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308. It is to be understood that the cDNA, complement sequence, and anti-miR corresponding to any one of SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308 are also encompassed by the present invention.
  • compositions, formulations and medicaments comprising the microRNAs described herein.
  • the present invention provides compositions, formulations and medicaments comprising as an active agent the microRNA comprising any one of SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308, variants thereof, or a sequence at least 80%, at least 85%, or at least 90% identical thereto.
  • Said compositions, formulations and medicaments may further optionally comprise any one of adjuvants, carriers, diluents and excipients.
  • microRNAs described herein can be formulated into compositions, formulations and medicaments by combination with appropriate, pharmaceutically acceptable carriers or diluents, and can be formulated into preparations in solid, semi-solid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, suppositories, injections, inhalants and aerosols.
  • administration of the microRNA or a pharmaceutical composition comprising it can be achieved in various ways, including oral, buccal, rectal, parenteral, intraperitoneal, intradermal, transdermal, intratracheal, etc.
  • compositions of the present invention comprise one or more nucleic acids of the invention and one or more excipients.
  • excipients are selected from water, salt solutions, alcohol, polyethylene glycols, gelatin, lactose, amylase, magnesium stearate, talc, silicic acid, viscous paraffin, hydroxymethylcellulose and polyvinylpyrrolidone.
  • a pharmaceutical composition of the present invention is prepared using known techniques, including, but not limited to mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or tabletting processes. Methods for the preparation of pharmaceutical compositions may be found in the literature, e.g. in Gennaro, A. R. (2000) Remington: The Science and Practice of Pharmacy, 20th ed.
  • a pharmaceutical composition of the present invention is a liquid (e.g., a suspension, elixir and/or solution).
  • a liquid pharmaceutical composition is prepared using ingredients known in the art, including, but not limited to, water, glycols, oils, alcohols, flavoring agents, preservatives, and coloring agents.
  • a pharmaceutical composition of the present invention is a solid (e.g., a powder, tablet, and/or capsule).
  • a solid pharmaceutical composition comprising one or more nucleic acids of the invention is prepared using ingredients known in the art, including, but not limited to, starches, sugars, diluents, granulating agents, lubricants, binders, and disintegrating agents.
  • the present application provides vectors and probes comprising the compounds (the nucleic acids) disclosed herein.
  • the present application provides vectors and probes comprising nucleic acids denoted by SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308, variants thereof or a sequence at least 80%, at least 85%, or at least 90% identical thereto.
  • each intervening number there between with the same degree of precision is explicitly contemplated.
  • the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0 for example, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9 and 7.0 are explicitly contemplated.
  • aberrant proliferation means cell proliferation that deviates from the normal, proper, or expected course.
  • Aberrant cell proliferation may include cell proliferation whose characteristics are associated with an indication caused by, mediated by, or resulting in inappropriately high levels of cell division, inappropriately low levels of apoptosis, or both.
  • Such indications may be characterized, for example, by single or multiple local abnormal proliferations of cells, groups of cells, or tissue(s), whether cancerous or noncancerous, benign or malignant.
  • Aberrant proliferation is one of the main features of cancer.
  • Bind or “immobilized”, as used herein to refer to a probe and a solid support, means that the binding between the probe and the solid support is sufficient to be stable under conditions of binding, washing, analysis, and removal.
  • the binding may be covalent or non-covalent. Covalent bonds may be formed directly between the probe and the solid support or may be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe or both molecules.
  • Non-covalent binding may be one or more of electrostatic, hydrophilic, and hydrophobic interactions.
  • Biological sample or “sample”, as used herein, means a sample of biological tissue or fluid that comprises nucleic acids, microRNA in particular. Such samples include, but are not limited to, tissue or fluid isolated from subjects. Biological samples also include sections of tissues such as biopsy and autopsy samples, fine-needle aspiration (FNA) samples, frozen sections taken for histological purposes, blood, blood fraction, plasma, serum, and the like.
  • a biological sample may be provided by removing a sample of cells from a subject, but may also be provided by using previously isolated cells (e.g., isolated by another person, at another time, and/or for another purpose), which may then be cultured or not.
  • Archival tissues such as those having treatment or outcome history, may also be used.
  • the FNA biopsy is prepared as a smear.
  • classification refers to a procedure and/or algorithm in which individual items are placed into groups or classes based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, features, etc.) and based on a statistical model and/or a training set of previously labeled items.
  • the term "classifying thyroid tumors" refers to the identification of one or more properties of a thyroid tissue sample (e.g., including but not limited to, the presence of microRNAs expressed in cancerous tissue, the presence of microRNAs expressed in precancerous tissue that is likely to become cancerous, and the presence of microRNAs expressed in cancerous tissue that is likely to metastasize).
  • classifier refers to an algorithm used to classify, distinguish or identify thyroid tumors (or lesions) as benign or malignant, or to classify, distinguish or identify sub-types of thyroid tumor.
  • the algorithm to be used in the method or protocol of the invention is a machine-learning algorithm.
  • machine-learning algorithms are discriminant analysis, K-nearest neighbor classifier (KNN), Support Vector Machine (SVM) classifier, logistic regression classifier, neural network classifier, Gaussian mixture model (GMM), nearest centroid classifier, linear regression classifier, decision tree classifier, random forest classifier, ensemble of classifiers, or any combination thereof.
  • the discriminant may be any one of a linear, quadratic, a diagonal of the linear covariance matrix, diagonals of the quadratic covariance matrices, pseudoinverse of the linear covariance matrix, and pseudoinverse of the quadratic covariance matrices.
  • When a KNN classifier is used, the k may be altered and the distance metric can be either Pearson correlation, Spearman correlation, Euclidean or cityblock (Manhattan) distance.
  • When an SVM classifier is used, the kernel may be linear, Gaussian or polynomial.
  • When an ensemble method classifier is used, it usually applies algorithms such as classification trees, KNN or discriminant analysis classifiers. The ensembles can be created using either boosting or bagging algorithms, and the number of ensemble learning cycles can range from two up to a few thousand.
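As an illustration of how k and the distance metric act as interchangeable settings of a KNN classifier, the sketch below implements a small majority-vote KNN in plain Python with three of the metrics named above. It is a minimal example under stated assumptions: the training profiles and labels are invented for the demonstration, and the Pearson-based distance (one minus the correlation) assumes that no profile is constant.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson correlation; undefined (division by zero) for constant vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (sd_a * sd_b)


def knn_predict(train, labels, query, k=3, distance=euclidean):
    """Majority vote among the k training profiles closest to the query."""
    ranked = sorted(zip(train, labels), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented three-microRNA expression profiles for illustration only.
train = [
    [1.0, 2.0, 3.0], [1.2, 2.1, 3.3], [0.9, 1.8, 2.9],
    [3.0, 2.0, 1.0], [3.2, 2.2, 0.9], [2.9, 1.9, 1.1],
]
labels = ["benign"] * 3 + ["malignant"] * 3
query = [3.1, 2.0, 1.2]

for metric in (euclidean, cityblock, pearson_distance):
    print(metric.__name__, knn_predict(train, labels, query, k=3, distance=metric))
```

In practice, k and the metric would be chosen by cross-validation on the training set rather than fixed in advance.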
  • confusion matrix refers to a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
  • a “confusion matrix” may also be referred to as a contingency table or an error matrix.
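For a two-class problem such as benign versus malignant, the confusion matrix reduces to four counts, from which sensitivity and specificity follow directly. The sketch below is illustrative only; the labels are invented and "malignant" is arbitrarily treated as the positive class.

```python
def confusion_matrix(actual, predicted, positive="malignant"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, call in zip(actual, predicted):
        if call == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


def sensitivity(c):
    return c["TP"] / (c["TP"] + c["FN"])


def specificity(c):
    return c["TN"] / (c["TN"] + c["FP"])


# Invented reference diagnoses and classifier calls for eight samples.
actual = ["malignant", "malignant", "malignant", "benign",
          "benign", "benign", "benign", "malignant"]
predicted = ["malignant", "malignant", "benign", "benign",
             "benign", "malignant", "benign", "malignant"]

cm = confusion_matrix(actual, predicted)
print(cm)
print(sensitivity(cm), specificity(cm))
```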
  • “Complement” or “complementary”, as used herein to refer to a nucleic acid may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
  • a full complement or fully complementary means 100% complementary base pairing between nucleotides or nucleotide analogs of nucleic acid molecules.
  • the complementary sequence has a reverse orientation (5'-3').
  • the present invention also provides the complement of the nucleic acids denoted by SEQ ID NOS. 7-29, 33, 34, 139, and 140.
  • CT signals represent the first cycle of PCR where amplification crosses a threshold (cycle threshold) of fluorescence. Accordingly, low values of CT represent high abundance or expression levels of the microRNA.
  • the PCR CT signal is normalized such that the normalized CT is inversely related to the expression level.
  • the PCR CT signal may be normalized and then inverted such that low normalized-inverted CT represents low abundance or expression levels of the microRNA.
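The two conventions can be made concrete with a short sketch. The application does not give the formula at this point, so the choices below are assumptions made for illustration: normalization subtracts the mean CT of a set of normalizer microRNAs, and inversion subtracts the normalized CT from an arbitrary constant (here 50) so that higher values mean higher abundance.

```python
def normalize_and_invert(ct, normalizer_cts, offset=50.0):
    """Return a normalized, inverted CT in which a higher value means more abundance.

    ct             -- raw cycle threshold of the microRNA of interest
    normalizer_cts -- raw CTs of the normalizer microRNAs in the same sample
    offset         -- arbitrary constant used for the inversion (assumption)
    """
    reference = sum(normalizer_cts) / len(normalizer_cts)
    normalized_ct = ct - reference  # still inversely related to abundance
    return offset - normalized_ct   # inverted: now grows with abundance


normalizers = [26.0, 28.0, 27.0]  # hypothetical normalizer CTs
abundant = normalize_and_invert(24.0, normalizers)  # low raw CT
scarce = normalize_and_invert(33.0, normalizers)    # high raw CT
print(abundant, scarce)
```

Because each PCR cycle corresponds to roughly one doubling, a difference of one unit on this scale corresponds to about a twofold difference in abundance.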
  • a "data processing routine" refers to a process that can be embodied in software that determines the biological significance of acquired data (i.e., the ultimate results of an assay or analysis) with respect to one or more samples. For example, the data processing routine can determine whether a thyroid lesion from which a sample was collected or obtained is benign or malignant, or of a specific sub-type, based upon the data collected. In the systems and methods herein, the data processing routine can also control the data collection routine based upon the results determined. The data processing routine and the data collection routines can be integrated and provide feedback to operate the data acquisition, and hence provide assay-based judging methods.
  • Detection means detecting the presence of a component in a sample. Detection also means detecting the absence of a component. Detection also means determining the level of a component, either quantitatively or qualitatively.
  • a differentially expressed microRNA may qualitatively have its expression altered, including an activation or inactivation, in, e.g., normal versus diseased thyroid tissue.
  • a qualitatively regulated microRNA may exhibit an expression pattern within a thyroid sample or cell type which may be detectable by standard techniques. Some microRNAs may be expressed in one thyroid sample or cell type and not in others, or expressed at different levels between different cell types or different samples.
  • the difference in expression may be quantitative, e.g., in that expression is modulated, up-regulated, resulting in an increased amount of microRNA, or down-regulated, resulting in a decreased amount of microRNA.
  • the degree to which expression differs needs only be large enough to quantify via standard characterization techniques such as expression arrays, next generation sequencing (NGS), quantitative reverse transcriptase PCR, northern blot analysis, real-time PCR, in situ hybridization and RNase protection.
  • expression profile means the set of data obtained for the nucleic acid (or microRNA) expression. It may refer to the raw data or to the normalized expression values. Expression profiles may be generated by any convenient means for determining a level of a nucleic acid sequence e.g. quantitative hybridization of microRNA, labeled microRNA, amplified microRNA, cDNA, etc., quantitative PCR, and the like. Further to measuring nucleic acid sequence levels, the data obtained may be normalized; normalization of data is discussed elsewhere in this application. Expression profiles allow the analysis of differential gene expression between two or more samples, as well as between samples and thresholds.
  • classifiers may be applied to expression profiles in order to obtain information about the sample, such as classification, diagnosis, sub-typing of the sample, and the like.
  • Nucleic acid sequences of interest are nucleic acid sequences that are found to be predictive, including the nucleic acid sequences provided herein in Table 1, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of the listed nucleic acid sequences, including all of them.
  • the term "expression profile" means measuring the abundance of the nucleic acid sequences in the measured samples.
  • microRNA expression profiles are characterized in each thyroid sample.
  • “Expression ratio” refers to relative expression levels of two or more nucleic acids, i.e. microRNAs, as determined by detecting the relative expression levels of the corresponding nucleic acids in a biological sample, such as a thyroid sample. Since microRNA expression levels are expressed as CTs, which are obtained in log scale, in practice expression ratios are obtained by subtraction of the CTs, rather than by division.
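The equivalence between dividing expression levels and subtracting CTs can be shown in a few lines. The sketch assumes ideal amplification, i.e. the product doubles at every cycle, so that CT behaves as a negative log2 of abundance; the function names and CT values are illustrative.

```python
def log2_expression_ratio(ct_numerator, ct_denominator):
    """Log2 of (numerator abundance / denominator abundance).

    A lower CT means higher abundance, hence the denominator CT comes first.
    """
    return ct_denominator - ct_numerator


def linear_expression_ratio(ct_numerator, ct_denominator):
    """The same ratio on a linear scale, assuming a doubling per PCR cycle."""
    return 2.0 ** log2_expression_ratio(ct_numerator, ct_denominator)


# Hypothetical CTs: the numerator microRNA crosses the threshold 3 cycles earlier.
print(log2_expression_ratio(24.0, 27.0))
print(linear_expression_ratio(24.0, 27.0))
```

A three-cycle difference therefore corresponds to roughly an eightfold difference in abundance, which is why classifiers can work directly with CT differences.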
  • FDR or “False Discovery Rate” is a statistical method used in multiple hypothesis testing to correct for multiple comparisons.
  • in multiple statistical tests, for example in comparing the signal between two groups in multiple data features, there is an increasingly high probability of obtaining false positive results, by random differences between the groups that can reach levels that would otherwise be considered as statistically significant.
  • statistical significance is defined only for data features in which the differences reached a p-value (by two-sided t-test) below a threshold, which is dependent on the number of tests performed and the distribution of p-values obtained in these tests.
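One common way to obtain a threshold that depends on both the number of tests and the distribution of p-values is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure; the application does not name the specific FDR method, and the function name and the default FDR level are illustrative.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans, True where a feature passes the FDR cutoff.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= (k / m) * fdr, then accept ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values from four two-sided t-tests.
flags = benjamini_hochberg([0.02, 0.024, 0.5, 0.9], fdr=0.05)
```

In this example the smallest p-value alone would fail its own rank threshold, yet it is accepted because the second-ranked p-value passes, which is the step-up behaviour.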
  • Fine-needle aspiration biopsy FNAB, FNA or NAB
  • fine-needle aspiration cytology FNAC
  • a biopsy is collected by inserting a thin, hollow needle into the mass for sampling of cells that, after being stained, will be examined under a microscope. The aspirate may undergo a cytology exam (cell specimen evaluation, FNAC) or a histological exam (biopsy - tissue specimen evaluation, FNAB).
  • FNA is a popular biopsy method used for thyroid nodules since a major surgical (excisional or open) biopsy can be avoided by performing a needle aspiration biopsy instead.
  • a detailed description of specimen collection and preparation may be found in "Atlas of Fine Needle Aspiration Cytology" by Henryk A. Domanski (2014), the contents of which are incorporated herein by reference.
  • the preparation of aspiration specimens has been well described in the art. Usually, a suitable amount of aspirate (usually about one drop) is spread thinly and evenly over a microscopic slide which is then stained and mounted. FNA specimens prepared in this manner are also referred to as "smears". The result should be compatible to a sectioned histological slide with regard to specimen thickness and evenness.
  • Fixation of FNA smears is usually by air drying (generally referred to as "routine air dried FNAB") or wet fixing using either 95% ethanol or cyto-spray as fixative.
  • suitable liquid fixatives are methanol, acetone, isopropyl alcohol, acetone/methanol and the like.
  • FNA samples may be added to or mixed with preservatives in a tube.
  • a "follicular" lesion may be any one of follicular adenoma (FA), follicular carcinoma (FC) and follicular variant of papillary carcinoma (FVPCA).
  • FA follicular adenoma
  • FC follicular carcinoma
  • FVPCA follicular variant of papillary carcinoma
  • Fragment is used herein to indicate a non-full-length part of a nucleic acid. Thus, a fragment is itself also a nucleic acid.
  • Groove binder and/or “minor groove binder” may be used interchangeably and refer to small molecules that fit into the minor groove of double-stranded DNA, typically in a sequence-specific manner.
  • Minor groove binders may be long, flat molecules that can adopt a crescent-like shape and thus, fit snugly into the minor groove of a double helix, often displacing water.
  • Minor groove binding molecules may typically comprise several aromatic rings connected by bonds with torsional freedom such as furan, benzene, or pyrrole rings.
  • Minor groove binders may be antibiotics such as netropsin, distamycin, berenil, pentamidine and other aromatic diamidines, Hoechst 33258, SN 6999, aureolic anti-tumor drugs such as chromomycin and mithramycin, CC-1065, dihydrocyclopyrroloindole tripeptide (DPI3), 1,2-dihydro-(3H)-pyrrolo[3,2-e]indole-7-carboxylate (CDPI3), and related compounds and analogues, including those described in Nucleic Acids in Chemistry and Biology, 2d ed., Blackburn and Gait, eds., Oxford University Press, 1996, and PCT Published Application No.
  • a minor groove binder may be a component of a primer, a probe, a hybridization tag complement, or combinations thereof. Minor groove binders may increase the Tm of the primer or a probe to which they are attached, allowing such primers or probes to effectively hybridize at higher temperatures.
  • Identity as used herein in the context of two or more nucleic acid sequences, means that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity.
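The identity calculation above can be sketched as follows, assuming the two sequences have already been optimally aligned over the specified region (the alignment step itself is not shown) and that gaps are written as "-". The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of positions with the identical residue in both sequences.

    Assumes both strings cover the same aligned region and have equal
    length; a gap character '-' never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("the specified region must not be empty")

    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    # matched positions / total positions in the region, times 100
    return 100.0 * matched / len(aligned_a)


# Illustrative 10-nucleotide region with a single mismatch.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```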
  • In situ detection means the detection of expression or expression levels in the original site, here meaning in a tissue sample such as a biopsy.
  • Label means a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means.
  • the label may be any entity that does not naturally occur in a protein or nucleic acid and allows the nucleic acid or protein to be detectable.
  • useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes, biotin, digoxigenin, or haptens and other entities which can be made detectable, and the like.
  • a label may be incorporated into nucleic acids and proteins at any position.
  • Logistic regression is part of a category of statistical models called generalized linear models. Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. The dependent or response variable can be dichotomous, for example, one of two possible types of cancer. Logistic regression models the natural log of the odds ratio, i.e. the ratio of the probability of belonging to the first group (P) over the probability of belonging to the second group (1-P), as a linear combination of the different expression levels (in log-space).
  • the logistic regression output can be used as a classifier by prescribing that a case or sample will be classified into the first type if P is greater than 0.5 or 50%. Alternatively, the calculated probability P can be used as a variable in other contexts such as a 1D or 2D threshold classifier.
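As a minimal sketch of the classifier described above: the log-odds is a linear combination of log-space expression levels, P is recovered with the logistic function, and the sample is assigned to the first type when P exceeds 0.5. The function names, coefficients and intercept are hypothetical placeholders, not fitted values from this application.

```python
import math


def logistic_probability(expression, coefficients, intercept):
    """P of belonging to the first group, from ln(P / (1 - P)) = b0 + sum(b_i * x_i)."""
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, expression))
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify(expression, coefficients, intercept, threshold=0.5):
    """Assign 'first' when P is greater than the threshold, else 'second'."""
    p = logistic_probability(expression, coefficients, intercept)
    return "first" if p > threshold else "second"


# Hypothetical single-marker model, for illustration only.
label = classify([2.0], [1.0], 0.0)
```

The same probability P could instead be passed on as an input variable to a threshold classifier, as the text notes.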
  • the term "prior" refers to a probability given to each of the different classes in a classification, e.g., the likelihood that a sample is malignant or benign, without any additional knowledge regarding the expression profile of the sample.
  • Priors may be set at different ratios, such as for example 80%-20% malignant-benign, 75%-25% malignant-benign, 70%-30% malignant-benign, 65%-35% malignant-benign, 60%-40% malignant-benign, 50%-50% malignant-benign (i.e., uniform).
  • priors may be empirical, i.e., based on the distribution of the samples in training cohort. Priors may be adjusted in order to achieve a predetermined sensitivity or specificity.
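One way to see how a prior interacts with expression data is Bayes' rule, in which the prior is multiplied by the likelihood of the observed profile under each class. The sketch below assumes that formulation, which the application does not spell out; the function names and the likelihood values are hypothetical.

```python
def empirical_prior(n_malignant: int, n_benign: int) -> float:
    """Prior for the malignant class from the training-cohort distribution."""
    total = n_malignant + n_benign
    if total == 0:
        raise ValueError("training cohort must not be empty")
    return n_malignant / total


def posterior_malignant(likelihood_malignant, likelihood_benign,
                        prior_malignant=0.5):
    """Posterior probability of malignancy under Bayes' rule (an assumption)."""
    prior_benign = 1.0 - prior_malignant
    numerator = prior_malignant * likelihood_malignant
    denominator = numerator + prior_benign * likelihood_benign
    if denominator == 0:
        raise ValueError("at least one class must have non-zero likelihood")
    return numerator / denominator


# With an uninformative profile (equal likelihoods) the posterior equals the prior.
p = posterior_malignant(0.5, 0.5, prior_malignant=0.8)
```

Shifting the prior toward the malignant class raises the posterior for every sample, which is one route to trading specificity for sensitivity.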
  • a “marker” is a microRNA, or a nucleic acid sequence, whose presence and abundance is measured in a sample. A “marker” further provides an indication of the status of the sample.
  • malignant marker is a microRNA, or a nucleic acid sequence which is present at higher levels in malignant samples versus benign samples. A malignant marker may or may not be present in test samples.
  • secondary marker is a microRNA, or a nucleic acid sequence, which is used to differentiate between malignant and benign samples, and for which the difference, or the ratio, in the expression levels of said secondary marker in malignant and benign samples is less than the difference, or the ratio, in the expression levels of malignant markers.
  • a secondary marker may or may not be present in test samples.
  • cell type marker refers to a microRNA, or nucleic acid sequence, whose expression correlates with certain cell types. Said cell types may generally be found in a sample, e.g. blood cells, white blood cells, red blood cells, epithelial cells, Hurthle cells, mitochondrial-rich cells, lymphocytes, follicular cells, parafollicular cells (C cells), metastatic cells, immune cells, macrophages and the like. Other markers included as “cell type markers” may be species-specific markers, such as markers from bacteria, fungi, and the like.
  • Normalizer means a microRNA or a nucleic acid sequence whose signal (i.e., level of expression) is used in order to normalize each sample.
  • a normalizer may be used alone (one microRNA as normalizer), or as part of a set of normalizers (more than one microRNA as normalizer, for example two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, sixteen or seventeen microRNAs may be used as normalizers in a set).
  • any microRNA detected in the sample may be used as a normalizer.
  • the microRNAs defined herein as "markers” may also be used as "normalizers”. Essentially, any microRNA may be used as a normalizer.
  • microRNAs denoted by any one of SEQ ID NOs 1-182 may be used as normalizers.
  • MicroRNAs denoted by any one of SEQ ID NOs. 1-37 may be used as normalizers.
  • Particular examples of microRNAs that may be used as normalizers are hsa-miR-23a-3p, MID-20094, MID-50969, hsa-miR-345-5p, hsa-miR-3074-5p, MID-50976, MID-50971, hsa-miR-5701 and hsa-miR-574-3p.
  • Normalization of data values refers to mapping the original data range into another scale. Normalization may be done by subtracting the mean expression of the set of normalizers, by subtracting the median expression of the set of normalizers, or by fitting the expression values of the normalizers to a reference set of values (using a polynomial fit) and applying this fit to all signals. All the normalizers, or a subset of the normalizers may be used.
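The first two normalization options (subtracting the mean or the median of the normalizer set) can be sketched as follows. The polynomial-fit option is not shown. The function name, the marker and normalizer names, and the signal values are hypothetical.

```python
from statistics import mean, median


def normalize_sample(signals, normalizers, method="mean"):
    """Subtract the mean or median normalizer signal from every signal.

    signals: dict mapping microRNA name to its log-scale signal.
    normalizers: names (all or a subset of the set) used as normalizers.
    """
    values = [signals[name] for name in normalizers]
    if method == "mean":
        offset = mean(values)
    elif method == "median":
        offset = median(values)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    return {name: value - offset for name, value in signals.items()}


# Hypothetical sample with one marker and two normalizers.
sample = {"marker-1": 30.0, "norm-1": 24.0, "norm-2": 26.0}
normalized = normalize_sample(sample, ["norm-1", "norm-2"])
```

Because the signals are on a log scale, subtracting the offset corresponds to dividing by the normalizer level on a linear scale.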
  • Nucleic acid or "oligonucleotide” or “polynucleotide”, as used herein, means at least two nucleotides covalently linked together.
  • the depiction of a single strand also defines the sequence of the complementary strand.
  • a nucleic acid also encompasses the complementary strand of a depicted single strand.
  • Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid.
  • a nucleic acid also encompasses substantially identical nucleic acids and complements thereof.
  • a single strand may provide a probe that hybridizes to a target sequence under stringent hybridization conditions.
  • a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.
  • Nucleic acids may be single-stranded or double-stranded, or may contain portions of both double-stranded and single-stranded sequences.
  • the nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine.
  • Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
  • a nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included.
  • the analog may include a non-naturally occurring linkage, backbone, or nucleotide.
  • the analog may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages.
  • Other analog nucleic acids include those with positive backbones; non-ionic backbones, and non-ribose backbones, including those described in US 5,235,033 and US 5,034,506, which are incorporated herein by reference.
  • Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids.
  • the modified nucleotide analog may be located for example at the 5'-end and/or the 3'-end of the nucleic acid molecule.
  • Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides.
  • nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, such as uridines or cytidines modified at the 5-position, e.g., 5-(2-amino) propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; O- and N-alkylated nucleotides, e.g., N6-methyl adenosine, are suitable.
  • the 2'-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I.
  • Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al. (Nature 2005; 438:685-689), Soutschek et al. (Nature 2004; 432: 173-178), and WO 2005/079397, which are incorporated herein by reference.
  • Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip.
  • the backbone modification may also enhance resistance to degradation, such as in the harsh endocytic environment of cells.
  • the backbone modification may also reduce nucleic acid clearance by hepatocytes, such as in the liver and thyroid. Mixtures of naturally occurring nucleic acids and analogs may be made. Alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs may be made.
  • nucleic acids provided herein may be non-naturally occurring, synthesized nucleic acids.
  • the nucleic acid provided herein may be a synthetic nucleic acid. Methods of synthesizing nucleic acids are known to the man skilled in the art, and are described, e.g., in US 7,579,451, the contents of which are incorporated herein by reference.
  • the nucleic acids may comprise at least one of the sequences of SEQ ID NOS: 1-308 or a variant thereof. In one embodiment, the nucleic acids comprise at least one of the sequences of SEQ ID NOS: 1-182.
  • the variant may be a complement of the referenced nucleotide sequence.
  • the variant may be a nucleotide sequence that is 70%, 75%, 80%, 85%, 90% or 95% identical to the referenced nucleotide sequence or the complement thereof.
  • the variant may be a nucleotide sequence which hybridizes under stringent conditions to the referenced nucleotide sequence, complements thereof, or nucleotide sequences substantially identical thereto.
  • a nucleic acid as described herein may have a length of from about 10 to about 250 nucleotides.
  • the nucleic acid may have a length of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200 or 250 nucleotides.
  • the nucleic acid may be synthesized or expressed in a cell (in vitro or in vivo) using a synthetic gene.
  • the nucleic acid may be synthesized as a single strand molecule and hybridized to a substantially complementary nucleic acid to form a duplex.
  • the nucleic acid may be introduced to a cell, tissue or organ in a single- or double-stranded form or capable of being expressed by a synthetic gene using methods well known to those skilled in the art, including as described in US 6,506,559, the contents of which are incorporated by reference herein.
  • the nucleic acid may comprise a microRNA sequence shown in Table 1, or a variant thereof. In some instances, variants of the same microRNA are also provided in Table 1. It is to be noted that SEQ ID NOs. 1-180 in Table 1 present the cDNA corresponding to the sequence of the naturally occurring microRNA, i.e., the sequences present thymine (T) instead of uracil (U).
  • T thymine
  • U uracil
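Because the tables list cDNA (thymine in place of uracil), recovering the RNA form of a listed sequence is a single substitution. The sketch below is illustrative; the function name is hypothetical, and the example input is the hsa-miR-320a entry as listed in Table 1 of this document.

```python
def cdna_to_rna(cdna: str) -> str:
    """Replace thymine (T) with uracil (U) to obtain the RNA form of a cDNA sequence."""
    return cdna.upper().replace("T", "U")


# hsa-miR-320a as listed in Table 1 (cDNA form).
rna = cdna_to_rna("AAAAGCTGGGTTGAGAGGGCGAA")
```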
  • nucleic acid refers to deoxyribonucleotides, ribonucleotides, or modified nucleotides, and polymers thereof in single- or double-stranded form.
  • the term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides.
  • Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs) and unlocked nucleic acids (UNAs; see, e.g., Jensen et al. Nucleic Acids Symposium Series 52: 133-4), and derivatives thereof.
  • PNAs peptide-nucleic acids
  • UNAs unlocked nucleic acids
  • Nucleotide is used as recognized in the art, to include those with natural bases (standard), and modified bases well known in the art. Such bases are generally located at the 1' position of a nucleotide sugar moiety. Nucleotides generally comprise a base, sugar and a phosphate group. The nucleotides can be unmodified or modified at the sugar, phosphate and/or base moiety, also referred to interchangeably as nucleotide analogs, modified nucleotides, non-natural nucleotides, non-standard nucleotides and other (see, e.g., WO 92/07065; WO 93/15187; the contents of which are incorporated herein by reference).
  • There are several examples of modified nucleic acid bases known in the art as summarized by Limbach, et al, Nucleic Acids Res. 22:2183, 1994. Some of the non-limiting examples of base modifications that can be introduced into nucleic acid molecules include hypoxanthine, purine, pyridin-4-one, pyridin-2-one, phenyl, pseudouracil, 2,4,6-trimethoxy benzene, 3-methyl uracil, dihydrouridine, naphthyl, aminophenyl, 5-alkylcytidines (e.g., 5-methylcytidine), 5-alkyluridines (e.g., ribothymidine), 5-halouridine (e.g., 5-bromouridine) or 6-azapyrimidines or 6-alkylpyrimidines (e.g.
  • by "modified bases" in this aspect is meant nucleotide bases other than adenine, guanine, cytosine and uracil at the 1' position or their equivalents.
  • Modified nucleotide refers to a nucleotide that has one or more modifications to the nucleoside, the nucleobase, pentose ring, or phosphate group. Modifications include those naturally occurring that result from modification by enzymes that modify nucleotides, such as methyltransferases. Modified nucleotides also include synthetic or non-naturally occurring nucleotides.
  • Synthetic or non-naturally occurring modifications in nucleotides include those with 2' modifications, e.g., 2'-methoxyethoxy, 2'-fluoro, 2'-allyl, 2'-O-[2-(methylamino)-2-oxoethyl], 4'-thio, 4'-CH2-O-2'-bridge, 4'-(CH2)2-O-2'-bridge, 2'-LNA or other bicyclic or “bridged” nucleoside analog, and 2'-O-(N-methylcarbamate) or those comprising base analogs.
  • "amino" means 2'-NH2 or 2'-O-NH2, which can be modified or unmodified. Such modified groups are described, e.g., in US 5,672,695 and US 6,248,878.
  • Modified nucleotides of the instant invention can also include nucleotide analogs as described above.
  • base analog refers to a heterocyclic moiety which is located at the 1' position of a nucleotide sugar moiety in a modified nucleotide that can be incorporated into a nucleic acid duplex (or the equivalent position in a nucleotide sugar moiety substitution that can be incorporated into a nucleic acid duplex).
  • a base analog may be generally a purine or a pyrimidine base, excluding the common bases guanine (G), cytosine (C), adenine (A), thymine (T), and uracil (U). Base analogs can duplex with other bases or base analogs in dsRNAs.
  • Base analogs include those useful in the compounds and methods of the invention, e.g., those disclosed in US 5,432,272, US 6,001,983 and US 7,579,451, which are herein incorporated by reference.
  • Non-limiting examples of bases include hypoxanthine (I), xanthine (X), 3β-D-ribofuranosyl-(2,6-diaminopyrimidine) (K), 3-gamma-D-ribofuranosyl-(1-methyl-pyrazolo[4,3-d]pyrimidine-5,7(4H,6H)-dione) (P), iso-cytosine (iso-C), iso-guanine (iso-G), 1-gamma-D-ribofuranosyl-(5-nitroindole), 1-gamma-D-ribofuranosyl-(3-nitropyrrole), 5-bromouracil, 2-aminopurine, 4-thio-dT, 7-(
  • Base analogs may also be a universal base.
  • Universal base refers to a heterocyclic moiety located at the 1' position of a nucleotide sugar moiety in a modified nucleotide, or the equivalent position in a nucleotide sugar moiety substitution, that, when present in a nucleic acid duplex, can be positioned opposite more than one type of base without altering the double helical structure (e.g., the structure of the phosphate backbone). Additionally, the universal base does not destroy the ability of the single stranded nucleic acid in which it resides to duplex to a target nucleic acid.
  • hsa-miR-320a 173 AAAAGCTGGGTTGAGAGGGCGAA
  • 1 "N” may be any one of G, C, A, T/U.
  • miR name is the miRBase registry name (release 20), except for the miR names represented by MID- [numeral] or MD2- [numeral].
  • MID-00078, MID-00321, MID-00387, MID-00671, MID-00672, MID-00690, MID-15965, MID-16318, MID-17144, MID-17866, MID-18468, MID-19433, MID-19434, MID-23168, MID-23794, MID-24496, MID-24705, MD2-495 and MD2-437 are putative microRNAs, which were predicted and/or cloned at Rosetta Genomics.
  • the nucleic acid may also comprise a miR hairpin sequence shown in Table 2, or a variant thereof.
  • hsa-mir-7 183 TACTGCGCTCAACAACAAATCCCAGTCTACCTAATGGTGCCAGCCATC
  • hsa-mir-21 186 TCATGGCAACACCAGTCGATGGGCTGTCTGACATTTTGGTAT
  • hsa-mir-34a 190 AAGGAAGCAATCAGCAAGTATACTGCCCTAGAAGTGCTGCAC
  • hsa-mir-92b 191 TGCAGTGTTGTTTTTTCCCCCGCCAATATTGCACTCGTCCCGGCCTCC
  • hsa-mir-100 193 CAAGCTTGTATCTATAGGTATGTGTCTGTTAGGC
  • hsa-mir-125b-2 196 TAACATCACAAGTCAGGCTCTTGGGACCTAGGCGGAGGGGA
  • hsa-mir-138-1 CCCTGGCATGGTGTGGTGGGGCAGCTGGTGTTGTGAATCAGGCCGTTG
  • hsa-mir-138-2 GAGGAAGCCGGCGGAGTTCTGGTATCGTTGCTGCAGCTGGTGTTGTGA
  • hsa-mir-140 201 TGGTAGGTTACGTCATGCTGTTCTACCACAGGGTAGAACCACGGACAG
  • hsa-mir-141 202 TTGTGAAGCTCCTAACACTGTCTGGTAAAGATGGCTCCCGGGTGGGTT
  • hsa-mir-146a 205 ACCTCTGAAATTCAGTTCTTCAGCTGGGATAT
  • hsa-mir-181c 213 TTGGGCAGCTCAGGCAAACCATCGACCGTTGAGTGGACCCTGAGGCCT
  • hsa-mir-205 226 CAACCAGATTTCAGTGGAGTGAAGTTCAGGAGGCATGGA
  • hsa-mir-214 228 AACATCCGCTCACCTGTACAGCAGGCACAGACAGGCAGTCACATGACA
  • hsa-mir-221 229 TGTTCGTTAGGCAACAGCTACATTGTCTGCTGGGTTTCAGGCTACCTG
  • hsa-mir-223 232 CTGAGTTGGACACTCCATGTGGTAGAGTGTCAGTTTGTCAAATACCCC
  • hsa-mir-224 233 GTTTCAAAATGGTGCCCTAGTGACTACAAAGCCCC
  • hsa-mir-345 236 GGTGGGCCCTGAACGAGGGGTCTGGAGGCCTGGGTTTGAATATCGACA
  • hsa-mir-424 240 AAAACGTGAGGCGCTGCTATACCCCCTCGTGGGGAAGGTAGAAGGTGG
  • hsa-mir-487b 244 TTGCTCATGTCGAATCGTACAGGGTCATCCACTTTTTCAGTATCAAGA
  • hsa-mir-497 248 GGCCACGTCCAAACCACACTGTGGTGTTAGAGCGAGGGTGGGGGAGGC
  • hsa-mir-513a 249 TGTCATTTATGTGAACTAAAATATAAATTTCACCTTTCTGAGAAGGGT
  • hsa-mir-542 250 CAGTGTGCACTTGTGACAGATTGATAACTGAAAGGTCTGGGAGCCACT
  • hsa-mir-625 254 GATCTCAGGACTATAGAACTTTCCCCCTCATCCCTCTGCCCTCTACCA
  • hsa-mir-658 256 GCACGACTCAGGGCGGAGGGAAGTAGGTCCGTTGGTCGGTCGGGAACG
  • hsa-mir-708 258 GCACATGAACACAACTAGACTGTGAGCTTCTAGAGGGCAGGGACC
  • hsa-mir-765 259 TACGAGAAACTGGGGTTTCTGGAGGAGAAGGAAGGTGATGAAGGATCT
  • hsa-mir-2392 261 ACCCCAGGCTGTAGGATGGGGGTGAGAGGTGCTA
  • hsa-mir-3162 264 ATGAACAATGTTTCTCACTCCCTACCCCTCCACTCCCCAAAAAAGTCA
  • hsa-mir-4284 269 GAGGGGGTAGTTAGGAGCTTTGATAGAG
  • hsa-mir-4539 276 GCAGGGCTGAGCTGAACTGGGCTGAGCTGGGCTGAGCTGGGCTGAGCTGGGCTGAGTT
  • hsa-mir-4690 278 GCCCAGCTGAGGCCTCTGCTGTCTTATCTGTC
  • hsa-mir-5001 280 CTCAGTTCTGCCTCTGTCCAGGTCCTTGTGACCCGCCC
  • hsa-mir-5100 281 CCCAGCGGTGCCTCTAACTG
  • hsa-mir-5701-2 285 TGATCCAATCAGAACATGAAAATAACGTCCAATC
  • hsa-mir-6076 287 CAGGAGAACATCTGAGAGGGGAAGTTGCTTTCCTGCCCTGGCCCTTTC
  • SEQ ID NOs.183-306 in Table 2 present the cDNA corresponding to the sequence of the naturally occurring pre-miR, i.e., the sequences present thymine (T) instead of uracil (U).
  • the nucleic acid may be in the form of a nucleic acid complex, and may further comprise one or more of the following: a peptide, a protein, a RNA-DNA hybrid, an antibody, an antibody fragment, a Fab fragment, or an aptamer.
  • the nucleic acid may comprise a sequence of a pri-miRNA or a variant thereof.
  • the pri-microRNA sequence may comprise from 45-30,000, 50-25,000, 100-20,000, 1,000-1,500 or 80-100 nucleotides.
  • the sequence of the pri-miRNA may comprise a pre-miRNA, miRNA and miRNA*, as set forth herein, and variants thereof.
  • the sequence of the pri-miRNA may comprise any of the sequences of SEQ ID NOS: 183-308 or variants thereof.
  • the pri-miRNA may comprise a hairpin structure.
  • the hairpin may comprise a first and a second nucleic acid sequence that are substantially complementary.
  • the first and second nucleic acid sequence may be from 37-50 nucleotides.
  • the first and second nucleic acid sequence may be separated by a third sequence of from 8-12 nucleotides.
  • the hairpin structure may have a free energy of less than -25 Kcal/mole as calculated by the Vienna algorithm with default parameters, as described in Hofacker et al. (Monatshefte f. Chemie 1994; 125: 167-188), the contents of which are incorporated herein by reference.
  • the hairpin may comprise a terminal loop of 4-20, 8-12 or 10 nucleotides.
  • the pri-miRNA may comprise at least 19% adenosine nucleotides, at least 16% cytosine nucleotides, at least 23% thymine nucleotides and at least 19% guanine nucleotides.
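The base-composition condition above can be checked with a simple count. This sketch assumes the sequence is given in cDNA form (T rather than U), as in the tables; the function names and the test sequences are hypothetical.

```python
from collections import Counter

# Minimum fraction of each nucleotide, as stated in the text above.
MIN_FRACTION = {"A": 0.19, "C": 0.16, "T": 0.23, "G": 0.19}


def base_fractions(sequence: str) -> dict:
    """Fraction of A, C, T and G in a cDNA-form sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in MIN_FRACTION}


def meets_composition(sequence: str) -> bool:
    """True when every nucleotide reaches its stated minimum fraction."""
    fractions = base_fractions(sequence)
    return all(fractions[b] >= MIN_FRACTION[b] for b in MIN_FRACTION)


# Synthetic sequence with 25% of each base, for illustration only.
ok = meets_composition("A" * 25 + "C" * 25 + "T" * 25 + "G" * 25)
```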
  • the nucleic acid may also comprise a sequence of a pre-miRNA or a variant thereof.
  • the pre-miRNA sequence may comprise from 45-90, 60-80 or 60-70 nucleotides.
  • the sequence of the pre-miRNA may comprise a miRNA and a miRNA* as set forth herein.
  • the sequence of the pre-miRNA may also be that of a pri-miRNA excluding from 0-160 nucleotides from the 5' and 3' ends of the pri-miRNA.
  • the sequence of the pre-miRNA may comprise the sequence of SEQ ID NOS: 183-308 or variants thereof.
  • the nucleic acid may be at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical to the nucleic acid sequences in Tables 1 or 2 (with increments of 1% from 80 to 99%), over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides.
  • the nucleic acid may also comprise a sequence of a microRNA (including a miRNA*) or a variant thereof, including those putative microRNAs represented by MID- [numeral].
  • microRNAs include those miRs which have been listed in the miRBase registry (release 20), as well as putative microRNAs which have been predicted and/or cloned by Rosetta Genomics and which are represented by MID- [numeral].
  • the microRNA sequence may comprise from 13-33, 18-24 or 21-23 nucleotides.
  • the microRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
  • the sequence of the microRNA may be the first 13-33 nucleotides of the pre-miRNA.
  • the sequence of the microRNA may also be the last 13-33 nucleotides of the pre-miRNA.
  • the sequence of the microRNA may comprise the sequence of any one of SEQ ID NOS: 1-182 or a variant thereof.
  • the present invention employs microRNAs for the identification, classification and diagnosis of thyroid nodules.
  • Variant as used herein referring to a nucleic acid, means (i) a portion of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that differs from the referenced nucleotide sequence by a point-mutation or the complement thereof; (iv) a naturally-occurring variant of the referenced nucleotide sequence present in the general population or the complement thereof; or (v) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, or the complement thereof.
  • Probe means an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. For example, for hybridization assays, the probe may be complementary to at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20 contiguous nucleotides of the sequence of the microRNA being detected.
  • the probe may be complementary to at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20 contiguous nucleotides of the sequence of the PCR product being detected.
  • a probe may be complementary to, or may hybridize to at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% of its target nucleic acid.
  • a probe may be single-stranded or partially single- and partially double-stranded.
  • the strandedness of the probe is dictated by the structure, composition and properties of the target sequence.
  • Probes may include a label, an attachment, or a nucleotide sequence that does not naturally occur in a nucleic acid described herein. Probes may be directly labeled or indirectly labeled such as with biotin to which a streptavidin complex may bind.
  • Probe may be an agent for detecting the nucleic acid sequences described herein.
  • Probe may be a labeled nucleic acid probe capable of hybridizing to a portion of the nucleic acid sequence of the invention, or amplification products derived therefrom.
  • the nucleic acid probe is a reverse-complementary nucleic acid molecule of the nucleic acid sequence disclosed herein.
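A reverse-complementary probe sequence can be derived from a target sequence as sketched below. This is an illustration only: the function name is hypothetical, the output is written in DNA form (uracil in the input pairs with adenine), and real probe design would also account for length, linkers and labels, which are not modeled here.

```python
# Complement table; U in an RNA input pairs with A, output is DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a nucleic acid sequence, as DNA."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Short illustrative target, not a sequence from the tables.
probe = reverse_complement("AACG")
```

For a DNA input, applying the function twice returns the original sequence, which is a convenient sanity check.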
  • a probe may be a nucleic acid sequence which sufficiently specifically hybridizes under stringent conditions to the nucleic acid disclosed herein.
  • a probe is optionally labeled with a fluorescent molecule such as a fluorescein, e.g. 6-carboxyfluorescein (FAM), an indocarbocyanine, e.g. QUASAR-670, a hexafluorocine, such as 6-carboxyhexafluorescein (HEX), or other fluorophore molecules and optionally a quencher.
  • FAM 6-carboxyfluorescein
  • a quencher is appreciated to be matched to a fluorophore.
  • Illustrative examples of a quencher include the black hole quenchers BHQ1, and BHQ2, or minor groove binders (MGB), e.g. dihydrocyclopyrroloindole tripeptide.
  • MGB minor groove binders
  • Other fluorophores and quenchers are known in the art and are similarly operable herein.
  • the present invention also provides a probe, said probe comprising the novel nucleic acid sequences described herein, defined by any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308, or variants thereof.
  • Probes may be used for screening and diagnostic methods.
  • the probe may be attached or immobilized to a solid substrate, such as a biochip.
  • the probe may have a length of from 8 to 500, 10 to 100 or 20 to 60 nucleotides.
  • the probe may have a length of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280 or 300 nucleotides.
  • the probe may further comprise a linker sequence of from 10-60 nucleotides.
  • the probe may further comprise a linker.
  • the linker may comprise a sequence that does not occur naturally in a nucleic acid described herein.
  • the linker may be 10-60 nucleotides in length.
  • the linker may be 20-27 nucleotides in length.
  • the linker may be of sufficient length to allow the probe to be a total length of 45-60 nucleotides.
  • the linker may not be capable of forming a stable secondary structure, or may not be capable of folding on itself, or may not be capable of folding on a non-linker portion of a nucleic acid contained in the probe.
  • the sequence of the linker is heterogeneous, and it may not appear in the genome of the animal from which the probe's non-linker nucleic acid is derived.
  • the term "reference value” means a value that statistically correlates to a particular outcome when compared to an assay result.
  • the reference value is determined from statistical analysis of studies that compare microRNA expression with known clinical outcomes.
  • the reference value may vary according to the classifier (i.e. the algorithm) used.
  • the reference value may be the expression levels (or values) of all the microRNAs in the training data.
  • the reference value may be one or more thresholds established by the classifier.
  • the reference value may further be a coefficient or set of coefficients. Essentially the reference value refers to any parameter needed or used by the algorithm.
  • “Sensitivity”, as used herein, may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct type out of two possible types.
  • the sensitivity for class A is the proportion of cases that are determined to belong to class "A" by the test out of the cases that are in class "A", as determined by some absolute or gold standard.
  • “Sensitivity”, as used herein, may mean a statistical measure of how well a classification test correctly identifies a condition or conditions, for example, how frequently it correctly classifies a cancer into the correct type out of two or more possible types.
  • the sensitivity for class A is the proportion of cases that are determined to belong to class "A" by the test out of the cases that are in class "A", as determined by some absolute or gold standard.
  • Smear refers to a sample of thyroid tissue spread thinly on a microscope slide for examination, typically for medical diagnosis. Smears from FNAs usually have very small amounts of cells, which results in small amounts of RNA, which may range from 1-1000 ng, 1-100 ng, 1-50 ng, 1-40 ng, accordingly. Smears may be stained with any stain known to the man skilled in the art of cytology, histology or pathology, such as any stain used to differentiate cells in pathologic specimens.
  • Examples of stains are multichromatic stains, like Papanicolaou, which are a combination of nuclear stain and cytoplasm stain; cellular structure stains such as Wright, Giemsa, Romanowsky and the like; nuclear stains, such as Hoechst stains and the like; cell viability stains, such as Trypan blue and the like; and enzyme activity stains, such as benzidine for HRP to form a visible precipitate, and the like.
  • Specificity may mean a statistical measure of how well a binary classification test correctly identifies cases that do not have a specific condition, for example, how frequently it correctly classifies a sample as non-cancer when indeed it is a non-cancerous sample.
  • the specificity for class A is the proportion of cases that are determined to belong to class "not A” by the test out of the cases that are in class "not A”, as determined by some absolute or gold standard.
  • Specificity may mean a statistical measure of how well a classification test correctly identifies cases that do not have a specific condition.
  • the specificity for class A is the proportion of cases that are determined by the test not to belong to class A out of the cases that are not in class A, as determined by some absolute or gold standard.
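The definitions of sensitivity and specificity given above can be sketched in a few lines of Python. This is a minimal illustration, assuming that gold-standard and test-assigned labels are available as two equal-length lists; the function name and the example labels are hypothetical and do not represent data from the disclosure.

```python
from collections import Counter


def sensitivity_specificity(truth, predicted, positive):
    """Sensitivity and specificity for class `positive` against a gold standard.

    truth     -- gold-standard labels
    predicted -- labels assigned by the test, in the same order
    """
    counts = Counter()
    for t, p in zip(truth, predicted):
        if t == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["tn" if p != positive else "fp"] += 1
    in_class = counts["tp"] + counts["fn"]
    not_in_class = counts["tn"] + counts["fp"]
    if in_class == 0 or not_in_class == 0:
        raise ValueError("need at least one case in the class and one outside it")
    # Sensitivity: cases called "A" out of the cases that are truly "A".
    # Specificity: cases called "not A" out of the cases that are truly "not A".
    return counts["tp"] / in_class, counts["tn"] / not_in_class


# Hypothetical labels for eight samples (not data from the disclosure).
truth = ["malignant", "malignant", "malignant", "benign",
         "benign", "benign", "benign", "malignant"]
predicted = ["malignant", "malignant", "benign", "benign",
             "benign", "malignant", "benign", "malignant"]
print(sensitivity_specificity(truth, predicted, positive="malignant"))  # (0.75, 0.75)
```

Because every label other than `positive` is treated as "not A", the same function covers both the binary case and the case of two or more possible types.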
  • stage of cancer refers to a numerical measurement of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor, whether the tumor has spread to other parts of the body and where the cancer has spread (e.g., within the same organ or region of the body or to another organ).
  • Stringent hybridization conditions mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • the Tm may be the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
  • Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., about 10-50 nucleotides) and at least about 60°C for long probes (e.g., greater than about 50 nucleotides).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • a positive signal may be at least 2 to 10 times background hybridization.
  • Exemplary stringent hybridization conditions include the following: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C, DMSO, 6X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102, 0.06X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102.
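The stringency rules above (a hybridization temperature about 5-10°C below the Tm, and a positive signal of at least 2 to 10 times background) can be sketched as follows. The Tm estimate uses the Wallace rule (2°C per A/T and 4°C per G/C), a common rough approximation for short oligonucleotides. It is an assumption made here for illustration and is not a method stated in the disclosure. All function names and example values are hypothetical.

```python
def wallace_tm(seq):
    """Rough Tm estimate in degrees C for a short oligonucleotide (Wallace rule).

    Assumption for illustration only; the disclosure does not prescribe
    a particular Tm formula.
    """
    seq = seq.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    return 2 * at + 4 * gc


def stringent_window(tm):
    """Temperature range about 5-10 degrees C below the Tm, as (low, high)."""
    return tm - 10.0, tm - 5.0


def is_positive(signal, background, fold=2.0):
    """True when the signal is at least `fold` times background hybridization."""
    return signal >= fold * background


tm = wallace_tm("ACGTACGTAC")          # hypothetical 10-mer
print(tm, stringent_window(tm))        # 30 (20.0, 25.0)
print(is_positive(250.0, 100.0))       # True
```

In practice the Tm also depends on ionic strength, pH and nucleic acid concentration, as noted above, so an estimate of this kind is only a starting point.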
  • the term "subject” refers to a mammal, including both human and other mammals.
  • the methods of the present invention are preferably applied to human subjects.
  • subtype of cancer refers to different types of cancer that affect the same organ (e.g., papillary, follicular carcinoma and follicular variant papillary carcinoma of the thyroid).
  • thyroid lesion may mean a thyroid tumor, including sub-types of thyroid tumors, such as Hashimoto disease, follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), non-encapsulated (infiltrative/diffuse) FVPC or FVPTC, medullary carcinoma, anaplastic thyroid cancer, or poorly differentiated thyroid cancer.
  • threshold expression profile refers to a criterion expression profile to which measured values are compared in order to classify a tumor.
  • tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts.
  • the phrase "suspected of being cancerous", as used herein, means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
  • Tumor refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
  • the cytological classification of the thyroid lesions or tumor samples used herein is based on "The Bethesda System for Reporting Thyroid Cytopathology", the "BSRTC" (Syed Z. Ali and Edmund S. Cibas, eds.; DOI 10.1007/978-0-387-87666-5_1; Springer Science+Business Media, LLC 2010).
  • the BSRTC recommends that each thyroid FNA report be accompanied by a general diagnostic category, in which each category has an implied cancer risk. Recommended nomenclature for the Bethesda categories are as follows:
  • Consistent with a benign follicular nodule includes adenomatoid nodule, colloid nodule, etc.
  • Indeterminate refers to thyroid lesions or tumor samples examined for cytology and classified according to the Bethesda classification in categories III, IV and V.
  • the present invention further provides a method for identifying subtypes of thyroid lesions in a subject, said subtypes of thyroid lesions being subtypes of malignant or benign thyroid tumor, wherein said subtype is any one of follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), non-encapsulated FVPC (or non-encapsulated FVPTC), medullary carcinoma, anaplastic thyroid cancer or poorly differentiated thyroid cancer.
  • said subtype is any one of Hashimoto thyroiditis, follicular adenoma or hyperplasia.
  • said subtype is Hurthle cell carcinoma.
  • the present invention provides a method for distinguishing between follicular adenoma and follicular carcinoma.
  • the present invention provides a method for distinguishing follicular adenoma from papillary carcinoma.
  • the present invention provides a method for distinguishing follicular adenoma from follicular variant of papillary carcinoma.
  • the present invention provides a method for distinguishing non-encapsulated follicular variant of papillary carcinoma from benign lesions.
  • the present invention provides a method for distinguishing papillary carcinoma and Hashimoto thyroiditis.
  • Vector refers to any known vector such as a plasmid vector, a phage vector, a phagemid vector, a cosmid vector, or a virus vector.
  • the nucleic acid described herein may be comprised in a vector.
  • the vector may be used for delivery of the nucleic acid.
  • the vector preferably contains at least a promoter that enhances expression of the nucleic acid carried, and in this case the nucleic acid is preferably operably linked to such a promoter.
  • the vector may or may not be replicable in a host cell, and the transcription of a gene may be carried out either outside the nucleus or within the nucleus of a host cell.
  • the nucleic acid may be incorporated into the genome of a host cell.
  • a vector may be a DNA or RNA vector.
  • a vector may be either a self-replicating extrachromosomal vector or a vector that integrates into a host genome.
  • the levels of microRNAs are measured by reverse transcription polymerase chain reaction (RT-PCR).
  • Target sequences of a cDNA are generated by reverse transcription of a target RNA, which may be a nucleic acid described herein (comprising a sequence provided in Tables 1 and 2).
  • Known methods for generating cDNA involve reverse transcribing either polyadenylated RNA or alternatively, RNA with a ligated adaptor sequence.
  • RNA may be ligated to an adaptor sequence prior to reverse transcription.
  • a ligation reaction may be performed by T4 RNA ligase to ligate an adaptor sequence at the 3' end of the RNA.
  • Reverse transcription (RT) reaction may then be performed using a primer comprising a sequence that is complementary to the 3' end of the adaptor sequence.
  • polyadenylated RNA may be used in a reverse transcription (RT) reaction using a poly(T) primer comprising a 5' adaptor sequence.
  • the poly(T) sequence may comprise 8, 9, 10, 11, 12, 13, or 14 consecutive thymines.
  • the reverse transcript of the RNA may then be amplified by real-time PCR, using a specific forward primer comprising at least 15 nucleic acids complementary to the target nucleic acid and a 5' tail sequence; a reverse primer that is complementary to the 3' end of the adaptor sequence; and a probe comprising at least 8 nucleic acids complementary to the target nucleic acid.
  • the probe may be partially complementary to the 5' end of the adaptor sequence.
  • the amplification of the reverse transcripts of the target nucleic acids may be by PCR or the like.
  • the first cycles of the PCR reaction may have an annealing temperature of 56°C, 57°C, 58°C, 59°C, or 60°C.
  • the first cycles may comprise 1-10 cycles.
  • the remaining cycles of the PCR reaction may be 60°C.
  • the remaining cycles may comprise 2-40 cycles.
  • the PCR reaction comprises a forward primer.
  • the forward primer may comprise 15, 16, 17, 18, 19, 20, or 21 nucleotides identical to the target nucleic acid.
  • the 3' end of the forward primer may be sensitive to differences in sequence between a target nucleic acid and highly similar sequences.
  • the forward primer may also comprise a 5' overhanging tail.
  • the 5' tail may increase the melting temperature of the forward primer.
  • the sequence of the 5' tail may comprise a sequence that is non-identical to the target nucleic acid.
  • the sequence of the 5' tail may also be synthetic.
  • the 5' tail may comprise 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides. Examples of forward primers used in the invention are provided in Table 8.
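The forward primer layout described above (a synthetic 5' tail of 8 to 16 nucleotides followed by 15 to 21 nucleotides identical to the target) can be sketched as below. This is a hedged illustration only: the function name, the choice to take the target-identical stretch from the 5' end of the target, and both example sequences are assumptions, and the primers actually used are those listed in Table 8.

```python
def forward_primer(target_rna, tail, match_len=18):
    """Assemble a forward primer as 5' tail + target-identical stretch (DNA).

    Assumption: the target-identical portion is taken from the 5' end of the
    target; the disclosure does not fix this choice.
    """
    if not 15 <= match_len <= 21:
        raise ValueError("target-identical portion must be 15-21 nucleotides")
    if not 8 <= len(tail) <= 16:
        raise ValueError("5' tail must be 8-16 nucleotides")
    # Primers are DNA, so write the RNA target in the DNA alphabet.
    target_dna = target_rna.upper().replace("U", "T")
    if len(target_dna) < match_len:
        raise ValueError("target is shorter than the requested match length")
    return tail.upper() + target_dna[:match_len]


# Placeholder sequences, not sequences from the disclosure.
primer = forward_primer("ACGUACGUACGUACGUACGUAC", tail="GGCCTTAAGG", match_len=18)
print(primer, len(primer))  # GGCCTTAAGGACGTACGTACGTACGTAC 28
```

A real design would additionally check the melting temperature of the assembled primer and the discriminating power of its 3' end against highly similar sequences.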
  • the PCR reaction comprises a reverse primer.
  • the reverse primer may be complementary to a target nucleic acid.
  • the reverse primer may also comprise a sequence complementary to an adaptor sequence. Examples of reverse primers used in the invention are provided in Example 8.
  • the probes used to detect products of RT-PCR amplification may be general probes or sequence-specific probes.
  • General probes are designed to detect (or hybridize with) RT-PCR amplification products in a non-sequence specific manner.
  • Said probes are between 16 and 20 nucleotides long, preferably 18 nucleotides long, and comprise a sequence which is the reverse complement of the RT primer, including 4 adenines (As) at the 5' end.
  • Sequence-specific probes are designed to detect (or hybridize with) RT-PCR amplification products based on total or partial complementarity between the sequence of the probe and the sequence of the RT-PCR product.
  • Said probes are between 20 and 28 nucleotides long, preferably 24 nucleotides long, and comprise at the 5' end three nucleotides, of which at least two are complementary to the RT primer, followed by between 10 and 14, preferably 12, thymines (Ts), followed by between 6 and 10, preferably 8, contiguous nucleotides which correspond to the reverse complementary sequence of the specific corresponding microRNA.
  • a biochip comprising novel nucleic acids described herein is provided.
  • the biochip may comprise probes that recognize the novel nucleic acids described herein.
  • Said nucleic acids are isolated nucleic acids comprising at least 12 contiguous nucleotides at least 80% identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308.
  • said isolated nucleic acid comprises at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 contiguous nucleotides identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308.
  • the biochip may comprise a solid substrate comprising an attached nucleic acid, probe or plurality of probes described herein.
  • the probes may be capable of hybridizing to a target sequence under stringent hybridization conditions.
  • the probes may be attached at spatially defined addresses on the substrate. More than one probe per target sequence may be used, with either overlapping probes or probes to different sections of a particular target sequence.
  • the probes may be capable of hybridizing to target sequences associated with a single disorder appreciated by those in the art.
  • the probes may either be synthesized first, with subsequent attachment to the biochip, or may be directly synthesized on the biochip.
  • the solid substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the probes and is amenable to at least one detection method.
  • substrates include glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics.
  • the substrates may allow optical detection without appreciably fluorescing.
  • the substrate may be planar, although other configurations of substrates may be used as well. For example, probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume.
  • the substrate may be flexible, such as flexible foam, including closed cell foams made of particular plastics.
  • the biochip and the probe may be derivatized with chemical functional groups for subsequent attachment of the two.
  • the biochip may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups.
  • the probes may be attached using functional groups on the probes either directly or indirectly using a linker.
  • the probes may be attached to the solid support by either the 5' terminus, 3' terminus, or via an internal nucleotide.
  • the probe may also be attached to the solid support non-covalently.
  • biotinylated oligonucleotides can be made, which may bind to surfaces covalently coated with streptavidin, resulting in attachment.
  • probes may be synthesized on the surface using techniques such as photopolymerization and photolithography.
  • measuring the microRNAs for classification of thyroid lesions may be effected by high throughput sequencing.
  • High throughput sequencing can involve sequencing-by-synthesis, sequencing-by-ligation, and ultra-deep sequencing. Sequence-by-synthesis can be initiated using sequencing primers complementary to the sequencing element on the nucleic acid tags.
  • the method involves detecting the identity of each nucleotide immediately after (substantially real-time) or upon (real-time) the incorporation of a labeled nucleotide or nucleotide analog into a growing strand of a complementary nucleic acid sequence in a polymerase reaction. After the successful incorporation of a labeled nucleotide, a signal is measured and then nulled by methods known in the art.
  • sequence-by-synthesis methods are known in the art, and are described for example in US 7,056,676, US 8,802,368 and US 7,169,560, the contents of which are incorporated herein by reference.
  • labels that can be used to label nucleotide or nucleotide analogs for sequencing-by-synthesis include, but are not limited to, chromophores, fluorescent moieties, enzymes, antigens, heavy metal, magnetic probes, dyes, phosphorescent groups, radioactive materials, chemiluminescent moieties, scattering or fluorescent nanoparticles, Raman signal generating moieties, and electrochemical detection moieties.
  • Sequencing-by-synthesis can generate at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 reads per hour.
  • Such reads can have at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases per read.
  • Sequencing-by-synthesis may be performed on a solid surface (or a chip) using fold-back PCR and anchored primers. Since microRNAs occur as small nucleic acid fragments, adaptors are added to the 5' and 3' ends of the fragments. Nucleic acid fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded nucleic acid molecules of the same template in each channel of the flow cell.
  • Primers, polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, an image is captured and the identity of the first base is recorded. The 3' terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated. This technology is used, for example, in the Illumina® sequencing platform.
  • Another sequencing method involves hybridizing the amplified regions to a primer complementary to the sequence element in an LST (a file listing the names of fasta files).
  • This hybridization complex is incubated with a polymerase, ATP sulfurylase, luciferase, apyrase, and the substrates luciferin and adenosine 5' phosphosulfate.
  • deoxynucleotide triphosphates corresponding to the bases A, C, G, and T (U) are added sequentially.
  • Each base incorporation is accompanied by release of pyrophosphate, converted to ATP by sulfurylase, which drives synthesis of oxyluciferin and the release of visible light.
  • Since pyrophosphate release is equimolar with the number of incorporated bases, the light given off is proportional to the number of nucleotides added in any one step. The process is repeated until the entire sequence is determined.
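The proportionality between light output and the number of nucleotides incorporated per step can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: an idealized, noise-free signal, a fixed cyclic flow order of A, C, G, T, and an input that is the sequence of the strand being synthesized. The function name and the example sequence are hypothetical.

```python
def pyrogram(synthesized, flow_order="ACGT", max_flows=100):
    """Idealized light signal per nucleotide flow.

    synthesized -- sequence of the strand being built by the polymerase
    Returns a list of (base, units) pairs, where `units` is the number of
    bases incorporated in that flow and hence the relative light signal.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesized) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer stretch is incorporated within a single flow.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


flows = pyrogram("AACGGGT")  # placeholder sequence
print(flows)  # [('A', 2), ('C', 1), ('G', 3), ('T', 1)]
# Reading the sequence back from the signal heights:
print("".join(base * units for base, units in flows))  # AACGGGT
```

Flows that incorporate no base yield a zero signal, which is why the flow order has to be known in order to read the sequence back from the signal.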
  • Yet another sequencing method involves a four-color sequencing by ligation scheme (degenerate ligation), which involves hybridizing an anchor primer to one of four positions. Then an enzymatic ligation reaction of the anchor primer to a population of degenerate nonamers that are labeled with fluorescent dyes is performed. At any given cycle, the population of nonamers that is used is structured such that the identity of one of its positions is correlated with the identity of the fluorophore attached to that nonamer.
  • the fluorescent signal allows the inference of the identity of the base.
  • the anchor primer:nonamer complexes are stripped and a new cycle begins.
  • Methods to image sequence information after performing ligation are known in the art. In some cases, high throughput sequencing involves the use of ultra-deep sequencing, such as described in Margulies et al., Nature 437 (7057): 376-80 (2005).
  • MicroRNA sequencing is a type of RNA Sequencing (RNA-Seq) which uses next-generation sequencing or massively parallel high-throughput DNA sequencing to sequence microRNAs. miRNA-seq differs from other forms of RNA-seq in that input material is often enriched for small RNAs. miRNA-seq provides tissue specific expression patterns, which may lead to disease associations and microRNA isoforms. miRNA-seq is also used for the discovery of previously uncharacterized microRNAs, such as the nucleic acid sequences denoted by SEQ ID NOs 139-140 and 307-308.
  • diagnosis refers to classifying pathology, or a symptom, determining a severity of the pathology (grade or stage), monitoring pathology progression, forecasting an outcome of pathology and/or prospects of recovery.
  • the phrase "subject in need thereof" refers to a human subject who is known to have cancer, is at risk of having cancer (e.g., a genetically predisposed subject, a subject with medical and/or family history of cancer, a subject who has been exposed to carcinogens, occupational hazard, environmental hazard) and/or a subject who exhibits suspicious clinical signs of cancer (e.g., nodules in the thyroid). Additionally or alternatively, the subject in need thereof can be a healthy human subject undergoing a routine well-being check-up.
  • Analyzing presence of malignant or pre-malignant cells can be effected in vivo or ex vivo, whereby a biological sample (e.g., biopsy) is retrieved.
  • Such biopsy samples comprise cells and may be an incisional or excisional biopsy.
  • the sample may be retrieved from the thyroid of the subject, and may be retrieved using FNA. Alternatively the cells may be retrieved from a complete resection.
  • treatment regimen refers to a treatment plan that specifies the type of treatment, dosage, schedule and/or duration of a treatment provided to a subject in need thereof (e.g., a subject diagnosed with a pathology).
  • the selected treatment regimen can be an aggressive one which is expected to result in the best clinical outcome (e.g., complete cure of the pathology) or a more moderate one which may relieve symptoms of the pathology yet results in incomplete cure of the pathology. It will be appreciated that in certain cases the treatment regimen may be associated with some discomfort to the subject or adverse side effects (e.g., damage to healthy cells or tissue).
  • the type of treatment can include a surgical intervention (e.g., removal of lesion, diseased cells, tissue, or organ), a cell replacement therapy, an administration of a therapeutic drug (e.g., receptor agonists, antagonists, hormones, chemotherapy agents) in a local or a systemic mode, an exposure to radiation therapy using an external source (e.g., external beam) and/or an internal source (e.g., brachytherapy) and/or any combination thereof.
  • the dosage, schedule and duration of treatment can vary, depending on the severity of pathology and the selected type of treatment.
  • a method of diagnosis comprises detecting an expression level of a specific cancer-associated nucleic acid in a biological sample. Diagnosis of a specific cancer state in a patient may allow for prognosis and selection of therapeutic strategy. Further, the developmental stage of cells may be classified by determining temporarily expressed specific cancer-associated nucleic acids.
  • In situ hybridization of labeled probes to tissue sections or FNA smears may be performed.
  • the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It is further understood that the nucleic acid sequences which indicate the diagnosis may differ from those which indicate the prognosis, and molecular profiling of the condition of the cells may lead to distinctions between responsive or refractory conditions or may be predictive of outcomes.
  • kits may comprise a nucleic acid described herein together with any or all of the following: assay reagents, buffers, probes and/or primers, and sterile saline or another pharmaceutically acceptable emulsion and suspension base.
  • the kits may include instructional materials containing directions (e.g., protocols) for the practice of the methods described herein.
  • the kit may further comprise a software package for data analysis of expression profiles.
  • the kit may be a kit for the amplification, detection, identification or quantification of a target nucleic acid sequence.
  • the kit may comprise a poly (T) primer, a forward primer, a reverse primer, and a probe.
  • compositions described herein may be comprised in a kit.
  • reagents for isolating microRNA, labeling microRNA, and/or evaluating a microRNA population using an array are included in a kit.
  • the kit may further include reagents for creating or synthesizing microRNA probes.
  • the kits will thus comprise, in suitable container means, an enzyme for labeling the microRNA by incorporating labeled nucleotide or unlabeled nucleotides that are subsequently labeled. It may also include one or more buffers, such as reaction buffer, labeling buffer, washing buffer, or a hybridization buffer, compounds for preparing the microRNA probes, components for in situ hybridization and components for isolating microRNA.
  • kits of the invention may include components for making a nucleic acid array comprising microRNA, and thus, may include, for example, a solid support.
  • RNA expression assays e.g., microarray analysis, RT-PCR, high throughput sequencing (next generation sequencing), cloning, and quantitative real time polymerase chain reaction (qRT-PCR).
  • Proteins were degraded by proteinase K solution (5-12 µl Proteinase K (e.g., Sigma or ABI) in 500 µl of Buffer B (10 mM NaCl, 500 mM Tris pH 7.5, 20 mM EDTA pH 8, 1% SDS)), at 45°C for a few hours (about 16 hours). Proteinase K was inactivated by incubation at 95°C for 7 minutes. After the tubes were chilled, 10 µl of RNA synthetic spikes was added (e.g., 2 spikes of 0.15 fmol/µl). RNA was extracted using acid phenol/chloroform equal volume, vortexing, followed by centrifugation at 4°C for 15 minutes at 12000g.
  • RNA was then precipitated using 8 µl linear acrylamide, 0.1 volumes of 3M NaOAc pH 5.2, and 3 volumes of absolute (100%) ethanol, for 30 minutes to 16 hours, followed by centrifugation at 4°C for at least 40 minutes at 20000g (14,000 rpm). The pellet was washed by adding 1 ml 85% cold ethanol. DNAses were introduced at 37°C for 60 minutes to digest DNA (e.g. 10 µl Turbo DNase), followed by extraction using acid phenol/chloroform and ethanol precipitation as described above.
  • RNA was then precipitated using 8 µl linear acrylamide, 0.1 volumes of 3M NaOAc pH 5.2, and 3 volumes of absolute ethanol from 30 minutes to 16 hours.
  • the tubes were then spun down at 4°C for at least 40 minutes at 20000g (14,000 rpm).
  • the pellet was washed with about 1 ml 85% cold ethanol.
  • DNAses were introduced at 37°C for 60 minutes to digest DNA (e.g. 10 µl Turbo™ DNase, Ambion, Life Technologies), followed by extraction using acid phenol/chloroform and ethanol precipitation as described above.
  • RNA quantification was performed by fluorospectrometry in a NanoDrop 3300 (ND3300) fluorospectrometer using the RiboGreen® dye (Thermo Fisher Scientific®, Wilmington, DE).
  • the ND3300 RNA detection range is 25 ng/ml - 1000 ng/ml when using a high concentration of RiboGreen® dye (1:200 dilution), and 5 ng/ml - 50 ng/ml when using a 1:2000 dilution of RiboGreen® dye.
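The two detection ranges quoted above can be captured in a small lookup that reports which RiboGreen® dilution, if any, covers an expected RNA concentration. This is only an illustrative sketch; the function name and data layout are hypothetical, and the ranges are the ones stated above.

```python
# Detection ranges (ng/ml) per RiboGreen dye dilution, as quoted above.
DETECTION_RANGES_NG_PER_ML = {
    "1:200": (25.0, 1000.0),
    "1:2000": (5.0, 50.0),
}


def usable_dilutions(conc_ng_per_ml):
    """Return the dye dilutions whose detection range covers the concentration."""
    return [
        dilution
        for dilution, (low, high) in DETECTION_RANGES_NG_PER_ML.items()
        if low <= conc_ng_per_ml <= high
    ]


print(usable_dilutions(10))    # ['1:2000']
print(usable_dilutions(30))    # ['1:200', '1:2000']
print(usable_dilutions(2000))  # []
```

An empty result indicates that the sample would need to be diluted or concentrated before measurement.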
  • the RNA amounts which were determined by ND3300 were highly correlated to the detected expressed microRNA.
  • Custom microarrays were generated by printing DNA oligonucleotide probes to: 2172 miR sequences, 17 negative controls, 23 spikes, and 10 positive controls (total of 2222 probes).
  • Negative spikes and positive probes were printed from 3 to 200 times. Seventeen (17) negative control probes were designed using sequences that do not match the genome.
  • Two groups of positive control probes were designed to hybridize to the microRNA array: (i) synthetic small RNAs were spiked to the RNA before labeling to verify the labeling efficiency; and (ii) probes for abundant small RNA, e.g. , small nuclear RNAs (U43, U24, Z30, U6, U48, U44), 5.8s and 5s 5 ribosomal RNA were spotted on the array to verify RNA quality.
  • RNA (20-1000 ng) was labeled by ligation (Thomson et al. Nature Methods 2004; 1:47-53) with an RNA linker, p-rCrU-Cy/dye or several sequential Cys (BioSpring GmbH, IBA GmbH or equivalent), to the 3' end with Cy3 or Cy5.
  • the labeling reaction contained total RNA, spikes (0.1-100 fmoles), 250-400 ng RNA-linker-dye, 15% DMSO, 1x ligase buffer and 20 units of T4 RNA ligase (NEB or equivalent), and proceeded at 4°C for 1 hour, followed by 1 hour at 37°C, followed by 4°C up to 40 minutes.
  • the labeled RNA was mixed with 30 µl hybridization mixture (mixture of 45 µl of the 10X GE Agilent Blocking Agent and 246 µl of 2X Hi-RPM Hybridization).
  • the labeling mixture was incubated at 100°C for 5 minutes followed by incubation on ice in a water bath for 5 minutes. Slides were hybridized at 54-55°C for 16-20 hours, followed by two washes.
  • the first wash was conducted at room temperature with Agilent GE Wash Buffer 1 (e.g. 6X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102) for 5 minutes, followed by a second wash with Agilent GE Wash Buffer 2 at 37°C for 5 minutes (e.g. 0.06X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102).
  • Arrays were scanned using a microarray scanner (Agilent Microarray Scanner Bundle G2565BA, resolution of 5 µm at XDR Hi 100%, XDR Lo 10%). Array images were analyzed using appropriate software (Feature Extraction 10.7 software, Agilent).
  • the amplification reaction included a microRNA-specific forward primer, being a TaqMan® (MGB) probe complementary to the 3' of the specific microRNA sequence and/or to part of the polyA adaptor sequence, and a universal reverse primer complementary to the consensus 3' sequence of the oligodT tail.
  • MGB TaqMan®
  • the cycle threshold (CT, the PCR cycle at which probe signal reaches the threshold) was determined for each microRNA.
  • each value obtained by RT-PCR was subtracted from 50 (50-CT).
  • the 50-CT expression for each microRNA for each patient was compared with the signal obtained by the microarray method.
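The 50-CT transformation described above can be sketched as follows. This is a minimal illustration only: the CT values are hypothetical and the function name `ct_to_expression` is an assumption, not part of the disclosed method.

```python
def ct_to_expression(ct, offset=50.0):
    """Convert a cycle threshold (CT) into a 50-CT value, so that a lower CT
    (earlier detection) maps to a higher expression value."""
    return offset - ct

# Hypothetical CT values for one sample (illustration only).
cts = {"hsa-miR-146b-5p": 24.5, "hsa-miR-375": 31.0}
expression = {mir: ct_to_expression(ct) for mir, ct in cts.items()}
```

Because CT is a cycle count, the resulting values are on a logarithmic scale, which is what allows them to be compared with log-transformed microarray signals.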
  • the initial data set consisted of signals measured for multiple probes for every sample. For the analysis, signals were used only for probes that were designed to measure the expression levels of known or validated human microRNAs.
  • Triplicate spots were combined into one signal by taking the logarithmic mean of the reliable spots. All data was log-transformed and the analysis was performed in log-space. A reference data vector for normalization, R, was calculated by taking the mean expression level for each probe in two representative samples, one from each tumor type.
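The spot-combination and reference-vector steps can be sketched as below. The sketch assumes a base-2 logarithm and a per-spot reliability flag; neither the base nor the reliability criterion is stated in this passage, and the function names are hypothetical.

```python
import math

def combine_spots(signals, reliable):
    """Mean of the log2 signals over spots flagged as reliable.
    Returns None when no spot is reliable. (log2 is an assumption.)"""
    logs = [math.log2(s) for s, ok in zip(signals, reliable) if ok and s > 0]
    return sum(logs) / len(logs) if logs else None

def reference_vector(sample_a, sample_b):
    """Per-probe mean of two log-space profiles: the reference vector R."""
    return {p: (sample_a[p] + sample_b[p]) / 2 for p in sample_a if p in sample_b}

# Hypothetical triplicate for one probe; the third spot is flagged unreliable.
combined = combine_spots([256.0, 1024.0, 64.0], [True, True, False])

# Hypothetical log-space profiles of two representative samples.
R = reference_vector({"probe1": 9.0, "probe2": 7.0}, {"probe1": 11.0, "probe2": 5.0})
```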
  • Sequence library construction may be performed using a variety of different kits depending on the high-throughput sequencing platform being employed. However, there are several common steps for small RNA sequencing preparation.
  • the ligation step adds DNA adaptors to both ends of the small RNAs, which act as primer binding sites during reverse transcription and PCR amplification.
  • An adenylated single strand DNA 3' adaptor followed by a 5' adaptor is ligated to the small RNAs using a ligating enzyme such as T4 RNA ligase, or the 5' adaptor is added using a 5' RACE reaction.
  • the adaptors are also designed to capture small RNAs with a 5' phosphate group, characteristic of microRNAs, rather than RNA degradation products with a 5' hydroxyl group.
  • Reverse transcription and PCR amplification steps convert the small adaptor ligated RNAs into cDNA clones used in the sequencing reaction. PCR is then carried out to amplify the pool of cDNA sequences. Primers designed with unique nucleotide tags may also be used in this step to create ID tags in pooled library multiplex sequencing.
  • 500 ng of RNA from each FFPE sample was used for small RNA deep sequencing (miRSeq). Libraries were loaded on two lanes of the sequence analyzer (Illumina® HiSeq™ 2000 DNA). An average of about 6.3 million reads per library were obtained. To find novel microRNAs, sequence analysis software (miRDeep2, Friedlander MR et al. Nucleic Acids Res. 2012 Jan;40(1):37-52) was applied on the raw sequencing data (primer-adapter sequences were trimmed).
  • P-values were calculated using a two-sided (unpaired) Student's t-test on the log-transformed normalized fluorescence signal.
  • the threshold for significant differences was determined by setting a false discovery rate (FDR) of 0.05 to 0.1, to correct for effects of multiple hypothesis testing, resulting in p-value cutoffs in the range of 0.01-0.06.
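How an FDR setting translates into a p-value cutoff can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not name the FDR procedure that was used, so Benjamini-Hochberg is an assumption here, and the p-values below are hypothetical.

```python
def benjamini_hochberg_cutoff(p_values, fdr=0.1):
    """Return the largest sorted p-value p_(k) satisfying p_(k) <= (k / m) * fdr,
    or None if no p-value passes. All p-values at or below the returned cutoff
    are called significant at the given false discovery rate."""
    m = len(p_values)
    cutoff = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * fdr:
            cutoff = p
    return cutoff

# Hypothetical p-values from per-microRNA t-tests (illustration only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.216]
cutoff_10 = benjamini_hochberg_cutoff(p_values, fdr=0.1)
cutoff_05 = benjamini_hochberg_cutoff(p_values, fdr=0.05)
```

A stricter FDR gives a lower p-value cutoff, which is why a range of FDR settings corresponds to a range of cutoffs.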
  • FDR false discovery rate
  • AUC area under curve
  • Three sets of miRs were excluded from the statistical analysis: (a) miRs that were previously found as highly expressed in blood samples (due to high percentages of blood in FNA samples), (b) miRs whose level of expression did not correlate with decreasing amounts of RNA, i.e. these miRs did not show a linear decrease in signal in association with decreasing measured RNA amounts, and (c) miRs whose level of expression correlated with miRs in set (b).
  • Example 1 Detection of microRNA in pre-operative samples
  • microRNA profiling was conducted in a few Papanicolaou, Giemsa and Diff-Quick stained smears from ex-vivo FNA biopsy samples in order to ensure feasibility of the methodology. Since FNA smears often have very few cells, providing a minuscule amount of RNA for analysis, e.g. 1-1000 ng, it was first necessary to evaluate whether microRNA would be detectable under such low RNA amounts. Thus, the expression levels of about 2200 individual microRNAs were measured in Giemsa-stained papillary carcinoma and non-papillary carcinoma smears.
  • microRNAs (hsa-miR-146b-5p, hsa-miR-31-5p, hsa-miR-222-3p, hsa-miR-221-3p, and hsa-miR-21-5p), previously shown to correlate with papillary carcinoma, were found over-represented in the papillary-carcinoma smears.
  • Figure 1 shows a comparison of microRNA expression between Giemsa-stained papillary carcinoma and non-papillary carcinoma samples, and reveals the highly up-regulated microRNA markers in the papillary carcinoma.
  • the cohort of samples used in the experimental analysis consisted of 73 pre-operative thyroid FNA cell blocks selected from archived materials of the Department of Pathology, Temple University Hospital (Philadelphia, USA).
  • the 73 specimens included samples of 35 benign and 38 malignant thyroid lesions.
  • the 35 benign tumors consisted of 18 follicular adenoma, eight (8) Hashimoto thyroiditis, and nine (9) hyperplasia (Goiter) samples.
  • the 38 malignant tumors consisted of: 10 follicular carcinoma and 28 papillary carcinoma. Of the 28 papillary carcinoma samples, nine (9) were papillary carcinoma, 13 were papillary carcinoma follicular variant encapsulated, and six (6) were papillary carcinoma follicular variant non-encapsulated.
  • the histological diagnosis assessed ultimately the malignancy or benignity of the thyroid lesions.
  • the cytological classification was based on "The Bethesda System for Reporting Thyroid Cytopathology" (Syed, Z. Ali and Edmund S. Cibas, eds.; DOI 10.1007/978-0-387-87666-5_1; Springer Science+Business Media, LLC 2010).
  • the study protocol was approved by the Institutional Review Board (IRB, equivalent to Ethical Review Board) of the contributing institution.
  • Tumor classification was based on the World Health Organization (WHO) guidelines.
  • An additional cohort consisted of 13 thyroid ex-vivo FNA smears, prepared after thyroidectomy, and obtained from the University Milano-Bicocca (Milan, Italy).
  • Total RNA (at least 10 ng) was extracted from these samples, and microRNA expression was profiled using custom microarrays containing about 2200 miRs. The results exhibited a significant difference in the expression pattern between benign and malignant lesions of several miRs listed in Table 3 (upregulated or downregulated in malignant versus benign).
  • Table 3 miRNAs up or downregulated in malignant versus benign thyroid tumor
  • the fold-change represents the ratio between the median values of each group.
  • AUC Area under the curve when using the miRNAs to classify the two groups.
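The two table statistics defined above (fold-change as a ratio of group medians, and AUC for a single miRNA used as a classifier) can be computed as in the following sketch. The expression values are hypothetical, and computing the AUC as the fraction of correctly ordered malignant/benign pairs is a standard formulation chosen for illustration, not necessarily the software the inventors used.

```python
from statistics import median

def fold_change(group_a, group_b):
    """Ratio between the median values of the two groups."""
    return median(group_a) / median(group_b)

def auc(positives, negatives):
    """Area under the ROC curve: the probability that a randomly chosen positive
    sample has a higher value than a randomly chosen negative one (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical expression values of one miRNA (illustration only).
malignant = [8.0, 6.0, 10.0]
benign = [2.0, 4.0, 7.0]
fc = fold_change(malignant, benign)
area = auc(malignant, benign)
```

An AUC near 1 indicates a miRNA that is consistently higher in the first group, and an AUC near 0 one that is consistently lower; a value near 0.5 indicates no separation.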
  • a classification algorithm for differentiating between malignant and benign thyroid tumor was developed based on miRNA expression in 35 benign and 38 malignant FNA samples.
  • a logistic regression classifier was trained to distinguish between malignant and benign thyroid lesions, based on eight miRs (hsa-miR-125b-5p, hsa-miR-21-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-146b-5p, hsa-miR-181a-5p, hsa-miR-138-5p, and MID-23794) that were found to be differentially expressed in these conditions, either between benign and malignant lesions or between specific thyroid tumor subtypes (data not shown).
  • the classifier reached 89% accuracy with sensitivity of 87% and specificity of 91% for identifying malignant samples.
  • hsa-miR-125b-5p, hsa-miR-21-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-146b-5p and hsa-miR-181a-5p exhibited higher expression in malignant lesions, while hsa-miR-138-5p and MID-23794 exhibited higher expression in benign lesions (data not shown).
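The relationship between the reported accuracy, sensitivity and specificity can be checked with a short calculation. The confusion counts below (33 of 38 malignant and 32 of 35 benign samples classified correctly) are not stated in the text; they are an assumption chosen because they reproduce the reported percentages for this cohort size.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy, with malignant as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed counts for 38 malignant and 35 benign samples (not given in the source).
sensitivity, specificity, accuracy = classification_metrics(tp=33, fn=5, tn=32, fp=3)
```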
  • Example 3 Distinguishing different sub-types of malignant and benign thyroid lesions
  • miRNAs up- or downregulated in follicular adenoma versus follicular carcinoma
  • the fold-change represents the ratio between the median values of each group.
  • AUC Area under the curve when using the miRNAs to classify the two groups.
  • microRNAs that were upregulated or downregulated in follicular variant of papillary carcinoma relative to follicular adenoma are presented in Table 5.
  • the fold-change represents the ratio between the median values of each group.
  • AUC Area under the curve when using the miRNAs to classify the two groups.
  • Expression levels of miRs were compared in 8 Hashimoto thyroiditis samples and 9 (non-follicular) papillary carcinoma samples.
  • microRNAs that were upregulated or downregulated in papillary carcinoma relative to Hashimoto thyroiditis are presented in Table 6.
  • the miRs that are the best candidates for the profile signature for comparing these two thyroid lesions are hsa-miR-146b-5p, hsa-miR-200a-3p and MID-23794.
  • p-values were calculated using a two-sided (unpaired) Student's t-test.
  • the fold-change represents the ratio between the median values of each group.
  • AUC Area under the curve when using the miRNAs to classify the two groups. Median: median of expression values (rounded).
  • Formalin-Fixed Paraffin-Embedded (FFPE) thyroid resection samples (obtained from surgical biopsies, fixed in formalin and preserved in paraffin) from follicular lesions were obtained from the Department of Pathology at Rabin Medical Center. The specimens included 6 follicular adenomas and 5 follicular carcinomas. Tumor cellular content was higher than 50% in all the samples. A total of 386 novel candidate microRNAs were found with sequence analysis software, and 27 of those were selected for validation, performed by qPCR. Two novel microRNAs are disclosed herein, MD2-495 and MD2-437; their sequences are presented in Table 1, and their respective hairpins are shown in Table 2.
  • Figure 2A shows the secondary structures of the two novel microRNAs, predicted by sequence analysis software.
  • Figure 2B shows the expression of the two novel microRNAs (normalized number of reads) in each of the 11 samples.
  • the color-coded bar on the right represents a scale for expression.
  • Example 5 Specific microRNAs are differentially expressed between benign and malignant thyroid lesions
  • RNA was extracted from samples using in-house developed protocols as described above.
  • FFPE and cytological (FNA) samples were profiled by custom printed microarrays measuring over 2000 microRNAs to identify differentially expressed microRNAs and to develop a classifier.
  • Differential expression of microRNAs was found between benign and malignant neoplasms. Classification of malignant vs. benign smears based on two microRNAs, hsa-miR-146b-5p and hsa-miR-375, results in over 85% accuracy (based on the median of ten 10-fold cross-validation runs, data not shown).
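The evaluation scheme mentioned above (the median accuracy over ten 10-fold cross-validation runs) can be sketched as follows. The classifier used for the two-microRNA comparison is not named at this point in the text, so a nearest-centroid rule is used purely as a stand-in, and the data are synthetic two-feature samples generated for illustration.

```python
import random
from statistics import mean, median

def nearest_centroid_predict(train, point):
    """Stand-in classifier: predict the label of the closest class centroid."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [feats for feats, lab in train if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

def cv_accuracy(data, k, rng):
    """Accuracy of one k-fold cross-validation run over a fresh random split."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(nearest_centroid_predict(train, f) == lab for f, lab in held_out)
    return correct / len(data)

def median_repeated_cv(data, k=10, runs=10, seed=0):
    """Median accuracy over several independent k-fold runs."""
    rng = random.Random(seed)
    return median(cv_accuracy(data, k, rng) for _ in range(runs))

# Synthetic two-feature samples (e.g. two microRNA expression values each).
gen = random.Random(1)
data = [([gen.gauss(12, 1), gen.gauss(10, 1)], "malignant") for _ in range(30)]
data += [([gen.gauss(8, 1), gen.gauss(7, 1)], "benign") for _ in range(30)]
accuracy = median_repeated_cv(data)
```

Taking the median over repeated runs reduces the dependence of the reported accuracy on any single random partition of the samples.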
  • Example 6 hsa-miR-375 is a significant marker for medullary thyroid carcinoma in FNA samples
  • Example 7 Stained thyroid smears can be used for microRNA profiling
  • MicroRNA expression level in samples stained with different dyes was compared in order to evaluate microRNA stability and reproducibility of the microRNA level detection upon staining.
  • a total of 143 smears from FNA cohort I were stained as follows: 60 with May-Grünwald Giemsa, 64 with DiffQuik and 19 with Papanicolaou.
  • MicroRNA expression levels in duplicates of the same sample stained with different dyes showed significant correlation (more than expected).
  • FIG. 5A-5B shows that the normalized expression level of hsa-miR-146b-5p (SEQ ID NO.10-11) is similar when the same sample is stained with different dyes, as can be seen for the 52 May-Grünwald Giemsa-DiffQuik pairs (Fig. 5A) and for the 15 DiffQuik-Papanicolaou pairs (Fig. 5B). Therefore, different cytological dyes used in the clinical setting (Papanicolaou; May-Grünwald Giemsa; and DiffQuik) do not affect the detection and quantification of microRNA expression.
  • A total of twenty-four (24) microRNAs overall were chosen for establishing the status of thyroid samples as malignant versus benign (Table 12). MicroRNA expression was measured by RT-PCR as described above. The list of miRs and their respective forward primers are provided in Table 8. First-strand generation was done using the polyT adaptor presented below. Forward primers were sequence-specific while the reverse primer was universal.
  • Detection of the RT-PCR products was done with the universal MGB probe for miRs hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-5701 (SEQ ID NO.35), hsa-miR-424-3p (SEQ ID NO.16), MID-50971 (SEQ ID NO.34), MID-20094 (SEQ ID NO.27-28), MID-50976 (SEQ ID NO.33), hsa-miR-3074-5p (SEQ ID NO.32), hsa-miR-222-3p (SEQ ID NO.1-2), MID-50969 (SEQ ID NO.29), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), MID-16582 (SEQ ID NO.25), or with probes specific for the miRs as provided in Table 9.
  • GCGAGCACAGAATTAATACGACTCACTATCGGTTTTTTTTTTTTVN (SEQ ID NO. 310), where "V" may be any one of A, G or C; and "N" may be any one of G, C, A or U/T;
  • Marker microRNAs were selected based on their patterns of expression in several preliminary studies performed by the inventors (data not shown), and provided the reasoning for classifying the same as “malignant", “cell type” or alternatively, to be used as normalizers.
  • Cell type markers hsa-miR-486-5p, hsa-miR-342-3p, hsa-miR-138-5p, hsa-miR-200c-3p, and MID-16582 were chosen by the inventors according to their pattern of expression as exemplified below.
  • hsa-miR-486-5p (SEQ ID NO.22) was found enriched in whole blood relative to thyroid epithelial cells. Along with other microRNAs (data not shown), it was found to be associated with the amount of blood in thyroid FNA samples. Thus, hsa-miR-486-5p (SEQ ID NO.22) is one example of whole blood marker.
  • microRNAs were detected in high correlation (>0.85) with miR-486-5p, and may also be considered blood markers, including hsa-miR-320a, hsa-miR-106a-5p, hsa-miR-93-5p, hsa-miR-17-3p, hsa-let-7d-5p, hsa-miR-107, hsa-miR-103a-3p, hsa-miR-17-5p, hsa-miR-191-5p, hsa-miR-25-3p, hsa-miR-106b-5p, hsa-miR-20a-5p, hsa-miR-18a-5p, hsa-miR-144-3p, hsa-miR-140-3p, hsa-miR-15b-5p, hsa-miR-16-5p, hsa-miR-
  • hsa-miR-342-3p (SEQ ID NO.17-18) was one of the microRNAs, amongst others, which was enriched in white blood cells, and may therefore be considered an example of white blood cell marker.
  • hsa-miR-342-3p was shown to be expressed in correlation with hsa-miR-150-5p, suggesting that hsa-miR-150-5p is also a white blood cell marker.
  • hsa-miR-146a-5p was also shown to be expressed in white blood cells (data not shown).
  • hsa-miR-200c-3p SEQ ID NO.23-24
  • hsa-miR-138-5p SEQ ID NO.19-21
  • MID-16582 (SEQ ID NO.25) was found at higher expression levels in Hurthle cells. In preliminary studies, the inventors have surprisingly found that this microRNA is upregulated in follicular adenoma presenting Hurthle cells versus follicular adenomas not indicated to have Hurthle cells (Figures 6A-6B). This result may be attributed to the mitochondrial enrichment found in Hurthle cells. The present inventors have found that the sequence of MID-16582 (SEQ ID NO.25), as well as other nucleic acid sequences found in Hurthle cells, can be mapped to mitochondrial DNA (data not shown). Thus, MID-16582 is an example of a Hurthle cell marker.
  • results of the training in a sub-set of samples are shown in Figure 7.
  • Expression of microRNAs hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), and hsa-miR-375 (SEQ ID NO.8) above the threshold is found in correlation with malignant samples.
  • MicroRNA expression levels were normalized with the so-called normalizer microRNAs [hsa-miR-23a-3p, MID-20094, MID-50969, hsa-miR-345-5p, hsa-miR-3074-5p, MID-50976, MID-50971, hsa-miR-5701 and hsa-miR-574-3p] and were subtracted from 50, in order for lower CTs to be associated with higher expression values.
  • MicroRNA Ratios. Ratios were obtained from pairs of microRNAs in an attempt to subtract certain factors from the classifier. Thus, e.g.,
  • a ratio of hsa-miR-31-5p:hsa-miR-342-3p makes it possible to reduce the contribution of white blood cells (through the expression of hsa-miR-342-3p, the denominator) in the expression of hsa-miR-31-5p (the numerator). Since CTs are in log-scale, ratios were created by subtracting one miR expression from the other. Each ratio was further normalized by adding a constant, in order for the ratios to be within the same range as the microRNA normalized values.
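Because the expression values are on a log scale, a ratio of two microRNAs reduces to a difference plus an offset, as in the sketch below. The offset of 10.0 and the expression values are hypothetical; the text states only that a constant was added to bring the ratios into the range of the normalized values.

```python
def log_ratio(expression, numerator, denominator, offset=10.0):
    """Log-scale ratio numerator:denominator, computed as a difference of
    normalized 50-CT values plus a constant offset (offset value assumed)."""
    return expression[numerator] - expression[denominator] + offset

# Hypothetical normalized 50-CT values for one sample.
sample = {"hsa-miR-31-5p": 18.0, "hsa-miR-342-3p": 21.5}
ratio = log_ratio(sample, "hsa-miR-31-5p", "hsa-miR-342-3p")
```

Under this formulation, two samples with the same tumor-cell signal but different white blood cell content differ less in the ratio than in the raw hsa-miR-31-5p value, to the extent that both markers rise with the contaminating cell population.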
  • Figures 8A-8C, Fig.9A-9C and Fig. 10A-10C provide the results of this algorithm on malignant+benign samples.
  • Figures 23A-23C, Fig.24A-24C and Fig. 25A-25C provide the results of this algorithm on indeterminate samples.
  • Figures 37A-37C, Fig.38A-38C and Fig. 39A-39C provide the results of this algorithm on Bethesda IV samples.
  • Figures 11A-11C, Fig. 12A-12B and Fig. 13A-13C provide the results of this algorithm on malignant+benign samples.
  • Figures 26A-26C, Fig.27A-27B and Fig. 28A-28C provide the results of this algorithm on indeterminate samples.
  • Figures 40A-40C, Fig.41A-41C and Fig. 42A-42C provide the results of this algorithm on Bethesda IV samples.
  • a third analysis was performed applying SVM (Support vector machine) as the algorithm, in which linear kernel was used.
  • SVM Support vector machine
  • the analysis with the SVM algorithm was applied to three sets of samples as mentioned above (malignant+benign, indeterminate and Bethesda IV), using as features either different combinations of microRNA expression levels (Fig.14A-14C, Fig.29A-29C and Fig.43A-43C), microRNA ratios (Fig.15A-15C, Fig.30A-30C and Fig.44A-44C), or a combination of microRNA expression levels and microRNA ratios (Fig.16A-16C, Fig.31A-31C and Fig.45A-45C), respectively.
  • Figures 14A-14C, Fig.15A-15C and Fig. 16A-16C provide the results of this algorithm on malignant+benign samples.
  • Figures 29A-29C, Fig.30A-30C and Fig. 31A-31C provide the results of this algorithm on indeterminate samples.
  • Figures 43A-43C, Fig.44A-44C and Fig. 45A-45C provide the results of this algorithm on Bethesda IV samples.
  • a fourth analysis was performed applying Ensemble methods as the algorithm.
  • An ensemble of up to 100 discriminant analysis classifiers was created using AdaBoost and applied to the data.
  • the analysis with the Ensemble algorithm was applied to three sets of samples as mentioned above (malignant+benign, indeterminate and Bethesda IV), using as features either different combinations of microRNA expression levels (Fig.17A-17C, Fig.32A-32C and Fig.46A-46C), microRNA ratios (Fig.18A-18C, Fig.33A-33C and Fig.47A-47C), or a combination of microRNA expression levels and microRNA ratios (Fig.19A-19C, Fig.34A-34C and Fig.48A-48C).
  • Figures 17A-17C, Fig.18A-18C and Fig. 19A-19C provide the results of this algorithm on malignant+benign samples.
  • Figures 32A-32C, Fig.33A-33C and Fig. 34A-34C provide the results of this algorithm on indeterminate samples.
  • Figures 46A-46C, Fig.47A-47C and Fig. 48A-48C provide the results of this algorithm on Bethesda IV samples.
  • Example 10 A classifier for malignant samples including medullary
  • hsa-miR-486-5p SEQ ID NO.22
  • hsa-miR-200c-3p SEQ ID NO.23-24
  • Figure 53 shows the result of this experiment.
  • the blood microRNA marker, hsa-miR-486-5p, is very high and the epithelial marker, hsa-miR-200c-3p, is very low, compared to the threshold established in the training set.
  • the blood smear samples were therefore filtered out using these markers.
  • This expression pattern indicates that these samples do not have enough epithelial cells (for lack of the epithelial cell marker) to continue the test. In a test situation, these four samples of blood smears would be disqualified and discarded.
  • Expression of hsa-miR-138-5p (SEQ ID NO.19-21) has also been shown to be low, compared to the threshold, in blood smears (data not shown). Samples with this profile are eligible to be disqualified and/or discarded from the protocol for classification of thyroid lesion samples.
  • hsa-miR-342-3p (SEQ ID NO.17-18) correlates with white blood cells (data not shown).
  • high expression of hsa-miR-342-3p compared to the threshold indicated lack of sufficient thyroid cells, and samples with this profile are eligible to be disqualified and/or discarded from the protocol for classification of thyroid lesion samples.
  • hsa-miR-200c-3p is an indicator of the presence of epithelial cells in general, and specifically thyroid cells (data not shown and Figure 53).
  • the expression of hsa-miR-200c-3p above a threshold may be used as an indicator of sufficiency of thyroid cells in the sample.
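The cell-content checks described above can be expressed as a simple set of threshold tests. All threshold values below are hypothetical placeholders (the text states that thresholds were established in a training set but does not give them here), and the function name `qc_flags` is an assumption.

```python
# Hypothetical thresholds on normalized 50-CT values; real cutoffs would come
# from a training set and depend on the measurement platform.
BLOOD_MAX = 24.0       # hsa-miR-486-5p above this suggests a blood-rich sample
WBC_MAX = 22.0         # hsa-miR-342-3p above this suggests many white blood cells
EPITHELIAL_MIN = 14.0  # hsa-miR-200c-3p below this suggests too few epithelial cells

def qc_flags(expr):
    """Return the list of cell-content flags raised for one sample."""
    flags = []
    if expr["hsa-miR-486-5p"] > BLOOD_MAX:
        flags.append("high blood marker")
    if expr["hsa-miR-342-3p"] > WBC_MAX:
        flags.append("high white blood cell marker")
    if expr["hsa-miR-200c-3p"] < EPITHELIAL_MIN:
        flags.append("low epithelial marker")
    return flags

# Hypothetical samples: a blood smear and an adequate thyroid FNA smear.
blood_smear = {"hsa-miR-486-5p": 28.0, "hsa-miR-342-3p": 20.0, "hsa-miR-200c-3p": 9.0}
thyroid_fna = {"hsa-miR-486-5p": 21.0, "hsa-miR-342-3p": 18.0, "hsa-miR-200c-3p": 17.5}
```

The sketch only reports flags; whether a flagged sample is disqualified is left to the caller, since the text describes such samples as eligible for disqualification rather than prescribing a single rule.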
  • the inventors also tested microRNA ratios for sub-typing benign thyroid tumors.
  • the miR ratio of hsa-miR-125b-5p:hsa-miR-200c-3p was significant for classifying follicular adenoma (FA) versus Hashimoto samples (data not shown).
  • Figure 55 provides one example of an analysis, in which 146b-5p, 222-3p, 31-5p, 125b-5p, 551-3p and 375 were found to be highly expressed in papillary carcinoma, while MID-16582 was found to be highly expressed in follicular carcinoma.
  • the ratios of the following miR pairs were significant for classifying Papillary Carcinoma (PC) versus Follicular Carcinoma samples: hsa-miR-146b-5p:hsa-miR-342-3p, hsa-miR-125b-5p:hsa-miR-200c-3p, hsa-miR-222-3p:hsa-miR-486-5p, hsa-miR-31-5p:hsa-miR-342-3p, MID-16582:hsa-miR-200c-3p, MID-16582:hsa-miR-138-5p (data not shown).
  • malignant thyroid tumor sub-typing may be performed using miR ratios, particularly miR ratios where the denominator is a cell marker microRNA, such as hsa-miR-486-5p, hsa-miR-200c-3p, hsa-miR-138-5p, and hsa-miR-342-3p.
  • Example 13 Protocol for the classification of thyroid nodules as malignant or benign
  • Figure 56 presents a flowchart with the protocol for thyroid nodule sample analysis, from collection of FNA samples to laboratory analysis and diagnosis.
  • FNA samples are collected from patients having thyroid nodules, and are routinely processed. Smears are prepared from the FNA samples.
  • a specialist in cytopathology examines the FNA sample and provides an analysis. In cases where the analysis is inconclusive, particularly in samples classified as Bethesda III, IV, or V, i.e. so-called "indeterminate", the sample is sent to Rosetta Genomics' laboratories to undergo microRNA profiling and conclusive diagnosis. Total RNA is extracted from the sample, which undergoes microRNA profiling.
  • MicroRNA profiling may be performed by amplification (RT-PCR or NGS) or hybridization (microarray), as shown in the Examples above.
  • the protocol may include any one of the following:
  • One or more algorithms may be used during classification, and will be applied on data comprising single microRNAs expression, microRNA ratios, or a combination thereof.
  • Samples wherein the hsa-miR-375 expression level is above a specific threshold may be determined as malignant (e.g. a threshold of at least 10, or a threshold of at least 18), as demonstrated for example in Figures 4 (expression analyzed by array) and 20 (expression analyzed by PCR).
  • the threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs.
  • the threshold may also be a function of the target sensitivity and specificity.
  • Samples wherein the hsa-miR-146b-5p expression level is above a specific threshold will be determined as malignant (e.g. a threshold of at least 16), as demonstrated for example in Figures 21, 35 and 49.
  • the threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs.
  • the threshold may also be a function of the target sensitivity and specificity.
  • Samples wherein the ratio hsa-miR-146b-5p:hsa-miR-342-3p, further to normalization, is above a specific threshold will be determined as malignant (e.g. a threshold of at least 16), as demonstrated for example in Figures 22, 36 and 50.
  • the threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs.
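The three threshold rules above can be sketched as a rule table evaluated per sample. The threshold values are the examples quoted in the text (18 for hsa-miR-375, 16 for hsa-miR-146b-5p and 16 for the normalized ratio); as the text notes, they depend on normalization and measurement method, so they are illustrative defaults rather than fixed parameters. The function name and the sample values are hypothetical.

```python
# Example thresholds quoted in the text; platform- and normalization-dependent.
RULES = [
    ("hsa-miR-375", 18.0),
    ("hsa-miR-146b-5p", 16.0),
    ("hsa-miR-146b-5p:hsa-miR-342-3p", 16.0),
]

def malignant_rules_triggered(values, rules=RULES):
    """Return the names of the rules whose threshold is met by this sample.
    A missing measurement never triggers a rule."""
    return [name for name, threshold in rules
            if values.get(name, float("-inf")) >= threshold]

# Hypothetical normalized values for two samples.
sample_a = {"hsa-miR-375": 12.0, "hsa-miR-146b-5p": 17.2,
            "hsa-miR-146b-5p:hsa-miR-342-3p": 16.4}
sample_b = {"hsa-miR-375": 11.0, "hsa-miR-146b-5p": 13.5,
            "hsa-miR-146b-5p:hsa-miR-342-3p": 12.0}
triggered_a = malignant_rules_triggered(sample_a)
triggered_b = malignant_rules_triggered(sample_b)
```

A sample that triggers no rule is not thereby called benign; in the protocol such samples would proceed to the classifier algorithms applied to the expression levels and ratios.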
  • the level of expression of the normalizers may be used as an indicator for discarding samples, due to insufficient tumor-derived material. Thus, samples presenting low levels of any of the normalizers, or of the minimal, median or maximal value of expression for the normalizers, may be discarded. For example, samples with low levels of hsa-miR-23a-3p (compared to the overall levels of hsa-miR-23a-3p expression in the cohort) are likely to be misclassified. Conversely, high levels of hsa-miR-23a-3p improve the classification by improving sensitivity and specificity (data not shown).
  • microRNA profiling data leads to a diagnosis of the thyroid nodule as benign or malignant.
  • Where the results permit, i.e. where they include the expression of microRNAs that may be associated with thyroid tumor sub-types, as shown in Figures 54 and 55, for example, the sample is further classified according to its thyroid tumor subtype.

Abstract

The present invention provides a method for classification of thyroid tumors through the analysis of the expression patterns of specific microRNAs in fine needle aspiration samples. Thyroid tumor classification according to a microRNA expression signature allows optimization of diagnosis and treatment, as well as determination of signature-specific therapy.

Description

MIRNA EXPRESSION SIGNATURE IN THE CLASSIFICATION OF THYROID TUMORS
FIELD OF THE INVENTION
The present invention relates to methods for classification of thyroid tumors. Specifically, the invention relates to microRNA molecules associated with specific thyroid tumors.
BACKGROUND OF THE INVENTION
The accurate diagnosis of thyroid nodules continues to challenge physicians managing patients with thyroid disease. Patients with cytologically indeterminate nodules are often referred for diagnostic surgery, though most of these nodules prove post-surgery to be benign. This limitation of FNA cytology in the pre-operative diagnosis leads to a clinical need for reliable pre-operative molecular markers to distinguish benign from malignant thyroid nodules. MicroRNAs (miRs) are an important class of regulatory RNAs, which have a profound impact on a wide array of biological processes. These small (typically 18-24 nucleotides long) non-coding RNA molecules can modulate protein expression pattern by promoting RNA degradation, inhibiting mRNA translation, and also by affecting gene transcription. miRs play pivotal roles in diverse processes such as development and differentiation, control of cell proliferation, stress response and metabolism. The expression of many miRs was found to be altered in numerous types of human cancer, in some cases suggesting that such alterations may play a causative role in tumor progression.
The thyroid gland is formed of two main types of cells: the follicular cells and the C or parafollicular cells. Follicular cells produce thyroid hormones, which are regulators of human metabolism. Overproduction of thyroid hormone (hyperthyroidism) causes rapid or irregular heartbeat, trouble sleeping, nervousness, hunger, weight loss, and a feeling of being too warm. Conversely, hypothyroidism causes metabolism slowdown, tiredness, and weight gain. Thyroid hormone release is regulated by the thyroid-stimulating hormone (TSH), produced by the pituitary gland. The C cells produce calcitonin, a hormone responsible for use of calcium. Lymphocytes and stromal cells are also found in the thyroid.
Thyroid cancer is the eighth most common cancer in the United States, and the most rapidly increasing cancer in the US, with more than 60,000 new cases diagnosed every year, and being the cause of about 1,800 deaths in 2014. Thyroid cancer usually presents itself as a palpable thyroid nodule. Different types of thyroid tumors develop from different cell types, which is a determinant for the gravity and the optimal treatment administered. Most of the growths and tumors in the thyroid gland are benign (non-cancerous) but others are malignant (cancerous).
Approximately 95% of thyroid cancers are differentiated thyroid carcinomas (DTC) that arise from thyroid follicular cells. There are two histological subtypes of DTC: papillary thyroid carcinoma (PTC) type (90-95%) and follicular thyroid carcinoma (FTC) type (5-10%).
The most commonly used method for thyroid cancer diagnosis is biopsy by fine-needle aspiration (FNA). FNA samples are routinely examined for cytology to determine whether the nodules are benign or cancerous. The sensitivity and specificity of the cytological examination of an FNA sample range from 68% to 98%, and 72% to 100%, respectively, depending on institutions and doctors. Unfortunately, in at least 25% of the cases the FNA specimens collected are either inadequate for diagnosis or indeterminable by cytology. In current medical practice, most patients with indeterminate results undergo surgery, and are subject to all risks and consequences of the surgical procedure. Follow-up results show that only 25% of the patients operated on are diagnosed with cancer, meaning that 75% of the patients underwent an unnecessary surgical procedure.
When examining cytochemical or genetic markers, there is no unique marker that on its own is able to provide reliable results in order to replace the morphologic diagnosis of thyroid lesions. US 7,319,011 describes measuring the expression of any one of the genes DDIT3, ARG2, ITM1, C1orf24, TARSH, and ACO1 in a test follicular thyroid specimen for distinguishing follicular adenoma (FA) from follicular carcinoma (FC). US 7,670,775 describes the analysis of the expression of CCND2, PCSK2, and PLAB for identifying malignant thyroid tissue. US 6,723,506 describes the molecular characterization of PAX8-PPAR1 molecules in connection with diagnosis and treatment of thyroid follicular carcinomas. US 7,378,233 describes the occurrence of the T1796A mutation of the BRAF gene in 24 (69%) of papillary thyroid carcinomas.
Accumulated efforts have been invested in finding a molecular diagnostic test which will overcome the uncertainty of indeterminate cytology, and ultimately eliminate unnecessary surgery for non-cancer patients [Chen, Y. T. et.al. (2008) Mod. Pathol. 21, 1139-1146; He, H. et al. (2005) Proc. Natl Acad. Sci. USA 102, 19075-19080; Nikiforova, M. N. et al. (2009) Endocr. Pathol. 20, 85-91; Pallante, P. et al. (2006) Endocr. Relat. Cancer 13, 497-508; Nikiforova, M. 30 N. et al. (2008) J. Clin. Endocrinol. Metab. 93, 1600-1608; Visone, R. et al. (2007) Endocr. Relat. Cancer 14(3):791-8; US 2014/0030714 Al; US 8,541,170; US 2012/0220474 Al; US 8,465,914; US 7,598,052; US 8,202,692; WO 2013/066678; WO 2012/129378; US 2013/0237590; EP 2772 550 Al; Pallante et al. (2010) Endocrine-Related Cancer 17 F91-F104; Dettmer et al. (2014) J Mol Endocrinol. Mar 6; 52(2): 181-9].
Nonetheless, numerous are the challenges that remain. It is of great necessity to develop a molecular assay with not only high sensitivity and specificity, but also that is able to deal with samples that failed the cytology analysis and that fall under the category of indeterminate 5 samples. The present invention provides solutions for this challenge.
SUMMARY OF THE INVENTION
A novel integrated technology platform was developed by the inventors for profiling and characterizing microRNAs in thyroid clinical samples, including biopsies, generally surgically-obtained resections, and cytological specimens, generally obtained by fine-needle aspiration (FNA), and was applied to classify thyroid lesions as benign or malignant neoplasms, as well as into their sub-types. Novel microRNAs are disclosed as potential biomarkers.
Thus, in a first aspect, the present invention provides a method for classifying a thyroid lesion sample, the method comprising the steps of:
a. obtaining a thyroid lesion sample from a subject in need thereof;
b. measuring the expression level of at least four nucleic acids in the sample, said nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
c. determining a nucleic acid expression profile;
d. applying a classifier algorithm to the nucleic acid expression profile; and
e. classifying said thyroid lesion as benign, malignant or of a sub-type of benign or malignant tumor based on the result from the algorithm applied to the nucleic acid expression profile of said sample.
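As a rough illustration of steps (c) through (e), the following Python sketch orders measured levels into an expression profile and applies a simple nearest-centroid rule. The choice of a nearest-centroid classifier, the function names, the marker names and all numeric values are illustrative assumptions; they are not the classifier, the sequences or the data of the invention.

```python
import math

def build_profile(levels, markers):
    """Step (c): order the measured levels into a fixed-length expression profile."""
    return [levels[m] for m in markers]

def centroid(profiles):
    """Mean profile of a group of training samples."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def classify(profile, centroids):
    """Steps (d)-(e): assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))

# Hypothetical four-marker panel and made-up normalized expression values.
MARKERS = ["miR-A", "miR-B", "miR-C", "miR-D"]
benign_train = [[10.0, 12.0, 8.0, 9.0], [10.5, 11.5, 8.2, 9.1]]
malignant_train = [[14.0, 9.0, 11.0, 9.0], [13.5, 9.5, 10.8, 9.2]]
centroids = {
    "benign": centroid(benign_train),
    "malignant": centroid(malignant_train),
}

sample = build_profile(
    {"miR-A": 13.8, "miR-B": 9.2, "miR-C": 10.9, "miR-D": 9.0}, MARKERS
)
label = classify(sample, centroids)
print(label)
```

Any trained classifier could take the place of `classify`; the sketch only shows how a profile of at least four measured levels maps to a benign or malignant call.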
In one embodiment of the method of the invention, the method further comprises, following step (b) or (c), a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (d) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
In one embodiment of the method of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs. 1-37, variants thereof or a sequence having at least about 80% identity thereto. In a further embodiment of the method of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs. 1-25, variants thereof or a sequence having at least about 80% identity thereto.
In a further embodiment of the method of the invention, said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy. In one particular embodiment, said sample is a smear from an FNA biopsy.
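The ratio step can be sketched as follows in Python. The sketch assumes the expression values are on a log2 scale, so that a ratio of two levels becomes a difference; that scale, the function name `ratio_features`, the marker names and the numbers are all assumptions made for illustration.

```python
def ratio_features(log2_levels, pairs):
    """Return one feature per (numerator, denominator) pair.

    On a log2 scale the ratio a/b becomes the difference log2(a) - log2(b).
    """
    return {
        f"{num}:{den}": log2_levels[num] - log2_levels[den]
        for num, den in pairs
    }

# Made-up log2 expression values for three hypothetical markers.
levels = {"miR-A": 13.0, "miR-B": 9.5, "miR-C": 11.0}
pairs = [("miR-A", "miR-B"), ("miR-C", "miR-B")]

ratios = ratio_features(levels, pairs)
# The classifier input may be the profile, the ratios, or both combined.
combined = {**levels, **ratios}
print(ratios)
```

Because both members of a pair come from the same sample, a ratio feature of this kind is insensitive to any factor that scales both levels equally.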
In another further embodiment of the method of the invention, said thyroid lesion is a nodule of less than 1 cm.
In another further embodiment of the method of the invention, said algorithm is a machine-learning algorithm. In one particular embodiment of said method of the invention, said algorithm further combines the nucleic acid expression profile with clinical or genetic data from said sample.
In another further embodiment of the method of the invention, following step (b) if at least one of said nucleic acid expression levels is below or above a threshold for thyroid cells, said sample is discarded based on the expression level of said nucleic acid.
In another further embodiment of the method of the invention, said sample has less than 50 thyroid cells.
In another further embodiment of the method of the invention, said measuring is performed by hybridization, amplification or next generation sequencing method.
In one particular embodiment of the method of the invention, said hybridization comprises contacting the sample with probes, wherein the probes comprise (i) DNA equivalents of the microRNAs, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii) or (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-25. In another particular embodiment of the invention, said probes are attached to a solid substrate.
In another further particular embodiment of the method of the invention, said amplification is real-time polymerase chain reaction (RT-PCR), said RT-PCR amplification method comprising forward and reverse primers, and optionally further comprising hybridization with a probe.
In another further embodiment, said method further comprises the step of administering a differential treatment to said subject if said thyroid lesion is benign or malignant.
In another further particular embodiment of the method of the invention, said lesion is malignant and said treatment is any one of surgery, chemotherapy, radiotherapy, hormone therapy, or any other recommended treatment.
In another aspect, the present invention provides a protocol for classifying a thyroid lesion sample comprising the steps of:
a. obtaining a thyroid lesion sample from a subject in need thereof;
b. measuring the level of at least four nucleic acids in the sample, said nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
c. determining the expression of nucleic acids in said sample that associate with specific cell types;
d. wherein (i) the expression level of at least one nucleic acid that is a non-thyroid cell marker above a threshold determines that the sample is discarded; or (ii) expression levels of non-thyroid cell markers below a threshold determine that the sample proceeds to step (e) for further analysis;
e. if the sample is not discarded in step (d), determining a nucleic acid expression profile;
f. applying a classifier algorithm to the microRNA expression profile;
g. classifying said thyroid lesion as benign, malignant or of a sub-type of benign or malignant tumor based on the result of the algorithm applied to the nucleic acid expression profile of said sample.
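The gating logic of step (d) can be sketched as below. This is a minimal Python sketch under stated assumptions: the marker names, the threshold value, the toy classifier and the treatment of a level exactly equal to the threshold (kept, in this sketch) are all hypothetical and are not specified by the protocol.

```python
def should_discard(levels, non_thyroid_markers, threshold):
    """Step (d)(i): discard if any non-thyroid cell marker exceeds the threshold."""
    return any(levels[m] > threshold for m in non_thyroid_markers)

def run_protocol(levels, non_thyroid_markers, threshold, classifier):
    """Steps (d)-(g): gate the sample on cell-type markers, then classify it."""
    if should_discard(levels, non_thyroid_markers, threshold):
        return "discarded"
    # Step (e): the profile passed on excludes the cell-type markers.
    profile = {m: v for m, v in levels.items() if m not in non_thyroid_markers}
    return classifier(profile)

def toy_classifier(profile):
    """Stand-in for steps (f)-(g); the rule and cut-off are made up."""
    return "malignant" if profile["miR-A"] - profile["miR-B"] > 2.0 else "benign"

NON_THYROID = ["blood-marker"]  # hypothetical non-thyroid cell marker
clean = {"miR-A": 13.0, "miR-B": 9.0, "blood-marker": 4.0}
contaminated = {"miR-A": 13.0, "miR-B": 9.0, "blood-marker": 12.0}

result_clean = run_protocol(clean, NON_THYROID, 8.0, toy_classifier)
result_contaminated = run_protocol(contaminated, NON_THYROID, 8.0, toy_classifier)
print(result_clean, result_contaminated)
```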
In one embodiment of the protocol of the invention, the protocol further comprises, following step (b), a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (f) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
In another embodiment of the protocol of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs. 1-37, variants thereof or a sequence having at least about 80% identity thereto. In another embodiment of the protocol of the invention, said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs. 1-25, variants thereof or a sequence having at least about 80% identity thereto.
In a further embodiment of the protocol of the invention, said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy. In one particular embodiment, said sample is a smear from an FNA biopsy.
In a further embodiment of the protocol of the invention, said thyroid lesion is a nodule of less than 1 cm. In another further embodiment of the protocol of the invention, said sample has less than 50 thyroid cells. In a further embodiment of the protocol of the invention, said algorithm is a machine-learning algorithm.
In a further embodiment of the protocol of the invention, the measuring is performed by hybridization, amplification or next generation sequencing method.
In another further aspect, the present invention provides a kit for thyroid tumor classification, said kit comprising:
a. probes for performing thyroid tumor classification, wherein said probes comprise any one of (i) DNA equivalents of microRNAs comprising at least one of SEQ ID NOs 1-308, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii), (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-182, or (v) a nucleic acid sequence that hybridizes with RT-PCR products; and optionally
b. an instruction manual for using said probes.
In one embodiment, said kit further comprises forward and reverse PCR primers.
In another embodiment, the kit of the invention may comprise forward and reverse primers. In another embodiment, the kit of the invention may further comprise reagents for performing in situ hybridization analysis.
In another further aspect of the invention, the kit for thyroid tumor classification comprises:
(a) at least one forward RT-PCR primer, such as for example at least one of the primers comprising SEQ ID NO. 270-293;
(b) a reverse primer;
(c) at least one probe that hybridizes with molecules amplified by the RT-PCR, as for example the probes presented in the Examples; and optionally
(d) any one of an instruction manual for performing said RT-PCR, or an instruction manual for thyroid tumor classification.
In one embodiment, said probe is a general probe. In another embodiment said probe is a microRNA sequence-specific probe.
In another further aspect, the present invention provides an isolated nucleic acid, said nucleic acid comprising at least 12 contiguous nucleotides at least 80% identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308.
In another further aspect, the present invention provides a pharmaceutical composition comprising as active agent the isolated nucleic acids described herein, and optionally adjuvants, carriers, diluents and excipients. Thus, said nucleic acid molecules may be comprised as an active agent in a pharmaceutical composition, a formulation or a medicament.
In another further aspect, the present invention provides a vector comprising the isolated nucleic acid described herein.
In another further aspect, the present invention provides a probe comprising the isolated nucleic acid described herein.
In another further aspect, the present invention provides a biochip comprising the isolated nucleic acid described herein.
In another further aspect, the present invention provides the use of an isolated nucleic acid as described herein in the preparation of a medicament.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1: Expression of microRNAs in Giemsa-stained papillary carcinoma (Pap-carc.) and non-papillary carcinoma (N-Pap-carc.) smears. The scatter plot shows differential expression of miRs from a Giemsa-stained papillary carcinoma smear (y-axis, n=1) versus a Giemsa-stained non-papillary carcinoma smear (x-axis, n=1). The data are shown in normalized fluorescence units, as measured by microarray. The parallel lines describe a 1.5-fold change between the samples in either direction. Gray crosses represent untested (NT) control probes or median signal <300 in both samples. Five microRNAs (hsa-miR-146b-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-21-5p and hsa-miR-31-5p) are up-regulated in the papillary carcinoma smear.
Figures 2A-2B: Novel microRNAs detected by next generation sequencing. Fig.2A shows the predicted secondary structure of two novel microRNAs, MD2-495 (top) and MD2-437 (bottom) detected in thyroid tissue. Fig.2B shows the expression of the two novel microRNAs in each one of 11 resected thyroid samples.
Figures 3A-3B: MicroRNA expression in malignant versus benign samples. The scatter plot shows the median expression levels of microRNAs, including miR-125b-5p, miR-222-3p and miR-146b-5p (highlighted), in malignant nodules (y-axis) versus benign nodules (x-axis). Each cross represents a microRNA, and includes control sequences, microRNAs with low expression and non-reliable probes (NT). The dashed line represents a 1.5-fold change. Fig.3A shows the analysis in cohort I. Fig.3B shows the analysis in cohort II.
Figure 4: MiR-375 expression in medullary lesions. The plot shows the expression of miR-375 in medullary lesions (diamonds, Med) in comparison to malignant non-medullary (squares, Mal-n-med) combined with benign lesions (circles, B). Lines represent the median expression for each group. p-value=1.2e-42. Fold change=201.4.
Figures 5A-5B: Samples stained with different dyes can be processed and microRNA can be detected. The plots show the median expression levels of miR-146b-5p in malignant (M) or benign (B) samples stained with different dyes. Fig.5A shows miR expression in samples stained with May-Grünwald Giemsa compared with DiffQuik. p-value=0.18 (Wilcoxon). Median fold-change (med.f-ch) =1.0. Fig.5B shows miR expression in samples stained with DiffQuik compared with Papanicolaou. p-value=0.56 (Wilcoxon). Median fold-change (med.f-ch) =1.1.
Figures 6A-6B: Hurthle cell marker. The plots show higher expression of MID-16582 in follicular adenomas presenting Hurthle cells versus follicular adenomas with no indication of Hurthle cells. Sign.=significant; Diff.=differential; f-ch=fold change; Bl.=blood; NT, not tested. Fig.6A: The y and x axes show the median array expression levels of the miRs in FA (follicular adenoma) samples not documented as having Hurthle cells (n=22) versus FA samples with Hurthle cells (n=9). The dashed factor line = x1.5. Fig.6B: The y and x axes show the median PCR expression levels of the miRs in FA samples with no indication of Hurthle cells (n=21) versus FA samples with Hurthle cells (n=9). The dashed factor line = + 0.6.
Figure 7: Profiling of malignant and benign samples with Thyroid assay set of microRNAs. The x and y axes show the expression levels of the miRs in benign (B) (n=166) versus malignant (M) (n=187) samples, respectively. The microRNA median expression levels for hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-424-3p (SEQ ID NO.16), and hsa-miR-375 (SEQ ID NO.8) are highlighted. The numbers refer to (50 - normalized Ct value). Diamonds (♦) represent the normalizers. Sign.=significant; Diff.=differential; f-ch=fold change. The dashed factor line = + 0.6.
Figures 8A-8C: A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using microRNA expression values. Fig.8A: The normalized values of two microRNAs (hsa-miR-551b-3p and hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 84.8% and the specificity is 68.9%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.8B: The normalized values of three microRNAs (hsa-miR-551b-3p, hsa-miR-146b-5p, and hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 82.9% and the specificity is 72.2%. Misclassified samples (miscl.) are represented by a dot. Fig.8C: The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 83.5% and the specificity is 81.5%.
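For readers relating the confusion matrices to the reported figures, sensitivity and specificity follow from the four cells of a two-class confusion matrix, with malignant taken as the positive class. The Python sketch below uses made-up counts; they are not taken from any figure of this document.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: malignant samples called malignant   fn: malignant samples called benign
    tn: benign samples called benign         fp: benign samples called malignant
    """
    sensitivity = tp / (tp + fn)  # fraction of true malignant samples detected
    specificity = tn / (tn + fp)  # fraction of true benign samples recognized
    return sensitivity, specificity

# Hypothetical counts chosen only to illustrate the arithmetic.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```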
Figures 9A-9C: A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA expression ratios. Fig.9A: The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p and hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 78% and the specificity is 79.5%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.9B: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p) were used as the features for the classification. The sensitivity of this classifier is 81.1% and the specificity is 82.1%. Misclassified samples (miscl.) are represented by a dot. Fig.9C: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 74.4% and the specificity is 84.1%.
Figure 10A-10C: A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of a combination of microRNAs and microRNA ratios. Fig.10A: Normalized values of one microRNA ratio and one microRNA (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 82.9% and the specificity is 82.8%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.10B: The normalized values of one microRNA ratio and two microRNAs (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 82.9% and the specificity is 82.8%. Fig.10C: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 93.3% and the specificity is 42.4%.
Figure 11A-11C: A K-nearest neighbor (KNN) classifier was used to classify malignant (M) from benign (B) samples using normalized values of microRNAs. Fig.11A: The normalized values of 6 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 82.3% and the specificity is 68.2%. Fig.11B: The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 82.9% and the specificity is 74.2%. Fig.11C: The normalized values of 12 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-5p; hsa-miR-486-5p; hsa-miR-424-3p; hsa-miR-200c-3p; hsa-miR-346) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 81.1% and the specificity is 68.9%.
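For orientation, a K-nearest neighbor rule labels a sample by majority vote among the k closest training profiles. The Python sketch below is a generic textbook version using Euclidean distance and k=3; the distance metric, the value of k, the two-feature profiles and all values are assumptions of the sketch, since this description does not state the settings used for the figures.

```python
import math
from collections import Counter

def knn_classify(sample, training, k=3):
    """Label a sample by majority vote among its k nearest training profiles."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-feature normalized profiles labeled benign (B) or malignant (M).
training = [
    ([10.0, 12.0], "B"),
    ([10.4, 11.6], "B"),
    ([9.8, 12.3], "B"),
    ([13.9, 9.1], "M"),
    ([13.5, 9.4], "M"),
    ([14.2, 8.8], "M"),
]

call = knn_classify([13.0, 9.9], training)
print(call)
```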
Figure 12A-12B: A KNN classifier was used to classify malignant (M) from benign (B) samples using normalized values of microRNA ratios. Fig.12A: The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis represents the classifier answer (Clas. Ans.) and the y-axis represents the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 78% and the specificity is 58.9%. Fig.12B: The normalized values of 8 miR ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis represents the classifier answer (Clas. Ans.) and the y-axis represents the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 80.5% and the specificity is 65.6%.
Figure 13A-13C: A KNN classifier was used to classify malignant (M) from benign (B) samples using normalized values of a combination of microRNAs and microRNA ratios. Fig.13A: The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis represents the classifier answer (Clas. Ans.) while the y-axis represents the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 85.4% and the specificity is 66.9%.
Fig.13B: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification. The figure shows a confusion matrix where the x-axis represents the classifier answer (Clas. Ans.) while the y-axis represents the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 83.5% and the specificity is 70.9%. Fig.13C: The normalized values of 7 microRNAs and 5 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-152-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis represents the classifier answer (Clas. Ans.) while the y-axis represents the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 83.5% and the specificity is 66.9%.
Figure 14A-14C: A Support Vector Machine (SVM) classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples using normalized microRNA values. Fig.14A: The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 82.3% and the specificity is 68.2%. Misclassified samples (miscl.) are represented by a dot. Fig.14B: The normalized values of 6 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 83.5% and the specificity is 75.5%. Fig.14C: The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 86% and the specificity is 75.5%.
Figure 15A-15C: An SVM classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios. Fig.15A: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p) were used as the features for the classification. The sensitivity of this classifier is 83.5% and the specificity is 80.8%. Misclassified samples (miscl.) are represented by a dot. Fig.15B: The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 83.5% and the specificity is 80.1%. Fig.15C: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 82.9% and the specificity is 80.8%.
Figure 16A-16C: An SVM classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of a combination of microRNA values and microRNA ratios. Fig.16A: The normalized values of 2 microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 82.9% and the specificity is 83.4%. Misclassified samples (miscl.) are represented by a dot. Fig.16B: The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 86% and the specificity is 80.1%. Fig.16C: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 86.6% and the specificity is 79.5%.
Figure 17A-17C: A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNAs. Fig.17A: The normalized values of two microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 84.8% and the specificity is 64.2%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.17B: The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 84.1% and the specificity is 65.6%. Misclassified samples (miscl.) are represented by a dot. Fig.17C: The normalized values of 8 microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-375; hsa-miR-125b-5p; hsa-miR-152-3p; hsa-miR-181c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 84.8% and the specificity is 74.8%.
Figure 18A-18C: A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios. Fig.18A: The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 83.5% and the specificity is 73.5%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.18B: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p) were used as the features for the classification. The sensitivity of this classifier is 86% and the specificity is 79.5%. Misclassified samples (miscl.) are represented by a dot. Fig.18C: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-200c-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) and the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 84.1% and the specificity is 78.1%.
Figure 19A-19C: A Discriminant Analysis Ensemble classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios. Fig.19A: The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 85.4% and the specificity is 78.8%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.19B: The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 85.4% and the specificity is 78.1%. Misclassified samples (miscl.) are represented by a dot. Fig.19C: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-31-5p; hsa-miR-222-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-375) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 86% and the specificity is 82.8%.
Figure 20: The normalized levels of hsa-miR-375 expression (Exp.) are shown as a dot plot for Medullary ("Med."), non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
Figure 21: The normalized levels of hsa-miR-146b-5p expression (Exp.) are shown as a dot plot for non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
Figure 22: The normalized expression (Exp.) levels of the miR ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for non-medullary Malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
Figure 23A-23C: A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNAs. Fig.23A: The normalized values of two microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 56.3%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.23B: The normalized values of three microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification. The sensitivity of this classifier is 82.6% and the specificity is 59.5%. Misclassified samples (miscl.) are represented by a dot. Fig.23C: The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 81.7% and the specificity is 71.4%.
Figure 24A-24C: A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios. Fig.24A: The normalized values of two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 72.2%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.24B: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 69%. Misclassified samples (miscl.) are represented by a dot. Fig.24C: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 80% and the specificity is 66.7%.
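As an illustration of how microRNA ratio features of the kind listed above could be derived from per-microRNA values, the sketch below assumes that the normalized values are on a logarithmic scale, so that a ratio A:B becomes the difference A minus B. That assumption, the function name and the sample values are hypothetical and are not stated in this section.

```python
def ratio_features(normalized, pairs):
    """Build ratio features from normalized microRNA values.

    Assumption: `normalized` holds log-scale expression values, so the
    ratio A:B is represented as normalized[A] - normalized[B].
    """
    return {f"{a}:{b}": normalized[a] - normalized[b] for a, b in pairs}


# Hypothetical normalized values for a single sample.
sample = {
    "hsa-miR-146b-5p": 12.5,
    "hsa-miR-342-3p": 10.0,
    "hsa-miR-31-5p": 8.25,
}
pairs = [
    ("hsa-miR-146b-5p", "hsa-miR-342-3p"),
    ("hsa-miR-31-5p", "hsa-miR-342-3p"),
]
features = ratio_features(sample, pairs)
for name, value in features.items():
    print(name, value)
```

If the values were instead on a linear scale, the subtraction would be replaced by a division.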
Figure 25A-25C: A Discriminant Analysis classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios. Fig.25A: The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 73.8%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.25B: The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 79.1% and the specificity is 73%. Fig.25C: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification. The sensitivity of this classifier is 87.8% and the specificity is 67.5%. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.).
Figure 26A-26C: A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNAs. Fig.26A: The normalized values of 6 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b- 5p; hsa-miR-31-5p; hsa-miR-375) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y- 15 axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 78.3% and the specificity is 65.9%. Fig.26B: The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR- 152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis 20 shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 82.6% and the specificity is 73%. Fig.26C: The normalized values of 12 microRNAs (hsa-miR-551b-3p; hsa- miR-146b-5p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152- 3p; hsa-miR-181c-5p; hsa-miR-424-3p; hsa-miR-486-5p; hsa-miR-200c-3p; hsa-miR-346) were used as the features for the classification. The figure shows a confusion matrix where the x-axis 25 shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 73.9% and the specificity is 68.3%.
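For readers unfamiliar with the KNN (k-nearest-neighbours) approach named above, the following is a minimal sketch of the general technique: a sample is assigned the majority class of its k closest training samples in feature space. The value of k, the Euclidean distance metric and the training data are all assumptions made for illustration and are not taken from this section.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; Euclidean distance
    and k=3 are illustrative choices only.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set: "M" = malignant, "B" = benign.
train = [
    ((3.1, 2.9), "M"), ((2.8, 3.3), "M"), ((3.5, 3.0), "M"),
    ((0.5, 0.7), "B"), ((0.9, 0.2), "B"), ((0.4, 0.4), "B"),
]
print(knn_predict(train, (3.0, 3.0)))
print(knn_predict(train, (0.5, 0.5)))
```

In practice each feature vector would hold the normalized values of the listed microRNAs for one sample.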
Figure 27A-27B: A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNA ratios. Fig.27A: The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31- 30 5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 80.9% and the specificity is 65.9%. Fig.27B: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa- miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR- 486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa- miR-486-5p) were used as the features for the classification. The figure shows a confusion 5 matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 76.5% and the specificity is 62.7%.
Figure 28A-28C: A KNN classifier was used to classify Indeterminate malignant (M) from benign (B) samples, using normalized values of microRNAs and microRNA ratios. Fig.28A: The normalized values of 3 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 76.5% and the specificity is 57.9%. Fig.28B: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 78.3% and the specificity is 64.3%. Fig.28C: The normalized values of 12 microRNAs and microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-152-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 80.9% and the specificity is 67.5%.
Figure 29A-29C: An SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs. Fig.29A: The normalized values of three microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification. The sensitivity of this classifier is 82.6% and the specificity is 54.8%. Misclassified samples (miscl.) are represented by a dot. Fig.29B: The normalized values of 6 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375) were used as the features for the classification. The sensitivity of this classifier is 82.6% and the specificity is 59.5%. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). Fig.29C: The normalized values of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa-miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 90.4% and the specificity is 60.3%.
Figure 30A-30C: A SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios. Fig.30A: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa-miR- 342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 81.7% and the specificity is 15 67.5%. Misclassified samples (miscl.) are represented by a dot. Fig.30B: The normalized values of 6 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa- miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR- 486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while 20 the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 88.7% and the specificity is 63.5%. Fig. 30C: The normalized values of 8 microRNA ratios (hsa-miR- 146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR- 200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the 25 features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 87.8% and the specificity is 58.7%.
Figure 31A-31C: A SVM classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples, using the combination of normalized values of 30 microRNAs and microRNA ratios. Fig. 31A: The normalized values of two microRNAs and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 80% and the specificity is 71.4%. Fig. 31B: The normalized values of 4 microRNAs and two microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222- 3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 89.9% and the specificity is 51.6%. Fig. 31C: The normalized 5 values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR- 551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b- 5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this 10 classifier is 84.3% and the specificity is 68.3%.
Figure 32A-32C: A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using the normalized values of microRNAs. Fig.32A: The normalized values of two microRNA (hsa-miR-146b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this 15 classifier is 85.2% and the specificity is 45.2%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.32B: The normalized values of three microRNAs (hsa-miR-551b-3p; hsa-miR-146b-5p; hsa-miR-222-3p) were used as the features for the classification. The sensitivity of this classifier is 84.3% and the specificity is 45.2%. Misclassified samples (miscl.) are represented by a dot. Fig.32C: The normalized values 20 of 8 microRNAs (hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-125b-5p; hsa- miR-31-5p; hsa-miR-375; hsa-miR-152-3p; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 88.7% and the specificity is 64.3%. 25
Figure 33A-33C: A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using the normalized values of microRNA ratios. Fig.33A: The normalized values of two microRNA ratios (hsa-miR- 146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 86.1% and the specificity is 61.1%. The grey 30 shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.33B: The normalized values of three microRNA ratios (hsa-miR-146b-5p:hsa- miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 87% and the specificity is 57.1%. Misclassified samples (miscl.) are represented by a dot. Fig.33C: The normalized values of 8 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa- miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR- 486-5p; MID-16582:hsa-miR-200c-3p; MID-16582:hsa-miR-138-5p; hsa-miR-200c-3p:hsa- miR-486-5p) were used as the features for the classification. The figure shows a confusion 5 matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 89.6% and the specificity is 65.1%.
Figure 34A-34C: A Discriminant analysis ensemble classifier was used to classify Indeterminate malignant (diamonds, M) from benign (squares, B) samples using a combination 10 of normalized values of microRNAs and microRNA ratios. Fig.34A: The normalized values of one microRNA and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p) were used as the features for the classification. The sensitivity of this classifier is 83.5% and the specificity is 58.7%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.34B: The normalized values of two microRNAs 15 and one microRNA ratio (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b- 3p) were used as the features for the classification. The sensitivity of this classifier is 85.2% and the specificity is 65.9%. Misclassified samples (miscl.) are represented by a dot. Fig.34C: The normalized values of 5 microRNAs and 3 microRNA ratios (hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-146b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-31-5p:hsa-miR-342-3p; hsa- 20 miR-125b-5p:hsa-miR-200c-3p; hsa-miR-125b-5p; hsa-miR-31-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 87.8% and the specificity is 62.7%.
Figure 35: Normalized expression (Exp.) levels of hsa-miR-146b-5p are shown as a dot plot for Indeterminate non-medullary malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
Figure 36: The normalized expression levels (Exp.) of the miR ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for Indeterminate non-medullary malignant ("Mal.") and benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis, in order to improve visibility of the dots.
Figure 37A-37C: A Discriminant analysis classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs. Fig.37A: The normalized values of two microRNAs (hsa-miR-125b-5p; hsa-miR- 551b-3p) were used as the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 42.9%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.37B: The normalized values of three microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features 5 for the classification. The sensitivity of this classifier is 91.5% and the specificity is 39.7%. Misclassified samples (miscl.) are represented by a dot. Fig.37C: The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR- 375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer 10 (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl). The sensitivity of this classifier is 89.4% and the specificity is 47.6%.
Figure 38A-38C: A Discriminant analysis classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios. Fig.38A: The normalized values of two microRNA ratios (hsa-miR-125b- 15 5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 89.4% and the specificity is 28.6%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.38B: The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa- miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as 20 the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 30.2%. Misclassified samples (miscl.) are represented by a dot. Fig.38C: The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486- 5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p: hsa-miR-138-5p; hsa-miR-200c-3p:hsa- 25 miR-486-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 80.9% and the specificity is 57.1%.
Figure 39A-39C: A Discriminant analysis classifier was used to classify Bethesda IV 30 malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNAs and microRNA ratios. Fig. 39A: The normalized values of one microRNA and one microRNA ratio (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 93.6% and the specificity is 33.3%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig. 39B: The normalized values of one microRNA and two microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b- 5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 89.4% and the specificity is 41.3%. Misclassified samples (miscl.) are represented by 5 a dot. Fig. 39C: The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p: hsa-miR-342-3p; hsa-miR-551b-3p; hsa- miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the 10 x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl). The sensitivity of this classifier is 87.2% and the specificity is 46%.
Figure 40A-40C: A KNN classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNAs. The figures show a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). Fig.40A: The normalized values of 6 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p) were used as the features for the classification. The sensitivity of this classifier is 72.3% and the specificity is 39.7%. Fig.40B: The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification. The sensitivity of this classifier is 66% and the specificity is 61.9%. Fig.40C: The normalized values of 12 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p; hsa-miR-200c-3p; MID-16582; hsa-miR-346; hsa-miR-152-3p) were used as the features for the classification. The sensitivity of this classifier is 66% and the specificity is 61.9%.
Figure 41A-41B: A KNN classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNA ratios. The figures show a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). Fig.41A: The normalized values of 6 microRNA ratios (hsa-miR- 30 125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 78.7% and the specificity is 61.9%. Fig.41B: The normalized values of 8 microRNA ratios (hsa-miR- 125b- 5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification. The sensitivity of this classifier is 80.9% and the specificity is 50.8%.
Figure 42A-42C: A KNN classifier was used to classify Bethesda IV malignant from 5 benign samples, using the normalized values of microRNAs and microRNA ratios. The figures show a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y- axis shows the true diagnosis (Real class=re.cl.). Fig.42A: The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p) were 10 used as the features for the classification. The sensitivity of this classifier is 63.8% and the specificity is 46%. Fig.42B: The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa- miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID- 16582:hsa-miR-138-5p) were used as the features for the classification. The sensitivity of this 15 classifier is 68.1% and the specificity is 49.2%. Fig.42C: The normalized values of 6 microRNA and 6 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b- 5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa- miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-375; hsa-miR-222-3p:hsa-miR-486-5p; hsa-miR-181c-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the classification. 20 The sensitivity of this classifier is 74.5% and the specificity is 58.7%.
Figure 43A-43C: An SVM classifier was used to classify Bethesda IV malignant from benign samples, using the normalized values of microRNAs. Fig.43A: The normalized values of three microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as the features for the classification. The sensitivity of this classifier is 97.9% and the specificity is 22.2%. Malignant=M (diamonds); Benign=B (squares). Fig.43B: The normalized values of 6 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 89.4% and the specificity is 38.1%. Fig.43C: The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 91.5% and the specificity is 55.6%.
Figure 44A-44C: A SVM classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using the normalized values of microRNA ratios. Fig.44A: The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa-miR- 5 200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 100%. Fig.44B: The normalized values of 6 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b- 5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR- 222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p) were used as the features for the 10 classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 93.6% and the specificity is 33.3%. Fig.44C: The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31- 5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID- 15 16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486- 5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 93.6% and the specificity is 31.7%.
Figure 45A-45C: An SVM classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios. Fig.45A: The normalized values of one microRNA and two microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 93.6% and the specificity is 22.2%. Misclassified samples (miscl.) are represented by a dot. Fig.45B: The normalized values of 4 microRNAs and 2 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 95.7% and the specificity is 31.7%. Fig.45C: The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 91.5% and the specificity is 36.5%.
Figure 46A-46C: A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using normalized 5 values of microRNAs. Fig.46A: The normalized values of two microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p) were used as the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 39.7%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.46B: The normalized values of three microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p) were used as 10 the features for the classification. The sensitivity of this classifier is 89.4% and the specificity is 39.7%. Fig.46C: The normalized values of 8 microRNAs (hsa-miR-125b-5p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-375; hsa-miR-181c-5p; hsa-miR-31-5p; hsa-miR- 138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true 15 diagnosis (Real class=re.cl.). The sensitivity of this classifier is 93.6% and the specificity is 46%.
Figure 47A-47C: A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using normalized values of microRNA ratios. Fig.47A: The normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 93.6% and the specificity is 19%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.47B: The normalized values of three microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 93.6% and the specificity is 17.5%. Misclassified samples (miscl.) are represented by a dot. Fig.47C: The normalized values of 8 microRNA ratios (hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p; hsa-miR-222-3p:hsa-miR-486-5p; MID-16582:hsa-miR-200c-3p; hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-200c-3p:hsa-miR-486-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 89.4% and the specificity is 44.4%.
Figure 48A-48C: A Discriminant Analysis Ensemble classifier was used to classify Bethesda IV malignant (diamonds, M) from benign (squares, B) samples, using a combination of normalized values of microRNAs and microRNA ratios. Fig.48A: The normalized values of one microRNA and one microRNA ratio (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p) were used as the features for the classification. The sensitivity of this classifier is 91.5% and the specificity is 33.3%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier. Fig.48B: The normalized values of one microRNA and two microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 89.4% and the specificity is 36.5%. Misclassified samples (miscl.) are represented by a dot. Fig.48C: The normalized values of 4 microRNAs and 4 microRNA ratios (hsa-miR-125b-5p; hsa-miR-125b-5p:hsa-miR-200c-3p; hsa-miR-146b-5p:hsa-miR-342-3p; hsa-miR-551b-3p; hsa-miR-222-3p; hsa-miR-146b-5p; hsa-miR-31-5p:hsa-miR-342-3p; MID-16582:hsa-miR-138-5p) were used as the features for the classification. The figure shows a confusion matrix where the x-axis shows the classifier answer (Clas. Ans.) while the y-axis shows the true diagnosis (Real class=re.cl.). The sensitivity of this classifier is 91.5% and the specificity is 34.9%.
Figure 49: The normalized expression (Exp.) levels of hsa-miR-146b-5p are shown as a dot plot for Bethesda IV non-medullary malignant ("Mal.") and for benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis.
Figure 50: The normalized expression (Exp.) levels of the microRNA ratio hsa-miR-146b-5p:hsa-miR-342-3p are shown as a dot plot for Bethesda IV non-medullary malignant ("Mal.") and for benign ("Ben.") samples. Lines represent the median values for each group. Within each group, dots are randomly distributed along the x-axis.
Figure 51: A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, wherein the malignant group included samples of medullary tumor. The normalized values of two microRNAs (hsa-miR-222-3p; hsa-miR-551b-3p) were used as features for the classification. The sensitivity of this classifier is 85.2% and the specificity is 53.6%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
Figure 52: A Discriminant Analysis classifier was used to classify malignant (diamonds, M) from benign (squares, B) samples, wherein the malignant group included samples of medullary tumor. The normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-138-5p; hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification. The sensitivity of this classifier is 84.7% and the specificity is 80.8%. The grey shaded area marks the space in which a sample is classified as malignant, as determined by the classifier.
Figure 53: The expression pattern of hsa-miR-486-5p and hsa-miR-200c-3p is a determinant of the quality of the sample. Four samples of blood smears (BS) were analyzed for the expression of hsa-miR-486-5p (SEQ ID NO.22) and hsa-miR-200c-3p (SEQ ID NO.23-24) in comparison with their expression in malignant (M) and benign (B) thyroid samples. Normalized values for the two miRs are shown (normalized using all normalizers).
Figure 54: Sub-typing of Benign Thyroid Tumors. microRNA expression profile (median) was established for two sub-types of benign tumors, Follicular Adenoma (FA, y axis, n=81) and Hashimoto (Hash., x axis, n=6). Each cross represents a microRNA or a microRNA ratio. The ratio hsa-miR-125b-5p:hsa-miR-200c-3p correlated with FA, while expression of hsa-miR-342-3p and hsa-miR-31-5p correlated with Hashimoto. Diamonds represent normalizers. Significant microRNAs (p-value for t-test < 0.05) are represented by circles.
Figure 55: Sub-typing of Malignant Thyroid Tumors. microRNA expression profile was established for two sub-types of malignant thyroid tumors, papillary carcinoma (Pap.; y-axis, n=161) and follicular carcinoma (FC; x-axis, n=16). Each cross represents a microRNA or a microRNA ratio. Diamonds are normalizers. Significant microRNAs (p-value for t-test < 0.05) are encircled. Only normalized microRNA values are labeled. Unlabeled circles represent significant ratios.
Figure 56: Flowchart representing the protocol for diagnosis of indeterminate thyroid nodule samples obtained through FNA.
DETAILED DESCRIPTION OF THE INVENTION
Despite accumulated efforts in the search for accurate diagnosis of thyroid lesions, a great number of technical problems remain with no solution in sight. Because of the quality of the material obtained, the diagnosis of thyroid lesions in fine needle aspiration (FNA) samples is still challenging. The low number of cells, the amount of blood, and the ratio between thyroid tumor cells and non-thyroid tumor cells in the sample make it difficult to extract enough material to provide conclusive results.
The present invention provides a sensitive, specific and accurate methodology for distinguishing between malignant and benign thyroid tumors, as well as particular subtypes of thyroid tumors. Distinguishing between different subtypes of thyroid tumors is essential for providing the patient with the best and most suitable treatment. The present invention provides a significant improvement of the technologies currently available in the field of thyroid tumor classification and diagnosis.
The present inventors have developed an integrative platform for the classification of thyroid lesions, by profiling and characterizing microRNA expression in thyroid clinical samples obtained by FNA biopsies, while also overcoming hindrances such as low number of cells in the sample and the amount of blood in the sample by microRNA profiling. This technological platform was applied to stratify thyroid lesions into benign or malignant neoplasms, as well as subtypes of thyroid tumors, as an adjunctive tool in the pre-operative management of thyroid nodules. The inventors have exceptionally developed a method for classification of benign and malignant thyroid lesions, and specific subtypes of thyroid cancer and follicular lesions, while integrating steps for filtering out sub-optimal samples, by implementing specific algorithms based on microRNA profiling. The method is part of an overall protocol, in which existing or available clinical cytological slides having smears from FNA samples may be used, without the need to generate or collect additional material from the patients.
The present method further incorporates the analysis of microRNAs in minute amounts of RNA material from cytological samples. Once an FNA sample is collected, between one and several passes of material are smeared onto slides. Currently available methods usually require several passes in order to have enough material for analysis. The present inventors developed a method in which even a single FNA slide provides sufficient material for microRNA detection. Furthermore, the present inventors were able to measure microRNA abundance in FNA samples obtained from thyroid nodules as small as 0.1 cm. This is particularly relevant considering that approximately 50% of thyroid lesions are smaller than 1 cm [Jung et al. (2014) J Clin Endocrinol Metab 99: E276-E285]. In addition, the method developed by the inventors allows for the analysis of samples having very small amounts of cells, such as samples having 50 cells, up to 120 cells and over.
The present method includes steps for eliminating or disqualifying samples that lack thyroid cells and/or in which non-thyroid cells, such as blood cells, are over-represented.
The present inventors have identified a unique microRNA expression signature for thyroid lesions through profiling the expression of the microRNAs denoted by SEQ ID NOs. 1-308.
More specifically, the present inventors have developed a platform for classification of thyroid clinical samples based on the levels of expression of a set of microRNAs, comprising at least two microRNAs, selected from the group consisting of hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-424-3p (SEQ ID NO.16), hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), MID-16582 (SEQ ID NO.25), hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-375 (SEQ ID NO.8), hsa-miR-486-5p (SEQ ID NO.22), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-200c-3p (SEQ ID NO.23-24) and hsa-miR-138-5p (SEQ ID NO.19-21); or a sequence at least 80%, at least 85%, or at least 90% identical thereto. The platform was established based on a training study with a robust cohort, which also included the measurement of additional microRNAs that served as normalizers.
The present invention is particularly useful for the 25% of the cases in which FNA specimens present inconclusive results in cytopathology, usually referred to as "indeterminate", and which include thyroid lesion samples classified in Bethesda categories III, IV and V. In current medical practice, patients with specimens falling within this category undergo repeat FNA procedure, and surgery, including lobectomy and thyroidectomy.
Thus, in one embodiment, the present invention provides a method of classification for thyroid lesion samples that fall into the "indeterminate" cases, classified in categories III, IV and V of the Bethesda System (described further herein). In one particular embodiment, the present invention provides a method of classification for thyroid lesion samples classified in category IV of the Bethesda System, which relates to "Follicular Neoplasm" or "Suspicious of a Follicular Neoplasm", which is known to be the most difficult category to be classified.
Thus, the present invention presents primarily a protocol for management of thyroid lesion samples which failed to be classified by cytopathological analysis. Particular samples that are of interest are those obtained by FNA. In one embodiment, routine smears from FNA samples are used. In another embodiment, FNA samples in preservative solutions may be used. Total RNA is extracted from the FNA samples, and the expression of microRNAs is measured. In one embodiment, the expression of about 2200 microRNAs is measured. In another embodiment, the expression of 182 microRNAs, comprising the sequences of SEQ ID NO. 1-182, is measured. In a further embodiment, the expression of the microRNAs comprising the sequences of SEQ ID NO.1-37 is measured. In another further embodiment, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, or all microRNAs from the group selected from hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-424-3p (SEQ ID NO.16), hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), MID-16582 (SEQ ID NO.25), hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-375 (SEQ ID NO.8), hsa-miR-486-5p (SEQ ID NO.22), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-200c-3p (SEQ ID NO.23-24) and hsa-miR-138-5p (SEQ ID NO.19-21), or a sequence at least 80%, at least 85%, or at least 90% identical thereto, are measured and used in the classification.
In a further embodiment, classification of the thyroid sample as malignant or benign comprises measuring the expression levels of hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-375 (SEQ ID NO.8), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-152-3p (SEQ ID NO.12-13), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), hsa-miR-424-3p (SEQ ID NO.16), hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-138-5p (SEQ ID NO.19-21), hsa-miR-486-5p (SEQ ID NO.22), hsa-miR-200c-3p (SEQ ID NO.23-24), MID-16582 (SEQ ID NO.25), or any combination thereof, or a sequence at least 80%, at least 85%, or at least 90% identical thereto, and providing the levels of expression to a classifier which analyzes and classifies the sample as malignant or benign.
Thus, the present invention provides a method for distinguishing between malignant and benign thyroid tumor lesions in a subject in need, said method comprising obtaining a thyroid tumor lesion sample from said subject, or being provided with a biological sample obtained from said subject, determining an expression profile in said sample of one or more, or at least four microRNAs comprising SEQ ID NOS: 1-25, or a sequence at least 80%, at least 85%, or at least 90% identical thereto, or any combination of said microRNAs, by hybridization or by amplification, comparing said expression profile to a reference threshold value by using a classifier algorithm; and determining whether the thyroid lesion is malignant or benign. In one particular embodiment, the method of the invention is for distinguishing sub-types of malignant or benign thyroid tumor lesions.
In one embodiment, the method of the invention comprises measuring the expression of at least four of the microRNAs comprising SEQ ID NOS: 1-25, obtaining the microRNA expression profile value of said sample, and using a classifier to establish, based on said value, whether the thyroid lesion is malignant or benign, and optionally further classifying the sample into one of the malignant or benign subtypes.
In one particular embodiment, said determining an expression profile by hybridization comprises contacting the sample with probes that hybridize to each of SEQ ID NOS: 1-25, or to a sequence at least 80%, at least 85%, or at least 90% identical thereto. In another embodiment, said determining an expression profile by hybridization comprises contacting the sample with probes that hybridize with at least eight, at least ten, at least twelve, at least fourteen, or at least sixteen contiguous nucleotides of said microRNA comprising SEQ ID NOS: 1-25.
The present invention further provides a method of classifying a sample as malignant or benign, and/or sub-typing said sample, whereby, further to measuring the expression levels of microRNAs in the sample, obtaining an expression profile and optionally calculating microRNA ratios, a multi-step analysis of the expression data is applied. Said multi-step analysis comprises applying one or more algorithms, in parallel or sequentially, to at least one of the microRNA expression profiles, microRNA ratios, or a combination thereof. Said multi-step analysis may also further include analyzing the expression of one or more single microRNA levels which may be indicative of the overall quality of the sample.
Examples of criteria that may be included in the multi-step analysis, in any order and in any combination, are: the expression of non-malignant cell markers, the expression of microRNAs that correlate with a specific sub-type of thyroid tumor, and the like. Thus for example, one step may be examining whether the expression of non-thyroid cell markers is higher or lower than the threshold established in the data set, e.g. the training data set, in which case the sample may be disqualified. Another further step may be examining the expression of a microRNA or microRNA ratio that correlates with a thyroid tumor sub-type, e.g. if the expression of hsa-miR-342-3p (SEQ ID NO.17-18) is very high compared to the threshold established in the data set, e.g. the training data set, the sample may be classified as benign, and further sub-typed as being Hashimoto. Alternatively, if the expression of hsa-miR-342-3p (SEQ ID NO.17-18) is very high compared to the threshold established in the data set, e.g. the training data set, the sample may be disqualified for lack of sufficient thyroid cells. Another further optional step relates to the level of expression of MID-16582 (SEQ ID NO.25), which may be used to determine whether the sample may be discarded, or analyzed using a classifier specific for those samples in which MID-16582 (SEQ ID NO.25) is high (compared to the threshold established in the training set).
In one particular embodiment of the invention, said non-thyroid cell marker is a blood cell marker.
In another particular embodiment of the invention, said cell marker is an epithelial cell marker.
In a further particular embodiment of the invention, said cell marker is a blood cell marker, a white blood cell marker or an epithelial cell marker. Examples of blood cell markers are hsa-miR-486-5p (SEQ ID NO.22), hsa-miR-320a (SEQ ID NO.173), hsa-miR-106a-5p (SEQ ID NO.150), hsa-miR-93-5p (SEQ ID NO.182), hsa-miR-17-3p (SEQ ID NO.160), hsa-let-7d-5p (SEQ ID NO.144), hsa-miR-107 (SEQ ID NO.152), hsa-miR-103a-3p (SEQ ID NO.149), hsa-miR-17-5p (SEQ ID NO.161), hsa-miR-191-5p (SEQ ID NO.163), hsa-miR-25-3p (SEQ ID NO.167), hsa-miR-106b-5p (SEQ ID NO.151), hsa-miR-20a-5p (SEQ ID NO.166), hsa-miR-18a-5p (SEQ ID NO.40), hsa-miR-144-3p (SEQ ID NO.154), hsa-miR-140-3p (SEQ ID NO.51), hsa-miR-15b-5p (SEQ ID NO.157), hsa-miR-16-5p (SEQ ID NO.159), hsa-miR-92a-3p (SEQ ID NO.181), hsa-miR-484 (SEQ ID NO.179), hsa-miR-151a-5p (SEQ ID NO.156), hsa-let-7f-5p (SEQ ID NO.), hsa-let-7a-5p (SEQ ID NO.141), hsa-let-7c-5p (SEQ ID NO.143), hsa-let-7b-5p (SEQ ID NO.142), hsa-let-7g-5p (SEQ ID NO.146), hsa-let-7i-5p (SEQ ID NO.147), hsa-miR-185-5p (SEQ ID NO.162), hsa-miR-30d-5p (SEQ ID NO.172), hsa-miR-30b-5p (SEQ ID NO.170), hsa-miR-30c-5p (SEQ ID NO.171), hsa-miR-19b-3p, hsa-miR-26a-5p (SEQ ID NO.168), hsa-miR-26b-5p (SEQ ID NO.169), hsa-miR-425-5p (SEQ ID NO.176), MID-19433 (SEQ ID NO.133), and hsa-miR-4306 (SEQ ID NO.177). Examples of white blood cell markers are hsa-miR-342-3p (SEQ ID NO.17-18), hsa-miR-146a-5p and hsa-miR-150-5p (SEQ ID NO.59). Examples of epithelial markers are hsa-miR-200c-3p (SEQ ID NO.23-24), hsa-miR-138-5p (SEQ ID NO.19-21), hsa-miR-3648 (SEQ ID NO.174), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-125a-5p (SEQ ID NO.153), hsa-miR-192-3p (SEQ ID NO.164), hsa-miR-4324 (SEQ ID NO.178), hsa-miR-376a-3p (SEQ ID NO.175).
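The multi-step analysis described above can be illustrated with a minimal Python sketch. It assumes three sequential steps (a blood-marker quality check, a Hashimoto sub-typing check, and a final call from a classifier score). The function name, the threshold names and every numeric value are hypothetical placeholders; the description states only that thresholds are established in a data set such as the training data set, and it allows the steps to be applied in any order and combination.

```python
from typing import Dict

# Hypothetical thresholds for illustration only; the description says such
# thresholds are established from a data set, e.g. the training data set.
THRESHOLDS = {
    "blood_marker_max": 10.0,  # upper limit for the blood cell marker hsa-miR-486-5p
    "hashimoto_min": 8.0,      # "very high" level of hsa-miR-342-3p
}


def multi_step_call(expr: Dict[str, float],
                    classifier_score: float,
                    score_cutoff: float = 0.0) -> str:
    """Apply three sequential checks to one sample's normalized expression values."""
    # Step 1: disqualify samples in which the blood cell marker is over-represented.
    if expr["hsa-miR-486-5p"] > THRESHOLDS["blood_marker_max"]:
        return "disqualified"
    # Step 2: very high hsa-miR-342-3p -> benign, sub-typed as Hashimoto
    # (one of the two alternatives given in the text; the other is disqualification).
    if expr["hsa-miR-342-3p"] > THRESHOLDS["hashimoto_min"]:
        return "benign (Hashimoto)"
    # Step 3: fall back to the score produced by a separately trained classifier.
    return "malignant" if classifier_score > score_cutoff else "benign"
```

For example, under these placeholder thresholds a sample with hsa-miR-486-5p at 12.0 would be disqualified before the classifier score is consulted.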
As referred to herein, said microRNA ratio is the ratio between the normalized expression levels of a pair of microRNAs, wherein the normalized expression level of one microRNA is used as the numerator and the normalized expression level of a second microRNA is the denominator.
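As a minimal illustration of this definition, the following Python sketch computes a ratio from a mapping of normalized expression values, using the "numerator:denominator" notation that appears in the figure descriptions. The helper name `mirna_ratio` and the example values are hypothetical. The sketch also assumes linear-scale values; if the normalized values were on a logarithmic scale, a difference rather than a quotient might be the appropriate operation.

```python
from typing import Dict


def mirna_ratio(normalized: Dict[str, float], pair: str) -> float:
    """Return numerator / denominator for a pair written as 'numerator:denominator'."""
    numerator, denominator = pair.split(":")
    denominator_value = normalized[denominator]
    if denominator_value == 0:
        # A zero denominator leaves the ratio undefined; fail loudly.
        raise ValueError(f"normalized level of {denominator} is zero")
    return normalized[numerator] / denominator_value


# Hypothetical normalized expression values for one sample.
sample = {"hsa-miR-125b-5p": 6.0, "hsa-miR-200c-3p": 3.0}
ratio = mirna_ratio(sample, "hsa-miR-125b-5p:hsa-miR-200c-3p")
```

The resulting value can then be supplied to a classifier as a feature alongside, or instead of, the individual normalized levels.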
In another particular embodiment, said determining an expression profile comprises contacting the sample with RT-PCR reagents, including forward and reverse primers as exemplified herein in the Examples, and generating RT-PCR products.
In a further particular embodiment, said method comprises contacting RT-PCR products with specific or general probes, or a combination thereof, as exemplified herein in the Examples, detecting and measuring the PCR products.
In a further embodiment, said determining an expression profile comprises measuring microRNA expression by hybridization, using microarrays and the like. In another further embodiment, said determining an expression profile comprises measuring microRNA expression by next-generation sequencing.
In another embodiment, said method comprises optionally further determining the expression profile of at least one microRNA to be used as normalizer. In one embodiment, any microRNA as described in Table 1 may be used as a normalizer. In one particular embodiment, any of the microRNAs comprising SEQ ID NO. 26-37, or a sequence at least 80%, 85%, 90%, or 95% identical thereto, are used as normalizers.
The present inventors have surprisingly found that the classification of a thyroid tumor sample is improved when a number of markers, from different categories as defined and exemplified herein, are used. Said markers may be any one of malignant markers, secondary markers and cell-type markers, or any combination thereof, comprising SEQ ID NOS: 1-25, or a sequence at least 80%, 85%, 90%, or 95% identical thereto. In order to perform the method of the invention, the full set of markers may be used. Alternatively, any combination of malignant, secondary and cell-type markers may be used. Thus, the method may comprise at least one malignant marker, in association with at least one secondary marker and/or at least one cell-type marker.
Depending on the analysis of the data, each of the cell-type markers may be used in the form of raw or normalized signals. Alternatively, the cell-type markers may be used as a preliminary test prior to performing the classification, in order to determine whether the sample has sufficient relevant material to perform classification, or whether the sample should be discarded. Yet another option is to use the cell-type markers as part of the final classifier, where the signal of the cell-type marker is used by the classifier. A further option is to use the cell-type markers as the denominator of a miR ratio optionally used by the classifier. For example, the expression level of a malignant or a secondary marker may be divided by the expression level of a cell-type marker, and the resulting miR ratio used in the classifier.
Thus, in a further embodiment of the method for distinguishing between malignant and benign thyroid tumor lesions in a subject in need, said classifier may be any one of a single classifier, a multi-step classifier, a classifier which uses all the malignant markers, a classifier which uses a subset of the malignant markers, a classifier which uses all the malignant markers and the secondary markers, a classifier which uses a subset of the malignant markers and a subset of the secondary markers, a classifier which uses all the malignant markers and the secondary markers and the cell type markers, a classifier which employs a subset of all the malignant markers and the secondary markers and the cell type markers, a classifier which uses all or a subset of the malignant markers and all or a subset of the cell type markers.
In another further embodiment of the method or the protocol of the invention, the performance of the classification may be improved by further combining the result from the algorithm classifier with additional clinical or molecular data available for the thyroid sample being analyzed. Additional data available may be related to the thyroid lesion, such as the size of the nodule, the number of nodules; it may relate to other clinical information available for the subject from whom the sample was obtained, such as molecular test results, like the expression of other molecular markers, genetic markers, biochemical test results, blood test results, urine test results, recurrence, prognosis data, family history, patient medical history, and the like. Other data that may also be combined is thyroid genetic data, such as mutation analysis, gene fusions, chromosomal rearrangements, gene expression, protein expression, and the like.
Therapeutic indications may vary according to the diagnosis obtained with the method or protocol of the invention. Typically there are five types of therapy that may be administered to a thyroid cancer patient: surgery, radiation therapy, chemotherapy, thyroid hormone therapy and targeted therapy.
Surgery is the most common treatment of thyroid cancer. One of the following procedures may be used:
Lobectomy: Removal of the lobe in which thyroid cancer is found. Biopsies of lymph nodes in the area may be done to see if they contain cancer.
Near-total thyroidectomy: Removal of all but a very small part of the thyroid.
Total thyroidectomy: Removal of the whole thyroid.
Lymphadenectomy: Removal of lymph nodes in the neck that contain cancer.
Thyroidectomy is a surgical procedure that has several potential complications or sequelae including: temporary or permanent change in voice, temporary or permanently low calcium, need for lifelong thyroid hormone replacement, bleeding, infection, and the remote possibility of airway obstruction due to bilateral vocal cord paralysis. Therefore, accurate diagnosis which would prevent the unnecessary removal of the thyroid gland is very desirable.
Radiation therapy uses high-energy x-rays or other types of radiation to eliminate cancer cells or inhibit their proliferation. There are two types of radiation therapy. External radiation therapy uses a machine outside the body to send radiation toward the cancer. Internal radiation therapy uses a radioactive substance sealed in needles, seeds, wires, or catheters that are placed directly into or near the cancer. The radiation therapy of choice will be dependent on the type and stage of the thyroid cancer. Radiation therapy may be supplementary to surgery in order to eliminate cancer cells that were not successfully removed. Follicular and papillary thyroid cancers may be treated with radioactive iodine (RAI) therapy. RAI is administered orally and collects in any remaining thyroid tissue, including thyroid cancer cells that have spread to other places in the body. Since only thyroid tissue takes up iodine, the RAI destroys thyroid tissue and thyroid cancer cells without harming other tissues. Before a full treatment dose of RAI is given, a small test-dose is given to see if the tumor takes up the iodine.
Chemotherapy is another option for thyroid cancer treatment. Chemotherapy may be administered orally or by injection, intravenous or intramuscular. Chemotherapy may also be administered directly into the cancer affected area instead of systemically. The choice of administration will depend on the type and stage of the cancer. A few examples of drugs that have been approved for thyroid cancer treatment are: Adriamycin PFS (Doxorubicin Hydrochloride), Adriamycin RDF (Doxorubicin Hydrochloride), Cabozantinib-S-Malate, Caprelsa (Vandetanib), Cometriq (Cabozantinib-S-Malate), Doxorubicin Hydrochloride, Nexavar (Sorafenib Tosylate), Sorafenib Tosylate and Vandetanib.
Thyroid hormone therapy is a cancer treatment that removes hormones or blocks their action and inhibits cancer cell proliferation. In the treatment of thyroid cancer, drugs may be given to prevent thyroid-stimulating hormone (TSH) production, so that the hormone does not induce the growth or recurrence of the thyroid cancer.
Also, because thyroid cancer treatment specifically targets thyroid cells, the thyroid is not able to make enough thyroid hormone. Patients are given thyroid hormone replacement pills.
Targeted therapy uses drugs or other substances to identify and attack specific cancer cells without harming normal cells. Tyrosine kinase inhibitor (TKI) therapy blocks signal transduction in thyroid cancer cells, inhibiting their growth. Vandetanib is a TKI used to treat thyroid cancer.
Dosage and duration of any therapy will depend on individual evaluation of the patient and on standard practice known by the health care provider. The duration of treatment is the period of time during which doses of a pharmaceutical agent or pharmaceutical composition are administered.
The identification and differentiation of the thyroid tumor, firstly as benign or malignant, and subsequently its classification into the various subtypes through the analysis of differentially expressed microRNAs can provide further clues to the biological differences between the subtypes, their diverging oncogenetic processes and possible new targets for type-specific target therapy.
The present invention provides diagnostic assays and methods, both quantitative and qualitative, for detecting, diagnosing, monitoring, staging and prognosticating thyroid cancers by comparing levels of the specific microRNA molecules as described herein. Such levels are measured in a patient sample, which may be from a biopsy, tumor samples, cells, tissues and/or bodily fluids.
Thus, the method of the invention is particularly useful for discriminating between different subtypes of malignant thyroid tumors, such types being follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), medullary carcinoma, anaplastic thyroid cancer, poorly differentiated thyroid cancer, and for determining the therapeutic course to be followed after diagnosis. In a further embodiment, the present invention provides a method for classifying sub-types of benign thyroid tumor, e.g. follicular adenoma, Hashimoto thyroiditis, hyperplasia (Goiter).
The present invention also provides a method of treatment of thyroid cancer, said method comprising the method of distinguishing between benign or malignant thyroid tumor as described herein, optionally subtyping the thyroid tumor type, and administering the treatment according to the diagnosis provided by the present method.
All the methods of the present invention may optionally further include measuring levels of other cancer markers. Other cancer markers, in addition to said microRNA molecules useful in the present invention, will depend on the cancer being tested and are known to those of skill in the art.
Assay techniques that can be used to determine levels of gene expression, such as the nucleic acid sequence of the present invention, in a sample derived from a patient are well known to those of skill in the art. Such assay methods include, but are not limited to, radioimmunoassays, reverse transcriptase PCR (RT-PCR) assays, immunohistochemistry assays, in situ hybridization assays, competitive-binding assays, northern blot analyses, ELISA assays, nucleic acid microarrays and biochip analysis.
An arbitrary threshold on the expression level of one or more nucleic acid sequences can be set for assigning a sample or tumor sample to one of two groups. Alternatively, in a preferred embodiment, expression levels of one or more nucleic acid sequences of the invention are combined by taking ratios of expression levels of two nucleic acid sequences and/or by a method such as logistic regression to define a metric which is then compared to previously measured samples or to a threshold. The threshold for assignment is treated as a parameter, which can be used to quantify the confidence with which samples are assigned to each class. The threshold for assignment can be scaled to favor sensitivity or specificity, depending on the clinical scenario. The correlation value to the reference data generates a continuous score that can be scaled and provides diagnostic information on the likelihood that a sample belongs to a certain class of thyroid subtype. In multivariate analysis, the microRNA signature provides a high level of prognostic information.
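The effect of scaling the assignment threshold can be sketched in Python as follows. This is a hedged illustration only: the logistic form is one possible way of combining features into a metric, and the function names, weights, scores and labels are hypothetical examples rather than values taken from the study.

```python
import math
from typing import Sequence, Tuple


def logistic_score(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float) -> float:
    """Combine expression levels and/or ratios into a single score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(scores: Sequence[float],
                            is_malignant: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Call a sample malignant when its score >= threshold, then tally the outcomes."""
    tp = fn = tn = fp = 0
    for score, malignant in zip(scores, is_malignant):
        called_malignant = score >= threshold
        if malignant and called_malignant:
            tp += 1
        elif malignant:
            fn += 1
        elif called_malignant:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical scores for two malignant and two benign samples.
scores = [0.9, 0.6, 0.4, 0.2]
labels = [True, True, False, False]
low_cutoff = sensitivity_specificity(scores, labels, 0.3)   # favors sensitivity
high_cutoff = sensitivity_specificity(scores, labels, 0.7)  # favors specificity
```

In this toy data, lowering the threshold keeps every malignant sample above the cutoff at the cost of calling one benign sample malignant, while raising it does the reverse.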
The present invention also provides novel microRNA molecules, comprising nucleic acids denoted by SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308. It is to be understood that the cDNA, complement sequence, and anti-miR corresponding to any one of SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308 are also encompassed by the present invention.
Further, the present application provides compositions, formulations and medicaments comprising the microRNAs described herein. In one particular embodiment, the present invention provides compositions, formulations and medicaments comprising as an active agent the microRNA comprising any one of SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308, variants thereof, or a sequence at least 80%, at least 85%, or at least 90% identical thereto. Said compositions, formulations and medicaments may further optionally comprise any one of adjuvants, carriers, diluents and excipients. The microRNAs described herein can be formulated into compositions, formulations and medicaments by combination with appropriate, pharmaceutically acceptable carriers or diluents, and can be formulated into preparations in solid, semi-solid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, suppositories, injections, inhalants and aerosols. As such, administration of the microRNA or a pharmaceutical composition comprising the same can be achieved in various ways, including oral, buccal, rectal, parenteral, intraperitoneal, intradermal, transdermal, intratracheal, etc.
In certain embodiments, pharmaceutical compositions of the present invention comprise one or more nucleic acids of the invention and one or more excipients. In certain such embodiments, excipients are selected from water, salt solutions, alcohol, polyethylene glycols, gelatin, lactose, amylase, magnesium stearate, talc, silicic acid, viscous paraffin, hydroxymethylcellulose and polyvinylpyrrolidone.
In certain embodiments, a pharmaceutical composition of the present invention is prepared using known techniques, including, but not limited to mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or tabletting processes. Methods for the preparation of pharmaceutical compositions may be found in the literature, e.g. in Gennaro, A. R. (2000) Remington: The Science and Practice of Pharmacy, 20th ed.
In certain embodiments, a pharmaceutical composition of the present invention is a liquid (e.g., a suspension, elixir and/or solution). In certain of such embodiments, a liquid pharmaceutical composition is prepared using ingredients known in the art, including, but not limited to, water, glycols, oils, alcohols, flavoring agents, preservatives, and coloring agents.
In certain embodiments, a pharmaceutical composition of the present invention is a solid (e.g., a powder, tablet, and/or capsule). In certain of such embodiments, a solid pharmaceutical composition comprising one or more nucleic acids of the invention is prepared using ingredients known in the art, including, but not limited to, starches, sugars, diluents, granulating agents, lubricants, binders, and disintegrating agents.
Further, the present application provides vectors and probes comprising the compounds (the nucleic acids) disclosed herein. In one particular embodiment, the present application provides vectors and probes comprising nucleic acids denoted by SEQ ID NOS.27-29, 33, 34, 139, 140, 307 and 308, variants thereof or a sequence at least 80%, at least 85%, or at least 90% identical thereto.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and it is not intended to be limiting. It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise.
For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0 for example, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9 and 7.0 are explicitly contemplated.
As used herein, the term "aberrant proliferation" means cell proliferation that deviates from the normal, proper, or expected course. Aberrant cell proliferation may include cell proliferation whose characteristics are associated with an indication caused by, mediated by, or resulting in inappropriately high levels of cell division, inappropriately low levels of apoptosis, or both. Such indications may be characterized, for example, by single or multiple local abnormal proliferations of cells, groups of cells, or tissue(s), whether cancerous or noncancerous, benign or malignant. Aberrant proliferation is one of the main features of cancer.
As used herein, the term "about" refers to +/-10%.
"Attached" or "immobilized", as used herein to refer to a probe and a solid support, means that the binding between the probe and the solid support is sufficient to be stable under conditions of binding, washing, analysis, and removal. The binding may be covalent or non-covalent. Covalent bonds may be formed directly between the probe and the solid support or may be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe or both molecules. Non-covalent binding may be one or more of electrostatic, hydrophilic, and hydrophobic interactions. Included in non-covalent binding is the covalent attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of a biotinylated probe to the streptavidin. Immobilization may also involve a combination of covalent and non-covalent interactions.
"Biological sample" or "sample", as used herein, means a sample of biological tissue or fluid that comprises nucleic acids, microRNA in particular. Such samples include, but are not limited to, tissue or fluid isolated from subjects. Biological samples also include sections of tissues such as biopsy and autopsy samples, fine-needle aspiration (FNA) samples, frozen sections taken for histological purposes, blood, blood fraction, plasma, serum, and the like. A biological sample may be provided by removing a sample of cells from a subject, but can also be accomplished by using previously isolated cells (e.g., isolated by another person, at another time, and/or for another purpose), which may then be cultured or not. Archival tissues, such as those having treatment or outcome history, may also be used.
In another embodiment of the invention, the FNA biopsy is prepared as a smear.
The term "classification" refers to a procedure and/or algorithm in which individual items are placed into groups or classes based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, features, etc.) and based on a statistical model and/or a training set of previously labeled items.
As used herein, the term "classifying thyroid tumors" refers to the identification of one or more properties of a thyroid tissue sample (e.g., including but not limited to, the presence of microRNAs expressed in cancerous tissue, the presence of microRNAs expressed in precancerous tissue that is likely to become cancerous, and the presence of microRNAs expressed in cancerous tissue that is likely to metastasize).
The term "classifier" as used herein refers to an algorithm used to classify, distinguish or identify thyroid tumors (or lesions) as benign or malignant, or to classify, distinguish or identify sub-types of thyroid tumor. Once the microRNA expression profile of the samples of any study cohort is acquired, for example from the training cohort, a database is generated in which the expression levels of all the microRNAs in the samples of the cohorts are stored. This database is also referred to as "the training data" and it is used to choose an optimal algorithm for classification. Nucleic acid (or microRNA) ratios, alone or in combination with nucleic acid (or microRNA) levels may also be used by the algorithm for the classification of thyroid samples.
In one embodiment, the algorithm to be used in the method or protocol of the invention is a machine-learning algorithm. Examples of machine-learning algorithms are discriminant analysis, K-nearest neighbor classifier (KNN), Support Vector Machine (SVM) classifier, logistic regression classifier, neural network classifier, Gaussian mixture model (GMM), nearest centroid classifier, linear regression classifier, decision tree classifier, random forest classifier, ensemble of classifiers, or any combination thereof. When a discriminant analysis classifier is used, the discriminant may be any one of a linear, quadratic, a diagonal of the linear covariance matrix, diagonals of the quadratic covariance matrices, pseudoinverse of the linear covariance matrix, and pseudoinverse of the quadratic covariance matrices. When a KNN classifier is used, the k may be altered and the distance metric can be either Pearson correlation, Spearman correlation, Euclidean or cityblock (Manhattan) distance. When an SVM classifier is used, the kernel may be linear, Gaussian or polynomial. When an ensemble method classifier is used, it usually applies algorithms such as classification trees, KNN or discriminant analysis classifiers. The ensembles can be either created using boosting or bagging algorithms and the number of ensemble learning cycles can range from two up to a few thousand.
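As a non-limiting illustration, a KNN classifier with an interchangeable distance metric may be sketched as below. The training vectors, labels and function names are hypothetical; the Pearson option is implemented here as one minus the correlation coefficient, which is an assumption about how a correlation is turned into a distance.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    # Manhattan distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    # Assumed convention: distance = 1 - Pearson correlation coefficient.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def knn_classify(training, query, k=3, metric=euclidean):
    """Majority vote among the k training profiles closest to the query."""
    nearest = sorted(training, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical three-microRNA expression profiles.
training = [
    ([1.0, 2.0, 3.0], "benign"),
    ([1.2, 2.1, 2.9], "benign"),
    ([0.9, 1.8, 3.2], "benign"),
    ([3.0, 2.0, 1.0], "malignant"),
    ([3.1, 1.9, 0.8], "malignant"),
    ([2.8, 2.2, 1.1], "malignant"),
]

for metric in (euclidean, cityblock, pearson_distance):
    print(metric.__name__, knn_classify(training, [2.9, 2.0, 1.0], metric=metric))
```

Both k and the metric are passed as parameters, so either can be varied when selecting a classifier on the training data.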
As used herein, "confusion matrix" refers to a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. A "confusion matrix" may also be referred to as a contingency table or an error matrix.
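For illustration, a confusion matrix for a two-class problem may be tabulated as in the following sketch, in which rows correspond to the true class and columns to the predicted class. The labels and the function name are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


# Hypothetical reference diagnoses and classifier calls for seven samples.
truth = ["malignant", "malignant", "malignant", "benign", "benign", "benign", "benign"]
calls = ["malignant", "malignant", "benign", "benign", "benign", "malignant", "benign"]

matrix = confusion_matrix(truth, calls, classes=["benign", "malignant"])
for name, row in zip(["benign", "malignant"], matrix):
    print(name, row)
```

The diagonal entries count correctly classified samples and the off-diagonal entries count the errors of each kind.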
"Complement" or "complementary", as used herein to refer to a nucleic acid, may mean Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. A full complement or fully complementary means 100% complementary base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. In some embodiments, the complementary sequence has a reverse orientation (5'-3'). The present invention also provides the complement of the nucleic acids denoted by SEQ ID NOS. 7-29, 33, 34, 139, and 140.
As used herein, "CT signals" or "CT" represent the first cycle of PCR where amplification crosses a threshold (cycle threshold) of fluorescence. Accordingly, low values of CT represent high abundance or expression levels of the microRNA. In some embodiments the PCR CT signal is normalized such that the normalized CT is inversely related to the expression level. In other embodiments the PCR CT signal may be normalized and then inverted such that low normalized-inverted CT represents low abundance or expression levels of the microRNA.
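A minimal sketch of the two conventions follows. It assumes that normalization is done by subtracting the mean CT of a set of normalizers and that inversion is done by subtracting the normalized CT from a fixed constant; both the constant (50) and the function names are assumptions made for illustration.

```python
def normalize_ct(ct, normalizer_cts):
    """Subtract the mean CT of the normalizers (assumed normalization)."""
    return ct - sum(normalizer_cts) / len(normalizer_cts)


def invert_ct(normalized_ct, offset=50.0):
    """Flip the scale so that larger values mean higher abundance.

    The offset of 50 is an arbitrary illustrative constant.
    """
    return offset - normalized_ct


normalizers = [28.0, 30.0]          # hypothetical normalizer CTs
abundant = normalize_ct(25.0, normalizers)   # low raw CT, high abundance
scarce = normalize_ct(33.0, normalizers)     # high raw CT, low abundance

print(abundant, invert_ct(abundant))
print(scarce, invert_ct(scarce))
```

Before inversion the abundant microRNA has the lower value; after inversion it has the higher value, so low normalized-inverted CT corresponds to low abundance.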
As used herein, a "data processing routine" refers to a process that can be embodied in software that determines the biological significance of acquired data (i.e., the ultimate results of an assay or analysis) with respect to one or more samples. For example, the data processing routine can make a determination of whether a thyroid lesion from which a sample was collected or obtained is benign or malignant, or of a specific sub-type, based upon the data collected. In the systems and methods herein, the data processing routine can also control the data collection routine based upon the results determined. The data processing routine and the data collection routines can be integrated and provide feedback to operate the data acquisition, and hence provide assay-based judging methods.
"Detection" means detecting the presence of a component in a sample. Detection also means detecting the absence of a component. Detection also means determining the level of a component, either quantitatively or qualitatively.
"Differential expression" or a "difference in expression levels" means qualitative or quantitative differences in the microRNA expression patterns in thyroid samples. Thus, a differentially expressed microRNA may qualitatively have its expression altered, including an activation or inactivation, in, e.g., normal versus diseased thyroid tissue. A qualitatively regulated microRNA may exhibit an expression pattern within a thyroid sample or cell type which may be detectable by standard techniques. Some microRNAs may be expressed in one thyroid sample or cell type, and not in others, or expressed at different levels between different cell types or different samples. Thus, the difference in expression may be quantitative, e.g., in that expression is modulated, up-regulated, resulting in an increased amount of microRNA, or down-regulated, resulting in a decreased amount of microRNA. The degree to which expression differs needs only be large enough to quantify via standard characterization techniques such as expression arrays, next generation sequencing (NGS), quantitative reverse transcriptase PCR, northern blot analysis, real-time PCR, in situ hybridization and RNase protection.
The term "expression profile" is used broadly to include a genomic expression profile, as well as an expression profile of microRNAs, for example. As used herein, expression profile means the set of data obtained for the nucleic acid (or microRNA) expression. It may refer to the raw data or to the normalized expression values. Expression profiles may be generated by any convenient means for determining a level of a nucleic acid sequence e.g. quantitative hybridization of microRNA, labeled microRNA, amplified microRNA, cDNA, etc., quantitative PCR, and the like. Further to measuring nucleic acid sequence levels, the data obtained may be normalized; normalization of data is discussed elsewhere in this application. Expression profiles allow the analysis of differential gene expression between two or more samples, as well as between samples and thresholds. Further, classifiers may be applied to expression profiles in order to obtain information about the sample, such as classification, diagnosis, sub-typing of the sample, and the like. Nucleic acid sequences of interest are nucleic acid sequences that are found to be predictive, including the nucleic acid sequences provided herein in Table 1, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of, including all of the listed nucleic acid sequences. According to some embodiments, the term "expression profile" means measuring the abundance of the nucleic acid sequences in the measured samples. In a specific embodiment, microRNA expression profiles are characterized in each thyroid sample.
"Expression ratio", as used herein, refers to relative expression levels of two or more nucleic acids, i.e. microRNAs, as determined by detecting the relative expression levels of the corresponding nucleic acids in a biological sample, such as a thyroid sample. Since microRNA expression levels are expressed as CTs, which are obtained in log scale, in practice expression ratios are obtained by subtraction of the CTs, rather than by division.
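The subtraction can be illustrated as follows. The sketch assumes ideal amplification, i.e. that the amount of product doubles in every PCR cycle, so that a CT difference is a log2 ratio; the function names and CT values are hypothetical.

```python
def log2_expression_ratio(ct_a, ct_b):
    """log2(level of A / level of B), assuming product doubles each cycle.

    A lower CT means higher abundance, hence the order of subtraction.
    """
    return ct_b - ct_a


def fold_ratio(ct_a, ct_b):
    """Linear-scale ratio recovered from the CT difference."""
    return 2.0 ** log2_expression_ratio(ct_a, ct_b)


# microRNA A crosses the threshold three cycles earlier than microRNA B.
print(log2_expression_ratio(24.0, 27.0))
print(fold_ratio(24.0, 27.0))
```

Under the stated assumption, a three-cycle difference corresponds to an eight-fold difference in abundance, without any division being performed on the measured CTs.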
As used herein, "FDR" or "False Discovery Rate", is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. When performing multiple statistical tests, for example in comparing the signal between two groups in multiple data features, there is an increasingly high probability of obtaining false positive results, by random differences between the groups that can reach levels that would otherwise be considered as statistically significant. In order to limit the proportion of such false discoveries, statistical significance is defined only for data features in which the differences reached a p-value (by two-sided t-test) below a threshold, which is dependent on the number of tests performed and the distribution of p-values obtained in these tests.
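One widely used procedure whose threshold depends on both the number of tests and the distribution of the p-values is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure for illustration; the present application does not state which FDR procedure is used, and the p-values and function name are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking the features declared significant.

    The largest rank r with p_(r) <= fdr * r / m sets the cutoff; every
    feature ranked at or below r is declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


# Hypothetical p-values from two-sided t-tests on four microRNAs.
p_values = [0.041, 0.001, 0.216, 0.008]
print(benjamini_hochberg(p_values, fdr=0.05))
```

Note that 0.041 would pass an uncorrected 0.05 cutoff but is not declared significant here, because its rank-dependent threshold is 0.0375.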
As used herein, "FNA" relates to "fine needle aspiration". Fine-needle aspiration biopsy (FNAB, FNA or NAB), or fine-needle aspiration cytology (FNAC), is a diagnostic procedure used to investigate superficial (just under the skin) lumps or masses, and it is particularly useful for thyroid lesion biopsies. A biopsy is collected by inserting a thin, hollow needle into the mass for sampling of cells that, after being stained, will be examined under a microscope. There could be cytology exam of aspirate (cell specimen evaluation, FNAC) or histological (biopsy - tissue specimen evaluation, FNAB). FNA is a popular biopsy method used for thyroid nodules since a major surgical (excisional or open) biopsy can be avoided by performing a needle aspiration biopsy instead. A detailed description of specimen collection and preparation may be found in "Atlas of Fine Needle Aspiration Cytology" by Henryk A. Domanski (2014), the contents of which are incorporated herein by reference. The preparation of aspiration specimens has been well described in the art. Usually, a suitable amount of aspirate (usually about one drop) is spread thinly and evenly over a microscopic slide which is then stained and mounted. FNA specimens prepared in this manner are also referred to as "smears". The result should be comparable to a sectioned histological slide with regard to specimen thickness and evenness. Fixation of FNA smears is usually by air drying (generally referred to as "routine air dried FNAB") or wet fixing using either 95% ethanol or cyto-spray as fixative. Other suitable liquid fixatives are methanol, acetone, isopropyl alcohol, acetone/methanol and the like. Alternatively, FNA samples may be added to or mixed with preservatives in a tube.
As referred to herein, a "follicular" lesion may be any one of follicular adenoma (FA), follicular carcinoma (FC) and follicular variant of papillary carcinoma (FVPCA).
"Fragment" is used herein to indicate a non-full-length part of a nucleic acid. Thus, a fragment is itself also a nucleic acid.
"Groove binder" and/or "minor groove binder" (MGB), as used herein, may be used interchangeably and refer to small molecules that fit into the minor groove of double-stranded DNA, typically in a sequence-specific manner. Minor groove binders may be long, flat molecules that can adopt a crescent-like shape and thus, fit snugly into the minor groove of a double helix, often displacing water. Minor groove binding molecules may typically comprise several aromatic rings connected by bonds with torsional freedom such as furan, benzene, or pyrrole rings. Minor groove binders may be antibiotics such as netropsin, distamycin, berenil, pentamidine and other aromatic diamidines, Hoechst 33258, SN 6999, aureolic anti-tumor drugs such as chromomycin and mithramycin, CC-1065, dihydrocyclopyrroloindole tripeptide (DPI3), 1,2-dihydro-(3H)-pyrrolo[3,2-e]indole-7-carboxylate (CDPI3), and related compounds and analogues, including those described in Nucleic Acids in Chemistry and Biology, 2d ed., Blackburn and Gait, eds., Oxford University Press, 1996, and PCT Published Application No. WO 03/078450, the contents of which are incorporated herein by reference. A minor groove binder may be a component of a primer, a probe, a hybridization tag complement, or combinations thereof. Minor groove binders may increase the Tm of the primer or a probe to which they are attached, allowing such primers or probes to effectively hybridize at higher temperatures.
"Identical" or "identity", as used herein in the context of two or more nucleic acid sequences, mean that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity. In cases where the two sequences are of different lengths or the alignment produces one or more staggered ends and the specified region of comparison includes only a single sequence, the residues of the single sequence are included in the denominator but not the numerator of the calculation. When comparing DNA and RNA sequences, thymine (T) and uracil (U) may be considered equivalent. Identity may be determined manually or by using a computer sequence algorithm such as BLAST, BLAST 2.0, and the like.
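The arithmetic of the identity calculation may be sketched as follows. The sketch assumes the two sequences have already been aligned, with "-" marking positions covered by only one sequence, and does not perform the optimal alignment itself; the function name and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pre-aligned region.

    Positions where only one sequence is present ("-" in the other) count
    in the denominator but not in the numerator. T and U are treated as
    equivalent so that DNA and RNA can be compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a = aligned_a.upper().replace("U", "T")
    b = aligned_b.upper().replace("U", "T")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


print(percent_identity("UGAGGUAGUA", "TGAGGTAGTA"))  # RNA versus its cDNA
print(percent_identity("TGAGGTAGTA", "TGAGCTAGTA"))  # one mismatch in ten
print(percent_identity("TGAGGTAGTA", "TGAGGTAG--"))  # staggered end
```

A dedicated alignment tool such as BLAST would normally supply the aligned strings that this function takes as input.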
"In situ detection", as used herein, means the detection of expression or expression levels in the original site, i.e., in a tissue sample such as a biopsy.
"Label", as used herein, means a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. The label may be any entity that does not naturally occur in a protein or nucleic acid and allows the nucleic acid or protein to be detectable. For example, useful labels include 32P, fluorescent dyes, electron-dense reagents, enzymes, biotin, digoxigenin, or haptens and other entities which can be made detectable, and the like. A label may be incorporated into nucleic acids and proteins at any position.
"Logistic regression" is part of a category of statistical models called generalized linear models. Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. The dependent or response variable can be dichotomous, for example, one of two possible types of cancer. Logistic regression models the natural log of the odds ratio, i.e. the ratio of the probability of belonging to the first group (P) over the probability of belonging to the second group (1-P), as a linear combination of the different expression levels (in log-space). The logistic regression output can be used as a classifier by prescribing that a case or sample will be classified into the first type if P is greater than 0.5 or 50%. Alternatively, the calculated probability P can be used as a variable in other contexts such as a 1D or 2D threshold classifier.
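The relation between the linear combination, the probability P and the 0.5 cutoff can be sketched as below. The weights, intercept and expression values are hypothetical and are not fitted to any data; in practice they would be estimated from a training cohort.

```python
import math


def logistic_probability(expression, weights, intercept):
    """P of belonging to the first group, from ln(P / (1 - P)) = b0 + sum(w * x)."""
    log_odds = intercept + sum(w * x for w, x in zip(weights, expression))
    return 1.0 / (1.0 + math.exp(-log_odds))


def classify(expression, weights, intercept, cutoff=0.5):
    p = logistic_probability(expression, weights, intercept)
    return "first" if p > cutoff else "second"


# Hypothetical model on two log-scale expression levels.
weights, intercept = [1.0, -1.0], 0.0

print(logistic_probability([6.0, 4.0], weights, intercept))
print(classify([6.0, 4.0], weights, intercept))
print(classify([4.0, 6.0], weights, intercept))
```

The cutoff is exposed as a parameter because, as noted above, P can also be fed into a threshold classifier with a cutoff other than 0.5.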
As used herein, the term "prior" refers to a probability assigned to each class, e.g., the likelihood that a sample is malignant or benign, which is used in a classification without any additional knowledge regarding the expression profile of the sample. Priors may be set at different ratios, such as for example 80%-20% malignant-benign, 75%-25% malignant-benign, 70%-30% malignant-benign, 65%-35% malignant-benign, 60%-40% malignant-benign, 50%-50% malignant-benign (i.e., uniform). In addition, priors may be empirical, i.e., based on the distribution of the samples in the training cohort. Priors may be adjusted in order to achieve a predetermined sensitivity or specificity.
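One way a prior can enter a classification is through Bayes' rule, which combines it with class-conditional likelihoods of the observed expression profile. The sketch below assumes that formulation for illustration only; the likelihood values and function name are hypothetical.

```python
def posterior_malignant(prior_malignant, likelihood_malignant, likelihood_benign):
    """Bayes' rule for two classes (malignant versus benign)."""
    weighted_malignant = prior_malignant * likelihood_malignant
    weighted_benign = (1.0 - prior_malignant) * likelihood_benign
    return weighted_malignant / (weighted_malignant + weighted_benign)


# Hypothetical likelihoods of one expression profile under each class.
l_malignant, l_benign = 0.2, 0.6

for prior in (0.5, 0.6, 0.8):
    print(prior, posterior_malignant(prior, l_malignant, l_benign))
```

Raising the malignant prior raises the posterior for the same profile, which is how adjusting priors can shift a classifier toward higher sensitivity.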
As used herein, a "marker" is a microRNA, or a nucleic acid sequence, whose presence and abundance is measured in a sample. A "marker" further provides an indication of the status of the sample. As used herein, "malignant marker" is a microRNA, or a nucleic acid sequence which is present at higher levels in malignant samples versus benign samples. A malignant marker may or may not be present in test samples.
As used herein, "secondary marker" is a microRNA, or a nucleic acid sequence, which is used to differentiate between malignant and benign samples, and for which the difference, or the ratio, in the expression levels of said secondary marker in malignant and benign samples is less than the difference, or the ratio, in the expression levels of malignant markers. A secondary marker may or may not be present in test samples.
As used herein, "cell type marker" refers to a microRNA, or nucleic acid sequence, whose expression correlates with certain cell types. Said cell types may generally be found in a sample, e.g. blood cells, white blood cells, red blood cells, epithelial cells, Hurthle cells, mitochondrial-rich cells, lymphocytes, follicular cells, parafollicular cells (C cells), metastatic cells, immune cells, macrophages and the like. Other markers included as "cell type markers" may be species-specific markers, such as markers from bacteria, fungi, and the like.
"Normalizer", as used herein, means a microRNA or a nucleic acid sequence whose signal (i.e., level of expression) is used in order to normalize each sample. A normalizer may be used alone (one microRNA as normalizer), or as part of a set of normalizers (more than one microRNA as normalizer, for example two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, sixteen or seventeen microRNAs may be used as normalizers in a set). As referred to herein, any microRNA detected in the sample may be used as a normalizer. To that effect, the microRNAs defined herein as "markers" may also be used as "normalizers". Essentially, any microRNA may be used as a normalizer. To that effect, microRNAs denoted by any one of SEQ ID NOs 1-182 may be used as normalizers. MicroRNAs denoted by any one of SEQ ID NOs. 1-37 may be used as normalizers. Particular examples of microRNAs that may be used as normalizers are hsa-miR-23a-3p, MID-20094, MID-50969, hsa-miR-345-5p, hsa-miR-3074-5p, MID-50976, MID-50971, hsa-miR-5701 and hsa-miR-574-3p.
"Normalization" of data values refers to mapping the original data range into another scale. Normalization may be done by subtracting the mean expression of the set of normalizers, subtracting the median expression of the set of normalizers, or fitting the expression values of the normalizers to a reference set of values (using a polynomial fit) and applying this fit to all signals. All the normalizers, or a subset of the normalizers may be used.
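The three options can be sketched as follows. For brevity the polynomial fit is restricted to degree one (a straight line fitted by least squares), which is an assumption of this sketch; the signal names (norm_a, marker_x, etc.) and values are hypothetical.

```python
from statistics import mean, median


def subtract_center(signals, normalizer_names, center=mean):
    """Subtract the mean (or median) normalizer signal from every signal."""
    offset = center([signals[name] for name in normalizer_names])
    return {name: value - offset for name, value in signals.items()}


def fit_line(x, y):
    """Least-squares straight line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def normalize_by_fit(signals, normalizer_names, reference):
    """Fit the sample's normalizers to reference values, then map all signals."""
    x = [signals[name] for name in normalizer_names]
    y = [reference[name] for name in normalizer_names]
    slope, intercept = fit_line(x, y)
    return {name: slope * value + intercept for name, value in signals.items()}


sample = {"norm_a": 20.0, "norm_b": 25.0, "norm_c": 30.0, "marker_x": 28.0}
reference = {"norm_a": 22.0, "norm_b": 27.0, "norm_c": 32.0}
normalizers = ["norm_a", "norm_b", "norm_c"]

print(subtract_center(sample, normalizers, center=mean)["marker_x"])
print(subtract_center(sample, normalizers, center=median)["marker_x"])
print(normalize_by_fit(sample, normalizers, reference)["marker_x"])
```

Passing a shorter list of names to any of these functions corresponds to using a subset of the normalizers.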
"Nucleic acid" or "oligonucleotide" or "polynucleotide", as used herein, means at least two nucleotides covalently linked together. The depiction of a single strand also defines the sequence of the complementary strand. Thus, a nucleic acid also encompasses the complementary strand of a depicted single strand. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. A single strand may provide a probe that hybridizes to a target sequence under stringent hybridization conditions. Thus, a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.
Nucleic acids may be single-stranded or double-stranded, or may contain portions of both double-stranded and single-stranded sequences. The nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.
A nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included. The analog may include a non-naturally occurring linkage, backbone, or nucleotide. The analog may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones; non-ionic backbones, and non-ribose backbones, including those described in US 5,235,033 and US 5,034,506, which are incorporated herein by reference. Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids. The modified nucleotide analog may be located for example at the 5'-end and/or the 3'-end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that also nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase such as uridines or cytidines modified at the 5-position, e.g., 5-(2-amino) propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; O- and N-alkylated nucleotides, e.g., N6-methyl adenosine are suitable. The 2'-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2 or CN, wherein R is C1-C6 alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al. (Nature 2005; 438:685-689), Soutschek et al. (Nature 2004; 432: 173-178), and WO 2005/079397, which are incorporated herein by reference.
Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip. The backbone modification may also enhance resistance to degradation, such as in the harsh endocytic environment of cells. The backbone modification may also reduce nucleic acid clearance by hepatocytes, such as in the liver and thyroid. Mixtures of naturally occurring nucleic acids and analogs may be made. Alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs may be made.
Thus, novel isolated nucleic acids are provided herein. The nucleic acids provided herein may be non-naturally occurring, synthesized nucleic acids. Thus, the nucleic acid provided herein may be a synthetic nucleic acid. Methods of synthesizing nucleic acids are known to the man skilled in the art, and are described, e.g., in US 7,579,451, the contents of which are incorporated herein by reference. The nucleic acids may comprise at least one of the sequences of SEQ ID NOS: 1-308 or a variant thereof. In one embodiment, the nucleic acids comprise at least one of the sequences of SEQ ID NOS: 1-182. The variant may be a complement of the referenced nucleotide sequence. The variant may be a nucleotide sequence that is 70%, 75%, 80%, 85%, 90% or 95% identical to the referenced nucleotide sequence or the complement thereof. The variant may be a nucleotide sequence which hybridizes under stringent conditions to the referenced nucleotide sequence, complements thereof, or nucleotide sequences substantially identical thereto.
A nucleic acid as described herein may have a length of from about 10 to about 250 nucleotides. The nucleic acid may have a length of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200 or 250 nucleotides. The nucleic acid may be synthesized or expressed in a cell (in vitro or in vivo) using a synthetic gene. The nucleic acid may be synthesized as a single strand molecule and hybridized to a substantially complementary nucleic acid to form a duplex. The nucleic acid may be introduced to a cell, tissue or organ in a single- or double-stranded form or capable of being expressed by a synthetic gene using methods well known to those skilled in the art, including as described in US 6,506,559, the contents of which are incorporated by reference herein.
The nucleic acid may comprise a microRNA sequence shown in Table 1, or a variant thereof. In some instances, variants of the same microRNA are also provided in Table 1. It is to be noted that SEQ ID NOs.1-180 in Table 1 present the cDNA corresponding to the sequence of the naturally occurring microRNA, i.e., the sequences present thymine (T) instead of uracil (U).
It is to be understood that nucleic acid refers to deoxyribonucleotides, ribonucleotides, or modified nucleotides, and polymers thereof in single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2'-O-methyl ribonucleotides, peptide-nucleic acids (PNAs) and unlocked nucleic acids (UNAs; see, e.g., Jensen et al. Nucleic Acids Symposium Series 52: 133-4), and derivatives thereof.
Nucleotide is used as recognized in the art, to include those with natural bases (standard), and modified bases well known in the art. Such bases are generally located at the 1' position of a nucleotide sugar moiety. Nucleotides generally comprise a base, sugar and a phosphate group. The nucleotides can be unmodified or modified at the sugar, phosphate and/or base moiety, also referred to interchangeably as nucleotide analogs, modified nucleotides, non-natural nucleotides, non-standard nucleotides and others (see, e.g., WO 92/07065; WO 93/15187; the contents of which are incorporated herein by reference). There are several examples of modified nucleic acid bases known in the art as summarized by Limbach, et al., Nucleic Acids Res. 22:2183, 1994. Some of the non-limiting examples of base modifications that can be introduced into nucleic acid molecules include hypoxanthine, purine, pyridin-4-one, pyridin-2-one, phenyl, pseudouracil, 2,4,6-trimethoxy benzene, 3-methyl uracil, dihydrouridine, naphthyl, aminophenyl, 5-alkylcytidines (e.g., 5-methylcytidine), 5-alkyluridines (e.g., ribothymidine), 5-halouridine (e.g., 5-bromouridine) or 6-azapyrimidines or 6-alkylpyrimidines (e.g., 6-methyluridine), propyne, and others (Burgin, et al., Biochemistry 35:14090, 1996). By "modified bases" in this aspect is meant nucleotide bases other than adenine, guanine, cytosine and uracil at the 1' position or their equivalents.
Modified nucleotide refers to a nucleotide that has one or more modifications to the nucleoside, the nucleobase, pentose ring, or phosphate group. Modifications include those naturally occurring that result from modification by enzymes that modify nucleotides, such as methyltransferases. Modified nucleotides also include synthetic or non-naturally occurring nucleotides. Synthetic or non-naturally occurring modifications in nucleotides include those with 2' modifications, e.g., 2'-methoxyethoxy, 2'-fluoro, 2'-allyl, 2'-O-[2-(methylamino)-2-oxoethyl], 4'-thio, 4'-CH2-O-2'-bridge, 4'-(CH2)2-O-2'-bridge, 2'-LNA or other bicyclic or "bridged" nucleoside analog, and 2'-O-(N-methylcarbamate) or those comprising base analogs. In connection with 2'-modified nucleotides as described for the present disclosure, by "amino" is meant 2'-NH2 or 2'-O-NH2, which can be modified or unmodified. Such modified groups are described, e.g., in US 5,672,695 and US 6,248,878. "Modified nucleotides" of the instant invention can also include nucleotide analogs as described above.
As used herein, "base analog" refers to a heterocyclic moiety which is located at the 1' position of a nucleotide sugar moiety in a modified nucleotide that can be incorporated into a nucleic acid duplex (or the equivalent position in a nucleotide sugar moiety substitution that can be incorporated into a nucleic acid duplex). A base analog may be generally a purine or a pyrimidine base, excluding the common bases guanine (G), cytosine (C), adenine (A), thymine (T), and uracil (U). Base analogs can duplex with other bases or base analogs in dsRNAs. Base analogs include those useful in the compounds and methods of the invention, e.g., those disclosed in US 5,432,272, US 6,001,983 and US 7,579,451, which are herein incorporated by reference. Non-limiting examples of bases include hypoxanthine (I), xanthine (X), 3-β-D-ribofuranosyl-(2,6-diaminopyrimidine) (K), 3-gamma-D-ribofuranosyl-(1-methyl-pyrazolo[4,3-d]pyrimidine-5,7(4H,6H)-dione) (P), iso-cytosine (iso-C), iso-guanine (iso-G), 1-gamma-D-ribofuranosyl-(5-nitroindole), 1-gamma-D-ribofuranosyl-(3-nitropyrrole), 5-bromouracil, 2-aminopurine, 4-thio-dT, 7-(2-thienyl)-imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa), 2-amino-6-(2-thienyl)purine (S), 2-oxopyridine (Y), difluorotolyl, 4-fluoro-6-methylbenzimidazole, 4-methylbenzimidazole, 3-methyl isocarbostyrilyl, 5-methyl isocarbostyrilyl, and 3-methyl-7-propynyl isocarbostyrilyl, 7-azaindolyl, 6-methyl-7-azaindolyl, imidizopyridinyl, 9-methyl-imidizopyridinyl, pyrrolopyrizinyl, isocarbostyrilyl, 7-propynyl isocarbostyrilyl, propynyl-7-azaindolyl, 2,4,5-trimethylphenyl, 4-methylindolyl, 4,6-dimethylindolyl, phenyl, napthalenyl, anthracenyl, phenanthracenyl, pyrenyl, stilbenzyl, tetracenyl, pentacenyl, and structural derivatives thereof (Schweitzer et al., J. Org. Chem., 59:7238-7242 (1994); Berger et al., Nucleic Acids Research, 28(15):2911-2914 (2000); Moran et al., J. Am. Chem. Soc., 119:2056-2057 (1997); Morales et al., J. Am. Chem. Soc., 121:2323-2324 (1999); Guckian et al., J. Am. Chem. Soc., 118:8182-8183 (1996); Morales et al., J. Am. Chem. Soc., 122(6):1001-1007 (2000); McMinn et al., J. Am. Chem. Soc., 121:11585-11586 (1999); Guckian et al., J. Org. Chem., 63:9652-9656 (1998); Moran et al., Proc. Natl. Acad. Sci., 94:10506-10511 (1997); Das et al., J. Chem. Soc., Perkin Trans., 1:197-206 (2002); Shibata et al., J. Chem. Soc., Perkin Trans., 1:1605-1611 (2001); Wu et al., J. Am. Chem. Soc., 122(32):7621-7632 (2000); O'Neill et al., J. Org. Chem., 67:5869-5875 (2002); Chaudhuri et al., J. Am. Chem. Soc., 117:10434-10442 (1995); and U.S. Pat. No. 6,218,108). Base analogs may also be a universal base.
"Universal base" refers to a heterocyclic moiety located at the 1' position of a nucleotide sugar moiety in a modified nucleotide, or the equivalent position in a nucleotide sugar moiety substitution, that, when present in a nucleic acid duplex, can be positioned opposite more than one type of base without altering the double helical structure (e.g., the structure of the phosphate backbone). Additionally, the universal base does not destroy the ability of the single stranded nucleic acid in which it resides to duplex to a target nucleic acid.
Table 1: The microRNAs of the invention
[First portion of Table 1: rendered as an image (imgf000051_0001) in the source and not reproduced in this text.]
hsa-miR-574-3p 36 CACGCTCATGCACACACCCAC
37 CACGCTCATGCACACACCCACA
hsa-miR-7-5p 38 TGGAAGACTAGTGATTTTGTTGT
hsa-miR-10a-5p 39 TACCCTGTAGATCCGAATTTGTG
hsa-miR-18a-5p 40 TAAGGTGCATCTAGTGCAGATAG
hsa-miR-21-3p 41 CAACACCAGTCGATGGGCTGT
hsa-miR-21-5p 42 TAGCTTATCAGACTGATGTTGA
hsa-miR-30e-5p 43 TGTAAACATCCTTGACTGGAAG
hsa-miR-31-3p 44 TGCTATGCCAACATATTGCCAT
hsa-miR-34a-5p 45 TGGCAGTGTCTTAGCTGGTTGTT
hsa-miR-92b-5p 46 AGGGACGGGACGCGGTGCAGTG
hsa-miR-96-5p 47 TTTGGCACTAGCACATTTTTGCT
hsa-miR-100-5p 48 AACCCGTAGATCCGAACTTGTG
hsa-miR-126-3p 49 TCGTACCGTGAGTAATAATGCG
hsa-miR-138-1-3p 50 GCTACTTCACAACACCAGGGCC
hsa-miR-140-3p 51 TACCACAGGGTAGAACCACGG
hsa-miR-141-3p 52 TAACACTGTCTGGTAAAGATGG
hsa-miR-142-3p 53 TGTAGTGTTTCCTACTTTATGGA
hsa-miR-142-5p 54 CATAAAGTAGAAAGCACTACT
hsa-miR-146b-3p 55 TGCCCTGTGGACTCAGTTCTGG
hsa-miR-146a-5p 56 TGAGAACTGAATTCCATGGGTT
hsa-miR-148a-3p 57 TCAGTGCACTACAGAACTTTGT
hsa-miR-150-3p 58 CTGGTACAGGCCTGGGGGACAG
hsa-miR-150-5p 59 TCTCCCAACCCTTGTACCAGTG
hsa-miR-155-5p 60 TTAATGCTAATCGTGATAGGGGT
hsa-miR-181a-5p 61 AACATTCAACGCTGTCGGTGAGT
hsa-miR-181b-5p 62 AACATTCATTGCTGTCGGTGGGT
hsa-miR-182-5p 63 TTTGGCAATGGTAGAACTCACACT
hsa-miR-187-3p 64 TCGTGTCTTGTGTTGCAGCCGG
hsa-miR-193a-3p 65 AACTGGCCTACAAAGTCCCAGT
hsa-miR-195-5p 66 TAGCAGCACAGAAATATTGGC
hsa-miR-197-5p 67 CGGGTAGAGAGGGCAGTGGGAGG
hsa-miR-199a-3p 68 ACAGTAGTCTGCACATTGGTTA
hsa-miR-200a-3p 69 TAACACTGTCTGGTAACGATGTT
hsa-miR-200b-3p 70 TAATACTGCCTGGTAATGATGA
hsa-miR-199a-5p 71 CCCAGTGTTCAGACTACCTGTTC
hsa-miR-199b-5p 72 CCCAGTGTTTAGACTATCTGTTC
hsa-miR-205-5p 73 TCCTTCATTCCACCGGAGTCTG
hsa-miR-210-3p 74 CTGTGCGTGTGACAGCGGCTGA
hsa-miR-214-3p 75 ACAGCAGGCACAGACAGGCAGT
hsa-miR-221-3p 76 AGCTACATTGTCTGCTGGGTTTC
hsa-miR-221-5p 77 ACCTGGCATACAATGTAGATTT
hsa-miR-223-3p 78 TGTCAGTTTGTCAAATACCCCA
hsa-miR-222-5p 79 CTCAGTAGCCAGTGTAGATCCT
hsa-miR-224-5p 80 CAAGTCACTAGTGGTTCCGTTTAG
hsa-miR-342-5p 81 AGGGGTGCTATCTGTGATTGA
hsa-miR-429 82 TAATACTGTCTGGTAAAACCGT
hsa-miR-455-3p 83 GCAGTCCATGGGCATATACAC
hsa-miR-483-5p 84 AAGACGGGAGGAAAGAAGGGAG
hsa-miR-487b-3p 85 AATCGTACAGGGTCATCCACTT
hsa-miR-497-5p 86 CAGCAGCACACTGTGGTTTGT
hsa-miR-513a-5p 87 TTCACAGGGAGGTGTCATTTAT
hsa-miR-542-5p 88 TCGGGGATCATCATGTCACGAGA
hsa-miR-625-5p 89 AGGGGGAAAGTTCTATAGTCC
hsa-miR-650 90 AGGAGGCAGCGCTCTCAGGAC
hsa-miR-658 91 GGCGGAGGGAAGTAGGTCCGTTGGT
hsa-miR-664b-5p 92 TGGGCTAAGGGAGATGATTGGGTA
hsa-miR-708-5p 93 AAGGAGCTTACAATCTAGCTGGG
hsa-miR-765 94 TGGAGGAGAAGGAAGGTGATG
hsa-miR-1229-5p 95 GTGGGTAGGGTTTGGGGGAGAGCG
hsa-miR-2392 96 TAGGATGGGGGTGAGAGGTG
hsa-miR-3141 97 GAGGGCGGGTGGAGGAGGA
hsa-miR-3162-5p 98 TTAGGGAGTAGAAGGGTGGGGAG
hsa-miR-3679-5p 99 TGAGGATATGGCAGGGAAGGGGA
hsa-miR-3687 100 CCCGGACAGGCGTTCGTGCGACGT
hsa-miR-3940-5p 101 GTGGGTTGGGGCGGGCTCTG
hsa-miR-4270 102 TCAGGGAGTCAGGGGAGGGC
hsa-miR-4284 103 GGGCTCACATCACCCCAT
hsa-miR-4443 104 TTGGAGGCGTGGGTTTT
hsa-miR-4447 105 GGTGGGGGCTGTTGTTT
hsa-miR-4448 106 GGCTCCTTGGTCTAGGGGTA
hsa-miR-4454 107 GGATCCGAGTCACGGCACCA
hsa-miR-4534 108 GGATGGAGGAGGGGTCT
hsa-miR-4538 109 GAGCTTGGATGAGCTGGGCTGA
hsa-miR-4539 110 GCTGAACTGGGCTGAGCTGGGC
hsa-miR-4689 111 TTGAGGAGACATGGTGGGGGCC
hsa-miR-4690-5p 112 GAGCAGGCGAGGCTGGGCTGAA
hsa-miR-4739 113 AAGGGAGGAGGAGCGGAGGGGCCCT
hsa-miR-5001-5p 114 AGGGCTGGACTCAGCGGCGGAGCT
hsa-miR-5100 115 TTCAGATCCCAGCGGTGCCTCT
hsa-miR-5684 116 AACTCTAGCCTGAGCAACAG
hsa-miR-5698 117 TGGGGGAGTGCAGTGATTGTGG
hsa-miR-5739 118 GCGGAGAGAGAATGGGGAGC
hsa-miR-6076 119 AGCATGACAGAGGAGAGGTGG
hsa-miR-6086 120 GGAGGTTGGGAAGGGCAGAG
hsa-miR-6127 121 TGAGGGAGTGGGTGGGAGG
MID-00078 122 AAGTGATTGGAGGTGGGTGGGG
MID-00321 123 CCTGTCTGAGCGACGCT
MID-00387 124 GAGACTCTCCTGTGCAG
MID-00671 125 TGCAGATTGTGGGTGGGAGGAC
MID-00672 126 TGCAGCTGGTGGAGTCTGGGGG
MID-00690 127 TGGAGAAGACTGGAGAGGGTAT
MID-15965 128 ACTACCCCAGGATGCCAGCATAGTT
MID-16318 129 AGCTGGTTTGATGGGGAGCCAT
MID-17144 130 CACTGATTATCGAGGCGATTCT
MID-17866 131 CGCCTGTGAATAGTCACTGCAC
MID-18468 132 GACGTGAGGGGGTGCTACATAC
MID-19433 133 GGCTGGTCCGAAGGTAGTGAGTT
MID-19434 134 GGCTGGTCCGAGTGCAGTGGTGTTT
MID-23168 135 TGTCCAAAGTAAACGCCCTGACGCA
MID-23794 136 TTCCCGGCCAATGCATTA
MID-24496 137 TTTGGAGGGGCCGTGACAGATG
MID-24705 138 CTCCCACTGCTTCACTTGACTA
MD2-495 139 NGGGCCGAGGGAGCGAGAG1
MD2-437 140 AGUGCUUGGCUGAGGAGCU
hsa-let-7a-5p 141 TGAGGTAGTAGGTTGTATAGTT
hsa-let-7b-5p 142 TGAGGTAGTAGGTTGTGTGGTT
hsa-let-7c-5p 143 TGAGGTAGTAGGTTGTATGGTT
hsa-let-7d-5p 144 AGAGGTAGTAGGTTGCATAGTT
hsa-let-7f-5p 145 TGAGGTAGTAGATTGTATAGTT
hsa-let-7g-5p 146 TGAGGTAGTAGTTTGTACAGTT
hsa-let-7i-5p 147 TGAGGTAGTAGTTTGTGCTGTT
hsa-miR-103a-2-5p 148 AGCTTCTTTACAGTGCTGCCTTG
hsa-miR-103a-3p 149 AGCAGCATTGTACAGGGCTATGA
hsa-miR-106a-5p 150 AAAAGTGCTTACAGTGCAGGTAGC
hsa-miR-106b-5p 151 TAAAGTGCTGACAGTGCAGAT
hsa-miR-107 152 AGCAGCATTGTACAGGGCTATCA
hsa-miR-125a-5p 153 TCCCTGAGACCCTTTAACCTGTGA
hsa-miR-144-3p 154 TACAGTATAGATGATGTACT
hsa-miR-149-5p 155 TCTGGCTCCGTGTCTTCACTCCC
hsa-miR-151a-5p 156 TCGAGGAGCTCACAGTCTAGTA
hsa-miR-15b-5p 157 TAGCAGCACATCATGGTTTACA
hsa-miR-16-1-3p 158 CCAGTATTAACTGTGCTGCTGA
hsa-miR-16-5p 159 TAGCAGCACGTAAATATTGGCG
hsa-miR-17-3p 160 ACTGCAGTGAAGGCACTTGTAG
hsa-miR-17-5p 161 CAAAGTGCTTACAGTGCAGGTAGT
hsa-miR-185-5p 162 TGGAGAGAAAGGCAGTTCCTGA
hsa-miR-191-5p 163 CAACGGAATCCCAAAAGCAGCTG
hsa-miR-192-3p 164 CTGCCAATTCCATAGGTCACAG
hsa-miR-19b-3p 165 TGTGCAAATCCATGCAAAACTGA
hsa-miR-20a-5p 166 TAAAGTGCTTATAGTGCAGGTAG
hsa-miR-25-3p 167 CATTGCACTTGTCTCGGTCTGA
hsa-miR-26a-5p 168 TTCAAGTAATCCAGGATAGGCT
hsa-miR-26b-5p 169 TTCAAGTAATTCAGGATAGGT
hsa-miR-30b-5p 170 TGTAAACATCCTACACTCAGCT
hsa-miR-30c-5p 171 TGTAAACATCCTACACTCTCAGC
hsa-miR-30d-5p 172 TGTAAACATCCCCGACTGGAAG
hsa-miR-320a 173 AAAAGCTGGGTTGAGAGGGCGAA
hsa-miR-3648 174 AGCCGCGGGGATCGCCGAGGG
hsa-miR-376a-3p 175 ATCATAGAGGAAAATCCACGT
hsa-miR-425-5p 176 AATGACACGATCACTCCCGTTGA
hsa-miR-4306 177 TGGAGAGAAAGGCAGTA
hsa-miR-4324 178 CCCTGAGACCCTAACCTTAA
hsa-miR-484 179 TCAGGCTCAGTCCCCTCCCGAT
hsa-miR-624-5p 180 TAGTACCAGTACCTTGTGTTCA
hsa-miR-92a-3p 181 TATTGCACTTGTCCCGGCCTGT
hsa-miR-93-5p 182 CAAAGTGCTGTTCGTGCAGGTAG
1 "N" may be any one of G, C, A, T/U.
miR name is the miRBase registry name (release 20), except for the miR names represented by MID- [numeral] or MD2- [numeral].
MID-00078, MID-00321, MID-00387, MID-00671, MID-00672, MID-00690, MID-15965, MID-16318, MID-17144, MID-17866, MID-18468, MID-19433, MID-19434, MID-23168, MID-23794, MID-24496, MID-24705, MD2-495 and MD2-437 are putative microRNAs, which were predicted and/or cloned at Rosetta Genomics.
The nucleic acid may also comprise a miR hairpin sequence shown in Table 2, or a variant thereof.
Table 2: Hairpins of the microRNAs of the invention
miR name    Hairpin SEQ ID NO.    Hairpin Sequence
hsa-mir-7 183 GTGGACCGGCTGGCCCCATCTGGAAGACTAGTGATTTTGTTGTTGTCTTACTGCGCTCAACAACAAATCCCAGTCTACCTAATGGTGCCAGCCATCGC
hsa-mir-10a 184 GTCTTCTGTATATACCCTGTAGATCCGAATTTGTGTAAGGAATTTTGTGGTCACAAATTCGTATCTAGGGGAATATGTAGTTGAC
hsa-mir-18a 185 GTTCTAAGGTGCATCTAGTGCAGATAGTGAAGTAGATTAGCATCTACTGCCCTAAGTGCTCCTTCTGGC
hsa-mir-21 186 GTACCACCTTGTCGGGTAGCTTATCAGACTGATGTTGACTGTTGAATCTCATGGCAACACCAGTCGATGGGCTGTCTGACATTTTGGTAT
hsa-mir-23a 187 GGCCGGCTGGGGTTCCTGGGGATGGGATTTGCTTCCTGTCACAAATCACATTGCCAGGGATTTCCAACCGACC
hsa-mir-30e 188 GGCAGTCTTTGCTACTGTAAACATCCTTGACTGGAAGCTGTAAGGTGTTCAGAGGAGCTTTCAGTCGGATGTTTACAGCGGCAGGCTGCC
hsa-mir-31 189 GGAGAGGAGGCAAGATGCTGGCATAGCTGTTGAACTGGGAACCTGCTATGCCAACATATTGCCATCTTTCC
hsa-mir-34a 190 GTGAGTGTTTCTTTGGCAGTGTCTTAGCTGGTTGTTGTGAGCAATAGTAAGGAAGCAATCAGCAAGTATACTGCCCTAGAAGTGCTGCAC
hsa-mir-92b 191 GGGGAGCGGGATCCCGGGCCCCGGGCGGGCGGGAGGGACGGGACGCGGTGCAGTGTTGTTTTTTCCCCCGCCAATATTGCACTCGTCCCGGCCTCCGGCCCCCCCGGCCCCCCGGCCTCCCCGCTACCCC
hsa-mir-96 192 TCTGCTTGGCCGATTTTGGCACTAGCACATTTTTGCTTGTGTCTCTCCGCTCTGAGCAATCATGTGCAGTGCCAATATGGGAAAAGCAGG
hsa-mir-100 193 GCCTGTTGCCACAAACCCGTAGATCCGAACTTGTGGTATTAGTCCGCACAAGCTTGTATCTATAGGTATGTGTCTGTTAGGC
hsa-mir-126 194 GCTGGCGACGGGACATTATTACTTTTGGTACGCGCTGTGACACTTCAAACTCGTACCGTGAGTAATAATGCGCCGTCCACGGC
hsa-mir-125b-1 195 TGCGCTCCTCTCAGTCCCTGAGACCCTAACTTGTGATGTTTACCGTTTAAATCCACGGGTTAGGCTCTTGGGAGCTGCGAGTCGTGCT
hsa-mir-125b-2 196 ACCAGACTTTTCCTAGTCCCTGAGACCCTAACTTGTGAGGTATTTTAGTAACATCACAAGTCAGGCTCTTGGGACCTAGGCGGAGGGGA
hsa-mir-138-1 197 TGGTGTGGTGGGGCAGCTGGTGTTGTGAATCAGGCCGTTGCCAATCAGAGAACGGCTACTTCACAACACCAGGGCCACACCACACTA
hsa-mir-138-1 198 CCCTGGCATGGTGTGGTGGGGCAGCTGGTGTTGTGAATCAGGCCGTTGCCAATCAGAGAACGGCTACTTCACAACACCAGGGCCACACCACACTACAGG
hsa-mir-138-2 199 CGTTGCTGCAGCTGGTGTTGTGAATCAGGCCGACGAGCAGCGCATCCTCTTACCCGGCTATTTCACGACACCAGGGTTGCATCA
hsa-mir-138-2 200 GAGGAAGCCGGCGGAGTTCTGGTATCGTTGCTGCAGCTGGTGTTGTGAATCAGGCCGACGAGCAGCGCATCCTCTTACCCGGCTATTTCACGACACCAGGGTTGCATCATACCCATCCTCTCCAGGCGAGCCTC
hsa-mir-140 201 GCGCCCTGTGTGTGTCTCTCTCTGTGTCCTGCCAGTGGTTTTACCCTATGGTAGGTTACGTCATGCTGTTCTACCACAGGGTAGAACCACGGACAGGATACCGGGGCACCCTCTGCGT
hsa-mir-141 202 GTCGGCCGGCCCTGGGTCCATCTTCCAGTACAGTGTTGGATGGTCTAATTGTGAAGCTCCTAACACTGTCTGGTAAAGATGGCTCCCGGGTGGGTTCTCTCGGC
hsa-mir-142 203 ACAGTGCAGTCACCCATAAAGTAGAAAGCACTACTAACAGCACTGGAGGGTGTAGTGTTTCCTACTTTATGGATGAGTGTACTGT
hsa-mir-146b 204 CCTGGCACTGAGAACTGAATTCCATAGGCTGTGAGCTCTAGCAATGCCCTGTGGACTCAGTTCTGGTGCCCGG
hsa-mir-146a 205 GTATCCTCAGCTTTGAGAACTGAATTCCATGGGTTGTGTCAGTGTCAGACCTCTGAAATTCAGTTCTTCAGCTGGGATAT
hsa-mir-148a 206 GGTCTTTTGAGGCAAAGTTCTGAGACACTCCGACTCTGAGTATGATAGAAGTCAGTGCACTACAGAACTTTGTCTCTAGAGGCT
hsa-mir-150 207 TCCCCATGGCCCTGTCTCCCAACCCTTGTACCAGTGCTGGGCTCAGACCCTGGTACAGGCCTGGGGGACAGGGACCTGGGGA
hsa-mir-152 208 GTCCCCCCCGGCCCAGGTTCTGTGATACACTCCGACTCGGGCTCTGGAGCAGTCAGTGCATGACAGAACTTGGGCCCGGAAGGAC
hsa-mir-152 209 TGTCCCCCCCGGCCCAGGTTCTGTGATACACTCCGACTCGGGCTCTGGAGCAGTCAGTGCATGACAGAACTTGGGCCCGGAAGGACC
hsa-mir-155 210 TAGGCTGTATGCTGTTAATGCTAATCGTGATAGGGGTTTTTGCCTCCAACTGACTCCTACATATTAGCATTAACAGTGTATGATGCCTG
hsa-mir-181a 211 GGTTGCTTCAGTGAACATTCAACGCTGTCGGTGAGTTTGGAATTAAAATCAAAACCATCGACCGTTGATTGTACCCTATGGCTAACC
hsa-mir-181b 212 GGTCACAATCAACATTCATTGCTGTCGGTGGGTTGAACTGTGTGGACAAGCTCACTGAACAATGAATGCAACTGTGGCC
hsa-mir-181c 213 CGGAAAATTTGCCAAGGGTTTGGGGGAACATTCAACCTGTCGGTGAGTTTGGGCAGCTCAGGCAAACCATCGACCGTTGAGTGGACCCTGAGGCCTGGAATTGCCATCCT
hsa-mir-182 214 CCTCCCCCCGTTTTTGGCAATGGTAGAACTCACACTGGTGAGGTAACAGGATCCGGTGGTTCTAGACTTGCCAACTATGGGGCGAGG
hsa-mir-187 215 CCTCGGGCTACAACACAGGACCCGGGCGCTGCTCTGACCCCTCGTGTCTTGTGTTGCAGCCGGAGG
hsa-mir-193a 216 GGGAGCTGAGGGCTGGGTCTTTGCGGGCGAGATGAGGGTGTCGGATCAACTGGCCTACAAAGTCCCAGTTCTCGGCCCC
hsa-mir-195 217 CCTGGCTCTAGCAGCACAGAAATATTGGCACAGGGAAGCGAGTCTGCCAATATTGGCTGTGCTGCTCCAGG
hsa-mir-197 218 TGTGCTCTGGGGGCTGTGCCGGGTAGAGAGGGCAGTGGGAGGTAAGAGCTCTTCACCCTTCACCACCTTCTCCACCCAGCATGGCCGGCACA
hsa-mir-199a 219 GGCCCCGCCAACCCAGTGTTCAGACTACCTGTTCAGGAGGCTCTCAATGTGTACAGTAGTCTGCACATTGGTTAGGCTGGGCT
hsa-mir-200a 220 GAGCATCTTACCGGACAGTGCTGGATTTCCCAGCTTGACTCTAACACTGTCTGGTAACGATGTTC
hsa-mir-200b 221 GCTCGGGCAGCCGTGGCCATCTTACTGGGCAGCATTGGATGGAGTCAGGTCTCTAATACTGCCTGGTAATGATGACGGCGGAGCCCTGC
hsa-mir-200c 222 GGGCGGGGGCCCTCGTCTTACCCAGCAGTGTTTGGGTGCGGTTGGGAGTCTCTAATACTGCCGGGTAATGATGGAGGCCCCTGTCC
hsa-mir-200c 223 CCCTCGTCTTACCCAGCAGTGTTTGGGTGCGGTTGGGAGTCTCTAATACTGCCGGGTAATGATGGAGG
hsa-mir-199a 224 GGCCCCGCCAACCCAGTGTTCAGACTACCTGTTCAGGAGGCTCTCAATGTGTACAGTAGTCTGCACATTGGTTAGGCTGGGCT
hsa-mir-199b 225 GTCTACCCAGTGTTTAGACTATCTGTTCAGGACTCCCAAATTGTACAGTAGTCTGCACATTGGTTAGGC
hsa-mir-205 226 TCCATGTGCTTCTCTTGTCCTTCATTCCACCGGAGTCTGTCTCATACCCAACCAGATTTCAGTGGAGTGAAGTTCAGGAGGCATGGA
hsa-mir-210 227 CCAGGCGCAGGGCAGCCCCTGCCCACCGCACACTGCGCTGCCCCAGACCCACTGTGCGTGTGACAGCGGCTGATCTGTGCCTGG
hsa-mir-214 228 GGCTGGACAGAGTTGTCATGTGTCTGCCTGTCTACACTTGCTGTGCAGAACATCCGCTCACCTGTACAGCAGGCACAGACAGGCAGTCACATGACAACCCAGCC
hsa-mir-221 229 GAACATCCAGGTCTGGGGCATGAACCTGGCATACAATGTAGATTTCTGTGTTCGTTAGGCAACAGCTACATTGTCTGCTGGGTTTCAGGCTACCTGGAAACATGTTC
hsa-mir-222 230 CAGCTGCTGGAAGGTGTAGGTACCCTCAATGGCTCAGTAGCCAGTGTAGATCCTGTCTTTCGTAATCAGCAGCTACATCTGGCTACTGGGTCTCTGATGGCATCTTCTAGCTTCTG
hsa-mir-222 231 GCTGCTGGAAGGTGTAGGTACCCTCAATGGCTCAGTAGCCAGTGTAGATCCTGTCTTTCGTAATCAGCAGCTACATCTGGCTACTGGGTCTCTGATGGCATCTTCTAGCT
hsa-mir-223 232 GCTCTTGGCCTGGCCTCCTGCAGTGCCACGCTCCGTGTATTTGACAAGCTGAGTTGGACACTCCATGTGGTAGAGTGTCAGTTTGTCAAATACCCCAAGTGCGGCACATGCTTACCAGCTCTAGGCCAGGGC
hsa-mir-224 233 GGGGCTTTCAAGTCACTAGTGGTTCCGTTTAGTAGATGATTGTGCATTGTTTCAAAATGGTGCCCTAGTGACTACAAAGCCCC
hsa-mir-342 234 GTGAAACTGGGCTCAAGGTGAGGGGTGCTATCTGTGATTGAGGGACATGGTTAATGGAATTGTCTCACACAGAAATCGCACCCGTCACCTTGGCCTACTTATCAC
hsa-mir-342 235 GAAACTGGGCTCAAGGTGAGGGGTGCTATCTGTGATTGAGGGACATGGTTAATGGAATTGTCTCACACAGAAATCGCACCCGTCACCTTGGCCTACTTA
hsa-mir-345 236 ACCCAAACCCTAGGTCTGCTGACTCCTAGTCCAGGGCTCGTGATGGCTGGTGGGCCCTGAACGAGGGGTCTGGAGGCCTGGGTTTGAATATCGACAGC
hsa-mir-346 237 GGTCTCTGTGTTGGGCGTCTGTCTGCCCGCATGCCTGCCTCTCTGTTGCTCTGAAGGAGGCAGGGGCTGGGCCTGCAGCTGCCTGGGCAGAGCGG
hsa-mir-375 238 CGCTCCCGCCCCGCGACGAGCCCCTCGCACAAACCGGACCTGAGCGTTTTGTTCGTTCGGCTCGCGTGAGGCAGGGGCG
hsa-mir-375 239 CCCCGCGACGAGCCCCTCGCACAAACCGGACCTGAGCGTTTTGTTCGTTCGGCTCGCGTGAGGC
hsa-mir-424 240 CGAGGGGATACAGCAGCAATTCATGTTTTGAAGTGTTCTAAATGGTTCAAAACGTGAGGCGCTGCTATACCCCCTCGTGGGGAAGGTAGAAGGTGGGG
hsa-mir-429 241 GATGGGCGTCTTACCAGACATGGTTAGACCTGGCCCTCTGTCTAATACTGTCTGGTAAAACCGTCCATC
hsa-mir-455 242 GGCGTGAGGGTATGTGCCTTTGGACTACATCGTGGAAGCCAGCACCATGCAGTCCATGGGCATATACACTTGCCTCAAGGCC
hsa-mir-483 243 ACCCCAAGGTGGAGCCCCCAGCGACCTTCCCCTTCCAGCTGAGCATTGCTGTGGGGGAGAGGGGGAAGACGGGAGGAAAGAAGGGAGTGGTTCCATCACGCCTCCTCACTCCTCTCCTCCCGTCTTCTCCTCTCCTGCCCTTGTCTCCCTGTCTCAGCAGCTCCAGGGGTGGTGTGGGCCCCTCCAGCCTCCTAGGTGGT
hsa-mir-487b 244 GTGCTAACCTTTGGTACTTGGAGAGTGGTTATCCCTGTCCTGTTCGTTTTGCTCATGTCGAATCGTACAGGGTCATCCACTTTTTCAGTATCAAGAGCGC
hsa-mir-486 245 CTGATCTCCATCCTCCCTGGGGCATCCTGTACTGAGCTGCCCCGAGGCCCTTCATGCTGCCCAGCTCGGGGCAGCTCAGTACAGGATACTCGGGGTGGGAGTCAGCAGGAGGTGAG
hsa-mir-486 246 GCATCCTGTACTGAGCTGCCCCGAGGCCCTTCATGCTGCCCAGCTCGGGGCAGCTCAGTACAGGATAC
hsa-mir-486-2 247 TCCTGTACTGAGCTGCCCCGAGCTGGGCAGCATGAAGGGCCTCGGGGCAGCTCAGTACAGGATG
hsa-mir-497 248 CGGTCCTGCTCCCGCCCCAGCAGCACACTGTGGTTTGTACGGCACTGTGGCCACGTCCAAACCACACTGTGGTGTTAGAGCGAGGGTGGGGGAGGCACCG
hsa-mir-513a 249 GGGATGCCACATTCAGCCATTCAGCGTACAGTGCCTTTCACAGGGAGGTGTCATTTATGTGAACTAAAATATAAATTTCACCTTTCTGAGAAGGGTAATGTACAGCATGCACTGCATATGTGGTGTCCC
hsa-mir-542 250 GGATGCACAGATCTCAGACATCTCGGGGATCATCATGTCACGAGATACCAGTGTGCACTTGTGACAGATTGATAACTGAAAGGTCTGGGAGCCACTCATCT
hsa-mir-551b 251 TGCCAGATGTGCTCTCCTGGCCCATGAAATCAAGCGTGGGTGAGACCTGGTGCAGAACGGGAAGGCGACCCATACTTGGTTTCAGAGGCTGTGAGAATAACTGCA
hsa-mir-551b 252 AGATGTGCTCTCCTGGCCCATGAAATCAAGCGTGGGTGAGACCTGGTGCAGAACGGGAAGGCGACCCATACTTGGTTTCAGAGGCTGTGAGAATAA
hsa-mir-574 253 GGGACCTGCGTGGGTGCGGGCGTGTGAGTGTGTGTGTGTGAGTGTGTGTCGCTCCGGGTCCACGCTCATGCACACACCCACACGCCCACACTCAGG
hsa-mir-625 254 TGGTAAGGGTAGAGGGATGAGGGGGAAAGTTCTATAGTCCTGTAATTAGATCTCAGGACTATAGAACTTTCCCCCTCATCCCTCTGCCCTCTACCA
hsa-mir-650 255 TCTCAGGAGGCAGCGCTCTCAGGACGTCACCACCATGGCCTGGGCTCTGCTCCTCCTCA
hsa-mir-658 256 CTCGGTTGCCGTGGTTGCGGGCCCTGCCCGCCCGCCAGCTCGCTGACAGCACGACTCAGGGCGGAGGGAAGTAGGTCCGTTGGTCGGTCGGGAACGAG
hsa-mir-664b 257 GTTCAGTCCAGGGCAGCTTCCCTGTTCTGTTAATTAAACTTTGGGACATTAAAATGGGCTAAGGGAGATGATTGGGTAGAAAGTATTATTCTATTCATTTGCCTCCCAGCCTACAAAAATGCCTGCTTGGGGTCTAATACTTCAACGGTTAAAGATGCCTGGAAGAGGGC
hsa-mir-708 258 GGTAACTGCCCTCAAGGAGCTTACAATCTAGCTGGGGGTAAATGACTTGCACATGAACACAACTAGACTGTGAGCTTCTAGAGGGCAGGGACC
hsa-mir-765 259 TTAGGCGCTGATGAAAGTGGAGTTCAGTAGACAGCCCTTTTCAAGCCCTACGAGAAACTGGGGTTTCTGGAGGAGAAGGAAGGTGATGAAGGATCTGTTCTCGTGAGCCTGA
hsa-mir-1229 260 GTGGGTAGGGTTTGGGGGAGAGCGTGGGCTGGGGTTCAGGGACACCCTCTCACCACTGCCCTCCCACAG
hsa-mir-2392 261 TGGTCCCTCCCAATCCAGCCATTCCTCAGACCAGGTGGCTCCCGAGCCACCCCAGGCTGTAGGATGGGGGTGAGAGGTGCTA
hsa-mir-3074 262 GCTCGACTCCTGTTCCTGCTGAACTGAGCCAGTGTGTAAAATGAGAACTGATATCAGCTCAGTAGGCACCGGAGGGCGGGT
hsa-mir-3141 263 CCCGGTGAGGGCGGGTGGAGGAGGAGGGTCCCCACCATCAGCCTTCACTGGGACGGG
hsa-mir-3162 264 AAGTTAATTTTGAAGCTGACTTTTTTAGGGAGTAGAAGGGTGGGGAGCATGAACAATGTTTCTCACTCCCTACCCCTCCACTCCCCAAAAAAGTCAGCTTCTCTTGTTAACTT
hsa-mir-3679 265 GGCCCCACGTGGTGAGGATATGGCAGGGAAGGGGAGTTTCCCTCTATTCCCTTCCCCCCAGTAATCTTCATCATGCGGTGTC
hsa-mir-3687 266 GCGCGTGCGCCCGAGCGCGGCCCGGTGGTCCCTCCCGGACAGGCGTTCGTGCGACGTGT
hsa-mir-3940 267 GAGGAAAAGATCGAGGTGGGTTGGGGCGGGCTCTGGGGATTTGGTCTCACAGCCCGGATCCCAGCCCACTTACCTTGGTTACTCTCCTT
hsa-mir-4270 268 CAAATAGCTTCAGGGAGTCAGGGGAGGGCAGAAATAGATGGCCTTCCCCTGCTGGGAAGAAAGTG
hsa-mir-4284 269 TTCTGTGAGGGGCTCACATCACCCCATCAAAGTGGGGACTCATGGGGAGAGGGGGTAGTTAGGAGCTTTGATAGAG
hsa-mir-4443 270 GGTGGGGGTTGGAGGCGTGGGTTTTAGAACCTATCCCTTTCTAGCCCTGAGCA
hsa-mir-4447 271 GTTCTAGAGCATGGTTTCTCATCATTTGCACTACTGATACTTGGGGTCAGATAATTGTTTGTGGTGGGGGCTGTTGTTTGCATTGTAGGAT
hsa-mir-4448 272 GGAGTGACCAAAAGACAAGAGTGCGAGCCTTCTATTATGCCCAGACAGGGCCACCAGAGGGCTCCTTGGTCTAGGGGTAATGCC
hsa-mir-4454 273 CCGGATCCGAGTCACGGCACCAAATTTCATGCGTGTCCGTGTGAAGAGACCACCA
hsa-mir-4534 274 GTGAATGACCCCCTTCCAGAGCCAAAATCACCAGGGATGGAGGAGGGGTCTTGGGTAC
hsa-mir-4538 275 AACTGGGCTGGGCTGAACTGGGCTGGGCTGAGCTGAGCTTGGATGAGCTGGGCTGAACTGGGCTGGGTTGAGCTGGGCTGGGCTGAGTTGAGCCAGGCTGATCTGGGCTGAGCCGAGCTGGGTTAAGCCGAGCTGGGTT
hsa-mir-4539 276 GGCTGGGCTGGGCTGGGCTCTGCTGTGCTGTGCTGAACAGGGCTGAGCTGAACTGAGCTGAGCTGGGCTGAGCTGGGCTCTGCTGTGCTGTGCTGAGCAGGGCTGAGCTGAACTGGGCTGAGCTGGGCTGAGCTGGGCTGAGTTGAGCAGAGCTGGGTTGAGCAGAGCTGGGCTGGGCTGGGCTGAGTTGAGCC
hsa-mir-4689 277 CGGTTTCTCCTTGAGGAGACATGGTGGGGGCCGGTCAGGCAGCCCATGCCATGTGTCCTCATGGAGAGGCCG
hsa-mir-4690 278 GGCAGGTGAGCAGGCGAGGCTGGGCTGAACCCGTGGGTGAGGAGTGCAGCCCAGCTGAGGCCTCTGCTGTCTTATCTGTC
hsa-mir-4739 279 GTGGGCAGGGGAGGAAGAAGGGAGGAGGAGCGGAGGGGCCCTTGTCTTCCCAGAGCCTCTCCCTTCCTCCCCTCCCCCTCCCTCTGCTCAT
hsa-mir-5001 280 GGGCGGCTGCGCAGAGGGCTGGACTCAGCGGCGGAGCTGGCTGCTGGCCTCAGTTCTGCCTCTGTCCAGGTCCTTGTGACCCGCCC
hsa-mir-5100 281 CTGGGGGTAGGAGCGTGGCTTCTGGAGCTAGACCACATGGGTTCAGATCCCAGCGGTGCCTCTAACTG
hsa-mir-5684 282 GAGCTATGATTGTGTAGCTGAACTCTAGCCTGAGCAACAGAGTGAGATGGTCTTGTTTTGTTGCCCAGGCTGGAGTCCAGTGTCAAGATCATGGCTC
hsa-mir-5698 283 GAGCTCCAAATCTGTGCACCTGGGGGAGTGCAGTGATTGTGGAATGCAAAGTCCCACAATCACTGTACTCCCCAGGTGCACAGATTCTCTCTC
hsa-mir-5701-1 284 GATTGGACTTTATTGTCACGTTCTGATTGGTTAGCCTAAGACTTGTTCTGATCCAATCAGAACATGAAAATAACGTCCAATC
hsa-mir-5701-2 285 GATTGGACTTTATTGTCACGTTCTGATTGGTTAGCCTAAGACTTGTTCTGATCCAATCAGAACATGAAAATAACGTCCAATC
hsa-mir-5739 286 TTGGCTATAACTATCATTTCCAAGGTTGTGCTTTTAGGAAATGTTGGCTGTCCTGCGGAGAGAGAATGGGGAGCCAG
hsa-mir-6076 287 AGCATGACAGAGGAGAGGTGGAGGTAGGCGAGAGTAATATAATTTCTCCAGGAGAACATCTGAGAGGGGAAGTTGCTTTCCTGCCCTGGCCCTTTCACCCTCCTGAGTTTGGG
hsa-mir-6086 288 AGGAGGTTGGGAAGGGCAGAGATGAGCATAAAGTTTTTGCCTTGTTTTTCTTTTT
hsa-mir-6127 289 AAGATGAGGGAGTGGGTGGGAGGTGGGAAGGCTGCCCCAAATGGCCTCTAACATCCCTTCCAGTCTCCTCCTCCTCCTCCTCCTTCTTCTT
MID-00078 290 TATGTACCCGGAGCCAAAAGTGATTGGAGGTGGGTGGGGTTAATGAATAGACAAGTGTTAAAACTAAAAGTCACGTCTCTCTCTCCTTCCTCCTCAGTTTTGGCTTGATTTTTCATG
MID-00321 291 CTTACCTAGAAATTGTTGCCTGTCTGAGCGACGCTTCAAACTCAGCTTCAGCAGGTCTGCAGGGACATCAGGTAGG
MID-00387 292 GTGTCTCTGTGTTTGCAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGATCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAACAGTGACAT
MID-00671 293 GTCAGCCTGCAATTAGTGAAATGGAGGCACACATGCTGGTTTGCAGATTGTGGGTGGGAGGAC
MID-00672 294 GTGTCTCTGTGTTTGCAGGTGTCCAGTGTGAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGATCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAACAGTGACAT
MID-00690 295 GGCCTTGGATGGAGAAGACTGGAGAGGGTATGGAAGTGCTTGGACGTAGGACATCTGCCTCTCTGGTCTTTGTCCATCCCACAGGGCC
MID-15965 296 AGCTGGTTGGCATTCTGGCCCTGGTTCATGCCAACTCTTGTGTTGACTACCCCAGGATGCCAGCATAGTTG
MID-16318 297 CTGCCAAAGAGCAGCAAGATGAGCTGGTTTGATGGGGAGCCATCCCTTGATGAGGAGAACCCTTCCCACTCTCACTCAGCCTCACCCAGCTGCCCTGAGGCAG
MID-17144 298 GCTCAGAAGTGATGAATTGATCAGATAGACGAGGCCGGGCTTGTCCCCGGCCACTGATTATCGAGGCGATTCTGATCTGGGC
MID-17866 299 GCTGGGTGCAGTAGCTTATGTCTGTAGTCCCAGCTACTTGGGAGGCTGAGGTGGGAGGATCACCTGAGGTCAGGAGTTTGGGTCTGCCGTGAGCTGTGATTGCGCCTGTGAATAGTCACTGCACTCCAGC
MID-18468 300 GACGTGAGGGGGTGCTACATACAGCAGCTGTGTGTAGTATGTGCCTTTCTCTGTT
MID-19433 301 TAGGAATTCTGGACCAGGCTTAAAAGACTGGGATGAGGCTGGTCCGAAGGTAGTGAGTTATCTCCATTGATAGTTCAGTCTGTAACAGATCAAACTCCTTGTTCTACTCTTTTTTTTTTTTTTAGACAGA
MID-19434 302 TGGGCTGGTCCGAGTGCAGTGGTGTTTACAAGTATTTGATTATAACTAGTTACAGATTTCTTTGTTTCCTTCTCCACTCCCACTGCCTCACTTGACTGGCCTA
MID-23168 303 GCTCTGTCCAAAGTAAACGCCCTGACGCACTGTGGGAAGGGTGAGATGGGCACCGC
MID-23794 304 GTGAGTGGGAGGGGGGCTGCAGCCCAAAGAGGCAACAAAGGCCCTTCCCGGCCAATGCATTAC
MID-24496 305 TGTCCTCAGGCCTGCTACTGATCCTGCAGCCAGAAGTTCCAGAAAGTGAAGGGATTTGGAGGGGCCGTGACAGATGCAGGTGCCCTCAACATCCTTGCCCTGTCACCCCCTGCCCAGAATTTGCTACTTAAATGGTACTTCTCTGAAGAAGATGAGGAGGAAGGGGACA
MID-24705 306 ACAGAATTCCTCTTCTCCCTTCTCCTATAACCTGTTTTATTTAATTAATTAATTTTTTAGGCTAGTCAAGTGAAGCAGTGGGAGTGGAAGGAACAAAGAAATCTGT
MD2-495 307 UGAGCUCUGCGGCGCCAAGGGACCGAGGGGCCGAGGGAGCGAGAG
MD2-437 308 AGUGCUUGGCUGAGGAGCUGGGGCCAAGGGGGAACACAAAUAUGGUCCUGACCCUACAUUCCCAGCCCUGCCUCU
It is to be noted that SEQ ID NOs.183-306 in Table 2 present the cDNA corresponding to the sequence of the naturally occurring pre-miR, i.e., the sequences present thymine (T) instead of uracil (U).
The nucleic acid may be in the form of a nucleic acid complex, and may further comprise one or more of the following: a peptide, a protein, an RNA-DNA hybrid, an antibody, an antibody fragment, a Fab fragment, or an aptamer.
The nucleic acid may comprise a sequence of a pri-miRNA or a variant thereof. The pri-microRNA sequence may comprise from 45-30,000, 50-25,000, 100-20,000, 1,000-1,500 or 80-100 nucleotides. The sequence of the pri-miRNA may comprise a pre-miRNA, miRNA and miRNA*, as set forth herein, and variants thereof. The sequence of the pri-miRNA may comprise any of the sequences of SEQ ID NOS: 183-308 or variants thereof.
The pri-miRNA may comprise a hairpin structure. The hairpin may comprise a first and a second nucleic acid sequence that are substantially complementary. The first and second nucleic acid sequence may be from 37-50 nucleotides. The first and second nucleic acid sequence may be separated by a third sequence of from 8-12 nucleotides. The hairpin structure may have a free energy of less than -25 kcal/mole as calculated by the Vienna algorithm with default parameters, as described in Hofacker et al. (Monatshefte f. Chemie 1994; 125: 167-188), the contents of which are incorporated herein by reference. The hairpin may comprise a terminal loop of 4-20, 8-12 or 10 nucleotides. The pri-miRNA may comprise at least 19% adenosine nucleotides, at least 16% cytosine nucleotides, at least 23% thymine nucleotides and at least 19% guanine nucleotides.
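The nucleotide-composition thresholds stated above can be checked with a short Python sketch. The names `MIN_FRACTION`, `composition` and `meets_composition` are assumptions made for this illustration only; the sketch works on cDNA-form sequences (T rather than U) and does not attempt the free-energy calculation, which would require a folding package.

```python
from collections import Counter

# Minimum fractions taken from the paragraph above (cDNA form, so T not U).
MIN_FRACTION = {"A": 0.19, "C": 0.16, "T": 0.23, "G": 0.19}


def composition(seq: str) -> dict:
    """Return the fraction of A, C, G and T in a sequence."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


def meets_composition(seq: str) -> bool:
    """True if every base reaches its minimum fraction."""
    fractions = composition(seq)
    return all(fractions[base] >= minimum
               for base, minimum in MIN_FRACTION.items())


if __name__ == "__main__":
    # Toy sequence with equal base fractions, used only to exercise the check.
    print(composition("ACGT" * 5), meets_composition("ACGT" * 5))
```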
The nucleic acid may also comprise a sequence of a pre-miRNA or a variant thereof. The pre-miRNA sequence may comprise from 45-90, 60-80 or 60-70 nucleotides. The sequence of the pre-miRNA may comprise a miRNA and a miRNA* as set forth herein. The sequence of the pre-miRNA may also be that of a pri-miRNA excluding from 0-160 nucleotides from the 5' and 3' ends of the pri-miRNA. The sequence of the pre-miRNA may comprise the sequence of SEQ ID NOS: 183-308 or variants thereof.
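To illustrate how a mature microRNA sits within its hairpin, the following Python sketch locates a Table 1 sequence inside the corresponding Table 2 hairpin by exact string search. The function `locate_mature` and its return fields are hypothetical names chosen for this example; the sequences are hsa-miR-429 (SEQ ID NO: 82) and hsa-mir-429 (SEQ ID NO: 241), joined from the wrapped table lines.

```python
from typing import Optional

# hsa-mir-429 hairpin, SEQ ID NO: 241 (Table 2, cDNA form)
HAIRPIN_429 = ("GATGGGCGTCTTACCAGACATGGTTAGACCTGGCCCTCTGTCTAATAC"
               "TGTCTGGTAAAACCGTCCATC")
# hsa-miR-429 mature sequence, SEQ ID NO: 82 (Table 1, cDNA form)
MATURE_429 = "TAATACTGTCTGGTAAAACCGT"


def locate_mature(hairpin: str, mature: str) -> Optional[dict]:
    """Find the first exact occurrence of the mature sequence in the hairpin.

    Returns 0-based start/end offsets plus the number of nucleotides
    between the match and each end of the hairpin, or None if absent.
    """
    start = hairpin.find(mature)
    if start == -1:
        return None
    end = start + len(mature)
    return {
        "start": start,
        "end": end,
        "from_5p_end": start,
        "from_3p_end": len(hairpin) - end,
    }


if __name__ == "__main__":
    print(locate_mature(HAIRPIN_429, MATURE_429))
```

Exact matching is a deliberate simplification; variants as defined herein would call for a mismatch-tolerant search.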
As described herein, the nucleic acid may be at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical to the nucleic acid sequences in Tables 1 or 2 (with increments of 1% from 80 to 99%), over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides.
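Percent identity over a region, as described above, can be sketched in Python for the simple ungapped case. The function `percent_identity` and its parameters are assumptions for this illustration; a real comparison would normally use an alignment tool that handles insertions and deletions. The example compares hsa-let-7a-5p (SEQ ID NO: 141) with hsa-let-7c-5p (SEQ ID NO: 143) from Table 1.

```python
from typing import Optional


def percent_identity(a: str, b: str, start: int = 0,
                     length: Optional[int] = None) -> float:
    """Ungapped percent identity of a and b over [start, start + length)."""
    if length is None:
        length = min(len(a), len(b)) - start
    region_a = a[start:start + length]
    region_b = b[start:start + length]
    if length <= 0 or len(region_a) != length or len(region_b) != length:
        raise ValueError("region does not fit inside both sequences")
    matches = sum(x == y for x, y in zip(region_a, region_b))
    return 100.0 * matches / length


LET_7A_5P = "TGAGGTAGTAGGTTGTATAGTT"  # SEQ ID NO: 141
LET_7C_5P = "TGAGGTAGTAGGTTGTATGGTT"  # SEQ ID NO: 143

if __name__ == "__main__":
    print(round(percent_identity(LET_7A_5P, LET_7C_5P), 2))
    print(percent_identity(LET_7A_5P, LET_7C_5P, start=0, length=18))
```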
The nucleic acid may also comprise a sequence of a microRNA (including a miRNA*) or a variant thereof, including those putative microRNAs represented by MID-[numeral]. As referred to herein, microRNAs include those miRs which have been listed in the miRBase registry (release 20), as well as putative microRNAs which have been predicted and/or cloned by Rosetta Genomics and which are represented by MID-[numeral]. The microRNA sequence may comprise from 13-33, 18-24 or 21-23 nucleotides. The microRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleotides. The sequence of the microRNA may be the first 13-33 nucleotides of the pre-miRNA. The sequence of the microRNA may also be the last 13-33 nucleotides of the pre-miRNA. The sequence of the microRNA may comprise the sequence of any one of SEQ ID NOS: 1-182 or a variant thereof. The present invention employs microRNAs for the identification, classification and diagnosis of thyroid nodules.
"Variant", as used herein referring to a nucleic acid, means (i) a portion of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that differs from the referenced nucleotide sequence by a point-mutation or the complement thereof; (iv) a naturally-occurring variant of the referenced nucleotide sequence present in the general population or the complement thereof; or (v) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, or the complement thereof.
"Probe", as used herein, means an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. For example, for hybridization assays, the probe may be complementary to at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20 contiguous nucleotides of the sequence of the microRNA being detected. Alternatively, for PCR assays, the probe may be complementary to at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20 contiguous nucleotides of the sequence of the PCR product being detected.
Thus, a probe may be complementary to, or may hybridize to at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% of its target nucleic acid.
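As an illustration only, the degree of complementarity between a probe and its target can be estimated with a short Python sketch. The function names, the ungapped antiparallel comparison and the example sequences are assumptions made for this illustration and are not part of the described method.

```python
# Hypothetical helper: percent complementarity between a probe and an
# equal-length target, compared position by position (no gaps).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(probe: str, target: str) -> float:
    """Percentage of probe positions that base-pair with the target.

    Assumption: both sequences are written 5'->3' and have equal length,
    so the ideal probe is the reverse complement of the target.
    """
    if len(probe) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    ideal_probe = reverse_complement(target)
    probe_dna = probe.upper().replace("U", "T")
    matches = sum(p == i for p, i in zip(probe_dna, ideal_probe))
    return 100.0 * matches / len(probe_dna)


if __name__ == "__main__":
    target = "ACGUACGUAC"   # arbitrary placeholder, not a real microRNA
    probe = "GTACGTACGA"    # one mismatch at the 3' end of the probe
    print(percent_complementarity(probe, target))
```

A real assay design would also consider gapped alignments and hybridization thermodynamics, which this sketch deliberately ignores.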
A probe may be single-stranded or partially single- and partially double-stranded. The strandedness of the probe is dictated by the structure, composition and properties of the target sequence. Probes may include a label, an attachment, or a nucleotide sequence that does not naturally occur in a nucleic acid described herein. Probes may be directly labeled or indirectly labeled such as with biotin to which a streptavidin complex may bind.
"Probe" may be an agent for detecting the nucleic acid sequences described herein. Probe may be a labeled nucleic acid probe capable of hybridizing to a portion of the nucleic acid sequence of the invention, or amplification products derived therefrom. In some embodiments, the nucleic acid probe is a reverse complementary nucleic acid molecule of the nucleic acid sequence disclosed herein. A probe may be a nucleic acid sequence which sufficiently specifically hybridizes under stringent conditions to the nucleic acid disclosed herein. A probe is optionally labeled with a fluorescent molecule such as a fluorescein, e.g. 6-carboxyfluorescein (FAM), an indocarbocyanine, e.g. QUASAR-670 (QUA), a hexafluorocine, such as 6-carboxyhexafluorescein (HEX), or other fluorophore molecules and optionally a quencher. A quencher is appreciated to be matched to a fluorophore. Illustrative examples of a quencher include the black hole quenchers BHQ1 and BHQ2, or minor groove binders (MGB), e.g. dihydrocyclopyrroloindole tripeptide. Other fluorophores and quenchers are known in the art and are similarly operable herein.
Thus, the present invention also provides a probe, said probe comprising the novel nucleic acid sequences described herein, defined by any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308, or variants thereof. Probes may be used for screening and diagnostic methods. The probe may be attached or immobilized to a solid substrate, such as a biochip. The probe may have a length of from 8 to 500, 10 to 100 or 20 to 60 nucleotides. The probe may have a length of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280 or 300 nucleotides. The probe may further comprise a linker sequence of from 10-60 nucleotides. The probe may further comprise a linker. The linker may comprise a sequence that does not occur naturally in a nucleic acid described herein. The linker may be 10-60 nucleotides in length. The linker may be 20-27 nucleotides in length. The linker may be of sufficient length to allow the probe to be a total length of 45-60 nucleotides. The linker may not be capable of forming a stable secondary structure, or may not be capable of folding on itself, or may not be capable of folding on a non-linker portion of a nucleic acid contained in the probe. The sequence of the linker is heterogeneous, and it may not appear in the genome of the animal from which the probe non-linker nucleic acid is derived.
As used herein, the term "reference value" means a value that statistically correlates to a particular outcome when compared to an assay result. In one embodiment, the reference value is determined from statistical analysis of studies that compare microRNA expression with known clinical outcomes. In another embodiment, the reference value may vary according to the classifier (i.e. the algorithm) used. Hence, the reference value may be the expression levels (or values) of all the microRNAs in the training data. The reference value may be one or more thresholds established by the classifier. The reference value may further be a coefficient or set of coefficients. Essentially the reference value refers to any parameter needed or used by the algorithm.
"Sensitivity", as used herein, may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example, how frequently it correctly classifies a cancer into the correct type out of two possible types. The sensitivity for class A is the proportion of cases that are determined to belong to class "A" by the test out of the cases that are in class "A", as determined by some absolute or gold standard.
"Sensitivity", as used herein, may mean a statistical measure of how well a classification test correctly identifies a condition or conditions, for example, how frequently it correctly classifies a cancer into the correct type out of two or more possible types. The sensitivity for class A is the proportion of cases that are determined to belong to class "A" by the test out of the cases that are in class "A", as determined by some absolute or gold standard.
"Smear", as used herein, refers to a sample of thyroid tissue spread thinly on a microscope slide for examination, typically for medical diagnosis. Smears from FNAs usually have very small amounts of cells, which results in small amounts of RNA, which may range from 1-1000 ng, 1-100 ng, 1-50 ng or 1-40 ng. Smears may be stained with any stain known to the man skilled in the art of cytology, histology or pathology, such as any stain used to differentiate cells in pathologic specimens. Examples of stains are multichromatic stains, like Papanicolaou, which are a combination of nuclear stain and cytoplasm stain; cellular structure stains, such as Wright, Giemsa, Romanowsky and the like; nuclear stains, such as Hoechst stains and the like; cell viability stains, such as Trypan blue and the like; and enzyme activity stains, such as benzidine for HRP to form a visible precipitate, and the like.
"Specificity", as used herein, may mean a statistical measure of how well a binary classification test correctly identifies cases that do not have a specific condition, for example, how frequently it correctly classifies a sample as non-cancer when indeed it is a non-cancerous sample. The specificity for class A is the proportion of cases that are determined to belong to class "not A" by the test out of the cases that are in class "not A", as determined by some absolute or gold standard.
"Specificity", as used herein, may mean a statistical measure of how well a classification test correctly identifies cases that do not have a specific condition. The specificity for class A is the proportion of cases that are determined by the test not to belong to class A out of the cases that are not in class A, as determined by some absolute or gold standard.
As used herein, the term "stage of cancer" refers to a numerical measurement of the level of advancement of a cancer. Criteria used to determine the stage of a cancer include, but are not limited to, the size of the tumor, whether the tumor has spread to other parts of the body and where the cancer has spread (e.g., within the same organ or region of the body or to another organ).
"Stringent hybridization conditions", as used herein, mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm may be the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., about 10-50 nucleotides) and at least about 60°C for long probes (e.g., greater than about 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal may be at least 2 to 10 times background hybridization. Exemplary stringent hybridization conditions include the following: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C, DMSO, 6X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102, 0.06X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102.
As used herein, the term "subject" refers to a mammal, including both human and other mammals. The methods of the present invention are preferably applied to human subjects.
As used herein, the term "subtype of cancer" refers to different types of cancer that affect the same organ (e.g., papillary, follicular carcinoma and follicular variant papillary carcinoma of the thyroid).
"Thyroid lesion" as used herein, may mean a thyroid tumor, including sub-types of thyroid tumors, such as Hashimoto disease, follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), non-encapsulated (infiltrative/diffuse) FVPC or FVPTC, medullary carcinoma, anaplastic thyroid cancer, or poorly differentiated thyroid cancer.
As used herein, the phrase "threshold expression profile" refers to a criterion expression profile to which measured values are compared in order to classify a tumor.
As used herein, a tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts. The phrase "suspected of being cancerous", as used herein, means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.
"Tumor", as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The cytological classification of the thyroid lesions or tumor samples used herein is based on "The Bethesda System for Reporting Thyroid Cytopathology", the "BSRTC" (Syed, Z. Ali and Edmund S. Cibas, eds.; DOI 10.1007/978-0-387-87666-5_1; Springer Science+Business Media, LLC 2010). The BSRTC recommends that each thyroid FNA report be accompanied by a general diagnostic category, in which each category has an implied cancer risk. Recommended nomenclature for the Bethesda categories is as follows:
I. Non-diagnostic or Unsatisfactory
Cyst fluid only
Virtually acellular specimen
Other (obscuring blood, clotting artifact, etc.)
II. Benign
Consistent with a benign follicular nodule (includes adenomatoid nodule, colloid nodule, etc.)
Consistent with lymphocytic (Hashimoto) thyroiditis in the proper clinical context
Consistent with granulomatous (subacute) thyroiditis
Other
III. Atypia of Undetermined Significance or Follicular Lesion of Undetermined Significance
IV. Follicular Neoplasm or suspicious of a Follicular Neoplasm
Specify if Hurthle cell (oncocytic) type
V. Suspicious for Malignancy
Suspicious for papillary carcinoma
Suspicious for medullary carcinoma
Suspicious for metastatic carcinoma
Suspicious for lymphoma
Other
VI. Malignant
Papillary thyroid carcinoma
Poorly differentiated carcinoma
Medullary thyroid carcinoma
Undifferentiated (anaplastic) carcinoma
Squamous cell carcinoma
Carcinoma with mixed features
Metastatic carcinoma
Non-Hodgkin lymphoma
Other
As used herein, "Indeterminate" refers to thyroid lesions or tumor samples examined for cytology and classified according to the Bethesda classification in categories III, IV and V.
The present invention further provides a method for identifying subtypes of thyroid lesions in a subject, said subtypes being subtypes of malignant or benign thyroid tumors, wherein said subtype is any one of follicular carcinoma, papillary carcinoma, follicular variant of papillary carcinoma (FVPC or FVPTC), encapsulated FVPC (or encapsulated FVPTC), non-encapsulated FVPC (or non-encapsulated FVPTC), medullary carcinoma, anaplastic thyroid cancer or poorly differentiated thyroid cancer.
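For readers implementing the cytology grouping, the six Bethesda categories listed above and the "Indeterminate" grouping of categories III, IV and V can be written as a small lookup. This Python sketch is illustrative only; the names `BETHESDA_CATEGORIES`, `INDETERMINATE_CATEGORIES` and `is_indeterminate` are hypothetical.

```python
# Hypothetical lookup of the Bethesda diagnostic categories quoted above.
BETHESDA_CATEGORIES = {
    "I": "Non-diagnostic or Unsatisfactory",
    "II": "Benign",
    "III": "Atypia of Undetermined Significance or Follicular Lesion of Undetermined Significance",
    "IV": "Follicular Neoplasm or suspicious of a Follicular Neoplasm",
    "V": "Suspicious for Malignancy",
    "VI": "Malignant",
}

# "Indeterminate" is defined in the text as categories III, IV and V.
INDETERMINATE_CATEGORIES = frozenset({"III", "IV", "V"})


def is_indeterminate(category: str) -> bool:
    """Return True when a Bethesda category (Roman numeral) is indeterminate."""
    key = category.strip().upper()
    if key not in BETHESDA_CATEGORIES:
        raise ValueError(f"unknown Bethesda category: {category!r}")
    return key in INDETERMINATE_CATEGORIES


if __name__ == "__main__":
    for numeral, name in BETHESDA_CATEGORIES.items():
        print(numeral, name, is_indeterminate(numeral))
```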
In another further embodiment, said subtype is any one of Hashimoto thyroiditis, follicular adenoma or hyperplasia.
In another further embodiment, said subtype is Hurthle cell carcinoma.
In another aspect, the present invention provides a method for distinguishing between follicular adenoma and follicular carcinoma.
In another further aspect, the present invention provides a method for distinguishing follicular adenoma from papillary carcinoma.
In another further aspect, the present invention provides a method for distinguishing follicular adenoma from follicular variant of papillary carcinoma.
In another further aspect, the present invention provides a method for distinguishing non-encapsulated follicular variant of papillary carcinoma from benign lesions.
In another further aspect, the present invention provides a method for distinguishing between papillary carcinoma and Hashimoto thyroiditis.
"Vector" refers to any known vector such as a plasmid vector, a phage vector, a phagemid vector, a cosmid vector, or a virus vector. The nucleic acid described herein may be comprised in a vector. The vector may be used for delivery of the nucleic acid. The vector preferably contains at least a promoter that enhances expression of the nucleic acid carried, and in this case the nucleic acid is preferably operably linked to such a promoter. The vector may or may not be replicable in a host cell, and the transcription of a gene may be carried out either outside the nucleus or within the nucleus of a host cell. In the latter case, the nucleic acid may be incorporated into the genome of a host cell. A vector may be a DNA or RNA vector. A vector may be either a self-replicating extrachromosomal vector or a vector that integrates into a host genome.
In one embodiment of the method or protocol of the invention, the levels of microRNAs are measured by reverse transcription polymerase chain reaction (RT-PCR). Target sequences of a cDNA are generated by reverse transcription of a target RNA, which may be a nucleic acid described herein (comprising a sequence provided in Tables 1 and 2). Known methods for generating cDNA involve reverse transcribing either polyadenylated RNA or alternatively, RNA with a ligated adaptor sequence.
RNA may be ligated to an adaptor sequence prior to reverse transcription. A ligation reaction may be performed by T4 RNA ligase to ligate an adaptor sequence at the 3' end of the RNA. Reverse transcription (RT) reaction may then be performed using a primer comprising a sequence that is complementary to the 3' end of the adaptor sequence.
Alternatively, polyadenylated RNA may be used in a reverse transcription (RT) reaction using a poly(T) primer comprising a 5' adaptor sequence. The poly(T) sequence may comprise 8, 9, 10, 11, 12, 13, or 14 consecutive thymines.
The reverse transcript of the RNA may then be amplified by real-time PCR, using a specific forward primer comprising at least 15 nucleic acids complementary to the target nucleic acid and a 5' tail sequence; a reverse primer that is complementary to the 3' end of the adaptor sequence; and a probe comprising at least 8 nucleic acids complementary to the target nucleic acid. The probe may be partially complementary to the 5' end of the adaptor sequence.
The amplification of the reverse transcripts of the target nucleic acids (microRNAs, including herein described putative microRNAs) may be by PCR or the like. The first cycles of the PCR reaction may have an annealing temperature of 56°C, 57°C, 58°C, 59°C, or 60°C. The first cycles may comprise 1-10 cycles. The remaining cycles of the PCR reaction may have an annealing temperature of 60°C. The remaining cycles may comprise 2-40 cycles.
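The two-phase annealing scheme described above can be sketched as a per-cycle temperature list. This Python example is only an illustration under the stated ranges; the function name `annealing_schedule` and the example values are hypothetical and do not represent the protocol actually run.

```python
# Hypothetical helper: build a per-cycle annealing temperature list for the
# two-phase scheme (initial cycles at 56-60 C, remaining cycles at 60 C).
from typing import List

ALLOWED_FIRST_TEMPS_C = (56, 57, 58, 59, 60)
REMAINING_TEMP_C = 60


def annealing_schedule(first_temp_c: int, first_cycles: int, remaining_cycles: int) -> List[int]:
    """Return one annealing temperature per PCR cycle."""
    if first_temp_c not in ALLOWED_FIRST_TEMPS_C:
        raise ValueError("first-phase annealing temperature must be 56-60 C")
    if not 1 <= first_cycles <= 10:
        raise ValueError("first phase must comprise 1-10 cycles")
    if not 2 <= remaining_cycles <= 40:
        raise ValueError("remaining phase must comprise 2-40 cycles")
    return [first_temp_c] * first_cycles + [REMAINING_TEMP_C] * remaining_cycles


if __name__ == "__main__":
    schedule = annealing_schedule(first_temp_c=58, first_cycles=5, remaining_cycles=35)
    print(len(schedule), schedule[:6])
```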
The PCR reaction comprises a forward primer. In one embodiment, the forward primer may comprise 15, 16, 17, 18, 19, 20, or 21 nucleotides identical to the target nucleic acid. The 3' end of the forward primer may be sensitive to differences in sequence between a target nucleic acid and highly similar sequences.
The forward primer may also comprise a 5' overhanging tail. The 5' tail may increase the melting temperature of the forward primer. The sequence of the 5' tail may comprise a sequence that is non-identical to the target nucleic acid. The sequence of the 5' tail may also be synthetic. The 5' tail may comprise 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides. Examples of forward primers used in the invention are provided in Table 8.
The PCR reaction comprises a reverse primer. The reverse primer may be complementary to a target nucleic acid. The reverse primer may also comprise a sequence complementary to an adaptor sequence. Examples of reverse primers used in the invention are provided in Example 8.
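As a rough illustration of the forward primer layout described above (a synthetic 5' tail followed by 15-21 nucleotides identical to the target), the following Python sketch assembles such a primer. The function name, the choice to copy the target from its 5' end, and both example sequences are assumptions made for this illustration; the primers actually used are those listed in Table 8.

```python
# Hypothetical helper: assemble a forward primer as 5' tail + target-identical part.
def forward_primer(target_rna: str, tail: str, identical_length: int = 18) -> str:
    """Return a DNA primer: synthetic 5' tail followed by target-identical bases.

    Assumption: the identical stretch is taken from the 5' end of the target
    and written as DNA (U replaced by T).
    """
    if not 15 <= identical_length <= 21:
        raise ValueError("identical stretch must be 15-21 nucleotides")
    if not 8 <= len(tail) <= 16:
        raise ValueError("5' tail must be 8-16 nucleotides")
    if len(target_rna) < identical_length:
        raise ValueError("target is shorter than the requested identical stretch")
    target_dna = target_rna.upper().replace("U", "T")
    return tail.upper() + target_dna[:identical_length]


if __name__ == "__main__":
    placeholder_target = "ACGUACGUACGUACGUACGUAC"  # arbitrary 22-nt placeholder
    placeholder_tail = "CAGTCATTTGGC"              # arbitrary 12-nt synthetic tail
    print(forward_primer(placeholder_target, placeholder_tail))
```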
The probes used to detect products of RT-PCR amplification may be general probes or sequence-specific probes. General probes are designed to detect (or hybridize with) RT-PCR amplification products in a non-sequence specific manner. Said probes are between 16 and 20 nucleotides long, preferably 18 nucleotides long, and comprise a sequence which is the reverse complement of the RT primer, including 4 adenines (As) at the 5' end. Sequence-specific probes are designed to detect (or hybridize with) RT-PCR amplification products based on total or partial complementarity between the sequence of the probe and the sequence of the RT-PCR product. Said probes are between 20 and 28 nucleotides long, preferably 24 nucleotides long, and comprise at the 5' end three nucleotides, of which at least two are complementary to the RT primer, followed by between 10 and 14, preferably 12, thymines (Ts), followed by between 6 and 10, preferably 8, contiguous nucleotides which correspond to the reverse complementary sequence of the specific corresponding microRNA.
A biochip comprising novel nucleic acids described herein is provided. In one embodiment, the biochip may comprise probes that recognize the novel nucleic acids described herein. Said nucleic acids are isolated nucleic acids comprising at least 12 contiguous nucleotides at least 80% identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308. In one embodiment, said isolated nucleic acid comprises at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 contiguous nucleotides identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308. The biochip may comprise a solid substrate comprising an attached nucleic acid, probe or plurality of probes described herein. The probes may be capable of hybridizing to a target sequence under stringent hybridization conditions. The probes may be attached at spatially defined addresses on the substrate. More than one probe per target sequence may be used, with either overlapping probes or probes to different sections of a particular target sequence. The probes may be capable of hybridizing to target sequences associated with a single disorder appreciated by those in the art. The probes may either be synthesized first, with subsequent attachment to the biochip, or may be directly synthesized on the biochip.
The solid substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the probes and is amenable to at least one detection method. Representative examples of substrates include glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon®, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. The substrates may allow optical detection without appreciably fluorescing.
The substrate may be planar, although other configurations of substrates may be used as well. For example, probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as flexible foam, including closed cell foams made of particular plastics.
The biochip and the probe may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the biochip may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the probes may be attached using functional groups on the probes either directly or indirectly using a linker. The probes may be attached to the solid support by either the 5' terminus, 3' terminus, or via an internal nucleotide.
The probe may also be attached to the solid support non-covalently. For example, biotinylated oligonucleotides can be made, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, probes may be synthesized on the surface using techniques such as photopolymerization and photolithography.
In a further embodiment of the invention, measuring the microRNAs for classification of thyroid lesions may be effected by high throughput sequencing. High throughput sequencing can involve sequencing-by-synthesis, sequencing-by-ligation, and ultra-deep sequencing. Sequence-by-synthesis can be initiated using sequencing primers complementary to the sequencing element on the nucleic acid tags. The method involves detecting the identity of each nucleotide immediately after (substantially real-time) or upon (real-time) the incorporation of a labeled nucleotide or nucleotide analog into a growing strand of a complementary nucleic acid sequence in a polymerase reaction. After the successful incorporation of a labeled nucleotide, a signal is measured and then nulled by methods known in the art. Examples of sequence-by-synthesis methods are known in the art, and are described for example in US 7,056,676, US 8,802,368 and US 7,169,560, the contents of which are incorporated herein by reference. Examples of labels that can be used to label nucleotide or nucleotide analogs for sequencing-by-synthesis include, but are not limited to, chromophores, fluorescent moieties, enzymes, antigens, heavy metal, magnetic probes, dyes, phosphorescent groups, radioactive materials, chemiluminescent moieties, scattering or fluorescent nanoparticles, Raman signal generating moieties, and electrochemical detection moieties. Sequencing-by-synthesis can generate at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 reads per hour. Such reads can have at least 40, at least 45, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120 or at least 150 bases per read.
Sequencing-by-synthesis may be performed on a solid surface (or a chip) using fold-back PCR and anchored primers. Since microRNAs occur as small nucleic acid fragments, adaptors are added to the 5' and 3' ends of the fragments. Nucleic acid fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded nucleic acid molecules of the same template in each channel of the flow cell. Primers, polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, an image is captured and the identity of the first base is recorded. The 3' terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated. This technology is used, for example, in the Illumina® sequencing platform.
Another sequencing method involves hybridizing the amplified regions to a primer complementary to the sequence element in an LST (a file listing the names of fasta files). This hybridization complex is incubated with a polymerase, ATP sulfurylase, luciferase, apyrase, and the substrates luciferin and adenosine 5' phosphosulfate. Next, deoxynucleotide triphosphates corresponding to the bases A, C, G, and T (U) are added sequentially. Each base incorporation is accompanied by release of pyrophosphate, converted to ATP by sulfurylase, which drives synthesis of oxyluciferin and the release of visible light. Since pyrophosphate release is equimolar with the number of incorporated bases, the light given off is proportional to the number of nucleotides added in any one step. The process is repeated until the entire sequence is determined. Yet another sequencing method involves a four-color sequencing by ligation scheme (degenerate ligation), which involves hybridizing an anchor primer to one of four positions. Then an enzymatic ligation reaction of the anchor primer to a population of degenerate nonamers that are labeled with fluorescent dyes is performed. At any given cycle, the population of nonamers that is used is structured such that the identity of one of its positions is correlated with the identity of the fluorophore attached to that nonamer. To the extent that the ligase discriminates for complementarity at that queried position, the fluorescent signal allows the inference of the identity of the base. After performing the ligation and four-color imaging, the anchor primer:nonamer complexes are stripped and a new cycle begins. Methods to image sequence information after performing ligation are known in the art. In some cases, high throughput sequencing involves the use of ultra-deep sequencing, such as described in Margulies et al., Nature 437 (7057): 376-80 (2005).
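The light-based readout described above, in which each nucleotide is added in turn and the light emitted is proportional to the number of bases incorporated, can be illustrated with a short decoding sketch in Python. The function name, the flow order and the signal values are hypothetical; the sketch also assumes the signals have already been normalized so that a value near 1.0 corresponds to one incorporated nucleotide.

```python
# Hypothetical decoder for a sequential-nucleotide-addition light readout.
from typing import Sequence


def decode_flowgram(flow_order: str, signals: Sequence[float]) -> str:
    """Convert per-flow light signals into the synthesized strand sequence.

    Each flow adds one nucleotide species; the rounded signal is taken as
    the number of bases of that species incorporated in that flow.
    """
    bases = []
    for index, signal in enumerate(signals):
        base = flow_order[index % len(flow_order)]
        incorporated = max(0, round(signal))
        bases.append(base * incorporated)
    return "".join(bases)


if __name__ == "__main__":
    # Invented signals: one A, no C, two G, one T, no A, one C.
    print(decode_flowgram("ACGT", [1.02, 0.05, 1.96, 0.90, 0.00, 1.10]))
```

The decoded string is the newly synthesized strand, which is complementary to the template being sequenced.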
MicroRNA sequencing (miRNA-seq) is a type of RNA Sequencing (RNA-Seq) which uses next-generation sequencing or massively parallel high-throughput DNA sequencing to sequence microRNAs. miRNA-seq differs from other forms of RNA-seq in that input material is often enriched for small RNAs. miRNA-seq provides tissue specific expression patterns, which may lead to disease associations and microRNA isoforms. miRNA-seq is also used for the discovery of previously uncharacterized microRNAs, such as the nucleic acid sequences denoted by SEQ ID NOs 139-140 and 307-308.
As used herein, the term "diagnosing" refers to classifying pathology, or a symptom, determining a severity of the pathology (grade or stage), monitoring pathology progression, forecasting an outcome of pathology and/or prospects of recovery.
As used herein, the phrase "subject in need thereof" refers to a human subject who is known to have cancer, is at risk of having cancer (e.g., a genetically predisposed subject, a subject with medical and/or family history of cancer, a subject who has been exposed to carcinogens, occupational hazard, environmental hazard) and/or a subject who exhibits suspicious clinical signs of cancer (e.g., nodules in the thyroid). Additionally or alternatively, the subject in need thereof can be a healthy human subject undergoing a routine well-being check-up.
Analyzing presence of malignant or pre-malignant cells can be effected in vivo or ex vivo, whereby a biological sample (e.g., biopsy) is retrieved. Such biopsy samples comprise cells and may be an incisional or excisional biopsy. The sample may be retrieved from the thyroid of the subject, and may be retrieved using FNA. Alternatively the cells may be retrieved from a complete resection.
While employing the present teachings, additional information may be gleaned pertaining to the determination of treatment regimen, treatment course and/or to the measurement of the severity of the disease.
As used herein, the phrase "treatment regimen" refers to a treatment plan that specifies the type of treatment, dosage, schedule and/or duration of a treatment provided to a subject in need thereof (e.g., a subject diagnosed with a pathology). The selected treatment regimen can be an aggressive one which is expected to result in the best clinical outcome (e.g., complete cure of the pathology) or a more moderate one which may relieve symptoms of the pathology yet results in incomplete cure of the pathology. It will be appreciated that in certain cases the treatment regimen may be associated with some discomfort to the subject or adverse side effects (e.g., damage to healthy cells or tissue). The type of treatment can include a surgical intervention (e.g., removal of lesion, diseased cells, tissue, or organ), a cell replacement therapy, an administration of a therapeutic drug (e.g., receptor agonists, antagonists, hormones, chemotherapy agents) in a local or a systemic mode, an exposure to radiation therapy using an external source (e.g., external beam) and/or an internal source (e.g., brachytherapy) and/or any combination thereof. The dosage, schedule and duration of treatment can vary, depending on the severity of pathology and the selected type of treatment, and those of skill in the art are capable of adjusting the type of treatment with the dosage, schedule and duration of treatment.
A method of diagnosis is also provided. The method comprises detecting an expression level of a specific cancer-associated nucleic acid in a biological sample. Diagnosis of a specific cancer state in a patient may allow for prognosis and selection of therapeutic strategy. Further, the developmental stage of cells may be classified by determining temporarily expressed specific cancer-associated nucleic acids.
In situ hybridization of labeled probes to tissue sections or FNA smears may be performed. When comparing the fingerprints between individual samples the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It is further understood that the nucleic acid sequences which indicate the diagnosis may differ from those which indicate the prognosis and molecular profiling of the condition of the cells may lead to distinctions between responsive or refractory conditions or may be predictive of outcomes.
A kit is also provided and may comprise a nucleic acid described herein together with any or all of the following: assay reagents, buffers, probes and/or primers, and sterile saline or another pharmaceutically acceptable emulsion and suspension base. In addition, the kits may include instructional materials containing directions (e.g., protocols) for the practice of the methods described herein. The kit may further comprise a software package for data analysis of expression profiles.
For example, the kit may be a kit for the amplification, detection, identification or quantification of a target nucleic acid sequence. The kit may comprise a poly (T) primer, a forward primer, a reverse primer, and a probe.
Any of the compositions described herein may be comprised in a kit. In a non-limiting example, reagents for isolating microRNA, labeling microRNA, and/or evaluating a microRNA population using an array are included in a kit. The kit may further include reagents for creating or synthesizing microRNA probes. The kits will thus comprise, in suitable container means, an enzyme for labeling the microRNA by incorporating labeled nucleotide or unlabeled nucleotides that are subsequently labeled. It may also include one or more buffers, such as reaction buffer, labeling buffer, washing buffer, or a hybridization buffer, compounds for preparing the microRNA probes, components for in situ hybridization and components for isolating microRNA. Other kits of the invention may include components for making a nucleic acid array comprising microRNA, and thus, may include, for example, a solid support.
The following examples are presented in order to more fully illustrate some embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.
EXAMPLES
Materials and Methods
1. microRNA analysis
The presence and/or level of microRNAs in thyroid tumor samples may be evaluated using methods known in the art, e.g., Northern blot, RNA expression assays, e.g., microarray analysis, RT-PCR, high throughput sequencing (next generation sequencing), cloning, and quantitative real time polymerase chain reaction (qRT-PCR). Analytical techniques to determine RNA expression are known in the art, see e.g. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2001). Examples of specific methods used herein are described in more detail below.
2. RNA extraction
FNA Cell Block samples
Total RNA was isolated from seven to ten 10 μm-thick tissue sections. Sections were incubated one to three times in xylene at 57°C for 5 minutes in order to remove excess paraffin, followed by centrifugation at ambient temperature for 2 minutes at 10,000g. The specimen was then washed about three times with 1 ml 100% ethanol in order to wash the xylene out of the tissue, followed by centrifugation at ambient temperature for 10 minutes at 10,000g. The supernatant was discarded and the tissue dried at 65°C for 5 minutes. Proteins were degraded by proteinase K solution (5-12 μl Proteinase K (e.g., Sigma or ABI) in 500 μl of Buffer B (10 mM NaCl, 500 mM Tris pH 7.5, 20 mM EDTA pH 8, 1% SDS)) at 45°C for about 16 hours. Proteinase K was inactivated by incubation at 95°C for 7 minutes. After the tubes were chilled, 10 μl of RNA synthetic spikes was added (e.g., 2 spikes of 0.15 fmol/μl). RNA was extracted using an equal volume of acid phenol/chloroform, vortexing, followed by centrifugation at 4°C for 15 minutes at 12,000g. RNA was then precipitated using 8 μl linear acrylamide, 0.1 volumes of 3M NaOAc pH 5.2, and 3 volumes of absolute ethanol, for 30 minutes to 16 hours, followed by centrifugation at 4°C for at least 40 minutes at 20,000g (14,000 rpm). The pellet was washed by adding 1 ml cold 85% ethanol. DNases were introduced at 37°C for 60 minutes to digest DNA (e.g. 10 μl Turbo DNase), followed by extraction using acid phenol/chloroform and ethanol precipitation as described above.
FNA smears samples (e.2.)
Total RNA was isolated from FNA smear samples on slides, either non-stained or stained (e.g. by Papanicolaou, Giemsa or Diff-Quik), after removal of the coverslip (when present) by dipping the slides in xylene at ambient temperature for about 2-20 hours (usually about 16 hours), in order to remove excess paraffin or glue. The slides were then washed about three times with 100% ethanol in order to wash the xylene out. Slides were dipped for 1 minute in double-distilled water (DDW). The cells were scraped from the slide using a scalpel. The slide was then washed with 500 μl buffer B (10 mM NaCl, 500 mM Tris pH 7.5, 20 mM EDTA pH 8, 1% SDS), and the wash was transferred to a 1.7 ml tube. Proteins were degraded by proteinase K (e.g., 5-12 μl Sigma or ABI) at 45°C for about 16 hours. Proteinase K was inactivated by incubating the tubes at 95°C for 7 minutes. After chilling the tubes, 10 μl of RNA synthetic spikes (e.g., 2 spikes of 0.15 fmol/μl) was added. RNA was extracted using an equal volume of acid phenol/chloroform, vortexing, and spinning down at 4°C for 15 minutes at 12,000g. RNA was then precipitated using 8 μl linear acrylamide, 0.1 volumes of 3M NaOAc pH 5.2, and 3 volumes of absolute ethanol for 30 minutes to 16 hours. The tubes were then spun down at 4°C for at least 40 minutes at 20,000g (14,000 rpm). The pellet was washed with about 1 ml cold 85% ethanol. DNases were introduced at 37°C for 60 minutes to digest DNA (e.g. 10 μl Turbo™ DNase, Ambion, Life Technologies), followed by extraction using acid phenol/chloroform and ethanol precipitation as described above.
3. Total RNA quantification
Total RNA quantification was performed by fluorospectrometry in a NanoDrop 3300 (ND3300) fluorospectrometer using the RiboGreen® dye (Thermo Fisher Scientific®, Wilmington, DE). The ND3300 RNA detection range is 25 ng/ml - 1000 ng/ml when using a high concentration of RiboGreen® dye (1:200 dilution), and 5 ng/ml - 50 ng/ml when using a 1:2000 dilution of RiboGreen® dye. The RNA amounts which were determined by ND3300 were highly correlated to the detected expressed microRNA.
4. MicroRNA profiling in microarray
Custom microarrays (Agilent Technologies, Santa Clara, CA) were generated by printing DNA oligonucleotide probes to: 2172 miR sequences, 17 negative controls, 23 spikes, and 10 positive controls (total of 2222 probes). Each microRNA probe, printed in triplicate, carried up to a 28-nucleotide (nt) linker at the 3' end of the microRNA's complement sequence. Negative control, spike, and positive control probes were printed from 3 to 200 times. Seventeen (17) negative control probes were designed using sequences that do not match the genome. Two groups of positive control probes were designed to hybridize to the microRNA array: (i) synthetic small RNAs were spiked into the RNA before labeling to verify the labeling efficiency; and (ii) probes for abundant small RNAs, e.g., small nuclear RNAs (U43, U24, Z30, U6, U48, U44) and 5.8S and 5S ribosomal RNA, were spotted on the array to verify RNA quality.
5. Cy-dye labeling of microRNA for microarray
Total RNA (20-1000 ng) was labeled by ligation (Thomson et al. Nature Methods 2004; 1:47-53) of an RNA linker, p-rCrU-Cy/dye or several sequential Cys (BioSpring GmbH, IBA GmbH or equivalent), to the 3' end with Cy3 or Cy5. The labeling reaction contained total RNA, spikes (0.1-100 fmoles), 250-400 ng RNA-linker-dye, 15% DMSO, 1x ligase buffer and 20 units of T4 RNA ligase (NEB or equivalent), and proceeded at 4°C for 1 hour, followed by 1 hour at 37°C, followed by 4°C for up to 40 minutes.
The labeled RNA was mixed with 30 μl hybridization mixture (mixture of 45 μl of the 10X GE Agilent Blocking Agent and 246 μl of 2X Hi-RPM Hybridization Buffer). The labeling mixture was incubated at 100°C for 5 minutes followed by incubation in an ice-water bath for 5 minutes. Slides were hybridized at 54-55°C for 16-20 hours, followed by two washes. The first wash was conducted at room temperature with Agilent GE Wash Buffer 1 (e.g. 6X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102) for 5 minutes, followed by a second wash with Agilent GE Wash Buffer 2 at 37°C for 5 minutes (e.g. 0.06X SSPE + 0.005% N-Lauroylsarcosine + 0.005% Triton X-102).
Arrays were scanned using a microarray scanner (Agilent Microarray Scanner Bundle G2565BA, resolution of 5 μm at XDR Hi 100%, XDR Lo 10%). Array images were analyzed using appropriate software (Feature Extraction 10.7 software, Agilent).
6. RT-PCR
Poly-adenylation and reverse transcription were performed on 1-500 ng of total RNA. RNA was incubated in the presence of poly(A) polymerase (Poly(A) Polymerase NEB-M0276L), ATP, an oligodT primer harboring a consensus sequence and reverse transcriptase (Superscript® II RT, Invitrogen, Carlsbad, CA) for 1 hour at 37°C. Next, the cDNA was amplified by RT-PCR. The amplification reaction included a microRNA-specific forward primer, a TaqMan® (MGB) probe complementary to the 3' end of the specific microRNA sequence and/or to part of the polyA adaptor sequence, and a universal reverse primer complementary to the consensus 3' sequence of the oligodT tail. A detailed description of the RT-PCR methodology may be found in publication WO 2008/029295, the contents of which are incorporated herein by reference.
The cycle threshold (CT, the PCR cycle at which probe signal reaches the threshold) was determined for each microRNA.
In order to allow comparison between microRNA expression results from RT-PCR and microRNA expression results from microarray, each value obtained by RT-PCR was subtracted from 50 (50-CT). The 50-CT expression for each microRNA for each patient was compared with the signal obtained by the microarray method.
7. Array data normalization
The initial data set consisted of signals measured for multiple probes for every sample. For the analysis, signals were used only for probes that were designed to measure the expression levels of known or validated human microRNAs.
Triplicate spots were combined into one signal by taking the logarithmic mean of the reliable spots. All data was log-transformed and the analysis was performed in log-space. A reference data vector for normalization, R, was calculated by taking the mean expression level for each probe in two representative samples, one from each tumor type.
For each sample k with data vector $S^k$, a 2nd degree polynomial $F^k$ was found so as to provide the best fit between the sample data and the reference data, such that $R \approx F^k(S^k)$. Remote data points ("outliers") were not used for fitting the polynomials F. For each probe in the sample (element $S_i^k$ in the vector $S^k$), the normalized value (in log-space) $M_i^k$ is calculated from the initial value $S_i^k$ by transforming it with the polynomial function $F^k$, so that $M_i^k = F^k(S_i^k)$.
Statistical analysis is performed in log-space. For presentation and calculation of fold-change, data is translated back to linear-space by taking the exponent.
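The array normalization described above can be sketched as follows. This is a minimal illustration in pure Python and not the implementation used in the study: the function names (`log_mean`, `fit_quadratic`, `normalize_sample`), the use of natural logarithms, and the toy values are assumptions, and the exclusion of outliers before fitting is omitted for brevity.

```python
import math

def log_mean(spots):
    """Combine replicate spots by averaging their logarithms (result is in log-space)."""
    return sum(math.log(v) for v in spots) / len(spots)

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_quadratic(s, r):
    """Least-squares coefficients (c0, c1, c2) such that r ~ c0 + c1*s + c2*s**2."""
    pw = [sum(x ** p for x in s) for p in range(5)]          # sums of s^0 .. s^4
    a = [[pw[i + j] for j in range(3)] for i in range(3)]    # normal-equation matrix
    b = [sum(y * x ** i for x, y in zip(s, r)) for i in range(3)]
    return solve3(a, b)

def normalize_sample(sample_log, reference_log):
    """Fit F so that reference ~ F(sample), then return F applied to every probe."""
    c0, c1, c2 = fit_quadratic(sample_log, reference_log)
    return [c0 + c1 * x + c2 * x * x for x in sample_log]

# Toy log-space signals (hypothetical values, for illustration only)
sample = [5.0, 6.1, 7.3, 8.0, 9.2]
reference = [5.4, 6.3, 7.4, 8.2, 9.1]
normalized = normalize_sample(sample, reference)
fold_change_linear = math.exp(normalized[4] - normalized[0])  # back to linear space
```

Fitting in log-space and exponentiating only for presentation mirrors the statement above that statistics are computed in log-space while fold-changes are reported in linear space.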
8. miRNA-seq Sequence Library Construction
Sequence library construction may be performed using a variety of different kits depending on the high-throughput sequencing platform being employed. However, there are several common steps for small RNA sequencing preparation. The ligation step adds DNA adaptors to both ends of the small RNAs, which act as primer binding sites during reverse transcription and PCR amplification. An adenylated single strand DNA 3' adaptor followed by a 5' adaptor is ligated to the small RNAs using a ligating enzyme such as T4 RNA ligase or adding 5' adaptor using 5' RACE reaction 2. The adaptors are also designed to capture small RNAs with a 5' phosphate group, characteristic of microRNAs, rather than RNA degradation products with a 5' hydroxyl group. Reverse transcription and PCR amplification steps convert the small adaptor-ligated RNAs into cDNA clones used in the sequencing reaction. PCR is then carried out to amplify the pool of cDNA sequences. Primers designed with unique nucleotide tags may also be used in this step to create ID tags in pooled library multiplex sequencing.
9. Next generation sequencing (NGS)
500 ng of RNA from each FFPE sample was used for small RNA deep sequencing (miRSeq). Libraries were loaded on two lanes of the sequence analyzer (Illumina® HiSeq™ 2000 DNA). An average of about 6.3 million reads per library was obtained. To find novel microRNAs, sequence analysis software (miRDeep2, Friedländer MR et al. Nucleic Acids Res. 2012 Jan;40(1):37-52) was applied to the raw sequencing data (primer-adapter sequences were trimmed).
10. Statistical analysis
P-values were calculated using a two-sided (unpaired) Student's t-test on the log-transformed normalized fluorescence signal. The threshold for significant differences was determined by setting a false discovery rate (FDR) of 0.05 to 0.1, to correct for effects of multiple hypothesis testing, resulting in p-value cutoffs in the range of 0.01-0.06. For each differentially expressed microRNA, the fold-difference (ratio of the median normalized fluorescence) and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve were calculated. Three sets of miRs were excluded from the statistical analysis: (a) miRs that were previously found to be highly expressed in blood samples (due to high percentages of blood in FNA samples), (b) miRs whose level of expression did not correlate with decreasing amounts of RNA, i.e., these miRs did not show a linear decrease in signal in association with decreasing measured RNA amounts, and (c) miRs whose level of expression correlated with miRs in set (b).
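The per-microRNA summary statistics named above (median fold-difference, ROC AUC, and an FDR-derived p-value cutoff) can be illustrated with a short pure-Python sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up rule below is an assumption, as are the function names. The t-test itself is not reproduced here because it needs a t-distribution routine outside the standard library.

```python
from statistics import median

def fold_change(group_a, group_b):
    """Ratio of group medians (linear space), reported as a value >= 1 plus a direction.

    "+" means group_a is higher, "-" means group_b is higher.
    """
    ma, mb = median(group_a), median(group_b)
    return (ma / mb, "+") if ma >= mb else (mb / ma, "-")

def roc_auc(pos, neg):
    """AUC as the probability that a positive sample outscores a negative one (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def benjamini_hochberg_cutoff(pvalues, fdr):
    """Largest p-value satisfying p <= fdr * rank / m (assumed FDR procedure), or None."""
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff
```

Note that `roc_auc` is directional: a marker that is lower in the positive group yields a value below 0.5, whereas the tables in this document appear to report AUC as a value of at least 0.5 together with a separate (+)/(-) direction.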
Example 1: Detection of microRNA in pre-operative samples
A pilot study of microRNA profiling was conducted in a few Papanicolaou, Giemsa and Diff-Quik stained smears from ex-vivo FNA biopsy samples in order to ensure feasibility of the methodology. Since FNA smears often have very few cells, providing a minuscule amount of RNA for analysis, e.g. 1-1000 ng, it was first necessary to evaluate whether microRNA would be detectable with such low RNA amounts. Thus, microRNA expression levels of about 2200 individual microRNAs were measured in Giemsa-stained papillary carcinoma and non-papillary carcinoma smears. Five microRNAs (hsa-miR-146b-5p, hsa-miR-31-5p, hsa-miR-222-3p, hsa-miR-221-3p, and hsa-miR-21-5p), previously shown to correlate with papillary carcinoma, were found to be over-represented in the papillary carcinoma smears. Figure 1 shows a comparison of microRNA expression between Giemsa-stained papillary carcinoma and non-papillary carcinoma samples, and reveals the highly up-regulated microRNA markers in the papillary carcinoma. These results strongly suggested that microRNA profiles can be successfully determined in FNA smears.
Example 2: Differential microRNA expression between malignant and benign thyroid lesions
The cohort of samples used in the experimental analysis consisted of 73 pre-operative thyroid FNA cell blocks selected from archived materials of the Department of Pathology, Temple University Hospital (Philadelphia, USA). The 73 specimens included samples of 35 benign and 38 malignant thyroid lesions. The 35 benign tumors consisted of 18 follicular adenoma, eight (8) Hashimoto thyroiditis, and nine (9) hyperplasia (goiter) samples. The 38 malignant tumors consisted of 10 follicular carcinoma and 28 papillary carcinoma samples. Of the 28 papillary carcinoma samples, nine (9) were papillary carcinoma, 13 were papillary carcinoma follicular variant encapsulated, and six (6) were papillary carcinoma follicular variant non-encapsulated. The histological diagnosis ultimately determined the malignancy or benignity of the thyroid lesions. The cytological classification was based on "The Bethesda System for Reporting Thyroid Cytopathology" (Syed Z. Ali and Edmund S. Cibas, eds.; DOI 10.1007/978-0-387-87666-5_1; Springer Science+Business Media, LLC 2010). The study protocol was approved by the Institutional Review Board (IRB, equivalent to Ethical Review Board) of the contributing institution. Tumor classification was based on the World Health Organization (WHO) guidelines. An additional cohort consisted of 13 thyroid ex-vivo FNA smears, prepared after thyroidectomy, and obtained from the University Milano-Bicocca (Milan, Italy).
Total RNA (at least 10 ng) was extracted from these samples, and microRNA expression was profiled using custom microarrays containing about 2200 miRs. The results exhibited a significant difference in the expression pattern between benign and malignant lesions of several miRs listed in Table 3 (upregulated or downregulated in malignant versus benign).
Table 3: miRNAs up or downregulated in malignant versus benign thyroid tumor
miR name          p-value    fold-change   AUC    median malignant   median benign
hsa-miR-146b-5p   3.80E-05   2.57 (+)      0.77   5.70E+02           2.20E+02
hsa-miR-222-3p    1.80E-03   2.20 (+)      0.71   4.70E+03           2.10E+03
hsa-miR-221-3p    1.80E-03   2.09 (+)      0.71   4.10E+03           2.00E+03
hsa-miR-181b-5p   2.50E-02   1.38 (+)      0.65   5.00E+02           3.60E+02
hsa-miR-29b-3p    9.50E-03   1.32 (+)      0.64   2.10E+03           1.60E+03
hsa-miR-200b-3p   2.60E-02   1.27 (+)      0.65   3.10E+02           2.40E+02
hsa-miR-200a-3p   3.90E-02   1.27 (+)      0.64   3.00E+02           2.40E+02
hsa-miR-29c-3p    8.80E-03   1.22 (+)      0.64   1.40E+03           1.10E+03
hsa-miR-130a-3p   3.30E-02   1.20 (+)      0.64   1.00E+03           8.70E+02
hsa-miR-148b-3p   3.60E-02   1.13 (+)      0.64   5.00E+02           4.50E+02
MID-23794         2.60E-05   2.34 (-)      0.78   6.00E+02           1.40E+03
hsa-miR-197-5p    2.20E-03   1.90 (-)      0.74   3.40E+02           6.60E+02
hsa-miR-486-3p    3.60E-05   1.73 (-)      0.79   2.00E+02           3.50E+02
hsa-miR-574-3p    1.40E-02   1.44 (-)      0.68   2.30E+02           3.30E+02
hsa-miR-532-3p    4.80E-03   1.30 (-)      0.71   4.50E+02           5.80E+02
hsa-miR-199a-5p   2.50E-03   1.25 (-)      0.73   3.90E+02           4.80E+02
hsa-miR-22-3p     3.90E-02   1.11 (-)      0.62   3.40E+03           3.70E+03
p-values were calculated using a two-sided (unpaired) Student's t-test.
The fold-change represents the ratio between the median values of each group.
AUC: Area under the curve when using the miRNAs to classify the two groups.
Median: median of expression values (rounded).
A classification algorithm for differentiating between malignant and benign thyroid tumors was developed based on miRNA expression in 35 benign and 38 malignant FNA samples. A logistic regression classifier was trained to distinguish between malignant and benign thyroid lesions, based on eight miRs (hsa-miR-125b-5p, hsa-miR-21-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-146b-5p, hsa-miR-181a-5p, hsa-miR-138-5p, and MID-23794) that were found to be differentially expressed in these conditions, either between benign and malignant lesions or between specific thyroid tumor subtypes (data not shown). The classifier reached 89% accuracy with sensitivity of 87% and specificity of 91% for identifying malignant samples. hsa-miR-125b-5p, hsa-miR-21-5p, hsa-miR-222-3p, hsa-miR-221-3p, hsa-miR-146b-5p and hsa-miR-181a-5p exhibited higher expression in malignant lesions, while hsa-miR-138-5p and MID-23794 exhibited higher expression in benign lesions (data not shown).
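For readers who want to relate the reported accuracy, sensitivity and specificity to sample counts, the following sketch shows how the three figures are computed from a confusion matrix. The specific counts are hypothetical: the confusion matrix is not given in the text, and the values below were chosen only because they are consistent with 38 malignant and 35 benign samples and with the rounded percentages reported above.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) with malignant as the positive class."""
    sensitivity = tp / (tp + fn)            # malignant samples called malignant
    specificity = tn / (tn + fp)            # benign samples called benign
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 38 malignant (33 correct) and 35 benign (32 correct)
sens, spec, acc = classifier_metrics(tp=33, fn=5, tn=32, fp=3)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
# prints: sensitivity=87% specificity=91% accuracy=89%
```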
Example 3: Distinguishing different sub-types of malignant and benign thyroid lesions
Expression levels of miRs were compared in 18 follicular adenoma samples and 10 follicular carcinoma samples. microRNAs that were upregulated or downregulated in follicular adenoma relative to follicular carcinoma are presented in Table 4.
Table 4: miRNAs up- or downregulated in follicular adenoma versus follicular carcinoma
miR name          p-value    fold-change   AUC    median follicular adenoma   median follicular carcinoma
hsa-miR-486-3p    2.80E-02   2.04 (+)      0.77   4.80E+02                    2.40E+02
MID-01141         5.50E-02   1.91 (+)      0.73   3.50E+02                    1.80E+02
hsa-miR-193a-3p   2.70E-02   1.45 (+)      0.76   3.10E+02                    2.20E+02
hsa-miR-148b-3p   3.90E-02   1.25 (-)      0.71   4.50E+02                    5.60E+02
p-values were calculated using a two-sided (unpaired) Student's t-test.
The fold-change represents the ratio between the median values of each group.
AUC: Area under the curve when using the miRNAs to classify the two groups.
Median: median of expression values (rounded).
Expression levels of miRs were compared in 18 follicular adenoma samples versus 9 papillary carcinoma (non-follicular variant) samples, and a classifier was generated for distinguishing between follicular adenoma and papillary carcinoma samples using the expression levels of hsa-miR-146b-5p and hsa-miR-21-5p, with 100% accuracy (data not shown).
Expression levels of miRs were compared in 18 follicular adenoma samples versus 19 follicular variant of papillary carcinoma samples. microRNAs that were upregulated or downregulated in follicular variant of papillary carcinoma relative to follicular adenoma are presented in Table 5.
Table 5: miRNAs up- or downregulated in follicular variant papillary carcinoma (FVPC)
Figure imgf000082_0001
miR name          p-value    fold-change   AUC    median FVPC   median FA
hsa-miR-34a-5p    4.10E-02   1.09 (-)      0.63   6.00E+02      6.60E+02
p-values were calculated using a two-sided (unpaired) Student's t-test.
The fold-change represents the ratio between the median values of each group.
AUC: Area under the curve when using the miRNAs to classify the two groups.
Median: median of expression values (rounded).
Expression levels of miRs were compared in 6 non-encapsulated follicular variant of papillary carcinoma samples versus 35 benign samples, and a classifier was generated using the expression levels of hsa-miR-221-3p and hsa-miR-200b-3p, with 98% accuracy, 83% sensitivity and 100% specificity (data not shown).
Expression levels of miRs were compared in 8 Hashimoto thyroiditis samples and 9 (non-follicular) papillary carcinoma samples. microRNAs that were upregulated or downregulated in papillary carcinoma relative to Hashimoto thyroiditis are presented in Table 6. The miRs that are the best candidates for the profile signature for comparing these two thyroid lesions are hsa-miR-146b-5p, hsa-miR-200a-3p and MID-23794.
Table 6: miRNAs upregulated or downregulated in papillary carcinoma (PC) versus
Figure imgf000083_0001
p-values were calculated using a two-sided (unpaired) Student's t-test.
The fold-change represents the ratio between the median values of each group.
AUC: Area under the curve when using the miRNAs to classify the two groups.
Median: median of expression values (rounded).
Example 4: Identification of Novel microRNA Biomarkers by Deep-sequencing
Eleven (11) FFPE (Formalin Fixed Paraffin Embedded) thyroid resection samples (obtained from surgical biopsies, fixed in formalin and preserved in paraffin) from follicular lesions were obtained from the Department of Pathology at Rabin Medical Center. The specimens included 6 follicular adenomas and 5 follicular carcinomas. Tumor cellular content was higher than 50% in all the samples. A total of 386 novel candidate microRNAs were found with sequence analysis software, and 27 of those were selected for validation, performed by qPCR. Two novel microRNAs are disclosed herein, MD2-495 and MD2-437; their sequences are presented in Table 1, and their respective hairpins are shown in Table 2. Figure 2A shows the secondary structures of the two novel microRNAs, predicted by sequence analysis software. Figure 2B shows the expression of the two novel microRNAs (normalized number of reads) in each of the 11 samples. The color-coded bar on the right represents a scale for expression.
Example 5: Specific microRNAs are differentially expressed between benign and malignant thyroid lesions
Stained thyroid FNA smears were obtained from a medical center in Israel (Cohort I); and thyroid FNA cell blocks were obtained from a medical center in the USA (Cohort II). For both cohorts, thyroid lesions were ultimately classified as malignant or benign based on histological diagnosis of the resected tumor. A summary of the breakdown of the samples from the two cohorts is shown in Table 7.
Table 7: FNA Samples - Cohorts I and II
Figure imgf000084_0001
Some patients had more than one lesion.
2 The Bethesda System for Reporting Thyroid Cytopathology (BSRTC) resulted from a conference held at the National Institutes of Health in 2007 (Cibas ES, Ali SZ. The Bethesda System for Reporting Thyroid Cytopathology. Am J Clin Pathol 2009;132:658-65). The system led to standardization of FNA reports based on six diagnostic categories: DC I = non-diagnostic, DC II = benign, DC III = atypia/follicular lesion of undetermined significance (AUS/FLUS), DC IV = follicular neoplasm/suspicion for a follicular neoplasm (FN/SFN), DC V = suspicious for malignancy, and DC VI = malignant.
Highly purified RNA, including the microRNA fraction, was extracted from samples using in-house developed protocols as described above. FFPE and cytological (FNA) samples were profiled by custom printed microarrays measuring over 2000 microRNAs to identify differentially expressed microRNAs and to develop a classifier.
Over 150 thyroid FNA samples (Table 7) were profiled by custom-printed microarrays measuring over 2000 microRNAs and on 96 microRNAs by qPCR. Figures 3A (cohort I) and 3B (cohort II) show the median microRNA expression levels on microarrays in patients with malignant nodules (y-axis) and in patients with benign nodules (x-axis). For each microRNA, the values in the two groups were compared by Mann-Whitney test with FDR=0.1.
Differential expression of microRNAs was found between benign and malignant neoplasms. Classification of malignant vs. benign smears based on two microRNAs, hsa-miR-146b-5p and hsa-miR-375, results in over 85% accuracy (based on the median of ten 10-fold cross-validation runs, data not shown).
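The evaluation scheme mentioned above, the median accuracy over ten repetitions of 10-fold cross-validation, can be sketched as follows. The classifier type used for the two-microRNA model is not stated in this passage, so a nearest-centroid rule is used purely as a stand-in; the function names, the fixed random seed, and the synthetic two-feature data are all assumptions made for illustration.

```python
import random
from statistics import mean, median

def nearest_centroid_predict(train, x):
    """train: list of (features, label). Returns the label of the closest class centroid."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [f for f, lab in train if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def cv_accuracy(data, k, rng):
    """One k-fold cross-validation run; returns the fraction of held-out samples classified correctly."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        correct += sum(nearest_centroid_predict(train, data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)

def repeated_cv(data, k=10, runs=10, seed=0):
    """Median accuracy over several independently shuffled k-fold runs."""
    rng = random.Random(seed)
    return median(cv_accuracy(data, k, rng) for _ in range(runs))

# Synthetic, well-separated two-feature data (hypothetical values, not study data)
toy = [((10 + 0.1 * i, 8.0), "malignant") for i in range(20)] + \
      [((5 + 0.1 * i, 4.0), "benign") for i in range(20)]
toy_accuracy = repeated_cv(toy)
```

Taking the median over repeated runs reduces the dependence of the reported accuracy on any single random partition of the samples into folds.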
Example 6: hsa-miR-375 is a significant marker for medullary thyroid carcinoma in FNA samples
Expression level of hsa-miR-375 (SEQ ID NO.8) in FNA cohort I was compared between medullary thyroid cancer samples (n=6) and samples from other thyroid nodules (n=75). Results are shown in Figure 4. Thus, hsa-miR-375 is a significant marker for medullary thyroid carcinoma.
Example 7: Stained thyroid smears can be used for microRNA profiling
MicroRNA expression levels in samples stained with different dyes were compared in order to evaluate microRNA stability and reproducibility of microRNA level detection upon staining. A total of 143 smears from FNA cohort I were stained as follows: 60 with May-Grünwald Giemsa, 64 with Diff-Quik and 19 with Papanicolaou. MicroRNA expression levels in duplicates of the same sample stained with different dyes showed significant correlation (more than expected). The similarity of miR-146b-5p expression levels between the different stains is further demonstrated in Figures 5A-5B, which show that the normalized expression level of hsa-miR-146b-5p (SEQ ID NO.10-11) is similar when the same sample is stained with different dyes, as can be seen for the 52 May-Grünwald Giemsa/Diff-Quik pairs (Fig. 5A) and for the 15 Diff-Quik/Papanicolaou pairs (Fig. 5B). Therefore, different cytological dyes used in the clinical setting (Papanicolaou; May-Grünwald Giemsa; and Diff-Quik) do not affect the detection and quantification of microRNA expression.
Example 8: Thyroid Classification - Assay Development
A total of twenty-four (24) microRNAs overall were chosen for establishing the status of thyroid samples as malignant versus benign (Table 12). MicroRNA expression was measured by RT-PCR as described above. The list of miRs and their respective forward primers is provided in Table 8. First-strand generation was done using the polyT adaptor presented below. Forward primers were sequence-specific while the reverse primer was universal. Detection of the RT-PCR products was done with the universal MGB probe for miRs hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-5701 (SEQ ID NO.35), hsa-miR-424-3p (SEQ ID NO.16), MID-50971 (SEQ ID NO.34), MID-20094 (SEQ ID NO.27-28), MID-50976 (SEQ ID NO.33), hsa-miR-3074-5p (SEQ ID NO.32), hsa-miR-222-3p (SEQ ID NO.1-2), MID-50969 (SEQ ID NO.29), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), MID-16582 (SEQ ID NO.25), or with probes specific for the miRs as provided in Table 9.
The sequences of the reverse primer, the polyT adaptor and the MGB probe are provided below:
Reverse primer
GCGAGCACAGAATTAATACGAC (SEQ ID NO.309);
PolyT adaptor
GCGAGCACAGAATTAATACGACTCACTATCGGTTTTTTTTTTTTVN (SEQ ID NO. 310), where "V" may be any one of A, G or C; and "N" may be any one of G, C, A or U/T;
Universal MGB probe
AAAACCGATAGTGAGTCG (SEQ ID NO.311).
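The relationship between the three universal oligonucleotides listed above can be checked directly from their sequences. The short sketch below (the constant and function names are illustrative, not part of the disclosure) shows that the reverse primer is identical to the 5' end of the polyT adaptor, and that the universal MGB probe is the reverse complement of an internal adaptor segment that runs into the poly-T stretch.

```python
REVERSE_PRIMER = "GCGAGCACAGAATTAATACGAC"                          # SEQ ID NO.309
POLYT_ADAPTOR = "GCGAGCACAGAATTAATACGACTCACTATCGGTTTTTTTTTTTTVN"   # SEQ ID NO.310
MGB_PROBE = "AAAACCGATAGTGAGTCG"                                   # SEQ ID NO.311

# Only unambiguous bases are translated; the degenerate V and N positions
# at the adaptor's 3' end are not involved in either comparison.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

print(POLYT_ADAPTOR.startswith(REVERSE_PRIMER))           # True
print(reverse_complement(MGB_PROBE))                      # CGACTCACTATCGGTTTT
print(reverse_complement(MGB_PROBE) in POLYT_ADAPTOR)     # True
```

This is consistent with the description of a universal reverse primer and a universal probe that recognize adaptor-derived sequence rather than the microRNA itself.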
Table 8: Assay Development - MicroRNAs and forward primers
Figure imgf000086_0001
microRNA          SEQ ID NO.   Forward primer                   SEQ ID NO.
hsa-miR-181c-5p   15           CAGTCATTTGGCAACATTCAACCTGTCG     320
hsa-miR-424-3p    16           CAAAACGTGAGGCGCTGCTAT            321
hsa-miR-342-3p    17,18        CAGTCATTTGGGTCTCACACAGAAATCG     322
hsa-miR-138-5p    19,20,21     CAGTCATTTGGCAGCTGGTGTTGTGAAT     323
hsa-miR-486-5p    22           CAGTCATTTGGCTCCTGTACTGAGCTGC     324
hsa-miR-200c-3p   23,24        CAGTCATTTGGGTAATACTGCCGGGTAA     325
MID-16582         25           TTGGCAGTGAAGCATTGGACTGTA         326
hsa-miR-23a-3p    26           CAGTCATTTGGCATCACATTGCCAGGGA     327
MID-20094         27,28        CATTTGGCTAAGCCAGTTTCTGTCTGATA    328
MID-50969         29           TGGCATGACAGATTGACATGGACAATT      329
hsa-miR-345-5p    30,31        CAGTCATTTGGCGCTGACTCCTAGTCCA     330
hsa-miR-3074-5p   32           CGTTCCTGCTGAACTGAGCCAG           331
MID-50976         33           CCTGTCTGAGCGCCGCTC               332
MID-50971         34           CAGTCATTTGGCATACTCTGGTTTCTTTTC   333
hsa-miR-5701      35           AGTCATTTGGCTTATTGTCACGTTCTGATT   334
hsa-miR-574-3p    36,37        CAGTCATTTGGCCACGCTCATGCACACA     335
Table 9: Assay Development - MicroRNA Specific probes
Figure imgf000087_0001
Table 11: microRNA Markers in Thyroid Assay
Figure imgf000087_0002
Figure imgf000088_0001
hsa-miR-574-3p 36,37
Marker microRNAs were selected based on their patterns of expression in several preliminary studies performed by the inventors (data not shown), which provided the reasoning for classifying them as "malignant" or "cell type" markers or, alternatively, for using them as normalizers.
"Malignant markers" hsa-miR-222-3p, hsa-miR-551b-3p, hsa-miR-31-5p, hsa-miR-375, hsa-miR-125b-5p, hsa-miR-152-3p, hsa-miR-346, hsa-miR-181c-5p, hsa-miR-424-3p and hsa-miR-146b-5p were established according to the level of expression of these microRNAs in malignant samples when compared with their expression in benign samples.
"Cell type" markers hsa-miR-486-5p, hsa-miR-342-3p, hsa-miR-138-5p, hsa-miR-200c-3p, and MID-16582 were chosen by the inventors according to their pattern of expression as exemplified below.
hsa-miR-486-5p (SEQ ID NO.22) was found enriched in whole blood relative to thyroid epithelial cells. Along with other microRNAs (data not shown), it was found to be associated with the amount of blood in thyroid FNA samples. Thus, hsa-miR-486-5p (SEQ ID NO.22) is one example of a whole blood marker. Several microRNAs were detected in high correlation (>0.85) with miR-486-5p, and may also be considered blood markers, including hsa-miR-320a, hsa-miR-106a-5p, hsa-miR-93-5p, hsa-miR-17-3p, hsa-let-7d-5p, hsa-miR-107, hsa-miR-103a-3p, hsa-miR-17-5p, hsa-miR-191-5p, hsa-miR-25-3p, hsa-miR-106b-5p, hsa-miR-20a-5p, hsa-miR-18a-5p, hsa-miR-144-3p, hsa-miR-140-3p, hsa-miR-15b-5p, hsa-miR-16-5p, hsa-miR-92a-3p, hsa-miR-484, hsa-miR-151a-5p, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7c-5p, hsa-let-7b-5p, hsa-let-7g-5p, hsa-let-7i-5p, hsa-miR-185-5p, hsa-miR-30d-5p, hsa-miR-30b-5p, hsa-miR-30c-5p, hsa-miR-19b-3p, hsa-miR-26a-5p, hsa-miR-26b-5p, hsa-miR-425-5p, MID-19433, and hsa-miR-4306.
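Selecting additional blood markers by their correlation with a reference marker, as described above, can be illustrated as follows. The text does not specify the correlation measure; Pearson correlation across samples is assumed here, and the function names, candidate labels ("miR-A", "miR-B") and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_markers(reference, candidates, threshold=0.85):
    """Names of candidates whose per-sample expression correlates with the reference above threshold."""
    return [name for name, values in candidates.items()
            if pearson(reference, values) > threshold]

# Hypothetical per-sample log expression values for the reference blood marker
reference_marker = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "miR-A": [2.0, 4.0, 6.0, 8.5],   # rises with the reference
    "miR-B": [4.0, 1.0, 3.0, 2.0],   # does not track the reference
}
blood_like = correlated_markers(reference_marker, candidates)
```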
The inventors have observed, upon measuring the microRNA profile of the blood compartments, that a number of microRNAs were elevated in different blood cell types (data not shown). Thus, hsa-miR-342-3p (SEQ ID NO.17-18) was one of the microRNAs, amongst others, which was enriched in white blood cells, and may therefore be considered an example of a white blood cell marker. Interestingly, hsa-miR-342-3p was shown to be expressed in correlation with hsa-miR-150-5p, suggesting that hsa-miR-150-5p is also a white blood cell marker. In addition, hsa-miR-146a-5p was also shown to be expressed in white blood cells (data not shown).
hsa-miR-200c-3p (SEQ ID NO.23-24) and hsa-miR-138-5p (SEQ ID NO.19-21) were found enriched in epithelial cells. In a preliminary experiment, smears were generated with blood in the absence of thyroid tissue material, and compared with smears from thyroid tissue. Both hsa-miR-200c-3p (SEQ ID NO.23-24) and hsa-miR-138-5p (SEQ ID NO.19-21) were found to be expressed at much higher levels in the thyroid smears (both benign and malignant) compared to blood smears (data not shown). Other microRNAs were also found enriched in epithelial cells (data not shown). Thus, hsa-miR-200c-3p (SEQ ID NO.23-24) and hsa-miR-138-5p (SEQ ID NO.19-21) are examples of epithelial cell markers. Interestingly, the inventors found that the expression of hsa-miR-138-5p correlated with the presence of epithelial cells, and in certain subsets of the data hsa-miR-138-5p was found to be upregulated in benign samples (data not shown).
MID-16582 (SEQ ID NO.25) was found at higher expression levels in Hurthle cells. In preliminary studies, the inventors have surprisingly found that this microRNA is upregulated in follicular adenomas presenting Hurthle cells versus follicular adenomas not indicated to have Hurthle cells (Figures 6A-6B). This result may be attributed to the mitochondrial enrichment found in Hurthle cells. The present inventors have found that the sequence of MID-16582 (SEQ ID NO.25), as well as other nucleic acid sequences found in Hurthle cells, can be mapped to mitochondrial DNA (data not shown). Thus, MID-16582 is an example of a Hurthle cell marker.
The assay development training set included about 360 distinct samples. Most of the samples were stained FNA smears (Papanicolaou, May-Grünwald Giemsa or Diff-Quik). Forty-five (45) FNA samples were in cell blocks. The samples were collected from medical centers in Israel, Europe and USA. Most of the samples were "indeterminate" FNAs (according to Bethesda classification, 71 of class III, 113 of class IV and 74 of class V) while others were "determinate" (38 of class II, 60 of class VI). The training set was composed of malignant (n=197) and benign (n=155) thyroid nodules, and contained representatives of the eight main histological subtypes of thyroid nodules. Thirty-three of the samples came from thyroid nodules that were less than 1 cm in size. The smallest nodule size was 0.1 cm. Samples of medullary carcinoma were excluded from most of the analyses, unless otherwise indicated. Table 10 provides the distribution of the samples per category.
Table 10: Training Study Cohort composition and Bethesda distribution
Figure imgf000090_0001
Samples from FNA smears routinely prepared as well as cell blocks were used for total RNA extraction and RT-PCR amplification. All the samples were tested with a panel of 15 marker microRNAs and 9 microRNAs used as normalizers (Table 11).
Results of the training in a sub-set of samples (n=353) are shown in Figure 7. Expression of microRNAs hsa-miR-222-3p (SEQ ID NO.1-2), hsa-miR-551b-3p (SEQ ID NO.3-4), hsa-miR-31-5p (SEQ ID NO.5-7), hsa-miR-125b-5p (SEQ ID NO.9), hsa-miR-146b-5p (SEQ ID NO.10-11), hsa-miR-346 (SEQ ID NO.14), hsa-miR-181c-5p (SEQ ID NO.15), and hsa-miR-375 (SEQ ID NO.8) above the threshold was found to correlate with malignant samples. The expression levels shown in Figure 7 were obtained by the following formula: [50 - normalized Ct of each marker]. The normalization was done by subtracting the mean signal of the normalizers. The value of the mean signal of the normalizers over all the samples used was added to all the expression values detected, in order to bring the values to a range more manageable for calculation. Interestingly, expression levels of hsa-miR-125a-5p correlate with those of hsa-miR-125b-5p.
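The normalization described above can be illustrated with a minimal sketch. It assumes one reading of the text: the per-sample mean Ct of the normalizers is subtracted from the marker Ct, the cohort-wide mean of the normalizers is added back, and the result is subtracted from 50. The function name, sample names and Ct values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def normalized_expression(marker_ct, normalizer_cts, cohort_normalizer_mean):
    """Return 50 - normalized Ct for one marker in one sample (assumed reading)."""
    sample_normalizer_mean = mean(normalizer_cts)
    # Subtract the sample's own normalizer signal, then add back the cohort-wide
    # normalizer mean so that values stay in a convenient range.
    normalized_ct = marker_ct - sample_normalizer_mean + cohort_normalizer_mean
    return 50.0 - normalized_ct

# Hypothetical raw Ct values for one marker and two normalizers per sample.
samples = {
    "S1": {"marker": 28.0, "normalizers": [24.0, 26.0]},
    "S2": {"marker": 31.0, "normalizers": [27.0, 29.0]},
}

cohort_mean = mean(mean(s["normalizers"]) for s in samples.values())
expression = {
    name: normalized_expression(s["marker"], s["normalizers"], cohort_mean)
    for name, s in samples.items()
}
print(expression)
```

In this toy input, S2 has uniformly higher Cts than S1 (less input material), yet both samples end up with the same normalized expression value, which is the purpose of the normalizers.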
Example 9: Establishment of Classifiers for the Thyroid Assay
Four algorithms were used in order to establish the best classifier to be implemented in the thyroid assay: Discriminant Analysis, K-nearest neighbor (KNN), support vector machine (SVM) and Ensemble of discriminant analysis classifiers (Discriminant Analysis Ensemble).
The following parameters were established a priori:
Priors: For all the algorithms used, priors were set to 70% for the malignant and 30% for the benign samples.
Sample Set: In this example, three sample sets were analyzed. One sample set included malignant (n=183) plus benign (n=155) samples, which excludes the malignant medullary samples; referred to below and in the Figures as "malignant+benign". Another sample set included all "indeterminate" samples, which includes all samples classified as Bethesda III, IV and V, referred to below and in the Figures as "indeterminate". A third sample set included samples classified as Bethesda IV only, referred to below and in the Figures as "Bethesda". Samples from thyroid lesions classified as Bethesda IV are usually difficult to classify by cytological parameters. Therefore, it is important to establish a classifier that is based on this sub-group of samples. In addition, specific samples that presented technical problems due to a variety of reasons (e.g. malignant samples with Bethesda II; sample taken from lymph nodes) were excluded.
Medullary samples were excluded from the classification. Therefore, in this Example, when referring to malignant samples it means non-medullary malignant.
Normalization of microRNA expression levels: MicroRNA expression levels were normalized with the so-called normalizer microRNAs [hsa-miR-23a-3p, MID-20094, MID-50969, hsa-miR-345-5p, hsa-miR-3074-5p, MID-50976, MID-50971, hsa-miR-5701 and hsa-miR-574-3p] and were subtracted from 50, in order for lower Cts to be associated with higher expression values.
MicroRNA Ratios: Ratios were obtained from pairs of microRNAs in an attempt to subtract certain factors from the classifier. Thus, e.g., a ratio of hsa-miR-31-5p:hsa-miR-342-3p makes it possible to reduce the contribution of white blood cells (through the expression of hsa-miR-342-3p, the denominator) to the expression of hsa-miR-31-5p (the numerator). Since Cts are in log-scale, ratios were created by subtracting one miR expression from the other. Each ratio was further normalized by adding a constant, in order for the ratios to be within the same range as the microRNA normalized values.
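Because the expression values are on a log scale, a "ratio" of two microRNAs reduces to a difference plus a constant. The following is a minimal sketch of that arithmetic; the function name, the offset of 20 and the expression values are hypothetical, since the text does not state the constant that was used.

```python
def log_scale_ratio(numerator_expr, denominator_expr, offset):
    """Ratio of two log-scale expression values: a difference shifted by a constant."""
    return numerator_expr - denominator_expr + offset

# Hypothetical normalized expression values (50 - normalized Ct) for one sample.
expr = {"hsa-miR-31-5p": 22.0, "hsa-miR-342-3p": 18.0}

# The offset is an assumed constant chosen only to keep the ratio in the same
# range as the single-microRNA values.
ratio_31_342 = log_scale_ratio(expr["hsa-miR-31-5p"], expr["hsa-miR-342-3p"], offset=20.0)
print(ratio_31_342)
```

With hsa-miR-342-3p as the denominator, a sample whose hsa-miR-31-5p signal is accompanied by a high white blood cell signal receives a lower ratio than a sample with the same hsa-miR-31-5p level and a low hsa-miR-342-3p level.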
Example 9.1: Discriminant Analysis Classifier
When discriminant analysis was used as the algorithm, a linear discriminant type of discriminant analysis was applied to the three sets of samples mentioned above, using as features either different combinations of microRNA expression levels (Fig.8A-8C, Fig.23A-23C and Fig.37A-37C), microRNA ratios (Fig.9A-9C, Fig.24A-24C and Fig.38A-38C), or a combination of microRNA expression levels and microRNA ratios (Fig.10A-10C, Fig.25A-25C and Fig.39A-39C).
As mentioned above, three sets of samples were run with this algorithm. Figures 8A-8C, Fig.9A-9C and Fig.10A-10C provide the results of this algorithm on malignant+benign samples. Figures 23A-23C, Fig.24A-24C and Fig.25A-25C provide the results of this algorithm on indeterminate samples. Figures 37A-37C, Fig.38A-38C and Fig.39A-39C provide the results of this algorithm on Bethesda IV samples.
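For illustration, a linear discriminant with fixed class priors can be sketched in plain Python for two features. This is a textbook formulation (class means, pooled covariance, log-prior term), not the inventors' implementation; the function names and the training values are hypothetical, and only the 70%/30% priors come from the text.

```python
import math

def fit_lda(X, y, priors):
    """Fit a two-feature linear discriminant; returns {class: (weights, bias)}."""
    classes = sorted(priors)
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    # Pooled within-class covariance matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, label in zip(X, y):
        d = [x[0] - means[label][0], x[1] - means[label][1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    dof = len(X) - len(classes)
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    model = {}
    for c in classes:
        m = means[c]
        w = [inv[0][0] * m[0] + inv[0][1] * m[1], inv[1][0] * m[0] + inv[1][1] * m[1]]
        # The log prior shifts the decision boundary toward the less likely class.
        b = -0.5 * (w[0] * m[0] + w[1] * m[1]) + math.log(priors[c])
        model[c] = (w, b)
    return model

def predict_lda(model, x):
    """Return the class with the highest discriminant score."""
    return max(model, key=lambda c: model[c][0][0] * x[0] + model[c][0][1] * x[1] + model[c][1])

# Hypothetical normalized values of two features for eight training samples.
X = [(12, 10), (13, 11), (12, 11), (13, 10), (17, 14), (18, 15), (17, 15), (18, 14)]
y = ["benign"] * 4 + ["malignant"] * 4
model = fit_lda(X, y, priors={"malignant": 0.7, "benign": 0.3})
print(predict_lda(model, (18, 15)), predict_lda(model, (12, 10)), predict_lda(model, (15, 12.5)))
```

The third query lies exactly midway between the two class means, so the call there is decided by the priors alone, which is the practical effect of setting them to 70% malignant and 30% benign.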
Example 9.2: KNN Classifier
One analysis was performed using KNN (k-nearest neighbors) as the algorithm, in which k=5 was used with a distance metric of Pearson correlation. The analysis with the KNN algorithm was applied to three sets of samples as mentioned above (malignant+benign, indeterminate and Bethesda IV) using as features either different combinations of microRNA expression levels (Fig.11A-11C, Fig.26A-26C and Fig.40A-40C), microRNA ratios (Fig.12A-12B, Fig.27A-27B and Fig.41A-41B), or a combination of microRNA expression levels and microRNA ratios (Fig.13A-13C, Fig.28A-28C and Fig.42A-42C).
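A k-nearest-neighbor vote with a correlation-based distance can be sketched as follows. The sketch assumes that the distance is defined as 1 minus the Pearson correlation between two expression profiles, which is a common convention but is not spelled out in the text; the function names and the four-feature profiles are hypothetical.

```python
from collections import Counter
from math import sqrt

def pearson_distance(a, b):
    """Distance between two profiles, assumed here to be 1 - Pearson correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)

def knn_predict(train, labels, query, k=5):
    """Majority vote among the k training profiles closest to the query."""
    ranked = sorted(range(len(train)), key=lambda i: pearson_distance(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical four-feature expression profiles.
train = [
    [18, 10, 17, 11], [19, 11, 18, 10], [17, 9, 18, 12], [18, 12, 19, 11],
    [11, 16, 10, 17], [12, 17, 11, 16], [10, 18, 12, 17], [11, 17, 12, 18],
]
labels = ["malignant"] * 4 + ["benign"] * 4

print(knn_predict(train, labels, [18, 11, 18, 11]))
print(knn_predict(train, labels, [10, 17, 11, 17]))
```

A correlation distance compares the shape of a profile rather than its absolute level, so it needs at least three features per sample and profiles that are not constant.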
As mentioned above, three sets of samples were run with this algorithm. Figures 11A-11C, Fig.12A-12B and Fig.13A-13C provide the results of this algorithm on malignant+benign samples. Figures 26A-26C, Fig.27A-27B and Fig.28A-28C provide the results of this algorithm on indeterminate samples. Figures 40A-40C, Fig.41A-41C and Fig.42A-42C provide the results of this algorithm on Bethesda IV samples.
Example 9.3: SVM Classifier
A third analysis was performed applying SVM (Support vector machine) as the algorithm, in which a linear kernel was used. The analysis with the SVM algorithm was applied to three sets of samples as mentioned above (malignant+benign, indeterminate and Bethesda IV), using as features either different combinations of microRNA expression levels (Fig.14A-14C, Fig.29A-29C and Fig.43A-43C), microRNA ratios (Fig.15A-15C, Fig.30A-30C and Fig.44A-44C), or a combination of microRNA expression levels and microRNA ratios (Fig.16A-16C, Fig.31A-31C and Fig.45A-45C), respectively.
As mentioned above, three sets of samples were run with this algorithm. Figures 14A-14C, Fig.15A-15C and Fig.16A-16C provide the results of this algorithm on malignant+benign samples. Figures 29A-29C, Fig.30A-30C and Fig.31A-31C provide the results of this algorithm on indeterminate samples. Figures 43A-43C, Fig.44A-44C and Fig.45A-45C provide the results of this algorithm on Bethesda IV samples.
Example 9.4: Ensemble Methods Classifier
A fourth analysis was performed applying Ensemble methods as the algorithm. An ensemble of up to 100 discriminant analysis classifiers was created using AdaBoost and applied to the data. The analysis with the Ensemble algorithm was applied to three sets of samples as mentioned above (malignant+benign, indeterminate and Bethesda IV), using as features either different combinations of microRNA expression levels (Fig.17A-17C, Fig.32A-32C and Fig.46A-46C), microRNA ratios (Fig.18A-18C, Fig.33A-33C and Fig.47A-47C), or a combination of microRNA expression levels and microRNA ratios (Fig.19A-19C, Fig.34A-34C and Fig.48A-48C).
As mentioned above, three sets of samples were run with this algorithm. Figures 17A-17C, Fig.18A-18C and Fig.19A-19C provide the results of this algorithm on malignant+benign samples. Figures 32A-32C, Fig.33A-33C and Fig.34A-34C provide the results of this algorithm on indeterminate samples. Figures 46A-46C, Fig.47A-47C and Fig.48A-48C provide the results of this algorithm on Bethesda IV samples.
Example 10. A classifier for malignant samples including medullary
The same sample set used in Example 9, but including medullary malignant samples, was used for establishing a classifier. All classifiers were applied to this set of samples, and a representative set of results from the discriminant analysis algorithm applied to the set of samples is presented in Figures 51 and 52. When normalized values of two microRNA ratios (hsa-miR-125b-5p:hsa-miR-138-5p; and hsa-miR-146b-5p:hsa-miR-342-3p) were used as the features for the classification, the sensitivity of the classifier was 84.7% and the specificity, 80.8%. When the normalized values of two microRNAs (hsa-miR-222-3p and hsa-miR-551b-3p) were used as the features for the classification, the sensitivity was 85.2% and the specificity, 53.6%.
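Sensitivity and specificity as reported above follow from the counts of correct and incorrect calls in each class. The sketch below shows the standard definitions, with malignant taken as the positive class; the label lists are hypothetical toy data and do not reproduce the reported percentages.

```python
def sensitivity_specificity(truth, predicted, positive="malignant"):
    """Return (sensitivity, specificity) for paired truth/prediction labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical histology labels and classifier calls for nine samples.
truth = ["malignant"] * 5 + ["benign"] * 4
predicted = ["malignant"] * 4 + ["benign"] + ["benign"] * 3 + ["malignant"]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
```

The two feature sets compared in this Example have similar sensitivity but very different specificity, which is why both numbers are reported side by side.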
Example 11: Elimination of Samples through the Expression of Cell Specific Markers
One important consideration throughout this study was the accuracy of the result that is to be provided to a patient who has had an FNA sample collected. Laboratories tend to err on the side of caution in order not to provide false-negative results. On the other hand, in the analysis of FNA specimens, a suspicious diagnosis will send the patient to surgery, which in more than 25% of the cases turns out to be unnecessary. For example, at least one report in the literature described that thyroid tumor samples with large amounts of blood, or even pure blood, are misdiagnosed as suspicious in 7 out of 9 cases (Walsh et al. (2012) J Clin Endocrin Metab. doi:10.1210/jc.2012-1923).
With this goal in mind, the present inventors searched for microRNAs that could be used as cell type markers and aid in the screening of the quality of the specimen examined. Thus, the expression of hsa-miR-486-5p (SEQ ID NO.22) and hsa-miR-200c-3p (SEQ ID NO.23-24) was evaluated in the training cohort, including cell blocks, having samples from benign and malignant (non-medullary) thyroid lesions, as well as four samples of blood only (slides of blood smears were generated for this purpose, and RNA extracted as described herein). Figure 53 shows the result of this experiment. The blood microRNA marker, hsa-miR-486-5p, is very high and the epithelial marker, hsa-miR-200c-3p, is very low, compared to the threshold established in the training set. The blood smear samples were therefore filtered out using these markers. This expression pattern indicates that these samples do not have enough epithelial cells (for lack of the epithelial cell marker) to continue the test. In a test situation, these four samples of blood smears would be disqualified and discarded. Expression of hsa-miR-138-5p (SEQ ID NO.19-21) has also been shown to be low, compared to the threshold, in blood smears (data not shown). Samples with this profile are eligible to be disqualified and/or discarded from the protocol for classification of thyroid lesion samples.
The inventors had previously established that expression of hsa-miR-342-3p (SEQ ID NO.17-18) correlates with white blood cells (data not shown). Hence, high expression of hsa-miR-342-3p compared to the threshold indicates a lack of sufficient thyroid cells, and samples with this profile are eligible to be disqualified and/or discarded from the protocol for classification of thyroid lesion samples.
In parallel, high expression of hsa-miR-200c-3p is an indicator of the presence of epithelial cells in general, and specifically thyroid cells (data not shown and Figure 53). Hence, the expression of hsa-miR-200c-3p above a threshold may be used as an indicator of sufficiency of thyroid cells in the sample.
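The sample-elimination logic of this Example can be sketched as a simple quality-control gate. The sketch follows one possible reading of the text: a sample is rejected when the blood marker is high while the epithelial marker is low, or when the white blood cell marker is high. All threshold values, the function name and the expression values are hypothetical; the actual thresholds were established in the training set and are not given here.

```python
# Hypothetical thresholds on normalized expression values (not from the source).
QC_THRESHOLDS = {
    "hsa-miR-486-5p": 25.0,   # blood marker
    "hsa-miR-200c-3p": 15.0,  # epithelial marker
    "hsa-miR-342-3p": 22.0,   # white blood cell marker
}

def sample_qc(expr, thresholds=QC_THRESHOLDS):
    """Return (passes, reasons) for one sample's cell-type marker levels."""
    reasons = []
    blood_high = expr["hsa-miR-486-5p"] > thresholds["hsa-miR-486-5p"]
    epithelial_low = expr["hsa-miR-200c-3p"] < thresholds["hsa-miR-200c-3p"]
    if blood_high and epithelial_low:
        reasons.append("blood marker high and epithelial marker low")
    if expr["hsa-miR-342-3p"] > thresholds["hsa-miR-342-3p"]:
        reasons.append("white blood cell marker high")
    return (not reasons), reasons

# Hypothetical marker levels for a blood-only smear and a thyroid smear.
blood_smear = {"hsa-miR-486-5p": 30.0, "hsa-miR-200c-3p": 8.0, "hsa-miR-342-3p": 18.0}
thyroid_smear = {"hsa-miR-486-5p": 20.0, "hsa-miR-200c-3p": 19.0, "hsa-miR-342-3p": 17.0}

print(sample_qc(blood_smear))
print(sample_qc(thyroid_smear))
```

Only samples that pass such a gate would go on to the malignant/benign classifier; rejected samples are reported as insufficient rather than classified.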
Example 12: Classification of Thyroid Tumor Sub-types
Classification of benign thyroid tumor sub-types was done using samples from Hashimoto (n=6) and follicular adenoma (FA; n=81), from the Training cohort. The results are presented in Figure 54. Expression of hsa-miR-342-3p (SEQ ID NO.17-18) and hsa-miR-31-5p in Hashimoto samples was high compared to the threshold established in the training set. Thus, high expression of hsa-miR-342-3p alone or in combination with hsa-miR-31-5p may be used for the classification of samples as benign, and further sub-typing as Hashimoto.
Further, the inventors also tested microRNA ratios for sub-typing benign thyroid tumors. In this context, the miR ratio of hsa-miR-125b-5p:hsa-miR-200c-3p was significant for classifying follicular adenoma (FA) versus Hashimoto samples (data not shown).
Classification of malignant thyroid tumor sub-types was done using a subset of samples (n=177) of the Training cohort. Figure 55 provides one example of an analysis, in which 146b-5p, 222-3p, 31-5p, 125b-5p, 551b-3p and 375 were found to be highly expressed in papillary carcinoma, while MID-16582 was found to be highly expressed in follicular carcinoma.
The ratios of the following miR pairs were significant for classifying Papillary Carcinoma (PC) versus Follicular Carcinoma samples: hsa-miR-146b-5p:hsa-miR-342-3p, hsa-miR-125b-5p:hsa-miR-200c-3p, hsa-miR-222-3p:hsa-miR-486-5p, hsa-miR-31-5p:hsa-miR-342-3p, MID-16582:hsa-miR-200c-3p, MID-16582:hsa-miR-138-5p (data not shown).
Therefore, the inventors have demonstrated that malignant thyroid tumor sub-typing may be performed using miR ratios, particularly miR ratios where the denominator is a cell marker microRNA, such as hsa-miR-486-5p, hsa-miR-200c-3p, hsa-miR-138-5p, and hsa-miR-342-3p.
Example 13: Protocol for the classification of thyroid nodules as malignant or benign
Figure 56 presents a flowchart with the protocol for thyroid nodule sample analysis, from collection of FNA samples to laboratory analysis and diagnosis. FNA samples are collected from patients having thyroid nodules, and are routinely processed. Smears are prepared from the FNA samples. As a first step, a specialist in cytopathology examines the FNA sample and provides an analysis. In cases where the analysis is inconclusive, particularly in samples classified as Bethesda III, IV, or V, i.e. so-called "indeterminate", the sample is sent to Rosetta Genomics' laboratories to undergo microRNA profiling and conclusive diagnosis. Total RNA is extracted from the sample, which undergoes microRNA profiling. MicroRNA profiling may be performed by amplification (RT-PCR or NGS) or hybridization (microarray), as shown in the Examples above.
The protocol may include any one of the following:
One or more algorithms may be used during classification, and will be applied on data comprising single microRNAs expression, microRNA ratios, or a combination thereof.
Samples wherein the hsa-miR-375 expression level is above a specific threshold may be determined as malignant (e.g. a threshold of at least 10, or a threshold of at least 18), as demonstrated for example in Figures 4 (expression analyzed by array) and 20 (expression analyzed by PCR). The threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs. The threshold may also be a function of the target sensitivity and specificity.
Samples wherein the hsa-miR-146b-5p expression level is above a specific threshold will be determined as malignant (e.g. a threshold of at least 16), as demonstrated for example in Figures 21, 35 and 49. The threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs. The threshold may also be a function of the target sensitivity and specificity.
Samples wherein the ratio hsa-miR-146b-5p:hsa-miR-342-3p, further to normalization, is above a specific threshold will be determined as malignant (e.g. a threshold of at least 16), as demonstrated for example in Figures 22, 36 and 50. The threshold is dependent on the normalization of the samples, as well as on the methodology used for measuring the microRNAs.
The level of expression of the normalizers may be used as an indicator for discarding samples, due to insufficient tumor-derived material. Thus, samples presenting low levels of any of the normalizers, or a low minimal, median or maximal value of expression for the normalizers, may be discarded. For example, samples with low levels of hsa-miR-23a-3p (compared to the overall levels of hsa-miR-23a-3p expression in the cohort) are likely to be misclassified. Conversely, high levels of hsa-miR-23a-3p improve the classification by improving sensitivity and specificity (data not shown).
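The single-marker threshold rules listed above can be sketched as a small rule table applied before, or alongside, the classifier. The cut-off values are the example thresholds quoted in the text (18 for hsa-miR-375 measured by PCR, 16 for hsa-miR-146b-5p and for the normalized hsa-miR-146b-5p:hsa-miR-342-3p ratio); as the text notes, real thresholds depend on normalization and measurement method. The function name and the sample values are hypothetical.

```python
# Example cut-offs quoted in the text; actual values depend on normalization,
# measurement platform and the target sensitivity and specificity.
THRESHOLDS = {
    "hsa-miR-375": 18.0,
    "hsa-miR-146b-5p": 16.0,
    "hsa-miR-146b-5p:hsa-miR-342-3p": 16.0,
}

def rule_based_call(features, thresholds=THRESHOLDS):
    """Return ('malignant', fired_rules) if any rule fires, else (None, [])."""
    fired = [
        name for name, cutoff in thresholds.items()
        if name in features and features[name] > cutoff
    ]
    return ("malignant" if fired else None), fired

# Hypothetical normalized values for one sample.
sample = {
    "hsa-miR-375": 12.3,
    "hsa-miR-146b-5p": 17.1,
    "hsa-miR-146b-5p:hsa-miR-342-3p": 15.0,
}

call, fired = rule_based_call(sample)
print(call, fired)
```

A result of `None` means that no rule fired and the sample would be left to the classifier algorithm; it is not a benign call.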
Analysis of the microRNA profiling data leads to a diagnosis of the thyroid nodule as benign or malignant. Where the results permit, for example where they include the expression of microRNAs that may be associated with thyroid tumor sub-types, as shown in Figures 54 and 55, the sample is further classified according to its thyroid tumor subtype.
The foregoing description of the specific embodiments so fully reveals the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without undue experimentation and without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

Claims

1. A method for classifying a thyroid lesion sample, the method comprising the steps of: a. obtaining a thyroid lesion sample from a subject in need thereof;
b. measuring the expression level of at least four nucleic acids in the sample, said nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
c. determining a nucleic acid expression profile;
d. applying a classifier algorithm to the nucleic acid expression profile;
e. classifying said thyroid lesion as benign, malignant or of a sub-type of benign or malignant tumor based on the result from the algorithm applied to the nucleic acid expression profile of said sample.
2. The method of claim 1, wherein following step (b) or (c) further comprising a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (d) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
3. The method of any one of claims 1 or 2, wherein said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-37, variants thereof or a sequence having at least about 80% identity thereto.
4. The method of claim 3, wherein said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-25, variants thereof or a sequence having at least about 80% identity thereto.
5. The method of any one of claims 1-4, wherein said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy.
6. The method of claim 5, wherein said sample is a smear from a FNA biopsy.
7. The method of any one of claims 1-4, wherein said thyroid lesion is a nodule of less than 1 cm.
8. The method of any one of claims 1-4, wherein said algorithm is a machine-learning algorithm.
9. The method of claim 1, wherein following step (b) if at least one of said nucleic acid expression level is below or above a threshold for thyroid cells, said sample is discarded based on the expression level of said nucleic acid.
10. The method of any one of claims 1-4, wherein said sample has less than 50 thyroid cells.
11. The method of claim 8, wherein said algorithm further combines the nucleic acid expression profile with clinical or genetic data from said sample.
12. The method of any one of claims 1-4, wherein the measuring is performed by hybridization, amplification or next generation sequencing method.
13. The method of claim 12, wherein hybridization comprises contacting the sample with probes, wherein the probes comprise (i) DNA equivalents of the microRNAs, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii) or (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-25.
14. The method of claim 13, wherein the probes are attached to a solid substrate.
15. The method of claim 12, wherein amplification is real-time polymerase chain reaction (RT-PCR), said RT-PCR amplification method comprising forward and reverse primers, and optionally further comprising hybridization with a probe.
16. The method of any one of claims 1-4, further comprising the step of administering a differential treatment to said subject if said thyroid lesion is benign or malignant.
17. The method of claim 16, wherein said lesion is malignant and said treatment is any one of surgery, chemotherapy, radiotherapy, hormone therapy, or any other recommended treatment.
18. A protocol for classifying a thyroid lesion sample, said protocol comprising the steps of:
a. obtaining a thyroid lesion sample from a subject in need thereof;
b. measuring the level of at least four nucleic acid in the sample, said nucleic acid comprising a sequence of SEQ ID NOS: 1-308, variants thereof or a sequence having at least about 80% identity thereto;
c. determining the expression of nucleic acids in said sample that associate with specific cell types;
d. wherein (i) the expression level of at least one nucleic acid that is a non-thyroid cell marker above a threshold determines that the sample is discarded; or (ii) expression levels of non-thyroid cell markers below a threshold determines that the sample proceeds to step (e) for further analysis;
e. if the sample is not discarded in step (d), determining a nucleic acid expression profile;
f. applying a classifier algorithm to the microRNA expression profile;
g. classifying said thyroid lesion as benign, malignant or of a sub-type of benign or malignant tumor based on the result of the algorithm applied to the nucleic acid expression profile of said sample.
19. The protocol of claim 18, wherein following step (b) further comprising a step of obtaining the ratio between the expression levels of at least one pair of nucleic acids; and wherein in step (f) said classifier algorithm may be applied to any one of the nucleic acid expression profile, said ratio of at least one pair of nucleic acids, or to a combination thereof.
20. The protocol of any one of claims 18 or 19, wherein said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-37, variants thereof or a sequence having at least about 80% identity thereto.
21. The protocol of claim 20, wherein said nucleic acid sequence comprises a sequence of any one of SEQ ID NOs.1-25, variants thereof or a sequence having at least about 80% identity thereto.
22. The protocol of any one of claims 18-21, wherein said thyroid lesion sample is obtained by fine needle aspiration (FNA) biopsy.
23. The protocol of claim 22, wherein said sample is a smear from a FNA biopsy.
24. The protocol of any one of claims 18-21, wherein said thyroid lesion is a nodule of less than 1 cm.
25. The protocol of any one of claims 18-21, wherein said sample has less than 50 thyroid cells.
26. The protocol of any one of claims 18-21, wherein said algorithm is a machine-learning algorithm.
27. The protocol of any one of claims 18-21, wherein the measuring is performed by hybridization, amplification or next generation sequencing method.
28. The protocol of claim 27, wherein hybridization comprises contacting the sample with probes, wherein the probes comprise (i) DNA equivalents of the microRNAs, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii) or (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-25.
29. The protocol of claim 28, wherein the probes are attached to a solid substrate.
30. The protocol of claim 27, wherein amplification is real-time polymerase chain reaction (RT-PCR), said RT-PCR amplification method comprising forward and reverse primers, and optionally further comprising hybridization with a probe.
31. The protocol of any one of claims 18-21, further comprising the step of administering a differential treatment to said subject if said thyroid lesion is benign or malignant.
32. The protocol of claim 31, wherein said lesion is malignant and said treatment is any one of surgery, chemotherapy, radiotherapy, hormone therapy, or any other recommended treatment.
33. A kit for thyroid tumor classification, said kit comprising:
(a) probes for performing thyroid tumor classification, wherein said probes comprise any one of (i) DNA equivalents of microRNAs comprising at least one of SEQ ID NOs 1-308, (ii) the complements thereof, (iii) sequences at least 80% identical to (i) or (ii), (iv) a nucleic acid sequence that hybridizes with at least eight contiguous nucleotides of any one of SEQ ID NOs 1-182, or (v) a nucleic acid sequence that hybridizes with RT-PCR products; and optionally (b) an instruction manual for using said probes.
34. The kit of claim 33, further comprising forward and reverse PCR primers.
35. An isolated nucleic acid, said nucleic acid comprising at least 12 contiguous nucleotides at least 80% identical to the sequence of any one of SEQ ID NOs. 27-29, 33, 34, 139, 140, 307 and 308.
36. A pharmaceutical composition comprising as active agent the isolated nucleic acid of claim 31, and optionally adjuvants, carriers, diluents and excipients.
37. A vector comprising the isolated nucleic acid of claim 35.
38. A probe comprising the isolated nucleic acid of claim 35.
39. A biochip comprising the isolated nucleic acid of claim 35.
40. Use of an isolated nucleic acid of claim 35 in the preparation of a medicament.
PCT/US2015/030564 2014-05-13 2015-05-13 Mirna expression signature in the classification of thyroid tumors WO2015175660A1 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
JP2016567582A JP6216470B2 (en) 2014-05-13 2015-05-13 MiRNA expression signatures in the classification of thyroid tumors
BR112016026575A BR112016026575A2 (en) 2014-05-13 2015-05-13 Mirna Expression Signature in Thyroid Tumor Classification
CN201580024961.9A CN106460053A (en) 2014-05-13 2015-05-13 MIRNA expression signature in classification of thyroid tumors
EP15792258.4A EP3143162A4 (en) 2014-05-13 2015-05-13 Mirna expression signature in the classification of thyroid tumors
CA2945531A CA2945531C (en) 2014-05-13 2015-05-13 Mirna expression signature in the classification of thyroid tumors
US15/237,364 US9708667B2 (en) 2014-05-13 2016-08-15 MiRNA expression signature in the classification of thyroid tumors
IL248639A IL248639A0 (en) 2014-05-13 2016-10-31 Mirna expression signature in the classification of thyroid tumors
US15/625,645 US20170356055A1 (en) 2014-05-13 2017-06-16 Mirna expression signature in the classification of thyroid tumors
US16/192,221 US20190300963A1 (en) 2014-05-13 2018-11-15 Mirna expression signature in the classification of thyroid tumors

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201461992756P 2014-05-13 2014-05-13
US201461992531P 2014-05-13 2014-05-13
US61/992,531 2014-05-13
US61/992,756 2014-05-13
US201462069353P 2014-10-28 2014-10-28
US62/069,353 2014-10-28
US201562139066P 2015-03-27 2015-03-27
US62/139,066 2015-03-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/237,364 Continuation-In-Part US9708667B2 (en) 2014-05-13 2016-08-15 MiRNA expression signature in the classification of thyroid tumors

Publications (1)

Publication Number Publication Date
WO2015175660A1 true WO2015175660A1 (en) 2015-11-19

Family

ID=54480610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/030564 WO2015175660A1 (en) 2014-05-13 2015-05-13 Mirna expression signature in the classification of thyroid tumors

Country Status (7)

Country Link
EP (1) EP3143162A4 (en)
JP (1) JP6216470B2 (en)
CN (1) CN106460053A (en)
BR (1) BR112016026575A2 (en)
CA (1) CA2945531C (en)
IL (1) IL248639A0 (en)
WO (1) WO2015175660A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106434872A (en) * 2016-08-11 2017-02-22 河南大学 MiRNA molecule marker hsa-miR-152-3p for diagnosing type 2 diabetes, and application thereof
WO2019161472A1 (en) * 2018-02-23 2019-08-29 Onkos Diagnósticos Moleculares Ltda Me Method and kit for the classification of thyroid nodules
CN110499367A (en) * 2019-08-09 2019-11-26 深圳市第二人民医院 Biomarker and its application

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107385093A (en) * 2017-09-07 2017-11-24 青岛大学 Primer composition and its application and application its product and product application method
CN107723365B (en) * 2017-09-11 2019-10-01 朱伟 One kind blood plasma miRNA marker relevant to lung squamous cancer auxiliary diagnosis and its application
CN108038352B (en) * 2017-12-15 2021-09-14 西安电子科技大学 Method for mining whole genome key genes by combining differential analysis and association rules
WO2019136253A1 (en) * 2018-01-05 2019-07-11 Visiongate, Inc. Morphometric genotyping of cells using optical tomography for detecting tumor mutational burden
CN108721318B (en) * 2018-05-16 2021-06-25 广东药科大学 Application of miR-125b and chemotherapeutic agent in preparation of medicine for treating thyroid cancer
WO2020032228A1 (en) * 2018-08-10 2020-02-13 東レ株式会社 Kit, device and method for detecting prostate cancer
CN109700824A (en) * 2019-01-08 2019-05-03 上海长海医院 Application of the miR-31 and the like in the drug of preparation prevention or treatment blood vessel endothelium injury
CN113025714B (en) * 2021-03-23 2022-05-24 华中科技大学同济医学院附属同济医院 miRNA biomarker for papillary thyroid carcinoma lateral cervical lymph node metastasis diagnosis and detection kit
CN116769922B (en) * 2023-08-24 2023-11-24 四川大学华西医院 Application of reagent for detecting circulating sEV RNA, kit and diagnostic system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011154008A1 (en) * 2010-06-11 2011-12-15 Rigshospitalet Microrna classification of thyroid follicular neoplasia
WO2012129378A1 (en) * 2011-03-22 2012-09-27 Keutgen Xavier M Distinguishing benign and malignant indeterminate thyroid lesions
WO2013063544A1 (en) * 2011-10-27 2013-05-02 Asuragen, Inc. Mirnas as diagnostic biomarkers to distinguish benign from malignant thyroid tumors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KEUTGEN , XAVIER M. ET AL.: "A panel of four miRNAs accurately differentiates malignant from benign indeterminate thyroid lesions on fine needle aspiration.", CLINICAL CANCER RESEARCH, vol. 18.7, 2012, pages 2032 - 2038, XP055125465, Retrieved from the Internet <URL:http://clincancerres.aacrjournals.org/content/18/7/2032.full> [retrieved on 20120401] *
See also references of EP3143162A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106434872A (en) * 2016-08-11 2017-02-22 河南大学 MiRNA molecule marker hsa-miR-152-3p for diagnosing type 2 diabetes, and application thereof
WO2019161472A1 (en) * 2018-02-23 2019-08-29 Onkos Diagnósticos Moleculares Ltda Me Method and kit for the classification of thyroid nodules
CN110499367A (en) * 2019-08-09 2019-11-26 深圳市第二人民医院 Biomarker and its application

Also Published As

Publication number Publication date
JP6216470B2 (en) 2017-10-18
CN106460053A (en) 2017-02-22
EP3143162A1 (en) 2017-03-22
EP3143162A4 (en) 2017-04-26
CA2945531A1 (en) 2015-11-19
CA2945531C (en) 2018-01-30
BR112016026575A2 (en) 2017-12-12
IL248639A0 (en) 2017-01-31
JP2017521051A (en) 2017-08-03

Similar Documents

Publication Publication Date Title
US9708667B2 (en) MiRNA expression signature in the classification of thyroid tumors
CA2945531C (en) Mirna expression signature in the classification of thyroid tumors
Wu et al. Next-generation sequencing of microRNAs for breast cancer detection
KR101900872B1 (en) Plasma MicroRNAs for the Detection of Early Colorectal Cancer
US20180105888A1 (en) Methods and Kits for Detecting Subjects at Risk of Having Cancer
AU2012265177B2 (en) Methods and devices for prognosis of cancer relapse
Li et al. Identification of aberrantly expressed miRNAs in rectal cancer
US10457994B2 (en) 4-miRNA signature for predicting clear cell renal cell carcinoma metastasis and prognosis
AU2018202963B2 (en) Biomarkers useful for detection of types, grades and stages of human breast cancer
JP5750710B2 (en) Cancer marker, cancer evaluation method and evaluation reagent using the same
WO2011154008A1 (en) Microrna classification of thyroid follicular neoplasia
JP2022536502A (en) Compositions and methods for treating cancer
WO2017079571A1 (en) Process for the identification of patients at risk for OSCC
KR20130098669A (en) Serum mirna as a marker for the diagnosis of lymph node metastasis of gastric cancer
Rghebi Circulating nucleic acids as biomarkers of breast cancer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15792258; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2945531; Country of ref document: CA)
WWE Wipo information: entry into national phase (Ref document number: 248639; Country of ref document: IL)
ENP Entry into the national phase (Ref document number: 2016567582; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112016026575; Country of ref document: BR)
REEP Request for entry into the european phase (Ref document number: 2015792258; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2015792258; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 112016026575; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20161111)