WO2023147445A2

WO2023147445A2 - Cell-free rna biomarkers for the detection of cancer or predisposition to cancer

Info

Publication number: WO2023147445A2
Application number: PCT/US2023/061410
Authority: WO
Inventors: Thuy NGO; Hyun Ji Kim; Josiah WAGNER; Breeshey ROSKAMS HIETER; Pavana ANUR
Original assignee: Oregon Health & Science University
Priority date: 2022-01-27
Filing date: 2023-01-26
Publication date: 2023-08-03
Also published as: WO2023147445A3

Abstract

Methods of detecting or treating cancer or predisposition to cancer are provided, the methods including analyzing a level of one or more cell-free RNA (cfRNA) biomarkers selected from AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample; and performing a differential expression analysis comparing the level of each of the one or more cfRNA biomarkers to a corresponding control value (CV); in which differential expression shown by the differential expression analysis between the one or more cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.

Description

CELL-FREE RNA BIOMARKERS FOR THE DETECTION OF CANCER OR PREDISPOSITION TO CANCER

Copyright Notice

[0001] © 2023 Oregon Health & Science University. A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. 37 CFR § 1.71(d).

Cross-Reference to Related Applications

[0002] This application claims the benefit of U.S. Provisional Application No. 63/303,970, filed January 27, 2022, and of U.S. Provisional Application No. 63/426,258 filed November 17, 2022, which are both incorporated by reference in their entirety.

Technical Field

[0003] This disclosure relates generally to the field of biotechnology and in particular to utilizing measurement of cell-free RNA (cfRNA) profiles as biomarkers to diagnose cancer and related products and uses thereof.

Background

[0004] Although recent advances in cancer research offer new methods to treat cancer, the early detection of malignancy still confers the highest chance of improving long-term patient survival. Currently, only 2.4% of metastatic liver cancer patients survive for more than 5 years [1]. Early detection of liver cancer, which has the most rapidly increasing incidence in the United States, would extend 5-year survival rates to 33% with current treatment options. Even with a hematologic malignancy like multiple myeloma (MM), 95% of patients are diagnosed when the cancer has already spread systemically, resulting in at least a 20% decrease in 5-year survival rates compared to detection at earlier stages [2]. Noninvasive, low-cost and reliable cancer diagnostic assays could greatly benefit patients by facilitating accessibility to early cancer screening.

[0005] In many cancers, there are disease states known to be precursors of malignant disease. For example, MM, a cancer of antibody-producing plasma cells, is often preceded by monoclonal gammopathy of undetermined significance (MGUS), which is characterized by lower levels of abnormal antibodies. The prevalence of MGUS is about 3% in the Caucasian population, and the conversion rate from MGUS to multiple myeloma is approximately 1% per year [3, 4]. Hepatocellular carcinoma (HCC), the most common form of liver cancer, is often preceded by liver cirrhosis (Cirr), which is characterized by irreversible fibrosis of the liver. The prevalence of cirrhosis is between 4.5% and 9.5% of the global population [5-7]. The risk of developing de novo HCC in patients with liver cirrhosis ranges from 1% to 5% per year, depending on the etiology of the cirrhosis [5-11]. Most early cancer detection studies to date have focused on distinguishing cancer from healthy controls, rather than discriminating between cancer and common premalignant conditions. Therefore, there is an unmet clinical need for a simple blood test that can identify patients with premalignant conditions who require further intervention due to a higher likelihood of cancer being present.

[0006] With current clinical practices, cancer diagnosis is primarily initiated based upon costly imaging studies or invasive screening procedures. Alternatively, some cancers may only come to attention with clinical symptoms that present at more advanced stages. Liquid biopsy, a minimally invasive method for sampling and analyzing biomarkers in various body fluids, has the potential to improve cancer diagnosis and prognosis [12-15]. Several blood-based analytes have been explored for use in liquid biopsies for cancer detection, such as circulating cells (Circulating Tumor Cells (CTCs), Circulating Hybrid Cells (CHCs), Tumor Associated Macrophages (TAMs)) [16-21], circulating tumor DNA (ctDNA) [22-24], platelets [25-27] and protein panels [28]. However, ctDNA and circulating cells are present at low levels, have varied characteristics between patients, and only weakly correlate with phenotypic changes in cancer [17, 29, 30]. Epigenetic features of ctDNA such as DNA methylation and 5-hydroxymethylcytosine signatures, or ctDNA protected patterns, may provide information about the tissue of origin for pan-cancer detection [31-38]. However, these methods may require a large sequencing coverage to be effective and may have inadequate sensitivity and specificity. Recent transcriptome analysis of tumor-educated platelets has shown promise for pan-cancer detection [25-27], but platelets are fragile, can be easily activated in vitro, and have highly variable characteristics depending on their preparation, which makes them challenging to utilize with existing clinical blood tests [39]. There is thus a need for robust liquid biopsy technology that can overcome these challenges in a safe, reliable and cost-effective manner.

[0007] Circulating cell-free RNA (cfRNA) in blood is released from cells by active secretion or through apoptosis and necrosis [40, 41]. Plasma cfRNA has the potential to reflect the systemic response to growing tumors and provide information about the tissue of tumor origin specifically by cancer type. Previous work has demonstrated that global cfRNA profiles indicate temporal changes of organ-specific transcripts. Further analysis of these transcripts facilitated the prediction of pregnancy delivery, preterm birth, and distinction of cancer from healthy controls [42-46]. Thus, an ideal method for distinguishing cancers and their premalignant conditions would include measuring the level of cfRNA profiles in a sample from a subject.

Summary of the Invention

[0008] Provided herein are methods including analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in a biological sample. A differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV). In some examples, the disclosed materials and methods are useful for diagnosing, in a subject, cancer or a predisposition for cancer. An exemplary method is useful as a method for detecting cancer or a predisposition for cancer utilizing a biological sample obtained from a subject. The exemplary method comprises analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample. A differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV). The differential expression shown by the differential expression analysis between the cfRNA biomarkers selected and corresponding CVs indicates cancer or a predisposition for cancer in the subject.

[0009] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates blood cancer or a predisposition to blood cancer.

[0010] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM). In some examples, one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma.

[0011] In some embodiments, the one or more cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). In some examples, the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or a combination of any two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.

[0012] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer. In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cancer or a predisposition to liver cancer.

[0013] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC). In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC.

[0014] In some embodiments, the one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate cirrhosis. In some examples, the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof, wherein differential expression of one or more indicates liver cirrhosis.
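By way of illustration only, the biomarker subsets described above for multiple myeloma, MGUS, HCC and cirrhosis can be pictured as a lookup from condition to gene set. The following Python sketch is not part of the disclosed methods; the names `PANELS` and `conditions_indicated` are hypothetical, and the gene symbols are written in their standard forms (for example CA1, ATP1B1 and C9orf16). It assumes the caller has already determined which genes are differentially expressed.

```python
# Illustrative lookup of the biomarker subsets named in the Summary.
# PANELS and conditions_indicated are hypothetical names for this sketch.
PANELS = {
    "multiple myeloma": {
        "AIDA", "CA1", "CENPE", "CPOX", "ELL2",
        "EPB42", "HBG1", "HBG2", "NEK2", "NUSAP1",
    },
    "MGUS": {
        "FPR3", "SMC4", "TXNDC16", "ASPM", "WRN",
        "ZRANB2-AS2", "BMX", "CDC42BPA", "KNL1", "CACNA1A",
    },
    "hepatocellular carcinoma": {
        "APOE", "C3", "CP", "DHCR24", "FGA",
        "FGB", "FGG", "HRG", "IFITM3", "ATP1B1",
    },
    "cirrhosis": {
        "ABCB7", "HIST1H2BF", "PSIP1", "TMEM150C", "ZC3H6",
        "C9orf16", "CPQ", "DYNC1I2", "ECM1", "HIST1H2AH",
    },
}


def conditions_indicated(differential_genes):
    """Group differentially expressed genes by the condition subset they belong to.

    Genes that are not in any subset are ignored; conditions with no
    matching gene are omitted from the result.
    """
    observed = set(differential_genes)
    hits = {}
    for condition, genes in PANELS.items():
        overlap = sorted(genes & observed)
        if overlap:
            hits[condition] = overlap
    return hits
```

In this sketch the blood cancer grouping corresponds to the union of the multiple myeloma and MGUS subsets, and the liver cancer grouping to the union of the HCC and cirrhosis subsets.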

[0015] Additional aspects and advantages will be apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Brief Description of the Drawings

[0016] Figs. 1A and 1B show PCA analyses using the top 500 genes with the largest variance across, respectively, (a) non-cancer and multiple myeloma samples and (b) non-cancer and liver cancer samples; Fig. 1C shows Linear Discriminant Analysis (LDA) using DE genes with padj < 0.01 and the top 10 most important genes identified by LVQ analysis. P-value was derived from the Wilcoxon test.

[0017] Figs. 2A and 2B show ROC curves of, respectively, LDA and random forest (RF) classification models with two feature sets, DE and LVQ; Fig. 2C shows a LOOCV with the two models, LDA and RF, with the two feature sets, DE and LVQ.

[0018] Fig. 3 shows cfRNA biomarkers and classification models validated in an independent sample cohort, in which cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors. Fig. 3 includes box plots of the representative top 10 most significant genes resulting from the LVQ analysis for MM versus NC, and an LDA plot using 10 genes from pairwise analysis across NC - MGUS and NC - MM pairs using the LVQ method. P-value was calculated for each pair by the t-test. Fig. 3 also shows a LOOCV using 2 models (LDA and RF) with the top 10 LVQ genes to discriminate MGUS vs NC, MM vs MGUS, and the three groups NC, MGUS and MM.

[0019] Fig. 4 is a correlation plot analysis showing that qRT-PCR of cfRNA biomarkers was concordant with RNA-sequencing data. As shown in Fig. 4, the qRT-PCR data for the cfRNA biomarkers are concordant with the RNA-sequencing data according to a comparison of the qRT-PCR data with the RNA-sequencing data. P-value was calculated by t-test.

[0020] Fig. 5 provides box plots showing qRT-PCR Ct values of top 4 LVQ genes identified from MM versus NC and top 5 LVQ genes identified from HCC versus NC.

[0021] Fig. 6 and Fig. 7 provide box plots showing that cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors; the box plots represent the top 10 most significant genes resulting from learning vector quantization (LVQ) analysis for multiple myeloma versus non-cancer.

[0022] Fig. 8 is an LDA plot using 10 genes from pairwise analysis across non-cancer - MGUS and non-cancer - multiple myeloma samples using the learning vector quantization method; Fig. 8 also shows a LOOCV using 2 models (LDA and RF) with the top 10 LVQ genes to discriminate MGUS vs non-cancer, multiple myeloma vs MGUS, and three groups: non-cancer, MGUS and multiple myeloma.

[0023] Fig. 9 and Fig. 10 provide box plots representative of the top 10 most significant genes from the LVQ analysis for HCC vs. NC. P-value was calculated for each pair by the t-test.

[0024] Fig. 11 is an LDA plot using the top 10 genes identified from each pairwise analysis between NC - Cirr and NC - HCC samples using the LVQ method.

[0025] Fig. 12 and Fig. 13 show volcano plots of false discovery rate (FDR) against fold change for all genes in pairwise comparisons between non-cancer (NC) donors and multiple myeloma (MM) or liver cancer (HCC), analyzed by DESeq2. Also shown are histograms of the number of significant genes differentiating the two groups after random permutation of samples across non-cancer donors and multiple myeloma or liver cancer. Differential expression analysis was performed using DESeq2 with the Wald test and an adjusted p-value cutoff of 0.01.

[0026] Fig. 14 and Fig. 15 illustrate cfRNA biomarkers showing stage-dependent discrimination in pilot and validation sample sets. Fig. 14 shows that Linear Discriminant Analysis using the top 10 LVQ genes, with a model trained in the pilot cohort, yields significant discrimination and classification by stage in both HCC and MM. Fig. 15 shows that, when the independent validation cohort was classified with these same models, stage-dependent classification was seen for both HCC and MM. P-value for each pair was calculated by the Wilcoxon rank sum test.

[0027] Fig. 16 and Fig. 17 show box and whisker plots illustrating how cfRNA biomarkers for HCC show discrimination between various etiologies. As shown in Figs. 16 and 17, a Linear Discriminant Analysis trained on the pilot cohort with the top 10 LVQ genes showed significant discrimination between NC and HCC on the background of NASH, HCV+ and other etiologies in the pilot cohort and the validation cohort. P-value for each pair was calculated by the Wilcoxon rank sum test.

Detailed Description of the Invention

[0028] As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.

[0029] The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is analyzed, measured or determined, i.e., the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term "about" meaning within an acceptable error range for the particular value should be assumed.
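As a non-limiting illustration of two of the readings of "about" given above, the following Python sketch checks whether a value falls within a percentage range or within a fold range of a reference value. The function names `within_percent` and `within_fold` are hypothetical and are introduced here only for illustration; the standard-deviation reading is not covered because it depends on the measurement system.

```python
def within_percent(value, reference, percent):
    """Return True if value differs from reference by at most `percent` percent."""
    return abs(value - reference) <= abs(reference) * percent / 100.0


def within_fold(value, reference, fold):
    """Return True if value lies between reference/fold and reference*fold.

    Fold ranges are only meaningful for positive quantities, so
    non-positive inputs are rejected.
    """
    if value <= 0 or reference <= 0 or fold < 1:
        raise ValueError("value and reference must be positive and fold >= 1")
    return reference / fold <= value <= reference * fold
```

Under these assumptions, a `percent` of 20, 10, 5 or 1 corresponds to the percentage ranges recited above, and a `fold` of 10, 5 or 2 corresponds to an order of magnitude, 5-fold and 2-fold, respectively.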

[0030] The terms "polynucleotide", "nucleotide", "nucleic acid," and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. For example, a polynucleotide may constitute a deoxyribonucleic acid (DNA) molecule or a ribonucleic acid (RNA) molecule. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, cell-free RNA (cfRNA), messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), mitochondrial RNA (mtRNA), ribozymes, complementary DNA (cDNA), recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

[0031] As used herein, “complementary DNA” or “cDNA” refers to DNA synthesized from a single-stranded template in an enzymatically catalyzed reaction. For example, an expressed cfRNA biomarker may be reverse transcribed by a reverse transcriptase to produce a cDNA template. Skilled persons will understand that creation of cDNA template libraries facilitates the characterization of expressed RNA by sequencing methods (see, for example, Nat. Rev. Genet. 2009 Jan;10(1):57-63; “RNA-Seq: a revolutionary tool for transcriptomics”).

[0032] The terms "amplify," "amplifies," "amplified," and "amplification," as used herein, generally refer to any process by which one or more copies are made of a target polynucleotide or a portion thereof. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available, some examples of which are described herein. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation.

[0033] In some of the various embodiments, some polynucleotides are "preferentially" treated, such as preferentially manipulating RNA in a sample comprising both RNA and DNA. In this context, "preferentially" refers to treatment that affects a greater proportion of the polynucleotide of the indicated type. In some embodiments, preferentially treating RNA indicates that of the polynucleotides affected by the treatment, at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more of the affected polynucleotides in a reaction are RNA molecules. In some embodiments, preferentially treating RNA refers to the use of a particular treatment or reagent known in the art to have a degree of specificity for RNA over DNA. For example, reverse transcriptase is an enzyme typically used in reverse transcription reactions to transcribe RNA into cDNA, and is known to have specificity for using RNA, rather than DNA, as a template. As a further example, RNA can be preferentially treated using reagents that react with elements that are typically found in RNA and not DNA (e.g. the ribose sugar backbone, or the presence of uracil). In some embodiments, preferential treatment of RNA comprises use of enzymes that are not specific to RNA, but whose activity is preferentially directed to polynucleotides derived from RNA (e.g. cDNA) by virtue of one or more previous steps. For example, single-stranded DNA ligases may preferentially ligate oligonucleotides to cDNA in samples where cDNA is produced and rendered single-stranded in the presence of other DNA species that are predominantly double-stranded.

[0034] As used herein, “biomarker” refers to a measurable substance (e.g., protein or polynucleotide) in an organism whose presence is indicative of some phenomenon such as disease (e.g., liver cancer or blood cancer), infection, or environmental exposure. A biomarker may include a gene, a gene fragment, or any other form of polynucleotide such as cell-free RNA (cfRNA). As used herein, “gene” refers to a distinct sequence of polynucleotides forming part of a chromosome. In some embodiments, a cfRNA biomarker may include the entirety or any portion of a polynucleotide expressed as a gene product by a cell. Thus, in some embodiments, for example, selecting an AIDA gene for analysis would include analyzing the level of RNA transcript expressed from the AIDA gene.

[0035] As used herein, the terms "cell-free," "circulating," and "extracellular" as applied to polynucleotides (e.g. "cell-free DNA" and "cell-free RNA") are used interchangeably to refer to polynucleotides present in a biological sample or portion thereof that can be isolated or otherwise manipulated without applying a lysis step to intact cells in the biological sample (e.g., as in extraction from cells or viruses). Cell-free polynucleotides may be encapsulated (e.g., exosomes) or unencapsulated or "free" from the cells or viruses from which they originate, even before a sample of the subject is collected. Accordingly, cell-free polynucleotides may be isolated from a non-cellular fraction of blood (e.g. serum or plasma), from other bodily fluids (e.g. urine), or from non-cellular fractions of other types of samples. Notwithstanding, since cfRNA polynucleotide originates from within a cell, cell-free polynucleotides may be produced as a byproduct of cell death (e.g. apoptosis or necrosis), cell lysis, or cell shedding, releasing polynucleotides into surrounding body fluids or into circulation. Moreover, cell-free polynucleotides may be produced as a by-product of applying a lysis step to the biological sample. Skilled persons will understand that a lysis step may include applying detergent, heat, mechanical shearing, or any combination thereof, to lyse an intact cell or a membrane encapsulated structure. In some embodiments, a lysis step may be applied to induce release of polynucleotides from other membrane structures such as exosomes, or vesicles.

[0036] As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Exemplary sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems/Thermo Fisher Scientific, among many others.

[0037] As used herein, “next generation sequencing” or “NGS” refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.

[0038] As used herein, “subject” or “test subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject.”

[0039] As used herein, “reference sample” or “reference cfRNA sample” refers to a sample of known composition and/or having or known to have or lack specific properties (e.g., known nucleic acid variant(s), known cellular origin, known tumor fraction, known coverage, and/or the like) that is analyzed along with or compared to test samples in order to evaluate the accuracy of an analytical procedure, classify the test samples, and/or the like. A reference sample dataset typically includes from at least about 25 to at least about 30,000 or more reference samples. In some embodiments, the reference sample dataset includes about 50, 75, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,500, 5,000, 7,500, 10,000, 15,000, 20,000, 25,000, 50,000, 100,000, 1,000,000, or more reference samples.

[0040] In an exemplary embodiment, a reference sample is used as a corresponding control for each biomarker to provide a control value (CV). For example, a reference sample providing an AIDA CV corresponds to an AIDA cfRNA biomarker, a CA1 CV corresponds to a CA1 cfRNA biomarker, and so forth. In some embodiments, a CV may include a level, or range of levels, indicative of a normal subject’s cfRNA biomarker level or range of levels, whereby a differential expression analysis may be used to detect a cfRNA biomarker level or levels that differ from, or fall outside of, the level or range of levels indicated by the CV and, thus, detect cancer or a predisposition to cancer. In some cases, a cfRNA biomarker level showing a higher expression than its corresponding CV is indicative of cancer or a predisposition to cancer. In some cases, a combination of one or more cfRNA biomarker levels showing higher expression than their respective corresponding CVs is indicative of cancer or a predisposition to cancer. In some cases, a cfRNA biomarker level may be less than its corresponding CV.
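The comparison of a biomarker level to its corresponding CV can be sketched, in a deliberately simplified form, as a range check. The following Python example is illustrative only and assumes that each CV is given as a range of normal levels; the names `ControlValue`, `compare_to_cv` and `differential_biomarkers` are hypothetical. It is not a statistical differential expression analysis and makes no assumption about units or normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlValue:
    """Hypothetical CV: the range of levels taken as normal for one biomarker."""
    low: float
    high: float


def compare_to_cv(level, cv):
    """Classify a measured level as 'higher', 'lower' or 'within' relative to its CV."""
    if level > cv.high:
        return "higher"
    if level < cv.low:
        return "lower"
    return "within"


def differential_biomarkers(levels, control_values):
    """Return the biomarkers whose level falls outside the corresponding CV.

    `levels` maps biomarker name to measured level; `control_values` maps
    the same names to ControlValue ranges. Each biomarker is compared only
    to its own CV.
    """
    result = {}
    for gene, level in levels.items():
        status = compare_to_cv(level, control_values[gene])
        if status != "within":
            result[gene] = status
    return result
```

A biomarker reported as "higher" or "lower" in this sketch corresponds to a level that differs from, or falls outside of, the range indicated by its CV.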

[0041] As used herein, “panel” refers to a predetermined group of medical tests or assays used in the diagnosis and treatment of disease. As used herein, “test” or “assay” refers to a process of analyzing a substance to determine its composition or quality. A panel may be designed as a single-plex, duplex, or multiplex where the panel tests or screens for, respectively, one, two, or three or more biomarkers in a single test. For example, a blood cancer panel may include one or more cfRNA biomarkers selected from the group of: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, to indicate blood cancer or a predisposition to blood cancer. In another example, a liver cancer panel may include one or more cfRNA biomarkers selected from a group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, to indicate liver cancer or a predisposition to liver cancer.

[0042] As used herein, “predisposition” or “premalignancy” are used interchangeably and refer to a condition that may (or is likely to) become cancer. A predisposition may derive from genetic or environmental etiologies relevant to the subject and generally indicates a precancerous stage of disease. For example, monoclonal gammopathy of undetermined significance (MGUS) and cirrhosis are premalignant conditions known in the art to have a likelihood of becoming, respectively, blood cancer and liver cancer. Skilled persons will understand that a variety of staging systems exist for determining if a condition is cancerous. For example, the American Joint Committee on Cancer (633 N. St. Clair St., Chicago, IL 60611-3211) defines “Stage IA” liver cancer as a single tumor 2 cm (4/5 inch) or smaller that hasn’t grown into blood vessels. (See: cancer.org/cancer/liver-cancer/detection-diagnosis-staging/staging.html). Thus, for example, in some cases elevated levels in a subject of one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, relative to one or more of the corresponding CVs may indicate a predisposition to liver cancer if no tumor meeting Stage IA’s requirements is detected.

[0043] In an exemplary embodiment, the disclosed materials and methods relate to a method for detecting cancer or a predisposition for cancer in a biological sample obtained from a subject. In the exemplary embodiment, a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample is analyzed or measured. A differential expression analysis comparing the level of each cfRNA biomarker selected to a corresponding control value (CV) is performed. The differential expression shown by the differential expression analysis between the selected cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.

[0044] In some embodiments, axin interactor, dorsalization associated gene (AIDA) is selected (for example, analyzed or measured). In some embodiments, carbonic anhydrase 1 gene (CAI) is selected (for example, analyzed or measured). In some embodiments, centromere protein E gene (CENPE) is selected (for example, analyzed or measured). In some embodiments, coproporphyrinogen oxidase gene (CPOX) is selected (for example, analyzed or measured). In some embodiments, elongation factor for RNA Polymerase II 2 gene (ELL2) is selected (for example, analyzed or measured). In some embodiments, erythrocyte membrane protein band 4.2 gene (EPB42) is selected (for example, analyzed or measured). In some embodiments, hemoglobin subunit gamma 1 gene (HBG1) is selected (for example, analyzed or measured). In some embodiments, hemoglobin subunit gamma 2 gene (HBG2) is selected (for example, analyzed or measured).
In some embodiments, NIMA related kinase 2 gene (NEK2) is selected (for example, analyzed or measured). In some embodiments, nucleolar and spindle associated protein 1 gene (NUSAP1) is selected (for example, analyzed or measured). In some embodiments, apolipoprotein E gene (APOE) is selected (for example, analyzed or measured). In some embodiments, complement component C3 gene (C3) is selected (for example, analyzed or measured). In some embodiments, ceruloplasmin gene (CP) is selected (for example, analyzed or measured). In some embodiments, 24-dehydrocholesterol reductase gene (DHCR24) is selected (for example, analyzed or measured). In some embodiments, fibrinogen alpha chain gene (FGA) is selected (for example, analyzed or measured). In some embodiments, fibrinogen beta chain gene (FGB) is selected (for example, analyzed or measured). In some embodiments, fibrinogen gamma chain gene (FGG) is selected (for example, analyzed or measured). In some embodiments, histidine rich glycoprotein gene (HRG) is selected (for example, analyzed or measured). In some embodiments, interferon induced transmembrane protein 3 gene (IFITM3) is selected (for example, analyzed or measured). In some embodiments, ATPase Na+/K+ transporting subunit beta 1 gene (ATP1B1) is selected (for example, analyzed or measured). In some embodiments, N-formyl peptide receptor 3 gene (FPR3) is selected (for example, analyzed or measured). In some embodiments, structural maintenance of chromosomes 4 gene (SMC4) is selected (for example, analyzed or measured). In some embodiments, thioredoxin domain containing 16 gene (TXNDC16) is selected (for example, analyzed or measured). In some embodiments, assembly factor for spindle microtubules gene (ASPM) is selected (for example, analyzed or measured). In some embodiments, WRN recQ like helicase gene (WRN) is selected (for example, analyzed or measured).
In some embodiments, ZRANB2 antisense RNA 2 gene (ZRANB2-AS2) is selected (for example, analyzed or measured). In some embodiments, BMX non-receptor tyrosine kinase gene (BMX) is selected (for example, analyzed or measured). In some embodiments, serine/threonine kinase MRCK alpha gene (CDC42BPA) is selected (for example, analyzed or measured). In some embodiments, kinetochore scaffold 1 gene (KNL1) is selected (for example, analyzed or measured). In some embodiments, calcium voltage-gated channel subunit alpha 1 gene (CACNA1A) is selected (for example, analyzed or measured). In some embodiments, ATP binding cassette subfamily B member 7 gene (ABCB7) is selected (for example, analyzed or measured). In some embodiments, histone cluster 1 H2bf gene (HIST1H2BF) is selected (for example, analyzed or measured). In some embodiments, PC4 and SFRS1 interacting protein 1 gene (PSIP1) is selected (for example, analyzed or measured). In some embodiments, transmembrane protein 150C gene (TMEM150C) is selected (for example, analyzed or measured). In some embodiments, zinc finger CCCH-type containing protein 6 gene (ZC3H6) is selected (for example, analyzed or measured). In some embodiments, chromosome 9 open reading frame 16 gene (C9orf16) is selected (for example, analyzed or measured). In some embodiments, carboxypeptidase Q gene (CPQ) is selected (for example, analyzed or measured). In some embodiments, dynein cytoplasmic 1 intermediate chain 2 gene (DYNC1I2) is selected (for example, analyzed or measured). In some embodiments, extracellular matrix protein 1 gene (ECM1) is selected (for example, analyzed or measured). In some embodiments, histone H2A type 1-H gene (HIST1H2AH) is selected (for example, analyzed or measured). In certain embodiments, any combination thereof is selected (for example, analyzed or measured). In certain embodiments, one or more of the above biomarkers are not selected (for example, are not analyzed or measured).

[0045] In some embodiments, the one or more cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer. In some examples, a level of one or more of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination of two or more thereof are analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates blood cancer or a predisposition to a blood cancer. In some embodiments, the blood cancer is multiple myeloma (MM). In some embodiments, the predisposition to blood cancer is monoclonal gammopathy of undetermined significance (MGUS).

[0046] In some embodiments, the one or more cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM). In some examples, one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma.

[0047] In some examples, the methods include measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma. In some examples, an increase in expression level of one or more of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or a combination of any two or more thereof (including, but not limited to, each of CENPE, HBG1, HBG2, and NUSAP1) compared to a control indicates multiple myeloma. In some examples, the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.
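
By way of non-limiting illustration, the percentage and fold increases recited above can be related to one another with simple arithmetic: an increase of 10% over the control corresponds to a fold change of 1.10, and a 2-fold increase corresponds to an increase of 100%. The Python sketch below assumes that the sample level and the control are expressed in the same, already normalized, units; the function names are hypothetical and the sketch is not the statistical differential expression analysis itself.

```python
def relative_increase(sample_level, control_value):
    """Return (fold_change, percent_increase) of a sample level over its control."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    fold = sample_level / control_value
    return fold, (fold - 1.0) * 100.0


def meets_threshold(sample_level, control_value, min_fold):
    """True when the sample level is at least min_fold times the control.

    min_fold = 1.10 models "an increase of at least 10%";
    min_fold = 2.0 models "an increase of at least 2-fold".
    """
    fold, _ = relative_increase(sample_level, control_value)
    return fold >= min_fold


if __name__ == "__main__":
    # Arbitrary example values, not data from the disclosure.
    print(relative_increase(30.0, 10.0))
    print(meets_threshold(25.0, 10.0, 2.0))
```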

[0048] In some embodiments, the one or more cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). In some examples, the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.

[0049] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer. In some examples, a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof, is analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates liver cancer or a predisposition to liver cancer. In some embodiments, the liver cancer is hepatocellular carcinoma (HCC). In some embodiments, the predisposition to liver cancer is cirrhosis.

[0050] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC). In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC. In some examples, an increase in expression level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof (including, but not limited to, an increase in expression level of each of C3, CP, FGA, FGB, and IFITM3) compared to a control indicates HCC. In some examples, the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.

[0051] In some embodiments, the one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cirrhosis. In some examples, the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cirrhosis.

[0052] In some embodiments, the one or more cfRNA biomarkers are selected to determine the efficacy of a prophylactic treatment for preventing the development of cancer in subjects having a predisposition to cancer.

[0053] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof. Skilled persons will understand that a lack of differential expression between the selected one or more cfRNA biomarkers and a corresponding CV will generally indicate a lack of cancer (e.g., “non-cancer”) or a lack of predisposition to cancer in the subject.

[0054] In some embodiments, the level of the one or more cfRNA biomarkers is analyzed by a method selected from the group of: a polymerase chain reaction (PCR), a quantitative PCR (qPCR), a reverse transcription PCR (rt-PCR), a complementary DNA (cDNA) synthesis, or a real-time PCR, or any combination thereof. Skilled persons will understand that polynucleotide amplification (e.g., PCR) may require a primer pair designed to amplify a specific gene target. In some embodiments, a primer pair is selected to amplify a specific cfRNA gene target (as shown in Table 17). In some embodiments, a primer pair, selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 34; SEQ ID NO: 35 and SEQ ID NO: 36; SEQ ID NO: 37 and SEQ ID NO: 38; SEQ ID NO: 39 and SEQ ID NO: 40; SEQ ID NO: 41 and SEQ ID NO: 42; or any combination thereof, is used to analyze the one or more cfRNA biomarkers in the biological sample.

[0055] In some examples, the level of the one or more cfRNA biomarkers is detected using RT-qPCR. In some examples, the methods include a step utilizing a pool of two or more pairs of primers to pre-amplify a plurality of cDNAs of interest (for example, generated by RT-PCR of cfRNA), followed by a step including two or more individual amplification reactions, each utilizing a single pair of primers to amplify a single cDNA of interest from the pre-amplification step (for example, using quantitative real-time PCR). In some examples, the pre-amplification method includes performing an RT-PCR reaction comprising primer pairs for amplifying two or more of the cfRNA biomarkers described herein, producing a pre-amplified pool of cDNAs, and digesting the pre-amplified pool of cDNAs to remove single-stranded nucleic acids.

[0056] In some examples, the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 23 and SEQ ID NO: 24, the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32, the primer pair of SEQ ID NO: 33 and SEQ ID NO: 34, the primer pair of SEQ ID NO: 35 and SEQ ID NO: 36, the primer pair of SEQ ID NO: 37 and SEQ ID NO: 38, the primer pair of SEQ ID NO: 39 and SEQ ID NO: 40, the primer pair of SEQ ID NO: 41 and SEQ ID NO: 42, or any combination thereof, for example for methods of detecting or identifying multiple myeloma. In some examples, the one or more primer pairs include each of the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, and the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32 for methods of detecting or identifying multiple myeloma. In other examples, the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14, the primer pair of SEQ ID NO: 15 and SEQ ID NO: 16, the primer pair of SEQ ID NO: 17 and SEQ ID NO: 18, the primer pair of SEQ ID NO: 19 and SEQ ID NO: 20, the primer pair of SEQ ID NO: 21 and SEQ ID NO: 22, or any combination thereof for methods of detecting or identifying HCC. 
In further examples, the one or more primer pairs include each of the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12 and the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14 for methods of detecting or identifying HCC.

[0057] In some embodiments, the biological sample is selected from the group of: a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a cerebrospinal fluid sample, a tissue sample, or a cell sample.

[0058] In some embodiments, the subject is a human who has, or is suspected of having, cancer or a predisposition to cancer. For example, a subject can be an individual who has been diagnosed with having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer. In another example, a subject can be an individual who has a family history of having a cancer and therefore is predisposed to cancer. In yet another example, a subject can be an individual who was exposed to an environmental agent and therefore is predisposed to cancer.

[0059] As disclosed herein, “biological sample” and “sample” are used interchangeably and may include, but are not limited to, a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a tissue sample, or a cell sample. A biological sample may be material obtained from cells or derived from cells of a subject. The biological sample may be a heterogeneous or homogeneous population of cells or tissues. The biological sample may be obtained using any method known to the art that can provide a sample suitable for the analytical methods described herein. The sample may be obtained by non-invasive methods including but not limited to: drawing blood, scraping of the skin or cervix, swabbing of the cheek, saliva collection, urine collection, feces collection, collection of menses, tears, or semen.

[0060] In certain embodiments the biological sample is obtained by biopsy. In other embodiments the biological sample is obtained by swabbing, endoscopy, scraping, phlebotomy, lumbar puncture (spinal tap) or any other methods known in the art. In some cases, the biological sample may be obtained, stored, or transported using components of a kit of the disclosed methods. In some cases, multiple samples, such as multiple blood samples, may be obtained for diagnosis by the methods described herein. In some cases, longitudinal studies relying on multiple samples collected at different times may be performed by the methods described herein. In other cases, multiple samples, such as one or more samples from one tissue type (for example, esophagus) and one or more samples from another specimen (for example, serum), may be obtained for diagnosis by the methods. In some cases, multiple samples such as one or more samples from one tissue type (e.g., esophagus) and one or more samples from another specimen (e.g., serum) may be obtained at the same or different times. Samples obtained at different times may be stored and/or analyzed by different methods. For example, a sample may be obtained and analyzed by routine staining methods or any other cytological analysis methods.

[0061] In some embodiments the biological sample may be obtained by a physician, nurse, or other medical professional such as a medical technician, endocrinologist, cytologist, phlebotomist, radiologist, or a pulmonologist. The medical professional may indicate the appropriate test or assay to perform on the sample. In certain aspects a molecular profiling business may consult on which assays or tests are most appropriately indicated. In further aspects of the disclosed methods, the patient or subject may obtain a biological sample for testing without the assistance of a medical professional, such as obtaining a whole blood sample, a urine sample, a fecal sample, a buccal sample, or a saliva sample.

[0062] In other cases, the biological sample is obtained by an invasive procedure including but not limited to: biopsy, needle aspiration, endoscopy, or phlebotomy. The method of needle aspiration may further include fine needle aspiration, core needle biopsy, vacuum assisted biopsy, or large core biopsy. In some embodiments, multiple samples may be obtained by the methods herein to ensure a sufficient amount of biological material.

[0063] General methods for obtaining biological samples are also known in the art. Publications such as Ramzy, Ibrahim, Clinical Cytopathology and Aspiration Biopsy, 2001, which is herein incorporated by reference in its entirety, describe general methods for biopsy and cytological methods. In one embodiment, the sample is a fine needle aspirate of an esophageal or a suspected esophageal tumor or neoplasm. In some cases, the fine needle aspirate sampling procedure may be guided by the use of an ultrasound, X-ray, or other imaging device.

[0064] In certain aspects, the methods for obtaining a biological sample from a subject may include methods of biopsy such as fine needle aspiration, core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, punch biopsy, shave biopsy or skin biopsy. In certain embodiments the biological sample is obtained from a biopsy from liver tissue by any of the biopsy methods previously mentioned. In other embodiments the biological sample may be obtained from any of the tissues provided herein that include but are not limited to non-cancerous or cancerous tissue and non-cancerous or cancerous tissue from the serum, gall bladder, mucosal, skin, heart, lung, breast, pancreas, blood, liver, muscle, kidney, smooth muscle, bladder, colon, intestine, brain, prostate, esophagus, or thyroid tissue. Alternatively, the sample may be obtained from any other source including but not limited to blood, plasma, serum, urine, breast milk, semen, sweat, hair follicle, buccal tissue, tears, menses, feces, saliva, or cells. In certain aspects of the disclosed methods, any medical professional such as a doctor, nurse or medical technician may obtain a biological sample for testing. Yet further, the biological sample can be obtained without the assistance of a medical professional.

[0065] In some embodiments, the biological sample may be obtained from a subject directly, from a medical professional, from a third party, or from a kit provided by a molecular profiling business or a third party. In some cases, the biological sample may be obtained by the molecular profiling business after the subject, a medical professional, or a third party acquires and sends the biological sample to the molecular profiling business. In some cases, the molecular profiling business may provide suitable containers and excipients for storage and transport of the biological sample to the molecular profiling business.

[0066] In some embodiments, a medical professional need not be involved in the initial diagnosis or biological sample acquisition. A subject may alternatively provide a biological sample through the use of an over the counter (OTC) kit. An OTC kit may contain a means for providing the biological sample as described herein, a means for storing the biological sample for inspection, and instructions for proper use of the OTC kit. In some cases, molecular profiling services are included in the price for purchase of the kit. In other cases, the molecular profiling services are billed separately. A biological sample suitable for use by the molecular profiling business may contain tissues, cells, nucleic acids, genes, gene fragments, expression products, gene expression products, or gene expression product fragments of a subject.

[0067] In some embodiments, the subject may be referred to a specialist such as an oncologist, surgeon, or endocrinologist. The specialist may likewise obtain a biological sample for testing or refer the individual to a testing center or laboratory for submission of the biological sample. In some cases the medical professional may refer the subject to a testing center or laboratory for submission of the biological sample. In other cases, the subject may provide the biological sample. In some cases, a molecular profiling business may obtain the biological sample.

[0068] In an exemplary embodiment the level of the one or more cell-free RNA (cfRNA) biomarkers is a gene expression level. The methods disclosed herein include measuring expression of coding and/or noncoding cfRNA genes. In some embodiments, the expression of coding and/or noncoding RNA or DNA is analyzed. Measurement of expression can be done by a number of processes known in the art. The process of measuring expression may begin by isolating or extracting RNA from a biological sample (e.g., tissue sample, blood sample, plasma sample, etc.). In an exemplary embodiment, isolation or extraction of cfRNA does not require applying a cell lysis step. In some embodiments, a cell lysis step may be applied to induce release of polynucleotide from the cell. Skilled persons will understand that cell lysis may be induced by applying detergent, mechanical shearing, heat, or any other methods known in the art used to lyse a cell. In some examples, one or more commercially available kits may be used for isolation of cfRNA. Examples include kits from Qiagen (e.g., QIAamp Circulating Nucleic Acid kit), Thermo Fisher Scientific (e.g., MagMAX Cell-Free Total Nucleic Acid kit), and Zymo Research (e.g., Quick-cfRNA Serum & Plasma kit). A skilled person can select appropriate kits and methods for isolating or extracting cfRNA.

[0069] In some embodiments, the level of the one or more cfRNA biomarkers is analyzed or measured by hybridization (for example, by means of Northern blot analysis or DNA or RNA arrays (microarrays) after converting RNA into labeled complementary DNA (cDNA)) and/or amplification by means of an enzymatic chain reaction. In some embodiments, quantitative or semi-quantitative enzymatic amplification methods such as polymerase chain reaction (PCR) or quantitative real-time RT-PCR or semi-quantitative RT-PCR techniques may be used. Other suitable amplification methods may include ligase chain reaction (LCR), transcription-mediated amplification (TMA), strand displacement amplification (SDA), isothermal amplification of nucleic acids, and nucleic acid sequence based amplification (NASBA).

[0070] As used herein, “primer” refers to a single-stranded polynucleotide configured to hybridize with a complementary polynucleotide strand and define a region or locus of the polynucleotide where amplification will initiate. As used herein, a “primer pair” refers to two primers configured to hybridize with a polynucleotide and define a region or locus that will be amplified. For example, a typical PCR reaction relies on a “forward” primer and a “reverse” primer, used conjunctively as a primer pair, to hybridize to, respectively, the antisense and sense strands of a double-stranded polynucleotide (e.g., DNA). Thus, use as a primer pair constitutes using a primer pair configured to amplify a specific region or locus, such as a selected cfRNA biomarker.
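
By way of non-limiting illustration, the relationship between a forward primer, a reverse primer, and the region they define can be sketched in Python. The template and primer sequences used with this sketch are made up for the example and are not sequences of the disclosure (they are not any of SEQ ID NOs: 1-42), and the helper names are hypothetical. The sketch assumes exact matching on the sense strand: the forward primer reads the same as the sense strand (so it hybridizes to the antisense strand), and the reverse primer is the reverse complement of the sense strand (so it hybridizes to the sense strand).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(sense, forward, reverse):
    """Return the region of the sense strand bounded by a primer pair.

    Returns None when either primer has no exact binding site or when the
    sites are in the wrong order. Real primer design tolerates mismatches
    and checks melting temperature; this sketch does neither.
    """
    start = sense.find(forward)
    end = sense.find(reverse_complement(reverse))
    if start == -1 or end == -1 or end < start:
        return None
    return sense[start:end + len(reverse)]


if __name__ == "__main__":
    template = "GGATCCAAGCTTTTGACCGTAGGCATTC"   # hypothetical sense strand
    fwd = "ATCCAAGC"                             # hypothetical forward primer
    rev = reverse_complement("GTAGGCAT")         # hypothetical reverse primer
    print(amplicon(template, fwd, rev))
```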

[0071] In an exemplary embodiment, primer pairs are selected to amplify one or more cfRNA biomarkers (see Table 17). In some embodiments, the method uses any of: SEQ ID NO: 1 and SEQ ID NO: 2 as a primer pair; SEQ ID NO: 3 and SEQ ID NO: 4 as a primer pair; SEQ ID NO: 5 and SEQ ID NO: 6 as a primer pair; SEQ ID NO: 7 and SEQ ID NO: 8 as a primer pair; SEQ ID NO: 9 and SEQ ID NO: 10 as a primer pair; SEQ ID NO: 11 and SEQ ID NO: 12 as a primer pair; SEQ ID NO: 13 and SEQ ID NO: 14 as a primer pair; SEQ ID NO: 15 and SEQ ID NO: 16 as a primer pair; SEQ ID NO: 17 and SEQ ID NO: 18 as a primer pair; SEQ ID NO: 19 and SEQ ID NO: 20 as a primer pair; SEQ ID NO: 21 and SEQ ID NO: 22 as a primer pair; SEQ ID NO: 23 and SEQ ID NO: 24 as a primer pair; SEQ ID NO: 25 and SEQ ID NO: 26 as a primer pair; SEQ ID NO: 27 and SEQ ID NO: 28 as a primer pair; SEQ ID NO: 29 and SEQ ID NO: 30 as a primer pair; SEQ ID NO: 31 and SEQ ID NO: 32 as a primer pair; SEQ ID NO: 33 and SEQ ID NO: 34 as a primer pair; SEQ ID NO: 35 and SEQ ID NO: 36 as a primer pair; SEQ ID NO: 37 and SEQ ID NO: 38 as a primer pair; SEQ ID NO: 39 and SEQ ID NO: 40 as a primer pair; SEQ ID NO: 41 and SEQ ID NO: 42 as a primer pair; or any combination thereof, to analyze the one or more cfRNA biomarkers in the biological sample.

[0072] It is understood that additional separate embodiments are contemplated wherein each method herein uses each individual primer pair previously mentioned. For instance, one embodiment for each method uses the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6 and another embodiment for each method uses the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, and so on.
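
By way of non-limiting illustration, the pairing convention recited above (each odd-numbered SEQ ID NO paired with the next even-numbered SEQ ID NO, from SEQ ID NOs: 1 and 2 through SEQ ID NOs: 41 and 42) can be enumerated with a short Python sketch. The function name `primer_pairs` is hypothetical. The sketch deals only with SEQ ID numbers; it does not contain the primer sequences and makes no assumption about which biomarker each pair targets, which is set out in Table 17.

```python
def primer_pairs(first=1, last=42):
    """List (forward, reverse) SEQ ID NO pairs from `first` through `last`.

    `first` is expected to be odd and `last` even, so that every pair is
    (odd n, n + 1), matching the pairing recited in the text.
    """
    if first % 2 == 0 or last % 2 == 1 or first > last:
        raise ValueError("expected an odd first and an even last SEQ ID NO")
    return [(n, n + 1) for n in range(first, last, 2)]


if __name__ == "__main__":
    all_pairs = primer_pairs()
    print(len(all_pairs), all_pairs[0], all_pairs[-1])
    # Sub-ranges recited for particular indications in paragraph [0056]:
    print(primer_pairs(5, 22))    # pairs recited for detecting HCC
    print(primer_pairs(23, 42))   # pairs recited for detecting multiple myeloma
```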

[0073] In some embodiments, gene expression levels of the one or more cfRNA biomarkers may also be analyzed by RNA sequencing methods known in the art. RNA sequencing methods may include cfRNA-seq, total RNA-seq, targeted RNA-seq, small RNA-seq, single-cell RNA-seq, ultra-low-input RNA-seq, RNA exome capture sequencing, and ribosome profiling. Sequencing data may be processed and aligned using methods known in the art.

[0074] In some embodiments, a method for analyzing one or more cfRNA biomarkers by sequencing comprises: (a) isolating a set of one or more cfRNA biomarkers from the biological sample; (b) analyzing the set of one or more cfRNA biomarkers isolated in Step (a) to produce a set of one or more sequence reads; and (c) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof. A differential expression shown between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer in the subject.

[0075] In some embodiments, the analysis used to obtain sequencing reads of Step (b) is: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, or massive parallel sequencing, or any combination thereof.

[0076] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.

[0077] In some embodiments, one or more primer pairs, selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 34; SEQ ID NO: 35 and SEQ ID NO: 36; SEQ ID NO: 37 and SEQ ID NO: 38; SEQ ID NO: 39 and SEQ ID NO: 40; SEQ ID NO: 41 and SEQ ID NO: 42; or any combination thereof, are used to generate cDNA useful for producing sequencing reads of the one or more cfRNA biomarkers. In some embodiments, one or more cfRNA biomarkers from the group of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected or utilized to indicate blood cancer or a predisposition to blood cancer. In some embodiments, one or more cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected or utilized to indicate multiple myeloma (MM). In further examples, the cfRNA biomarkers CENPE, HBG1, HBG2, and NUSAP1 are selected or utilized to indicate MM. In some embodiments, one or more cfRNA biomarkers from the group of: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). 
In some embodiments, one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate liver cancer or a predisposition to liver cancer. In some embodiments, one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected or utilized to indicate hepatocellular carcinoma (HCC). In further examples, the cfRNA biomarkers C3, CP, FGA, FGB, and IFITM3 are selected or utilized to indicate HCC. In some embodiments, one or more cfRNA biomarkers from the group of: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate cirrhosis.

[0078] In some embodiments, the sequencing reads of Step (b) are obtained by: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, massively parallel sequencing, or any combination thereof.

[0079] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.

[0080] To normalize the expression values of a gene across different samples, the cfRNA level of interest in samples from the subject can be compared with a control value (CV). In some embodiments, a CV may be of a gene for which the expression level does not differ across sample types, for example a gene that is constitutively expressed in all types of cells. In some embodiments, a CV may be of a gene for which the expression level indicates a non-cancerous state in the subject. In some embodiments, a known amount of a control RNA may be added to the sample(s), and the value analyzed for the level of the RNA of interest may be normalized to the value analyzed for the known amount of the control RNA. Normalization for some methods, such as for sequencing, may comprise calculating the reads per kilobase of transcript per million mapped reads (RPKM) for a gene of interest, or may comprise calculating the fragments per kilobase of transcript per million mapped reads (FPKM) for a gene of interest. Normalization methods may comprise calculating the log2-transformed count per million (log-CPM). Skilled persons will understand that any method of normalization that accurately calculates the expression value of an RNA for comparison between samples may be used.
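By way of non-limiting illustration, the normalization calculations named above can be sketched in Python as follows. The function names, the 0.5 pseudocount used to avoid taking the logarithm of zero, and the spike-in helper are assumptions introduced for this sketch and are not part of the original disclosure. FPKM uses the same arithmetic as RPKM with fragment counts in place of read counts.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Passing fragment counts instead of read counts gives FPKM.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # count / (length in kb) / (library size in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_cpm(read_count, total_mapped_reads, pseudocount=0.5):
    """log2-transformed counts per million (log-CPM).

    The pseudocount is an assumption of this sketch; it keeps the value
    finite when a gene has zero reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    cpm = (read_count + pseudocount) * 1e6 / total_mapped_reads
    return math.log2(cpm)


def spike_in_normalize(target_signal, spike_signal, spike_known_amount):
    """Scale a target measurement by a control RNA added in a known amount."""
    if spike_signal <= 0:
        raise ValueError("spike-in signal must be positive")
    return target_signal / spike_signal * spike_known_amount
```

For example, a gene 2,000 bp long with 500 mapped reads in a library of 10,000,000 mapped reads has an RPKM of 25.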

[0081] In some embodiments, the CV is a reference expression level. As used herein, the term "reference expression level" (or “reference level”) refers to a value used as a reference for the values/data obtained from samples obtained from a subject. The reference level can be an absolute value, a relative value, a value which has an upper and/or lower limit, a series of values, an average value, a median, a mean value, or a value expressed by reference to a control or reference value. A reference level can be based on the value obtained from an individual sample, such as, for example, a value obtained from a sample from the subject but obtained at a previous point in time. The reference level can be based on a high number of samples, such as the levels obtained in a cohort of subjects having a particular characteristic. The reference level may be defined as the mean level of the patients in the cohort. A reference level can be based on the expression levels of the biomarkers obtained from samples from subjects who do not have a disease state or a particular phenotype. Skilled persons will understand that the particular reference expression level can vary depending on the specific method to be performed.

[0082] Some embodiments include determining that an analyzed expression level is higher than, lower than, increased relative to, decreased relative to, equal to, or within a predetermined amount of a reference expression level. In some embodiments, a higher, lower, increased, or decreased expression level is at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 50, 100, 150, 200, 250, 500, or 1000 fold (or any derivable range therein) or at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, or 900% different than the reference level, or any derivable range therein. These values may represent a predetermined threshold level, and some embodiments include determining that the analyzed expression level is higher by a predetermined amount or lower by a predetermined amount than a reference level. In some embodiments, a level of expression may be qualified as “low” or “high,” which indicates the patient expresses a certain gene or cfRNA at a level relative to a reference level or a level with a range of reference levels that are determined from multiple samples meeting particular criteria. The level or range of levels in multiple control samples is an example of this. In some embodiments, that certain level or a predetermined threshold value is at, below, or above 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percentile, or any range derivable therein. Moreover, a threshold level may be derived from a cohort of individuals meeting particular criteria. The number in the cohort may be, be at least, or be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 441, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 or more (or any range derivable therein). An analyzed expression level can be considered equal to a reference expression level if it is within a certain amount of the reference expression level, and such amount may be an amount that is predetermined. The predetermined amount may be within 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50% of the reference level, or any range derivable therein.
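As a hedged, non-limiting sketch of the comparison logic in this paragraph, the following Python function labels a measured level relative to a reference level. The function name, the default 2-fold threshold, the default 10% equality window, and the "indeterminate" label for values that fall between the two are all assumptions chosen for illustration; the disclosure permits many other threshold values.

```python
def classify_level(measured, reference, fold_threshold=2.0, equal_within_pct=10.0):
    """Label a measured expression level relative to a reference level.

    Returns "equal" when the measured level is within `equal_within_pct`
    percent of the reference, "higher" or "lower" when it differs by at
    least `fold_threshold` fold, and "indeterminate" otherwise.
    """
    if reference <= 0 or measured < 0:
        raise ValueError("reference must be positive and measured non-negative")
    if fold_threshold < 1:
        raise ValueError("fold_threshold must be at least 1")

    pct_diff = abs(measured - reference) / reference * 100.0
    if pct_diff <= equal_within_pct:
        return "equal"
    if measured >= reference * fold_threshold:
        return "higher"
    if measured <= reference / fold_threshold:
        return "lower"
    return "indeterminate"
```

With the default settings, a measured level of 250 against a reference of 100 is labeled "higher", 40 is labeled "lower", and 105 is labeled "equal".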

[0083] In some embodiments, a comparison of cfRNA gene expression levels to a reference expression level is to be made on a gene-by-gene basis. For example, if the expression levels of gene A, gene B, and gene X, as reflected in a patient’s cfRNA levels, are analyzed, a comparison to mean expression levels as reflected in cfRNA from a cohort of patients would involve: comparing the expression level of gene A in the patient’s cfRNA with the mean expression level of gene A reflected in cfRNA from the cohort of patients, comparing the expression level of gene B reflected in the patient’s cfRNA with the mean expression level of gene B in cfRNA from the cohort of patients, and comparing the expression level of gene X in cfRNA from the patient with the mean expression level of gene X in cfRNA from the cohort of patients. In the above example, genes A, B, and X may be selected from any one of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH for comparison. Comparisons that involve determining whether the expression level analyzed in cfRNA from a patient is within a predetermined amount of a mean expression level or reference expression level are similarly done on a gene-by-gene basis, as applicable.
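The gene-by-gene comparison described above can be sketched as follows. This is a minimal illustration only: the function name and the returned fields are assumptions, and any expression values passed to it in the test or in practice are supplied by the caller rather than taken from the disclosure.

```python
from statistics import mean


def compare_to_cohort(patient_levels, cohort_levels):
    """Compare each patient gene level with that gene's cohort mean.

    patient_levels: dict mapping gene name -> patient expression level
    cohort_levels:  dict mapping gene name -> list of cohort expression levels
    Returns a dict mapping gene name -> (cohort_mean, patient_minus_mean).
    """
    comparison = {}
    for gene, level in patient_levels.items():
        if gene not in cohort_levels or not cohort_levels[gene]:
            raise KeyError(f"no cohort values available for gene {gene!r}")
        # Each gene is compared only with the cohort mean of the same gene.
        cohort_mean = mean(cohort_levels[gene])
        comparison[gene] = (cohort_mean, level - cohort_mean)
    return comparison
```

The difference returned for each gene can then be tested against a predetermined amount, again one gene at a time.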

[0084] In an exemplary embodiment, a differential expression analysis is performed comparing the level of each cfRNA biomarker that is analyzed or utilized to a corresponding control value (CV). Differential expression shown by the differential expression analysis between the cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.

[0085] In some embodiments, the differential expression analysis comprises: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.

[0086] In some embodiments, the method measures the level of one or more cfRNA biomarkers by Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, or massively parallel sequencing.

[0087] In some embodiments, DNA from the biological sample, cDNA derived from RNA from the biological sample, and/or amplification products of any of these are sequenced to produce sequencing reads identifying the order of nucleotides present in the sequenced polynucleotides or the complements thereof. A variety of suitable sequencing techniques are available.

[0088] In some embodiments, the method comprises: (a) collecting a biological sample from the subject; (b) isolating a set of one or more cfRNA molecules from the biological sample collected in Step (a); (c) sequencing the set of one or more cfRNA molecules isolated in Step (b) to produce a set of one or more sequence reads; and (d) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, in the biological sample. Differential expression between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer in the subject.

[0089] In some embodiments, sequencing comprises massively parallel sequencing of about, or at least about 10,000, 100,000, 500,000, 1,000,000, or more DNA or cDNA molecules using a high-throughput sequencing by synthesis process, such as Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g. as described in Bentley et al., Nature 6:53-59 (2009)). In some embodiments, particularly when cfDNA is included among the polynucleotides to be sequenced, DNA is not fragmented prior to sequencing. Typically, Illumina's sequencing process comprises attachment of template DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. (In some embodiments, template DNA may include cDNA.) Template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA. This addition prepares the DNA for ligation to oligonucleotide adapters, which optionally have an overhang of a single T base at their 3' end to increase ligation efficiency. The adapter oligonucleotides are complementary to the flow-cell anchor oligos. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchor oligos. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, the template DNA is amplified using PCR before it is subjected to cluster amplification, such as in a process described above. In some applications, the templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about tens to a few hundred base pairs are aligned against a reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used.

[0090] Another non-limiting example sequencing process is the single molecule sequencing technology of the Helicos True Single Molecule Sequencing (tSMS) technology (e.g. as described in Harris T. D. et al., Science 320: 106-109 (2008)). In a typical tSMS process, a DNA sample is cleaved into, or otherwise provided as strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. In some embodiments, the templates are at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries.

[0091] Another illustrative, but non-limiting example sequencing process is pyrosequencing, such as in the 454 sequencing platform (Roche) (e.g. as described in Margulies, M. et al. Nature 437:376-380 (2005)). 454 sequencing typically involves two steps. In the first step, DNA is sheared into, or otherwise provided as (e.g. as naturally occurring cfDNA molecules, or cDNA from naturally short RNA molecules), fragments having sizes of approximately 300-800 base pairs, and the polynucleotides are blunt-ended. Oligonucleotide adapters are then ligated to the ends of the DNA. The adapters serve as primers for amplification and sequencing of the DNA. The DNA can be attached to capture beads, e.g., streptavidin-coated beads using, e.g., adapter B, which contains a 5'-biotin tag. The DNA attached to the beads is PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA molecules on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA molecule in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
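Because the light signal is proportional to the number of nucleotides incorporated, a sequence can in principle be read back from per-flow signal intensities. The following Python sketch illustrates that idea under simplifying assumptions that are not stated in the disclosure: one nucleotide species is flowed at a time in a known order, signals have already been scaled so that a single incorporation is about 1.0, and rounding to the nearest integer is an adequate base-calling rule. The function name and parameters are hypothetical.

```python
def decode_flowgram(flow_order, signals, unit_signal=1.0):
    """Convert per-flow light signals into a nucleotide sequence.

    flow_order:  string of nucleotides in the order they were flowed
    signals:     light intensity recorded for each flow
    unit_signal: intensity that corresponds to one incorporated nucleotide
    """
    if len(flow_order) != len(signals):
        raise ValueError("flow_order and signals must have the same length")
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")

    called = []
    for base, signal in zip(flow_order, signals):
        # Signal strength is proportional to the homopolymer length.
        incorporated = max(0, round(signal / unit_signal))
        called.append(base * incorporated)
    return "".join(called)
```

A flow with a signal near 2.0 is read as two consecutive copies of the flowed nucleotide, and a flow with a signal near zero contributes nothing to the sequence.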

[0092] Further high-throughput sequencing processes are available. Non-limiting examples include sequencing by ligation technologies (e.g., SOLiD™ sequencing of Applied Biosystems), single-molecule real-time sequencing (e.g., Pacific Biosciences sequencing platforms utilizing zero-mode waveguide detectors), nanopore sequencing (e.g. as described in Soni G V and Meller A. Clin Chem 53: 1996-2001 (2007)), sequencing using a chemical-sensitive field effect transistor (e.g., as described in U.S. Patent Application Publication No. 20090026082), sequencing platforms by Ion Torrent (pairing semiconductor technology with sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip), and sequencing by hybridization. Additional illustrative details regarding sequencing technologies can be found in, e.g., U.S. Patent Application Publication No. 2016/0319345.

[0093] In some embodiments using unique molecular identifiers (UMIs), multiple sequence reads having the same UMI(s) are collapsed to obtain one or more consensus sequences, which are then used to determine the sequence of a source DNA polynucleotide. Multiple distinct reads may be generated from distinct instances of the same source DNA polynucleotide, and these reads may be compared to produce a consensus sequence. The instances may be generated by amplifying a source DNA molecule prior to sequencing, such that distinct sequencing operations are performed on distinct amplification products, each sharing the source DNA polynucleotide's sequence. Of course, amplification may introduce errors such that the sequences of the distinct amplification products have differences. In the context of some sequencing technologies, such as an embodiment of Illumina's sequencing-by-synthesis, a source DNA molecule or an amplification product thereof forms a cluster of DNA molecules linked to a region of a flow cell. The molecules of the cluster collectively provide a read. Typically, at least two reads are required to provide a consensus sequence. Sequencing depths of 100, 1000, and 10,000 are examples of sequencing depths useful in the disclosed embodiments for creating consensus reads for low allele frequencies (e.g., about 1% or less). In some embodiments, nucleotides that are consistent across 100% of the reads sharing a UMI or combination of UMIs are included in the consensus sequence. In some embodiments, the consensus criterion can be lower than 100%. For instance, a 90% consensus criterion may be used, which means that base pairs that exist in 90% or more of the reads in the group are included in the consensus sequence. In some embodiments, the consensus criterion may be set at about, or more than about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 100%.
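One way the UMI collapsing described above might be implemented is sketched below in Python. Several choices here are assumptions made for illustration rather than features of the disclosure: reads sharing a UMI are treated as already aligned to the same start position, positions are compared only up to the length of the shortest read in the group, and a position at which no base meets the consensus criterion is written as "N". The function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, criterion=0.9, min_reads=2):
    """Collapse reads sharing a UMI into one consensus sequence per UMI.

    reads:     iterable of (umi, sequence) pairs
    criterion: fraction of reads that must agree on a base (e.g. 0.9 for 90%)
    min_reads: minimum number of reads needed to report a consensus
    """
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)

    consensus = {}
    for umi, sequences in groups.items():
        if len(sequences) < min_reads:
            continue  # too few reads to form a consensus
        length = min(len(s) for s in sequences)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in sequences).most_common(1)[0]
            # Keep the majority base only if it meets the criterion.
            bases.append(base if count / len(sequences) >= criterion else "N")
        consensus[umi] = "".join(bases)
    return consensus
```

Lowering `criterion` tolerates more amplification or sequencing error within a UMI group, at the cost of accepting bases supported by fewer reads.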

[0094] In some embodiments, sequencing reads (or consensus sequences thereof) are identified as originating from an RNA molecule in the source sample if the tag sequence (or the complement thereof) forms part of the sequence read (optionally, at an expected position, and/or adjacent to other expected sequence element(s)), and otherwise are identified as originating from a DNA molecule in the source sample if the tag sequence (or the complement thereof) is absent. In this way, RNA sequencing reads and DNA sequencing reads can be produced in a single sequencing reaction, but analyzed separately, and optionally compared to one another. In some embodiments, a processor is used to group RNA-derived sequences separately from DNA-derived sequences. For example, in some embodiments, a mutation relative to an internal reference (e.g. overlapping reads) or an external reference (e.g. a reference genome) is only designated as accurately representing the original molecule (e.g. a DNA molecule of the sample) if the same mutation is identified in one or more reads corresponding to an original molecule of the other type (e.g. an RNA molecule of the sample). This is particularly helpful for increasing sequencing accuracy in cases where no UMIs are used, and can further increase sequencing accuracy when used in combination with UMIs. In some embodiments, for the purposes of alignment among sequencing reads and/or between sequencing reads and a reference sequence, one or more sequences corresponding to features known not to be present in the source polynucleotides (e.g. sequences known to originate from tag oligonucleotides, RT primers, TSOs, or amplification primers) are computationally ignored (e.g. filtered out of the reads prior to alignment).
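A minimal Python sketch of the tag-based grouping described above follows. It assumes exact string matching of the tag or its reverse complement anywhere in the read, which is a simplification: the disclosure also contemplates checking an expected position or adjacent sequence elements. The function names and any tag sequence supplied by a caller are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def classify_read(read, tag):
    """Label a read "RNA" if it carries the tag (or its complement), else "DNA"."""
    if tag in read or reverse_complement(tag) in read:
        return "RNA"
    return "DNA"


def group_reads(reads, tag):
    """Split reads into RNA-derived and DNA-derived groups for separate analysis."""
    groups = {"RNA": [], "DNA": []}
    for read in reads:
        groups[classify_read(read, tag)].append(read)
    return groups
```

Once grouped, the tag sequence itself can be trimmed from the RNA-derived reads before alignment, consistent with the filtering of non-source sequences mentioned above.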

[0095] In some embodiments, sequencing reads (or consensus sequences thereof) are localized (mapped) by aligning the reads to a known reference genome. In some embodiments, localization is realized by k-mer sharing and read-read alignment. In some embodiments, the reference genome sequence is the NCBI36/hg18 sequence (available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105). In some embodiments, the reference genome sequence is the GRCh37/hg19 or GRCh38 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and the DDBJ (the DNA Databank of Japan). A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, Calif., USA). In some embodiments, one end of clonally expanded copies of plasma polynucleotide molecules (or amplification products thereof) is sequenced and processed by bioinformatics alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. By aligning reads to a reference genome, the genomic locations of mutations relative to the reference sequence can be identified. In some cases, alignment will facilitate inferring an effect of the mutation and/or a property of the cell from which it originated. For example, if the mutation creates a premature stop codon in a tumor suppressor gene, it may be inferred that the source polynucleotide originated from a cancer cell, particularly if a statistically significant number of cancer-associated markers are detected in the sequencing reads.
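To make the idea of localizing a read by shared k-mers concrete, the following Python sketch indexes the k-mers of a reference sequence and lets each k-mer of a read vote for a start position. This is a toy illustration under stated assumptions, not the method of BLAST, BOWTIE, ELAND, or any other tool named above: it handles only the forward strand, ignores insertions and deletions, and is practical only for short reference sequences. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def locate_read(read, index, k):
    """Return the reference start position supported by the most shared k-mers.

    Returns None when the read shares no k-mer with the reference.
    """
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            # A k-mer at read offset j and reference position pos
            # implies the read starts at reference position pos - j.
            votes[pos - j] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Because each k-mer votes independently, a read with a single mismatch can still be placed as long as enough of its other k-mers match the reference.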

[0096] In some embodiments, one or more causal genetic variants are sequence variants associated with a particular type or stage of cancer, or of cancer having a particular characteristic (e.g. metastatic potential, drug resistance, drug responsiveness). As used herein, “causal variant” refers to genetic variants responsible for an associated signal at a locus, such as biological effect on the phenotype of the subject. In some embodiments, the disclosure provides methods for the determination of prognosis, such as where certain mutations or other genetic characteristics are known to be associated with patient outcomes. For example, circulating tumor DNA (ctDNA) has been shown to be a better biomarker for breast cancer prognosis than the traditional cancer antigen 15-3 (CA 15-3) and enumeration of circulating tumor cells (see e.g. Dawson, et al., N Engl J Med 368: 1199 (2013)). In some embodiments, methods of the present disclosure comprise treating a subject based on RNA and DNA polynucleotide biomarkers analyzed in a sample from the subject. By way of non-limiting example, methods disclosed herein can be used in making therapeutic decisions, guidance and monitoring, as well as development and clinical trials of cancer therapies. For example, treatment efficacy can be monitored by comparing an individual’s DNA and RNA in samples from before, during, and after treatment with particular therapies such as molecular targeted therapies (monoclonal drugs), chemotherapeutic drugs, radiation protocols, etc. or combinations of these. In some examples, the subject is identified as having MM using the methods provided herein and is treated with one or more of immunotherapy (such as elotuzumab, daratumumab, or isatuximab), corticosteroids (such as dexamethasone), immunomodulating agents (such as thalidomide, lenalidomide, or pomalidomide), proteasome inhibitors (such as bortezomib, carfilzomib, or ixazomib), chemotherapy (such as cisplatin, doxorubicin, cyclophosphamide, etoposide, melphalan, and/or bendamustine), CAR-T therapy (such as idecabtagene vicleucel and/or ciltacabtagene autoleucel), and bone marrow transplant. In other examples, the subject is identified as having HCC using the methods provided herein and is treated with one or more of surgery (such as hepatectomy), radiation therapy, radiofrequency ablation, percutaneous ethanol injection, radioembolization, chemoembolization, immunotherapy (such as bevacizumab, atezolizumab, ramucirumab, pembrolizumab, and/or nivolumab), targeted therapy (such as sorafenib, lenvatinib, cabozantinib, and/or regorafenib), chemotherapy (such as doxorubicin, gemcitabine, oxaliplatin, cisplatin, 5-fluorouracil, capecitabine, and/or mitoxantrone), and/or liver transplant. A skilled clinician can select appropriate treatment regimen(s) based on the subject, disease being treated, stage of disease, condition of the subject, and other factors.

[0097] In some embodiments, a series of samples collected over time from a single subject may be monitored to see if certain mutations, expression levels, or other phenotypic changes occur without treatment (e.g., longitudinal testing to monitor cancer staging from non-cancer to pre-malignancy or pre-malignancy to cancer). In some embodiments, cell-free polynucleotides are monitored to see if certain mutations, expression levels, or other features of DNA or RNA increase or decrease, or new mutations appear, after treatment, which can allow a physician to alter a treatment (continue, stop or change treatment, for example) in a much shorter period of time than afforded by methods of monitoring that track traditional patient symptoms. In some examples, a subject identified as having a predisposition to cancer (such as MM or HCC) using the disclosed methods is monitored at intervals (such as every 3 months, every 6 months, annually, every 2 years, or more) to identify if progression to cancer has occurred or is occurring. In some examples, the subject has a predisposition to MM (for example, has MGUS) and the monitoring may include one or more of a second (or more) screen with the methods provided herein, blood tests (such as to detect M protein and/or β2-microglobulin, blood cell counts, and/or calcium levels), urine tests (such as to detect M protein), bone marrow biopsy, and/or imaging tests. In other examples, the subject has a predisposition to HCC (such as liver cirrhosis) and the monitoring may include one or more of a second (or more) screening with the methods provided herein, diagnostic imaging, and/or liver biopsy. In some embodiments, a method further comprises the step of diagnosing an individual based on the RNA-derived sequences and DNA-derived sequences, such as diagnosing the subject with a particular stage or type of cancer associated with a detected sequence variant, or reporting a likelihood that the patient has or will develop such cancer.

[0098] In one aspect, the present disclosure provides systems, such as computer systems, for implementing methods described herein, including with respect to any of the various other aspects of this disclosure. It should be understood that it is not practical, or even possible in most cases, for an unaided human being to perform computational operations involved in some embodiments of methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. Of course, the challenge of unaided sequence analysis and alignment is compounded in cases where reliable calls of low allele frequency mutations require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes. Accordingly, some embodiments of methods described herein are not capable of being performed in the human mind alone, or with mere pencil and paper, but rather necessitate the use of a computational system, such as a system comprising one or more processors programmed to implement one or more analytical processes.

[0099] In some embodiments, the disclosure provides tangible and/or non-transitory computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices, magnetic media such as disk drives, magnetic tape, optical media such as CDs, magneto-optical media, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer readable media may be directly controlled by an end user or the media may be indirectly controlled by the end user. Examples of directly controlled media include the media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that is indirectly accessible to the user via an external network and/or via a service providing shared resources such as the "cloud." Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

[00100] In some embodiments, the data or information employed in methods and systems disclosed herein are provided in an electronic format. Examples of such data or information include, but are not limited to, sequencing reads derived from a nucleic acid sample, reference sequences (including reference sequences providing solely or primarily polymorphisms), sequences of one or more oligonucleotides used in the preparation of the sequencing reads (including portions thereof, and/or complements thereof), calls such as cancer diagnosis calls, counseling recommendations, diagnoses, and the like. As used herein, data or other information provided in electronic format is available for storage on a machine and transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, databases, etc. The data may be embodied electronically, optically, etc.

[00101] In some embodiments, provided herein is a computer program product for generating an output indicating the sequences of DNA and RNA in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining DNA and RNA sequences. As explained, the computer product may include a non-transitory and/or tangible computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine a sequence of interest. In one example, the computer product includes a computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a condition and/or determine a nucleic acid sequence of interest.

[00102] In some embodiments, methods described herein (or portions thereof) are performed using a computer processing system which is adapted or configured to perform a method for determining the sequence of polynucleotides derived from DNA and RNA of a sample, such as one or more sequences of interest (e.g. an expressed gene or portion thereof). In some embodiments, a computer processing system is adapted or configured to perform a method as described herein. In one embodiment, the system includes a sequencing device adapted or configured for sequencing polynucleotides to obtain the type of sequence information described elsewhere herein, such as with regard to any of the various aspects described herein. In some embodiments, the apparatus includes components for processing the sample, such as liquid handlers and sequencing systems, comprising modules for implementing one or more steps of any of the various methods described herein (e.g. sample processing, polynucleotide purification, and various reactions (e.g. RT reactions, amplification reactions, and sequencing reactions)).

[00103] In some embodiments, sequence or other data is input into a computer or stored on a computer readable medium either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided via an interface in the computer system. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository. Once available to the processing apparatus, a memory device or mass storage device buffers or stores, at least temporarily, sequences of the nucleic acids. In addition, the memory device may store read counts for various chromosomes or genomes, etc. The memory may also store various routines and/or programs for analyzing the sequence or mapped data. In some embodiments, the programs/routines include programs for performing statistical analyses.

[00104] In one example, a user provides a polynucleotide sample into a sequencing apparatus. Data is collected and/or analyzed by the sequencing apparatus which is connected to a computer. Software on the computer allows for data collection and/or analysis. Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the internet, which is used to transmit data to a handheld device utilized by a remote user (e.g., a physician, scientist or analyst). It is understood that the data can be stored and/or analyzed prior to transmittal. In some embodiments, raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmittal can occur via the internet, but can also occur via satellite or other connection. Alternately, data can be stored on a computer-readable medium and the medium can be shipped to an end user (e.g., via mail). The remote user can be in the same or a different geographical location including, but not limited to a building, city, state, country or continent.

[00105] In some embodiments, the methods comprise collecting data regarding a plurality of polynucleotide sequences (e.g., reads, consensus sequences, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment, e.g., a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus. The computer can then collect applicable data gathered by the laboratory device. The data can be stored on a computer at any step, e.g., while collected in real time, prior to the sending, during or in conjunction with the sending, or following the sending. The data can be stored on a computer-readable medium that can be extracted from the computer. The data collected or stored can be transmitted from the computer to a remote location, e.g., via a local network or a wide area network such as the internet. At the remote location various operations can be performed on the transmitted data.

[00106] Among the types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in systems, apparatus, and methods disclosed herein are the following: reads obtained by sequencing nucleic acids, consensus sequences based on the reads, the reference genome or sequence, thresholds for calling a test sample as either affected, non-affected, or no call, the actual calls of medical conditions related to the sequence of interest, diagnoses (clinical condition associated with the calls), recommendations for further tests derived from the calls and/or diagnoses, treatment and/or monitoring plans derived from the calls and/or diagnoses. In some embodiments, these various types of data are obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or much of this information is stored and used at the location where the test sample is processed, e.g., a doctor's office or other clinical setting. At the other end of the spectrum, the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be a location where the sample was obtained).

[00107] Also provided herein are kits that may be used in connection with the disclosed methods and systems. In some examples, the kits include one or more primer pairs (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more primer pairs) for analyzing or measuring the level of one or more of the disclosed cfRNA biomarkers. In some examples, the kits include up to 10 primer pairs selected from SEQ ID NOs: 23-42, for example for use in methods of diagnosing or treating MM. In other examples, the kits include 4 or more primer pairs, including SEQ ID NOs: 25-32, for example, for use in methods of diagnosing or treating MM. In further examples, the kits include up to 9 primer pairs selected from SEQ ID NOs: 5-22, for example, for methods of diagnosing or treating HCC. In some examples, the kits include 5 or more primer pairs, including SEQ ID NOs: 5-14, for example, for use in diagnosing or treating HCC.

[00108] The kits may further include additional components for use in connection with the disclosed methods, such as one or more buffers, enzymes (such as a reverse transcriptase and/or a DNA polymerase), salts, or other reaction components. In additional examples, the kits may include reagents for one or more controls, such as primers for amplification of one or more control cfRNAs. In some examples, the kits include one or more control primer pairs selected from the pair of SEQ ID NO: 1 and SEQ ID NO: 2 or the pair of SEQ ID NO: 3 and SEQ ID NO: 4.

EXAMPLES

[00109] As disclosed herein, total plasma cfRNA was sequenced from plasma samples of patients with HCC or MM, patients with the corresponding pre-cancerous conditions (liver cirrhosis and MGUS), and non-cancer (NC) donors (also referred to herein as healthy donors (HD)). Potential cfRNA biomarkers were identified using plasma cfRNA-sequencing of a pilot sample set and then validated in an independent sample set. The sequencing data were further validated using orthogonal measurement by quantitative reverse transcription PCR. Feature selection and classification models were built to explore the potential of cfRNA profiles in differentiating malignant from pre-malignant conditions.

[00110] Cell-free RNA (cfRNA) in plasma reflects phenotypic alterations of both localized sites of cancer and the systemic host response. In one aspect, the present disclosure provides methods for utilizing cfRNA sequencing to identify messenger RNA (mRNA) signatures in plasma with the tissue of origin specific to cancer types and pre-cancerous conditions. Total cfRNA were sequenced from plasma samples of hepatocellular carcinoma (HCC) and multiple myeloma (MM) patients, their respective pre-cancerous conditions and non-cancer donors to explore the diagnostic potential. Distinct gene sets were identified and classification models were built using the random forest and linear discriminant analysis algorithms that could distinguish cancer patients from premalignant conditions and non-cancer individuals with high accuracy. Sequencing data was cross-validated by quantitative reverse transcription PCR and cfRNA biomarkers were validated in independent sample sets with AUC higher than 0.86.

Distinction of multiple myeloma from its pre-cancerous condition, monoclonal gammopathy of undetermined significance (MGUS), yielded an accuracy of 90% (17/19). Detection of primary liver cancer from its premalignant condition cirrhosis yielded an accuracy of 100% (12/12). This work demonstrates the potential of using mRNA transcripts in plasma with a small panel of genes for monitoring pre-malignant disease progression from cirrhosis to HCC and MGUS to MM.

[00111] Disclosed herein are methods for analyzing cfRNA biomarker panels to distinguish cancers and their pre-malignant conditions. Total plasma cfRNA was sequenced from plasma samples of patients with liver cancer (HCC) and multiple myeloma (MM) and their pre-cancerous conditions, including liver cirrhosis (Cirr) and MGUS, and from non-cancer donors. Potential cfRNA biomarkers were identified using plasma cfRNA-sequencing of a pilot sample set and were then validated in an independent sample set. The sequencing data were then cross-validated using orthogonal measurement by quantitative reverse transcription PCR. Feature selection and classification models were built to explore the potential of cfRNA profiles in differentiating malignant from pre-malignant conditions.

1. Plasma cfRNA Biomarkers Identified by Sequencing

[00112] To identify cfRNA transcripts which potentially distinguish cancer patients from healthy individuals, blood samples were prospectively collected from the following sample sets: a pilot set of 10 MM patients and 8 HCC patients; 13 patients with pre-malignant conditions including 9 MGUS and 4 Cirr; and 20 age and gender matched non-cancer donors. Table 1 and Table 2 show detailed clinical information of, respectively, the pilot set and validation set.

[00113] Table 1: Detailed Clinical Information of Pilot Set.

[00114] Table 2: Detailed Clinical Information of Validation set.

[00115] Samples were randomly shuffled for RNA extraction, library preparation and sequencing in Illumina flow cells. Libraries were sequenced to saturation with a mean of 33.8M raw reads with a range of 27.7M to 52.3M (as shown below in Tables 3 and 4). After selecting for reads that mapped uniquely to the human genome, the cfRNA libraries had an average read depth of 14M with a range from 2.3M to 43M. On average, 80% of reads mapped to exons (shown in Tables 3 and 4). A total of 39,374 annotated features were detected with at least 1 mapped read across all samples. The majority of detected RNAs were protein coding with a mean fraction of 82% with a range from 65% to 89% (shown in Tables 3 and 4). The fraction of reads mapping to exons and the distribution of read depths were uniform across all sample groups.

[00116] Table 3: Pilot Set Quality Control Data

[00117] Table 4: Validation Set Quality Control Data.

[00118] It was determined whether cfRNA profiles can distinguish HCC and MM from NC donors. Figs. 1A and 1B show the results of an unbiased Principal Component Analysis (PCA) using the 500 genes with the largest variance across all samples, in which pairwise comparison showed separation of HCC and MM cfRNA profiles from those of non-cancer donors. A differential expression (DE) analysis of pairwise comparison between individual cancer types with respect to NC donors using DESeq2 yielded 110 and 12 differentiating genes (adjusted p-value < 0.01) for MM and HCC, respectively (shown below in Tables 5-8 and Fig. 12).

[00119] Table 5: Top DE genes Pairwise HCC vs. Healthy Donor (HD)

[00120] Table 6: Top DE Genes Pairwise MM vs. Healthy Donor (HD)

[00121] Table 7: Top DE Genes Pairwise Cirr vs. Healthy Donor (HD)

[00122] Table 8: Top DE Genes Pairwise MGUS vs. Healthy Donor (HD)

[00123] To confirm the significance of the differential expression results for each pairwise comparison of cancer to NC donors, a permutation test was performed in which the differential expression analysis was repeated between two groups of randomly shuffled samples. Across 500 rounds of random sample shuffling for each pair, zero significant differentiating genes (padj < 0.01) were found in more than 95% and 94% of permutations when comparing MM and HCC, respectively, to non-cancer donors (shown below in Tables 9A-C, 10A-C, and Fig. 13).
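The label-shuffling logic of such a permutation test can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the disclosure: the function names and toy data are hypothetical, and a simple difference-in-means threshold stands in for the DESeq2 significance call so that the sketch stays self-contained.

```python
import random

def count_significant(expr, labels, threshold):
    """Stand-in for a differential expression call (NOT DESeq2): count genes
    whose absolute difference in group means exceeds `threshold`."""
    n_sig = 0
    for gene_values in expr:
        group_a = [v for v, g in zip(gene_values, labels) if g == 1]
        group_b = [v for v, g in zip(gene_values, labels) if g == 0]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if abs(diff) > threshold:
            n_sig += 1
    return n_sig

def permutation_null(expr, labels, threshold, rounds=500, seed=0):
    """Shuffle the sample labels `rounds` times and return the fraction of
    permutations in which no gene is called significant."""
    rng = random.Random(seed)
    shuffled = list(labels)
    zero_hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        if count_significant(expr, shuffled, threshold) == 0:
            zero_hits += 1
    return zero_hits / rounds

# Hypothetical toy data: 3 genes x 8 samples; only gene 0 differs by group.
expr = [
    [9.0, 9.5, 10.0, 9.8, 1.0, 1.2, 0.9, 1.1],
    [5.0, 5.1, 4.9, 5.0, 5.0, 5.2, 4.8, 5.1],
    [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 1.9, 2.0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cancer, 0 = non-cancer

observed = count_significant(expr, labels, threshold=6.0)
null_fraction = permutation_null(expr, labels, threshold=6.0)
print(observed, null_fraction)
```

A high fraction of permutations with zero significant genes, alongside a non-zero count for the true labels, is the pattern that supports the observed differential expression being label-dependent rather than an artifact of the testing procedure.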

[00124] Gene ontology analysis revealed that MM up-regulated genes were enriched for oxygen transport and gas transport. In HCC, the up-regulated gene set was enriched for plasminogen activation. This data collectively indicates the separation of cfRNA profiles in HCC and MM compared to NC donors.

2. Validation of cfRNA Biomarkers

[00125] To further explore the potential of cell-free RNA for cancer detection, Linear Discriminant Analysis (LDA) and a Random Forest (RF) algorithm were applied to find combinations of discriminating genes to separate cancer from non-cancer individuals. Two independent methods were used to identify specific input gene lists for the classifying algorithms. First, discriminating genes using DESeq2 analysis with False Discovery Rate (FDR/adjusted p-value) < 0.01 (shown in Tables 5-8) were used as one feature set (DE gene set). Second, the learning vector quantization (LVQ) method was implemented to find the most important features that distinguished the two groups and selected the top 10 as another feature set (LVQ gene set) (shown below as Tables 11-17).
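The FDR-based selection of the DE gene set can be illustrated with a short sketch. DESeq2 reports Benjamini-Hochberg adjusted p-values by default; the code below re-implements that adjustment in plain Python for illustration only. The gene names and raw p-values are hypothetical placeholders, and `select_features` is an assumed helper name, not part of any library.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_features(gene_pvalues, alpha=0.01):
    """Keep genes whose adjusted p-value is below `alpha`."""
    genes = list(gene_pvalues)
    padj = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, padj) if q < alpha]

# Hypothetical raw p-values, for illustration only.
raw = {"GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.5}
selected = select_features(raw)
print(selected)
```

Filtering on the adjusted rather than the raw p-value controls the expected proportion of false discoveries among the selected genes, which matters when tens of thousands of features are tested at once.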

[00126] Table 11: List of Genes Used for Linear Discriminant Analysis shown in Figs. 1C and 2A; Top 10 Genes Differentiating HCC and MM from NC.

[00127] Table 12: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MGUS and NC Determined Using Learning Vector Quantization Algorithm.

[00128] Table 13: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC Determined Using Learning Vector Quantization Algorithm.

[00129] Table 14: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC, and between MGUS and NC, Determined Using Learning Vector Quantization Algorithm.

[00130] Table 15: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.

[00131] Table 16: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC Determined Using Learning Vector Quantization Algorithm.

[00132] Table 16: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC as well as between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.

[00133] The linear combination for each gene set by LDA showed significant separation between HCC and MM from NC donors with p-value of 6.7x10⁻⁸, 6.7x10⁻¹⁰ and 6.4x10⁻⁷, 6.4x10⁻⁷ using the DE and top 10 LVQ gene sets, respectively (as shown in Fig. 1C). The Random Forest (RF) method was further employed to develop orthogonal classification models. The area under the receiver operating characteristic (ROC) curve (AUC) is higher than 0.92 in both LDA and RF models with both DE and LVQ feature sets for the two cancer types (as shown in Figs. 2A and 2B).
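For readers less familiar with the AUC metric, the following sketch computes it directly from classifier scores using the rank interpretation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The scores and labels are hypothetical and are not taken from the disclosed models.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (ties contribute one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 1 = cancer, 0 = non-cancer.
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

Because the AUC depends only on the ranking of scores, it allows LDA and RF outputs to be compared on the same scale even though the two models produce scores of different kinds.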

[00134] The leave-one-out cross validation (LOOCV) method was employed to evaluate the significance and accuracy of the classification models. Briefly, in LOOCV, one sample was iteratively removed for testing, with the remaining samples used for training by the LDA or RF algorithms to create a classifying model. LDA or RF algorithms classified each left out sample based on these training models. The test was repeated until all individual samples were classified and cross-validated. Both LDA and RF algorithms were trained on the described DE and LVQ gene sets, resulting in four classification models (as shown in Figs. 2A-2C). Classifying MM from non-cancer donors yielded greater than 90% accuracy (27/30) for all four models tested. HCC was correctly differentiated from NC donors with accuracies of 100% (28/28) and 93% (26/28) when using the LDA method or 96% (27/28) and 96% (27/28) when using the RF method with LVQ and DE feature sets, respectively. Overall, the LOOCV test confirmed that the biomarker sets determined by DESeq2 and LVQ methods, combined with our classification models using LDA and RF algorithms, are statistically significant. LVQ gene sets yielded higher accuracy for both cancer types and were used as the feature sets for further validation.
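The LOOCV loop described above can be sketched in a few lines. This is an illustrative outline only: a nearest-centroid rule stands in for the LDA and RF classifiers so the example needs no external libraries, and the feature values, labels, and function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT LDA or RF): assign the class whose mean
    feature vector is closest to `sample` in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and count correct calls."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct, len(x)

# Hypothetical two-gene expression profiles; the last NC sample is atypical.
x = [[8.1, 1.0], [7.9, 1.2], [8.3, 0.9],
     [2.0, 5.1], [2.2, 4.8], [1.9, 5.0], [6.0, 2.0]]
y = ["cancer", "cancer", "cancer", "NC", "NC", "NC", "NC"]

correct, total = loocv_accuracy(x, y)
print(f"{correct}/{total} correct")
```

Reporting the result as a count over the total, as the disclosure does (for example 27/30), keeps the small sample sizes visible alongside the percentage.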

[00135] A primer panel for amplifying the LVQ genes was designed to validate the sequencing data by quantitative reverse transcription PCR (RT-qPCR). RT-qPCR results from the pilot sample set were consistent with the sequencing data with a Pearson correlation coefficient > 0.77 and a p-value of 2.2x10⁻¹⁶ (as shown in Fig. 4). It was confirmed that the differential level of cfRNA transcripts of genes identified by the LVQ algorithm (HBG1, HBG2, NUSAP1, for MM and C3, CP, FGA, FGB for HCC) from RNA-sequencing was also observed with RT-qPCR (as shown in Fig. 5). Table 17 provides forward and reverse primers delineated by LVQ gene target.

[00136] Table 17: List of Forward and Reverse Primer Sequences for Amplifying LVQ Gene Targets.

[00137] Amplification parameters for an RT-qPCR assay were configured to pre-amplify products using SEQ ID NO: 1 through SEQ ID NO: 42. Template RNA was mixed with Superscript III One-step RT-PCR system with Platinum Taq DNA polymerase kit (Invitrogen Corp.; 1600 Faraday Ave., PO Box 6482, Carlsbad, CA, 92008, USA; Cat. No. 12574026) and SEQ ID NOs: 1-42 to generate cDNA according to the kit’s product-insert protocol. PCR amplification products were treated with Exonuclease I to digest single stranded primers at 37°C for 30 min followed by inactivation of enzymes at 80°C for 15 min. For RT-qPCR, cDNA from the preamplification was diluted 1:80 and set up in 96-well plates with SsoFast EvaGreen supermix (BioRad, Inc.; 1000 Alfred Nobel Dr., Hercules, CA, 94547, USA; Cat. No. 1725200) with low ROX with the individual primer pairs at 10pM each. QuantStudio 7 Flex (Applied Biosystems, LLC; 180 Oyster Point Blvd., San Francisco, CA, 94080, USA; Cat. No. 4485701) was used to run the RT-qPCR assay according to the manufacturer’s recommended cycling conditions. The delta Ct of a target gene was calculated by subtracting the Ct of a control gene (such as GAPDH or ACTB) from the Ct of the target gene.
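The delta Ct calculation can be written out as a short worked example. The Ct values below are hypothetical. The conversion of delta Ct to a relative level via 2^(-delta Ct) is a common convention that assumes roughly 100% amplification efficiency; it is included here as an assumption and is not stated in the disclosure.

```python
def delta_ct(target_ct, control_ct):
    """Delta Ct: target gene Ct minus control gene Ct (e.g., GAPDH or ACTB)."""
    return target_ct - control_ct

def relative_level(d_ct):
    """Relative transcript level as 2^(-delta Ct).
    Assumption: near-100% amplification efficiency (not from the disclosure)."""
    return 2.0 ** (-d_ct)

# Hypothetical Ct values for one target gene and one control gene.
target_ct = 26.5
control_ct = 22.5

d_ct = delta_ct(target_ct, control_ct)
level = relative_level(d_ct)
print(d_ct, level)
```

A lower delta Ct corresponds to a higher transcript level relative to the control gene, so comparisons between groups should keep the sign convention in mind.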

[00138] RT-qPCR results from the pilot sample set were consistent with the sequencing data with a Pearson correlation coefficient > 0.77 and a p-value of 2.2x10⁻¹⁶ (as shown in Fig. 3). It was confirmed that the differential level of cfRNA transcripts of genes identified by the LVQ algorithm (HBG1, HBG2, NUSAP1, for MM and C3, CP, FGA, FGB for HCC) from RNAseq was also observed with RT-qPCR (as shown in Fig. 3).
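As an illustration of how agreement between the two measurement platforms can be quantified, the sketch below computes a Pearson correlation coefficient between paired values. The paired numbers are hypothetical placeholders; the choice of which transformed quantities to correlate (here, a log-scale sequencing value against a negated delta Ct) is an assumption, since the disclosure does not specify it.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements for the same samples and gene.
seq_log_level = [1.0, 2.0, 3.0, 4.0, 5.0]    # sequencing, log scale
neg_delta_ct = [1.1, 1.9, 3.2, 3.8, 5.0]     # RT-qPCR, negated delta Ct

r = pearson_r(seq_log_level, neg_delta_ct)
print(round(r, 4))
```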

[00139] To confirm that the feature sets and classification models defined in the pilot cohort were robust and generalizable, a set of independent validation samples was collected from 10 NC controls, 9 MM patients, and 20 HCC patients (shown in Table 1 and Table 2). cfRNA biomarkers identified from the pilot set in silico were validated by measuring the classification accuracy of this independent sample set on the models trained on the pilot dataset using the LVQ gene sets. The linear combination by LDA identified in the pilot cohort of the LVQ feature set showed significant separation in the validation sample set between MM and HCC from NC donors, consistent with the previous findings (shown in Fig. 3). Furthermore, both LDA and RF models trained on the pilot cohort with this same feature set were able to classify cancer from NC controls from the validation cohort, with an AUC > 0.86 and 0.9 when classifying non-cancer donors from MM and HCC, respectively (shown in Fig. 3).

[00140] This cfRNA classification model performed well for early and late stages in the pilot set. In the validation sample set, the model displayed a stage-dependent discrimination. It was validated with an AUC of 0.74 for stage A in HCC (see Figs. 14 and 15) and an AUC of 0.64 for stage I in MM (see Fig. 15). For later stages, the model achieved a higher AUC of 0.91 for stages B and C in HCC (see Fig. 14) and 0.83 for stages II and III in MM (see Fig. 15) in the validation sample set. This stepwise increase in discrimination suggests that these biomarkers become more prevalent with cancer progression. HCC classification also showed significant discrimination compared to NC for different etiologies, and both HCC and MM showed discrimination for males and females (as shown in Figs. 16 and 17) and were not age-dependent (as shown in Figs. 16 and 17) in our pilot and validation sample sets.

3. cfRNA Profiles Distinguish Multiple Myeloma from Its Premalignant Condition, MGUS, and MGUS from Non-cancer

[00141] Disclosed herein are methods of utilizing cfRNA to distinguish MM from MGUS, MM from non-cancer, and MGUS from non-cancer in individuals. It was next examined whether cfRNA profiles were able to recapitulate the transition from a pre-cancerous condition to a cancerous one, and distinguish between them. The hypothesis was tested on multiple myeloma (MM) as it has a well-defined pre-cancerous condition: MGUS. The top ten most significant genes that discriminate MM from non-cancer donors as identified by LVQ displayed a gradual transition in cfRNA level from the non-cancer donors through MGUS to MM. Among these ten most significant genes, seven genes (CAI, EPB42, HBG1, HBG2, CENPE, CPOX, EPB42, NEK2 and NUSAP1) have higher expression in bone marrow, where cancerous plasma cells accumulate, compared to other tissue and cell types in publicly available data from the Human Protein Atlas [47, 48]. Three out of the ten most important genes resulting from the LVQ analysis are related to cell cycle processes: Centromere protein E (CENPE), a kinesin-like motor protein that accumulates in the G2 phase of the cell cycle and is highly expressed in bone marrow [49, 50]; Serine/threonine-protein kinase (NEK2), which is involved in mitotic regulation [50, 51]; and Nucleolar and spindle associated protein 1 (NUSAP1), a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization [52].

[00142] An LDA plot using a combination of the top 10 LVQ genes from pairwise comparisons MM - NC, and MGUS - NC displayed the separation of all three groups (shown in Fig. 8). A RF model using the top 10 most important LVQ genes from MGUS - NC pairwise comparison yielded an accuracy of 88.6% (20/20 non-cancer donors and 6/9 MGUS patients). Classification of MM from MGUS yielded an accuracy of 89.5% (8/9 MGUS and 9/10 MM) using LOOCV with the RF classification method using the top 10 most important genes from LVQ analysis of MM versus NC comparison as a feature set. The 3-group classification resulted in an accuracy of 82% (19/20 NC, 4/9 MGUS and 9/10 MM) defined by LOOCV using the RF method with the feature set composed of the combination of the top 10 LVQ genes from the comparison MM versus non-cancer and MGUS versus non-cancer donors.
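The overall accuracies quoted above pool correct calls across classes. The arithmetic can be sketched as follows, using the per-class counts stated in the text for the MM versus MGUS comparison and the 3-group comparison; the helper names are illustrative only.

```python
def overall_accuracy(per_class):
    """Pooled accuracy from {class: (correct, total)} counts."""
    correct = sum(c for c, _ in per_class.values())
    total = sum(n for _, n in per_class.values())
    return correct / total

def per_class_rate(per_class):
    """Fraction of each class that was correctly classified."""
    return {name: c / n for name, (c, n) in per_class.items()}

# Per-class counts as stated in the text above.
mm_vs_mgus = {"MGUS": (8, 9), "MM": (9, 10)}
three_group = {"NC": (19, 20), "MGUS": (4, 9), "MM": (9, 10)}

acc_pair = overall_accuracy(mm_vs_mgus)      # 17/19
acc_three = overall_accuracy(three_group)    # 32/39
rates = per_class_rate(three_group)
print(f"{acc_pair:.1%} {acc_three:.1%}")
```

Looking at the per-class rates alongside the pooled figure shows that, in the 3-group model, most of the errors fall in the MGUS class (4/9 correct), which the pooled accuracy alone does not reveal.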

4. cfRNA Profiles Distinguish Liver Cancer from Its Pre-Malignant Condition, Cirrhosis, and Cirrhosis from Non-cancer

[00143] Next it was asked if a solid tumor such as HCC could be distinguished from its pre-cancerous condition, Cirr. Among the top ten most important genes that discriminate HCC from NC identified by the LVQ analysis, five genes also significantly differentiate HCC from Cirr. Interestingly, 8 out of the top 10 genes are expressed specifically in the liver and the corresponding proteins are secreted into the blood [47, 48]. Apolipoprotein E (APOE) binds to specific liver and peripheral cell receptors and is essential for normal catabolism of triglyceride-rich lipoprotein constituents [53]. Complement C3 (C3) is synthesized in the liver and secreted to the plasma and is involved in both innate and adaptive immune responses [54]. Ceruloplasmin (CP) is a secreted plasma metalloprotein from the liver that binds copper in the plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin [55]. 24-dehydrocholesterol reductase (DHCR24) catalyzes the reduction of sterol intermediates [56]. Fibrinogen Alpha Chain (FGA), Fibrinogen Beta Chain (FGB) and Fibrinogen Gamma Chain (FGG) encode the coagulation factor fibrinogen, which is a component of blood clotting [57]. Histidine Rich Glycoprotein (HRG) is a plasma glycoprotein that binds heparin sulfate on the surface of the liver, lung, kidney and heart endothelial cells [58].

[00144] Skilled persons will understand that current practices for HCC surveillance include screening on Cirr patients using imaging techniques, such as ultrasound, computerized tomography (CT) and magnetic resonance imaging (MRI). These methods are expensive and can have limited accessibility [5]. In addition, detection of Cirr is mostly based on clinical symptoms which are often from complications displayed at later stages of the disease [59]. Therefore, easy-to-use, reliable and specific biomarkers with accompanying prediction models are needed to improve detection of both HCC and Cirr.

[00145] Disclosed herein are methods of utilizing cfRNA to distinguish HCC from Cirr and Cirr from NC individuals. An LDA plot using the feature set composed of a combination of the top 10 LVQ genes identified for the pairwise comparisons of HCC - NC and Cirr - NC shows a distinct separation between these groups (shown in Fig. 11). RF methods using the top 10 important genes from Cirr - NC pairwise comparisons yielded 100% accuracy in classifying Cirr from NC samples using LOOCV (shown in Figs. 9-11). Classification of HCC from Cirr also yielded 100% accuracy using LOOCV with RF (as shown in Figs. 9-11). An attempt was also made to classify all three classes (NC, Cirr, and HCC) in one model. The 3-group classification resulted in 90.6% accuracy using LOOCV with RF (as shown in Figs. 9-11).

5. Discussion

[00146] cfRNA was sequenced from patients having two cancer types: one solid (HCC), and the other hematologic (MM) and their respective pre-cancerous conditions: Cirr and MGUS, respectively, and from NC donors. Both cancer types can be distinguished from non-cancer controls and pre-cancerous conditions using their cfRNA profiles. To differentiate each cancer type from non-cancer individuals, the combination of ten genes identified by learning vector quantization (LVQ) analysis in each pairwise comparison yields higher accuracy compared to the use of a larger set of differentiating genes as evaluated by leave one out cross validation (LOOCV). Two classification models built on linear discriminant analysis (LDA) and the random forest (RF) algorithm resulted in similar classification performance in each pairwise comparison of cancer to healthy donors. RT-qPCR confirmation for a panel of selected biomarkers was consistent with the sequencing data. Plasma cfRNA biomarkers identified from the sequencing data were further validated in an independent sample cohort. In some embodiments, use of a small gene panel potentially enables a cost-effective assay for pan-cancer detection that might be performed in a clinical environment, such as a doctor’s office, that can be useful in broad clinical applications, including the detection and diagnosis of cancer or a predisposition to cancer.

[00147] To date, most investigations into the potential of blood-based methods for cancer detection have only focused on distinguishing cancers from healthy controls [15, 22, 25, 26, 28, 36]. However, many cancer types have etiologies associated with precursor states such as MGUS for MM and Cirr for HCC. As disclosed herein, cfRNA profiles can recapitulate the transition from a pre-cancerous condition to cancer, including for both solid and hematologic cancers. In some embodiments, the disclosed method comprises cfRNA panels containing a small number of genes that may be useful for distinguishing cancers from pre-malignant conditions and precursors from healthy individuals, thus facilitating cost-effective screening strategies for early cancer detection during routine exams in high-risk patients within the general population.

[00148] Liver and bone marrow have been reported to contribute heavily to the abundance of cell-free nucleic acids in plasma [42, 45, 46]. This may explain the source of cfRNA biomarkers found in these cancer types. In HCC, eight out of the top ten genes used in the classification model are specifically synthesized in the liver and encode secreted proteins found in blood that mediate plasminogen activation and fibrinolysis processes. In MM, seven out of ten genes among the cfRNA biomarkers have relatively high expression in bone marrow compared to other tissue and cell types and are related to cell cycle processes. These findings indicate that the identified cfRNA biomarkers likely originate from the tissue of origin of the tumor.

[00149] In some embodiments, the disclosed method may be used to profile cell-free mRNA to establish a platform for longitudinal monitoring of disease progression (e.g., monitoring a pre-malignant condition as it progresses to cancer) across multiple cancers. In some embodiments, the disclosed method may be used as a panel or assay that measures transcript levels of mRNA in plasma for a small panel of genes that can differentiate cancer from pre-malignant conditions and otherwise healthy donors. As disclosed herein, organ-specific mRNA transcripts were identified as biomarkers that indicate the tissue of origin for the tumor. In some embodiments, detecting the level of these cell-free plasma RNA biomarkers in a sample from a subject by the disclosed method may be combined with other nucleic acids-based and protein-based approaches for potentially increased diagnostic sensitivity and specificity. For example, abnormal liver enzyme levels detected in the blood (indicative of cirrhosis) combined with measurement of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing cirrhosis. In another example, elevated levels of monoclonal protein (M protein) detected in a urine sample (indicative of kidney damage related to MGUS) combined with measurement of cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing MGUS.

6. Methods

[00150] Patient Samples — Blood samples from non-cancer donors and patients with monoclonal gammopathy of undetermined significance (MGUS), multiple myeloma, liver cirrhosis, and liver cancer were obtained from Oregon Health and Science University (OHSU) by Knight Cancer Institute Biolibrary and Oregon Clinical and Translational Research Institute (OCTRI). All samples were collected under institutional review board (IRB) approved protocols with informed consent from all participants for research use. Individuals who had no recorded previous history of cancer were considered to be non-cancer donors.

[00151] All samples were collected and processed using a uniform protocol by the same staff at Oregon Health and Science University. Samples for analysis were matched between cancer and control groups with respect to age and gender of participants. The clinical information regarding study participants is given in Table 1 and Table 2.

[00152] Processing of Whole Blood — For all cohorts, whole blood samples were collected in EDTA-anticoagulated vacutainers. Within 2 hours of collection, blood samples were first centrifuged at 1,000g for 10 minutes at 4°C, followed by 15,000g for 10 minutes at 4°C. Plasma was then stored at -80°C until RNA isolation.

[00153] cfRNA Isolation — Total RNA was purified from 3 ml of human plasma using the Plasma/Serum Circulating and Exosomal RNA Purification Kit (Norgen Biotek) according to the manufacturer’s protocol. To digest trace amounts of contaminating DNA, RNA was treated with 10X Baseline-ZERO DNase. DNase I-treated RNA samples were purified and further concentrated using RNA Clean & Concentrator-5 (Zymo Research) according to the manufacturer’s manual. Final eluted RNA was stored immediately at -80°C.

[00154] Library Preparation — Stranded RNA-Seq libraries were prepared using the Clontech SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (Takara Bio) according to the manufacturer’s instructions. For cDNA synthesis, option 2 (without fragmentation, starting from highly degraded RNA) was used. An input of 7 µl of each RNA sample was used to generate cDNA libraries suitable for next-generation sequencing. For addition of adapters and indexes, the SMARTer RNA Unique Dual Index Kit - 96U was employed. SMARTer RNA unique dual index 5’ and 3’ PCR primers were added to each sample to distinguish pooled libraries from each other. The amplified RNA-seq library was purified by immobilization onto the AMPure XP PCR purification system (Beckman Coulter). Library fragments originating from rRNA and mitochondrial rRNA were treated with ZapR v2 and R-Probes according to the manufacturer’s protocols. For final RNA-seq library amplification, 16 cycles of PCR were performed and, following purification of the amplified RNA-seq library, a final volume of 20 µl was eluted in Tris buffer. The amplified RNA-seq library was stored at -20°C prior to sequencing.

[00155] Sequencing Data Processing and Quality Control — Each sample was sequenced to more than 20 million paired-end reads using an Illumina NextSeq or HiSeq sequencer. Adapter sequences were trimmed using the sickle tool [60]. After trimming, the quality of the reads was checked using FastQC (v0.11.7) [61, 62] and RSeQC (v2.6.4) [63]. Reads were aligned to the hg38 human genome using the STAR aligner (v2.5.3a) [64] with the two-pass mode flag. Duplicated reads were removed using the picard tool (v1.119) [65]. Read counts for each gene were calculated using the htseq-count tool (v0.11.2) [66] in intersection-strict mode. The number of reads mapped to each gene was normalized to the total number of reads in the whole transcriptome (Reads Per Million; RPM). For each sample, exon, intron, and intergenic fractions and protein-coding fractions (CDS exons) were calculated using RSeQC [67]. Samples with an exon fraction larger than 0.35 were kept for further analysis.
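
By way of non-limiting illustration, the RPM normalization and the exon-fraction filter described above can be sketched in Python as follows. This is a minimal sketch and not the pipeline actually used: the function names, the sample dictionaries, and the toy counts are hypothetical, and the total is taken here as the sum of the per-gene counts, which is an assumption about how the total number of reads in the whole transcriptome was computed.

```python
from typing import Dict

MIN_EXON_FRACTION = 0.35  # threshold stated in the text


def reads_per_million(gene_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each gene's read count to reads per million (RPM).

    Assumption: the denominator is the sum of all per-gene counts.
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}


def passes_exon_qc(exon_fraction: float, threshold: float = MIN_EXON_FRACTION) -> bool:
    """Keep a sample only if its exon fraction is strictly larger than the threshold."""
    return exon_fraction > threshold


# Hypothetical toy data: per-gene counts and a precomputed exon fraction per sample.
samples = {
    "sample_A": {"counts": {"FGA": 300, "FGB": 100, "CP": 600}, "exon_fraction": 0.52},
    "sample_B": {"counts": {"FGA": 10, "FGB": 5, "CP": 5}, "exon_fraction": 0.20},
}

# Normalize only the samples that pass the exon-fraction filter.
rpm_by_sample = {
    name: reads_per_million(info["counts"])
    for name, info in samples.items()
    if passes_exon_qc(info["exon_fraction"])
}
```

With the toy data above, only sample_A is retained, because the exon fraction of sample_B (0.20) does not exceed 0.35.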

[00156] Identification of cfRNA Biomarkers (DESeq2, LVQ, and GO Analysis) — Two independent methods were applied to select cfRNA features for building classification models. In the first method, differentiating genes in all pairwise comparisons were identified with the R package DESeq2 (v1.24.0) using the Wald test [68]. The second method performed feature selection using the LVQ algorithm implemented in the R package caret (v6.0-84) with 10-fold cross-validation repeated 3 times [69]. The top 10 most important features were selected by ranking on the varImp parameter. GO analysis was performed on the top differentiating genes from the DESeq2 analysis (padj < 0.01) using the package topGO (v2.37.0) and a Fisher statistical test to measure significant enrichment of each Gene Ontology term [70].
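
By way of non-limiting illustration, the Fisher test used to measure enrichment of a single Gene Ontology term can be written as a one-sided hypergeometric tail probability. The Python sketch below is an assumption-laden stand-in and not the topGO implementation: the function name and the example counts are hypothetical, and it corresponds only to a simple per-term test without any handling of the GO graph structure or multiple-testing correction.

```python
from math import comb


def fisher_enrichment_p(universe: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided Fisher exact test for over-representation of one GO term.

    universe  : number of genes tested overall
    annotated : genes in the universe annotated to the GO term
    selected  : differentiating genes (e.g. those passing the padj cutoff)
    overlap   : selected genes that are annotated to the GO term
    Returns P(X >= overlap) under the hypergeometric null.
    """
    if not (0 <= overlap <= min(annotated, selected)
            and max(annotated, selected) <= universe):
        raise ValueError("inconsistent counts")
    denominator = comb(universe, selected)
    upper = min(annotated, selected)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denominator


# Hypothetical counts, for illustration only.
p_example = fisher_enrichment_p(universe=20000, annotated=150, selected=200, overlap=12)
```

A small p-value for a term suggests that the differentiating genes are annotated to that term more often than expected by chance.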

[00157] Cancer Type Classification (LDA and RF) — Two methods were used to build models for classifying cancer types using feature sets identified from pairwise comparisons with the DESeq2 and LVQ methods. LDA models were built using the R package MASS (v7.3-51.4) [71]. Random Forest models were built using the R package randomForest (v4.6-14) [72].

[00158] Statistical Consideration (Permutation Test and Leave-One-Out Cross-Validation) — To test whether the differences found in each pairwise comparison between a cancer type and healthy controls were specific, a permutation test was performed in which differential expression analysis using the DESeq2 package was carried out between two groups of randomized samples. For each pair, 500 permutations of random shuffling were performed, and the number of differentiating genes with padj < 0.01 was recorded to build a histogram and compared to the number of significant genes (padj < 0.01) obtained with the correct labeling. To determine the significance and accuracy of the classification models, the leave-one-out cross-validation (LOOCV) method was employed. Briefly, in LOOCV, the LDA or RF algorithm classified each sample based on a model trained on all other samples (the total number of samples in each pair minus the test sample). The test was repeated until every sample had been classified and cross-validated.
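
By way of non-limiting illustration, the label-permutation procedure can be sketched in Python as follows. This is a simplified sketch under stated assumptions and not the analysis actually performed: the count_hits function is a hypothetical placeholder for the DESeq2 step (it counts genes whose group means differ by more than a fixed amount rather than genes with padj < 0.01), the toy matrix is invented, and the empirical p-value is an added summary that the text does not describe.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows are genes, columns are samples


def count_hits(expr: Matrix, labels: Sequence[int], min_diff: float = 1.0) -> int:
    """Placeholder statistic standing in for the number of significant genes.

    Counts genes whose mean differs between group 0 and group 1 by more
    than min_diff. A real analysis would run DESeq2 here instead.
    """
    hits = 0
    for row in expr:
        group0 = [value for value, label in zip(row, labels) if label == 0]
        group1 = [value for value, label in zip(row, labels) if label == 1]
        if abs(mean(group1) - mean(group0)) > min_diff:
            hits += 1
    return hits


def permutation_test(expr: Matrix, labels: Sequence[int],
                     n_perm: int = 500, seed: int = 0) -> Tuple[int, List[int], float]:
    """Compare the observed hit count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = count_hits(expr, labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling
        null_counts.append(count_hits(expr, shuffled))
    # Empirical p-value with the usual +1 correction (an addition to the text).
    exceed = sum(count >= observed for count in null_counts)
    return observed, null_counts, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 3 genes, 4 control (0) and 4 case (1) samples.
expr = [
    [0, 0, 0, 0, 5, 5, 5, 5],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 9, 9, 9, 9],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed, null_counts, p_value = permutation_test(expr, labels)
```

The list null_counts plays the role of the histogram of permuted results, against which the count obtained with the correct labels is compared.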

[00159] Tissue Specificity of LVQ Feature Sets Using Publicly Available Databases — To evaluate whether the LVQ gene sets were specific to the tissue of origin (TOO), publicly available average tissue-level expression values (transcripts per million; TPM) were downloaded from the Human Protein Atlas (www.proteinatlas.org/about/download). The methodology used to normalize and calculate average expression values is described at www.proteinatlas.org/about/assays+annotation#hpa_ma. This matrix of expression values was then subset to the two gene sets (top 10 LVQ for MM versus non-cancer, and top 10 LVQ for HCC versus non-cancer), and a z-score was calculated across tissue types to evaluate in which tissue types the genes were enriched. Next, a heatmap of the transformed matrix was generated using ComplexHeatmap (v2.4.3).
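
By way of non-limiting illustration, the per-gene z-score across tissue types can be sketched in Python as follows. This is a minimal sketch, not the code actually used: the function name and the TPM values are hypothetical (they are not Human Protein Atlas data), and the use of the population standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, pstdev
from typing import Dict


def zscore_across_tissues(tpm_by_tissue: Dict[str, float]) -> Dict[str, float]:
    """Standardize one gene's expression across tissues.

    Assumption: population standard deviation; a gene with identical
    expression in every tissue gets a z-score of 0 everywhere.
    """
    values = list(tpm_by_tissue.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {tissue: 0.0 for tissue in tpm_by_tissue}
    return {tissue: (value - mu) / sigma for tissue, value in tpm_by_tissue.items()}


# Hypothetical TPM values for a single gene, for illustration only.
gene_tpm = {"liver": 90.0, "bone marrow": 0.0, "brain": 0.0}
gene_z = zscore_across_tissues(gene_tpm)

# Tissue in which the gene is most enriched relative to the others.
top_tissue = max(gene_z, key=gene_z.get)
```

Applying the same function to each gene in a set yields the transformed matrix from which a heatmap can be drawn.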

[00160] Data Availability — cfRNA sequencing data have been deposited in the Sequence Read Archive (SRA).

[00161] Code Availability — In-house scripts used in this work, including those for data processing, downstream analysis, and figure generation, are publicly available on GitHub: github.com/ohsu-cedar-comp-hub/cfRNA-seq-pipeline-Ngo-manuscript-2019

[00162] Table 18: Linear Discriminant Analysis results for MGUS versus NC.

[00163] Table 19: Linear Discriminant Analysis results for MGUS versus MM.

[00164] Table 20: Linear Discriminant Analysis results for NC versus MGUS versus MM.

[00165] Table 21: Linear Discriminant Analysis results for NC versus Cirr.

[00166] Table 22: Linear Discriminant Analysis results for Cirr. versus HCC.

[00167] Table 23: Linear Discriminant Analysis results for NC versus Cirr. versus HCC.

REFERENCES

[00168] [1] SEER Cancer Stat Facts: Liver and Intrahepatic Bile Duct Cancer. National Cancer Institute. Bethesda, MD. 2018.
[2] Howlader N, Noone AM, Krapcho M, Miller D, Brest A, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA, SEER Cancer Statistics Review, 1975-2016. National Cancer Institute. Bethesda, MD.
[3] Kyle, R.A. and S.V. Rajkumar, Management of monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM). Oncology (Williston Park), 2011. 25(7): p. 578-86.
[4] Dhodapkar, M.V., MGUS to myeloma: a mysterious gammopathy of underexplored significance. Blood, 2016. 128(23): p. 2599.
[5] Llovet, J.M., et al., Hepatocellular carcinoma. Nat Rev Dis Primers, 2016. 2: p. 16018.
[6] Fateen, W. and S.D. Ryder, Screening for hepatocellular carcinoma: patient selection and perspectives. J Hepatocell Carcinoma, 2017. 4: p. 71-79.
[7] Starr, S.P. and D. Raines, Cirrhosis: diagnosis, management, and prevention. Am Fam Physician, 2011. 84(12): p. 1353-9.
[8] Laursen, L., A preventable cancer. Nature, 2014. 516: p. S2.
[9] Goh, G.B., P.E. Chang, and C.K. Tan, Changing epidemiology of hepatocellular carcinoma in Asia. Best Pract Res Clin Gastroenterol, 2015. 29(6): p. 919-28.
[10] Wong, V.W., et al., Clinical scoring system to predict hepatocellular carcinoma in chronic hepatitis B carriers. J Clin Oncol, 2010. 28(10): p. 1660-5.
[11] Yang, H.I., et al., Risk estimation for hepatocellular carcinoma in chronic hepatitis B (REACH-B): development and validation of a predictive score. Lancet Oncol, 2011. 12(6): p. 568-74.
[12] Bai, Y. and H. Zhao, Liquid biopsy in tumors: opportunities and challenges. Ann Transl Med, 2018. 6(Suppl 1): p. S89.
[13] Palmirotta, R., et al., Liquid biopsy of cancer: a multimodal diagnostic tool in clinical oncology. Ther Adv Med Oncol, 2018. 10: p. 1758835918794630.
[14] Marrugo-Ramirez, J., M. Mir, and J. Samitier, Blood-Based Cancer Biomarkers in Liquid Biopsy: A Promising Non-Invasive Alternative to Tissue Biopsy. Int J Mol Sci, 2018. 19(10).
[15] Esposito, A., et al., Liquid biopsies for solid tumors: Understanding tumor heterogeneity and real time monitoring of early resistance to targeted therapies. Pharmacol Ther, 2016. 157: p. 120-4.
[16] Sundling, K.E. and A.C. Lowe, Circulating Tumor Cells: Overview and Opportunities in Cytology. Adv Anat Pathol, 2019. 26(1): p. 56-63.
[17] Millner, L.M., M.W. Linder, and R. Valdes, Jr., Circulating tumor cells: a review of present methods and the need to identify heterogeneous phenotypes. Ann Clin Lab Sci, 2013. 43(3): p. 295-304.
[18] Thiele, J.A., et al., Circulating Tumor Cells: Fluid Surrogates of Solid Tumors. Annual Review of Pathology: Mechanisms of Disease, 2017. 12(1): p. 419-447.
[19] Liu, Y. and X. Cao, The origin and function of tumor-associated macrophages. Cellular And Molecular Immunology, 2014. 12: p. 1.
[20] Adams, D.L., et al., Circulating giant macrophages as a potential biomarker of solid tumors. Proceedings of the National Academy of Sciences, 2014. 111(9): p. 3514.
[21] Gast, C.E., et al., Cell fusion potentiates tumor heterogeneity and reveals circulating hybrid cells that correlate with stage and survival. Science Advances, 2018. 4(9): p. eaat7828.
[22] Newman, A.M., et al., An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med, 2014. 20(5): p. 548-54.
[23] Corcoran, R.B. and B.A. Chabner, Application of Cell-free DNA Analysis to Cancer Treatment. N Engl J Med, 2018. 379(18): p. 1754-1765.
[24] Abbosh, C., et al., Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature, 2017. 545(7655): p. 446-451.
[25] Best, M.G., et al., RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell, 2015. 28(5): p. 666-676.
[26] Best, M.G., P. Wesseling, and T. Wurdinger, Tumor-Educated Platelets as a Noninvasive Biomarker Source for Cancer Detection and Progression Monitoring. Cancer Res, 2018. 78(13): p. 3407-3412.
[27] In 't Veld, S.G.J.G. and T. Wurdinger, Tumor-educated platelets. Blood, 2019: p. blood-2018-12-852830.
[28] Cohen, J.D., et al., Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science, 2018. 359(6378): p. 926.
[29] Abbosh, C., N.J. Birkbak, and C. Swanton, Early stage NSCLC - challenges to implementing ctDNA-based screening and MRD detection. Nat Rev Clin Oncol, 2018. 15(9): p. 577-586.
[30] Haque, I.S. and O. Elemento, Challenges in Using ctDNA to Achieve Early Detection of Cancer. bioRxiv, 2017: p. 237578.
[31] Salta, S., et al., A DNA Methylation-Based Test for Breast Cancer Detection in Circulating Cell-Free DNA. J Clin Med, 2018. 7(11).
[32] Xu, R.-h., et al., Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nature Materials, 2017. 16: p. 1155.
[33] Song, C.-X., et al., 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Research, 2017. 27: p. 1231.
[34] Shen, S.Y., et al., Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature, 2018. 563(7732): p. 579-583.
[35] Moss, J., et al., Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nature Communications, 2018. 9(1): p. 5068.
[36] Cristiano, S., et al., Genome-wide cell-free DNA fragmentation in patients with cancer. Nature, 2019. 570(7761): p. 385-389.
[37] Liu, M.C., et al., Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol, 2020. 31(6): p. 745-759.
[38] Chen, X., et al., Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nature Communications, 2020. 11(1): p. 3475.
[39] Gemmell, C.H., Activation of platelets by in vitro whole blood contact with materials: increases in microparticle, procoagulant activity, and soluble P-selectin blood levels. J Biomater Sci Polym Ed, 2001. 12(8): p. 933-43.
[40] Heitzer, E., et al., Current and future perspectives of liquid biopsies in genomics-driven oncology. Nature Reviews Genetics, 2019. 20(2): p. 71-88.
[41] Wan, J.C.M., et al., Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nature Reviews Cancer, 2017. 17: p. 223.
[42] Koh, W., et al., Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proceedings of the National Academy of Sciences, 2014: p. 201405528.
[43] Pan, W., et al., Simultaneously Monitoring Immune Response and Microbial Infections During Pregnancy through Plasma cfRNA Sequencing. Clinical Chemistry, 2016: p. clinchem.2017.273888.
[44] Ngo, T.T.M., et al., Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science, 2018. 360(6393): p. 1133.
[45] Larson, M.H., et al., A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nature Communications, 2021. 12(1): p. 2357.
[46] Ibarra, A., et al., Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nature Communications, 2020. 11(1): p. 400.
[47] The Genotype-Tissue Expression (GTEx) project. Nat Genet, 2013. 45(6): p. 580-5.
[48] The Human Protein Atlas.
[49] Sardar, H.S. and S.P. Gilbert, Microtubule capture by mitotic kinesin centromere protein E (CENP-E). J Biol Chem, 2012. 287(30): p. 24894-904.
[50] Uhlen, M., et al., Proteomics. Tissue-based map of the human proteome. Science, 2015. 347(6220): p. 1260419.
[51] Fry, A.M., The Nek2 protein kinase: a novel regulator of centrosome structure. Oncogene, 2002. 21(40): p. 6184-6194.
[52] Mills, C.A., et al., Nucleolar and spindle-associated protein 1 (NUSAP1) interacts with a SUMO E3 ligase complex during chromosome segregation. J Biol Chem, 2017. 292(42): p. 17178-17189.
[53] Srivastava, R.A., N. Bhasin, and N. Srivastava, Apolipoprotein E gene expression in various tissues of mouse and regulation by estrogen. Biochem Mol Biol Int, 1996. 38(1): p. 91-101.
[54] Jia, Q., et al., Association between complement C3 and prevalence of fatty liver disease in an adult population: a cross-sectional study from the Tianjin Chronic Low-Grade Systemic Inflammation and Health (TCLSIHealth) cohort study. PLoS One, 2015. 10(4): p. e0122026.
[55] Zeng, D.W., et al., Serum ceruloplasmin levels correlate negatively with liver fibrosis in males with chronic hepatitis B: a new noninvasive model for predicting liver fibrosis in HBV-related liver disease. PLoS One, 2013. 8(10): p. e77942.
[56] Waterham, H.R., et al., Mutations in the 3beta-hydroxysterol Delta24-reductase gene cause desmosterolosis, an autosomal recessive disorder of cholesterol biosynthesis. Am J Hum Genet, 2001. 69(4): p. 685-94.
[57] Fort, A., et al., A liver enhancer in the fibrinogen gene cluster. Blood, 2011. 117(1): p. 276-82.
[58] Gram, J., et al., Plasma histidine-rich glycoprotein and plasminogen in patients with liver disease. Thromb Res, 1985. 39(4): p. 411-7.
[59] Goodman, Z.D., Liver Biopsy Diagnosis of Cirrhosis, in Diagnostic Methods for Cirrhosis and Portal Hypertension, A. Berzigotti and J. Bosch, Editors. 2018, Springer International Publishing: Cham. p. 17-31.
[60] Joshi NA, Fass JN, Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. Available at github.com/najoshi/sickle, 2011.
[61] Leggett, R.M., et al., Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics. Frontiers in genetics, 2013. 4: p. 288-288.
[62] Andrews, S., FastQC: a quality control tool for high throughput sequence data. 2010.
[63] Wang, L., S. Wang, and W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics, 2012. 28(16): p. 2184-5.
[64] Dobin, A., et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2013. 29(1): p. 15-21.
[65] Van der Auwera, G.A., et al., From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics, 2013. 43: p. 11.10.1-33.
[66] Anders, S., P.T. Pyl, and W. Huber, HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics, 2015. 31(2): p. 166-9.
[67] Wang, L., S. Wang, and W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics, 2012. 28(16): p. 2184-2185.
[68] Love, M.I., W. Huber, and S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol, 2014. 15(12): p. 550.
[69] Kuhn, M., Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 2008. 28(5).
[70] Alexa A, Rahnenfuhrer J, topGO: Enrichment Analysis for Gene Ontology. R package version 2.36.0., 2019.
[71] Venables, W.N. and B.D. Ripley, Modern Applied Statistics with S. 2002.
[72] Liaw, A. and M. Wiener, Classification and Regression by randomForest. R News, 2002. 2(3): p. 18-22.

[00169] It will be understood by those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.

Claims

1. A method for detecting cancer or a predisposition for cancer in a biological sample obtained from a subject, the method comprising:

(a) analyzing a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample; and

(b) performing a differential expression analysis comparing the level of each cfRNA biomarker selected in Step (a) to a corresponding control value (CV); in which differential expression shown by the differential expression analysis between the cfRNA biomarkers selected in Step (a) and corresponding CVs indicates cancer or a predisposition for cancer in the subject.

2. The method of claim 1, wherein one or more of the cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer.

3. The method of claim 1, in which one or more of the cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM).

4. The method of claim 3, wherein one or more of CENPE, HBG1, HBG2, NUSAP1, or any combination thereof are selected to indicate MM.

5. The method of claim 4, wherein CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate MM.

6. The method of claim 1, in which one or more of the cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS).

7. The method of claim 6, wherein FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, and CACNA1A are selected to indicate MGUS.

8. The method of claim 1, in which one or more of the cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer.

9. The method of claim 1, in which one or more of the cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, and ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC).

10. The method of claim 9, wherein one or more of the cfRNA biomarkers C3, CP, FGA, FGB, IFITM3, or any combination thereof are selected to indicate HCC.

11. The method of claim 10, wherein C3, CP, FGA, FGB, and IFITM3 are selected to indicate HCC.

12. The method of claim 1, in which one or more of the cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate cirrhosis.

13. The method of claim 12, wherein ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH are selected to indicate cirrhosis.

14. The method of claim 1, in which the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.

15. The method of claim 1, in which the level of the one or more cfRNA biomarkers is measured by a method selected from the group of: a polymerase chain reaction (PCR), a quantitative PCR (qPCR), a reverse transcription PCR (rt-PCR), a complementary DNA (cDNA) synthesis, or a real-time PCR, or any combination thereof.

16. The method of claim 15, wherein the level of the one or more cfRNA biomarkers is measured by: a. performing an RT-PCR reaction comprising primer pairs for amplifying two or more of the cfRNA biomarkers, producing a pre-amplified pool of cDNAs; b. digesting the pre-amplified pool of cDNAs to remove single-stranded nucleic acids; and c. performing two or more qPCR reactions each comprising a single primer pair for amplifying a single cfRNA biomarker.

17. The method of Claim 1, further comprising analyzing a level of one or more of GAPDH, ACTB, or a combination thereof.

18. The method of claim 17, wherein the method uses the primer pair of SEQ ID NO: 1 and SEQ ID NO: 2, the primer pair of SEQ ID NO: 3 and SEQ ID NO: 4, or a combination thereof.

19. The method of claim 1, in which the method uses one or more primer pairs selected from the primer pair of SEQ ID NO: 23 and SEQ ID NO: 24, the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32, the primer pair of SEQ ID NO: 33 and SEQ ID NO: 34, the primer pair of SEQ ID NO: 35 and SEQ ID NO: 36, the primer pair of SEQ ID NO: 37 and SEQ ID NO: 38, the primer pair of SEQ ID NO: 39 and SEQ ID NO: 40, the primer pair of SEQ ID NO: 41 and SEQ ID NO: 42, or any combination thereof.

20. The method of claim 1, in which the one or more primer pairs comprise the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, and the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32.

21. The method of claim 1, in which the method uses one or more primer pairs selected from the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14, the primer pair of SEQ ID NO: 15 and SEQ ID NO: 16, the primer pair of SEQ ID NO: 17 and SEQ ID NO: 18, the primer pair of SEQ ID NO: 19 and SEQ ID NO: 20, the primer pair of SEQ ID NO: 21 and SEQ ID NO: 22, or any combination thereof.

22. The method of claim 1, in which the one or more primer pairs comprise the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, and the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14.

23. The method of claim 1, in which the biological sample is selected from the group of: a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a cerebral spinal fluid sample, a tissue sample, or a cell sample.