WO2018081465A1 - Systems and methods for characterizing nucleic acid in a biological sample - Google Patents

Systems and methods for characterizing nucleic acid in a biological sample

Info

Publication number
WO2018081465A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
cancer
nucleic acid
copy number
copies
Prior art date
Application number
PCT/US2017/058599
Other languages
French (fr)
Inventor
Nilesh Ganeshbhai DHARAJIYA
Akhil Rajput
Original Assignee
Pathway Genomics Corporation
Priority date
Filing date
Publication date
Application filed by Pathway Genomics Corporation
Publication of WO2018081465A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • This disclosure generally relates to systems and methods for characterizing nucleic acid. More specifically, the present disclosure relates to cancer diagnostic systems and methods using nucleic acid from a biological sample.
  • Cancer is a disease characterized by the aberrant and uncontrolled growth of cells in the body. Cancer is a leading cause of death worldwide and, in the U.S., cancer is second only to heart disease as a leading cause of death.
  • cancer detection often involves histological analysis of surgically-excised tissues by a trained medical professional. The expense associated with such an analysis may further inhibit early detection, as a patient is deterred by the cost of testing.
  • Existing cancer screening techniques include whole body (or part body) scanning through use of computed tomography, magnetic resonance imaging, or similar imaging techniques to locate masses of (often over-proliferating, under-differentiated, and/or metastatic) cells (e.g., tumors) and are typically followed by a biopsy of the cell mass and diagnostic testing to determine whether the cell mass is benign or malignant. Other traditional diagnostic tests rely on the presence of a subset of cancer diagnostic markers. Cancer markers may be proteinaceous or may present as mutations within the DNA of the cancer cells. Protein-based diagnostic markers may be identified through antibody staining procedures of tumor cross-sections, requiring a tissue-specific sample. DNA-based markers have historically comprised small changes in specific genes.
  • cancer diagnostic markers are based on single nucleotide polymorphisms (SNPs) in a collection of genes that have each, or collectively, been correlated with cancer. Determining the presence of a SNP in a sample also typically requires a tissue-specific sample. However, even in cases where a cancerous tissue sample can be collected non-invasively and/or from a systemic sample (e.g., a (random) sampling of blood or other bodily fluid), the diagnostic potential is limited to only those cancers that have been correlated with a unique SNP, which are few.
  • SNPs single nucleotide polymorphisms
  • cancer detection through cancer marker identification requires foreknowledge of the origin of the cancerous cell and/or the actual presence of cancerous cells from which the markers originate, making it difficult, if not improbable, to identify cancer from a (random) sampling of bodily tissues and/or fluids.
  • Embodiments of the present disclosure comprise methods and systems for detecting cancer.
  • Inventive methods can include identifying, diagnosing, classifying, distinguishing, and/or staging the detected cancer, and systems for performing the same.
  • One or more embodiments can include (obtaining) a biological sample.
  • the biological sample can be (obtained) from a patient and/or can comprise or contain nucleic acid, such as (genomic) DNA or RNA.
  • One or more embodiments can include: (preparing) a sequencing library, such as a next generation sequencing library, of the nucleic acid; sequencing at least a portion (e.g., the entirety) of the prepared sequencing library with a predetermined sequencing coverage; measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library; comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number; and/or diagnosing a cancer or cancer condition based on the variability or similarity between the measured number of copies and the standard copy number.
  • the predetermined sequencing coverage can be less than or equal to about 10X, 7X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, or 0.5X.
  • One or more embodiments can include providing the patient with the diagnosis of cancer or cancer condition, optionally in the form of a report.
  • One or more embodiments can include prescribing further and/or confirmatory testing, prescribing a treatment protocol, and/or administering a treatment based on the diagnosis or based on the variability or similarity between the measured number of copies and the standard copy number.
  • the treatment or treatment protocol can include one or more dietary or lifestyle components or alterations.
  • the treatment or treatment protocol can include one or more supplement or pharmaceutical compositions.
  • a method for detecting cancer in a biological sample includes receiving a biological sample comprising nucleic acid.
  • the method further includes preparing a nucleic acid library of the nucleic acid in the biological sample.
  • the method also includes sequencing at least a portion of the prepared nucleic acid library and measuring the number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library.
  • the method further includes comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number.
  • the comparing step of the previously recited method comprises determining a tissue of origin and/or a stage of cancer based on the similarity of the measured copy number of nucleic acid from the biological sample to a standard copy number.
  • the standard copy number may comprise a copy number of nucleic acid or nucleic acid sequence(s) and/or a copy number profile of nucleic acid or nucleic acid sequence(s) for a wild-type cell or sample.
  • the standard copy number may comprise a copy number of nucleic acid or nucleic acid sequence(s) and/or a copy number profile of nucleic acid or nucleic acid sequence(s) for (each of) one or more cancer or cancerous cells, cell types, or samples
  • the method comprises isolating cell free DNA (cfDNA) from the liquid biopsy, the cfDNA comprising one or more genetic elements.
  • the method also comprises determining a copy number of the one or more genetic elements and comparing the determined copy number to one or more known copy number standards for ctDNA.
  • methods for detecting ctDNA in a liquid biopsy can (further) comprise assembling a genetic profile of the cfDNA, wherein the genetic profile comprises a representation of the relative abundance of the one or more genetic elements in the cfDNA.
  • the comparing step in methods for detecting ctDNA in a liquid biopsy can comprise detecting the presence of ctDNA in the liquid biopsy (e.g., by measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA).
  • the comparing step can also, or alternatively, comprise detecting the presence of cancer or cancerous cells (e.g., by detecting ctDNA in the liquid biopsy and/or measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA).
  • the method can comprise determining a methylation pattern of the cfDNA and/or the one or more genetic elements.
  • the aforementioned methods may further comprise determining copy number alterations, gene expression, a tissue of origin and/or a stage of cancer based on the detected ctDNA and/or the methylation pattern of the cfDNA and/or the one or more genetic elements.
  • One or more embodiments include a (computer) system.
  • the system can be configured for engineering compliant communications.
  • the system can include one or more processors and one or more computer-readable storage media.
  • the computer-readable storage media can have stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to analyze a nucleic acid processed from a biological sample to determine the presence of cancer in the biological sample.
  • the computer-executable instructions can include instructions that are executable to cause the computer system to perform one or more of the following: receive sequence data, the sequence data comprising a plurality of sequence reads derived from the nucleic acid; parse the sequence data to determine a number of copies of at least one nucleic acid sequence included in the sequence data; analyze the parsed number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the parsed number of copies and the standard copy number; and, based on the determined variability or similarity, display a result at a user interface.
  • the result can be a copy number variation (CNV) profile result, a diagnosed cancer or cancer condition, and/or a report comprising a diagnosis.
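The receive, parse, analyze, and display steps listed above can be pictured with a minimal Python sketch. Everything specific in it is an assumption made for illustration and does not come from the disclosure: reads are reduced to (chromosome, position) pairs, the genome is divided into fixed 1 Mb bins, the median bin is taken to represent two copies, and a fixed tolerance separates gains and losses from agreement with the standard.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 1_000_000  # hypothetical 1 Mb bins; not a value taken from the disclosure


def count_reads_per_bin(reads, bin_size=BIN_SIZE):
    """Parse step: count reads per (chromosome, bin index).

    `reads` is an iterable of (chromosome, start_position) tuples.
    """
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, pos // bin_size)] += 1
    return counts


def estimate_copy_number(counts, baseline_copies=2):
    """Scale bin counts so that the median bin equals `baseline_copies`."""
    med = median(counts.values())
    return {b: baseline_copies * c / med for b, c in counts.items()}


def compare_to_standard(measured, standard, tolerance=0.5, default_copies=2):
    """Analyze step: flag bins that differ from the standard copy number.

    Returns a dict mapping each deviating bin to "gain" or "loss".
    """
    calls = {}
    for b, copies in measured.items():
        diff = copies - standard.get(b, default_copies)
        if abs(diff) > tolerance:
            calls[b] = "gain" if diff > 0 else "loss"
    return calls


def report(calls):
    """Display step: render the result as plain text."""
    if not calls:
        return "No copy number differences from the standard."
    return "\n".join(f"{chrom} bin {idx}: {kind}"
                     for (chrom, idx), kind in sorted(calls.items()))
```

A production pipeline would also correct for GC content, mappability, and sample-specific noise before making any call; the sketch leaves all of that out.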
  • CNV copy number variation
  • Embodiments of the present disclosure provide technical solutions to the aforementioned technical problems associated with a non-invasive method for detecting cancer, at least by providing systems and methods for non-invasive detection of cancer (or ctDNA) in blood or other sampling. Further, embodiments of the present disclosure provide a technical solution to the technical problem associated with a lack of early detection methods for diagnosing cancer, at least by providing systems and methods for detection of cancer (or ctDNA) in early cancer stage(s). Further, embodiments of the present disclosure provide a technical solution to the technical problem associated with reducing the amount of time from biological sample receipt to cancer prognosis/diagnosis, at least by providing systems and methods for rapid detection of cancer (or ctDNA).
  • embodiments of the present disclosure provide a technical solution to the technical problem associated with reliable cancer prognoses/diagnoses that indicate tissue specificity and/or stage severity of cancer from a non-localized tissue sample (e.g., a fluid sample), at least by providing systems and methods for tissue specificity and/or stage severity detection of cancer.
  • Figure 3 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) the “Primary” breast tumor of Figure 1 and “Normal” blood derived cells, (B) “Metastatic” breast tumor and the “Normal” blood derived cells, (C) the “Metastatic” breast tumor and the “Normal” breast tissue of Figure 1, (D) the “Metastatic” breast tumor and the “Primary” breast tumor of Figure 1, and (E) the “Normal” blood derived cells and the “Normal” breast tissue of Figure 1;
  • Figure 5 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) brain “Primary” tumor and “Normal” brain tissue adjacent to Primary tumor, (B) “Recurrent” primary brain tumor and the “Normal” tissue adjacent to Primary tumor, and (C) the “Recurrent” primary brain tumor and the “Primary” brain tumor;
  • Figure 6 illustrates a frequency plot depicting genome-wide CNV profile results for the “Breast” Primary tumor of Figure 1, the “Nervous System” Primary tumor of Figure 4, and the “Brain” Primary tumor of Figure 5;
  • Figure 7 illustrates a series of frequency plots depicting genome-wide CNV profile results for bladder, blood, brain, breast, cervix, colorectal, head and neck, kidney, liver, lung, ovary, pancreas, prostate, skin, stomach, and uterus tumors;
  • Figure 8 illustrates a frequency plot depicting genome-wide CNV profile results for “Bone” tumor and “Nervous System” tumor;
  • Figure 9 illustrates differential thresholds for unique CNV events at 50%, 55%, 60%, 65% and 70% for the genome-wide CNV profile results of Figure 8;
  • Figure 10 illustrates a frequency plot depicting genome-wide CNV profile results for “Stage 1” Nervous System tumors and “Stage 4” Nervous System tumor;
  • Figure 11 illustrates differential thresholds for unique CNV events at 50%
  • Figure 12 illustrates a series of frequency plots depicting genome-wide CNV results at 1X, 3X, 5X, 7X, and 10X sequencing coverage as compared to CNV profile results for colorectal cancer;
  • Figure 13 illustrates a schematic representation of a basic computing system according to one or more embodiments of the present disclosure.
  • systems also contemplates devices, apparatus, compositions, assemblies, kits, etc., and vice versa.
  • method also contemplates processes, procedures, steps, etc., and vice versa.
  • products also contemplates devices, apparatus, compositions, assemblies, kits, etc., and vice versa, and so forth.
  • the words “can” and “may” are used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must).
  • the terms “including,” “having,” “involving,” “containing,” “characterized by,” variants thereof (e.g., “includes,” “has,” and “involves,” “contains,” etc.), and similar terms as used herein, including the claims, shall be inclusive and/or open-ended, shall have the same meaning as the word “comprising” and variants thereof (e.g., “comprise” and “comprises”), and do not exclude additional, un-recited elements or method steps, illustratively.
  • nucleic acid contemplates and specifically discloses one, as well as two or more nucleic acids.
  • use of a plural referent does not necessarily require a plurality of such referents, but contemplates, includes, and specifically discloses one, as well as two or more of such referents, unless the context clearly dictates otherwise.
  • disclosure of an illustrative measurement that is less than or equal to about 10 units or between 0 and 10 units includes, illustratively, a specific disclosure of: (i) a measurement of 9 units, 5 units, 1 unit, or any other value between 0 and 10 units, including 0 units and/or 10 units; and/or (ii) a measurement between 9 units and 1 unit, between 8 units and 2 units, between 6 units and 4 units, and/or any other range of values between 0 and 10 units.
  • cancer refers to an abnormal, typically uncontrolled, growth of cells.
  • a “cancerous cell” as used herein comprises a malignant cell having an abnormal, typically uncontrolled, growth.
  • cancer is an umbrella term encompassing a plurality of different distinctive diseases characterized by malignant cells growing in a typically uncontrolled manner.
  • circulating tumor DNA or “ctDNA” as used herein should be understood in its broadest sense to include cell free DNA circulating in the bloodstream that originated from a tumor cell.
  • ctDNA refers to cell free DNA circulating in the blood stream that originated from a cancerous tumor cell.
  • copy number variation comprises any of one or more additions, duplications, insertions, deletions, etc. of genomic content at and around the genome, including within one or a plurality of distinct sites on any number of chromosomes.
  • the sites of copy number variation comprise genes (exon and intron regions inclusive), promoter regions, intergenic regions, and may comprise any genomic location producing any of siRNA, miRNA, or other interfering RNA species.
  • copy number variation includes any of one or more additions, duplications, insertions, deletions, etc. of genomic content of any size and of any type.
  • neoplasm refers to new, uncontrolled growth of cells where the growth is not under physiologic control.
  • a “neoplastic cell” as used herein comprises any of the cells of a neoplasm that are experiencing uncontrolled growth that is not under physiologic control.
  • a neoplasm can be subclassified as either benign or malignant.
  • tissue refers to a biological sample derived from a patient's body and includes solid tissue and liquid tissue.
  • patient generally refers to any animal under the care of a physician, as that term is defined herein, with particular reference to humans under the care of a medical doctor or other relevant medical professional.
  • the term “physician” as used herein generally refers to a medical doctor. This term may, when contextually appropriate, include any medical professional, including an oncologist, a surgeon, or any licensed medical professional, such as a physician's assistant, a nurse, a phlebotomist, a veterinarian, etc.
  • tumor maintains its traditional understanding as any form of swelling or a growth or enlargement.
  • a tumor may be subclassified as benign, precancerous, or cancerous.
  • a tumor may not be neoplastic, making some neoplasms, such as leukemia and carcinoma, fall outside the scope of "tumors" as the term is defined herein. Nonetheless, when contextually appropriate, a “tumor” may be synonymous for a neoplasm, and further, a malignant neoplasm is synonymous with a cancerous tumor, as those terms are defined herein.
  • Cancers, by definition, comprise neoplastic cells (if massed, they are often referred to as a cancerous tumor), and by their very nature, neoplastic cells comprise an unstable genome. This instability may present as one or more duplications, insertions, deletions, etc. of genomic content at and around the genome, including within one or a plurality of distinct sites on any number of chromosomes.
  • cfDNA cell free DNA
  • the cell free DNA may comprise circulating tumor DNA (ctDNA).
  • cfDNA can be isolated from the plasma portion of a blood sample, which lacks nucleated cells. If the cfDNA comprises ctDNA, this type of sampling provides a preferable non-invasive site for screening for the presence of cancer.
  • ctDNA is isolated from a plasma sample followed by deep sequencing and/or directed sequencing of specific target loci known to be correlated with one or more cancers.
  • these target loci represent genetic mutations (e.g., SNPs) in a gene and may be associated with one or more physiological, cancerous effects in the cell.
  • This technique is limited to the number of cancer-specific mutations (e.g., SNPs) identified for a specific cancer type, and some cancers lack consistent genetic mutations to accurately and consistently function as a diagnostic marker.
  • the foregoing test may be prognostic/diagnostic if one or more mutations are indicated as present. However, if none of the tested mutations are present, the sample may yet represent an unknown cancer type; that is, a negative result on the foregoing test does not rule out a prognosis/diagnosis of cancer.
  • nucleic acid e.g., DNA, cfDNA, ctDNA, RNA, etc.
  • nucleic acid can be isolated from a biological sample.
  • cellular (i.e., nuclear) DNA can be isolated from (primary) tumor or other tissue.
  • cfDNA such as ctDNA
  • a biological sample such as a liquid biopsy (e.g., blood, plasma, serum, mucus, saliva, sputum, spinal fluid, etc.).
  • At least a portion of the isolated nucleic acid can be sequenced.
  • a nucleic acid library can be prepared of or from the isolated nucleic acid. At least a portion of the prepared nucleic acid library can be sequenced. The sequenced library can then be searched for relevant sequencing information.
  • sequenced nucleic acid can be searched or probed for copy number variations, which (as described in more detail below) can be used to identify one or more of a cancer type, a tissue of origin of the nucleic acid (e.g., cfDNA) and/or cancer, a stage/severity of the cancer, etc., whether alone or in combination with other approaches such as, for example, nucleic acid methylation sequencing.
  • Embodiments can also include measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library and/or comparing the measured number of copies with a standard copy number or CNV profile for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number.
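One way to picture "similarity between the measured number of copies and the standard copy number" is to score a measured per-region profile against several standard profiles and keep the best match. The sketch below uses Pearson correlation, which is a choice made here for illustration; the text does not name a similarity measure. The profile labels and values in the usage note are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has no variance (e.g. a flat profile).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def closest_standard(measured, standards):
    """Score `measured` against each standard profile; return (best_label, scores)."""
    scores = {label: pearson(measured, profile)
              for label, profile in standards.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with hypothetical standards `{"type_a": [2, 3, 2, 1, 2], "type_b": [2, 2, 4, 2, 1]}` and a measured profile `[2.1, 2.0, 3.8, 2.0, 1.1]`, `closest_standard` selects `"type_b"`. A flat wild-type profile has zero variance, so correlation cannot score it; a distance-based measure would be needed to compare against such a standard.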
  • the detection size of copy number variation is preferably between about 1 Mb and about 20 Mb but may, in some embodiments, comprise any size and combination of copy number variations.
  • the detected size of copy number variation may range between about 10 kb - 100 Mb, 100 kb - 50 Mb, 500 kb - 50 Mb, 500 kb - 25 Mb, 750 kb - 25 Mb, 1 Mb - 25 Mb, or any other range where the lower end value and the higher end value are at least one of (or at least greater than for the lower end value or at least less than for the upper end value): 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb, 1 Mb, 2 Mb, 3 Mb, 4 Mb, 5 Mb, 6 Mb, 7 Mb, 8 Mb, 9 Mb, 10 Mb, 11 Mb, 12 Mb,
  • sequence coverage is preferably between about 0.05X - 20X but may, in some embodiments, be less than or equal to any of about: 30X, 20X, 15X, 10X, 9X, 8X, 7X, 6X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, 0.9X, 0.8X, 0.75X, 0.7X, 0.6X, 0.5X, 0.4X, 0.3X, 0.25X, 0.2X, 0.1X, 0.05X, 0.025X, 0.01X, 0.005X, 0.0025X, or 0.0001X coverage, or any possible coverage value or range of coverage created between any two of the foregoing points (e.g., between about 0.0001X - 10X, 0.1X -
  • sequencing can be accomplished via single end sequencing and/or paired end sequencing, as known in the art.
  • sequencing of a defined region can be accomplished with relatively few, but longer (e.g., 100-200 base pair) reads or relatively many, but shorter (e.g., 25-100 base pair) reads spanning a specific or specified region.
  • approximately 1X coverage of a 250 base region can be accomplished with, for example, 2 fragments of 125 bases each, 5 fragments of 50 bases each, 10 fragments of 25 bases each, and other combinations as understood by those skilled in the art.
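The arithmetic behind these combinations is that average coverage equals the total number of sequenced bases divided by the length of the region. A short Python sketch of that relation follows; the function names are illustrative, and the calculation assumes every read falls entirely within the region.

```python
import math


def coverage(num_reads, read_length, region_length):
    """Average coverage: total sequenced bases divided by region length."""
    return num_reads * read_length / region_length


def reads_needed(target_coverage, region_length, read_length):
    """Smallest whole number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * region_length / read_length)
```

Under this relation, 2 reads of 125 bases and 5 reads of 50 bases each give 1X over a 250 base region, and `reads_needed(1.0, 250, 25)` shows that reads of 25 bases require 10 reads to reach the same coverage.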
  • sequencing can be performed using reads of less than about 200 bp, 180 bp, 170 bp, 160 bp, 150 bp, 140 bp, 130 bp, 120 bp, 110 bp, 100 bp, 90 bp, 80 bp, 70 bp, 60 bp, 50 bp, 40 bp, 30 bp, 20 bp, 10 bp, or any value or range of values therebetween.
  • sequencing can be performed using reads of between about 10-200 bp, 25-150 bp, 25-100 bp, 25-50 bp, 50-150 bp, 50-100 bp, 100-150 bp, and so forth.
  • sequencing can be performed using reads of about 20-30 bp, preferably about 24-28 bp, more preferably about 25-26 bp.
  • One advantage of these smaller reads or ranges of reads is that gene or sequence copy numbers and/or locations can be more robustly identified, measured, and/or determined.
  • some embodiments of the present disclosure can augment the sequence coverage by adjusting the total number of samples run on a single flow cell or similar sequencing input.
  • a flow cell having a maximum read output of 120 Gb may support 40 samples (which in some embodiments have a normalized input and, in the case of the human genome, comprise ~3 Gb) at 1X coverage.
  • the same flow cell could support 80 samples at 0.5X coverage or 20 samples at 2X coverage.
  • the sequence coverage may be adjusted depending on how many samples are being processed and the capacity of the flow cell (or similar input device for sequencing) being used.
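The trade-off between sample count and coverage follows from dividing the flow cell output by the bases needed per sample (genome size times coverage). The sketch below reproduces the 120 Gb and ~3 Gb figures from the text; the function names are illustrative, and the calculation assumes output is shared evenly across samples with no loss to adapters, duplicates, or unmapped reads.

```python
import math


def samples_per_flow_cell(capacity_gb, genome_size_gb, coverage):
    """Whole number of samples a flow cell supports at the given coverage."""
    return math.floor(capacity_gb / (genome_size_gb * coverage))


def coverage_per_sample(capacity_gb, genome_size_gb, num_samples):
    """Average coverage each sample receives when output is split evenly."""
    return capacity_gb / (genome_size_gb * num_samples)
```

With a 120 Gb flow cell and a 3 Gb genome, `samples_per_flow_cell` gives 40 samples at 1X, 80 at 0.5X, and 20 at 2X, matching the figures above.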
  • lower sequencing coverages e.g., 0.5X, 1X, 3X, 5X, 7X, etc.
  • sequencing at a higher sequencing coverage e.g., 10X, 7X, 5X, etc.
  • CNV profiles can be obtained, for example, by analysis performed using Nexus Copy Number software.
  • Certain advantages are associated with the inventive methods and systems provided herein. For example, because the samples can be acquired in a non-invasive manner and the detection of cancer types through copy number variation patterns is robust, it is, in some embodiments, possible to detect cancer at early stages and to be able to distinguish early stage from late stage cancer.
  • at least some embodiments of the present disclosure are directed to systems and methods for detecting cancer and for differentiating between early-stage and late-stage cancer. In doing so, tangible benefits are afforded to the patient. Specialized and/or individualized treatment regimens can be prescribed to the patient based on the detection and determination of cancer type or stage. Having a better understanding and a finer precision for detecting cancer, as provided by implementations disclosed herein, can increase patient survival rates, decrease the amount or time to effectively treat, and/or reduce the incidence of misdiagnosis.
  • the defined set of rules or criteria associated with the disclosed methods provides many benefits and improvements. For example, many more cancers are detectable using the methods disclosed herein than through other methods known in the art. This expands the number and types of cancers that would otherwise have gone undetected or misdiagnosed. As an additional example, performing low sequence coverage reads for each sample provides additional bandwidth on a single machine. This increases the efficiency of the device and allows more work to be done in less time (e.g., more samples can be processed in a single run on the machine and/or more samples can be processed in a given period of time). Further, implementations of the present disclosure allow for fewer resources to be consumed per sample, which allows for more efficient use of resources and/or less money spent per sample.
  • one or more additional tools may be provided to increase the predictability and/or consistency of the results.
  • use of antibody-based assays and/or nucleic acid methylation sequencing techniques e.g., bisulfite sequencing
  • the combination thereof can provide an unexpected result of increased predictive capacity for the cancer type, tissue of origin, and/or stage/severity of cancer that, in some embodiments, may not be possible through inspection and/or analysis of either alone.
  • systems and methods comprise a sequencing step and/or the cfDNA/ctDNA is sequenced. It should be appreciated that a variety of sequencing techniques fall within the scope of the present disclosure and may be adopted for use in one or more of the disclosed systems and/or methods.
  • sequencing comprises the selective incorporation of chain-terminating dideoxynucleotides that are modified (e.g., fluorescent and/or radioactive) for reporting the site of incorporation.
  • sequencing comprises Sanger sequencing.
  • next generation sequencing (NGS)
  • NGS refers to non-Sanger-based, high-throughput DNA sequencing technologies. Through NGS, millions or even billions of DNA strands can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes.
  • NGS is the catch-all term used to describe a number of different modern sequencing technologies or platforms including, for example, pyrosequencing, sequencing by synthesis, sequencing by ligation, ion semiconductor sequencing, and others as known in the art.
  • NGS generally allows sequencing of large amounts of DNA and RNA much more quickly and affordably than Sanger sequencing.
  • vast numbers of short reads are sequenced in a single stroke. To do this, firstly the input sample can be cleaved into short sections. The length of these sections depends on the particular sequencing machinery used.
  • Illustrative examples of specific NGS technologies include, for example, Illumina® (Solexa) sequencing, Roche 454™ sequencing, Ion Torrent™: Proton / PGM sequencing, SOLiD sequencing, and so forth.
  • the terminators are removed, allowing the next base to be added, and the fluorescent signal is removed, preventing the signal from contaminating the next image.
  • the process is repeated, adding one nucleotide at a time and imaging in between.
  • Computers are then used to detect the base at each site in each image and these are used to construct a sequence. All of the sequence reads will be the same length, as the read length depends on the number of cycles carried out.
  • Roche 454™ sequencing can generally sequence much longer reads than Illumina®. Like Illumina®, it does this by sequencing multiple reads at once by reading optical signals as bases are added. As in Illumina®, the DNA or RNA is fragmented into shorter reads, in this case up to 1 kb. Generic adaptors are added to the ends and these are annealed to beads, one DNA fragment per bead. The fragments are then amplified by PCR using adaptor-specific primers. Each bead is then placed in a single well of a slide. So each well will contain a single bead, covered in many PCR copies of a single sequence. The wells also contain DNA polymerase and sequencing buffers. The slide is flooded with one of the four NTP species.
  • If this nucleotide is next in the sequence, it is added to the sequence read. If that single base repeats, then more will be added. So if we flood with guanine bases, and the next in a sequence is G, one G will be added; however, if the next part of the sequence is GGGG, then four Gs will be added. The addition of each nucleotide releases a light signal. The locations of these signals are detected and used to determine which beads the nucleotides are added to. This NTP mix is washed away. The next NTP mix is now added and the process repeated, cycling through the four NTPs. This kind of sequencing generates graphs for each sequence read, showing the signal density for each nucleotide wash. The sequence can then be determined computationally from the signal density in each wash. All of the sequence reads we get from 454 will be different lengths, because different numbers of bases will be added with each cycle.
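The step of determining a sequence from the signal density in each nucleotide wash can be illustrated with a minimal sketch. It assumes signals have already been normalized so that a value near 1.0 corresponds to one incorporated base; the function name `decode_flowgram` and the flow order `"TACG"` are illustrative assumptions, not details taken from this disclosure.

```python
from typing import Sequence


def decode_flowgram(signals: Sequence[float], flow_order: str = "TACG") -> str:
    """Convert per-wash signal intensities into a base sequence.

    Assumption: each signal is normalized so that 1.0 is roughly one
    incorporated base. Rounding the signal gives the homopolymer length
    for that wash (0 means the nucleotide was not incorporated).
    """
    bases = []
    for i, signal in enumerate(signals):
        # The washes cycle through the nucleotides in a fixed order.
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# A signal near 4 in the G wash is read as the homopolymer GGGG.
read = decode_flowgram([1.02, 0.05, 0.0, 3.9, 0.1, 1.1, 0.98, 0.04])
print(read)
```

Because each wash can add zero, one, or several bases, reads decoded from the same number of washes generally differ in length, consistent with the description above.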
  • Ion Torrent™ and Ion Proton sequencing do not make use of optical signals. Instead, they exploit the fact that addition of a dNTP to a DNA polymer releases an H+ ion.
  • the input DNA or RNA is fragmented, this time to ~200 bp.
  • Adaptors are added and one molecule is placed onto a bead.
  • the molecules are amplified on the bead by emulsion PCR.
  • Each bead is placed into a single well of a slide.
  • the slide is flooded with a single species of dNTP, along with buffers and polymerase, one NTP at a time.
  • the pH is detected in each of the wells, as each H+ ion released will decrease the pH.
  • the changes in pH allow us to determine if that base, and how many thereof, was added to the sequence read.
  • the dNTPs are washed away, and the process is repeated cycling through the different dNTP species.
  • the pH change, if any, is used to determine how many bases (if any) were added with each cycle.
  • the sequencing may be more generally performed by a fluorescent-based sequencing technique and/or any electrical-current-based sequencing technique.
  • fluorescent-based sequencing techniques include any technique that incorporates nucleotides conjugated to a fluorophore, such as, for example sequencing using Illumina® based sequencing methods and systems.
  • electrical-current-based sequencing techniques include any sequencing technique (including strand sequencing methods) that measures the electrical current of a polynucleotide as it passes through a pore inserted into a charged membrane or otherwise specifically disrupts the electrical current of a sensor and/or charged membrane.
  • electrical-current-based sequencing techniques include the nanopore DNA sequencing systems and methods of Oxford Nanopore Technologies®.
  • Strand sequencing systems, such as those provided by Oxford Nanopore Technologies®, provide some advantages when determining copy number variation of a nucleic acid, particularly the copy number variation of a sample that potentially contains DNA (or other nucleic acid) from neoplastic and/or cancerous cells.
  • In strand sequencing techniques, a single portion of the genome is continuously sequenced, which allows a direct analysis of copy number variation instead of an implicit analysis of copy number variation that may occur when analyzing sequencing data provided by other sequencing methods where the sample nucleic acid is cut into small fragments for sequencing. This may be particularly advantageous for embodiments when sequence coverage is low. That is, in some embodiments, a low sequence coverage run may return an incomplete set of genomic data.
  • the long sequence reads produced may allow for a more definitive assessment of copy number variation, particularly for regions that are duplicated or deleted. If a full sequence is not available due to the low coverage of the sequencing run, it may be difficult to determine what portions of the genome are deleted (a form of copy number variation) versus what portions of the genome were not represented based on statistical probability (i.e., random sampling).
  • the final product may be a sequence library representing about half of the total reference genome, where an aligned reference genome is littered with a smattering of smaller nucleic acid matches.
  • the result may be a sequence library representing, again, about half of the total reference genome.
  • strand sequencing may provide a robust model for analyzing copy number variation.
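The difficulty of telling a deletion apart from a region missed by random sampling can be made concrete with a short calculation. The sketch below assumes an idealized model in which reads land uniformly at random along the genome (a Poisson approximation that is not stated in this disclosure); the function names are illustrative.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome touched by at least one read,
    under the idealized uniform-sampling (Poisson) assumption."""
    return 1.0 - math.exp(-coverage)


def prob_bin_empty_by_chance(coverage: float, bin_length: int, read_length: int) -> float:
    """Probability that no read starts inside a bin purely by chance.

    The expected number of read starts in the bin is
    coverage * bin_length / read_length.
    """
    expected_reads = coverage * bin_length / read_length
    return math.exp(-expected_reads)


# At roughly 0.69X coverage, about half of the genome is expected to be covered.
print(fraction_covered(math.log(2)))

# Hypothetical numbers: 0.1X coverage, 100 bp reads, 1 kb versus 100 kb bins.
print(prob_bin_empty_by_chance(0.1, 1_000, 100))
print(prob_bin_empty_by_chance(0.1, 100_000, 100))
```

Under these assumptions an empty short bin is unremarkable at low coverage, while an empty long bin is very unlikely to arise by chance, which is one reason long contiguous reads or large bins help when assessing deletions.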
  • any of the foregoing sequencing techniques may be used in any number or capacity and with any number of flow cells or other similar inputs that affect the total number of sequencing reads provided for each sequencing reaction/run.
  • the accompanying figures depict comparison plots that, in some instances (e.g., sample types, chromosomal locations, etc.), illustrate the frequency of copy number gains (black bars; above the reference line) and/or losses (grey bars; below the reference line) in particular genomic regions and/or for a variety of cancer classifiers (e.g., cancerous cells or cell types, such as primary tumors, cancer stage, etc.) and/or a variety of normal (e.g., non-cancerous) tissues and/or biological samples.
  • some of the figures illustrate CNV plots comparing gene copy numbers observed in primary (solid) tumors, normal primary tissue adjacent the cancerous and/or tumor tissue, metastatic tumors, and so forth.
  • Unique copy number variation patterns can be significant, in some embodiments. Further, a plurality of cancer types, tissues of origin, and/or cancer stage/severity may, in some embodiments, comprise unique copy number variation patterns. In some embodiments, the copy number variation patterns may span a single portion of a single chromosomal region while in other embodiments, the copy number variation patterns are more nuanced and comprise smaller portions of a plurality of chromosomes in combination.
  • the plot is segmented into a chromosome map, including chromosomes 1-22, X, and Y.
  • Duplication events, or copy number gains are depicted with black bars above the respective sample reference lines.
  • Deletion events, or copy number losses are depicted with grey bars below the respective sample reference lines.
  • the relative size (i.e., length, height, etc.) of the black and grey bars is representative of the 'frequency' of the corresponding CNV event.
  • longer (or taller) bars indicate regions where a higher percentage of the total samples in the study (i.e., more samples) were positive for the CNV event, while shorter bars indicate regions where a lower percentage of the total samples in the study (i.e., fewer samples) were positive for the CNV event.
  • Illustrative regions of "Significant" CNV, relative to the consensus genome, are indicated with black shading (for duplications) and grey shading (for deletions).
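The bar heights described above are per-region frequencies: the share of samples in a study that carry a gain or a loss in that region. A minimal sketch of that bookkeeping follows, assuming each sample has already been reduced to one call per region (+1 for a gain, -1 for a loss, 0 for neither); the data layout and the function name are illustrative assumptions.

```python
from typing import Dict, List


def cnv_frequencies(calls: List[List[int]]) -> Dict[str, List[float]]:
    """Compute per-region gain and loss frequencies across samples.

    calls[s][r] is +1 (gain), -1 (loss), or 0 (neutral) for sample s
    in genomic region r. All samples must cover the same regions.
    """
    if not calls:
        raise ValueError("at least one sample is required")
    n_samples = len(calls)
    n_regions = len(calls[0])
    gain = [sum(1 for s in calls if s[r] > 0) / n_samples for r in range(n_regions)]
    loss = [sum(1 for s in calls if s[r] < 0) / n_samples for r in range(n_regions)]
    return {"gain": gain, "loss": loss}


# Four hypothetical samples, three regions.
study = [
    [1, 0, -1],
    [1, 0, -1],
    [0, 0, -1],
    [1, -1, 0],
]
freqs = cnv_frequencies(study)
print(freqs["gain"])  # bars drawn above the reference line
print(freqs["loss"])  # bars drawn below the reference line
```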
  • results illustrate the contrast between observable CNV in cancerous and non-cancerous breast tissue.
  • Some significant regions of CNV duplications in breast cancer include, without limitation, respective parts of chromosomes 1, 5, 8, 16, 17, and 20.
  • Some significant regions of CNV deletion in breast cancer include, without limitation, respective parts of chromosomes 8, 11, 13, 16, 17, and 22.
  • breast cancer tumors illustratively, have a CNV signature that distinguishes such tumors from normal tissue.
  • breast tissue presenting the illustrated pattern can be classified as cancerous, likely-to-be-cancerous, in danger of being cancerous, or otherwise associated with a cancer profile.
  • breast tissue suspected of being cancerous can be biopsied and sequenced (at relatively low sequencing coverage (e.g., less than or equal to 5X coverage)) for CNV, rather than SNP, to provide a preliminary or even final diagnosis.
  • Low coverage sequencing can be performed quickly to provide clinical indications of cancerous or cancer-prone tissues.
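The throughput benefit of low coverage follows from simple arithmetic: coverage is the number of reads times the read length divided by the genome size. The sketch below works through that relationship; the genome size, read length, and per-run read yield are hypothetical round numbers chosen for illustration, not values from this disclosure.

```python
import math


def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Reads required so that reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


def samples_per_run(run_reads: int, coverage: float, genome_size: int, read_length: int) -> int:
    """How many samples fit into one run at the requested coverage."""
    return run_reads // reads_needed(coverage, genome_size, read_length)


GENOME_SIZE = 3_100_000_000  # assumed approximate human genome size, in bases
READ_LENGTH = 150            # assumed read length, in bases
RUN_READS = 400_000_000      # assumed reads produced by one run

for cov in (1.0, 5.0, 30.0):
    print(cov, samples_per_run(RUN_READS, cov, GENOME_SIZE, READ_LENGTH))
```

With these assumed numbers, a run that cannot hold even one sample at 30X holds several at 5X and many more at 1X.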
  • a subject CNV profile (for breast tissue suspected of being cancerous) can be compared to one or more standard, breast cancer CNV profiles to observe and/or measure similarities and/or differences between the subject and the standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a breast cancer profile, etc.
  • Figure 3 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) the "Primary" breast tumor of Figure 1 and “Normal” blood derived cells, (B) "Metastatic” breast tumor and the “Normal” blood derived cells, (C) the “Metastatic” breast tumor and the “Normal” breast tissue of Figure 1, (D) the "Metastatic” breast tumor and the “Primary” breast tumor of Figure 1, and (E) the "Normal” blood derived cells and the "Normal” breast tissue of Figure 1.
  • These results illustrate that a variety of cancerous tissue sources can be used to investigate CNV patterns in cancer. Accordingly, these various tissue sources can each be used diagnostically when detecting cancer in a biological sample.
  • metastatic breast tumor samples (or CNV profiles thereof), which have unique CNV event(s) in chromosome 15, for example, can be distinguished, not only from normal tissue samples (or CNV profiles thereof), as in plots (B) and (C), but also from primary breast tumor samples (or CNV profiles thereof), as in plot (D).
  • Figure 5 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) brain "Primary” tumor and “Normal” brain tissue adjacent to Primary tumor, (B) "Recurrent” primary brain tumor and the “Normal” tissue adjacent to Primary tumor, and (C) the "Recurrent” primary brain tumor and the "Primary” brain tumor.
  • brain tumors can be characterized by a CNV pattern.
  • brain tissue presenting the illustrated CNV profile (or CNV profile significantly similar thereto) can also be classified as being associated with a cancer profile, etc.
  • a subject CNV profile (for brain tissue suspected of being cancerous) can be compared to one or more standard, brain cancer CNV profiles to observe and/or measure similarities and/or differences between the subject and the standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a brain cancer profile, etc.
  • Figure 6 illustrates a frequency plot depicting genome-wide CNV profile results for the "Breast" Primary tumor of Figure 1, the "Nervous System” Primary tumor of Figure 4, and the "Brain” Primary tumor of Figure 5.
  • CNV profiles (e.g., obtained through low-coverage sequencing)
  • a subject CNV profile for any tissue can be compared to various standard, cancer-type CNV profiles to observe and/or measure similarities and/or differences between the subject and standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a specific cancer type profile, etc.
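One way to measure similarity between a subject CNV profile and several standard profiles is a correlation score computed over matching genomic regions. The sketch below uses the Pearson correlation as a stand-in similarity measure; this disclosure does not specify a particular metric, and the profile names and values are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and cover the same regions")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_matching_profile(
    subject: Sequence[float], standards: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the name and score of the standard profile most similar to the subject."""
    scores = {name: pearson(subject, profile) for name, profile in standards.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Signed values per region: positive for gains, negative for losses (hypothetical data).
subject = [1.0, 0.0, -1.0, 1.0]
standards = {
    "profile_A": [0.8, 0.1, -0.6, 0.7],
    "profile_B": [-0.5, 0.6, 0.4, -0.2],
}
print(best_matching_profile(subject, standards))
```

A real comparison would likely also need a minimum score below which no profile is reported as a match; that cutoff is left out here because the source does not give one.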
  • Figure 7 illustrates a series of frequency plots depicting genome-wide CNV profile results for bladder, blood, brain, breast, cervix, colorectal, head and neck, kidney, liver, lung, ovary, pancreas, prostate, skin, stomach, and uterus tumors. These results further illustrate the variety of cancer types that can be distinguished one from another based on CNV profile.
  • Figure 8 illustrates a frequency plot depicting genome-wide CNV profile results for "Bone” tumor and "Nervous System” tumor, further illustrating the variety of cancer types that can be distinguished one from another based on CNV profile.
  • Figure 9 illustrates differential thresholds for the unique CNV events illustrated in the genome-wide CNV profile results of Figure 8.
  • 2737 different CNV events were unique (i.e., observed in bone tumors and not in nervous system tumors, or vice versa), having at least a 50% difference.
  • 61 different CNV events were unique with at least a 70% difference. A higher number of CNV events at a higher differential threshold percentage indicates less similarity between the two groups.
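The differential-threshold counts can be sketched as follows, assuming the "difference" for each CNV event is the absolute gap between its frequency in one group and its frequency in the other. That reading, the function name, and the example frequencies are assumptions made for illustration.

```python
from typing import Sequence


def count_differential_events(
    freq_a: Sequence[float], freq_b: Sequence[float], threshold: float
) -> int:
    """Count CNV events whose frequency differs between two groups
    by at least `threshold` (frequencies expressed as fractions, 0 to 1)."""
    if len(freq_a) != len(freq_b):
        raise ValueError("both groups must report the same CNV events")
    return sum(1 for a, b in zip(freq_a, freq_b) if abs(a - b) >= threshold)


# Hypothetical per-event frequencies in two tumor groups.
group_1 = [0.9, 0.1, 0.55, 0.2, 0.05]
group_2 = [0.1, 0.15, 0.0, 0.95, 0.05]

for threshold in (0.5, 0.7):
    print(threshold, count_differential_events(group_1, group_2, threshold))
```

Raising the threshold can only keep or reduce the count, so the number of events that survive a high threshold gives a rough indication of how dissimilar the two groups are.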
  • Figure 10 illustrates a frequency plot depicting genome-wide CNV profile results for "Stage 1" Nervous System tumors and "Stage 4" Nervous System tumors. Differences in CNV profile between the two samples are readily observable, indicating that cancer stage/severity is also distinguishable using CNV profile comparison. Diagnostically, a cancer sample can be staged and even graded based on similarities and/or differences between the subject and standard CNVs.
  • Figure 11 illustrates differential thresholds for the unique CNV events illustrated in the genome-wide CNV profile results of Figure 10.
  • Figure 12 illustrates a (diagnostic) series of frequency plots depicting genome-wide CNV results for a patient at 1X, 3X, 5X, 7X, and 10X sequencing coverage as compared to CNV profile results for colorectal cancer.
  • the CNV patient results can be obtained from ctDNA (circulating tumor DNA), tumor, and/or polyp tissue nucleic acid analysis.
  • the CNV patient results can be obtained by sequencing the cfDNA (cell free DNA) isolated from the blood of a colorectal cancer patient.
  • the CNV patient results can be compared and/or matched with a colorectal cancer CNV profile.
  • the profile can be obtained by compiling CNV sequencing data (obtained in advance, publicly available, and updated periodically).
  • colorectal cancer also has a CNV signature.
  • the profile illustrates a variety of CNV events (e.g., duplications and deletions) across the represented genome plot.
  • the patient sample CNV matches the colorectal cancer CNV profile (see e.g., CNV gains (indicated with black arrows) at chromosome 5p, 7, 8, 9p, 13, 16, 19 and 20, illustratively, and CNV losses (indicated with grey arrows) at chromosome 1p, 4, 14, 15, 17, and 22).
  • the foregoing illustrates how a patient sample can be analyzed through relatively low-coverage sequencing to detect CNV present in the sample.
  • the detected CNV can be compared to and/or matched with one or more existing cancer profiles in order to associate the patient sample with a cancer CNV profile.
  • This association can be informative, predictive, and/or diagnostic for the cancer or other cancer condition, such as predisposition, early development, cancer classification, metastasis, stage/grade, etc.
  • cfDNA from blood or other bodily fluid can be sampled non-invasively and sequenced to provide a diagnosis or indication of a condition that is cancerous, likely-to-be-cancerous, in danger of being cancerous, or otherwise associated with a cancer profile.
  • Figures 1-12 thus illustrate the unique nature of cancer CNV profiles, including cancer origin, type, classification, stage, etc., as compared to normal tissue samples, related cancer origin, type, classification, stage, etc., and different cancer origin, type, classification, stage, etc.
  • Figures 1-12 also illustrate differential threshold and significance determination for comparative samples, as well as the diagnostic relevance of comparing a single, patient sample CNV plot with one or more cancer CNV profiles. Based on the similarities and/or differences between the patient CNV and the profile CNV, the patient can be diagnosed as being associated with a particular cancer profile.
  • the cancer profile can be indicative or representative of cancer origin, type, classification, stage, etc.
  • Certain embodiments of the present disclosure comprise methods for detecting cancer and/or cancer nucleic acid (e.g., DNA, cfDNA, ctDNA, RNA, etc.) in a biological sample, such as tumor tissue or a liquid biopsy.
  • a liquid biopsy for the purposes of this disclosure, comprises a biological fluid sample (e.g., a fluid sample taken from a patient). It may, for example, include any of whole blood, serum, plasma, cerebrospinal fluid, tumor fluid, interstitial fluid phase, any other relevant bodily fluid, and combinations thereof.
  • the methods may further comprise isolating cfDNA from the liquid biopsy wherein the cfDNA comprises one or more genetic elements.
  • the method may further comprise determining a copy number of the one or more genetic elements and comparing the determined copy number to one or more known copy number standards for ctDNA.
  • the foregoing method may additionally comprise assembling a genetic profile of the cfDNA, wherein the genetic profile comprises a representation of the relative abundance of the one or more genetic elements in the cfDNA.
  • the method may further comprise detecting the presence of ctDNA in the liquid biopsy by measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA. Similar to the methods described above, the method may further comprise determining a tissue of origin and/or a stage/severity of cancer based on the similarity of the measured number of copies and the standard number of copies.
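Determining a copy number for each genetic element from sequencing data is commonly approached through relative read depth. The sketch below assumes read counts per genomic bin are available for both the cfDNA sample and a copy-neutral reference, and that most of the genome is copy-neutral so that normalizing by total reads is reasonable; none of these details are specified in this disclosure, and the names are illustrative.

```python
from typing import List, Optional, Sequence


def copy_number_estimates(
    sample_counts: Sequence[int],
    reference_counts: Sequence[int],
    ploidy: int = 2,
) -> List[Optional[float]]:
    """Estimate copy number per bin from relative read abundance.

    Each bin's share of the sample's reads is divided by the same bin's
    share of the reference's reads, then scaled by the baseline ploidy.
    Bins with no reference reads yield None (no estimate possible).
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("sample and reference must use the same bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("read counts must not all be zero")
    estimates: List[Optional[float]] = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            estimates.append(None)
            continue
        ratio = (s / sample_total) / (r / reference_total)
        estimates.append(ploidy * ratio)
    return estimates


# Hypothetical counts for four bins: bin 2 looks gained, bin 3 looks lost.
print(copy_number_estimates([100, 150, 50, 100], [100, 100, 100, 100]))
```

The resulting per-bin values form the kind of relative-abundance profile that can then be compared against known copy number standards.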
  • the detection of ctDNA in a liquid biopsy comprises detecting a copy number variation pattern in the ctDNA and/or in one or more genetic elements derived from the cfDNA.
  • the cfDNA and the ctDNA are the same molecules.
  • Any of the methods disclosed herein may additionally comprise a step of implementing a cancer-specific treatment based on the identified cancer type, tissue of origin, and/or stage/severity of the identified cancer. Additionally, or alternatively, one or more therapeutics are prescribed and/or one or more surgeries are performed to mitigate the potential harms of leaving the identified cancer untreated.
  • Computing systems may, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframes, distributed computing systems, datacenters, or even devices that have not conventionally been considered a computing system, such as wearables (e.g., glasses).
  • the term “computing system” is defined broadly as including any device or system, or combination thereof, that includes at least one physical and tangible processor and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by a processor.
  • the memory may take any form and may depend on the nature and form of the computing system.
  • a computing system may be distributed over a network environment and may include multiple constituent computing systems.
  • a basic configuration of a computing system 100 typically includes at least one hardware processing unit 102 and memory 104.
  • the memory 104 may be physical system memory, which may be volatile, nonvolatile, or some combination of the two.
  • the term "memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory, and/or storage capability may be distributed as well.
  • the computing system 100 also has thereon multiple structures often referred to as an "executable component.”
  • the memory 104 of the computing system 100 is illustrated as including executable component 106.
  • executable component is the name for a structure that is well understood to one of ordinary skill in the art in the field of computing as being a structure that can be software, hardware, or a combination thereof.
  • the structure of an executable component may include software objects, routines, methods, and so forth, that may be executed on the computing system, whether such an executable component exists in the heap of a computing system, or whether the executable component exists on computer-readable storage media.
  • the structure of the executable component exists on a computer-readable medium such that, when interpreted by one or more processors of a computing system (e.g., by a processor thread), the computing system is caused to perform a function.
  • Such structure may be computer-readable directly by the processors— as is the case if the executable component were binary.
  • the structure may be structured to be interpretable and/or compiled— whether in a single stage or in multiple stages— so as to generate such binary that is directly interpretable by the processors.
  • Such an understanding of exemplary structures of an executable component is well within the understanding of one of ordinary skill in the art of computing when using the term "executable component.”
  • executable component is also well understood by one of ordinary skill as including structures that are implemented exclusively or near-exclusively in hardware, such as within a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), Program-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), or any other specialized circuit.
  • the term “executable component” is a term for a structure that is well understood by those of ordinary skill in the art of computing, whether implemented in software, hardware, or a combination.
  • the terms “component,” “service,” “engine,” “module,” “control,” “generator,” or the like may also be used.
  • these terms, whether expressed with or without a modifying clause, are also intended to be synonymous with the term “executable component.”
  • processors of the associated computing system that performs the act
  • computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product.
  • An example of such an operation involves the manipulation of data.
  • the computer-executable instructions may be stored in the memory 104 of the computing system 100.
  • Computing system 100 may also contain communication channels 108 that allow the computing system 100 to communicate with other computing systems over, for example, network 110.
  • the computing system 100 includes a user interface 112 for use in interfacing with a user.
  • the user interface 112 may include output mechanisms 112A as well as input mechanisms 112B.
  • output mechanisms 112A might include, for instance, speakers, displays, tactile output, holograms and so forth.
  • input mechanisms 112B might include, for instance, microphones, touchscreens, holograms, cameras, keyboards, mouse or other pointer input, sensors of any type, and so forth.
  • Embodiments described herein may comprise or utilize a special purpose or general-purpose computing system including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below.
  • Embodiments described herein also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computing system.
  • Computer-readable media that store computer-executable instructions are physical storage media.
  • Computer-readable media that carry computer-executable instructions are transmission media.
  • embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: storage media and transmission media.
  • Computer-readable storage media include RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other physical and tangible storage medium which can be used to store desired program code in the form of computer-executable instructions or data structures and which can be accessed and executed by a general purpose or special purpose computing system to implement the disclosed functionality of the invention.
  • a "network” is defined as one or more data links that enable the transport of electronic data between computing systems and/or modules and/or other electronic devices.
  • Networks may be "private” or they may be "public,” or networks may share qualities of both private and public networks.
  • a private network may be any network that has restricted access such that only the computer systems and/or modules and/or other electronic devices that are provided and/or permitted access to the private network may transport electronic data through the one or more data links that comprise the private network.
  • a public network may, on the other hand, not restrict access and allow any computer systems and/or modules and/or other electronic devices capable of connecting to the network to use the one or more data links comprising the network to transport electronic data.
  • a private network found within an organization such as a private business, restricts transport of electronic data between only those computer systems and/or modules and/or other electronic devices within the organization.
  • the Internet is an example of a public network where access to the network is, generally, not restricted.
  • Computer systems and/or modules and/or other electronic devices may often be connected simultaneously or serially to multiple networks, some of which may be private, some of which may be public, and some of which may be varying degrees of public and private.
  • a laptop computer may be permitted access to a closed network, such as a network for a private business that enables transport of electronic data between the computing systems of permitted business employees, and the same laptop computer may also access an open network, such as the Internet, at the same time or at a different time as it accesses the exemplary closed network.
  • Transmission media can include a network and/or data links which can be used to carry desired program code in the form of computer-executable instructions or data structures and which can be accessed and executed by a general purpose or special purpose computing system. Combinations of the above should also be included within the scope of computer-readable media.
  • program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to storage media (or vice versa).
  • computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC") and then eventually transferred to computing system RAM and/or to less volatile storage media at a computing system.
  • storage media can be included in computing system components that also, or even primarily, utilize transmission media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computing system, special purpose computing system, or special purpose processing device to perform a certain function or group of functions. Alternatively, or additionally, the computer-executable instructions may configure the computing system to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries or even instructions that undergo some translation (such as compilation) before direct execution by the processors, such as intermediate format instructions like assembly language, or even source code.
  • the invention may be practiced in network computing environments with many types of computing system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, tablets, mobile telephones, PDAs, pagers, routers, switches, datacenters, wearables (e.g., glasses) and the like.
  • the invention may also be practiced in distributed system environments where local and remote computing systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations.
  • “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
  • a cloud-computing model can be composed of various characteristics, such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth.
  • a cloud-computing model may also come in the form of various service models such as, for example, Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”).
  • the cloud-computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
  • Computing systems of the present invention may be any computing systems as previously described and adapted for compiling, storing, analyzing, parsing, displaying, and/or communicating one or more portions of sequence data derived from a biological sample as previously described.
  • Embodiments may comprise compiling the sequencing data, which may be performed, for example, by a compiling module. For example, this may comprise concatenating one or more sequencing reads and/or assembling a genome based on the one or more sequencing reads.
  • compiling sequencing reads comprises generating a sequencing profile and/or copy number variation profile of the DNA (e.g., cfDNA or ctDNA) obtained from the biological sample.
  • Embodiments may comprise storing the sequencing data.
  • the storing may be in any of the aforementioned storage methods (e.g., volatile, non-volatile, local, networked, etc.).
  • computing systems may be adapted to store the individual sequencing reads (e.g., the raw sequencing data) in addition to or distinctly from concatenated sequencing data and/or sequencing profiles, including copy number variation profiles.
  • Any of the foregoing data may be stored in any form and/or any database system (or other storage system) described herein and/or known in the art.
  • the sequencing data may be stored as one or more graphical images comprising the copy number variation profile of one or more samples (e.g., patient samples and standards). Further, sequencing data and/or other data such as, for example, the copy number variation plot of one or more samples/standards, may be retrieved from any of the one or more data stores provided herein.
  • Embodiments may additionally, or alternatively, comprise an analyzing module for analyzing the sequencing data. In one or more embodiments, this may comprise comparing one or more sequencing data from patient samples to one or more standards.
  • the one or more standards as described above, may comprise DNA isolated and sequenced from non-neoplastic cells or may comprise DNA isolated and sequenced from known neoplastic cells of differing cell and/or cancer types.
  • Computing systems analyzing the sequencing may, in some embodiments, digitally and/or logically align sequencing results. This may comprise searching for logical matches (individual matches or a plurality/set of matches) between one or more samples and one or more standards and/or logical matches between two or more samples and/or logical matches between two or more standards.
  • the logical matches may be of any predetermined length or may be determined by a machine learning (or other) algorithm automatically by the computing system.
  • analyzing the sequencing data may comprise identifying a digital match between two or more sequencing data.
  • a copy number variation plot of one or more samples may be in a digital and/or image-based format, and the computing system may analyze the digital and/or image-based plots to determine similarities and/or differences between the two or more plots, and in some embodiments, analyzing comprises predicting and/or determining a likelihood that a sample plot matches a standard plot. Additionally, or alternatively, analyzing may comprise determining a correlation (or lack of correlation) between two or more copy number variation plots.
  • Computing systems of the present invention may additionally comprise a parsing module.
  • the parsing module may parse and/or break up sequencing reads and/or genome profiles to identify unique and/or predictive sequences that are indicative and/or correlate to one or more cancer types, tissues of origin, and/or cancer stage/severity. In some embodiments, this may comprise parsing a standard or group of standards to determine one or more portions of the standard sequence that correlate with one or more cancer types, tissues of origin, and/or cancer stages/severity.
  • Embodiments of the present disclosure may comprise a displaying module and/or a physical display for operably displaying the one or more sequencing reads, copy number variation plots, analysis results, and/or any other data or image associated herewith.
  • the display will comprise one or more graphical user interfaces whereby a user may interact with the computing system to input one or more data entries through any of the input devices/methods disclosed herein.
  • the one or more graphical user interfaces may comprise a medium through which a physician, technician, or other healthcare and/or scientific personnel may view the sequencing results and/or the copy number variation plots.
  • the display comprises a navigation window that allows the user to transit or transition between one or more historical and/or current copy number variation plots and/or sequencing samples (which may be organized by any method known in the art, including, for example, by unique identifiers, date, etc.).
  • the display may additionally, or alternatively, comprise the copy number variation plot of one or more samples in addition to one or more standard copy number variation plots.
  • the computing system may output through the display a first and/or an ordered list of probable matches between the sample sequencing read and one or more standards and/or one or more other sample reads/plots.
  • the display may further allow the user to annotate and/or select an associated cancer type, tissue of origin, and/or cancer stage/severity.
  • the display comprises one or more plots representing each of the foregoing— cancer type, tissue of origin, and/or cancer stage/severity— or may display one or more plots comprising an accumulation of each (e.g., stage 3 breast cancer metastasized to the lung).
  • the display may communicate one or more results of the sequencing data and/or copy number variation plot analysis to one or more administrators, billing modules/institutions, insurance carriers, physicians, patients, technicians, or others requiring and/or requesting the information.
  • the computing system may generate a form and/or beautified communication comprising any and/or all of the information disclosed above and may communicate said communication through any means known in the art.
  • Any and/or all of the foregoing may be embodied in a system and/or may be included in one or more methods and/or embodied in one or more computer-readable hardware storage devices as one or more computer-executable instructions.
  • systems, devices, products, kits, methods, and/or processes, according to certain embodiments of the present disclosure may include, incorporate, or otherwise comprise properties, features (e.g., components, members, elements, parts, and/or portions) described in other embodiments disclosed and/or described herein. Accordingly, the various features of certain embodiments can be compatible with, combined with, included in, and/or incorporated into other embodiments of the present disclosure. Thus, disclosure of certain features relative to a specific embodiment of the present disclosure should not be construed as limiting application or inclusion of said features to the specific embodiment. Rather, it will be appreciated that other embodiments can also include said features, members, elements, parts, and/or portions without necessarily departing from the scope of the present disclosure.
  • any feature herein may be combined with any other feature of a same or different embodiment disclosed herein.
  • various well-known aspects of illustrative systems, methods, apparatus, and the like are not described herein in particular detail in order to avoid obscuring aspects of the example embodiments. Such aspects are, however, also contemplated herein.
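The comparison of a sample copy number variation plot against one or more standard plots, and the ordered list of probable matches mentioned above, could be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each profile is assumed to be a list of per-bin copy number values of equal length, Pearson correlation is used as the similarity measure (the disclosure does not name one), and the function names, labels, and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (>= 2 bins)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no shape to correlate against
    return cov / math.sqrt(var_x * var_y)

def rank_matches(sample, standards):
    """Return (label, correlation) pairs ordered from most to least similar."""
    scores = [(label, pearson(sample, profile)) for label, profile in standards.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical per-bin copy numbers (2.0 = diploid baseline); not real data.
standards = {
    "normal": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "tumor_type_a": [2.0, 3.0, 3.0, 2.0, 1.0, 2.0],
    "tumor_type_b": [1.0, 2.0, 2.0, 3.0, 3.0, 2.0],
}
sample = [2.1, 2.9, 3.2, 2.0, 1.1, 1.9]
ranked = rank_matches(sample, standards)
for label, score in ranked:
    print(f"{label}\t{score:.3f}")
```

With these illustrative values the sample ranks closest to the hypothetical "tumor_type_a" standard. A production system would likely need to handle noise, bin alignment, and sample purity, none of which is modeled here.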

Abstract

A cancer detection method includes preparing a library of nucleic acids in a biological sample, sequencing the library, preferably with less than 1X coverage, measuring a copy number for genes in the sequenced library, and comparing the measured copy numbers with standard copy numbers for the genes to determine variability or similarity between the measured copy numbers and the standard copy numbers. Some embodiments include classifying the biological sample as cancer, determining a tissue of origin and/or a stage of cancer, and/or providing a patient with a diagnosis of cancer based on the variability or similarity between the measured copy numbers and the standard copy numbers. A computer system includes hardware components and computer-executable instructions that are executable to cause the computer system to perform steps in a cancer detection method.

Description

SYSTEMS AND METHODS FOR CHARACTERIZING
NUCLEIC ACID IN A BIOLOGICAL SAMPLE
BACKGROUND
1. Technical Field
[0001] This disclosure generally relates to systems and methods for characterizing nucleic acid. More specifically, the present disclosure relates to cancer diagnostic systems and methods using nucleic acid from a biological sample.
2. Related Technology
[0002] Animals may suffer from myriad pathologies. Particularly, humans (and other evolved animals) are prone to a variety of dysfunctions, diseases, and ailments. Cancer, for example, is a disease characterized by the aberrant and uncontrolled growth of cells in the body. Cancer is a leading cause of death worldwide and, in the U.S., cancer is second only to heart disease as a leading cause of death.
[0003] Because cancer develops from ordinary cells in the body, it has proven difficult to detect cancerous cells until an uncontrolled growth of cancer cells masses into a tumor and/or becomes metastatic and/or symptomatic. In such cases, cancer detection often involves histological analysis of surgically-excised tissues by a trained medical professional. The expense associated with such an analysis may further inhibit early detection, as a patient is deterred by the cost of testing.
[0004] Existing cancer screening techniques include whole body (or part body) scanning through use of computed tomography, magnetic resonance imaging, or similar imaging techniques to locate masses of (often over-proliferating, under-differentiated, and/or metastatic) cells (e.g., tumors) and are typically followed by a biopsy of the cell mass and diagnostic testing to determine whether the cell mass is benign or malignant. Other traditional diagnostic tests rely on the presence of a subset of cancer diagnostic markers. Cancer markers may be proteinaceous or may present as mutations within the DNA of the cancer cells. Protein-based diagnostic markers may be identified through antibody staining procedures of tumor cross-sections, requiring a tissue specific sample. DNA-based markers have historically comprised small changes in specific genes. For example, many cancer diagnostic markers are based on single nucleotide polymorphisms (SNPs) in a collection of genes that have each or collectively correlated with cancer. Determining the presence of a SNP in a sample also typically requires a tissue specific sample. However, even in cases where a cancerous tissue sample can be collected non-invasively and/or from a systemic sample (e.g., a (random) sampling of blood or other bodily fluid), the diagnostic potential is limited to only those cancers that have been correlated with a unique SNP, which are few. Generally, cancer detection through cancer marker identification requires foreknowledge of the origin of the cancerous cell and/or the actual presence of cancerous cells from which the markers originate, making it difficult, if not improbable, to identify cancer from a (random) sampling of bodily tissues and/or fluids.
[0005] Accordingly, there are a number of disadvantages with systems and methods for cancer detection that can be addressed.
BRIEF SUMMARY
Technical Problem
[0006] There is a need for systems and methods that detect, identify, classify, and/or stage cancer in a biological sample. In particular, there is a need for systems and methods for early, affordable, rapid, and/or non-invasive cancer detection, identification, classification, and/or staging. These needs are exacerbated when considering the myriad types and stages of cancer worldwide.
[0007] Existing systems and methods rely on local tissue sampling or time- and resource-intensive tests that are technically inefficient. In some cases, the technical inefficiency translates into exorbitant costs. The high cost of the currently available tests makes them impractical as diagnostic tools in modern healthcare and/or cancer diagnostic protocols. Additionally, the patient is usually experiencing symptoms of cancer or has otherwise noticed a tumorous growth somewhere on their body, precipitating a visit to a physician for diagnosis. At this stage, however, the cancer is likely at an advanced stage, and it is generally accepted that the later the cancer is diagnosed, the lower the probability that treatment will be successful and/or the higher the morbidity rate.
[0008] Accordingly, there is a technical hurdle in the field of cancer diagnostics for implementing an early stage and/or high-throughput cancer screening method from a non-invasive patient sample (e.g., a (random) sampling of blood or other bodily fluid). Additionally, a faster prognostic/diagnostic method for detecting cancer is needed. Additionally, a prognostic/diagnostic method for detecting a wide range of cancer types is needed. Additionally, a prognostic/diagnostic method for detecting the tissue of origin and/or stage/severity of the cancer is needed.
[0009] Accordingly, systems and methods for detecting cancer that solve one or more of the foregoing technical problems are provided.
Solutions to the Problem
[0010] Embodiments of the present disclosure comprise methods and systems for detecting cancer. Inventive methods can include identifying, diagnosing, classifying, distinguishing, and/or staging the detected cancer, and systems for performing the same. One or more embodiments can include (obtaining) a biological sample. The biological sample can be (obtained) from a patient and/or can comprise or contain nucleic acid, such as (genomic) DNA or RNA. One or more embodiments can include: (preparing) a sequencing library, such as a next generation sequencing library, of the nucleic acid; sequencing at least a portion (e.g., the entirety) of the prepared sequencing library with a predetermined sequencing coverage; measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library; comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number; and/or diagnosing a cancer or cancer condition based on the variability or similarity between the measured number of copies and the standard copy number. In one or more embodiments, the predetermined sequencing coverage can be less than or equal to about 10X, 7X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, or 0.5X.
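To make the coverage values recited above concrete, the following sketch estimates average sequencing coverage as total sequenced bases divided by genome size, a commonly used approximation that the disclosure itself does not spell out. The read length (150 bp), read count, and the approximate human genome size (3.1 billion bp) are illustrative assumptions, and the function names are hypothetical.

```python
import math

def sequencing_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target average coverage."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

# Assumed values for illustration only.
HUMAN_GENOME_BP = 3_100_000_000
coverage = sequencing_coverage(10_000_000, 150, HUMAN_GENOME_BP)
print(f"{coverage:.2f}X")                        # roughly 0.48X
print(reads_needed(1.0, 150, HUMAN_GENOME_BP))   # reads for about 1X
```

Under these assumptions, ten million 150 bp reads correspond to a little under 0.5X, which falls within the low-coverage range recited above.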
[0011] One or more embodiments can include providing the patient with the diagnosis of cancer or cancer condition, optionally in the form of a report. One or more embodiments can include prescribing further and/or confirmatory testing, prescribing a treatment protocol, and/or administering a treatment based on the diagnosis or based on the variability or similarity between the measured number of copies and the standard copy number. In some embodiments, the treatment or treatment protocol can include one or more dietary or lifestyle components or alterations. In some embodiments, the treatment or treatment protocol can include one or more supplement or pharmaceutical compositions.
[0012] In an exemplary embodiment, a method for detecting cancer in a biological sample includes receiving a biological sample comprising nucleic acid. The method further includes preparing a nucleic acid library of the nucleic acid in the biological sample. The method also includes sequencing at least a portion of the prepared nucleic acid library and measuring the number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library. The method further includes comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number.
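The measuring and comparing steps of this exemplary method could be sketched as follows. This is a simplified illustration, not the disclosed algorithm: it assumes reads have already been aligned and counted in fixed genomic bins, uses a log2 ratio of normalized counts against a standard as the measure of variability or similarity, and applies a hypothetical threshold of 0.3 to label bins. All names and values are assumptions.

```python
import math

def normalize(counts):
    """Scale per-bin read counts to fractions of the total (removes depth effects)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to normalize")
    return [c / total for c in counts]

def log2_ratios(sample_counts, standard_counts, pseudocount=1e-9):
    """Per-bin log2 ratio of sample to standard after depth normalization."""
    if len(sample_counts) != len(standard_counts):
        raise ValueError("sample and standard must cover the same bins")
    sample = normalize(sample_counts)
    standard = normalize(standard_counts)
    # The pseudocount avoids division by zero and log of zero in empty bins.
    return [math.log2((s + pseudocount) / (r + pseudocount))
            for s, r in zip(sample, standard)]

def call_bins(ratios, threshold=0.3):
    """Label each bin as gain, loss, or neutral using a hypothetical threshold."""
    calls = []
    for ratio in ratios:
        if ratio >= threshold:
            calls.append("gain")
        elif ratio <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Hypothetical read counts per bin; not real data.
sample_counts = [100, 150, 100, 50, 100]
standard_counts = [100, 100, 100, 100, 100]
ratios = log2_ratios(sample_counts, standard_counts)
calls = call_bins(ratios)
print(calls)
```

A bin whose ratio stays near zero is similar to the standard, while a large positive or negative ratio indicates variability in the form of a gain or loss.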
[0013] In one or more embodiments, the comparing step of the previously recited method comprises determining a tissue of origin and/or a stage of cancer based on the similarity of the measured copy number of nucleic acid from the biological sample to a standard copy number. The standard copy number may comprise a copy number of nucleic acid or nucleic acid sequence(s) and/or a copy number profile of nucleic acid or nucleic acid sequence(s) for a wild-type cell or sample. Additionally, or alternatively, the standard copy number may comprise a copy number of nucleic acid or nucleic acid sequence(s) and/or a copy number profile of nucleic acid or nucleic acid sequence(s) for (each of) one or more cancer or cancerous cells, cell types, or samples.
[0014] Additionally, methods for detecting circulating tumor DNA (ctDNA) in a liquid biopsy (e.g., a (random) sampling of blood or other bodily fluid) are disclosed. In one embodiment, the method comprises isolating cell free DNA (cfDNA) from the liquid biopsy, the cfDNA comprising one or more genetic elements. The method also comprises determining a copy number of the one or more genetic elements and comparing the determined copy number to one or more known copy number standards for ctDNA.
[0015] In one or more embodiments, methods for detecting ctDNA in a liquid biopsy can (further) comprise assembling a genetic profile of the cfDNA, wherein the genetic profile comprises a representation of the relative abundance of the one or more genetic elements in the cfDNA. Additionally, or alternatively, the comparing step in methods for detecting ctDNA in a liquid biopsy can comprise detecting the presence of ctDNA in the liquid biopsy (e.g., by measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA). The comparing step can also, or alternatively comprise detecting the presence of cancer or cancerous cells (e.g., by detecting ctDNA in the liquid biopsy and/or measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA).
[0016] Additionally, or alternatively, the method can comprise determining a methylation pattern of the cfDNA and/or the one or more genetic elements. The aforementioned methods may further comprise determining copy number alterations, gene expression, a tissue of origin and/or a stage of cancer based on the detected ctDNA and/or the methylation pattern of the cfDNA and/or the one or more genetic elements.
[0017] One or more embodiments include a (computer) system. The system can be configured for engineering compliant communications. The system can include one or more processors and one or more computer-readable storage media. The computer-readable storage media can have stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to analyze a nucleic acid processed from a biological sample to determine the presence of cancer in the biological sample. The computer-executable instructions can include instructions that are executable to cause the computer system to perform one or more of the following: receive sequence data, the sequence data comprising a plurality of sequence reads derived from the nucleic acid; parse the sequence data to determine a number of copies of at least one nucleic acid sequence included in the sequence data; analyze the parsed number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the parsed number of copies and the standard copy number; and, based on the determined variability or similarity, display a result at a user interface. In some embodiments, the result can be a copy number variation (CNV) profile result, a diagnosed cancer or cancer condition, and/or a report comprising a diagnosis.
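The receive, parse, analyze, and display instructions recited above could be organized along the following lines. This is a structural sketch only: it assumes each read has already been assigned to a named nucleic acid sequence upstream, treats the per-sequence read count as a stand-in for the number of copies (a deliberate simplification), and uses a hypothetical relative tolerance of 25% to decide between "similar" and "variable". The identifiers, names, and values are not taken from the disclosure.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Result:
    sequence_id: str
    measured: int
    expected: float
    status: str

def parse_copies(reads):
    """Count reads per sequence; each read is a (read_id, sequence_id) pair."""
    return Counter(sequence_id for _read_id, sequence_id in reads)

def analyze(copies, standard, tolerance=0.25):
    """Compare measured counts with standard values for each listed sequence."""
    results = []
    for sequence_id, expected in standard.items():
        if expected <= 0:
            raise ValueError("standard copy number must be positive")
        measured = copies.get(sequence_id, 0)
        deviation = (measured - expected) / expected
        status = "similar" if abs(deviation) <= tolerance else "variable"
        results.append(Result(sequence_id, measured, expected, status))
    return results

def display(results):
    """Render a plain-text report; a real system would feed a user interface."""
    return "\n".join(
        f"{r.sequence_id}\tmeasured={r.measured}\tstandard={r.expected}\t{r.status}"
        for r in results
    )

# Hypothetical received sequence data and standard; not real data.
reads = [("r1", "GENE_A"), ("r2", "GENE_A"), ("r3", "GENE_B"),
         ("r4", "GENE_A"), ("r5", "GENE_A")]
standard = {"GENE_A": 2, "GENE_B": 1}
results = analyze(parse_copies(reads), standard)
report = display(results)
print(report)
```

Keeping the four steps as separate functions mirrors the recited instructions and would allow each to be replaced independently, for example by swapping the plain-text report for a graphical CNV profile.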
[0018] Embodiments of the present disclosure provide technical solutions to the aforementioned technical problems associated with a non-invasive method for detecting cancer, at least by providing systems and methods for non-invasive detection of cancer (or ctDNA) in blood or other sampling. Further, embodiments of the present disclosure provide a technical solution to the technical problem associated with a lack of early detection methods for diagnosing cancer, at least by providing systems and methods for detection of cancer (or ctDNA) in early cancer stage(s). Further, embodiments of the present disclosure provide a technical solution to the technical problem associated with reducing the amount of time from biological sample receipt to cancer prognosis/diagnosis, at least by providing systems and methods for rapid detection of cancer (or ctDNA). Further, embodiments of the present disclosure provide a technical solution to the technical problem associated with reliable cancer prognoses/diagnoses that indicate tissue specificity and/or stage severity of cancer from a non-localized tissue sample (e.g., a fluid sample), at least by providing systems and methods for tissue specificity and/or stage severity detection of cancer.
[0019] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an indication of the scope of the claimed subject matter.
[0020] Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The features and advantages of the disclosure may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present disclosure will become more fully apparent from the following description and appended claims, or may be learned by the practice of the disclosure as set forth hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the disclosure briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the disclosure and are not therefore to be considered to be limiting of its scope. The disclosure will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
[0022] Figure 1 illustrates a frequency plot depicting genome-wide copy number variation (CNV) profile results for breast "Primary" tumor (n = 977) and "Normal" breast tissue adjacent to the Primary tumor samples (n = 128);
[0023] Figure 2 illustrates a series of frequency plots depicting genome-wide CNV profile results for the "Primary" breast tumor of Figure 1 with varying sample numbers (n = 5, 10, 20, 50, 100 and 977);
[0024] Figure 3 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) the "Primary" breast tumor of Figure 1 and "Normal" blood derived cells, (B) "Metastatic" breast tumor and the "Normal" blood derived cells, (C) the "Metastatic" breast tumor and the "Normal" breast tissue of Figure 1, (D) the "Metastatic" breast tumor and the "Primary" breast tumor of Figure 1, and (E) the "Normal" blood derived cells and the "Normal" breast tissue of Figure 1;
[0025] Figure 4 illustrates a frequency plot depicting genome-wide CNV profile results for nervous system primary tumor (n = 198);
[0026] Figure 5 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) brain "Primary" tumor and "Normal" brain tissue adjacent to Primary tumor, (B) "Recurrent" primary brain tumor and the "Normal" tissue adjacent to Primary tumor, and (C) the "Recurrent" primary brain tumor and the "Primary" brain tumor;
[0027] Figure 6 illustrates a frequency plot depicting genome-wide CNV profile results for the "Breast" Primary tumor of Figure 1, the "Nervous System" Primary tumor of Figure 4, and the "Brain" Primary tumor of Figure 5;
[0028] Figure 7 illustrates a series of frequency plots depicting genome-wide CNV profile results for bladder, blood, brain, breast, cervix, colorectal, head and neck, kidney, liver, lung, ovary, pancreas, prostate, skin, stomach, and uterus tumors;
[0029] Figure 8 illustrates a frequency plot depicting genome-wide CNV profile results for "Bone" tumor and "Nervous System" tumor;
[0030] Figure 9 illustrates differential thresholds for unique CNV events at 50%, 55%, 60%, 65% and 70% for the genome-wide CNV profile results of Figure 8;
[0031] Figure 10 illustrates a frequency plot depicting genome-wide CNV profile results for "Stage 1" Nervous System tumors and "Stage 4" Nervous System tumor;
[0032] Figure 11 illustrates differential thresholds for unique CNV events at 50%, 55%, 60%, 65% and 70% for the genome-wide CNV profile results of Figure 10;
[0033] Figure 12 illustrates a series of frequency plots depicting genome-wide CNV results at 1X, 3X, 5X, 7X, and 10X sequencing coverage as compared to CNV profile results for colorectal cancer; and
[0034] Figure 13 illustrates a schematic representation of a basic computing system according to one or more embodiments of the present disclosure.
DETAILED DESCRIPTION
[0035] Before describing various embodiments of the present disclosure in detail, it is to be understood that this disclosure is not limited to the specific parameters and description of the particularly exemplified systems, methods, and/or products that may vary from one embodiment to the next. Thus, while certain embodiments of the present disclosure will be described in detail, with reference to specific configurations, parameters, features (e.g., components, members, elements, parts, and/or portions), etc., the descriptions are illustrative and are not to be construed as limiting the scope of the present disclosure and/or the claimed invention. In addition, the terminology used herein is for the purpose of describing the embodiments, and is not necessarily intended to limit the scope of the present disclosure and/or the claimed invention.
[0036] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure pertains.
[0037] Various aspects of the present disclosure, including systems, methods, and/or products may be illustrated with reference to one or more embodiments or implementations, which are exemplary in nature. As used herein, the terms "embodiment" and "implementation" mean "serving as an example, instance, or illustration," and should not necessarily be construed as preferred or advantageous over other aspects disclosed herein. In addition, reference to an "implementation" of the present disclosure or invention includes a specific reference to one or more embodiments thereof, and vice versa, and is intended to provide illustrative examples without limiting the scope of the invention, which is indicated by the appended claims rather than by the description thereof.
[0038] As used herein, the term "systems" also contemplates devices, apparatus, compositions, assemblies, kits, etc., and vice versa. Similarly, the term "method" also contemplates processes, procedures, steps, etc., and vice versa. Moreover, the term "products" also contemplates devices, apparatus, compositions, assemblies, kits, etc., and vice versa, and so forth.
[0039] As used throughout this disclosure, the words "can" and "may" are used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Additionally, the terms "including," "having," "involving," "containing," "characterized by," variants thereof (e.g., "includes," "has," "involves," "contains," etc.), and similar terms as used herein, including the claims, shall be inclusive and/or open-ended, shall have the same meaning as the word "comprising" and variants thereof (e.g., "comprise" and "comprises"), and do not exclude additional, un-recited elements or method steps, illustratively.
[0040] As used in this specification and the appended claims, the singular forms "a," "an" and "the" each contemplate, include, and specifically disclose both the singular and plural referents, unless the context clearly dictates otherwise. For example, reference to a "nucleic acid" contemplates and specifically discloses one, as well as two or more nucleic acids. Similarly, use of a plural referent does not necessarily require a plurality of such referents, but contemplates, includes, and specifically discloses one, as well as two or more of such referents, unless the context clearly dictates otherwise.
[0041] It will also be appreciated that where two or more values, or a range of values (e.g., less than, greater than, at least, and/or up to a certain value, and/or between two recited values) is disclosed or recited, any specific value or range of values falling within the disclosed values or range of values is likewise specifically disclosed and contemplated herein. Thus, disclosure of an illustrative measurement (e.g., length, width, thickness, etc.) that is less than or equal to about 10 units or between 0 and 10 units includes, illustratively, a specific disclosure of: (i) a measurement of 9 units, 5 units, 1 unit, or any other value between 0 and 10 units, including 0 units and/or 10 units; and/or (ii) a measurement between 9 units and 1 unit, between 8 units and 2 units, between 6 units and 4 units, and/or any other range of values between 0 and 10 units.
[0042] While the detailed description is separated into sections, the section headers and contents within each section are not intended to be self-contained descriptions and embodiments. Rather, the contents of each section within the detailed description are intended to be read and understood as a collective whole where elements of one section may pertain to and/or inform other sections. Accordingly, embodiments specifically disclosed within one section may also relate to and/or serve as additional and/or alternative embodiments in another section having the same and/or similar systems, devices, methods, and/or terminology.
Abbreviated list of defined terms
[0043] To assist in understanding the scope and content of the foregoing and forthcoming written description and appended claims, a select few terms are defined directly below.
[0044] The term "cancer" refers to an abnormal, typically uncontrolled, growth of cells. A "cancerous cell" as used herein comprises a malignant cell having an abnormal, typically uncontrolled, growth. As such, the term cancer is an umbrella term encompassing a plurality of different distinctive diseases characterized by malignant cells growing in a typically uncontrolled manner.
[0045] The term "circulating tumor DNA" or "ctDNA" as used herein should be understood in its broadest sense to include cell free DNA circulating in the bloodstream that originated from a tumor cell. When contextually appropriate, ctDNA refers to cell free DNA circulating in the blood stream that originated from a cancerous tumor cell.
[0046] The term "copy number variation" (or "CNV") comprises any of one or more additions, duplications, insertions, deletions, etc. of genomic content at and around the genome, including within one or a plurality of distinct sites on any number of chromosomes. The sites of copy number variation comprise genes (exon and intron regions inclusive), promoter regions, intergenic regions, and may comprise any genomic location producing any of siRNA, miRNA, or other interfering RNA species. In general, the term "copy number variation" includes any of one or more additions, duplications, insertions, deletions, etc. of genomic content of any size and of any type.
[0047] For the purposes of this disclosure, the term "neoplasm" refers to new, uncontrolled growth of cells where the growth is not under physiologic control. A "neoplastic cell" as used herein comprises any of the cells of a neoplasm that are experiencing uncontrolled growth that is not under physiologic control. A neoplasm can be subclassified as either benign or malignant.
[0048] The term "tissue" refers to a biological sample derived from a patient's body and includes solid tissue and liquid tissue.
[0049] The term "patient" generally refers to any animal under the care of a physician, as that term is defined herein, with particular reference to humans under the care of a medical doctor or other relevant medical professional.
[0050] The term "physician" as used herein generally refers to a medical doctor. This term may, when contextually appropriate, include any medical professional, including an oncologist, a surgeon, or any licensed medical professional, such as a physician's assistant, a nurse, a phlebotomist, a veterinarian, etc.
[0051] The term "tumor" as used herein maintains its traditional understanding as any form of swelling or a growth or enlargement. A tumor may be subclassified as benign, precancerous, or cancerous. Further, as used herein, a tumor may not be neoplastic, making some neoplasms, such as leukemia and carcinoma, fall outside the scope of "tumors" as the term is defined herein. Nonetheless, when contextually appropriate, a "tumor" may be synonymous with a neoplasm, and further, a malignant neoplasm is synonymous with a cancerous tumor, as those terms are defined herein.
Overview of cancer prognostic and diagnostic methods
[0052] In addition to the problems mentioned above, current prognostic/diagnostic methods fail to adequately provide a means for accurately, quickly, and non-invasively identifying neoplasms and/or malignant neoplasms. As described above, traditional treatments rely on a foreknowledge of the location and/or cell type of neoplastic cells in order to screen for them and any associated cancer. These types of cancer prognostic/diagnostic techniques, though often useful in diagnosing late-stage cancers, are outdated and/or incapable of identifying most early-stage cancers, when treatments could be most effective.
[0053] Cancers, by definition, comprise neoplastic cells (if massed, they are often referred to as a cancerous tumor), and by their very nature, neoplastic cells comprise an unstable genome. This instability may present as one or more duplications, insertions, deletions, etc. of genomic content at and around the genome, including within one or a plurality of distinct sites on any number of chromosomes. Recent advances in the art have uncovered the presence of cell free DNA (cfDNA) in the blood of animals (with particular applications herein to humans and other higher, evolved animals), and in some circumstances, the cell free DNA may comprise circulating tumor DNA (ctDNA). In general, cfDNA can be isolated from the plasma portion of a blood sample, which lacks nucleated cells. If the cfDNA comprises ctDNA, this type of sampling provides a preferable non-invasive site for screening for the presence of cancer.
[0054] One such screening method currently exists and was innovated and commercialized by the assignee of the present application. In such a method, ctDNA is isolated from a plasma sample followed by deep sequencing and/or directed sequencing of specific target loci known to be correlated with one or more cancers. In general, these target loci represent genetic mutations (e.g., SNPs) in a gene and may be associated with one or more physiological, cancerous effects in the cell. This technique, however, is limited to the number of cancer-specific mutations (e.g., SNPs) identified for a specific cancer type, and some cancers lack consistent genetic mutations to accurately and consistently function as a diagnostic marker.
[0055] Additionally, identification of cancer-specific mutations through deep sequencing approaches may be prohibitively expensive and require a significant amount of resource allocation during both the sequencing and post-sequencing steps. These factors, though they can be ameliorated to a certain degree, can be a hurdle to a feasible and widespread adoption. For example, each known mutation (e.g., SNP) must be evaluated individually, and though there are a finite number of them to evaluate, there may be as many as 100 or more different loci to specifically evaluate for each sample, and when running a plurality of samples, the number and cost of laboratory equipment and reagents and/or computational systems necessary to provide high-throughput services may become a limiting factor. Additionally, the foregoing test may be prognostic/diagnostic if one or more mutations are indicated as present. However, if none of the tested mutations are present, the sample may yet represent an unknown cancer type; that is, a negative result on the foregoing test does not rule out a prognosis/diagnosis of cancer.
[0056] Accordingly, although ctDNA is promising for its diagnostic potential, previously identified systems and methods fail to translate the contents of ctDNA into a fast, reliable, and broad-range diagnostic tool.
Methods for detecting cancer in a biological sample
[0057] The present disclosure provides systems and methods for detecting cancer from a biological sample. In some embodiments, the present disclosure provides systems and methods that utilize nucleic acid (e.g., DNA, cfDNA, ctDNA, RNA, etc.) (or the sequence thereof) as a fast, reliable, and broad-range diagnostic tool. As an exemplary embodiment, nucleic acid can be isolated from a biological sample. For instance, cellular (i.e., nuclear) DNA can be isolated from (primary) tumor or other tissue. Alternatively, or in addition, cfDNA, such as ctDNA, can be isolated from a biological sample, such as a liquid biopsy (e.g., blood, plasma, serum, mucus, saliva, sputum, spinal fluid, etc.). At least a portion of the isolated nucleic acid (e.g., nuclear DNA or cfDNA) can be sequenced. For example, a nucleic acid library can be prepared of or from the isolated nucleic acid. At least a portion of the prepared nucleic acid library can be sequenced. The sequenced library can then be searched for relevant sequencing information.
[0058] In some embodiments, instead of (or in addition to) sequencing and/or searching for specific SNPs correlating with cancer, the sequenced nucleic acid (library) can be searched or probed for copy number variations, which— as described in more detail below— can be used to identify one or more of a cancer type, a tissue of origin of the nucleic acid (e.g., cfDNA) and/or cancer, a stage/severity of the cancer, etc., whether alone or in combination with other approaches such as, for example, nucleic acid methylation sequencing. Embodiments can also include measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library and/or comparing the measured number of copies with a standard copy number or CNV profile for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number.
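By way of illustration only, the comparison of a measured number of copies with a standard copy number can be sketched as a log2 ratio. The function names, the default standard copy number of 2 (a diploid assumption), and the 0.3 threshold below are hypothetical choices made for this sketch and are not taken from the present disclosure.

```python
import math


def log2_copy_ratio(measured_copies, standard_copies):
    """Return log2(measured / standard); a value of 0 indicates no change."""
    if measured_copies <= 0 or standard_copies <= 0:
        raise ValueError("copy numbers must be positive for a log2 ratio")
    return math.log2(measured_copies / standard_copies)


def classify_copy_number(measured_copies, standard_copies=2.0, threshold=0.3):
    """Label a locus as 'gain', 'loss', or 'neutral'.

    standard_copies=2.0 and threshold=0.3 are illustrative assumptions only.
    """
    ratio = log2_copy_ratio(measured_copies, standard_copies)
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "neutral"
```

Under these assumptions, a locus measured at three copies against a standard of two would be labeled a gain, and a locus measured at one copy would be labeled a loss.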
[0059] The detection size of copy number variation is preferably between about 1 Mb and about 20 Mb but may, in some embodiments, comprise any size and combination of copy number variations. In some embodiments, the detected size of copy number variation may range between about 10 kb - 100 Mb, 100 kb - 50 Mb, 500 kb - 50 Mb, 500 kb - 25 Mb, 750 kb - 25 Mb, 1 Mb - 25 Mb, or any other range where the lower end value and the higher end value are at least one of (or at least greater than for the lower end value or at least less than for the upper end value): 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb, 1 Mb, 2 Mb, 3 Mb, 4 Mb, 5 Mb, 6 Mb, 7 Mb, 8 Mb, 9 Mb, 10 Mb, 11 Mb, 12 Mb, 13 Mb, 14 Mb, 15 Mb, 16 Mb, 17 Mb, 18 Mb, 19 Mb, 20 Mb, 21 Mb, 22 Mb, 23 Mb, 24 Mb, 25 Mb, 26 Mb, 27 Mb, 28 Mb, 29 Mb, 30 Mb, 35 Mb, 40 Mb, 45 Mb, 50 Mb, 60 Mb, 70 Mb, 75 Mb, 80 Mb, 90 Mb, or 100 Mb, wherein the values between the lower end and the upper end are non-negative.
[0060] Embodiments of the present disclosure display surprising and unexpected results in that even at less granular detection levels (i.e., relatively shallow sequencing coverage), the copy number variation identification patterns of various cancer types are unique. Accordingly, in some embodiments of the present disclosure, sequence coverage is preferably between about 0.05X - 20X but may, in some embodiments, be less than or equal to any of about: 30X, 20X, 15X, 10X, 9X, 8X, 7X, 6X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, 0.9X, 0.8X, 0.75X, 0.7X, 0.6X, 0.5X, 0.4X, 0.3X, 0.25X, 0.2X, 0.1X, 0.05X, 0.025X, 0.01X, 0.005X, 0.0025X, or 0.0001X coverage, or any possible coverage value or range of coverage created between any two of the foregoing points (e.g., between about 0.0001X - 10X, 0.1X - 5X, 0.5X - 4X, 0.5X - 3X, 0.5X - 2.5X, 0.5X - 2X, 0.5X - 1X, 1X - 5X, 1X - 3X, 1X - 2.5X, 1X - 2X, 1.5X - 2.5X, 0.75X - 2X, etc.).
[0061] The foregoing coverage depths can be accomplished by any sequencing means known in the art. For instance, sequencing can be accomplished via single end sequencing and/or paired end sequencing, as known in the art. Moreover, sequencing of a defined region can be accomplished with relatively few, but longer (e.g., 100-200 base pair) reads or relatively many, but shorter (e.g., 25-100 base pair) reads spanning a specific or specified region. By way of example, approximately 1X coverage of a 250 base region can be accomplished with, for example, 2 fragments of 125 bases each, 5 fragments of 50 bases each, 10 fragments of 25 bases each, and other combinations as understood by those skilled in the art.
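As a non-limiting sketch of the arithmetic above, average coverage can be taken as the total number of sequenced bases divided by the length of the region. The function names are hypothetical, and the sketch assumes every read has the same length and falls entirely within the region.

```python
import math


def coverage(num_reads, read_length_bases, region_length_bases):
    """Average coverage = (number of reads x read length) / region length."""
    if region_length_bases <= 0:
        raise ValueError("region length must be positive")
    return num_reads * read_length_bases / region_length_bases


def reads_needed(target_coverage, read_length_bases, region_length_bases):
    """Smallest whole number of equal-length reads reaching the target coverage."""
    if read_length_bases <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_coverage * region_length_bases / read_length_bases)


# A 250 base region at approximately 1X coverage:
two_long_reads = coverage(2, 125, 250)     # 2 reads of 125 bases
five_medium_reads = coverage(5, 50, 250)   # 5 reads of 50 bases
short_reads_for_1x = reads_needed(1.0, 25, 250)  # number of 25 base reads for 1X
```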
[0062] In some embodiments, sequencing can be performed using reads of less than about 200 bp, 180 bp, 170 bp, 160 bp, 150 bp, 140 bp, 130 bp, 120 bp, 110 bp, 100 bp, 90 bp, 80 bp, 70 bp, 60 bp, 50 bp, 40 bp, 30 bp, 20 bp, 10 bp, or any value or range of values therebetween. For instance, in some embodiments, sequencing can be performed using reads of between about 10-200 bp, 25-150 bp, 25-100 bp, 25-50 bp, 50-150 bp, 50-100 bp, 100-150 bp, and so forth. In a preferred embodiment, sequencing can be performed using reads of about 20-30 bp, preferably about 24-28 bp, more preferably about 25-26 bp. One advantage of these smaller reads or ranges of reads is that gene or sequence copy numbers and/or locations can be more robustly identified, measured, and/or determined.
[0063] Depending on the technology used to sequence, summarized herein, some embodiments of the present disclosure can augment the sequence coverage by adjusting the total number of samples run on a single flow cell or similar sequencing input. For example, a flow cell having a maximum read output of 120 Gb may support 40 samples (which in some embodiments have a normalized input, and in the case of the human genome comprising ~3 Gb) at 1X coverage. The same flow cell could support 80 samples at 0.5X coverage or 20 samples at 2X coverage. In some embodiments, therefore, the sequence coverage may be adjusted depending on how many samples are being processed and the capacity of the flow cell (or similar input device for sequencing) being used. In some embodiments, lower sequencing coverages (e.g., 0.5X, 1X, 3X, 5X, 7X, etc.) can be illustrated by sequencing at a higher sequencing coverage (e.g., 10X, 7X, 5X, etc.) followed by in-silico down-sampling the higher sequence coverage sample. CNV profiles can be obtained, for example, by analysis performed using Nexus Copy Number software.
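The flow cell arithmetic and the in-silico down-sampling described above can be sketched, illustratively, as follows. The function names are hypothetical, the capacity calculation assumes normalized input with no overhead, and the down-sampling shown (keeping each read independently with probability target/source) is only one possible approach and is not asserted to be the procedure actually used.

```python
import random


def samples_per_flow_cell(flow_cell_output_gb, genome_size_gb, target_coverage):
    """Whole number of samples a flow cell can support at a given coverage."""
    if genome_size_gb <= 0 or target_coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return int(flow_cell_output_gb // (genome_size_gb * target_coverage))


def downsample_reads(reads, source_coverage, target_coverage, seed=0):
    """Randomly keep reads so the expected coverage drops to target_coverage.

    Each read is kept independently with probability target/source
    (an assumption of this sketch).
    """
    if not 0 < target_coverage <= source_coverage:
        raise ValueError("target coverage must be in (0, source coverage]")
    keep_probability = target_coverage / source_coverage
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [read for read in reads if rng.random() < keep_probability]
```

For the 120 Gb flow cell and ~3 Gb genome recited above, `samples_per_flow_cell(120, 3, 1.0)` evaluates to 40, with 80 samples at 0.5X and 20 samples at 2X.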
[0064] Certain advantages are associated with the inventive methods and systems provided herein. For example, because the samples can be acquired in a non-invasive manner and the detection of cancer types through copy number variation patterns is robust, it is, in some embodiments, possible to detect cancer at early stages and to be able to distinguish early stage from late stage cancer. Thus, at least some embodiments of the present disclosure are directed to systems and methods for detecting cancer and for differentiating between early-stage and late-stage cancer. In doing so, tangible benefits are afforded to the patient. Specialized and/or individualized treatment regimens can be prescribed to the patient based on the detection and determination of cancer type or stage. Having a better understanding and a finer precision for detecting cancer— as provided by implementations disclosed herein— can increase patient survival rates, decrease the amount or time to effectively treat, and/or reduce the incidence of misdiagnosis.
[0065] Additionally, the defined set of rules or criteria associated with the disclosed methods provide many benefits and improvements. For example, many more cancers are detectable using the methods disclosed herein than through other methods known in the art. This expands the number and types of cancers that would otherwise have gone undetected or misdiagnosed. As an additional example, performing low sequence coverage reads for each sample provides additional bandwidth on a single machine. This increases the efficiency of the device and allows more work to be done in less time (e.g., more samples can be processed in a single run on the machine and/or more samples can be processed in a given period of time). Further, implementations of the present disclosure allow for fewer resources to be consumed per sample, which allows for more efficient use of resources and/or less money spent per sample. These foregoing benefits can lower the effective cost (e.g., time and resource utilization), thereby affording an additional benefit and improvement. This may, in some embodiments, bring the monetary cost to consumers down to a level where it is practical to perform this exam as a precautionary/screening step during regular checkups. This satisfies a long felt but unmet need in the market and is likely to experience substantial economic success.
[0066] In addition to the copy number variation plots, one or more additional tools may be provided to increase the predictability and/or consistency of the results. For example, use of antibody-based assays and/or nucleic acid methylation sequencing techniques (e.g., bisulfite sequencing) may be paired with the copy number variation methods described herein. The combination thereof can provide an unexpected result of increased predictive capacity for the cancer type, tissue of origin, and/or stage/severity of cancer that, in some embodiments, may not be possible through inspection and/or analysis of either alone.
[0067] In one or more embodiments of the present disclosure, systems and methods comprise a sequencing step and/or the cfDNA/ctDNA is sequenced. It should be appreciated that a variety of sequencing techniques fall within the scope of the present disclosure and may be adopted for use in one or more of the disclosed systems and/or methods. In one or more embodiments, sequencing comprises the selective incorporation of chain-terminating di-deoxynucleotides, which were modified (e.g., fluorescent and/or radioactive) for reporting the site of incorporation. As a non-limiting example of the foregoing, sequencing comprises Sanger sequencing.
[0068] Additionally, or alternatively, embodiments of the present disclosure extend to and/or include so called "next generation sequencing" (NGS), also known as high-throughput sequencing. NGS refers to non-Sanger-based, high-throughput DNA sequencing technologies. Through NGS, millions or even billions of DNA strands can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes. NGS is the catch-all term used to describe a number of different modern sequencing technologies or platforms including, for example, pyrosequencing, sequencing by synthesis, sequencing by ligation, ion semiconductor sequencing, and others as known in the art.
[0069] NGS generally allows sequencing of large amounts of DNA and RNA much more quickly and affordably than Sanger sequencing. In NGS, vast numbers of short reads are sequenced in a single stroke. To do this, firstly the input sample can be cleaved into short sections. The length of these sections depends on the particular sequencing machinery used. Illustrative examples of specific NGS technologies include, for example, Illumina® (Solexa) sequencing, Roche 454™ sequencing, Ion Torrent™: Proton / PGM sequencing, SOLiD sequencing, and so forth.
[0070] In Illumina sequencing, 100-150bp reads are used. Somewhat longer fragments are ligated to generic adaptors and annealed to a slide using the adaptors. PCR is carried out to amplify each read, creating a spot with many copies of the same read. They are then separated into single strands to be sequenced. The slide is flooded with nucleotides and DNA polymerase. These nucleotides are fluorescently labelled, with the color corresponding to the base. They also have a terminator, so that only one base is added at a time. An image is taken of the slide. In each read location, there will be a fluorescent signal indicating the base that has been added. The slide is then prepared for the next cycle. The terminators are removed, allowing the next base to be added, and the fluorescent signal is removed, preventing the signal from contaminating the next image. The process is repeated, adding one nucleotide at a time and imaging in between. Computers are then used to detect the base at each site in each image and these are used to construct a sequence. All of the sequence reads will be the same length, as the read length depends on the number of cycles carried out.
[0071] Roche 454™ sequencing can generally sequence much longer reads than Illumina®. Like Illumina®, it does this by sequencing multiple reads at once by reading optical signals as bases are added. As in Illumina®, the DNA or RNA is fragmented into shorter reads, in this case up to 1 kb. Generic adaptors are added to the ends and these are annealed to beads, one DNA fragment per bead. The fragments are then amplified by PCR using adaptor-specific primers. Each bead is then placed in a single well of a slide. So each well will contain a single bead, covered in many PCR copies of a single sequence. The wells also contain DNA polymerase and sequencing buffers. The slide is flooded with one of the four dNTP species. Where this nucleotide is next in the sequence, it is added to the sequence read. If that single base repeats, then more will be added. So if we flood with Guanine bases, and the next in a sequence is G, one G will be added, however if the next part of the sequence is GGGG, then four Gs will be added. The addition of each nucleotide releases a light signal. These locations of signals are detected and used to determine which beads the nucleotides are added to. This dNTP mix is washed away. The next dNTP mix is now added and the process repeated, cycling through the four dNTPs. This kind of sequencing generates graphs for each sequence read, showing the signal density for each nucleotide wash. The sequence can then be determined computationally from the signal density in each wash. All of the sequence reads we get from 454 will be different lengths, because different numbers of bases will be added with each cycle.
[0072] Unlike Illumina® and Roche 454™, Ion Torrent™ and Ion Proton sequencing do not make use of optical signals. Instead, they exploit the fact that addition of a dNTP to a DNA polymer releases an H+ ion. As in other kinds of NGS, the input DNA or RNA is fragmented, this time into fragments of ~200 bp. Adaptors are added and one molecule is placed onto a bead. The molecules are amplified on the bead by emulsion PCR. Each bead is placed into a single well of a slide. Like Roche 454™, the slide is flooded with a single species of dNTP, along with buffers and polymerase, one dNTP at a time. The pH is detected in each of the wells, as each H+ ion released will decrease the pH. The changes in pH allow us to determine if that base, and how many thereof, was added to the sequence read. The dNTPs are washed away, and the process is repeated cycling through the different dNTP species. The pH change, if any, is used to determine how many bases (if any) were added with each cycle.
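The flow-based readout shared by the Roche 454™ and Ion Torrent™ approaches described above can be illustrated with a simplified, idealized sketch. It assumes a noise-free signal exactly proportional to the number of bases incorporated per flow; the function names and the flow order "TACG" are hypothetical and chosen for illustration only.

```python
def simulate_flows(sequence, flow_order="TACG", max_flows=1000):
    """Return (base, count) per flow for an idealized, noise-free run.

    'sequence' is the string of bases incorporated in order; 'count' is the
    homopolymer length added when the slide is flooded with 'base'.
    """
    if any(base not in flow_order for base in sequence):
        raise ValueError("sequence contains a base absent from the flow order")
    signals = []
    position = 0
    flow = 0
    while position < len(sequence) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # Every consecutive matching base is incorporated in the same flow.
        while position < len(sequence) and sequence[position] == base:
            count += 1
            position += 1
        signals.append((base, count))
        flow += 1
    return signals


def decode_flows(signals):
    """Rebuild the read from the per-flow signal intensities."""
    return "".join(base * count for base, count in signals)
```

In this sketch, a stretch of GGGG yields a single G flow with a signal of four, mirroring the homopolymer behavior described above, and reads of different lengths result from the same number of flows.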
[0073] Additionally, or alternatively, the sequencing may be more generally performed by a fluorescent-based sequencing technique and/or any electrical-current-based sequencing technique. Illustrative examples of fluorescent-based sequencing techniques include any technique that incorporates nucleotides conjugated to a fluorophore, such as, for example, sequencing using Illumina® based sequencing methods and systems. Illustrative examples of electrical-current-based sequencing techniques include any sequencing technique (including strand sequencing methods) that measures the electrical current of a polynucleotide as it passes through a pore inserted into a charged membrane or otherwise specifically disrupts the electrical current of a sensor and/or charged membrane. A non-limiting example of electrical-current-based sequencing techniques includes the Nanopore DNA sequencing systems and methods of Oxford NanoPore Technologies®.
[0074] Strand sequencing systems, such as those provided by Oxford NanoPore Technologies®, provide some advantages when determining copy number variation of a nucleic acid, particularly the copy number variation of a sample that potentially contains DNA (or other nucleic acid) from neoplastic and/or cancerous cells. For example, in strand sequencing techniques, a single portion of the genome is continuously sequenced, which allows a direct analysis of copy number variation instead of an implicit analysis of copy number variation that may occur when analyzing sequencing data provided by other sequencing methods where the sample nucleic acid is cut into small fragments for sequencing. This may be particularly advantageous for embodiments when sequence coverage is low. That is, in some embodiments, a low sequence coverage run may return an incomplete set of genomic data. It may be possible to infer from the sequence data the presence and/or absence of genomic regions in addition to an implicit copy number for each sequenced region. However, in a strand sequencing method, the long sequence reads produced may allow for a more definitive assessment of copy number variation, particularly for regions that are duplicated or deleted. If a full sequence is not available due to the low coverage of the sequencing run, it may be difficult to determine what portions of the genome are deleted (a form of copy number variation) versus what portions of the genome were not represented based on statistical probability (i.e., random sampling).
[0075] As an illustrative example, a sequencing run that generates data having 0.5X coverage will theoretically leave half of the sample unrepresented. Using sequencing methods that "chop up" the nucleic acid into small fragments for sequencing, the final product may be a sequence library representing about half of the total reference genome, where an aligned reference genome is littered with a smattering of smaller nucleic acid matches. On the other hand, using a strand sequencing method, again at low coverage (e.g., 0.5X), the result may be a sequence library representing, again, about half of the total reference genome. However, when aligned with a reference genome, the matching portions are much longer and may provide more definitive information, such as what sequences have been deleted, duplicated, inserted, etc. This may also prove problematic. While a longer contiguous portion of the genome may be represented by a strand sequencing approach, long contiguous portions of the genome are also left unknown. So, although strand sequencing methods may allow for a higher definition view of portions of the genome, smaller sequencing reads have the potential to provide a more global picture of the entire genome. In this and other ways, strand sequencing may provide a robust model for analyzing copy number variation.
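As a hedged aside, the fraction of a genome represented at a given coverage can be estimated under an idealized model in which reads land uniformly at random and per-base depth is Poisson distributed. That model is an assumption introduced for this sketch and is not part of the present disclosure; the function name is hypothetical.

```python
import math


def expected_fraction_covered(mean_coverage):
    """Expected fraction of bases covered at least once.

    Assumes per-base depth is Poisson distributed with the given mean
    (an idealization; real libraries show bias and duplicates).
    """
    if mean_coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-mean_coverage)


fraction_at_half_x = expected_fraction_covered(0.5)  # about 0.39
fraction_at_one_x = expected_fraction_covered(1.0)   # about 0.63
```

Under this idealization, some reads overlap one another, so a 0.5X run would be expected to represent somewhat less than half of the reference genome, which is consistent with the point that low coverage leaves substantial portions unrepresented by random sampling alone.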
[0076] Though the foregoing is illustrative of known sequencing techniques and their applications to the inventive methods and systems disclosed herein, it should be understood that this does not preclude as yet undiscovered or otherwise undisclosed sequencing methods from being applied within the scope of the present invention. That is, the sequencing method, itself, is not, in many embodiments, a requisite inventive step (unless, for example, an improvement is provided to the method and/or system through use of a particular sequencing technique); rather, what is done with the sequencing data provided by the sequencing method and/or how those data are applied generally comprises an inventive step. Accordingly, it should be appreciated that future sequencing technologies (and those sequencing technologies that have not been explicitly listed herein), if used as a tool in the disclosed method or systems, are included within the scope of this application.
[0077] Additionally, any of the foregoing sequencing techniques may be used in any number or capacity and with any number of flow cells or other similar inputs that affect the total number of sequencing reads provided for each sequencing reaction/run.
[0078] Due to the genomic instability of neoplastic cells, there are inevitably certain copy number variations that arise, and by processing and analyzing the sequences of a plurality of samples comprising cancer DNA, it is possible to select a locus or multiple loci that, in combination, provide one or more copy number variation plots that are illustrative and/or indicative of one or more cancer phenotypes (e.g., cancer type, tissue of origin, and/or stage/severity of cancer, etc.).
[0079] For example, the accompanying figures depict comparison plots that, in some instances (e.g., sample types, chromosomal locations, etc.), illustrate the frequency of copy number gains (black bars; above the reference line) and/or losses (grey bars; below the reference line) in particular genomic regions and/or for a variety of cancer classifiers (e.g., cancerous cells or cell types, such as primary tumors, cancer stage, etc.) and/or a variety of normal (e.g., non-cancerous) tissues and/or biological samples. As a non-limiting example, some of the figures illustrate CNV plots comparing gene copy numbers observed in primary (solid) tumors, normal primary tissue adjacent the cancerous and/or tumor tissue, metastatic tumors, and so forth. In some instances, a stark contrast in the copy number at one or more particular loci may be observed in cancer cells. Additionally, these copy number variation patterns may not be observed in normal tissue, whether derived from liquid (e.g., blood, etc.) or solid tissue. Additionally, or alternatively, the figures generally illustrate that the copy number variation pattern of certain cancer types (e.g., breast cancer) may be evident even when comparing a low number of samples (e.g., n = 5, n = 10, n = 20, etc.), and as the number of samples increases, the copy number variation pattern may become more solidified (or significant) in some areas.
[0080] Unique copy number variation patterns can be significant, in some embodiments. Further, a plurality of cancer types, tissues of origin, and/or cancer stage/severity may, in some embodiments, comprise unique copy number variation patterns. In some embodiments, the copy number variation patterns may span a single portion of a single chromosomal region while in other embodiments, the copy number variation patterns are more nuanced and comprise smaller portions of a plurality of chromosomes in combination.
[0081] Figure 1 illustrates a frequency plot depicting genome-wide CNV profile results in human breast "Primary" tumor (n = 977) and "Normal" breast tissue adjacent to the Primary tumor samples (n = 128), relative to a consensus genome sequence. The plot is segmented into a chromosome map, including chromosomes 1-22, X, and Y. Duplication events, or copy number gains, are depicted with black bars above the respective sample reference lines. Deletion events, or copy number losses, are depicted with grey bars below the respective sample reference lines. The relative size (i.e., length, height, etc.) of the black and grey bars is representative of the 'frequency' of the corresponding CNV event. In particular, longer (or taller) bars indicate regions where a higher percentage of the total samples in the study (i.e., more samples) were positive for the CNV event, while shorter bars indicate regions where a lower percentage of the total samples in the study (i.e., fewer samples) were positive for the CNV event. Illustrative regions of "Significant" CNV, relative to the consensus genome, are indicated with black shading (for duplications) and grey shading (for deletions).
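The 'frequency' represented by the bar heights described above can be sketched, illustratively, as the fraction of samples positive for a gain or a loss in each region. The data layout (one dictionary per sample mapping a region label to "gain", "loss", or "neutral"), the region labels, and the function name are hypothetical and are not drawn from the figures.

```python
def cnv_frequencies(sample_calls):
    """Per-region fraction of samples called as a gain and as a loss.

    sample_calls: list of dicts, one per sample, mapping a region label to
    'gain', 'loss', or 'neutral' (a missing region counts as neutral).
    """
    total = len(sample_calls)
    if total == 0:
        raise ValueError("at least one sample is required")
    regions = sorted({region for calls in sample_calls for region in calls})
    frequencies = {}
    for region in regions:
        gains = sum(1 for calls in sample_calls if calls.get(region) == "gain")
        losses = sum(1 for calls in sample_calls if calls.get(region) == "loss")
        frequencies[region] = {"gain": gains / total, "loss": losses / total}
    return frequencies


# Hypothetical calls for four samples over two made-up regions.
example_calls = [
    {"regionA": "gain", "regionB": "loss"},
    {"regionA": "gain", "regionB": "neutral"},
    {"regionA": "gain", "regionB": "loss"},
    {"regionA": "neutral"},
]
example_frequencies = cnv_frequencies(example_calls)
```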
[0082] These results illustrate the contrast between observable CNV in cancerous and non-cancerous breast tissue. Some significant regions of CNV duplications in breast cancer include, without limitation, respective parts of chromosomes 1, 5, 8, 16, 17, and 20. Some significant regions of CNV deletion in breast cancer include, without limitation, respective parts of chromosomes 8, 11, 13, 16, 17, and 22. These results indicate that breast cancer tumors, illustratively, have a CNV signature that distinguishes such tumors from normal tissue. Diagnostically, breast tissue presenting the illustrated pattern can be classified as cancerous, likely-to-be-cancerous, in danger of being cancerous, or otherwise associated with a cancer profile. Thus, breast tissue suspected of being cancerous can be biopsied and sequenced (at relatively low sequencing coverage (e.g., less than or equal to 5X coverage)) for CNV, rather than SNP, to provide a preliminary or even final diagnosis. Low coverage sequencing can be performed quickly to provide clinical indications of cancerous or cancer-prone tissues. A subject CNV profile (for breast tissue suspected of being cancerous) can be compared to one or more standard, breast cancer CNV profiles to observe and/or measure similarities and/or differences between the subject and the standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a breast cancer profile, etc.
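One non-limiting way to measure similarity between a subject CNV profile and one or more standard CNV profiles is a correlation score over matched genomic regions. The sketch below uses a Pearson correlation as an assumed similarity measure; the disclosure does not specify a particular metric, and the function names, profile names, and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate against
    return covariance / math.sqrt(var_x * var_y)


def best_matching_profile(subject, standards):
    """Return (name of the most similar standard profile, all scores)."""
    scores = {name: pearson(subject, profile) for name, profile in standards.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-region log2 copy ratios, for illustration only.
subject_profile = [0.5, -0.4, 0.0, 0.6]
standard_profiles = {
    "standard_cancer_profile": [0.6, -0.5, 0.1, 0.7],
    "standard_normal_profile": [0.0, 0.0, 0.0, 0.0],
}
best_name, all_scores = best_matching_profile(subject_profile, standard_profiles)
```

In practice, any threshold for calling a subject profile "associated with" a standard profile would need to be established empirically; none is assumed here.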
[0083] Figure 2 illustrates a series of frequency plots depicting genome-wide CNV profile results for the "Primary" breast tumor of Figure 1 with varying sampling sizes (n = 5, 10, 20, 50, 100 and 977). These results illustrate that the CNV pattern of breast cancer, illustratively, is evident even in a low number of samples (e.g., n = 5, n = 10, n = 20, etc.), and that as the number of samples increases (e.g., n = 50, n = 100, etc.), the CNV pattern may become more solidified (or significant) in some areas. Accordingly, even rare cancer CNV profiles can be assembled using a few samples of cancerous (and normal, control) tissue.
[0084] Figure 3 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) the "Primary" breast tumor of Figure 1 and "Normal" blood derived cells, (B) "Metastatic" breast tumor and the "Normal" blood derived cells, (C) the "Metastatic" breast tumor and the "Normal" breast tissue of Figure 1, (D) the "Metastatic" breast tumor and the "Primary" breast tumor of Figure 1, and (E) the "Normal" blood derived cells and the "Normal" breast tissue of Figure 1. These results illustrate that a variety of cancerous tissue sources can be used to investigate CNV patterns in cancer. Accordingly, these various tissue sources can each be used diagnostically when detecting cancer in a biological sample. Moreover, as indicated above, even the six (6) Metastatic breast tumor tissue samples were effective to assemble a genome-wide CNV profile. Moreover, as observed in plots (B)-(D), metastatic breast tumor samples (or CNV profiles thereof), which have unique CNV event(s) in chromosome 15, for example, can be distinguished, not only from normal tissue samples (or CNV profiles thereof), as in plots (B) and (C), but also from primary breast tumor samples (or CNV profiles thereof), as in plot (D).
[0085] Figure 4 illustrates a frequency plot depicting genome-wide CNV profile results for nervous system primary tumors (n = 198). These results illustrate that, like breast tumors, nervous system tumors can be characterized by a CNV pattern. Diagnostically, nervous system tissue presenting the illustrated CNV profile (or CNV profile significantly similar thereto) can also be classified as being associated with a cancer profile, etc. Similarly, a subject CNV profile (for nervous system tissue suspected of being cancerous) can be compared to one or more standard, nervous system cancer CNV profiles to observe and/or measure similarities and/or differences between the subject and the standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a nervous system cancer profile, etc.
[0086] Figure 5 illustrates a series of frequency plots depicting genome-wide CNV profile results for: (A) brain "Primary" tumor and "Normal" brain tissue adjacent to Primary tumor, (B) "Recurrent" primary brain tumor and the "Normal" tissue adjacent to Primary tumor, and (C) the "Recurrent" primary brain tumor and the "Primary" brain tumor. These results illustrate that, like breast tumors and nervous system tumors, brain tumors can be characterized by a CNV pattern. Diagnostically, brain tissue presenting the illustrated CNV profile (or CNV profile significantly similar thereto) can also be classified as being associated with a cancer profile, etc. Similarly, a subject CNV profile (for brain tissue suspected of being cancerous) can be compared to one or more standard, brain cancer CNV profiles to observe and/or measure similarities and/or differences between the subject and the standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a brain cancer profile, etc.
[0087] Figure 6 illustrates a frequency plot depicting genome-wide CNV profile results for the "Breast" Primary tumor of Figure 1, the "Nervous System" Primary tumor of Figure 4, and the "Brain" Primary tumor of Figure 5. These results illustrate that various cancer types can be distinguished one from another based on CNV profile. Accordingly, in addition to classifying a tissue sample as being associated with a cancer profile, etc., CNV profiles (e.g., obtained through low-coverage sequencing) can be diagnostic for cancer type. Diagnostically, a subject CNV profile for any tissue can be compared to various standard, cancer-type CNV profiles to observe and/or measure similarities and/or differences between the subject and standard CNVs. Based on the similarities and/or differences, the subject tissue can be diagnosed as being associated with a specific cancer type profile, etc.
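The comparison of a subject CNV profile to several standard, cancer-type CNV profiles can be sketched, under stated assumptions, as a ranking of the standards by a similarity score. The sketch below uses cosine similarity over per-region signed values (for the standards, an assumed net frequency of gain percentage minus loss percentage). The metric, the names `cosine_similarity` and `rank_standards`, the labels `type_a` through `type_c`, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length profiles (0.0 if either is all zero)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same regions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_standards(
    subject: Sequence[float], standards: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs ordered from most to least similar."""
    scored = [(name, cosine_similarity(subject, profile)) for name, profile in standards.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy standards (made up): signed net frequency per region, in percent.
toy_standards = {
    "type_a": [80.0, 5.0, -60.0],
    "type_b": [-40.0, 70.0, 10.0],
    "type_c": [10.0, 10.0, 10.0],
}
toy_subject_profile = [1.0, 0.0, -1.0]  # gain in region 0, loss in region 2
toy_ranking = rank_standards(toy_subject_profile, toy_standards)
```

The first entry of the ranking would be the standard most similar to the subject profile under this assumed metric.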
[0088] Figure 7 illustrates a series of frequency plots depicting genome-wide CNV profile results for bladder, blood, brain, breast, cervix, colorectal, head and neck, kidney, liver, lung, ovary, pancreas, prostate, skin, stomach, and uterus tumors. These results further illustrate the variety of cancer types that can be distinguished one from another based on CNV profile.
[0089] Figure 8 illustrates a frequency plot depicting genome-wide CNV profile results for "Bone" tumor and "Nervous System" tumor, further illustrating the variety of cancer types that can be distinguished one from another based on CNV profile.
[0090] Figure 9 illustrates differential thresholds for the unique CNV events illustrated in the genome-wide CNV profile results of Figure 8. In particular, the chart presents the number of CNV events having a significant, minimum difference percentage between the two tissue types (bone and nervous system tumors), with a p-value threshold of 0.05. Illustratively, 2737 different CNV events were unique (i.e., observed in bone tumors and not in nervous system tumors, or vice versa), having at least a 50% difference. Similarly, 61 different CNV events were unique with at least a 70% difference. A higher number of CNV events at a higher differential threshold percentage indicates less similarity between the two groups.
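A minimal sketch of such a differential-threshold count follows. It counts the CNV events whose frequency differs between two groups by at least a given number of percentage points and whose difference is significant at the 0.05 level. The disclosure does not state which significance test was used; the pooled two-proportion z-test below is a swapped-in assumption, and the function names and toy counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 1.0  # both groups all-positive or all-negative: no evidence of a difference
    z = (x_a / n_a - x_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def count_differential_events(
    events: Sequence[Tuple[int, int]],
    n_a: int,
    n_b: int,
    min_diff_pct: float,
    alpha: float = 0.05,
) -> int:
    """Count events with |freq_a - freq_b| >= min_diff_pct (percentage points) and p < alpha.

    Each event is (positive samples in group A, positive samples in group B).
    """
    count = 0
    for x_a, x_b in events:
        diff_pct = abs(100.0 * x_a / n_a - 100.0 * x_b / n_b)
        if diff_pct >= min_diff_pct and two_proportion_p_value(x_a, n_a, x_b, n_b) < alpha:
            count += 1
    return count


# Toy counts (made up): three CNV events, 100 samples per group.
toy_events = [(90, 10), (60, 5), (50, 45)]
counts_by_threshold = {
    t: count_differential_events(toy_events, 100, 100, t) for t in (50, 70)
}
```

Sweeping the threshold, as in the dictionary above, yields the kind of count-per-threshold chart described for Figure 9.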
[0091] Figure 10 illustrates a frequency plot depicting genome-wide CNV profile results for "Stage 1" Nervous System tumors and "Stage 4" Nervous System tumors. Differences in CNV profile between the two samples are readily observable, indicating that cancer stage/severity is also distinguishable using CNV profile comparison. Diagnostically, a cancer sample can be staged and even graded based on similarities and/or differences between the subject and standard CNVs.
[0092] Figure 11 illustrates differential thresholds for the unique CNV events illustrated in the genome-wide CNV profile results of Figure 10.
[0093] Figure 12 illustrates a (diagnostic) series of frequency plots depicting genome-wide CNV results for a patient at 1X, 3X, 5X, 7X, and 10X sequencing coverage as compared to CNV profile results for colorectal cancer. The CNV patient results can be obtained from ctDNA (circulating tumor DNA), tumor, and/or polyp tissue nucleic acid analysis. Illustratively, the CNV patient results can be obtained by sequencing the cfDNA (cell free DNA) isolated from the blood of a colorectal cancer patient. The CNV patient results can be compared and/or matched with a colorectal cancer CNV profile. The profile can be obtained by compiling CNV sequencing data (obtained in advance, publicly available, and updated periodically).
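For context on the coverage levels named above, mean sequencing coverage is conventionally estimated as the number of reads multiplied by the read length and divided by the genome size. The sketch below applies that relationship; the genome size of 3,000,000,000 bases and the read length of 100 bases are round-number assumptions chosen for illustration and are not values from this disclosure.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads giving at least the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed, illustrative parameters (not from the source).
ASSUMED_GENOME_SIZE = 3_000_000_000  # bases
ASSUMED_READ_LENGTH = 100  # bases

reads_by_coverage = {
    c: reads_needed(c, ASSUMED_READ_LENGTH, ASSUMED_GENOME_SIZE) for c in (1, 3, 5, 7, 10)
}
```

Under these assumptions, the number of reads scales linearly with coverage, which is one reason low-coverage sequencing can be performed comparatively quickly.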
[0094] As an initial matter, these results indicate that colorectal cancer also has a CNV signature. For instance, the profile illustrates a variety of CNV events (e.g., duplications and deletions) across the represented genome plot. At several (e.g., each) of the coverage levels, the patient sample CNV matches the colorectal cancer CNV profile (see, e.g., CNV gains (indicated with black arrows) at chromosome 5p, 7, 8, 9p, 13, 16, 19 and 20, illustratively, and CNV losses (indicated with grey arrows) at chromosome 1p, 4, 14, 15, 17, and 22).
[0095] Diagnostically, the foregoing illustrates how a patient sample can be analyzed through relatively low-coverage sequencing to detect CNV present in the sample. The detected CNV can be compared to and/or matched with one or more existing cancer profiles in order to associate the patient sample with a cancer CNV profile. This association can be informative, predictive, and/or diagnostic for the cancer or other cancer condition, such as predisposition, early development, cancer classification, metastasis, stage/grade, etc. Illustratively, cfDNA from blood or other bodily fluid can be sampled non-invasively and sequenced to provide a diagnosis or indication that the patient is cancerous, likely-to-be-cancerous, in danger of being cancerous, or otherwise associated with a cancer profile.
[0096] Figures 1-12 thus illustrate the unique nature of cancer CNV profiles, including cancer origin, type, classification, stage, etc., as compared to normal tissue samples, related cancer origin, type, classification, stage, etc., and different cancer origin, type, classification, stage, etc. Figures 1-12 also illustrate differential threshold and significance determination for comparative samples, as well as the diagnostic relevance of comparing a single, patient sample CNV plot with one or more cancer CNV profiles. Based on the similarities and/or differences between the patient CNV and the profile CNV, the patient can be diagnosed as being associated with a particular cancer profile. The cancer profile can be indicative or representative of cancer origin, type, classification, stage, etc.
[0097] Certain embodiments of the present disclosure comprise methods for detecting cancer and/or cancer nucleic acid (e.g., DNA, cfDNA, ctDNA, RNA, etc.) in a biological sample, such as tumor tissue or a liquid biopsy. A liquid biopsy, for the purposes of this disclosure, comprises a biological fluid sample (e.g., a fluid sample taken from a patient). It may, for example, include any of whole blood, serum, plasma, cerebrospinal fluid, tumor fluid, interstitial fluid phase, any other relevant bodily fluid, and combinations thereof. The methods may further comprise isolating cfDNA from the liquid biopsy wherein the cfDNA comprises one or more genetic elements. The method may further comprise determining a copy number of the one or more genetic elements and comparing the determined copy number to one or more known copy number standards for ctDNA.
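One non-limiting way to determine a relative copy number of genetic elements from low-coverage sequencing is a read-depth approach: read counts per genomic bin in the sample are normalized to a reference and expressed as a ratio, and bins whose log2 ratio exceeds a cutoff are called as gains or losses. The sketch below assumes that approach. The function names, the cutoff of 0.3, and the toy counts are hypothetical and are not taken from this disclosure.

```python
import math
from typing import List, Sequence


def copy_ratios(sample_counts: Sequence[int], reference_counts: Sequence[int]) -> List[float]:
    """Per-bin ratio of the sample's read fraction to the reference's read fraction."""
    if len(sample_counts) != len(reference_counts):
        raise ValueError("sample and reference must use the same bins")
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    if s_total == 0 or any(r == 0 for r in reference_counts):
        raise ValueError("sample must have reads and every reference bin must be non-empty")
    return [(s / s_total) / (r / r_total) for s, r in zip(sample_counts, reference_counts)]


def call_cnv(ratios: Sequence[float], log2_cutoff: float = 0.3) -> List[int]:
    """Call +1 (gain), -1 (loss), or 0 per bin using an assumed log2-ratio cutoff."""
    calls = []
    for ratio in ratios:
        log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
        if log2_ratio >= log2_cutoff:
            calls.append(1)
        elif log2_ratio <= -log2_cutoff:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Toy read counts (made up) for four bins.
toy_sample_counts = [100, 150, 50, 100]
toy_reference_counts = [100, 100, 100, 100]
toy_ratios = copy_ratios(toy_sample_counts, toy_reference_counts)
toy_calls = call_cnv(toy_ratios)
```

The resulting calls could then be compared to one or more known copy number standards, as described above.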
[0098] In one or more embodiments of the present disclosure, the foregoing method may additionally comprise assembling a genetic profile of the cfDNA, wherein the genetic profile comprises a representation of the relative abundance of the one or more genetic elements in the cfDNA. In the same or additional embodiments, the method may further comprise detecting the presence of ctDNA in the liquid biopsy by measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA. Similar to the methods described above, the method may further comprise determining a tissue of origin and/or a stage/severity of cancer based on the similarity of the measured number of copies and the standard number of copies.
[0099] In one or more embodiments, the detection of ctDNA in a liquid biopsy comprises detecting a copy number variation pattern in the ctDNA and/or in one or more genetic elements derived from the cfDNA. In some cases, the cfDNA and the ctDNA are the same molecules.
[00100] Any of the methods disclosed herein may additionally comprise a step of implementing a cancer-specific treatment based on the identified cancer type, tissue of origin, and/or stage/severity of the identified cancer. Additionally, or alternatively, one or more therapeutics are prescribed and/or one or more surgeries are performed to mitigate the potential harms of leaving the identified cancer untreated.
[00101] Computing systems are increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframes, distributed computing systems, datacenters, or even devices that have not conventionally been considered a computing system, such as wearables (e.g., glasses). In this description and in the claims, the term "computing system" is defined broadly as including any device or system, or combination thereof, that includes at least one physical and tangible processor and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by a processor. The memory may take any form and may depend on the nature and form of the computing system. A computing system may be distributed over a network environment and may include multiple constituent computing systems.
[00102] As illustrated in Figure 13, a basic configuration of a computing system 100 typically includes at least one hardware processing unit 102 and memory 104. The memory 104 may be physical system memory, which may be volatile, nonvolatile, or some combination of the two. The term "memory" may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory, and/or storage capability may be distributed as well.
[00103] The computing system 100 also has thereon multiple structures often referred to as an "executable component." For instance, the memory 104 of the computing system 100 is illustrated as including executable component 106. The term "executable component" is the name for a structure that is well understood to one of ordinary skill in the art in the field of computing as being a structure that can be software, hardware, or a combination thereof. For instance, when implemented in software, one of ordinary skill in the art would understand that the structure of an executable component may include software objects, routines, methods, and so forth, that may be executed on the computing system, whether such an executable component exists in the heap of a computing system, or whether the executable component exists on computer-readable storage media.
[00104] In such a case, one of ordinary skill in the art will recognize that the structure of the executable component exists on a computer-readable medium such that, when interpreted by one or more processors of a computing system (e.g., by a processor thread), the computing system is caused to perform a function. Such structure may be computer-readable directly by the processors— as is the case if the executable component were binary. Alternatively, the structure may be structured to be interpretable and/or compiled— whether in a single stage or in multiple stages— so as to generate such binary that is directly interpretable by the processors. Such an understanding of exemplary structures of an executable component is well within the understanding of one of ordinary skill in the art of computing when using the term "executable component."
[00105] The term "executable component" is also well understood by one of ordinary skill as including structures that are implemented exclusively or near-exclusively in hardware, such as within a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), Application-Specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), or any other specialized circuit. Accordingly, the term "executable component" is a term for a structure that is well understood by those of ordinary skill in the art of computing, whether implemented in software, hardware, or a combination. In this description, the terms "component," "service," "engine," "module," "control," "generator," or the like may also be used. As used in this description and in this case, these terms, whether expressed with or without a modifying clause, are also intended to be synonymous with the term "executable component," and thus also have a structure that is well understood by those of ordinary skill in the art of computing.
[00106] In the description that follows, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors (of the associated computing system that performs the act) direct the operation of the computing system in response to having executed computer-executable instructions that constitute an executable component. For example, such computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product. An example of such an operation involves the manipulation of data.
[00107] The computer-executable instructions (and the manipulated data) may be stored in the memory 104 of the computing system 100. Computing system 100 may also contain communication channels 108 that allow the computing system 100 to communicate with other computing systems over, for example, network 110.
[00108] While not all computing systems require a user interface, in some embodiments the computing system 100 includes a user interface 112 for use in interfacing with a user. The user interface 112 may include output mechanisms 112A as well as input mechanisms 112B. The principles described herein are not limited to the precise output mechanisms 112A or input mechanisms 112B as such will depend on the nature of the device. However, output mechanisms 112A might include, for instance, speakers, displays, tactile output, holograms and so forth. Examples of input mechanisms 112B might include, for instance, microphones, touchscreens, holograms, cameras, keyboards, mouse or other pointer input, sensors of any type, and so forth.
[00109] Embodiments described herein may comprise or utilize a special purpose or general-purpose computing system including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments described herein also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computing system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example— not limitation— embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: storage media and transmission media.
[00110] Computer-readable storage media include RAM, ROM, EEPROM, solid state drives ("SSDs"), flash memory, phase-change memory ("PCM"), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other physical and tangible storage medium which can be used to store desired program code in the form of computer-executable instructions or data structures and which can be accessed and executed by a general purpose or special purpose computing system to implement the disclosed functionality of the invention.
[00111] A "network" is defined as one or more data links that enable the transport of electronic data between computing systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computing system, the computing system properly views the connection as a transmission medium. Networks may be "private" or they may be "public," or networks may share qualities of both private and public networks. A private network may be any network that has restricted access such that only the computer systems and/or modules and/or other electronic devices that are provided and/or permitted access to the private network may transport electronic data through the one or more data links that comprise the private network. A public network may, on the other hand, not restrict access and allow any computer systems and/or modules and/or other electronic devices capable of connecting to the network to use the one or more data links comprising the network to transport electronic data.
[00112] For example, a private network found within an organization, such as a private business, restricts transport of electronic data between only those computer systems and/or modules and/or other electronic devices within the organization. Conversely, the Internet is an example of a public network where access to the network is, generally, not restricted. Computer systems and/or modules and/or other electronic devices may often be connected simultaneously or serially to multiple networks, some of which may be private, some of which may be public, and some of which may be varying degrees of public and private. For example, a laptop computer may be permitted access to a closed network, such as a network for a private business that enables transport of electronic data between the computing systems of permitted business employees, and the same laptop computer may also access an open network, such as the Internet, at the same time or at a different time as it accesses the exemplary closed network.
[00113] Transmission media can include a network and/or data links which can be used to carry desired program code in the form of computer-executable instructions or data structures and which can be accessed and executed by a general purpose or special purpose computing system. Combinations of the above should also be included within the scope of computer-readable media.
[00114] Further, upon reaching various computing system components, program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC") and then eventually transferred to computing system RAM and/or to less volatile storage media at a computing system. Thus, it should be understood that storage media can be included in computing system components that also, or even primarily, utilize transmission media.
[00115] Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computing system, special purpose computing system, or special purpose processing device to perform a certain function or group of functions. Alternatively, or additionally, the computer-executable instructions may configure the computing system to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries or even instructions that undergo some translation (such as compilation) before direct execution by the processors, such as intermediate format instructions like assembly language, or even source code.
[00116] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
[00117] Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computing system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, tablets, mobile telephones, PDAs, pagers, routers, switches, datacenters, wearables (e.g., glasses) and the like. The invention may also be practiced in distributed system environments where local and remote computing systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
[00118] Those skilled in the art will also appreciate that the invention may be practiced in a cloud computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and the following claims, "cloud computing" is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of "cloud computing" is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
[00119] A cloud-computing model can be composed of various characteristics, such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud-computing model may also come in the form of various service models such as, for example, Software as a Service ("SaaS"), Platform as a Service ("PaaS"), and Infrastructure as a Service ("IaaS"). The cloud-computing model may also be deployed using different deployment models such as private cloud, community cloud, public cloud, hybrid cloud, and so forth.
[00120] Computing systems of the present invention may be any computing systems as previously described and adapted for compiling, storing, analyzing, parsing, displaying, and/or communicating one or more portions of sequence data derived from a biological sample as previously described. The foregoing functionalities (i.e., compiling, storing, analyzing, parsing, displaying, and/or communicating) may be implemented by any module, system component, computer-executable instruction, etc. known in the art.
[00121] Embodiments may comprise compiling the sequencing data, which may be performed, for example, by a compiling module. For example, this may comprise concatenating one or more sequencing reads and/or assembling a genome based on the one or more sequencing reads. In one or more embodiments, compiling sequencing reads comprises generating a sequencing profile and/or copy number variation profile of the DNA (e.g., cfDNA or ctDNA) obtained from the biological sample.
[00122] Embodiments may comprise storing the sequencing data. The storing may be in any of the aforementioned storage methods (e.g., volatile, non-volatile, local, networked, etc.). For example, computing systems may be adapted to store the individual sequencing reads (e.g., the raw sequencing data) in addition to or distinctly from concatenated sequencing data and/or sequencing profiles, including copy number variation profiles. Any of the foregoing data may be stored in any form and/or any database system (or other storage system) described herein and/or known in the art. As a non-limiting example, the sequencing data may be stored as one or more graphical images comprising the copy number variation profile of one or more samples (e.g., patient samples and standards). Further, sequencing data and/or other data such as, for example, the copy number variation plot of one or more samples/standards, may be retrieved from any of the one or more data stores provided herein.
[00123] Embodiments may additionally, or alternatively, comprise an analyzing module for analyzing the sequencing data. In one or more embodiments, this may comprise comparing one or more sequencing data from patient samples to one or more standards. The one or more standards, as described above, may comprise DNA isolated and sequenced from non-neoplastic cells or may comprise DNA isolated and sequenced from known neoplastic cells of differing cell and/or cancer types. Computing systems analyzing the sequencing may, in some embodiments, digitally and/or logically align sequencing results. This may comprise searching for logical matches (individual matches or a plurality/set of matches) between one or more samples and one or more standards and/or logical matches between two or more samples and/or logical matches between two or more standards. The logical matches may be of any predetermined length or may be determined by a machine learning (or other) algorithm automatically by the computing system. For example, a machine learning algorithm (as known in the art) may "learn" from a known set of cancer-specific sequences comprising copy number variants and then be fed a host of novel sequences to analyze, predict, and/or update the machine learning algorithm. In this manner, new sequences and/or associations between samples (whether correlating to cancer phenotypes or other states) may be identified.
[00124] Additionally, or alternatively, analyzing the sequencing data may comprise identifying a digital match between two or more sequencing data. For example, a copy number variation plot of one or more samples may be in a digital and/or image-based format, and the computing system may analyze the digital and/or image-based plots to determine similarities and/or differences between the two or more plots, and in some embodiments, analyzing comprises predicting and/or determining a likelihood that a sample plot matches a standard plot. Additionally, or alternatively, analyzing may comprise determining a correlation (or lack of correlation) between two or more copy number variation plots.
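As a non-limiting illustration of determining a correlation between two copy number variation plots, each plot can be reduced to a vector of per-region values and a Pearson correlation coefficient computed between the vectors. The choice of the Pearson coefficient, the function name `pearson_correlation`, and the toy values are assumptions made for this sketch; the disclosure does not specify a correlation measure.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy per-region values (made up), e.g. log2 copy ratios for five regions.
sample_plot = [0.6, -0.4, 0.0, 0.9, -0.7]
standard_plot = [0.5, -0.5, 0.1, 0.8, -0.6]
r = pearson_correlation(sample_plot, standard_plot)
```

A coefficient near 1 would suggest that the two plots rise and fall together across regions, a coefficient near 0 would suggest a lack of correlation, and any cutoff for declaring a match would need to be validated separately.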
[00125] Computing systems of the present invention may additionally comprise a parsing module. The parsing module may parse and/or break up sequencing reads and/or genome profiles to identify unique and/or predictive sequences that are indicative and/or correlate to one or more cancer types, tissues of origin, and/or cancer stage/severity. In some embodiments, this may comprise parsing a standard or group of standards to determine one or more portions of the standard sequence that correlate with one or more cancer types, tissues of origin, and/or cancer stages/severity.
[00126] Embodiments of the present disclosure may comprise a displaying module and/or a physical display for operably displaying the one or more sequencing reads, copy number variation plots, analysis results, and/or any other data or image associated herewith. In one or more embodiments, the display will comprise one or more graphical user interfaces whereby a user may interact with the computing system to input one or more data entries through any of the input devices/methods disclosed herein. The one or more graphical user interfaces may comprise a medium through which a physician, technician, or other healthcare and/or scientific personnel may view the sequencing results and/or the copy number variation plots.
[00127] In one or more embodiments, the display comprises a navigation window that allows the user to transit or transition between one or more historical and/or current copy number variation plots and/or sequencing samples (which may be organized by any method known in the art, including, for example, by unique identifiers, date, etc.). The display may additionally, or alternatively, comprise the copy number variation plot of one or more samples in addition to one or more standard copy number variation plots. The computing system may output through the display a first and/or an ordered list of probable matches between the sample sequencing read and one or more standards and/or one or more other sample reads/plots. The display may further allow the user to annotate and/or select an associated cancer type, tissue of origin, and/or cancer stage/severity. In one or more embodiments, the display comprises one or more plots representing each of the foregoing— cancer type, tissue of origin, and/or cancer stage/severity— or may display one or more plots comprising an accumulation of each (e.g., stage 3 breast cancer metastasized to the lung).
[00128] In one or more embodiments, the display (or the computing system without a display module/hardware) may communicate one or more results of the sequencing data and/or copy number variation plot analysis to one or more administrators, billing modules/institutions, insurance carriers, physicians, patients, technicians, or others requiring and/or requesting the information. The computing system may generate a form and/or beautified communication comprising any and/or all of the information disclosed above and may communicate said communication through any means known in the art.
[00129] Any and/or all of the foregoing may be embodied in a system and/or may be included in one or more methods and/or embodied in one or more computer-readable hardware storage devices as one or more computer-executable instructions.
[00130] Various alterations and/or modifications of the inventive features illustrated herein, and additional applications of the principles illustrated herein, which would occur to one skilled in the relevant art and having possession of this disclosure, can be made to the illustrated embodiments without departing from the spirit and scope of the invention as defined by the claims, and are to be considered within the scope of this disclosure. Thus, while various aspects and embodiments have been disclosed herein, other aspects and embodiments are contemplated. While a number of methods and components similar or equivalent to those described herein can be used to practice embodiments of the present disclosure, only certain components and methods are described herein.
[00131] It will also be appreciated that systems, devices, products, kits, methods, and/or processes, according to certain embodiments of the present disclosure may include, incorporate, or otherwise comprise properties, features (e.g., components, members, elements, parts, and/or portions) described in other embodiments disclosed and/or described herein. Accordingly, the various features of certain embodiments can be compatible with, combined with, included in, and/or incorporated into other embodiments of the present disclosure. Thus, disclosure of certain features relative to a specific embodiment of the present disclosure should not be construed as limiting application or inclusion of said features to the specific embodiment. Rather, it will be appreciated that other embodiments can also include said features, members, elements, parts, and/or portions without necessarily departing from the scope of the present disclosure.
[00132] Moreover, unless a feature is described as requiring another feature in combination therewith, any feature herein may be combined with any other feature of a same or different embodiment disclosed herein. Furthermore, various well-known aspects of illustrative systems, methods, apparatus, and the like are not described herein in particular detail in order to avoid obscuring aspects of the example embodiments. Such aspects are, however, also contemplated herein.
[00133] The present disclosure may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. While certain embodiments and details have been included herein and in the attached disclosure for purposes of illustrating embodiments of the present disclosure, it will be apparent to those skilled in the art that various changes in the methods, products, devices, and apparatus disclosed herein may be made without departing from the scope of the disclosure or of the invention, which is defined in the appended claims. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

What is claimed is:
1. A method for detecting cancer, the method comprising:
obtaining a biological sample comprising nucleic acid from a patient;
preparing a next generation sequencing library of the nucleic acid;
sequencing at least a portion of the prepared next generation sequencing library with a sequencing coverage of less than or equal to about 5X coverage;
measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library;
comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number; and
providing the patient with a diagnosis of cancer based on the variability or similarity between the measured number of copies and the standard copy number.
2. A method for detecting cancer in a biological sample, the biological sample comprising nucleic acid, the method comprising:
preparing a nucleic acid library of the nucleic acid in the biological sample;
sequencing at least a portion of the prepared nucleic acid library;
measuring a number of copies of at least one nucleic acid sequence included in the sequenced portion of the nucleic acid library; and
comparing the measured number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the measured number of copies and the standard copy number.
3. The method as in any of the preceding claims, wherein the standard copy number comprises a copy number of nucleic acid sequence for a wild-type cell or a copy number profile of nucleic acid sequence for a wild-type cell.
4. The method as in any of the preceding claims, wherein the wild-type cell is a nonneoplastic cell.
5. The method as in any of the preceding claims, wherein the wild-type cell is a non-cancerous cell.
6. The method as in any of the preceding claims, wherein the standard copy number comprises a copy number of nucleic acid sequence within a known cancer cell type or a copy number profile of nucleic acid sequence for each of one or more cancer cell types.
7. The method as in any of the preceding claims, wherein the comparing step further comprises determining a tissue of origin of the nucleic acid based on the similarity of the measured number of copies and the standard copy number.
8. The method as in any of the preceding claims, wherein the comparing step further comprises determining a presence and/or a stage of cancer based on the similarity of the measured number of copies and the standard copy number.
9. The method as in any of the preceding claims, wherein the biological sample comprises a solid tissue or a liquid biopsy.
10. The method as in claim 9, wherein the solid tissue comprises tumor tissue, primary tumor tissue, or recurrent primary tumor tissue.
11. The method as in claim 9, wherein the liquid biopsy comprises a liquid portion of a biological sample selected from the group consisting of: blood plasma, blood serum, cerebrospinal fluid, tumor fluid, interstitial fluid phase, and combinations thereof, preferably wherein the liquid biopsy comprises blood plasma.
12. The method as in any of the preceding claims, wherein the nucleic acid comprises cell-free DNA.
13. The method as in claim 12, further comprising determining a DNA methylation pattern of the cell-free DNA.
14. The method as in one of claims 12 or 13, further comprising detecting a DNA mutation in the cell-free DNA.
15. The method as in claim 12, wherein the comparing step further comprises determining a tissue of origin of the cell-free DNA based on:
the similarity of the measured number of copies and the standard copy number; and one or more of:
a determined DNA methylation pattern of the cell-free DNA; and a detected DNA mutation in the cell-free DNA.
16. The method as in claim 12, wherein the comparing step further comprises determining a presence and a stage of cancer based on:
the similarity of the measured number of copies and the standard copy number; and one or more of:
a determined DNA methylation pattern of the cell-free DNA; and a detected DNA mutation in the cell-free DNA.
17. The method as in any of the preceding claims, wherein the preparing step further comprises quantifying the nucleic acid library.
18. The method as in claim 2, wherein the preparing step comprises preparing the nucleic acid library for next generation sequencing.
19. The method as in claim 2 or 18, wherein the sequencing step comprises performing a next generation sequencing technique.
20. The method as in claim 19, wherein the next generation sequencing technique comprises one or more of: pyrosequencing, sequencing by synthesis, sequencing by ligation, and ion semiconductor sequencing.
21. The method as in claim 19, wherein the next generation sequencing technique comprises a fluorescent-based sequencing technique or an electrical-current-based sequencing technique.
22. The method as in any one of the preceding claims, wherein a sequencing coverage of the sequencing step is less than or equal to about 20X, 15X, 10X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, 0.75X, 0.5X, 0.25X, 0.1X, or 0.05X coverage.
23. The method as in any of the preceding claims, wherein the sequencing coverage is between 0.5X and 10X coverage.
24. The method as in any of the preceding claims, wherein the sequencing coverage is between 1X and 5X coverage.
25. The method as in any of the preceding claims, wherein the sequencing coverage is between 0.5X and 1X coverage.
26. A method for detecting a circulating tumor DNA (ctDNA) in a liquid biopsy, the method comprising:
isolating cell free DNA (cfDNA) from the liquid biopsy, the cfDNA comprising one or more genetic elements;
determining a copy number of the one or more genetic elements; and
comparing the determined copy number to one or more known copy number standards for ctDNA.
27. The method as in claim 26, further comprising assembling a genetic profile of the cfDNA, wherein the genetic profile comprises a representation of the relative abundance of the one or more genetic elements in the cfDNA.
28. The method as in claim 26, wherein the comparing step further comprises detecting the presence of ctDNA in the liquid biopsy by measuring one or more similarities between the determined copy number and the one or more known copy number standards for ctDNA.
29. The method as in any one of claims 26-28, wherein the comparing step further comprises determining a tissue of origin of the cfDNA based on the similarity of the measured number of copies and the standard copy number.
30. The method as in claim 29, wherein determining the tissue of origin of the cfDNA is further based on a methylation pattern of the cfDNA and/or the one or more genetic elements.
31. The method as in any of claims 29-30, wherein the tissue of origin is selected from the group consisting of: bladder, brain, breast, bone, cervix, colon, head and neck, gall bladder, kidney, liver, lung, ovary, pancreas, prostate, rectum, skin, spleen, thyroid, stomach, uterus, and combinations thereof.
32. The method as in any of claims 26-31, wherein the comparing step further comprises determining a presence and a stage of cancer based on the similarity of the measured number of copies and the standard copy number.
33. The method as in claim 32, wherein determining the presence and the stage of cancer is further based on a determined methylation pattern of the one or more genetic sequences.
34. The method as in one of claims 32 or 33, wherein determining the presence and the stage of cancer is further based on a detected mutation in the one or more genetic sequences.
35. The method as in any of claims 26-34, wherein the isolating step further comprises one or more of:
quantifying the cfDNA;
preparing the cfDNA for sequencing; and
sequencing the cfDNA.
36. The method as in claim 35, wherein a sequencing coverage of sequencing the cfDNA is any of less than or equal to about 20X, 15X, 10X, 5X, 4X, 3X, 2.5X, 2X, 1.5X, 1X, 0.75X, 0.5X, 0.25X, 0.1X, or 0.05X coverage.
37. The method as in one of claims 35 or 36, wherein sequencing the cfDNA comprises sequencing using a next generation sequencing technique.
38. The method as in claim 37, wherein the next generation sequencing technique comprises one or more of: pyrosequencing, sequencing by synthesis, sequencing by ligation, and ion semiconductor sequencing.
39. The method as in claim 37, wherein the next generation sequencing technique comprises a fluorescent-based sequencing technique or an electrical-current-based sequencing technique.
40. The method as in any of the preceding claims, further comprising implementing a cancer-specific treatment based on:
an identified cancer; and at least one of:
an identified cancer type;
a tissue of origin of the identified cancer; and/or
a stage/severity of the identified cancer.
41. The method as in any of the preceding claims, wherein sequencing is performed and/or accomplished with reads of between about 10-200 bp, 25-150 bp, 25-100 bp, 25-50 bp, 50-150 bp, 50-100 bp, 100-150 bp, 20-30 bp, 24-28 bp, or 25-26 bp.
42. A computer system for engineering compliant communications comprising:
one or more processors; and
one or more computer-readable storage media having stored thereon computer-executable instructions that are executable by the one or more processors to cause the computer system to analyze a nucleic acid processed from a biological sample to determine the presence of cancer in the biological sample, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following:
receive sequence data, the sequence data comprising a plurality of sequence reads derived from the nucleic acid;
parse the sequence data to determine a number of copies of at least one nucleic acid sequence included in the sequence data;
analyze the parsed number of copies with a standard copy number for the at least one nucleic acid sequence to determine variability or similarity between the parsed number of copies and the standard copy number; and
based on the determined variability or similarity, display a result at a user interface.
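The receive, parse, analyze, and display steps recited above can be illustrated with a short sketch. It is not the claimed implementation: it assumes reads are represented only by their start positions on a single reference, that copy number is estimated by scaling per-bin read counts to the median bin, and that a bin is reported when it differs from the standard by more than a fixed tolerance. All function names, the bin size, the tolerance, and the toy data are hypothetical. The `mean_coverage` helper uses the usual definition of average coverage (number of reads times read length, divided by the length of the sequenced region).

```python
from collections import Counter
from statistics import median


def count_reads_per_bin(read_starts, bin_size, n_bins):
    """Parse step: count reads whose start position falls in each fixed-size bin.
    Reads starting beyond the last bin are ignored."""
    counts = Counter(pos // bin_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_bins)]


def estimate_copy_numbers(counts, baseline_ploidy=2.0):
    """Scale per-bin counts so the median bin equals the baseline ploidy."""
    typical = median(counts)
    if typical == 0:
        raise ValueError("median bin count is zero; coverage too low for this sketch")
    return [baseline_ploidy * c / typical for c in counts]


def deviating_bins(measured, standard, tolerance=0.5):
    """Analyze step: indices of bins differing from the standard by more than tolerance."""
    return [
        i for i, (m, s) in enumerate(zip(measured, standard))
        if abs(m - s) > tolerance
    ]


def mean_coverage(n_reads, read_length, region_length):
    """Average sequencing coverage: reads * read length / region length."""
    return n_reads * read_length / region_length


# Toy data: 18 reads of 25 bp on a 400 bp region split into four 100 bp bins.
read_starts = [10, 20, 30, 40,
               110, 120, 130, 140, 150, 160, 170, 180,
               210, 220, 230, 240,
               310, 320]
counts = count_reads_per_bin(read_starts, bin_size=100, n_bins=4)
copies = estimate_copy_numbers(counts)
flagged = deviating_bins(copies, standard=[2.0, 2.0, 2.0, 2.0])

# Display step, reduced to printing for this sketch.
print("coverage:", mean_coverage(len(read_starts), 25, 400))  # prints coverage: 1.125
print("copy numbers:", copies)
print("bins differing from standard:", flagged)  # prints bins differing from standard: [1, 3]
```

Median scaling is used here because it is insensitive to a few amplified or deleted bins; a real low-coverage pipeline would typically also correct for GC content and mappability, which this sketch omits.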
PCT/US2017/058599 2016-10-26 2017-10-26 Systems and methods for characterizing nucleic acid in a biological sample WO2018081465A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662413359P 2016-10-26 2016-10-26
US62/413,359 2016-10-26

Publications (1)

Publication Number Publication Date
WO2018081465A1 (en) 2018-05-03

Family

ID=62024027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/058599 WO2018081465A1 (en) 2016-10-26 2017-10-26 Systems and methods for characterizing nucleic acid in a biological sample

Country Status (1)

Country Link
WO (1) WO2018081465A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028888A (en) * 2018-10-09 2020-04-17 北京贝瑞和康生物技术有限公司 Detection method of genome-wide copy number variation and application thereof
EP3670670A1 (en) * 2018-12-18 2020-06-24 Ricoh Company, Ltd. Nucleic acid analysis method, nucleic acid analysis program, and device for library preparation
CN111334566A (en) * 2018-12-18 2020-06-26 株式会社理光 Nucleic acid analysis method, nucleic acid analysis program, and library preparation device
JP2020124185A (en) * 2019-01-31 2020-08-20 株式会社リコー Method for analyzing high-throughput sequencing reaction data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140296094A1 (en) * 2013-03-15 2014-10-02 Abbott Molecular Inc. Systems and methods for detection of genomic copy number changes
US20150368708A1 (en) * 2012-09-04 2015-12-24 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150368708A1 (en) * 2012-09-04 2015-12-24 Guardant Health, Inc. Systems and methods to detect rare mutations and copy number variation
US20140296094A1 (en) * 2013-03-15 2014-10-02 Abbott Molecular Inc. Systems and methods for detection of genomic copy number changes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KOBOLDT ET AL.: "VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing", GENOME RES., vol. 22, no. 3, March 2012 (2012-03-01), pages 568 - 576, XP055364674 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028888A (en) * 2018-10-09 2020-04-17 北京贝瑞和康生物技术有限公司 Detection method of genome-wide copy number variation and application thereof
EP3670670A1 (en) * 2018-12-18 2020-06-24 Ricoh Company, Ltd. Nucleic acid analysis method, nucleic acid analysis program, and device for library preparation
CN111334566A (en) * 2018-12-18 2020-06-26 株式会社理光 Nucleic acid analysis method, nucleic acid analysis program, and library preparation device
US11705218B2 (en) 2018-12-18 2023-07-18 Ricoh Company, Ltd. Nucleic acid analysis method, nucleic acid analysis program, and device for library preparation
JP2020124185A (en) * 2019-01-31 2020-08-20 株式会社リコー Method for analyzing high-throughput sequencing reaction data
JP7236050B2 (en) 2019-01-31 2023-03-09 株式会社リコー How to analyze data from high-throughput sequencing reactions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17863990

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17863990

Country of ref document: EP

Kind code of ref document: A1