WO2017106365A1 - Methods for measuring mutation load - Google Patents

Methods for measuring mutation load

Publication number
WO2017106365A1
Authority
WO
WIPO (PCT)
Prior art keywords
test
loci
mutation
chromosomes
mutation load
Prior art date
Application number
PCT/US2016/066685
Other languages
French (fr)
Inventor
Andrey Zharkikh
Michael Perry
Original Assignee
Myriad Genetics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Myriad Genetics, Inc.
Publication of WO2017106365A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the present disclosure relates to methods, kits, systems and compositions for measuring mutation load in cancer cells, which can in turn be applied to molecular diagnostic methods, kits, systems and compositions for characterizing cancer.
  • the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising: (1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes; and
  • test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation.
  • the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising:
  • analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
  • test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation
  • the present disclosure provides a method of treating cancer patients comprising:
  • analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
  • quantifying a test mutation load wherein the test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] of test loci harbor a mutation and the test mutation load is low where fewer than [A number] of test loci harbor a mutation;
  • [X number] (the number of test loci per test chromosome) comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
  • [Z number] (the total number of test loci across all test chromosomes) comprises at least 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000, 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
  • [Y number] (the number of test chromosomes) comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
  • the plurality of test loci excludes loci having certain characteristics, e.g., repetitive loci (e.g., nucleotides within homopolymers at least 4, 5, 6, 7, 8, 9, or 10 nucleotides in length), SNPs found in the general population (e.g., SNPs entered in the National Center for Biotechnology Information's dbSNP), loci found to harbor a mutation in more than one sample out of a reference cohort of at least some number of samples (e.g., 25, 50, 75, 100), etc.
  • the cancer cell sample is, or is derived from, a bodily fluid.
  • the cancer cell sample is, or is derived from, a tumor.
  • the cancer is selected from endometrial cancer, ovarian cancer, breast cancer, colorectal cancer, lung cancer, or prostate cancer.
  • a first sample is, or is derived from, known cancerous tissue (e.g., tumor biopsy or tumor resection) and a second sample is, or is derived from, known germline tissue (e.g., nucleated blood cells, fibroblasts, etc.).
  • Figure 1 illustrates use of high mutation load measured as described herein to differentiate responders from non-responders to a specific PARP inhibitor drug.

DETAILED DESCRIPTION OF THE DISCLOSURE

  • algorithm encompasses any formula, model, mathematical equation, algorithmic, analytical or programmed process, or statistical technique or classification analysis that takes one or more inputs or parameters, whether continuous or categorical, and calculates an output value, index, index value or score.
  • algorithms include but are not limited to ratios, sums, regression operators such as exponents or coefficients, biomarker value transformations and normalizations (including, without limitation, normalization schemes that are based on clinical parameters such as age, gender, ethnicity, etc.), rules and guidelines, statistical classification models, and neural networks trained on populations.
  • linear and non-linear equations and statistical classification analyses to determine the relationship between (a) the number of mutations detected in a subject sample and (b) the level of the respective subject's mutation load.
  • diagnosis refers to methods by which a determination can be made as to whether an individual is likely to be suffering from a given disease or condition, including but not limited to diseases or conditions characterized by high mutation load.
  • the skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, e.g., a biomarker, the presence, absence, amount, or change in amount of which is indicative of the presence, severity, or absence of the condition.
  • diagnostic indicators can include patient history; physical symptoms, e.g., unexplained weight loss, fever, fatigue, pains, or skin anomalies; phenotype; genotype; or environmental or heredity factors.
  • diagnostic refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given characteristic, e.g., the presence or level of a diagnostic indicator, when compared to individuals not exhibiting the characteristic. Diagnostic methods can be used independently, or in combination with other diagnosing methods to determine whether a course or outcome is more likely to occur in a patient exhibiting a given characteristic.
  • disease can encompass any disorder, condition, sickness, ailment, etc. that manifests in, e.g., a disordered or incorrectly functioning organ, part, structure, or system of the body, and results from, e.g., genetic or developmental errors, infection, poisons, nutritional deficiency or imbalance, toxicity, or unfavorable environmental factors.
  • immune checkpoint inhibitor refers to a therapeutic agent whose mode of action is to prevent (or inhibit) immune cells and/or the immune response from being turned off (or down-regulated or inhibited) by cancer cells.
  • examples include PD-1 inhibitors, ipilimumab (see, e.g., Gulley & Dahut, Nat. Clin. Practice Oncol. (2007) 4:136-137), tremelimumab (see, e.g., Ribas et al., Oncologist (2007) 12:873-883), and the agents listed in Table 1 below.
  • mutation is described in detail below, but generally refers to an acquired nucleotide change in a somatic tissue as compared to a subject's germline.
  • Mutation load is described in detail below, but generally refers to the number or proportion of analyzed loci harboring a mutation, with “high mutation load” or “HML” generally referring to a number or proportion, or score derived therefrom, that exceeds some reference or threshold.
  • NGS next generation sequencing
  • DNA sequencing libraries are generated by clonal amplification by PCR in vitro
  • the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through the chain-termination chemistry typical of Sanger sequencing
  • third, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion, typically without the requirement for a physical separation step.
  • NGS parallelization of sequencing reactions can generate hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run.
  • conventional sequencing techniques such as Sanger sequencing, which typically report the average genotype of an aggregate collection of molecules
  • NGS technologies typically digitally tabulate the sequence of numerous individual DNA fragments (sequence reads discussed in detail below), such that low frequency variants (e.g., variants present at less than about 10%, 5% or 1% frequency in a heterogeneous population of nucleic acid molecules) can be detected.
  • the term "massively parallel” can also be used to refer to the simultaneous generation of sequence information from many different template molecules by NGS.
  • NGS strategies can include several methodologies, including, but not limited to: (i) microelectrophoretic methods; (ii) sequencing by hybridization; (iii) real-time observation of single molecules, and (iv) cyclic-array sequencing.
  • Cyclic-array sequencing refers to technologies in which a sequence of a dense array of DNA is obtained by iterative cycles of template extension and imaging-based data collection.
  • cyclic-array sequencing technologies include, but are not limited to 454 sequencing, for example, used in 454 Genome Sequencers (Roche Applied Science; Basel), Solexa technology, for example, used in the Illumina Genome Analyzer, Illumina HiSeq, MiSeq, and NextSeq (San Diego, CA), the SOLiD platform (Applied Biosystems; Foster City, CA), the Polonator (Dover/Harvard) and HeliScope Single Molecule Sequencer technology (Helicos; Cambridge, MA).
  • Other NGS methods include single molecule real time sequencing (e.g., Pacific Bio) and ion semiconductor sequencing (e.g., Ion Torrent sequencing). See, e.g., Shendure & Ji, Next Generation DNA Sequencing, NAT. BIOTECH. (2008) 26:1135-1145 for a more detailed discussion of NGS sequencing technologies.
  • PARP inhibitor refers to a therapeutic agent that inhibits the poly (ADP-ribose) polymerase (PARP). Examples include those listed in Table 2.
  • patient or “individual” or “subject” refers to a human.
  • a subject can be male or female.
  • a subject can be one who has been previously diagnosed or identified as having a disease characterized by high mutation load.
  • a subject can be one who has already undergone, or is undergoing, a therapeutic intervention for disease characterized by high mutation load.
  • a subject can also be one who has not been previously diagnosed with a disease characterized by high mutation load.
  • sample or “biological sample” refers to samples such as biopsy or tissue samples, frozen samples, blood and blood fractions or products (e.g., serum, platelets, red blood cells, and the like), tumor samples, sputum, bronchoalveolar lavage, cultured cells, e.g., primary cultures, explants, and transformed cells, stool, urine, etc.
  • a “biopsy” refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen itself. Various biopsy techniques can be applied to the methods of the present disclosure. The biopsy technique applied will depend on the tissue type to be evaluated (e.g., lung, etc.), the size and type of the tumor, among other factors.
  • Representative biopsy techniques include, but are not limited to, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy.
  • An "excisional biopsy” refers to the removal of an entire tumor mass with a small margin of normal tissue surrounding it.
  • An “incisional biopsy” refers to the removal of a wedge of tissue that includes a cross-sectional diameter of the tumor.
  • a diagnosis made by endoscopy or fluoroscopy can require a "core-needle biopsy", or a "fine-needle aspiration biopsy” which generally obtains a suspension of cells from within a target tissue.
  • a “bodily fluid” includes all fluids obtained from a mammalian body, either processed (e.g., serum) or unprocessed, which can include, for example, blood, plasma, urine, lymph, gastric juices, bile, serum, saliva, sweat, and spinal and brain fluids.
  • a biological sample is typically obtained from a subject.
  • cancer cell samples or “tumor sample” means a specimen comprising either at least one cancer cell or biomolecules derived therefrom, including without limitation, lung cancer (e.g., non-small cell lung cancer (NSCLC)), ovarian cancer, colorectal cancer, breast cancer, endometrial cancer, or prostate cancer.
  • Biomolecules "derived” from a cancer cell sample include molecules located within or extracted from the sample as well as artificially synthesized copies or versions of such biomolecules.
  • One illustrative, non-limiting example of such artificially synthesized molecules includes PCR amplification products in which nucleic acids from the sample serve as PCR templates.
  • "Nucleic acids of" a cancer cell sample include nucleic acids located in a cancer cell or biomolecules derived from a cancer cell.
  • score means a value or set of values selected so as to provide a quantitative measure of a variable or characteristic of a subject's condition or the degree of mutation load in a sample, and/or to discriminate, differentiate or otherwise characterize mutation load.
  • the value(s) comprising the score can be based on, for example, quantitative data resulting in a measured amount of one or more sample constituents obtained from the subject.
  • the score can be derived from a single constituent, parameter or assessment, while in other embodiments the score is derived from multiple constituents, parameters and/or assessments.
  • the score can be based upon or derived from an interpretation function; e.g., an interpretation function derived from a particular predictive model using any of various statistical algorithms.
  • a "change in score” can refer to the absolute change in score, e.g. from one time point to the next, or the percent change in score, or the change in the score per unit time (i.e., the rate of score change).
  • test locus is a genomic locus (e.g., single nucleotide at a specified position within a chromosome) whose sequence or genotype is assessed according to the present disclosure, wherein a mutation at such a locus (e.g., as compared to a reference genotype or sequence) is potentially counted in a measurement of mutation load.
  • treatment includes all clinical management of a subject and interventions, whether biological, chemical, physical, or a combination thereof, intended to sustain, ameliorate, improve, or otherwise alter the condition of a subject. These terms may be used synonymously herein. Treatments include but are not limited to administration of prophylactics or therapeutic compounds (including small molecule and biologic drugs), exercise regimens, physical therapy, dietary modification and/or supplementation, bariatric surgical intervention, administration of therapeutic compounds (prescription or over-the-counter), and any other treatments efficacious in preventing, delaying the onset of, or ameliorating disease characterized by HML.
  • a “response to treatment” includes a subject's response to any of the above-described treatments, whether biological, chemical, physical, or a combination of the foregoing.
  • a “treatment course” relates to the dosage, duration, extent, etc. of a particular treatment or therapeutic regimen.
  • An initial therapeutic regimen as used herein is the first line of treatment.
  • the present disclosure generally relates to methods for measuring mutation load in biological specimens and methods for clinically applying such mutation load measurement in directing patient therapy.
  • the disclosed methods generally involve analyzing a large number of genomic loci (e.g., substantially genetically random) to detect variations in these loci (e.g., as compared to some reference or expected base or sequence), quantitating the number of such variations as a measurement of genome-wide mutation load, and optionally selecting a therapeutic regimen for patients whose samples harbor a certain mutation load.
  • a locus may generally be considered "genetically random" when it is selected without particular regard to its genetic characteristics (e.g., in which gene it is located, its independent association with a clinical feature, etc.).
  • Such a locus may instead be selected for its genomic characteristics (e.g., spacing relative to other test loci along a chromosome, genomic sequence context, etc.) or assay characteristics (e.g., good multiplex amplification, etc.).
  • one aspect of the present disclosure provides a method for measuring mutation load (e.g., in a cancer cell sample) comprising:
  • analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation.
  • Another aspect of the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising:
  • analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation
  • Another aspect of the present disclosure provides a method of treating cancer patients comprising:
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] of test loci harbor a mutation and the test mutation load is low where fewer than [A number] of test loci harbor a mutation;
  • mutation generally refers to an acquired nucleotide change in a somatic tissue as compared to a subject's germline.
  • the method comprises analyzing both known cancer-derived DNA (e.g., a first sample that comprises, or comprises DNA extracted or derived from, known cancerous cells) and known germline DNA (e.g., a second sample that comprises, or comprises DNA extracted or derived from, known germline cells) to determine the sequence at the plurality of test loci for each sample.
  • variations between the germline DNA sequence and the cancer-derived DNA sequence at test loci would be "mutations" that could be counted in quantifying mutation load (e.g., the germline DNA can be a reference sequence against which the cancer-derived DNA is compared to detect mutations).
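The tumor-versus-germline comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_mutations`, the dictionary-based genotype representation, and the toy loci are hypothetical and do not come from the disclosure.

```python
from typing import Dict, Tuple

# Hypothetical representation: a locus is (chromosome, 1-based position).
Locus = Tuple[str, int]


def count_mutations(tumor: Dict[Locus, str],
                    germline: Dict[Locus, str]) -> Tuple[int, float]:
    """Return (number, proportion) of shared test loci where the tumor base
    differs from the matched germline base."""
    shared = [locus for locus in tumor if locus in germline]
    mutated = sum(1 for locus in shared if tumor[locus] != germline[locus])
    proportion = mutated / len(shared) if shared else 0.0
    return mutated, proportion


# Toy data (assumed for illustration only).
germline = {("chr1", 100): "G", ("chr1", 250): "A",
            ("chr2", 75): "C", ("chr2", 90): "T"}
tumor = {("chr1", 100): "T", ("chr1", 250): "A",
         ("chr2", 75): "C", ("chr2", 90): "T"}

n_mut, frac = count_mutations(tumor, germline)
print(n_mut, frac)
```

Either the count or the proportion could serve as the raw input to a mutation load score; which one is used is a design choice the disclosure leaves open.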
  • the patient's germline DNA sequence for the plurality of test loci is known and need not be determined in the method of the disclosure (e.g., no need in some embodiments for physical or bioinformatic analysis to determine germline sequence).
  • this known germline DNA sequence may be used as a reference against which the cancer-derived DNA sequence is compared in order to detect mutations in the cancer-derived DNA.
  • an external reference sequence (i.e., not the patient's own germline sequence)
  • Variations at test loci in the patient's sequence can in turn represent "mutations" that can in some embodiments be counted toward mutation load according to the present disclosure.
  • test loci in cancer cells may be germline rather than somatic.
  • the present disclosure describes several novel approaches to addressing the possibility that variations in the patient's cancer-derived DNA as compared to any external reference are also in that patient's germline (e.g., germline polymorphism). These approaches generally include, e.g., careful selection of test loci according to specific criteria, estimating the ratio of cancerous to non-cancerous cells in the sample being analyzed, etc.
  • test locus selection criteria can be the repetitive nature of the locus.
  • Repetitive loci can present difficulties in accurate sequence alignment (due to the sequence being repeated in multiple locations in the genome) and also show natural, benign germline variation in humans.
  • repetitive loci carry a relatively high likelihood of harboring an apparent somatic variation as compared to an external reference that is either an artifact of an inaccurate sequencing alignment or a germline variation in the individual.
  • the plurality of test loci excludes repetitive loci.
  • "repetitive loci" includes short tandem repeats (STRs), ALU sequences, and/or low complexity regions, etc.
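One repetitive-locus criterion named elsewhere in this disclosure is membership in a homopolymer of at least 4 to 10 nucleotides. A minimal sketch of that single check follows, assuming the local reference sequence is available as a string; the function name `in_homopolymer` and the example sequence are hypothetical, and STRs, ALU sequences, and low-complexity regions would need separate handling.

```python
def in_homopolymer(sequence: str, index: int, min_run: int = 4) -> bool:
    """True if sequence[index] lies within a run of one repeated base
    whose length is at least min_run."""
    base = sequence[index]
    start = index
    while start > 0 and sequence[start - 1] == base:
        start -= 1
    end = index
    while end < len(sequence) - 1 and sequence[end + 1] == base:
        end += 1
    return (end - start + 1) >= min_run


# Toy reference context (assumed): a run of five T's at indices 3-7.
context = "ACGTTTTTACG"
kept = [i for i in range(len(context)) if not in_homopolymer(context, i)]
print(kept)
```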
  • test locus selection criteria can be the prevalence of natural, germline variation at the locus in any relevant population.
  • the plurality of test loci can exclude any single nucleotide polymorphism (SNP) listed in the National Center for Biotechnology Information's SNP database (called "dbSNP"). This can be any SNP listed in the then-current build of dbSNP, the build of dbSNP current as of the filing date of this disclosure, or dbSNP build 146.
  • test locus selection criteria can be the prevalence of variations at a locus within some patient/sample cohort.
  • Prevalence of variations at a locus within some patient/sample cohort can reflect at least two underlying causes: rare germline polymorphisms not reflected in dbSNP or sequence context that consistently gives rise to inaccurate sequence reads.
  • the plurality of test loci can exclude any locus found to harbor a variation in at least some specific number [X] of samples out of a reference cohort comprising at least some minimum number [Y] of reference samples.
  • In some embodiments X is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 or 100 samples.
  • In some embodiments Y is at least 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, or 500 samples. In some embodiments all reference samples in the cohort are unrelated (e.g., each sample is from a different patient).
  • a criterion for test locus selection can be known deficiencies in sequencing accuracy and/or reproducibility other than those revealed by prevalence in a cohort as described in the preceding paragraph. This can include, but is certainly not limited to, repetitive sequence context surrounding the locus, which can often be identified in silico (e.g., no need for empirical testing of the locus). It can also include empirically determined reproducibility deficiencies. Another overlap is found in the fact that prevalence in a reference cohort can be caused by germline variation not reflected in a common SNP (or other polymorphism) database.
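The database and cohort-prevalence exclusions described above can be combined into one filtering pass. The sketch below is illustrative only: `select_test_loci`, its parameter names, and the toy cohort are assumptions, `known_snps` stands in for a dbSNP-derived position set, and `x_samples` and `y_cohort` play the roles of [X] and [Y] in the text.

```python
from collections import Counter
from typing import List, Set, Tuple

Locus = Tuple[str, int]  # hypothetical (chromosome, position) representation


def select_test_loci(candidates: List[Locus],
                     known_snps: Set[Locus],
                     cohort_variants: List[Set[Locus]],
                     x_samples: int = 2,
                     y_cohort: int = 25) -> List[Locus]:
    """Drop loci that are known SNPs or that vary in at least x_samples
    samples of a reference cohort holding at least y_cohort samples."""
    if len(cohort_variants) < y_cohort:
        raise ValueError("reference cohort is smaller than the required minimum")
    # Each sample contributes at most once per locus because it is a set.
    recurrence = Counter(locus for sample in cohort_variants for locus in sample)
    return [locus for locus in candidates
            if locus not in known_snps and recurrence[locus] < x_samples]


# Toy cohort of 25 samples (assumed data): one locus varies in two samples.
cohort = [set() for _ in range(25)]
cohort[0].add(("chr1", 300))
cohort[1].add(("chr1", 300))
cohort[2].add(("chr2", 50))

candidates = [("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)]
snps = {("chr1", 200)}
selected = select_test_loci(candidates, snps, cohort)
print(selected)
```

Here ("chr1", 200) is dropped as a listed SNP and ("chr1", 300) is dropped for recurring in two cohort samples, while ("chr2", 50), seen in only one sample, is retained.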
  • Another novel technique disclosed herein for addressing the possibility that variations in the patient's cancer-derived DNA as compared to any external reference are also in that patient's germline includes estimating the ratio of cancerous to non-cancerous cells in the sample being analyzed.
  • some embodiments of any of the above-described aspects of the disclosure may comprise a step of estimating the ratio of cancerous to non-cancerous cells (or the ratio of cancerous to non-cancerous DNA) in the sample being analyzed.
  • this cancerous/non-cancerous ratio can be used in conjunction with a predefined range of variant allele ratios (as discussed below) to determine whether a detected variation is germline or somatic.
  • the ratio of cancerous to non-cancerous DNA can optionally be estimated and optionally further the allelic changes that occur at each genomic region can be reconstructed (these changes can include both allele copy number changes and loss of an allele).
  • an estimate of frequency can be determined for variants on either germline allele or a somatic variant on either allele. Should the estimated frequency for a variant on either of the somatic alleles differ significantly from the expected germline frequencies, one may call variants with a frequency that is close to the estimated somatic allele(s) frequency as a somatic variant. In practice, these frequencies may be anywhere from 0-100%.
  • variants with close to 25% frequency (e.g., 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, or 35%) may be counted as "mutations" in quantifying mutation load.
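To make the frequency reasoning concrete, consider a simplified case that is an assumption of this sketch rather than a statement of the disclosed method: a diploid region with no copy number change, where a heterozygous somatic variant in a sample with tumor DNA fraction p is expected at frequency p/2, while germline variants are expected near 50% or 100%. At p = 0.5 the expected somatic frequency is 25%, and a tolerance of 10 percentage points gives a 15% to 35% window. The function names and the tolerance below are hypothetical.

```python
def expected_somatic_vaf(tumor_fraction: float) -> float:
    """Expected frequency of a heterozygous somatic variant, assuming a
    diploid region without copy number changes (simplifying assumption)."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return tumor_fraction / 2.0


def call_variant(vaf: float, tumor_fraction: float,
                 tolerance: float = 0.10) -> str:
    """Label a variant by comparing its observed frequency with the expected
    somatic and germline frequencies."""
    somatic = expected_somatic_vaf(tumor_fraction)
    germline_expected = (0.5, 1.0)
    # If the somatic and germline windows overlap, frequency alone cannot
    # separate the two classes.
    if any(abs(somatic - g) <= 2 * tolerance for g in germline_expected):
        return "ambiguous"
    if abs(vaf - somatic) <= tolerance:
        return "somatic"
    if any(abs(vaf - g) <= tolerance for g in germline_expected):
        return "germline"
    return "uncertain"


print(expected_somatic_vaf(0.5))
print(call_variant(0.24, tumor_fraction=0.5))
print(call_variant(0.49, tumor_fraction=0.5))
print(call_variant(0.45, tumor_fraction=0.9))
```

Real samples with allele copy number changes or loss of an allele would need per-region expected frequencies in place of the single p/2 value used here.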
  • [X number] (the number of test loci per test chromosome) comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
  • [Z number] (the total number of test loci across all test chromosomes) comprises at least 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000,
  • [Y number] (the number of test chromosomes) comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
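The three minimums ([X number] loci per chromosome, [Y number] chromosomes, [Z number] total loci) can be checked against a candidate panel as sketched below. The function name `panel_meets_minimums` and the toy panel are hypothetical, and the toy numbers are far smaller than the values listed in the disclosure, purely to keep the example readable.

```python
from collections import Counter
from typing import Iterable, Tuple

Locus = Tuple[str, int]  # hypothetical (chromosome, position) representation


def panel_meets_minimums(loci: Iterable[Locus], min_per_chrom: int,
                         min_chroms: int, min_total: int) -> bool:
    """Check a panel against the per-chromosome (X), chromosome-count (Y),
    and total-locus (Z) minimums."""
    per_chrom = Counter(chrom for chrom, _ in loci)
    # Only chromosomes carrying at least X loci count toward the plurality.
    qualifying = [c for c, n in per_chrom.items() if n >= min_per_chrom]
    total = sum(per_chrom[c] for c in qualifying)
    return len(qualifying) >= min_chroms and total >= min_total


# Toy panel (assumed): 5 loci on chr1, 4 on chr2, 1 on chr3.
panel = ([("chr1", i) for i in range(5)]
         + [("chr2", i) for i in range(4)]
         + [("chr3", 0)])

print(panel_meets_minimums(panel, min_per_chrom=3, min_chroms=2, min_total=9))
print(panel_meets_minimums(panel, min_per_chrom=3, min_chroms=2, min_total=10))
```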
  • this threshold [A number] may be derived from a study of a cohort of patients in which mutation load was assessed according to the present disclosure, with [A number] representing a mutation number that separated patients in the cohort according to some clinical feature associated with mutation load (e.g., [A number] separated patients with or without response to immune checkpoint inhibitors with at least some minimum level of accuracy, e.g., sensitivity and specificity each of at least 70%, 80%, 85%, 90%, 95%).
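One simple way such a threshold could be located in a cohort is to scan candidate values and keep the lowest one at which sensitivity and specificity both reach the required minimum. This is a sketch of that idea only, not the procedure used in the disclosure; the function names and the toy cohort are hypothetical, and the response labels are invented for illustration.

```python
from typing import List, Optional, Tuple


def sensitivity_specificity(counts: List[int], responded: List[bool],
                            threshold: int) -> Tuple[float, float]:
    """Treat counts >= threshold as 'high' and score that call against
    observed response."""
    tp = sum(1 for c, r in zip(counts, responded) if r and c >= threshold)
    fn = sum(1 for c, r in zip(counts, responded) if r and c < threshold)
    tn = sum(1 for c, r in zip(counts, responded) if not r and c < threshold)
    fp = sum(1 for c, r in zip(counts, responded) if not r and c >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("cohort needs both responders and non-responders")
    return tp / (tp + fn), tn / (tn + fp)


def find_threshold(counts: List[int], responded: List[bool],
                   min_accuracy: float = 0.70) -> Optional[int]:
    """Return the lowest observed count usable as a threshold with both
    sensitivity and specificity at or above min_accuracy, else None."""
    for candidate in sorted(set(counts)):
        sens, spec = sensitivity_specificity(counts, responded, candidate)
        if sens >= min_accuracy and spec >= min_accuracy:
            return candidate
    return None


# Toy cohort (assumed data): mutation counts and response labels.
counts = [5, 8, 12, 40, 55, 60, 9, 70]
responded = [False, False, False, True, True, True, False, True]
a_number = find_threshold(counts, responded)
print(a_number)
```

In practice a threshold chosen this way would need validation in an independent cohort, since a value tuned on one small group tends to look better than it performs elsewhere.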
  • “low” can mean either low with respect to some reference or simply “not high.”
  • more than two categories (i.e., more than “high” and “low”) can be used to quantify mutation load (e.g., 3, 4, 5, 6, 7, 8, 9, 10 or more categories).
  • a patient cohort as described in this paragraph could be divided into three groups, "high” mutation load for mutation numbers at or above [A number], “intermediate” mutation load for mutation numbers between [A number] and [B number], and “low” mutation load for mutation numbers at or below [B number]. This division may be based on the cohort being divided into terciles of mutation load or using some other thresholds between subgroups. In some embodiments a numerical score is derived from the number of mutations and this score is used as the threshold for differentiating high and low mutation load.
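The three-group division described above maps directly onto a small function. The name `classify_mutation_load` and the example thresholds (100 and 20) are hypothetical placeholders for [A number] and [B number].

```python
def classify_mutation_load(n_mutations: int, a_number: int,
                           b_number: int) -> str:
    """Assign 'high' at or above a_number, 'low' at or below b_number,
    and 'intermediate' in between."""
    if b_number >= a_number:
        raise ValueError("b_number must be smaller than a_number")
    if n_mutations >= a_number:
        return "high"
    if n_mutations <= b_number:
        return "low"
    return "intermediate"


# Placeholder thresholds, chosen only for illustration.
for n in (150, 100, 50, 20):
    print(n, classify_mutation_load(n, a_number=100, b_number=20))
```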
  • sequence read refers to the sequence of an individual DNA molecule sequenced in a sequencing reaction.
  • individual DNA molecules used for sequencing can be relatively short (e.g., ranging from 50 nt to 1,000 nt). These molecules are often heavily overlapping in their sequences. Thus, any individual test locus is contained or represented within numerous distinct DNA molecules in the sample that is subjected to the sequencing reaction.
  • sequence reads can be aligned against each other and/or against a larger reference sequence (e.g., a reference human genome sequence such as the hg19 version of the human genome assembly available at the University of California Santa Cruz's Genome Browser website).
  • a greater number of reliably sequenced (or “informative") reads containing or representing (or “covering”) any individual locus tends to yield greater accuracy and confidence in the called genotype/sequence at that locus.
  • a test locus (or a variant at that locus) may be counted toward mutation load only if it is covered by at least some minimal number of sequence reads in the sequencing reaction(s) (e.g., at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,250, 2,500, 2,750, 3,000 or more sequence reads).
  • variant allele ratio refers to the number of informative sequence reads harboring a variant nucleotide as a proportion of the total informative sequence reads covering the locus. For example, if a test locus is covered by 100 informative sequence reads in a particular sequencing reaction and 15 reads carry a variant nucleotide (e.g., T instead of the G expected based on knowledge of the patient's germline sequence or based on an external reference sequence), then the variant allele ratio is 15%. In some contexts variant allele ratios that are too low or too high may indicate unreliability in a mutation call.
  • if the variant allele ratio is around 1%, this can be due to sequencing artifacts and noise (e.g., a small proportion of sequence reads simply contain sequencing errors). If, however, the variant ratio approaches 100%, this can indicate that the variant may not be a mutation since (a) the sample may contain only or substantially only tumor tissue and thus any variants detected may be either somatic mutations or rare germline variants and/or (b) the variant may be a homozygous germline variant found throughout the tumor and normal portions of the sample.
  • a test locus (or a variant at that locus) may be counted toward mutation load only if the variant allele ratio is within a specific (e.g., pre-specified) range (e.g., 15%-85%, 16%-85%, 17%-85%, 18%-85%, 19%-85%, 20%-85%, 21%-85%, 22%-85%, 23%-85%, 24%-85%, 25%-85%, 26%-85%, 27%-85%, 28%-85%, 29%-85%, 30%-85%, 31%-85%, 32%-85%, 33%-85%, 34%-85%, 35%-85%, 15%-80%, 16%-80%, 17%-80%, 18%-80%, 19%-80%, 20%-80%, 21%-80%, 22%-80%, 23%-80%, 24%-80%, 25%-80%, 26%-80%, 27%-80%, 28%-80%, 29%-80%, 30%-80%, 31%-80%, 3
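  • The coverage and variant allele ratio criteria can be combined into a single per-locus check. The following Python sketch is illustrative only: the function names are hypothetical, and the defaults (at least 50 covering reads, ratio within 15%-85%) are simply picked from the values listed above rather than prescribed.

```python
def variant_allele_ratio(variant_reads: int, informative_reads: int) -> float:
    """Fraction of informative reads at a locus that carry the variant nucleotide."""
    if informative_reads <= 0:
        raise ValueError("locus has no informative reads")
    return variant_reads / informative_reads


def counts_toward_mutation_load(
    variant_reads: int,
    informative_reads: int,
    min_coverage: int = 50,    # assumed minimal number of covering reads
    min_ratio: float = 0.15,   # assumed lower bound of the accepted range
    max_ratio: float = 0.85,   # assumed upper bound of the accepted range
) -> bool:
    """Return True if the locus passes both the coverage and the ratio filter."""
    if informative_reads < min_coverage:
        return False
    ratio = variant_allele_ratio(variant_reads, informative_reads)
    return min_ratio <= ratio <= max_ratio


# The worked example from the text: 15 variant reads out of 100 informative reads.
print(variant_allele_ratio(15, 100))
```

  • Under these assumed defaults, a locus with a ratio near 1% (likely noise) or near 100% (possibly a homozygous germline variant) would not be counted.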
  • sequence reads may be discarded.
  • X 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 or 100 mismatches.
  • sequence reads having more than a certain maximum proportion [Y] of mismatches against the reference may be discarded (e.g., the percentage of nucleotide positions with a mismatch divided by the total number of nucleotide positions in the sequence read or, as described below, in the region of confident sequence calls in the sequence read).
  • Y 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% or more mismatches.
  • sequence reads that, when aligned to the reference sequence, show insertions or deletions (gaps) may also be discarded.
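  • A minimal sketch of such read-level filtering follows, assuming each aligned read has already been summarized by its length, mismatch count, and whether the alignment contains a gap. The `AlignedRead` class and the default limits (7 mismatches, 5% mismatch proportion) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical summary of one read aligned to the reference."""
    length: int       # nucleotide positions considered in the read
    mismatches: int   # positions that differ from the reference
    has_gap: bool     # True if the alignment shows an insertion or deletion


def keep_read(
    read: AlignedRead,
    max_mismatches: int = 7,             # assumed maximum count of mismatches
    max_mismatch_fraction: float = 0.05, # assumed maximum proportion of mismatches
    discard_gapped: bool = True,
) -> bool:
    """Return True if the read survives the mismatch and gap filters."""
    if read.length <= 0:
        return False
    if discard_gapped and read.has_gap:
        return False
    if read.mismatches > max_mismatches:
        return False
    if read.mismatches / read.length > max_mismatch_fraction:
        return False
    return True


reads = [AlignedRead(150, 3, False), AlignedRead(150, 8, False), AlignedRead(150, 0, True)]
print([keep_read(r) for r in reads])
```

  • If reads are trimmed first, `length` would be the length of the region of confident sequence calls rather than of the raw read.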
  • raw reads produced by a sequencing reaction may be correct and precise (i.e., high confidence in the sequence calls) only in a fraction of their length.
  • using the entire read sequence may introduce artifacts in variant or mutation calling to quantify mutation load as described herein. Reliability can be increased, however, by "trimming" the sequence to be analyzed in each read (i.e., discarding from ultimate analysis a certain length of nucleotides at the end(s) of reads).
  • this is done by trimming a pre-specified number of nucleotides from the 5' and/or 3' ends of each read, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the 5' end and/or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500 or more nucleotides from the 3' end.
  • this is done by trimming a pre-specified proportion of nucleotides from the 5' and/or 3' ends of each read (e.g., the percentage of nucleotide positions to be trimmed divided by the total number of nucleotide positions in the read), e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the 5' end and/or 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% or more nucleotides from the 3' end.
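  • Trimming by a pre-specified number or proportion of nucleotides amounts to simple string slicing. The sketch below is one possible reading, with hypothetical function names; when a proportion is given, the number of nucleotides to remove is rounded down.

```python
def trim_fixed(read: str, n5: int = 0, n3: int = 0) -> str:
    """Remove n5 nucleotides from the 5' end and n3 from the 3' end of a read."""
    if n5 < 0 or n3 < 0:
        raise ValueError("trim lengths must be non-negative")
    end = len(read) - n3
    # If the trims meet or cross, nothing reliable is left of the read.
    return read[n5:end] if end > n5 else ""


def trim_proportional(read: str, frac5: float = 0.0, frac3: float = 0.0) -> str:
    """Remove a fraction of the read length from each end (rounded down)."""
    n5 = int(len(read) * frac5)
    n3 = int(len(read) * frac3)
    return trim_fixed(read, n5, n3)


print(trim_fixed("ACGTACGTAC", n5=2, n3=3))
print(trim_proportional("ACGTACGTAC", frac5=0.1, frac3=0.2))
```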
  • the total average read length is between
  • trimming may be performed informatically.
  • Two general classes of trimming algorithms include window-based or running-sum algorithms, with examples of each as follows:
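  • For orientation, one generic example of each class is sketched below in Python. These are not necessarily the specific algorithms contemplated by the disclosure; they are common textbook forms, and the function names, window size, and quality threshold of 20 are assumptions. Both take per-base quality scores and return how many leading (5') bases to keep.

```python
from typing import Sequence


def trim_window(quals: Sequence[int], window: int = 4, min_mean: float = 20.0) -> int:
    """Window-based: cut at the first window whose mean quality is below min_mean.

    Reads shorter than the window are returned untrimmed.
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)


def trim_running_sum(quals: Sequence[int], threshold: int = 20) -> int:
    """Running-sum: walk in from the 3' end accumulating (threshold - quality)
    and cut where that running sum is largest; stop once it turns negative."""
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


quals = [30, 30, 30, 30, 30, 10, 10, 10]
print(trim_window(quals), trim_running_sum(quals))
```

  • On this toy read the window-based cut is slightly more conservative because the first failing window still includes one good base.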
  • double-stranded DNA may be sequenced only in one direction whereas, in other embodiments, such DNA may be sequenced in both the forward and reverse directions (e.g., to increase the number of potentially informative sequence reads).
  • Several techniques may be employed to in turn improve bidirectional sequencing reliability, including trimming and removal of reads whose sequence varies too highly from a reference (e.g., as each is described above).
  • Another technique specific to improving reliability of bidirectional sequencing involves comparing the forward and reverse strand reads and discarding certain reads under certain circumstances. For example, if the forward and reverse read alignments overlap, one may in some embodiments use only one to contribute to the coverage and variant allele analysis (or counting) of the overlapping positions. Further, in some embodiments positions or sequence reads with non-matching forward and reverse bases may be excluded as sequencing errors.
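  • One way to picture the overlap handling is to represent each read of a pair as a mapping from reference position to called base. The sketch below is an assumption-laden illustration (the representation and the function name are hypothetical): positions covered by both reads contribute once when the bases agree and are excluded when they disagree.

```python
from typing import Dict


def merge_read_pair(forward: Dict[int, str], reverse: Dict[int, str]) -> Dict[int, str]:
    """Combine forward and reverse reads of one fragment into per-position calls."""
    merged: Dict[int, str] = {}
    for pos in forward.keys() | reverse.keys():
        f_base = forward.get(pos)
        r_base = reverse.get(pos)
        if f_base is not None and r_base is not None:
            if f_base == r_base:
                merged[pos] = f_base  # overlapping position counted only once
            # discordant bases are treated as sequencing errors and dropped
        else:
            merged[pos] = f_base if f_base is not None else r_base
    return merged


forward = {100: "A", 101: "C", 102: "G"}
reverse = {101: "C", 102: "T", 103: "A"}
print(sorted(merge_read_pair(forward, reverse).items()))
```

  • Here position 101 is counted once and position 102 is excluded, so the pair adds coverage at three positions rather than six.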
  • analyzing DNA can comprise either (a) analyzing (e.g., informatically) DNA data previously obtained from a patient sample or (b) obtaining DNA from a patient sample and performing a laboratory assay to obtain data regarding such DNA.
  • determining the nucleotide sequence of" DNA includes either (a) analyzing (e.g., informatically) DNA sequence data previously obtained from a patient sample or (b) obtaining DNA from a patient sample and performing a laboratory assay to sequence such DNA.
  • the methods of the invention generally require at least analyzing cancer-derived DNA (e.g., “DNA derived from a cancer cell sample”), but may optionally also include analyzing germline DNA and using the sequence of such germline DNA as a reference for detecting (and quantifying) mutations in the cancer-derived DNA.
  • cancer-derived DNA is synonymous with DNA “derived” (as defined herein) from a cancer cell.
  • the cancer cell sample is, or is derived from, a bodily fluid.
  • the cancer cell sample is, or is derived from, a tumor.
  • the cancer is selected from endometrial cancer, ovarian cancer, breast cancer, colorectal cancer, lung cancer, or prostate cancer.
  • a first sample comprises, or is derived from, known cancerous cells (e.g., tumor biopsy or tumor resection) and a second sample is, or is derived from, known germline tissue (e.g., nucleated blood cells, fibroblasts, etc.).
  • the method comprises identifying a sample as not having HRD (including assaying the sample to determine HRD status, e.g., as described and defined in U.S. Appl. Ser. No. 14/245,576 and Internat. Appl. Ser. No. PCT/US2015/045561) and then performing the methods as described herein.
  • the present methods generally relate to the detection of variants in test loci.
  • the methodology for preparing nucleic acids in a form that is suitable for variant detection can include, but is not limited to, PCR, detectable probes, sequencing and single base extensions, reverse transcriptase-PCR (RT-PCR), real-time PCR, allele-specific hybridization, reverse transcription quantitative real-time PCR (RT-qPCR), ligase chain reaction, strand displacement amplification (SDA), self-sustained sequence replication (3SR), or in situ PCR.
  • Indels can also be detected by direct sequencing. Methods include, e.g., dideoxy sequencing-based methods and other methods such as Maxam and Gilbert sequencing (see, e.g., Sambrook et al., supra).
  • Other detection methods include Pyrosequencing™ of oligonucleotide-length products. Such methods often employ amplification techniques such as PCR. For example, in pyrosequencing, a sequencing primer is hybridized to a single-stranded, PCR-amplified DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and the substrates adenosine 5' phosphosulfate (APS) and luciferin. The first of four deoxynucleotide triphosphates (dNTP) is added to the reaction.
  • DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand, if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide.
  • ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5' phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin that generates visible light in amounts that are proportional to the amount of ATP.
  • the light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a Pyrogram™. Each light signal is proportional to the number of nucleotides incorporated.
  • Apyrase, a nucleotide-degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP. When degradation is complete, another dNTP is added.
  • Another similar method for characterizing indels does not require use of a complete PCR, but typically uses only the extension of a primer by a single, fluorescence-labeled dideoxynucleotide triphosphate (ddNTP) that is complementary to the nucleotide to be investigated.
  • the nucleotide at the polymorphic site can be identified via detection of a primer that has been extended by one base and is fluorescently labeled (e.g., Kobayashi et al., Mol. Cell. Probes, 9:175-182, 1995).
  • Amplification products can be analyzed using techniques including, without limitation, electrophoretic analysis or sequence analysis.
  • electrophoretic analysis include slab gel electrophoresis such as agarose or polyacrylamide gel electrophoresis, capillary electrophoresis, and denaturing gradient gel electrophoresis (DGGE).
  • Other methods of nucleic acid analysis include, but are not limited to, hybridization with allele-specific oligonucleotide probes (Wallace et al., Nucl. Acids Res.
  • This technique, also commonly referred to as allele-specific oligonucleotide hybridization (ASO) (e.g., Stoneking et al., AM. J. HUM. GENET. (1991) 48:70-382; Saiki et al., NATURE (1986) 324, 163-166; EP 235,726; and WO/1989/011548), relies on distinguishing between two DNA molecules differing by one base by hybridizing an oligonucleotide probe that is specific for one of the variants to an amplified product obtained from amplifying the nucleic acid sample.
  • This method typically employs short oligonucleotides, e.g., 15-20 bases in length.
  • the probes are designed to differentially hybridize to one variant (e.g., from a reference sequence) versus another. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, producing an essentially binary response whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position of the probe (e.g., in a 15-base oligonucleotide at the 7 position; in a 16-base oligonucleotide at either the 8 or 9 position), but this design is not required.
  • the amount and/or presence of an allele is determined by measuring the amount of allele-specific oligonucleotide that is hybridized to the sample.
  • the oligonucleotide comprises a label (e.g., a fluorescent label).
  • an allele-specific oligonucleotide is applied to immobilized oligonucleotides representing sequences with different nucleotides at the test locus. After stringent hybridization and washing conditions, fluorescence intensity is measured for each variant-specific oligonucleotide.
  • Suitable assay formats for detecting hybrids formed between probes and target nucleic acid sequences in a sample include, but are not limited to, the immobilized target (dot-blot) format and immobilized probe (reverse dot-blot or line-blot) assay formats.
  • Dot blot and reverse dot blot assay formats are described in U.S. Pat. Nos. 5,310,893; 5,451,512; 5,468,613; and 5,604,099; each incorporated herein by reference.
  • Hybridization probes useful in the methods of the present disclosure can include an allele-specific probe that discriminates between alleles with and without variants at the test locus.
  • the probes can be at least about 12, 15, 16, 18, 20, 22, 24, 25, 30 or more nucleotide fragments of a contiguous sequence surrounding the test locus.
  • the probes can be produced by, for example, chemical synthesis, PCR amplification, generation from longer polynucleotides using restriction enzymes, or other methods.
  • the probes can be made completely complementary to the target nucleic acid or portion thereof (e.g., to all or a portion of a sequence encoding a target). Therefore, usually high stringency conditions are desirable in order to prevent or at least minimize false positives.
  • conditions of high stringency may be best suited to situations where the probes are complementary to regions of the target which lack heterogeneity.
  • the stringency of hybridization is determined by a number of factors during hybridization and during the washing procedure, including temperature, ionic strength, length of time, and concentration of formamide (Sambrook et al. (1989), "Molecular Cloning; A Laboratory Manual,” Second Edition (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.)).
  • Nucleic acid probes can be provided in solution for such assays, or can be affixed to a support (e.g., solid or semi-solid support).
  • supports that can be used are nitrocellulose (e.g., in membrane or microtiter well form), polyvinyl chloride (e.g., in sheets or microtiter wells), polystyrene latex (e.g., in beads or microtiter plates), polyvinylidene fluoride, diazotized paper, nylon membranes, activated beads, and Protein A beads.
  • Probes detectable upon a secondary structural change are also suitable for detection of a test locus variant.
  • Exemplified secondary structure or stem-loop structure probes include molecular beacons or Scorpion® primer/probes.
  • Molecular beacon probes are single-stranded oligonucleic acid probes that can form a hairpin structure in which a fluorophore and a quencher are usually placed on the opposite ends of the oligonucleotide. At either end of the probe short complementary sequences allow for the formation of an intramolecular stem, which enables the fluorophore and the quencher to come into close proximity.
  • the loop portion of the molecular beacon is complementary to a target nucleic acid of interest.
  • Binding of this probe to its target nucleic acid of interest forms a hybrid that forces the stem apart. This causes a conformation change that moves the fluorophore and the quencher away from each other and leads to a more intense fluorescent signal.
  • (see, e.g., Tyagi & Kramer, Nat. Biotechnol. (1996) 14:303-308; Tyagi et al., Nat. Biotechnol. (1998) 16:49-53; Piatek et al., Nat. Biotechnol. (1998) 16:359-363; Marras et al., Genetic Analysis: Biomolecular Engineering (1999) 14:151-156; Täpp et al., BioTechniques (2000) 28:732-738).
  • the present disclosure also provides methods of administering, recommending, prescribing, etc. specific therapeutic regimens to patients whose tumors are found to harbor specific mutation loads as disclosed herein. Measuring mutation load as disclosed herein over a period of time can provide a clinician with a dynamic picture of a patient's biological state. These embodiments of the present disclosure thus will provide patient-specific biological information, which will be informative for therapy selection and will facilitate therapy response prediction.
  • the mutation load in a sample is compared to a reference ("reference standard” or "reference level”) in order to direct treatment decisions.
  • Mutation numbers can be manipulated into a score, which can represent mutation load.
  • the reference standard used for any embodiment disclosed herein may comprise average, mean, or median mutation loads in a control population.
  • the reference standard may further include an earlier time point for the same subject.
  • a reference standard may include a first time point, and mutation load can be examined again at second, third, fourth, fifth, sixth time points, etc. Any time point earlier than any particular time point can be considered a reference standard.
  • the reference standard may additionally comprise cutoff values or any other statistical attribute of the control population, or earlier time points of the same subject, such as a standard deviation from the mean mutation load.
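  • By way of a hedged illustration, a reference standard built from a control population could be summarized and applied as follows. The function names are hypothetical, the control values are invented for the example, and expressing a test result as a number of standard deviations from the mean is only one of the statistical attributes that could serve as a cutoff.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence


def reference_standard(control_loads: Sequence[float]) -> Dict[str, float]:
    """Summarize mutation loads measured in a control population."""
    if len(control_loads) < 2:
        raise ValueError("need at least two control values to estimate spread")
    return {
        "mean": mean(control_loads),
        "median": median(control_loads),
        "sd": stdev(control_loads),  # sample standard deviation
    }


def deviations_from_mean(test_load: float, ref: Dict[str, float]) -> float:
    """Express a test mutation load in standard deviations from the control mean."""
    return (test_load - ref["mean"]) / ref["sd"]


controls = [10, 20, 30, 40, 50]  # made-up control mutation loads
ref = reference_standard(controls)
print(ref["mean"], ref["median"], round(deviations_from_mean(60, ref), 2))
```

  • The same comparison can be made against an earlier time point for the same subject by substituting that earlier value for the control mean.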
  • the control population may comprise healthy individuals, cancer patients having a particular response profile, or the same test patient prior to the administration of any or a specific therapy.
  • a mutation load may be quantified from the reference time point, and a different mutation load may be quantified from a later time point.
  • a first time point can be when an initial therapeutic regimen is begun.
  • a first time point can also be when a first mutation load assay is performed.
  • a time point can be hours, days, months, years, etc.
  • a period between time points is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 36, 48, 60, 72, 84, 96, 108, or 120 months.
  • a test patient is treated more or less aggressively than a reference therapy based on the difference between the test patient's sample's mutation load and the reference mutation load.
  • a reference therapy is any therapy that is the standard of care for the patient's disease. The standard of care can vary temporally and geographically, and a skilled person can easily determine the appropriate standard of care by consulting the relevant medical literature.
  • a more aggressive therapy than the standard therapy comprises beginning treatment earlier than in the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises administering additional treatments beyond the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises administering alternative treatments instead of the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises treating on an accelerated schedule compared to the standard therapy. In one embodiment a more aggressive therapy comprises increased length of therapy. In one embodiment a more aggressive therapy comprises increased frequency of the dose schedule. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs and increasing drug dosage. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs and accelerating dose schedule.
  • more aggressive therapy comprises selecting and administering more potent drugs and increasing length of therapy. In one embodiment, more aggressive therapy comprises increasing drug dosage and accelerating dose schedule. In one embodiment, more aggressive therapy comprises increasing drug dosage and increasing length of therapy. In one embodiment, more aggressive therapy comprises accelerating dose schedule and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, and accelerating dose schedule. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, accelerating dose schedule, and increasing length of therapy. In one embodiment, more aggressive therapy comprises increasing drug dosage, accelerating dose schedule, and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, accelerating dose schedule, and increasing length of therapy. In some embodiments, a more aggressive therapy comprises administering a combination of drug- based and non-drug-based therapies.
  • a less aggressive therapy than the standard therapy comprises delaying treatment relative to the standard therapy.
  • a less aggressive therapy than the standard therapy comprises administering less treatment (e.g., lower dosage of one or more standard therapy agents) than in the standard therapy.
  • a less aggressive therapy than the standard therapy comprises administering a treatment regimen lacking one or more components of the standard therapy.
  • a less aggressive therapy than the standard therapy comprises administering treatment on a decelerated schedule compared to the standard therapy.
  • a less aggressive therapy than the standard therapy comprises administering no treatment (e.g., no therapeutic agents, watchful waiting, active surveillance, etc.).
  • a less aggressive therapy comprises delaying treatment.
  • a less aggressive therapy comprises selecting and administering less potent drugs. In one embodiment a less aggressive therapy comprises decreasing the frequency of treatment. In one embodiment a less aggressive therapy comprises shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and decreasing drug dosage. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and decelerating dose schedule. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and shortening length of therapy. In one embodiment, less aggressive therapy comprises decreasing drug dosage and decelerating dose schedule. In one embodiment, less aggressive therapy comprises decreasing drug dosage and shortening length of therapy. In one embodiment, less aggressive therapy comprises decelerating dose schedule and shortening length of therapy.
  • less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, and decelerating dose schedule. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, and shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decelerating dose schedule, and shortening length of therapy. In one embodiment, less aggressive therapy comprises decreasing drug dosage, decelerating dose schedule, and shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, decelerating dose schedule, and shortening length of therapy. In some embodiments, a less aggressive therapy comprises administering only non-drug-based therapies.
  • kits for quantifying mutation load in a sample, e.g., a tumor sample, comprising reagents, etc. to detect variants in test loci.
  • Some embodiments of the present disclosure comprise detection reagents packaged together in the form of a kit for conducting any of the assays disclosed herein.
  • the kits comprise oligonucleotides capable of specifically detecting one or more test loci variants as described herein.
  • the oligonucleotide sequences may correspond to fragments of the biomarker nucleic acids.
  • the oligonucleotides can be more than 200, 200, 150, 100, 50, 25, 10, or fewer than 10 nucleotides in length.
  • the kit can contain in separate containers a solution of nucleic acids, control formulations (positive and/or negative), and/or a detectable label, such as but not limited to fluorescein, green fluorescent protein, rhodamine, cyanine dyes, Alexa dyes, luciferase, and radiolabels, among others. Instructions for carrying out the assay can optionally be included in the kit.
  • the kit can contain a nucleic acid substrate array comprising one or more nucleic acid sequences.
  • Such an array will comprise oligonucleotides capable of specifically detecting one or more test locus variants as described herein.
  • the presence or absence of one or more of the test locus variants can be identified by virtue of binding to the array.
  • the substrate array can be on a solid substrate, such as what is known as a "chip.” See, e.g., U.S. Pat. No. 5,744,305.
  • the substrate array can be a solution array; e.g., xMAP (Luminex, Austin, TX), Cyvera (Illumina, San Diego, CA), RayBio Antibody Arrays (RayBiotech, Inc., Norcross, GA), CellCard (Vitra Bioscience, Mountain View, CA) and Quantum Dots' Mosaic (Invitrogen, Carlsbad, CA).
  • a machine-readable storage medium can comprise, for example, a data storage material that is encoded with machine-readable data or data arrays.
  • the data and machine-readable storage medium are capable of being used for a variety of purposes, when using a machine programmed with instructions for using said data. Such purposes include, without limitation, storing, accessing and manipulating information or data relating to mutation load of a patient or population over time.
  • Data comprising the presence of test locus variants can be implemented in computer programs that are executing on programmable computers, which comprise a processor, a data storage system, one or more input devices, one or more output devices, etc.
  • Program code can be applied to the input data to perform the functions described herein, and to generate output information. This output information can then be applied to one or more output devices.
  • the computer can be, for example, a personal computer, a microcomputer, or a workstation of conventional design.
  • the computer programs can be implemented in a high-level procedural or object-oriented programming language, to communicate with a computer system.
  • the programs can also be implemented in machine or assembly language.
  • the programming language can also be a compiled or interpreted language.
  • Each computer program can be stored on storage media or a device such as ROM, magnetic diskette, etc., and can be readable by a programmable computer for configuring and operating the computer when the storage media or device is read by the computer to perform the described procedures.
  • Any health-related data management systems of the present disclosure can be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium causes a computer to operate in a specific manner to perform various functions, as described herein.
  • the assays disclosed herein can be used to generate a "patient mutation load profile.”
  • the patient mutation load profiles can then be compared to a reference mutation load profile.
  • the biomarker profiles, reference and subject, of embodiments of the present disclosure can be contained in a machine-readable medium, such as analog tapes like those readable by a CD-ROM or USB flash media, among others.
  • the machine-readable media can also comprise subject information; e.g., the subject's medical or family history.
  • Embodiment 1 A method of measuring mutation load comprising:
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation.
  • Embodiment 2 A method of measuring mutation load comprising:
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation
  • Embodiment 3 A method of treating cancer patients comprising:
  • test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] test loci harbor a mutation and the test mutation load is low where fewer than [A number] test loci harbor a mutation;
  • Embodiment 4 The method of any one of embodiments 1-3, wherein the plurality of test loci comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
  • Embodiment 5 The method of any one of embodiments 1-3, wherein the total loci across all the plurality of chromosomes comprises at least 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000, 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
  • Embodiment 6 The method of any one of embodiments 1-3, wherein the plurality of chromosomes comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
  • Embodiment 7 The method of any one of embodiments 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 25 or more covering reads and (b) the variant allele ratio is in the range of 10%-75%, inclusive.
  • Embodiment 8 The method of any one of embodiments 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 50 or more covering reads and (b) the variant allele ratio is in the range of 15%-65%, inclusive.
  • Embodiment 9 The method of any one of embodiments 1-3, wherein the plurality of test loci excludes the following loci:
  • loci found to harbor a mutation in more than one sample out of a reference cohort of at least 50 samples.
  • Embodiment 10 The method of any one of embodiments 1-3, wherein the cancer cell sample is a bodily fluid.
  • Embodiment 11 The method of any one of embodiments 1-3, wherein the cancer cell sample is a tumor sample.
  • Embodiment 12 The method of embodiment 3, wherein [A number] is 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500 or more.
  • a method for measuring mutation load as described above was assessed for its ability to differentiate non-small cell lung cancer patients who responded to a specific therapeutic regimen and those who did not.
  • DNA fragments derived from tumor samples were captured by hybridization to about 54,000 SNP probes, size-selected with average fragment size 150-200 bp, and sequenced from both ends on either an Illumina HiSeq or MiSeq instrument.
  • the forward (F) and reverse (R) read sequences, trimmed by quality, were aligned to the 801 bp segments around the target SNPs (i.e., test loci), with maximum 7 mismatches and no gaps used as criteria to exclude lower quality reads. Mismatches between the template segment sequences and the read sequences were counted for each selected test locus.
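The read filter described in the last item (at most 7 mismatches, no gaps) can be illustrated with a minimal sketch. The function names, the string representation of reads and template segments, and the assumption that the placement offset is already known are hypothetical and are not taken from the disclosure.

```python
def count_mismatches(read: str, template: str, offset: int) -> int:
    """Count mismatches for an ungapped placement of `read` on `template`.

    `offset` is the 0-based start of the read within the template segment
    (assumed to be known; finding the best offset is not shown here).
    """
    if offset < 0 or offset + len(read) > len(template):
        raise ValueError("read does not fit inside the template at this offset")
    window = template[offset:offset + len(read)]
    return sum(1 for r, t in zip(read, window) if r != t)


def passes_filter(read: str, template: str, offset: int,
                  max_mismatches: int = 7) -> bool:
    """Keep a read only if its ungapped placement has few enough mismatches."""
    return count_mismatches(read, template, offset) <= max_mismatches
```

Because gaps are not allowed, the comparison reduces to a position-by-position count over a window of the same length as the read.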

Abstract

The present disclosure relates to methods, kits, systems and compositions for measuring mutation load in cancer cell samples, which can in turn be applied to molecular diagnostic methods, kits, systems and compositions for characterizing cancer.

Description

METHODS FOR MEASURING MUTATION LOAD
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 62/266,965 filed December 14, 2015, the entire contents of which are hereby incorporated by reference.
FIELD OF THE DISCLOSURE
[0002] The present disclosure relates to methods, kits, systems and compositions for measuring mutation load in cancer cells, which can in turn be applied to molecular diagnostic methods, kits, systems and compositions for characterizing cancer.
BACKGROUND OF THE DISCLOSURE
[0003] Somatic mutations are a hallmark of cancer. Theodor Boveri, J. CELL SCI. (2008) 121:1-84 (translated and annotated by Henry Harris). Mutations initiate the conversion of healthy cells to malignant cells and drive each of several distinct steps in the process of this conversion. Hanahan & Weinberg, CELL (2000) 100:57-70.
[0004] Mutations in specific genes can also be detected and applied to the characterization of cancers, e.g., according to their aggressiveness, malignant potential, likelihood of response to a particular treatment, etc. Over the last few decades, the number of cancer drivers useful in characterizing cancer has grown substantially. More recently, some have tried to make cancer characterization more comprehensive and efficient by looking not to specific genetic drivers, but instead to the genomic footprint these drivers leave on the cancer cell. Abkevich et al., BR. J. CANCER (2012) 107:1776-1782. These new biomarkers are useful, but other biomarkers may be needed in certain subtypes of cancer. Thus, there remains a clinical need for additional biomarkers for characterizing cancer.
BRIEF SUMMARY OF THE DISCLOSURE
[0005] In one aspect, the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising:
(1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes; and
(2) determining a test mutation load, wherein the test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation.
[0006] In another aspect, the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising:
(1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
(2) determining a test mutation load, wherein the test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation; and
(3) quantifying
(a) high mutation load in a cancer cell sample in which at least [A number] of test loci harbor a mutation; or
(b) low mutation load in a cancer cell sample in which fewer than [A number] of test loci harbor a mutation.
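Steps (2) and (3) above can be sketched as follows, under the assumption that each test locus has already been called as mutated or not. The function names and the boolean-list input are hypothetical illustrations; the threshold argument stands in for [A number], whose value is left open by the text.

```python
def mutation_load(mutated_flags):
    """Return (count, proportion) of test loci flagged as harboring a mutation."""
    flags = list(mutated_flags)
    count = sum(1 for flag in flags if flag)
    proportion = count / len(flags) if flags else 0.0
    return count, proportion


def classify_load(count, threshold_a):
    """'high' if at least threshold_a loci are mutated, otherwise 'low'."""
    return "high" if count >= threshold_a else "low"
```

For example, four analyzed loci with two mutated give a count of 2 and a proportion of 0.5; with a threshold of 2 this sample would be labeled high, because the comparison is "at least".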
[0007] In another aspect, the present disclosure provides a method of treating cancer patients comprising:
(1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes, the plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
(2) quantifying a test mutation load, wherein the test mutation load is, or is derived or calculated from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] of test loci harbor a mutation and the test mutation load is low where fewer than [A number] of test loci harbor a mutation; and
(3) administering
(a) a therapeutic regimen comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a high test mutation load is detected in (2); or
(b) a therapeutic regimen not comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a low test mutation load is detected in (2).
[0008] In some embodiments, [X number] (the number of test loci per test chromosome) comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
[0009] In some embodiments, [Z number] (the total number of test loci across all test chromosomes) comprises at least 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000, 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
[0010] In some embodiments, [Y number] (the number of test chromosomes) comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
[0011] In some embodiments, a test locus is counted as harboring a mutation if (a) there are at least some minimum number of sequence reads covering the test locus (e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more reads) and (b) the variant allele ratio is in the range of X%-Y%, inclusive (e.g., X = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50; Y = 15, 20, 25, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90).
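The two-part counting rule in this paragraph (minimum read coverage plus a variant allele ratio inside an inclusive range) can be written as a short predicate. This is only a sketch: the function and parameter names are hypothetical, and the defaults simply reuse one of the listed combinations (25 reads, 10%-75%).

```python
def harbors_mutation(total_reads, variant_reads,
                     min_reads=25, ratio_low=0.10, ratio_high=0.75):
    """Apply conditions (a) and (b) to one test locus.

    total_reads   -- number of sequence reads covering the locus
    variant_reads -- number of those reads showing the variant allele
    """
    # Condition (a): enough covering reads (also guards against division by zero).
    if total_reads <= 0 or total_reads < min_reads:
        return False
    # Condition (b): variant allele ratio within the inclusive range.
    ratio = variant_reads / total_reads
    return ratio_low <= ratio <= ratio_high
```

A locus with 100 covering reads and 30 variant reads would be counted, whereas one with 20 reads would fail condition (a) and one with a 90% variant allele ratio would fail condition (b).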
[0012] In some embodiments, the plurality of test loci excludes loci having certain characteristics, e.g., repetitive loci (e.g., nucleotides within homopolymers at least 4, 5, 6, 7, 8, 9, or 10 nucleotides in length), SNPs found in the general population (e.g., SNPs entered in the National Center for Biotechnology Information's dbSNP), loci found to harbor a mutation in more than one sample out of a reference cohort of at least some number of samples (e.g., 25, 50, 75, 100), etc.
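As an illustration of the homopolymer criterion, the following sketch marks every position that lies within a run of identical nucleotides of at least a chosen length. The function name, the 0-based coordinates, and the plain-string sequence input are assumptions made for the example.

```python
from itertools import groupby


def homopolymer_positions(sequence, min_run=4):
    """Return the set of 0-based positions inside homopolymer runs >= min_run."""
    excluded = set()
    pos = 0
    for _, group in groupby(sequence):
        run_len = len(list(group))
        if run_len >= min_run:
            excluded.update(range(pos, pos + run_len))
        pos += run_len
    return excluded
```

Candidate test loci whose positions fall in the returned set would then be dropped from the panel before mutation load is computed.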
[0013] In some embodiments, the cancer cell sample is, or is derived from, a bodily fluid. In some embodiments, the cancer cell sample is, or is derived from, a tumor. In some embodiments, the cancer is selected from endometrial cancer, ovarian cancer, breast cancer, colorectal cancer, lung cancer, or prostate cancer. In some embodiments, a first sample is, or is derived from, known cancerous tissue (e.g., tumor biopsy or tumor resection) and a second sample is, or is derived from, known germline tissue (e.g., nucleated blood cells, fibroblasts, etc.).
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 illustrates use of high mutation load measured as described herein to differentiate responders from non-responders to a specific PARP inhibitor drug.
DETAILED DESCRIPTION OF THE DISCLOSURE
Definitions
[0015] The following terms or definitions are provided solely to aid in the understanding of the disclosure. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present disclosure. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, N.Y. (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999), for definitions and terms of the art. Unless expressly defined otherwise herein, the terms used herein should not be construed to have a scope less than understood by a person of ordinary skill in the art.
[0016] As used herein, "algorithm" encompasses any formula, model, mathematical equation, algorithmic, analytical or programmed process, or statistical technique or classification analysis that takes one or more inputs or parameters, whether continuous or categorical, and calculates an output value, index, index value or score. Examples of algorithms include but are not limited to ratios, sums, regression operators such as exponents or coefficients, biomarker value transformations and normalizations (including, without limitation, normalization schemes that are based on clinical parameters such as age, gender, ethnicity, etc.), rules and guidelines, statistical classification models, and neural networks trained on populations. Also of use in the context of mutation load as described herein are linear and non-linear equations and statistical classification analyses to determine the relationship between (a) the number of mutations detected in a subject sample and (b) the level of the respective subject's mutation load.
[0017] As used herein, the term "analyze" or "analyzing" includes "measure," "measuring," "detect," "detecting," "identify," "identifying," "assay," "assaying," "quantify," or "quantifying," and refers to the process of determining a value or set of values associated with a sample by measurement of the number (or load) of mutations in a sample, and may further comprise comparing test nucleotide sequence(s) detected in a patient's sample against reference nucleotide sequence(s) and/or comparing the test number of any differences revealed by such a comparison (mutations) to one or more reference numbers.
[0018] As used herein, the term "diagnosis" refers to methods by which a determination can be made as to whether an individual is likely to be suffering from a given disease or condition, including but not limited to diseases or conditions characterized by high mutation load. The skilled artisan often makes a diagnosis on the basis of one or more diagnostic indicators, e.g., a biomarker, the presence, absence, amount, or change in amount of which is indicative of the presence, severity, or absence of the condition. Other diagnostic indicators can include patient history; physical symptoms, e.g., unexplained weight loss, fever, fatigue, pains, or skin anomalies; phenotype; genotype; or environmental or heredity factors. A skilled artisan will understand that the term "diagnosis" refers to an increased probability that a certain course or outcome will occur; that is, that a course or outcome is more likely to occur in a patient exhibiting a given characteristic, e.g., the presence or level of a diagnostic indicator, when compared to individuals not exhibiting the characteristic. Diagnostic methods can be used independently, or in combination with other diagnosing methods to determine whether a course or outcome is more likely to occur in a patient exhibiting a given characteristic.
[0019] As used herein, "disease" can encompass any disorder, condition, sickness, ailment, etc. that manifests in, e.g., a disordered or incorrectly functioning organ, part, structure, or system of the body, and results from, e.g., genetic or developmental errors, infection, poisons, nutritional deficiency or imbalance, toxicity, or unfavorable environmental factors.
[0020] As used herein, "immune checkpoint inhibitor" refers to a therapeutic agent whose mode of action is to prevent (or inhibit) immune cells and/or the immune response from being turned off (or down-regulated or inhibited) by cancer cells. Examples include PD-1 inhibitors, ipilimumab (see, e.g., Gulley & Dahut, Nat. Clin. Practice Oncol. (2007) 4:136-137), tremelimumab (see, e.g., Ribas et al., Oncologist (2007) 12:873-883), and the agents listed in Table 1 below.
Table 1
Figure imgf000007_0001
Agent | Company | Target | Indication | Reference
Tremelimumab (ticilimumab, CP-675,206) | AstraZeneca | CTLA4 | Mesothelioma | Ribas et al., ONCOLOGIST (2007) 12:873-883
Opdivo (nivolumab) | Bristol-Myers Squibb | PD1 | Malignant melanoma | Brahmer et al., J. CLIN. ONCOL. (2010) 28:3167-3175
Keytruda (pembrolizumab, lambrolizumab, MK-3475) | Merck & Co. | PD1 | Malignant melanoma | Hamid et al., N. ENGL. J. MED. (2013) 369:134-144
MEDI4736 | AstraZeneca | PDL1 | NSCLC | Lee & Chow, TRANSL. LUNG CANCER RES. (2014) 3:408-410
MPDL3280A | Roche/Genentech | PDL1 | Urothelial bladder cancer or NSCLC | Powles et al., NATURE (2014) 515:558-562
Pidilizumab (CT-011) | CureTech | PD1 | Hematologic or solid tumors | Berger et al., CLIN. CANCER RES. (2008) 14:3044-3051
lirilumab (BMS-986015) | Bristol-Myers Squibb | KIR | Hematologic or solid tumors | Kohrt et al., BLOOD (2014) 123:678-686
Indoximod (NLG-9189) | Newlink Genetics | IDO1 | Breast cancer | Soliman et al., ONCOTARGET (2014) 5:8136-8146
INCB024360 | Incyte | IDO1 | Solid tumors | Koblish et al., MOL. CANCER THER. (2010) 9:489-498
MEDI0680 (AMP-514) | AstraZeneca | PD1 | Solid tumors |
MSB-0010718C | Merck KGaA | PDL1 | Solid tumors |
PF-05082566 | Pfizer | 4-1BB (also known as CD137) | Hematologic or solid tumors | Fisher et al., CANCER IMMUNOL. IMMUNOTHER. (2012) 61:1721-33
MEDI6469 | AstraZeneca | OX40 (also known as CD134) | Solid tumors |
BMS-986016 | Bristol-Myers Squibb | LAG3 | Hematologic or solid tumors |
NLG-919 | Newlink Genetics | IDO1 | Solid tumors |
Urelumab (BMS-663513) | Bristol-Myers Squibb | 4-1BB (also known as CD137) | Hematologic or solid tumors | Li & Liu, CLIN. PHARMACOL. (2013) 5 (Suppl. 1):47-53
[0021] As used herein, "mutation" is described in detail below, but generally refers to an acquired nucleotide change in a somatic tissue as compared to a subject's germline. "Mutation load" is described in detail below, but generally refers to the number or proportion of analyzed loci harboring a mutation, with "high mutation load" or "HML" generally referring to a number or proportion, or score derived therefrom, that exceeds some reference or threshold.
[0022] As used herein, "next generation sequencing" or "NGS" refers to a variety of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. NGS is generally conducted with the following steps: First, DNA sequencing libraries are generated by clonal amplification by PCR in vitro; second, the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through chain-termination chemistry typical of Sanger sequencing; third, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion, typically without the requirement for a physical separation step. NGS parallelization of sequencing reactions can generate hundreds of megabases to gigabases of nucleotide sequence reads in a single instrument run. Unlike conventional sequencing techniques, such as Sanger sequencing, which typically report the average genotype of an aggregate collection of molecules, NGS technologies typically digitally tabulate the sequence of numerous individual DNA fragments (sequence reads discussed in detail below), such that low frequency variants (e.g., variants present at less than about 10%, 5% or 1% frequency in a heterogeneous population of nucleic acid molecules) can be detected. The term "massively parallel" can also be used to refer to the simultaneous generation of sequence information from many different template molecules by NGS.
[0023] NGS strategies can include several methodologies, including, but not limited to: (i) microelectrophoretic methods; (ii) sequencing by hybridization; (iii) real-time observation of single molecules, and (iv) cyclic-array sequencing. Cyclic-array sequencing refers to technologies in which a sequence of a dense array of DNA is obtained by iterative cycles of template extension and imaging-based data collection. Commercially available cyclic-array sequencing technologies include, but are not limited to, 454 sequencing, for example, used in 454 Genome Sequencers (Roche Applied Science; Basel), Solexa technology, for example, used in the Illumina Genome Analyzer, Illumina HiSeq, MiSeq, and NextSeq (San Diego, CA), the SOLiD platform (Applied Biosystems; Foster City, CA), the Polonator (Dover/Harvard) and HeliScope Single Molecule Sequencer technology (Helicos; Cambridge, MA). Other NGS methods include single molecule real time sequencing (e.g., Pacific Bio) and ion semiconductor sequencing (e.g., Ion Torrent sequencing). See, e.g., Shendure & Ji, Next Generation DNA Sequencing, NAT. BIOTECH. (2008) 26:1135-1145 for a more detailed discussion of NGS sequencing technologies.
[0024] As used herein, "PARP inhibitor" refers to a therapeutic agent that inhibits the poly (ADP-ribose) polymerase (PARP). Examples include those listed in Table 2.
Table 2
Figure imgf000010_0001
[0025] As used herein, "patient" or "individual" or "subject" refers to a human. A subject can be male or female. A subject can be one who has been previously diagnosed or identified as having a disease characterized by high mutation load. A subject can be one who has already undergone, or is undergoing, a therapeutic intervention for disease characterized by high mutation load. A subject can also be one who has not been previously diagnosed with a disease characterized by high mutation load.
[0026] As used herein, "sample" or "biological sample" refers to samples such as biopsy or tissue samples, frozen samples, blood and blood fractions or products (e.g., serum, platelets, red blood cells, and the like), tumor samples, sputum, bronchoalveolar lavage, cultured cells, e.g., primary cultures, explants, and transformed cells, stool, urine, etc. A "biopsy" refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen itself. Various biopsy techniques can be applied to the methods of the present disclosure. The biopsy technique applied will depend on the tissue type to be evaluated (e.g., lung, etc.), the size and type of the tumor, among other factors. Representative biopsy techniques include, but are not limited to, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy. An "excisional biopsy" refers to the removal of an entire tumor mass with a small margin of normal tissue surrounding it. An "incisional biopsy" refers to the removal of a wedge of tissue that includes a cross-sectional diameter of the tumor. A diagnosis made by endoscopy or fluoroscopy can require a "core-needle biopsy", or a "fine-needle aspiration biopsy" which generally obtains a suspension of cells from within a target tissue. A "bodily fluid" includes all fluids obtained from a mammalian body, either processed (e.g., serum) or unprocessed, which can include, for example, blood, plasma, urine, lymph, gastric juices, bile, serum, saliva, sweat, and spinal and brain fluids. A biological sample is typically obtained from a subject. As used herein, "cancer cell sample" or "tumor sample" means a specimen comprising either at least one cancer cell or biomolecules derived therefrom, including without limitation, lung cancer (e.g., non-small cell lung cancer (NSCLC)), ovarian cancer, colorectal cancer, breast cancer, endometrial cancer, or prostate cancer. Non-limiting examples of such biomolecules include nucleic acids and proteins. Biomolecules "derived" from a cancer cell sample include molecules located within or extracted from the sample as well as artificially synthesized copies or versions of such biomolecules. One illustrative, non-limiting example of such artificially synthesized molecules includes PCR amplification products in which nucleic acids from the sample serve as PCR templates. "Nucleic acids of" a cancer cell sample include nucleic acids located in a cancer cell or biomolecules derived from a cancer cell.
[0027] As used herein, "score" means a value or set of values selected so as to provide a quantitative measure of a variable or characteristic of a subject's condition or the degree of mutation load in a sample, and/or to discriminate, differentiate or otherwise characterize mutation load. The value(s) comprising the score can be based on, for example, quantitative data resulting in a measured amount of one or more sample constituents obtained from the subject. In certain embodiments the score can be derived from a single constituent, parameter or assessment, while in other embodiments the score is derived from multiple constituents, parameters and/or assessments. The score can be based upon or derived from an interpretation function; e.g., an interpretation function derived from a particular predictive model using any of various statistical algorithms. A "change in score" can refer to the absolute change in score, e.g. from one time point to the next, or the percent change in score, or the change in the score per unit time (i.e., the rate of score change).
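The three senses of "change in score" named above (absolute change, percent change, and rate of change) can be illustrated with a small helper. The function name and the numeric time units are hypothetical; returning None for an undefined percent change or rate is a choice made for the sketch.

```python
def score_change(score_start, score_end, t_start, t_end):
    """Return (absolute change, percent change, change per unit time).

    Percent change is None when the starting score is zero, and the rate is
    None when the two time points coincide, since both are undefined then.
    """
    absolute = score_end - score_start
    percent = 100.0 * absolute / score_start if score_start != 0 else None
    rate = absolute / (t_end - t_start) if t_end != t_start else None
    return absolute, percent, rate
```

A score that rises from 40 to 50 over 5 time units has an absolute change of 10, a percent change of 25, and a rate of 2 per unit time.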
[0028] As used herein, a "test locus" is a genomic locus (e.g., single nucleotide at a specified position within a chromosome) whose sequence or genotype is assessed according to the present disclosure, wherein a mutation at such a locus (e.g., as compared to a reference genotype or sequence) is potentially counted in a measurement of mutation load.
[0029] As used herein, the term "treatment" or "therapy" or "therapeutic regimen" includes all clinical management of a subject and interventions, whether biological, chemical, physical, or a combination thereof, intended to sustain, ameliorate, improve, or otherwise alter the condition of a subject. These terms may be used synonymously herein. Treatments include but are not limited to administration of prophylactics or therapeutic compounds (including small molecule and biologic drugs), exercise regimens, physical therapy, dietary modification and/or supplementation, bariatric surgical intervention, administration of therapeutic compounds (prescription or over-the-counter), and any other treatments efficacious in preventing, delaying the onset of, or ameliorating disease characterized by HML. A "response to treatment" includes a subject's response to any of the above-described treatments, whether biological, chemical, physical, or a combination of the foregoing. A "treatment course" relates to the dosage, duration, extent, etc. of a particular treatment or therapeutic regimen. An initial therapeutic regimen as used herein is the first line of treatment.
Aspects of the Disclosure
[0030] The present disclosure generally relates to methods for measuring mutation load in biological specimens and methods for clinically applying such mutation load measurement in directing patient therapy. The disclosed methods generally involve analyzing a large number of genomic loci (e.g., substantially genetically random) to detect variations in these loci (e.g., as compared to some reference or expected base or sequence), quantitating the number of such variations as a measurement of genome-wide mutation load, and optionally selecting a therapeutic regimen for patients whose samples harbor a certain mutation load. In this context, a locus may generally be considered "genetically random" when it is selected without particular regard to its genetic characteristics (e.g., in which gene it is located, its independent association with a clinical feature, etc.). Such a locus may instead be selected for its genomic characteristics (e.g., spacing relative to other test loci along a chromosome, genomic sequence context, etc.) or assay characteristics (e.g., good multiplex amplification, etc.).
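One way to picture selection by spacing rather than by genetic characteristics is to place candidate loci at regular intervals along a chromosome. The sketch below is purely illustrative: the function name, the 0-based coordinates, and the equal-bin midpoint scheme are assumptions, and the disclosure does not prescribe any particular spacing rule.

```python
def evenly_spaced_loci(chrom_length, n_loci):
    """Return n_loci 0-based positions at the midpoints of equal-width bins."""
    if n_loci <= 0 or chrom_length <= 0:
        raise ValueError("chrom_length and n_loci must be positive")
    return [(2 * i + 1) * chrom_length // (2 * n_loci) for i in range(n_loci)]
```

In practice such candidates would still need to pass the sequence-context and assay-performance filters mentioned in the paragraph above.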
[0031] Accordingly, one aspect of the present disclosure provides a method for measuring mutation load (e.g., in a cancer cell sample) comprising:
(1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes; and
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation.
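The three panel-size parameters in step (1) can be checked programmatically. The sketch below adopts one possible reading, namely that the "plurality of chromosomes" consists of the chromosomes carrying at least [X number] of test loci; that reading, the function name, and the (chromosome, position) tuple representation are assumptions of the example.

```python
from collections import Counter


def panel_meets_minimums(loci, x_per_chrom, y_chroms, z_total):
    """Check a test-locus panel against the [X], [Y] and [Z] minimums.

    loci -- iterable of unique (chromosome, position) pairs
    """
    per_chrom = Counter(chrom for chrom, _ in loci)
    # Chromosomes that individually carry at least X test loci.
    qualifying = [chrom for chrom, n in per_chrom.items() if n >= x_per_chrom]
    # Total loci across those qualifying chromosomes.
    total = sum(per_chrom[chrom] for chrom in qualifying)
    return len(qualifying) >= y_chroms and total >= z_total
```

With realistic values (for example 30,000 loci per chromosome) the same check applies unchanged; the toy numbers used to exercise it are only for illustration.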
[0032] Another aspect of the present disclosure provides a method of measuring mutation load (e.g., in a cancer cell sample) comprising:
(1) analyzing DNA derived from a cancer cell sample to determine (e.g., detect) the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation; and
(3) quantifying
(a) high mutation load in a cancer cell sample in which at least [A number] of test loci harbor a mutation; or
(b) low mutation load in a cancer cell sample in which fewer than [A number] of test loci harbor a mutation.
[0033] Another aspect of the present disclosure provides a method of treating cancer patients comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least [X number] of loci on each of a plurality of chromosomes comprising at least [Y number] of chromosomes, wherein the plurality of test loci comprises at least [Z number] of total loci across the plurality of chromosomes;
(2) quantifying a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] of test loci harbor a mutation and the test mutation load is low where fewer than [A number] of test loci harbor a mutation; and
(3) administering
(a) a therapeutic regimen comprising a PD-1 inhibitor and/or a PARP inhibitor to a patient in whose cancer cell sample a high test mutation load is detected in (2); or
(b) a therapeutic regimen not comprising a PD-1 inhibitor and/or a PARP inhibitor to a patient in whose cancer cell sample a low test mutation load is detected in (2).
[0034] As discussed above, "mutation" generally refers to an acquired nucleotide change in a somatic tissue as compared to a subject's germline. In some embodiments of the disclosure the method comprises analyzing both known cancer-derived DNA (e.g., a first sample that comprises, or comprises DNA extracted or derived from, known cancerous cells) and known germline DNA (e.g., a second sample that comprises, or comprises DNA extracted or derived from, known germline cells) to determine the sequence at the plurality of test loci for each sample. In such embodiments, variations between the germline DNA sequence and the cancer-derived DNA sequence at test loci would be "mutations" that could be counted in quantifying mutation load (e.g., the germline DNA can be a reference sequence against which the cancer-derived DNA is compared to detect mutations). In some embodiments the patient's germline DNA sequence for the plurality of test loci is known and need not be determined in the method of the disclosure (e.g., no need in some embodiments for physical or bioinformatic analysis to determine germline sequence). In such embodiments this known germline DNA sequence may be used as a reference against which the cancer-derived DNA sequence is compared in order to detect mutations in the cancer-derived DNA.
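The matched comparison described in this paragraph amounts to a per-locus difference between two genotype tables. The following sketch assumes, for simplicity, a single called base per locus in each sample; the function name and the dictionary layout keyed by (chromosome, position) are hypothetical.

```python
def somatic_mutations(tumor_calls, germline_calls):
    """List test loci where the cancer-derived base differs from the germline base.

    Both arguments map (chromosome, position) -> called base. Loci without a
    germline call are skipped because no comparison is possible for them.
    """
    return sorted(
        locus
        for locus, base in tumor_calls.items()
        if locus in germline_calls and base != germline_calls[locus]
    )
```

The length of the returned list is the count of mutated test loci that would feed into the mutation load measurement.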
[0035] In other embodiments of the disclosure, on the other hand, no germline DNA is required. This can be particularly useful in situations where obtaining a matched germline sample or sequence is difficult, costly, etc. In such embodiments, an external reference sequence (i.e., not the patient's own germline sequence) can be used to detect "variations" by comparing the patient's cancer-derived DNA sequence against this external reference. Variations at test loci in the patient's sequence can in turn represent "mutations" that can in some embodiments be counted toward mutation load according to the present disclosure.
[0036] It has been empirically determined herein that, in some cases, up to 90% of variants found in test loci in cancer cells may be germline rather than somatic. The present disclosure describes several novel approaches to addressing the possibility that variations in the patient's cancer-derived DNA as compared to any external reference are also in that patient's germline (e.g., germline polymorphism). These approaches generally include, e.g., careful selection of test loci according to specific criteria, estimating the ratio of cancerous to non-cancerous cells in the sample being analyzed, etc.
[0037] One example of test locus selection criteria can be the repetitive nature of the locus. Repetitive loci can present difficulties in accurate sequence alignment (due to the sequence being repeated in multiple locations in the genome) and also show natural, benign germline variation in humans. Thus, repetitive loci carry a relatively high likelihood of harboring an apparent somatic variation as compared to an external reference that is either an artifact of an inaccurate sequencing alignment or a germline variation in the individual. Accordingly, in some embodiments the plurality of test loci excludes repetitive loci. In various embodiments "repetitive loci" includes short tandem repeats (STRs), ALU sequences, and/or low complexity regions, etc.
[0038] Another example of test locus selection criteria can be the prevalence of natural, germline variation at the locus in any relevant population. In some embodiments the plurality of test loci can exclude any single nucleotide polymorphism (SNP) listed in the National Center for Biotechnology Information's SNP database (called "dbSNP"). This can be any SNP listed in the then-current build of dbSNP, the build of dbSNP current as of the filing date of this disclosure, or dbSNP build 146.
[0039] Another example of test locus selection criteria can be the prevalence of variations at a locus within some patient/sample cohort. Prevalence of variations at a locus within some patient/sample cohort can reflect at least two underlying causes: rare germline polymorphisms not reflected in dbSNP or sequence context that consistently gives rise to inaccurate sequence reads. In some embodiments the plurality of test loci can exclude any locus found to harbor a variation in at least some specific number [X] of samples out of a reference cohort comprising at least some minimum number [Y] of reference samples. In some embodiments X = at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 or 100 samples. In some embodiments Y = at least 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, or 500 samples. In some embodiments all reference samples in the cohort are unrelated (e.g., each sample is from a different patient).
[0040] These criteria are not necessarily mutually exclusive. For example, a criterion for test locus selection can be known deficiencies in sequencing accuracy and/or reproducibility other than those revealed by prevalence in a cohort as described in the preceding paragraph. This can include, but is certainly not limited to, repetitive sequence context surrounding the locus, which can often be identified in silico (e.g., no need for empirical testing of the locus). It can also include empirically determined reproducibility deficiencies. Another overlap is found in the fact that prevalence in a reference cohort can be caused by germline variation not reflected in a common SNP (or other polymorphism) database.
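The three selection criteria described above (repetitive context, dbSNP listing, and prevalence in a reference cohort) can be sketched as a simple filter. This is a minimal illustration only: the `Locus` record, its field names, and the default thresholds are hypothetical and are not taken from the disclosure, which leaves X and Y open to many values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int
    is_repetitive: bool        # e.g., STR, Alu, or low-complexity region
    in_dbsnp: bool             # listed in the chosen dbSNP build
    cohort_variant_count: int  # reference samples showing a variation here


def select_test_loci(candidates, cohort_size, x_threshold=2, min_cohort_size=25):
    """Return candidate loci that pass all three exclusion criteria.

    x_threshold plays the role of [X]: a locus is excluded when at least
    that many reference samples harbor a variation. min_cohort_size plays
    the role of [Y]. Both defaults are illustrative assumptions.
    """
    if cohort_size < min_cohort_size:
        raise ValueError("reference cohort is smaller than the required minimum")
    return [
        locus
        for locus in candidates
        if not locus.is_repetitive
        and not locus.in_dbsnp
        and locus.cohort_variant_count < x_threshold
    ]


candidates = [
    Locus("chr1", 1000, False, False, 0),  # passes every criterion
    Locus("chr1", 2000, True, False, 0),   # repetitive, excluded
    Locus("chr2", 3000, False, True, 0),   # in dbSNP, excluded
    Locus("chr2", 4000, False, False, 3),  # recurrent in cohort, excluded
]
selected = select_test_loci(candidates, cohort_size=100)
```

Because the criteria overlap, a locus may fail more than one test; the filter only needs one failure to exclude it.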
[0041] Another novel technique disclosed herein for addressing the possibility that variations in the patient's cancer-derived DNA as compared to any external reference are also in that patient's germline includes estimating the ratio of cancerous to non-cancerous cells in the sample being analyzed. Thus, some embodiments of any of the above-described aspects of the disclosure may comprise a step of estimating the ratio of cancerous to non-cancerous cells (or the ratio of cancerous to non-cancerous DNA) in the sample being analyzed. In some such embodiments, this cancerous/non-cancerous ratio can be used in conjunction with a predefined range of variant allele ratios (as discussed below) to determine whether a detected variation is germline or somatic. By way of non-limiting example, the ratio of cancerous to non-cancerous DNA can optionally be estimated and optionally further the allelic changes that occur at each genomic region can be reconstructed (these changes can include both allele copy number changes and loss of an allele). With both the ratio of cancerous to non-cancerous DNA and the relative allele balance known for a given locus, an estimate of frequency can be determined for a germline variant on either allele or a somatic variant on either allele. Should the estimated frequency for a somatic variant on either allele differ significantly from the expected germline frequencies, one may call variants with a frequency that is close to the estimated somatic allele frequency as somatic variants. In practice, these frequencies may be anywhere from 0-100%. For example, in a sample with 50% cancerous and 50% non-cancerous cells/DNA, assuming the cancer retained a balanced 2 copy number structure (1 copy of each allele) for the entire genome, one would expect only 3 potential frequencies: 100%, 50%, and 25%. 100% would correspond to germline homozygous, 50% to germline heterozygous, and 25% to the somatic variants. 
If a somatic variant did occur on both alleles, its frequency would be 50% and could potentially be confused for a germline heterozygous variant, but this is statistically an unlikely event that in some embodiments may be ignored (i.e., in some embodiments, a 50% allele ratio/frequency may be interpreted as conclusive evidence of a germline heterozygous variant). Thus, in some embodiments variants with close to 25% frequency (e.g., 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, or 35%) may be counted as "mutations" in quantifying mutation load.
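The 100%, 50%, and 25% frequencies in the example above follow from weighting the variant copies in the cancerous and non-cancerous fractions by their share of the sample. The sketch below works through that arithmetic; the function name and parameters are illustrative assumptions, and the formula is a standard mixture calculation rather than a procedure quoted from the disclosure.

```python
def expected_variant_fraction(purity, tumor_variant_copies, tumor_total_copies,
                              normal_variant_copies, normal_total_copies=2):
    """Expected fraction of reads carrying a variant in a mixed sample.

    purity is the fraction of cells (or DNA) that is cancerous. Copy
    numbers are per cell at the locus of interest.
    """
    variant = purity * tumor_variant_copies + (1 - purity) * normal_variant_copies
    total = purity * tumor_total_copies + (1 - purity) * normal_total_copies
    return variant / total


# 50% cancerous cells, balanced two-copy tumor genome (the example in the text)
germline_hom = expected_variant_fraction(0.5, 2, 2, 2)  # variant on both alleles everywhere
germline_het = expected_variant_fraction(0.5, 1, 2, 1)  # variant on one allele everywhere
somatic_one = expected_variant_fraction(0.5, 1, 2, 0)   # variant on one tumor allele only
somatic_both = expected_variant_fraction(0.5, 2, 2, 0)  # variant on both tumor alleles
```

Under these assumptions `somatic_both` equals the germline heterozygous value, which is the ambiguity the paragraph notes for a somatic variant occurring on both alleles.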
Further Embodiments of these Aspects
[0042] In some embodiments of each of the above aspects of the disclosure, [X number] (the number of test loci per test chromosome) comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
[0043] In some embodiments of each of the above aspects of the disclosure, [Z number] (the total number of test loci across all test chromosomes) comprises at least 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000, 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
[0044] In some embodiments of each of the above aspects of the disclosure, [Y number] (the number of test chromosomes) comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
[0045] In some embodiments of the above aspects of the disclosure involving quantifying mutation load as high or low, [A number] (the number of mutations at or above which a sample is quantified as having a high mutation load and below which a sample is quantified as having a low mutation load) is 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500 or more. In some embodiments this threshold [A number] may be derived from a study of a cohort of patients in which mutation load was assessed according to the present disclosure, with [A number] representing a mutation number that separated patients in the cohort according to some clinical feature associated with mutation load (e.g., [A number] separated patients with or without response to immune checkpoint inhibitors with at least some minimum level of accuracy, e.g., sensitivity and specificity each of at least 70%, 80%, 85%, 90%, 95%). As used herein, "low" can mean either low with respect to some reference or simply "not high." In some embodiments more than two categories (i.e., more than "high" and "low") can be used to quantify mutation load (e.g., 3, 4, 5, 6, 7, 8, 9, 10 or more categories). For example, a patient cohort as described in this paragraph could be divided into three groups, "high" mutation load for mutation numbers at or above [A number], "intermediate" mutation load for mutation numbers between [A number] and [B number], and "low" mutation load for mutation numbers at or below [B number]. This division may be based on the cohort being divided into terciles of mutation load or using some other thresholds between subgroups. In some embodiments a numerical score is derived from the number of mutations and this score is used as the threshold for differentiating high and low mutation load.
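The two-category and three-category schemes described above can be sketched as a single classifier. The function and parameter names are hypothetical; `high_threshold` stands in for [A number] and `low_threshold` for [B number], and the numeric values used below are arbitrary examples, not thresholds taught by the disclosure.

```python
def classify_mutation_load(n_mutations, high_threshold, low_threshold=None):
    """Classify a mutation count as "high", "intermediate", or "low".

    With only high_threshold given, "low" simply means "not high".
    With both thresholds, counts strictly between them are "intermediate".
    """
    if low_threshold is None:
        return "high" if n_mutations >= high_threshold else "low"
    if low_threshold >= high_threshold:
        raise ValueError("low_threshold must be below high_threshold")
    if n_mutations >= high_threshold:
        return "high"
    if n_mutations <= low_threshold:
        return "low"
    return "intermediate"


two_way = [classify_mutation_load(n, high_threshold=20) for n in (5, 20, 40)]
three_way = [classify_mutation_load(n, high_threshold=20, low_threshold=8)
             for n in (5, 8, 12, 20)]
```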
[0046] In some embodiments of each of the above aspects of the disclosure, a test locus is counted as harboring a mutation if (a) there are at least some minimum number of sequence reads covering the test locus (e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000 or more reads) and (b) the variant allele ratio is in the range of X%-Y%, inclusive (e.g., X = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 or 50; Y = 15, 20, 25, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90). In some specific embodiments the minimum number of covering reads is 50 and the variant allele range is 15%-65%.
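The two-part rule above (minimum coverage plus an inclusive variant allele ratio window) can be sketched as follows, using the specific embodiment of 50 covering reads and a 15%-65% range as defaults. The function names and the representation of loci as (total reads, variant reads) pairs are assumptions made for illustration.

```python
def counts_as_mutation(total_reads, variant_reads,
                       min_reads=50, min_pct=15, max_pct=65):
    """Return True if a test locus meets both criteria (a) and (b).

    The ratio test is done in integer arithmetic so that boundary values
    such as exactly 15% are handled without floating-point rounding.
    """
    if total_reads < min_reads:
        return False
    return min_pct * total_reads <= 100 * variant_reads <= max_pct * total_reads


def mutation_load(loci):
    """Count loci, given as (total_reads, variant_reads) pairs, that qualify."""
    return sum(1 for total, variant in loci if counts_as_mutation(total, variant))


example_loci = [
    (100, 15),  # 15%, on the lower bound: counted
    (100, 1),   # 1%, likely sequencing noise: not counted
    (100, 98),  # 98%, likely homozygous germline: not counted
    (40, 12),   # 30% but only 40 reads: not counted
    (200, 130), # 65%, on the upper bound: counted
]
load = mutation_load(example_loci)
```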
[0047] As used herein, "sequence read" refers to the sequence of an individual DNA molecule sequenced in a sequencing reaction. Especially in massively-parallel (often called next-generation) sequencing, individual DNA molecules used for sequencing can be relatively short (e.g., ranging from 50nt to 1,000nt). These molecules are often heavily overlapping in their sequences. Thus, any individual test locus is contained or represented within numerous distinct DNA molecules in the sample that is subjected to the sequencing reaction. When each individual molecule is sequenced (often in parallel), the numerous resulting "sequence reads" can be aligned against each other and/or against a larger reference sequence (e.g., a reference human genome sequence such as the hg19 version of the human genome assembly available at the University of California, Santa Cruz's Genome Browser website). Generally speaking, a greater number of reliably sequenced (or "informative") reads containing or representing (or "covering") any individual locus tends to yield greater accuracy and confidence in the called genotype/sequence at that locus. Thus, in some specific embodiments of each of the above aspects of the disclosure a test locus (or a variant at that locus) may be counted toward mutation load only if it is covered by at least some minimal number of sequence reads in the sequencing reaction(s) (e.g., at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,250, 2,500, 2,750, 3,000 or more sequence reads).
[0048] As used herein, "variant allele ratio" refers to the number of informative sequence reads harboring a variant nucleotide as a proportion of the total informative sequence reads covering the locus. For example, if a test locus is covered by 100 informative sequence reads in a particular sequencing reaction and 15 reads carry a variant nucleotide (e.g., T instead of the G expected based on knowledge of the patient's germline sequence or based on an external reference sequence), then the variant allele ratio is 15%. In some contexts variant allele ratios that are too low or too high may indicate unreliability in a mutation call. For example, if the variant allele ratio is around 1%, this can be due to sequencing artifacts and noise (e.g., a small proportion of sequence reads simply contain sequencing errors). If, however, the variant ratio approaches 100%, this can indicate that the variant may not be a mutation since (a) the sample may contain only or substantially only tumor tissue and thus any variants detected may be either somatic mutations or rare germline variants and/or (b) the variant may be a homozygous germline variant found throughout the tumor and normal portions of the sample. 
Thus, in some specific embodiments of each of the above aspects of the disclosure a test locus (or a variant at that locus) may be counted toward mutation load only if the variant allele ratio is within a specific (e.g., pre-specified) range (e.g., 15%-85%, 16%-85%, 17%-85%, 18%-85%, 19%-85%, 20%-85%, 21%-85%, 22%-85%, 23%-85%, 24%-85%, 25%-85%, 26%-85%, 27%-85%, 28%-85%, 29%-85%, 30%-85%, 31%-85%, 32%-85%, 33%-85%, 34%-85%, 35%-85%, 15%-80%, 16%-80%, 17%-80%, 18%-80%, 19%-80%, 20%-80%, 21%-80%, 22%-80%, 23%-80%, 24%-80%, 25%-80%, 26%-80%, 27%-80%, 28%-80%, 29%-80%, 30%-80%, 31%-80%, 32%-80%, 33%-80%, 34%-80%, 35%-80%, 15%-70%, 16%-70%, 17%-70%, 18%-70%, 19%-70%, 20%-70%, 21%-70%, 22%-70%, 23%-70%, 24%-70%, 25%-70%, 26%-70%, 27%-70%, 28%-70%, 29%-70%, 30%-70%, 31%-70%, 32%-70%, 33%-70%, 34%-70%, 35%-70%, 15%-60%, 16%-60%, 17%-60%, 18%-60%, 19%-60%, 20%-60%, 21%-60%, 22%-60%, 23%-60%, 24%-60%, 25%-60%, 26%-60%, 27%-60%, 28%-60%, 29%-60%, 30%-60%, 31%-60%, 32%-60%, 33%-60%, 34%-60%, 35%-60%, 15%-50%, 16%-50%, 17%-50%, 18%-50%, 19%-50%, 20%-50%, 21%-50%, 22%-50%, 23%-50%, 24%-50%, 25%-50%, 26%-50%, 27%-50%, 28%-50%, 29%-50%, 30%-50%, 31%-50%, 32%-50%, 33%-50%, 34%-50%, 35%-50%, 15%-40%, 16%-40%, 17%-40%, 18%-40%, 19%-40%, 20%-40%, 21%-40%, 22%-40%, 23%-40%, 24%-40%, 25%-40%, 26%-40%, 27%-40%, 28%-40%, 29%-40%, 30%-40%, 31%-40%, 32%-40%, 33%-40%, 34%-40%, 35%-40%, 15%-35%, 16%-34%, 17%-33%, 18%-32%, 19%-31%, 20%-30%, 21%-29%, 22%-28%, 23%-27%, or 24%-26%).
[0049] In some embodiments of the above aspects of the disclosure, additional criteria and/or techniques may be applied to the sequence data obtained from one or more sequencing reactions to further improve sequence read reliability. One example involves discarding sequence reads with too much variation from the reference sequence (whether the patient's germline or an external reference). For example, in some embodiments sequence reads having more than a certain maximum number [X] of mismatches against the reference may be discarded. In some embodiments X = 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 or 100 mismatches. In some embodiments sequence reads having more than a certain maximum proportion [Y] of mismatches against the reference may be discarded (e.g., the percentage of nucleotide positions with a mismatch divided by the total number of nucleotide positions in the sequence read or, as described below, in the region of confident sequence calls in the sequence read). In some embodiments Y = 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% or more mismatches. Alternatively, sequence reads that, when aligned to the reference sequence, show insertions or deletions (gaps) may also be discarded.
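The read-level filters described above (a maximum mismatch count, a maximum mismatch proportion, and optional rejection of gapped alignments) can be sketched as a single predicate. This is a simplified illustration that assumes the read and reference are already aligned, of equal length, and use "-" to mark gaps; the function name and parameters are hypothetical.

```python
def keep_read(aligned_read, aligned_ref, max_mismatches=None,
              max_mismatch_fraction=None, discard_gapped=True):
    """Decide whether an aligned read is retained for variant counting."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if discard_gapped and ("-" in aligned_read or "-" in aligned_ref):
        return False
    # Compare only columns where neither sequence has a gap.
    pairs = [(r, g) for r, g in zip(aligned_read, aligned_ref)
             if r != "-" and g != "-"]
    mismatches = sum(1 for r, g in pairs if r != g)
    if max_mismatches is not None and mismatches > max_mismatches:
        return False
    if (max_mismatch_fraction is not None and pairs
            and mismatches / len(pairs) > max_mismatch_fraction):
        return False
    return True


reference = "ACGTACGT"
exact = keep_read("ACGTACGT", reference, max_mismatches=2)
noisy = keep_read("ACGTTTTT", reference, max_mismatches=2)   # 3 mismatches
gapped = keep_read("AC-TACGT", reference, max_mismatches=2)  # contains a gap
```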
[0050] In some cases, raw reads produced by a sequencing reaction may be correct and precise (i.e., high confidence in the sequence calls) only in a fraction of their length. In such cases, using the entire read sequence may introduce artifacts in variant or mutation calling to quantify mutation load as described herein. Reliability can be increased, however, by "trimming" the sequence to be analyzed in each read (i.e., discarding from ultimate analysis a certain length of nucleotides at the end(s) of reads). In some embodiments this is done by trimming a pre-specified number of nucleotides from the 5' and/or 3' ends of each read, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the 5' end and/or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500 or more nucleotides from the 3' end. In some embodiments this is done by trimming a pre-specified proportion of nucleotides from the 5' and/or 3' ends of each read (e.g., the percentage of nucleotide positions to be trimmed divided by the total number of nucleotide positions in the read), e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides from the 5' end and/or 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% or more nucleotides from the 3' end. In some specific embodiments the total average read length is between about 150 and about 200 nucleotides and only positions 11-135 (or 11-185 in the case of a 200 nucleotide read length) are used to estimate coverage and to count potentially variant alleles.
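Fixed-position trimming of the kind described above amounts to slicing each read. The sketch below shows both variants: keeping a 1-based inclusive window such as positions 11-135, and removing a set number of nucleotides from each end. The function names are hypothetical.

```python
def keep_positions(read, first_pos=11, last_pos=135):
    """Keep only 1-based positions first_pos..last_pos (inclusive) of a read."""
    return read[first_pos - 1:last_pos]


def trim_ends(read, n_five_prime=0, n_three_prime=0):
    """Remove a fixed number of nucleotides from the 5' and 3' ends."""
    end = len(read) - n_three_prime
    return read[n_five_prime:max(end, n_five_prime)]


read_150 = "ACGTA" * 30                           # synthetic 150-nucleotide read
window_150 = keep_positions(read_150)             # positions 11-135
window_200 = keep_positions("ACGTA" * 40, 11, 185)
same_window = trim_ends(read_150, n_five_prime=10, n_three_prime=15)
```

For a 150-nucleotide read, keeping positions 11-135 is equivalent to trimming 10 nucleotides from the 5' end and 15 from the 3' end, which is why `same_window` matches `window_150`.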
[0051] Alternatively, trimming may be performed informatically. Two general classes of trimming algorithms are window-based and running-sum algorithms, with examples of each as follows:
Table 3
[Table reproduced as an image in the original (Figure imgf000022_0001); only the following entries are recoverable.]
SolexaQA: Cox et al., SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data, BMC Bioinformatics (2010) 11(1):485
FASTX quality trimmer: NA
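To make the distinction between the two classes concrete, the following sketch shows one generic formulation of each, operating on a list of per-base quality scores. These are illustrative implementations written under assumed parameters (window size, quality threshold); they are not the algorithms of the tools named in Table 3, whose exact behavior should be taken from their own documentation.

```python
def window_trim(quals, window=4, threshold=20):
    """Window-based: scan from the 5' end and cut at the start of the first
    window whose mean quality falls below the threshold.
    Returns the number of leading bases to keep."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return len(quals)


def running_sum_trim(quals, threshold=20):
    """Running-sum: walk inward from the 3' end accumulating
    (threshold - quality) and cut where that sum is largest.
    Returns the number of leading bases to keep."""
    best_cut = len(quals)
    best_sum = 0
    running = 0
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # high-quality bases now outweigh the poor 3' tail
        if running > best_sum:
            best_sum = running
            best_cut = i
    return best_cut


quals = [30, 30, 30, 30, 10, 10, 5]
keep_window = window_trim(quals, window=3, threshold=20)
keep_running = running_sum_trim(quals, threshold=20)
```

On this example the window-based rule keeps 3 bases and the running-sum rule keeps 4, which illustrates that the two classes can place the cut point differently on the same read.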
[0052] In some embodiments double-stranded DNA may be sequenced only in one direction whereas, in other embodiments, such DNA may be sequenced in both the forward and reverse directions (e.g., to increase the number of potentially informative sequence reads). Several techniques may in turn be employed to improve bidirectional sequencing reliability, including trimming and removal of reads whose sequence varies too highly from a reference (e.g., as each is described above). Another technique specific to improving reliability of bidirectional sequencing involves comparing the forward and reverse strand reads and discarding certain reads under certain circumstances. For example, if the forward and reverse read alignments overlap, one may in some embodiments use only one to contribute to the coverage and variant allele analysis (or counting) of the overlapping positions. Further, in some embodiments positions or sequence reads with non-matching forward and reverse bases may be excluded as sequencing errors.
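The handling of overlapping forward and reverse reads described above can be sketched at the level of individual positions. The representation below (a dictionary from reference position to base call for each read of a pair) and the function name are assumptions chosen for illustration; the sketch excludes discordant positions rather than whole reads, which is one of the options the paragraph mentions.

```python
def merge_pair_calls(forward, reverse):
    """Combine base calls from the forward and reverse reads of one fragment.

    Overlapping positions contribute a single call when the two reads agree
    and are excluded when they disagree. Non-overlapping positions are kept
    from whichever read covers them.
    """
    merged = {}
    for pos in sorted(forward.keys() | reverse.keys()):
        f_base = forward.get(pos)
        r_base = reverse.get(pos)
        if f_base is not None and r_base is not None:
            if f_base == r_base:
                merged[pos] = f_base  # counted once, not twice
            # discordant overlap: treated as a sequencing error and skipped
        else:
            merged[pos] = f_base if f_base is not None else r_base
    return merged


forward_calls = {100: "A", 101: "C", 102: "G"}
reverse_calls = {101: "C", 102: "T", 103: "T"}
merged_calls = merge_pair_calls(forward_calls, reverse_calls)
```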
[0053] In some embodiments of each of the above aspects of the present disclosure, "analyzing DNA" can comprise either (a) analyzing (e.g., informatically) DNA data previously obtained from a patient sample or (b) obtaining DNA from a patient sample and performing a laboratory assay to obtain data regarding such DNA. Analogously, "determining the nucleotide sequence of" DNA includes either (a) analyzing (e.g., informatically) DNA sequence data previously obtained from a patient sample or (b) obtaining DNA from a patient sample and performing a laboratory assay to sequence such DNA. As discussed above, the methods of the invention generally require at least analyzing cancer-derived DNA (e.g., "DNA derived from a cancer cell sample"), but may optionally also include analyzing germline DNA and using the sequence of such germline DNA as a reference for detecting (and quantifying) mutations in the cancer-derived DNA. "Cancer-derived DNA," as used herein, is synonymous with DNA "derived" (as defined herein) from a cancer cell.
[0054] In some embodiments of each of the above aspects of the present disclosure, the cancer cell sample is, or is derived from, a bodily fluid. In some embodiments, the cancer cell sample is, or is derived from, a tumor. In some embodiments, the cancer is selected from endometrial cancer, ovarian cancer, breast cancer, colorectal cancer, lung cancer, or prostate cancer. In some embodiments, a first sample comprises, or is derived from, known cancerous cells (e.g., tumor biopsy or tumor resection) and a second sample is, or is derived from, known germline tissue (e.g., nucleated blood cells, fibroblasts, etc.).
[0055] It has been discovered that high mutation loads detected as described herein typically do not co-occur with other assay biomarkers of drug response, thus allowing mutation load as described herein to in some embodiments supplement such other assays. For example, as described in Example 1 below, a study of a specific homologous recombination deficiency (HRD) assay (see, e.g., U.S. Appl. Ser. No. 14/245,576 (published as US20140363521A1); Internat. Appl. Ser. No. PCT/US2015/045561 (published as WO/2016/025958A1), each incorporated herein by reference) and its ability to predict response to a specific PARP inhibitor drug showed that two samples with strong response did not have HRD. Both of these samples, however, had high mutation load detected as described herein. Thus in some embodiments the method comprises identifying a sample as not having HRD (including assaying the sample to determine HRD status, e.g., as described and defined in U.S. Appl. Ser. No. 14/245,576 and Internat. Appl. Ser. No. PCT/US2015/045561) and then performing the methods as described herein.
Detection of Test Locus Variants
[0056] The present methods generally relate to the detection of variants in test loci. Methodologies for preparing nucleic acids in a form that is suitable for variant detection can include, but are not limited to, PCR, detectable probes, sequencing and single base extensions, reverse transcriptase-PCR (RT-PCR), real-time PCR, allele-specific hybridization, reverse transcription quantitative real-time PCR (RT-qPCR), ligase chain reaction, strand displacement amplification (SDA), self-sustained sequence replication (3SR), or in situ PCR. Exemplary, but non-limiting, techniques for analysis of nucleic acid samples to detect test locus variants are briefly described below. One preferred technique is next-generation sequencing (NGS).
DNA Sequencing and Single Base Extensions
[0057] Indels can also be detected by direct sequencing. Methods include, e.g., dideoxy sequencing-based methods and other methods such as Maxam and Gilbert sequencing (see, e.g., Sambrook et al., supra).
[0058] Other detection methods include Pyrosequencing™ of oligonucleotide-length products. Such methods often employ amplification techniques such as PCR. For example, in pyrosequencing, a sequencing primer is hybridized to a single-stranded, PCR-amplified DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and the substrates adenosine 5' phosphosulfate (APS) and luciferin. The first of four deoxynucleotide triphosphates (dNTP) is added to the reaction. DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand, if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide. ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5' phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin that generates visible light in amounts that are proportional to the amount of ATP. The light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a Pyrogram™. Each light signal is proportional to the number of nucleotides incorporated. Apyrase, a nucleotide degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP. When degradation is complete, another dNTP is added.
[0059] Another similar method for characterizing indels does not require use of a complete PCR, but typically uses only the extension of a primer by a single, fluorescence-labeled dideoxynucleotide triphosphate (ddNTP) that is complementary to the nucleotide to be investigated. The nucleotide at the polymorphic site can be identified via detection of a primer that has been extended by one base and is fluorescently labeled (e.g., Kobayashi et al., Mol. Cell. Probes, 9:175-182, 1995).
Distinguishing Test Locus Variants
[0060] Amplification products can be analyzed using techniques including, without limitation, electrophoretic analysis or sequence analysis. Non-limiting examples of electrophoretic analysis include slab gel electrophoresis such as agarose or polyacrylamide gel electrophoresis, capillary electrophoresis, and denaturing gradient gel electrophoresis (DGGE). Other methods of nucleic acid analysis include, but are not limited to, hybridization with allele-specific oligonucleotide probes (Wallace et al., Nucl. Acids Res. (1978) 6:3543-3557), including immobilized oligonucleotides (Saiki et al., PNAS (1989) 86:6230-6234), oligonucleotide arrays (Maskos and Southern, Nucl. Acids Res. (1993) 21:2269-2270), oligonucleotide-ligation assay (OLA) (Landegren et al., Science (1988) 241:1077), allele-specific ligation chain reaction (LCR) (Barany, PNAS (1991) 88:189-193), gap-LCR (Abravaya et al., Nucl. Acids Res. (1995) 23:675-682), single-strand-conformation-polymorphism detection (Orita et al., Genomics (1983) 5:874-879), RNAase cleavage at mismatched base-pairs (Myers et al., Science (1985) 230:1242), genetic bit analysis (GBA) (Nikiforov et al., Nucl. Acids Res. (1994) 22:4167-4175), in situ hybridization, and denaturing high performance liquid chromatography (DHPLC) (Kim et al., Genetic Testing (2008) 12:295-298). Non-limiting examples of sequence analysis include NGS (e.g., Chen et al., Genome Res. (2008) 18:1143-1149; Srivatsan et al., PLoS Genet. (2008) 4:e1000139), Maxam-Gilbert sequencing, Sanger sequencing, capillary array DNA sequencing, thermal cycle sequencing (Sears et al., Biotechniques (1992) 13:626-633), solid-phase sequencing (Zimmerman et al., Methods Mol. Cell Biol. (1992) 3:39-42), sequencing with mass spectrometry such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS; Fu et al., Nat. Biotechnol. (1998) 16:381-384), sequencing by hybridization (Chee et al., Science (1996) 274:610-614; Drmanac et al., Science (1993) 260:1649-1652; Drmanac et al., Nat. Biotechnol. (1998) 16:54-58), Polony sequencing (Porreca et al., Curr. Protoc. Mol. Biol. (2006) Chp. 7; Unit7.8), ion semiconductor sequencing (Elliott et al., J. Biomol. Tech. 1:24-30 (2010)), DNA nanoball sequencing (Kaji et al., Chem. Soc. Rev. (2010) 39:948-56), single molecule real-time sequencing (Flusberg et al., Nat. Methods (2010) 6:461-5), or nanopore DNA sequencing (Wanunu, Phys. Life Rev. (2012) 9:125-58).
Allele-Specific Hybridization
[0061] This technique, also commonly referred to as allele specific oligonucleotide hybridization (ASO) (e.g., Stoneking et al., Am. J. Hum. Genet. (1991) 48:70-382; Saiki et al., Nature (1986) 324, 163-166; EP 235,726; and WO/1989/011548), relies on distinguishing between two DNA molecules differing by one base by hybridizing an oligonucleotide probe that is specific for one of the variants to an amplified product obtained from amplifying the nucleic acid sample. This method typically employs short oligonucleotides, e.g., 15-20 bases in length. The probes are designed to differentially hybridize to one variant (e.g., from a reference sequence) versus another. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, producing an essentially binary response whereby a probe hybridizes to only one of the alleles. Some probes are designed to hybridize to a segment of target DNA such that the polymorphic site aligns with a central position (e.g., in a 15-base oligonucleotide at the 7 position; in a 16-base oligonucleotide at either the 8 or 9 position) of the probe, but this design is not required.
[0062] The amount and/or presence of an allele is determined by measuring the amount of allele-specific oligonucleotide that is hybridized to the sample. Typically, the oligonucleotide comprises a label (e.g., a fluorescent label). For example, an allele-specific oligonucleotide is applied to immobilized oligonucleotides representing sequences with different nucleotides at the test locus. After stringent hybridization and washing conditions, fluorescence intensity is measured for each variant-specific oligonucleotide.
[0063] Suitable assay formats for detecting hybrids formed between probes and target nucleic acid sequences in a sample include, but are not limited to, the immobilized target (dot-blot) format and immobilized probe (reverse dot-blot or line-blot) assay formats. Dot blot and reverse dot blot assay formats are described in U.S. Pat. Nos. 5,310,893; 5,451,512; 5,468,613; and 5,604,099; each incorporated herein by reference.
Detectable Probes
[0064] Hybridization probes useful in the methods of the present disclosure can include an allele-specific probe that discriminates between alleles with and without variants at the test locus. The probes can be at least about 12, 15, 16, 18, 20, 22, 24, 25, 30 or more nucleotide fragments of a contiguous sequence surrounding the test locus. The probes can be produced by, for example, chemical synthesis, PCR amplification, generation from longer polynucleotides using restriction enzymes, or other methods. The probes can be made completely complementary to the target nucleic acid or portion thereof (e.g., to all or a portion of a sequence encoding a target). Therefore, usually high stringency conditions are desirable in order to prevent or at least minimize false positives. However, conditions of high stringency may be best suited to situations where the probes are complementary to regions of the target which lack heterogeneity. The stringency of hybridization is determined by a number of factors during hybridization and during the washing procedure, including temperature, ionic strength, length of time, and concentration of formamide (Sambrook et al. (1989), "Molecular Cloning; A Laboratory Manual," Second Edition (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.)).
[0065] Nucleic acid probes, or alternatively nucleic acid from the samples, can be provided in solution for such assays, or can be affixed to a support (e.g., solid or semi-solid support). Examples of supports that can be used are nitrocellulose (e.g., in membrane or microtiter well form), polyvinyl chloride (e.g., in sheets or microtiter wells), polystyrene latex (e.g., in beads or microtiter plates), polyvinylidene fluoride, diazotized paper, nylon membranes, activated beads, and Protein A beads.
[0066] Probes detectable upon a secondary structural change are also suitable for detection of a test locus variant. Exemplified secondary structure or stem-loop structure probes include molecular beacons or Scorpion® primer/probes. Molecular beacon probes are single-stranded oligonucleic acid probes that can form a hairpin structure in which a fluorophore and a quencher are usually placed on the opposite ends of the oligonucleotide. At either end of the probe short complementary sequences allow for the formation of an intramolecular stem, which enables the fluorophore and the quencher to come into close proximity. The loop portion of the molecular beacon is complementary to a target nucleic acid of interest. Binding of this probe to its target nucleic acid of interest forms a hybrid that forces the stem apart. This causes a conformation change that moves the fluorophore and the quencher away from each other and leads to a more intense fluorescent signal. See, e.g., Tyagi & Kramer, Nat. Biotechnol. (1996) 14:303-308; Tyagi et al., Nat. Biotechnol. (1998) 16:49-53; Piatek et al., Nat. Biotechnol. (1998) 16:359-363; Marras et al., Genetic Analysis: Biomolecular Engineering (1999) 14:151-156; Täpp et al., BioTechniques (2000) 28:732-738.
Therapeutic regimens
[0067] The present disclosure also provides methods of administering, recommending, prescribing, etc. specific therapeutic regimens to patients whose tumors are found to harbor specific mutation loads as disclosed herein. Measuring mutation load as disclosed herein over a period of time can provide a clinician with a dynamic picture of a patient's biological state. These embodiments of the present disclosure thus will provide patient-specific biological information, which will be informative for therapy selection and will facilitate therapy response prediction.
Reference Standards for Treatment
[0068] In many embodiments, the mutation load in a sample is compared to a reference ("reference standard" or "reference level") in order to direct treatment decisions. Mutation numbers can be manipulated into a score, which can represent mutation load. The reference standard used for any embodiment disclosed herein may comprise average, mean, or median mutation loads in a control population. The reference standard may further include an earlier time point for the same subject. For example, a reference standard may include a first time point, and mutation load can be examined again at second, third, fourth, fifth, sixth time points, etc. Any time point earlier than any particular time point can be considered a reference standard. The reference standard may additionally comprise cutoff values or any other statistical attribute of the control population, or earlier time points of the same subject, such as a standard deviation from the mean mutation load. In some embodiments, the control population may comprise healthy individuals, cancer patients having a particular response profile, or the same test patient prior to the administration of any or a specific therapy.
[0069] In some embodiments, a mutation load may be quantified from the reference time point, and a different mutation load may be quantified from a later time point. A first time point can be when an initial therapeutic regimen is begun. A first time point can also be when a first mutation load assay is performed. A time point can be hours, days, months, years, etc. In some embodiments, a period between time points is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 36, 48, 60, 72, 84, 96, 108, or 120 months.
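By way of non-limiting illustration, the comparison of a test mutation load to a reference standard described in paragraphs [0068] and [0069] might be sketched as follows. The function names, the choice of a z-score relative to the control population, and all numeric values are assumptions introduced for illustration only and are not taken from the disclosure.

```python
from statistics import mean, stdev


def compare_to_population(test_load, control_loads):
    """Return the number of standard deviations by which the test
    mutation load differs from the mean of a control population."""
    mu = mean(control_loads)
    sd = stdev(control_loads)  # sample standard deviation of the controls
    return (test_load - mu) / sd


def change_from_earlier_time_point(loads_by_month, reference_month, later_month):
    """Return the change in mutation load between an earlier (reference)
    time point and a later time point for the same subject."""
    if later_month <= reference_month:
        raise ValueError("the reference time point must precede the later one")
    return loads_by_month[later_month] - loads_by_month[reference_month]


# Hypothetical values, for illustration only.
control = [40, 50, 60, 50, 50]
z = compare_to_population(90, control)

series = {0: 120, 6: 150, 12: 210}  # months since first assay -> mutation load
delta = change_from_earlier_time_point(series, 0, 12)
```

Any cutoff applied to such a score (for example, a number of standard deviations above the control mean) would be a design choice for the practitioner; the disclosure does not fix one.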
Reference Therapy for Treatment
[0070] In some embodiments, a test patient is treated more or less aggressively than a reference therapy based on the difference between the test patient's sample's mutation load and the reference mutation load. In some embodiments a reference therapy is any therapy that is the standard of care for the patient's disease. The standard of care can vary temporally and geographically, and a skilled person can easily determine the appropriate standard of care by consulting the relevant medical literature.
[0071] In some embodiments, a more aggressive therapy than the standard therapy comprises beginning treatment earlier than in the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises administering additional treatments beyond the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises administering alternative treatments instead of the standard therapy. In some embodiments, a more aggressive therapy than the standard therapy comprises treating on an accelerated schedule compared to the standard therapy. In one embodiment a more aggressive therapy comprises increased length of therapy. In one embodiment a more aggressive therapy comprises increased frequency of the dose schedule. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs and increasing drug dosage. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs and accelerating dose schedule. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs and increasing length of therapy. In one embodiment, more aggressive therapy comprises increasing drug dosage and accelerating dose schedule. In one embodiment, more aggressive therapy comprises increasing drug dosage and increasing length of therapy. In one embodiment, more aggressive therapy comprises accelerating dose schedule and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, and accelerating dose schedule. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, accelerating dose schedule, and increasing length of therapy. 
In one embodiment, more aggressive therapy comprises increasing drug dosage, accelerating dose schedule, and increasing length of therapy. In one embodiment, more aggressive therapy comprises selecting and administering more potent drugs, increasing drug dosage, accelerating dose schedule, and increasing length of therapy. In some embodiments, a more aggressive therapy comprises administering a combination of drug-based and non-drug-based therapies.
[0072] In some embodiments, a less aggressive therapy than the standard therapy comprises delaying treatment relative to the standard therapy. In some embodiments, a less aggressive therapy than the standard therapy comprises administering less treatment (e.g., lower dosage of one or more standard therapy agents) than in the standard therapy. In some embodiments, a less aggressive therapy than the standard therapy comprises administering a treatment regimen lacking one or more components of the standard therapy. In some embodiments, a less aggressive therapy than the standard therapy comprises administering treatment on a decelerated schedule compared to the standard therapy. In some embodiments, a less aggressive therapy than the standard therapy comprises administering no treatment (e.g., no therapeutic agents, watchful waiting, active surveillance, etc.). In one embodiment a less aggressive therapy comprises delaying treatment. In one embodiment a less aggressive therapy comprises selecting and administering less potent drugs. In one embodiment a less aggressive therapy comprises decreasing the frequency of treatment. In one embodiment a less aggressive therapy comprises shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and decreasing drug dosage. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and decelerating dose schedule. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs and shortening length of therapy. In one embodiment, less aggressive therapy comprises decreasing drug dosage and decelerating dose schedule. In one embodiment, less aggressive therapy comprises decreasing drug dosage and shortening length of therapy. In one embodiment, less aggressive therapy comprises decelerating dose schedule and shortening length of therapy.
In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, and decelerating dose schedule. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, and shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decelerating dose schedule, and shortening length of therapy. In one embodiment, less aggressive therapy comprises decreasing drug dosage, decelerating dose schedule, and shortening length of therapy. In one embodiment, less aggressive therapy comprises selecting and administering less potent drugs, decreasing drug dosage, decelerating dose schedule, and shortening length of therapy. In some embodiments, a less aggressive therapy comprises administering only non-drug-based therapies.
Kits
[0073] In another embodiment, a kit is provided for quantifying mutation load in a sample, e.g., a tumor sample, comprising reagents, etc. to detect variants in test loci. Some embodiments of the present disclosure comprise detection reagents packaged together in the form of a kit for conducting any of the assays disclosed herein. In certain embodiments, the kits comprise oligonucleotides capable of specifically detecting one or more test loci variants as described herein. The oligonucleotide sequences may correspond to fragments of the biomarker nucleic acids. For example, the oligonucleotides can be more than 200, 200, 150, 100, 50, 25, 10, or fewer than 10 nucleotides in length. The kit can contain in separate containers a solution of nucleic acids, control formulations (positive and/or negative), and/or a detectable label, such as but not limited to fluorescein, green fluorescent protein, rhodamine, cyanine dyes, Alexa dyes, luciferase, and radiolabels, among others. Instructions for carrying out the assay can optionally be included in the kit.
[0074] In other embodiments of the present disclosure, the kit can contain a nucleic acid substrate array comprising one or more nucleic acid sequences. Such an array will comprise oligonucleotides capable of specifically detecting one or more test locus variants as described herein. In various embodiments, the presence or absence of one or more of the test locus variants can be identified by virtue of binding to the array. In some embodiments the substrate array can be on a solid substrate, such as what is known as a "chip." See, e.g., U.S. Pat. No. 5,744,305. In some embodiments the substrate array can be a solution array; e.g., xMAP (Luminex, Austin, TX), Cyvera (Illumina, San Diego, CA), RayBio Antibody Arrays (RayBiotech, Inc., Norcross, GA), CellCard (Vitra Bioscience, Mountain View, CA) and Quantum Dots' Mosaic (Invitrogen, Carlsbad, CA).
Machine-readable storage medium
[0075] A machine-readable storage medium can comprise, for example, a data storage material that is encoded with machine-readable data or data arrays. The data and machine-readable storage medium are capable of being used for a variety of purposes, when using a machine programmed with instructions for using said data. Such purposes include, without limitation, storing, accessing and manipulating information or data relating to mutation load of a patient or population over time. Data comprising the presence of test locus variants can be implemented in computer programs that are executing on programmable computers, which comprise a processor, a data storage system, one or more input devices, one or more output devices, etc. Program code can be applied to the input data to perform the functions described herein, and to generate output information. This output information can then be applied to one or more output devices. The computer can be, for example, a personal computer, a microcomputer, or a workstation of conventional design.
[0076] The computer programs can be implemented in a high-level procedural or object-oriented programming language, to communicate with a computer system. The programs can also be implemented in machine or assembly language. The programming language can also be a compiled or interpreted language. Each computer program can be stored on storage media or a device such as ROM, magnetic diskette, etc., and can be readable by a programmable computer for configuring and operating the computer when the storage media or device is read by the computer to perform the described procedures. Any health-related data management systems of the present disclosure can be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium causes a computer to operate in a specific manner to perform various functions, as described herein.
[0077] The assays disclosed herein can be used to generate a "patient mutation load profile." The patient mutation load profiles can then be compared to a reference mutation load profile. The biomarker profiles, reference and subject, of embodiments of the present disclosure can be contained in a machine-readable medium, such as analog tapes like those readable by a CD-ROM or USB flash media, among others. The machine-readable media can also comprise subject information; e.g., the subject's medical or family history.
ADDITIONAL EMBODIMENTS
[0078] Embodiment 1. A method of measuring mutation load comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes; and
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation.
[0079] Embodiment 2. A method of measuring mutation load comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes;
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation; and
(3) quantifying
(a) high mutation load in a cancer cell sample in which at least 100 test loci harbor a mutation; or
(b) low mutation load in a cancer cell sample in which fewer than 100 test loci harbor a mutation.
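By way of non-limiting illustration, the panel-size requirement and the high/low quantification recited in Embodiments 1 and 2 might be sketched as follows. The function names and the dictionary layout are hypothetical, and the reading that at least 5 chromosomes must each carry at least 30,000 test loci is one interpretation of the recited language.

```python
def panel_meets_minimums(loci_per_chromosome, min_per_chromosome=30_000,
                         min_chromosomes=5, min_total=500_000):
    """Check a panel (chromosome name -> number of test loci) against
    the minimum sizes recited in Embodiments 1 and 2."""
    qualifying = [n for n in loci_per_chromosome.values()
                  if n >= min_per_chromosome]
    total = sum(loci_per_chromosome.values())
    return len(qualifying) >= min_chromosomes and total >= min_total


def mutation_load(n_mutated, n_tested, as_proportion=False):
    """Return the test mutation load as a count or as a proportion
    of the test loci harboring a mutation."""
    if n_tested <= 0 or not 0 <= n_mutated <= n_tested:
        raise ValueError("counts must satisfy 0 <= n_mutated <= n_tested, n_tested > 0")
    return n_mutated / n_tested if as_proportion else n_mutated


def classify_load(n_mutated, threshold=100):
    """'high' if at least `threshold` test loci harbor a mutation, else 'low'."""
    return "high" if n_mutated >= threshold else "low"


# Hypothetical panel: 5 chromosomes with 110,000 test loci each.
panel = {f"chr{i}": 110_000 for i in range(1, 6)}
ok = panel_meets_minimums(panel)
label = classify_load(mutation_load(250, sum(panel.values())))
```

The threshold of 100 follows Embodiment 2; other thresholds, such as those listed for Embodiment 3, could be passed through the `threshold` parameter.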
[0080] Embodiment 3. A method of treating cancer patients comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes;
(2) quantifying a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least [A number] test loci harbor a mutation and the test mutation load is low where fewer than [A number] test loci harbor a mutation; and
(3) administering
(a) a therapeutic regimen comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a high test mutation load is detected in (2); or
(b) a therapeutic regimen not comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a low test mutation load is detected in (2).
[0081] Embodiment 4. The method of any one of embodiments 1-3, wherein the plurality of test loci comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
[0082] Embodiment 5. The method of any one of embodiments 1-3, wherein the total loci across all the plurality of chromosomes comprises at least 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
[0083] Embodiment 6. The method of any one of embodiments 1-3, wherein the plurality of chromosomes comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
[0084] Embodiment 7. The method of any one of embodiments 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 25 or more covering reads and (b) the variant allele ratio is in the range of 10%-75%, inclusive.
[0085] Embodiment 8. The method of any one of embodiments 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 50 or more covering reads and (b) the variant allele ratio is in the range of 15%-65%, inclusive.
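By way of non-limiting illustration, the two sets of calling criteria (25 or more covering reads with a ratio of 10%-75%, and 50 or more covering reads with a ratio of 15%-65%) might be expressed as a single parameterized test. The class and function names are hypothetical, and the variant allele ratio is assumed here to be the number of variant-supporting reads divided by the number of covering reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingCriteria:
    min_coverage: int   # minimum number of covering reads
    min_ratio: float    # lower bound of the variant allele ratio, inclusive
    max_ratio: float    # upper bound of the variant allele ratio, inclusive


# The two parameter sets recited above.
CRITERIA_25_READS = CallingCriteria(min_coverage=25, min_ratio=0.10, max_ratio=0.75)
CRITERIA_50_READS = CallingCriteria(min_coverage=50, min_ratio=0.15, max_ratio=0.65)


def harbors_mutation(variant_reads, covering_reads, criteria):
    """Return True if the locus meets both the coverage requirement and
    the variant allele ratio requirement of the given criteria."""
    if covering_reads == 0 or covering_reads < criteria.min_coverage:
        return False
    ratio = variant_reads / covering_reads
    return criteria.min_ratio <= ratio <= criteria.max_ratio
```

For example, a locus with 6 variant reads out of 30 covering reads would satisfy the first set of criteria but not the second, because the second requires at least 50 covering reads.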
[0086] Embodiment 9. The method of any one of embodiments 1-3, wherein the plurality of test loci excludes the following loci:
(a) repetitive loci;
(b) SNPs found in the general population; and
(c) loci found to harbor a mutation in more than one sample out of a reference cohort of at least 50 samples.
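By way of non-limiting illustration, the exclusion of loci under items (a) through (c) might be sketched as a set-based filter. The function name, the representation of a locus as a (chromosome, position) tuple, and the representation of the reference cohort as one set of mutated loci per sample are assumptions introduced for illustration.

```python
from collections import Counter


def filter_test_loci(candidate_loci, repetitive, population_snps,
                     cohort_mutations, min_cohort_size=50):
    """Remove (a) repetitive loci, (b) known population SNPs, and
    (c) loci mutated in more than one sample of a reference cohort.

    candidate_loci   -- iterable of (chromosome, position) tuples
    cohort_mutations -- list with one set of mutated loci per reference sample
    """
    if len(cohort_mutations) < min_cohort_size:
        raise ValueError("reference cohort is smaller than the required minimum")
    # Count in how many reference samples each locus was found mutated.
    seen = Counter(locus for sample in cohort_mutations for locus in sample)
    recurrent = {locus for locus, n in seen.items() if n > 1}
    excluded = set(repetitive) | set(population_snps) | recurrent
    return [locus for locus in candidate_loci if locus not in excluded]


# Hypothetical inputs, for illustration only.
candidates = [("chr1", 100), ("chr1", 200), ("chr2", 300),
              ("chr2", 400), ("chr3", 500)]
cohort = [{("chr2", 300)}, {("chr2", 300)}, {("chr2", 400)}] + [set()] * 47
kept = filter_test_loci(candidates,
                        repetitive={("chr1", 100)},
                        population_snps={("chr1", 200)},
                        cohort_mutations=cohort)
```

In this example the locus mutated in a single reference sample is retained, while the locus mutated in two reference samples is excluded.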
[0087] Embodiment 10. The method of any one of embodiments 1-3, wherein the cancer cell sample is a bodily fluid.
[0088] Embodiment 11. The method of any one of embodiments 1-3, wherein the cancer cell sample is a tumor sample.
[0089] Embodiment 12. The method of embodiment 3, wherein [A number] is 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500 or more.
EXAMPLES
Example 1
[0090] A method for measuring mutation load as described above was assessed for its ability to differentiate non-small cell lung cancer patients who responded to a specific therapeutic regimen and those who did not.
Assay Details
[0091] DNA fragments derived from tumor samples were captured by hybridization to about 54,000 SNP probes, size-selected with average fragment size 150-200 bp, and sequenced from both ends on either an Illumina HiSeq or MiSeq instrument.
[0092] The forward (F) and reverse (R) read sequences, trimmed by quality, were aligned to the 801 bp segments around the target SNPs (i.e., test loci), with maximum 7 mismatches and no gaps used as criteria to exclude lower quality reads. Mismatches between the template segment sequences and the read sequences were counted for each selected test locus. Excluded from the analysis were repetitive positions, positions with SNPs listed by NCBI's dbSNP, positions with consistently low coverage among different samples (with coverage < 50 in half or more out of 100 previously-tested samples), and positions that were found in previous analysis to consistently have mutations in unrelated samples (mutated in more than one sample out of 100 previously-tested samples). The remaining 9,084,648 test loci were used to count mutations.
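By way of non-limiting illustration, the read-acceptance criterion of at most 7 mismatches and no gaps might be sketched as an exhaustive ungapped placement of a read along a template segment. This is a simplified stand-in and not the aligner actually used; the function names and the short example sequences are hypothetical.

```python
def count_mismatches(read, template, offset):
    """Count mismatches for an ungapped placement of `read` at `offset`."""
    if offset < 0 or offset + len(read) > len(template):
        raise ValueError("read does not fit within the template at this offset")
    segment = template[offset:offset + len(read)]
    return sum(1 for r, t in zip(read, segment) if r != t)


def best_ungapped_placement(read, template, max_mismatches=7):
    """Return (offset, mismatches) for the placement with the fewest
    mismatches, or None if even the best placement exceeds the limit."""
    best = None
    for offset in range(len(template) - len(read) + 1):
        mm = count_mismatches(read, template, offset)
        if best is None or mm < best[1]:
            best = (offset, mm)
    if best is None or best[1] > max_mismatches:
        return None
    return best


# Hypothetical 32-base template and a 20-base read carrying one substitution.
template = "TTGACCATGGCTAAGCTTCGATCGGATCCAGT"
read = "CATGGCTAAGATTCGATCGG"
placement = best_ungapped_placement(read, template)
```

In practice the template would be the 801 bp segment around a target SNP, and a dedicated alignment tool would normally replace the exhaustive scan shown here.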
[0093] The starting positions of forward (F) and reverse (R) reads were compared among the clones. If the location of different clones coincided, the reads were considered as potentially derived from the same original fragment and only one copy was counted.
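By way of non-limiting illustration, counting only one copy of clones whose read locations coincide might be sketched as follows. The tuple layout and function name are hypothetical, and it is assumed here that both the F and the R starting positions must coincide for two clones to be treated as copies of the same original fragment.

```python
def deduplicate_clones(read_pairs):
    """Keep the first read pair seen for each combination of template
    segment, forward start position, and reverse start position.

    read_pairs -- iterable of (segment_id, f_start, r_start, payload) tuples
    """
    seen = set()
    unique = []
    for pair in read_pairs:
        key = (pair[0], pair[1], pair[2])
        if key in seen:
            continue  # same location as an earlier clone: count only one copy
        seen.add(key)
        unique.append(pair)
    return unique


# Hypothetical read pairs, for illustration only.
pairs = [
    ("seg1", 100, 260, "pair_a"),
    ("seg1", 100, 260, "pair_b"),  # coincides with pair_a
    ("seg1", 100, 255, "pair_c"),  # different reverse start: kept
    ("seg2", 100, 260, "pair_d"),  # different segment: kept
]
unique_pairs = deduplicate_clones(pairs)
```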
[0094] For each read sequence, positions close to each end with increased rate of sequencing errors were excluded (i.e., only positions 11-85 were used to estimate the coverage and count the allele bases and, thus, mutations). If F and R read alignments overlapped, only one was counted to contribute to the coverage of the overlapping positions. Positions with non-matching F and R bases were excluded as sequencing errors.
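By way of non-limiting illustration, the restriction to read positions 11-85 and the handling of overlapping F and R reads might be sketched as follows. The function names are hypothetical. It is assumed that each read has been placed on the template without gaps, that its sequence is given in template orientation, and that read positions are numbered in the order in which the bases were sequenced, so that position 1 of a reverse read lies at its rightmost template coordinate.

```python
def usable_bases(template_start, aligned_seq, is_reverse=False, first=11, last=85):
    """Map template position -> base, keeping only read positions
    `first`..`last` (1-based, inclusive, in sequencing order)."""
    n = len(aligned_seq)
    kept = {}
    for i, base in enumerate(aligned_seq):
        cycle = n - i if is_reverse else i + 1
        if first <= cycle <= last:
            kept[template_start + i] = base
    return kept


def merge_pair(forward, reverse):
    """Combine F and R base calls so that overlapping positions are counted
    once and positions with non-matching F and R bases are dropped."""
    merged = dict(forward)
    for pos, base in reverse.items():
        if pos not in merged:
            merged[pos] = base
        elif merged[pos] != base:
            del merged[pos]  # discordant F/R bases: treat as a sequencing error
    return merged


# Hypothetical 100-base reads; the reverse read disagrees at template position 70.
f_calls = usable_bases(0, "A" * 100)
r_seq = "A" * 20 + "C" + "A" * 79
r_calls = usable_bases(50, r_seq, is_reverse=True)
merged = merge_pair(f_calls, r_calls)
```

Each position remaining in `merged` contributes exactly one base call to the coverage at that position for this fragment.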
[0095] For each sample, the number of positions with 50 or more covering reads and with the variant allele ratio in the range 11%-65% was counted. The samples with high counts were also inspected graphically with the distribution plots of the allele ratio. Samples were considered to have a high mutation load if they exhibited greater than about 200 test loci harboring variants.
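By way of non-limiting illustration, the per-sample counting step and the cutoff of about 200 test loci might be sketched as follows. The function names and the pileup layout are hypothetical. The variant allele ratio is assumed to be the fraction of covering reads carrying a base other than the template base, and the 11% default for the lower bound is likewise an assumption about the range stated in the text.

```python
from collections import Counter


def variant_allele_ratio(base_counts, template_base):
    """Return (coverage, fraction of reads not matching the template base)."""
    coverage = sum(base_counts.values())
    if coverage == 0:
        return 0, 0.0
    variant = coverage - base_counts.get(template_base, 0)
    return coverage, variant / coverage


def count_mutated_positions(pileup, min_coverage=50, min_ratio=0.11, max_ratio=0.65):
    """Count positions meeting the coverage and variant allele ratio criteria.

    pileup -- dict: position -> (template_base, Counter of observed bases)
    """
    n = 0
    for template_base, counts in pileup.values():
        coverage, ratio = variant_allele_ratio(counts, template_base)
        if coverage >= min_coverage and min_ratio <= ratio <= max_ratio:
            n += 1
    return n


def has_high_mutation_load(n_mutated, threshold=200):
    """True if the sample exhibits more than `threshold` mutated test loci."""
    return n_mutated > threshold


# Hypothetical pileup of four positions, for illustration only.
pileup = {
    1: ("A", Counter({"A": 80, "G": 20})),  # coverage 100, ratio 0.20: counted
    2: ("C", Counter({"C": 45, "T": 4})),   # coverage 49: too low
    3: ("G", Counter({"G": 10, "A": 90})),  # ratio 0.90: above the range
    4: ("T", Counter({"T": 95, "C": 5})),   # ratio 0.05: below the range
}
n_mutated = count_mutated_positions(pileup)
```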
Results
[0096] 22 tumor samples were assessed as described above, with 21 samples producing evaluable results. 6/22 patients showed pathological complete response (pCR) to veliparib. 2/6 patients with pCR had a high mutation load. 2/2 patients with high mutation load showed pCR. See, e.g., Figure 1. Figure 1 shows these patients further stratified according to another biomarker profile, which corresponds to the HRD assays described and defined in U.S. Appl. Ser. No. 14/245,576 and Internat. Appl. Ser. No. PCT/US2015/045561 (discussed above). The two samples with pCR that have very low HRD scores were detected to have high mutation loads as described herein, suggesting that the two tests may be complementary in their ability to direct more responsive patients to PARP inhibition treatment.
[0097] Although the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

Claims

CLAIMS What is claimed is:
Claim 1. An in vitro method of measuring mutation load comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes; and
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation.
Claim 2. An in vitro method of measuring mutation load comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes;
(2) determining a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation; and
(3) quantifying
(a) high mutation load in a cancer cell sample in which at least 100 test loci harbor a mutation; or
(b) low mutation load in a cancer cell sample in which fewer than 100 test loci harbor a mutation.
Claim 3. An in vitro method of treating cancer patients comprising:
(1) analyzing DNA derived from a cancer cell sample to determine the nucleotide sequence of the DNA at a plurality of test loci comprising at least 30,000 loci on each of a plurality of chromosomes comprising at least 5 chromosomes, wherein the plurality of test loci comprises at least 500,000 total loci across all the plurality of chromosomes;
(2) quantifying a test mutation load, wherein the test mutation load is, or is derived from, the number or proportion of test loci harboring a mutation and wherein the test mutation load is high where at least 100 test loci harbor a mutation and the test mutation load is low where fewer than 100 test loci harbor a mutation; and
(3) administering
(a) a therapeutic regimen comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a high test mutation load is detected in (2); or
(b) a therapeutic regimen not comprising a PARP inhibitor or an immune checkpoint inhibitor to a patient in whose cancer cell sample a low test mutation load is detected in (2).
Claim 4. The method of any one of claims 1-3, wherein the plurality of test loci comprises at least 30,000, 40,000, 50,000, 60,000, 70,000, 80,000, 90,000, 100,000, 125,000, 150,000, 175,000, 200,000, 225,000, 250,000, 275,000, 300,000, 325,000, 350,000, 375,000, 400,000, 425,000, 450,000, 475,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,500,000, 3,000,000, 3,500,000, 4,000,000, 4,500,000, or 5,000,000 test loci on each of the plurality of chromosomes.
Claim 5. The method of any one of claims 1-3, wherein the total loci across all the plurality of chromosomes comprises at least 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000, 1,000,000, 1,250,000, 1,500,000, 1,750,000, 2,000,000, 2,250,000, 2,500,000, 2,750,000, 3,000,000, 3,250,000 3,500,000, 3,750,000, 4,000,000, 4,250,000, 4,500,000, 4,750,000, 5,000,000, 5,500,000, 6,000,000, 6,500,000, 7,000,000, 7,500,000, 8,000,000, 8,500,000, 9,000,000, 9,500,000, 10,000,000, 11,000,000, 12,000,000, 13,000,000, 14,000,000, 15,000,000, 16,000,000, 17,000,000, 18,000,000, 19,000,000, 20,000,000, 25,000,000, 30,000,000, 35,000,000, 40,000,000, 45,000,000, 50,000,000, 60,000,000, 70,000,000, 80,000,000, 90,000,000, or 100,000,000 loci.
Claim 6. The method of any one of claims 1-3, wherein the plurality of chromosomes comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, or 46 chromosomes.
Claim 7. The method of any one of claims 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 25 or more covering reads and (b) the variant allele ratio is in the range of 10%-75%, inclusive.
Claim 8. The method of any one of claims 1-3, wherein a test locus is counted as harboring a mutation if (a) there are 50 or more covering reads and (b) the variant allele ratio is in the range of 15%-65%, inclusive.
Claim 9. The method of any one of claims 1-3, wherein the plurality of test loci excludes the following loci:
(a) repetitive loci;
(b) SNPs found in the general population; and
(c) loci found to harbor a mutation in more than one sample out of a reference cohort of at least 50 samples.
Claim 10. The method of any one of claims 1-3, wherein the cancer cell sample is a bodily fluid.
Claim 11. The method of any one of claims 1-3, wherein the cancer cell sample is a tumor sample.
PCT/US2016/066685 2015-12-14 2016-12-14 Methods for measuring mutation load WO2017106365A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562266965P 2015-12-14 2015-12-14
US62/266,965 2015-12-14

Publications (1)

Publication Number Publication Date
WO2017106365A1 true WO2017106365A1 (en) 2017-06-22

Family

ID=57737991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/066685 WO2017106365A1 (en) 2015-12-14 2016-12-14 Methods for measuring mutation load

Country Status (1)

Country Link
WO (1) WO2017106365A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107338292A (en) * 2017-07-10 2017-11-10 上海思路迪生物医学科技有限公司 Method and kit based on high-flux sequence detection human genome mutational load
CN113168885A (en) * 2018-11-13 2021-07-23 麦利亚德基因公司 Methods and systems for somatic mutation and uses thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014165785A2 (en) * 2013-04-05 2014-10-09 Myriad Genetics, Inc. Methods and materials for assessing homologous recombination deficiency

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
E. M. VAN ALLEN ET AL: "Genomic correlates of response to CTLA-4 blockade in metastatic melanoma", SCIENCE, vol. 350, no. 6257, 10 September 2015 (2015-09-10), pages 207 - 211, XP055342253, ISSN: 0036-8075, DOI: 10.1126/science.aad0095 *
E. M. VAN ALLEN ET AL: "Genomic correlates of response to CTLA-4 blockade in metastatic melanoma", SCIENCE, vol. 350, no. 6257, 9 October 2015 (2015-10-09), pages 207 - 211, XP055342466, ISSN: 0036-8075, DOI: 10.1126/science.aad0095 *
LUDMIL B. ALEXANDROV ET AL: "Signatures of mutational processes in human cancer", NATURE, vol. 500, no. 7463, 22 August 2013 (2013-08-22), United Kingdom, pages 415 - 421, XP055251628, ISSN: 0028-0836, DOI: 10.1038/nature12477 *
T. POPOVA ET AL: "Ploidy and Large-Scale Genomic Instability Consistently Identify Basal-like Breast Carcinomas with BRCA1/2 Inactivation", CANCER RESEARCH, vol. 72, no. 21, 29 August 2012 (2012-08-29), pages 5454 - 5462, XP055341888, ISSN: 0008-5472, DOI: 10.1158/0008-5472.CAN-12-1470 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16822582

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16822582

Country of ref document: EP

Kind code of ref document: A1