WO2024151667A2 - 5-Hydroxymethylation analysis of buffy coat gDNA in cancer detection

Publication number
WO2024151667A2
Authority
WO
WIPO (PCT)
Prior art keywords
cancer
hydroxymethylation
patient
cfdna
buffy coat
Application number
PCT/US2024/010932
Other languages
English (en)
Other versions
WO2024151667A3 (fr)
Inventor
Samuel Levy
Gulfem Dilek Guler
Giulana P. MOGNOL
Yuhong Ning
Original Assignee
Clearnote Health, Inc.
Application filed by Clearnote Health, Inc.
Publication of WO2024151667A2
Publication of WO2024151667A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Definitions

  • The present invention relates generally to cancer, and more particularly relates to a novel hydroxymethylation analysis useful in an improved method for detecting cancer.
  • Cancer is the second leading cause of death globally. Cancer mortality is exacerbated by diagnosis at late stage when prognosis is poor. Earlier cancer detection offers the opportunity to improve patient outcomes by identifying tumors when treatment is more likely to be effective. While breast, colorectal and lung cancers are among the few cancers for which screening modalities exist, screening tests that are currently used in the clinic can be expensive, invasive, and limited to detection of a single cancer type; this, in turn, may necessitate multiple tests, further increasing the cost for overall early cancer detection and resulting in possible delay of treatment. Liquid biopsy-based multi-cancer early detection tests aim to address these limitations and complement these screening approaches.
  • Peripheral blood contains multiple analytes that can be assessed and implemented in a non-invasive method for early detection of cancer.
  • Genomic and epigenomic profiling of plasma-derived cell-free DNA (cfDNA) has been repeatedly shown to have utility for cancer detection (see, e.g., Liu et al. (2020) Ann. Oncol. 31: 745-759; Song et al. (2017) Cell Res. 27: 1231-1242; Guler et al. (2020) Nat. Commun. 11: 5270; Gao et al. (2022) Innovation 3: 100259).
  • cfDNA-based liquid biopsy assays are particularly effective with cancers having high levels of circulating tumor DNA (ctDNA), but the sensitivity of these assays is reduced for early-stage disease, when ctDNA levels are usually low (Gao et al. (2022), supra; Cohen et al. (2016) Science 359: 926-930; Chabon et al. (2020) Nature 580: 245-251). Therefore, the discovery of biomarkers that do not rely on ctDNA release is essential for improving early cancer detection.
  • the buffy coat (BC) fraction of peripheral blood contains intact cells such as circulating tumor cells that have been widely investigated not only for cancer detection but also for cancer prognosis (Alix-Panabieres et al.
  • the present invention provides a method for detecting cancer without a surgical biopsy or other invasive means, wherein the method can be carried out with respect to a wide range of cancer types and at various stages, including early stage cancer.
  • the method is a "liquid biopsy" based technique that, in contrast to prior such methods, makes use of the buffy coat fraction of a patient's blood sample and a 5-hydroxymethylation analysis of the buffy coat.
  • the method can be carried out without a combined analysis involving additional feature types, but, optimally, is combined with at least one additional feature.
  • the additional feature is a 5-hydroxymethylation signature obtained from a cell-free DNA (cfDNA) sample extracted from a blood sample of the same patient.
  • The invention provides a method for analyzing buffy coat in a peripheral blood sample obtained from a patient, wherein the method comprises: obtaining a buffy coat hydroxymethylation signature for the patient by
  • The method may further comprise, in some embodiments, generating a buffy coat gDNA-based probability score that the patient has cancer from the buffy coat hydroxymethylation signature.
  • the at least one shared characteristic comprises having cancer. In other embodiments, the at least one shared characteristic comprises not having cancer. In other embodiments, the at least one shared characteristic comprises having a particular type of cancer or not having a particular type of cancer. In additional embodiments, the at least one shared characteristic comprises having a particular stage of cancer or not having a particular stage of cancer.
  • each hydroxymethylation biomarker locus is selected as exhibiting differential hydroxymethylation in a manner that correlates with having cancer or a particular type of cancer.
  • Differential hydroxymethylation is determined by a linear regression F-test, using a p-value threshold of less than or equal to 0.05.
  • The method involves combining the buffy coat gDNA-based probability score described above with at least one additional feature value to characterize the likelihood that a patient has cancer.
  • the additional feature comprises a cfDNA hydroxymethylation signature obtained from a cfDNA sample extracted from a blood sample taken from the same patient.
  • the cfDNA sample is extracted from the same blood sample that comprises the buffy coat.
  • The additional feature value derives from an additional feature type that comprises one or more of: DNA fragment size distribution; copy number variation; cfDNA concentration; methylation profile; T-cell-inflamed gene expression profile; circulating tumor DNA count; serum CA19-9 level; serum CA125 level; IDO-1 expression; T-cell count; T-cell percentage; inflammation gene signature; myeloid-derived suppressor cell count; lymphocyte count; deficient mismatch repair; tumor mutational burden; presence or absence of germline mutations; and a patient-specific clinical parameter.
  • the invention additionally provides a method for analyzing a peripheral blood sample obtained from a patient, wherein the method comprises:
  • The composite probability score represents the likelihood that the patient has cancer. In some embodiments, the composite probability score represents the likelihood that the patient has a particular type of cancer. In certain aspects of these embodiments, the particular type of cancer is breast cancer. In other aspects, the type of cancer is colorectal cancer. In still other aspects, the type of cancer is lung cancer.
  • the aforementioned method may further include combining the composite probability score with an additional feature value for at least one additional feature type to characterize the likelihood that the patient has cancer.
  • the additional feature value derives from an additional feature type that comprises one or more of: DNA fragment size distribution; copy number variation; cfDNA concentration; methylation profile; T-cell-inflamed gene expression profile; circulating tumor DNA count; serum CA19-9 level; serum CA125 level; IDO-1 expression; T-cell count; T-cell percentage; inflammation gene signature; myeloid-derived suppressor cell count; lymphocyte count; deficient mismatch repair; tumor mutational burden; presence or absence of germline mutations; and a patient-specific clinical parameter.
  • the additional feature type comprises: the number of cfDNA fragments in each of at least two nonoverlapping size ranges; copy number variation in the cfDNA sample; concentration of cfDNA in the cfDNA sample; a patient-specific clinical parameter; and combinations of any of the foregoing.
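To make the fragment-size feature concrete, here is a small sketch that counts cfDNA fragments falling into nonoverlapping size ranges. The bin boundaries (100, 167, 250, 400 bp) and the function name are assumptions chosen for illustration; the disclosure does not specify them.

```python
from bisect import bisect_right

def count_fragments_by_size(lengths, edges):
    """Count fragments per size range.

    `edges` is an ascending list of boundaries in base pairs; range i covers
    [edges[i], edges[i + 1]). Fragments outside all ranges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for length in lengths:
        i = bisect_right(edges, length) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

# Hypothetical fragment lengths (bp) from one cfDNA sample.
fragment_lengths = [120, 150, 166, 167, 180, 320, 340, 50]
size_counts = count_fragments_by_size(fragment_lengths, [100, 167, 250, 400])
```

The resulting counts, one per size range, could then serve as feature values alongside a hydroxymethylation-based score.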
  • Representative patient-specific clinical parameters include, without limitation, lesion size; lesion grade; lesion stage; lesion location; patient age; patient weight; patient gender; patient ethnicity; cigarette smoking status; and exposure or lack of exposure to a known carcinogen.
  • Combining two or more feature values comprises an ensemble analysis. In some embodiments, combining two or more feature values comprises a stacked ensemble analysis. In one aspect of the aforementioned embodiments, the buffy coat hydroxymethylation signature is used as a base model in a stacked ensemble analysis.
  • The invention additionally provides a method for analyzing a peripheral blood sample obtained from a patient which comprises:
  • the specific cell type is a granulocyte.
  • the invention provides a method for detecting cancer in a patient by determining the percentage of granulocytes in a buffy coat fraction obtained from a peripheral blood sample and comparing that percentage to an established percentage of granulocytes in a reference standard, wherein the reference standard may be a mean percentage observed in non-cancer patients or a mean percentage observed in cancer patients.
  • An elevated percentage of granulocytes in the buffy coat has now been found to correlate with the likelihood that a patient has cancer.
  • Granulocyte percentage is itself correlated with the presence of cancer, but may also be combined as a feature type with buffy coat hydroxymethylation signature and/or cfDNA hydroxymethylation signature.
  • FIG. 1 is a table summarizing the clinical characteristics of the cancer and noncancer cohorts evaluated in the Example, showing the distribution of age, sex, and smoking status in each cohort. Also shown is an indication of the composition of the cancer cohort, stratified by individual tumor types, and the stage distribution within the cancer cohort.
  • FIG. 2 schematically illustrates the laboratory process and analytical steps employed in one embodiment of the invention, and as described in the Example. Plasma or buffy coat were isolated from whole blood obtained by routine venous phlebotomy. cfDNA and gDNA were then extracted from plasma and BC, respectively. WGS and 5hmC libraries, prepared from both cfDNA and fragmented gDNA, were then sequenced. Cancer prediction models were built using sequencing data obtained from cfDNA and BC individually or in combination.
  • FIGS. 3-8 pertain to differential 5hmC features in cancer versus non-cancer samples from the BC gDNA and performance of the BC cancer prediction model, as described in the Example herein:
  • FIG. 3 is an MA plot indicating differentially hydroxymethylated genes (DhMGs) identified in cancer samples compared to non-cancer controls. Red and blue dots respectively indicate increased or decreased 5hmC density in cancer relative to non-cancers with a false discovery rate (FDR) < 0.05.
  • FIG. 4 provides boxplots (logCPM) of selected (FDR < 0.001) DhMGs in all cancer samples versus non-cancers for purposes of comparison.
  • The center line represents the median and the bounds of the box represent 5th through 95th percentiles.
  • Each dot represents an individual gDNA sample.
  • FIG. 6 provides gene set enrichment analysis (GSEA) C8 normalized enrichment scores of the top hematopoietic-related positive and negative representative pathways in cancer and non-cancer samples.
  • FIG. 7 is a cross-validation ROC curve showing the performance of the BC model in distinguishing all cancer samples from non-cancer controls. The AUC value and confidence interval [CI] are shown. The red dashed line represents 98% specificity.
  • FIG. 8 indicates in graph form the cancer prediction scores obtained using the BC model stratified by cancer type. The number of true positives, for the cancer cohorts, and false positives, for non-cancer controls, over the total number of samples are indicated underneath the graph.
  • FIGS. 9-11 relate to the observation and analysis of 5hmC signals present in cfDNA in cancer and non-cancer samples and the performance of the cfDNA cancer prediction model, as described in the Example herein:
  • FIG. 9 is an MA plot indicating DhMGs in cfDNA, comparing all cancer versus non-cancer samples (FDR < 0.05). The red and blue dots indicate increased or decreased 5hmC density in cancer compared to non-cancer samples, respectively.
  • FIG. 10 is a cross-validation ROC curve showing the performance of the cfDNA model in distinguishing all cancer versus non-cancer samples, with AUC value and confidence interval [CI] shown. The red dashed line represents 98% specificity.
  • FIG. 11 indicates in graph form the cancer prediction scores obtained using the cfDNA model stratified by cancer type. As in FIG. 8, the number of true positives, for the cancer cohorts, and false positives, for non-cancer controls, over the total number of samples are indicated underneath the graph.
  • FIG. 12 provides Venn diagrams showing the overlap of differentially hydroxymethylated genes in breast cancer, colorectal cancer, and lung cancer relative to non-cancer controls.
  • FIG. 13 provides gene set enrichment analysis (GSEA) C8 normalized enrichment scores of the top hematopoietic-related positive and negative representative pathways in cancer and non-cancer samples.
  • FIG. 15 indicates the correlation of cancer to non-cancer fold change in 5hmC counts across all genes as calculated using the BC and cfDNA models (scatter plot of the two datasets); p-value (linear regression F-test) < 0.001.
  • FIG. 16 is an MA plot of DhMGs observed using the BC model, comparing all cancer versus non-cancer samples (FDR < 0.05). Red and blue dots indicate increased or decreased 5hmC density in cancer compared to non-cancers, respectively.
  • FIG. 17 is an analogous MA plot of DhMGs observed using the cfDNA model.
  • FIGS. 18-23 pertain to the performance of the BC-cfDNA combined cancer prediction model as described in the Example:
  • FIG. 18 is a cross-validation ROC curve showing the performance of the combined model in distinguishing all cancer versus non-cancer samples, with AUC value and confidence interval [CI] shown. The red dashed line represents 98% specificity.
  • FIG. 19 provides cancer prediction scores stratified by cancer type using the combined BC and cfDNA model, indicating the number of true positives, for the cancer cohorts, and false positives, for non-cancer controls, underneath the graph.
  • FIG. 20 is a Venn diagram showing the overlap of true positives scored using the BC, cfDNA or the combined models, set at 98% training specificity.
  • FIG. 21 is a Venn diagram showing the overlap of false positives scored using the BC, cfDNA or the combined models, set at 98% training specificity.
  • FIGS. 22 and 23 provide a cross-validation performance comparison among the BC, cfDNA and the combined models, at 98% training specificity for: all cancers or individual cancer types versus non-cancer samples (FIG. 22); and all early-stage cancer (stages I-II) versus non-cancers (FIG. 23). * indicates statistical significance (p < 0.05, McNemar's test) between the individual models relative to the combined model.
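For readers unfamiliar with McNemar's test, the sketch below shows one way to compare two models' paired calls on the same samples. It uses the exact (binomial) form of the test, which is an assumption: the disclosure names the test but not the variant. Function names and the toy calls are hypothetical.

```python
from math import comb

def discordant_counts(truth, calls_a, calls_b):
    """Count samples that only model A or only model B classified correctly."""
    a_only = sum(1 for t, a, b in zip(truth, calls_a, calls_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(truth, calls_a, calls_b) if b == t and a != t)
    return a_only, b_only

def mcnemar_exact(a_only, b_only):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical example: the combined model rescues 9 samples the single model
# missed, while the single model is uniquely correct on 1 sample.
p_value = mcnemar_exact(1, 9)
significant = p_value < 0.05
```

Only the discordant pairs carry information here; samples on which both models agree do not affect the p-value.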
  • FIG. 24 is a dot plot providing a comparison between granulocyte percentage and BC prediction scores, as evaluated in the Example.
  • The percentage of granulocytes (CD45+ CD14 CD15+ cells) in the BC was determined by flow cytometry. This result was plotted against each sample's prediction score determined by the BC model.
  • FIGS. 25-27 pertain to differential 5hmC features identified in granulocytes and monocytes of cancer versus non-cancer BC samples and performance of their cancer prediction models, as described in the Example herein:
  • FIG. 25 is a table summarizing the clinical cohort characteristics in the granulocyte cohort.
  • FIG. 26 is a table summarizing the clinical cohort characteristics in the monocyte cohort.
  • FIG. 27 provides MA plots of DhMGs observed in BC for granulocytes (left plot) and monocytes (right plot), comparing all cancer and non-cancer samples (FDR < 0.05). Red and blue dots indicate increased or decreased 5hmC density in cancer compared to non-cancer samples, respectively.
  • FIG. 28 provides cross-validation ROC curves showing the performance of the granulocyte (left curve) and monocyte (right curve) models in distinguishing cancer from non-cancer samples, with AUC value and confidence interval [CI] shown.
  • FIG. 29 is a table identifying genes with differential 5hmC in buffy coat-derived gDNA obtained from individuals with cancer compared to non-cancer controls as determined by thresholding with FDR < 0.05 and
  • FIG. 30 is a table indicating GSEA results with cell type signature (C8) gene sets comparing cancers to non-cancer using 5hmC counts over genes in buffy coat-derived gDNA.
  • FIG. 31 is a table showing performance of the cancer prediction model built by buffy coat features alone, cfDNA features alone or a combination of feature sets from buffy coat and the matched cfDNA.
  • FIG. 32 is a table indicating those genes with differential 5hmC in cfDNA obtained from individuals with cancer compared to non-cancer controls (FDR < 0.05 and
  • FIG. 33 provides a comparison of sensitivity values at 98% specificity for cancer prediction models that were built with buffy coat features alone, cfDNA features alone or combination of buffy coat and cfDNA features.
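The phrase "sensitivity at 98% specificity" can be illustrated with a short sketch: a score threshold is chosen from non-cancer samples so that at most 2% of them are called positive, and sensitivity is then measured on cancer samples at that threshold. The function names and the synthetic scores are assumptions for illustration only.

```python
import math

def threshold_at_specificity(non_cancer_scores, specificity=0.98):
    """Return the score cut-off; a sample is called positive if score > cut-off."""
    ranked = sorted(non_cancer_scores, reverse=True)
    # Number of non-cancer samples allowed above the cut-off (false positives).
    allowed_fp = math.floor(len(ranked) * (1.0 - specificity) + 1e-9)
    return ranked[allowed_fp]

def sensitivity_at(cancer_scores, threshold):
    """Fraction of cancer samples scoring above the cut-off."""
    return sum(1 for s in cancer_scores if s > threshold) / len(cancer_scores)

# Synthetic prediction scores: 100 non-cancer controls and 4 cancer samples.
non_cancer = [i / 100 for i in range(100)]
cancer = [0.5, 0.975, 0.99, 0.999]

cutoff = threshold_at_specificity(non_cancer, 0.98)
sens = sensitivity_at(cancer, cutoff)
```

Fixing the threshold on training data and then applying it to held-out samples is what "training specificity" would correspond to in this sketch.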
  • nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • sample as used herein relates to a material or mixture of materials, typically in liquid form, containing one or more analytes of interest.
  • the biological samples evaluated herein are blood samples obtained from a patient.
  • nucleic acid sample refers to a biological sample comprising nucleic acids.
  • the nucleic acid sample may be a genomic DNA sample, or it may be comprised of cell-free DNA wherein the sample is substantially free of histones and other proteins, such as will be the case following cell-free DNA purification.
  • sample fraction refers to a subset of an original biological sample, and may be a compositionally identical portion of the biological sample, as when a blood sample is divided into identical fractions.
  • sample fraction may be compositionally different, as will be the case when, for example, certain components of the biological sample are removed, with extraction of cell-free nucleic acids being one such example.
  • cell-free nucleic acid encompasses both cell-free DNA and cell-free RNA, where the cell-free DNA and cell-free RNA may be in a cell-free fraction of a biological sample comprising a body fluid.
  • the body fluid may be blood, including peripheral blood, serum, or plasma.
  • the biological sample is a blood sample
  • a cell-free nucleic acid sample e.g., a cell-free DNA sample
  • A cell-free DNA sample is extracted therefrom using now-conventional means known to those of ordinary skill in the art and/or described in the pertinent texts and literature; kits for carrying out cell-free nucleic acid extraction are commercially available (e.g., the AllPrep® DNA/RNA Mini Kit and QIAamp DNA Blood Mini Kit, both available from Qiagen, or the MagMAX Cell-Free Total Nucleic Acid Kit and the MagMAX DNA Isolation Kit, available from ThermoFisher Scientific). Also see, e.g., Hui et al.; Fong et al. (2009) Clin. Chem. 55(3):587-598.
  • Adapters, as that term is used herein, are short synthetic oligonucleotides that serve a specific purpose in a biological analysis. Adapters can be single-stranded or double-stranded, although the preferred adapters herein are double-stranded.
  • an adapter may be a hairpin adapter (i.e., one molecule that base pairs with itself to form a structure that has a double-stranded stem and a loop, where the 3' and 5' ends of the molecule ligate to the 5' and 3' ends of a double-stranded DNA molecule, respectively).
  • an adapter may be a Y-adapter.
  • an adapter may itself be composed of two distinct oligonucleotide molecules that are base paired with each other.
  • a ligatable end of an adapter may be designed to be compatible with overhangs made by cleavage by a restriction enzyme, or it may have blunt ends or a 5' T overhang.
  • The term "adapter" refers to double-stranded as well as single-stranded molecules.
  • An adapter can be DNA or RNA, or a mixture of the two.
  • An adapter containing RNA may be cleavable by RNase treatment or by alkaline hydrolysis.
  • An adapter may be 15 to 100 bases, e.g., 50 to 70 bases, although adapters outside of this range are envisioned.
  • adapter-ligated refers to a nucleic acid that has been ligated to an adapter.
  • the adapter can be ligated to a 5' end and/or a 3' end of a nucleic acid molecule.
  • the term “adding adapter sequences” refers to the act of adding an adapter sequence to the end of fragments in a sample. This may be done by filling in the ends of the fragments using a polymerase, adding an A tail, and then ligating an adapter comprising a T overhang onto the A-tailed fragments.
  • Adapters are usually ligated to a DNA duplex using a ligase, while with RNA, adapters are covalently or otherwise attached to at least one end of a cDNA duplex preferably in the absence of a ligase.
  • Amplifying refers to generating one or more copies, or “amplicons,” of a template nucleic acid, such as may be carried out using any suitable nucleic acid amplification technique, such as PCR, NASBA, TMA, and
  • Enrichment refers to a partial purification of template molecules that have a certain feature (e.g., nucleic acids that contain 5-hydroxymethylcytosine) from analytes that do not have the feature (e.g., nucleic acids that do not contain hydroxymethylcytosine).
  • Enrichment typically increases the concentration of the analytes that have the feature by at least 2-fold, at least 5-fold or at least 10-fold relative to the analytes that do not have the feature.
  • at least 10%, at least 20%, at least 50%, at least 80% or at least 90% of the analytes in a sample may have the feature used for enrichment.
  • at least 10%, at least 20%, at least 50%, at least 80% or at least 90% of the nucleic acid molecules in an enriched composition may contain a strand having one or more hydroxymethylcytosines that have been modified to contain a capture tag.
  • sequence refers to a method by which the identity of at least 10 consecutive nucleotides (e.g., the identity of at least 20, at least 50, at least 100 or at least 200 or more consecutive nucleotides) of a polynucleotide is obtained.
  • The terms “next-generation sequencing” and “high-throughput sequencing”, as used herein, refer to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms currently employed by Illumina, Life Technologies, Roche, etc.
  • Next-generation sequencing methods may also include nanopore sequencing methods such as that commercialized by Oxford Nanopore Technologies, electronic detection methods such as Ion Torrent technology commercialized by Life Technologies, and single-molecule fluorescence-based methods such as that commercialized by Pacific Biosciences.
  • read refers to the raw or processed output of sequencing systems, such as massively parallel sequencing.
  • the output of the methods described herein is reads.
  • these reads may need to be trimmed, filtered, and aligned, resulting in raw reads, trimmed reads, aligned reads.
  • UFI sequence refers to a relatively short nucleic acid sequence that serves to identify a feature of a nucleic acid molecule.
  • Nucleic acid template molecules and amplicons thereof that contain a UFI are sometimes referred to herein as "barcoded" template molecules or amplicons. Examples of UFI sequence types include, without limitation, the following:
  • a "molecular UFI sequence” (or “molecular barcode”) is appended to every nucleic acid template molecule in a sample, and is random, such that, providing the UFI sequence is of sufficient length, every nucleic acid template molecule is attached to a unique UFI sequence.
  • Molecular UFI sequences can be used to account for and offset amplification and sequencer errors, allow a user to track duplicates and remove them from downstream analysis, and enable molecular counting, and, in turn, the determination of an analyte concentration. See, e.g., Casbon et al. (2011) Nuc. Acids Res. 39(12):1-8.
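The duplicate removal and molecular counting described above can be sketched in a few lines. Reads are represented here as hypothetical (locus, UFI) pairs; real pipelines would typically key on alignment coordinates as well, so this is a simplification rather than the disclosed workflow.

```python
from collections import Counter

def count_unique_molecules(reads):
    """Collapse PCR duplicates and count distinct template molecules per locus.

    `reads` is an iterable of (locus, ufi) tuples. Reads sharing both the
    locus and the UFI are treated as copies of one original molecule.
    """
    unique_molecules = {(locus, ufi) for locus, ufi in reads}
    return Counter(locus for locus, _ in unique_molecules)

# Hypothetical reads: two copies of one geneA molecule, a second geneA
# molecule, and one geneB molecule that happens to carry the same UFI.
reads = [
    ("geneA", "ACGT"),
    ("geneA", "ACGT"),
    ("geneA", "TTGA"),
    ("geneB", "ACGT"),
]
molecule_counts = count_unique_molecules(reads)
```

Counting molecules rather than reads removes amplification bias from the per-locus totals.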
  • the "unique feature” here is the identity of the nucleic acid template molecules.
  • a UFI may have a length in the range of from 1 to about 35 nucleotides, e.g., from 3 to 30 nucleotides, 4 to 25 nucleotides, or 6 to 20 nucleotides.
  • the UFI may be error-detecting and/or error-correcting, meaning that even if there is an error (e.g., if the sequence of the molecular barcode is mis-synthesized, mis-read or distorted during any of the various processing steps leading up to the determination of the molecular barcode sequence) then the code can still be interpreted correctly.
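A minimal sketch of error correction for designed barcodes follows, assuming a known codebook whose members differ from one another at three or more positions. Under that assumption any single-base error can be mapped back to the intended code. The codebook and function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def correct_ufi(observed, codebook, max_distance=1):
    """Return the unique codebook entry within `max_distance` of `observed`.

    Returns None when no entry, or more than one entry, is close enough,
    in which case the read would typically be discarded.
    """
    matches = [
        code for code in codebook
        if len(code) == len(observed) and hamming(observed, code) <= max_distance
    ]
    return matches[0] if len(matches) == 1 else None

# Hypothetical codebook with a minimum pairwise Hamming distance of 3.
codebook = ["AAAAAA", "AAACCC", "CCCAAA", "CCCCCC"]

fixed = correct_ufi("AAACCG", codebook)      # one mis-read base
rejected = correct_ufi("AAGGCC", codebook)   # two errors: not correctable here
```

A larger minimum distance between codes would allow more errors to be corrected, at the cost of fewer usable barcodes of a given length.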
  • The use of error-correcting sequences is described in the literature (e.g., in U.S. Patent Publication Nos. U.S. 2010/0323348 to Hamati et al. and U.S. 2009/0105959 to Braverman et al.).
  • detection is used interchangeably with the terms “determining,” “measuring,” “evaluating,” “assessing,” “assaying,” and “analyzing,” to refer to any form of measurement, and include determining if an element is present or not. These terms include both quantitative and/or qualitative determinations. Assessing may be relative or absolute. “Assessing the presence of” thus includes determining the amount of a moiety present, as well as determining whether it is present or absent. Assessing the level at a hydroxymethylation biomarker locus refers to a determination of the degree of hydroxymethylation at that locus.
  • TP: true positives
  • TN: true negatives
  • FP: false positives
  • FN: false negatives
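These four counts are the inputs to the usual accuracy measures. As a brief illustration (with hypothetical labels and calls), sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP):

```python
def confusion_counts(truth, calls):
    """Return (TP, TN, FP, FN) for binary labels where 1 means cancer."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    return tp, tn, fp, fn

def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort: three cancer samples followed by five non-cancer controls.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
calls = [1, 1, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(truth, calls)
sens, spec = sensitivity_specificity(tp, tn, fp, fn)
```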
  • Performance is a term that relates to the overall usefulness and quality of a diagnostic or prognostic test, including, among others, clinical and analytical accuracy, other analytical and process characteristics, such as use characteristics (e.g., stability, ease of use), health economic value, and relative costs of components of the test. Any of these factors may be the source of superior performance and thus usefulness of the test, and may be measured by appropriate "performance metrics," such as AUC, time to result, shelf life, etc. as relevant.
  • “Clinical parameters” encompass all non-sample biomarkers of subject health status or other characteristics, such as, without limitation, lesion size; lesion location; patient age; patient weight; patient gender; patient ethnicity; family history; genetic mutations; and PD-L1 tumor staining result, which is currently used in the clinic to determine whether anti-PD-1 therapy is in order.
  • a "formula,” “algorithm,” or “model” is any mathematical equation, algorithmic, analytical or programmed process, or statistical technique that takes one or more continuous or categorical inputs and calculates an output value, sometimes referred to as a "probability score" or “index value.”
  • “formulas” include sums, ratios, and regression operators, such as coefficients or exponents, biomarker value transformations and normalizations (including, without limitation, those normalization schemes based on clinical parameters, such as gender, age, or ethnicity), rules and guidelines, statistical classification models, and neural networks trained on historical populations.
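As one illustration of a model in this sense, the sketch below combines two base-model probability scores (for example, one derived from buffy coat and one from cfDNA) into a composite score using a logistic meta-model, which is a simple form of stacked ensemble. The choice of logistic regression as the meta-learner, the plain gradient-descent fit, and all values are assumptions for illustration; the disclosure does not specify them.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_meta_model(base_scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-model scores by gradient descent."""
    n, n_features = len(labels), len(base_scores[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(base_scores, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def composite_score(weights, bias, x):
    """Composite probability score from one sample's base-model scores."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical out-of-fold base scores: (buffy coat score, cfDNA score).
base_scores = [(0.9, 0.4), (0.3, 0.8), (0.7, 0.7), (0.6, 0.9),   # cancer
               (0.2, 0.1), (0.1, 0.3), (0.4, 0.2), (0.3, 0.3)]   # non-cancer
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_meta_model(base_scores, labels)
high = composite_score(weights, bias, (0.9, 0.9))
low = composite_score(weights, bias, (0.1, 0.1))
```

In a stacked ensemble the meta-model is normally trained on out-of-fold predictions of the base models, so that it does not simply learn their training-set overfitting.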
  • AIC Akaike's Information Criterion
  • BIC Bayes Information Criterion
  • The resulting predictive models may be validated in other studies, or cross-validated in the study they were originally trained in, using such techniques as Bootstrap, Leave-One-Out (LOO) and 10-Fold cross-validation (10-Fold CV).
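A minimal sketch of how sample indices might be partitioned for k-fold cross-validation is given below; Leave-One-Out is the special case in which k equals the number of samples. The interleaved fold assignment and the function name are illustrative assumptions. Real analyses often shuffle or stratify by class first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 10-fold split of 25 samples, and Leave-One-Out on 5 samples.
ten_fold = list(k_fold_indices(25, 10))
leave_one_out = list(k_fold_indices(5, 5))
```

Each sample appears in exactly one test fold, so every sample receives one held-out prediction.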
  • false discovery rates may be estimated by value permutation according to techniques known in the art.
  • “Likelihood,” in the context of one embodiment of the present invention, is the probability that a patient has or does not have cancer or a particular type of cancer.
  • a "hydroxymethylation level” refers to the extent of hydroxymethylation within a hydroxymethylation biomarker locus.
  • the extent of hydroxymethylation is normally measured as hydroxymethylation density, e.g., the ratio of 5hmC residues to total cytosines, both modified and unmodified, within a nucleic acid region.
  • Other measures of hydroxymethylation density are also possible, e.g., the ratio of 5hmC residues to total nucleotides in a nucleic acid region.
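Both density measures can be written out directly. The sketch below assumes a single-stranded region given as a sequence string and a set of 0-based positions at which 5hmC was detected; these inputs and the function name are illustrative assumptions.

```python
def hydroxymethylation_density(sequence, hmc_positions):
    """Return (5hmC / total cytosines, 5hmC / total nucleotides) for a region."""
    sequence = sequence.upper()
    for pos in hmc_positions:
        if sequence[pos] != "C":
            raise ValueError(f"position {pos} is not a cytosine")
    total_c = sequence.count("C")   # modified and unmodified cytosines
    n_hmc = len(set(hmc_positions))
    per_cytosine = n_hmc / total_c if total_c else 0.0
    per_nucleotide = n_hmc / len(sequence) if sequence else 0.0
    return per_cytosine, per_nucleotide

# Hypothetical 8-nt region with cytosines at positions 1, 4 and 5,
# two of which carry 5hmC.
per_c, per_nt = hydroxymethylation_density("ACGTCCGA", {1, 5})
```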
  • a "hydroxymethylation profile” or “hydroxymethylation signature” refers to a data set that comprises the hydroxymethylation level at each of a plurality of hydroxymethylation biomarker loci that are preselected as differentially hydroxymethylated with regard to a particular disease phenotype, e.g., lung cancer, colorectal cancer, breast cancer, or the like.
  • The hydroxymethylation profile may be a reference hydroxymethylation profile that comprises a composite hydroxymethylation profile for a population of individuals with at least one shared characteristic, as explained elsewhere herein.
  • the hydroxymethylation profile may also be a patient hydroxymethylation signature, constructed from the measurement of hydroxymethylation levels at each of a plurality of hydroxymethylation biomarker sites.
  • locus refers to a site on a nucleic acid molecule, wherein the nucleic acid molecule may be single-stranded or double-stranded, and further wherein an individual locus (or multiple "loci") may be of any length, thus including a single CpG site as well as a full-length gene, or across larger features such as topologically associated domains, including when several such loci are aggregated into groups such as related sequence motifs, other homologies or functional characteristics (regardless of their adjacency or topological relationship).
  • the loci herein may be contained within a gene body; within an annotation feature outside of the gene body, such as a promoter, an enhancer, a transcription initiation site, a transcription stop site, or a DNA binding site, or a combination thereof; or within an untranslated region, or "UTR" (including 3'UTRs and 5'UTRs).
  • any two variables are considered to be "very highly correlated" when they have a Coefficient of Determination (R²) of 0.5 or greater.
  • R² Coefficient of Determination
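The 0.5 threshold can be checked as sketched below. The sketch assumes the Coefficient of Determination is taken as the square of the Pearson correlation coefficient, which holds for a simple linear fit between two variables; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def very_highly_correlated(x, y, threshold=0.5):
    """True when R^2 (assumed here to be r squared) meets the threshold."""
    return pearson_r(x, y) ** 2 >= threshold
```

Because R² is a squared quantity, strongly negatively correlated variables also satisfy the criterion.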
  • the present invention encompasses such functional and statistical equivalents to the presently disclosed hydroxymethylation biomarkers.
  • correlate indicates a tendency of the two variables to vary together.
  • a “correlation” is a measure of the extent to which two or more variables fluctuate together.
  • a positive correlation indicates the extent to which those variables increase or decrease in parallel.
  • One example of a positive correlation is the relationship between a hydroxymethylation level at a hydroxymethylation biomarker locus, on the one hand, and the likelihood that a patient has cancer or a particular type of cancer, on the other.
  • a negative correlation would exist when the hydroxymethylation level at a hydroxymethylation biomarker locus decreases as a subject's likelihood of having cancer or a particular type of cancer increases.
  • the present invention relates, in part, to the discovery that buffy coat hydroxymethylation signature, i.e., whole buffy coat gDNA 5hmC signature, is correlated with the presence of cancer in a patient.
  • the buffy coat gDNA 5hmC signature may be combined in an ensemble-type analysis, e.g., a stacked ensemble analysis, with one or more feature types, including cfDNA 5hmC signature, DNA fragment size information, such as fragment size distribution, copy number variation, and the like.
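A stacked ensemble can be sketched as a meta-learner fitted on the scores of the base models. The example below is a minimal illustration under stated assumptions: each sample is represented by two base-model scores (for instance one from a buffy coat gDNA model and one from a cfDNA model), the meta-learner is a plain logistic regression fitted by gradient descent, and all names are hypothetical. In practice the base scores fed to the meta-learner would normally be out-of-fold predictions, and the source does not disclose the meta-learner actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_scores, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic-regression weights on per-sample base-model scores."""
    n_samples = len(labels)
    n_models = len(base_scores[0])
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, label in zip(base_scores, labels):
            error = predict_probability(weights, bias, row) - label
            for j, score in enumerate(row):
                grad_w[j] += error * score
            grad_b += error
        # Batch gradient-descent step on the mean log-loss gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_probability(weights, bias, row):
    """Combined cancer score from one sample's base-model scores."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, row)))
```

The design point is that a sample scored highly by only one base model can still receive a high combined score, which is how complementary signals can raise the number of true positives.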
  • the buffy coat fraction of peripheral blood contains intact cells, such as circulating tumor cells, that are widely investigated not only for cancer detection but also for cancer prognosis. Yet immune cells, which make up the majority of buffy coat cells, have been only minimally explored for their potential utility in cancer detection.
  • the literature over the last decade has shown that signals secreted by different types of solid tumors are sensed by the bone marrow, skewing the hematopoiesis to a myeloid bias, releasing to the periphery a heterogeneous population of immature cells collectively called MDSCs (myeloid-derived suppressor cells) which are a mix of monocytes, myeloid precursors, and neutrophils.
  • MDSCs myeloid-derived suppressor cells
  • MDSCs impair immune responses, induce angiogenesis, and promote epithelial-to-mesenchymal transition (EMT), supporting tumor growth, as explained by Marvel et al. (2015) J. Clin. Invest. 125: 3356-64.
  • 5hmC serves as an important epigenetic mark from which cell type and disease status can be inferred.
  • This example is directed to the question of whether 5hmC profiles of buffy coat gDNA are altered in cancer, particularly in breast, colorectal and lung cancer.
  • peripheral blood samples were collected from 318 male and female subjects, minimum age 45 years old, with 152 cancer samples and 166 non-cancer controls.
  • 46.7% of the cancer cohort had early-stage (stages I and II) disease.
  • the percentage of early-stage cancer was 71.4%, 35.8% and 34% for breast, colorectal and lung, respectively.
  • the cancer cohort was cancer-treatment naive.
  • the non-cancer cohort was negative for any form of cancer.
  • a single blood draw from each individual was processed to isolate buffy coat gDNA and plasma-derived cfDNA, which were ultimately used as input material for WGS and 5hmC sequencing and subsequent analysis to compare and classify cancer and non-cancer samples, as will be explained infra.
  • Two Cell-Free DNA BCT® Streck tubes containing 10 mL of whole peripheral blood each were obtained per individual, by routine venous phlebotomy, according to the manufacturer's instructions. Streck tubes were kept at 15-25°C and processed within 96 hours of phlebotomy by centrifugation at 1,500 rcf for 10 minutes with the brake off at room temperature. The top layer containing plasma was collected and transferred to a new tube, and the layer of buffy coat was then carefully transferred to a 50 mL conical tube.
  • Plasma collected as described above was spun at 3,000 rcf for 10 min with the brake on at room temperature. The supernatant was transferred to two 5 mL conical tubes and stored at -80°C.
  • cfDNA was extracted from 4 mL of plasma using MyOne® (ferrimagnetic) Silane Beads cfDNA isolation kit (Thermo Fisher) following manufacturer's instructions, in a HAMILTON STAR automated liquid handler (HAMILTON Co., Reno NV). During this procedure, plasma was incubated with Proteinase K and 20% SDS at 60°C for 20 minutes followed by cooling. Next, the cfDNA was bound to the magnetic beads and washed with a Thermo Fisher Scientific proprietary wash buffer and with 80% ethanol.
  • cfDNA was eluted with elution buffer, quantitated using Molecular Devices' SpectraMax® Plate Readers and the Quant-iT™ PicoGreen® dsDNA quantitation assay (Thermo Fisher), and stored at -20°C.
  • TapeStation® 4200 capillary electrophoresis was employed to ensure the absence of contaminating high molecular weight DNA emanating from white blood cell lysis.
  • Genomic DNA was extracted from cell pellets stored at -80°C using the DNeasy® Blood & Tissue Kit (QIAGEN), following the manufacturer's instructions.
  • gDNA eluates were quantified using SpectraMax® iD3 (Molecular Devices). 100 ng of gDNA were sonicated to a modal 150 bp size using an ME220 focused ultrasonicator (Covaris). The sonicated DNA fragments were verified by TapeStation® 2200 dsDNA high sensitivity assay (Agilent).
  • the cell pellet was resuspended in FACS buffer and cell analysis was performed on a NovoCyte® Advanteon® Flow Cytometer (Agilent). Data points were analyzed using the NovoExpress® software (Agilent).
  • 5hmC enrichment and subsequent sequencing libraries were prepared as described previously (Guler et al. (2020), supra) using the "5hmC-Seal" method of International Patent Publication WO 2017/176630 to Quake et al., Song et al. (2011) 29: 68-72, and Han et al. (2016) Mol. Cell 63: 711-19.
  • hMe-Seal is a low-input, whole-genome 5hmC sequencing and enrichment method based on selective chemical labeling, in which β-glucosyltransferase (β-GT) is used to selectively label 5hmC with a biotin moiety via an azide-modified glucose for pull-down of 5hmC-containing DNA fragments for sequencing.
  • the normalized buffy coat gDNA and the normalized cfDNA were ligated to sequencing adapters, followed by selective labeling of 5hmC with β-GT, and affinity enrichment via selective pull-down of DNA fragments containing biotin-labeled 5hmC by binding to Dynabeads® M270 streptavidin (Thermo Fisher). PCR was then carried out directly on the beads to minimize sample loss during purification.
  • Adapter-ligated DNA fragments were prepared for library construction using the KAPA Hyperprep® kit (Roche) according to the manufacturer's instructions. All libraries were quantitated using the Qubit® dsDNA High Sensitivity Assay (Thermo Fisher Scientific) and normalized in preparation for sequencing. 75 base-pair paired-end sequencing was performed on a NovaSeq6000 instrument (Illumina). Sequencing data were collected with NovaSeq Control Software v1.7.0 (Illumina), as explained in part (v) of the next section.
  • B. Bioinformatic Analysis:
  • Sequencing data from 5hmC and WGS were produced using NovaSeq® Control Software v1.7.0 (Illumina, Inc.). Raw data processing and demultiplexing were performed using bcl2fastq Conversion Software (Illumina, Inc.) to generate sample-specific FASTQ output. Sequencing reads were analyzed by a computational pipeline implemented as a Nextflow® script, which aligns the reads to the human genome reference build 38 (GRCh38 or Hg38) using the BWA-MEM2 algorithm (Anaconda.org, version 2.2.1). Metrics were computed by the pipeline via Picard to assess the quality of the sequencing data. Samples passing quality control metrics were placed into the two datasets to be used for training and validation.
  • the quality control failure rate for the set of validation samples was 1.94%.
  • Noncancer samples in the training data were matched to various clinical features such as age, sex, body mass index, and smoking status.
  • the machine-learning classification algorithm was trained as follows: each sample included in the training dataset was analyzed with the bioinformatics pipeline as already described. The pipeline divided the genome into functional regions pertaining to annotated gene bodies, enhancers, CpG islands, CCCTC-binding factor sites, promoters, and 3-prime untranslated regions from Gencode human annotation version 31 (GRCh38.p12), and then counted the number of 5hmC library read pairs mapped to each region, correcting for differences in coverage using counts per million mapped reads.
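The counts-per-million correction mentioned above can be sketched as follows. The function name and the optional `total_mapped` parameter are hypothetical; when no library-wide total is supplied, the sketch assumes the sum of the per-region counts as the denominator, which may differ from the denominator used in the actual pipeline.

```python
def counts_per_million(region_counts, total_mapped=None):
    """Scale raw per-region read-pair counts to counts per million (CPM).

    region_counts: dict mapping a region name to its mapped read-pair count.
    total_mapped: library-wide number of mapped read pairs; defaults to the
    sum of the supplied region counts (an assumption made for illustration).
    """
    if total_mapped is None:
        total_mapped = sum(region_counts.values())
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {region: count * 1_000_000 / total_mapped
            for region, count in region_counts.items()}
```

Scaling by library size makes region-level 5hmC signal comparable between samples sequenced to different depths.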
  • the genes with decreased 5hmC included lymphocyte activation and differentiation genes such as BLK, CD3E, CCR7, CD28, FYN, GATA3, ICOS, IGLL5 and IRF4 (FIG. 4; also see the table of FIG. 29).
  • DhMGs differentially hydroxymethylated genes
  • GSEA gene set enrichment analysis
  • Given the findings that the buffy coat and the matched cfDNA models carry different and complementary signals, we next assessed whether combining these two models could increase cancer detection performance relative to using cfDNA or buffy coat models individually (FIGS. 16-21). Models built with feature sets from both buffy coat and the matched cfDNA performed with an AUC of 0.957 and overall sensitivity of 65.79% (CI 57.67%-73.28%) at 98% training specificity (FIGS. 16 and 29). FIG. 19 shows the cancer prediction score distribution for all samples in the study scored with the combined model. The number of true positive (TP) samples was significantly higher using the combined model compared to the individual models (FIG.
  • TP true positive
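Reporting sensitivity at a fixed specificity, as in the performance figures above, amounts to choosing a score cutoff from the non-cancer samples and then counting the cancer samples above it. The sketch below uses hypothetical names and a simple rank-based cutoff; it is not the applicants' procedure.

```python
import math


def threshold_at_specificity(non_cancer_scores, specificity=0.98):
    """Smallest cutoff such that at least `specificity` of the non-cancer
    samples score at or below it (samples above the cutoff are called positive)."""
    ranked = sorted(non_cancer_scores)
    # Small tolerance guards against floating-point error in the product.
    n_needed = max(1, math.ceil(specificity * len(ranked) - 1e-9))
    return ranked[n_needed - 1]


def sensitivity(cancer_scores, threshold):
    """Fraction of cancer samples scoring strictly above the cutoff (TP rate)."""
    true_positives = sum(score > threshold for score in cancer_scores)
    return true_positives / len(cancer_scores)
```

Fixing the cutoff on the training non-cancer samples and then applying it unchanged to other samples is what "at 98% training specificity" describes.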
  • Epigenetics 10: 8 which exclude the granulocytes; or isolated T and B lymphocytes, with a few profiling the different immune populations found in the buffy coat (Parashar (2016) BMC Cancer 18: 574; Wernig-Zorc et al. (2019) Epigenetics Chromatin 12: 4; Koestler et al. (2012) Cancer Epidem. Prev. Biomarkers 21: 1293-1302; Manoochehri et al. (2021) doi: 10.21203/rs.3.rs-508197/v2).
  • GMP granulocyte-monocyte progenitor
  • Several DhMGs enriched in the cancer cohort and associated with myeloid functions resemble MDSCs, an immature population of myeloid cells that suppress immune functions and support tumor progression.
  • Arginase (ARG1), elastase (ELANE), the cytokine G-CSF (CSF3) (Casbon et al.
  • a cancer prediction model was built using 5hmC and WGS features from the plasma cfDNA of matching patients.
  • the performance of the cfDNA model was similar to that of the buffy coat model, notwithstanding that the DhMGs identified in the buffy coat and cfDNA datasets were different (FIGS. 14 and 15).
  • the colorectal and lung cancer samples exhibited thousands of DhMGs using either the buffy coat gDNA or the cfDNA features.
  • buffy coat gDNA yielded thousands of DhMGs (FIGS. 16 and 17 and Table 1), while cfDNA had only 95 DhMGs (FIG. 15).

Abstract

Methods for analyzing peripheral blood samples are provided, wherein buffy coat gDNA is analyzed to provide a buffy coat gDNA hydroxymethylation signature. The buffy coat gDNA hydroxymethylation signature is correlated with the presence of cancer in a patient. Combining the buffy coat gDNA hydroxymethylation signature with another feature value associated with the patient increases the predictiveness of the process. In some embodiments, an additional feature value is the cell-free DNA (cfDNA) hydroxymethylation signature in a peripheral blood sample obtained from the same patient. The combination of feature values can be carried out using an ensemble type of analysis such as a stacked ensemble analysis.
PCT/US2024/010932 2023-01-09 2024-01-09 5-Hydroxymethylation analysis of buffy coat gDNA in cancer detection WO2024151667A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363437946P 2023-01-09 2023-01-09
US63/437,946 2023-01-09

Publications (2)

Publication Number Publication Date
WO2024151667A2 (fr) 2024-07-18
WO2024151667A3 WO2024151667A3 (fr) 2024-08-22

Family

ID=91762234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/010932 WO2024151667A2 (fr) 2023-01-09 2024-01-09 5-Hydroxymethylation analysis of buffy coat gDNA in cancer detection

Country Status (2)

Country Link
US (1) US20240229149A1 (fr)
WO (1) WO2024151667A2 (fr)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113957124A (zh) * 2015-02-10 2022-01-21 The Chinese University of Hong Kong Mutation detection for cancer screening and fetal analysis
EP4066245A1 (fr) * 2019-11-27 2022-10-05 Grail, LLC Systems and methods for evaluating longitudinal biological feature data

Also Published As

Publication number Publication date
US20240229149A1 (en) 2024-07-11
WO2024151667A3 (fr) 2024-08-22

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24741908

Country of ref document: EP

Kind code of ref document: A2