EP2714933A2 - Methods using DNA methylation to identify a cell or a mixture of cells for prognosis and diagnosis of diseases and for cell repair treatments

Info

Publication number
EP2714933A2
Authority
EP
European Patent Office
Prior art keywords
methylation
dna
cells
cpg
cell
Legal status
Withdrawn
Application number
EP12789375.8A
Other languages
German (de)
English (en)
Other versions
EP2714933A4 (fr)
Inventor
Karl KELSEY
Eugene Andres HOUSEMAN
John WIENCKE
William P. ACCOMANDO, Jr.
Carmen MARSIT
Current Assignee
University of California
Brown University
Original Assignee
University of California
Brown University
Application filed by University of California and Brown University
Publication of EP2714933A2
Publication of EP2714933A4
Legal status: Withdrawn (current)

Classifications

    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes, for diseases caused by alterations of genetic material
    • C12Q2600/154 Methylation markers
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G16B40/30 Unsupervised data analysis

Definitions

  • Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells
  • Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases
  • U.S. provisional application 61/585,892, filed January 12, 2012, entitled "Methods of Immunodiagnostics using DNA Methylation arrays as surrogate measures of the identity of a cell or a mixture of cells for prognosis and diagnosis of diseases"
  • Kelsey K, Houseman EA, Wiencke J, Accomando W and Marsit C, which applications are hereby incorporated herein by reference in their entireties.
  • Methods are provided for determining altered immune cell distribution to diagnose or prognose a disease condition, based on DNA methylation signatures of a specific immune cell type or of a mixture of immune cell types.
  • Leukocytes, commonly called white blood cells, are the cells primarily responsible for mounting a host's immune response to pathogens and foreign antigens. Leukocyte distribution is currently determined by simple histologic or flow cytometric assessments, and these methods have significant limitations. In particular, flow cytometry is limited by the availability of fluorescent antibody tags; the laborious antibody tagging process; the need to separate cells, which requires large volumes of fresh cells; expensive detection technology and equipment; and the need to maintain the integrity of the outer cell membrane to preserve labile protein epitopes. A further limitation of methods requiring fresh cells is that they cannot be used where prospective studies are impractical, such as rare diseases for which large numbers of affected subjects are not available.
  • retrospective studies are needed to correlate disease outcome with disease parameters.
  • retrospective studies can be performed only if archival samples derived from archived cohort populations can be used to analyze the disease parameters.
  • archived samples from patients and normal subjects could be used to provide a quantitative estimate of leukocyte distributions in disease conditions.
  • an embodiment of the invention provides a method for assessing a disease condition in a subject, including: measuring a CD3+ T lymphocyte cell number in a sample from the subject by analyzing methylation in the sample of at least one CpG dinucleotide (CpG) in gene CD3Z or in an orthologous or a paralogous gene thereof, such that an amount of a demethylated C of the at least one CpG in the sample is a measure of CD3+ T lymphocyte cell number; and comparing the amount of the demethylated C in the sample from the subject with that in positive control samples from patients with the disease condition, and with that in negative control samples from healthy subjects, such that the disease condition is selected from: an autoimmune disease, an allergy, a transplant
  • subject refers to any animal, for example a mammal, that is healthy or has a disease condition, such as a human, a high-value agricultural animal or a zoo animal.
  • a “patient” is a subject that either has a disease condition or is in need of obtaining a diagnosis of a disease condition.
  • a related embodiment of the method includes at least one of: monitoring, diagnosing, prognosing, and measuring response to therapy by comparing the measured CD3+ T lymphocyte cell numbers in the subject after therapy to that in the patients with the disease condition and in the healthy subjects.
  • an embodiment of the method provides that the inherited disease is an aneuploidy.
  • aneuploidy is selected from trisomy 21, Turner's syndrome, and Klinefelter's syndrome.
  • the sample used in the method is a fresh sample.
  • the fresh sample is freshly drawn blood, a tumor infiltrate or cells obtained from a lymph node puncture.
  • the sample is an archival sample.
  • the archival sample is archival blood collected and stored on filter paper cards such as Guthrie cards, frozen blood specimens or frozen tissue.
  • The methylation state of DNA, including demethylation, is a stable chemical modification, so archival samples can be used to measure cell numbers.
  • Flow cytometry, in contrast, requires fresh cells, because detection of cells depends on the availability of protein epitopes, which are labile and not well preserved in archival samples.
  • the amount of the demethylated C of the at least one CpG in the CD3Z gene in the sample is at least about 80%, at least about 90%, or at least about 95% of the total amount of the CpG in CD3Z genes in the sample.
  • An embodiment of the method further involves analyzing the methylation of the CD3Z gene by amplifying it by Polymerase Chain Reaction (PCR) using primer pairs specific for amplification of specific demethylated CpG loci.
  • PCR: Polymerase Chain Reaction
  • amplification by PCR involves monitoring quantitative PCR in real time using a MethyLight assay or using digital PCR.
  • An embodiment of the method further involves analyzing the methylation of the CD3Z gene by a method selected from the group of: Pyrosequencing, Methylation-sensitive single-nucleotide primer extension (Ms-SNuPE), Methylation-sensitive single stranded conformation analysis (MS-SSCA), High resolution melting analysis (HRM), and digital PCR methods comprising emulsion and nanofluidic partitioning.
  • Methylation-sensitive single-nucleotide primer extension further includes: chemically converting the lymphocyte-derived whole genomic DNA with bisulfite; amplifying the chemically converted whole genomic DNA; enzymatically fragmenting the resulting amplified DNA;
  • Another embodiment of the method further provides steps for analyzing methylation of differentially methylated regions (DMRs) of gene FOXP3, using primer pairs for amplification of specific loci of demethylated CpG in the FOXP3 gene.
  • loci refers to locations of all CpG dinucleotide containing sequences present in that gene, and only one or a few may be differentially demethylated in a specific cell.
  • a related embodiment of the method further includes: determining a ratio of CpG demethylation of the FOXP3 gene DMR to the CpG demethylation of the CD3Z gene DMR in a sample of tumor infiltrate, such that the ratio is an index of T regulatory cell number to total T cell number in the infiltrate; and diagnosing the pathological grade of the cancer, such that the index of T regulatory cell number to total T cell number in the tumor infiltrate correlates with the grade of the cancer.
  • the cancer is selected from: a glioma, an ovarian cancer, a head and neck squamous cell cancer (HNSCC), breast cancer, lung cancer, prostate cancer, colon cancer, pancreatic cancer, bladder cancer, cervical cancer and liver cancer.
  • HNSCC: head and neck squamous cell cancer
  • the method further includes prognosing survival of a patient having or needing a diagnosis of glioma or HNSCC, in which an amount of CD3Z gene DMR demethylation in the patient, as a percent of total DNA, greater than the median value in a sample population of subjects correlates with a prognosis of poor survival.
  • An embodiment of the invention provides a kit for measuring CD3+ T lymphocyte and FOXP3+ T regulatory cell numbers by analyzing methylation of CpG positions in CD3Z and FOXP3 genes, the kit having sequencing and PCR primers specific for the CD3Z and the FOXP3 gene DMRs and instructions for analyzing and comparing the CpG methylation between healthy subjects and a patient.
  • An embodiment provides a method for assessing a disease condition by estimating an alteration in proportions of types of leukocytes in a sample from a subject, the method including the steps of: measuring a DNA methylation profile for each type of leukocyte and for unfractionated cells, such that DNA methylation profiles are obtained for a plurality of CpG loci, and obtaining the status of an individual CpG locus by amplifying DNA from each of the types of leukocyte and from the unfractionated cells, such that amplifying comprises hybridizing methylation sensitive locus-specific DNA oligomers corresponding to each CpG locus; ordering CpG loci by ability to distinguish types of leukocytes, such that the ordering of the CpG loci determines differentially methylated DNA regions (DMRs), such that obtaining DMRs comprises statistically minimizing introduction of bias in amount of total methylation status of a large number of CpG loci obtained from the unfractionated cells by employing a Bayesian treatment of prior probabilities of the methyl
  • the locus-specific DNA oligomers are linked to an array selected from the group of: a glass slide array, a quartz slide array, a fiber optic bundle array, a planar slide array, a micro-well array, a multi-well dish array, a digital PCR array, and a bead array having beads located at known addressable locations on the array.
  • a related embodiment of the method further provides at least one of steps of: monitoring, diagnosing, prognosing and measuring response to therapy of the disease condition.
  • the method in a related embodiment further includes a sensitivity analysis for correcting bias, in which the bias being corrected is unrelated to measurement error and arises from unprofiled cell types and non-cell-mediated profile differences.
  • fractionated leukocyte types include at least one selected from: CD19+ B lymphocytes, CD15+ granulocytes, CD14+ monocytes, CD56+ Natural Killer cells, and CD3+ T lymphocytes.
  • the disease condition is Head and Neck Squamous Cell Carcinoma (HNSCC).
  • control sample is taken from the subject at a different point in time for prognosis of the course of the disease condition in the subject.
  • the method of assessing disease condition further includes after employing the measurement model, comparing the distribution of leukocytes to the relative amounts in the control sample as a normal standard, such that the normal standard is a statistical measure obtained from a plurality of disease-free subjects.
  • the method provides a diagnosis of immunosuppression due to smoking in a currently smoking subject by: determining a ratio of CpG demethylation of the FOXP3 gene DMR to the CpG demethylation of the CD3Z gene DMR in blood of the currently smoking subject, such that the ratio is an index of T regulatory cell number to total T cell number; and providing a diagnosis of immunosuppression in the currently smoking subject, such that a value of this index in the currently smoking subject greater than the average value in a sample population of current non-smokers correlates with immunosuppression due to smoking.
  • the subject with the currently-smoking or currently non-smoking status is a patient having a cancer, an infection or in need of a transplant.
  • An embodiment provides a method of predicting a methylation class membership in a bodily fluid sample of a subject for assessing disease status of the subject, in which the methylation class membership corresponds to an epigenetic signature of a plurality of leukocyte types, the method including: measuring amounts of DNA methylation in each of a plurality of leukocyte type populations to determine differentially methylated regions (DMRs);
  • DMRs: differentially methylated regions
  • leukocyte DMRs for each leukocyte type according to statistical strength of association of the DMR with each leukocyte type; randomly dividing a data set of control subjects and subjects with a disease into groups having substantially the same numbers of control subjects and subjects with the disease to obtain a training set and a testing set; clustering samples in the training set using a defined number of highest ranked leukocyte DMRs to determine clustering solutions, in which a clustering solution corresponds to the methylation class membership; and predicting methylation class membership for subjects within the testing set by applying the clustering solutions obtained from the training set to the highest ranked leukocyte DMRs in the testing set, such that clinical utility of the predicted methylation class membership is determined by testing association of the predicted methylation class membership with the disease status of the subject.
  • the highest ranked leukocyte DMRs are as shown in Table 21, in which each DMR is identified by chromosomal location and gene name, and the defined number of highest ranked leukocyte DMRs is selected from: at least 10, at least 20, at least 30, at least 40, and 50.
  • the methylation class membership of the subject in the testing set is predicted, for example, using a naïve Bayes classifier. Testing the association of the predicted methylation class with disease status includes, for example, using receiver operating characteristic (ROC) curves.
  • the bodily fluid sample in some embodiments is a fresh sample, for example freshly collected blood or a blood derivative.
  • the bodily fluid is an archival sample, for example stored frozen blood or archival blood collected and stored on a filter paper card such as a Guthrie card.
  • the method in a related embodiment includes at least one of: diagnosing, monitoring, prognosing and measuring response to therapy of the disease status.
  • the leukocyte types are selected from the group of: natural killer cells, B cells, CD4+ T cells, CD8+ T cells, granulocytes and monocytes.
  • the disease according to an embodiment of the method is exemplified by one of: head and neck squamous cell carcinoma (HNSCC), ovarian cancer, and bladder cancer.
  • An array is provided as another embodiment for estimating proportions of leukocyte types in a sample from a mammal for assessing a disease condition of the mammal by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the array including: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence, such that the DNA probe having the specific nucleotide sequence hybridizes to a nucleotide sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, such that the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes.
  • in a related embodiment of the array, the plurality of probes has nucleotide sequences that hybridize with a respective plurality of 96 different nucleotide sequences that occur naturally in the plurality of genes.
  • the 96 nucleotide sequences have SEQ ID NO: 1 to SEQ ID NO: 96.
  • the addressable locations are wells of a substrate, such that the substrate is selected from: glass slide; quartz slide; fiber optic bundle and planar silica slides.
  • the surfaces included in the array are particles added to the wells.
  • the addressable locations of the array are defined spots on a glass slide or are microbeads or particles labeled with a code.
  • the particles are microbeads in the form of glass cylinders identifiable with inscribed holographic code.
  • the disease condition is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, immunosuppression and a cancer.
  • Another embodiment provides a method for estimating proportions of types of leukocytes in a sample from a subject for assessing a disease condition of the subject by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample, the method including: providing an array having a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence; reacting genomic DNA in the sample with a bisulfite reagent to convert unmethylated cytosine residues to uracil; hybridizing resulting bisulfite treated genomic DNA with the array to obtain resulting hybridized probes on the array, such that the DNA probes hybridize to a DNA sequence of each of a methylated form and an unmethylated form of a sequence having a CpG dinucleotide in a gene for each of the plurality of genes; and detecting the methylation status of each of the CpG
  • detecting the methylation status of the CpG dinucleotide sequence includes: extending each hybridized probe of the resulting hybridized probes on the array by primer extension to obtain a resulting primer extension product; ligating the resulting primer extension product to an oligonucleotide complementary to the DNA sequence of a 3' region of the gene to obtain a resulting template for PCR on the array; and amplifying by PCR and measuring amount of resulting PCR product, thereby detecting the methylation status of the CpG dinucleotide containing nucleotide sequence.
  • amplifying by PCR further includes: amplifying the resulting template on the array using primer pairs including a 5' primer specific to each of the methylated and the unmethylated forms of the CpG dinucleotide-containing gene and a 3' primer specific to the gene containing the CpG dinucleotide, thereby resulting in a first PCR product; amplifying the first PCR product with differentially labeled 5' primers that specifically amplify either the methylated or the unmethylated form of the CpG dinucleotide-containing gene, and a common 3' primer, resulting in a differentially labeled second PCR product; and hybridizing the second PCR product to the CpG dinucleotide-containing gene to measure the amount of the second PCR product, thereby detecting the methylation status of the CpG dinucleotide sequence.
  • Detecting the methylation status of the CpG dinucleotide sequence includes extending the resulting hybridized probes on the array by single base primer extension with a labeled nucleotide.
  • the array used in the method includes at least 16, at least 64, at least 96 or at least 384 probes.
  • the plurality of probes on the array hybridizes with a plurality of 96 different nucleotide sequences occurring in the plurality of genes.
  • each probe on the array is complementary to nucleotide sequences having SEQ ID NO: 1 to SEQ ID NO: 96.
  • the disease condition assessed is selected from: an autoimmune disease, an allergy, a transplant rejection, obesity, an inherited disease, and a cancer.
  • Assessing the disease condition using the array includes at least one of: monitoring, diagnosing, prognosing, and measuring response to therapy by comparing estimated proportions of types of leukocytes of the subject after therapy to proportions of leukocytes from a healthy subject.
  • the sample containing the genomic DNA used to hybridize with the probes on the array is fresh, i.e., obtained in real time prior to performing the method.
  • the sample is archival.
  • the leukocyte types include at least one selected from: CD19+ B lymphocytes, CD15+ granulocytes, CD14+ monocytes, CD56+ Natural Killer cells, and CD3+ T lymphocytes.
  • kits for estimating proportions of leukocyte types in a sample by analyzing differential methylation of CpG dinucleotides in a plurality of genes of the sample including: an array having: a plurality of DNA probes attached to a plurality of surfaces at known addressable locations on the array, such that the surface at each location is attached to a DNA probe having a specific nucleotide sequence, such that the DNA probe having the specific nucleotide sequence hybridizes to a DNA sequence of a methylated form or an unmethylated form of a CpG dinucleotide in a sequence of a gene of the plurality of genes in the sample, such that the array is selected from having: at least 16 probes, at least 64 probes, at least 96 probes, and at least 384 probes; primers and reagents for detecting the hybridized probes and for detecting the reaction products derived from the hybridized probes; and instructions for using the array with a bisulfite reagent
  • the probes hybridize with a respective plurality of 96 different DNA sequences occurring in the plurality of genes.
  • the probes have nucleotide sequences complementary to 96 nucleotide sequences having SEQ ID NO: 1 to SEQ ID NO: 96.
  • the instructions in a related embodiment of the kit include methods for: reacting genomic DNA in the sample with the bisulfite reagent to convert unmethylated cytosine residues to uracil; hybridizing resulting bisulfite treated genomic DNA with probes immobilized to the surfaces to obtain resulting hybridized probes on the array, such that the DNA probes hybridize to a DNA sequence of each of a methylated form and an unmethylated form of a CpG dinucleotide sequence in a gene of the plurality of genes; and detecting the methylation status of the CpG dinucleotide sequence, thereby estimating proportions of leukocyte types in the sample from the subject for assessing the disease condition of the subject.
  • the instructions for detecting the methylation status of the CpG dinucleotide sequence include methods for: extending each hybridized probe of the resulting hybridized probes on the array by primer extension to obtain a resulting primer extension product; ligating the resulting primer extension product to an oligonucleotide complementary to the DNA sequence of a 3' region of the gene to obtain a resulting template for PCR on the array; and amplifying by PCR and measuring amount of resulting PCR product, thereby detecting the methylation status of the CpG dinucleotide sequence.
  • in the kits, the instructions for amplifying by PCR include methods for: amplifying the resulting template on the array using primer pairs having a 5' primer specific to each of the methylated and the unmethylated forms of the CpG dinucleotide-containing gene and a 3' primer specific to the gene containing the CpG dinucleotide, thereby resulting in a first PCR product; amplifying the first PCR product with differentially labeled 5' primers that specifically amplify each of the methylated and unmethylated forms of the CpG dinucleotide-containing gene, and a common 3' primer, resulting in a differentially labeled second PCR product; and hybridizing the second PCR product to the CpG dinucleotide-containing gene to measure the amount of the second PCR product, to detect the methylation status of the CpG dinucleotide sequence.
  • Instructions for detecting the methylation status of the CpG dinucleotide sequence include methods for extending the resulting hybridized probes on the array by single base primer extension with a labeled nucleotide.
  • Another embodiment of the invention is a method of treating a subject for a disease condition, in which the subject is a human patient and the disease condition is a cancer, the method comprising: obtaining signatures comprising differentially methylated regions (DMRs) from types of leukocytes in a blood sample of the patient, the types of leukocytes comprising at least one selected from: CD19+ B lymphocyte, CD15+ granulocyte, CD14+ monocyte, CD56dim Natural Killer cell, CD56bright Natural Killer cell, and CD3+ T lymphocyte, and from a healthy control human subject not having the cancer; comparing a signature specific for the type of leukocyte in the patient with that in the healthy subject, such that the leukocyte-type-specific signature indicates the amount of cells of that leukocyte type circulating in blood, and such that a decreased amount of circulating cells of that leukocyte type in the patient compared to the healthy subject is an indicium of the cancer; and, administering
  • the leukocyte type cell is the CD56dim Natural
  • the cancer in related embodiments of the method is head and neck squamous cell carcinoma (HNSCC).
  • Natural Killer cells includes at least one CpG dinucleotide in a region near the promoter of gene NKp46.
  • the DMR signature specific for CD56dim Natural Killer cells is a CpG dinucleotide in a region near the promoter of the gene NKp46, such that the methylation status of the CpG dinucleotide is quantified by methylation specific quantitative polymerase chain reaction (MS-qPCR) using primers and probes having SEQ ID NOs: 116-118 and 97-99.
  • MS-qPCR: methylation specific quantitative polymerase chain reaction
  • the DMR signature specific for CD56dim Natural Killer cells is a CpG dinucleotide in a region near the promoter of the gene NKp46, such that the methylation status of the CpG dinucleotide is quantified by digital PCR involving emulsion and nanofluidic partitioning using primers and probes having SEQ ID NOs: 116-118 and 97-99.
  • the blood sample is archival.
  • the blood sample is fresh.
  • Figure 1 is a photograph showing a clustering heatmap for the External Validation White Blood Cell Data (S0).
  • the data were obtained by applying the measurement error formulation described in Examples 1-3.
  • the method delineates effects resulting from immune cell distribution as compared to those resulting from other "non-cell type" alterations in DNA methylation.
  • Methylation array procedure was carried out using Infinium HumanMethylation27 Beadchip Microarrays from Illumina, Inc. (San Diego, CA).
  • Figure 2 is a chart showing the results of cell mixture reconstruction experiments validating prediction of individual sample profiles.
  • the reconstruction experiments involved six known mixtures of monocytes and B cells and six known mixtures of granulocytes and T cells.
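The reconstruction experiments estimate the proportion of each cell type in a known mixture from reference profiles of purified cells. The sketch below illustrates that idea under a simplifying assumption: a plain constrained least-squares projection (proportions nonnegative, summing to at most one) solved by projected gradient descent. It is not the measurement error formulation of Examples 1-3, and the function names and reference values are hypothetical.

```python
from typing import List


def project_capped_simplex(w: List[float]) -> List[float]:
    """Euclidean projection onto {w : w_k >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in w]
    if sum(clipped) <= 1.0:
        return clipped
    # Sum constraint is active: project onto the probability simplex
    # (sort-based algorithm: find the shift theta, then clip at zero).
    u = sorted(w, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in w]


def estimate_proportions(reference: List[List[float]], sample: List[float],
                         iterations: int = 5000) -> List[float]:
    """Constrained least squares: minimize ||sample - reference @ w||^2.

    reference[i][k]: mean methylation (0-1) of CpG i in purified cell type k.
    sample[i]: methylation (0-1) of CpG i in the mixed sample.
    """
    n_types = len(reference[0])
    # 2 * trace(B^T B) bounds the gradient's Lipschitz constant, so 1/L is a safe step.
    lipschitz = 2.0 * sum(b * b for row in reference for b in row)
    step = 1.0 / lipschitz
    w = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [sum(row[k] * w[k] for k in range(n_types)) - y
                    for row, y in zip(reference, sample)]
        grad = [2.0 * sum(row[k] * r for row, r in zip(reference, residual))
                for k in range(n_types)]
        w = project_capped_simplex([wk - step * gk for wk, gk in zip(w, grad)])
    return w


# Hypothetical reference profiles: 4 CpGs x 2 cell types (e.g. monocytes, B cells).
REFERENCE = [[0.90, 0.10], [0.20, 0.80], [0.85, 0.15], [0.10, 0.70]]
TRUE_MIX = [0.3, 0.7]
SAMPLE = [sum(b * m for b, m in zip(row, TRUE_MIX)) for row in REFERENCE]
print(estimate_proportions(REFERENCE, SAMPLE))
```

Because the hypothetical sample is an exact mixture of the references, the estimate recovers the 30/70 split; with real array data the residual will not be zero and the constraint keeps the estimates physically meaningful.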
  • FIG. 3 is a photograph showing a clustering heatmap for Target HNSCC data (S1).
  • the target data set S1 consisted of arrays applied to whole blood specimens collected from a random subset of individuals enrolled in an ongoing population-based case-control study (Peters et al.,
  • HNSCC head and neck cancer
  • Figure 4 is a graphical representation of bias sensitivity analysis for HNSCC data. Bias was assessed by resampling the case coefficients of Bj, a procedure that assumes maximum bias. The abscissa shows the number of assumed non-zero alterations. The dark filled diamond shapes (red in color) indicate the median, the thick vertical lines (blue in color) indicate the interquartile range, the thin lines (blue in color) represent 95% probability ranges, and the outer dots (black in color) represent 99% probability ranges.
  • FIG. 5 panels A-B are graphs showing the rate of convergence of the Hessian matrix H_m, which allows determination of the optimal number of CpG sites whose combined methylation status measurements most accurately reflect the true distribution of different cells in a mixture.
  • the x-axis represents increasing m, the number of CpG sites (ordered by F-statistic) included in the model space, on a logarithmic scale.
  • Figure 5 panel A shows convergence by correlating the Hessian Matrix with the number of CpG sites included in the measurement.
  • the dotted line in (A) shows the tangent at low values of m.
  • Figure 5 panel B shows the rate of convergence, which was calculated by smoothing the first differences of log10(tr H_m).
  • the dotted line (red in color) in (B) corresponds to linear convergence.
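The Figure 5 diagnostic (smoothed first differences of log10(tr H_m) as m grows) can be sketched as follows. This rests on a hypothetical assumption that H_m is proportional to B_m^T B_m, the cross-product of the reference methylation matrix restricted to the m top-ranked CpGs, so that its trace is a running sum of squared reference values. A centered moving average stands in for the unspecified smoother, and the function names are illustrative.

```python
import math
from typing import List


def trace_sequence(reference_rows: List[List[float]]) -> List[float]:
    """tr(B_m^T B_m) for m = 1..M, assuming H_m is proportional to B_m^T B_m.

    Rows are CpG reference profiles, pre-sorted by decreasing F-statistic;
    the trace of B_m^T B_m is the running sum of squared entries.
    """
    traces, running = [], 0.0
    for row in reference_rows:
        running += sum(b * b for b in row)
        traces.append(running)
    return traces


def smoothed_rate(traces: List[float], window: int = 3) -> List[float]:
    """First differences of log10(tr H_m), smoothed with a centered moving average."""
    logs = [math.log10(t) for t in traces]  # traces must be positive
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    half = window // 2
    smoothed = []
    for i in range(len(diffs)):
        lo, hi = max(0, i - half), min(len(diffs), i + half + 1)
        smoothed.append(sum(diffs[lo:hi]) / (hi - lo))
    return smoothed
```

A shrinking smoothed rate signals that adding further CpG sites contributes progressively less information to the fit.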
  • the annotation track above the heatmap indicates case-control status (cancer case or control).
  • Figure 7 is a photograph showing a clustering heatmap for Target Down Syndrome Data.
  • the method herein was applied to a trisomy 21 (Down syndrome) data set (Kerkel et al., PLoS Genet 2010, 6(11):e1001212) consisting of 29 total peripheral blood leukocyte samples from Down syndrome cases and 21 controls, as well as six T cell samples from cases and four T cell samples from controls (GEO Accession number GSE25395).
  • the annotation track above the heatmap indicates case-control and cell type status [Down syndrome case (whole blood), control (whole blood), T cell (pooled cases and controls)].
  • the annotation track above the heatmap indicates case-control status (obese and lean).
  • Figure 9 is a photograph (heatmap) of the methylation profiles of white blood cells obtained from a DNA methylation array analysis described in Example 9. Methylation array procedure was carried out using Infinium HumanMethylation27 Beadchip Microarrays from Illumina, Inc. (San Diego, CA). The number of individual leukocyte samples in each methylation class is shown in the table to the right. The DNA methylation profile distinguishes lymphocytes from myeloid-derived leukocytes. The 5000 most variable CpG loci are plotted on the left. Less methylated loci are grey and more methylated loci are black.
  • RPMM Recursively partitioned mixture model
  • FIGS. 10 panels A-B are graphical representations of the DNA methylation status of regions in CD3E and CD3Z genes.
  • Figure 10 panel A shows DNA methylation status of a region in CD3E that was identified from the DNA methylation array analysis (the results of which are shown in Figure 9) as one of the two candidate DMRs with specificity towards CD3+ T cells.
  • the DNA methylation status was measured by pyrosequencing bisulfite converted DNA from different sorted, human, peripheral blood leukocytes.
  • Figure 10 panel B shows DNA methylation status of a region in CD3Z gene that was identified from the DNA methylation array analysis (the results of which are shown in Figure 9) as one of the two candidate DMRs with specificity towards CD3+ T cells.
  • the DNA methylation status of the region in CD3Z gene in different sorted, human, peripheral blood leukocytes was measured by MethyLight® qPCR.
  • Figure 11 is a drawing showing the genomic region containing CD3Z gene, based on information available from the public databases UniProt, RefSeq and GenBank.
  • UniProt is a freely accessible universal protein resource of protein sequence and functional information.
  • RefSeq is a collection that provides an integrated, annotated set of sequences, including genomic DNA, transcripts and proteins.
  • GenBank ® is the genetic sequence database of the National Institutes of Health which contains an annotated collection of all publicly available DNA sequences.
  • Figure 12 is a list of genomic regions used for measuring methylation of the CD3Z and FOXP3 genes and for quantitating genome copy numbers, together with the corresponding primer and probe sequences. Underlined letters are the "C" in CpG motifs.
  • Figure 13 panels A-C are graphical representations of standard calibration curves which show the relationship between copy numbers of genomic DNA and the signal obtained from quantitative real time methylation specific PCR.
  • the calibration curves are used for quantifying CD3+ T cells, Tregs (FOXP3 demethylated) and ratios of Tregs/CD3+ T cells.
  • DNA isolated from purified cell types was bisulfite converted and serially diluted into a background of fully methylated commercial DNA standard (Qiagen). The total genomic copy numbers of each sample within a dilution series remained constant. Log dilutions were performed in the appropriate range of Ct values corresponding to test samples (whole blood, tumor specimens). Using cytosine-less (C-less) primers, genome copy numbers for each test standard were measured to ensure adequate input DNA and to normalize the CD3+ and Treg assay values.
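The calibration described above relates Ct to log10 copy number, and the C-less copy count serves as the denominator for normalization. The following is a minimal sketch of that arithmetic, assuming an ordinary least-squares standard curve. All Ct values are hypothetical, and a single curve is reused for the three assays only for brevity (each assay would normally have its own curve).

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float], ct: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate input copies for a test sample."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 means roughly 100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical dilution series: Ct drops by ~3.3 cycles per ten-fold increase.
standards = [10.0, 100.0, 1000.0, 10000.0]
cts = [35.0 - 3.3 * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cts)

# Hypothetical test sample: demethylated CD3Z, demethylated FOXP3, and C-less totals.
cd3z = copies_from_ct(27.4, slope, intercept)
foxp3 = copies_from_ct(31.7, slope, intercept)
total = copies_from_ct(25.1, slope, intercept)
print(f"CD3+ T cells: {100 * cd3z / total:.1f}% of genomes")
print(f"Tregs: {100 * foxp3 / cd3z:.1f}% of T cells")
```

Expressing the Treg estimate as FOXP3 copies over CD3Z copies mirrors the "Treg percent of T cells" ratio used in the figures.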
  • Figure 14 is a drawing and a set of graphical representations showing detection of CD3+ T cell numbers by measuring differential demethylation using MS-qPCR.
  • Figure 14 panel A is a schematic diagram showing methylation specific primers and probe targeting six CpGs (lollipops) in a region of the CD3Z gene identified herein as demethylated in CD3+ T cells.
  • Figure 14 panel B shows results of real time PCR. The real time PCR Ct values decreased linearly with each ten-fold increase in bisulfite converted CD3+ T cell DNA.
  • Figure 14 panel C shows correlation between T cell levels determined by flow cytometry and CD3Z MS-qPCR. Evaluation of CD3+ T cell level by flow cytometry was observed to be highly correlated with T cell quantification by CD3Z MS-qPCR in whole blood specimens from glioma patients and healthy donors.
  • FIG. 14 shows correlation between T cell counts obtained by immunohistochemical staining and by CD3Z MS-qPCR.
  • CD3+ T cell count by immunohistochemical staining correlates with T cell quantification by CD3Z MS-qPCR in excised tumors across histological subtypes. Pearson correlations and F-test p-values are shown in panels B-D.
  • FIGS. 15 panels A-C are graphical representations showing T cells and Tregs in the peripheral blood of glioblastoma multiforme (GBM) patients and healthy donors determined by MS-qPCR for demethylation of specific CpG loci.
  • GBM glioblastoma multiforme
  • Figure 15 panel A shows comparison of T cell numbers in blood between GBM patients and control subjects measured using CD3Z demethylation assay.
  • Figure 15 panel B shows comparison of Tregs between GBM patients and control subjects measured using FOXP3 demethylation assay.
  • Figure 15 panel C is a graph showing comparison of Treg percent of T cells between GBM patients and control subjects determined by the ratio of FOXP3/CD3Z demethylation. Wilcoxon rank sum p-values are shown.
  • Figure 16 panels A-C are graphical representations showing association between cigarette smoking and peripheral blood T cells and Tregs in glioma patients and healthy donors determined by MS-qPCR for demethylation of specific CpG loci.
  • Figure 16 panel A shows a comparison of peripheral blood T cell levels, determined by CD3Z demethylation, among never, former and current cigarette smokers stratified by glioma case status (indicated “cases” on the abscissa).
  • Figure 16 panel B shows a comparison of peripheral blood Treg levels, determined by FOXP3 demethylation, among never, former and current cigarette smokers stratified by glioma case status.
  • Figure 16 panel C shows a comparison of peripheral blood Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation, among never, former and current cigarette smokers stratified by glioma case status. Wilcoxon rank sum p-values are shown.
  • Figure 17 panels A-C are graphical representations showing levels of T cell and Treg infiltrates in excised glioma tumors determined by MS-qPCR for demethylation of specific CpG loci.
  • Figure 17 panel A shows T cell levels, determined by CD3Z demethylation, in solid glioma samples stratified by tumor grade.
  • Figure 17 panel B shows Treg levels, determined by FOXP3 demethylation, in solid glioma samples stratified by tumor grade.
  • Figure 17 panel C shows Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation, in solid glioma samples stratified by tumor grade. Wilcoxon rank sum p-values are shown.
  • Figure 18 panels A-C are graphical representations of flow cytometry analysis of CD3+ T cells and total leukocytes in whole blood from glioma cases and controls.
  • Figure 18 panel A shows a forward and side scatter plot of a representative blood sample showing gating for lymphocytes and counting beads.
  • Figure 18 panel B shows lymphocyte subpopulation observed using gating for CD3 expression.
  • Figure 18 panel C shows CD45 gating on all non-bead events. CD45-low and CD45-high cells were summed to count total CD45+ cells.
  • Figure 19 panels A-C are photomicrographs and a line graph that show
  • IHC immunohistochemical
  • Figure 19 panel A shows CD3 staining. Average number of cells positive for staining was 418.
  • Figure 19 panel B shows CD8 staining. Average number of cells positive for staining was 296.
  • Figure 20 is a set of two heatmaps showing results of MS-qPCR and bisulfite pyrosequencing of Magnetic activated cell sorting (MACS) sorted human leukocyte subsets.
  • MACS Magnetic activated cell sorting
  • B B lymphocytes
  • Gran Granulocytes
  • Neut Neutrophils
  • Mono Monocytes
  • NK CD56+ Natural killer cells
  • NKdim CD16+CD56dim natural killer cells
  • NKbr CD16-CD56bright natural killer cells
  • NK8+ CD8+CD56+ natural killer cells
  • NK8- CD8-CD56+ natural killer cells
  • NKT CD3+CD56+ natural killer T cells
  • T CD3+ T lymphocytes
  • CD8 CD3+CD8+ T lymphocytes (cytotoxic T cells)
  • CD4 CD3+CD4+ T lymphocytes (helper T cells)
  • Treg CD3+CD4+CD25+FOXP3+ regulatory T cells.
  • Figure 20 panel A is a heatmap of DNA methylation in FOXP3 and CD3Z gene regions assessed by MS-qPCR.
  • Figure 20 panel B is a heatmap of DNA methylation at three CpG loci in the CD3Z gene assessed by bisulfite pyrosequencing.
  • FIGS. 21 panels A-C are graphical representations showing levels of T cell and Treg infiltrates in glioma tissues stratified by histological subtype determined by MS-qPCR for demethylation of specific CpG loci.
  • PA Pilocytic Astrocytoma
  • EP Ependymoma
  • Oligodendroglioma
  • OA Oligoastrocytoma
  • AS Astrocytoma
  • GBM Glioblastoma multiforme. Kruskal-Wallis (one-way analysis of variance by ranks) test p-values are shown.
  • Figure 21 panel A shows T cell levels determined by CD3Z demethylation in solid glioma samples stratified by tumor histology.
  • Figure 21 panel B shows Treg levels determined by FOXP3 demethylation in solid glioma samples stratified by tumor histology.
  • Figure 21 panel C shows Treg percent of T cells, determined by ratio of FOXP3 to CD3Z demethylation in solid glioma samples stratified by histology.
  • Figure 22 panels A-C are graphical representations showing Kaplan-Meier analysis of survival time of glioma patients stratified according to whether the level of T cells or Tregs in their tumor infiltrates is above or below the median level of T cells or Tregs, respectively. Log-rank p-values are shown.
  • Figure 22 panel A shows survival (ordinate) of glioma patients as a function of time (abscissa) in relation to T cell levels as determined by CD3Z demethylation.
  • Figure 22 panel B shows survival of glioma patients in relation to Treg levels as determined by FOXP3 demethylation.
  • Figure 22 panel C shows survival of glioma patients in relation to Treg percent of T cells as determined by ratio of FOXP3 to CD3Z demethylation.
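The Figure 22 analysis splits patients at the median infiltrate level, estimates survival in each group and compares the groups with a log-rank test. Below is a minimal self-contained sketch of those three steps. It assumes that ties at the median are assigned to the "below" group and uses the standard one-degree-of-freedom chi-square approximation. The function names are illustrative, not the authors' code.

```python
import math
from typing import List, Tuple


def split_at_median(levels: List[float]) -> List[bool]:
    """True if a patient's infiltrate level is above the cohort median (ties go low)."""
    s = sorted(levels)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [lvl > median for lvl in levels]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; events: 1 = death, 0 = censored. Returns (t, S(t)) steps."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank chi-square (1 df) and its p-value."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(e for x, e in zip(times_a, events_a) if x == t)
        d_b = sum(e for x, e in zip(times_b, events_b) if x == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

In use, `split_at_median` on CD3Z or FOXP3 demethylation levels yields the two groups whose times and event indicators are passed to `kaplan_meier` and `log_rank`.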
  • Figure 23 panels A-B are representations of results obtained from analysis of DMRs of leukocyte subtypes.
  • Figure 23 panel A shows a heat map of the methylation status for the highest ranked 50 leukocyte DMRs by leukocyte subtype.
  • Figure 23 panel B shows a plot depicting the -log10(P-values) for the highest ranked 50 leukocyte DMRs across three cancer data sets (HNSCC; Ovarian; Bladder).
  • P-values (ordinate) show methylation differences between cancer cases and non-cancer controls and were obtained from individual unconditional logistic regression models fit to each of the 50 leukocyte DMRs.
  • HNSCC data set logistic regression models were adjusted for patient age, gender, smoking status (never, former, current), smoking pack years, weekly alcohol consumption, and HPV serology status.
  • the bladder cancer data set was adjusted for patient age, gender, smoking status, smoking pack years, and family history of bladder cancer.
  • the ovarian cancer data set was adjusted for patient age group (55-60, 60-65, 65-70, 70-75 and >75 years).
  • FIG. 24 panels A-B show results obtained from the DMR profile analysis of the HNSCC data set for determining methylation class membership.
  • Figure 24 panel A left column shows a heat map of the HNSCC testing data set. Rows represent subjects, which are grouped by predicted methylation class membership. Columns represent the highest ranked 50 leukocyte DMRs that were used to generate the methylation classes for the HNSCC testing set. Panel A right column is a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the HNSCC testing set.
  • Figure 24 panel B shows receiver operating characteristic (ROC) curves based on the predicted methylation classes only in the HNSCC testing set and methylation classes including patient age, gender, smoking status (never, former, current), smoking pack years, weekly alcohol consumption, and HPV serostatus.
  • ROC receiver operating characteristic
  • Figure 25 shows results obtained from the DMR profile analysis of the Ovarian data set for determining methylation class membership.
  • Figure 25 panel A is a heat map of the ovarian testing data set. Rows represent subjects which are grouped by predicted methylation class membership. Columns represent the highest ranked ten leukocyte DMRs that were used to generate the methylation classes for the ovarian testing set. Panel A right column is a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the ovarian testing set.
  • Figure 25 panel B shows ROC curves based on the predicted methylation classes alone in the ovarian testing set and methylation classes plus patient age group (55-60, 60-65, 65-70, 70-75 and >75 years).
  • Figure 26 shows results obtained from the DMR profile analysis of the bladder data set for determining methylation class membership.
  • Figure 26 panel A is a heat map of the bladder testing data set. Rows represent subjects, which are grouped by predicted methylation class membership. Columns represent the highest ranked 56 leukocyte DMRs that were used to generate the methylation classes for the bladder testing set. Panel A right column represents a bar-plot depicting the percent cancer case/control across the predicted methylation classes in the bladder testing set.
  • Figure 26 panel B shows ROC curves based on the predicted methylation classes alone in the bladder testing set and methylation classes plus patient age, gender, smoking status (never, former, current), smoking pack years, and family history of bladder cancer.
  • Figure 27 panels A-C are graphical representations showing image plots of the pairwise Spearman correlation coefficients.
  • Figure 27 panel A shows the six CpG loci identified by HNSCC analysis (Langevin SM et al., Epigenetics. 2012 Mar; 7(3):291-9) and the highest ranked 50 leukocyte DMRs used in the present analysis.
  • Figure 27 panel B shows the seven CpG loci identified by the alternative ovarian analysis and the highest ranked ten leukocyte DMRs used in the present analysis.
  • Figure 27 panel C shows the nine CpG loci identified by the bladder analysis reported in
  • Figure 28 is a schematic diagram showing hierarchy of leukocyte subtypes and sample sizes for each of the leukocyte subtypes used in the analysis for determination of methylation class membership.
  • FIG. 30 is a diagram representing the analytic workflow for the ovarian cancer data set.
  • the full ovarian cancer data set was divided into equally sized training and testing sets.
  • the training sets were used in the development of a classifier based on leukocyte DMRs.
  • the resulting classifiers were then used to predict methylation class membership for the observations in the respective independent testing sets.
  • the phenotypic importance of the predicted methylation classes in the testing data was then examined.
  • the full bladder cancer data set was divided into equally sized training and testing sets.
  • the training sets were used in the development of a classifier based on leukocyte DMRs.
  • the resulting classifiers were then used to predict methylation class membership for the observations in the respective independent testing sets.
  • the phenotypic importance of the predicted methylation classes in the testing data was then examined.
  • Figure 32 is a diagram illustrating Semi-Supervised Recursively Partitioned Mixture Models (SS-RPMM) for predicting methylation class membership.
  • SS-RPMM Semi-Supervised Recursively Partitioned Mixture Models
  • Figure 33 panels A-D show results obtained from SS-RPMM analysis (see Figure 30) of the ovarian cancer data set for determination of methylation class membership.
  • Figure 33 panel A is a heatmap of the testing set obtained by predicted methylation class using the SS-RPMM procedure. Rows represent subjects and columns represent the seven CpG loci identified by this analysis. Figure 33 panel B represents percentage of cases/controls obtained by predicted methylation class membership in the testing set.
  • Figure 33 panel C shows information regarding the seven CpG loci identified by the SS-RPMM analysis.
  • Figure 33 panel D shows a ROC/AUC (area under the curve) analysis based on the predicted methylation class memberships in the testing set. Dark represents the ROC/AUC based on the predicted methylation classes alone and light represents the ROC/AUC using the predicted methylation classes and patient age group.
  • Figure 34 is a graphical representation showing loci in the gene NKp46 chosen from candidate NK cell-specific differential DNA methylation markers, selected by DNA methylation and mRNA expression criteria.
  • Linear mixed effects modeling of DNA methylation microarray data from MACS-isolated human leukocytes generated a coefficient estimating differential methylation in NK cells relative to other cell subtypes, shown on the abscissa.
  • Linear modeling of mRNA microarray data from the same isolated cells determined log-fold change in expression between NK cells and each of the following subtypes: T cells, B cells, granulocytes and monocytes. The average of these four log-fold change values is shown on the ordinate. Significance for a particular gene region was achieved when q < 0.1 for all four mRNA expression linear models as well as for the DNA methylation mixed effects model.
  • Candidates for NK cell-specific DNA methylation biomarkers were limited to significant gene loci exhibiting decreased methylation in NK cells (methylation estimate < 0) and within genes that exhibited increased RNA expression (log fold change > 1).
  • the candidate loci are marked with asterisks in the top left quadrant, and NKp46 loci are marked with grey asterisks.
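The selection rule above combines five significance tests with two direction filters. The sketch below shows one way to encode it. It assumes that the "log fold change > 1" criterion applies to the four-way average plotted on the ordinate. The record layout and the gene names and values in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocusResult:
    gene: str
    methylation_estimate: float   # NK-vs-other coefficient from the mixed effects model
    methylation_q: float
    log_fc: List[float]           # NK vs T, B, granulocytes, monocytes
    expression_q: List[float]     # q-values of the four expression models


def is_nk_candidate(r: LocusResult, q_max: float = 0.1, min_log_fc: float = 1.0) -> bool:
    """Significant in all five models, hypomethylated in NK cells, and up-regulated."""
    significant = r.methylation_q < q_max and all(q < q_max for q in r.expression_q)
    mean_log_fc = sum(r.log_fc) / len(r.log_fc)  # assumed to be the plotted ordinate
    return significant and r.methylation_estimate < 0 and mean_log_fc > min_log_fc


# Hypothetical loci, for illustration only.
loci = [
    LocusResult("GENE_A", -0.35, 0.01, [2.1, 1.8, 2.5, 2.0], [0.01, 0.02, 0.01, 0.03]),
    LocusResult("GENE_B", 0.20, 0.01, [2.0, 2.0, 2.0, 2.0], [0.01, 0.01, 0.01, 0.01]),
    LocusResult("GENE_C", -0.40, 0.01, [1.5, 1.2, 0.9, 1.1], [0.01, 0.20, 0.01, 0.01]),
]
print([r.gene for r in loci if is_nk_candidate(r)])
```

Here GENE_B fails the hypomethylation filter and GENE_C fails one expression q-value, leaving a single candidate in the top-left quadrant.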
  • Figure 35 is a heatmap showing demethylation status of NKp46 determined by methylation specific quantitative PCR (MS-qPCR) of isolated human leukocyte populations. Individual samples of (MACS) purified white blood cell subtypes were subjected to a MS-qPCR assay that detects demethylated copies of NKp46 DNA. Extent of NKp46 methylation is illustrated in this heatmap in which light indicates that all copies of DNA in particular sample were demethylated in the targeted region of NKp46, and dark indicates that all copies were methylated.
  • MS-qPCR methylation specific quantitative PCR
  • Figure 36 is a line graph showing linearity of NKp46 MS-qPCR calibration. Bisulfite converted universal methylated DNA was used to standardize total amount of DNA in all samples at a constant amount. At least three replicates of each standard are plotted. Real time PCR Ct values decrease linearly with ten-fold increase in bisulfite converted NK cell DNA concentration.
  • Figure 37 is a bar graph showing prevalence of HNSCC by normal NKp46 demethylation tertile. Normal NKp46 demethylation tertile cutoffs were determined from control blood samples only. Higher tertiles indicate higher NK cell levels.
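Figure 37 derives tertile cutoffs from control samples only and then tabulates HNSCC prevalence within each tertile for all subjects. A minimal sketch follows, assuming linear-interpolation percentiles (other percentile definitions shift the cutoffs slightly) and values at a cutoff falling into the lower tertile. The names and example data are hypothetical.

```python
from typing import Dict, List, Tuple


def tertile_cutoffs(control_values: List[float]) -> Tuple[float, float]:
    """33rd and 67th percentiles of control values (linear interpolation)."""
    s = sorted(control_values)

    def quantile(p: float) -> float:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return quantile(1 / 3), quantile(2 / 3)


def tertile(value: float, cutoffs: Tuple[float, float]) -> int:
    """1 = lowest NKp46 demethylation (fewest NK cells), 3 = highest."""
    low, high = cutoffs
    return 1 if value <= low else 2 if value <= high else 3


def prevalence_by_tertile(values: List[float], is_case: List[bool],
                          cutoffs: Tuple[float, float]) -> Dict[int, float]:
    """Fraction of subjects in each tertile who are cases."""
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for v, case in zip(values, is_case):
        c = counts[tertile(v, cutoffs)]
        c[0] += int(case)
        c[1] += 1
    return {t: (k / n if n else float("nan")) for t, (k, n) in counts.items()}
```

Fixing the cutoffs on controls keeps the case distribution from influencing where the tertile boundaries fall.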
  • Figure 38 is a heatmap showing methylation status of selected NKp46 CpG loci measured by bisulfite pyrosequencing of isolated human leukocytes. The methylation status of eight individual CpG loci near the promoter region of NKp46 were interrogated by
  • FIG. 39 is a graph showing percent demethylation (ordinate) of a DNA region in NKp46 in control and HNSCC patient blood samples assessed by MS-qPCR.
  • the NKp46 MS-qPCR assay measures the extent of DNA demethylation. A higher level of demethylation indicates a higher level of NK cells within a sample. Wilcoxon rank sum p-value is displayed.
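The Wilcoxon rank sum comparison reported for Figure 39 (and for several other figures) can be sketched as below. This version uses the large-sample normal approximation with average ranks for ties and no tie or continuity correction. Statistical packages may therefore report slightly different p-values, especially for small samples.

```python
import math
from typing import List, Tuple


def rank_sum_test(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) with a normal approximation, two-sided.

    Ties receive average ranks; no tie or continuity correction is applied.
    """
    values = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        ranks[values[i]] = (i + j) / 2 + 1   # average 1-based rank of the tied block
        i = j + 1
    n1, n2 = len(x), len(y)
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))
```

For example, `rank_sum_test(case_values, control_values)` on percent NKp46 demethylation would return U and a two-sided p-value, where `case_values` and `control_values` are hypothetical lists of per-subject measurements.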
  • Figure 40 is a listing of DNA sequences of regions in 96 different genes, each sequence having one CpG dinucleotide shown within square brackets and used to determine methylation status of the gene.
  • the DNA sequence surrounding each CpG dinucleotide was used to design the array probes and the primers for performing the methods for analyzing differential methylation.
  • Also included are the names of the genes, the chromosome on which each gene is located, the source of the DNA sequences, GenBank accession numbers, and the coordinate of the CpG dinucleotide in each respective gene.
  • Figure 41 is a schematic diagram showing different ways of representing effects on measured DNA methylation due to an exposure or a specific phenotype.
  • Figure 41 panel A depicts the marginal effects ( ⁇ ) on measured DNA methylation.
  • the marginal effects are effects which are not adjusted for white blood cell (WBC) distribution.
  • Figure 41 panel B depicts the effects on measured DNA methylation adjusted for WBC distribution resulting from exposure or a specific phenotype.
  • Figure 42 is a set of graphical representations showing the relationship between a and ⁇, the effect on measured DNA methylation not adjusted or adjusted for WBC distribution, for the covariate of interest (e.g. age, current smoker status, toenail arsenic concentration and dye use) over all autosomal CpGs.
  • the curve depicts a loess fit to the scatter plot.
  • Figure 43 is a graphical representation showing fluorescence intensities of CD3Z gene amplified by digital droplet PCR, and a graphical representation showing concentration of CD3Z gene in PCR samples.
  • Figure 43 panel A shows a fluorescence intensity dot plot for amplification of CD3Z gene by detection of 6-FAM (6-carboxyfluorescein) fluorescence intensities. Positive and negative droplets are distinguished by a horizontal line.
  • Figure 43 panel B shows a correlation of the concentration of copy numbers of CD3Z gene obtained by measuring 6-FAM fluorescence intensities and the expected copy numbers of CD3Z gene obtained by dilution of a known amount of DNA from CD3+ T cells.
  • Figure 44 is a graphical representation showing fluorescence intensities of FoxP3 gene amplified by digital droplet PCR, and a graphical representation showing concentration of FoxP3 gene in PCR samples.
  • Figure 44 panel A shows a fluorescence intensity dot plot for amplification of FoxP3 gene by detection of 6-FAM (6-carboxyfluorescein) fluorescence intensities. Positive and negative droplets are distinguished by a horizontal line.
  • Figure 44 panel B shows a correlation of the concentration of copy numbers of FoxP3 gene obtained by measuring 6-FAM fluorescence intensities and the expected copy numbers of FoxP3 gene obtained by dilution of a known amount of DNA from CD3+ T cells.
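Droplet digital PCR converts the fraction of positive droplets (those above the horizontal threshold line) into an absolute concentration through a Poisson correction. A minimal sketch follows. The 0.85 nL droplet volume is an assumption that should be replaced with the value for the instrument in use. Expressing a demethylated locus relative to C-less total copies is one plausible normalization, shown here for illustration only.

```python
import math
from typing import List


def count_positive(amplitudes: List[float], threshold: float) -> int:
    """Droplets above the fluorescence threshold (the horizontal line) are positive."""
    return sum(1 for a in amplitudes if a > threshold)


def copies_per_ul(positive: int, total: int, droplet_nl: float = 0.85) -> float:
    """Poisson-corrected concentration in copies per microliter of reaction.

    droplet_nl is an assumed droplet volume; use the value for your instrument.
    """
    if total <= 0 or positive >= total:
        raise ValueError("need some negative droplets to apply the Poisson correction")
    lam = -math.log(1.0 - positive / total)   # mean target copies per droplet
    return lam / (droplet_nl * 1e-3)          # 1 nL = 1e-3 µL


def percent_demethylated(target_cpul: float, cless_cpul: float) -> float:
    """Demethylated-locus copies as a percentage of total genome copies (C-less assay)."""
    return 100.0 * target_cpul / cless_cpul
```

The Poisson step matters because, at higher loads, some positive droplets contain more than one template copy, so the raw positive fraction underestimates concentration.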
  • Figure 45 is a graphical representation showing fluorescence intensities of NKp46 gene amplified by digital droplet PCR, and a table showing concentration of NKp46 gene in the PCR samples amplified under different conditions.
  • Figure 45 panel A shows a fluorescence intensity dot plot for amplification of NKp46 gene under different conditions by detection of 6-FAM (6-carboxyfluorescein) fluorescence intensities. Positive and negative droplets are distinguished by a horizontal line.
  • Figure 45 panel B is a table showing concentration of NKp46 gene in copies/µL determined under different PCR conditions as fractions of methylated control DNA.
  • Figure 46 is a graphical representation showing fluorescence intensities of NKp46 gene amplified by digital droplet PCR, and a table showing concentration of NKp46 gene in the PCR samples amplified under different conditions.
  • Figure 46 panel A shows a fluorescence intensity dot plot for amplification of NKp46 gene by detection of 6-FAM (6-carboxyfluorescein) fluorescence intensities.
  • the amplification of demethylated NKp46 locus was performed using C-less and NKp46 DMR specific primers and probes, and results compared. Positive and negative droplets are distinguished by a horizontal line.
  • Figure 46 panel B is a table showing concentration of NKp46 gene in copies/µL determined with whole blood DNA, neutrophil DNA, CD16+CD56dim NK cell DNA and CD16+CD56bright NK cell DNA.
  • a model of hematopoiesis includes an early restriction point at which multipotent progenitor cells become committed to either lymphoid or myeloid lineages.
  • the standard methods of distinguishing immune cell lineages are inadequate for fully distinguishing lineage commitment and the process of hematopoiesis.
  • Epigenetics refers to heritable control of gene expression that occurs without changing the sequence of DNA. Chromatin packaging is a mechanism of epigenetic gene regulation which has been implicated in cell lineage commitment and lineage-specific gene expression.
  • DNA methylation is a marker of chromatin packaging. DNA methylation is largely confined to cytosine residues in CpG dinucleotides which, though underrepresented in the genome, are frequently found in high concentrations called CpG islands. Less methylated CpG islands are highly associated with transcriptional activity and subsequent gene expression, and more methylated CpG islands are highly associated with transcriptional inactivity and gene silencing. Methylation of CpG dinucleotides causes chromatin to become more compact and inaccessible to transcription machinery by moving histones and altering the organization of chromatin and nucleosomes (Christensen, B.C., et al. 2009, PLoS Genet 5, e1000602; Schmidl, C., et al. 2009, Genome Res 19, 1165-1174).
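The statement that CpGs are underrepresented overall yet concentrated in CpG islands is usually quantified with the observed/expected CpG ratio. The sketch below computes it. The island thresholds are the commonly used ones attributed to Gardiner-Garden and Frommer (at least 200 bp, GC content above 50%, observed/expected above 0.6). They are included for illustration and are not drawn from this document.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, float, float]:
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n, so obs/exp = cpg * n / (c * g).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return cpg, gc_fraction, obs_exp


def looks_like_cpg_island(seq: str) -> bool:
    """Classic thresholds: length >= 200 bp, GC > 50%, observed/expected CpG > 0.6."""
    _, gc, oe = cpg_stats(seq)
    return len(seq) >= 200 and gc > 0.5 and oe > 0.6
```

Bulk genomic DNA typically gives an observed/expected ratio well below 1, reflecting CpG depletion, whereas island sequences approach or exceed the threshold.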
  • the overall balance of leukocyte subclasses in circulation or in tissue most prominently influences pathogenesis.
  • incipient cancer cells are recognized and eliminated by cytotoxic T cells (CTLs) and natural killer (NK) cells, and tumorigenesis is also promoted by certain other inflammatory cells, including B-lymphocytes, mast cells, neutrophils, regulatory T cells (Tregs), and others.
  • CTLs cytotoxic T cells
  • NK natural killer cells
  • NK cells and CTLs circulating in the blood and residing in adipose tissues are associated with lower incidence of metabolic diseases such as type II diabetes (Lynch et al., 2009, Obesity, 17, 601-5), and higher levels of M1 macrophages in adipose tissue can induce inflammation and insulin resistance (Anderson et al., 2011, Curr Opin Lipidol. 21, 172-177).
  • Methods of quantifying the composition of lymphocyte populations can be informative regarding the underlying immunobiology of disease states as well as the immune response to almost all chronic medical conditions (Chua et al., 2011, Brit J Cancer 104, 1288-1295).
  • the methods described herein measure individual human or animal immune cell numbers or immune cell ratios in diverse biological media without requiring viable cells, cell sorting, or the use of any antibodies or protein markers.
  • the methods are applicable to blood, including unsorted blood samples that are fresh, frozen or unfrozen anticoagulant-treated peripheral whole blood, finger-stick blood, non-anticoagulant-treated whole blood, blood clots, isolated mononuclear cells, buffy coat, and archival Guthrie card neonatal blood; to a sample that is a spot, fresh, frozen, or from a tumor, such as a formalin-fixed tumor biopsy; and to urine sediment, CNS fluid, fat, or other tissue biopsies.
  • the methods described herein are provided as diagnostic kits for testing laboratories in the form of immune cell specific detection reagents; premixed and optimized plate-formatted multiplex assays for immune profiling compatible with specific instrument platforms; applications for in vitro diagnostics of blood, CNS, urine or bronchoalveolar lavage; and point-of-care blood sampling kits for mail-in immune testing and immune monitoring.
  • the simplified DNA-based immuno-diagnostic approach provided herein uses much smaller volumes of blood than earlier methods, and the samples require no processing. These samples can simply be 'spotted' onto a solid-phase carrier and transported by mail or courier.
  • the methods described include development of software that can process the output data of immune specific methylation assays to create immune parameter reports by comparison to different reference and control values.
  • the methods herein describe a discovery platform that bioinformatically integrates empirically derived genome-wide methylation analyses with publicly available differential gene expression analyses. The merged datasets are then sorted to produce candidates for further examination. The discovery platform is useful for discovering clinically useful gene biomarkers.
  • the methods described herein include a proof-of-principle test of the discovery platform.
  • the goal set was to discover a gene or gene set that provides a marker of CD3+ T cells.
  • the method is applicable to finding a biomarker for any cell.
  • the platform identifies gene regions that are 'demethylated' within the target cell population (CD3+ T cell) and completely methylated in non-target cells.
  • DNA from the leukocytes was extracted using the DNeasy Blood & Tissue kit (Qiagen) according to the manufacturer's protocol, and was subjected to bisulfite conversion by treatment with sodium bisulfite using the EZ DNA Methylation Kit (Zymo) following the manufacturer's protocol, thereby converting unmethylated cytosine residues to uracil and leaving methylated cytosine residues intact.
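The conversion chemistry described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not part of the described protocol: it models only the top strand, treats every cytosine not listed as methylated as converted, and reports the sequence as it reads after PCR (uracil is copied as thymine). The function names `cpg_sites` and `bisulfite_convert` are hypothetical.

```python
from __future__ import annotations


def cpg_sites(seq: str) -> list[int]:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U, which PCR copies as T; a C whose
    0-based index is in `methylated` is protected and still reads as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


# Toy example: first CpG methylated, second CpG unmethylated.
seq = "ACGTTCGACCA"
print(cpg_sites(seq))               # [1, 5]
print(bisulfite_convert(seq, {1}))  # ACGTTTGATTA
```

The retained C at a CpG after conversion is what downstream assays (arrays, pyrosequencing, MethyLight) read out as methylation.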
  • DNA methylation is measured using a DNA methylation microarray as described in Example 13.
  • Huehn et al. (U.S. patent publication number 2007/0269823 A1) describes a method for identifying FoxP3-positive regulatory T cells by analyzing the methylation status of CpG positions in the FOXP3 gene, and further describes a method for diagnosing the immune status of a mammal by measuring the amounts of regulatory T cells thus identified.
  • CpG methylation analysis of FoxP3 gene is also used to determine the quality of in vitro generated T regulatory cells and for identifying chemical or biological substances that modulate the expression of the FOXP3 gene in T cells.
  • Specific CpG positions in the mouse FoxP3 gene are identified for analyzing methylation status and primers for amplifying mouse and human CpG dense regions in FOXP3 gene are described.
  • Olek (U.S. patent publication number 2007/0243161 A1) describes a method for pan-cancer diagnostics involving identification of the amount and/or proportion of stable regulatory T cells in a patient suspected of having cancer by analyzing the methylation status of CpG positions in the FOXP3 and/or CAMTA1 genes. An increased amount and/or proportion of stable regulatory T cells in the patient is indicative of an unspecified cancerous disease.
  • a method of treating cancer by reducing the amount or proportion of stable regulatory T cells, and a method for diagnosing survival of a cancer patient by measuring regulatory T cell amounts and/or proportions in patients suspected of having cancer using CpG methylation analysis of the FOXP3 and/or CAMTA1 genes, are also described. Increased amounts and/or proportions of stable regulatory T cells in the cancer patient are indicative of shorter survival.
  • Olek et al. (International publication number WO 2010/069499 A2) describes a method of identifying T-lymphocytes, in particular CD3+CD4+ and/or CD3+CD8+ cells, by analyzing the methylation status of CpG positions in one or more of the genes for the CD3 multi-protein complex (CD3γ, -δ and -ε), or in other genes. Demethylation is indicative of a CD3+ cell.
  • Olek further describes methods for methylation analysis of CpG positions in the CD4 and/or CD8 genes, in particular the CD8 beta gene, or in other genes; for determining immune status based on T-lymphocytes identified by methylation analyses; and for monitoring amounts of T-lymphocytes, in particular CD4+ or CD8+ T-lymphocytes, in response to chemical and/or biological substance exposure.
  • Shen-Orr et al. 2010, Nature Methods 7(4): 287-289, describes a cell-type-specific significance analysis of microarrays for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell-type frequencies.
  • In Shen-Orr's method, the relative abundance of each cell type in a mixed tissue sample is first quantified, and this information is used in combination with microarray gene expression data to deconvolve and compare cell-type-specific average expression profiles for groups of mixed tissue samples.
  • a method similar to regression calibration is provided herein for determining changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures or DNA methylation profiles, in combination with an external validation set having methylation signatures from purified leukocyte samples.
  • the method is demonstrated with Head and Neck Squamous Cell Carcinoma (HNSCC) cases and matched controls, showing that DNA methylation signatures register known changes in CD4+ and granulocyte populations.
  • DMRs (differentially methylated regions) as markers of immune cell identity
  • a high-density methylation platform and a set of analytical tools are used to determine the DNA methylation signature of each of the principal immune components of whole blood (B cells, granulocytes, monocytes, NK cells, and T cell subsets) and to estimate the proportions of immune cells in unfractionated whole blood.
  • a form of regression calibration was developed that treats a methylation signature as a high-dimensional multivariate surrogate for the distribution of white blood cells. This distribution was used to predict or model disease states.
  • the DNA methylation signature was assumed to be a highly correlated measure of leukocyte distribution, and thus fits into the framework of measurement error models, in which the use of a noisy surrogate marker to investigate an association with a disease outcome of interest results in biased estimates unless internal or external validation data are obtained to "calibrate" the model and correct the bias (Carroll et al., 2006, Measurement Error in Nonlinear Models, Chapman & Hall, Boca Raton, Florida, 2nd edition).
  • Measurement error problems are formulated as a set of relationships between $z$, the disease outcome (e.g. case/control status); $\omega$, the gold standard (e.g. leukocyte distribution); and $y$, the surrogate (e.g. DNA methylation).
  • the relationship between $z$ and $\omega$ was difficult to estimate directly due to the cost or logistical complications involved in obtaining $\omega$ in a large number of samples.
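One way to write these relationships, assuming linear models, is sketched below using the notation of Example 1 ($Y_{h0}$ for purified-sample profiles in the validation set $S_0$, $w_h$ for their $d_0 \times 1$ cell-type covariate vectors). The coefficient matrices $B_0$, $B_1$ and $\Gamma$, the error terms $e$ and $\xi$, and the target-set size $n_1$ are symbols introduced here for illustration; this is a sketch of a regression-calibration formulation, not necessarily the exact model used in the Examples.

```latex
\begin{aligned}
Y_{h0} &= B_0 w_h + e_{h0}, && h = 1,\dots,n_0 \quad (\text{purified samples, } S_0)\\
\omega_i &= \Gamma z_i + \xi_i, && i = 1,\dots,n_1 \quad (\text{mixed samples, } S_1)\\
Y_{i1} &= B_0 \omega_i + e_{i1} = B_1 z_i + \left(B_0 \xi_i + e_{i1}\right), && B_1 = B_0 \Gamma\\
\hat{\Gamma} &= \left(\hat{B}_0^{\top} \hat{B}_0\right)^{-1} \hat{B}_0^{\top} \hat{B}_1
\end{aligned}
```

Under this sketch, $\hat{B}_1$ comes from regressing the mixed-sample profiles $Y_{i1}$ on the phenotype $z_i$ (methylation conditional on phenotype), $\hat{B}_0$ comes from the purified samples, and the gold standard $\omega_i$ never has to be measured in $S_1$.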
  • Examples herein include methods for an estimation technique, theoretical treatment of bias, and a demonstration of the approach through an application to whole blood specimens collected in an example of head and neck squamous cell carcinoma (HNSCC). See Figure 3. Also provided are methods for a sensitivity analysis, demonstrating the impact of possible biases. Simulation study results are shown in examples herein based on the biology in the samples used.
  • Examples 1-3 herein show a method for determining changes in distribution of white blood cells between different subpopulations (e.g. cases and controls) from DNA methylation signatures, assuming an external validation set consisting of methylation signatures from purified white blood cell (WBC) samples exists.
  • Examples 4, 10 and 11 herein demonstrate the methodology using a data set of HNSCC cases and matched controls, inferring from DNA methylation assays alone the known changes in CD4+ and granulocyte populations between cases and controls and the change in CD4+ populations due to aging. With previous methods, flow cytometry would have been necessary to obtain the same results. A method for assessing the sensitivity of the magnitude estimates to possible biases is also provided.
  • Example 12 validates the method through simulation.
  • Methods are provided herein for determining changes in the distribution of white blood cell types between different human populations (e.g. cases and controls) using DNA methylation signatures, by using an external validation set having methylation profiles from purified white blood cell components.
  • DNA methylation in peripheral blood was accordingly shown to be a biomarker for clinical and epidemiological investigation.
  • a solution to partition this component of variation in methylation from other determinants employs multivariate analytic tools including regression coefficients, associated inference, and coefficients of determination measures. These tools were used to evaluate whether the observed DNA methylation differences were due to an immunologically mediated response.
  • Prior measurement error formulations (Thurston et al., 2003, J Stat Plan Inf 113, 527-34; Li and Yin, 2007, Ann Stat 35, 2143-2172) require specification of a logistic regression model for case/control status conditional on the DNA methylation signature, a computationally difficult task that is vulnerable to model misspecification.
  • a reverse formulation was used herein that naturally models the relationship of DNA methylation conditional on known phenotypes.
  • the formulation respects the protocol (DNA methylation assay data collected after sampling from phenotype groups).
  • Other strategies to formulate errors were found to be unsuccessful.
  • the strategy of utilizing the Expectation-Maximization (EM) algorithm to integrate over the missing data $\omega$ (Little and Rubin, 2002, Statistical Analysis with Missing Data, Wiley, Hoboken, NJ, 2nd edition) lies outside the measurement error literature and within the larger missing-data literature.
  • the distribution of $\omega$ varied substantially between the data sets $S_0$ and $S_1$, severely complicating the approach, with the side effect of introducing feedback from $S_1$ to $S_0$ and contaminating the gold-standard status of $S_0$.
  • Another alternative that was found to be unsuccessful was the simpler approach of an empirical Bayes procedure, similar to existing mixture-model approaches (Koestler et al., 2010,
  • Examples herein show that group-level comparisons of blood cell DNA methylation revealed significant immune alterations. The methods are also applicable to individual-level immune cell profiling, which makes them useful in clinical and detailed analytical epidemiologic applications that examine individual risk factor information.
  • when $w_h$ involves an orthogonal (e.g. one-way ANOVA) parameterization, ordinary least squares (OLS) is used to obtain $\hat{B}_0$.
  • the projections $\hat{\omega}_i$ serve as estimates of individual immune profiles.
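The two steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the method as implemented: it assumes $w_h$ is a cell-type indicator, so the OLS estimate of $B_0$ reduces to per-CpG cell-type means, and it assumes each individual profile is constrained to $\omega \ge 0$ with $\sum \omega \le 1$. The constrained least-squares problem is solved here by projected gradient descent, swapped in for a dedicated quadratic-programming solver. All function names and toy values are hypothetical.

```python
def estimate_b0(profiles, labels, cell_types):
    """OLS estimate of B0 when w_h is a one-way ANOVA (indicator) design.

    With one indicator per cell type, the OLS coefficient for each CpG is the
    mean methylation of the purified samples of that cell type.
    Returns an m x d matrix as a list of rows (CpG sites x cell types).
    """
    m = len(profiles[0])
    b0 = [[0.0] * len(cell_types) for _ in range(m)]
    for k, cell_type in enumerate(cell_types):
        members = [p for p, lab in zip(profiles, labels) if lab == cell_type]
        for j in range(m):
            b0[j][k] = sum(p[j] for p in members) / len(members)
    return b0


def project_capped_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # The sum constraint is active: project onto the probability simplex.
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(y, b0, iterations=5000):
    """Minimise ||y - B0 w||^2 subject to w >= 0, sum(w) <= 1.

    Projected gradient descent; the step 1 / (2 * ||B0||_F^2) is conservative
    but safe because ||B0||_F^2 bounds the largest eigenvalue of B0'B0.
    """
    m, d = len(b0), len(b0[0])
    step = 1.0 / (2.0 * sum(b * b for row in b0 for b in row))
    w = [1.0 / d] * d
    for _ in range(iterations):
        resid = [y[j] - sum(b0[j][k] * w[k] for k in range(d)) for j in range(m)]
        grad = [-2.0 * sum(b0[j][k] * resid[j] for j in range(m)) for k in range(d)]
        w = project_capped_simplex([w[k] - step * grad[k] for k in range(d)])
    return w


purified = [[0.88, 0.12, 0.50], [0.92, 0.08, 0.50],  # toy cell type "A"
            [0.10, 0.90, 0.45], [0.10, 0.90, 0.55]]  # toy cell type "B"
b0 = estimate_b0(purified, ["A", "A", "B", "B"], ["A", "B"])
mixed = [0.34, 0.66, 0.50]  # built as 0.3 * A + 0.7 * B
print(estimate_proportions(mixed, b0))  # approximately [0.3, 0.7]
```

The cap $\sum \omega \le 1$ (rather than $= 1$) leaves room for cell types that are absent from the purified reference set.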
  • There is interest in minor immune cell fractions and their role in disease, though the signal strength of cell types comprising less than 5% of the total white cell compartment is difficult to quantitate. Examples of such cell types include the regulatory T cell or NK cell fractions, which are implicated in autoimmune and malignant diseases. Optimization of platforms for technical sensitivity to minor subtypes, combined with statistical optimization of signature recognition, is needed to enhance the approach for testing highly targeted immune hypotheses.
  • immune cell profiling at the individual level is important for examining individual risk factors in clinical and detailed analytical epidemiologic applications. As shown in Examples herein, individual immune profiles are theoretically achievable and require extensive validation with a wide array of mixture combinations.
  • the methods herein have potentially far reaching implications for rapid, simple and complete assessment of the composition of human white blood cell populations, i.e. the immune profile.
  • with conventional methods, assessment of the cellular composition of peripheral blood cannot be accomplished without the use of freshly drawn venous blood that is immediately prepared in a specially equipped laboratory.
  • a complete assessment of the entire immune profile requires extensive flow cytometric measurements based on protein epitopes on leukocyte membranes that distinguish subtypes of immune cells that are either too rare or too similar in appearance to be distinguished using simple microscopic approaches.
  • flow cytometry is limited by the following: cells must be separated, requiring large volumes of fresh cells; detection can be accomplished only with the fluorescent antibody tags available, which require expensive technology to read; and the outer cell membrane must be intact, which limits its utility in many instances.
  • the methods herein obviate the need for fresh blood and the preservation of labile protein epitopes.
  • the methods herein are also able to simultaneously assess all of the individual components of the peripheral blood using a highly multiplexed molecular platform, and are therefore logistically straightforward.
  • the statistical methodology used here is implemented easily with the instrumental output of the methylation arrays, which simplifies the interpretation of the immune profile data from the operator's point of view.
  • the methods herein are immediately deployed in a research framework to cost effectively assess human immune profiles (in fresh or archival samples), to explore the potential of the immune profiles to function as biomarkers, and to address key questions regarding disease pathogenesis. Furthermore, the approach used in the methods herein is readily suited for rapid translation to a broad base of clinical applications such as disease monitoring, diagnosis, prognosis, and response to therapy.
  • the methods herein are applied to tumor biopsies for immune characterization of cancer patients.
  • Other notable applications exist including the application of the test to urine sediments in patients with autoimmune and diabetic kidney disease or in patients undergoing kidney transplantation. Positive detection of T cells in urine sediment is indicative of immune activation and potential kidney disease progression or acute rejection in the context of kidney transplantation.
  • Populations of blood lymphocytes can be distinguished morphologically on the basis of size and the presence of a granular cytoplasm.
  • Small lymphocytes including all subsets of T- and B cells, are responsible for adaptive immune responses. Sublineages of small lymphocytes are morphologically indistinguishable and are distinguished by cell surface receptors and cellular function. B cells are typically distinguished by expression of the surface molecule CD19. They express immunoglobulins, which are surface receptors for pathogens. In addition, B cells are capable of further differentiating into effector cells called plasma cells. (Parham, P. The Immune System, Garland Science, New York, NY, 2005). Differentiated T cells exhibit a complex of surface molecules which function as antigen receptors, referred to as the T cell receptor (TCR) complex.
  • TCR T cell receptor
  • This complex includes the TCR α plus β, or γ plus δ, antigen recognition chains, which are associated with the invariant chain subunits CD3γ, δ, ε, and ζ.
  • T cells are distinguished from other cell lineages by expression of CD3 molecules on the cell surface.
  • the genes that encode the CD3γ, δ, ε, and ζ subunits are CD3G, CD3D, CD3E and CD3Z, respectively.
  • the former three genes are tightly clustered on chromosome 11, whereas CD3Z is located on chromosome 1.
  • Differentiated T cells are further divided into two lineages depending on their expression of either CD4 or CD8.
  • CD8+ T cells are also known as cytotoxic T cells.
  • the main function of CD4+ T cells is to help other immune cells respond appropriately to sources of infection or malignancy.
  • There are several subsets of CD4+ T cells, including Th1, Th2, Th17 and regulatory T cells (Parham, P. The Immune System, Garland Science, New York, NY, 2005).
  • Regulatory T cells suppress an immune response by influencing the activity of other cell types. They act primarily in the periphery on mature lymphocytes that have exited the main lymphoid tissues and serve as a means of preventing autoimmunity during protective immune responses.
  • Exemplary regulatory T cells are thymus-derived CD4+CD25+Foxp3+ T cells, commonly referred to as Tregs.
  • These cells primarily function to maintain peripheral self-tolerance.
  • Forkhead box P3 (FOXP3), a transcription factor expressed by Tregs, is an important developmental and functional factor that regulates Treg immunosuppressive functions.
  • Natural killer (NK) cells are large CD56+ lymphocytes with a granular cytoplasm. They enter infected or malignant tissue to kill damaged cells and secrete cytokines aimed at preventing the spread of disease to other cells or tissues. Thus, NK cells act as effector cells of innate immunity. A subset of CD56+ cells that express CD3 surface molecules are NKT cells.
  • RPMM: recursively partitioned mixture modeling.
  • Resultant t-values from each comparison were converted to p-values in R version 2.11.1, using software for Illumina methylation data that provides convenient mechanisms for loading and analyzing methylation results, and for quality control and basic visualization tasks.
  • a negative t-value indicates the locus putatively represents a DMR that is unmethylated in group 1 leukocyte lineage(s) and methylated in group 2 leukocyte lineage(s).
  • a positive t-value indicates that the locus putatively represents a DMR that is methylated in group 1 leukocyte lineages and unmethylated in group 2 leukocyte lineages.
  • a DMR that is unmethylated in the leukocyte lineage(s) of interest and methylated in other leukocyte lineages would make the best epigenetic biomarker, since unmethylation is associated with transcriptional activity whereas methylation is associated with transcriptional silencing. Therefore, significant CpG loci exhibiting negative t-values are preferred.
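The ranking rule above can be sketched as follows. The source does not state which t-test was used; this sketch assumes a Welch two-sample t statistic per CpG locus (group 1 minus group 2) and omits the conversion to p-values. Locus IDs and beta values are toy inputs, and the function names are hypothetical.

```python
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch two-sample t statistic for mean(group1) - mean(group2)."""
    se = (variance(group1) / len(group1) + variance(group2) / len(group2)) ** 0.5
    return (mean(group1) - mean(group2)) / se


def rank_negative_t(beta_g1, beta_g2):
    """Loci with negative t (putatively unmethylated in group 1), most negative first.

    beta_g1 and beta_g2 map a CpG locus ID to the beta values measured in the
    group 1 and group 2 leukocyte samples, respectively.
    """
    t_values = {locus: welch_t(beta_g1[locus], beta_g2[locus]) for locus in beta_g1}
    negative = [(locus, t) for locus, t in t_values.items() if t < 0]
    return sorted(negative, key=lambda pair: pair[1])


# Toy beta values for three hypothetical loci (3 samples per group).
g1 = {"cg_a": [0.10, 0.12, 0.08], "cg_b": [0.80, 0.82, 0.78], "cg_c": [0.30, 0.35, 0.25]}
g2 = {"cg_a": [0.90, 0.88, 0.92], "cg_b": [0.20, 0.22, 0.18], "cg_c": [0.50, 0.55, 0.45]}
print(rank_negative_t(g1, g2))  # cg_a (t near -49) ranks ahead of cg_c (t near -4.9)
```

Locus `cg_b` has a positive t (methylated in group 1) and is dropped, matching the stated preference for negative t-values.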
  • results of locus by locus comparisons were merged with cell type specific gene expression data.
  • cell-type-specific gene expression data from Palmer et al., 2006, BMC Genomics 7, 115; Du et al., 2006, Genomics 87, 693-703; and Hashimoto et al., 2003, Blood 101, 3509-3513 were used to identify putative DMRs that are in genes with altered expression in Group 1 leukocyte lineages compared to Group 2 leukocyte lineages.
  • An exemplary candidate epigenetic biomarker of a specific leukocyte lineage is an unmethylated region of a gene that is highly expressed by that leukocyte lineage and not expressed by other cell types, e.g. genes encoding lineage-specific surface molecules, obligate differentiation proteins, and secreted factors.
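The merge with expression data can be sketched as a join from CpG locus to gene to an expression contrast. The input structures, the log2 fold-change representation and the `min_log2_fc` cut-off are assumptions for illustration; the function name and gene names are placeholders.

```python
def merge_with_expression(candidates, locus_to_gene, log2_fc, min_log2_fc=1.0):
    """Keep candidate loci whose gene is more highly expressed in group 1.

    candidates: (locus, t) pairs with t < 0, i.e. putatively unmethylated in group 1.
    locus_to_gene: CpG locus ID -> gene symbol (array annotation).
    log2_fc: gene symbol -> log2(expression in group 1 / expression in group 2).
    min_log2_fc: hypothetical cut-off for "highly expressed in group 1".
    """
    merged = []
    for locus, t in candidates:
        gene = locus_to_gene.get(locus)
        fold_change = log2_fc.get(gene)
        if fold_change is not None and fold_change >= min_log2_fc:
            merged.append((locus, gene, t, fold_change))
    # Strongest methylation difference first, then strongest expression difference.
    return sorted(merged, key=lambda row: (row[2], -row[3]))


candidates = [("cg_a", -49.0), ("cg_c", -4.9), ("cg_d", -7.0)]
genes = {"cg_a": "GENE_A", "cg_c": "GENE_C"}  # cg_d has no gene annotation
fold_changes = {"GENE_A": 3.0, "GENE_C": 0.2}
print(merge_with_expression(candidates, genes, fold_changes))
```

Loci without a gene annotation or without expression data are dropped rather than guessed.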
  • a further candidate is a methylated region of a gene that is not expressed by the leukocyte lineage and is expressed by all other cell types. Without being limited by any theory or mechanism of action, both scenarios correlate with chromatin packaging, such that differential DNA methylation plays a large role in regulating leukocyte-lineage-specific expression of the gene. If no leukocyte-lineage-specific difference in expression of the gene containing a putative DMR is observed, other modes of gene regulation, such as activators, repressors, and enhancers, may overshadow the role of chromatin packaging in regulating expression of the gene. Alternatively, such a gene may be expressed in a temporally or environmentally specific manner that was not elucidated by the candidate gene expression data. Such a putative DMR would not be an ideal target to explore as an epigenetic biomarker of that leukocyte lineage.
  • DMR validation is performed for each putative DMR identified from array data using bisulfite pyrosequencing and/or MethyLight quantitative real time PCR assays that measure DNA methylation of the gene region in all sorted human leukocyte samples shown in Table 15, Example 13.
  • Bisulfite pyrosequencing assays were designed using PyroMark Assay Design 2.0 (Qiagen) and carried out on a PyroMark MD pyrosequencer running PyroMark qCpG software (Qiagen). Oligonucleotide primers were obtained from Invitrogen™ by Life Technologies™.
  • the gene region of interest was PCR amplified from bisulfite-converted DNA using a biotinylated reverse primer and an unlabelled forward primer.
  • the biotinylated PCR product was complexed with sequencing primers that anneal upstream from the target region, and was then incubated with enzymes and substrates.
  • dNTPs were dispensed in a specific order, and the light emitted with the incorporation of each nucleotide was measured with a CCD camera. Methylation was quantified at each CpG locus from the relative amounts of cytosine (methylated) and thymine (unmethylated).
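A sketch of the quantification step, assuming (as is common for pyrosequencing analysis software) that percent methylation at a CpG is the C peak height divided by the summed C and T peak heights. Peak values and function names are hypothetical; real software also applies quality checks that are not modelled here.

```python
def cpg_methylation_percent(c_signal, t_signal):
    """Percent methylation at one CpG from C and T pyrogram peak heights."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no C or T signal at this CpG")
    return 100.0 * c_signal / total


def region_methylation(peaks):
    """Per-CpG percentages and their mean across the assayed region.

    peaks: list of (C peak height, T peak height) pairs, one per CpG.
    """
    per_cpg = [cpg_methylation_percent(c, t) for c, t in peaks]
    return per_cpg, sum(per_cpg) / len(per_cpg)


# Toy peak heights for two CpGs in one amplicon.
print(region_methylation([(30.0, 70.0), (10.0, 90.0)]))  # ([30.0, 10.0], 20.0)
```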
  • a DMR found by bisulfite pyrosequencing or MethyLight® qPCR to be unmethylated in group 1 leukocytes and in some group 2 leukocytes was not confirmed as an epigenetic biomarker specific to the group 1 leukocyte lineage(s). Instead, that DMR represents an epigenetic biomarker of several different human leukocyte lineages, including the group 1 lineage(s).
  • a DMR that is partially unmethylated by bisulfite pyrosequencing or MethyLight® qPCR in group 1 leukocytes and methylated in group 2 leukocytes, is a weak epigenetic biomarker of the group 1 leukocyte lineage(s). That DMR is heterogeneously unmethylated in group 1 leukocytes and is homogeneously methylated in group 2 leukocytes and is therefore not useful for distinguishing group 1 from group 2 leukocyte lineages.
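The validation outcomes described above can be summarized as a decision rule. This is a minimal sketch: the source gives no numeric thresholds, so `UNMETHYLATED_MAX` and `METHYLATED_MIN` are hypothetical cut-offs, and the function name and labels are illustrative.

```python
# Hypothetical cut-offs on methylation fraction (0 to 1); not taken from the source.
UNMETHYLATED_MAX = 0.2
METHYLATED_MIN = 0.8


def classify_dmr(group1, group2):
    """Classify a putative DMR from validated methylation levels.

    group1 / group2: leukocyte lineage name -> methylation fraction measured
    by bisulfite pyrosequencing or MethyLight qPCR.
    """
    g1_unmethylated = all(v <= UNMETHYLATED_MAX for v in group1.values())
    g1_partial = all(v < METHYLATED_MIN for v in group1.values())
    g2_methylated = all(v >= METHYLATED_MIN for v in group2.values())
    g2_some_unmethylated = any(v <= UNMETHYLATED_MAX for v in group2.values())
    if g1_unmethylated and g2_methylated:
        return "specific biomarker of group 1"
    if g1_unmethylated and g2_some_unmethylated:
        return "biomarker of several lineages including group 1"
    if g1_partial and g2_methylated:
        return "weak biomarker of group 1"
    return "not confirmed"


print(classify_dmr({"T": 0.05}, {"B": 0.95, "NK": 0.90, "Mono": 0.92}))
```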
  • Gliomas are a histologically diverse cancer with few established risk factors and poor prognoses (Kleihues et al. 1993, Brain Pathol 3(3): 255-68; Ohgaki and Kleihues 2005, Acta Neuropathol 109(1): 93-108; Louis et al. 2007, Acta Neuropathol 114(2): 97-109; Ohgaki and Kleihues 2007, Am J Pathol 170(5): 1445-53).
  • immune factors are associated with increased glioma risk and are also thought to play a role in patient outcomes (Wiemels et al. 2009, Int J Cancer 125(3): 680-7; Yang et al.
  • TILs: tumor-infiltrating lymphocytes.
  • CD3Z: also known as CD247.
  • Examples herein show the validity of CD3Z demethylation as a CD3+ T cell marker and illustrate its application in patients with glioma, demonstrating the high discriminating value of CD3Z demethylation in glioma case-control subject comparisons, histopathological characterization of tumors and patient prognosis.
  • CD3+ T cells and Tregs in peripheral blood were observed to be reduced about two-fold in glioblastoma (GBM) patients, a highly statistically significant difference.
  • Validation studies herein support the notion that the CD3Z MS-qPCR assay using unprocessed archival whole blood is an accurate reflection of T cells as measured by conventional flow cytometry.
  • Previous studies have validated the FOXP3 demethylation assay as a measure of Tregs in blood and tissues (Baron et al., 2007, Eur J Immunol 37(9): 2378-89).
  • Treg/T cell ratios were herein observed to be higher in current smokers than in former smokers (Figure 16). It was subsequently confirmed in an independent population that current, but not former, cigarette smokers exhibit higher Treg/T cell ratios. These results illustrate the need to include cigarette smoking among the patient characteristics examined in diseases that affect Treg levels. The new epigenetic methods described herein are useful in promoting these types of studies.
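The Treg/T cell ratio can be sketched from MS-qPCR results under several assumptions that this section does not spell out: copy numbers come from a standard curve of the form Ct = slope × log10(copies) + intercept; the demethylated fraction of CD3Z copies approximates the T cell fraction; and the demethylated fraction of FOXP3 copies approximates the Treg fraction. Curve parameters, copy numbers and function names below are hypothetical.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a qPCR standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def demethylated_fraction(demeth_copies, meth_copies):
    """Share of copies carrying the demethylated (cell-type-specific) allele."""
    total = demeth_copies + meth_copies
    if total <= 0:
        raise ValueError("no amplifiable copies")
    return demeth_copies / total


def treg_to_t_ratio(foxp3_fraction, cd3z_fraction):
    """Treg/T cell ratio as FOXP3 over CD3Z demethylated fractions.

    FOXP3 is X-linked, so sex-specific corrections may be needed;
    none is applied in this sketch.
    """
    return foxp3_fraction / cd3z_fraction


# Toy numbers: 25% of CD3Z copies and 2% of FOXP3 copies are demethylated.
t_fraction = demethylated_fraction(250.0, 750.0)
treg_fraction = demethylated_fraction(20.0, 980.0)
print(treg_to_t_ratio(treg_fraction, t_fraction))  # about 0.08
```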
  • CD3Z and FOXP3 demethylation were assessed in brain tumor cell lines and in human GBM xenografts, which cannot contain human T cells. These samples contained non-detectable levels of CD3Z or FOXP3 demethylation.
  • Some subtypes of NK cells (CD56dim CD16bright) utilize CD3Z in NK receptor signaling (Lanier, 2006, Trends Cell Biol 16(8): 388-90). The contribution of CD3Z-expressing, demethylated NK cells to the overall CD3Z demethylated signal in peripheral white blood cells is estimated to be very small. Furthermore, NK cells have not been observed in glioma tissues.
  • the fundamental innovation in the epigenetic analyses described herein is a shift in immunodiagnostics away from proteomic-based approaches to one that is based on quantifying cell type specific DNA methylation events.
  • This new approach produces gains in versatility, sensitivity, feasibility and throughput compared with conventional flow cytometry or IHC and does so at a lower cost.
  • the high chemical stability of cytosine methylation marks within genomic DNA, and the fact that differentiation within the immune system is tightly linked with gene-specific DNA methylation events, make quantification of immune cells through epigenetic analyses a unique approach.
  • the method combines the intrinsic chemical stability of DNA with the high sensitivity of qPCR methods. Automation and liquid robotic handling in processing and analysis add further to the power of the methodology and open avenues for investigations in the immunoepidemiology of glioma and many other diseases.
  • Methods herein show that blood-based DNA methylation signatures across a complex cellular mixture of WBCs are useful for distinguishing controls from cases of solid tumor cancers in which there are well-defined immune-mediated responses.
  • tumorigenesis elicits a distinct immune response (Camilleri-Brot S et al., 2004, Ann Oncol 15: 104-112; Wang Y et al., 2005, Am J Clin Pathol 124: 392-401; Rui L et al., 2011, Nat Immunol 12: 933-940).
  • the result is a hematopoietic shift in WBC populations, which can be precisely discerned by applying the unique epigenetic signature of differing lineages.
  • the aggregate methylation signature in blood that distinguishes cancer cases from controls corresponds to the epigenetic signatures that define leukocyte subtypes.
  • the epigenetic landscape of WBCs was obtained by identifying DMRs among leukocyte subtypes. This analysis revealed that nearly all of the highest ranking 50 leukocyte DMRs (Example 25) were differentially methylated between disease cases and normal controls for HNSCC and ovarian cancers, with a smaller fraction differentially methylated between bladder cancer cases and controls. Among the eight overlapping CpG loci that were found to be significantly differentially methylated between cancer cases and controls across the three data sets, the direction of the relationships was similar for HNSCC and ovarian cancer cases compared to controls. These findings show that HNSCC and ovarian cancer elicit similar shifts in leukocyte compositions in the hematopoietic system.
  • NFE2, ASGR2. Several of these loci are located within genes with either established or alleged involvement in immune differentiation or function, viz. CD72, PACAP and FGD2 (Kumanogoh and Kikutani, 2001, Trends Immunol 22: 670-676; Parnes and Pan, 2000, Immunol Rev 176: 75-85; Tan et al., 2009, Proc Natl Acad Sci 106: 2012-2017; Huber C et al., 2008, J Biol Chem 283: 34002-34012).
  • CD72, a member of the C-type lectin superfamily, negatively regulates B cell coreceptor signaling (Kumanogoh and Kikutani, 2001) and has been shown to act as a unique inhibitory receptor on NK cells regulating cytokine production (Alcon VL et al., 2009, Eur J Immunol 39: 826-832).
  • PACAP has been implicated as an intrinsic regulator of regulatory T cell abundance after inflammation [36], and FGD2 has been shown to play a role in leukocyte signaling and vesicle trafficking in cells specialized to present antigen in the immune system (Huber C et al., 2008, J Biol Chem 283: 34002-34012).
  • HNSCC was predicted with a high degree of sensitivity and specificity. Similarly high prediction performance was obtained for ovarian cancer using the DNA methylation profile of the highest ranking ten leukocyte DMRs and patient age group. Prediction performance for bladder cancer, based on the methylation profile of the highest ranking 56 DMRs, patient age, gender, smoking status, smoking pack years, and family history of bladder cancer, was lower than that observed for HNSCC and ovarian cancer.
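Prediction performance above is stated in terms of sensitivity and specificity. For reference, those metrics can be computed from predicted versus actual case status as sketched below; the classifier itself is not described in this section and is not reproduced, and the labels are toy values.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP); 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 3 cases, 4 controls.
print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0]))
```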
  • the 56 leukocyte DMRs used in the bladder profile analysis were less correlated with the nine CpG loci identified via the previously reported SSRPMM analysis of this data set (Marsit CJ et al., 2011, J Clin Oncol 29: 1133-1139).
  • Alternative biological epigenetic mechanisms may be operative in bladder cancer in addition to the epigenetic signatures characteristic of leukocyte subtypes, and contribute independently to the blood-derived differences in DNA methylation between bladder cancer cases and controls.
  • Examples herein provide evidence that observed differences in blood-derived DNA methylation in cancer cases are largely explained by systematic differences in the methylation signatures of leukocyte sub-populations. These findings signify that different cancers elicit a discernible, unique immune response evident in peripheral blood. These results have important implications for research into the immunology of cancer. Further, the approach of observing differences in blood derived DNA methylation provides a completely novel tool for the study of the immune profiles of diseases where only DNA can be accessed; that is, this approach has utility not only in cancer diagnostics and risk-prediction, but can also be applied to future research (including stored specimens) for any disease where the immune profile holds medical information. The approach represents an extremely simple, yet truly powerful and important new tool for medical research and may serve as a catalyst for future non-invasive disease diagnostics.
  • Natural killer (NK) cells are a key element of the innate immune system implicated in human cancer.
  • Natural killer (NK) cells are of particular interest in the context of HNSCC and other cancers, since they are able to recognize and destroy pre-cancerous and malignant cells (Kim R et al., 2007, Immunology 121: 1-14; Ostrand-Rosenberg S, 2008, Curr Opin Genet Dev 18: 11-8; Whiteside TL, 2006, Cancer Treat Res 130: 103-24; Parham P. The Immune System. 2nd ed. New York, NY: Garland Science; 2005).
  • NKT: natural killer T.
  • Gastrointestinal and liver physiology 300: G516-25). Unlike previous studies, the data shown herein evaluate the effects of these factors on the depression of the NK immune profile. Patient risk factors and disease characteristics (e.g. tumor location) are evaluated herein in relationship to NK cells to determine the independent associations of HNSCC with innate immune parameters.
  • NK cell-specific DNA methylation was identified by analyzing DNA methylation and mRNA array data from purified blood leukocyte subtypes (NK, T, B, monocytes, granulocytes), and confirmed via pyrosequencing and methylation-specific quantitative PCR (MS-qPCR).
  • NK cell levels in archived whole blood DNA from 122 HNSCC patients and 122 controls from a study population were assessed by MS-qPCR. Details of this study population have been previously described (Applebaum KM et al., 2007, Journal of the National Cancer Institute, 99: 1801-10). Briefly, peripheral blood from 122 control donors and 122 HNSCC patients was collected between December 1999 and December 2003 in the greater Boston area.
  • DNA methylation biomarkers of NK cells represent an alternative to conventional flow cytometry that can be applied in a wide variety of clinical and epidemiologic settings including archival blood specimens.
  • Described herein is a new method for measuring NK cell levels in human blood and tissue based on cell-lineage-specific DNA methylation that can be applied to samples regardless of handling and storage procedures. This is a step forward in immune cell detection and quantification that is applicable to many types of clinical samples.
  • Applying the method to a case-control study of HNSCC revealed a case-associated decrease in circulating NK cells that is independent of known risk factors and treatments. This shows that it is important to monitor NK cell levels in patients with HNSCC, and that it may be worthwhile to pursue immune therapies aimed at restoring circulating NK cells in patients with HNSCC.
  • methylation hot-spots or methylated CpG islands in the genome may also be identified by several of the recently developed genome-wide screen methods such as Restriction Landmark Genomic Scanning for Methylation (RLGS-M), and CpG island microarray.
  • MethyLight is a highly sensitive high-throughput quantitative methylation assay, capable of detecting methylated alleles in the presence of a 10000-fold excess of unmethylated alleles using fluorescence-based real-time PCR technology that requires few or minor further manipulations after the PCR step.
  • A MethyLight assay is commercially available from QIAGEN, Inc. (Valencia, CA).
  • Analyzing the methylation of any gene, e.g. the CD3Z gene, through amplification by polymerase chain reaction (PCR) is performed using digital PCR.
  • Digital PCR is an improved method of PCR useful to overcome difficulties associated with conventional PCR.
  • Conventional PCR assumes that amplification of nucleic acid is exponential, and nucleic acids are quantified by comparing the number of amplification cycles and the amount of PCR end-product to those of a reference sample. In practice, however, several factors interfere with this calculation, making measurements uncertain and inaccurate and hence unsuitable for highly sensitive applications.
  • In digital PCR, a sample is partitioned so that individual nucleic acid molecules within the sample are localized and concentrated within many separate regions. The number of molecules can be estimated using a Poisson distribution. Each partition is scored as "0" or "1", i.e. a negative or positive reaction, respectively.
  • nucleic acids are quantified by counting the regions that contain PCR end-product, which is a count of positive reactions.
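  • The partition-counting step can be sketched in Python. This is a minimal sketch, assuming that molecules load into equally sized partitions independently (Poisson loading) and that every occupied partition amplifies. Because a positive partition may hold more than one molecule, the mean occupancy is recovered as λ = -ln(1 - p) from the positive fraction p rather than by counting positives directly. The function names are hypothetical and are not part of any vendor's software.

```python
import math


def mean_copies_per_partition(n_positive: int, n_partitions: int) -> float:
    """Poisson estimate of the mean number of template copies per partition."""
    if n_partitions <= 0 or not 0 <= n_positive < n_partitions:
        # If every partition is positive, occupancy cannot be estimated.
        raise ValueError("require 0 <= n_positive < n_partitions")
    p_positive = n_positive / n_partitions
    # P(partition empty) = exp(-lam), therefore lam = -ln(1 - p_positive).
    return -math.log(1.0 - p_positive)


def total_copies(n_positive: int, n_partitions: int) -> float:
    """Estimated number of template copies loaded across all partitions."""
    return mean_copies_per_partition(n_positive, n_partitions) * n_partitions


# Hypothetical run: 5,000 of 20,000 partitions are positive.
estimate = total_copies(5000, 20000)
```

With 5,000 of 20,000 partitions positive, the estimate is about 5,754 copies, larger than the raw count of positive partitions because some partitions contain two or more molecules.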
  • A system for digital PCR based on integrated fluidic circuits (chips) having integrated chambers and valves for partitioning samples is commercially available. For example, digital PCR systems are available from Life Technologies (Grand Island, NY 14072, USA) and QuantaLife (Pleasanton, CA, USA).
  • Example 1 Statistical methods for using DNA methylation arrays as surrogate measures of cell mixture distribution
  • $Y_{0h}$ represents an $m \times 1$ vector of methylation assay values, e.g. average beta values from an Infinium bead-array product, corresponding to a purified blood sample consisting of a homogeneous cellular population (e.g. monocytes or granulocytes), with the qualitative characterization of the cell type indicated by a $d_0 \times 1$ covariate vector $w_h$.
  • $h \in \{1, \ldots, n_0\}$, and the m individual values correspond to CpG sites on a DNA methylation microarray, possibly pre-selected to correspond to putative DMRs for distinguishing different cellular types.
  • $Y_{1i}$ represents an $m \times 1$ vector of methylation assay values for the same CpG sites (in the same order) as $Y_{0h}$, but corresponding to a heterogeneous mixture of cells (e.g. peripheral whole blood) from a human subject.
  • $i \in \{1, \ldots, n_1\}$, where $n_1$ is the number of target specimens.
  • The goal is to understand the associations between $Y_{1i}$ and the covariate vector $z_i$.
  • A reasonable regression parameterization for $z_i$ is also assumed, including an intercept; for convenience, the first column of $B_0$ is taken to be the $m \times 1$ intercept.
  • The error vectors $e_0$ and $e_1$ may reflect independence among arrays h and i, or else may have a more complex random-effects structure accounting for technical effects or biological replication; however, their substructures are incidental to this analysis, except for the fine details of the bootstrap procedure proposed below.
  • is a $d_0 \times d_1$ matrix that summarizes associations between the rows of $B_0$ and $B_1$, and
  • $SS_V$ measures variability explained by mixtures of profiles in the set $S_0$.
  • The remaining systematic sum of squares measures variability in systematic biological heterogeneity that nevertheless remains unexplained by mixtures of profiles in $S_0$, presumably due to some process other than differences in mixtures of cell types.
  • $R^2 = SS_V / SS_0$, which represents the proportion of total variation in $S_1$ explained by $S_0$.
  • $R^{*2} = SS_V / (SS_0 - SS_e)$, which represents the proportion of systematic variation in $S_1$ explained by $S_0$.
  • $R^{*2}$ is poorly defined when $SS_0 \approx SS_e$, because its denominator then approaches zero.
  • Estimation proceeds by applying an appropriate linear model, e.g. ordinary least squares, linear mixed effects models (Wang and Petronis, 2008, DNA Methylation Microarrays:
  • $\tilde{B}_0 = (1_m, B_0)$, as described in detail in Example 3.
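  • The two-stage procedure can be illustrated with a minimal Python sketch. It assumes that the second-stage model has the form $B_1 \approx 1_m \gamma_0^T + B_0 \Gamma$, with $\Gamma$ a $d_0 \times d_1$ coefficient matrix obtained by regressing $B_1$ on $\tilde{B}_0 = (1_m, B_0)$, and it uses plain ordinary least squares in both stages instead of the mixed-effects or limma fits the text also allows. The names `Gamma`, `fit_coefficients`, and `stage2_gamma` are illustrative assumptions.

```python
import numpy as np


def fit_coefficients(Y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Stage 1 (sketch): OLS fit of Y (m CpGs x n arrays) on design X (n arrays x d).

    Returns the m x d coefficient matrix: B0 when Y holds purified samples and
    X the cell-type indicators w_h, or B1 when Y holds target samples and X the z_i.
    """
    coef_T, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    return coef_T.T


def stage2_gamma(B0: np.ndarray, B1: np.ndarray):
    """Stage 2 (sketch): regress each column of B1 on (1_m, B0) by OLS.

    Returns the intercepts (length d1) and Gamma (d0 x d1).
    """
    m = B0.shape[0]
    B0_tilde = np.column_stack([np.ones(m), B0])
    coef, *_ = np.linalg.lstsq(B0_tilde, B1, rcond=None)
    return coef[0], coef[1:]


# Noise-free check with synthetic profiles for 3 cell types, 4 replicates each.
rng = np.random.default_rng(0)
B0_true = rng.uniform(size=(200, 3))
W = np.kron(np.eye(3), np.ones((4, 1)))  # 12 purified arrays x 3 cell-type indicators
B0_hat = fit_coefficients(B0_true @ W.T, W)
Gamma_true = np.array([[0.5, -0.1], [0.3, 0.2], [0.2, 0.0]])
intercepts, Gamma_hat = stage2_gamma(B0_hat, 0.05 + B0_true @ Gamma_true)
```

In this noise-free setting, the sketch recovers both the purified profiles and the mixing coefficients exactly; with real data, the estimated $B_0$ and $B_1$ carry sampling error, which is what the bootstrap procedures address.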
  • Standard errors can be obtained in one of three ways.
  • The simplest estimator, $SE_0$, is the "naive" estimator from simple least-squares theory, which ignores the fact that $B_0$ and $B_1$ are estimates, i.e. potentially variable. To account for variation in estimating $B_0$ and $B_1$, a simple alternative is to use a nonparametric bootstrap procedure.
  • Example 2 provides a detailed analysis of such biases, and proposes a sensitivity analysis procedure for assessing the magnitude of possible bias in a given data set.
  • DNA methylation arrays are typically focused on the comparison of methylated to unmethylated CpG dinucleotides, not quantifying actual amounts of DNA.
  • Example 2 General designs for the treatment of methylation assay data obtained from purified cells $S_0$
  • one sample is purified CD4+ T cells
  • another sample is purified CD8+ T cells
  • yet another sample is T-lymphocyte cells that have not been purified to more specific lineages.
  • $S_0$ in the examples.
  • the CD4+ sample may be identified as
  • A two-stage estimation procedure is introduced here.
  • The first stage of analysis involves estimation of $B_0$ and $B_1$ by appropriate linear models, e.g. the ordinary least squares (OLS) regression estimator and a similar estimator for the target-sample coefficients; a procedure such as limma; or else locus-by-locus linear mixed effects models that
  • Naive standard error estimates for the $(r, s)$th element of the second-stage coefficient estimates can be obtained as the square root of the corresponding least-squares variance estimate.
  • The naive standard error estimates fail to account for the variability in estimating $B_0$ and $B_1$, and are consequently biased, as demonstrated in the simulations (Example 12).
  • be the variance-covariance matrix for the $j$th row of $B_0$.
  • A resampled matrix $B_0^{*}$ is formed by adding, to each row j of $B_0$, a zero-mean multivariate normal vector whose variance-covariance matrix is that of the jth row, or a corresponding multivariate t-distributed vector with $n_0 - d_0$ degrees of freedom. The second-stage estimates are then computed from (S1) by replacing $B_0$ with $B_0^{*}$ (in addition to the previously mentioned replacement). This method is referred to as the double bootstrap.
  • The double bootstrap ignores correlation between CpG sites within a single validation sample; given the relative purity assumed for these samples and adequate correction for technical effects, this is reasonable to first order. As demonstrated in Examples 6-9 and in simulations (Example 10), there is negligible difference between the single and double bootstrap, so incorporating additional complexity to model cross-CpG correlations is unlikely to produce much benefit. However, the double bootstrap has an additional benefit over the single bootstrap: it can be used to assess bias due to measurement error (variability) in $B_0$.
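  • The double bootstrap can be sketched as follows. This minimal Python sketch assumes that the single bootstrap resamples target arrays (columns of $Y_1$ with the matching rows of the covariate matrix) with replacement, that one covariance matrix per row of $B_0$ is available, and that the normal rather than the t perturbation is used. The bias is taken as the mean of the bootstrap draws minus the point estimate. All function names are hypothetical.

```python
import numpy as np


def gamma_hat(B0: np.ndarray, B1: np.ndarray) -> np.ndarray:
    """OLS regression of B1 on (1_m, B0); returns Gamma (d0 x d1)."""
    X = np.column_stack([np.ones(B0.shape[0]), B0])
    coef, *_ = np.linalg.lstsq(X, B1, rcond=None)
    return coef[1:]


def estimate_B1(Y1: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """OLS estimate of B1 (m x d1) from target arrays Y1 (m x n1) and covariates Z (n1 x d1)."""
    coef_T, *_ = np.linalg.lstsq(Z, Y1.T, rcond=None)
    return coef_T.T


def double_bootstrap(B0_hat, row_covs, Y1, Z, n_boot=200, seed=0):
    """Sketch: point estimate of Gamma, its bootstrap bias, and bootstrap standard errors.

    row_covs has shape (m, d0, d0): an assumed covariance matrix for each row of B0_hat.
    """
    rng = np.random.default_rng(seed)
    d0 = B0_hat.shape[1]
    n1 = Y1.shape[1]
    point = gamma_hat(B0_hat, estimate_B1(Y1, Z))
    draws = []
    for _ in range(n_boot):
        # First layer (single bootstrap): resample target arrays with replacement.
        idx = rng.integers(0, n1, size=n1)
        B1_star = estimate_B1(Y1[:, idx], Z[idx])
        # Second layer: perturb each row j of B0 with N(0, Sigma_j) noise.
        noise = np.array([rng.multivariate_normal(np.zeros(d0), S) for S in row_covs])
        draws.append(gamma_hat(B0_hat + noise, B1_star))
    draws = np.stack(draws)
    bias = draws.mean(axis=0) - point
    se = draws.std(axis=0, ddof=1)
    return point, bias, se
```

Dropping the second layer (setting every row covariance to zero) reduces this sketch to the single bootstrap, which makes the two procedures easy to compare on the same data.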
  • Biases induced by biological non-orthogonality are more insidious.
  • Non-orthogonality may arise from two distinct sources. One occurs when some cell types have not been profiled in $S_0$, so that the mixture weights of the profiled cell types sum to less than one. The other may arise when some non-cell-mediated biological process (i.e. distinct from a change in cellular mixtures) nevertheless results in methylation profiles that appear similar to those that distinguish the cell types profiled in $S_0$.
  • The model represented by equation (4) is elaborated as follows:
  • the first term on the right hand side of (6) is the target quantity, identifying the desired mixture weights.
  • The second term will be negligible if all residual profiles are approximately orthogonal to the columns of $B_0$, or else if the corresponding differences are all small. This condition will be satisfied if $S_0$ is exhaustive, in the sense that the combined proportion of unprofiled cell types is negligible.
  • a key consideration is whether P annihilates the methylation signature corresponding to a given non-cell-mediated biological process.
  • A Bayesian view is adopted to characterize the prior expectation of bias as a function of prior probabilities for individual CpG sites. The goal, in part, is to understand the potential for bias given the number m of CpG sites chosen to be measured in $S_0$, so that m can be selected in a manner consistent with minimizing bias.
  • The difference vector for subset $J_{kl}$ has at most k nonzero values.
  • A prior probability $\pi$ is assumed that the subset $J_{kl}$ could correspond to one or more biological processes that distinguish cases from controls. It follows from this view that the prior expectation of the bias can be written in terms of m and the binomial coefficients $C(m, k)$.
  • theoretically independent of m, it does so at the cost of including many potentially noninformative CpGs early on, at low values of m; these may be sources of bias in practice without offering any modeling benefit in return. If the CpG sites are ordered by level of informativeness, there will potentially be a small, increasing prior expectation of bias, motivating a judicious choice of m.
  • Example 4 Proof of concept of Measurement Error Model for determining changes in distribution of white blood cells between different subpopulations
  • $S_0$ consisted of 46 white blood cell samples: sorted, normal, human peripheral blood leukocyte subtypes purchased from AllCells®, LLC (Emeryville, CA) and isolated from whole blood using a combination of negative and positive selection with highly specific cell-surface antibodies conjugated to magnetic beads; materials and protocols were obtained from Miltenyi Biotec, Inc. (Auburn, CA). These 46 samples are summarized in Table 2 and depicted by the clustering heatmap in Figure 1. T lymphocytes that express CD4 or CD8 constitute over 95% of the T cell class. The pan-T cell type was further refined into CD4+, CD8+, and "other" pan-T cell subtypes.
  • The covariate vector $w_h$ consisted of indicators for five cell types and two additional indicators for the CD4+ and CD8+ T cell subtypes.
  • A linear mixed effects model with a random intercept for bead chip was used to estimate $B_0$; 27 additional whole blood control samples (replicates from the same individual) were used to assist in estimating chip effects, since otherwise the data set would have been sparse enough to risk confounding between cell type and chip. These "array controls" were indicated with an additional term in $w_h$.
  • A linear mixed effects model with a random intercept for bead chip was used to estimate the corresponding row of $B_1$.
  • Figure 5A depicts the relationship between $\log_{10}\mathrm{tr}(H_m)$ and $\log_{10}(m)$ for increasing array sizes.
  • Figure 5B depicts the relationship between $d\log_{10}\mathrm{tr}(H_m)/d\log_{10}(m)$ and $\log_{10}(m)$ for increasing array sizes, obtained by smoothing the first differences of the curve depicted in Figure 5 panel A with a loess smoother.
  • Figure 5 panel A also shows the tangent (obtained from the loess curve) at low values of m. For $O(m)$ convergence, Figure 5 panel A should show a linear association with slope equal to one, and the curve in Figure 5 panel B should stay close to the value 1.0.
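  • The slope diagnostic can be sketched in Python. As a simplification, this sketch replaces the loess smoother with finite differences on the log-log scale (`numpy.gradient` with non-uniform spacing); the function name and the synthetic traces are hypothetical. A local slope near 1 is consistent with $\mathrm{tr}(H_m)$ growing as $O(m)$.

```python
import numpy as np


def local_loglog_slope(m_values, traces) -> np.ndarray:
    """Local slope d log10 tr(H_m) / d log10 m at each array size m (sketch)."""
    x = np.log10(np.asarray(m_values, dtype=float))
    y = np.log10(np.asarray(traces, dtype=float))
    # Central differences in the interior, one-sided differences at the ends.
    return np.gradient(y, x)


m = np.array([10, 20, 50, 100, 200, 500, 1000])
linear_slopes = local_loglog_slope(m, 3.0 * m)          # O(m) growth
quadratic_slopes = local_loglog_slope(m, 3.0 * m ** 2)  # O(m^2) growth
```

For these synthetic traces, every local slope equals 1 for linear growth and 2 for quadratic growth; with real traces, the raw differences are noisy, which is why a smoother is applied before reading off the slope.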
  • Example 5 Cell mixture experiment for validating the method for determining changes in distribution of white blood cells between different subpopulations
  • Example 6 Application of the methods herein to the subpopulations of head and neck cancer patients and controls
  • This example describes the application of the method herein for determining changes in the distribution of white blood cells between different subpopulations to patients having head and neck squamous cell carcinoma (HNSCC).
  • The target data set $S_1$ was obtained from arrays applied to whole blood specimens collected in a random subset of individuals involved in an ongoing population-based case-control study of head and neck cancer (HNSCC) (Peters et al., 2005, Cancer Epidemiol Biomarkers Prev, 14(2), 476-82): 92 cases and 92 age- and sex-matched controls. Blood was drawn at enrollment (prior to treatment in 85% of the cases).
  • The clustering heatmap in Figure 3 depicts the raw DNA methylation data in $S_1$.
  • Table 4 presents the coefficient estimates for case status, double-bootstrap bias estimates (estimates of bias arising from measurement error), and naive, single-bootstrap, and double-bootstrap standard error estimates. Each of these quantities is measured in percentage points (%). Estimates of bias arising from measurement error (i.e. from substituting estimated quantities for known ones in a two-stage statistical procedure) were almost always less than half a percentage point and, for significant coefficient estimates, always towards the null.
  • CD4+ T-lymphocytes decreased in cases compared with controls, with a bias-corrected estimate of -10.4 percentage points and approximate 95% confidence interval (-13.1%; -3.3%); the proportion of NK cells decreased, with a bias-corrected estimate of -1.5 percentage points and 95% confidence interval (-2.2%; -0.75%); and the proportion of granulocytes increased, with a bias-corrected estimate of 7.6 percentage points and 95% confidence interval (4.2%; 10.9%). There was also some evidence of an increase in CD8+ T-lymphocytes, with an estimate of 4.5 percentage points and 95% confidence interval (4.5%; 7.0%).
  • CD4+ T-lymphocytes decreased by 3.3 percentage points (-4.4%; -2.2%) per decade of age, and CD8+ T-lymphocytes increased by 2.0 percentage points (1.0%; 3.0%) per decade. All other coefficients were not statistically significant.
  • Treg cells a subclass of CD4+ T cells
  • the bias estimates obtained from the double-bootstrap procedure allow the correction of bias arising from measurement error.
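  • Applying a double-bootstrap bias estimate can be sketched as follows. This sketch assumes that the bias is defined as the mean bootstrap estimate minus the original estimate (so it is subtracted), and that an approximate 95% interval is formed as the bias-corrected estimate ±1.96 double-bootstrap standard errors, with a two-sided normal p-value. The numbers in the example are hypothetical, not values from Table 4.

```python
import math


def bias_corrected_summary(estimate: float, bias: float, se: float, z: float = 1.96):
    """Sketch: bias-corrected estimate, approximate Wald interval, two-sided p-value."""
    corrected = estimate - bias
    interval = (corrected - z * se, corrected + z * se)
    # Two-sided normal p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(corrected / se) / math.sqrt(2.0))
    return corrected, interval, p_value


# Hypothetical coefficient in percentage points: estimate -1.4, bias +0.1, SE 0.4.
corrected, (low, high), p = bias_corrected_summary(-1.4, 0.1, 0.4)
```

Here the corrected estimate is -1.5 percentage points with an interval of roughly (-2.28; -0.72).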
  • There is no statistical procedure for correcting the other possible sources of bias: those arising from changes in distribution among unprofiled cell types, and non-immune-mediated methylation differences.
  • Example 7 presents a detailed sensitivity analysis which shows that the magnitude of the resulting bias is likely to be small, less than a percentage point.
  • Bias 2 Double-bootstrap bias estimate (x 100%).
  • z consisted of case-control status, age (categorized in five-year increments), and two bisulfite conversion efficiency measures.
  • Tables 6-8 present results for case-control status and estimated regression coefficients for age in the ovarian cancer data set. $R^2$ was estimated at 17.8%, and $R^{*2}$ was estimated at 86.1%.
  • SE 2 Double-bootstrap standard error (x 100%). P-values were computed using SE 2 .
  • For CD4+ T cells, there were significant systematic decreases with increasing age, with a gradient consistent in direction and somewhat consistent in magnitude with the corresponding effect found in the HNSCC data set.
  • The CD8+ T cell coefficients for age were all positive, with a gradient consistent in direction and somewhat consistent in magnitude with the corresponding effect found in the HNSCC data set. No bisulfite conversion coefficient was significant, and all such coefficients were of small magnitude (Table 8; generally less than 1 percentage point per standard deviation).
  • Example 8 Application of the methods herein to subpopulations of Down Syndrome patients and controls
  • $R^2$ was estimated at 4.5%, and $R^{*2}$ was estimated at 67.6%.
  • $R^2$ was estimated at 81.4%, and $R^{*2}$ was estimated at 98.9%.
  • Example 9 Application of the methods herein to obesity in an African American population
  • FIG. 8 shows a clustering heatmap displaying the DNA methylation data.
  • z consisted of obesity status.
  • Obese subjects had an estimated increase of 12 percentage points in granulocytes, bias-corrected 95% confidence interval (3.4%; 20%), and an estimated decrease of 4 percentage points in NK cells, bias-corrected 95% confidence interval (-7.7%; -0.9%) (Table 10). No significant differences were found for other blood cell types.
  • Table 11 White Blood Cell Distribution in HNSCC Controls
  • Est Regression coefficient estimate (×100%), normalized so that estimates sum to
  • BC-Est bias-corrected estimate. If the coefficients represented a complete profiling of blood cell types, the estimates should sum approximately to one, even though the model does not explicitly constrain them to do so. In this case, the original bias-corrected estimates (of leukocyte distribution in HNSCC controls) summed to 133%. The table shows the values re-normalized to 90%, the anticipated proportion of the cell types. The resulting estimated distribution of leukocytes is consistent with the literature (Alberts B et al., 2008, Molecular Biology of the Cell, 5th edition, New York, NY: Taylor and Francis).
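  • The re-normalization is a proportional rescaling, sketched below under the assumption that every estimate is multiplied by the same factor (target total divided by current total). The cell-type values are hypothetical placeholders that sum to 133%, not the Table 11 estimates.

```python
def renormalize(estimates: dict, target_total: float = 90.0) -> dict:
    """Sketch: rescale percentage estimates so that they sum to target_total."""
    current_total = sum(estimates.values())
    factor = target_total / current_total
    return {cell: value * factor for cell, value in estimates.items()}


# Hypothetical bias-corrected estimates (percent) summing to 133.
raw = {"granulocytes": 80.0, "CD4+ T": 30.0, "CD8+ T": 10.0, "B": 8.0, "NK": 5.0}
scaled = renormalize(raw)
```

Because every value is multiplied by 90/133, the relative proportions among the profiled cell types are unchanged; only their total is constrained.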
  • bias estimates evident from the double-bootstrap procedure admit the possibility of correcting the bias arising from measurement error.
  • There is no statistical procedure for correcting the other possible sources of bias: those arising from unprofiled cell types and from non-cell-mediated profile differences, i.e. methylation difference signatures with nonzero projection onto the space spanned by the WBC signatures.
  • The magnitude of the bias was small: for the more likely low values of k, the bias was 0.1 to 0.25 of a percentage point. In addition, this analysis was conservative in that it assumed all of the effect in $B_1$ was due to non-cell-mediated processes, a strongly conservative assumption.
  • The expected bias over the uniform posterior was computed by iterated expectation: first by computing the mean bias for each choice of k, then by forming the expectation over the binomial distribution $\mathrm{Bin}(100, \pi)$. As described under "Bias" in Example 3, the result scaled linearly with $\pi$. The constant of proportionality was estimated to be 2.08 percentage points. In summary, if the prior expectation is of even moderate size ($\pi \approx 0.1$) that any one CpG among the 100 selected for this application will show systematic differentiation between cases and controls, then the implied bias would be expected to be less than a percentage point.
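  • The iterated expectation over $\mathrm{Bin}(100, \pi)$ can be sketched in Python. The per-k mean bias is a hypothetical input here: if it grows linearly in k as $c\,k/m$, the expectation equals $c\,\pi$ exactly, reproducing the linear scaling in $\pi$ described above. With $c = 2.08$ percentage points and $\pi = 0.1$, the implied expected bias is $2.08 \times 0.1 \approx 0.21$ percentage points.

```python
from math import comb
from typing import Callable


def expected_bias(mean_bias_for_k: Callable[[int], float], pi: float, m: int = 100) -> float:
    """Sketch: E[bias] = sum_k P(K = k) * E[bias | k], with K ~ Binomial(m, pi)."""
    return sum(
        comb(m, k) * pi**k * (1.0 - pi) ** (m - k) * mean_bias_for_k(k)
        for k in range(m + 1)
    )


def linear_mean_bias(k: int, c: float = 2.08, m: int = 100) -> float:
    """Hypothetical per-k mean bias (percentage points) that grows linearly in k."""
    return c * k / m


bias_at_0_1 = expected_bias(linear_mean_bias, pi=0.1)  # approximately 0.208
```

Replacing `linear_mean_bias` with per-k averages from a simulation would give the general case, where linear scaling in $\pi$ is an empirical finding rather than a built-in property.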
  • Example 12 Simulations
  • $n_1/2$ cases and $n_1/2$ controls were specified, with $n_1 \in \{100, 200, 500\}$.
  • Methylation profiles were generated from a white blood cell population of 7% B cells, 62% granulocytes, 6% monocytes, 2% NK cells, and 13% T cells (of which 65% were CD4+ cells and 35% were CD8+ cells), with the remaining 5% unspecified (and assumed to have a mean equal to that of the unsorted T-lymphocytes).
  • Residual effects for controls were set equal to 0.1 times the estimated intercept, and residual effects for cases were set equal to the control residual effects plus 10 times the column of U corresponding to case status.
  • The constants of proportionality 0.1, 0.08, and 0.09 were chosen to correspond to assumed contributions of the residual effects to an overall methylation signature presumed to be dominated by profiled populations of white blood cells in specified proportions, with 0.08 used for the strong alternatives and 0.09 used for the mixed alternative.
  • The constant 10 was used to amplify the scale of the case-specific residual effect so that its effect could be detected in simulation; note that U was orthogonal to the white blood cell profiles by construction.
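  • One simulated group can be sketched as a weighted combination of purified profiles plus a residual effect and noise. This minimal Python sketch assumes the mixture weights stated above (with the T-cell fraction split 65/35 between CD4+ and CD8+ cells), Gaussian measurement noise, and clipping to the [0, 1] beta-value range; the profile dictionary, residual vector, and noise level are hypothetical inputs.

```python
import numpy as np

# Mixture weights from the simulation description (fractions of leukocytes).
WEIGHTS = {
    "B": 0.07,
    "granulocyte": 0.62,
    "monocyte": 0.06,
    "NK": 0.02,
    "CD4_T": 0.13 * 0.65,
    "CD8_T": 0.13 * 0.35,
    "unspecified_T": 0.05,  # assumed to share the unsorted T-lymphocyte mean
}


def simulate_group(profiles: dict, residual: np.ndarray, n: int,
                   noise_sd: float, rng: np.random.Generator) -> np.ndarray:
    """Sketch: m x n matrix of simulated beta values for one group.

    profiles maps each key of WEIGHTS to a length-m mean methylation vector;
    residual is the group-specific residual effect (length m).
    """
    mean = sum(w * profiles[cell] for cell, w in WEIGHTS.items()) + residual
    draws = mean[:, None] + rng.normal(0.0, noise_sd, size=(mean.size, n))
    return np.clip(draws, 0.0, 1.0)
```

Cases and controls would differ only through the `residual` argument, which keeps the white blood cell mixture identical between groups while a non-cell-mediated signal is injected.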
  • Estimates were biased towards the null, on the order of about a percentage point.
  • Table 15 were purchased from AllCells®, LLC (Emeryville, CA). These leukocytes were sorted in a column with antibody-conjugated magnetic beads using a combination of positive and negative selection. Genomic DNA from the leukocytes was extracted according to
  • the array was fluorescently stained, scanned, and fluorescent intensities of each of the unmethylated and methylated bead types were measured.
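  • The ratio of fluorescent signals is computed from both alleles using the following equation: $\beta = \max(M, 0) / (|U| + |M| + 100)$, where M and U denote the methylated and unmethylated signal intensities.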
  • The β-value is a continuous variable ranging from 0 (unmethylated) to 1 (completely methylated) that represents the methylation at each CpG site and is used in subsequent statistical analyses.
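  • The β-value calculation can be sketched in Python, following the form $\beta = \max(M, 0) / (|U| + |M| + 100)$. The constant 100 keeps probes with very low total intensity from producing unstable ratios; the intensities in the example are hypothetical.

```python
import numpy as np


def beta_values(methylated, unmethylated, offset: float = 100.0) -> np.ndarray:
    """Sketch: beta value per CpG from methylated (M) and unmethylated (U) intensities."""
    M = np.asarray(methylated, dtype=float)
    U = np.asarray(unmethylated, dtype=float)
    return np.maximum(M, 0.0) / (np.abs(U) + np.abs(M) + offset)


# Hypothetical intensities; gives beta values of 0.9, 0.0 and 0.495.
example = beta_values([900.0, 0.0, 4950.0], [0.0, 900.0, 4950.0])
```

Because of the offset, a probe with equal methylated and unmethylated signal yields a β-value slightly below 0.5.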
  • Data were assembled with BeadStudio methylation software from Illumina, Inc. (San Diego, CA). Bibikova, M., et al., Epigenomics 1, 177-200 (2009).
  • A comparison of methylation in sorted normal human immune cells produced distinct profiles of methylation markers for further consideration.
  • DNA methylation profiles distinguished lymphocytes from myeloid-derived leukocytes.
  • RPMM Recursively partitioned mixture model
  • Candidate DNA regions with high potential to discriminate CD3+ T cells from non-T cells were chosen based on the criteria of being differentially demethylated and differentially overexpressed in CD3+ T cells compared with other cell types (monocytes, granulocytes, NK cells, and B cells). Two quantitative methylation methods, bisulfite pyrosequencing and MS-qPCR, were used to confirm array methylation.
  • Example 14 Patient characteristics and biological samples for determining CD3+ T cell distribution in glioma cases and controls
  • Pertinent data for this analysis included age at histological diagnosis, gender, vital status, and survival time between diagnosis date and date of death for those deceased or between diagnosis date and date of last contact for those alive, and any of cigarette smoking history and exposure to steroids, chemotherapy and radiation therapy.
  • Tumor samples were defined as secondary GBM if the patients had prior histological diagnosis of a low-grade glioma. All ages are given at the time of surgery, which occurred at UCSF between 1990 and 2003.
  • This tumor set contained the following histological subtypes: 2 pilocytic astrocytoma (PA), 15 ependymoma grade II (EPII), 20 oligodendroglioma grade II (ODII), 16 oligoastroglioma grade II (OAII), 3 oligoastroglioma grade III (OAIII), 23 astrocytoma grade II (ASII), 4 astrocytoma grade III (ASIII) and 37 astrocytoma grade IV, also called glioblastoma multiforme grade IV (GBM), ten of which were recurrent and five of which were secondary.
  • PA pilocytic astrocytoma
  • EPII ependymoma grade II
  • ODII oligodendroglioma grade II
  • OAII oligoastroglioma grade II
  • OAIII oligoastroglioma grade III
  • ASII astrocytoma grade II
  • Sorted, normal, human, peripheral blood leukocyte subtypes were isolated from different non-diseased individuals' whole blood by MACS using a combination of negative and positive selection with highly specific cell surface antibodies conjugated to magnetic beads. The purity of separated cells was determined with flow cytometry to be >97%.
  • Example 15 Bisulfite pyrosequencing and MS-qPCR assays for validating CD3Z, CD3E and FOXP3 specific DMRs
  • the latter procedure mimics the conditions of detection that exist in differentiating different relative numbers of CD3+ T cells and Tregs within a mixture of cells in a complex biological sample.
  • For CD3Z, the four-point standard curve used 10,000, 1,000, 100, and 10 bisulfite-converted CD3+ T cell DNA copies; absolute quantification of FOXP3 used 5,000, 500, 50, and 5 bisulfite-converted Treg cell DNA copies.
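  • Absolute quantification against such a standard curve can be sketched as a linear fit of threshold cycle (Ct) on log10 copy number, followed by inversion for unknown samples. The Ct values below are synthetic, generated from an assumed ideal curve (intercept 38, slope $-1/\log_{10} 2 \approx -3.32$, i.e. 100% amplification efficiency); they are not measurements from this study, and the function names are hypothetical.

```python
import numpy as np


def fit_standard_curve(copies, ct):
    """Sketch: fit Ct = slope * log10(copies) + intercept; also return efficiency."""
    slope, intercept = np.polyfit(np.log10(np.asarray(copies, dtype=float)), ct, 1)
    efficiency = 10.0 ** (-1.0 / slope) - 1.0  # 1.0 means perfect doubling per cycle
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copy number for unknown samples."""
    return 10.0 ** ((np.asarray(ct, dtype=float) - intercept) / slope)


standards = np.array([10_000, 1_000, 100, 10])  # CD3Z four-point standard (copies)
ideal_slope = -1.0 / np.log10(2.0)              # assumed ideal curve
ct_standards = 38.0 + ideal_slope * np.log10(standards)
slope, intercept, eff = fit_standard_curve(standards, ct_standards)
unknown = copies_from_ct(38.0 + ideal_slope * np.log10(500.0), slope, intercept)
```

With real standards, the fitted efficiency typically deviates from 100%, and inverting the fitted curve rather than an ideal one absorbs that deviation.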
  • CD3Z Probe 6FAM CCAACCACCACTACCTCAA (MGB,NFQ) (SEQ ID NO: 102)
  • For the CD3E-specific DMR, the DNA methylation status of the DMR in the CD3E gene was measured by pyrosequencing bisulfite-converted DNA from sorted human peripheral blood leukocytes (Figure 10 panel A).
  • For the CD3Z-specific DMR, the DNA methylation status of the DMR in the CD3Z gene was measured by MethyLight® qPCR of bisulfite-converted DNA from sorted human peripheral blood leukocytes (Figure 10 panel B).
  • The genomic region containing the CD3Z DMR is shown in Figure 11.
  • Standard calibration curves were used to determine if the newly identified CD3Z DMR was useful to quantify CD3+ T cells, Tregs (FOXP3 demethylated) and ratios of Tregs/CD3+ T cells in biological specimens such as whole or separated blood or other tissues.
  • Quantitative real-time methylation-specific PCR was performed. DNA isolated from purified cell types was bisulfite converted and serially diluted into a background of fully methylated commercial DNA standard (Qiagen). This method is referred to herein as the "CS-DM assay".
  • Example 16 Flow cytometry of blood lymphocytes in whole blood for quantification of CD3+ T cells
  • Cells were labeled by direct staining with the appropriate fluorochrome-conjugated antibodies (eBioscience Inc, San Diego, CA) and incubated for 20 minutes in the dark at 4 °C: CD3-fluorescein isothiocyanate (FITC, cat #11-0038-41), anti-CD4-allophycocyanin (APC, cat #17-0048-41), anti-CD8-phycoerythrin (PE, cat #12-0086-41), and anti-CD45-PerCP-Cy5.5 (cat #45-0459-41). Isotype control mAbs were used as negative controls.
  • FITC CD3-fluorescein isothiocyanate
  • APC anti-CD4-allophycocyanin
  • PE anti-CD8-phycoerythrin
  • PerCP-Cy5.5 anti-CD45-PerCP-Cy5.5
  • Example 17 Tumor immunohistochemistry (IHC) for measuring levels of tumor infiltrating lymphocytes (TIL) in glioma tumors
  • Example 18 Statistical analysis of differential methylation in CD3+ T cells for identification of cell-specific DMRs
  • MACS-sorted leukocyte DNA methylation data, consisting of un-normalized average beta values from the Illumina HumanMethylation27 microarray, were calculated from probe intensities using Illumina GenomeStudio.
  • Locus-by-locus comparisons of DNA methylation between the sorted cell types were performed using a linear mixed effects model (controlling for beadchip) in SAS version 9.2, generating estimates and p-values for differential methylation in CD3+ T cells compared with other cell types.
  • Resultant p-values were adjusted for multiple comparisons using the qvalue package in the R project for statistical computing, version 2.13 (available for download from the internet), and q-values of less than 0.05 were considered significant. All correlations, F-tests, Wilcoxon rank sum tests, and Kruskal-Wallis one-way analyses of variance by ranks were carried out in R version 2.11.1, and survival analysis was performed using the survival package in R version 2.11.1.
  • Example 19 Discovery and validation of CD3Z demethylation as a marker of CD3+ T cells
  • the search for genes containing DMRs specific for CD3+ T cells using methods herein revealed candidate CpG sites within the genes encoding several components of the T cell receptor (TCR) complex; namely, CD3D, CD3E, CD3G, and CD3Z.
  • TCR T cell receptor
  • Myeloid derived blood cells (granulocytes, neutrophils, monocytes) and B-lymphocytes contained methylated CpG sites within CD3D, CD3E, CD3G and CD3Z loci compared with T cells, which were demethylated.
  • CD3Z was also unmethylated in CD16+ NK cells, but was methylated in CD16- NK cells.
  • CD3D, CD3E and CD3G genes are CpG sparse compared with CD3Z, which contains a CpG island that is optimally suited for designing MS-qPCR assays (Fig. 1 panel A).
  • CD3Z locus was analyzed for the development of a CD3+ T cell epigenetic marker.
  • Pyrosequencing of CD3Z showed the extent of differences in demethylation among immune cell lineages, which approaches complete demethylation in CD3+ T cells and nearly complete methylation in other cell lineages ( Figure 20 panels A-B).
  • Example 20 Determination of T cell and Treg levels in peripheral blood by CD3Z and FOXP3 MS-qPCR assays in glioma cases and controls
  • Use of dexamethasone or chemotherapy was not associated with T cell measures.
  • Non-GBM (n = 6) 18.5 (3.5-26.6) 0.9 (0.2-1.6) 6.0 (3.8-7.1)
  • Example 21 Determination of T cell and Treg levels in tumor infiltrates by CD3Z and FOXP3 MS-qPCR assays in excised glioma tumors.
  • Tregs as measured by CD3Z demethylation assays were divided into two groups. In each panel the top trace represents survival data of the group of patients for whom the measured variable (methylation status of CD3+ T cells, or of Tregs, or a ratio Tregs/T cells) was below the median observed for that variable, and the bottom trace represents survival data of the group of patients for whom the measured variable was above the median observed for that variable.
  • the measured variable methylation status of CD3+ T cells, or of Tregs, or a ratio Tregs/T cells
  • Example 23 Cells, and cancer patient and control datasets for determining DNA methylation based epigenetic signatures for differentiating patients and controls
  • Sorted, normal, human peripheral blood leukocyte subtypes were isolated from whole blood by magnetic activated cell sorting (MACS) (AllCells LLC, Emeryville, CA). The purity of separated cells was confirmed with flow cytometry to be >97%. Genomic DNA was extracted and purified from cell pellets using a commercially available method (Qiagen, Valencia, CA), treated with sodium bisulfite (Zymo Research, Irvine, CA), and subjected to methylation profiling using the Infinium HumanMethylation27 BeadArray (Illumina, San Diego, CA). This same platform was used for the analysis of samples from the case-control studies described below.
  • MACS magnetic activated cell sorting
  • The HNSCC data set (Table 19) consists of 92 incident cases from the greater Boston area and 92 cancer-free population-based control subjects from the same region (Applebaum
  • PLoS One 4:e8274, 2009 is publicly available from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/, Accession number GSE19711), and consists of 266 postmenopausal women diagnosed with primary epithelial ovarian cancer (131 pre-treatment and 135 post-treatment cases) from the UK Ovarian Cancer Population Study (UKOPS).
  • GEO Gene Expression Omnibus
  • Controls were cancer-free postmenopausal women for which annual serum samples were available. To avoid potential biases due to therapy, only pre-treatment ovarian cases were included in the analysis.
  • The bladder cancer data set (Marsit CJ et al., 2011, J Clin Oncol 29: 1133-1139) consists of 223 incident bladder cancer cases identified from the New Hampshire state cancer registry and 237 population controls from the same region (Karagas MR et al., 1998, Environ Health Perspect 106: 1047-1050; Wallace K et al., 2009, Cancer Prev Res 2: 70-73). Table 20 provides a summary of the participant characteristics.
  • Table 20 Characteristics of the study population in the Bladder cancer data set.
  • Example 24 Statistical analysis of differences in methylation status in leucocyte subsets for determining signatures based on leukocyte DMRs
  • the analytic strategy was aimed toward examining the extent to which peripheral blood DNA methylation of non-hematopoietic cancers is driven by the epigenetic signatures that define leukocyte subtypes.
  • Linear mixed-effects models were used to assess differences in methylation across the leukocyte subtypes and controlled for the large number of comparisons using false discovery rate (fdr) estimation.
  • Leukocyte DMRs were subsequently ranked based on their strength of association and the highest ranking 50 DMRs were examined across the three cancer data sets between cancer cases and cancer-free controls.
  • Example 25 Prediction of methylation class membership based on epigenetic signatures from leukocyte derived DMRs
  • Genome-wide DNA methylation was profiled in 46 samples of magnetic antibody sorted, normal human peripheral blood leukocyte subtypes (including B cells, granulocytes, monocytes, NK-cells, CD4+ T cells, CD8+ T cells, and Pan-T cells; Figure 28) using the Infinium
  • DMRs (M) for subsequent clustering analysis of the training sets.
  • From the 10,370 putative DMRs initially identified, the respective cross-validation procedures selected the 50, 10, and 56 highest-ranking leukocyte DMRs, which were used to cluster the observations in the HNSCC, ovarian cancer, and bladder cancer training sets, respectively.
  • The resulting clustering solutions were used to predict methylation class membership for the subjects in the corresponding independent testing sets.
  • Panel A of Figures 24, 25 and 26 depicts heat maps of the respective testing sets, grouped by predicted methylation class, for each cancer data set.
  • Methylation classes derived from leukocyte subtype DMRs were significantly associated with case status within each cancer type (permutation χ² p-values < 0.0001, < 0.0001 and 0.03 for the HNSCC, ovarian cancer, and bladder cancer data sets, respectively), supporting the phenotypic relevance of methylation classes predicted from leukocyte DMRs.
  • OR 1.94 95% CI [0.95, 3.98], adjusted for age, gender, smoking and family history of bladder cancer.
  • Mean delta-beta refers to the difference in mean methylation between cancer cases and controls (i.e., cases minus controls).
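The permutation χ² test of association between predicted methylation class and case status can be sketched as follows. This is a minimal illustration. It assumes that predicted classes and case status are available as label arrays. The helper names (`chi2_stat`, `crosstab`, `permutation_chi2_pvalue`) and the number of permutations are hypothetical and are not taken from the source.

```python
import numpy as np

def chi2_stat(table):
    """Pearson chi-square statistic for a contingency table (rows: classes, cols: status)."""
    table = np.asarray(table, dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    mask = expected > 0  # skip empty cells to avoid division by zero
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())

def crosstab(classes, status):
    """Counts of subjects in each (methylation class, case status) cell."""
    class_levels = np.unique(classes)
    status_levels = np.unique(status)
    return np.array([[np.sum((classes == c) & (status == s)) for s in status_levels]
                     for c in class_levels])

def permutation_chi2_pvalue(classes, status, n_perm=10000, seed=0):
    """Shuffle case status, recompute chi-square, and compare with the observed value."""
    rng = np.random.default_rng(seed)
    classes = np.asarray(classes)
    status = np.asarray(status)
    observed = chi2_stat(crosstab(classes, status))
    exceed = 0
    for _ in range(n_perm):
        if chi2_stat(crosstab(classes, rng.permutation(status))) >= observed:
            exceed += 1
    # add-one correction so the permutation p-value is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only the case labels are shuffled, the class sizes and the case/control totals stay fixed in every permutation. The null distribution therefore reflects only the pairing between class and status.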
  • Example 26 Statistical analysis of methylation differences in leukocyte DMRs between cancer cases and cancer-free controls for determining epigenetic signatures specific to each group
  • Linear mixed-effects models were used to assess differences in methylation across the leukocyte subtypes. Each model used arcsine square-root transformed methylation as the response, leukocyte subtype as a fixed-effect covariate, and plate/BeadChip as a random effect.
  • False discovery rate (FDR) estimation was used to control for the large number of comparisons, and putative leukocyte DMRs were defined as those with an FDR q-value < 0.05. Leukocyte DMRs were then ranked by strength of association using the F-statistics from the respective linear mixed-effects models.
  • Methylation differences at the 50 highest-ranking leukocyte DMRs were examined between cancer cases and cancer-free controls using a series of unconditional logistic regression models adjusted for the available, relevant covariates.
  • A leukocyte DMR was considered differentially methylated if the nominal p-value from the unconditional logistic regression model was less than 0.05.
  • Permutation tests were then applied to each of the three data sets to determine whether the number of differentially methylated leukocyte DMRs was significantly greater than expected by chance. Specifically, samples were randomly permuted (with the same permutation applied across the 50 highest-ranking DMRs) and an unconditional logistic regression model was fit to the resampled data.
  • The leukocyte DMR profile analysis involved splitting each full cancer data set into equally sized training and testing sets (Figures 29-32). Samples in the training set were clustered using the M highest-ranking leukocyte DMRs, where M was determined from the total pool of putative DMRs using the previously described cross-validation procedure (Sincic N and Herceg Z, 2011, Curr Opin Oncol 23:69-76).
  • RPMM Recursively Partitioned Mixture Model
  • SS-RPMM semi-supervised RPMM
  • Methylation analysis was performed using the Infinium® HumanMethylation27 BeadChip microarray (Illumina Inc., San Diego, CA), which quantifies the methylation status of 27,578 CpG loci from 14,495 genes with 15- to 18-fold redundancy.
  • The resulting β-value is a continuous variable ranging from 0 (unmethylated) to 1 (completely methylated). It represents the methylation at each CpG site and is used in subsequent statistical analyses. Data were assembled with the methylation module of GenomeStudio software (Illumina Inc., San Diego, CA; Bibikova M et al., 2009, Epigenomics 1:177-200).
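The screening steps described for Example 26 can be sketched as below: transforming β-values, ranking CpGs by an F-statistic across leukocyte subtypes, and controlling the FDR. This is a simplified illustration that rests on several assumptions:

- The β-value formula with an offset of 100 is the commonly used Illumina definition. It is not stated in this document.
- The F-statistic is an ordinary one-way ANOVA. It omits the plate/BeadChip random effect of the linear mixed-effects model.
- The q-values use the Benjamini-Hochberg procedure, which may differ from the FDR estimator actually used.

All function names are hypothetical.

```python
import numpy as np

def beta_values(meth, unmeth, offset=100.0):
    """Illumina-style beta value M / (M + U + offset), negative intensities floored at 0."""
    m = np.maximum(meth, 0.0)
    u = np.maximum(unmeth, 0.0)
    return m / (m + u + offset)

def arcsine_sqrt(beta):
    """Variance-stabilizing arcsine square-root transform of values in [0, 1]."""
    return np.arcsin(np.sqrt(np.clip(beta, 0.0, 1.0)))

def one_way_f(y, groups):
    """One-way ANOVA F-statistic per CpG (rows of y) across cell-type groups (columns).
    Simplification: ignores the plate/BeadChip random effect used in the source."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    n, k = y.shape[1], len(levels)
    grand = y.mean(axis=1, keepdims=True)
    ss_between = np.zeros(y.shape[0])
    ss_within = np.zeros(y.shape[0])
    for g in levels:
        yg = y[:, groups == g]
        mg = yg.mean(axis=1, keepdims=True)
        ss_between += yg.shape[1] * ((mg - grand) ** 2).ravel()
        ss_within += ((yg - mg) ** 2).sum(axis=1)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

CpGs can then be ranked with `np.argsort(-F)`. P-values for the F-statistics could be obtained from the F distribution with (k - 1, n - k) degrees of freedom, for example with `scipy.stats.f.sf`, before passing them to `bh_qvalues`.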
  • Example 28 Validation of DNA methylation microarray results for identifying NK cell-specific DMRs by pyrosequencing
  • Pyrosequencing assays to validate the microarray results were designed using PyroMark Assay Design 2.0 (Qiagen Inc., Valencia, CA) and carried out on a PyroMark MD pyrosequencer running PyroMark qCpG 1.1.11 software (Qiagen Inc., Valencia, CA).
  • Oligonucleotide primers were obtained from Life Technologies™ (Grand Island, NY).
  • Example 29 Protein expression analysis by mRNA expression array for identifying NK cell-specific DMRs
  • The Whole-Genome DASL HT Assay Kit (Illumina Inc., San Diego, CA) was used to obtain simultaneous profiles of more than 29,000 mRNA transcripts. Data were assembled with the expression module of GenomeStudio software (Illumina Inc., San Diego, CA). The mRNA expression array data were used in combination with the DNA methylation array data to identify NK cell-specific DNA methylation.
  • Example 30 Methylation specific quantitative polymerase chain reaction (MS-qPCR) analysis for quantification of NKp46 demethylation
  • MS-qPCR was performed using solutions and conditions according to Campan M et al., 2009, Methods Mol Biol, 507:325-37 with the following modifications.
  • A solution of 10X TaqMan® Stabilizer containing 0.1% Tween-20 and 0.5% gelatin was prepared weekly. Each 20 µl reaction contained 5 µl DNA, 11.9 µl preMix, 3 µl oligoMix, and 0.1 µl Taq DNA polymerase. Cycling was performed on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA): a 10 min preheat at 95 °C followed by 50 cycles of 95 °C for 15 sec and 60 °C for 1 min. All samples were run in triplicate using the absolute quantification method.
  • NKp46 reverse primer CCCATTCCCCTTCCACA (SEQ ID NO: 117)
  • NKp46 probe (6FAM) CTCACCAACACAAAACAA (MGB, NFQ) (SEQ ID NO: 118)
  • C-less forward primer TTGTATGTATGTGAGTGTGGGAGAGA (SEQ ID NO: 97)
  • To calculate copy number, a conversion factor of 6.6 picograms (pg) of DNA per diploid human cell (3.3 pg per copy) was used.
  • Normal human blood DNA quantified by UV absorption (NanoDrop, Inc.) was used to generate a four-point standard curve of 30,000, 3,000, 300 and 30 copies of genomic DNA. This standard curve was included on each sample plate to obtain
  • NK cell DNA of known copy number was spiked into universal methylated DNA in ratios that kept the total number of DNA copies constant (10,000 copies per reaction) across the dilution scheme. This mimics conditions for detecting different relative numbers of NK cells within a complex mixture of cells in a biological sample.
  • The four-point standard curve used 10,000, 1,000, 100 and 10 copies of bisulfite-converted NK cell DNA.
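The copy-number arithmetic and standard-curve quantification in Example 30 can be sketched as follows. The 3.3 pg per copy conversion comes from the text. Fitting Ct against log10(copies) by least squares, and deriving efficiency from the slope, are the conventional approach to absolute quantification; they are assumed here rather than stated in the source. The function names are hypothetical.

```python
import numpy as np

PG_PER_COPY = 3.3  # from the text: 6.6 pg per diploid cell, i.e. 3.3 pg per copy

def copies_from_mass(dna_pg):
    """Convert a DNA mass in picograms to genome copies."""
    return dna_pg / PG_PER_COPY

def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    slope, intercept = np.polyfit(np.log10(copies), ct, 1)
    efficiency = 10 ** (-1.0 / slope) - 1.0  # amplification efficiency implied by the slope
    return slope, intercept, efficiency

def copies_from_ct(ct, slope, intercept):
    """Absolute quantification: invert the standard curve for unknown samples."""
    return 10 ** ((np.asarray(ct) - intercept) / slope)
```

For example, 1 ng (1,000 pg) of DNA corresponds to about 303 copies. A slope near -3.32 on the 30,000 to 30 copy curve implies roughly 100% amplification efficiency.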
  • Example 31 Statistical modeling of the DNA methylation microarray data for estimation of differential methylation
  • NK cells and non-NK cells, which included pan-T lymphocytes, CD4+ T lymphocytes, Tregs, CD8+ T lymphocytes, B lymphocytes, granulocytes and monocytes.
  • Example 32 Statistical modeling of the RNA expression array for estimation of differential RNA expression
  • RNA expression of MACS-isolated NK cells was compared with each of the following MACS-isolated leukocytes: pan-T lymphocytes, CD4+ T lymphocytes, Tregs, CD8+ T lymphocytes, B lymphocytes, granulocytes and monocytes.
  • NKp46 demethylation is a biomarker of NK cells
  • Example 35 Samples from HNSCC patients have diminished circulating NK cells
  • The calibrated NKp46 MS-qPCR assay was used to measure the level of circulating NK cells in the peripheral blood of patients with HNSCC and of cancer-free controls.
  • NKp46 MS-qPCR measurements from cancer-free control blood samples were used to determine cutoffs for NKp46 demethylation tertiles.
  • The proportion of HNSCC cases decreased significantly with increasing demethylation tertile (p < 0.001, Figure 37), indicating that HNSCC patients are more likely to have depressed levels of NK cells in their peripheral blood.
  • Multivariate logistic regression controlling for age, gender, cigarette smoking, alcohol consumption, BMI, and HPV16 (E6 and/or E7) serology confirmed increased HNSCC risk for individuals in the lower two NKp46 demethylation tertiles defined in normal controls (Table 25). This indicates that lower levels of NK cells in the peripheral blood are significantly associated with HNSCC.
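Deriving tertile cutoffs from controls and tabulating the proportion of cases per tertile can be sketched as below. The sketch assumes the cutoffs are the empirical 1/3 and 2/3 quantiles of the control measurements, using NumPy's default linear interpolation. The source does not specify the quantile rule, and the covariate-adjusted logistic regression is not shown. Function names are hypothetical.

```python
import numpy as np

def control_tertile_cutoffs(control_values):
    """Tertile cutoffs (1/3 and 2/3 quantiles) computed from cancer-free controls only."""
    return np.quantile(control_values, [1 / 3, 2 / 3])

def assign_tertile(values, cutoffs):
    """Tertile index 0 (lowest), 1 or 2 (highest) for each measurement."""
    # right=True: values equal to a cutoff fall into the lower tertile
    return np.digitize(values, cutoffs, right=True)

def case_proportion_by_tertile(values, is_case, cutoffs):
    """Fraction of subjects who are cases within each tertile (NaN if a tertile is empty)."""
    tertile = assign_tertile(values, cutoffs)
    is_case = np.asarray(is_case, dtype=bool)
    return [float(is_case[tertile == t].mean()) if np.any(tertile == t) else float("nan")
            for t in range(3)]
```

Using control-only cutoffs keeps the tertile boundaries independent of case status. Cases can therefore fall unevenly across tertiles, which is the pattern reported in Figure 37.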
  • Example 36 Application of the methodology to mRNA data
  • The statistical methods described herein for determining changes in the distribution of white blood cells among different subpopulations are applicable to mRNA expression profiles, with the following considerations.
  • A mathematical consideration is that mRNA is typically analyzed on a logarithmic scale, whereas the methods herein assume linearity on an arithmetic scale, since the mixing coefficients are assumed to act linearly on absolute numbers of nucleic acid molecules. The methods would therefore require analysis of untransformed fluorescence intensities, whose skewed distributions would result in numerical instabilities.
  • A biological consideration is the absence of a linear relationship between cell number and mRNA copy number. Proteins may be translated following an initial burst of mRNA transcription during cellular development, after which mRNA is substantially degraded. In contrast, the average beta value provided by Illumina bead-array products, and similarly constructed quantities from other platforms, would be expected to scale in proportion to the actual fraction of methylated nucleic acids. This follows from the biologically reasonable assumption of two DNA molecules per cell.
  • The validation data set S_0 was obtained from Watkins NA et al., 2009, Blood 113:e1-e9, in which the Illumina Human-6 v2 Expression BeadChip was used to characterize the mRNA expression profiles of eight types of blood cells: B cells, granulocytes, erythroblasts, megakaryocytes, monocytes, natural killer cells, CD4+ T cells, and CD8+ T cells.
  • erythroblasts nucleated progenitors of red blood cells
  • megakaryocytes progenitors of platelets
  • The target data set S_1 was obtained from Showe MK et al., 2009, Cancer Res 69:9202-10. In that study, the same mRNA expression platform was used to characterize expression differences in isolated mononuclear cells between non-small cell lung cancer (NSCLC) cases and controls with non-cancer lung disease, adjusting for age, sex and smoking.
  • NSCLC non-small cell lung cancer
  • Data were also presented for 18 cases with matched pre-operative and post-operative samples.
  • The same methodology as for the DNA methylation data sets herein was used, ranking the 46,693 transcripts by F statistic according to their ability to distinguish six types of leukocytes. Of the 100 transcripts with the largest F statistics, 86 overlapped with the transcripts in Showe MK et al., 2009, Cancer Res 69:9202-10.
  • Table 26 presents results from 137 NSCLC cases and 91 controls, adjusted for age, sex, and smoking status.
  • Table 27 presents results from 18 matched pre-operative and post-operative samples from NSCLC cases. The analyzed outcome was the difference in untransformed expression (post-operative minus pre-operative expression), and the coefficients displayed correspond to the intercept (analogous to a paired t-test). Perturbations in T cell distribution were consistent with known immunological changes resulting from NSCLC (Ginns LC et al., 1982, Am Rev Respir Dis 23:265-9; Mazzoccoli G et al., 1999, In Vivo 13:205-9), as well as with age and smoking.
  • Table 27 White blood cell distribution comparing matched pre-operative and post-operative cases in the NSCLC mRNA data set
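The mathematical consideration in Example 36 can be checked numerically: mixing is linear on the arithmetic scale but not on the log scale. The two-feature, two-cell-type profiles and mixing fractions below are hypothetical toy values, not data from the source.

```python
import numpy as np

# Hypothetical reference profiles: rows = features (CpGs or transcripts), columns = cell types.
profiles = np.array([[0.9, 0.1],
                     [0.2, 0.8]])
fractions = np.array([0.3, 0.7])  # hypothetical mixing proportions summing to 1

# Arithmetic scale: the mixture is exactly the fraction-weighted sum of the profiles,
# so the fractions can be recovered by linear least squares.
mixture = profiles @ fractions
recovered = np.linalg.lstsq(profiles, mixture, rcond=None)[0]

# Log scale: log(mixture) differs from the weighted sum of log-profiles
# (Jensen's inequality), so a linear mixing model fitted to log data is misspecified.
log_gap = np.log2(mixture) - np.log2(profiles) @ fractions

print("mixture:", mixture)
print("recovered fractions:", recovered)
print("log-scale gap:", log_gap)
```

The positive log-scale gap shows why the text calls for untransformed intensities when applying a linear mixing model to expression data.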
  • Example 37 An array for high-throughput DNA methylation analysis
  • VeraCode microbeads (Illumina, San Diego, CA, USA)
  • DNA sequences of regions in 96 different genes, each sequence containing one CpG dinucleotide (shown within square brackets in Figure 40) used to determine the methylation status of the gene.
  • VeraCode beads are cylindrical glass microbeads 240 microns long and 28 microns in diameter, with a surface suitable for attaching DNA, RNA, protein, antibodies and other ligands for performing bioassays. For DNA methylation analysis, various CpG-specific DNA oligomers were attached to these beads.
  • Each microbead is inscribed with a high-density 24-bit holographic code, allowing a very large number of distinct bead types.
  • When a laser is shone at the beads, each high-density code produces a code-specific signal that is detected by a CCD camera.
  • The fluorescence of the bead indicates whether the particular CpG site carried by the bead is demethylated.
  • The result is compared with the fluorescence readout obtained from DNA from a purified leukocyte sample.
  • A VeraCode array is a collection of beads distributed into different wells of a microtiter plate. Each bead carries a DNA oligomer specific for either the methylated or the unmethylated form of a particular CpG locus.
  • A user may select all or a subset of the CpG-containing nucleotide sequences in a gene or genes of interest for attachment to VeraCode beads. This yields a custom-designed VeraCode array suited to the user's analysis.
  • The Infinium HumanMethylation27 data for all of the magnetic-activated cell sorting (MACS)-sorted leukocyte DNA samples were assembled in the methylation module of GenomeStudio. Data quality was assessed by calculating Mahalanobis distances, and all 47 samples yielded acceptable data.
  • A matrix of β-values was generated, with rows defined by microarray CpG locus and columns defined by sample identifier.
  • A corresponding matrix indicating cellular phenotypes was also generated. Its rows were defined by sample identifier, in precisely the same order as the columns of the β-value matrix, and its columns defined the cell lineage(s) to which each sample belongs.
  • LME linear mixed effects
  • The fixed-effect groups were: Pan-T cell, CD4+ T cell, CD8+ T cell, Pan-NK cell, CD56dim NK cell, CD56bright NK cell, B cell, granulocyte, neutrophil, eosinophil, and monocyte. Across all gene loci, this model generated coefficients for each fixed-effect group giving relative estimates of DNA methylation for each of the different cell types.
  • Collapsing categories accounted for the hierarchical relationships among cell lineages. A linear transformation was then applied to convert the coefficient estimates to an estimated mean value per cell type. The result was a matrix B_0 of mean values, with each row corresponding to a CpG locus and each column to a cell type.
  • The model also generated an F-statistic for each locus indicating how significantly DNA methylation differed between the cell types.
  • DMRs differentially methylated regions
  • The stochastic search algorithm was designed to maximize the precision of estimated cellular fractions. It assumes that the variance-covariance matrix of the fraction estimates is proportional to (B_0^T B_0)^(-1).
  • To optimize for a single cell type, the corresponding diagonal element of (B_0^T B_0)^(-1) was minimized; to optimize for a set of cell types, the sum of the corresponding diagonal elements was minimized.
  • The general strategy was as follows.
  • The engine is a stochastic search algorithm that starts with an initial set of CpGs as the "current" set. On each iteration, a randomly chosen CpG in the current set is swapped with a randomly chosen CpG from the remaining (unselected) CpGs. Precision is then compared between the current set and the resulting "candidate" set. If the candidate set gives better precision, the swap is accepted; otherwise it is rejected. Ideally, the acceptance rate should have fallen to 0% by the end of the algorithm.
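The swap-based search described above can be sketched as follows. This is a minimal illustration with two assumptions. First, B is a CpG-by-cell-type matrix of mean methylation values (B_0 in the text). Second, precision is measured by the sum of the relevant diagonal elements of (B^T B)^(-1) over the selected CpGs. The function names, the random initial subset and the window used to report a late acceptance rate are hypothetical choices.

```python
import numpy as np

def precision_loss(B, rows, cell_types):
    """Sum of diagonal elements of (B_S^T B_S)^-1 for the target cell types, where B_S is
    the reference matrix B (CpGs x cell types) restricted to the selected rows."""
    Bs = B[rows, :]
    cov = np.linalg.inv(Bs.T @ Bs)
    return float(np.trace(cov[np.ix_(cell_types, cell_types)]))

def stochastic_cpg_search(B, n_select, cell_types, n_iter=5000, seed=0):
    """Swap-based stochastic search; returns (selected rows, loss, late acceptance rate)."""
    rng = np.random.default_rng(seed)
    n_cpg = B.shape[0]
    current = rng.choice(n_cpg, size=n_select, replace=False)  # initial "current" set
    current_loss = precision_loss(B, current, cell_types)
    accepted = []
    for _ in range(n_iter):
        unselected = np.setdiff1d(np.arange(n_cpg), current)
        candidate = current.copy()
        # swap one randomly chosen selected CpG for one randomly chosen unselected CpG
        candidate[rng.integers(n_select)] = rng.choice(unselected)
        cand_loss = precision_loss(B, candidate, cell_types)
        if cand_loss < current_loss:  # smaller variance means better precision: accept
            current, current_loss = candidate, cand_loss
            accepted.append(True)
        else:                         # otherwise reject and keep the current set
            accepted.append(False)
    window = max(1, n_iter // 10)
    late_rate = float(np.mean(accepted[-window:]))  # should approach 0 near convergence
    return np.sort(current), current_loss, late_rate
```

Because only improving swaps are accepted, the loss never increases. A late acceptance rate near zero suggests that no single swap improves the current set any further.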
  • Most epigenome-wide association scans (EWAS) have attempted to estimate the marginal effect (⁇, depicted in Figure 41, panel A) of an exposure or phenotype on measured DNA methylation, i.e., an effect not adjusted for WBC distribution.
  • A significant portion of the effect on DNA methylation is mediated through changes in WBC distribution, as shown in Figure 41, panel B.
  • The direct effect, adjusted for WBC distribution, is α.
  • The effect of exposure or phenotype on WBC distribution
  • ⁇ , ⁇ + u_i, where u_i is a zero-mean error vector.
  • α is a p × 1 vector
  • K cell types are assumed, so that ω_i is a K × 1 vector, ⁇ is a K × p matrix, and ⁇ is a K × 1 vector.
  • Statistical inference is achieved by permutation. The null distributions of α and ⁇ are obtained by permuting the exposure or phenotype of interest within z, using only the components representing the covariate to be tested. The null distribution of ⁇ is obtained by permuting the subject assignments corresponding to ω_i. Multiple comparisons are adjusted for by nesting, within each permutation, a loop that estimates the coefficients for each individual CpG. Adjusted p-values are then obtained by comparing the maximum absolute values of α_j and the other CpG-specific coefficients (over all CpGs) with the corresponding statistics computed from each individual permutation. For comparison, a similar permutation test can be applied to the marginal coefficient ⁇.
  • Table 28 shows the multiple-comparisons-adjusted p-values for each coefficient corresponding to the covariate of interest (⁇, α, ⁇) and for the overall effect of WBC distribution on DNA methylation (⁇), obtained by permutation test using 5000 permutations.
  • The significance of α may be greater than, less than, or equal to the significance of ⁇.
  • The covariate of interest shows a strongly significant association with WBC distribution, and WBC distribution shows a significant overall association with DNA methylation.
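The max-statistic adjustment for multiple comparisons described above can be sketched for a single coefficient. This simplified illustration assumes each CpG's coefficient is an ordinary least-squares slope of methylation on the covariate, so it corresponds to a permutation test of the marginal coefficient. The per-CpG model fit of the source (α, the WBC coefficients, and so on) is not reproduced. Only the covariate is permuted, and the function names and permutation count are hypothetical.

```python
import numpy as np

def slopes(Y, x):
    """Per-CpG least-squares slope of methylation (rows of Y) on covariate x."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=1, keepdims=True)
    return Yc @ xc / (xc @ xc)

def maxt_adjusted_pvalues(Y, x, n_perm=1000, seed=0):
    """Family-wise adjusted p-values: each observed |coefficient| is compared with the
    permutation distribution of the maximum |coefficient| over all CpGs."""
    rng = np.random.default_rng(seed)
    observed = np.abs(slopes(Y, x))
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        max_null[b] = np.abs(slopes(Y, rng.permutation(x))).max()
    # add-one correction keeps the adjusted p-values strictly positive
    return (1 + (max_null[None, :] >= observed[:, None]).sum(axis=1)) / (n_perm + 1)
```

Comparing against the maximum over all CpGs in each permutation controls the family-wise error rate without assuming independence between CpGs.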
  • Example 39 Comparison of methods herein for estimating fractions of blood cell types with non-negative matrix factorization (NNMF)
  • ω_i are the fractions of each blood cell type corresponding to subject i
  • b_ℓ is the vector of methylation fractions corresponding to blood cell type ℓ. The methods herein provide techniques for estimating the fractions ω_i, assuming the values of b_ℓ have been obtained from an external validation data set.
  • Non-negative matrix factorization (NNMF) could be used to estimate ω_i and b_ℓ simultaneously in the absence of an external validation set.
  • The vectors ω_i are considered "factors" and the vectors b_ℓ (assumed to represent individual methylation profiles) are considered "basis vectors"; the number of factors, d_0, must be provided to the NNMF algorithm.
  • Example 5 NNMF was compared with the methods herein (Examples 1-3). The 100 and 500 highest-ranking pseudo-DMRs were selected on the basis of informativeness as in Example 4. For each choice, the constrained projection described in Examples 1 and 5 was used to impute specific cell distributions. NNMF was then performed assuming four, five, and six factors (i.e., factor values assumed to represent the fractions ω_iℓ for one cell type ℓ). The nmf function in the R package NMF was used with default settings. Because NNMF requires random starting values, it was applied 100 times, each with different randomly generated starting values according to the default settings of the nmf function.
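The two approaches contrasted above can be sketched as below: projecting onto known reference profiles b_ℓ versus estimating those profiles by NNMF. Both functions are illustrative stand-ins, not the methods of Examples 1-5 or the R NMF package. The projection enforces only non-negativity, via projected gradient descent. The factorization uses Lee-Seung multiplicative updates for the Frobenius loss, which is not necessarily the algorithm the nmf function uses by default. Function names are hypothetical.

```python
import numpy as np

def project_fractions(y, B, n_iter=5000):
    """Reference-based estimate: non-negative w minimizing ||y - B w||^2, with B's columns
    the known profiles b_l. Assumption: only w >= 0 is enforced (no sum constraint)."""
    step = 1.0 / np.linalg.eigvalsh(B.T @ B).max()  # 1/L, L = Lipschitz constant of the gradient
    w = np.full(B.shape[1], 1.0 / B.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - step * (B.T @ (B @ w - y)), 0.0)  # gradient step, then project
    return w

def nmf(Y, d0, n_iter=500, seed=0, eps=1e-12):
    """Reference-free estimate: Y (CpGs x subjects) ~ W (basis vectors) @ H (factors),
    fitted with Lee-Seung multiplicative updates from random non-negative starting values."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], d0)) + eps
    H = rng.random((d0, Y.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The projection needs an external reference matrix but no random starts. NNMF needs neither a reference nor labels, but it must be told d_0 and its solution depends on the random initialization. This is why the text reports repeating it 100 times.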
  • Example 40 Quantitation of T cell, Treg and CD16+CD56dim NK cell numbers by CD3Z.
  • A droplet digital PCR technique was used to quantitate T cell, Treg and CD16+CD56dim NK cell numbers using the CD3Z, FoxP3 and NKp46 methylation assays described in Examples 15 and 30.
  • Digital PCR is a refinement of conventional PCR methods and is used to directly quantify and clonally amplify nucleic acids.
  • dPCR and conventional PCR differ in how nucleic acid amounts are measured, and dPCR is the more precise of the two.
  • In dPCR, the sample is separated into a large number of partitions and the reaction is carried out in each partition individually. This partitioning allows more reliable and sensitive measurement of nucleic acid amounts.
  • T cells and Tregs were serially diluted, and copies of each of the targets were quantified as measures of cell numbers.
  • Bisulfite-converted DNA from whole blood, from isolated human T cells and Treg cells, and from NK cells was quantified using the emulsion-partitioning Bio-Rad QX100™ Droplet Digital™ PCR (ddPCR™) system. This system partitions each PCR reaction into water-in-oil droplets for high-throughput digital PCR.
  • The QX100 droplet generator partitions samples into 20,000 nanoliter-sized droplets. After PCR on a thermal cycler, the droplets from each sample were streamed in single file through a reader (QX100 droplet reader).
  • Figures 43 and 44 show successful amplification and detection of the CD3Z and FoxP3 DMRs, respectively.
  • Panel A of Figures 43 and 44 shows dot plots distinguishing positive droplets from negative droplets.
  • Panel B of Figures 43 and 44 shows the calculated absolute numbers of positive PCR droplets.
  • Results from dilutions of purified T cell standards show that the measured quantities of CD3Z and FoxP3 DMR copies correspond to the extent of dilution. This supports the validity of dPCR as a detection method for methylation-based assays of immune cell identity.
  • Other partitioning approaches that employ microfluidic manipulation have been developed, and results similar to those obtained herein are expected with such methods.
  • Figure 45 shows quantitation of purified NK cells under different conditions, and Figure 46 shows quantitation of whole blood and of purified leukocyte subsets by measuring the demethylated NKp46 DMR described in Example 30.
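Converting droplet counts into copy numbers is commonly done with a Poisson correction, because a droplet may receive more than one template. The sketch below assumes that standard model. The source does not state the exact calculation used by the QX100 software. Converting copies to a concentration would also require the droplet volume, which is not given here. The function name is hypothetical.

```python
import math

def ddpcr_copies(positive, total):
    """Poisson-corrected estimate of target copies among `total` accepted droplets.

    The mean number of copies per droplet is lambda = -ln(1 - positive / total),
    estimated from the fraction of negative droplets.
    """
    if not 0 <= positive <= total or total == 0:
        raise ValueError("need 0 <= positive <= total and total > 0")
    if positive == total:
        raise ValueError("all droplets positive: sample too concentrated to quantify")
    lam = -math.log1p(-positive / total)
    return lam * total
```

For example, if half of 20,000 droplets are positive, the estimate is 20,000 × ln 2, or about 13,863 copies. This exceeds the 10,000 positive droplets because some droplets hold more than one copy.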

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Zoology (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to methods using DNA methylation arrays to identify a cell or a mixture of cells and to quantify changes in cell distribution in blood or in tissues, and to diagnose, prognose and treat diseases, in particular cancer. The methods use freshly collected and stored samples.
EP12789375.8A 2011-05-25 2012-05-25 Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies Withdrawn EP2714933A4 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201161489883P 2011-05-25 2011-05-25
US201161509644P 2011-07-20 2011-07-20
US201261585892P 2012-01-12 2012-01-12
US201261619663P 2012-04-03 2012-04-03
PCT/US2012/039699 WO2012162660A2 (fr) 2011-05-25 2012-05-25 Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies

Publications (2)

Publication Number Publication Date
EP2714933A2 true EP2714933A2 (fr) 2014-04-09
EP2714933A4 EP2714933A4 (fr) 2014-12-31

Family

ID=47218125

Family Applications (1)

Application Number Title Priority Date Filing Date
EP12789375.8A Withdrawn EP2714933A4 (fr) 2011-05-25 2012-05-25 Methods using DNA methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies

Country Status (3)

Country Link
EP (1) EP2714933A4 (fr)
CA (1) CA2869295A1 (fr)
WO (1) WO2012162660A2 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9561006B2 (en) * 2008-10-15 2017-02-07 The United States Of America As Represented By The Secretary Of The Navy Bayesian modeling of pre-transplant variables accurately predicts kidney graft survival
EP3382033B1 (fr) * 2017-03-30 2020-08-05 Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen Method for determining blood cell counts based on DNA methylation
DE102017125013B4 (de) 2017-10-25 2019-10-17 Epiontis Gmbh MCC as an epigenetic marker for identifying immune cells, in particular basophilic granulocytes
DE102017125019B4 (de) 2017-10-25 2019-10-17 Epiontis Gmbh PDCD1 as an epigenetic marker for identifying immune cells, in particular PD1+ cells
DE102017125150B4 (de) 2017-10-26 2019-10-10 Epiontis Gmbh Endosialin (CD248) as an epigenetic marker for identifying immune cells, in particular naïve CD8+ T cells
DE102017125335B4 (de) 2017-10-27 2019-10-17 Epiontis Gmbh Amplicon region as an epigenetic marker for identifying immune cells, in particular non-classical monocytes
DE102017126248B4 (de) 2017-11-09 2019-10-17 Epiontis Gmbh ERGIC1 as an epigenetic marker for identifying immune cells, in particular monocytic myeloid-derived suppressor cells (mMDSCs)
DE102018112644B4 (de) 2018-05-25 2020-06-10 Epiontis Gmbh CXCR3 as an epigenetic marker for identifying inflammatory immune cells, in particular CD8+ memory T cells
WO2020007951A1 (fr) 2018-07-05 2020-01-09 Epiontis Gmbh Epigenetic method for detecting and distinguishing between IPEX syndromes and IPEX-like syndromes, in particular in newborns
DE102020108560B4 (de) 2020-03-27 2022-03-03 Precision For Medicine Gmbh CBX6 as an epigenetic marker for identifying immune cells, in particular memory B cells
DE102020111423B4 (de) 2020-04-27 2022-03-03 Precision For Medicine Gmbh MYH11/NDE1 region as an epigenetic marker for identifying endothelial progenitor cells (EPCs)
KR20230119130A (ko) 2020-12-14 2023-08-16 리제너론 파마슈티칼스 인코포레이티드 Methods of treating metabolic disorders and cardiovascular disease with inhibin subunit beta E (INHBE) inhibitors
CN114703284A (zh) * 2022-04-15 2022-07-05 北京莱盟君泰国际医疗技术开发有限公司 Quantitative method for detecting cell-free DNA methylation in blood, and application thereof
CN116312794B (zh) * 2023-01-09 2023-11-14 哈尔滨医科大学 Methylation sample clustering method integrating single-cell analysis
CN116949156B (zh) * 2023-09-19 2023-12-08 美迪西普亚医药科技(上海)有限公司 Universal analysis method for detecting human T cells based on nucleic acid variants

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1826278A1 (fr) * 2006-02-28 2007-08-29 Epiontis GmbH Epigenetic modification of the CAMTA1 and/or FOXP3 loci as markers in the treatment of cancer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JOSE I. MARTIN-SUBERO ET AL: "A Comprehensive Microarray-Based DNA Methylation Study of 367 Hematological Neoplasms", PLOS ONE, vol. 4, no. 9, E6986, 11 September 2009 (2009-09-11), pages 1-11, XP055139130, DOI: 10.1371/journal.pone.0006986 *
KRISTI KERKEL ET AL: "Altered DNA Methylation in Leukocytes with Trisomy 21", PLOS GENETICS, vol. 6, no. 11, E1001212, 18 November 2010 (2010-11-18), pages 1-13, XP055139128, DOI: 10.1371/journal.pgen.1001212 *
See also references of WO2012162660A2 *
WU MINGQI ET AL: "Bayesian modeling of ChIP-chip data using latent variables.", BMC BIOINFORMATICS, vol. 10, 352, 2009, page 13PP, XP002731974, ISSN: 1471-2105 *

Also Published As

Publication number Publication date
EP2714933A4 (fr) 2014-12-31
WO2012162660A3 (fr) 2013-03-28
WO2012162660A2 (fr) 2012-11-29
CA2869295A1 (fr) 2012-11-29

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20140102

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
RIC1 Information provided on ipc code assigned before grant

Ipc: C12N 15/11 20060101ALI20141110BHEP

Ipc: G06F 17/10 20060101ALI20141110BHEP

Ipc: C12Q 1/68 20060101AFI20141110BHEP

A4 Supplementary search report drawn up and despatched

Effective date: 20141203

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 17/10 20060101ALI20141126BHEP

Ipc: C12Q 1/68 20060101AFI20141126BHEP

Ipc: C12N 15/11 20060101ALI20141126BHEP

17Q First examination report despatched

Effective date: 20150806

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20200114