WO2014189787A2

WO2014189787A2 - Compositions and methods for the determination of methylation status

Info

Publication number: WO2014189787A2
Application number: PCT/US2014/038448
Authority: WO
Inventors: Jonathan Lim; Kurt KRUMMEL; Robert Shoemaker; Zachary HORNBY
Original assignee: Ignyta, Inc.
Priority date: 2013-05-20
Filing date: 2014-05-16
Publication date: 2014-11-27
Also published as: WO2014189787A3

Abstract

Compositions and methods related to improved nucleic acid methylation status determination are disclosed.

Description

COMPOSITIONS AND METHODS FOR THE DETERMINATION OF METHYLATION STATUS

INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Application No. 61/825,485, filed on May 20, 2013, which is herein expressly incorporated by reference in its entirety.

REFERENCE TO SEQUENCE LISTING

[0002] The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled IGNYT_019WO.TXT, created May 12, 2014, which is 1.02 Kb in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0003] DNA methylation is known to have a major role in determining gene expression and biological function. DNA methylation has been implicated in a number of diseases, including autoimmune diseases and cancer. Therefore, accurate methylation status determination is of major benefit in determination of, for example, disease status or cause.

[0004] Systemic lupus erythematosus (SLE) or lupus is an autoimmune disease resulting in multiorgan involvement. SLE is characterized by autoantibodies directed against a variety of nuclear and cytoplasmic cellular components, forming immune complexes which circulate and eventually deposit in tissues. This deposition causes the chronic inflammation and tissue damage that is the hallmark of SLE.

[0005] Similarly, rheumatoid arthritis (RA) is a multisystem disorder in which immunological abnormalities result in joint inflammation, articular erosions and extra-articular complications. Diagnosis of RA has traditionally been challenging, as the disease pattern may not be evident from the patient history, examination and investigations during the early stages of the disease.

[0006] Thus, both properly diagnosing these diseases and monitoring their treatment are problematic. This is accentuated by the broad spectrum of disease that ranges from subtle or vague symptoms to life-threatening multi-organ failure. There are other diseases with multi-system involvement that can be mistaken for SLE and RA, or vice versa.

[0007] Cancer detection provides similar problems, despite advances in imaging and treatment. Genetic analysis of a patient with cancer, or with a predisposition to cancer, is an advantageous strategy for both detecting the disease, as well as monitoring the efficacy of treatment.

[0008] Epigenetic mechanisms such as DNA methylation play a fundamental role in the etiology of autoimmune diseases and cancer by modulating the methylation state and transcriptional activity of critical genes that affect cell differentiation, maturation, and function. These epigenetic changes may lead to the development of autoimmune disorders and/or cancer. The identification of the key differentially methylated loci may provide novel biomarkers for diagnostics.

[0009] If a cell type specific differentially methylated locus (DML) is located within a gene's promoter region, this gene is considered to be differentially methylated.

SUMMARY OF THE INVENTION

[0010] The disclosure herein relates to methods and compositions useful for the determination of methylation status at one or more positions in a nucleic acid sequence.

[0011] Some embodiments comprise oligonucleotide molecules that may be used in PCR amplification reactions, for example, having the properties that they selectively amplify nucleic acid template which has been successfully bisulfite converted. Some embodiments relate to methods of designing oligonucleotide molecules that may be used in PCR amplification reactions, for example, having the properties that they selectively amplify a nucleic acid template that has been successfully bisulfite converted.

[0012] Some embodiments comprise nucleic acid compositions having known methylation percentages, for example nucleic acid compositions that may be used as controls to monitor one or more aspects of bisulfite treatment, conversion, amplification or sequencing related to the determination of methylation status at at least one position on a sample nucleic acid. Some embodiments relate to methods of evaluating procedures for the determination of methylation status at at least one position on a nucleic acid sample.

[0013] In some embodiments, the method of selecting a first primer for amplification of an amplicon spanning a differentially methylated locus in a nucleic acid comprises identifying a differentially methylated locus for amplification; identifying a desired amplicon size; and identifying a first primer binding site wherein the number of cytosines in the first primer binding site is maximized. In some embodiments, the number of cytosines in the first primer binding site is maximized while maintaining acceptable primer melting temperature (Tm), GC percentage, primer structure, primer uniqueness, and/or primer size. Primer structure refers to minimizing self-annealing and primer-dimer interactions. Primer uniqueness refers to the primer hybridizing to only a single site in the human genome.

[0014] In some embodiments, the Tm is between about 50°C and about 70°C. In some embodiments the Tm is 50°C. In some embodiments the Tm is 51°C. In some embodiments the Tm is 52°C. In some embodiments the Tm is 53°C. In some embodiments the Tm is 54°C. In some embodiments the Tm is 55°C. In some embodiments the Tm is 56°C. In some embodiments the Tm is 57°C. In some embodiments the Tm is 58°C. In some embodiments the Tm is 59°C. In some embodiments the Tm is 60°C. In some embodiments the Tm is 61°C. In some embodiments the Tm is 62°C. In some embodiments the Tm is 63°C. In some embodiments the Tm is 64°C. In some embodiments the Tm is 65°C. In some embodiments the Tm is 66°C. In some embodiments the Tm is 67°C. In some embodiments the Tm is 68°C. In some embodiments the Tm is 69°C. In some embodiments the Tm is 70°C.

[0015] In some embodiments, the GC percentage is about 10% to about 70%. In some embodiments, the GC percentage is about 10-15%. In some embodiments, the GC percentage is about 15-20%. In some embodiments, the GC percentage is about 20-25%. In some embodiments, the GC percentage is about 25-30%. In some embodiments, the GC percentage is about 30-35%. In some embodiments, the GC percentage is about 35-40%. In some embodiments, the GC percentage is about 40-45%. In some embodiments, the GC percentage is about 45-50%. In some embodiments, the GC percentage is about 50-55%. In some embodiments, the GC percentage is about 55-60%. In some embodiments, the GC percentage is about 60-65%. In some embodiments, the GC percentage is about 65-70%.

[0016] In some embodiments, the primer size is 18-30 base pairs (bp). In some embodiments, the primer is 18 bp. In some embodiments, the primer is 19 bp. In some embodiments, the primer is 20 bp. In some embodiments, the primer is 21 bp. In some embodiments, the primer is 22 bp. In some embodiments, the primer is 23 bp. In some embodiments, the primer is 24 bp. In some embodiments, the primer is 25 bp. In some embodiments, the primer is 26 bp. In some embodiments, the primer is 27 bp. In some embodiments, the primer is 28 bp. In some embodiments, the primer is 29 bp. In some embodiments, the primer is 30 bp.

[0017] In some embodiments, the number of single nucleotide polymorphisms in the amplicon is minimized. In some embodiments, the number of cytosines in the primer binding site corresponding to the 3' end of the primer is maximized. In some embodiments, the number of cytosines in the first primer binding site in combination with the number of cytosines in a second primer binding site of a second primer is maximized, wherein the second primer is configured to be used in a pair with the first primer to amplify the amplicon from the nucleic acid. In some embodiments, the number of cytosines in the first primer binding site corresponding to the 3' end of the first primer in combination with the number of cytosines in the second primer binding site corresponding to the 3' end of the second primer of the primer pair used to generate said amplicon is maximized. In some embodiments, the number of cytosines in the amplicon outside of the first primer binding site and the second primer binding site is minimized. In some embodiments, a difference in CG concentration of the first primer and the second primer is minimized. In some embodiments, a difference in Tm of the first primer and the second primer is minimized.

[0018] In some embodiments, the method of assessing methylation reaction quality comprises the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers from the aforementioned primer design method, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

[0019] In some embodiments, the method of assessing bisulfite treatment reaction quality comprises the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers designed according to the aforementioned primer design method, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

[0020] In some embodiments, the method of assessing amplicon generation reaction quality comprises the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers according to the aforementioned primer design method, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

[0021] In some embodiments, the method of assessing amplification bias in a set of templates comprises performing an amplification reaction on a first template having a first methylation frequency; performing an amplification reaction on a second template having a first methylation frequency; and discarding said first template if an amplification yields less than a 50%, 60%, 70%, 80%, or 90% yield of amplification product from said first template compared to said second template.

[0022] In some embodiments, the method of assessing methylation site determination reaction quality comprises the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if an effective template concentration of said first sample differs from a known template concentration of said first sample by greater than 10%, 20%, 30%, or 40% compared to said difference in said known template concentration and said effective concentration as determined for said control template. In some embodiments, the method of assessing methylation site determination reaction quality comprises the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if a variance across samples is greater than 5%, 10%, or 20% higher than control values. In some embodiments, the method of assessing methylation site determination reaction quality comprises the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if a variance across samples is greater than 1.1x, 1.2x, 1.3x, 1.4x, or 1.5x of control values. In some embodiments, the templates are bisulfite-treated nucleic acids.

[0023] In some embodiments, the method of interfacing a primer-treated sample output with an amplicon sequencing apparatus comprises the steps of providing at least a first amplicon, generating a coordinates file for each amplicon, determining which sites within each amplicon are methylation variant sites, creating a machine compatible sample sheet, wherein the sample sheet identifies sample IDs, primer IDs and associated barcode IDs, transferring said sample sheet to said sequencing apparatus, generating a sheet creation signal, and sending said signal to said sequencing apparatus to catalogue a second sample.
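The sample sheet step of this method can be illustrated with a minimal sketch. The column names (`sample_id`, `primer_id`, `barcode_id`), the CSV layout and the function name are assumptions made for illustration only; the disclosure does not specify a file format, and real sequencing instruments each define their own sample sheet schema.

```python
import csv
import io


def build_sample_sheet(rows):
    """Render (sample ID, primer ID, barcode ID) triples as CSV text.

    The header names and CSV layout are hypothetical; they are not the
    format required by any particular sequencing apparatus.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample_id", "primer_id", "barcode_id"])
    for sample_id, primer_id, barcode_id in rows:
        writer.writerow([sample_id, primer_id, barcode_id])
    return buf.getvalue()


if __name__ == "__main__":
    print(build_sample_sheet([("S1", "P1", "BC01"), ("S2", "P1", "BC02")]))
```

Returning the sheet as text keeps the sketch independent of how the file is later transferred to the instrument.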

[0024] In some embodiments, these methods diagnose a disease. In some embodiments, the disease is cancer. In some embodiments, the cancer is selected from the group consisting of leukemia, carcinoma, sarcoma, lymphoma, skin, Non-Hodgkin lymphoma, Hodgkin lymphoma, melanoma, acute myeloid leukemia, chronic myeloid leukemia, chronic lymphocytic leukemia, acute lymphocytic leukemia, tongue, astrocytoma, nephroma, glioblastoma, hepatocellular, urinary bladder, gallbladder, pancreatic, gastric, colon, rectal, glioma, small intestine, intrahepatic bile duct, non-epithelial skin, breast, brain, testicular, cervical, ovarian, kidney and renal pelvis, cardiac, endometrial, uterine, vaginal, vulvar, esophageal, head, neck, salivary, small cell lung, non-small cell lung, bone, eye and orbit, endocrine, retinoblastoma, neuroblastoma, medulloblastoma, osteosarcoma, pheochromocytoma, renal, penile, liver, mesothelioma, oral, nasal, myeloma, thyroid, adrenal, pituitary, prostate, throat, perineural invasion, Wilms' tumor, adenoid cystic, bronchus, and inflammatory myofibroblastic tumor.

[0025] In some embodiments, the disease is an autoimmune disease. In some embodiments, the autoimmune disease is selected from the group consisting of rheumatoid arthritis, juvenile rheumatoid arthritis, lupus, ulcerative colitis, Crohn's disease, psoriasis, psoriatic arthritis, Addison's disease, Graves' disease, myasthenia gravis, Cushing's syndrome, ankylosing spondylitis, Type I diabetes, eczema, and multiple sclerosis.

[0026] In some embodiments, the disease is pain. In some embodiments, the pain is related to cancer, autoimmune disease, or other chronic genetic condition.

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] Figure 1 depicts a general lab overview.

[0028] Figure 2 depicts sample procurement.

[0029] Figure 3 depicts a cell enrichment workflow.

[0030] Figure 4 depicts a DNA processing workflow.

[0031] Figure 5 depicts a Differentially Methylated Loci (DML) data generation workflow.

[0032] Figure 6 depicts a microarray workflow.

[0033] Figure 7 depicts a next generation sequencing (NGS) workflow.

[0034] Figure 8 depicts a methylation control workflow.

[0035] Figure 9 depicts a microarray NGS Bioinformatics workflow.

[0036] Figure 10 depicts an array data normalization workflow.

[0037] Figure 11 depicts a DML discovery workflow.

[0038] Figure 12 depicts a detailed view of a primer generation workflow.

[0039] Figure 13 depicts an NGS sample sheet generation workflow.

DETAILED DESCRIPTION

[0040] While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.

[0041] Methods and compositions disclosed herein relate to the determination of methylation status at one or more loci, such as loci identified to be differentially methylated in, for example, the human genome. Embodiments are not, however, limited to any particular genome. As one will see, some embodiments of the methods and compositions disclosed herein may apply to any number of nucleic acid sequences for which methylation status at a known differentially methylated locus is to be determined.

[0042] Some embodiments relate to primer generation for the amplification of amplicons spanning one or more differentially methylated loci. Some embodiments relate to methods of primer design that comprise one or more selective constraints on primer design. In some embodiments these one or more selective constraints on primer design may yield primers which beneficially selectively amplify target template sequence.

[0043] DNA methylation is detected by a number of methods. Some approaches involve bisulfite treatment (See, for example, Frommer et al. (1992) "A genomic sequencing tool that yields a positive display of 5-methyl cytosine residues in individual DNA strands" Proc. Natl. Acad. Sci. USA 89: 1827-1831, the contents of which are hereby incorporated by reference in their entirety). More recent refinements include commercially available bisulfite treatment kits, such as the EZ DNA Methylation-Lightning Kit offered by Zymo Research, the protocol of which is available at the website http:/;Vw^^zvmoresearch.conx/downloads/dl/file/id' 90/d5030i.pdf (as of May 15, 2014), the contents of which website document are hereby incorporated by reference in their entirety herein.

[0044] Chemical reactions performed pursuant to determining a methylation status at one or more loci of a DNA sample may comprise converting unmethylated cytosine bases to uracil bases, which bases pair in double-stranded DNA hybridization reactions like thymidine rather than like cytosine. Methylated cytosine bases are unconverted. Thus in some embodiments the presence of cytosine bases in a bisulfite treated DNA sample is indicative of methylation at said cytosine bases.
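The conversion logic described in this paragraph can be sketched in silico. The following is a minimal illustration, assuming a single top-strand sequence and a set of 0-based methylated cytosine positions; the function names and the input format are hypothetical and are not taken from the disclosure.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Expected top-strand read-out after complete bisulfite conversion and PCR.

    seq: DNA string (A/C/G/T), written 5'->3'.
    methylated_positions: 0-based indices of methylated cytosines
        (a hypothetical input format chosen for this sketch).
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            # Unmethylated C is converted to U, which is read as T after PCR.
            out.append("T")
        else:
            # Methylated C and all non-C bases are left unchanged.
            out.append(base)
    return "".join(out)


def called_methylated(original, converted):
    """Indices where a C in the original sequence is still a C after conversion."""
    pairs = zip(original.upper(), converted.upper())
    return [i for i, (a, b) in enumerate(pairs) if a == "C" and b == "C"]


if __name__ == "__main__":
    template = "ACGTCCGA"
    converted = bisulfite_convert(template, methylated_positions={1})
    print(converted, called_methylated(template, converted))
```

Under these assumptions, any cytosine that survives conversion is called methylated, which is why incomplete conversion inflates the apparent methylation level.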

[0045] However, even in light of the protocols discussed above, there are substantial technical challenges to determining a DNA methylation status. For example, incomplete bisulfite treatment may result in an over-representation of cytosine bases in a DNA sequence as compared to the actual number of methylated sites.

[0046] Accordingly, some embodiments of the present disclosure relate to methods and compositions for the detection of incompletely performed bisulfite treatment reactions or for the selective amplification of sequence for which said bisulfite treatment is likely to have been complete or relatively so.

[0047] In some embodiments, DNA is obtained from a patient sample. In some embodiments, the sample is a blood sample. In some embodiments, each cell type-specific DNA sample is bisulfite converted and run on an Infinium HumanMethylation450 BeadChip. In some embodiments, differential methylation discovery is then executed per cell type. In some embodiments, this analysis provides a set of cell type specific CpGs that are significantly differentially methylated in a phenotype of interest (e.g., SLE) relative to other phenotypes (e.g., non-SLE/RA autoimmune diseases and healthy controls). The identified CpGs are known as differentially methylated loci (DML).

[0048] In some embodiments, oligonucleotide primers for the selective amplification of amplicons spanning one or more differentially methylated loci in reactions such as polymerase chain reactions (PCR) are beneficially designed according to at least one of the ranked sorting criteria as follows:

1. Chromosome.

2. Coordinate of targeted Cytosine adjacent to Guanine (CpG) methylation site.

3. Targeted strand by primer pair.

4. Is a SNP located on diagnostic CpG (No ranked higher than Yes)?

5. Total cytosine count in unconverted sequences targeted by paired primers (largest to smallest).

6. Sum of cytosines at 3' terminal base position (relative to primer) in unconverted sequences targeted by paired primers (largest to smallest).

7. Sum of differences between paired primers' GC% and optimal primer GC% parameter (smallest to largest).

8. GC% difference between left primer and optimal GC% parameter (smallest to largest).

9. GC% difference between right primer and optimal GC% parameter (smallest to largest).

10. Sum of differences between paired primers' melting temperatures (Tms) and optimal primer Tm parameter (smallest to largest).

11. Tm difference between left primer and optimal Tm parameter (smallest to largest).

12. Tm difference between right primer and optimal Tm parameter (smallest to largest).
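The twelve ranked criteria above amount to a lexicographic sort of candidate primer pairs. The sketch below shows one way to express that ordering as a tuple sort key. The `PrimerPair` record, its field names and the default optima (50% GC, 60°C) are assumptions for illustration; chromosome names are compared as plain strings, so a natural-order key would be needed if "chr2" must precede "chr10".

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    # Hypothetical record; field names are not taken from the disclosure.
    chrom: str
    cpg_coord: int
    strand: str
    snp_on_cpg: bool
    total_c: int        # cytosines in unconverted sequence under both primers
    three_prime_c: int  # cytosines at the 3' terminal positions (0, 1 or 2)
    left_gc: float
    right_gc: float
    left_tm: float
    right_tm: float


def rank_key(p, opt_gc=50.0, opt_tm=60.0):
    """Tuple whose element order mirrors ranked criteria 1 to 12."""
    l_gc, r_gc = abs(p.left_gc - opt_gc), abs(p.right_gc - opt_gc)
    l_tm, r_tm = abs(p.left_tm - opt_tm), abs(p.right_tm - opt_tm)
    return (
        p.chrom, p.cpg_coord, p.strand,
        p.snp_on_cpg,       # False sorts before True: "No" ranks higher
        -p.total_c,         # largest to smallest
        -p.three_prime_c,   # largest to smallest
        l_gc + r_gc, l_gc, r_gc,
        l_tm + r_tm, l_tm, r_tm,
    )


def rank_primer_pairs(pairs, opt_gc=50.0, opt_tm=60.0):
    return sorted(pairs, key=lambda p: rank_key(p, opt_gc, opt_tm))
```

Because Python compares tuples element by element, a later criterion only breaks ties left by the earlier ones, which matches a ranked sort.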

[0049] In some embodiments, one or more beneficial aspects of the above-mentioned sorting criteria are as follows. The chromosome for which amplification is desired is selected. The targeted methylation site, such as a CpG site, is selected. A strand is targeted by primer pair.

[0050] In some embodiments, amplicons comprising no single-nucleotide polymorphisms (SNP) are preferred, as SNPs may confuse the methylation analysis, particularly if one polymorphism at a SNP locus mimics the effect of bisulfite conversion, that is, conversion of a C to a T in a final amplicon. SNPs are also particularly not preferred if the differentially methylated locus is also a SNP locus.

[0051] In some embodiments, primer binding sites having a higher total cytosine count in unconverted template are preferred. A beneficial aspect of such a selection is that incompletely converted DNA template will not be efficiently amplified by primers the sequence of which matches the expected sequence resultant from bisulfite treatment and subsequent base conversion. Cytosines at or near the 3' terminal base position of a potential primer are particularly useful for this purpose.
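A minimal sketch of the two cytosine scores discussed here follows. It assumes that both binding sites are supplied as unconverted top-strand sequence written 5'->3', that the left primer's 3' end falls on the last base of its site, and that the right primer's 3' end pairs with the first base of its site. These orientation conventions and the function names are assumptions for illustration, not details stated in the disclosure.

```python
def count_cytosines(site):
    """Number of cytosines in an unconverted binding-site sequence."""
    return site.upper().count("C")


def three_prime_is_cytosine(site, three_prime_at_site_end=True):
    """1 if the base under the primer's 3' terminal position is a cytosine, else 0."""
    s = site.upper()
    if not s:
        return 0
    base = s[-1] if three_prime_at_site_end else s[0]
    return int(base == "C")


def pair_scores(left_site, right_site):
    """Return (total cytosine count, sum of 3'-terminal cytosines) for a primer pair.

    Both sites are assumed to be top-strand, 5'->3', unconverted sequence.
    """
    total = count_cytosines(left_site) + count_cytosines(right_site)
    three_prime = (three_prime_is_cytosine(left_site, True)
                   + three_prime_is_cytosine(right_site, False))
    return total, three_prime


if __name__ == "__main__":
    print(pair_scores("ACCTAGCTTC", "CATTGCACCA"))
```

Higher values of both scores mean more positions at which an unconverted template would mismatch a primer designed against the fully converted sequence.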

[0052] Some embodiments of the methods performed according to at least one of the selection criteria above yield primers which differentially anneal to bisulfite-treated templates. In some embodiments an effective template concentration may be determined, for example using quantitative PCR methods, and the results of a PCR reaction using primers designed according to the methods herein may be evaluated for further analysis based at least in part on the difference between the known template concentration and the effective template concentration calculated from the efficiency of amplification. An effective template concentration may be determined using, for example, comparison to standard amplification cycle counts for known inputs of high molecular integrity, and a known template concentration may be determined, for example, spectrophotometrically, for example in advance of performing the PCR reaction. Methods of determining an effective template concentration in a qPCR reaction involve monitoring amplification and determining the concentration of template consistent with the observed nucleic acid synthesis resulting from performance of a PCR reaction. Exemplary methods are provided, for example, at http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/PCR/real-time-pcr/qpcr-education/pcr-understanding-ct-application-note.html (viewed May 19, 2013), the contents of which are hereby incorporated by reference in their entirety. In some embodiments a sample is excluded from further analysis if the effective template concentration is less than 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, 76%, 75%, 74%, 73%, 72%, 71%, 70%, 69%, 68%, 67%, 66%, 65%, 64%, 63%, 62%, 61%, 60%, 59%, 58%, 57%, 56%, 55%, 54%, 53%, 52%, 51%, 50%, 49%, 48%, 47%, 46%, 45%, 44%, 43%, 42%, 41%, 40%, 39%, 38%, 37%, 36%, 35%, 34%, 33%, 32%, 31%, 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of an input template concentration.
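One common way to estimate an effective template concentration from qPCR data is a standard curve, in which the threshold cycle (Ct) is modeled as a linear function of log10 concentration. The sketch below assumes that approach; the disclosure does not prescribe a specific calculation, and the function names, the example standards and the 80% cut-off are illustrative assumptions.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept.

    standards: iterable of (concentration, ct) pairs for inputs of known amount.
    """
    xs = [math.log10(conc) for conc, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def effective_concentration(ct, slope, intercept):
    """Invert the standard curve to get the concentration implied by an observed Ct."""
    return 10 ** ((ct - intercept) / slope)


def keep_sample(effective, known, min_fraction=0.8):
    """True if the effective concentration is at least min_fraction of the known input."""
    return effective / known >= min_fraction


if __name__ == "__main__":
    slope, intercept = fit_standard_curve([(1000, 20.0), (100, 23.32), (10, 26.64)])
    eff = effective_concentration(21.0, slope, intercept)
    print(round(eff, 1), keep_sample(eff, known=1000))
```

The `min_fraction` argument corresponds to whichever exclusion percentage is chosen from the list above.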

[0053] Some embodiments of methods performed according to at least one of the selection criteria above yield primers that possess properties that are common to a primer population, so as to facilitate use of multiple primer pairs in a single thermocycling block or in common thermocycling parameters.

[0054] In some embodiments primers may be designed such that the sum of the differences in GC% content and optimal GC% is minimized. In some embodiments primers may be designed such that the difference in GC% content between the left primer and the optimum is minimized. In some embodiments primers may be designed such that the difference in GC% content between the right primer and the optimum is minimized.

[0055] In some embodiments the sum of the differences between the paired primers' Tm and optimal primer Tm is minimized. In some embodiments primers may be designed such that the difference in Tm between the left primer and the optimum is minimized. In some embodiments primers may be designed such that the difference in Tm between the right primer and the optimum is minimized.

[0056] Some embodiments involve primer generation and primers generated according to one or more of the following parameters.

1. Minimum Primer Size: 20 (default=18)

2. Maximum Primer Size: 30 (default=27)

3. Optimal Primer Size: 20 (default=20)

4. Minimum Primer GC Content: 30 (default=20)

5. Maximum Primer GC Content: 80 (default=80)

6. Optimal Primer GC Content: 50 (default=50)

7. Minimum Primer Tm: 58 (default=57)

8. Maximum Primer Tm: 61 (default=63)

9. Optimal Primer Tm: 60 (default=60)

10. PCR Product Size Range: 100-200 (no default, context dependent)

11. Optimal PCR Product Size: 150 (no default, context dependent)

12. Maximum Primer Pair Tm Difference: 5 (default=100)

13. Number Of Primer Pairs To Return: 250 (default=5)

14. Maximum Number Of N's Accepted: 0 (default=0)

Note: Bases marked with Y and R may be treated as N's in some primer design programs.
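As an illustration, the fourteen parameters above can be written as a Primer3-style input record. The mapping of each parameter to a tag name below reflects the Primer3 2.x Boulder-IO tags as best understood here and is an assumption; the tag names should be checked against the manual of the installed Primer3 version. The helper name `to_boulder_io` is hypothetical.

```python
# Parameter values from the list above; tag names are assumed Primer3 2.x tags.
PRIMER3_SETTINGS = {
    "PRIMER_MIN_SIZE": 20,
    "PRIMER_MAX_SIZE": 30,
    "PRIMER_OPT_SIZE": 20,
    "PRIMER_MIN_GC": 30,
    "PRIMER_MAX_GC": 80,
    "PRIMER_OPT_GC_PERCENT": 50,
    "PRIMER_MIN_TM": 58,
    "PRIMER_MAX_TM": 61,
    "PRIMER_OPT_TM": 60,
    "PRIMER_PRODUCT_SIZE_RANGE": "100-200",
    "PRIMER_PRODUCT_OPT_SIZE": 150,
    "PRIMER_PAIR_MAX_DIFF_TM": 5,
    "PRIMER_NUM_RETURN": 250,
    "PRIMER_MAX_NS_ACCEPTED": 0,
}


def to_boulder_io(sequence_id, template, settings=PRIMER3_SETTINGS):
    """Render one TAG=VALUE record, terminated by a line containing only '='."""
    lines = [f"SEQUENCE_ID={sequence_id}", f"SEQUENCE_TEMPLATE={template}"]
    lines.extend(f"{tag}={value}" for tag, value in settings.items())
    lines.append("=")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_boulder_io("example_locus", "ATGTTTGANGTTAG"))
```

Keeping the settings in a single dictionary makes it easy to apply the same constraints to every locus, which supports running multiple primer pairs under common thermocycling parameters.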

[0057] Some beneficial attributes of primers designed according to the disclosure herein are as follows. In some embodiments, avoiding CpGs in primers' target regions minimizes biased amplification of methylated/unmethylated target regions. In some embodiments, CpGs are marked as an ambiguous base so that a primer design program avoids selecting primers that overlap with target methylation sites, such as target methylation site CpGs. Some embodiments rely on an assumption that only cytosines of CpGs will be methylated. Methylation state changes base composition of CpG after bisulfite conversion and thus could affect primer hybridization.
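The masking step described here can be sketched as a preprocessing function that produces the template handed to a primer design program. The sketch assumes, as the paragraph does, that only CpG cytosines may be methylated: CpG cytosines are replaced by an ambiguity code and all other cytosines are converted to T. The function name and the choice of "N" as the default code are assumptions for illustration.

```python
def design_template(seq, ambiguous="N"):
    """Build an in-silico converted template with CpG cytosines masked.

    Non-CpG cytosines are written as T (assumed fully converted).
    CpG cytosines are written as `ambiguous`, since their converted base
    depends on methylation state.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        if base == "C":
            is_cpg = i + 1 < len(s) and s[i + 1] == "G"
            out.append(ambiguous if is_cpg else "T")
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    print(design_template("ACGTCCAG"))
    print(design_template("ACGTCCAG", ambiguous="Y"))
```

With the maximum number of accepted N's set to 0, a design program given this template cannot place a primer over a masked CpG.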

[0058] In some embodiments, choosing primers that target the bisulfite conversion sequence resulting from bisulfite conversion of regions which, prior to conversion, are regions rich in cytosines minimizes hybridization to incompletely converted bisulfite sequences (turquoise bubbles). In some embodiments, complete bisulfite conversion means non-CpG cytosines will be converted to uracils, which will direct synthesis of thymines in their place in amplicons generated through PCR amplification. Unconverted target regions containing more cytosines will have a greater difference in base composition if bisulfite conversion is incomplete. Primers may hybridize to these incompletely converted target regions at a lower rate, such that a difference in amplification efficiency or effective template concentration may be determined. In some embodiments, primers designed as above are more likely to bind template DNA resulting from successful bisulfite conversion reactions than from DNA template which has not been or has not been efficiently or successfully bisulfite converted.

[0059] In some embodiments the primers are designed using a primer design program, guided by one or more of the considerations above. In some embodiments the primer design program is primer3 (http://primer3.sourceforge.net/, visited May 19, 2013), such as the version available at the website http://frodo.wi.mit.edu/ (visited May 19, 2013).

[0060] Some embodiments relate to compositions and methods for the controlled assessment of the success or efficiency of reactions such as reactions performed pursuant to the determination of methylation status at one or more nucleic acid loci. Some embodiments comprising methods and compositions for the assessment of the success or efficiency of said reactions comprise the generation or use of amplification templates having known or controlled methylation percentages.

[0061] In some embodiments, methylation control templates are processed, for example in parallel with samples the methylation status of which are to be analyzed. In some embodiments the control templates have known methylation percentages. Control templates may be subjected to bisulfite mapping and/or PCR amplification reactions in parallel with or similar to those to which one or more samples are subjected, and methylation status of one or more sites may be determined, for example by next generation sequencing of amplicons generated from the templates. Experimentally determined methylation status may be compared to expected methylation status in light of the known input methylation concentrations for the controls. In some embodiments, samples or reaction runs or both may be excluded from further analysis if experimentally determined and expected methylation patterns for a control differ by, for example, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% from expected values.

[0062] In some embodiments, controls having similar expected methylation efficiencies but different template sequences are compared, for example to measure any detection bias that may arise from the template sequence. In some embodiments, a sample or reaction run may be excluded if the experimentally determined control methylation values for the run of that sample differ from one or more other experimentally determined control methylation values for controls of differing sequence but similar expected methylation percentage. In some embodiments, samples or reaction runs or both may be excluded from further analysis if experimentally determined and expected methylation patterns for a control differ from values for other controls of similar expected methylation percentage by, for example, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% from observed values.

[0063] In some embodiments, DNA templates of known methylation percentages are generated. In some embodiments, completely unmethylated and completely methylated DNA populations are created, for example using commercial kits. Methylated and unmethylated DNA products are mixed at predetermined ratios to establish methylation controls, for example controls of known methylation percentages.
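The mixing arithmetic implied by this paragraph can be sketched as follows. The sketch assumes that the two stocks are truly 100% and 0% methylated, that the target percentage is defined by DNA mass, and that stock concentrations are given in the same units; the function name and parameters are hypothetical. With volumes $v_m + v_u = V$ and target fraction $f$, solving $v_m c_m / (v_m c_m + v_u c_u) = f$ gives $v_m = f V c_u / (c_m (1 - f) + f c_u)$.

```python
def mix_volumes(target_percent, total_volume, methylated_conc, unmethylated_conc):
    """Volumes of methylated and unmethylated stock giving a target methylation percentage.

    Assumes fully methylated and fully unmethylated stocks, with the target
    expressed as the share of total DNA mass contributed by the methylated stock.
    Returns (methylated_volume, unmethylated_volume) in the units of total_volume.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    f = target_percent / 100.0
    denom = methylated_conc * (1.0 - f) + f * unmethylated_conc
    v_m = f * total_volume * unmethylated_conc / denom
    return v_m, total_volume - v_m


if __name__ == "__main__":
    print(mix_volumes(30, 100, 10, 10))
    print(mix_volumes(60, 100, 10, 10))
```

When both stocks have the same concentration, the formula reduces to mixing volumes in the target ratio directly.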

[0064] In some embodiments, samples are treated, for example using commercial kits, for various times or with various enzyme concentrations to generate differing percent methylation according to the time of treatment or concentration of enzyme.

[0065] Methylation controls may be run alongside clinical samples to assess quality of run according, for example, to the parameters as discussed above.

[0066] In some embodiments the control methylation templates comprise DNA of sequence which, perhaps aside from methylation status or SNP presence or other natural variation, is identical to that of the target amplicons to be assayed for a sample or set of samples. In some embodiments the control methylation template to be generated is human DNA. In some embodiments the control DNA comprises an amplicon that spans a methylation site of interest. In some embodiments the amplicon is an amplicon such as that to be generated from a sample, perhaps using primers as designed using methods as disclosed herein. In some embodiments the methylation control DNA comprises readily available non-human DNA such as phage lambda DNA, E. coli DNA or plasmid DNA. In some embodiments control DNA is selected having a known methylation pattern, such as centromeric DNA, such that methylation reactions do not need to be performed prior to use as a control, or such that demethylation rather than methylation reactions may be performed pursuant to control sample preparation.

[0067] As an example of the methylation control disclosed herein, two methylation control DNA samples (e.g., 30% and 60% methylated) are run alongside clinical samples during NGS, qPCR, digital PCR, microarray, or other sample preparations. Targeted CpGs of methylation control samples are amplified alongside clinical samples. Methylation percentages for these control samples are calculated using, for example, Next Generation Sequencing data.

[0068] In some embodiments, if the coefficient of variation percentage of methylation frequencies across targeted CpGs among methylation control DNA samples exceeds a threshold, t, sample preparation and sequencing are repeated. In some embodiments, t is, for example, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In some embodiments, if the absolute methylation frequency for any CpG differs from the expected methylation frequency beyond a threshold, u, sample preparation and sequencing are repeated. In some embodiments, u is, for example, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.
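By way of non-limiting illustration, the two acceptance checks described above (a coefficient of variation threshold t and an absolute deviation threshold u) may be sketched in Python as follows. The function names, the use of the sample standard deviation, and the example thresholds are assumptions made for illustration only and are not taken from the disclosure.

```python
from statistics import mean, stdev


def cv_percent(frequencies):
    """Coefficient of variation (sample std dev / mean) as a percentage."""
    m = mean(frequencies)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(frequencies) / m


def control_passes(observed, expected, t, u):
    """Return True if a methylation control passes both checks.

    observed: methylation percentages, one per targeted CpG, for one control
    expected: expected methylation percentage of that control
    t: maximum allowed CV, in percent (illustrative choice)
    u: maximum allowed absolute deviation, in percentage points (illustrative)
    """
    cv_ok = cv_percent(observed) <= t
    deviation_ok = all(abs(o - expected) <= u for o in observed)
    return cv_ok and deviation_ok


# Hypothetical 30% control measured at three targeted CpGs.
passes = control_passes([29.0, 31.0, 30.0], expected=30.0, t=10.0, u=5.0)
```

Under this sketch, a run whose control fails either check would be flagged for repeated sample preparation and sequencing.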

[0069] In some embodiments methylation controls establish strong quality control metrics. In some embodiments methylation controls may influence throughput of the assay, for example by decreasing throughput metrics.

[0070] In some embodiments, mixtures of 0% to 100% methylated DNA in 10% increments, for use as quality controls in methylation arrays and NGS, may be generated as follows: create 100% unmethylated normal DNA; create 100% methylated normal DNA; and make the appropriate percent mixtures of the two.

[0071] Some embodiments of the methods disclosed herein relate to the interface between amplicon-generating devices such as PCR thermocyclers and sequencing devices such as NGS sequencers. Some embodiments comprise a method of interfacing a primer-treated sample output with an amplicon sequencing apparatus comprising the steps of providing at least a first amplicon, generating a coordinates file for each amplicon, determining which sites within each amplicon are methylation variant sites, creating a machine-compatible sample sheet, wherein the sample sheet identifies sample IDs, primer IDs and associated barcode IDs, transferring said sample sheet to said sequencing device, generating a sheet creation signal, and sending said signal to said sample sequencing apparatus to catalogue a second sample.
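As one hypothetical illustration of a machine-compatible sample sheet, the following Python sketch writes sample IDs, primer IDs and barcode IDs as comma-separated text. The column names and layout are assumptions chosen for illustration; they do not represent the format required by any particular sequencing device.

```python
import csv
import io


def build_sample_sheet(rows):
    """Return CSV text with one line per (sample_id, primer_id, barcode_id).

    The header names below are hypothetical, not a vendor-defined format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "primer_id", "barcode_id"])
    for sample_id, primer_id, barcode_id in rows:
        writer.writerow([sample_id, primer_id, barcode_id])
    return buffer.getvalue()


# Hypothetical identifiers for two samples amplified with the same primer pair.
sheet = build_sample_sheet([
    ("S001", "P01", "BC01"),
    ("S002", "P01", "BC02"),
])
```

The resulting text could then be saved to a file and transferred to the sequencing device by whatever mechanism that device supports.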

[0072] In some embodiments performance of said method beneficially allows the interface of otherwise incompatible devices such that sample amplicons generated on one device may be analyzed on a second device at a high throughput rate.

[0073] This DML analysis method found a 118 DML cluster in PBMC samples that distinguishes lupus from all phenotype comparisons. Similarly, a 79 DML cluster was discovered in the T-cell subset, a 180 DML cluster in the B-cell subset, and a 182 DML cluster in the monocyte subset, each of which distinguishes lupus from all phenotype comparisons. This data was then used to develop a 25 DML cluster for whole blood samples, which distinguished 7/18 samples as having lupus and 11/18 that did not.

[0074] Accordingly, methylation promoter profiles for lupus-relevant genes were generated. Multiple CpGs showed significant hypermethylation in the Gene 2 promoter region. Moreover, monocytes and B-cells showed the greatest hypermethylation from CpGs -150 to -300 bp upstream of the Gene 2 transcription start site.

[0075] All numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification are to be understood as being modified in all instances by the term "about." Accordingly, unless indicated to the contrary, the numerical parameters set forth herein are approximations that may vary depending upon the desired properties sought to be obtained. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of any claims in any application claiming priority to the present application, each numerical parameter should be construed in light of the number of significant digits and ordinary rounding approaches.

[0076] The above description discloses several methods and materials of the present invention. This invention is susceptible to modifications in the methods and materials, as well as alterations in the fabrication methods and equipment. Such modifications will become apparent to those skilled in the art from a consideration of this disclosure or practice of the invention disclosed herein. Consequently, it is not intended that this invention be limited to the specific embodiments disclosed herein, but that it cover all modifications and alternatives coming within the true scope and spirit of the invention.

[0077] All references cited herein, including but not limited to published and unpublished applications, patents, and literature references, are incorporated herein by reference in their entirety and are hereby made a part of this specification. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

EXAMPLES

[0078] Generally, whole blood samples from a clinical site are portioned out, with some stored for DNA extraction. The remaining sample is used to extract peripheral blood mononuclear cells (PBMCs) through centrifugation. A portion of the PBMC sample is set aside for DNA extraction. The remaining PBMC sample is aliquoted out into three samples and each sample undergoes a separate antibody-based enrichment process (MACs and AutoMACs). The purpose of this enrichment is to isolate T-cells, B-cells, and monocytes. Enriched samples of these three PBMC subsets are then used for DNA extraction. For each patient blood sample, five DNA samples are prepared: whole blood, PBMCs, T-cells, B-cells, and monocytes.

[0079] Per patient, each cell type-specific DNA sample is bisulfite converted and run on an Infinium HumanMethylation450 BeadChip. Differential methylation discovery is then executed per cell type. This analysis results in a set of cell type specific CpGs that are significantly differentially methylated in a phenotype of interest (e.g., SLE) relative to other phenotypes (e.g., non-SLE/RA autoimmune diseases and healthy controls). These identified CpGs are known as differentially methylated loci (DML).

[0080] If a cell type specific DML is located within a gene's promoter region, this gene is considered to be differentially methylated. Pathway enrichment analyses are conducted based on these differentially methylated gene sets to assess whether specific biological pathways are enriched.

Example 1: Creation of unmethylated DNA Using a REPLI-g UltraFast Mini Kit (Qiagen WGA)

[0081] Whole genome amplification (WGA) does not copy methylation marks, so taking normal DNA through WGA creates 100% unmethylated DNA. Notably, in some embodiments, this kit is designed for >10ng of input material, and the template should be in TE buffer. The REPLI-g UltraFast reaction typically yields 7-10ug DNA. Lower DNA yields may be observed when using low-quality DNA. For best results, the template DNA should be >2kb in length with some fragments >10kb. REPLI-g UltraFast DNA Polymerase should be thawed on ice. All other components can be thawed at RT. Buffer D1 and Buffer N1 should not be stored longer than 3 months.

[0082] Preparation: Prepare Buffer DLB by adding 500ul of nuclease-free water to the tube. Mix thoroughly and centrifuge briefly. NOTE: Reconstituted Buffer DLB can be stored for 6 months at -20°C. Buffer DLB is pH-labile. Avoid neutralization with CO2. All buffers and reagents should be vortexed before use to ensure thorough mixing. Set a water bath or heat block to 30°C for use in step 9.

[0083] Procedure 1. Prepare sufficient Buffer D1 (denaturation buffer) and Buffer N1 (neutralization buffer) for the total number of whole genome amplification reactions. Volumes given are suitable for up to 40 reactions. Excess Buffer D1 can be stored at -20°C for up to 3 months. Preparation of Buffer D1:

Component                   Volume (40 rxns)   Volume (20 rxns)

Reconstituted Buffer DLB    5ul                2.5ul

Nuclease-free Water         35ul               17.5ul

Total Volume                40ul               20ul
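The Buffer D1 volumes above scale linearly with the number of reactions. A minimal Python sketch of that scaling follows; the function name is an assumption, and the sketch adds no pipetting overage, which a user may wish to include.

```python
def buffer_d1_volumes(n_reactions):
    """Scale the Buffer D1 recipe (5 ul DLB + 35 ul water per 40 reactions)."""
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    scale = n_reactions / 40
    dlb_ul = 5.0 * scale
    water_ul = 35.0 * scale
    return {"dlb_ul": dlb_ul, "water_ul": water_ul, "total_ul": dlb_ul + water_ul}


# Volumes for 20 reactions, matching the 20-reaction column of the table.
volumes_20 = buffer_d1_volumes(20)
```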

[0084] 2. Place 1ul template DNA into a microfuge tube. (Amount of DNA should be >10ng. Generally use 20-40ng, so DNA should be concentrated to 20ng/ul or 40ng/ul). 3. Add 1ul Buffer D1 to the DNA. Mix by vortexing and centrifuge briefly. 4. Incubate the samples at RT (15-25°C) for 3 min. 5. Add 2ul Buffer N1 to the samples. Mix by vortexing and centrifuge briefly. 6. Keep REPLI-g UltraFast DNA Polymerase on ice. Thaw all other components at RT (room temperature), vortex, then centrifuge briefly. The REPLI-g UltraFast Reaction Buffer may form a precipitate after thawing. The precipitate will dissolve upon vortexing for 10s. 7. Prepare a master mix on ice according to the table below. Mix and centrifuge briefly. The Master Mix should be kept on ice and used immediately upon addition of the REPLI-g UltraFast DNA Polymerase.

[0085] Preparation of Master Mix:

[0086] 8. Add 16ul of the master mix to 4ul of denatured DNA. 9. Incubate at 30°C for 1.5hrs. 10. Inactivate REPLI-g UltraFast DNA Polymerase by heating the sample for 3 min at 65°C. 11. Purify and quantitate the DNA.

[0087] DTR Column Purification Purpose: Run the amplified DNA through a DTR (Dye Terminator Removal) column to purify the DNA. Procedure: 1. Centrifuge the gel filtration cartridge for 3 min at 850 x g. 2. Transfer the cartridge to the provided 1.5ml microfuge tube and add the sample to the packed column. Be sure the fluid runs into the gel. 3. Close the cap and centrifuge for 3 minutes at 850 x g. Retain eluate. NOTE: Up to 4ul may be lost during sample processing. If the volume loss is greater than 4ul, this is an indication of an overly dry gel. To optimize recovery of sample, repeat the centrifugation.

[0088] DNA Quantification: Qubit Materials: Qubit dsDNA BR Assay Kit, 0.5ml thin wall clear PCR tubes, Qubit. Method: 1. Set up the required number of 0.5ml tubes (thin wall clear PCR tubes, VWR 10011-830) for each sample and standards 1 and 2. 2. Label the tube lids accordingly. 3. Make the Qubit working solution by diluting the Qubit dsDNA BR reagent 1:200 in Qubit dsDNA BR buffer. Use a clean plastic tube each time you make the Qubit working solution. Do not mix the working solution in a glass container. a. Ex. dsDNA BR Buffer (ul) = 199 x (Sample number + 2). b. Ex. dsDNA BR Reagent (ul) = 1 x (Sample number + 2). 4. Vortex the Qubit working solution and quick spin if necessary. 5. Load 190ul of Qubit working solution into each of the tubes used for standards. 6. Load 198ul of Qubit working solution to each sample tube. 7. Add 10ul of each standard to the appropriate standard tube. 8. Add 2ul of each sample to the appropriate sample tube. 9. Vortex each sample (the final volume should be 200ul) for 2-3 seconds then incubate at room temperature for 2 minutes. 10. On the home screen of the Qubit, select DNA then select dsDNA Broad Range as the assay type. 11. On the Standards Screen, select to run a new calibration and follow the instructions to read the standards. 12. When standard measurements are done, insert the first sample and click Read Next Sample. 13. Record the concentration (given in ng/ul). 14. Repeat sample readings until all samples have been read. 15. Calculate the final concentration of each sample by multiplying the assay concentration from the Qubit by your dilution factor. (If 2ul of sample is used, the dilution factor is 100. Total vol / sample amount = dilution factor). 16. The amplified and now quantified DNA can be stored at -20°C until required.
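The arithmetic in steps 3 and 15 above may be sketched in Python as follows. The function names are assumptions for illustration; the formulas are those stated in the method (199 and 1 parts per tube for samples plus two standards, and dilution factor equal to total volume divided by sample volume).

```python
def qubit_working_solution(n_samples, n_standards=2):
    """Buffer and reagent volumes (ul) for a 1:200 working solution."""
    tubes = n_samples + n_standards
    return {"buffer_ul": 199 * tubes, "reagent_ul": 1 * tubes}


def dilution_factor(sample_ul, total_ul=200.0):
    """Total assay volume divided by the sample volume added to the tube."""
    return total_ul / sample_ul


def stock_concentration(assay_ng_per_ul, sample_ul, total_ul=200.0):
    """Concentration of the original sample from the in-tube assay reading."""
    return assay_ng_per_ul * dilution_factor(sample_ul, total_ul)


# Eight samples plus the two standards; 2 ul of sample per 200 ul assay tube.
volumes = qubit_working_solution(8)
factor = dilution_factor(2.0)
```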

Example 2: CpG Methylase (M.SssI) Zymo

[0089] Purpose: Completely methylate normal DNA. Method: NOTE: up to 1 ug can be used per reaction tube.

1. Calculate the amount of DNA required for the methylation control dilutions. If you want to make at least 2ug of each dilution you will need a total of 13ug of methylated DNA.

2. Based on the DNA concentration, determine how many replicates need to be made to create the 13ug of methylated DNA. (NOTE: only 4ul of DNA can be added per reaction tube, so if the DNA is not very concentrated more replicates are required. DNA should be a minimum of 100ng/ul).

3. Combine the reagents in the following table:

4. Incubate the samples at 30°C for 2 hrs. 5. After 2hrs, re-add CpG Methylase and incubate for another 2hrs. 6. After 4 hr incubation, heat inactivate the enzyme at 65°C for 20 min. 7. Clean the methylated DNA using the Zymo Genomic DNA Clean and Concentrator kit.

[0090] Genomic DNA Clean and Concentrator Kit Zymo Research - Purpose: Clean and concentrate the SssI-treated normal DNA. Do not purify more than 2ug per Zymo column. Method: 1. Before starting, follow the kit's instructions to make sure all of the reagents are ready. 2. Add 2 volumes of DNA Binding Buffer to each volume of DNA sample. Mix thoroughly. 3. Transfer the mixture to a provided Zymo-Spin IC-XL Column in a Collection Tube. 4. Centrifuge for 30 seconds at 12,800 x g. Discard the flow through. 5. Add 200ul of DNA Wash Buffer to the column. Centrifuge at 16,000 x g for 1 minute. Repeat the wash step. 6. Transfer the column to a clean 1.5ml capless tube and add 20ul of DNA Elution Buffer warmed to 60°C directly to the column matrix. Incubate at room temperature for 1 minute. 7. Centrifuge for 30 seconds at 16,000 x g to elute the DNA. 8. Transfer the DNA to a clean 1.5ml microfuge tube. 9. Quantitate the concentrated DNA on the Qubit as outlined in the DNA Quantification section. 10. The 100% methylated DNA is now ready for the control mixtures. 11. DNA can be stored at -20°C until required. For long-term storage, store at -80°C.

[0091] Mixing Methylation Controls Purpose: Create 0% to 100% methylation mixes in 10% increments using the amplified and methylated DNAs created above. Method: 1. Calculate the amount of DNA needed to have 2050ng total in each of the percentage mixes. a. Ex. 10% methylation means the mixture should be 10% methylated DNA (SssI DNA) and 90% unmethylated DNA (WGA DNA). The table below shows the amount of each methylated and unmethylated DNA that should be present in each mix for a 2050ng total mix.

[0092] b. Using the above table and the concentration of the unmethylated (0% Me/ WGA DNA) and 100% methylated DNA (SssI DNA), calculate the volume required from each DNA to create each of the mixes above. 2. Once the amount of each of the DNAs required for the mixes is known, the final concentration of each mix should be brought to 50ng/ul or 25ng/ul (if your samples are not very concentrated, you will need to bring the final concentration to 25ng/ul instead of 50ng/ul). a. Since each mix contains a total of 2050ng, the total volume should be 41ul. 2050ng/41ul = 50ng/ul. b. Or the total volume can be brought to 82ul. 2050ng/82ul = 25ng/ul. 3. Calculate the amount of TE that needs to be added to bring the total volume of each mix to either 41ul or 82ul. 4. Once all of the calculations are made, create the methylation control mixes. a. Label 0.2ml strip tubes for each sample. b. Mix the 0% Me and 100% Me DNAs and quick spin down. c. Add the appropriate amount of TE to each tube. d. Using the calculations made earlier, add the correct amount of 0% Me DNA and 100% Me DNA to each of the tubes. e. Once all of the DNA and TE is added, make sure all the tubes are closed. Flick to mix the tubes and quick spin them down. 5. DNAs can be stored at -20°C or -80°C until required. 6. Or proceed to DNA Bisulfite Conversion.
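The mixing calculation described above can be sketched in Python as follows. The function and key names are assumptions for illustration, and the stock concentrations used in the example call are hypothetical; the 2050ng total and the 50ng/ul or 25ng/ul final concentrations are those given in the method.

```python
def control_mix(percent_methylated, meth_ng_per_ul, unmeth_ng_per_ul,
                total_ng=2050.0, final_ng_per_ul=50.0):
    """Volumes (ul) of methylated DNA, unmethylated DNA and TE for one mix."""
    if not 0 <= percent_methylated <= 100:
        raise ValueError("percent_methylated must be between 0 and 100")
    meth_ng = total_ng * percent_methylated / 100.0
    unmeth_ng = total_ng - meth_ng
    meth_ul = meth_ng / meth_ng_per_ul
    unmeth_ul = unmeth_ng / unmeth_ng_per_ul
    total_ul = total_ng / final_ng_per_ul  # 41 ul at 50 ng/ul, 82 ul at 25 ng/ul
    te_ul = total_ul - meth_ul - unmeth_ul
    if te_ul < 0:
        # Stocks are too dilute for this final concentration; try 25 ng/ul.
        raise ValueError("stock DNA too dilute for the requested final concentration")
    return {"meth_ul": meth_ul, "unmeth_ul": unmeth_ul,
            "te_ul": te_ul, "total_ul": total_ul}


# Hypothetical stocks at 100 ng/ul each; mixes from 0% to 100% in 10% steps.
mixes = {pct: control_mix(pct, 100.0, 100.0) for pct in range(0, 101, 10)}
```

If the TE volume would be negative at 50ng/ul, the same function can be called with final_ng_per_ul=25.0, corresponding to the 82ul alternative described in the method.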

[0093] Bisulfite Conversion: Zymo Research EZ DNA Methylation Lightning Kit Purpose: Bisulfite convert 500ng of Methylation Control Mixes to be run on array or NGS. Methods: 1. Aliquot 500ng of each DNA sample into a 0.2ml strip tube, maximum of 20ul. If the volume is less than 20ul, make up the difference with water so the total volume is 20ul. 2. Vortex the Lightning Conversion Reagent for 10 seconds, and make sure there is no precipitate; then quick spin down to remove any droplets from the cap. 3. Add 130ul of Lightning Conversion Reagent to the 20ul (500ng) of DNA sample in a PCR tube. Mix, then centrifuge briefly to ensure there are no droplets in the cap or sides of the tube. 4. Place the PCR tube in the Viia7 to perform the following steps: i. 98°C for 8 minutes ii. 54°C for 60 minutes iii. 4°C storage for up to 20 hours (optional) 5. Place the column into a provided Collection Tube and add 600ul of M-Binding Buffer to a Zymo-Spin IC Column. Close the cap and mix by inverting the column 12 times. 6. Centrifuge at full speed (14,000 x g) for 30 seconds. Discard the flow through. 7. Add 100ul of M-Wash Buffer to the column. Centrifuge at full speed for 30 seconds. 8. Add 200ul of L-Desulfonation Buffer to the column and let stand at room temperature (20-30°C) for 15-20 minutes. After incubation, centrifuge at full speed for 30 seconds. 9. Add 200ul of M-Wash Buffer to the column. Centrifuge at full speed for 30 seconds. Discard the flow through and repeat this wash step. 10. Place the column into a clean 1.5ml capless microfuge tube and add 11ul of M-Elution Buffer directly to the column matrix. Centrifuge for 30 seconds at full speed to elute the DNA. DNA is ready for immediate use or can be stored at -80°C or below for long-term storage.

Example 3: PCR Primer Design and Ranking

[0094] The primer design input is a text file with one DML coordinate per line. The DML coordinates refer to the hg19 positive strand cytosine position of a CpG. This method generates a list of PCR primer candidates for DML.

10:102872719

12:12223989

12:122356400

12:124929963

13:24798133

13:27998168

13:28366814

16:1722957

16:1797050

19:46807466

1:231512676

1:231820076

2:114261369

6:41168960

7:30725669

8:41169433

X:40035961
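A minimal Python sketch of reading such an input file follows. The function name is an assumption, and the tolerance for stray whitespace within a line is a convenience added for illustration rather than a feature of the disclosed pipeline.

```python
def parse_dml_coordinates(lines):
    """Parse 'chromosome:position' lines into (chromosome, position) tuples."""
    coordinates = []
    for raw in lines:
        text = "".join(raw.split())  # drop all whitespace, including newlines
        if not text:
            continue  # skip blank lines
        chrom, sep, pos = text.partition(":")
        if not sep or not chrom or not pos.isdigit():
            raise ValueError(f"not a chromosome:position coordinate: {raw!r}")
        coordinates.append((chrom, int(pos)))
    return coordinates


# Three of the coordinates listed above, as they might appear in the file.
dml = parse_dml_coordinates(["10:102872719\n", "\n", "X:40035961\n"])
```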

[0095] The primer pipeline outputs ranked primers based on the criteria disclosed in the present application. The Left Primer Sequences below are (top to bottom): SEQ ID NOs. 1-3. The Right Primer Sequences below are SEQ ID NO. 4.
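For illustration only, a ranking over candidate primer pairs using criteria recited in the claims (cytosine count in the primer binding sites, cytosine count near the 3' end, and small differences in Tm and GC content between the two primers) might be sketched in Python as below. This is not the disclosed pipeline: the ordering of the criteria, the five-base 3' window, the treatment of each sequence as written 5' to 3', and the use of the Wallace rule as a rough Tm estimate are all assumptions.

```python
def count_c(seq):
    """Number of cytosines in a sequence."""
    return seq.upper().count("C")


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule): 2 per A/T plus 4 per G/C, in deg C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def rank_primer_pairs(pairs, three_prime_len=5):
    """Sort (left, right) candidate pairs, best first.

    Sequences are assumed to be written 5' to 3', so the last bases of each
    string are taken as the 3' end. The lexicographic priority below is an
    illustrative choice, not one specified by the source.
    """
    def key(pair):
        left, right = pair
        total_c = count_c(left) + count_c(right)
        c_3prime = count_c(left[-three_prime_len:]) + count_c(right[-three_prime_len:])
        tm_diff = abs(wallace_tm(left) - wallace_tm(right))
        gc_diff = abs(gc_fraction(left) - gc_fraction(right))
        return (-total_c, -c_3prime, tm_diff, gc_diff)

    return sorted(pairs, key=key)


# Two made-up candidate pairs; real candidates would come from the pipeline.
ranked = rank_primer_pairs([
    ("ATATATAT", "TATATATA"),
    ("ACCACCAC", "TCCTCCTC"),
])
```

A further filter, such as keeping only pairs whose estimated Tm falls between about 50°C and about 70°C, could be applied before ranking.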

[0096] The primers were then used in a sequencing run (e.g. MiSeq NGS) using control methylation samples. Methylation controls, as discussed above, are DNA with 0% methylation, 25% methylation, 50% methylation, 75% methylation, and 100% methylation. Data from a validation run is shown below. Samples of differently methylated control percentages are listed in rows. Two tested DML are shown in columns, with the listed values representing methylation frequencies.

Sample Name                      DML 1   DML 2

0% Methylation Replicate 1       0.4     0.6

0% Methylation Replicate 2       7       0.4

100% Methylation Replicate 1     87.6    94.8

100% Methylation Replicate 2     99.6    95.2

25% Methylation Replicate 1      33.6    27.6

25% Methylation Replicate 2      10.1    17.5

50% Methylation Replicate 1      37.6    45.6

50% Methylation Replicate 2      57.7    43.1

75% Methylation Replicate 1      86.7    68.8

75% Methylation Replicate 2      56.8    66.3
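The control values tabulated above can be compared against their expected percentages. The following Python sketch computes the absolute deviation for every control, replicate and DML, and lists those exceeding a threshold u; the function names and the example threshold of 10 percentage points are assumptions made purely for illustration.

```python
# Observed methylation percentages from the validation run above:
# expected % -> [(DML 1, DML 2) for replicate 1, (DML 1, DML 2) for replicate 2]
CONTROLS = {
    0: [(0.4, 0.6), (7.0, 0.4)],
    25: [(33.6, 27.6), (10.1, 17.5)],
    50: [(37.6, 45.6), (57.7, 43.1)],
    75: [(86.7, 68.8), (56.8, 66.3)],
    100: [(87.6, 94.8), (99.6, 95.2)],
}


def max_abs_deviation(controls, dml_index):
    """Largest |observed - expected| for one DML column (0 or 1)."""
    return max(
        abs(replicate[dml_index] - expected)
        for expected, replicates in controls.items()
        for replicate in replicates
    )


def flag_deviations(controls, u):
    """List (expected, replicate number, DML number, observed) exceeding u."""
    flagged = []
    for expected, replicates in controls.items():
        for rep_number, replicate in enumerate(replicates, start=1):
            for dml_number, observed in enumerate(replicate, start=1):
                if abs(observed - expected) > u:
                    flagged.append((expected, rep_number, dml_number, observed))
    return flagged


# u = 10 percentage points is an arbitrary illustrative threshold.
flagged = flag_deviations(CONTROLS, u=10.0)
```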

[0097] Similarly, methylation control samples (e.g. 10%, 50%, and 100% methylated) were also run alongside patient samples. An integrated output with controls and patient samples, from osteoarthritis (OA) and rheumatoid arthritis (RA), is shown below.

[0098] The methylation controls were also run on a discovery platform. On the graph below, the y-axis represents the observed BeadChip methylation frequencies across CpGs and the x-axis represents the expected methylation frequency. The straight line represents an ideal case, with the error bars indicating standard deviation.

[0099] The congruence of the error bars shows that the variance was consistent across the experiments. The divergence of the observed values from ideal may be due to inherent color bias in the sequencing platform and/or issues with the preparation of the methylated control DNA.

Example 4: PCR Primer Validation

[0100] A number of PCR primers designed in Example 3 were synthesized and tested. The Bioanalyzer traces, below, were used to assess which candidate primers pass and which fail. The metrics used in this determination include band count, intensity, tightness of band, and size of band in base pairs. Passing candidates have a single, intense, tight band between 150-300 bp. As shown below, the candidate PCR primers were overwhelmingly successful.

Example 5: Sample Preparation

[0101] Whole blood samples were received from a clinical site and a portion was stored for DNA extraction. The remaining sample was used to extract peripheral blood mononuclear cells (PBMCs). A portion of the PBMC sample was set aside for DNA extraction. The remaining PBMC sample was aliquoted out into three samples and each sample underwent a separate antibody-based enrichment process (MACs and AutoMACs) to isolate T-cells, B-cells, and monocytes. Enriched samples of these three PBMC subsets were then used for DNA extraction. Accordingly, for each patient blood sample, five DNA samples were prepared: whole blood, PBMCs, T-cells, B-cells, and monocytes. The PBMC collection process is described below.

[0102] Method

1) Harvest all the sample tubes into 50 ml conical tubes (2 tubes of blood/50 ml conical) and record volume collected from each sample tube.

2) Rinse each sample tube with 5 ml AutoMACS and pool it with the appropriate 50 ml conical collection tube by pouring it from the blood tube to the 50ml conical tube.

3) Bring up the volume to 45 ml with AutoMACS.

4) Spin at 1800rpm (approximately 600xg) in Sorvall 7 for 10 min.

5) Aspirate supernatant to just above the pellet.

6) Resuspend the pellet in 1ml of buffer using the P1000.

7) Pool the tubes of the same sample (rinse one tube with AutoMACS) and bring up the volume to 45 ml with buffer.

8) Spin tubes at 1500rpm for 10 min.

9) Remove the supernatant and resuspend the pellet in 1 ml AutoMACS.

10) Make a 1:50 dilution; 5ul of sample in 245ul PBS for cell counting.

[0103] Countess

1) Use disposable hemocytometers (Invitrogen, individual packages).

2) Add 10ul of the 1:50 sample dilution to 10ul of Trypan and mix.

3) Load 10ul of sample/Trypan mix into one side of the disposable hemocytometer.

4) Insert the slide into Countess and analyze.

5) Push slide in, slide will eject, turn it around and count the other sample.

6) Repeat steps 1-5 to count remaining samples and slides.

[0104] Post PBMC Collection

1) Save PBMCs at -80°C.

2) Extract DNA from the PBMCs.

a. Bring the total volume to 200ul with PBS.

b. Keep cells on ice until ready to proceed to Qiagen DNA Blood Mini Extraction.

3) Pull out cells for staining.

a. Bring the volume to 100ul with 10% BSA.

b. Incubate with 0.1ul mIgG for 10 min. at RT.

c. After mIgG incubation add antibodies.

d. Incubate on ice.

e. Bring volume to 1ml with staining buffer.

f. Spin at 3,000 rpm for 2 min.

g. Aspirate supernatant.

h. Resuspend pellet in 100ul of Fixing.

i. Add 900ul of staining buffer. Spin at 3,000 rpm for 2 min.

j. Aspirate supernatant.

k. Resuspend the pellet in 100ul of staining buffer.

[0105] Collecting cell subsets, for example, B-Cells, Monocytes, and T-Cells:

1) Calculate remaining PBMCs (subtract 10⁶ from the total cell number).

2) Calculate for B-Cells, Monocytes, and T-Cells.

3) Place each aliquot for each of the cell subsets into a 15ml conical tube and bring the volume to 5ml with AutoMACs buffer.

4) Spin at 1,500 rpm for 10min.

5) Remove supernatant and resuspend in MACS buffer.

6) Calculate for MACS beads.

7) Incubate in refrigerator for 15min.

8) Bring volume to 5ml with AutoMACS.

9) Spin at 1,500 rpm (300xg) for 10min.

10) Remove the supernatant and resuspend in 500ul of AutoMACS buffer.

11) Place 500ul sample on the MACS column and wait for it to flow through.

12) Add 1ml of AutoMACS buffer to the column and plunge into the 15ml conical.

13) Follow the same Cell Countess instructions as before and count each sample subset.

14) Transfer each sample to a 2ml tube and spin down at 5,000 rpm for 5 min.

15) Remove the supernatant and resuspend in 200ul of PBS.

16) Keep the sample on ice until ready to proceed to DNA Extraction.

[0106] The autoMACS Pro Separator was used for automatic labeling and separation, according to the manufacturer's instructions. DNA extraction with QIAamp DNA Blood Mini Kit, DNA concentration with Zymo Research Genomic DNA Clean and Concentrator Kit, DNA quantification with Qubit dsDNA BR Assay Kit, and the bisulfite conversion were performed as described in Examples 1 and 2.

Example 6a: Minimal Noise in DML Detection

[0107] NHC DNA was divided into 48 identical aliquots, which were subjected to 48 independent bisulfite conversions. The samples were then divided into four groups for analysis. Samples 1-12 and 13-24 were run using one lot of reagents; Samples 25-36 and 37-48 were analyzed using a different reagent lot. The chart below indicates that HC replicates cluster tightly together (center) relative to 5 RA, 4 OA, 1 HC, and 4 SLE PBMC samples (left and right edges). Forty-eight independent bisulfite conversion reactions, two BeadChip reagent lots, and four BeadChips do not strongly affect the RA and SLE signatures. This filtering of noise and clustering of samples by disease state is particularly advantageous for decreasing the limits of detection for specific disease states.

Replicate Annotation Key:

BSR -Bisulfite Reaction

RL -Reagent Lot

BC -BeadChip

The scale on the right indicates the dissimilarity.

PBMC Samples Clustered across 1,683 DML

Example 6b: Median Replicate Methylation Differences

[0108] Here, the 1,683 DML, as used in the analysis in Example 6a, of two replicate samples were plotted in each graph. The line represents a linear regression. The R² value for each sample indicates an excellent correlation between analysis of different samples, further demonstrating the robustness of this method.

[0109] Similarly, technical replicates showed decreasing coefficient of variation (CV) with increasing methylation frequency. The graphs below demonstrate that 60% methylation frequency exhibits the largest median standard deviation (2.26%). Moreover, 95% of samples in this group have methylation frequencies between 55-65%. In this example, 23 out of the 25 Lupus Panel DML have methylation frequencies > 10%. Accordingly, a multi-analyte diagnostic is more robust against technical noise.

Example 7: Distribution of Median Replicate Methylation Differences

[0110] CpGs must have a methylation frequency difference > 0.10 to be labeled as DML. The chart below depicts 1,632 unique DML for rheumatoid arthritis and lupus. The chart further demonstrates that the median methylation difference between replicate samples is well below the 0.10 threshold. Moreover, this methylation cutoff is 7.1 standard deviations away from this distribution’s mean, indicating the method of the present application is particularly effective at filtering noise.
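The 0.10 methylation frequency difference cutoff described above may be illustrated with the following Python sketch. It applies only the effect-size filter to group means; the statistical significance testing that also forms part of DML discovery is omitted, and the function name, input layout and CpG identifiers are hypothetical.

```python
from statistics import mean


def filter_by_difference(case_freqs, control_freqs, min_diff=0.10):
    """Return {CpG id: mean difference} for CpGs whose |difference| > min_diff.

    case_freqs, control_freqs: dicts mapping a CpG id to a list of methylation
    frequencies (0 to 1), one per sample in that phenotype group.
    """
    kept = {}
    for cpg in case_freqs.keys() & control_freqs.keys():
        diff = mean(case_freqs[cpg]) - mean(control_freqs[cpg])
        if abs(diff) > min_diff:
            kept[cpg] = diff
    return kept


# Made-up frequencies for two CpGs in two phenotype groups.
candidates = filter_by_difference(
    {"cpg_a": [0.80, 0.90], "cpg_b": [0.50, 0.52]},
    {"cpg_a": [0.50, 0.60], "cpg_b": [0.50, 0.50]},
)
```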

[0111] Furthermore, median methylation differences between non-replicate samples reveal biological differences. The median of absolute methylation frequency differences shows that methylation differences between non-replicate samples tend to be larger than replicate samples (Wilcoxon test p-value < 10^-15). This reveals the different biology of the RA, OA, SLE, and HC samples. The top chart shows the median methylation frequency difference distribution between replicate samples. The bottom chart represents the median methylation frequency difference distribution between non-replicate samples. The x- and y-axes are identical in both graphs and bin size is identical across graphs.

Example 8: Methylation Frequency Differences of 25 DML SLE Panel are not Explained by Technical Variability

[0112] The figure below shows that the absolute methylation frequency differences between SLE and non-SLE samples tend to be larger than across technical replicates (Wilcoxon test p-value = 1.3x10^-13).

[0113] The figures below indicate that the 25 DML SLE panel does not exhibit more technical noise than an all CpG background. The Wilcoxon p-value that 25 DML SLE panel CV % or DR values tended to be larger than all CpG background was ≥ 0.69. This 25 SLE DML panel is able to predictively diagnose SLE with 77% sensitivity and 100% specificity (batch adjusted data set).
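For reference, sensitivity and specificity of the kind reported above are conventionally computed from true and predicted labels as in the following Python sketch. The function name and the example labels are hypothetical and do not reproduce the data set underlying the reported figures.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    truth[i] is True if sample i has the disease; predicted[i] is True if
    the panel calls sample i as diseased.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased sample")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels: four diseased and four non-diseased samples.
truth = [True, True, True, True, False, False, False, False]
calls = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(truth, calls)
```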

Example 9: Discovery versus Diagnostic Platform Comparison

[0114] In this experiment, 248 samples were analyzed, as described above, across both diagnostic and discovery platforms. Of these, 107 samples were characterized as non-SLE/SLE on both discovery and diagnostic platforms with 98% concordance between platforms. This demonstrates the accuracy of this method in distinguishing between a diseased and non-diseased state, and substantially decreases the risk of false positive readings.

Model 3 P(SLE) Between BeadChip and NGS

Example 10: Performance Based DML Selection Cross Validation Results

[0115] In this example, the multi-cell model uses DML across cell types: PBMCs, T-cells, B-cells, and monocytes. Metrics were compared across the five model types using 2-fold cross-validation. Across the different cell types shown in the chart below (Monocyte only, B-cell only, T-cell only, and IGN-102 PBMC only), results show that the multi-cell type model performs significantly better than the individual cell type models. The individual cell type models' error rates approach zero as more training data is used. Thus, the multi-cell model is more robust, as larger sample sizes of a single subset may generate a highly accurate RA diagnostic model. Thus, the chart below demonstrates the superior performance of a classification model that integrates DML across cell types relative to models based on single cell types. The first column of the table shows the error rate of a model that integrates methylation data across cell types. The remaining columns show the performance of cell type specific models. Each row indicates the DML panel used in the analysis. Lower error percentages represent better performance.

[0116] The term "comprising" as used herein is synonymous with "including," "containing," or "characterized by," and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.

Claims

WHAT IS CLAIMED IS:

1. A method of selecting a first primer for amplification of an amplicon spanning a differentially methylated locus in a nucleic acid, comprising

identifying a differentially methylated locus for amplification; identifying a desired amplicon size; and

identifying a first primer binding site wherein the number of cytosines in the first primer binding site is maximized.

2. The method of Claim 1, wherein the number of single nucleotide polymorphisms in the amplicon is minimized.

3. The method of Claim 1, wherein the number of cytosines in the primer binding site corresponding to the 3' end of the primer is maximized.

4. The method of Claim 1, wherein the number of cytosines in the first primer binding site in combination with the number of cytosines in a second primer binding site of a second primer is maximized, wherein the second primer is configured to be used in a pair with the first primer to amplify the amplicon from the nucleic acid.

5. The method of Claim 2, wherein the number of cytosines in the first primer binding site corresponding to the 3' end of the first primer in combination with the number of cytosines in the second primer binding site corresponding to the 3' end of the second primer of a primer pair to generate said amplicon is maximized.

6. The method of any of Claims 1-5, wherein the number of cytosines in the amplicon outside of the first primer binding site and the second primer binding site is minimized.

7. The method of any of Claims 1-6, wherein a difference in CG concentration of the first primer and the second primer is minimized.

8. The method of any of Claims 1-7, wherein a difference in Tm of the first primer and the second primer is minimized.

9. The method of any of Claims 1-8, wherein the Tm of the first primer and second primer are between about 50°C and about 70°C.

10. The method of any of Claims 1-9, wherein the GC percentage of the first primer and second primer are between about 10% and about 70%.

11. The method of any of Claims 1-10, wherein the primer size of the first primer and second primer are between 18 bp and 30 bp.

12. A method of assessing methylation reaction quality, comprising the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers designed according to the method of any of Claims 1-11, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

13. A method of assessing bisulfite treatment reaction quality, comprising the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers designed according to the method of any of Claims 1-11, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

14. A method of assessing amplicon generation reaction quality, comprising the steps of performing a polymerase chain reaction on a sample having a known template concentration using primers designed according to the method of any of Claims 1-11, determining an effective template concentration, and discarding said sample if said effective concentration is less than 50%, 60%, 70%, 80%, or 90% of said known template concentration.

15. A method of assessing amplification bias in a set of templates comprising: performing an amplification reaction on a first template having a first methylation frequency; performing an amplification reaction on a second template having a first methylation frequency; and discarding said first template if an amplification yields less than a 50%, 60%, 70%, 80%, or 90% yield of amplification product from said first template compared to said second template.

16. A method of assessing methylation site determination reaction quality comprising the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if an effective template concentration of said first sample differs from a known template concentration of said first sample by greater than 10%, 20%, 30%, or 40% compared to said difference in said known template concentration and said effective concentration as determined for said control template.

17. A method of assessing methylation site determination reaction quality comprising the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if a variance across samples is greater than 5%, 10%, or 20%, higher than control values.

18. A method of assessing methylation site determination reaction quality comprising the steps of performing a methylation determining reaction on a first template, performing a methylation determining reaction on a control template having a known methylation efficiency, and discarding said sample if a variance across samples is greater than 1.1x, 1.2x, 1.3x, 1.4x, or 1.5x of control values.

19. The method of any of Claims 12-18, wherein said templates are bisulfite-treated nucleic acids.

20. A method of interfacing a primer-treated sample output with an amplicon sequencing apparatus comprising the steps of providing at least a first amplicon, generating a coordinates file for each amplicon, determining which sites within each amplicon are methylation variant sites, creating a machine compatible sample sheet, wherein the sample sheet identifies sample IDs, primer IDs and associated barcode IDs, transferring said sample sheet to said sequencing device, and generating a sheet creation signal, and sending said signal to said sample sequencing apparatus to catalogue a second sample.

21. A method for diagnosing a disease according to the method of Claims 1-11.

22. The method of Claim 21, wherein the disease is cancer.

23. The method of Claim 22, wherein the cancer is selected from the group consisting of leukemia, carcinoma, sarcoma, lymphoma, skin, Non-Hodgkin lymphoma, Hodgkin lymphoma, melanoma, acute myeloid leukemia, chronic myeloid leukemia, chronic lymphocytic leukemia, acute lymphocytic leukemia, tongue, astrocytoma, nephroma, glioblastoma, hepatocellular, urinary bladder, gallbladder, pancreatic, gastric, colon, rectal, glioma, small intestine, intrahepatic bile duct, non-epithelial skin, breast, brain, testicular, cervical, ovarian, kidney and renal pelvis, cardiac, endometrial, uterine, vaginal, vulvar, esophageal, head, neck, salivary, small cell lung, non-small cell lung, bone, eye and orbit, endocrine, salivary, retinoblastoma, neuroblastoma, medulloblastoma, osteosarcoma, pheochromocytoma, renal, penile, liver, mesothelioma, oral, nasal, myeloma, thyroid, adrenal, pituitary, prostate, throat, perineural invasion, Wilms' tumor, adenoid cystic, bronchus, and inflammatory myofibroblastic tumor.

24. The method of Claim 21, wherein the disease is an autoimmune disease.

25. The method of Claim 24, wherein the autoimmune disease is selected from the group consisting of rheumatoid arthritis, juvenile rheumatoid arthritis, lupus, ulcerative colitis, Crohn's disease, psoriasis, psoriatic arthritis, Addison's disease, Graves' disease, myasthenia gravis, Cushing's syndrome, ankylosing spondylitis, Type I diabetes, eczema, and multiple sclerosis.

26. The method of Claim 21, wherein the disease is pain.

27. The method of Claim 26, wherein the pain is related to cancer, autoimmune disease, or other chronic genetic condition.
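For illustration only, the primer selection criteria recited in Claims 1-11 (cytosine count in the binding site, cytosines toward the 3' end, Tm, GC percentage, and primer size) can be sketched as follows. This is a rough sketch under stated assumptions, not the claimed method: binding sites are assumed to be written 5' to 3' in the same orientation as the primer, Tm is approximated with the Wallace rule (2 °C per A/T, 4 °C per G/C), the 3' window of five bases is arbitrary, and all sequences and function names are hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C). An approximation only."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def cytosine_counts(site, three_prime_window=5):
    """Cytosines in the whole binding site and in its last few (3') bases."""
    s = site.upper()
    return s.count("C"), s[-three_prime_window:].count("C")


def rank_sites(sites, min_len=18, max_len=30):
    """Keep sites of allowed size, most cytosines first; ties broken by 3' cytosines."""
    allowed = [s for s in sites if min_len <= len(s) <= max_len]
    return sorted(allowed, key=cytosine_counts, reverse=True)


def primer_in_range(primer, tm_range=(50, 70), gc_range=(10.0, 70.0), size_range=(18, 30)):
    """Check one primer against the approximate Tm, GC %, and size windows."""
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_percent(primer) <= gc_range[1]
            and size_range[0] <= len(primer) <= size_range[1])


# Hypothetical candidate binding sites and primers.
candidate_sites = ["ACCTCCACCATCCTACCA", "AGGTGGAGGATGGTAGGA", "ACCTC"]
ranked = rank_sites(candidate_sites)

forward = "AGGTGGAGGATGGTAGGATT"
reverse = "TTAGGGATGAGTGGTTAGGA"
tm_difference = abs(wallace_tm(forward) - wallace_tm(reverse))
pair_ok = primer_in_range(forward) and primer_in_range(reverse)
```

The site that is too short is dropped by the size filter, and the cytosine-rich site ranks first. A fuller treatment would also weigh single nucleotide polymorphisms in the amplicon and the difference in GC content between the two primers, which this sketch does not model.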