WO2019239218A2 - Determination of epigenetic modifications by nanopore sequencing - Google Patents

Determination of epigenetic modifications by nanopore sequencing

Publication number
WO2019239218A2
Authority
WO
WIPO (PCT)
Prior art keywords
base
nucleic acid
moiety
sequencing
cases
Application number
PCT/IB2019/000855
Other languages
French (fr)
Other versions
WO2019239218A3 (en)
Inventor
Joanne N. MASON
Sidong LIU
Walraj GOSAL
Original Assignee
Cambridge Epigenetix Limited
Application filed by Cambridge Epigenetix Limited
Publication of WO2019239218A2
Publication of WO2019239218A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism

Definitions

  • the systems and methods as described herein may provide an approach allowing real-time long-read sequencing of nucleic acid molecules.
  • the methods disclosed herein are an improvement in the field of sequencing.
  • An aspect of the present disclosure provides a method.
  • the method may comprise: (a) associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled hydroxymethylated base; (b) oxidizing the labeled hydroxymethylated base; and (c) identifying the hydroxymethylated base by sequencing the target nucleic acid sequence, wherein the sequencing comprises nanopore sequencing.
  • the hydroxymethylated base may comprise a pyrimidine.
  • the pyrimidine may be a cytosine.
  • the hydroxymethylated base may comprise a 5-hydroxymethylated base.
  • the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine.
  • the moiety may comprise a glucose moiety.
  • the method may comprise, before the identifying, oxidizing the moiety.
  • the oxidizing may be carried out by an oxidizing agent.
  • the oxidizing agent may comprise sodium periodate.
  • the target nucleic acid sequence may further comprise a formylated base.
  • the formylated base may comprise a 5-formylated base.
  • the 5-formylated base may comprise a 5-formylcytosine.
  • the method may further comprise associating the formylated base with a second moiety.
  • the second moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative of any of these, or any combination thereof.
  • the target nucleic acid sequence may further comprise a carboxylic acid containing base.
  • the carboxylic acid containing base may comprise a 5-carboxylated base.
  • the 5-carboxylated base may comprise a 5-carboxycytosine.
  • the method may further comprise associating the carboxylic acid containing base with a third moiety.
  • the third moiety may comprise an anisidine, a carbodiimide, a p-xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative of any of these, or any combination thereof.
  • the target nucleic acid sequence may further comprise a methylated base.
  • the methylated base may comprise a 5-methylated base.
  • the 5-methylated base may comprise a 5-methylcytosine.
  • the target nucleic acid sequence may comprise DNA or RNA.
  • the target nucleic acid sequence may further comprise an N6-methyladenine, an N6-hydroxymethyladenine, an N6-formyladenine, a 2'-O-methyladenosine, an N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
  • a size of a nanopore may be at most one nanometer.
  • at least one nanopore used in the nanopore sequencing may be a biological nanopore.
  • the moiety may be at least two moieties.
  • the identifying may comprise employing a trained algorithm.
  • the hydroxymethylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method.
  • a hydroxymethylated base, a methylated base, a carboxylic acid containing base, or a formylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the base using a different sequencing method.
  • the different sequencing method may be Illumina sequencing.
  • the identifying may comprise identifying an unmodified base.
  • the unmodified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method.
  • the different sequencing method may be Illumina sequencing.
  • the unmodified base may be a cytosine, a thymine, a uracil, an adenine, or a guanine.
  • the method may comprise: (a) associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a formylated base, or a carboxylic acid containing base; and (b) identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing.
  • the epigenetically modified base may comprise a pyrimidine.
  • the pyrimidine may be a cytosine.
  • the epigenetically modified base may further comprise a hydroxymethylated base.
  • the hydroxymethylated base may comprise a 5-hydroxymethylated base.
  • the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine.
  • the moiety may comprise a glucose moiety.
  • the method may further comprise, before the identifying, oxidizing the moiety.
  • the oxidizing may be carried out by an oxidizing agent.
  • the oxidizing agent may comprise sodium periodate.
  • the epigenetically modified base may comprise a formylated base.
  • the formylated base may comprise a 5-formylated base.
  • the 5-formylated base may comprise a 5-formylcytosine.
  • the moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative of any of these, or any combination thereof.
  • the epigenetically modified base may comprise a carboxylic acid containing base.
  • the carboxylic acid containing base may comprise a 5-carboxylated base.
  • the 5-carboxylated base may comprise a 5-carboxycytosine.
  • the moiety may comprise an anisidine, a carbodiimide, a p-xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative of any of these, or any combination thereof.
  • the epigenetically modified base may further comprise a methylated base.
  • the methylated base may comprise a 5-methylated base.
  • the 5-methylated base may comprise a 5-methylcytosine.
  • the target nucleic acid sequence may comprise DNA or RNA.
  • the epigenetically modified base may further comprise an N6-methyladenine, an N6-hydroxymethyladenine, an N6-formyladenine, a 2'-O-methyladenosine, an N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
  • a size of a nanopore may be at most one nanometer.
  • at least one nanopore used in the nanopore sequencing may be a biological nanopore.
  • the moiety may be at least two moieties.
  • the identifying may comprise employing a trained algorithm.
  • the hydroxymethylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method.
  • an epigenetically modified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the epigenetically modified base using a different sequencing method.
  • a hydroxymethylated base, a methylated base, a carboxylic acid containing base, or a formylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the base using a different sequencing method.
  • the different sequencing method may be Illumina sequencing.
  • the identifying may comprise identifying an unmodified base.
  • the unmodified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method.
  • the different sequencing method may be Illumina sequencing.
  • the unmodified base may be a cytosine, a thymine, a uracil, an adenine, or a guanine.
  • FIG. 1 shows a computer control system that may be programmed or otherwise configured to implement methods provided herein.
  • FIG. 2A shows a labeled 5-hmC.
  • FIG. 2B shows a fully perpendicular single stranded DNA with glucosylated 5-hmC.
  • FIG. 2C shows an inner diameter of mutant CsgG pore.
  • FIG. 2D shows an oxidized glucose labeled 5-hmC.
  • FIG. 3A shows a control DNA sequence amplified from a plasmodium DNA.
  • FIG. 3B shows CpG distribution in a genome.
  • FIG. 3C shows input sample sizes of a control DNA sequence, a target DNA sequence with glucosylated 5-hmC, and a target DNA sequence wherein the glucosylated 5-hmC is oxidized.
  • FIG. 3D shows library sample sizes of a control DNA sequence, a target DNA sequence with glucosylated 5-hmC, and a target DNA sequence wherein the glucosylated 5-hmC is oxidized.
  • FIG. 3E shows no pore blockage.
  • FIG. 3F shows insertion size distributions of the control DNA sequence.
  • FIG. 3G shows insertion size distributions of the target DNA sequence with glucosylated 5-hmC.
  • FIG. 3H shows insertion size distributions of the target DNA sequence wherein the glucosylated 5-hmC is oxidized.
  • FIG. 4A shows an example where base modifications cause basecalling errors.
  • FIG. 4B shows a zoom-in view of 4A.
  • FIG. 4C shows an example of a raw signal alignment.
  • FIG. 5 shows a workflow to identify 5-hmC by nanopore sequencing.
  • FIG. 6A shows a labeled 5-fC with a hydroxylamine derivative.
  • FIG. 6B shows a labeled 5-fC with a hydrazine derivative.
  • FIG. 6C shows a labeled 5-fC with 1,3-indandione.
  • FIG. 6D shows a labeled 5-caC with p-anisidine.
  • FIG. 7A shows sequences of a control DNA sequence amplified from plasmodium DNA.
  • FIG. 7B shows an example of CpG distribution in a genome.
  • FIG. 7C shows different input sample sizes of a control DNA sequence and target DNA sequence with a labeled 5-fC.
  • FIG. 7D shows different library sample sizes of control DNA sequence and target DNA sequence with labeled 5-fC.
  • FIG. 7E shows insertion size distributions of the control DNA sequence.
  • FIG. 7F shows the insertion size distributions of the target DNA sequence with labeled 5-fC.
  • FIG. 8A shows an example of basecalling errors caused by epigenetic modifications.
  • FIG. 8B shows a zoom-in view of 8A.
  • FIG. 8C shows a raw signal alignment.
  • FIG. 9 shows 5-caC labeled with ethyl-3-[3-(dimethylamino)propyl]-carbodiimide hydrochloride (EDC).
  • FIG. 10A shows sequences of control DNA sequence amplified from plasmodium DNA.
  • FIG. 10B shows an example of CpG distribution in a genome.
  • FIG. 10C shows different input sample sizes of control DNA sequence and target DNA sequence with 5-caC labeled by p-Xylylenediamine.
  • FIG. 10D shows different library sample sizes of control DNA sequence and target DNA sequence with 5-caC labeled by p-xylylenediamine.
  • FIG. 10E shows insertion size distributions of the control DNA sequence.
  • FIG. 10F shows insertion size distributions of the target DNA sequence with labeled 5-caC.
  • FIG. 11A shows an example of basecalling errors caused by an epigenetic modification.
  • FIG. 11B shows a zoom-in view of 11A.
  • FIG. 11C shows an example of a raw signal alignment.
  • FIG. 12A shows a method for using bioinformatics software for epigenetic modification calling.
  • FIG. 12B shows a method for using bioinformatics software for epigenetic modification calling.
  • FIG. 13A shows bioanalyzer results for 5-hmC modification.
  • FIG. 13B shows the size distributions of 1 kb-hmC and glucosylated 1 kb-hmC.
  • FIG. 13C shows an IGV view of reads mapped to the reference.
  • FIG. 14A shows result of bioanalyzer for 5-fC modification.
  • FIG. 14B shows the bioanalyzer result with 1/3 dilution for 5-fC modification.
  • FIG. 14C shows there are no signs of pore blockage.
  • FIG. 14D shows the insert size distributions of C, 5-fC, and 5-fC-HA.
  • FIG. 14E shows that the modification causes basecalling errors.
  • FIG. 14F shows a zoom-in view of the basecalling errors caused by the modification.
  • FIG. 14G shows the raw signal analysis of TTACT kmer.
  • FIG. 15A shows the result of the bioanalyzer after p-Xylylenediamine modification.
  • FIG. 15B shows the bioanalyzer results with 1/3 dilution for 5-caC modification.
  • FIG. 15C shows the raw signal analysis of ACTAT.
  • FIG. 16 shows the chemical structures of 5-hmC, 5-mC, 5-fC, and 5-caC.
  • FIG. 17 shows sample sizes of a 2kb PCR product with dCTP, a 2kb PCR product with d5mCTP, a 2kb PCR product with dCTP, and a 2kb PCR product with d5mCTP.
  • FIG. 18 shows that modifications cause basecalling errors.
  • FIG. 19A shows overlaid raw signals of raw signal analysis (ONT Tombo).
  • FIG. 19B shows densities of overlaid raw signals of raw signal analysis (ONT Tombo).
  • FIG. 20 shows that modifications cause basecalling errors.
  • FIG. 21A shows overlaid raw signals of raw signal analysis (ONT Tombo).
  • FIG. 21B shows densities of overlaid raw signals of raw signal analysis (ONT Tombo).
  • FIG. 22A-FIG. 22E show use of a De Bruijn sequence to collect k-mer information.
  • FIG. 23A-FIG. 23B show use of a De Bruijn sequence to collect k-mer information.
  • FIG. 24 shows steps of one exemplary method of making a k-mer sequence including (1) gBlock PCR amplification, (2) BsaI digestion, (3) ligation, (4) excision and cloning, (5) reamplification from positive clones, and (6) Sanger sequencing, with 2/13 clones without mutation.
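For orientation on the De Bruijn sequences named in the captions for FIG. 22 and FIG. 23: a De Bruijn sequence of order k over the four DNA bases contains every possible k-mer exactly once when read cyclically, which is why a single construct can sample all k-mer contexts. The sketch below uses the textbook Lyndon-word construction. It is an illustration only and is not taken from the patent; the function name `de_bruijn` and the choice of k = 3 are assumptions.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic De Bruijn sequence of order k over the given alphabet."""
    size = len(alphabet)
    work = [0] * (size * k)      # scratch array used by the recursion
    out = []                     # indices into `alphabet`, in output order

    def extend(t: int, p: int) -> None:
        if t > k:
            # A Lyndon word whose length divides k contributes to the output.
            if k % p == 0:
                out.extend(work[1:p + 1])
            return
        work[t] = work[t - p]
        extend(t + 1, p)
        for symbol in range(work[t - p] + 1, size):
            work[t] = symbol
            extend(t + 1, t)

    extend(1, 1)
    return "".join(alphabet[i] for i in out)


def cyclic_kmers(seq: str, k: int) -> list[str]:
    """List every k-mer of seq, wrapping around the end to the start."""
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]


sequence = de_bruijn("ACGT", 3)
kmers = cyclic_kmers(sequence, 3)
print(len(sequence), len(set(kmers)))
```

A linear construct needs the first k-1 bases appended to the end so that the wrap-around k-mers are physically present, which is what `cyclic_kmers` emulates.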
  • FIG. 25 shows effect of spiking in mC - data from Nanopore Tombo development.
  • FIG. 26 shows a summary of data collected using Nanopore and Illumina sequencing platforms.
  • FIG. 27A-FIG. 27B show read filtering and demultiplexing of different runs.
  • FIG. 27A shows Run659.
  • FIG. 27B shows Run669.
  • FIG. 27C shows Run671.
  • FIG. 28A shows IGV screenshot of aligned reads from bisulphite sequencing of the De Bruijn sequence.
  • FIG. 28B shows extent of 5-mC labeling of various CpGs in M.SssI labeled (top) and unlabeled sequences (bottom).
  • FIG. 29 shows percent modified C labelling of mC group, hmC group, and ghmC group.
  • FIG. 30A-FIG. 30C show datasets for 5 different kmers for both unmodified and M.SssI mC.
  • FIG. 31A-FIG. 31C show datasets for 5 different kmers for both unmodified and M.SssI mC.
  • FIG. 32 shows a dataset for 50 positions in the De Bruijn sequence for unmodified and modified reads.
  • FIG. 33A shows differences in the ion current level sequences taken with DNA containing methylation (hydroxymethylation) and DNA without methylation.
  • FIG. 33B shows raw Tombo trace (top) and extracted data processed in R (bottom).
  • FIG. 34A-FIG. 34C show an exemplary method for calling modifications.
  • FIG. 35A shows global positional differences in the signal for 5-hmC and 5-ghmC for the De Bruijn sequence with position 188 highlighted (blue).
  • FIG. 35B shows overlap in the signal for C, 5-mC, 5-hmC, 5-ghmC at a single position (188).
  • FIG. 35C shows raw aligned traces demonstrating signal differences for 5-hmC and 5-ghmC at a single position (188).
  • FIG. 36A shows global positional differences in the signal for 5-hmC and 5-ghmC for the De Bruijn sequence with position 188 highlighted (blue).
  • FIG. 36B shows signal intensity differences for 5-mC, 5-hmC and 5-ghmC for the De Bruijn sequence with position 188 highlighted.
  • FIG. 37 shows a full dataset of modifications.
  • FIG. 38 shows a full dataset for modifications.
  • FIG. 39 shows signal intensity differences between cytosine, mC, hmC, and ghmC.
  • FIG. 40A shows Nanopore trace differences for soft-labelling with PCR, mC (red).
  • FIG. 40B shows a dataset for 5 different kmers for M.SssI treated data.
  • FIG. 41A shows signal differences between unmodified and modified C in the context of CpG containing kmers with an example of where ghmC enhances differentiation between hmC and mC.
  • FIG. 41B shows signal differences between unmodified and modified C in the context of CpG containing kmers showing a region containing a CpG where it is difficult to distinguish between mC and hmC.
  • FIG. 42 shows a heatmap from FIG. 41A, with the sequence of each kmer alongside its respective heatmap.

DETAILED DESCRIPTION

  • a method as described herein may comprise associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a hydroxymethylated base, a formylated base, or a carboxylic acid containing base; and identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing, wherein the sequencing is performed without an enzyme associated with the target nucleic acid sequence.
  • nanopore sequencing can be used to differentiate a first epigenetically modified base from a second epigenetically modified base.
  • a target sequence may include any sequence for which a method as described herein is used to identify a base.
  • a target sequence may include any sequence for which a method as described herein identifies a modification in that target sequence.
  • a target nucleic acid sequence may include any nucleic acid sequence that a method as described herein identifies a modification in that target sequence.
  • the method described may be configured to determine whether a modified base, for example a modified cytosine, can pass through a nanopore.
  • the method described may be configured to determine whether a moiety can be associated with a base, for example a modified base, and pass through a nanopore.
  • identification of a modified base as described herein can increase accuracy of detecting an unmodified base.
  • the method as described may increase the signal-to-noise levels for determination.
  • the method as described may increase the signal-to-noise levels for determining the presence or absence of an epigenetically modified base.
  • bioinformatics software e.g. Tombo
  • the method described may include procedures such as fragmentation of native DNA (or of DNA after modification), end repair, adapter ligation, and sequencing on a MinION.
  • the method described herein may be used to gather data about a modification or about the association of a moiety with a modification, wherein the data can be used to train a machine learning algorithm.
  • the method described herein can detect or identify a modified base or an unmodified base using an algorithm.
  • 5-hmC can be detected by nanopore sequencing with the method detailed herein.
  • Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods to determine multiple epigenetic modifications including, but not limited to, 5-formylcytosine (5-fC), 5-carboxycytosine (5-caC), 5-methylcytosine (5-mC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7
  • a plurality of nucleic acid molecules may be first obtained.
  • the plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid.
  • the plurality of nucleic acid molecules may comprise one or more epigenetic modifications.
  • the one or more epigenetic modifications may comprise 5-hmC.
  • a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base.
  • the moiety may associate with (such as bind to) the epigenetically modified base with an aid, such as an enzyme.
  • the moiety may associate with the epigenetically modified base by click chemistry.
  • the moiety may be a glucose moiety.
  • the glucose moiety may be a uridine diphosphate glucose (UDPG).
  • the glucose moiety may be added to differentiate the electric signals of 5-hmC from those of 5-methylcytosine (5-mC) when performing nanopore sequencing.
  • the electric signals of 5-hmC may be similar to those of 5-methylcytosine (5-mC) because the two bases differ by only one oxygen atom.
  • the glucose moiety may be added to enhance the electric signals of 5-hmC.
  • the glucose moiety can be added (or 5-hmC can be glucosylated) by T4 beta-glucosyltransferase.
  • the size of the fully perpendicular single stranded DNA with glucosylated 5-hmC may be much bigger than the 0.9 nm size of the current nanopore shown in FIG. 2C.
  • the fully perpendicular single stranded DNA with glucosylated 5-hmC may be tilted when passing the nanopore.
  • FIG. 2C shows an inner diameter of mutant CsgG, the pore used by the current R9 flowcell of Oxford Nanopore Technologies (ONT). Mutant CsgG may be embedded in a flow cell for Nanopore sequencers. The inner diameter may be about 9 Angstroms.
  • Single stranded DNA/RNA may pass through the pore to give different electric currents for base detection.
  • the glucose moiety on the 5-hmC can be broken down.
  • the breaking down can be carried out by an oxidizing agent.
  • the oxidizing agent may be sodium periodate.
  • the oxidation product may be smaller and more flexible, which may allow it to pass through the nanopore.
  • a plurality of steps may be performed for nanopore sequencing.
  • a plasmodium DNA sequence may be prepared.
  • the plasmodium DNA sequence may have low GC content.
  • the plasmodium DNA sequence may be amplified by PCR with 5-hmCTP/dGTP/dATP/dTTP to form a control DNA sequence.
  • the 5-hmC may be modified by glucosylation to form a target DNA sequence with glucosylated 5-hmC.
  • the glucosylated 5-hmC may be further modified by periodate oxidation to form a target DNA sequence with oxidized and glucosylated 5-hmC.
  • FIG. 3A shows the sequences of the control DNA sequence amplified from the plasmodium DNA.
  • the control DNA may be 2152 bp.
  • the GC content of the control DNA sequence may be about 15.1%.
  • the cytosine distribution of the control DNA sequence may be even, mimicking the CpG distribution in the genome shown in FIG. 3B.
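The two properties described for the control sequence, GC content and how evenly cytosines are spread, can be checked with a few lines of code. The sketch below is a minimal illustration; the helper names and the toy sequence are hypothetical and are not the control DNA of FIG. 3A.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of each C that is immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cytosine_gaps(seq: str) -> list[int]:
    """Return distances between consecutive cytosines, a crude evenness check."""
    positions = [i for i, base in enumerate(seq.upper()) if base == "C"]
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical AT-rich toy sequence, used only to exercise the helpers.
toy = "ATTACGTTTAATCATTTACGATTTA"
print(f"GC content: {gc_content(toy):.1%}")
print("CpG positions:", cpg_positions(toy))
print("Gaps between cytosines:", cytosine_gaps(toy))
```

Roughly equal gaps would indicate an even cytosine distribution; a real check would compare the gap distribution against the genome of interest.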
  • FIG. 3C shows different sample sizes of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC.
  • FIG. 3C shows different input sample sizes of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC.
  • FIG. 3D shows different library sample sizes of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC.
  • the size of the target DNA sequence with oxidized and glucosylated 5-hmC may be larger than the size of the target DNA sequence with glucosylated 5-hmC, which may be larger than the size of control DNA sequence.
  • FIG. 3E shows there is no pore blockage.
  • FIGs. 3F-3H show the insertion size distributions of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC, respectively.
  • FIG. 4A shows an example in which modifications cause basecalling errors.
  • FIG. 4B shows a zoom-in view of the same example.
  • the correct basecalls may be in gray, and the erroneous basecalls may be in darker shading.
  • Modifications may confuse the basecalling software and cause errors in cytosine determination.
  • the errors may appear at G because the C is in the complementary strand.
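The strand relationship behind this observation can be made concrete: a modified C on the reverse strand sits opposite a G on the forward reference, so a miscall at that C is displayed at a G position when reads are viewed against the forward strand. The sketch below is a minimal illustration with hypothetical function names and a toy sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_position(reverse_index: int, length: int) -> int:
    """Map a 0-based index on the reverse-strand read to the forward reference."""
    return length - 1 - reverse_index


def reverse_strand_cytosines(forward: str) -> list[int]:
    """Forward-reference positions whose partner on the reverse strand is a C."""
    reverse = reverse_complement(forward)
    return sorted(
        forward_position(i, len(forward))
        for i, base in enumerate(reverse)
        if base.upper() == "C"
    )


forward = "ATGGA"  # hypothetical toy reference
sites = reverse_strand_cytosines(forward)
print(reverse_complement(forward), sites, [forward[i] for i in sites])
```

Every position returned by `reverse_strand_cytosines` holds a G on the forward strand, which matches the display described above.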
  • the data gathered about the modification can be used for data training for machine learning algorithms.
  • FIG. 4C shows an example of a raw signal alignment. Upon manually aligning electric signals obtained through nanopore sequencing, there may be differences between the control DNA sequence and the target DNA sequence with glucosylated 5-hmC in ACT motifs.
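Once signals from control and labeled reads are aligned to the same reference positions, the comparison can be expressed as a per-position effect size. The sketch below uses Cohen's d as one simple choice of statistic. This is an assumption for illustration and is not the statistic used by Tombo or described in the patent; the current values are invented toy numbers.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(control: list[float], treated: list[float]) -> float:
    """Cohen's d between two sets of per-read current levels at one position."""
    n1, n2 = len(control), len(treated)
    pooled_var = (
        (n1 - 1) * stdev(control) ** 2 + (n2 - 1) * stdev(treated) ** 2
    ) / (n1 + n2 - 2)
    diff = mean(treated) - mean(control)
    if pooled_var == 0:
        # No spread at all: identical means give 0, otherwise an infinite shift.
        if diff == 0:
            return 0.0
        return float("inf") if diff > 0 else float("-inf")
    return diff / sqrt(pooled_var)


def shifted_positions(control_by_pos, treated_by_pos, threshold=1.0):
    """Return positions present in both inputs whose |d| meets the threshold."""
    shared = sorted(control_by_pos.keys() & treated_by_pos.keys())
    return [
        pos
        for pos in shared
        if abs(effect_size(control_by_pos[pos], treated_by_pos[pos])) >= threshold
    ]


# Toy normalized current levels keyed by reference position (hypothetical).
control = {0: [1.0, 1.1, 0.9], 1: [0.0, 0.1, -0.1]}
labeled = {0: [1.0, 1.05, 0.95], 1: [1.0, 1.1, 0.9]}
print(shifted_positions(control, labeled))
```

Each position needs at least two reads per sample for the standard deviation to be defined, and the threshold of 1.0 is arbitrary.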
  • FIG. 5 shows an example of a workflow to determine 5-hmC by nanopore sequencing.
  • a fragmented DNA may be prepared.
  • the fragmented DNA may comprise one or more epigenetic modifications on one or both strands.
  • a glucose moiety may be associated with the epigenetic modification to form a target DNA fragment with glucosylated 5-hmC.
  • the glucose moiety may be UDPG.
  • the target DNA fragment with glucosylated 5-hmC may be end repaired and ligated with adaptors. Then, after the end repair and adaptor ligation, the target DNA fragment with glucosylated 5-hmC can go through nanopore sequencing.
  • a 5-hmC disclosed herein can result from oxidation of a 5-mC via an oxidizing agent.
  • an oxidizing agent may comprise a perruthenate, a metal oxo complex, or a combination thereof.
  • an oxidizing agent may comprise a perruthenate and a metal oxo complex.
  • the metal oxo complex may be a metal VI oxide, a metal VII oxide, or a combination thereof.
  • an oxidizing agent may comprise hydrogen peroxide, fluorine, chlorine, nitric acid, sulfuric acid, peroxydisulfuric acid, peroxymonosulfuric acid, chlorite, chlorate, perchlorate, hypochlorite, permanganate, sodium perborate, nitrous oxide, potassium nitrate, sodium bismuthate, or any combination thereof.
  • an oxidizing agent can be an enzyme.
  • An oxidizing agent may oxidize 5-mC to 5-hmC, 5-fC, 5-caC, or any combination thereof.
  • An oxidizing agent may oxidize 5-hmC to 5-fC, 5-caC, or any combination thereof.
  • An oxidizing agent may selectively oxidize 5-mC to 5-hmC.
  • An oxidizing agent may selectively oxidize 5-mC to 5-fC.
  • An oxidizing agent may selectively oxidize 5-mC to 5-caC.
  • An oxidizing agent may selectively oxidize 5-hmC to 5-fC.
  • An oxidizing agent may selectively oxidize 5-hmC to 5-caC.
  • an enzyme may comprise a ten-eleven translocation (TET) family enzyme.
  • an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof.
  • a 5-hmC disclosed herein can result from reduction of a 5-fC or 5-caC via a reducing agent.
  • a reducing agent may comprise pic-borane.
  • a reducing agent may comprise NaBH4, NaCNBH3, or LiBH4.
  • a reducing agent may comprise lithium aluminum hydride, sodium amalgam, amalgam, diborane, sodium borohydride, sulfur dioxide, dithionate, thiosulfate, iodide, hydrogen peroxide, hydrazine, diisobutylaluminum hydride, oxalic acid, carbon monoxide, cyanide, ascorbic acid, formic acid, dithiothreitol, beta-mercaptoethanol, or any combination thereof.
  • a reducing agent may reduce 5-caC to 5-fC, 5-hmC, 5-mC or any combination thereof.
  • a reducing agent may reduce 5-fC to 5-hmC, 5-mC or any combination thereof.
  • a reducing agent may selectively reduce 5-caC to 5-fC.
  • a reducing agent may selectively reduce 5-caC to 5-hmC.
  • a reducing agent may selectively reduce 5-caC to 5-mC.
  • a reducing agent may selectively reduce 5-fC to 5-hmC.
  • a reducing agent may selectively reduce 5-fC to 5-mC.
  • a reducing agent may selectively reduce 5-hmC to 5-mC.
  • a reducing agent may reduce 5-caC to 5-fC such that substantially no other epigenetic modification is reduced.
  • a reducing agent may reduce 5-caC to 5-hmC such that substantially no other epigenetic modification is reduced.
  • a reducing agent may reduce 5-caC to 5-mC such that substantially no other epigenetic modification is reduced.
  • a reducing agent may reduce 5-fC to 5-hmC such that substantially no other epigenetic modification is reduced.
  • a reducing agent may reduce 5-fC to 5-mC such that substantially no other epigenetic modification is reduced.
  • a reducing agent may reduce 5-hmC to 5-mC such that substantially no other epigenetic modification is reduced.
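The oxidation and reduction statements above all move a cytosine modification along one ordered series: 5-mC, 5-hmC, 5-fC, 5-caC. The sketch below encodes that ordering and lists which states are reachable in each direction. The names are hypothetical, and the 5-fC to 5-caC oxidation step follows from the ordering rather than from an explicit statement in the list above.

```python
# Cytosine modifications ordered from most reduced to most oxidized.
OXIDATION_LADDER = ["5-mC", "5-hmC", "5-fC", "5-caC"]


def oxidation_products(base: str) -> list[str]:
    """States reachable from `base` by oxidation, nearest first."""
    index = OXIDATION_LADDER.index(base)
    return OXIDATION_LADDER[index + 1:]


def reduction_products(base: str) -> list[str]:
    """States reachable from `base` by reduction, nearest first."""
    index = OXIDATION_LADDER.index(base)
    return OXIDATION_LADDER[:index][::-1]


for state in OXIDATION_LADDER:
    print(state, "oxidizes to", oxidation_products(state),
          "and reduces to", reduction_products(state))
```

A selective agent in the sense used above corresponds to picking exactly one entry from the returned list rather than any combination of them.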
  • An epigenetic modification may be reduced in the presence of a reducing agent and a co-factor.
  • An epigenetic modification may be oxidized in the presence of an oxidizing agent and a co-factor.
  • the co-factor may comprise SALL4A, Fe2+, 2-oxoglutarate, ATP, or any combination thereof.
  • An epigenetically modified base may be deaminated by a cytidine deaminase, such as APOBEC1.
  • the epigenetically modified base may be a 5-mC, 5-hmC, 5-fC, 5-caC or any combination thereof.
  • the deamination may occur before or after associating a moiety with the epigenetically modified base.
  • the deamination may occur before or after reducing or oxidizing the epigenetically modified base.
  • the method may comprise: associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: oxidizing a hydroxymethylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: reducing a hydroxymethylated base; associating a moiety with the reduced base; and identifying the labeled reduced base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a methylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: oxidizing a methylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a methylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: reducing a methylated base; associating a moiety with the reduced base; and identifying the labeled reduced base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a formylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: oxidizing a formylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a formylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: reducing a formylated base; associating a moiety with the reduced base; and identifying the labeled reduced base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a carboxylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: oxidizing a carboxylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: associating a moiety with a carboxylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base.
  • the identifying comprises nanopore sequencing.
  • the method may comprise: reducing a carboxylated base; associating a moiety with the reduced base; and identifying the labeled reduced base.
  • the identifying comprises nanopore sequencing.
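By way of non-limiting illustration, the method variants listed above can be viewed as combinations of a base type, a reaction (oxidizing or reducing), and an order of operations (labeling before or after the reaction). The following sketch enumerates those combinations. The names `BASES`, `REACTIONS`, `ORDERS`, and `method_variants` are hypothetical and are introduced only for this example.

```python
from itertools import product

# Illustrative labels only; these identifiers are assumptions for this sketch.
BASES = ["hydroxymethylated", "methylated", "formylated", "carboxylated"]
REACTIONS = ["oxidizing", "reducing"]
ORDERS = ["label_first", "react_first"]


def method_variants():
    """Return one ordered step list per (base, reaction, order) combination."""
    variants = []
    for base, reaction, order in product(BASES, REACTIONS, ORDERS):
        label = f"associating a moiety with the {base} base"
        react = f"{reaction} the {base} base"
        # The two workflows differ only in whether labeling precedes the reaction.
        steps = [label, react] if order == "label_first" else [react, label]
        steps.append("identifying the resulting base by nanopore sequencing")
        variants.append(
            {"base": base, "reaction": reaction, "order": order, "steps": steps}
        )
    return variants


if __name__ == "__main__":
    for variant in method_variants():
        print(variant["base"], variant["reaction"], variant["order"])
```

Four base types, two reactions, and two orders give sixteen variants, matching the sixteen method statements above.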
  • a first portion or aliquot of a nucleic acid sample may be subjected to a reducing agent or an oxidizing agent, wherein a second portion or aliquot of the nucleic acid sample may not be.
  • a first portion or aliquot of a nucleic acid sample may be subjected to deamination, wherein a second portion or aliquot of the nucleic acid sample may not be.
  • a first portion or aliquot of a nucleic acid sample may be subjected to association with a moiety, wherein a second portion or aliquot of the nucleic acid sample may not be.
  • 5-fC can be detected by nanopore sequencing with the method detailed herein.
  • Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods of determining multiple epigenetic modifications including, but not limited to, 5-hydroxymethylcytosine (5-hmC), 5-carboxycytosine (5-caC), 5-methylcytosine (5-mC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7-
  • a plurality of nucleic acid molecules may be first obtained.
  • the plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid.
  • the plurality of nucleic acid molecules may comprise one or more epigenetic modifications.
  • the one or more epigenetic modifications may comprise 5-fC.
  • a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base.
  • the moiety may associate with a plurality of epigenetically modified bases.
  • the moiety may associate with (such as bind to) the epigenetically modified base with an aid, such as an enzyme.
  • the moiety may associate with the epigenetically modified base by click chemistry.
  • the moiety may be hydroxylamine derivatives, hydrazine derivatives, 1,3-indandione, p-anisidine, or any combination thereof.
  • a Plasmodium DNA sequence may be prepared.
  • the Plasmodium DNA sequence may have low GC content.
  • the Plasmodium DNA sequence may be amplified by PCR with 5-fCTP/dGTP/dATP/dTTP to form a control DNA sequence.
  • FIG. 7A shows the sequences of the control DNA sequence amplified from the Plasmodium DNA.
  • the control DNA may be 1533 bp.
  • the GC content of the control DNA sequence may be about 14.2%.
  • the cytosine distribution of the control DNA sequence may be even, mimicking CpG distribution in the genome, as shown in FIG. 7B.
  • 5-hmC may be mainly in CpG islands of the human genome, which may be less than about 1% of the human genome.
  • all cytosines may become 5-hmC in addition to CpG context.
  • the cytosine content may be from about 20% to about 30%.
  • Plasmodium may have an AT rich genome.
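As a non-limiting sketch, the GC content and cytosine spacing of a candidate control sequence may be checked computationally before use. The function names `gc_content` and `cytosine_spacing` are hypothetical, and the short sequence in the usage example is a toy input, not the control sequence of FIG. 7A.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def cytosine_spacing(seq: str) -> list:
    """Return the distances between consecutive cytosines in seq."""
    positions = [i for i, base in enumerate(seq.upper()) if base == "C"]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "ATTTACATTTAGATTTACATTTTA"  # toy AT-rich input, assumed for illustration
    print(round(gc_content(toy), 1), cytosine_spacing(toy))
```

Roughly uniform spacing values would suggest an even cytosine distribution, while a low `gc_content` value is consistent with an AT-rich source genome.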
  • FIG. 7C shows different sample sizes of the control DNA sequence and the target DNA sequence with modified 5-fC.
  • FIG. 7C shows different input sample sizes of the control DNA sequence and the target DNA sequence with modified 5-fC.
  • the target DNA sequence with modified 5-fC may be treated by hydroxylamine (HA).
  • FIG. 7D shows different library sample sizes of the control DNA sequence and the target DNA sequence with modified 5-fC.
  • the size of the target DNA sequence with modified 5-fC may be larger than the size of control DNA sequence.
  • FIGs. 7E-7F show the insertion size distributions of the control DNA sequence and the target DNA sequence with modified 5-fC. There may be no differences between the control DNA sequence and the target DNA sequence with modified 5-fC because the reaction conditions of DNA modification may not cause DNA damage.
  • FIG. 8A shows an example wherein modifications may cause basecalling errors.
  • FIG. 8B shows a zoomed-in view of the same example, in which modifications cause basecalling errors.
  • correct base calls may be shown in gray, and erroneous base calls may be represented by dark lines.
  • Modifications may confuse the basecalling software and cause errors in cytosine determination.
  • the errors may appear at G positions because the modified C is in the complementary strand.
  • the data gathered about the modification can be used as training data for a machine learning algorithm.
  • the errors of the target DNA sequence with modified 5-fC may be larger than the errors of the control DNA sequence.
  • FIG. 8C shows an example of a raw signal alignment.
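The comparison of basecalling errors between the control DNA sequence and the modified target DNA sequence can be sketched as a per-position mismatch rate. This is a minimal illustration, not the analysis used to produce the figures. It assumes reads have already been aligned to the reference without insertions, so that each read has the same length as the reference and uses "-" for a deleted base. The names `mismatch_rates`, `candidate_positions`, and the `min_delta` threshold are hypothetical.

```python
def mismatch_rates(reference, reads):
    """Per-position fraction of base calls that differ from the reference."""
    if not reads:
        raise ValueError("at least one read is required")
    rates = []
    for i, ref_base in enumerate(reference):
        calls = [read[i] for read in reads if read[i] != "-"]
        if not calls:
            rates.append(0.0)  # no coverage at this position
            continue
        rates.append(sum(call != ref_base for call in calls) / len(calls))
    return rates


def candidate_positions(reference, control_reads, treated_reads, min_delta=0.3):
    """Positions where the treated sample has a clearly higher error rate."""
    control = mismatch_rates(reference, control_reads)
    treated = mismatch_rates(reference, treated_reads)
    return [
        i for i, (c, t) in enumerate(zip(control, treated)) if t - c >= min_delta
    ]


if __name__ == "__main__":
    ref = "ACGT"
    print(candidate_positions(ref, ["ACGT", "ACGT"], ["ATGT", "ATGT"]))
```

Positions returned by `candidate_positions` would be candidates for modified bases, and such positions could also serve as labeled examples when assembling training data.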
  • a 5-fC disclosed herein can result from oxidation of a 5-mC or 5-hmC via an oxidizing agent.
  • an oxidizing agent can be an enzyme.
  • an enzyme may comprise a ten-eleven translocation (TET) family enzyme.
  • an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof.
  • 5-caC can be detected by nanopore sequencing with the method detailed herein.
  • Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods of determining multiple epigenetic modifications including, but not limited to, 5-hydroxymethylcytosine (5-hmC), 5-methylcytosine (5-mC), 5-formylcytosine (5-fC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7-mG),
  • a plurality of nucleic acid molecules may be first obtained.
  • the plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid.
  • the plurality of nucleic acid molecules may comprise one or more epigenetic modifications.
  • the one or more epigenetic modifications may comprise 5-caC.
  • a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base.
  • the moiety may associate with (such as bind to) the epigenetically modified bases with an aid, such as an enzyme.
  • the moiety may associate with the epigenetically modified bases by click chemistry.
  • the moiety may be 1-ethyl-3-[3-(dimethylamino)propyl]-carbodiimide hydrochloride (EDC).
  • a Plasmodium DNA sequence may be prepared.
  • the Plasmodium DNA sequence may have low GC content.
  • the Plasmodium DNA sequence may be amplified by PCR with 5-caCTP/dGTP/dATP/dTTP to form a control DNA sequence.
  • the 5-caC may be modified by p-Xylylenediamine.
  • FIG. 10A shows the sequences of the control DNA sequence amplified from the Plasmodium DNA.
  • the control DNA may be 1533 bp.
  • the GC content of the control DNA sequence may be about 14.2%.
  • the cytosine distribution of the control DNA sequence may be even, such as to mimic CpG distribution in the genome (as described above), as shown in FIG. 10B.
  • FIG. 10C shows different sample sizes of the control DNA sequence and the target DNA sequence treated by p-Xylylenediamine.
  • FIG. 10C shows different input sample sizes of the control DNA sequence and the target DNA sequence treated by p-Xylylenediamine.
  • FIG. 10D shows different library sample sizes of the control DNA sequence and the target DNA sequence treated by p-Xylylenediamine.
  • the size of the target DNA sequence with modified 5-caC may be larger than the size of the control DNA sequence.
  • FIGs. 10E-10F show the insertion size distributions of the control DNA sequence and the target DNA sequence with modified 5-caC. There may be differences between the control DNA sequence and the target DNA sequence with modified 5-caC because reaction conditions of DNA modification may cause some damage.
  • FIG. 11A shows an example in which modifications cause basecalling errors.
  • FIG. 11B shows a zoomed-in view of the same example, in which modifications cause basecalling errors.
  • correct base calls may be shown in gray, and erroneous base calls may be represented by dark markings. Modifications may confuse the basecalling software and cause errors in cytosine determination. The errors may be
  • the errors of the target DNA sequence with modified 5-caC may be larger than the errors of the control DNA sequence.
  • FIG. 11C shows an example of a raw signal alignment. Upon manually aligning electric signals obtained through nanopore sequencing, there may be differences between the control DNA sequence and the target DNA sequence with modified 5-caC in the ACTAT k-mer.
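A difference in raw signal at a particular k-mer, such as the ACTAT k-mer noted above, can be quantified by grouping signal levels by k-mer and comparing means between samples. The sketch below assumes the signal has already been segmented and assigned to k-mers, giving `(kmer, level)` pairs. That input format, the function names, and the numeric values in the usage example are assumptions for illustration and are not measurements from the figures.

```python
from collections import defaultdict
from statistics import mean


def mean_level_by_kmer(events):
    """Average the signal level observed for each k-mer."""
    grouped = defaultdict(list)
    for kmer, level in events:
        grouped[kmer].append(level)
    return {kmer: mean(levels) for kmer, levels in grouped.items()}


def kmer_shifts(control_events, treated_events):
    """Return treated-minus-control mean level for k-mers seen in both samples."""
    control = mean_level_by_kmer(control_events)
    treated = mean_level_by_kmer(treated_events)
    shared = control.keys() & treated.keys()
    return {kmer: treated[kmer] - control[kmer] for kmer in shared}


if __name__ == "__main__":
    # Made-up levels, used only to exercise the functions.
    control = [("ACTAT", 80.0), ("ACTAT", 82.0), ("TTTAA", 90.0)]
    treated = [("ACTAT", 90.0), ("ACTAT", 92.0), ("TTTAA", 90.0)]
    print(kmer_shifts(control, treated))
```

A k-mer with a large shift relative to the others would be a candidate location for a labeled modified base.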
  • One or more bioinformatics software programs can be used to determine the epigenetic modifications.
  • the one or more bioinformatics software programs may comprise Tombo and Nanopolish.
  • the advantages of Tombo may be testing-based advantages.
  • the testing-based advantages may comprise: (a) no requirement for training data; (b) identification of modified and unmodified nucleotides in close proximity; (c) detection of chemical modification; or (d) any combination thereof.
  • the advantages of Nanopolish may be model-based advantages.
  • the model-based advantages may comprise: (a) knowing exact chemical modifications; (b) knowing exact modified positions; (c) requiring native DNA after training; or (d) any combination thereof.
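To illustrate what a testing-based approach may look like in general terms, the following sketch compares signal levels at each position between two samples using Welch's t statistic, which requires no training data. It is not the implementation of Tombo or Nanopolish. The names `welch_t` and `flag_positions`, the per-position dictionary input, and the threshold of 4.0 are assumptions made for this example.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    diff = mean(a) - mean(b)
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0.0:
        # Both samples are constant: identical means give 0, otherwise +/- inf.
        return 0.0 if diff == 0.0 else copysign(inf, diff)
    return diff / standard_error


def flag_positions(control, treated, threshold=4.0):
    """Return positions whose signal differs strongly between the two samples.

    control and treated map a reference position to a list of signal levels.
    """
    shared = sorted(control.keys() & treated.keys())
    return [
        pos for pos in shared
        if abs(welch_t(control[pos], treated[pos])) >= threshold
    ]
```

A model-based approach, by contrast, would compare each observation to previously learned expected signals for a specific modification rather than to a second sample.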
  • the one or more bioinformatics software programs may comprise .NET Bio, AMPHORA, Anduril, AutoDock, Bioclipse, Bioconductor, BioJava, BioJS,
  • FIGs. 12A and 12B show examples of methods for using bioinformatics software for epigenetic modification calling.
  • 5-hmC calling is taken as an example.
  • two distinct libraries may be prepared.
  • the first library may comprise a control DNA sequence (with 5-hmC).
  • the second library may comprise a target DNA sequence with glucosylated 5-hmC.
  • the target DNA sequence with glucosylated 5-hmC may be created by associating a glucose moiety to a 5-hmC base of the control DNA sequences.
  • Both the first library and the second library may go through nanopore sequencing so that the control DNA sequence can be compared with the target DNA sequence with glucosylated 5-hmC to generate data related to 5-hmC base calls.
  • a model can be built and trained (e.g., by aggregating data related to 5-hmC base calls) to learn how to interpret signals toward accurate base calling and/or determination of epigenetic modification.
  • Developing a model may comprise analyzing the plurality of associated sequence signals and developing rules for predicting base calls and/or epigenetic modification, based on the comparison between the control DNA sequence and the target DNA sequence with glucosylated 5-hmC.
  • the model may be built and trained (e.g., using machine learning techniques) based on analysis of different electric signals of the control DNA sequence and the target DNA sequence with glucosylated 5-hmC.
  • a model may comprise expected sequence signals corresponding to a glucosylated 5-hmC.
  • models may comprise distributions, medians, averages, or other quantitative measures of sequence signals (e.g., signal amplitudes) corresponding to a glucosylated 5-hmC.
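A model comprising quantitative measures of sequence signals, as described above, can be sketched as a table of per-k-mer summary statistics built separately from the control library and the glucosylated library. This is a simplified illustration under stated assumptions: signals are assumed to be pre-assigned to k-mers as `(kmer, level)` pairs, and the names `build_signal_model` and `closest_state` are hypothetical.

```python
from statistics import mean, median


def build_signal_model(events):
    """Summarize the signal levels observed for each k-mer in one library."""
    grouped = {}
    for kmer, level in events:
        grouped.setdefault(kmer, []).append(level)
    return {
        kmer: {"median": median(levels), "mean": mean(levels), "count": len(levels)}
        for kmer, levels in grouped.items()
    }


def closest_state(kmer, level, control_model, glucosylated_model):
    """Assign an observation to whichever model's median it lies closer to."""
    d_control = abs(level - control_model[kmer]["median"])
    d_glucosylated = abs(level - glucosylated_model[kmer]["median"])
    # Ties are resolved in favor of the control state.
    return "control" if d_control <= d_glucosylated else "glucosylated"


if __name__ == "__main__":
    # Made-up levels, used only to exercise the functions.
    control = build_signal_model([("ACGTA", 70.0), ("ACGTA", 72.0), ("ACGTA", 71.0)])
    gluc = build_signal_model([("ACGTA", 80.0), ("ACGTA", 82.0), ("ACGTA", 81.0)])
    print(closest_state("ACGTA", 79.0, control, gluc))
```

Comparing distances to medians is the simplest possible rule. A fuller model could compare whole distributions instead.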
  • Methods of the present disclosure may comprise algorithms to determine the epigenetic modification.
  • the one or more algorithms may include machine learning algorithms.
  • the machine learning algorithms may comprise supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, reinforcement learning algorithms, deep learning algorithms, or any combination thereof.
  • the machine learning algorithms may also comprise Support Vector Machine (SVM), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), or Multilayer Perceptron (MLP).
  • the algorithms may incorporate training data of known sequences of the control DNA sequence and/or the target DNA sequence with glucosylated 5-hmC.
  • the algorithms may comprise auxiliary outputs, which may include assessments of the quantization noise (e.g., Poisson or binomial random variation) or other quality assessments, including a confidence interval or error assessment of the epigenetic modification.
  • the outputs may also include dynamic assessments of chemistry process parameters (e.g., temperature) and the most likely labeling fraction to account for the observations as well.
  • the trained model may then be applied by one or more trained algorithms (e.g., machine learning algorithms) to predict base calls and/or determination of epigenetic modification.
  • Such predictions may comprise refining or correcting base calls and/or error base calls, which show the epigenetic modification.
  • Such predictions may comprise determining base calls and/or determination of epigenetic modification from a plurality of sequence signals. All of the operations described herein, such as training an algorithm, predicting and/or generating base calls and other operations, such as those described elsewhere herein, are capable of happening in real-time.
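As one non-limiting example of a supervised learning algorithm from the list above, a Gaussian Naive Bayes classifier can be trained on signal features from the control and glucosylated libraries and then applied to new observations. The sketch below is written with the standard library only. The class name `GaussianNaiveBayes`, the choice of features (mean current level and dwell time), and the training values are assumptions made for illustration.

```python
from math import log, pi
from statistics import mean, pvariance


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for small numeric feature vectors."""

    def fit(self, features, labels):
        self.params = {}
        total = len(labels)
        for label in set(labels):
            rows = [x for x, lab in zip(features, labels) if lab == label]
            columns = list(zip(*rows))
            self.params[label] = {
                "prior": len(rows) / total,
                "means": [mean(col) for col in columns],
                # A small constant keeps the variance strictly positive.
                "vars": [pvariance(col) + 1e-6 for col in columns],
            }
        return self

    def _log_posterior(self, x, p):
        score = log(p["prior"])
        for value, mu, var in zip(x, p["means"], p["vars"]):
            score += -0.5 * log(2 * pi * var) - (value - mu) ** 2 / (2 * var)
        return score

    def predict(self, x):
        return max(
            self.params, key=lambda lab: self._log_posterior(x, self.params[lab])
        )


if __name__ == "__main__":
    # Made-up (mean level, dwell time) pairs, used only to exercise the class.
    X = [(80.0, 1.0), (81.0, 1.2), (79.0, 0.9), (90.0, 2.0), (91.0, 2.1), (89.0, 1.9)]
    y = ["unmodified"] * 3 + ["modified"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict((88.5, 1.8)))
```

Any of the other listed algorithm families could be substituted. Naive Bayes is used here only because it fits in a short, dependency-free example.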
  • one library may be prepared.
  • the library may comprise control DNA sequences (with 5-hmC).
  • at least one of the control DNA sequences may be glucosylated to form an intermediate DNA sequence with glucosylated 5-hmC.
  • the intermediate DNA sequence with glucosylated 5-hmC may be ligated with barcode and go through copy strand synthesis to form a first DNA sequence and a second DNA sequence.
  • the first DNA sequence may comprise a forward strand from the intermediate DNA sequence with glucosylated 5-hmC and a complementary strand to the forward strand.
  • the second DNA sequence may comprise a reverse strand from the intermediate DNA sequence with glucosylated 5-hmC and a complementary strand to the reverse strand.
  • the forward strand may be complementary to the reverse strand.
  • the library may comprise the first DNA sequence and the second DNA sequence.
  • the library may then go through nanopore sequencing so the first DNA sequence can be compared with the second DNA sequence to generate data related to 5-hmC base calls.
  • the nanopore sequencing data for both DNA sequences may be analyzed with Tombo.
  • a model can be built and trained (e.g., by aggregating data related to 5-hmC base calls) to learn how to interpret signals toward accurate base calling and/or determination of epigenetic modification.
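The single-library workflow above pairs each original strand, which carries the glucosylated 5-hmC, with a newly synthesized complementary strand, which does not. The relationship between those strands can be sketched as follows. The function names, the placement of the barcode at the 5' end, and the toy sequences are assumptions for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_library(forward: str, barcode: str) -> dict:
    """Model the first and second DNA sequences formed by copy strand synthesis.

    Each entry pairs a barcoded original strand with its synthesized copy.
    """
    reverse = reverse_complement(forward)
    first_original = barcode + forward
    second_original = barcode + reverse
    return {
        "first": (first_original, reverse_complement(first_original)),
        "second": (second_original, reverse_complement(second_original)),
    }


if __name__ == "__main__":
    library = build_library("AACG", "TT")  # toy inputs
    for name, (original, copy) in library.items():
        print(name, original, copy)
```

Because the copy strand is synthesized from unmodified nucleotides, its signal can act as an internal unmodified reference for the original strand with which it is paired.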
  • Developing a model may comprise analyzing the plurality of associated sequence signals and developing rules for predicting base calls and/or epigenetic modification, based on the comparison between the first DNA sequence and the second DNA sequence.
  • the model may be built and trained (e.g., using machine learning techniques) based on analysis of electric signals of the first DNA sequence and the second DNA sequence.
  • a model may comprise expected sequence signals corresponding to a glucosylated 5-hmC.
  • models may comprise distributions, medians, averages, or other quantitative measures of sequence signals (e.g., signal amplitudes) corresponding to a glucosylated 5-hmC.
  • Methods of the present disclosure may comprise algorithms to determine the epigenetic modification.
  • the one or more algorithms may include machine learning algorithms.
  • the machine learning algorithms may comprise supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, reinforcement learning algorithms, deep learning algorithms, or any combination thereof.
  • the machine learning algorithms may also comprise Support Vector Machine (SVM), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), or Multilayer Perceptron (MLP).
  • the algorithms may incorporate training data of known sequences of the first DNA sequence and the second DNA sequence.
  • the algorithms may comprise auxiliary outputs, which may include assessments of the quantization noise (e.g., Poisson or binomial random variation) or other quality assessments, including a confidence interval or error assessment of the epigenetic modification.
  • the outputs may include dynamic assessments of chemistry process parameters (e.g., temperature) and the most likely labeling fraction to account for the observations as well.
  • the trained model may then be applied by one or more trained algorithms (e.g., machine learning algorithms) to predict base calls and/or determination of epigenetic modification.
  • Such predictions may comprise refining or correcting base calls and/or determination of epigenetic modification.
  • such predictions may comprise determining base calls and/or determination of epigenetic modification from a plurality of sequence signals. All of the operations described herein, such as training an algorithm, predicting and/or generating base calls and other operations, such as those described elsewhere herein, are capable of happening in real-time.
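The auxiliary outputs described above may include a confidence interval that reflects binomial random variation in the fraction of reads called as modified at a position. One common way to compute such an interval is the Wilson score interval, sketched below. The choice of the Wilson method, the function name `wilson_interval`, and the default `z` of 1.96 (approximately 95% coverage) are assumptions for this example and are not specified by the source.

```python
from math import sqrt


def wilson_interval(modified_reads: int, total_reads: int, z: float = 1.96):
    """Wilson score interval for the fraction of reads called as modified."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= modified_reads <= total_reads:
        raise ValueError("modified_reads must be between 0 and total_reads")
    p = modified_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half_width = (
        z * sqrt(p * (1 - p) / total_reads + z * z / (4 * total_reads ** 2)) / denom
    )
    # Clamp to the valid range for a proportion.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(45, 50)
    print(f"{low:.3f} to {high:.3f}")
```

The interval narrows as read depth increases, which gives a simple error assessment to report alongside each modification call.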
  • the term “about” may mean the referenced numeric indication plus or minus 15% of that referenced numeric indication.
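For clarity, the plus or minus 15% definition can be expressed as a small helper. The function names `about_range` and `is_about` are hypothetical and appear here only as a worked example of the definition.

```python
def about_range(value: float, tolerance: float = 0.15):
    """Return the (low, high) bounds covered by 'about value'."""
    delta = abs(value) * tolerance
    return value - delta, value + delta


def is_about(observed: float, reference: float, tolerance: float = 0.15) -> bool:
    """True if observed lies within plus or minus tolerance of reference."""
    low, high = about_range(reference, tolerance)
    return low <= observed <= high


if __name__ == "__main__":
    # Under this definition, "about 14.2%" spans 14.2 minus and plus 15% of 14.2.
    print(about_range(14.2))
```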
  • fragment may be a portion of a sequence, a subset that may be shorter than a full length sequence.
  • a fragment may be a portion of a gene.
  • a fragment may be a portion of a peptide or protein.
  • a fragment may be a portion of an amino acid sequence.
  • a fragment may be a portion of an oligonucleotide sequence.
  • a fragment may be less than about: 20, 30, 40, or 50 amino acids in length.
  • a fragment may be less than about: 20, 30, 40, or 50 nucleotides in length.
  • epigenetic modification may be any covalent modification of a nucleic acid base.
  • a covalent modification may comprise (i) adding a methyl group, a
  • a covalent modification may occur at any base, such as a cytosine, a thymine, a uracil, an adenine, a guanine, or any combination thereof.
  • an epigenetic modification may comprise an oxidation or a reduction.
  • a nucleic acid sequence may comprise one or more epigenetically modified bases.
  • An epigenetically modified base may comprise any base, such as a cytosine, a uracil, a thymine, adenine, or a guanine.
  • An epigenetically modified base may comprise a methylated base, a hydroxymethylated base, a formylated base, or a carboxylic acid containing base or a salt thereof.
  • An epigenetically modified base may comprise a 5-methylated base, such as a 5-methylated cytosine (5-mC).
  • An epigenetically modified base may comprise a 5-hydroxymethylated base, such as a 5-hydroxymethylated cytosine (5-hmC).
  • An epigenetically modified base may comprise a 5-formylated base, such as a 5-formylated cytosine (5-fC).
  • An epigenetically modified base may comprise a 5-carboxylated base or a salt thereof, such as a 5-carboxylated cytosine (5-caC).
  • an epigenetically modified base may comprise a methyltransferase-directed transfer of a group (such as an mTAG).
  • FIG. 16 shows the chemical structures of 5-hmC, 5-mC, 5-fC, and 5-caC.
  • An epigenetically modified base may comprise one or more bases of a purine (such as Structure 1) or one or more bases of a pyrimidine (such as Structure 2).
  • An epigenetic modification may occur at one or more of any positions.
  • an epigenetic modification may occur at one or more positions of a purine, including positions 1, 2, 3, 4, 5, 6, 7, 8, 9, as shown in Structure 1.
  • an epigenetic modification may occur at one or more positions of a pyrimidine, including positions 1, 2, 3, 4, 5, 6, as shown in Structure 2.
  • a nucleic acid sequence may comprise an epigenetically modified base.
  • a nucleic acid sequence may comprise a plurality of epigenetically modified bases.
  • a nucleic acid sequence may comprise an epigenetically modified base positioned within a CG site, a CpG island, or a combination thereof.
  • a nucleic acid sequence may comprise different epigenetically modified bases, such as a methylated base, a hydroxymethylated base, a formylated base, a carboxylic acid containing base or a salt thereof, a plurality of any of these, or any combination thereof.
  • the term “barcode” as used herein may relate to a natural or synthetic nucleic acid sequence comprised by a polynucleotide allowing for unambiguous identification of the polynucleotide and other sequences comprised by the polynucleotide having said barcode sequence.
  • the number of different barcode sequences theoretically possible can be directly dependent on the length of the barcode sequence; e.g., if a DNA barcode with randomly assembled adenine, thymidine, guanosine and cytidine nucleotides can be used, the theoretical maximal number of barcode sequences possible can be 1,048,576 for a length of ten nucleotides, and can be 1,073,741,824 for a length of fifteen nucleotides.
  • Unique sample identifiers or barcodes can be completely scrambled (e.g., randomers of A, C, G, and T for DNA or A, C, G, and U for RNA) or they can have some regions of shared sequence.
  • a shared region on each end may reduce sequence biases in ligation events.
  • a shared region can be about or at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 common base pairs.
  • a shared region can be up to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 common base pairs.
  • Combinations of barcodes can be added to increase diversity.
  • a barcode may uniquely identify a subject, a sample (such as a cell-free sample), a nucleic acid sequence (such as a sequence having one or more epigenetically modified bases), or any combination thereof.
  • a barcode may be associated with a nucleic acid sequence or a complementary strand.
  • a nucleic acid sequence may comprise a single barcode.
  • a nucleic acid sequence may comprise one or more barcodes, such as a first barcode and a second barcode.
  • the first barcode is different from the second barcode.
  • each barcode of a plurality of barcodes may be a unique barcode.
  • a barcode may comprise a sample identification barcode.
  • a first barcode may comprise a unique barcode and a second barcode may comprise a sample identification barcode.
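The dependence of the number of possible barcode sequences on barcode length, noted above, follows from four choices per nucleotide position, giving 4 raised to the power of the length. The short sketch below reproduces the two figures stated above. The function name `barcode_space` is hypothetical.

```python
def barcode_space(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct barcodes of a given length over the given alphabet."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    print(barcode_space(10))  # 1048576
    print(barcode_space(15))  # 1073741824
    # Combining two independent ten-nucleotide barcodes multiplies the diversity.
    print(barcode_space(10) * barcode_space(10))
```

These are theoretical maxima. Shared regions within a barcode reduce the number of variable positions and therefore the usable count.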
  • the term “adapter” as used herein may be a nucleic acid with known or unknown sequence.
  • An adapter may be attached to the 3’ end, 5’ end, or both ends of a nucleic acid (e.g. target nucleic acid).
  • An adapter may comprise known sequences and/or unknown sequences.
  • An adapter may be double-stranded or single-stranded.
  • an adapter can comprise a barcode (e.g. unique identifier sequence).
  • an adapter can be an amplification adapter.
  • An amplification adapter may attach to a target nucleic acid and help the amplification of the target nucleic acid.
  • an amplification adapter may comprise one or more of: a primer binding site, a unique identifier sequence, a non-unique identifier sequence, and a sequence for immobilizing the target nucleic acid on a substrate.
  • a target nucleic acid attached with an amplification adapter may be immobilized on a substrate.
  • An amplification primer may hybridize to the adapter and be extended using the target nucleic acid as a template in an amplification reaction.
  • the unique identifiers in an adapter can be used to label the amplicons.
  • an adapter can be a sequencing adapter.
  • a sequencing adapter may attach to a target nucleic acid and help the sequencing of the target nucleic acid.
  • a sequencing adapter may comprise one or more of: a sequencing primer binding site, a unique identifier sequence, a non-unique identifier sequence, and a sequence for immobilizing target nucleic acid on a substrate.
  • a target nucleic acid attached with a sequencing adapter may be immobilized on a substrate on a sequencer.
  • a sequencing primer may hybridize to the adapter and be extended using the target nucleic acid as a template in a sequencing reaction.
  • the unique identifiers in an adapter can be used to label the sequence reads of different target sequences, thus allowing high-throughput sequencing of a plurality of target nucleic acids.
  • an adapter sequence (such as a double-stranded or single-stranded oligonucleotide) may be ligated to one or both ends of a nucleic acid sequence.
  • a nucleic acid sequence may comprise one or more epigenetically modified bases.
  • a nucleic acid sequence may be from a sample, such as a cell free DNA sample.
  • a nucleic acid sequence may be from a sample obtained from a subject.
  • a nucleic acid sequence may comprise a double-stranded portion, a single-stranded portion, or a combination thereof.
  • an adapter may recognize or may be complementary to a primer, such as a universal primer.
  • an adapter may be specific to a sequencing method.
  • an adapter may be associated with a nucleic acid sequence or a complementary strand.
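The use of unique identifiers in an adapter to label sequence reads from different targets can be illustrated by a simple demultiplexing step, in which each read is assigned to a sample according to the barcode it carries. The sketch assumes exact barcode matches located at the start of each read, which is a simplification. The function name `demultiplex` and the sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by sample using an exact barcode match at the read start.

    barcodes maps a sample name to its barcode sequence. Reads matching no
    barcode are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the target sequence is kept.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"sample_1": "AAAC", "sample_2": "GGGT"}  # toy barcodes
    reads = ["AAACTTTT", "GGGTCCCC", "TTTTTTTT"]
    print(demultiplex(reads, barcodes))
```

Nanopore reads contain errors, so a practical implementation would tolerate mismatches within the barcode rather than require exact matches.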
  • nucleic acid sequence may comprise DNA or RNA.
  • a nucleic acid sequence may comprise a plurality of nucleotides.
  • a nucleic acid sequence may comprise an artificial nucleic acid analogue.
  • a nucleic acid sequence comprising DNA may comprise cell-free DNA, cDNA, fetal DNA, or maternal DNA.
  • a nucleic acid sequence may comprise miRNA, shRNA, or siRNA.
  • moiety may be a component that may be (a) associated with a substrate, (b) associated with an epigenetically modified base, or (c) a combination thereof.
  • a moiety may be associated with an epigenetically modified base by a single bond, a double bond, a triple bond, a metal-associated bond, or an ion pairing.
  • a moiety may comprise a magnetic metal, such as iron, nickel, cobalt, aluminum, or any combination thereof.
  • a moiety may be associated with an epigenetically modified base by the assistance of an enzyme.
  • a moiety may be associated with a substrate via (a) a biotin-streptavidin association, (b) a magnetic association, (c) an antibody-antigen association, or (d) any combination thereof.
  • a moiety may be selective for a portion of a nucleic acid sequence.
  • a moiety may selectively associate with a double-stranded portion of a nucleic acid sequence as compared to a single-stranded portion.
  • a moiety may selectively associate with portions of a nucleic acid sequence having an epigenetically modified base as compared to portions having a non-modified base.
  • a moiety may selectively associate with a type of epigenetically modified base, such as selectively associating with a 5-hydroxymethylated cytosine (5-hmC) as compared to a 5-methylated cytosine (5-mC).
  • a moiety may comprise a sugar, such as a glucose.
  • a glucose may comprise a modified glucose.
  • a moiety may comprise more than one sugar, such as two sugars or more.
  • a moiety may comprise a modified sugar, such as a modified glucose.
  • a moiety may comprise a uridine diphosphate glucose (UDPG).
  • a moiety may comprise a detectable moiety such as a radioactive moiety, a fluorescent moiety, a chemiluminescent moiety, a phosphorescent moiety, an infrared moiety, a visible moiety, a chemically reactive moiety (such as an azide-based moiety), or any combination thereof.
  • a moiety may be a moiety which results from incorporating a chromophore via a reaction with a radioactive moiety.
  • a moiety may comprise a protein, peptide, or polypeptide.
  • a moiety may comprise an antibody or portion thereof.
  • a moiety may comprise a tag, such as a FLAG-tag.
  • a moiety may comprise a biotin or an avidin, such as streptavidin.
  • a moiety may comprise a nucleic acid sequence.
  • a moiety may comprise a substrate.
  • a different moiety may be employed to uniquely label different epigenetic modifications. For example, a first moiety may bind a methylated base and a second moiety may bind a
  • a moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, an N6-methyladenine, an N6-hydroxymethyladenine, an N6-formyladenine, a 2’-O-methyladenosine, an N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-
  • a moiety may be associated with a formylated base, such as for example a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, an N6-methyladenine, an N6-hydroxymethyladenine, an N6-formyladenine, a 2’-O-methyladenosine, an N1-methyladenosine, a pseudouridine, an inosine, a 8-oxogua hydroxy
  • a moiety may be associated with a carboxylic acid containing base, such as for example a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, an N6-methyladenine, an N6-hydroxymethyladenine, an N6-formyladenine, a 2’-O-methyladenosine, an N1-methyladenosine, a pseudouridine, an inosine, a 8-oxo
  • a moiety may be associated with a hydroxymethylated base, such as for example a
  • Two or more, three or more, four or more moieties may be associated with an epigenetically modified base.
  • An epigenetically modified base may be identified in a nucleic acid.
  • a labeled epigenetically modified base may comprise a sugar moiety, such as a glucose.
  • the identifying of the epigenetically modified base may not comprise identifying a presence or an absence of the sugar moiety. In some cases, when a sugar moiety is present, the identifying may not be based on a presence or an absence of the sugar moiety.
  • the identifying of the epigenetically modified base may comprise identifying a presence or an absence of a moiety associated with an epigenetically modified base. In some cases, the identifying the epigenetically modified base may comprise identifying a presence or an absence of a labeled epigenetically modified base.
  • associating a moiety with an epigenetically modified base may permit identification of the epigenetically modified base by sequencing, such as by nanopore sequencing.
  • identifying an epigenetically modified base may comprise sequencing a nucleic acid sequence, such as by nanopore sequencing the nucleic acid sequence.
  • association of a moiety with an epigenetically modified base may modify the epigenetically modified base such that the identification of the epigenetically modified base is by the sequencing.
  • a tag may be a fusion tag, a covalent peptide tag, a protein tag, a peptide tag, an affinity tag, an epitope tag, a solubilization tag, or any combination thereof.
  • a tag may comprise a recombinant protein.
  • a tag may associate with a protein or protein fragment.
  • a FLAG-tag may comprise a sequence or a portion thereof comprising DYKDDDDK, where D may be aspartic acid, Y may be tyrosine, and K may be lysine.
  • a moiety may be associated reversibly with a substrate.
  • a moiety may be associated irreversibly with a substrate.
  • a moiety may be reversibly associated with an epigenetically modified base.
  • a moiety may be irreversibly associated with an epigenetically modified base.
  • a moiety may be associated by binding to a substrate, an epigenetically modified base, or a combination thereof.
  • a moiety may be bound by a single bond, a double bond, or a triple bond to a substrate.
  • a moiety may be bound by a single bond, a double bond, or a triple bond to an epigenetically modified base.
  • the moiety may be a component that may aid in or catalyze a reaction.
  • a moiety may comprise an enzyme or a catalytically active fragment thereof.
  • a moiety may comprise an antibody or fragment thereof.
  • a moiety may comprise a protein, a peptide, or polypeptide.
  • a moiety may comprise a cofactor such as a coenzyme.
  • a moiety may comprise an enzyme, a protein or portion thereof, an antibody or portion thereof, a cofactor or any combination thereof.
  • a moiety, such as an enzyme, may aid in an association of a label with an epigenetically modified base.
  • a moiety, such as an enzyme, may selectively associate a label with an epigenetically modified base present on a double-stranded oligonucleotide fragment as compared with an epigenetically modified base present on a single-stranded oligonucleotide fragment.
  • a moiety, such as an enzyme, may selectively associate a label with an epigenetically modified base present on a single-stranded oligonucleotide fragment as compared with an epigenetically modified base present on a double-stranded oligonucleotide fragment.
  • An enzyme may comprise a transferase.
  • An enzyme may comprise a glucosyltransferase.
  • An enzyme may comprise (a) an alpha-glucosyltransferase, (b) a beta-glucosyltransferase, (c) a beta-glucosyl-alpha-glucosyl-transferase, (d) a J-glucosyltransferase, or (e) any combination thereof.
  • a moiety, such as an enzyme, may comprise a modified moiety, such as a genetically mutated moiety.
  • a modified moiety may be modified to enhance an association of a label with an epigenetically modified base.
  • a modified moiety may be modified to selectively aid in a) an association of a specific label with an epigenetically modified base, b) an association of a label with a specific epigenetically modified base, or c) a combination thereof.
  • a moiety may catalyze a transfer of a methyl group to one or more bases of a nucleic acid sequence, a complementary strand, or a combination thereof.
  • a moiety may comprise a methyltransferase.
  • an enzyme may comprise a DNA methyltransferase 1 (DNMT1), a DNA methyltransferase 3-like (DNMT3L), a DNMT3A, a DNMT3B, a tRNA aspartic acid methyltransferase (TRDMT1), a DNMT3, any catalytically active fragment thereof, or any combination thereof.
  • a moiety may catalyze a change in an epigenetic modification, such as a conversion of a methylated base to a hydroxymethylated base.
  • an enzyme may comprise a dioxygenase.
  • an enzyme may comprise a ten-eleven translocation (TET) family enzyme.
  • an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof.
  • a moiety may catalyze an oxidative reaction, such as an oxidative decarboxylation.
  • an enzyme may comprise an isocitrate dehydrogenase (IDH) family enzyme.
  • an enzyme may comprise isocitrate dehydrogenase [NAD] subunit alpha (IDH3A), isocitrate dehydrogenase [NAD] subunit beta (IDH3B), isocitrate dehydrogenase [NAD] subunit gamma (IDH3G), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), any catalytically active fragment thereof, or any combination thereof.
  • a base of a nucleic acid sequence or a complementary strand may be deaminated, spontaneously or by contacting a moiety to a portion of a nucleic acid sequence.
  • a base may be deaminated.
  • a base, a methylated base, a hydroxymethylated base, a formylated base, a carboxylated base, or any combination thereof may be deaminated.
  • a methylated cytosine may be deaminated.
  • Deamination may occur selectively to a single base or to any combination of bases. Deamination may occur spontaneously. Deamination may occur by contacting a moiety to a portion of a nucleic acid sequence.
  • a moiety may include an enzyme such as a deaminase, such as an adenosine deaminase, a guanine deaminase, or a cytidine deaminase.
  • a deaminase may comprise activation-induced cytidine deaminase (AID), a conserved cytidine deaminase (CDA), apolipoprotein B mRNA editing enzyme catalytic polypeptide 1 (APOBEC1), apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3H (APOBEC3A-H), apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3G (APOBEC3G), or others.
  • Bisulfite sequencing may deaminate one or more bases of a nucleic acid sequence or a complementary strand.
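The effect of bisulfite-driven deamination on a sequencing read-out can be illustrated with a small in-silico sketch. It assumes the commonly described idealized behavior of bisulfite treatment (an unprotected cytosine is deaminated to uracil and read as thymine, while a protected, e.g. methylated, cytosine is read as cytosine), complete conversion, and a single strand. The function names `bisulfite_convert` and `call_protected_cytosines` are hypothetical and are not taken from the source.

```python
def bisulfite_convert(sequence, protected_positions):
    """Return the idealized read-out of `sequence` after bisulfite treatment.

    Assumption: every cytosine whose 0-based index is NOT listed in
    `protected_positions` is deaminated and therefore read as "T".
    """
    protected = set(protected_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in protected:
            converted.append("T")  # C -> U, sequenced as T
        else:
            converted.append(base)
    return "".join(converted)


def call_protected_cytosines(reference, converted):
    """List positions where a reference C is still read as C after conversion."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference.upper(), converted))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
read = bisulfite_convert(reference, protected_positions=[1])
print(read)                                       # ACGTTTGA
print(call_protected_cytosines(reference, read))  # [1]
```

Comparing the converted read against the untreated reference recovers the protected positions, which is the comparison an analysis of bisulfite data would typically rely on under these assumptions.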
  • click-chemistry may comprise a reaction having at least one of the following properties: (a) it is high yielding, (b) it is wide in scope, (c) it creates byproducts that may be removed in the absence of chromatography, (d) it is stereospecific, (e) it is simple to perform, (f) it is conducted in easily removable or benign solvents.
  • click-chemistry comprises tagging, such as tagging a nucleic acid sequence or a complementary strand.
  • click-chemistry may associate a nucleic acid sequence with a moiety.
  • Click-chemistry may comprise a reaction having a [3+2] cycloaddition; a thiol-ene reaction; a Diels-Alder reaction, an inverse electron demand Diels-Alder reaction; a [4+1] cycloaddition; a nucleophilic substitution; a carbonyl-chemistry-like formation of urea; an addition to a carbon-carbon double bond; or any combination thereof.
  • a [3+2] cycloaddition may comprise a Huisgen 1,3-dipolar cycloaddition.
  • a [4+1] cycloaddition may comprise a cycloaddition between an isonitrile and a tetrazine.
  • Click-chemistry may comprise a copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC); a strain-promoted azide-alkyne cycloaddition (SPAAC); a strain-promoted alkyne-nitrone cycloaddition (SPANC); or any combination thereof.
  • the term "sequencing," as used herein, may comprise bisulfite-free sequencing, bisulfite sequencing, TET-assisted bisulfite (TAB) sequencing, ACE-sequencing, high-throughput sequencing, Maxam-Gilbert sequencing, massively parallel signature sequencing, Polony sequencing, 454 pyrosequencing, Sanger sequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, DNA nanoball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, nanopore DNA sequencing, shotgun sequencing, RNA sequencing, Enigma sequencing, or any combination thereof.
  • a method may comprise sequencing.
  • the sequencing may include bisulfite sequencing or bisulfite-free sequencing.
  • a method may comprise oxidizing one or more bases of a nucleic acid sequence or complementary strand or combination thereof.
  • a method may comprise selectively enriching for a nucleic acid sequence that contains at least one epigenetic modification.
  • primer extension reaction generally refers to the binding of a primer to a strand of the template nucleic acid, followed by elongation of the primer(s). It may also include denaturing of a double-stranded nucleic acid and the binding of a primer strand to either one or both of the denatured template nucleic acid strands, followed by elongation of the primer(s). Primer extension reactions may be used to incorporate nucleotides or nucleotide analogs to a primer in template-directed fashion by using enzymes (polymerizing enzymes).
  • substrate may be a surface with which an entity (such as a moiety, a functional group, an epigenetic modification, a label or functional moiety associated with an epigenetic modification, a label or functional moiety associated with a parent strand) can be associated.
  • an entity may be immobilized to the substrate (such as a support).
  • an entity may be reversibly or irreversibly bound to the substrate (such as a support).
  • an entity may comprise a moiety. In such cases, a moiety may also associate with a nucleic acid sequence.
  • an entity may comprise a moiety, a nucleic acid sequence, a sugar, an enzyme, or any combination thereof.
  • a substrate may comprise a bead.
  • a substrate may comprise a plurality of beads.
  • a substrate may comprise an array of beads.
  • a substrate may comprise an array, such as an array of wells or an array of beads.
  • a substrate (such as a solid support) may comprise a column, such as a packed column, a size-exclusion column, a magnetic column, or any combination thereof.
  • a substrate may comprise a membrane.
  • a substrate may comprise a bead, a capillary, a plate, a membrane, a wafer, a well, a plurality of any of these, an array of any of these, or any combination thereof.
  • a substrate (such as a support) may positively select a nucleic acid sequence of interest by associating the nucleic acid sequence of interest with the substrate.
  • a substrate may negatively select for a nucleic acid sequence of interest by associating other nucleic acid sequences of a sample with the substrate.
  • a bead may comprise one or more beads.
  • a bead may comprise an array of beads.
  • a bead may be associated with a substrate.
  • a bead may be associated with a moiety.
  • a bead may associate a moiety with a substrate.
  • a bead may be associated with a substrate, a moiety, a nucleic acid sequence or any combination thereof.
  • a bead may comprise a polymer, a metal, or a combination thereof.
  • a bead may comprise a hydrogel, a silica gel, a glass, a resin, a metal, a metal alloy, a plastic, a cellulose, an agarose, a magnetic material, or any combination thereof.
  • a support may be organic or inorganic; may be metal (e.g., copper or silver) or non-metal; may be a polymer or nonpolymer; may be conducting, semiconducting or nonconducting (insulating); may be reflecting or nonreflecting; may be porous or nonporous; etc.
  • a substrate as described above can be formed of any suitable material, including metals, metal oxides, semiconductors, polymers (particularly organic polymers in any suitable form including woven, nonwoven, molded, extruded, cast, etc.), silicon, silicon oxide, and composites thereof.
  • tissue may be any tissue sample.
  • a tissue may be a tissue suspected or confirmed of having a disease or condition.
  • a tissue may be a sample that may be substantially healthy, substantially benign, or otherwise substantially free of a disease or a condition.
  • a tissue may be a tissue removed from a subject, such as a tissue biopsy, a tissue resection, an aspirate (such as a fine needle aspirate), a tissue washing, a cytology specimen, a bodily fluid, or any combination thereof.
  • a tissue may comprise cancerous cells, tumor cells, non-cancerous cells, or a combination thereof.
  • a tissue may comprise brain tissue, cerebral spinal tissue, cerebral spinal fluid, breast tissue, bladder tissue, kidney tissue, liver tissue, colon tissue, thyroid tissue, cervical tissue, prostate tissue, lung tissue, heart tissue, muscle tissue, pancreas tissue, anal tissue, bile duct tissue, a bone tissue, uterine tissue, ovarian tissue, endometrial tissue, vaginal tissue, vulvar tissue, stomach tissue, ocular tissue, nasal tissue, sinus tissue, penile tissue, salivary gland tissue, gut tissue, gallbladder tissue, gastrointestinal tissue, bladder tissue, brain tissue, spinal tissue, a blood sample, or any combination thereof.
  • a tissue may be a sample that may be genetically modified.
  • Animals can be mammals, such as humans, non-human primates, rodents such as mice and rats, dogs, cats, pigs, sheep, rabbits, and others. Animals can be fish, reptiles, or others. Animals can be neonatal, infant, adolescent, or adult animals. Humans can be more than about: 1, 2, 5, 10, 20, 30, 40, 50, 60, 65, 70, 75, or about 80 years of age.
  • the subject may have or be suspected of having a condition or a disease, such as cancer.
  • the subject may be a patient, such as a patient being treated for a condition or a disease, such as a cancer patient.
  • the subject may be predisposed to a risk of developing a condition or a disease such as cancer.
  • the subject may be in remission from a condition or a disease, such as cancer.
  • the subject may be healthy.
  • a nucleic acid sequence may comprise a cytosine guanine (CG) site, a cytosine phosphate guanine (CpG) island, a portion of any of these, or a combination thereof.
  • a CpG island may comprise one or more CG sites.
  • a nucleic acid sequence may comprise one or more CG sites or portions thereof.
  • a nucleic acid sequence may comprise dense CG sites, dense CpG islands or a combination thereof.
  • a nucleic acid sequence may comprise a plurality of CG sites or portions thereof.
  • a nucleic acid sequence may comprise one or more CpG islands or portions thereof.
  • a nucleic acid sequence may comprise a plurality of CpG islands or portions thereof.
  • One or more bases of a nucleic acid sequence comprising a CG site, a CpG island, a portion thereof, or any of these may comprise an epigenetically modified base, such as a methylated base or a hydroxymethylated base.
  • One or more cytosines of a nucleic acid sequence comprising a CG site, a CpG island, a portion thereof, or any of these may comprise an epigenetically modified cytosine, such as a methylated cytosine or a hydroxymethylated cytosine.
  • a CpG island (or a CG island) may be a region with a high frequency of CG sites.
  • a CpG island may be a region of a nucleic acid sequence with at least about 200 basepairs (bp) and a GC percentage that may be greater than about 50% and with an observed-to-expected CpG ratio that may be greater than about 60%.
  • a CpG island may be a region of a nucleic acid sequence with at least about: 20,
  • a CpG island may be a region of a nucleic acid sequence with from about 20 to about 600 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 20 to about 500 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 10 to about 500 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 10 to about 300 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 20 to about 200 bp.
  • a GC percentage in a CpG island may be greater than about: 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or greater. In some cases, a GC percentage in a CpG island may be from about 50% to about 95%. In some cases, a GC percentage in a CpG island may be from about 50% to about 99%. In some cases, a GC percentage in a CpG island may be from about 55% to about 85%. In some cases, a GC percentage in a CpG island may be from about 60% to about 99%. In some cases, a GC percentage in a CpG island may be from about 70% to about 99%.
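The CpG island criteria above (length, GC percentage, and observed-to-expected CpG ratio) can be checked numerically. The following is a minimal sketch, not the method of the source: it assumes the widely used convention that the expected CpG count is (number of C × number of G) / length, and the function names `cpg_island_metrics` and `looks_like_cpg_island` are hypothetical. The default thresholds mirror the "about 200 bp", "about 50%", and "about 60%" values stated above.

```python
def cpg_island_metrics(sequence):
    """Return (length, GC percentage, observed/expected CpG ratio) for a sequence."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c_count + g_count) / length
    # Assumed convention: expected CpG count = (C count * G count) / length.
    expected_cpg = c_count * g_count / length
    obs_exp_ratio = cpg_count / expected_cpg if expected_cpg > 0 else 0.0
    return length, gc_percent, obs_exp_ratio


def looks_like_cpg_island(sequence, min_length=200, min_gc_percent=50.0,
                          min_obs_exp_ratio=0.6):
    """Apply the three thresholds; all default values are adjustable."""
    length, gc_percent, obs_exp_ratio = cpg_island_metrics(sequence)
    return (length >= min_length
            and gc_percent > min_gc_percent
            and obs_exp_ratio > min_obs_exp_ratio)


print(cpg_island_metrics("CG" * 100))     # (200, 100.0, 2.0)
print(looks_like_cpg_island("AT" * 100))  # False
```

Because the source lists several alternative length and GC ranges, the thresholds are exposed as parameters rather than fixed constants.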
  • cell-free refers to the condition of the nucleic acid sequence as it appeared in the body before the sample is obtained from the body.
  • circulating cell-free nucleic acid sequences in a sample may have originated as cell-free nucleic acid sequences circulating in the bloodstream of the human body.
  • nucleic acid sequences that are extracted from a solid tissue, such as a biopsy, are generally not considered to be "cell-free."
  • cell-free DNA may comprise fetal DNA, maternal DNA, or a combination thereof.
  • cell-free DNA may comprise DNA fragments released into a blood plasma.
  • the cell-free DNA may comprise circulating tumor DNA.
  • cell-free DNA may comprise circulating DNA indicative of a tissue origin, a disease or a condition.
  • a cell-free nucleic acid sequence may be isolated from a blood sample.
  • a cell-free nucleic acid sequence may be isolated from a plasma sample.
  • a cell-free nucleic acid sequence may comprise a complementary DNA (cDNA).
  • one or more cDNAs may form a cDNA library.
  • a nucleic acid sequence may be double-stranded, such as a cDNA library comprising the nucleic acid sequence.
  • a nucleic acid sequence may be double-stranded, such as when a substantially complementary strand may be hybridized to at least a portion of the nucleic acid sequence.
  • a portion of a nucleic acid sequence may be double-stranded, such as when a primer may be hybridized to a portion of the nucleic acid sequence.
  • a nucleic acid sequence may be from a sample.
  • a sample may be isolated from a subject.
  • a subject may be a human subject.
  • a sample may comprise a buccal sample, a saliva sample, a blood sample, a plasma sample, a reproductive sample (such as an egg or a sperm), a mucus sample, a cerebral spinal fluid sample, a tissue sample, a tissue biopsy, a surgical resection, a fine needle aspirate sample, or any combination thereof.
  • a sample may comprise a blood sample.
  • a sample may comprise a buccal sample.
  • a subject may have previously received a diagnosis of a disease or condition prior to performing a method as described herein.
  • a subject may have previously received a positive diagnosis of a disease, such as a cancer.
  • a subject may have previously received an indeterminate or inconclusive diagnosis of a disease, such as a cancer.
  • a subject may be a subject in need thereof, such as a need for a definitive diagnosis or a need for a selection of a therapeutic treatment regime.
  • a subject may not have previously received a diagnosis of a disease or condition prior to performing a method as described herein.
  • a subject may be suspected of having a disease or condition, such as having one or more symptoms of a disease or condition.
  • a subject may be at risk of developing a disease or condition, such as a subject having a biomarker or genetic indication that may be indicative of a risk of developing a disease or condition.
  • a disease or a condition may comprise a cancer.
  • a method as described herein may comprise obtaining a result.
  • a method may comprise obtaining a result and reporting the result.
  • a result may be reported to a user, a medical professional, a subject, or any combination thereof.
  • a result may be reported via a communication medium.
  • a communication medium may include a written report or a printed report.
  • a communication medium may include a visual display such as a graphical user interface.
  • a communication medium may comprise a result provided by a computer, a tablet device, a cellphone, or other electronic device.
  • a result may comprise a diagnosis of a disease or condition or a confirmation of an absence of a disease or condition.
  • a result may comprise a diagnosis of a subject as having a disease or condition.
  • a result may comprise a confirmation of an absence of the disease or condition.
  • a result may comprise a likelihood or a risk of a subject to develop a disease or a condition.
  • a result may comprise predicting mortality of a subject, determining a biological age of a subject, or a combination thereof.
  • a mortality prediction or biological age determination may be based on a presence of an epigenetic modification, sequencing information or any combination thereof.
  • a result, such as a prediction of a likelihood of a disease or condition or a diagnosis of a disease or condition, may be based on a presence of an epigenetic modification, sequencing information, or a combination thereof.
  • a presence of an epigenetic modification may include a pattern of epigenetic modification, a presence of a specific epigenetic modification, a level of an epigenetic modification, or any combination thereof.
  • a method as described herein may comprise comparing a result to a reference.
  • a reference may comprise a plurality of references.
  • a reference may comprise a database comprising a plurality of results.
  • a reference may comprise a control sample.
  • a reference may comprise a positive control sample, a negative control sample, or a combination thereof.
  • a reference, such as a reference sample, may be obtained from a subject or from a different source, such as a different subject.
  • a diagnosis may comprise comparing a result to a reference. In some cases, a result comprising a diagnosis may at least partially confirm a previous diagnosis.
  • One or more results obtained from a method described herein may provide a quantitative value or values indicative of one or more of the following: a likelihood of diagnostic accuracy, a likelihood of a presence of a condition in a subject, a likelihood of a subject developing a condition, a likelihood of success of a particular treatment, or any combination thereof.
  • a method as described herein may predict a risk or likelihood of developing a condition.
  • a method as described herein may be an early diagnostic indicator of developing a condition.
  • a method as described herein may confirm a diagnosis or a presence of a condition.
  • a method as described herein may monitor the progression of a condition.
  • a method as described herein may monitor the efficacy of a treatment for a condition in a subject.
  • Samples obtained for analysis using the methods described herein may be obtained from a subject.
  • the subject may not have any symptoms of a condition.
  • the subject may have one or more symptoms of a condition.
  • the subject may be at risk, such as a genetic risk, of developing a condition.
  • the subject may have previously received a positive diagnosis.
  • the subject may have previously received an indeterminate result from a diagnostic test.
  • the subject may be currently receiving a treatment.
  • Methods for diagnosing and/or suggesting, selecting, designating, recommending or otherwise determining a course of treatment for a subject having or suspected of having a condition can be employed in combination with the methods as described herein.
  • These techniques may include cytological analysis or histological classification, molecular profiling, a blood test, a genetic analysis, ultrasound analysis, MRI results, CT scan results, other imaging scans, measurements of hormone, cytokine, or blood cell levels, or any combination thereof.
  • the methods described herein may include at least one other type of diagnostic method.
  • the methods described herein may include at least two other diagnostic methods.
  • the methods of the present invention provide for storing the sample for a time such as seconds, minutes, hours, days, weeks, months, years or longer after the sample is obtained and before the sample is analyzed by one or more methods of the invention.
  • the sample obtained from a subject is subdivided prior to the step of storage or further analysis such that different portions of the sample are subject to different downstream methods or processes including but not limited to any combination of methods described herein, storage, bisulfite treatment, amplification, sequencing, labeling, cytological analysis, adequacy tests, nucleic acid extraction, molecular profiling or a combination thereof.
  • a portion of the sample may be stored while another portion of said sample is further manipulated.
  • manipulations may include but are not limited to any method as described herein; bisulfite treatment; sequencing; amplification; labeling; selective enrichment; molecular profiling; cytological staining; nucleic acid (RNA or DNA) extraction, detection, or quantification; gene expression product (RNA or Protein) extraction, detection, or quantification; fixation; and examination.
  • the sample may be fixed prior to or during storage by any method known to the art such as using glutaraldehyde, formaldehyde, or methanol.
  • the sample is obtained and stored and subdivided after the step of storage for further analysis such that different portions of the sample are subject to different downstream methods.
  • a method as described herein may comprise treating a subject.
  • a treatment may comprise surgery, chemotherapy, radiation therapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplantation, precision medicine, or any combination thereof.
  • a treatment may comprise further monitoring of a condition of a subject.
  • a subject diagnosed with a disease or condition may receive a treatment to treat a disease or a condition.
  • a subject receiving a confirmation of a likelihood or a risk of developing a disease or a condition may receive a treatment, such as a preventive treatment.
  • a treatment for a subject may be selected based on a result of a method, such as a confirmed positive diagnosis of a disease or a condition.
  • a result may comprise one or more treatments, such as a recommended treatment, for a subject based on a result.
  • a treatment may comprise a single treatment.
  • a treatment may comprise a recurring treatment.
  • a treatment may comprise a recurring treatment over a remaining lifespan of a subject.
  • a treatment may comprise a daily treatment.
  • a treatment may comprise a biweekly treatment.
  • a treatment may be selected based on a result.
  • a treatment for a subject can be a surgery (such as a tissue resection), a nutrition regime, a physical activity, a radiation treatment, a chemotherapy, an immunotherapy, a pharmaceutical composition, a cell transplantation, a blood transfusion, or any combination thereof.
  • the methods described herein may be conducted prior to an operation on a diseased tissue of the subject, such as a tumor resection.
  • the methods described herein may be conducted prior to the subject having a positive disease diagnosis, such as a cancer or a tumor diagnosis.
  • the methods described herein may be conducted on a subject suspected of having a condition or a disease, such as a cancer or a tumor.
  • the methods described herein may be conducted on a subject that has received a positive disease diagnosis, such as a positive cancer or a positive tumor diagnosis.
  • the methods described herein may be conducted on a subject having received a prior treatment regime, wherein the prior treatment regime was ineffective in eliminating the disease or condition, such as a cancer or tumor.
  • a tissue sample may be obtained from a subject prior to performing the methods described herein.
  • a tissue sample may be obtained during a biopsy, a fine needle aspiration, a blood draw, a surgical resection, or any combination thereof.
  • Assaying a tissue sample of a subject may be performed at one or more time points.
  • a separate tissue sample may be obtained from the subject for assaying at each of the one or more time points.
  • Assaying at one or more time points may be performed on the same tissue sample.
  • Assaying at one or more time points may provide an assessment of an effectiveness of a drug, a longitudinal course of a disease treatment regime, or a combination thereof.
  • a tissue sample may be compared to a same reference.
  • a tissue sample may be compared to a different reference at each of the one or more time points.
  • the one or more time points may be the same.
  • the one or more time points may be different.
  • the one or more time points may comprise at least one time point prior to a drug administration, at least one time point after a drug administration, at least one time point prior to a positive disease diagnosis, at least one time point after a disease remission diagnosis, at least one time point during a disease treatment regime, or a combination thereof.
  • the methods as described herein may be used for diagnosis of a particular condition and also to monitor efficacy of a particular treatment after an initial diagnosis or monitor progression of a particular condition.
  • the methods as described herein may be used to monitor a subject at risk of developing a particular condition, as a preventive measure.
  • the methods as described herein may be used alone for diagnosis and/or monitoring efficacy of a particular treatment.
  • the methods as described herein may be used in combination with other assays for diagnosis or monitoring (such as a cytological analysis or molecular profiling).
  • a subject may be monitored using methods as disclosed herein.
  • a subject may be diagnosed with a condition, such as a cancer or a genetic disorder. This initial diagnosis may or may not involve the use of the methods described herein.
  • the subject may be prescribed a treatment such as surgical resection of a tumor or chemotherapy.
  • the results of the treatment may be monitored on an ongoing basis by the methods described herein to detect the efficacy of the treatment.
  • a subject may be diagnosed with a benign tumor or a precancerous lesion or nodule, and the tumor, nodule, or lesion may be monitored on an ongoing basis by the methods described herein to detect any changes in the state of the tumor or lesion.
  • the methods described herein may also be used to ascertain the potential efficacy of a specific treatment prior to administering to a subject.
  • a subject may be diagnosed with cancer.
  • the methods described herein may indicate a presence of one or more epigenetic residues on a particular nucleic acid sequence known to be involved in cancer malignancy.
  • a further sample may be obtained from the subject and cultured in vitro using methods known to the art.
  • the application of various inhibitors or drugs may then be tested for growth inhibition.
  • the methods described herein may also be used to monitor the effect of these inhibitors on, for example, downstream targets of the implicated pathway.
  • the methods described herein may be used as a research tool to identify new markers for diagnosis of conditions (such as suspected tumors); to monitor the effect of drugs or candidate drugs on samples such as tumor cells, cell lines, tissues, or organisms; or to uncover new pathways for disease prevention or inhibition (such as oncogenesis and/or tumor suppression).
  • a nucleic acid sequence may comprise one or more epigenetically modified bases, such as (a) one or more epigenetically modified cytosines, (b) one or more epigenetically modified uracils, (c) one or more epigenetically modified thymines, (d) one or more epigenetically modified guanines, (e) one or more epigenetically modified adenines, or (f) any combination thereof.
  • a nucleic acid sequence may comprise one or more epigenetically modified bases.
  • a nucleic acid sequence may comprise at least about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
  • a nucleic acid sequence may comprise about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 epigenetically modified bases per about 20 basepairs of the nucleic acid sequence.
  • a nucleic acid sequence may comprise one or more epigenetically modified bases. For example, about: 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, at least about: 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 10% of total bases of a nucleic acid sequence may comprise epigenetically modified bases.
  • from about 4% to about 6% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 20% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 30% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 3% to about 30% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 30% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 40% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 50% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 60% to about 90% of total bases of a nucleic acid sequence may comprise
  • a nucleic acid sequence (in some cases comprising a plurality of epigenetically modified residues) may be enriched.
  • Enrichment of the nucleic acid sequence may comprise amplification such as amplification by polymerase chain reaction (PCR), loop mediated isothermal amplification, nucleic acid sequence based amplification, strand displacement amplification, multiple displacement amplification, rolling circle amplification, ligase chain reaction, helicase dependent amplification, ramification amplification method, or any combination thereof.
  • amplification may comprise at least 2 cycles of amplification.
  • Amplification may comprise at least 3 cycles of amplification.
  • Amplification may comprise at least 4 cycles of amplification.
  • Amplification may comprise at least 5 cycles of amplification.
  • Amplification may comprise at least 6 cycles of amplification.
  • Amplification may comprise at least 7 cycles of amplification.
  • Amplification may comprise at least 8 cycles of amplification.
  • Amplification may comprise at least 9 cycles of amplification.
  • Amplification may comprise at least 10 cycles of amplification.
  • Amplification may comprise at least 11 cycles of amplification.
  • Amplification may comprise at least 12 cycles of amplification.
  • Amplification may comprise at least 13 cycles of amplification.
  • Amplification may comprise at least 14 cycles of amplification.
  • Amplification may comprise at least 15 cycles of amplification.
  • Amplification may comprise at least 20 cycles of amplification.
  • Amplification may comprise at least 25 cycles of amplification.
  • Amplification may comprise at least 30 cycles of amplification.
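As a rough illustration of what the cycle counts above imply for an exponential method such as PCR, the idealized copy number after n cycles is N0 × (1 + E)^n, where E is the per-cycle efficiency. The sketch below is not taken from the disclosure: the function name and the `efficiency` parameter are assumptions introduced for illustration, and the model does not apply to the isothermal methods listed above.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after exponential amplification.

    Uses the textbook model N0 * (1 + E)^n. `efficiency` (E) is a
    hypothetical parameter between 0 and 1; E = 1.0 means perfect doubling.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare a few of the cycle counts named in the text at perfect efficiency.
    for n in (2, 10, 20, 30):
        print(n, expected_copies(1, n))
```

Real reactions plateau as reagents are consumed, so the figure is an upper bound under the stated assumption rather than a predicted yield.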
  • amplification of a given number of cycles produces a plurality of sequence reads that retain a percentage of original sequence length.
  • about 90% of the plurality of sequence reads retain at least about 90% of the sequence length.
  • about 80% of the plurality of sequence reads retain at least about 90% of the sequence length.
  • about 75% of the plurality of sequence reads retain at least about 90% of the sequence length.
  • about 95% of the plurality of sequence reads retain at least about 90% of the sequence length.
  • about 85% of the plurality of sequence reads retain at least about 90% of the sequence length.
  • about 90% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 80% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 75% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 95% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 85% of the plurality of sequence reads retain at least about 85% of the sequence length.
  • about 90% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 80% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 75% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 95% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 85% of the plurality of sequence reads retain at least about 80% of the sequence length.
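The read-length statements above can be checked with a small calculation: given the lengths of the sequence reads and the original sequence length, compute the fraction of reads that retain at least a chosen share of that length. This is a minimal sketch; the function name, the argument names, and the example read lengths are hypothetical and do not come from the disclosure.

```python
from typing import Sequence


def fraction_retaining(read_lengths: Sequence[int],
                       original_length: int,
                       min_retained: float = 0.90) -> float:
    """Fraction of reads whose length is at least min_retained * original_length."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    if not read_lengths:
        return 0.0
    threshold = min_retained * original_length
    kept = sum(1 for length in read_lengths if length >= threshold)
    return kept / len(read_lengths)


if __name__ == "__main__":
    # Hypothetical read lengths for a 1000-base original sequence.
    reads = [1000, 950, 910, 850, 700]
    print(fraction_retaining(reads, 1000, 0.90))
    print(fraction_retaining(reads, 1000, 0.80))
```

A result of 0.9 with `min_retained=0.90` would correspond to the case in which about 90% of the reads retain at least about 90% of the sequence length.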
  • 80%, 90% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 1% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 2% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 3% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 4% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 5% of the bases of a nucleic acid sequence may comprise an epigenetically modified base.
  • At least about 10% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 10% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 10% to about 90% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 5% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 4% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 3% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base.
  • a nucleic acid sequence comprises at least about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 1 epigenetically modified base per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 2 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 3 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence.
  • a nucleic acid sequence comprises at least about 4 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 5 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises from about 1 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 3 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence.
  • a nucleic acid sequence comprises at least from about 4 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 5 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence.
  • a nucleic acid sequence comprises at least from about 1 to about 3 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 4 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 5 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 8 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 10 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 15
  • a nucleic acid sequence comprises at least from about 1 to about 20 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence.
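The density figures above (a percentage of total bases, or a count per about 20 bases) can be computed from a list of modified positions. The sketch below assumes 0-based positions and non-overlapping 20-base windows; both are illustrative choices, since the text does not specify how windows are placed, and the function names are hypothetical.

```python
from typing import Iterable, List


def percent_modified(sequence_length: int, modified_positions: Iterable[int]) -> float:
    """Percentage of bases in the sequence that carry an epigenetic modification."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    positions = {p for p in modified_positions if 0 <= p < sequence_length}
    return 100.0 * len(positions) / sequence_length


def modified_per_window(sequence_length: int,
                        modified_positions: Iterable[int],
                        window: int = 20) -> List[int]:
    """Count modified bases in consecutive, non-overlapping windows (assumed layout)."""
    if sequence_length <= 0 or window <= 0:
        raise ValueError("sequence_length and window must be positive")
    counts = [0] * ((sequence_length + window - 1) // window)
    for p in set(modified_positions):
        if 0 <= p < sequence_length:
            counts[p // window] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 60-base sequence with five modified positions.
    positions = [0, 5, 19, 20, 45]
    print(percent_modified(60, positions))
    print(modified_per_window(60, positions))
```

A sliding window would give a different, smoother profile; the fixed windows here are only the simplest reading of "per about 20 bases".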
  • a sample obtained from a subject can comprise tissue, cells, cell fragments, cell organelles, nucleic acids, genes, gene fragments, expression products, gene expression products, gene expression product fragments or any combination thereof.
  • a sample can be heterogeneous or homogenous.
  • a sample can comprise blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool, lymph fluid, tissue, mucus, or any combination thereof.
  • a sample can be a tissue-specific sample such as a sample obtained from a reproductive tissue (such as a sperm or an egg), thyroid, skin, heart, lung, kidney, breast, pancreas, liver, muscle, smooth muscle, bladder, gall bladder, colon, intestine, brain, esophagus, prostate, or any combination thereof.
  • a sample of the present disclosure can be obtained by various methods, such as, for example, fine needle aspiration (FNA), core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, core biopsy, punch biopsy, shave biopsy, skin biopsy, or any combination thereof.
  • a sample may be obtained from a subject by another individual or entity, such as a healthcare (or medical) professional or robot.
  • a medical professional can include a physician, nurse, medical technician or other.
  • a physician may be a specialist, such as an oncologist, surgeon, or
  • a medical technician may be a specialist, such as a cytologist, phlebotomist, radiologist, pulmonologist or others.
  • a medical professional may obtain a sample from a subject for testing or refer the subject to a testing center or laboratory for the submission of the sample. The medical professional may indicate to the testing center or laboratory the appropriate test or assay to perform on the sample, such as methods of the present disclosure including determining gene sequence data, gene expression levels, sequence variant data, or any combination thereof.
  • kits may contain a collection unit or device for obtaining the sample as described herein, a storage unit for storing the sample ahead of sample analysis, and instructions for use of the kit.
  • Epigenetic modifications may be monitored over time. Monitoring epigenetic modification over time may include monitoring changes in a presence of an epigenetic modification, a level of an epigenetic modification, a pattern of an epigenetic modification, or any combination thereof. Monitoring may include monitoring an efficacy of a therapeutic, monitoring a progression of a disease, monitoring a regression of a disease, monitoring a risk or likelihood of developing a disease, monitoring a mortality prediction or biological age, or any combination thereof.
  • a sample can be obtained a) pre-operatively, b) post-operatively, c) after a disease diagnosis, d) during routine screening following remission or cure of a disease, e) when a subject may be suspected of having a disease, f) during a routine office visit or clinical screen, g) following the request of a medical professional, or any combination thereof.
  • Multiple samples at separate times can be obtained from the same subject, such as before treatment for a disease commences and after treatment ends, such as monitoring a subject over a time course. Multiple samples can be obtained from a subject at separate times to monitor the absence or presence of disease progression, regression, or remission in the subject.
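Monitoring over a time course, as described above, amounts to comparing a measured modification level at each later time point with an earlier baseline sample. The following is a minimal sketch under stated assumptions: levels are expressed as a percentage of modified bases, the first entry is treated as the baseline, and the time-point labels and values are hypothetical. No clinical threshold is implied.

```python
from typing import Dict, List, Tuple


def change_from_baseline(levels: List[Tuple[str, float]]) -> Dict[str, float]:
    """Difference in modification level between each later time point and the first.

    `levels` is an ordered list of (label, level) pairs; the first pair is
    treated as the baseline (an assumption made for this sketch).
    """
    if not levels:
        raise ValueError("at least one time point is required")
    baseline = levels[0][1]
    return {label: level - baseline for label, level in levels[1:]}


if __name__ == "__main__":
    # Hypothetical time course: one sample before treatment, two after.
    course = [("pre-treatment", 40.0), ("week 4", 30.0), ("week 12", 10.0)]
    for label, delta in change_from_baseline(course).items():
        print(f"{label}: {delta:+.1f} percentage points")
```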
  • a condition or a disease, as disclosed herein, can include a cancer, a neurological disorder, or an autoimmune disease.
  • a disease or condition may comprise a neurological disorder.
  • a neurological disorder may comprise Acquired Epileptiform Aphasia, Acute Disseminated Encephalomyelitis, Adrenoleukodystrophy, Agenesis of the corpus callosum, Agnosia, Aicardi syndrome, Alexander disease, Alpers' disease, Alternating hemiplegia, Alzheimer's disease, Amyotrophic lateral sclerosis (see Motor Neuron Disease), Anencephaly, Angelman syndrome, Angiomatosis, Anoxia, Aphasia, Apraxia, Arachnoid cysts, Arachnoiditis, Arnold-Chiari malformation, Arteriovenous malformation, Asperger's syndrome, Ataxia Telangiectasia, Attention Deficit Hyperactivity Disorder, Autism, Auditory processing disorder, Autonomic Dysfunction, Back Pain, Batten disease, Behcet's disease, Bell's palsy, Benign Essential Blepharospasm, Benign Focal Amyotrophy, Benign Intracranial Hypertension, Bilateral frontoparietal polymicrogyria, Binswanger's disease, Blepharospasm, Bloch-
  • Cytomegalic inclusion body disease (CIBD), Cytomegalovirus Infection, Dandy-Walker syndrome, Dawson disease, De Morsier's syndrome, Dejerine-Klumpke palsy, Dejerine-Sottas disease, Delayed sleep phase syndrome, Dementia, Dermatomyositis, Neurological Dyspraxia, Diabetic neuropathy, Diffuse sclerosis, Dysautonomia, Dyscalculia, Dysgraphia, Dyslexia, Dystonia, Early infantile epileptic encephalopathy, Empty sella syndrome, Encephalitis, Encephalocele, Encephalotrigeminal angiomatosis, Encopresis, Epilepsy, Erb's palsy, Erythromelalgia, Essential tremor, Fabry's disease, Fahr's syndrome, Fainting, Familial spastic paralysis, Febrile seizures, Fisher syndrome, Friedreich's ataxia, FART
  • Hypercortisolism, Hypoxia, Immune-Mediated encephalomyelitis, Inclusion body myositis, Incontinentia pigmenti, Infantile phytanic acid storage disease, Infantile Refsum disease, Infantile spasms, Inflammatory myopathy, Intracranial cyst, Intracranial hypertension, Joubert syndrome, Kearns-Sayre syndrome, Kennedy disease, Kinsbourne syndrome, Klippel Feil syndrome, Krabbe disease, Kugelberg-Welander disease, Kuru, Lafora disease, Lambert-Eaton myasthenic syndrome, Landau-Kleffner syndrome, Lateral medullary (Wallenberg) syndrome, Learning disabilities, Leigh's disease, Lennox-Gastaut syndrome, Lesch-Nyhan syndrome, Leukodystrophy, Lewy body dementia, Lissencephaly, Locked-In syndrome, Lou Gehrig's disease, Lumbar disc disease, Lyme disease - Neurological Sequelae, Machado-Joseph disease (Spinocerebellar ataxia type 3), Macrencephaly, Maple Syrup Urine Disease, Megalencephaly, Melkersson-Rosenthal syndrome, Meniere's disease, Meningitis
  • Neuromyotonia, Neuronal ceroid lipofuscinosis, Neuronal migration disorders, Niemann-Pick disease, Non 24-hour sleep-wake syndrome, Nonverbal learning disorder, O'Sullivan-McLeod syndrome, Occipital Neuralgia, Occult Spinal Dysraphism Sequence, Ohtahara syndrome, Olivopontocerebellar atrophy, Opsoclonus myoclonus syndrome, Optic neuritis, Orthostatic Hypotension, Overuse syndrome, Palinopsia, Paresthesia, Parkinson's disease, Paramyotonia Congenita, Paraneoplastic diseases, Paroxysmal attacks, Parry-Romberg syndrome, Rombergs Syndrome, Pelizaeus-Merzbacher disease, Periodic Paralyses, Peripheral neuropathy, Persistent Vegetative State, Pervasive neurological disorders, Photic sneeze reflex, Phytanic Acid Storage disease, Pick's disease, Pinched Nerve, Pituitary Tumors, PMG, Polio, Polymicrogyria, Polymyositis, Porencephaly, Post-Polio syndrome, Postherpetic Neuralgia (PHN), Postinfectious Encephalomyelitis, Postural Hypotension, Prader-Willi syndrome, Primary Lateral Sclerosis, Prion diseases, Progressive Hemifacial Atrophy also known as Rombergs Syndrome,
  • Reye's syndrome, Rombergs Syndrome, Rabies, Saint Vitus dance, Sandhoff disease, Schizophrenia, Schilder's disease, Schizencephaly, Sensory Integration Dysfunction, Septo-optic dysplasia, Shaken baby syndrome, Shingles, Shy-Drager syndrome, Sjogren's syndrome, Sleep apnea, Sleeping sickness, Snatiation, Sotos syndrome, Spasticity, Spina bifida, Spinal cord injury, Spinal cord tumors, Spinal muscular atrophy, Spinal stenosis, Steele-Richardson-Olszewski syndrome, see Progressive Supranuclear Palsy, Spinocerebellar ataxia, Stiff-person syndrome, Stroke, Sturge-Weber syndrome, Subacute sclerosing panencephalitis, Subcortical arteriosclerotic encephalopathy, Superficial siderosis, Sydenham's chorea, Syncope, Synesthesia, Syringomyelia,
  • a disease or condition may comprise an autoimmune disease.
  • an autoimmune disease may comprise acute disseminated encephalomyelitis (ADEM), acute necrotizing hemorrhagic leukoencephalitis, Addison's disease, agammaglobulinemia, allergic asthma, allergic rhinitis, alopecia areata, amyloidosis, ankylosing spondylitis, anti-GBM/anti-TBM nephritis, antiphospholipid syndrome (APS), autoimmune aplastic anemia, autoimmune dysautonomia, autoimmune hepatitis, autoimmune hyperlipidemia, autoimmune immunodeficiency, autoimmune inner ear disease (AIED), autoimmune myocarditis, autoimmune pancreatitis, autoimmune retinopathy, autoimmune
  • thrombocytopenic purpura (ATP), autoimmune thyroid disease, axonal & neuronal neuropathies, Balo disease, Behcet's disease, bullous pemphigoid, cardiomyopathy, Castleman disease, celiac sprue (non-tropical), Chagas disease, chronic fatigue syndrome, chronic inflammatory demyelinating polyneuropathy (CIDP), chronic recurrent multifocal osteomyelitis (CRMO), Churg-Strauss syndrome, cicatricial pemphigoid/benign mucosal pemphigoid, Crohn's disease, Cogan's syndrome, cold agglutinin disease, congenital heart block, coxsackie myocarditis, CREST disease, essential mixed cryoglobulinemia, demyelinating neuropathies, dermatomyositis, Devic's disease (neuromyelitis optica), discoid lupus, Dressler's syndrome, endometriosis
  • leukocytoclastic vasculitis, lichen planus, lichen sclerosus, ligneous conjunctivitis, linear IgA disease (LAD), Lupus (SLE), Lyme disease, Meniere's disease, microscopic polyangiitis, mixed connective tissue disease (MCTD), Mooren's ulcer, Mucha-Habermann disease, multiple sclerosis, myasthenia gravis, myositis, narcolepsy, neuromyelitis optica (Devic's), neutropenia, ocular cicatricial pemphigoid, optic neuritis, palindromic rheumatism, PANDAS (Pediatric Autoimmune Neuropsychiatric Disorders
  • paraneoplastic cerebellar degeneration, paroxysmal nocturnal hemoglobinuria (PNH), Parry Romberg syndrome, Parsonage-Turner syndrome, pars planitis (peripheral uveitis), pemphigus, peripheral neuropathy, perivenous encephalomyelitis, pernicious anemia, POEMS syndrome, polyarteritis nodosa, type I, II & III autoimmune polyglandular syndromes, polymyalgia rheumatica, polymyositis, postmyocardial infarction syndrome, postpericardiotomy syndrome, progesterone dermatitis, primary biliary cirrhosis, primary sclerosing cholangitis, psoriasis, psoriatic arthritis, idiopathic pulmonary fibrosis, pyoderma gangrenosum, pure red cell aplasia, Raynaud's phenomena, reflex sympathetic dystrophy
  • thrombocytopenic purpura (TTP), Tolosa-Hunt syndrome, transverse myelitis, ulcerative colitis, undifferentiated connective tissue disease (UCTD), uveitis, vasculitis, vesiculobullous dermatosis, vitiligo, Wegener's granulomatosis, chronic active hepatitis, primary biliary cirrhosis, dilated cardiomyopathy, myocarditis, autoimmune polyendocrine syndrome type I (APS-I), cystic fibrosis vasculitides, acquired hypoparathyroidism, coronary artery disease, pemphigus foliaceus, pemphigus vulgaris, Rasmussen encephalitis, autoimmune gastritis, insulin hypoglycemic syndrome (Hirata disease), Type B insulin resistance, acanthosis, systemic lupus erythematosus (SLE), pernicious anemia, treatment-
  • MPGN glomerulonephritis
  • encephalomyelitis, subacute autonomic neuropathy, cancer-associated retinopathy, paraneoplastic opsoclonus myoclonus ataxia, lower motor neuron syndrome and Lambert-Eaton myasthenic syndrome.
  • a disease or a condition may comprise AIDS, anthrax, botulism, brucellosis, chancroid, chlamydial infection, cholera, coccidioidomycosis, cryptosporidiosis, cyclosporiasis, diphtheria, ehrlichiosis, arboviral encephalitis, enterohemorrhagic Escherichia coli, giardiasis, gonorrhea, dengue fever, haemophilus influenza, Hansen's disease (Leprosy), hantavirus pulmonary syndrome, hemolytic uremic syndrome, hepatitis A, hepatitis B, hepatitis C, human immunodeficiency virus, legionellosis, listeriosis, Lyme disease, malaria, measles.
  • Meningococcal disease, mumps, pertussis (whooping cough), plague, paralytic poliomyelitis, psittacosis, Q fever, rabies, Rocky Mountain spotted fever, rubella, congenital rubella syndrome, severe acute respiratory syndrome (SARS), shigellosis, smallpox, streptococcal disease (invasive group A), streptococcal toxic shock syndrome, streptococcus pneumonia, syphilis, tetanus, toxic shock syndrome, trichinosis, tuberculosis, tularemia, typhoid fever, vancomycin intermediate resistant Staphylococcus aureus, varicella, yellow fever, variant Creutzfeldt-Jakob disease (vCJD), Ebola hemorrhagic fever, Echinococcosis, Hendra virus infection, human monkeypox, influenza A, H5N1, Lassa fever, Marburg hemorrhagic fever, Ni
  • a disease or condition may comprise a cancer.
  • a cancer may comprise thyroid cancer, adrenal cortical cancer, anal cancer, aplastic anemia, bile duct cancer, bladder cancer, bone cancer, bone metastasis, central nervous system (CNS) cancers, peripheral nervous system (PNS) cancers, breast cancer, Castleman's disease, cervical cancer, childhood Non-Hodgkin's lymphoma, lymphoma, colon and rectum cancer, endometrial cancer, esophagus cancer, Ewing's family of tumors (e.g. Ewing's sarcoma), eye cancer, gallbladder cancer, gastrointestinal carcinoid tumors, gastrointestinal stromal tumors, gestational trophoblastic disease, hairy cell leukemia, Hodgkin's disease, Kaposi's sarcoma, kidney cancer, laryngeal and hypopharyngeal cancer, acute lymphocytic leukemia, acute myeloid leukemia, children's leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, liver cancer, lung cancer, lung carcinoid tumors, Non-Hodgkin's lymphoma, male breast cancer, malignant mesothelioma, multiple myeloma, myelodysplastic syndrome, myeloproliferative disorders, nasal cavity and paranasal cancer, nasopharyngeal cancer, neuroblastoma, oral cavity and oropharyngeal cancer, osteosarcoma, ovarian cancer, pancreatic cancer, penile cancer
  • a condition or a disease, as disclosed herein, can include hyperproliferative disorders.
  • Malignant hyperproliferative disorders can be stratified into risk groups, such as a low risk group and a medium-to-high risk group.
  • Hyperproliferative disorders can include but may not be limited to cancers, hyperplasia, or neoplasia.
  • the hyperproliferative cancer can be breast cancer such as a ductal carcinoma in duct tissue of a mammary gland, medullary carcinomas, colloid carcinomas, tubular carcinomas, and inflammatory breast cancer; ovarian cancer, including epithelial ovarian tumors such as adenocarcinoma in the ovary and an adenocarcinoma that has migrated from the ovary into the abdominal cavity; uterine cancer; cervical cancer such as adenocarcinoma in the cervix epithelium including squamous cell carcinoma and adenocarcinomas; prostate cancer, such as a prostate cancer selected from the following: an adenocarcinoma or an adenocarcinoma that has migrated to the bone; pancreatic cancer such as epithelioid carcinoma in the pancreatic duct tissue and an adenocarcinoma in a pancreatic duct; bladder cancer such as a transitional cell carcinoma in urinary bladder, urot
  • leukemia such as acute myeloid leukemia (AML), acute lymphocytic leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, hairy cell leukemia, myelodysplasia, myeloproliferative disorders, acute myelogenous leukemia (AML), chronic myelogenous leukemia (CML), mastocytosis, chronic lymphocytic leukemia (CLL), multiple myeloma (MM), and myelodysplastic syndrome (MDS); bone cancer; lung cancer such as non-small cell lung cancer (NSCLC), which may be divided into squamous cell carcinomas, adenocarcinomas, and large cell undifferentiated carcinomas, and small cell lung cancer; skin cancer such as basal cell carcinoma, melanoma, squamous cell carcinoma and actinic keratosis, which may be a skin condition that sometimes develops into squam
  • the diseases stratified, classified, characterized, or diagnosed by the methods of the present disclosure include but may not be limited to thyroid disorders such as for example benign thyroid disorders including but not limited to follicular adenomas, Hurthle cell adenomas, lymphocytic thyroiditis, and thyroid hyperplasia.
  • the diseases stratified, classified, characterized, or diagnosed by the methods of the present disclosure include but may not be limited to malignant thyroid disorders such as for example follicular carcinomas, follicular variant of papillary thyroid carcinomas, medullary carcinomas, and papillary carcinomas.
  • Conditions or diseases of the present disclosure can include a genetic disorder.
  • a genetic disorder may be an illness caused by abnormalities in genes or chromosomes. Genetic disorders can be grouped into two categories: single gene disorders and multifactorial and polygenic (complex) disorders.
  • a single gene disorder can be the result of a single mutated gene. Inheriting a single gene disorder can include but not be limited to autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive, Y-linked and mitochondrial inheritance. In some cases, one mutated copy of the gene can be necessary for a person to be affected by an autosomal dominant disorder.
  • an autosomal dominant type of disorder can include but may not be limited to Huntington's disease, Neurofibromatosis 1, Marfan Syndrome, Hereditary nonpolyposis colorectal cancer, or Hereditary multiple exostoses.
  • in autosomal recessive disorders, two copies of the gene must be mutated for a subject to be affected.
  • this type of disorder can include but may not be limited to cystic fibrosis, sickle-cell disease (also partial sickle-cell disease), Tay-Sachs disease, Niemann-Pick disease, or spinal muscular atrophy.
  • X-linked dominant disorders are caused by mutations in genes on the X chromosome such as X-linked hypophosphatemic rickets.
  • X-linked dominant conditions such as Rett syndrome, Incontinentia Pigmenti type 2 and Aicardi Syndrome can be fatal.
  • X-linked recessive disorders are also caused by mutations in genes on the X chromosome. Examples of this type of disorder can include but are not limited to Hemophilia A, Duchenne muscular dystrophy, red-green color blindness, muscular dystrophy and Androgenetic alopecia.
  • Y-linked disorders are caused by mutations on the Y chromosome. Examples can include but are not limited to Male Infertility and hypertrichosis pinnae.
  • the genetic disorder of mitochondrial inheritance, also known as maternal inheritance, can apply to genes in mitochondrial DNA such as in Leber's Hereditary Optic Neuropathy.
  • Genetic disorders may also be complex, multifactorial or polygenic.
  • Polygenic genetic disorders can be associated with the effects of multiple genes in combination with lifestyle and environmental factors. Although complex genetic disorders can cluster in families, they do not have a clear-cut pattern of inheritance.
  • Multifactorial or polygenic disorders can include heart disease, diabetes, asthma, autism, autoimmune diseases such as multiple sclerosis, cancers, ciliopathies, cleft palate, hypertension, inflammatory bowel disease, mental retardation or obesity.
  • Other genetic disorders can include but may not be limited to 1p36 deletion syndrome, 21-hydroxylase deficiency, 22q11.2 deletion syndrome, aceruloplasminemia, achondrogenesis, type II, achondroplasia, acute intermittent porphyria, adenylosuccinate lyase deficiency, Adrenoleukodystrophy, Alexander disease, alkaptonuria, alpha-1 antitrypsin deficiency, Alstrom syndrome, Alzheimer's disease (type 1, 2, 3, and 4), Amelogenesis Imperfecta, amyotrophic lateral sclerosis, Amyotrophic lateral sclerosis type 2, Amyotrophic lateral sclerosis type 4, androgen insensitivity syndrome, Anemia, Angelman syndrome, Apert syndrome, ataxia-telangiectasia, Beare-Stevenson cutis gyrata syndrome, Benjamin syndrome, beta thalassemia, biotinidase defici
  • spondyloepimetaphyseal dysplasia Strudwick type, spondyloepiphyseal dysplasia congenita, Stickler syndrome, Stickler syndrome COL2A1, Tay-Sachs disease, tetrahydrobiopterin deficiency, thanatophoric dysplasia, thiamine-responsive megaloblastic anemia with diabetes mellitus and sensorineural deafness, Thyroid disease, Tourette's Syndrome, Treacher Collins syndrome, triple X syndrome, tuberous sclerosis, Turner syndrome, Usher syndrome, variegate porphyria, von Hippel-Lindau disease, Waardenburg syndrome, Weissenbacher-Zweymüller syndrome, Wilson disease, Wolf-Hirschhorn syndrome, Xeroderma Pigmentosum, X-linked severe combined immunodeficiency, X-linked sideroblastic anemia, X-linked spinal-bulbar muscle atrophy.
  • a kit may include a moiety, a container, an enzyme or fragment thereof, instructions for use, a portable sequencer, or any combination thereof.
  • a kit may be a general kit for all tissue samples or disease types.
  • a kit may be a specific kit for a specific tissue sample, such as a plasma sample, a blood sample, a serum sample, a buccal sample, or a urine sample.
  • a kit may be a specific kit for a specific disease such as cancer.
  • a kit may comprise a control. In some embodiments, a control can comprise one or more epigenetic modification disclosed herein.
  • a kit may provide periodic updates of a database of references or analysis software that computes a result of the method.
  • a kit may provide software to automate one or more aspects of a method, such as a comparison to a reference to provide a result or to provide a summary of a result that may be reported or displayed or downloaded by a medical professional and/or entered into a database.
  • a result or a summary of results may include any of the results disclosed herein, including recommendations of treatment options for a subject and a risk of occurrence of a disease or condition.
  • a kit may provide a unit or device for obtaining a sample from a subject (e.g., a device with a needle coupled to an aspirator).
  • a kit may provide instructions for performing methods as disclosed herein, and include all necessary buffers and reagents for hybridizing, sequencing, amplifying, associating, extending, or any combination thereof.
  • a kit may include instructions for analyzing a result.
  • An informational material of a kit may comprise printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet.
  • An informational material may comprise Braille, computer readable material, video recording, or audio recording.
  • the informational material of the kit may include contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about a compound described herein and/or its use in the methods described herein.
  • Informational material may be provided in any combination of formats.
  • a kit may include a package, such as a fiber-based package, a cardboard package, or a polymeric package, such as a styrofoam box.
  • a package may be configured so as to substantially maintain a temperature differential between an interior and an exterior. In some cases, it may provide insulating properties to keep one or more components of a kit at a preselected temperature for a preselected time.
  • a kit may include one or more containers for a composition containing a compound(s) described herein. In some embodiments, a kit may contain separate containers (such as two separate containers for two components of a kit), dividers or compartments for one or more components, and informational material.
  • a kit component may be contained in a bottle, a vial, or a syringe, and informational material may be contained in a plastic sleeve or a packet.
  • separate components of a kit may be contained within a single, undivided container.
  • a kit component may be contained in a bottle, a vial or a syringe that has attached thereto the informational material in the form of a label.
  • a kit may include a plurality (e.g., a pack) of individual containers, each containing one or more unit dosage forms (e.g., a dosage form described herein) of a component described herein.
  • the kit may include a plurality of syringes, ampules, foil packets, or blister packs, each containing a single unit dose of a kit component described herein.
  • Containers of a kit may be air tight, waterproof (e.g., impermeable to changes in moisture or evaporation), and/or light-tight.
  • a kit may include a device suitable for administration of the component, e.g., a syringe, inhalant, pipette, forceps, measured spoon, dropper (e.g., eye dropper), swab (e.g., a cotton swab or wooden swab), or any such delivery device.
  • the device may be a medical implant device, e.g., packaged for surgical insertion.
  • a basic research business, a disease diagnostic business, a molecular profiling business, a pharmaceutical business, or any other business associated with patient healthcare may provide a kit for performing the methods described herein.
  • FIG. 1 shows a computer system 101 that is programmed or otherwise configured to interface with a sequence library, a sequencer, a PCR machine, an apparatus that is configured to sequence or amplify an oligonucleotide, a substrate, or any combination thereof.
  • the computer system 101 can regulate various aspects of the present disclosure, such as, for example, conditions for performing epigenetic modifications, conditions for associating a moiety to the epigenetic modifications, and conditions for nanopore sequencing.
  • the computer system 101 can regulate amplification conditions, associating conditions, sequencing conditions, such as buffer types, temperatures, or time periods of incubation.
  • the computer system 101 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
  • the electronic device can be a mobile electronic device.
  • the computer system 101 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 105, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the computer system 101 also includes memory or memory location 110 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 115 (e.g., hard disk), communication interface 120 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 125, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 110, storage unit 115, interface 120 and peripheral devices 125 are in communication with the CPU 105 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 115 can be a data storage unit (or data repository) for storing data.
  • the computer system 101 can be operatively coupled to a computer network (“network”) 130 with the aid of the communication interface 120.
  • the network 130 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 130 in some cases is a
  • the network 130 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 130 in some cases with the aid of the computer system 101, can implement a peer-to-peer network, which may enable devices coupled to the computer system 101 to behave as a client or a server.
  • the CPU 105 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 110.
  • the instructions can be directed to the CPU 105, which can subsequently program or otherwise configure the CPU 105 to implement methods of the present disclosure. Examples of operations performed by the CPU 105 can include fetch, decode, execute, and writeback.
  • the CPU 105 can be part of a circuit, such as an integrated circuit.
  • One or more other components of the system 101 can be included in the circuit.
  • the circuit is an application specific integrated circuit (ASIC).
  • the storage unit 115 can store files, such as drivers, libraries and saved programs.
  • the storage unit 115 can store user data, e.g., user preferences and user programs.
  • the computer system 101 in some cases can include one or more additional data storage units that are external to the computer system 101, such as located on a remote server that is in communication with the computer system 101 through an intranet or the Internet.
  • the computer system 101 can communicate with one or more remote computer systems through the network 130.
  • the computer system 101 can communicate with a remote computer system of a user.
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 101 via the network 130.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 101, such as, for example, on the memory 110 or electronic storage unit 115.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 105.
  • the code can be retrieved from the storage unit 115 and stored on the memory 110 for ready access by the processor 105.
  • the electronic storage unit 115 can be precluded, and machine-executable instructions are stored on memory 110.
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
  • aspects of the systems and methods provided herein can be embodied in programming.
  • Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
  • another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
  • the physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software.
  • terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
  • a machine readable medium such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium.
  • Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the computer system 101 can include or be in communication with an electronic display 135 that comprises a user interface (UI) 140 for providing, for example, one or more results (immediate results or archived results from a previous experiment), one or more user inputs, reference values from a library or database, or a combination thereof.
  • Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
  • Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
  • An algorithm can be implemented by way of software upon execution by the central processing unit 105.
  • the algorithm can, for example, use supervised learning to optimize conditions such as a buffer type, a buffer concentration, a temperature, or an incubation period.
  • Conditions may be optimized for an oligonucleotide fragment, such as an oligonucleotide fragment having a particular number of epigenetic modifications or a particular length of sequence.
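  • As a minimal sketch of how a supervised-learning step of this kind could select conditions, the following Python example fits a k-nearest-neighbour regressor to previously observed (temperature, incubation time) pairs with an outcome score, then picks the candidate condition with the highest predicted score. The function names, the choice of k-nearest-neighbour regression, and all numeric values are illustrative assumptions and are not taken from the disclosure.

```python
from math import dist


def knn_predict(train, query, k=3):
    """Predict an outcome score for `query` as the mean score of its k nearest
    training conditions (Euclidean distance on the raw feature values)."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def best_condition(train, candidates, k=3):
    """Return the candidate condition with the highest predicted score."""
    return max(candidates, key=lambda cond: knn_predict(train, cond, k))


# Hypothetical training data: ((temperature in deg C, incubation in minutes), score).
# The scores are placeholders for illustration, not measurements.
TRAIN = [
    ((25.0, 10.0), 0.40),
    ((30.0, 30.0), 0.70),
    ((37.0, 20.0), 0.80),
    ((37.0, 30.0), 0.90),
    ((45.0, 30.0), 0.60),
]

if __name__ == "__main__":
    candidates = [(25.0, 10.0), (37.0, 30.0), (45.0, 30.0)]
    print(best_condition(TRAIN, candidates, k=1))
```

  • In practice the features would likely need scaling to comparable ranges, and categorical conditions such as buffer type would need an encoding. Both are omitted here for brevity.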
  • An aspect of the present disclosure provides a method.
  • the method may comprise associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a
  • the epigenetically modified base may comprise a pyrimidine.
  • the pyrimidine may be a cytosine.
  • the pyrimidine may be thymine (T) or uracil.
  • the epigenetically modified base may comprise 6-methyladenine, 6-hydroxymethyladenine, 8-oxoguanine, 5-hydroxymethyluracil, abasic sites, and 5-methylcytosine.
  • the epigenetically modified base may comprise N6-methyladenosine, N3-methyladenosine, N7-methylguanosine, 5-hydroxymethylcytosine, other methylated nucleotides, pseudouridine, thiouridine, isoguanosine, isocytosine, dihydrouridine, queuosine, wyosine, inosine, triazole, diaminopurine, β-D-glucopyranosyloxymethyluracil (a.k.a., β-D-glucosyl-HOMedU, β-glucosyl-hydroxymethyluracil, “dJ,” or “base J”), 8-oxoguanosine, and 2'-O-methyl derivatives of adenosine, cytidine, guanosine, and uridine.
  • the epigenetically modified base may comprise a hydroxymethylated base.
  • the hydroxymethylated base may comprise a 5-hydroxymethylated base.
  • the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine.
  • the moiety may comprise a glucose moiety.
  • the method may further comprise, before the identifying, oxidizing the moiety.
  • the oxidizing may be carried out by an oxidizing agent.
  • the oxidizing agent may comprise sodium periodate.
  • the epigenetically modified base may comprise a formylated base.
  • the formylated base may comprise a 5-formylated base.
  • the 5-formylated base may comprise a 5-formylcytosine.
  • the moiety may comprise a hydroxylamine or a derivative thereof, a hydrazine or a derivative thereof, or a 1,3-indandione or a derivative thereof.
  • the epigenetically modified base may comprise a carboxylic acid containing base.
  • the carboxylic acid containing base may comprise a 5-carboxylated base.
  • the 5-carboxylated base may comprise a 5-carboxycytosine.
  • the moiety may comprise an anisidine or a derivative thereof, a carbodiimide or a derivative thereof, or a p-Xylylenediamine or a derivative thereof.
  • the epigenetically modified base may further comprise a methylated base.
  • the methylated base may comprise a 5-methylated base.
  • the 5-methylated base may comprise a 5-methylcytosine.
  • the target nucleic acid sequence may comprise DNA or RNA.
  • the epigenetically modified base may further comprise a 6-methyladenine, a 6-hydroxymethyladenine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, and an abasic site.
  • a size of a nanopore may be at most 1 nanometer (nm). In some embodiments, a size of a nanopore may be at most 1 nm, 0.9 nm, 0.8 nm, 0.7 nm, 0.6 nm, 0.5 nm, 0.4 nm, 0.3 nm, 0.2 nm, 0.1 nm, or less. In some embodiments, a size of a nanopore may be more than 1 nm.
  • At least one nanopore used in the nanopore sequencing may be a biological nanopore.
  • at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or greater nanopores used in the nanopore sequencing may be biological nanopores.
  • the nanopore may comprise a lipid bilayer.
  • the nanopore may be formed using a biological transmembrane protein such as MspA.
  • the nanopore may be a solid state or hybrid nanopore.
  • the nucleic acid sequence or the target nucleic acid sequence may be double-stranded.
  • the moiety may be associated with the epigenetically modified base by a single bond, a double bond, or a triple bond.
  • the target nucleic acid sequence may comprise an adapter sequence.
  • the target nucleic acid sequence may comprise a barcode.
  • the target nucleic acid sequence may comprise at least: from about 1 to about 3; from about 1 to about 5; from about 1 to about 10; from about 1 to about 15; or from about 1 to about 20 epigenetically modified bases per at least about 20 bases of the target nucleic acid sequence. In some cases, the target nucleic acid sequence may comprise at least about: 1, 5, 10, 15 or 20 epigenetically modified bases per at least about 20 bases of the target nucleic acid sequence.
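  • To make the "modified bases per about 20 bases" measure concrete, the following Python sketch counts modified positions in consecutive, non-overlapping windows along a sequence. The function name, the use of 0-based positions, and the choice of non-overlapping windows are assumptions made for illustration. The disclosure does not prescribe how the density is computed.

```python
def modified_bases_per_window(sequence_length, modified_positions, window=20):
    """Count modified bases in consecutive non-overlapping windows.

    `modified_positions` holds 0-based indices of modified bases.
    The last window may be shorter than `window`.
    """
    n_windows = (sequence_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in modified_positions:
        if not 0 <= pos < sequence_length:
            raise ValueError(f"position {pos} is outside the sequence")
        counts[pos // window] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 50-base sequence with five modified positions.
    print(modified_bases_per_window(50, [0, 5, 19, 20, 49]))
```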
  • the target nucleic acid sequence may comprise a cytosine guanine (CG) site, a cytosine phosphate guanine (CpG) island, or a combination thereof.
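  • As an illustrative sketch, CG sites in a sequence can be located with a simple scan, and a window can be scored with the commonly used observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G). The function names are hypothetical. The ratio is a general heuristic for CpG-rich regions and is not a definition taken from the disclosure.

```python
def find_cg_sites(sequence):
    """Return 0-based start indices of every CG dinucleotide in `sequence`."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = sequence.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return len(find_cg_sites(seq)) * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    example = "ACGTCGCG"  # hypothetical sequence
    print(find_cg_sites(example))
    print(cpg_observed_expected(example))
```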
  • the target nucleic acid sequence may comprise cell-free DNA.
  • the target nucleic acid sequence may comprise a cDNA sequence.
  • the method may comprise sequencing an amplified product.
  • the target nucleic acid sequence may be from a sample.
  • the sample may be from a subject.
  • the subject may be a human.
  • the sample may comprise a buccal sample, a saliva sample, a blood sample, a plasma sample, a reproductive sample, a mucus sample, a cerebrospinal fluid sample, a tissue sample, or any combination thereof.
  • the method may comprise obtaining a result. In some cases, the method may comprise communicating the result via a communication medium.
  • the subject may be diagnosed with a condition.
  • the method may comprise diagnosing the subject as having a condition.
  • the method may comprise diagnosing the subject as having a likelihood of developing a condition.
  • the diagnosing may be based on comparing the result to a reference.
  • the diagnosing may at least partially confirm a previous diagnosis.
  • the condition may be a cancer.
  • the method may comprise selecting a treatment for the subject.
  • the method may comprise treating the subject.
  • the treating may comprise: surgery, chemotherapy, radiation therapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplant, and precision medicine.
  • the method may comprise repeating the associating, the hybridizing and the amplifying at different time points.
  • the moiety may comprise a sugar.
  • the sugar may comprise a glucose.
  • the glucose may be modified.
  • the moiety may be associated with the epigenetically modified base with the assistance of an enzyme.
  • the enzyme may be selective for a portion of the target nucleic acid sequence that is double-stranded.
  • the moiety may be selectively associated with the epigenetically modified base at a portion of the target nucleic acid sequence that is double-stranded.
  • the moiety may be selective for a portion of the nucleic acid sequence. In some cases, the portion may be double-stranded.
  • Embodiment 1 A method comprising: (a) associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled hydroxymethylated base; (b) oxidizing the labeled hydroxymethylated base; and (c) identifying the hydroxymethylated base by sequencing the target nucleic acid sequence, wherein the sequencing comprises nanopore sequencing.
  • Embodiment 2 The method of embodiment 1, wherein the hydroxymethylated base comprises a pyrimidine.
  • Embodiment 3 The method of embodiment 2, wherein the pyrimidine is a cytosine.
  • Embodiment 4 The method of embodiment 1, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
  • Embodiment 5 The method of embodiment 4, wherein the 5-hydroxymethylated base comprises a 5-hydroxymethylcytosine.
  • Embodiment 6 The method of any one of embodiments 1-5, wherein the moiety comprises a glucose moiety.
  • Embodiment 7 The method of any one of embodiments 1-5, further comprising, before the identifying, oxidizing the moiety.
  • Embodiment 8 The method of any one of embodiments 1-7, wherein the oxidizing is carried out by an oxidizing agent.
  • Embodiment 9 The method of embodiment 8, wherein the oxidizing agent comprises sodium periodate.
  • Embodiment 10 The method of any one of embodiments 1-9, wherein the target nucleic acid sequence comprises a formylated base.
  • Embodiment 11 The method of embodiment 10, wherein the formylated base comprises a 5-formylated base.
  • Embodiment 12 The method of embodiment 11, wherein the 5-formylated base comprises a 5-formylcytosine.
  • Embodiment 13 The method of any one of embodiments 10-12, wherein the formylated base is associated with a second moiety, wherein the second moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
  • Embodiment 14 The method of any one of embodiments 1-13, wherein the target nucleic acid sequence further comprises a carboxylic acid containing base.
  • Embodiment 15 The method of embodiment 14, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
  • Embodiment 16 The method of embodiment 15, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
  • Embodiment 17 The method of any one of embodiments 14-16, wherein the carboxylic acid containing base is associated with a third moiety, wherein the third moiety comprises an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
  • Embodiment 18 The method of any one of embodiments 1-7, wherein the target nucleic acid sequence further comprises a methylated base.
  • Embodiment 19 The method of embodiment 18, wherein the methylated base comprises a 5-methylated base.
  • Embodiment 20 The method of embodiment 19, wherein the 5-methylated base comprises a 5-methylcytosine.
  • Embodiment 21 The method of any one of embodiments 1-20, wherein the target nucleic acid sequence comprises DNA or RNA.
  • Embodiment 22 The method of any one of embodiments 1-21, wherein the target nucleic acid sequence further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
  • Embodiment 23 The method of any one of embodiments 1-22, wherein a size of a nanopore is at most one nanometer.
  • Embodiment 24 The method of any one of embodiments 1-23, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
  • Embodiment 25 The method of any one of embodiments 1-24, wherein the moiety is at least two moieties.
  • Embodiment 26 A method comprising: (a) associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a formylated base, or a carboxylic acid containing base; and (b) identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing.
  • Embodiment 27 The method of embodiment 26, wherein the epigenetically modified base comprises a pyrimidine.
  • Embodiment 28 The method of embodiment 27, wherein the pyrimidine is a cytosine.
  • Embodiment 30 The method of embodiment 29, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
  • Embodiment 31 The method of embodiment 30, wherein the 5 -hydroxymethylated base comprises a 5-hydroxymethylcytosine.
  • Embodiment 32 The method of any one of embodiments 29-31, wherein the moiety comprises a glucose moiety.
  • Embodiment 33 The method of any one of embodiments 29-32, further comprising, before the identifying, oxidizing the moiety.
  • Embodiment 34 The method of embodiment 33, wherein the oxidizing is carried out by an oxidizing agent.
  • Embodiment 35 The method of embodiment 34, wherein the oxidizing agent comprises sodium periodate.
  • Embodiment 36 The method of any one of embodiments 26-28, wherein the epigenetically modified base comprises a formylated base.
  • Embodiment 37 The method of embodiment 36, wherein the formylated base comprises a 5-formylated base.
  • Embodiment 38 The method of embodiment 37, wherein the 5-formylated base comprises a 5-formylcytosine.
  • Embodiment 39 The method of any one of embodiments 36-38, wherein the moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
  • Embodiment 40 The method of any one of embodiments 26-28, wherein the epigenetically modified base comprises a carboxylic acid containing base.
  • Embodiment 41 The method of embodiment 40, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
  • Embodiment 42 The method of embodiment 41, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
  • Embodiment 43 The method of any one of embodiments 40-42, wherein the moiety comprises an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
  • Embodiment 44 The method of any one of embodiments 26-43, wherein the epigenetically modified base further comprises a methylated base.
  • Embodiment 45 The method of embodiment 44, wherein the methylated base comprises a 5-methylated base.
  • Embodiment 46 The method of embodiment 45, wherein the 5-methylated base comprises a 5-methylcytosine.
  • Embodiment 47 The method of any one of embodiments 26-46, wherein the target nucleic acid sequence comprises DNA or RNA.
  • Embodiment 48 The method of any one of embodiments 26-47, wherein the epigenetically modified base further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
  • Embodiment 49 The method of any one of embodiments 26-48, wherein a size of a nanopore is at most one nanometer.
  • Embodiment 50 The method of any one of embodiments 26-49, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
  • Embodiment 51 The method of any one of embodiments 26-50, wherein the moiety is at least two moieties.
  • Embodiment 52 The method of any one of embodiments 28-51, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
  • Embodiment 53 The method of any one of embodiments 28-52, wherein the moiety is at least two moieties.
  • Embodiment 54 The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the identifying comprises employing a trained algorithm.
  • Embodiment 55 The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the hydroxymethylated base is identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method.
  • Embodiment 56 The method of embodiment 55, wherein the different sequencing method is Illumina sequencing.
  • Embodiment 57 The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the identifying comprises identifying an unmodified base.
  • Embodiment 58 The method of embodiment 57, wherein the unmodified base is identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method.
  • Embodiment 59 The method of embodiment 58, wherein the different sequencing method is Illumina sequencing.
  • Embodiment 60 The method of any one of embodiments 57-59, wherein the unmodified base is a cytosine.
  • a subject may be suspected of having a cancer.
  • a sample comprising a target nucleic acid sequence may be obtained from the subject with at least one of: a plasma sample, a serum sample, a blood sample, a urine sample, and a buccal sample.
  • the target nucleic acid sequence may be isolated from the sample.
  • Epigenetic modifications present on the target nucleic acid sequence may be associated with UDPG employing T4 Phage beta-glucosyltransferase (T4-BGT) or with click chemistry.
  • the target nucleic acid may then go through nanopore sequencing.
  • the subject may be diagnosed as having the cancer when an epigenetic modification associated with the cancer may be confirmed present in the sample obtained from the subject.
  • This reaction is run on a PCR machine, which holds the reaction at 37°C for 30 minutes and then holds the reaction at 10°C.
  • CEGX067 144 2 is purified in 50 µL solution (according to the previous paragraph).
  • the End Prep step may need a volume input of about 25 µL (see below). Therefore, a sample may be divided into two or more portions.
  • CEG067 144 1 may be directly diluted to 50 µL with H2O.
  • the products are run on a PCR machine.
  • the PCR machine conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µL of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
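  • The hold-and-pause sequence described above can be written down as a small data structure, which makes the program easy to review or to total up. The following Python sketch is only an illustration: the `Step` class, its field names, and `timed_minutes` are hypothetical and do not correspond to any particular thermocycler's software interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    action: str                            # "hold" or "pause"
    temperature_c: Optional[float] = None  # None for a manual pause
    minutes: Optional[float] = None        # None means hold indefinitely
    note: str = ""


# The program as described in the text: two timed holds, one manual
# pause for reagent addition, and a final open-ended hold.
PROGRAM = [
    Step("hold", temperature_c=20.0, minutes=5.0),
    Step("pause", note="add dATP"),
    Step("hold", temperature_c=65.0, minutes=5.0),
    Step("hold", temperature_c=10.0, minutes=None),
]


def timed_minutes(program):
    """Total duration of the timed hold steps, ignoring pauses and open-ended holds."""
    return sum(s.minutes for s in program
               if s.action == "hold" and s.minutes is not None)


if __name__ == "__main__":
    print(timed_minutes(PROGRAM))
```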
  • the products are run on a PCR machine.
  • the PCR conditions include: holding at 25°C for 10 minutes and then holding at 10°C.
  • FIG. 13A shows the bioanalyzer results with 1/3 dilution.
  • the flow cell is washed with the Flow Cell Wash Kit (EXP-WSH002) and stored at 4°C.
  • FIG. 13B shows the size distributions of 2 samples.
  • the read lengths of 1 kb-hmC (5-hmC) are mainly about 1 kb, with a small portion of short reads of about 400 bp.
  • the read lengths of glucosylated 1 kb-hmC are about 1 kb (main peak), with additional reads of about 200 bp, 400 bp, and 600 bp.
  • the read ratio between the 1 kb-hmC and glucosylated 1 kb-hmC samples is about 10:1. Because modification of hmC increases the friction between the DNA sequence and the pore, modified DNA passes through the pore more slowly. Within a given time window, the glucosylated 1 kb-hmC sample therefore produced fewer reads than the 1 kb-hmC sample did.
  • FIG. 13C shows IGV (Integrative Genomics Viewer, a visualization tool for genomic datasets) view of reads mapped to reference. It shows a similar trend for 1 kb-hmC and the glucosylated 1 kb-hmC samples.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the products at
  • PCR reactions are pooled and purified by GeneJet kit similar to 2. Cleanup in Example 2.
  • FIG. 13D shows the result of the bioanalyzer after sodium periodate oxidation.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at
  • FIG. 13E shows the bioanalyzer results with 1/3 dilution.
  • The flow cell is washed with the Flow Cell Wash Kit (EXP-WSH002) and stored at 4°C.
  • FIGs. 3-4 show that there are no differences between sample 2kb-hmC and glucosylated 2kb-hmC. DNA fragmentation occurs after sodium periodate oxidation. This step is not required because glucosylated 2kb-hmC samples do not pass through the pore very well. Cytosine is miscalled by Albacore software (a basecaller). The error rate of glucosylated 2kb-hmC and the oxidized glucosylated 2kb-hmC may be the same, which may be larger than the error rate of 2kb-hmC. Upon manually aligning electric signals using HDFView (a software tool to view raw data produced from a sequencer), there are differences between 5-hmC and glucosylated 5-hmC in the ACT motif.
  • FIG. 13F shows that in the forward primer region, cytosines are correctly identified as “c”, while cytosines with modifications were basecalled with errors.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at
  • FIG. 14A shows the result of the bioanalyzer after modification.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at
  • FIG. 14B shows the bioanalyzer result with 1/3 dilution.
  • FIG. 14C shows there are no signs of pore blockage.
  • FIG. 14D shows the insert size distributions of C, fC, and fC-HA.
  • FIG. 14E shows that the modification causes basecalling errors.
  • FIG. 14F shows that the modification causes basecalling errors and that the errors of fC-HA are larger than those of fC, which are larger than those of C.
  • FIG. 14G shows the raw signal analysis of TTACT kmer. Upon manually aligning electric signals, there are differences among C, fC and fC-HA in TTACT kmer.
  • modification of 5fC by hydroxylamine may be very successful, at upwards of about ~100% completion (no starting material in the product on BA gel).
  • the modification may be mild because it may not cause DNA damage.
  • the basecalling error of fC-HA may be larger than fC, which may be larger than C.
  • a larger hydroxylamine or a derivative thereof, hydrazine or a derivative thereof, 1,3-indandione or a derivative thereof, or any combination thereof may be employed.
  • the products may be run on a PCR machine.
  • the PCR conditions include holding the reaction at
  • the products may be run on a PCR machine.
  • the PCR conditions include holding the reaction at
  • FIG. 15A shows the result of the bioanalyzer after p-Xylylenediamine modification.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
  • the products may be run on a PCR machine.
  • the PCR conditions include: holding the reaction at
  • FIG. 15B shows the bioanalyzer results with 1/3 dilution.
  • FIG. 15C shows the raw signal analysis of ACTAT. Upon manually aligning electric signals, there are differences between caC and caC-XDA in ACTAT kmer.
  • Figure 20 indicates the base calling errors associated with 5fC, 5fC-HA, 5caC and 5caC-XDA.
  • a cell-free sample of a subject will be obtained.
  • the cell-free sample will comprise a nucleic acid sequence.
  • a hydroxylamine will be associated with a formylated base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the formylated base.
  • a buccal swab sample of a subject will be obtained.
  • the buccal swab sample will comprise a nucleic acid sequence.
  • a hydrazine will be associated with a formylated base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the formylated base.
  • a fecal sample of a subject will be obtained.
  • the fecal sample will comprise a nucleic acid sequence.
  • a hemiacetal will be associated with a formylated base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the formylated base.
  • a tissue sample of a subject will be obtained.
  • the tissue sample will comprise a nucleic acid sequence.
  • An aldehyde will be associated with a formylated base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the formylated base.
  • a cell-free sample of a subject will be obtained.
  • the cell-free sample will comprise a nucleic acid sequence.
  • An anisidine will be associated with a carboxyl acid containing base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
  • a buccal swab sample of a subject will be obtained.
  • the buccal swab sample will comprise a nucleic acid sequence.
  • An ester will be associated with a carboxyl acid containing base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
  • a fecal sample of a subject will be obtained.
  • the fecal sample will comprise a nucleic acid sequence.
  • a carbodiimide will be associated with a carboxyl acid containing base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
  • a tissue sample of a subject will be obtained.
  • the tissue sample will comprise a nucleic acid sequence.
  • An acyl halide will be associated with a carboxyl acid containing base of the nucleic acid sequence.
  • Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
  • the products may be run on a PCR machine.
  • the PCR conditions include holding the reaction at 20°C, 5 min; pause to add 0.9 µl 100 mM dATP (Jena Bioscience, Cat. NU-1001); 65°C, 5 min; 10°C, hold.
  • the products may be run on a PCR machine.
  • the PCR conditions include: 25°C for 10 minutes; and then holding the reaction at 10°C.
  • FIG. 17 shows the result of the bioanalyzer.
  • FIG. 18 indicates that the larger the modification of cytosine, the larger the error for the basecaller.
  • signals of TACAT and TA m CAT are almost identical.
  • signal of TA hm CAT starts showing a small difference, with interference with the signal of the preceding base (the neighboring effect).
  • for the motif TA glu hm CAT, after glucosylation by T4 beta-glucosyltransferase, the signal of glu hm C shifts to the following base and interferes with the preceding base as well.
  • a De Bruijn sequence may be an efficient way to collect k-mer information which encodes the library of k-mers in the minimal possible sequence.
  • FIG. 24 demonstrates exemplary steps to making of the k-mer sequence.
  • the 5-mer sequence may be designed by ordering two gBlocks from IDT (f1, f2) with flanking regions for PCR and Golden Gate cloning.
  • gBlocks (f1, f2) were (1) amplified, (2) cut with BsaI-HF, (3) ligated to form the full sequence (f1+f2), (4) cloned into pCR2.1 (TA cloning), (5) reamplified, and (6) Sanger sequenced.
  • a full length De Bruijn sequence for a 5-mer results.
  • a 6-mer can also be made using the same process outlined above if required.
  • Referring to FIG. 26, a summary of data collected using Nanopore and Illumina sequencing platforms is shown.
  • a complete nanopore dataset for k-mers (where K = 5) using the synthetic De Bruijn sequence with mC, hmC & ghmC as modifications was collected. It was investigated whether these data could be used as a training set to call modifications in a whole genome setting such as Fugu or human to an industry leading standard (>90% accuracy). Extracting raw aligned squiggles from reads arising from 100% labelling with modified nucleotides may be successful.
  • Soft-labelling, where modified nucleotides are spiked in at ~25% in the PCR, produced aligned reads where modifications visibly appeared to affect raw aligned squiggles.
  • M.SssI labelling of mC also produced reads that passed filter. Labelling was verified by bisulphite-sequencing.
  • Referring to FIG. 27A-C, three tables of read filtering and demultiplexing are shown.
  • Preliminary analysis of the first ~8000 reads of fully labelled C’s demonstrates that the number of reads passing filter for modifications may be very low (average quality score <Q7).
  • reads do pass filter where mC labelling is done by M.SssI (CEGX_Run669 in FIG. 27C), and when labelled using a soft-labelling PCR approach.
  • extracting raw aligned squiggles from reads arising from 100% labelling with modified nucleotides may be successful.
  • introducing modified nucleotides, by spiking in modified dCTPs at ~25% in the PCR or by enzymatically labelling CpGs using M.SssI, also results in reads generally passing filter.
  • In FIG. 28A, IGV screenshots of aligned reads from bisulphite sequencing of the De Bruijn sequence are shown.
  • FIG. 28B shows the extent of 5-mC labeling of various CpGs in M.SssI labelled (top) and unlabelled sequences (bottom). Labelling of sequences is verified using Illumina sequencing. Before analysis of Nanopore reads, labelling was verified using bisulphite sequencing. Bisulphite sequencing of the M.SssI labelled De Bruijn sequence (Illumina bisulphite NGS sequencing) was done to determine mC labelling by M.SssI. Bisulphite sequencing results for M.SssI labelling demonstrate ~55% labelling of C with mC.
  • modified C labelling was verified by Illumina NGS bisulphite for the soft labelling PCR approach sequences.
  • Bisulphite sequencing results for the soft-labelling approach demonstrate ~19-33% (mean) labelling of CpG with mC, hmC & ghmC. Any differences in Nanopore traces at C’s can be assigned to modifications.
  • In FIG. 30A-FIG. 30C and FIG. 31A-FIG. 31C, datasets for 5 different kmers are shown, and FIG. 32 shows a dataset for 50 positions in the De Bruijn sequence for unmodified and modified reads.
  • Nanopore reads may display a modified trace. The first few thousand reads passing filter were analyzed for modification-containing sequences. Tombo was used to re-squiggle and align reads to the De Bruijn sequence. The data demonstrate that differences in many of the traces can be seen visually - a result that is consistent with a low level (~10-50%) of labelling.
  • For M.SssI-induced labelling, slight differences between modified and unmodified are seen close to CpG sites.
  • For soft-labeling, slight differences between modified and unmodified are seen close to C sites.
  • Nanopore reads may display a modified trace. Doping in modifications at 25% may increase the pass rate of sequences. Whilst slight differences in the traces are observed, extracting the modified data for k-mers may be more difficult. Overcoming this issue, and assigning a modification value to each K-mer may lead to analyzing the data in a better way, such as for example similar to Laszlo, A. H. et al. PNAS (2013).
  • the Tombo code was modified (mainly in “plotSingleRun.R”) to output the aligned raw data in a format that can be manipulated in R for calculating differences in signal.
  • Illumina NGS bisulphite data shows that the sequence is methylated with a mean of ~50% at each CpG.
  • Example in FIG. 40A-B shows where these results are consistent between the two datasets. Data appears consistent between these independent experiments for the same sequence.
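
By way of illustration only, the idea of a De Bruijn sequence that encodes every k-mer in the minimal possible sequence can be sketched as follows. This is a minimal sketch using a standard Lyndon-word construction; it is not the gBlock design described above, and the function names `de_bruijn` and `linearize` are hypothetical. For the four-letter DNA alphabet and k = 5, the cyclic sequence has 4^5 = 1024 bases, and appending the first k - 1 bases yields a linear sequence of 1028 bases in which each 5-mer occurs once.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic De Bruijn sequence over `alphabet` for word length k.

    Hypothetical helper for illustration; it uses the standard construction
    that concatenates Lyndon words whose length divides k.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


def linearize(cyclic: str, k: int) -> str:
    """Append the first k - 1 bases so every k-mer appears in a linear molecule."""
    return cyclic + cyclic[:k - 1]


if __name__ == "__main__":
    linear = linearize(de_bruijn("ACGT", 5), 5)
    kmers = {linear[i:i + 5] for i in range(len(linear) - 4)}
    print(len(linear), len(kmers))
```

A 6-mer version follows from the same call with k = 6, at the cost of a longer sequence (4^6 = 4096 bases before linearization).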

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided herein are methods for detection of an epigenetic modification in a nucleic acid sequence. The methods as described herein may provide an improvement in the field of next-generation sequencing (e.g., Illumina sequencing). The methods disclosed herein can perform long read, real-time base calling, and detection of structural variation.

Description

DETERMINATION OF EPIGENETIC MODIFICATIONS BY NANOPORE SEQUENCING
CROSS-REFERENCE
[0001] This PCT application claims priority to U.S. provisional application 62/685,217 filed on June 14, 2018, which is entirely incorporated herein by reference.
BACKGROUND
[0002] It is important to develop new methods to determine methylation status and to monitor changes in methylation status.
SUMMARY
[0003] The systems and methods as described herein may provide an approach allowing real-time long-read sequencing of nucleic acid molecules. The methods disclosed herein are an improvement in the field of sequencing.
[0004] An aspect of the present disclosure provides a method. The method may comprise: (a) associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled hydroxymethylated base; (b) oxidizing the labeled hydroxymethylated base; and (c) identifying the hydroxymethylated base by sequencing the target nucleic acid sequence, wherein the sequencing comprises nanopore sequencing. In some embodiments, the hydroxymethylated base may comprise a pyrimidine. In some embodiments, the pyrimidine may be a cytosine. In some embodiments, the hydroxymethylated base may comprise a 5-hydroxymethylated base. In some embodiments, the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine. In some embodiments, the moiety may comprise a glucose moiety. In some embodiments, the method may comprise, before the identifying, oxidizing the moiety. In some embodiments, the oxidizing may be carried out by an oxidizing agent. In some embodiments, the oxidizing agent may comprise sodium periodate. In some embodiments, the target nucleic acid sequence may further comprise a formylated base. In some embodiments, the formylated base may comprise a 5-formylated base. In some embodiments, the 5-formylated base may comprise a 5-formylcytosine. In some embodiments, the method may further comprise associating the formylated base with a second moiety. In some embodiments, the second moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative of any of these, or any combination thereof. In some embodiments, the target nucleic acid sequence may further comprise a carboxylic acid containing base. In some embodiments, the carboxylic acid containing base may comprise a 5-carboxylated base.
In some embodiments, the 5-carboxylated base may comprise a 5-carboxycytosine. In some embodiments, the method may further comprise associating the carboxylic acid containing base with a third moiety. In some embodiments, the third moiety may comprise an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative of any of these, or any combination thereof. In some embodiments, the target nucleic acid sequence may further comprise a methylated base. In some embodiments, the methylated base may comprise a 5-methylated base. In some embodiments, the 5-methylated base may comprise a 5-methylcytosine. In some embodiments, the target nucleic acid sequence may comprise DNA or RNA. In some embodiments, the target nucleic acid sequence may further comprise a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof. In some embodiments, a size of a nanopore may be at most one nanometer. In some embodiments, at least one nanopore used in the nanopore sequencing may be a biological nanopore. In some embodiments, the moiety may be at least two moieties. In some embodiments, the identifying may comprise employing a trained algorithm. In some embodiments, the hydroxymethylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method. In some embodiments, a hydroxymethylated base, a methylated base, a carboxylic acid containing base, or a formylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the base using a different sequencing method. In some embodiments, the different sequencing method may be Illumina sequencing.
In some embodiments, the identifying may comprise identifying an unmodified base. In some embodiments, the unmodified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method. In some embodiments, the different sequencing method may be Illumina sequencing. In some embodiments, the unmodified base may be a cytosine, a thymine, a uracil, an adenine, or a guanine.
[0005] Another aspect of the present disclosure provides a method. The method may comprise: (a) associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a formylated base or a carboxylic acid containing base; and (b) identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing. In some embodiments, the epigenetically modified base may comprise a pyrimidine. In some embodiments, the pyrimidine may be a cytosine. In some embodiments, the epigenetically modified base may further comprise a hydroxymethylated base. In some embodiments, the hydroxymethylated base may comprise a 5-hydroxymethylated base. In some embodiments, the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine. In some embodiments, the moiety may comprise a glucose moiety. In some embodiments, the method may further comprise, before the identifying, oxidizing the moiety. In some embodiments, the oxidizing may be carried out by an oxidizing agent. In some embodiments, the oxidizing agent may comprise sodium periodate. In some embodiments, the epigenetically modified base may comprise a formylated base. In some embodiments, the formylated base may comprise a 5-formylated base. In some embodiments, the 5-formylated base may comprise a 5-formylcytosine. In some embodiments, the moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative of any of these, or any combination thereof. In some embodiments, the epigenetically modified base may comprise a carboxylic acid containing base.
In some embodiments, the carboxylic acid containing base may comprise a 5-carboxylated base. In some embodiments, the 5-carboxylated base may comprise a 5-carboxycytosine. In some embodiments, the moiety may comprise an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative of any of these, or any combination thereof. In some embodiments, the epigenetically modified base may further comprise a methylated base. In some embodiments, the methylated base may comprise a 5-methylated base. In some embodiments, the 5-methylated base may comprise a 5-methylcytosine. In some embodiments, the target nucleic acid sequence may comprise DNA or RNA. In some embodiments, the epigenetically modified base may further comprise a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof. In some embodiments, a size of a nanopore may be at most one nanometer. In some embodiments, at least one nanopore used in the nanopore sequencing may be a biological nanopore. In some embodiments, the moiety may be at least two moieties. In some embodiments, the identifying may comprise employing a trained algorithm. In some embodiments, the hydroxymethylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method. In some embodiments, an epigenetically modified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the epigenetically modified base using a different sequencing method.
In some embodiments, a hydroxymethylated base, a methylated base, a carboxylic acid containing base, or a formylated base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the base using a different sequencing method. In some embodiments, the different sequencing method may be Illumina sequencing. In some embodiments, the identifying may comprise identifying an unmodified base. In some embodiments, the unmodified base may be identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method. In some embodiments, the different sequencing method may be Illumina sequencing. In some embodiments, the unmodified base may be a cytosine, a thymine, a uracil, an adenine, or a guanine.
INCORPORATION BY REFERENCE
[0006] All publications, patents, and patent applications herein are incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The novel features herein are set forth with particularity in the appended claims. A better understanding of the features and advantages herein will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles herein are utilized, and the accompanying drawings (also “figure” and “FIG.” herein), of which:
[0008] FIG. 1 shows a computer control system that may be programmed or otherwise configured to implement methods provided herein.
[0009] FIG. 2A shows a labeled 5-hmC. FIG. 2B shows a fully perpendicular single stranded DNA with glucosylated 5-hmC. FIG. 2C shows an inner diameter of mutant CsgG pore. FIG. 2D shows an oxidized glucose labeled 5-hmC.
[0010] FIG. 3A shows a control DNA sequence amplified from a plasmodium DNA. FIG. 3B shows CpG distribution in a genome. FIG. 3C shows sample sizes of a control DNA sequence, a target DNA sequence with glucosylated 5-hmC, and a target DNA sequence wherein the glucosylated 5-hmC is oxidized. FIG. 3D shows sample sizes of a control DNA sequence, a target DNA sequence with glucosylated 5-hmC, and a target DNA sequence wherein the glucosylated 5-hmC is oxidized. FIG. 3E shows no pore blockage. FIG. 3F shows insertion size distributions of the control DNA sequence. FIG. 3G shows insertion size distributions of the target DNA sequence with glucosylated 5-hmC. FIG. 3H shows insertion size distributions of the target DNA sequence wherein the glucosylated 5-hmC is oxidized.
[0011] FIG. 4A shows an example where base modifications cause basecalling errors. FIG. 4B shows a zoom-in view of 4A. FIG. 4C shows an example of a raw signal alignment.
[0012] FIG. 5 shows a workflow to identify 5-hmC by nanopore sequencing.
[0013] FIG. 6A shows a labeled 5-fC with a hydroxylamine derivative. FIG. 6B shows a labeled 5-fC with a hydrazine derivative. FIG. 6C shows a labeled 5-fC with 1,3-indandione. FIG. 6D shows a labeled 5-caC with p-anisidine.
[0014] FIG. 7A shows sequences of a control DNA sequence amplified from plasmodium DNA. FIG. 7B shows an example of CpG distribution in a genome. FIG. 7C shows different input sample sizes of a control DNA sequence and target DNA sequence with a labeled 5-fC. FIG. 7D shows different library sample sizes of control DNA sequence and target DNA sequence with labeled 5-fC. FIG. 7E shows insertion size distributions of the control DNA sequence. FIG. 7F shows the insertion size distributions of the target DNA sequence with labeled 5-fC.
[0015] FIG. 8A shows an example of epigenetic modifications causing basecalling errors. FIG. 8B shows a zoom-in view of 8A. FIG. 8C shows a raw signal alignment.
[0016] FIG. 9 shows 5-caC labeled with ethyl-3-[3-(dimethylamino)propyl]-carbodiimide hydrochloride (EDC).
[0017] FIG. 10A shows sequences of control DNA sequence amplified from plasmodium DNA. FIG. 10B shows an example of CpG distribution in a genome. FIG. 10C shows different input sample sizes of control DNA sequence and target DNA sequence with 5-caC labeled by p-Xylylenediamine. FIG. 10D shows different library sample sizes of control DNA sequence and target DNA sequence with 5-caC labeled by p-Xylylenediamine. FIG. 10E shows insertion size distributions of the control DNA sequence. FIG. 10F shows insertion size distributions of the target DNA sequence with labeled 5-caC.
[0018] FIG. 11A shows an example of an epigenetic modification causing basecalling errors. FIG. 11B shows a zoom-in view of 11A. FIG. 11C shows an example of a raw signal alignment.
[0019] FIG. 12A shows a method for using bioinformatics software for epigenetic modification calling. FIG. 12B shows a method for using bioinformatics software for epigenetic modification calling.
[0020] FIG. 13A shows bioanalyzer results for 5-hmC modification. FIG. 13B shows the size distributions of 1 kb-hmC and glucosylated 1 kb-hmC. FIG. 13C shows an IGV view of reads mapped to reference.
[0021] FIG. 14A shows result of bioanalyzer for 5-fC modification.
[0022] FIG. 14B shows the bioanalyzer result with 1/3 dilution for 5-fC modification.
[0023] FIG. 14C shows there are no signs of pore blockage.
[0024] FIG. 14D shows the insert size distributions of C, 5-fC, and 5-fC-HA.
[0025] FIG. 14E shows that the modification causes basecalling errors.
[0026] FIG. 14F shows a zoom-in view of the modification causing basecalling errors.
[0027] FIG. 14G shows the raw signal analysis of TTACT kmer.
[0028] FIG. 15A shows the result of the bioanalyzer after p-Xylylenediamine modification.
[0029] FIG. 15B shows the bioanalyzer results with 1/3 dilution for 5-caC modification.
[0030] FIG. 15C shows the raw signal analysis of ACTAT.
[0031] FIG. 16 shows the chemical structures of 5-hmC, 5-mC, 5-fC, and 5-caC.
[0032] FIG. 17 shows sample sizes of a 2kb PCR product with dCTP, a 2kb PCR product with d5mCTP, a 2kb PCR product with dCTP, and a 2kb PCR product with d5mCTP.
[0033] FIG. 18 shows that modifications cause basecalling errors.
[0034] FIG. 19A shows overlaid raw signals of raw signal analysis (ONT Tombo). FIG. 19B shows densities of overlaid raw signals of raw signal analysis (ONT Tombo).
[0035] FIG. 20 shows that modifications cause basecalling errors.
[0036] FIG. 21A shows overlaid raw signals of raw signal analysis (ONT Tombo). FIG. 21B shows densities of overlaid raw signals of raw signal analysis (ONT Tombo).
[0037] FIG. 22A-FIG. 22E show use of a De Bruijn sequence to collect k-mer information.
[0038] FIG. 23A-FIG. 23B show use of a De Bruijn sequence to collect k-mer information.
[0039] FIG. 24 shows steps of one exemplary method of making a k-mer sequence including (1) gBlock PCR amplification, (2) digest BsaI, (3) ligation, (4) excised and cloned, (5) reamplified from positive clones, and (6) Sanger sequencing 2/13 clones without mutation.
[0040] FIG. 25 shows the effect of spiking in mC - data from Nanopore Tombo development.
[0041] FIG. 26 shows a summary of data collected using Nanopore and Illumina sequencing platforms.
[0042] FIG. 27A-FIG. 27C show read filtering and demultiplexing of different runs. FIG. 27A shows Run659. FIG. 27B shows Run669. FIG. 27C shows Run671.
[0043] FIG. 28A shows an IGV screenshot of aligned reads from bisulphite sequencing of the De Bruijn sequence. FIG. 28B shows the extent of 5-mC labeling of various CpGs in M.SssI labeled (top) and unlabeled sequences (bottom).
[0044] FIG. 29 shows percent modified C labelling of mC group, hmC group, and ghmC group.
[0045] FIG. 30A-FIG. 30C show datasets for 5 different kmers for both unmodified and M.SssI mC.
[0046] FIG. 31A-C shows datasets for 5 different kmers for both unmodified and M.SssI mC.
[0047] FIG. 32 shows a dataset for 50 positions in the De Bruijn sequence for unmodified and modified reads.
[0048] FIG. 33A shows differences in the ion current level sequences taken with DNA containing methylation (hydroxymethylation) and DNA without methylation. FIG. 33B shows raw Tombo trace (top) and extracted data processed in R (bottom).
[0049] FIG. 34A-FIG. 34C show an exemplary method for calling modifications.
[0050] FIG. 35A shows global positional differences in the signal for 5hmC and 5ghmC for the De Bruijn sequence with position 188 highlighted (blue). FIG. 35B shows overlap in the signal for C, 5-mC, 5-hmC, 5-ghmC at a single position (188). FIG. 35C shows raw aligned traces demonstrating signal differences for 5-hmC and 5-ghmC at a single position (188).
[0051] FIG. 36A shows global positional differences in the signal for 5-hmC and 5-ghmC for the De Bruijn sequence with position 188 highlighted (blue). FIG. 36B shows signal intensity differences in the signal for 5mC, 5hmC and 5ghmC for the De Bruijn sequence with position 188 highlighted.
[0052] FIG. 37 shows a full dataset of modifications.
[0053] FIG. 38 shows a full dataset for modifications.
[0054] FIG. 39 shows signal intensity differences between cytosine, mC, hmC, and ghmC.
[0055] FIG. 40A shows Nanopore trace differences for soft-labelling with PCR, mC (red). FIG. 40B shows a dataset for 5 different kmers for M.SssI treated data.
[0056] FIG. 41A shows signal differences between unmodified and modified C in the context of CpG containing kmers with an example of where ghmC enhances differentiation between hmC and mC. FIG. 41B shows signal differences between unmodified and modified C in the context of CpG containing kmers showing a region containing a CpG where it is difficult to distinguish between mC and hmC. FIG. 42 shows a heatmap from FIG. 41A, with the sequence of each kmer alongside its respective heatmap.
DETAILED DESCRIPTION
[0057] While various embodiments have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It should be understood that various alternatives to the embodiments herein may be employed.
Overview
[0058] A method as described herein may comprise associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a hydroxymethylated base, a formylated base, or a carboxylic acid containing base; and identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing, wherein the sequencing is performed without an enzyme associated with the target nucleic acid sequence. In some embodiments, nanopore sequencing can be used to differentiate a first epigenetically modified base from a second epigenetically modified base. A target sequence may include any sequence for which a method as described herein is used to identify a base. A target sequence may include any sequence in which a method as described herein identifies a modification. A target nucleic acid sequence may include any nucleic acid sequence in which a method as described herein identifies a modification.
[0059] The method described may be configured to determine whether a modified base, for example a modified cytosine, can pass through a nanopore. The method described may be configured to determine whether a moiety can be associated with a base, for example a modified base, and pass through a nanopore. In some embodiments, identification of a modified base as described herein can increase accuracy of detecting an unmodified base. The method as described may increase the signal-to-noise levels for determination. The method as described may increase the signal-to-noise levels for determining the presence or absence of an epigenetically modified base. Furthermore, bioinformatics software (e.g., Tombo) may be used for data analysis. The method described may include procedures such as fragmentation of native DNA (or of DNA after modification), end repair, adapter ligation, and sequencing on a MinION. The method described herein may be used to gather data about a modification or about the association of a moiety with a modification, wherein the data can be used to train a machine learning algorithm. The method described herein can detect or identify a modified base or an unmodified base using an algorithm.
Method - Determining 5-hmC
[0060] 5-hmC can be detected by nanopore sequencing with the method detailed herein. Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., at the >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods to determine multiple epigenetic modifications including, but not limited to, 5-formylcytosine (5-fC), 5-carboxycytosine (5-caC), 5-methylcytosine (5-mC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7-mG), 5-hydroxymethyluracil (5-hmU), and the abasic site; (e) being able to be incorporated with portable sequencers; or (f) any combination thereof.
[0061] In an illustrated example, a plurality of nucleic acid molecules may be first obtained. The plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid. The plurality of nucleic acid molecules may comprise one or more epigenetic modifications. The one or more epigenetic modifications may comprise 5-hmC. Next, a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base. The moiety may associate with (such as bind to) the epigenetically modified base with an aid, such as an enzyme. The moiety may associate with the epigenetically modified base by click chemistry. In the example of FIG. 2A, the moiety may be a glucose moiety. The glucose moiety may be a uridine diphosphate glucose (UDPG). The glucose moiety may be added to differentiate the electric signals of 5-hmC from those of 5-methylcytosine (5-mC) when performing nanopore sequencing. The electric signals of 5-hmC may be similar to those of 5-mC because the two bases differ by only one oxygen atom. The glucose moiety may be added to enhance the electric signals of 5-hmC. The glucose moiety can be added (or 5-hmC can be glucosylated) by T4 beta-glucosyltransferase.
[0062] In FIG. 2B, the size of a fully perpendicular single stranded DNA with glucosylated 5-hmC may be about 0.5 nm + 0.8 nm = 1.3 nm. The size of the fully perpendicular single stranded DNA with glucosylated 5-hmC may be much bigger than the 0.9 nm size of the current nanopore described in FIG. 2C. The fully perpendicular single stranded DNA with glucosylated 5-hmC may be tilted when passing the nanopore. FIG. 2C shows an inner diameter of mutant CsgG, the pore used by the current R9 flowcell of Oxford Nanopore Technologies (ONT). Mutant CsgG may be embedded in a flow cell for Nanopore sequencers. The inner diameter may be about 9 Angstroms. Single stranded DNA/RNA may pass through the pore to give different electric currents for base detection.
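As a rough illustration of the size comparison above, the following sketch adds the two stated dimensions and estimates how far a rigid object of that length would have to tilt for its projected width to fit the stated 0.9 nm pore. The rigid-rod geometry and the function name `min_tilt_degrees` are assumptions made only for illustration; the disclosure states only that the strand may be tilted.

```python
import math

def min_tilt_degrees(length_nm, pore_nm):
    """Smallest tilt (degrees) at which a rigid rod of length_nm has a
    projected width no larger than pore_nm. Simplified rigid-rod assumption."""
    if length_nm <= pore_nm:
        return 0.0  # already fits without tilting
    return math.degrees(math.acos(pore_nm / length_nm))

# Dimensions quoted in the text: 0.5 nm + 0.8 nm for the labeled strand,
# and about 0.9 nm (9 Angstroms) for the pore inner diameter.
labeled_width_nm = 0.5 + 0.8
pore_diameter_nm = 0.9
tilt = min_tilt_degrees(labeled_width_nm, pore_diameter_nm)
print(f"width {labeled_width_nm:.1f} nm, minimum tilt about {tilt:.0f} degrees")
```

Real nucleic acid is flexible, so this estimate should be read only as an order-of-magnitude picture of why tilting may be needed.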
[0063] In FIG. 2D, the glucose moiety on the 5-hmC can be broken down. The breaking down can be carried out by an oxidizing agent. The oxidizing agent may be sodium periodate. The oxidation product may be smaller and more flexible, which may allow it to pass through the nanopore.
[0064] In the illustrated example, a plurality of steps may be performed for nanopore sequencing. First, a Plasmodium DNA sequence may be prepared. The Plasmodium DNA sequence may have low GC content. Second, the Plasmodium DNA sequence may be amplified by PCR with 5-hmCTP/dGTP/dATP/dTTP to form a control DNA sequence. Third, the same Plasmodium DNA sequence may be prepared with modified 5-hmC. The 5-hmC may be modified by glucosylation to form a target DNA sequence with glucosylated 5-hmC. The glucosylated 5-hmC may be further modified by periodate oxidation to form a target DNA sequence with oxidized and glucosylated 5-hmC. Fourth, the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC may be sequenced.
[0065] FIG. 3A shows the sequences of the control DNA sequence amplified from the Plasmodium DNA. The control DNA may be 2152 bp. In FIG. 3A, the GC content of the control DNA sequence may be about 15.1%. The cytosine distribution of the control DNA sequence may be even, mimicking CpG distribution in the genome, as shown in FIG. 3B. FIG. 3C shows different input sample sizes of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC. FIG. 3D shows different library sample sizes of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC. The size of the target DNA sequence with oxidized and glucosylated 5-hmC may be larger than the size of the target DNA sequence with glucosylated 5-hmC, which may be larger than the size of the control DNA sequence.
[0066] FIG. 3E shows there is no pore blockage. FIGs. 3F-3H show the insertion size distributions of the control DNA sequence, the target DNA sequence with glucosylated 5-hmC, and the target DNA sequence with oxidized and glucosylated 5-hmC, respectively. There may be no differences between the control DNA sequence and the target DNA sequence with glucosylated 5-hmC. There may be some differences between the target DNA sequence with glucosylated 5-hmC and the target DNA sequence with oxidized and glucosylated 5-hmC because periodate oxidation may cause some fragmentation of DNA. In some embodiments, periodate oxidation may not be required.
[0067] FIG. 4A shows an example in which modifications cause basecalling errors. FIG. 4B shows a zoom-in view of the same example. The correct basecalls may be in gray, and the erroneous basecalls may be in darker shading. Modifications may confuse the basecalling software and cause errors in cytosine determination. The errors may appear at G because the C is in the complementary strand. The data gathered about the modification can be used to train machine learning algorithms.
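The observation that cytosine errors may show up at G can be illustrated with a short sketch. The sequences below are made up, and the helper names are hypothetical. A miscall at a C on the reverse strand is mapped back to forward-strand orientation, where the mismatch falls on a reference G.

```python
# Illustrative sketch only; sequences and helper names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def mismatch_positions(reference, read):
    """0-based positions where two equal-length sequences differ."""
    return [i for i, (r, q) in enumerate(zip(reference, read)) if r != q]

reference = "ATGCA"                              # forward-strand reference
reverse_strand = reverse_complement(reference)   # "TGCAT"

# Suppose the modified C at index 2 of the reverse strand is miscalled as T.
reverse_read = reverse_strand[:2] + "T" + reverse_strand[3:]

# Viewed in forward orientation, the error sits on a reference G.
read_as_forward = reverse_complement(reverse_read)
errors = mismatch_positions(reference, read_as_forward)
print(errors, [reference[i] for i in errors])
```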
[0068] FIG. 4C shows an example of a raw signal alignment. Upon manually aligning electric signals obtained through nanopore sequencing, there may be differences between the control DNA sequence and the target DNA sequence with glucosylated 5-hmC in ACT motifs.
[0069] FIG. 5 shows an example of a workflow to determine 5-hmC by nanopore sequencing. In this example of FIG. 5, a fragmented DNA may be prepared. The fragmented DNA may comprise one or more epigenetic modifications on one or both strands. Next, a glucose moiety may be associated with the epigenetic modification to form a target DNA fragment with glucosylated 5-hmC. The glucose moiety may be UDPG. Next, the target DNA fragment with glucosylated 5-hmC may be end-repaired and ligated with adaptors. Then, after the end repair and adaptor ligation, the target DNA fragment with glucosylated 5-hmC can go through nanopore sequencing.
[0070] In some embodiments, a 5-hmC disclosed herein can result from oxidation of a 5-mC via an oxidizing agent. In some cases, an oxidizing agent may comprise a perruthenate, a metal oxo complex, or a combination thereof. In some cases, an oxidizing agent may comprise a perruthenate and a metal oxo complex. In some cases, the metal oxo complex may be a metal VI oxide, a metal VII oxide, or a combination thereof. In some cases, an oxidizing agent may comprise hydrogen peroxide, fluorine, chlorine, nitric acid, sulfuric acid, peroxydisulfuric acid, peroxymonosulfuric acid, chlorite, chlorate, perchlorate, hypochlorite, permanganate, sodium perborate, nitrous oxide, potassium nitrate, sodium bismuthate, or any combination thereof. In some cases, an oxidizing agent can be an enzyme. An oxidizing agent may oxidize 5-mC to 5-hmC, 5-fC, 5-caC, or any combination thereof. An oxidizing agent may oxidize 5-hmC to 5-fC, 5-caC, or any combination thereof. An oxidizing agent may selectively oxidize 5-mC to 5-hmC. An oxidizing agent may selectively oxidize 5-mC to 5-fC. An oxidizing agent may selectively oxidize 5-mC to 5-caC. An oxidizing agent may selectively oxidize 5-hmC to 5-fC. An oxidizing agent may selectively oxidize 5-hmC to 5-caC. In some embodiments, an enzyme may comprise a ten-eleven translocation (TET) family enzyme. In some cases, an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof. In some embodiments, a 5-hmC disclosed herein can result from reduction of a 5-fC or 5-caC via a reducing agent. A reducing agent may comprise pic-borane. A reducing agent may comprise NaBH4, NaCNBH3, or LiBH4. A reducing agent may comprise lithium aluminum hydride, sodium amalgam, amalgam, diborane, sodium borohydride, sulfur dioxide, dithionate, thiosulfate, iodide, hydrogen peroxide, hydrazine, diisobutylaluminum hydride, oxalic acid, carbon monoxide, cyanide, ascorbic acid, formic acid, dithiothreitol, beta-mercaptoethanol, or any combination thereof. A reducing agent may reduce 5-caC to 5-fC, 5-hmC, 5-mC, or any combination thereof. A reducing agent may reduce 5-fC to 5-hmC, 5-mC, or any combination thereof. A reducing agent may selectively reduce 5-caC to 5-fC. A reducing agent may selectively reduce 5-caC to 5-hmC. A reducing agent may selectively reduce 5-caC to 5-mC. A reducing agent may selectively reduce 5-fC to 5-hmC. A reducing agent may selectively reduce 5-fC to 5-mC. A reducing agent may selectively reduce 5-hmC to 5-mC.
[0071] A reducing agent may reduce 5-caC to 5-fC such that substantially no other epigenetic modification is reduced. A reducing agent may reduce 5-caC to 5-hmC such that substantially no other epigenetic modification is reduced. A reducing agent may reduce 5-caC to 5-mC such that substantially no other epigenetic modification is reduced. A reducing agent may reduce 5-fC to 5-hmC such that substantially no other epigenetic modification is reduced. A reducing agent may reduce 5-fC to 5-mC such that substantially no other epigenetic modification is reduced. A reducing agent may reduce 5-hmC to 5-mC such that substantially no other epigenetic modification is reduced.
[0072] An epigenetic modification may be reduced in the presence of a reducing agent and a co-factor. An epigenetic modification may be oxidized in the presence of an oxidizing agent and a co-factor. The co-factor may comprise SALL4A, Fe2+, 2-oxoglutarate, ATP, or any combination thereof.
[0073] An epigenetically modified base may be deaminated by a cytidine deaminase, such as APOBEC1. The epigenetically modified base may be a 5-mC, 5-hmC, 5-fC, 5-caC, or any combination thereof. The deamination may occur before or after associating a moiety with the epigenetically modified base. The deamination may occur before or after reducing or oxidizing the epigenetically modified base.
[0074] In some cases, the method may comprise: associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base. In some cases, the identifying comprises nanopore sequencing.
[0075] In some cases, the method may comprise: oxidizing a hydroxymethylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base. In some cases, the identifying comprises nanopore sequencing.
[0076] In some cases, the method may comprise: associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base. In some cases, the identifying comprises nanopore sequencing.
[0077] In some cases, the method may comprise: reducing a hydroxymethylated base; associating a moiety with the reduced base; and identifying the labeled reduced base. In some cases, the identifying comprises nanopore sequencing.
[0078] In some cases, the method may comprise: associating a moiety with a methylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base. In some cases, the identifying comprises nanopore sequencing.
[0079] In some cases, the method may comprise: oxidizing a methylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base. In some cases, the identifying comprises nanopore sequencing.
[0080] In some cases, the method may comprise: associating a moiety with a methylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base. In some cases, the identifying comprises nanopore sequencing.
[0081] In some cases, the method may comprise: reducing a methylated base; associating a moiety with the reduced base; and identifying the labeled reduced base. In some cases, the identifying comprises nanopore sequencing.
[0082] In some cases, the method may comprise: associating a moiety with a formylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base. In some cases, the identifying comprises nanopore sequencing.
[0083] In some cases, the method may comprise: oxidizing a formylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base. In some cases, the identifying comprises nanopore sequencing.
[0084] In some cases, the method may comprise: associating a moiety with a formylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base. In some cases, the identifying comprises nanopore sequencing.
[0085] In some cases, the method may comprise: reducing a formylated base; associating a moiety with the reduced base; and identifying the labeled reduced base. In some cases, the identifying comprises nanopore sequencing.
[0086] In some cases, the method may comprise: associating a moiety with a carboxylated base of a target nucleic acid sequence to form a labeled base; oxidizing the labeled base; and identifying the oxidized labeled base. In some cases, the identifying comprises nanopore sequencing.
[0087] In some cases, the method may comprise: oxidizing a carboxylated base; associating a moiety with the oxidized base; and identifying the labeled oxidized base. In some cases, the identifying comprises nanopore sequencing.
[0088] In some cases, the method may comprise: associating a moiety with a carboxylated base of a target nucleic acid sequence to form a labeled base; reducing the labeled base; and identifying the reduced labeled base. In some cases, the identifying comprises nanopore sequencing.
[0089] In some cases, the method may comprise: reducing a carboxylated base; associating a moiety with the reduced base; and identifying the labeled reduced base. In some cases, the identifying comprises nanopore sequencing.
[0090] A first portion or aliquot of a nucleic acid sample may be subjected to a reducing agent or an oxidizing agent, wherein a second portion or aliquot of the nucleic acid sample may not be. A first portion or aliquot of a nucleic acid sample may be subjected to deamination, wherein a second portion or aliquot of the nucleic acid sample may not be. A first portion or aliquot of a nucleic acid sample may be subjected to association with a moiety, wherein a second portion or aliquot of the nucleic acid sample may not be.
Method - Determining 5-fC
[0091] 5-fC can be detected by nanopore sequencing with the method detailed herein. Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., at the >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods of determining multiple epigenetic modifications including, but not limited to, 5-hydroxymethylcytosine (5-hmC), 5-carboxycytosine (5-caC), 5-methylcytosine (5-mC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7-mG), 5-hydroxymethyluracil (5-hmU), and the abasic site; (e) being able to be incorporated with portable sequencers; or (f) any combination thereof.
[0092] In an illustrated example, a plurality of nucleic acid molecules may be first obtained. The plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid. The plurality of nucleic acid molecules may comprise one or more epigenetic modifications. The one or more epigenetic modifications may comprise 5-fC. Next, a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base. The moiety may associate with a plurality of epigenetically modified bases. The moiety may associate with (such as bind to) the epigenetically modified base with an aid, such as an enzyme. The moiety may associate with the epigenetically modified base by click chemistry. In the examples of FIGs. 6A-6D, the moiety may be hydroxylamine derivatives, hydrazine derivatives, 1,3-indandione, p-anisidine, or any combination thereof.
[0093] To perform the nanopore sequencing, a plurality of steps may be performed. First, a Plasmodium DNA sequence may be prepared. The Plasmodium DNA sequence may have low GC content. Second, the Plasmodium DNA sequence may be amplified by PCR with 5-fCTP/dGTP/dATP/dTTP to form a control DNA sequence. Third, the same Plasmodium DNA sequence may be prepared with modified 5-fC to form a target DNA sequence with modified 5-fC. Fourth, both the control DNA sequence and the target DNA sequence may be sequenced.
[0094] FIG. 7A shows the sequences of the control DNA sequence amplified from the Plasmodium DNA. The control DNA may be 1533 bp. In FIG. 7A, the GC content of the control DNA sequence may be about 14.2%. The cytosine distribution of the control DNA sequence may be even, mimicking CpG distribution in the genome, as shown in FIG. 7B. 5-hmC may be mainly in CpG islands of the human genome, which may be less than about 1% of the human genome. In a PCR product amplified with d5hmCTP, all cytosines, not only those in a CpG context, may become 5-hmC. For a GC balanced (40-60%) DNA fragment, the cytosine content may be from about 20% to about 30%. In contrast, Plasmodium may have an AT rich genome. Thus, employing a DNA fragment (such as from about 1.5 kb to about 2 kb) obtained or derived from a Plasmodium may be advantageous because of its low GC content (from about 14% to about 15%), wherein cytosine occurrence may be from about 7% to about 8%. In this case, it may mimic CpG in a human genome. FIG. 7C shows different input sample sizes of the control DNA sequence and the target DNA sequence with modified 5-fC. The target DNA sequence with modified 5-fC may be treated by hydroxylamine (HA). FIG. 7D shows different library sample sizes of the control DNA sequence and the target DNA sequence with modified 5-fC. The size of the target DNA sequence with modified 5-fC may be larger than the size of the control DNA sequence.
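The relationship between GC content and cytosine occurrence described above (cytosine being roughly half of GC content on one strand) can be checked with a small sketch. The function name `base_fractions` and the toy sequence are hypothetical; the toy sequence is not the sequence of FIG. 7A.

```python
def base_fractions(seq):
    """Return (GC fraction, cytosine fraction) for a single DNA strand."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    g_count = seq.count("G")
    c_count = seq.count("C")
    return (g_count + c_count) / len(seq), c_count / len(seq)

# Made-up AT-rich 20-mer used only to exercise the function.
toy_sequence = "ATATTACATTTAGAATTATA"
gc_fraction, c_fraction = base_fractions(toy_sequence)
print(f"GC content {gc_fraction:.0%}, cytosine occurrence {c_fraction:.0%}")
```

Applied to an amplicon of the kind described, a GC content of about 14% to 15% with G and C roughly balanced would correspond to the stated cytosine occurrence of about 7% to 8%.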
[0095] FIGs. 7E-7F show the insertion size distributions of the control DNA sequence and the target DNA sequence with modified 5-fC. There may be no differences between the control DNA sequence and the target DNA sequence with modified 5-fC because the reaction conditions of DNA modification may not cause DNA damage.
[0096] FIG. 8A shows an example wherein modifications may cause basecalling errors. FIG. 8B shows a zoom-in view of the same example. The correct basecalls may be in gray, and the erroneous basecalls may be represented by dark lines. Modifications may confuse the basecalling software and cause errors in cytosine determination. The errors may appear at G because the C is in the complementary strand. The data gathered about the modification can be used to train a machine learning algorithm. In FIG. 8B, the errors of the target DNA sequence with modified 5-fC may be larger than the errors of the control DNA sequence.
[0097] FIG. 8C shows an example of a raw signal alignment. Upon manually aligning electric signals obtained through nanopore sequencing, there may be differences between the control DNA sequence and the target DNA sequence with modified 5-fC in the TACTA kmer. In some embodiments, a 5-fC disclosed herein can result from oxidation of a 5-mC or 5-hmC via an oxidizing agent. In some cases, an oxidizing agent can be an enzyme. In some embodiments, an enzyme may comprise a ten-eleven translocation (TET) family enzyme. In some cases, an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof.
Method - Determining 5-caC
[0098] 5-caC can be detected by nanopore sequencing with the method detailed herein. Advantages of the method may comprise: (a) allowing real-time base calling of nucleic acid molecules; (b) allowing long-read sequencing (up to 2.3 Mb); (c) an improved accuracy of determining an epigenetically modified base (e.g., at the >99% consensus accuracy and up to 95% for 5-methylcytosine (5-mC)); (d) being able to be combined with other methods of determining multiple epigenetic modifications including, but not limited to, 5-hydroxymethylcytosine (5-hmC), 5-methylcytosine (5-mC), 5-formylcytosine (5-fC), 6-methyladenine (6-mA), 6-hydroxymethyladenine (6-hmA), 6-formyladenine (6-fA), 8-oxoadenine (8-oxoA), 8-oxoguanine (8-oxoG), 7-methylguanine (7-mG), 5-hydroxymethyluracil (5-hmU), and the abasic site; (e) being able to be incorporated with portable sequencers; or (f) any combination thereof.
[0099] In an illustrated example, a plurality of nucleic acid molecules may be first obtained. The plurality of nucleic acid molecules may comprise double-stranded nucleic acid or single-stranded nucleic acid. The plurality of nucleic acid molecules may comprise one or more epigenetic modifications. The one or more epigenetic modifications may comprise 5-caC. Next, a moiety may be associated with at least one of the epigenetically modified bases to form a labeled epigenetically modified base. The moiety may associate with (such as bind to) the epigenetically modified bases with an aid, such as an enzyme. The moiety may associate with the epigenetically modified bases by click chemistry. In the example of FIG. 9, the moiety may be 1-ethyl-3-[3-(dimethylamino)propyl]-carbodiimide hydrochloride (EDC).
[00100] To perform the nanopore sequencing, a plurality of steps may be performed. First, a Plasmodium DNA sequence may be prepared. The Plasmodium DNA sequence may have low GC content. Second, the Plasmodium DNA sequence may be amplified by PCR with 5-caCTP/dGTP/dATP/dTTP to form a control DNA sequence. Third, the same Plasmodium DNA sequence may be prepared with modified 5-caC to form a target DNA sequence with modified 5-caC. The 5-caC may be modified by p-Xylylenediamine. Fourth, both the control DNA sequence and the target DNA sequence with modified 5-caC may be sequenced.
[00101] FIG. 10A shows the sequences of the control DNA sequence amplified from the Plasmodium DNA. The control DNA may be 1533 bp. In FIG. 10A, the GC content of the control DNA sequence may be about 14.2%. The cytosine distribution of the control DNA sequence may be even, such as to mimic CpG distribution in the genome (as described above), as shown in FIG. 10B. FIG. 10C shows different input sample sizes of the control DNA sequence and the target DNA sequence treated by p-Xylylenediamine. FIG. 10D shows different library sample sizes of the control DNA sequence and the target DNA sequence treated by p-Xylylenediamine. The size of the target DNA sequence with modified 5-caC may be larger than the size of the control DNA sequence.
[00102] FIGs. 10E-10F show the insertion size distributions of the control DNA sequence and the target DNA sequence with modified 5-caC. There may be differences between the control DNA sequence and the target DNA sequence with modified 5-caC because reaction conditions of DNA modification may cause some damage.
[00103] FIG. 11A shows an example in which modifications cause basecalling errors. FIG. 11B shows a zoom-in view of the same example. The correct basecalls may be in gray, and the erroneous basecalls may be represented by dark markings. Modifications may confuse the basecalling software and cause errors in cytosine determination. The errors may appear at G because the C is in the complementary strand. The data gathered about the modification can be used to train a machine learning algorithm. In FIG. 11B, the errors of the target DNA sequence with modified 5-caC may be larger than the errors of the control DNA sequence.
[00104] FIG. 11C shows an example of a raw signal alignment. Upon manually aligning electric signals obtained through nanopore sequencing, there may be differences between the control DNA sequence and the target DNA sequence with modified 5-caC in ACTAT kmer.
Bioinformatics software for determination of epigenetic modifications
[00105] One or more bioinformatics software programs can be used to determine the epigenetic modifications. The one or more bioinformatics software programs may comprise Tombo and Nanopolish. The advantages of Tombo may be testing-based advantages. The testing-based advantages may comprise: (a) no requirement for training data; (b) identification of modified and unmodified nucleotides in close proximity; (c) detection of chemical modification; or (d) any combination thereof. The advantages of Nanopolish may be model-based advantages. The model-based advantages may comprise: (a) knowing exact chemical modifications; (b) knowing exact modified positions; (c) requiring native DNA after training; or (d) any combination thereof. The one or more bioinformatics software programs may comprise .NET Bio, AMPHORA, Anduril, AutoDock, Bioclipse, Bioconductor, BioJava, BioJS, BioMOBY, BioPerl, BioPHP, Biopython, BioRuby, EMBOSS, Galaxy, GenePattern, geWorkbench, GMOD, GenGIS, GenomeSpace, GENtle, Integrated Genome Browser, InterMine, LabKey Server, mothur, PathVisio, Orange, Staden Package, Taverna workbench, UGENE, and Unipept.
[00106] FIGs. 12A and 12B show examples of methods for using bioinformatics software for epigenetic modification calling. In FIGs. 12A-12B, 5-hmC calling is taken as an example. In FIG. 12A, two distinct libraries may be prepared. The first library may comprise a control DNA sequence (with 5-hmC). The second library may comprise a target DNA sequence with glucosylated 5-hmC. The target DNA sequence with glucosylated 5-hmC may be created by associating a glucose moiety with a 5-hmC base of the control DNA sequences. Both the first library and the second library may go through nanopore sequencing so the control DNA sequence can be compared with the target DNA sequence with glucosylated 5-hmC to generate data related to 5-hmC base calls.
[00107] Using the data generated through nanopore sequencing, a model can be built and trained (e.g., by aggregating data related to 5-hmC base calls) to learn how to interpret signals toward accurate base calling and/or determination of epigenetic modification. Developing a model may comprise analyzing the plurality of associated sequence signals and developing rules for predicting base calls and/or epigenetic modification, based on the comparison between the control DNA sequence and the target DNA sequence with glucosylated 5-hmC.
[00108] For example, the model may be built and trained (e.g., using machine learning techniques) based on analysis of different electric signals of the control DNA sequence and the target DNA sequence with glucosylated 5-hmC. Such a model may comprise expected sequence signals corresponding to a glucosylated 5-hmC. Alternatively, or in addition, models may comprise distributions, medians, averages, or other quantitative measures of sequence signals (e.g., signal amplitudes) corresponding to a glucosylated 5-hmC.
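A minimal sketch of such a model, assuming that the signal at one position is summarized by a mean and standard deviation for each library and that a new observation is scored by a Gaussian log-likelihood ratio. The Gaussian assumption, the function names, and the current values are all illustrative and are not taken from the disclosure or from any particular software package.

```python
import math
from statistics import mean, stdev

def fit_gaussian(samples):
    """Summarize signal levels at one position as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def log_pdf(x, mu, sigma):
    """Log density of a normal distribution with the given mean and sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood_ratio(x, labeled_model, control_model):
    """Positive values favor the labeled (glucosylated) model."""
    return log_pdf(x, *labeled_model) - log_pdf(x, *control_model)

# Made-up current levels (arbitrary units) at a single position.
control_levels = [80.1, 79.5, 80.9, 80.3, 79.8]
labeled_levels = [86.2, 85.4, 87.0, 86.5, 85.9]

control_model = fit_gaussian(control_levels)
labeled_model = fit_gaussian(labeled_levels)

score_high = log_likelihood_ratio(86.0, labeled_model, control_model)
score_low = log_likelihood_ratio(80.0, labeled_model, control_model)
print(score_high > 0, score_low < 0)
```

Medians or full distributions could be substituted for the mean and standard deviation, in line with the quantitative measures listed above.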
[00109] Methods of the present disclosure may comprise one or more algorithms to determine the epigenetic modification. The one or more algorithms may include machine learning algorithms. The machine learning algorithms may comprise supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, reinforcement learning algorithms, deep learning algorithms, or any combination thereof. The machine learning algorithms may also comprise Support Vector Machine (SVM), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Multilayer Perceptron (MLP).
[00110] The algorithms may incorporate training data of known sequences of the control DNA sequence and/or the target DNA sequence with glucosylated 5-hmC. The algorithms may comprise auxiliary outputs, which may include assessments of the quantization noise (e.g., Poisson or binomial random variation) or other quality assessments, including a confidence interval or error assessment of the epigenetic modification. The outputs may also include dynamic assessments of chemistry process parameters (e.g., temperature) and the most likely labeling fraction to account for the observations as well.
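By way of a non-limiting illustration, a confidence interval accounting for binomial random variation may be sketched as follows. The Wilson score interval is one common choice and is an assumption here, as are the function name and the read counts; the present disclosure does not specify a particular interval.

```python
from math import sqrt

def wilson_interval(modified_reads, total_reads, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = modified_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half = z * sqrt(p * (1 - p) / total_reads
                    + z * z / (4 * total_reads ** 2)) / denom
    return centre - half, centre + half

# Hypothetical site: 30 of 100 reads called as carrying the modification.
low, high = wilson_interval(30, 100)
```

The interval narrows as the number of reads covering a site increases, which may be reported alongside the modification call as a quality assessment.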
[00111] The trained model may then be applied by one or more trained algorithms (e.g., machine learning algorithms) to predict base calls and/or determination of epigenetic modification. Such predictions may comprise refining or correcting base calls and/or error base calls, which show the epigenetic modification. Alternatively, such predictions may comprise determining base calls and/or determination of epigenetic modification from a plurality of sequence signals. All of the operations described herein, such as training an algorithm, predicting and/or generating base calls and other operations, such as those described elsewhere herein, are capable of happening in real-time.
[00112] In FIG. 12B, one library may be prepared. At the beginning, the library may comprise control DNA sequences (with 5-hmC). Next, at least one of the control DNA sequences may be glucosylated to form an intermediate DNA sequence with glucosylated 5-hmC. Later, the intermediate DNA sequence with glucosylated 5-hmC may be ligated with a barcode and go through copy strand synthesis to form a first DNA sequence and a second DNA sequence. The first DNA sequence may comprise a forward strand from the intermediate DNA sequence with glucosylated 5-hmC and a complementary strand to the forward strand. The second DNA sequence may comprise a reverse strand from the intermediate DNA sequence with glucosylated 5-hmC and a complementary strand to the reverse strand. The forward strand may be complementary to the reverse strand. The library may comprise the first DNA sequence and the second DNA sequence. The library may then go through nanopore sequencing so that the first DNA sequence can be compared with the second DNA sequence to generate data related to 5-hmC base calls. In both FIG. 12A and FIG. 12B, the nanopore sequencing data may be analyzed using Tombo.
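By way of a non-limiting illustration, the relationship between a forward strand and its complementary strand may be sketched as follows. The function name and the example sequence are hypothetical, and the sketch handles only unmodified A, C, G, and T bases.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

forward = "ACGGT"                       # hypothetical forward strand
reverse = reverse_complement(forward)   # its complementary (reverse) strand
```

Applying the function twice returns the original sequence, reflecting that the forward strand may be complementary to the reverse strand.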
[00113] Using the data generated through nanopore sequencing, a model can be built and trained (e.g., by aggregating data related to 5-hmC base calls) to learn how to interpret signals for accurate base calling and/or determination of epigenetic modification. Developing a model may comprise analyzing the plurality of associated sequence signals and developing rules for predicting base calls and/or epigenetic modification, based on the comparison between the first DNA sequence and the second DNA sequence.
[00114] For example, the model may be built and trained (e.g., using machine learning techniques) based on analysis of electric signals of the first DNA sequence and the second DNA sequence. Such a model may comprise expected sequence signals corresponding to a glucosylated 5-hmC. Alternatively, or in addition, models may comprise distributions, medians, averages, or other quantitative measures of sequence signals (e.g., signal amplitudes) corresponding to a glucosylated 5-hmC.
[00115] Methods of the present disclosure may comprise algorithms to determine the epigenetic modification. The one or more algorithms may include machine learning algorithms. The machine learning algorithms may comprise supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, reinforcement learning algorithms, deep learning algorithms, or any combination thereof. The machine learning algorithms may also comprise Support Vector Machine (SVM), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Multilayer Perceptron (MLP).
[00116] The algorithms may incorporate training data of known sequences of the first DNA sequence and the second DNA sequence. The algorithms may comprise auxiliary outputs, which may include assessments of the quantization noise (e.g., Poisson or binomial random variation) or other quality assessments, including a confidence interval or error assessment of the epigenetic modification. The outputs may include dynamic assessments of chemistry process parameters (e.g., temperature) and the most likely labeling fraction to account for the observations as well.
[00117] The trained model may then be applied by one or more trained algorithms (e.g., machine learning algorithms) to predict base calls and/or determination of epigenetic modification. Such predictions may comprise refining or correcting base calls and/or determination of epigenetic modification. Alternatively, such predictions may comprise determining base calls and/or determination of epigenetic modification from a plurality of sequence signals. All of the operations described herein, such as training an algorithm, predicting and/or generating base calls and other operations, such as those described elsewhere herein, are capable of happening in real-time.
Definitions
[00118] As used herein, the singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein may be intended to encompass “and/or” unless otherwise stated.
[00119] As used herein, the term “about” may mean the referenced numeric indication plus or minus 15% of that referenced numeric indication.
[00120] The term “fragment,” as used herein, may be a portion of a sequence, a subset that may be shorter than a full length sequence. A fragment may be a portion of a gene. A fragment may be a portion of a peptide or protein. A fragment may be a portion of an amino acid sequence. A fragment may be a portion of an oligonucleotide sequence. A fragment may be less than about 20, 30, 40, or 50 amino acids in length. A fragment may be less than about 20, 30, 40, or 50 oligonucleotides in length.
[00121] The term “epigenetic modification” as used herein, may be any covalent modification of a nucleic acid base. In some cases, a covalent modification may comprise (i) adding a methyl group, a hydroxymethyl group, a carbon atom, an oxygen atom, or any combination thereof to one or more bases of a nucleic acid sequence, (ii) changing an oxidation state of a molecule associated with a nucleic acid sequence, such as an oxygen atom, or (iii) a combination thereof. A covalent modification may occur at any base, such as a cytosine, a thymine, a uracil, an adenine, a guanine, or any combination thereof. In some cases, an epigenetic modification may comprise an oxidation or a reduction. A nucleic acid sequence may comprise one or more epigenetically modified bases. An epigenetically modified base may comprise any base, such as a cytosine, a uracil, a thymine, an adenine, or a guanine. An epigenetically modified base may comprise a methylated base, a hydroxymethylated base, a formylated base, or a carboxylic acid containing base or a salt thereof. An epigenetically modified base may comprise a 5-methylated base, such as a 5-methylated cytosine (5-mC). An epigenetically modified base may comprise a 5-hydroxymethylated base, such as a 5-hydroxymethylated cytosine (5-hmC). An epigenetically modified base may comprise a 5-formylated base, such as a 5-formylated cytosine (5-fC). An epigenetically modified base may comprise a 5-carboxylated base or a salt thereof, such as a 5-carboxylated cytosine (5-caC). In some cases, an epigenetically modified base may comprise a methyltransferase-directed transfer of a group (such as an mTAG). FIG. 16 shows the chemical structures of 5-hmC, 5-mC, 5-fC, and 5-caC.
[00122] An epigenetically modified base may comprise one or more bases of a purine (such as Structure 1) or one or more bases of a pyrimidine (such as Structure 2). An epigenetic modification may occur at any one or more positions. For example, an epigenetic modification may occur at one or more positions of a purine, including positions 1, 2, 3, 4, 5, 6, 7, 8, or 9, as shown in Structure 1. In some cases, an epigenetic modification may occur at one or more positions of a pyrimidine, including positions 1, 2, 3, 4, 5, or 6, as shown in Structure 2.
[00123] [Figure: Structure 1]
[00124] [Figure: Structure 2]
[00125] A nucleic acid sequence may comprise an epigenetically modified base. A nucleic acid sequence may comprise a plurality of epigenetically modified bases. A nucleic acid sequence may comprise an epigenetically modified base positioned within a CG site, a CpG island, or a combination thereof. A nucleic acid sequence may comprise different epigenetically modified bases, such as a methylated base, a hydroxymethylated base, a formylated base, a carboxylic acid containing base or a salt thereof, a plurality of any of these, or any combination thereof.
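By way of a non-limiting illustration, candidate CG sites within a nucleic acid sequence may be located as sketched below. The function name and the example sequence are hypothetical; the sketch only finds CG dinucleotides and does not attempt to define or detect CpG islands.

```python
def cg_sites(seq):
    """Return 0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

# Hypothetical sequence containing three CG sites.
positions = cg_sites("ACGTCGCGA")
```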
[00126] The term “barcode” as used herein may relate to a natural or synthetic nucleic acid sequence comprised by a polynucleotide allowing for unambiguous identification of the polynucleotide and other sequences comprised by the polynucleotide having said barcode sequence. The number of different barcode sequences theoretically possible can be directly dependent on the length of the barcode sequence; e.g., if a DNA barcode with randomly assembled adenine, thymidine, guanosine and cytidine nucleotides can be used, the theoretical maximal number of barcode sequences possible can be 1,048,576 for a length of ten nucleotides, and can be 1,073,741,824 for a length of fifteen nucleotides. Unique sample identifiers or barcodes can be completely scrambled (e.g., randomers of A, C, G, and T for DNA or A, C, G, and U for RNA) or they can have some regions of shared sequence. For example, a shared region on each end may reduce sequence biases in ligation events. In some cases, a shared region can be about or at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 common base pairs. In some cases, a shared region can be up to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 common base pairs. Combinations of barcodes can be added to increase diversity.
[00127] A barcode may uniquely identify a subject, a sample (such as a cell-free sample), a nucleic acid sequence (such as a sequence having one or more epigenetically modified bases), or any combination thereof. A barcode may be associated with a nucleic acid sequence or a complementary strand. A nucleic acid sequence may comprise a single barcode. A nucleic acid sequence may comprise one or more barcodes, such as a first barcode and a second barcode. In some cases, the first barcode is different from the second barcode. In some cases, each barcode of a plurality of barcodes may be a unique barcode. In some cases, a barcode may comprise a sample identification barcode. For example, a first barcode may comprise a unique barcode and a second barcode may comprise a sample identification barcode.
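By way of a non-limiting illustration, the theoretical maximal numbers of barcode sequences given in the definition of a barcode follow from raising the alphabet size to the barcode length. The function name below is hypothetical.

```python
def barcode_space(length, alphabet_size=4):
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length

ten_mer_count = barcode_space(10)      # 4**10
fifteen_mer_count = barcode_space(15)  # 4**15
```

For a four-letter alphabet, this gives 1,048,576 sequences for a length of ten nucleotides and 1,073,741,824 for a length of fifteen nucleotides. Regions of shared sequence reduce the effective length and therefore the number of distinct barcodes.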
[00128] The term “adapter” as used herein may be a nucleic acid with known or unknown sequence. An adapter may be attached to the 3’ end, 5’ end, or both ends of a nucleic acid (e.g., target nucleic acid). An adapter may comprise known sequences and/or unknown sequences. An adapter may be double-stranded or single-stranded. In some cases, an adapter can comprise a barcode (e.g., unique identifier sequence). In some cases, an adapter can be an amplification adapter. An amplification adapter may attach to a target nucleic acid and help the amplification of the target nucleic acid. For example, an amplification adapter may comprise one or more of: a primer binding site, a unique identifier sequence, a non-unique identifier sequence, and a sequence for immobilizing the target nucleic acid on a substrate. A target nucleic acid attached with an amplification adapter may be immobilized on a substrate. An amplification primer may hybridize to the adapter and be extended using the target nucleic acid as a template in an amplification reaction. In some cases, the unique identifiers in an adapter can be used to label the amplicons. In some cases, an adapter can be a sequencing adapter. A sequencing adapter may attach to a target nucleic acid and help the sequencing of the target nucleic acid. For example, a sequencing adapter may comprise one or more of: a sequencing primer binding site, a unique identifier sequence, a non-unique identifier sequence, and a sequence for immobilizing target nucleic acid on a substrate. A target nucleic acid attached with a sequencing adapter may be immobilized on a substrate on a sequencer. A sequencing primer may hybridize to the adapter and be extended using the target nucleic acid as a template in a sequencing reaction. In some cases, the unique identifiers in an adapter can be used to label the sequence reads of different target sequences, thus allowing high-throughput sequencing of a plurality of target nucleic acids. In some examples, an adapter sequence (such as a double-stranded or single-stranded oligonucleotide) may be ligated to one or both ends of a nucleic acid sequence. A nucleic acid sequence may comprise one or more epigenetically modified bases. A nucleic acid sequence may be from a sample, such as a cell free DNA sample. A nucleic acid sequence may be from a sample obtained from a subject. A nucleic acid sequence may comprise a double-stranded portion, a single-stranded portion, or a combination thereof. In some cases, an adapter may recognize or may be complementary to a primer, such as a universal primer. In some cases, an adapter may be specific to a sequencing method. In some cases, an adapter may be associated with a nucleic acid sequence or a complementary strand.
[00129] The term “nucleic acid sequence” as used herein may comprise DNA or RNA. In some cases, a nucleic acid sequence may comprise a plurality of nucleotides. In some cases, a nucleic acid sequence may comprise an artificial nucleic acid analogue. In some cases, a nucleic acid sequence comprising DNA may comprise cell-free DNA, cDNA, fetal DNA, or maternal DNA. In some cases, a nucleic acid sequence may comprise miRNA, shRNA, or siRNA.
[00130] The term “moiety” as used herein, may be a component that may be (a) associated with a substrate, (b) associated with an epigenetically modified base, or (c) a combination thereof. A moiety may be associated with an epigenetically modified base by a single bond, a double bond, a triple bond, a metal-associated bond, or an ion pairing. A moiety may comprise a magnetic metal, such as iron, nickel, cobalt, aluminum, or any combination thereof. A moiety may be associated with an epigenetically modified base by the assistance of an enzyme. A moiety may be associated with a substrate via (a) a biotin-streptavidin association, (b) a magnetic association, (c) an antibody-antigen association, or (d) any combination thereof. A moiety may be selective for a portion of a nucleic acid sequence. A moiety may selectively associate with a double-stranded portion of a nucleic acid sequence as compared to a single-stranded portion. A moiety may selectively associate with portions of a nucleic acid sequence having an epigenetically modified base as compared to portions having a non-modified base. A moiety may selectively associate with a type of epigenetically modified base, such as selectively associating with a 5-hydroxymethylated cytosine (5-hmC) as compared to a 5-methylated cytosine (5-mC). A moiety may comprise a sugar, such as a glucose. A glucose may comprise a modified glucose. A moiety may comprise more than one sugar, such as two sugars or more. A moiety may comprise a modified sugar, such as a modified glucose. A moiety may comprise a uridine diphosphate glucose (UDPG). A moiety may comprise a detectable moiety such as a radioactive moiety, a fluorescent moiety, a chemiluminescent moiety, a phosphorescent moiety, an infrared moiety, a visible moiety, a chemically reactive moiety (such as an azide-based moiety), or any combination thereof. In some cases, a moiety may be a moiety which results from incorporating a chromophore via a reaction with a radioactive moiety. A moiety may comprise a protein, peptide, or polypeptide. In some cases, a moiety may comprise an antibody or portion thereof. A moiety may comprise a tag, such as a FLAG-tag. A moiety may comprise a biotin or an avidin, such as streptavidin. A moiety may comprise a nucleic acid sequence. A moiety may comprise a substrate. In some cases, a different moiety may be employed to uniquely label different epigenetic modifications. For example, a first moiety may bind a methylated base and a second moiety may bind a hydroxymethylated base.
[00131] A moiety may comprise a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malonitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, a derivative of any of these, or any combination thereof. A moiety may be associated with an epigenetically modified base, such as a formylated base, a carboxylic acid containing base, a hydroxymethylated base, a methylated base, or any combination thereof.
[00132] A moiety may be associated with a formylated base, such as for example a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malonitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, a derivative of any of these, or any combination thereof.
[00133] A moiety may be associated with a carboxylic acid containing base, such as for example a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malonitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, a derivative of any of these, or any combination thereof.
[00134] A moiety may be associated with a hydroxymethylated base, such as for example a hydroxylamine, a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malonitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, a derivative of any of these, or any combination thereof.
[00135] Two or more, three or more, four or more moieties may be associated with an epigenetically modified base. For example, two or more, three or more, four or more of: a hydrazine, a 1,3-indandione, a hemi-acetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malonitrile, a benzoin, an aldol, an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, a derivative of any of these, or any combination thereof may be associated with an epigenetically modified base.
[00136] An epigenetically modified base may be identified in a nucleic acid. In some cases, a labeled epigenetically modified base may comprise a sugar moiety, such as a glucose. When a sugar moiety may be present, the identifying of the epigenetically modified base may not comprise identifying a presence or an absence of the sugar moiety. In some cases, when a sugar moiety is present, the identifying may not be based on a presence or an absence of the sugar moiety.
[00137] In some cases, the identifying of the epigenetically modified base may comprise identifying a presence or an absence of a moiety associated with an epigenetically modified base. In some cases, the identifying of the epigenetically modified base may comprise identifying a presence or an absence of a labeled epigenetically modified base.
[00138] In some cases, associating a moiety with an epigenetically modified base may permit identification of the epigenetically modified base by sequencing, such as by nanopore sequencing. In some cases, identifying an epigenetically modified base may comprise sequencing a nucleic acid sequence, such as by nanopore sequencing the nucleic acid sequence. In such cases, association of a moiety with an epigenetically modified base may modify the epigenetically modified base such that the identification of the epigenetically modified base is by the sequencing.
[00139] In some cases, a tag may comprise a glutathione-S-transferase (GST), a maltose binding protein (MBP), a green fluorescent protein (GFP), an AviTag, a Calmodulin tag, a polyglutamate tag, a FLAG tag, a human influenza hemagglutinin (HA) tag, a polyhistidine (His) tag, a Myc-tag, an S-tag, a streptavidin-binding peptide (SBP) tag, a Softag 1, a Strep tag, a TC tag, a V5 tag, an Xpress tag, an Isopeptag, a SpyTag, a biotin carboxyl carrier protein (BCCP) tag, a chitin binding protein (CBP) tag, a HaloTag, a thioredoxin tag, a T7 tag, a protein kinase A (PKA) tag, a c-Myc tag, a Trx tag, a Hsv tag, a CBD tag, a Dsb tag, a pelB/ompT, a KSI, a VSV-G tag, a β-Gal tag, or any combination thereof. A tag may be a fusion tag, a covalent peptide tag, a protein tag, a peptide tag, an affinity tag, an epitope tag, a solubilization tag, or any combination thereof. A tag may comprise a recombinant protein. A tag may associate with a protein or protein fragment. A FLAG-tag may comprise a sequence or a portion thereof comprising DYKDDDDK, where D may be aspartic acid, Y may be tyrosine, and K may be lysine.
[00140] A moiety may be associated reversibly with a substrate. A moiety may be associated irreversibly with a substrate. A moiety may be reversibly associated with an epigenetically modified base. A moiety may be irreversibly associated with an epigenetically modified base. A moiety may be associated by binding to a substrate, an epigenetically modified base, or a combination thereof. A moiety may be bound by a single bond, a double bond, or a triple bond to a substrate. A moiety may be bound by a single bond, a double bond, or a triple bond to an epigenetically modified base.
[00141] The moiety may be a component that may aid in or catalyze a reaction. In some cases, a moiety may comprise an enzyme or a catalytically active fragment thereof. In some cases, a moiety may comprise an antibody or fragment thereof. In some cases, a moiety may comprise a protein, a peptide, or polypeptide. In some cases, a moiety may comprise a cofactor such as a coenzyme. In some cases, a moiety may comprise an enzyme, a protein or portion thereof, an antibody or portion thereof, a cofactor or any combination thereof. In some cases, a moiety, such as an enzyme, may aid in an association of a label with an epigenetically modified base. A moiety, such as an enzyme, may selectively associate a label with an epigenetically modified base present on a double-stranded oligonucleotide fragment as compared with an epigenetically modified base present on a single-stranded oligonucleotide fragment. A moiety, such as an enzyme, may selectively associate a label with an epigenetically modified base present on a single-stranded oligonucleotide fragment as compared with an epigenetically modified base present on a double-stranded oligonucleotide fragment. An enzyme may comprise a transferase. An enzyme may comprise a glucosyltransferase. An enzyme may comprise (a) an alpha-glucosyltransferase, (b) a beta-glucosyltransferase, (c) a beta-glucosyl-alpha-glucosyl-transferase, (d) J-glucosyltransferase, or (e) any combination thereof. A moiety, such as an enzyme, may comprise a modified moiety such as a genetically mutated moiety. A modified moiety may be modified to enhance an association of a label with an epigenetically modified base. A modified moiety may be modified to selectively aid in a) an association of a specific label with an epigenetically modified base, b) an association of a label with a specific epigenetically modified base, or c) a combination thereof.
[00142] In some cases, a moiety may catalyze a transfer of a methyl group to one or more bases of a nucleic acid sequence, a complementary strand, or a combination thereof. In some cases, a moiety may comprise a methyltransferase. In some cases, an enzyme may comprise a DNA methyltransferase 1 (DNMT1), a DNA methyltransferase 3-like (DNMT3L), a DNMT3A, a DNMT3B, a tRNA aspartic acid methyltransferase (TRDMT1), a DNMT3, any catalytically active fragment thereof, or any combination thereof.
[00143] In some cases, a moiety may catalyze a change in an epigenetic modification, such as a conversion of a methylated base to a hydroxymethylated base. In some cases, an enzyme may comprise a dioxygenase. In some cases, an enzyme may comprise a ten-eleven translocation (TET) family enzyme. In some cases, an enzyme may comprise TET1, TET2, TET3, CXXC finger protein 4 (CXXC4), any catalytically active fragment thereof, or any combination thereof.
[00144] In some cases, a moiety may catalyze an oxidative reaction, such as an oxidative decarboxylation. In some cases, an enzyme may comprise an isocitrate dehydrogenase (IDH) family enzyme. In some cases, an enzyme may comprise isocitrate dehydrogenase [NAD] subunit alpha (IDH3A), isocitrate dehydrogenase [NAD] subunit beta (IDH3B), isocitrate dehydrogenase [NAD] subunit gamma (IDH3G), isocitrate dehydrogenase 1 (IDH1), isocitrate dehydrogenase 2 (IDH2), any catalytically active fragment thereof, or any combination thereof.
[00145] A base of a nucleic acid sequence or a complementary strand may be deaminated, spontaneously or by contacting a moiety to a portion of a nucleic acid sequence. For example, a base may be deaminated. In some cases, a base, a methylated base, a hydroxymethylated base, a formylated base, a carboxylated base, or any combination thereof may be deaminated. In some cases, a methylated cytosine may be deaminated. Deamination may occur selectively to a single base or to any combination of bases. Deamination may occur spontaneously. Deamination may occur by contacting a moiety to a portion of a nucleic acid sequence. A moiety may include an enzyme such as a deaminase, such as an adenosine deaminase, a guanine deaminase, or a cytidine deaminase. A deaminase may comprise activation-induced cytidine deaminase (AID), a conserved cytidine deaminase (CDA), apolipoprotein B mRNA editing enzyme catalytic polypeptide 1 (APOBEC1), apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3H (APOBEC3A-H), apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3G (APOBEC3G), or others. Bisulfite sequencing may deaminate one or more bases of a nucleic acid sequence or a complementary strand.
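By way of a non-limiting illustration, the effect of bisulfite-driven deamination on a read may be sketched in silico as follows. The sketch assumes the commonly described behavior in which unmodified cytosines are read as thymine after conversion while protected cytosines (for example, 5-mC) are read as cytosine; the function name, the parameter name, and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, protected_positions=()):
    """Read every C as T unless its 0-based position is marked as protected."""
    protected = set(protected_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )

# Hypothetical read: the C at position 1 is protected, the C at position 4 is not.
converted = bisulfite_convert("ACGTCG", protected_positions=[1])
```

Comparing the converted read with the reference may then indicate which cytosines were protected from deamination.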
[00146] The term “click-chemistry” as used herein may comprise a reaction having at least one of the following: (a) high yielding, (b) wide in scope, (c) create byproducts that may be removed in the absence of chromatography, (d) stereospecific, (e) simple to perform, (f) conducted in easily removable or benign solvents. In some cases, click-chemistry comprises tagging, such as tagging a nucleic acid sequence or a complementary strand. In some cases, click-chemistry may associate a nucleic acid sequence with a moiety. Click-chemistry may comprise a reaction having a [3+2] cycloaddition; a thiol-ene reaction; a Diels-Alder reaction, an inverse electron demand Diels-Alder reaction; a [4+1] cycloaddition; a nucleophilic substitution; a carbonyl-chemistry-like formation of urea; an addition to a carbon-carbon double bond; or any combination thereof. In some cases, a [3+2] cycloaddition may comprise a Huisgen 1,3-dipolar cycloaddition. In some cases, a [4+1] cycloaddition may comprise a cycloaddition between an isonitrile and a tetrazine. Click-chemistry may comprise a copper(I)-catalyzed azide-alkyne cycloaddition (CuAAC); a strain-promoted azide-alkyne cycloaddition (SPAAC); a strain-promoted alkyne-nitrone cycloaddition (SPANC); or any combination thereof.
[00147] The term “sequencing” as used herein, may comprise bisulfite-free sequencing, bisulfite sequencing, TET-assisted bisulfite (TAB) sequencing, ACE-sequencing, high-throughput sequencing, Maxam-Gilbert sequencing, massively parallel signature sequencing, Polony sequencing, 454 pyrosequencing, Sanger sequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, DNA nanoball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, nanopore DNA sequencing, shotgun sequencing, RNA sequencing, Enigma sequencing, or any combination thereof.
[00148] In some cases, a method may comprise sequencing. The sequencing may include bisulfite sequencing or bisulfite-free sequencing. In some cases, a method may comprise oxidizing one or more bases of a nucleic acid sequence or complementary strand or combination thereof. In some cases, a method may comprise selectively enriching for a nucleic acid sequence that contains at least one epigenetic modification.
[00149] The term “primer extension reaction,” or PCR, as used herein, generally refers to the binding of a primer to a strand of the template nucleic acid, followed by elongation of the primer(s). It may also include denaturing of a double-stranded nucleic acid and the binding of a primer strand to either one or both of the denatured template nucleic acid strands, followed by elongation of the primer(s). Primer extension reactions may be used to incorporate nucleotides or nucleotide analogs to a primer in template-directed fashion by using enzymes (polymerizing enzymes).
[00150] The term “substrate” as used herein, may be a surface with which an entity (such as a moiety, a functional group, an epigenetic modification, a label or functional moiety associated with an epigenetic modification, a label or functional moiety associated with a parent strand) can be associated. In some cases, an entity may be immobilized to the substrate (such as a support). In some cases, an entity may be reversibly or irreversibly bound to the substrate (such as a support). In some cases, an entity may comprise a moiety. In such cases, a moiety may also associate with a nucleic acid sequence. In some cases, an entity may comprise a moiety, a nucleic acid sequence, a sugar, an enzyme, or any combination thereof. A substrate may comprise a bead. A substrate may comprise a plurality of beads. A substrate may comprise an array of beads. A substrate may comprise an array, such as an array of wells or an array of beads. A substrate (such as a solid support) may comprise a column, such as a packed column, a size-exclusion column, a magnetic column, or any combination thereof. A substrate may comprise a membrane. A substrate may comprise a bead, a capillary, a plate, a membrane, a wafer, a well, a plurality of any of these, an array of any of these, or any combination thereof. A substrate (such as a support) may positively select a nucleic acid sequence of interest by associating the nucleic acid sequence of interest with the substrate. A substrate may negatively select for a nucleic acid sequence of interest by associating other nucleic acid sequences of a sample with the substrate.
[00151] A bead may comprise one or more beads. A bead may comprise an array of beads. A bead may be associated with a substrate. A bead may be associated with a moiety. A bead may associate a moiety with a substrate. A bead may be associated with a substrate, a moiety, a nucleic acid sequence or any combination thereof. A bead may comprise a polymer, a metal, or a combination thereof. A bead may comprise a hydrogel, a silica gel, a glass, a resin, a metal, a metal alloy, a plastic, a cellulose, an agarose, a magnetic material, or any combination thereof.
[00152] A support may be organic or inorganic; may be metal (e.g., copper or silver) or non-metal; may be a polymer or nonpolymer; may be conducting, semiconducting or nonconducting (insulating); may be reflecting or nonreflecting; may be porous or nonporous; etc. A substrate as described above can be formed of any suitable material, including metals, metal oxides, semiconductors, polymers (particularly organic polymers in any suitable form including woven, nonwoven, molded, extruded, cast, etc.), silicon, silicon oxide, and composites thereof.
[00153] The term “tissue” as used herein, may be any tissue sample. A tissue may be a tissue suspected or confirmed of having a disease or condition. A tissue may be a sample that may be substantially healthy, substantially benign, or otherwise substantially free of a disease or a condition. A tissue may be a tissue removed from a subject, such as a tissue biopsy, a tissue resection, an aspirate (such as a fine needle aspirate), a tissue washing, a cytology specimen, a bodily fluid, or any combination thereof. A tissue may comprise cancerous cells, tumor cells, non-cancerous cells, or a combination thereof. A tissue may comprise brain tissue, cerebral spinal tissue, cerebral spinal fluid, breast tissue, bladder tissue, kidney tissue, liver tissue, colon tissue, thyroid tissue, cervical tissue, prostate tissue, lung tissue, heart tissue, muscle tissue, pancreas tissue, anal tissue, bile duct tissue, a bone tissue, uterine tissue, ovarian tissue, endometrial tissue, vaginal tissue, vulvar tissue, stomach tissue, ocular tissue, nasal tissue, sinus tissue, penile tissue, salivary gland tissue, gut tissue, gallbladder tissue, gastrointestinal tissue, bladder tissue, brain tissue, spinal tissue, a blood sample, or any combination thereof. A tissue may be a sample that may be genetically modified.
[00154] The term “subject,” as used herein, may be any animal or living organism. Animals can be mammals, such as humans, non-human primates, rodents such as mice and rats, dogs, cats, pigs, sheep, rabbits, and others. Animals can be fish, reptiles, or others. Animals can be neonatal, infant, adolescent, or adult animals. Humans can be more than about: 1, 2, 5, 10, 20, 30, 40, 50, 60, 65, 70, 75, or about 80 years of age. The subject may have or be suspected of having a condition or a disease, such as cancer. The subject may be a patient, such as a patient being treated for a condition or a disease, such as a cancer patient. The subject may be predisposed to a risk of developing a condition or a disease such as cancer. The subject may be in remission from a condition or a disease, such as a cancer patient. The subject may be healthy.
[00155] A nucleic acid sequence may comprise a cytosine guanine (CG) site, a cytosine phosphate guanine (CpG) island, a portion of any of these, or a combination thereof. A CpG island may comprise one or more CG sites. A nucleic acid sequence may comprise one or more CG sites or portions thereof. A nucleic acid sequence may comprise dense CG sites, dense CpG islands or a combination thereof. A nucleic acid sequence may comprise a plurality of CG sites or portions thereof. A nucleic acid sequence may comprise one or more CpG islands or portions thereof. A nucleic acid sequence may comprise a plurality of CpG islands or portions thereof. One or more bases of a nucleic acid sequence comprising a CG site, a CpG island, a portion thereof, or any of these may comprise an epigenetically modified base, such as a methylated base or a hydroxymethylated base. One or more cytosines of a nucleic acid sequence comprising a CG site, a CpG island, a portion thereof, or any of these may comprise an epigenetically modified cytosine, such as a methylated cytosine or a hydroxymethylated cytosine. A CpG island (or a CG island) may be a region with a high frequency of CG sites. A CpG island may be a region of a nucleic acid sequence with at least about 200 basepairs (bp) and a GC percentage that may be greater than about 50% and with an observed-to-expected CpG ratio that may be greater than about 60%.
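By way of non-limiting illustration, the three criteria recited above (length, GC percentage, and observed-to-expected CpG ratio) may be evaluated computationally. The following is a minimal sketch only. The function names and default thresholds are hypothetical, and the observed-to-expected ratio is computed with the commonly used formula (CpG count × length) / (C count × G count), which is an assumption because the present disclosure does not specify a formula.

```python
def cpg_island_stats(seq: str) -> dict:
    """Return length, GC fraction, and observed-to-expected CpG ratio of a sequence."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")  # "CG" cannot overlap itself, so this counts every CG site
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Assumed formula: observed/expected CpG = (CpG count * length) / (C count * G count)
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def looks_like_cpg_island(seq: str,
                          min_length: int = 200,      # "at least about 200 bp"
                          min_gc: float = 0.50,       # "greater than about 50%"
                          min_obs_exp: float = 0.60   # "greater than about 60%"
                          ) -> bool:
    """Apply the three illustrative thresholds to a candidate region."""
    stats = cpg_island_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_fraction"] > min_gc
            and stats["obs_exp_cpg"] > min_obs_exp)
```

The thresholds are exposed as parameters so that any of the alternative lengths or GC percentages described herein may be substituted.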
[00156] In some cases, a CpG island may be a region of a nucleic acid sequence with at least about: 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 300, 350, 400, 450, 500, 550, or 600 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 20 to about 600 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 20 to about 500 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 10 to about 500 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 10 to about 300 bp. In some cases, a CpG island may be a region of a nucleic acid sequence with from about 20 to about 200 bp.
[00157] In some cases, a GC percentage in a CpG island may be greater than about: 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or greater. In some cases, a GC percentage in a CpG island may be from about 50% to about 95%. In some cases, a GC percentage in a CpG island may be from about 50% to about 99%. In some cases, a GC percentage in a CpG island may be from about 55% to about 85%. In some cases, a GC percentage in a CpG island may be from about 60% to about 99%. In some cases, a GC percentage in a CpG island may be from about 70% to about 99%.
[00158] As used herein, the term “cell-free” refers to the condition of the nucleic acid sequence as it appeared in the body before the sample is obtained from the body. For example, circulating cell-free nucleic acid sequences in a sample may have originated as cell-free nucleic acid sequences circulating in the bloodstream of the human body. In contrast, nucleic acid sequences that are extracted from a solid tissue, such as a biopsy, are generally not considered to be “cell-free.” In some cases, cell-free DNA may comprise fetal DNA, maternal DNA, or a combination thereof. In some cases, cell-free DNA may comprise DNA fragments released into a blood plasma. In some cases, the cell-free DNA may comprise circulating tumor DNA. In some cases, cell-free DNA may comprise circulating DNA indicative of a tissue origin, a disease or a condition. A cell-free nucleic acid sequence may be isolated from a blood sample. A cell-free nucleic acid sequence may be isolated from a plasma sample. A cell-free nucleic acid sequence may comprise a complementary DNA (cDNA). In some cases, one or more cDNAs may form a cDNA library.
[00159] In some cases, a nucleic acid sequence may be double-stranded, such as a cDNA library comprising the nucleic acid sequence. In some cases, a nucleic acid sequence may be double-stranded such as when a substantially complementary strand may be hybridized to at least a portion of the nucleic acid sequence. In some cases, a portion of a nucleic acid sequence may be double-stranded, such as when a primer may be hybridized to a portion of the nucleic acid sequence.
[00160] A nucleic acid sequence may be from a sample. A sample may be isolated from a subject. A subject may be a human subject. A sample may comprise a buccal sample, a saliva sample, a blood sample, a plasma sample, a reproductive sample (such as an egg or a sperm), a mucus sample, a cerebral spinal fluid sample, a tissue sample, a tissue biopsy, a surgical resection, a fine needle aspirate sample, or any combination thereof. In some cases, a sample may comprise a blood sample. In some cases, a sample may comprise a buccal sample.
[00161] In some cases, a subject may have previously received a diagnosis of a disease or condition prior to performing a method as described herein. A subject may have previously received a positive diagnosis of a disease, such as a cancer. A subject may have previously received an indeterminate or inconclusive diagnosis of a disease, such as a cancer. A subject may be a subject in need thereof, such as a need for a definitive diagnosis or a need for a selection of a therapeutic treatment regime.
[00162] In some cases, a subject may not have previously received a diagnosis of a disease or condition prior to performing a method as described herein. In some cases, a subject may be suspected of having a disease or condition, such as having one or more symptoms of a disease or condition. In some cases, a subject may be at risk of developing a disease or condition, such as a subject having a biomarker or genetic indication that may be indicative of a risk of developing a disease or condition. In some cases, a disease or a condition may comprise a cancer.
[00163] In some cases, a method as described herein may comprise obtaining a result. A method may comprise obtaining a result and reporting the result. A result may be reported to a user, a medical professional, a subject, or any combination thereof. A result may be reported via a communication medium. A communication medium may include a written report or a printed report. A communication medium may include a visual display such as a graphical user interface. A communication medium may comprise a result provided by a computer, a tablet device, a cellphone, or other electronic device. A result may comprise a diagnosis of a disease or condition or a confirmation of an absence of a disease or condition. A result may comprise a diagnosis of a subject as having a disease or condition. A result may comprise a confirmation of an absence of the disease or condition. A result may comprise a likelihood or a risk of a subject to develop a disease or a condition. In some cases, a disease or a condition may comprise a cancer. A result may comprise predicting mortality of a subject, determining a biological age of a subject, or a combination thereof. A mortality prediction or biological age determination may be based on a presence of an epigenetic modification, sequencing information or any combination thereof. A result, such as a prediction of a likelihood of a disease or condition or a diagnosis of a disease or condition may be based on a presence of an epigenetic modification, sequencing information or a combination thereof. A presence of an epigenetic modification may include a pattern of epigenetic modification, a presence of a specific epigenetic modification, a level of an epigenetic modification, or any combination thereof.
[00164] A method as described herein may comprise comparing a result to a reference. A reference may comprise a plurality of references. A reference may comprise a database comprising a plurality of results. A reference may comprise a control sample. A reference may comprise a positive control sample, a negative control sample, or a combination thereof. A reference, such as a reference sample, may be obtained from a subject or from a different source, such as a different subject. A diagnosis may comprise comparing a result to a reference. In some cases, a result comprising a diagnosis may at least partially confirm a previous diagnosis.
Diagnostics
[00165] One or more results obtained from a method described herein may provide a quantitative value or values indicative of one or more of the following: a likelihood of diagnostic accuracy, a likelihood of a presence of a condition in a subject, a likelihood of a subject developing a condition, a likelihood of success of a particular treatment, or any combination thereof. A method as described herein may predict a risk or likelihood of developing a condition. A method as described herein may be an early diagnostic indicator of developing a condition. A method as described herein may confirm a diagnosis or a presence of a condition. A method as described herein may monitor the progression of a condition. A method as described herein may monitor the efficacy of a treatment for a condition in a subject.
[00166] Samples obtained for analysis using the methods described herein may be obtained from a subject. The subject may not have any symptoms of a condition. The subject may have one or more symptoms of a condition. The subject may be at risk, such as a genetic risk, of developing a condition. The subject may have previously received a positive diagnosis. The subject may have previously received an indeterminate result from a diagnostic test. The subject may be currently receiving a treatment.
[00167] Methods for diagnosing and/or suggesting, selecting, designating, recommending or otherwise determining a course of treatment for a subject having or suspected of having a condition can be employed in combination with the methods as described herein. These techniques may include cytological analysis or histological classification, molecular profiling, a blood test, a genetic analysis, ultrasound analysis, MRI results, CT scan results, other imaging scans, measurements of hormone, cytokine, or blood cell levels, or any combination thereof. The methods described herein may include at least one other type of diagnostic method. The methods described herein may include at least two other diagnostic methods.
[00168] In some embodiments, the methods of the present invention provide for storing the sample for a time such as seconds, minutes, hours, days, weeks, months, years or longer after the sample is obtained and before the sample is analyzed by one or more methods of the invention. In some cases, the sample obtained from a subject is subdivided prior to the step of storage or further analysis such that different portions of the sample are subject to different downstream methods or processes including but not limited to any combination of methods described herein, storage, bisulfite treatment, amplification, sequencing, labeling, cytological analysis, adequacy tests, nucleic acid extraction, molecular profiling or a combination thereof.
[00169] In some cases, a portion of the sample may be stored while another portion of said sample is further manipulated. Such manipulations may include but are not limited to any method as described herein; bisulfite treatment; sequencing; amplification; labeling; selective enrichment; molecular profiling; cytological staining; nucleic acid (RNA or DNA) extraction, detection, or quantification; gene expression product (RNA or Protein) extraction, detection, or quantification; fixation; and examination. The sample may be fixed prior to or during storage by any method known to the art such as using glutaraldehyde, formaldehyde, or methanol. In other cases, the sample is obtained and stored and subdivided after the step of storage for further analysis such that different portions of the sample are subject to different downstream methods.
Treatment
[00170] A method as described herein may comprise treating a subject. In some cases, a treatment may comprise surgery, chemotherapy, radiation therapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplantation, precision medicine, or any combination thereof. In some cases, a treatment may comprise further monitoring of a condition of a subject. In some cases, a subject diagnosed with a disease or condition may receive a treatment to treat a disease or a condition. In some cases, a subject receiving a confirmation of a likelihood or a risk of developing a disease or a condition may receive a treatment, such as a preventive treatment. A treatment for a subject may be selected based on a result of a method, such as a confirmed positive diagnosis of a disease or a condition. A result may comprise one or more treatments, such as a recommended treatment, for a subject based on a result. A treatment may comprise a single treatment. A treatment may comprise a recurring treatment. A treatment may comprise a recurring treatment over a remaining lifespan of a subject. A treatment may comprise a daily treatment. A treatment may comprise a biweekly treatment. A treatment may be selected based on a result.
[00171] In some embodiments, a treatment for a subject can be a surgery (such as a tissue resection), a nutrition regime, a physical activity, a radiation treatment, a chemotherapy, an immunotherapy, a pharmaceutical composition, a cell transplantation, a blood transfusion, or any combination thereof.
[00172] The methods described herein, such as assaying and comparing, may be conducted prior to an operation on a diseased tissue of the subject, such as a tumor resection. The methods described herein may be conducted prior to the subject having a positive disease diagnosis, such as a cancer or a tumor diagnosis. The methods described herein may be conducted on a subject suspected of having a condition or a disease, such as a cancer or a tumor. The methods described herein may be conducted on a subject that has received a positive disease diagnosis, such as a positive cancer or a positive tumor diagnosis. The methods described herein may be conducted on a subject having received a prior treatment regime, wherein the prior treatment regime was ineffective in eliminating the disease or condition, such as a cancer or tumor. A tissue sample may be obtained from a subject prior to performing the methods described herein. A tissue sample may be obtained during a biopsy, fine needle aspiration, blood sample, surgical resection, or any combination thereof.
[00173] Assaying a tissue sample of a subject may be performed at one or more time points. A separate tissue sample may be obtained from the subject for assaying at each of the one or more time points. Assaying at one or more time points may be performed on the same tissue sample. Assaying at one or more time points may provide an assessment of an effectiveness of a drug, a longitudinal course of a disease treatment regime, or a combination thereof. At each of the one or more time points, a tissue sample may be compared to a same reference. A tissue sample may be compared to a different reference at each of the one or more time points. The one or more time points may be the same. The one or more time points may be different. The one or more time points may comprise at least one time point prior to a drug administration, at least one time point after a drug administration, at least one time point prior to a positive disease diagnosis, at least one time point after a disease remission diagnosis, at least one time point during a disease treatment regime, or a combination thereof.
[00174] The methods as described herein may be used for diagnosis of a particular condition and also to monitor efficacy of a particular treatment after an initial diagnosis or monitor progression of a particular condition. The methods as described herein may be used to monitor a subject at risk of developing a particular condition, as a preventive measure. The methods as described herein may be used alone for diagnosis and/or monitoring efficacy of a particular treatment. The methods as described herein may be used in combination with other assays for diagnosis or monitoring (such as a cytological analysis or molecular profiling).
[00175] A subject may be monitored using methods as disclosed herein. For example, a subject may be diagnosed with a condition, such as a cancer or a genetic disorder. This initial diagnosis may or may not involve the use of the methods described herein. The subject may be prescribed a treatment such as surgical resection of a tumor or chemotherapy. The results of the treatment may be monitored on an ongoing basis by the methods described herein to detect the efficacy of the treatment. In another example, a subject may be diagnosed with a benign tumor or a precancerous lesion or nodule, and the tumor, nodule, or lesion may be monitored on an ongoing basis by the methods described herein to detect any changes in the state of the tumor or lesion.
[00176] The methods described herein may also be used to ascertain the potential efficacy of a specific treatment prior to administering to a subject. For example, a subject may be diagnosed with cancer. The methods described herein may indicate a presence of one or more epigenetic residues on a particular nucleic acid sequence known to be involved in cancer malignancy. A further sample may be obtained from the subject and cultured in vitro using methods known to the art. The application of various inhibitors or drugs may then be tested for growth inhibition. The methods described herein may also be used to monitor the effect of these inhibitors on, for example, downstream targets of the implicated pathway.
[00177] In some embodiments, the methods described herein may be used as a research tool to identify new markers for diagnosis of conditions (such as suspected tumors); to monitor the effect of drugs or candidate drugs on samples such as tumor cells, cell lines, tissues, or organisms; or to uncover new pathways for disease prevention or inhibition (such as oncogenesis and/or tumor suppression).
Ranges and Numbers
[00178] In some cases, a nucleic acid sequence may comprise one or more epigenetically modified bases, such as (a) one or more epigenetically modified cytosines, (b) one or more epigenetically modified uracils, (c) one or more epigenetically modified thymines, (d) one or more epigenetically modified guanines, (e) one or more epigenetically modified adenines, or (f) any combination thereof.
[00179] A nucleic acid sequence may comprise one or more epigenetically modified bases. For example, a nucleic acid sequence may comprise at least about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more epigenetically modified bases per about 20 basepairs of the nucleic acid sequence. A nucleic acid sequence may comprise about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 epigenetically modified bases per about 20 basepairs of the nucleic acid sequence.
[00180] A nucleic acid sequence may comprise one or more epigenetically modified bases. For example, about: 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, at least about: 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 10% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 6% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 20% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 4% to about 30% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 3% to about 30% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 30% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 40% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 50% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases. In some cases, from about 60% to about 90% of total bases of a nucleic acid sequence may comprise epigenetically modified bases.
[00181] A nucleic acid sequence (in some cases comprising a plurality of epigenetically modified residues) may be enriched. Enrichment of the nucleic acid sequence may comprise amplification such as amplification by polymerase chain reaction (PCR), loop mediated isothermal amplification, nucleic acid sequence based amplification, strand displacement amplification, multiple displacement amplification, rolling circle amplification, ligase chain reaction, helicase dependent amplification, ramification amplification method, or any combination thereof.
[00182] In some cases, amplification may comprise at least 2 cycles of amplification. Amplification may comprise at least 3 cycles of amplification. Amplification may comprise at least 4 cycles of amplification. Amplification may comprise at least 5 cycles of amplification. Amplification may comprise at least 6 cycles of amplification. Amplification may comprise at least 7 cycles of amplification. Amplification may comprise at least 8 cycles of amplification. Amplification may comprise at least 9 cycles of amplification. Amplification may comprise at least 10 cycles of amplification. Amplification may comprise at least 11 cycles of amplification. Amplification may comprise at least 12 cycles of amplification. Amplification may comprise at least 13 cycles of amplification. Amplification may comprise at least 14 cycles of amplification. Amplification may comprise at least 15 cycles of amplification. Amplification may comprise at least 20 cycles of amplification. Amplification may comprise at least 25 cycles of amplification. Amplification may comprise at least 30 cycles of amplification.
[00183] In some cases, amplification of a given number of cycles produces a plurality of sequence reads that retain a percentage of original sequence length. In some cases, about 90% of the plurality of sequence reads retain at least about 90% of the sequence length. In some cases, about 80% of the plurality of sequence reads retain at least about 90% of the sequence length. In some cases, about 75% of the plurality of sequence reads retain at least about 90% of the sequence length. In some cases, about 95% of the plurality of sequence reads retain at least about 90% of the sequence length. In some cases, about 85% of the plurality of sequence reads retain at least about 90% of the sequence length.
[00184] In some cases, about 90% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 80% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 75% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 95% of the plurality of sequence reads retain at least about 85% of the sequence length. In some cases, about 85% of the plurality of sequence reads retain at least about 85% of the sequence length.
[00185] In some cases, about 90% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 80% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 75% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 95% of the plurality of sequence reads retain at least about 80% of the sequence length. In some cases, about 85% of the plurality of sequence reads retain at least about 80% of the sequence length.
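By way of non-limiting illustration, the proportion of sequence reads that retain at least a given percentage of the original sequence length may be computed as sketched below. The function name and its parameters are hypothetical, and the sketch assumes that the length of each read and the original sequence length are known in the same units (for example, bases).

```python
def fraction_of_reads_retaining(read_lengths, original_length, min_retained=0.90):
    """Return the fraction of reads whose length is at least
    `min_retained` (e.g. 0.90 for 90%) of the original sequence length."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    # Count reads that meet or exceed the retention threshold.
    kept = sum(1 for length in read_lengths
               if length / original_length >= min_retained)
    return kept / len(read_lengths)
```

For example, a returned value of 0.8 with `min_retained=0.85` would correspond to about 80% of the plurality of sequence reads retaining at least about 85% of the sequence length.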
[00186] In some cases, at least about: 1%, 2%, 3%, 4%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 1% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 2% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 3% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 4% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 5% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, at least about 10% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 10% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 10% to about 90% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 5% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 4% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base. In some cases, from about 3% to about 100% of the bases of a nucleic acid sequence may comprise an epigenetically modified base.
[00187] In some cases, a nucleic acid sequence comprises at least about: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 1 epigenetically modified base per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 2 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 3 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 4 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 5 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises from about 1 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 3 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 4 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 5 to about 10 epigenetically modified bases per at least about 20 bases of the nucleic acid sequence.
[00188] In some cases, a nucleic acid sequence comprises at least from about 1 to about 3 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 4 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 5 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 8 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 10 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 15 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence. In some cases, a nucleic acid sequence comprises at least from about 1 to about 20 epigenetically modified bases per at least about 20 bases of a nucleic acid sequence.
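By way of non-limiting illustration, both measures of modification density described above (the percentage of total bases that are epigenetically modified, and the number of epigenetically modified bases per about 20 bases) may be computed from a list of modified positions. The following is a minimal sketch. The function names are hypothetical, positions are assumed to be zero-based, and the use of consecutive non-overlapping 20-base windows is an assumption, because the present disclosure does not specify how windows are placed.

```python
def percent_modified(sequence_length, modified_positions):
    """Percentage of total bases that carry an epigenetic modification."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # Use a set so that a position reported twice is counted once.
    return 100.0 * len(set(modified_positions)) / sequence_length


def modified_per_window(sequence_length, modified_positions, window=20):
    """Count modified bases in consecutive, non-overlapping windows.

    The final window may be shorter than `window` when the sequence
    length is not a multiple of the window size.
    """
    n_windows = (sequence_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in set(modified_positions):
        if not 0 <= pos < sequence_length:
            raise ValueError(f"position {pos} lies outside the sequence")
        counts[pos // window] += 1
    return counts
```

For a 50-base sequence with modified bases at positions 0, 5, 19, 20, and 45, the sketch reports 10% of total bases as modified and window counts of 3, 1, and 1.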
Samples
[00189] A sample obtained from a subject can comprise tissue, cells, cell fragments, cell organelles, nucleic acids, genes, gene fragments, expression products, gene expression products, gene expression product fragments or any combination thereof. A sample can be heterogeneous or homogenous. A sample can comprise blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool, lymph fluid, tissue, mucus, or any combination thereof. A sample can be a tissue-specific sample such as a sample obtained from a reproductive tissue (such as a sperm or an egg), thyroid, skin, heart, lung, kidney, breast, pancreas, liver, muscle, smooth muscle, bladder, gall bladder, colon, intestine, brain, esophagus, prostate, or any combination thereof.
[00190] A sample of the present disclosure can be obtained by various methods, such as, for example, fine needle aspiration (FNA), core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, core biopsy, punch biopsy, shave biopsy, skin biopsy, or any combination thereof.
[00191] A sample may be obtained from a subject by another individual or entity, such as a healthcare (or medical) professional or robot. A medical professional can include a physician, nurse, medical technician or other. In some cases, a physician may be a specialist, such as an oncologist, surgeon, or endocrinologist. A medical technician may be a specialist, such as a cytologist, phlebotomist, radiologist, pulmonologist or others. A medical professional may obtain a sample from a subject for testing or refer the subject to a testing center or laboratory for the submission of the sample. The medical professional may indicate to the testing center or laboratory the appropriate test or assay to perform on the sample, such as methods of the present disclosure including determining gene sequence data, gene expression levels, sequence variant data, or any combination thereof.
[00192] In some cases, a medical professional need not be involved in the initial diagnosis of a condition or a disease or the initial sample acquisition. An individual, such as the subject, may alternatively obtain a sample through the use of an over-the-counter kit. The kit may contain a collection unit or device for obtaining the sample as described herein, a storage unit for storing the sample ahead of sample analysis, and instructions for use of the kit.
[00193] Epigenetic modifications may be monitored over time. Monitoring epigenetic modification over time may include monitoring changes in a presence of an epigenetic modification, a level of an epigenetic modification, or a pattern of an epigenetic modification. Monitoring may include monitoring an efficacy of a therapeutic, monitoring a progression of a disease, monitoring a regression of a disease, monitoring a risk or likelihood of developing a disease, monitoring a mortality prediction or biological age, or any combination thereof. A sample can be obtained a) pre-operatively, b) post-operatively, c) after a disease diagnosis, d) during routine screening following remission or cure of a disease, e) when a subject may be suspected of having a disease, f) during a routine office visit or clinical screen, g) following the request of a medical professional, or any combination thereof. Multiple samples at separate times can be obtained from the same subject, such as before treatment for a disease commences and after treatment ends, such as monitoring a subject over a time course. Multiple samples can be obtained from a subject at separate times to monitor the absence or presence of disease progression, regression, or remission in the subject.
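As an illustration only, monitoring a level of an epigenetic modification across samples taken at separate times could be sketched as follows. The `Sample` record, the `modification_level` field (assumed here to be a fraction between 0 and 1 at some locus of interest), and the choice of the earliest sample as the baseline are all hypothetical assumptions made for this example and are not specified by the disclosure.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Sample:
    collected: date             # date the sample was obtained
    modification_level: float   # assumed fraction (0..1) of modified bases at a locus


def level_changes(samples: List[Sample]) -> List[float]:
    """Change in modification level of each later sample relative to the earliest one."""
    ordered = sorted(samples, key=lambda s: s.collected)
    if not ordered:
        return []
    baseline = ordered[0].modification_level
    # Round to suppress floating-point noise in the reported differences.
    return [round(s.modification_level - baseline, 6) for s in ordered[1:]]
```

A positive value indicates a higher modification level than in the earliest sample, and a negative value a lower one. How such a change would be interpreted clinically is outside the scope of this sketch.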
Conditions or Diseases
[00194] A condition or a disease, as disclosed herein, can include a cancer, a neurological disorder, or an autoimmune disease.
[00195] In some cases, a disease or condition may comprise a neurological disorder. In some cases, a neurological disorder may comprise Acquired Epileptiform Aphasia, Acute Disseminated Encephalomyelitis, Adrenoleukodystrophy, Agenesis of the corpus callosum, Agnosia, Aicardi syndrome, Alexander disease, Alpers' disease, Alternating hemiplegia, Alzheimer's disease, Amyotrophic lateral sclerosis (see Motor Neuron Disease), Anencephaly, Angelman syndrome, Angiomatosis, Anoxia, Aphasia, Apraxia, Arachnoid cysts, Arachnoiditis, Arnold-Chiari malformation, Arteriovenous malformation, Asperger's syndrome, Ataxia Telangiectasia, Attention Deficit Hyperactivity Disorder, Autism, Auditory processing disorder, Autonomic Dysfunction, Back Pain, Batten disease, Behcet's disease, Bell's palsy, Benign Essential Blepharospasm, Benign Focal Amyotrophy, Benign Intracranial Hypertension, Bilateral frontoparietal polymicrogyria, Binswanger's disease, Blepharospasm, Bloch-Sulzberger syndrome, Brachial plexus injury, Brain abscess, Brain damage, Brain injury, Brain tumor, Brown-Sequard syndrome, Canavan disease, Carpal tunnel syndrome (CTS), Causalgia, Central pain syndrome, Central pontine myelinolysis, Centronuclear myopathy, Cephalic disorder, Cerebral aneurysm, Cerebral arteriosclerosis, Cerebral atrophy, Cerebral gigantism, Cerebral palsy, Charcot-Marie-Tooth disease, Chiari malformation, Chorea, Chronic inflammatory demyelinating polyneuropathy (CIDP), Chronic pain, Chronic regional pain syndrome, Coffin Lowry syndrome, Coma, including Persistent Vegetative State, Congenital facial diplegia, Corticobasal degeneration, Cranial arteritis, Craniosynostosis, Creutzfeldt-Jakob disease, Cumulative trauma disorders, Cushing's syndrome, Cytomegalic inclusion body disease (CIBD), Cytomegalovirus Infection, Dandy-Walker syndrome, Dawson disease, De Morsier's syndrome, Dejerine-Klumpke palsy, Dejerine-Sottas disease, Delayed sleep phase syndrome, Dementia, Dermatomyositis, Neurological Dyspraxia, Diabetic neuropathy, Diffuse sclerosis, Dysautonomia, Dyscalculia, Dysgraphia, Dyslexia, Dystonia, Early infantile epileptic encephalopathy, Empty sella syndrome, Encephalitis, Encephalocele, Encephalotrigeminal angiomatosis, Encopresis, Epilepsy, Erb's palsy, Erythromelalgia, Essential tremor, Fabry's disease, Fahr's syndrome, Fainting, Familial spastic paralysis, Febrile seizures, Fisher syndrome, Friedreich's ataxia, FART Syndrome, Gaucher's disease, Gerstmann's syndrome, Giant cell arteritis, Giant cell inclusion disease, Globoid cell Leukodystrophy, Gray matter heterotopia, Guillain-Barre syndrome, HTLV-1 associated myelopathy, Hallervorden-Spatz disease, Head injury, Headache, Hemifacial Spasm, Hereditary Spastic Paraplegia, Heredopathia atactica polyneuritiformis, Herpes zoster oticus, Herpes zoster, Hirayama syndrome, Holoprosencephaly, Huntington's disease, Hydranencephaly, Hydrocephalus, Hypercortisolism, Hypoxia, Immune-Mediated encephalomyelitis, Inclusion body myositis, Incontinentia pigmenti, Infantile phytanic acid storage disease, Infantile Refsum disease, Infantile spasms, Inflammatory myopathy, Intracranial cyst, Intracranial hypertension, Joubert syndrome, Kearns-Sayre syndrome, Kennedy disease, Kinsbourne syndrome, Klippel Feil syndrome, Krabbe disease, Kugelberg-Welander disease, Kuru, Lafora disease, Lambert-Eaton myasthenic syndrome, Landau-Kleffner syndrome, Lateral medullary (Wallenberg) syndrome, Learning disabilities, Leigh's disease, Lennox-Gastaut syndrome, Lesch-Nyhan syndrome, Leukodystrophy, Lewy body dementia, Lissencephaly, Locked-In syndrome, Lou Gehrig's disease, Lumbar disc disease, Lyme disease - Neurological Sequelae, Machado-Joseph disease (Spinocerebellar ataxia type 3), Macrencephaly, Maple Syrup Urine Disease, Megalencephaly, Melkersson-Rosenthal syndrome, Menieres disease, Meningitis, Menkes disease, Metachromatic leukodystrophy, Microcephaly, Migraine, Miller Fisher syndrome, Mini-Strokes, Mitochondrial Myopathies, Mobius syndrome, Monomelic amyotrophy, Motor Neuron Disease, Motor skills disorder, Moyamoya disease, Mucopolysaccharidoses, Multi-Infarct Dementia, Multifocal motor neuropathy, Multiple sclerosis, Multiple system atrophy, Muscular dystrophy, Myalgic encephalomyelitis, Myasthenia gravis, Myelinoclastic diffuse sclerosis, Myoclonic Encephalopathy of infants, Myoclonus, Myopathy, Myotubular myopathy, Myotonia congenita, Narcolepsy, Neurofibromatosis, Neuroleptic malignant syndrome, Neurological manifestations of AIDS, Neurological sequelae of lupus, Neuromyotonia, Neuronal ceroid lipofuscinosis, Neuronal migration disorders, Niemann-Pick disease, Non 24-hour sleep-wake syndrome, Nonverbal learning disorder, O'Sullivan-McLeod syndrome, Occipital Neuralgia, Occult Spinal Dysraphism Sequence, Ohtahara syndrome, Olivopontocerebellar atrophy, Opsoclonus myoclonus syndrome, Optic neuritis, Orthostatic Hypotension, Overuse syndrome, Palinopsia, Paresthesia, Parkinson's disease, Paramyotonia Congenita, Paraneoplastic diseases, Paroxysmal attacks, Parry-Romberg syndrome, Rombergs Syndrome, Pelizaeus-Merzbacher disease, Periodic Paralyses, Peripheral neuropathy, Persistent Vegetative State, Pervasive neurological disorders, Photic sneeze reflex, Phytanic Acid Storage disease, Pick's disease, Pinched Nerve, Pituitary Tumors, PMG, Polio, Polymicrogyria, Polymyositis, Porencephaly, Post-Polio syndrome, Postherpetic Neuralgia (PHN), Postinfectious Encephalomyelitis, Postural Hypotension, Prader-Willi syndrome, Primary Lateral Sclerosis, Prion diseases, Progressive Hemifacial Atrophy also known as Rombergs Syndrome, Progressive multifocal leukoencephalopathy, Progressive Sclerosing Poliodystrophy, Progressive Supranuclear Palsy, Pseudotumor cerebri, Ramsay-Hunt syndrome (Type I and Type II), Rasmussen's encephalitis, Reflex sympathetic dystrophy syndrome, Refsum disease, Repetitive motion disorders, Repetitive stress injury, Restless legs syndrome, Retrovirus-associated myelopathy, Rett syndrome, Reye's syndrome, Rombergs Syndrome, Rabies, Saint Vitus dance, Sandhoff disease, Schizophrenia, Schilder's disease, Schizencephaly, Sensory Integration Dysfunction, Septo-optic dysplasia, Shaken baby syndrome, Shingles, Shy-Drager syndrome, Sjogren's syndrome, Sleep apnea, Sleeping sickness, Snatiation, Sotos syndrome, Spasticity, Spina bifida, Spinal cord injury, Spinal cord tumors, Spinal muscular atrophy, Spinal stenosis, Steele-Richardson-Olszewski syndrome, see Progressive Supranuclear Palsy, Spinocerebellar ataxia, Stiff-person syndrome, Stroke, Sturge-Weber syndrome, Subacute sclerosing panencephalitis, Subcortical arteriosclerotic encephalopathy, Superficial siderosis, Sydenham's chorea, Syncope, Synesthesia, Syringomyelia, Tardive dyskinesia, Tay-Sachs disease, Temporal arteritis, Tethered spinal cord syndrome, Thomsen disease, Thoracic outlet syndrome, Tic Douloureux, Todd's paralysis, Tourette syndrome, Transient ischemic attack, Transmissible spongiform encephalopathies, Transverse myelitis, Traumatic brain injury, Tremor, Trigeminal neuralgia, Tropical spastic paraparesis, Trypanosomiasis, Tuberous sclerosis, Vasculitis including temporal arteritis, Von Hippel-Lindau disease (VHL), Viliuisk Encephalomyelitis (VE), Wallenberg's syndrome, Werdnig-Hoffman disease, West syndrome, Whiplash, Williams syndrome, Wilson's disease, X-Linked Spinal and Bulbar Muscular Atrophy, and Zellweger syndrome. Neurological conditions can comprise movement disorders, for example multiple system atrophy (MSA).
[00196] In some cases, a disease or condition may comprise an autoimmune disease. In some cases, an autoimmune disease may comprise acute disseminated encephalomyelitis (ADEM), acute necrotizing hemorrhagic leukoencephalitis, Addison's disease, agammaglobulinemia, allergic asthma, allergic rhinitis, alopecia areata, amyloidosis, ankylosing spondylitis, anti-GBM/anti-TBM nephritis, antiphospholipid syndrome (APS), autoimmune aplastic anemia, autoimmune dysautonomia, autoimmune hepatitis, autoimmune hyperlipidemia, autoimmune immunodeficiency, autoimmune inner ear disease (AIED), autoimmune myocarditis, autoimmune pancreatitis, autoimmune retinopathy, autoimmune thrombocytopenic purpura (ATP), autoimmune thyroid disease, axonal & neuronal neuropathies, Balo disease, Behcet's disease, bullous pemphigoid, cardiomyopathy, Castleman disease, celiac sprue (non-tropical), Chagas disease, chronic fatigue syndrome, chronic inflammatory demyelinating polyneuropathy (CIDP), chronic recurrent multifocal osteomyelitis (CRMO), Churg-Strauss syndrome, cicatricial pemphigoid/benign mucosal pemphigoid, Crohn's disease, Cogan's syndrome, cold agglutinin disease, congenital heart block, coxsackie myocarditis, CREST disease, essential mixed cryoglobulinemia, demyelinating neuropathies, dermatomyositis, Devic's disease (neuromyelitis optica), discoid lupus, Dressler's syndrome, endometriosis, eosinophilic fasciitis, erythema nodosum, experimental allergic encephalomyelitis, Evans syndrome, fibromyalgia, fibrosing alveolitis, giant cell arteritis (temporal arteritis), glomerulonephritis, Goodpasture's syndrome, Graves' disease, Guillain-Barre syndrome, Hashimoto's encephalitis, Hashimoto's thyroiditis, hemolytic anemia, Henoch-Schonlein purpura, herpes gestationis, hypogammaglobulinemia, idiopathic thrombocytopenic purpura (ITP), IgA nephropathy, immunoregulatory lipoproteins, inclusion body myositis, insulin-dependent diabetes (type 1), interstitial cystitis, juvenile arthritis, juvenile diabetes, Kawasaki syndrome, Lambert-Eaton syndrome, leukocytoclastic vasculitis, lichen planus, lichen sclerosus, ligneous conjunctivitis, linear IgA disease (LAD), Lupus (SLE), Lyme disease, Meniere's disease, microscopic polyangiitis, mixed connective tissue disease (MCTD), Mooren's ulcer, Mucha-Habermann disease, multiple sclerosis, myasthenia gravis, myositis, narcolepsy, neuromyelitis optica (Devic's), neutropenia, ocular cicatricial pemphigoid, optic neuritis, palindromic rheumatism, PANDAS (Pediatric Autoimmune Neuropsychiatric Disorders Associated with Streptococcus), paraneoplastic cerebellar degeneration, paroxysmal nocturnal hemoglobinuria (PNH), Parry Romberg syndrome, Parsonage-Turner syndrome, pars planitis (peripheral uveitis), pemphigus, peripheral neuropathy, perivenous encephalomyelitis, pernicious anemia, POEMS syndrome, polyarteritis nodosa, type I, II & III autoimmune polyglandular syndromes, polymyalgia rheumatica, polymyositis, postmyocardial infarction syndrome, postpericardiotomy syndrome, progesterone dermatitis, primary biliary cirrhosis, primary sclerosing cholangitis, psoriasis, psoriatic arthritis, idiopathic pulmonary fibrosis, pyoderma gangrenosum, pure red cell aplasia, Raynaud's phenomena, reflex sympathetic dystrophy, Reiter's syndrome, relapsing polychondritis, restless legs syndrome, retroperitoneal fibrosis, rheumatic fever, rheumatoid arthritis, sarcoidosis, Schmidt syndrome, scleritis, scleroderma, Sjogren's syndrome, sperm and testicular autoimmunity, stiff person syndrome, subacute bacterial endocarditis (SBE), sympathetic ophthalmia, Takayasu's arteritis, temporal arteritis/giant cell arteritis, thrombocytopenic purpura (TPP), Tolosa-Hunt syndrome, transverse myelitis, ulcerative colitis, undifferentiated connective tissue disease (UCTD), uveitis, vasculitis, vesiculobullous dermatosis, vitiligo, Wegener's granulomatosis, chronic active hepatitis, primary biliary cirrhosis, dilated cardiomyopathy, myocarditis, autoimmune polyendocrine syndrome type I (APS-I), cystic fibrosis vasculitides, acquired hypoparathyroidism, coronary artery disease, pemphigus foliaceus, pemphigus vulgaris, Rasmussen encephalitis, autoimmune gastritis, insulin hypoglycemic syndrome (Hirata disease), Type B insulin resistance, acanthosis, systemic lupus erythematosus (SLE), pernicious anemia, treatment-resistant Lyme arthritis, polyneuropathy, demyelinating diseases, atopic dermatitis, autoimmune hypothyroidism, vitiligo, thyroid associated ophthalmopathy, autoimmune coeliac disease, ACTH deficiency, dermatomyositis, Sjogren syndrome, systemic sclerosis, progressive systemic sclerosis, morphea, primary antiphospholipid syndrome, chronic idiopathic urticaria, connective tissue syndromes, necrotizing and crescentic glomerulonephritis (NCGN), systemic vasculitis, Raynaud syndrome, chronic liver disease, visceral leishmaniasis, autoimmune C1 deficiency, membrane proliferative glomerulonephritis (MPGN), prolonged coagulation time, immunodeficiency, atherosclerosis, neuronopathy, paraneoplastic pemphigus, paraneoplastic stiff man syndrome, paraneoplastic encephalomyelitis, subacute autonomic neuropathy, cancer-associated retinopathy, paraneoplastic opsoclonus myoclonus ataxia, lower motor neuron syndrome and Lambert-Eaton myasthenic syndrome.
[00197] In some cases, a disease or a condition may comprise AIDS, anthrax, botulism, brucellosis, chancroid, chlamydial infection, cholera, coccidioidomycosis, cryptosporidiosis, cyclosporiasis, diphtheria, ehrlichiosis, arboviral encephalitis, enterohemorrhagic Escherichia coli, giardiasis, gonorrhea, dengue fever, Haemophilus influenzae, Hansen's disease (Leprosy), hantavirus pulmonary syndrome, hemolytic uremic syndrome, hepatitis A, hepatitis B, hepatitis C, human immunodeficiency virus, legionellosis, listeriosis, Lyme disease, malaria, measles, meningococcal disease, mumps, pertussis (whooping cough), plague, paralytic poliomyelitis, psittacosis, Q fever, rabies, Rocky Mountain spotted fever, rubella, congenital rubella syndrome, severe acute respiratory syndrome (SARS), shigellosis, smallpox, streptococcal disease (invasive group A), streptococcal toxic shock syndrome, Streptococcus pneumoniae, syphilis, tetanus, toxic shock syndrome, trichinosis, tuberculosis, tularemia, typhoid fever, vancomycin intermediate resistant Staphylococcus aureus, varicella, yellow fever, variant Creutzfeldt-Jakob disease (vCJD), Ebola hemorrhagic fever, Echinococcosis, Hendra virus infection, human monkeypox, influenza A, H5N1, Lassa fever, Marburg hemorrhagic fever, Nipah virus, O'nyong-nyong fever, Rift Valley fever, Venezuelan equine encephalitis and West Nile virus.
[00198] In some cases, a disease or condition may comprise a cancer. In some cases, a cancer may comprise thyroid cancer, adrenal cortical cancer, anal cancer, aplastic anemia, bile duct cancer, bladder cancer, bone cancer, bone metastasis, central nervous system (CNS) cancers, peripheral nervous system (PNS) cancers, breast cancer, Castleman's disease, cervical cancer, childhood Non-Hodgkin's lymphoma, lymphoma, colon and rectum cancer, endometrial cancer, esophagus cancer, Ewing's family of tumors (e.g. Ewing's sarcoma), eye cancer, gallbladder cancer, gastrointestinal carcinoid tumors, gastrointestinal stromal tumors, gestational trophoblastic disease, hairy cell leukemia, Hodgkin's disease, Kaposi's sarcoma, kidney cancer, laryngeal and hypopharyngeal cancer, acute lymphocytic leukemia, acute myeloid leukemia, children's leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, liver cancer, lung cancer, lung carcinoid tumors, Non-Hodgkin's lymphoma, male breast cancer, malignant mesothelioma, multiple myeloma, myelodysplastic syndrome, myeloproliferative disorders, nasal cavity and paranasal cancer, nasopharyngeal cancer, neuroblastoma, oral cavity and oropharyngeal cancer, osteosarcoma, ovarian cancer, pancreatic cancer, penile cancer, pituitary tumor, prostate cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcoma (adult soft tissue cancer), melanoma skin cancer, non-melanoma skin cancer, stomach cancer, testicular cancer, thymus cancer, uterine cancer (e.g. uterine sarcoma), vaginal cancer, vulvar cancer, or Waldenstrom's macroglobulinemia.
[00199] A condition or a disease, as disclosed herein, can include hyperproliferative disorders. Malignant hyperproliferative disorders can be stratified into risk groups, such as a low risk group and a medium-to-high risk group. Hyperproliferative disorders can include but may not be limited to cancers, hyperplasia, or neoplasia.
In some cases, the hyperproliferative cancer can be breast cancer such as a ductal carcinoma in duct tissue of a mammary gland, medullary carcinomas, colloid carcinomas, tubular carcinomas, and inflammatory breast cancer; ovarian cancer, including epithelial ovarian tumors such as adenocarcinoma in the ovary and an adenocarcinoma that has migrated from the ovary into the abdominal cavity; uterine cancer; cervical cancer such as adenocarcinoma in the cervix epithelial including squamous cell carcinoma and adenocarcinomas; prostate cancer, such as a prostate cancer selected from the following: an adenocarcinoma or an adenocarcinoma that has migrated to the bone; pancreatic cancer such as epithelioid carcinoma in the pancreatic duct tissue and an adenocarcinoma in a pancreatic duct; bladder cancer such as a transitional cell carcinoma in urinary bladder, urothelial carcinomas (transitional cell carcinomas), tumors in the urothelial cells that line the bladder, squamous cell carcinomas,
adenocarcinomas, and small cell cancers; leukemia such as acute myeloid leukemia (AML), acute lymphocytic leukemia, chronic lymphocytic leukemia, chronic myeloid leukemia, hairy cell leukemia, myelodysplasia, myeloproliferative disorders, acute myelogenous leukemia (AML), chronic myelogenous leukemia (CML), mastocytosis, chronic lymphocytic leukemia (CLL), multiple myeloma (MM), and myelodysplastic syndrome (MDS); bone cancer; lung cancer such as non-small cell lung cancer (NSCLC), which may be divided into squamous cell carcinomas, adenocarcinomas, and large cell undifferentiated carcinomas, and small cell lung cancer; skin cancer such as basal cell carcinoma, melanoma, squamous cell carcinoma and actinic keratosis, which may be a skin condition that sometimes develops into squamous cell carcinoma; eye retinoblastoma; cutaneous or intraocular (eye) melanoma; primary liver cancer (cancer that begins in the liver); kidney cancer; acquired immune deficiency syndrome (AIDS)-related lymphoma such as diffuse large B-cell lymphoma, B-cell immunoblastic lymphoma and small non-cleaved cell lymphoma; Kaposi's Sarcoma; viral-induced cancers including hepatitis B virus (HBV), hepatitis C virus (HCV), and hepatocellular carcinoma; human lymphotropic virus-type 1 (HTLV-1) and adult T-cell leukemia/lymphoma; and human papilloma virus (HPV) and cervical cancer; central nervous system (CNS) cancers such as primary brain tumor, which includes gliomas (astrocytoma, anaplastic astrocytoma, or glioblastoma multiforme), oligodendrogliomas, ependymomas, meningiomas, lymphomas, schwannomas, and medulloblastomas; peripheral nervous system (PNS) cancers such as acoustic neuromas and malignant peripheral nerve sheath tumors (MPNST) including neurofibromas and schwannomas, malignant fibrous cytomas, malignant fibrous histiocytomas, malignant meningiomas, malignant mesotheliomas, and malignant mixed Müllerian tumors; oral cavity and oropharyngeal cancer such as hypopharyngeal cancer, laryngeal
cancer, nasopharyngeal cancer, and oropharyngeal cancer; stomach cancer such as lymphomas, gastric stromal tumors, and carcinoid tumors; testicular cancer such as germ cell tumors (GCTs), which include seminomas and nonseminomas, and gonadal stromal tumors, which include Leydig cell tumors and Sertoli cell tumors; thymus cancer such as thymomas, thymic carcinomas, Hodgkin disease, non-Hodgkin lymphomas, carcinoids or carcinoid tumors; rectal cancer; and colon cancer. In some cases, the diseases stratified, classified, characterized, or diagnosed by the methods of the present disclosure include but may not be limited to thyroid disorders such as for example benign thyroid disorders including but not limited to follicular adenomas, Hurthle cell adenomas, lymphocytic thyroiditis, and thyroid hyperplasia. In some cases, the diseases stratified, classified, characterized, or diagnosed by the methods of the present disclosure include but may not be limited to malignant thyroid disorders such as for example follicular carcinomas, follicular variant of papillary thyroid carcinomas, medullary carcinomas, and papillary carcinomas.
[00200] Conditions or diseases of the present disclosure can include a genetic disorder. A genetic disorder may be an illness caused by abnormalities in genes or chromosomes. Genetic disorders can be grouped into two categories: single gene disorders and multifactorial and polygenic (complex) disorders. A single gene disorder can be the result of a single mutated gene. Inheriting a single gene disorder can include but not be limited to autosomal dominant, autosomal recessive, X-linked dominant, X-linked recessive, Y-linked and mitochondrial inheritance. In some cases, one mutated copy of the gene can be necessary for a person to be affected by an autosomal dominant disorder. Examples of autosomal dominant type of disorder can include but may not be limited to Huntington's disease, Neurofibromatosis 1, Marfan Syndrome, Hereditary nonpolyposis colorectal cancer, or Hereditary multiple exostoses. In autosomal recessive disorders, two copies of the gene must be mutated for a subject to be affected by an autosomal recessive disorder. Examples of this type of disorder can include but may not be limited to cystic fibrosis, sickle-cell disease (also partial sickle-cell disease), Tay-Sachs disease, Niemann-Pick disease, or spinal muscular atrophy. X-linked dominant disorders are caused by mutations in genes on the X chromosome such as X-linked hypophosphatemic rickets. Some X-linked dominant conditions such as Rett syndrome, Incontinentia Pigmenti type 2 and Aicardi Syndrome can be fatal. X-linked recessive disorders are also caused by mutations in genes on the X chromosome. Examples of this type of disorder can include but are not limited to Hemophilia A, Duchenne muscular dystrophy, red-green color blindness, muscular dystrophy and Androgenetic alopecia. Y-linked disorders are caused by mutations on the Y chromosome. Examples can include but are not limited to Male Infertility and hypertrichosis pinnae. The genetic disorder of mitochondrial inheritance, also known as maternal inheritance, can apply to genes in mitochondrial DNA such as in Leber's Hereditary Optic Neuropathy.
[00201] Genetic disorders may also be complex, multifactorial or polygenic. Polygenic genetic disorders can be associated with the effects of multiple genes in combination with lifestyle and environmental factors. Although complex genetic disorders can cluster in families, they do not have a clear-cut pattern of inheritance. Multifactorial or polygenic disorders can include heart disease, diabetes, asthma, autism, autoimmune diseases such as multiple sclerosis, cancers, ciliopathies, cleft palate, hypertension, inflammatory bowel disease, mental retardation or obesity.
[00202] Other genetic disorders can include but may not be limited to 1p36 deletion syndrome, 21-hydroxylase deficiency, 22q11.2 deletion syndrome, aceruloplasminemia, achondrogenesis, type II, achondroplasia, acute intermittent porphyria, adenylosuccinate lyase deficiency, Adrenoleukodystrophy, Alexander disease, alkaptonuria, alpha-1 antitrypsin deficiency, Alstrom syndrome, Alzheimer's disease (type 1, 2, 3, and 4), Amelogenesis Imperfecta, amyotrophic lateral sclerosis, Amyotrophic lateral sclerosis type 2, Amyotrophic lateral sclerosis type 4, amyotrophic lateral sclerosis type 4, androgen insensitivity syndrome, Anemia, Angelman syndrome, Apert syndrome, ataxia-telangiectasia, Beare-Stevenson cutis gyrata syndrome, Benjamin syndrome, beta thalassemia, biotinidase deficiency, Birt-Hogg-Dube syndrome, bladder cancer, Bloom syndrome, Bone diseases, breast cancer, Camptomelic dysplasia, Canavan disease, Cancer, Celiac Disease, Chronic Granulomatous Disorder (CGD), Charcot-Marie-Tooth disease, Charcot-Marie-Tooth disease Type 1, Charcot-Marie-Tooth disease Type 4, Charcot-Marie-Tooth disease Type 2, Charcot-Marie-Tooth disease Type 4, Cockayne syndrome, Coffin-Lowry syndrome, collagenopathy types II and XI, Colorectal Cancer, Congenital absence of the vas deferens, congenital bilateral absence of vas deferens, congenital diabetes, congenital erythropoietic porphyria, Congenital heart disease, congenital hypothyroidism, Connective tissue disease, Cowden syndrome, Cri du chat syndrome, Crohn's disease, fibrostenosing, Crouzon syndrome, Crouzonodermoskeletal syndrome, cystic fibrosis, De Grouchy Syndrome, Degenerative nerve diseases, Dent's disease, developmental disabilities, DiGeorge syndrome, Distal spinal muscular atrophy type V, Down syndrome, Dwarfism, Ehlers-Danlos syndrome, Ehlers-Danlos syndrome arthrochalasia type, Ehlers-Danlos syndrome classical type, Ehlers-Danlos syndrome dermatosparaxis type, Ehlers-Danlos syndrome kyphoscoliosis type, vascular type, erythropoietic protoporphyria, Fabry's disease, Facial injuries and disorders, factor V Leiden thrombophilia, familial adenomatous polyposis, familial dysautonomia, fanconi anemia, FG syndrome, fragile X syndrome, Friedreich ataxia, Friedreich's ataxia, G6PD deficiency, galactosemia, Gaucher's disease (type 1, 2, and 3), Genetic brain disorders, Glycine encephalopathy, Haemochromatosis type 2, Haemochromatosis type 4, Harlequin Ichthyosis, Head and brain malformations, Hearing disorders and deafness, Hearing problems in children, hemochromatosis (neonatal, type 2 and type 3), hemophilia, hepatoerythropoietic porphyria, hereditary coproporphyria, Hereditary Multiple Exostoses, hereditary neuropathy with liability to pressure palsies, hereditary nonpolyposis colorectal cancer, homocystinuria, Huntington's disease, Hutchinson Gilford Progeria Syndrome, hyperoxaluria, primary, hyperphenylalaninemia, hypochondrogenesis, hypochondroplasia, idic15, incontinentia pigmenti, Infantile Gaucher disease, infantile-onset ascending hereditary spastic paralysis, Infertility, Jackson-Weiss syndrome, Joubert syndrome, Juvenile Primary Lateral Sclerosis, Kennedy disease, Klinefelter syndrome, Kniest dysplasia, Krabbe disease, Learning disability, Lesch-Nyhan syndrome, Leukodystrophies, Li-Fraumeni syndrome, lipoprotein lipase deficiency, familial, Male genital disorders, Marfan syndrome, McCune-Albright syndrome, McLeod syndrome, Mediterranean fever, familial, Menkes disease, Menkes syndrome, Metabolic disorders, methemoglobinemia beta-globin type, Methemoglobinemia congenital methaemoglobinaemia, methylmalonic acidemia, Micro syndrome, Microcephaly, Movement disorders, Mowat-Wilson syndrome, Mucopolysaccharidosis (MPS I), Muenke syndrome, Muscular dystrophy, Muscular dystrophy, Duchenne and Becker type, muscular dystrophy, Duchenne and Becker types, myotonic dystrophy, Myotonic dystrophy type 1 and type 2, Neonatal hemochromatosis, neurofibromatosis, neurofibromatosis 1, neurofibromatosis 2, Neurofibromatosis type I, neurofibromatosis type II, Neurologic diseases, Neuromuscular disorders, Niemann-Pick disease, Nonketotic hyperglycinemia, nonsyndromic deafness, Nonsyndromic deafness autosomal recessive, Noonan syndrome, osteogenesis imperfecta (type I and type III), otospondylomegaepiphyseal dysplasia, pantothenate kinase-associated neurodegeneration, Patau Syndrome (Trisomy 13), Pendred syndrome, Peutz-Jeghers syndrome, Pfeiffer syndrome, phenylketonuria, porphyria, porphyria cutanea tarda, Prader-Willi syndrome, primary pulmonary hypertension, prion disease, Progeria, propionic acidemia, protein C deficiency, protein S deficiency, pseudo-Gaucher disease, pseudoxanthoma elasticum, Retinal disorders, retinoblastoma, Friedreich ataxia (FA), Rett syndrome, Rubinstein-Taybi syndrome, Sandhoff disease, sensory and autonomic neuropathy type III, sickle cell anemia, skeletal muscle regeneration, Skin pigmentation disorders, Smith Lemli Opitz Syndrome, Speech and communication disorders, spinal muscular atrophy, spinal-bulbar muscular atrophy, spinocerebellar ataxia, spondyloepimetaphyseal dysplasia, Strudwick type, spondyloepiphyseal dysplasia congenita, Stickler syndrome, Stickler syndrome COL2A1, Tay-Sachs disease, tetrahydrobiopterin deficiency, thanatophoric dysplasia, thiamine-responsive megaloblastic anemia with diabetes mellitus and sensorineural deafness, Thyroid disease, Tourette's Syndrome, Treacher Collins syndrome, triple X syndrome, tuberous sclerosis, Turner syndrome, Usher syndrome, variegate porphyria, von Hippel-Lindau disease, Waardenburg syndrome, Weissenbacher-Zweymüller syndrome, Wilson disease, Wolf-Hirschhorn syndrome, Xeroderma Pigmentosum, X-linked severe combined immunodeficiency, X-linked sideroblastic anemia, or X-linked spinal-bulbar muscle atrophy.
Kits
[00203] A kit may include a moiety, a container, an enzyme or fragment thereof, instructions for use, a portable sequencer, or any combination thereof. A kit may be a general kit for all tissue samples or disease types. A kit may be a specific kit for a specific tissue sample, such as a plasma sample, a blood sample, a serum sample, a buccal sample, or a urine sample. A kit may be a specific kit for a specific disease such as cancer. A kit may comprise a control. In some embodiments, a control can comprise one or more epigenetic modifications disclosed herein.
[00204] A kit may provide periodic updates of a database of references or of analysis software that computes a result of the method. A kit may provide software to automate one or more aspects of a method, such as a comparison to a reference to provide a result or to provide a summary of a result that may be reported or displayed or downloaded by a medical professional and/or entered into a database. A result or a summary of results may include any of the results disclosed herein, including recommendations of treatment options for a subject and a risk of occurrence of a disease or condition.
[00205] A kit may provide a unit or device for obtaining a sample from a subject (e.g., a device with a needle coupled to an aspirator).
[00206] A kit may provide instructions for performing methods as disclosed herein, and include all necessary buffers and reagents for hybridizing, sequencing, amplifying, associating, extending, or any combination thereof. A kit may include instructions for analyzing a result.
[00207] An informational material of a kit may comprise printed matter, e.g., a printed text, drawing, and/or photograph, e.g., a label or printed sheet. An informational material may comprise Braille, computer readable material, video recording, or audio recording. In some cases, the informational material of the kit may include contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about a compound described herein and/or its use in the methods described herein. Informational material may be provided in any combination of formats.
[00208] A kit may include a package, such as a fiber-based package, a cardboard package, or a polymeric package, such as a styrofoam box. A package may be configured so as to substantially maintain a temperature differential between an interior and an exterior. In some cases, it may provide insulating properties to keep one or more components of a kit at a preselected temperature for a preselected time. A kit may include one or more containers for a composition containing a compound(s) described herein. In some embodiments, a kit may contain separate containers (such as two separate containers for two components of a kit), dividers or compartments for one or more components, and informational material. For example, a kit component may be contained in a bottle, a vial, or a syringe, and informational material may be contained in a plastic sleeve or a packet. In other embodiments, separate components of a kit may be contained within a single, undivided container. For example, a kit component may be contained in a bottle, a vial or a syringe that has attached thereto the informational material in the form of a label. In some embodiments, a kit may include a plurality (e.g., a pack) of individual containers, each containing one or more unit dosage forms (e.g., a dosage form described herein) of a component described herein.
For example, the kit may include a plurality of syringes, ampules, foil packets, or blister packs, each containing a single unit dose of a kit component described herein. Containers of a kit may be air tight, waterproof (e.g., impermeable to changes in moisture or evaporation), and/or light-tight. A kit may include a device suitable for administration of the component, e.g., a syringe, inhalant, pipette, forceps, measured spoon, dropper (e.g., eye dropper), swab (e.g., a cotton swab or wooden swab), or any such delivery device. In a preferred embodiment, the device may be a medical implant device, e.g., packaged for surgical insertion.
[00209] A basic research business, a disease diagnostic business, a molecular profiling business, a pharmaceutical business, or any other business associated with patient healthcare may provide a kit for performing the methods described herein.
Computer Control Systems
[00210] The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 1 shows a computer system 101 that is programmed or otherwise configured to interface with a sequence library, a sequencer, a PCR machine, an apparatus that is configured to sequence or amplify an oligonucleotide, a substrate, or any combination thereof. The computer system 101 can regulate various aspects of the present disclosure, such as, for example, conditions for performing epigenetic modifications, conditions for associating a moiety to the epigenetic modifications, and conditions for nanopore sequencing. The computer system 101 can regulate amplification conditions, associating conditions, or sequencing conditions, such as buffer types, temperatures, or time periods of incubation. The computer system 101 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
[00211] The computer system 101 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 105, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 101 also includes memory or memory location 110 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 115 (e.g., hard disk), communication interface 120 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 125, such as cache, other memory, data storage and/or electronic display adapters. The memory 110, storage unit 115, interface 120 and peripheral devices 125 are in communication with the CPU 105 through a communication bus (solid lines), such as a motherboard. The storage unit 115 can be a data storage unit (or data repository) for storing data. The computer system 101 can be operatively coupled to a computer network (“network”) 130 with the aid of the communication interface 120. The network 130 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 130 in some cases is a telecommunication and/or data network. The network 130 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 130, in some cases with the aid of the computer system 101, can implement a peer-to-peer network, which may enable devices coupled to the computer system 101 to behave as a client or a server.
[00212] The CPU 105 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 110. The instructions can be directed to the CPU 105, which can subsequently program or otherwise configure the CPU 105 to implement methods of the present disclosure. Examples of operations performed by the CPU 105 can include fetch, decode, execute, and writeback.
[00213] The CPU 105 can be part of a circuit, such as an integrated circuit. One or more other components of the system 101 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[00214] The storage unit 115 can store files, such as drivers, libraries and saved programs. The storage unit 115 can store user data, e.g., user preferences and user programs. The computer system 101 in some cases can include one or more additional data storage units that are external to the computer system 101, such as located on a remote server that is in communication with the computer system 101 through an intranet or the Internet.
[00215] The computer system 101 can communicate with one or more remote computer systems through the network 130. For instance, the computer system 101 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 101 via the network 130.
[00216] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 101, such as, for example, on the memory 110 or electronic storage unit 115. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 105. In some cases, the code can be retrieved from the storage unit 115 and stored on the memory 110 for ready access by the processor 105. In some situations, the electronic storage unit 115 can be precluded, and machine-executable instructions are stored on memory 110.
[00217] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
[00218] Aspects of the systems and methods provided herein, such as the computer system 101, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
[00219] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[00220] The computer system 101 can include or be in communication with an electronic display 135 that comprises a user interface (UI) 140 for providing, for example, one or more results (immediate results or archived results from a previous experiment), one or more user inputs, reference values from a library or database, or a combination thereof. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
[00221] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 105. The algorithm can, for example, determine optimized conditions via supervised learning to optimize conditions such as a buffer type, a buffer concentration, a temperature, or an incubation period. Conditions may be optimized for an oligonucleotide fragment, such as an oligonucleotide fragment having a particular number of epigenetic modifications or a particular length of sequence.
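As a purely illustrative sketch of the supervised-learning idea in the preceding paragraph, the following Python code predicts a quality score for a candidate set of conditions from previously scored runs using k-nearest-neighbour regression, then picks the best-scoring candidate. The disclosure does not specify a learning algorithm; the choice of k-nearest neighbours, the three numeric features (buffer concentration, temperature, incubation minutes), the notion of a "score", and all function names are assumptions made for illustration only.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption, not from the disclosure):
# ((buffer concentration, temperature in deg C, incubation minutes), observed score)
Condition = Tuple[float, float, float]
Record = Tuple[Condition, float]


def _distance(a: Condition, b: Condition, scales: Sequence[float]) -> float:
    """Euclidean distance after dividing each feature difference by its scale."""
    return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def predict_score(train: List[Record], query: Condition, k: int = 3,
                  scales: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Average the scores of the k previously observed runs closest to `query`."""
    if not train:
        raise ValueError("at least one scored run is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(train, key=lambda rec: _distance(rec[0], query, scales))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def best_condition(train: List[Record], candidates: List[Condition], k: int = 3,
                   scales: Sequence[float] = (1.0, 1.0, 1.0)) -> Condition:
    """Return the candidate condition with the highest predicted score."""
    if not candidates:
        raise ValueError("at least one candidate condition is required")
    return max(candidates, key=lambda c: predict_score(train, c, k, scales))
```

In practice the `scales` argument would be chosen so that features measured in different units contribute comparably to the distance; a categorical condition such as buffer type would need a separate encoding that this sketch does not attempt.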
Embodiments
[00222] An aspect of the present disclosure provides a method. In one aspect, the method may comprise associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a hydroxymethylated base, a formylated base, or a carboxylic acid containing base; and identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing, wherein the sequencing is performed without an enzyme associated with the target nucleic acid sequence.
[00223] In some embodiments, the epigenetically modified base may comprise a pyrimidine. In some embodiments, the pyrimidine may be a cytosine. In some embodiments, the pyrimidine may be thymine (T) or uracil. In some embodiments, the epigenetically modified base may comprise 6-methyladenine, 6-hydroxymethyladenine, 8-oxoguanine, 5-hydroxymethyluracil, abasic sites, 5-methylcytosine, and 6-methyladenine. In some embodiments, the epigenetically modified base may comprise N6-methyladenosine, N3-methyladenosine, N7-methylguanosine, 5-hydroxymethylcytosine, other methylated nucleotides, pseudouridine, thiouridine, isoguanosine, isocytosine, dihydrouridine, queuosine, wyosine, inosine, triazole, diaminopurine, β-D-glucopyranosyloxymethyluracil (a.k.a. β-D-glucosyl-HOMedU, β-glucosyl-hydroxymethyluracil, “dJ,” or “base J”), 8-oxoguanosine, and 2'-O-methyl derivatives of adenosine, cytidine, guanosine, and uridine.
[00224] In some embodiments, the epigenetically modified base may comprise a hydroxymethylated base. In some embodiments, the hydroxymethylated base may comprise a 5-hydroxymethylated base. In some embodiments, the 5-hydroxymethylated base may comprise a 5-hydroxymethylcytosine.
[00225] In some embodiments, the moiety may comprise a glucose moiety. In some embodiments, the method may further comprise, before the identifying, oxidizing the moiety. In some embodiments, the oxidizing may be carried out by an oxidizing agent. In some embodiments, the oxidizing agent may comprise sodium periodate.
[00226] In some embodiments, the epigenetically modified base may comprise a formylated base. In some embodiments, the formylated base may comprise a 5-formylated base. In some embodiments, the 5-formylated base may comprise a 5-formylcytosine. In some embodiments, the moiety may comprise a hydroxylamine or a derivative thereof, a hydrazine or a derivative thereof, or a 1,3-indandione or a derivative thereof.
[00227] In some embodiments, the epigenetically modified base may comprise a carboxylic acid containing base. In some embodiments, the carboxylic acid containing base may comprise a 5-carboxylated base. In some embodiments, the 5-carboxylated base may comprise a 5-carboxycytosine. In some embodiments, the moiety may comprise an anisidine or a derivative thereof, a carbodiimide or a derivative thereof, or a p-xylylenediamine or a derivative thereof.
[00228] In some embodiments, the epigenetically modified base may further comprise a methylated base. In some embodiments, the methylated base may comprise a 5-methylated base. In some embodiments, the 5-methylated base may comprise a 5-methylcytosine.
[00229] In some embodiments, the target nucleic acid sequence may comprise DNA or RNA.
[00230] In some embodiments, the epigenetically modified base may further comprise a 6-methyladenine, a 6-hydroxymethyladenine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, and an abasic site.
[00231] In some embodiments, a size of a nanopore may be at most 1 nanometer (nm). In some embodiments, a size of a nanopore may be at most 1 nanometer (nm), 0.9 nm, 0.8 nm, 0.7 nm, 0.6 nm, 0.5 nm, 0.4 nm, 0.3 nm, 0.2 nm, 0.1 nm, or less. In some embodiments, a size of a nanopore may be more than 1 nm.
[00232] In some embodiments, at least one nanopore used in the nanopore sequencing may be a biological nanopore. In some embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or greater nanopores used in the nanopore sequencing may be biological nanopores. In some embodiments, the nanopore may comprise a lipid bilayer. In some embodiments, the nanopore may be formed using a biological transmembrane protein such as MspA. In some embodiments, the nanopore may be a solid state or hybrid nanopore.
[00233] In some cases, at least a portion of the nucleic acid sequence or the target nucleic acid sequence may be double-stranded. In some cases, the moiety may be associated with the epigenetically modified base by a single bond, a double bond, or a triple bond. In some cases, the target nucleic acid sequence may comprise an adapter sequence. In some cases, the target nucleic acid sequence may comprise a barcode.
[00234] In some cases, the target nucleic acid sequence may comprise at least: from about 1 to about 3; from about 1 to about 5; from about 1 to about 10; from about 1 to about 15; or from about 1 to about 20 epigenetically modified bases per at least about 20 bases of the target nucleic acid sequence. In some cases, the target nucleic acid sequence may comprise at least about: 1, 5, 10, 15 or 20 epigenetically modified bases per at least about 20 bases of the target nucleic acid sequence.
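The density language in the preceding paragraph (a number of epigenetically modified bases per about 20 bases) can be made concrete with a small counting routine. The sketch below is an illustration only: it assumes 0-based positions and fixed, non-overlapping windows, neither of which is specified in the disclosure, and the function names are hypothetical.

```python
from typing import Iterable, List


def modified_per_window(length: int, modified_positions: Iterable[int],
                        window: int = 20) -> List[int]:
    """Count modified bases in each consecutive, non-overlapping window.

    `length` is the sequence length in bases; `modified_positions` are 0-based
    indices of modified bases. The final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = (length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in set(modified_positions):  # ignore duplicate positions
        if not 0 <= pos < length:
            raise ValueError(f"position {pos} is outside the sequence")
        counts[pos // window] += 1
    return counts


def has_window_with_at_least(counts: List[int], minimum: int) -> bool:
    """True if any window contains at least `minimum` modified bases."""
    return any(c >= minimum for c in counts)
```

A sliding-window count would be an equally reasonable reading of "per at least about 20 bases"; the fixed-window version is used here only because it is the simplest to state.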
[00235] In some cases, the target nucleic acid sequence may comprise a cytosine guanine (CG) site, a cytosine phosphate guanine (CpG) island, or a combination thereof. In some cases, the target nucleic acid sequence may comprise cell-free DNA. In some cases, the target nucleic acid sequence may comprise a cDNA sequence. In some cases, the method may comprise sequencing an amplified product.
[00236] In some cases, the target nucleic acid sequence may be from a sample. In some cases, the sample may be from a subject. In some cases, the subject may be a human. In some cases, the sample may comprise a buccal sample, a saliva sample, a blood sample, a plasma sample, a reproductive sample, a mucus sample, a cerebrospinal fluid sample, a tissue sample, or any combination thereof.
[00237] In some cases, the method may comprise obtaining a result. In some cases, the method may comprise communicating the result via a communication medium.
[00238] In some cases, the subject may be diagnosed with a condition. In some cases, the method may comprise diagnosing the subject as having a condition. In some cases, the method may comprise diagnosing the subject as having a likelihood of developing a condition. In some cases, the diagnosing may be based on comparing the result to a reference. In some cases, the diagnosing may at least partially confirm a previous diagnosis. In some cases, the condition may be a cancer.
[00239] In some cases, the method may comprise selecting a treatment for the subject. In some cases, the method may comprise treating the subject. In some cases, the treating may comprise: surgery, chemotherapy, radiation therapy, immunotherapy, targeted therapy, hormone therapy, stem cell transplant, and precision medicine. In some cases, the method may comprise repeating the associating, the hybridizing and the amplifying at different time points.
[00240] In some cases, the subject may be a human. In some cases, the moiety may comprise a sugar. In some cases, the sugar may comprise a glucose. In some cases, the glucose may be modified.
[00241] In some cases, the moiety may be associated with the epigenetically modified base with the assistance of an enzyme. In some cases, the enzyme may be selective for a portion of the target nucleic acid sequence that is double-stranded. In some cases, the moiety may be selectively associated with the epigenetically modified base at a portion of the target nucleic acid sequence that is double-stranded. In some cases, the moiety may be selective for a portion of the nucleic acid sequence. In some cases, the portion may be double-stranded.
Specific Embodiments
[00242] A number of methods and systems are disclosed herein. Specific exemplary embodiments of these methods and systems are disclosed below.
[00243] Embodiment 1. A method comprising: (a) associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled hydroxymethylated base; (b) oxidizing the labeled hydroxymethylated base; and (c) identifying the hydroxymethylated base by sequencing the target nucleic sequence, wherein the sequencing comprises nanopore sequencing.
[00244] Embodiment 2. The method of embodiment 1, wherein the hydroxymethylated base comprises a pyrimidine.
[00245] Embodiment 3. The method of embodiment 2, wherein the pyrimidine is a cytosine.
[00246] Embodiment 4. The method of embodiment 1, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
[00247] Embodiment 5. The method of embodiment 4, wherein the 5-hydroxymethylated base comprises a 5-hydroxymethylcytosine.
[00248] Embodiment 6. The method of any one of embodiments 1-5, wherein the moiety comprises a glucose moiety.
[00249] Embodiment 7. The method of any one of embodiments 1-5, further comprising, before the identifying, oxidizing the moiety.
[00250] Embodiment 8. The method of any one of embodiments 1-7, wherein the oxidizing is carried out by an oxidizing agent.
[00251] Embodiment 9. The method of embodiment 8, wherein the oxidizing agent comprises sodium periodate.
[00252] Embodiment 10. The method of any one of embodiments 1-9, wherein the target nucleic sequence comprises a formylated base.
[00253] Embodiment 11. The method of embodiment 10, wherein the formylated base comprises a 5-formylated base.
[00254] Embodiment 12. The method of embodiment 11, wherein the 5-formylated base comprises a 5-formylcytosine.
[00255] Embodiment 13. The method of any one of embodiments 10-12, wherein the formylated base is associated with a second moiety, wherein the second moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
[00256] Embodiment 14. The method of any one of embodiments 1-13, wherein the target nucleic sequence further comprises a carboxylic acid containing base.
[00257] Embodiment 15. The method of embodiment 14, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
[00258] Embodiment 16. The method of embodiment 15, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
[00259] Embodiment 17. The method of any one of embodiments 14-16, wherein the carboxylic acid containing base is associated with a third moiety, wherein the third moiety comprises an anisidine, a carbodiimide, a p-xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
[00260] Embodiment 18. The method of any one of embodiments 1-7, wherein the target nucleic sequence further comprises a methylated base.
[00261] Embodiment 19. The method of embodiment 18, wherein the methylated base comprises a 5-methylated base.
[00262] Embodiment 20. The method of embodiment 19, wherein the 5-methylated base comprises a 5-methylcytosine.
[00263] Embodiment 21. The method of any one of embodiments 1-20, wherein the target nucleic acid sequence comprises DNA or RNA.
[00264] Embodiment 22. The method of any one of embodiments 1-21, wherein the target nucleic sequence further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
[00265] Embodiment 23. The method of any one of embodiments 1-22, wherein a size of a nanopore is at most one nanometer.
[00266] Embodiment 24. The method of any one of embodiments 1-23, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
[00267] Embodiment 25. The method of any one of embodiments 1-24, wherein the moiety is at least two moieties.
[00268] Embodiment 26. A method comprising: (a) associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a formylated base, or a carboxylic acid containing base; and (b) identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing.
[00269] Embodiment 27. The method of embodiment 26, wherein the epigenetically modified base comprises a pyrimidine.
[00270] Embodiment 28. The method of embodiment 27, wherein the pyrimidine is a cytosine.
[00271] Embodiment 29. The method of any one of embodiments 26-28, wherein the epigenetically modified base further comprises a hydroxymethylated base.
[00272] Embodiment 30. The method of embodiment 29, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
[00273] Embodiment 31. The method of embodiment 30, wherein the 5-hydroxymethylated base comprises a 5-hydroxymethylcytosine.
[00274] Embodiment 32. The method of any one of embodiments 29-31, wherein the moiety comprises a glucose moiety.
[00275] Embodiment 33. The method of any one of embodiments 29-32, further comprising, before the identifying, oxidizing the moiety.
[00276] Embodiment 34. The method of embodiment 33, wherein the oxidizing is carried out by an oxidizing agent.
[00277] Embodiment 35. The method of embodiment 34, wherein the oxidizing agent comprises sodium periodate.
[00278] Embodiment 36. The method of any one of embodiments 26-28, wherein the epigenetically modified base comprises a formylated base.
[00279] Embodiment 37. The method of embodiment 36, wherein the formylated base comprises a 5-formylated base.
[00280] Embodiment 38. The method of embodiment 37, wherein the 5-formylated base comprises a 5-formylcytosine.
[00281] Embodiment 39. The method of any one of embodiments 36-38, wherein the moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
[00282] Embodiment 40. The method of any one of embodiments 26-28, wherein the epigenetically modified base comprises a carboxylic acid containing base.
[00283] Embodiment 41. The method of embodiment 40, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
[00284] Embodiment 42. The method of embodiment 41, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
[00285] Embodiment 43. The method of any one of embodiments 40-42, wherein the moiety comprises an anisidine, a carbodiimide, a p-xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
[00286] Embodiment 44. The method of any one of embodiments 26-43, wherein the epigenetically modified base further comprises a methylated base.
[00287] Embodiment 45. The method of embodiment 44, wherein the methylated base comprises a 5-methylated base.
[00288] Embodiment 46. The method of embodiment 45, wherein the 5-methylated base comprises a 5-methylcytosine.
[00289] Embodiment 47. The method of any one of embodiments 26-46, wherein the target nucleic acid sequence comprises DNA or RNA.
[00290] Embodiment 48. The method of any one of embodiments 26-47, wherein the epigenetically modified base further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, an 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
[00291] Embodiment 49. The method of any one of embodiments 26-48, wherein a size of a nanopore is at most one nanometer.
[00292] Embodiment 50. The method of any one of embodiments 26-49, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
[00293] Embodiment 51. The method of any one of embodiments 26-50, wherein the moiety is at least two moieties.
[00294] Embodiment 52. The method of any one of embodiments 28-51, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
[00295] Embodiment 53. The method of any one of embodiments 28-52, wherein the moiety is at least two moieties.
[00296] Embodiment 54. The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the identifying comprises employing a trained algorithm.
[00297] Embodiment 55. The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the hydroxymethylated base is identified at an accuracy greater than an accuracy achieved by a method of identifying the hydroxymethylated base using a different sequencing method.
[00298] Embodiment 56. The method of embodiment 55, wherein the different sequencing method is Illumina sequencing.
[00299] Embodiment 57. The method of any one of embodiments 1-27 or any one of embodiments 28-53, wherein the identifying comprises identifying an unmodified base.
[00300] Embodiment 58. The method of embodiment 57, wherein the unmodified base is identified at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method.
[00301] Embodiment 59. The method of embodiment 58, wherein the different sequencing method is Illumina sequencing.
[00302] Embodiment 60. The method of any one of embodiments 57-59, wherein the unmodified base is a cytosine.
Example 1: A Diagnostic Method
[00303] A subject may be suspected of having a cancer. A sample comprising a target nucleic acid sequence may be obtained from the subject, the sample being at least one of: a plasma sample, a serum sample, a blood sample, a urine sample, and a buccal sample. The target nucleic acid sequence may be isolated from the sample. Epigenetic modifications present on the target nucleic acid sequence may be associated with UDPG employing T4 phage beta-glucosyltransferase (T4-BGT) or with click chemistry. The target nucleic acid may then undergo nanopore sequencing. The subject may be diagnosed as having the cancer when an epigenetic modification associated with the cancer is confirmed to be present in the sample obtained from the subject.
Example 2: 5-hmC and its modification - variance 1
DNA samples and ONT kits
Figure imgf000057_0001
Figure imgf000057_0003
1. T4 BGT
[00304] This reaction is run for CEG067 144 2.
Figure imgf000057_0002
[00305] This reaction is run on a PCR machine, which holds the reaction at 37°C for 30 minutes and then holds the reaction at 10°C.
2. Cleanup
[00306] The cleanup is performed using the GeneJet PCR Purification Kit. This reaction is run for CEG067 144 2. Briefly, the DNA is purified with the GeneJet PCR Purification Kit and the pure DNA is eluted in 50 µl of the elution buffer supplied in the kit.
[00307] CEG067 144 2 is purified in 50 µl of solution (according to the previous paragraph). The End Prep step may need a volume input of about 25 µl (see below). Therefore, a sample may be divided into two or more portions. To be consistent with CEG067 144 2, CEG067 144 1 may be directly diluted to 50 µl with H2O.
3. End Preparation
Figure imgf000058_0001
[00308] After the end preparation, the products are run on a PCR machine. The PCR machine conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
4. Pool reactions for each sample and cleanup
[00309] Pooling of the reactions for each sample and cleanup are performed with 1x Ampure beads (2x 70% EtOH wash + 1/5 EB elution), where 1/5 EB is a 1 in 5 dilution of EB (Qiagen, Cat. No. 19086). The sample is eluted in 12 µl. Briefly, End Prep is carried out in two portions of each sample. The portions of the same sample are pooled and then purified with Ampure beads at a 1:1 ratio. Ampure bead purification may be a generic DNA cleanup method. Samples are mixed with the Ampure beads at a 1:1 ratio and incubated at room temperature for about 10 minutes. After the DNA is bound to the beads, the beads are collected on a magnet stand, washed twice with about 200 µl of 70% EtOH, and air-dried at room temperature for about 5 minutes. The DNA is then eluted with 12 µl of 1/5 EB.
5. dsDNA Qubit of Native Barcodes
Figure imgf000058_0002
Figure imgf000058_0003
[00310] The products are run on a PCR machine. The PCR conditions include: holding at 25°C for 10 minutes and then holding at 10°C.
7. Cleanup by Ampure beads
[00311] For cleanup, 26.4 µl of EB and 21.6 µl of Ampure beads (~0.62x) are added to each ligation reaction. The mixtures are pooled for each sample and washed twice with PEG wash buffer (8% PEG 8000, 750 mM NaCl, 50 mM Tris 8.0). The products are eluted with 16 µl of ELB from SQK-LSK108 at 37°C for 4 minutes. Briefly, the reactions are diluted with EB and purified with Ampure beads at ~0.62x. The bound DNA is washed twice with 200 µl of PEG wash buffer. Without air drying, the DNA is eluted with 16 µl of ELB by incubating at 37°C for 4 minutes.
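As a worked check of the stated bead ratio, the following minimal sketch computes a bead-to-sample volume ratio and back-calculates the sample volume implied by a given ratio. The function names are hypothetical. The ligation reaction volume is not stated in the text (it appears only in the figure), so the example derives the volume implied by the ~0.62x ratio rather than assuming one.

```python
def bead_ratio(bead_volume_ul: float, sample_volume_ul: float) -> float:
    """Return the bead-to-sample volume ratio (e.g. 1.0 for a 1x cleanup)."""
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return bead_volume_ul / sample_volume_ul


def implied_sample_volume(bead_volume_ul: float, ratio: float) -> float:
    """Return the sample volume implied by a bead volume and a target ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return bead_volume_ul / ratio


# 21.6 µl of beads at ~0.62x implies roughly 34.8 µl of diluted sample,
# i.e. 26.4 µl of EB plus the (unstated) ligation reaction volume.
total_ul = implied_sample_volume(21.6, 0.62)
implied_reaction_ul = total_ul - 26.4
```

The implied reaction volume is only an arithmetic consequence of the rounded ~0.62x figure and should be read as approximate.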
8. dsDNA Qubit (with 1/3 dilution)
Figure imgf000058_0004
9. Bioanalyzer
[00312] FIG. 13A shows the bioanalyzer results with 1/3 dilution.
Figure imgf000059_0001
10. Flow cell QC
Figure imgf000059_0002
11. Sequence on MinION
Figure imgf000059_0003
12. Wash flow cell
[00313] The flow cell is washed with the Flow cell wash kit (EXP-WSH002) and stored at 4°C.
[00314] FIG. 13B shows the size distributions of the 2 samples. The read lengths of 1 kb-hmC (5-hmC) are mainly about 1 kb, with a small portion of short reads of about 400 bp. The read lengths of glucosylated 1 kb-hmC are about 1 kb (main peak), with additional reads of about 200 bp, 400 bp, and 600 bp. The read ratio between the 1 kb-hmC and the glucosylated 1 kb-hmC is about 10:1. Because the modification of hmC increases the friction between the DNA sequence and the pore, the modified DNA passes through the pore more slowly. As a result, within a given time window, the glucosylated 1 kb-hmC sample produces fewer reads than the 1 kb-hmC sample.
[00315] FIG. 13C shows an IGV (Integrative Genomics Viewer, a visualization tool for genomic datasets) view of the reads mapped to the reference. It shows a similar trend for the 1 kb-hmC and the glucosylated 1 kb-hmC samples.
Example 3: 5-hmC and its modification - variance 2
DNA samples and ONT kits
Figure imgf000059_0004
Figure imgf000060_0001
oxidation, NB05
Figure imgf000060_0006
1. PCR
[00316] Two reactions are set up.
Figure imgf000060_0002
[00317] The reactions undergo 25 runs on a PCR machine under the PCR conditions listed below.
Figure imgf000060_0003
2. Pool and cleanup
[00318] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 50 µl. Briefly, the PCR reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
3. dsDNA Qubit
CEG067 150 1 73 6 pg/ul 3705ng
4. T4 BGT
Figure imgf000060_0004
[00319] The products may be run on a PCR machine. The PCR conditions include: holding the products at 37°C for 30 minutes, and holding the products at 10°C.
5. Cleanup
[00320] The cleanup is performed with the GeneJet PCR Purification Kit. The product is eluted in 50 µl. Briefly, the PCR reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
6. dsDNA Qubit (1/20 dilution)
Figure imgf000060_0005
7. Sodium periodate oxidation
[00321] Two reactions are set up. The reaction can also be conducted with a different concentration of sodium periodate suspension (e.g. 46 mM).
Figure imgf000061_0001
22°C for overnight.
8. Sodium sulfite quenching
[00323] 10 µl of 460 mM sodium sulfite (dissolved in water, Sigma, Cat. No. 239321) is added to the 90 µl oxidation reaction.
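The quenching step is a simple dilution, and the final sulfite concentration follows from C1·V1 = C2·V2. The sketch below is illustrative only; the function name is hypothetical and it assumes the two volumes are additive.

```python
def final_concentration_mM(stock_mM: float, stock_ul: float, reaction_ul: float) -> float:
    """Final concentration after adding stock_ul of stock to reaction_ul of reaction.

    Assumes volumes are additive (C1 * V1 = C2 * V2).
    """
    total_ul = stock_ul + reaction_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_mM * stock_ul / total_ul


# 10 µl of 460 mM sodium sulfite added to a 90 µl oxidation reaction
# gives 46 mM sulfite in a 100 µl final volume.
quench_mM = final_concentration_mM(460.0, 10.0, 90.0)
```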
9. Pool and cleanup
[00324] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 50 µl. Briefly, the reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
10. dsDNA Qubit (1/20 dilution)
Figure imgf000061_0002
11. Bioanalyzer
[00325] FIG. 13D shows the result of the bioanalyzer after sodium periodate oxidation.
12. End Prep
Figure imgf000061_0003
[00326] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
13. Pool reactions for each sample and cleanup
[00327] Pooling of the reactions for each sample and cleanup are performed with 1x Ampure beads (2x 70% EtOH wash + 1/5 EB elution), where 1/5 EB is a 1 in 5 dilution of EB (Qiagen, Cat. No. 19086). Briefly, End Prep is carried out in two portions of each sample. The portions of the same sample are pooled and then purified with Ampure beads at a 1:1 ratio. Ampure bead purification may be a generic DNA cleanup method. Samples are mixed with the Ampure beads at a 1:1 ratio and incubated at room temperature for about 10 minutes. After the DNA is bound to the beads, the beads are collected on a magnet stand, washed twice with about 200 µl of 70% EtOH, and air-dried at room temperature for about 5 minutes. The DNA is then eluted with 12 µl of 1/5 EB.
14. dsDNA Qubit of Native Barcodes
Figure imgf000061_0004
Figure imgf000062_0001
[00328] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 25°C for 10 minutes and then holding the reaction at 10°C.
16. Cleanup by Ampure beads
[00329] For cleanup, 26.4 µl of EB and 21.6 µl of Ampure beads (~0.62x) are added to each ligation reaction. The mixtures are pooled for each sample and washed twice with PEG wash buffer (8% PEG 8000, 750 mM NaCl, 50 mM Tris 8.0). The products are eluted with 16 µl of ELB from SQK-LSK108 at 37°C for 4 minutes. Briefly, the reactions are diluted with EB and purified with Ampure beads at ~0.62x. The bound DNA is washed twice with 200 µl of PEG wash buffer. Without air drying, the DNA is eluted with 16 µl of ELB by incubating at 37°C for 4 minutes.
17. dsDNA Qubit (1/3 dilution)
Figure imgf000062_0002
18. Bioanalyzer
[00330] FIG. 13E shows the bioanalyzer results with 1/3 dilution.
Figure imgf000062_0003
19. Flow cell QC
Figure imgf000062_0004
20. Sequence on MinION
Figure imgf000062_0005
Figure imgf000063_0001
[00331] Flow cell is washed by Flow cell wash kit (EXP-WSH002) and stored at 4°C.
[00332] FIGs. 3-4 show that there are no differences between the 2kb-hmC sample and the glucosylated 2kb-hmC sample. There is DNA fragmentation after sodium periodate oxidation. This step is not required because glucosylated 2kb-hmC samples do not pass through the pore very well. Cytosine is miscalled by Albacore software (a basecaller). The error rate of glucosylated 2kb-hmC and the oxidized glucosylated 2kb-hmC may be the same, which may be larger than the error rate of 2kb-hmC. Upon manually aligning the electric signals using HDFView (a software tool to view raw data produced from a sequencer), there are differences between 5-hmC and glucosylated 5-hmC in the ACT motif.
[00333] FIG. 13F shows that, in the forward primer region, cytosines are correctly identified as "c", while cytosines with modifications are basecalled with errors.
Example 4: 5-fC and its modification by hydroxylamine
DNA samples and ONT kits
Figure imgf000063_0002
Figure imgf000063_0004
1. PCR
[00334] Two reactions are set up for CEG067_159_1 and eight reactions are set up for CEG067_159_2/3.
Figure imgf000063_0003
Figure imgf000064_0001
[00335] There are 35 runs on a PCR machine.
Figure imgf000064_0002
2. Pool and cleanup
[00336] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 100 µl, or to 50 µl for CEG067_159_2. All samples are purified with the GeneJet kit. CEG067 159 1 is eluted in 100 µl of elution buffer. CEG067 159 2 and CEG067 159 3 are each eluted in 50 µl of elution buffer.
3. dsDNA Qubit (1/20 dilution)
Figure imgf000064_0003
4. o-(carboxymethyl)hydroxylamine modification
Figure imgf000064_0004
[00337] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 37°C for 24 hours.
5. Pool and cleanup
[00338] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The products are eluted in 50 µl. Briefly, the reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
6. dsDNA Qubit (1/20 dilution)
Figure imgf000064_0005
7. Bioanalyzer
[00339] FIG. 14A shows the result of the bioanalyzer after modification.
8. End Preparation
Figure imgf000064_0006
[00340] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
9. Pool reactions for each sample and cleanup
[00341] Pooling of the reactions for each sample and cleanup are performed with 1x Ampure beads (2x 70% EtOH wash + 1/5 EB elution), where 1/5 EB is a 1 in 5 dilution of EB (Qiagen, Cat. No. 19086). The products are eluted in 12 µl. Briefly, End Prep is carried out in two portions of each sample. The portions of the same sample are pooled and then purified with Ampure beads at a 1:1 ratio. Ampure bead purification may be a generic DNA cleanup method. Samples are mixed with the Ampure beads at a 1:1 ratio and incubated at room temperature for about 10 minutes. After the DNA is bound to the beads, the beads are collected on a magnet stand, washed twice with about 200 µl of 70% EtOH, and air-dried at room temperature for about 5 minutes. The DNA is then eluted with 12 µl of 1/5 EB.
10. dsDNA Qubit of Native Barcodes
Figure imgf000065_0001
Figure imgf000065_0002
[00342] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 25°C for 10 minutes and then holding the reaction at 10°C.
12. Cleanup by Ampure beads
[00343] For cleanup, 26.4 µl of EB and 21.6 µl of Ampure beads (~0.62x) are added to each ligation reaction. The mixtures are pooled for each sample and washed twice with PEG wash buffer (8% PEG 8000, 750 mM NaCl, 50 mM Tris 8.0). The product is eluted with 16 µl of ELB from SQK-LSK108 at 37°C for 4 minutes. Briefly, the reactions are diluted with EB and purified with Ampure beads at ~0.62x. The bound DNA is washed twice with 200 µl of PEG wash buffer. Without air drying, the DNA is eluted with 16 µl of ELB by incubating at 37°C for 4 minutes.
13. dsDNA Qubit (1/3 dilution)
Figure imgf000065_0003
14. Bioanalyzer
[00344] FIG. 14B shows the bioanalyzer result with 1/3 dilution.
Figure imgf000065_0004
15. Flow cell QC
Figure imgf000065_0005
Figure imgf000066_0001
16. Sequence on MinION
Figure imgf000066_0002
[00345] The flow cell is washed with the Flow cell wash kit (EXP-WSH002) and stored at 4°C. FIG. 14C shows that there are no signs of pore blockage. FIG. 14D shows the insert size distributions of C, fC, and fC-HA. FIG. 14E shows that the modification causes basecalling errors. FIG. 14F shows that the modification causes basecalling errors and that the error of fC-HA is larger than that of fC, which is larger than that of C. FIG. 14G shows the raw signal analysis of the TTACT k-mer. Upon manually aligning the electric signals, there are differences among C, fC and fC-HA in the TTACT k-mer.
[00346] In conclusion, modification of 5-fC by hydroxylamine may be very successful, at upwards of about 100% completion (no starting material in the product on the BA gel). The modification may be mild because it may not cause DNA damage. The basecalling error of fC-HA may be larger than that of fC, which may be larger than that of C. In some embodiments, a larger hydroxylamine or a derivative thereof, hydrazine or a derivative thereof, 1,3-indandione or a derivative thereof, or any combination thereof may be employed.
Example 5: 5-caC and its modification by Xylylenediamine
DNA samples and ONT kits
Figure imgf000066_0003
Figure imgf000066_0004
Figure imgf000067_0001
[00347] Eight reactions are set up for CEG067 164 1.
Figure imgf000067_0002
[00348] 35 runs on a PCR machine
Figure imgf000067_0003
2. Pool and cleanup
[00349] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 100 µl. Briefly, the PCR reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
3. dsDNA Qubit (1/20 dilution)
Figure imgf000067_0004
4. NHS activation
[00350] Two reactions are set up.
Figure imgf000067_0005
[00351] The products may be run on a PCR machine. The PCR conditions include holding the reaction at 37°C for 1 hour.
5. Pool and cleanup
[00352] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 50 µl. Briefly, the reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
6. p-Xylylenediamine modification
Figure imgf000067_0006
[00353] The products may be run on a PCR machine. The PCR conditions include holding the reaction at 37°C for 2 hours.
7. Pool and cleanup
[00354] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 50 µl. Briefly, the reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
8. dsDNA Qubit (1/20 dilution)
Figure imgf000068_0001
9. Bioanalyzer
[00355] FIG. 15A shows the result of the bioanalyzer after p-Xylylenediamine modification.
10. End Prep
Figure imgf000068_0002
[00218] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
11. Pool reactions for each sample and cleanup
[00356] Pooling of the reactions for each sample and cleanup are performed with 1x Ampure beads (2x 70% EtOH wash + 1/5 EB elution), where 1/5 EB is a 1 in 5 dilution of EB (Qiagen, Cat. No. 19086). The products are eluted in 12 µl. Briefly, End Prep is carried out in two portions of each sample. The portions of the same sample are pooled and then purified with Ampure beads at a 1:1 ratio. Ampure bead purification may be a generic DNA cleanup method. Samples are mixed with the Ampure beads at a 1:1 ratio and incubated at room temperature for about 10 minutes. After the DNA is bound to the beads, the beads are collected on a magnet stand, washed twice with about 200 µl of 70% EtOH, and air-dried at room temperature for about 5 minutes. The DNA is then eluted with 12 µl of 1/5 EB.
12. dsDNA Qubit of Native Barcodes
Figure imgf000068_0003
Figure imgf000068_0004
[00357] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 25°C for 10 minutes; and then holding the reaction at 10°C.
14. Cleanup by Ampure beads
[00358] 26.4 µl of EB and 21.6 µl of Ampure beads (~0.62x) are added to each ligation reaction. The mixtures are pooled for each sample and washed twice with PEG wash buffer (8% PEG 8000, 750 mM NaCl, 50 mM Tris 8.0). The products are eluted with 16 µl of ELB from SQK-LSK108 at 37°C for 4 minutes.
15. dsDNA Qubit (1/3 dilution)
Figure imgf000069_0001
16. Bioanalyzer
[00359] FIG. 15B shows the bioanalyzer results with 1/3 dilution.
Figure imgf000069_0002
17. Flow cell QC
Figure imgf000069_0003
18. Sequence on MinION
Figure imgf000069_0004
19. Dispose flow cell
[00360] FIG. 15C shows the raw signal analysis of ACTAT. Upon manually aligning the electric signals, there are differences between caC and caC-XDA in the ACTAT k-mer.
[00361] The modification at 5-caC is successful. The conditions are harsh and cause some noticeable, but not serious, DNA damage. Ethylamine may be used.
[00362] Figure 20 indicates the base calling errors associated with 5fC, 5fC-HA, 5caC and 5caC-XDA.
[00363] With regard to FIG. 21A, based on previous results on 2 kb PCR products, the signals of C, mC and hmC are very similar. Therefore, 1.5kb_C was used as a 'ground state' reference. Within the motif TAfCAT, the signal of fC starts showing a small difference, with interference with the signal of the preceding base (the neighboring effect). After modification by hydroxylamine, the signal differences between C and f HAC were increased. Within the motif TAcaCAT, the signal of caC starts showing a small difference, with interference with the signal of the preceding base (the neighboring effect). After modification by p-Xylylenediamine, the signal differences between C and ca XDAC were increased. Similar trends of modified cytosines could be observed in other contexts.
Example 7 - Identifying a fC by Sequencing in a Cell-free Sample
[00364] A cell-free sample of a subject will be obtained. The cell-free sample will comprise a nucleic acid sequence. A hydroxylamine will be associated with a formylated base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the formylated base.
Example 8 - Identifying a fC by Sequencing in a Buccal Swab Sample
[00365] A buccal swab sample of a subject will be obtained. The buccal swab sample will comprise a nucleic acid sequence. A hydrazine will be associated with a formylated base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the formylated base.
Example 9 - Identifying a fC by Sequencing in a Fecal Sample
[00366] A fecal sample of a subject will be obtained. The fecal sample will comprise a nucleic acid sequence. A hemi-acetal will be associated with a formylated base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the formylated base.
Example 10 - Identifying a fC by Sequencing in a Tissue Sample
[00367] A tissue sample of a subject will be obtained. The tissue sample will comprise a nucleic acid sequence. An aldehyde will be associated with a formylated base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the formylated base.
Example 11 - Identifying a caC by Sequencing in a Cell-free Sample
[00368] A cell-free sample of a subject will be obtained. The cell-free sample will comprise a nucleic acid sequence. An anisidine will be associated with a carboxyl acid containing base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
Example 12 - Identifying a caC by Sequencing in a Buccal Swab Sample
[00369] A buccal swab sample of a subject will be obtained. The buccal swab sample will comprise a nucleic acid sequence. An ester will be associated with a carboxyl acid containing base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
Example 13 - Identifying a caC by Sequencing in a Fecal Sample
[00370] A fecal sample of a subject will be obtained. The fecal sample will comprise a nucleic acid sequence. A carbodiimide will be associated with a carboxyl acid containing base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
Example 14 - Identifying a caC by Sequencing in a Tissue Sample
[00371] A tissue sample of a subject will be obtained. The tissue sample will comprise a nucleic acid sequence. An acyl halide will be associated with a carboxyl acid containing base of the nucleic acid sequence. Nanopore sequencing will be performed to identify the presence of the carboxyl acid containing base.
Example 15: 5-mC
DNA samples and ONT kits
Figure imgf000071_0001
Figure imgf000071_0005
1. PCR
[00372] Set up 2 reactions for each sample
Figure imgf000071_0002
Figure imgf000071_0003
[00373] 35 runs on a PCR machine
Figure imgf000071_0004
Figure imgf000072_0001
2. Pool and cleanup
[00374] Pooling and cleanup are performed with the GeneJet PCR Purification Kit. The reactions are pooled and cleaned up to 50 µl. Briefly, the PCR reactions are pooled and purified with the GeneJet kit, as in step 2 (Cleanup) of Example 2.
3. dsDNA Qubit (1/50 dilution)
Figure imgf000072_0002
4. End Prep
Figure imgf000072_0003
[00375] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 20°C for 5 minutes; then pausing to add 0.9 µl of 100 mM dATP (Jena Bioscience, Cat. No. NU-1001); then holding the reaction at 65°C for 5 minutes; then holding the reaction at 10°C.
5. Cleanup
[00376] Cleanup is performed with 1x Ampure beads (2x 70% EtOH wash + 1/5 EB elution), where 1/5 EB is a 1 in 5 dilution of EB (Qiagen, Cat. No. 19086). The products are eluted in 12 µl.
6. Adapter ligation
Figure imgf000072_0004
[00377] The products may be run on a PCR machine. The PCR conditions include: holding the reaction at 25°C for 10 minutes; and then holding the reaction at 10°C.
7. Clean up by Ampure beads
[00378] 26.4 µl of EB and 21.6 µl of Ampure beads (~0.62x) are added to each ligation reaction. The mixtures are pooled for each sample and washed twice with 200 µl of PEG wash buffer (8% PEG 8000, 750 mM NaCl, 50 mM Tris 8.0). The products are eluted with 16 µl of ELB* from SQK-LSK108 at 37°C for 4 minutes (*5 µl for CEG084 75 1 due to the limited amount of ELB).
8. dsDNA Qubit (1/20 dilution)
Figure imgf000072_0005
9. Bioanalyzer
[00379] FIG. 17 shows the result of the bioanalyzer.
10. Flow cell QC
Figure imgf000072_0006
Figure imgf000073_0001
11. Sequence on MinION
Figure imgf000073_0002
12. Dispose flow cell
[00380] Wash flow cell by Flow cell wash kit and store at 4 °C.
[00381] FIG. 18 indicates that the larger the modification of cytosine, the larger the error for the basecaller. As noted in FIG. 19A, the signals of TACAT and TAmCAT are almost identical. Within the motif TAhmCAT, the signal of hmC starts showing a small difference, with interference with the signal of the preceding base (the neighboring effect). Within the motif TAglu hmCAT, after glucosylation by T4 beta-glucosyltransferase, the signal of glu hmC shifts to the following base, and interferes with the preceding base as well.
Example 16 - Nanopore Sequencing
[00382] In some cases, Nanopore sequencing may discriminate epigenetic modifications on top of normal base calling. To do so, a more complete dataset may need to be built to train software to call modified bases. Such a dataset may require all possible k-mers (where k = 5 or 6) with epigenetic modifications.
[00383] Referring to FIG. 23A-B, to obtain a more complete k-mer dataset, a De Bruijn sequence may be employed. A De Bruijn sequence may be an efficient way to collect k-mer information, as it encodes the library of k-mers in the shortest possible sequence.
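To illustrate why a De Bruijn sequence is compact, the sketch below builds one over the four canonical bases using the standard recursive Lyndon-word construction. This is a generic textbook construction, not necessarily the one used to design the sequence of FIG. 23A-B, and the function names are hypothetical. A cyclic De Bruijn sequence for k = 5 has 4^5 = 1024 bases; a linear molecule needs the first k-1 bases repeated at the end so that the wrap-around k-mers are also present.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic De Bruijn sequence B(len(alphabet), k) as a string."""
    n = len(alphabet)
    a = [0] * (n * k)
    sequence = []

    def db(t: int, p: int) -> None:
        # Standard recursive generation of Lyndon words whose length divides k.
        if t > k:
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linear_de_bruijn(alphabet: str, k: int) -> str:
    """Linearize the cyclic sequence by appending its first k-1 symbols."""
    cyclic = de_bruijn(alphabet, k)
    return cyclic + cyclic[:k - 1]


# Every one of the 1024 possible 5-mers occurs exactly once in this string.
seq_5mer = linear_de_bruijn("ACGT", 5)
```

For comparison, listing all 1024 5-mers separately would take 5120 bases, so the linear De Bruijn form is about five times shorter.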
[00384] FIG. 24 demonstrates exemplary steps for making the k-mer sequence. The 5-mer sequence may be designed by ordering two gBlocks from IDT (f1, f2) with flanking regions for PCR and Golden Gate cloning. The gBlocks (f1, f2) were (1) amplified, (2) cut with BsaI-HF, (3) ligated to form the full sequence (f1+f2), (4) cloned into pCR2.1 (TA cloning), (5) reamplified and (6) Sanger sequenced. A full-length De Bruijn sequence for a 5-mer results. A 6-mer can also be made using the same process outlined above if required.
[00385] Referring to FIG. 25, the effect of spiking in mC is shown using data from Nanopore Tombo development. (1) Tombo is used to re-squiggle Nanopore sequencing data from the De Bruijn sequence using C, mC, hmC & ghmC, to build a dataset for these modifications. (2) Modifications are spiked in at 25% to allow calling a fifth base. (3) A strategy may be developed for CpG-only modification (e.g. using a CpG methyltransferase with TET/bGT), the k-mer set may be expanded to all possible 6-mers if required, and caC and fC may be examined using the De Bruijn sequence.
[00386] Referring to FIG. 26, a summary of the data collected using the Nanopore and Illumina sequencing platforms is shown. A complete nanopore dataset for k-mers (where k = 5) was collected using the synthetic De Bruijn sequence with mC, hmC & ghmC as modifications. It was probed whether these data could be used as a training set to call modifications in a whole-genome setting such as Fugu or human to an industry-leading standard (>90% accuracy). Extracting raw aligned squiggles from reads arising from 100% labelling with modified nucleotides may be successful. Soft-labelling, where modified nucleotides are spiked in at ~25% in the PCR, produced aligned reads where modifications visibly appeared to affect raw aligned squiggles. M.SssI labelling of mC also produced reads that passed filter. Labelling was verified by bisulphite sequencing.
[00387] Referring to FIG. 27A-C, three tables of read filtering and demultiplexing are shown. Preliminary analysis of the first ~8000 reads of fully labelled C's (Run 659 in FIG. 27A and Run 665 in FIG. 27B) demonstrates that the number of reads passing filter for modifications may be very low (average quality score < Q7). In contrast, reads do pass filter where mC labelling is done by M.SssI (CEGX_Run669 in FIG. 27C), and when labelled using a soft-labelling PCR approach. Hence, extracting raw aligned squiggles from reads arising from 100% labelling with modified nucleotides may be successful. Introducing modified nucleotides by spiking in modified dCTPs at ~25% in the PCR, or by enzymatically labelling CpGs using M.SssI, also results in reads generally passing filter.
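The Q7 threshold can be made concrete with the Phred definition Q = -10·log10(p), under which Q7 corresponds to an error probability of about 0.2 per base. The sketch below computes a mean read quality from a FASTQ quality string and applies a Q7 cut-off. The function names are hypothetical, the Phred+33 encoding is assumed, and averaging error probabilities before converting back to a Q score is an assumption about how the filter is computed, not a statement of the basecaller's implementation.

```python
import math


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space."""
    if not quality:
        raise ValueError("quality string must not be empty")
    # Convert each ASCII-encoded score to an error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)


def passes_filter(quality: str, threshold: float = 7.0) -> bool:
    """Return True when the mean read quality meets the threshold (Q7 here)."""
    return mean_qscore(quality) >= threshold


low_quality_read = "&&&&&&&&"   # every base at Q5, below the cut-off
high_quality_read = "55555555"  # every base at Q20, above the cut-off
```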
[00388] Referring to FIG. 28A, IGV screenshots of aligned reads from bisulphite sequencing of the De Bruijn sequence are shown. FIG. 28B shows the extent of 5-mC labelling of various CpGs in M.SssI-labelled (top) and unlabelled (bottom) sequences. Labelling of the sequences is verified using Illumina sequencing. Before analysis of the Nanopore reads, labelling was verified using bisulphite sequencing. Bisulphite sequencing of the M.SssI-labelled De Bruijn sequence (Illumina bisulphite NGS sequencing) was done to determine mC labelling by M.SssI. The bisulphite sequencing results for M.SssI labelling demonstrate ~55% labelling of C with mC.
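A labelling percentage of this kind is typically derived per site from bisulphite reads: an unmodified C is converted and read as T, while a protected (methylated) C is still read as C. The sketch below shows that calculation; the function names are hypothetical and the read counts are made-up illustrative numbers, not data from the figures.

```python
def methylation_fraction(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at one cytosine that remained C after bisulphite."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return c_reads / total


def mean_methylation(sites: list) -> float:
    """Mean per-site methylation over (c_reads, t_reads) pairs."""
    fractions = [methylation_fraction(c, t) for c, t in sites]
    return sum(fractions) / len(fractions)


# Hypothetical (C, T) read counts at three CpG sites, for illustration only.
example_sites = [(55, 45), (60, 40), (50, 50)]
example_mean = mean_methylation(example_sites)
```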
[00389] Referring to FIG. 29, modified C labelling was verified by Illumina NGS bisulphite sequencing for the sequences from the soft-labelling PCR approach. The bisulphite sequencing results for the soft-labelling approach demonstrate ~19-33% (mean) labelling of CpG with mC, hmC & ghmC. Any differences in Nanopore traces at C's can be assigned to modifications.
[00390] Referring to FIG. 30A-FIG. 30C and FIG. 31A-FIG. 31C, datasets for 5 different k-mers are shown, and FIG. 32 shows a dataset for 50 positions in the De Bruijn sequence for unmodified and modified reads. Nanopore reads may display a modified trace. The first few thousand reads passing filter were analyzed for modification-containing sequences. Tombo was used to re-squiggle and align reads to the De Bruijn sequence. The data demonstrate that differences in many of the traces can be seen visually, a result that is consistent with a low level (< 10-50%) of labelling. For M.SssI-induced labelling, slight differences between modified and unmodified reads are seen close to CpG sites. For soft-labelling, slight differences between modified and unmodified reads are seen close to C sites.
[00391] Referring to FIG. 33B, a raw Tombo trace (top) and extracted data processed in R (bottom) are shown. Nanopore reads may display a modified trace. Doping in modifications at 25% may increase the pass rate of sequences. Whilst slight differences in the traces are observed, extracting the modified data for k-mers may be more difficult. Overcoming this issue and assigning a modification value to each k-mer may allow the data to be analyzed in a better way, for example similar to Laszlo, A. H. et al. PNAS (2013). To achieve this, as a first step, the Tombo code was modified (mainly in "plotSingleRun.R") to output the aligned raw data in a format that can be manipulated in R for calculating differences in signal.
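Once aligned raw data are exported, a per-position signal difference can be computed as the mean modified signal minus the mean control signal at each reference position. The sketch below shows this in Python rather than R; the function name and the input layout (one list of per-position signal values per read, all aligned to the same reference) are assumptions for illustration, and the numbers are toy values.

```python
from statistics import mean


def positional_differences(control_reads: list, modified_reads: list) -> list:
    """Mean modified signal minus mean control signal at each position.

    Each read is a list of signal values, one per reference position.
    """
    n_positions = len(control_reads[0])
    for read in control_reads + modified_reads:
        if len(read) != n_positions:
            raise ValueError("all reads must cover the same positions")
    diffs = []
    for pos in range(n_positions):
        control_mean = mean(read[pos] for read in control_reads)
        modified_mean = mean(read[pos] for read in modified_reads)
        diffs.append(modified_mean - control_mean)
    return diffs


# Toy aligned signals: two control reads and two modified reads, two positions.
toy_control = [[1.0, 2.0], [3.0, 2.0]]
toy_modified = [[2.0, 2.0], [4.0, 4.0]]
toy_diffs = positional_differences(toy_control, toy_modified)
```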
[00392] Next, a method was developed to call data for modifications. Briefly: (1) extract and reduce positional data from 400 runs (200 C only and 200 C and modified traces) to a position distribution, (2) fit the C-only data to a single Gaussian, (3) attempt to fit the modified C data using a sum of two Gaussian peaks, using the Overlapping 1 package in R to suggest initial starting solutions, (4) filter modifications for "true modifications" (based on a small training set) based on parameters such as the width, height and peak shift of the smaller Gaussian, and (5) assign the peak of the smaller Gaussian as the modification. A method was thus developed to extract and assign modification data, which can now be applied to reveal modifications in all k-mers as described in FIG. 34A-C. Mostly, modifications are ~15-30% of the main peak, consistent with bisulphite sequencing, but assignment may require further refinement.
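Step (3) above, fitting a sum of two Gaussian peaks and reading off the smaller one, can be sketched as follows. This is a minimal illustration that swaps in a plain expectation-maximization (EM) fit with quartile-based starting values in place of the R package named in the text; the function names are hypothetical and the data are synthetic (a main peak plus a 25% shifted peak in arbitrary units), not measurements from the figures.

```python
import math
import random


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(data: list, iterations: int = 200) -> tuple:
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (main, minor), each a (weight, mean, sd) tuple, where minor is
    the component with the smaller weight.
    """
    xs = sorted(data)
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    # Crude starting solution: lower and upper quartiles as initial means.
    mus = [xs[n // 4], xs[(3 * n) // 4]]
    sds = [overall_sd, overall_sd]
    ws = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [ws[k] * normal_pdf(x, mus[k], sds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weight, mean and sd of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            ws[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    components = sorted(zip(ws, mus, sds), reverse=True)
    return components[0], components[1]


# Synthetic example: 75% unmodified-like signal, 25% shifted signal.
random.seed(0)
synthetic = [random.gauss(0.0, 1.0) for _ in range(750)]
synthetic += [random.gauss(5.0, 1.0) for _ in range(250)]
main_peak, minor_peak = fit_two_gaussians(synthetic)
peak_shift = minor_peak[1] - main_peak[1]
```

The weight, width and shift of the minor component are the kinds of parameters that step (4) would then filter on; the thresholds themselves are not given in the text and are not assumed here.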
[00393] Referring to Nanopore reads of the De Bruijn sequence, there may be signal differences between 5-mC, 5-hmC & glucosylated 5-hmC (positional differences). The example (FIG. 35A-FIG. 35C, FIG. 36A-FIG. 36B, FIG. 37 and FIG. 38) shows that the data can reveal individual positions where the modifications mC & hmC are silent, whilst ghmC induces a change. In some cases, single positional differences are examined rather than differences in the context of individual k-mers.
[00394] Referring to FIG. 39 and Nanopore reads of the De Bruijn sequence, globally, signal intensity differences caused by hmC & ghmC appear more extreme than those of mC and the control set of sequences. In addition, it appears that +ve signal changes are also seen slightly more frequently with mC.
[00395] Referring to FIG. 40A-B and returning to the M.SssI data, Illumina NGS bisulphite data show that the sequence is methylated at a mean of ~50% at each CpG. The example in FIG. 40A-B shows where these results are consistent between the two datasets. The data appear consistent between these independent experiments for the same sequence.
[00396] Referring to FIG. 41A-B and FIG. 42 and Nanopore reads of the De Bruijn sequence, a full dataset for modifications is shown in the form of all Kmers containing CpG.
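For a sense of the size of such a dataset, the k-mers containing a CpG can simply be enumerated. The sketch below does this over the four canonical bases; the function name is hypothetical, and modified bases are not represented as separate letters here.

```python
from itertools import product


def kmers_containing(motif: str, k: int, alphabet: str = "ACGT") -> list:
    """Return all k-mers over the alphabet that contain the motif."""
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return [kmer for kmer in kmers if motif in kmer]


# Of the 1024 possible 5-mers, 244 contain at least one CG dinucleotide.
cpg_5mers = kmers_containing("CG", 5)
```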
[00397] While preferred embodiments have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the disclosure be limited by the specific examples provided within the specification. While the disclosure has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Furthermore, it shall be understood that all aspects described herein are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments described herein may be employed. It is therefore contemplated that the disclosure shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
a. associating a moiety with a hydroxymethylated base of a target nucleic acid sequence to form a labeled hydroxymethylated base;
b. oxidizing the labeled hydroxymethylated base; and
c. identifying the hydroxymethylated base by sequencing the target nucleic acid sequence, wherein the sequencing comprises nanopore sequencing.
2. The method of claim 1, wherein the hydroxymethylated base comprises a pyrimidine.
3. The method of claim 2, wherein the pyrimidine is a cytosine.
4. The method of any one of claims 1-3, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
5. The method of claim 4, wherein the 5-hydroxymethylated base comprises a 5-hydroxymethylcytosine.
6. The method of any one of claims 1-5, wherein the moiety comprises a glucose moiety.
7. The method of any one of claims 1-6, further comprising, before the identifying, oxidizing the moiety.
8. The method of any one of claims 1-7, wherein the oxidizing is carried out by an oxidizing agent.
9. The method of claim 8, wherein the oxidizing agent comprises sodium periodate.
10. The method of any one of claims 1-9, wherein the target nucleic acid sequence further comprises a formylated base.
11. The method of claim 10, wherein the formylated base comprises a 5-formylated base.
12. The method of claim 11, wherein the 5-formylated base comprises a 5-formylcytosine.
13. The method of any one of claims 10-12, further comprising associating the formylated base with a second moiety.
14. The method of claim 13, wherein the second moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
15. The method of any one of claims 1-14, wherein the target nucleic acid sequence further comprises a carboxylic acid containing base.
16. The method of claim 15, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
17. The method of claim 16, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
18. The method of any one of claims 15-17, further comprising associating the carboxylic acid containing base with a third moiety.
19. The method of any one of claims 15-17, wherein the third moiety comprises an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
20. The method of any one of claims 1-19, wherein the target nucleic acid sequence further comprises a methylated base.
21. The method of claim 20, wherein the methylated base comprises a 5-methylated base.
22. The method of claim 21, wherein the 5-methylated base comprises a 5-methylcytosine.
23. The method of any one of claims 1-22, wherein the target nucleic acid sequence comprises DNA or RNA.
24. The method of any one of claims 1-23, wherein the target nucleic acid sequence further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
25. The method of any one of claims 1-24, wherein a size of a nanopore is at most one nanometer.
26. The method of any one of claims 1-25, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
27. The method of any one of claims 1-26, wherein the moiety is at least two moieties.
28. A method comprising:
a. associating a moiety with an epigenetically modified base of a target nucleic acid sequence to form a labeled epigenetically modified base, wherein the epigenetically modified base comprises a formylated base, or a carboxylic acid containing base; and
b. identifying the epigenetically modified base by sequencing the target nucleic acid comprising the labeled epigenetically modified base, wherein the sequencing comprises nanopore sequencing.
29. The method of claim 28, wherein the epigenetically modified base comprises a pyrimidine.
30. The method of claim 29, wherein the pyrimidine is a cytosine.
31. The method of any one of claims 28-30, wherein the epigenetically modified base further comprises a hydroxymethylated base.
32. The method of claim 31, wherein the hydroxymethylated base comprises a 5-hydroxymethylated base.
33. The method of claim 32, wherein the 5-hydroxymethylated base comprises a 5-hydroxymethylcytosine.
34. The method of any one of claims 31-33, wherein the moiety comprises a glucose moiety.
35. The method of any one of claims 31-34, further comprising, before the identifying, oxidizing the moiety.
36. The method of claim 35, wherein the oxidizing is carried out by an oxidizing agent.
37. The method of claim 36, wherein the oxidizing agent comprises sodium periodate.
38. The method of any one of claims 28-30, wherein the epigenetically modified base comprises a formylated base.
39. The method of claim 38, wherein the formylated base comprises a 5-formylated base.
40. The method of claim 39, wherein the 5-formylated base comprises a 5-formylcytosine.
41. The method of any one of claims 38-40, wherein the moiety comprises a hydroxylamine, a hydrazine, a 1,3-indandione, a hemiacetal, an acetal, an aldehyde, a ketone, an ester, a primary amine, a secondary amine, an alkene, an alcohol, a thioacetal, a malononitrile, a benzoin, an aldol, a derivative thereof of any of these, or any combination thereof.
42. The method of any one of claims 28-30, wherein the epigenetically modified base comprises a carboxylic acid containing base.
43. The method of claim 42, wherein the carboxylic acid containing base comprises a 5-carboxylated base.
44. The method of claim 43, wherein the 5-carboxylated base comprises a 5-carboxycytosine.
45. The method of any one of claims 42-44, wherein the moiety comprises an anisidine, a carbodiimide, a p-Xylylenediamine, an ester, an amine, an acyl halide, an acid anhydride, a derivative thereof of any of these, or any combination thereof.
46. The method of any one of claims 28-45, wherein the epigenetically modified base further comprises a methylated base.
47. The method of claim 46, wherein the methylated base comprises a 5-methylated base.
48. The method of claim 47, wherein the 5-methylated base comprises a 5-methylcytosine.
49. The method of any one of claims 28-48, wherein the target nucleic acid sequence comprises DNA or RNA.
50. The method of any one of claims 28-49, wherein the epigenetically modified base further comprises a N6-methyladenine, a N6-hydroxymethyladenine, a N6-formyladenine, a 2’-O-methyladenosine, a N1-methyladenosine, a pseudouridine, an inosine, a 8-oxoguanine, a 7-methylguanine, a 5-hydroxymethyluracil, an abasic site, or any combination thereof.
51. The method of any one of claims 28-50, wherein a size of a nanopore is at most one nanometer.
52. The method of any one of claims 28-51, wherein at least one nanopore used in the nanopore sequencing is a biological nanopore.
53. The method of any one of claims 28-52, wherein the moiety is at least two moieties.
54. The method of any one of claims 1-53, wherein the identifying comprises employing a trained algorithm.
55. The method of any one of claims 1-54, wherein a base is identified at an accuracy greater than an accuracy achieved by a method of identifying the base using a different sequencing method.
56. The method of claim 55, wherein the different sequencing method is Illumina sequencing.
57. The method of any one of claims 1-56, further comprising detecting an unmodified base.
58. The method of claim 57, wherein the unmodified base is detected at an accuracy greater than an accuracy achieved by a method of identifying the unmodified base using a different sequencing method.
59. The method of claim 58, wherein the different sequencing method is Illumina sequencing.
60. The method of any one of claims 57-59, wherein the unmodified base is a cytosine, a guanine, a thymine, an adenine or a uracil.
PCT/IB2019/000855 2018-06-14 2019-06-14 Determination of epigenetic modifications by nanopore sequencing WO2019239218A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862685217P 2018-06-14 2018-06-14
US62/685,217 2018-06-14

Publications (2)

Publication Number Publication Date
WO2019239218A2 WO2019239218A2 (en) 2019-12-19
WO2019239218A3 WO2019239218A3 (en) 2020-03-19

Family

ID=68343175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/000855 WO2019239218A2 (en) 2018-06-14 2019-06-14 Determination of epigenetic modifications by nanopore sequencing

Country Status (1)

Country Link
WO (1) WO2019239218A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110982883A (en) * 2019-12-30 2020-04-10 西安交通大学 High-throughput single-cell genome 5-hydroxymethylpyrimidine single-molecule visual analysis method
CN113373207A (en) * 2020-02-25 2021-09-10 深圳华大基因科技服务有限公司 Methods for determining cytosine modifications
WO2023215830A1 (en) * 2022-05-04 2023-11-09 The University Of Chicago Analysis of rna modifications
WO2023244480A1 (en) * 2022-06-17 2023-12-21 The Children's Hospital Of Philadelphia T lymphocyte activity screening and sequencing
WO2024028866A1 (en) * 2022-08-01 2024-02-08 Ramot At Tel-Aviv University Ltd. Detection of base modifications by enhancing electrical contrast in nanopores

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9267117B2 (en) * 2012-03-15 2016-02-23 New England Biolabs, Inc. Mapping cytosine modifications
WO2018187382A1 (en) * 2017-04-03 2018-10-11 The Trustees Of Columbia University In The City Of New York Comprehensive single molecule enhanced detection of modified cytosines

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LASZLO, A. H. ET AL., PNAS, 2013

Also Published As

Publication number Publication date
WO2019239218A3 (en) 2020-03-19

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19794243

Country of ref document: EP

Kind code of ref document: A2