WO2017070693A1 - Methods and systems for detecting variations in dna - Google Patents

Methods and systems for detecting variations in DNA

Info

Publication number
WO2017070693A1
WO2017070693A1 PCT/US2016/058521 US2016058521W WO2017070693A1 WO 2017070693 A1 WO2017070693 A1 WO 2017070693A1 US 2016058521 W US2016058521 W US 2016058521W WO 2017070693 A1 WO2017070693 A1 WO 2017070693A1
Authority
WO
WIPO (PCT)
Prior art keywords
mismatch
nucleic acid
current
base
base pair
Prior art date
Application number
PCT/US2016/058521
Other languages
French (fr)
Inventor
Cynthia Burrows
Henry White
Aaron Fleming
Robert Johnson
Yun Ding
Qian Jin
Original Assignee
University Of Utah Research Foundation
Priority date
Filing date
Publication date
Application filed by University Of Utah Research Foundation filed Critical University Of Utah Research Foundation
Publication of WO2017070693A1 publication Critical patent/WO2017070693A1/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B82 NANOTECHNOLOGY
    • B82Y SPECIFIC USES OR APPLICATIONS OF NANOSTRUCTURES; MEASUREMENT OR ANALYSIS OF NANOSTRUCTURES; MANUFACTURE OR TREATMENT OF NANOSTRUCTURES
    • B82Y15/00 Nanotechnology for interacting, sensing or actuating, e.g. quantum dots as markers in protein assays or molecular motors

Definitions

  • a biopsy of possible cancerous tissue likely will include a population of cells that may each have a unique set of genome variations (i.e., single nucleotide polymorphisms (SNPs) or mutations) (Burrell et al, Nature, 501: 338 (2013)).
  • Knowledge about these heterogeneous cell populations and their corresponding genetic variations is critical for the diagnosis and treatment of a variety of diseases, including cancer.
  • low-level genetic variations, if they occur at critical sites, can be the driver mutations of cancer phenotypes.
  • next generation sequencing methods are used to detect genetic variations in heterogeneous cell populations (Burrell et al., Nature, 501, 338-345 (2013)). High sequence coverage (up to 1000x), however, is required to identify low levels of SNPs by current next generation sequencing methods, making the technology cost-prohibitive to conduct on a routine basis (Sims et al, Nat. Rev. Genet., 15, 111 (2014)). Despite its biological significance in DNA repair, base-flipping also has been consistently challenging to measure (Stivers, supra). The most common methodology employs nuclear magnetic resonance spectroscopy (NMR), where the exchange of imino protons occurs when the base pair is open to the solution (i.e., extra-helical).
  • the present disclosure provides a method for identifying a single nucleotide of an analyte nucleic acid.
  • the method comprises: (a) hybridizing the analyte nucleic acid to a probe nucleic acid to form a hybridized analyte nucleic acid that includes a double-stranded portion comprising either (i) a base pair comprising the single nucleotide and a
  • the hybridized analyte nucleic acid with a nanopore within a membrane, the nanopore comprising at least a first region defining a first channel with a diameter sufficient to allow passage of double-stranded nucleic acids, a second region proximate to the first region and defining a second channel with a diameter that is larger than the first diameter, and a third region proximate to the second region and spaced from the first region and defining a third channel with a diameter sufficient to allow passage of single-stranded nucleic acids but not sufficient to allow passage of double-stranded nucleic acids, (c) applying an electrical voltage across the nanopore, whereupon the hybridized analyte nucleic acid passes into the nanopore, and the base pair or the base mismatch is positioned within
  • the measurement based on the current of step (d) is a measurement of a current at a fixed time
  • the reference of step (e) is a measurement of a current at a fixed time when a control nucleic acid having a control base pair or a control base mismatch is independently positioned within the nanopore in a manner that causes the control base pair or the control base mismatch to be positioned within the first channel.
  • the measurement based on the current of step (d) is a measurement of current modulating as a function of time
  • the reference of step (e) is a current modulation signature corresponding to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
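By way of non-limiting illustration, the comparison of a fixed-time current measurement against reference currents can be sketched in Python as follows. The function name, the reference labels, the picoampere values, and the tolerance are hypothetical and are not taken from the disclosure; the nearest-reference rule is simply one assumed way of carrying out the comparison.

```python
from typing import Dict, Optional


def identify_by_reference(measured_pa: float,
                          references_pa: Dict[str, float],
                          tolerance_pa: float) -> Optional[str]:
    """Return the label of the reference current closest to the measurement.

    Returns None when no reference lies within tolerance_pa of the
    measurement, i.e. when the measurement cannot be assigned.
    """
    best_label = None
    best_diff = None
    for label, reference in references_pa.items():
        diff = abs(measured_pa - reference)
        if best_diff is None or diff < best_diff:
            best_label, best_diff = label, diff
    if best_diff is None or best_diff > tolerance_pa:
        return None
    return best_label


# Hypothetical reference currents (pA) recorded with control duplexes.
references = {"control base pair": 20.0, "control base mismatch": 22.0}
print(identify_by_reference(20.4, references, tolerance_pa=0.5))
```

A measurement that falls outside the tolerance of every reference is left unassigned rather than forced onto the nearest label.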
  • the present disclosure also provides a method for identifying a single nucleotide in the genome of an organism which comprises amplifying a portion of the genome comprising a single nucleotide to form an analyte nucleic acid, and then performing the above-described method to identify the single nucleotide.
  • Figure 1A is a diagram illustrating the hairpin duplex sequences hp-1, hp-2, hp-3, and hp-4 described in Example 1.
  • Figure 1B is a diagram illustrating a representative structure of the α-hemolysin nanopore and the hairpin duplex based on pdb 7AHL, 1JVE, and 4HW1.
  • Figure 1C is a current blocking histogram for a mixture of the hp-1 and hp-2 duplexes.
  • Figure 1D is a current blocking histogram for a mixture of the hp-1 and hp-3 duplexes.
  • Figure 1E is a current blocking histogram for a mixture of the hp-1 and hp-4 duplexes.
  • In Figures 1C-1E, the hp-1 sequence was used as an internal standard, and it was always mixed with the analyte strand (hp-2, hp-3 and hp-4) in a ratio of 1:2, respectively; therefore, the smaller peak area always represents hp-1.
  • Figure 1F is a table listing peak-to-peak ΔI/I₀ and current differences measured between the internal standard (hp-1) and the analyte strand. The error in each value represents the standard deviation of peak-to-peak widths from three individual protein channels. The data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias. The histograms represent > 200 recorded events.
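As a non-limiting sketch of how the two peaks of such a histogram might be compared, the Python below assigns the smaller-area peak to the internal standard (consistent with the 1:2 mixing ratio described for Figures 1C-1E) and reports the peak-to-peak current difference. Normalizing that difference by an open-channel current I₀ is an assumption about how ΔI/I₀ is defined; the class, the function name, and all numerical values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Peak:
    current_pa: float  # centre of the peak in the blocking current histogram
    area: float        # event count (or fitted area) under the peak


def assign_and_compare(p1: Peak, p2: Peak,
                       open_channel_pa: float) -> Tuple[Peak, Peak, float, float]:
    """Assign the smaller peak to the internal standard and compare currents.

    Returns (standard, analyte, delta_i_pa, delta_i_over_i0), where
    delta_i_pa is the analyte peak current minus the standard peak current.
    """
    standard, analyte = (p1, p2) if p1.area < p2.area else (p2, p1)
    delta_i = analyte.current_pa - standard.current_pa
    return standard, analyte, delta_i, delta_i / open_channel_pa


# Hypothetical peaks from a 1:2 mixture of standard and analyte.
standard, analyte, delta_i, ratio = assign_and_compare(
    Peak(current_pa=20.0, area=70.0),
    Peak(current_pa=21.0, area=140.0),
    open_channel_pa=100.0,
)
print(standard, analyte, delta_i, ratio)
```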
  • Figure 2A is a diagram illustrating the sequences for the hairpin duplexes hp-1, hp-5, hp-6, hp-7 described in Example 1.
  • Figure 2B is a current blocking histogram for a mixture of the hp-1 and hp-5 hairpin duplexes.
  • Figure 2C is a current blocking histogram for a mixture of the hp-1 and hp-6 hairpin duplexes.
  • Figure 2D is a current blocking histogram for a mixture of the hp-1 and hp-7 hairpin duplexes.
  • In Figures 2B-2D, the hp-1 sequence was used as an internal standard, and it was always mixed with the analyte strand (hp-5, hp-6 and hp-7) in a ratio of 1:2, respectively; therefore, the smaller peak area always represents hp-1.
  • Figure 2E is a table listing peak-to-peak ΔI/I₀ and current differences measured between the standard (hp-1) and the analyte. The error in each value represents the standard deviation of peak-to-peak widths from three individual protein channels. The data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias. The histograms represent > 200 recorded events.
  • Figure 3A includes a diagram illustrating the hairpin duplex sequences hp-6, hp-8, and hp-9, as well as blocking current histograms comparing G-C vs. OG-C or G-C vs. 7-deazaguanine (7):C when placed at position 9 of the α-HL latch zone.
  • the current blocking histograms for both studies identified a single population, and blocking current histograms for hp-6 vs. hp-8 and hp-6 vs. hp-9 were nearly identical. These results suggest the latch zone of wild-type α-HL cannot resolve these base pairs.
  • Figure 3B includes a diagram illustrating the hairpin duplex sequences hp-10, hp-11, and hp-12, as well as current blocking histograms comparing the epigenetic markers mC with C-containing duplexes. These blocking current histograms show that one mC could not be distinguished from one C.
  • the data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias.
  • the histograms represent > 200 recorded events.
  • Figure 4A is a diagram of a duplex section of DNA driven into the α-HL vestibule up to the 1.4 nm central constriction. A mismatch was located at the 9th base-pair from the 3' end of the shorter strand, placing it in close proximity to the 2.6 nm latch constriction.
  • Figure 4B is a plot of typical current-time traces in a duplex containing a Watson-Crick CG base pair or a CA, CC, or CT mismatch located at position 9.
  • Figure 4C is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CC9-containing duplex.
  • Figure 4D is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CA9-containing duplex.
  • Figure 4E is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CT9-containing duplex.
  • Figure 4F is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CG9-containing duplex. Data were recorded in a 10 mM phosphate (pH 7.5), 0.25 M KCl solution. A voltage of 120 mV was applied across the protein channel. For clarity, continuous open channel data have been excised, as denoted by an axis break (//). The relative current levels (I₁, I₂, and I₂*) observed for the different mismatch-containing duplexes were confirmed by analyzing multiple duplexes with the same protein channel.
  • Figure 5A is a plot showing the typical individual events observed for CC and CA mismatches, where the current modulates between states, labeled I₁ and I₂ for the CC mismatch and I₁ and I₂* for the CA mismatch. For comparison, a typical event for the complementary CG duplex is also shown.
  • the CT mismatch gives a single state current signature of identical current magnitude to the complementary duplex.
  • Expanded views for duplexes CC9 and CA9 are shown in Figure 5B and Figure 5C, respectively, and illustrate the typical state lifetimes.
  • Figure 5D and Figure 5E are histograms generated from dwells in a unique state that show the relative current amplitudes and lifetimes for the CC9 and CA9 mismatch-containing duplexes, respectively. State current and lifetime values were extracted using QUB software. Data were recorded in a 10 mM phosphate, 0.25 M KCl, pH 7.5 buffer. A voltage of 120 mV was applied across the protein channel.
  • Figure 6A is a schematic of KRAS duplex sequences in which a mismatched CC base pair was positioned 3-4 bases away from the α-HL latch constriction.
  • Figure 6B is a series of representative current-time trace plots illustrating that identification of the CC mismatch in the KRAS sequence was restricted to instances where the mismatch was in close proximity to the latch constriction during DNA residence in the α-HL vestibule.
  • Figure 6C, Figure 6D and Figure 6E are histograms of the current states for duplexes CC6, CC9, and CC13, respectively, and illustrate that the I₂ state is only observed when the CC is in proximity to the latch constriction. For clarity, continuous open channel data have been excised from the current-time traces, as denoted by an axis break (//).
  • Figure 7A, Figure 7B and Figure 7C are a series of histograms illustrating that a duplex containing a CC mismatch in proximity to the latch constriction of α-HL resides significantly longer in the nanopore than the fully complementary duplex. Shown are histograms of the total residence time prior to unzipping (t_res, see Figure 5) for a duplex with a CC mismatch at position 9 (Figure 7A), the fully complementary duplex (Figure 7B), and a duplex with a CC mismatch at position 13 (Figure 7C).
  • the residence time constant for the duplex CC6 was 47 ± 6 ms.
  • Figure 8 is a schematic illustrating the proposed model for the interactions of a CC mismatch-containing duplex with α-HL as described in Example 4. Rate constants presented in Figure 8 were extracted from histograms of the total dwell times of each state.
  • Figure 9 is a series of diagrams illustrating the structures of the Watson-Crick CG base pair (Figure 9A), the wobble CC mismatch (Figure 9B), the CA wobble mismatch (Figure 9C), and the CT homopyrimidine mismatch (Figure 9D).
  • the CC and CA mismatches are stabilized by just one hydrogen bond each, the CT mismatch by two hydrogen bonds, and the CG complementary base-pair by three hydrogen bonds.
  • the extra stability yielded by the additional hydrogen bonds in the CT mismatch and CG pair may inhibit base-flipping on a time scale shorter than the residence of the duplex within the pore prior to unzipping.
  • Figure 10A is a diagram illustrating that the interaction of a CC mismatch with the latch constriction is dependent on pH.
  • Figure 10B is a graph showing the fraction of events with a current signature that modulated between the I₁ and I₂ states plotted as a function of pH over the range 6 to 7.5. Modulating current capture events were assigned to duplexes that contain the CC mismatch (dominant > pH 7), and single-level current capture events (I₁ only) were assigned to duplexes containing the CC+ mismatch (dominant < pH 6). At intermediate pH values, a mixture of event types was observed.
  • Figure 11A is a plot of a representative current-time trace highlighting the different event types observed for interactions of duplexes containing a GG mismatch or GC base-pair at position 9 within the α-HL latch zone.
  • Figure 11B shows current event histograms of capture events for experiments with just the GG duplex and both the GG and CG duplexes.
  • Figure 12A is a diagram of a DNA duplex driven into the α-HL nanopore (shown in cross-section) under an applied potential where it was held for up to 20 seconds and then ejected by reversing the applied bias as described in Example 7. While resident within the pore, the C-C mismatch site was aligned with the latch constriction of α-HL.
  • Figure 12B shows plots of modulating current signatures observed while DNA resides within the nanopore, where I₁ corresponds to a conformation where all bases are intra-helical and I₂ corresponds to a conformation where one of the cytosine bases at the mismatch site is extra-helical. Intra- and extra-helical lifetimes are given by τ₁ and τ₂, respectively.
  • Figure 13A is a series of representative lifetime histograms for states I₁ (intra-helical) and I₂ (extra-helical), for a single molecule of DNA, from which lifetime constants can be extracted.
  • Figure 13B shows distributions of lifetime constants for states I₁ and I₂ across a sample of 35 individual duplexes, measured with a single protein channel.
  • Figure 13C is a scatter plot of intra- and extra-helical lifetime constants τ₁ and τ₂ for individual DNA duplexes measured across three independent α-HL channels (squares, circles, triangles).
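For illustration only, the extraction of lifetime constants from a two-level current trace might be sketched in Python as below. The disclosure extracts lifetimes from dwell-time histograms (and, for Figure 5, with QUB software); this sketch swaps in a simpler approach, assuming a fixed threshold separates the two current levels and that dwell times in each state follow a single-exponential distribution, in which case the mean dwell time is the maximum-likelihood estimate of the lifetime constant. The function names, the "low"/"high" state labels, and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def dwell_times(samples: Sequence[float], threshold: float,
                dt: float) -> Dict[str, List[float]]:
    """Split a two-level current trace into dwell durations per state.

    Samples at or above the threshold are labelled "high", the rest "low".
    The first and last runs are dropped because the trace boundaries
    truncate them.
    """
    dwells: Dict[str, List[float]] = {"low": [], "high": []}
    if not samples:
        return dwells
    runs = []
    state = "high" if samples[0] >= threshold else "low"
    length = 1
    for value in samples[1:]:
        current = "high" if value >= threshold else "low"
        if current == state:
            length += 1
        else:
            runs.append((state, length))
            state, length = current, 1
    runs.append((state, length))
    for run_state, run_length in runs[1:-1]:
        dwells[run_state].append(run_length * dt)
    return dwells


def lifetime_constant(dwells: Sequence[float]) -> float:
    """Mean dwell time, i.e. the single-exponential lifetime estimate."""
    if not dwells:
        raise ValueError("no complete dwells were observed for this state")
    return sum(dwells) / len(dwells)


# Hypothetical trace sampled every 0.5 time units.
trace = [1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2]
d = dwell_times(trace, threshold=1.5, dt=0.5)
print(lifetime_constant(d["low"]), lifetime_constant(d["high"]))
```

Which of the two labels corresponds to the intra-helical state and which to the extra-helical state depends on the relative magnitudes of the two current levels in a given recording.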
  • Figure 14 shows plots of representative current time traces from a six second window of a single DNA capture event demonstrating measurement of base-flipping at a C-X mismatch site as described in Example 8, where X is mC, hmC, fC or caC.
  • Two event types (I and II) were observed for fC, with type I comprising 80% of events.
  • Figure 15 is a scatter plot of intra-helical (τ₁) vs. extra-helical (τ₂) lifetime constants for duplexes C-C (squares), C-mC (circles), C-hmC (diamonds), C-fC (triangles), and C-caC (pentagons), as described in Example 8. Each data point represents a base-flipping measurement for a single DNA molecule.
  • Figure 16 is a scatter plot that illustrates that base-flipping kinetics at a mismatch site within the α-HL latch are sequence dependent, as described in Example 9.
  • flanking base pairs of the cytosine modifications mC (circles), hmC (diamonds) and fC (triangles) are changed from 5 ⁇ and 3'C (hollow symbols, data from Figure 15) to 5'G and 3'T (solid symbols)
  • the population centers of the lifetime constants τ₁ and τ₂ were shifted. Changes to the lifetime constant when changing the sequence context of fC were observed only for the minor event type. No changes were observed for caC.
  • Three independent measurements, i.e., with three different protein channels (hollow, solid, and hatched squares) for the C-C duplex highlighted the negligible variation in population centers expected from experiment to experiment with DNA of the same composition.
  • Figure 17 is a diagram illustrating a method for applying the latch zone to the discrimination of specific base pairs from a genomic sample.
  • the present disclosure is based, at least in part, on the discovery of a nanopore-based system for analysis of nucleic acid molecules at the single nucleotide level.
  • Nanopore analysis of nucleic acids involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analyzed with sub-nanometer resolution and without the need for labels or amplification.
  • Described herein is a method for identifying a single nucleotide of an analyte nucleic acid which involves a nanopore-based system.
  • nucleobase or “base” are synonymous and refer to naturally occurring and synthetic heterocyclic moieties commonly known in the art of nucleic acid or polynucleotide technology or peptide nucleic acid technology for generating polymers.
  • suitable nucleobases include: adenine, cytosine, guanine, thymine, uracil, 5-propynyl-uracil, 2-thio-5-propynyl-uracil, 5-methylcytosine,
  • nucleobases can be linked to other moieties to form nucleosides, nucleotides, and nucleoside/tide analogs.
  • Nucleoside refers to a compound consisting of a purine, deazapurine, or pyrimidine nucleoside base, e.g., adenine, guanine, cytosine, uracil, thymine, 7-deazaadenine, 7-deazaguanosine, that is linked to the anomeric carbon of a pentose sugar at the 1' position, such as a ribose, 2'-deoxyribose, 3'-deoxyribose, or a 2',3'-di-deoxyribose.
  • nucleotide refers to a phosphate ester of a nucleoside, e.g., a mono-, a di-, or a triphosphate ester, wherein the most common site of esterification is the hydroxyl group attached to the C-5, C-2, or C-3 position of the pentose.
  • nucleic acid sequence refers to a polymer of DNA or RNA, i.e., a polynucleotide or oligonucleotide, which can be single-stranded or double-stranded and which can contain non-natural or altered nucleotides.
  • Nucleotides are typically linked via phosphodiester bonds to form nucleic acids or polynucleotides, though many other linkages are known in the art (e.g., phosphorothioates, boranophosphates, and the like).
  • the term "analyte" generally refers to a substance that is the subject of chemical, biological, or structural analysis.
  • an "analyte nucleic acid” refers to a nucleic acid molecule that is subject to an analysis of any suitable property or feature of the nucleic acid. Such property or features include, but are not limited to, the sequence, secondary structure, or size of the nucleic acid molecule.
  • probe refers to an oligonucleotide that hybridizes specifically to a target sequence in a nucleic acid under conditions that promote hybridization to form a detectable hybrid.
  • the analyte nucleic acid may be a nucleic acid molecule or sequence obtained or derived from a cell of any suitable organism, such as, for example, a bacterium, a virus, a parasite, an insect, a bird, or a mammal (e.g., cow, pig, goat, rabbit, a sheep, a hamster, guinea pig, cat, dog, rat, mouse, monkey, chimpanzee, gorilla, or human).
  • the analyte nucleic acid molecule may be obtained or derived from a healthy (i.e., non-diseased) organism.
  • the analyte nucleic acid molecule may be obtained or derived from an abnormal cell or population of cells, such as a neoplasm, cancer, or other diseases which damage or impair cellular function.
  • the analyte nucleic acid molecule may be obtained or derived from any type of neoplasm or cancer, such as, for example, a melanoma, renal cell carcinoma, lung cancer, bladder cancer, breast cancer, cervical cancer, colon cancer, gall bladder cancer, laryngeal cancer, liver cancer, thyroid cancer, stomach cancer, salivary gland cancer, prostate cancer, or pancreatic cancer.
  • the analyte nucleic acid molecule may be obtained or derived from a bacterium, a virus, a parasite, or other infectious agents.
  • the analyte nucleic acid molecule may be obtained or derived from any of the following bacteria:
  • Halobacterium, Heliobacter, Hyphomicrobium, Methanobacterium, Micrococcus,
  • Mycobacterium, Mycoplasma, Myxococcus, Neisseria, Nitrobacter, Oscillatoria, Prochloron, Proteus, Pseudomonas, Rhodospirillum, Rickettsia, Salmonella, Shigella, Spirillum, Spirochaeta, Staphylococcus, Streptococcus, Streptomyces, Sulfolobus, Thermoplasma, Thiobacillus, and Treponema.
  • the analyte nucleic acid molecule may be obtained or derived from any of the following viruses: Arenaviridae, Arterivirus, Astroviridae,
  • Coronaviridae (e.g., Coronavirus, such as severe acute respiratory syndrome (SARS) virus)
  • Corticoviridae (e.g., corticoviridae)
  • Cystoviridae, Deltavirus, Dianthovirus, Enamovirus
  • Filoviridae (e.g., Marburg virus and Ebola virus (e.g., Zaire, Reston, Ivory Coast, or Sudan strain))
  • Flaviviridae (e.g., Hepatitis C virus, Dengue virus 1, Dengue virus 2, Dengue virus 3, Dengue virus 4, and Zika virus)
  • Hepadnaviridae (e.g., Hepatitis B virus)
  • Herpesviridae (e.g., Human herpesvirus 1, 3, 4, 5, and 6, and Cytomegalovirus)
  • Hypoviridae, Iridoviridae, Leviviridae, Lipothrixviridae, Microviridae, Orthomyxoviridae (e.
  • the analyte nucleic acid molecule may be obtained or derived from any of the following parasites: Sporozoa (e.g., Plasmodium species),
  • the analyte nucleic acid may comprise a portion of a larger nucleic acid sequence that encodes a protein, or the analyte nucleic acid may itself encode a protein.
  • the analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid may encode any suitable protein, including but not limited to, surface proteins, intracellular proteins, membrane proteins, and secreted proteins.
  • the analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid may encode an antibody heavy chain or portion thereof, an antibody light chain or portion thereof, an enzyme, a receptor, a cytokine, a tumor suppressor protein, a mitogen, a neuropeptide, a neurotransmitter, a structural protein, a co-factor, a polypeptide, a peptide, an intrabody, a selectable marker, a toxin, a growth factor, or a peptide hormone.
  • the analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid also may not encode a protein.
  • non-coding nucleic acid sequences may function as cis- or trans-regulatory elements that control the transcription of nearby or distant genes, respectively, or may be transcribed into noncoding RNA sequences (e.g., ribosomal RNA, tRNA, or microRNA).
  • the analyte nucleic acid may be obtained or derived from a naturally-occurring (i.e., wild-type) nucleic acid sequence
  • the analyte nucleic acid may be synthetically generated or engineered using routine methods known in the art, such as those described in, e.g., Sambrook et al, Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Cold Spring Harbor, New York (2012); and Ausubel et al, Current Protocols in Molecular Biology, John Wiley & Sons, New York (2016).
  • the methods described herein involve hybridizing the analyte nucleic acid to a probe nucleic acid.
  • the probe nucleic acid may be synthetically generated and may comprise a sequence that is complementary to the analyte nucleic acid sequence, including a nucleotide that creates a base pair or base mismatch with a single nucleotide of the analyte nucleic acid upon hybridization.
  • base pair refers to a pair of complementary bases in a double-stranded nucleic acid molecule, consisting of a purine in one strand linked by hydrogen bonds to a pyrimidine in the other.
  • the pyrimidine cytosine always pairs with the purine guanine, and the purine adenine with the pyrimidine thymine (in DNA) or uracil (in RNA).
  • a "base mismatch,” as used herein, refers to the presence, in one strand of a double-stranded nucleic acid sequence (e.g., DNA), of a nucleotide that is not complementary to the nucleotide occupying the corresponding position in the other strand.
  • the base pair may comprise a canonical Watson-Crick base pair, such as a cytosine-guanine (C-G) base pair or an adenine-thymine (A-T) base pair.
  • the base pair may involve non-Watson-Crick interactions, such as between modified nucleotides.
  • one or both nucleotides of the base pair may comprise an epigenetic modification, such as methylation.
  • epigenetic modifications appear to occur in nature as a means to control gene regulation in human cells and have implications in the development of cancer and other diseases (Robertson, K.D., Nat Rev Genet, 6(8): 597-610 (2005); and Lister et al, Nature, 462(7271): 315-322 (2009)).
  • the most common epigenetic modification is the enzyme-catalyzed addition of a methyl group to the carbon-5 position of cytosine to generate methylcytosine (mC).
  • Other modified nucleotides which can form a non-Watson-Crick base pair include, for example, 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxylcytosine (caC).
  • the base pair formed by hybridization between the analyte nucleic acid and probe nucleic acid may be a G-mC base pair, a G-hmC base pair, a G-fC base pair or a G-caC base pair.
  • the base mismatch may comprise any mismatched nucleotide pairing, such as, for example, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch, or a C-caC mismatch.
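As a non-limiting sketch, the distinction drawn above between a base pair and a base mismatch at the interrogated position can be expressed in Python as follows. The function name and the string labels are hypothetical; the pairing rules encode only the DNA pairings named in the text (A-T, C-G, and G paired with mC, hmC, fC, or caC), so uracil and any other modified nucleotides are deliberately left out.

```python
# Cytosine and the modified cytosines that the text describes as pairing with G.
CYTOSINE_FORMS = {"C", "mC", "hmC", "fC", "caC"}


def classify_position(analyte_nt: str, probe_nt: str) -> str:
    """Classify the analyte/probe nucleotide pairing at a single position.

    Returns "base pair" for A-T, or for G opposite any cytosine form, and
    "base mismatch" for every other combination.
    """
    pair = {analyte_nt, probe_nt}
    if pair == {"A", "T"}:
        return "base pair"
    if "G" in pair and len(pair) == 2 and (pair - {"G"}) <= CYTOSINE_FORMS:
        return "base pair"
    return "base mismatch"


for analyte_nt, probe_nt in [("C", "G"), ("mC", "G"), ("C", "A"), ("C", "hmC")]:
    print(analyte_nt, probe_nt, classify_position(analyte_nt, probe_nt))
```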
  • mismatch repair proteins access bases to be excised through a process referred to as "base-flipping," in which a single base rotates from an intra-helical position through to an exposed extra-helical position (Stivers, supra).
  • Spontaneous base-flipping occurs slowly for Watson-Crick base pairs, but is significantly more prominent at mismatch sites and is known to play a key role in many biological processes, particularly sequence or base-specific recognition by DNA repair enzymes (Stivers, J.T., Chem. - Eur. J., 14: 786 (2008); and Yin et al, Proc. Natl. Acad. Sci. USA, 111: 8043 (2014)).
  • the mechanisms of base-flipping can be passive, where a protein merely identifies an extra-helical base, or active, where a protein is involved in causing base-flipping, and/or stabilizing the intra-helical state (Lariviere et al, J. Biol. Chem., 279: 34715 (2004); and Huang et al, Proc. Natl. Acad. Sci. USA, 100: 68 (2003)).
  • the analyte nucleic acid and probe nucleic acid each may be of any suitable size.
  • the analyte nucleic acid and/or the probe nucleic acid may comprise at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides).
  • the analyte nucleic acid may comprise about 15 to about 20 nucleotides, about 18 to about 22 nucleotides, about 20 to about 25 nucleotides, about 27 to about 35 nucleotides, or a range defined by any two of the foregoing values.
  • the analyte nucleic acid and probe nucleic acid may differ in size, i.e., the probe nucleic acid may be of a different length than the analyte nucleic acid.
  • the probe nucleic acid may be designed such that it hybridizes to the analyte nucleic acid under at least moderate, preferably high, stringency conditions.
  • Methods for designing and synthesizing probe nucleic acid sequences are known in the art and described in, e.g., Tang et al, Biotechniques, 40(6): 759-763 (2006); Espelund et al, Nuc. Acids. Res., 18(20): 6157-6158 (1990); Sambrook et al, supra; and Ausubel et al, supra.
  • hybridization of the analyte nucleic acid to the probe nucleic acid may be performed using standard methods known in the art, such as those described in, e.g., Kashima et al, Nature 313: 402-404 (1985); Sambrook et al, supra; and Haymes et al, Nucleic Acid Hybridization: A Practical Approach, IRL Press, Washington, D.C. (1985). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Ausubel et al., supra.
  • the analyte nucleic acid hybridizes to the probe nucleic acid to form a hybridized analyte nucleic acid, which may include a double-stranded portion and a single-stranded portion.
  • the double-stranded portion (also referred to herein as a "duplex") of the hybridized analyte nucleic acid may comprise a portion of the analyte nucleic acid and the entire probe nucleic acid.
  • the double-stranded portion of the hybridized analyte nucleic acid may comprise a portion of the probe nucleic acid and the entire analyte nucleic acid.
  • the double-stranded portion of the hybridized analyte nucleic acid may include either a base pair comprising a single nucleotide and a complementary nucleotide on the probe nucleic acid or a base mismatch comprising a single nucleotide and a non-complementary nucleotide on the probe nucleic acid.
  • the base pair or base mismatch as described above, may comprise any suitable single nucleotide.
  • the single nucleotide may be a naturally occurring, an artificial, a damaged, or a modified (e.g., an epigenetic modification) nucleotide.
  • the single nucleotide may be cytosine (C), guanine (G), adenine (A), thymine (T), 5-methylcytosine (mC), 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) or 5-carboxylcytosine (caC).
  • the single-stranded portion of the hybridized analyte nucleic acid may comprise a single-stranded polynucleotide "tail" that is appended to the analyte nucleic acid either 5' or 3' of the portion of the analyte nucleic acid that hybridizes to the probe nucleic acid (see, e.g., Ding et al., ACS Nano, 9(11): 11325-11332 (2015)).
  • These single-stranded portions may comprise any type of polynucleotide of any suitable length.
  • the single-stranded portion may comprise a tail of one nucleotide type (e.g., polythymine (poly-T) or polycytosine (poly-C)) or a heterogeneous polynucleotide tail comprising more than one type of nucleotide (Ding et al., ACS Nano, 9(11): 11325-11332 (2015)).
  • the single-stranded portion may comprise about 20 to about 30 nucleotides (e.g., 21, 22, 23, 24, 25, 26, 27, 28, or 29 nucleotides). It will be appreciated that the single-stranded portion of the hybridized analyte may function to "tether" the double-stranded portion of the hybridized analyte nucleic acid within a nanopore (as described below).
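  • As an illustration of the tail design described above, the following minimal sketch appends a homopolymer tail to either end of a strand written 5' to 3'. The function names and the hard 20-30 nucleotide check are assumptions made for this example; the text itself says only "about 20 to about 30" and allows any suitable length.

```python
def append_tail(strand: str, base: str = "T", length: int = 25, end: str = "3'") -> str:
    """Return `strand` (written 5'->3') with a homopolymer tail on the chosen end.

    Hypothetical helper: the default of 25 thymines is just one value inside
    the roughly 20-30 nucleotide range mentioned in the text.
    """
    tail = base * length
    if end == "3'":
        return strand + tail
    if end == "5'":
        return tail + strand
    raise ValueError("end must be \"5'\" or \"3'\"")


def tail_in_typical_range(length: int) -> bool:
    """True if the tail length falls in the 20-30 nucleotide range (treated as hard limits here)."""
    return 20 <= length <= 30
```

  • For example, append_tail("GCGC", "T", 24, "3'") returns the four-base strand followed by 24 thymines.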
  • duplex portion of the hybridized analyte nucleic acid may fold into various other inter- and intramolecular secondary structures such as, for example, stem-loops, hairpins, G-quadruplexes, i-motifs, and folded RNA structures (Vercoutere et al, Nat.
  • the duplex portion may comprise a tetraloop structure, resulting in the hybridized analyte nucleic acid taking the form of a "hairpin."
  • the methods described herein involve contacting the hybridized analyte nucleic acid with a nanopore within a membrane.
  • nanopore refers to a pore, typically having a size on the order of nanometers, which allows the passage of biopolymers (e.g., polynucleotides) therethrough.
  • the nanopore may be comprised of pore-forming proteins, such as the Staphylococcus aureus toxin α-hemolysin (α-HL) or the Mycobacterium smegmatis porin A (MspA) protein (Stoddart et al., Nano Lett., 10(9): 3633-7 (2010); and Faller et al., Science, 303(5661): 1189-1192 (2004)), which are capable of forming a pore that may permit hydrated ions driven by an applied potential to flow from one side of a membrane to the other.
  • the nanopore may be fabricated from non-natural materials.
  • Such "solid-state" nanopores generally may be fabricated in insulating membranes comprised of silicon compounds (e.g., silicon nitride), aluminum compounds (e.g., aluminum oxide), titanium compounds (e.g., titanium oxide) or graphene (e.g., box-shaped graphene) (see, e.g., Storm et al, Nat. Mater., 2(8): 537-540 (2003); Garaj et al, Nature, 467(7312): 190-193 (2010); and Lapshin, R.V., Applied Surface Science, 360: 451-460 (2016)).
  • Solid-state nanopores may be manufactured using several techniques known in the art, including, for example, ion-beam sculpting (Li et al., Nature, 412(6843): 166-169 (2001)) and electron beams (Storm et al., supra).
  • the nanopore may be a hybrid of biological (e.g., protein) and solid-state nanopores.
  • the nanopore preferably comprises a protein.
  • the protein nanopore may comprise a monomer or an oligomer. Oligomeric nanopores may include several (e.g., 5, 6, 7, 8 or more) repeating subunits.
  • the nanopore may be comprised of any suitable protein known to form pore structures within a membrane, including but not limited to, α-hemolysin (α-HL), Mycobacterium smegmatis porin A (MspA) protein, or the phi29 connector protein (see, e.g., Haque et al., Nano Today, 8(1): 56-74 (2013)).
  • a protein nanopore typically is inserted into an amphiphilic layer such as a biological membrane.
  • An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties.
  • the amphiphilic layer may be a monolayer or a bilayer.
  • the membrane is a phospholipid bilayer membrane.
  • the phospholipid bilayer may be prepared using any suitable phospholipid, such as, for example, diphytanoylphosphatidylcholine, phosphatidylcholine, phosphatidylethanolamine, phosphatidic acid, phosphatidylglycerol, phosphatidylinositol, or 1,2-diphytanoyl-sn-glycero-3-phosphocholine, by any suitable method known in the art, such as those described in, e.g., Shim et al., Biomed Microdevices, 14(5): 912-928 (2012); and Funakoshi et al., Anal. Chem., 78(24): 8169-8174 (2006).
  • the membrane may be suspended across a solid support, such as a glass nanopore membrane (GNP) or a quartz nanopore membrane (QNP) (White et al., J. Am. Chem. Soc., 129: 11766-11775 (2007); and Schibel et al., Anal. Chem., 82(17): 7259-66 (2010)).
  • Protein-based nanopores may comprise a barrel or channel through which ions may flow.
  • the subunits of the pore may surround a central axis and contribute strands to a transmembrane β-barrel or channel or a transmembrane α-helix bundle or channel.
  • the barrel or channel of the transmembrane protein pore typically comprises amino acids that facilitate interaction with an analyte, such as polymers, nucleotides, polynucleotides or nucleic acids. These amino acids may be located near a constriction of the barrel or channel.
  • the transmembrane protein pore may comprise one or more positively charged amino acids, such as arginine, lysine or histidine, or aromatic amino acids, such as tyrosine or tryptophan. These amino acids may facilitate the interaction between the pore and polymers, nucleotides, polynucleotides, or nucleic acids.
  • the nanopore may comprise at least a first region defining a first channel with a diameter sufficient to allow passage of double-stranded nucleic acids, a second region proximate to the first region and defining a second channel with a diameter that is larger than the first diameter, and a third region proximate to the second region and spaced from the first region defining a third channel with a diameter sufficient to allow passage of single-stranded nucleic acids, but not sufficient to allow passage of double-stranded nucleic acids.
  • the protein nanopore may comprise a lysine residue positioned within the first channel.
  • the nanopore comprises an α-hemolysin.
  • α-Hemolysin (α-HL) is an exotoxin secreted by the human pathogen Staphylococcus aureus. Wild-type α-HL has been studied extensively as a platform for ion-channel recordings of single-stranded DNA (ssDNA) (Akeson et al., Biophys. J., 77: 3227-3233 (1999); Kasianowicz et al., Proc. Natl. Acad. Sci. USA, 93: 13110-13113 (1996); Meller et al., Proc. Natl. Acad. Sci.
  • the current is recorded as a function of time, and translocations of the individual DNA strands are observed as events in which the current momentarily decreases (Kasianowicz et al, Annu Rev. Anal. Chem., 1: 737- 766 (2008); and Wanunu, M., Phys. Life Rev., 9: 125-158 (2012)).
  • the extent of this current change is dependent on the sequence of the DNA near the tightest constriction of the protein channel, which is comparable to the diameter of ssDNA.
  • α-HL forms a 232.4 kDa mushroom-like heptameric transmembrane pore, consisting of a vestibule (3.6 nm in diameter; ~5 nm in length) connected to a transmembrane β-barrel (~2.6 nm in diameter; ~5 nm in length) (Song et al., Science, 274(5294): 1859-66 (1996)).
  • the pore is narrowest at the vestibule-transmembrane domain junction with a diameter of ~1.4 nm.
  • the "latch" constriction (also referred to as the "latch zone" or "latch region") is a 2.6 nm constriction located in the upper vestibule of α-HL, which has been shown to comprise a sensing zone for specific dsDNA structure (see, e.g., Johnson et al., Biophys. J., 107: 924-931 (2014); Jin et al., J. Am. Chem. Soc., 135(51): 19347-19353 (2013); Johnson et al., J. Am. Chem. Soc., 138(2): 594-603 (2016); Ding et al., ACS Nano, 9(11): 11325-11332 (2015); and Johnson et al., J.
  • the aforementioned first region of the nanopore corresponds to the latch zone (or latch region) of α-hemolysin
  • the second region corresponds to the vestibule region of α-hemolysin
  • the third region corresponds to the β-barrel region of α-hemolysin.
  • the α-hemolysin may be either a wild-type or mutant form of α-hemolysin.
  • an α-HL mutant that can be used in the described method comprises a mutation that alters an amino acid within the latch zone of the α-HL monomer.
  • Duplex or dsDNA comprising a single-stranded tail (such as a poly-T tail), as described herein, can be driven into an α-HL nanopore from the cis (vestibule) side of the channel.
  • the duplex may be driven down to the 1.4 nm constriction that separates the vestibule from the β-barrel (Jin et al., J. Am. Chem. Soc., 134: 11006-11011 (2012)), through which the duplex cannot pass.
  • An electrophoretic driving force causes the double-stranded section to unzip into its constituent components.
  • the unzipping time, which is on the order of milliseconds, is dependent on the length and composition of the DNA and correlates with the stability of the duplex (Jin et al., J. Am. Chem. Soc., 134: 11006-11011 (2012); Sauer-Budge et al., supra; and Schibel et al., J. Am. Chem. Soc., 133: 14778-14784 (2011)).
  • a current delivered between the electrodes will flow through the channel of the nanopore, electrophoretically driving charged molecules toward the nanopore.
  • the changes in current are characteristic of the molecular interactions between the nanopore and the analyte; further, the duration of these events may also be indicative of the interactions.
  • the standing open current reading of the channel may be noted, and any change in this current may be due to some current impedance within the channel.
  • directing a nucleic acid through the channel will cause a decrease in current flow through the channel as compared to the open current reading.
  • the decrease in current while dsDNA is captured in the pore, relative to the current measured through an open channel, is a result of the blocking contributions from the double-stranded sections of DNA and any single-stranded sections of the DNA. Although the majority of the current is blocked by a single-stranded portion residing in the β-barrel during unzipping, the double-stranded section that resides in the vestibule also contributes to the current blockage.
  • This feature of α-HL may be employed to discriminate the identity of different nucleotides, abasic sites, nucleotide analogs (e.g., furan), mismatched base pairs, sequence variations, single-nucleotide polymorphisms, mutations, epigenetically modified nucleotides, and the like.
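  • The idea that capture events appear as decreases relative to the open channel current can be sketched as a simple threshold detector on a sampled current-time trace. This is only an illustrative sketch: the function name, the default threshold of half the open channel current, and the tuple layout are assumptions, not the analysis software used in the examples.

```python
def detect_blockades(trace_pa, open_current_pa, sample_rate_hz, fraction=0.5):
    """Find runs of samples below `fraction * open_current_pa`.

    Returns a list of (start_index, duration_s, mean_current_pa) tuples.
    The threshold fraction is an assumed, adjustable parameter.
    """
    samples = list(trace_pa)
    threshold = fraction * open_current_pa
    events = []
    start = None
    # A trailing open-channel sample acts as a sentinel that closes any event
    # still in progress at the end of the trace.
    for i, current in enumerate(samples + [open_current_pa]):
        if current < threshold and start is None:
            start = i
        elif current >= threshold and start is not None:
            segment = samples[start:i]
            events.append((start, len(segment) / sample_rate_hz, sum(segment) / len(segment)))
            start = None
    return events
```

  • Each returned tuple gives the extent (mean residual current) and duration of one blockade, the two quantities the text identifies as characteristic of the interaction between the nanopore and the analyte.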
  • Information about a nucleotide at a particular location in a duplex may be revealed by distinct electrical current signatures, such as the duration and extent of current block and the variance of current levels. In other words, different types of nucleotides will block current to a greater or lesser extent, and thus provide a distinct current signature.
  • the present methods comprise applying an electrical voltage across the nanopore, whereupon the hybridized nucleic acid passes into the nanopore, and the base pair or the base mismatch comprising the single nucleotide and the complementary nucleotide is positioned within the first channel (e.g., the latch zone of α-HL).
  • the electrical voltage may be applied by using any combination of electrodes suitable for introducing a current across a lipid bilayer membrane, such as, for example, Ag/AgCl electrodes, using routine methods known in the art.
  • the appropriate magnitude of voltage applied may be determined by one of ordinary skill in the art, and will depend on a number of factors, including the type of membrane used, the sequence of the probe nucleic acid, etc.
  • the voltage applied may be from about +2 V to about -2 V, from about -400 mV to about +400 mV, from about -200 mV to about +200 mV, or a range defined by any two of the foregoing values.
  • the voltage applied is about +90 mV to about +120 mV (e.g., about +100 mV). It may be possible to increase discrimination between different nucleotides by a pore using an increased applied potential.
  • the present methods may also be carried out in the presence of charge carriers, such as metal salts (e.g., alkali metal salts and halide salts), ionic liquids, or organic salts (e.g., tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methylimidazolium chloride).
  • the disclosed methods may be performed in the presence of potassium chloride (KCl), sodium chloride (NaCl), rubidium chloride (RbCl), lithium chloride (LiCl), or cesium chloride (CsCl).
  • the salt concentration may be 3 M or lower, such as from 0.1 to 2.5 M. It will be appreciated that base-flipping kinetics may change as a function of salt concentration. Indeed, while high salt concentrations provide a high signal-to-noise ratio and allow for currents indicative of the presence of a polymer to be identified against the background of normal current fluctuations, it may be easier to distinguish two blocking states at low salt concentrations because they are less noisy and, in the case of the extra-helical state, longer-lived.
  • the present methods may also be performed in the presence of a buffer.
  • Any suitable buffer may be used, such as, for example, a phosphate buffer.
  • the application of voltage across the membrane desirably may be carried out at a pH of about 4.0 to about 12.0 (e.g., about 5.0, about 5.5, about 6.0, about 6.5, about 7.0, about 7.5, about 8.0, about 8.5, about 9.0, about 9.5, about 10.0, about 10.5, about 11.0, about 11.5, or a range defined by any two of the foregoing values), such as from about 7.0 to about 8.0 (e.g., about 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, or 7.9).
  • the present method may be performed at any suitable temperature.
  • the method may be performed at a temperature from about 0 °C to about 100 °C.
  • the method may be performed at room temperature (e.g., from about 15 °C to about 30 °C).
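  • The broadest operating ranges recited above (applied voltage, salt concentration, pH, and temperature) can be gathered into a small configuration check. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and the "about" boundaries in the text are treated as hard limits purely for simplicity.

```python
from dataclasses import dataclass


@dataclass
class RunConditions:
    """Hypothetical container for the experimental conditions discussed above."""
    voltage_mv: float      # applied potential, millivolts
    salt_molar: float      # electrolyte concentration, mol/L
    ph: float
    temperature_c: float   # degrees Celsius

    def out_of_range(self):
        """Return the names of parameters outside the broadest ranges recited in the text."""
        problems = []
        if not -2000 <= self.voltage_mv <= 2000:   # about -2 V to about +2 V
            problems.append("voltage")
        if not 0 <= self.salt_molar <= 3:          # 3 M or lower
            problems.append("salt")
        if not 4.0 <= self.ph <= 12.0:             # about pH 4.0 to about 12.0
            problems.append("ph")
        if not 0 <= self.temperature_c <= 100:     # about 0 to about 100 degrees C
            problems.append("temperature")
        return problems
```

  • For instance, RunConditions(100, 1.0, 7.4, 22).out_of_range() returns an empty list, since +100 mV, 1 M salt, pH 7.4, and room temperature all fall within the recited ranges.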
  • the electrical voltage applied across the nanopore embedded within the membrane causes the hybridized analyte nucleic acid to pass into the nanopore such that the base pair or the base mismatch comprising the single nucleotide and complementary nucleotide is positioned within the first channel of the nanopore (e.g., the latch zone of α-HL).
  • residual current measured when DNA resides in the nanopore is sensitive to changes in DNA structure at base pairs situated in proximity to the latch zone.
  • the latch sensing zone of α-HL comprises a seven base pair detection window that spans from the sixth to the thirteenth base pair of an analyte duplex above the central constriction of α-HL (Jin et al., J. Am. Chem. Soc., 135: 19347-19353 (2013); and Johnson et al., Biophys. J., 107: 924-931 (2014)). More recently, the latch sensing zone has been further refined as spanning the eighth or ninth base pair of an analyte duplex above the central constriction (Ding et al., ACS Nano, 9(11): 11325-11332 (2015)).
  • dsDNA may be temporarily captured in the α-HL latch zone using an appended single-stranded tail on one of the strands to pull the duplex into the vestibule.
  • the single-stranded portion of the hybridized analyte nucleic acid passes into the nanopore through the first and second channels and enters the third channel.
  • the single-stranded tail may penetrate the central constriction and thread into the narrow β-barrel, while the wider dsDNA is temporarily trapped in the vestibule.
  • the duplex region of the analyte nucleic acid may be tethered within the latch zone sensing region, such that the base pair or base mismatch is positioned at the eighth or ninth position above the central constriction.
  • the base pair or base mismatch is positioned nine nucleotides away from the single-stranded portion of the hybridized analyte nucleic acid.
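  • The positioning rule above (the site of interest at the eighth or ninth base pair, counted from the single-stranded tail) can be checked with a short calculation. The sketch below assumes the duplex strand is written 5' to 3' and that the position counted from the tail junction equals the position above the central constriction; the function names are hypothetical.

```python
def position_from_tail(duplex_length: int, index_in_duplex: int, tail_end: str) -> int:
    """1-based base-pair position counted from the junction with the single-stranded tail.

    `index_in_duplex` is the 0-based index of the nucleotide of interest within
    the duplex portion of the tailed strand, written 5'->3'.
    """
    if not 0 <= index_in_duplex < duplex_length:
        raise ValueError("index_in_duplex must lie inside the duplex")
    if tail_end == "5'":
        return index_in_duplex + 1
    if tail_end == "3'":
        return duplex_length - index_in_duplex
    raise ValueError("tail_end must be \"5'\" or \"3'\"")


def in_latch_window(position: int, window=(8, 9)) -> bool:
    """True if the position is one of the refined latch-zone positions (8 or 9)."""
    return position in window
```

  • With a 12-base-pair duplex and a 3' tail, the nucleotide at 0-based index 3 sits at position 9 from the tail and therefore falls inside the window.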
  • the present method comprises measuring the electric current across the nanopore while the base pair or base mismatch is positioned within the first channel to obtain a measurement based on the current.
  • the method further comprises comparing the current measurement to a reference, and then identifying the single nucleotide based on the comparison. It will be appreciated that current may be measured and reported in variety of ways under a variety of parameters.
  • the current measurement may be a measurement of ion current flow through the nanopore, which may be the direct current (DC) flow or the alternating current (AC) flow.
  • AC phase-sensitive detection can be used to measure the conductance of the ion channel, while simultaneously applying a DC bias to electrostatically control the binding affinity and kinetics of charged molecules.
  • a low amplitude AC signal ( ⁇ 10 mV rms) allows the protein-DNA interaction to be measured in the absence of large DC fields, thereby reducing the effects of electroosmosis, electrophoresis, and protein deformation.
  • the current may be measured at a fixed time, in which case the reference is a measurement of a current at a fixed time when a control nucleic acid having a control base pair or a control base mismatch is independently positioned within the nanopore in a manner that causes the control base pair or the control base mismatch to be positioned within the first channel.
  • a "fixed" time includes a specific time point, as well as a finite period of time. Measurement of current at a fixed time may allow for the differentiation of particular base pairs (e.g., G-C vs. A-T), abasic sites, and nucleotide analogs based on the change and duration of current flow through the nanopore.
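  • The comparison of a fixed-time current measurement against control measurements can be sketched as a nearest-reference lookup. Everything in this example is an assumption for illustration: the function name, the tolerance parameter, and especially the reference values, which are invented numbers rather than measured data.

```python
def identify_by_reference(measured, references, tolerance):
    """Return the label of the closest reference value, or None if none is within tolerance.

    `references` maps a label (e.g. "G-C") to the current measured for a control
    nucleic acid carrying that base pair or mismatch in the first channel.
    """
    label, value = min(references.items(), key=lambda item: abs(item[1] - measured))
    return label if abs(value - measured) <= tolerance else None


# Invented normalized currents (I / open channel current), for illustration only.
example_references = {"G-C": 0.150, "A-T": 0.140}
```

  • Calling identify_by_reference(0.141, example_references, tolerance=0.003) selects the "A-T" control, while a measurement far from every reference yields None rather than a forced assignment.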
  • control nucleic acid may comprise a first control nucleic acid strand and a second control nucleic acid strand.
  • the first control nucleic acid strand may comprise a sequence that is identical to the sequence of the analyte nucleic acid at all nucleotide positions other than the position corresponding to the single nucleotide (as described above), and has a first nucleotide at the position corresponding to the single nucleotide.
  • the first nucleotide of the first control nucleic acid strand may be either the same as or different than the single nucleotide of the analyte nucleic acid.
  • the second control nucleic acid may comprise a sequence that is identical to the sequence of the probe nucleic acid at all nucleotide positions other than the position corresponding to the complementary nucleotide, and that has a second nucleotide at the position corresponding to the complementary nucleotide.
  • the second nucleotide of the second control nucleic acid strand may be either the same as or different than the complementary nucleotide of the probe nucleic acid.
  • the first and second control nucleic acid strands are hybridized to form a hybridized control nucleic acid that includes either a control base pair or a control base mismatch comprising the first and second nucleotides.
  • the control base pair or control base mismatch of the hybridized control nucleic acid sequence may be any suitable base pair or base mismatch, such as those described above with respect to the analyte and probe nucleic acid sequences.
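  • The construction of the two control strands, each identical to its counterpart except at the position of interest, can be sketched as follows. This is a minimal illustration restricted to the four canonical bases; the helper names are hypothetical, and modified bases such as mC would need a richer representation than single characters.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitute(sequence: str, index: int, nucleotide: str) -> str:
    """Copy `sequence`, replacing only the nucleotide at 0-based `index`."""
    if not 0 <= index < len(sequence):
        raise ValueError("index outside sequence")
    return sequence[:index] + nucleotide + sequence[index + 1:]


def reverse_complement(sequence: str) -> str:
    """Fully complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def control_pair_type(first_nucleotide: str, second_nucleotide: str) -> str:
    """Classify the two control nucleotides as a Watson-Crick base pair or a mismatch."""
    if COMPLEMENT.get(first_nucleotide) == second_nucleotide:
        return "base pair"
    return "base mismatch"
```

  • For example, substituting a single position in the first control strand and leaving the second control strand fully complementary to the original analyte produces a control base mismatch at that position only.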
  • the current measurement is a measurement of current modulating as a function of time
  • the reference is a current modulation signature corresponding to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
  • Measurement of current modulation as a function of time may permit discrimination of base pair mismatches, such as mismatches involving epigenetically modified nucleotides.
  • mismatches may include, but are not limited to, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch and a C-caC mismatch.
  • the latch zone of α-hemolysin may be used to distinguish, with single-molecule resolution, a C-C or C-A mismatch from a canonical C-G base pair at a specific site within a short sequence from codon 12 of the KRAS gene.
  • the identification is based on unique two-state modulating current signatures, observed during residence of the DNA inside the α-HL vestibule, that are attributable to base-flipping at C-A and C-C mismatch sites. Specifically, upon capture of the DNA duplex, attenuation of the measured current may be observed due to an immediate decrease in the ion flux through the pore. Proximity of the C-C mismatch to the α-HL latch constriction when the DNA resides inside the pore leads to distinct modulation of current between two states, producing a "modulation signature." The two states that comprise the modulation signature typically differ in amplitude and periodicity.
  • the frequency of the current modulation and the current amplitudes for the two states are unique to the mismatch (C-C or C-A) and readily permit discrimination between the C-C and C-A mismatches and the fully complementary duplex (Johnson et al., J. Am. Chem. Soc., 138: 594-603 (2016)).
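  • A two-state modulation signature of the kind described above can be summarized by the mean current of each state and the rate of switching between them. The sketch below assigns samples to states with a single user-supplied threshold, which is an assumed simplification; the function name and dictionary keys are hypothetical, and real traces would typically need filtering before thresholding.

```python
from statistics import mean


def modulation_signature(trace_pa, sample_rate_hz, threshold_pa):
    """Summarize a two-state current trace.

    Samples at or above `threshold_pa` are assigned to the upper state, the rest
    to the lower state. Returns the mean current of each state and the number of
    state switches per second of recording.
    """
    samples = list(trace_pa)
    if not samples:
        raise ValueError("trace is empty")
    states = [current >= threshold_pa for current in samples]
    upper = [c for c, s in zip(samples, states) if s]
    lower = [c for c, s in zip(samples, states) if not s]
    switches = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return {
        "upper_pa": mean(upper) if upper else None,
        "lower_pa": mean(lower) if lower else None,
        "switches_per_s": switches * sample_rate_hz / len(samples),
    }
```

  • Two duplexes whose summaries differ in state amplitudes or switching rate would, under this simplified view, carry different modulation signatures; a fully complementary duplex would be expected to show no switching at all.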
  • Measurement of current modulation as a function of time may also allow for analysis of the kinetics of localized conformational changes at a mismatched base pair in DNA, which may be attributed to a single base-flipping in and out of the helix at the mismatch site (Johnson et al, Faraday Discuss., Sept. 20, 2016 (Epub ahead of print); and Johnson et al, J. Am. Chem.
  • kinetics of base-flipping of a cytosine-cytosine pair situated at the latch constriction of α-HL may be significantly altered when one of the cytosine bases in the mismatch is modified at the carbon-5 position.
  • measurement of current modulation as described herein may provide information regarding base-flipping kinetics which allow for discrimination of duplexes containing a single mC, hmC, fC, or caC base.
  • the reference desirably is a current modulation signature which corresponds to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
  • the reference may be a known current modulation signature corresponding to a particular mismatch, such as, for example, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch, or a C-caC mismatch.
  • the present disclosure also provides methods for identifying a single nucleotide in the genome of an organism, which comprise amplifying a portion of the genome comprising a single nucleotide to form an analyte nucleic acid, and then performing a method as described herein to identify the single nucleotide.
  • the organism may be any suitable organism, such as those described herein (e.g., a bacterium, a virus, a parasite, an insect, a bird, or a mammal (e.g., cow, pig, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, mouse, monkey, chimpanzee, gorilla or human)).
  • the organism may be a human, and any suitable portion of the human genome may be analyzed by the aforementioned method.
  • the single nucleotide may be any such nucleotide described herein, e.g., cytosine (C), guanine (G), adenine (A), thymine (T), 5-methylcytosine (mC), 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxylcytosine (caC).
  • the single nucleotide may be located at a site corresponding to a single nucleotide polymorphism (SNP).
  • a SNP is a variation at a single nucleotide that occurs at a specific position in the genome which is present to some appreciable degree within a population (e.g., greater than 1%). While most SNPs have no effect on the health or development of a particular organism, many SNPs underlie susceptibility to certain diseases (e.g., Alzheimer's disease).
  • the analyte nucleic acid is RNA
  • the above-described nanopore may be used to identify single nucleotides, as described above for DNA, including modified RNA nucleotides, such as those described in The RNA
  • a portion of the genome comprising a single nucleotide may be amplified using any suitable method known in the art, such as, for example, polymerase chain reaction (PCR), and subjected to the above-described nanopore analysis as illustrated in Figure 17.
  • custom primers may be designed to amplify desired regions of DNA to be analyzed within the latch zone of α-HL, which can position the amplified nucleic acid sequence at position 9 of the latch zone.
  • one 8-mer primer may be used to properly position the amplified nucleic acid within the latch, and the second primer may be of standard length (e.g., 17-21 nucleotides).
  • PCR using short primers has been previously described (Afonina et al, Nucleic Acids Res., 25: 2657 (1997)).
  • a long tether terminated in a cholesterol tag may be added to the 5' end, which allows for pre-concentrating a sample on the lipid bilayer (Smith et al., Frontiers in Bioeng. Biotech., 3: 91 (2015); doi: 10.3389/fbioe.2015.00091).
  • a 3'-homopolymer DNA tail may be added to the amplified nucleic acid via terminal transferase, or, alternatively, T4-RNA ligase may be used for blunt-end to single-stranded DNA ligation with a 5'-activated single-stranded DNA.
  • the sample may be cartridge purified and submitted to nanopore analysis as described above.
  • This example demonstrates a method of differentiating between a G-C base pair and an A-T base pair in double-stranded DNA using an α-HL nanopore system.
  • the hairpins contained a 12-mer duplex stem with a 5 ⁇ - ⁇ -3 ⁇ tetraloop and were synthesized from commercially available phosphoramidites (Glen Research, Sterling, VA) by the DNA-Peptide Core Facility at the University of Utah.
  • the purities of the oligodeoxynucleotides were determined by analytical ion-exchange HPLC, running the previously mentioned buffers and method, with the exception that the flow rate was 1 mL/min. After purification, all oligodeoxynucleotides were annealed by incubating them at 90 °C in analysis buffer electrolyte solution for five minutes and then rapidly cooling on ice. The duplex DNA was annealed by incubating at 90 °C in analysis buffer electrolyte solution for five minutes and then cooling slowly in a water bath to room temperature. The prepared samples were stored in a -20 °C freezer before they were used in other experiments.
  • the first hairpin duplex (“hp-1"), which was also used as an internal standard in other experiments described below, included six base pairs of G-C on the tail side and six base pairs of A-T on the tetraloop side, as shown in Figure 1A.
  • the use of an internal standard allowed analysis of sequence variations between different protein channels that inherently have natural variation of 5% in the open channel current (White et al., J. Am. Chem. Soc., 129: 11766-11775 (2007)), and also provided a method to determine changes in deep blocking currents for each sequence studied (Figure 1B).
  • The hp-1 standard was mixed in a 1:2 ratio with another hairpin duplex ("hp-2") that introduced three G-C base pairs at positions 7, 8, and 9 (Figure 1A), and blocking ion currents were recorded.
  • a glass nanopore membrane (GNM) (radius 800 nm) was constructed as previously described (White et al., supra; and Zhang et al, Anal. Chem., 79, 4778-4787 (2007)).
  • 1,2-Diphytanoyl-sn-glycero-3-phosphocholine (DPhPC) bilayers spanning across the orifice of the GNM were prepared as previously described (White et al., supra).
  • a proper bilayer was determined by a resistance of approximately 200 GΩ, a value consistent with previous reports (White et al., supra).
  • the protein α-HL was diluted to 1 mg/mL in ultra-pure water and the DPhPC was dissolved in decane to a concentration of 10 mg/mL, both of which were stored at -80 °C.
  • a pipette holder with a pressure gauge and a 10-mL gas-tight syringe was used to attach the GNM to the direct current (DC) system.
  • Two Ag/AgCl electrodes were positioned inside and outside of the GNM to apply a voltage.
  • a plastic pipette tip was used to paint the DPhPC solution (1 µL, 10 mg/mL) on the GNM surface.
  • the internal control was added to the cis side of the chamber with a final concentration of 5 µM.
  • another sample was added to the same protein channel at a concentration of 10 µM to allow the comparison of the current levels between two oligodeoxynucleotides.
  • data were collected from three individual protein channels and greater than 200 events were collected for each protein channel with a 10 kHz low pass filter and a 50 kHz data acquisition rate.
  • the blocking currents (I) were normalized by the open channel current (I₀), yielding plots of I/I₀ (Figure 1C).
  • Current levels and blockage durations for events in the i-t traces were extracted using QUB 1.5.0.31 software and plotted using Origin 9.1. The plots allow comparisons between samples measured with different protein pores. Only events with a deep blockage current (I) of less than 20% of the open channel current (I₀) were considered to be unzipping events, similar to previous work with fishhook hairpin unzipping in the α-HL nanopore (Ding et al., J. Phys. Chem. B, 118: 12873-12882 (2014)).
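  • The normalization and event-selection rule just described (I divided by the open channel current, keeping only events blocked to below 20% of the open channel current) can be written out in a few lines. This is a sketch of the stated rule only, not of the QUB or Origin workflow; the function names are hypothetical.

```python
def normalized_blockades(blocking_currents_pa, open_current_pa):
    """Return I / I0 for each event, so that data from different pores can be compared."""
    return [current / open_current_pa for current in blocking_currents_pa]


def unzipping_events(blocking_currents_pa, open_current_pa, cutoff=0.20):
    """Keep only events whose blocking current is below `cutoff` times the open channel current."""
    return [
        current
        for current in blocking_currents_pa
        if current / open_current_pa < cutoff
    ]
```

  • With an open channel current of 100 pA, events at 15 pA and 19 pA would be retained as unzipping events, while an event at 30 pA would be discarded.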
  • the error bars associated with the difference in blocking currents between the analyte and internal standard strand were determined from three individual protein channels. For the current histograms, either 100 or 150 bins of 0.1 or 0.05 pA widths were used in each plot.
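  • The fixed-width current histograms and the channel-to-channel error estimate described above can be sketched as follows. The function names are hypothetical, the use of the sample standard deviation as the error bar is an assumption (the text does not say which statistic was used), and any numbers passed in would be the experimenter's own data.

```python
from statistics import mean, stdev


def histogram(values_pa, start_pa, bin_width_pa, n_bins):
    """Count values into `n_bins` equal-width bins beginning at `start_pa`.

    Values falling outside the binned range are ignored.
    """
    counts = [0] * n_bins
    for value in values_pa:
        k = int((value - start_pa) // bin_width_pa)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def difference_with_error(analyte_peaks_pa, standard_peaks_pa):
    """Mean and sample standard deviation of (analyte - internal standard), one pair per channel."""
    diffs = [a - s for a, s in zip(analyte_peaks_pa, standard_peaks_pa)]
    return mean(diffs), stdev(diffs)
```

  • Here histogram(values, start, 0.05, 100) would correspond to one of the binning choices mentioned in the text (100 bins of 0.05 pA), and difference_with_error would be called with one peak position per protein channel for the analyte and for the internal standard.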
  • A-T tracks in DNA adopt a conformation with a narrower minor groove and wider major groove compared to classical B-form duplexes leading to a slightly wider duplex.
  • the wider duplex at A-T tracks is expected to block the current more than the duplex section observed when G-C tracks are present in the latch zone. This observation further supports the latch zone of the α-HL nanopore as a detector of sequence variations in dsDNA.
  • Positions 8 and 9 showed the greatest difference in blocking current with a value of 0.5 ± 0.1 pA between the individual A-T and G-C base pairs (Figures 2B and C).
  • G-C vs. G-AP was most discriminated at position 10 (Jin et al, J. Am. Chem. Soc, 135: 19347-19353 (2013); and Johnson et al., Biophys. J., 107: 924-931 (2014)).
  • These previous studies were conducted in a different sequence context that was comprised of G-C and A-T base pairs between the central constriction and the latch zone, unlike the studies described above which comprise only G-C base pairs in this region.
  • This difference in position detected by the latch zone may be the result of poly(G-C) tracks which adopt secondary structures different from the classical B-form helix that can induce a slight elongation of the strand (Timsit et al., Nature, 341: 459-462 (1989)), thereby elongating the duplex in the vestibule.
  • duplexes bearing single-stranded tails comprised of 5'-(CAT)10-3' were tested in the α-HL nanopore system and yielded the same current difference between A-T- and G-C-containing duplexes, as previously observed, suggesting that the composition of the single-stranded tail has little or no effect on the level of discrimination.
  • This example describes a method of differentiating base pair modifications in duplex DNA using an α-HL nanopore.
  • Protein nanopores have been utilized for detecting or sequencing epigenetic markers on DNA (Wang et al., Sci. Rep., 4: 5883 (2014); Wescoe et al., J. Am. Chem. Soc., 136: 16582-16587 (2014); Zeng et al., Chem. Sci., 6: 5628-5634 (2015); Clarke et al., Nat. Nanotechnol., 4: 265-270 (2009); Laszlo et al., Proc. Natl. Acad. Sci. U.S.A., 110: 18904-18909 (2013); and Wallace et al., Chem. Commun., 46: 8195-8197 (2010)).
  • Position 9 was chosen because it gave the best discrimination between G-C vs. A-T base pairs.
  • An 8-oxo-7,8-dihydroguanine (OG) was placed at position 9 (hp-8, Figure 3A), and the blocking current for an OG-C duplex (hp-8) was compared to a standard with a G-C at the same position (hp-6) as described in Example 1. Histograms of the normalized blocking current showed a single peak, suggesting that the latch zone is not capable of differentiating an OG-C vs. G-C base pair in this duplex system (Figure 3A).
  • the internal standard was a strand lacking the methylation modification (hp-10, Figure 3B), and the first test strand included a single mC at position 8 (hp-11, Figure 3B), representing a hemi-methylated strand, which is a rare occurrence in the genome.
  • Analysis of a mixture of hp-10 and hp-11 exhibited only one peak in the current histogram profile (Figure 3B).
  • results of this example demonstrate a method to detect 5-methylcytosine (mC) in specific sequences using wild-type α-HL.
  • This example describes a method of discriminating C-C and C-A mismatches from a C-G base-pair using the α-HL nanopore system.
  • C-C and C-A mismatches located at the latch constriction during dsDNA residence within α-HL result in distinct modulation of the measured current between two states (I1 and I2 for C-C; I1 and I2* for C-A), as illustrated by the representative traces in Figure 4.
  • the open channel current, I0, is observed.
  • the modulation frequency between the two states, as well as the amplitude of the residual currents for each state, is unique to the mismatch under study.
  • the characteristic current-time traces corresponding to duplexes CC9 and CA9 permit immediate identification and discrimination of these molecules, providing significant advantages in comparison to situations where either the unzipping kinetics (Jin et al, Biochemistry, 52: 7870 (2013)) or the current amplitude (Jin et al, J. Am. Chem. Soc, 134: 11006 (2012)) is used to identify structural changes in a duplex.
  • the exponential kinetics of the unzipping process generate a wide range of residence times, which requires hundreds of events to be analyzed in order to generate descriptive kinetics (Sauer-Budge et al., Phys. Rev. Lett., 90: 238101 (2003)).
  • results of this example demonstrate a method to identify a C-C or C-A mismatch from visual inspection of the current signature of a single nanopore capture event, which provides a significant advantage over previous methods.
  • the same result was also observed for a C-C mismatch at position 6, which was situated deeper within the pore vestibule, but still away from the protein walls because it is situated at the widest, internal point of α-HL.
  • modulating current signatures were observed when a mismatch was incorporated into the middle of the duplex structure, far away from the duplex termini, and so a mechanism other than fraying must be considered.
  • the modulating current signatures observed may be a result of base-flipping, such that extra-helical cytosine and adenine bases are able to interact with lysine residues at the latch constriction of α-HL.
  • A proposed model for the various interactions for the CC9 duplex with α-HL is shown in Figure 8.
  • the model system begins with current amplitude I0, which corresponds to the open channel current.
  • the duplex is driven into the vestibule with a rate constant that is dependent on the concentration in the bulk (15 μM).
  • Io current amplitude
  • the DNA is inside the pore, there are two possible states distinguishable by their specific attenuation of the current.
  • these states are assigned to cases where one of the mismatched C-C bases is intra-helical (I1) or extra-helical (I2).
  • After a period of time in the intra-helical state, the DNA unzips into its constituent strands and the pore returns to the open state. Unzipping from I2 is not observed.
  • this model can be described as a simple Markov chain (Privault, N., Understanding Markov Chains: Examples and Applications; Springer: New York, 2013), and using the overall dwell times in each state (either I0 (the open channel current), I1 (intra-helical), or I2 (extra-helical)), the individual rate constants were extracted for each of the transition pathways.
  • duplex CC9 entered the pore with a rate constant that was DNA
  • the transitions between the I2 and I1 states may be indicative of the base-flipping process, with state I1 corresponding to a B-form DNA structure with the mismatched CC pair intra-helical.
  • This assumption is based on the observation that the residual current of the I1 state was identical to the residual current of the sole state observed for a fully complementary duplex, indicating that in these two scenarios the DNA conformations are similar, attenuating ion transport through α-HL by the same magnitude.
  • the lifetime of the intra-helical state for the CC mismatch in the above experiments was 15 ± 1 ms.
  • Measuring base-flipping in the context of a confined environment, as described here for the α-HL vestibule, may provide a model for how mismatched base pairs are identified in cells.
  • the latch constriction of α-HL and the toroidal protein proliferating cell nuclear antigen (PCNA), which is required in a number of cell processes that involve base-flipping, are similar, with both consisting of a ring of lysines on the internal surface and with internal diameters of 2.6 nm and 3.2 nm, respectively (Ivanov et al, Nucleic Acids Res., 34: 6023 (2006); Zhou, Y. and Hingorani, M. M., J. Biol.
  • PCNA Protein-binding protein
  • the primary role of PCNA is thought to be that of a molecular scaffold, forming a ring around a dsDNA duplex and directing repair proteins, although the precise role of PCNA is still poorly understood (Paunesku et al., Int. J. Radiat. Biol , 77: 1007 (2001); Masih et al, Nucleic Acids Res., 36: 67 (2008)).
  • This example describes a method of detecting a G-G mismatch in duplex DNA.
  • This example describes a method of measuring the dynamics of a DNA mismatch at a single base-pair site.
  • Hybridization of the probe sequence, which is fully complementary to the KRAS sequence except at the 9th base as counted from the 3' terminus, generated a single cytosine-cytosine mispair that was specifically placed to align with the latch constriction of α-HL when the DNA duplex was captured by the pore, forming a molecular rotaxane, as shown in Figure 12A.
  • a DNA molecule was generated identical to that shown in Figure 12A, with the exception of a mC, hmC, fC or caC base replacing one of the cytosines in the duplex at the 9th position in the sequence as counted from the 3' terminal of the shorter (23 base) strand. Initially, the cytosine in the shorter probe strand was replaced to generate a C-X mismatch (where X was either mC, hmC, fC, or caC) in proximity to the latch constriction of α-HL upon capture of the DNA.
  • the C-hmC, C-fC, and C-caC base pairs all presented modulating current signatures between two states, but the lifetimes of each state were dramatically altered relative to the C-C duplex.
  • C-hmC the extra-helical lifetime significantly decreased relative to C-C
  • C-caC the intra-helical lifetime increased, but not to the same extent as C-mC.
  • the C-fC duplex exhibited two distinct event types. In type I events, the extra-helical lifetimes were extremely short relative to duplexes with the C-C base pair, and in type II events, the extra-helical lifetimes were extremely long relative to duplexes with the C-C base pair. The ratio of type I to type II events was approximately 5:1, suggesting that duplexes containing the fC base, or the fC base itself, may exist in two uniquely identifiable forms.
  • results of this example demonstrate that distinct kinetics for different cytosine modifications can be used to determine the identity of an individually captured duplex from a mixed sample and determine the ratio of duplex concentrations.
  • the modified base at the C-X mismatch was flanked by a 5'G and a 3'T, and in the target strand, the modified base at the X:C mismatch was flanked by a 5'A and a 3'C.
  • the position of the mismatch site relative to the latch constriction of α-HL remained unchanged, while the pore itself was seven-fold symmetric (Song et al, supra).
  • the C-mC mismatch exhibited average state lifetime constants τ1 (mean) and τ2 (mean) of 46.5 and 41.3 ms, respectively, while the mC-C mismatch exhibited τ1 (mean) and τ2 (mean) values of 59.1 and 43.8 ms, respectively.
  • Changing the context of the mC base from A(mC)C to G(mC)T resulted in a 27% increase in τ1 (mean). Changes to the time constant of the third state, τ3 (mean), also were observed, with a significant decrease when mC was placed in the A(mC)C context.
  • the increase in τ1 (mean) indicated that an A and C at either side of the methylcytosine base work to stabilize the intra-helical state relative to flanking T and G pairs.
  • formylcytosine within the context of the DNA duplex, exists in two unique structural forms, with each form having different base-flipping kinetics when confined at the latch constriction of α-HL.
  • the two event types observed for fC-containing duplexes may be the result of hydration of the formyl group in aqueous solution.
  • Aldehydes undergo nucleophilic addition in water to form hydrates, with both the hydrate and formyl structures existing in an equilibrium defined by the relative stabilities of the two structures (Hilal et al, QSAR Comb. Sci., 24(5): 631-638 (2005)).
  • the existence of formylcytosine base in hydrate form was previously measured at very low quantities (0.5%) by Carell and co-workers via mass spectrometry (Pfaffeneder et al, Angew Chemie Int Ed., 60(31): 7008-7012 (2011)).
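By way of illustration only, the three-state description in the excerpts above (open channel I0, intra-helical I1, extra-helical I2, treated as a simple Markov chain) can be sketched in code. The sketch below assumes that a capture event has already been reduced to a time-ordered list of state dwells, and it estimates each rate constant as the number of observed transitions divided by the total time spent in the originating state. The function name, the data layout, and the example dwell values are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def estimate_rate_constants(dwells):
    """Estimate k(i -> j) as (number of i -> j transitions) / (total time spent in i).

    `dwells` is a time-ordered list of (state_label, duration_in_seconds).
    The final dwell has no observed exit, so it is left out of the totals.
    """
    time_in_state = defaultdict(float)
    transitions = defaultdict(int)
    for (state, duration), (next_state, _) in zip(dwells, dwells[1:]):
        time_in_state[state] += duration
        transitions[(state, next_state)] += 1
    return {
        (src, dst): count / time_in_state[src]
        for (src, dst), count in transitions.items()
    }


# Hypothetical capture event: open channel (I0), duplex enters (I1),
# a base flips out (I2), flips back (I1), then the duplex unzips (I0).
example = [("I0", 0.5), ("I1", 0.010), ("I2", 0.020), ("I1", 0.020), ("I0", 0.5)]
rates = estimate_rate_constants(example)
for (src, dst), k in sorted(rates.items()):
    print(f"{src} -> {dst}: {k:.1f} per second")
```

In this made-up event no I2 to I0 transition occurs, so no rate constant is reported for that pathway, which mirrors the statement that unzipping from I2 is not observed. A real analysis would pool many events before estimating rates.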

Abstract

The disclosure is directed to a method for identifying a single nucleotide of an analyte nucleic acid, such as a nucleotide that is part of a base pair or base mismatch, using a protein nanopore system.

Description

METHODS AND SYSTEMS FOR DETECTING VARIATIONS IN DNA
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/245,920, filed on October 23, 2015, and U.S. Provisional Patent Application No. 62/386,954, filed on December 17, 2015, which are hereby incorporated by reference in their entireties.
STATEMENT OF GOVERNMENT SUPPORT
[0002] The invention described herein was made with government support under Grant #R01 GM093099 awarded by the National Institutes of Health. The Government has certain rights to this invention.
BACKGROUND OF THE INVENTION
[0003] The ability to distinguish specific nucleic acid sequence variations is critical to a number of clinical, commercial, and research applications. For example, a biopsy of possible cancerous tissue likely will include a population of cells that may each have a unique set of genome variations (i.e., single nucleotide polymorphisms (SNPs) or mutations) (Burrell et al, Nature, 501: 338 (2013)). Knowledge about these heterogeneous cell populations and their corresponding genetic variations is critical for the diagnosis and treatment of a variety of diseases, including cancer. In this regard, low-level genetic variations, if they occur at critical sites, can be the driver mutations of cancer phenotypes.
[0004] The ability to identify the presence and position of mismatched bases in a DNA sequence at the single molecule level also would be valuable in understanding how enzymes incorrectly incorporate DNA bases into newly replicated strands, what effect this has on genomic fidelity, and how mismatched base-pairs are recognized and repaired (Sancar et al., Annu. Rev. Biochem., 73: 79 (2004)). Mismatched base-pairs occur at a frequency of 10⁻⁶ to 10⁻⁸ per nucleotide and lead to harmful mutations when left unrepaired (Kunkel, T.A., J. Biol. Chem., 279: 16895 (2004); and Want et al., Proc. Natl. Acad. Sci. USA, 108: 17644 (2011)). The process of mismatch repair at the molecular level is still poorly understood, with a variety of proteins and multiple mechanisms involved. As mismatch sites have little effect on local or global conformation of the helical structure, it is believed that repair proteins access bases to be excised through "base-flipping," in which a single base rotates from an intra-helical position through to an exposed extra-helical position (Stivers, J.T., Chem.-Eur. J., 14: 786 (2008)). Base-flipping is thought to be especially prominent at mismatch sites that, in general, form less stable base pairs than canonical Watson-Crick pairs. The ability to discriminate between cytosine and epigenetically modified analogs thereof, such as 5-methylcytosine (mC), also is critical to understanding how genes regulate cell function and development.
[0005] Currently, next generation sequencing methods are used to detect genetic variations in heterogeneous cell populations (Burrell et al., Nature, 501: 338-345 (2013)). High sequence coverage (up to 1000x), however, is required to identify low levels of SNPs by current next generation sequencing methods, making the technology cost-prohibitive to conduct on a routine basis (Sims et al., Nat. Rev. Genet., 15: 111 (2014)). Despite its biological significance in DNA repair, base-flipping also has been consistently challenging to measure (Stivers, supra). The most common methodology employs nuclear magnetic resonance spectroscopy (NMR), where the exchange of imino protons occurs when the base pair is open to the solution (i.e., extra-helical). The detection of base-flipping using NMR is limited to those bases that contain an imino proton (G and T) (Bhattacharya et al., Nucl. Acids Res., 30: 4740 (2002)), and computational simulations suggest that intra-helical lifetimes reported from NMR measurements may not be a true representation of base-flipping, because solvent access to imino protons can occur when the base rotates just 30° from its intra-helical position (Giudice et al., Nucl. Acids Res., 31: 1434 (2003); and Priyakumar, U.D. and MacKerell, A.D., Chem. Rev., 106: 489 (2006)). Indeed, recent measurements of base-flipping with single-molecule fluorescence in the absence of proteins have generated debate as to whether intra-helical lifetimes are much longer than previously thought (Yin et al., Proc. Natl. Acad. Sci. USA, 111: 8043 (2014)).
[0006] While cytosine and mC can be readily discriminated with high precision using bisulfite sequencing (Grunau et al., Nucl. Acids Res., 29(13): e65 (2001)), the development of suitable assays for discriminating the products of mC oxidation remains a significant challenge. Variations of bisulfite sequencing, in which the target of identification (hmC, fC, or caC) is first selectively modified through chemical or enzymatic reaction, have been described (Booth et al., Nat. Chem., 6(5): 435-440 (2014); Sun et al., Molecular Cell, 57(4): 750-761 (2015); Lu et al., J. Am. Chem. Soc., 135(25): 9315-9317 (2013); Lu et al., Cell Res., 25(3): 386-389 (2015); Yu et al., Cell, 149(6): 1368-1380; Booth et al., Science, 336(6083): 934-937 (2012); and Song et al., Nat. Biotech., 29(1): 68-72 (2011)), but in order to be completely reliable the conversion reactions require a 100% reaction yield. This is especially important given the relatively low abundance of oxidative products of mC, where hmC, fC and caC are found at levels of just ~0.5%, ~0.002% and 0.0003%, respectively, of all cytosine in mouse ES cells (Song et al., Nat. Biotech., 30(11): 1107-1116 (2012)).
[0007] As such, there remains a need for methods that distinguish single nucleotide changes in nucleic acid sequences, such as low-level variations, epigenetic chemical modifications, and mismatched base pairs. The present disclosure provides such methods.
BRIEF SUMMARY OF THE INVENTION
[0008] The present disclosure provides a method for identifying a single nucleotide of an analyte nucleic acid. The method comprises: (a) hybridizing the analyte nucleic acid to a probe nucleic acid to form a hybridized analyte nucleic acid that includes a double-stranded portion comprising either (i) a base pair comprising the single nucleotide and a complementary nucleotide on the probe nucleic acid or (ii) a base mismatch comprising the single nucleotide and a non-complementary nucleotide on the probe nucleic acid, (b) contacting the hybridized analyte nucleic acid with a nanopore within a membrane, the nanopore comprising at least a first region defining a first channel with a diameter sufficient to allow passage of double-stranded nucleic acids, a second region proximate to the first region and defining a second channel with a diameter that is larger than the first diameter, and a third region proximate to the second region and spaced from the first region and defining a third channel with a diameter sufficient to allow passage of single-stranded nucleic acids but not sufficient to allow passage of double-stranded nucleic acids, (c) applying an electrical voltage across the nanopore, whereupon the hybridized analyte nucleic acid passes into the nanopore, and the base pair or the base mismatch is positioned within the first channel, (d) measuring the electric current across the nanopore while the base pair or base mismatch is positioned within the first channel to obtain a measurement based on the current, (e) comparing the measurement of step (d) to a reference, and (f) identifying the single nucleotide based on the comparison of step (e).
[0009] In one aspect of the above method, the measurement based on the current of step (d) is a measurement of a current at a fixed time, and the reference of step (e) is a measurement of a current at a fixed time when a control nucleic acid having a control base pair or a control base mismatch is independently positioned within the nanopore in a manner that causes the control base pair or the control base mismatch to be positioned within the first channel.
[0010] In another aspect of the above method, the measurement based on the current of step (d) is a measurement of current modulating as a function of time, and the reference of step (e) is a current modulation signature corresponding to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
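By way of illustration only, the comparison and identification of steps (e) and (f), in the aspect where the measurement is a current at a fixed time, can be sketched as a lookup against control references. The sketch assumes that both the measurement and the references are expressed as normalized blocking currents (I/I0). The function name, the tolerance, and the reference values are hypothetical and are not taken from the disclosure.

```python
def identify_by_current(measured, references, tolerance):
    """Return the reference label closest to `measured`.

    Returns None when no reference lies within `tolerance`, so that an
    unfamiliar current level is reported as unidentified rather than forced
    onto the nearest control.
    """
    label = min(references, key=lambda name: abs(references[name] - measured))
    return label if abs(references[label] - measured) <= tolerance else None


# Hypothetical normalized blocking currents (I/I0) for two control duplexes.
controls = {"G-C": 0.200, "A-T": 0.195}

print(identify_by_current(0.1952, controls, tolerance=0.002))
print(identify_by_current(0.2500, controls, tolerance=0.002))
```

With these made-up values the first call selects the A-T control and the second returns None. In practice the tolerance would need to reflect the measured spread of each control population.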
[0011] The present disclosure also provides a method for identifying a single nucleotide in the genome of an organism which comprises amplifying a portion of the genome comprising a single nucleotide to form an analyte nucleic acid, and then performing the above-described method to identify the single nucleotide.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
[0012] Figure 1A is a diagram illustrating the hairpin duplex sequences hp-1, hp-2, hp-3, and hp-4 described in Example 1. Figure 1B is a diagram illustrating a representative structure of the α-hemolysin nanopore and the hairpin duplex based on PDB 7AHL, 1JVE, and 4HW1. Figure 1C is a current blocking histogram for a mixture of the hp-1 and hp-2 duplexes. Figure 1D is a current blocking histogram for a mixture of the hp-1 and hp-3 duplexes. Figure 1E is a current blocking histogram for a mixture of the hp-1 and hp-4 duplexes. For Figures 1C-1E, the hp-1 sequence was used as an internal standard, and it was always mixed with the analyte strand (hp-2, hp-3 and hp-4) in a ratio of 1:2, respectively; therefore, the smaller peak area always represents hp-1. Figure 1F is a table listing peak-to-peak ΔI/I0 and current differences measured between the internal standard (hp-1) and the analyte strand. The error in each value represents the standard deviation of peak-to-peak widths from three individual protein channels. The data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias. The histograms represent > 200 recorded events.
[0013] Figure 2A is a diagram illustrating the sequences for the hairpin duplexes hp-1, hp-5, hp-6, and hp-7 described in Example 1. Figure 2B is a current blocking histogram for a mixture of the hp-1 and hp-5 hairpin duplexes. Figure 2C is a current blocking histogram for a mixture of the hp-1 and hp-6 hairpin duplexes. Figure 2D is a current blocking histogram for a mixture of the hp-1 and hp-7 hairpin duplexes. For Figures 2B-2D, the hp-1 sequence was used as an internal standard and it was always mixed with the analyte strand (hp-5, hp-6 and hp-7) in a ratio of 1:2, respectively; therefore, the smaller peak area always represents hp-1. Figure 2E is a table listing peak-to-peak ΔI/I0 and current differences measured between the standard (hp-1) and the analyte. The error in each value represents the standard deviation of peak-to-peak widths from three individual protein channels. The data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias. The histograms represent > 200 recorded events.
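By way of illustration only, the peak assignment and peak-to-peak quantities described for Figures 1 and 2 can be sketched as follows. Because the internal standard and the analyte are mixed in a 1:2 ratio, the smaller population is taken to be the standard. The sketch separates the two populations at the largest gap between sorted blocking currents, which is a crude stand-in for fitting the two histogram peaks and assumes the populations do not overlap and have unequal sizes. The function name, the event values, and the open channel current are hypothetical.

```python
from statistics import mean


def peak_to_peak(blocking_currents_pa, open_channel_pa):
    """Return (delta_I in pA, delta_I / I0) between analyte and standard peaks.

    The populations are split at the largest gap between sorted values, and
    the smaller population is assigned to the internal standard.
    """
    values = sorted(blocking_currents_pa)
    gaps = [b - a for a, b in zip(values, values[1:])]
    split = gaps.index(max(gaps)) + 1
    low, high = values[:split], values[split:]
    standard, analyte = (low, high) if len(low) < len(high) else (high, low)
    delta_i = mean(analyte) - mean(standard)
    return delta_i, delta_i / open_channel_pa


# Hypothetical events: three from the standard, six from the analyte (1:2).
events_pa = [20.0, 20.1, 19.9, 20.5, 20.6, 20.4, 20.5, 20.55, 20.45]
delta_i, delta_ratio = peak_to_peak(events_pa, open_channel_pa=100.0)
print(f"delta I = {delta_i:.2f} pA, delta I / I0 = {delta_ratio:.4f}")
```

The sign of the result indicates whether the analyte peak sits above or below the standard. A real analysis of histograms with more than 200 events would more likely fit each peak rather than split at a gap.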
[0014] Figure 3A includes a diagram illustrating the hairpin duplex sequences hp-6, hp-8, and hp-9, as well as blocking current histograms comparing G-C vs. OG-C or G-C vs. 7-deazaguanine (7):C when placed at position 9 of the α-HL latch zone. The current blocking histograms for both studies identified a single population, and blocking current histograms for hp-6 vs. hp-8 and hp-6 vs. hp-9 were nearly identical. These results suggest the latch zone of wild-type α-HL cannot resolve these base pairs. Figure 3B includes a diagram illustrating the hairpin duplex sequences hp-10, hp-11, and hp-12, as well as current blocking histograms comparing the epigenetic marker mC with C-containing duplexes. These blocking current histograms show that one mC could not be distinguished from one C; however, two mCs could be differentiated from the parent strand. The data were recorded with 10 mM KPi (pH 7.4), 1.00 M KCl, at 22 ± 1 °C, and a 100 mV (trans vs. cis) bias. The histograms represent > 200 recorded events.
[0015] Figure 4A is a diagram of a duplex section of DNA driven into the α-HL vestibule up to the 1.4 nm central constriction. A mismatch was located at the 9th base-pair from the 3' end of the shorter strand, placing it in close proximity to the 2.6 nm latch constriction. Figure 4B is a plot of typical current-time traces in a duplex containing a Watson-Crick CG base pair or a CA, CC, or CT mismatch located at position 9. Figure 4C is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CC9-containing duplex. Figure 4D is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CA9-containing duplex. Figure 4E is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CT9-containing duplex. Figure 4F is a plot showing an expanded view of the current signature observed when DNA resides in α-HL for a CG9-containing duplex. Data were recorded in a 10 mM phosphate (pH 7.5), 0.25 M KCl solution. A voltage of 120 mV was applied across the protein channel. For clarity, continuous open channel data have been excised, as denoted by an axis break (//). The relative current levels (I1, I2, and I2*) observed for the different mismatch-containing duplexes were confirmed by analyzing multiple duplexes with the same protein channel.
[0016] Figure 5A is a plot showing the typical individual events observed for CC and CA mismatches, where the current modulates between states, labeled I1 and I2 for the CC mismatch and I1 and I2* for the CA mismatch. For comparison, a typical event for the complementary CG duplex is also shown. The CT mismatch gives a single state current signature of identical current magnitude to the complementary duplex. Expanded views for duplexes CC9 and CA9 are shown in Figure 5B and Figure 5C, respectively, and illustrate the typical state lifetimes. Figure 5D and Figure 5E are histograms generated from dwells in a unique state that show the relative current amplitudes and lifetimes for the CC9 and CA9 mismatch-containing duplexes, respectively. State current and lifetime values were extracted using QUB software. Data were recorded in a 10 mM phosphate, 0.25 M KCl, pH 7.5 buffer. A voltage of 120 mV was applied across the protein channel.
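By way of illustration only, the extraction of state dwells from a modulating current signature can be sketched with a single threshold. This is not the QUB analysis referred to above; it is a simplified stand-in that assumes a two-level signal sampled at a fixed interval, with a threshold placed between the two levels. The function name, the level labels, and the sample values are hypothetical, and which level corresponds to I1 or I2 depends on the experiment.

```python
from itertools import groupby


def idealize(trace_pa, threshold_pa, dt_ms):
    """Collapse a sampled current trace into (level, dwell_ms) runs.

    Samples at or above the threshold are labeled "high" and the rest "low";
    consecutive samples with the same label are merged into one dwell.
    """
    levels = ("high" if sample >= threshold_pa else "low" for sample in trace_pa)
    return [(level, sum(1 for _ in run) * dt_ms) for level, run in groupby(levels)]


# Hypothetical residual-current samples (pA) taken every 1 ms.
trace = [10.0, 10.0, 10.0, 7.0, 7.0, 10.0, 10.0]
for level, dwell in idealize(trace, threshold_pa=8.5, dt_ms=1.0):
    print(level, dwell)
```

The resulting dwell list is the kind of input from which state lifetime histograms can be built. Noisy recordings would usually need filtering or a model-based idealization before thresholding.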
[0017] Figure 6A is a schematic of KRAS duplex sequences in which a mismatched CC base pair was positioned 3-4 bases away from the α-HL latch constriction. Figure 6B is a series of representative current-time trace plots illustrating that identification of the CC mismatch in the KRAS sequence was restricted to instances where the mismatch was in close proximity to the latch constriction during DNA residence in the α-HL vestibule. Figure 6C, Figure 6D and Figure 6E are histograms of the current states for duplexes CC6, CC9, and CC13, respectively, and illustrate that the I2 state is only observed when the CC is in proximity to the latch constriction. For clarity, continuous open channel data have been excised from the current-time traces, as denoted by an axis break (//).
[0018] Figure 7A, Figure 7B and Figure 7C are a series of histograms illustrating that a duplex containing a CC mismatch in proximity to the latch constriction of α-HL resides significantly longer in the nanopore than the fully complementary duplex. Shown are histograms of the total residence time prior to unzipping (tres, see Figure 5) for a duplex with a CC mismatch at position 9 (Figure 7A), the fully complementary duplex (Figure 7B), and a duplex with a CC mismatch at position 13 (Figure 7C). The residence time constant for the duplex CC6 was 47 ± 6 ms.
[0019] Figure 8 is a schematic illustrating the proposed model for the interactions of a CC mismatch-containing duplex with α-HL as described in Example 4. Rate constants presented in Figure 8 were extracted from histograms of the total dwell times of each state.
[0020] Figure 9 is a series of diagrams illustrating the structures of the Watson-Crick CG base pair (Figure 9A), the wobble CC mismatch (Figure 9B), the CA wobble mismatch (Figure 9C), and the CT homopyrimidine mismatch (Figure 9D). The CC and CA mismatches are stabilized by just one hydrogen bond each, the CT mismatch by two hydrogen bonds, and the CG complementary base-pair by three hydrogen bonds. The extra stability yielded by the additional hydrogen bonds in the CT mismatch and CG pair may inhibit base-flipping on a time scale shorter than the residence of the duplex within the pore prior to unzipping.
[0021] Figure 10A is a diagram illustrating that the interaction of a CC mismatch with the latch constriction is dependent on pH. Figure 10B is a graph showing the fraction of events with a current signature that modulated between the I1 and I2 states plotted as a function of pH over the range 6 to 7.5. Modulating current capture events were assigned to duplexes that contain the CC mismatch (dominant > pH 7), and single-level current capture events (I1 only) were assigned to duplexes containing the CC+ mismatch (dominant < pH 6). At intermediate pH values, a mixture of event types was observed.
[0022] Figure 11A is a plot of a representative current-time trace highlighting the different event types observed for interactions of duplexes containing a GG mismatch or GC base-pair at position 9 within the α-HL latch zone. Figure 11B shows current event histograms of capture events for experiments with just the GG duplex and both the GG and CG duplexes.
[0023] Figure 12A is a diagram of a DNA duplex driven into the α-HL nanopore (shown in cross-section) under an applied potential where it was held for up to 20 seconds and then ejected by reversing the applied bias as described in Example 7. While resident within the pore, the C-C mismatch site was aligned with the latch constriction of α-HL. Figure 12B shows plots of modulating current signatures observed while DNA resides within the nanopore, where I1 corresponds to a conformation where all bases are intra-helical and I2 corresponds to a conformation where one of the cytosine bases at the mismatch site is extra-helical. Intra- and extra-helical lifetimes are given by t1 and t2, respectively.
[0024] Figure 13A is a series of representative lifetime histograms for states I1 (intra-helical) and I2 (extra-helical), for a single molecule of DNA, from which lifetime constants can be extracted. Figure 13B shows distributions of lifetime constants for states I1 and I2 across a sample of 35 individual duplexes, measured with a single protein channel. Figure 13C is a scatter plot of intra- and extra-helical lifetime constants τ1 and τ2 for individual DNA duplexes measured across three independent α-HL channels (squares, circles, triangles).
[0025] Figure 14 shows plots of representative current time traces from a six second window of a single DNA capture event demonstrating measurement of base-flipping at a C-X mismatch site as described in Example 8, where X is mC, hmC, fC or caC. Two event types (I and II) were observed for fC, with type I comprising 80% of events.
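By way of illustration only, a lifetime constant of the kind extracted from the lifetime histograms of Figure 13A can be sketched under the assumption that dwells in a given state follow a single exponential distribution. Under that assumption the maximum-likelihood estimate of the lifetime constant is simply the mean dwell, and its standard error is approximately the estimate divided by the square root of the number of dwells. The function name and the dwell values are hypothetical, and the disclosure does not state that this estimator was the one used.

```python
from math import sqrt
from statistics import mean


def lifetime_constant(dwells_ms):
    """Return (tau, standard_error) for dwells assumed to be single-exponential.

    tau is the mean dwell; the standard error is tau / sqrt(n).
    """
    tau = mean(dwells_ms)
    return tau, tau / sqrt(len(dwells_ms))


# Hypothetical dwells (ms) in one state for a single captured duplex.
tau, err = lifetime_constant([10.0, 20.0, 30.0])
print(f"tau = {tau:.1f} ms (standard error {err:.1f} ms)")
```

Repeating this for both states of every captured molecule gives one (τ1, τ2) point per duplex, which is the form of data plotted in the scatter plots described here.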
[0026] Figure 15 is a scatter plot of intra-helical (τ1) vs. extra-helical (τ2) lifetime constants for duplexes C-C (squares), C-mC (circles), C-hmC (diamonds), C-fC (triangles), and C-caC (pentagons), as described in Example 8. Each data point represents a base-flipping measurement for a single DNA molecule.
[0027] Figure 16 is a scatter plot that illustrates that base-flipping kinetics at a mismatch site within the α-HL latch are sequence dependent, as described in Example 9. When the flanking base pairs of the cytosine modifications mC (circles), hmC (diamonds) and fC (triangles) are changed from 5'A and 3'C (hollow symbols, data from Figure 15) to 5'G and 3'T (solid symbols), the population centers of the lifetime constants τ1 and τ2 were shifted. Changes to the lifetime constant when changing the sequence context of fC were observed only for the minor event type. No changes were observed for caC. Three independent measurements, i.e., with three different protein channels (hollow, solid, and hatched squares) for the C-C duplex highlighted the negligible variation in population centers expected from experiment to experiment with DNA of the same composition.
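By way of illustration only, assigning an individually captured duplex to a population on the basis of its lifetime constants can be sketched as a nearest-center lookup in the (τ1, τ2) plane. The two centers below reuse the mean lifetime constants quoted earlier for the C-mC and mC-C mismatches; the function name, the use of logarithmic distance, and the test points are assumptions made for this sketch only. The sketch ignores the molecule-to-molecule scatter visible in the plots, so it does not by itself show that two populations are resolvable.

```python
from math import hypot, log

# Population centers (tau1, tau2) in ms, taken from the mean values quoted
# for the C-mC and mC-C mismatches; any other entries would be hypothetical.
CENTERS_MS = {"C-mC": (46.5, 41.3), "mC-C": (59.1, 43.8)}


def nearest_population(tau1_ms, tau2_ms, centers=CENTERS_MS):
    """Return the label of the center closest to (tau1, tau2) in log space.

    Logarithmic distance is an assumption; it weights fold-changes in the
    lifetime constants rather than absolute differences.
    """
    def distance(center):
        return hypot(log(tau1_ms / center[0]), log(tau2_ms / center[1]))

    return min(centers, key=lambda name: distance(centers[name]))


print(nearest_population(58.0, 44.0))
print(nearest_population(46.0, 41.0))
```

A practical classifier would also need the spread of each population, for example to decline a call when a point falls between two centers.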
[0028] Figure 17 is a diagram illustrating a method for applying the latch zone to the discrimination of specific base pairs from a genomic sample.
DETAILED DESCRIPTION OF THE INVENTION
[0029] The present disclosure is based, at least in part, on the discovery of a nanopore- based system for analysis of nucleic acid molecules at the single nucleotide level. Nanopore analysis of nucleic acids involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analyzed with sub-nanometer resolution and without the need for labels or amplification. Described herein is a method for identifying a single nucleotide of an analyte nucleic acid which involves a nanopore-based system.
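By way of illustration only, the monitoring of ionic current described above can be sketched as a search for capture events, i.e., stretches of a recording in which the current falls well below the open channel level. The sketch assumes a trace sampled at regular intervals and a known open channel current; the function name, the threshold fraction, and the sample values are hypothetical.

```python
def detect_events(trace_pa, open_channel_pa, fraction=0.8):
    """Return (start, stop) sample indices of runs below fraction * I0.

    `stop` is exclusive, so trace_pa[start:stop] holds the blocked samples.
    """
    threshold = fraction * open_channel_pa
    events, start = [], None
    for index, current in enumerate(trace_pa):
        if current < threshold and start is None:
            start = index  # the pore has just become blocked
        elif current >= threshold and start is not None:
            events.append((start, index))  # the pore has reopened
            start = None
    if start is not None:
        events.append((start, len(trace_pa)))  # trace ended while blocked
    return events


# Hypothetical trace (pA) with an open channel current of 100 pA.
trace = [100, 100, 20, 22, 21, 100, 100, 19, 100]
print(detect_events(trace, open_channel_pa=100))
```

Each returned interval can then be analyzed for its blocking current level or for modulation between levels while the molecule resides in the pore.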
[0030] As used herein, the terms "nucleobase" and "base" are synonymous and refer to naturally occurring and synthetic heterocyclic moieties commonly known in the art of nucleic acid or polynucleotide technology or peptide nucleic acid technology for generating polymers. Non-limiting examples of suitable nucleobases include: adenine, cytosine, guanine, thymine, uracil, 5-propynyl-uracil, 2-thio-5-propynyl-uracil, 5-methylcytosine, pseudoisocytosine, 2-thiouracil and 2-thiothymine, 2-aminopurine, N9-(2-amino-6-chloropurine), N9-(2,6-diaminopurine), hypoxanthine, N9-(7-deaza-guanine), N9-(7-deaza-8-aza-guanine), N8-(7-deaza-8-aza-adenine), and a tetrahydrofuran. Nucleobases can be linked to other moieties to form nucleosides, nucleotides, and nucleoside/tide analogs. "Nucleoside" refers to a compound consisting of a purine, deazapurine, or pyrimidine nucleoside base, e.g., adenine, guanine, cytosine, uracil, thymine, 7-deazaadenine, 7-deazaguanosine, that is linked to the anomeric carbon of a pentose sugar at the 1' position, such as a ribose, 2'-deoxyribose, 3'-deoxyribose, or a 2',3'-di-deoxyribose. The term "nucleotide" refers to a phosphate ester of a nucleoside, e.g., a mono-, a di-, or a triphosphate ester, wherein the most common site of esterification is the hydroxyl group attached to the C-5, C-2, or C-3 position of the pentose.
[0031] The terms "nucleic acid sequence" or "nucleic acid" refer to a polymer of DNA or RNA, i.e., a polynucleotide or oligonucleotide, which can be single-stranded or double-stranded and which can contain non-natural or altered nucleotides. Nucleotides are typically linked via phosphodiester bonds to form nucleic acids or polynucleotides, though many other linkages are known in the art (e.g., phosphorothioates, boranophosphates, and the like).
Analyte and Probe Nucleic Acid Molecules
[0032] The methods described herein involve, inter alia, hybridizing an analyte nucleic acid to a probe nucleic acid to form a hybridized analyte nucleic acid. The term "analyte" generally refers to a substance that is the subject of chemical, biological, or structural analysis. Thus, an "analyte nucleic acid" refers to a nucleic acid molecule that is subject to an analysis of any suitable property or feature of the nucleic acid. Such properties or features include, but are not limited to, the sequence, secondary structure, or size of the nucleic acid molecule. As used herein, the term "probe" refers to an oligonucleotide that hybridizes specifically to a target sequence in a nucleic acid under conditions that promote hybridization to form a detectable hybrid.
[0033] In some embodiments, the analyte nucleic acid may be a nucleic acid molecule or sequence obtained or derived from a cell of any suitable organism, such as, for example, a bacterium, a virus, a parasite, an insect, a bird, or a mammal (e.g., cow, pig, goat, rabbit, a sheep, a hamster, guinea pig, cat, dog, rat, mouse, monkey, chimpanzee, gorilla, or human). In some embodiments, the analyte nucleic acid molecule may be obtained or derived from a healthy (i.e., non-diseased) organism. In some embodiments, the analyte nucleic acid molecule may be obtained or derived from an abnormal cell or population of cells, such as a neoplasm, cancer, or other diseases which damage or impair cellular function. For example, the analyte nucleic acid molecule may be obtained or derived from any type of neoplasm or cancer, such as, for example, a melanoma, renal cell carcinoma, lung cancer, bladder cancer, breast cancer, cervical cancer, colon cancer, gall bladder cancer, laryngeal cancer, liver cancer, thyroid cancer, stomach cancer, salivary gland cancer, prostate cancer, or pancreatic cancer.
[0034] In some embodiments, the analyte nucleic acid molecule may be obtained or derived from a bacterium, a virus, a parasite, or other infectious agents. For example, the analyte nucleic acid molecule may be obtained or derived from any of the following bacteria: Actinomyces, Anabaena, Bacillus, Bacteroides, Bdellovibrio, Caulobacter, Chlamydia, Chlorobium, Chromatium, Clostridium, Cytophaga, Deinococcus, Escherichia, Halobacterium, Heliobacter, Hyphomicrobium, Methanobacterium, Micrococcus, Mycobacterium, Mycoplasma, Myxococcus, Neisseria, Nitrobacter, Oscillatoria, Prochloron, Proteus, Pseudomonas, Rhodospirillum, Rickettsia, Salmonella, Shigella, Spirillum, Spirochaeta, Staphylococcus, Streptococcus, Streptomyces, Sulfolobus, Thermoplasma, Thiobacillus, and Treponema.
[0035] In some embodiments, the analyte nucleic acid molecule may be obtained or derived from any of the following viruses: Arenaviridae, Arterivirus, Astroviridae, Baculoviridae, Badnavirus, Barnaviridae, Birnaviridae, Bromoviridae, Bunyaviridae, Caliciviridae, Capillovirus, Carlavirus, Caulimovirus, Circoviridae, Closterovirus, Comoviridae, Coronaviridae (e.g., Coronavirus, such as severe acute respiratory syndrome (SARS) virus), Corticoviridae, Cystoviridae, Deltavirus, Dianthovirus, Enamovirus, Filoviridae (e.g., Marburg virus and Ebola virus (e.g., Zaire, Reston, Ivory Coast, or Sudan strain)), Flaviviridae (e.g., Hepatitis C virus, Dengue virus 1, Dengue virus 2, Dengue virus 3, Dengue virus 4, and Zika virus), Hepadnaviridae (e.g., Hepatitis B virus), Herpesviridae (e.g., Human herpesvirus 1, 3, 4, 5, and 6, and Cytomegalovirus), Hypoviridae, Iridoviridae, Leviviridae, Lipothrixviridae, Microviridae, Orthomyxoviridae (e.g., Influenza virus A and B), Papovaviridae, Paramyxoviridae (e.g., measles, mumps, and human respiratory syncytial virus), Parvoviridae, Picornaviridae (e.g., poliovirus, rhinovirus, hepatovirus, and aphthovirus), Poxviridae (e.g., vaccinia virus), Reoviridae (e.g., rotavirus), Retroviridae (e.g., lentivirus, such as human immunodeficiency virus (HIV) 1 and HIV 2), Rhabdoviridae, Totiviridae, Crimean-Congo haemorrhagic fever virus, Eastern Equine Encephalitis virus, Hendra virus, Lassa fever virus, Monkeypox virus, Nipah virus, Rift Valley fever virus, South American Haemorrhagic Fever viruses, and Venezuelan Equine Encephalitis virus.
[0036] In some embodiments, the analyte nucleic acid molecule may be obtained or derived from any of the following parasites: Sporozoa (e.g., Plasmodium species), Ciliophora, Rhizopoda, or Zoomastigophora.
[0037] The analyte nucleic acid may comprise a portion of a larger nucleic acid sequence that encodes a protein, or the analyte nucleic acid may itself encode a protein. In this regard, the analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid may encode any suitable protein, including but not limited to, surface proteins, intracellular proteins, membrane proteins, and secreted proteins. The analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid may encode an antibody heavy chain or portion thereof, an antibody light chain or portion thereof, an enzyme, a receptor, a cytokine, a tumor suppressor protein, a mitogen, a neuropeptide, a neurotransmitter, a structural protein, a co-factor, a polypeptide, a peptide, an intrabody, a selectable marker, a toxin, a growth factor, or a peptide hormone. The analyte nucleic acid or a larger nucleic acid sequence containing the analyte nucleic acid also may not encode a protein. In this regard, non-coding nucleic acid sequences may function as cis- or trans-regulatory elements that control the transcription of nearby or distant genes, respectively, or may be transcribed into noncoding RNA sequences (e.g., ribosomal RNA, tRNA, or microRNA).
[0038] While the analyte nucleic acid may be obtained or derived from a naturally occurring (i.e., wild-type) nucleic acid sequence, in an alternative embodiment, the analyte nucleic acid may be synthetically generated or engineered using routine methods known in the art, such as those described in, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Press, Cold Spring Harbor, New York (2012); and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York (2016).
[0039] The methods described herein involve hybridizing the analyte nucleic acid to a probe nucleic acid. The probe nucleic acid may be synthetically generated and may comprise a sequence that is complementary to the analyte nucleic acid sequence, including a nucleotide that creates a base pair or base mismatch with a single nucleotide of the analyte nucleic acid upon hybridization. The term "base pair," as used herein, refers to a pair of complementary bases in a double-stranded nucleic acid molecule, consisting of a purine in one strand linked by hydrogen bonds to a pyrimidine in the other. In this regard, the pyrimidine cytosine always pairs with the purine guanine, and the purine adenine with the pyrimidine thymine (in DNA) or uracil (in RNA). A "base mismatch," as used herein, refers to the presence, in one strand of a double-stranded nucleic acid sequence (e.g., DNA), of a nucleotide that is not complementary to the nucleotide occupying the corresponding position in the other strand.
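The distinction between a base pair and a base mismatch defined above can be illustrated with a minimal Python sketch. It covers only the four canonical DNA bases; the function name `classify_pair` and the lookup table are hypothetical and are not part of the disclosed method.

```python
# Canonical Watson-Crick partners in DNA (purine paired with pyrimidine).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
DNA_BASES = {"A", "C", "G", "T"}


def classify_pair(analyte_base: str, probe_base: str) -> str:
    """Label the pairing formed at one position of the hybridized duplex.

    Returns "base pair" when the two bases are complementary and
    "base mismatch" otherwise. Only canonical DNA bases are handled.
    """
    a, p = analyte_base.upper(), probe_base.upper()
    if a not in DNA_BASES or p not in DNA_BASES:
        raise ValueError("only the canonical DNA bases A, C, G, T are supported")
    return "base pair" if (a, p) in WATSON_CRICK else "base mismatch"


if __name__ == "__main__":
    print(classify_pair("C", "G"))  # complementary bases
    print(classify_pair("C", "A"))  # non-complementary bases
```

Extending the table to modified cytosines such as mC would require a different key scheme, since this sketch treats each base as a single uppercase letter.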
[0040] The base pair may comprise a canonical Watson-Crick base pair, such as a cytosine-guanine (C-G) base pair or an adenine-thymine (A-T) base pair. Alternatively, the base pair may involve non-Watson-Crick interactions, such as between modified nucleotides. For example, one or both nucleotides of the base pair may comprise an epigenetic modification, such as methylation. Such epigenetic modifications appear to occur in nature as a means to control gene regulation in human cells and have implications in the development of cancer and other diseases (Robertson, K.D., Nat. Rev. Genet., 6(8): 597-610 (2005); and Lister et al., Nature, 462(7271): 315-322 (2009)). The most common epigenetic modification is the enzyme-catalyzed addition of a methyl group to the carbon-5 position of cytosine to generate 5-methylcytosine (mC). Other modified nucleotides which can form a non-Watson-Crick base pair include, for example, 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxylcytosine (caC). Accordingly, the base pair formed by hybridization between the analyte nucleic acid and probe nucleic acid may be a G-mC base pair, a G-hmC base pair, a G-fC base pair or a G-caC base pair.
[0041] Mismatched base pairs are a common error encountered in human cells, where they are spontaneously generated through the addition of incorrect bases by DNA polymerases into newly synthesized DNA. The base mismatch may comprise any mismatched nucleotide pairing, such as, for example, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch, or a C-caC mismatch. As discussed above, because mismatch sites are known to have little effect on local or global conformation of the DNA helical structure, it is believed that mismatch repair proteins access bases to be excised through a process referred to as "base-flipping," in which a single base rotates from an intra-helical position through to an exposed extra-helical position (Stivers, supra). Spontaneous base-flipping occurs slowly for Watson-Crick base pairs, but is significantly more prominent at mismatch sites and is known to play a key role in many biological processes, particularly sequence or base-specific recognition by DNA repair enzymes (Stivers, J.T., Chem. - Eur. J., 14: 786 (2008); and Yin et al., Proc. Natl. Acad. Sci. USA, 111: 8043 (2014)). The mechanisms of base-flipping can be passive, where a protein merely identifies an extra-helical base, or active, where a protein is involved in causing base-flipping, and/or stabilizing the intra-helical state (Lariviere et al., J. Biol. Chem., 279: 34715 (2004); and Huang et al., Proc. Natl. Acad. Sci. USA, 100: 68 (2003)).
[0042] The analyte nucleic acid and probe nucleic acid each may be of any suitable size. In some embodiments, the analyte nucleic acid and/or the probe nucleic acid may comprise at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides). For example, the analyte nucleic acid may comprise about 15 to about 20 nucleotides, about 18 to about 22 nucleotides, about 20 to about 25 nucleotides, about 27 to about 35 nucleotides, or a range defined by any two of the foregoing values. In some embodiments, however, the analyte nucleic acid and probe nucleic acid may differ in size, i.e., the probe nucleic acid may be of a different length than the analyte nucleic acid.
[0043] The probe nucleic acid may be designed such that it hybridizes to the analyte nucleic acid under at least moderate, preferably high, stringency conditions. Methods for designing and synthesizing probe nucleic acid sequences are known in the art and described in, e.g., Tang et al., Biotechniques, 40(6): 759-763 (2006); Espelund et al., Nuc. Acids. Res., 18(20): 6157-6158 (1990); Sambrook et al., supra; and Ausubel et al., supra. Likewise, hybridization of the analyte nucleic acid to the probe nucleic acid may be performed using standard methods known in the art, such as those described in, e.g., Kashima et al., Nature, 313: 402-404 (1985); Sambrook et al., supra; and Haymes et al., Nucleic Acid Hybridization: A Practical Approach, IRL Press, Washington, D.C. (1985). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Ausubel et al., supra.
[0044] The analyte nucleic acid hybridizes to the probe nucleic acid to form a "hybridized analyte nucleic acid." In some embodiments, such as where the analyte nucleic acid and the probe nucleic acid are of different lengths, the hybridized nucleic acid may include a double-stranded portion and a single-stranded portion. In some embodiments, the double-stranded portion (also referred to herein as a "duplex") of the hybridized analyte nucleic acid may comprise a portion of the analyte nucleic acid and the entire probe nucleic acid. In some embodiments, the double-stranded portion of the hybridized analyte nucleic acid may comprise a portion of the probe nucleic acid and the entire analyte nucleic acid. The double-stranded portion of the hybridized analyte nucleic acid may include either a base pair comprising a single nucleotide and a complementary nucleotide on the probe nucleic acid or a base mismatch comprising a single nucleotide and a non-complementary nucleotide on the probe nucleic acid. The base pair or base mismatch, as described above, may comprise any suitable single nucleotide. For example, the single nucleotide may be a naturally occurring, an artificial, a damaged, or a modified (e.g., an epigenetic modification) nucleotide. In some embodiments, the single nucleotide may be cytosine (C), guanine (G), adenine (A), thymine (T), 5-methylcytosine (mC), 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) or 5-carboxylcytosine (caC). Based on the identity of the single nucleotide in the analyte nucleic acid, one of ordinary skill in the art will be able to generate a corresponding probe nucleic acid comprising a nucleotide that forms either a base pair or a base mismatch with the single nucleotide.
[0045] In some embodiments, the single-stranded portion of the hybridized analyte nucleic acid may comprise a single-stranded polynucleotide "tail" that is appended to the analyte nucleic acid either 5' or 3' of the portion of the analyte nucleic acid that hybridizes to the probe nucleic acid (see, e.g., Ding et al., ACS Nano, 9(11): 11325-11332 (2015)). These single-stranded portions may comprise any type of polynucleotide of any suitable length. In some embodiments, the single-stranded portion may comprise a tail of one nucleotide type (e.g., polythymine (poly-T) or polycytosine (poly-C)) or a heterogeneous polynucleotide tail comprising more than one type of nucleotide (Ding et al., ACS Nano, 9(11): 11325-11332 (2015)). The single-stranded portion may comprise about 20 to about 30 nucleotides (e.g., 21, 22, 23, 24, 25, 26, 27, 28, or 29 nucleotides). It will be appreciated that the single-stranded portion of the hybridized analyte may function to "tether" the double-stranded portion of the hybridized analyte nucleic acid within a nanopore (as described below).
[0046] The duplex portion of the hybridized analyte nucleic acid may fold into various other inter- and intramolecular secondary structures such as, for example, stem-loops, hairpins, G-quadruplexes, i-motifs, and folded RNA structures (Vercoutere et al., Nat. Biotechnol., 19: 248-252 (2001); Ding et al., J. Phys. Chem. B, 118: 12873-12882 (2014); An et al., Proc. Natl. Acad. Sci. U.S.A., 111: 14325-14331 (2014); Shim et al., Nucleic Acids Res., 37: 972-982 (2009); Acosta-Reyes, Biopolymers, 103: 123-133 (2015); Timsit et al., Nature, 341: 459-462 (1989); Ding et al., J. Am. Chem. Soc., 137: 9053-9060 (2015)). In some embodiments, the duplex portion may comprise a tetraloop structure, resulting in the hybridized analyte nucleic acid taking the form of a "hairpin."
Nanopore
[0047] Following hybridization of the analyte nucleic acid to the probe nucleic acid to produce a hybridized analyte nucleic acid, the methods described herein involve contacting the hybridized analyte nucleic acid with a nanopore within a membrane. The term "nanopore," as used herein, refers to a pore, typically having a size on the order of nanometers, which allows the passage of biopolymers (e.g., polynucleotides) therethrough. The nanopore may be comprised of pore-forming proteins, such as the Staphylococcus aureus toxin α-hemolysin (α-HL) or the Mycobacterium smegmatis porin A (MspA) protein (Stoddart et al., Nano Lett., 10(9): 3633-7 (2010); and Faller et al., Science, 303(5661): 1189-1192 (2004)), which are capable of forming a pore that may permit hydrated ions driven by an applied potential to flow from one side of a membrane to the other. Alternatively, the nanopore may be fabricated from non-natural materials. Such "solid-state" nanopores generally may be fabricated in insulating membranes comprised of silicon compounds (e.g., silicon nitride), aluminum compounds (e.g., aluminum oxide), titanium compounds (e.g., titanium oxide) or graphene (e.g., box-shaped graphene) (see, e.g., Storm et al., Nat. Mater., 2(8): 537-540 (2003); Garaj et al., Nature, 467(7312): 190-193 (2010); and Lapshin, R.V., Applied Surface Science, 360: 451-460 (2016)). Solid-state nanopores may be manufactured using several techniques known in the art, including, for example, ion-beam sculpting (Li et al., Nature, 412(6843): 166-169 (2001)) and electron beams (Storm et al., supra). In some embodiments, the nanopore may be a hybrid of biological (e.g., protein) and solid-state nanopores.
[0048] In some embodiments, the nanopore preferably comprises a protein. The protein nanopore may comprise a monomer or an oligomer. Oligomeric nanopores may include several (e.g., 5, 6, 7, 8 or more) repeating subunits. The nanopore may be comprised of any suitable protein known to form pore structures within a membrane, including but not limited to, α-hemolysin (α-HL), Mycobacterium smegmatis porin A (MspA) protein, or the phi29 connector protein (see, e.g., Haque et al., Nano Today, 8(1): 56-74 (2013)).
[0049] A protein nanopore typically is inserted into an amphiphilic layer such as a biological membrane. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic layer may be a monolayer or a bilayer. In some embodiments, the membrane is a phospholipid bilayer membrane. The phospholipid bilayer may be prepared using any suitable phospholipid, such as, for example, diphytanoylphosphatidylcholine, phosphatidylcholine, phosphatidylethanolamine, phosphatidic acid, phosphatidylglycerol, phosphatidylinositol, or 1,2-diphytanoyl-sn-glycero-3-phosphocholine, by any suitable method known in the art, such as those described in, e.g., Shim et al., Biomed Microdevices, 14(5): 912-928 (2012); and Funakoshi et al., Anal. Chem., 78(24): 8169-8174 (2006). In some embodiments, the membrane may be suspended across a solid support, such as a glass nanopore membrane (GNP) or a quartz nanopore membrane (QNP) (White et al., J. Am. Chem. Soc., 129: 11766-11775 (2007); and Schibel et al., Anal. Chem., 82(17): 7259-66 (2010)).
[0050] Protein-based nanopores may comprise a barrel or channel through which ions may flow. The subunits of the pore may surround a central axis and contribute strands to a transmembrane β-barrel or channel or a transmembrane α-helix bundle or channel. The barrel or channel of the transmembrane protein pore typically comprises amino acids that facilitate interaction with an analyte, such as polymers, nucleotides, polynucleotides or nucleic acids. These amino acids may be located near a constriction of the barrel or channel. The transmembrane protein pore may comprise one or more positively charged amino acids, such as arginine, lysine or histidine, or aromatic amino acids, such as tyrosine or tryptophan. These amino acids may facilitate the interaction between the pore and polymers, nucleotides, polynucleotides, or nucleic acids. Thus, in the context of the present disclosure, the nanopore may comprise at least a first region defining a first channel with a diameter sufficient to allow passage of double-stranded nucleic acids, a second region proximate to the first region and defining a second channel with a diameter that is larger than the diameter of the first channel, and a third region proximate to the second region and spaced from the first region defining a third channel with a diameter sufficient to allow passage of single-stranded nucleic acids, but not sufficient to allow passage of double-stranded nucleic acids. In some embodiments, the protein nanopore may comprise a lysine residue positioned within the first channel.
[0051] In some embodiments, the nanopore comprises an α-hemolysin. α-Hemolysin (α-HL) is an exotoxin secreted by the human pathogen Staphylococcus aureus. Wild-type α-HL has been studied extensively as a platform for ion-channel recordings of single-stranded DNA (ssDNA) (Akeson et al., Biophys. J., 77: 3227-3233 (1999); Kasianowicz et al., Proc. Natl. Acad. Sci. USA, 93: 13110-13113 (1996); Meller et al., Proc. Natl. Acad. Sci. USA, 97: 1079-1084 (2000); Purnell et al., ACS Nano, 3: 2533-2538 (2009); Reiner et al., Chem. Rev., 112: 6431-6451 (2012); Stoddart et al., Nano Lett., 10: 3633-3637 (2010); and Stoddart et al., Angew. Chem. Int. Ed. Engl., 49: 556-559 (2010)). By applying a potential difference across an α-HL pore that is embedded in a lipid membrane, DNA can be driven electrophoretically from one side of the pore to the other. The current is recorded as a function of time, and translocations of the individual DNA strands are observed as events in which the current momentarily decreases (Kasianowicz et al., Annu. Rev. Anal. Chem., 1: 737-766 (2008); and Wanunu, M., Phys. Life Rev., 9: 125-158 (2012)). The extent of this current change is dependent on the sequence of the DNA near the tightest constriction of the protein channel, whose diameter is comparable to that of ssDNA.
[0052] α-HL forms a 232.4 kDa mushroom-like heptameric transmembrane pore, consisting of a vestibule (3.6 nm in diameter; ~5 nm in length) connected to a transmembrane β-barrel (~2.6 nm in diameter; ~5 nm in length) (Song et al., Science, 274(5294): 1859-66 (1996)). The pore is narrowest at the vestibule-transmembrane domain junction with a diameter of ~1.4 nm. The "latch" constriction (also referred to as the "latch zone" or "latch region") is a 2.6 nm constriction located in the upper vestibule of α-HL, which has been shown to comprise a sensing zone for specific dsDNA structure (see, e.g., Johnson et al., Biophys. J., 107: 924-931 (2014); Jin et al., J. Am. Chem. Soc., 135(51): 19347-19353 (2013); Johnson et al., J. Am. Chem. Soc., 138(2): 594-603 (2016); Ding et al., ACS Nano, 9(11): 11325-11332 (2015); and Johnson et al., J. Phys. Chem. Lett., 5(21): 3781-3786 (2014)). Accordingly, in the context of α-hemolysin, the aforementioned first region of the nanopore corresponds to the latch zone (or latch region) of α-hemolysin, the second region corresponds to the vestibule region of α-hemolysin, and the third region corresponds to the β-barrel region of α-hemolysin.
[0053] The α-hemolysin may be either a wild-type or mutant form of α-hemolysin. Amino acid sequences of wild-type α-hemolysin from a variety of species have been identified and characterized (see, e.g., GenBank Accession No. ALS30900.1; Imagawa et al., FEMS Microbiol. Lett., 117(3): 287-92 (1994); Yoh et al., J. Bacteriology, 171(12): 6859-6861 (1989); Gouaux et al., Protein Science, 6: 2631-2635 (1997); Frey et al., Infection and Immunity, 59(9): 3026-3032 (1991); and Alouf, J.E. and Popoff, M.R. (eds.), The Comprehensive Sourcebook of Bacterial Protein Toxins, 3rd Ed., Elsevier Ltd. (2006)). Several α-HL mutants have been identified or synthesized and also may be used in the method described herein (see, e.g., Kawate, T. and Gouaux, E., Protein Sci., 12(5): 997-1006 (2003); Fang et al., Biochemistry, 36(31): 9518-95212 (1997)). In some embodiments, an α-HL mutant that can be used in the described method comprises a mutation that alters an amino acid within the latch zone of the α-HL monomer.
[0054] Although single-stranded DNA (ssDNA) can translocate through α-HL, the diameter of double-stranded DNA (dsDNA) (~2.0 nm) is larger than the narrowest constriction of the α-HL pore (~1.4 nm), which prevents dsDNA translocation through α-HL (Song et al., supra). However, it is possible to capture dsDNA in the α-HL vestibule, and this technique has been used to interrogate dsDNA hairpins within the vestibule of α-HL to elucidate structural composition (Vercoutere et al., Nat. Biotechnol., 19: 248-252 (2001); and Vercoutere et al., Nucl. Acids. Res., 31: 1311-1318 (2003)), to study escape kinetics (Lathrop et al., J. Am. Chem. Soc., 132: 1878-1885 (2010)), and to probe the electrical potential distribution within the α-HL protein pore (Howorka et al., Biophys. J., 83: 3202-3210 (2002)). With an appropriate applied voltage (120 mV), dsDNA will unzip (i.e., denature) into its constituent components (Vercoutere et al., Nat. Biotechnol., 19: 248-252 (2001); Jin et al., J. Am. Chem. Soc., 134: 11006-11011 (2012); Liu et al., J. Phys. Chem. Lett., 2: 1372-1376 (2011); Mathe et al., Biophys. J., 87: 3205-3212 (2004); Muzard et al., Biophys. J., 98: 2170-2178 (2010); and Sauer-Budge et al., Phys. Rev. Lett., 90: 238101 (2003)).
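The size argument above can be restated as a short Python sketch that compares the approximate diameter of dsDNA with the approximate diameters stated in this description for the latch zone, the vestibule, and the central constriction of α-HL. Treating the duplex as a rigid cylinder is a simplifying assumption, and the names `REGION_DIAMETER_NM` and `admits_dsdna` are hypothetical.

```python
# Approximate diameters (nm) taken from the description of the alpha-HL pore.
DSDNA_DIAMETER_NM = 2.0
REGION_DIAMETER_NM = {
    "latch zone": 2.6,            # first region
    "vestibule": 3.6,             # second region
    "central constriction": 1.4,  # entrance to the beta-barrel (third region)
}


def admits_dsdna(region: str) -> bool:
    """Return True if a rigid 2.0 nm duplex fits through the named region.

    This is a purely geometric comparison; it ignores flexibility,
    hydration, and electrostatics.
    """
    return REGION_DIAMETER_NM[region] > DSDNA_DIAMETER_NM


if __name__ == "__main__":
    for name in REGION_DIAMETER_NM:
        verdict = "admits" if admits_dsdna(name) else "blocks"
        print(f"{name}: {verdict} dsDNA")
```

Under these numbers the duplex enters the latch zone and vestibule but is stopped at the central constriction, which is consistent with capture of dsDNA in the vestibule.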
[0055] Duplex or dsDNA comprising a single-stranded tail (such as a poly-T tail), as described herein, can be driven into an α-HL nanopore from the cis (vestibule) side of the channel. The duplex may be driven down to the 1.4 nm constriction that separates the vestibule from the β-barrel (Jin et al., J. Am. Chem. Soc., 134: 11006-11011 (2012)), through which the duplex cannot pass. An electrophoretic driving force causes the double-stranded section to unzip into its constituent components. The unzipping time, which is on the order of milliseconds, is dependent on the length and composition of the DNA and correlates with the stability of the duplex (Jin et al., J. Am. Chem. Soc., 134: 11006-11011 (2012); Sauer-Budge et al., supra; and Schibel et al., J. Am. Chem. Soc., 133: 14778-14784 (2011)).
Voltage Application and Current Measurement
[0056] When an electrode is placed on each of the cis and trans sides of the membrane containing the protein nanopore, a current delivered between the electrodes will flow through the channel of the nanopore, electrophoretically driving charged molecules toward the nanopore. When the molecules enter the nanopore, the changes in current are characteristic of the molecular interactions between the nanopore and the analyte; further, the duration of these events may also be indicative of the interactions. The standing open current reading of the channel may be noted, and any change in this current may be due to some current impedance within the channel. Thus, directing a nucleic acid through the channel will cause a decrease in current flow through the channel as compared to the open current reading. The decrease in current while dsDNA is captured in the pore, relative to the current measured through an open channel, is a result of the blocking contributions from the double-stranded sections of DNA and any single-stranded sections of the DNA. Although the majority of the current is blocked by a single-stranded portion residing in the β-barrel during unzipping, the double-stranded section that resides in the vestibule also contributes to the current blockage. This feature of α-HL may be employed to discriminate the identity of different nucleotides, abasic sites, nucleotide analogs (e.g., furan), mismatched base pairs, sequence variations, single-nucleotide polymorphisms, mutations, epigenetically modified nucleotides, and the like. Information about a nucleotide at a particular location in a duplex may be revealed by distinct electrical current signatures, such as the duration and extent of current block and the variance of current levels. In other words, different types of nucleotides will block current to a greater or lesser extent, and thus provide a distinct current signature.
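The comparison of blocked current with the open-channel current described above can be sketched in Python as follows. This is an illustrative simplification, not the analysis used in the Examples: it assumes a uniformly sampled trace of positive current values, and the function names and the 0.5 threshold fraction are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def percent_residual_current(blocked_pa: float, open_pa: float) -> float:
    """Express a blocked current as a percentage of the open-channel current."""
    if open_pa == 0:
        raise ValueError("open-channel current must be non-zero")
    return 100.0 * blocked_pa / open_pa


def find_blockade_events(
    trace_pa: Sequence[float], open_pa: float, threshold_fraction: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of current blockades.

    A sample counts as blocked when it falls below
    threshold_fraction * open_pa; `end` is exclusive, so the event
    duration in samples is end - start.
    """
    events: List[Tuple[int, int]] = []
    start = None
    for i, current in enumerate(trace_pa):
        blocked = current < threshold_fraction * open_pa
        if blocked and start is None:
            start = i                   # blockade begins
        elif not blocked and start is not None:
            events.append((start, i))   # blockade ends
            start = None
    if start is not None:               # trace ended during a blockade
        events.append((start, len(trace_pa)))
    return events


if __name__ == "__main__":
    trace = [100, 100, 20, 22, 21, 100, 100, 18, 100]  # illustrative values, pA
    for start, end in find_blockade_events(trace, open_pa=100):
        mean_blocked = sum(trace[start:end]) / (end - start)
        print(start, end, percent_residual_current(mean_blocked, 100))
```

The duration and the residual current of each event are the two quantities that the paragraph above identifies as parts of a current signature; the variance of current levels within an event could be computed from the same index ranges.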
[0057] Thus, once the hybridized analyte nucleic acid is contacted with a nanopore, the present methods comprise applying an electrical voltage across the nanopore, whereupon the hybridized nucleic acid passes into the nanopore, and the base pair or the base mismatch comprising the single nucleotide and the complementary nucleotide is positioned within the first channel (e.g., the latch zone of α-HL). The electrical voltage may be applied by using any combination of electrodes suitable for introducing a current across a lipid bilayer membrane, such as, for example, Ag/AgCl electrodes, using routine methods known in the art. The appropriate magnitude of voltage applied may be determined by one of ordinary skill in the art, and will depend on a number of factors, including the type of membrane used, the sequence of the probe nucleic acid, etc. In some embodiments, the voltage applied may be from about +2 V to about -2 V, from about -400 mV to about +400 mV, from about -200 mV to about +200 mV, or a range defined by any two of the foregoing values. In some embodiments, the voltage applied is about +90 mV to about +120 mV (e.g., about +100 mV). It may be possible to increase discrimination between different nucleotides by a pore using an increased applied potential.
[0058] The present methods may also be carried out in the presence of charge carriers, such as metal salts (e.g., alkali metal salts and halide salts), ionic liquids, or organic salts (e.g., tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methylimidazolium chloride). In some embodiments, the disclosed methods may be performed in the presence of potassium chloride (KCl), sodium chloride (NaCl), rubidium chloride (RbCl), lithium chloride (LiCl), or cesium chloride (CsCl). The salt concentration may be at saturation. The salt concentration may be 3 M or lower, such as from 0.1 to 2.5 M. It will be appreciated that base-flipping kinetics may change as a function of salt concentration. Indeed, while high salt concentrations provide a high signal to noise ratio and allow for currents indicative of the presence of a polymer to be identified against the background of normal current fluctuations, it may be easier to distinguish two blocking states at low salt concentrations because they are less noisy and, in the case of the extra-helical state, longer-lived.
[0059] The present methods may also be performed in the presence of a buffer. Any suitable buffer may be used, such as, for example, a phosphate buffer. Whatever buffer is chosen, the application of voltage across the membrane desirably may be carried out at a pH of about 4.0 to about 12.0 (e.g., about 5.0, about 5.5, about 6.0, about 6.5, about 7.0, about 7.5, about 8.0, about 8.5, about 9.0, about 9.5, about 10.0, about 10.5, about 11.0, about 11.5, or a range defined by any two of the foregoing values), such as from about 7.0 to about 8.0 (e.g., about 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, or 7.9). The present method may be performed at any suitable temperature. For example, the method may be performed at a temperature from about 0 °C to about 100 °C. In some embodiments, the method may be performed at room temperature (e.g., from about 15 °C to about 30 °C).
[0060] The electrical voltage applied across the nanopore embedded within the membrane causes the hybridized analyte nucleic acid to pass into the nanopore such that the base pair or the base mismatch comprising the single nucleotide and complementary nucleotide is positioned within the first channel of the nanopore (e.g., the latch zone of α-HL). With respect to α-HL, residual current measured when DNA resides in the nanopore is sensitive to changes in DNA structure at base pairs situated in proximity to the latch zone. Mapping studies have identified that the latch sensing zone of α-HL comprises a seven base pair detection window that spans from the sixth to the thirteenth base pair of an analyte duplex above the central constriction of α-HL (Jin et al., J. Am. Chem. Soc., 135: 19347-19353 (2013); and Johnson et al., Biophys. J., 107: 924-931 (2014)). More recently, the latch sensing zone has been further refined as spanning the eighth or ninth base pair of an analyte duplex above the central constriction (Ding et al., ACS Nano, 9(11): 11325-11332 (2015)). As discussed above, dsDNA may be temporarily captured in the α-HL latch zone using an appended single-stranded tail on one of the strands to pull the duplex into the vestibule. The single-stranded portion of the hybridized analyte nucleic acid passes into the nanopore through the first and second channels and enters the third channel. For example, in the case of α-HL, the single-stranded tail may penetrate the central constriction and thread into the narrow β-barrel, while the wider dsDNA is temporarily trapped in the vestibule. In this manner, the duplex region of the analyte nucleic acid may be tethered within the latch zone sensing region, such that the base pair or base mismatch is positioned at the eighth or ninth position above the central constriction.
As such, in some embodiments, the base pair or base mismatch is positioned nine nucleotides away from the single-stranded portion of the hybridized analyte nucleic acid.
[0061] As the electric voltage is applied across the nanopore while the analyte nucleic acid is temporarily captured or confined within the nanopore, the present method comprises measuring the electric current across the nanopore while the base pair or base mismatch is positioned within the first channel to obtain a measurement based on the current. The method further comprises comparing the current measurement to a reference, and then identifying the single nucleotide based on the comparison. It will be appreciated that current may be measured and reported in a variety of ways under a variety of parameters. The current measurement may be a measurement of ion current flow through the nanopore, which may be the direct current (DC) flow or the alternating current (AC) flow. Additionally, AC phase-sensitive detection can be used to measure the conductance of the ion channel, while simultaneously applying a DC bias to electrostatically control the binding affinity and kinetics of charged molecules. A low amplitude AC signal (~10 mV rms) allows the protein-DNA interaction to be measured in the absence of large DC fields, thereby reducing the effects of electroosmosis, electrophoresis, and protein deformation.
[0062] In some embodiments, the current may be measured at a fixed time, in which case the reference is a measurement of a current at a fixed time when a control nucleic acid having a control base pair or a control base mismatch is independently positioned within the nanopore in a manner that causes the control base pair or the control base mismatch to be positioned within the first channel. A "fixed" time includes a specific time point, as well as a finite period of time. Measurement of current at a fixed time may allow for the differentiation of particular base pairs (e.g., G-C vs. A-T), abasic sites, and nucleotide analogs based on the change and duration of current flow through the nanopore. In these embodiments, the control nucleic acid may comprise a first control nucleic acid strand and a second control nucleic acid strand. The first control nucleic acid strand may comprise a sequence that is identical to the sequence of the analyte nucleic acid at all nucleotide positions other than the position corresponding to the single nucleotide (as described above), and has a first nucleotide at the position corresponding to the single nucleotide. The first nucleotide of the first control nucleic acid strand may be either the same as or different than the single nucleotide of the analyte nucleic acid. The second control nucleic acid may comprise a sequence that is identical to the sequence of the probe nucleic acid at all nucleotide positions other than the position corresponding to the complementary nucleotide, and that has a second nucleotide at the position corresponding to the complementary nucleotide. The second nucleotide of the second control nucleic acid strand may be either the same as or different than the complementary nucleotide of the probe nucleic acid. 
The first and second control nucleic acid strands are hybridized to form a hybridized control nucleic acid that includes either a control base pair or a control base mismatch comprising the first and second nucleotides. The control base pair or control base mismatch of the hybridized control nucleic acid sequence may be any suitable base pair or base mismatch, such as those described above with respect to the analyte and probe nucleic acid sequences.
[0063] In some embodiments, the current measurement is a measurement of current modulating as a function of time, in which case the reference is a current modulation signature corresponding to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
Measurement of current modulation as a function of time may permit discrimination of base pair mismatches, such as mismatches involving epigenetically modified nucleotides.
Examples of such mismatches may include, but are not limited to, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch and a C-caC mismatch. In this regard, for example, under specific electrolyte conditions (e.g., 10 mM phosphate, 0.25 M KCl and pH > 7) the latch zone of α-hemolysin may be used to distinguish, with single-molecule resolution, a C-C or C-A mismatch from a canonical C-G base pair at a specific site within a short sequence from codon 12 of the KRAS gene. The identification is based on unique two-state modulating current signatures observed during residence of the DNA inside the α-HL vestibule that are attributable to base-flipping at C-A and C-C mismatch sites. Specifically, upon capture of the DNA duplex, attenuation of the measured current may be observed due to an immediate decrease in the ion flux through the pore. Proximity of the C-C mismatch to the α-HL latch constriction when the DNA resides inside the pore leads to distinct modulation of current between two states, producing a "modulation signature." The two states that comprise the modulation signature typically differ in amplitude and periodicity. The frequency of the current modulation and the current amplitudes for the two states are unique to the mismatch (C-C or C-A) and readily permit discrimination between the C-C and C-A mismatches and the fully complementary duplex (Johnson et al., J. Am. Chem. Soc., 138: 594-603 (2016)). Measurement of current modulation as a function of time may also allow for analysis of the kinetics of localized conformational changes at a mismatched base pair in DNA, which may be attributed to a single base flipping in and out of the helix at the mismatch site (Johnson et al., Faraday Discuss., Sept. 20, 2016 (Epub ahead of print); and Johnson et al., J. Am. Chem. Soc., 138: 594-603 (2016)).
Indeed, the kinetics of base-flipping of a cytosine-cytosine pair situated at the latch constriction of α-HL may be significantly altered when one of the cytosine bases in the mismatch is modified at the carbon-5 position. Furthermore, in some embodiments, measurement of current modulation as described herein may provide information regarding base-flipping kinetics which allows for discrimination of duplexes containing a single mC, hmC, fC, or caC base.
[0064] When the current measurement is a measurement of current modulating as a function of time, the reference desirably is a current modulation signature which corresponds to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel. For example, the reference may be a known current modulation signature corresponding to a particular mismatch, such as, for example, a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch, or a C-caC mismatch.
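To illustrate how a two-state modulation signature might be extracted from a recorded trace before it is compared to a reference, the following is a minimal sketch only. It assumes the two current levels are already known, uses a simple midpoint threshold (an assumption, not a technique described in this disclosure), and the function names are hypothetical. The 50 kHz sampling interval in the usage line mirrors the acquisition rate reported in Example 1.

```python
from typing import List, Tuple


def idealize_two_state(trace: List[float], level_a: float, level_b: float) -> List[int]:
    """Label each sample 0 if it is nearer level_a, or 1 if nearer level_b.

    Uses a midpoint threshold between the two levels (illustrative assumption).
    """
    if level_a == level_b:
        raise ValueError("the two reference levels must differ")
    threshold = (level_a + level_b) / 2.0
    b_is_upper = level_b > level_a
    states = []
    for value in trace:
        above = value > threshold
        states.append(1 if above == b_is_upper else 0)
    return states


def dwell_times(states: List[int], sample_interval_s: float) -> List[Tuple[int, float]]:
    """Collapse a state sequence into (state, duration in seconds) runs.

    The first and last runs are truncated by the edges of the trace.
    """
    runs: List[Tuple[int, float]] = []
    if not states:
        return runs
    current, count = states[0], 1
    for state in states[1:]:
        if state == current:
            count += 1
        else:
            runs.append((current, count * sample_interval_s))
            current, count = state, 1
    runs.append((current, count * sample_interval_s))
    return runs


# Hypothetical normalized trace (I/I0) switching between 0.34 and 0.29.
example_trace = [0.34, 0.34, 0.29, 0.29, 0.29, 0.34]
example_states = idealize_two_state(example_trace, 0.34, 0.29)
example_runs = dwell_times(example_states, 1.0 / 50_000)
```

The resulting run list gives both the transition count and the per-state dwell times, which are the quantities a modulation signature would be compared on.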
[0065] The present disclosure also provides methods for identifying a single nucleotide in the genome of an organism, which comprise amplifying a portion of the genome comprising a single nucleotide to form an analyte nucleic acid, and then performing a method as described herein to identify the single nucleotide. The organism may be any suitable organism, such as those described herein (e.g., a bacterium, a virus, a parasite, an insect, a bird, or a mammal (e.g., cow, pig, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, mouse, monkey, chimpanzee, gorilla or human)). For example, the organism may be a human, and any suitable portion of the human genome may be analyzed by the aforementioned method. The single nucleotide may be any such nucleotide described herein, e.g., cytosine (C), guanine (G), adenine (A), thymine (T), 5-methylcytosine (mC), 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxylcytosine (caC). In some embodiments, the single nucleotide may be located at a site corresponding to a single nucleotide polymorphism (SNP). A SNP is a variation at a single nucleotide that occurs at a specific position in the genome and is present to some appreciable degree within a population (e.g., greater than 1%). While most SNPs have no effect on the health or development of a particular organism, many SNPs underlie susceptibility to certain diseases (e.g., Alzheimer's disease). In embodiments where the analyte nucleic acid is RNA, the above-described nanopore may be used to identify single nucleotides as described above for DNA, including modified RNA nucleotides, such as those described in The RNA Modification Database maintained by The RNA Institute at the State University of New York at Albany (www.mods.rna.albany.edu/mods/).
[0066] A portion of the genome comprising a single nucleotide may be amplified using any suitable method known in the art, such as, for example, polymerase chain reaction (PCR), and subjected to the above-described nanopore analysis as illustrated in Figure 17. In this regard, for example, custom primers may be designed to amplify desired regions of DNA to be analyzed within the latch zone of α-HL, which can position the amplified nucleic acid sequence at position 9 of the latch zone. For example, one 8-mer primer may be used to properly position the amplified nucleic acid within the latch, and the second primer may be of standard length (e.g., 17-21 nucleotides). PCR using short primers has been previously described (Afonina et al., Nucleic Acids Res., 25: 2657 (1997)). On the longer PCR primer, a long tether terminated in a cholesterol tag may be added to the 5' end, which allows for pre-concentrating a sample on the lipid bilayer (Smith et al., Frontiers in Bioeng. Biotech., 3: 91 (2015); doi: 10.3389/fbioe.2015.00091). Following PCR amplification, a 3'-homopolymer DNA tail may be added to the amplified nucleic acid via terminal transferase, or, alternatively, T4 RNA ligase may be used for blunt-end to single-stranded DNA ligation with a 5'-activated single-stranded DNA. The latter approach is routinely used in the art for converting RNA samples into DNA libraries for sequencing. After addition of a tail for threading the duplex into α-HL, the sample may be cartridge purified and submitted to nanopore analysis as described above.
[0067] The following examples further illustrate the invention but, of course, should not be construed as in any way limiting its scope.
EXAMPLE 1
[0068] This example demonstrates a method of differentiating between a G-C base pair and an A-T base pair in double-stranded DNA using an α-HL nanopore system.
[0069] For all experiments described herein, a set of fishhook terminal hairpin duplexes with a 30-mer poly-2'-deoxycytidine tail on the 3' end were used as models for double-stranded DNA (dsDNA). The hairpins contained a 12-mer duplex stem with a 5'-GTTA-3' tetraloop and were synthesized from commercially available phosphoramidites (Glen Research, Sterling, VA) by the DNA-Peptide Core Facility at the University of Utah. The hairpins were purified using a semi-preparative ion-exchange HPLC column with a linear gradient of 20% to 100% B over 30 min while monitoring absorbance at 260 nm (A = 10% CH3CN/90% ddH2O; B = 20 mM Tris, 1 M NaCl, pH = 8 in 10% CH3CN/90% ddH2O; flow rate = 3 mL/min).
[0070] The purities of the oligodeoxynucleotides were determined by analytical ion- exchange HPLC, running the previously mentioned buffers and method, with the exception that the flow rate was 1 mL/min. After purification, all oligodeoxynucleotides were annealed by incubating them at 90 °C in analysis buffer electrolyte solution for five minutes and then rapidly cooling on ice. The duplex DNA was annealed by incubating at 90 °C in analysis buffer electrolyte solution for five minutes and then cooling slowly in a water bath to room temperature. The prepared samples were stored in a -20 °C freezer before they were used in other experiments.
[0071] The first hairpin duplex ("hp-1"), which was also used as an internal standard in other experiments described below, included six base pairs of G-C on the tail side and six base pairs of A-T on the tetraloop side, as shown in Figure 1A. The use of an internal standard allowed analysis of sequence variations between different protein channels that inherently have natural variation of 5% in the open channel current (White et al., J. Am. Chem. Soc., 129: 11766-11775 (2007)), and also provided a method to determine changes in deep blocking currents for each sequence studied (Figure 1B).
[0072] The hp-1 standard was mixed in a 1:2 ratio with another hairpin duplex ("hp-2") that introduced three G-C base pairs at positions 7, 8, and 9 (Figure 1A), and blocking ion currents were recorded. Specifically, a glass nanopore membrane (GNM) (radius 800 nm) was constructed as previously described (White et al., supra; and Zhang et al., Anal. Chem., 79: 4778-4787 (2007)). 1,2-Diphytanoyl-sn-glycero-3-phosphocholine (DPhPC) bilayers spanning across the orifice of the GNM were prepared as previously described (White et al., supra). A proper bilayer was determined by a resistance of approximately 200 GΩ, a value consistent with previous reports (White et al., supra). The protein α-HL was diluted to 1 mg/mL in ultra-pure water and the DPhPC was dissolved in decane to a concentration of 10 mg/mL, both of which were stored at -80 °C. A pipette holder with a pressure gauge and a 10-mL gas-tight syringe was used to attach the GNM to the direct current (DC) system. Two Ag/AgCl electrodes were positioned inside and outside of the GNM to apply a voltage. A plastic pipette tip was used to paint the DPhPC solution (1 μL, 10 mg/mL) on the GNM surface. After the addition of α-HL monomer (0.2 μL, 1 mg/mL), pressure was applied to form a suspended bilayer, followed by reconstitution of a single α-HL ion channel in the bilayer (Schibel et al., J. Am. Chem. Soc., 133: 7810-7815 (2011)). All of the nanopore experiments described herein were performed at 22 ± 1 °C with 1.00 M KCl or other electrolytes and 100 mV (trans vs. cis) as previously described (Johnson et al., Biophys. J., 107: 924-931 (2014)).
[0073] Next, the internal control was added to the cis side of the chamber at a final concentration of 5 μM. After collecting greater than 200 events, another sample was added to the same protein channel at a concentration of 10 μM to allow the comparison of the current levels between two oligodeoxynucleotides. For each experiment, data were collected from three individual protein channels and greater than 200 events were collected for each protein channel with a 10 kHz low pass filter and a 50 kHz data acquisition rate.
[0074] The blocking currents (I) were normalized by the open channel current (I0), yielding plots of I/I0 (Figure 1C). Current levels and blockage durations for events in the i-t traces were extracted using QUB 1.5.0.31 software and plotted using Origin 9.1. The plots allow comparisons between samples measured with different protein pores. Only events with deep blockage current (I) that were less than 20% of the open channel current (I0) were considered to be unzipping events, similar to previous work with fishhook hairpin unzipping in the α-HL nanopore (Ding et al., J. Phys. Chem. B, 118: 12873-12882 (2014)). The error bars associated with the difference in blocking currents between the analyte and internal standard strand were determined from three individual protein channels. For the current histograms, either 100 or 150 bins of 0.1 or 0.05 pA widths were used in each plot.
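The normalization and event-selection rule described above can be sketched as follows. This is an illustrative example only, not the QUB or Origin workflow used in the study; the function name and the sample currents are hypothetical, while the 20% cutoff is taken from the text.

```python
from typing import List


def deep_blockades(events_pa: List[float], open_channel_pa: float,
                   cutoff: float = 0.20) -> List[float]:
    """Return normalized currents I/I0 for events blocking deeper than the cutoff.

    events_pa: blocking current of each event, in pA.
    open_channel_pa: open channel current I0 of the same pore, in pA.
    cutoff: events are kept only if I/I0 is below this fraction (20% in the text).
    """
    if open_channel_pa <= 0:
        raise ValueError("open channel current must be positive")
    normalized = (current / open_channel_pa for current in events_pa)
    return [ratio for ratio in normalized if ratio < cutoff]


# Hypothetical events recorded on a pore with I0 = 100 pA.
selected = deep_blockades([15.0, 30.0, 18.5], 100.0)
```

Normalizing by I0 before filtering is what lets events from pores with slightly different open channel currents be compared on one axis.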
[0075] Initial plots revealed two populations of events with a difference in blocking current of 1.0 ± 0.2 pA, corresponding to ΔI/I0 = 0.010 ± 0.002 (Figure 1D). Because the two samples were always mixed with two-fold more analyte strand than standard, the histogram peak areas could readily be used to identify the hp-1 signals. In the next study, the hp-1 standard was compared to a third hairpin duplex ("hp-3") (Figure 1A) that placed G-C base pairs at positions 10, 11, and 12 relative to the standard. The histogram of the normalized blocking currents displayed two populations that differed in current by 0.5 ± 0.1 pA (Figure 1D). The last sequence comparison was conducted with the hp-1 standard and a fourth hairpin duplex sequence ("hp-4") that contained all G-C base pairs (hp-4, Figure 1A). Histograms of the blocking currents for this study were found to be separated by 1.4 ± 0.3 pA (Figure 1D).
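The use of peak areas to tell the standard from the analyte can be illustrated with a small sketch. It assumes two well-separated populations and splits them with a simple one-dimensional two-means procedure, which is a stand-in chosen for illustration rather than the histogram analysis used in the study. The function names and the event currents are hypothetical; only the 1:2 standard-to-analyte ratio comes from the text.

```python
from typing import Dict, List, Tuple


def split_two_populations(values: List[float], iterations: int = 50) -> Tuple[List[float], List[float]]:
    """Split values into a low and a high group with 1-D two-means clustering."""
    low_center, high_center = min(values), max(values)
    if low_center == high_center:
        raise ValueError("values must contain at least two distinct currents")
    low: List[float] = []
    high: List[float] = []
    for _ in range(iterations):
        midpoint = (low_center + high_center) / 2.0
        low = [v for v in values if v <= midpoint]
        high = [v for v in values if v > midpoint]
        new_low = sum(low) / len(low)
        new_high = sum(high) / len(high)
        if new_low == low_center and new_high == high_center:
            break  # centers stopped moving
        low_center, high_center = new_low, new_high
    return low, high


def label_by_mixing_ratio(values: List[float]) -> Dict[str, List[float]]:
    """Label the smaller population as the standard (mixed 1:2 with analyte)."""
    low, high = split_two_populations(values)
    if len(low) == len(high):
        raise ValueError("populations are equal in size; cannot assign labels")
    if len(low) < len(high):
        return {"standard": low, "analyte": high}
    return {"standard": high, "analyte": low}


# Hypothetical blocking currents in pA: three standard events, six analyte events.
labels = label_by_mixing_ratio([14.0, 14.1, 13.9, 15.0, 15.1, 14.9, 15.0, 15.05, 14.95])
```

In practice the populations overlap, so a fitted histogram would be more appropriate than a hard split; the sketch only shows why an unequal mixing ratio makes the assignment unambiguous.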
[0076] These studies identified that a change of A-T vs. G-C blocks between positions 7 and 12 in the duplex significantly affects the blocking current level. When the six base-pair A-T track from positions 7 - 12 was replaced with blocks of three G-C base pairs, the recorded current was always greater (i.e., G-C was less blocking to the current). Specifically, a current increase of approximately 0.5 pA (positions 7 - 9) or approximately 1.0 pA (positions 10 - 12) was observed for the replacement of three A-T base pairs with G-C base pairs, while replacing all six A-T base pairs with G-C led to a current increase of about 1.4 pA (Figure 1D).
[0077] Structurally, A-T tracks in DNA adopt a conformation with a narrower minor groove and wider major groove compared to classical B-form duplexes, leading to a slightly wider duplex. The wider duplex at A-T tracks is expected to block the current more than the duplex section observed when G-C tracks are present in the latch zone. This observation further supports the latch zone of the α-HL nanopore as a detector of sequence variations in dsDNA.
[0078] Based on the above results with native dsDNA and previous studies for analyzing damaged dsDNA in the latch (e.g., Jin et al., J. Am. Chem. Soc., 135: 19347-19353 (2013); and Johnson et al., Biophys. J., 107: 924-931 (2014)), positions 8 through 10 of the latch zone were hypothesized as the sites likely to be most sensitive for detecting alterations in the duplex sequence. To test this hypothesis, a series of duplexes in which A-T base pairs were switched for G-C base pairs at positions 8 (hp-5), 9 (hp-6), or 10 (hp-7, Figure 2A) were studied in comparison to the standard hp-1. Positions 8 and 9 showed the greatest difference in blocking current, with a value of 0.5 ± 0.1 pA between the individual A-T and G-C base pairs (Figures 2B and 2C). In previous studies examining damaged duplexes in the latch zone, G-C vs. G-AP was most discriminated at position 10 (Jin et al., J. Am. Chem. Soc., 135: 19347-19353 (2013); and Johnson et al., Biophys. J., 107: 924-931 (2014)). These previous studies, however, were conducted in a different sequence context that comprised G-C and A-T base pairs between the central constriction and the latch zone, unlike the studies described above, which comprise only G-C base pairs in this region. This difference in the position detected by the latch zone may be the result of poly(G-C) tracks, which adopt secondary structures different from the classical B-form helix that can induce a slight elongation of the strand (Timsit et al., Nature, 341: 459-462 (1989)), thereby elongating the duplex in the vestibule.
[0079] To demonstrate that the ability to detect native base pairs in dsDNA in the latch zone of α-HL is a general phenomenon for all dsDNAs and not only those derived from hairpins, a control system was developed that maintained the same sequence context as the hairpins described above but consisted of a template-probe complementary pair without a connecting tetraloop. This study found a nearly identical current separation between the G-C vs. A-T base pairs at position 9 (0.5 ± 0.1 pA) to that observed with the fishhook hairpins (0.5 ± 0.1 pA). Curve shapes indicate that the resolution was in fact slightly better with the two-strand duplex compared to the hairpin. This result confirmed that the ability to resolve the blocking currents for G-C and A-T base pairs in DNA duplexes is not limited to fishhook hairpin duplexes.
[0080] Studies also were conducted to determine if G-C vs. C-G base pairs exhibited different current signals. These studies were conducted in two-stranded duplexes and yielded identical current levels. Thus, the latch zone of the α-HL nanopore cannot differentiate different base pair orientations (i.e., G-C vs. C-G), as might be expected based on the 7-fold symmetry of the interior of the latch. To further test the latch zone sensing capabilities, duplexes bearing single-stranded tails comprised of 5'-(CAT)10-3' were tested in the α-HL nanopore system and yielded the same current difference between A-T- and G-C-containing duplexes as previously observed, suggesting that the composition of the single-stranded tail has little or no effect on the level of discrimination.
[0081] The tails were moved from the 3' end to the 5' end of the hairpin duplexes, and the base pair discrimination abilities were similar (5' entry = 0.3 ± 0.1 pA difference and 3' entry = 0.5 ± 0.1 pA difference, Figure S14). Additional changes in the sequence context around position 9 led to the same 0.5 ± 0.1 pA current difference originally observed.
[0082] The results of this example demonstrate that the latch zone of the α-HL nanopore is a robust platform for differentiating between a G-C base pair and an A-T base pair in double-stranded DNA.
EXAMPLE 2
[0083] This example describes a method of differentiating base pair modifications in duplex DNA using an α-HL nanopore.
[0084] Protein nanopores have been utilized for detecting or sequencing epigenetic markers on DNA (Wang et al., Sci. Rep., 4: 5883 (2014); Wescoe et al., J. Am. Chem. Soc., 136: 16582-16587 (2014); Zeng et al., Chem. Sci., 6: 5628-5634 (2015); Clarke et al., Nat. Nanotechnol., 4: 265-270 (2009); Laszlo et al., Proc. Natl. Acad. Sci. U.S.A., 110: 18904-18909 (2013); and Wallace et al., Chem. Commun., 46: 8195-8197 (2010)). Thus, several base modifications were introduced at position 9 in fishhook terminal hairpin duplexes to determine if the latch zone of α-HL could distinguish between them. Position 9 was chosen because it gave the best discrimination between G-C vs. A-T base pairs. An 8-oxo-7,8-dihydroguanine (OG) was placed at position 9 (hp-8, Figure 3A), and blocking current for an OG-C duplex (hp-8) was compared to a standard with a G-C at the same position (hp-6) as described in Example 1. Histograms of the normalized blocking current showed a single peak, suggesting that the latch zone is not capable of differentiating an OG-C vs. G-C base pair in this duplex system (Figure 3A).
[0085] Next, a 7-deazaguanine (Jain et al., Nat. Methods, 12: 351-356 (2015)) was placed in the duplex system base paired with C (hp-9, Figure 3A). When a 1:2 mixture of the standard hp-6 and hp-9 was studied by the α-HL nanopore system, a single peak was again observed in the current histogram (Figure 3A), confirming that the latch zone is incapable of discriminating these two very similar base pairs. These results, in which modifications to the G base did not lead to an observable current difference, suggest that the 1-2 atom structural differences that OG and 7-deazaguanine induce in the major groove of duplex DNA are not great enough to impact the blocking currents in the latch zone of the wild-type α-HL nanopore.
[0086] Methylation of the cytosine heterocycle at C5 to give 5-methylcytosine (mC) yields an epigenetic marker utilized in biology to regulate gene transcription (Zheng et al., Chem. Rev., 114: 4602-4620 (2014); Booth et al., Chem. Rev., 115: 2240-2254 (2015); and Wagner et al., Angew. Chem. Int. Ed. Engl., 54: 12511-12514 (2015)). These methyl groups are installed at 5'-CpG-3' sequences in the genome; therefore, a duplex was designed including a 5'-CpG-3' sequence at the most sensitive positions 8 and 9 of the α-HL nanopore. In this study, the internal standard was a strand lacking the methylation modification (hp-10, Figure 3B), and the first test strand included a single mC at position 8 (hp-11, Figure 3B), representing a hemi-methylated strand, which is a rare occurrence in the genome. Analysis of a mixture of hp-10 and hp-11 exhibited only one peak in the current histogram profile (Figure 3B).
[0087] Next, mC was placed at the C sites at positions 8 and 9 (hp-12, Figure 3B). Analysis of a mixture of hp-10 and hp-12 in the α-HL nanopore system produced two peaks in the current histogram that were separated by 0.2 ± 0.1 pA, although they were not baseline resolved to a point that would allow calling a sequence with >95% confidence (Figure 3B). These results demonstrate that the latch zone can discriminate bismethylated strands from a parent duplex.
[0088] The base modifications probed in the latch zone all induced changes to the major groove of the duplex. When one modification was present, such as OG, 7-deazaguanine, or mC, the current level remained the same as the parent duplex without the modification (Figures 3A and 3B). On the other hand, when two mC modifications were introduced in the duplex, current differentiation with the C-containing parent duplex was observed. The results described here for mC suggest an additive effect of the modifications on the blocking current level. There were no significant differences in base pair geometry or helical parameters between duplex structures containing C or mC (Renciuk et al., Nucleic Acids Res., 41: 9891-9900 (2013)). In contrast, mC induces a change in the ordered waters found in the major groove (Mayer-Jung et al., EMBO J., 17: 2709-2718 (1998)).
[0089] The results of this example demonstrate a method to detect 5-methylcytosine (mC) in specific sequences using wild-type α-HL.
EXAMPLE 3
[0090] This example describes a method of discriminating C-C and C-A mismatches from a C-G base pair using the α-HL nanopore system.
[0091] C-C and C-A mismatches located at the latch constriction during dsDNA residence within α-HL result in distinct modulation of the measured current between two states (I1 and I2 for C-C; I1 and I2* for C-A), as illustrated by the representative traces in Figure 4. When no DNA is present inside the α-HL pore, the open channel current, I0, is observed. The modulation frequency between the two states, as well as the amplitude of the residual currents for each state, is unique to the mismatch under study. For the duplex CC9, where "9" represents the base pair position inside the vestibule in proximity to the latch constriction (Figure 4A), a less blocking state of I1/I0 = 0.34 and a more blocking state of I2/I0 = 0.29 were observed in 99% of DNA capture events under typical electrolyte conditions (10 mM phosphate, 0.25 M KCl, pH 7.5). For duplex CA9, a more blocking state of I1/I0 = 0.34 and a less blocking state of I2*/I0 = 0.39 were observed. For duplexes containing the Watson-Crick C-G base pair and the C-T mismatch in proximity of the latch constriction during dsDNA residence within the pore (CT9 and CG9), no such current modulation was observed; each capture event presented as a single blocking state (I1/I0 = 0.34). The relative current amplitudes of each state were confirmed through a series of experiments in which two duplexes, each with a different base pair in proximity to the latch constriction during DNA residence inside α-HL (either CC9, CA9, CT9 or CG9, as shown in Figure 4), were analyzed with the same α-HL protein channel (Figure S5). For each duplex, a state with residual current identical to that for the fully complementary duplex (CG9) was always observed (I1), as shown in Figures 4 and 5. For the C-T duplex, the residual current of the capture events was identical to that observed for C-G (i.e., only the state I1 is observed).
The observation of a state with current amplitude I1 in all cases indicated that, for at least some time periods during residence in α-HL, the conformations of all four DNA duplexes studied were similar, attenuating ion transport through the pore to the same degree. This result is consistent with reports that incorporation of a mismatch into dsDNA has only a limited effect on the global conformation of a duplex (Tikhomirova et al., Biochemistry, 45: 10563 (2006)).
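The comparison of observed current states against reference signatures can be sketched as a simple lookup. The normalized levels in the table are those reported above for CC9, CA9, CG9 and CT9; the matching tolerance, the function name, and the table layout are assumptions made for illustration, and the sketch ignores lifetimes, which the disclosure also treats as part of the signature.

```python
from typing import Dict, List, Optional, Tuple

# Normalized residual currents (I/I0) reported in the text for each duplex.
# CG9 and CT9 both show a single state and cannot be told apart by level alone.
REFERENCE_SIGNATURES: Dict[str, Tuple[float, ...]] = {
    "CC9": (0.34, 0.29),
    "CA9": (0.34, 0.39),
    "CG9/CT9": (0.34,),
}


def classify_levels(observed: List[float], tolerance: float = 0.02) -> Optional[str]:
    """Return the reference label whose levels match the observed ones.

    tolerance is a hypothetical matching window on I/I0, not a value from the text.
    Returns None if no reference signature matches.
    """
    observed_sorted = sorted(observed)
    for label, reference in REFERENCE_SIGNATURES.items():
        expected = sorted(reference)
        if len(expected) != len(observed_sorted):
            continue
        if all(abs(o - e) <= tolerance for o, e in zip(observed_sorted, expected)):
            return label
    return None
```

For example, an event modulating between I/I0 of about 0.29 and 0.34 would be labeled CC9, while a single steady level near 0.34 would be reported only as the unresolved CG9/CT9 class.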
[0092] In addition to the current amplitudes, the lifetimes of each state, corresponding to current levels I1 and I2 for CC9 (Figures 5B, 5D) and I1 and I2* for CA9 (Figures 5C, 5E), were unique to the specific mismatch under study. The lifetimes of each state in all cases were described by first-order rate kinetics:
$$x_n = x_n(T)\,e^{-kt}$$
[0093] where $x_n$ is the number of transitions from state n (of current amplitude $I_n$) in time t, $x_n(T)$ is the total number of measured transitions from state n, and k is the rate constant (s$^{-1}$) describing the transition kinetics. The time constants, τ, of each state, n, are given by the inverse of the rate constant:
$$\tau_n = 1/k$$
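As a rough illustration of the first-order kinetics above, the sketch below estimates k and τ for one state from a list of measured dwell times and evaluates the exponential expression. It uses the mean dwell time as the estimate of τ, which is the standard maximum-likelihood estimate for exponentially distributed dwell times; the text does not state how the fits were actually performed, so this choice, the function names, and the sample dwell times are all assumptions.

```python
import math
from typing import List, Tuple


def estimate_rate(dwell_times_s: List[float]) -> Tuple[float, float]:
    """Return (k, tau) for one state from its dwell times, in s^-1 and s.

    Assumes exponentially distributed dwell times, so tau is the mean dwell time.
    """
    if not dwell_times_s:
        raise ValueError("at least one dwell time is required")
    tau = sum(dwell_times_s) / len(dwell_times_s)
    if tau <= 0:
        raise ValueError("dwell times must be positive")
    return 1.0 / tau, tau


def first_order_decay(total_transitions: float, k: float, t: float) -> float:
    """Evaluate x_n(T) * exp(-k t) from the first-order expression."""
    return total_transitions * math.exp(-k * t)


# Hypothetical dwell times for one state, in seconds.
k_est, tau_est = estimate_rate([0.010, 0.012, 0.011])
```

With enough transitions within a single capture event, an estimate of this kind can be formed per event, which is what makes the lifetimes usable as part of a signature.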
[0094] Lifetimes of I1 and I2 for the CC9 duplex were an order of magnitude longer than for the CA9 duplex (Figures 5D, 5E). For example, the state with current amplitude I1, which was common to both duplexes, showed lifetime constants of 11 ± 1 and 0.80 ± 0.02 ms for the C-C and C-A mismatches, respectively. The differing time scales demonstrate that within the α-HL pore, reversible changes to the DNA conformation or structure (that cause measurable changes to ion flux) occur on a time scale that is strongly dependent on the identity of the mismatch site.
[0095] Equilibrium constants for these conformational changes (KCC = 3.2 and KCA = 1.2) in the electrolyte conditions used (0.25 M KCl, 0.01 M phosphate, pH 7.5) suggest the unique DNA structures/conformations that correspond to states 2 and 2* for duplexes CC9 and CA9, respectively, were slightly favored relative to the DNA conformation represented by state 1 that was common to both duplexes. The lifetimes of states I1 and I2 were not discernible for the duplex CC9 when a 1 M KCl electrolyte was used.
[0096] The characteristic current-time traces corresponding to duplexes CC9 and CA9 permit immediate identification and discrimination of these molecules, providing significant advantages in comparison to situations where either the unzipping kinetics (Jin et al., Biochemistry, 52: 7870 (2013)) or the current amplitude (Jin et al., J. Am. Chem. Soc., 134: 11006 (2012)) is used to identify structural changes in a duplex. The exponential kinetics of the unzipping process generates a wide range of residence times, and this requires hundreds of events to be analyzed in order to generate descriptive kinetics (Sauer-Budge et al., Phys. Rev. Lett., 90: 238101 (2003)). While it has been shown that damage sites in a duplex can be detected based on residual current amplitudes, multiple capture events are nonetheless required for identification because the ~4% fluctuation in current amplitude between α-HL channels (Kasianowicz et al., Proc. Natl. Acad. Sci. U.S.A., 93: 13770 (1996)) necessitates the addition of a control duplex to which the residual currents of the duplex of interest can be compared.
[0097] The results of this example demonstrate a method to identify a C-C or C-A mismatch from visual inspection of the current signature of a single nanopore capture event, which provides a significant advantage over previous methods.
EXAMPLE 4
[0098] This example demonstrates that current modulation by a C-C mismatch is localized to the latch constriction of α-HL.
[0099] The position of the mismatched C-C base pair was moved away from the latch constriction by 3-4 bases (approximately 1.02-1.36 nm) (Damaschun et al., Biomed. Biochim. Acta, 42: 697 (1983)) as shown in Figure 6. When the mismatch is located at the 13th position from the 3' end of the shorter strand, it is located near the vestibule opening (Jin et al., J. Am. Chem. Soc., 135: 19347 (2013)), away from possible interactions with the protein surface. The duplex with a mismatch in this position produced current signatures similar to those observed for the fully complementary duplex, i.e., a single, uniform current state with amplitude Ir/Io = 0.34 and no modulation (Figure 6). The same result was also observed for a C-C mismatch at position 6, which was situated deeper within the pore vestibule but still away from the protein walls, because it is located at the widest internal point of α-HL.
[00100] The dependence of the distinct current modulation upon the location of the mismatch within the duplex demonstrated a strong localized effect. Thus, it was hypothesized that modulating current signatures observed when the mismatch was located at the latch constriction of α-HL are due to local interactions of one of the cytosine bases with the amino acid residues (lysine) at this specific point within the pore structure. The unusual modulation in current signatures described above is similar to that described for DNA hairpin structures in previous studies, in which modulation between different residual current states during residence of a hairpin within the vestibule of α-HL was attributed to interactions of the terminal base pairs with the protein surface (likely lysine residues (Song et al., Science, 274: 1859 (1996))) near the 1.4 nm central constriction (Vercoutere et al., Nucleic Acids Res., 31: 1311 (2003); Vercoutere et al., Nat. Biotechnol., 19: 248 (2001); Vercoutere et al., Biophys. J., 84: 967 (2003); and DeGuzman et al., Nucleic Acids Res., 34: 6425 (2006)). The nature of the interactions was highly dependent on the terminal base pair, with longer dwell times observed for some states when the terminal base pair was an AT as opposed to CG (Vercoutere et al., Nucleic Acids Res., 31: 1311 (2003); and DeGuzman et al., supra). Fraying, which results in the localized opening of dsDNA structure at the termini of hairpins and duplexes, has been widely reported (Andreatta et al., J. Am. Chem. Soc., 128: 6885 (2006); Jose et al., Proc. Natl. Acad. Sci. U.S.A., 106: 4231 (2009); Leroy et al., J. Mol. Biol., 200: 223 (1988); Nonin et al., Biochemistry, 34: 10652 (1995)), and it is likely that opening of the duplex at the terminus is a prerequisite for, and/or plays a key role in, nucleobase interactions with amino acids within α-HL in the above-described experiments.
[00101] In contrast to previous reports, modulating current signatures were observed when a mismatch was incorporated into the middle of the duplex structure, far away from the duplex termini, and so a mechanism other than fraying must be considered. The modulating current signatures observed may be a result of base-flipping, such that extra-helical cytosine and adenine bases are able to interact with lysine residues at the latch constriction of α-HL.
[00102] The extended residence times (the total time the DNA resides in the pore prior to unzipping, τres) of duplexes with a C-C mismatch at position 9 (Figure 7) indicate that the latch constriction of α-HL was capable of stabilizing the mismatch site through localized interactions. The typical resident lifetimes of a duplex in α-HL before unzipping (as measured from the residence time constant, τres) are shorter for thermodynamically less stable duplexes (Schibel et al., J. Am. Chem. Soc., 133: 14778 (2011); and Jin et al., Biochemistry, 52: 7870 (2013)), a finding that has been previously exploited to identify the presence of damage sites and other destabilizing influences on duplex integrity. Substituting a C-C mismatch in place of a C-G base pair in the 17-mer duplex used in these studies lowers the melting temperature, Tm, from 74 to 59 °C, irrespective of whether the substitution is made at position 9 (CC9) or at position 13 (CC13). On the basis of previous reports (Sauer-Budge et al., Phys. Rev. Lett., 90: 238101 (2003); Mathe et al., Biophys. J., 87: 3205 (2004); and Sutherland et al., Biochem. Cell Biol., 82: 407 (2004)), a corresponding decrease in the residence time constant (τres) of these duplexes within α-HL should also be observed. However, the residence times for these duplexes were found to be strongly dependent on the position of the mismatched base (Figure 7). A 22 ms decrease in τres relative to the fully complementary duplex (CG9) was observed for the duplexes CC13 and CC6, but for the duplex CC9, the value of τres increased by 42 ms. While some differences in the residence times of the duplexes with CC pairs at positions 9 and 13, respectively, might have been anticipated because of the directionality of the unzipping process (Sutherland et al., supra; and Jin et al., J. Am. Chem. Soc., 134: 11006 (2012)), the 42 ms increase in residence times relative to the fully complementary duplex when the mismatch is at position 9 can only be explained by a stabilizing interaction between the mismatch site and the latch constriction of the α-HL protein channel. The short τres values for duplexes CC6 and CC13 relative to CC9 suggest that the latch constriction is actively involved in the fast base-flipping kinetics that were observed, rather than just permitting their detection by virtue of its diameter relative to that of the duplex.
[00103] A proposed model for the various interactions of the CC9 duplex with α-HL is shown in Figure 8. The model system begins with current amplitude Io, which corresponds to the open channel current. The duplex is driven into the vestibule with a rate constant that is dependent on the concentration in the bulk (15 μM). When the DNA is inside the pore, there are two possible states distinguishable by their specific attenuation of the current. Mechanistically, these states are assigned to cases where one of the mismatched C-C bases is intra-helical (I1) or extra-helical (I2). After a period of time in the intra-helical state, the DNA unzips into its constituent strands and the pore returns to the open state. Unzipping from I2 is not observed. By assuming that this model can be described as a simple Markov chain (Privault, N., Understanding Markov Chains: Examples and Applications; Springer: New York, 2013), and using the overall dwell times in each state (either Io (the open channel current), I1 (intra-helical), or I2 (extra-helical)), the individual rate constants were extracted for each of the transition pathways.
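One common way to extract rate constants from dwell times under a continuous-time Markov chain assumption is sketched below in Python. It is an illustration and not necessarily the procedure used for Figure 8: the estimator (transition count divided by total time spent in the originating state), the function name, and the example trajectory are all assumptions introduced here.

```python
from collections import defaultdict


def estimate_rates(trajectory):
    """Estimate rate constants (per ms) from an idealized state trajectory.

    trajectory: list of (state, dwell_ms) tuples in time order.
    For a continuous-time Markov chain, the maximum-likelihood estimate
    of k(i -> j) is (number of i -> j transitions) / (total time in i).
    The final entry only marks the destination of the last transition,
    so its dwell time is not used.
    """
    time_in = defaultdict(float)
    counts = defaultdict(int)
    for (state, dwell), (next_state, _) in zip(trajectory, trajectory[1:]):
        time_in[state] += dwell
        counts[(state, next_state)] += 1
    return {pair: n / time_in[pair[0]] for pair, n in counts.items()}


# Hypothetical idealized capture event: the duplex switches between
# I1 (intra-helical) and I2 (extra-helical), then unzips from I1 to Io.
event = [("I1", 10.0), ("I2", 40.0), ("I1", 20.0),
         ("I2", 30.0), ("I1", 15.0), ("Io", 0.0)]
for (src, dst), k in sorted(estimate_rates(event).items()):
    print(f"k({src} -> {dst}) = {k:.4f} per ms")
```

In practice, trajectories from many capture events would be pooled before estimating the rates, since a single event contains only one unzipping transition.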
[00104] The duplex CC9 entered the pore with a rate constant that was DNA concentration-dependent, and is presented in Figure 8 for a concentration of 15 μM. It is impossible to define precisely in which state the DNA actually enters the pore (i.e., to address the nature of the helical structure in bulk solution), as in approximately the first 0.2 ms of the capture event the measured current amplitude was ill-defined (noisy) and not easily assigned to either state 1 or state 2. Presumably, this brief period represents the movement of DNA inside the vestibule prior to threading of the single-stranded tail into the pore β-barrel, as suggested in previous reports (Wang et al., Nanoscale, 6: 11372 (2014); and Perera et al., Nanotechnology, 26: 074002 (2015)).
[00105] The transitions between the I2 and I1 states may be indicative of the base-flipping process, with state I1 corresponding to a B-form DNA structure with the mismatched CC pair intra-helical. This assumption is based on the observation that the residual current of the I1 state was identical to the residual current of the sole state observed for a fully complementary duplex, indicating that in these two scenarios the DNA conformations are similar, attenuating ion transport through α-HL by the same magnitude. The lifetime of the intra-helical state for the CC mismatch in the above experiments was 15 ± 1 ms.
[00106] State I2 was attributed to a DNA conformation inside the pore where one of the cytosine bases was extra-helical and interacting with the lysine residues of the α-HL latch constriction. The extra-helical lifetimes for mismatched base pairs have been reported to be in the 10-30 ms range at 25 °C; thus, the detection of such DNA dynamics with α-HL was completely plausible (Yin et al., Proc. Natl. Acad. Sci. USA, 111: 8043 (2014); Roberts et al., Annu. Rev. Biochem., 67: 181 (1998)). Conversely, extra-helical lifetimes for base-flipping at Watson-Crick pairs have been reported to be on the nanosecond time scale, beyond the current capabilities of ion channel recordings, which may explain the absence of current modulation for the fully complementary duplex in these experiments (Stivers et al., supra; and Gueron, M. and Leroy, J.-L., Methods Enzymol., 261: 383 (1995)).
[00107] The shorter lifetimes of the extra-helical state for the CA mismatch imply that the extra-helical state of this mismatch was not stabilized by the lysine residues at the latch constriction to the same degree as for the CC mismatch. The binding of cytosine to lysine residues was 10-times stronger than that of adenine, consistent with the observations of significantly shorter extra-helical lifetimes of CA mismatches, implying that for the CA mismatch adenine is the extra-helical base (Akeson et al., Biophys. J., 77: 3227 (1999); and Bruskov, V., Stud. Biophys. (Berlin), 67S: 43 (1978)). However, it was not possible to precisely ascertain which base of the pair flips out, and it is possible (but unlikely) that both bases flipped out together.
[00108] Notably, the changes in current amplitude for the base-flipping associated with the CC and CA mismatches were in opposite directions; that is, the extra-helical state of the CC mismatch gave rise to further attenuation of the current relative to the intra-helical state, while the extra-helical state of the CA mismatch gave rise to a current increase relative to the intra-helical state. At a molecular level, this suggests that for the CC mismatch the extra-helical state occupied a greater volume of the latch constriction than the intra-helical state, with the reverse being true for the CA mismatch, which may be a consequence of differing hydration of the exposed intra-helical bases. These observations were consistent with previously reported data for single-stranded homopolymers of cytosine, which showed greater attenuation of ion flux through α-HL than homopolymers consisting of the other DNA bases (Akeson et al., supra).
[00109] While it cannot be discounted that conformational changes other than base-flipping gave rise to the modulating current signatures, base-flipping appears to be the most appropriate known mechanism to fit these experimental observations. The highly localized effect (modulating current signatures were only observed when the mismatch was at the latch constriction) discounts possible global conformational changes and strongly implicates the lysine residues in interacting with the mismatch site. This interaction must be significant (based on the lifetimes of the I1 and I2 states) and unique to the mismatch, discounting backbone and/or groove interactions, which may be expected to be present regardless of the base-pair identity. While DNA sliding within the pore was considered as a potential origin of the modulating current, the independence of the kinetics on the applied voltage (see Table 1), which would affect the ability of the DNA to move vertically within the confines of the pore, suggests that a sliding mechanism is unlikely.
Table 1
Figure imgf000038_0001
[00110] Notably, unzipping occurred only from the intra-helical state, and transitions from I2 to Io were not observed on a time scale shorter than the transition from I2 to I1. Presumably, when a cytosine base is flipped out and interacting with the latch constriction (state I2), the duplex is unable to overcome the energy barrier required to unzip because of stabilization by electrostatic and/or hydrogen bonding interactions with lysine residues. The inverse of the rate constant (τ) for the transition between states I1 and Io was 43 ± 4 ms, and was the true unzipping time constant for the duplex containing the CC mismatch at position 9. This value was within error of the total residence time constants observed for duplexes containing the CC mismatch at positions 6 and 13 (Figure 7). In essence, the kinetics of the unzipping process for the CC6, CC9, and CC13 duplexes were the same, but the total residence time of the CC9 duplex within the pore was longer because unzipping did not occur when a cytosine base was extra-helical (state I2).
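The argument that excursions into a non-unzipping state lengthen the total residence time can be illustrated with a short Python sketch. It assumes a three-state Markov model in which the duplex is captured in I1, exchanges between I1 and I2 with rate constants k12 and k21, and unzips only from I1 with rate constant k_unzip. Solving the mean first-passage equations for that model gives T = (1 + k12/k21) / k_unzip. The exchange rates used below are hypothetical placeholders; only the 43 ms unzipping time constant comes from the text.

```python
def mean_residence_time(k12, k21, k_unzip):
    """Mean time (ms) from capture in I1 until unzipping.

    Model assumption: unzipping occurs only from I1. Let T1 and T2 be
    the mean times to unzipping starting from I1 and I2:
        T1 = 1/(k12 + k_unzip) + k12/(k12 + k_unzip) * T2
        T2 = 1/k21 + T1
    Eliminating T2 gives T1 = (1 + k12/k21) / k_unzip.
    """
    if min(k12, k21, k_unzip) <= 0:
        raise ValueError("rate constants must be positive")
    return (1.0 + k12 / k21) / k_unzip


tau_unzip_ms = 43.0        # unzipping time constant reported for CC9
k12 = k21 = 0.05           # hypothetical exchange rates (per ms)
print(mean_residence_time(k12, k21, 1.0 / tau_unzip_ms))
```

In this model the residence time exceeds the unzipping time constant by the factor (1 + k12/k21), so any time spent in the extra-helical state adds directly to the time the duplex stays in the pore.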
[00111] Measuring base-flipping in the context of a confined environment, as described here for the α-HL vestibule, may provide a model for how mismatched base pairs are identified in cells. Structurally, the latch constriction of α-HL and the toroidal protein proliferating cell nuclear antigen (PCNA), which is required in a number of cell processes that involve base-flipping, are similar, with both consisting of a ring of lysines on the internal surface and with internal diameters of 2.6 nm and 3.2 nm, respectively (Ivanov et al., Nucleic Acids Res., 34: 6023 (2006); Zhou, Y. and Hingorani, M. M., J. Biol. Chem., 287: 35370 (2012); Krishna et al., Cell, 79: 1233 (1994); Gulbis et al., Cell, 87: 297 (1996)). The primary role of PCNA is thought to be that of a molecular scaffold, forming a ring around a dsDNA duplex and directing repair proteins, although the precise role of PCNA is still poorly understood (Paunesku et al., Int. J. Radiat. Biol., 77: 1007 (2001); Masih et al., Nucleic Acids Res., 36: 67 (2008)).
[00112] The data presented here suggest that intra-helical base pair lifetimes deviate significantly from those measured in solution by single-molecule fluorescence (Yin et al., supra). The confined context of the α-HL latch constriction in which the mismatched base pair is situated may lead to shortened intra-helical lifetimes. The decreased lifetime may result from favorable interactions with the lysine side chains, a feature that may also exist in PCNA. In addition to base-flipping kinetics, shorter (μs time scale) fluctuations in DNA structure continuously occur. The kinetics of these structural changes, such as DNA breathing (the brief formation of ssDNA "bubbles" along the helix), are rapid and are not directly measurable, but are reflected in the noise associated with the measured current amplitude (Johnson et al., Biophys. J., 107: 924 (2014)). The results of this example may shed light on how repair enzymes are able to "capture" and excise an extra-helical base.
EXAMPLE 5
[00113] This example demonstrates that base-flipping within the α-HL nanopore is pH-dependent.
[00114] A modulating current signature was observed only for a CC and CA mismatch present at the latch, and not for CG or CT. The CC and CA pairs were less stable than the CT and CG pairs because they are each stabilized by just one hydrogen bond, whereas the CT mismatch forms two and the CG pair three hydrogen bonds, respectively (Figure 9). The fact that the higher stability of the CT and CG base pairs prevented base-flipping and subsequent interaction with the α-HL latch constriction on a time scale shorter than the residence time prior to unzipping may explain the absence of event signatures with a current modulation for these duplexes. To test this hypothesis, unzipping experiments were conducted on the duplex CC9 as a function of pH. Under mildly acidic conditions, one of the cytosine bases in the CC mismatch is known to undergo protonation with pKa = 6, permitting the base pair CC+ to form two hydrogen bonds, thereby increasing the stability of the base pair. The change in current signatures observed as the pH decreased from 7.5 to 6.0 is shown in Figure 10.
[00115] At more acidic pH values, the fraction of capture events presenting a modulating current signature, where the current switches between states I1 and I2, decreased, and the fraction of capture events that occurred with a single state (I1 only) increased. At a pH of 6.0, where the CC+ form was expected to be dominant, all events presented a single current state identical to those observed for the CT and CG pairs. At a pH > 7, the CC form was dominant, and the modulating current signature was observed in >98% of events. The transition between the two forms was sharp, from which the pKa of the CC/CC+ system was estimated as 6.6. This value appears to be in reasonable agreement with prior NMR studies (pKa = 6.95) (Boulard, J. Mol. Biol., 268: 331 (1997)), especially when the differences in DNA sequence are taken into account.
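One simple way to read an apparent pKa from such a titration is to find the pH at which half of the capture events show the modulating signature. The Python sketch below does this by linear interpolation between measured points. The data points are hypothetical placeholders chosen only to mimic a sharp transition, and the interpolation approach is an assumption made here, not the fitting procedure used to obtain the reported value of 6.6.

```python
def midpoint_ph(ph_values, fractions, target=0.5):
    """Return the pH at which the modulating-event fraction crosses target.

    Uses linear interpolation between the two bracketing data points,
    and assumes the fraction rises with increasing pH.
    """
    points = sorted(zip(ph_values, fractions))
    for (ph_lo, f_lo), (ph_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= target <= f_hi and f_hi > f_lo:
            return ph_lo + (target - f_lo) * (ph_hi - ph_lo) / (f_hi - f_lo)
    raise ValueError("fraction never crosses the target value")


# Hypothetical titration data: fraction of events with a modulating signature.
ph = [6.0, 6.4, 6.8, 7.2, 7.5]
modulating_fraction = [0.00, 0.20, 0.80, 0.98, 0.99]
print(f"apparent pKa ~ {midpoint_ph(ph, modulating_fraction):.2f}")
```

A sigmoidal fit with an adjustable slope would use all of the data rather than just two points, and may be preferable when the transition is sampled at several pH values.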
[00116] Surprisingly, it was found that the transition rate constants between the I1 and I2 states remained unchanged as a function of buffer pH for those capture events exhibiting a modulating signature. The implications of this are two-fold. First, there were no changes to the chemical composition of the amino acid residues at the latch constriction between pH values of 6.0 and 7.5, consistent with the hypothesis of the presence of lysine residues at the latch constriction, which do not have amino groups with pKa values in the pH range studied. Second, the data suggest that protonation and deprotonation of the mismatch did not occur inside the α-HL pore on the timescale of duplex residence (approximately 100 ms), and so the latch constriction can in effect be used to take a "snapshot" of the protonation state of a DNA base that exists in acid-base equilibrium in the bulk solution external to the pore. Changes to the protonation state of the mismatch within the pore were not observed, which may be explained by the small number of H+ ions that passed through the pore during each DNA capture event.
[00117] The results of this example confirm that base-flipping within the α-HL nanopore is pH-dependent.
EXAMPLE 6
[00118] This example describes a method of detecting a G-G mismatch in duplex DNA.
[00119] To determine whether the approach for the identification of cytosine-containing mismatches described above is applicable to other, non-cytosine-containing mismatch pairs, the structure of the KRAS duplex was modified to include a GG mismatch at position 9, while all of the other bases in the duplex were unchanged. The GG mismatch-containing duplex gave a unique current signature (Figure 11A) that was approximately 1.1 pA more blocking than the fully complementary duplex (Figure 11B). It was immediately noticeable that capture events for the GG9-containing duplex were also characterized by higher noise, and analysis of the duplex residence times for GG9 demonstrated a greater residence time constant relative to the complementary duplex. Both of these characteristics in the data were shared with the duplexes that contain the CC and CA mismatches situated at the latch constriction during residence within α-HL. It is likely that the GG mismatch also interacts with the amino acid residues within the α-HL pore, stabilizing the structure and decreasing the probability of the duplex unzipping. The distinctly higher (approximately double) noise associated with the GG mismatch-containing duplex may also indicate rapid transitions between two distinct states, although the limitations of the instrumentation did not allow for distinguishing these two states. Nevertheless, the mean current of an entire capture event can be used to assign the DNA molecule as either a GG-containing duplex or a fully complementary duplex with near-baseline resolution (Figure 11B).
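The mean-current assignment described above can be sketched as a simple threshold test in Python. The reference current for the fully complementary duplex, the sample values, and the choice of a threshold halfway between the two expected means are all assumptions made for illustration; only the approximately 1.1 pA separation comes from the text.

```python
from statistics import mean


def classify_by_mean_current(residual_pa, complementary_mean_pa, gg_shift_pa=1.1):
    """Label a capture event from its mean residual current magnitude (pA).

    A GG-containing duplex is taken to be about gg_shift_pa more blocking,
    i.e. its residual current magnitude is lower by that amount. The
    decision threshold sits halfway between the two expected means.
    """
    threshold = abs(complementary_mean_pa) - gg_shift_pa / 2.0
    event_mean = mean(abs(sample) for sample in residual_pa)
    return "GG mismatch" if event_mean < threshold else "fully complementary"


# Hypothetical residual-current samples (pA) from two capture events,
# with a hypothetical reference of -10 pA for the complementary duplex.
print(classify_by_mean_current([-8.9, -9.0, -8.8], complementary_mean_pa=-10.0))
print(classify_by_mean_current([-10.1, -9.9, -10.0], complementary_mean_pa=-10.0))
```

Because channel-to-channel variation can shift absolute currents, the reference value would in practice need to be measured on the same protein channel as the events being classified.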
[00120] The results of this example demonstrate that the latch constriction of α-HL can be used to detect additional mismatched base pairs in individual molecules of dsDNA.
EXAMPLE 7
[00121] This example describes a method of measuring the dynamics of a DNA mismatch at a single base-pair site.
[00122] To examine base-flipping analysis at the single molecule level, a 23-base pair model sequence from a section of the KRAS gene was used. In addition to being well-characterized in the nanopore system described herein, modifications to the KRAS gene have been implicated in uncontrolled cell growth and formation of human carcinomas (Pfeifer, G. and Besaratinia, A., Hum. Genet., 125(5-6): 493-506 (2009)). A homogeneous single-stranded tail 24 thymine bases in length was added to the sequence to ease threading of the duplex into the α-HL protein pore (Jin et al., J. Am. Chem. Soc., 135(51): 19347-19353 (2013)). Hybridization of the probe sequence, which is fully complementary to the KRAS sequence except at the 9th base as counted from the 3' terminus, generated a single cytosine-cytosine mispair that was specifically placed to align with the latch constriction of α-HL when the DNA duplex was captured by the pore, forming a molecular rotaxane, as shown in Figure 12A.
[00123] Upon capture of the DNA duplex, attenuation of the measured current was observed due to an immediate decrease in the ion flux through the pore. Proximity of the C-C mismatch to the latch constriction when the DNA resides inside the pore led to distinct modulation of current between two states (Figure 12B). The two states that comprise the modulating signature were separated by approximately 1.6 pA in amplitude and exhibited a modulation periodicity on the order of 10 ms. It was previously observed that modulation between two distinct states is a result of one of the cytosine bases in the unstable mismatch flipping in and out of the DNA helix (Johnson et al., Faraday Discuss., Sept. 20, 2016 (Epub ahead of print); and Johnson et al., J. Am. Chem. Soc., 138(2): 594-603 (2016)). The less-blocking state (approx. -10 pA) was assigned to the intra-helical conformation because the same current amplitude (and an absence of current modulation) was observed when the mismatch at the latch constriction was replaced by a stable complementary (C-G) base pair (Johnson et al., J. Am. Chem. Soc., 138(2): 594-603 (2016)).
[00124] At an applied voltage of 100 mV, the majority (>80%) of the 23 base-pair duplexes utilized in the experiments described above were held within the pore for 20 seconds or longer, with the base flipping in and out of the helix around 200 times during this period. Under such conditions, it was possible to capture the duplexes containing a C-C mismatch one at a time, hold them within the pore for 20 seconds, and then release them by reversing the bias and driving the DNA back out into bulk solution (Figure 12B). Each duplex captured was thus analyzed individually to determine the base-flipping kinetics at the C-C mismatch site at the single molecule level.
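A per-duplex kinetic analysis of this kind can be sketched in Python as follows. The sketch assumes that dwell times in each state are exponentially distributed (first-order kinetics), in which case the maximum-likelihood estimate of the lifetime constant is simply the mean dwell time. The simulated data, the 14 ms "true" lifetime, and the counts of 200 dwells per duplex and 40 duplexes are hypothetical placeholders, and the function name is illustrative.

```python
import random
from statistics import mean, stdev


def lifetime_constant(dwell_times_ms):
    """Maximum-likelihood lifetime constant for exponential dwell times.

    For first-order kinetics the estimate is the sample mean.
    """
    if not dwell_times_ms:
        raise ValueError("at least one dwell time is required")
    return mean(dwell_times_ms)


rng = random.Random(0)          # fixed seed so the sketch is reproducible
true_tau_ms = 14.0              # hypothetical lifetime used for simulation

# Simulate 40 captured duplexes, each contributing about 200 dwells.
per_duplex_tau = [
    lifetime_constant([rng.expovariate(1.0 / true_tau_ms) for _ in range(200)])
    for _ in range(40)
]
print(f"population mean = {mean(per_duplex_tau):.1f} ms, "
      f"spread (s.d.) = {stdev(per_duplex_tau):.1f} ms")
```

Even with identical underlying kinetics, the finite number of dwells per duplex produces duplex-to-duplex scatter in the estimated constants, so some spread in single-molecule values is expected from sampling alone.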
[00125] The lifetimes of the two modulating states from a single duplex were well described by first-order rate kinetics, and the distribution of state lifetimes was used to extract characteristic lifetime constants τ1 and τ2 (Figure 13A), which represent the intra-helical (less blocking, I1) and extra-helical (more blocking, I2) conformations at the mismatch site. Representative intra- and extra-helical lifetime constants were found to vary from duplex to duplex of the same composition. The analysis of approximately 40 individual duplexes demonstrated a Gaussian-like distribution (Figure 13B), from which average lifetime constants for a population of duplexes of the same composition, measured with the same protein, were calculated (τ1 (mean) and τ2 (mean)). This Gaussian-like distribution indicated the stochastic variation in base-flipping kinetics for different DNA duplexes captured with a single protein channel. Repeating the same experiment with DNA of the same composition and under the same conditions, but with a different protein channel, returned (within error) the same values for τ1 (mean) and τ2 (mean), as shown in Figure 13C. The mean values from three unique protein channels (i.e., three unique experiments) were found to be 13.8, 13.1, and 14.1 ms for τ1 (mean), and 41.6, 43.0, and 42.2 ms for τ2 (mean).
EXAMPLE 8
[00126] This example demonstrates that cytosine modifications alter base-flipping kinetics in DNA.
[00127] A DNA molecule was generated identical to that shown in Figure 12A, with the exception of a mC, hmC, fC, or caC base replacing one of the cytosines in the duplex at the 9th position in the sequence as counted from the 3' terminus of the shorter (23 base) strand. Initially, the cytosine in the shorter probe strand was replaced to generate a C-X mismatch (where X was either mC, hmC, fC, or caC) in proximity to the latch constriction of α-HL upon capture of the DNA.
[00128] Replacing the cytosine base on the probe strand in the mismatch pair resulted in significant changes to the observed current modulation when DNA resided inside α-HL (Figure 14). Most striking was the clear change to the intra-helical and extra-helical lifetimes (states I1 and I2, respectively). There were also clear changes to the relative current noise associated with each of the states, and in the case of methyl-cytosine, modulation to a previously unseen, less blocking third state (I3).
[00129] For the C-mC duplex, state I1 became significantly longer relative to the C-C duplex, and was characterized by a higher noise level, particularly in the intra-helical state. This is consistent with the model of base-flipping proposed herein, because two recent reports have suggested that the incorporation of mC into a base pair stabilizes the intra-helical state relative to the extra-helical state (Bianchi et al., J. Phys. Chem. B, 117(8): 2348-2358 (2013); and Bianchi, C. and Zangi, R., Biophys. Chem., 187-188: 14-22 (2014)). The C-hmC, C-fC, and C-caC base pairs all presented modulating current signatures between two states, but the lifetimes of each state were dramatically altered relative to the C-C duplex. For C-hmC, the extra-helical lifetime significantly decreased relative to C-C, while for C-caC, the intra-helical lifetime increased, but not to the same extent as for C-mC. The C-fC duplex exhibited two distinct event types. In type I events, the extra-helical lifetimes were extremely short relative to duplexes with the C-C base pair, and in type II events, the extra-helical lifetimes were extremely long relative to duplexes with the C-C base pair. The ratio of type I to type II events was approximately 5:1, suggesting that duplexes containing the fC base, or the fC base itself, may exist in two uniquely identifiable forms.
[00130] In most cases, visual inspection of the current-time trace was sufficient to observe which epigenetic modification to cytosine was present at the mismatch site within the duplex. While duplexes containing different epigenetic modifications were difficult to differentiate from just one parameter (for example, C-C, C-caC, and C-mC-containing duplexes all exhibited similar extra-helical (I2) lifetimes), the use of both the intra- and extra-helical lifetime parameters together permitted ready identification of all epigenetic modifications to cytosine. The base-flipping kinetics of each modification were sufficiently different to allow unambiguous identification of C-C, C-mC, C-hmC, or C-fC at the single molecule level (Figure 15). Plotted as τ2 versus τ1, the data were resolved into clusters that in most cases did not overlap and were readily distinguished. While some overlap was observed for C-mC and C-caC, the former can be readily differentiated from the latter based on its unique three-state modulation signature and distinctly higher noise in state I1 relative to I2 (Figure 14).
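The idea of assigning a duplex from its position in the two lifetime parameters can be sketched as a nearest-centroid classifier in Python. The cluster centres below are illustrative placeholders, not values read from Figure 15, and the classifier itself is an assumption made here for illustration rather than the method used in this example.

```python
import math

# Hypothetical (tau1, tau2) cluster centres in ms, for illustration only.
CENTROIDS_MS = {
    "C-C": (14.0, 42.0),
    "C-mC": (46.0, 41.0),
    "C-hmC": (14.0, 14.0),
}


def assign_modification(tau1_ms, tau2_ms, centroids=CENTROIDS_MS):
    """Return the label whose centroid is closest to (tau1, tau2).

    Uses plain Euclidean distance in the two lifetime constants.
    """
    point = (tau1_ms, tau2_ms)
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


# Lifetime constants extracted from two hypothetical captured duplexes.
print(assign_modification(45.0, 40.0))
print(assign_modification(13.0, 45.0))
```

Where clusters overlap in these two parameters, additional features such as the number of current states or the noise in each state would be needed to separate them.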
[00131] The results of this example demonstrate that distinct kinetics for different cytosine modifications can be used to determine the identity of an individually captured duplex from a mixed sample and determine the ratio of duplex concentrations.
EXAMPLE 9
[00132] This example demonstrates that base-flipping kinetics are dependent on the flanking bases for mC- and hmC-containing duplexes.
[00133] Base-flipping kinetics, and indeed the stability of a mismatch site, have been shown to be dependent on the identity of flanking base pairs (Coman, D. and Russu, I.M., Biophys. J., 89(5): 3285-3292 (2005); and Folta-Stogniew, E. and Russu, I.M., Biochemistry, 33(36): 11016-11024 (1994)). A new series of duplexes was generated in which the modified cytosine base at the mismatch site was placed on the longer target strand rather than the probe strand. When incorporated into the probe strand, the modified base at the C-X mismatch was flanked by a 5'G and a 3'T, and in the target strand, the modified base at the X:C mismatch was flanked by a 5'A and a 3'C. The position of the mismatch site relative to the latch constriction of α-HL remained unchanged, while the pore itself was seven-fold symmetric (Song et al., supra).
[00134] A series of experiments with duplexes containing the modified cytosine flanked by 5'A and 3'C revealed changes to the base-flipping kinetics of a population relative to the duplexes containing the modified cytosine flanked by 5'G and 3'T for the cases of mC, hmC, and fC (Figure 16). While a determination of the bases that flank the modified cytosine could not be made at the single molecule level, preliminary experiments revealed a statistically significant sequence context effect. For example, the C-mC mismatch exhibited average state lifetime constants τ1 (mean) and τ2 (mean) of 46.5 and 41.3 ms, respectively, while the mC-C mismatch exhibited τ1 (mean) and τ2 (mean) values of 59.1 and 43.8 ms, respectively. Changing the sequence context of the mC base from G(mC)T to A(mC)C resulted in a 27% increase in τ1 (mean). Changes to the time constant of the third state, τ3 (mean), also were observed, with a significant decrease when mC was placed in the A(mC)C context. The increase in τ1 (mean) indicated that an A and C at either side of the methylcytosine base work to stabilize the intra-helical state relative to flanking T and G bases.
[00135] When the bases that flank hmC were changed from 5'A and 3'C to 5'G and 3'T, τ1 (mean) remained the same, but τ2 (mean) increased by 49% from 13.8 to 20.6 ms, indicating a stabilization of the extra-helical state. The hydroxyl group of hmC readily forms hydrogen bonds, and is known to interact with neighboring base pairs (Wang et al., Chem. Commun., 51(91): 16389-16392 (2015)). It is plausible that these interactions play some role in determining the stability of the extra-helical conformation at the mismatch site, and that by changing the flanking bases it is possible to change the strength and/or nature of these interactions.
[00136] In the cases of both mC- and hmC-containing duplexes, changing the sequence context altered just one of the time constants, i.e., only τ1 (mean) for mC and only τ2 (mean) for hmC. In addition, the time constant that was altered was the same as the dominant change observed when changing from a C-C- to a C-mC-containing duplex or from a C-C- to a C-hmC-containing duplex.
[00137] For the fC-containing duplexes, changes to the base-flipping kinetics when the sequence context changed were dependent on the event type. No changes were observed to the kinetics of the type I event, which retained dominance at approximately 80% of capture events. However, the average extra-helical lifetime (τ2 (mean)) of type II events decreased from 89.4 to 18.5 ms, an approximately 80% decrease. In the case of caC-containing duplexes, no change to the average state lifetimes (τ1 (mean) and τ2 (mean)) was observed.
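The percentage changes quoted in this example follow from simple arithmetic on the reported mean lifetime constants, as the short Python check below shows. The pairing of "before" and "after" values is a reading of the text, and the helper name is illustrative.

```python
def percent_change(before_ms, after_ms):
    """Relative change from before to after, in percent."""
    return 100.0 * (after_ms - before_ms) / before_ms


# Mean lifetime constants (ms) reported in this example.
reported = {
    "mC, tau1 (46.5 -> 59.1 ms)": (46.5, 59.1),
    "hmC, tau2 (13.8 -> 20.6 ms)": (13.8, 20.6),
    "fC type II, tau2 (89.4 -> 18.5 ms)": (89.4, 18.5),
}
for label, (before, after) in reported.items():
    print(f"{label}: {percent_change(before, after):+.0f}%")
```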
EXAMPLE 10
[00138] This example demonstrates that formyl-cytosine (fC) can exist as a hydrate in aqueous solution.
[00139] Two unique event types were observed for 23-mer duplexes in which cytosine was substituted with formyl-cytosine (fC), as shown in Figures 14 and 15. A count of the number of events of each type indicated a ratio of approximately 5:1, where type I events were more prevalent, regardless of whether the fC was in the shorter (C-fC) or longer (fC-C) strand. The two distinct event types observed for the fC-containing duplex may indicate that formylcytosine, within the context of the DNA duplex, exists in two unique structural forms, with each form having different base-flipping kinetics when confined at the latch constriction of α-HL.
[00140] The two event types observed for fC-containing duplexes may be the result of hydration of the formyl group in aqueous solution. Aldehydes undergo nucleophilic addition in water to form hydrates, with both the hydrate and formyl structures existing in an equilibrium defined by the relative stabilities of the two structures (Hilal et al, QSAR Comb. Sci., 24(5): 631-638 (2005)). The hydrate form of the formylcytosine base was previously detected at a very low level (0.5%) by Carell and co-workers via mass spectrometry (Pfaffeneder et al, Angew Chemie Int Ed., 60(31): 7008-7012 (2011)). The above results, with the advantage that measurements are made directly in DNA's native aqueous environment, suggest that the hydrate is potentially more abundant. This hypothesis is supported by hydrate equilibrium constants for similar pyridinium aldehydes, which are also electron deficient and have previously been shown to exist in the hydrate form in significant quantities. For these types of aldehydes, the hydrate is present at levels of 1-20% (K_HYD = [hydrate]/[aldehyde] = 0.01-0.2) (Hilal et al, supra; and Huang et al, Tet Lett, 50(47): 6584-6585 (2009)). Based on these prior reports, the existence of formylcytosine in the hydrate form for the DNA strands studied here is highly plausible, and the hydrate form may represent the minor (type II) events observed in these experiments.
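The relationship between a hydration equilibrium constant and the share of molecules in the hydrate form can be made explicit with a small calculation. The Python sketch below assumes the definition K_HYD = [hydrate]/[aldehyde] given above, so the hydrate fraction is K_HYD / (1 + K_HYD). It also assumes, following the hypothesis in this example, that type I events correspond to the aldehyde and type II events to the hydrate. The function names and the event counts are hypothetical and are used only to represent the approximately 5:1 ratio reported above.

```python
def hydrate_fraction(k_hyd: float) -> float:
    """Fraction of molecules in hydrate form for K_HYD = [hydrate]/[aldehyde]."""
    return k_hyd / (1.0 + k_hyd)


def k_hyd_from_counts(aldehyde_events: int, hydrate_events: int) -> float:
    """Estimate K_HYD from event counts, assuming each event samples one form."""
    return hydrate_events / aldehyde_events


# Bounds of the K_HYD range quoted for electron-deficient aldehydes
for k in (0.01, 0.2):
    print(f"K_HYD = {k}: hydrate fraction = {hydrate_fraction(k):.1%}")

# Hypothetical counts chosen only to reproduce a 5:1 (type I : type II) ratio
k_est = k_hyd_from_counts(aldehyde_events=500, hydrate_events=100)
print(f"K_HYD estimated from a 5:1 ratio = {k_est:.2f}")
```

Under these assumptions the quoted K_HYD range corresponds to hydrate fractions of roughly 1% to 17%, and a 5:1 event ratio corresponds to K_HYD of about 0.2.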
[00141] Once an fC-containing duplex was captured by the α-HL nanopore, no hydration or dehydration reactions were observed within the 20 s time period that the DNA was held inside the pore. Hydration and dehydration are expected to be rapid in bulk solution, catalysed by nucleophilic OH⁻ ions in basic solutions. During a DNA capture event, the negatively charged DNA backbone electrostatically excludes anions (including OH⁻) from the pore (Johnson et al, J. Phys. Chem. Lett., 5(21): 3781-3786 (2014)). In such circumstances, conversion between the two forms is expected to be extremely slow or impossible.
[00142] The results of this example demonstrate that the α-HL nanopore is capable of taking a 'snapshot' of the aldehyde/aldehyde hydrate equilibrium in bulk through determination of the ratio of base-flipping event types, and suggest an equilibrium constant of hydration for fC in the tested DNA of K_HYD = k_aldehyde/k_hydrate = 0.2.
[00143] All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
[00144] The use of the terms "a" and "an" and "the" and "at least one" and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term "at least one" followed by a list of one or more items (for example, "at least one of A and B") is to be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. The terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to,") unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

CLAIMS:
1. A method for identifying a single nucleotide of an analyte nucleic acid, comprising:
(a) hybridizing the analyte nucleic acid to a probe nucleic acid to form a hybridized analyte nucleic acid that includes a double-stranded portion comprising either (i) a base pair comprising the single nucleotide and a complementary nucleotide on the probe nucleic acid or (ii) a base mismatch comprising the single nucleotide and a non-complementary nucleotide on the probe nucleic acid,
(b) contacting the hybridized analyte nucleic acid with a nanopore within a membrane, the nanopore comprising at least a first region defining a first channel with a diameter sufficient to allow passage of double-stranded nucleic acids, a second region proximate to the first region and defining a second channel with a diameter that is larger than the first diameter, and a third region proximate to the second region and spaced from the first region and defining a third channel with a diameter sufficient to allow passage of single-stranded nucleic acids but not sufficient to allow passage of double-stranded nucleic acids,
(c) applying an electrical voltage across the nanopore, whereupon the hybridized analyte nucleic acid passes into the nanopore, and the base pair or the base mismatch is positioned within the first channel,
(d) measuring the electric current across the nanopore while the base pair or base mismatch is positioned within the first channel to obtain a measurement based on the current,
(e) comparing the measurement of step (d) to a reference, and
(f) identifying the single nucleotide based on the comparison of step (e).
2. The method of claim 1, wherein the nanopore comprises a protein.
3. The method of claim 2, wherein the protein comprises a lysine residue positioned within the first channel.
4. The method of claim 2 or claim 3, wherein the protein is an α-hemolysin.
5. The method of claim 4, wherein the α-hemolysin is a wild-type α-hemolysin.
6. The method of claim 4, wherein the first region is the latch zone of α-hemolysin, the second region is the vestibule region of α-hemolysin, and the third region is the β-barrel region of α-hemolysin.
7. The method of any one of claims 1-6, wherein the membrane is a phospholipid bilayer.
8. The method of claim 7, wherein the phospholipid is diphytanoylphosphatidylcholine.
9. The method of any one of claims 1-8, wherein the single nucleotide is selected from the group consisting of cytosine (C), guanine (G), adenine (A), thymine (T), 5-methylcytosine (mC), 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxylcytosine (caC).
10. The method of any one of claims 1-9, wherein the base pair is selected from the group consisting of an A-T base pair, a G-C base pair, a G-mC base pair, a G-hmC base pair, a G-fC base pair and a G-caC base pair.
11. The method of any one of claims 1-10, wherein the base mismatch is selected from the group consisting of a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch and a C-caC mismatch.
12. The method of any one of claims 1-11, wherein the probe nucleic acid has a different length than the analyte nucleic acid, and the hybridized analyte nucleic acid includes a single stranded portion, and whereupon applying an electrical voltage across the nanopore, the single stranded portion of the hybridized nucleic acid passes into the nanopore through the first and second channels and enters the third channel.
13. The method of claim 12, wherein the base pair or base mismatch is positioned nine nucleotides away from the single-stranded portion of the hybridized analyte nucleic acid.
14. The method of any one of claims 1-13, wherein the hybridized analyte nucleic acid is comprised of DNA, RNA or a combination of DNA and RNA.
15. The method of claim 1, wherein the measurement based on the current of step (d) is a measurement of a current at a fixed time, and the reference of step (e) is a measurement of a current at a fixed time when a control nucleic acid having a control base pair or a control base mismatch is independently positioned within the nanopore in a manner that causes the control base pair or the control base mismatch to be positioned within the first channel.
16. The method of claim 15, wherein the control nucleic acid comprises:
a first control nucleic acid strand comprising a sequence that is identical to the sequence of the analyte nucleic acid at all of the nucleotide positions other than the position corresponding to the single nucleotide, and that has a first nucleotide at the position corresponding to the single nucleotide, and
a second control nucleic acid strand that is identical to the sequence of the probe nucleic acid at all of the nucleotide positions other than the position corresponding to the complementary nucleotide, and that has a second nucleotide at the position corresponding to the complementary nucleotide, and
wherein the first and second control nucleic acid strands are hybridized to form a hybridized control nucleic acid that includes either a control base pair or a control base mismatch comprising the first and second nucleotides.
17. The method of claim 16, wherein the first nucleotide is a different nucleotide than the single nucleotide.
18. The method of claim 16, wherein the second nucleotide is a different nucleotide than the complementary nucleotide.
19. The method of claim 16, wherein the control base pair is selected from the group consisting of an A-T base pair, a G-C base pair, a G-mC base pair, a G-hmC base pair, a G-fC base pair and a G-caC base pair.
20. The method of claim 16, wherein the control base mismatch is selected from the group consisting of a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch and a C-caC mismatch.
21. The method of claim 1, wherein the measurement based on the current of step (d) is a measurement of current modulating as a function of time, and the reference of step (e) is a current modulation signature corresponding to at least one measurement of current modulating as a function of time when a particular base pair mismatch is positioned within the first channel.
22. The method of claim 21, wherein the particular base pair mismatch corresponding to the current modulation signature is selected from the group consisting of a C-A mismatch, a C-C mismatch, a C-mC mismatch, a C-hmC mismatch, a C-fC mismatch and a C-caC mismatch.
23. The method of claim 21 or claim 22, wherein the current modulating as a function of time is due to base-flipping.
24. A method for identifying a single nucleotide in the genome of an organism, comprising amplifying a portion of the genome comprising a single nucleotide to form an analyte nucleic acid, and then performing the method of claim 1 to identify the single nucleotide.
25. The method of claim 24, wherein the single nucleotide is positioned at a site corresponding to a single nucleotide polymorphism.
26. The method of claim 24, wherein the portion of the genome is amplified using a polymerase chain reaction (PCR).
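As an illustration only, steps (d) through (f) of claim 1 can be pictured as comparing a measured signature against stored references and reporting the closest one. The Python sketch below uses the mean state lifetimes τ1 and τ2 reported in paragraph [00134] for the C-mC and mC-C mismatches as the reference entries. The Euclidean nearest-reference rule, the function name `identify`, and the measured values are assumptions made for this sketch; the claims do not prescribe any particular comparison algorithm.

```python
from math import dist

# Reference signatures as (tau1_ms, tau2_ms), taken from paragraph [00134]
REFERENCE_LIFETIMES_MS = {
    "C-mC": (46.5, 41.3),
    "mC-C": (59.1, 43.8),
}


def identify(measurement, references=REFERENCE_LIFETIMES_MS):
    """Return the label of the reference closest to the measurement.

    Closeness is plain Euclidean distance in (tau1, tau2) space, which is
    an assumption of this sketch rather than a method stated in the claims.
    """
    return min(references, key=lambda label: dist(measurement, references[label]))


# Hypothetical measurement, not data from this specification
measured = (48.0, 40.0)
print(identify(measured))
```

A practical comparison would likely use richer signatures, such as full current-versus-time traces or lifetime distributions, and would account for measurement uncertainty.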

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201562245920P 2015-10-23 2015-10-23
US62/245,920 2015-10-23
US201562386954P 2015-12-17 2015-12-17
US62/386,954 2015-12-17

Publications (1)

Publication Number Publication Date
WO2017070693A1 true WO2017070693A1 (en) 2017-04-27

Family

ID=58557925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/058521 WO2017070693A1 (en) 2015-10-23 2016-10-24 Methods and systems for detecting variations in dna

Country Status (1)

Country Link
WO (1) WO2017070693A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090123914A1 (en) * 2004-09-24 2009-05-14 Ingeneus Inc. Genomic Assay
WO2015021055A1 (en) * 2013-08-05 2015-02-12 The Curators Of The University Of Missouri Base-pair specific inter-strand locks for genetic and epigenetic detection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DEAMER ET AL.: "Characterization of nucleic acids by nanopore analysis", ACC CHEM RES, vol. 35, no. 10, 27 September 2002 (2002-09-27), pages 817 - 825, XP002226144 *
JIN ET AL.: "Base-excision repair activity of uracil-DNA glycosylase monitored using the latch zone of α-hemolysin", J AM CHEM SOC, vol. 135, no. 51, 11 December 2013 (2013-12-11), pages 19347 - 19353, XP055378020 *
JIN ET AL.: "Structural destabilization of DNA duplexes containing single-base lesions investigated by nanopore measurements", BIOCHEMISTRY, vol. 52, no. 45, 31 October 2013 (2013-10-31), pages 7870 - 7877, XP055378016 *
JIN ET AL.: "Unzipping kinetics of duplex DNA containing oxidized lesions in an α-hemolysin nanopore", J AM CHEM SOC, vol. 134, no. 26, 4 July 2012 (2012-07-04), pages 11006 - 11011, XP055378008 *
SONG ET AL.: "Structure of staphylococcal alpha-hemolysin, a heptameric transmembrane pore", SCIENCE, vol. 274, no. 5294, 13 December 1996 (1996-12-13), pages 1859 - 66, XP002122973 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16858455

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16858455

Country of ref document: EP

Kind code of ref document: A1