WO2023141650A1

WO2023141650A1 - Modified adenine for nucleic acid based molecular electronics

Info

Publication number: WO2023141650A1
Application number: PCT/US2023/061152
Authority: WO
Inventors: Peiming Zhang; Feng Liang; Ming Lei
Original assignee: Universal Sequencing Technology Corporation
Priority date: 2022-01-24
Filing date: 2023-01-24
Publication date: 2023-07-27

Abstract

This disclosure provides for modified adenines and adenosines for engineering DNA and other nucleic acids. In particular, such modified adenine and adenosines are intended to increase the conductivity of the DNA into which they are incorporated, thereby allowing such DNA to be used, for example, as effective molecular wires, and also allowing for greater control when tailoring the properties of DNA.

Description

MODIFIED ADENINE FOR NUCLEIC ACID BASED MOLECULAR ELECTRONICS

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority to and the benefit of U.S. App. No. 63/302,158, filed January 24, 2022, which is hereby incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

DNA can transport electrons through its helical structure, which makes it an attractive molecular wire for molecular electronics. Its advantages include a uniform one-dimensional structure (~2 nm in diameter); programmable self-assembly, which allows structures to be designed using the Watson-Crick base-pairing rule (G pairs with C, and A with T); and a tunable length ranging from nanometers to micrometers with angstrom accuracy.

However, the conductivity of DNA is sequence-dependent, and thymine-adenine (T-A) base pairs make a DNA molecule less conductive than guanine-cytosine (G-C) base pairs. Although homogeneous sequences containing only G-C base pairs exhibit relatively high hole mobility for charge transport, they are difficult to synthesize, given the length of the molecule and the high level of purity required. Moreover, GC-rich DNA is prone to forming undesired secondary and even quaternary structures, which complicates the measurement and assembly of a DNA-based molecular device. Thus, it is desirable to improve the conductivity of A-T base pairs for use in DNA wires.

SUMMARY OF THE INVENTION

The invention generally provides modified adenosines having improved conductivity and methods of using modified A-T base pairs, for example, in a DNA wire.

In one aspect, the invention features a system including a conductive or semiconductive molecular wire, the system containing a nanostructure including one or more nucleic acid base pairs, where at least one nucleobase within the nanostructure includes 2-amino-7-deaza-adenine, or the following:

In another aspect, the invention features a system for identification, characterization, or sequencing of a biopolymer, the system including: a nanogap formed by a first electrode and a second electrode; a nanostructure having two ends, the nanostructure containing one or more nucleic acid base pairs, where the nanostructure bridges the nanogap between the electrodes, where each of the electrodes is chemically bound to one of the ends of the nanostructure, and where the nanostructure includes a 2-amino-7-deaza-adenine; and a sensing probe attached to the nanostructure that can interact or perform a chemical or biochemical reaction with the biopolymer.

In another aspect, the invention features a method for improving the conductance of a molecular wire. The method involves: providing a molecular wire, where the molecular wire is a nanostructure including one or more nucleic base pairs; and modifying at least one nucleic base to 2-amino-7-deaza-adenine.

In another aspect, the invention features a method for identification, characterization, or sequencing of a biopolymer. The method involves: forming a nanogap by placing a first electrode and a second electrode in proximity to one another on a non-conductive substrate or overlapping each other separated by a non-conductive layer; providing a nanostructure containing one or more nucleic acid base pairs with length comparable to the nanogap, where at least one nucleic acid base within the nanostructure is 2-amino-7-deaza-adenine; chemically bonding one end of the nanostructure to the first electrode and the other end of the nanostructure to the second electrode; and attaching a sensing probe to the nanostructure, where the sensing probe is capable of physically interacting with or performing a chemical or a biochemical reaction with the biopolymer.

In any of the above aspects, or embodiments thereof, the molecular wire is in electrical communication with the electrodes.

In any of the above aspects, or embodiments thereof, the molecular wire bridges a nanogap between the two electrodes.

In any of the above aspects, or embodiments thereof, the molecular wire includes two ends, each of which is chemically bound to one of the electrodes.

In any of the above aspects, or embodiments thereof, one of the electrodes is positively charged and the other is negatively charged.

In any of the above aspects, or embodiments thereof, the molecular wire contains DNA, RNA, a peptide nucleic acid molecule, a DNA/RNA duplex, or combinations thereof.

In any of the above aspects, or embodiments thereof, the electrodes are disposed on a non-conductive substrate or are separated by a non-conductive layer.

In any of the above aspects, or embodiments thereof, the system includes one or more of the following: a bias voltage that is applied between the first electrode and the second electrode; a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and a biopolymer; and a software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer.

In any of the above aspects, or embodiments thereof, the nanostructure is a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, or a combination thereof.

In any of the above aspects, or embodiments thereof, the biopolymer contains DNA, RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, or analogues thereof, either natural, synthesized, or modified.

In any of the above aspects, or embodiments thereof, the sensing probe is a polynucleotide or a polypeptide.

In any of the above aspects, or embodiments thereof, the sensing probe is a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, a ligand, an antigen and an antibody, either native, mutated, expressed, or synthesized, or a combination thereof.

In any of the above aspects, or embodiments thereof, the enzyme is a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, or a lactase, either natural, mutated or synthesized.

In any of the above aspects, or embodiments thereof, the nanostructure contains a polypeptide.

In any of the above aspects, or embodiments thereof, the polypeptide is a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, or a lactase, either natural, mutated or synthesized.

In any of the above aspects, or embodiments thereof, the nanogap comprises about 3 to 1000 nm, about 5 to 100 nm, or about 5 to 30 nm.

In any of the above aspects, or embodiments thereof, the nanogap is 5, 10, 20, 25, or 30 nm.

In any of the above aspects, or embodiments thereof, the electrodes contain platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), iridium (Ir), copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta) and their derivatives, or an alloy or combination thereof.

In any of the above aspects, or embodiments thereof, the derivative is TiN or TaN.

In any of the above aspects, or embodiments thereof, the method involves: applying a bias voltage between the first electrode and the second electrode; providing a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and the biopolymer; and providing a software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer.

In any of the above aspects, or embodiments thereof, the biopolymer is a DNA, an RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, or analogues thereof, either natural, synthesized, or modified.

In any of the above aspects, or embodiments thereof, the sensing probe is a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, a ligand, an antigen and an antibody, either native, mutated, expressed, or synthesized, or a combination thereof.

In any of the above aspects, or embodiments thereof, the enzyme is a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, or a lactase, either natural, mutated or synthesized.

In any of the above aspects, or embodiments thereof, the nanogap size or the distance between the two electrodes is about 3 to 1000 nm, preferably about 5 to 100 nm, and most preferably about 5 to 30 nm.

In any of the above aspects, or embodiments thereof, the electrodes contain platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), iridium (Ir), copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta) and their derivatives, or a combination thereof.

In any of the above aspects, or embodiments thereof, the method involves: providing 2-amino-7-deaza-adenine-triphosphate; and incorporating the 2-amino-7-deaza-adenine-triphosphate into a nucleic acid strand within the nanostructure enzymatically.

The invention provides modified adenosines. Compositions and articles defined by the invention were isolated or otherwise manufactured in connection with the examples provided below. Other features and advantages of the invention will be apparent from the detailed description, and from the claims.

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.

The term “adaptor” refers to a sequence that is added, for example, by ligation, to a nucleic acid. The length of an adaptor may be from about 5 to about 100 bases, and may provide a sequencing primer binding site (e.g., an amplification primer binding site), and a molecular barcode such as a sample identifier sequence or molecule identifier sequence, preferably a unique identifier sequence. An adaptor may be added to 1) the 5' end, 2) the 3' end, or 3) both ends of a nucleic acid molecule. Double-stranded adaptors contain a double-stranded end ligated to a nucleic acid. An adaptor can have an overhang or may be blunt ended. As will be described in greater detail below, a double-stranded adaptor can be added to a fragment by ligating only one strand of the adaptor to the fragment. The sequence of the non-ligated strand of the adaptor may be added to the fragment using a polymerase. Y-adaptors and loop adaptors are types of double-stranded adaptors.

By "alteration" is meant a change (increase or decrease) in the structure, activity, or conductive characteristics of a nucleobase or polynucleotide as detected by standard art known methods such as those described herein. As used herein, an alteration includes a 10% change in conductivity, a 25% change, a 40% change, a 50% or greater change in conductivity.

By "analog" is meant a molecule that is not identical, but has analogous functional or structural features.

As used herein, the term “antisense strand” refers to a polynucleotide that is substantially or 100% complementary to a target nucleic acid of interest. For example, an antisense strand may be complementary, in whole or in part, to a molecule of mRNA (messenger RNA), an RNA sequence that is not mRNA (e.g., microRNA, piwiRNA, tRNA, rRNA and hnRNA) or a sequence of DNA that is either coding or non-coding. The terms “antisense strand” and “guide strand” are used interchangeably herein.

In this disclosure, "comprises," "comprising," "containing" and "having" and the like can have the meaning ascribed to them in U.S. Patent law and can mean "includes," "including," and the like; "consisting essentially of" or "consists essentially" likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.

By “complementary” is meant capable of pairing to form a double-stranded nucleic acid molecule or portion thereof. In one embodiment, an antisense molecule is in large part complementary to a target sequence. The complementarity need not be perfect, but may include mismatches at 1, 2, 3, or more nucleotides.
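The definition above can be illustrated with a minimal sketch, assuming plain DNA strings written 5' to 3' and simple Watson-Crick pairing only. The function names and the `max_mismatches` parameter are hypothetical and are not part of the disclosure.

```python
# Watson-Crick partners for DNA (G pairs with C, A pairs with T).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the perfectly complementary strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def count_mismatches(strand: str, target: str) -> int:
    """Count positions where strand fails to pair with target.

    Both inputs are written 5' to 3' and are paired antiparallel.
    Assumption: equal lengths, no gaps or bulges.
    """
    if len(strand) != len(target):
        raise ValueError("strands must have equal length")
    ideal = reverse_complement(target)
    return sum(a != b for a, b in zip(strand.upper(), ideal))


def is_complementary(strand: str, target: str, max_mismatches: int = 0) -> bool:
    """True if strand pairs with target within the allowed mismatch count."""
    return count_mismatches(strand, target) <= max_mismatches
```

Setting `max_mismatches` to 1, 2, or 3 mirrors the statement that complementarity need not be perfect.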

By “corresponds” is meant comprising at least a fragment of a double-stranded gene, such that a strand of the double-stranded inhibitory nucleic acid molecule is capable of binding to a complementary strand of the gene.

By “decreases” is meant a reduction by at least about 5% relative to a reference level. A decrease may be by 5%, 10%, 15%, 20%, 25% or 50%, or even by as much as 75%, 85%, 95% or more and any intervening percentages.

“Detect” refers to identifying the presence, absence, amount, or characteristics of the analyte to be detected. In some embodiments, the analyte is a modified adenosine. In other embodiments, the conductivity of the modified adenosine is detected using methods known in the art.

By "detectable label" is meant a composition that when linked to a molecule of interest renders the latter detectable, via spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive isotopes, magnetic beads, metallic beads, colloidal particles, fluorescent dyes, electron-dense reagents, enzymes (for example, as commonly used in an ELISA), biotin, digoxigenin, or haptens.

By “DFT” is meant density functional theory, from which molecular structures and properties are computed. In this disclosure, DFT analyses are conducted using the commercially available software, SPARTAN (Wavefunction, Inc., USA). In particular, DFT calculations in this disclosure were conducted using the B3LYP/6-31G* approach, as further described in the Spartan’20 Tutorial and User’s Guide (Wavefunction, Inc., 2022, downloads.wavefun.com/Spartan20Manual.pdf), which is hereby incorporated by reference in its entirety.

The term “expression” or “expressed” as used herein in reference to a gene means the transcriptional and/or translational product of that gene. The level of expression of a DNA molecule in a cell may be determined on the basis of either the amount of corresponding mRNA that is present within the cell or the amount of protein encoded by that DNA produced by the cell (Sambrook et al., 1989 Molecular Cloning: A Laboratory Manual, 18.1-18.88). Expression of a transfected gene can occur transiently or stably in a cell. During “transient expression” the transfected gene is not transferred to the daughter cell during cell division. Since its expression is restricted to the transfected cell, expression of the gene is lost over time. In contrast, stable expression of a transfected gene can occur when the gene is cotransfected with another gene that confers a selection advantage to the transfected cell. Such a selection advantage may be a resistance towards a certain toxin that is presented to the cell.

By "fragment" is meant a portion of a polypeptide or nucleic acid molecule. This portion contains, preferably, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the entire length of the reference nucleic acid molecule or polypeptide. A fragment may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotides or amino acids.

By “high-throughput sequencing” is meant a sequencing technique that allows for large amounts of nucleic acids to be sequenced.

"Hybridization" means hydrogen bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary nucleobases. For example, adenine and thymine are complementary nucleobases that pair through the formation of hydrogen bonds.

The terms "isolated," "purified," or "biologically pure" refer to material that is free to varying degrees from components which normally accompany it as found in its native state. "Isolate" denotes a degree of separation from original source or surroundings. "Purify" denotes a degree of separation that is higher than isolation. A "purified" or "biologically pure" protein is sufficiently free of other materials such that any impurities do not materially affect the biological properties of the protein or cause other adverse consequences. That is, a nucleic acid or peptide of this invention is purified if it is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized. Purity and homogeneity are typically determined using analytical chemistry techniques, for example, polyacrylamide gel electrophoresis or high performance liquid chromatography. The term "purified" can denote that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. For a protein that can be subjected to modifications, for example, phosphorylation or glycosylation, different modifications may give rise to different isolated proteins, which can be separately purified.

By "isolated polynucleotide" is meant a nucleic acid (e.g., a DNA) that is free of the genes which, in the naturally-occurring genome of the organism from which the nucleic acid molecule of the invention is derived, flank the gene. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote; or that exists as a separate molecule (for example, a cDNA or a genomic or cDNA fragment produced by PCR or restriction endonuclease digestion) independent of other sequences. In addition, the term includes an RNA molecule that is transcribed from a DNA molecule, as well as a recombinant DNA that is part of a hybrid gene encoding additional polypeptide sequence.

By an "isolated polypeptide" is meant a polypeptide of the invention that has been separated from components that naturally accompany it. Typically, the polypeptide is isolated when it is at least 60%, by weight, free from the proteins and naturally-occurring organic molecules with which it is naturally associated. Preferably, the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by weight, a polypeptide of the invention. An isolated polypeptide of the invention may be obtained, for example, by extraction from a natural source, by expression of a recombinant nucleic acid encoding such a polypeptide; or by chemically synthesizing the protein. Purity can be measured by any appropriate method, for example, column chromatography, polyacrylamide gel electrophoresis, or by HPLC analysis.

By “molecular wire” is meant a molecule or a molecular structure capable of conducting electrical current.

By “nanogap” is meant a gap between two objects having a size measuring in the nanometer scale range. In an embodiment, the nanogap is formed between two electrodes.
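As a rough illustration of matching a nucleic acid nanostructure to a nanogap, the sketch below estimates how many base pairs a duplex needs to span a given gap. It assumes an ideal B-form duplex with an axial rise of about 0.34 nm per base pair, a textbook value that is not stated in this disclosure, and it ignores linkers and electrode chemistry. The function name is hypothetical.

```python
import math

# Assumption: ideal B-form DNA, ~0.34 nm axial rise per base pair.
RISE_NM_PER_BP = 0.34


def base_pairs_to_span(gap_nm: float, rise_nm: float = RISE_NM_PER_BP) -> int:
    """Smallest base-pair count whose approximate length reaches gap_nm."""
    if gap_nm <= 0 or rise_nm <= 0:
        raise ValueError("gap and rise must be positive")
    return math.ceil(gap_nm / rise_nm)


if __name__ == "__main__":
    # Gap sizes taken from the embodiments recited in the summary.
    for gap in (5, 10, 20, 25, 30):
        print(f"{gap} nm gap -> about {base_pairs_to_span(gap)} bp")
```

Under these assumptions, a 5 nm gap corresponds to roughly 15 base pairs and a 30 nm gap to roughly 89.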

By “nanostructure” is meant any structure with one or more dimensions, measuring in the nanometer scale range.

By “RNA-seq” is meant RNA sequencing for detecting and quantifying RNA molecules in a biological sample, which, for example, may be used to study cellular responses. A related term, “scRNA-seq” is single-cell RNA sequencing, which may be, for example, a droplet-based single-cell RNA-seq or “Drop-seq,” that is a sequencing technology for analyzing RNA expression in at least hundreds of thousands of individual cells in embodiments of the disclosure, but may alternatively use any other high-throughput sequencing platform.

As used herein, “obtaining” as in “obtaining an agent” includes synthesizing, purchasing, or otherwise acquiring the agent.

By “reduces” is meant a negative alteration of at least 10%, 25%, 50%, 75%, or 100%.

By “reference” is meant a standard or control condition. In some embodiments, the conductivity of a polynucleotide comprising one or more modified nucleobases (e.g., modified adenosines) is compared to the conductivity of a reference polynucleotide (e.g., DNA molecule) comprising no modified nucleobases or comprising fewer modified nucleobases (e.g., A-T) than the corresponding polynucleotide.

A "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset of or the entirety of a specified sequence; for example, a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence. For polypeptides, the length of the reference polypeptide sequence will generally be at least about 16 amino acids, preferably at least about 20 amino acids, more preferably at least about 25 amino acids, and even more preferably about 35 amino acids, about 50 amino acids, or about 100 amino acids. For nucleic acids, the length of the reference nucleic acid sequence will generally be at least about 50 nucleotides, preferably at least about 60 nucleotides, more preferably at least about 75 nucleotides, and even more preferably about 100 nucleotides or about 300 nucleotides or any

Nucleic acid molecules useful in the methods of the invention include any nucleic acid molecule that encodes a polypeptide of the invention or a fragment thereof. Such nucleic acid molecules need not be 100% identical with an endogenous nucleic acid sequence, but will typically exhibit substantial identity. Polynucleotides having “substantial identity” to an endogenous sequence are typically capable of hybridizing with at least one strand of a double-stranded nucleic acid molecule. By "hybridize" is meant pair to form a double-stranded molecule between complementary polynucleotide sequences (e.g., a gene described herein), or portions thereof, under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399; Kimmel, A. R. (1987) Methods Enzymol. 152:507).

For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C, more preferably of at least about 37° C, and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 µg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 µg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.

For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C, more preferably of at least about 42° C, and even more preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 68° C in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196:180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.

By "substantially identical" is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence (for example, any one of the amino acid sequences described herein) or nucleic acid sequence (for example, any one of the nucleic acid sequences described herein). Preferably, such a sequence is at least 60%, more preferably 80% or 85%, and more preferably 90%, 95% or even 99% identical at the amino acid level or nucleic acid to the sequence used for comparison.
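The percentage thresholds above can be made concrete with a minimal sketch. It assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it simply counts matching columns; it is not the scoring used by BLAST or any other alignment package. The function names and the default threshold argument are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: inputs are pre-aligned, equal-length strings; a gap ("-")
    never counts as a match.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        x == y and x != "-"
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


def is_substantially_identical(
    aligned_a: str, aligned_b: str, threshold: float = 50.0
) -> bool:
    """True if identity meets the threshold (50% in the broadest reading)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Raising `threshold` to 60, 80, 85, 90, 95, or 99 corresponds to the progressively preferred levels listed in the definition.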

Sequence identity is typically measured using sequence analysis software (for example, Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e⁻³ and e⁻¹⁰⁰ indicating a closely related sequence.

Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50.

Unless specifically stated or obvious from context, as used herein, the term "or" is understood to be inclusive. Unless specifically stated or obvious from context, as used herein, the terms "a", "an", and "the" are understood to be singular or plural.

By “vector” is meant a nucleic acid molecule, for example, a plasmid, cosmid, virus, or bacteriophage that is capable of replication in a host cell. In one embodiment, a vector is an expression vector that is a nucleic acid construct, generated recombinantly or synthetically, bearing a series of specified nucleic acid elements that enable transcription of a nucleic acid molecule in a host cell. Typically, expression is placed under the control of certain regulatory elements, including constitutive or inducible promoters, tissue-preferred regulatory elements, and enhancers.

Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. About can be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from context, all numerical values provided herein are modified by the term about.
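One way to read the percentage form of this definition is as a relative tolerance around the stated value. The sketch below is only an illustration of that reading; the function name and the 10% default are assumptions chosen from the list above.

```python
def is_about(value: float, stated: float, tolerance: float = 0.10) -> bool:
    """True if value lies within tolerance (as a fraction) of the stated value."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - stated) <= tolerance * abs(stated)
```

For example, with the default tolerance a measured gap of 28 nm would count as "about 30 nm", whereas 26 nm would not.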

The recitation of a listing of chemical groups in any definition of a variable herein includes definitions of that variable as any single group or combination of listed groups. The recitation of an embodiment for a variable or aspect herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.

Any compositions or methods provided herein can be combined with one or more of any of the other compositions and methods provided herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides an illustration showing a modified DNA bridged nanojunction or nanogap between two electrodes, where dots represent modified base pairs.

FIG. 2 provides an illustration showing chemical structures of modified adenines.

FIG. 3 provides a chart showing HOMO orbitals and energies of canonical and modified adenine base pairs from density functional theory (DFT) calculations using the B3LYP/6-31G* approach (Spartan Software, Wavefunction, Inc., USA).

FIG. 4 provides an illustration showing chemical structures of 2-amino-7-deaza-adenine with proposed position 7 modifications.

DETAILED DESCRIPTION OF THE INVENTION

This disclosure provides modified adenosines having improved conductivity, and methods of using modified A-T base pairs, for example, in a DNA wire.

The technology provided herein is based, at least in part, on the discovery of modified adenosines having improved conductivity.

In one aspect, this disclosure provides for modified adenosines for engineering DNA to overcome the A-T base barrier and achieve a highly conductive molecular wire (FIG. 1). In this invention, Compound 0-1, 2-amino-7-deaza-adenine (designated as D^a, FIG. 2), is utilized as a substitute for adenine. D^a can be synthesized through techniques well known in the literature, such as, for example, in Okamoto, et al., “2-Amino-7-deazaadenine forms stable base pairs with cytosine and thymine” (Bioorg Med Chem Lett. 12:97-9, 2002). Although an exemplary use of the present disclosure is in connection with engineering DNA, the disclosed technology is not limited to just DNA, but may also be used with RNA, peptide nucleic acid (PNA) molecules, and/or any hybrid or chimeric combination of such poly- or oligonucleotides, such as, for example, in a DNA/RNA duplex.

A calculation by density functional theory (DFT) indicates that D^a in the D^a-T base pair has a highest occupied molecular orbital (HOMO) energy level which is higher than the HOMO energy level of A in the A-T base pair. Perhaps more importantly, D^a also has a HOMO energy level which is higher than the HOMO energy level of G in the G-C base pair, by 0.36 eV (FIG. 3). This indicates that D^a can be used as an effective hopping site for charge transport through DNA.
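To give a sense of scale for the 0.36 eV offset reported above, the following minimal sketch expresses it in units of thermal energy (k_B*T). It is only an arithmetic comparison, not a charge-transport model and not part of the disclosed calculations; the function name `offset_in_kT` and the choice of 298.15 K are assumptions made for illustration.

```python
# Scale check only: compares the 0.36 eV HOMO offset quoted in the text
# with thermal energy k_B*T. This is NOT a model of charge transport.

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def offset_in_kT(offset_ev: float, temperature_k: float = 298.15) -> float:
    """Return an energy offset expressed in multiples of k_B*T.

    The default temperature (298.15 K) is an illustrative assumption.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return offset_ev / (BOLTZMANN_EV_PER_K * temperature_k)


# HOMO of D^a relative to G, taken from the DFT result cited in the text.
HOMO_OFFSET_DA_VS_G_EV = 0.36

ratio = offset_in_kT(HOMO_OFFSET_DA_VS_G_EV)
print(f"0.36 eV corresponds to roughly {ratio:.1f} k_B*T at 298.15 K")
```

Under these assumptions the offset is many times larger than thermal energy at room temperature, which is consistent with treating D^a as a distinct site for holes rather than as a small perturbation.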

In an embodiment, D^a is chemically incorporated into DNA by an automated DNA synthesizer. In some embodiments, this invention also includes Compound 0-2, 2-amino-adenine, and Compound 0-3, 7-deaza-adenine (designated as A^a and D, respectively, FIG. 2) as substitutes for adenine in DNA to tune the conductivity of DNA. Both A^a and D have higher HOMO energies than adenine in base pairs (FIG. 3). But A^a has a HOMO energy level similar to that of G, and D has a lower HOMO energy level than G (FIG. 3). In many embodiments, a length of DNA may be engineered such that some or all of the adenines in the DNA have been replaced with some combination of D^a, A^a, and D. In further embodiments, modified guanines, cytidines, thymidines, uracils, etc., may also be incorporated into any given length of DNA, alongside the modified adenines of this disclosure.
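The replacement of some or all adenines by a modified adenine can be sketched at the sequence-design level as follows. This is a hypothetical illustration only: the token labels `"Da"`, `"Aa"`, and `"D"`, the function name `substitute_adenines`, and the `every_nth` parameter are assumptions introduced here and do not correspond to any synthesizer input format.

```python
from typing import List

# Labels for the three modified adenines named in the text (FIG. 2).
# These string tokens are illustrative, not a standard notation.
MODIFIED_ADENINES = {"Da", "Aa", "D"}


def substitute_adenines(sequence: str, replacement: str = "Da",
                        every_nth: int = 1) -> List[str]:
    """Replace every n-th adenine in a DNA sequence with a modified adenine.

    every_nth=1 replaces all adenines; every_nth=2 replaces the 2nd, 4th, ...
    A list of tokens is returned because the labels are not single letters.
    """
    if replacement not in MODIFIED_ADENINES:
        raise ValueError(f"unknown modified adenine: {replacement!r}")
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")

    tokens: List[str] = []
    adenines_seen = 0
    for base in sequence.upper():
        if base not in "ACGT":
            raise ValueError(f"unexpected character in sequence: {base!r}")
        if base == "A":
            adenines_seen += 1
            if adenines_seen % every_nth == 0:
                tokens.append(replacement)
                continue
        tokens.append(base)
    return tokens


print(substitute_adenines("GATTACA"))                # all adenines replaced
print(substitute_adenines("GATTACA", "Aa", 2))       # every second adenine
```

Returning a token list keeps the canonical bases and the modified bases distinguishable, so a partially substituted strand (some adenines left unmodified) can be represented directly.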

In some embodiments, D^a is further modified at position 7 with an R group with beneficial charge transfer properties, as shown in FIG. 4. In many embodiments, R may be, for example, but is not limited to: an alkyl group, such as methyl, ethyl, propyl, iso-propyl, butyl, iso-butyl, tert-butyl, cyclopropyl, or cyclohexyl; nitro; cyano; halogenated alkyl; an aromatic ring, such as benzene; a five-membered heterocycle; or combinations or derivatives of any of the foregoing.

In another aspect, the present disclosure also provides for modified adenosine triphosphates, for incorporating the modified adenosines into DNA enzymatically. An exemplary enzyme for use in incorporating the modified adenosines is a DNA polymerase that can extend a DNA chain with or without a template.

In another aspect, the present disclosure provides for using engineered DNA including one or more adenosines modified using the methods or schemes discussed in this disclosure in a nanogap electronic measuring device for the identification and/or sequencing of biopolymers, such as, but not limited to, the devices disclosed in U.S. App. Pub. Nos. 2017/0044605 A1 and 2018/0305727 A1, and also in U.S. App. Nos. 62/794,096, 62/812,736, 62/833,870, 62/890,251, 62/861,675, and 62/853,119. Specifically, the engineered DNA can be used as a nanowire (or molecular wire), or as part of a nanowire or a nanostructure, to bridge a nanogap comprising two electrodes, the distance between which is in a range of 3 nm to 1 µm, 5 nm to 100 nm, or 5 nm to 30 nm (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, or 30 nm). The nanostructure can be a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, or a combination thereof, or another nanostructure composed of nucleic acid bases or mixed nucleic acid bases and amino acid bases. The electrodes may comprise noble metals, for example, platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), and iridium (Ir), as well as other metals, such as copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), and tantalum (Ta), and their derivatives, such as TiN and TaN, or their alloys. The two electrodes may form a nanogap by being placed next to each other on a non-conductive substrate or by being placed overlapping each other, separated by a non-conductive layer, such as, for example, in U.S. App. No. 62/890,251. In some embodiments, an enzyme may be attached to the nanowire or nanostructure for carrying out a biochemical reaction for the sensing, identification, or sequencing of biopolymers. Exemplary biopolymers for sensing, identification, or sequencing by devices of the present invention include, but are not limited to, DNA, RNA, DNA oligos, proteins, peptides, polysaccharides, etc., either natural, modified, or synthesized. Exemplary enzymes for attachment to nanowires or nanostructures of the present invention include, but are not limited to, DNA polymerase, RNA polymerase, DNA helicase, DNA ligase, DNA exonuclease, reverse transcriptase, RNA primase, ribosome, sucrase, lactase, etc., whether natural, mutated, or synthesized.
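As a rough guide to how long a DNA duplex must be to bridge nanogaps of the sizes listed above, the following sketch converts a gap width into a base-pair count. It assumes an axial rise of about 0.34 nm per base pair (the commonly quoted value for B-form DNA) and ignores any linker chemistry between the duplex ends and the electrodes; neither assumption comes from this disclosure, and the function name is hypothetical.

```python
import math

# Approximate axial rise per base pair for B-form duplex DNA, in nm.
# This value is an assumption for illustration; it is not stated in the text.
RISE_PER_BP_NM = 0.34


def base_pairs_to_span(gap_nm: float, rise_nm: float = RISE_PER_BP_NM) -> int:
    """Return the smallest number of base pairs whose contour length
    is at least gap_nm, ignoring linkers to the electrodes."""
    if gap_nm <= 0 or rise_nm <= 0:
        raise ValueError("gap and rise must be positive")
    return math.ceil(gap_nm / rise_nm)


# Gap widths taken from the example values in the text.
for gap in (5, 10, 30):
    print(f"{gap} nm gap: at least {base_pairs_to_span(gap)} base pairs")
```

In practice the duplex length would also need to account for the attachment groups at each end, so these counts are best read as estimates rather than design values.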

In another aspect, the present disclosure provides for a system comprising a conductive or semiconductive molecular wire incorporating one or more modified adenosines of the present disclosure.

In another aspect, the present disclosure provides for a system for identification, characterization, or sequencing of a biopolymer comprising: a nanogap formed by a first electrode and a second electrode placed next to each other on a non-conductive substrate or placed overlapping each other separated by a non-conductive layer; a nanostructure comprising one or more nucleic acid base pairs that bridges the nanogap by attaching one end to the first electrode and another end to the second electrode through a chemical bond, wherein at least one nucleic acid base within the nanostructure is 2-amino-7-deaza-adenine; and a sensing probe attached to the nanostructure that can interact or perform a chemical or biochemical reaction with the biopolymer. In a further embodiment, the system further comprises a bias voltage that is applied between the first electrode and the second electrode; a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and the biopolymer; and software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer. In a further embodiment, the nanostructure is selected from the group consisting of a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, and a combination thereof. In a further embodiment, the nucleic acid base modification reduces the energy gap between HOMO and lowest unoccupied molecular orbital (LUMO) in comparison to a canonical nucleic acid base in the same position without modification. In a further embodiment, the biopolymer is selected from the group consisting of a DNA, an RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, and their analogues, either natural, synthesized, or modified. In a further embodiment, the sensing probe is selected from the group consisting of a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, a ligand, an antigen and an antibody, either native, mutated, expressed, or synthesized, and a combination thereof. In a further embodiment, the enzyme is selected from the group consisting of a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, and a lactase, either natural, mutated, or synthesized. In a further embodiment, the nanogap size or the distance between the two electrodes is about 3 to 1000 nm, preferably about 5 to 100 nm, and most preferably about 5 to 30 nm. In a further embodiment, the electrodes are made using a noble metal selected from the group consisting of platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), and iridium (Ir), or another metal selected from the group consisting of copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta), and their derivatives, such as TiN and TaN, or an alloy or a combination thereof.

In another aspect, the present disclosure provides for a method for improving the conductance of a molecular wire, comprising modifying at least one nucleic acid base to 2-amino-7-deaza-adenine.

In another aspect, the present disclosure provides for a method for identification, characterization, or sequencing of a biopolymer comprising: forming a nanogap by placing a first electrode and a second electrode next to each other on a non-conductive substrate or overlapping each other separated by a non-conductive layer; providing a nanostructure comprising one or more nucleic acid base pairs with length comparable to the nanogap, wherein at least one nucleic acid base within the nanostructure is 2-amino-7-deaza-adenine; attaching one end of the nanostructure to the first electrode and another end to the second electrode through a chemical bond; and attaching a sensing probe to the nanostructure that can interact or perform a chemical or a biochemical reaction with the biopolymer. In a further embodiment, the method further comprises applying a bias voltage between the first electrode and the second electrode; providing a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and the biopolymer; and providing software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer. In a further embodiment, the nanostructure is selected from the group consisting of a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, and a combination thereof. In a further embodiment, the nucleic acid base modification reduces the energy gap between HOMO and LUMO in comparison to the canonical nucleic acid base in the same position without modification. In a further embodiment, the biopolymer is selected from the group consisting of a DNA, an RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, and their analogues, either natural, synthesized, or modified. In a further embodiment, the sensing probe is selected from the group consisting of a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, a ligand, an antigen and an antibody, either native, mutated, expressed, or synthesized, and a combination thereof. In a further embodiment, the enzyme is selected from the group consisting of a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, and a lactase, either natural, mutated, or synthesized. In a further embodiment, the nanogap size or the distance between the two electrodes is about 3 to 1000 nm, preferably about 5 to 100 nm, and most preferably about 5 to 30 nm. In a further embodiment, the electrodes are made using a noble metal selected from the group consisting of platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), and iridium (Ir), or another metal selected from the group consisting of copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta), and their derivatives, such as TiN and TaN, or an alloy or a combination thereof.
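The data-analysis step, in which current fluctuations through the nanostructure are recorded and then interpreted, could begin with something as simple as threshold-based event detection. The sketch below is a hypothetical minimal example, not the software of this disclosure: the function name, the use of the median as a baseline, the fixed threshold, and the synthetic trace values are all assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple


def detect_fluctuations(current_pa: List[float],
                        threshold_pa: float) -> List[Tuple[int, int]]:
    """Find runs of samples that deviate from the baseline current.

    The baseline is taken as the median of the trace (an assumption).
    Returns (start, end) sample indices, with end exclusive, for each run
    in which |current - baseline| exceeds threshold_pa.
    """
    if not current_pa:
        return []
    baseline = median(current_pa)
    events: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(current_pa):
        deviating = abs(value - baseline) > threshold_pa
        if deviating and start is None:
            start = i                      # a fluctuation begins
        elif not deviating and start is not None:
            events.append((start, i))      # the fluctuation has ended
            start = None
    if start is not None:                  # trace ended during a fluctuation
        events.append((start, len(current_pa)))
    return events


# Synthetic trace in picoamperes (made-up numbers, for illustration only).
trace = [10.0, 10.1, 9.9, 14.0, 14.2, 10.0, 10.1, 6.0, 10.0]
print(detect_fluctuations(trace, threshold_pa=2.0))
```

A real analysis pipeline would additionally need noise filtering and a mapping from event features (amplitude, duration) to the identity of the biopolymer or its subunits, which is outside the scope of this sketch.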

Other Embodiments

From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. Such embodiments are also within the scope of the following claims.

The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof. All patents and publications mentioned in this specification are herein incorporated by reference to the same extent as if each independent patent and publication was specifically and individually indicated to be incorporated by reference.

Claims

What is claimed is:

1. A system comprising a conductive or semiconductive molecular wire, the system comprising a nanostructure comprising one or more nucleic acid base pairs, wherein at least one nucleobase within the nanostructure comprises the following:

[Chemical structure: Compound 0-1, 2-amino-7-deaza-adenine]

2. The system of claim 1, wherein the molecular wire is in electrical communication with the electrodes.

3. The system of claim 1, wherein the molecular wire bridges a nanogap between the two electrodes.

4. The system of claim 1, wherein the molecular wire comprises two ends, each of which is chemically bound to one of the electrodes.

5. The system of claim 1, wherein one of the electrodes is positively charged and the other is negatively charged.

6. The system of any one of claims 1-4, wherein the molecular wire comprises DNA, RNA, a peptide nucleic acid molecule, a DNA/RNA duplex, or combinations thereof.

7. A system for identification, characterization, or sequencing of a biopolymer, the system comprising i. a nanogap formed by a first electrode and a second electrode; ii. a nanostructure having two ends, the nanostructure comprising one or more nucleic acid base pairs, wherein the nanostructure bridges the nanogap between the electrodes, wherein each of the electrodes is chemically bound to one of the ends of the nanostructure, and wherein the nanostructure comprises a 2-amino-7-deaza-adenine; and iii. a sensing probe attached to the nanostructure that can interact or perform a chemical or biochemical reaction with the biopolymer.

8. The system of any one of claims 1-7, wherein the electrodes are disposed on a non-conductive substrate or are separated by a non-conductive layer.

9. The system of claim 1 or 3, further comprising one or more of the following: iv. a bias voltage that is applied between the first electrode and the second electrode; v. a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and a biopolymer; and vi. a software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer.

10. The system of any one of claims 1-9, wherein the nanostructure is selected from the group consisting of a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, and a combination thereof.

11. The system of any one of claims 1-10, wherein the biopolymer comprises DNA, RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, or analogues thereof, either natural, synthesized, or modified.

12. The system of any one of claims 1-10, wherein the sensing probe is a polynucleotide or a polypeptide.

13. The system of any one of claims 1-10, wherein the sensing probe is selected from the group consisting of a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, a ligand, an antigen and an antibody, either native, mutated, expressed, or synthesized, and a combination thereof.

14. The system of claim 13, wherein the enzyme is selected from the group consisting of a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, and a lactase, either natural, mutated, or synthesized.

15. The system of any one of claims 1-14, wherein the nanostructure further comprises a polypeptide.

16. The system of claim 15, wherein the polypeptide is an enzyme selected from the group consisting of a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, and a lactase, either natural, mutated, or synthesized.

17. The system of any one of claims 1-14, wherein the nanogap comprises about 3 to 1000 nm, about 5 to 100 nm, or about 5 to 30 nm.

18. The system of claim 17, wherein the nanogap is selected from the group consisting of 5, 10, 20, 25, and 30 nm.

19. The system of any one of claims 1-18, wherein the electrodes comprise a metal selected from the group consisting of platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), iridium (Ir), copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta) and their derivatives, and an alloy or combination thereof.

20. The system of claim 19, wherein the derivative is TiN or TaN.

22. A method for improving the conductance of a molecular wire, comprising: providing a molecular wire, wherein the molecular wire is a nanostructure comprising one or more nucleic base pairs; and modifying at least one nucleic base to 2-amino-7-deaza-adenine.

23. A method for identification, characterization, or sequencing of a biopolymer comprising, i. forming a nanogap by placing a first electrode and a second electrode in proximity to one another on a non-conductive substrate or overlapping each other separated by a non-conductive layer; ii. providing a nanostructure comprising one or more nucleic acid base pairs with length comparable to the nanogap, wherein at least one nucleic acid base within the nanostructure is 2-amino-7-deaza-adenine; iii. chemically bonding one end of the nanostructure to the first electrode and the other end of the nanostructure to the second electrode; and iv. attaching a sensing probe to the nanostructure, wherein the sensing probe is capable of physically interacting with or performing a chemical or a biochemical reaction with the biopolymer.

24. The method of claim 23, further comprising, v. applying a bias voltage between the first electrode and the second electrode; vi. providing a device that records a current fluctuation through the nanostructure caused by the interaction between the sensing probe and the biopolymer; and vii. providing a software for data analysis that identifies or characterizes the biopolymer or a subunit of the biopolymer.

25. The method of claim 23, wherein the nanostructure is selected from the group consisting of a nucleic acid duplex, a nucleic acid triplex, a nucleic acid quadruplex, a nucleic acid origami structure, and a combination thereof.

26. The method of claim 23, wherein the biopolymer is selected from the group consisting of a DNA, an RNA, a protein, a polypeptide, an oligonucleotide, a polysaccharide, and their analogues, either natural, synthesized, or modified.

27. The method of claim 23, wherein the sensing probe is selected from the group consisting of a nucleic acid probe, a molecular tweezers, an enzyme, a receptor, ligands, an antigen and an antibody, either native, mutated, expressed, or synthesized, and a combination thereof.

28. The method of claim 27, wherein the enzyme is selected from the group consisting of a DNA polymerase, an RNA polymerase, a DNA helicase, a DNA ligase, a DNA exonuclease, a reverse transcriptase, an RNA primase, a ribosome, a sucrase, a lactase, either natural, mutated or synthesized.

29. The method of claim 23, wherein the nanogap size or the distance between the two electrodes is about 3 to 1000 nm, preferably about 5 to 100 nm, and most preferably about 5 to 30 nm.

30. The method of claim 23, wherein the electrodes comprise a metal selected from the group consisting of platinum (Pt), gold (Au), silver (Ag), palladium (Pd), rhodium (Rh), ruthenium (Ru), osmium (Os), iridium (Ir), copper (Cu), rhenium (Re), titanium (Ti), niobium (Nb), tantalum (Ta) and their derivatives, or a combination thereof.

31. The method of claim 23, further comprising providing 2-amino-7-deaza-adenine-triphosphate; and incorporating the 2-amino-7-deaza-adenine-triphosphate into a nucleic acid strand within the nanostructure enzymatically.