WO2020185896A1 - Methods for processing and storing dna encoding formats of information - Google Patents
- Publication number
- WO2020185896A1 (application PCT/US2020/022102)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polynucleotides
- subset
- barcode
- sequence
- addressable
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1065—Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C13/00—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
- G11C13/0002—Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
- G11C13/0009—RRAM elements whose operation depends upon chemical change
- G11C13/0014—RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material
- G11C13/0019—RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material comprising bio-molecules
Definitions
- the present invention relates in general to methods of processing and storing polynucleotides that encode formats of information which have been translated into a digital representation, such as bits, trits, etc., as is known in the art, and which are encoded into nucleic acid sequences.
- polynucleotides can be selectively removed from an addressable support and barcoded for later processing such as identification, selection, retrieval, etc. such as from among a plurality of barcoded polynucleotides.
- barcodes include nucleic acid sequences with associated meta tag information.
- One feature of the enzyme-based DNA synthesis method is the use of template-independent TdT, which increases the efficiency for converting information to DNA.
- This method allows accurate data encoding without requiring single-base precision.
- data can be stored in a series of nucleotide homopolymers such that a DNA strand having a sequence of “AAAATTTCGGG”, for example, is informationally equivalent to a DNA strand which can be computed and represented in silico as ATCG.
- This data encoding approach, which may be referred to as flexible-write synthesis (because it allows for and interprets repeated nucleotides as a single nucleotide), allows for a high level of tolerance on the stringency by which DNA synthesis is performed. As a result, synthesis processes can be optimized for increased speed and reduced reagent costs by compromising the precision requirement.
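The interpretation step of flexible-write synthesis can be sketched in a few lines. This is a minimal illustration only, assuming reads are plain strings over A/C/G/T; the function name `collapse_homopolymers` is hypothetical and does not come from the disclosure.

```python
from itertools import groupby


def collapse_homopolymers(strand: str) -> str:
    """Interpret each run of a repeated nucleotide as a single nucleotide."""
    # groupby yields one (base, run) pair per maximal run of identical bases
    return "".join(base for base, _run in groupby(strand))


print(collapse_homopolymers("AAAATTTCGGG"))  # ATCG
```

Any strand whose runs collapse to the same string carries the same information, which is why the synthesis step can tolerate uncontrolled repeat additions.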
- an issue associated with the flexible-write synthesis method is that the resultant DNA strands can be highly heterogeneous at the single-base resolution, making downstream processing difficult.
- the present disclosure provides a method for selectively barcoding a subset of polynucleotides encoding bits of information with a unique barcode, wherein a plurality of polynucleotides comprising the subset is releasably attached to an addressable substrate by an activatable linker, wherein the subset is releasably attached to known subset locations of the addressable substrate, wherein each known subset location includes a plurality of different sequence polynucleotides each encoding the same bits of information, wherein each polynucleotide of the subset includes a common 5’ universal initiator of same sequence and a common 3’ universal adaptor of same sequence.
- the method includes the steps of selectively releasing the subset of polynucleotides from the addressable array, and barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a first barcode.
- a solid-phase synthesis device can be used to record digital information in DNA molecules on an addressable substrate and the DNA molecules can be selectively further processed, such as by adding barcode or other metatag information.
- FIGs. 1A-1B depict an overview schematic of mixed-mode production of mass tagged DNA polymers for information storage.
- Fig. 2 depicts the result of a trace under 1 M Thiol-PEG3-alcohol.
- Anodic current is detected from application of volts from 0 to -1.25 V with addition of 1 mM final Thiol-PEG3-alcohol.
- the first cyclic voltammetry scan shows a significant anodic current of -20 micro amps when the voltage is between -1 to -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode.
- the peak anodic current between -1 to -1.25 volts is less than -5 micro amps and is similar to that of the bare electrode, demonstrating desorption of the thiolated molecules from the electrode surface.
- Fig. 3 depicts the result of a trace under 10 mM Thiol-PEG3-alcohol.
- Anodic current is detected from application of volts from 0 to -1.25 V with addition of 100 microM final Thiol-PEG3-alcohol.
- the first cyclic voltammetry scan shows a significant anodic current of -20 micro amps when the voltage is between -1 to -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode.
- the peak anodic current between -1 to -1.25 volts is approximately -5 micro amps and is similar to that of the bare electrode, demonstrating desorption of the thiolated molecules from the electrode surface.
- Fig. 4 depicts the result of a trace under 100 p Thiol-PEG3-alcohol. Anodic current is detected from application of volts from 0 to -1.25 V with addition of 1 mM final Thiol-PEG3-alcohol. The first cyclic voltammetry scan shows a significant anodic current of -20 micro amps when the voltage is between -1 to -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode.
- the peak anodic current between -1 to -1.25 volts is approximately -3 micro amps and is highly similar to that of the bare electrode, demonstrating high (i.e., near complete) desorption of the thiolated molecules from the electrode surface.
- FIG. 5 is a gel separation depicting desorbed DNA versus tethered DNA.
- F control - initiator-only control. F1 - fluid from extension reaction with dCTP (desorbed DNA).
- S control - cleaved initiator control.
- S1 - tethered DNA from extension reaction with dCTP (tethered DNA).
- S2 - tethered DNA from extension reaction with dCTP (tethered DNA).
- aspects of the present disclosure are directed to methods of processing a subset of polynucleotides (such as DNA or RNA strands) from among a collection of polynucleotides present on an addressable substrate.
- the aspects of the present disclosure have particular application where the polynucleotides (strands) encode for a format of information which is represented by bits which are encoded into nucleic acid sequences, as is known in the art.
- the format of information can be digital, as is known in information theory and information systems to be the discrete, discontinuous representation of information or works.
- the format of information can be analog, as is known in the art as relating to or using signals or information represented by a continuously variable physical quantity such as spatial position, voltage, etc.
- a format of information such as text, an image, a video or an audio format, such as an html format of information, as is known in the art, is converted to a digital representation (i.e., encoding in discrete units such as a binary numeral system, ternary numeral system and so on) such as bits (zeros and ones) or trits (zeros, ones, twos), for example using a computer and appropriate software, and then the series of bits (or trits or other exemplary digital representation system) is translated into a series of nucleotides.
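The conversion from a format of information to bits or trits can be sketched as follows. This is an illustrative assumption rather than the disclosed software: text is taken as UTF-8 bytes, each byte is written as 8 bits, and the helper names (`text_to_bits`, `bits_to_text`, `int_to_trits`) are hypothetical.

```python
def text_to_bits(text: str) -> str:
    """Write each UTF-8 byte of the text as 8 binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits: regroup the bits into bytes and decode them."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def int_to_trits(value: int) -> str:
    """Write a non-negative integer in the ternary (base 3) numeral system."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 3)
        digits.append(str(remainder))
    return "".join(reversed(digits))


bits = text_to_bits("Hi")
print(bits)                # 0100100001101001
print(bits_to_text(bits))  # Hi
print(int_to_trits(72))    # 2200
```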
- the term “bit” is to be understood according to its common meaning to one of skill in the art.
- the term “bit” may be a contraction of “binary digit” and may refer to a basic capacity of information in computing and telecommunications.
- A“bit” represents either a first state or a second state, such as 1 or 0 (one or zero) only.
- the representation may be implemented, in a variety of systems, by means of a two state device.
- the bit sequence is converted (encoded), such as by a computer and appropriate software, to a designed sequence of nucleotides, i.e., an oligonucleotide or DNA or RNA.
- a 1 bit per base encoding (A or C = 0; T/U or G = 1) or other encoding method is used to form a corresponding encoded oligonucleotide sequence, i.e., the oligonucleotide sequence corresponds to or encodes for the bit sequence.
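The 1 bit per base mapping can be sketched as below. The mapping itself (A or C = 0; T/U or G = 1) is from the text; the rule for choosing between the two bases of each class is an assumption added here so that no two adjacent designed bases are identical, and the function names are hypothetical.

```python
ZERO_BASES = ("A", "C")  # either base reads back as bit 0
ONE_BASES = ("T", "G")   # either base reads back as bit 1
BASE_TO_BIT = {"A": "0", "C": "0", "T": "1", "G": "1", "U": "1"}


def encode_bits(bits: str) -> str:
    """Encode a bit string at 1 bit per base, avoiding designed repeats."""
    strand = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
        options = ZERO_BASES if bit == "0" else ONE_BASES
        # Assumption: switch to the alternate base when the previous base
        # is the default one, so the designed strand has no homopolymer runs.
        use_alternate = bool(strand) and strand[-1] == options[0]
        strand.append(options[1] if use_alternate else options[0])
    return "".join(strand)


def decode_strand(strand: str) -> str:
    """Read a designed strand back into bits (one bit per base)."""
    return "".join(BASE_TO_BIT[base] for base in strand)


print(encode_bits("0011"))    # ACTG
print(decode_strand("ACTG"))  # 0011
```

Because the designed strand contains no adjacent identical bases, a read with repeated nucleotides can first be reduced by treating each run as a single base and then decoded with `decode_strand`.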
- the term “trit” refers to a ternary numeral system, also called base 3, and has 3 as its base. Analogous to a bit, a ternary digit is a trit (trinary digit). The following disclosure may make reference to bits, but equally applies to methods using trits and other digital representation methods known to those of skill in the art.
- the series of bits may be separated into bit sequences such that a series of nucleotides, i.e., a polynucleotide, is representative of the bit sequence.
- Each polynucleotide is then synthesized, for example, using a template independent process or other processes as described herein, and the polynucleotide is then stored.
- the format of information is encoded by a plurality of polynucleotides with each polynucleotide representing a bit sequence making up the series of bits representative of the entire format of information.
- sequences of the plurality of polynucleotides can then be determined, such as by sequencing, and translated back into the series of bits which is then translated back into the format of information.
- the subset of polynucleotides, for example, encoding for a format of information as a series of bits, is selectively removed from the addressable substrate and barcoded with one or more, such as two or more or a plurality of barcodes using methods known to those of skill in the art.
- the locations on the addressable array where the subset of polynucleotides is attached is known because of the synthesis process.
- a polynucleotide synthesizer can be programmed to synthesize the subset of polynucleotides of predetermined sequences at defined locations of the addressable array.
- the locations need not be contiguous, although synthesizing the subset at contiguous locations on an addressable array has certain advantages in terms of synthesis efficiency and removal, since the subset is confined to a particular geographic location of the array.
- the geographic location may be separated from other locations by physical barriers such as channels or trenches or by chemical barriers such as hydrophobic layers as is known in the art. Since the locations are known a priori, the subset can be released from those known locations, using methods as described herein, for further processing, such as adding one or more, two or more or a plurality of barcodes.
- the barcoded subset of polynucleotides is then stored within a storage vessel.
- One or more additional subsets of polynucleotides are subsequently removed, such as in series, from the addressable array and barcoded with one or more, such as two or more or a plurality of barcodes and stored within the storage vessel.
- the barcode uniquely identifies each subset or provides information about the nature of the information stored in the subset, such as meta tag information, and is used to identify, select and/or retrieve the subset of polynucleotides from among the different subsets of polynucleotides stored in the same storage vessel.
- the polynucleotides are synthesized on the addressable substrate using methods known to those of skill in the art, such as template independent synthesis using a template independent polymerase, as is known in the art for encoding a format of information into polynucleotide sequences.
- the template independent polymerase can be error prone or not error prone.
- template independent synthesis can be used to produce polynucleotide chains at known locations of an addressable substrate.
- an addressable substrate is provided having an initiator sequence bound or tethered to locations of the addressable substrate where polynucleotides are intended to be synthesized.
- the initiator sequence may be or also have a single-base precise sequence to be used for adding a barcode sequence.
- the locations of the addressable substrate may be or include an electrode as is known in the art, such as with an electrode array. Polynucleotides are produced at the electrode locations.
- the reagents for extending the initiator sequence with a nucleotide, such as the nucleotide, a template independent polymerase, cations, etc., as is known in the art, are contacted to the addressable substrate and an electrical stimulus can be generated to activate the template independent polymerase to add the nucleotide to the initiator sequence at desired locations of the addressable substrate.
- the process can be repeated to produce polynucleotides at desired known locations of the addressable substrate as is known in the art.
- a single-base precise 3’ adapter may be added to the polynucleotides by methods such as ligation or PCR.
- the single-base precise 3’ adapter may be used to add a barcode sequence to the polynucleotides.
- a given known location of an addressable substrate such as an electrode of an electrode array can include a plurality of initiators attached thereto resulting in a plurality of oligonucleotides or polynucleotides at the given known location.
- a given known location of an addressable substrate is intended to include a polynucleotide of a same given predetermined sequence, assuming perfect synthesis fidelity.
- template independent synthesis methods can result in repeat addition of a given nucleotide resulting in a plurality of different sequences at the given known location of an addressable substrate insofar as the polynucleotide sequences may include repeat sequences.
- the polynucleotide sequence strands for a given known location can be highly heterogeneous at the single-base resolution though the strands accurately encode the intended bit sequence.
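The heterogeneity described above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not a model of the enzyme: each intended base is added between 1 and `max_repeat` times, chosen uniformly at random, and the intended sequence has no two adjacent identical bases. The names `simulate_flexible_write` and `collapse` are hypothetical.

```python
import random
from itertools import groupby
from typing import List


def simulate_flexible_write(target: str, copies: int,
                            max_repeat: int = 4, seed: int = 0) -> List[str]:
    """Return `copies` strands in which every intended base is repeated
    a random number of times (toy stand-in for imprecise synthesis)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        "".join(base * rng.randint(1, max_repeat) for base in target)
        for _ in range(copies)
    ]


def collapse(strand: str) -> str:
    """Interpret each run of a repeated nucleotide as a single nucleotide."""
    return "".join(base for base, _run in groupby(strand))


strands = simulate_flexible_write("ATCG", copies=5)
for strand in strands:
    # Strands may differ base for base yet collapse to the same sequence.
    print(strand, "->", collapse(strand))
```

Every simulated strand collapses to the intended sequence, while a primer designed against any one of the raw strands would generally not match the others exactly.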
- the different polynucleotide sequences can be interpreted as the same predetermined sequence, as is known in the art, such as by interpreting a given repeat nucleotide as a single given nucleotide, and so information can be accurately maintained by the different polynucleotide sequences.
- polynucleotides can be synthesized emphasizing high speed and low cost of synthesis without requiring high synthesis fidelity by allowing repeat nucleotides during synthesis.
- the diverse or different sequence strands make amplification and hybridization less accurate and more difficult, since a single primer sequence (intended to hybridize to and amplify the sequence, i.e. the many sequences attached to the known location) may not be able to hybridize to each of the different polynucleotide sequences representing the same predetermined sequence at the given known location on the addressable substrate.
- the single primer sequence may not be able to hybridize to all of the polynucleotide sequences at a given known location of an addressable substrate.
- the present disclosure provides a method for barcoding the polynucleotides of the given known location of the addressable substrate, and other polynucleotides of a given subset.
- a barcode is unique to each polynucleotide at a given known location of the addressable substrate even though the polynucleotides may differ in sequence.
- the barcode is also unique to the other polynucleotide sequences of a given subset located at other given known locations of the addressable substrate.
- each polynucleotide of the subset includes a first same or identical barcode identifying the members of the subset.
- the barcode is unique to the members of the subset and so can be used to identify the members of the subset.
- the members of the subset may include an additional barcode, i.e. a second barcode which can be a same or identical barcode identifying the members of the subset.
- the first barcode is different from the second barcode insofar as the first barcode represents information different from the second barcode.
- the second barcode may be different, i.e. nonidentical, among members of the subset, such as to convey specific information about different members of the subset.
- the second barcode may have a first sequence identifying a first set of polynucleotides of the subset, a second sequence identifying a second set of polynucleotides of the subset, a third sequence identifying a third set of polynucleotides of the subset, and so on, such that the second barcode may comprise a plurality of different sequences.
- the first sequence, second sequence and third sequence represent different information.
- Such a method of barcoding provides for selective amplification and retrieval of an entire subset of polynucleotides representing stored information such as files, particularly when the subset of polynucleotides is pooled along with other subsets of polynucleotides representing different stored information, and also when polynucleotides within a given known location have a sequence different from the predetermined sequence for that given known location due to synthesis infidelity.
- the use of the first barcode allows for base-precise hybridization of an amplification or selection primer to imprecisely synthesized but informationally accurate polynucleotides of a given known location.
- the retrieved subset of information can then be probed for particular polynucleotides using the different sequences of the second barcode if such different sequences have been attached to the polynucleotides of the subset.
- the second barcode can be used to distinguish categories of files within the subset.
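The two-level selection described above can be sketched in silico. All sequences here are made up for illustration, and the layout (first barcode at the 5’ end, second barcode at the 3’ end, exact string matching standing in for base-precise primer hybridization) is an assumption of this sketch.

```python
from typing import List


def retrieve_subset(pool: List[str], first_barcode: str) -> List[str]:
    """Select every strand that carries the subset's first barcode at its 5' end."""
    return [strand for strand in pool if strand.startswith(first_barcode)]


def probe_category(subset: List[str], second_barcode: str) -> List[str]:
    """Within a retrieved subset, select strands by their 3' second barcode."""
    return [strand for strand in subset if strand.endswith(second_barcode)]


# Hypothetical pool: barcode + payload + barcode, written 5' to 3'.
pool = [
    "GATTACA" + "AAATTCG" + "CCGT",  # subset 1, category 1
    "GATTACA" + "ATTTCCG" + "CCGT",  # subset 1, category 1, different repeats
    "GATTACA" + "TGCA" + "AACG",     # subset 1, category 2
    "CTGAGTC" + "ATCG" + "CCGT",     # a different subset
]

subset = retrieve_subset(pool, "GATTACA")
print(len(subset))                        # 3
print(len(probe_category(subset, "CCGT")))  # 2
```

The payloads of the first two strands differ in their repeats, yet both are selected because matching is done on the base-precise barcodes rather than on the imprecisely synthesized payload.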
- the polynucleotides of a given subset that are attached to the addressable substrate include a universal single-base precise initiator sequence at the 5’ end of the polynucleotides and a universal single-base precise adapter at the 3’ end of the polynucleotides.
- the polynucleotides of a given subset are flanked by first and second universal single-base precise sequences.
- a subset of polynucleotides is released from the addressable substrate, with the polynucleotides having a first single base precise sequence at one end and a second single base precise sequence at the other end.
- the polynucleotides are then processed under conditions to add one or more, such as two or more or a plurality of barcode sequences.
- the first single base precise sequence can be used to add a first barcode to the polynucleotides and the second single base precise sequence can be used to add a second barcode to the polynucleotides.
- the polynucleotides are flanked by barcodes.
- the first and second single-base precise sequences can be used to create a barcode under conditions using known methods such as PCR methods, ligase methods, RPA methods, transposon/transposase methods, recombination methods or hybridization extension methods, as are known in the art.
- PCR conditions as generally described herein utilize common PCR buffer conditions known to those of skill in the art such as: standard PCR buffer (1X), 1.5 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl, pH 8.3 at 25°C; and a standard PCR buffer (10X), 15 mM MgCl2, 500 mM KCl, 100 mM Tris-HCl, pH 8.3 at 25°C.
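The relationship between the 10X and 1X buffer figures is a simple tenfold dilution, which can be checked with a short calculation. The component concentrations are those given above; the 50 microliter reaction volume and the function names are illustrative assumptions.

```python
from typing import Dict

# Component concentrations of the 10X stock, in mM (from the text above).
BUFFER_10X_MM: Dict[str, float] = {"MgCl2": 15.0, "KCl": 500.0, "Tris-HCl": 100.0}


def dilute(stock_mm: Dict[str, float], stock_fold: float = 10.0,
           final_fold: float = 1.0) -> Dict[str, float]:
    """Final concentration of each component after diluting the stock."""
    return {name: conc * final_fold / stock_fold for name, conc in stock_mm.items()}


def stock_volume_ul(reaction_volume_ul: float, stock_fold: float = 10.0,
                    final_fold: float = 1.0) -> float:
    """Volume of stock buffer needed for a reaction of the given volume."""
    return reaction_volume_ul * final_fold / stock_fold


print(dilute(BUFFER_10X_MM))   # {'MgCl2': 1.5, 'KCl': 50.0, 'Tris-HCl': 10.0}
print(stock_volume_ul(50.0))   # 5.0
```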
- the polynucleotides of a given subset that have been synthesized to encode a format of information include a first barcode at either the 5’ or 3’ end.
- the polynucleotides of a given subset that have been synthesized to encode the format of information can also have a second barcode at the other of the 5’ or 3’ end.
- the polynucleotides of a given subset that have been synthesized to encode the format of information have a first barcode at the 5’ end and a second barcode at the 3’ end.
- the barcodes may include meta tag information.
- a subset of polynucleotides is selectively released or desorbed or decoupled from the surface of an addressable electrode array using electrochemical desorption (ECD).
- Such electrode arrays are known in the art and provide high spatial control, such as (1) by using mild electrically reducing conditions, (2) by repeatedly oxidizing-reducing the underlying electrode surface or (3) by electrolysis.
- the initiators can be tethered, linked or otherwise bound, attached or connected to the electrode array by thiol linkages. The thiol linkages can be broken under mild reducing conditions by applying a negative potential to selected electrodes resulting in electrochemical desorption or release of the polynucleotide from the substrate.
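Selective release from known locations can be modelled with a small bookkeeping sketch. It is only an in-silico illustration: the class `ElectrodeArray`, its methods, the (row, column) addresses and the sequences are all hypothetical, and the chemistry of applying a negative potential is reduced to removing strands from the selected addresses.

```python
from typing import Dict, Iterable, List, Tuple

Address = Tuple[int, int]  # hypothetical (row, column) electrode address


class ElectrodeArray:
    """Toy model of an addressable electrode array holding tethered strands."""

    def __init__(self) -> None:
        self._sites: Dict[Address, List[str]] = {}

    def tether(self, address: Address, strands: Iterable[str]) -> None:
        """Record strands as attached (e.g., via thiol linkage) at an address."""
        self._sites.setdefault(address, []).extend(strands)

    def desorb(self, addresses: Iterable[Address]) -> List[str]:
        """Release all strands at the selected electrodes and return them;
        strands at other electrodes stay tethered."""
        released: List[str] = []
        for address in addresses:
            released.extend(self._sites.pop(address, []))
        return released

    def tethered(self, address: Address) -> List[str]:
        """Strands currently attached at an address."""
        return list(self._sites.get(address, []))


array = ElectrodeArray()
array.tether((0, 0), ["AATC", "ATTC"])  # same information, different repeats
array.tether((0, 1), ["GGCA"])
array.tether((1, 0), ["TTGA"])          # belongs to another subset

subset = array.desorb([(0, 0), (0, 1)])  # the known subset locations
print(subset)                  # ['AATC', 'ATTC', 'GGCA']
print(array.tethered((1, 0)))  # ['TTGA']
```

Because the subset locations are known from the synthesis program, the release step needs only the list of addresses, not the sequences themselves.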
- the selectively released or desorbed or decoupled polynucleotides are transferred, such as by using fluidics channels, into a vessel or chamber for barcoding under conditions to add either a 5’ barcode sequence or 3’ barcode sequence or both to each polynucleotide, such as under PCR conditions or by ligation or other known methods.
- 5’ or 3’ sequences unique to the set of polynucleotides can be added under PCR conditions using the 5’ or 3’ single base precise sequences.
- the barcode sequences can be used as hash keys to represent meta tags, i.e., each barcode represents known information such as a category of subject matter (“financial”, “vacation photos”, “tax information”, etc.) or a date or other information describing the format of information encoded by the subset of polynucleotides.
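Using barcodes as hash keys amounts to keeping a lookup table outside the pool. The sketch below assumes made-up 6-base barcodes; only the category names come from the text, and the helper `barcodes_for` is hypothetical.

```python
from typing import Dict, List

# Hypothetical barcode sequences used as hash keys for meta tags.
META_TAGS: Dict[str, str] = {
    "ACGTAC": "financial",
    "TGCATG": "vacation photos",
    "GATCGA": "tax information",
    "CTAGCT": "financial",
}


def barcodes_for(category: str, table: Dict[str, str]) -> List[str]:
    """Reverse lookup: which barcodes should be targeted to retrieve a category."""
    return sorted(barcode for barcode, tag in table.items() if tag == category)


print(META_TAGS["TGCATG"])                   # vacation photos
print(barcodes_for("financial", META_TAGS))  # ['ACGTAC', 'CTAGCT']
```

The forward lookup interprets a sequenced barcode, while the reverse lookup tells which barcode-specific primers or probes to use when a category of stored information is requested.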
- Tagging can be performed iteratively in a plurality of chambers or vessels connected by fluidic channels to attach a plurality of meta tags to data encoded in the polynucleotide strands.
- a plurality of chambers may be used to barcode strands in parallel or in series as desired.
- a plurality of barcoded subsets of polynucleotides can be collectively stored as a single pool. Each subset can be retrieved using the single base precise sequence or barcode unique to each subset, for example, using PCR methods or hybridization methods known to those of skill in the art.
- Computer software utilized in the methods of the present disclosure includes a computer readable medium having computer-executable instructions for performing logic steps of the method of the invention.
- Suitable computer readable media include, but are not limited to, a floppy disk, CD-ROM/DVD/DVD-ROM, hard disk drive, flash memory, ROM/RAM, magnetic tapes, and others that may be developed.
- the computer executable instructions may be written in a suitable computer language or combination of several computer languages.
- the methods described herein may also make use of various commercially available computers and computer program products and software for a variety of purposes including translating text or images into binary code, designing nucleic acid sequences representative of the binary code, analyzing sequencing data from the nucleic acid sequences, translating the nucleic acid sequence data into binary code, and translating the binary code into text or images.
- Certain exemplary embodiments are directed to the use of computer software and hardware to automate polynucleotide synthesis upon an addressable substrate. Such software and hardware may be used in conjunction with individuals performing synthesis by hand or in a semi-automated fashion or combined with an automated system.
- exemplary programs are written in suitable programming language. The program may be compiled into an executable that may then be run from a command prompt in the WINDOWS XP operating system or other operating systems. Unless specifically set forth in the claims, the invention is not limited to implementation using a specific programming language, operating system environment or hardware platform.
- oligonucleotides or polynucleotides may be attached to a substrate, which may be an addressable substrate, such as an addressable array, such as an addressable electrode array. Such methods are generally known to those of skill in the art and as described herein.
- the term “attach” refers to both covalent interactions and noncovalent interactions.
- a covalent interaction is a chemical linkage between two atoms or radicals formed by the sharing of a pair of electrons (i.e., a single bond), two pairs of electrons (i.e., a double bond) or three pairs of electrons (i.e., a triple bond).
- Noncovalent interactions include, but are not limited to, van der Waals interactions, hydrogen bonds, weak chemical bonds (i.e., via short-range noncovalent forces), hydrophobic interactions, ionic bonds and the like. A review of noncovalent interactions can be found in Alberts et al., in Molecular Biology of the Cell, 3d edition, Garland Publishing, 1994.
- oligonucleotide sequences can be synthesized using a support.
- Methods of synthesizing oligonucleotide sequences are well-known in the art (See, e.g., Seliger (1993) Protocols for Oligonucleotides and Analogs: Synthesis and Properties, vol. 20, pp. 391-435; Efimov (2007) Nucleosides, Nucleotides & Nucleic Acids 26:8; McMinn et al. (1997) J. Org. Chem. 62:7074; Froehler et al. (1986) Nucleic Acids Res. 14:5399; Garegg (1986) Tet. Lett.
- nucleotide is intended to include, but is not limited to, a single-stranded or double stranded DNA or RNA molecule, typically prepared by synthetic means. Nucleotides of the present invention will typically be the naturally-occurring nucleotides such as nucleotides derived from adenosine, guanosine, uridine, cytidine and thymidine. However, synthetic or non-natural nucleotides may be used.
- the terms “nucleic acid,” “nucleic acid molecule,” “nucleic acid sequence,” “nucleic acid fragment,” “oligonucleotide” and “polynucleotide” are used interchangeably and are intended to include, but are not limited to, a polymeric form of nucleotides that may have various lengths, either deoxyribonucleotides or ribonucleotides or analogs thereof.
- Oligonucleotides or polynucleotides useful in the methods described herein may comprise natural nucleic acid sequences and variants thereof, artificial nucleic acid sequences, or a combination of such sequences. Oligonucleotides or polynucleotides may be single stranded or double stranded.
- a polynucleotide is typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) for thymine (T) when the polynucleotide is RNA).
- the term “polynucleotide sequence” is the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself. This alphabetical representation can be input into databases in a computer having a central processing unit and used for synthesis of the oligonucleotide or polynucleotide.
- Polynucleotides may optionally include one or more non-standard nucleotide(s), nucleotide analog(s) and/or modified nucleotides.
- a single support or multiple supports may be utilized (e.g., synthesized, amplified, hybridized or the like) in parallel.
- Suitable supports include, but are not limited to, slides (e.g., microscope slides), beads, chips, particles, strands, gels, sheets, tubing (e.g., microfuge tubes, test tubes, cuvettes), spheres, containers, capillaries, microfibers, pads, electrodes, slices, films, plates (e.g., multi-well plates), microfluidic supports (e.g., microarray chips, flow channel plates, biochips and the like) and the like. In various embodiments, the solid supports may be biological, nonbiological, organic, inorganic or combinations thereof.
- When using supports that are substantially planar, the support may be physically separated into regions by physical barriers, for example, with trenches, grooves, wells, or chemically separated into regions by chemical barriers (e.g., lacking a lipid-binding coating, hydrophobic coatings and the like).
- the supports include a plurality of locations where oligonucleotides or polynucleotides are to be synthesized.
- supports can be made of a variety of materials including, but not limited to, glass, quartz, ceramic, plastic, polystyrene, methylstyrene, acrylic polymers, titanium, gold, platinum, latex, sepharose, cellulose, nylon and the like and any combination thereof. Such supports and their uses are well known in the art.
- a support is an array or a microarray.
- the term “microarray” refers in one embodiment to a type of array that comprises a solid phase support having a substantially planar surface on which there is an array of spatially defined non-overlapping regions or sites that each contain an immobilized polynucleotide or a plurality of immobilized polynucleotides. The regions or sites may each contain an electrode. “Substantially planar” means that features or objects of interest, such as polynucleotide sites, on a surface may occupy a volume that extends above or below a surface and whose dimensions are small relative to the dimensions of the surface.
- beads disposed on the face of a fiber optic bundle create a substantially planar surface of probe sites, or oligonucleotides disposed or synthesized on a porous planar substrate create a substantially planar surface.
- Spatially defined sites may additionally be “addressable” in that their location and the identity of the immobilized polynucleotide at that location are known or determinable.
- Oligonucleotide or polynucleotide sequences may be prepared by any suitable method, e.g., the phosphoramidite method described by Beaucage and Carruthers ((1981) Tetrahedron Lett. 22:1859) or the triester method according to Matteucci et al. ((1981) J. Am. Chem. Soc. 103:3185), both incorporated herein by reference in their entirety for all purposes, or by other chemical methods using either a commercial automated oligonucleotide synthesizer or high-throughput, high-density array methods described herein and known in the art (see U.S. Patent Nos. 5,602,244, 5,574,146, 5,554,744, 5,428,148, 5,264,566, 5,141,813, 5,959,463, 4,861,571 and 4,659,774, incorporated herein by reference in their entirety for all purposes).
- oligonucleotides or polynucleotides may be synthesized on a solid support using a maskless array synthesizer (MAS).
- Maskless array synthesizers are described, for example, in PCT application No. WO 99/42813 and in corresponding U.S. Patent No. 6,375,903.
- Other examples are known of maskless instruments which can fabricate a custom polynucleotide microarray in which each of the features in the array has a single stranded DNA molecule of desired sequence.
- An exemplary type of instrument is the type shown in Figure 5 of U.S. Patent No. 6,375,903, based on the use of reflective optics.
- other methods for synthesizing oligonucleotide or polynucleotide sequences include, for example, light-directed methods utilizing masks, flow channel methods, spotting methods, pin-based methods, and methods utilizing multiple supports, as is known in the art.
- Flow channel methods involve, for example, microfluidic systems to control synthesis of polynucleotides on a solid support.
- suitable reagents may be flowed over the entire surface of a support and methods employed for selective activation of known locations for synthesizing polynucleotides.
- diverse polymer sequences may be synthesized at selected regions of a solid support by forming flow channels on a surface of the support through which appropriate reagents flow or in which appropriate reagents are placed.
- a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the support to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.
- Spotting methods for preparation of oligonucleotides on a solid support involve delivering reactants in relatively small quantities by directly depositing them in selected regions. In some steps, the entire support surface can be sprayed or otherwise coated with a solution, if it is more efficient to do so.
- Precisely measured aliquots of monomer solutions may be deposited dropwise by a dispenser that moves from region to region.
- Typical dispensers include a micropipette to deliver the monomer solution to the support and a robotic system to control the position of the micropipette with respect to the support, or an ink-jet printer.
- the dispenser includes a series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the reaction regions simultaneously.
- Pin-based methods for synthesis of oligonucleotides on a solid support are described, for example, in U.S. Patent No. 5,288,514.
- Pin-based methods utilize a support having a plurality of pins or other extensions. The pins are each inserted simultaneously into individual reagent containers in a tray.
- An array of 96 pins is commonly utilized with a 96-container tray, such as a 96-well microtiter dish.
- Each tray is filled with a particular reagent for coupling in a particular chemical reaction on an individual pin. Accordingly, the trays will often contain different reagents. Since the chemical reactions have been optimized such that each of the reactions can be performed under a relatively similar set of reaction conditions, it becomes possible to conduct multiple chemical coupling steps simultaneously.
- a plurality of oligonucleotides or polynucleotides may be synthesized on multiple supports.
- One example is a bead-based synthesis method, which is described, for example, in U.S.
- oligonucleotides or polynucleotides may be removed, released or uncoupled from the solid support, for example, by exposure to conditions such as acid, base, oxidation, reduction, heat, light, pH, electric current, electric potential, metal ion catalysis, displacement or elimination chemistry, or by enzymatic cleavage, as is known in the art.
- Cleavable linkages are known to those of skill in the art and include those activatable, i.e., cleavable, by acid, base, oxidation, reduction, heat, light, pH, electric current, electric potential, metal ion catalysis, displacement or elimination chemistry, or enzyme. Methods of synthesizing and cleaving nucleic acids containing chemically cleavable, thermally cleavable, and photo-labile groups are described, for example, in U.S. Patent No. 5,700,642.
- oligonucleotides may be attached to a solid support through a cleavable linkage moiety.
- for example, the solid support may be functionalized to provide cleavable linkers for covalent attachment to the oligonucleotides.
- the linker moiety may be one, two, three, four, five, six or more atoms in length.
- the cleavable moiety may be within an oligonucleotide and may be introduced during in situ synthesis.
- cleavable sites contained within the modified oligonucleotide may include chemically cleavable groups such as dialkoxysilane, 3'-(S)-phosphorothioate, 5'-(S)-phosphorothioate, 3'-(N)-phosphoramidate, 5'-(N)-phosphoramidate, and ribose. Synthesis and cleavage conditions of chemically cleavable oligonucleotides are described in U.S. Patent Nos. 5,700,642 and 5,830,655.
- a non-cleavable hydroxyl linker may be converted into a cleavable linker by coupling a special phosphoramidite to the hydroxyl group prior to the phosphoramidite or H-phosphonate oligonucleotide synthesis, as described in U.S. Patent Application Publication No. 2003/0186226.
- the cleavable linking moiety may be a TOPS (two oligonucleotides per synthesis) linker (see, e.g., PCT publication WO 93/20092).
- the TOPS phosphoramidite may be used to convert a non-cleavable hydroxyl group on the solid support to a cleavable linker.
- a cleavable linking moiety may be an amino linker.
- Thio-containing internucleotide bonds such as 3'-(S)-phosphorothioate and 5'-(S)-phosphorothioate are cleaved by treatment with silver nitrate or mercuric chloride.
- Acid cleavable sites include 3'-(N)-phosphoramidate, 5'-(N)-phosphoramidate, dithioacetal, acetal and phosphonic bisamide.
- the cleavable linking moiety may be a photocleavable linker, such as an ortho-nitrobenzyl photocleavable linker.
- Photocleavable moieties include those capable of being cleaved by light of a certain wavelength. Such cleavable moieties are referred to as photolabile linkages and are disclosed in Olejnik et al., Photocleavable biotin derivatives: a versatile approach for the isolation of biomolecules, Proc. Natl. Acad. Sci. U.S.A., vol. 92, p. 7590-7594 (1995). Photo-labile linkages include nitrobenzylether and thymidine dimer.
- Such photocleavable linkers can be cleaved by UV illumination between wavelengths of about 275 to about 375 nm for a period of a few seconds to 30 minutes, such as about one minute. Exemplary wavelengths include between about 300 nm to about 350 nm. Synthesis and cleavage conditions of photolabile oligonucleotides on solid supports are described, for example, in Venkatesan et al., J. of Org. Chem. 61:525-529 (1996), Kahl et al., J. of Org. Chem. 64:507-510 (1999), Kahl et al., J. of Org.
- Thermally cleavable groups include allylic sulfoxide and cyclohexene.
- oligonucleotides may be removed from a solid support by an enzyme such as a nuclease.
- oligonucleotides may be removed from a solid support upon exposure to one or more endonucleases, including, for example, restriction endonucleases such as class IIs restriction enzymes.
- a restriction endonuclease recognition sequence may be incorporated into the immobilized oligonucleotides and the oligonucleotides may be contacted with one or more restriction endonucleases to remove the oligonucleotides from the support.
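- by way of non-limiting illustration, the release step above can be sketched in Python. The sketch assumes the oligonucleotide is written 5' to 3' and tethered at its 5' end, models only the top strand, and ignores the duplex requirement of real enzymes. EcoRI (recognition sequence GAATTC, cutting after the first G) is used only because it is a familiar example, not because the disclosure names it; the function name `release_by_restriction` is hypothetical.

```python
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence (illustrative choice only)
ECORI_CUT_OFFSET = 1     # top-strand cut falls after the first G: G^AATTC


def release_by_restriction(tethered, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Split a 5'-tethered strand at the first recognition site.

    Returns (stub_left_on_support, released_fragment), or None when the
    strand does not contain the recognition sequence.
    """
    position = tethered.upper().find(site)
    if position == -1:
        return None
    cut = position + cut_offset
    return tethered[:cut], tethered[cut:]


stub, released = release_by_restriction("TTTTGAATTCACGTACGT")
print(stub, released)
```

- a class IIs enzyme, which cuts at a defined distance outside its recognition sequence, could be modeled in the same way by passing a `cut_offset` larger than the length of the site.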
- restriction endonucleases having specific binding and/or cleavage sites are commercially available, for example, from New England Biolabs (Ipswich, MA)
- suitable nucleases include zinc fingers, TALENs and CRISPR nucleases, as are known in the art.
- a suitable cleavable moiety may be selected to be compatible with the nature of the protecting group of the nucleoside bases if a protecting group is utilized, the choice of solid support, and/or the mode of reagent delivery, among others.
- the cleavable moiety may be removed under conditions which do not degrade the oligonucleotides.
- Suitable cleavable or releasable moieties include those responsive to changes in pH, such as those which result from application of an electric current or potential to create a localized basic or acidic pH.
- Such moieties may include one or more bonds that break in response to such changes in pH, such as a thiol bond.
- the encoded oligonucleotide or polynucleotide sequences are then synthesized using an error prone polymerase, such as a template independent error prone polymerase, and common or natural nucleic acids, which may be unmodified.
- initiator sequences or primers are attached to a substrate, such as a silicon dioxide substrate, at various known locations, which may include an electrode, to produce an addressable substrate.
- Reagents including at least a selected nucleotide, a template independent polymerase and other reagents, such as cations, required for enzymatic activity of the polymerase are applied at one or more locations of the substrate or the entire substrate where the initiator sequences are located and under conditions where the polymerase adds one or more than one or a plurality of the nucleotide to the initiator sequence to extend the initiator sequence.
- the nucleotides ("dNTPs") are applied or flowed in periodic applications or waves of known temporal and spatial manner or width or conditions, considering the polymerase polymerization (or switching) rate. In this exemplary manner, blocking groups or reversible terminators may not be used with the dNTPs because the reaction conditions are selected to be sufficient to limit or reduce the probability of enzymatic addition of the dNTP to one dNTP, i.e., one dNTP is added using the selected reaction conditions, taking into consideration the reaction kinetics.
- nucleotides with blocking groups or reversible terminators can be used in certain embodiments.
- Nucleotides with blocking groups or reversible terminators are known to those of skill in the art.
- more than one dNTP may be added to form a homopolymer run when common or natural nucleotides are used with a polymerase, such as a template independent error prone polymerase.
- each homopolymer run (as determined by sequencing) is interpreted as representing a single dNTP.
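- as a minimal sketch of the interpretation rule just stated, each run of identical bases in a sequencing read can be collapsed to a single symbol. The function name `collapse_homopolymers` is hypothetical and the snippet assumes reads are plain strings over A, C, G and T.

```python
from itertools import groupby


def collapse_homopolymers(read):
    """Collapse every run of identical bases to one base (AAAC -> AC)."""
    return "".join(base for base, _run in groupby(read.upper()))


print(collapse_homopolymers("AAACCGTTTTA"))  # ACGTA
```

- one consequence of this rule is that the stored message cannot contain the same symbol twice in a row, so any encoding layered on top would need to avoid adjacent repeats.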
- Polymerase activity may be modified using photo-chemical or electrochemical modulation as a reaction condition, which may allow for addition of dNTP beyond a single dNTP.
- a wash is then applied to the one or more locations to remove the reagents.
- the steps of applying the reagents and the wash are repeated until desired nucleic acids are created.
- the reagents may be added to one or more than one or a plurality of locations on the substrate in series or in parallel or the reagents may contact the entire surface of the support, such as by flowing the reagents across the surface of the support.
- the reaction conditions are determined, for example, based on reaction kinetics or the activity of the polymerase so as to determine or limit the ability of the polymerase to attach more than one nucleotide to the end of the initiator sequence or the growing oligonucleotide.
- a template dependent error prone polymerase can be used.
- a template dependent polymerase may be used which may become error prone.
- a template independent RNA polymerase can be used.
- polymerases are used to build nucleic acid molecules representing information which is referred to herein as being recorded in the nucleic acid sequence, or the nucleic acid is referred to herein as being storage media.
- Polymerases are enzymes that produce a nucleic acid sequence, for example, using DNA or RNA as a template, or such enzymes may be template independent.
- Polymerases that produce RNA polymers are known as RNA polymerases while polymerases that produce DNA polymers are known as DNA polymerases.
- Polymerases that incorporate more than one type of nucleotide are known in the art and are referred to herein as "error-prone polymerases".
- Template independent polymerases may be error prone polymerases. Using an error-prone polymerase allows the incorporation of specific bases at precise locations of the DNA molecule. Error-prone polymerases will either accept a non-standard base, such as a reversible chain terminating base, or will incorporate a different nucleotide, such as a natural or unmodified nucleotide that is selectively given to it as it tries to copy a template.
- terminal deoxynucleotidyl transferase (TdT), also known as DNA nucleotidylexotransferase (DNTT) or terminal transferase, creates nucleic acid strands by catalyzing the addition of nucleotides to the 3' terminus of a DNA molecule without a template.
- the preferred substrate of TdT is a 3'-overhang, but it can also add nucleotides to blunt or recessed 3' ends.
- Cobalt is a cofactor; however, the enzyme also catalyzes the reaction upon Mg and Mn administration in vitro.
- Nucleic acid initiators may be 4 or 5 nucleotides or longer and may be single stranded or double stranded. Double stranded initiators may have a 3' overhang or they may be blunt ended or they may have a 3' recessed end.
- TdT, like all DNA polymerases, also requires divalent metal ions for catalysis.
- TdT is unique in its ability to use a variety of divalent cations such as Co2+, Mn2+, Zn2+ and Mg2+.
- the extension rate of the primer p(dA)n (where n is the chain length from 4 through 50) with dATP in the presence of divalent metal ions is ranked in the following order: Mg2+ > Zn2+ > Co2+ > Mn2+.
- each metal ion has different effects on the kinetics of nucleotide incorporation.
- Mg2+ facilitates the preferential utilization of dGTP and dATP, whereas Co2+ increases the catalytic polymerization efficiency of the pyrimidines, dCTP and dTTP.
- Zn2+ behaves as a unique positive effector for TdT since reaction rates with Mg2+ are stimulated by the addition of micromolar quantities of Zn2+.
- This enhancement may reflect the ability of Zn2+ to induce conformational changes in TdT that yield higher catalytic efficiencies. Polymerization rates are lower in the presence of Mn2+ compared to Mg2+, suggesting that Mn2+ does not support the reaction as efficiently as Mg2+.
- further description of TdT is provided in Biochim Biophys Acta, May 2010; 1804(5): 1151-1166, hereby incorporated by reference in its entirety.
- Mg2+, Zn2+, Co2+ or Mn2+ in the nucleotide pulse may be replaced with other cations designed to modulate nucleotide attachment.
- the nucleotide pulse replaces Mg++ with other cation(s), such as Na+, K+, Rb+, Be++, Ca++ or Sr++.
- the nucleotide can bind but not incorporate, thereby regulating whether the nucleotide will incorporate or not.
- a pulse of (optional) pre-wash without nucleotide or Mg++ can be provided, or then Mg++ buffer without nucleotide can be provided.
- by limiting nucleotides available to the polymerase, the incorporation of specific nucleic acids into the polymer can be regulated.
- these polymerases are capable of incorporating nucleotides independent of the template sequence and are therefore beneficial for creating nucleic acid sequences de novo.
- the combination of an error-prone polymerase and a primer sequence serves as a writing mechanism for imparting information into a nucleic acid sequence.
- by limiting nucleotides available to a template independent polymerase, the addition of a nucleotide to an initiator sequence or an existing nucleotide or oligonucleotide can be regulated to produce an oligonucleotide by extension.
- these polymerases are capable of incorporating nucleotides without a template sequence and are therefore beneficial for creating nucleic acid sequences de novo.
- the eta-polymerase (Matsuda et al. (2000) Nature 404(6781): 1011-1013) is an example of a polymerase having a high mutation rate (~10%) and high tolerance for 3' mismatch in the presence of all 4 dNTPs, and probably even higher if limited to one or two dNTPs.
- the eta-polymerase is a de novo recorder of nucleic acid information similar to terminal deoxynucleotidyl transferase (TdT) but with the advantage that the product produced by this polymerase is continuously double-stranded.
- Double stranded DNA has less sticky secondary structure and has a more predictable secondary structure than single stranded DNA.
- double stranded DNA serves as a good support for polymerases and/or DNA-binding-protein tethers.
- a template dependent or template semi-dependent error prone polymerase can be used.
- a template dependent polymerase may be used which may become error prone.
- a template independent RNA polymerase can be used.
- any combination of templates with universal bases can be used which encourage acceptance of many nucleotide types.
- error tolerant cations such as Mn2+ can be used.
- the present disclosure contemplates the use of error-tolerant polymerase mutants. See Berger et al., Universal Bases for Hybridization, Replication and Chain Termination, Nucleic Acids Research 2000, August 1, 28(15), pp. 2911-2914, hereby incorporated by reference.
- methods of synthesizing nucleic acid sequences are disclosed in "Large-scale de novo DNA synthesis: technologies and applications," by Sriram Kosuri and George M. Church, Nature Methods, May 2014, Vol. 11 No. 5, pp. 499-507, hereby incorporated by reference in its entirety.
- the commercially available CustomArray system from CustomArray, Inc. is an exemplary system that can be used to make the nucleic acid sequences encoding the information to be stored by affecting or altering or producing pH locally on a substrate. It is to be understood that other methods may be used to affect or alter or produce pH at particular locations on a substrate.
- the CustomArray system uses a pH gradient and synthesizes a desired oligonucleotide microarray using a semiconductor-based electrochemical-synthesis process. Each oligonucleotide or polynucleotide is synthesized via a platinum electrode that is independently controlled by the synthesizer's computer.
- a pH gradient is created which activates a pH sensitive polymerase at specific, desired locations on the substrate to add a nucleotide present in an aqueous medium at the specific, desired location.
- pH is modulated to initiate the polymerase to add a single nucleotide; however, more than one nucleotide may be added to create a homopolymer.
- a system such as the CustomArray system, or other systems described herein, can be used to affect or alter or produce pH locally on a substrate where a pH dependent polymerase, a nucleotide and other suitable reagents in aqueous media are present to add the nucleotide to an initiator sequence or existing nucleotide or oligonucleotide in a method of forming an oligonucleotide.
- Exemplary methods described herein use aqueous solvents and pH to modulate activity of a polymerase, such as a template independent polymerase, such as TdT, to add a nucleotide to an existing initiator sequence, an existing nucleotide or an existing oligonucleotide at a desired location on the substrate in a method of forming an oligonucleotide.
- Supports described herein may have one or more electrodes positioned at or near or adjacent to a reaction site such that oxidation or reduction may take place within a reaction zone including the reaction site.
- the present disclosure provides for the use of an aqueous electrolyte media such as is commonly used with electrochemical cells.
- the aqueous electrolyte media may further include a weakly acidic moiety participating in an oxidation or reduction reaction at an electrode and releasing one or more protons or absorbing one or more hydroxide ions upon oxidation, thereby altering pH.
- the aqueous electrolyte media may further include one or more or a plurality of acid generating reagents.
- Exemplary acid-generating reagents include hydroquinone, catechol, resorcinol, Alkannin, hexahydroxynaphthoquinone, Juglone, Lapachol, Lawsone, Menatetrenone, spinochrome D, Phylloquinone, Plumbagin, spinochrome B, Menadione, 1,4-Naphthoquinone, 1,2-Naphthoquinone, 1,6-Naphthoquinone, anthraquinones, isoindole-4,7-diones, other natural and synthetic derivatives of quinone, other phenol derivatives, pyrrole and related derivatives and polymers thereof, thiophenes and related derivatives and polymers thereof, aniline and related derivatives and polymers thereof, acetylene derivatives and polymers thereof, bipyridinium and derivatives thereof and related compounds, aldehydes and alcohols, bromine oxides, cyanides, carbonates, hypochlorous acids, hypoiodous acids, thiols, organic halides, or other weakly acidic organic and inorganic compounds.
- the aqueous electrolyte media may further include a weakly basic moiety participating in an oxidation or reduction reaction at an electrode and releasing one or more hydroxide ions or absorbing one or more protons upon reduction, thereby altering pH.
- Exemplary base-generating reagents include 1,4-benzoquinone, 1,2-benzoquinone, 1,3-benzoquinone, anthraquinone, Duroquinone, Tetrahydroxy-1,4-benzoquinone, Alkannin, hexahydroxynaphthoquinone, Juglone, Lapachol, Lawsone, Menatetrenone, spinochrome D, Phylloquinone, Plumbagin, spinochrome B, Menadione, 1,4-Naphthoquinone, 1,2-Naphthoquinone, 1,6-Naphthoquinone, anthraquinones, isoindole-4,7-diones, other natural and synthetic quinone derivatives, other phenol derivatives, pyrrole and related derivatives and polymers thereof, thiophenes and related derivatives and polymers thereof, aniline and related derivatives and polymers thereof, acetylene derivatives and polymers thereof, bipyridinium and derivatives thereof and related compounds, aldehydes, ketones, and alcohols, bromine oxides, cyanides, carbonates, hypochlorous acids, hypoiodous acids, thiols, organic halides, or other weakly basic organic or inorganic compounds.
- a microfluidic device is provided with one or more reservoirs which include one or more reagents which are then transferred via microchannels to a reaction zone or location on the addressable substrate where the reagents are mixed and the reaction occurs.
- Such microfluidic devices and the methods of moving fluid reagents through such microfluidic devices are known to those of skill in the art.
- a flow cell or other channel, such as a microfluidic channel or microfluidic channels having an input and an output, is used to deliver fluids including reagents, such as a polymerase, a nucleotide and other appropriate reagents and washes, to particular locations on a substrate within the flow cell, such as within a reaction chamber.
- reaction conditions are selected to selectively activate and deactivate locations on the substrate.
- a desired location such as a grid point on a substrate or array
- reaction conditions can be provided to facilitate covalent binding of a nucleotide to an initiator sequence, an existing nucleotide or an existing oligonucleotide, and reaction conditions can be provided to prevent further attachment of an additional nucleotide at the same location.
- reaction conditions to facilitate covalent binding of a nucleotide to an existing nucleotide can be provided to the same location in a method of making an oligonucleotide at that desired location.
- reagents can be delivered to the entirety of the substrate or portions thereof and a selected known location or locations can be activated to cause the polymerase to add the nucleotide to either the initiator or growing nucleotide chain.
- the surface of the addressable substrate can be washed and a second set of reagents can be added to the surface of the addressable support, activated to add a nucleotide, and so on.
- the synthesized oligonucleotides or polynucleotides can be amplified using methods known to those of skill in the art.
- Amplification methods may comprise contacting a nucleic acid with one or more primers that specifically hybridize to the nucleic acid under conditions that facilitate hybridization and chain extension.
- Exemplary methods for amplifying nucleic acids include the polymerase chain reaction (PCR) (see, e.g., Mullis et al. (1986) Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1:263 and Cleary et al. (2004) Nature Methods 1:241; and U.S. Patent Nos.
- Chem. 277:7790, the amplification methods described in U.S. Patent Nos. 6,391,544, 6,365,375, 6,294,323, 6,261,797, 6,124,090 and 5,612,199, or any other nucleic acid amplification method using techniques well known to those of skill in the art.
- polynucleotides or amplicons thereof are sequenced using methods known to those of skill in the art, such as next-generation sequencing methods.
- the sequenced oligonucleotides or polynucleotides are then converted into bit sequences corresponding to, for example, an html format of information.
- the bit sequences can be converted to the format of information using methods known to those of skill in the art.
- the format of information can be visualized or displayed or played, if an audio format, using methods and devices known to those of skill in the art.
- Sequencing methods useful in the present disclosure include Shendure et al., Accurate multiplex polony sequencing of an evolved bacterial genome, Science, vol. 309, p. 1728-32, 2005; Drmanac et al., Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays, Science, vol. 327, p. 78-81, 2009; McKernan et al., Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding, Genome Res., vol. 19, p. 1527-41, 2009; Rodrigue et al., Unlocking short read sequencing for metagenomics, PLoS One, vol.
- the data reconstruction step may then be carried out where the polynucleotide sequence is translated into the digital representation format.
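- the disclosure does not specify in this passage how bases map to bits, so the following Python sketch substitutes a hypothetical rotation code purely for illustration: each byte is written as six base-3 digits, and each digit selects one of the three bases that differ from the previous base. This choice is an assumption, made because it never produces adjacent repeats and therefore survives the collapsing of homopolymer runs. The names `encode`, `decode`, `collapse`, and the `start` base (standing in for the last base of the initiator) are all hypothetical.

```python
from itertools import groupby

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits cover one byte


def collapse(read):
    """Collapse homopolymer runs so each run counts as one symbol."""
    return "".join(base for base, _run in groupby(read))


def encode(data, start="A"):
    """Map bytes to a strand in which no two adjacent bases are equal."""
    prev, out = start, []
    for byte in data:
        trits = []
        for _ in range(TRITS_PER_BYTE):
            byte, trit = divmod(byte, 3)
            trits.append(trit)
        for trit in reversed(trits):  # most significant digit first
            prev = [b for b in BASES if b != prev][trit]
            out.append(prev)
    return "".join(out)


def decode(read, start="A"):
    """Recover bytes from a read, tolerating runs of heterogeneous length."""
    prev, trits = start, []
    for base in collapse(read):
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    data = bytearray()
    for i in range(0, len(trits) - TRITS_PER_BYTE + 1, TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        data.append(value)
    return bytes(data)


strand = encode(b"hi")
stuttered = "".join(base * 3 for base in strand)  # simulate homopolymer runs
print(decode(stuttered))
```

- the sketch handles only variable run lengths; substitutions, insertions of a different base, and deletions of whole runs would require an error-correcting layer that is outside the scope of this illustration.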
- barcoding is the inclusion or association of a specific unique nucleotide sequence or barcode tag along with a larger polynucleotide sequence so as to identify the larger polynucleotide sequence or otherwise provide information along with the larger polynucleotide sequence.
- Barcodes may also be referred to as unique nucleic acid sequence identifiers, tags, MIDs, or indexes, and all serve to identify the polynucleotide to which they are attached.
- the barcode is generally understood to be an identifying sequence that is read in by a sequencing read separate from the main read that sequences the genomic DNA.
- a barcode may refer to a short sequence that is read in the same read as the genomic DNA. Barcodes enable multiple samples to be pooled for sequencing; each sample is identified by a unique barcode which enables identification of results during the analysis.
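- as a rough sketch of how pooled reads might be sorted by barcode during analysis, the following Python function bins reads by an exact-match barcode at the start of each read. The barcode sequences, sample names and the function name `demultiplex` are hypothetical, and exact prefix matching is an assumption; practical pipelines usually tolerate a small number of mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by leading barcode and strip the barcode from each read."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read for inspection
            bins["unassigned"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical barcodes
pooled = ["ACGTGGGG", "TGCAAAAA", "CCCCCCCC"]
print(demultiplex(pooled, barcodes))
```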
- methods of adding barcode tags are known in the art and include PCR methods, ligase methods, transposon/transposase methods, RPA methods, recombination methods or hybridization extension methods.
- a barcode sequence can be added using a primer sequence including the barcode sequence under PCR conditions so that the barcode sequence is included into the PCR product sequence.
- Unique DNA sequence identifiers are added in a PCR reaction carried out before sequencing, which also adds the primers used for the sequencing reaction.
- barcodes must be unambiguous; but because the barcode is part of the sequence read, it is beneficial to have barcodes be as short as possible.
- the barcode also should be designed to minimize primer-dimer artifacts. Methods of designing barcodes are known to those of skill in the art.
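- one simple way to make short barcodes unambiguous is to require a minimum Hamming distance between every pair, so that a small number of read errors cannot turn one barcode into another. The greedy search below is a sketch under that assumption and is not a method taken from the disclosure; it does not model primer-dimer formation, melting temperature or homopolymer content. The function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(length, min_distance, count):
    """Greedily pick up to `count` barcodes that are pairwise far apart.

    Candidates are visited in lexicographic order; fewer than `count`
    barcodes are returned if the search space is exhausted first.
    """
    chosen = []
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
            if len(chosen) == count:
                break
    return chosen


codes = design_barcodes(length=6, min_distance=3, count=8)
print(codes)
```

- with a minimum distance of 3, any single substitution error leaves a read closer to its true barcode than to any other, which is the usual motivation for this constraint.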
- the disclosure provides that one or more or a plurality of reagents and washes are delivered to one or more or a plurality of reaction sites within one or more or a plurality of reaction zones including an electrode or electrodes in a method of covalently attaching dNTP to an initiator sequence or an existing nucleotide attached at the desired location using electricity to alter pH within a reaction zone.
- a selected nucleotide reagent liquid is pulsed or flowed or deposited at the reaction site where reaction takes place and then may be optionally followed by delivery of a buffer or wash that does not include the nucleotide.
- Suitable delivery systems include fluidics systems, microfluidics systems, syringe systems, ink jet systems, pipette systems and other fluid delivery systems known to those of skill in the art.
- Various flow cell embodiments or flow channel embodiments or microfluidic channel embodiments are envisioned which can deliver separate reagents or a mixture of reagents or washes, using pumps or electrodes or other methods known to those of skill in the art of moving fluids through channels or microfluidic channels, through one or more channels to a reaction region or vessel where the surface of the substrate is positioned so that the reagents can contact the desired location where a nucleotide is to be added.
- a microfluidic device is provided with one or more reservoirs which include one or more reagents which are then transferred via microchannels to a reaction zone where the reagents are mixed and the reaction occurs.
- Such microfluidic devices and the methods of moving fluid reagents through such microfluidic devices are known to those of skill in the art.
- Reagents can be deposited onto a discrete region of the support, such that each region forms a feature of the array.
- the pH of the feature is capable of being altered, i.e., the pH is raised or lowered to either activate or deactivate an enzyme that catalyzes addition of a dNTP as described herein.
- the present disclosure provides for a method of synthesizing a plurality of polynucleotide sequences using a template-independent polymerase such as TdT, which encodes data without the need for single base precision.
- Each oligonucleotide includes a single-base-precise initiator sequence at the 5' end and a single-base-precise adapter at the 3' end. Subsequent tagging is based on base-precise hybridization of the universal single-base-precise initiator sequence at the 5' end or the universal single-base-precise 3' adapter, which is added to all synthesized strands by ligation or PCR.
- a plurality of 5' initiator primers are conjugated or tethered onto the spots in a 2D electrode microarray slide, with each spot including an electrode of roughly 0.2 mm in diameter, according to dithiol chemical conjugation.
- the polynucleotide synthesis is carried out with TdT under suitable conditions and with suitable reagents.
- Each nucleotide mixed with a template-independent polymerase is flowed across the entire surface of the array and electrical stimulus can be used to toggle polymerization activity at the desired electrode for the addition of the desired nucleotide.
- the process is repeated to create polynucleotides of known sequence at known locations of the array.
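- the flow-and-toggle cycle described above can be sketched as a scheduling problem: for each nucleotide flowed across the array, decide which electrodes to activate. The Python sketch below assumes an idealized model in which an activated electrode adds exactly one symbol per flow (in practice a homopolymer run of heterogeneous length may be added) and in which nucleotides are flowed in a fixed repeating order. The function name `plan_cycles`, the `flow_order` parameter and the (row, column) site keys are hypothetical.

```python
def plan_cycles(targets, flow_order="ACGT"):
    """Return a list of (nucleotide, [sites to activate]) steps.

    `targets` maps an electrode address to the symbol sequence wanted there.
    A site is activated during a flow only if the flowed nucleotide is the
    next symbol it still needs.
    """
    for site, seq in targets.items():
        if set(seq) - set(flow_order):
            raise ValueError(f"target at {site} uses a base outside flow_order")

    progress = {site: 0 for site in targets}
    cycles = []
    while any(progress[site] < len(seq) for site, seq in targets.items()):
        for nucleotide in flow_order:
            active = [
                site for site, seq in targets.items()
                if progress[site] < len(seq) and seq[progress[site]] == nucleotide
            ]
            if active:  # skip flows that no electrode needs
                cycles.append((nucleotide, active))
                for site in active:
                    progress[site] += 1
    return cycles


targets = {(0, 0): "AC", (0, 1): "CA"}  # hypothetical electrode addresses
for nucleotide, sites in plan_cycles(targets):
    print(nucleotide, sites)
```

- replaying the returned steps in order rebuilds every target sequence at its address, which is a convenient check on any schedule produced this way.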
- the polynucleotides may include homopolymers of heterogeneous lengths.
- a universal single-base-precise 3' adapter is added to all synthesized strands by ligation, hybridization extension or
- the synthesized polynucleotides are selectively desorbed from a chosen electrode or electrode arrangement for mass tagging by electrochemical desorption (ECD).
- a subset of DNA strands can be selectively decoupled from the solid support by electrical stimulus of electrodes or enzymatic cleavage.
- Subsets of decoupled DNA strands can be barcoded with unique 5' and 3' primers (gray outlined box) which are designed to anneal either to the universal 5' initiator or the universal 3' adapter.
- Each barcode sequence is a hash key that represents a meta tag.
- the green strands could be barcoded with B €-5’-44, which may represent “vacation”, and hC- d -d 2, which may represent “photos”.
- each subset of strands is uniquely tagged with non-overlapping barcodes; in practice, overlapping barcodes may be desirable. Barcodes can be iteratively added or added in series for additional layers of meta tagging. Finally, all subsets of DNA strands can be mixed in a single pool.
- an electrode array where electrodes are placed in an addressable array format on a substrate having initiator sequences releasably tethered thereto so that oligonucleotides or polynucleotides can be synthesized on the electrodes and so that the synthesized oligonucleotides or polynucleotides can be released from the electrodes.
- Prior to tethering of the initiator sequence to the addressable substrate, the gold electrode surface is first cleaned by abrasion with aluminum oxide particles (BASi) of decreasing sizes (1, 0.3, and 0.05 microns). For each particle size, the electrode is manually polished for 30 seconds with a figure-8 motion on a surface saturated with the particles and distilled water. Following the final particle polishing (0.05 microns), debris is cleared from the electrode surface by ultrasonication (Branson) in an ethanol bath for 5 minutes. Electrodes are then rinsed in distilled water and dried with pure nitrogen. To remove residual organic material from the electrodes, they are washed in ‘piranha etch’ (3 parts of concentrated sulfuric acid and 1 part of 30% hydrogen peroxide solution) for 2 minutes. Finally, the electrodes are ultrasonicated (Branson) in distilled water for 15 minutes and are ready for tethering. A thiolated oligo /5ThioMC6-
- BASi aluminum oxide particles
- CTACACTCTTTCCCTACACGACGCTCTTCCGATCTACGTACTGAG IDT, 100 µM.
- Thiol-PEG3-alcohol, BroadPharm, 6 M in 100% EtOH
- TCEP, Sigma, 1 M stock
- the Thiol-PEG3-alcohol is used as a competitive inhibitor to the thiolated oligo to improve oligo spacing and decrease steric interference.
- Oligonucleotide (“oligo”) preparations for coupling to the gold electrode are provided in the
- an electrode array has been created where electrodes are placed in an addressable array format on a substrate so that oligonucleotides or polynucleotides can be synthesized on the electrodes and so that the synthesized oligonucleotides or polynucleotides can be released from the electrodes.
- the three oligonucleotide (“oligo”) mixtures are statically incubated at room temperature for 30 minutes and 25 microliters of each are dispensed onto 3 different electrodes. One electrode is kept bare as a negative control.
- the electrodes with oligo mixtures are sealed in a humidified chamber with saturated NaCl (75% humidity) and tethering proceeds statically at room temperature overnight (>12 hours).
- a potentiostatic mode is set up for the purpose of electrochemical desorption of thiol oligonucleotides tethered to gold electrodes.
- An electrochemical cell including three electrodes (working electrode, counter electrode and reference electrode) is submerged in a 0.5 M NaOH solution which is continuously degassed with nitrogen. Each electrode with a tethered oligo mixture is used as the working electrode and desorption is tested with each working electrode. Electrodes are connected to an Autolab (Metrohm) potentiostat.
- electrodes are dried with nitrogen gas and the surface is covered with 0.5 M NaOH which has been degassed for 10 minutes with nitrogen gas.
- a reference and counter electrode are also provided, and all are connected to a potentiostat (Metrohm).
- the Autolab control software is set up in potentiostatic mode. Cyclic voltammetry is performed for five cycles by applying a range of voltages from 0 to -1.25 V while measuring current. A representative trace for each of the above three oligo mixtures is shown in Figs. 4-6. The trace for the bare electrode is overlaid.
- the IV scan of the bare electrode represents the trace with no attached molecules.
- the 1st scan with mixed DNA/PEG shows significant anodic current when voltage is ⁇ 0.5V. Peak is approximately -1 V.
- the 5th scan with mixed DNA/PEG shows that the anodic current is significantly reduced.
- the trace is similar to that of the IV scan of the bare electrode, indicating that thiol-tethered molecules are being desorbed.
- the oligos are desorbed into a basic solution such as a solution of 0.5 M NaOH.
- a basic solution such as a solution of 0.5 M NaOH.
- an equimolar volume of 0.5 M HCl can be added such that the oligos are dissolved in a final solution of water with 0.5 M NaCl.
- Alternative methods for oligo purification may be used, such as with solid support silica spin columns or Solid Phase Reversible Immobilization (SPRI) beads.
- desorption conditions may be tailored as desired to achieve certain objectives. For example, methods are provided herein where the number of voltage pulses is minimized or a large negative voltage is applied. Each aspect can be used to minimize time. In cases where a large negative voltage cannot be used due to instrumentation limitations, a large number of smaller negative voltages may be used. Alternatively, buffer conditions such as increased concentration of sodium hydroxide may be helpful to reduce the number of pulses and the minimum voltage.
- oligonucleotide is tethered onto the surface of 4 electrodes and desorbed by the application of positive current.
- the electrode surface is flooded with an extension reaction mixture containing the template-independent polymerase, TdT and different species of nucleoside triphosphates.
- a custom fabricated 4-electrode chip is created by evaporating gold onto a glass substrate.
- the 4-electrode chip is cleaned by sonication in acetone, then isopropanol, and finally by plasma.
- An initiator oligonucleotide is tethered onto the working electrode.
- the initiator is dissolved in K2HPO4-TE and dispensed onto the chip. Tethering occurs in a chamber humidified to 100%. After 1 hour, the chip is rinsed with water then cleaned with nitrogen.
- an extension reaction mixture is prepared that includes an alkaline buffer (pH>10); a divalent ion (i.e., Mg) for TdT activity; a nucleoside triphosphate (such as dCTP and dATP); and TdT enzyme.
- a fluidic flowcell is assembled on top of the chip and the pads are connected to a potentiostat.
- the extension reaction is injected to cover the electrodes and a constant current of 1 µA is applied for 30 seconds followed by an incubation step for 3 minutes.
- DNA tethered to the surface is collected by injecting 10 µL of USER (a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII) from NEB with incubation at 37°C for 1 hour.
- USER: a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII, from NEB
- desorption and/or PCR tagging are provided in a method of DNA data storage.
- the thiol modification to the end of a short DNA oligo allows it to bind onto gold.
- an electrostatic repulsion between voltage / charge and the thiol may result in desorption.
- aspects of the present disclosure are directed to a method for selectively barcoding a subset of polynucleotides encoding bits of information with a unique barcode, wherein a plurality of polynucleotides comprising the subset is releasably attached to an addressable substrate by an activatable linker, wherein the subset is releasably attached to known subset locations of the addressable substrate, wherein each known subset location includes a plurality of different sequence polynucleotides each encoding the same bits of information, wherein each polynucleotide of the subset includes a common 5’ universal initiator of same sequence and a common 3’ universal adaptor of same sequence, wherein the method includes (a) selectively releasing the subset of polynucleotides from the addressable array, and (b) barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a first barcode.
- the method further includes (c) barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a second barcode.
- the first barcode comprises metatag information.
- the second barcode comprises metatag information.
- each polynucleotide of the plurality includes a common 5’ universal initiator of same sequence and a common 3’ universal adaptor of same sequence.
- each polynucleotide of the subset includes a common 5’ universal initiator of same sequence unique to the subset and a common 3’ universal adaptor of same sequence unique to the subset.
- the activatable linker is activated to detach the subset of polynucleotides from the addressable substrate using heat, light, an enzyme, a chemical, electrical charge or pH.
- the barcoding of step (b) includes hybridizing a first barcoded primer to either the 5’ universal initiator or the 3’ universal adaptor under PCR conditions to add the first barcode to either the 5’ universal initiator or the 3’ universal adaptor.
- the barcoding of step (c) includes hybridizing a second barcoded primer to either the 5’ universal initiator or the 3’ universal adaptor under PCR conditions to add the second barcode to either the 5’ universal initiator or the 3’ universal adaptor.
- the barcoded polynucleotides of the subset are collected in a storage vessel.
- the addressable substrate includes a plurality of different subsets of polynucleotides wherein each subset encodes bits of information different from other subsets.
- the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b).
- the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel.
- the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c).
- the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel.
- the activatable linker is a thiol linkage and the subset of polynucleotides is released from the addressable substrate using electronically-stimulated desorption.
- the subset of polynucleotides includes DNA or RNA.
- the method further includes sorting, collecting, amplifying, sequencing, storing and/or retrieving the barcoded polynucleotides.
- the addressable substrate is an electrode array including a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive a voltage, and wherein the subset of polynucleotides is attached to corresponding electrode reaction sites and wherein releasing the subset of polynucleotides attached to the corresponding electrode reaction sites is controlled by application of voltage to the corresponding electrode reaction sites.
- the addressable substrate is an electrode array including a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive an electric potential and wherein the array comprises a plurality of different subsets of polynucleotides, wherein each subset of the plurality is attached to corresponding electrode reaction sites and wherein releasing each subset of the plurality is independently controlled by separate application of voltages.
- electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 4 pulses of -1 V or lower.
- electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 5 pulses of -1 V or lower.
- electronically-stimulated desorption occurs when polynucleotides are immersed with 0.5 M sodium hydroxide and with the application of at least 5 pulses of -1 V or lower.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biomedical Technology (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Plant Pathology (AREA)
- Immunology (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present disclosure provides methods for selectively tagging a subset of polynucleotide sequences from a plurality of polynucleotides comprising (a) synthesizing a plurality of polynucleotide sequences by flexible-write synthesis on a solid support, (b) selectively releasing a subset of polynucleotide sequences from the plurality of synthesized polynucleotides, and (c) tagging the released subset of polynucleotide sequences.
Description
Methods for Processing and Storing DNA Encoding Formats of Information
STATEMENT OF GOVERNMENT INTERESTS
This invention was made with government support under Grant No. DE-FG02-02ER63445 awarded by the Department of Energy. The government has certain rights in the invention.
FIELD
The present invention relates in general to methods of processing and storing polynucleotides that encode formats of information which have been translated into a digital representation such as bits, trits, etc., as is known in the art, and which are encoded into nucleic acid sequences. Such polynucleotides can be selectively removed from an addressable support and barcoded for later processing such as identification, selection, retrieval, etc., such as from among a plurality of barcoded polynucleotides. Such barcodes include nucleic acid sequences with associated meta tag information.
BACKGROUND
DNA has been considered as a medium for digital information storage. See Bancroft et al., Science 293, 1763-1765 (2001). See also Davis, Art Journal 55, 70-74 (1996); Gustafsson, Nature 458, 703 (2009); Gibson, Science 329, 52-56 (2010); US 2003/0228611 and WO2014/014991. See also US2010/0099080 and WO2014/014991.
However, its wide adoption as a practical information storage medium has been hampered by the high cost associated with chemical-based DNA synthesis methods. Enzyme-based methods for DNA synthesis have been developed towards large-scale information storage in DNA with reduced cost. See, for example, H. H. Lee, R. Kalhor, N. Goela, J. Bolot, G. M. Church, Enzymatic DNA synthesis for digital information storage. bioRxiv (2018), p. 348987.
One feature of the enzyme-based DNA synthesis method is the use of template-independent TdT, which increases the efficiency of converting information to DNA. This method allows accurate data encoding without requiring single-base precision. For example, data can be stored in a series of nucleotide homopolymers such that a DNA strand having a sequence of “AAAATTTCGGG” is informationally equivalent to a DNA strand which can be computed and represented in silico as “ATCG”. This data encoding approach, which may be referred to as flexible-write synthesis (because it allows for and interprets repeated nucleotides as a single nucleotide), allows for a high level of tolerance on the stringency by which DNA synthesis is performed. As a result, synthesis processes can be optimized for increased speed and reduced reagent costs by relaxing the precision requirement.
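As a minimal sketch of the flexible-write interpretation described above, the following Python snippet collapses each homopolymer run to a single nucleotide. The function name `collapse_homopolymers` and the two extra example reads are illustrative assumptions, not part of the disclosure.

```python
from itertools import groupby


def collapse_homopolymers(strand: str) -> str:
    """Interpret each run of repeated nucleotides as a single nucleotide."""
    # groupby yields one (base, run) pair per maximal run of identical bases.
    return "".join(base for base, _run in groupby(strand))


# The first read is the example from the text; the other two are
# hypothetical reads with different run lengths.
reads = ["AAAATTTCGGG", "ATTCCCCG", "AATCG"]
decoded = [collapse_homopolymers(read) for read in reads]
print(decoded)
```

Under this interpretation, strands that differ at single-base resolution can still carry the same information.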
An issue associated with the flexible-write synthesis method is that the resultant DNA strands can be highly heterogeneous at the single-base resolution, making downstream processing difficult. A need therefore remains for a method that can mass tag the DNA strands that correspond to the same data origin with ease while also allowing for selective amplification and retrieval of information stored in the heterogeneous DNA strands, especially in situations where a plurality of strands representing a plurality of data, such as files, are stored in a single pool on an array.
SUMMARY
The present disclosure provides a method for selectively barcoding a subset of polynucleotides encoding bits of information with a unique barcode, wherein a plurality of polynucleotides comprising the subset is releasably attached to an addressable substrate by an activatable linker, wherein the subset is releasably attached to known subset locations of the addressable substrate, wherein each known subset location includes a plurality of different sequence polynucleotides each encoding the same bits of information, wherein each polynucleotide of the subset includes a common 5’ universal initiator of same sequence and a common 3’ universal adaptor of same sequence. The method includes the steps of selectively releasing the subset of polynucleotides from the addressable array, and barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a first barcode.
The methods according to the present disclosure can be used for selective sorting, manipulating and storing of information such as information encoded in DNA using a flexible-write synthesis method as further described herein. Given high-speed DNA synthesis methods, the methods of the present disclosure are particularly useful with DNA as an information storage medium and especially where synthesis fidelity is not required. According to one aspect, a solid-phase synthesis device can be used to record digital information in DNA molecules on an addressable substrate and the DNA molecules can be selectively further processed, such as by adding barcode or other metatag information.
Further features and advantages of certain embodiments of the present invention will become more fully apparent in the following description of embodiments and drawings thereof, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other features and advantages of the present embodiments will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:
Figs. 1A-1B depict an overview schematic of mixed-mode production of mass tagged DNA polymers for information storage.
Fig. 2 depicts the result of a trace under 1 M Thiol-PEG3-alcohol. Anodic current is detected from application of volts from 0 to -1.25 V with addition of 1 mM final Thiol-PEG3-alcohol. The first cyclic voltammetry scan shows a significant anodic current of -20 microamps when the voltage is between -1 and -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode. After the 5th cyclic voltammetry scan, the peak anodic current between -1 and -1.25 volts is less than -5 microamps and is similar to that of the bare electrode, demonstrating desorption of the thiolated molecules from the electrode surface.
Fig. 3 depicts the result of a trace under 10 mM Thiol-PEG3-alcohol. Anodic current is detected from application of volts from 0 to -1.25 V with addition of 100 µM final Thiol-PEG3-alcohol. The first cyclic voltammetry scan shows a significant anodic current of -20 microamps when the voltage is between -1 and -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode. After the 5th cyclic voltammetry scan, the peak anodic current between -1 and -1.25 volts is approximately -5 microamps and is similar to that of the bare electrode, demonstrating desorption of the thiolated molecules from the electrode surface.
Fig. 4 depicts the result of a trace under 100 µM Thiol-PEG3-alcohol. Anodic current is detected from application of volts from 0 to -1.25 V with addition of 1 mM final Thiol-PEG3-alcohol. The first cyclic voltammetry scan shows a significant anodic current of -20 microamps when the voltage is between -1 and -1.25 volts, demonstrating conjugation of thiolated molecules to the electrode. After the 5th cyclic voltammetry scan, the peak anodic current between -1 and -1.25 volts is approximately -3 microamps and is highly similar to that of the bare electrode, demonstrating high (i.e., near complete) desorption of the thiolated molecules from the electrode surface.
Fig. 5 is a gel separation depicting desorbed DNA versus tethered DNA. F control = initiator-only control. F1 = fluid from extension reaction with dCTP (desorbed DNA). F2 = fluid from extension reaction with dCTP (desorbed DNA). F3 = fluid from extension reaction with dATP (desorbed DNA). S control = cleaved initiator control. S1 = tethered DNA from extension reaction with dCTP (tethered DNA). S2 = tethered DNA from extension reaction with dCTP (tethered DNA).
DETAILED DESCRIPTION
Aspects of the present disclosure are directed to methods of processing a subset of polynucleotides (such as DNA or RNA strands) from among a collection of polynucleotides present on an addressable substrate. The aspects of the present disclosure have particular application where the polynucleotides (strands) encode for a format of information which is represented by bits which are encoded into nucleic acid sequences, as is known in the art. According to the present disclosure, the format of information can be digital, as is known in information theory and information systems to be the discrete, discontinuous representation of information or works. According to the present disclosure, the format of information can be analog, as is known in the art as relating to or using signals or information represented by a continuously variable physical quantity such as spatial position, voltage, etc.
For example and without limitation, a format of information such as text, an image, a video or an audio format, such as an html format of information, as is known in the art, is converted to a digital representation (i.e., encoding in discrete units such as a binary numeral system, ternary numeral system and so on) such as bits (zeros and ones) or trits (zeros, ones, twos), for example using a computer and appropriate software, and then the series of bits (or trits or other exemplary digital representation system) is translated into a series of nucleotides. See, for example, Lee et al., bioRxiv (2018) p. 348987 and Jensen et al., bioRxiv (2018) doi: 10.1101/355719, each of which is hereby incorporated by reference in its entirety.
As used herein, the term “bit” is to be understood according to its common meaning to one of skill in the art. The term “bit” may be a contraction of “binary digit” and may refer to a basic capacity of information in computing and telecommunications. A “bit” represents either a first state or a second state, such as 1 or 0 (one or zero) only. The representation may be implemented, in a variety of systems, by means of a two state device. The bit sequence is converted (encoded), such as by a computer and appropriate software, to a designed sequence of nucleotides, i.e., an oligonucleotide or DNA or RNA, for example using a 1 bit per base encoding (A or C = 0; T/U or G = 1) or other encoding method, to form a corresponding encoded oligonucleotide sequence, i.e. the oligonucleotide sequence corresponds to or encodes for the bit sequence. The term “trit” refers to a ternary numeral system, also called base 3, which has 3 as its base. Analogous to a bit, a ternary digit is a trit (trinary digit). The following disclosure may make reference to bits, but equally applies to methods using trits and other digital representation methods known to those of skill in the art.
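The 1 bit per base mapping mentioned above (A or C for 0; T/U or G for 1) can be sketched in Python as follows. The mapping itself comes from the text; the rule of alternating between the two candidate bases so that adjacent bases never repeat is an assumption added here so that each bit would survive an interpretation that reads repeated nucleotides as one. The function names are hypothetical.

```python
ZERO_BASES = "AC"  # bases that stand for bit 0 (from the text)
ONE_BASES = "TG"   # bases that stand for bit 1 (DNA alphabet; U would replace T in RNA)

BASE_TO_BIT = {"A": "0", "C": "0", "T": "1", "U": "1", "G": "1"}


def encode_bits(bits: str) -> str:
    """Encode a bit string at 1 bit per base, never repeating the previous base."""
    strand = []
    prev = ""
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError(f"not a bit: {bit!r}")
        options = ZERO_BASES if bit == "0" else ONE_BASES
        # Assumption: pick whichever candidate differs from the previous base.
        base = options[0] if options[0] != prev else options[1]
        strand.append(base)
        prev = base
    return "".join(strand)


def decode_bases(strand: str) -> str:
    """Map each base back to its bit."""
    return "".join(BASE_TO_BIT[base] for base in strand)


encoded = encode_bits("0011010")
print(encoded, decode_bases(encoded))
```

Because two bases are available for each bit value, consecutive equal bits can always be written with different bases.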
The series of bits may be separated into bit sequences such that a series of nucleotides, i.e., a polynucleotide, is representative of the bit sequence. Many individual or separate polynucleotides, such as collected as a library, collectively encode for the format of information. Each polynucleotide is then synthesized, for example, using a template independent process or other processes as described herein, and the polynucleotide is then stored. The format of information is encoded by a plurality of polynucleotides with each polynucleotide representing a bit sequence making up the series of bits representative of the entire format of information. The sequences of the plurality of polynucleotides can then be determined, such as by sequencing, and translated back into the series of bits which is then translated back into the format of information. Such a process is described in US Patent No. 9,996,778 or US Patent No. 9,928,869, each of which is hereby incorporated by reference in its entirety.
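The separation of a series of bits into bit sequences, and the reverse translation, can be illustrated with a short Python sketch. The chunk size of 10 bits, the use of UTF-8 text as the format of information, and the reliance on list order to reassemble the chunks are all assumptions made for illustration; the text does not specify how order is tracked.

```python
from typing import List


def text_to_bits(text: str) -> str:
    """Convert text to a series of bits (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Translate a series of bits back into text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def split_bits(bits: str, size: int) -> List[str]:
    """Separate a series of bits into bit sequences of at most `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


bits = text_to_bits("DNA")
library = split_bits(bits, 10)          # each entry would map to one polynucleotide
recovered = bits_to_text("".join(library))
print(library, recovered)
```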
According to the present disclosure, the subset of polynucleotides, for example, encoding for a format of information as a series of bits, is selectively removed from the addressable substrate and barcoded with one or more, such as two or more or a plurality of, barcodes using methods known to those of skill in the art. According to one aspect, the locations on the addressable array where the subset of polynucleotides is attached are known because of the synthesis process. For example, a polynucleotide synthesizer can be programmed to synthesize the subset of polynucleotides of predetermined sequences at defined locations of the addressable array. The locations need not be contiguous, although synthesizing the subset at contiguous locations on an addressable array has certain advantages in terms of synthesis efficiency and removal, since the subset is confined to a particular geographic location of the array. The geographic location may be separated from other locations by physical barriers such as channels or trenches or by chemical barriers such as hydrophobic layers as is known in the art. Since the locations are known a priori, the subset can be released from those known locations, using methods as described herein, for further processing, such as adding one or more, two or more or a plurality of barcodes.
The barcoded subset of polynucleotides is then stored within a storage vessel. One or more additional subsets of polynucleotides are subsequently removed, such as in series, from the addressable array and barcoded with one or more, such as two or more or a plurality of, barcodes and stored within the storage vessel. According to one aspect, the barcode uniquely identifies each subset or provides information about the nature of the information stored in the subset, such as meta tag information, and is used to identify, select and/or retrieve the subset of polynucleotides from among the different subsets of polynucleotides stored in the same storage vessel.
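The release, barcode and pool workflow described above can be modeled in a few lines of Python. This is only a data-structure sketch under stated assumptions: the array layout, subset names, strand sequences and barcode sequences are all hypothetical, and barcoding is reduced to prepending a string at the 5’ end rather than modeling primer hybridization.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]

# Hypothetical addressable array: location -> strands synthesized there.
array: Dict[Location, List[str]] = {
    (0, 0): ["AATTCG", "ATTTCG"],
    (0, 1): ["ACCGT"],
    (1, 0): ["GGAT", "GATT"],
}

# Locations of each subset are known a priori from the synthesis program.
subset_locations: Dict[str, List[Location]] = {
    "file_a": [(0, 0), (0, 1)],
    "file_b": [(1, 0)],
}

# Hypothetical barcode sequences, one per subset.
barcodes: Dict[str, str] = {"file_a": "ACGTAC", "file_b": "TGCATG"}


def release(substrate: Dict[Location, List[str]], locations: List[Location]) -> List[str]:
    """Remove and return the strands attached at the given known locations."""
    released: List[str] = []
    for location in locations:
        released.extend(substrate.pop(location, []))
    return released


def barcode_5prime(strands: List[str], barcode: str) -> List[str]:
    """Tag every released strand with the same barcode (simplified as a prefix)."""
    return [barcode + strand for strand in strands]


storage_vessel: List[str] = []
for name in ("file_a", "file_b"):  # subsets are removed in series
    strands = release(array, subset_locations[name])
    storage_vessel.extend(barcode_5prime(strands, barcodes[name]))

print(storage_vessel)
```

Releasing by location before tagging is what lets every strand of a subset receive the same barcode even though the strands themselves differ in sequence.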
According to one aspect, the polynucleotides are synthesized on the addressable substrate using methods known to those of skill in the art, such as template independent synthesis using a template independent polymerase, as is known in the art for encoding a format of information into polynucleotide sequences. The template independent polymerase can be error prone or not error prone. As is known, template independent synthesis can be used to produce polynucleotide chains at known locations of an addressable substrate. For example, an addressable substrate is provided having an initiator sequence bound or tethered to locations of the addressable substrate where polynucleotides are intended to be synthesized. The initiator sequence may be or also have a single-base precise sequence to be used for adding a barcode sequence. The locations of the addressable substrate may be or include an electrode as is known in the art, such as with an electrode array. Polynucleotides are produced at the electrode locations. The reagents for extending the initiator sequence with a nucleotide, such as the nucleotide, a template independent polymerase, cations, etc. as is known in the art, are contacted to the addressable substrate and an electrical stimulus can be generated to activate the template independent polymerase to add the nucleotide to the initiator sequence at desired locations of the addressable substrate. The process can be repeated to produce polynucleotides at desired known locations of the addressable substrate as is known in the art. Thereafter, a single-base precise 3’ adapter may be added to the polynucleotides by methods such as ligation or PCR. The single-base precise 3’ adapter may be used to add a barcode sequence to the polynucleotides.
As is known in the art, a given known location of an addressable substrate, such as an electrode of an electrode array can include a plurality of initiators attached thereto resulting in a plurality of oligonucleotides or polynucleotides at the given known location. While a given known location of an addressable substrate is intended to include a polynucleotide of a same given predetermined sequence assuming perfect synthesis fidelity, template independent synthesis methods can result in repeat addition of a given nucleotide resulting in a plurality of different sequences at the given known location of an addressable substrate insofar as the polynucleotide sequences may include repeat sequences.
As a result of synthesis infidelity, the polynucleotide sequence strands for a given known location can be highly heterogeneous at the single-base resolution though the strands accurately encode the intended bit sequence. The different polynucleotide sequences can be interpreted as the same predetermined sequence, as is known in the art, such as by interpreting a given repeat nucleotide as a single given nucleotide, and so information can be accurately maintained by the different polynucleotide sequences. Using this aspect, polynucleotides can be synthesized emphasizing high speed and low cost of synthesis without requiring high synthesis fidelity by allowing repeat nucleotides during synthesis.
However, the diverse or different sequence strands naake amplification and hybridization less accurate and more difficult, since a single primer sequence (intended to hybridize to and amplify the sequence, i.e. the many sequences attached to the known location) may not be able to hybridize to each of the different polynucleotide sequences representing t e same predetermined sequence at the given known location on the addressable substrate The single primer sequence may not be able to hybridize to all of the polynucleotide sequences at a given known location of an addressable substrate.
Since the given known location of the addressable substrate includes polynucleotide sequences other than the intended predetermined sequence due to the presence of repeat sequences the present disclosure provides a method for barcoding the polyn cleotides of the given known location of the addressable substrate, and other polynucleotides of a given subset. According to one aspect, a barcode is unique to each polynucleotide at a given known location of the addressable substrate even though the polynucleotides may differ in sequence. The barcode is also unique to fee other polyn cleotide sequences of a given subset located at other given known locations of the addressable substrate. As a result, each polynucleotide of the subset includes a first same or identical barcode identifying fee members of the subset. The barcode is unique to the members of fee subset and so can be used to identify the members of the subset.
The members of the subset may include an additional barcode, i.e. a second barcode which can be a same or identical barcode identifying the members of the subset. The first barcode is different from the second barcode insofar as the first barcode represents information different from the second barcode. According to one aspect, the second barcode may be different, i.e. nonidentical, among members of the subset, such as to convey specific information about different members of the subset. For example, the second barcode may have a first sequence identifying a first set of polynucleotides of the subset, a second sequence identifying a second set of polynucleotides of the subset, a third sequence identifying a third set of polynucleotides of the subset, and so on, such that the second barcode may comprise a plurality of different sequences. The first sequence, second sequence and third sequence represent different information.
Such a method of barcoding provides for selective amplification and retrieval of an entire subset of polynucleotides representing stored information such as files, particularly when the subset of polynucleotides is pooled along with other subsets of polynucleotides representing different stored information, and also when polynucleotides within a given known location have a sequence different from the predetermined sequence for that given known location due to synthesis infidelity. The use of the first barcode allows for base-precise hybridization of an amplification or selection primer to imprecisely synthesized but informationally accurate polynucleotides of a given known location.
The retrieved subset of information can then be probed for particular polynucleotides using the different sequences of the second barcode if such different sequences have been attached to the polynucleotides of the subset. For example, while a given subset includes a first barcode identifying the members of the subset as related files, the second barcode can be used to distinguish categories of files within the subset.
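The two-level use of the barcodes described above can be sketched as a simple in-memory model. The class name, field names, barcode sequences and payloads below are all hypothetical placeholders, and string comparison merely stands in for primer hybridization; the sketch is not a model of the underlying chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    first_barcode: str   # shared by every member of a subset
    payload: str         # information-bearing region
    second_barcode: str  # may differ among members of the subset


# Hypothetical pool; barcode and payload sequences are illustrative only.
pool = [
    Strand("ACGTAC", "AAGGCT", "TTAGGC"),
    Strand("ACGTAC", "AGGCTT", "CCATGA"),
    Strand("GTCAGT", "CTTGAA", "TTAGGC"),
]


def retrieve_subset(pool, first_barcode):
    """Model selective retrieval: keep strands carrying the first barcode."""
    return [s for s in pool if s.first_barcode == first_barcode]


def probe_subset(subset, second_barcode):
    """Model probing the retrieved subset with one second-barcode sequence."""
    return [s for s in subset if s.second_barcode == second_barcode]


subset = retrieve_subset(pool, "ACGTAC")
category = probe_subset(subset, "TTAGGC")
print(len(subset), [s.payload for s in category])
```

In this sketch the third strand shares a second-barcode sequence with the first strand but belongs to a different subset, so it is excluded at the first retrieval step.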
According to one aspect, the polynucleotides of a given subset that are attached to the addressable substrate include a universal single-base precise initiator sequence at the 5’ end of the polynucleotides and a universal single-base precise adapter at the 3’ end of the polynucleotides. According to one aspect, the polynucleotides of a given subset are flanked by first and second universal single-base precise sequences. According to one aspect, a subset of polynucleotides is released from the addressable substrate, with the polynucleotides having a first single-base precise sequence at one end and a second single-base precise sequence at the other end. The polynucleotides are then processed under conditions to add one or more, such as two or more or a plurality of, barcode sequences. For example, the first single-base precise sequence can be used to add a first barcode to the polynucleotides and the second single-base precise sequence can be used to add a second barcode to the polynucleotides. In this manner, the polynucleotides are flanked by barcodes. The first and second single-base precise sequences can be used to create a barcode under conditions using known methods such as PCR methods, ligase methods, RPA methods, transposon/transposase methods, recombination methods or hybridization extension methods, as are known in the art. PCR conditions as generally described herein utilize common PCR buffer conditions known to those of skill in the art such as: standard PCR buffer (1X), 1.5 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl, pH 8.3 at 25°C; and standard PCR buffer (10X), 15 mM MgCl2, 500 mM KCl, 100 mM Tris-HCl, pH 8.3 at 25°C.
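The relationship between the 10X and 1X buffer compositions listed above is a simple tenfold dilution, which can be checked with a short calculation. The function names and the 50 µL reaction volume are assumptions chosen for illustration only.

```python
# Component concentrations of the 10X stock, in mM, as listed above.
STOCK_10X_MM = {"MgCl2": 15.0, "KCl": 500.0, "Tris-HCl": 100.0}


def working_concentrations(stock_mm, stock_fold=10.0, final_fold=1.0):
    """Scale each component from the stock strength to the working strength."""
    return {
        name: conc * final_fold / stock_fold
        for name, conc in stock_mm.items()
    }


def stock_volume_ul(reaction_volume_ul, stock_fold=10.0, final_fold=1.0):
    """Volume of stock buffer needed for a reaction of the given volume."""
    return reaction_volume_ul * final_fold / stock_fold


print(working_concentrations(STOCK_10X_MM))
print(stock_volume_ul(50.0))  # 50 uL reaction volume is an assumed example
```

The computed working concentrations correspond to the 1X values of 1.5 mM MgCl2, 50 mM KCl and 10 mM Tris-HCl.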
Accordingly, the polynucleotides of a given subset that have been synthesized to encode a format of information include a first barcode at either the 5’ or 3’ end. The polynucleotides of a given subset that have been synthesized to encode the format of information can also have a second barcode at the other of the 5’ or 3’ end. The polynucleotides of a given subset that have been synthesized to encode the format of information have a first barcode at the 5’ end and a second barcode at the 3’ end. The barcodes may include meta tag information.
According to one particular aspect of the present disclosure, a subset of polynucleotides is selectively released or desorbed or decoupled from the surface of an addressable electrode array using electrochemical desorption (ECD). Such electrode arrays are known in the art and provide high spatial control, such as (1) by using mild electrically reducing conditions, (2) by repeatedly oxidizing-reducing the underlying electrode surface or (3) by electrolysis. As an example, the initiators can be tethered, linked or otherwise bound, attached or connected to the electrode array by thiol linkages. The thiol linkages can be broken under mild reducing conditions by applying a negative potential to selected electrodes resulting in electrochemical desorption or release of the polynucleotide from the substrate. Useful methods of electrochemical desorption are disclosed in Henry et al., Electrochem. Commun. 11, 664-667 (2009) and Refera Soreta et al., Electrochim. Acta. 55, 4309-4313 (2010), each of which is incorporated by reference in its entirety.
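The spatial selectivity of such release can be pictured with a toy data structure in which each electrode is a key and desorption removes only the strands tethered at the selected keys. The class and method names are hypothetical, and the sketch does not model potentials, kinetics or surface chemistry.

```python
class ElectrodeArray:
    """Toy model of an addressable electrode array (names are hypothetical)."""

    def __init__(self):
        self.sites = {}  # (row, col) -> list of strands tethered at that site

    def attach(self, location, strand):
        self.sites.setdefault(location, []).append(strand)

    def desorb(self, selected_locations):
        """Release every strand tethered at the selected electrodes.

        Stands in for applying a negative potential to those electrodes;
        strands at all other sites stay attached.
        """
        released = []
        for location in selected_locations:
            released.extend(self.sites.pop(location, []))
        return released


array = ElectrodeArray()
array.attach((0, 0), "ACGT")
array.attach((0, 0), "AACGT")
array.attach((0, 1), "TGCA")
array.attach((1, 0), "GATC")

released = array.desorb([(0, 0), (1, 0)])
print(released)
print(sorted(array.sites))
```

Only the strands at the two selected electrodes are returned; the strand at location (0, 1) remains on the array.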
The selectively released or desorbed or decoupled polynucleotides are transferred, such as by using fluidics channels, into a vessel or chamber for barcoding under conditions to add either a 5’ barcode sequence or 3’ barcode sequence or both to each polynucleotide, such as under PCR conditions or by ligation or other known methods. For example, 5’ or 3’ sequences unique to the set of polynucleotides can be added under PCR conditions using the 5’ or 3’ single-base precise sequences. The barcode sequences can be used as hash keys to represent meta tags, i.e. where each barcode represents known information such as a category of subject matter (“financial”, “vacation photos”, “tax information”, etc.) or a date or other information describing the format of information encoded by the subset of polynucleotides. Tagging can be performed iteratively in a plurality of chambers or vessels connected by fluidic channels to attach a plurality of meta tags to data encoded in the polynucleotide strands. A plurality of chambers may be used to barcode strands in parallel or in series as desired. A plurality of barcoded subsets of polynucleotides can be collectively stored as a single pool. Each subset can be retrieved using the single-base precise sequence or barcode unique to each subset, for example, using PCR methods or hybridization methods known to those of skill in the art.
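The use of barcodes as hash keys for meta tags can be sketched with an ordinary lookup table. The barcode sequences, the fixed barcode length of six nucleotides, the date-like tag and the function names are assumptions made for illustration; string concatenation stands in for the PCR or ligation step performed in each chamber.

```python
# Hypothetical barcode-to-meta-tag table; the sequences are placeholders.
META_TAGS = {
    "ACGTAC": "financial",
    "GTCAGT": "vacation photos",
    "TTAGGC": "tax information",
    "CCATGA": "2020-03",
}


def tag(strand, barcode, end):
    """Attach one barcode at the 5' or 3' end of a strand (string model)."""
    if barcode not in META_TAGS:
        raise KeyError(f"unknown barcode {barcode!r}")
    if end == "5'":
        return barcode + strand
    if end == "3'":
        return strand + barcode
    raise ValueError("end must be \"5'\" or \"3'\"")


def read_tags(strand, barcode_length=6):
    """Look up the meta tags keyed by the two flanking barcodes."""
    return META_TAGS[strand[:barcode_length]], META_TAGS[strand[-barcode_length:]]


payload = "AAGGCTAGGCTT"
# Two tagging passes, as if run in two chambers connected in series.
tagged = tag(tag(payload, "ACGTAC", "5'"), "CCATGA", "3'")
print(tagged, read_tags(tagged))
```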
The practice of certain embodiments or features of certain embodiments may employ, unless otherwise indicated, conventional techniques of nucleic acid synthesis and so forth which are within ordinary skill in the art. Such techniques are explained fully in the literature. Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g., Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.
It is to be understood that method steps described herein need not be performed in the order listed unless expressly stated. Method steps may be performed in any order. Further, method steps may be performed simultaneously or together and need not be performed separately or individually.
The practice of the methods disclosed herein may employ conventional biology methods, software, computers and computer systems. Accordingly, the methods described herein may be computer implemented methods in whole or in part. Computer software utilized in the methods of the present disclosure includes a computer readable medium having computer-executable instructions for performing logic steps of the method of the invention. Suitable computer readable media include, but are not limited to, a floppy disk, CD-ROM/DVD/DVD-ROM, hard disk drive, flash memory, ROM/RAM, magnetic tapes, and others that may be developed. The computer executable instructions may be written in a suitable computer language or combination of several computer languages. The methods described herein may also make use of various commercially available computers and computer program products and software for a variety of purposes including translating text or images into binary code, designing nucleic acid sequences representative of the binary code, analyzing sequencing data from the nucleic acid sequences, translating the nucleic acid sequence data into binary code, and translating the binary code into text or images.
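The software steps listed above (text to binary code, binary code to a nucleic acid sequence, and back) can be sketched as follows. The bit-to-nucleotide mapping used here is an assumption introduced only for illustration and is not asserted to be the encoding of the present disclosure: each bit advances one or two positions around the cycle A, C, G, T from the previous base, so that no two adjacent designed bases are identical and repeat nucleotides in a read can be collapsed without loss. The virtual starting base and all function names are likewise hypothetical.

```python
from itertools import groupby

BASES = "ACGT"
START = "A"  # assumed virtual base preceding the first synthesized base


def text_to_bits(text: str) -> str:
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def bits_to_dna(bits: str) -> str:
    """Each bit steps 1 (bit 0) or 2 (bit 1) places around A->C->G->T."""
    prev, out = START, []
    for bit in bits:
        prev = BASES[(BASES.index(prev) + 1 + int(bit)) % 4]
        out.append(prev)
    return "".join(out)


def dna_to_bits(read: str) -> str:
    """Collapse repeat nucleotides, then recover each bit from its step."""
    prev, bits = START, []
    for base, _run in groupby(read):
        step = (BASES.index(base) - BASES.index(prev)) % 4
        if step not in (1, 2):
            raise ValueError(f"unexpected transition {prev}->{base}")
        bits.append(str(step - 1))
        prev = base
    return "".join(bits)


dna = bits_to_dna(text_to_bits("Hi"))
noisy = "".join(base * 2 for base in dna)  # every base doubled, as a test read
print(dna)
print(bits_to_text(dna_to_bits(noisy)))
```

The doubled read decodes to the same text as the designed sequence because only the order of distinct bases carries information in this assumed mapping.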
Certain exemplary embodiments are directed to the use of computer software and hardware to automate polynucleotide synthesis upon an addressable substrate. Such software and hardware may be used in conjunction with individuals performing synthesis by hand or in a semi-automated fashion or combined with an automated system. In at least some embodiments, exemplary programs are written in a suitable programming language. The program may be compiled into an executable that may then be run from a command prompt in the WINDOWS XP operating system or other operating systems. Unless specifically set forth in the claims, the invention is not limited to implementation using a specific programming language, operating system environment or hardware platform.
It is to be understood that the embodiments of the present invention which have been described are merely illustrative of some of the applications of the principles of the present invention. Numerous modifications may be made by those skilled in the art based upon the teachings presented herein without departing from the true spirit and scope of the invention. The contents of all references, patents and published patent applications cited throughout this application are hereby incorporated by reference in their entirety for all purposes.
The following examples are set forth as being representative of the present invention. These examples are not to be construed as limiting the scope of the invention as these and other equivalent embodiments will be apparent in view of the present disclosure, figures, tables, and accompanying claims.
EXAMPLE I
General Methods of Making Oligonucleotides or Polynucleotides on a Substrate
According to the present disclosure, methods are provided for making oligonucleotides or polynucleotides attached to a substrate. The substrate may be an addressable substrate, such as an addressable array, such as an addressable electrode array. Such methods are generally known to those of skill in the art and as described herein. As used herein, the term “attach” refers to both covalent interactions and noncovalent interactions. A covalent interaction is a chemical linkage between two atoms or radicals formed by the sharing of a pair of electrons (i.e., a single bond), two pairs of electrons (i.e., a double bond) or three pairs of electrons (i.e., a triple bond). Covalent interactions are also known in the art as electron pair interactions or electron pair bonds. Noncovalent interactions include, but are not limited to, van der Waals interactions, hydrogen bonds, weak chemical bonds (i.e., via short-range noncovalent forces), hydrophobic interactions, ionic bonds and the like. A review of noncovalent interactions can be found in Alberts et al., in Molecular Biology of the Cell, 3d edition, Garland Publishing, 1994.
The present disclosure provides that synthesis of polynucleotides can be performed using a support. Methods of synthesizing oligonucleotide sequences are well-known in the art (See, e.g., Seliger (1993) Protocols for Oligonucleotides and Analogs: Synthesis and Properties, vol. 20, pp. 391-435, Efimov (2007) Nucleosides, Nucleotides & Nucleic Acids 26:8, McMinn et al. (1997) J. Org. Chem. 62:7074, Froehler et al. (1986) Nucleic Acids Res. 14:5399, Garegg (1986) Tet. Lett. 27:4051, Efimov (1983) Nucleic Acids Res. 11:8369, Reese (1978) Tetrahedron 34:3143). As used herein, the term “polynucleotide” is intended to include, but is not limited to, a single-stranded or double stranded DNA or RNA molecule, typically prepared by synthetic means. Nucleotides of the present invention will typically be the naturally-occurring nucleotides such as nucleotides derived from adenosine, guanosine, uridine, cytidine and thymidine. However, synthetic or non-natural nucleotides may be used.
The terms “nucleic acid,” “nucleic acid molecule,” “nucleic acid sequence,” “nucleic acid fragment,” “oligonucleotide” and “polynucleotide” are used interchangeably and are intended to include, but not limited to, a polymeric form of nucleotides that may have various lengths, either deoxyribonucleotides or ribonucleotides or analogs thereof. Oligonucleotides or polynucleotides useful in the methods described herein may comprise natural nucleic acid sequences and variants thereof, artificial nucleic acid sequences, or a combination of such sequences. Oligonucleotides or polynucleotides may be single stranded or double stranded.
A polynucleotide is typically composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U) for thymine (T) when the polynucleotide is RNA). Thus, the term “polynucleotide sequence” is the alphabetical representation of a polynucleotide molecule; alternatively, the term may be applied to the polynucleotide molecule itself. This alphabetical representation can be input into databases in a computer having a central processing unit and used for synthesis of the oligonucleotide or polynucleotide. Polynucleotides may optionally include one or more non-standard nucleotide(s), nucleotide analog(s) and/or modified nucleotides.
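As a brief illustration of handling the alphabetical representation in software, the following sketch checks that a string uses only the four standard DNA letters and rewrites it in RNA letters by substituting U for T. The function names are hypothetical, and non-standard nucleotides or analogs, which the disclosure permits, are simply rejected by this sketch.

```python
DNA_ALPHABET = set("ACGT")


def is_dna_sequence(sequence: str) -> bool:
    """True if the string uses only the four standard DNA base letters."""
    return bool(sequence) and set(sequence.upper()) <= DNA_ALPHABET


def dna_to_rna_letters(sequence: str) -> str:
    """Rewrite a DNA-letter string in RNA letters (U in place of T)."""
    if not is_dna_sequence(sequence):
        raise ValueError("sequence contains non-standard letters")
    return sequence.upper().replace("T", "U")


print(is_dna_sequence("GATTACA"))     # True
print(dna_to_rna_letters("GATTACA"))  # GAUUACA
```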
In certain aspects, a single support or multiple supports (tens, hundreds, thousands or more) may be utilized (e.g., synthesized, amplified, hybridized or the like) in parallel. Suitable supports include, but are not limited to, slides (e.g., microscope slides), beads, chips, particles, strands, gels, sheets, tubing (e.g., microfuge tubes, test tubes, cuvettes), spheres, containers, capillaries, microfibers, pads, electrodes, slices, films, plates (e.g., multi-well plates), microfluidic supports (e.g., microarray chips, flow channel plates, biochips and the like) and the like. In various embodiments, the solid supports may be biological, nonbiological, organic, inorganic or combinations thereof. When using supports that are substantially planar, the support may be physically separated into regions by physical barriers, for example, with trenches, grooves, wells, or chemically separated into regions by chemical barriers (e.g., lacking a lipid-binding coating, hydrophobic coatings and the like). The supports include a plurality of locations where oligonucleotides or polynucleotides are to be synthesized. In exemplary embodiments, supports can be made of a variety of materials including, but not limited to, glass, quartz, ceramic, plastic, polystyrene, methylstyrene, acrylic polymers, titanium, gold, platinum, latex, sepharose, cellulose, nylon and the like and any combination thereof. Such supports and their uses are well known in the art.
In certain exemplary embodiments, a support is an array or a microarray. As used herein, the term “microarray” refers in one embodiment to a type of array that comprises a solid phase support having a substantially planar surface on which there is an array of spatially defined non-overlapping regions or sites that each contain an immobilized polynucleotide or a plurality of immobilized polynucleotides. The regions or sites may each contain an electrode. “Substantially planar” means that features or objects of interest, such as polynucleotide sites, on a surface may occupy a volume that extends above or below a surface and whose dimensions are small relative to the dimensions of the surface. For example, beads disposed on the face of a fiber optic bundle create a substantially planar surface of probe sites, or oligonucleotides disposed or synthesized on a porous planar substrate create a substantially planar surface. Spatially defined sites may additionally be “addressable” in that their location and the identity of the immobilized polynucleotide at that location are known or determinable.
Oligonucleotide or polynucleotide sequences may be prepared by any suitable method, e.g., the phosphoramidite method described by Beaucage and Carruthers ((1981) Tetrahedron Lett. 22:1859) or the triester method according to Matteucci et al. ((1981) J. Am. Chem. Soc. 103:3185), both incorporated herein by reference in their entirety for all purposes, or by other chemical methods using either a commercial automated oligonucleotide synthesizer or high-throughput, high-density array methods described herein and known in the art (see U.S. Patent Nos. 5,602,244, 5,574,146, 5,554,744, 5,428,148, 5,264,566, 5,141,813, 5,959,463, 4,861,571 and 4,659,774, each incorporated herein by reference in its entirety for all purposes).
In an exemplary embodiment, oligonucleotides or polynucleotides may be synthesized on a solid support using a maskless array synthesizer (MAS). Maskless array synthesizers are described, for example, in PCT application No. WO 99/42813 and in corresponding U.S. Patent No. 6,375,903. Other examples are known of maskless instruments which can fabricate a custom polynucleotide microarray in which each of the features in the array has a single stranded DNA molecule of desired sequence. An exemplary type of instrument is the type shown in Figure 5 of U.S. Patent No. 6,375,903, based on the use of reflective optics. Other methods for synthesizing oligonucleotide or polynucleotide sequences include, for example, light-directed methods utilizing masks, flow channel methods, spotting methods, pin-based methods, and methods utilizing multiple supports as is known in the art.
Flow channel methods involve, for example, microfluidic systems to control synthesis of polynucleotides on a solid support. However, suitable reagents may be flowed over the entire surface of a support and methods employed for selective activation of known locations for synthesizing polynucleotides. For example, diverse polymer sequences may be synthesized at selected regions of a solid support by forming flow channels on a surface of the support through which appropriate reagents flow or in which appropriate reagents are placed. One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the support. For example, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the support to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.
Spotting methods for preparation of oligonucleotides on a solid support involve delivering reactants in relatively small quantities by directly depositing them in selected regions. In some steps, the entire support surface can be sprayed or otherwise coated with a solution, if it is more efficient to do so. Precisely measured aliquots of monomer solutions may be deposited dropwise by a dispenser that moves from region to region. Typical dispensers include a micropipette to deliver the monomer solution to the support and a robotic system to control the position of the micropipette with respect to the support, or an ink-jet printer. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the reaction regions simultaneously.
Pin-based methods for synthesis of oligonucleotides on a solid support are described, for example, in U.S. Patent No. 5,288,514. Pin-based methods utilize a support having a plurality of pins or other extensions. The pins are each inserted simultaneously into individual reagent containers in a tray. An array of 96 pins is commonly utilized with a 96-container tray, such as a 96-well microtiter dish. Each tray is filled with a particular reagent for coupling in a particular chemical reaction on an individual pin. Accordingly, the trays will often contain different reagents. Since the chemical reactions have been optimized such that each of the reactions can be performed under a relatively similar set of reaction conditions, it becomes possible to conduct multiple chemical coupling steps simultaneously.
In yet another embodiment, a plurality of oligonucleotides or polynucleotides may be synthesized on multiple supports. One example is a bead based synthesis method which is described, for example, in U.S. Patent Nos. 5,770,358, 5,639,603, and 5,541,061.
The oligonucleotides or polynucleotides may be removed, released or uncoupled from the solid support, for example, by exposure to conditions such as acid, base, oxidation, reduction, heat, light, pH, electric current, electric potential, metal ion catalysis, displacement or elimination chemistry, or by enzymatic cleavage as is known in the art.
Cleavable Linkages
Aspects of the present disclosure utilize a cleavable linkage attaching a polynucleotide to a support to release, decouple, or detach the polynucleotide from the support. Cleavable linkages are known to those of skill in the art and include those activatable, i.e. cleavable, by acid, base, oxidation, reduction, heat, light, pH, electric current, electric potential, metal ion catalysis, displacement or elimination chemistry, or enzyme. Methods of synthesizing and cleaving nucleic acids containing chemically cleavable, thermally cleavable, and photo-labile groups are described, for example, in U.S. Patent No. 5,700,642.
In one embodiment, oligonucleotides may be attached to a solid support through a cleavable linkage moiety. For example, the solid support may be functionalized to provide cleavable linkers for covalent attachment to the oligonucleotides. The linker moiety may be one, two, three, four, five, six or more atoms in length. Alternatively, the cleavable moiety may be within an oligonucleotide and may be introduced during in situ synthesis. A broad variety of cleavable moieties are available in the art of solid phase and microarray oligonucleotide synthesis (see e.g., Pon, R., Methods Mol. Biol. 20:465-496 (1993); Verma et al., Ann. Rev. Biochem. 67:99-134 (1998); U.S. Patent Nos. 5,739,386, 5,700,642 and 5,830,655; and U.S. Patent Publication Nos. 2003/0186226 and 2004/0106728).
In one embodiment, cleavable sites contained within the modified oligonucleotide may include chemically cleavable groups such as dialkoxysilane, 3'-(S)-phosphorothioate, 5'-(S)-phosphorothioate, 3'-(N)-phosphoramidate, 5'-(N)-phosphoramidate, and ribose. Synthesis and cleavage conditions of chemically cleavable oligonucleotides are described in U.S. Patent Nos. 5,700,642 and 5,830,655. In another embodiment, a non-cleavable hydroxyl linker may be converted into a cleavable linker by coupling a special phosphoramidite to the hydroxyl group prior to the phosphoramidite or H-phosphonate oligonucleotide synthesis as described in U.S. Patent Application Publication No. 2003/0186226. In another embodiment, the cleavable linking moiety may be a TOPS (two oligonucleotides per synthesis) linker (see e.g., PCT publication WO 93/20092). For example, the TOPS phosphoramidite may be used to convert a non-cleavable hydroxyl group on the solid support to a cleavable linker. In another embodiment, a cleavable linking moiety may be an amino linker. The resulting oligonucleotides bound to the linker via a phosphoramidite linkage may be cleaved with 80% acetic acid yielding a 3'-phosphorylated oligonucleotide. Base-cleavable sites include β-cyano ether, 5'-deoxy-5'-aminocarbamate, 3'-deoxy-3'-aminocarbamate, urea, 2'-cyano-3',5'-phosphodiester, 2'-amino-3',5'-phosphodiester, ester and ribose. Thio-containing internucleotide bonds such as 3'-(S)-phosphorothioate and 5'-(S)-phosphorothioate are cleaved by treatment with silver nitrate or mercuric chloride. Acid cleavable sites include 3'-(N)-phosphoramidate, 5'-(N)-phosphoramidate, dithioacetal, acetal and phosphonic bisamide.
In another embodiment, the cleavable linking moiety may be a photocleavable linker, such as an ortho-nitrobenzyl photocleavable linker. Photocleavable moieties include those capable of being cleaved by light of a certain wavelength. Such cleavable moieties are referred to as photolabile linkages and are disclosed in Olejnik et al., Photocleavable biotin derivatives: a versatile approach for the isolation of biomolecules, Proc. Natl. Acad. Sci. U.S.A., vol. 92, p. 7590-7594 (1995). Photo-labile linkages include nitrobenzylether and thymidine dimer. Such photocleavable linkers can be cleaved by UV illumination between wavelengths of about 275 to about 375 nm for a period of a few seconds to 30 minutes, such as about one minute. Exemplary wavelengths include between about 300 nm to about 350 nm. Synthesis and cleavage conditions of photolabile oligonucleotides on solid supports are described, for example, in Venkatesan et al., J. of Org. Chem. 61:525-529 (1996), Kahl et al., J. of Org. Chem. 64:507-510 (1999), Kahl et al., J. of Org. Chem. 63:4870-4871 (1998), Greenberg et al., J. of Org. Chem. 59:746-753 (1994), Holmes et al., J. of Org. Chem. 62:2370-2380 (1997), and U.S. Pat. No. 5,739,386.
Thermally cleavable groups include allylic sulfoxide and cyclohexene.
In another embodiment, oligonucleotides may be removed from a solid support by an enzyme such as a nuclease. For example, oligonucleotides may be removed from a solid support upon exposure to one or more endonucleases, including, for example, restriction endonucleases such as class IIs restriction enzymes. A restriction endonuclease recognition sequence may be incorporated into the immobilized oligonucleotides and the oligonucleotides may be contacted with one or more restriction endonucleases to remove the oligonucleotides from the support. A wide variety of restriction endonucleases having specific binding and/or cleavage sites are commercially available, for example, from New England Biolabs (Ipswich, MA). Other suitable nucleases include zinc fingers, TALENs and CRISPR nucleases as are known in the art.
A suitable cleavable moiety may be selected to be compatible with the nature of the protecting group of the nucleoside bases if a protecting group is utilized, the choice of solid support, and/or the mode of reagent delivery, among others. The cleavable moiety may be removed under conditions which do not degrade the oligonucleotides.
Suitable cleavable or releasable moieties include those responsive to changes in pH, such as those which result from application of an electric current or potential to create a localized basic or acidic pH. Such moieties may include one or more bonds that break in response to such changes in pH, such as a thiol bond.
According to one exemplary aspect of the present disclosure, the encoded oligonucleotide or polynucleotide sequences are then synthesized using an error prone polymerase, such as a template independent error prone polymerase, and common or natural nucleic acids which may be unmodified. According to this aspect, initiator sequences or primers are attached to a substrate, such as a silicon dioxide substrate, at various known locations, which may include an electrode, to produce an addressable substrate. Reagents including at least a selected nucleotide, a template independent polymerase and other reagents, such as cations, required for enzymatic activity of the polymerase are applied at one or more locations of the substrate or the entire substrate where the initiator sequences are located and under conditions where the polymerase adds one or more than one or a plurality of the nucleotide to the initiator sequence to extend the initiator sequence. According to one aspect, the nucleotides (“dNTPs”) are applied or flow in periodic applications or waves of known temporal and spatial manner or width or conditions considering the polymerase polymerization rate (or switching rate). In this exemplary manner, blocking groups or reversible terminators may not be used with the dNTPs because the reaction conditions are selected to be sufficient to limit or reduce the probability of enzymatic addition of the dNTP to one dNTP, i.e. one dNTP is added using the selected reaction conditions taking into consideration the reaction kinetics. It is to be understood, however, that nucleotides with blocking groups or reversible terminators can be used in certain embodiments. Nucleotides with blocking groups or reversible terminators are known to those of skill in the art. According to an additional embodiment, when reaction conditions permit, more than one dNTP may be added to form a homopolymer run when common or natural nucleotides are used with a polymerase, such as a template independent error prone polymerase. However, during the data reconstruction step of the methods described herein, each homopolymer run (as determined by sequencing) is interpreted as representing a single dNTP. In this manner, the recording and reading methods described herein allow homopolymer runs and the synthesis methods need not add only a single dNTP, as could be the case when using template independent polymerases. Polymerase activity may be modified using photo-chemical or electrochemical modulation as a reaction condition, which may allow for addition of dNTP beyond a single dNTP.
A wash is then applied to the one or more locations to remove the reagents. The steps of applying the reagents and the wash are repeated until desired nucleic acids are created.
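The repeated reagent and wash cycle, together with the reconstruction rule that each homopolymer run represents a single dNTP, can be simulated in a few lines. The initiator sequence, the nucleotide schedule, the maximum run length and the uniform random choice of run length are all assumptions made for illustration; they are not measured properties of any polymerase.

```python
import random
from itertools import groupby


def run_cycles(initiator, schedule, max_run, rng):
    """Extend an initiator through repeated reagent and wash cycles."""
    strand = initiator
    for nucleotide in schedule:
        added = rng.randint(1, max_run)  # the polymerase may add a run
        strand += nucleotide * added     # reagent step: extend the 3' end
        # wash step: reagents are removed, so extension stops here
    return strand


def reconstruct(read, initiator):
    """Drop the initiator, then read each homopolymer run as one dNTP."""
    body = read[len(initiator):]
    return "".join(base for base, _run in groupby(body))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
initiator = "GGCAT"      # hypothetical single-base precise initiator
schedule = "ACGTCAGT"    # hypothetical schedule with no adjacent repeats
for _ in range(3):
    strand = run_cycles(initiator, schedule, max_run=4, rng=rng)
    print(strand, "->", reconstruct(strand, initiator))
```

Each simulated strand differs in length and single-base sequence, yet every one reconstructs to the scheduled sequence, provided the schedule never applies the same nucleotide in two consecutive cycles.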
According to one aspect, the reagents may be added to one or more than one or a plurality of locations on the substrate in series or in parallel or the reagents may contact the entire surface of the support, such as by flowing the reagents across the surface of the support. According to one aspect, the reaction conditions are determined, for example, based on reaction kinetics or the activity of the polymerase so as to determine or limit the ability of the polymerase to attach more than one nucleotide to the end of the initiator sequence or the growing oligonucleotide. According to certain embodiments, a template dependent error prone polymerase can be used.
According to certain embodiments, a template dependent polymerase may be used which may become error prone. According to certain embodiments, a template independent RNA polymerase can be used.
Polymerases
According to an alternate embodiment of the present invention, polymerases are used to build nucleic acid molecules representing information, which is referred to herein as being recorded in the nucleic acid sequence, or the nucleic acid is referred to herein as being storage media. Polymerases are enzymes that produce a nucleic acid sequence, for example, using DNA or RNA as a template, or such enzymes may be template independent. Polymerases that produce RNA polymers are known as RNA polymerases while polymerases that produce DNA polymers are known as DNA polymerases. Polymerases that incorporate more than one type of nucleotide are known in the art and are referred to herein as “error-prone polymerases”. Template independent polymerases may be error prone polymerases. Using an error-prone polymerase allows the incorporation of specific bases at precise locations of the DNA molecule. Error-prone polymerases will either accept a non-standard base, such as a reversible chain terminating base, or will incorporate a different nucleotide, such as a natural or unmodified nucleotide that is selectively given to it as it tries to copy a template.
Template-independent polymerases such as terminal deoxynucleotidyl transferase (TdT), also known as DNA nucleotidylexotransferase (DNTT) or terminal transferase, create nucleic acid strands by catalyzing the addition of nucleotides to the 3' terminus of a DNA molecule without a template. The preferred substrate of TdT is a 3'-overhang, but it can also add nucleotides to blunt or recessed 3' ends. Cobalt is a cofactor; however, the enzyme catalyzes the reaction upon Mg and Mn administration in vitro. Nucleic acid initiators may be 4 or 5 nucleotides or longer and may be single stranded or double stranded. Double stranded initiators may have a 3' overhang or they may be blunt ended or they may have a 3' recessed end.
TdT, like all DNA polymerases, also requires divalent metal ions for catalysis. However, TdT is unique in its ability to use a variety of divalent cations such as Co2+, Mn2+, Zn2+ and Mg2+. In general, the extension rate of the primer p(dA)n (where n is the chain length from 4 through 50) with dATP in the presence of divalent metal ions is ranked in the following order: Mg2+ > Zn2+ > Co2+ > Mn2+. In addition, each metal ion has different effects on the kinetics of nucleotide incorporation. For example, Mg2+ facilitates the preferential utilization of dGTP and dATP whereas Co2+ increases the catalytic polymerization efficiency of the pyrimidines, dCTP and dTTP. Zn2+ behaves as a unique positive effector for TdT since reaction rates with Mg2+ are stimulated by the addition of micromolar quantities of Zn2+. This enhancement may reflect the ability of Zn2+ to induce conformational changes in TdT that yield higher catalytic efficiencies. Polymerization rates are lower in the presence of Mn2+ compared to Mg2+, suggesting that Mn2+ does not support the reaction as efficiently as Mg2+. Further description of TdT is provided in Biochim Biophys Acta, May 2010; 1804(5): 1151-1166, hereby incorporated by reference in its entirety. In addition, one may replace Mg2+, Zn2+, Co2+ or Mn2+ in the nucleotide pulse with other cations designed to modulate nucleotide attachment. For example, if the nucleotide pulse replaces Mg++ with other cation(s), such as Na+, K+, Rb+, Be++, Ca++ or Sr++, then the nucleotide can bind but not incorporate, thereby regulating whether the nucleotide will incorporate or not. Then a pulse of (optional) pre-wash without nucleotide or Mg++ can be provided, or then Mg++ buffer without nucleotide can be provided.
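The cation-gated pulse sequence described above (a nucleotide pulse with a non-catalytic cation, an optional pre-wash, then Mg++ buffer without nucleotide) can be pictured with the following toy Python model. It is a sketch under stated assumptions rather than a kinetic model: the function run_pulses and the rule that a bound nucleotide stays bound through the wash are hypothetical simplifications.

```python
# Cations the text names as supporting TdT catalysis.
CATALYTIC_CATIONS = {"Mg++", "Zn++", "Co++", "Mn++"}

def run_pulses(strand, pulses):
    """Apply a series of (nucleotide, cation) pulses to a strand.

    Use None for an absent nucleotide or cation. Toy assumption: a bound
    nucleotide stays bound through washes and is incorporated once a
    catalytic cation arrives.
    """
    bound = None
    for nucleotide, cation in pulses:
        if nucleotide is not None:
            bound = nucleotide              # binding does not need Mg++
        if bound is not None and cation in CATALYTIC_CATIONS:
            strand += bound                 # incorporation does
            bound = None
    return strand

pulses = [
    ("T", "Na+"),    # nucleotide pulse with a non-catalytic cation: binds only
    (None, None),    # optional pre-wash without nucleotide or Mg++
    (None, "Mg++"),  # Mg++ buffer without nucleotide: triggers incorporation
]
print(run_pulses("ACG", pulses))  # ACGT
```

Separating binding from incorporation in this way means that free nucleotide is no longer present when catalysis is switched on, which is one way to limit addition to the bound nucleotide.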
By limiting nucleotides available to the polymerase, the incorporation of specific nucleic acids into the polymer can be regulated. Thus, these polymerases are capable of incorporating nucleotides independent of the template sequence and are therefore beneficial for creating nucleic acid sequences de novo. The combination of an error-prone polymerase and a primer sequence serves as a writing mechanism for imparting information into a nucleic acid sequence.
By limiting nucleotides available to a template independent polymerase, the addition of a nucleotide to an initiator sequence or an existing nucleotide or oligonucleotide can be regulated to produce an oligonucleotide by extension. Thus, these polymerases are capable of incorporating nucleotides without a template sequence and are therefore beneficial for creating nucleic acid sequences de novo.
The eta-polymerase (Matsuda et al. (2000) Nature 404(6781): 1011-1013) is an example of a polymerase having a high mutation rate (~10%) and high tolerance for 3' mismatch in the presence of all 4 dNTPs, and probably even higher if limited to one or two dNTPs. Hence, the eta-polymerase is a de novo recorder of nucleic acid information similar to terminal deoxynucleotidyl transferase (TdT) but with the advantage that the product produced by this polymerase is continuously double-stranded. Double stranded DNA has less sticky secondary structure and has a more predictable secondary structure than single stranded DNA. Furthermore, double stranded DNA serves as a good support for polymerases and/or DNA-binding-protein tethers.
According to certain aspects, a template dependent or template semi-dependent error prone polymerase can be used. According to certain embodiments, a template dependent polymerase may be used which may become error prone. According to certain embodiments, a template independent RNA polymerase can be used. Where a template dependent or template semi-dependent polymerase is used, any combination of templates with universal bases can be used which encourage acceptance of many nucleotide types. In addition, error tolerant cations such as Mn2+ can be used. Further, the present disclosure contemplates the use of error-tolerant polymerase mutants. See Berger et al., Universal Bases for Hybridization, Replication and Chain Termination, Nucleic Acids Research 2000, August 1, 28(15), pp. 2911-2914, hereby incorporated by reference.
In addition, useful methods of making nucleic acid sequences are disclosed in "Large-scale de novo DNA synthesis: technologies and applications," by Sriram Kosuri and George M. Church, Nature Methods, May 2014, Vol. 11, No. 5, pp. 499-507, hereby incorporated by reference in its entirety.
According to certain aspects, the commercially available CustomArray system from CustomArray, Inc. is an exemplary system that can be used to make the nucleic acid sequences encoding the information to be stored by affecting or altering or producing pH locally on a substrate. It is to be understood that other methods may be used to affect or alter or produce pH at particular locations on a substrate. The CustomArray system uses a pH gradient and synthesizes a desired oligonucleotide microarray using a semiconductor-based electrochemical-synthesis process. Each oligonucleotide or polynucleotide is synthesized via a platinum electrode that is independently controlled by the synthesizer's computer. According to methods described herein, a pH gradient is created which activates a pH sensitive polymerase at specific, desired locations on the substrate to add a nucleotide present in an aqueous medium at the specific, desired location. According to this aspect, pH is modulated to initiate the polymerase to add a single nucleotide; however, more than one nucleotide may be added to create a homopolymer.
According to aspects described herein, a system, such as the CustomArray system or other systems described herein, can be used to affect or alter or produce pH locally on a substrate where a pH dependent polymerase, a nucleotide and other suitable reagents in aqueous media are present to add the nucleotide to an initiator sequence or existing nucleotide or oligonucleotide in a method of forming an oligonucleotide. Exemplary methods described herein use aqueous solvents and pH to modulate activity of a polymerase, such as a template independent polymerase such as TdT, to add a nucleotide to an existing initiator sequence, an existing nucleotide or an existing oligonucleotide at a desired location on the substrate in a method of forming an oligonucleotide.
Supports described herein may have one or more electrodes positioned at or near or adjacent to a reaction site such that oxidation or reduction may take place within a reaction zone including the reaction site.
The present disclosure provides for the use of an aqueous electrolyte media such as is commonly used with electrochemical cells. The aqueous electrolyte media may further include a weakly acidic moiety participating in an oxidation or reduction reaction at an electrode and releasing one or more protons or adsorbing one or more hydroxide ions upon oxidation, thereby altering pH. The aqueous electrolyte media may further include one or more or a plurality of acid generating reagents. An exemplary acid-generating reagent is hydroquinone, catechol, resorcinol, Alkannin, hexahydroxynaphthoquinone, Juglone, Lapachol, Lawsone, Menatetrenone, spinochrome D, Phylloquinone, Plumbagin, spinochrome B, Menadione, 1,4-Naphthoquinone, 1,2-Naphthoquinone, 1,6-Naphthoquinone, anthraquinones, isoindole-4,7-diones, other natural and synthetic derivatives of quinone, other phenol derivatives, pyrrole and related derivatives and polymers thereof, thiophenes and related derivatives and polymers thereof, aniline and related derivatives and polymers thereof, acetylene derivatives and polymers thereof, bipyridinium and derivatives thereof and related compounds, aldehydes and alcohols, bromine oxides, cyanides, carbonates, hypochlorous acids, hypoiodous acids, thiols, organic halides, or other weakly acidic organic and inorganic compounds.
The aqueous electrolyte media may further include a weakly basic moiety participating in an oxidation or reduction reaction at an electrode and releasing one or more hydroxide ions or absorbing one or more protons upon reduction, thereby altering pH. An exemplary base generating reagent is 1,4-benzoquinone, 1,2-benzoquinone, 1,3-benzoquinone, anthraquinone, Duroquinone, Tetrahydroxy-1,4-benzoquinone, Alkannin, hexahydroxynaphthoquinone, Juglone, Lapachol, Lawsone, Menatetrenone, spinochrome D, Phylloquinone, Plumbagin, spinochrome B, Menadione, 1,4-Naphthoquinone, 1,2-Naphthoquinone, 1,6-Naphthoquinone, anthraquinones, isoindole-4,7-diones, other natural and synthetic quinone derivatives, other phenol derivatives, pyrrole and related derivatives and polymers thereof, thiophenes and related derivatives and polymers thereof, aniline and related derivatives and polymers thereof, acetylene derivatives and polymers thereof, bipyridinium and derivatives thereof and related compounds, aldehydes, ketones, and alcohols, bromine oxides, cyanides, carbonates, hypochlorous acids, hypoiodous acids, thiols, organic halides, or other weakly basic organic or inorganic compounds.
According to another embodiment, a microfluidic device is provided with one or more reservoirs which include one or more reagents which are then transferred via microchannels to a reaction zone or location on the addressable substrate where the reagents are mixed and the reaction occurs. Such microfluidic devices and the methods of moving fluid reagents through such microfluidic devices are known to those of skill in the art.
According to one aspect, a flow cell or other channel, such as a microfluidic channel or microfluidic channels having an input and an output, is used to deliver fluids including reagents, such as a polymerase, a nucleotide and other appropriate reagents and washes, to particular locations on a substrate within the flow cell, such as within a reaction chamber. According to certain aspects, reaction conditions are selected to selectively activate and deactivate locations on the substrate. In this manner a desired location, such as a grid point on a substrate or array, can be provided with reaction conditions to facilitate covalent binding of a nucleotide to an initiator sequence, an existing nucleotide or an existing oligonucleotide, and the reaction conditions can be provided to prevent further attachment of an additional nucleotide at the same location. Then, reaction conditions to facilitate covalent binding of a nucleotide to an existing nucleotide can be provided to the same location in a method of making an oligonucleotide at that desired location.
It is to be understood that while methods described herein can deliver reagents to particular locations of an addressable substrate, the reagents can also be delivered to the entirety of the substrate or portions thereof, and a selected known location or locations can be activated to cause the polymerase to add the nucleotide to either the initiator or the growing nucleotide chain. The surface of the addressable substrate can be washed and a second set of reagents can be added to the surface of the addressable support, activated to add a nucleotide, and so on.
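The flow-then-activate cycle described above can be sketched as a schedule that lists, for each reagent flow, which locations are activated. The following Python sketch is illustrative only: the function activation_schedule, the fixed A, C, G, T flow order and the use of grid coordinates as locations are assumptions, and the target at each location is taken to be a run-collapsed sequence (no base repeated consecutively).

```python
def activation_schedule(targets, flow_order="ACGT"):
    """Return [(nucleotide, [locations to activate]), ...] for each flow.

    targets maps a location on the addressable substrate to the sequence
    wanted there. One nucleotide is flowed over the whole surface at a time
    and only locations whose next base matches are activated.
    """
    for sequence in targets.values():
        if set(sequence) - set(flow_order):
            raise ValueError("target contains a base that is never flowed")
    position = {location: 0 for location in targets}
    schedule = []
    while any(position[loc] < len(seq) for loc, seq in targets.items()):
        for nucleotide in flow_order:
            active = sorted(
                loc for loc, seq in targets.items()
                if position[loc] < len(seq) and seq[position[loc]] == nucleotide
            )
            if active:  # skip flows that would extend nothing
                schedule.append((nucleotide, active))
                for loc in active:
                    position[loc] += 1
    return schedule

targets = {(0, 0): "ACG", (0, 1): "CA"}
for nucleotide, locations in activation_schedule(targets):
    print(nucleotide, locations)
```

A wash would follow each flow in practice; it is omitted here because it does not change which locations are activated.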
The synthesized oligonucleotides or polynucleotides can be amplified using methods known to those of skill in the art. Amplification methods may comprise contacting a nucleic acid with one or more primers that specifically hybridize to the nucleic acid under conditions that facilitate hybridization and chain extension. Exemplary methods for amplifying nucleic acids include the polymerase chain reaction (PCR) (see, e.g., Mullis et al. (1986) Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1:263 and Cleary et al. (2004) Nature Methods 1:241; and U.S. Patent Nos. 4,683,195 and 4,683,202), anchor PCR, RACE PCR, ligation chain reaction (LCR) (see, e.g., Landegren et al. (1988) Science 241:1077-1080; and Nakazawa et al. (1994) Proc. Natl. Acad. Sci. U.S.A. 91:360-364), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:1874), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:1173), Q-Beta Replicase (Lizardi et al. (1988) BioTechnology 6:1197), recursive PCR (Jaffe et al. (2000) J. Biol. Chem. 275:2619; and Williams et al. (2002) J. Biol. Chem. 277:7790), the amplification methods described in U.S. Patent Nos. 6,391,544, 6,365,375, 6,294,323, 6,261,797, 6,124,090 and 5,612,199, or any other nucleic acid amplification method using techniques well known to those of skill in the art.
Sequencing Methods
According to certain aspects, polynucleotides or amplicons thereof are sequenced using methods known to those of skill in the art, such as next-generation sequencing methods. The sequenced oligonucleotides or polynucleotides are then converted into bit sequences corresponding to, for example, an html format of information. The bit sequences can be converted to the format of information using methods known to those of skill in the art. The format of information can be visualized or displayed or played, if an audio format, using methods and devices known to those of skill in the art.
Sequencing methods useful in the present disclosure include Shendure et al., Accurate multiplex polony sequencing of an evolved bacterial genome, Science, vol. 309, p. 1728-32. 2005; Drmanac et al., Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays, Science, vol. 327, p. 78-81. 2009; McKernan et al., Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding, Genome Res., vol. 19, p. 1527-41. 2009; Rodrigue et al., Unlocking short read sequencing for metagenomics, PLoS One, vol. 28, e11840. 2010; Rothberg et al., An integrated semiconductor device enabling non-optical genome sequencing, Nature, vol. 475, p. 348-352. 2011; Margulies et al., Genome sequencing in microfabricated high-density picolitre reactors, Nature, vol. 437, p. 376-380. 2005; Rasko et al., Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany, N. Engl. J. Med., Epub. 2011; Hutter et al., Labeled nucleoside triphosphates with reversibly terminating aminoalkoxyl groups, Nucleos. Nucleot. Nucl., vol. 92, p. 879-895. 2010; Seo et al., Four-color DNA sequencing by synthesis on a chip using photocleavable fluorescent nucleotides, Proc. Natl. Acad. Sci. USA, vol. 102, p. 5926-5931. 2005; Olejnik et al., Photocleavable biotin derivatives: a versatile approach for the isolation of biomolecules, Proc. Natl. Acad. Sci. USA, vol. 92, p. 7590-7594. 1995; US 5,750,34; US 2009/0062129 and US 2009/0191553.
Once the polynucleotides have been sequenced, the data reconstruction step may then be carried out where the polynucleotide sequence is translated into the digital representation format.
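The data reconstruction step, translating a sequenced polynucleotide back into a digital representation, can be sketched in Python as follows. The mapping used here (A or C for bit 0, G or T for bit 1, choosing whichever base differs from the previous one) is an assumption made for illustration and is not specified in this disclosure; it is chosen because it never repeats a base in adjacent positions and therefore tolerates homopolymer runs of any length. The function names are hypothetical.

```python
from itertools import groupby

# Assumed mapping, for illustration only: two bases per bit value.
BIT_TO_BASES = {"0": "AC", "1": "GT"}
BASE_TO_BIT = {base: bit for bit, bases in BIT_TO_BASES.items() for base in bases}

def encode_bits(bits: str) -> str:
    """Write a bit string as bases, never repeating a base consecutively."""
    strand = ""
    for bit in bits:
        first, second = BIT_TO_BASES[bit]
        strand += second if strand.endswith(first) else first
    return strand

def decode_read(read: str) -> str:
    """Collapse homopolymer runs, then translate each base back to a bit."""
    collapsed = (base for base, _run in groupby(read))
    return "".join(BASE_TO_BIT[base] for base in collapsed)

def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string whose length is a multiple of 8 into bytes."""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

bits = "0110100001101001"                            # the ASCII bytes for "hi"
strand = encode_bits(bits)
sequenced = "".join(base * 3 for base in strand)     # every base became a run of 3
print(bits_to_bytes(decode_read(sequenced)))         # b'hi'
```

The resulting bytes can then be interpreted in whatever format of information was stored, for example as html text.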
EXAMPLE VI
Barcoding Methods
Methods of creating a nucleic acid barcode sequence are known to those of skill in the art. In general, barcoding is the inclusion or association of a specific unique nucleotide sequence or barcode tag along with a larger polynucleotide sequence so as to identify the larger polynucleotide sequence or otherwise provide information along with the larger polynucleotide sequence. Barcodes may also be referred to as unique nucleic acid sequence identifiers, tags, MIDs, or indexes, and all serve to identify the polynucleotide to which they are attached. The barcode is generally understood to be an identifying sequence that is read by a sequencing read separate from the main read that sequences the genomic DNA. Alternatively, a barcode may refer to a short sequence that is read in the same read as the genomic DNA. Barcodes enable multiple samples to be pooled for sequencing; each sample is identified by a unique barcode, which enables identification of results during the analysis.
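Identification of pooled results by barcode during analysis can be illustrated with a short Python sketch. It assumes, for simplicity, that the barcode is read at the start of the same read as the remaining sequence and that matching is exact; the function demultiplex and the sample names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Sort pooled reads by barcode.

    barcodes maps a barcode sequence to a sample name. Returns a dict of
    sample name -> reads with the barcode trimmed off, plus the list of
    reads that matched no barcode.
    """
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                by_sample[name].append(read[len(barcode):])
                break
        else:  # no barcode matched this read
            unassigned.append(read)
    return by_sample, unassigned

barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTGGGA", "TGCACCCT", "ACGTTTTT", "GGGGAAAA"]
by_sample, unassigned = demultiplex(pooled, barcodes)
print(by_sample)    # {'sample_1': ['GGGA', 'TTTT'], 'sample_2': ['CCCT']}
print(unassigned)   # ['GGGGAAAA']
```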
Methods of adding barcode tags to polynucleotides are known in the art and include PCR methods, ligase methods, transposon/transposase methods, RPA methods, recombination methods or hybridization extension methods. In general, a barcode sequence can be added using a primer sequence including the barcode sequence under PCR conditions so that the barcode sequence is included into the PCR product sequence. Unique DNA sequence identifiers are added in a PCR reaction carried out before sequencing, which also adds the primers used for the sequencing reaction. To be effective, barcodes must be unambiguous; but because the barcode is part of the sequence read, it is beneficial to have barcodes be as short as possible. The barcode also should be designed to minimize primer-dimer artifacts. Methods of designing barcodes are known to those of skill in the art.
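The trade-off stated above, barcodes that are unambiguous yet as short as possible, can be illustrated with a simple greedy design in Python. This is one possible approach offered as a sketch, not the method of the disclosure: it enforces a minimum Hamming distance between barcodes and does not screen for primer-dimer formation. The names design_barcodes and barcoded_primer, and the example annealing sequence, are hypothetical.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def design_barcodes(count, length, min_distance=2):
    """Greedily pick `count` barcodes of `length` bases that differ pairwise
    in at least `min_distance` positions."""
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, other) >= min_distance for other in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough barcodes at this length; increase length")

def barcoded_primer(barcode: str, annealing_sequence: str) -> str:
    """Primer carrying the barcode 5' of the region that anneals to the target."""
    return barcode + annealing_sequence

codes = design_barcodes(count=4, length=3, min_distance=2)
print(codes)                                   # ['AAA', 'ACC', 'AGG', 'ATT']
print(barcoded_primer(codes[1], "CTACACGAC"))  # ACCCTACACGAC
```

With a minimum distance of 2, a single substitution error cannot turn one barcode into another, so a misread barcode is detected rather than misassigned.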
EXAMPLE VII
Reagent Delivery Systems
The disclosure provides that one or more or a plurality of reagents and washes are delivered to one or more or a plurality of reaction sites within one or more or a plurality of reaction zones including an electrode or electrodes in a method of covalently attaching dNTP to an initiator sequence or an existing nucleotide attached at the desired location using electricity to alter pH within a reaction zone. A selected nucleotide reagent liquid is pulsed or flowed or deposited at the reaction site where reaction takes place and then may be optionally followed by delivery of a buffer or wash that does not include the nucleotide. Suitable delivery systems include fluidics systems, microfluidics systems, syringe systems, ink jet systems, pipette systems and other fluid delivery systems known to those of skill in the art. Various flow cell embodiments or flow channel embodiments or microfluidic channel embodiments are envisioned which can deliver separate reagents or a mixture of reagents or washes, using pumps or electrodes or other methods known to those of skill in the art of moving fluids through channels or microfluidic channels, through one or more channels to a reaction region or vessel where the surface of the substrate is positioned so that the reagents can contact the desired location where a nucleotide is to be added.
The disclosure provides that a microfluidic device is provided with one or more reservoirs which include one or more reagents which are then transferred via microchannels to a reaction zone where the reagents are mixed and the reaction occurs. Such microfluidic devices and the methods of moving fluid reagents through such microfluidic devices are known to those of skill in the art.
Reagents can be deposited onto a discrete region of the support, such that each region forms a feature of the array. The pH of the feature is capable of being altered, i.e., the pH is raised or lowered to either activate or deactivate an enzyme that catalyzes addition of a dNTP as described herein.
The present disclosure provides for a method of synthesizing a plurality of polynucleotide sequences using a template-independent polymerase such as TdT, which encodes data without the need for single base precision. Each oligonucleotide includes a single base precise initiator sequence at the 5' end and a single base precise adaptor at the 3' end. Subsequent tagging is based on base-precise hybridization to the universal single base precise initiator sequence at the 5' end or to the universal single-base precise 3' adapter, which is added to all synthesized strands by ligation or PCR.
As depicted schematically in Fig. 1A, a plurality of 5' initiator primers (dark gray) are conjugated or tethered by dithiol chemical conjugation onto the spots in a 2D electrode microarray slide, with each spot including an electrode of roughly 0.2 mm in diameter. The polynucleotide synthesis is carried out with TdT under suitable conditions and with suitable reagents. Each nucleotide, mixed with a template-independent polymerase, is flowed across the entire surface of the array, and electrical stimulus can be used to toggle polymerization activity at the desired electrode for the addition of the desired nucleotide. The process is repeated to create polynucleotides of known sequence at known locations of the array. The polynucleotides may include homopolymers of heterogeneous lengths. At the end, a universal single-base precise 3' adapter (light gray) is added to all synthesized strands by ligation, hybridization extension or
The synthesized polynucleotides are selectively desorbed from a chosen electrode or electrode arrangement for mass tagging by electrochemical desorption (ECD) (Fig. 1B). Useful methods are disclosed in Henry et al., Electrochem. Commun. 11, 664-667 (2009); Refera Soreta et al., Electrochim. Acta 55, 4309-4313 (2010); or Kurylo et al., Scientific Reports (2018) 8:6396, DOI: 10.1038/s41598-018-24659-?, each of which is incorporated by reference.
In particular, a subset of polynucleotide strands, such as a subset of DNA strands (depicted by a red, green, or blue box), can be selectively decoupled from the solid support by electrical stimulus of electrodes or by enzymatic cleavage. Subsets of decoupled DNA strands can be barcoded with unique 5' and 3' primers (gray outlined box) which are designed to anneal either to the universal 5' initiator or to the universal 3' adapter. Each barcode sequence is a hash key that represents a meta tag. For example, the green strands could be barcoded with BC-5'-44, which may represent "vacation", and with a corresponding 3' barcode, which may represent "photos". In this example, each subset of strands is uniquely tagged with non-overlapping barcodes; in practice, overlapping barcodes may be desirable. Barcodes can be iteratively added or added in series for additional layers of meta tagging. Finally, all subsets of DNA strands can be mixed in a single pool.
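The use of barcodes as hash keys for meta tags can be sketched in Python as a lookup table plus a retrieval step over the mixed pool. The sketch is illustrative: only the pairing of BC-5'-44 with "vacation" comes from the example above, while the other barcode identifiers, the Strand class and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field

# Hash keys: barcode ID -> meta tag. The "BC-5'-44" entry follows the example
# in the text; the other two IDs are hypothetical placeholders.
META_TAGS = {
    "BC-5'-44": "vacation",
    "BC-3'-A": "photos",
    "BC-5'-B": "work",
}

@dataclass
class Strand:
    payload: str
    barcodes: list = field(default_factory=list)

def tag_subset(strands, *barcode_ids):
    """Barcode every strand of a released subset; may be called repeatedly
    to add further layers of meta tags."""
    for strand in strands:
        strand.barcodes.extend(barcode_ids)
    return strands

def retrieve(pool, *tags):
    """Return the strands whose barcodes cover all requested meta tags."""
    wanted = set(tags)
    return [s for s in pool if wanted <= {META_TAGS[b] for b in s.barcodes}]

green = tag_subset([Strand("ACGT"), Strand("TTGA")], "BC-5'-44", "BC-3'-A")
blue = tag_subset([Strand("GGCC")], "BC-5'-B")
pool = green + blue   # all subsets mixed in a single pool
print([s.payload for s in retrieve(pool, "vacation", "photos")])  # ['ACGT', 'TTGA']
```

Calling tag_subset again on an already tagged subset models the iterative addition of barcodes, and allowing two subsets to share a barcode models the overlapping case.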
According to one aspect, an electrode array is provided where electrodes are placed in an addressable array format on a substrate having initiator sequences releasably tethered thereto so that oligonucleotides or polynucleotides can be synthesized on the electrodes and so that the synthesized oligonucleotides or polynucleotides can be released from the electrodes.
Prior to tethering of the initiator sequence to the addressable substrate, the gold electrode surface is first cleaned by abrasion with aluminum oxide particles (BASi) of decreasing sizes (1, 0.3, and 0.05 microns). For each particle size the electrode is manually polished for 30 seconds with a figure 8 motion on a surface saturated with the particles and distilled water. Following the final particle polishing (0.05 microns), debris is cleared from the electrode surface by ultrasonication (Branson) in an ethanol bath for 5 minutes. Electrodes are then rinsed in distilled water and dried with pure nitrogen. To remove residual organic material from the electrodes, they are washed in 'piranha etch' (3 parts of concentrated sulfuric acid and 1 part of 30% hydrogen peroxide solution) for 2 minutes. Finally the electrodes are ultrasonicated (Branson) in distilled water for 15 minutes and are ready for tethering. A thiolated oligo /5ThioMC6-D/TTTTTTTTTTTTTTTTTTTT/ideoxyU/CTACACTCTTTCCCTACACGACGCTCTTCCGATCTACGTACTGAG (IDT, 100 uM), Thiol-PEG3-alcohol (BroadPharm, 6 M in 100% EtOH), and TCEP (Sigma, 1 M stock) are prepared as three separate formulations as follows. The Thiol-PEG3-alcohol is used as a competitor inhibitor to the thiolated oligo to improve oligo spacing and decrease steric interference. Oligonucleotide ("oligo") preparations for coupling to the gold electrode are provided in the Table below.
According to one aspect, an electrode array has been created where electrodes are placed in an addressable array format on a substrate so that oligonucleotides or polynucleotides can be synthesized on the electrodes and so that the synthesized oligonucleotides or polynucleotides can be released from the electrodes. The three oligonucleotide ("oligo") mixtures are statically incubated at room temperature for 30 minutes and 25 microliters of each are dispensed onto 3 different electrodes. One electrode is kept bare as a negative control. The electrodes with oligo mixtures are sealed in a humidified chamber with saturated NaCl (75% humidity) and tethering proceeds statically at room temperature overnight (>12 hours). Excess untethered oligos and PEG3-alcohol are removed from the electrode surface by washing each electrode once with 0.2% Tween-20 detergent and twice in UltraPure Distilled Water (Thermo Fisher).
According to one aspect, a potentiostatic mode set up is provided for the purpose of electrochemical desorption of thiol oligonucleotides tethered to gold electrodes. An electrochemical cell including three electrodes (working electrode, counter electrode and reference electrode) is submerged in a 0.5 M NaOH solution which is continuously degassed with nitrogen. Each electrode with a tethered oligo mixture is used as the working electrode and desorption is tested with each working electrode. Electrodes are connected to an Autolab (Metrohm) potentiostat.
According to one aspect, electrodes are dried with nitrogen gas and the surface is covered with 0.5 M NaOH which has been degassed for 10 minutes with nitrogen gas. For each electrode a reference and counter electrode are also provided, and all are connected to a potentiostat (Metrohm). To measure desorption current and voltages, the Autolab control software is set up in potentiostatic mode. Cyclic voltammetry is performed for five cycles by applying a range of voltages from 0 to -1.25 V while measuring current. A representative trace for each of the above three oligo mixtures is shown in Figs. 4-6. The trace for the bare electrode is overlaid.
This protocol demonstrates desorption of thiol-tethered molecules by electrical stimulation as follows:
A. The IV scan of the bare electrode represents the trace with no attached molecules.
B. The 1st scan with mixed DNA/PEG shows significant anodic current when voltage is < -0.5 V. Peak is approximately -1 V.
C. The 5th scan with mixed DNA/PEG shows that the anodic current is significantly reduced. The trace is similar to that of the IV scan of the bare electrode, indicating that thiol-tethered molecules are being desorbed.
D. This indicates that 4 or 5 pulses of -0.5 V or lower, or of -1 V or lower, are sufficient for desorption across a range of tethered DNA molecules. The number of DNA molecules decreases with increasing PEG-thiol competitor.
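The voltage program used in this protocol, five cycles swept between 0 and -1.25 V, and the comparison of peak currents between scans can be sketched in Python as below. The sketch makes several assumptions that are not in the source: the 0.05 V step, the function names cv_scans and peak, and the idea of locating the peak as the largest-magnitude current. No measured data are included.

```python
def cv_scans(v_start=0.0, v_vertex=-1.25, step=0.05, cycles=5):
    """Return one list of voltage set points per cycle: a sweep from v_start
    to v_vertex and back. The 0.05 V step is an assumed value."""
    n = round(abs(v_vertex - v_start) / step)
    direction = 1 if v_vertex > v_start else -1
    outward = [round(v_start + direction * step * k, 10) for k in range(n + 1)]
    one_scan = outward + outward[-2::-1]   # out to the vertex, then back
    return [list(one_scan) for _ in range(cycles)]

def peak(voltages, currents):
    """Return (voltage, current) at the largest-magnitude current of a scan."""
    k = max(range(len(currents)), key=lambda i: abs(currents[i]))
    return voltages[k], currents[k]

scans = cv_scans()
print(len(scans), len(scans[0]), min(scans[0]))  # 5 51 -1.25
```

Applying peak to the measured currents of the first and fifth scans, and comparing both with the bare-electrode trace, gives a simple numerical counterpart to observations B and C.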
The oligos are desorbed into a basic solution such as a solution of 0.5 M NaOH. To prepare the desorbed oligos for tagging/barcoding, an equimolar volume of 0.5 M HCl can be added such that the oligos are dissolved in a final solution of water with 0.5 M NaCl. Alternative methods for oligo purification may be used, such as with solid support silica spin columns or Solid Phase Reversible Immobilization (SPRI) beads.
According to certain aspects, desorption conditions may be tailored as desired to achieve certain objectives. For example, methods are provided herein where the number of voltage pulses is minimized or a large negative voltage is applied. Each aspect can be used to minimize time. In cases where a large negative voltage cannot be used due to instrumentation limitations, a large number of smaller negative voltages may be used. Alternatively, buffer conditions such as an increased concentration of sodium hydroxide may be helpful to reduce the number of pulses and the minimum voltage.
EXAMPLE X
An oligonucleotide is tethered onto the surface of 4 electrodes and desorbed by the application of positive current. To highlight desorbed DNA from the electrode, the electrode surface is flooded with an extension reaction mixture containing the template-independent polymerase TdT and different species of nucleoside triphosphates. As a result, DNA that is desorbed versus tethered will have different lengths due to altered exposure to the extension reaction.
A custom fabricated 4-electrode chip is created by evaporating gold onto a glass substrate. The 4-electrode chip is cleaned by sonication in acetone, then isopropanol, and finally by plasma.
An initiator oligonucleotide is tethered onto the working electrode. The initiator is dissolved in K2HPO4-TE and dispensed onto the chip. Tethering occurs in a chamber humidified to 100%. After 1 hour, the chip is rinsed with water then cleaned with nitrogen.
An extension reaction mixture is prepared that includes an alkaline buffer (pH>10); a divalent ion (i.e., Mg) for TdT activity; a nucleoside triphosphate (such as dCTP and dATP); and TdT enzyme.
A fluidic flowcell is assembled on top of the chip and the pads are connected to a potentiostat. The extension reaction is injected to cover the electrodes and a constant current of 1 uA is applied for 30 seconds followed by an incubation step for 3 minutes.
Fluid is collected and analyzed for DNA content. The initiator contains uracil, which allows it to be released from the surface enzymatically. DNA tethered to the surface is collected by injecting 10 uL of USER (a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII) from NEB with incubation at 37°C for 1 hour.
DNA is visualized with a 15% TBE-Urea PAGE gel.
Fig. 5 depicts the gel results with F control = initiator only control, F1 = fluid from extension reaction with dCTP (desorbed DNA), F2 = fluid from extension reaction with dCTP (desorbed DNA) and F3 = fluid from extension reaction with dATP (desorbed DNA). S control = cleaved initiator control, S1 = tethered DNA from extension reaction with dCTP (tethered DNA) and S2 = tethered DNA from extension reaction with dCTP (tethered DNA).
This example demonstrates the application of desorption in the context of DNA data storage. According to the present disclosure, desorption and/or PCR tagging are provided in a method of DNA data storage. The thiol modification to the end of a short DNA oligo allows it to bind onto gold. Without wishing to be bound by scientific theory, an electrostatic repulsion between voltage / charge and the thiol may result in desorption.
Aspects of the present disclosure are directed to a method for selectively barcoding a subset of polynucleotides encoding bits of information with a unique barcode, wherein a plurality of polynucleotides comprising the subset is releasably attached to an addressable substrate by an activatable linker, wherein the subset is releasably attached to known subset locations of the addressable substrate, wherein each known subset location includes a plurality of different sequence polynucleotides each encoding the same bits of information, wherein each polynucleotide of the subset includes a common 5' universal initiator of same sequence and a common 3' universal adaptor of same sequence, wherein the method includes (a) selectively releasing the subset of polynucleotides from the addressable array, and (b) barcoding the polynucleotides of the subset at either the 5' end or the 3' end with a first barcode. According to one aspect, the method further includes (c) barcoding the polynucleotides of the subset at either the 5' end or the 3' end with a second barcode. According to one aspect, the first barcode comprises metatag information. According to one aspect, the second barcode comprises metatag information. According to one aspect, each polynucleotide of the plurality includes a common 5' universal initiator of same sequence and a common 3' universal adaptor of same sequence. According to one aspect, each polynucleotide of the subset includes a common 5' universal initiator of same sequence unique to the subset and a common 3' universal adaptor of same sequence unique to the subset. According to one aspect, the activatable linker is activated to detach the subset of polynucleotides from the addressable substrate using heat, light, an enzyme, a chemical, electrical charge or pH. According to one aspect, the barcoding of step (b) includes hybridizing a first barcoded primer to either the 5' universal initiator or the 3' universal adaptor under PCR conditions to add the first barcode to either the 5' universal initiator or the 3' universal adaptor. According to one aspect, the barcoding of step (c) includes hybridizing a second barcoded primer to either the 5' universal initiator or the 3' universal adaptor under PCR conditions to add the second barcode to either the 5' universal initiator or the 3' universal adaptor. According to one aspect, the barcoded polynucleotides of the subset are collected in a storage vessel. According to one aspect, the addressable substrate includes a plurality of different subsets of polynucleotides wherein each subset encodes bits of information different from other subsets.
According to one aspect, the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b). According to one aspect, the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel. According to one aspect, the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c). According to one aspect, the addressable substrate includes a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel. According to one aspect, the activatable linker is a thiol linkage and the subset of polynucleotides is released from the addressable substrate using electronically-stimulated desorption. According to one aspect, the subset of polynucleotides includes DNA or RNA. According to one aspect, the method further includes sorting, collecting, amplifying, sequencing, storing and/or retrieving the barcoded polynucleotides. According to one aspect, the addressable substrate is an electrode array including a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive a voltage, and wherein the subset of polynucleotides is attached to corresponding electrode reaction sites and wherein releasing the subset of polynucleotides attached to the corresponding electrode reaction sites is controlled by application of voltage to the corresponding electrode reaction sites.
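The site-by-site release described above can be sketched as a simple lookup from a subset to the electrode reaction sites that hold it. This is a toy model under stated assumptions: the class `ElectrodeArray`, its methods, the subset labels and the sequences are all hypothetical, and applying a voltage is represented only by removing strands from the selected sites.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElectrodeArray:
    """Toy model of an addressable electrode array (names are hypothetical)."""
    # electrode site index -> strands currently attached at that site
    attached: Dict[int, List[str]] = field(default_factory=dict)
    # subset label -> electrode sites known to hold that subset
    subset_sites: Dict[str, List[int]] = field(default_factory=dict)

    def attach(self, subset: str, site: int, strands: List[str]) -> None:
        """Record strands of `subset` as attached at electrode `site`."""
        self.attached.setdefault(site, []).extend(strands)
        self.subset_sites.setdefault(subset, []).append(site)

    def release(self, subset: str) -> List[str]:
        """Model applying the release voltage only to the sites of `subset`."""
        released: List[str] = []
        for site in self.subset_sites.pop(subset, []):
            released.extend(self.attached.pop(site, []))
        return released


array = ElectrodeArray()
array.attach("file_A", 0, ["ACGT", "ACGA"])
array.attach("file_A", 1, ["ACGC"])
array.attach("file_B", 2, ["TTTT"])

# Only the sites holding file_A are addressed; file_B stays attached.
storage_vessel = array.release("file_A")
```

Keeping the subset-to-site mapping separate from the per-site contents mirrors the idea of known subset locations: the controller needs only the mapping to decide which electrodes to drive.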
According to one aspect, the addressable substrate is an electrode array including a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive an electric potential and wherein the array comprises a plurality of different subsets of polynucleotides, wherein each subset of the plurality is attached to corresponding electrode reaction sites and wherein releasing each subset of the plurality is independently controlled by separate application of voltages. According to one aspect, electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 4 pulses of -1 V or lower. According to one aspect, electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 5 pulses of -1 V or lower. According to one aspect, electronically-stimulated desorption occurs when polynucleotides are immersed with 0.5 M sodium hydroxide and with the application of at least 5 pulses of -1 V or lower.
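The pulse conditions stated above can be expressed as a simple check. This is a sketch under assumptions: the function name and parameters are hypothetical, and "-1 V or lower" is read here as a pulse voltage less than or equal to -1 V.

```python
from typing import Sequence


def desorption_expected(
    pulse_voltages: Sequence[float],
    in_basic_solution: bool,
    min_pulses: int = 5,
    threshold_v: float = -1.0,
) -> bool:
    """Return True when the stated desorption conditions are met.

    Counts pulses at or below `threshold_v` (assumed reading of "-1 V or
    lower") and requires at least `min_pulses` of them in basic solution.
    """
    qualifying = sum(1 for v in pulse_voltages if v <= threshold_v)
    return bool(in_basic_solution and qualifying >= min_pulses)


# Five pulses of -1 V in basic solution satisfy the 5-pulse aspect.
five_pulse_ok = desorption_expected([-1.0] * 5, in_basic_solution=True)

# Four pulses satisfy only the 4-pulse aspect.
four_pulse_ok = desorption_expected([-1.2] * 4, in_basic_solution=True, min_pulses=4)
```

The check says nothing about pulse duration or spacing, which the passage does not specify.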
OTHER EMBODIMENTS
Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims. All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference.
Claims
1. A method for selectively barcoding a subset of polynucleotides encoding bits of information with a unique barcode,
wherein a plurality of polynucleotides comprising the subset is releasably attached to an addressable substrate by an activatable linker,
wherein the subset is releasably attached to known subset locations of the addressable substrate,
wherein each known subset location includes a plurality of different sequence polynucleotides each encoding the same bits of information,
wherein each polynucleotide of the subset includes a common 5’ universal initiator of same sequence and a common 3’ universal adaptor of same sequence,
wherein the method comprises
(a) selectively releasing the subset of polynucleotides from the addressable array, and
(b) barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a first barcode.
2. The method of claim 1 further comprising
(c) barcoding the polynucleotides of the subset at either the 5’ end or the 3’ end with a second barcode.
3. The method of claim 1 wherein the first barcode comprises metatag information.
4. The method of claim 2 wherein the second barcode comprises metatag information.
5. The method of claim 1 wherein each polynucleotide of the plurality includes a common 5’ universal initiator of same sequence and a common 3' universal adaptor of same sequence.
6. The method of claim 1 wherein each polynucleotide of the subset includes a common 5’ universal initiator of same sequence unique to the subset and a common 3’ universal adaptor of same sequence unique to the subset.
7. The method of claim 1 wherein the activatable linker is activated to detach the subset of polynucleotides from the addressable substrate using heat, light, an enzyme, a chemical, electrical charge or pH.
8. The method of claim 1 wherein the barcoding of step (b) comprises hybridizing a first barcoded primer to either the 5’ universal initiator or the 3’ universal adaptor under PCR conditions to add the first barcode to either the 5’ universal initiator or the 3’ universal adaptor.
9. The method of claim 2 wherein the barcoding of step (c) comprises hybridizing a second barcoded primer to either the 5’ universal initiator or the 3’ universal adaptor under PCR conditions to add the second barcode to either the 5’ universal initiator or the 3’ universal adaptor.
10. The method of claim 1 wherein the barcoded polynucleotides of the subset are collected in a storage vessel.
11. The method of claim 1 wherein the addressable substrate comprises a plurality of different subsets of polynucleotides wherein each subset encodes bits of information different from other subsets.
12. The method of claim 1 wherein the addressable substrate comprises a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b).
13. The method of claim 1 wherein the addressable substrate comprises a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a) and (b), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel.
14. The method of claim 2 wherein the addressable substrate comprises a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c).
15. The method of claim 2 wherein the addressable substrate comprises a plurality of different subsets of polynucleotides, and wherein different subsets of polynucleotides are subject to steps (a), (b) and (c), and wherein the barcoded polynucleotides of the different subsets are collected in a storage vessel.
16. The method of claim 1 wherein the activatable linker is a thiol linkage and the subset of polynucleotides is released from the addressable substrate using electronically-stimulated desorption.
17. The method of claim 1 wherein the subset of polynucleotides comprises DNA or RNA.
18. The method of claim 1 further comprising sorting, collecting, amplifying, sequencing, storing and/or retrieving the barcoded polynucleotides.
19. The method of claim 1, wherein the addressable substrate is an electrode array comprising a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive a voltage and
wherein the subset of polynucleotides is attached to corresponding electrode reaction sites, and
wherein releasing the subset of polynucleotides attached to the corresponding electrode reaction sites is controlled by application of voltage to the corresponding electrode reaction sites.
20. The method of claim 1, wherein the addressable substrate is an electrode array comprising a plurality of electrode reaction sites wherein each electrode reaction site is electrically connected to receive an electric potential and
wherein the array comprises a plurality of different subsets of polynucleotides, wherein each subset of the plurality is attached to corresponding electrode reaction sites and
wherein releasing each subset of the plurality is independently controlled by separate application of voltages.
21. The method of claim 1 where electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 4 pulses of -1 V or lower.
22. The method of claim 1 where electronically-stimulated desorption occurs when polynucleotides are immersed in basic solution and with the application of at least 5 pulses of -1 V or lower.
23. The method of claim 1 where electronically-stimulated desorption occurs when polynucleotides are immersed with 0.5 M sodium hydroxide and with the application of at least 5 pulses of -1 V or lower.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962816243P | 2019-03-11 | 2019-03-11 | |
US62/816,243 | 2019-03-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020185896A1 (en) | 2020-09-17 |
Family
ID=72426935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2020/022102 WO2020185896A1 (en) | 2019-03-11 | 2020-03-11 | Methods for processing and storing dna encoding formats of information |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020185896A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11724244B2 (en) | 2021-02-26 | 2023-08-15 | Avery Digital Data, Inc. | Semiconductor chip devices and methods for polynucleotide synthesis |
WO2024163733A1 (en) * | 2023-02-01 | 2024-08-08 | Twist Bioscience Corporation | Electrochemical synthesis with redox stable nucleotides |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170337324A1 (en) * | 2015-07-13 | 2017-11-23 | President And Fellows Of Harvard College | Methods for Retrievable Information Storage Using Nucleic Acids |
WO2019040871A1 (en) * | 2017-08-24 | 2019-02-28 | Miller Julian | Device for information encoding and, storage using artificially expanded alphabets of nucleic acids and other analogous polymers |
2020-03-11 WO PCT/US2020/022102 patent/WO2020185896A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
HO DENNY: "Detection and Melting of Surface-Bound DNA using a Purely Electrochemical Approach", MASTER'S THESES, vol. 30, no. 31, 2018, pages 56 - 58, XP055738639, Retrieved from the Internet <URL:https://repository.usfca.edu/thes/1152> [retrieved on 20180614] * |
ZHANG ET AL.: "Fabrication of a Sensitive Impedance Biosensor of DNA Hybridization Based on Gold Nanoparticles Modified Gold Electrode", ELECTROANALYSIS, vol. 20, no. 19, 2008, pages 2127 - 2133, XP055739188 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20769904; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20769904; Country of ref document: EP; Kind code of ref document: A1 |