WO2003025123A2 - Dna: a medium for long-term information storage specification - Google Patents


Info

Publication number
WO2003025123A2
WO2003025123A2 PCT/US2002/027606 US0227606W WO03025123A2 WO 2003025123 A2 WO2003025123 A2 WO 2003025123A2 US 0227606 W US0227606 W US 0227606W WO 03025123 A2 WO03025123 A2 WO 03025123A2
Authority
WO
WIPO (PCT)
Prior art keywords
information
nucleic acid
acid molecule
dna
stored
Prior art date
Application number
PCT/US2002/027606
Other languages
French (fr)
Inventor
Carter Bancroft
Catherine Clelland
Original Assignee
Mount Sinai School Of Medecine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2001-08-28
Filing date
2002-08-28
Publication date
2003-03-27
Application filed by Mount Sinai School Of Medecine
Publication of WO2003025123A2

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a technique for using nucleic acid molecules as a medium for long-term storage and retrieval of information.

Description

DNA: A MEDIUM FOR LONG-TERM INFORMATION STORAGE
SPECIFICATION
INTRODUCTION
The present invention relates to a technique for using nucleic acid molecules as a medium for long-term storage and retrieval of information.
BACKGROUND OF THE INVENTION
In this digital age, the technology employed for information storage is undergoing rapid advances. Data currently being stored in magnetic or optical media will in all likelihood become unrecoverable within a century or less, through the combined effects of hardware and software obsolescence, and decay of the storage medium. New approaches are required that will permit retrieval of information stored for centuries or even millennia.
SUMMARY OF THE INVENTION
The present invention relates to a technique for using nucleic acid molecules as a medium for long-term storage and retrieval of information.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Structures of DNA Molecules Employed for Information Storage and Readout. The single Polyprimer Key contains a series of sequencing primer sequences ("Seq Primer #") flanked by common forward ("F Primer") and reverse ("R Primer") PCR primer sequences, each separated by a small common spacer. Each Information DNA is also flanked by the common PCR primer sequences, and contains two unique elements, separated by the common spacer: a Seq Primer and a numbered information segment ("Information #") (not drawn to scale).
Figure 2. Experimental Prototype. Readout of information DNAs, including sequencing plus decoding.
DETAILED DESCRIPTION OF THE INVENTION
DNA possesses three properties that recommend it as a vehicle for long-term information storage. First, DNA has stood the informational "test of time" during the billions of years and multiple generations since the early days of life on Earth. Non-replicating DNA molecules have proven to be quite robust. Although DNA stored under non-ideal conditions, e.g., in ancient archaeological deposits, is subject to hydrolytic and oxidative damage (Moss, M. et al., 1996, Nucl. Acids Res. 24:1304), mitochondrial DNA extracted and amplified from 7000-year-old human remains yielded an accurate DNA sequence (Paabo, JA et al., 1988, Nucl. Acids Res. 16: 1775). Storage of DNA under more ideal conditions can result in extremely long stability, as evidenced by the reported recovery of viable bacteria (and thus of their genomic DNA) from 250-million-year-old salt crystals (Vreeland, RH et al., 2000, Nature 407:897). Second, since DNA is genetic material, methods for both storage and reading of DNA-encoded information should remain a central core of technological civilizations, and undergo continual improvements in both efficiency and miniaturization. Third, the use of DNA as a storage medium permits each segment of information to be stored in an enormous number of identical molecules. This extensive informational redundancy strongly mitigates effects of any losses due to stochastic decay. Studies of ancient human remains provide information on long-term rates of DNA decay and/or modification, and thus a highly conservative estimate of minimum DNA amounts required for prolonged storage under more ideal conditions. About 0.1% of the DNA extracted from ancient, decomposed tissue is unmodified (S. Paabo, R.G. Higuchi, A.C. Wilson, 1989, J. Biol. Chem. 264, 9709). However, as little as 100-300 fg of this unmodified DNA can serve as a PCR template and generate accurate DNA sequence; at 0.1% unmodified, this corresponds to roughly 100-300 pg of total extracted DNA.
In the example described herein, information is stored in 20 ng (20,000 pg) of identical DNA molecules ~250 bp in size, far above the 100 pg range. Moreover, use of this large number of molecules (c. 80 billion) as PCR templates in the readout procedure greatly suppresses effects of any base modifications during prolonged storage that yield sequence changes in individual DNA molecules (O. Handt, M. Krings, R.H. Ward, S. Paabo, 1996, Am. J. Hum. Genet. 59, 368).
An optimal procedure for long-term storage of information in DNA is designed so that data retrieval requires minimal prior knowledge beyond a familiarity with molecular biological techniques. The method of the present invention utilizes two standard procedures for recovery of stored information: polymerase chain reaction (PCR) and DNA sequence analysis. The method uses two classes of nucleic acid molecules, as depicted in Figure 1: (i) "Information DNAs" (iDNAs) containing the stored information, and (ii) a single "Polyprimer Key" (PPK) that is the key to retrieving the information stored in the iDNAs. Each iDNA contains the following sequence elements: common flanking forward (F) and reverse (R) PCR amplification primers (~10-20 bases long), a unique sequencing primer (Seq Primer) (comparable in size to the F and R primers), a small common spacer (~3-4 bases long) serving as a cue to indicate the start of the stored information, and a unique information segment. The information to be stored is encoded successively in these information segments, beginning with Information 1. The PPK is also flanked by the common F and R primers, and contains, in the proper order, the unique Seq Primers for the ordered retrieval of the information segment sequence from each iDNA. Common spacer sequences mark the demarcations between successive Seq Primers.
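The amounts in the two paragraphs above can be checked with simple arithmetic. This sketch assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common rule of thumb, not a figure from this specification); with that value, 20 ng of 250 bp molecules comes to roughly 7.4 × 10^10 molecules, in line with the "c. 80 billion" quoted (about 600 g/mol per base pair would give 8.0 × 10^10). The function names are illustrative only.

```python
# Back-of-envelope check of the DNA amounts quoted in the text.
# Assumption: ~650 g/mol per base pair of duplex DNA (rule of thumb).
AVOGADRO = 6.022e23           # molecules per mole
MASS_PER_BP_G_PER_MOL = 650   # assumed, not from the specification


def min_total_dna_pg(usable_fg: float, unmodified_fraction: float) -> float:
    """Total DNA (pg) that contains `usable_fg` of unmodified DNA."""
    return usable_fg / unmodified_fraction / 1000.0  # fg -> pg


def molecule_count(mass_ng: float, length_bp: int) -> float:
    """Number of duplex molecules of `length_bp` in `mass_ng` of DNA."""
    moles = (mass_ng * 1e-9) / (length_bp * MASS_PER_BP_G_PER_MOL)
    return moles * AVOGADRO


if __name__ == "__main__":
    low, high = min_total_dna_pg(100, 0.001), min_total_dna_pg(300, 0.001)
    print(f"conservative minimum: {low:.0f}-{high:.0f} pg of total DNA")
    print(f"20 ng of 250 bp DNA: {molecule_count(20, 250):.1e} molecules")
    print(f"20,000 pg is {20_000 / high:.0f}x the upper estimate")
```

Even against the upper 300 pg estimate, the 20 ng stored in the example leaves a margin of well over an order of magnitude.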
Each information segment should be capable of encoding any possible data (e.g., text), while correct readout requires that each Seq Primer prime a sequencing reaction only from the appropriate position within a specific iDNA, and not mis-prime on any iDNA. Various approaches can be taken to satisfy these conditions. In the simplest possible model, two bases (e.g., A and T) would be employed to encode text in information segments, and the other two (e.g., G and C) to construct Seq Primers. This would prevent mis-hybridization of Seq Primers to information segments, but would greatly limit both efficiency of text storage and the number of possible different Seq Primers.
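The cost of this two-base partition can be made concrete by counting. The sketch below assumes 27 symbols (26 letters plus a space), fixed-length codons, 20-base primers and a ~600-base readable information segment (the figure used later in this description). It counts raw sequence space only and ignores the cross-hybridization constraints a real primer set must also satisfy; `codon_length` is an illustrative helper.

```python
# Counting sketch: two-base text/primer partition vs. the A/C/T text alphabet
# with four-base primers. Assumptions: 27 symbols, fixed-length codons,
# 20-base primers, a 600-base readable segment. Raw sequence space only.
READ_BASES = 600
PRIMER_LEN = 20


def codon_length(alphabet_size: int, symbols: int = 27) -> int:
    """Shortest fixed codon length that yields at least `symbols` codons."""
    n = 1
    while alphabet_size ** n < symbols:
        n += 1
    return n


if __name__ == "__main__":
    for label, size in [("text in {A, T}", 2), ("text in {A, C, T}", 3)]:
        n = codon_length(size)
        print(f"{label}: {n}-base codons, {READ_BASES // n} characters per segment")
    print("20-mers over {G, C}:", 2 ** PRIMER_LEN)
    print("20-mers over all four bases, G fixed every 4th position:",
          4 ** (PRIMER_LEN - PRIMER_LEN // 4))
```

Under these assumptions a two-base text alphabet needs 5-base codons (120 characters per 600-base segment, against 200 with A, C and T), and G/C-only primers offer about 10^6 distinct 20-mers, against about 10^9 when only every 4th position is fixed.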
To retrieve the stored information, a reader will amplify and sequence the PPK. To facilitate this step, the PPK plus the F and R PCR amplification primers can be stored separately from the iDNAs. The PPK, plus F and R primers and other non-degradable PCR reagents, could be stored in, or attached to, a vessel made of glass or plastic exhibiting long-term stability, with a separate vessel used to store the iDNAs. The outside of the former vessel could carry a permanent glyph depicting PCR amplification, prompting a future reader to add a thermostable DNA polymerase (plus water) and begin PCR. This scheme should lead the reader, without further prompts, to sequence and interpret the PPK. Other long-term storage scenarios can be envisaged employing only a single container, involving, e.g., the use of DNA strands immobilized on beads. PCR amplification will yield levels of the PPK sufficient for further analysis, even if extensive degradation or modification has occurred during storage. Sequence analysis of the entire PPK reveals the sequences of the F and R PCR primers, plus an internal ordered series of elements of comparable size, suggesting roles for these elements as sequencing primers. This interpretation would lead the reader to perform sequence analysis of the information segments. Assuming that the F and R primers were interpreted as "universal" PCR primers, the reader would employ them for simultaneous PCR amplification of all of the iDNAs. Sequential use of each Seq Primer to prime a sequencing reaction on the collection of PCR products would then yield the sequence of each of the information segments, arranged in the proper order to be decoded and read as a continuous block.
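The retrieval logic above can be illustrated with a toy, string-level model in which PCR is ignored and "sequencing from a primer" is a lookup in a pool of molecules. The F, R and Seq Primer sequences are the ones given in the example below; the 3-base "GGG" spacer, the fixed 20-nt primer length used to parse the PPK, the placement of spacers next to the PPK's flanking primers, and all function names are assumptions of this sketch, not part of the specification.

```python
# Toy, string-level simulation of the parallel ("array") readout in Figure 1.
# Assumptions (ours, not the text's): every Seq Primer is 20 nt; the common
# spacer is "GGG" and also separates the PPK's flanking primers from its Seq
# Primers; molecules are written 5'->3' on the top strand, so the 3' end carries
# the reverse complement of the R primer. "Sequencing" is modelled as a lookup.
F_PRIMER = "TGCACGTCAGGAGGTAGGTC"
R_PRIMER = "TGCTCACTAGCGCACACGCT"
SPACER = "GGG"
PRIMER_LEN = 20


def revcomp(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def make_idna(seq_primer: str, info: str) -> str:
    # F primer | Seq Primer | spacer | information segment | R primer site
    return F_PRIMER + seq_primer + SPACER + info + revcomp(R_PRIMER)


def make_ppk(seq_primers: list[str]) -> str:
    # F primer | spacer | Seq Primer 1 | spacer | ... | spacer | R primer site
    return F_PRIMER + SPACER + SPACER.join(seq_primers) + SPACER + revcomp(R_PRIMER)


def read_ppk(ppk: str) -> list[str]:
    """Recover the ordered Seq Primers from a sequenced PPK."""
    core = ppk[len(F_PRIMER):len(ppk) - len(R_PRIMER)]
    step = len(SPACER) + PRIMER_LEN
    primers = []
    for i in range(0, len(core) - len(SPACER), step):
        if core[i:i + len(SPACER)] != SPACER:
            raise ValueError(f"expected spacer at offset {i}")
        primers.append(core[i + len(SPACER):i + step])
    return primers


def sequence_from(pool: list[str], primer: str) -> str:
    """Mimic one sequencing reaction on the pooled PCR products."""
    hits = [m for m in pool if primer in m]
    if len(hits) != 1:
        raise ValueError(f"primer should match exactly one iDNA, matched {len(hits)}")
    read = hits[0][hits[0].index(primer) + len(primer):]
    if not read.startswith(SPACER):
        raise ValueError("spacer cue not found after primer")
    return read[len(SPACER):len(read) - len(R_PRIMER)]


def retrieve(ppk: str, pool: list[str]) -> list[str]:
    # One PPK read yields every address; the segment reads are independent.
    return [sequence_from(pool, p) for p in read_ppk(ppk)]


if __name__ == "__main__":
    seq_primers = ["TAGAGGGACTGTTCGGCAGC", "CCGTTCGGAGGATAGCGAGT"]
    segments = ["ATTTACTTT", "TCATAAAAT"]  # arbitrary A/C/T placeholders
    pool = [make_idna(p, s) for p, s in zip(seq_primers, segments)][::-1]
    print(retrieve(make_ppk(seq_primers), pool))  # ['ATTTACTTT', 'TCATAAAAT']
```

Reversing the pool shows that the physical order of iDNAs in the tube does not matter: the order comes entirely from the PPK, which is what makes the scheme resemble an array of addresses.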
The two standard molecular biological techniques employed in the model for information storage in DNA could form the basis for a variety of DNA-based memory storage devices. The combined operations of PCR followed by sequence analysis are directly analogous to the retrieval of information from an addressable storage device such as the random access memory (RAM) in a computer. The ability to employ these combined operations to retrieve data permits construction of DNA representations of classical computer data structures such as arrays, linked lists, and trees. The model depicted in Figure 1 is somewhat analogous to an array data structure. The PPK contains the addresses of the data elements (the iDNA information segments), which can be calculated (sequenced) and then employed for selective retrieval of the stored data. In an alternative serial model, analogous to a linked list, a series of iDNAs could be designed, each containing both a data element and the sequencing primer for retrieving the data element from the succeeding iDNA. Such a serial model would obviate the need for a separate PPK, but information retrieval would require considerably more experimental manipulations (and prior specific knowledge) than in the parallel model explored in detail here. Moreover, the use of DNA microarray technology should permit extensive scaleup of this model. Current microarray technology, in which up to 10,000 small DNA samples can be spotted in an ordered array onto a ~3 cm² surface (glass, etc.) (Gerhold, D. et al., 1999, TIBS 24:168), would have to be modified to permit spotting of DNA into small wells at a comparable density on a "microchip". Use of such a microchip for storage would impose two levels of order on the information stored in iDNAs placed in these "microwells": the X and Y coordinates of each microwell, plus the order within each microwell provided by the scheme described above and in Figure 1.
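For contrast, the serial model can be sketched the same way. Here each hypothetical `SerialIDNA` record carries its data element and the Seq Primer that addresses the next iDNA; the primer names ("P1", "P2", ...) and the decoded text are placeholders. Each step can only begin once the previous read has revealed the next primer, and the reader must be told the first primer in advance, which is the extra prior knowledge mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

# Toy sketch of the serial ("linked list") alternative: each iDNA holds a data
# element plus the Seq Primer that addresses the succeeding iDNA. No PPK is
# needed, but reads are strictly sequential. Primer names are placeholders.


@dataclass
class SerialIDNA:
    primer: str                 # Seq Primer that reads this iDNA's data
    data: str                   # information segment (shown decoded here)
    next_primer: Optional[str]  # Seq Primer for the succeeding iDNA (None = last)


def read_serial(pool: list[SerialIDNA], first_primer: str) -> list[str]:
    by_primer = {m.primer: m for m in pool}
    out: list[str] = []
    seen: set[str] = set()
    primer: Optional[str] = first_primer
    while primer is not None:
        if primer in seen:
            raise ValueError("chain of iDNAs loops back on itself")
        seen.add(primer)
        node = by_primer[primer]    # one sequencing reaction per step
        out.append(node.data)
        primer = node.next_primer   # known only after this read
    return out


if __name__ == "__main__":
    pool = [
        SerialIDNA("P2", "BEST OF TIMES", "P3"),
        SerialIDNA("P1", "IT WAS THE", "P2"),
        SerialIDNA("P3", "IT WAS THE WORST", None),
    ]
    print(read_serial(pool, "P1"))  # ['IT WAS THE', 'BEST OF TIMES', 'IT WAS THE WORST']
```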
A single series of unique identification primers, encoded within a single PPK (perhaps stored in the first microwell), should suffice to order the collection of iDNAs within every microwell, and thus permit readout in the proper order of the information stored on the entire chip. Because of the enormous number of different potential 20-base primer sequences, i.e., a theoretical maximum of $4^{20} \approx 1.1 \times 10^{12}$, the capacity for information storage in microarrayed DNA is presently limited by practical rather than theoretical considerations. It seems reasonable that with minor advances in microarray technology, about 700 novels or other data, each equivalent in size to the entire "A Tale of Two Cities", could be stored in a DNA microchip with the area of a postage stamp. A conservative upper limit on the size of the information segment is ~600 bases, set by the present limits on DNA sequence obtainable from a single sequencing primer. If four-base codons chosen from our three-base alphabet (A, C, T) were employed (to permit encoding of all common English alphanumeric characters plus a space), each iDNA could store about 150 characters. Storage of "A Tale of Two Cities", containing 135,345 words, or roughly 200,000 alphanumeric characters, would require ~1300 iDNAs. Current technology would permit single-pass sequence analysis of a single PPK containing up to 100 unique sequencing primers, implying that information could be stored in and retrieved from about 100 different iDNAs per microwell. Since 14 microwells would thus be required to store Dickens' novel in DNA form, a 10,000-well microchip could store ~700 texts, or equivalent information, the size of that novel.
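The capacity estimate can be re-derived step by step. The inputs below are the figures given in the text; the rounding (ceiling for counts, floor for texts per chip) and the hypothetical 100 × 100 well grid used to turn a segment index into (row, column, slot) coordinates are assumptions of this sketch, chosen to illustrate the two levels of order described above.

```python
import math

# Figures from the text; rounding choices are ours.
MAX_READ_BASES = 600     # conservative read length from one sequencing primer
CODON_LEN = 4            # four-base codons over {A, C, T}
NOVEL_CHARS = 200_000    # "A Tale of Two Cities", as estimated in the text
IDNAS_PER_WELL = 100     # Seq Primers readable from one PPK
WELLS_PER_CHIP = 10_000
PRIMER_LEN = 20

codons_available = 3 ** CODON_LEN                          # 81 distinct codons
chars_per_idna = MAX_READ_BASES // CODON_LEN               # 150
idnas_needed = math.ceil(NOVEL_CHARS / chars_per_idna)     # 1334, i.e. ~1300
wells_needed = math.ceil(idnas_needed / IDNAS_PER_WELL)    # 14
texts_per_chip = WELLS_PER_CHIP // wells_needed            # 714, i.e. ~700
primer_space = 4 ** PRIMER_LEN                             # ~1.1e12 distinct 20-mers


def address(segment_index: int, wells_per_row: int = 100) -> tuple[int, int, int]:
    """Map a global segment index to (row, column, slot in well).

    Assumes a hypothetical 100 x 100 grid of microwells; the slot is the
    position given by the PPK within that well.
    """
    well, slot = divmod(segment_index, IDNAS_PER_WELL)
    row, col = divmod(well, wells_per_row)
    return row, col, slot


if __name__ == "__main__":
    print(chars_per_idna, idnas_needed, wells_needed, texts_per_chip)
    print(address(idnas_needed - 1))  # last segment of the novel
```

The 81 available four-base codons comfortably cover 26 letters, 10 digits and a space, which is why the text moves from three-base to four-base codons for alphanumeric storage.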
6. EXAMPLE: DESIGN AND PRODUCTION OF INFORMATIONAL DNA
In the example described herein, the following method has been utilized. Text is encoded using only the bases A, C, and T. Seq Primers were designed using all four bases, plus a requirement that every 4th position be a G. The resultant mismatch at (at least) every 4th position between the sequences of any Seq Primer and any information segment should prevent mis-priming. Scaleup of the storage model presented here will ultimately require computer-generated design (see, e.g., M. Garzon, R. Deaton, J.A. Rose, in DNA Based Computers, DIMACS Series in Discrete Mathematics and Theoretical Computer Science vol. 54, E. Winfree, D.K. Gifford, Eds. (American Mathematical Society, Providence, RI, 2000), p. 91) of large numbers of both Seq Primer and information segment DNA sequences satisfying the constraints on these elements.
A simple prototype of this technique was carried out, employing only the bases A, C, and T (i.e., omitting G) to encode text. To facilitate future decoding of the information stored in the iDNAs, an "obvious" ternary code based upon alphabetical order was employed to store English text in DNA. The DNA bases were ordered "alphabetically" (A, C, T). DNA codons were then constructed via a ternary code, beginning with "AAA" (encoding the letter "A"). The bases C, and then T, were inserted progressively into the 3rd, 2nd, and 1st positions, yielding a series of 27 codons encoding the English letters in alphabetical order, plus a space. However, even if this encoding were not obvious to a future reader, recovery of a sufficient number of information segment sequences would permit the use of standard cryptanalytic techniques to determine how text had been encoded in the DNA. Two iDNAs were constructed to encode, respectively, "IT WAS THE BEST OF TIMES IT WAS THE WORST OF TIMES" and "IT WAS THE AGE OF FOOLISHNESS IT WAS THE EPOCH OF BELIEF". Not only is this text one of the most famous opening lines of a novel (C. Dickens, A Tale of Two Cities, Oxford University Press, London, New York, 1953; originally published in 1859), but the four-fold repetition of the phrase "it was the" provided a test of the ability of this approach to deal with repeated DNA sequences both within and between information segments. The scheme described above was employed to simultaneously PCR amplify the two iDNAs contained in a microtube, and then to sequence each of the two amplification products. Decoding of the resultant DNA sequences successfully recovered the stored text.
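The ternary code described above can be written as a small lookup table. This sketch assumes the 27th codon (TTT) encodes the space, since the text lists the space after the 26 letters; `encode` and `decode` are illustrative names, not part of the specification.

```python
from itertools import product

# Ternary code from the text: bases ordered A < C < T; codons run AAA, AAC,
# AAT, ACA, ... (3rd position varies fastest). Assumption: the 27th codon,
# TTT, encodes the space.
BASES = "ACT"
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # 27 codons, in order
ENCODE = dict(zip(SYMBOLS, CODONS))
DECODE = {codon: ch for ch, codon in ENCODE.items()}


def encode(text: str) -> str:
    return "".join(ENCODE[ch] for ch in text.upper())


def decode(dna: str) -> str:
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = encode("IT WAS")
    print(dna)          # ATTTACTTTTCCAAATAA
    print(decode(dna))  # IT WAS
```

Because the code uses only A, C and T, no encoded segment contains a G, which is the property the G-containing Seq Primer design relies on.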
The two Information DNAs (iDNAs) employed for the prototype experiment were designed as described above. All primers were designed to contain a G in every 4th position, and also to contain sequences that would minimize the possibility of cross-hybridization among the primers. To ensure complete sequencing of the information segment in each iDNA, a large 19-base common spacer sequence of alternating GTs was employed (substituting for the original design containing 19 straight Gs, which could not be synthesized by the supplier). However, since current sequencing technology permits determination of sequences quite close to the primer, it should be possible to employ instead small (3-4 base) common spacer sequences containing only Gs. The sequences of the common Forward and Reverse PCR primers (5' to 3') are, respectively, TGCACGTCAGGAGGTAGGTC and TGCTCACTAGCGCACACGCT. The sequences of the unique Sequencing Primers for the first and second iDNAs (5' to 3') are, respectively, TAGAGGGACTGTTCGGCAGC and CCGTTCGGAGGATAGCGAGT.
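The primer design rule can be checked against the two Sequencing Primers listed above. The 1-based phase of the G positions (3, 7, 11, 15, 19) is read off these two sequences rather than stated in the text, and the mismatch count compares sequences on the same strand, as the text's argument does. Because an A/C/T information segment contains no G, every window of it differs from such a primer at the five G positions at least. Neither helper is a substitute for a proper hybridization model.

```python
# Design-rule check for the example's Seq Primers. The G phase (1-based
# positions 3, 7, 11, 15, 19) is inferred from the two sequences; it is not
# stated in the text. Same-strand comparison only, no hybridization model.
SEQ_PRIMERS = ["TAGAGGGACTGTTCGGCAGC", "CCGTTCGGAGGATAGCGAGT"]


def g_every_4th(primer: str, first: int = 3) -> bool:
    """True if every 4th base, starting at 1-based position `first`, is G."""
    return all(primer[i] == "G" for i in range(first - 1, len(primer), 4))


def min_mismatches(primer: str, segment: str) -> int:
    """Fewest mismatches between `primer` and any same-length window of `segment`."""
    k = len(primer)
    if len(segment) < k:
        raise ValueError("segment is shorter than the primer")
    return min(
        sum(a != b for a, b in zip(primer, segment[i:i + k]))
        for i in range(len(segment) - k + 1)
    )


if __name__ == "__main__":
    segment = "ATTTACTTTTCCAAATAA" * 3  # any G-free A/C/T information segment
    for p in SEQ_PRIMERS:
        # Five Gs on the rule's positions, so at least 5 mismatches per window.
        print(p, g_every_4th(p), min_mismatches(p, segment))
```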
The two iDNAs (232 and 247 bases, respectively) were too long for commercial synthesis. They were therefore each constructed from overlapping short oligonucleotides (supplied by Genosys, Inc.), employing a modification of the single-step assembly PCR method (W.P.C. Stemmer, A. Crameri, K.D. Ha, T.M. Brennan, H.L. Heyneker, 1995, Gene 164, 49). These modifications, introduced to ensure specificity of formation of DNA molecules containing several repetitive regions, included use of larger starting oligonucleotides (23-104 bases), a lower MgCl2 concentration (1.5 mM) in the gene assembly mix, and a higher annealing temperature (60°C). Following this use of assembly PCR to generate long, double-stranded oligonucleotides corresponding to each iDNA, these oligonucleotides were PCR-amplified (25 cycles) as described (Ibid.), employing the common Forward and Reverse PCR Primer sequences described above. Following gel electrophoresis of the PCR products (on 3% low-melting agarose), fragments of the expected size were excised and cloned into pCR2.1-TOPO (Invitrogen), permitting storage of each iDNA in the form of a bacterial glycerol stock.
To generate iDNAs for information storage, 50 ng of each cloned vector was employed as a template for PCR amplification. The PCR mix contained the common Forward and Reverse PCR Primers (final concentrations 0.5 µM), 0.1 mM dNTPs, and 1 U Taq polymerase plus standard buffers supplied by Qiagen. PCR involved an initial denaturation stage at 94°C for 5 min, followed by 30 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 45 s, and then a final extension step of 72°C for 5 min.
Equal quantities (20 ng) of each iDNA were added to a microtube. The iDNAs were then simultaneously amplified in a PCR reaction identical to that described above, using the common Forward and Reverse PCR Primers. Aliquots (160 ng) of the resultant mixture of amplification products were employed for two sequencing reactions, each using 3 pmol of one of the above unique iDNA Sequencing Primers, and the products were analyzed on an ABI 377 sequencer. The Codon Table was then used to decode the resulting sequences of the two information segments. This procedure returned the correct input text, demonstrating the success of the prototype experiment.
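The cycling program above can be captured as data, which makes the total hold time easy to check: 5 min + 30 × (30 + 30 + 45) s + 5 min = 3,750 s, or 62.5 min. This sketch ignores ramp times between temperatures (an assumption), so a real run would take longer; the `Step` type and `PROGRAM` layout are hypothetical, not a thermocycler file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature, °C
    seconds: int    # hold time at that temperature


# (stage name, steps, repeats), taken from the PCR conditions in the text
PROGRAM = [
    ("initial denaturation", [Step(94, 5 * 60)], 1),
    ("cycling", [Step(94, 30), Step(60, 30), Step(72, 45)], 30),
    ("final extension", [Step(72, 5 * 60)], 1),
]


def total_hold_seconds(program) -> int:
    """Sum of hold times only; ramp times are not modelled."""
    return sum(repeats * sum(s.seconds for s in steps) for _, steps, repeats in program)


if __name__ == "__main__":
    print(total_hold_seconds(PROGRAM) / 60, "min")  # 62.5 min
```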

Claims

We claim:
1. A method for storing information in a nucleic acid molecule comprising:
(i) synthesis of an information nucleic acid molecule wherein said information is stored as a nucleotide sequence;
(ii) synthesis of a polyprimer key wherein said polyprimer key provides information necessary for amplification and sequencing of the information nucleic acid molecule, thereby permitting retrieval of the stored information.
2. The method of claim 1 wherein said information nucleic acid molecule comprises:
(i) a forward and reverse polymerase chain reaction primer recognition sequence;
(ii) a sequencing primer recognition sequence;
(iii) at least one common spacer; and
(iv) an information segment.
3. The method of claim 1 wherein said polyprimer key comprises:
(i) a forward and reverse polymerase chain reaction primer recognition sequence;
(ii) at least one common spacer; and
(iii) a sequencing primer recognition sequence.
4. A method for retrieving stored information in a nucleic acid molecule comprising:
(i) amplification of a polyprimer key nucleic acid molecule using primers that bind to the primer recognition sequences of the polyprimer key;
(ii) deriving the nucleotide sequence of the sequencing primer recognition sequence, thereby providing the primer sequence required for amplification and sequencing of an information nucleic acid molecule;
(iii) amplification of the information nucleic acid molecule using primers that bind to the primer recognition sequences of the information nucleic acid molecule; and
(iv) deriving the nucleotide sequence of the information nucleic acid molecule.
5. The method of claim 1 wherein the information nucleic acid molecule is stored on a microchip.
6. The method of claim 1 wherein the information nucleic acid molecule is co-linear with the polyprimer key.
PCT/US2002/027606 2001-08-28 2002-08-28 Dna: a medium for long-term information storage specification WO2003025123A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US31557601P 2001-08-28 2001-08-28
US60/315,576 2001-08-28

Publications (1)

Publication Number Publication Date
WO2003025123A2 true WO2003025123A2 (en) 2003-03-27

Family

ID=23225062

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/027606 WO2003025123A2 (en) 2001-08-28 2002-08-28 Dna: a medium for long-term information storage specification

Country Status (1)

Country Link
WO (1) WO2003025123A2 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013178801A2 (en) 2012-06-01 2013-12-05 European Molecular Biology Laboratory High-capacity storage of digital information in dna
WO2014014991A3 (en) * 2012-07-19 2014-03-27 President And Fellows Of Harvard College Methods of storing information using nucleic acids
WO2015090879A1 (en) * 2013-12-18 2015-06-25 Ge Healthcare Uk Limited Oligonucleotide data storage on solid supports
WO2016164779A1 (en) * 2015-04-10 2016-10-13 University Of Washington Integrated system for nucleic acid-based storage of digital data
EP3098742A1 (en) * 2015-05-26 2016-11-30 Thomson Licensing Method and apparatus for creating a plurality of oligos with a targeted distribution of nucleotide types
US9928869B2 (en) 2015-07-13 2018-03-27 President And Fellows Of Harvard College Methods for retrievable information storage using nucleic acids
WO2018081745A1 (en) 2016-10-31 2018-05-03 Dodo Omnidata, Inc. Methods, compositions, and devices for information storage
WO2018094108A1 (en) * 2016-11-16 2018-05-24 Catalog Technologies, Inc. Nucleic acid-based data storage
CN109943560A (en) * 2018-11-22 2019-06-28 西藏自治区人民政府驻成都办事处医院 Chinese character information storage method based on DNA vector
WO2020005598A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Whole pool amplification and in-sequencer random-access of data encoded by polynucleotides
US10640822B2 (en) 2016-02-29 2020-05-05 Iridia, Inc. Systems and methods for writing, reading, and controlling data stored in a polymer
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
JP2020534633A (en) * 2017-07-25 2020-11-26 ナンジンジンスールイ サイエンス アンド テクノロジー バイオロジー コーポレイション DNA-based data storage and data retrieval
US10859562B2 (en) 2016-02-29 2020-12-08 Iridia, Inc. Methods, compositions, and devices for information storage
US11106633B2 (en) * 2018-04-24 2021-08-31 EMC IP Holding Company, LLC DNA-based data center with deduplication capability
US11227219B2 (en) 2018-05-16 2022-01-18 Catalog Technologies, Inc. Compositions and methods for nucleic acid-based data storage
US11286479B2 (en) 2018-03-16 2022-03-29 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11306353B2 (en) 2020-05-11 2022-04-19 Catalog Technologies, Inc. Programs and functions in DNA-based data storage
US11315023B2 (en) 2018-04-13 2022-04-26 The Hong Kong Polytechnic University Data storage using peptides
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US11837302B1 (en) 2020-08-07 2023-12-05 Iridia, Inc. Systems and methods for writing and reading data stored in a polymer using nano-channels
US12002547B2 (en) 2023-02-09 2024-06-04 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387301B2 (en) 2012-06-01 2019-08-20 European Molecular Biology Laboratory High-capacity storage of digital information in DNA
WO2013178801A3 (en) * 2012-06-01 2014-01-23 European Molecular Biology Laboratory High-capacity storage of digital information in dna
JP2020119576A (en) * 2012-06-01 2020-08-06 ヨーロピアン モレキュラー バイオロジー ラボラトリーEuropean Molecular Biology Laboratory High-capacity storage of digital information in dna
EP3346404A1 (en) 2012-06-01 2018-07-11 European Molecular Biology Laboratory High-capacity storage of digital information in dna
WO2013178801A2 (en) 2012-06-01 2013-12-05 European Molecular Biology Laboratory High-capacity storage of digital information in dna
JP2015529864A (en) * 2012-06-01 2015-10-08 ヨーロピアン モレキュラー バイオロジー ラボラトリーEuropean Molecular Biology Laboratory High capacity storage of digital information in DNA
AU2020202857B2 (en) * 2012-06-01 2022-03-17 European Molecular Biology Laboratory High-Capacity Storage of Digital Information in DNA
AU2018247323B2 (en) * 2012-06-01 2020-01-30 European Molecular Biology Laboratory High-Capacity Storage of Digital Information in DNA
IL290490B1 (en) * 2012-06-01 2023-04-01 European Molecular Biology Laboratory Embl High capacity storage of digital information in dna
JP2021144745A (en) * 2012-06-01 2021-09-24 ヨーロピアン モレキュラー バイオロジー ラボラトリーEuropean Molecular Biology Laboratory High-capacity storage of digital information in DNA
CN107055468A (en) * 2012-06-01 2017-08-18 欧洲分子生物学实验室 The high-capacity storage of digital information in DNA
US11892945B2 (en) 2012-06-01 2024-02-06 European Molecular Biology Laboratory High-capacity storage of digital information in DNA
JP2019023890A (en) * 2012-06-01 2019-02-14 ヨーロピアン モレキュラー バイオロジー ラボラトリーEuropean Molecular Biology Laboratory High capacity storage of digital information with DNA
AU2013269536B2 (en) * 2012-06-01 2018-11-08 European Molecular Biology Laboratory High-capacity storage of digital information in DNA
JP7431775B2 (en) 2012-06-01 2024-02-15 ヨーロピアン モレキュラー バイオロジー ラボラトリー High capacity storage of digital information in DNA
US9384320B2 (en) 2012-07-19 2016-07-05 President And Fellows Of Harvard College Methods of storing information using nucleic acids
JP2015533077A (en) * 2012-07-19 2015-11-19 プレジデント アンド フェローズ オブ ハーバード カレッジ Information storage method using nucleic acid
US9996778B2 (en) 2012-07-19 2018-06-12 President And Fellows Of Harvard College Methods of storing information using nucleic acids
WO2014014991A3 (en) * 2012-07-19 2014-03-27 President And Fellows Of Harvard College Methods of storing information using nucleic acids
CN108875312A (en) * 2012-07-19 2018-11-23 哈佛大学校长及研究员协会 Utilize the method for nucleic acid storage information
US11900191B2 (en) 2012-07-19 2024-02-13 President And Fellows Of Harvard College Methods of storing information using nucleic acids
CN104662544A (en) * 2012-07-19 2015-05-27 哈佛大学校长及研究员协会 Methods of storing information using nucleic acids
US10460220B2 (en) 2012-07-19 2019-10-29 President And Fellows Of Harvard College Methods of storing information using nucleic acids
US11931713B2 (en) 2013-12-18 2024-03-19 Global Life Sciences Solutions Operations UK Ltd Oligonucleotide data storage on solid supports
WO2015090879A1 (en) * 2013-12-18 2015-06-25 Ge Healthcare Uk Limited Oligonucleotide data storage on solid supports
US11164661B2 (en) 2015-04-10 2021-11-02 University Of Washington Integrated system for nucleic acid-based storage and retrieval of digital data using keys
WO2016164779A1 (en) * 2015-04-10 2016-10-13 University Of Washington Integrated system for nucleic acid-based storage of digital data
EP3098742A1 (en) * 2015-05-26 2016-11-30 Thomson Licensing Method and apparatus for creating a plurality of oligos with a targeted distribution of nucleotide types
JP2018527900A (en) * 2015-07-13 2018-09-27 プレジデント アンド フェローズ オブ ハーバード カレッジ Method for recoverable information storage using nucleic acids
US11532380B2 (en) 2015-07-13 2022-12-20 President And Fellows Of Harvard College Methods for using nucleic acids to store, retrieve and access information comprising a text, image, video or audio format
US9928869B2 (en) 2015-07-13 2018-03-27 President And Fellows Of Harvard College Methods for retrievable information storage using nucleic acids
US10289801B2 (en) 2015-07-13 2019-05-14 President And Fellows Of Harvard College Methods for retrievable information storage using nucleic acids
CN108026557A (en) * 2015-07-13 2018-05-11 哈佛学院董事及会员团体 It is used for the method for retrievable information storage using nucleic acid
US10714178B2 (en) 2016-02-29 2020-07-14 Iridia, Inc. Methods, compositions, and devices for information storage
US10995373B2 (en) 2016-02-29 2021-05-04 Iridia, Inc. Systems and methods for writing, reading, and controlling data stored in a polymer
US10438662B2 (en) 2016-02-29 2019-10-08 Iridia, Inc. Methods, compositions, and devices for information storage
US11549140B2 (en) 2016-02-29 2023-01-10 Iridia, Inc. Systems and methods for writing, reading, and controlling data stored in a polymer
US10859562B2 (en) 2016-02-29 2020-12-08 Iridia, Inc. Methods, compositions, and devices for information storage
US11505825B2 (en) 2016-02-29 2022-11-22 Iridia, Inc. Methods of synthesizing DNA
US10640822B2 (en) 2016-02-29 2020-05-05 Iridia, Inc. Systems and methods for writing, reading, and controlling data stored in a polymer
WO2018081745A1 (en) 2016-10-31 2018-05-03 Dodo Omnidata, Inc. Methods, compositions, and devices for information storage
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
US11379729B2 (en) 2016-11-16 2022-07-05 Catalog Technologies, Inc. Nucleic acid-based data storage
GB2563105B (en) * 2016-11-16 2022-10-19 Catalog Tech Inc Nucleic acid-based data storage
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
WO2018094108A1 (en) * 2016-11-16 2018-05-24 Catalog Technologies, Inc. Nucleic acid-based data storage
GB2563105A (en) * 2016-11-16 2018-12-05 Catalog Tech Inc Nucleic acid-based data storage
JP7090148B2 (en) 2017-07-25 2022-06-23 ナンジン ジェンスクリプト バイオテック カンパニー,リミテッド DNA-based data storage and data retrieval
JP2020534633A (en) * 2017-07-25 2020-11-26 ナンジンジンスールイ サイエンス アンド テクノロジー バイオロジー コーポレイション DNA-based data storage and data retrieval
US11286479B2 (en) 2018-03-16 2022-03-29 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11315023B2 (en) 2018-04-13 2022-04-26 The Hong Kong Polytechnic University Data storage using peptides
US11106633B2 (en) * 2018-04-24 2021-08-31 EMC IP Holding Company, LLC DNA-based data center with deduplication capability
US11227219B2 (en) 2018-05-16 2022-01-18 Catalog Technologies, Inc. Compositions and methods for nucleic acid-based data storage
WO2020005598A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Whole pool amplification and in-sequencer random-access of data encoded by polynucleotides
US11651836B2 (en) 2018-06-29 2023-05-16 Microsoft Technology Licensing, Llc Whole pool amplification and in-sequencer random-access of data encoded by polynucleotides
CN109943560A (en) * 2018-11-22 2019-06-28 西藏自治区人民政府驻成都办事处医院 Chinese character information storage method based on DNA vector
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
US11306353B2 (en) 2020-05-11 2022-04-19 Catalog Technologies, Inc. Programs and functions in DNA-based data storage
US11837302B1 (en) 2020-08-07 2023-12-05 Iridia, Inc. Systems and methods for writing and reading data stored in a polymer using nano-channels
US12006497B2 (en) 2022-02-17 2024-06-11 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US12002547B2 (en) 2023-02-09 2024-06-04 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US12001962B2 (en) 2023-08-04 2024-06-04 Catalog Technologies, Inc. Systems for nucleic acid-based data storage

Similar Documents

Publication Publication Date Title
WO2003025123A2 (en) Dna: a medium for long-term information storage specification
Organick et al. Random access in large-scale DNA data storage
US5695940A (en) Method of sequencing by hybridization of oligonucleotide probes
US6018041A (en) Method of sequencing genomes by hybridization of oligonucleotide probes
Hao et al. Data storage based on DNA
US7230093B2 (en) Method of sequencing by hybridization of oligonucleotide probes
Organick et al. Scaling up DNA data storage and random access retrieval
Konopka Sequences and codes: fundamentals of biomolecular cryptology
US11845982B2 (en) Key-value store that harnesses live micro-organisms to store and retrieve digital information
Garafutdinov et al. Encoding of non-biological information for its long-term storage in DNA
Wang et al. Data Storage Using DNA
Choi et al. Addition of degenerate bases to DNA-based data storage for increased information capacity
US6670120B1 (en) Categorising nucleic acid
KR101953663B1 (en) Method for generating pool containing oligonucleotides from a oligonucleotide
Lavenier DNA Storage: Synthesis and Sequencing Semiconductor Technologies
CN110982877A (en) RNA transcription-based nucleic acid information repeated reading method
De Giorgi et al. Mitochondrial DNA in the sea urchin Arbacia lixula: Comparison between echinoid orders
Nagl Molecular evolution
Patel et al. Deoxyribonucleic acid as a tool for digital information storage: an overview
Rawal et al. Unlocking the Future: DNA Encryption for Secure and Efficient Massive Data Storage.
Kari et al. How to Compute with DNA
Harvest Science’s
Xie CS284A Introduction to Computational Biology and Bioinformatics
Jamal et al. First Generation–The Sanger Shotgun Approach
Harvest COM PASS COM PASS

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application