US20190138909A1 - Method for using dna to store text information, decoding method therefor and application thereof - Google Patents

Method for using DNA to store text information, decoding method therefor and application thereof

Info

Publication number
US20190138909A1
Authority
US
United States
Prior art keywords
dna sequence
encoding
dna
character
adapter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/098,471
Other versions
US10839295B2 (en)
Inventor
Yue Shen
Tai Chen
Longying LIU
Shihong CHEN
Yun Wang
Huanming Yang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd
Assigned to BGI SHENZHEN: assignment of assignors' interest (see document for details). Assignors: CHEN, TAI; YANG, HUANMING; CHEN, Shihong; LIU, Longying; SHEN, YUE; WANG, YUN
Publication of US20190138909A1
Application granted
Publication of US10839295B2
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/123 DNA computing
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07H SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
    • C07H21/00 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
    • C07H21/04 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids with deoxyribosyl as saccharide radical
    • G06F17/2205
    • G06F17/2217
    • G06F17/275
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/123 Storage facilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0009 RRAM elements whose operation depends upon chemical change
    • G11C13/0014 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material
    • G11C13/0019 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material comprising bio-molecules
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/02 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using elements whose operation depends upon chemical change
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20 Heterogeneous data integration
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • the present invention belongs to the technical field of DNA-based storage, and in particular relates to a method for encoding and storing text information by using DNA as a storage medium, and a decoding method therefor and application thereof.
  • DNA-based storage is a future-focused, subversive information storage technology.
  • the use of DNA as an information storage medium has many advantages such as small volume, large storage capacity, strong stability and low cost of maintenance.
  • 1 gram of DNA can store thousands of terabytes of data, from which it is estimated that all the existing information of human beings, including books, files, videos, etc., could be stored in only hundreds of kilograms of DNA, with a storage time of up to thousands of years under normal conditions. Therefore, information that is not commonly used but needs long-term preservation, such as government documents and historical files, is especially suitable for DNA-based storage.
  • Although DNA-based storage has many advantages over existing storage, some technical barriers hinder its development, such as the inability to reuse synthetic DNA oligo fragments, the high cost of DNA synthesis, complex design and poor flexibility, which make large-scale promotion and application of the existing DNA-based storage technology difficult. Therefore, it is necessary to start from the design of the basic information-constituting unit to optimize the coding design of DNA-based storage, thereby reducing costs and improving efficiency and convenience.
  • the method for storing text information generally comprises: firstly, encoding a character into a computer binary digit by encoding, and then converting the binary digit into a DNA sequence by transcoding; and secondly, artificially synthesizing the DNA sequence encoding the character information and locating the character by a designed ligation adapter to assemble the DNA sequences encoding the characters in a preset order.
  • the assembled DNA sequences can be further assembled into a longer DNA sequence as needed.
  • each character can be used repeatedly, and by changing the adapter, can be used for storing any information, the principle of which is the same as that of the “movable-type printing” strategy.
  • the DNA which has stored text information can be preserved under appropriate conditions.
  • the stored character information can be obtained by sequencing the DNA sequence followed by decoding with a computer (as shown in FIG. 1 ).
  • by using DNA as a storage medium, the method provided by the present invention has the advantages of small storage volume, large storage capacity, strong stability and low cost of maintenance, etc.
  • the present invention provides a method for storing text information by using DNA as a storage medium, comprising the steps of:
  • the encoding is Unicode UCS-2 encoding; that is, each Chinese character is encoded as a four-digit hexadecimal number, for example, the corresponding Unicode code of the character “[Figure P00001]” is U+5535; each hexadecimal digit is converted into a 4-bit binary digit, for example, 5 is converted into 0101 and 3 is converted into 0011, and thus the character “[Figure P00001]” is converted into the binary digit 0101010100110101; preferably, each 8-bit binary digit produces a 4-bit Hamming code for verification, and thus the Hamming codes of the character “[Figure P00001]” are 0010 and 1110 respectively. Finally, a complete binary code of the character “[Figure P00001]” can be obtained, that is 010101010010001101011110.
  • the transcoding is performed according to the principle that the binary digit 0 is converted into G or T and the binary digit 1 is converted into C or A so as to convert the binary digit encoding a character into a DNA sequence.
  • one Chinese character is encoded into 24 bases.
  • the sequence design is controlled by considering one or more parameters including the GC content, secondary structure and base repetition rate of the DNA sequence; for example, preferably, the DNA sequence is designed such that the GC content thereof is 45-60%, preferably 50%; preferably, the DNA sequence is designed to avoid the formation of secondary structure; preferably, the DNA sequence is designed such that no single base occurs more than 2 times consecutively. Taking the character “[Figure P00001]” as an example, it is finally converted into the DNA sequence TAGCTATAGGCTTGCATAGCACCG.
  • Both the DNA sequence and the ligation adapter sequence in the present invention are obtained by de novo chemically synthesizing the forward and reverse strands and allowing them to anneal to form a double-stranded structure.
  • a complementary locating base protrudes from both the DNA sequence fragment and the ligation adapter.
  • the directional ligation of the DNA sequence to the ligation adapter is achieved via the complementary bases (i.e., “locating base”) respectively protruding from the DNA sequence and the ligation adapter.
  • the ligation adapter comprises an upstream adapter and a downstream adapter; ligation adapters with the same DNA sequence but different overhanging locating bases will be linked to the upstream and downstream of two DNA fragments respectively, and the resulting two DNA fragments can be ligated via the ligation adapters by using a conventional molecular biology method, preferably by PCA, GoldenGate, etc. (as shown in FIG. 2 ).
  • one base protrudes from each end of a DNA fragment respectively, such as a base “A” protrudes from the sense strand and a base “G” protrudes from the antisense strand of the DNA fragment at 5′-end, in which case, a base “T” should protrude from the antisense strand of the corresponding upstream adapter and a base “C” should protrude from the sense strand of the downstream adapter such that the directional ligation of the fragment to the adapter can be achieved by means of A/T and G/C pairing, that is, the adapter overhanging a “T” can only be linked at upstream of the DNA fragment and the adapter overhanging a “C” can only be linked at downstream of the DNA fragment.
  • the bases protruding from upstream and downstream of a DNA fragment may also be A/C, T/G and T/C, etc., in which case the corresponding bases protruding from the adapter become T/G, A/C and A/G, etc., correspondingly (as shown in FIG. 3A ).
  • more than one unpaired base may protrude from each of the DNA fragment and the adapter.
  • the base may protrude from the DNA fragment and the adapter at the 3′-end thereof.
  • a base “G” may protrude from the sense strand of the DNA fragment at both the 5′-end and the 3′-end, in which case a base “C” should protrude from the antisense strand of the corresponding upstream adapter at the 5′-end and from the antisense strand of the downstream adapter at the 3′-end, also allowing the directional ligation of the fragment to the adapter.
  • the bases protruding from the sense strand of a DNA fragment at 5′-end and 3′-end may also be “C”, “T” or “A”, in which case the bases protruding from the adapter should become “G”, “A” or “T”, correspondingly.
  • “G”, “C”, “T” or “A” may protrude from the antisense strand of a DNA fragment, in which case “C”, “G”, “A” or “T” should protrude from the sense strand of the adapter, correspondingly (as shown in FIG. 3B ).
  • more than one complementary base may protrude from each of the DNA fragment and the adapter.
  • Sequence of the ligation adapter can be automatically generated by a computer program.
  • a PCA adapter needs to have a length of more than 8 bp, a GC content of 50%-60%, no secondary structure, no more than 2 consecutive bases, and no mismatch between the same set of adapters, etc.
  • a GoldenGate adapter consists of an enzymatic cleavage site sequence and its outer protective bases, and the difference in the 4 bp sticky ends resulting from enzyme restriction between the same set of adapters needs to be more than 2 bp (as shown in FIG. 3C ).
  • the 5′-ends of the sense and antisense strands of the DNA fragment, the antisense strand of the upstream adapter, and the sense strand of the downstream adapter are phosphorylated.
  • the 5′-ends of the sense strand of the upstream adapter and the antisense strand of the downstream adapter are dephosphorylated to reduce the probability of self-linking and misligation of the adapters.
  • the designed ligation adapters are respectively added to the DNA sequences encoding the characters, through which locating is achieved; in a particular embodiment, by overlap extension PCR, individual DNA sequences comprising the encoding information of individual characters are ligated according to the character order of the information to be stored, and the ligated sequences can be further assembled into a longer DNA sequence; preferably, individual DNA sequences comprising the encoding information of individual characters are ligated by a method such as PCA or GoldenGate; preferably, the ligated DNA sequences are assembled by a Gibson method, and the assembled DNA sequence which encodes the character information can be preserved under suitable storage conditions, for example, it can be lyophilized for long-term storage at low temperatures.
  • characters may be firstly assembled into a form of phrase or idiom, etc., such that the subsequent assembly becomes more convenient and efficient; for example, 10-20 characters may be assembled into a short sentence at once by using a method such as PCA or GoldenGate, etc., and then the short sentences can be further spliced into a long sentence, a paragraph or an article by using an assembly method such as Gibson assembly, etc.
  • the assembled DNA sequence is cloned into a plasmid for storage; preferably, a step of verifying the correctness of the assembled DNA sequence by sequencing is also included prior to the storage.
  • the present invention also provides a method for decoding the text information stored according to the method of the first aspect, comprising the steps of:
  • sequencing the DNA sequence which stores text information, for example, by Sanger sequencing, second generation sequencing, third generation sequencing or other sequencing methods;
  • When a mutation exists in the DNA sequence, it can be corrected by the Hamming code. For example, if the base at position 2 of the above DNA sequence is mutated from A to G, i.e. the DNA sequence becomes TGGCTATAGGCTTGCATAGCACCG, the corresponding binary digit will become 000101010010001101011110. It can be calculated according to the Hamming code verification principle that the base at position 2 is mutated, and thereby the binary digit will be corrected to 010101010010001101011110 and the sequence can still be correctly decoded as “[Figure P00001]”.
  • the decoding may further include the step of correcting mutations in the DNA sequence according to the Hamming code verification principle.
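  • The correction step can be illustrated with a minimal sketch. It assumes a standard Hamming(12,8) layout with even parity, in which the 8 data bits occupy code positions 3, 5, 6, 7, 9, 10, 11 and 12 and the 4 check bits (positions 1, 2, 4 and 8) are appended after the data bits. The source does not spell out this layout; it is assumed here because it reproduces the check bits and the corrected value given in the example. The names `correct_block` and `decode_fragment` are hypothetical.

```python
# Assumed Hamming(12,8) layout: data bits at these code positions,
# check bits at positions 1, 2, 4, 8 appended after the 8 data bits.
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)
PARITY_POS = (1, 2, 4, 8)
BASE_TO_BIT = {"G": "0", "T": "0", "C": "1", "A": "1"}


def correct_block(block: str) -> str:
    """Take 8 data bits followed by 4 check bits; return the corrected 8 data bits."""
    word = dict(zip(DATA_POS + PARITY_POS, (int(b) for b in block)))
    syndrome = 0
    for p in PARITY_POS:
        # Even parity over every code position whose index has bit p set.
        if sum(v for pos, v in word.items() if pos & p) % 2:
            syndrome += p
    if syndrome:
        if syndrome > 12:
            raise ValueError("more than one error in block; cannot correct")
        word[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    return "".join(str(word[pos]) for pos in DATA_POS)


def decode_fragment(seq: str) -> str:
    """Decode one 24-base character fragment into its character."""
    if len(seq) != 24:
        raise ValueError("expected a 24-base fragment")
    bits = "".join(BASE_TO_BIT[base] for base in seq)
    data = correct_block(bits[:12]) + correct_block(bits[12:])
    return chr(int(data, 2))


# Both the intact sequence and the A-to-G mutant at position 2 decode to U+5535.
intact = decode_fragment("TAGCTATAGGCTTGCATAGCACCG")
mutant = decode_fragment("TGGCTATAGGCTTGCATAGCACCG")
```

  • In this sketch each 12-bit block tolerates one substituted base, which is consistent with the stated tolerance of one mutation in every 12 bases; insertions and deletions are not handled.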
  • the present invention provides use of the method for storing text information according to the first aspect and/or the method for decoding the stored text information according to the second aspect in the storage and/or reading of text information.
  • the method for storing text information according to the present invention has the advantages of small storage volume, large storage capacity, strong stability and low cost of maintenance, etc. by using DNA as a storage medium.
  • the present method is more suitable for storing text information, supporting text forms including characters of various countries, including all Chinese characters, English letters, Japanese and Korean, etc., as well as punctuation marks and mathematical symbols, etc.; has high encoding efficiency, wherein 1000 Chinese characters can be encoded within 100 milliseconds; adopts a strategy similar to “movable-type printing”, wherein the DNA fragments and adapters can be used repeatedly, resulting in lower cost of synthesis; the stored DNA sequence can be in a double-stranded closed circular conformation, which is more stable in storage; the stored DNA sequence can be verified by sequencing and the Hamming verification code can be added thereto, allowing any one mutation in every 12 bases, which results in higher fidelity; and the stored DNA sequence is a long double-stranded DNA, which makes the information easier to read.
  • FIG. 1 shows a schematic view showing the overall flow of Example 1 of the present invention
  • FIG. 2 shows a schematic view showing the assembly flow of Example 1 of the present invention
  • FIG. 3 shows schematic views showing the designs of fragments/adapters of Example 1 of the present invention
  • FIG. 4 shows electrophoretograms of the assembly test results of Example 1 of the present invention.
  • the experimental procedures mentioned in the examples are conventional experimental methods unless otherwise specified; the reagents and consumables mentioned are conventional reagents and consumables unless otherwise specified.
  • the synthetic oligos used in the experiments were diluted to 100 μM and the primers were diluted to 10 μM with sterile water.
  • Transcoding was performed according to the method of the present invention and DNA oligo sequences were synthesized; the PCA adapters were used for locating and ligating according to G/C and A/T base pairing manner at the upstream and downstream, respectively.
  • the DNA oligo and primer sequences were shown in Table 1 below.
  • the forward and reverse oligos were each taken 10 μL for each character or adapter, mixed and annealed; the annealing procedure was: denatured at 99° C. for 10 min, slowly cooled to 25° C. at 0.1° C./sec, maintained at 12° C.
  • the ligation system was: 1 μL of T4 DNA ligase (Enzymatics), 10 μL of 2× ligation buffer, 2 μL each of the annealed character, upstream adapter and downstream adapter, 3 μL of ddH2O; ligating at 16° C. overnight.
  • the ligation products were subjected to gel electrophoresis on 15% PAGE gel at 100 V for 1 h (the electrophoresis results were shown in FIG. 4A ); the target bands (42 bp in size, as indicated by the arrow in FIG. 4A ) were cut off and purified: the cut gel was placed into a 0.5 mL tube with punctured bottom, which was then placed into a 2 mL tube, centrifuged at 14,000 rpm for 2 min, 200 μL of 0.3 M NaCl was added to the broken gel, shaken at 1400 rpm at 25° C.
  • Step 1: 0.2 μL of Ex Taq DNA polymerase (TAKARA), 2 μL of 10× buffer, 1.6 μL of 2.5 mM dNTPs, 50 ng each of the ligated, cut and purified products of adapter1-U+ +adapter2-D, adapter2-U+ +adapter5-D, adapter5-U+ +adapter6-D, adapter6-U+ +adapter7-D, adapter7-U+ +adapter8-D, adapter8-U+ adapter9-D and adapter9-U+ +adapter10-D, adding water to 20 μL. 94° C. for 5 min; 94° C.
  • Step 2: 0.2 μL of Ex Taq DNA polymerase (TAKARA), 2 μL of 10× buffer, 1.6 μL of 2.5 mM dNTPs, 3 μL of the PCR product of step 1, 1 μL each of the primers 12-F and St1-R, 11.2 μL of ddH2O. 94° C. for 5 min; 94° C. for 30 sec, 60° C. for 30 sec, 72° C. for 30 sec, 35 cycles, 72° C. for 5 min, maintained at 12° C.
  • the PCR product was detected by electrophoresis: 5 μL of PCR product was used for electrophoresis detection. The electrophoresis was performed with 2% agarose gel at 180 V for 30 min (The electrophoresis result was shown in FIG. 4B . The PCR product was approximately 280 bp in size as indicated by the arrows).
  • the PCR product obtained in step 4 was purified by gel purification with a gel purification kit and the purified PCR product was cloned with a TA cloning kit (TAKARA).
  • Monoclones were selected from the TA cloning plate obtained in step 5, incubated overnight, followed by plasmid extraction with a kit and identification by restriction digestion with the designed BssHII restriction site.
  • the digestion system was: 0.5 μL of BssHII (NEB), 1 μL of CutSmart buffer, 4 μL of plasmid DNA and 4.5 μL of ddH2O. Digested at 37° C. for 1 h. 5 μL of the digested product was used for electrophoresis with 2% agarose gel at 180 V for 30 min (the electrophoresis results were shown in FIG. 4C and the band with correct size was indicated by the arrow).
  • the correctly enzyme-digested plasmid was selected for Sanger sequencing, and the plasmid with the correct assembly sequence was analyzed and obtained.
  • adapter2-UR TCGCGAATT (SEQ ID NO. 18)
  • adapter2-DF CAATTCGCG (SEQ ID NO. 19)
  • adapter2-DR CGCGAATT (SEQ ID NO. 20)
  • adapter5-UF CCTTTAGC (SEQ ID NO. 21)
  • adapter5-UR TGCTAAAGG (SEQ ID NO. 22)
  • adapter5-DF CCCTTTAGC (SEQ ID NO. 23)
  • adapter5-DR GCTAAAGG
  • adapter6-UF ACAGAGAC
  • adapter6-UR TGTCTCTGT (SEQ ID NO. 26)
  • adapter6-DF CACAGAGAC (SEQ ID NO. 27)
  • adapter6-DR GTCTCTGT
  • adapter7-UF CCGTCATA (SEQ ID NO. 29)
  • adapter7-UR TTATGACGG (SEQ ID NO. 30)
  • adapter7-DF CCCGTCATA (SEQ ID NO. 31)
  • adapter7-DR TATGACGG (SEQ ID NO. 32)
  • adapter8-UF GGATCTAC (SEQ ID NO. 33)
  • adapter8-UR TGTAGATCC (SEQ ID NO. 34)
  • adapter8-DF CGGATCTAC (SEQ ID NO. 35)
  • adapter8-DR GTAGATCC (SEQ ID NO. 36)
  • adapter9-UF GTTGCATC (SEQ ID NO. 37)
  • adapter9-UR TGATGCAAC
  • adapter9-DF CGTTGCATC (SEQ ID NO. 39)
  • adapter9-DR GATGCAAC (SEQ ID NO. 40)
  • adapter10-DF CATCGGGAA (SEQ ID NO. 41)
  • adapter10-DR TTCCCGAT (SEQ ID NO. 42)
  • St12-F GCGCGCTTGGTTCAGACGTGAGAAGTGATGCACAGTAGCTTAACGCGTATGACATCGCA (SEQ ID NO. 43)
  • St1-R GCGCGCTACCGTACTAGGATCCAATCG (SEQ ID NO. 44)

Abstract

A method for encoding and storing text information using DNA as a storage medium, a decoding method therefor and an application thereof. The method for using DNA to store text information comprises: encoding characters into computer binary digits by means of encoding, and converting the binary digits into DNA sequences by means of transcoding; and artificially synthesizing the DNA sequences encoded with character information, positioning the characters by means of a designed ligation adapter, and assembling the DNA sequences encoded with the character information according to a pre-set order. The method for using DNA to store text information has the advantages of a small storage volume, a large storage capacity, a strong stability and low maintenance costs.

Description

    TECHNICAL FIELD
  • The present invention belongs to the technical field of DNA-based storage, and in particular relates to a method for encoding and storing text information by using DNA as a storage medium, and a decoding method therefor and application thereof.
  • BACKGROUND
  • With the development of human society, the accumulated amount of information shows an explosive growth trend. It has been predicted in IDC's report “Digital Universe in 2020” that by 2020 the total amount of global data will exceed 40 ZB! Moreover, the amount of global data is still growing rapidly at a rate of 58% per year, and a large amount of valid data is being lost. Data storage is a problem all over the world. The commonly used storage media at present, such as optical disks and hard disks, have disadvantages such as low storage capacity, large volume, high cost of maintenance and short storage time (˜50 years). In order to solve these problems fundamentally, it is necessary to develop a novel information storage medium as soon as possible.
  • DNA-based storage is a future-focused, subversive information storage technology. The use of DNA as an information storage medium has many advantages such as small volume, large storage capacity, strong stability and low cost of maintenance. Theoretically, 1 gram of DNA can store thousands of terabytes of data, from which it is estimated that all the existing information of human beings, including books, files, videos, etc., could be stored in only hundreds of kilograms of DNA, with a storage time of up to thousands of years under normal conditions. Therefore, information that is not commonly used but needs long-term preservation, such as government documents and historical files, is especially suitable for DNA-based storage.
  • Although DNA-based storage has many superior advantages as compared with the existing storage, there are some technical barriers that hinder its development, such as the inability to reuse synthetic DNA oligo fragments, high cost of DNA synthesis, complex design and poor flexibility, etc., resulting in difficulties in large-scale promotion and application of the existing DNA-based storage technology. Therefore, it is necessary to start from the design of basic information-constituting unit to optimize the coding design of DNA-based storage, thereby reducing costs and improving efficiency and convenience.
  • SUMMARY OF THE INVENTION
  • It is an object of the present invention to provide a method for encoding and storing text information by using DNA as a storage medium, and a decoding method therefor and application thereof.
  • The method for storing text information provided by the present invention generally comprises: firstly, encoding a character into a computer binary digit by encoding, and then converting the binary digit into a DNA sequence by transcoding; and secondly, artificially synthesizing the DNA sequence encoding the character information and locating the character by a designed ligation adapter to assemble the DNA sequences encoding the characters in a preset order. Alternatively, the assembled DNA sequences can be further assembled into a longer DNA sequence as needed.
  • In the method for storing text information of the present invention, each character fragment can be used repeatedly and, by changing the adapter, can be used for storing any information, on the same principle as the “movable-type printing” strategy. The DNA which has stored text information can be preserved under appropriate conditions. When the stored information needs to be read, the stored character information can be obtained by sequencing the DNA sequence followed by decoding with a computer (as shown in FIG. 1). By using DNA as a storage medium, the method provided by the present invention has the advantages of small storage volume, large storage capacity, strong stability and low cost of maintenance, etc.
  • Specifically, the technical object of the present invention can be achieved by the following aspects:
  • In a first aspect, the present invention provides a method for storing text information by using DNA as a storage medium, comprising the steps of:
  • (1) encoding a character into a computer binary digit by encoding;
  • (2) converting the computer binary digit encoding the character into a DNA sequence, which is represented by the four deoxyribonucleotides A, T, G, and C, by transcoding;
  • (3) synthesizing the DNA sequence encoding the character;
  • (4) locating the DNA sequence encoding the character by a designed ligation adapter, assembling individual DNA sequences encoding individual characters according to the order of characters of the information to be stored, followed by storing.
  • Regarding Encoding
  • In an alternative particular embodiment, the encoding is Unicode UCS-2 encoding; that is, each Chinese character is encoded as a four-digit hexadecimal number, for example, the corresponding Unicode code of the character “[Figure P00001]” is U+5535; each hexadecimal digit is converted into a 4-bit binary digit, for example, 5 is converted into 0101 and 3 is converted into 0011, and thus the character “[Figure P00001]” is converted into the binary digit 0101010100110101; preferably, each 8-bit binary digit produces a 4-bit Hamming code for verification, and thus the Hamming codes of the character “[Figure P00001]” are 0010 and 1110 respectively. Finally, a complete binary code of the character “[Figure P00001]” can be obtained, that is 010101010010001101011110.
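  • The encoding step can be sketched as follows. The sketch assumes a standard Hamming(12,8) code with even parity, with the 8 data bits at code positions 3, 5, 6, 7, 9, 10, 11 and 12 and the 4 check bits (positions 1, 2, 4 and 8) appended after each byte. The source does not state this layout; it is assumed because it reproduces the check bits 0010 and 1110 and the 24-bit code given for U+5535. The names `hamming_parity` and `encode_char` are hypothetical.

```python
# Assumed Hamming(12,8) layout: code positions holding the 8 data bits,
# and the 4 check-bit positions (powers of two).
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)
PARITY_POS = (1, 2, 4, 8)


def hamming_parity(byte_bits: str) -> str:
    """Return the 4 even-parity check bits for an 8-bit string."""
    placed = dict(zip(DATA_POS, (int(b) for b in byte_bits)))
    return "".join(
        str(sum(v for pos, v in placed.items() if pos & p) % 2)
        for p in PARITY_POS
    )


def encode_char(ch: str) -> str:
    """Encode one UCS-2 character as 24 bits: (8 data + 4 check) per byte."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the 16-bit UCS-2 range")
    bits = format(code_point, "016b")
    return "".join(byte + hamming_parity(byte) for byte in (bits[:8], bits[8:]))


# U+5535 -> 0101010100110101 -> 01010101 0010 00110101 1110
code = encode_char(chr(0x5535))
```

  • Appending the check bits after each byte, rather than interleaving them, matches the grouping visible in the 24-bit example; either arrangement carries the same information.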
  • Regarding Transcoding
  • In an optional particular embodiment, the transcoding is performed according to the principle that the binary digit 0 is converted into G or T and the binary digit 1 is converted into C or A so as to convert the binary digit encoding a character into a DNA sequence.
  • Preferably, one Chinese character is encoded into 24 bases.
  • Preferably, the sequence design is controlled by considering one or more parameters including the GC content, secondary structure and base repetition rate of the DNA sequence; for example, preferably, the DNA sequence is designed such that the GC content thereof is 45-60%, preferably 50%; preferably, the DNA sequence is designed to avoid the formation of secondary structure; preferably, the DNA sequence is designed such that no single base occurs more than 2 times consecutively. Taking the character “[Figure P00001]” as an example, it is finally converted into the DNA sequence TAGCTATAGGCTTGCATAGCACCG.
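  • A minimal sketch of the transcoding rule and its design constraints follows. The mapping (0 to G or T, 1 to C or A), the 45-60% GC window and the limit of 2 consecutive identical bases come from the text above. The selection strategy is an assumption made for illustration: at each bit the sketch picks whichever allowed base keeps the GC content near 50%, and it does not model secondary structure, so it is not presented as the method used to obtain the example sequence. All function names are hypothetical.

```python
BIT_TO_BASES = {"0": ("G", "T"), "1": ("C", "A")}  # (GC option, AT option)
BASE_TO_BIT = {"G": "0", "T": "0", "C": "1", "A": "1"}


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def max_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def meets_constraints(seq: str) -> bool:
    return 0.45 <= gc_fraction(seq) <= 0.60 and max_run(seq) <= 2


def to_bits(seq: str) -> str:
    return "".join(BASE_TO_BIT[base] for base in seq)


def transcode(bits: str) -> str:
    """Greedy choice: balance GC content, never allow three identical bases in a row."""
    seq = []
    gc_count = 0
    for bit in bits:
        gc_base, at_base = BIT_TO_BASES[bit]
        prefer_gc = 2 * gc_count <= len(seq)
        first, second = (gc_base, at_base) if prefer_gc else (at_base, gc_base)
        base = second if seq[-2:] == [first, first] else first
        seq.append(base)
        gc_count += 1 if base in "GC" else 0
    return "".join(seq)


bits = "010101010010001101011110"
candidate = transcode(bits)
example = "TAGCTATAGGCTTGCATAGCACCG"  # sequence given in the text
```

  • Because each bit has two possible bases, many DNA sequences encode the same 24 bits; the example sequence from the text and the greedy candidate both map back to the same bit string and both satisfy the GC and repetition constraints checked here.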
  • Regarding the DNA Sequence and the Ligation Adapter
  • Both the DNA sequence and the ligation adapter sequence in the present invention are obtained by de novo chemically synthesizing the forward and reverse strands and allowing them to anneal to form a double-stranded structure.
  • In an optional particular embodiment, a complementary locating base protrudes from both the DNA sequence fragment and the ligation adapter. The directional ligation of the DNA sequence to the ligation adapter is achieved via the complementary bases (i.e., “locating base”) respectively protruding from the DNA sequence and the ligation adapter. By designing the ligation adapter, DNA sequence fragments encoding various characters can be ligated in the desired character order.
  • In an optional particular embodiment, the ligation adapter comprises an upstream adapter and a downstream adapter; ligation adapters with the same DNA sequence but different overhanging locating bases will be linked to the upstream and downstream of two DNA fragments respectively, and the resulting two DNA fragments can be ligated via the ligation adapters by using a conventional molecular biology method, preferably by PCA, GoldenGate, etc. (as shown in FIG. 2).
  • For example, one base protrudes from each end of a DNA fragment, such as a base “A” protruding from the 5′-end of the sense strand and a base “G” protruding from the 5′-end of the antisense strand; in this case, a base “T” should protrude from the antisense strand of the corresponding upstream adapter and a base “C” should protrude from the sense strand of the downstream adapter, so that directional ligation of the fragment to the adapters is achieved by A/T and G/C pairing; that is, the adapter with the “T” overhang can only be linked upstream of the DNA fragment and the adapter with the “C” overhang can only be linked downstream of it. Similarly, the bases protruding upstream and downstream of a DNA fragment may also be A/C, T/G and T/C, etc., in which case the corresponding bases protruding from the adapters become T/G, A/C and A/G, etc. (as shown in FIG. 3A). Similarly, more than one unpaired base may protrude from each of the DNA fragment and the adapter. Similarly, the bases may protrude from the DNA fragment and the adapter at their 3′-ends. A base “G” may also protrude from the sense strand of the DNA fragment at both the 5′-end and the 3′-end, in which case a base “C” should protrude from the antisense strand of the corresponding upstream adapter at the 5′-end and from the antisense strand of the downstream adapter at the 3′-end, which also allows directional ligation of the fragment to the adapters. Similarly, the bases protruding from the sense strand of a DNA fragment at the 5′-end and 3′-end may also be “C”, “T” or “A”, in which case the bases protruding from the adapters become “G”, “A” or “T”, correspondingly. Similarly, “G”, “C”, “T” or “A” may protrude from the antisense strand of a DNA fragment, in which case “C”, “G”, “A” or “T” should protrude from the sense strand of the adapter, correspondingly (as shown in FIG. 3B). Similarly, more than one complementary base may protrude from each of the DNA fragment and the adapter.
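  • The single-base pairing rule described above can be sketched as a small lookup. This is only an illustration: the function name `adapter_overhangs` is hypothetical, and it covers just the one-base-per-end case in which the upstream and downstream overhangs of a fragment are given as a pair.

```python
# Watson-Crick partners used to choose the adapter overhang for a fragment overhang.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def adapter_overhangs(upstream_base: str, downstream_base: str) -> tuple[str, str]:
    """Return the bases that should protrude from the upstream and
    downstream adapters so that each pairs with the fragment's overhang."""
    return PAIR[upstream_base], PAIR[downstream_base]


# Fragment overhangs A/G call for adapter overhangs T/C, as in the text.
print(adapter_overhangs("A", "G"))
```

  • For overhangs longer than one base, strand orientation would also have to be taken into account (a reverse complement rather than a per-base lookup); that case is not covered by this sketch.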
  • The sequence of the ligation adapter can be generated automatically by a computer program. For example, a PCA adapter needs to have a length of more than 8 bp, a GC content of 50%-60%, no secondary structure, no run of more than 2 identical consecutive bases, and no mismatch between adapters of the same set, etc.; a GoldenGate adapter consists of an enzymatic cleavage site sequence and its outer protective bases, and the 4 bp sticky ends produced by restriction digestion need to differ by more than 2 bp between adapters of the same set (as shown in FIG. 3C). The 5′-ends of the sense and antisense strands of the DNA fragment, of the antisense strand of the upstream adapter, and of the sense strand of the downstream adapter are phosphorylated. The 5′-ends of the sense strand of the upstream adapter and of the antisense strand of the downstream adapter are dephosphorylated to reduce the probability of self-ligation and misligation of the adapters.
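  • A minimal sketch of how some of these adapter constraints might be screened in software is given below. It is not the program referred to in the text: the function names and default thresholds are assumptions taken from the figures quoted above, "consecutive bases" is read as a run of identical bases, and secondary-structure prediction is left out entirely.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))


def passes_pca_filters(seq: str, min_len_exclusive: int = 8,
                       gc_min: float = 0.5, gc_max: float = 0.6,
                       max_run: int = 2) -> bool:
    """Length, GC-content and repeat checks only (no structure check)."""
    return (len(seq) > min_len_exclusive
            and gc_min <= gc_content(seq) <= gc_max
            and longest_run(seq) <= max_run)


def sticky_end_difference(end_a: str, end_b: str) -> int:
    """Number of positions at which two equal-length sticky ends differ."""
    return sum(a != b for a, b in zip(end_a, end_b))


print(passes_pca_filters("ACGTACGTAC"))        # candidate PCA adapter
print(sticky_end_difference("AATG", "AGGT"))   # GoldenGate rule asks for more than 2
```

  • A generator would typically draw random candidates and keep those that pass such filters and are also sufficiently different from adapters already accepted into the set.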
  • For the Assembly and Preservation
  • The designed ligation adapters are added to the respective DNA sequences encoding the characters, and locating is achieved through these adapters; in a particular embodiment, individual DNA sequences comprising the encoding information of individual characters are ligated by overlap extension PCR according to the character order of the information to be stored, and the ligated sequences can be further assembled into a longer DNA sequence; preferably, the individual DNA sequences comprising the encoding information of individual characters are ligated by a method such as PCA or GoldenGate; preferably, the ligated DNA sequences are assembled by a Gibson method, and the assembled DNA sequence which encodes the character information can be preserved under suitable storage conditions, for example, lyophilized for long-term storage at low temperature.
  • In a particular embodiment, characters may be firstly assembled into a form of phrase or idiom, etc., such that the subsequent assembly becomes more convenient and efficient; for example, 10-20 characters may be assembled into a short sentence at once by using a method such as PCA or GoldenGate, etc., and then the short sentences can be further spliced into a long sentence, a paragraph or an article by using an assembly method such as Gibson assembly, etc.
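  • The two-level strategy (characters into short sentences, short sentences into longer text) can be sketched as a planning step. The helper names `split_into_groups` and `assign_adapters` are hypothetical, and the default group size of 15 is simply a value inside the 10-20 range mentioned above. The adapter labels follow the "adapterN-U" / "adapterN-D" naming used in Example 1, where the downstream adapter of one character and the upstream adapter of the next share a number.

```python
def split_into_groups(text: str, group_size: int = 15) -> list[str]:
    """Cut the text into short sentences of at most group_size characters."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [text[i:i + group_size] for i in range(0, len(text), group_size)]


def assign_adapters(group: str, adapter_ids: list[int]) -> list[tuple[str, str, str]]:
    """Give each character an upstream and a downstream adapter so that
    neighbouring characters share an adapter number at their junction."""
    if len(adapter_ids) < len(group) + 1:
        raise ValueError("need one more adapter than characters")
    return [
        (f"adapter{adapter_ids[i]}-U", char, f"adapter{adapter_ids[i + 1]}-D")
        for i, char in enumerate(group)
    ]


# Seven placeholder characters with the adapter numbers used in Example 1.
for unit in assign_adapters("ABCDEFG", [1, 2, 5, 6, 7, 8, 9, 10]):
    print(unit)
```

  • Because the same adapter set can be reused for every group, only the order of the groups has to be handled at the second assembly level.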
  • Preferably, the assembled DNA sequence is cloned into a plasmid for storage; preferably, a step of verifying the correctness of the assembled DNA sequence by sequencing is also included prior to the storage.
  • In a second aspect, the present invention also provides a method for decoding the text information stored according to the method of the first aspect, comprising the steps of:
  • (1) sequencing the DNA sequence which stores text information, for example, by Sanger sequencing, second generation sequencing, third generation sequencing or other sequencing methods;
  • (2) converting the sequenced DNA sequence into a binary digit, which in turn is converted into a corresponding Chinese character according to the same transcoding and encoding rules as defined in the method described in the first aspect to obtain the stored text information.
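  • The first half of step (2), turning bases back into bits, can be sketched as follows. The mapping (G or T to 0, C or A to 1) is the transcoding principle stated in the claims; the function name `dna_to_bits` is hypothetical. The further conversion from bits to a character depends on how the verification bits are interleaved with the Unicode bits, which this section does not spell out, so it is not attempted here.

```python
# Transcoding rule from the claims: 0 is written as G or T, 1 as C or A.
BASE_TO_BIT = {"G": "0", "T": "0", "C": "1", "A": "1"}


def dna_to_bits(sequence: str) -> str:
    """Convert a sequenced DNA string into its binary digit string."""
    return "".join(BASE_TO_BIT[base] for base in sequence.upper())


# The mutated 24-base example sequence discussed in this section.
print(dna_to_bits("TGGCTATAGGCTTGCATAGCACCG"))
```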
  • When a mutation exists in the DNA sequence, it can be corrected by the Hamming code. For example, if the base at position 2 of the above DNA sequence is mutated from A to G, i.e. the DNA sequence becomes TGGCTATAGGCTTGCATAGCACCG, the corresponding binary digit will become 000101010010001101011110. It can be calculated according to the Hamming code verification principle that the base at position 2 is mutated, and thereby the binary digit will be corrected to 010101010010001101011110 and the sequence can still be correctly decoded as “
    Figure US20190138909A1-20190509-P00001
    ”.
  • Therefore, in particular embodiments, the decoding may further include the step of correcting mutations in the DNA sequence according to the Hamming code verification principle.
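  • Single-error correction of a 12-bit block (8 data bits plus 4 check bits) can be sketched with a textbook Hamming(12,8) code. This is an assumption-laden illustration rather than the patent's exact scheme: the check bits are placed at positions 1, 2, 4 and 8 with even parity, which is the conventional layout but is not confirmed by this section, so the codewords produced here need not match the example sequences above. The function names are hypothetical.

```python
PARITY_POSITIONS = (1, 2, 4, 8)                # assumed textbook layout
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)


def hamming12_encode(data_bits: str) -> str:
    """Encode 8 data bits into a 12-bit codeword with even parity."""
    if len(data_bits) != 8 or set(data_bits) - {"0", "1"}:
        raise ValueError("expected a string of 8 binary digits")
    word = [0] * 13                            # index 0 is unused
    for position, bit in zip(DATA_POSITIONS, data_bits):
        word[position] = int(bit)
    for parity in PARITY_POSITIONS:
        # Each check bit covers the positions whose index has that bit set.
        word[parity] = sum(word[i] for i in range(1, 13) if i & parity) % 2
    return "".join(str(bit) for bit in word[1:])


def hamming12_correct(code_bits: str) -> tuple[str, int]:
    """Return (corrected codeword, 1-based error position or 0 if none)."""
    word = [0] + [int(bit) for bit in code_bits]
    syndrome = 0
    for parity in PARITY_POSITIONS:
        if sum(word[i] for i in range(1, 13) if i & parity) % 2:
            syndrome += parity
    if 1 <= syndrome <= 12:
        word[syndrome] ^= 1                    # flip the single bad bit
    return "".join(str(bit) for bit in word[1:]), syndrome


codeword = hamming12_encode("01010101")
damaged = codeword[:1] + ("1" if codeword[1] == "0" else "0") + codeword[2:]
print(hamming12_correct(damaged))              # error located at position 2
```

  • Since each base carries one bit under the G/T versus C/A rule, a substitution that changes the bit value of one base in a 12-base block shows up as a single flipped bit, which a code of this kind can locate and repair.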
  • In a third aspect, the present invention provides use of the method for storing text information according to the first aspect and/or the method for decoding the stored text information according to the second aspect in the storage and/or reading of text information.
  • Beneficial Effects
  • First, the method for storing text information according to the present invention has the advantages of small storage volume, large storage capacity, strong stability and low cost of maintenance, etc. by using DNA as a storage medium.
  • In addition, compared with other existing DNA storage methods, the present method is more suitable for storing text information, supporting text forms including the characters of various countries (all Chinese characters, English letters, Japanese and Korean, etc.), punctuation marks and mathematical symbols, etc.; has high encoding efficiency, wherein 1000 Chinese characters can be encoded within 100 milliseconds; adopts a strategy similar to “movable-type printing”, wherein the DNA fragments and adapters can be used repeatedly, resulting in a lower cost of synthesis; the stored DNA sequence can be in a double-stranded closed circular conformation, which is more stable in storage; the stored DNA sequence can be verified by sequencing and the Hamming verification code can be added thereto, tolerating any one mutation in every 12 bases, which results in higher fidelity; and the stored DNA sequence is a long double-stranded DNA, which makes reading the information easier.
  • DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a schematic view showing the overall flow of Example 1 of the present invention;
  • FIG. 2 is a schematic view showing the assembly flow of Example 1 of the present invention;
  • FIG. 3 shows schematic views of the designs of the fragments/adapters of Example 1 of the present invention;
  • FIG. 4 shows electrophoretograms of the assembly test results of Example 1 of the present invention.
  • DETAILED DESCRIPTION
  • The experimental procedures described in the following examples are only used to demonstrate the feasibility of the method of the patent, and the application of the method of the invention is not limited thereto.
  • The experimental procedures mentioned in the examples are conventional experimental methods unless otherwise specified; the reagents and consumables mentioned are conventional reagents and consumables unless otherwise specified. The synthetic oligos used in the experiments were diluted to 100 μM and the primers were diluted to 10 μM with sterile water.
  • Example 1 Assembly Test
  • The phrase for the assembly test:
    Figure US20190138909A1-20190509-P00002
    .
  • Transcoding was performed according to the method of the present invention and DNA oligo sequences were synthesized; the PCA adapters were used for locating and ligating by G/C and A/T base pairing at the upstream and downstream ends, respectively. The DNA oligo and primer sequences are shown in Table 1 below.
  • 1. Annealing
  • For each character or adapter, 10 μL each of the forward and reverse oligos were mixed and annealed; the annealing procedure was: denaturation at 99° C. for 10 min, slow cooling to 25° C. at 0.1° C./sec, then holding at 12° C.
  • 2. Ligation
  • Upstream and downstream adapters were added to each character in order for ligation; the ligation system was: 1 μL of T4 DNA ligase (Enzymatics), 10 μL of 2×ligation buffer, 2 μL each of the annealed character, upstream adapter and downstream adapter, and 3 μL of ddH2O; ligation was carried out at 16° C. overnight.
  • 3. Purification of the Ligation Product
  • The ligation products were subjected to gel electrophoresis on a 15% PAGE gel at 100 V for 1 h (the electrophoresis results are shown in FIG. 4A); the target bands (42 bp in size, as indicated by the arrow in FIG. 4A) were excised and purified as follows: the excised gel was placed into a 0.5 mL tube with a punctured bottom, which was in turn placed into a 2 mL tube and centrifuged at 14,000 rpm for 2 min; 200 μL of 0.3 M NaCl was added to the crushed gel, which was shaken at 1400 rpm at 25° C. for 2 h; the crushed gel and the liquid were transferred together into a filter column and centrifuged at 14,000 rpm for 2 min; the filtrate was transferred into a 1.5 mL centrifuge tube, 400 μL of absolute ethanol was added, and the mixture was left to precipitate at −80° C. for 1 h. After centrifugation at 14,000 rpm at 4° C. for 30 min, the supernatant was discarded, the precipitate was washed once with 500 μL of 70% ethanol, the supernatant was drawn off, the pellet was dried at 37° C. for 5 min, and 20 μL of ddH2O was added to dissolve the DNA.
  • 4. Assembly
  • The characters were assembled into a short sentence by using a method of PCA. Step 1: 0.2 μL of Ex Taq DNA polymerase (TAKARA), 2 μL of 10×buffer, 1.6 μL of 2.5 mM dNTPs, 50 ng each of the ligated, cut and purified products of adapter1-U+
    Figure US20190138909A1-20190509-P00003
    +adapter2-D, adapter2-U+
    Figure US20190138909A1-20190509-P00004
    +adapter5-D, adapter5-U+
    Figure US20190138909A1-20190509-P00005
    +adapter6-D, adapter6-U+
    Figure US20190138909A1-20190509-P00006
    +adapter7-D, adapter7-U+
    Figure US20190138909A1-20190509-P00007
    +adapter8-D, adapter8-U+
    Figure US20190138909A1-20190509-P00008
    +adapter9-D and adapter9-U+
    Figure US20190138909A1-20190509-P00009
    +adapter10-D, adding water to 20 μL. 94° C. for 5 min; 94° C. for 30 sec, 55° C. for 1 sec, cooled to 45° C. at 0.5° C./sec, 45° C. for 15 sec, 72° C. for 1 min, 20 cycles; 72° C. for 5 min, maintained at 12° C. Step 2: 0.2 μL of Ex Taq DNA polymerase (TAKARA), 2 μL of 10×buffer, 1.6 μL of 2.5 mM dNTPs, 3 μL of the PCR product of step 1, 1 μL each of the primers 12-F and St1-R, 11.2 μL of ddH2O. 94° C. for 5 min; 94° C. for 30 sec, 60° C. for 30 sec, 72° C. for 30 sec, 35 cycles, 72° C. for 5 min, maintained at 12° C.
  • The PCR product was checked by electrophoresis: 5 μL of the PCR product was run on a 2% agarose gel at 180 V for 30 min (the electrophoresis result is shown in FIG. 4B; the PCR product was approximately 280 bp in size, as indicated by the arrows).
  • 5. TA Cloning
  • The PCR product obtained in step 4 was gel-purified with a gel purification kit, and the purified PCR product was cloned with a TA cloning kit (TAKARA).
  • 6. Identification by Restriction Digestion
  • Single clones were picked from the TA cloning plate obtained in step 5 and incubated overnight, followed by plasmid extraction with a kit and identification by restriction digestion at the designed BssHII restriction site. The digestion system was: 0.5 μL of BssHII (NEB), 1 μL of CutSmart buffer, 4 μL of plasmid DNA and 4.5 μL of ddH2O. The digestion was carried out at 37° C. for 1 h. 5 μL of the digested product was run on a 2% agarose gel at 180 V for 30 min (the electrophoresis results are shown in FIG. 4C, and the band of the correct size is indicated by the arrow).
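  • As a quick arithmetic check on the reaction mixes given in steps 2, 4 and 6, the component volumes can be summed and a final concentration derived. The sketch below uses only the volumes and stock concentrations stated in the text; the dictionary and function names are hypothetical, and `Decimal` is used merely to avoid floating-point rounding in the sums.

```python
from decimal import Decimal

# Component volumes in microlitres, as listed in the text.
LIGATION = {"T4 DNA ligase": "1", "2x ligation buffer": "10",
            "annealed character": "2", "upstream adapter": "2",
            "downstream adapter": "2", "ddH2O": "3"}
PCR_STEP_2 = {"Ex Taq": "0.2", "10x buffer": "2", "2.5 mM dNTPs": "1.6",
              "step 1 product": "3", "forward primer": "1",
              "reverse primer": "1", "ddH2O": "11.2"}
DIGESTION = {"BssHII": "0.5", "CutSmart buffer": "1",
             "plasmid DNA": "4", "ddH2O": "4.5"}


def total_volume(mix: dict[str, str]) -> Decimal:
    """Sum of all component volumes in microlitres."""
    return sum((Decimal(volume) for volume in mix.values()), Decimal("0"))


def final_concentration(stock: str, volume: str, total: Decimal) -> Decimal:
    """Concentration after dilution, in the same unit as the stock."""
    return Decimal(stock) * Decimal(volume) / total


for name, mix in (("ligation", LIGATION), ("PCR step 2", PCR_STEP_2),
                  ("digestion", DIGESTION)):
    print(name, total_volume(mix), "uL")

pcr_total = total_volume(PCR_STEP_2)
print("dNTPs (mM):", final_concentration("2.5", "1.6", pcr_total))
print("each primer (uM):", final_concentration("10", "1", pcr_total))
```

  • The ligation and second PCR mixes each come to 20 μL and the digestion to 10 μL, so the 2× ligation buffer and the 10× PCR buffer both end up at 1×.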
  • 7. Sequencing Analysis
  • A plasmid showing the correct digestion pattern was selected for Sanger sequencing, and a plasmid carrying the correctly assembled sequence was identified by sequence analysis.
  • TABLE 1
    NO. Sequence
    Figure US20190138909A1-20190509-P00010
    -F
    ATAGCTTAACGCGTATGACATCGCA
    (SEQ ID NO. 1)
    Figure US20190138909A1-20190509-P00010
    -R
    GTGCGATGTCATACGCGTTAAGCTA
    (SEQ ID NO. 2)
    Figure US20190138909A1-20190509-P00011
    -F
    ATACAAGCTTCAGACAGTTGTGGTG
    (SEQ ID NO. 3)
    Figure US20190138909A1-20190509-P00011
    -R
    GCACCACAACTGTCTGAAGCTTGTA
    (SEQ ID NO. 4)
    Figure US20190138909A1-20190509-P00012
    -F
    AATGTCATGGTTGTGACGTGCATAC
    (SEQ ID NO. 5)
    Figure US20190138909A1-20190509-P00012
    -R
    GGTATGCACGTCACAACCATGACAT
    (SEQ ID NO. 6)
    Figure US20190138909A1-20190509-P00013
    -F
    ATAGCTTGAGACCATTACGTACGGT
    (SEQ ID NO. 7)
    Figure US20190138909A1-20190509-P00013
    -R
    GACCGTACGTAATGGTCTCAAGCTA
    (SEQ ID NO. 8)
    Figure US20190138909A1-20190509-P00014
    -F
    ATAGCTAACCAACACCACTAGAGCT
    (SEQ ID NO. 9)
    Figure US20190138909A1-20190509-P00014
    -R
    GAGCTCTAGTGGTGTTGGTTAGCTA
    (SEQ ID NO. 10)
    Figure US20190138909A1-20190509-P00015
    -F
    ATAGCTAATCCGGAACTTGTGGTGT
    (SEQ ID NO. 11)
    Figure US20190138909A1-20190509-P00015
    -R
    GACACCACAAGTTCCGGATTAGCTA
    (SEQ ID NO. 12)
    Figure US20190138909A1-20190509-P00016
    -F
    ATACGATTGGATCCTAGTACGGTAG
    (SEQ ID NO. 13)
    Figure US20190138909A1-20190509-P00016
    -R
    GCTACCGTACTAGGATCCAATCGTA
    (SEQ ID NO. 14)
    adapter1-UF CTCATTCC
    (SEQ ID NO. 15)
    adapter1-UR TGGAATGAG
    (SEQ ID NO. 16)
    adapter2-UF AATTCGCG
    (SEQ ID NO. 17)
    adapter2-UR TCGCGAATT
    (SEQ ID NO. 18)
    adapter2-DF CAATTCGCG
    (SEQ ID NO. 19)
    adapter2-DR CGCGAATT
    (SEQ ID NO. 20)
    adapter5-UF CCTTTAGC
    (SEQ ID NO. 21)
    adapter5-UR TGCTAAAGG
    (SEQ ID NO. 22)
    adapter5-DF CCCTTTAGC
    (SEQ ID NO. 23)
    adapter5-DR GCTAAAGG
    (SEQ ID NO. 24)
    adapter6-UF ACAGAGAC
    (SEQ ID NO. 25)
    adapter6-UR TGTCTCTGT
    (SEQ ID NO. 26)
    adapter6-DF CACAGAGAC
    (SEQ ID NO. 27)
    adapter6-DR GTCTCTGT
    (SEQ ID NO. 28)
    adapter7-UF CCGTCATA
    (SEQ ID NO. 29)
    adapter7-UR TTATGACGG
    (SEQ ID NO. 30)
    adapter7-DF CCCGTCATA
    (SEQ ID NO. 31)
    adapter7-DR TATGACGG
    (SEQ ID NO. 32)
    adapter8-UF GGATCTAC
    (SEQ ID NO. 33)
    adapter8-UR TGTAGATCC
    (SEQ ID NO. 34)
    adapter8-DF CGGATCTAC
    (SEQ ID NO. 35)
    adapter8-DR GTAGATCC
    (SEQ ID NO. 36)
    adapter9-UF GTTGCATC
    (SEQ ID NO. 37)
    adapter9-UR TGATGCAAC
    (SEQ ID NO. 38)
    adapter9-DF CGTTGCATC
    (SEQ ID NO. 39)
    adapter9-DR GATGCAAC
    (SEQ ID NO. 40)
    adapter10-DF CATCGGGAA
    (SEQ ID NO. 41)
    adapter10-DR TTCCCGAT
    (SEQ ID NO. 42)
    St12-F GCGCGCTTGGTTCAGACGTGAGAAGTGATGCACAGTAGCTTAACGCGTATGACATCGCA
    (SEQ ID NO. 43)
    St1-R GCGCGCTACCGTACTAGGATCCAATCG
    (SEQ ID NO. 44)
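  • The oligo design in Table 1 can be checked computationally. The sketch below takes SEQ ID NO. 1 and 2 (a character oligo pair) together with adapter1-U (SEQ ID NO. 15 and 16) and adapter2-D (SEQ ID NO. 19 and 20), and confirms that the strands anneal with single-base 5′ overhangs and ligate into a 42-nt product, in line with the 42 bp target band of step 3. Combining this particular character with these two adapters is an assumption based on the order listed in step 4; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_overhangs(forward: str, reverse: str) -> tuple[str, str, str]:
    """Return (forward 5' overhang, reverse 5' overhang, duplex core),
    assuming each oligo carries exactly one unpaired base at its 5' end."""
    core = forward[1:]
    if reverse_complement(core) != reverse[1:]:
        raise ValueError("oligos do not anneal with single-base 5' overhangs")
    return forward[0], reverse[0], core


CHAR_F = "ATAGCTTAACGCGTATGACATCGCA"   # SEQ ID NO. 1
CHAR_R = "GTGCGATGTCATACGCGTTAAGCTA"   # SEQ ID NO. 2
A1_UF, A1_UR = "CTCATTCC", "TGGAATGAG"  # SEQ ID NO. 15, 16
A2_DF, A2_DR = "CAATTCGCG", "CGCGAATT"  # SEQ ID NO. 19, 20

print(split_overhangs(CHAR_F, CHAR_R))

# Ligated strands: upstream adapter + character + downstream adapter.
top = A1_UF + CHAR_F + A2_DF
bottom = A2_DR + CHAR_R + A1_UR
print(len(top), reverse_complement(top) == bottom)
```

  • The 24-base duplex core returned by `split_overhangs` is the part of the oligo that carries the character's encoding; the flanking single bases serve only for directional ligation.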
  • The Applicant states that the method and application thereof of the present invention are illustrated through the above examples, however, the present invention is not limited thereto. Those skilled in the art should understand that, for any improvement of the present invention, the equivalent replacement of the products of the present invention, the addition of auxiliary components, and the selection of specific modes, etc., will all fall within the protection scope and the disclosure scope of the present invention.

Claims (21)

1.-9. (canceled)
10. A method for storing text information by using DNA as a storage medium, comprising the steps of:
(1) encoding a character into a computer binary digit by encoding;
(2) converting the computer binary digit encoding the character into a DNA sequence, which is represented by the four deoxyribonucleotides A, T, G, and C, by transcoding;
(3) synthesizing the DNA sequence encoding the character;
(4) locating the DNA sequence encoding the character by a designed ligation adapter, ligating individual DNA sequences encoding individual characters according to the order of characters of the information to be stored, followed by assembling and storing.
11. The method according to claim 10, wherein the encoding is Unicode-ucs2 encoding.
12. The method according to claim 11, wherein the character is a Chinese character, and wherein the Chinese character is encoded by a hexadecimal digit, and the hexadecimal digit is converted into a 4-bit binary digit.
13. The method according to claim 12, wherein the 8-bit binary digit produces a 4-bit Hamming code for verification.
14. The method according to claim 10, wherein the transcoding is performed according to a principle that the binary digit 0 is converted into G or T and the binary digit 1 is converted into C or A so as to convert the binary digit encoding a character into a DNA sequence.
15. The method according to claim 10, wherein the character is a Chinese character, and the Chinese character is encoded into 24 bases.
16. The method according to claim 10, wherein the sequence design is controlled by considering one or more of parameters selected from the group consisting of GC content, secondary structure and base repetition rate of the DNA sequence.
17. The method according to claim 16, wherein the DNA sequence is designed such that the GC content thereof is 45-60%.
18. The method according to claim 16, wherein the DNA sequence is designed such that the GC content thereof is 50%.
19. The method according to claim 16, wherein the DNA sequence is designed to avoid the formation of secondary structure.
20. The method according to claim 16, wherein the DNA sequence is designed such that no more than 2 consecutive single bases are present therein.
21. The method according to claim 10, wherein the ligation adapter in step (4) comprises an upstream adapter and a downstream adapter.
22. The method according to claim 10, wherein the directional ligation of the DNA sequence to the ligation adapter is achieved via the complementary bases respectively protruding from the DNA sequence and the ligation adapter.
23. The method according to claim 10, wherein by overlap extension PCR, individual DNA sequences comprising the encoding information of individual characters are ligated, followed by being further assembled into a longer DNA sequence.
24. The method according to claim 10, wherein the ligation is performed by PCA or GoldenGate method.
25. The method according to claim 10, wherein the assembly is performed by Gibson method.
26. The method according to claim 10, wherein the assembled DNA sequence is cloned into a plasmid for storage.
27. The method according to claim 10, wherein a step of verifying the correctness of the assembled DNA sequence by sequencing is further included prior to the storage.
28. A method for decoding the text information stored according to the method of claim 10, comprising the steps of:
(1) sequencing the DNA sequence which stores text information;
(2) converting the sequenced DNA sequence into a binary digit, which in turn is converted into a corresponding Chinese character according to the same transcoding and encoding rules as defined in the method according to claim 10 to obtain the stored text information.
29. The method according to claim 28, wherein the decoding process further comprises correcting mutations in the DNA sequence according to the Hamming code verification principle.
US16/098,471 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof Active US10839295B2 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/081037 WO2017190297A1 (en) 2016-05-04 2016-05-04 Method for using dna to store text information, decoding method therefor and application thereof

Publications (2)

Publication Number Publication Date
US20190138909A1 (en) 2019-05-09
US10839295B2 US10839295B2 (en) 2020-11-17

Family

ID=60202585

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/098,471 Active US10839295B2 (en) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof

Country Status (4)

Country Link
US (1) US10839295B2 (en)
EP (1) EP3470997A4 (en)
CN (1) CN109074424B (en)
WO (1) WO2017190297A1 (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
CA3094077A1 (en) 2018-03-16 2019-09-19 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11339423B2 (en) * 2018-03-18 2022-05-24 Bryan Bishop Systems and methods for data storage in nucleic acids
JP2021524229A (en) 2018-05-16 2021-09-13 カタログ テクノロジーズ, インコーポレイテッド Compositions and Methods for Nucleic Acid-Based Data Storage
GB2576304B (en) 2018-07-26 2020-09-09 Evonetix Ltd Accessing data storage provided using double-stranded nucleic acid molecules
JP7343584B2 (en) 2018-08-10 2023-09-12 ニュークレオトレース・ピーティワイ・リミテッド Systems and methods for identifying product identity
CN109460822B (en) * 2018-11-19 2021-11-12 天津大学 DNA-based information storage method
CN109943560A (en) * 2018-11-22 2019-06-28 西藏自治区人民政府驻成都办事处医院 Chinese character information storage method based on DNA vector
CN109830263B (en) * 2019-01-30 2023-04-07 东南大学 DNA storage method based on oligonucleotide sequence coding storage
CN109887549B (en) * 2019-02-22 2023-01-20 天津大学 Data storage and restoration method and device
EP3966823A1 (en) 2019-05-09 2022-03-16 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in dna-based data storage
GB201907460D0 (en) 2019-05-27 2019-07-10 Vib Vzw A method of storing information in pools of nucleic acid molecules
US10956806B2 (en) 2019-06-10 2021-03-23 International Business Machines Corporation Efficient assembly of oligonucleotides for nucleic acid based data storage
WO2021056167A1 (en) * 2019-09-24 2021-04-01 深圳华大生命科学研究院 Information encoding method and apparatus, information decoding method and apparatus, storage medium, and information storage and interpretation method
EP4041920A1 (en) 2019-10-11 2022-08-17 Catalog Technologies, Inc. Nucleic acid security and authentication
CN111028883B (en) * 2019-11-20 2023-07-18 广州达美智能科技有限公司 Gene processing method and device based on Boolean algebra and readable storage medium
CN111091876A (en) * 2019-12-16 2020-05-01 中国科学院深圳先进技术研究院 DNA storage method, system and electronic equipment
CN111243670A (en) * 2020-01-23 2020-06-05 天津大学 DNA information storage coding method meeting biological constraint
CN111368132B (en) * 2020-02-28 2023-04-14 元码基因科技(北京)股份有限公司 Method for storing audio or video files based on DNA sequences and storage medium
CN111680797B (en) * 2020-05-08 2023-06-06 中国科学院计算技术研究所 DNA type printer, DNA-based data storage device and method
JP2023526017A (en) 2020-05-11 2023-06-20 カタログ テクノロジーズ, インコーポレイテッド Programs and functions in DNA-based data storage
CN111737955A (en) * 2020-06-24 2020-10-02 任兆瑞 Method for storing character dot matrix by using DNA character code
CN114058471A (en) * 2020-07-29 2022-02-18 东南大学 Data storage device loaded with DNA storage data, preparation method and reading method
CN112100982B (en) * 2020-08-07 2023-06-20 广州大学 DNA storage method, system and storage medium
CN112382340B (en) * 2020-11-25 2022-11-15 中国科学院深圳先进技术研究院 Coding and decoding method and coding and decoding device for DNA data storage
CN112802549B (en) * 2021-01-26 2022-05-13 武汉大学 Coding and decoding method for DNA sequence integrity check and error correction
US20220243252A1 (en) * 2021-02-03 2022-08-04 Seagate Technology Llc Isotope modified nucleotides for dna data storage
CN113744804B (en) * 2021-06-21 2023-03-10 深圳先进技术研究院 Method and device for storing data by using DNA and storage equipment
CN117751410A (en) * 2021-12-17 2024-03-22 深圳华大生命科学研究院 Method and system for storing information by using DNA
CN114898806A (en) * 2022-05-25 2022-08-12 天津大学 DNA type writing system and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040001371A1 (en) * 2002-06-26 2004-01-01 The Arizona Board Of Regents On Behalf Of The University Of Arizona Information storage and retrieval device using macromolecules as storage media
US20040121362A1 (en) * 2002-06-20 2004-06-24 Whitney Gena S. Identification and modulation of a G-protein coupled receptor (GPCR), RAI-3, associated with chronic obstructive pulmonary disease (COPD) and NF-kappaB and E-selectin regulation
US20050053968A1 (en) * 2003-03-31 2005-03-10 Council Of Scientific And Industrial Research Method for storing information in DNA
US20140315310A1 (en) * 2012-12-13 2014-10-23 Massachusetts Institute Of Technology Recombinase-based logic and memory systems
US20170141793A1 (en) * 2015-11-13 2017-05-18 Microsoft Technology Licensing, Llc Error correction for nucleotide data stores
US20170249345A1 (en) * 2014-10-18 2017-08-31 Girik Malik A biomolecule based data storage system
US20170335334A1 (en) * 2014-10-29 2017-11-23 Massachusetts Institute Of Technology Dna cloaking technologies

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004512807A (en) * 1999-03-18 2004-04-30 コンプリート ゲノミックス エイエス Cloning and generation of fragment chains with readable information
US20040153255A1 (en) * 2003-02-03 2004-08-05 Ahn Tae-Jin Apparatus and method for encoding DNA sequence, and computer readable medium
JP2004355294A (en) * 2003-05-29 2004-12-16 National Institute Of Advanced Industrial & Technology Designing method of dna code as information carrier
ES2698609T3 (en) * 2012-06-01 2019-02-05 European Molecular Biology Laboratory High capacity storage of digital information in DNA
CN105022935A (en) 2014-04-22 2015-11-04 中国科学院青岛生物能源与过程研究所 Encoding method and decoding method for performing information storage by means of DNA
KR20160001455A (en) * 2014-06-27 2016-01-06 한국생명공학연구원 DNA Memory for Data Storage
CN104850760B (en) * 2015-03-27 2016-12-21 苏州泓迅生物科技有限公司 The information storing and reading method of artificial-synthetic DNA's storage medium
CN105119717A (en) 2015-07-21 2015-12-02 郑州轻工业学院 DNA coding based encryption system and encryption method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Next-Generation Church Supplementary Materials for Digital Information Storage in DNA, Published 16 August 2012 on Science Express, Retrieved from the Internet <URL: https://science.sciencemag.org/content/sci/suppl/2012/08/15/science.1226355.DC1/Church.SM.pdf> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US11017170B2 (en) * 2018-09-27 2021-05-25 At&T Intellectual Property I, L.P. Encoding and storing text using DNA sequences
US11361159B2 (en) 2018-09-27 2022-06-14 At&T Intellectual Property I, L.P. Encoding and storing text using DNA sequences
US20220358290A1 (en) * 2018-09-27 2022-11-10 At&T Intellectual Property I, L.P. Encoding and storing text using dna sequences
CN112002376A (en) * 2020-08-13 2020-11-27 中国海洋大学 Method for recording and reading information by DNA molecule
CN114758703A (en) * 2022-06-14 2022-07-15 深圳先进技术研究院 Data information storage method based on recombinant plasmid DNA molecules
CN114958828A (en) * 2022-06-14 2022-08-30 深圳先进技术研究院 Data information storage method based on DNA molecular medium
WO2023240950A1 (en) * 2022-06-14 2023-12-21 深圳先进技术研究院 Data information storage method based on dna molecular medium

Also Published As

Publication number Publication date
WO2017190297A1 (en) 2017-11-09
US10839295B2 (en) 2020-11-17
CN109074424A (en) 2018-12-21
EP3470997A4 (en) 2020-04-01
CN109074424B (en) 2022-03-11
EP3470997A1 (en) 2019-04-17

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: BGI SHENZHEN, CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHEN, YUE;CHEN, TAI;LIU, LONGYING;AND OTHERS;SIGNING DATES FROM 20181019 TO 20181022;REEL/FRAME:048100/0683

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STCF Information on status: patent grant

Free format text: PATENTED CASE