WO2017190297A1 - Method for using DNA to store text information, decoding method therefor and application thereof - Google Patents

Info

Publication number
WO2017190297A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
dna sequence
encoding
text information
sequence
Prior art date
Application number
PCT/CN2016/081037
Other languages
English (en)
French (fr)
Inventor
沈玥
陈泰
刘龙英
陈世宏
王云
杨焕明
Original Assignee
深圳华大基因研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因研究院
Priority to CN201680085320.9A (CN109074424B)
Priority to US16/098,471 (US10839295B2)
Priority to PCT/CN2016/081037 (WO2017190297A1)
Priority to EP16900819.0A (EP3470997A4)
Publication of WO2017190297A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/123 DNA computing
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07H SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
    • C07H21/00 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
    • C07H21/04 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids with deoxyribosyl as saccharide radical
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/123 Storage facilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0009 RRAM elements whose operation depends upon chemical change
    • G11C13/0014 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material
    • G11C13/0019 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material comprising bio-molecules
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/02 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using elements whose operation depends upon chemical change
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20 Heterogeneous data integration
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • The invention belongs to the technical field of DNA storage and relates in particular to a method for encoding and storing text information using DNA as a storage medium, and to a decoding method and application thereof.
  • DNA storage is a forward-looking, disruptive information storage technology.
  • As a storage medium, DNA has many advantages, such as small size, large storage capacity, strong stability, and low maintenance cost.
  • In theory, 1 gram of DNA can store thousands of terabytes of data; by that estimate, only a few hundred kilograms of DNA would be needed to store all existing books, documents, videos, and other information, and the storage time under normal conditions can reach thousands of years. Therefore, information that is rarely used but must be kept for a long time, such as government documents and historical archives, is especially suitable for storage in DNA.
  • Although DNA storage has many advantages over existing storage media, some technical barriers still hinder its development, such as the inability to reuse synthesized DNA oligo fragments, the high cost of DNA synthesis, complex design, and poor flexibility.
  • These problems make existing DNA storage technologies difficult to promote and use on a large scale. It is therefore necessary to start from the design of the basic unit of information, optimize the coding scheme for DNA storage, reduce cost, and improve efficiency and ease of use.
  • The text information storage method provided by the present invention generally comprises: first, encoding the characters into computer binary digits and then transcoding the binary digits into a DNA sequence; second, artificially synthesizing the DNA sequences that encode the characters and positioning the characters with designed linkers, so that the character-encoding DNA sequences can be assembled in a preset order and further assembled into longer DNA sequences as needed.
  • Each character can be reused and, by swapping linkers, used to store any message; the principle is the same as movable-type printing.
  • The DNA storing the text information can be kept under appropriate conditions. When the stored information needs to be read, the DNA sequence is obtained by sequencing, and the stored text is then recovered by computer decoding (as shown in FIG. 1).
  • The method provided by the invention has the advantages of small storage volume, large storage capacity, strong stability, and low maintenance cost.
  • In a first aspect, the present invention provides a method of storing text information using DNA, comprising the steps of:
  • (1) encoding the characters into computer binary digits; (2) transcoding the binary digits into a DNA sequence expressed in the four deoxyribonucleotides A, T, G, and C; (3) synthesizing the DNA sequences that encode the characters; (4) positioning the character-encoding DNA sequences with the designed linkers and, following the character order of the information to be stored, assembling and storing the DNA sequences that encode the individual characters.
  • In an optional embodiment, the encoding is Unicode UCS-2 encoding; that is, each Chinese character is encoded by hexadecimal digits. For example, the Unicode code point of the character "唵" is U+5535. Each hexadecimal digit is converted to a 4-bit binary number (5 to 0101, 3 to 0011), so "唵" converts to the binary number 0101010100110101. Preferably, every 8 binary digits generate 4 bits of Hamming code for checking; the Hamming codes for "唵" are 0010 and 1110, respectively. The complete binary code of "唵" is then 010101010010001101011110.
  • In an optional embodiment, the transcoding converts the binary digits encoding the characters into a DNA sequence according to the rule that binary 0 becomes G or T and binary 1 becomes C or A.
  • Preferably, one Chinese character is encoded into 24 bases.
  • Preferably, the sequence design is controlled by considering one or more parameters including the GC content, secondary structure, and base repetition of the DNA sequence; for example, the DNA sequence is preferably designed so that its GC content is 45-60%, preferably 50%; so that secondary structures are avoided; and so that no single base occurs more than 2 times in a row. Taking the character "唵" as an example, the final DNA sequence is TAGCTATAGGCTTGCATAGCACCG.
  • Both the DNA sequences and the linker sequences in the present invention are obtained by chemically synthesizing the forward and reverse strands de novo and annealing them to form a double-stranded structure.
  • In an optional embodiment, the DNA sequence fragments and the linkers both carry protruding, complementary positioning bases; through these protruding complementary bases (the "positioning bases"), directional ligation of the DNA sequence to the linker is achieved. By designing the linkers, the DNA sequence fragments encoding each character can be joined in the intended text order.
  • In an optional embodiment, the linkers comprise an upstream linker and a downstream linker; where the DNA sequence is the same but the protruding positioning bases differ, that DNA sequence can be ligated upstream and downstream of two DNA fragments, respectively, and the two resulting DNA fragments can be joined via the linkers by conventional molecular biology methods, preferably PCA, GoldenGate, or similar (as shown in FIG. 2).
  • For example, one base protrudes at each end of the DNA fragment: if base "A" protrudes from the positive strand at the 5' end of the DNA fragment and base "G" protrudes from the negative strand, then the corresponding upstream linker has base "T" protruding from its negative strand and the downstream linker has base "C" protruding from its positive strand. Directional ligation of fragment and linker is thus achieved by A/T and G/C pairing: a linker with a protruding "T" can only be ligated upstream of the DNA fragment, and a linker with a protruding "C" only downstream.
  • Similarly, the bases protruding upstream and downstream of the DNA fragment can also be A/C, T/G, T/C, etc., in which case the corresponding protruding bases of the linkers become T/G, A/C, A/G, etc. (as shown in FIG. 3A).
  • Similarly, the mutually non-pairing bases protruding from the DNA fragment and the linkers may each number more than one.
  • Similarly, the bases protruding from the DNA fragment and the linkers can also be at the 3' end.
  • The DNA fragment can also have base "G" protruding at both the 5' and 3' ends of the positive strand, in which case the 5' end of the negative strand of the corresponding upstream linker and the 3' end of the negative strand of the downstream linker both have a protruding "C"; directional ligation of fragment and linker is likewise achieved.
  • Similarly, the bases protruding at the 5' and 3' ends of the positive strand of the DNA fragment may also be "C", "T" or "A", with the bases protruding from the linkers correspondingly changed to "G", "A" or "T".
  • Similarly, the DNA fragment can have "G", "C", "T" or "A" protruding from the negative strand, with the linker correspondingly having "C", "G", "A" or "T" protruding from the positive strand (as shown in FIG. 3B).
  • Similarly, the mutually non-pairing bases protruding from the DNA fragment and the linkers may each number more than one.
  • The linker sequences can be generated automatically by a computer program.
  • For example, a PCA linker must be at least 8 bp long, have a GC content of 50%-60%, have no secondary structure, contain no more than 2 identical consecutive bases, and show no mispairing between linkers of the same set; a GoldenGate linker consists of a restriction-site sequence and outer protective bases, and the 4-bp sticky ends produced by digestion within the same set of linkers must differ by at least 2 bp (as shown in FIG. 3C).
  • The 5' ends of the positive and negative strands of the DNA fragment, of the negative strand of the upstream linker, and of the positive strand of the downstream linker are all phosphorylated.
  • The 5' ends of the positive strand of the upstream linker and of the negative strand of the downstream linker are dephosphorylated to reduce the probability of linker self-ligation and mis-ligation.
  • The designed linkers are added to each character-encoding DNA sequence, which is positioned by those linkers; in a specific embodiment, the DNA sequences containing the encoded information of each character are joined by overlap-extension PCR in the character order of the information to be stored and further assembled into longer DNA sequences; preferably, the DNA sequences encoding the individual characters are joined by a method such as PCA or GoldenGate; preferably, the joined DNA sequences are assembled by the Gibson method.
  • The assembled DNA sequence encoding the text information can be kept under suitable storage conditions, for example lyophilized and stored long-term at low temperature.
  • In a specific embodiment, characters can first be assembled into words, idioms, and the like to make subsequent assembly faster and more convenient; for example, 10-20 characters can be assembled into a short sentence at a time by PCA or GoldenGate, and short sentences can then be spliced into long sentences, paragraphs, or articles by assembly methods such as Gibson assembly.
  • Preferably, the assembled DNA sequence is cloned into a plasmid for storage; preferably, a step of verifying the correctness of the assembled DNA sequence by sequencing is included before storage.
  • In a second aspect, the present invention also provides a method of decoding text information stored in accordance with the method of the first aspect, comprising the steps of:
  • (1) sequencing the DNA sequence storing the text information, for example by Sanger sequencing, second-generation sequencing, third-generation sequencing, or other sequencing methods;
  • (2) converting the measured DNA sequence into binary digits by the same transcoding and encoding rules defined in the method of the first aspect, and then into the corresponding Chinese characters, thereby obtaining the stored text information.
  • If the DNA sequence contains a variation, it can be corrected by the Hamming code. For example, if the second position of the above DNA sequence mutates from A to G, so that the DNA sequence becomes TGGCTATAGGCTTGCATAGCACCG, the corresponding binary number becomes 000101010010001101011110. By the Hamming check principle, the mutation can be located at the second base, so the binary number is corrected to 010101010010001101011110 and the sequence is still correctly decoded as "唵".
  • Thus, in a specific embodiment, the decoding may also include the step of correcting variations in the DNA sequence according to the Hamming check principle.
  • In a third aspect, the present invention provides the use of the text information storage method according to the first aspect and/or the decoding method according to the second aspect in the storage and/or reading of text information.
  • First, the method for storing text information using DNA of the present invention has the advantages of small storage volume, large storage capacity, strong stability, and low maintenance cost.
  • In addition, compared with other existing DNA storage methods, the present invention is better suited to storing text information and supports the scripts of all languages, including all Chinese characters, English letters, Japanese, and Korean, as well as punctuation marks and mathematical symbols; encoding is efficient, with 1000 Chinese characters encoded within 100 milliseconds; a strategy similar to movable-type printing is adopted, so DNA fragments and linkers can be reused and the synthesis cost is low; the stored DNA sequence can adopt a double-stranded closed circular conformation, which is more stable in storage; the stored DNA sequence can be verified by sequencing and can carry Hamming check codes that tolerate one arbitrary mutation in every 12 bases, giving better fidelity; and the stored DNA sequence is a single long double-stranded DNA, which makes the information easier to read.
  • FIG. 1 is a schematic view showing the overall flow of Embodiment 1 of the present invention.
  • FIG. 2 is a schematic view showing the assembly process of Embodiment 1 of the present invention.
  • FIG. 3 is a schematic view showing the fragment/linker design of Embodiment 1 of the present invention.
  • FIG. 4 shows electrophoresis images of the assembly test results of Embodiment 1 of the present invention.
  • The experimental procedures mentioned in the examples are routine experimental methods unless otherwise specified; the reagents and consumables mentioned are conventional unless otherwise specified.
  • The synthetic oligos used in the experiments were diluted to 100 μM with sterile water, and the primers were each diluted to 10 μM.
  • The DNA oligo sequences were transcoded and synthesized according to the method of the invention; PCA linkers were used, with positioning and ligation by G/C base pairing upstream and A/T base pairing downstream.
  • The DNA oligo and primer sequences are shown in Table 1 below.
  • For each character or linker, 10 μL each of the forward and reverse oligos were mixed and annealed; the annealing program was: denaturation at 99 °C for 10 min, slow cooling to 25 °C at 0.1 °C/sec, and hold at 12 °C.
  • Each character was ligated to its upstream and downstream linkers in order; the ligation mix was: T4 DNA ligase (Enzymatics) 1 μL, 2× ligation buffer 10 μL, annealed character, upstream linker, and downstream linker 2 μL each, ddH2O 3 μL; ligation at 16 °C overnight.
  • The ligation product was run on a 15% PAGE gel at 100 V for 1 h (results shown in FIG. 4A); the target band (42 bp, indicated by the arrow in FIG. 4A) was excised and purified:
  • the excised gel slice was placed in a pierced 0.5 mL tube set inside a 2 mL tube and centrifuged at 14,000 rpm for 2 min; 200 μL of 0.3 M NaCl was added to the crushed gel, which was shaken at 1400 rpm for 2 h at 25 °C; the crushed gel and liquid were transferred together to a filter column.
  • The column was centrifuged at 14,000 rpm for 2 min, the filtrate was transferred to a 1.5 mL centrifuge tube, 400 μL of absolute ethanol was added, and the tube was left at -80 °C for 1 h to precipitate. After centrifugation at 14,000 rpm for 30 min at 4 °C, the supernatant was discarded, the pellet was washed once with 500 μL of 70% ethanol, the supernatant was removed completely, the pellet was dried at 37 °C for 5 min, and 20 μL of ddH2O was added to dissolve the DNA.
  • The characters were assembled into a short sentence by PCA. Step 1: Ex Taq DNA polymerase (TAKARA) 0.2 μL, 10× buffer 2 μL, 2.5 mM dNTPs 1.6 μL, and 50 ng of each ligated, gel-purified product, made up to 20 μL with water.
  • Step 2: Ex Taq DNA polymerase (TAKARA) 0.2 μL, 10× buffer 2 μL, 2.5 mM dNTPs 1.6 μL, step 1 PCR product 3 μL, primers St12-F and St1-R 1 μL each, ddH2O 11.2 μL.
  • The PCR product was checked by electrophoresis: 5 μL of the PCR product was run on a 2% agarose gel at 180 V for 30 min (results shown in FIG. 4B; the PCR product is about 280 bp, as indicated by the arrow).
  • The PCR product from step 4 was gel-purified using a gel purification kit, and the purified PCR product was cloned using a TA cloning kit (TAKARA).
  • Single colonies were picked from the TA cloning plate and cultured overnight; plasmid was extracted using a kit and verified by digestion at the designed BssHII restriction site.
  • Digestion mix: 0.5 μL of BssHII (NEB), 1 μL of CutSmart buffer, 4 μL of plasmid DNA, and 4.5 μL of ddH2O.
  • Digestion was carried out at 37 °C for 1 h. 5 μL of the digested product was run on a 2% agarose gel at 180 V for 30 min (results shown in FIG. 4C; the correct band size is indicated by an arrow).
  • Plasmids with the correct digestion pattern were selected for Sanger sequencing and analyzed to obtain plasmids carrying the correctly assembled sequence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biochemistry (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

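A method for encoding and storing text information using DNA as a storage medium, and a decoding method and application thereof. The method for storing text information in DNA comprises: encoding characters into computer binary digits, then transcoding the binary digits into DNA sequences; artificially synthesizing the DNA sequences that encode the character information, positioning the characters with designed linkers, and assembling the character-encoding DNA sequences in a preset order. The method has the advantages of small storage volume, large storage capacity, strong stability, and low maintenance cost.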

Description

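Method for using DNA to store text information, decoding method therefor and application thereof
Technical Field
The present invention belongs to the technical field of DNA storage and relates in particular to a method for encoding and storing text information using DNA as a storage medium, and to a decoding method and application thereof.
Background
With the development of human society, the amount of accumulated information is growing explosively. The IDC report "The Digital Universe in 2020" predicts that the total volume of global data will exceed 40 ZB by 2020. Global data volume is still growing rapidly at 58% per year, and large amounts of useful data are being lost. Data storage is a problem facing the whole world. Commonly used storage media such as optical discs and hard disks suffer from low capacity, large size, high maintenance cost, and short storage life (about 50 years). To solve these problems at the root, new information storage media need to be developed as soon as possible.
DNA storage is a forward-looking, disruptive information storage technology. As a storage medium, DNA has many advantages, such as small size, large storage capacity, strong stability, and low maintenance cost. In theory, 1 gram of DNA can store thousands of terabytes of data; by that estimate, only a few hundred kilograms of DNA would be needed to store all existing books, documents, videos, and other information, and the storage time under normal conditions can reach thousands of years. Therefore, information that is rarely used but must be kept for a long time, such as government documents and historical archives, is especially suitable for storage in DNA.
Although DNA storage has many advantages that existing storage media cannot match, some technical barriers still hinder its development, such as the inability to reuse synthesized DNA oligo fragments, the high cost of DNA synthesis, complex design, and poor flexibility. These problems make existing DNA storage technologies difficult to promote and use on a large scale. It is therefore necessary to start from the design of the basic unit of information, optimize the coding scheme for DNA storage, reduce cost, and improve efficiency and ease of use.
Summary of the Invention
The object of the present invention is to provide a method for encoding and storing text information using DNA as a storage medium, and a decoding method and application thereof.
The text information storage method provided by the present invention generally comprises: first, encoding the characters into computer binary digits and then transcoding the binary digits into a DNA sequence; second, artificially synthesizing the DNA sequences that encode the characters and positioning the characters with designed linkers, so that the character-encoding DNA sequences can be assembled in a preset order and further assembled into longer DNA sequences as needed.
In the text information storage method of the present invention, each character can be reused and, by swapping linkers, used to store any message; the principle is the same as movable-type printing. The DNA storing the text information can be kept under appropriate conditions. When the stored information needs to be read, the DNA sequence is obtained by sequencing, and the stored text is then recovered by computer decoding (as shown in FIG. 1). The method provided by the invention has the advantages of small storage volume, large storage capacity, strong stability, and low maintenance cost.
Specifically, the technical object of the present invention can be achieved through the following aspects:
In a first aspect, the present invention provides a method of storing text information using DNA, comprising the following steps:
(1) encoding the characters into computer binary digits;
(2) transcoding the binary digits that encode the characters into a DNA sequence expressed in the four deoxyribonucleotides A, T, G, and C;
(3) synthesizing the DNA sequences that encode the characters;
(4) positioning the character-encoding DNA sequences with designed linkers and, following the character order of the information to be stored, assembling and storing the DNA sequences that encode the individual characters.
On encoding
In an optional embodiment, the encoding is Unicode UCS-2 encoding; that is, each Chinese character is encoded by hexadecimal digits. For example, the Unicode code point of the character "唵" is U+5535. Each hexadecimal digit is converted to a 4-bit binary number (5 to 0101, 3 to 0011), so "唵" converts to the binary number 0101010100110101. Preferably, every 8 binary digits generate 4 bits of Hamming code for checking; the Hamming codes for "唵" are 0010 and 1110, respectively. The complete binary code of "唵" is then 010101010010001101011110.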
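The encoding step above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the text does not say how the 4 check bits are laid out, so the sketch assumes a standard even-parity Hamming(12,8) code with the parity bits p1, p2, p4, p8 appended after the 8 data bits. The function names `hamming_parity` and `encode_char` are hypothetical. Under that assumption the sketch reproduces the check bits 0010 and 1110 quoted in the text.

```python
# Hamming(12,8) layout assumed here: data bits sit at codeword positions
# 3, 5, 6, 7, 9, 10, 11, 12 and parity bits at positions 1, 2, 4, 8.
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)
PARITY_POSITIONS = (1, 2, 4, 8)


def hamming_parity(data_bits: str) -> str:
    """Return the four even-parity bits (p1, p2, p4, p8) for 8 data bits."""
    if len(data_bits) != 8:
        raise ValueError("expected exactly 8 data bits")
    parity = []
    for p in PARITY_POSITIONS:
        # Parity bit p covers every data position whose index has bit p set.
        ones = sum(int(b) for pos, b in zip(DATA_POSITIONS, data_bits) if pos & p)
        parity.append(str(ones % 2))
    return "".join(parity)


def encode_char(ch: str) -> str:
    """Encode one UCS-2 character as 24 bits: two blocks of 8 data + 4 parity."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the 16-bit UCS-2 range")
    bits = format(code_point, "016b")
    return "".join(block + hamming_parity(block) for block in (bits[:8], bits[8:]))


print(encode_char("\u5535"))  # U+5535 is the example character from the text
```

For U+5535 this prints 010101010010001101011110, the 24-bit code given in the text.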
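On transcoding
In an optional embodiment, the transcoding converts the binary digits encoding the characters into a DNA sequence according to the rule that binary 0 becomes G or T and binary 1 becomes C or A.
Preferably, one Chinese character is encoded into 24 bases.
Preferably, the sequence design is controlled by considering one or more parameters including the GC content, secondary structure, and base repetition of the DNA sequence; for example, the DNA sequence is preferably designed so that its GC content is 45-60%, preferably 50%; preferably, the DNA sequence is designed to avoid secondary structures; preferably, the DNA sequence is designed so that no single base occurs more than 2 times in a row. Taking the character "唵" as an example, the final DNA sequence is TAGCTATAGGCTTGCATAGCACCG.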
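Because each bit can map to either of two bases, many DNA sequences encode the same 24 bits, and the design constraints pick among them. The sketch below shows one way to search for a sequence that satisfies the GC window and the run-length limit, using a simple depth-first search with backtracking. The search strategy and the names `transcode`, `gc_fraction`, and `longest_run` are assumptions for illustration; the text does not describe how the applicant selects a sequence, and secondary-structure screening is not attempted here.

```python
BASES_FOR_BIT = {"0": "GT", "1": "CA"}  # rule from the text: 0 -> G/T, 1 -> C/A
BIT_FOR_BASE = {"G": "0", "T": "0", "C": "1", "A": "1"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if prev == cur else 1
        longest = max(longest, current)
    return longest


def transcode(bits: str, gc_min=0.45, gc_max=0.60, max_run=2):
    """Return one DNA sequence for bits that meets the constraints, or None."""
    n = len(bits)

    def extend(prefix: str, gc_count: int):
        i = len(prefix)
        # Prune branches that can no longer land inside the GC window.
        if gc_count > gc_max * n or gc_count + (n - i) < gc_min * n:
            return None
        if i == n:
            return prefix
        for base in BASES_FOR_BIT[bits[i]]:
            if i >= max_run and prefix[-max_run:] == base * max_run:
                continue  # would create a run longer than max_run
            found = extend(prefix + base, gc_count + (base in "GC"))
            if found is not None:
                return found
        return None

    return extend("", 0)


candidate = transcode("010101010010001101011110")
print(candidate, gc_fraction(candidate), longest_run(candidate))
```

The sequence this search returns need not match the one in the text; TAGCTATAGGCTTGCATAGCACCG is simply another valid choice, with a GC fraction of exactly 0.5 and no run longer than 2.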
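On DNA sequences and linkers
Both the DNA sequences and the linker sequences in the present invention are obtained by chemically synthesizing the forward and reverse strands de novo and annealing them to form a double-stranded structure.
In an optional embodiment, the DNA sequence fragments and the linkers both carry protruding, complementary positioning bases; through these protruding complementary bases (the "positioning bases"), directional ligation of the DNA sequence to the linker is achieved. By designing the linkers, the DNA sequence fragments encoding each character can be joined in the intended text order.
In an optional embodiment, the linkers comprise an upstream linker and a downstream linker; where the DNA sequence is the same but the protruding positioning bases differ, that DNA sequence can be ligated upstream and downstream of two DNA fragments, respectively, and the two resulting DNA fragments can be joined via the linkers by conventional molecular biology methods, preferably PCA, GoldenGate, or similar (as shown in FIG. 2).
For example, one base protrudes at each end of the DNA fragment: if base "A" protrudes from the positive strand at the 5' end of the DNA fragment and base "G" protrudes from the negative strand, then the corresponding upstream linker has base "T" protruding from its negative strand and the downstream linker has base "C" protruding from its positive strand. Directional ligation of fragment and linker is thus achieved by A/T and G/C pairing: a linker with a protruding "T" can only be ligated upstream of the DNA fragment, and a linker with a protruding "C" only downstream. Similarly, the bases protruding upstream and downstream of the DNA fragment can also be A/C, T/G, T/C, etc., in which case the corresponding protruding bases of the linkers become T/G, A/C, A/G, etc. (as shown in FIG. 3A). Similarly, the mutually non-pairing bases protruding from the DNA fragment and the linkers may each number more than one. Similarly, the bases protruding from the DNA fragment and the linkers can also be at the 3' end. The DNA fragment can also have base "G" protruding at both the 5' and 3' ends of the positive strand, in which case the 5' end of the negative strand of the corresponding upstream linker and the 3' end of the negative strand of the downstream linker both have a protruding "C"; directional ligation of fragment and linker is likewise achieved. Similarly, the bases protruding at the 5' and 3' ends of the positive strand of the DNA fragment may also be "C", "T" or "A", with the bases protruding from the linkers correspondingly changed to "G", "A" or "T". Similarly, the DNA fragment can have "G", "C", "T" or "A" protruding from the negative strand, with the linker correspondingly having "C", "G", "A" or "T" protruding from the positive strand (as shown in FIG. 3B). Similarly, the mutually non-pairing bases protruding from the DNA fragment and the linkers may each number more than one.
The linker sequences can be generated automatically by a computer program. For example, a PCA linker must be at least 8 bp long, have a GC content of 50%-60%, have no secondary structure, contain no more than 2 identical consecutive bases, and show no mispairing between linkers of the same set; a GoldenGate linker consists of a restriction-site sequence and outer protective bases, and the 4-bp sticky ends produced by digestion within the same set of linkers must differ by at least 2 bp (as shown in FIG. 3C). The 5' ends of the positive and negative strands of the DNA fragment, of the negative strand of the upstream linker, and of the positive strand of the downstream linker are all phosphorylated. The 5' ends of the positive strand of the upstream linker and of the negative strand of the downstream linker are dephosphorylated to reduce the probability of linker self-ligation and mis-ligation.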
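The text says linker sequences can be generated by a computer program but gives no algorithm. As a rough sketch under stated assumptions, the code below checks the three PCA linker rules that are easy to compute (length, GC window, run length), draws random candidates until one passes, and greedily picks 4-bp sticky ends that differ pairwise in at least 2 positions. All function names are hypothetical, and the sketch does not screen for secondary structure, cross-linker mispairing, palindromic sticky ends, or restriction sites, all of which a real design tool would have to handle.

```python
import itertools
import random


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    longest = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if prev == cur else 1
        longest = max(longest, current)
    return longest


def is_valid_pca_linker(seq, min_len=8, gc_min=0.50, gc_max=0.60, max_run=2):
    """Check the length, GC-content, and run-length rules quoted in the text."""
    if len(seq) < min_len:
        return False
    gc = sum(base in "GC" for base in seq) / len(seq)
    return gc_min <= gc <= gc_max and longest_run(seq) <= max_run


def random_pca_linker(length=10, rng=None, max_tries=10_000):
    """Rejection-sample a linker candidate that passes is_valid_pca_linker."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if is_valid_pca_linker(candidate):
            return candidate
    raise RuntimeError("no valid linker found within max_tries")


def differing_positions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_sticky_ends(count: int, min_diff=2):
    """Greedily choose 4-bp sticky ends that differ pairwise in >= min_diff bases."""
    chosen = []
    for candidate in map("".join, itertools.product("ACGT", repeat=4)):
        if all(differing_positions(candidate, c) >= min_diff for c in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough sticky ends satisfy the constraint")


print(random_pca_linker(rng=random.Random(0)))
print(pick_sticky_ends(10))
```

Rejection sampling is used only because it is short to write; a constructive generator would be preferable once the unchecked constraints are added.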
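On assembly and storage
The designed linkers are added to each character-encoding DNA sequence, which is positioned by those linkers; in a specific embodiment, the DNA sequences containing the encoded information of each character are joined by overlap-extension PCR in the character order of the information to be stored and further assembled into longer DNA sequences; preferably, the DNA sequences encoding the individual characters are joined by a method such as PCA or GoldenGate; preferably, the joined DNA sequences are assembled by the Gibson method. The assembled DNA sequence encoding the text information can be kept under suitable storage conditions, for example lyophilized and stored long-term at low temperature.
In a specific embodiment, characters can first be assembled into words, idioms, and the like to make subsequent assembly faster and more convenient; for example, 10-20 characters can be assembled into a short sentence at a time by PCA or GoldenGate, and short sentences can then be spliced into long sentences, paragraphs, or articles by assembly methods such as Gibson assembly.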
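The two-level assembly described above can be planned in software before any bench work. The sketch below splits a text into short sentences of at most `chunk_size` characters and gives each character an upstream and a downstream linker, so that the downstream linker of one character carries the same number as the upstream linker of the next. The function `plan_assembly`, the `linkerN-U`/`linkerN-D` labels, and the choice to restart linker numbering in every chunk are assumptions made for illustration only.

```python
def plan_assembly(text: str, chunk_size: int = 10):
    """Split text into chunks and assign (upstream, character, downstream) units.

    Within a chunk, unit k gets upstream linker k and downstream linker k + 1,
    so neighbouring units share a linker number and can be joined in order.
    """
    plan = []
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        units = [
            (f"linker{k}-U", ch, f"linker{k + 1}-D")
            for k, ch in enumerate(chunk, start=1)
        ]
        plan.append(units)
    return plan


for unit in plan_assembly("华章谱写基因梦")[0]:
    print(unit)
```

Each inner list corresponds to one short sentence to be joined by PCA or GoldenGate; the outer list gives the order in which the short sentences would then be spliced, for example by Gibson assembly.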
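Preferably, the assembled DNA sequence is cloned into a plasmid for storage; preferably, a step of verifying the correctness of the assembled DNA sequence by sequencing is included before storage.
In a second aspect, the present invention also provides a method of decoding text information stored in accordance with the method of the first aspect, comprising the following steps:
(1) sequencing the DNA sequence storing the text information, for example by Sanger sequencing, second-generation sequencing, third-generation sequencing, or other sequencing methods;
(2) converting the measured DNA sequence into binary digits by the same transcoding and encoding rules defined in the method of the first aspect, and then into the corresponding Chinese characters, thereby obtaining the stored text information.
If the DNA sequence contains a variation, it can be corrected by the Hamming code. For example, if the second position of the above DNA sequence mutates from A to G, so that the DNA sequence becomes TGGCTATAGGCTTGCATAGCACCG, the corresponding binary number becomes 000101010010001101011110. By the Hamming check principle, the mutation can be located at the second base, so the binary number is corrected to 010101010010001101011110 and the sequence is still correctly decoded as "唵".
Thus, in a specific embodiment, the decoding may also include the step of correcting variations in the DNA sequence according to the Hamming check principle.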
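The decoding and correction steps can be sketched as follows. As with the encoding, this is an illustration under an assumption the text does not spell out: each 12-base block is taken to hold 8 data bits followed by the even-parity bits p1, p2, p4, p8 of a standard Hamming(12,8) code. The names `correct_block` and `decode_dna` are hypothetical. With that layout, the syndrome for the mutated example in the text points at the second data bit, matching the correction described.

```python
BIT_FOR_BASE = {"G": "0", "T": "0", "C": "1", "A": "1"}  # inverse of the transcoding rule
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # assumed Hamming(12,8) layout
PARITY_POSITIONS = (1, 2, 4, 8)


def correct_block(block: str) -> str:
    """Take 12 bits (8 data + 4 parity) and return the corrected 8 data bits."""
    if len(block) != 12:
        raise ValueError("expected a 12-bit block")
    # Map each received bit to its Hamming codeword position.
    word = {pos: int(b) for pos, b in zip(DATA_POSITIONS + PARITY_POSITIONS, block)}
    syndrome = 0
    for p in PARITY_POSITIONS:
        # A failed even-parity check contributes p to the error position.
        if sum(bit for pos, bit in word.items() if pos & p) % 2:
            syndrome += p
    if syndrome:
        if syndrome not in word:
            raise ValueError("uncorrectable block (more than one bit in error)")
        word[syndrome] ^= 1  # flip the single erroneous bit
    return "".join(str(word[pos]) for pos in DATA_POSITIONS)


def decode_dna(seq: str) -> str:
    """Decode a DNA sequence of 24 bases per character back into text."""
    if len(seq) % 24:
        raise ValueError("sequence length must be a multiple of 24")
    bits = "".join(BIT_FOR_BASE[base] for base in seq)
    chars = []
    for i in range(0, len(bits), 24):
        data = correct_block(bits[i:i + 12]) + correct_block(bits[i + 12:i + 24])
        chars.append(chr(int(data, 2)))
    return "".join(chars)


print(decode_dna("TAGCTATAGGCTTGCATAGCACCG"))  # intact sequence from the text
print(decode_dna("TGGCTATAGGCTTGCATAGCACCG"))  # A -> G mutation at position 2
```

Both calls print the character U+5535. A substitution that stays within one bit class (A to C, or G to T) leaves the bit string unchanged and is therefore invisible to, and harmless for, this layer.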
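In a third aspect, the present invention provides the use of the text information storage method according to the first aspect and/or the decoding method according to the second aspect in the storage and/or reading of text information.
Beneficial Effects
First, the method for storing text information using DNA of the present invention has the advantages of small storage volume, large storage capacity, strong stability, and low maintenance cost.
In addition, compared with other existing DNA storage methods, the present invention is better suited to storing text information and supports the scripts of all languages, including all Chinese characters, English letters, Japanese, and Korean, as well as punctuation marks and mathematical symbols; encoding is efficient, with 1000 Chinese characters encoded within 100 milliseconds; a strategy similar to movable-type printing is adopted, so DNA fragments and linkers can be reused and the synthesis cost is low; the stored DNA sequence can adopt a double-stranded closed circular conformation, which is more stable in storage; the stored DNA sequence can be verified by sequencing and can carry Hamming check codes that tolerate one arbitrary mutation in every 12 bases, giving better fidelity; and the stored DNA sequence is a single long double-stranded DNA, which makes the information easier to read.
Brief Description of the Drawings
FIG. 1 is a schematic view of the overall flow of Example 1 of the present invention;
FIG. 2 is a schematic view of the assembly process of Example 1 of the present invention;
FIG. 3 is a schematic view of the fragment/linker design of Example 1 of the present invention;
FIG. 4 shows electrophoresis images of the assembly test results of Example 1 of the present invention.
Detailed Description of Embodiments
The experimental procedures described in the following examples serve only to demonstrate the feasibility of the patented method; application of the inventive method is not limited to them.
The experimental procedures mentioned in the examples are routine experimental methods unless otherwise specified; the reagents and consumables mentioned are conventional unless otherwise specified. The synthetic oligos used in the experiments were all diluted to 100 μM with sterile water, and the primers were all diluted to 10 μM.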
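Example 1: Assembly test
Test sentence for assembly: 华章谱写基因梦。
The DNA oligo sequences were transcoded and synthesized according to the method of the invention; PCA linkers were used, with positioning and ligation by G/C base pairing upstream and A/T base pairing downstream. The DNA oligo and primer sequences are shown in Table 1 below.
1. Annealing
For each character or linker, 10 μL each of the forward and reverse oligos were mixed and annealed; the annealing program was: denaturation at 99 °C for 10 min, slow cooling to 25 °C at 0.1 °C/sec, and hold at 12 °C.
2. Ligation
Each character was ligated to its upstream and downstream linkers in order; the ligation mix was: T4 DNA ligase (Enzymatics) 1 μL, 2× ligation buffer 10 μL, annealed character, upstream linker, and downstream linker 2 μL each, ddH2O 3 μL; ligation at 16 °C overnight.
3. Purification of the ligation product
The ligation product was run on a 15% PAGE gel at 100 V for 1 h (results shown in FIG. 4A); the target band (42 bp, indicated by the arrow in FIG. 4A) was excised and purified: the excised gel slice was placed in a pierced 0.5 mL tube set inside a 2 mL tube and centrifuged at 14,000 rpm for 2 min; 200 μL of 0.3 M NaCl was added to the crushed gel, which was shaken at 1400 rpm for 2 h at 25 °C; the crushed gel and liquid were transferred together to a filter column and centrifuged at 14,000 rpm for 2 min; the filtrate was transferred to a 1.5 mL centrifuge tube, 400 μL of absolute ethanol was added, and the tube was left at -80 °C for 1 h to precipitate. After centrifugation at 14,000 rpm for 30 min at 4 °C, the supernatant was discarded, the pellet was washed once with 500 μL of 70% ethanol, the supernatant was removed completely, the pellet was dried at 37 °C for 5 min, and 20 μL of ddH2O was added to dissolve the DNA.
4. Assembly
The characters were assembled into a short sentence by PCA. Step 1: Ex Taq DNA polymerase (TAKARA) 0.2 μL, 10× buffer 2 μL, 2.5 mM dNTPs 1.6 μL, and 50 ng each of the ligated, gel-purified products adapter1-U+华+adapter2-D, adapter2-U+章+adapter5-D, adapter5-U+谱+adapter6-D, adapter6-U+写+adapter7-D, adapter7-U+基+adapter8-D, adapter8-U+因+adapter9-D, and adapter9-U+梦+adapter10-D, made up to 20 μL with water. Cycling: 94 °C for 5 min; 20 cycles of 94 °C for 30 sec, 55 °C for 1 sec, cooling at 0.5 °C/sec to 45 °C, 45 °C for 15 sec, and 72 °C for 1 min; 72 °C for 5 min; hold at 12 °C. Step 2: Ex Taq DNA polymerase (TAKARA) 0.2 μL, 10× buffer 2 μL, 2.5 mM dNTPs 1.6 μL, step 1 PCR product 3 μL, primers St12-F and St1-R 1 μL each, ddH2O 11.2 μL. Cycling: 94 °C for 5 min; 35 cycles of 94 °C for 30 sec, 60 °C for 30 sec, and 72 °C for 30 sec; 72 °C for 5 min; hold at 12 °C.
The PCR product was checked by electrophoresis: 5 μL of the PCR product was run on a 2% agarose gel at 180 V for 30 min (results shown in FIG. 4B; the PCR product is about 280 bp, as indicated by the arrow).
5. TA cloning
The PCR product from step 4 was gel-purified using a gel purification kit, and the purified PCR product was cloned using a TA cloning kit (TAKARA).
6. Restriction digest verification
Single colonies were picked from the TA cloning plate of step 5 and cultured overnight; plasmid was extracted using a kit and verified by digestion at the designed BssHII restriction site. Digestion mix: BssHII (NEB) 0.5 μL, CutSmart buffer 1 μL, plasmid DNA 4 μL, ddH2O 4.5 μL. Digestion was carried out at 37 °C for 1 h. 5 μL of the digested product was run on a 2% agarose gel at 180 V for 30 min (results shown in FIG. 4C; the correct band size is indicated by an arrow).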
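The reaction mixes in this example can be sanity-checked with simple arithmetic. The sketch below totals the listed component volumes for the ligation, the step-2 PCR, and the BssHII digest, and derives a few final concentrations by the usual dilution formula (stock concentration times volume added, divided by total volume). The volumes and stock concentrations are copied from the example; the dictionary layout and helper names are illustrative, and the primer stock of 10 μM is taken from the dilution stated at the start of the examples.

```python
import math

# Component volumes in microlitres, as listed in Example 1.
ligation = {"T4 DNA ligase": 1, "2x ligation buffer": 10, "annealed character": 2,
            "upstream linker": 2, "downstream linker": 2, "ddH2O": 3}
pcr_step2 = {"Ex Taq": 0.2, "10x buffer": 2, "2.5 mM dNTPs": 1.6,
             "step 1 product": 3, "primer St12-F": 1, "primer St1-R": 1,
             "ddH2O": 11.2}
digest = {"BssHII": 0.5, "CutSmart buffer": 1, "plasmid DNA": 4, "ddH2O": 4.5}


def total_volume(mix: dict) -> float:
    """Sum of all component volumes in a reaction mix."""
    return sum(mix.values())


def final_concentration(stock: float, volume: float, total: float) -> float:
    """Concentration after diluting `volume` of `stock` into `total`."""
    return stock * volume / total


pcr_total = total_volume(pcr_step2)
print("ligation total (uL):", total_volume(ligation))
print("PCR step 2 total (uL):", round(pcr_total, 3))
print("digest total (uL):", total_volume(digest))
print("dNTP final (mM):", round(final_concentration(2.5, 1.6, pcr_total), 3))
print("primer final (uM):", round(final_concentration(10, 1, pcr_total), 3))
print("ligation buffer final (x):", final_concentration(2, 10, total_volume(ligation)))
```

The ligation and step-2 PCR mixes each total 20 μL and the digest totals 10 μL, so the 2× and 10× buffers end up at 1×, the dNTPs at about 0.2 mM, and each primer at about 0.5 μM.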
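7. Sequencing analysis
Plasmids with the correct digestion pattern were selected for Sanger sequencing and analyzed to obtain plasmids carrying the correctly assembled sequence.
Table 1
Figure PCTCN2016081037-appb-000001
Figure PCTCN2016081037-appb-000002
Figure PCTCN2016081037-appb-000003
Figure PCTCN2016081037-appb-000004
The applicant declares that the above examples illustrate the method of the present invention and its use, but the present invention is not limited to them. Those skilled in the art should understand that any improvement to the method of the present invention, equivalent substitution of the products of the present invention, addition of auxiliary components, or selection of specific modes falls within the scope of protection and disclosure of the present invention.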

Claims (9)

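  1. A method of storing text information using DNA, comprising the following steps:
    (1) encoding the characters into computer binary digits;
    (2) transcoding the binary digits that encode the characters into a DNA sequence expressed in the four deoxyribonucleotides A, T, G, and C;
    (3) synthesizing the DNA sequences that encode the characters;
    (4) positioning the character-encoding DNA sequences with designed linkers and, following the character order of the information to be stored, ligating, assembling, and storing the DNA sequences that encode the individual characters.
  2. The method according to claim 1, characterized in that the encoding is Unicode UCS-2 encoding;
    preferably, each Chinese character is encoded by hexadecimal digits, and each hexadecimal digit is converted to a 4-bit binary number; further preferably, every 8 binary digits generate 4 bits of Hamming code for checking.
  3. The method according to claim 1 or 2, characterized in that the transcoding converts the binary digits encoding the characters into a DNA sequence according to the rule that binary 0 becomes G or T and binary 1 becomes C or A.
  4. The method according to any one of claims 1-3, characterized in that one Chinese character is encoded into 24 bases.
  5. The method according to any one of claims 1-4, characterized in that the sequence design is controlled by considering one or more parameters including the GC content, secondary structure, and base repetition of the DNA sequence;
    preferably, the DNA sequence is designed so that its GC content is 45-60%, preferably 50%;
    preferably, the DNA sequence is designed to avoid secondary structures;
    preferably, the DNA sequence is designed so that no single base occurs more than 2 times in a row.
  6. The method according to any one of claims 1-5, characterized in that, in step (4), the linkers comprise an upstream linker and a downstream linker;
    preferably, directional ligation of the DNA sequence to the linker is achieved through their respective protruding complementary bases;
    preferably, the DNA sequences containing the encoded information of each character are joined by overlap-extension PCR and further assembled into longer DNA sequences; further preferably, the ligation method is PCA or GoldenGate; further preferably, the assembly method is the Gibson method;
    preferably, the assembled DNA sequence is cloned into a plasmid for storage;
    preferably, a step of verifying the correctness of the assembled DNA sequence by sequencing is included before storage.
  7. A method of decoding text information stored by the method according to any one of claims 1-6, characterized by comprising the following steps:
    (1) sequencing the DNA sequence storing the text information;
    (2) converting the measured DNA sequence into binary digits by the same transcoding and encoding rules defined in the method according to any one of claims 1-8, and then into the corresponding Chinese characters, thereby obtaining the stored text information.
  8. The method according to claim 7, characterized in that the decoding further comprises the step of correcting variations in the DNA sequence according to the Hamming check principle.
  9. Use of the method according to any one of claims 1-6 and/or the method according to any one of claims 7-8 in the storage and/or reading of text information.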
PCT/CN2016/081037 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof WO2017190297A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201680085320.9A CN109074424B (zh) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof
US16/098,471 US10839295B2 (en) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof
PCT/CN2016/081037 WO2017190297A1 (zh) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof
EP16900819.0A EP3470997A4 (en) 2016-05-04 2016-05-04 METHOD OF USING DNA TO STORE TEXTUAL INFORMATION, CORRESPONDING DECODING METHOD AND APPLICATION THEREOF

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/081037 WO2017190297A1 (zh) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof

Publications (1)

Publication Number Publication Date
WO2017190297A1 (zh) 2017-11-09

Family

ID=60202585

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081037 WO2017190297A1 (zh) 2016-05-04 2016-05-04 Method for using DNA to store text information, decoding method therefor and application thereof

Country Status (4)

Country Link
US (1) US10839295B2 (zh)
EP (1) EP3470997A4 (zh)
CN (1) CN109074424B (zh)
WO (1) WO2017190297A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460822A (zh) * 2018-11-19 2019-03-12 天津大学 DNA-based information storage method
CN109943560A (zh) * 2018-11-22 2019-06-28 西藏自治区人民政府驻成都办事处医院 Method for storing Chinese character information based on a DNA carrier
WO2019178551A1 (en) 2018-03-16 2019-09-19 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
WO2020028955A1 (en) 2018-08-10 2020-02-13 Nucleotrace Pty. Ltd. Systems and methods for identifying a products identity
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
WO2020239806A1 (en) 2019-05-27 2020-12-03 Vib Vzw A method of storing digital information in pools of nucleic acid molecules
US10956806B2 (en) 2019-06-10 2021-03-23 International Business Machines Corporation Efficient assembly of oligonucleotides for nucleic acid based data storage
WO2021056167A1 (zh) * 2019-09-24 2021-04-01 深圳华大生命科学研究院 Information encoding and decoding method, apparatus, storage medium, and information storage and interpretation method
US11017170B2 (en) 2018-09-27 2021-05-25 At&T Intellectual Property I, L.P. Encoding and storing text using DNA sequences
US11227219B2 (en) 2018-05-16 2022-01-18 Catalog Technologies, Inc. Compositions and methods for nucleic acid-based data storage
US11306353B2 (en) 2020-05-11 2022-04-19 Catalog Technologies, Inc. Programs and functions in DNA-based data storage
CN114958828A (zh) * 2022-06-14 2022-08-30 深圳先进技术研究院 Data information storage method based on DNA molecular media
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
WO2023108616A1 (zh) * 2021-12-17 2023-06-22 深圳华大生命科学研究院 Method and system for information storage using DNA
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US11854668B2 (en) 2018-07-26 2023-12-26 Evonetix Ltd Accessing data storage provided using double-stranded nucleic acid molecules

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11339423B2 (en) * 2018-03-18 2022-05-24 Bryan Bishop Systems and methods for data storage in nucleic acids
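CN109830263B (zh) * 2019-01-30 2023-04-07 东南大学 DNA storage method based on oligonucleotide sequence encoding and storage
CN109887549B (zh) * 2019-02-22 2023-01-20 天津大学 Data storage and restoration method and apparatus
CN111028883B (zh) * 2019-11-20 2023-07-18 广州达美智能科技有限公司 Boolean algebra-based gene processing method, apparatus, and readable storage medium
CN111091876B (zh) * 2019-12-16 2024-05-17 中国科学院深圳先进技术研究院 DNA storage method, system, and electronic device
CN111243670A (zh) * 2020-01-23 2020-06-05 天津大学 DNA information storage encoding method satisfying biological constraints
CN111368132B (zh) * 2020-02-28 2023-04-14 元码基因科技(北京)股份有限公司 Method for storing audio or video files based on DNA sequences, and storage medium
CN111680797B (zh) * 2020-05-08 2023-06-06 中国科学院计算技术研究所 DNA movable-type printer, and DNA-based data storage device and method
CN111737955A (zh) * 2020-06-24 2020-10-02 任兆瑞 Method for storing text dot matrices using DNA character codes
CN114058471A (zh) * 2020-07-29 2022-02-18 东南大学 Data storage device loaded with DNA-stored data, preparation method, and reading method
CN112100982B (zh) * 2020-08-07 2023-06-20 广州大学 DNA storage method, system, and storage medium
CN112002376B (zh) * 2020-08-13 2024-03-19 中国海洋大学 Method for recording and reading information with DNA molecules
CN112382340B (zh) * 2020-11-25 2022-11-15 中国科学院深圳先进技术研究院 Encoding/decoding method and encoding/decoding apparatus for DNA data storage
CN112802549B (zh) * 2021-01-26 2022-05-13 武汉大学 Encoding/decoding method for DNA sequence integrity verification and error correction
US20220243252A1 (en) * 2021-02-03 2022-08-04 Seagate Technology Llc Isotope modified nucleotides for DNA data storage
CN113744804B (zh) * 2021-06-21 2023-03-10 深圳先进技术研究院 Method, apparatus, and storage device for data storage using DNA
CN114898806A (zh) * 2022-05-25 2022-08-12 天津大学 DNA movable-type writing system and method
CN114758703B (zh) * 2022-06-14 2022-09-13 深圳先进技术研究院 Data information storage method based on recombinant plasmid DNA molecules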

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
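CN1536068A (zh) * 2003-02-03 2004-10-13 三星电子株式会社 Method and apparatus for encoding deoxyribonucleic acid sequences, and computer-readable medium
CN104850760A (zh) * 2015-03-27 2015-08-19 苏州泓迅生物科技有限公司 Synthetic DNA storage medium carrying encoded information, and information storage and reading method and application thereof
CN105022935A (zh) * 2014-04-22 2015-11-04 中国科学院青岛生物能源与过程研究所 Encoding method and decoding method for information storage using DNA
CN105119717A (zh) * 2015-07-21 2015-12-02 郑州轻工业学院 Encryption system and encryption method based on DNA encoding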

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7371851B1 (en) * 1999-03-18 2008-05-13 Complete Genomics As Methods of cloning and producing fragment chains with readable information content
EP1573034A4 (en) * 2002-06-20 2006-06-14 Bristol Myers Squibb Co IDENTIFICATION AND MODULATION OF A G PROTEIN-COUPLED RECEPTOR (GPCR), RAI3, ASSOCIATED WITH CHRONIC OBSTRUCTIVE BRONCHOPNEUMOPATHY (COPD) AND REGULATION OF NF-κB AND E-SELECTIN
US20040001371A1 (en) * 2002-06-26 2004-01-01 The Arizona Board Of Regents On Behalf Of The University Of Arizona Information storage and retrieval device using macromolecules as storage media
US20050053968A1 (en) * 2003-03-31 2005-03-10 Council Of Scientific And Industrial Research Method for storing information in DNA
JP2004355294A (ja) * 2003-05-29 2004-12-16 National Institute Of Advanced Industrial & Technology Method for designing DNA codes as information carriers
CA2874540A1 (en) * 2012-06-01 2013-12-05 European Molecular Biology Laboratory High-capacity storage of digital information in dna
US9691017B2 (en) * 2012-12-13 2017-06-27 Massachusetts Institute Of Technology Recombinase-based logic and memory systems
KR20160001455A (ko) * 2014-06-27 2016-01-06 한국생명공학연구원 DNA memory technology for data storage
WO2016059610A1 (en) * 2014-10-18 2016-04-21 Malik Girik A biomolecule based data storage system
US20170335334A1 (en) * 2014-10-29 2017-11-23 Massachusetts Institute Of Technology DNA cloaking technologies
US20170141793A1 (en) * 2015-11-13 2017-05-18 Microsoft Technology Licensing, Llc Error correction for nucleotide data stores

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
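CN1536068A (zh) * 2003-02-03 2004-10-13 三星电子株式会社 Method and apparatus for encoding deoxyribonucleic acid sequences, and computer-readable medium
CN105022935A (zh) * 2014-04-22 2015-11-04 中国科学院青岛生物能源与过程研究所 Encoding method and decoding method for information storage using DNA
CN104850760A (zh) * 2015-03-27 2015-08-19 苏州泓迅生物科技有限公司 Synthetic DNA storage medium carrying encoded information, and information storage and reading method and application thereof
CN105119717A (zh) * 2015-07-21 2015-12-02 郑州轻工业学院 Encryption system and encryption method based on DNA encoding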

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3470997A4 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650312B2 (en) 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
US12001962B2 (en) 2016-11-16 2024-06-04 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US11379729B2 (en) 2016-11-16 2022-07-05 Catalog Technologies, Inc. Nucleic acid-based data storage
JP7364604B2 (ja) 2018-03-16 2023-10-18 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
WO2019178551A1 (en) 2018-03-16 2019-09-19 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US12006497B2 (en) 2018-03-16 2024-06-11 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11286479B2 (en) 2018-03-16 2022-03-29 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
EP3766077A4 (en) * 2018-03-16 2021-12-08 Catalog Technologies, Inc. CHEMICAL PROCESSES FOR DATA STORAGE BASED ON NUCLEIC ACIDS
JP2021518164A (ja) * 2018-03-16 2021-08-02 Catalog Technologies, Inc. Chemical methods for nucleic acid-based data storage
US11227219B2 (en) 2018-05-16 2022-01-18 Catalog Technologies, Inc. Compositions and methods for nucleic acid-based data storage
US11854668B2 (en) 2018-07-26 2023-12-26 Evonetix Ltd Accessing data storage provided using double-stranded nucleic acid molecules
TWI828700B (zh) * 2018-07-26 2024-01-11 英商伊門勒汀斯有限公司 Method, computer-readable program, and data structure for accessing data storage provided using double-stranded nucleic acid molecules
WO2020028955A1 (en) 2018-08-10 2020-02-13 Nucleotrace Pty. Ltd. Systems and methods for identifying a products identity
EP3834159A4 (en) * 2018-08-10 2022-08-17 Nucleotrace Pty. Ltd. PRODUCT IDENTIFICATION SYSTEMS AND PROCEDURES
US11017170B2 (en) 2018-09-27 2021-05-25 At&T Intellectual Property I, L.P. Encoding and storing text using DNA sequences
CN109460822A (zh) * 2018-11-19 2019-03-12 天津大学 DNA-based information storage method
CN109943560A (zh) * 2018-11-22 2019-06-28 西藏自治区人民政府驻成都办事处医院 Chinese character information storage method based on a DNA carrier
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US12002547B2 (en) 2019-05-09 2024-06-04 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
WO2020239806A1 (en) 2019-05-27 2020-12-03 Vib Vzw A method of storing digital information in pools of nucleic acid molecules
US10956806B2 (en) 2019-06-10 2021-03-23 International Business Machines Corporation Efficient assembly of oligonucleotides for nucleic acid based data storage
WO2021056167A1 (zh) * 2019-09-24 2021-04-01 深圳华大生命科学研究院 Information encoding and decoding methods, apparatus, storage medium, and information storage and interpretation methods
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
US11306353B2 (en) 2020-05-11 2022-04-19 Catalog Technologies, Inc. Programs and functions in DNA-based data storage
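WO2023108616A1 (zh) * 2021-12-17 2023-06-22 深圳华大生命科学研究院 Method and system for storing information using DNA
CN114958828B (zh) * 2022-06-14 2024-04-19 深圳先进技术研究院 Data information storage method based on a DNA molecular medium
CN114958828A (zh) * 2022-06-14 2022-08-30 深圳先进技术研究院 Data information storage method based on a DNA molecular medium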

Also Published As

Publication number Publication date
EP3470997A4 (en) 2020-04-01
US10839295B2 (en) 2020-11-17
CN109074424A (zh) 2018-12-21
CN109074424B (zh) 2022-03-11
EP3470997A1 (en) 2019-04-17
US20190138909A1 (en) 2019-05-09

Similar Documents

Publication Publication Date Title
WO2017190297A1 (zh) Method for storing text information using DNA, decoding method therefor, and application thereof
Meiser et al. Reading and writing digital data in DNA
AU2018247323B2 (en) High-Capacity Storage of Digital Information in DNA
CN110945595B (zh) DNA-based data storage and retrieval
Organick et al. Scaling up DNA data storage and random access retrieval
Amos et al. Error-resistant implementation of DNA computations
Akram et al. Trends to store digital data in DNA: an overview
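KR20130038353A (ko) Novel PCR sequencing method and use of the method in HLA genotyping
US20210074380A1 (en) Reverse concatenation of error-correcting codes in DNA data storage
CN106868005B (zh) Anchor primer for efficient and rapid amplification of cDNA ends, and amplification method
KR20160001455A (ko) DNA memory technology for data storage
CN104087610A (zh) Shuttle plasmid vector, and construction method and application thereof
CN111858507B (zh) DNA-based data storage method, decoding method, system, and apparatus
WO2004107243A1 (ja) Method for designing DNA codes as information carriers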
US11845982B2 (en) Key-value store that harnesses live micro-organisms to store and retrieve digital information
Wang et al. Oligo design with single primer binding site for high capacity DNA-based data storage
Yan et al. Scaling logical density of DNA storage with enzymatically-ligated composite motifs
Lau et al. Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing
Garafutdinov et al. Encoding of non-biological information for its long-term storage in DNA
JP6352804B2 (ja) バイオインフォマティクス文字セット及びマップされたバイオインフォマティクスフォントを用いたゲノム/プロテオミクス配列の表現、視覚化、比較及びレポーティング
Lee et al. DNA data storage in Perl
WO2015175602A1 (en) Systems, methods, and devices for analysis of genetic material
US20220358290A1 (en) Encoding and storing text using DNA sequences
CN110551802A (zh) Method for rapid synthesis of full gene sequences and application thereof
KR101953663B1 (ko) Method for producing an oligonucleotide pool using a single oligonucleotide

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16900819; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2016900819; Country of ref document: EP; Effective date: 20181204)