WO2022120626A1 - DNA-based data storage method and apparatus, DNA-based data recovery method and apparatus, and terminal device - Google Patents

DNA-based data storage method and apparatus, DNA-based data recovery method and apparatus, and terminal device

Info

Publication number
WO2022120626A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
data
algorithm
base sequence
binary
Prior art date
Application number
PCT/CN2020/134847
Other languages
English (en)
Chinese (zh)
Inventor
李敏
戴俊彪
王洋
姜青山
罗周卿
姜双英
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2020/134847
Publication of WO2022120626A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • the present application belongs to the technical field of data storage, and in particular, relates to a DNA-based data storage method, a data recovery method, a data storage device, a data recovery device, a terminal device, and a computer-readable storage medium.
  • DNA deoxyribonucleic acid
  • DNA as a storage medium
  • One of the advantages of DNA as a storage medium is the stability of DNA molecules, which can be stored for up to a hundred years without human intervention.
  • Most data are preprocessed by some algorithm before being stored. To restore the data after decades or hundreds of years, both the data and the corresponding preprocessing algorithm must be available.
  • however, there is no guarantee that the preprocessing algorithm used will still exist in complete form, in which case the data stored after preprocessing cannot be recovered.
  • the embodiments of the present application provide a DNA-based data storage method, a data recovery method, a data storage device, a data recovery device, a terminal device and a computer-readable storage medium, aiming to solve the problem that data stored after preprocessing cannot be recovered because the continued, complete existence of the preprocessing algorithm cannot be guaranteed.
  • the embodiments of the present application provide a DNA-based data storage method, including:
  • acquiring a data file to be stored, where the data file is a file obtained by preprocessing the original data according to an algorithm file; editing the data file and the algorithm file according to a preset file format to generate a binary file to be encoded, where the file format is used to indicate the index type between the data file and the algorithm file; encoding the binary file to obtain a base sequence, where the base sequence is used to synthesize the DNA fragment that stores the data file and the algorithm file.
  • before acquiring the data file to be stored, the method further includes: compressing, removing redundancy from, or encrypting the original data according to the algorithm file to obtain the data file.
  • the editing of the data file and the algorithm file according to a preset file format to generate a binary file to be encoded includes:
  • editing the attribute identification bits and valid data bits of the data file according to the preset file format; according to the attribute identification bits and the valid data bits of the data file, determining the offset of the algorithm file relative to the valid data bits of the data file; based on the offset, editing the valid data bits of the algorithm file to obtain a binary file in which the data file and the algorithm file are located in the same file.
  • the encoding of the binary file to obtain a base sequence includes:
  • the editing of the data file and the algorithm file according to a preset file format to generate a binary file to be encoded includes:
  • according to the preset file format, editing the first attribute identification bit and the first valid data bit of the data file to obtain the first binary file corresponding to the data file; according to the preset file format, editing the second attribute identification bit and the second valid data bit of the algorithm file to obtain a second binary file corresponding to the algorithm file; wherein the first binary file and the second binary file are two independent files.
  • the encoding of the binary file to obtain a base sequence includes:
  • encoding the first binary file and the second binary file to obtain the first base sequence of the first binary file and the second base sequence of the second binary file; adding the first primer sequence to the head and tail of the first base sequence to obtain the base sequence for synthesizing the first fragment in the DNA fragment; adding the second primer sequence to the head and tail of the second base sequence to obtain the base sequence for synthesizing the second fragment in the DNA fragment.
  • the encoding of the binary file to obtain a base sequence includes:
  • encoding the first binary file and the second binary file to obtain the first base sequence of the first binary file and the second base sequence of the second binary file; adding the head primer sequence and the tail primer sequence to the head and tail of the first base sequence to obtain the base sequence for synthesizing the third fragment in the DNA fragment; adding a universal primer sequence and the tail primer sequence of one or more first base sequences corresponding to the second base sequence to the head and tail of the second base sequence to obtain the base sequence for synthesizing the fourth fragment in the DNA fragment.
  • the embodiments of the present application provide a DNA-based data recovery method, including:
  • acquiring a DNA fragment to be decoded, where the DNA fragment to be decoded is used to store a data file and an algorithm file; decoding the DNA fragment to be decoded to obtain a binary file conforming to a preset file format, where the file format is used to indicate the index type between the data file and the algorithm file; reading the data file and the algorithm file in the binary file, and calling the algorithm file according to the index type; performing parsing processing on the data file according to the algorithm file to obtain the original data corresponding to the data file.
  • the decoding process is performed on the DNA segment to be decoded to obtain a binary file conforming to a preset file format, including:
  • sequencing the DNA fragment to be decoded to obtain the base sequence of the DNA fragment; decoding the base sequence of the DNA fragment according to the preset decoding model to obtain the binary file; wherein the binary file is a file in which the data file and the algorithm file are located in the same file.
  • the attribute identification bit of the data file includes an index indicating an index type; the reading of the data file and the algorithm file in the binary file and the calling of the algorithm file according to the index type include:
  • the decoding process is performed on the DNA segment to be decoded to obtain a binary file conforming to a preset file format, including:
  • sequencing the first fragment to obtain the first base sequence and the second primer sequence; according to the second primer sequence, sequencing the second fragment of the DNA fragment to be decoded to obtain the second base sequence; according to a preset decoding model, decoding the first base sequence and the second base sequence to obtain the first binary file corresponding to the first base sequence and the second binary file corresponding to the second base sequence; wherein the first binary file corresponds to the data file, and the second binary file corresponds to the algorithm file.
  • the decoding process is performed on the DNA segment to be decoded to obtain a binary file conforming to a preset file format, including:
  • sequencing the third fragment to obtain the first base sequence; according to the tail primer sequence of the third fragment and the universal primer sequence of the fourth fragment in the DNA fragment to be decoded, sequencing the fourth fragment to obtain the second base sequence; according to a preset decoding model, decoding the first base sequence and the second base sequence to obtain a first binary file corresponding to the first base sequence and a second binary file corresponding to the second base sequence; wherein the first binary file corresponds to the data file, and the second binary file corresponds to the algorithm file.
  • the first attribute identification bit of the data file includes an index indicating an index type; the reading of the data file and the algorithm file in the binary file and the calling of the algorithm file according to the index type include:
  • an embodiment of the present application provides a DNA-based data storage device, including:
  • a first acquiring unit configured to acquire a data file to be stored, where the data file is a file obtained by preprocessing the original data according to the algorithm file;
  • a first processing unit configured to edit the data file and the algorithm file according to a preset file format and generate a binary file to be encoded, where the file format is used to indicate the index type between the data file and the algorithm file;
  • the coding unit is used for coding the binary file to obtain a base sequence, and the base sequence is used for synthesizing the DNA fragments storing the data file and the algorithm file.
  • an embodiment of the present application provides a DNA-based data recovery device, including:
  • a second acquiring unit configured to acquire DNA fragments to be decoded, and the DNA fragments to be decoded are used to store data files and algorithm files;
  • a decoding unit configured to decode the DNA fragment to be decoded to obtain a binary file conforming to a preset file format, where the file format is used to indicate an index type between the data file and the algorithm file;
  • a second processing unit configured to read the data file and the algorithm file in the binary file, and call the algorithm file according to the index type
  • the parsing unit is configured to perform parsing processing on the data file according to the algorithm file to obtain original data corresponding to the data file.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the data storage method of any one of the above first aspect or the data recovery method of any one of the above second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, implements the data storage method of any one of the above first aspect or the data recovery method of any one of the above second aspect.
  • an embodiment of the present application provides a computer program product that, when run on a terminal device, enables the terminal device to execute the data storage method of any one of the above first aspect or the data recovery method of any one of the above second aspect.
  • the terminal device obtains the data file to be stored, where the data file is a file obtained by preprocessing the original data according to the algorithm file; the data file and the algorithm file are edited according to the preset file format to generate the binary file to be encoded, where the file format is used to indicate the index type between the data file and the algorithm file; the binary file is encoded to obtain the base sequence, and the base sequence is used to synthesize the DNA fragments that store the data file and the algorithm file.
  • because the algorithm file is encoded together with the data file and the resulting DNA is synthesized and stored, less external information is required. This reduces the risk of unrecoverable data due to the loss of external algorithms, and ensures the integrity and reliability of large-scale data storage in a long-term uncertain environment; the approach is easy to use and practical.
  • FIG. 1 is a schematic diagram of a system architecture of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a DNA-based data storage method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a logical relationship between a data file and an algorithm file provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a logical relationship between a data file and an algorithm file provided by another embodiment of the present application.
  • FIG. 5 is a schematic diagram of a synthetic DNA fragment provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a synthetic DNA fragment provided by another embodiment of the present application.
  • FIG. 7 is a schematic diagram of a synthetic DNA fragment provided by another embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a DNA-based data recovery method provided by another embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a binary file provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a binary file provided by another embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a DNA-based data storage device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a DNA-based data recovery device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the term “if” may be contextually interpreted as “when” or “once” or “in response to determining” or “in response to detecting”.
  • the phrases “if it is determined” or “if the [described condition or event] is detected” may be interpreted, depending on the context, to mean “once it is determined” or “in response to the determination” or “once the [described condition or event] is detected” or “in response to detection of the [described condition or event]”.
  • references in this specification to “one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases “in one embodiment,” “in some embodiments,” “in other embodiments,” etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments” unless specifically emphasized otherwise.
  • the terms “comprising”, “including”, “having” and their variants mean “including but not limited to” unless specifically emphasized otherwise.
  • DNA storage: the general process of DNA storage is that digital data is encoded into DNA base sequences, DNA fragments are synthesized according to the encoded base sequences, and the fragments are stored in in vivo or in vitro storage media. The synthesis of DNA fragments can be realized by writing nucleotide base sequences through a synthesizer, after which the pooled liquid is used as the storage medium. To read the data, the DNA is sequenced by a sequencer and the data is recovered by subsequent decoding processing.
  • FIG. 1 is a schematic diagram of a system architecture of an application scenario provided by an embodiment of the present application.
  • FIG. 1 shows an end-to-end, full-process DNA storage system architecture, provided by an embodiment of the present application, that realizes data self-containment and self-recovery.
  • the preprocessing methods include but are not limited to compression, redundancy deletion, encryption, encoding, etc.; preprocessing yields the data files and, at the same time, the algorithm files of the preprocessing algorithm are obtained. The data file and the algorithm file are defined according to the preset file format to obtain a digital file, so that the data to be encoded is self-contained.
  • the digital file may be a file in binary, quaternary, or octal format, which is not specifically limited.
  • different encoding techniques are used to encode digital files to obtain base sequences; through synthetic biology, base sequences are synthesized into DNA fragments and stored in in vivo and in vitro storage media.
  • the layout of data in the DNA storage media is optimized, for example with an index algorithm that maps DNA storage files to test tube numbers, to improve the search and read speed of DNA storage files.
  • the DNA fragment is sequenced to obtain the base sequence, and the base sequence is decoded into a digital file through a computer mathematical algorithm (the inverse operation of the encoding process); the digital file is a file in a format such as binary, quaternary or octal that represents the data file and the algorithm file.
  • the algorithm for preprocessing the original data is also stored in DNA according to a certain file format in the storage stage, so that the complete original data can be restored without the aid of an external algorithm, or with the least external information.
  • in the application scenario of large-scale and complex data storage, in order to realize the self-containment and self-interpretation of data stored in DNA, a unified digital file format is defined, and data and algorithms are associated and managed in a unified manner; this ensures the reliability of large-scale data storage in a long-term uncertain environment and the integrity of data recovery.
  • Referring to FIG. 2, which is a schematic flowchart of a DNA-based data storage method provided by an embodiment of the present application, the method includes the following steps:
  • Step S201: Obtain a data file to be stored, where the data file is a file obtained by preprocessing the original data according to the algorithm file.
  • the data files are files obtained by preprocessing various types of raw data.
  • Various types of raw data include text types (txt format, doc format, etc.), image types (jpg format, etc.), and video types. Different types of raw data correspond to different preprocessing algorithms.
  • the method before acquiring the data file to be stored, the method further includes: compressing, removing redundancy or encrypting the original data according to the algorithm file to obtain the data file.
  • pre-processing methods include but are not limited to processing such as compression, redundancy deletion, or encryption.
  • in the preprocessing process, the purpose of compression can be achieved by removing redundancy.
  • common preprocessing algorithms include Huffman coding, fountain codes or LZMA data compression algorithms.
  • the data compression algorithm of Huffman coding is suitable for application scenarios where each character of the input file appears with an unequal probability;
  • with fountain codes, the original data information can be recovered with high probability at the cost of some code overhead, which can greatly improve the storage efficiency in the process of DNA storage;
  • the LZMA data compression algorithm makes full use of the structural characteristics of various original data, and can realize simple and feasible data compression processing.
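  • As a minimal sketch of the preprocessing step, the round trip below uses Python's standard `lzma` module as a stand-in for the preprocessing algorithm. The function names `preprocess` and `restore` are hypothetical and chosen only for illustration; the source does not specify an implementation.

```python
import lzma


def preprocess(raw: bytes) -> bytes:
    """Compress the original data; the result plays the role of the data file."""
    return lzma.compress(raw)


def restore(data_file: bytes) -> bytes:
    """Inverse of preprocess: recover the original data from the data file."""
    return lzma.decompress(data_file)


if __name__ == "__main__":
    raw = b"ACGT" * 1000  # highly redundant sample input
    data_file = preprocess(raw)
    assert restore(data_file) == raw
    print(len(raw), "bytes before,", len(data_file), "bytes after compression")
```

  • Recovery depends on `restore` being available at read time, which is the dependency the method addresses by storing the algorithm file alongside the data file.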
  • Step S202: Edit the data file and the algorithm file according to a preset file format to generate a binary file to be encoded, where the file format is used to indicate an index type between the data file and the algorithm file.
  • a standard file storage format is preset. Edit the data file and the algorithm file according to the preset file format to obtain the binary file to be encoded.
  • the size unit of the binary file is bytes; the format in the binary file includes an identification bit used to indicate the index type corresponding to the data file and the algorithm file; the index type includes direct index and indirect index.
  • when the index type is direct index, the data file and the algorithm file are edited according to the file format shown in Table 1, and the corresponding binary file to be encoded is obtained.
  • editing a data file and an algorithm file according to a preset file format to generate a binary file to be encoded includes: editing the attribute identification bits and valid data bits of the data file according to the preset file format; according to the attribute identification bits and valid data bits of the data file, determining the offset of the algorithm file relative to the valid data bits of the data file; based on the offset, editing the valid data bits of the algorithm file to obtain a binary file in which the data file and the algorithm file are located in the same file.
  • the file format includes an offset address corresponding to each variable name related to the data file and the algorithm file, the size of the occupied bytes, and the like.
  • the file format of the data file includes attribute identification bits of the data file and valid data bits of the data file.
  • the variable names of the attribute identification bits of the data file include the data file start flag DataB, the file type Type, the flag field Flag, the compression method ComS, the compressed data length ComLen, the data length before compression SouLen, and the data start flag PayloadB;
  • the variable name of the valid data bits of the data file is Payload, the compressed or uncompressed data;
  • the variable name of the valid data bits of the algorithm file is Algr, the (optional) compression algorithm code or logical representation.
  • the address offset and size corresponding to each variable name is shown in Table 1.
  • the identification bit of the data file start marker DataB indicates the start of the compressed (or uncompressed) data file.
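  • The direct-index layout can be sketched with Python's `struct` module. The field names follow the variable names above, but Table 1 is not reproduced here, so the field widths, byte order, marker values and the use of Flag to carry the index type are all assumptions made for illustration.

```python
import struct

# Assumed layout (not taken from Table 1): big-endian, no padding.
# DataB(4s) Type(B) Flag(B) ComS(B) ComLen(I) SouLen(I) PayloadB(4s)
HEADER = struct.Struct(">4sBBBII4s")
DATA_B = b"DATB"      # hypothetical data file start marker
PAYLOAD_B = b"PAYB"   # hypothetical payload start marker
FLAG_DIRECT = 0       # hypothetical value meaning "direct index"


def pack_direct(payload: bytes, algr: bytes, sou_len: int,
                file_type: int = 1, com_s: int = 1) -> bytes:
    """Place attribute bits, Payload and Algr in one binary file."""
    header = HEADER.pack(DATA_B, file_type, FLAG_DIRECT, com_s,
                         len(payload), sou_len, PAYLOAD_B)
    return header + payload + algr


def unpack_direct(blob: bytes) -> tuple[bytes, bytes]:
    """Return (Payload, Algr); Algr starts at the offset derived from ComLen."""
    data_b, _type, _flag, _com_s, com_len, _sou_len, payload_b = \
        HEADER.unpack_from(blob, 0)
    if data_b != DATA_B or payload_b != PAYLOAD_B:
        raise ValueError("start markers not found")
    algr_offset = HEADER.size + com_len  # offset of Algr relative to the file start
    return blob[HEADER.size:algr_offset], blob[algr_offset:]


if __name__ == "__main__":
    blob = pack_direct(b"compressed-bytes", b"algorithm-code", sou_len=64)
    print(unpack_direct(blob))
```

  • The offset of the algorithm is computed from the attribute bits and the payload length, which mirrors the step of determining the offset of the algorithm file relative to the valid data bits of the data file.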
  • editing the data file and the algorithm file according to the preset file format to generate the binary file to be encoded includes: editing the first attribute identification bit and the first valid data bit of the data file according to the preset file format to obtain the first binary file corresponding to the data file; according to the preset file format, editing the second attribute identification bit and the second valid data bit of the algorithm file to obtain the second binary file corresponding to the algorithm file; wherein the first binary file and the second binary file are two independent files.
  • in this case the compression algorithm is expressed in a separate file; Table 2 shows the file format of the separate binary file corresponding to the algorithm file.
  • Table 2 lists each variable name of the algorithm file, the address offset corresponding to each variable name, the size in bytes, and the corresponding function.
  • the binary file of the algorithm file includes the attribute identification bits of the algorithm file and the valid data bits of the algorithm file; the variable names of the attribute identification bits of the algorithm file include the algorithm file start marker AlgrB and the compression algorithm name AlgrName; the valid data bits of the algorithm file are the specific algorithm AlgrData (i.e., the specific algorithm or logical representation of the data compression).
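  • A standalone algorithm file with the fields AlgrB, AlgrName and AlgrData might be laid out as in the sketch below. Table 2 is not reproduced here, so the 4-byte marker, the 16-byte null-padded name field and the marker value are assumptions for illustration only.

```python
import struct

# Assumed layout (not taken from Table 2): AlgrB(4s) AlgrName(16s), then AlgrData.
ALGR_HEADER = struct.Struct(">4s16s")
ALGR_B = b"ALGB"  # hypothetical algorithm file start marker


def pack_algorithm_file(name: str, algr_data: bytes) -> bytes:
    """Build the second binary file: attribute bits followed by AlgrData."""
    encoded = name.encode("ascii")
    if len(encoded) > 16:
        raise ValueError("AlgrName longer than the assumed 16-byte field")
    # struct pads a short 's' field with null bytes.
    return ALGR_HEADER.pack(ALGR_B, encoded) + algr_data


def unpack_algorithm_file(blob: bytes) -> tuple[str, bytes]:
    """Return (AlgrName, AlgrData) from an algorithm file."""
    marker, raw_name = ALGR_HEADER.unpack_from(blob, 0)
    if marker != ALGR_B:
        raise ValueError("algorithm file start marker not found")
    return raw_name.rstrip(b"\0").decode("ascii"), blob[ALGR_HEADER.size:]


if __name__ == "__main__":
    blob = pack_algorithm_file("lzma", b"\x01\x02\x03")
    print(unpack_algorithm_file(blob))
```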
  • the digital file obtained by editing the data files and algorithm files according to the preset file format can also be a file of another format, such as a ternary file or a quaternary file; different file formats may assign different meanings to the identification bits.
  • identification bits expressing the same or similar concepts all fall within the protection scope of the embodiments of the present application.
  • Step S203: Encode the binary file to obtain a base sequence, where the base sequence is used to synthesize the DNA fragments storing the data file and the algorithm file.
  • encoding a binary file refers to converting the binary file information that needs to be stored into a DNA base sequence (that is, a sequence containing A, G, C, and T) through a certain correspondence or rule.
  • the base sequence is used to synthesize the DNA fragments that store the data file and the algorithm file.
  • different coding models are suitable for different information types; for example, some coding models are suitable for text information, some are only suitable for picture information, and some are suitable for various types of information.
  • Synthesis of DNA fragments is the process of linking the bases in the base sequence one by one to form a DNA chain.
  • the code conversion may be performed through a conversion model based on a mathematical algorithm.
  • the DNA fragment is composed of four bases: A, G, C and T. Since data in a computer is in binary form (i.e., 0 and 1), storing data information in DNA means encoding the binary code stream of the data information into a base sequence. According to the structure of DNA, common DNA storage coding models include the binary model, the ternary model and the quaternary model.
  • the binary model defines any two of the four bases A, T, C and G as 0 and the other two as 1, that is, the base sequence has only two states, 0 and 1.
  • the binary model can better avoid the unbalanced GC content and many homopolymers in DNA, which can reduce the difficulty of synthesizing DNA fragments in the later stage.
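  • A minimal sketch of a binary model follows. The assignment of A and C to 0 and of G and T to 1 is an assumption (the source allows any two bases per state), and the rule of switching to the alternate base to break up repeats is one illustrative way to avoid homopolymers; it does not by itself guarantee balanced GC content.

```python
ZERO_BASES = "AC"  # assumed: both bases represent bit 0
ONE_BASES = "GT"   # assumed: both bases represent bit 1


def encode_binary_model(bits: str) -> str:
    """Map each bit to one of two bases, never repeating the previous base."""
    out = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
        pair = ZERO_BASES if bit == "0" else ONE_BASES
        # Use the first base unless it would extend a homopolymer run.
        base = pair[0] if (not out or out[-1] != pair[0]) else pair[1]
        out.append(base)
    return "".join(out)


def decode_binary_model(seq: str) -> str:
    """Recover the bit string; either base of a pair decodes to the same bit."""
    bits = []
    for base in seq:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"not a base: {base!r}")
    return "".join(bits)


if __name__ == "__main__":
    print(encode_binary_model("0011"))
    print(encode_binary_model("0000"))
```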
  • the ternary model means that the entire base sequence has only 3 states: 0, 1 and 2. First, edit the data information to be stored into a ternary code stream, and then encode 0, 1 and 2 in the code stream according to the corresponding relationship in Table 3 to obtain the base sequence. The ternary model determines the next base by the previous base, which can store more information.
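  • Table 3 is not reproduced here, so the sketch below uses an assumed rotation rule rather than the patent's actual correspondence: for each ternary digit, the next base is picked from the three bases that differ from the previous one. The base ordering and the virtual starting base `A` are illustrative assumptions.

```python
BASES = "ACGT"


def encode_trits(trits: list[int], start: str = "A") -> str:
    """Encode ternary digits; each base is chosen relative to the previous base."""
    prev, out = start, []
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError(f"not a ternary digit: {trit!r}")
        candidates = [b for b in BASES if b != prev]  # three allowed successors
        prev = candidates[trit]
        out.append(prev)
    return "".join(out)


def decode_trits(seq: str, start: str = "A") -> list[int]:
    """Invert encode_trits by replaying the same previous-base rule."""
    prev, out = start, []
    for base in seq:
        candidates = [b for b in BASES if b != prev]
        out.append(candidates.index(base))  # raises ValueError on a repeated base
        prev = base
    return out


if __name__ == "__main__":
    print(encode_trits([0, 1, 2, 0]))
```

  • Because a base can never follow itself under this rule, the output contains no homopolymer runs.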
  • the coding model also includes a quaternary model, which maps the bases A, T, C, and G to 0, 1, 2, and 3, and converts the binary code stream to be stored in DNA into a quaternary code stream.
  • a common quaternary model mapping relationship is shown in Table 4; the mapping relationship is not unique and includes different combination schemes, and Table 4 shows only one of them.
  • the quaternary model has stronger information storage capacity, and each base can encode two bits of data, which can improve storage efficiency and reduce DNA storage costs.
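  • The quaternary model can be sketched as a two-bits-per-base mapping. The sketch assumes the correspondence A, T, C, G to 0, 1, 2, 3 stated above and a most-significant-bits-first order within each byte; the source notes that other mappings are equally valid.

```python
QUATERNARY = "ATCG"  # index in this string is the quaternary digit


def encode_quaternary(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(QUATERNARY[(byte >> shift) & 0b11])
    return "".join(out)


def decode_quaternary(seq: str) -> bytes:
    """Rebuild bytes from a base sequence produced by encode_quaternary."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | QUATERNARY.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(encode_quaternary(b"\x1b"))  # 00 01 10 11
```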
  • encoding the binary file to obtain the base sequence includes: encoding the binary file according to a preset encoding model to obtain the base sequence of the binary file; adding primer sequences to the head and tail of the base sequence of the binary file to obtain the base sequence used for synthesizing the DNA fragment.
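  • Adding primer sequences to the head and tail of an encoded base sequence amounts to simple concatenation, as in the sketch below. The primer strings are hypothetical placeholders, not sequences from the source.

```python
def add_primers(base_sequence: str, head_primer: str, tail_primer: str) -> str:
    """Return the sequence to synthesize: head primer + payload + tail primer."""
    return head_primer + base_sequence + tail_primer


def strip_primers(fragment: str, head_primer: str, tail_primer: str) -> str:
    """Remove the primers after sequencing and return the payload bases."""
    if not (fragment.startswith(head_primer) and fragment.endswith(tail_primer)):
        raise ValueError("fragment does not carry the expected primers")
    return fragment[len(head_primer):len(fragment) - len(tail_primer)]


if __name__ == "__main__":
    head, tail = "ACGTAC", "TGCATG"  # hypothetical primer sequences
    fragment = add_primers("ATCGATCG", head, tail)
    print(fragment, strip_primers(fragment, head, tail))
```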
  • the preset coding model includes the above-mentioned binary coding model, ternary coding model and quaternary coding model.
  • FIG. 3 is a schematic diagram of the logical relationship between a data file and an algorithm file provided by an embodiment of the present application. It shows the direct index type, in which the data and the algorithm are located in the same binary file.
  • the data is fragmented, and primer sequences are added to the head and tail of each base fragment.
  • data 1 is preprocessed by algorithm 1
  • data 2 is preprocessed by algorithm 1
  • data n is preprocessed by algorithm m.
  • Each file x has a pair of primer identifiers, a head primer x-F and a tail primer x-R, and different data can correspond to the same or different algorithms; for example, data 1 in file 1 corresponds to algorithm 1, and the head and tail of file 1 carry primer sequences 1-F and 1-R; data 2 in file 2 corresponds to algorithm 1, and the head and tail of file 2 carry primer sequences 2-F and 2-R; data n in file n corresponds to algorithm m, and the head and tail of file n carry primer sequences n-F and n-R.
  • Figure 3 shows the direct index relationship between data files and algorithm files.
  • Each binary file x contains data and a backup of the algorithm corresponding to the data, so loss of or damage to one binary file does not affect the others; for example, if file 1 is lost, file 2 can still be recovered through its own copy of the algorithm. This is suitable for application scenarios with very high data security and reliability requirements.
  • the schematic diagram of synthesizing DNA fragments corresponds to the data-to-algorithm logical relationship shown in FIG. 3: the encoded files are divided to obtain DNA fragments that can be stored.
  • primer sequences 1-F and 1-R are added to the two ends of the base sequence of data 1: 1-F is added to one end and 1-R to the other;
  • for example, the primer sequence 1-F of file 1 is added to the head of the base sequence of data 1 as the head primer of the DNA fragment corresponding to data 1,
  • and the primer sequence 1-R of file 1 is added to the tail of the base sequence of data 1 as the tail primer of the DNA fragment corresponding to data 1.
  • primer sequences 1-F and 1-R are likewise added to the two ends of the base sequence of algorithm 1: 1-F is added to one end and 1-R to the other;
  • for example, the primer sequence 1-F of file 1 is added to the head of the base sequence of algorithm 1 as the head primer of the DNA fragment corresponding to algorithm 1,
  • and the primer sequence 1-R of file 1 is added to the tail of the base sequence of algorithm 1 as the tail primer of the DNA fragment corresponding to algorithm 1.
  • primer sequences n-F and n-R of file n are added to both ends of the base sequence of data n, n-F is added to one end of the base sequence of data n, and n-R is added to the other end of the clip sequence of data n.
  • the primer sequence n-F of file n is added to the head of the base sequence of data n, as the head primer of the DNA fragment corresponding to data n;
  • the primer sequence n-R of file n is added to the tail of data n, as data n corresponding The tail primer of the DNA fragment;
  • the primer sequences n-F and n-R of file n are added to both ends of the base sequence of algorithm m, n-F is added to one end of the base sequence of algorithm m, and n-R is added to the other end of algorithm m.
  • One end for example, add the primer sequence n-F of file n to the head of the base sequence of algorithm m, as the head primer of the DNA fragment corresponding to algorithm m; add the primer sequence n-R of file n to the tail of the base sequence of algorithm m, As the tail primer of the DNA fragment corresponding to the algorithm m; thus dividing the encoded file x to obtain a DNA fragment that can be stored.
  • the primer sequence is information stored externally in the DNA, and the DNA fragment can be sequenced through the primer sequence to obtain the base sequence of the DNA fragment.
  • Adding primer sequences to the head and tail of the base sequence of the binary file is not limited to the primer sequences corresponding to the above-mentioned head and tail, as long as it is added at both ends of the base sequence of the binary file.
  • encoding the binary file to obtain the base sequence includes: encoding the first binary file and the second binary file according to a preset encoding model to obtain the first base sequence of the first binary file and the second base sequence of the second binary file; adding the first primer sequence to the head and tail of the first base sequence to obtain the base sequence for synthesizing the first fragment of the DNA fragments; and adding the second primer sequence to the head and tail of the second base sequence to obtain the base sequence for synthesizing the second fragment of the DNA fragments.
  • the first binary file is a binary file of a data file
  • the second binary file is a binary file of an algorithm file corresponding to the data file.
  • the first binary file and the corresponding second binary file are collectively referred to as DNA files.
  • the first to nth DNA files are encoded to obtain the base sequences of the first to nth DNA files.
  • After the base sequence of each DNA file is divided into short fragments, a first primer sequence is added to the head and tail of the first base sequence to obtain a base sequence for synthesizing the first fragment of the DNA fragments, and a second primer sequence is added to the head and tail of the second base sequence to obtain a base sequence for synthesizing the second fragment of the DNA fragments.
  • the first base sequence is the sequence corresponding to the data file
  • the second base sequence is the sequence corresponding to the algorithm file
  • the first primer sequence corresponding to each DNA file may be a sequence containing different base pairs.
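  • The direct-index synthesis described above can be sketched in a few lines. This is a minimal illustration only: the two-bits-per-base mapping, the fragment size, the toy primer strings and all function names are assumptions, since the text refers only to a "preset encoding model" without specifying it.

```python
from __future__ import annotations

# Assumed mapping for illustration; the source does not define the encoding model.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_bytes(payload: bytes) -> str:
    """Convert bytes to a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def split_fragments(bases: str, size: int) -> list[str]:
    """Cut a base sequence into short fragments of at most `size` bases."""
    return [bases[i:i + size] for i in range(0, len(bases), size)]


def add_primers(fragment: str, head: str, tail: str) -> str:
    """Attach the file's head primer (x-F) and tail primer (x-R)."""
    return head + fragment + tail


def synthesize_dna_file(data: bytes, algorithm: bytes, head: str, tail: str,
                        size: int = 8) -> dict[str, list[str]]:
    """Direct index: data and algorithm fragments share the file's primer pair."""
    return {
        "first": [add_primers(f, head, tail)
                  for f in split_fragments(encode_bytes(data), size)],
        "second": [add_primers(f, head, tail)
                   for f in split_fragments(encode_bytes(algorithm), size)],
    }


if __name__ == "__main__":
    # Toy primers "GG" / "TT" stand in for 1-F / 1-R.
    print(synthesize_dna_file(b"\x1b\x1b\x1b", b"\x1b", "GG", "TT"))
```

  • Real primers are considerably longer than the two-base placeholders used here, and a practical encoding model would also have to respect synthesis constraints that this sketch ignores.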
  • As shown in FIG. 4, in order to reduce the redundancy in the data storage process and reduce the cost of synthesis and sequencing, another embodiment of the present application provides a schematic diagram of the logical relationship between a data file and an algorithm file, in which the binary files of the data files and the binary files of the algorithm files are kept separately.
  • the encoded data is fragmented, and primer sequences are added to the first position of the base fragment.
  • both data 1 and data 2 are preprocessed by algorithm 1
  • data 3 is preprocessed by algorithm 2.
  • Each file x is identified by a pair of primers, including the head primer x-F and the tail primer x-R.
  • the first part of file 1 includes the first primer sequences 1-F and 1-R, and data file 1 includes data 1 and the second primer sequences 1'-F and 1'-R;
  • the first part of file 2 includes the first primer sequences 2-F and 2-R, and data file 2 includes data 2 and the second primer sequences 1'-F and 1'-R;
  • the first part of file 3 includes the first primer sequences 3-F and 3-R, and data file 3 includes data 3 and the second primer sequences 2'-F and 2'-R.
  • the second primer sequence is the primer sequence corresponding to the algorithm.
  • the base sequences of the second primer sequences corresponding to different algorithms can be different, for example the second primer sequences 1'-F and 1'-R corresponding to algorithm 1 and the second primer sequences 2'-F and 2'-R corresponding to algorithm 2.
  • the primers x-F and x-R corresponding to the data file x and the primers x'-F and x'-R corresponding to the algorithm file x are two different pairs of primers, and the primer sequences of the data pointing to the algorithm are included in the data file.
  • FIG. 6 corresponding to FIG. 4 is a schematic diagram of synthesizing DNA fragments provided by another embodiment of the present application.
  • the encoded file is divided to obtain DNA fragments that can be stored; the DNA fragment synthesized for the data file is the first fragment, and the DNA fragment synthesized for the algorithm is the second fragment.
  • the data is stored separately from the algorithm, and the first primer sequence may be externally stored data for DNA storage.
  • the first primer sequences 1-F and 1-R are added to the head and tail of the base sequence of data file 1 to obtain a first fragment; the first primer sequences 2-F and 2-R are added to the head and tail of the base sequence of data file 2 to obtain another first fragment; the second primer sequences 1'-F and 1'-R are added to the head and tail of the base sequence of algorithm 1 to obtain the second fragment.
  • the first primer sequence is externally stored information stored in DNA, and the first fragment is sequenced through the first primer sequence to obtain the base sequence of the first fragment.
  • Adding the first primer sequence to the head and tail of the base sequence of the data file, and adding the second primer sequence to the head and tail of the base sequence of the algorithm, is not limited to the head and tail assignment described above; it is sufficient that the primer sequences are added to both ends of the base sequence of the binary file.
  • the information stored externally should be as little as possible, and the information stored in DNA as much as possible.
  • Only the first primer sequence of the first fragment needs to be stored externally, but this method requires amplification and sequencing twice.
  • the first primer sequence of the data file and the second primer sequence of the algorithm can also be used as external storage information for DNA storage.
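  • The indirect-index layout can be sketched as follows. The source states only that the data file "includes" the second primer sequences; placing them directly after the data, and the names `PrimerPair`, `build_first_fragment` and `build_second_fragment`, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    head: str  # x-F (or x'-F for an algorithm)
    tail: str  # x-R (or x'-R for an algorithm)


def build_first_fragment(data_bases: str, first: PrimerPair,
                         second: PrimerPair) -> str:
    """Data fragment: first primers outside, the algorithm's primers inside.

    Assumption: the pointer to the algorithm is appended after the data.
    """
    payload = data_bases + second.head + second.tail
    return first.head + payload + first.tail


def build_second_fragment(algorithm_bases: str, second: PrimerPair) -> str:
    """Algorithm fragment, flanked by the algorithm's own primer pair."""
    return second.head + algorithm_bases + second.tail


if __name__ == "__main__":
    file1 = PrimerPair("AAAC", "CAAA")       # toy 1-F / 1-R
    file2 = PrimerPair("CCCA", "ACCC")       # toy 2-F / 2-R
    algorithm1 = PrimerPair("GGGT", "TGGG")  # toy 1'-F / 1'-R

    # Data 1 and data 2 both point to algorithm 1, which is synthesized once.
    pool = {
        build_first_fragment("ACGT", file1, algorithm1),
        build_first_fragment("TTTT", file2, algorithm1),
        build_second_fragment("TTAA", algorithm1),
    }
    print(sorted(pool))
```

  • Because several data files can carry the same second primer pair, the algorithm is stored once, which is the redundancy saving this embodiment aims for.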
  • encoding the binary file to obtain the base sequence includes: encoding the first binary file and the second binary file according to a preset encoding model to obtain the first base sequence of the first binary file and the second base sequence of the second binary file; adding the head primer sequence and the tail primer sequence to the head and tail of the first base sequence to obtain the base sequence for synthesizing the third fragment of the DNA fragments; and adding a universal primer sequence and one or more tail primer sequences of the first base sequence corresponding to the second base sequence to the head and tail of the second base sequence to obtain the base sequence for synthesizing the fourth fragment of the DNA fragments.
  • FIG. 7 is a schematic diagram of synthesizing DNA fragments provided in another embodiment of the present application.
  • different data may correspond to the same algorithm; for example, data 1, data 2 and data 3 are all preprocessed by algorithm 1, and data 4 is preprocessed by algorithm 2. In this embodiment, however, the pointer direction is reversed: the algorithm file points to the data file.
  • the binary files of the data files and the binary files of the algorithm files are kept separately. After encoding the first binary file corresponding to the data file and the second binary file corresponding to the algorithm file, the encoded data is fragmented, and primer sequences are added to the first position of the base fragment.
  • data 1, data 2, and data 3 are all preprocessed by algorithm 1, and data 4 is preprocessed by algorithm 2.
  • Each file x is identified by a pair of primers, including the head primer x-F and the tail primer x-R.
  • the data file x corresponds to the primers x-F and x-R, and the algorithm file corresponds to a universal primer and one or more primers x-R; the primer x-R of the data file x and the primer x-R of the algorithm file are the same primer, so that the algorithm points to the data.
  • file 1, file 2, etc. in (a) in FIG. 7 refer to DNA files including data files and algorithm files.
  • the head primer sequence and the tail primer sequence are added to both ends of the first base sequence, respectively, and the universal primer sequence and the tail primer sequences of the one or more first base sequences corresponding to the second base sequence are added to both ends of the second base sequence, respectively; the first base sequence is the fragment corresponding to the data file, and the second base sequence is the fragment corresponding to the algorithm file.
  • For example, the head primer sequence 1-F and the tail primer sequence 1-R are added to the base sequence of data 1; the head primer sequence 2-F and the tail primer sequence 2-R are added to the base sequence of data 2; the head primer sequence 3-F and the tail primer sequence 3-R are added to the base sequence of data 3; in the base sequence of algorithm 1, the universal primer sequence is added to the head, and the tail primer sequence 1-R of data 1, the tail primer sequence 2-R of data 2 and the tail primer sequence 3-R of data 3 are added to the tail.
  • Similarly, the head primer sequence 4-F and the tail primer sequence 4-R are added to the base sequence of data 4; in the base sequence of algorithm 2, the universal primer sequence is added to the head, and the tail primer sequence 4-R corresponding to data 4 is added to the tail. The DNA fragments corresponding to each data file and algorithm file are thus synthesized; the DNA fragment corresponding to the data is classified as the third fragment, and the DNA fragment corresponding to the algorithm is classified as the fourth fragment.
  • the universal primer sequence and the head primer sequences of one or more first base sequences corresponding to the second base sequence may also be added to both ends of the second base sequence, respectively.
  • primers x-F and x-R corresponding to the data file x are known primer sequences, which can be the externally stored information for DNA storage. With these known primer sequences, the DNA fragments can be sequenced once to obtain the base sequences corresponding to both the data file and the algorithm file.
  • Adding a head primer sequence and a tail primer sequence to the head and tail of the first base sequence, and adding a universal primer sequence and one or more tail primer sequences (or head primer sequences) of the first base sequence corresponding to the second base sequence to the head and tail of the second base sequence, is not limited to the head and tail assignment described above; it is sufficient that the primer sequences are added at both ends of the base sequence of the binary file.
  • the DNA sequences of the data file and the algorithm file can be amplified simultaneously according to the primer sequences of the data file and the universal primer sequence, and the data file and the algorithm file can be decoded at the same time, which reduces the amount of information that needs to be saved externally. By storing the data files and algorithm files separately, concurrent amplification and sequencing of the data files and the algorithm files is realized.
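  • A toy model of this embodiment may help: each algorithm fragment carries the universal primer at its head and the tail primers of every data file it serves, so one reaction with the data file's primers plus the universal primer selects both fragments. The `Fragment` type, the `UNIVERSAL` string and the `amplify` function are hypothetical; they model primer matching symbolically and do not simulate real PCR.

```python
from __future__ import annotations

from dataclasses import dataclass

UNIVERSAL = "GATTACAG"  # hypothetical universal primer sequence


@dataclass(frozen=True)
class Fragment:
    head: str               # head primer: x-F, or the universal primer
    body: str               # base sequence of the data or of the algorithm
    tails: tuple[str, ...]  # one tail primer for data, one or more for algorithms


def data_fragment(bases: str, head: str, tail: str) -> Fragment:
    """Third fragment: data flanked by its own x-F and x-R."""
    return Fragment(head, bases, (tail,))


def algorithm_fragment(bases: str, data_tails: list[str]) -> Fragment:
    """Fourth fragment: universal primer at the head, data tail primers at the tail."""
    return Fragment(UNIVERSAL, bases, tuple(data_tails))


def amplify(pool: list[Fragment], head: str, tail: str) -> list[Fragment]:
    """Select fragments matched by the primer set {x-F, x-R, universal}."""
    return [f for f in pool if f.head in (head, UNIVERSAL) and tail in f.tails]


if __name__ == "__main__":
    pool = [
        data_fragment("ACGT", "F1", "R1"),
        data_fragment("CCGG", "F2", "R2"),
        data_fragment("TTAA", "F4", "R4"),
        algorithm_fragment("GGCC", ["R1", "R2"]),  # algorithm 1 serves data 1 and 2
        algorithm_fragment("AATT", ["R4"]),        # algorithm 2 serves data 4
    ]
    for fragment in amplify(pool, "F1", "R1"):
        print(fragment)
```

  • In this model a single call returns the data fragment and its algorithm fragment together, mirroring the single round of amplification and sequencing described above.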
  • FIG. 8 is a schematic flowchart of a DNA-based data recovery method provided by another embodiment of the present application.
  • the DNA-based data recovery method as an inverse operation process of DNA-based data storage, can realize the self-recovery of the stored original data.
  • the data preprocessing algorithm is stored in DNA.
  • the system finds the primer sequence of the corresponding file and obtains the data and the executable algorithm file at the same time through PCR amplification and sequencing. After decoding, the executable algorithm in the same directory can automatically parse the data file and restore the original data, so that the data is self-interpreting.
  • the process includes:
  • Step S801: acquire the DNA fragment to be decoded, where the DNA fragment to be decoded stores the data file and the algorithm file.
  • the data files and algorithm files are stored in an in vivo or ex vivo storage medium in the form of DNA fragments.
  • the system can find the corresponding DNA storage file and the corresponding primer sequence.
  • Data files can be various types of information such as text, picture, and video.
  • the data files are obtained by preprocessing the original data through the algorithm files.
  • the preprocessing algorithms include compression, redundancy deletion, encryption and other preprocessing algorithms.
  • Step S802: decode the DNA fragment to be decoded to obtain a binary file conforming to a preset file format, where the file format is used to indicate the index type between the data file and the algorithm file.
  • the process of decoding the DNA fragments is the inverse of the encoding process.
  • decoding the DNA fragment to be decoded to obtain a binary file conforming to a preset file format includes: sequencing the DNA fragment to be decoded according to the primer sequences in the DNA fragment to be decoded to obtain the base sequence of the DNA fragment; and decoding the base sequence of the DNA fragment according to the preset decoding model to obtain a binary file.
  • the DNA fragments are amplified by PCR technology, and then sequenced to obtain the base sequence of the data and the algorithm.
  • the preset decoding model is an inverse operation model of the encoding model, and through the conversion relationship of the decoding model, the data and the base sequence of the algorithm are converted into corresponding binary files.
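  • As a hedged illustration of this inverse operation, the sketch below strips the primers from sequenced reads and maps bases back to bytes. The two-bits-per-base table is an assumed decoding model (the source does not specify one), the function names are hypothetical, and the reads are assumed to arrive already in order.

```python
from __future__ import annotations

# Assumed inverse of a two-bits-per-base encoding model.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def strip_primers(read: str, head: str, tail: str) -> str:
    """Remove the head and tail primers from one sequenced read."""
    if not (read.startswith(head) and read.endswith(tail)):
        raise ValueError("read is not flanked by the expected primers")
    return read[len(head):len(read) - len(tail)]


def decode_bases(bases: str) -> bytes:
    """Convert a base sequence back to bytes, two bits per base."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    if len(bits) % 8:
        raise ValueError("base sequence does not encode whole bytes")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode_reads(reads: list[str], head: str, tail: str) -> bytes:
    """Strip primers from each read, join the payloads, and decode them."""
    return decode_bases("".join(strip_primers(r, head, tail) for r in reads))


if __name__ == "__main__":
    print(decode_reads(["GGACGTACGTTT", "GGACGTTT"], "GG", "TT"))
```

  • A practical system would also need a way to order the short fragments and to tolerate sequencing errors; both are outside the scope of this sketch.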
  • the binary file is a file in which the data file and the algorithm file are located in the same file, and the index type is direct index.
  • FIG. 9 is a schematic structural diagram of a binary file provided by an embodiment of the present application.
  • the data and the algorithm are located in the same binary file, including the attribute identification bits of the data file, the valid data bits of the data file, and the valid data bits of the algorithm file.
  • the variable name corresponding to each identification bit is shown in (a) of FIG. 9: the attribute identification bits of the data file include the data file start marker, data file type, binary attribute marker, compression method, data length after compression, data length before compression, data start marker, and so on.
  • the offset of the valid data bits of the algorithm file can be determined from the data start marker field and the valid data bits of the data file.
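  • To make the single-file layout concrete, the following sketch packs and parses a header followed by the data and the algorithm. The field order follows the list above, but every field width, both marker values and the function names are assumptions; the source does not give byte-level sizes.

```python
import struct

# Assumed layout: start marker (4 bytes), file type, attribute tag and
# compression method (1 byte each), compressed and original lengths
# (4 bytes each, big-endian), data start marker (2 bytes).
HEADER = struct.Struct(">4sBBBII2s")
FILE_START = b"DNAF"     # hypothetical data file start marker
DATA_START = b"\xda\x7a"  # hypothetical data start marker


def pack_file(data: bytes, algorithm: bytes, *, file_type: int, flags: int,
              method: int, original_length: int) -> bytes:
    """Write header, then the data bits, then the algorithm bits."""
    header = HEADER.pack(FILE_START, file_type, flags, method,
                         len(data), original_length, DATA_START)
    return header + data + algorithm


def unpack_file(blob: bytes) -> dict:
    """Parse the header and locate the algorithm by its offset."""
    (start, file_type, flags, method,
     packed_length, original_length, data_start) = HEADER.unpack_from(blob)
    if start != FILE_START or data_start != DATA_START:
        raise ValueError("unexpected start markers")
    # Offset of the algorithm = end of header + length of the stored data.
    data_end = HEADER.size + packed_length
    return {
        "file_type": file_type,
        "flags": flags,
        "method": method,
        "original_length": original_length,
        "data": blob[HEADER.size:data_end],
        "algorithm": blob[data_end:],
    }


if __name__ == "__main__":
    blob = pack_file(b"abc", b"XYZ!", file_type=1, flags=0x80,
                     method=2, original_length=10)
    print(unpack_file(blob))
```

  • The algorithm needs no marker of its own in this sketch because its offset follows from the header size and the stored data length.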
  • performing decoding processing on the DNA fragment to be decoded to obtain a binary file conforming to a preset file format includes: sequencing the first fragment according to the first primer sequence of the first fragment in the DNA fragment to be decoded, to obtain a first base sequence and a second primer sequence; sequencing the second fragment in the DNA fragment to be decoded according to the second primer sequence, to obtain a second base sequence; and decoding the first base sequence and the second base sequence according to a preset decoding model, to obtain a first binary file corresponding to the first base sequence and a second binary file corresponding to the second base sequence.
  • the first binary file corresponds to a data file
  • the second binary file corresponds to an algorithm file.
  • PCR technology is used to amplify the first fragment, which is then sequenced to obtain the base sequence of data 1 and the second primer sequence.
  • the second fragment is amplified and sequenced to obtain the base sequence corresponding to the algorithm file.
  • the base sequence of the data file and the base sequence of the algorithm file are decoded to obtain the binary file of the data file and the binary file of the algorithm file.
  • In the decoding process corresponding to the DNA fragments shown in FIG. 6, the first fragment can be amplified according to the first primer sequences 1-F and 1-R and then sequenced to obtain the base sequence of data file 1.
  • the base sequence of data file 1 includes the first base sequence corresponding to data 1 and the second primer sequences 1'-F and 1'-R; according to the second primer sequences 1'-F and 1'-R, the second fragment is amplified and then sequenced to obtain the base sequence corresponding to algorithm 1.
  • Likewise, according to the first primer sequences 2-F and 2-R, the first fragment can be amplified and then sequenced to obtain the base sequence of data file 2.
  • the base sequence of data file 2 includes the first base sequence corresponding to data 2 and the second primer sequences 1'-F and 1'-R; according to the second primer sequences 1'-F and 1'-R, the second fragment is amplified and then sequenced to obtain the base sequence corresponding to algorithm 1.
  • According to the first primer sequences 3-F and 3-R, the first fragment can be amplified and then sequenced to obtain the base sequence of data file 3.
  • the base sequence of data file 3 includes the first base sequence corresponding to data 3 and the second primer sequences 2'-F and 2'-R; according to the second primer sequences 2'-F and 2'-R, the second fragment is amplified and then sequenced to obtain the base sequence corresponding to algorithm 2.
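  • The two rounds of amplification and sequencing can be modelled on plain strings. In this sketch the pool is a list of strands, "amplification" is a prefix and suffix match, and the second primer pair is assumed to sit at the end of the data file with a fixed length `PRIMER_LEN`; these choices and the function names are illustrative assumptions, not the method's specification.

```python
from __future__ import annotations

PRIMER_LEN = 4  # toy primer length, assumed fixed for every primer


def amplify(pool: list[str], head: str, tail: str) -> list[str]:
    """Return the payloads of strands flanked by the given primer pair."""
    return [strand[len(head):len(strand) - len(tail)]
            for strand in pool
            if strand.startswith(head) and strand.endswith(tail)]


def recover(pool: list[str], first_head: str, first_tail: str) -> tuple[str, str]:
    """Two-step recovery for the indirect index."""
    # Round 1: the externally stored first primers select the data file.
    (data_file,) = amplify(pool, first_head, first_tail)
    split = len(data_file) - 2 * PRIMER_LEN
    data_bases = data_file[:split]
    second_head = data_file[split:split + PRIMER_LEN]
    second_tail = data_file[split + PRIMER_LEN:]
    # Round 2: the second primers read from the data file select the algorithm.
    (algorithm_bases,) = amplify(pool, second_head, second_tail)
    return data_bases, algorithm_bases


if __name__ == "__main__":
    pool = [
        "AAACACGTGGGTTGGGCAAA",  # data file 1, pointing to algorithm 1
        "CCCATTTTGGGTTGGGACCC",  # data file 2, pointing to algorithm 1
        "GGGTTTAATGGG",          # algorithm 1
    ]
    print(recover(pool, "AAAC", "CAAA"))
```

  • The single-element unpacking raises an error when zero or several strands match, which keeps the toy model strict about one fragment per file.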
  • the first binary file corresponding to the data file includes the first attribute identification bit of the data file and the first valid data bit of the data file; the first attribute identification bit includes variable fields such as the data file start marker, data file type, binary attribute tag, compression method, data length after compression, data length before compression, and data start marker.
  • the binary attribute tag field includes one byte, eight bits, the first bit F1 indicates whether the original data is preprocessed, and F2 indicates the index type between the data file and the algorithm file.
  • the second binary file corresponding to the algorithm file as shown in (c) of FIG. 10 includes the second attribute identifier of the algorithm file and the second valid data bits of the algorithm file.
  • the second attribute identification bit includes the field of the algorithm file start marker and the field of the algorithm name.
  • the second valid data bit indicates the specific preprocessing algorithm employed.
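  • The one-byte binary attribute tag can be illustrated with two bit masks. The source says only that the first bit F1 marks whether the original data is preprocessed and that F2 marks the index type; treating F1 as the most significant bit, F2 as the next bit, and a set F2 as "indirect" are assumptions of this sketch.

```python
from __future__ import annotations

from enum import Enum


class IndexType(Enum):
    DIRECT = 0    # data and algorithm in the same binary file
    INDIRECT = 1  # data and algorithm in separate binary files


F1_PREPROCESSED = 0x80  # assumed position of F1 (most significant bit)
F2_INDEX_TYPE = 0x40    # assumed position of F2


def make_attribute_tag(preprocessed: bool, index_type: IndexType) -> int:
    """Build the one-byte attribute tag from its two flags."""
    tag = 0
    if preprocessed:
        tag |= F1_PREPROCESSED
    if index_type is IndexType.INDIRECT:
        tag |= F2_INDEX_TYPE
    return tag


def parse_attribute_tag(tag: int) -> tuple[bool, IndexType]:
    """Read F1 and F2 back out of the tag byte."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("attribute tag must fit in one byte")
    preprocessed = bool(tag & F1_PREPROCESSED)
    index_type = IndexType.INDIRECT if tag & F2_INDEX_TYPE else IndexType.DIRECT
    return preprocessed, index_type


if __name__ == "__main__":
    tag = make_attribute_tag(True, IndexType.INDIRECT)
    print(f"{tag:08b}", parse_attribute_tag(tag))
```

  • The remaining six bits are left at zero here, since the source does not say how they are used.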
  • the first binary file corresponds to a data file
  • the second binary file corresponds to an algorithm file
  • decoding the DNA fragment to be decoded to obtain a binary file conforming to a preset file format includes: sequencing the third fragment according to the head primer sequence and the tail primer sequence of the third fragment in the DNA fragment to be decoded, to obtain the first base sequence; sequencing the fourth fragment according to the tail primer sequence of the third fragment and the universal primer sequence of the fourth fragment in the DNA fragment to be decoded, to obtain the second base sequence; and decoding the first base sequence and the second base sequence according to the preset decoding model, to obtain the first binary file corresponding to the first base sequence and the second binary file corresponding to the second base sequence.
  • only the primer sequences corresponding to the data files need to be stored externally; the DNA fragments are sequenced by reading the primer sequences corresponding to the data files, and the base sequences corresponding to the data files and the algorithm files are obtained at the same time.
  • the third fragment and the fourth fragment are simultaneously amplified by using PCR technology and then sequenced to obtain the base sequence corresponding to the data file and the base sequence corresponding to the algorithm file.
  • For example, the third fragment corresponding to data 1 and the fourth fragment corresponding to algorithm 1 are simultaneously amplified and then sequenced to obtain the base sequence corresponding to data 1 and the base sequence corresponding to algorithm 1.
  • By decoding the base sequence of the data file and the base sequence of the algorithm file, the first binary file and the second binary file in the file format shown in FIG. 10 are obtained.
  • the first binary file corresponds to a data file
  • the second binary file corresponds to an algorithm file
  • Step S803: read the data file and the algorithm file in the binary file, and call the algorithm file according to the index type.
  • when the index type between the data file and the algorithm file is a direct index, the attribute identification bit of the data file includes an identification indicating the index type; reading the data file and the algorithm file in the binary file and calling the algorithm file according to the index type includes: reading the attribute identification bits and valid data bits of the data file in the binary file, and determining the index type based on the attribute identification bit of the data file; and reading the valid data bits of the algorithm file in the binary file, and calling the algorithm of those valid data bits according to the index type.
  • when the index type between the data file and the algorithm file is an indirect index, the first attribute identification bit of the data file includes an identification indicating the index type; reading the data file and the algorithm file in the binary file and calling the algorithm file according to the index type includes: reading the first attribute identification bit and the first valid data bit of the data file in the first binary file, and determining the index type according to the first attribute identification bit; and reading the second attribute identification bit and the second valid data bit of the algorithm file in the second binary file, and calling the algorithm of the second valid data bit of the second binary file according to the index type.
  • Step S804: parse the data file according to the algorithm file to obtain the original data corresponding to the data file.
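  • As an illustration of step S804, the sketch below restores original data from a preprocessed data file. In the method described here the algorithm itself is recovered from DNA as an executable file; this sketch swaps in a lookup table keyed by algorithm name, with `zlib` standing in for a compression algorithm. The table, the names and the length check are assumptions.

```python
from __future__ import annotations

import zlib

# Stand-in for the recovered algorithm files: algorithm name -> inverse operation.
INVERSE_ALGORITHMS = {
    "zlib": zlib.decompress,
    "identity": lambda payload: payload,  # data stored without preprocessing
}


def restore(data_file: bytes, algorithm_name: str,
            original_length: int | None = None) -> bytes:
    """Apply the inverse of the preprocessing algorithm to the data file."""
    try:
        inverse = INVERSE_ALGORITHMS[algorithm_name]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm_name}") from None
    original = inverse(data_file)
    # Optional check against the "data length before compression" field.
    if original_length is not None and len(original) != original_length:
        raise ValueError("restored data has an unexpected length")
    return original


if __name__ == "__main__":
    raw = b"DNA storage " * 20
    stored = zlib.compress(raw)
    print(len(stored), len(restore(stored, "zlib", len(raw))))
```

  • Checking the restored length against the header field is one simple way to notice a damaged data file or a mismatched algorithm.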
  • In practical applications, DNA storage must preserve data for a long time, during which the data preprocessing algorithm may be lost.
  • To ensure the security and integrity of the data in a long-term, uncertain environment, the preprocessing algorithm (for example, a compression algorithm) is stored in DNA fragments together with the data in a specific file format; while controlling the amount of data redundancy and keeping data reading simple, this ensures that the data is self-interpreting and self-recoverable.
  • FIG. 11 shows a structural block diagram of the DNA-based data storage device provided by the embodiment of the present application; only the relevant parts are shown.
  • the device includes:
  • the first obtaining unit 111 is configured to obtain a data file to be stored, where the data file is a file obtained by preprocessing the original data according to the algorithm file;
  • the first processing unit 112 is configured to edit the data file and the algorithm file according to a preset file format and generate a binary file to be encoded, where the file format is used to indicate the index type between the data file and the algorithm file;
  • the encoding unit 113 is configured to encode the binary file to obtain a base sequence, and the base sequence is used for synthesizing a DNA fragment storing the data file and the algorithm file.
  • FIG. 12 shows a structural block diagram of the DNA-based data recovery apparatus provided by the embodiment of the present application; only the relevant parts are shown.
  • the device includes:
  • the second obtaining unit 121 is configured to obtain DNA fragments to be decoded, and the DNA fragments to be decoded are used to store data files and algorithm files;
  • the decoding unit 122 is configured to perform decoding processing on the DNA fragment to be decoded to obtain a binary file conforming to a preset file format, where the file format is used to indicate the index type between the data file and the algorithm file ;
  • a second processing unit 123 configured to read the data file and the algorithm file in the binary file, and call the algorithm file according to the index type;
  • the parsing unit 124 is configured to perform parsing processing on the data file according to the algorithm file to obtain original data corresponding to the data file.
  • FIG. 13 is a schematic structural diagram of a terminal device according to an embodiment of the application.
  • the terminal device 13 of this embodiment includes: at least one processor 130 (only one is shown in FIG. 13), a memory 131, and a computer program 132 stored in the memory 131 and executable on the at least one processor 130; the processor 130 implements the steps in any of the above method embodiments when executing the computer program 132.
  • the terminal device 13 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, the processor 130 and the memory 131 .
  • FIG. 13 is only an example of the terminal device 13 and does not constitute a limitation on the terminal device 13, which may include more or fewer components than shown, combine some components, or use different components; for example, it may also include input and output devices, network access devices, and the like.
  • the processor 130 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 131 may be an internal storage unit of the terminal device 13 in some embodiments, such as a hard disk or a memory of the terminal device 13 .
  • the memory 131 may also be an external storage device of the terminal device 13 in other embodiments, such as a plug-in hard disk equipped on the terminal device 13, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 131 may also include both an internal storage unit of the terminal device 13 and an external storage device.
  • the memory 131 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, for example, program codes of the computer program, and the like.
  • the memory 131 may also be used to temporarily store data that has been output or will be output.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
  • the embodiments of the present application provide a computer program product, when the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be implemented when the mobile terminal executes the computer program product.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the present application realizes all or part of the processes in the methods of the above embodiments, which can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • computer readable media may not be electrical carrier signals and telecommunications signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

Abstract

Disclosed are a DNA-based data storage method, data recovery method, data storage apparatus, and data recovery apparatus, as well as a terminal device and a computer-readable storage medium. The data storage method comprises: acquiring a data file to be stored, the data file being a file obtained by preprocessing original data according to an algorithm file; editing the data file and the algorithm file according to a preset file format to generate a binary file to be encoded, the file format being used to indicate a type of indexing between the data file and the algorithm file; and encoding the binary file to obtain a base sequence, the base sequence being used to synthesize a DNA fragment that stores the data file and the algorithm file. The present application addresses the problem that data stored after preprocessing cannot be recovered when the complete availability of the preprocessing algorithm cannot be guaranteed, thereby ensuring the integrity of data storage and recovery in uncertain environments.
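
The three steps in the abstract (preprocess, pack the data file together with the algorithm file, encode to bases) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the application: the container layout (`MAGIC`, `HEADER`), the 2-bits-per-base mapping in `BASES`, the helper names, and the use of zlib as a stand-in preprocessing algorithm. A practical encoder would also need to handle constraints such as homopolymer runs, GC content, and error correction, which this sketch ignores.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical container layout (an assumption, not the application's format):
#   magic (4 bytes) | algorithm length (uint32) | data length (uint32)
#   | algorithm bytes | data bytes
# The two length fields act as the index linking the algorithm file to the data file.
MAGIC = b"DNA1"
HEADER = struct.Struct(">4sII")

# Assumed mapping of 2-bit values to bases: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def pack_files(algorithm: bytes, data: bytes) -> bytes:
    """Bundle the algorithm file and the data file into one binary file."""
    return HEADER.pack(MAGIC, len(algorithm), len(data)) + algorithm + data


def unpack_files(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a packed binary file back into (algorithm, data)."""
    magic, alg_len, data_len = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("unrecognised container")
    start = HEADER.size
    algorithm = blob[start:start + alg_len]
    data = blob[start + alg_len:start + alg_len + data_len]
    return algorithm, data


def encode_bases(blob: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in blob for shift in (6, 4, 2, 0)
    )


def decode_bases(sequence: str) -> bytes:
    """Invert encode_bases: every four bases become one byte."""
    if len(sequence) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


# Storage side: preprocess (zlib is only a stand-in), pack, encode.
original = b"example payload " * 4
data_file = zlib.compress(original)
algorithm_file = b"zlib.decompress"  # placeholder for the stored algorithm file
sequence = encode_bases(pack_files(algorithm_file, data_file))

# Recovery side: decode, unpack, then undo the preprocessing.
recovered_algorithm, recovered_data = unpack_files(decode_bases(sequence))
assert recovered_algorithm == algorithm_file
assert zlib.decompress(recovered_data) == original
```

Because the algorithm file travels inside the same base sequence as the data it produced, the recovery side needs no external copy of the preprocessing algorithm, which is the point the abstract makes.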
PCT/CN2020/134847 2020-12-09 2020-12-09 DNA-based data storage method and apparatus, DNA-based data recovery method and apparatus, and terminal device WO2022120626A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/134847 WO2022120626A1 (fr) 2020-12-09 DNA-based data storage method and apparatus, DNA-based data recovery method and apparatus, and terminal device

Publications (1)

Publication Number Publication Date
WO2022120626A1 true WO2022120626A1 (fr) 2022-06-16

Family

ID=81974106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134847 WO2022120626A1 (fr) 2020-12-09 2020-12-09 DNA-based data storage method and apparatus, DNA-based data recovery method and apparatus, and terminal device

Country Status (1)

Country Link
WO (1) WO2022120626A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187435B (zh) * 2022-12-19 2024-01-05 武汉大学 Method and system for storing information in DNA based on large and small fountain codes and the MRC algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190317684A1 (en) * 2018-04-11 2019-10-17 University-Industry Cooperation Group Of Kyung Hee University DNA digital data storage device and method, and decoding method of DNA digital data
CN110431148A (zh) * 2017-01-10 2019-11-08 罗斯威尔生命技术公司 Methods and systems for DNA data storage
CN110706751A (zh) * 2019-09-25 2020-01-17 东南大学 DNA storage encryption and encoding method
CN110945595A (zh) * 2017-07-25 2020-03-31 南京金斯瑞生物科技有限公司 DNA-based data storage and retrieval

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20964570; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20964570; Country of ref document: EP; Kind code of ref document: A1)