WO2023201782A1 - Information coding method and apparatus based on dna storage, and computer device and medium

Info

Publication number: WO2023201782A1
Application number: PCT/CN2022/091100
Authority: WO
Inventors: 黄奕翼; 戴俊彪
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2022-04-23
Filing date: 2022-05-06
Publication date: 2023-10-26
Also published as: CN114974434A

Abstract

An information coding method and apparatus based on DNA storage, and a computer device and a medium. The method comprises: acquiring a file, which is stored in the form of a binary sequence (S10); on the basis of a binary-DNA conversion mapping table, segmenting the binary sequence in bytes to obtain binary sequence segments, mapping each binary sequence segment to a base segment, and then sequentially combining all the base segments to form DNA information, wherein each base segment satisfies the following composition conditions: the length is 5; the sum of the number of G bases and the number of C bases is t, which satisfies 0.4 ≤ t/5 ≤ 0.6; there are no repeated bases at a boundary; and there are no three consecutive repeated bases in the middle (S20); and saving the file in the form of DNA information (S30). The method can strictly guarantee that consecutive repeated bases in all coded DNA information have a maximum length of only 2 and a GC content that is between 0.4 and 0.6, an algorithm has linear complexity, and the net information density is 1.60.

Description

Information encoding method and apparatus based on DNA storage, computer device, and medium

Technical field

The present invention relates to the field of information storage technology, and in particular to an information encoding method, device, computer equipment and medium based on DNA storage.

Background art

With the rapid development of information and digital technologies such as the Internet and artificial intelligence, the amount of information has grown exponentially. Traditional storage media such as magnetic disks, hard drives, and flash memory are increasingly unable to meet worldwide data storage needs. DNA has natural advantages as a storage medium. First, its information density is high: according to earlier estimates by Microsoft Research, 1 cubic millimeter of DNA can store 1 exabyte of data. Second, it is stable and long-lived, and can be preserved for tens of thousands of years under suitable conditions. Third, its storage energy consumption is very low.

DNA is a biological macromolecule with a defined sequence. DNA storage technology encodes information into base sequences and stores them in DNA; the DNA can then be copied, sequenced, and decoded to read the information back. During these processes, DNA with long runs of repeated bases or extreme GC content is prone to errors in synthesis, replication, and sequencing. How to quickly and efficiently avoid long runs of repeated bases and extreme GC content in the encoded DNA sequence has therefore become an urgent problem.

Summary of the invention

Embodiments of the present invention provide an information encoding method, device, computer equipment and medium based on DNA storage to solve the problem of how to avoid long repeated bases or extreme GC content in DNA sequences after information encoding.

An information encoding method based on DNA storage, including:

Get a file stored as a binary sequence;

Based on the binary-DNA conversion mapping table, the binary sequence is divided in units of bytes to obtain binary sequence slices, each binary sequence slice is mapped to a base slice, and all the base slices are then merged in order to form DNA information. Each base slice satisfies the following composition conditions: its length is five; the sum of the number of G bases and C bases is t, where 0.4≤t/5≤0.6; there are no repeated bases at either boundary; and there are no three consecutive repeated bases in the middle;

Save the file as DNA information.

Further, after all the base pieces are combined in sequence to form DNA information, it also includes:

Obtain the DNA information decoding request, and obtain the corresponding DNA information based on the DNA information decoding request;

Based on the binary-DNA conversion mapping table, the DNA information is converted into a binary sequence, which is used to convert the DNA information into files for storage.

Further, after the file is saved in the form of DNA information, it also includes:

Obtain the scheduled task, and when the system time meets the scheduled task, update the mapping relationship in the binary-DNA conversion mapping table;

or,

Establish, for the mapping relationship, a public mapping relationship for public documents and a private mapping relationship for private documents.

Further, before obtaining the file stored in the form of a binary sequence, it also includes:

Based on the principle that the number of distinct base slices must be greater than or equal to the number of distinct binary sequence slices, and on convenience of conversion, it is determined that one byte of eight bits is used as the division unit of the binary sequence slices;

According to the number of bits, 8, determine the base slices corresponding to the 2⁸ binary sequence slices.

Further, determining the base slices corresponding to the 2⁸ binary sequence slices includes:

Use any one of the four bases as the first base of each base slice, and continue to choose the second base through the last base according to the composition conditions of the base slice;

Use any one of the remaining three bases as the second base of each base slice, and repeat the step of choosing the subsequent bases through the last base according to the composition conditions, until the number of distinct base slices equals the number of distinct binary sequence slices.

Further, convert the file into the form of DNA base sequence code and save it, including:

The DNA base sequence code is synthesized into DNA solution or dry powder through a DNA synthesis device and stored.

An information encoding device based on DNA storage, including:

A binary sequence file acquisition module, used to obtain a file stored in the form of a binary sequence;

A DNA information forming module, used to divide the binary sequence in units of bytes to obtain binary sequence slices based on the binary-DNA conversion mapping table, map each binary sequence slice to a base slice, and then merge all the base slices in order to form DNA information, where each base slice satisfies the following conditions: its length is five; the sum of the number of G bases and C bases is t, where 0.4≤t/5≤0.6; there are no repeated bases at either boundary; and there are no three consecutive repeated bases in the middle;

A DNA information saving module, used to save the file in the form of the DNA information.

A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the above information encoding method based on DNA storage.

A computer-readable medium stores a computer program. When the computer program is executed by a processor, the above-mentioned information encoding method based on DNA storage is implemented.

By converting files into DNA information under the above composition conditions, the information encoding method, apparatus, computer device, and medium based on DNA storage strictly guarantee that the longest run of consecutive repeated bases in all encoded DNA information is only 2 and that the GC content is between 0.4 and 0.6. The algorithm has linear complexity, and the net information density (NID) is 1.60. This effectively safeguards the synthesis, replication, and sequencing of the DNA, improves DNA synthesis efficiency, raises the information density of storage, extends the stable storage time of information, and reduces storage energy consumption.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic diagram of the application environment of the information encoding method based on DNA storage in one embodiment of the present invention;

Figure 2 illustrates a flow chart of an information encoding method based on DNA storage in one embodiment of the present invention;

Figure 3 illustrates a first flow chart of an information encoding method based on DNA storage in another embodiment of the present invention;

Figure 4 is a schematic diagram illustrating the entire process from encoding to decoding of an information encoding method based on DNA storage in another embodiment of the present invention;

Figure 5 illustrates a second flow chart of an information encoding method based on DNA storage in another embodiment of the present invention;

Figure 6 illustrates a third flow chart of an information encoding method based on DNA storage in another embodiment of the present invention;

Figure 7 illustrates a fourth flow chart of an information encoding method based on DNA storage in another embodiment of the present invention;

Figure 8 is a schematic diagram of an information encoding device based on DNA storage in one embodiment of the present invention;

Figure 9 is a schematic diagram of a computer device in one embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without making creative efforts fall within the scope of protection of the present invention.

The information encoding method based on DNA storage provided by the embodiments of the present invention can be applied in the application environment shown in Figure 1. The method is applied in an information encoding system based on DNA storage, which includes a client and a server that communicate over a network. The client, also known as the user end, is a program that corresponds to the server and provides local services to the user. The client may be a computer program, an app on a smart device, or a third-party applet embedded in another app, and can be installed on, but is not limited to, computer devices such as personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a cluster of multiple servers.

Traditional storage technology stores information as binary sequences (that is, sequences composed of 0s and 1s) on media such as hard disks, optical discs, USB flash drives, and CDs. For example, music, pictures, videos, and other files are all binary sequences at the lowest level of the computer. Because of their mechanical properties, these media fail after a certain number of reads or writes. They also have low information density and large volume, so with the explosive growth of global information they struggle to meet storage needs.

In one embodiment, as shown in Figure 2, an information encoding method based on DNA storage is provided. The application of this method to the server in Figure 1 is used as an example to illustrate, specifically including the following steps:

S10. Obtain files stored in the form of binary sequences.

Among them, the binary sequence is the digital sequence encoding of the file saved in binary form (composed only of 0 and 1).

Specifically, this embodiment can perform DNA encoding on various forms of documents that have been saved in binary. The various forms of documents include voice, text, images, music, etc., and are not specifically limited here.

S20. Based on the binary-DNA conversion mapping table, divide the binary sequence in units of bytes to obtain binary sequence slices, map each binary sequence slice to a base slice, and then merge all the base slices in order to form DNA information. Each base slice satisfies the following composition conditions: its length is five; the sum of the number of G bases and C bases is t, where 0.4≤t/5≤0.6; there are no repeated bases at either boundary; and there are no three consecutive repeated bases in the middle.

Here, the binary-DNA conversion mapping table is a table that maps binary codes and DNA codes to each other. For example, one entry in the binary-DNA conversion mapping table is 00000000<-->ATACG, which means that data whose binary code is 00000000 is saved, after mapping in this embodiment, in the DNA-encoded form ATACG.

The DNA base sequence code is the code formed by merging all of the n-base slices.

Specifically, there are four nitrogenous bases that make up DNA: adenine (A), guanine (G), thymine (T), and cytosine (C). In this embodiment, a binary-DNA conversion mapping table is used to convert a file saved as a binary sequence into DNA information composed of base slices of n bases each for storage.

In this embodiment, 8 bits, that is, one byte, can be used as the unit of the binary sequence. For example, the continuous binary source code of the file can be split by byte into multiple binary sequence slices of 8 bits each; the binary sequence is thus the concatenation of these 8-bit binary sequence slices.

For example, the binary code of the file is divided into bytes (that is, binary sequence slices of 8 binary bits, of which there are 256 distinct values in total). Each slice is mapped to a suitable base slice of 5 bases (there are likewise 256 such base slices in total), and the DNA information is obtained by merging the base slices.

Understandably, the decoding process is the reverse process of the encoding process: the DNA sequence code is divided into groups of 5 bases, and each group is mapped to a binary sequence piece with 8 binary bits.
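The segment, map, and merge steps of S20 and their reverse can be sketched as follows. This is a minimal illustration, not the table of this application: the names `is_valid_slice`, `build_table`, `encode`, and `decode` are hypothetical, and the table is a placeholder built by enumerating slices that meet the composition conditions in an assumed base order and keeping the first 256.

```python
from itertools import product

BASES = "ATCG"  # assumed enumeration order; the text does not fix one


def is_valid_slice(s: str) -> bool:
    """Composition conditions for a 5-base slice, as stated in the text."""
    a, b, c, d, e = s
    gc = sum(ch in "GC" for ch in s)
    return a != b and d != e and not (b == c == d) and 2 <= gc <= 3


def build_table() -> list[str]:
    """Placeholder table (assumption): byte value i -> i-th valid slice."""
    valid = ["".join(p) for p in product(BASES, repeat=5)
             if is_valid_slice("".join(p))]
    return valid[:256]


TABLE = build_table()
REVERSE = {base_slice: value for value, base_slice in enumerate(TABLE)}


def encode(data: bytes) -> str:
    """Segment into bytes, map each byte to a 5-base slice, merge in order."""
    return "".join(TABLE[byte] for byte in data)


def decode(dna: str) -> bytes:
    """Segment into groups of 5 bases and map each group back to a byte."""
    if len(dna) % 5 != 0:
        raise ValueError("DNA information length must be a multiple of 5")
    return bytes(REVERSE[dna[i:i + 5]] for i in range(0, len(dna), 5))
```

For instance, `decode(encode(b"DNA"))` returns `b"DNA"`, and every encoded byte occupies exactly 5 bases. Both directions are single table lookups per slice.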

S30. Save the file in the form of DNA information.

Specifically, saving files in the form of DNA information, that is, in the form of biological macromolecules, can effectively extend the stable storage time of information and reduce storage energy consumption.

The information encoding method based on DNA storage provided in this embodiment strictly ensures that the longest run of consecutive repeated bases in all encoded DNA information is only 2 and that the GC content is between 0.4 and 0.6. The algorithm has linear complexity, and the net information density (NID) is 1.60. This effectively safeguards the synthesis, replication, and sequencing of the DNA, improves DNA synthesis efficiency, raises the information density of storage, extends the stable storage time of information, and reduces storage energy consumption.

In a specific embodiment, as shown in Figure 3, after step S20, that is, after all the base pieces are combined in sequence to form DNA information, the following steps are specifically included:

S201. Obtain the DNA information decoding request, and obtain the corresponding DNA information based on the DNA information decoding request.

S202. Based on the binary-DNA conversion mapping table, convert the DNA information into a binary sequence, which is used to convert the DNA information into a file for storage.

Here, the DNA information decoding request is a request to decode a file stored as DNA information into a binary sequence; decoding is the reverse process of encoding.

Specifically, this embodiment again uses the binary-DNA conversion mapping table: the DNA information is divided into base slices, which are then mapped back to a binary sequence. Preferably, the binary-DNA conversion mapping table includes at least one set of mapping relationships between base slices and binary sequence slices, that is, the specific correspondence between the binary code of each binary sequence slice and the arrangement of the five bases in each base slice.

Further, the entire process from encoding to decoding is given as an example to illustrate the implementation process of this embodiment:

A. Encoding process:

Segmentation:

Divide the binary sequence of the input information into groups of 8 binary bits. This always yields a whole number of groups, because 8 binary bits form one byte and files are stored in bytes, so their size is an integer number of bytes. Taking the input information in Figure 4 as an example, "01100011 10001101 01011011 10001110 10111011 00110111 01011000" is divided into 7 groups in total.

Mapping:

According to the mapping relationship provided by the binary-DNA conversion mapping table, look up the 5-base slice corresponding to each binary sequence slice. For example, "01100011" corresponds to "TCGTA", and so on. In this way, seven 5-base slices are obtained.

Merge:

By merging the obtained base slices, the required DNA information is obtained. This DNA information can then be synthesized into DNA as an information recording carrier.

B. Decoding process:

Segmentation:

The DNA information to be decoded is divided into groups of 5 bases. This always yields a whole number of groups, because encoding produced a whole number of groups. For example, "TCGTA CACTG TCTCT CACGA CGTCT AGTGC TCTAC" is divided into 7 groups.

Mapping:

According to the mapping relationship provided by the binary-DNA conversion mapping table, look up the 8-bit binary sequence slice corresponding to each base slice. For example, "TCGTA" corresponds to "01100011", and so on. In this way, seven 8-bit binary sequence slices are obtained.

Merge:

The resulting binary sequence slices are merged to obtain the decoded information.

This embodiment is used to read and decode data from DNA as a storage medium carrier for subsequent processing. It is a fast, efficient and robust encoding and decoding method.
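As a quick check of the worked example, the following sketch measures the longest run of repeated bases and the GC content of the seven-slice DNA information quoted above. The helper names `max_homopolymer_run` and `gc_content` are hypothetical and introduced only for illustration.

```python
EXAMPLE_DNA = "TCGTA CACTG TCTCT CACGA CGTCT AGTGC TCTAC"  # from the example above


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


slices = EXAMPLE_DNA.split()
merged = "".join(slices)

print(len(slices), len(merged))        # 7 35
print(max_homopolymer_run(merged))     # 1
print(round(gc_content(merged), 3))    # 0.514
```

The merged sequence has no repeated adjacent bases at all and a GC content of 18/35, which lies inside the 0.4 to 0.6 window.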

In a specific embodiment, as shown in Figure 5, after step S30, that is, after saving the file in the form of DNA information, the following steps are specifically included:

S3011. Obtain the scheduled task, and when the system time meets the scheduled task, update the mapping relationship in the binary-DNA conversion mapping table;

or,

S3012. Establish, for the mapping relationship, a public mapping relationship for public documents and a private mapping relationship for private documents.

Specifically, to enhance the security and reliability of file preservation, the method provided in this embodiment can periodically update the mapping relationship in the binary-DNA conversion mapping table, that is, change the correspondence between each binary sequence slice and the order of the five bases in each base slice. Alternatively, a public mapping relationship can be established for public documents, and a separate private mapping relationship can be established for documents that require security or privacy.
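One possible way to obtain different mapping relationships is to derive each table from a seed, as sketched below. This is an assumption for illustration only: the text does not say how an updated or private table is produced, and `valid_slices`, `make_table`, and the seed values are hypothetical. Python's `random` module is not cryptographically secure, so a real private table would need a stronger source of randomness.

```python
import random
from itertools import product


def valid_slices() -> list[str]:
    """All 5-base slices meeting the composition conditions in the text."""
    out = []
    for p in product("ATCG", repeat=5):
        a, b, c, d, e = p
        gc = sum(x in "GC" for x in p)
        if a != b and d != e and not (b == c == d) and 2 <= gc <= 3:
            out.append("".join(p))
    return out


def make_table(seed: int) -> list[str]:
    """Hypothetical helper: derive a 256-entry table from an integer seed."""
    rng = random.Random(seed)                # deterministic for a given seed
    return rng.sample(valid_slices(), 256)   # 256 distinct slices in seeded order


PUBLIC_TABLE = make_table(seed=0)            # assumed: shared openly
PRIVATE_TABLE = make_table(seed=20220423)    # assumed: seed kept secret
```

A scheduled update would then amount to calling `make_table` with a new seed. DNA information written under an earlier table can only be decoded with that same table, so earlier tables (or their seeds) would have to be retained.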

In a specific embodiment, as shown in Figure 6, before step S10, that is, before obtaining the file stored in the form of a binary sequence, the following steps are specifically included:

S101. Based on the principle that the number of distinct base slices must be greater than or equal to the number of distinct binary sequence slices, and on convenience of conversion, it is determined that one byte of eight bits is used as the division unit of the binary sequence slices.

S102. According to the number of bits, 8, determine the base slices corresponding to the 2⁸ binary sequence slices.

Specifically, to keep the complexity of the algorithm linear, this application cuts the binary sequence of the file into binary sequence slices of a fixed length and then maps each binary sequence slice to a suitable DNA base slice. The principle that the number of distinct base slices must be greater than or equal to the number of distinct binary sequence slices is expressed as 4ⁿ ≥ 2⁸, from which 2n ≥ 8, that is, n ≥ 4. Here, 2⁸ is the total number of distinct binary sequence slices and n is the number of bases in a base slice. To save base resources, this embodiment takes the minimum feasible value of n as the number of bases in a base slice.

For example, since files on a computer are stored in bytes and each byte is 8 bits, the length of a binary sequence slice can be fixed at 8, so the total number of distinct binary sequence slices is 2⁸ = 256. Next, 256 suitable base slices need to be determined.

When one byte is the unit of the binary sequence slices, n must be at least 4. However, with n = 4 there are exactly 4⁴ = 256 base slices, so all of them would have to be used, and many length-4 base slices contain long runs of repeated bases (for example, AAAT has 3 consecutive repeated bases) or have extreme GC content (for example, the GC content of GCCT is 75%). A base sequence obtained by merging such base slices has longer runs of repeated bases, and its GC content cannot be controlled. Therefore n cannot be 4 and must be at least 5.

The method provided in this embodiment treats the file as a binary sequence made up of multiple binary sequence slices, which facilitates fast conversion of the storage format through the mapping relationship between binary sequence slices and base slices. Preferably, this mapping relationship is recorded in the binary-DNA conversion mapping table, which includes at least one set of mapping relationships between base slices and binary sequence slices. This embodiment can match the number of base slices to the total number of binary sequence slices, thereby saving base slice resources.
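The sizing argument can be checked numerically. The sketch below finds the smallest slice length allowed by the counting principle and then counts how many length-4 slices avoid three consecutive repeated bases and have a GC fraction between 0.4 and 0.6. The helper names are hypothetical, and boundary conditions are ignored here, so the count is an upper bound on usable length-4 slices.

```python
from itertools import product

BITS_PER_SLICE = 8


def min_slice_length(bits: int) -> int:
    """Smallest n with 4**n >= 2**bits."""
    n = 1
    while 4 ** n < 2 ** bits:
        n += 1
    return n


def has_run_of_three(s: str) -> bool:
    """True if s contains three identical consecutive bases."""
    return any(s[i] == s[i + 1] == s[i + 2] for i in range(len(s) - 2))


def gc_in_range(s: str) -> bool:
    """True if the GC fraction of s lies in [0.4, 0.6]."""
    return 0.4 <= sum(ch in "GC" for ch in s) / len(s) <= 0.6


four_mers = ["".join(p) for p in product("ATCG", repeat=4)]
usable = [s for s in four_mers if not has_run_of_three(s) and gc_in_range(s)]

print(min_slice_length(BITS_PER_SLICE))   # 4
print(len(four_mers), len(usable))        # 256 96
```

Only 96 of the 256 length-4 slices pass even these relaxed checks, which is fewer than the 256 byte values that need a slice, so n = 4 cannot work.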

In a specific embodiment, as shown in Figure 7, in step S102, determining the base slices corresponding to the 2⁸ binary sequence slices specifically includes the following steps:

S1021. Use any one of the four bases as the first base of each base slice, and continue to choose the second base through the last base according to the composition conditions of the base slice.

S1022. Use any one of the remaining three bases as the second base of each base slice, and repeat the step of choosing the subsequent bases through the last base according to the composition conditions, until the number of distinct base slices equals the number of distinct binary sequence slices.

Specifically, continue to take n=5 as an example for explanation. From all the base slices with a length of 5, 256 suitable base slices that can be used as storage media are selected.

Let abcde be a base piece of length 5, in which the values of a, b, c, d, and e are all in the set {A, T, C, G}. Considering the length of consecutive repeating bases, they should satisfy the following conditions:

Condition 1: a≠b,d≠e;

Condition 2: b, c, and d are not all the same.

Condition 1 means that there must be no repeated bases at the boundaries of a base slice, so that merging cannot produce 3 or more consecutive repeated bases (for example, merging TCGGA followed by AATCG gives TCGGAAATCG, which contains 3 consecutive repeated bases AAA). Condition 2 means that three consecutive repeated bases cannot appear inside the base slice, which likewise avoids overly long runs of repeated bases. In addition, let t be the sum of the numbers of G and C in the base slice. In order to control the GC content of the encoded base sequence, t should meet the following condition:

Condition 3: 0.4≤t/5≤0.6

In other words, the GC content of each base piece is between 0.4 and 0.6, so the GC content of the encoded base sequence is also within this range. Solve condition 3 to get

2≤t≤3,

This means that the sum of the number of G and C in the base piece can only be 2 or 3.
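The effect of conditions 1 to 3 on merged sequences can be verified exhaustively. A run of three identical bases spans at most two adjacent slices, so it is enough to check every ordered pair of valid slices. The sketch below does this; `is_valid_slice` and `max_run` are hypothetical helper names.

```python
from itertools import product


def is_valid_slice(s: str) -> bool:
    """Conditions 1 to 3 for a 5-base slice abcde."""
    a, b, c, d, e = s
    gc = sum(ch in "GC" for ch in s)
    return a != b and d != e and not (b == c == d) and 2 <= gc <= 3


def max_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


valid = ["".join(p) for p in product("ATCG", repeat=5)
         if is_valid_slice("".join(p))]

# Check every ordered pair of slices as it would appear after merging.
worst = max(max_run(x + y) for x in valid for y in valid)
print(len(valid), worst)   # 400 2
```

There are 400 slices that satisfy all three conditions, more than the 256 required, and no merged pair contains a run longer than 2. Because every slice has 2 or 3 G/C bases out of 5, any merged sequence also has a GC content between 0.4 and 0.6.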

Based on the above conditions, suitable base chips can be screened out.

Starting from a, any one of {A, T, C, G} can be chosen, giving four possibilities. Since b cannot be the same as a, there are only three possibilities for b; for example, if a is A, then b cannot be A and can only be T, C, or G. There are four possible values for c, because it is in the middle of the base slice with only two bases before it. The value of d must be chosen carefully: b, c, and d must not all be the same, and the total number of G and C so far must be neither less than 1 nor more than 3 (it may be 1, 2, or 3 at this point, because one more base, e, remains). For example, if the preceding bases are GTT, then d cannot be T, since that would violate condition 2; d can be A, C, or G. Next, the value of e also depends on the preceding bases: e cannot be the same as d, and the total number of G and C must be 2 or 3. For example, if the preceding bases are GTTA, then e cannot be A, since that would violate condition 1; furthermore, e can only be G or C in order to satisfy condition 3. In this way, suitable base slices are screened out and recorded in the binary-DNA conversion mapping table, in which the symbol "<-->" represents the mapping, the left side of the symbol is a binary sequence slice of length 8 (called an 8-bit binary sequence slice), and the right side is a base slice of length 5 (called a 5-base slice).
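The position-by-position screening described above can be sketched as a small backtracking search that rejects a base as soon as it breaks a condition or makes condition 3 unreachable. The function names, the trial order A, T, C, G, and the choice to keep the first 256 slices are assumptions made for illustration.

```python
BASES = "ATCG"  # assumed trial order at each position


def gc(prefix: str) -> int:
    """Number of G and C bases in prefix."""
    return sum(ch in "GC" for ch in prefix)


def extend(prefix: str, out: list[str]) -> None:
    """Grow a slice one base at a time, pruning choices that break a condition."""
    pos = len(prefix)
    if pos == 5:
        out.append(prefix)
        return
    for base in BASES:
        cand = prefix + base
        if pos == 1 and base == prefix[0]:                # condition 1: a != b
            continue
        if pos == 3 and prefix[1] == prefix[2] == base:   # condition 2: b, c, d not all equal
            continue
        if pos == 4 and base == prefix[3]:                # condition 1: d != e
            continue
        remaining = 4 - pos                               # bases still to place after this one
        if gc(cand) > 3 or gc(cand) + remaining < 2:      # condition 3 must stay reachable
            continue
        extend(cand, out)


candidates: list[str] = []
extend("", candidates)
selected = candidates[:256]   # assumption: keep the first 256 in construction order

print(len(candidates), selected[0])   # 400 ATACG
```

With this trial order the first slice produced is ATACG, the slice that the text pairs with 00000000. The remaining assignments are an assumption of this sketch and are not claimed to reproduce the mapping table of this application.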

In a specific embodiment, in step S20, the file is converted into the form of DNA base sequence code and saved, which specifically includes the following steps:

S21. Use the DNA synthesis device to synthesize the DNA base sequence code into a DNA solution or dry powder and store it.

Specifically, through the splicing of oligonucleotides, a variety of existing technologies can already artificially synthesize specific DNA sequences, among which chemical methods have matured and enzymatic synthesis methods are developing. The chemical method is divided into four steps: deprotection, coupling, capping (optional) and oxidation. It is characterized by early appearance and the use of toxic reagents. Enzymatic methods are relatively mild, less damaging to DNA, more accurate, and have fewer by-products.

DNA is closely related to biology and will not be eliminated by the times like other storage media. The storage density of DNA is very high. The storage density of the most compact hard disk in the world is only one thousandth of it. Using DNA to store data, 10 complete high-definition movies can be stored in the size of a grain of salt. DNA is the core of biological research. As time goes by and technology matures, it will become more and more convenient to access data on DNA.

Consider the complexity of the method provided in this application. Let N be the length of the binary sequence of the input file; the number of steps required for the encoding or decoding mapping is then a linear function of N, so both encoding and decoding have linear complexity. The method of the present invention maps every 8 binary bits to 5 quaternary symbols (bases), so the net information density is 8/5 = 1.60 bits per base.

Furthermore, the information encoding method based on DNA storage proposed in this embodiment can convert information originally stored in binary into DNA information quickly, with high encoding efficiency, and under robust composition conditions. The net information density (NID) reaches 1.60, the longest run of consecutive repeated bases (homopolymer) in all encoded DNA sequences is at most 2 bases, and the GC content is strictly controlled between 40% and 60%.
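The two figures in this paragraph can be written out as a short worked calculation. Here N is an assumed symbol for the length in bits of the input binary sequence, and 2 bits per base is the unconstrained maximum for a four-letter alphabet, included only for comparison.

```latex
\text{lookups}(N) = \frac{N}{8} = O(N), \qquad
\mathrm{NID} = \frac{8\ \text{bits}}{5\ \text{bases}} = 1.60\ \text{bits/base}, \qquad
\frac{\mathrm{NID}}{\log_2 4} = \frac{1.60}{2} = 80\%
```

Each byte costs one table lookup in either direction, which is where the linear step count comes from.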

It should be understood that the sequence number of each step in the above embodiment does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention.

In one embodiment, an information encoding device based on DNA storage is provided, which corresponds to the information encoding method based on DNA storage in the above embodiment. As shown in Figure 8, the information encoding device based on DNA storage includes a binary sequence file acquisition module 10, a DNA information forming module 20, and a DNA information saving module 30. The functional modules are described in detail as follows:

The binary sequence file acquisition module 10 is used to obtain a file stored in the form of a binary sequence;

The DNA information forming module 20 is used to divide the binary sequence in units of bytes to obtain binary sequence slices based on the binary-DNA conversion mapping table, map each binary sequence slice to a base slice, and then merge all the base slices in order to form DNA information, where each base slice satisfies the following conditions: its length is five; the sum of the number of G bases and C bases is t, where 0.4≤t/5≤0.6; there are no repeated bases at either boundary; and there are no three consecutive repeated bases in the middle;

The DNA information saving module 30 is used to save the file in the form of the DNA information.

For specific limitations on the information encoding device based on DNA storage, please refer to the above limitations on the information encoding method based on DNA storage, which will not be described again here. Each module in the above-mentioned DNA storage-based information encoding device can be implemented in whole or in part by software, hardware, and combinations thereof. Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in Figure 9. The computer device includes a processor, memory, network interface, and database connected through a system bus. Wherein, the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes non-volatile media and internal memory. This non-volatile medium stores the operating system, computer programs and databases. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile media. The computer device's database is used for data related to information encoding methods based on DNA storage. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements an information encoding method based on DNA storage.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the information encoding method based on DNA storage in the above embodiment is implemented. , for example, steps S10 to S20 shown in Figure 2 . Alternatively, when the processor executes the computer program, the functions of each module/unit of the information encoding device based on DNA storage in the above embodiments are implemented, such as the functions of modules 10 to 20 shown in FIG. 8 . To avoid repetition, they will not be repeated here.

In one embodiment, a computer-readable medium is provided with a computer program stored thereon. When the computer program is executed by a processor, the information encoding method based on DNA storage in the above embodiment is implemented, such as steps S10 to S20 shown in Figure 2. Alternatively, when the computer program is executed by the processor, it realizes the functions of each module/unit in the DNA storage-based information encoding device in the above device embodiment, such as the functions of modules 10 to 20 shown in Figure 8. To avoid repetition, details are not described again here.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable medium. When executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the various embodiments of this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In actual applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or make equivalent substitutions for some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the scope of protection of the present invention.

Claims

An information encoding method based on DNA storage, characterized by comprising:

Acquiring a file stored in the form of a binary sequence;

Based on a binary-DNA conversion mapping table, dividing the binary sequence in units of bytes to obtain binary sequence slices, mapping each binary sequence slice to a base slice, and then merging all the base slices in sequence to form DNA information, wherein each base slice meets the following conditions: the length is five; the sum of the number of G bases and the number of C bases is t, which satisfies 0.4≤t/5≤0.6; there are no repeated bases at the boundary; and there are no three consecutive repeated bases in the middle;

Saving the file in the form of the DNA information.
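
By way of non-limiting illustration, the byte-wise mapping recited above can be sketched in Python as follows. The actual binary-DNA conversion mapping table is not reproduced in this section, so the table used here (`BYTE_TO_SLICE`, built from the first 256 admissible slices in lexicographic order) is a hypothetical stand-in, and the names `is_valid_slice`, `encode` and `max_run` are assumptions introduced for the example. The boundary condition is read as: the first two bases differ and the last two bases differ.

```python
from itertools import product

BASES = "ACGT"


def is_valid_slice(s: str) -> bool:
    """Check the composition conditions for one 5-base slice."""
    gc = sum(b in "GC" for b in s)
    return (
        len(s) == 5
        and 2 <= gc <= 3                 # 0.4 <= t/5 <= 0.6
        and s[0] != s[1]                 # no repeat at the left boundary
        and s[3] != s[4]                 # no repeat at the right boundary
        and not (s[1] == s[2] == s[3])   # no triple repeat in the middle
    )


# Hypothetical table: first 256 admissible slices in lexicographic order.
VALID = [s for s in map("".join, product(BASES, repeat=5)) if is_valid_slice(s)]
BYTE_TO_SLICE = {value: VALID[value] for value in range(256)}


def encode(data: bytes) -> str:
    """Map every byte to its base slice and concatenate the slices."""
    return "".join(BYTE_TO_SLICE[b] for b in data)


def max_run(dna: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 1 if dna else 0
    for prev, cur in zip(dna, dna[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest
```

For example, `encode(b"DNA")` returns a 15-base string, and `max_run` applied to any encoded output does not exceed 2, because a repeat can span at most one slice boundary.
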
The information encoding method based on DNA storage according to claim 1, characterized in that, after sequentially merging all the base slices to form the DNA information, the method further includes:

Obtain a DNA information decoding request, and obtain the corresponding DNA information based on the DNA information decoding request;

Based on the binary-DNA conversion mapping table, convert the DNA information into a binary sequence, the binary sequence being used to restore the DNA information to a file for storage.
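
As an illustration only, decoding can be sketched as the inverse table lookup. The table below is the same hypothetical stand-in as in any encoder sketch that takes the first 256 admissible slices in lexicographic order; it is not the table of the application, and the names `valid`, `SLICES`, `SLICE_TO_BYTE` and `decode` are assumptions.

```python
from itertools import product


def valid(s: str) -> bool:
    """Composition conditions for a 5-base slice (assumed reading)."""
    gc = sum(b in "GC" for b in s)
    return (2 <= gc <= 3 and s[0] != s[1] and s[3] != s[4]
            and not (s[1] == s[2] == s[3]))


# Hypothetical table: byte value i corresponds to the i-th admissible slice.
SLICES = [s for s in map("".join, product("ACGT", repeat=5)) if valid(s)][:256]
SLICE_TO_BYTE = {s: i for i, s in enumerate(SLICES)}


def decode(dna: str) -> bytes:
    """Split the DNA information into 5-base slices and map each to a byte."""
    if len(dna) % 5:
        raise ValueError("DNA length must be a multiple of 5")
    values = []
    for i in range(0, len(dna), 5):
        piece = dna[i:i + 5]
        if piece not in SLICE_TO_BYTE:
            raise ValueError(f"unknown base slice {piece!r}")
        values.append(SLICE_TO_BYTE[piece])
    return bytes(values)
```

Because each slice has a fixed length of five, decoding needs no delimiters: the sequence is cut every five bases and each piece is looked up directly.
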
The information encoding method based on DNA storage according to claim 1, characterized in that the binary-DNA conversion mapping table includes at least one set of mapping relationships between the base slices and the binary sequence slices.
The information encoding method based on DNA storage according to claim 1, characterized in that after the file is saved in the form of the DNA information, it further includes:

Obtain a scheduled task, and when the system time meets the scheduled task, update the mapping relationship in the binary-DNA conversion mapping table;

or,

A public mapping relationship based on public documents and a private mapping relationship based on private documents are established for the mapping relationships.
The information encoding method based on DNA storage according to claim 1, characterized in that, before obtaining the file stored in the form of a binary sequence, it also includes:

Based on the principle that the number of types of base slices is greater than or equal to the number of types of binary sequence slices and that conversion is convenient, it is determined that one byte carrying eight bits is used as the division unit of the binary sequence slices;

According to the number of bits (8), the base slices corresponding to the 2^8 (that is, 256) binary sequence slices are determined.
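
The choice of one byte as the division unit can be checked numerically. The following sketch, offered only as an illustration, counts the admissible 5-base slices by brute force and derives the largest whole number of bits one slice can carry. It assumes the boundary condition means that the first two bases differ and the last two bases differ; the helper name `satisfies_conditions` is hypothetical.

```python
from itertools import product
from math import floor, log2

SLICE_LEN = 5


def satisfies_conditions(s) -> bool:
    """Composition conditions for one slice (assumed reading)."""
    gc = sum(b in "GC" for b in s)
    return (2 <= gc <= 3                      # 0.4 <= t/5 <= 0.6
            and s[0] != s[1]                  # left boundary
            and s[3] != s[4]                  # right boundary
            and not (s[1] == s[2] == s[3]))   # middle triple


# Number of admissible slices among all 4**5 candidates
# (400 under this reading of the conditions).
n_slices = sum(satisfies_conditions(s) for s in product("ACGT", repeat=SLICE_LEN))

# Largest whole number of bits that one slice can represent.
max_bits = floor(log2(n_slices))

# Net information density in bits per base when one byte maps to one slice.
density = 8 / SLICE_LEN

print(n_slices, max_bits, density)
```

Since the count of admissible slices is at least 256 but below 512, eight bits is the largest whole number of bits per slice, which matches the byte-wise division and gives 8/5 = 1.6 bits per base.
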
The information encoding method based on DNA storage according to claim 5, characterized in that determining the base slices corresponding to the 2^8 binary sequence slices includes:

Use any one of the four bases as the first base of each base slice, and continue to synthesize from the second base to the last base according to the composition conditions of the base slice;

Use any one of the remaining three bases as the second base of each base slice, and repeat the step of continuing to synthesize from the second base to the last base according to the composition conditions, until the number of types of base slices is equal to the number of types of binary sequence slices.
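
One possible reading of this base-by-base construction is a depth-first extension that starts from each first base and appends only bases that keep the partial slice admissible, stopping once enough slices have been collected. The sketch below is an illustration under that reading, not the procedure of the application; `can_extend` and `build_slices` are hypothetical names, and the GC condition is checked once a slice reaches full length.

```python
BASES = "ACGT"


def can_extend(prefix: str, base: str) -> bool:
    """Return True if appending `base` keeps the partial slice admissible."""
    pos = len(prefix)  # 0-based index the new base would occupy
    if pos == 1 and base == prefix[0]:
        return False   # repeated bases at the left boundary
    if pos == 3 and prefix[1] == prefix[2] == base:
        return False   # three identical bases in the middle
    if pos == 4 and base == prefix[3]:
        return False   # repeated bases at the right boundary
    return True


def build_slices(limit: int = 256) -> list:
    """Collect up to `limit` admissible 5-base slices, base by base."""
    slices = []

    def grow(prefix: str) -> None:
        if len(slices) >= limit:
            return
        if len(prefix) == 5:
            gc = sum(b in "GC" for b in prefix)
            if 2 <= gc <= 3:   # 0.4 <= t/5 <= 0.6
                slices.append(prefix)
            return
        for base in BASES:
            if can_extend(prefix, base):
                grow(prefix + base)

    for first in BASES:        # try each base as the first base in turn
        grow(first)
    return slices
```

Calling `build_slices()` stops as soon as 256 distinct slices exist, that is, when the number of base slice types equals the number of binary sequence slice types.
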
The information encoding method based on DNA storage according to claim 1, wherein the step of saving the file in the form of the DNA information includes:

The DNA base sequence code is synthesized into a DNA solution or dry powder by a DNA synthesis device and stored.
An information encoding device based on DNA storage, characterized by including:

A binary sequence file acquisition module, used to acquire a file stored in the form of a binary sequence;

A DNA information forming module, used to divide the binary sequence in units of bytes, based on the binary-DNA conversion mapping table, to obtain binary sequence slices, map each binary sequence slice to a base slice, and then merge all the base slices in sequence to form DNA information, wherein each base slice meets the following conditions: the length is five; the sum of the number of G bases and the number of C bases is t, which satisfies 0.4≤t/5≤0.6; there are no repeated bases at the boundary; and there are no three consecutive repeated bases in the middle;

A DNA information saving module, used to save the file in the form of the DNA information.
A computer device, including a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the information encoding method based on DNA storage according to any one of claims 1 to 7.
A computer-readable medium storing a computer program, characterized in that when the computer program is executed by a processor, the information encoding method based on DNA storage according to any one of claims 1 to 7 is implemented.