WO2023173842A1

WO2023173842A1 - DNA coding method and apparatus, DNA decoding method and apparatus, terminal device and medium

Info

Publication number: WO2023173842A1
Application number: PCT/CN2022/138143
Authority: WO
Inventors: 戴俊彪; 强薇; 黄小罗
Original assignee: 深圳先进技术研究院
Priority date: 2022-03-14
Filing date: 2022-12-09
Publication date: 2023-09-21
Also published as: CN115312128A; WO2023173837A1

Abstract

The present application belongs to the technical field of data storage. Provided are a DNA coding method and apparatus, a DNA decoding method and apparatus, a terminal device and a medium. The DNA coding method comprises: converting original data to be stored into a first quaternary sequence; performing code conversion on the first quaternary sequence according to a preset mapping relationship between quaternary characters and bases to obtain a base sequence; and obtaining, according to the base sequence, a DNA sequence in which the original data is stored. The present application can increase the coding speed.

Description

DNA encoding method, decoding method, device, terminal equipment and medium

Technical field

This application belongs to the field of data storage technology, and in particular relates to a DNA encoding method, decoding method, device, terminal equipment and medium.

Background technique

The development of the Internet has caused explosive growth of information in human society, while the capacity of existing storage media is close to exhaustion. Researchers have therefore turned to deoxyribonucleic acid (DNA) storage. Existing DNA encoding methods follow silicon-based binary (0/1) storage: the information to be stored is converted into binary numbers, which are then further encoded into a DNA sequence. Because single-base repetition must be taken into account during encoding, the binary numbers have to undergo a series of complex operations (such as XOR operations, random function mapping, and conditional filtering) before they can be encoded into a DNA sequence. These complex operations take a long time, so the encoding speed is not ideal.

Technical problem

Embodiments of the present application provide a DNA encoding method, decoding method, device, terminal equipment and medium, which can solve the problem of unsatisfactory DNA encoding speed.

Technical solutions

In a first aspect, embodiments of the present application provide a DNA encoding method, including:

Convert the original data to be stored into the first quaternary sequence;

According to the preset mapping relationship between quaternary characters and bases, the first quaternary sequence is encoded and converted to obtain the base sequence;

According to the base sequence, the DNA sequence storing the original data is obtained.

Optionally, convert the original data to be stored into the first quaternary sequence, including:

When the original data to be stored is text, the text is encoded according to the preset character encoding table to obtain a coding sequence;

According to the mapping relationship between the coded characters and quaternary characters in the preset character coding table, the coding sequence is converted into a first quaternary sequence.

When the original data to be stored is a picture, obtain the RGB value of each pixel in the picture;

Sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence;

Convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

When the original data to be stored is audio, the audio is sampled according to the preset sampling rate to obtain multiple sampled data;

Obtain the amplitude value of each sampled data;

Sort the obtained amplitude values according to the sampling order of the multiple sampled data to obtain a decimal sequence;

Convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

Optionally, the first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the pictures; convert the original data to be stored into the first quaternary sequence, including:

When the original data to be stored is a video, extract the audio of the video and each frame of the video;

Process the extracted audio to obtain the second quaternary sequence corresponding to the audio;

Process each extracted frame picture to obtain the fourth quaternary sequence corresponding to each extracted frame picture;

According to the playback order of the extracted frame pictures in the video, sort the fourth quaternary sequences corresponding to the frame pictures to obtain the third quaternary sequence corresponding to the pictures.

Optionally, obtain the DNA sequence storing the original data based on the base sequence, including:

Insert N bases at every M base position in the base sequence to obtain a DNA sequence that stores the original data;

Among them, the first of the N inserted bases differs from the base adjacent to it in the base sequence; the Nth of the N inserted bases differs from the base adjacent to it in the base sequence; adjacent bases among the N inserted bases differ from each other; and M and N are both integers greater than 0.

In the second aspect, embodiments of the present application provide a DNA decoding method, including:

Determine the DNA sequence to be decoded;

Obtain the base sequence based on the DNA sequence;

According to the preset mapping relationship between quaternary characters and bases, the base sequence is decoded to obtain the first quaternary sequence;

Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.

Optionally, convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence, including:

When the original data stored in the DNA sequence is text, the first quaternary sequence is converted into a coding sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table;

Convert the coding sequence into text according to the preset character encoding table.

When the original data stored in the DNA sequence is a picture, the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;

Determine the RGB values of multiple pixels based on the decimal sequence;

Generate a picture based on the RGB values of multiple pixels and the preset pixel arrangement order.

When the original data stored in the DNA sequence is audio, the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;

Determine the amplitude values of multiple sampled data according to the decimal sequence;

Determine the total duration of audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data;

The determined amplitude values are evenly distributed over the total duration to obtain the audio.

Optionally, the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the picture. The first quaternary sequence includes the second quaternary sequence corresponding to the DNA sequence of the audio and the third quaternary sequence corresponding to the DNA sequence of the picture. The original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multi-frame pictures;

Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence, including:

Convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;

Convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain multiple frames of pictures;

After converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, the method also includes:

Synthesize multiple frames of pictures and audio to obtain a video.

Optionally, obtain the base sequence based on the DNA sequence, including:

In the DNA sequence, N bases are removed at every M base position to obtain the base sequence;

Among them, M and N are both integers greater than 0.
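The first two decoding steps (removing the inserted bases, then mapping bases back to quaternary characters) can be sketched as follows. This is only an illustrative sketch: the function names `remove_spacers` and `bases_to_quaternary` are hypothetical, the mapping 0-A, 1-T, 2-C, 3-G is just one of the possible preset mappings, and the sketch assumes that exactly N inserted bases follow every complete group of M bases.

```python
# Assumed preset mapping (0-A, 1-T, 2-C, 3-G), inverted for decoding.
BASE_TO_QUAT = {"A": "0", "T": "1", "C": "2", "G": "3"}


def remove_spacers(dna, m, n=1):
    """Keep the first m bases of every (m + n)-base block, dropping the n inserted bases."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be integers greater than 0")
    step = m + n
    return "".join(dna[i:i + m] for i in range(0, len(dna), step))


def bases_to_quaternary(bases, mapping=BASE_TO_QUAT):
    """Decode a base sequence into a string of quaternary digits."""
    return "".join(mapping[base] for base in bases)


dna = "AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGC"
bases = remove_spacers(dna, m=5, n=1)
print(bases)
print(bases_to_quaternary(bases))
```

The DNA sequence used here is the one from the picture example in this description, with M = 5 and N = 1.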

In a third aspect, embodiments of the present application provide a DNA encoding device, including:

The first conversion module is used to convert the original data to be stored into a first quaternary sequence;

The encoding module is used to encode and convert the first quaternary sequence to obtain the base sequence according to the preset mapping relationship between quaternary characters and bases;

The generation module is used to obtain the DNA sequence storing the original data based on the base sequence.

Optionally, the first conversion module includes:

The encoding unit is used to encode the text according to the preset character encoding table to obtain a coding sequence when the original data to be stored is text;

The first conversion unit is used to convert the coding sequence into a first quaternary sequence according to the mapping relationship between the coded characters and the quaternary characters in the preset character coding table.

Optionally, the first conversion module includes:

The first acquisition unit is used to acquire the RGB value of each pixel in the picture when the original data to be stored is a picture;

The first sorting unit is used to sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence;

The second conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

Optionally, the first conversion module includes:

The sampling unit is used to sample the audio according to the preset sampling rate to obtain multiple sampled data when the original data to be stored is audio;

The second acquisition unit is used to acquire the amplitude value of each sampled data;

The second sorting unit is used to sort the obtained amplitude values according to the sampling order of multiple sampled data to obtain a decimal sequence;

The third conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

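Optionally, the first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the pictures; the first conversion module includes: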

An extraction unit, used to extract the audio of the video and each frame of the video when the original data to be stored is a video;

The first processing unit is used to process the extracted audio to obtain the second quaternary sequence corresponding to the audio;

The second processing unit is used to process each extracted frame picture to obtain the fourth quaternary sequence corresponding to each extracted frame picture;

The third sorting unit is used to sort the fourth quaternary sequences corresponding to the frame pictures according to the playback order of the extracted frame pictures in the video, to obtain the third quaternary sequence corresponding to the pictures.

Optionally, the generation module includes:

The generation unit is used to insert N bases at every M base position in the base sequence to obtain a DNA sequence storing original data.

In a fourth aspect, embodiments of the present application provide a DNA decoding device, including:

Determination module, used to determine the DNA sequence to be decoded;

The processing module is used to obtain the base sequence based on the DNA sequence;

The decoding module is used to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence;

The second conversion module is used to convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.

Optionally, the second conversion module includes:

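The fourth conversion unit is used to convert the first quaternary sequence into a coding sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table when the original data stored in the DNA sequence is text;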

The fifth conversion unit is used to convert the encoding sequence into text according to the preset character encoding table.

Optionally, the second conversion module includes:

The sixth conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is a picture;

The first determination unit is used to determine the RGB values of multiple pixels according to the decimal sequence;

The generation unit is used to generate pictures based on the RGB values of multiple pixels and the preset arrangement order of pixels.

Optionally, the second conversion module includes:

The seventh conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is audio;

The second determination unit is used to determine the amplitude values of multiple sampled data according to the decimal sequence;

The third determination unit is used to determine the total duration of the audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data;

The distribution unit is used to evenly distribute the determined amplitude values over the total duration to obtain audio.

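Optionally, when the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the pictures, the second conversion module includes: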

The eighth conversion unit is used to convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;

The ninth conversion unit is used to convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain multiple frames of pictures;

The DNA decoding device also includes:

The synthesis module is used to synthesize multiple frames of pictures and audio to obtain a video.

Optionally, the processing module is specifically used to remove N bases at every M base position in the DNA sequence to obtain the base sequence; where M and N are both integers greater than 0.

In a fifth aspect, embodiments of the present application provide a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the above-mentioned DNA encoding method or DNA decoding method is implemented.

In a sixth aspect, embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the above-mentioned DNA encoding method or DNA decoding method is implemented.

In a seventh aspect, embodiments of the present application provide a computer program product, which when the computer program product is run on a terminal device, causes the terminal device to execute the above-mentioned DNA encoding method or DNA decoding method.

Beneficial effects

Compared with the prior art, the beneficial effects of the embodiments of the present application are:

In the embodiments of the present application, the original data to be stored is converted into a first quaternary sequence, the first quaternary sequence is then encoded into a base sequence according to the mapping relationship between quaternary characters and bases, and finally the DNA sequence storing the original data is obtained based on the base sequence. Encoding the first quaternary sequence into a base sequence requires no complex operations: the first quaternary sequence is directly encoded as a base sequence according to the mapping relationship between quaternary characters and bases, which reduces algorithm complexity and improves the coding speed.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

Figure 1 is a flow chart of a DNA encoding method provided by an embodiment of the present application;

Figure 2 is a flow chart of steps for converting text into a first quaternary sequence provided by an embodiment of the present application;

Figure 3 is a flow chart of steps for converting a picture into a first quaternary sequence according to an embodiment of the present application;

Figure 4 is a flow chart of steps for converting audio into a first quaternary sequence provided by an embodiment of the present application;

Figure 5 is a flow chart of steps for converting video into a first quaternary sequence provided by an embodiment of the present application;

Figure 6 is a flow chart of a DNA decoding method provided by an embodiment of the present application;

Figure 7 is a schematic structural diagram of a DNA encoding device provided by an embodiment of the present application;

Figure 8 is a schematic structural diagram of a DNA decoding device provided by an embodiment of the present application;

Figure 9 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Best Mode of Carrying Out the Invention

In the following description, for the purpose of explanation rather than limitation, specific details such as specific system structures and technologies are provided to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It will also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once determined" or "in response to a determination" or "once [the described condition or event] is detected" or "in response to detection of [the described condition or event]".

In addition, in the description of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "including", "includes", "having", and variations thereof all mean "including but not limited to", unless otherwise specifically emphasized.

In current DNA encoding, the information to be stored is first converted into binary (0/1) numbers, which are then further encoded into a DNA sequence. Because the binary numbers must undergo a series of time-consuming complex operations before they can be encoded into a DNA sequence, the DNA encoding speed is not ideal.

To address the above problems, embodiments of the present application convert the information to be stored into a first quaternary sequence, use the mapping relationship between quaternary characters and bases to directly encode the first quaternary sequence into a base sequence, and obtain the DNA sequence based on the base sequence. Encoding the first quaternary sequence into a base sequence requires no complex operations: the first quaternary sequence is directly encoded as a base sequence according to the mapping relationship between quaternary characters and bases, which reduces algorithm complexity and improves the coding speed.

The DNA encoding method provided in this application will be exemplified below in conjunction with specific examples.

As shown in Figure 1, the embodiment of the present application provides a DNA encoding method, including the following steps:

Step 11: Convert the original data to be stored into the first quaternary sequence.

The format of the above raw data can be text, picture, audio, video and other formats. And when converting the original data into the first quaternary sequence, the specific conversion methods of the original data in different formats are different. The specific conversion process will be explained in detail later.

Step 12: According to the preset mapping relationship between quaternary characters and bases, the first quaternary sequence is code-converted to obtain the base sequence.

The above-mentioned quaternary characters include 0, 1, 2, and 3, and the above-mentioned bases include the four natural DNA bases: adenine (A), thymine (T), cytosine (C), and guanine (G). If the four bases A, T, C and G are mapped one-to-one to 0, 1, 2 and 3, there are 24 (that is, 4!) possible mapping relationships between quaternary characters and bases, and the above-mentioned preset mapping relationship between quaternary characters and bases can be any one of these 24 mapping relationships.
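As an illustration of step 12, the following minimal sketch enumerates the one-to-one mappings and applies one of them. The names `all_mappings` and `quaternary_to_bases` are hypothetical, and the default mapping 0-A, 1-T, 2-C, 3-G is simply one of the 24 choices (it is the one used in the worked examples later in this description).

```python
from itertools import permutations

BASES = "ATCG"

# Assumed preset mapping: 0-A, 1-T, 2-C, 3-G (one of the 24 possibilities).
QUAT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}


def all_mappings():
    """Enumerate every one-to-one mapping from quaternary digits to bases."""
    return [dict(zip("0123", perm)) for perm in permutations(BASES)]


def quaternary_to_bases(quaternary, mapping=QUAT_TO_BASE):
    """Encode a string of quaternary digits as a base sequence, one base per digit."""
    return "".join(mapping[digit] for digit in quaternary)


print(len(all_mappings()))              # number of possible mappings
print(quaternary_to_bases("00001220"))  # first 8 digits of the text example
```

Because the conversion is a per-character table lookup, its cost grows linearly with the length of the quaternary sequence.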

Step 13: Obtain the DNA sequence storing the original data based on the base sequence.

In one example, after obtaining the base sequence corresponding to the original data, the base sequence can be directly used as the DNA sequence storing the original data, and the subsequent DNA synthesis process can be performed to obtain the DNA storing the original data and store it. It should be noted that the specific implementation process of synthesizing DNA can be implemented by referring to the current common methods, and will not be described again here.

It can be seen that in some embodiments of the present application, the original data to be stored is converted into a first quaternary sequence, the first quaternary sequence is then directly encoded as a base sequence according to the one-to-one mapping relationship between quaternary characters and bases, and the DNA sequence is obtained based on the base sequence. Encoding the first quaternary sequence into a base sequence requires no complex operations, which reduces algorithm complexity and improves the coding speed.

When the original data needs to be retrieved from the above-mentioned DNA, the DNA must first be sequenced to read out the DNA sequence, and the DNA sequence is then decoded to obtain the original data. In the related art of DNA decoding, a continuous run of A or T makes it difficult for the polymerase to recognize each individual A or T during sequencing. As a result, the polymerization reaction for the sequence following the run may start from any A or T within the run, which causes disordered sequencing results and overlapping peaks.

For this reason, in another embodiment of the present application, for the above step 13, in order to avoid disordered sequencing results, the DNA sequence can be obtained by inserting bases into the base sequence so as to control the number of single-base repetitions in the DNA sequence.

Specifically, the DNA sequence storing the original data can be obtained by inserting N bases at positions every M bases in the base sequence.

Among them, the first of the N inserted bases differs from the base adjacent to it in the base sequence; the Nth of the N inserted bases differs from the base adjacent to it in the base sequence; and adjacent bases among the N inserted bases differ from each other.

The above M and N are both integers greater than 0, and the specific values of M and N can be set according to the actual situation. For example, the value of M can be set to 6.

In a possible embodiment of the present application, G and C together account for 40% to 60% of the bases in the DNA sequence, so as to achieve the highest sequencing efficiency.
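A GC proportion of this kind can be checked with a few lines of code. The sketch below is illustrative only; the function names `gc_fraction` and `gc_in_range` are hypothetical, and treating the 40% and 60% bounds as inclusive is an assumption.

```python
def gc_fraction(dna):
    """Return the fraction of bases in `dna` that are G or C."""
    if not dna:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in dna) / len(dna)


def gc_in_range(dna, low=0.40, high=0.60):
    """Check whether the GC fraction lies within [low, high] (bounds assumed inclusive)."""
    return low <= gc_fraction(dna) <= high


print(gc_fraction("ATCG"))
print(gc_in_range("AAAAAT"))
```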

In some embodiments of the present application, the number of single-base repeats in the DNA sequence obtained by inserting N bases after every M bases does not exceed M, which effectively controls the number of single-base repeats and avoids disordered sequencing results.

It is worth mentioning that, when controlling the number of single-base repeats, the DNA encoding method in the embodiments of the present application does not require complex calculation processing (such as conditional filtering); instead, it controls the number of single-base repeats by inserting bases at intervals, which reduces the algorithm complexity and further improves the encoding speed.
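The interval insertion described above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the names `pick_spacer`, `insert_spacers` and `longest_run` are hypothetical, choosing the first permitted base in the order A, T, C, G is an arbitrary assumption (any base that satisfies the constraints would do), and the sketch assumes that nothing is inserted after the final group of bases.

```python
from itertools import groupby


def pick_spacer(left, right, n=1, alphabet="ATCG"):
    """Choose n bases: the first differs from `left`, the last differs from
    `right`, and neighbouring inserted bases differ from each other."""
    spacers = []
    prev = left
    for i in range(n):
        # Only the last inserted base also has to avoid the base on its right.
        avoid = {prev, right} if i == n - 1 else {prev}
        choice = next(b for b in alphabet if b not in avoid)
        spacers.append(choice)
        prev = choice
    return "".join(spacers)


def insert_spacers(bases, m, n=1):
    """Insert n bases after every m bases of `bases` (not after the last group)."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be integers greater than 0")
    chunks = [bases[i:i + m] for i in range(0, len(bases), m)]
    out = []
    for idx, chunk in enumerate(chunks):
        out.append(chunk)
        if idx < len(chunks) - 1:
            out.append(pick_spacer(chunk[-1], chunks[idx + 1][0], n))
    return "".join(out)


def longest_run(dna):
    """Length of the longest run of one repeated base."""
    return max((len(list(group)) for _, group in groupby(dna)), default=0)


encoded = insert_spacers("AAAAAAAAAA", m=5, n=1)
print(encoded, longest_run(encoded))
```

Because each inserted base differs from both of its neighbours, a run of identical bases cannot continue across an insertion point, which is why the longest run in the output stays at or below M.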

The process of converting raw data to be stored into a quaternary sequence is exemplarily described below with reference to specific embodiments.

In some embodiments of the present application, the original data to be stored can be converted into a quaternary sequence according to the format of the original data.

When the original data to be stored is text, as shown in Figure 2, in step 11 above, the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:

Step 21: Encode the text according to the preset character encoding table to obtain an encoding sequence.

The above-mentioned preset character encoding table is an encoding table for encoding text, such as the Unicode, UTF-8, ASCII, ISO8859-1, GB2312 or GBK encoding table. It can be understood that in some embodiments of the present application, the specific form of the above-mentioned preset character encoding table is not limited.

Step 22: Convert the encoding sequence into a first quaternary sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table.

The above-mentioned coded characters are the characters of the number base in which the encoding sequence is expressed. In some embodiments of the present application, in order to quickly convert the encoding sequence into the first quaternary sequence, the above-mentioned encoding sequence may be an information sequence represented in hexadecimal. Correspondingly, the above-mentioned coded characters are hexadecimal characters.

For example, assume that the text used as the original data is "hello world!", the preset character encoding table is the Unicode encoding table, the value of M is 5, the value of N is 1, and the mapping relationship between quaternary characters and bases is: 0-A, 1-T, 2-C, 3-G. The encoding sequence obtained by encoding with the Unicode encoding table is: 00680065006c006c006f00200077006f0072006c00640021. Then, according to the mapping relationship between the hexadecimal coded characters and quaternary characters (each hexadecimal character corresponds to two quaternary characters), the first quaternary sequence obtained is:

000012200000121100001230000012300000123300000200000013130000123300001302000012300000121000000201

Using the mapping relationship between quaternary characters and bases, the base sequence obtained is:

AAAATCCAAAAATCTTAAAATCGAAAAATCGAAAAATCGGAAAAACAAAAAATGTGAAAATCGGAAAATGACAAAATCGAAAAATCTAAAAAACAT

The final DNA sequence obtained by inserting one base after every 5 bases is:

AAAATGCCAAATAATCTATAAAAGTCGAATAAATCTGAAAATATCGGTAAAAATCAAAATAATGTAGAAAAGTCGGATAAATGTACAAATATCGATAAAATGCTAAATAAACAGT
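Steps 21 and 22 for this example can be reproduced with the short sketch below. It is an illustration under stated assumptions: the "Unicode encoding table" is taken to mean four hexadecimal characters per character (UTF-16 big-endian code units in Python terms), and the names `HEX_TO_QUAT` and `text_to_quaternary` are hypothetical.

```python
# Each hexadecimal character (0..15) corresponds to exactly two quaternary digits.
HEX_TO_QUAT = {format(v, "x"): f"{v // 4}{v % 4}" for v in range(16)}


def text_to_quaternary(text):
    """Encode text as hexadecimal code units, then map each hex character to two quaternary digits."""
    # Assumption: 4 hex characters per character, i.e. UTF-16 big-endian code units.
    hex_sequence = text.encode("utf-16-be").hex()
    return "".join(HEX_TO_QUAT[h] for h in hex_sequence)


print("hello world!".encode("utf-16-be").hex())
print(text_to_quaternary("hello world!"))
```

A fixed table lookup per hexadecimal character is what makes this conversion fast: no arithmetic on the whole sequence is needed.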

When the original data to be stored is a picture, as shown in Figure 3, in step 11 above, the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:

Step 31: Obtain the RGB value of each pixel in the picture.

In some embodiments of the present application, the RGB (R represents red, G represents green, and B represents blue) value of each pixel in the picture can be extracted by a currently common RGB value extraction method.

Step 32: Sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence.

In some embodiments of the present application, the above-mentioned preset pixel arrangement order may be set in advance according to the relative position of each pixel in the picture. For example, when the picture consists of 4 pixels, the 4 pixels can be numbered 0, 1, 2, and 3 according to their positions in the picture, and the arrangement order of the pixels is then 0, 1, 2, 3. When sorting the RGB values of these four pixels, the RGB values are arranged in this numbering order.

Step 33: Convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

For example, assume that the picture consists of 4 pixels whose RGB values are (45, 254, 78), (2, 100, 23), (99, 65, 109) and (68, 126, 94). Sorting these four RGB values according to the relative positions of the four pixels in the picture, the resulting decimal sequence can be expressed as 2 100 23 45 254 78 99 65 109 68 126 94. According to the mapping relationship between decimal characters and quaternary characters, the converted quaternary numbers are 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132. It should be noted that the maximum value in RGB mode is 255, which is 3333 in quaternary. To ensure that every number occupies the same number of digits, numbers with fewer than 4 digits are padded with leading zeros while the original relative order is kept, so the final first quaternary sequence is: 000212100113023133321032120310011231101013321132. After the first quaternary sequence is obtained, assuming that the value of M is 5, the value of N is 1, and the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G, the base sequence obtained using this mapping relationship is:

AAACTCTAATTGACGTGGGCTAGCTCAGTAATTCGTTATATGGCTTGC. The final DNA sequence obtained by inserting the bases is:

AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGC.
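The conversion chain in the picture example above (RGB values, decimal sequence, fixed-width quaternary sequence, base sequence) can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `to_quaternary` and `picture_to_base_sequence` are hypothetical and do not come from the application, and the pixels are assumed to be supplied already sorted by their position in the picture.

```python
# Mapping between quaternary characters and bases used in the example.
QUAT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}


def to_quaternary(value, width=4):
    """Convert a non-negative integer to a zero-padded base-4 string."""
    digits = ""
    while value > 0:
        digits = str(value % 4) + digits
        value //= 4
    return digits.rjust(width, "0")


def picture_to_base_sequence(pixels):
    """pixels: RGB tuples, assumed to be sorted by position in the picture."""
    decimal_sequence = [channel for pixel in pixels for channel in pixel]
    quaternary = "".join(to_quaternary(v) for v in decimal_sequence)
    bases = "".join(QUAT_TO_BASE[d] for d in quaternary)
    return quaternary, bases


pixels = [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)]
quaternary, bases = picture_to_base_sequence(pixels)
print(quaternary)
print(bases)
```

Padding every value to four quaternary digits is what allows the sequence to be split back into individual values during decoding without any separator.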

When the original data to be stored is audio, as shown in Figure 4, in step 11 above, the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:

Step 41: Sampling the audio according to the preset sampling rate to obtain multiple sample data.

The above-mentioned preset sampling rate can be set according to the actual situation, for example, set to 8kHz, 11.025kHz, 22.05kHz, 16kHz, 37.8kHz, 44.1kHz, 48kHz, 96kHz, 192kHz, etc.

Step 42: Obtain the amplitude value of each sampled data.

The amplitude value of the above-mentioned sampled data may be the audio amplitude of the sampled data, which may be obtained through a currently common audio amplitude extraction method.

Step 43: Sort the obtained amplitude values according to the sampling order of the multiple sampled data to obtain a decimal sequence.

In some embodiments of the present application, after obtaining the amplitude value of the sampling data, the amplitude value can be represented by a decimal number. Correspondingly, the amplitude values of each sampled data can be sorted according to the order in which each sampled data is obtained to obtain a decimal sequence.

Step 44: Convert the decimal sequence into a first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.

In some embodiments of the present application, in order to facilitate encoding and decoding, in the process of converting the decimal sequence into the first quaternary sequence, the amplitude value of each sampled data is represented by a quaternary number with the same number of digits. For example, assuming that a 4-digit quaternary number is used to represent an amplitude value, when the amplitude value of a certain sampled data is 2 in decimal, the decimal number 2 is converted into the quaternary number 0002 when the decimal sequence is converted into the first quaternary sequence.

For example, assuming that the audio is sampled using a sampling rate of 8kHz, and the amplitude values of the 16 sampled data obtained according to the sampling order are: 25, 76, 127, 255, 127, 76, 178, 204, 127, 153, 76, 51, 127, 153, 51, 153, the decimal sequence is 25 76 127 255 127 76 178 204 127 153 76 51 127 153 51 153. According to the requirement that each decimal number be represented by a 4-digit quaternary number, and based on the mapping relationship between decimal characters and quaternary characters, the first quaternary sequence obtained is:

0121103013333333133310302302303013332121103003031333212103032121.

After the first quaternary sequence is obtained, assuming that the value of M is 5, the value of N is 1, and the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G, the base sequence obtained using this mapping relationship is:

ATCTTAGATGGGGGGGTGGGTAGACGACGAGATGGGCTCTTAGAAGAGTGGGCTCTAGAGCTCT.

The final DNA sequence obtained by inserting the bases is:

ATCTTCAGATGAGGGGGAGTGGGATAGACAGACGACGATGGAGCTCTATAGAACGAGTGAGGCTCATAGAGACTCT.
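As a rough illustration of the audio example above, the following Python sketch converts the amplitude values into the first quaternary sequence, maps it to bases, and inserts one base after every five bases. All names are hypothetical. The rule used to choose each inserted base (the first of A, C, T that differs from both neighbouring bases) is an assumption inferred from the worked examples; this passage does not state how the inserted base is selected.

```python
QUAT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}  # mapping from the example
FILLER_ORDER = "ACT"  # ASSUMPTION: preference order inferred from the examples


def to_quaternary(value, width=4):
    """Fixed-width base-4 string; assumes 0 <= value < 4 ** width."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def insert_bases(bases, m=5, n=1):
    """Insert n bases after every m bases, except at the very end."""
    out = []
    for position, base in enumerate(bases, start=1):
        out.append(base)
        if position % m == 0 and position < len(bases):
            following = bases[position]
            for _ in range(n):
                # ASSUMPTION: choose a filler unlike both of its neighbours,
                # so the insertion never extends a single-base repeat.
                out.append(next(b for b in FILLER_ORDER
                                if b != out[-1] and b != following))
    return "".join(out)


amplitudes = [25, 76, 127, 255, 127, 76, 178, 204,
              127, 153, 76, 51, 127, 153, 51, 153]
quaternary = "".join(to_quaternary(a) for a in amplitudes)
bases = "".join(QUAT_TO_BASE[d] for d in quaternary)
dna = insert_bases(bases)
print(dna)
```

Under the assumed selection rule, the run GGGGGGG in the base sequence is broken into GGGGG, A, GG, which is the effect the interval insertion is intended to have.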

In a possible embodiment of the present application, after the amplitude value of each sampled data is obtained, instead of using decimal numbers to represent the amplitude values, several quaternary digits are directly used to represent each amplitude value, and the corresponding quaternary numbers are then sorted according to the sampling order of the sampled data to obtain the first quaternary sequence. For example, if a four-digit quaternary number is used to represent the amplitude value, the maximum amplitude value can be assigned 3333, and the quaternary numbers of the amplitude values of the other sampled data can be calculated proportionally, so as to obtain the first quaternary sequence.
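One possible reading of this proportional assignment is sketched below in Python. It is an assumption-laden illustration rather than the method of the application: the function names are hypothetical, amplitudes are assumed to be non-negative, and rounding to the nearest level is an arbitrary choice that the text does not specify.

```python
def to_quaternary(value, width=4):
    """Fixed-width base-4 string; assumes 0 <= value < 4 ** width."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def amplitudes_to_quaternary(amplitudes, width=4):
    """Scale amplitudes so that the largest one becomes 33...3 (width digits)."""
    top = 4 ** width - 1  # 255 for four digits, i.e. quaternary 3333
    peak = max(amplitudes)
    # ASSUMPTION: non-negative amplitudes, rounded to the nearest level.
    levels = [round(a / peak * top) if peak else 0 for a in amplitudes]
    return "".join(to_quaternary(level, width) for level in levels)


# Amplitudes in sampling order; 120 is the maximum and therefore maps to 3333.
print(amplitudes_to_quaternary([0, 30, 120]))
```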

When the original data to be stored is video, as shown in Figure 5, the above-mentioned first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the pictures. In the above step 11, the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:

Step 51: Extract the audio of the video and each frame of the video.

In some embodiments of the present application, currently common video and picture extraction methods may be used to extract the audio and each frame of the video.

Step 52: Process the extracted audio to obtain the second quaternary sequence corresponding to the audio.

In some embodiments of the present application, the specific implementation of processing the audio to obtain the corresponding second quaternary sequence (the implementation shown in Figure 4) has been described in detail above and will not be described again here.

Step 53: Process each extracted picture frame to obtain a fourth quaternary sequence corresponding to each extracted frame picture.

In some embodiments of the present application, the specific implementation method of processing the image to obtain the corresponding fourth quaternary sequence (the implementation method shown in Figure 3) has been described in detail above and will not be described again here. It should be noted that here, each frame of picture can be processed separately to obtain the fourth quaternary sequence corresponding to each frame of picture.

Step 54: According to the playback order of the extracted frame pictures in the video, sort the corresponding fourth quaternary sequences to obtain the third quaternary sequence corresponding to the pictures.

It should be noted that when the original data is a video, it is necessary to extract the audio and each frame picture of the video, and then process the audio and the pictures respectively to obtain the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the pictures.

That is, when a video is DNA encoded, the video is converted into the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the pictures, and these two quaternary sequences are then encoded into base sequences respectively. Finally, the DNA sequence of the audio (that is, obtained through the second quaternary sequence corresponding to the audio) and the DNA sequence of the picture (that is, obtained through the third quaternary sequence corresponding to the pictures) are obtained.

For example, suppose that after the audio of the video is extracted, the audio is sampled using a sampling rate of 8kHz, and the amplitude values of the five sampled data obtained according to the sampling order are: 25, 76, 127, 255, 127. The decimal sequence is then 25 76 127 255 127. According to the requirement that each decimal number be represented by a 4-digit quaternary number, and based on the mapping relationship between decimal characters and quaternary characters, the second quaternary sequence corresponding to the audio is:

01211030133333331333.

After the second quaternary sequence is obtained, assuming that the value of M is 5, the value of N is 1, and the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G, the base sequence obtained using this mapping relationship is:

ATCTTAGATGGGGGGGTGGG.

The final DNA sequence of the audio obtained by inserting the bases is:

ATCTTCAGATGAGGGGGAGTGGG.

In this example, the video includes 3 frames of pictures, and each frame of picture is composed of 4 pixels. After these 3 frames of pictures are extracted, they are processed respectively to obtain the fourth quaternary sequence corresponding to each frame of picture. For the first frame of picture, assume that the RGB values of the four pixels are: (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94). Sorting the RGB values of these four pixels according to the relative positions of the pixels in the picture, the resulting decimal sequence can be expressed as 2 100 23 45 254 78 99 65 109 68 126 94. According to the mapping relationship between decimal characters and quaternary characters, the converted quaternary numbers are 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132. It should be noted that since the maximum value in RGB mode is 255, which is 3333 after conversion to quaternary, in order to ensure that every number occupies the same number of digits, numbers with fewer than 4 digits are left-padded with 0 while the original relative order is maintained. The final fourth quaternary sequence obtained is:

000212100113023133321032120310011231101013321132.

In the same way, assume that the RGB values of the four pixels of the second frame of picture are: (34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255). Sorting the RGB values of these four pixels according to the relative positions of the pixels in the picture, the resulting decimal sequence can be expressed as 34 54 122 56 89 90 211 168 88 80 250 255. According to the mapping relationship between decimal characters and quaternary characters, the converted fourth quaternary sequence is:

020203121322032011211122310322201120110033223333.

Similarly, assume that the RGB values of the four pixels of the third frame of picture are: (120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221). Sorting the RGB values of these four pixels according to the relative positions of the pixels in the picture, the resulting decimal sequence can be expressed as 120 70 92 126 21 24 127 75 66 185 5 221. According to the mapping relationship between decimal characters and quaternary characters, the converted fourth quaternary sequence is:

132010121130133201110120133310231002232100113131.

After the fourth quaternary sequences corresponding to the 3 frames of pictures are obtained, they are sorted according to the playback order of the 3 frames of pictures in the video, and the third quaternary sequence corresponding to the pictures is obtained:

000212100113023133321032120310011231101013321132020203121322032011211122310322201120110033223333132010121130133201110120133310231002232100113131.

After the third quaternary sequence is obtained, assuming that the value of M is 5, the value of N is 1, and the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G, the base sequence obtained using this mapping relationship is:

AAACTCTAATTGACGTGGGCTAGCTCAGTAATTCGTTATATGGCTTGCACACAGTCTGCCAGCATTCTTTCCGTAGCCCATTCATTAAGGCCGGGGTGCATATCTTGATGGCATTTATCATGGGTACGTAACCGCTAATTGTGT, and finally the bases are inserted to obtain the DNA sequence of the picture:

AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGCACACACGTCTGACCAGCTATTCTATTCCGATAGCCACATTCTATTAACGGCCGAGGGTGCATCTTAGATGGACATTTCATCATAGGGTATCGTAATCCGCTCAATTGATGT.
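The per-frame conversion and the concatenation in playback order described in this example (steps 53 and 54) can be sketched as follows. The helper names are hypothetical, and each frame is assumed to be given as a list of RGB tuples already sorted by pixel position.

```python
def to_quaternary(value, width=4):
    """Fixed-width base-4 string; assumes 0 <= value < 4 ** width."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def frame_to_quaternary(pixels):
    """Fourth quaternary sequence of one frame (pixels sorted by position)."""
    return "".join(to_quaternary(c) for pixel in pixels for c in pixel)


# The three frames of the example, listed in playback order.
frames = [
    [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)],
    [(34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255)],
    [(120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221)],
]
fourth_sequences = [frame_to_quaternary(frame) for frame in frames]
third_sequence = "".join(fourth_sequences)  # concatenated in playback order
print(len(third_sequence))
print(third_sequence)
```

Because every frame here has the same number of pixels, each frame occupies 48 quaternary digits, so the frame boundaries in the third quaternary sequence can be recovered from the frame size alone.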

The DNA decoding method provided by this application will be exemplarily described below in conjunction with specific embodiments.

As shown in Figure 6, the embodiment of the present application provides a DNA decoding method, including the following steps:

Step 61: Determine the DNA sequence to be decoded.

When it is necessary to obtain original data from DNA that stores original data, the above-mentioned DNA needs to be sequenced first, and the DNA sequence must be read to determine the DNA sequence to be decoded. It should be noted that the specific implementation process of DNA sequencing can be implemented by referring to the current common methods, and will not be described again here. As mentioned above, if the original data stored is a video, two DNA sequences need to be used for storage, and if the original data stored is text, picture, or audio, only one DNA sequence needs to be used for storage. Therefore, during decoding, the number of DNA sequences to be decoded may be one or multiple.

Step 62: Obtain the base sequence based on the DNA sequence.

In one example, after determining the DNA sequence to be decoded, the DNA sequence can be directly used as the base sequence.

In another embodiment of the present application, bases are inserted into the base sequence during DNA encoding in order to avoid disordered sequencing results. Therefore, during decoding, the inserted bases in the DNA sequence need to be removed to obtain the base sequence.

Corresponding to the insertion of bases, in some embodiments of the present application, the base sequence can be obtained by removing N bases at intervals of M bases in the DNA sequence, that is, by removing the N bases that follow every M bases.

The above M and N are both integers greater than 0, and the specific values of M and N can be set according to the actual situation.
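The removal of inserted bases can be sketched in Python as below. The function name `remove_inserted_bases` is hypothetical, and the sketch assumes the layout seen in the worked examples of this description: N inserted bases follow each complete group of M original bases, with nothing inserted after the final group.

```python
def remove_inserted_bases(dna, m=5, n=1):
    """Keep the first m bases of every (m + n)-base period and drop the rest."""
    period = m + n
    return "".join(base for index, base in enumerate(dna) if index % period < m)


# DNA sequence of the picture example (M = 5, N = 1).
dna = "AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGC"
bases = remove_inserted_bases(dna)
print(bases)
```

No knowledge of which base was inserted is needed for decoding; only M and N must match the values used during encoding.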

Step 63: Decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence.

The above-mentioned quaternary characters include 0, 1, 2, and 3, and the above-mentioned bases include the four natural bases A, T, C, and G. In some embodiments of this application, the four bases A, T, C, and G are mapped one-to-one to the characters 0, 1, 2, and 3. There are 24 (that is, 4! = 4 × 3 × 2 × 1) such mapping relationships between quaternary characters and bases, and the above-mentioned preset mapping relationship between quaternary characters and bases can be any one of these 24 mapping relationships.
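The count of 24 one-to-one mappings, and the decoding of a base sequence under one chosen mapping, can be checked with a short Python sketch. The names `all_mappings` and `decode_bases` are hypothetical.

```python
from itertools import permutations

DIGITS = "0123"
BASES = "ATCG"

# Every one-to-one assignment of the four bases to the four quaternary characters.
all_mappings = [dict(zip(DIGITS, order)) for order in permutations(BASES)]
print(len(all_mappings))


def decode_bases(bases, quat_to_base):
    """Invert a quaternary-to-base mapping and apply it base by base."""
    base_to_quat = {base: digit for digit, base in quat_to_base.items()}
    return "".join(base_to_quat[base] for base in bases)


# The first mapping generated is 0-A, 1-T, 2-C, 3-G, as used in the examples.
example_mapping = all_mappings[0]
print(decode_bases("AAACTCTA", example_mapping))
```

Decoding is a single table lookup per base, which is the reason no further computation is required in step 63.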

In some embodiments of the present application, in the process of decoding the base sequence into the first quaternary sequence, there is no need to go through complex operations. Instead, the base sequence is decoded into the first quaternary sequence through the one-to-one mapping relationship between quaternary characters and bases, thereby greatly reducing the algorithm complexity and improving the decoding speed.

Step 64: Convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.

In some embodiments of the present application, when converting the first quaternary sequence into original data, the specific conversion methods of the original data in different formats are different, and the specific conversion process will be explained in detail later.

It should be noted that when the number of DNA sequences to be decoded is multiple, steps 62 to 64 need to be performed for each DNA sequence to be decoded to obtain original data corresponding to each DNA sequence to be decoded.

It is worth mentioning that, corresponding to the improvement of the encoding speed, no complex operation is required in the process of decoding the base sequence into the first quaternary sequence. Instead, the base sequence is decoded into the first quaternary sequence based on the one-to-one mapping relationship between quaternary characters and bases, thereby reducing the algorithm complexity and improving the decoding speed.

The specific implementation manner of the above step 64 will be exemplarily described below with reference to specific embodiments.

When the original data stored in the DNA sequence is text, the specific implementation method of the above step 64, in which the first quaternary sequence is converted to obtain the original data corresponding to the DNA sequence, includes the following steps: according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters, convert the first quaternary sequence into a coding sequence, and convert the coding sequence into text according to the preset character encoding table.

The above-mentioned coded characters are the characters of the coding sequence obtained by using the preset character encoding table when the text was DNA encoded. Therefore, when decoding, the mapping relationship between the coded characters and quaternary characters can be used to convert the first quaternary sequence into a coding sequence, which is further converted into text using the preset character encoding table.

Corresponding to DNA encoding, in some embodiments of the present application, the above-mentioned encoding sequence may be an information sequence represented by hexadecimal, and accordingly, the above-mentioned encoding characters may be hexadecimal characters.
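A minimal Python sketch of this text decoding path is given below. It rests on two assumptions that this passage does not confirm: that each hexadecimal character corresponds to two quaternary digits by numeric value (for example, hexadecimal e corresponds to quaternary 32), and that the preset character encoding table is UTF-8. The function name is hypothetical.

```python
def quaternary_to_text(quaternary, encoding="utf-8"):
    """Quaternary sequence -> hexadecimal coding sequence -> text."""
    # ASSUMPTION: two quaternary digits encode one hexadecimal character.
    hex_chars = [format(int(quaternary[i:i + 2], 4), "x")
                 for i in range(0, len(quaternary), 2)]
    coding_sequence = "".join(hex_chars)
    # ASSUMPTION: the preset character encoding table is UTF-8.
    return bytes.fromhex(coding_sequence).decode(encoding)


# 10 10 10 32 10 01 -> hexadecimal 44 4e 41 -> "DNA"
print(quaternary_to_text("101010321001"))
```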

When the original data stored in the DNA sequence is a picture, the specific implementation method of the above step 64, in which the first quaternary sequence is converted to obtain the original data corresponding to the DNA sequence, includes the following steps: first, the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters; then the RGB values of multiple pixels are determined based on the decimal sequence; finally, the picture is generated based on the RGB values of the multiple pixels and the preset arrangement order of pixels.

Corresponding to DNA encoding, in some embodiments of the present application, in the process of converting the first quaternary sequence into a decimal sequence, every four-digit quaternary number can be converted into one decimal number. For example, if the first quaternary sequence is 0002 1210 0113 0231 3332 1032 1203 1001 1231 1010 1332 1132, then the resulting decimal sequence is 2 100 23 45 254 78 99 65 109 68 126 94. In the process of determining the RGB values of multiple pixels, every three decimal numbers in the decimal sequence can be used as the RGB value of one pixel. For example, if the decimal sequence is 2 100 23 45 254 78 99 65 109 68 126 94, then the RGB values of the multiple pixels obtained are (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94). In the process of generating the picture, the RGB value of each pixel can be placed according to the relative position of each pixel in the picture to obtain the picture.
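The two grouping steps described above (four quaternary digits per decimal number, three decimal numbers per pixel) can be sketched in Python as follows. The function name is hypothetical, and placing the pixels at their positions in the picture is left out because the arrangement order is a preset that this passage does not fix.

```python
def quaternary_to_pixels(quaternary, width=4):
    """Split into width-digit quaternary numbers, then into RGB triples."""
    decimal_sequence = [int(quaternary[i:i + width], 4)
                        for i in range(0, len(quaternary), width)]
    pixels = [tuple(decimal_sequence[i:i + 3])
              for i in range(0, len(decimal_sequence), 3)]
    return decimal_sequence, pixels


decimal_sequence, pixels = quaternary_to_pixels(
    "000212100113023133321032120310011231101013321132")
print(decimal_sequence)
print(pixels)
```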

When the original data stored in the DNA sequence is audio, the specific implementation method of the above step 64, in which the first quaternary sequence is converted to obtain the original data corresponding to the DNA sequence, includes the following steps: first, the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters; then the amplitude values of multiple sampled data are determined based on the decimal sequence; then the total duration of the audio stored in the DNA sequence is determined based on the preset sampling rate and the number of sampled data; finally, the determined amplitude values are evenly distributed over the total duration to obtain the audio.

Corresponding to DNA encoding, in the process of converting the first quaternary sequence into a decimal sequence, every T-digit quaternary number is converted into one decimal number, where T is the number of quaternary digits used to represent each decimal number when the decimal sequence was converted into the first quaternary sequence during DNA encoding. In the process of determining the amplitude values of multiple sampled data, each decimal number in the decimal sequence can be used as the amplitude value of one sampled data, and the amplitude values are sorted according to the order of the corresponding decimal numbers in the decimal sequence. In the process of determining the total duration of the audio, the ratio of the number of amplitude values (that is, the number of sampled data) to the preset sampling rate can be used as the total duration of the audio. In the process of finally obtaining the audio, the sorted amplitude values can be evenly distributed over the total duration to obtain the audio.
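The audio decoding steps above can be sketched in Python as below. The function name is hypothetical, and "evenly distributed over the total duration" is interpreted here, as an assumption, as placing sample k at time k divided by the sampling rate.

```python
def decode_audio(quaternary, sampling_rate_hz, t=4):
    """Return amplitudes, total duration in seconds, and sample times."""
    amplitudes = [int(quaternary[i:i + t], 4)
                  for i in range(0, len(quaternary), t)]
    duration_s = len(amplitudes) / sampling_rate_hz
    # ASSUMPTION: sample k is placed at k / sampling_rate seconds.
    times_s = [k / sampling_rate_hz for k in range(len(amplitudes))]
    return amplitudes, duration_s, times_s


amplitudes, duration_s, times_s = decode_audio(
    "0121103013333333133310302302303013332121103003031333212103032121",
    sampling_rate_hz=8000)
print(amplitudes)
print(duration_s)
```

With 16 amplitude values and a sampling rate of 8kHz, the total duration is 16 / 8000 seconds, that is, 2 milliseconds.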

When the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the picture, it indicates that the DNA sequence currently to be decoded is the DNA sequence corresponding to a video. Specifically, the original data stored in the DNA sequence of the audio is the audio of the video, and the original data stored in the DNA sequence of the picture is the multi-frame pictures of the video. It should be noted that when steps 62 to 63 are performed for the DNA sequence of the audio, the first quaternary sequence obtained is the second quaternary sequence corresponding to the DNA sequence of the audio; when steps 62 to 63 are performed for the DNA sequence of the picture, the first quaternary sequence obtained is the third quaternary sequence corresponding to the DNA sequence of the picture. Correspondingly, in the above step 64, the specific implementation method of converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes the following steps: converting the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio, and converting the third quaternary sequence corresponding to the DNA sequence of the picture to obtain the multi-frame pictures.

In some embodiments of the present application, the specific implementation method of converting the second quaternary sequence corresponding to the DNA sequence of the audio (that is, the DNA sequence whose stored original data is audio) to obtain the audio has been explained in detail above and will not be described again here.

The specific implementation method of converting the third quaternary sequence corresponding to the DNA sequence of the picture to obtain the multi-frame pictures is similar to the specific implementation method, described above, of converting the first quaternary sequence corresponding to a DNA sequence whose stored original data is a picture to obtain the picture. The difference is that, in the process of generating pictures, processing the RGB value of each pixel according to the relative position of each pixel in its picture yields multiple frames of pictures. It should be noted that since the pixels corresponding to each frame of picture and the relative position of each pixel in each frame of picture are recorded during DNA encoding, the multi-frame pictures can be decoded at this time.

When the DNA sequence currently to be decoded is the DNA sequence corresponding to a video, after the first quaternary sequence is converted to obtain the original data corresponding to the DNA sequence, the method also includes: synthesizing the multi-frame pictures and the audio to obtain the video. Specifically, the total duration of the audio corresponding to the DNA sequence of the audio can be determined first; the decoded multi-frame pictures can then be sorted according to the order in which the RGB values of their pixels appear in the decimal sequence; the sorted multi-frame pictures are then evenly distributed over the total duration; and the video is obtained by combining the audio and the pictures.
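The even distribution of the decoded frames over the audio duration can be illustrated with the following Python sketch. The function name `frame_schedule` and the sample count in the usage line are hypothetical; the sketch assumes that every frame is displayed for the same length of time.

```python
def frame_schedule(num_frames, num_samples, sampling_rate_hz):
    """Return (start, end) display times in seconds for each frame."""
    total_duration_s = num_samples / sampling_rate_hz
    # ASSUMPTION: every frame is shown for an equal share of the duration.
    per_frame_s = total_duration_s / num_frames
    return [(k * per_frame_s, (k + 1) * per_frame_s)
            for k in range(num_frames)]


# Hypothetical case: 3 decoded frames and 24000 audio samples at 8 kHz.
print(frame_schedule(3, 24000, 8000))
```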

In summary, the DNA encoding and decoding method provided by the embodiment of the present application has the following effects:

First, in the process of encoding a quaternary sequence into a base sequence, there is no need to go through complex operations. Instead, the quaternary sequence is directly encoded into a base sequence based on the mapping relationship between quaternary characters and bases, thereby reducing the algorithm complexity and improving the encoding speed;

Second, when controlling the number of single-base repeats, there is no need to go through complex calculations (such as conditional filtering, etc.). Instead, the number of single-base repeats is controlled by inserting bases at intervals, thereby reducing the complexity of the algorithm and further improving the encoding speed;

Third, DNA storage of text, pictures, audio, and video can be achieved.

The following is an exemplary description of the DNA encoding device, DNA decoding device, terminal equipment, storage media and products provided by the present application in conjunction with specific embodiments.

As shown in Figure 7, an embodiment of the present application provides a DNA encoding device. The DNA encoding device 700 includes:

The first conversion module 701 is used to convert the original data to be stored into a first quaternary sequence;

The encoding module 702 is used to encode and convert the first quaternary sequence to obtain a base sequence according to the preset mapping relationship between quaternary characters and bases;

The generation module 703 is used to obtain a DNA sequence storing original data based on the base sequence.

Optionally, the first conversion module 701 includes:

RGB value;

Optionally, the first conversion module 701 includes:

Three-quaternary sequence; the first conversion module 701 includes:

Optionally, the generation module 703 includes:

It should be noted that the information interaction, execution process, etc. between the modules/units of the above-mentioned DNA encoding device are based on the same concept as the DNA encoding method in the method embodiments of the present application. For details, please refer to the method embodiments section, which will not be described again here.

As shown in Figure 8, an embodiment of the present application provides a DNA decoding device. The DNA decoding device 800 includes:

Determination module 801, used to determine the DNA sequence to be decoded;

The processing module 802 is used to obtain the base sequence according to the DNA sequence;

The decoding module 803 is used to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence;

The second conversion module 804 is used to convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.

Optionally, the second conversion module 804 includes:

The second conversion module 804 includes:

The DNA decoding device 800 also includes:

Optionally, the processing module 802 is specifically configured to remove N bases at intervals of M bases in the DNA sequence to obtain the base sequence, where M and N are both integers greater than 0.

It should be noted that the information interaction, execution process, etc. between the modules/units of the above-mentioned DNA decoding device are based on the same concept as the DNA decoding method in the method embodiments of the present application. For details, please refer to the method embodiments section, which will not be described again here.
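One way to picture the division of the DNA decoding device 800 into modules is the following Python sketch. It is purely illustrative: the class `DnaDecoder`, its method names, and the helper `to_decimals` are hypothetical, the determination module 801 is represented simply by the caller supplying the DNA sequence, and the second conversion module 804 is passed in as a function so that different data formats can be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DnaDecoder:
    """Hypothetical grouping of modules 802 to 804 of the decoding device."""
    convert: Callable[[str], object]          # second conversion module 804
    m: int = 5
    n: int = 1
    quat_to_base: Dict[str, str] = field(
        default_factory=lambda: {"0": "A", "1": "T", "2": "C", "3": "G"})

    def process(self, dna: str) -> str:       # processing module 802
        period = self.m + self.n
        return "".join(b for i, b in enumerate(dna) if i % period < self.m)

    def decode(self, bases: str) -> str:      # decoding module 803
        base_to_quat = {b: q for q, b in self.quat_to_base.items()}
        return "".join(base_to_quat[b] for b in bases)

    def run(self, dna: str):
        return self.convert(self.decode(self.process(dna)))


def to_decimals(quaternary: str, width: int = 4):
    """Example conversion: fixed-width quaternary numbers to decimal values."""
    return [int(quaternary[i:i + width], 4)
            for i in range(0, len(quaternary), width)]


decoder = DnaDecoder(convert=to_decimals)
print(decoder.run("ATCTTCAGATGAGGGGGAGTGGG"))
```

Applied to the audio DNA sequence of the video example, this yields the five amplitude values 25, 76, 127, 255, 127.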

Those skilled in the art can clearly understand that, for convenience and simplicity of description, only the division of the above functional units and modules is used as an example. In actual applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, please refer to the corresponding processes in the foregoing method embodiments, which will not be described again here.

As shown in Figure 9, an embodiment of the present application provides a terminal device. The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in Figure 9), a memory D101, and a computer program D102 stored in the memory D101 and executable on the at least one processor D100. When the processor D100 executes the computer program D102, the steps in any of the above method embodiments are implemented.

The processor D100 can be a central processing unit (CPU, Central Processing Unit). The processor D100 can also be another general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), an application specific integrated circuit (ASIC, Application Specific Integrated Circuit), a field-programmable gate array (FPGA, Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.

In some embodiments, the memory D101 may be an internal storage unit of the terminal device D10, such as a hard disk or memory of the terminal device D10. In other embodiments, the memory D101 may also be an external storage device of the terminal device D10, such as a plug-in hard disk, a smart media card (SMC, Smart Media Card), a secure digital (SD, Secure Digital) card, or a flash card (Flash Card) equipped on the terminal device D10. Further, the memory D101 may also include both an internal storage unit of the terminal device D10 and an external storage device. The memory D101 is used to store the operating system, application programs, the boot loader (Boot Loader), data and other programs, such as the program code of the computer program. The memory D101 can also be used to temporarily store data that has been output or will be output.

It should be noted that the information interaction, execution process, etc. between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application. For details of their specific functions and technical effects, please refer to the method embodiments section, which will not be described again here.

Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.

Embodiments of the present application provide a computer program product. When the computer program product is run on a terminal device, the terminal device implements the steps in each of the above method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, this application can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments may be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may at least include: any entity or device capable of carrying computer program code to a DNA encoding device/DNA decoding device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunications signals and software distribution media, for example, a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, subject to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.

In the above embodiments, each embodiment is described with its own emphasis. For parts that are not described in detail or recorded in a certain embodiment, refer to the relevant descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed devices/network devices and methods can be implemented in other ways. For example, the device/network device embodiments described above are only illustrative. The division of modules or units is only a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above-described embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of this application, and should all be included within the protection scope of this application.

Claims

A DNA encoding method, characterized by including:

Convert the original data to be stored into the first quaternary sequence;

According to the preset mapping relationship between quaternary characters and bases, code conversion is performed on the first quaternary sequence to obtain a base sequence;

A DNA sequence storing the original data is obtained based on the base sequence.
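
As a non-limiting illustration, the three steps above can be sketched in Python as follows. The mapping 0→A, 1→C, 2→G, 3→T and the byte-to-four-digit conversion are assumptions made for this example only; the claim requires only that some preset mapping between quaternary characters and bases exists. In this simplest case the base sequence is used directly as the DNA sequence.

```python
# Hypothetical preset mapping between quaternary characters and bases.
QUAT_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}


def bytes_to_quaternary(data: bytes) -> str:
    """Write each byte as four base-4 digits (an assumed conversion)."""
    digits = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            digits.append(str((byte >> shift) & 0b11))
    return "".join(digits)


def quaternary_to_bases(quaternary: str) -> str:
    """Replace every quaternary character with its mapped base."""
    return "".join(QUAT_TO_BASE[digit] for digit in quaternary)


def encode(data: bytes) -> str:
    """Original data -> first quaternary sequence -> base sequence."""
    return quaternary_to_bases(bytes_to_quaternary(data))


print(encode(b"A"))  # byte 65 is 1001 in base 4, which maps to CAAC
```

Because each quaternary character maps to exactly one base, the conversion is a table lookup and needs no XOR, random mapping, or filtering step.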
The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence includes:

When the original data to be stored is text, the text is encoded according to the preset character encoding table to obtain a coding sequence;

According to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table, the coding sequence is converted into a first quaternary sequence.
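
As a non-limiting illustration of the text case, the sketch below assumes that the preset character encoding table is UTF-8 and that each coded byte maps to four quaternary characters. Both choices are assumptions for the example; the claim does not fix a particular table.

```python
def text_to_quaternary(text: str, encoding: str = "utf-8") -> str:
    """Encode text with an assumed character table, then map to base 4."""
    coding_sequence = text.encode(encoding)  # step 1: coding sequence
    digits = []
    for byte in coding_sequence:             # step 2: coded value -> 4 digits
        digits.append(str((byte >> 6) & 3))
        digits.append(str((byte >> 4) & 3))
        digits.append(str((byte >> 2) & 3))
        digits.append(str(byte & 3))
    return "".join(digits)


print(text_to_quaternary("Hi"))  # 'H' = 72 -> 1020, 'i' = 105 -> 1221
```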
The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence includes:

When the original data to be stored is a picture, obtain the RGB value of each pixel in the picture;

Sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence;

The decimal sequence is converted into a first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
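
As a non-limiting illustration of the picture case, the sketch below assumes the picture is already available as rows of (R, G, B) tuples with 8-bit channels, that the preset pixel arrangement order is row by row from left to right, and that each decimal value is written as four quaternary characters. These are assumptions for the example, not requirements of the claim.

```python
def decimal_to_quaternary(value: int, width: int = 4) -> str:
    """Write a non-negative integer as a fixed-width base-4 string."""
    if not 0 <= value < 4 ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        digits.append(str(value % 4))
        value //= 4
    return "".join(reversed(digits))


def picture_to_quaternary(pixels) -> str:
    """Flatten RGB values in row-major order and convert them to base 4."""
    decimal_sequence = [
        channel for row in pixels for pixel in row for channel in pixel
    ]
    return "".join(decimal_to_quaternary(v) for v in decimal_sequence)


# One pixel with R=255, G=0, B=65.
print(picture_to_quaternary([[(255, 0, 65)]]))  # 3333 0000 1001, concatenated
```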
The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence includes:

When the original data to be stored is audio, the audio is sampled according to a preset sampling rate to obtain multiple sampled data;

Obtain the amplitude value of each sampled data;

Sort the obtained amplitude values according to the sampling order of the plurality of sampled data to obtain a decimal sequence;

The decimal sequence is converted into a first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
The method of claim 1, wherein the first quaternary sequence includes a second quaternary sequence corresponding to audio and a third quaternary sequence corresponding to pictures; and converting the original data to be stored into the first quaternary sequence includes:

When the original data to be stored is a video, extract the audio of the video and each frame of the video;

Process the extracted audio to obtain the second quaternary sequence corresponding to the audio;

Process each extracted picture frame to obtain the fourth quaternary sequence corresponding to each extracted frame picture;

Sort the fourth quaternary sequences corresponding to the extracted frames according to the playback order of each extracted frame in the video to obtain the third quaternary sequence corresponding to the pictures.
The method according to claim 1, characterized in that said obtaining the DNA sequence storing the original data according to the base sequence includes:

Insert N bases at every position separated by M bases in the base sequence to obtain a DNA sequence storing the original data;

Wherein, the base in the base sequence adjacent to the first of the N bases is different from that first base; the base in the base sequence adjacent to the Nth of the N bases is different from that Nth base; bases at adjacent positions among the N bases are different from each other; and M and N are both integers greater than 0.
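
As a non-limiting illustration, the insertion step can be sketched as follows. The function name, the rule of picking the first permitted base in the order A, C, G, T, and the choice not to insert after the final block are assumptions for this example. With four bases available, a base that differs from its neighbors can always be found.

```python
BASES = "ACGT"


def insert_spacer_bases(base_sequence: str, m: int, n: int) -> str:
    """Insert n bases after every m bases, each differing from its neighbors."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be integers greater than 0")
    out = []
    for start in range(0, len(base_sequence), m):
        out.extend(base_sequence[start:start + m])
        next_index = start + m
        if next_index >= len(base_sequence):
            break  # assumption: nothing is inserted after the final block
        following = base_sequence[next_index]
        for k in range(n):
            forbidden = {out[-1]}         # differ from the base just before
            if k == n - 1:
                forbidden.add(following)  # last one also differs from the next
            out.append(next(b for b in BASES if b not in forbidden))
    return "".join(out)


print(insert_spacer_bases("AAAA", 2, 1))  # AACAA
```

In this sketch the inserted bases only break repeats that would otherwise span a block boundary; repeats inside a block of M bases are left unchanged.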
A DNA decoding method, characterized by including:

Determine the DNA sequence to be decoded;

Obtain the base sequence according to the DNA sequence;

According to the preset mapping relationship between quaternary characters and bases, decode the base sequence to obtain the first quaternary sequence;

Convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.
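
As a non-limiting illustration, the decoding steps above can be sketched in Python as follows. The mapping A→0, C→1, G→2, T→3 and the grouping of four quaternary characters per byte are assumptions for this example, and the DNA sequence is assumed to contain no inserted bases, so it is used directly as the base sequence.

```python
# Hypothetical preset mapping between bases and quaternary characters.
BASE_TO_QUAT = {"A": "0", "C": "1", "G": "2", "T": "3"}


def bases_to_quaternary(base_sequence: str) -> str:
    """Replace every base with its mapped quaternary character."""
    return "".join(BASE_TO_QUAT[base] for base in base_sequence)


def quaternary_to_bytes(quaternary: str) -> bytes:
    """Read each group of four base-4 digits as one byte (assumed layout)."""
    if len(quaternary) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return bytes(
        int(quaternary[i:i + 4], 4) for i in range(0, len(quaternary), 4)
    )


def decode(dna_sequence: str) -> bytes:
    """DNA sequence -> first quaternary sequence -> original data."""
    return quaternary_to_bytes(bases_to_quaternary(dna_sequence))


print(decode("CAGACGGC"))  # b'Hi'
```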
The method according to claim 7, characterized in that said converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:

When the original data stored in the DNA sequence is text, the first quaternary sequence is converted into a coding sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table;

Convert the encoding sequence into text according to the preset character encoding table.
The method according to claim 7, characterized in that said converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:

When the original data stored in the DNA sequence is a picture, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;

Determine the RGB values of multiple pixels according to the decimal sequence;

A picture is generated according to the RGB values of the plurality of pixels and the preset arrangement order of the pixels.
The method according to claim 7, characterized in that said converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:

When the original data stored in the DNA sequence is audio, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;

Determine amplitude values of multiple sampled data according to the decimal sequence;

Determine the total duration of the audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data;

The determined amplitude values are evenly distributed over the total duration to obtain audio.
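
As a non-limiting illustration of the audio case, the sketch below assumes that each amplitude value was stored as four quaternary characters and that the preset sampling rate is given in samples per second. It computes the total duration as the number of samples divided by the sampling rate and places the amplitude values at evenly spaced times; writing an actual audio file is outside the scope of the example.

```python
def quaternary_to_decimal(quaternary: str, width: int = 4) -> list:
    """Split the sequence into fixed-width groups and read each in base 4."""
    if len(quaternary) % width != 0:
        raise ValueError("length must be a multiple of the group width")
    return [
        int(quaternary[i:i + width], 4)
        for i in range(0, len(quaternary), width)
    ]


def place_samples(amplitudes, sample_rate_hz: float):
    """Return the total duration and (time, amplitude) pairs spaced evenly."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    total_duration = len(amplitudes) / sample_rate_hz
    timeline = [(i / sample_rate_hz, a) for i, a in enumerate(amplitudes)]
    return total_duration, timeline


amplitudes = quaternary_to_decimal("10011020")   # [65, 72]
duration, timeline = place_samples(amplitudes, sample_rate_hz=2)
print(duration, timeline)
```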
The method of claim 7, wherein the DNA sequence to be decoded includes an audio DNA sequence and a picture DNA sequence, the first quaternary sequence includes a second quaternary sequence corresponding to the audio DNA sequence and a third quaternary sequence corresponding to the picture DNA sequence, the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multiple frames of pictures;

Converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:

Convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;

Convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain multiple frames of pictures;

After converting the first quaternary sequence to obtain original data corresponding to the DNA sequence, the method further includes:

The multi-frame pictures and the audio are synthesized to obtain a video.
The method according to claim 7, wherein obtaining the base sequence according to the DNA sequence includes:

In the DNA sequence, N bases are removed at every position separated by M bases to obtain the base sequence;

Wherein, M and N are both integers greater than 0.
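
As a non-limiting illustration, the removal step can be sketched as follows. The function name is hypothetical, and the sketch assumes that the encoder inserted N bases after each full block of M bases except the last, so the DNA sequence repeats with period M + N.

```python
def remove_spacer_bases(dna_sequence: str, m: int, n: int) -> str:
    """Keep the first m bases of every (m + n)-base period, drop the rest."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be integers greater than 0")
    period = m + n
    return "".join(
        dna_sequence[i:i + m] for i in range(0, len(dna_sequence), period)
    )


print(remove_spacer_bases("AACAA", 2, 1))  # AAAA
```

Because the positions of the inserted bases are fixed by M and N, removal needs no knowledge of which bases were inserted.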
A DNA encoding device, characterized by including:

The first conversion module is used to convert the original data to be stored into a first quaternary sequence;

An encoding module, configured to encode and convert the first quaternary sequence to obtain a base sequence according to the preset mapping relationship between quaternary characters and bases;

A generating module is used to obtain a DNA sequence storing the original data according to the base sequence.
A DNA decoding device, characterized by including:

Determination module, used to determine the DNA sequence to be decoded;

A processing module, used to obtain the base sequence according to the DNA sequence;

A decoding module, used to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence;

The second conversion module is used to convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.
A terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, the DNA encoding method according to any one of claims 1 to 6 is implemented, or when the processor executes the computer program, the DNA decoding method according to any one of claims 7 to 12 is implemented.
A computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, the DNA encoding method according to any one of claims 1 to 6 is implemented, or when the computer program is executed by a processor, the DNA decoding method according to any one of claims 7 to 12 is implemented.