WO2023173842A1 - DNA coding method and apparatus, DNA decoding method and apparatus, terminal device and medium

Publication number
WO2023173842A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
quaternary
dna
original data
audio
Prior art date
Application number
PCT/CN2022/138143
Other languages
French (fr)
Chinese (zh)
Inventor
戴俊彪
强薇
黄小罗
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2023173842A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/50 Compression of genetic data

Definitions

  • This application belongs to the field of data storage technology, and in particular relates to a DNA encoding method, decoding method, device, terminal equipment and medium.
  • DNA Deoxyribonucleic acid
  • the existing DNA encoding method is based on silicon-based 01 binary storage, converting the information to be stored into 01 binary numbers, and then further encoding it into a DNA sequence. Since the problem of single base repetition needs to be taken into account during the encoding process, binary numbers need to undergo a series of complex operations (such as XOR operations, random function mapping, conditional filtering, etc.) before they can be encoded into DNA sequences. However, because complex operations take a long time, the encoding speed is not ideal.
  • Embodiments of the present application provide a DNA encoding method, decoding method, device, terminal equipment and medium, which can solve the problem of unsatisfactory DNA encoding speed.
  • embodiments of the present application provide a DNA encoding method, including:
  • the first quaternary sequence is encoded and converted to obtain the base sequence
  • the DNA sequence storing the original data is obtained.
  • convert the original data to be stored into the first quaternary sequence including:
  • the text is encoded according to the preset character encoding table to obtain a coding sequence
  • the coding sequence is converted into a first quaternary sequence.
  • convert the original data to be stored into the first quaternary sequence including:
  • convert the original data to be stored into the first quaternary sequence including:
  • the audio is sampled according to the preset sampling rate to obtain multiple sample data
  • the first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture.
  • convert the original data to be stored into the first quaternary sequence, including:
  • the original data to be stored is a video
  • the fourth quaternary sequences corresponding to each frame picture are sorted
  • obtain the DNA sequence storing the original data based on the base sequence including:
  • the first of the N inserted bases differs from the base adjacent to it in the base sequence; the Nth of the N inserted bases differs from the base adjacent to it in the base sequence; bases at adjacent positions among the N inserted bases differ from each other; and M and N are both integers greater than 0.
  • embodiments of the present application provide a DNA decoding method, including:
  • the base sequence is decoded to obtain the first quaternary sequence
  • convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence including:
  • the first quaternary sequence is converted into a coding sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table;
  • convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence including:
  • the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
  • convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence including:
  • the first quaternary sequence is converted into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
  • the determined amplitude values are evenly distributed over the total duration to obtain the audio.
  • the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the picture.
  • the first quaternary sequence includes the second quaternary sequence corresponding to the DNA sequence of the audio and the third quaternary sequence corresponding to the DNA sequence of the picture.
  • the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multi-frame pictures;
  • Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence including:
  • After converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, the method also includes:
  • obtain the base sequence based on the DNA sequence including:
  • N bases are removed at every M base position to obtain the base sequence
  • M and N are both integers greater than 0.
  • a DNA encoding device including:
  • the first conversion module is used to convert the original data to be stored into a first quaternary sequence
  • the encoding module is used to encode and convert the first quaternary sequence to obtain the base sequence according to the preset mapping relationship between quaternary characters and bases;
  • the generation module is used to obtain the DNA sequence storing the original data based on the base sequence.
  • the first conversion module includes:
  • the encoding unit is used to encode the text according to the preset character encoding table to obtain a coding sequence when the original data to be stored is text;
  • the first conversion unit is used to convert the coding sequence into a first quaternary sequence according to the mapping relationship between the coded characters and the quaternary characters in the preset character coding table.
  • the first conversion module includes:
  • the first acquisition unit is used to acquire each pixel in the image when the original data to be stored is an image.
  • the first sorting unit is used to sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence
  • the second conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the first conversion module includes:
  • the sampling unit is used to sample the audio according to the preset sampling rate to obtain multiple sampled data when the original data to be stored is audio;
  • the second acquisition unit is used to acquire the amplitude value of each sampled data
  • the second sorting unit is used to sort the obtained amplitude values according to the sampling order of multiple sampled data to obtain a decimal sequence
  • the third conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture.
  • the first conversion module includes:
  • An extraction unit used to extract the audio of the video and each frame of the video when the original data to be stored is a video
  • the first processing unit is used to process the extracted audio to obtain the second quaternary sequence corresponding to the audio;
  • the second processing unit is used to process each extracted picture frame and obtain the fourth quaternary sequence corresponding to each extracted frame picture;
  • the third sorting unit is used to sort the fourth quaternary sequences corresponding to the extracted frame pictures according to the playback order of those frame pictures in the video, to obtain the third quaternary sequence corresponding to the picture.
  • generated modules include:
  • the generation unit is used to insert N bases at every M base position in the base sequence to obtain a DNA sequence storing original data.
  • the first of the N inserted bases differs from the base adjacent to it in the base sequence; the Nth of the N inserted bases differs from the base adjacent to it in the base sequence; bases at adjacent positions among the N inserted bases differ from each other; and M and N are both integers greater than 0.
  • a DNA decoding device including:
  • Determination module used to determine the DNA sequence to be decoded
  • the processing module is used to obtain the base sequence based on the DNA sequence
  • the decoding module is used to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence;
  • the second conversion module is used to convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.
  • the second conversion module includes:
  • the fourth conversion unit is used to convert the first quaternary sequence into coding sequence
  • the fifth conversion unit is used to convert the encoding sequence into text according to the preset character encoding table.
  • the second conversion module includes:
  • the sixth conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is a picture;
  • the first determination unit is used to determine the RGB values of multiple pixels according to the decimal sequence
  • the generation unit is used to generate pictures based on the RGB values of multiple pixels and the preset arrangement order of pixels.
  • the second conversion module includes:
  • the seventh conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is audio;
  • the second determination unit is used to determine the amplitude values of multiple sampled data according to the decimal sequence
  • the third determination unit is used to determine the total duration of the audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data;
  • the distribution unit is used to evenly distribute the determined amplitude values over the total duration to obtain audio.
  • the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the picture.
  • the first quaternary sequence includes the second quaternary sequence corresponding to the DNA sequence of the audio and the third quaternary sequence corresponding to the DNA sequence of the picture.
  • the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multi-frame pictures;
  • the second conversion module includes:
  • the eighth conversion unit is used to convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;
  • the ninth conversion unit is used to convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain multiple frames of pictures;
  • the DNA decoding device also includes:
  • the synthesis module is used to synthesize multiple frames of pictures and audio to obtain a video.
  • optionally, the processing module is specifically used to remove N bases at every M base position in the DNA sequence to obtain the base sequence, where M and N are both integers greater than 0.
  • embodiments of the present application provide a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the above-mentioned DNA encoding method or DNA decoding method is implemented.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • when the computer program is executed by a processor, the above-mentioned DNA encoding method or DNA decoding method is implemented.
  • embodiments of the present application provide a computer program product, which when the computer program product is run on a terminal device, causes the terminal device to execute the above-mentioned DNA encoding method or DNA decoding method.
  • the DNA sequence storing the original data is obtained based on the base sequence.
  • the first quaternary sequence is directly encoded as a base sequence, thereby reducing algorithm complexity and improving coding speed.
  • Figure 1 is a flow chart of a DNA encoding method provided by an embodiment of the present application.
  • Figure 2 is a flow chart of steps for converting text into a first quaternary sequence provided by an embodiment of the present application
  • Figure 3 is a flow chart of steps for converting a picture into a first quaternary sequence according to an embodiment of the present application
  • Figure 4 is a flow chart of steps for converting audio into a first quaternary sequence provided by an embodiment of the present application
  • Figure 5 is a flow chart of steps for converting video into a first quaternary sequence provided by an embodiment of the present application
  • Figure 6 is a flow chart of a DNA decoding method provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a DNA encoding device provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a DNA decoding device provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the term “if” may be interpreted as “when” or “once” or “in response to determining” or “in response to detecting”, depending on the context. Similarly, the phrase “if determined” or “if [the described condition or event] is detected” may be interpreted, depending on the context, to mean “once determined” or “in response to a determination” or “once [the described condition or event] is detected” or “in response to detection of [the described condition or event]”.
  • the information to be stored needs to be converted into a 01 binary number, and then further encoded into a DNA sequence.
  • because binary numbers need to undergo a series of complex, time-consuming operations before they can be encoded into DNA sequences, the DNA encoding speed is not ideal.
  • embodiments of the present application convert the information to be stored into a first quaternary sequence, then use the mapping relationship between quaternary characters and bases to directly encode the first quaternary sequence into a base sequence, and obtain the DNA sequence based on the base sequence.
  • the first quaternary sequence is directly encoded as a base sequence, thereby reducing algorithm complexity and improving coding speed.
  • the embodiment of the present application provides a DNA encoding method, including the following steps:
  • Step 11 Convert the original data to be stored into the first quaternary sequence.
  • the format of the above raw data can be text, picture, audio, video and other formats. And when converting the original data into the first quaternary sequence, the specific conversion methods of the original data in different formats are different. The specific conversion process will be explained in detail later.
  • Step 12 According to the preset mapping relationship between quaternary characters and bases, the first quaternary sequence is code-converted to obtain the base sequence.
  • the above-mentioned quaternary characters include 0, 1, 2, and 3, and the above-mentioned bases include the four natural bases: adenine (A), thymine (T), cytosine (C), and guanine (G). If the four bases A, T, C, and G are mapped one-to-one to the digits 0, 1, 2, and 3, there are 4! = 24 possible mapping relationships between quaternary characters and bases, and the above-mentioned preset mapping relationship can be any one of these 24.
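The count of 24 mappings can be checked by enumerating every one-to-one assignment of the four bases to the four quaternary digits. The following is a minimal sketch; the names `QUATERNARY_DIGITS`, `BASES`, and `all_mappings` are illustrative and do not come from the application.

```python
from itertools import permutations

QUATERNARY_DIGITS = "0123"
BASES = "ATCG"

# Each permutation of the bases gives one one-to-one digit -> base mapping.
all_mappings = [dict(zip(QUATERNARY_DIGITS, p)) for p in permutations(BASES)]

print(len(all_mappings))  # 24
print(all_mappings[0])    # {'0': 'A', '1': 'T', '2': 'C', '3': 'G'}
```

The first mapping produced happens to be the 0-A, 1-T, 2-C, 3-G assignment used in the worked examples.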
  • Step 13 Obtain the DNA sequence storing the original data based on the base sequence.
  • after obtaining the base sequence corresponding to the original data, the base sequence can be used directly as the DNA sequence storing the original data, and the subsequent DNA synthesis process can be performed to obtain and store the DNA holding the original data. It should be noted that the specific implementation process of synthesizing DNA can be implemented by referring to the current common methods, and will not be described again here.
  • the original data to be stored is converted into a first quaternary sequence, the first quaternary sequence is then directly encoded as a base sequence according to the one-to-one mapping relationship between quaternary characters and bases, and the DNA sequence is obtained based on the base sequence.
  • the first quaternary sequence is directly encoded as a base sequence, thereby reducing algorithm complexity and improving coding speed.
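Step 12 amounts to a per-character table lookup. The sketch below assumes the 0-A, 1-T, 2-C, 3-G mapping that appears in the worked examples; any of the 24 one-to-one mappings could be passed instead. The function name `quaternary_to_bases` is illustrative.

```python
# Assumed preset mapping (one of the 24 possible): 0-A, 1-T, 2-C, 3-G.
DIGIT_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}

def quaternary_to_bases(quaternary: str, mapping=DIGIT_TO_BASE) -> str:
    """Encode a string of quaternary digits as a base sequence, one base per digit."""
    return "".join(mapping[digit] for digit in quaternary)

print(quaternary_to_bases("0002121001130231"))  # AAACTCTAATTGACGT
```

Because each digit is looked up independently, the encoding runs in time linear in the length of the quaternary sequence.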
  • the above-mentioned DNA needs to be sequenced first, the DNA sequence is read, and then the DNA sequence is decoded to obtain the original data.
  • in a continuous run of A or T, it is difficult for the polymerase to recognize each individual A or T, so the polymerization reaction of the sequence following the run is disturbed, which causes the sequencing results to be disordered and peaks to appear.
  • in order to avoid disordered sequencing results, the DNA sequence can be obtained by inserting bases into the base sequence to control the number of single-base repetitions in the DNA sequence.
  • the DNA sequence storing the original data can be obtained by inserting N bases at positions every M bases in the base sequence.
  • the first of the N inserted bases differs from the base adjacent to it in the base sequence; the Nth of the N inserted bases differs from the base adjacent to it in the base sequence; and bases at adjacent positions among the N inserted bases differ from each other.
  • M and N are both integers greater than 0, and the specific values of M and N can be set according to the actual situation.
  • the value of M can be set to 6.
  • the number percentage of G and C in the DNA sequence is 40% to 60%, so as to achieve the highest sequencing efficiency.
  • the number of single-base repeats in the DNA sequence obtained by inserting N bases at every interval of M bases does not exceed M, thereby effectively controlling the number of single-base repeats and avoiding disordered sequencing results.
  • the DNA encoding method in the embodiment of the present application does not require complex calculation processing (such as conditional filtering); instead, it controls the number of single-base repeats by inserting bases at intervals, thus reducing the algorithm complexity and further improving the encoding speed.
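The interval insertion can be sketched as follows for the simplest case N = 1. This is an illustration under stated assumptions, not the application's implementation: the spacer is placed after every M data bases, only between two chunks, and is chosen as the first base in "ATCG" that differs from both neighbours. The sketch does not try to balance GC content, and `insert_spacers` and `longest_run` are hypothetical names.

```python
from itertools import groupby

BASES = "ATCG"

def insert_spacers(base_seq: str, m: int = 5) -> str:
    """Insert one spacer base (N = 1) after every m bases of base_seq.

    The spacer differs from the base before it and the base after it,
    so no single-base run can continue across a spacer.
    """
    out = []
    for i in range(0, len(base_seq), m):
        chunk = base_seq[i:i + m]
        out.append(chunk)
        following = base_seq[i + m:i + m + 1]  # empty string at the end
        if following:  # insert only between two chunks
            spacer = next(b for b in BASES if b != chunk[-1] and b != following)
            out.append(spacer)
    return "".join(out)

def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

dna = insert_spacers("AAAAAAAAAAAA", m=5)
print(dna)               # AAAAATAAAAATAA
print(longest_run(dna))  # 5
```

With this layout a run of identical bases can never be longer than M, which matches the bound stated above.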
  • the original data to be stored can be converted into a quaternary sequence according to the format of the original data.
  • the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:
  • Step 21 Encode the text according to the preset character encoding table to obtain an encoding sequence.
  • the above-mentioned preset character encoding table is an encoding table for encoding text, such as the Unicode, UTF-8, ASCII, ISO8859-1, GB2312, or GBK encoding table. It can be understood that in some embodiments of the present application, the specific form of the above-mentioned preset character encoding table is not limited.
  • Step 22 Convert the encoding sequence into a first quaternary sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table.
  • the above-mentioned coded characters are the characters of the base number corresponding to the coded sequence.
  • in order to quickly convert the encoding sequence into the first quaternary sequence, the above-mentioned encoding sequence may be an information sequence represented in hexadecimal.
  • in that case, the above-mentioned coded characters are hexadecimal characters.
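Steps 21 and 22 can be sketched as below. Two points are assumptions rather than statements from the application: the text is encoded as UTF-16 big-endian so that each character becomes four hexadecimal digits, and each hexadecimal digit is mapped to exactly two quaternary digits (since 16 = 4²). The actual table used in the application is not shown in this text. `text_to_hex` and `hex_to_quaternary` are illustrative names.

```python
def text_to_hex(text: str) -> str:
    """Encode text as hexadecimal code units (assumed: UTF-16 big-endian)."""
    return text.encode("utf-16-be").hex()

def hex_to_quaternary(hex_seq: str) -> str:
    """Map every hexadecimal digit to two quaternary digits (assumed mapping)."""
    out = []
    for h in hex_seq:
        value = int(h, 16)                    # 0..15
        out.append(f"{value // 4}{value % 4}")  # high digit, low digit
    return "".join(out)

hex_seq = text_to_hex("hello world!")
print(hex_seq)                     # 00680065006c006c006f00200077006f0072006c00640021
print(hex_to_quaternary("0068"))   # 00001220
```

A fixed two-digit expansion keeps the conversion a pure lookup, so decoding can split the quaternary sequence into pairs without any delimiter.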
  • the default character encoding table is the Unicode encoding table
  • the value of M is 5
  • the value of N is 1
  • the mapping relationship between quaternary characters and bases is: 0-A, 1-T, 2-C, 3-G.
  • the encoding sequence obtained by encoding using the Unicode encoding table is: 00680065006c006c006f00200077006f0072006c00640021, and then according to the mapping relationship between the encoded characters and quaternary characters of the Unicode encoding table , the obtained first quaternary sequence is:
  • the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:
  • Step 31 Obtain the RGB value of each pixel in the picture.
  • the RGB values can be extracted from the picture using a currently common RGB value extraction method.
  • Step 32 Sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence.
  • the above-mentioned preset arrangement order of pixel points may be set in advance according to the relative position of each pixel point in the picture. For example, when the picture consists of 4 pixels, the 4 pixels can be numbered 0, 1, 2, and 3 according to their positions in the picture, and the arrangement order of each pixel can be determined to be 0, 1, 2, 3. , then when sorting the RGB values of these four pixels, the RGB values of each pixel can be sorted according to the numbering sequence (that is, the above-mentioned sequence 0, 1, 2, 3).
  • Step 33 Convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the picture consists of 4 pixels
  • the RGB values of the 4 pixels are: (45, 254, 78), (2, 100, 23), (99, 65, 109), (68, 126, 94). Sorting these four RGB values according to the relative position of the four pixels in the picture, the resulting decimal sequence can be expressed as 2 100 23 45 254 78 99 65 109 68 126 94.
  • the converted first quaternary sequence is 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132 .
  • the final first quaternary sequence is: 000212100113023133321032120310011231101013321132.
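The padded conversion in this example can be reproduced with a short sketch. It assumes each RGB component (0 to 255) is written as a fixed-width, 4-digit quaternary number with leading zeros, and that the pixels are already in the preset arrangement order; `to_quaternary` and `pixels_to_quaternary` are illustrative names.

```python
def to_quaternary(value: int, width: int = 4) -> str:
    """Write value as a fixed-width quaternary string, padded with leading zeros."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(str(remainder))
    return "".join(reversed(digits))

def pixels_to_quaternary(pixels) -> str:
    """Flatten (R, G, B) tuples in order and concatenate their quaternary forms."""
    return "".join(to_quaternary(c) for pixel in pixels for c in pixel)

# Pixels in the sorted order used by the example above.
pixels = [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)]
print(pixels_to_quaternary(pixels))
# 000212100113023133321032120310011231101013321132
```

The fixed width is what allows the decoder to cut the sequence back into 4-digit groups, since 255 is 3333 in quaternary.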
  • the mapping relationship between quaternary characters and bases is: 0-A, 1-T, 2-C, 3-G, then using the mapping relationship between quaternary characters and bases, the obtained base sequence is:
  • the specific implementation method of converting the original data to be stored into the first quaternary sequence includes the following steps:
  • Step 41 Sample the audio according to the preset sampling rate to obtain multiple sample data.
  • the above-mentioned preset sampling rate can be set according to the actual situation, for example, set to 8kHz, 11.025kHz, 22.05kHz, 16kHz, 37.8kHz, 44.1kHz, 48kHz, 96kHz, 192kHz, etc.
  • Step 42 Obtain the amplitude value of each sampled data.
  • the amplitude value of the above-mentioned sampled data may be the audio amplitude of the sampled data, which may be obtained through a currently common audio amplitude extraction method.
  • Step 43 Sort the obtained amplitude values according to the sampling order of the multiple sampled data to obtain a decimal sequence.
  • after the amplitude value of the sampled data is obtained, the amplitude value can be represented by a decimal number.
  • the amplitude values of each sampled data can be sorted according to the order in which each sampled data is obtained to obtain a decimal sequence.
  • Step 44 Convert the decimal sequence into a first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the amplitude value of each sampled data is represented by a quaternary number with the same number of digits. For example, assuming that a 4-digit quaternary number is used to represent an amplitude value, if the amplitude value of a certain sampled data is 2 in decimal, then when the decimal sequence is converted into the first quaternary sequence, the decimal number 2 is converted into the quaternary number 0002.
  • the amplitude values of the 16 sampled data obtained according to the sampling order are: 25, 76, 127, 255, 127, 76, 178, 204, 127, 153, 76, 51, 127, 153, 51, 153, so the decimal sequence is 25 76 127 255 127 76 178 204 127 153 76 51 127 153 51 153. Then, according to the requirement that each decimal number be represented by a 4-digit quaternary number, and based on the mapping relationship between decimal characters and quaternary characters, the first quaternary sequence obtained is:
  • the final DNA sequence obtained by inserting the bases is:
  • the corresponding quaternary numbers are sorted to obtain the first quaternary sequence. For example, if a four-digit quaternary number is used to represent the amplitude value, the maximum amplitude value can be assigned to 3333, and the amplitude values of other sampled data can be calculated proportionally to their digitized quaternary numbers to obtain the first Quaternary sequence.
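The proportional assignment can be sketched as follows. The assumptions here are not from the application: amplitudes are non-negative integers, the largest amplitude maps to 255 (quaternary 3333), and other values are scaled linearly with round-half-up integer arithmetic. `scale_to_quaternary` is an illustrative name.

```python
def scale_to_quaternary(amplitudes, width: int = 4) -> str:
    """Scale amplitudes so the peak becomes 33...3, then emit fixed-width quaternary."""
    top = 4 ** width - 1            # 255 for 4 digits, i.e. quaternary 3333
    peak = max(amplitudes)
    if peak <= 0:
        raise ValueError("need at least one positive amplitude")
    out = []
    for a in amplitudes:
        level = (a * top + peak // 2) // peak   # linear scaling, rounded half up
        digits = []
        for _ in range(width):
            level, remainder = divmod(level, 4)
            digits.append(str(remainder))
        out.append("".join(reversed(digits)))
    return "".join(out)

print(scale_to_quaternary([100, 200, 400, 50]))  # 1000200033330200
```

Scaling to the peak uses the full 4-digit range for any recording, at the cost of having to store the peak separately if absolute amplitudes must be recovered.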
  • the above-mentioned first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture.
  • the specific implementation of step 11 above, converting the original data to be stored into the first quaternary sequence, includes the following steps:
  • Step 51 Extract the audio of the video and each frame of the video.
  • currently common video and picture extraction methods may be used to extract the audio and each frame of the video.
  • Step 52 Process the extracted audio to obtain the second quaternary sequence corresponding to the audio.
  • Step 53 Process each extracted picture frame to obtain a fourth quaternary sequence corresponding to each extracted frame picture.
  • each frame of picture can be processed separately to obtain the fourth quaternary sequence corresponding to each frame of picture.
  • Step 54 According to the playback order of the extracted frame pictures in the video, sort the fourth quaternary sequences corresponding to the frame pictures to obtain the third quaternary sequence corresponding to the picture.
  • the original data is a video
  • when DNA encoding a video, the video is converted into the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture, and these two quaternary sequences are then encoded into base sequences respectively. Finally, the DNA sequence of the audio (obtained from the second quaternary sequence corresponding to the audio) and the DNA sequence of the picture (obtained from the third quaternary sequence corresponding to the picture) are obtained.
  • the audio is sampled using a sampling rate of 8kHz.
  • the amplitude values of the five sampled data obtained according to the sampling order are: 25, 76, 127, 255, 127, which gives the decimal sequence 25 76 127 255 127. Then, according to the requirement that each decimal number be represented by a 4-digit quaternary number, and based on the mapping relationship between decimal characters and quaternary characters, the second quaternary sequence corresponding to the audio is:
  • the mapping relationship between quaternary characters and bases is: 0-A, 1-T, 2-C, 3-G, then using the mapping relationship between quaternary characters and bases, the obtained base sequence is:
  • the video includes 3 frames of pictures.
  • Each frame of picture is composed of 4 pixels.
  • these 3 frames of pictures are processed respectively to obtain the fourth quaternary sequence corresponding to each frame of picture.
  • the RGB values of the four pixels are: (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94); the RGB values of these four pixels are sorted according to the relative position of the four pixels in the picture.
  • the resulting decimal sequence can be expressed as 2 100 23 45 254 78 99 65 109 68 126 94.
  • the converted fourth quaternary sequence is 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132. It should be noted that since the maximum value in RGB mode is 255, which is 3333 after conversion to quaternary, numbers with fewer than 4 digits are padded with leading 0s so that every number occupies the same number of digits while the original relative order is maintained. The final fourth quaternary sequence is:
  • the RGB values of the four pixels of the second frame are: (34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255)
  • sort the RGB values of these four pixels and the resulting decimal sequence can be expressed as 34 54 122 56 89 90 211 168 88 80 250 255.
  • the converted fourth quaternary sequence is:
  • the RGB values of the four pixels of the third frame of the picture are: (120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221), according to the relative position of these four pixels in the picture, sort the RGB values of these four pixels, and the resulting decimal sequence can be expressed as 120 70 92 126 21 24 127 75 66 185 5 221, according to the mapping relationship between decimal characters and quaternary characters, the converted fourth quaternary sequence:
  • The fourth quaternary sequences corresponding to the three frames are sorted according to the playback order of the frames in the video.
  • This gives the third quaternary sequence corresponding to the pictures: 000212100113023133321032120310011231101013321132020203121322032011211122310322201120110033223333132010121130133201110120133310231002232100113131.
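  • If the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G, then applying this mapping to the third quaternary sequence yields the base sequence: AAACTCTAATTGACGTGGGCTAGCTCAGTAATTCGTTATATGGCTTGCACACAGTCTGCCAGCATTCTTTCCGTAGCCCATTCATTAAGGCCGGGGTGCATATCTTGATGGCATTTATCATGGGTACGTAACCGCTAATTGTGT.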
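The three-frame example above can be reproduced with a short Python sketch. It assumes that each RGB channel is an integer from 0 to 255, that the pixels of each frame are already listed in the preset arrangement order, and that the mapping is 0-A, 1-T, 2-C, 3-G as in the example. The function names (`to_quaternary`, `encode_frame`, `encode_frames`, `to_bases`) are illustrative and are not taken from the application.

```python
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]

# Mapping used in the example above: 0-A, 1-T, 2-C, 3-G.
QUAT_TO_BASE: Dict[str, str] = {"0": "A", "1": "T", "2": "C", "3": "G"}


def to_quaternary(value: int, width: int = 4) -> str:
    """Return `value` in base 4, padded with leading zeros to `width` digits."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = []
    for _ in range(width):
        digits.append(str(value % 4))
        value //= 4
    return "".join(reversed(digits))


def encode_frame(pixels: Sequence[Pixel]) -> str:
    """Flatten the pixels in their given order and convert every channel value."""
    return "".join(to_quaternary(channel) for pixel in pixels for channel in pixel)


def encode_frames(frames: Sequence[Sequence[Pixel]]) -> str:
    """Concatenate the per-frame sequences in playback order."""
    return "".join(encode_frame(frame) for frame in frames)


def to_bases(quaternary: str, mapping: Dict[str, str] = QUAT_TO_BASE) -> str:
    """Map each quaternary character to its base."""
    return "".join(mapping[digit] for digit in quaternary)


frames = [
    [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)],
    [(34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255)],
    [(120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221)],
]
third_quaternary = encode_frames(frames)
base_sequence = to_bases(third_quaternary)
print(third_quaternary)
print(base_sequence)
```

The first printed line is the 144-digit third quaternary sequence for the three frames; the second is the corresponding base sequence before any bases are inserted.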
  • the embodiment of the present application provides a DNA decoding method, including the following steps:
  • Step 61: Determine the DNA sequence to be decoded.
  • When original data needs to be retrieved from DNA that stores it, the DNA must first be sequenced and the DNA sequence read out to determine the DNA sequence to be decoded. DNA sequencing itself can be carried out using current common methods and is not described again here. As mentioned above, a video is stored using two DNA sequences, whereas text, a picture, or audio is stored using only one DNA sequence. Therefore, during decoding, there may be one or more DNA sequences to be decoded.
  • Step 62: Obtain the base sequence based on the DNA sequence.
  • After the DNA sequence to be decoded is determined, the DNA sequence can be used directly as the base sequence.
  • If bases were inserted into the base sequence during encoding, the inserted bases must be removed from the DNA sequence during decoding to obtain the base sequence.
  • In that case, the base sequence can be obtained by removing N bases at every interval of M bases in the DNA sequence.
  • M and N are both integers greater than 0, and the specific values of M and N can be set according to the actual situation.
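A minimal sketch of the removal in step 62 follows. The text does not fix the exact layout, so this sketch assumes the DNA sequence consists of repeating groups of M data bases followed by N inserted bases, with no inserted bases after the final group. The function name `remove_inserted_bases` is hypothetical.

```python
def remove_inserted_bases(dna: str, m: int, n: int) -> str:
    """Drop the N inserted bases that follow every M data bases.

    Assumed layout (not specified in the source): M data bases, then N
    inserted bases, repeated; the last group may be shorter and carries
    no inserted bases.
    """
    if m <= 0 or n <= 0:
        raise ValueError("M and N must both be integers greater than 0")
    kept = []
    for start in range(0, len(dna), m + n):
        # Keep only the first M bases of each (M + N)-base group.
        kept.append(dna[start:start + m])
    return "".join(kept)


# Two spacers "CT" were inserted after every 4 data bases.
print(remove_inserted_bases("AAAACTAAAACTAA", m=4, n=2))
```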
  • Step 63: Decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence.
  • The quaternary characters are 0, 1, 2, and 3, and the bases are the four natural bases A, T, C, and G.
  • The four bases A, T, C, and G are mapped one-to-one to the characters 0, 1, 2, and 3.
  • Because there are 4! = 24 such one-to-one mappings, the preset mapping relationship between quaternary characters and bases can be any of these 24 mapping relationships.
  • Decoding the base sequence into the first quaternary sequence requires no complex operations: the base sequence is decoded through the one-to-one mapping between quaternary characters and bases, which greatly reduces the algorithm complexity and improves the decoding speed.
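Step 63 amounts to a per-character table lookup. The sketch below enumerates the 24 possible one-to-one mappings and decodes a base sequence with one of them; the names `ALL_MAPPINGS` and `decode_bases` are illustrative, and the chosen mapping (0-A, 1-T, 2-C, 3-G) is simply the one used in the worked examples.

```python
from itertools import permutations
from typing import Dict, List

# Every one-to-one assignment of the bases A, T, C, G to the characters 0-3.
ALL_MAPPINGS: List[Dict[str, str]] = [
    dict(zip("0123", bases)) for bases in permutations("ATCG")
]


def decode_bases(base_sequence: str, quat_to_base: Dict[str, str]) -> str:
    """Invert the quaternary-to-base mapping and apply it base by base."""
    base_to_quat = {base: quat for quat, base in quat_to_base.items()}
    return "".join(base_to_quat[base] for base in base_sequence)


mapping = {"0": "A", "1": "T", "2": "C", "3": "G"}
print(len(ALL_MAPPINGS))
print(decode_bases("AAACTCTA", mapping))
```

Each base costs one dictionary lookup, so decoding time grows linearly with the length of the base sequence.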
  • Step 64: Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
  • The conversion differs for original data in different formats; the specific conversion processes are explained in detail below.
  • Steps 62 to 64 are performed for each DNA sequence to be decoded to obtain the original data corresponding to each DNA sequence.
  • Specific implementations of step 64 are described below by way of example with reference to specific embodiments.
  • When the original data stored in the DNA sequence is text, step 64 (converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence) includes the following steps: converting the first quaternary sequence into a coding sequence according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters; and converting the coding sequence into text according to the preset character encoding table.
  • The preset character encoding table is an encoding table for encoding text, such as the Unicode, UTF-8, ASCII, ISO8859-1, GB2312, or GBK encoding table. In some embodiments of the present application, the specific form of the preset character encoding table is not limited.
  • The coded characters are the characters that make up the coding sequence obtained with the preset character encoding table when the text was DNA-encoded. Therefore, when decoding, the mapping relationship between the coded characters and quaternary characters can be used to convert the first quaternary sequence into a coding sequence, which is then converted into text using the preset character encoding table.
  • The coding sequence may be an information sequence represented in hexadecimal; accordingly, the coded characters may be hexadecimal characters.
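As an illustration of the text branch of step 64, the sketch below assumes that the coded characters are hexadecimal digits and that each hexadecimal digit corresponds to exactly two quaternary digits (0 to 00, ..., f to 33). The source does not spell out this mapping, so it is an assumption, as are the UTF-8 table and the names `quaternary_to_hex` and `decode_text`.

```python
def quaternary_to_hex(quaternary: str) -> str:
    """Convert every pair of quaternary digits into one hexadecimal digit.

    Assumption: one hex digit <-> two quaternary digits (not stated in the source).
    """
    if len(quaternary) % 2 != 0:
        raise ValueError("expected an even number of quaternary digits")
    pairs = (quaternary[i:i + 2] for i in range(0, len(quaternary), 2))
    return "".join(format(int(pair, 4), "x") for pair in pairs)


def decode_text(quaternary: str, encoding: str = "utf-8") -> str:
    """Quaternary sequence -> hexadecimal coding sequence -> text."""
    hex_sequence = quaternary_to_hex(quaternary)
    return bytes.fromhex(hex_sequence).decode(encoding)


# "Hi" is 0x48 0x69 in UTF-8, i.e. quaternary 10 20 12 21 under the assumed mapping.
print(decode_text("10201221"))
```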
  • When the original data stored in the DNA sequence is a picture, step 64 (converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence) includes the following steps: first, converting the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters; then determining the RGB values of multiple pixels based on the decimal sequence; finally, generating the picture based on the RGB values of the pixels and the preset pixel arrangement order.
  • In converting the first quaternary sequence into a decimal sequence, each four-digit quaternary number is converted into one decimal number. For example, if the first quaternary sequence is 0002 1210 0113 0231 3332 1032 1203 1001 1231 1010 1332 1132, then the resulting decimal sequence is 2 100 23 45 254 78 99 65 109 68 126 94.
  • every three decimal numbers in the decimal sequence can be used as the RGB value of one pixel.
  • the RGB values of the multiple pixels obtained are (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94).
  • the RGB value of each pixel can be processed according to the relative position of each pixel in the picture to obtain the picture.
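The picture branch can be sketched as three small steps: split the quaternary sequence into four-digit numbers, group the decimal values into RGB triples, and lay the pixels out in rows. The row-major layout and the known image width are assumptions made for this sketch, since the source only refers to a "preset pixel arrangement order"; the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def quaternary_to_decimals(quaternary: str, width: int = 4) -> List[int]:
    """Read the sequence as consecutive `width`-digit base-4 numbers."""
    if len(quaternary) % width != 0:
        raise ValueError("sequence length is not a multiple of the digit width")
    return [int(quaternary[i:i + width], 4) for i in range(0, len(quaternary), width)]


def decimals_to_pixels(values: List[int]) -> List[Pixel]:
    """Every three decimal numbers form the RGB value of one pixel."""
    if len(values) % 3 != 0:
        raise ValueError("decimal sequence does not contain whole RGB triples")
    return [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]


def pixels_to_rows(pixels: List[Pixel], image_width: int) -> List[List[Pixel]]:
    """Assumed arrangement order: row-major, `image_width` pixels per row."""
    return [pixels[i:i + image_width] for i in range(0, len(pixels), image_width)]


first_quaternary = "000212100113023133321032120310011231101013321132"
decimals = quaternary_to_decimals(first_quaternary)
pixels = decimals_to_pixels(decimals)
rows = pixels_to_rows(pixels, image_width=2)
print(decimals)
print(rows)
```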
  • When the original data stored in the DNA sequence is audio, step 64 (converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence) includes the following steps: first, converting the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters; then determining the amplitude values of multiple sampled data based on the decimal sequence; then determining the total duration of the audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data; finally, evenly distributing the determined amplitude values over the total duration to obtain the audio.
  • In converting the first quaternary sequence into a decimal sequence, every T quaternary digits are converted into one decimal number, where T is the number of quaternary digits used to represent each decimal number.
  • Each decimal number in the decimal sequence can be used as the amplitude value of one sampled data, and the amplitude values are sorted according to the order of the corresponding decimal numbers in the decimal sequence.
  • The ratio of the number of amplitude values (that is, the number of sampled data) to the preset sampling rate can be used as the total duration of the audio.
  • the sorted amplitude values can be evenly distributed over the total duration to obtain the audio.
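The audio branch can be sketched as follows, assuming T = 4 quaternary digits per amplitude value and non-negative integer amplitudes; both are assumptions for illustration, as is the function name `decode_audio`. The total duration is the number of samples divided by the sampling rate, and sample i is placed at time i / sampling rate.

```python
from typing import List, Tuple


def decode_audio(
    quaternary: str, sample_rate: float, digits_per_value: int = 4
) -> Tuple[float, List[Tuple[float, int]]]:
    """Return (total duration in seconds, list of (time, amplitude) pairs)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if len(quaternary) % digits_per_value != 0:
        raise ValueError("sequence length is not a multiple of T")
    amplitudes = [
        int(quaternary[i:i + digits_per_value], 4)
        for i in range(0, len(quaternary), digits_per_value)
    ]
    # Total duration = number of sampled data / preset sampling rate.
    duration = len(amplitudes) / sample_rate
    # Evenly distribute the amplitudes over the total duration.
    samples = [(index / sample_rate, amp) for index, amp in enumerate(amplitudes)]
    return duration, samples


duration, samples = decode_audio("0002121001130231", sample_rate=4)
print(duration)
print(samples)
```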
  • If the DNA sequences to be decoded include a DNA sequence of audio and a DNA sequence of pictures, the DNA sequences currently to be decoded correspond to a video.
  • In this case, the original data stored in the DNA sequence of the audio is the audio of the video.
  • The original data stored in the DNA sequence of the pictures is the multi-frame pictures of the video.
  • The first quaternary sequence obtained from the DNA sequence of the audio is the second quaternary sequence.
  • The first quaternary sequence obtained from the DNA sequence of the pictures is the third quaternary sequence.
  • In this case, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes the following steps: converting the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio, and converting the third quaternary sequence corresponding to the DNA sequence of the pictures to obtain the multi-frame pictures.
  • Converting the second quaternary sequence corresponding to the DNA sequence of the audio (that is, a DNA sequence whose stored original data is audio) to obtain the audio has been explained in detail above and is not repeated here.
  • Converting the third quaternary sequence corresponding to the DNA sequence of the pictures to obtain the multi-frame pictures is similar to converting the first quaternary sequence of a DNA sequence whose stored original data is a picture. The difference is that, when the pictures are generated, processing the RGB value of each pixel according to the relative position of each pixel yields multiple frames of pictures. Multi-frame pictures can be decoded because the pixels belonging to each frame and the relative position of each pixel within its frame are recorded during DNA encoding.
  • The method also includes: synthesizing the multi-frame pictures and the audio to obtain the video.
  • The total duration of the audio corresponding to the DNA sequence of the audio can be determined first. The decoded frames are then sorted according to the order in which the RGB values of their pixels appear in the decimal sequence, the sorted frames are evenly distributed over the total duration, and the video is obtained by combining the audio and the pictures.
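The frame handling described above can be sketched under two assumptions: the number of pixels per frame is known from the information recorded during encoding, and each frame starts at an evenly spaced time within the audio duration. The names `split_frames` and `schedule_frames` are hypothetical, and actual muxing of audio and pictures into a video container is outside the scope of this sketch.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def split_frames(quaternary: str, pixels_per_frame: int, width: int = 4) -> List[List[Pixel]]:
    """Cut the third quaternary sequence into frames, keeping sequence order."""
    digits_per_frame = pixels_per_frame * 3 * width
    if digits_per_frame <= 0 or len(quaternary) % digits_per_frame != 0:
        raise ValueError("sequence does not contain a whole number of frames")
    frames = []
    for start in range(0, len(quaternary), digits_per_frame):
        chunk = quaternary[start:start + digits_per_frame]
        values = [int(chunk[i:i + width], 4) for i in range(0, len(chunk), width)]
        frames.append([(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)])
    return frames


def schedule_frames(frame_count: int, total_duration: float) -> List[float]:
    """Start time of each frame when frames are spread evenly over the duration."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    interval = total_duration / frame_count
    return [index * interval for index in range(frame_count)]


third_quaternary = (
    "000212100113023133321032120310011231101013321132"
    "020203121322032011211122310322201120110033223333"
    "132010121130133201110120133310231002232100113131"
)
frames = split_frames(third_quaternary, pixels_per_frame=4)
start_times = schedule_frames(len(frames), total_duration=6.0)
print(frames[1][0], start_times)
```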
  • The quaternary sequence is encoded directly into a base sequence based on the mapping relationship between quaternary characters and bases, thereby reducing the algorithm complexity and improving the encoding speed;
  • The number of single-base repeats is controlled by inserting bases at intervals, thereby reducing the algorithm complexity and further improving the encoding speed;
  • DNA storage of text, pictures, audio, and video can be achieved.
  • the DNA encoding device 700 includes:
  • the first conversion module 701 is used to convert the original data to be stored into a first quaternary sequence
  • the encoding module 702 is used to encode and convert the first quaternary sequence to obtain a base sequence according to the preset mapping relationship between quaternary characters and bases;
  • the generation module 703 is used to obtain a DNA sequence storing original data based on the base sequence.
  • the first conversion module 701 includes:
  • the encoding unit is used to encode the text according to the preset character encoding table to obtain a coding sequence when the original data to be stored is text;
  • the first conversion unit is used to convert the coding sequence into a first quaternary sequence according to the mapping relationship between the coded characters and the quaternary characters in the preset character coding table.
  • the first conversion module 701 includes:
  • the first acquisition unit is used to acquire the RGB value of each pixel in the picture when the original data to be stored is a picture;
  • the first sorting unit is used to sort the RGB values of each pixel according to the preset pixel arrangement order to obtain a decimal sequence
  • the second conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the first conversion module 701 includes:
  • the sampling unit is used to sample the audio according to the preset sampling rate to obtain multiple sampled data when the original data to be stored is audio;
  • the second acquisition unit is used to acquire the amplitude value of each sampled data
  • the second sorting unit is used to sort the obtained amplitude values according to the sampling order of multiple sampled data to obtain a decimal sequence
  • the third conversion unit is used to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  • the first quaternary sequence includes the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture.
  • the first conversion module 701 includes:
  • the extraction unit is used to extract the audio of the video and each frame of picture of the video when the original data to be stored is a video;
  • the first processing unit is used to process the extracted audio to obtain the second quaternary sequence corresponding to the audio;
  • the second processing unit is used to process each extracted frame of picture to obtain the fourth quaternary sequence corresponding to each extracted frame of picture;
  • the third sorting unit is used to sort the fourth quaternary sequences corresponding to the frames according to the playback order of the extracted frames in the video, to obtain the third quaternary sequence corresponding to the pictures.
  • the generation module 703 includes:
  • the generation unit is used to insert N bases at every interval of M bases in the base sequence to obtain a DNA sequence storing the original data.
  • The base in the base sequence adjacent to the first of the N inserted bases differs from that first base; the base in the base sequence adjacent to the Nth inserted base differs from that Nth base; adjacent bases among the N inserted bases differ from each other; and M and N are both integers greater than 0.
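One way to satisfy the three adjacency constraints is a greedy choice for each inserted base, sketched below. The source does not say how the inserted bases are selected or whether bases are also appended after the final block, so the greedy rule, the choice to insert only between blocks, and the names `pick_spacer` and `insert_bases` are all assumptions.

```python
BASES = "ATCG"


def pick_spacer(left: str, right: str, n: int) -> str:
    """Choose N bases: the first differs from `left`, the last differs from
    `right`, and neighbouring inserted bases differ from each other."""
    spacer = []
    previous = left
    for index in range(n):
        forbidden = {previous}
        if index == n - 1:
            forbidden.add(right)
        # At most two bases are forbidden, so one of the four always remains.
        choice = next(base for base in BASES if base not in forbidden)
        spacer.append(choice)
        previous = choice
    return "".join(spacer)


def insert_bases(base_sequence: str, m: int, n: int) -> str:
    """Insert N bases after every M bases (assumed: only between blocks)."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must both be integers greater than 0")
    parts = []
    for start in range(0, len(base_sequence), m):
        block = base_sequence[start:start + m]
        parts.append(block)
        following = base_sequence[start + m:start + m + 1]
        if following:
            parts.append(pick_spacer(block[-1], following, n))
    return "".join(parts)


print(insert_bases("AAAAAAAA", m=4, n=2))
```

Because each spacer differs from the bases on both sides, a run of identical bases can never continue across a spacer, so under this layout no single-base run is longer than M.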
  • the DNA decoding device 800 includes:
  • Determination module 801 used to determine the DNA sequence to be decoded
  • the processing module 802 is used to obtain the base sequence according to the DNA sequence
  • the decoding module 803 is used to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain the first quaternary sequence;
  • the second conversion module 804 is used to convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.
  • the second conversion module 804 includes:
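  • the fourth conversion unit is used to convert the first quaternary sequence into a coding sequence according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters when the original data stored in the DNA sequence is text;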
  • the fifth conversion unit is used to convert the encoding sequence into text according to the preset character encoding table.
  • the second conversion module 804 includes:
  • the sixth conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is a picture;
  • the first determination unit is used to determine the RGB values of multiple pixels according to the decimal sequence
  • the generation unit is used to generate pictures based on the RGB values of multiple pixels and the preset arrangement order of pixels.
  • the second conversion module 804 includes:
  • the seventh conversion unit is used to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is audio;
  • the second determination unit is used to determine the amplitude values of multiple sampled data according to the decimal sequence
  • the third determination unit is used to determine the total duration of the audio stored in the DNA sequence based on the preset sampling rate and the number of sampled data;
  • the distribution unit is used to evenly distribute the determined amplitude values over the total duration to obtain audio.
  • the DNA sequence to be decoded includes the DNA sequence of the audio and the DNA sequence of the picture.
  • the first quaternary sequence includes the second quaternary sequence corresponding to the DNA sequence of the audio and the third quaternary sequence corresponding to the DNA sequence of the picture.
  • the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multi-frame pictures;
  • the second conversion module 804 includes:
  • the eighth conversion unit is used to convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;
  • the ninth conversion unit is used to convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain multiple frames of pictures;
  • the DNA decoding device 800 also includes:
  • the synthesis module is used to synthesize multiple frames of pictures and audio to obtain a video.
  • The processing module 802 is specifically configured to remove N bases at every interval of M bases in the DNA sequence to obtain the base sequence, where M and N are both integers greater than 0.
  • The functions described above may be allocated to different functional units or modules as needed; that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.
  • The functional units and modules in the embodiments can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • The integrated unit can be implemented in the form of hardware or in the form of software functional units.
  • The specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of the present application.
  • For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; they are not described again here.
  • an embodiment of the present application provides a terminal device.
  • The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in Figure 9), a memory D101, and a computer program D102 stored in the memory D101 and executable on the at least one processor D100.
  • When the processor D100 executes the computer program D102, the steps in any of the above method embodiments are implemented.
  • The processor D100 can be a central processing unit (CPU).
  • The processor D100 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • The memory D101 may be an internal storage unit of the terminal device D10, such as a hard disk or memory of the terminal device D10. In other embodiments, the memory D101 may also be an external storage device of the terminal device D10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device D10. Further, the memory D101 may include both an internal storage unit of the terminal device D10 and an external storage device. The memory D101 is used to store the operating system, application programs, the boot loader, data, and other programs, such as the program code of the computer program. The memory D101 can also be used to temporarily store data that has been output or will be output.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.
  • Embodiments of the present application provide a computer program product.
  • When the computer program product runs on a terminal device, the terminal device can implement the steps in each of the above method embodiments.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Accordingly, this application can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program.
  • The computer program can be stored in a computer-readable storage medium.
  • When the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented.
  • the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • The computer-readable medium may at least include: any entity or device capable of carrying computer program code to a DNA encoding device, DNA decoding device, or terminal device; a recording medium; a computer memory; a read-only memory (ROM); a random access memory (RAM); electrical carrier signals; telecommunications signals; and software distribution media.
  • Examples include a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • computer-readable media may not be electrical carrier signals and telecommunications signals.
  • the disclosed devices/network devices and methods can be implemented in other ways.
  • the apparatus/network equipment embodiments described above are only illustrative.
  • The division of modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, indirect coupling or communication connection of devices or units, which may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present application belongs to the technical field of data storage. Provided are a DNA encoding method and apparatus, a DNA decoding method and apparatus, a terminal device and a medium. The DNA encoding method comprises: converting original data to be stored into a first quaternary sequence; encoding the first quaternary sequence according to a preset mapping relationship between quaternary characters and bases to obtain a base sequence; and obtaining, according to the base sequence, a DNA sequence in which the original data is stored. The present application can increase the encoding speed.

Description

DNA encoding method, decoding method, apparatus, terminal device and medium
Technical field
This application belongs to the field of data storage technology, and in particular relates to a DNA encoding method, decoding method, apparatus, terminal device, and medium.
Background
The development of the Internet has led to explosive growth in the information produced by human society, while existing storage media are close to being exhausted. For this reason, researchers have turned to deoxyribonucleic acid (DNA) storage. Existing DNA encoding methods build on the 0/1 binary representation used in silicon-based storage: the information to be stored is first converted into binary numbers and then further encoded into a DNA sequence. Because single-base repeats must be taken into account during encoding, the binary numbers have to undergo a series of complex operations (such as XOR operations, random function mapping, and conditional filtering) before they can be encoded into a DNA sequence. These complex operations are time-consuming, so the encoding speed is unsatisfactory.
Technical problem
Embodiments of the present application provide a DNA encoding method, decoding method, apparatus, terminal device, and medium, which can solve the problem of unsatisfactory DNA encoding speed.
Technical solution
In a first aspect, an embodiment of the present application provides a DNA encoding method, including:
converting original data to be stored into a first quaternary sequence;
encoding the first quaternary sequence according to a preset mapping relationship between quaternary characters and bases to obtain a base sequence;
obtaining, according to the base sequence, a DNA sequence in which the original data is stored.
Optionally, converting the original data to be stored into the first quaternary sequence includes:
when the original data to be stored is text, encoding the text according to a preset character encoding table to obtain a coding sequence;
converting the coding sequence into the first quaternary sequence according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters.
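A minimal sketch of the text branch follows. It assumes UTF-8 as the preset character encoding table, hexadecimal digits as the coded characters, and a fixed mapping of each hexadecimal digit to two quaternary digits; none of these choices is fixed by the text above, and the names `HEX_TO_QUAT` and `encode_text` are hypothetical.

```python
# Assumed mapping: each hexadecimal digit becomes two quaternary digits
# (0 -> 00, 1 -> 01, ..., f -> 33).
HEX_TO_QUAT = {format(value, "x"): f"{value // 4}{value % 4}" for value in range(16)}


def encode_text(text: str, encoding: str = "utf-8") -> str:
    """Text -> hexadecimal coding sequence -> first quaternary sequence."""
    hex_sequence = text.encode(encoding).hex()
    return "".join(HEX_TO_QUAT[digit] for digit in hex_sequence)


# "Hi" is 0x48 0x69 in UTF-8.
print(encode_text("Hi"))
```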
Optionally, converting the original data to be stored into the first quaternary sequence includes:
when the original data to be stored is a picture, obtaining the RGB value of each pixel in the picture;
sorting the RGB values of the pixels according to a preset pixel arrangement order to obtain a decimal sequence;
converting the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
Optionally, converting the original data to be stored into the first quaternary sequence includes:
when the original data to be stored is audio, sampling the audio at a preset sampling rate to obtain multiple sampled data;
obtaining the amplitude value of each sampled data;
sorting the obtained amplitude values according to the sampling order of the sampled data to obtain a decimal sequence;
converting the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
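The audio branch can be sketched as below. The sketch assumes that sampling has already produced (sample index, amplitude) pairs, that amplitudes are non-negative integers below 256, and that each amplitude is written with four quaternary digits; these are illustrative assumptions, as are the function names.

```python
from typing import Sequence, Tuple


def to_quaternary(value: int, width: int = 4) -> str:
    """Return `value` in base 4, padded with leading zeros to `width` digits."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = []
    for _ in range(width):
        digits.append(str(value % 4))
        value //= 4
    return "".join(reversed(digits))


def encode_audio(samples: Sequence[Tuple[int, int]], width: int = 4) -> str:
    """Sort (sample index, amplitude) pairs by sampling order, then convert."""
    ordered = sorted(samples, key=lambda sample: sample[0])
    return "".join(to_quaternary(amplitude, width) for _, amplitude in ordered)


# Samples given out of order are restored to sampling order before conversion.
print(encode_audio([(1, 100), (0, 2), (2, 23)]))
```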
Optionally, the first quaternary sequence includes a second quaternary sequence corresponding to the audio and a third quaternary sequence corresponding to the pictures; converting the original data to be stored into the first quaternary sequence includes:
when the original data to be stored is a video, extracting the audio of the video and each frame of picture of the video;
processing the extracted audio to obtain the second quaternary sequence corresponding to the audio;
processing each extracted frame of picture to obtain the fourth quaternary sequence corresponding to each extracted frame of picture;
sorting the fourth quaternary sequences corresponding to the frames according to the playback order of the extracted frames in the video, to obtain the third quaternary sequence corresponding to the pictures.
Optionally, obtaining the DNA sequence storing the original data according to the base sequence includes:
inserting N bases at every interval of M bases in the base sequence to obtain the DNA sequence storing the original data;
where the base in the base sequence adjacent to the first of the N bases differs from that first base; the base in the base sequence adjacent to the Nth of the N bases differs from that Nth base; adjacent bases among the N bases differ from each other; and M and N are both integers greater than 0.
In a second aspect, embodiments of the present application provide a DNA decoding method, including:
Determine the DNA sequence to be decoded;
Obtain a base sequence according to the DNA sequence;
Decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain a first quaternary sequence;
Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
Optionally, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes:
When the original data stored in the DNA sequence is text, convert the first quaternary sequence into an encoded sequence according to the mapping relationship between the encoded characters in the preset character encoding table and quaternary characters;
Convert the encoded sequence into text according to the preset character encoding table.
Optionally, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes:
When the original data stored in the DNA sequence is a picture, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
Determine the RGB values of a plurality of pixels according to the decimal sequence;
Generate the picture according to the RGB values of the plurality of pixels and the preset pixel arrangement order.
Optionally, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes:
When the original data stored in the DNA sequence is audio, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
Determine the amplitude values of a plurality of sampled data according to the decimal sequence;
Determine the total duration of the audio stored in the DNA sequence according to the preset sampling rate and the number of sampled data;
Distribute the determined amplitude values evenly over the total duration to obtain the audio.
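As an illustration of the last two steps, the following minimal sketch computes the total duration from the sample count and the sampling rate and spaces the samples evenly over it. It assumes the preset sampling rate is expressed in samples per second; the function name `sample_times` is hypothetical and does not come from the application.

```python
def sample_times(num_samples: int, sample_rate_hz: float):
    """Return (total duration in seconds, time offset of each sample).

    Assumption: the preset sampling rate is given in samples per second.
    """
    if num_samples < 0 or sample_rate_hz <= 0:
        raise ValueError("need a non-negative sample count and a positive rate")
    total_duration = num_samples / sample_rate_hz
    # Evenly spaced samples: the i-th sample sits at i / rate seconds.
    offsets = [i / sample_rate_hz for i in range(num_samples)]
    return total_duration, offsets
```

With 4 samples at 2 samples per second, the total duration is 2 seconds and the samples fall at 0, 0.5, 1 and 1.5 seconds.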
Optionally, the DNA sequence to be decoded includes a DNA sequence of the audio and a DNA sequence of the picture; the first quaternary sequence includes a second quaternary sequence corresponding to the DNA sequence of the audio and a third quaternary sequence corresponding to the DNA sequence of the picture; the original data stored in the DNA sequence of the audio is audio, and the original data stored in the DNA sequence of the picture is a plurality of frame pictures;
Converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence includes:
Convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;
Convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain the plurality of frame pictures;
After converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, the method further includes:
Synthesize the plurality of frame pictures and the audio to obtain a video.
Optionally, obtaining the base sequence according to the DNA sequence includes:
In the DNA sequence, remove N bases at positions every M bases to obtain the base sequence;
Wherein M and N are both integers greater than 0.
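The removal step can be sketched as follows. This is a minimal illustration, assuming the DNA sequence consists of groups of M payload bases each followed by N inserted bases, with a possibly shorter final payload group; the function name `strip_spacers` is hypothetical.

```python
def strip_spacers(dna: str, m: int, n: int) -> str:
    """Drop the n inserted bases that follow every m payload bases."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be integers greater than 0")
    period = m + n
    # Keep the first m bases of every (m + n)-base window.
    return "".join(dna[i:i + m] for i in range(0, len(dna), period))
```

For example, with M = 5 and N = 1, `strip_spacers("AAAATGCCAAATAATCTAT", 5, 1)` keeps `AAAAT`, `CCAAA`, `AATCT` and the trailing `T`.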
In a third aspect, embodiments of the present application provide a DNA encoding device, including:
A first conversion module, configured to convert the original data to be stored into a first quaternary sequence;
An encoding module, configured to encode the first quaternary sequence into a base sequence according to the preset mapping relationship between quaternary characters and bases;
A generation module, configured to obtain, according to the base sequence, the DNA sequence storing the original data.
Optionally, the first conversion module includes:
An encoding unit, configured to encode the text according to the preset character encoding table to obtain an encoded sequence when the original data to be stored is text;
A first conversion unit, configured to convert the encoded sequence into the first quaternary sequence according to the mapping relationship between the encoded characters in the preset character encoding table and quaternary characters.
Optionally, the first conversion module includes:
A first acquisition unit, configured to acquire the RGB value of each pixel in the picture when the original data to be stored is a picture;
A first sorting unit, configured to sort the RGB values of the pixels according to the preset pixel arrangement order to obtain a decimal sequence;
A second conversion unit, configured to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
Optionally, the first conversion module includes:
A sampling unit, configured to sample the audio at the preset sampling rate to obtain a plurality of sampled data when the original data to be stored is audio;
A second acquisition unit, configured to acquire the amplitude value of each sampled datum;
A second sorting unit, configured to sort the obtained amplitude values according to the sampling order of the plurality of sampled data to obtain a decimal sequence;
A third conversion unit, configured to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
Optionally, the first quaternary sequence includes a second quaternary sequence corresponding to the audio and a third quaternary sequence corresponding to the picture; the first conversion module includes:
An extraction unit, configured to extract the audio of the video and each frame picture of the video when the original data to be stored is a video;
A first processing unit, configured to process the extracted audio to obtain the second quaternary sequence corresponding to the audio;
A second processing unit, configured to process each extracted frame picture to obtain a fourth quaternary sequence corresponding to that frame picture;
A third sorting unit, configured to sort the fourth quaternary sequences corresponding to the frame pictures according to the playback order of the extracted frame pictures in the video, to obtain the third quaternary sequence corresponding to the picture.
Optionally, the generation module includes:
A generation unit, configured to insert N bases at positions every M bases in the base sequence to obtain the DNA sequence storing the original data.
Wherein the first of the N bases differs from the base adjacent to it in the base sequence; the N-th of the N bases differs from the base adjacent to it in the base sequence; bases at adjacent positions among the N bases differ from each other; and M and N are both integers greater than 0.
In a fourth aspect, embodiments of the present application provide a DNA decoding device, including:
A determination module, configured to determine the DNA sequence to be decoded;
A processing module, configured to obtain a base sequence according to the DNA sequence;
A decoding module, configured to decode the base sequence according to the preset mapping relationship between quaternary characters and bases to obtain a first quaternary sequence;
A second conversion module, configured to convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
Optionally, the second conversion module includes:
A fourth conversion unit, configured to convert the first quaternary sequence into an encoded sequence according to the mapping relationship between the encoded characters in the preset character encoding table and quaternary characters when the original data stored in the DNA sequence is text;
A fifth conversion unit, configured to convert the encoded sequence into text according to the preset character encoding table.
Optionally, the second conversion module includes:
A sixth conversion unit, configured to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is a picture;
A first determination unit, configured to determine the RGB values of a plurality of pixels according to the decimal sequence;
A generation unit, configured to generate the picture according to the RGB values of the plurality of pixels and the preset pixel arrangement order.
Optionally, the second conversion module includes:
A seventh conversion unit, configured to convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters when the original data stored in the DNA sequence is audio;
A second determination unit, configured to determine the amplitude values of a plurality of sampled data according to the decimal sequence;
A third determination unit, configured to determine the total duration of the audio stored in the DNA sequence according to the preset sampling rate and the number of sampled data;
A distribution unit, configured to distribute the determined amplitude values evenly over the total duration to obtain the audio.
Optionally, the DNA sequence to be decoded includes a DNA sequence of the audio and a DNA sequence of the picture; the first quaternary sequence includes a second quaternary sequence corresponding to the DNA sequence of the audio and a third quaternary sequence corresponding to the DNA sequence of the picture; the original data stored in the DNA sequence of the audio is audio, and the original data stored in the DNA sequence of the picture is a plurality of frame pictures;
The second conversion module includes:
An eighth conversion unit, configured to convert the second quaternary sequence corresponding to the DNA sequence of the audio to obtain the audio;
A ninth conversion unit, configured to convert the third quaternary sequence corresponding to the DNA sequence of the picture to obtain the plurality of frame pictures;
The DNA decoding device further includes:
A synthesis module, configured to synthesize the plurality of frame pictures and the audio to obtain a video.
Optionally, the processing module is specifically configured to remove N bases at positions every M bases in the DNA sequence to obtain the base sequence, where M and N are both integers greater than 0.
In a fifth aspect, embodiments of the present application provide a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the above DNA encoding method or DNA decoding method.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program; when executed by a processor, the computer program implements the above DNA encoding method or DNA decoding method.
In a seventh aspect, embodiments of the present application provide a computer program product which, when run on a terminal device, causes the terminal device to execute the above DNA encoding method or DNA decoding method.
Beneficial Effects
Compared with the prior art, the embodiments of the present application have the following beneficial effects:
In the embodiments of the present application, the original data to be stored is converted into a first quaternary sequence, the first quaternary sequence is then encoded into a base sequence according to the mapping relationship between quaternary characters and bases, and the DNA sequence storing the original data is finally obtained from the base sequence. Because encoding the first quaternary sequence into the base sequence requires no complex computation and is a direct character-by-character mapping from quaternary characters to bases, the algorithm complexity is reduced and the encoding speed is improved.
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of a DNA encoding method provided by an embodiment of the present application;
Figure 2 is a flow chart of the steps for converting text into a first quaternary sequence provided by an embodiment of the present application;
Figure 3 is a flow chart of the steps for converting a picture into a first quaternary sequence provided by an embodiment of the present application;
Figure 4 is a flow chart of the steps for converting audio into a first quaternary sequence provided by an embodiment of the present application;
Figure 5 is a flow chart of the steps for converting video into a first quaternary sequence provided by an embodiment of the present application;
Figure 6 is a flow chart of a DNA decoding method provided by an embodiment of the present application;
Figure 7 is a schematic structural diagram of a DNA encoding device provided by an embodiment of the present application;
Figure 8 is a schematic structural diagram of a DNA decoding device provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Best Mode of Carrying Out the Invention
In the following description, specific details such as particular system structures and technologies are set forth for purposes of explanation rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
In addition, in the description of this specification and the appended claims, the terms "first", "second", "third" and so on are used only to distinguish descriptions and shall not be understood as indicating or implying relative importance.
Reference in this specification to "one embodiment" or "some embodiments" means that a particular feature, structure or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
In current DNA encoding, the information to be stored must first be converted into binary numbers consisting of 0s and 1s, which are then further encoded into a DNA sequence. Because the binary numbers have to undergo a series of time-consuming, complex operations before they can be encoded into a DNA sequence, the DNA encoding speed is unsatisfactory.
To address this problem, the embodiments of the present application convert the information to be stored into a first quaternary sequence, use the mapping relationship between quaternary characters and bases to encode the first quaternary sequence directly into a base sequence, and obtain the DNA sequence from the base sequence. Because encoding the first quaternary sequence into the base sequence requires no complex computation and is a direct character-by-character mapping from quaternary characters to bases, the algorithm complexity is reduced and the encoding speed is improved.
The DNA encoding method provided by the present application is described below by way of example with reference to specific embodiments.
As shown in Figure 1, an embodiment of the present application provides a DNA encoding method, including the following steps:
Step 11: convert the original data to be stored into a first quaternary sequence.
The original data may be in formats such as text, picture, audio, or video. The specific way of converting the original data into the first quaternary sequence differs by format; the conversion processes are described in detail later.
Step 12: encode the first quaternary sequence into a base sequence according to the preset mapping relationship between quaternary characters and bases.
The quaternary characters are 0, 1, 2, and 3, and the bases are the four natural bases adenine (A), thymine (T), cytosine (C), and guanine (G). If the four bases A, T, C, G are mapped one-to-one to the characters 0, 1, 2, 3, there are 24 possible mapping relationships between quaternary characters and bases, and the preset mapping relationship may be any one of these 24.
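The count of 24 is simply the number of permutations of four bases (4! = 24). The following minimal sketch enumerates the one-to-one mappings and applies one of them in both directions; the names `ALL_MAPPINGS`, `encode_quaternary` and `decode_bases` are illustrative and not taken from the application.

```python
from itertools import permutations

DIGITS = "0123"
BASES = "ATCG"

# Every one-to-one assignment of the four bases to the four quaternary characters.
ALL_MAPPINGS = [dict(zip(DIGITS, perm)) for perm in permutations(BASES)]


def encode_quaternary(quaternary: str, mapping: dict) -> str:
    """Map each quaternary character directly to its base."""
    return "".join(mapping[digit] for digit in quaternary)


def decode_bases(bases: str, mapping: dict) -> str:
    """Invert the mapping to recover the quaternary sequence."""
    inverse = {base: digit for digit, base in mapping.items()}
    return "".join(inverse[base] for base in bases)
```

Because the mapping is a per-character lookup, both directions run in time linear in the sequence length.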
Step 13: obtain, according to the base sequence, the DNA sequence storing the original data.
In one example, after the base sequence corresponding to the original data is obtained, the base sequence can be used directly as the DNA sequence storing the original data; the subsequent DNA synthesis process is then carried out to obtain DNA storing the original data, which is kept in storage. It should be noted that the DNA synthesis itself can be carried out using currently common methods and is not described further here.
It can be seen that, in some embodiments of the present application, the original data to be stored is converted into a first quaternary sequence, the first quaternary sequence is encoded directly into a base sequence according to the one-to-one mapping relationship between quaternary characters and bases, and the DNA sequence is obtained from the base sequence. Because encoding the first quaternary sequence into the base sequence requires no complex computation and is a direct character-by-character mapping, the algorithm complexity is reduced and the encoding speed is improved.
When the original data needs to be retrieved from the DNA, the DNA is first sequenced to read out the DNA sequence, and the DNA sequence is then decoded to obtain the original data. In related DNA decoding techniques, when the DNA being sequenced contains a run of consecutive A or T bases, the polymerase has difficulty resolving each individual A or T. The polymerization reaction for the sequence downstream of the run may therefore start after any one of the A or T bases in it, which disorders the sequencing result and produces overlapping peaks.
For this reason, in another embodiment of the present application, to avoid disordered sequencing results, the DNA sequence in step 13 can be obtained by inserting bases into the base sequence, so as to control the number of single-base repeats in the DNA sequence.
Specifically, the DNA sequence storing the original data can be obtained by inserting N bases at positions every M bases in the base sequence.
Wherein the first of the N bases differs from the base adjacent to it in the base sequence; the N-th of the N bases differs from the base adjacent to it in the base sequence; and bases at adjacent positions among the N bases differ from each other.
M and N are both integers greater than 0, and their specific values can be set according to the actual situation. For example, M may be set to 6.
In a possible embodiment of the present application, the G and C bases together account for 40% to 60% of the bases in the DNA sequence, so as to maximize sequencing efficiency.
In some embodiments of the present application, in the DNA sequence obtained by inserting N bases every M bases, the number of consecutive repeats of a single base does not exceed M, which effectively controls the number of single-base repeats and avoids disordered sequencing results.
It is worth mentioning that the DNA encoding method of the embodiments of the present application controls the number of single-base repeats without complex computation (such as conditional filtering), simply by inserting bases at intervals, which reduces the algorithm complexity and further improves the encoding speed.
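A minimal sketch of the interval insertion follows. The application does not state how each inserted base is chosen, so this sketch assumes a simple rule: take the first base in the fixed order A, T, C, G that satisfies the adjacency constraints. It also assumes that nothing is inserted after the final group. The function name `insert_spacers` is hypothetical.

```python
BASES = "ATCG"


def insert_spacers(payload: str, m: int, n: int) -> str:
    """Insert n bases after every m payload bases, never repeating a neighbour."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be integers greater than 0")
    out = []
    for start in range(0, len(payload), m):
        chunk = payload[start:start + m]
        out.append(chunk)
        following = payload[start + m:start + m + 1]  # "" after the last group
        if not following:
            break  # assumption: no insertion after the final group
        previous = chunk[-1]
        for k in range(n):
            forbidden = {previous}          # differ from the base just before
            if k == n - 1:
                forbidden.add(following)    # last inserted base differs from the next payload base
            choice = next(b for b in BASES if b not in forbidden)
            out.append(choice)
            previous = choice
    return "".join(out)
```

Since at most two of the four bases are forbidden at any position, a valid choice always exists. Every run of identical bases is confined to one M-base group, so no run exceeds M bases.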
The process of converting the original data to be stored into a quaternary sequence is described below by way of example with reference to specific embodiments.
In some embodiments of the present application, the original data to be stored can be converted into a quaternary sequence according to its format.
When the original data to be stored is text, as shown in Figure 2, step 11 (converting the original data to be stored into the first quaternary sequence) is implemented by the following steps:
Step 21: encode the text according to the preset character encoding table to obtain an encoded sequence.
The preset character encoding table is an encoding table used to encode text, such as the Unicode, UTF-8, ASCII, ISO8859-1, GB2312, or GBK encoding table. It can be understood that some embodiments of the present application do not limit the specific form of the preset character encoding table.
Step 22: convert the encoded sequence into the first quaternary sequence according to the mapping relationship between the encoded characters in the preset character encoding table and quaternary characters.
The encoded characters are the characters of the numeral system in which the encoded sequence is expressed. In some embodiments of the present application, to allow the encoded sequence to be converted quickly into the first quaternary sequence, the encoded sequence may be an information sequence expressed in hexadecimal, in which case the encoded characters are hexadecimal characters.
For example, suppose the text serving as the original data is: hello world!, the preset character encoding table is the Unicode encoding table, M is 5, N is 1, and the mapping relationship between quaternary characters and bases is 0-A, 1-T, 2-C, 3-G. Encoding with the Unicode encoding table yields the encoded sequence 00680065006c006c006f00200077006f0072006c00640021. According to the mapping relationship between the encoded characters of the Unicode encoding table and quaternary characters (each hexadecimal character corresponds to two quaternary characters, for example 6 to 12 and 8 to 20), the first quaternary sequence is:
000012200000121100001230000012300000123300000200000013130000123300001302000012300000121000000201
Applying the mapping relationship between quaternary characters and bases then yields the base sequence:
AAAATCCAAAAATCTTAAAATCGAAAAATCGAAAAATCGGAAAAACAAAAAATGTGAAAATCGGAAAATGACAAAATCGAAAAATCTAAAAAACAT
Finally, inserting bases yields the DNA sequence:
AAAATGCCAAATAATCTATAAAAGTCGAATAAATCTGAAAATATCGGTAAAAATCAAAATAATGTAGAAAAGTCGGATAAATGTACAAATATCGATAAAATGCTAAATAAACAGT
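The text-to-base part of this example can be reproduced with the short sketch below. It assumes that "Unicode encoding table" here means two bytes per character in big-endian order (Python's `utf-16-be` codec), which matches the hexadecimal string shown above; the function names are illustrative. The base-insertion step is left out because the application does not state how the inserted bases are chosen.

```python
QUATERNARY_TO_BASE = {"0": "A", "1": "T", "2": "C", "3": "G"}


def text_to_quaternary(text: str) -> str:
    """Encode text as hexadecimal, then write each hex character as two quaternary digits."""
    hex_string = text.encode("utf-16-be").hex()  # assumption: UTF-16 big-endian
    return "".join(f"{int(h, 16) // 4}{int(h, 16) % 4}" for h in hex_string)


def quaternary_to_bases(quaternary: str) -> str:
    """Apply the 0-A, 1-T, 2-C, 3-G mapping character by character."""
    return "".join(QUATERNARY_TO_BASE[digit] for digit in quaternary)


quaternary = text_to_quaternary("hello world!")
bases = quaternary_to_bases(quaternary)
print(quaternary[:8], bases[:8])  # 00001220 AAAATCCA (the character "h", U+0068)
```

Each of the 12 characters contributes 4 hexadecimal characters and therefore 8 quaternary digits, giving the 96-character sequences shown in the example.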
When the original data to be stored is a picture, as shown in Figure 3, step 11 above, converting the original data to be stored into the first quaternary sequence, is implemented through the following steps:
Step 31: Obtain the RGB value of each pixel in the picture.
In some embodiments of the present application, the RGB value (R for red, G for green, B for blue) of each pixel in the picture may be extracted by any RGB extraction method in common use.
Step 32: Sort the RGB values of the pixels according to a preset pixel order to obtain a decimal sequence.
In some embodiments of the present application, the preset pixel order may be set in advance according to the relative positions of the pixels in the picture. For example, when the picture consists of 4 pixels, the pixels may be numbered 0, 1, 2, 3 according to their positions in the picture, and the pixel order fixed as 0, 1, 2, 3. The RGB values of the 4 pixels are then arranged in that numbering order.
Step 33: Convert the decimal sequence into the first quaternary sequence according to the mapping between decimal characters and quaternary characters.
For example, suppose the picture consists of 4 pixels whose RGB values are (45, 254, 78), (2, 100, 23), (99, 65, 109), (68, 126, 94). Sorting these 4 RGB values according to the relative positions of the pixels in the picture gives a decimal sequence that can be written as 2 100 23 45 254 78 99 65 109 68 126 94. According to the mapping between decimal characters and quaternary characters, the converted quaternary values are 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132. Note that the maximum value in RGB mode is 255, which is 3333 in quaternary. So that every number occupies the same number of digits, numbers with fewer than 4 digits are padded with leading 0s while the original relative order is kept, which gives the final first quaternary sequence:
000212100113023133321032120310011231101013321132
With M set to 5, N set to 1, and the mapping between quaternary characters and bases set to 0-A, 1-T, 2-C, 3-G, applying the mapping gives the base sequence:
AAACTCTAATTGACGTGGGCTAGCTCAGTAATTCGTTATATGGCTTGC
Finally, inserting bases gives the DNA sequence:
AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGC
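Steps 31 to 33 and the picture example can be sketched as follows. The helper names are hypothetical, and the pixels are assumed to be supplied already in the preset pixel order; extracting RGB values from an actual image file is outside the sketch.

```python
# Illustrative sketch; helper names are assumptions, not from the application.

def to_quaternary(value: int, width: int = 4) -> str:
    """Convert a non-negative integer to a zero-padded quaternary string."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = []
    for _ in range(width):
        digits.append(str(value % 4))  # least significant digit first
        value //= 4
    return "".join(reversed(digits))


def pixels_to_quaternary(pixels) -> str:
    """Flatten (R, G, B) tuples, given in the preset order, into quaternary digits."""
    return "".join(to_quaternary(channel) for pixel in pixels for channel in pixel)


# Pixels in the order that yields the decimal sequence 2 100 23 45 254 78 ...
pixels = [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)]
quat_seq = pixels_to_quaternary(pixels)

# Mapping 0-A, 1-T, 2-C, 3-G applied with a translation table.
base_seq = quat_seq.translate(str.maketrans("0123", "ATCG"))
print(quat_seq)
print(base_seq)
```

The fixed width of 4 digits follows from 255 being 3333 in quaternary, so every 8-bit channel value fits without overflow.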
When the original data to be stored is audio, as shown in Figure 4, step 11 above, converting the original data to be stored into the first quaternary sequence, is implemented through the following steps:
Step 41: Sample the audio at a preset sampling rate to obtain multiple pieces of sampled data.
The preset sampling rate may be set according to the actual situation, for example 8 kHz, 11.025 kHz, 22.05 kHz, 16 kHz, 37.8 kHz, 44.1 kHz, 48 kHz, 96 kHz, or 192 kHz.
Step 42: Obtain the amplitude value of each piece of sampled data.
The amplitude value of a piece of sampled data may be its audio amplitude, which may be obtained by any audio amplitude extraction method in common use.
Step 43: Sort the obtained amplitude values according to the sampling order of the sampled data to obtain a decimal sequence.
In some embodiments of the present application, once the amplitude value of a piece of sampled data is obtained, it may be expressed as a decimal number. Correspondingly, the amplitude values may be sorted in the order in which the sampled data were obtained, which gives the decimal sequence.
Step 44: Convert the decimal sequence into the first quaternary sequence according to the mapping between decimal characters and quaternary characters.
In some embodiments of the present application, to simplify encoding and decoding, every amplitude value is expressed with the same number of quaternary digits when the decimal sequence is converted into the first quaternary sequence. For example, if 4 quaternary digits are used per amplitude value and the amplitude value of a piece of sampled data is 2 in decimal, that decimal 2 is converted into the quaternary number 0002.
For example, suppose the audio is sampled at 8 kHz and the amplitude values of the 16 pieces of sampled data, in sampling order, are 25, 76, 127, 255, 127, 76, 178, 204, 127, 153, 76, 51, 127, 153, 51, 153, which gives the decimal sequence 25 76 127 255 127 76 178 204 127 153 76 51 127 153 51 153. Expressing each decimal number with 4 quaternary digits, according to the mapping between decimal characters and quaternary characters, gives the first quaternary sequence:
0121103013333333133310302302303013332121103003031333212103032121
With M set to 5, N set to 1, and the mapping between quaternary characters and bases set to 0-A, 1-T, 2-C, 3-G, applying the mapping gives the base sequence:
ATCTTAGATGGGGGGGTGGGTAGACGACGAGATGGGCTCTTAGAAGAGTGGGCTCTAGAGCTCT
Finally, inserting bases gives the DNA sequence:
ATCTTCAGATGAGGGGGAGTGGGATAGACAGACGACGATGGAGCTCTATAGAACGAGTGAGGCTCATAGAGACTCT
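The audio example can be checked with a minimal sketch of steps 43 and 44. The function name and the `width` parameter (the number of quaternary digits per amplitude value) are illustrative assumptions. The amplitude values are assumed to be non-negative integers already listed in sampling order.

```python
# Illustrative sketch; names are assumptions, not from the application.

def amplitudes_to_quaternary(amplitudes, width: int = 4) -> str:
    """Encode each amplitude as a fixed-width quaternary number, in sampling order."""
    limit = 4 ** width
    parts = []
    for amplitude in amplitudes:
        if not 0 <= amplitude < limit:
            raise ValueError(f"{amplitude} does not fit in {width} quaternary digits")
        digits = ""
        for _ in range(width):
            digits = str(amplitude % 4) + digits  # prepend the next higher digit
            amplitude //= 4
        parts.append(digits)
    return "".join(parts)


samples = [25, 76, 127, 255, 127, 76, 178, 204, 127, 153, 76, 51, 127, 153, 51, 153]
quat_seq = amplitudes_to_quaternary(samples)
base_seq = quat_seq.translate(str.maketrans("0123", "ATCG"))  # 0-A, 1-T, 2-C, 3-G
print(quat_seq)
print(base_seq)
```

Using the same width for every sample is what lets a decoder later cut the sequence back into amplitude values without separators.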
In one possible embodiment of the present application, after the amplitude value of each piece of sampled data is obtained, the amplitude value is not expressed as a decimal number but is expressed directly with a fixed number of quaternary digits. The resulting quaternary numbers are then sorted according to the sampling order of the sampled data to obtain the first quaternary sequence. For example, if four quaternary digits are used per amplitude value, the maximum amplitude value may be assigned 3333 and the quaternary numbers for the other amplitude values computed in proportion to it, which gives the first quaternary sequence.
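One way to read this proportional embodiment is sketched below. The text does not say how fractional levels are rounded, so the round-half-up rule here is an assumption, as are the function name and the restriction to non-negative integer amplitudes.

```python
# Illustrative sketch; the rounding rule and names are assumptions.

def scale_to_quaternary(amplitudes, width: int = 4) -> str:
    """Map the largest amplitude to 33...3 and scale the others proportionally."""
    peak = max(amplitudes)
    if peak <= 0 or min(amplitudes) < 0:
        raise ValueError("expected non-negative amplitudes with a positive maximum")
    top = 4 ** width - 1  # 255 for width 4, i.e. quaternary 3333
    parts = []
    for amplitude in amplitudes:
        # Integer round-half-up of amplitude * top / peak (assumed rounding rule).
        level = (2 * amplitude * top + peak) // (2 * peak)
        digits = ""
        for _ in range(width):
            digits = str(level % 4) + digits
            level //= 4
        parts.append(digits)
    return "".join(parts)


print(scale_to_quaternary([0, 50, 100]))
```

With this scaling the absolute amplitude is lost unless the maximum value is stored separately, a point the passage does not address.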
When the original data to be stored is a video, as shown in Figure 5, the first quaternary sequence includes a second quaternary sequence corresponding to the audio and a third quaternary sequence corresponding to the picture. Step 11 above, converting the original data to be stored into the first quaternary sequence, is implemented through the following steps:
Step 51: Extract the audio of the video and each frame of the video.
In some embodiments of the present application, the audio and each frame of the video may be extracted by video and picture extraction methods in common use.
Step 52: Process the extracted audio to obtain the second quaternary sequence corresponding to the audio.
In some embodiments of the present application, the way the audio is processed to obtain the corresponding second quaternary sequence (the implementation shown in Figure 4) has been described in detail above and is not repeated here.
Step 53: Process each extracted frame to obtain the fourth quaternary sequence corresponding to that frame.
In some embodiments of the present application, the way a picture is processed to obtain the corresponding fourth quaternary sequence (the implementation shown in Figure 3) has been described in detail above and is not repeated here. Note that each frame is processed separately, so that each frame has its own fourth quaternary sequence.
Step 54: Sort the fourth quaternary sequences of the frames according to the playback order of the extracted frames in the video to obtain the third quaternary sequence corresponding to the picture.
Note that when the original data is a video, the audio and each frame of the video must be extracted, and the audio and the frames are then processed separately to obtain the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture.
That is, when a video is DNA-encoded, it is converted into the second quaternary sequence corresponding to the audio and the third quaternary sequence corresponding to the picture. These two quaternary sequences are then each encoded into a base sequence, which finally yields the DNA sequence of the audio (obtained from the second quaternary sequence) and the DNA sequence of the picture (obtained from the third quaternary sequence).
For example, suppose that after the audio of the video is extracted, it is sampled at 8 kHz and the amplitude values of the 5 pieces of sampled data, in sampling order, are 25, 76, 127, 255, 127, which gives the decimal sequence 25 76 127 255 127. Expressing each decimal number with 4 quaternary digits, according to the mapping between decimal characters and quaternary characters, gives the second quaternary sequence corresponding to the audio:
01211030133333331333
With M set to 5, N set to 1, and the mapping between quaternary characters and bases set to 0-A, 1-T, 2-C, 3-G, applying the mapping gives the base sequence:
ATCTTAGATGGGGGGGTGGG
Finally, inserting bases gives the DNA sequence of the audio:
ATCTTCAGATGAGGGGGAGTGGG
In this example, the video contains 3 frames, each consisting of 4 pixels. After the 3 frames are extracted, each is processed separately to obtain its fourth quaternary sequence. For the first frame, suppose the RGB values of the 4 pixels are (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94). Sorting the RGB values of these 4 pixels according to their relative positions in the picture gives a decimal sequence that can be written as 2 100 23 45 254 78 99 65 109 68 126 94, and according to the mapping between decimal characters and quaternary characters the converted quaternary values are 2 1210 113 231 3332 1032 1203 1001 1231 1010 1332 1132. Note that the maximum value in RGB mode is 255, which is 3333 in quaternary. So that every number occupies the same number of digits, numbers with fewer than 4 digits are padded with leading 0s while the original relative order is kept, which gives the final fourth quaternary sequence:
000212100113023133321032120310011231101013321132
Similarly, suppose the RGB values of the 4 pixels of the second frame are (34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255). Sorting them according to the relative positions of the pixels in the picture gives the decimal sequence 34 54 122 56 89 90 211 168 88 80 250 255, and the converted fourth quaternary sequence is:
020203121322032011211122310322201120110033223333
Similarly, suppose the RGB values of the 4 pixels of the third frame are (120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221). Sorting them according to the relative positions of the pixels in the picture gives the decimal sequence 120 70 92 126 21 24 127 75 66 185 5 221, and the converted fourth quaternary sequence is:
132010121130133201110120133310231002232100113131
After the fourth quaternary sequences of the 3 frames are obtained, they are sorted according to the playback order of the 3 frames in the video, which gives the third quaternary sequence corresponding to the picture:
000212100113023133321032120310011231101013321132020203121322032011211122310322201120110033223333132010121130133201110120133310231002232100113131
With M set to 5, N set to 1, and the mapping between quaternary characters and bases set to 0-A, 1-T, 2-C, 3-G, applying the mapping gives the base sequence:
AAACTCTAATTGACGTGGGCTAGCTCAGTAATTCGTTATATGGCTTGCACACAGTCTGCCAGCATTCTTTCCGTAGCCCATTCATTAAGGCCGGGGTGCATATCTTGATGGCATTTATCATGGGTACGTAACCGCTAATTGTGT
Finally, inserting bases gives the DNA sequence of the picture:
AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGCACACACGTCTGACCAGCTATTCTATTCCGATAGCCACATTCTATTAACGGCCGAGGGTGCATCATCTTAGATGGACATTTCATCATAGGGTATCGTAATCCGCTCAATTGATGT
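Steps 53 and 54 for the three-frame example can be sketched as follows. The helper names are hypothetical, and the frames are assumed to be given in playback order with their pixels already in the preset pixel order.

```python
# Illustrative sketch; helper names are assumptions, not from the application.

def to_quaternary(value: int, width: int = 4) -> str:
    """Convert a non-negative integer to a zero-padded quaternary string."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"{value} does not fit in {width} quaternary digits")
    digits = ""
    for _ in range(width):
        digits = str(value % 4) + digits
        value //= 4
    return digits


def frame_to_quaternary(pixels) -> str:
    """Fourth quaternary sequence of one frame (step 53)."""
    return "".join(to_quaternary(c) for pixel in pixels for c in pixel)


def frames_to_quaternary(frames) -> str:
    """Third quaternary sequence: per-frame sequences joined in playback order (step 54)."""
    return "".join(frame_to_quaternary(frame) for frame in frames)


frames = [
    [(2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94)],
    [(34, 54, 122), (56, 89, 90), (211, 168, 88), (80, 250, 255)],
    [(120, 70, 92), (126, 21, 24), (127, 75, 66), (185, 5, 221)],
]
third_seq = frames_to_quaternary(frames)
print(third_seq)
```

Since every frame contributes the same number of digits (4 pixels x 3 channels x 4 digits = 48 here), frame boundaries can be recovered from the frame size alone.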
The DNA decoding method provided by the present application is described below by way of example with reference to specific embodiments.
As shown in Figure 6, an embodiment of the present application provides a DNA decoding method that includes the following steps:
Step 61: Determine the DNA sequence to be decoded.
When the original data needs to be retrieved from DNA in which it is stored, the DNA must first be sequenced and its DNA sequence read out, which determines the DNA sequence to be decoded. The DNA sequencing itself may be carried out by methods in common use and is not described further here. As noted above, if the stored original data is a video, two DNA sequences are needed to store it, whereas text, a picture, or audio needs only one. Accordingly, at decoding time there may be one DNA sequence to decode or several.
Step 62: Obtain the base sequence from the DNA sequence.
In one example, once the DNA sequence to be decoded is determined, the DNA sequence may be used directly as the base sequence.
In another embodiment of the present application, bases were inserted into the base sequence during DNA encoding to avoid disordered sequencing results. During decoding, the inserted bases therefore need to be removed from the DNA sequence to obtain the base sequence.
Corresponding to the base insertion, in some embodiments of the present application the base sequence may be obtained by removing N bases from the DNA sequence after every M bases.
M and N are both integers greater than 0, and their specific values may be set according to the actual situation.
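The removal in step 62 can be sketched as a simple slicing loop: keep M bases, skip N, and repeat. The function name is an illustrative assumption. The two DNA sequences used for checking are the picture and audio examples given earlier with M = 5 and N = 1.

```python
# Illustrative sketch; the function name is an assumption.

def remove_inserted_bases(dna: str, m: int, n: int) -> str:
    """Keep M bases, drop the next N, and repeat until the sequence ends."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must both be integers greater than 0")
    step = m + n
    # Each slice keeps the first M bases of a block of M + N bases;
    # a shorter final block is kept as it is (up to M bases).
    return "".join(dna[i:i + m] for i in range(0, len(dna), step))


picture_dna = "AAACTACTAATATGACGATGGGCATAGCTACAGTACATTCGATTATACTGGCTATGC"
audio_dna = "ATCTTCAGATGAGGGGGAGTGGG"
print(remove_inserted_bases(picture_dna, 5, 1))
print(remove_inserted_bases(audio_dna, 5, 1))
```

The decoder only needs M and N, not the identity of the inserted bases, because removal is purely positional.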
Step 63: Decode the base sequence into the first quaternary sequence according to the preset mapping between quaternary characters and bases.
The quaternary characters are 0, 1, 2, and 3, and the bases are the four natural bases A, T, C, and G. In some embodiments of the present application, the four bases A, T, C, G are mapped one-to-one onto 0, 1, 2, 3. There are 24 such mappings between quaternary characters and bases, and the preset mapping may be any one of them.
In some embodiments of the present application, decoding the base sequence into the first quaternary sequence requires no complex computation. The base sequence is decoded through the one-to-one mapping between quaternary characters and bases, which greatly reduces algorithmic complexity and increases decoding speed.
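The count of 24 mappings is the number of permutations of four bases (4! = 24), and step 63 is then a per-character lookup. A minimal sketch, with illustrative names and the mapping 0-A, 1-T, 2-C, 3-G from the earlier examples taken as the preset:

```python
# Illustrative sketch; names are assumptions, not from the application.
from itertools import permutations

QUATERNARY = "0123"
BASES = "ATCG"

# Every one-to-one assignment of the four bases to the four quaternary characters.
ALL_MAPPINGS = [dict(zip(perm, QUATERNARY)) for perm in permutations(BASES)]

# Preset mapping used in the examples: A-0, T-1, C-2, G-3.
PRESET = {"A": "0", "T": "1", "C": "2", "G": "3"}


def bases_to_quaternary(base_seq: str, mapping=PRESET) -> str:
    """Decode a base sequence into quaternary digits by table lookup."""
    return "".join(mapping[base] for base in base_seq)


print(len(ALL_MAPPINGS))
print(bases_to_quaternary("ATCTTAGATGGGGGGGTGGG"))
```

Encoder and decoder must agree on the same one of the 24 mappings, since each produces a different quaternary sequence from the same bases.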
Step 64: Convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
In some embodiments of the present application, the way the first quaternary sequence is converted into the original data depends on the format of the original data. The specific conversion processes are described in detail below.
Note that when there are several DNA sequences to be decoded, steps 62 to 64 are performed separately for each of them to obtain the original data corresponding to each DNA sequence.
It is worth noting that, mirroring the gain in encoding speed, decoding the base sequence into the first quaternary sequence requires no complex computation. The base sequence is decoded through the one-to-one mapping between quaternary characters and bases, which reduces algorithmic complexity and increases decoding speed.
The specific implementation of step 64 is described below by way of example with reference to specific embodiments.
When the original data stored in the DNA sequence is text, step 64, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, is implemented as follows: convert the first quaternary sequence into an encoded sequence according to the mapping between the encoding characters of the preset character encoding table and quaternary characters, and convert the encoded sequence into text according to the preset character encoding table.
The preset character encoding table is a table used to encode text, such as the Unicode, UTF-8, ASCII, ISO8859-1, GB2312, or GBK encoding table. It should be understood that some embodiments of the present application do not limit the specific form of the preset character encoding table.
The encoding characters are the digits of the numeral base of the encoded sequence that was obtained with the preset character encoding table when the text was DNA-encoded. During decoding, the mapping between these encoding characters and quaternary characters can therefore be used to convert the first quaternary sequence into the encoded sequence, which is then converted into text with the preset character encoding table.
Corresponding to the DNA encoding, in some embodiments of the present application the encoded sequence may be an information sequence expressed in hexadecimal. Correspondingly, the encoding characters are hexadecimal characters.
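For the hexadecimal case, text decoding can be sketched as two grouping steps: every two quaternary digits form one hexadecimal digit, and every four hexadecimal digits form one character. The function names are illustrative assumptions. The 4-digit grouping assumes the same fixed-width Unicode code points used in the "hello world!" encoding example.

```python
# Illustrative sketch; names and the 4-hex-digit grouping are assumptions.

def quaternary_to_hex(quat_seq: str) -> str:
    """Turn each pair of quaternary digits into one hexadecimal digit."""
    if len(quat_seq) % 2:
        raise ValueError("quaternary sequence length must be even")
    return "".join(
        format(int(quat_seq[i:i + 2], 4), "x") for i in range(0, len(quat_seq), 2)
    )


def hex_to_text(hex_seq: str) -> str:
    """Turn each group of four hexadecimal digits into one character."""
    if len(hex_seq) % 4:
        raise ValueError("hexadecimal sequence length must be a multiple of 4")
    return "".join(
        chr(int(hex_seq[i:i + 4], 16)) for i in range(0, len(hex_seq), 4)
    )


quat_seq = (
    "000012200000121100001230000012300000123300000200"
    "000013130000123300001302000012300000121000000201"
)
hex_seq = quaternary_to_hex(quat_seq)
print(hex_seq)
print(hex_to_text(hex_seq))
```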
When the original data stored in the DNA sequence is a picture, step 64, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, is implemented as follows: first, convert the first quaternary sequence into a decimal sequence according to the mapping between quaternary characters and decimal characters; then determine the RGB values of multiple pixels from the decimal sequence; finally, generate the picture from the RGB values of the pixels and the preset pixel order.
Corresponding to the DNA encoding, in some embodiments of the present application every four quaternary digits may be converted into one decimal number when the first quaternary sequence is converted into the decimal sequence. For example, if the first quaternary sequence is 0002 1210 0113 0231 3332 1032 1203 1001 1231 1010 1332 1132, the resulting decimal sequence is 2 100 23 45 254 78 99 65 109 68 126 94. When the RGB values of the pixels are determined, every three decimal numbers in the decimal sequence may be taken as the RGB value of one pixel. For example, if the decimal sequence is 2 100 23 45 254 78 99 65 109 68 126 94, the RGB values of the pixels are (2, 100, 23), (45, 254, 78), (99, 65, 109), (68, 126, 94). When the picture is generated, the RGB values of the pixels may be placed according to the relative positions of the pixels in the picture to obtain the picture.
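The picture decoding just described can be sketched as follows. The helper names are hypothetical. The final arrangement into rows assumes a row-major pixel order and a known picture width, neither of which is fixed by the text.

```python
# Illustrative sketch; names and the row-major layout are assumptions.

def quaternary_to_decimals(quat_seq: str, width: int = 4):
    """Read every `width` quaternary digits as one decimal number."""
    if len(quat_seq) % width:
        raise ValueError("sequence length must be a multiple of the digit width")
    return [int(quat_seq[i:i + width], 4) for i in range(0, len(quat_seq), width)]


def decimals_to_pixels(decimals):
    """Group every three decimal numbers into one (R, G, B) tuple."""
    if len(decimals) % 3:
        raise ValueError("decimal sequence length must be a multiple of 3")
    return [tuple(decimals[i:i + 3]) for i in range(0, len(decimals), 3)]


def pixels_to_rows(pixels, picture_width: int):
    """Arrange pixels into rows, assuming a row-major preset pixel order."""
    return [pixels[i:i + picture_width] for i in range(0, len(pixels), picture_width)]


quat_seq = "000212100113023133321032120310011231101013321132"
decimals = quaternary_to_decimals(quat_seq)
pixels = decimals_to_pixels(decimals)
rows = pixels_to_rows(pixels, picture_width=2)
print(decimals)
print(rows)
```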
When the original data stored in the DNA sequence is audio, step 64, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, is implemented as follows: first, convert the first quaternary sequence into a decimal sequence according to the mapping between quaternary characters and decimal characters; then determine the amplitude values of multiple pieces of sampled data from the decimal sequence; next, determine the total duration of the audio stored in the DNA sequence from the preset sampling rate and the number of pieces of sampled data; finally, distribute the determined amplitude values evenly over the total duration to obtain the audio.
Corresponding to the DNA encoding, every T quaternary digits may be converted into one decimal number when the first quaternary sequence is converted into the decimal sequence, where T is the number of quaternary digits used for each decimal number when the decimal sequence was converted into the first quaternary sequence during DNA encoding. When the amplitude values are determined, each decimal number in the decimal sequence may be taken as the amplitude value of one piece of sampled data, with the amplitude values ordered as the corresponding decimal numbers are ordered in the decimal sequence. When the total duration of the audio is determined, the ratio of the number of amplitude values (that is, the number of pieces of sampled data) to the preset sampling rate may be taken as the total duration. Finally, the ordered amplitude values may be distributed evenly over the total duration to obtain the audio.
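The audio decoding just described amounts to cutting the sequence into groups of T digits, reading each group as an amplitude, and spacing the amplitudes one sample period apart. A minimal sketch with illustrative names follows. `Fraction` is used only to keep the timing arithmetic exact, and writing the samples to an audio file is left out.

```python
# Illustrative sketch; names are assumptions, not from the application.
from fractions import Fraction


def decode_audio(quat_seq: str, sample_rate: int, t: int = 4):
    """Return (amplitudes, total duration in seconds, sample times in seconds)."""
    if len(quat_seq) % t:
        raise ValueError("sequence length must be a multiple of T")
    # Every T quaternary digits form one amplitude value, kept in sequence order.
    amplitudes = [int(quat_seq[i:i + t], 4) for i in range(0, len(quat_seq), t)]
    # Total duration = number of samples / sampling rate.
    duration = Fraction(len(amplitudes), sample_rate)
    # Evenly distributing the samples over the duration places sample i at i / rate.
    times = [Fraction(i, sample_rate) for i in range(len(amplitudes))]
    return amplitudes, duration, times


amplitudes, duration, times = decode_audio("01211030133333331333", sample_rate=8000)
print(amplitudes)
print(duration)
```

For the five-sample example at 8 kHz the total duration is 5/8000 of a second, so the decoder must know the sampling rate and T in advance.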
When the DNA sequence to be decoded includes an audio DNA sequence and a picture DNA sequence, the DNA sequence to be decoded corresponds to a video. Specifically, the original data stored in the audio DNA sequence is the audio of the video, and the original data stored in the picture DNA sequence is the multiple frames of pictures of the video. It should be noted that when steps 62 to 63 are performed on the audio DNA sequence, the first quaternary sequence obtained is the second quaternary sequence corresponding to the audio DNA sequence; when steps 62 to 63 are performed on the picture DNA sequence, the first quaternary sequence obtained is the third quaternary sequence corresponding to the picture DNA sequence. Accordingly, step 64 above, converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, is implemented as follows: the second quaternary sequence corresponding to the audio DNA sequence is converted to obtain the audio, and the third quaternary sequence corresponding to the picture DNA sequence is converted to obtain the multiple frames of pictures.
In some embodiments of the present application, the specific implementation of converting the second quaternary sequence corresponding to the audio DNA sequence (that is, a DNA sequence whose stored original data is audio) to obtain the audio has been described in detail above and is not repeated here.
The specific implementation of converting the third quaternary sequence corresponding to the picture DNA sequence to obtain multiple frames of pictures is similar to the implementation, described above, of converting the first quaternary sequence corresponding to a DNA sequence whose stored original data is a picture. The difference is that, when generating the pictures, the RGB values of the pixels are processed according to the relative positions of the pixels within each picture, so that multiple frames of pictures are obtained. It should be noted that multiple frames can be decoded because the pixels belonging to each frame, and the relative positions of the pixels within each frame, were recorded during DNA encoding.
When the DNA sequence to be decoded corresponds to a video, after the first quaternary sequence is converted to obtain the original data corresponding to the DNA sequence, the method further includes: synthesizing the multiple frames of pictures and the audio to obtain the video. Specifically, the total duration of the audio corresponding to the audio DNA sequence may be determined first; the decoded frames are then sorted according to the order in which the RGB values of their pixels appear in the decimal sequence; the sorted frames are distributed evenly over the total duration; and the audio is combined with the pictures to obtain the video.
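The even distribution of sorted frames over the audio duration can be sketched as below. This is a minimal illustration under stated assumptions: the function name and the choice to express each frame as a (start, end) display interval in seconds are hypothetical, and the actual muxing of audio and pictures into a video container is outside the sketch.

```python
def frame_schedule(num_frames: int, num_samples: int, sample_rate: float):
    """Assign each sorted frame an equal slice of the audio's total duration."""
    if num_frames <= 0:
        raise ValueError("at least one frame is required")
    # Total duration of the decoded audio, as in the audio decoding step.
    total_duration = num_samples / sample_rate
    per_frame = total_duration / num_frames
    # Frame i is shown from i * per_frame up to (i + 1) * per_frame.
    return [(i * per_frame, (i + 1) * per_frame) for i in range(num_frames)]
```

With 4 frames and 8 audio samples at a sampling rate of 4 samples per second, each frame would be displayed for 0.5 seconds.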
In summary, the DNA encoding and decoding methods provided by the embodiments of the present application have the following effects:
First, when encoding a quaternary sequence into a base sequence, no complex computation is required; the quaternary sequence is encoded directly into a base sequence according to the mapping relationship between quaternary characters and bases, which reduces algorithm complexity and increases encoding speed;
Second, when controlling the number of single-base repeats, no complex computation (such as conditional filtering) is required; the number of single-base repeats is controlled by inserting bases at intervals, which reduces algorithm complexity and further increases encoding speed;
Third, DNA storage of text, pictures, audio, and video can be achieved.
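The direct mapping named in the first effect can be illustrated with a lookup table. The particular assignment of digits to bases below (0 to A, 1 to C, 2 to G, 3 to T) is an assumption chosen for the example; the application only requires that some preset mapping between quaternary characters and bases exists.

```python
# Hypothetical preset mapping; any one-to-one assignment would work the same way.
QUAT_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}
BASE_TO_QUAT = {base: digit for digit, base in QUAT_TO_BASE.items()}


def encode_bases(quaternary: str) -> str:
    """Map each quaternary character directly to one base."""
    return "".join(QUAT_TO_BASE[digit] for digit in quaternary)


def decode_bases(bases: str) -> str:
    """Map each base back to its quaternary character."""
    return "".join(BASE_TO_QUAT[base] for base in bases)
```

Because each digit is translated by a single table lookup, the cost is linear in the length of the sequence, which is consistent with the low complexity claimed above.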
The DNA encoding device, DNA decoding device, terminal device, storage medium, and product provided by the present application are described below by way of example with reference to specific embodiments.
As shown in Figure 7, an embodiment of the present application provides a DNA encoding device. The DNA encoding device 700 includes:
a first conversion module 701, configured to convert the original data to be stored into a first quaternary sequence;
an encoding module 702, configured to encode the first quaternary sequence into a base sequence according to a preset mapping relationship between quaternary characters and bases;
a generation module 703, configured to obtain, from the base sequence, a DNA sequence storing the original data.
Optionally, the first conversion module 701 includes:
an encoding unit, configured to, when the original data to be stored is text, encode the text according to a preset character encoding table to obtain a coding sequence;
a first conversion unit, configured to convert the coding sequence into the first quaternary sequence according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters.
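A minimal sketch of the text path follows. It assumes, purely for illustration, that UTF-8 serves as the preset character encoding table and that each encoded byte is mapped to four quaternary digits (two bits per digit); neither choice is stated in this section of the application.

```python
def text_to_quaternary(text: str) -> str:
    """Encode text to bytes, then write each byte as four base-4 digits."""
    digits = []
    for byte in text.encode("utf-8"):
        # Take the byte two bits at a time, most significant pair first.
        digits.append("".join(str((byte >> shift) & 0b11) for shift in (6, 4, 2, 0)))
    return "".join(digits)


def quaternary_to_text(quaternary: str) -> str:
    """Inverse of text_to_quaternary under the same assumptions."""
    if len(quaternary) % 4 != 0:
        raise ValueError("expected four quaternary digits per byte")
    data = bytes(int(quaternary[i:i + 4], 4) for i in range(0, len(quaternary), 4))
    return data.decode("utf-8")
```

For example, the letter A has code 65, which is 1001 in base 4, so it becomes the four digits 1001.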
Optionally, the first conversion module 701 includes:
a first acquisition unit, configured to, when the original data to be stored is a picture, acquire the RGB value of each pixel in the picture;
a first sorting unit, configured to sort the RGB values of the pixels according to a preset pixel arrangement order to obtain a decimal sequence;
a second conversion unit, configured to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
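The picture path can be sketched as below. The sketch assumes a row-major scan as the preset pixel arrangement order, pixels given as nested lists of (R, G, B) tuples with 8-bit channels, and a fixed width of T = 4 quaternary digits per decimal number; these are illustrative assumptions rather than details from the application.

```python
def image_to_decimals(pixels: list[list[tuple[int, int, int]]]) -> list[int]:
    """Flatten pixels row by row into one decimal sequence R, G, B, R, G, B, ..."""
    return [value for row in pixels for rgb in row for value in rgb]


def decimals_to_quaternary(values: list[int], t: int = 4) -> str:
    """Write each decimal number as exactly T quaternary digits."""
    out = []
    for value in values:
        if not 0 <= value < 4 ** t:
            raise ValueError(f"{value} does not fit in {t} quaternary digits")
        digits = []
        for _ in range(t):
            digits.append(str(value % 4))  # least significant digit first
            value //= 4
        out.append("".join(reversed(digits)))
    return "".join(out)
```

A fixed digit width keeps the sequence self-delimiting, so the decoder can regroup every T digits without separators.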
Optionally, the first conversion module 701 includes:
a sampling unit, configured to, when the original data to be stored is audio, sample the audio at a preset sampling rate to obtain multiple pieces of sampled data;
a second acquisition unit, configured to acquire the amplitude value of each piece of sampled data;
a second sorting unit, configured to sort the acquired amplitude values according to the sampling order of the multiple pieces of sampled data to obtain a decimal sequence;
a third conversion unit, configured to convert the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
Optionally, the first quaternary sequence includes a second quaternary sequence corresponding to the audio and a third quaternary sequence corresponding to the picture; the first conversion module 701 includes:
an extraction unit, configured to, when the original data to be stored is a video, extract the audio of the video and each frame of picture of the video;
a first processing unit, configured to process the extracted audio to obtain the second quaternary sequence corresponding to the audio;
a second processing unit, configured to process each extracted frame of picture to obtain a fourth quaternary sequence corresponding to each extracted frame;
a third sorting unit, configured to sort the fourth quaternary sequences corresponding to the frames according to the playback order of the extracted frames in the video, to obtain the third quaternary sequence corresponding to the picture.
Optionally, the generation module 703 includes:
a generation unit, configured to insert N bases at every interval of M bases in the base sequence to obtain the DNA sequence storing the original data.
Here, the first of the N inserted bases differs from the base adjacent to it in the base sequence; the N-th of the N inserted bases differs from the base adjacent to it in the base sequence; any two adjacent bases among the N inserted bases differ from each other; and M and N are both integers greater than 0.
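One way to satisfy the three adjacency constraints above is a greedy choice of each inserted base, sketched below. The function name, the candidate order A, C, G, T, and the decision not to insert anything after the final group of bases are assumptions of this sketch; the application does not prescribe how the inserted bases are chosen.

```python
BASES = "ACGT"


def insert_spacers(seq: str, m: int, n: int) -> str:
    """After every M payload bases (except the last group), insert N bases."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be integers greater than 0")
    out = []
    for start in range(0, len(seq), m):
        out.extend(seq[start:start + m])
        next_start = start + m
        if next_start >= len(seq):
            break  # assumption: no spacer after the final group
        following = seq[next_start]
        for j in range(n):
            # Each inserted base must differ from the base just before it.
            banned = {out[-1]}
            if j == n - 1:
                # The last inserted base must also differ from the next payload base.
                banned.add(following)
            out.append(next(b for b in BASES if b not in banned))
    return "".join(out)
```

Since at most two of the four bases are excluded at any position, a valid choice always exists. With M = 3 and N = 1, the run AAAAAA is broken up as AAACAAA.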
It should be noted that the information exchange between, and the execution processes of, the units of the above DNA encoding device are based on the same concept as the DNA encoding method in the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; details are not repeated here.
As shown in Figure 8, an embodiment of the present application provides a DNA decoding device. The DNA decoding device 800 includes:
a determination module 801, configured to determine the DNA sequence to be decoded;
a processing module 802, configured to obtain a base sequence from the DNA sequence;
a decoding module 803, configured to decode the base sequence into a first quaternary sequence according to a preset mapping relationship between quaternary characters and bases;
a second conversion module 804, configured to convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
Optionally, the second conversion module 804 includes:
a fourth conversion unit, configured to, when the original data stored in the DNA sequence is text, convert the first quaternary sequence into a coding sequence according to the mapping relationship between the coded characters in a preset character encoding table and quaternary characters;
a fifth conversion unit, configured to convert the coding sequence into text according to the preset character encoding table.
Optionally, the second conversion module 804 includes:
a sixth conversion unit, configured to, when the original data stored in the DNA sequence is a picture, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
a first determination unit, configured to determine the RGB values of multiple pixels according to the decimal sequence;
a generation unit, configured to generate the picture according to the RGB values of the multiple pixels and a preset pixel arrangement order.
Optionally, the second conversion module 804 includes:
a seventh conversion unit, configured to, when the original data stored in the DNA sequence is audio, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
a second determination unit, configured to determine the amplitude values of multiple pieces of sampled data according to the decimal sequence;
a third determination unit, configured to determine the total duration of the audio stored in the DNA sequence according to the preset sampling rate and the number of pieces of sampled data;
a distribution unit, configured to distribute the determined amplitude values evenly over the total duration to obtain the audio.
Optionally, the DNA sequence to be decoded includes an audio DNA sequence and a picture DNA sequence; the first quaternary sequence includes a second quaternary sequence corresponding to the audio DNA sequence and a third quaternary sequence corresponding to the picture DNA sequence; the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multiple frames of pictures;
the second conversion module 804 includes:
an eighth conversion unit, configured to convert the second quaternary sequence corresponding to the audio DNA sequence to obtain the audio;
a ninth conversion unit, configured to convert the third quaternary sequence corresponding to the picture DNA sequence to obtain the multiple frames of pictures;
the DNA decoding device 800 further includes:
a synthesis module, configured to synthesize the multiple frames of pictures and the audio to obtain a video.
Optionally, the processing module 802 is specifically configured to remove N bases at every interval of M bases in the DNA sequence to obtain the base sequence, where M and N are both integers greater than 0.
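The removal step performed by the processing module can be sketched as follows. This is only an illustration, and it assumes that the encoder inserted N bases after every complete group of M payload bases and nothing after the final group; the function name is hypothetical.

```python
def remove_spacers(dna: str, m: int, n: int) -> str:
    """Keep M bases, skip the N inserted bases that follow, and repeat."""
    if m <= 0 or n <= 0:
        raise ValueError("M and N must be integers greater than 0")
    payload = []
    i = 0
    while i < len(dna):
        payload.append(dna[i:i + m])  # M payload bases (fewer in the last group)
        i += m + n                    # jump over the N inserted bases
    return "".join(payload)
```

The same M and N used during encoding must be supplied here; the inserted bases carry no data, so they are discarded without inspection.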
It should be noted that the information exchange between, and the execution processes of, the units of the above DNA decoding device are based on the same concept as the DNA decoding method in the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; details are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
As shown in Figure 9, an embodiment of the present application provides a terminal device. The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in Figure 9), a memory D101, and a computer program D102 stored in the memory D101 and executable on the at least one processor D100. When the processor D100 executes the computer program D102, the steps of any of the above method embodiments are implemented.
The processor D100 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory D101 may be an internal storage unit of the terminal device D10, such as a hard disk or memory of the terminal device D10. In other embodiments, the memory D101 may be an external storage device of the terminal device D10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device D10. Further, the memory D101 may include both an internal storage unit and an external storage device of the terminal device D10. The memory D101 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory D101 may also be used to temporarily store data that has been output or is to be output.
It should be noted that the information exchange between, and the execution processes of, the above devices and units are based on the same concept as the method embodiments of the present application. For their specific functions and technical effects, refer to the method embodiments; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps of the above method embodiments are implemented.
An embodiment of the present application provides a computer program product. When the computer program product runs on a terminal device, the terminal device implements the steps of the above method embodiments.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the above method embodiments may be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the DNA encoding device, DNA decoding device, or terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Examples include a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, refer to the relevant descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (16)

  1. A DNA encoding method, characterized by comprising:
    converting the original data to be stored into a first quaternary sequence;
    encoding the first quaternary sequence into a base sequence according to a preset mapping relationship between quaternary characters and bases;
    obtaining, from the base sequence, a DNA sequence storing the original data.
  2. The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence comprises:
    when the original data to be stored is text, encoding the text according to a preset character encoding table to obtain a coding sequence;
    converting the coding sequence into the first quaternary sequence according to the mapping relationship between the coded characters in the preset character encoding table and quaternary characters.
  3. The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence comprises:
    when the original data to be stored is a picture, acquiring the RGB value of each pixel in the picture;
    sorting the RGB values of the pixels according to a preset pixel arrangement order to obtain a decimal sequence;
    converting the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  4. The method according to claim 1, characterized in that converting the original data to be stored into a first quaternary sequence comprises:
    when the original data to be stored is audio, sampling the audio at a preset sampling rate to obtain multiple pieces of sampled data;
    acquiring the amplitude value of each piece of sampled data;
    sorting the acquired amplitude values according to the sampling order of the multiple pieces of sampled data to obtain a decimal sequence;
    converting the decimal sequence into the first quaternary sequence according to the mapping relationship between decimal characters and quaternary characters.
  5. The method according to claim 1, characterized in that the first quaternary sequence includes a second quaternary sequence corresponding to the audio and a third quaternary sequence corresponding to the picture; converting the original data to be stored into a first quaternary sequence comprises:
    when the original data to be stored is a video, extracting the audio of the video and each frame of picture of the video;
    processing the extracted audio to obtain the second quaternary sequence corresponding to the audio;
    processing each extracted frame of picture to obtain a fourth quaternary sequence corresponding to each extracted frame;
    sorting the fourth quaternary sequences corresponding to the frames according to the playback order of the extracted frames in the video, to obtain the third quaternary sequence corresponding to the picture.
  6. The method according to claim 1, characterized in that obtaining, from the base sequence, a DNA sequence storing the original data comprises:
    inserting N bases at every interval of M bases in the base sequence to obtain the DNA sequence storing the original data;
    wherein the first of the N inserted bases differs from the base adjacent to it in the base sequence; the N-th of the N inserted bases differs from the base adjacent to it in the base sequence; any two adjacent bases among the N inserted bases differ from each other; and M and N are both integers greater than 0.
  7. 一种DNA解码方法,其特征在于,包括:A DNA decoding method, characterized by including:
    确定需解码的DNA序列;Determine the DNA sequence to be decoded;
    根据所述DNA序列得到碱基序列;Obtain the base sequence according to the DNA sequence;
    根据预设的四进制字符与碱基的映射关系,对所述碱基序列进行解码得到第一四进制序列;According to the preset mapping relationship between quaternary characters and bases, decode the base sequence to obtain the first quaternary sequence;
    对所述第一四进制序列进行转换,得到所述DNA序列对应的原始数据。Convert the first quaternary sequence to obtain original data corresponding to the DNA sequence.
  8. 根据权利要求7所述的方法,其特征在于,所述对所述第一四进制序列进行转换,得到所述DNA序列对应的原始数据,包括:The method according to claim 7, characterized in that said converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:
    当所述DNA序列所存储的原始数据为文字时,根据预设字符编码表中的编码字符与四进制字符之间的映射关系,将所述第一四进制序列进行转换为编码序列;When the original data stored in the DNA sequence is text, the first quaternary sequence is converted into a coding sequence according to the mapping relationship between the coded characters and quaternary characters in the preset character encoding table;
    根据所述预设字符编码表将所述编码序列转换为文字。Convert the encoding sequence into text according to the preset character encoding table.
  9. 根据权利要求7所述的方法,其特征在于,所述对所述第一四进制序列进行转换,得到所述DNA序列对应的原始数据,包括:The method according to claim 7, characterized in that said converting the first quaternary sequence to obtain original data corresponding to the DNA sequence includes:
    当所述DNA序列所存储的原始数据为图片时,根据四进制字符与十进制字符之间的映射关系,将所述第一四进制序列转换为十进制序列;When the original data stored in the DNA sequence is a picture, convert the first quaternary sequence into a decimal sequence according to the mapping relationship between quaternary characters and decimal characters;
    根据所述十进制序列,确定多个像素点的RGB值;Determine the RGB values of multiple pixels according to the decimal sequence;
    根据所述多个像素点的RGB值以及预设的像素点排列顺序,生成图片。A picture is generated according to the RGB values of the plurality of pixels and the preset arrangement order of the pixels.
  10. The method according to claim 7, wherein converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence comprises:
    when the original data stored in the DNA sequence is audio, converting the first quaternary sequence into a decimal sequence according to the mapping between quaternary characters and decimal characters;
    determining the amplitude values of a plurality of samples according to the decimal sequence;
    determining the total duration of the audio stored in the DNA sequence according to a preset sampling rate and the number of samples;
    distributing the determined amplitude values evenly over the total duration to obtain the audio.
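The audio-decoding steps can be sketched as follows. The claim does not fix the sample width; this sketch assumes four quaternary digits per amplitude (values 0 to 255), and the function name and the `digits_per_sample` parameter are hypothetical. The total duration is the number of samples divided by the sampling rate, and sample i is placed at time i / sample_rate.

```python
def quaternary_to_audio(quat: str, sample_rate: int, digits_per_sample: int = 4):
    """Return (total_duration_seconds, [(time_seconds, amplitude), ...]).

    Assumption: each amplitude is stored as `digits_per_sample`
    quaternary digits.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if len(quat) % digits_per_sample != 0:
        raise ValueError("quaternary length does not match the sample width")
    # Quaternary sequence -> decimal sequence -> amplitude values.
    amplitudes = [
        int(quat[i:i + digits_per_sample], 4)
        for i in range(0, len(quat), digits_per_sample)
    ]
    # Total duration from the preset sampling rate and the sample count.
    duration = len(amplitudes) / sample_rate
    # Spread the amplitudes evenly over that duration.
    timeline = [(i / sample_rate, a) for i, a in enumerate(amplitudes)]
    return duration, timeline
```

With four samples at a sampling rate of 4 Hz, for instance, the result spans one second with samples at 0, 0.25, 0.5 and 0.75 s.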
  11. The method according to claim 7, wherein the DNA sequence to be decoded comprises an audio DNA sequence and a picture DNA sequence, the first quaternary sequence comprises a second quaternary sequence corresponding to the audio DNA sequence and a third quaternary sequence corresponding to the picture DNA sequence, the original data stored in the audio DNA sequence is audio, and the original data stored in the picture DNA sequence is multiple frames of pictures;
    converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence comprises:
    converting the second quaternary sequence corresponding to the audio DNA sequence to obtain the audio;
    converting the third quaternary sequence corresponding to the picture DNA sequence to obtain the multiple frames of pictures;
    after converting the first quaternary sequence to obtain the original data corresponding to the DNA sequence, the method further comprises:
    synthesizing the multiple frames of pictures and the audio to obtain a video.
  12. The method according to claim 7, wherein obtaining the base sequence according to the DNA sequence comprises:
    in the DNA sequence, removing N bases at positions spaced every M bases to obtain the base sequence;
    wherein M and N are both integers greater than 0.
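One possible reading of this step is sketched below: the DNA sequence is treated as repeating blocks of M payload bases followed by N inserted bases, and the inserted bases are dropped. The block layout and the name `strip_inserted_bases` are assumptions; the claim itself only says that N bases are removed at positions spaced every M bases.

```python
def strip_inserted_bases(dna: str, m: int, n: int) -> str:
    """Keep M bases, skip the next N bases, and repeat to the end.

    Assumption: the sequence starts with a payload block of M bases.
    """
    if m <= 0 or n <= 0:
        raise ValueError("M and N must both be integers greater than 0")
    step = m + n
    # Slice out the first M bases of every (M + N)-base block.
    return "".join(dna[i:i + m] for i in range(0, len(dna), step))
```

For example, with M = 4 and N = 2 the sequence `ACGTTTCCAAGG` reduces to `ACGTCCAA`.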
  13. A DNA encoding apparatus, comprising:
    a first conversion module, configured to convert original data to be stored into a first quaternary sequence;
    an encoding module, configured to encode the first quaternary sequence into a base sequence according to a preset mapping between quaternary characters and bases;
    a generating module, configured to obtain, according to the base sequence, a DNA sequence storing the original data.
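The three modules can be pictured as a small pipeline. This is only a sketch under stated assumptions: the original data is taken to be raw bytes, each byte becomes four quaternary digits, the mapping 0→A, 1→C, 2→G, 3→T is hypothetical, and the generating step is reduced to returning the base sequence unchanged. All function names are illustrative.

```python
# Hypothetical mapping; the claim only says the mapping is preset.
QUAT_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}


def to_quaternary(data: bytes) -> str:
    """First conversion module: bytes -> quaternary digits (4 per byte)."""
    return "".join(
        str((byte >> shift) & 0b11) for byte in data for shift in (6, 4, 2, 0)
    )


def to_bases(quat: str) -> str:
    """Encoding module: quaternary digits -> bases via the preset mapping."""
    return "".join(QUAT_TO_BASE[digit] for digit in quat)


def encode(data: bytes) -> str:
    """Generating module (simplified): here the DNA sequence is just the
    base sequence; a real system might add further bases at this stage."""
    return to_bases(to_quaternary(data))
```

Under these assumptions `encode(b"Hi")` yields an 8-base sequence, two bits of payload per base.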
  14. A DNA decoding apparatus, comprising:
    a determining module, configured to determine a DNA sequence to be decoded;
    a processing module, configured to obtain a base sequence according to the DNA sequence;
    a decoding module, configured to decode the base sequence according to a preset mapping between quaternary characters and bases to obtain a first quaternary sequence;
    a second conversion module, configured to convert the first quaternary sequence to obtain the original data corresponding to the DNA sequence.
  15. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the DNA encoding method according to any one of claims 1 to 6, or the processor, when executing the computer program, implements the DNA decoding method according to any one of claims 7 to 12.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the DNA encoding method according to any one of claims 1 to 6, or the computer program, when executed by a processor, implements the DNA decoding method according to any one of claims 7 to 12.
PCT/CN2022/138143 2022-03-14 2022-12-09 DNA coding method and apparatus, DNA decoding method and apparatus, terminal device and medium WO2023173842A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210248376 2022-03-14
CN202210248376.9 2022-03-14

Publications (1)

Publication Number Publication Date
WO2023173842A1 (en) 2023-09-21

Family

ID=83859084

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2022/137698 WO2023173837A1 (en) 2022-03-14 2022-12-08 DNA encoding method and apparatus, DNA decoding method and apparatus, terminal device, and medium
PCT/CN2022/138143 WO2023173842A1 (en) 2022-03-14 2022-12-09 DNA coding method and apparatus, DNA decoding method and apparatus, terminal device and medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137698 WO2023173837A1 (en) 2022-03-14 2022-12-08 DNA encoding method and apparatus, DNA decoding method and apparatus, terminal device, and medium

Country Status (2)

Country Link
CN (1) CN115312128A (en)
WO (2) WO2023173837A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115312128A (en) * 2022-03-14 2022-11-08 深圳先进技术研究院 DNA encoding method, decoding method, apparatus, terminal device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845158A (en) * 2017-02-17 2017-06-13 苏州泓迅生物科技股份有限公司 A kind of method that information Store is carried out using DNA
US20170249345A1 (en) * 2014-10-18 2017-08-31 Girik Malik A biomolecule based data storage system
CN108183712A (en) * 2017-12-28 2018-06-19 北京华生恒业科技有限公司 A kind of Chinese character gene coding and decoding method and system
CN112527736A (en) * 2020-12-09 2021-03-19 中国科学院深圳先进技术研究院 Data storage method and data recovery method based on DNA and terminal equipment
CN113066534A (en) * 2021-03-08 2021-07-02 山东骥图生物科技有限公司 Method for writing and reading information by using DNA sequence

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050053968A1 (en) * 2003-03-31 2005-03-10 Council Of Scientific And Industrial Research Method for storing information in DNA
CN111368132B (en) * 2020-02-28 2023-04-14 元码基因科技(北京)股份有限公司 Method for storing audio or video files based on DNA sequences and storage medium
CN115312128A (en) * 2022-03-14 2022-11-08 深圳先进技术研究院 DNA encoding method, decoding method, apparatus, terminal device and medium

Also Published As

Publication number Publication date
CN115312128A (en) 2022-11-08
WO2023173837A1 (en) 2023-09-21

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22931851

Country of ref document: EP

Kind code of ref document: A1