CN116432217A

CN116432217A - File storage method, file reading method and related devices

Info

Publication number: CN116432217A
Application number: CN202310238624.6A
Authority: CN
Inventors: 戴俊彪; 崔君婷; 黄小罗
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2023-03-01
Filing date: 2023-03-01
Publication date: 2023-07-14

Abstract

The invention discloses a file storage method, a file reading method and related devices. The file storage method comprises: converting an acquired file to be archived into a file DNA sequence, and converting the acquired archive information of the file to be archived into an archive information DNA sequence; combining the file DNA sequence with the archive information DNA sequence to obtain information to be signed; generating a DNA sequence digest of the information to be signed; signing the DNA sequence digest with a pre-generated private key to generate signature information, and converting the signature information into a signature information DNA sequence; obtaining a seal DNA sequence of the file to be archived based on the archive information DNA sequence and the signature information DNA sequence; and finally synthesizing the DNA molecules corresponding to the file to be archived based on the seal DNA sequence and the file DNA sequence, which completes the storage of the file to be archived. With this scheme, the DNA molecules can be effectively guarded against damage or tampering, and the security of archives in DNA storage is improved.

Description

File storage method, file reading method and related devices

Technical Field

The present invention relates to the field of information storage technologies, and in particular, to a file storage method, a file reading method, and a related device.

Background

With the rapid growth of global data, DNA storage technology is increasingly coming into public view. DNA storage is a molecular-level storage mode that uses artificially synthesized deoxyribonucleic acid (DNA) as the storage medium, and it offers high efficiency, large storage capacity, long storage time, ease of acquisition, and freedom from maintenance.

An archive is an original record, held on some storage medium, that is formed directly by people in the course of various social activities. Archives are generally in low demand and do not need to be read frequently. At present, archives are stored mainly as electronic files and paper files, but as the number of archives grows, more storage space and higher storage costs are inevitably required.

Thus, an archive can be converted into corresponding DNA molecules for storage. However, when archives are converted into DNA molecules for storage, the stored archives may be damaged or tampered with, so that the security of archive storage is low.

Disclosure of Invention

The main object of the invention is to provide a file storage method, a file reading method and a related device, aiming to solve the problem in the prior art that, when files are stored by DNA storage technology, the stored files may be damaged or tampered with, so that the security of file storage is low.

In order to achieve the above object, the present invention provides an archive storage method, including:

converting the acquired files to be archived into a file DNA sequence, and converting the acquired archive information of the files to be archived into an archive information DNA sequence;

combining the file DNA sequence and the archive information DNA sequence to obtain information to be signed;

generating a DNA sequence digest of the information to be signed, signing the DNA sequence digest according to a pre-generated private key to generate signature information, and converting the signature information into a signature information DNA sequence;

based on the filing information DNA sequence and the signature information DNA sequence, obtaining a seal DNA sequence of the file to be filed;

and synthesizing DNA molecules corresponding to the files to be archived based on the seal DNA sequence and the file DNA sequence.

Optionally, the obtaining the seal DNA sequence of the file to be archived based on the archiving information DNA sequence and the signature information DNA sequence specifically includes:

acquiring a preset joint DNA sequence;

and combining the archiving information DNA sequence, the signature information DNA sequence and the preset joint DNA sequence according to a preset sequence to obtain the seal DNA sequence of the file to be archived.

Optionally, the preset joint DNA sequence comprises an upstream joint DNA sequence and a downstream joint DNA sequence.

Optionally, the archive information includes at least one of: full number, year, shelf life, organization code, part number, number of sequences.

Optionally, converting the obtained file to be archived into a file DNA sequence, and converting the obtained archive information of the file to be archived into an archive information DNA sequence, specifically including:

converting the file to be archived into a first quaternary sequence and converting the archive information into a second quaternary sequence;

and respectively performing code conversion on the first quaternary sequence and the second quaternary sequence according to a preset mapping relation between quaternary characters and bases to obtain the file DNA sequence of the file to be archived and the archive information DNA sequence of the archive information.

Optionally, the archival information DNA sequence has a preset DNA sequence length.

Optionally, the synthesizing, based on the seal DNA sequence and the file DNA sequence, a DNA molecule corresponding to the file to be archived specifically includes:

taking the seal DNA sequence and the file DNA sequence as DNA sequences to be synthesized;

and controlling a DNA synthesizer to synthesize DNA molecules corresponding to the files to be archived based on the DNA sequences to be synthesized, and completing the storage of the files to be archived.

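Optionally, the obtaining the seal DNA sequence of the file to be archived based on the archiving information DNA sequence and the signature information DNA sequence specifically includes:

taking the archiving information DNA sequence and the signature information DNA sequence as DNA sequences to be detected;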

dividing the DNA sequence to be detected in sequence to obtain a plurality of groups to be detected; wherein each group to be detected comprises bases with preset base numbers;

detecting each group to be detected, and determining the base content of guanine G and cytosine C of each group to be detected;

according to the base contents of guanine G and cytosine C of the group to be detected, selecting a base from a first base set to be selected or a second base set to be selected as an adjusting base, and adding the adjusting base into the group to be detected to obtain an updated group to be detected;

wherein the first set of bases to be selected comprises: adenine A, thymine T; the second set of bases to be selected comprises: guanine G, cytosine C;

based on the updated group to be detected, obtaining an updated archiving information DNA sequence and an updated signature information DNA sequence;

and obtaining the seal DNA sequence of the file to be archived based on the updated archiving information DNA sequence and the updated signature information DNA sequence.

Optionally, according to the base content of guanine G and cytosine C in the to-be-detected group, selecting a base from the first to-be-selected base set or the second to-be-selected base set as an adjusting base, and adding the adjusting base to the to-be-detected group to obtain the updated to-be-detected group, which specifically includes:

if the base contents of guanine G and cytosine C of the group to be detected are larger than a preset content threshold, selecting a base from the first base set to be selected as an adjusting base to be added into the group to be detected, and obtaining the updated group to be detected;

and if the base contents of guanine G and cytosine C in the group to be detected are smaller than a preset content threshold, selecting a base from the second base set to be selected as an adjusting base, and adding the adjusting base into the group to be detected to obtain the updated group to be detected.

Optionally, if the base content of guanine G and cytosine C in the to-be-detected group is greater than a preset content threshold, selecting a base from the first to-be-selected base set as the adjusting base, and adding the adjusting base to the to-be-detected group, which specifically includes:

if the last base in the group to be detected exists in the first base set to be selected, selecting, from the first base set to be selected, the other base that is different from the last base in the group to be detected as the adjustment base;

if the last base in the group to be detected does not exist in the first base set to be selected, determining whether the group to be detected is the last group to be detected;

if the group to be detected is the last group to be detected, randomly selecting one base from the first base set to be selected as the adjustment base;

if the group to be detected is not the last group to be detected, determining the first base of the next group to be detected adjacent to the group to be detected, and selecting, from the first base set to be selected, a base different from the first base of the next group to be detected as the adjustment base.

In order to achieve the above object, the present invention further provides a file reading method, which includes:

sequencing the DNA molecules to be read, which are stored with files, to obtain the DNA sequences to be read of the DNA molecules to be read;

wherein the DNA molecule to be read, in which the archive is stored, is obtained according to the archive storage method as described in any one of the above;

acquiring a file DNA sequence in the DNA sequence of the DNA molecule to be read;

and decoding the file DNA sequence to obtain the file to be read.

Optionally, in the case that the DNA sequence of the DNA molecule to be read includes a preset linker DNA sequence, obtaining a file DNA sequence in the DNA sequence of the DNA molecule to be read specifically includes:

determining a file DNA sequence and a seal DNA sequence in the DNA sequence of the DNA molecule to be read through a PCR technology according to a preset joint DNA sequence;

and decoding the file DNA sequence to obtain a corresponding file to be read.

Optionally, after determining the file DNA sequence and the seal DNA sequence in the DNA sequence of the DNA molecule to be read by PCR technique according to the preset linker DNA sequence, the method further comprises:

extracting a signature information DNA sequence in the seal DNA sequence according to a preset joint DNA sequence, an archive information DNA sequence and a preset sequence;

decoding the signature information DNA sequence and converting the signature information DNA sequence into corresponding signature information;

verifying the signature information according to a preset public key to obtain a signature verification result;

the signature verification result is used for indicating whether the DNA sequence to be read is tampered or not.

In order to achieve the above object, the present invention also provides a computer-readable storage medium storing one or more programs executable by one or more processors to implement steps in the archive storage method as set forth in any one of the above or steps in the archive reading method as set forth in any one of the above.

In order to achieve the above object, the present invention also provides a terminal, including: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, implements the steps of the archive storage method as described in any one of the above, or the archive reading method as described in any one of the above.

In the invention, an obtained file to be archived is converted into a file DNA sequence, and the obtained archive information of the file to be archived is converted into an archive information DNA sequence; the file DNA sequence is combined with the archive information DNA sequence to obtain information to be signed; a DNA sequence digest of the information to be signed is generated; the DNA sequence digest is signed according to a pre-generated private key to generate signature information, and the signature information is converted into a signature information DNA sequence; a seal DNA sequence of the file to be archived is then obtained based on the archive information DNA sequence and the signature information DNA sequence; and finally, DNA molecules corresponding to the file to be archived are synthesized based on the seal DNA sequence and the file DNA sequence, which completes the storage of the file to be archived. When files are archived through DNA storage technology, the seal DNA sequence generated from the archive information DNA sequence and the signature information DNA sequence makes it possible to verify the DNA molecules corresponding to the archived file, so that the DNA molecules are effectively guarded against damage or tampering and the security of the files in DNA storage is improved.

Drawings

FIG. 1 is a flowchart of a method for storing files according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a seal DNA sequence provided in an embodiment of the present invention;

FIG. 3 is a flowchart of step S106 provided in an embodiment of the present invention;

FIG. 4 is a flowchart of another method for storing files according to an embodiment of the present invention;

FIG. 5 is a flowchart of a method for storing files according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating an exemplary method for storing files according to an embodiment of the present invention;

FIG. 7 is an exemplary diagram of archive information provided by an embodiment of the present invention;

FIG. 8 is a flowchart of a file reading method according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The invention provides a file storage method, as shown in fig. 1, which at least comprises the following steps:

S101, acquiring files to be archived and archiving information of the files to be archived.

Wherein the archive information of the file to be archived comprises at least one of the following: full number, year, shelf life, organization code, part number, and number of sequences.

The full number is the code assigned by the archive repository to the unit that creates the files. The year is the year in which the file was formed. The shelf life is the period for which the file is to be kept, for example: permanent, fixed-term 30 years, fixed-term 10 years, etc. The organization code identifies the organization that keeps the archive. The part number is the sequential number of an individual file within the lowest-level category of the classification scheme. The number of sequences is the number of DNA sequences obtained after a single file is converted into DNA sequences, and can be determined according to the size of the file to be archived.

The full number, year, shelf life, organization code, part number, and number of sequences may be represented by numerals, letters, or a mixture of numerals and letters, and may also be represented by special characters, Chinese characters, or the like, which is not particularly limited in the embodiments of the present invention.

In the embodiment of the invention, corresponding encoding rules can be set for the full number, the year, the shelf life, the organization code, the part number, the number of sequences and the like in the archive information, so that each is represented by a character string of fixed length. For example, the year is represented by a 4-digit number, such as 2013; the organization code is represented by 3 Pinyin letters or Arabic numerals, such as TSG.

It will be appreciated that the files to be archived and their archive information described above may be entered via an entry device (e.g., keyboard, scanner, etc.) and stored in a terminal device (e.g., computer, etc.).

S102, converting the files to be archived into file DNA sequences and converting the archive information of the files to be archived into archive information DNA sequences.

Wherein the archival information DNA sequence has a predetermined DNA sequence length. Since each data in the archive information can be represented by a character string of a different fixed length, the corresponding DNA sequence obtained by converting the archive information, that is, the archive information DNA sequence, is also a DNA sequence having a corresponding length.

Specifically, the file to be archived may be first converted into a first quaternary sequence, and the archive information may be converted into a second quaternary sequence; and respectively performing code conversion on the first quaternary sequence and the second quaternary sequence according to the mapping relation between the preset quaternary characters and the bases to obtain a file DNA sequence of a file to be archived and an archive information DNA sequence of archive information.

The DNA sequences (e.g., file DNA sequences, archive information DNA sequences, etc.) mentioned in the embodiments of the present invention are all DNA sequences composed of the four deoxyribonucleotides adenine A, thymine T, cytosine C, and guanine G; this will not be repeated below.

In the embodiment of the invention, the file to be archived and the archiving information thereof are converted into the corresponding quaternary sequences, and the obtained quaternary sequences can be encoded into the corresponding DNA sequences through the mapping relation (for example, A=0, T=1, C=2 and G=3) of the preset quaternary characters and bases, so that the encoding complexity is reduced, the encoding efficiency is improved, and the data storage efficiency is further improved.

In addition, the files to be archived and their archive information may be represented in different kinds of characters, for example letters, numbers, special symbols, and the like. Therefore, for the different characters in the files to be archived and in their archive information, different coding modes can be adopted to convert them into quaternary representation, so as to obtain the corresponding quaternary sequences.

If the data in the archive information exists in the form of pure numbers (such as years, part numbers, sequence numbers and the like), the data can be used as decimal numbers, the decimal numbers can be converted into quaternary numbers, and then the converted quaternary numbers are mapped into four bases through a preset mapping relation (such as A=0, T=1, C=2 and G=3), so that the DNA sequence of the data is obtained.

For example, the year is represented by 4 decimal digits, the 4 decimal digits are converted into 7 quaternary digits, and each quaternary digit can be set to correspond to a different deoxyribonucleotide according to a corresponding preset rule, so that the 7 quaternary digits are converted into a DNA sequence consisting of 7 bases.

For another example, where the part number is represented by a 4-digit decimal number, the 4-digit decimal number may be converted into a 7-digit quaternary number, and the 7-digit quaternary number may then be converted into a DNA sequence of 7 bases in the same way as in the year example above.
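The pure-number case can be sketched as follows. This is only a minimal illustration, assuming the example mapping A=0, T=1, C=2, G=3 given above and left zero-padding to a fixed width; the function names `to_quaternary` and `number_to_dna` are hypothetical and do not come from the original text.

```python
# Example mapping from the description: A=0, T=1, C=2, G=3.
QUAT_TO_BASE = "ATCG"


def to_quaternary(value: int, width: int) -> str:
    """Return `value` as a base-4 digit string, left-padded with zeros to `width`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value > 0:
        value, rem = divmod(value, 4)
        digits.append(str(rem))
    text = "".join(reversed(digits)).rjust(width, "0")
    if len(text) > width:
        raise ValueError("value does not fit in the requested width")
    return text


def number_to_dna(value: int, width: int) -> str:
    """Encode a decimal number as a DNA string of exactly `width` bases."""
    return "".join(QUAT_TO_BASE[int(d)] for d in to_quaternary(value, width))


# The year 2013 is 0133131 in base 4, which maps to ATGGTGT.
year_dna = number_to_dna(2013, 7)
```

Seven bases are enough for any 4-digit decimal value because 4^7 = 16384 exceeds 9999, while 4^6 = 4096 does not.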

If the data in the archive information is a mixture of letters and digits (such as the full number, the organization code and the like), there are 62 possible characters in total (uppercase letters, lowercase letters and digits), so the character string can be treated as a base-62 number, converted into a quaternary number, and mapped into the corresponding bases through a preset mapping relation (such as A=0, T=1, C=2 and G=3), which completes the DNA encoding of the data. For example, a 4-character string of mixed letters and digits can be regarded as a 4-digit base-62 number; after conversion into quaternary, its value ranges from 0 to 320113200033 and has at most 12 digits, so the code length can be set to 12 bases, with zero padding on the left when the number has fewer than 12 digits.

Or when the data in the archive information exists in a mixed form of letters and numbers or in a mixed form of numbers, letters and special characters, each character in the data can be used as an ASCII code, the ASCII code is converted into a quaternary number, and the corresponding quaternary number is mapped into a corresponding base through a preset mapping relation, so that the DNA coding of the data is completed.

For example, if the full number is represented by a string of 4 digits and letters, it can be converted into a 4-digit base-62 number and then into a 12-digit quaternary number, which is then transcoded by a preset rule into a DNA sequence represented by the four deoxyribonucleotides adenine A, thymine T, cytosine C and guanine G; the resulting DNA sequence is composed of 12 bases.

For example, if the organization code is represented by a string of 3 digits and letters, it can be converted into a 3-digit base-62 number, then into a 9-digit quaternary number, and then into a DNA sequence of 9 bases in the same way as in the full number example above.
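The base-62 route can be sketched in the same spirit. The text does not fix which character corresponds to which base-62 digit, so the ordering below (digits, then uppercase, then lowercase) is an assumption, as are the function names; the mapping A=0, T=1, C=2, G=3 is the example mapping from the description.

```python
import string

# Assumed base-62 digit order: 0-9, then A-Z, then a-z (not specified by the text).
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
QUAT_TO_BASE = "ATCG"  # example mapping: A=0, T=1, C=2, G=3


def base62_to_int(text: str) -> int:
    """Interpret an alphanumeric string as a base-62 number."""
    value = 0
    for ch in text:
        value = value * 62 + BASE62_ALPHABET.index(ch)
    return value


def int_to_quaternary(value: int, width: int) -> str:
    """Base-4 digit string, left-padded with zeros to `width`."""
    digits = ""
    while value > 0:
        value, rem = divmod(value, 4)
        digits = str(rem) + digits
    return digits.rjust(width, "0")


def alnum_to_dna(text: str, width: int) -> str:
    """Encode an alphanumeric string as a fixed-length DNA string."""
    quat = int_to_quaternary(base62_to_int(text), width)
    return "".join(QUAT_TO_BASE[int(d)] for d in quat)


# Largest 4-character value under the assumed ordering ("z" is digit 61).
max_quat = int_to_quaternary(base62_to_int("zzzz"), 12)
# A 3-character organization code such as TSG occupies 9 bases.
org_dna = alnum_to_dna("TSG", 9)
```

Under this ordering, `max_quat` reproduces the upper bound 320113200033 quoted above, which is 62^4 - 1 written in base 4.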

For example, the shelf life is classified into permanent, fixed-term 30 years, and fixed-term 10 years, and each class can be represented by a single deoxyribonucleotide, such as A, T, and G.

It is understood that the archive information DNA sequence is formed by arranging DNA sequences of respective data of the archive information in a corresponding preset order.

Similarly, numbers, letters, and special characters in the file to be archived can also be converted into quaternary characters in the same manner as the archiving information described above.

In practical applications, most files to be archived consist of text. The text can be encoded in UTF-8 and stored as a txt file, each byte of the txt file is then converted into quaternary digits, and the quaternary digits are converted into the corresponding DNA sequence through the preset mapping rule.
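A minimal sketch of the byte-wise route for text files is given below. It assumes that each UTF-8 byte is written as exactly four quaternary digits (two bits per digit) and uses the example mapping A=0, T=1, C=2, G=3; the text does not state the per-byte width, and the function names are hypothetical.

```python
QUAT_TO_BASE = "ATCG"  # example mapping: A=0, T=1, C=2, G=3
BASE_TO_QUAT = {base: digit for digit, base in enumerate(QUAT_TO_BASE)}


def text_to_dna(text: str) -> str:
    """Encode text as DNA: UTF-8 bytes, four bases per byte (assumed width)."""
    bases = []
    for byte in text.encode("utf-8"):
        # A byte has 8 bits, i.e. four base-4 digits of 2 bits each.
        for shift in (6, 4, 2, 0):
            bases.append(QUAT_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(dna: str) -> str:
    """Inverse of text_to_dna."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4 bases")
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_QUAT[base]
        data.append(byte)
    return data.decode("utf-8")


# "A" is byte 0x41 = 1001 in base 4, which maps to TAAT.
sample = text_to_dna("A")
```

The same routine also covers the ASCII option mentioned earlier, since ASCII characters are single bytes in UTF-8.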

S103, combining the file DNA sequence and the archive information DNA sequence to obtain the information to be signed.

The file DNA sequence and the archive information DNA sequence can be combined with the file DNA sequence first and the archive information DNA sequence second to obtain the information to be signed.

It will be appreciated that the above-described order of combining the file DNA sequence and the archive information DNA sequence is merely exemplary, and that the two may be arranged in other orders.

S104, generating a DNA sequence digest of the information to be signed.

The information to be signed is processed by a preset digest algorithm, such as a hash algorithm, to obtain the DNA sequence digest of the information to be signed.

S105, signing the DNA sequence digest according to a preset private key, generating signature information and converting the signature information into a signature information DNA sequence.

In the embodiment of the invention, a key pair can be generated through an asymmetric encryption algorithm (such as an RSA encryption algorithm, a Rabin algorithm, a DSS algorithm, etc.), and the key pair comprises a private key (i.e. the preset private key) and a public key (i.e. the preset public key), and the signature information can be obtained by encrypting the DNA sequence digest through the preset private key.

Further, the signature information can be converted into a quaternary character, and then converted into a corresponding DNA sequence through a mapping relation, so that the signature information DNA sequence can be obtained.
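Steps S103 to S105 can be sketched end to end as follows. Everything specific here is an assumption made for illustration: SHA-256 stands in for the unspecified digest algorithm, and the key is a toy textbook-RSA key built from two known Mersenne primes, without padding. It is far too small to be secure and is not the key generation described in the text; a real system would use a vetted cryptographic library. All function names are hypothetical.

```python
import hashlib

QUAT_TO_BASE = "ATCG"  # example mapping: A=0, T=1, C=2, G=3
BASE_TO_QUAT = {base: digit for digit, base in enumerate(QUAT_TO_BASE)}

# Toy RSA key from two Mersenne primes. Illustration only: NOT secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)
SIG_BASES = (N.bit_length() + 1) // 2  # bases needed for any value below N


def digest_int(dna: str) -> int:
    """SHA-256 digest of the DNA string, reduced modulo N for the toy key."""
    raw = hashlib.sha256(dna.encode("ascii")).digest()
    return int.from_bytes(raw, "big") % N


def int_to_dna(value: int, width: int) -> str:
    """Write an integer as exactly `width` bases (base 4, most significant first)."""
    bases = []
    for _ in range(width):
        value, rem = divmod(value, 4)
        bases.append(QUAT_TO_BASE[rem])
    return "".join(reversed(bases))


def dna_to_int(dna: str) -> int:
    value = 0
    for base in dna:
        value = value * 4 + BASE_TO_QUAT[base]
    return value


def sign(file_dna: str, info_dna: str) -> str:
    """Return the signature information DNA sequence."""
    to_sign = file_dna + info_dna  # file sequence first, archive information second
    signature = pow(digest_int(to_sign), D, N)
    return int_to_dna(signature, SIG_BASES)


def verify(file_dna: str, info_dna: str, signature_dna: str) -> bool:
    """Check the signature with the public key (E, N)."""
    recovered = pow(dna_to_int(signature_dna), E, N)
    return recovered == digest_int(file_dna + info_dna)
```

With this sketch, `verify(file_dna, info_dna, sign(file_dna, info_dna))` holds, while changing any base of the file or archive information sequence changes the digest and makes verification fail.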

S106, obtaining a seal DNA sequence of the file to be archived based on the archiving information DNA sequence and the signature information DNA sequence.

In this embodiment, the seal DNA sequence of the file to be archived may be obtained by combining the two sequences with the archive information DNA sequence first and the signature information DNA sequence second.

Further, a preset linker DNA sequence may be obtained first, and the archive information DNA sequence, the signature information DNA sequence, and the preset linker DNA sequence may then be combined according to a preset order to obtain the seal DNA sequence of the file to be archived.

The above-mentioned preset linker DNA sequence refers to a predetermined fixed DNA sequence that serves to distinguish one DNA sequence from another. Therefore, in this scheme, the seal DNA sequence can be distinguished from the file DNA sequence of the file to be archived by means of the preset linker DNA sequence, so that the information to be read can be accurately located during reading.

Further, the above-mentioned predetermined linker DNA sequence may include an upper linker DNA sequence and a lower linker DNA sequence.

As shown in FIG. 2, the seal DNA sequence can be obtained by combining, in order, the upper linker DNA sequence, the archive information DNA sequence (including the DNA sequences corresponding to the full number, year, shelf life, organization code, part number, and number of sequences), the signature information DNA sequence, and the lower linker DNA sequence.

Because the preset linker DNA sequence comprises an upper linker DNA sequence and a lower linker DNA sequence, positioning during information reading is more accurate than with a single linker, and the DNA sequence to be read can be located quickly.
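The layout of FIG. 2 can be sketched as a simple concatenation. The two linker sequences below are hypothetical placeholders chosen only for illustration, as are the function names; the sketch relies on the archive information DNA sequence having a preset length, as stated earlier, and leaves out any base balancing.

```python
# Hypothetical linker sequences; the text does not give concrete values.
UPPER_LINKER = "ACGTACGTAC"
LOWER_LINKER = "TGCATGCATG"


def build_seal(info_dna: str, signature_dna: str) -> str:
    """Upper linker + archive information + signature information + lower linker."""
    return UPPER_LINKER + info_dna + signature_dna + LOWER_LINKER


def split_seal(seal_dna: str, info_length: int):
    """Recover (archive information, signature information) from a seal sequence."""
    if not (seal_dna.startswith(UPPER_LINKER) and seal_dna.endswith(LOWER_LINKER)):
        raise ValueError("seal sequence is not framed by the expected linkers")
    payload = seal_dna[len(UPPER_LINKER):len(seal_dna) - len(LOWER_LINKER)]
    # The archive information has a preset length, so the rest is the signature.
    return payload[:info_length], payload[info_length:]
```

For example, `split_seal(build_seal(info, sig), len(info))` returns the pair `(info, sig)`.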

In practical applications, if the content of guanine G and cytosine C in a DNA sequence is too high, or a run of a single repeated base is too long, DNA synthesis and sequencing are affected. Therefore, when the seal DNA sequence includes a linker DNA sequence, base balancing is performed on the archive information DNA sequence and the signature information DNA sequence, but not on the linker DNA sequence. Specifically, as shown in FIG. 3, step S106 may further include at least the following steps:

S301, taking the archiving information DNA sequence and the signature information DNA sequence as DNA sequences to be detected.

Specifically, the archive information DNA sequence is used as a DNA sequence to be detected, and the signature information DNA sequence is also used as the DNA sequence to be detected, so that two DNA sequences to be detected are obtained.

S302, sequentially dividing the DNA sequences to be detected to obtain a plurality of groups to be detected.

Wherein each group to be detected includes a preset number of bases. The preset base number can be adaptively adjusted according to actual conditions.

For example, two DNA sequences to be detected are respectively segmented according to a group of 5 bases, so as to obtain a plurality of groups to be detected.

S303, detecting each group to be detected, and determining the base content of guanine G and cytosine C of each group to be detected.

Wherein, the base content of guanine G and cytosine C in the group to be detected refers to the sum of the numbers of guanine G and cytosine C in the group to be detected divided by the total number of bases in the group to be detected.

For example, if group 1 to be detected is ACGCT, the base content of guanine G and cytosine C in the group to be detected is 3/5.
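Steps S302 and S303 can be sketched as follows, assuming groups of 5 bases as in the example above; the function names are hypothetical.

```python
from typing import List


def split_groups(dna: str, size: int = 5) -> List[str]:
    """Cut a DNA sequence into consecutive groups of `size` bases.

    The final group may be shorter when the length is not a multiple of `size`.
    """
    return [dna[i:i + size] for i in range(0, len(dna), size)]


def gc_content(group: str) -> float:
    """Fraction of bases in the group that are guanine G or cytosine C."""
    if not group:
        raise ValueError("group must not be empty")
    return sum(base in "GC" for base in group) / len(group)


groups = split_groups("ACGCTACGTA")         # two groups of 5 bases
contents = [gc_content(g) for g in groups]  # GC fraction of each group
```

Here the first group ACGCT contains three G or C bases out of five, and the second group ACGTA contains two out of five.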

S304, selecting a base from the first base set or the second base set to be selected as an adjusting base according to the base content of guanine G and cytosine C in the group to be detected, and adding the base into the group to be detected to obtain an updated group to be detected.

Wherein the first set of bases to be selected comprises: adenine A, thymine T; the second set of bases to be selected comprises: guanine G, cytosine C.

Specifically, if the base contents of guanine G and cytosine C in the group to be detected are greater than a preset content threshold, selecting a base from the first base set to be selected as an adjusting base, and adding the adjusting base into the group to be detected, so as to obtain an updated group to be detected.

And if the base contents of guanine G and cytosine C in the group to be detected are smaller than or equal to a preset content threshold, selecting one base from the second base set to be selected as an adjusting base, and adding the adjusting base into the group to be detected to obtain an updated group to be detected.

The preset content threshold may be set and adjusted according to practical situations, for example, the preset content threshold is 50%.

For example, if group 1 to be detected is ACGCT and the preset content threshold is 50%, its G and C content (3/5) exceeds the threshold, so a base needs to be selected from the first set of bases to be selected as the adjustment base for group 1. If group 2 to be detected is ACGTA and the preset content threshold is 50%, its G and C content (2/5) does not exceed the threshold, so a base needs to be selected from the second set of bases to be selected as the adjustment base for group 2.

In the embodiment of the invention, the adjustment base is appended as the last base of the group to be detected, which gives the updated group to be detected. For example, if group 1 to be detected is ACGCT and the adjustment base is A, the updated group to be detected is ACGCTA.
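The threshold decision of step S304 can be sketched as below, using the 50% example threshold. The function names are hypothetical, and which base of the chosen set is actually used is decided by separate rules, so this sketch only picks the set and shows the append.

```python
FIRST_SET = "AT"   # adenine A, thymine T
SECOND_SET = "GC"  # guanine G, cytosine C


def candidate_set(group: str, threshold: float = 0.5) -> str:
    """Return the set from which the adjustment base must be chosen."""
    gc = sum(base in "GC" for base in group) / len(group)
    # Above the threshold: dilute with A/T. Otherwise: enrich with G/C.
    return FIRST_SET if gc > threshold else SECOND_SET


def append_adjustment(group: str, base: str) -> str:
    """The adjustment base becomes the last base of the updated group."""
    return group + base
```

With these definitions, `candidate_set("ACGCT")` gives the first set and `candidate_set("ACGTA")` gives the second set, matching the two example groups.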

Further, as shown in fig. 4, if the base content of guanine G and cytosine C in the group to be detected is greater than the preset content threshold, the step of adding a base selected from the first set of bases to be detected as the adjusted base may be at least implemented by:

s401, determining whether the last base in the group to be detected exists in the first set of bases to be selected.

Wherein whether the last base is present in the first set of bases to be selected refers to: whether the last base is the same as the base in the first base combination or not, if so, the last base is present in the first base combination, otherwise, the last base is not present in the first base combination.

For example, if the last base of group 1 to be detected is guanine G, then there is no base set with the first base to be selected; the last base of the group 2 to be detected is adenine A, which is present in the first band-selected base set.

S402, if the last base in the group to be detected exists in the first base set to be detected, selecting another base which is different from the last base in the group to be detected from the first base set to be detected as an adjustment base.

For example, when adenine A is the last base in the group 2 to be detected, thymine T is selected as the regulatory base when it is present in the first band-selected base set.

S403, if the last base in the group to be detected does not exist in the first set of bases to be selected, determining whether the group to be detected is the last group to be detected.

The last group to be detected here refers to the last group of the archive information DNA sequence and the last group of the signature information DNA sequence.

S404, if the group to be detected is the last group to be detected, randomly selecting one base from the first base set to be selected as the adjustment base.

S405, if the group to be detected is not the last group to be detected, determining the first base of the next group to be detected adjacent to the group to be detected, and selecting from the first set of bases to be selected another base that is different from the first base of the next group to be detected as the adjustment base.

For example, if the next group to be detected adjacent to group to be detected 1 is group to be detected 2, TCGAC, whose first base is thymine T, adenine A is selected as the adjustment base.
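Steps S401 to S405 can be sketched as a single selection function. This is an illustrative sketch under stated assumptions: the first set of bases to be selected is taken to be {A, T}, as the examples above suggest, and the function name, its parameters and the example groups `GCGCA` and `ACGCG` are hypothetical.

```python
import random

# Assumption: the first set of bases to be selected, inferred from the examples.
FIRST_SET = ("A", "T")


def select_adjustment_base(group, next_group=None, candidates=FIRST_SET):
    """Pick the adjustment base for one group, following S401 to S405.

    next_group is None when the group is the last group to be detected.
    """
    last = group[-1]
    if last in candidates:
        # S401/S402: the last base is in the set, so take a different member.
        return random.choice([b for b in candidates if b != last])
    if next_group is None:
        # S403/S404: last group to be detected, any member of the set will do.
        return random.choice(candidates)
    # S405: avoid repeating the first base of the adjacent next group.
    first_of_next = next_group[0]
    return random.choice([b for b in candidates if b != first_of_next])


print(select_adjustment_base("GCGCA", "TCGAC"))  # T (S402: last base A is in the set)
print(select_adjustment_base("ACGCG", "TCGAC"))  # A (S405: next group starts with T)
```

Passing a different tuple as `candidates` lets the same function serve a flow of the same shape over another candidate set.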

Further, as shown in fig. 5, if the base content of guanine G and cytosine C in the group to be detected is less than or equal to the preset content threshold, the step of adding one base selected from the second set of bases to be selected as the adjusting base to the group to be detected may be at least achieved by:

S501, determining whether the last base in the group to be detected exists in the second set of bases to be selected.

S502, if the last base in the group to be detected exists in the second base set to be selected, selecting another base which is different from the last base in the group to be detected from the second base set to be selected as an adjustment base.

S503, if the last base in the group to be detected does not exist in the second set of bases to be selected, determining whether the group to be detected is the last group to be detected.

S504, if the group to be detected is the last group to be detected, randomly selecting one base from the second base set to be selected as the adjustment base.

S505, if the group to be detected is not the last group to be detected, determining the first base of the next group to be detected adjacent to the group to be detected, and selecting from the second set of bases to be selected another base that is different from the first base of the next group to be detected as the adjustment base.

S305, based on the updated group to be detected, obtaining an updated archiving information DNA sequence and an updated signature information DNA sequence.

S306, obtaining a seal DNA sequence of the file to be archived based on the updated archiving information DNA sequence and the updated signature information DNA sequence.

It can be understood that, for the manner of obtaining the seal DNA sequence of the file to be archived based on the updated archive information DNA sequence and the updated signature information DNA sequence, reference may be made to the above embodiments, and details are not repeated here.

In the embodiment of the invention, the base balance treatment can be carried out on the archiving information DNA sequence and the signature information DNA sequence by adding the adjusting base, so that the situation that the DNA synthesis and sequencing are affected due to overlarge local guanine G and cytosine C content and single base repetition is avoided, and the storage accuracy of files to be archived is improved.
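The two properties this paragraph is concerned with, local G and C content and single-base repetition, can be measured directly on a sequence. The helpers below are a hypothetical sketch for checking a group before and after an adjustment base is appended; they are not part of the described method.

```python
from itertools import groupby


def gc_fraction(sequence: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def longest_run(sequence: str) -> int:
    """Length of the longest stretch of one repeated base (0 if empty)."""
    return max((len(list(run)) for _, run in groupby(sequence)), default=0)


before, after = "ACGCT", "ACGCTA"  # group 1 before and after appending A
print(gc_fraction(before), gc_fraction(after))  # 0.6 0.5
print(longest_run(after))                       # 1
print(longest_run("ACGGGT"))                    # 3
```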

S107, synthesizing DNA molecules corresponding to the files to be archived based on the seal DNA sequence and the file DNA sequence.

Specifically, a seal DNA sequence and a file DNA sequence can be used as DNA sequences to be synthesized; and then controlling a DNA synthesizer to synthesize DNA molecules corresponding to the files to be archived based on the DNA sequences to be synthesized, thereby completing the storage of the files.

In some embodiments of the present application, when the file to be archived has a sequence number, the file DNA sequence of the file to be archived may be divided according to the sequence number, so as to obtain a number of file DNA sequences corresponding to the sequence number, and these file DNA sequences and the seal DNA sequence are then used as the DNA sequences to be synthesized. After the DNA sequences to be synthesized are obtained, a control instruction can be generated to control the DNA synthesizer to synthesize the DNA molecules corresponding to the file to be archived based on the DNA sequences to be synthesized, so that the file to be archived is stored.

As shown in fig. 6, the archive storage method proposed by the present invention is described below by way of an example:

(1) A key pair (including a public key and a private key) for signing is acquired, and the key length is set to 1024 bits.

The private key is generated as follows:

-----BEGIN RSA PRIVATE KEY-----

MIICXAIBAAKBgQCf04943LZTODlJtWCqcIZ+jNRjgqD9f5Bpm UtIJ1CktlO9LikmAZB4kSnsss8F7T6yKycspVUgvSyvBTAk04KTwFzzs Qo38YxOgF6azyPy9PbSTGJWXv3Fr3T4RoiHV6+RmgyZf0Bg1SuYuln Fx4/s2cO3Xhj6myX9S1/yjWCFXQIDAQABAoGAHhjzhDGgHhTQ6PoAjagqE6vGlUS0t3gtxE0LWbeZnqL9KvuF4TPbNnEzwXC8vqN6Mogg1O3/lW46e20RL2YrSQZz72WT+yNQAyJxWv3MiomUL65K2wa3eH2H0Y/7fau28/rHFF2m3SVSWJ2G8XsxNSanOwKsWxorvEjV0/GrR6sC QQDFRDtC7WQC6UQuG2w6SOr0UCFYp+oNlN/HbD44ImCEEsfRlp4su0jSPXenyt5C484PlJGW7GgjfaNTH/lyQ/azAkEAz2mfD6bDtH2234MiUbMKx5UbKsoxCxdKbVKEpafyAregKvKPs+dTmjA/tmqf6RiEJKBO x1J0HASwn0qGhYIbrwJAedtgUEOc2D+IooLGJGsO2MT3FHEFoEYqx mITPVHfFTcUwF+ubitzHIxj8f7bta5LiExac0SuP95ImfzSdseNCQJAOX hLoXuUuHso46+jH74bW4e+GlIh2q/eaII3zOrHDOeyUpQZK0EKkiuSS8opet3XJ4rfqSz4jRbjlY+BzJZrcwJBALxUUPXlRRAYGKy0cuHe0IhnY Xpk5iGHn/PH/w+91GayuIHxsBeYYB4GiHXW4HgyWB0ggXExlkKj2hspyv/elgQ＝

-----END RSA PRIVATE KEY-----

The public key is generated as follows:

-----BEGIN PUBLIC KEY-----

MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCf04943LZTODlJtWCqcIZ+jNRjgqD9f5BpmUtIJ1CktlO9LikmAZB4kSnsss8F7T6yKycspVUgvSyvBTAk04KTwFzzsQo38YxOgF6azyPy9PbSTGJWXv3Fr3T4RoiHV6+RmgyZf0Bg1SuYulnFx4/s2cO3Xhj6myX9S1/yjWCFX QIDAQAB

-----END PUBLIC KEY-----

(2) Obtaining the file DNA sequence of the file to be archived.

File content to be archived:

A Moonlit Night on the Spring River (春江花月夜)

Zhang Ruoxu (Tang Dynasty)

Chun Jiang Chao shui Lian Haiping, shang Ming Yue Sheng.

To the extent that the Zhen Yan is in the millions of waves, where Chunjiang has no moon-!

The river flow turns around the meadow, and the lunar illumination flowers and forests are similar to the aragonis.

The frost in the air is not perceived to fly, and the white sand on the heater is not visible.

The color of the river is free of fiber dust, and the sun is a lunar wheel in the air.

What is the first month of the first sight at the riverside? Is the first Jiang Yuehe year old person?

The generation of the life is endless, and the life is similar in the year of the river and the month.

The people waiting in the month of the river are not known, but the water flowing in the Yangtze river is seen.

A tablet of white cloud is youthful, and Qingfengpu is not worried.

Who is a flat boat today? Where is abrin moon building?

The user can walk upstairs and loiter, and he should leave the dressing table.

The jade household curtain can not be rolled up, and the pounding anvil is flicked back.

At this time, it is expected to be disagreeable and to shine on the monarch in the moon-like fashion.

The long and flying geese have different degrees, and the fish dragon is submerged in water.

The night break-up dream falls the flowers, and the spring is poor.

The river water flows through the spring to remove the desire, and the river pool falls into the moon to return to the west slope.

Sinking sea fog in inclined month and Dan Xiaoxiang infinite ways.

Without knowledge of the people who get to the moon, the people fall to the moon and shake the Manjiang tree.

The file to be archived is saved as a txt text file in UTF-8 encoding. The file is read in binary mode, the decimal value of each byte is converted into a quaternary number, and the quaternary digits are converted into the file DNA sequence through the mapping rule "A=0, T=1, C=2, G=3". The file DNA sequence is as follows:

GCTCCTCACCTTGCTCCGATCTGGGCCACACCCGATGCTCCTGACACAGCTTCCTACTGAAAGTAACCGCTTCGGACCAAGCCACACGCCTTGCCACTCTCTCCACAAGCAGCAAACTTAGCTTCTTACTAAGCTACGCGCCAGGCAGCAAACTTTAAGTAACCAAGTAACCGCTCCTCACCTTGCTCCGATCTGGGCTCCGGTCCGCGCTCCGAACGTAGCCACGGGCTGCGCTCCGTTCGTGGCTTCGCTCGAGGCGGCGGACAGAGCTCCGTTCGTGGCTACGCACACCGCTCCTCACAGCGCTCCTGACACAGCTTCATTCGATGCTCCGGTCCGCGCTGCTTACTGGGCAGCAAACAACAAGTAACCGCTCCGCGCTGGGCTCCGCGCTGGGCCTCTCCCAGGGCTCCGAGCCACGCTTCAGTCAAGGCTACGCACATGGCCTCATGCAGAGCGGCGGACAGAGCTACGGTCTTTGCTTCCTACATAGCTCCTCACCTTGCTCCGATCTGGGCTCCTTGCCAAGCTCCTGACACAGCTCCTCACAGCGCGGCGGACAATAAGTAACCGCTCCGATCTGGGCTCCGTTCAATGCTTCCGCCTCGGCCACGGTCCGAGCTGCGCGCTTTGCCACACCCGAGGCTGCTTACGCAGCGGCGGACAGAGCTCCTGACACAGCTGCATTCCTGGCCACACCCGATGCTCCTGCCTTGGCTGCTCCCATCGCTACGGACGGAGCCTCTGACGAAGCAGCAAACAACAAGTAACCGCTGCCCTCGCCGCCTCATGCAGAGCTCCGTTCAATGCCTCTGACTGAGCTACGCACAGTGCCACCTGCACTGCCTCCAGCTGCGCGGCGGACAGAGCTCCGATCAAAGCTACGCACACCGCTGCTCTCGGTGCTCCGACCTCTGCTGCTGACACGGCTACGCACAGTGCCACCTGCAATGCAGCAAACAACAAGTAACCGCTCCGATCTGGGCTTCCTACCCTGCTACGCACAAAGCCACACTCGACGCTCCTTGCCAAGCTGCGCCCCTAGCTTCGAACTCAGCGGCGGACAGAGCTGCTCCCAGCGCTGCTCCCAGCGCTGCCCTCGCCGCTACGCACCGTGCTTCCGTCCTAGCTCCTGACACAGCCACGGTCCGCGCAGCAAACAACAAGTAACCGCTCCGATCTGGGCTGCTTTCTTAGCTACGGTCTTTGCTACGCCCGCCGCTTCACACTGTGCCACCTGCAATGCTCCTGACACAGCGGCGGACTGGGCTCCGATCTGGGCTCCTGACACAGCTACGGTCTTTGCTTCGCTCGTAGCTTCACACTGTGCTGCATTCCTGGCTACGCCCGCCGCGGCGGACTGGAAGTAACCGCTACGCCCGCCGCTGCTTACTGGGCTACGCGCCAGGCTACGCGCCAGGCTCCTTGCCAAGCTGCCCTCGTGGCTTCGTGCGACGCGGCGGACAGAGCTCCGATCTGGGCTCCTGACACAGCTTCGCTCGTAGCTTCGCTCGTAGCTTCAGGCCCCGCTGCTCGCGCAGCTACGGACGGAGCAGCAAACAACAAGTAACCGCTACGCACAGTGCTGCTGGCCTTGCTCCGATCTGGGCTCCTGACACAGCTTCGGCCATTGCTACGGTCTTTGCTACGCCCGCCGCGGCGGACAGAGCTACGGTCATCGCCACCTGCAATGCCTCTTTCGGGGCTCCGATCTGGGCCTCAAACAATGCTCCGTTCAATGCTCCGAACGTAGCAGCAAACAACAAGTAACCGCTGCTCTCGGTGCTACGCCCTATGCTACGCACAAAGCTGCACTCATGGCTTCAGCCGCGGCTCCAACCCAAGCTCCAACCCAAGCGGCGGACAGAGCCTCTGTCTACGCTCCTGCCCCGGCTCCGTTCCTCGCTACGCACACCGCTACGCACAGTGCCACAAGCTGAGCTCCATACAATGCAGCAAACAACAAGTAACCGCCACGAACAATGCTTCCGCCGTCGCTA
CGCGCACCGCTTCCTACTGAGCTCCACTCAATGCCACACACTGGGCTTCCGTCTAAGCGGCGGACTGGGCTACGGTCTTTGCTTCCTACATAGCTGCTCGCGCAGCTCCAAACTGTGCTCCTCACAGCGCTCCTGACACAGCTCCCTTCGGAGCGGCGGACTGGAAGTAACCGCTTCAGGCCGGGCTCCAAACTGAGCTCCCTTCGGAGCTACGCACACCGCTCCTGACACAGCTTCGGCCTCAGCTTCGGCCACCGCGGCGGACAGAGCTTCGCCCTTAGCTGCATTCCTGGCTGCCTCCGCGGCTACGCCCGCCGCTTCCTCCATCGCCTCTTTCTGAGCTTCAGGCGAAGCAGCAAACAACAAGTAACCGCTGCAGCCACTGCTCCACACGTGGCTTCGCACTCAGCTACGCACCGTGCTTCAGTCGTGGCTACGCACAGTGCTTCAGCCGCGGCGGCGGACAGAGCTCCAGTCCAGGCCACCATCCAGGCTGCCAACCTGGCTACGCACACCGCTCCACGCAACGCCACGGGCTCAGCTCCTGTCCTTGCAGCAAACAACAAGTAACCGCTCCCGTCCTAGCTCCTTGCGTCGCTGCTCGCGCAGCTCCTGACTCGGCTACGCACAGTGCTGCTCGCGCAGCCTCTTGCGCGGCGGCGGACAGAGCTCCATACGGGGCCTCAAACTAAGCTCCTGACACAGCTTCAGTCAGCGCTCCGTTCAATGCTGCATTCCTGGCTTCTAACTCGGCAGCAAACAACAAGTAACCGCCTCGCACGGGGCCTCTCGCAATGCCTCTTTCGGGGCCTCCAGCTGCGCTTCATTCACTGCTACGCACAGTGCTTCGCCCCTCGCGGCGGACAGAGCCTCGATCGGAGCCTCGGCCTCTGCTCCGGTCTGAGCCACGTGCAAGGCTCCGAACGTAGCTCCACACTAAGCTCCTTCCATGGCAGCAAACAACAAGTAACCGCTCCTCACCCAGCTTCCTACTGAGCCTCTTGCGACGCTCCGGTCCGTGCTCCCACCCTCGCCACTAACGGTGCCACACCCGATGCGGCGGACAGAGCTTCAGGCCGGGCTCCAAACTGAGCTCCTCACCTTGCTTCAGTCACCGCTACGCACAGTGCCACGGGCTCAGCTTCCGCCGTCGCAGCAAACAACAAGTAACCGCTCCGATCTGGGCTCCGAACGTAGCTCCGTTCAATGCTCCTCACCTTGCTTCAGCCGCGGCTCCCGACGACGCTTCGAACGGTGCGGCGGACAGAGCTCCGATCTGGGCTCCGGTCCGTGCCACTAACGGTGCTCCTGACACAGCTTCCTACAGTGCCACCTTCGGGGCTCCTTCCTGAGCAGCAAACAACAAGTAACCGCTCCTTCCTGAGCTCCTGACACAGCTCCGACCACTGCTCCGACCACTGCCACTTGCAGGGCTCCGTTCGTGGCCTCTCGCGGCGCGGCGGACAGAGCTGCCACCCAGGCTGCTGGCGAGGCTCCGGTCATGGCTCCGCTCTCAGCTCCTTGCCAAGCCTCTCTCTAAGCCACGTGCCGGGCAGCAAACAACAAGTAACCGCTACGCACAGTGCTGCTGGCCTTGCTACGCTCTCAGCTCCTGACACAGCTTCATGCCAAGCTACGCCCGCCGCTTCGGTCTACGCGGCGGACAGAGCCACTAACGGTGCTCCTGACACAGCTCCTATCATGGCTCCAAGCATTGCTCCGCGCCATGCTCCGATCTGGGCTCCCAACTATGCAGCAAACAAC
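The byte-to-base conversion of step (2) can be sketched as follows. The mapping "A=0, T=1, C=2, G=3" comes from the description. The choice of four quaternary digits per byte, most significant digit first, is an inference from the sequence above, whose opening 24 bases match the UTF-8 bytes of 春江, the first two characters of the poem's Chinese title. The function name is hypothetical.

```python
BASES = "ATCG"  # position in the string = quaternary digit: A=0, T=1, C=2, G=3


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant quaternary digit first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


print(bytes_to_dna("春江".encode("utf-8")))  # GCTCCTCACCTTGCTCCGATCTGG
```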

(3) Converting archive information of a file to be archived into an archive information DNA sequence

The archive information of the file to be archived is shown in fig. 7, and the corresponding archive information DNA sequence is shown as follows:

TGGGTTTCGGTTATGGCTCAACCCTACCAATGTGTTAAAAAAT

(4) Generating signature information DNA sequences

The file DNA sequence and the archive information DNA sequence are combined to obtain the information to be signed, a DNA sequence abstract (digest) of the information to be signed is generated with the MD5 algorithm, and the DNA sequence abstract is signed with the RSA algorithm using the private key from step (1) to obtain the signature information. Then, the decimal value of each byte in the signature information is converted into a quaternary number, and the quaternary digits are converted into a DNA sequence through the preset mapping rule "A=0, T=1, C=2, G=3", giving the following signature information DNA sequence:

TCATACGCTTCTTAGTGACGTGTCGCCGGGGTCTTGGTGCACTCACATTGACGCCAAGTTATGGAATGTGTAACCGATCGTAATACATGATATGGCGTCGCCTTGGCATGGCAAAAAGTGTTGACACTTTGATGATTATAACCACGGCGCCACCGTTTTTCGCAAGACCAGAGCACCATGCCCTGGGATTTACCGCTATTGTGTATAGGATGGAAGAACGAAAGAAAACACAGTAACAACCCCTAAGTGTGTCACAGAGCGCGGTCGCTACCTGGCAACGCAGGTTTCGCCCGTGGACATACTTGTTTCGTGGGCGCTAGGATTACCCTGGAATTCTTAGGCAGATTCATTATTATTGAGGGGCGAAACCCTGAAATGAATTTCAGGCCGTGATTAATGTTTACATGGACACAAATTCCACGGCCACGATCGTTTTCATACTTTGTATGATCGAACCTGAACGGTCCTCTACGGTTCCATCTGTCGCTGAACGCTTCAGCTTCCCGTCACCC
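The flow of step (4) can be sketched with the standard library alone. The following is a toy illustration under several assumptions, none of which is confirmed by the description: the information to be signed is formed by simple concatenation (file DNA sequence first), the digest is computed over the ASCII letters of the combined sequence, and the signature is unpadded textbook RSA with a small demonstration key built from two Mersenne primes instead of the 1024-bit key of step (1). A practical system would use a vetted cryptographic library and a padded signature scheme.

```python
import hashlib

BASES = "ATCG"  # A=0, T=1, C=2, G=3


def bytes_to_dna(data: bytes) -> str:
    """Four bases per byte, most significant quaternary digit first."""
    return "".join(BASES[(b >> s) & 3] for b in data for s in (6, 4, 2, 0))


# Toy demonstration key (assumption, NOT secure and not the key of step (1)).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                            # modulus, larger than any 128-bit digest
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def sign_to_dna(file_dna: str, archive_dna: str) -> str:
    """Digest the combined sequence, sign the digest, return the signature as DNA."""
    to_sign = file_dna + archive_dna                         # assumed combination order
    digest = hashlib.md5(to_sign.encode("ascii")).digest()   # 16-byte MD5 digest
    signature = pow(int.from_bytes(digest, "big"), D, N)     # textbook RSA, no padding
    sig_bytes = signature.to_bytes((N.bit_length() + 7) // 8, "big")
    return bytes_to_dna(sig_bytes)


signature_dna = sign_to_dna("GCTCCTCACCTT", "TGGGTTTC")
print(len(signature_dna))  # 100 bases for this 25-byte toy signature
```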

(5) Generating seal DNA sequence

The preset adaptor DNA sequence, the archive information DNA sequence and the signature information DNA sequence are combined in the order specified in step 2 to obtain the corresponding seal DNA sequence.

(6) Synthesizing the DNA molecules corresponding to the file to be archived based on the seal DNA sequence and the file DNA sequence, so as to complete the storage of the file to be archived.

According to the archive storage method provided by the invention, the file to be archived is converted into a file DNA sequence and the archive information of the file to be archived is converted into an archive information DNA sequence; the file DNA sequence and the archive information DNA sequence are then combined to obtain the information to be signed; a DNA sequence abstract of the information to be signed is generated and signed with a pre-generated private key to produce signature information, which is converted into a signature information DNA sequence; the seal DNA sequence of the file to be archived is obtained based on the archive information DNA sequence and the signature information DNA sequence; and the DNA molecule corresponding to the file to be archived is synthesized based on the seal DNA sequence and the file DNA sequence. With this scheme, when the file to be archived is converted into the corresponding DNA molecule for storage, the seal DNA sequence makes it possible to determine whether the synthesized DNA molecule has been damaged or tampered with, which improves the security of archive storage and facilitates its development.

The present invention also provides a file reading method, as shown in fig. 8, which at least comprises the following steps:

S801, sequencing the DNA molecule to be read, in which a file is stored, to obtain the DNA sequence to be read of the DNA molecule to be read.

The DNA molecule to be read, in which the file is stored, is obtained according to the file storage method provided in the above embodiment.

S802, acquiring a file DNA sequence in the DNA sequence to be read.

S803, decoding the file DNA sequence to obtain the file to be read.

Specifically, in the case where the DNA sequence to be read includes a preset adaptor DNA sequence, step S803 may be implemented at least by:

determining the seal DNA sequence and the file DNA sequence in the DNA sequence to be read by a PCR technique according to the preset adaptor DNA sequence;

and decoding the file DNA sequence to obtain the corresponding file to be read.

It should be noted that decoding the file DNA sequence is the reverse of the process of converting the file to be archived into the file DNA sequence, and is not specifically limited in the embodiment of the present application.
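As an illustration of that reverse process, the sketch below maps each group of four bases back to one byte using "A=0, T=1, C=2, G=3". The four-bases-per-byte, most-significant-digit-first layout is an assumption consistent with the example file DNA sequence in this description, and the function name is hypothetical.

```python
BASES = "ATCG"  # A=0, T=1, C=2, G=3


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a DNA sequence into bytes, four bases per byte."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    decoded = bytearray()
    for start in range(0, len(sequence), 4):
        value = 0
        for base in sequence[start:start + 4]:
            value = value * 4 + BASES.index(base)  # accumulate quaternary digits
        decoded.append(value)
    return bytes(decoded)


print(dna_to_bytes("GCTCCTCACCTTGCTCCGATCTGG").decode("utf-8"))  # 春江
```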

Further, the signature information DNA sequence in the seal DNA sequence can be extracted according to the preset adaptor DNA sequence, the archive information DNA sequence and the preset order; the signature information DNA sequence is decoded and converted into the corresponding signature information; and the signature information is then verified with a preset public key to obtain a signature verification result.

Wherein, the signature verification result is used for indicating whether the DNA sequence to be read is tampered or not.

The seal DNA sequence is obtained by combining the preset adaptor DNA sequence, the archive information DNA sequence and the signature information DNA sequence in a certain order. Because the preset adaptor DNA sequence is fixed and the archive information DNA sequence has a fixed string length, the signature information DNA sequence can be reliably extracted from the seal DNA sequence for signature verification.
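Because the adaptor sequence and the archive information length are fixed, the extraction reduces to slicing. The sketch below assumes one possible layout, namely upstream adaptor, archive information, signature information, downstream adaptor; the actual order is whatever preset order the storage side used. The adaptor sequences and the function name are hypothetical.

```python
def split_seal(seal: str, upstream: str, downstream: str, archive_len: int):
    """Return (archive_dna, signature_dna) from a seal DNA sequence.

    Assumed layout: upstream + archive information + signature + downstream.
    """
    if not (seal.startswith(upstream) and seal.endswith(downstream)):
        raise ValueError("adaptor sequences not found at the ends of the seal")
    body = seal[len(upstream):len(seal) - len(downstream)]
    if len(body) < archive_len:
        raise ValueError("seal is shorter than the fixed archive information length")
    return body[:archive_len], body[archive_len:]


# Hypothetical adaptors and short stand-in sequences, for illustration only.
upstream, downstream = "ACGT", "TGCA"
seal = upstream + "TGGGTTTC" + "TCATACGC" + downstream
print(split_seal(seal, upstream, downstream, archive_len=8))
# ('TGGGTTTC', 'TCATACGC')
```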

According to the file reading method provided by the invention, signature verification can be performed using the seal DNA sequence to determine whether the DNA molecule to be read has been damaged or tampered with, which ensures the security of file reading.

Based on the above-mentioned archive storage method and archive reading method, the present invention also provides a computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps in the archive storage method described in the above-mentioned embodiments, or the steps in the archive reading method described above.

Based on the above archive storage method and archive reading method, the present invention also provides a terminal, as shown in fig. 9, which includes at least one processor 50, a display screen 51 and a memory 51, and may further include a communication interface 53 and a bus 54. The processor 50, the display screen 51, the memory 51 and the communication interface 53 may communicate with each other via the bus 54. The display screen 51 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 53 may transmit information. The processor 50 may invoke logic instructions in the memory 51 to perform the steps of the archive storage method or the archive reading method described in the above embodiments.

Further, the logic instructions in the memory 51 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product.

The memory 51, as a computer-readable storage medium, may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 50 executes functional applications and data processing, i.e. implements the methods of the embodiments described above, by running the software programs, instructions or modules stored in the memory 51.

The memory 51 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, etc. In addition, the memory 51 may include a high-speed random access memory, and may also include a nonvolatile memory, for example, any of a variety of media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk; it may also be a transitory storage medium.

All embodiments in the application are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not described in detail herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Of course, those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program instructing relevant hardware (e.g., a processor, a controller, etc.); the program may be stored on a computer-readable storage medium and, when executed, may carry out the flows of the above-described method embodiments. The computer-readable storage medium may be a memory, a magnetic disk, an optical disk, etc.

It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims

1. A method of archive storage, the method comprising:

2. The archive storage method according to claim 1, wherein the obtaining the seal DNA sequence of the file to be archived based on the archive information DNA sequence and the signature information DNA sequence specifically comprises:

acquiring a preset adaptor DNA sequence;

3. The archive storage method of claim 2, wherein the predetermined adaptor DNA sequence comprises an upstream adaptor DNA sequence and a downstream adaptor DNA sequence.

4. The archive storage method according to claim 1, wherein the converting the acquired file to be archived into a file DNA sequence and converting the acquired archive information of the file to be archived into an archive information DNA sequence specifically comprises:

5. The archive storage method according to claim 1, wherein the obtaining the seal DNA sequence of the file to be archived based on the archive information DNA sequence and the signature information DNA sequence specifically comprises:

6. The archive storage method according to claim 5, wherein the step of selecting a base from the first or second base set as an adjustment base according to the base contents of guanine G and cytosine C in the group to be detected and adding the adjustment base to the group to be detected to obtain the updated group to be detected comprises:

and if the base contents of the guanine G and the cytosine C of the group to be detected are smaller than a preset content threshold, selecting a base from the second base set to be selected as an adjusting base, and adding the adjusting base into the group to be detected to obtain the updated group to be detected.

7. The archive storage method of claim 6, wherein if the base contents of guanine G and cytosine C in the group to be detected are greater than a predetermined content threshold, selecting a base from the first set of bases to be selected as the adjustment base and adding the adjustment base to the group to be detected, specifically comprising:

8. A method of archive reading, the method comprising:

wherein the DNA molecule to be read, in which the archive is stored, is obtained by the archive storage method according to any one of claims 1 to 7;

and decoding the file DNA sequence to obtain the file to be read.

9. The archive reading method according to claim 8, wherein after determining the file DNA sequence and the seal DNA sequence in the DNA sequence of the DNA molecule to be read by a PCR technique based on the preset adaptor DNA sequence, the method further comprises:

10. A terminal, comprising: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, implements the steps of the archive storage method of any one of claims 1 to 7 or the archive reading method of any one of claims 8 to 9.