Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing DNA hybridization storage technology provides no encryption function, and the invention therefore provides an encryption method for DNA hybridization information storage. The invention can effectively prevent the uncontrolled spread of information, improve information security, and promote the practical application of DNA storage technology.
To solve the above technical problem, the invention adopts the following technical scheme: a DNA hybridization information storage encryption method based on dual-probe specific separation, comprising the following steps:
1) converting the target information into binary code and dividing the binary code into a plurality of fields of equal length, wherein each field corresponds to one data unit and comprises M binary digits;
2) mapping the 1/0 values of the M binary digits of each field to the presence/absence of M DNA coding strands, thereby obtaining a list of the coding strands to be added to each data unit on the DNA storage disk;
3) fixing the DNA coding strands of each data unit on the surface of the DNA storage disk by spotting, ink-jet printing, or in-situ synthesis according to the list, thereby completing the writing of the target information on the DNA storage disk;
4) reading the information on the DNA storage disk by hybridizing the coding strands with complementary fluorescent probes; performing fluorescence detection after hybridization and judging whether each coding strand is present according to the presence or absence of fluorescence of each color in each data unit, thereby obtaining the list of coding strand combinations in all data units on the storage disk, restoring it to binary code, and finally restoring the target information;
5) the encryption step of the invention lies in the design and use of an encrypted dual-fluorescence double probe, and comprises the following sub-steps:
a. each DNA oligonucleotide double-probe positive strand comprises a basic sequence divided into two segments (denoted S1 and S2), which are complementary to two corresponding DNA oligonucleotide coding strands respectively; that is, each double probe corresponds to two coding strands, and the total number of double-probe basic sequences (denoted M) is half the total number of coding strands (denoted 2M);
b. the two segments (S1 and S2) of the basic sequence of each double-probe strand are connected by a restriction enzyme recognition sequence (denoted EN), forming a double-probe positive strand with the structure 5'-S1-EN-S2-3';
c. one fluorophore (denoted F1) is attached to the 5' end of the double-probe positive strand and another fluorophore (denoted F2) to its 3' end, F1 and F2 being different, so that a dual-fluorescence double-probe positive strand is formed;
d. the double-probe positive strand and its complementary DNA negative strand are mixed in equal proportion, denatured, and then annealed to obtain a dual-fluorescence double-stranded double probe;
e. N double-stranded double probes are prepared for each of the M basic sequences; the N probes carry the same fluorophore combination at their two ends, but the restriction enzyme recognition sequence connecting S1 and S2 differs for each probe (denoted EN_1, EN_2, ..., EN_N); all M × N double probes together form a double-probe library, which is kept by the information sender;
f. the N restriction enzymes (denoted E_1, E_2, ..., E_N) that correspond to the recognition sequences EN_1 to EN_N between S1 and S2 of the double-probe sequences form an endonuclease library, which is kept by the information receiver;
g. before sending information, the sender randomly selects from the double-probe library a group of M double probes covering the complementary sequences of all coding strands, wherein the basic sequences of the double probes in the group differ from one another but the endonuclease recognition sequence they contain is the same (denoted EN_i);
h. when sending information, the sender physically delivers this group of double probes and the DNA storage disk carrying the written information to the receiver, and sends the key (namely, which EN_i is correct) to the receiver through a separate secure channel (e.g., a secure telephone line or a secret code);
i. before performing the hybridization reading of the DNA storage information, the receiver must use the key to select the correct enzyme E_i from the endonuclease library and digest the double probes with it, separating the two single probes of each double probe and thereby activating the fluorescent probes;
6) when the receiver performs the hybridization reading of the DNA storage information, the activated fluorescent probes hybridize (by complementary base pairing) with the coding strands on the DNA storage disk, and the hybridization signal of each data unit is obtained by fluorescence detection; if probes that have not been activated are used, an erroneous fluorescence signal is obtained because both fluorophores remain on each probe;
7) to further improve the security of the encrypted probe, a combination of restriction enzyme recognition sequences can be added between the 5' fluorophore and segment S1 of the basic sequence and between segment S2 of the basic sequence and the 3' fluorophore, all recognition sequences in the combination being different from the recognition sequence connecting S1 and S2; thus, when the double probe is treated with an endonuclease corresponding to any recognition sequence in the combination (a wrong key), the fluorophores are cleaved from the probe and the probe self-destructs.
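For illustration only, the key-dependent activation of steps 5) to 7) can be modeled in a few lines of Python. This is a minimal sketch and not part of the claimed method: the class `DualProbe`, the functions `build_library`, `pick_probe_set`, and `digest`, the segment names, and the site labels such as `"EN1"` or `"ENF"` are hypothetical, and an enzyme is identified simply by the label of the recognition site it cuts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DualProbe:
    s1: str      # segment complementary to the first coding strand
    s2: str      # segment complementary to the second coding strand
    linker: str  # label of the recognition site joining S1 and S2, e.g. "EN7"
    f1: str      # fluorophore on the 5' end
    f2: str      # fluorophore on the 3' end


def build_library(base_sequences, n):
    """Step 5e: N probes per basic sequence, differing only in the linker site."""
    return [
        DualProbe(s1, s2, f"EN{i}", f1, f2)
        for (s1, s2, f1, f2) in base_sequences
        for i in range(1, n + 1)
    ]


def pick_probe_set(library, rng):
    """Step 5g: pick one linker site at random; the probes sharing it form the set."""
    key = rng.choice(sorted({p.linker for p in library}))
    return key, [p for p in library if p.linker == key]


def digest(probe, cut_site, flank_sites=("ENF", "ENT")):
    """Model the outcome of treating one double probe with a single enzyme."""
    if cut_site == probe.linker:
        # Correct key: S1 and S2 separate, each keeping one fluorophore (step 5i).
        return "activated", [(probe.s1, probe.f1), (probe.s2, probe.f2)]
    if cut_site in flank_sites:
        # Wrong key matching a flanking site: fluorophores are cut off (step 7).
        return "destroyed", []
    # Any other enzyme leaves the double probe uncut.
    return "intact", [(probe.s1 + "-" + probe.s2, probe.f1 + "+" + probe.f2)]


if __name__ == "__main__":
    # Hypothetical segment names; fluorophores taken from the preferred list.
    bases = [("S1a", "S2a", "Alexa488", "Cy3"), ("S1b", "S2b", "Cy5", "Cy7")]
    library = build_library(bases, n=100)  # M x N = 2 x 100 probes
    key, probe_set = pick_probe_set(library, random.Random(0))
    print(key, [digest(p, key)[0] for p in probe_set])
```

In this model the key is nothing more than the label of the shared linker site, which mirrors the requirement that the sender transmit only the correct EN_i through the separate secure channel.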
Preferably, the value range of M in the step 1) is between 1 and 4;
preferably, the DNA storage disk in step 3) is a hard material with a specific chemical group fixed on the surface, the hard material includes but is not limited to glass, silicon wafer, plastic, magnetic beads, etc., and the fixed terminal chemical group on the surface includes but is not limited to amino, aldehyde, sulfhydryl, etc.;
Preferably, the length of the basic sequence in step 5) a is 8-30 nucleotides;
preferably, the recognition sequence or sequence string for the restriction enzyme in steps 5) b and 7) is 4 to 60 nucleotides in length;
preferably, the different fluorophores in step 5) c can be effectively distinguished in the fluorescence detection instrument, and can include, but are not limited to, combinations of fluorophores with large emission wavelength differences among Alexa488, Cy3, Cy5, Cy7, and the like;
preferably, the denaturation temperature in the step 5) d is 70-100 ℃, the annealing temperature is 20-65 ℃, and the denaturation temperature and the annealing temperature are both adjusted according to the length of the oligonucleotide and the GC content of the base;
preferably, the value of N in step 5) e can exceed 200, because commercially available type II restriction enzymes currently cover more than 200 specificities;
preferably, the hybridization temperature in step 6) is 25-55 ℃ and is adjusted according to the length and GC content of the oligonucleotide, and the fluorescence detection equipment used is a commercially available microplate reader or biochip scanner.
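The dependence of the annealing and hybridization temperatures on oligonucleotide length and GC content can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides: Tm ≈ 2 ℃ × (A+T) + 4 ℃ × (G+C). The sketch below is a rough estimate under that assumption, not the procedure of the invention; the function names and the 20-nucleotide example sequence are hypothetical, and in practice salt and probe concentrations also shift the melting temperature.

```python
def _validated(seq):
    """Upper-case the sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return seq


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = _validated(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    seq = _validated(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    example = "ATGC" * 5  # hypothetical 20-nt basic sequence
    print(len(example), gc_content(example), wallace_tm(example))
```

Longer or more GC-rich basic sequences give a higher estimate, which is why the preferred temperature ranges are stated as intervals rather than fixed values.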
Compared with the prior art, the invention has the following advantages:
1. the invention provides a hard encryption function that existing DNA hybridization storage technology lacks;
2. compared with techniques that implement encryption through signal encoding and decoding or in software, the hard encryption of the invention prevents a valid detection signal from being obtained at the physical layer, so that a third party cannot break it by algorithmic brute force, and the security is higher.
Detailed Description
The DNA information storage encryption method of the invention is described in further detail below, taking the two Chinese characters meaning "encryption" as the target information to be written.
As shown in Fig. 2, the parallel-addressing writing method for DNA information storage in this embodiment comprises:
1) converting the target information, namely the two Chinese characters meaning "encryption", into the binary code 10111100110100111100001111011100, and dividing the binary code into 8 fields, wherein each field corresponds to one data unit and comprises 4 binary digits;
2) mapping the 1/0 values of the 4 binary digits of each field to the presence/absence of the 4 DNA coding strands B1, B2, B3 and B4; for example, the first digit corresponds to B1, and its value is 1 if B1 is present and 0 otherwise; this yields the list of coding strands to be added to each data unit on the DNA storage disk; for the above 8 data units, the corresponding coding strand combinations are as follows:
| Data unit | 1011 | 1100 | 1101 | 0011 | 1100 | 0011 | 1101 | 1100 |
| Coding strand combination | B1 B3 B4 | B1 B2 | B1 B2 B4 | B3 B4 | B1 B2 | B3 B4 | B1 B2 B4 | B1 B2 |
3) fixing the DNA coding strands in the 8 data unit micro-pools on the surface of the DNA storage disk by spotting according to the list, thereby completing the writing of the target information on the DNA storage disk;
4) reading the information on the DNA storage disk by hybridizing the coding strands B1-B4 with the complementary fluorescent probes R1-R4; performing fluorescence detection after hybridization, and judging whether each coding strand is present according to the presence or absence of fluorescence of each color in each data unit; for example, if after hybridization the fluorescence signal of R2 is not detected in the first data unit but the fluorescence signals of R1, R3 and R4 are detected, the coding strand combination of that data unit is judged to be B1 B3 B4, and the corresponding binary number is 1011. In this way the list of coding strand combinations in all data units on the storage disk is obtained, restored to binary code (as shown in the following table), and finally restored to the target information (namely the two Chinese characters meaning "encryption");
| Fluorescent signal | R1 R3 R4 | R1 R2 | R1 R2 R4 | R3 R4 | R1 R2 | R3 R4 | R1 R2 R4 | R1 R2 |
| Decoded coding strands | B1 B3 B4 | B1 B2 | B1 B2 B4 | B3 B4 | B1 B2 | B3 B4 | B1 B2 B4 | B1 B2 |
| Restored code | 1011 | 1100 | 1101 | 0011 | 1100 | 0011 | 1101 | 1100 |
5) the encryption step of the invention lies in the design and use of the encrypted dual-fluorescence double probe, as shown in Fig. 1, and comprises the following sub-steps:
a. two DNA oligonucleotide double probes, JSTA and JSTB, are designed, wherein each double-probe positive strand comprises a basic sequence divided into two segments that are complementary to two corresponding DNA oligonucleotide coding strands respectively; the basic sequence of JSTA consists of R1 and R2, corresponding to the two coding strands B1 and B2; the basic sequence of JSTB consists of R3 and R4, corresponding to the other two coding strands B3 and B4;
b. a restriction enzyme recognition sequence (e.g., the EcoRI recognition sequence GAATTC, denoted EN_1) is inserted between the two single probes of the basic sequence of each double-probe strand (i.e., between the R1 and R2 sequences of JSTA, and between the R3 and R4 sequences of JSTB), thereby forming a JSTA positive strand with the structure 5'-R1-EN_1-R2-3' and a JSTB positive strand with the structure 5'-R3-EN_1-R4-3';
c. the 5' end of JSTA is linked to the fluorophore Alexa488 and its 3' end to another fluorophore, Cy3; the 5' end of JSTB is linked to the fluorophore Cy5 and its 3' end to another fluorophore, Cy7; this gives two dual-fluorescence double-probe positive strands whose four fluorophores are all different;
d. the JSTA and JSTB positive strands are each mixed with their complementary DNA negative strands in equal proportion, denatured at 85 ℃, and then annealed to obtain dual-fluorescence double-stranded double probes;
e. for the JSTA basic sequence, 100 double-stranded double probes are prepared; these 100 probes carry the same fluorophore combination at their two ends, but the restriction enzyme recognition sequence connecting R1 and R2 differs for each probe (denoted EN_1, EN_2, ..., EN_100); for the JSTB basic sequence, 100 double-stranded double probes corresponding to the coding strands B3 and B4 are prepared by the same method; all 200 double probes together form a double-probe library, which is kept by the information sender;
f. the 100 restriction enzymes (denoted E_1, E_2, ..., E_100) that correspond to the recognition sequences EN_1 to EN_100 between the two single probes of the double-probe sequences form an endonuclease library, which is kept by the information receiver;
g. before sending information, the sender randomly selects from the double-probe library a group of 2 double probes covering the complementary sequences of all coding strands, namely one JSTA and one JSTB, wherein the endonuclease recognition sequences contained in the 2 double probes are identical (denoted EN_i);
h. when sending information, the sender physically delivers this group of double probes and the DNA storage disk carrying the written information to the receiver, and sends the key (namely, which EN_i is correct) to the receiver through a separate secure channel (e.g., a secure telephone line or a secret code);
i. before performing the hybridization reading of the DNA storage information, the receiver must use the key to select the correct enzyme E_i from the endonuclease library and digest the double probes with it, separating the two single probes of each double probe and thereby activating the fluorescent probes;
6) when the receiver performs the hybridization reading of the DNA storage information, the activated fluorescent probes hybridize (by complementary base pairing) with the coding strands on the DNA storage disk, and the hybridization signal of each data unit is obtained by fluorescence detection; if probes that have not been activated are used, an erroneous fluorescence signal is obtained because both fluorophores remain on each probe; for example, the first data unit in this example holds the coding strand combination B1 B3 B4 (binary code 1011); because R1 and R2 are linked in JSTA, both the R1 and the R2 signal are read from the B1 coding strand of that data unit, so the first two binary digits of the data unit are misread as 11 instead of 10;
7) to further improve the security of the encrypted probe, restriction endonuclease recognition sequences (denoted ENF and ENT, with corresponding endonucleases EF and ET respectively) can be added between the 5' fluorophore of the probe and the 5' end of the basic sequence and between the 3' end of the basic sequence and the 3' fluorophore, these recognition sequences being different from the recognition sequence that connects the two single probes of the basic sequence; thus, when the double probe is treated with the endonuclease EF or ET (a wrong key), the fluorophores are cleaved from the probe and the probe self-destructs.
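Steps 1), 2) and 4) of this embodiment can be checked with a short Python sketch that splits the 32-bit code into 4-bit fields, maps each field to its coding strand combination, and maps the combination back to the field. The function names are hypothetical. The source does not name the character encoding; the bit string equals the bytes BC D3 C3 DC, which is consistent with GB2312, and that encoding is assumed only in the final decoding line.

```python
BITS = "10111100110100111100001111011100"  # binary code given in step 1)
STRANDS = ("B1", "B2", "B3", "B4")          # one coding strand per binary digit


def split_fields(bits, m):
    """Step 1: divide the bit string into equal fields of m digits."""
    if len(bits) % m:
        raise ValueError("bit string length must be a multiple of m")
    return [bits[i:i + m] for i in range(0, len(bits), m)]


def field_to_strands(field, strands=STRANDS):
    """Step 2: a digit of 1 means the corresponding coding strand is present."""
    return [s for bit, s in zip(field, strands) if bit == "1"]


def strands_to_field(present, strands=STRANDS):
    """Step 4: rebuild the field from the strands judged present."""
    return "".join("1" if s in present else "0" for s in strands)


def bits_to_bytes(bits):
    """Pack the restored bit string back into bytes, 8 digits at a time."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    fields = split_fields(BITS, 4)
    for field in fields:
        print(field, " ".join(field_to_strands(field)))
    restored = "".join(strands_to_field(field_to_strands(f)) for f in fields)
    # Assumption: GB2312 is the encoding behind the bit string in the source.
    print(bits_to_bytes(restored).decode("gb2312"))
```

Running the loop reproduces the two rows of the coding strand table in step 2), and the round trip through `strands_to_field` reproduces the restored code row of the table in step 4).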
In this embodiment, the numeric value range of the data unit in step 1) is between 1 and 8;
in this embodiment, the DNA storage disk in step 3) is a hard material with a surface fixed with specific chemical groups, the hard material includes, but is not limited to, glass, silicon wafer, plastic, magnetic beads, etc., and the fixed end chemical groups on the surface include, but are not limited to, amino groups, aldehyde groups, thiol groups, etc.;
in this example, the length of the base sequence in step 5) a is 8 to 30 nucleotides;
in this example, the recognition sequence or sequence string for the restriction enzyme in steps 5) b and 7) is 4-60 nucleotides in length;
in this embodiment, the different fluorophores in step 5) c can be effectively distinguished in the fluorescence detection apparatus, and may include, but are not limited to, combinations of fluorophores with large emission wavelength differences among Alexa488, Cy3, Cy5, Cy7, etc.;
In the embodiment, the denaturation temperature in the step 5) d is between 70 and 100 ℃, the annealing temperature is between 20 and 65 ℃, and the denaturation temperature and the annealing temperature are adjusted according to the length of the oligonucleotide and the GC content of the base;
in this embodiment, the number of endonucleases and of their recognition sequences in step 5) e can exceed 200, because commercially available type II restriction enzymes currently cover more than 200 specificities;
in this embodiment, the hybridization temperature in step 6) is 25-55 ℃ and is adjusted according to the length and GC content of the oligonucleotide, and the fluorescence detection equipment used is a commercially available microplate reader or biochip scanner.
In conclusion, this embodiment provides a hard encryption function that existing DNA hybridization storage technology lacks; compared with techniques that implement encryption through signal encoding and decoding or in software, the hard encryption provided by the invention prevents a valid detection signal from being obtained at the physical layer, so that a third party cannot break it by algorithmic brute force; the security is therefore higher and the application prospects are good.
In this embodiment, a high molecular polymer or silica is used as a storage substrate, and data storage micro-pools arranged according to a predetermined rule are densely distributed on the substrate, and each data storage micro-pool is an information storage unit.
In use, the activated fluorescent probe mixture is spread evenly over the storage disk, on which data has been written by cross-linking the coding strands, so that the activated fluorescent probes hybridize fully with the corresponding coding strands in the data unit micro-pools. After the hybridization reaction is complete, the disk is washed and the fluorescence information is read, and the original information is restored by signal conversion and decoding.
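The difference between reading with activated probes and reading with uncut double probes can be sketched as a simple logical model in Python. This is an idealized illustration under stated assumptions, not a description of real hybridization kinetics: it assumes that an uncut double probe bound through either segment retains both of its fluorophores in the data unit, and the names `read_unit` and `read_disk` are hypothetical.

```python
# Single probe Rk is complementary to coding strand Bk (embodiment, step 4).
TARGET = {"R1": "B1", "R2": "B2", "R3": "B3", "R4": "B4"}
# Uncut double probes link two single probes (embodiment, step 5a).
DUAL_PROBES = {"JSTA": ("R1", "R2"), "JSTB": ("R3", "R4")}
ORDER = ("R1", "R2", "R3", "R4")
STRANDS = ("B1", "B2", "B3", "B4")
FIELDS = ["1011", "1100", "1101", "0011", "1100", "0011", "1101", "1100"]


def read_unit(strands_present, activated):
    """Return the 4-digit field a reader would infer from one data unit."""
    signals = set()
    for first, second in DUAL_PROBES.values():
        hit_first = TARGET[first] in strands_present
        hit_second = TARGET[second] in strands_present
        if activated:
            # Separated single probes: each fluorophore reports only its own strand.
            if hit_first:
                signals.add(first)
            if hit_second:
                signals.add(second)
        elif hit_first or hit_second:
            # Uncut double probe: binding via either segment keeps both fluorophores.
            signals.update((first, second))
    return "".join("1" if r in signals else "0" for r in ORDER)


def read_disk(fields, activated):
    """Simulate writing each field as coding strands and reading it back."""
    result = []
    for field in fields:
        present = {s for bit, s in zip(field, STRANDS) if bit == "1"}
        result.append(read_unit(present, activated))
    return result


if __name__ == "__main__":
    print("activated:  ", " ".join(read_disk(FIELDS, activated=True)))
    print("unactivated:", " ".join(read_disk(FIELDS, activated=False)))
```

Under this simplified model the activated read returns the written fields unchanged, while the unactivated read turns the first data unit 1011 into 1111, matching the misreading described in step 6); only fields in which exactly one strand of a linked pair is present are affected.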
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.