CN111440827A

CN111440827A - Information storage medium, information storage method and application

Info

Publication number: CN111440827A
Application number: CN202010443536.6A
Authority: CN
Inventors: 朱佑民; 程倩; 王梓旭; 陶小倩; 侯强波; 杨平; 柳伟强; 邢妍婧; 赵一凡
Original assignee: Suzhou Synbio Technologies Co ltd
Current assignee: Suzhou Synbio Technologies Co ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2020-07-24

Abstract

The invention provides an information storage medium, an information storage method, and applications thereof. The information storage medium is a nucleic acid molecule comprising a fusion gene that consists of a codon sequence corresponding to the stored information and a fluorescent protein gene. The invention stores information using the codons that correspond to amino acids. The stored information can be read by sequencing, and it can also be inferred from the amino acid sequence of the encoded protein, so that reading is not affected even if the DNA is damaged. Furthermore, the stored codon sequence can in theory be translated into a fluorescent protein, so that information can be classified by observing the fluorescence color under a laser confocal microscope.

Description

Information storage medium, information storage method and application

Technical Field

The invention belongs to the technical field of information storage and relates to an information storage medium, an information storage method, and applications thereof, in particular to an information storage medium based on codons corresponding to amino acids, an information storage method using it, and applications thereof.

Background

Global digital information is growing rapidly, and the total amount is expected to reach 35 zettabytes in 2020. However, existing information storage devices cannot keep up with the explosive growth in the amount of information, and existing magnetic and optical storage media suffer from low storage density and poor durability. Tape cartridges are among the densest storage media, with a storage capacity of 185 TB and a storage density of about 10 GB/mm³; optical discs can reach a storage capacity of 1 PB and a storage density of 100 GB/mm³. Storage media based on magnetic and optical technologies have a limited lifetime; to ensure long-term storage of information, damaged data must be refreshed and erased, or data must be migrated, to avoid data loss. For example, rotating disks have a storage life of 3-5 years and magnetic tape of 10-30 years. The energy consumption of storage media is also significant: in 2010, U.S. data centers consumed 1.5% of the total electricity used in the United States, at a cost of up to $45 billion. Developing a next-generation digital information storage medium with higher storage density and greater durability is therefore one of the primary tasks in the field of digital information storage.

Owing to its high density and good long-term stability, DNA is an excellent candidate for a future information storage medium. DNA is highly compact: human genomic DNA has a total length of about 3 billion base pairs, yet fits inside a cell tens of micrometers in diameter. The theoretical limit of the information storage density of DNA is 1 EB/mm³ (1 EB = 10⁹ GB), and 1 g of DNA can store 700 TB of data, equivalent to 14,000 Blu-ray discs of 50 GB or 233 hard disks of 3 TB, far exceeding existing magnetic and optical storage media. DNA is also extremely stable, and researchers predict that DNA can be preserved for 1 million years at -18 °C.

At present, the cost and time for gene synthesis and gene sequencing are exponentially reduced, greatly promoting the development of the field of DNA information storage media.

CN106845158A discloses a method for storing information using DNA, which comprises: (1) converting the binary information of an original computer file into quaternary form and then transcoding it into a full DNA sequence, wherein the binary codes 00, 01, 10 and 11 are converted into the four deoxyribonucleotides A, T, C and G, respectively; (2) dividing the full DNA sequence into a plurality of DNA fragments, and constructing output DNA sequences 90-110 nt in length, each comprising an inserted nucleotide coding sequence consisting of a DNA fragment, flanking primer sequences at both ends, and an index coding sequence on the inner side of each flanking primer sequence; and (3) synthesizing and storing artificial DNA sequences based on the output DNA sequences. The method is said to offer good universality, simplified operation, improved continuity, storage efficiency and density of DNA information storage, a reduced error rate, and reduced sequence synthesis and detection costs.
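For comparison, the base-level mapping described for CN106845158A (00, 01, 10, 11 to A, T, C, G) can be sketched as follows. This is a minimal illustration of step (1) only; the function names are hypothetical, and the fragmenting, indexing, and primer design of steps (2) and (3) are not modeled.

```python
# Two-bit-per-base mapping as stated in the text: 00->A, 01->T, 10->C, 11->G.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Convert raw bytes to a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna; expects a length that is a multiple of 4."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4 nucleotides")
    bits = "".join(BASE_TO_BITS[base] for base in dna.upper())
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"A"))  # TAAT  (0x41 = 01 00 00 01)
```

In this kind of mapping every base carries two bits directly, so any single substitution changes the decoded data.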

However, current DNA storage methods store information directly at the level of individual bases, so if the DNA is damaged the stored data cannot be recovered. The existing DNA information storage technology therefore needs further improvement.

Disclosure of Invention

In view of the shortcomings of the prior art and practical needs, the invention provides an information storage medium, an information storage method, and applications thereof. Information is stored using codons corresponding to amino acids; the stored information can be read by sequencing, and it can also be inferred from the amino acid sequence of the corresponding protein, so that reading is not affected even if the DNA is damaged. Furthermore, the stored codon sequence can in theory be translated into a fluorescent protein, so that information can be classified by observing the fluorescence color under a laser confocal microscope.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides an information storage medium which is a nucleic acid molecule;

the information storage medium comprises a fusion gene consisting of a codon sequence corresponding to the stored information and a fluorescent protein gene.

Preferably, each character of the stored information is represented by a number of consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of a number of nucleotide residues.

In the invention, each character of the stored information is expressed as N continuous amino acids, and then the amino acids are converted into corresponding codons, namely, each character of the stored information corresponds to a codon sequence consisting of 3N nucleotide residues.
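The relationship between N, the number of representable characters, and the nucleotide cost can be worked out numerically. The sketch below assumes an alphabet of the twenty standard amino acids; the function name and the returned keys are hypothetical.

```python
import math

NUM_AMINO_ACIDS = 20  # assumption: the twenty standard amino acids


def scheme_parameters(n: int) -> dict:
    """Capacity and cost of representing one character with n amino acids."""
    distinct_characters = NUM_AMINO_ACIDS ** n
    nucleotides_per_character = 3 * n  # one codon (3 nt) per amino acid
    bits_per_nucleotide = math.log2(distinct_characters) / nucleotides_per_character
    return {
        "distinct_characters": distinct_characters,
        "nucleotides_per_character": nucleotides_per_character,
        "bits_per_nucleotide": bits_per_nucleotide,
    }


params = scheme_parameters(3)
print(params["distinct_characters"])        # 8000
print(params["nucleotides_per_character"])  # 9
```

For N = 3 this gives 8000 distinct characters at 9 nucleotides each, or about 1.44 bits per nucleotide, compared with 2 bits per nucleotide when bits are mapped directly onto bases. The difference is the redundancy of the genetic code that the scheme relies on.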

Preferably, each character of the stored information is represented by three consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of 9 nucleotide residues.

In the present invention, each character of the stored information (Chinese character, English character, or symbol) is converted into a sequence of three consecutive amino acids. Based on the twenty amino acids "G", "A", "V", "L", "I", "F", "W", "Y", "D", "N", "E", "K", "Q", "M", "S", "T", "C", "P", "H" and "R", any combination of 3 amino acids represents one character, so that at most 20 × 20 × 20 = 8000 characters can be represented. Each amino acid is converted into a corresponding codon, with codon optimization for Saccharomyces cerevisiae, so that each character of the stored information corresponds to a codon sequence of 9 nucleotide residues. The codon sequence and a fluorescent protein gene are constructed as a fusion gene. The information storage medium constructed in this way is essentially a DNA sequence, and the stored information can be read by sequencing. Because the DNA sequence also corresponds to an amino acid sequence, the stored information can still be deduced from the amino acid sequence even after the DNA is damaged. In addition, the information storage medium can in theory be translated into a fluorescent protein, so that information can be classified by observing the fluorescence color under a microscope.
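The character-to-codon conversion can be sketched as below. Two things are assumptions rather than part of the disclosure: the text does not give the table that assigns characters to amino acid triplets, so a character is represented here by a hypothetical integer index (0 to 7999) written in base 20; and the codon chosen for each amino acid is simply one valid codon from the standard genetic code, not the codon-optimized choice used in the invention.

```python
# Amino acid order as listed in the text.
AMINO_ACIDS = "GAVLIFWYDNEKQMSTCPHR"

# ASSUMPTION: one valid standard-code codon per amino acid, for illustration
# only; this is not the patent's Saccharomyces cerevisiae optimized table.
CODON_FOR = {
    "G": "GGT", "A": "GCT", "V": "GTT", "L": "TTG", "I": "ATT",
    "F": "TTT", "W": "TGG", "Y": "TAT", "D": "GAT", "N": "AAT",
    "E": "GAA", "K": "AAG", "Q": "CAA", "M": "ATG", "S": "TCT",
    "T": "ACT", "C": "TGT", "P": "CCA", "H": "CAT", "R": "AGA",
}


def index_to_triplet(index: int) -> str:
    """Write a hypothetical character index (0..7999) as three amino acids."""
    if not 0 <= index < 20 ** 3:
        raise ValueError("index must be in the range 0..7999")
    first, rest = divmod(index, 400)
    second, third = divmod(rest, 20)
    return AMINO_ACIDS[first] + AMINO_ACIDS[second] + AMINO_ACIDS[third]


def triplet_to_dna(triplet: str) -> str:
    """Convert three amino acids into a 9-nucleotide codon sequence."""
    return "".join(CODON_FOR[amino_acid] for amino_acid in triplet)


def encode_indices(indices) -> str:
    """Encode a sequence of character indices as one DNA payload."""
    return "".join(triplet_to_dna(index_to_triplet(i)) for i in indices)


print(index_to_triplet(21))   # GAA
print(triplet_to_dna("GAA"))  # GGTGCTGCT
```

Any real implementation would replace the integer index with the actual character table and the codon dictionary with the optimized codon choice.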

Preferably, the codon sequence corresponding to the stored information is located at the 5' end of the fusion gene, and the fluorescent protein gene is located at the 3' end of the fusion gene.

Preferably, the fluorescent protein gene includes a green fluorescent protein gene or a red fluorescent protein gene.

Preferably, the information storage medium further includes a promoter and a terminator.

In the invention, a promoter and a terminator suitable for Saccharomyces cerevisiae are connected to the 5' end and the 3' end of the fusion gene, respectively.

Preferably, the promoter is the GAL1 promoter.

Preferably, the terminator is the CYC1 terminator.

Preferably, the information storage medium further includes a 5' recombination arm and a 3' recombination arm.

In the invention, the recombination arms are sequences suitable for homologous recombination in Saccharomyces cerevisiae, namely the sequences flanking the target site of the designed sgRNA used for CRISPR/Cas9 gene editing.

Preferably, the 5' recombination arm and the 3' recombination arm are each 50-200 bp in length, for example, 50 bp, 60 bp, 70 bp, 80 bp, 90 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp or 200 bp.

Preferably, the information storage medium comprises, in order from 5' to 3', a 5' recombination arm, a GAL1 promoter, a codon sequence corresponding to the stored information, a green fluorescent protein gene, a CYC1 terminator and a 3' recombination arm.

Preferably, the green fluorescent protein comprises a nucleic acid sequence shown as SEQ ID NO. 1;

SEQ ID NO:1：

atggttagtaagggagaagagttatttacaggggtcgttcctatattagtagaacttgatggcgacgttaatggacataaatttagtgtttcaggtgaaggagaaggtgatgcaacgtacggtaaactgactctaaagttcatttgcaccaccggtaaattgcctgtaccgtggccaacactagttactacgttaacatacggcgtacagtgtttttcgagatatccagaccacatgaaacaacacgactttttcaaatccgcaatgccagaaggttacgtccaggaacgtactattttcttcaaagatgatggaaattataaaaccagggctgaagtgaaatttgaaggcgacactctagtgaacagaattgagttgaaggggattgatttcaaggaagacgggaacatactcggtcataagctggagtacaactataattcccataacgtctatattatggcggataagcaaaagaatggtatcaaggttaactttaaaatccggcacaatatcgaagatggctctgtacaattggccgatcattatcaacaaaatacacctattggagatggtcccgtgttgttaccagacaatcattacttgtcaacacaatctgctttaagcaaagatcccaatgagaaaagagatcatatggtcttgttagagtttgttactgccgctggtataactctgggtatggatgaactttataaataa。

Preferably, the GAL1 promoter comprises the nucleic acid sequence shown in SEQ ID NO. 2;

SEQ ID NO:2：

cggattagaagccgccgagcgggtgacagccctccgaaggaagactctcctccgtgcgtcctcgtcttcaccggtcgcgttcctgaaacgcagatgtgcctcgcgccgcactgctccgaacaataaagattctacaatactagcttttatggttatgaagaggaaaaattggcagtaacctggccccacaaaccttcaaatgaacgaatcaaattaacaaccataggatgataatgcgattagttttttagccttatttctggggtaattaatcagcgaagcgatgatttttgatctattaacagatatataaatgcaaaaactgcataaccactttaactaatactttcaacattttcggtttgtattacttcttattcaaatgtaataaaagtatcaacaaaaaattgttaatatacctctatactttaacgtcaaggag。

preferably, the CYC1 terminator comprises the nucleic acid sequence set forth in SEQ ID NO. 3;

SEQ ID NO:3：

tcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgc。

preferably, the 5' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 4;

SEQ ID NO:4：

gaggatgtaataatactaatctcgaagatgccatctaatacatatagacatacatatatatatatatacattctatatattcttacccagattctttgaggtaagacggttgggttttatcttttgcagttggtactattaagaacaatcgaatcataagcattgcttacaaagaatacacatacgaaatattaacgata。

preferably, the 3' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 5;

SEQ ID NO:5：

cgtgatttacatatactacaagtcgccagtgtaactcctcactgaatatgattcatacatacccgtatgtattaatgtataaatgttctcagagcaaattttatcgatatcttgtttgccagtggtatgcaggtttggcaaattttttaccataatatccgtttatagattctggaaccttaccaactttcttaccgcta。

in a second aspect, the present invention provides an information storage method, including:

integrating the information storage medium of the first aspect into the saccharomyces cerevisiae genome for information storage.

Preferably, the method comprises:

transforming the information storage medium of the first aspect, together with a CRISPR/Cas9 gene editing plasmid targeting a Saccharomyces cerevisiae gene, into competent Saccharomyces cerevisiae cells;

preliminarily screening positive colonies, and performing PCR detection to obtain positive clones carrying the plasmid;

and extracting the genome of the positive clones, and performing sequencing verification to obtain Saccharomyces cerevisiae with the information storage medium integrated.

Preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene.

In the invention, CRISPR/Cas9 gene editing plasmids targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene are designed, so that the colony color changes from white to red after the ADE1 or ADE2 gene is edited, which facilitates preliminary screening of positive colonies by visual inspection.

Preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA that targets the 5' sequence of the Saccharomyces cerevisiae ADE1 or ADE2 gene.

Preferably, the sgRNA comprises a nucleic acid sequence shown in any one of SEQ ID NO. 6-11;

SEQ ID NO:6：TCCTGCCCAGGCCGCTGAGC；

SEQ ID NO:7：ATTGTCAGAGGCTACATCAC；

SEQ ID NO:8：ACTCTGACAGTTTGGTCAAT；

SEQ ID NO:9：ACTTTACCTCTGGCCACCAA；

SEQ ID NO:10：GGACGGTATATTGCCATTGG；

SEQ ID NO:11：TATGTCTCTAACTTTACCTC。

preferably, the information storage medium is prepared by the following method:

synthesizing a plurality of positive and negative spacing sequences (alternating forward- and reverse-strand sequences) as PCR templates, wherein adjacent positive and negative spacing sequences have overlapping regions, and designing head and tail primers for PCR amplification to obtain the information storage medium.

Preferably, the length of the positive and negative spacing sequence is 55-70 bp, for example, 55bp, 56bp, 57bp, 58bp, 59bp, 60bp, 61bp, 62bp, 63bp, 64bp, 65bp, 66bp, 67bp, 68bp, 69bp or 70 bp.

Preferably, the length of the overlapping region is 20-25 bp, for example, 20bp, 21bp, 22bp, 23bp, 24bp or 25 bp.

Preferably, the length of the head and tail primers is 55-70 bp, for example, 55bp, 56bp, 57bp, 58bp, 59bp, 60bp, 61bp, 62bp, 63bp, 64bp, 65bp, 66bp, 67bp, 68bp, 69bp or 70 bp.
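One way to picture the overlapping templates is to tile the target sequence with fragments that alternate between the forward strand and the reverse complement. The sketch below assumes that this is what "positive and negative spacing sequences" means; the fixed fragment length of 60 and overlap of 20 are arbitrary picks within the stated ranges, and the function names are hypothetical. A practical design would also balance melting temperatures of the overlaps, which is not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def tile_oligos(sequence: str, oligo_len: int = 60, overlap: int = 20) -> list:
    """Split a sequence into overlapping fragments of alternating strand.

    Even-numbered fragments are taken from the forward strand, odd-numbered
    fragments are reverse-complemented. The last fragment may be shorter
    than oligo_len if the sequence length does not fit the tiling exactly.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        fragment = sequence[start:start + oligo_len]
        is_forward = len(oligos) % 2 == 0
        oligos.append(fragment if is_forward else reverse_complement(fragment))
        if start + oligo_len >= len(sequence):
            break
        start += step
    return oligos
```

With these settings, neighbouring fragments share 20 nucleotides of complementary sequence, so they can anneal to each other and be extended by the polymerase before the head and tail primers amplify the full-length product.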

Preferably, the positive colonies are red.

Preferably, the sequencing comprises Sanger sequencing.

In a third aspect, the present invention provides an information storage kit comprising the information storage medium of the first aspect.

Preferably, the kit further comprises a CRISPR/Cas9 gene editing plasmid targeting the saccharomyces cerevisiae gene.

Preferably, the kit further comprises saccharomyces cerevisiae.

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention stores information using codons corresponding to amino acids, with each character of the stored information corresponding to a codon sequence of 9 nucleotide residues; the codon sequence and the fluorescent protein gene are constructed as a fusion gene, so that the information storage medium is essentially a DNA sequence and the stored information can be read by sequencing; because the DNA sequence also corresponds to an amino acid sequence, the stored information can still be deduced from the amino acid sequence even if the DNA is damaged, which addresses the prior-art problem that data cannot be recovered once the DNA sequence is damaged;

(2) the information storage medium of the present invention can theoretically be translated into a protein that exhibits fluorescence, and information classification is performed by observing the fluorescence color under a microscope;

(3) the information storage medium has high storage density and good stability, and has wide application prospect in the technical field of information storage.
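The damage tolerance claimed in effect (1) can be illustrated with the standard genetic code: a substitution that turns a codon into a synonymous codon leaves the decoded amino acid, and hence the decoded character, unchanged. The sketch below is an illustration only. The two damaged variants are hypothetical examples, not data from the patent, and the sample input is the first 9 nucleotides of SEQ ID NO: 12.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


original = "cgcctggat"       # first 9 nt of SEQ ID NO: 12
synonymous = "cgtctggat"     # hypothetical damage: cgc -> cgt, still arginine
nonsynonymous = "cacctggat"  # hypothetical damage: cgc -> cac, now histidine

print(translate(original))       # RLD
print(translate(synonymous))     # RLD
print(translate(nonsynonymous))  # HLD
```

Only substitutions that leave the amino acid unchanged are absorbed in this way; a substitution that changes the amino acid, or an insertion or deletion that shifts the reading frame, would still alter the decoded text.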

Drawings

FIG. 1 is a flow chart of a method of information storage;

FIG. 2 is a CRISPR/Cas9 gene editing plasmid map;

FIG. 3 shows that the color of the Saccharomyces cerevisiae colony changes from white to red after ADE1 gene knockout is successful;

FIG. 4 is a Sanger sequencing alignment after information storage.

Detailed Description

To further illustrate the technical means adopted by the present invention and the effects thereof, the present invention is further described below with reference to the embodiments and the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.

Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field or in the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.

Example 1 information storage Experimental procedure

The flow chart of the information storage method constructed by the invention is shown in FIG. 1. First, each character of the information to be stored is converted into an amino acid sequence of three consecutive amino acids, and each amino acid is then converted into a corresponding codon to obtain the codon sequence corresponding to the stored information. The codon sequence is fused with a green fluorescent protein (GFP) gene to form a fusion gene; a promoter and a terminator are added, followed by homologous recombination arms, to form the donor. sgRNAs targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene are designed, and a CRISPR/Cas9 gene editing plasmid is constructed. The plasmid and the donor are co-transformed into Saccharomyces cerevisiae; positive colonies are preliminarily screened by colony color, and PCR detection is performed to obtain positive clones carrying the plasmid. Finally, the genome of the positive clones is extracted and verified by sequencing to obtain Saccharomyces cerevisiae with the information storage medium integrated, and the stored information is deduced back from the sequencing result.

Example 2 selection of information and codon conversion

The embodiment selects the following information for storage:

"Hongxu Biotechnology GmbH (Hongxu technology, Suzhou) is a leading DNA technology company. The technical platform developed by the present company is a globally leading combined platform for synthesizing biologically complete DNA 1.0, 2.0 and 3.0. The method efficiently meets the customer requirements of humanized antibody library construction, genetic engineering vaccine development, industrial enzyme optimization, chromosome/genome synthesis, molecular assisted breeding, DNA information storage technology development and the like."

The information to be stored is converted into an amino acid sequence. Based on the twenty amino acids "G", "A", "V", "L", "I", "F", "W", "Y", "D", "N", "E", "K", "Q", "M", "S", "T", "C", "P", "H" and "R", any combination of 3 amino acids represents one Chinese character, so that at most 20 × 20 × 20 Chinese characters can be represented. The amino acid sequence is then converted into the corresponding codon sequence, which is optimized according to the codon preference of Saccharomyces cerevisiae to obtain the nucleic acid sequence carrying the stored information, shown as SEQ ID NO. 12:

cgcctggatagagtatacagattgaagagggtcgttagggcttaccgcgtagcacgtgcattgcgagtgtccagggccattaggctacctagattgtttcgaattgatagattcccgcgagtggataggggttgcagagcaaggagactgatcagagctaagagattaaagcgcgtagttcgtgcacttagagtatctagggttactagaggttctagattagagaggggcgaaagagcacaaagaggtaaaagaggaatgaggttatcgagaattgcgcgtggtgacaggatccccagggtcagtagagcagaaagattcccaagagttgacagattatggagaattatcagattgcaaagatttcccagagtggatcgtgcccacagagtctgccgtctatcaagagtctcccgcgctgagcggggttggagggtctggcgtggtttcagagctgatagggggcctaggggtaaaaggggtatgcgactttcgagagtccaacgcgcttgtagggcctacagagttgccagaggctataggggacttcgtattaccagactcagtaggatcgcaagaggcgatcgaattccaagggtccagagagcgtgcagactattgcgtgtgcaaagaggatggcgtgtatggagggctgtgcgagctgttaggatttcacgtgggcaacgcttctgtagaggacatcgaatacgcagaggggtccgcgtcatgcgagtggaacgattagttcgtatcagccgtggacagagattttgcagaggtcacagaatccggcgtttggggagagttatgagggtcgaaagggtgattagaatttctagaggacaacgcttttgtagaggccatagaataaggagaggtacgagagttatgagggttgagagattatggagattgtatcgggttaatcgggttaggcggattaaacgaatttgtcggatcgaaagaggtattcgtgtaaaaaggggtaatagattacgaaggattcaacgtatatttagactccatagagtaccacgtataggcagattgacaaggatagttagggctgctagaggcggtagattaaacagagcctggagattgatgagaattttacgtgctcacagagtttgtcgaatagttcgactgaacagaatctatagagttcacagattcagaaggatacagagaatcgtgagaatttggcgctttactcgtttgcatagaggtcgtagagcagctagaggaggtcggctattgaggctttcccgcgtacaaagagcttgtcgtatcgtgaggttatgtagatttcatagaatacatcgtgctggtcgtgccttcagagttctacgggccaatagagcaatgagaatagctaggggcgacagaattcctcgtgtttttagagttggaagaattatgagagcaagcagattatcaagagtatctagagccgaaagggcccatagagtatgtagactggcgagagcgacaagggctccaagaggtgcaaggataaataggctttgg。
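Since each character occupies 9 nucleotides, a payload such as SEQ ID NO: 12 can be cut into per-character blocks of three codons before decoding. The sketch below assumes the payload is in frame and contains nothing but character blocks; the function name is hypothetical, and only the first 18 nucleotides of SEQ ID NO: 12 are used as sample input.

```python
NT_PER_CHARACTER = 9  # three codons of three nucleotides each


def split_into_character_blocks(payload: str) -> list:
    """Return, for each stored character, the list of its three codons."""
    if len(payload) % NT_PER_CHARACTER != 0:
        raise ValueError("payload length must be a multiple of 9 nucleotides")
    blocks = []
    for start in range(0, len(payload), NT_PER_CHARACTER):
        block = payload[start:start + NT_PER_CHARACTER]
        blocks.append([block[i:i + 3] for i in range(0, NT_PER_CHARACTER, 3)])
    return blocks


sample = "cgcctggatagagtatac"  # first 18 nt of SEQ ID NO: 12
print(split_into_character_blocks(sample))
# [['cgc', 'ctg', 'gat'], ['aga', 'gta', 'tac']]
```

By the same arithmetic, a 1530-nt payload (the length given for SEQ ID NO: 12 in the sequence listing) corresponds to 1530 / 9 = 170 characters.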

EXAMPLE 3 design and Synthesis of information storage Medium

The nucleic acid sequence carrying the stored information (SEQ ID NO: 12) is fused to the green fluorescent protein gene (SEQ ID NO: 1), with the fluorescent protein at the C-terminus, to construct a fusion gene. A start codon is added at the 5' end and a stop codon at the 3' end; the GAL1 promoter (SEQ ID NO: 2) is added upstream and the CYC1 terminator (SEQ ID NO: 3) downstream; and homologous recombination arms (SEQ ID NO: 4-5) are added at both ends, giving the information storage medium shown as SEQ ID NO. 13;
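The assembly order described above can be sketched as a simple concatenation. The function and parameter names are hypothetical. The sketch assumes that the fluorescent protein coding sequence is supplied with its own start and stop codons, that its start codon is dropped so the fusion has a single ATG in front of the payload (which is how the junction in SEQ ID NO: 13 reads), and that its stop codon serves as the stop codon of the fusion.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_donor(arm5: str, promoter: str, payload: str,
                   gfp_cds: str, terminator: str, arm3: str) -> str:
    """Concatenate donor parts in 5' to 3' order.

    Order: 5' arm, promoter, ATG, payload, GFP (without its ATG, ending in
    its stop codon), terminator, 3' arm.
    """
    arm5, promoter, payload, gfp_cds, terminator, arm3 = (
        part.upper() for part in (arm5, promoter, payload, gfp_cds, terminator, arm3)
    )
    if len(payload) % 3 != 0:
        raise ValueError("payload must be a whole number of codons")
    if not gfp_cds.startswith(START_CODON) or gfp_cds[-3:] not in STOP_CODONS:
        raise ValueError("GFP coding sequence must start with ATG and end with a stop codon")
    # ASSUMPTION: a single start codon precedes the payload, and GFP's own
    # ATG is removed so the payload and GFP stay in one reading frame.
    fusion = START_CODON + payload + gfp_cds[len(START_CODON):]
    return arm5 + promoter + fusion + terminator + arm3
```

Keeping the payload a whole number of codons ensures the fluorescent protein remains in frame behind it, which is what allows the construct to be translated as one fusion protein.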

SEQ ID NO:13：

gaggatgtaataatactaatctcgaagatgccatctaatacatatagacatacatatatatatatatacattctatatattcttacccagattctttgaggtaagacggttgggttttatcttttgcagttggtactattaagaacaatcgaatcataagcattgcttacaaagaatacacatacgaaatattaacgatacggattagaagccgccgagcgggtgacagccctccgaaggaagactctcctccgtgcgtcctcgtcttcaccggtcgcgttcctgaaacgcagatgtgcctcgcgccgcactgctccgaacaataaagattctacaatactagcttttatggttatgaagaggaaaaattggcagtaacctggccccacaaaccttcaaatgaacgaatcaaattaacaaccataggatgataatgcgattagttttttagccttatttctggggtaattaatcagcgaagcgatgatttttgatctattaacagatatataaatgcaaaaactgcataaccactttaactaatactttcaacattttcggtttgtattacttcttattcaaatgtaataaaagtatcaacaaaaaattgttaatatacctctatactttaacgtcaaggagatgcgcctggatagagtatacagattgaagagggtcgttagggcttaccgcgtagcacgtgcattgcgagtgtccagggccattaggctacctagattgtttcgaattgatagattcccgcgagtggataggggttgcagagcaaggagactgatcagagctaagagattaaagcgcgtagttcgtgcacttagagtatctagggttactagaggttctagattagagaggggcgaaagagcacaaagaggtaaaagaggaatgaggttatcgagaattgcgcgtggtgacaggatccccagggtcagtagagcagaaagattcccaagagttgacagattatggagaattatcagattgcaaagatttcccagagtggatcgtgcccacagagtctgccgtctatcaagagtctcccgcgctgagcggggttggagggtctggcgtggtttcagagctgatagggggcctaggggtaaaaggggtatgcgactttcgagagtccaacgcgcttgtagggcctacagagttgccagaggctataggggacttcgtattaccagactcagtaggatcgcaagaggcgatcgaattccaagggtccagagagcgtgcagactattgcgtgtgcaaagaggatggcgtgtatggagggctgtgcgagctgttaggatttcacgtgggcaacgcttctgtagaggacatcgaatacgcagaggggtccgcgtcatgcgagtggaacgattagttcgtatcagccgtggacagagattttgcagaggtcacagaatccggcgtttggggagagttatgagggtcgaaagggtgattagaatttctagaggacaacgcttttgtagaggccatagaataaggagaggtacgagagttatgagggttgagagattatggagattgtatcgggttaatcgggttaggcggattaaacgaatttgtcggatcgaaagaggtattcgtgtaaaaaggggtaatagattacgaaggattcaacgtatatttagactccatagagtaccacgtataggcagattgacaaggatagttagggctgctagaggcggtagattaaacagagcctggagattgatgagaattttacgtgctcacagagtttgtcgaatagttcgactgaacagaatctatagagttcacagattcagaaggatacagagaatcgtgagaatttggcgctttactcgtttgcatagaggtcgtagagcagctagaggaggtcggctattgaggctttcccgcgtacaaagagcttgtcgtatcgtgaggttatgtagatttcatagaatacatcgtgctggtcgtgccttcagagttctacgggc
caatagagcaatgagaatagctaggggcgacagaattcctcgtgtttttagagttggaagaattatgagagcaagcagattatcaagagtatctagagccgaaagggcccatagagtatgtagactggcgagagcgacaagggctccaagaggtgcaaggataaataggctttgggttagtaagggagaagagttatttacaggggtcgttcctatattagtagaacttgatggcgacgttaatggacataaatttagtgtttcaggtgaaggagaaggtgatgcaacgtacggtaaactgactctaaagttcatttgcaccaccggtaaattgcctgtaccgtggccaacactagttactacgttaacatacggcgtacagtgtttttcgagatatccagaccacatgaaacaacacgactttttcaaatccgcaatgccagaaggttacgtccaggaacgtactattttcttcaaagatgatggaaattataaaaccagggctgaagtgaaatttgaaggcgacactctagtgaacagaattgagttgaaggggattgatttcaaggaagacgggaacatactcggtcataagctggagtacaactataattcccataacgtctatattatggcggataagcaaaagaatggtatcaaggttaactttaaaatccggcacaatatcgaagatggctctgtacaattggccgatcattatcaacaaaatacacctattggagatggtcccgtgttgttaccagacaatcattacttgtcaacacaatctgctttaagcaaagatcccaatgagaaaagagatcatatggtcttgttagagtttgttactgccgctggtataactctgggtatggatgaactttataaataatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgccgtgatttacatatactacaagtcgccagtgtaactcctcactgaatatgattcatacatacccgtatgtattaatgtataaatgttctcagagcaaattttatcgatatcttgtttgccagtggtatgcaggtttggcaaattttttaccataatatccgtttatagattctggaaccttaccaactttcttaccgcta；

the information storage medium was synthesized as follows:

A plurality of positive and negative spacing sequences of 55-70 bp are synthesized as PCR templates, with adjacent positive and negative spacing sequences designed to share an overlap region of 20-25 bp. PCR amplification is then carried out with the head and tail primers, using Kapa high-fidelity polymerase as the DNA polymerase, to obtain the information storage medium.

Example 4 integration of an information storage Medium into the Saccharomyces cerevisiae genome

First, six sgRNAs (SEQ ID NO: 6-11) targeting the Saccharomyces cerevisiae ADE1 gene are designed and inserted by recombination into the CRISPR/Cas9 gene editing plasmid (pRCC-K); the plasmid map is shown in FIG. 2.

The information storage medium shown as SEQ ID NO. 13 and the CRISPR/Cas9 gene editing plasmid are co-transformed into Saccharomyces cerevisiae, followed by G418 screening. The CRISPR/Cas9 gene editing plasmid edits the ADE1 gene of Saccharomyces cerevisiae, and the colony color changes from white to red, as shown in FIG. 3.

Red colonies are picked and subjected to PCR detection to obtain positive clones carrying the plasmid.

The genome of the positive clones is extracted, the fragment around the ADE1 target site is amplified by PCR, and Sanger sequencing verification yields Saccharomyces cerevisiae with the information storage medium integrated, as shown in FIG. 4. The stored information is then deduced back from the sequencing result to recover the original information.

EXAMPLE 5 stability analysis of information storage Medium

Saccharomyces cerevisiae whose genome carries the integrated information storage medium shown as SEQ ID NO. 13 is stored for one month at -80 °C, 20 °C, 0 °C or 4 °C; genomic DNA is then extracted, and the fragment around the ADE1 target site is amplified by PCR for Sanger sequencing.

The sequencing results show that after storage for one month at -80 °C, 20 °C, 0 °C or 4 °C, the information stored in the information storage medium is neither lost nor changed, and the sequencing results are consistent with the original information, indicating that the DNA information storage medium constructed by the invention is stable under the tested conditions.
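The consistency check between a sequencing read and the original sequence can be sketched as a position-by-position comparison. This assumes the Sanger read has already been trimmed and aligned to the reference so that both strings have the same length; a real analysis would use an alignment tool. The function name is hypothetical.

```python
def find_mismatches(reference: str, read: str) -> list:
    """List (1-based position, reference base, read base) for every mismatch."""
    reference, read = reference.upper(), read.upper()
    if len(reference) != len(read):
        raise ValueError("reference and read must be pre-aligned to equal length")
    return [
        (position, ref_base, read_base)
        for position, (ref_base, read_base) in enumerate(zip(reference, read), start=1)
        if ref_base != read_base
    ]


print(find_mismatches("cgcctggat", "CGCCTGGAT"))  # []
print(find_mismatches("cgcctggat", "cgcctgcat"))  # [(7, 'G', 'C')]
```

An empty list corresponds to the outcome reported above, in which the read is identical to the stored sequence.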

In summary, the invention stores information using codons corresponding to amino acids. The stored information can be read by sequencing, and it can also be inferred from the amino acid sequence of the corresponding protein, so that reading is not affected even if the DNA is damaged. This addresses the prior-art technical problem that stored information is lost when the DNA sequence is damaged, and is of significance in the technical field of information storage.

The applicant states that the present invention is illustrated in detail by the above examples, but the present invention is not limited to the above detailed methods, i.e. it is not meant that the present invention must rely on the above detailed methods for its implementation. It should be understood by those skilled in the art that any modification of the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific modes, etc., are within the scope and disclosure of the present invention.

SEQUENCE LISTING

<110> Suzhou Hongxn Biotechnology Ltd

<120> information storage medium, information storage method and application

<130>20200522

<160>13

<170>PatentIn version 3.3

<210>1

<211>720

<212>DNA

<213> Artificial sequence

<400>1

atggttagta agggagaaga gttatttaca ggggtcgttc ctatattagt agaacttgat 60

ggcgacgtta atggacataa atttagtgtt tcaggtgaag gagaaggtga tgcaacgtac 120

ggtaaactga ctctaaagtt catttgcacc accggtaaat tgcctgtacc gtggccaaca 180

ctagttacta cgttaacata cggcgtacag tgtttttcga gatatccaga ccacatgaaa 240

caacacgact ttttcaaatc cgcaatgcca gaaggttacg tccaggaacg tactattttc 300

ttcaaagatg atggaaatta taaaaccagg gctgaagtga aatttgaagg cgacactcta 360

gtgaacagaa ttgagttgaa ggggattgat ttcaaggaag acgggaacat actcggtcat 420

aagctggagt acaactataa ttcccataac gtctatatta tggcggataa gcaaaagaat 480

ggtatcaagg ttaactttaa aatccggcac aatatcgaag atggctctgt acaattggcc 540

gatcattatc aacaaaatac acctattgga gatggtcccg tgttgttacc agacaatcat 600

tacttgtcaa cacaatctgc tttaagcaaa gatcccaatg agaaaagaga tcatatggtc 660

ttgttagagt ttgttactgc cgctggtata actctgggta tggatgaact ttataaataa 720

<210>2

<211>442

<212>DNA

<213> Artificial sequence

<400>2

cggattagaa gccgccgagc gggtgacagc cctccgaagg aagactctcc tccgtgcgtc 60

ctcgtcttca ccggtcgcgt tcctgaaacg cagatgtgcc tcgcgccgca ctgctccgaa 120

caataaagat tctacaatac tagcttttat ggttatgaag aggaaaaatt ggcagtaacc 180

tggccccaca aaccttcaaa tgaacgaatc aaattaacaa ccataggatg ataatgcgat 240

tagtttttta gccttatttc tggggtaatt aatcagcgaa gcgatgattt ttgatctatt 300

aacagatata taaatgcaaa aactgcataa ccactttaac taatactttc aacattttcg 360

gtttgtatta cttcttattc aaatgtaata aaagtatcaa caaaaaattg ttaatatacc 420

tctatacttt aacgtcaagg ag 442

<210>3

<211>248

<212>DNA

<213> Artificial sequence

<400>3

tcatgtaatt agttatgtca cgcttacatt cacgccctcc ccccacatcc gctctaaccg 60

aaaaggaagg agttagacaa cctgaagtct aggtccctat ttattttttt atagttatgt 120

tagtattaag aacgttattt atatttcaaa tttttctttt ttttctgtac agacgcgtgt 180

acgcatgtaa cattatactg aaaaccttgc ttgagaaggt tttgggacgc tcgaaggctt 240

taatttgc 248

<210>4

<211>200

<212>DNA

<213> Artificial sequence

<400>4

gaggatgtaa taatactaat ctcgaagatg ccatctaata catatagaca tacatatata 60

tatatataca ttctatatat tcttacccag attctttgag gtaagacggt tgggttttat 120

cttttgcagt tggtactatt aagaacaatc gaatcataag cattgcttac aaagaataca 180

catacgaaat attaacgata 200

<210>5

<211>200

<212>DNA

<213> Artificial sequence

<400>5

cgtgatttac atatactaca agtcgccagt gtaactcctc actgaatatg attcatacat 60

acccgtatgt attaatgtat aaatgttctc agagcaaatt ttatcgatat cttgtttgcc 120

agtggtatgc aggtttggca aattttttac cataatatcc gtttatagat tctggaacct 180

taccaacttt cttaccgcta 200

<210>6

<211>20

<212>DNA

<213> Artificial sequence

<400>6

tcctgcccag gccgctgagc 20

<210>7

<211>20

<212>DNA

<213> Artificial sequence

<400>7

attgtcagag gctacatcac 20

<210>8

<211>20

<212>DNA

<213> Artificial sequence

<400>8

actctgacag tttggtcaat 20

<210>9

<211>20

<212>DNA

<213> Artificial sequence

<400>9

actttacctc tggccaccaa 20

<210>10

<211>20

<212>DNA

<213> Artificial sequence

<400>10

ggacggtata ttgccattgg 20

<210>11

<211>20

<212>DNA

<213> Artificial sequence

<400>11

tatgtctcta actttacctc 20

<210>12

<211>1530

<212>DNA

<213> Artificial sequence

<400>12

cgcctggata gagtatacag attgaagagg gtcgttaggg cttaccgcgt agcacgtgca 60

ttgcgagtgt ccagggccat taggctacct agattgtttc gaattgatag attcccgcga 120

gtggataggg gttgcagagc aaggagactg atcagagcta agagattaaa gcgcgtagtt 180

cgtgcactta gagtatctag ggttactaga ggttctagat tagagagggg cgaaagagca 240

caaagaggta aaagaggaat gaggttatcg agaattgcgc gtggtgacag gatccccagg 300

gtcagtagag cagaaagatt cccaagagtt gacagattat ggagaattat cagattgcaa 360

agatttccca gagtggatcg tgcccacaga gtctgccgtc tatcaagagt ctcccgcgct 420

gagcggggtt ggagggtctg gcgtggtttc agagctgata gggggcctag gggtaaaagg 480

ggtatgcgac tttcgagagt ccaacgcgct tgtagggcct acagagttgc cagaggctat 540

aggggacttc gtattaccag actcagtagg atcgcaagag gcgatcgaat tccaagggtc 600

cagagagcgt gcagactatt gcgtgtgcaa agaggatggc gtgtatggag ggctgtgcga 660

gctgttagga tttcacgtgg gcaacgcttc tgtagaggac atcgaatacg cagaggggtc 720

cgcgtcatgc gagtggaacg attagttcgt atcagccgtg gacagagatt ttgcagaggt 780

cacagaatcc ggcgtttggg gagagttatg agggtcgaaa gggtgattag aatttctaga 840

ggacaacgct tttgtagagg ccatagaata aggagaggta cgagagttat gagggttgag 900

agattatgga gattgtatcg ggttaatcgg gttaggcgga ttaaacgaat ttgtcggatc 960

gaaagaggta ttcgtgtaaa aaggggtaat agattacgaa ggattcaacg tatatttaga 1020

ctccatagag taccacgtat aggcagattg acaaggatag ttagggctgc tagaggcggt 1080

agattaaaca gagcctggag attgatgaga attttacgtg ctcacagagt ttgtcgaata 1140

gttcgactga acagaatcta tagagttcac agattcagaa ggatacagag aatcgtgaga 1200

atttggcgct ttactcgttt gcatagaggt cgtagagcag ctagaggagg tcggctattg 1260

aggctttccc gcgtacaaag agcttgtcgt atcgtgaggt tatgtagatt tcatagaata 1320

catcgtgctg gtcgtgcctt cagagttcta cgggccaata gagcaatgag aatagctagg 1380

ggcgacagaa ttcctcgtgt ttttagagtt ggaagaatta tgagagcaag cagattatca 1440

agagtatcta gagccgaaag ggcccataga gtatgtagac tggcgagagc gacaagggct 1500

ccaagaggtg caaggataaa taggctttgg 1530
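SEQ ID NO. 12 is 1530 nt long, which divides evenly into 170 units of 9 nucleotides, consistent with the preferred scheme in claim 2 of three consecutive amino acids per stored character. The following Python sketch splits the first 54 nt of SEQ ID NO. 12 into 9-nt units and translates each with the standard genetic code. The function names are hypothetical, and the table that maps each amino-acid triplet to a character is not reproduced in this section, so the sketch stops at the triplets.

```python
# Standard genetic code (NCBI translation table 1), codons ordered t, c, a, g.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a lowercase DNA string codon by codon (trailing bases are ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def split_units(dna: str, unit: int = 9) -> list:
    """Split the sequence into 9-nt units and translate each into three amino acids."""
    return [translate(dna[i:i + unit]) for i in range(0, len(dna), unit)]


# First line of SEQ ID NO. 12 as printed in the listing (positions 1-60).
first_line = (
    "cgcctggata gagtatacag attgaagagg gtcgttaggg cttaccgcgt agcacgtgca"
).replace(" ", "")

# Positions 1-54 hold six complete 9-nt units.
units = split_units(first_line[:54])
print(units)
```

Decoding the stored text would additionally require the triplet-to-character table used by the applicant, which this sketch deliberately does not guess.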

<210>13

<211>3340

<212>DNA

<213> Artificial sequence

<400>13

gaggatgtaa taatactaat ctcgaagatg ccatctaata catatagaca tacatatata 60

tatatataca ttctatatat tcttacccag attctttgag gtaagacggt tgggttttat 120

cttttgcagt tggtactatt aagaacaatc gaatcataag cattgcttac aaagaataca 180

catacgaaat attaacgata cggattagaa gccgccgagc gggtgacagc cctccgaagg 240

aagactctcc tccgtgcgtc ctcgtcttca ccggtcgcgt tcctgaaacg cagatgtgcc 300

tcgcgccgca ctgctccgaa caataaagat tctacaatac tagcttttat ggttatgaag 360

aggaaaaatt ggcagtaacc tggccccaca aaccttcaaa tgaacgaatc aaattaacaa 420

ccataggatg ataatgcgat tagtttttta gccttatttc tggggtaatt aatcagcgaa 480

gcgatgattt ttgatctatt aacagatata taaatgcaaa aactgcataa ccactttaac 540

taatactttc aacattttcg gtttgtatta cttcttattc aaatgtaata aaagtatcaa 600

caaaaaattg ttaatatacc tctatacttt aacgtcaagg agatgcgcct ggatagagta 660

tacagattga agagggtcgt tagggcttac cgcgtagcac gtgcattgcg agtgtccagg 720

gccattaggc tacctagatt gtttcgaatt gatagattcc cgcgagtgga taggggttgc 780

agagcaagga gactgatcag agctaagaga ttaaagcgcg tagttcgtgc acttagagta 840

tctagggtta ctagaggttc tagattagag aggggcgaaa gagcacaaag aggtaaaaga 900

ggaatgaggt tatcgagaat tgcgcgtggt gacaggatcc ccagggtcag tagagcagaa 960

agattcccaa gagttgacag attatggaga attatcagat tgcaaagatt tcccagagtg 1020

gatcgtgccc acagagtctg ccgtctatca agagtctccc gcgctgagcg gggttggagg 1080

gtctggcgtg gtttcagagc tgataggggg cctaggggta aaaggggtat gcgactttcg 1140

agagtccaac gcgcttgtag ggcctacaga gttgccagag gctatagggg acttcgtatt 1200

accagactca gtaggatcgc aagaggcgat cgaattccaa gggtccagag agcgtgcaga 1260

ctattgcgtg tgcaaagagg atggcgtgta tggagggctg tgcgagctgt taggatttca 1320

cgtgggcaac gcttctgtag aggacatcga atacgcagag gggtccgcgt catgcgagtg 1380

gaacgattag ttcgtatcag ccgtggacag agattttgca gaggtcacag aatccggcgt 1440

ttggggagag ttatgagggt cgaaagggtg attagaattt ctagaggaca acgcttttgt 1500

agaggccata gaataaggag aggtacgaga gttatgaggg ttgagagatt atggagattg 1560

tatcgggtta atcgggttag gcggattaaa cgaatttgtc ggatcgaaag aggtattcgt 1620

gtaaaaaggg gtaatagatt acgaaggatt caacgtatat ttagactcca tagagtacca 1680

cgtataggca gattgacaag gatagttagg gctgctagag gcggtagatt aaacagagcc 1740

tggagattga tgagaatttt acgtgctcac agagtttgtc gaatagttcg actgaacaga 1800

atctatagag ttcacagatt cagaaggata cagagaatcg tgagaatttg gcgctttact 1860

cgtttgcata gaggtcgtag agcagctaga ggaggtcggc tattgaggct ttcccgcgta 1920

caaagagctt gtcgtatcgt gaggttatgt agatttcata gaatacatcg tgctggtcgt 1980

gccttcagag ttctacgggc caatagagca atgagaatag ctaggggcga cagaattcct 2040

cgtgttttta gagttggaag aattatgaga gcaagcagat tatcaagagt atctagagcc 2100

gaaagggccc atagagtatg tagactggcg agagcgacaa gggctccaag aggtgcaagg 2160

ataaataggc tttgggttag taagggagaa gagttattta caggggtcgt tcctatatta 2220

gtagaacttg atggcgacgt taatggacat aaatttagtg tttcaggtga aggagaaggt 2280

gatgcaacgt acggtaaact gactctaaag ttcatttgca ccaccggtaa attgcctgta 2340

ccgtggccaa cactagttac tacgttaaca tacggcgtac agtgtttttc gagatatcca 2400

gaccacatga aacaacacga ctttttcaaa tccgcaatgc cagaaggtta cgtccaggaa 2460

cgtactattt tcttcaaaga tgatggaaat tataaaacca gggctgaagt gaaatttgaa 2520

ggcgacactc tagtgaacag aattgagttg aaggggattg atttcaagga agacgggaac 2580

atactcggtc ataagctgga gtacaactat aattcccata acgtctatat tatggcggat 2640

aagcaaaaga atggtatcaa ggttaacttt aaaatccggc acaatatcga agatggctct 2700

gtacaattgg ccgatcatta tcaacaaaat acacctattg gagatggtcc cgtgttgtta 2760

ccagacaatc attacttgtc aacacaatct gctttaagca aagatcccaa tgagaaaaga 2820

gatcatatgg tcttgttaga gtttgttact gccgctggta taactctggg tatggatgaa 2880

ctttataaat aatcatgtaa ttagttatgt cacgcttaca ttcacgccct ccccccacat 2940

ccgctctaac cgaaaaggaa ggagttagac aacctgaagt ctaggtccct atttattttt 3000

ttatagttat gttagtatta agaacgttat ttatatttca aatttttctt ttttttctgt 3060

acagacgcgt gtacgcatgt aacattatac tgaaaacctt gcttgagaag gttttgggac 3120

gctcgaaggc tttaatttgc cgtgatttac atatactaca agtcgccagt gtaactcctc 3180

actgaatatg attcatacat acccgtatgt attaatgtat aaatgttctc agagcaaatt 3240

ttatcgatat cttgtttgcc agtggtatgc aggtttggca aattttttac cataatatcc 3300

gtttatagat tctggaacct taccaacttt cttaccgcta 3340
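The 3340 nt of SEQ ID NO. 13 can be accounted for by concatenating the shorter listed sequences in the 5' to 3' order given in claim 4. The Python sketch below works through that arithmetic. The part labels are assumptions inferred from claim 4 and from the junctions visible in the listing (only the 3' recombination arm is tied to SEQ ID NO. 5 in this section), and the 3-nt `atg` is the start codon observed at positions 643-645 of SEQ ID NO. 13.

```python
# (label, length in nt); labels are inferred, lengths come from the <211> fields.
PARTS = [
    ("5' recombination arm (assumed SEQ ID NO. 4)", 200),
    ("GAL1 promoter (assumed SEQ ID NO. 2)", 442),
    ("start codon atg", 3),
    ("stored-information codon sequence (SEQ ID NO. 12)", 1530),
    # SEQ ID NO. 1 is 720 nt; its own leading atg is absent from the fusion.
    ("fluorescent protein gene (assumed SEQ ID NO. 1 minus atg)", 720 - 3),
    ("CYC1 terminator (assumed SEQ ID NO. 3)", 248),
    ("3' recombination arm (SEQ ID NO. 5)", 200),
]

layout = []  # (label, first position, last position), 1-based inclusive
offset = 0
for label, length in PARTS:
    layout.append((label, offset + 1, offset + length))
    offset += length
total = offset

for label, first, last in layout:
    print(f"{first:>5}-{last:<5} {label}")
print("total:", total)
```

The computed boundaries (201, 643, 646, 2176, 2893 and 3141) coincide with the positions at which the corresponding sequences begin in SEQ ID NO. 13.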

Claims

1. An information storage medium, wherein the information storage medium is a nucleic acid molecule;

2. The information storage medium of claim 1, wherein each character of the stored information is represented by a number of consecutive amino acids, each character of the stored information corresponding to a codon sequence consisting of a number of nucleotide residues;

preferably, each character of the stored information is represented by three consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of 9 nucleotide residues;

preferably, the codon sequence corresponding to the stored information is positioned at the 5' end of the fusion gene, and the fluorescent protein gene is positioned at the 3' end of the fusion gene;

3. The information storage medium of claim 1 or 2, further comprising a promoter and a terminator;

preferably, the promoter is a GAL1 promoter;

preferably, the terminator is a CYC1 terminator;

preferably, the information storage medium further comprises a 5' recombination arm and a 3' recombination arm;

preferably, the length of each of the 5' recombination arm and the 3' recombination arm is 50-200 bp.

4. The information storage medium of any one of claims 1 to 3, wherein the information storage medium comprises, in order from 5' to 3', a 5' recombination arm, a GAL1 promoter, the codon sequence corresponding to the stored information, a green fluorescent protein gene, a CYC1 terminator, and a 3' recombination arm;

preferably, the 3' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 5.

5. An information storage method, the method comprising:

integrating the information storage medium according to any one of claims 1 to 4 into the Saccharomyces cerevisiae genome for information storage.

6. The method of claim 5, wherein the method comprises:

transforming the information storage medium of any one of claims 1-4 into competent Saccharomyces cerevisiae cells together with a CRISPR/Cas9 gene editing plasmid targeting a Saccharomyces cerevisiae gene;

7. The method of claim 5 or 6, wherein the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene;

preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the 5' sequence of the Saccharomyces cerevisiae ADE1 or ADE2 gene;

preferably, the sgRNA comprises a nucleic acid sequence shown in any one of SEQ ID NO. 6 to 11.

8. The method according to any one of claims 5 to 7, wherein the information storage medium is prepared by:

synthesizing a plurality of alternating forward and reverse sequences as PCR templates, and designing a head primer and a tail primer for PCR amplification to obtain the information storage medium, wherein adjacent forward and reverse sequences share an overlapping region;

preferably, the length of each forward or reverse sequence is 55-70 bp;

preferably, the length of the overlapping region is 20-25 bp;

preferably, the length of the head primer and the tail primer is 55-70 bp.

9. The method of any one of claims 5-8, wherein the positive colonies are red;

preferably, the sequencing comprises Sanger sequencing.

10. An information storage kit, characterized in that the kit comprises the information storage medium according to any one of claims 1 to 4;

preferably, the kit further comprises a CRISPR/Cas9 gene editing plasmid targeting a Saccharomyces cerevisiae gene;

preferably, the kit further comprises Saccharomyces cerevisiae.
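The preparation step of claim 8 builds the medium from overlapping sequences of 55-70 bp with 20-25 bp overlaps. The Python sketch below illustrates one way such a set could be laid out in silico. It assumes uniform 65-nt windows with 20-nt overlaps and strictly alternating strands, none of which is specified in the claims. It uses SEQ ID NO. 4 (200 nt) purely as a demonstration target, and the function names are hypothetical. The head and tail primers are not modelled.

```python
# SEQ ID NO. 4 as printed in the sequence listing, used only as a demo target.
TARGET = (
    "gaggatgtaa taatactaat ctcgaagatg ccatctaata catatagaca tacatatata "
    "tatatataca ttctatatat tcttacccag attctttgag gtaagacggt tgggttttat "
    "cttttgcagt tggtactatt aagaacaatc gaatcataag cattgcttac aaagaataca "
    "catacgaaat attaacgata"
).replace(" ", "")

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target: str, oligo_len: int = 65, overlap: int = 20) -> list:
    """Cut the target into overlapping windows on alternating strands.

    The last window may be shorter than oligo_len when the target length
    does not fit the chosen window and overlap sizes.
    """
    step = oligo_len - overlap
    oligos = []
    for index, start in enumerate(range(0, len(target) - overlap, step)):
        window = target[start:start + oligo_len]
        if index % 2 == 0:
            oligos.append(("forward", window))
        else:
            oligos.append(("reverse", reverse_complement(window)))
    return oligos


def assemble(oligos: list, overlap: int = 20) -> str:
    """Rebuild the target by merging consecutive windows on their overlaps."""
    assembled = ""
    for strand, seq in oligos:
        top = seq if strand == "forward" else reverse_complement(seq)
        if assembled:
            if assembled[-overlap:] != top[:overlap]:
                raise ValueError("adjacent sequences do not overlap as expected")
            top = top[overlap:]
        assembled += top
    return assembled


oligos = tile_oligos(TARGET)
for strand, seq in oligos:
    print(strand, len(seq), seq)
print("reassembles to target:", assemble(oligos) == TARGET)
```

With these parameters the 200-nt target is covered by four 65-nt sequences, and the in-silico merge confirms that the overlaps are sufficient to reconstruct it.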