CN111440827A - Information storage medium, information storage method and application - Google Patents
- Publication number
- CN111440827A (application CN202010443536.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/905—Stable introduction of foreign DNA into chromosome using homologous recombination in yeast
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/65—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression using markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/80—Vectors or expression systems specially adapted for eukaryotic hosts for fungi
- C12N15/81—Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2800/00—Nucleic acids vectors
- C12N2800/22—Vectors comprising a coding region that has been codon optimised for expression in a respective host
Abstract
The invention provides an information storage medium, an information storage method and applications thereof. The information storage medium is a nucleic acid molecule comprising a fusion gene that consists of a codon sequence corresponding to the stored information and a fluorescent protein gene. The invention stores information using the codons that correspond to amino acids, and the stored information can be read by sequencing. Because the stored information can also be inferred from the amino acid sequence that the DNA encodes, reading is not affected even if the DNA is damaged. Furthermore, the stored codon sequence can in theory be translated into a fluorescent protein, so that information can be classified by observing the fluorescence color under a laser confocal microscope.
Description
Technical Field
The invention belongs to the technical field of information storage and relates to an information storage medium, an information storage method and applications thereof, and in particular to an information storage medium based on codons corresponding to amino acids, together with the associated information storage method and application.
Background
Global digital information is growing rapidly, and the total amount is expected to reach 35 zettabytes (ZB) in 2020. Existing information storage devices cannot meet this explosive growth, and existing magnetic and optical storage media suffer from low storage density and poor durability. Tape cartridges are among the densest storage media, with a storage capacity of 185 TB and a storage density of about 10 GB/mm³; optical discs can reach a storage capacity of 1 PB and a storage density of 100 GB/mm³. Storage media based on magnetic and optical technologies have a limited lifetime, so long-term storage requires damaged data to be refreshed and erased, or data to be migrated, to avoid data loss. For example, the storage life of a rotating disk is 3-5 years, and that of magnetic tape is 10-30 years. The energy consumption of storage media also matters: in 2010, U.S. data centers consumed 1.5% of the total electricity used in the United States, at a cost of up to $45 billion. Developing a next-generation digital information storage medium with higher storage density and greater durability is therefore one of the primary tasks in the field of digital information storage.
DNA is an excellent candidate for a future information storage medium because of its high density and good long-term stability. DNA is extremely dense: the human genome is about 3 billion base pairs long, yet it fits inside cells tens of micrometers in diameter. The theoretical limit of DNA information storage density is 1 EB/mm³ (1 EB = 10⁹ GB); that is, 1 g of DNA can store 700 TB of data, equivalent to 14,000 Blu-ray discs of 50 GB or 233 hard disks of 3 TB, far exceeding existing magnetic and optical storage media. DNA is also extremely stable, and researchers predict that it can be stored for 1 million years at -18 °C.
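The disc and disk equivalents of 700 TB can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB), which the text does not state explicitly, and its variable names are illustrative only.

```python
# Assumption: decimal storage units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000

capacity_gb = 700 * GB_PER_TB          # 700 TB stored in 1 g of DNA, per the text

# Whole 50 GB Blu-ray discs needed to hold the same amount of data.
bluray_discs = capacity_gb // 50

# Whole 3 TB hard disks that the same amount of data would fill.
hard_disks = capacity_gb // (3 * GB_PER_TB)

print(bluray_discs, hard_disks)
```

Under that assumption, 700 TB corresponds to 14,000 discs of 50 GB and 233 disks of 3 TB.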
At present, the cost and time of gene synthesis and gene sequencing are falling exponentially, which greatly promotes the development of DNA information storage media.
CN106845158A discloses a method for storing information using DNA, comprising: (1) converting the binary information of an original computer file into quaternary and then transcoding it into a full DNA sequence, wherein the binary codes 00, 01, 10 and 11 are converted into the four deoxyribonucleotides A, T, C and G, respectively; (2) dividing the full DNA sequence into a plurality of DNA fragments, and constructing output DNA sequences 90-110 nt long, each comprising an inserted nucleotide coding sequence formed from a DNA fragment, flanking primer sequences at both ends, and an index coding sequence on the inner side of each flanking primer sequence; (3) synthesizing artificial DNA based on the output DNA sequences and storing it. The stated advantages of that method include good universality, simpler operation, improved continuity, storage efficiency and density of DNA information storage, a lower error rate, and reduced sequence synthesis and detection costs.
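Step (1) of the cited method maps each pair of bits to one base. A minimal Python sketch of that mapping follows; the function names are hypothetical, and the fragmenting, primer and index steps of (2) are not modeled.

```python
# Two-bit-to-base mapping as stated in the text: 00->A, 01->T, 10->C, 11->G.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Convert raw bytes to a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna; expects a length that is a multiple of 4 bases."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = bytes_to_dna(b"Hi")
decoded = dna_to_bytes(encoded)
print(encoded, decoded)
```

Because every base carries two payload bits directly, a single base substitution changes the decoded data, which is the weakness the following paragraph points out.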
However, current DNA storage methods store information directly in units of bases, so the stored data cannot be recovered if the DNA is damaged. The existing DNA information storage technology therefore needs further improvement.
Disclosure of Invention
In view of the defects of the prior art and practical needs, the invention provides an information storage medium, an information storage method and applications thereof. Information is stored using the codons that correspond to amino acids, and the stored information can be read by sequencing. Because the stored information can also be inferred from the amino acid sequence that the DNA encodes, reading is not affected even if the DNA is damaged. Furthermore, the stored codon sequence can in theory be translated into a fluorescent protein, so that information can be classified by observing the fluorescence color under a laser confocal microscope.
To achieve this purpose, the invention adopts the following technical solutions:
In a first aspect, the invention provides an information storage medium that is a nucleic acid molecule;
the information storage medium comprises a fusion gene consisting of a codon sequence corresponding to the stored information and a fluorescent protein gene.
Preferably, each character of the stored information is represented by a number of consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of a number of nucleotide residues.
In the invention, each character of the stored information is expressed as N consecutive amino acids, and the amino acids are then converted into their corresponding codons; that is, each character of the stored information corresponds to a codon sequence consisting of 3N nucleotide residues.
Preferably, each character of the stored information is represented by three consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of 9 nucleotide residues.
In the invention, each character of the stored information (Chinese character, English character or symbol) is converted into an amino acid sequence of three consecutive amino acids. Based on the twenty amino acids "G", "A", "V", "L", "I", "F", "W", "Y", "D", "N", "E", "K", "Q", "M", "S", "T", "C", "P", "H" and "R", each combination of three amino acids represents one character, so at most 20 × 20 × 20 = 8,000 characters can be represented. Each amino acid is converted into a corresponding codon, codon-optimized for Saccharomyces cerevisiae; that is, each character of the stored information corresponds to a codon sequence of 9 nucleotide residues. The codon sequence and a fluorescent protein gene are constructed into a fusion gene. The resulting information storage medium is essentially a DNA sequence, so the stored information can be read by sequencing. Because the DNA sequence also corresponds to an amino acid sequence, the stored information can still be inferred from the amino acid sequence after the DNA is damaged. In addition, the information storage medium can in theory be translated into a fluorescent protein, and information can be classified by observing the fluorescence color under a microscope.
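The character-to-codon conversion can be sketched as follows. The patent does not disclose its character table or its optimized codon choices, so both are assumptions here: a character is identified by its position in a caller-supplied list (hypothetical `charset`), that position is written as three base-20 digits over the twenty amino acids in the order listed in the text, and each amino acid is replaced by one valid codon from the standard genetic code (not the Saccharomyces cerevisiae optimization used in the patent).

```python
# The twenty amino acids in the order listed in the text.
AMINO_ACIDS = "GAVLIFWYDNEKQMSTCPHR"

# One valid standard-code codon per amino acid (illustrative choice only).
CODONS = {
    "G": "GGT", "A": "GCT", "V": "GTT", "L": "TTG", "I": "ATT",
    "F": "TTT", "W": "TGG", "Y": "TAT", "D": "GAT", "N": "AAT",
    "E": "GAA", "K": "AAA", "Q": "CAA", "M": "ATG", "S": "TCT",
    "T": "ACT", "C": "TGT", "P": "CCA", "H": "CAT", "R": "AGA",
}


def index_to_triplet(index: int) -> str:
    """Write a character index (0..7999) as three base-20 amino acid digits."""
    if not 0 <= index < 20 ** 3:
        raise ValueError("only 20 x 20 x 20 = 8000 characters are addressable")
    first, rest = divmod(index, 400)
    second, third = divmod(rest, 20)
    return AMINO_ACIDS[first] + AMINO_ACIDS[second] + AMINO_ACIDS[third]


def encode(text: str, charset: str) -> str:
    """Encode text as DNA: 3 amino acids, hence 9 nucleotides, per character."""
    triplets = (index_to_triplet(charset.index(ch)) for ch in text)
    return "".join(CODONS[aa] for triplet in triplets for aa in triplet)


dna = encode("NA", charset="DNA")   # hypothetical three-character table
print(dna, len(dna))
```

Each character always yields 9 nucleotides, matching the 3N rule for N = 3.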
Preferably, the codon sequence corresponding to the stored information is located at the 5 'end of the fusion gene, and the fluorescent protein gene is located at the 3' end of the fusion gene.
Preferably, the fluorescent protein gene includes a green fluorescent protein gene or a red fluorescent protein gene.
Preferably, the information storage medium further includes a promoter and a terminator.
In the invention, a promoter and a terminator which are suitable for the saccharomyces cerevisiae are respectively connected with the 5 'end and the 3' end of the fusion gene.
Preferably, the promoter is the GAL1 promoter.
Preferably, the terminator is the CYC1 terminator.
Preferably, the information storage medium further includes a 5' recombination arm and a 3' recombination arm.
In the invention, the recombination arms are sequences suitable for homologous recombination in Saccharomyces cerevisiae, that is, the sequences on both sides of the target gene site that corresponds to the designed sgRNA when CRISPR/Cas9 gene editing is carried out.
Preferably, the length of the 5' recombination arm and the 3' recombination arm is 50-200 bp, for example, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 110bp, 120bp, 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 190bp or 200 bp.
Preferably, the information storage medium comprises, in order from 5' to 3', a 5' recombination arm, a GAL1 promoter, a codon sequence corresponding to the stored information, a green fluorescent protein gene, a CYC1 terminator and a 3' recombination arm.
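The 5'-to-3' layout of the preferred medium amounts to concatenating six elements in a fixed order. The sketch below illustrates this with short placeholder strings rather than the actual SEQ ID NO sequences; the dictionary keys and the `assemble_donor` helper are hypothetical names, and the only rule enforced is the preferred 50-200 bp arm length.

```python
# Element order from 5' to 3', as listed in the text.
PART_ORDER = [
    "arm_5", "gal1_promoter", "payload_codons",
    "gfp_gene", "cyc1_terminator", "arm_3",
]


def assemble_donor(parts: dict) -> str:
    """Concatenate the six elements in order after checking arm lengths."""
    for arm in ("arm_5", "arm_3"):
        if not 50 <= len(parts[arm]) <= 200:
            raise ValueError(f"{arm} should be 50-200 bp long")
    return "".join(parts[name] for name in PART_ORDER)


# Placeholder sequences for illustration only, not the patent's sequences.
parts = {
    "arm_5": "AC" * 30,
    "gal1_promoter": "TTGACA",
    "payload_codons": "CGCCTGGAT",
    "gfp_gene": "ATGGTTAGTAAG",
    "cyc1_terminator": "TCATGT",
    "arm_3": "GT" * 30,
}
donor = assemble_donor(parts)
print(len(donor))
```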
Preferably, the green fluorescent protein gene comprises the nucleic acid sequence shown in SEQ ID NO. 1;
SEQ ID NO:1:
atggttagtaagggagaagagttatttacaggggtcgttcctatattagtagaacttgatggcgacgttaatggacataaatttagtgtttcaggtgaaggagaaggtgatgcaacgtacggtaaactgactctaaagttcatttgcaccaccggtaaattgcctgtaccgtggccaacactagttactacgttaacatacggcgtacagtgtttttcgagatatccagaccacatgaaacaacacgactttttcaaatccgcaatgccagaaggttacgtccaggaacgtactattttcttcaaagatgatggaaattataaaaccagggctgaagtgaaatttgaaggcgacactctagtgaacagaattgagttgaaggggattgatttcaaggaagacgggaacatactcggtcataagctggagtacaactataattcccataacgtctatattatggcggataagcaaaagaatggtatcaaggttaactttaaaatccggcacaatatcgaagatggctctgtacaattggccgatcattatcaacaaaatacacctattggagatggtcccgtgttgttaccagacaatcattacttgtcaacacaatctgctttaagcaaagatcccaatgagaaaagagatcatatggtcttgttagagtttgttactgccgctggtataactctgggtatggatgaactttataaataa。
Preferably, the GAL1 promoter comprises the nucleic acid sequence shown in SEQ ID NO. 2;
SEQ ID NO:2:
cggattagaagccgccgagcgggtgacagccctccgaaggaagactctcctccgtgcgtcctcgtcttcaccggtcgcgttcctgaaacgcagatgtgcctcgcgccgcactgctccgaacaataaagattctacaatactagcttttatggttatgaagaggaaaaattggcagtaacctggccccacaaaccttcaaatgaacgaatcaaattaacaaccataggatgataatgcgattagttttttagccttatttctggggtaattaatcagcgaagcgatgatttttgatctattaacagatatataaatgcaaaaactgcataaccactttaactaatactttcaacattttcggtttgtattacttcttattcaaatgtaataaaagtatcaacaaaaaattgttaatatacctctatactttaacgtcaaggag。
Preferably, the CYC1 terminator comprises the nucleic acid sequence shown in SEQ ID NO. 3;
SEQ ID NO:3:
tcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgc。
Preferably, the 5' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 4;
SEQ ID NO:4:
gaggatgtaataatactaatctcgaagatgccatctaatacatatagacatacatatatatatatatacattctatatattcttacccagattctttgaggtaagacggttgggttttatcttttgcagttggtactattaagaacaatcgaatcataagcattgcttacaaagaatacacatacgaaatattaacgata。
Preferably, the 3' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 5;
SEQ ID NO:5:
cgtgatttacatatactacaagtcgccagtgtaactcctcactgaatatgattcatacatacccgtatgtattaatgtataaatgttctcagagcaaattttatcgatatcttgtttgccagtggtatgcaggtttggcaaattttttaccataatatccgtttatagattctggaaccttaccaactttcttaccgcta。
In a second aspect, the invention provides an information storage method, comprising:
integrating the information storage medium of the first aspect into the Saccharomyces cerevisiae genome for information storage.
Preferably, the method comprises:
co-transforming the information storage medium of the first aspect and a CRISPR/Cas9 gene editing plasmid targeting a Saccharomyces cerevisiae gene into competent Saccharomyces cerevisiae cells;
preliminarily screening positive colonies and performing PCR detection to obtain positive clones into which the plasmid has been introduced;
and extracting the genome of the positive clones and performing sequencing verification to obtain Saccharomyces cerevisiae with the information storage medium integrated.
Preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene.
In the invention, a CRISPR/Cas9 gene editing plasmid targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene is designed so that the colony color changes from white to red after the ADE1 or ADE2 gene is edited, which facilitates preliminary visual screening of positive colonies.
Preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA that targets the 5' sequence of the Saccharomyces cerevisiae ADE1 or ADE2 gene.
Preferably, the sgRNA comprises a nucleic acid sequence shown in any one of SEQ ID NO. 6-11;
SEQ ID NO:6:TCCTGCCCAGGCCGCTGAGC;
SEQ ID NO:7:ATTGTCAGAGGCTACATCAC;
SEQ ID NO:8:ACTCTGACAGTTTGGTCAAT;
SEQ ID NO:9:ACTTTACCTCTGGCCACCAA;
SEQ ID NO:10:GGACGGTATATTGCCATTGG;
SEQ ID NO:11:TATGTCTCTAACTTTACCTC.
Preferably, the information storage medium is prepared by the following method:
synthesizing a plurality of alternating forward and reverse sequences as PCR templates, and designing head and tail primers for PCR amplification to obtain the information storage medium, wherein adjacent forward and reverse sequences have overlapping regions.
Preferably, the length of each forward or reverse sequence is 55-70 bp, for example, 55bp, 56bp, 57bp, 58bp, 59bp, 60bp, 61bp, 62bp, 63bp, 64bp, 65bp, 66bp, 67bp, 68bp, 69bp or 70 bp.
Preferably, the length of the overlapping region is 20-25 bp, for example, 20bp, 21bp, 22bp, 23bp, 24bp or 25 bp.
Preferably, the length of the head and tail primers is 55-70 bp, for example, 55bp, 56bp, 57bp, 58bp, 59bp, 60bp, 61bp, 62bp, 63bp, 64bp, 65bp, 66bp, 67bp, 68bp, 69bp or 70 bp.
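One way to picture the template design is to tile the target sequence with overlapping fragments and take every second fragment from the opposite strand. The sketch below assumes that reading of the alternating sequences; the function names and the test sequence are hypothetical, the final fragment may come out shorter than the others, and primer design is not modeled.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def split_into_oligos(seq: str, oligo_len: int = 60, overlap: int = 20) -> list:
    """Tile seq with overlapping oligos, alternating forward and reverse strands."""
    if not 55 <= oligo_len <= 70:
        raise ValueError("oligo length should be 55-70 bp")
    if not 20 <= overlap <= 25:
        raise ValueError("overlap should be 20-25 bp")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        fragment = seq[start:start + oligo_len]
        # Even-numbered oligos are forward strand, odd-numbered are reverse.
        is_forward = len(oligos) % 2 == 0
        oligos.append(fragment if is_forward else reverse_complement(fragment))
        if start + oligo_len >= len(seq):
            break
        start += step
    return oligos


target = ("ACGTTGCA" * 18)[:140]    # arbitrary 140 nt test sequence
oligos = split_into_oligos(target)
print(len(oligos), [len(o) for o in oligos])
```

With 60 nt oligos and 20 nt overlaps, the last 20 nt of each fragment are complementary to the start of the next, which is what lets neighbouring templates anneal during amplification.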
Preferably, the positive colonies are red.
Preferably, the sequencing comprises Sanger sequencing.
In a third aspect, the present invention provides an information storage kit comprising the information storage medium of the first aspect.
Preferably, the kit further comprises a CRISPR/Cas9 gene editing plasmid targeting the saccharomyces cerevisiae gene.
Preferably, the kit further comprises saccharomyces cerevisiae.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention stores information using the codons that correspond to amino acids; each character of the stored information corresponds to a codon sequence of 9 nucleotide residues, and the codon sequence and a fluorescent protein gene are constructed into a fusion gene. The resulting information storage medium is essentially a DNA sequence, so the stored information can be read by sequencing. Because the DNA sequence also corresponds to an amino acid sequence, the stored information can still be inferred from the amino acid sequence even if the DNA is damaged, which solves the prior-art problem that data cannot be recovered once the DNA sequence is damaged;
(2) the information storage medium of the present invention can theoretically be translated into a protein that exhibits fluorescence, and information classification is performed by observing the fluorescence color under a microscope;
(3) the information storage medium has high storage density and good stability, and has wide application prospect in the technical field of information storage.
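The damage tolerance in effect (1) can be illustrated for one specific kind of damage, a synonymous base substitution, which changes a codon without changing the amino acid it encodes. The sketch below translates DNA with the standard genetic code; the `translate` helper is a hypothetical name, the first sequence is the first nine nucleotides of SEQ ID NO: 12, and the two variants are made up for illustration. Non-synonymous changes, insertions and deletions are not covered by this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string, read in frame from its first base."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


original = "cgcctggat"      # first nine nucleotides of SEQ ID NO: 12
synonymous = "agattagac"    # every codon changed, same amino acids
changed = "cgcctgaat"       # one base changed, last amino acid differs

print(translate(original), translate(synonymous), translate(changed))
```

Both the original and the synonymous variant translate to the same three amino acids, so the character they stand for is unchanged, while the third sequence shows that a non-synonymous substitution does alter the amino acid readout.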
Drawings
FIG. 1 is a flow chart of a method of information storage;
FIG. 2 is a CRISPR/Cas9 gene editing plasmid map;
FIG. 3 shows that the color of the Saccharomyces cerevisiae colony changes from white to red after ADE1 gene knockout is successful;
FIG. 4 is a Sanger sequencing alignment after information storage.
Detailed Description
To further illustrate the technical means adopted by the present invention and the effects thereof, the present invention is further described below with reference to the embodiments and the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field or the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Example 1 Information storage experimental procedure
The flow chart of the information storage method of the invention is shown in FIG. 1. First, each character of the information to be stored is converted into an amino acid sequence of three consecutive amino acids, and each amino acid is then converted into a corresponding codon to obtain the codon sequence corresponding to the stored information. The codon sequence is fused with a green fluorescent protein (GFP) gene to form a fusion gene, a promoter and a terminator are added, and homologous recombination arms are added to form the Donor. An sgRNA targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene is designed, and a CRISPR/Cas9 gene editing plasmid is constructed. The plasmid and the Donor are co-transformed into Saccharomyces cerevisiae, positive colonies are preliminarily screened by colony color, and PCR detection is performed to obtain positive clones into which the plasmid has been introduced. The genome of the positive clones is extracted and verified by sequencing to obtain Saccharomyces cerevisiae with the information storage medium integrated, and the stored information is deduced back from the sequencing result.
Example 2 selection of information and codon conversion
The embodiment selects the following information for storage:
"Hongxu Biotechnology GmbH (Hongxu technology, Suzhou) is a leading DNA technology company. The technical platform developed by the company is a globally leading combined platform for synthesizing biologically complete DNA 1.0, 2.0 and 3.0. It efficiently meets customer requirements such as humanized antibody library construction, genetic engineering vaccine development, industrial enzyme optimization, chromosome/genome synthesis, molecular assisted breeding and DNA information storage technology development."
The information to be stored is converted into an amino acid sequence. Based on the twenty amino acids 'G', 'A', 'V', 'L', 'I', 'F', 'W', 'Y', 'D', 'N', 'E', 'K', 'Q', 'M', 'S', 'T', 'C', 'P', 'H' and 'R', each combination of three amino acids represents one Chinese character, so at most 20 × 20 × 20 = 8,000 Chinese characters can be represented. The amino acid sequence is then converted into the corresponding codon sequence, which is optimized according to the codon preference of Saccharomyces cerevisiae to obtain the nucleic acid sequence carrying the stored information, shown in SEQ ID NO. 12:
cgcctggatagagtatacagattgaagagggtcgttagggcttaccgcgtagcacgtgcattgcgagtgtccagggccattaggctacctagattgtttcgaattgatagattcccgcgagtggataggggttgcagagcaaggagactgatcagagctaagagattaaagcgcgtagttcgtgcacttagagtatctagggttactagaggttctagattagagaggggcgaaagagcacaaagaggtaaaagaggaatgaggttatcgagaattgcgcgtggtgacaggatccccagggtcagtagagcagaaagattcccaagagttgacagattatggagaattatcagattgcaaagatttcccagagtggatcgtgcccacagagtctgccgtctatcaagagtctcccgcgctgagcggggttggagggtctggcgtggtttcagagctgatagggggcctaggggtaaaaggggtatgcgactttcgagagtccaacgcgcttgtagggcctacagagttgccagaggctataggggacttcgtattaccagactcagtaggatcgcaagaggcgatcgaattccaagggtccagagagcgtgcagactattgcgtgtgcaaagaggatggcgtgtatggagggctgtgcgagctgttaggatttcacgtgggcaacgcttctgtagaggacatcgaatacgcagaggggtccgcgtcatgcgagtggaacgattagttcgtatcagccgtggacagagattttgcagaggtcacagaatccggcgtttggggagagttatgagggtcgaaagggtgattagaatttctagaggacaacgcttttgtagaggccatagaataaggagaggtacgagagttatgagggttgagagattatggagattgtatcgggttaatcgggttaggcggattaaacgaatttgtcggatcgaaagaggtattcgtgtaaaaaggggtaatagattacgaaggattcaacgtatatttagactccatagagtaccacgtataggcagattgacaaggatagttagggctgctagaggcggtagattaaacagagcctggagattgatgagaattttacgtgctcacagagtttgtcgaatagttcgactgaacagaatctatagagttcacagattcagaaggatacagagaatcgtgagaatttggcgctttactcgtttgcatagaggtcgtagagcagctagaggaggtcggctattgaggctttcccgcgtacaaagagcttgtcgtatcgtgaggttatgtagatttcatagaatacatcgtgctggtcgtgccttcagagttctacgggccaatagagcaatgagaatagctaggggcgacagaattcctcgtgtttttagagttggaagaattatgagagcaagcagattatcaagagtatctagagccgaaagggcccatagagtatgtagactggcgagagcgacaagggctccaagaggtgcaaggataaataggctttgg。
Example 3 Design and synthesis of the information storage medium
The nucleic acid sequence carrying the stored information (SEQ ID NO. 12) was fused at its C-terminus (3' end) to the green fluorescent protein gene (SEQ ID NO. 1) to construct a fusion gene. A start codon was added at the 5' end and a stop codon at the 3' end, the GAL1 promoter (SEQ ID NO: 2) was added upstream, the CYC1 terminator (SEQ ID NO: 3) was added downstream, and homologous recombination arms (SEQ ID NO: 4-5) were added at both ends, giving the information storage medium shown as SEQ ID NO. 13.
SEQ ID NO:13:
gaggatgtaataatactaatctcgaagatgccatctaatacatatagacatacatatatatatatatacattctatatattcttacccagattctttgaggtaagacggttgggttttatcttttgcagttggtactattaagaacaatcgaatcataagcattgcttacaaagaatacacatacgaaatattaacgatacggattagaagccgccgagcgggtgacagccctccgaaggaagactctcctccgtgcgtcctcgtcttcaccggtcgcgttcctgaaacgcagatgtgcctcgcgccgcactgctccgaacaataaagattctacaatactagcttttatggttatgaagaggaaaaattggcagtaacctggccccacaaaccttcaaatgaacgaatcaaattaacaaccataggatgataatgcgattagttttttagccttatttctggggtaattaatcagcgaagcgatgatttttgatctattaacagatatataaatgcaaaaactgcataaccactttaactaatactttcaacattttcggtttgtattacttcttattcaaatgtaataaaagtatcaacaaaaaattgttaatatacctctatactttaacgtcaaggagatgcgcctggatagagtatacagattgaagagggtcgttagggcttaccgcgtagcacgtgcattgcgagtgtccagggccattaggctacctagattgtttcgaattgatagattcccgcgagtggataggggttgcagagcaaggagactgatcagagctaagagattaaagcgcgtagttcgtgcacttagagtatctagggttactagaggttctagattagagaggggcgaaagagcacaaagaggtaaaagaggaatgaggttatcgagaattgcgcgtggtgacaggatccccagggtcagtagagcagaaagattcccaagagttgacagattatggagaattatcagattgcaaagatttcccagagtggatcgtgcccacagagtctgccgtctatcaagagtctcccgcgctgagcggggttggagggtctggcgtggtttcagagctgatagggggcctaggggtaaaaggggtatgcgactttcgagagtccaacgcgcttgtagggcctacagagttgccagaggctataggggacttcgtattaccagactcagtaggatcgcaagaggcgatcgaattccaagggtccagagagcgtgcagactattgcgtgtgcaaagaggatggcgtgtatggagggctgtgcgagctgttaggatttcacgtgggcaacgcttctgtagaggacatcgaatacgcagaggggtccgcgtcatgcgagtggaacgattagttcgtatcagccgtggacagagattttgcagaggtcacagaatccggcgtttggggagagttatgagggtcgaaagggtgattagaatttctagaggacaacgcttttgtagaggccatagaataaggagaggtacgagagttatgagggttgagagattatggagattgtatcgggttaatcgggttaggcggattaaacgaatttgtcggatcgaaagaggtattcgtgtaaaaaggggtaatagattacgaaggattcaacgtatatttagactccatagagtaccacgtataggcagattgacaaggatagttagggctgctagaggcggtagattaaacagagcctggagattgatgagaattttacgtgctcacagagtttgtcgaatagttcgactgaacagaatctatagagttcacagattcagaaggatacagagaatcgtgagaatttggcgctttactcgtttgcatagaggtcgtagagcagctagaggaggtcggctattgaggctttcccgcgtacaaagagcttgtcgtatcgtgaggttatgtagatttcatagaatacatcgtgctggtcgtgccttcagagttctacgggc
caatagagcaatgagaatagctaggggcgacagaattcctcgtgtttttagagttggaagaattatgagagcaagcagattatcaagagtatctagagccgaaagggcccatagagtatgtagactggcgagagcgacaagggctccaagaggtgcaaggataaataggctttgggttagtaagggagaagagttatttacaggggtcgttcctatattagtagaacttgatggcgacgttaatggacataaatttagtgtttcaggtgaaggagaaggtgatgcaacgtacggtaaactgactctaaagttcatttgcaccaccggtaaattgcctgtaccgtggccaacactagttactacgttaacatacggcgtacagtgtttttcgagatatccagaccacatgaaacaacacgactttttcaaatccgcaatgccagaaggttacgtccaggaacgtactattttcttcaaagatgatggaaattataaaaccagggctgaagtgaaatttgaaggcgacactctagtgaacagaattgagttgaaggggattgatttcaaggaagacgggaacatactcggtcataagctggagtacaactataattcccataacgtctatattatggcggataagcaaaagaatggtatcaaggttaactttaaaatccggcacaatatcgaagatggctctgtacaattggccgatcattatcaacaaaatacacctattggagatggtcccgtgttgttaccagacaatcattacttgtcaacacaatctgctttaagcaaagatcccaatgagaaaagagatcatatggtcttgttagagtttgttactgccgctggtataactctgggtatggatgaactttataaataatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgccgtgatttacatatactacaagtcgccagtgtaactcctcactgaatatgattcatacatacccgtatgtattaatgtataaatgttctcagagcaaattttatcgatatcttgtttgccagtggtatgcaggtttggcaaattttttaccataatatccgtttatagattctggaaccttaccaactttcttaccgcta;
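The layout of SEQ ID NO. 13 can be checked against the part lengths given in the sequence listing. The sketch below assumes that the green fluorescent protein gene is joined without its own ATG and that its terminal TAA serves as the stop codon; this is an inference from the listed lengths rather than something the text states, and the names `PARTS` and `layout` are illustrative.

```python
# (part name, length in nt); lengths are the <211> values in the sequence listing.
PARTS = [
    ("5' recombination arm (SEQ ID NO: 4)", 200),
    ("GAL1 promoter (SEQ ID NO: 2)", 442),
    ("start codon", 3),
    ("stored-information sequence (SEQ ID NO: 12)", 1530),
    # Assumption: the 720-nt GFP gene is used without its initial ATG.
    ("GFP gene without its own ATG (SEQ ID NO: 1)", 720 - 3),
    ("CYC1 terminator (SEQ ID NO: 3)", 248),
    ("3' recombination arm (SEQ ID NO: 5)", 200),
]


def layout(parts):
    """Return 1-based (name, first, last) coordinates and the total length."""
    coordinates = []
    position = 0
    for name, length in parts:
        coordinates.append((name, position + 1, position + length))
        position += length
    return coordinates, position


COORDINATES, TOTAL_LENGTH = layout(PARTS)

for name, first, last in COORDINATES:
    print(f"{first:>5}-{last:<5} {name}")
print("total:", TOTAL_LENGTH)
```

With these assumptions the parts sum to the 3340 nt declared for SEQ ID NO. 13, and the stored-information sequence occupies positions 646 to 2175 of the construct.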
The information storage medium was synthesized as follows:
A plurality of alternating forward and reverse oligonucleotides of 55-70 bp were synthesized as PCR templates, with adjacent forward and reverse oligonucleotides designed to share an overlap region of 20-25 bp. PCR amplification was then carried out with head and tail primers, using Kapa high-fidelity polymerase as the DNA polymerase, to obtain the information storage medium.
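A minimal sketch of how a template could be split into alternating forward and reverse oligonucleotides with a fixed overlap is given below. The function names (`tile_oligos`, `reverse_complement`) and the default values of 60 nt and 20 nt are illustrative choices within the stated 55-70 bp and 20-25 bp ranges; the text does not describe the actual design tool, and a real design would also balance melting temperatures and adjust the final oligonucleotide.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def tile_oligos(template: str, oligo_len: int = 60, overlap: int = 20):
    """Split template into overlapping oligos on alternating strands.

    Returns a list of (start, strand, sequence) with 0-based start positions.
    The last oligo may be shorter than oligo_len; this sketch does not pad it.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    template = template.lower()
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        fragment = template[start:start + oligo_len]
        strand = "forward" if index % 2 == 0 else "reverse"
        sequence = fragment if strand == "forward" else reverse_complement(fragment)
        oligos.append((start, strand, sequence))
        if start + oligo_len >= len(template):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    # Toy 140-nt template; a real run would pass SEQ ID NO. 13.
    toy = "".join("acgt"[(i * i + i // 3) % 4] for i in range(140))
    for start, strand, sequence in tile_oligos(toy):
        print(start, strand, len(sequence))
```

Because each oligonucleotide starts `oligo_len - overlap` bases after the previous one, neighbouring oligonucleotides share exactly `overlap` bases and can prime on each other during assembly PCR.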
Example 4 Integration of the information storage medium into the Saccharomyces cerevisiae genome
First, six sgRNAs targeting the Saccharomyces cerevisiae ADE1 gene (SEQ ID NO: 6-11) were designed and inserted by recombination into the CRISPR/Cas9 gene editing plasmid (pRCC-K); the plasmid map is shown in Figure 2;
the information storage medium shown as SEQ ID NO. 13 and the CRISPR/Cas9 gene editing plasmid were co-transformed into Saccharomyces cerevisiae and selected on G418; the CRISPR/Cas9 gene editing plasmid edits the ADE1 gene of Saccharomyces cerevisiae, so that the colony color changes from white to red, as shown in Figure 3;
red colonies were picked and tested by PCR to obtain positive clones carrying the plasmid;
the genome of the positive clones was extracted, the fragment around the ADE1 target site was amplified by PCR, and Sanger sequencing was used to confirm the Saccharomyces cerevisiae strain with the integrated information storage medium; as shown in Figure 4, the stored information was decoded from the sequencing result to recover the original information.
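Decoding from a sequencing read can be sketched as translating the read with the standard genetic code and grouping the amino acids in threes. The final lookup from each triplet to a Chinese character is omitted because the codebook is not published in the text; `translate` and `to_triples` are illustrative names, and the read used here is simply the first 18 nucleotides of SEQ ID NO. 12.

```python
# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    dna = dna.lower()
    if len(dna) % 3:
        raise ValueError("read length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def to_triples(protein: str) -> list:
    """Group amino acids in threes; each group stands for one character."""
    if len(protein) % 3:
        raise ValueError("protein length must be a multiple of 3")
    return [protein[i:i + 3] for i in range(0, len(protein), 3)]


# First 18 nt of SEQ ID NO. 12, i.e. the first two encoded characters.
READ = "cgcctggatagagtatac"
TRIPLES = to_triples(translate(READ))
print(TRIPLES)  # ['RLD', 'RVY']
```

A full decoder would trim the read to the stored-information region of the construct before translating, then look each triplet up in the codebook.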
Example 5 Stability analysis of the information storage medium
Saccharomyces cerevisiae whose genome carries the integrated information storage medium shown as SEQ ID NO. 13 was stored for one month at -80 ℃, 20 ℃, 0 ℃ or 4 ℃. Genomic DNA was then extracted, and the fragment around the ADE1 target site was amplified by PCR for Sanger sequencing.
The sequencing results show that after storage for one month at -80 ℃, 20 ℃, 0 ℃ or 4 ℃, the information stored in the information storage medium was neither lost nor changed, and the sequencing results were consistent with the original information, indicating that the DNA information storage medium constructed by the invention is very stable.
In summary, the invention stores information as codons corresponding to amino acids. The stored information can be read by sequencing, and it can also be inferred from the amino acid sequence of the corresponding protein, so that reading of the information is not affected even if the DNA is damaged. This addresses the prior-art problem that damage to the DNA sequence causes loss of the stored information, and is of significance in the technical field of information storage.
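One way to see why reading at the amino acid level can tolerate some DNA changes is codon degeneracy: several codons encode the same amino acid, so a synonymous substitution leaves the decoded triplet unchanged. The sketch below illustrates only that property under the standard genetic code; it is not the applicant's analysis, the helper names are illustrative, and non-synonymous changes would still alter the decoded amino acid.

```python
from collections import defaultdict

# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS[amino_acid].append(codon)


def same_amino_acid(codon_a: str, codon_b: str) -> bool:
    """True if both codons translate to the same amino acid."""
    return CODON_TABLE[codon_a.lower()] == CODON_TABLE[codon_b.lower()]


# 'cgc' and 'aga' both occur in SEQ ID NO. 12 and both encode arginine (R).
print(same_amino_acid("cgc", "aga"))   # True: synonymous
print(same_amino_acid("cgc", "cac"))   # False: arginine vs histidine
print(len(SYNONYMS["R"]), len(SYNONYMS["W"]))
```

Amino acids with many synonymous codons, such as arginine, tolerate more single-base changes than those with a single codon, such as tryptophan or methionine.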
The applicant states that the present invention is illustrated in detail by the above examples, but the present invention is not limited to the above detailed methods, i.e. it is not meant that the present invention must rely on the above detailed methods for its implementation. It should be understood by those skilled in the art that any modification of the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific modes, etc., are within the scope and disclosure of the present invention.
SEQUENCE LISTING
<110> Suzhou Hongxn Biotechnology Ltd
<120> information storage medium, information storage method and application
<130>20200522
<160>13
<170>PatentIn version 3.3
<210>1
<211>720
<212>DNA
<213> Artificial sequence
<400>1
atggttagta agggagaaga gttatttaca ggggtcgttc ctatattagt agaacttgat 60
ggcgacgtta atggacataa atttagtgtt tcaggtgaag gagaaggtga tgcaacgtac 120
ggtaaactga ctctaaagtt catttgcacc accggtaaat tgcctgtacc gtggccaaca 180
ctagttacta cgttaacata cggcgtacag tgtttttcga gatatccaga ccacatgaaa 240
caacacgact ttttcaaatc cgcaatgcca gaaggttacg tccaggaacg tactattttc 300
ttcaaagatg atggaaatta taaaaccagg gctgaagtga aatttgaagg cgacactcta 360
gtgaacagaa ttgagttgaa ggggattgat ttcaaggaag acgggaacat actcggtcat 420
aagctggagt acaactataa ttcccataac gtctatatta tggcggataa gcaaaagaat 480
ggtatcaagg ttaactttaa aatccggcac aatatcgaag atggctctgt acaattggcc 540
gatcattatc aacaaaatac acctattgga gatggtcccg tgttgttacc agacaatcat 600
tacttgtcaa cacaatctgc tttaagcaaa gatcccaatg agaaaagaga tcatatggtc 660
ttgttagagt ttgttactgc cgctggtata actctgggta tggatgaact ttataaataa 720
<210>2
<211>442
<212>DNA
<213> Artificial sequence
<400>2
cggattagaa gccgccgagc gggtgacagc cctccgaagg aagactctcc tccgtgcgtc 60
ctcgtcttca ccggtcgcgt tcctgaaacg cagatgtgcc tcgcgccgca ctgctccgaa 120
caataaagat tctacaatac tagcttttat ggttatgaag aggaaaaatt ggcagtaacc 180
tggccccaca aaccttcaaa tgaacgaatc aaattaacaa ccataggatg ataatgcgat 240
tagtttttta gccttatttc tggggtaatt aatcagcgaa gcgatgattt ttgatctatt 300
aacagatata taaatgcaaa aactgcataa ccactttaac taatactttc aacattttcg 360
gtttgtatta cttcttattc aaatgtaata aaagtatcaa caaaaaattg ttaatatacc 420
tctatacttt aacgtcaagg ag 442
<210>3
<211>248
<212>DNA
<213> Artificial sequence
<400>3
tcatgtaatt agttatgtca cgcttacatt cacgccctcc ccccacatcc gctctaaccg 60
aaaaggaagg agttagacaa cctgaagtct aggtccctat ttattttttt atagttatgt 120
tagtattaag aacgttattt atatttcaaa tttttctttt ttttctgtac agacgcgtgt 180
acgcatgtaa cattatactg aaaaccttgc ttgagaaggt tttgggacgc tcgaaggctt 240
taatttgc 248
<210>4
<211>200
<212>DNA
<213> Artificial sequence
<400>4
gaggatgtaa taatactaat ctcgaagatg ccatctaata catatagaca tacatatata 60
tatatataca ttctatatat tcttacccag attctttgag gtaagacggt tgggttttat 120
cttttgcagt tggtactatt aagaacaatc gaatcataag cattgcttac aaagaataca 180
catacgaaat attaacgata 200
<210>5
<211>200
<212>DNA
<213> Artificial sequence
<400>5
cgtgatttac atatactaca agtcgccagt gtaactcctc actgaatatg attcatacat 60
acccgtatgt attaatgtat aaatgttctc agagcaaatt ttatcgatat cttgtttgcc 120
agtggtatgc aggtttggca aattttttac cataatatcc gtttatagat tctggaacct 180
taccaacttt cttaccgcta 200
<210>6
<211>20
<212>DNA
<213> Artificial sequence
<400>6
tcctgcccag gccgctgagc 20
<210>7
<211>20
<212>DNA
<213> Artificial sequence
<400>7
attgtcagag gctacatcac 20
<210>8
<211>20
<212>DNA
<213> Artificial sequence
<400>8
actctgacag tttggtcaat 20
<210>9
<211>20
<212>DNA
<213> Artificial sequence
<400>9
actttacctc tggccaccaa 20
<210>10
<211>20
<212>DNA
<213> Artificial sequence
<400>10
ggacggtata ttgccattgg 20
<210>11
<211>20
<212>DNA
<213> Artificial sequence
<400>11
tatgtctcta actttacctc 20
<210>12
<211>1530
<212>DNA
<213> Artificial sequence
<400>12
cgcctggata gagtatacag attgaagagg gtcgttaggg cttaccgcgt agcacgtgca 60
ttgcgagtgt ccagggccat taggctacct agattgtttc gaattgatag attcccgcga 120
gtggataggg gttgcagagc aaggagactg atcagagcta agagattaaa gcgcgtagtt 180
cgtgcactta gagtatctag ggttactaga ggttctagat tagagagggg cgaaagagca 240
caaagaggta aaagaggaat gaggttatcg agaattgcgc gtggtgacag gatccccagg 300
gtcagtagag cagaaagatt cccaagagtt gacagattat ggagaattat cagattgcaa 360
agatttccca gagtggatcg tgcccacaga gtctgccgtc tatcaagagt ctcccgcgct 420
gagcggggtt ggagggtctg gcgtggtttc agagctgata gggggcctag gggtaaaagg 480
ggtatgcgac tttcgagagt ccaacgcgct tgtagggcct acagagttgc cagaggctat 540
aggggacttc gtattaccag actcagtagg atcgcaagag gcgatcgaat tccaagggtc 600
cagagagcgt gcagactatt gcgtgtgcaa agaggatggc gtgtatggag ggctgtgcga 660
gctgttagga tttcacgtgg gcaacgcttc tgtagaggac atcgaatacg cagaggggtc 720
cgcgtcatgc gagtggaacg attagttcgt atcagccgtg gacagagatt ttgcagaggt 780
cacagaatcc ggcgtttggg gagagttatg agggtcgaaa gggtgattag aatttctaga 840
ggacaacgct tttgtagagg ccatagaata aggagaggta cgagagttat gagggttgag 900
agattatgga gattgtatcg ggttaatcgg gttaggcgga ttaaacgaat ttgtcggatc 960
gaaagaggta ttcgtgtaaa aaggggtaat agattacgaa ggattcaacg tatatttaga 1020
ctccatagag taccacgtat aggcagattg acaaggatag ttagggctgc tagaggcggt 1080
agattaaaca gagcctggag attgatgaga attttacgtg ctcacagagt ttgtcgaata 1140
gttcgactga acagaatcta tagagttcac agattcagaa ggatacagag aatcgtgaga 1200
atttggcgct ttactcgttt gcatagaggt cgtagagcag ctagaggagg tcggctattg 1260
aggctttccc gcgtacaaag agcttgtcgt atcgtgaggt tatgtagatt tcatagaata 1320
catcgtgctg gtcgtgcctt cagagttcta cgggccaata gagcaatgag aatagctagg 1380
ggcgacagaa ttcctcgtgt ttttagagtt ggaagaatta tgagagcaag cagattatca 1440
agagtatcta gagccgaaag ggcccataga gtatgtagac tggcgagagc gacaagggct 1500
ccaagaggtg caaggataaa taggctttgg 1530
<210>13
<211>3340
<212>DNA
<213> Artificial sequence
<400>13
gaggatgtaa taatactaat ctcgaagatg ccatctaata catatagaca tacatatata 60
tatatataca ttctatatat tcttacccag attctttgag gtaagacggt tgggttttat 120
cttttgcagt tggtactatt aagaacaatc gaatcataag cattgcttac aaagaataca 180
catacgaaat attaacgata cggattagaa gccgccgagc gggtgacagc cctccgaagg 240
aagactctcc tccgtgcgtc ctcgtcttca ccggtcgcgt tcctgaaacg cagatgtgcc 300
tcgcgccgca ctgctccgaa caataaagat tctacaatac tagcttttat ggttatgaag 360
aggaaaaatt ggcagtaacc tggccccaca aaccttcaaa tgaacgaatc aaattaacaa 420
ccataggatg ataatgcgat tagtttttta gccttatttc tggggtaatt aatcagcgaa 480
gcgatgattt ttgatctatt aacagatata taaatgcaaa aactgcataa ccactttaac 540
taatactttc aacattttcg gtttgtatta cttcttattc aaatgtaata aaagtatcaa 600
caaaaaattg ttaatatacc tctatacttt aacgtcaagg agatgcgcct ggatagagta 660
tacagattga agagggtcgt tagggcttac cgcgtagcac gtgcattgcg agtgtccagg 720
gccattaggc tacctagatt gtttcgaatt gatagattcc cgcgagtgga taggggttgc 780
agagcaagga gactgatcag agctaagaga ttaaagcgcg tagttcgtgc acttagagta 840
tctagggtta ctagaggttc tagattagag aggggcgaaa gagcacaaag aggtaaaaga 900
ggaatgaggt tatcgagaat tgcgcgtggt gacaggatcc ccagggtcag tagagcagaa 960
agattcccaa gagttgacag attatggaga attatcagat tgcaaagatt tcccagagtg 1020
gatcgtgccc acagagtctg ccgtctatca agagtctccc gcgctgagcg gggttggagg 1080
gtctggcgtg gtttcagagc tgataggggg cctaggggta aaaggggtat gcgactttcg 1140
agagtccaac gcgcttgtag ggcctacaga gttgccagag gctatagggg acttcgtatt 1200
accagactca gtaggatcgc aagaggcgat cgaattccaa gggtccagag agcgtgcaga 1260
ctattgcgtg tgcaaagagg atggcgtgta tggagggctg tgcgagctgt taggatttca 1320
cgtgggcaac gcttctgtag aggacatcga atacgcagag gggtccgcgt catgcgagtg 1380
gaacgattag ttcgtatcag ccgtggacag agattttgca gaggtcacag aatccggcgt 1440
ttggggagag ttatgagggt cgaaagggtg attagaattt ctagaggaca acgcttttgt 1500
agaggccata gaataaggag aggtacgaga gttatgaggg ttgagagatt atggagattg 1560
tatcgggtta atcgggttag gcggattaaa cgaatttgtc ggatcgaaag aggtattcgt 1620
gtaaaaaggg gtaatagatt acgaaggatt caacgtatat ttagactcca tagagtacca 1680
cgtataggca gattgacaag gatagttagg gctgctagag gcggtagatt aaacagagcc 1740
tggagattga tgagaatttt acgtgctcac agagtttgtc gaatagttcg actgaacaga 1800
atctatagag ttcacagatt cagaaggata cagagaatcg tgagaatttg gcgctttact 1860
cgtttgcata gaggtcgtag agcagctaga ggaggtcggc tattgaggct ttcccgcgta 1920
caaagagctt gtcgtatcgt gaggttatgt agatttcata gaatacatcg tgctggtcgt 1980
gccttcagag ttctacgggc caatagagca atgagaatag ctaggggcga cagaattcct 2040
cgtgttttta gagttggaag aattatgaga gcaagcagat tatcaagagt atctagagcc 2100
gaaagggccc atagagtatg tagactggcg agagcgacaa gggctccaag aggtgcaagg 2160
ataaataggc tttgggttag taagggagaa gagttattta caggggtcgt tcctatatta 2220
gtagaacttg atggcgacgt taatggacat aaatttagtg tttcaggtga aggagaaggt 2280
gatgcaacgt acggtaaact gactctaaag ttcatttgca ccaccggtaa attgcctgta 2340
ccgtggccaa cactagttac tacgttaaca tacggcgtac agtgtttttc gagatatcca 2400
gaccacatga aacaacacga ctttttcaaa tccgcaatgc cagaaggtta cgtccaggaa 2460
cgtactattt tcttcaaaga tgatggaaat tataaaacca gggctgaagt gaaatttgaa 2520
ggcgacactc tagtgaacag aattgagttg aaggggattg atttcaagga agacgggaac 2580
atactcggtc ataagctgga gtacaactat aattcccata acgtctatat tatggcggat 2640
aagcaaaaga atggtatcaa ggttaacttt aaaatccggc acaatatcga agatggctct 2700
gtacaattgg ccgatcatta tcaacaaaat acacctattg gagatggtcc cgtgttgtta 2760
ccagacaatc attacttgtc aacacaatct gctttaagca aagatcccaa tgagaaaaga 2820
gatcatatgg tcttgttaga gtttgttact gccgctggta taactctggg tatggatgaa 2880
ctttataaat aatcatgtaa ttagttatgt cacgcttaca ttcacgccct ccccccacat 2940
ccgctctaac cgaaaaggaa ggagttagac aacctgaagt ctaggtccct atttattttt 3000
ttatagttat gttagtatta agaacgttat ttatatttca aatttttctt ttttttctgt 3060
acagacgcgt gtacgcatgt aacattatac tgaaaacctt gcttgagaag gttttgggac 3120
gctcgaaggc tttaatttgc cgtgatttac atatactaca agtcgccagt gtaactcctc 3180
actgaatatg attcatacat acccgtatgt attaatgtat aaatgttctc agagcaaatt 3240
ttatcgatat cttgtttgcc agtggtatgc aggtttggca aattttttac cataatatcc 3300
gtttatagat tctggaacct taccaacttt cttaccgcta 3340
Claims (10)
1. An information storage medium, wherein the information storage medium is a nucleic acid molecule;
the information storage medium comprises a fusion gene consisting of a codon sequence corresponding to the stored information and a fluorescent protein gene.
2. The information storage medium of claim 1, wherein each character of the stored information is represented by a number of consecutive amino acids, each character of the stored information corresponding to a codon sequence consisting of a number of nucleotide residues;
preferably, each character of the stored information is represented by three consecutive amino acids, and each character of the stored information corresponds to a codon sequence consisting of 9 nucleotide residues;
preferably, the codon sequence corresponding to the stored information is positioned at the 5 'end of the fusion gene, and the fluorescent protein gene is positioned at the 3' end of the fusion gene;
preferably, the fluorescent protein gene includes a green fluorescent protein gene or a red fluorescent protein gene.
3. The information storage medium of claim 1 or 2, further comprising a promoter and a terminator;
preferably, the promoter is the GAL1 promoter;
preferably, the terminator is the CYC1 terminator;
preferably, the information storage medium further comprises a 5' recombination arm and a 3' recombination arm;
preferably, the length of the 5' recombination arm and the 3' recombination arm is 50-200 bp.
4. The information storage medium of any one of claims 1 to 3, wherein the information storage medium comprises, in order from 5' to 3', a 5' recombination arm, a GAL1 promoter, a codon sequence corresponding to the stored information, a green fluorescent protein gene, a CYC1 terminator, and a 3' recombination arm;
preferably, the green fluorescent protein gene comprises the nucleic acid sequence shown in SEQ ID NO. 1;
preferably, the GAL1 promoter comprises the nucleic acid sequence shown in SEQ ID NO. 2;
preferably, the CYC1 terminator comprises the nucleic acid sequence set forth in SEQ ID NO. 3;
preferably, the 5' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 4;
preferably, the 3' recombination arm comprises the nucleic acid sequence shown in SEQ ID NO. 5.
5. An information storage method, the method comprising:
integrating the information storage medium according to any one of claims 1 to 4 into the Saccharomyces cerevisiae genome for information storage.
6. The method of claim 5, wherein the method comprises:
transforming the information storage medium of any one of claims 1-4 into competent saccharomyces cerevisiae simultaneously with a CRISPR/Cas9 gene editing plasmid targeting a saccharomyces cerevisiae gene;
performing a preliminary screen for positive colonies, and carrying out PCR detection to obtain positive clones carrying the plasmid;
and extracting the genome of the positive clones, and performing sequencing verification to obtain the Saccharomyces cerevisiae integrated with the information storage medium.
7. The method of claim 5 or 6, wherein the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the Saccharomyces cerevisiae ADE1 or ADE2 gene;
preferably, the CRISPR/Cas9 gene editing plasmid contains an sgRNA targeting the 5' sequence of the Saccharomyces cerevisiae ADE1 or ADE2 gene;
preferably, the sgRNA includes a nucleic acid sequence shown in SEQ ID NO. 6-11.
8. The method according to any one of claims 5 to 7, wherein the information storage medium is prepared by:
synthesizing a plurality of alternating forward and reverse oligonucleotides as PCR templates, wherein adjacent forward and reverse oligonucleotides have an overlapping region, and designing a head primer and a tail primer to carry out PCR amplification to obtain the information storage medium;
preferably, the length of the forward and reverse oligonucleotides is 55-70 bp;
preferably, the length of the overlapping region is 20-25 bp;
preferably, the length of the head primer and the tail primer is 55-70 bp.
9. The method of any one of claims 5-8, wherein the positive colonies are red;
preferably, the sequencing comprises Sanger sequencing.
10. An information storage kit, characterized in that the kit comprises the information storage medium according to any one of claims 1 to 4;
preferably, the kit further comprises a CRISPR/Cas9 gene editing plasmid targeting the saccharomyces cerevisiae gene;
preferably, the kit further comprises saccharomyces cerevisiae.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010443536.6A CN111440827A (en) | 2020-05-22 | 2020-05-22 | Information storage medium, information storage method and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111440827A true CN111440827A (en) | 2020-07-24 |
Family
ID=71657058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010443536.6A Pending CN111440827A (en) | 2020-05-22 | 2020-05-22 | Information storage medium, information storage method and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111440827A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103975063A (en) * | 2011-11-23 | 2014-08-06 | 帝斯曼知识产权资产管理有限公司 | Nucleic acid assembly system |
CN104603273A (en) * | 2012-03-12 | 2015-05-06 | 帝斯曼知识产权资产管理有限公司 | Recombination system |
CN106086061A (en) * | 2016-07-27 | 2016-11-09 | 苏州泓迅生物科技有限公司 | A kind of genes of brewing yeast group editor's carrier based on CRISPR Cas9 system and application thereof |
CN106191099A (en) * | 2016-07-27 | 2016-12-07 | 苏州泓迅生物科技有限公司 | A kind of parallel multiple editor's carrier of genes of brewing yeast group based on CRISPR Cas9 system and application thereof |
CN107723287A (en) * | 2016-08-12 | 2018-02-23 | 中国科学院天津工业生物技术研究所 | A kind of expression system for strengthening silk-fibroin production and preparing |
CN108517331A (en) * | 2018-03-19 | 2018-09-11 | 安徽希普生物科技有限公司 | A kind of engineering bacteria construction method of amalgamation and expression antibacterial peptide and red fluorescent protein |
CN109072243A (en) * | 2016-02-18 | 2018-12-21 | 哈佛学院董事及会员团体 | Pass through the method and system for the molecule record that CRISPR-CAS system carries out |
CN109460822A (en) * | 2018-11-19 | 2019-03-12 | 天津大学 | Information storage means based on DNA |
US20200063119A1 (en) * | 2018-08-22 | 2020-02-27 | Massachusetts Institute Of Technology | In vitro dna writing for information storage |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112489721A (en) * | 2020-11-25 | 2021-03-12 | 清华大学 | Mirror image protein information storage and coding technology |
CN112489721B (en) * | 2020-11-25 | 2021-11-12 | 清华大学 | Mirror image protein information storage and coding technology |
CN113462710A (en) * | 2021-06-30 | 2021-10-01 | 清华大学 | Random rewriting DNA information storage method |
CN113462710B (en) * | 2021-06-30 | 2023-07-11 | 清华大学 | DNA information storage method capable of randomly rewriting |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200724 |