CN109460822B - DNA-based information storage method - Google Patents


Info

Publication number
CN109460822B
CN109460822B CN201811377712.XA CN201811377712A CN109460822B CN 109460822 B CN109460822 B CN 109460822B CN 201811377712 A CN201811377712 A CN 201811377712A CN 109460822 B CN109460822 B CN 109460822B
Authority
CN
China
Prior art keywords
dna
sequence
information
code
fragments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811377712.XA
Other languages
Chinese (zh)
Other versions
CN109460822A (en)
Inventor
元英进
韩明哲
陈为刚
章新晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201811377712.XA priority Critical patent/CN109460822B/en
Publication of CN109460822A publication Critical patent/CN109460822A/en
Application granted granted Critical
Publication of CN109460822B publication Critical patent/CN109460822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/123 DNA computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1048 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using arrangements adapted for a specific error detection or correction feature

Abstract

The invention relates to the technical field of information storage, and in particular to a DNA-based information storage method. The invention provides a technology for in vivo storage of information in long DNA sequences. Its main objective is to build, on the basis of a combination of LDPC and BCH codes, a coding system with a strong error-correction mechanism for medium-to-long DNA sequences (more than 1 kbp), and to reduce the redundancy of primers, indexes and the like through long-sequence coding, thereby achieving a high actual carrying rate (more than 97%). The long sequences are assembled and stored using the Saccharomyces cerevisiae in vivo assembly system, and low-cost, high-fidelity and high-speed replication of the information is achieved by means of model organisms such as Saccharomyces cerevisiae, Escherichia coli or Bacillus. Moreover, because of the strong error-correction system, the data in the cells can be fully recovered at low coverage (1-5X) with second-generation sequencing.

Description

DNA-based information storage method
Technical Field
The invention relates to the technical field of information storage, in particular to a DNA-based information storage method.
Background
Human society has entered the era of information explosion, which places higher demands on storage media and storage systems. A joint survey by International Data Corporation (IDC) and EMC shows that the global data volume is growing rapidly at an annual rate of 50%, and that it reached 16 ZB in 2017, 100 times the volume of ten years earlier. According to this survey, the global data volume will reach 44 ZB by 2020, corresponding to the combined storage capacity of 45000 existing Amazon data storage centers.
Binary data, commonly denoted as 0 and 1, are stored, communicated and manipulated by modern digital computers. The storage and transmission of these bits are tied to the physical structure of the medium and to signals such as the electronic state of transistors or the magnetic orientation of magnetic materials. Digital information is also stored in nature in the form of natural molecular polymers, for example the genetic code in a cell. DNA in cells is built from deoxynucleotides; different nucleotides are distinguished by their bases, and each base position can take one of four values (A, C, G or T, the abbreviations of the bases' chemical names), so that each base is essentially equivalent to two bits of information in a modern memory system.
One of the most attractive features of digital storage in DNA is its extremely high physical storage density. In DNA data storage, at most about 14 atoms are needed per bit of data, and the distance between two adjacent bases (together 4 bits of information) is only 0.34 nm; no other technology can store information at such a high density. Furthermore, as a molecular medium, DNA stores data in three dimensions rather than on a two-dimensional surface as a magnetic disk does, which means that DNA occupies less physical space.
The idea of using DNA to store digital data was first proposed by Baum in the mid-1990s. The earliest proof-of-concept experiment showed that information could be stored in DNA (Clelland et al., 1999): published in Nature, this first DNA-based steganography hid the useful message among a large excess of irrelevant DNA and located the hidden message with a specific key. As DNA storage research gradually shifted toward the design of coding schemes, many scientists with a computing background turned to this direction and proposed a number of novel coding schemes. Chen et al. described a DNA-based storage model with learning and searching capabilities that treats storage as a learning process and achieved a certain degree of DNA information storage (Chen et al., 2003). Other coding innovations of the same period used amino acid triplet codons and the abbreviated names of the individual amino acids to write text as codons, translating a poem into a DNA sequence (Bogard et al., 2008), and Ailenberg et al. wrote text, music and a lamb image into DNA using an improved Huffman ternary coding method (Ailenberg et al., 2009). These new coding schemes explored how to encode data in DNA; in particular, the use of Huffman codes avoids sequences that are difficult to synthesize and sequence, so that the coding serves the technology and rests on a firmer mathematical basis. However, owing to the limitations of DNA synthesis and sequencing technologies, relatively large amounts of data could not be stored in DNA until several years later (Church et al., 2012; Goldman et al., 2013). These studies again drew growing attention to DNA information storage from scholars in different fields.
Yaniv Erlich and Dina Zielinski were the first to apply the fountain code concept used in communications to DNA storage, moving the coding approach beyond the traditional strategy of simple conversion plus added error-correcting codes toward practical communication coding, and thereby achieved lossless reading of files (Erlich et al., 2017). The use of fountain codes combines DNA storage with a communication code whose characteristics suit it well, and shows that DNA storage can be combined with practical communication coding. More recently, researchers at the University of Washington, Microsoft and the US company Twist Bioscience encoded 200 MB of information in DNA and accurately retrieved the data, which is by far the largest DNA-based storage project (Organick et al., 2018).
The existing DNA information storage technology, limited by DNA synthesis technology, focuses mainly on short sequences (below 200 bp) and has the following disadvantages: the coding (error-correcting code) is simple (mainly Huffman coding plus RS error-correcting codes); the actual carrying rate of the DNA sequence is low (primers, addresses, redundancy and other non-payload sequences usually keep it below 60%); the replication cost is high (large-scale replication by PCR amplification requires large amounts of DNA polymerase); and the replication fidelity is poor (the base mutation rate during PCR is high, and amplification may be unbalanced).
Disclosure of Invention
In view of the above, the technical problem to be solved by the present invention is to provide a DNA-based information storage method that enables storage of long fragments.
The DNA-based information storage method provided by the invention comprises the following steps:
converting the information into a binary sequence;
converting the binary sequence into a DNA sequence;
after synthesizing a DNA fragment according to the DNA sequence, the DNA fragment is transformed into a microorganism and stored.
In the existing DNA information storage technology, information is stored as DNA dry powder or in solution. Because this approach is limited by DNA synthesis technology and large DNA molecules are difficult to synthesize, it is difficult to store a large amount of information, and the actual carrying rate of the DNA sequence is low. The invention uses the Saccharomyces cerevisiae in vivo assembly system to assemble and store long sequences and thereby preserve the information. The stored information may be text, pictures and/or videos. The information is then replicated at low cost, with high fidelity and at high speed by means of model organisms such as Saccharomyces cerevisiae, Escherichia coli or Bacillus.
Data exist in binary form in a computer. In the invention, converting the information into a binary sequence comprises: converting the information into a binary bit sequence and then performing error-correction coding in segments.
The segmentation is as follows: every (2^12 - 1) bp is taken as one packet and error-correction coded. In the invention, the error-correction coding scheme is a low-density parity-check (LDPC) code with a superimposed watermark code. In the embodiment of the invention, an LDPC code serves as the conventional error-correcting code, and a watermark code is then superimposed on it to correct gaps arising during sequencing and assembly of long sequences as well as insertions and deletions of bases or base fragments, so that the various errors arising during genome (or plasmid) replication, sequencing and assembly are handled. The watermark code has a code rate of 4/5; the LDPC code has a code length of 64800 bits, an information length of 32400 bits and a code rate of 1/2; the overall efficiency is 0.8 bits per base.
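The overall efficiency stated above follows from multiplying the two code rates by the two bits that each base carries. A minimal check in Python (the variable names are illustrative and do not come from the patent):

```python
from fractions import Fraction

# Rates stated in the description.
WATERMARK_RATE = Fraction(4, 5)       # code rate of the watermark code
LDPC_RATE = Fraction(32400, 64800)    # information bits / code length = 1/2
BITS_PER_BASE = 2                     # two bits are mapped to one base

# Information bits carried per base after both coding layers.
overall = WATERMARK_RATE * LDPC_RATE * BITS_PER_BASE
print(overall, float(overall))  # 4/5 0.8
```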
In the invention, the method for converting the binary sequence into the DNA sequence adopts a method of converting two bits into one base, and the preset conversion relation is as follows: 00 → A, 01 → T, 10 → G, 11 → C.
A is adenine, T is thymine, G is guanine, and C is cytosine.
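The two-bits-to-one-base conversion can be sketched as follows. Only the mapping table comes from the text; the function name and the input checks are assumptions made for illustration.

```python
# Mapping stated in the description: 00 -> A, 01 -> T, 10 -> G, 11 -> C.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}


def bits_to_dna(bits: str) -> str:
    """Convert a string of '0'/'1' characters to bases, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bits_to_dna("00011011"))  # ATGC
```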
In the present invention, the DNA sequence is 10kbp to 100kbp in length.
In the invention, the length of the fragments assembled in vivo is 1-4 kbp, and the length of homologous sequences among the fragments is 30-150 bp.
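Splitting a long sequence into fragments that share homologous ends can be sketched as a sliding window. This is a simplified illustration: the fixed window, the default values (chosen inside the stated 1-4 kbp and 30-150 bp ranges) and the function name are assumptions, and the sketch does not choose split points by sequence content.

```python
def split_with_overlap(seq: str, fragment_len: int = 3000, overlap: int = 100) -> list[str]:
    """Split seq into fragments of at most fragment_len bases, where each
    fragment begins with the last `overlap` bases of the previous fragment."""
    if not 0 < overlap < fragment_len:
        raise ValueError("overlap must be positive and smaller than fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            break  # this fragment reaches the end of the sequence
        start += step
    return fragments


# Toy example: 40 bases, 16-base fragments, 4-base homologous overlaps.
toy = "ACGT" * 10
frags = split_with_overlap(toy, fragment_len=16, overlap=4)
print(len(frags), [len(f) for f in frags])
```

Dropping the first `overlap` bases of every fragment after the first and concatenating the results restores the original sequence, which mirrors what homologous recombination achieves during in vivo assembly.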
The invention uses in vivo assembly in yeast, which makes long fragments easier to obtain; no redundant non-payload sequences such as linkers, primers or addresses are needed between the fragments, so the actual carrying rate is improved.
In the present invention, the DNA fragment is prepared by PCR amplification followed by in vivo assembly in yeast.
In some embodiments, the method of preparing the DNA fragment comprises:
first, splitting the DNA sequence into sequences 1 kbp to 3 kbp in length that carry homologous sequences at their upstream and downstream ends;
second, synthesizing DNA fragments according to the sequences obtained in the first step;
third, using PCR to join the 5' end of the fragment at the 5' terminus to the left homology arm of the microorganism insertion site, and to join the 3' end of the fragment at the 3' terminus to the right homology arm of the microorganism insertion site;
and fourth, incubating LiAc, PEG3350, the fragments carrying the left and right homology arms and the remaining DNA fragments together with Saccharomyces cerevisiae.
In the invention, the microorganism is a prokaryote or a eukaryote; the prokaryote is preferably Escherichia coli or Bacillus; the eukaryote is yeast.
In the present invention, the microorganism can be selected according to the storage and distribution requirements. For example, Escherichia coli is used when rapid replication is required; Bacillus is used for long-term storage at room temperature (for example more than 20 years); and yeast can hold longer fragments and supports direct in vivo assembly. To store DNA information in other microorganisms, the synthetic information-bearing DNA fragments can be extracted after artificial synthesis and in vivo assembly in yeast and then transformed into the other microorganism. For example, the DNA fragment assembled in vivo is extracted and then transformed into Escherichia coli and/or Bacillus.
In some embodiments, the microorganism is a yeast, preferably Saccharomyces cerevisiae, and the DNA fragment is integrated at the ADE2 gene locus.
The ADE2 gene encodes phosphoribosylaminoimidazole carboxylase, which catalyzes the sixth reaction of purine nucleotide synthesis. In Saccharomyces cerevisiae, Pichia pastoris or other yeasts, mutation or deletion of this gene causes purine precursors to accumulate in the vacuoles of the yeast cells and turns the cells pink, so successful insertion of the DNA fragment can be judged from the colony color.
The method of the invention also comprises a step of propagating the microorganisms.
The invention stores information in vivo in the form of DNA, and the microorganisms can be propagated to replicate and spread the information rapidly and in large quantities. Because microorganisms reproduce rapidly, the method provided by the invention copies information far more efficiently than printing or optical disc duplication, with simpler operation and lower cost.
The invention also comprises a step of reading the information, which specifically comprises sequencing the microorganisms, converting the detected DNA sequence into a binary sequence, and decoding it to obtain the binary data and thus the stored information. The stored DNA information can be read with existing sequencing techniques, such as second-generation or third-generation sequencing.
The DNA sequence is converted into the binary sequence according to: A → 00, T → 01, G → 10, C → 11.
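The reverse conversion can be sketched in the same way. Only the mapping comes from the text. Rejecting any character other than A, T, G or C is an assumption of this sketch; a real read-out pipeline would more likely pass uncertain base calls on to the error-correction decoder.

```python
# Mapping stated in the description: A -> 00, T -> 01, G -> 10, C -> 11.
BASE_TO_BITS = {"A": "00", "T": "01", "G": "10", "C": "11"}


def dna_to_bits(dna: str) -> str:
    """Convert a DNA sequence to a string of '0'/'1' characters, two bits per base."""
    try:
        return "".join(BASE_TO_BITS[base] for base in dna.upper())
    except KeyError as exc:
        # Assumption: unknown symbols (e.g. 'N') are treated as an error here.
        raise ValueError(f"unexpected base {exc.args[0]!r}") from None


print(dna_to_bits("ATGC"))  # 00011011
```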
The sequence is read with second-generation or third-generation sequencing. After sequencing, the reads are first assembled and converted into binary bit data; the bit data are then decoded to recover the original digital information such as text, pictures and videos.
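The last recovery step, from decoded bits back to the original file content, can be illustrated with a round trip between bytes and a bit string. The function names are hypothetical, and UTF-8 is an assumption of this sketch, since the patent does not state how the text file is encoded.

```python
def bytes_to_bits(data: bytes) -> str:
    """Write each byte as eight '0'/'1' characters, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Round trip for a short text (UTF-8 is assumed, not stated in the patent).
original_text = "天津大学 TJU"
bits = bytes_to_bits(original_text.encode("utf-8"))
recovered_text = bits_to_bytes(bits).decode("utf-8")
print(recovered_text == original_text)  # True
```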
The invention provides a technology for in vivo storage of information in long DNA sequences. Its main objective is to build, on the basis of a combination of LDPC and BCH codes, a coding system with a strong error-correction mechanism for medium-to-long DNA sequences (more than 1 kbp), and to reduce the redundancy of primers, indexes and the like through long-sequence coding, thereby achieving a high actual carrying rate (more than 97%). The long sequences are assembled and stored using the Saccharomyces cerevisiae in vivo assembly system, and low-cost, high-fidelity and high-speed replication of the information is achieved by means of model organisms such as Saccharomyces cerevisiae, Escherichia coli or Bacillus. Moreover, because of the strong error-correction system, the data in the cells can be fully recovered at low coverage (1-5X) with second-generation sequencing.
Drawings
FIG. 1 shows the main process of DNA information storage according to the present invention;
FIG. 2 shows yeast colonies before and after storing the information.
Detailed Description
The invention provides a DNA-based information storage method, and a person skilled in the art can appropriately improve the process parameters by referring to the content. It is expressly intended that all such similar substitutes and modifications which would be obvious to one skilled in the art are deemed to be included in the invention. While the methods and applications of this invention have been described in terms of preferred embodiments, it will be apparent to those of ordinary skill in the art that variations and modifications in the methods and applications described herein, as well as other suitable variations and combinations, may be made to implement and use the techniques of this invention without departing from the spirit and scope of the invention.
The invention is further illustrated by the following examples:
Examples
First, encoding of information
In this example, the txt file "Excerpt from the Charter of Tianjin University" is selected for encoding; the excerpt reads as follows:
Charter of Tianjin University
Preamble
The predecessor of Tianjin University is Peiyang University, which was founded on October 2, 1895 and is the earliest institution of higher learning in the history of modern higher education in China. In September 1951, following the national restructuring of institutions of higher education and with the approval of the Government Administration Council of the Central People's Government, it was named Tianjin University. In 1959 it was designated by the central government as a national key university. In 2000 it was designated as a high-level research university given priority under the national "985 Project".
Self-strengthening lies first in building a reserve of talent, and building a reserve of talent must begin with founding schools. The school takes strengthening the nation through education as its mission: pursuing knowledge and advancing engineering, educating talent for practical use, carrying forward civilization, revitalizing China and shaping the future. Aiming to become a comprehensive, research-oriented, open and international world-class university, it strives to make important contributions to national economic and social development and to the progress of world civilization.
Chapter I General Provisions
Article 1 This charter is formulated in accordance with the Education Law of the People's Republic of China and other laws and regulations and the rules of the Ministry of Education, and in light of the school's actual circumstances, in order to promote law-based governance of the school, improve the modern university system, safeguard the basic rights and interests of teachers and students, and promote the scientific development of the school.
Article 2 The Chinese name of the school is Tianjin University, abbreviated Tianda; the English name is Tianjin University, abbreviated TJU.
Article 3 The school is run by the state, is administered by the education administrative department of the State Council, and is jointly developed by the education administrative department of the State Council and the Tianjin Municipal People's Government.
The school has the status of a public institution legal person, and the president is its legal representative.
Article 4 The legal domicile of the school is No. 92 Weijin Road, Nankai District, Tianjin; the school has a Weijin Road campus, a Peiyang Park campus and a Binhai Industrial Research Institute campus.
The school website is http://www.tju.edu.cn.
The school may change its domicile and adjust its campuses according to development needs and with the approval of the competent department.
Article 5 The school motto is "Seeking Truth from Facts". The school promotes an ethos of rigorous scholarship and carries on the tradition of patriotic dedication.
Article 6 Guided by its educational philosophy of "shape up and shape down and material achievement", the school is committed to cultivating outstanding talent with devotion to family and country, global vision, innovative spirit and practical ability.
Article 7 The basic form of education at the school is full-time undergraduate and graduate education, supplemented by non-degree education and training, and the school provides lifelong education services.
The school issues academic certificates according to law and confers bachelor's, master's and doctoral degrees according to law.
Article 8 The school orients itself to the frontiers of world scholarship and to national strategic needs, follows the laws of talent cultivation, continuously adjusts and optimizes its disciplinary structure, consolidates its strengths in engineering, and vigorously develops the sciences and the humanities and social sciences, so as to form a comprehensive disciplinary layout with outstanding strengths, distinct features, cross-disciplinary integration and coordinated development.
Article 9 The school implements the president responsibility system under the leadership of the Tianjin University Committee of the Communist Party of China (hereinafter the school Party Committee), promotes governance of scholarship by professors and democratic management, and establishes and improves a mechanism that combines participation by teachers and students, expert consultation and collective decision-making.
Article 10 The sponsor provides the school with operating funds, guarantees the basic conditions for running the school, gives macro-level guidance to the school's activities according to law, and appoints and removes the principal leaders of the school according to the relevant regulations.
Article 11 In accordance with the law and this charter, the school enjoys autonomy in talent cultivation, scientific research, faculty development, internal management, campus planning and construction and the like, bears legal liability independently, and is free from unlawful interference by any organization or individual.
Article 12 For any separation, merger, termination or renaming, the school shall solicit the opinions of teachers and students and report to the sponsor for approval.
Chapter II Functions of the School
Article 13 The school adheres to a people-oriented approach and, centered on fostering virtue and cultivating people, carries out education and teaching, scientific research, social service, and cultural inheritance and innovation.
Article 14 The school insists on putting the education of people first and quality first, implements quality-oriented education in accordance with the education policy of all-round moral and intellectual development, and cultivates students comprehensively.
Article 15 The school serves major national strategic needs, focuses on the frontiers of world science and technology, champions science, encourages innovation, supports talent cultivation, and promotes academic progress, technological development and the translation of research results.
Article 16 The school improves its social service system, provides talent and intellectual support, promotes national and regional development, and advances social progress.
Article 17 The school emphasizes education through culture and continuously enhances the cultural literacy, aesthetic taste, character and values of teachers and students. The school promotes the humanistic spirit of daily renewal, adheres to cultural inheritance and innovation, leads social trends and serves the building of a culturally strong nation.
Chapter III Students
Article 18 Students are persons receiving education who have been admitted by Tianjin University according to law and hold Tianjin University student status.
Article 19 Basic rights of students:
(1) freedom of study, with the right to choose majors and courses independently according to the training program, the relevant regulations and the procedures of the school;
(2) fair access to school education, use of the school's public educational resources, and fair access to awards and honorary titles;
(3) upon reaching the required academic level, award of the corresponding academic and degree certificates;
(4) organizing and joining student self-governing organizations and student societies according to laws, regulations and school rules;
(5) participating in school management, and being informed about the reform, construction and development of the school and about major matters concerning their personal interests;
(6) lodging appeals against sanctions or handling decisions issued by the school;
(7) other rights provided by laws and regulations.
Article 20 Basic obligations of students:
(1) in line with the school's talent-cultivation goals, cultivating virtue and putting it into practice, studying diligently and developing in an all-round way;
(2) being honest and friendly, respecting teachers and being united with fellow students;
(3) upholding the reputation of the school, taking care of educational equipment and living facilities, and safeguarding the interests of the school;
(4) abiding by academic norms and adhering to academic ethics;
(5) paying tuition and related fees as required;
(6) complying with school regulations;
(7) other obligations provided by laws and regulations.
Article 21 The school commends and rewards student groups or individuals who achieve outstanding results and win honor for the school, and imposes appropriate disciplinary sanctions on students who violate discipline.
Article 22 The school establishes a mechanism to protect student rights, sets up a student appeals committee, and safeguards the lawful rights and interests of students. The school encourages and supports students in participating in the democratic management of the school and in offering opinions and suggestions on school work.
The file is converted into a binary bit sequence by computer; every (2^12 - 1) bp is taken as one packet and error-correction coded to obtain a binary sequence. The binary sequence is then converted into a sequence of A, T, G and C, with two bits converted into one base according to the preset correspondence 00 → A, 01 → T, 10 → G, 11 → C. The resulting DNA coding sequence, 40500 bp in length, is shown as SEQ ID NO. 1.
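The 40500 bp length is consistent with the code parameters given earlier in the description if the sequence carries exactly one LDPC codeword. That reading is an assumption of this sketch, not something the text states. A quick arithmetic check:

```python
from fractions import Fraction

LDPC_CODE_BITS = 64800            # LDPC code length stated in the description
LDPC_INFO_BITS = 32400            # LDPC information length stated in the description
WATERMARK_RATE = Fraction(4, 5)   # watermark code rate stated in the description
BITS_PER_BASE = 2                 # two bits per base

# Assumption: one LDPC codeword is expanded by the watermark code
# and then mapped to bases at two bits per base.
encoded_bits = LDPC_CODE_BITS / WATERMARK_RATE
bases = encoded_bits / BITS_PER_BASE
payload_bytes = LDPC_INFO_BITS // 8

print(encoded_bits, bases, payload_bytes)  # 81000 40500 4050
```

Under this assumption the sequence would carry 32400 information bits, or 4050 bytes of file content, which again gives 32400 / 40500 = 0.8 bits per base.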
Second, DNA fragment preparation and information storage
The long DNA sequence obtained is split into 13 fragments of about 2-4 kbp each, with homologous overlaps of about 30-150 bp retained between adjacent fragments. The homologous regions are chosen to differ as much as possible from the non-homologous parts to facilitate in vivo recombination. The split is shown in Table 1:
TABLE 1 Split fragments
(Table provided as images BDA0001871180520000081 and BDA0001871180520000091 in the original publication.)
The above 13 DNA fragments were synthesized by DNA synthesis and assembly techniques.
The first 500 bp of fragment No. 1 and the first 500 bp of the Saccharomyces cerevisiae ADE2 gene are amplified by PCR and joined by Overlap-PCR into a 1000 bp linker fragment, which is named linker A after verification by Sanger sequencing.
The last 500 bp of fragment No. 13, the 1512 bp Leu selection marker gene and the last 500 bp of the Saccharomyces cerevisiae ADE2 gene are amplified by PCR and joined by Overlap-PCR into a 2512 bp linker fragment, which is named linker B after verification by Sanger sequencing.
TABLE 2 General PCR and Overlap-PCR reaction system
Component               Volume (50 μL reaction)
ddH2O                   to 50 μL
2× reaction buffer      25.0 μL
dNTP (10 mM)            1.0 μL
PCR template (10 μM)    2.0 μL
Primer F (10 μM)        2.0 μL
Primer R (10 μM)        2.0 μL
DNA polymerase          1.0 μL
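Table 2 gives the water volume only as "to 50 μL". The volume actually added follows from the other components, as in this small sketch (the dictionary keys are labels chosen for illustration):

```python
from decimal import Decimal

TOTAL_UL = Decimal("50")

# Component volumes in microlitres, as listed in Table 2.
components_ul = {
    "2x reaction buffer": Decimal("25.0"),
    "dNTP (10 mM)": Decimal("1.0"),
    "PCR template": Decimal("2.0"),
    "Primer F (10 uM)": Decimal("2.0"),
    "Primer R (10 uM)": Decimal("2.0"),
    "DNA polymerase": Decimal("1.0"),
}

# Water makes up the remainder of the 50 uL reaction.
water_ul = TOTAL_UL - sum(components_ul.values())
print(water_ul)  # 17.0
```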
TABLE 3 General PCR and Overlap-PCR program
(Table provided as image BDA0001871180520000101 in the original publication.)
The obtained linkers A and B were mixed with the 13 synthesized fragments and co-transformed into Saccharomyces cerevisiae, where they replaced the ADE2 gene locus by in vivo assembly, as follows:
(1) A single colony of Saccharomyces cerevisiae (BY4741) was picked from the plate and cultured in YPD medium overnight at 30 ℃ and 250 rpm.
(2) The next morning, depending on the culture density, the culture was inoculated at 5-10% into fresh YPD medium and cultured for 6 h at 30 ℃ and 250 rpm.
(3) 10 min before starting the transformation, a heat block was brought to 100 ℃ and the ssDNA was boiled for 12 min, then quickly placed on ice.
(4) 1 mL of the culture was centrifuged at 4000 rpm for 1 min in a sterile EP tube.
(5) The supernatant was decanted, the cells were resuspended in 1 mL of sterile water, and centrifuged at 4000 rpm for 1 min.
(6) The supernatant was decanted, and the cells were gently resuspended in 1 mL of 0.1 M LiAc pre-chilled at 4 ℃ and placed on ice.
(7) The transformation mix was prepared in an EP tube as shown in Table 4.
TABLE 4 Yeast transformation system
Component           Volume
50% PEG3350         620 μL
ssDNA (10 mg/mL)    40~45 μL
1 M LiAc            90 μL
(8) After preparation, the mix was mixed thoroughly on a vortex mixer.
(9) The competent cells on ice were centrifuged at 4000 rpm for 1 min; part of the supernatant was poured off, leaving about 100 μL to resuspend the cells; 200 ng each of linkers A and B and of the 13 DNA fragments was added and mixed evenly by pipetting.
(10) The cell suspension from step (9) was added with swirling to the transformation mix from step (8); the tube was inverted 3-4 times to mix and incubated in a 30 ℃ incubator for 30 min.
(11) 90 μL of DMSO was added with swirling; the tube was inverted 3-4 times to mix and heat-shocked in a 42 ℃ water bath for 18 min.
(12) The tube was centrifuged at 4000 rpm for 2 min and the supernatant discarded; the cells were resuspended in 500 μL of 5 mM CaCl2 and left at room temperature for 5 min.
(13) The supernatant was discarded, and the remaining 100 μL was spread on SC-Leu (adenine-limited) plates for screening.
(14) After the yeast had grown on the screening plate for 2 days, pink single colonies were picked and verified by colony PCR using Rapid Taq Mix, with primers designed to take each junction as the target fragment (the primers are listed in Table 7).
TABLE 5 Colony PCR reaction system
Component                       Volume (15 μL reaction)
ddH2O                           5.4 μL
Rapid Taq DNA polymerase Mix    7.5 μL
Primer F (10 μM)                0.3 μL
Primer R (10 μM)                0.3 μL
Template                        1.5 μL
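Because the primers and the template differ for each junction, only the water and the Rapid Taq Mix can be pooled when all junctions of one colony are checked. The sketch below totals the Table 5 volumes and scales the shared components for 14 reactions (one per junction in Table 7). The 10% pipetting excess is an assumption of this illustration and does not come from the patent.

```python
from decimal import Decimal

# Per-reaction volumes in microlitres, as listed in Table 5.
colony_pcr_ul = {
    "ddH2O": Decimal("5.4"),
    "Rapid Taq DNA polymerase Mix": Decimal("7.5"),
    "Primer F (10 uM)": Decimal("0.3"),
    "Primer R (10 uM)": Decimal("0.3"),
    "Template": Decimal("1.5"),
}
total_ul = sum(colony_pcr_ul.values())  # should equal the 15 uL reaction volume

N_REACTIONS = 14          # one reaction per junction listed in Table 7
EXCESS = Decimal("1.1")   # assumption: 10% extra to cover pipetting losses

# Only the components shared by every reaction go into the master mix.
shared = ("ddH2O", "Rapid Taq DNA polymerase Mix")
master_mix_ul = {name: colony_pcr_ul[name] * N_REACTIONS * EXCESS for name in shared}

print(total_ul)
for name, volume in master_mix_ul.items():
    print(name, volume)
```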
TABLE 6 Colony PCR program
(Table provided as image BDA0001871180520000111 in the original publication.)
TABLE 7 Colony PCR primer design
Target segment  Upstream primer                         Downstream primer
Linker A-1      atctagaatcaaaacgacactttatttccaaaaagg    tattaggatcggaatccatctgcaa
1-2             gaacgacaaaccccgacaagtaaca               ctgtattccgtctgacgaaaattttgtaat
2-3             tgtaatctccgccacaatggtttgt               acgtctccggatttttaatccgc
3-4             tttctttggcggttaaactcacacatctat          gttaatagtatcacaccacccatatgaggttagc
4-5             acgtcctgatggatggagcaattag               tagtttcagtaatgaatactgtctcaagcttcg
5-6             aacgccttaaagccaaataaagatcgaaac          tccacctctaaggctgtcatgtctatt
6-7             acgttataatccctagtgcgtaggtc              tcacggtgtaattataaggtccgtaacg
7-8             tccccgaagtgtgtacgatatctatgac            agcttgcgtgcttatcagcataag
8-9             tcatagatcgctcccgtctgcgata               agcagcgttctacaacgactagc
9-10            tgcacgattgattggggcatttc                 acacagttattaatgctagctatcgtcg
10-11           ataacagtttggactctacagccagatt            tagtgtatgcattcacggcacagt
11-12           tctgcgcacgcagatacctct                   tggcctaacagagcacgtcac
12-13           acctgctccacgtgatcagt                    aacgaacatttgagatccggatgtg
13-Linker B     ttatccctgagtaaattgatacgttgg             caagggaacattatagggtgttaagagtact
A verified Saccharomyces cerevisiae clone was selected, inoculated into YPD liquid medium, and cultured overnight at 30 ℃ and 250 rpm. The liquid culture was then freeze-dried and packaged as follows:
(1) Once the cell concentration, estimated by measuring the OD600 of the culture, reached 10^9 cfu/mL, 4 mL of the culture was centrifuged at 6000 rpm in a 4 ℃ centrifuge for 10 minutes to collect the cells.
(2) The cells were resuspended in an equal volume of 10% sucrose solution (or 10% skim milk powder solution) and transferred to a 10 mL penicillin bottle.
(3) The penicillin bottle was covered with a breathable sealing film and pre-frozen together with the freeze-drying rack at -20 ℃ for 12 h.
(4) Vacuum freeze-drying was carried out for 12 h, with a cold trap temperature of -45 ℃ and a vacuum of 10-20 Pa.
(5) The bottle was sealed with a butyl rubber stopper, labeled with the number TJU40K, and stored at 4 ℃ or room temperature pending distribution.
Third, information distribution
To distribute the information at scale, the yeast is propagated; distribution of the same information to 300,000 people can be completed within 25 h.
Fourth, information reading
The procedure for reading the distributed in vivo information is as follows:
(1) The distributed freeze-dried yeast powder in the penicillin bottle was dissolved in 1 mL of sterile water.
(2) 200 μL of the suspension was transferred to 5 mL of YPD medium and cultured overnight at 30 ℃ and 250 rpm.
(3) Cells were harvested by centrifugation at 6000 rpm for 10 minutes and sent for second-generation sequencing.
The sequencing results were analyzed and decoded to recover the content of the stored file 'Tianda chapter program selection.txt'.
The invention was compared with the Goldman method from the European Molecular Biology Laboratory, the most complete DNA information storage method to date; the results are shown in Table 8:
TABLE 8 Comparison of the invention with Goldman
Parameter                        Goldman       The invention
Redundancy factor                4             1
Error correction method          Repetition    LDPC code + watermark code
Single strand length (nt)        104           40k
Sequencing depth                 51X           1-5X
Information density (bits/nt)    0.33          0.8
Actual bearing rate              18%           97%-99%
The comparison shows that the invention provides a long-sequence in vivo DNA information storage technology. Its main aims are to construct a coding system with a strong error correction mechanism by combining a watermark code based on an LDPC code and a BCH code, and to reduce primers, redundancy and indexes through long-sequence coding so as to achieve a high actual bearing rate (above 97%). The long sequence is assembled and stored using the Saccharomyces cerevisiae in vivo assembly system, and low-cost, high-fidelity and high-speed replication of the information is achieved by means of model organisms such as Saccharomyces cerevisiae, Escherichia coli or Bacillus. In addition, owing to the strong error correction system, the data in the cells can be fully recovered at low sequencing depth (1-5X).
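To put the Table 8 figures in concrete terms, the following back-of-envelope Python sketch estimates how many nucleotides each scheme needs for a file of a given size, and how many bases must be sequenced at the stated depth. It assumes that the bits/nt values in Table 8 are net information densities and that sequencing depth acts as a simple multiplier; both are simplifications made for this sketch. The function names and the 1000-byte example file are illustrative only.

```python
import math
from fractions import Fraction

# Information density (bits/nt) and maximum sequencing depth from Table 8.
# Fractions avoid floating-point rounding in the divisions below.
SCHEMES = {
    "Goldman": {"bits_per_nt": Fraction("0.33"), "max_depth": 51},
    "The invention": {"bits_per_nt": Fraction("0.8"), "max_depth": 5},
}


def nucleotides_needed(n_bytes, bits_per_nt):
    """Nucleotides required to hold n_bytes at the given net density."""
    return math.ceil(Fraction(n_bytes * 8) / bits_per_nt)


def bases_to_sequence(n_bytes, bits_per_nt, depth):
    """Bases that must be read to cover the stored data `depth` times."""
    return nucleotides_needed(n_bytes, bits_per_nt) * depth


if __name__ == "__main__":
    file_size = 1000  # bytes; hypothetical example file
    for name, params in SCHEMES.items():
        nt = nucleotides_needed(file_size, params["bits_per_nt"])
        reads = bases_to_sequence(file_size, params["bits_per_nt"], params["max_depth"])
        print(f"{name}: {nt} nt stored, {reads} bases sequenced")
```

Under these assumptions the denser code and the lower sequencing depth compound: fewer nucleotides are synthesized and far fewer bases are read back.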
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.
Sequence listing
<110> Tianjin University
<120> DNA-based information storage method
<130> MP1824726
<160> 29
<170> SIPOSequenceListing 1.0
<210> 1
<211> 40500
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
gcccgtgatt cttctccatg gaacggtttc attccagctg actgggagta tcccataagc 60
gatccagtta ttgtcttgcg tctagccaac gcacgtagtc agccccgggg acttaggata 120
aagtagcaaa gttcggggct gcgcgcatat ggcacggtag tttccatgac ggaccacccc 180
gctgttggtc taggtacggt acaggaacta atttcgaata atcccgagtg acttatctga 240
ctcgcgaaca agtcgttatt atccctggcc cagagccgtc aagttcccat cattctcgag 300
gtgcaattat atcttgagat aagggctagc agtatattga gtacctgatc tgatgacccc 360
tgtgttgaag gactgagtgt ttgtgattgt cacgatctag aggaggtggt agaagcaaat 420
ttgtgcacca ctcctagtgt caacgcggac ttcccgtgta accaattcca gcatcgcaca 480
taatgactac cgagagcatg agtcctgaca ctacacgtac acttcacgtc tcgagtgcta 540
gtaaccgagt catggagtgc gtctattatc atccgcacgg ctgcagcctc cgcgacctgc 600
ttgtttaccg gtggcgagat tggcgagctc gcctccactg gctgcgcagt ctaagcagaa 660
gatccgctga atcgtcgact gccccagggt ttgcacgatt gcagatggat tccgatccta 720
atacctgtgg gcattcatta atggttccat gaacatgaca aatggaactt cagtgagaca 780
cttggttcag agacacggac caacccactt tattaatagt ataataagtc ggtacgcaag 840
gacggccata gccaccagcc gttgctctga tgtttggaga ctattaaaca ggtctatccg 900
tcccataacg aggaccctgg atcgtccgta gcttgggagg taccgaaatc atatctatgg 960
gcatattaag ggatttgcgc attgcaggtt gaatgagccg atatgtcgat gtccttgggt 1020
tgtgatttcg atccgttgct atacacacgt tcaatggccc attctaataa ctcgtgtagg 1080
ccgacagcgc actactaact cgtttataac agatcaacta attcggttta tgtggagtca 1140
gtagtctggc tccaatatgc agtaccgcaa aacgcgcgca gggtcgggcc ccatctctaa 1200
ggggtcggga tgcaatgcgc gtttaacata ctgtgggtgc ggcgcattgg gtggcccatg 1260
ctccgttgac gttcgaggcc agcttgatgg gttcgtcatt accagcttat gtgctcagga 1320
ataaattatg gagggtcccg tctcaggcca gcacaagaca ctgttaaaat ctgtagggat 1380
accgcaggga tttcccgatt agatgggcag tttgacgatc gacggcggtt aaatcaagtt 1440
cgcttattca tggcgtaata cccgaacgcc tccctagcca gggactgttc gcatgccgtc 1500
acgcggccat gcatgggagt tatagtgagc atatcatcgt tagacatggc ccagcttttc 1560
gttgcgtaag ggattcacgt atcattaggc acgagtctcc tacaaacatg ggaatgaggt 1620
cagtccattt cgtggagccc ggatcacctc atggagcgct ggttggtgta agggggtatg 1680
agccgctgca tttccgttat tcggagtagg gtttctgttc ggaatgatca tctcaaactt 1740
tatgtgaagt aatcgacgaa aacgggtaga ttttaacaat atagtgccga gctcactgtc 1800
tactgcctta gctatacctt tgcgaattga ttcgtacttc ttctgatagg gcagctgcag 1860
cagagcaagg gtaataacga ggggtcgttc tggtagggcg ggcaggcgag tacctagttg 1920
ggtgggctta acccttaggg cgtgagcaaa gcccccatag tagttcaatc gagacaaccg 1980
cgttgcgtac gcaatgtgtt acaggagatg gcatctacag gctacacgtc acaggtgttt 2040
cactcatttc agctatggac acaactgtgc aacttggtca aagctcctct gcttgagagc 2100
atacaccttg tatcgataga gcaacaaggg aacttcgcgg ttactgccgt tcctaacaac 2160
gttacggcct ttgttcacga ccactacact ccagctcggt cataaattat ggtttaatgt 2220
tcaaggtccg tgaccccgtt gatgatagat ggaccggtat taggaaacac actatgcgcc 2280
gatattcata catagggata ttcgcatcgt cgggacgcta tcttcagctt gtcgtcttct 2340
gaggggacag cagataaaga atacgatcca ccctccacga ggggagttaa ctataacgcg 2400
cttttcttat ggcggcaaga gtagtctacg cgcctttctt agaaacctat tcgcgtggtt 2460
cccggtgaat ttaactcgtt cgatcgggac aggactggat gtgacggaaa gcatccggca 2520
ccccttcact caattccgct aggcaatgtt gatccatgtc cccgtgccag tggtcgacat 2580
aaatggcata cgctggcggc gtcaagtgag ggaaattaaa cagaaagtac attctcagat 2640
tacacagtta cgcatcgaat gaatcacaca tacacgacta caccttcgaa agcgtgctcg 2700
aaaatatgtg tcgcttcagt agtgaacccc ttcgtgctcc tatctttgtc gattttagtc 2760
gtcaacatgg accaaccttt aggctataca agcggcagga ctgcccagtc cggactcgta 2820
cccgacctgg gttcctctgc ctgtgctcac tttaattgtc tagcctaata cgtcggagat 2880
cccatgcgtg tcagaatgga cctcaacgaa ctaccaagcg ttcgacccag aacgacaaac 2940
cccgacaagt aacactcctt gtgaacaagt attggcgtta tgtcgttctt ataaaagact 3000
ttgttaaacc cgcttcatga cgtccctcag tagattttgt tttcgtataa ctgaataggc 3060
agtgaggcat agctggtaga tgaaccaaaa gcgtccagta tacactagcg aattgatgga 3120
accgcgcaca cgaagactgt aaagcggccg ataccatggg attcacttgc aggaagtcac 3180
agtcgtagat tataatattt tctaggaggg tctcatgccc tcccccacaa gtttgcagca 3240
aaggaggctg agttttcggg agtttggtgt ccaagacttt aatattcgcg aagtcctcac 3300
cacacgttaa ttagtgagat gagttggcta cttagcgcga taagtgcata ctgaatctcc 3360
acatctcccg ctggggtagt tccatgaacc ggcatcagtg aaaagtcggg gcctctgcaa 3420
gctatgctga aggcgtcaca cacatgggcc aggtataact acacgtaata atccctcggg 3480
gacgacttcg atatcacaaa gcggctgttg gggtaccgtt tctcccgctg tcgcagtatg 3540
aacaacctgg aaccaaaggt cagggtaata caacatgtag caaccatcta ctttttatat 3600
catctaaaac gttatggcaa aaacactatt ccccgaaact attccgatga catgctacta 3660
cagtgcgccc ctgtctgagc aacgccgaac accctatcct tatgagactc atgcgctgaa 3720
taccaaccag gactagaccg gttgtttagg taaccgcaga ggcgaaaaat gtttttcggc 3780
ggttcgcatg aagtgttcgc catcatgatc atattaatga actgacgtca gttgggttcg 3840
ttcggtcacc taccaactcg tttcaaaaat agtcggaatt caggcgtcgg tgtggttgtg 3900
gaagaagatc ggcatcaacg atacctttgc aggcggagac aaagtgcctg aggttctggg 3960
caacttaagc agatccagcg gtcttcctct ctggacgagg gcactggccc tgaacaatgc 4020
attaagtggt gtggaacgag acaggcctat cgtaagcaat ccagaactgt gagcgtgtaa 4080
ttgacggaaa ctatcgggtc cgtacgtcaa ggcggcgtct accttttctt tgagagcagc 4140
cgacgtcccg ggaagcgtgt gcactaaatt acaaaatttt cgtcagacgg aatacaggct 4200
tcggcgtgag ccacaccata cgctggcacc agcgccgacg aaggcgattc gttggcctaa 4260
aagccggatt ctactcgcta ggacagtggt tcagcggcga ctggagatac cctaatgagc 4320
gcgactgtga atggtcagta gctttgcgaa tactagtggt cagtcgaaac ggattaacgt 4380
tcctgttccc gtgaatccga cttacgtgtt acctatagtt acggatgccc gagggttgct 4440
aactgttcac ccggctcagg gagtggtacg gtaacggttt acctaaccct aaacgacaag 4500
atcggttata accttactct aatcaggagc actcaaagtt ctgtgaggcc aacattctat 4560
cgggatattc aatataaaat aatcgtttga ctccacctgg tcgccctgta gttgtatata 4620
cagatagctg cgttatctta tagcttctcc tcaataccac tagcgattta gtgcacgtat 4680
gggtgtgtga ggagctgata tggcctcaag gtaagtattt cggtaggccg gcaagtcgtg 4740
ctagctcgcg gacgggttac aggcaaagtc gggtcgtggt tatctttttt cgggtctacc 4800
tcctacacgg cagatgtgcc tgtagaattc gtaggcgccc cgtaatattg gcttcaatcc 4860
gtggtgagtg agtctgtctt gatcggctcg tatcccaaac gcgttacgcg tcaattcgct 4920
cctatggacc taatgatcca ctacggagat gaatggagtg gacaaggtat agagatgagt 4980
ttcaatctac tttctattga gcacgaacct gaacgcaaaa cgctagagtc aatgtacacg 5040
aaggcggttt atcagagagt ccacgtaata ctttcaactg acacatggcg tctcaacaat 5100
acgaataccc tgtgccgcct gatcgtaggg gcgtaaggtt gggtcgattt tttgtttctt 5160
agaggtttat tttcccgaaa ggggccttgc tcggatctag ctacctgctg tccaattcag 5220
gctaccgatg catgcagttt gtgcgctcat tacggcgcag ctcatgtcga ttgccgacgc 5280
caagattcac gcacatgatg gcttgggatt caaggtagaa tttcgcgatt gttgtgttcg 5340
ggcgcgagca ccgcgccatc agatggcttg ctcgcaaaat agggcctagg agtatacata 5400
aggatcgaac ggggatccta tttcataacg tcgcacggta atggcgctca ttagggcgac 5460
ctcccggttt ccttttcccc acaatgttga gtgttcggcc gcgcaaatgg cacattcacg 5520
tggtatctac gtgtggcgca aatccaacga aactgctcta ctcgggcggc ggccggccct 5580
ggtgagcgca aaattctcga cttcaagcac ctatgggccc ttcgtgaggt cttttggcac 5640
agcgaaccca ctctcggtcc gtcggagtcc ataccgagtg ttaagcacag acacaagctt 5700
gacgccaaca gggtggagtg caattgtccg ctatgccacc agaggtctca cactagagct 5760
atgttttgtg ctggctagag gtgcaacgcg tagctcctgt aatctccgcc acaatggttt 5820
gtccaagatt ccttctttca tatcgcttac gcgaactgag aacggcccga tggttaattc 5880
tcttgttggg aatatttgac ccgggcgagg tgcctggcgg gctgagacga tcgatccagg 5940
cttgacaagt agcgtgaatt ttactccacc attcggtgtt cccggccacc tgcaccattc 6000
ggcgtcggca cccgtagttg aagattacga atcataggga aaccatctga gctggtctga 6060
ggtgaaggtg aaaggggagc cttcgtagac atgtttagca cgcggtgaag ggaaggtagg 6120
cgagggcctg tccttgatag cttgtcgaca aagtctctgt ggttgcgcca ggtcgctgct 6180
cttactaaga cgcgctattg tgtaggcgag atcgaggtcg gaaatatcaa gcatggatta 6240
cgggggccgg tattttagtc ggttttcacc gggcaagaag cctgcgagac aacgttggtg 6300
ggcgactgcg tgcgaatttc ctacataagc gtcaggcttg ttacagagac cgtctccgcc 6360
cgaagtccct gaacagatcc actagcaggc tgaaactggg cttttcagcc aggtaagttt 6420
acagccatca tcagtgaacg gcactgttcc gaaggtccat ctcttatata cacgaagtac 6480
cgggagtcga aactgaaggt tgtctcctga gctaggccaa ttactcgttt aagacttcga 6540
aatcattgtg gccatgttac tgagcaggtc gccgtacgct gctgaccagt ggatcgcggt 6600
tcaacctata gcccggataa cataagcctt ctgaagtgat ttgttgtatc tccggcgtgg 6660
accggcggat taaaaatccg gagacgtcgg gaacttccgg attggtccac acgatctagt 6720
accaatgacc cgcctcaccg tctctctttc gctagctccg tgctaaaagt cacccgtaat 6780
tggagtccac cgtggtaagg gtatggacat ccgacgtcgg atgtggggat agtaccaact 6840
ttgggcacgc aatcgtattt agctgcgaag ctatgcccta gtgctcggag ggcgacaggc 6900
atatcatcac cgcggatctc catgttaaac acaaagaacg cgcaatgctt gtctatagtt 6960
tttacaattg ggaattctga gcgtgtagca agacagctga gactatatag taccaggacg 7020
ctaaaactgt actttgagcg gacgtcctat tggatttggt cgggctacaa agaaatcacg 7080
caagcttgta cgcgtcaggt attagcacgt aagccctcgg ggtagtgcgg atgcttggac 7140
ttatggcctg aatctacacc gctggctcta gggttggaca caacaaataa gagcagtatc 7200
tcatacctta ttcaaagtga gtcgaacggg tcggagttgc gttcacgaag gatatatggg 7260
acctgcgttc gactgaccgc aaaccaactc gatattattt gtggctatgg aatccgttgc 7320
tcatgttctc gggaaacaca gggagtgtat cacgctttgg caaaccagag ttactcggta 7380
tcacggggct gaaggagctg atatacaatt tccagctggg ggtcaaaatg tcaaggtcaa 7440
tgtttggtta acgtgcagtc cgccggtgaa ggcgtgtact ctagttgtaa aactgtatgt 7500
tcttattcac tccacggtga gtatgaagta tgtcaacaga aggcatacga tgccattcgt 7560
gtccgccgtc tatatgaatg tgtctatgcc tccgattcgg tgcgagcgtt cggtgcaccg 7620
gctcatccgg gagagtgccc gctcacgccg gtccacgggc gctaattata cgaggcctgg 7680
gtacgtccct agatttcaga acagagggac atgtcctagt ttcgtaaggg aaatagaaat 7740
tgttagcgag tgaccgagat cagacgaccg agccggagtt tgattgtgcg acgtgtatgc 7800
ttcataggga ccttattagt ggccggatat agtgtcgtgg ggagcggtct attcagccgc 7860
actgcctttt agttttgtca actagttttc taggcttggg gaaacacgag actataatgc 7920
tggagactga tgtcaccgta agtctgagac ccggaattct ttaatatttg ttttacacaa 7980
gaatgtgatg cccctcgagg agaccggcgt tatagttacg aaacccggac atagccgagg 8040
gtgcgaatgc gcgtgtaacc gctgcgtatg gcggttacga ccgccggcaa gggcttatac 8100
agacgcgagt aataaaggcc gcggctcgtc tcgtattgcc tcgccgaaac gcggtgttgg 8160
acctccaaag tcctgataga cattacgata atccatgtca ctatacccct gattgtgccc 8220
acactacaag gtgcaggtga tcgtagaatt agtttgttcg aacacaaatg tctctcgaga 8280
gctggtagcg ccagttcccc cggggactac gcccacagcc tcataaaaga gtagactttt 8340
ttaatcgcct actacatgga tgactacgcc gtacttcgga tgctttcttt ggcggttaaa 8400
ctcacacatc tatgtcaaat tactgcctgc gatgtggact tttcactctt caattgcccg 8460
tgtatcgcac gggcctcctt atgcttcgat tagcaatcgg accgtagtag gaacaggaag 8520
tcgttcttat gccgtttata catgggtaga taatcaacag ggctaggttt tttaaggtta 8580
cgcctattat ggagatgggt cttcgcggca ctttgctcgc aaccttgcat cagggtgggc 8640
tctatgctat gctagcgggg acttctcgct atcggagagg gtagacaaat tctgcgagtt 8700
agacaccagt caagtgagat ctccattctc gtcgtcgaca acgcggcttt acaaggtaag 8760
cagagttcgt catatgcacc gagccctccc cgcaggccgt aactggtaga gctttgcccc 8820
gcgtaatctg cctcggtaca tgcttctacg ggcctacggg atcgcccgac ggggatgctc 8880
ttacatcgga acgcatacat gacggccaaa tgagtatggt taatttacaa gtctctgtag 8940
ctacaacaca aaacgggttc cggtgttggt aaccatccca tttggaagcg ataacccgga 9000
agacattttc ctgacggatt cagccgggag cgtgggcacg ctatccgtcg gaattcataa 9060
ggaatcaatg cacatatccg tgcccattgc gtcgaaaaaa ccctccgttc taactttgcg 9120
tggggataac tgaagcaaag gtacctactc tggcagagag aaatttgtcg tacacaaggg 9180
tttggcgcgg tatccacatg acctctacct taagcgagtt acccgttgtc atcagcacag 9240
agaattcccg acggtagaaa tttgtcacgt ctctcagcgc taacctcata tgggtggtgt 9300
gatactatta acagatatgt gcatgactct taattctact tacgttgtcc gtcaatgaat 9360
catgggctct gcacaggagt aagtcccgct ccgtgtccca tcatccaccg taagatgttg 9420
gaatgcttct ggtgtagctc tacttcagag gagagggggg caagatacgt agatcagtct 9480
gcgcataagg acgacgagtc tatattcgtg cagggggttg ctgggattcc gaagaagaag 9540
agccaccgga acctcctggc aagcagcgaa accgcacggc aggcctgcgc cccggaacgg 9600
gacgatacac ttggaggctg tctgcaatct gcggtatatg caagggcatc gatgatgctg 9660
tttgctggtg ggggtcactc catgagcctt cgacagacca cggtggccta gctgacctct 9720
gtattcgtat gcgatgtagc aatgattttc caaatgcggt ggacctaagg gcgccctggg 9780
tgcagtgata cactcgagtc ggtgcaagtg tatctctcat taacgcccgg aggctgcaca 9840
cacgatttta agccaacatc gacggctact ttccgcgcaa ggcccctcct aacgagatgc 9900
cggttgttgg tcttcctcca tcttgtcaat gctctaaaaa gaagaagttg gatggtgtca 9960
cgcctatgga aactggctgc gtacaaacat tgttacattc cgtcgagaaa atgtacggta 10020
gaagtggtaa ttaccaatag gcgcctatag cgaggcagat gttctattgt ggtctcactg 10080
gctggtagtc tgtctacact ctagagccac caacccaatc tcctattggc tagtagctcc 10140
tgacgagtca gcgatccgga gagatacgcc gcgggcaccg gcgactccta acctagacac 10200
atgtttgcac acaatgagac gcttgcgaat caaatgtggc aaccataaat cggatcatcc 10260
ccgcagcgtg tcctcttttt gcttgcggct caacagcata gtaaaaaagc tgcactgtaa 10320
ctgactgtgg aacataatcc ctgcagtaca taattgtttc ttacctacag cttgtatagg 10380
ctagctagga caatagtaga gaatatgtct ttaggtcaca tgaacctcag ggatattacg 10440
ctacccttgt cagaaaattc gtgctcggag tattgcgaaa ttcgcaaatt catgggtttg 10500
tggctggcaa cgatcgcggt cttcgcgcat atggagatcc ctagtgctag ctcagacctt 10560
acggtgaggg gcctaaacaa gtcgatggta gtgactcact ctcagctagc gtcctcggcc 10620
atctctgtgt tcctatatct agctggcgct ccggacgtgt ctaatttcct accgttcggg 10680
tcgattcgtt ggccacgaat gctcacggcg agtagttaca ttctttgcta cagaaatttt 10740
caaggaagcg tggtctagga ccacccgcaa atctgtgacc cctaacatct agttgctggg 10800
cgctagttag ctggtaggtg cttacacaac ggcccatatt tcaccgtagt tataagattt 10860
aacccgaccc aggcttcgac tggcctgtct catgcttagc caggtcctgg gacaatcctt 10920
tcatcacaca aagtctcgcc cacgtttagg ttcagcataa tcgcggtctg tgtattcaca 10980
cgcgacagaa tacgcgggat ggttccctta atccattccc ctttctccgg cagtaggcac 11040
gacaatcgcc cagtcgcaac aatggtgact cgtctgagtc ctttactgaa tctgaggtgc 11100
agatcatatt ctacactcta cccgaaggtt aacgggcatt aacatcttta atccataacg 11160
gcgcgtggta accgctgtgt gtccaggatt ctcgtactgg caccttataa tgctccccag 11220
tagatgtgga ctccgggtga tctcacacac ccaagttaaa ggctgagagg taagtcgtgg 11280
cggctaggaa gtcgtatata tcaaggcatt actgttcgct gcctggtagc tcacccacgg 11340
caacttaagt tgggatggtg ttaaagtctc agtcggacac taacgggttc ttgcgacgtg 11400
ttaaatatgg tccggtacag acaggctgta ttcagtcctg aatttgggac aactccttgg 11460
tcaaaaaatc gaaacaggcc gagtccgagc tatccacttg tcttctagtt agatcgagag 11520
ctcggtgagt caaaattttg ccgggctatt tagaccagca tcagacgagc aacatgcaca 11580
tcgctgtcgg gacgatttaa gcctctgtta attgtgaact acctatggga gacaaccgga 11640
gagtgaattc actaggtagg caggcgtaag cctctaagcc ataccagtga gcgggaaaat 11700
gaataccgac ttccttcact ccgcacggcc tctcatctgg agtcggggca gcacggagcc 11760
acacattaat taactttcta tcggttgtcc tagacagttc catgtgactg ctcaacgata 11820
agttgatcgt gcgacataag atgcctgtga cacaacaccc tcgtcgcacg ttcgtcagaa 11880
atctcagtta ataaactcgc atcgagagtt actggaggtc ttgaaataat caggcgaggt 11940
acaccatagt tctgatacga cacctcctga tccttagata ggcaagcgta tatcgccgct 12000
atcactacta acgtgcccca gttaggtgcc ggcaacctac aaaatatggc atgatataaa 12060
attaatatct aagtccctgg ttttacggct taacatattg gtagcacaat cttgtatagc 12120
cgtgcgaccc gtgctgcgga attccacctc gaagtcctat gtaatcggcc cctaacacct 12180
gtacacacgc tctccgccgg ttgttgacaa aggggcaagc ttagatagaa tcctagcttt 12240
aattaaaacg cgacgtcctg atggatggag caattagttc ctaccaggac tgcgcgattc 12300
cccaacaccc cgtttcttag tactgtcaat attggcgcta gaagacatgt actattgacg 12360
tactaggcag attgtcggtg gaactcagta agatgtggta cttaaaggct tgccgccctg 12420
cctgtgcgcg gacatgctgt tgcgaccgct aacgtcgaaa tctaccccca tcgaattgtt 12480
ttaatattgc tcaaagtatc tcgcagatat gtaatgactt gtaacgttct tggcgcgcgg 12540
caaaagagaa tgcgcatcgc gaacttacta cttttgtgcc gtccgtctat ttgtccttcc 12600
tattacggca tagcgttctg gaggacgcca aattatatgg tccccgaagt ttgatggact 12660
gcagcaatcg tattgtttgt ccgcagcgat cgcgaattta accttgtgcg tttatcttct 12720
gcgggtggtg aacggaccaa gaatcttaac taagaccaag ataaccaaca actaacgaag 12780
ggttgacggg gagttttgta tatatgaata taggcaacgt tattgcagat cggtttcact 12840
gatttctcgc tagcgtccat gttgactagg gacgagccct agttcttgaa cgcgcgtgat 12900
cagggcctta cgcttctacg acctggcacc ctacgtcctg ttatacgcgg tccaaaagat 12960
aggtcgtcgg gtcctgatcc cgctttgacc ccaaagaccg gtctcagggc tgtagtgtta 13020
taaaacacat acaatgctag gcaaccactt actttggcag cctctgtaaa agtccggcca 13080
tgtccaaagt catccagtcg ttgatctgtg gatgtccagg cacagtggcc aattgccatt 13140
tttaaagaag gaagagagca gactacgaca aagtacgagc aaaatatccg tgagcctccc 13200
tcacgctcgt tagttcgacc aaacctcagt ccaactagtt tgtagaattg cctggtaact 13260
ttggtgatcg atgtgtacca gttggtcagg ccatatgctt ccagtgggag cgcctccgcg 13320
atcgaagctt gagacagtat tcattactga aactactagt tttcctgtcg ttttctctcc 13380
ggtcatggac agacttgtat ccatagctgc actcctacag atactccatt gtgctcatca 13440
aagagcaatg gtaatgacgt ggtacggggt gtagatatac tatggctgaa cgaggagatc 13500
cggggggtaa tcctgcacgg acactaacgc ttcatgagaa aacaggtagg ctcaaaacgg 13560
ttgctcgacg agttcctcag agcgttctag aacatctcac ggagatccaa ttccgtggag 13620
tcgcgccgcc acgaccgata gattaagcta aagcttactt tcagttacac gcccctctgc 13680
attatccgcg tacggattgc gctacaacag cggttccttg gcgcaggcct tcgacgcccg 13740
agttgatttt agcgaacgac aattgaccta tcaaaggtag ggcgtagaga tcacacatcg 13800
ccgtaggacg agaccatatc aagacgcgtc ggtgacccga ctctcggatt atcgaacatg 13860
ataccagaaa ctaggtcatc ccttgcgtaa gctttctatc aacaagggcg gccgttatgg 13920
ttgcgcattt ccagagcgag cgtcagactg atatctacgt agaaccacac cgcaagcgtg 13980
cacgttacat aactcggttt acttttgact cacaaacagt ttccctcgcg gttcagtaga 14040
tgctacatta ttccagctaa tggctaaccc ggtaccatgg ctatcacaga tcctgtaaga 14100
taggcaggct acttgttctc tcttcgcgtg aaggggtaga ctgtcatagg aacattaagt 14160
taaagggggt ctataagaaa attgcgcttc aaattggggg aggccattct ttagtgcgat 14220
atacttttca aaaaaccaga gcatactgga agtagctgta tctcacggtg ttgggttatt 14280
gtgccacaaa gcaaatgtgc ggctgtcaca caatggctta aaaatgtcct ggtataacca 14340
aaattttatg ggtcgcacca ggctgtcccg aagctacatg tacaccattt gcgaacgtta 14400
tacagcactc ttcacgaagt atcggattag tgcagcccga gtaatttgtg ggaatagtgc 14460
ctacaccgcc aatttggtcc tccatgcgtg cctagcgtcc taatgttcgg aaaaagcggc 14520
atgtacaatt gatgaagagg gcggtggacg atgaatctct tcgtagattt tggcgacccc 14580
acctgctccg ctcgtctggt aagtgaggat actcccaaag ggcttacggt cattcacggc 14640
tcaagtatag gggttttgac atcgggcgac tgtatgtcca gagggatgcg gctttacatt 14700
cagtaggccg agagccaggc ttgatttaaa gacacatgtt accgcagggc tatctggatg 14760
acgcttcttt cgacgtggga tccatgagcc ccaaaccgca gcccggcctg cgatagcact 14820
gaggggtgtt tcttcgaagt ctcgagcgat tagaggcagg gtaagagccc ccaccgacgt 14880
gatattctct gtgtgcattt cctattaact gattgcaagc tgaacctccg agtaagggat 14940
gcgcgcagca acagggaata taggtttgat taaaacgaac tggcagtgtc caaagtcttt 15000
ccggtgttcg atttttgcgt tctacctgcc gggctcgccc agaagccttt ctctagggaa 15060
gaatatgcct gtgtatccag ggatacaaat acgtcaaaac tggcgggcca gtttgcatga 15120
tgccgcggct atttttcgtg acagactgct cctgtcgtta gacttagcac atataaaatg 15180
cttgaagcta ccctgtttgt acgggatatt ttcacagaga ataacgcccc gaactatttt 15240
cgctctaaaa atcctgtgag ctgaatttgt catttttttg gttatgggcc taacatcgcc 15300
ttacccgtcc gacagtgcat atcactcctg tcgtgtctta aacctatgta gcgcactacc 15360
ggtatacaac attaacgcct taaagccaaa taaagatcga aacgacaaga acgtgcgtcg 15420
agccagcatg cacgggagcg gtgattacta ttccgccaaa ttgaggcaat ttgccactag 15480
gagtttgtta atccctcgaa ctagtaagac gaaagttagc tcgagatcgt ccaaacataa 15540
gcaccgtcaa tgtccgtcaa ggtatcaaac aaaaacacgg tcaagaacta cgtctcggtt 15600
ctcggccgac cccgagtgcg ctagaacgca gcgtggcatg acatatgatt tatcttagcg 15660
gggctatttg agactacccc ccaggtccta ggtgaaaggg ccatctctct cgtaaatctg 15720
tgaaaggtac gaagatttta catcgcgtgg gcctgacctc actataactt tgtcggttat 15780
tctgacgcag cgttaacata acacgcaggg gttcgccgcg ttccgtgcta cgtcgtgggg 15840
taggagaggg cgggttatcg caaaggattt gtacagtaga ttcacttaag atccgatatt 15900
cgcgtcaaga gacctatatc tgtatggtaa ctacgcaacc ttgatgtgaa ctactgagtt 15960
tacgggtgct tcagtgacta aaggggaggc gcttggagag gtgacttcat taggacccag 16020
tacctcataa atagggattt aacggacgcg ttggccgtga tgcgccgtcg gaagattatg 16080
cttatctcta gccgctgtgt taattgggag cgttgcaaaa atccgacgcc agtctcctaa 16140
tcagaacaca cttactcttg gaaaaagcta ggcagttcgt actcaggagg atcaccccct 16200
aggtctttcg gagtcatggg catccgaatt ggctataagc tcccgatatt atggagctca 16260
actctcactt ggatcgacgt ctggtagtaa agtaataagt gaggagcctc gtggtgtgtt 16320
aataataact agtgtgctga tgcttcctgt tgtttcatag aaccggtgta actcaataac 16380
tgggacttag gggtgtgatt ccgtgtgtcc cgattaaata caatagacat gacagcctta 16440
gaggtggact gaagtagagc aggtgcccgc gctagtactc agcctacgct acgagactaa 16500
tggagtccct ctatggagga cacaatgcac ggcgacgtga actccgtgga gcgcttgtcc 16560
gtgttccaag accctcctct agtgattcaa aaatctccaa ataattgatg gcggatcaac 16620
ttgccgtatg cctgtggata cgtctgcacc cgaagcgttg aggcctccct atatcttcga 16680
actaaatagg catgtttgta ttagaccact agcgcccggg gtcgtcatca taacatcatg 16740
accgtatgtc cctcaaccag ggtggggttt gtagcacacc gaaagtagcg gattgctcgt 16800
aggcgggtat ccctccggga cgcaataatc tctctgtctt tagtactgtc cgggcatatt 16860
gttaaaggag gcagattaca ggaacctgaa ttgcagttat gttctcagat aaaagtaaag 16920
caagtgggga gcttacgtat tgctgtcgac tactgtttga attgtctcaa aaactgaagt 16980
gtcagctcgc ccaccgggta taacggcatc gccggtacta ccacaaaata cacttaaaaa 17040
ttcctatggc gcgttcagtc ttccaaagct tttagacggt gcctatggat aatcgagcat 17100
cacgtgggga attaccttta cacctggagg ctacggtgca gagtgagaaa gtgtgaccct 17160
ccccggtctg agtggcgcct tattctagat actcatctgt atgtcgcaat gagtcagcgg 17220
agcgggcatt taattatttg cgcataacga gctcttacga ggattactgg agaacaagac 17280
ttacgcgaat tcctccaaga taactaaatc gcaggatatg ccgcaaaaac tctttactag 17340
tgtgacatgc agatctctct cgacctaagt tctgaccgcc catccagcgg actgttggag 17400
gccccaatct tgagattctt tacaatagga agtcttgtcc gtccctataa aggttgttca 17460
ctgacggccc atgcttattc gctctcgaac agtatgctta tgtcatcagc accagaattt 17520
tgttccggtc gcccttcccg tagtgtatga cagtggccat cgcgggggga aatcctggta 17580
aataacttcc aggttccttt taaagaaatg aacttacact ccactctcac cgattcgtag 17640
ctctgatcag tgtggcgaca aaaatgccgg cgcgaatagt tacaggaaaa tgattaggta 17700
gcctatattg aactggcctc ctctatgtcc tgtccgcacc ggtatcaaag tttgagacgt 17760
atcatagagc cttgtagtat cctggtgggg agcgatcatc ctcgggctct agcccaacaa 17820
cgctcgcaag gtccaaccgg tttaaacctt ggcgcatgcg ccggttgtgg acacacgcag 17880
accgagtgcc tgatgatatt acacaggcca tgactcttac gtggcgcact ggatgaagta 17940
attacaagac cctcgattgc cgtatcattc actagcaatg gcgtgccctc tctaccggtg 18000
tagttttacg cggcatggcg gagctacata cagcgtcgat ccctgttaga ttatgacttc 18060
agaaatgtca ttcctattac agtcgatagt acttttttac ggagcttgaa agctagccac 18120
tacgcgaaat agctaacaaa cccggtttga ccgaggatct caggccttgt gcccgggtct 18180
atctgaggag cggggaccct atcaacgcaa cccctagtct ttggaatatg cacgcagccg 18240
taaagcaaaa gtggcgatgg gaaatcatcc tggcaagacc gcgctatgca aacgaacttg 18300
cgtgtattgc cccaaaagac actaaggtgt gttggcaaca tgcaccgaaa ttatattagg 18360
acaagtattc tactttagtc atttacgtta taatccctag tgcgtaggtc cgtcatgctt 18420
cgggcctgcc tttcgcgagg tatctcttga tccgagacta gggcccttac tagaacccct 18480
cagtgcggat gttagatgta cgagtcagct ggagtacctc cccataatcc ccttttactc 18540
cagctacgtg gagtctgtga ccactcgtgc ggggctcagc catggggcaa gataaaacga 18600
ctttacgacg gctcttggat ggcagtgagt gagctgttgc gtttgacccg atacttcgta 18660
cggtgcatag cctgccgata caatgggggt ggacgcagtc ggctttgaca atcgataact 18720
ctttaggtat gtggtttcaa caatgtccct ccatagagag ccgaacgctg ttttctggtc 18780
tttctcaagc ggatagtgaa tgggtgctcg cgggttaagc aaggagggaa ccgtgcggtt 18840
ctacgtcact gattttcctt tggctgaggc cctcttgatg atgtgccatt gcgggtgggg 18900
gcaaactatt cgtgtccaag gacgaaatgt ctgacttggc cctacgccga gcaacaaaac 18960
ctcaggcacg gtccggccta acatgtaagt agttgtatga catgtagtcg atacaacttt 19020
gtcgcatatt gcgttacgga ccttataatt acaccgtgat acagcctcat tcgtccctca 19080
aggacctatg cttatatcca agacataata aagaaacggg tatctggagg ccccctcgcc 19140
gcatatatag atgacaagta ggcatccatt ttcaagtaaa aagtccgaaa atcttgctgg 19200
ctagagactc gtctctctgt ctatgggcta tttggtgtca atcaccaaaa cccagtgaga 19260
actctggcat aatcggaggg tcccagttgt cttataggcg ctacagctag agatgtcaaa 19320
ctgtgttcat gaccccagta acctgctgtg attgggaatc acgggtctat atcacatcgt 19380
gatgcgatct cacctgttcc tggatcttag tacgtttcgg gagggtcacg cttagaagca 19440
agataagact ttaagggcgt cggatttata atgtcaagtg gctatagaca ttctggaaga 19500
cgtgcccaaa caggggcagg tcagtcccct gcatctggat tatcatcccg tcaagattcc 19560
tcgaagtggt tacttccatt agtagggatg aactagccgt ccagcaacag attatggatc 19620
ttaggcgcga tccttgtggt tcagtacccc agcagtcttc gagttgtaca gacgccatag 19680
cgaacactaa aatagctata cctatagcgc ccatgaagga cagcgagaag tcctacgtcg 19740
catgcaggtc ttccacccgc ggtatgtacg tccgcctgga agggatgcga tgaaccgatc 19800
tgtgcggtgc tttaaagttt cattacaggt gatgaaggtg gcgtggcatt gcttctgagc 19860
ccgtaacccc cgttggctta gccggagaca ggatgcgggg ttgaccaaca gacaagggtc 19920
ttaaaagtcc gatgcagggg tggtggggac tacggaatga gccgtccgta gaggttatgg 19980
ataataagcg cacccctccc tcatacatac ggagggtcac cgagcttggg tgcgttcttt 20040
cagtgcgact ggttcgggag aacgctttat attctttaga gcccactcca acgtgctgag 20100
tttgacggca tacaaatggt gcgacttcgg cgactagaca ggcggctttt cgtccactca 20160
ggtttacaag gggtgttcgc tgggacgcac tgagtgaata gtcggccgtc gtacgttact 20220
cccagtatca gtctgtgacg cctacactac gaagatggac tgcttgtgta catggttata 20280
gaaggcttcc ggtgttcaac gttgtgtact ttcgggagcc ggcccgggtc ggtcgactgg 20340
catgcgcaat tctcaacagc atcagcgcaa ctctatagct ataccagctg cgaacgaaga 20400
tggtagggat cgctatacat tgtcagtcaa tgagtgaact ggccatccaa aaggacctga 20460
agggtgatac ctaccagaag gcgtccctta tcacataagg gcctctccta atctaagagg 20520
cgtgaacttc aaaaacatgc cggtcagagg tgttatcctg tgtcgggctc tcgcctgggt 20580
ggctcgacaa tctgaggggt cgtcgctttt tcgggcaggt ggggcggtcc gccagtatgc 20640
agcacgcgga tttcacgctt ctgaaaagca agctcacatg ctggagcatc tcgcagttcc 20700
cgtttagctg acagtgattt cgcctggtat gtaaagttcc cgatcaagac ggtcaaacgc 20760
agtatagcta tggagtctga ccaccttcct gtaagtaagt acacaacgtt gaaggttcct 20820
gaacgtactc ccgcattttt tccgggactg actttgactc tagagcatct agtggggtga 20880
cacagaagtt gcgtgaatcg aaaccattcg acggaatgaa agcaacgctt ttcatatgtg 20940
cactgccgag ttaggaaccg tcgggtcttt ccttggcggc aacattagga gcaaagtttt 21000
cggttaccgt cagaagatgc caggtggccg tccctctaag atgtattatg acccaggagt 21060
gagcatcccc ctccccgaag tgtgtacgat atctatgacg tacccctaaa cggaaacctg 21120
gatggtcgta ccacgcgcct ctcgaaggca ggatcactga cgccccaacc caaattcggt 21180
gaagagtcta cgcctatggc tatgggatgt taagagttgg tgtttgtata tgatttactg 21240
gcgatgcttg aacctcccgc tttgacgcgc gtggcatctc attctgggga ctaataccac 21300
aatactcacc tgatgccgcg aatcgcgaac ttatgcgaag ggagggcgat ccccagacca 21360
ctcgaatcgt cctggagctt cgacgacacc tttgggtagc tattgatggg aatgcatgga 21420
ggttccactt tattgaatta ctcagtcgac ctagtcagtt tacagtgtgc cgacgtctct 21480
tctagaacgt tcaatttcgc caatgagatg cgcaaatcca aacgcggtgt aggcctacgg 21540
tcgagaggga ctcgatgacc ggctctctcg ttgaatctca cgagtaatgc acctagtccg 21600
acatttcatc cacgtgcaat catcatacgc ccctgtattc gtgataggga tcctccgggc 21660
gcctaccatc acaagtgcat caaggcgcca cgttacctgt tccggcattt gtcgtcgttc 21720
tctgcgaact tctaaacagc cccggctcga gggcttatat agggcctcct ttgctttata 21780
ttatcccaag tcagactccg aatcgtcgaa agatgctgca tacccagtga aaggagttat 21840
tcaactgctt ggtttcccat tgggtacagt tagctcgtat cagcagttcg gctgcaaagt 21900
tgctcttatc tagtaagatt ttctcatgta ctatgaagca aggtaattgg gaacgtcttt 21960
agctgagact cccatcccca ccacgaaatc cccaaagtcc agtgctttgg cctatccgtt 22020
agcagcggga tcgtcggacg gttaaagtac atgctagttt tcctgcagac acctggttga 22080
cttgtcgtgc aacaggacta gtatttgttc cttgacgtgt ctttacggta cgtcatcgag 22140
acaaccggag gagactattc accaaacgag cgcatcgtgt cctcccttat gctgataagc 22200
acgcaagcta tcattacacc atgagattcc gcttgcttgg ttgcgtacca ataccagttc 22260
tcatcttcct gacactgatg agctggttat taagtacagc acgtgaccat cttgaaactg 22320
cttcgctcgg aacgccgaca cccatgaccg actagaggat ggtagaagga tgtgcgcacc 22380
aggctcatgt aacacggtgg gtgttttcta ctgattgacg gctggattag cctcatgact 22440
aataaacctg ccaaggcggg gggagttgtc gccctacgcc cgtcatttga ccggaccacc 22500
gcgagtctgc ctttcgataa ttatctatat ttgtcagacc ggttatgtta ccgagattga 22560
gaacttaatt taaccctagg tgtaactaag taacagcaac tggaccttct tcccaaacat 22620
taggaacgcc tgaactagtt agcttgagtc ttcggccgag aaagcgagct aggatcggcg 22680
ggcgactatg tactaacgac aaaagggata tggtatattc aggtagcagg tgcctaggcg 22740
cgttgagccc gagtaagtac attggccacc tgactctcct ttagtcggga aacaatatat 22800
tgaatcttcg ggtctattca gcatccggtc aagggatgcg aagctttata atgcgggtta 22860
tgggagttcc agttgccgta taagtgccac caacagtgat ttagtccttc gattctgaaa 22920
ctaggacatt tggctgcgcg gtagcaccta tgtgtttact actggttctt agggccaacg 22980
ggtacagaga tgccgacaga caatacactg ccgaaaggtg cctaccgcgt ctacacggat 23040
cggcgccggt gtaaataacg aaaacgactt gacattaata catagagttg attacgcggt 23100
tgtaatcacc cagatttctt tcacggcact tatgaccctg cggaagaatg caactacggt 23160
acgggaagcc gcgtgttctc gtaagcactg agcttcgtta cgcgatagag aaattagggc 23220
ctcacgatat tctcgtcaag gattaccggc gacagaccct ataaatgctt aaatacgttc 23280
tgcactgcga gcgtgcctaa tacgccctgg tgttatagca acgatctttc gtgccgaaat 23340
gatatggacg aagtgtctaa tacaacaaca aaaaacttgt gggaaagctg acgcgatggt 23400
gattgacagc taaatgccag aatatgatca tacctccctg gttctgttgg tcgaaactgg 23460
atggtctgga gtgctgagct caacgttcca aacatcctgc agtagaaggt accattagga 23520
agcaggccga tcctcgcgta gtgtacagcc agaccaaaga ctaagcatac gagtacggtc 23580
atttataagt ttgcagtgcc tggcaatggg gaccattacg cggcacatgc gatatggggg 23640
tgacgccttg cttaggcaat gagtcaacct ctcattactc cgggcattac cccctcatac 23700
gaacatccag ctgaaactct agtcattggc acggggttaa gtagtcgtct agtaccacct 23760
aaaagcgctg gaaggaatac tataatattg gaaagccacc aggaggaaag ctagacacgg 23820
atgtgccgct tgtagatgcc taacaatatt ggtatcttta gggcttacca cctcgcactg 23880
agtcaaagtc tcgtcactgc gtcgaacttc tgtcgtgtag ggtcacaatc taagatgtga 23940
tagagccctc accgcctaca gtcgggaccg cctggctagc attcgatatc tgatgccggt 24000
actcggtaga ggccgtaaaa cattacaagc tggagagcat cgcgactact tgagatctgt 24060
ataaggccgt ctatcggatt acaaggcgtt cactaattat ccgtcgcagt atgtcaatat 24120
tctaagcggt tcccccacgg ctatttacag cagacatctt agagttcgct ggctagattg 24180
attacagcac gctcccacgt tgcgatggac gtcctccgac gctgccgatc agtgaaatga 24240
gatcccttcg actttggtcc tcctagtctg cttatgtcca gcaccaatgg taccgtgatg 24300
tgagggaatc taaagagata tcatcgttta ccctgacgta aagataaggg ttaagaccgg 24360
aacagaccgg gtgaacttaa tgcgcatggc cttgccgcgt ctcataggcg atcctccttt 24420
ccgagcgcag ccacgatacc caattgctgg tagaccagtg gggctacgca aaggtagact 24480
tttagtctgg ctttgtccta gttttcaatt aaaagcgggg tccgccgaca caactcccag 24540
acctttagag ggtcaacatt tgtcagtaac tggaagcacc tcatacttga ccgcgatcac 24600
caatcggggt acggtaatca tccgacaatt gatgtgttcc tcatcaagcc agcgaccccc 24660
agttgagacc cgacctcggt cactggcact cgggacgaaa gaataaggct tagtggacgg 24720
ctaatcgctc accaaatcgg gtcatagatc gctcccgtct gcgataccga gagcgcatat 24780
ttcgcatgat atcccccacc cgttacttac cttgcggagg ctaaacatta cggtacctcg 24840
catattgcaa agcgtgcaac tggccatgtc accgatttac gcacttagga ggccatgagc 24900
tcattcttat gttttcttag atgtggattc atgctacacg gggaaagatc gacaaatcag 24960
cggatgcgca ctcagtcgct ttgggctttg tcacaagtgt gatccggcta cggtgcacag 25020
ttcgtgcaat gcgatcggcg catcctgggt taagaattcc caggacgatc agctccagcc 25080
agcaaataag caaatcgcat tcggaatgga gtaacacgcg caacgaattt tggaactgga 25140
atgagatcga acaaacacta gggctaatgc tgtagtcaac ccttaatgag acatgacctt 25200
gcgtagacag gatggatata gcgaccacat aaagcggggt gtcatatggc ccgaggggcc 25260
agtggctgca gtaggcttag ttcgatcccg gtgcttgcaa tagtctctcc cacggtctat 25320
aaatgacaca gacaagacat cgacatcgtc gaagataaag ggcggaaacg atggcaaact 25380
ataaagctta tagtcgacac tttacgtgtg ggataggata gtactcaaaa tgtacgatat 25440
cgcttcatca aagctgcgcg agtccactac aggcgaaacg aatcccgcgc caaccgccta 25500
cgccgaccgc acaggttgcg gtacctatag tgaaaccaga tccgttctac acgcctgtga 25560
gatagcttgg tgtttaccgc tgggctgggg tgtaggaccg atagaccctt tgttgttggc 25620
gatttactgg tactcctaaa aatgcctttc tcaacgcatc acacctgtga cgtttaaagt 25680
gatgcatctc gcccaaatcc tagtaatcgt cggcttctct atgttaacca cactgcacac 25740
taatgatcgc tgtgacaaga cctgcttagt tcataccgaa agatcgccgg caagggcaag 25800
gaatagcacc tcgccaggtc cgctcaccta gggaaacgcg tatcgcatga agttcggtag 25860
gcgcatcacc tgtagaaact gccatcgggt cagggttacc agccgagttc tcgcatgtcc 25920
cgtagcgata ggcatccagc aagtgccggc tacgctgacc ggtatagaga ttatggagtc 25980
acagaatatc gtggggcaat gggccgaacc cagataaagt atccagggag ggtaatcttc 26040
aaacctaatc ccgttcttac gctagtcgtt gtagaacgct gctggatgat attgatgcca 26100
acaccgtcgg acgactggga ccgcagtgct gtattaagct ctatttaatc acattcaatt 26160
cacaatgttg atcagtcatc tccgcttgac cactagactt atgcagggca gaggaacatc 26220
gctaaggaca ctgtaatgtg cttagccatg aacagttcct agttcacatt ggcgcgcagg 26280
cgaccattgt aatcctcgct aaaaaataag tatgtgcctg aggaaacgaa gagacattcc 26340
aaaaaacgga attttgattg caacaaattc tgccgggtta gtagaaaaac accgatcgtt 26400
tcggtagttc aaccgttcca gcactcggat attcagtggt attctcttgc ggggttaaag 26460
atacaagctc gctattagat gaggaaccgg tgtgcactac ggctgtcgta tggtagaagg 26520
atatgccgca gtgctccggt ctcttttagg cggcactcag gtgaccaccg acatagcttg 26580
attgtccggg acattgagag gtaggtcctt gtgatccgtc tcggagcaat acgtcctcga 26640
ggcaatgggc ccccccgcgt acccaggggt tacggccatg gcctggatac tgaatttgaa 26700
aaccttttac aacgcacggg gggcatggat cacatgccat tttaccaaga ggatcttcac 26760
aacccccgaa caaaccaaga aaatacgaca tttatacgcc tgcccgcgta gatcactggg 26820
cttgttttct gcccgcactg gctgtacgat actagttatt tattttaaag ctctagactt 26880
cgggtcatta caaacaagcc gtgggcagga ttgaagctac cgcccaacta tacagtctca 26940
agcctgaccg tcttgtgtaa aaacattcca ctacccttcg gatagccaac gtcccgtatg 27000
acccccatag ccttagccaa ccaaactgat gggtgcttta ccaataagac taatcggaca 27060
agctccgttt ttgagggggc gatgagtatc caagactgca cgattgattg gggcatttct 27120
gaagcatgct gagatcaacg tgtacattaa ctatgttttc acctaatcgg tacgagttgg 27180
agaagttcca caattaaaag acagcgaatc accctagtca cgtatcgtaa gggttgaatt 27240
actacgcgta atactgctta tccacgccag ctaactccgg ttgtcagcaa ctttatgcaa 27300
ggcgtattgg attccatcac ctagcccacg cggtagaaag tgtaacctct ccgtttttca 27360
ttgtaggtgg aacatgcagc cgcccctccc cttagcgcct caccctctat ctgcagatct 27420
tcgtacagct acgaccaacg tcacatgaga cgaatgggga agttggcgga tttgcgacat 27480
aaaccttgaa ttacgggtgc tccagcgaac gggtgctatt gaactcaccc acgaagtcct 27540
atgcatgtaa atggactgga gtctattggt aagctctcca caggcatcct gccatcggtt 27600
atcttctgga ttctgtgcat cttcatgata agatactgaa aagggatagt atgatttatt 27660
taattataga gcaaaattcg agtcaatagc cgaaagtctg gcagggctat cctctccccg 27720
gctgcaagac tactactgcc aaaaggtcag agagaatacc tacaacatgg aataggacgc 27780
tcttatacgt gaaactgatg cattgcaatc tcaagtaatt agggtcctgg gaaactcagt 27840
tcggattttt cccactcccg ccgccagtat atggcaggta agagcgaaac aggccttggg 27900
ggcggcttgt caaaaggttc aggagtctgc acaccgtgtg ctccgcttgt ttctagcagg 27960
aggttcacct agccatggcg acgatagcta gcattaataa ctgtgtgcta gcctggggcg 28020
aagcagactg tgtagcatcg gtgactcatg ctcgaaatca cagccactgg gtcgaggagg 28080
ccgcgtcaag tcggcccgag ggacgtgggc tcccggtggg aatcaagggg tagagcaaca 28140
taacatctac actcacatca ggtcctcttg acgtacttga cgattcggct tcaatcacta 28200
cttcgttctt cacaaataaa gccactccag tagcgcacct ttcacaggct aagttccggt 28260
cactttatga catagatcta attgagtatg gttaaagagt tcgaataacg cagaccttac 28320
cgtaccttgg agacgaacgt tgaatagcct agggcccacg ggatggagag gttgacggtc 28380
tgtgctatac atcagcagcg gttagcaatc tctttttttt atctgatgct aatgtatcca 28440
gtgcgaggag ggcgcgagtg tcagaataaa catgggttgc ccacttatgg attgcgagtt 28500
tctagacgtc aggctccaaa ggtatgcccc tatgcttact agcatccgca cacggcgccg 28560
ctctgtggca accgctattg caaatcctat taccagtgat ttgtgaagtg ttgcaagatc 28620
tataaggtca ggctgtatct cctggcctcg aggattatgt gacatgggca cggtctcaca 28680
tcatactacg accgcctata atccatcagc gtcaaatctt gaagacgtag gctacacaac 28740
acttgaactt actgggctag tccgcccagc ccttctcgta ctcaaggcga gctcaggtta 28800
tccgcctgat gagacgagat gcgcgtgagc ttaaagccgc ttattgttgt gggtaaggat 28860
tcaacgctgg ggtaaatcag tcatgaatag gcagcttcga gttcctacgt gctctgtcga 28920
gtctaccgct cgcgtgtaat ccatccgcgc ctgtcatatc tactatgtta aagctcttta 28980
ataaatagtc tagcgaccgc ccgggcactc tctagtcttc cctccttcga agcagtaaac 29040
ataactttta tacaccaagc ttcgaattac cggcgcagtg gcgactattg ccgctaaggc 29100
tggaggtgga gagagaccta agactttgtt caagatgctc ttcgctgagt tcttaacgca 29160
agagggccta aagctcaaga cacacacttc cagaaaaaag attcggtttt tgactcccgg 29220
ccaacccagg atgggctttc atcggtcagg aacgaatctc gacccttgac tgtgccgatg 29280
taggacgacg atggcgtctt gcgggacgct gataactctc cccagttcct ccgcgattat 29340
gacgccctcg attctttatg acaatccagc gatgagacga gtctcaatga acacgcatct 29400
tattggacaa accttgtcgt gggttgatgg ggtacgtcag ttcttatgat aacagtttgg 29460
actctacagc cagattaact ccaacgcgaa gatgtcacag accagccgta ctatacttaa 29520
cttagagaaa tttcagagca ggaggcattg ggtgagctgt gaagtgactt tggggggccc 29580
gaacattcgg atctgggctg tttacattcg aggtctgtta tagcaaaagt gatagagagg 29640
ctggcgttcc atgtatcaag tgatgttgtt ttagagcggt ttcctagcca cgacatggga 29700
ctccgcatag cgggtggttc agcgttttgg ttctagcgga gactttgtgg ggctttttcg 29760
tcgagcctca ccacccttcc tttgaagtca gtattgtgtg attatagaaa cggggtatcg 29820
tcagacacta attagtgcag ctcgcgggtt cgggacaccc atacgcaaac cgaaaattcc 29880
tgggggccaa ccacgtatta cgcactgcct tctgcgatct ggtagacgac ggcgaatcgc 29940
tcgcttaagg agtcccggcg aacatccaaa aacaccttac agagactaat aagagtaccc 30000
tctgggcgct acgatcttta ctgaagtcct ccagctacac gaagtctgtg tagcggttct 30060
tagttcggac cagggagaga gttatcaaga cactcaatgc tagcgggacc ttctcgtacg 30120
gggaggtctc agcaaattgc tcttgtcact ggtgcggagg agattcagat tccgggccta 30180
ctcgagttcc ggtcatcttt ccgatgatac aggtggggaa ttcctacgtc gtcaacctgc 30240
tgcgtgaata ctttgtgtag gttaggattg cctttcagcc ggcgacaccc caatttgttc 30300
aatggacagt ctaaactcgg gcaaagctag acatcgtggg ctgactgtgc cgtgaatgca 30360
tacactagga tctacccttg gctctgcacc catgaaggta tcagctctgt caccggtagt 30420
acttacagac ggcgatacta tggccgatga aatatcctct ctcatttata gagggactgc 30480
cagacagggg tgtaaaccta aaaatgccca cctcacaact tcacccaagg agggagaggc 30540
gcaggccgcc ccgtaacagc tacacgatgg agtgtccggg agcgcagcag tttcttcaga 30600
tcggtataca gccgatgtaa tgcggccgaa tcataatacg ggaagagatc cttcgcccag 30660
cacattcggc actcgcgagt ggggatctcc tggtggccat ttctcgatta gagtccttgt 30720
ggtactgata tttcagagtg ttccagtggc gaaccataga gtcgcgtcag gcgtaccttt 30780
actaggctcc gaattagggt atcggaccca gctcgcgcca tagacccaat gaggcgtata 30840
gaacaccatt aaaacagctc aggccaggtc ctgaaggtaa atcgggttgc gaaaggaaac 30900
cgcaatctag agcgggagga accctctgat gcgaggacga tcgcagattt tagtggttct 30960
tgagacccta ggtcctgggg aagtactgta gtggtatggg ggggtgggcc tgattctgat 31020
tcaagactag gggctgactt cgatggccct cacgacctaa aaaagtggct cgtttggtag 31080
atacggagac cttttccgac ctttctgctt ccctaagtca gccacaagac cgtctccctt 31140
tatatttttt cgtaaaacct gatacctttg ccaacgcggc actcgtcaac cggcaaacaa 31200
gaccgggaag ctttcatgtc tcgtacgctt caatgcctcc cagagggcag cttttaacat 31260
gagttttcta cagggcctga gtagtgcact acgcggcagc accttcacct cttgacgaaa 31320
gcgtacgaat tgatataaag cacccttggg cagaatatct cggcgttggc cgtcgtggtg 31380
gatctgtcgc gcgtgaatat gcgactaaat gtagcctcct ctaggcccct ctccgtcgac 31440
ggtaacatta taaaatgctt ccatacatag gtatcaccgc cggcggacga gttccattcc 31500
tgttatggcc gttttctctt gtccacccgc gaggaagccg ggccttcaga cacggttaag 31560
gatgaaacgt ctcgctgagc atccaactca aattaaaagg atgacatctc aaattgtccc 31620
gcggtttgag accccctcgc ctttgacgaa tacttaccac gctatcagat tgatagctcc 31680
cttgctcctg ccatagcggc gtcgtagagg agtcagctac agctcaccgt actagcgaac 31740
ggcgctgact gtgatggctc caggattgta taaggtaaca ttcaaggtgg cggtggacca 31800
ccttaacgtg cgcgaccagt gacatagcag gttcgttgaa gactggctat aaatcagcgg 31860
ggtactgttc ataactacga ccagcttcgg aataaataaa gaagggcacg ctgaataaat 31920
cacgtgattt gtcggtccat cttatcaatg cttcaggatt cgtggtgtcc aattaccttc 31980
ttatggcgga agccgatgat cctcggaggt gccagacatt cataaaaatg attagaaaca 32040
tccgtctatc ggtattggct ctccatgcct tttcggccag gggtcaattt cacgagtatg 32100
caaaaagaat agggatagtt tatcaagctg gcaggcgtgc tacacgctag atgcggttcg 32160
tccggaagag acccgggggg actggtccag gcgtcatcca ggtcgtcccg gtggaaacta 32220
tggaaggaat acgtaaatca attcgcccct caagggcgga aggcccaaca tacacaggat 32280
tggttgctca tccatggacg attacgatca tcacagtcgc ggcttctggg gaccgcctcg 32340
cagcaatcct cctctcagtt agccggatat cgtacaaacg atttcttcca atcgaggtgt 32400
taccccgaac gtgctgaatg ccacagcagt ttttcagtgt cgactcctaa ctatactcaa 32460
ctgccagtga gcatcggtgc tacgcaaagg tgtcgcaggt ataaatactg aactagccac 32520
ccggggcgat aaccctcgcg agttaacctc gatgagcacg acgggtatgt gttgtcatcc 32580
cttatcgcca ttggcccgtg acctcccacg ctgctacttt ggccagtgct catacacatg 32640
tggaccttac gagtcccggg cgatttatgc gcctgtgttt tttcagactt atgatttatt 32700
ttatatcgtc agattgaagt acagctgtcc tgcttctgag tcgagttgtg tatgccacag 32760
gcgagtctag gatagcctcc aaacgcctcc caatctgcgc acgcagatac ctctcgaccg 32820
ggtttggctg tcaccttgcg tccgacccgg gtcggtaaca agcccccgtg cagagagata 32880
gcctattgca ctctctcacc caacgtgctg cctccgcgcg attctaccct acactagtac 32940
tggccgaata gcggcgtaat ccgcacatgt gacgctaaaa tgctcagatc acttgcctcg 33000
aagccgcatc gaaaatccta catcccaccc taaagggtcc ttcgtgttta gtcacttgag 33060
acgcattatc cgcgcatatg ttctataaac ttctatttga gtgctctcgg caacgttagc 33120
gttgccggaa ccagaggtcc aatggggaat taggtagcct agaaggagaa ttacttaata 33180
tcgcgggttt ctgttgggca gcgtacgatc ggcgtaacgt acacgctcaa cggatggact 33240
atcggtccaa cggggtaccg ggagcttggg gaaatttttc catccatcgc ctgagttata 33300
caaacctgct attggaccat tgaagcgggc attgcactga tgcgtatcca agcctgaaac 33360
acacgttcgt tcgtcaaata gcatgggcta cagcgccgaa cctgggtccg acggcccaag 33420
gaagtgtcga cagactggtg aggaaacgac ctacgtcagt cgccaggcga aatttgccgg 33480
ataccttgcc tgactatgga gataccgctt atttgacggt cttagagcga gccgaacgcg 33540
gactgtgccc tctggaagcg acaacccgaa atagaagtac cacgtatgag aaagcacgac 33600
tccaagtaac cggttttctc cgtatcgacg cagaccgcca gatatatcct tttcttatgc 33660
ttcagaaagg agccccacac gccgtatgcg tggtgggagc acgagtggag gcttagctta 33720
cgtgacgtgc tctgttaggc caaccagcga ctacactagg gttcttaaaa attcttagct 33780
cgggtcgacg acgcacccaa catagtacgg tccttttccc acgattgaaa aggctgcgtc 33840
ccagcgccgc atccgaaaag gcaaccaagg agctgcttaa cagggttacc gtctccattt 33900
cggattcgga acctaccgag tatagctctt cacccggtcg gcgcagaagt cctattgtag 33960
cccgacgcca ggtcacccgt atagagttgc aactgggagt aggccaattg cagcatccgg 34020
aaccgtccaa caccagggat tcagtacccg gtgtggatat tcgggaggct ctagtttgaa 34080
tgctacagtc tcaagatccc gaagaggacg gctgggtgct ggcgttgggg tttagagcgc 34140
tgatcggcat tttgccggat tcatagaatc aaatacgaag tttggccgcc tccgcgtccg 34200
gcttcgcaac attcggggtc ccatacaccg aacgtttatg ccctccttac cagattgggt 34260
gccaggactt cgtttatcta cgacgtgggt taatcggcat cactcaccgc ggcgaatccc 34320
gttatgatta tcttaaacat accacggctg ataaaacgcg atataaatcc ccaccccgag 34380
atactctcct atctgaggcg ctactgtgtc cgacatcaat acgtagtaac cagagggaat 34440
gggaaccgtc ttagcattat aaagagtatc gctttaccgg ttctctgcga ggaggtcgcg 34500
tggccgctta ctcaagggat ccgcggtcct tcttgagaag tacccgttta cgcgatattc 34560
tgtcatcgcc attgagcgat tatgattact aagatgcggt cctcgtcgga gcgactctta 34620
caccacaaaa tatctgttgt ctgactacag gcaatagtgg gtatcttaaa ggcgagtccg 34680
attgtataga ccgaataatt ttgaactcta cctcgcgcag tgactcggat aagacacacc 34740
gcaagtgcac tacttcccta cgaggggtcc cggttcccgg gcatcgaagg gtgagaaaat 34800
atctcgcgtg tcccgtcgaa gcagtcgcta gacgacgtcg tttgccacca cgaagaagac 34860
gctagtcaag cgttaaagct acgacttggt aaagtgcacg ccgttagccg ggacgagcta 34920
cgcatgcccg ttaaacacta ccatcgtcgg ctgctccagt gttaagctag ggacgtgtca 34980
ctgaggtatc caacacggca tcatgaaagc ggatctgtct gcggggggta cggttgggga 35040
tacagttatg aaccctgaga tataggttca ccagttcgta gagatagatt attcgagtgg 35100
ccccaaaata cacccgtttg atgtgaagct tcactacggc ttctggacaa cttaatggcg 35160
gagatccaag gatagcgtag gtgatcaact gcttcttttt aaaaagttga ccgagtgtat 35220
ccgcgtctga ggaatagaac cgcatcggga agggttgagc gaggagcgtg ggatgcattg 35280
gcaaaattga atcatcgatt ctcaactctc gacccgtcat ctcgcgtagt gtgaacatca 35340
agcaggcatc acgaaatact tgtaagagtc tctcgtagac gttacactta caggccattc 35400
tagttgtcgt gcaggcctcg gaccatcgca atgttagagt acggtccaat gatgcacccc 35460
ccaacccaac aagttccgac attcaatcaa cgaataagtc atggcgtgaa ccctttagct 35520
cacaacatat tggcagtcct tccatttggt tctgactgga gagtcggctt tacacacttc 35580
ggctgtccgg tatcctgcgt ctcggcacga ttcagtgaga tacgcttagt tcgtgaattt 35640
gtaataagct gattgaaggg cttcagccgg ttgccttttt ttataattcc ttgtgccata 35700
gaacagagag ttgttctcgt taaagctaga caacgttcca acccaaatat gttaaagagt 35760
aaacttatcc gcgaccgact cgaatccaca atcttttccc aagagcatat agttatgcct 35820
agacaggatc ctaggagcgt actttgcacg gtactaattc gccatatgaa aatgtacgat 35880
gctgtaccag gcggggaaga tcaaccctcc gcgtctaggc actcgtccgt aatcaccgga 35940
actagtcgtc ggctgcctgt tctgggagca aactgcacaa gacttcagat cgttatagtc 36000
gattgtggat tcctccaagc aggattgtac ggcgggatgt ttgttatgac tccgctctac 36060
caaatactgg ggtaaggcgg ccaccaccac ctgacggtgg taacaagtta gagaactcac 36120
caatcgacac ctgcagaaga ggtatgcgta ttcatccggg aggagtgcaa atcctaactg 36180
tagctcgtga agcggagtta cgaaaaaaga ttgtggtccg gccagggagc gctacatatt 36240
gagtaactct taccgggagg gccgaaatat tctagtgaag ccctcccatt aggcacggag 36300
ttgaggttat aatgaatgga tgcagcgtaa attctatcgt cggcctaatc ctacactttc 36360
tgcttgttca gtcgcgtcta gaacatacag aaaagatgct ataacagggt acgctttagt 36420
tgcgaaggct tctacggtag ctattcgtga atgactgtgg cgttattccc atgcgctaag 36480
caaggaaggg ggcggcgcct gttttctacc cgagggttta aagtatatca atagtatgct 36540
tacacctaga tgctggaacc taccaccaaa gcgtcgattg gtcggccgcc gcacatcact 36600
ggtagtgcag gtctggagcg aggtttcaac aaccgcactg gtttcgcgct tgatgcggtc 36660
gaccatttct cccttcaaag cagccgagcc aggtgatgtc ggggtgcact tttaccgatg 36720
cgtgtcgacc ctctttccca gtatactccg ctcattacgt taatctattg acaaccaacg 36780
gtcaagaaaa taagacccag acgctacgtg acatgggata ctaagtacct gagtgcctgg 36840
tcgataacac ctgctccacg tgatcagtaa ggccgcaact gcaatttata atggaccaga 36900
caacgacacc ttctgcgttt gccatcaaca cactgtcgtg tttgatgcaa tgaacctggt 36960
tcaataggct ggcaaaggtt caggaactcc aaactctagc ctccccgtcg accagcatgt 37020
aacatcgggg ttctgccgcg tcggagggga ggcttggaac gaaaagcctc tctgaagaac 37080
cgtttatcgt attgacaaat catccgtgca cggaggtgct acgttcagtc ctactttcca 37140
gagtcaaatt tattgcgtct ttacccccta gtgaggcgca attgatgttc aaccgctcca 37200
agacacaacg ctcccccagt ccgtgcgagt tattctactg cagaagttaa cctaaggcag 37260
gccgatgacg gtagtgctcc gacatgtggg acggagaacc tcgcggccga tccatgtcag 37320
ccgtagcggc gtggtatgta ggttattact ccggggccaa ttatttggat cagtcaaata 37380
acgtcgtcct caagacgtgt actctccttt tccatcgtgc caatgctctt tgcaacagga 37440
ctcgaatccc aatacttggc tgatcagaag catcatatta cggacctctc cgctgcatcg 37500
ttatactgcc ttgtgcgccg ctcgagtgga gaggggcacg caggctagcc cgcttctggg 37560
aattataaaa cgggtagctt accagaggat taaaggtcgc attttactat cgttcgaacg 37620
ggtgtcgcgg atgccgaccg ctcgatgccc acggcaaatg caagcaaacc agttcctaaa 37680
aaatccaatg gcgtgcctcg agttgctttt aatagcgagt ttccagggga ctagacttca 37740
ccgcaactgc gacccaacca attgaatccc ttacatagga tgttaccgtg aaactcagcg 37800
gatcccattg tgtctagttg ggcaattaaa aacgagaaac tataataccg tatagtaatt 37860
cctaattggg tcagtagagc gcacatccgg atctcaaatg ttcgttcgat tccttatgat 37920
agctaatgtt cttagtgagg gagacactca ggatatctaa ccactcatta agtagggccc 37980
tttccgttag gattcgtagt gcctatcaca agttgattaa gcgataagag tccctcgaat 38040
gtttacttgc cgacaggagt gtagatgcgt gcgaggtaag aaatgtcgta cgcttacgcg 38100
aacatgagtg ttcaggtgct ccgtaaatac ccactgcacc tttctcccta tcgcttcatc 38160
aacgtctctt aggggctgga tcctgttata aatgggcact cgcctagacc agtcagctag 38220
tttctaacag ccgatacatt cgcttgctaa cggatcatcg tacaggtatt cgtacaagac 38280
cgcttccact gctgagtgtt tctttgtgga agtcgcttag aagcgtccca tcgaggagat 38340
agctggaaat tctataggga cccgaggtta ggacaggcgc agcattctga ctcatgtgcg 38400
tccagattgt aatctagatg gttagggttc caattatgaa ggaagttatt acctacgctc 38460
ttatccatag tccaaggact gttgcttcct tccggacggt agttgcgagt acacaaggaa 38520
gtctttttag aaaatagtaa cagcaggtcc cattacgggc ccagcgcatc accgctaagc 38580
atacaagacg ttatcttagt gttcgggtaa ataaagacaa ttacgagtca tgggtgctgc 38640
ctcattccca ttcattcgtg agttgtctat cggtacagcg actagatggg ggaacgtggc 38700
ctaggacact gagtcgagcc ctcatgggcc attaagggct ctcaaccacc tacgtcggct 38760
tccgccggcg tccgcaaatc gattctctac tagtcctcgt ccaggtagcg tgacaacgtt 38820
gggcaatata aaagactgta ttaagtgcaa gcctctgggg caaagtgaat agtagtgagc 38880
cgaagccttc aacaagttat gatgtagcaa agaggttact gaccacgcgc ccgagatagc 38940
ggacctcctt tgtgtccatg accagtaagg gtcaacgttc cttatgcctt tggtgaactg 39000
tgagggcaca ataatttctt tcccctgaac gacgaactcc aatctacgcg tcaccggact 39060
gtaaaggtga gatcagatcg ttctgagtcg gcacatttag aacgagtctc gctatatgcc 39120
gcggggtcgt ggacgtattt actaataagc caatgatctc ggagcctcct ggccacgcca 39180
atagtctcgc cccccgtatt tacatttgca gactgagttg cccgtatgtc gtgcagcttt 39240
tggaatctaa ctgggaagcc tgtctgcctt tgatgggccc ctggccctaa ttcctattaa 39300
ctgaaaggtt acggcaggcg cggataaact cggcttgaac gttagtacat agagcccccg 39360
tctgcgcaat tgaggcccct gttggagata tcttgtcaaa tcaccgatct ctgggtcgtc 39420
taacgccttt atagatagaa gagacgactc ggcgccggtg cgtagcgttt cgaatgcaga 39480
gcacgctacg acaaccttgc tccgactgag cgacaatgca acttggtgag tggcttagat 39540
taaaatcagg cgtcctcaat tgttagagct cctccctttg taatagggag aggtttggtt 39600
ccgctcaagt aaccgttctc gaagggccgg cttttcgttg caagatacac ccacactgtt 39660
gttactgtaa cataggggag gtatcgcaac cgttacacgg cattccgctc taggggaaat 39720
cttatccttc aagcttgttc cacgagaaag tccgatctaa ctgaaatttt tagaaaaaaa 39780
gaagagggga cgagcagccg ttgttcgcac gtgtatccag caggcttggt ttaggctcct 39840
actcttccat gcgctatcct tataacctgc cttatccctg agtaaattga tacgttggga 39900
tcacagttag aggctaaaga catagctaag gatattgaat gcataaggat atagagagac 39960
gtttatgctt ctatggatct gccaaaagcc agtcgtaatc taacggcaag tcaatgcccg 40020
atacgtggaa aaaggcctgt ctgctagcgc ggctaagatg caggcgtcat ttccccacgc 40080
aagtgtctgg ttgaaatttt ccttatgcca gcggtactaa acccccggta agtttgtaat 40140
tctcattctg agttggcaac gtatatacat ggaacccacc ggtcagtatc cctcaattga 40200
caatggttaa attagaatgt tgtgggcgct ctacctactc ccacctgttt cttcgtactt 40260
ggggaatcgg tctgcaggct cagcatacta tagtatccaa tctcactgtg taacctcttc 40320
cactactcca acgacgcaaa tgtaggatac ccaatccgca taggaagtaa gcgggggggt 40380
attcggagcg ctccaaactg taaggaatca gcggagcgaa tggtatttaa atcgccgcta 40440
cgaaggcgta cctatctaaa agtcagattc ggcgtgtaga cgtatgcacg aacgtgatac 40500
<210> 2
<211> 36
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
atctagaatc aaaacgacac tttatttcca aaaagg 36
<210> 3
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
tattaggatc ggaatccatc tgcaa 25
<210> 4
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
gaacgacaaa ccccgacaag taaca 25
<210> 5
<211> 30
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
ctgtattccg tctgacgaaa attttgtaat 30
<210> 6
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
tgtaatctcc gccacaatgg tttgt 25
<210> 7
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
acgtctccgg atttttaatc cgc 23
<210> 8
<211> 30
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
tttctttggc ggttaaactc acacatctat 30
<210> 9
<211> 34
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
gttaatagta tcacaccacc catatgaggt tagc 34
<210> 10
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
acgtcctgat ggatggagca attag 25
<210> 11
<211> 33
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
tagtttcagt aatgaatact gtctcaagct tcg 33
<210> 12
<211> 30
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
aacgccttaa agccaaataa agatcgaaac 30
<210> 13
<211> 27
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
tccacctcta aggctgtcat gtctatt 27
<210> 14
<211> 26
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
acgttataat ccctagtgcg taggtc 26
<210> 15
<211> 28
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
tcacggtgta attataaggt ccgtaacg 28
<210> 16
<211> 28
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
tccccgaagt gtgtacgata tctatgac 28
<210> 17
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
agcttgcgtg cttatcagca taag 24
<210> 18
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
tcatagatcg ctcccgtctg cgata 25
<210> 19
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
agcagcgttc tacaacgact agc 23
<210> 20
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
tgcacgattg attggggcat ttc 23
<210> 21
<211> 28
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 21
acacagttat taatgctagc tatcgtcg 28
<210> 22
<211> 28
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
ataacagttt ggactctaca gccagatt 28
<210> 23
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
tagtgtatgc attcacggca cagt 24
<210> 24
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 24
tctgcgcacg cagatacctc t 21
<210> 25
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 25
tggcctaaca gagcacgtca c 21
<210> 26
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 26
acctgctcca cgtgatcagt 20
<210> 27
<211> 25
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 27
aacgaacatt tgagatccgg atgtg 25
<210> 28
<211> 27
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 28
ttatccctga gtaaattgat acgttgg 27
<210> 29
<211> 31
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 29
caagggaaca ttatagggtg ttaagagtac t 31

Claims (3)

1. A DNA-based information storage method comprising:
1) converting the information into a binary bit sequence, and then carrying out error correction coding in segments;
the segmentation is as follows: every (2^12 - 1) bp forms one group,
the error correction coding scheme uses a low-density parity-check (LDPC) code as the conventional error correction code and then superimposes a watermark code, which corrects gaps arising during long-sequence sequencing and assembly as well as insertions and deletions of bases or base fragments, so that the various errors arising during genome or plasmid replication, sequencing and assembly are handled; the watermark code has a code rate of 4/5; the LDPC code has a code length of 64800 bits, an information length of 32400 bits, and a code rate of 1/2; the overall efficiency is 0.8 bits per base;
2) converting the binary sequence into a sequence consisting of A, T, G, C, wherein the preset corresponding relation is 00 → A, 01 → T, 10 → G, 11 → C;
3) sequentially splitting the resulting DNA sequence, 10 kbp to 100 kbp in length, into fragments of 2 kbp to 4 kbp, keeping homologous segments of 30 bp to 150 bp between adjacent fragments, and synthesizing the DNA fragments using DNA synthesis and assembly techniques;
4) amplifying by PCR the first 500 bp of the first fragment and the first 500 bp of the Saccharomyces cerevisiae ADE2 gene, joining them by overlap PCR to obtain a 1000 bp adaptor fragment, and naming it adaptor A after verification by Sanger sequencing; amplifying by PCR the last 500 bp of the last fragment, the 1512 bp Leu selection marker gene, and the last 500 bp of the Saccharomyces cerevisiae ADE2 gene, joining them by overlap PCR to obtain a 2512 bp adaptor fragment, and naming it adaptor B after verification by Sanger sequencing;
5) mixing the obtained adaptor A and adaptor B with the synthesized DNA fragments, co-transforming them into Saccharomyces cerevisiae, and replacing the Saccharomyces cerevisiae ADE2 gene at its position with the assembled sequence through in vivo assembly.
2. The information storage method according to claim 1, further comprising a step of propagating the microorganisms.
3. The information storage method according to claim 1, further comprising an information reading step, which specifically comprises sequencing the microorganisms, converting the detected DNA sequence into a binary sequence, and decoding it to obtain the binary data and thereby the stored information.
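
For illustration only, the bit-to-base mapping of claim 1 step 2), its inverse as used when reading (claim 3), the overlapping fragmentation of step 3), and the stated overall efficiency can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The function names, the default fragment length of 3000 bp, and the default overlap of 100 bp are hypothetical values chosen from within the claimed ranges, and the LDPC and watermark coding steps are not implemented here. The efficiency figure is reproduced as code rate 1/2 times code rate 4/5 times 2 bits per base.

```python
from fractions import Fraction

# Mapping stated in claim 1, step 2): 00 -> A, 01 -> T, 10 -> G, 11 -> C.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bits_to_dna(bits: str) -> str:
    """Map an even-length binary string to bases, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(seq: str) -> str:
    """Inverse mapping, as needed when converting a sequenced read back to bits."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def split_with_overlap(seq: str, fragment_len: int = 3000, overlap: int = 100) -> list:
    """Cut seq into consecutive fragments; neighbours share `overlap` bases.

    Assumption: a fixed fragment length and overlap. The final fragment may be
    shorter than fragment_len; a real design would rebalance the lengths.
    """
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            break
        start += step
    return fragments


def net_bits_per_base(ldpc_rate: Fraction = Fraction(1, 2),
                      watermark_rate: Fraction = Fraction(4, 5)) -> Fraction:
    """Information bits per base: both code rates times 2 bits per base."""
    return ldpc_rate * watermark_rate * 2


if __name__ == "__main__":
    encoded = bits_to_dna("00011011")
    print(encoded, dna_to_bits(encoded))
    print(split_with_overlap("ACGTACGTAC", fragment_len=4, overlap=1))
    print(net_bits_per_base())
```

With the default rates, `net_bits_per_base()` returns `Fraction(4, 5)`, which matches the 0.8 bits per base stated in claim 1.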
CN201811377712.XA 2018-11-19 2018-11-19 DNA-based information storage method Active CN109460822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811377712.XA CN109460822B (en) 2018-11-19 2018-11-19 DNA-based information storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811377712.XA CN109460822B (en) 2018-11-19 2018-11-19 DNA-based information storage method

Publications (2)

Publication Number Publication Date
CN109460822A CN109460822A (en) 2019-03-12
CN109460822B true CN109460822B (en) 2021-11-12

Family

ID=65610910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811377712.XA Active CN109460822B (en) 2018-11-19 2018-11-19 DNA-based information storage method

Country Status (1)

Country Link
CN (1) CN109460822B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060734B (en) * 2019-03-29 2021-08-13 天津大学 High-robustness bar code generation and reading method for DNA sequencing
CN110190858B (en) * 2019-05-30 2022-02-22 宋理富 Polymer molecule information storage error correction coding and decoding system
CN110706751A (en) * 2019-09-25 2020-01-17 东南大学 DNA storage encryption coding method
CN110684791A (en) * 2019-11-15 2020-01-14 天津大学 Method for storing information in vivo by using DNA
CN111243670A (en) * 2020-01-23 2020-06-05 天津大学 DNA information storage coding method meeting biological constraint
CN111440827A (en) * 2020-05-22 2020-07-24 苏州泓迅生物科技股份有限公司 Information storage medium, information storage method and application
CN111737955A (en) * 2020-06-24 2020-10-02 任兆瑞 Method for storing character dot matrix by using DNA character code
CN112002376B (en) * 2020-08-13 2024-03-19 中国海洋大学 Method for recording and reading information by DNA molecules
CN112700819B (en) * 2020-12-31 2021-11-30 云舟生物科技(广州)有限公司 Gene sequence processing method, computer storage medium and electronic device
CN113300720B (en) * 2021-05-25 2022-06-28 天津大学 Sectional identification method for insertion deletion of long DNA sequence of superimposed watermark
CN113380322B (en) * 2021-06-25 2023-10-24 倍生生物科技(深圳)有限公司 Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method
CN113205857B (en) * 2021-07-02 2021-09-28 天津诺禾致源生物信息科技有限公司 Method and device for identifying non-homologous regions of genomic chromosomes
CN115197956A (en) * 2022-06-07 2022-10-18 南方科技大学 DNA data storage method and application thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719908A (en) * 2009-11-26 2010-06-02 大连大学 Image encryption method based on chaos theory and DNA splice model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104662544B (en) * 2012-07-19 2018-08-03 哈佛大学校长及研究员协会 The method for storing information using nucleic acid
CN104419701B (en) * 2013-08-29 2019-01-11 天津大学 The quick assemble method of the yeast of multiple clips DNA
CN105022935A (en) * 2014-04-22 2015-11-04 中国科学院青岛生物能源与过程研究所 Encoding method and decoding method for performing information storage by means of DNA
EP3470997A4 (en) * 2016-05-04 2020-04-01 BGI Shenzhen Method for using dna to store text information, decoding method therefor and application thereof
CN107798219B (en) * 2016-08-30 2021-07-13 清华大学 Method for biologically storing and restoring data
CN106845158A (en) * 2017-02-17 2017-06-13 苏州泓迅生物科技股份有限公司 A kind of method that information Store is carried out using DNA

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria"; Seth L. Shipman et al.; Nature; 2017-07-12; pp. 1-14 *
"Research progress on new DNA assembly methods"; Li Lei et al.; Chinese Journal of Biotechnology; 2013-08-25; pp. 1113-1122 *
"The essential component in DNA-based information storage system: robust error-tolerating module"; Aldrin Kay-Yuen Yim et al.; Bioengineering and Biotechnology; 2014-11-06; entire document *
"Iterative decoding of reverse concatenated watermark codes for correcting synchronization errors"; Zhang Linlin et al.; Journal of Signal Processing; 2017-02-28; Vol. 33, No. 2; pp. 144-151 *

Also Published As

Publication number Publication date
CN109460822A (en) 2019-03-12

Similar Documents

Publication Publication Date Title
CN109460822B (en) DNA-based information storage method
Pérez-Cataluña et al. Revisiting the taxonomy of the genus Arcobacter: getting order from the chaos
Antón et al. Extremely halophilic bacteria in crystallizer ponds from solar salterns
Meng et al. Genetic and functional properties of uncultivated MCG archaea assessed by metagenome and gene expression analyses
Lema et al. Corals form characteristic associations with symbiotic nitrogen-fixing bacteria
Cavicchioli Archaea—timeline of the third domain
Sawabe et al. Updating the Vibrio clades defined by multilocus sequence phylogeny: proposal of eight new clades, and the description of Vibrio tritonius sp. nov.
Skirnisdottir et al. Influence of sulfide and temperature on species composition and community structure of hot spring microbial mats
Neufeld et al. Stable-isotope probing implicates Methylophaga spp. and novel Gammaproteobacteria in marine methanol and methylamine metabolism
Breitbart et al. Here a virus, there a virus, everywhere the same virus?
Nercessian et al. Archaeal diversity associated with in situ samplers deployed on hydrothermal vents on the East Pacific Rise (13°N)
Zeng et al. Metagenomic evidence for the presence of phototrophic Gemmatimonadetes bacteria in diverse environments
Bartossek et al. Metagenomic analysis of ammonia-oxidizing archaea affiliated with the soil group
Sjöling et al. High 16S rDNA bacterial diversity in glacial meltwater lake sediment, Bratina Island, Antarctica
Bird et al. Culture independent genomic comparisons reveal environmental adaptations for Altiarchaeales
Garrett et al. Metagenomic analyses of novel viruses and plasmids from a cultured environmental sample of hyperthermophilic neutrophiles
Albers et al. The legacy of Carl Woese and Wolfram Zillig: from phylogeny to landmark discoveries
Reeve Archaebacteria then… Archaes now (are there really no archaeal pathogens?)
Shinzato et al. Phylogenetic analysis and fluorescence in situ hybridization detection of archaeal and bacterial endosymbionts in the anaerobic ciliate Trimyema compressum
Tang et al. Phylogenomic analysis reveals a two‐stage process of the evolutionary transition of Shewanella from the upper ocean to the hadal zone
Shurigin et al. A glimpse of the prokaryotic diversity of the Large Aral Sea reveals novel extremophilic bacterial and archaeal groups
Bhattarai et al. Viruses and their interactions with bacteria and archaea of hypersaline Great Salt Lake
Nicol et al. Genome Sequence of “Candidatus Nitrosocosmicus franklandus” C13, a terrestrial ammonia-oxidizing archaeon
Rahalkar et al. Cultivation of important methanotrophs from Indian rice fields
Clementino et al. Archaeal diversity in naturally occurring and impacted environments from a tropical region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant