WO2024027620A1 - Data storage medium and use thereof - Google Patents

Data storage medium and use thereof

Info

Publication number
WO2024027620A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
data
nucleic acid
acid molecule
chain
Prior art date
Application number
PCT/CN2023/110132
Other languages
French (fr)
Chinese (zh)
Inventor
樊春海
王飞
郝亚亚
李子慕
Original Assignee
上海交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学
Publication of WO2024027620A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/54 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using elements simulating biological cells, e.g. neuron
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • This application relates to the field of data storage, and specifically to a data storage medium and its application.
  • DNA Deoxyribonucleic acid
  • the codability and efficient replication capabilities of DNA may provide a new strategy for high-density data storage and high-performance computing.
  • DNA storage has the advantage of high physical stability, unlike electronic media, which deteriorate with the number of reads, and so provides a fundamental solution for long-term data storage.
  • DNA has both information processing and computing capabilities, providing new ideas for the development of new storage-computing integrated architectures and systems.
  • the DNA data storage process mainly includes six technical links: information encoding, writing, saving, retrieval, reading, and decoding.
  • the existing DNA storage system is mainly based on artificially synthesized short-chain or long-chain DNA: digital information is encoded into DNA sequences, the corresponding DNA strands are synthesized and stored in cells or in vitro, and sequencing technology is then used to read out the data.
  • existing DNA storage systems mainly include two storage modes.
  • One storage mode is a non-random read storage architecture. This solution encodes and writes the data to be stored as a whole. Reading the data therefore requires sequencing all sequences in the system and decoding the overall sequencing results to obtain the data information.
  • the non-random read storage architecture cannot search for partial information within the file, and therefore cannot perform addressable modifications to the written information.
  • Another storage mode is a DNA storage architecture with a random read mode. In this type of system, the data to be written is first divided into data fragments, the data fragments are encoded and index sequences are added. Therefore, this type of DNA storage architecture has the ability to selectively read information fragments.
  • the selective reading of data in this type of system mainly relies on PCR technology. A portion of the original data must be taken out directly, or the data to be read must be extracted with magnetic beads, and then amplified by PCR and sequenced. However, this method destroys the original data composition. After multiple reads, repeated PCR introduces many errors into the data, affecting recovery of the original data. A reading method that, like the in-situ reading of traditional semiconductor storage media, reads data selectively without destroying the original data is still lacking.
  • the current DNA storage system mainly has a write-once archiving function; the ability to modify data after writing is still very limited, and in particular the addressable modification of arbitrary data stored in a DNA storage system has not yet been implemented.
  • This application provides a DNA storage system with complete data operation capabilities, which can realize addressable data writing, data deletion, data modification, data reading and other functions, especially repeated erasing, data modification and repeated data readout, and can make up for the functional shortcomings of existing DNA storage systems.
  • this application enables programmable combination and separation of storage addresses and data on DNA molecules.
  • the present application provides a system comprising 1) a nucleic acid molecule containing data information, and 2) a carrier having addressable information, the nucleic acid molecule being able to bind to the carrier, and the data information contained in the nucleic acid molecule being able to be stored (also called written), read in situ and/or edited in situ (including erasing and/or writing new data information) on the carrier.
  • the present application provides a method for data storage, which method includes providing 1) the nucleic acid molecule containing data information, and 2) the carrier with addressable information.
  • the present application provides a method for data reading, which method includes providing 1) the nucleic acid molecule containing data information, and 2) the carrier with addressable information.
  • the present application provides a method for data editing, which method includes providing 1) the nucleic acid molecule containing data information, and 2) the carrier with addressable information, and replacing the data information in the nucleic acid molecule.
  • the address sequences at different physical locations on the carrier have different sequences.
  • the vector comprises a DNA origami substrate comprising a staple strand comprising an address sequence having addressable information.
  • the address sequences at the ends of the staple chains are arranged in a matrix.
  • the matrix is at least a 2 × 2 matrix.
  • the distance between two adjacent staple strands is about 6-24 nm.
  • the distance between two adjacent staple chains is approximately 6 nm.
  • the distance between two adjacent staple strands is approximately 12 nm.
  • the distance between two adjacent staple strands is approximately 18 nm.
  • the distance between two adjacent staple strands is approximately 24 nm.
  • the nucleic acid molecules are bound to the staple strands and arranged in a matrix.
  • the matrix is at least a 2 × 2 matrix.
  • the distance between two adjacent nucleic acid molecules is about 6-24 nm.
  • the distance between two adjacent nucleic acid molecules is approximately 6 nm.
  • the distance between two adjacent nucleic acid molecules is approximately 12 nm.
  • the distance between two adjacent nucleic acid molecules is approximately 18 nm.
  • the distance between two adjacent nucleic acid molecules is approximately 24 nm.
  • the nucleic acid molecule comprises an address complementary sequence that is complementary to the address sequence on the vector.
  • the nucleic acid molecule is bound to a specific physical location of the vector by complementarity between the address complementary sequence and the address sequence.
  • the address complement is about 15 or more nucleotides in length.
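
The address-based binding described above can be sketched in a few lines of Python. This is only an illustration: the 2 × 2 layout, the 15-nt address sequences, and the helper names `reverse_complement` and `find_site` are invented for the example and do not come from the application.

```python
# Illustrative sketch only: sequences and helper names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# A 2 x 2 carrier: each (row, col) site extends a distinct address sequence.
carrier = {
    (0, 0): "ACGTTGCAAGCTTAG",
    (0, 1): "TTGACCGTAAGGCTA",
    (1, 0): "GGATCCTTAGCAATC",
    (1, 1): "CATGGTACCTTGAAC",
}


def find_site(address_complement: str):
    """Return the site whose address sequence is fully complementary, or None."""
    target = reverse_complement(address_complement)
    for site, address in carrier.items():
        if address == target:
            return site
    return None


# A data strand carrying the complement of the (1, 0) address binds only there.
data_strand_address = reverse_complement(carrier[(1, 0)])
print(find_site(data_strand_address))  # (1, 0)
```

Exact string matching stands in for hybridization here; a real design would also have to check that the addresses do not cross-hybridize.
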
  • the nucleic acid molecule includes a data sequence
  • the data information in the data sequence can be read in a state where the nucleic acid molecule and the vector are not substantially separated. For example, by activating a reaction, a readable data chain is synthesized, thereby reading the data information in the data sequence.
  • the data sequence is about 1 or more nucleotides in length.
  • the data sequence may be single stranded.
  • the nucleic acid molecule includes a read priming sequence
  • the read priming sequence can trigger the synthesis or transcription of the data sequence into a strand to be sequenced
  • the strand to be sequenced is complementary to the data sequence.
  • the read priming sequence includes a promoter.
  • the read initiating sequence is a T7 promoter.
  • the read priming sequence enables transcription of the data sequence. By sequencing and decoding the transcript products, the data information contained in the data sequence can be read.
  • the system and/or method further includes 3) a blocking chain that can prevent the data information from being read.
  • the blocking chain may also be called a closed chain.
  • the blocking strand includes a blocking sequence that is complementary to the read initiating sequence, thereby preventing the data information from being read.
  • the blocking sequence includes a promoter complementary sequence that binds to the promoter as a read initiator sequence, so that transcription cannot be initiated.
  • the blocking strand includes a blocking elongation sequence located upstream and/or downstream of the blocking sequence, and the blocking elongation sequence is substantially non-complementary to the nucleic acid molecule.
  • the blocking elongation sequence may also be referred to as a dangling chain (toehold). Dangling strands can provide energy driving force and specificity for transcriptional activation reactions.
  • the blocking elongation sequence is about 4-10 nucleotides in length, for example, 6 nucleotides.
  • the blocking chain is complementary to the read initiation sequence on the nucleic acid molecule, and transcription cannot be initiated.
  • the system and/or method further includes 4) activation chain.
  • the activation chain can be added.
  • the activation chain can prevent the blocking chain from binding to the nucleic acid molecule, and the activation chain is complementary to the blocking sequence and the blocking elongation sequence at the same time.
  • the activation chain corresponds to a specific physical address.
  • an activation chain corresponding to the specific address can be added.
  • the activation chain binds to the blocking chain, causing the blocking chain to detach from the read initiation sequence of the nucleic acid molecule, thereby initiating reading, for example, by activating the transcription function at that address.
  • an activation chain corresponding to a specific address is added to activate the transcription function of the address.
  • the data of the address is transcribed into an RNA chain.
  • the unactivated address has no transcription ability.
  • the data reading process can be performed at room temperature.
  • the blocking strand bound to the nucleic acid molecule at a specific physical location can bind to the activating strand.
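
The selective read-out logic above (blocked by default, activated per address, transcribed without removing the data) can be modelled as a small state machine. This is a minimal sketch under stated assumptions: the `Site` class, the `activate` and `transcribe` helpers, the address labels, and the short sequences are all hypothetical, and strand displacement is reduced to flipping a flag.

```python
from dataclasses import dataclass

RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


@dataclass
class Site:
    data_sequence: str    # DNA template written 5'->3' (hypothetical example data)
    blocked: bool = True  # True while a blocking chain sits on the promoter


def activate(sites, address):
    """Model adding the activation chain for one address: its blocking chain leaves."""
    sites[address].blocked = False


def transcribe(sites):
    """Return RNA transcripts for unblocked addresses only; sites are left unchanged."""
    transcripts = {}
    for address, site in sites.items():
        if not site.blocked:
            transcripts[address] = "".join(
                RNA_COMPLEMENT[base] for base in reversed(site.data_sequence)
            )
    return transcripts


sites = {"A1": Site("ACGTAC"), "A2": Site("TTGCAA")}
activate(sites, "A2")
print(transcribe(sites))  # {'A2': 'UUGCAA'}
```

Calling `transcribe` again returns the same result, which mirrors the non-destructive, repeatable read described in the text.
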
  • the nucleic acid molecule includes an erasure function sequence, the erasure function sequence is located upstream and/or downstream of the address complementary sequence, and the erasure function sequence is substantially non-complementary to the address sequence on the carrier.
  • the system and/or method further includes 5) erasure chain.
  • the erasure chain can be added.
  • the erasure strand can prevent the nucleic acid molecule from binding to the carrier, and the erasure strand can be complementary to the address complementary sequence and the erasure function sequence at the same time.
  • the erasing strand is capable of complementary binding to the erasing functional sequence of the nucleic acid molecule.
  • the erasure chain corresponds to a specific physical address.
  • the erasure chain corresponding to the specific address can be added; then, for example through a DNA strand displacement reaction, it binds the data sequence at that address, so that the data sequence is removed from the carrier and the address returns to its unwritten state.
  • the erasing process can be performed at room temperature. At room temperature, after the erasure strand is added, the nucleic acid molecules at specific physical locations can bind to the erasure strand, and the nucleic acid molecules are basically not bound to the carrier.
  • a new data sequence can then be written to the address, for example by adding nucleic acid molecules containing new data information. This enables in-situ modification and/or editing of data.
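
The write, erase and rewrite cycle described above behaves like an addressable key-value store. The sketch below is a toy model only: the `OrigamiStore` class, its method names, and the address labels are hypothetical, and the chemistry is reduced to dictionary updates.

```python
class OrigamiStore:
    """Toy address -> data-sequence map; None marks an unwritten address."""

    def __init__(self, addresses):
        self.sites = {address: None for address in addresses}

    def write(self, address, data_sequence):
        """Model a data strand binding to a free address."""
        if self.sites[address] is not None:
            raise ValueError(f"address {address} is occupied; erase it first")
        self.sites[address] = data_sequence

    def erase(self, address):
        """Model adding the erasure chain for one address; returns what was removed."""
        removed = self.sites[address]
        self.sites[address] = None
        return removed


store = OrigamiStore(["A1", "A2"])
store.write("A1", "ACGT")
store.write("A2", "GGCC")
store.erase("A1")          # only A1 returns to the unwritten state
store.write("A1", "TTAA")  # in-situ modification of A1
print(store.sites)  # {'A1': 'TTAA', 'A2': 'GGCC'}
```
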
  • the molar ratio of the nucleic acid molecule to the address sequence in the vector is about 1:1 or greater. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the vector is about 2:1 or greater. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the vector is about 3:1 or greater. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the vector is about 4:1 or greater. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the vector is about 5:1 or greater.
  • the application provides a nucleic acid molecule in the system and/or method, the nucleic acid molecule comprising the data sequence, address complement sequence and/or read priming sequence.
  • upstream and/or downstream of the data sequence, the nucleic acid molecule further includes primers for verifying the feasibility of erasing and reading.
  • the primer part can be replaced with a data sequence containing data information to further increase storage capacity.
  • Figure 2 shows an example of a nucleic acid molecule described herein.
  • the application provides vectors in the systems and/or methods.
  • the present application provides a storage medium, which contains the method of the present application.
  • the present application provides a device, the device including the storage medium of the present application and a processor coupled to the storage medium, the processor being configured to execute a program stored in the storage medium to implement the method of the present application.
  • the inventor of the present application constructed a 6 nm DNA chip to achieve multiple in-situ erasing and/or multiple in-situ selective reading (WMRM).
  • the addressability and programmability of DNA nanostructures are used to arrange data sequences storing information on DNA origami with a pitch of 6 nm, giving each data sequence a physical address.
  • by designing erasure chains and utilizing the information processing and computing capabilities provided by the DNA strand displacement reaction, the DNA data sequences on the origami can be erased repeatedly and any data can be modified.
  • a promoter (for example, a T7 promoter) switch is set on the DNA chain as a read initiation sequence.
  • the DNA strand displacement reaction is used to regulate the transcriptional activity of the DNA chain so that transcription is activated selectively; the DNA information is then transcribed and the RNA is sequenced (for example, using Illumina technology) to achieve selective reading and repeated reading of the data information.
  • this application has at least the following characteristics: it is based on DNA assembly technology and has nanoscale addressability, which current DNA storage systems lack; using this nano-addressability, the areal density of DNA storage can be increased to 1 bit per square nanometer, surpassing existing inorganic storage architectures and DNA storage architectures; based on controllable dynamic assembly on the DNA origami surface, a full-featured storage system is realized, including data reading, writing, modification and other operations; the readout process transcribes the data in situ into RNA molecules and sequences the transcripts without changing the original storage system, achieving non-destructive DNA data readout.
  • the method and data carrier of the present application allow repeated erasing and writing as well as repeated reading.
  • the structure of the data system basically does not change.
  • the address information of the data carrier of the present application has high accuracy and specificity, and can achieve a higher level of discrimination.
  • the method and data system of the present application can realize the reading of data information at room temperature to about 42 degrees Celsius.
  • Figure 1 shows an exemplary operation flow of the data storage medium of the present application.
  • the stored data can take any form, including but not limited to Chinese characters; the storage capacity can be expanded indefinitely by increasing the length of the data sequence, including but not limited to 16 bits/site or 120 bits/site; for data reading, sequencing technologies known in the art can be used, including but not limited to high-throughput in-situ sequencing, transcription-RNA sequencing, and transcription-reverse transcription-amplification-DNA sequencing.
  • Figure 2 shows an exemplary structural composition of the nucleic acid molecule of the present application: the data chain includes the data sequence, and primer 1 and primer 2 are included only to make it easier to verify the feasibility of selective erasing and reading; this part can be replaced with actual stored information to further increase storage capacity.
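
The relationship between site pitch, bits per site and areal density can be checked with simple arithmetic. The sketch below assumes a square lattice in which each site occupies one pitch × pitch cell; that layout and the function name `areal_density` are assumptions made for illustration, not figures taken from the application.

```python
def areal_density(bits_per_site: float, pitch_nm: float) -> float:
    """Bits per square nanometre, assuming one site per pitch x pitch cell."""
    return bits_per_site / (pitch_nm ** 2)


# Pitches and per-site capacities mentioned in the text.
for pitch in (6, 12, 18, 24):
    for bits in (16, 120):
        density = areal_density(bits, pitch)
        print(f"pitch {pitch} nm, {bits} bits/site: {density:.3f} bits/nm^2")

# Bits per site needed to reach 1 bit/nm^2 at a 6 nm pitch.
print(1.0 * 6 ** 2)  # 36.0
```

Under this assumption, 1 bit per square nanometer at a 6 nm pitch corresponds to 36 bits per site, so 120 bits/site would exceed that density while 16 bits/site would fall below it.
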
  • Figure 3 shows an exemplary process of selective reading and reversible erasing of the data storage medium of the present application.
  • Transcription activation process: the key chain is the activation chain. When the nucleic acid molecule described in this application is blocked by the hindrance chain, transcription cannot start.
  • after the key chain is added, the hindrance chain binds to the key chain instead and no longer binds to the promoter; with the T7 promoter intact, the added RNA polymerase binds to the promoter and initiates transcription.
  • Reversible erasing: when the erasing chain (i.e., the erasure chain) is added, the data chain binds to the erasing chain and separates from the carrier. After a data chain carrying the complementary sequence for the same address is added again, the data chain is re-written to the carrier.
  • Figure 4 shows the structural characterization results after writing DNA origami surface data.
  • the upper part is a schematic diagram of the address where data is written on the surface of DNA origami.
  • the lower part shows the AFM characterization results after data writing.
  • Figure 5 shows a heat map of PAGE data statistics for readout activation orthogonality. Only when the activation chain matches the data chain can the data be read out efficiently.
  • Figure 6 shows the second-generation sequencing results after seven address data reads. The sequence information obtained by sequencing at each address is completely consistent with the written information.
  • Figure 7 shows the super-resolution fluorescence microscopy imaging results of addressed data erasing.
  • the upper part shows the data fixed-point erasure and writing process, as well as the corresponding data address schematic diagram.
  • the figure below shows the fluorescence imaging structural characterization results for each step of the process.
  • Figure 8 shows the fluorescence test results of the feasibility of repeated data modification.
  • Figure 9 shows a single-point TIRF (total internal reflection fluorescence microscope) imaging image of seven address data being repeatedly erased 10 times.
  • Figure 10 shows the TIRF (total internal reflection fluorescence microscopy) imaging and co-localization statistical chart of seven address data that were repeatedly erased 10 times.
  • the co-localization of red and green fluorescence represents the presence of data on the origami.
  • Figure 11 shows the qPCR quantification of transcribed RNA during repeated reads of the data chain.
  • Figure 12 shows the writing results of the data link under four different matrix spacings.
  • Figure 13 shows the readout results of the data link under four different matrix spacings.
  • addressability generally refers to associating specific information with a location on a storage medium.
  • addressability of the data carrier is required.
  • the data carrier can simultaneously record index information (an address) that corresponds essentially uniquely to each piece of data information.
  • the address information of specific data information is recorded through the physical location, spatial location, etc. of the data carrier.
  • when the data carrier is addressable, the information to be accessed, read, and/or modified can be accessed selectively rather than one item at a time.
  • nucleic acid molecule refers to deoxyribonucleotides or ribonucleotides of various lengths, or analogs thereof.
  • exemplary nucleotides include deoxyribonucleotides (DNA) or ribonucleotides (RNA), or non-standard nucleotides, nucleotide analogs and/or modified nucleotides.
  • the term "vector" generally refers to a substance capable of carrying nucleic acid molecules.
  • the vector may comprise nucleic acid nanostructures, which may be made from nucleic acids such as DNA, RNA, locked nucleic acid (LNA), peptide nucleic acid (PNA), or any combination thereof.
  • the nanostructures may be two- or three-dimensional. For example, single-stranded nucleic acids or double-stranded nucleic acids (e.g., having only a helical structure) may not be considered "nanostructures."
  • nucleic acid nanostructures serve as scaffolds for the formation of more complex structures, such as molecular complexes.
  • nucleic acid nanostructure is a DNA origami structure assembled using DNA origami methods.
  • nucleic acid origami nanostructures may refer to nucleic acid nanostructures formed by assembling two or more "staple strands" with one or more "scaffold" strands into a prescribed shape. Staple strands are typically short (e.g., 50 nucleotides or less) nucleic acid strands (single-stranded nucleic acids); scaffold strands are typically longer (e.g., longer than 200 nucleotides) nucleic acid strands (single-stranded nucleic acids).
  • the nucleic acid origami nanostructure may be a DNA origami nanostructure.
  • DNA origami nanostructures can fold (e.g., through self-assembly) into discrete and unique geometric patterns, such as two-dimensional (2D) and three-dimensional (3D) shapes, which can further self-assemble to create larger nanostructures or microstructures containing two or more discrete origami nanostructures.
  • the scaffold strand has sequences derived from M13 phage. Other scaffold strands can be used.
  • the staple strands are fluorophore labeled staple strands.
  • the staple strand is 4 to 30 nucleotides in length (e.g., 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides).
  • the staple strands are stably bonded to the scaffold strands (for longer than 10 seconds), for example, at room temperature.
  • the staple strands are stably bonded to the scaffold strands (for longer than one week), for example, at room temperature.
  • the staple strand is greater than 30 nucleotides in length.
  • in situ generally refers to operations performed in the original location.
  • in-situ reading means that the nucleic acid molecules that record the data are read at their original position on the carrier, without first releasing the nucleic acid molecules into solution and then reading the data.
  • amplification is generally intended to include the production of copies of a nucleic acid molecule via repeated cycles of primed enzymatic synthesis.
  • the reading step of the present application may include a polymerization step, including but not limited to polymerase chain reaction (PCR), transcription to RNA, and the like, including, for example, any other nucleic acid amplification and/or transcription technology known to those skilled in the art.
  • complementary generally refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid sequence via traditional Watson-Crick base pairing or other non-traditional types of pairing.
  • sequence A-G-T is complementary to the sequence T-C-A.
  • Percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9 or 10 out of 10 being 50%, 60%, 70%, 80%, 90% and 100% complementary, respectively).
  • perfect complementary means that all contiguous residues of a nucleic acid sequence will hydrogen bond to the same number of contiguous residues in a second nucleic acid sequence.
  • substantially complementary refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% within a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides.
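
The percent-complementarity definition above can be made concrete with a short calculation. The sketch below assumes, as in the A-G-T / T-C-A example, that the second sequence is written in the opposite orientation so that residues pair position by position; the function name and the 10-nt sequences are invented for illustration, and only Watson-Crick pairs are counted.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of residues in seq_a that form Watson-Crick pairs with seq_b.

    Assumption: seq_b is written in the opposite orientation to seq_a, so that
    residue i of seq_a faces residue i of seq_b.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    paired = sum((a, b) in PAIRS for a, b in zip(seq_a, seq_b))
    return 100.0 * paired / len(seq_a)


print(percent_complementarity("AGT", "TCA"))                # 100.0
print(percent_complementarity("ACGTACGTAC", "TGCATGCAAC"))  # 80.0
```

The second call has 8 pairing positions out of 10, matching the "8 out of 10 being 80%" case in the definition.
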
  • binding generally refers to a species that uniquely binds to a particular species, such as in a sequence-specific manner.
  • the term "about" generally refers to a variation within the range of 0.5% to 10% above or below the specified value, such as 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, or 10% above or below the specified value.
  • the present application provides a nucleic acid molecule that can be bound to a carrier with addressable information, and the data information that the nucleic acid molecule can contain can be randomly read and erased in situ on the carrier.
  • the nucleic acid molecules provided in this application can be used as a data link.
  • the data link can be combined with a specific position on the carrier based on the principle of base complementarity. Combined with dynamic DNA assembly technology, programmable combination and separation of storage addresses and data can be achieved. For example, when the data information is read, the original storage system does not need to be changed, achieving non-destructive data reading. For example, when reading the data information recorded in the nucleic acid molecule, the nucleic acid molecule does not need to be released from the carrier first, thereby achieving in-situ random reading of the required read information.
  • address sequences at different physical locations on the carrier have different sequences
  • the nucleic acid molecule may include an address complementary sequence
  • the address complementary sequence is complementary to the address sequence on the carrier.
  • the address complementary sequence can be an address recognition sequence, and the address complementary sequence can recognize and specifically bind the address sequence on the carrier.
  • the address information can be recorded by a sequence of addresses at different physical locations on the carrier.
  • different coordinate points represent different physical locations.
  • specific coordinate points can be located by the ends of staple strands at specific locations in DNA origami.
  • different coordinate points extend an address sequence with a unique sequence composition.
  • the address sequence of this application may be a linear single chain or a branched chain structure.
  • the address sequence of a specific physical location can be used to record index information because it has a unique sequence.
  • the data information may be combined with the address information.
  • the data link is stably associated with the address link (eg, for longer than 10 seconds), such as at room temperature. In some embodiments, such as at low temperatures (about 4 degrees Celsius), the data link is stably coupled to the address chain (eg, for longer than 10 seconds).
  • the data link and the address link are stably combined (eg, longer than 10 seconds).
  • the address complementary sequence is about 15 or more nucleotides in length.
  • the address complementary sequence of the present application may be a linear single chain or a branched chain structure.
  • the length of the address complementary sequence may be about 10 or more, about 11 or more, about 12 or more, about 13 or more, about 14 or more, about 15 or more, about 16 or more, about 17 or more, about 18 or more, about 19 or more, about 20 or more, about 25 or more, about 30 or more, about 40 or more, about 50 or more, or about 100 or more nucleotides.
  • the nucleotides may include natural nucleotides and/or nucleotides with artificial modifications, such as, but not limited to, methyl modifications, amino modifications, fluoro modifications, and the like.
  • the nucleic acid molecule may further comprise an erasure function sequence, the erasure function sequence is located upstream and/or downstream of the address complementary sequence, and the erasure function sequence is substantially non-complementary to the address sequence on the carrier.
  • the erase and write function sequence may also be called an erase and write functional area.
  • the erase and write function sequence is located upstream and/or downstream of the address complementary sequence.
  • the sequence composed of the erasing function sequence and the address complementary sequence may not be completely complementary to the address sequence on the carrier.
  • the exemplary address complementary sequence may be 20 nucleotides, and the erase function sequence may be 10 nucleotides.
  • the address sequence on the carrier may be complementary only to the 20 nucleotides of the address complementary sequence and not to the erasure function sequence, whereas the 30 nucleotides formed together by the erasure function sequence and the address complementary sequence are completely complementary to the erasure strand.
  • a strand with a higher binding capacity (such as the erasure strand of the present application) can thus be introduced so that the nucleic acid sequence of the data strand can be removed (erased) from a specific position of the vector.
  • when the erasure chain is present, the nucleic acid molecule may not bind to the carrier, and the erasure chain is complementary to the address complementary sequence and the erasure function sequence at the same time.
  • the nucleic acid molecule (data strand) of the present application can have stronger binding ability to the erasure strand.
  • the data strand may have more and/or stronger binding bases to the erasure strand than to the address sequence on the carrier.
  • the address complementary sequence of the nucleic acid molecule (data chain), the address sequence corresponding to a specific position of the carrier, and the sequence of the specific corresponding erasure chain can be specified through design.
  • the address sequence of a specific position of the vector is uniquely complementary to the address complementary sequence of a specific nucleic acid molecule (data link) to achieve addressable writing.
  • the address complementary sequence of a specific nucleic acid molecule (data strand) bound to a specific position of the carrier is uniquely complementary to the sequence of the specific corresponding erasure strand to achieve addressable erasure.
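
The 20-nt / 10-nt example above can be illustrated by counting how many contiguous bases of the data strand each partner can pair with. This is a rough sketch, not a thermodynamic model: contiguous base-pair count is used as a stand-in for binding strength, and the sequences and the helper `paired_bases` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_bases(strand: str, partner: str) -> int:
    """Length of the longest stretch of `strand` that is fully complementary to `partner`."""
    best = 0
    for i in range(len(strand)):
        for j in range(i + 1, len(strand) + 1):
            if j - i > best and reverse_complement(strand[i:j]) in partner:
                best = j - i
    return best


address_complement = "GATTACAGGCTTCAGTCCAA"  # 20 nt, invented for the example
erase_function = "CTGAAGTCGT"                # 10 nt, invented for the example
region = erase_function + address_complement  # 30-nt region of the data strand

address_sequence = reverse_complement(address_complement)  # on the carrier
erasure_strand = reverse_complement(region)                # added to erase

print(paired_bases(region, address_sequence))  # 20
print(paired_bases(region, erasure_strand))    # 30
```

Because the erasure strand can pair with 30 bases while the carrier address pairs with only 20, the extra 10 bases act as the handle that lets the erasure strand displace the data strand from the carrier.
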
  • the nucleic acid molecule may include a data sequence, and the data information in the data sequence may be read in a state where the nucleic acid molecule and the carrier are not substantially separated.
  • reading methods include but are not limited to in-situ sequencing, in-situ transcription into other nucleic acid molecules for reading information, in-situ amplification into other nucleic acid molecules for reading information, and so on.
  • deriving information stored in a nucleic acid molecule (data link) can be considered as reading.
  • sequencing the derived (eg transcribed/amplified) other nucleic acid molecules may be considered a subsequent optional additional read step.
  • the data sequence is about 1 or more nucleotides in length.
  • the data sequence of this application may be a linear single chain or a branched chain structure.
  • the length of the data sequence may be about 1 or more, about 2 or more, about 3 or more, about 4 or more, about 5 or more, about 6 or more, about 7 or more, about 8 or more, about 9 or more, about 10 or more, about 15 or more, about 20 or more, about 30 or more, about 40 or more, about 50 or more, about 100 or more, about 120 or more, about 150 or more, about 200 or more, about 500 or more, about 700 or more, or about 1000 or more nucleotides.
  • the storage method of this application can be compatible with data sequences of any length.
  • the nucleic acid molecule may include a read priming sequence capable of priming the data sequence to be read as a strand to be sequenced, and the strand to be sequenced is complementary to the data sequence.
  • the nucleic acid molecule may comprise a read priming sequence that may cause the data sequence to be read.
  • the read priming sequence of the present application can be a promoter, such as including but not limited to T7 promoter.
  • the storage method of the present application can be compatible with read priming sequences of any length and type, and the read priming sequences have specific sequences that can be used for binding and polymerization initiation of polymerases known in the art.
  • the nucleic acid molecule (data link) of the present application can have the function of selective reading.
  • the selective reading can be achieved by introducing a blocking strand whose blocking sequence is complementary to the reading initiating sequence.
  • the blocking chain may include a blocking sequence, and the blocking sequence of the blocking chain is partially or completely complementary to the read initiating sequence.
  • the blocking sequence of the blocking strand is complementary to the read priming sequence and an interval of about 5 nucleotides upstream/downstream of the reading priming sequence.
  • the blocking sequence is 22 nucleotides in length, of which 17 nucleotides are complementary to the read priming sequence, and the other 5 nucleotides are complementary to an interval of approximately 5 nucleotides upstream/downstream of the read priming sequence.
  • the blocking chain has a higher ability to bind to the data chain than the T7 promoter binds to the data chain.
  • the blocking chain of the present application can be combined with the read trigger sequence of the nucleic acid molecule (data chain), the data chain, or any position that can block the derivation of the information stored in the nucleic acid molecule (data chain).
  • the blocking strand may further comprise a blocking extension sequence located upstream and/or downstream of the blocking sequence, and the blocking extension sequence is substantially not complementary to the nucleic acid molecule.
  • the blocking chain has a blocking sequence and a blocking extension sequence.
  • the blocking extension sequence does not substantially bind to the nucleic acid molecule (data chain), for example, forming an overhang structure.
  • the extension-blocking sequence may be about 8 nucleotides in length.
  • the blocking extension sequence can be used as a lever by improving the binding ability of the activation chain and the blocking chain, so that the activation chain removes the blocking chain from the data chain.
  • the blocking strand can not bind to the nucleic acid molecule when an activating strand is present, which is complementary to the blocking sequence and the blocking extension sequence at the same time.
  • the activation chain of the present application can have stronger binding ability with the hindrance chain.
  • the activating strand may have more and/or stronger binding bases to the blocking strand than to a sequence on the nucleic acid molecule (data strand) that binds to the blocking strand.
  • the sequence of the nucleic acid molecule (data chain) used to block the reading, the blocking sequence of the specific corresponding blocking strand, and the sequence of the specific corresponding activating chain can be designed by designing specific bases.
  • the sequence of the nucleic acid molecule (data strand) used to block the read is uniquely complementary to the blocking sequence of the specific corresponding blocking strand to achieve an addressable locked read.
  • the sequence of a particular corresponding activating strand is uniquely complementary to the sequence of the blocking sequence of a particular corresponding blocking strand to enable addressable unlocking (activating) reading.
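The lock/unlock geometry described above (a 22-nt blocking sequence plus an unpaired extension of about 8 nt, and an activating strand complementary to both) can be sketched as follows. The 17-nt T7 promoter core is the commonly cited consensus; the 5 downstream bases, the 8-nt extension, and the data sequence are hypothetical placeholders, and placing the extension at the 3' end of the blocking strand is an assumption of this sketch.

```python
T7_PROMOTER = "TAATACGACTCACTATA"  # 17-nt core, commonly cited consensus
DOWNSTREAM = "GGGAG"               # hypothetical 5 nt downstream of the promoter
DATA = "ACGTACGTAC"                # hypothetical 10-nt data sequence
EXTENSION = "CATTCGGA"             # hypothetical 8-nt blocking extension (overhang)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_pair(strand: str, region: str) -> bool:
    """True if `region` can pair contiguously with some part of `strand`."""
    return reverse_complement(region) in strand


# Read-related part of the data strand: promoter, 5 nt downstream, then data.
data_strand = T7_PROMOTER + DOWNSTREAM + DATA

# Blocking sequence: complementary to the promoter and the 5 nt downstream (22 nt).
blocking_sequence = reverse_complement(T7_PROMOTER + DOWNSTREAM)
# Blocking strand: blocking sequence plus an extension that does not pair
# with the data strand.
blocking_strand = blocking_sequence + EXTENSION
# Activating strand: complementary to blocking sequence and extension (30 nt).
activating_strand = reverse_complement(blocking_strand)

if __name__ == "__main__":
    print("blocking sequence length:", len(blocking_sequence))
    print("blocking/data paired bases:", len(blocking_sequence))
    print("blocking/activating paired bases:", len(activating_strand))
    print("extension pairs with data strand:", can_pair(data_strand, EXTENSION))
```

In this sketch the activating strand pairs with 30 bases of the blocking strand, against 22 bases for the data strand, which is the length advantage the text relies on for unlocking.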
  • the data sequence of the specific nucleic acid molecule (data link) may not be substantially read, transcribed and/or amplified.
  • the data sequence of the specific nucleic acid molecule (data link) can be read, transcribed and/or amplified.
  • the vector may comprise a DNA origami substrate whose staple strands may contain address sequences with addressable information.
  • different coordinate points represent different physical locations.
  • specific coordinate points can be located by the ends of staple strands at specific locations in DNA origami.
  • different coordinate points extend an address sequence with a unique sequence composition.
  • the address sequence of a specific physical location can be used to record index information because it has a unique sequence.
  • the data information may be combined with the address information.
  • 2 or more of the staple strands are spaced apart by about 6 nanometers or more.
  • the adjacent staple strands are spaced about 6 nanometers or more, about 7 nanometers or more, about 8 nanometers or more, about 9 nanometers or more, about 10 nanometers or more, about 15 nanometers or more, about 20 nanometers or more, about 25 nanometers or more, or about 30 nanometers or more apart.
  • the present application provides a system, which may include the nucleic acid molecule of the present application, and a vector.
  • the system may also include the erasure chain of the present application, the blocking chain of the present application, and/or the activation chain of the present application.
  • the present application provides a data storage method, which may include providing the nucleic acid molecule of the present application and/or the system of the present application.
  • in a data editing and/or data reading method, the data editing method may include replacing the data information stored in the nucleic acid molecules of the present application.
  • the data reading method may include determining the data information stored in the nucleic acid molecules of the present application.
  • the method may further comprise providing a vector, the vector may comprise a DNA origami substrate, the staple strands of the DNA origami may comprise an address sequence having addressable information, and the method may comprise providing the nucleic acid molecule and the address sequence at a molar ratio of about 2:1 or higher.
  • the method may include providing the nucleic acid molecule and the address sequence at a molar ratio of about 2:1 or higher, about 2.1:1 or higher, about 2.2:1 or higher, about 2.3:1 or higher, about 2.4:1 or higher, about 2.5:1 or higher, about 3:1 or higher, about 4:1 or higher, about 5:1 or higher, about 10:1 or higher, about 20:1 or higher, about 50:1 or higher, or about 100:1 or higher.
  • choosing an appropriate molar ratio can improve the storage success rate of the data link.
  • choosing an appropriate molar ratio can improve the storage cost-effectiveness of the data link.
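As a small worked example of the molar-ratio requirement, the sketch below computes the data-strand concentration needed for a given address concentration. It assumes, for illustration only, that each origami carries exactly one copy of a given address, so the address concentration equals the origami concentration; the function name is hypothetical.

```python
def data_strand_concentration_nM(address_nM: float, molar_ratio: float = 2.0) -> float:
    """Concentration of data strand giving `molar_ratio`:1 over the address sequence."""
    if address_nM <= 0 or molar_ratio <= 0:
        raise ValueError("concentration and ratio must be positive")
    return address_nM * molar_ratio


if __name__ == "__main__":
    # Assumption: 20 nM origami, one copy of the address per origami.
    for ratio in (2.0, 5.0, 10.0):
        needed = data_strand_concentration_nM(20.0, ratio)
        print(f"ratio {ratio}:1 -> {needed} nM data strand")
```

Higher ratios consume more data strand per written address, which is the trade-off between storage success rate and cost-effectiveness mentioned above.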
  • the method may further comprise providing an erasure strand of the present application, the nucleic acid molecule at a specific physical location is bound to the erasure strand at room temperature, and the nucleic acid molecule is not substantially bound to the carrier.
  • room temperature and “ambient temperature” generally refer to a temperature between about 16 degrees Celsius and about 40 degrees Celsius. For example, a temperature between about 16 degrees Celsius and about 25 degrees Celsius. For example, about 25 degrees Celsius.
  • the method may further include providing a blocking chain of the present application, the blocking chain is combined with the nucleic acid molecule, and the data information of the nucleic acid molecule cannot be substantially read.
  • the method may further include providing the activation chain of the present application.
  • the blocking chain bound to the nucleic acid molecule at a specific physical position binds to the activation chain, and the blocking chain then does not substantially bind to the nucleic acid molecule.
  • the heating and cooling processes in this application include the process of annealing nucleic acids.
  • for example, the process of partially separating the double-stranded nucleic acid molecules of the present application (e.g., by heating) and then restoring the partially double-stranded structure (e.g., by cooling).
  • the process of heating to about 95 degrees for about 3 minutes and then cooling to room temperature at a rate of about 1.2 degrees per minute can be a nucleic acid annealing process.
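The annealing example above (about 95 degrees for about 3 minutes, then cooling at about 1.2 degrees per minute) can be turned into a time/temperature schedule. The sketch below assumes an end temperature of 25 degrees Celsius for "room temperature" and a one-minute reporting step; both are illustrative choices, and the function name is hypothetical.

```python
def annealing_schedule(start_c=95.0, hold_min=3.0, end_c=25.0,
                       rate_c_per_min=1.2, step_min=1.0):
    """Return (time_min, temperature_c) points for a hold followed by a linear ramp."""
    if rate_c_per_min <= 0 or step_min <= 0:
        raise ValueError("rate and step must be positive")
    points = [(0.0, start_c), (hold_min, start_c)]
    t, temp = hold_min, start_c
    # Full steps while the next step would still stay above the end temperature.
    while temp - rate_c_per_min * step_min > end_c:
        t += step_min
        temp -= rate_c_per_min * step_min
        points.append((t, temp))
    # Final (possibly partial) step that lands exactly on the end temperature.
    t += (temp - end_c) / rate_c_per_min
    points.append((t, end_c))
    return points


if __name__ == "__main__":
    schedule = annealing_schedule()
    total_min, final_c = schedule[-1]
    print(f"total time: {total_min:.1f} min, final temperature: {final_c} C")
```

With these assumed values the ramp covers 70 degrees at 1.2 degrees per minute, so the whole program takes a little over an hour.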
  • the present application provides a storage medium recording a program that can run the method of the present application.
  • the present application provides a device, which may contain the storage medium of the present application.
  • the present application provides a non-volatile computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement any one or more methods described in the present application.
  • the non-volatile computer-readable storage medium may include a floppy disk, flexible disk, hard disk, solid-state storage (SSS) (such as a solid state drive (SSD), solid state card (SSC), or solid state module (SSM)), enterprise flash drive, magnetic tape, or any other non-transitory magnetic media.
  • non-volatile computer-readable storage media may also include punched cards, paper tape, optical mark sheets (or any other physical media having a hole pattern or other optically identifiable markings), compact disc read-only memory (CD-ROM), compact disc rewritable (CD-RW), digital versatile disc (DVD), Blu-ray disc (BD), and/or any other non-transitory optical media.
  • the device of the present application may further include a processor coupled to the storage medium, and the processor is configured to execute based on a program stored in the storage medium to implement the method of the present application.
  • the device may implement various mechanisms to ensure that methods described herein when executed on a database system produce correct results.
  • the device may use disks as permanent data storage.
  • the device can provide database storage and processing services for multiple database clients.
  • the device may store database data across multiple shared storage devices and/or may utilize one or more execution platforms with multiple execution nodes.
  • the device can be organized so that storage and computing resources can be expanded effectively infinitely.
  • the preparation of a nanometer-addressable, fully functional DNA storage system includes the following steps:
  • the address on the DNA origami surface is expressed in the form of a DNA sequence.
  • An orthogonal sequence library with a length of, for example, 15 to 18 bases or longer is designed through a sequence screening algorithm to ensure that the data chain can be stored at a specific address through stable hybridization.
  • the address sequence is extended from the corresponding site on the origami surface by designing its staple strands.
  • the address-designed staple strand combination is mixed with the backbone strand and assembled through annealing to form an addressable blank DNA storage platform.
  • the DNA origami assembly process can refer to the DNA origami assembly technology known in the art, such as Rothemund, P.W.K. Folding DNA to Create Nanoscale Shapes and Patterns. Nature 2006, 440 (7082), 297–302.
  • an exemplary overall storage process is shown in Figure 1.
  • an exemplary data chain (the nucleic acid molecule of this invention) structure is shown in Figure 2.
  • selective reading and reversible erasing of data are shown in Figure 3.
  • a data operation system of an addressable DNA storage system may include:
  • Data writing (or storing) step which may include utilizing the DNA complementary pairing principle to add a data chain containing a data sequence to bind it at a specific address.
  • Data erasure step which may include adding an erasure DNA chain specific to an address (also known as the Erase chain, that is, the erasure chain), which binds the data chain at that address through a DNA strand displacement reaction and removes it, so that the address is restored to an unwritten state.
  • Data modification (or editing) step which may include first erasing the original data and then writing new data.
  • Data readout (or reading) step which may include adding an activation chain (also called an Input chain) corresponding to a specific address to activate the transcription function of that address; under the action of T7 RNA polymerase, the data at this address is transcribed into an RNA chain, while addresses that are not activated have no transcription ability.
  • Addressable data reading is achieved by collecting the obtained RNA strands for subsequent sequencing.
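The four operations above (write, erase, edit, read) can be summarized as a small state model. This is a purely logical sketch under stated assumptions: the class and method names are hypothetical, each address holds at most one data chain, and a freshly written chain starts in the blocked (non-transcribable) state. It does not model any chemistry.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Optional


@dataclass
class Site:
    data: Optional[str] = None  # data sequence bound at this address, if any
    blocked: bool = True        # True while a blocking chain covers the promoter


class AddressableStore:
    """Toy model of an addressable DNA store; names are illustrative only."""

    def __init__(self, addresses: Iterable[Hashable]) -> None:
        self.sites: Dict[Hashable, Site] = {a: Site() for a in addresses}

    def write(self, address: Hashable, data: str) -> None:
        """Bind a data chain (with its blocking chain) to an empty address."""
        site = self.sites[address]
        if site.data is not None:
            raise ValueError("address already written; erase it first")
        site.data, site.blocked = data, True

    def erase(self, address: Hashable) -> None:
        """Erasure chain removes the data chain; the address becomes unwritten."""
        self.sites[address] = Site()

    def edit(self, address: Hashable, data: str) -> None:
        """Modify = erase the original data, then write the new data."""
        self.erase(address)
        self.write(address, data)

    def activate(self, address: Hashable) -> None:
        """Activation chain removes the blocking chain at one address."""
        site = self.sites[address]
        if site.data is None:
            raise ValueError("nothing written at this address")
        site.blocked = False

    def read(self) -> Dict[Hashable, str]:
        """Transcription only reports addresses that are written and activated."""
        return {a: s.data for a, s in self.sites.items()
                if s.data is not None and not s.blocked}


if __name__ == "__main__":
    store = AddressableStore(range(1, 8))  # seven addresses, as in the example
    store.write(1, "ACGTACGTAC")
    store.activate(1)
    print(store.read())
```

The model makes the selectivity explicit: reading returns only addresses that have been both written and activated, while all other addresses stay silent.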
  • the design process of the nano-addressable DNA platform is as follows:
  • the DNA origami skeleton chain uses M13mp18 single chain, and the staple chain combination is obtained according to the target template shape (two-dimensional rectangular structure).
  • An exemplary manner may be to optionally mark the corresponding relationship between the staple chain and the address number according to the shape information of the seven-segment addressing structure. Randomly generate and screen out seven orthogonal address sequences with a length of 20 bases, and extend the staple chain sequences corresponding to addresses 1-7 respectively. The extended sequences are the designed address sequences, thereby obtaining the address sequence for assembly.
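The "randomly generate and screen" step for seven orthogonal 20-base address sequences could look roughly like the sketch below. The screening criteria (GC content window, no homopolymer run of four, minimum Hamming distance to every accepted sequence and to its reverse complement) are assumptions chosen for illustration; the application does not state its criteria, and practical designs would typically also use thermodynamic models.

```python
import random
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def acceptable(candidate: str, library: List[str], min_distance: int) -> bool:
    """Apply the illustrative screening rules to one candidate sequence."""
    gc = (candidate.count("G") + candidate.count("C")) / len(candidate)
    if not 0.4 <= gc <= 0.6:
        return False
    if any(base * 4 in candidate for base in "ACGT"):
        return False  # reject homopolymer runs of length 4 or more
    if hamming(candidate, reverse_complement(candidate)) < min_distance:
        return False  # reject strongly self-complementary candidates
    for existing in library:
        if hamming(candidate, existing) < min_distance:
            return False
        if hamming(candidate, reverse_complement(existing)) < min_distance:
            return False
    return True


def build_library(count: int = 7, length: int = 20, min_distance: int = 8,
                  seed: int = 0, max_tries: int = 100_000) -> List[str]:
    """Randomly generate candidates and keep those passing the screen."""
    rng = random.Random(seed)
    library: List[str] = []
    for _ in range(max_tries):
        if len(library) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if acceptable(candidate, library, min_distance):
            library.append(candidate)
    if len(library) < count:
        raise RuntimeError("could not build the library; relax the criteria")
    return library


if __name__ == "__main__":
    for number, sequence in enumerate(build_library(), start=1):
        print(number, sequence)
```

Each accepted sequence would then be appended to the staple strand for the corresponding address number.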
  • the information that can be stored in the seven data links corresponding to addresses 1-7 is the seven Chinese characters "Heaven and earth change and all things are connected".
  • the switchable data link adopts a partially complementary paired double-stranded structure.
  • a single Chinese character is converted into binary encoding and then encoded into a 10-base sequence using a cyclic encoding algorithm.
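The application does not spell out its cyclic encoding algorithm, so the sketch below substitutes a simple rotating (ternary) code as a stand-in: each base-3 digit selects one of the three bases that differ from the previous base, which avoids repeated bases. Mapping a character through its Unicode code point minus 0x4E00 (the start of the CJK Unified Ideographs block) is also an assumption; the resulting offset fits in 10 ternary digits because 3^10 = 59049.

```python
BASES = "ACGT"
CJK_START = 0x4E00  # assumed offset: start of the CJK Unified Ideographs block


def encode_char(ch: str, length: int = 10, start_base: str = "A") -> str:
    """Encode one character as `length` bases using a rotating ternary code."""
    value = ord(ch) - CJK_START
    if not 0 <= value < 3 ** length:
        raise ValueError("character outside the range covered by this sketch")
    trits = []
    for _ in range(length):
        value, trit = divmod(value, 3)
        trits.append(trit)
    trits.reverse()  # most significant digit first
    previous, out = start_base, []
    for trit in trits:
        choices = [b for b in BASES if b != previous]  # three bases != previous
        previous = choices[trit]
        out.append(previous)
    return "".join(out)


def decode_char(seq: str, start_base: str = "A") -> str:
    """Invert encode_char; raises ValueError if a base repeats its predecessor."""
    previous, value = start_base, 0
    for base in seq:
        choices = [b for b in BASES if b != previous]
        value = value * 3 + choices.index(base)
        previous = base
    return chr(value + CJK_START)


if __name__ == "__main__":
    encoded = encode_char("天")
    print(encoded, decode_char(encoded))
```

A code of this kind yields exactly 10 bases per character, matching the length stated above, although the actual mapping used in the application may differ.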
  • the complete data chain is formed by connecting the address complementary sequence, erasing functional region, T7 promoter sequence, and data sequence (including a pair of primers) (Figure 2).
  • the closed strand contains a sequence complementary to the 17 bases of the T7 promoter and the 5 bases downstream of it, together with an 8-base dangling sequence specific to the address (called the toehold).
  • the data strand and closed strand form a double-stranded structure through partial complementary pairing of 22 bases.
  • the structure of the nano-addressable platform is assembled.
  • the assembly process is as follows:
  • M13 single chain and addressable staple chain are mixed in 1× TAE-Mg²⁺ solution, with a final concentration of 10 nM for the backbone chain and 50 nM for the staple chain.
  • the obtained assembly product is purified using PEG precipitation, and a pure assembly structure of approximately 20 nM is obtained as a substrate for information storage.
  • the obtained DNA structure can be characterized using atomic force microscopy (AFM).
  • Embodiment 2 Data operation method of nanometer-addressable full-function DNA storage system
  • RNA molecules obtained by transcription are reverse transcribed using a reverse transcription kit to obtain the corresponding DNA strands.
  • the obtained DNA strands are quantified using fluorescence quantitative PCR to confirm the export of the data.
  • the specific experiment is as follows: first, the 647 data is written, at which point the writing efficiency of the 488 data is essentially 0; the 647 data is then erased, at which point the erasing efficiency of the 488 data is essentially 0. Next, the 488 data is written, giving a writing efficiency of about 54%, and then erased, giving an erasing efficiency of about 110%. The cycle is repeated more than 5 times to measure the addressable repeated erasing capability of the system. The results show that, after repeated erasing and writing, the fluorescence results match the expected erase and write outcomes, indicating that repeated modification of the data is feasible (Figure 8). In Figure 8, the upper part of the ordinate represents the erasure efficiency of the 488 data, the lower part represents the writing efficiency of the 488 data, and the abscissa represents the number of erase or write operations.
  • The 20 pM biotin-linked origami assembled in Example 1 was fixed on the glass slide through the PEG-biotin-avidin method. After cleaning, 1 μM Cy5-labeled data link 1 was added, incubated for 30 minutes, and washed, and fluorescence detection was performed under TIRF (total internal reflection fluorescence) microscopy. Then 1 μM erase chain 1 was added, incubated for 1 hour, and washed with buffer, and fluorescence images were taken again. The erasing and writing process was repeated 10 times, with the fluorescence changes recorded by TIRF.
  • Figure 9 is a single-point TIRF imaging diagram of seven address data being erased and written repeatedly 10 times.
  • Figure 9 shows that a single fluorescent spot remains very stable over 10 erase/write cycles.
  • Figure 10 is a TIRF (total internal reflection fluorescence microscope) imaging diagram and co-localization statistical diagram of seven address data being repeatedly erased and written 10 times. The co-localization of red and green fluorescence represents the presence of data on the origami.
  • Figure 10 shows that after 10 erase/write cycles, the co-localization ratio after writing is stable at about 85%, and the co-localization ratio after erasing is stable at about 10%.
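A co-localization ratio of the kind reported for Figure 10 can be computed from spot coordinates in the two fluorescence channels. The sketch below is only one plausible definition: it assumes one channel marks the origami (reference spots) and the other marks the Cy5-labeled data chain, and it counts a reference spot as co-localized if a data spot lies within a distance threshold. The threshold, units, and function name are hypothetical, and the application does not describe its analysis pipeline.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def colocalization_ratio(reference_spots: Sequence[Point],
                         data_spots: Sequence[Point],
                         max_distance: float = 2.0) -> float:
    """Fraction of reference spots with at least one data spot within max_distance."""
    if not reference_spots:
        return 0.0
    hits = sum(
        1 for ref in reference_spots
        if any(math.dist(ref, spot) <= max_distance for spot in data_spots)
    )
    return hits / len(reference_spots)


if __name__ == "__main__":
    # Made-up pixel coordinates, for illustration only.
    origami = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
    data = [(0.5, 0.5), (10.0, 11.0), (50.0, 50.0)]
    print(f"co-localization ratio: {colocalization_ratio(origami, data):.2f}")
```

Under this definition, a successful write should raise the ratio and a successful erase should lower it, in line with the roughly 85% and 10% values described above.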
  • DNA origami is designed with different matrix spacings, including 6 nm, 12 nm, 18 nm, and 24 nm. The information chains are arranged on the DNA origami at these spacings, each information chain is given a physical address, and the writing and reading of the data chains at the different matrix spacings are detected according to the method in Example 2.
  • the results are shown in Figures 12 and 13. When the matrix spacing is 6 nm, 12 nm, 18 nm, or 24 nm, there is no significant difference in the writing efficiency and readout efficiency of the data link.
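The matrix layouts compared here can be described by assigning each address a grid coordinate. The sketch below generates nominal positions for a rows × cols matrix at a given spacing; the function names are hypothetical, and real positions on an origami are constrained by where staple ends actually fall, which this sketch ignores.

```python
from typing import Dict, Tuple

GridIndex = Tuple[int, int]
PositionNm = Tuple[float, float]


def address_grid(rows: int, cols: int, spacing_nm: float) -> Dict[GridIndex, PositionNm]:
    """Map (row, col) address indices to nominal (x, y) positions in nanometers."""
    if rows < 1 or cols < 1 or spacing_nm <= 0:
        raise ValueError("rows, cols and spacing must be positive")
    return {(r, c): (c * spacing_nm, r * spacing_nm)
            for r in range(rows) for c in range(cols)}


def footprint_nm(rows: int, cols: int, spacing_nm: float) -> PositionNm:
    """Width and height spanned by the outermost addresses of the matrix."""
    return ((cols - 1) * spacing_nm, (rows - 1) * spacing_nm)


if __name__ == "__main__":
    for spacing in (6, 12, 18, 24):  # spacings tested in the example
        width, height = footprint_nm(2, 2, spacing)
        print(f"2x2 matrix at {spacing} nm spacing spans {width} x {height} nm")
```

Smaller spacings pack more addresses into the same origami area, which is why the observation that efficiency is unchanged down to 6 nm matters for storage density.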


Abstract

Provided in the present application are a data storage medium and a use thereof. Also provided is a nucleic acid molecule. The nucleic acid molecule can be combined with a carrier having addressable information, and data information contained in the nucleic acid molecule can be randomly read, erased or written in situ in the carrier.

Description

A data storage medium and its application
Technical field
This application relates to the field of data storage, and specifically to a data storage medium and its application.
Background
With the rise of big data and artificial intelligence technology, the demand for storing massive amounts of data has grown explosively, and mainstream storage media are increasingly unable to meet this rapidly growing demand. Deoxyribonucleic acid (DNA), the storage medium for the genetic code of carbon-based life selected over hundreds of millions of years of natural evolution, has extremely high storage density and robustness. The codability and efficient replication capabilities of DNA may provide a new strategy for high-density data storage and high-performance computing. DNA storage has the advantage of high physical stability; unlike electronic media, it does not degrade with the number of reads, and so offers a fundamental solution for long-term data storage. In addition, DNA combines information processing and computing capabilities, providing new ideas for developing novel integrated storage-computing architectures and systems.
The DNA data storage process mainly includes six technical steps: information encoding, writing, preservation, retrieval, readout, and decoding. Existing DNA storage systems are mainly based on artificially synthesized short-chain or long-chain DNA: digital information is encoded into DNA sequences, the corresponding DNA strands are synthesized and preserved in cells or in vitro, and sequencing technology is then used to read out the data.
Compared with existing mature data storage systems, existing DNA storage systems mainly comprise two storage modes. One is a non-random-read storage architecture. This scheme encodes and writes the data to be stored as a whole, so reading the data also requires sequencing all sequences in the system and decoding the overall sequencing results to obtain the data information. A non-random-read architecture cannot search for partial information within a file, and therefore cannot make addressable modifications to written information. The other is a DNA storage architecture with a random-read mode. In this type of system, the data to be written is first divided into data fragments, and the fragments are encoded and given index sequences. This type of architecture can therefore selectively read information fragments. However, selective reading in this type of system relies mainly on PCR technology: a portion of the original data must be taken out directly, or the data to be read must be extracted with magnetic beads, then amplified by PCR and read out by sequencing. This approach disrupts the composition of the original data, and after multiple reads, repeated PCR introduces many errors into the data, affecting recovery of the original data. A data reading method that achieves selective reading without destroying the original data, similar to the in-situ reading of traditional semiconductor storage media, is still lacking. In addition, current DNA storage systems mainly provide write-once archiving, and their ability to modify data after writing remains very limited; in particular, addressable modification of arbitrary data stored in a DNA storage system has not yet been achieved.
Therefore, there is an urgent need in the art for a DNA storage method that supports addressable writing, addressable modification, and/or addressable reading.
Summary of the invention
This application provides a DNA storage system with complete data-operation capabilities. It supports addressable data writing, data deletion, data modification, and data readout, in particular repeated erasing and writing, data modification, and repeated data readout, making up for the functional shortcomings of existing DNA storage systems. For example, this application enables programmable binding and separation of storage addresses and data on DNA molecules.
In one aspect, the present application provides a system comprising 1) a nucleic acid molecule containing data information, and 2) a carrier having addressable information, wherein the nucleic acid molecule is capable of binding to the carrier, and the data information contained in the nucleic acid molecule can be stored (also called written), read in situ, and/or edited in situ (including erasing and/or writing new data information) on the carrier.
In another aspect, the present application provides a method for data storage, the method comprising providing 1) the nucleic acid molecule containing data information, and 2) the carrier having addressable information.
In another aspect, the present application provides a method for data reading, the method comprising providing 1) the nucleic acid molecule containing data information, and 2) the carrier having addressable information.
In another aspect, the present application provides a method for data editing, the method comprising providing 1) the nucleic acid molecule containing data information, and 2) the carrier having addressable information, and replacing the data information in the nucleic acid molecule.
In certain embodiments, the carrier has address sequences of different sequence at different physical locations.
In certain embodiments, the carrier comprises a DNA origami substrate, the DNA origami substrate comprises staple strands, and the staple strands comprise address sequences having addressable information.
In certain embodiments, in the carrier, the address sequences at the ends of the staple strands are arranged in a matrix. The matrix is at least a 2×2 matrix. In certain embodiments, the distance between two adjacent staple strands is about 6-24 nm. For example, the distance between two adjacent staple strands is about 6 nm. For example, the distance between two adjacent staple strands is about 12 nm. For example, the distance between two adjacent staple strands is about 18 nm. For example, the distance between two adjacent staple strands is about 24 nm.
In certain embodiments, the nucleic acid molecules are bound to the staple strands and arranged in a matrix. The matrix is at least a 2×2 matrix. In certain embodiments, the distance between two adjacent nucleic acid molecules is about 6-24 nm. For example, the distance between two adjacent nucleic acid molecules is about 6 nm. For example, the distance between two adjacent nucleic acid molecules is about 12 nm. For example, the distance between two adjacent nucleic acid molecules is about 18 nm. For example, the distance between two adjacent nucleic acid molecules is about 24 nm.
In certain embodiments, the nucleic acid molecule comprises an address complementary sequence that is complementary to the address sequence on the carrier. Through the complementarity between the address complementary sequence and the address sequence, the nucleic acid molecule is bound at a specific physical location on the carrier.
In certain embodiments, the address complementary sequence is about 15 or more nucleotides in length.
In certain embodiments, the nucleic acid molecule comprises a data sequence, and the data information in the data sequence can be read while the nucleic acid molecule remains substantially unseparated from the carrier. For example, a readable data strand is synthesized through an activation reaction, and the data information in the data sequence is thereby read.
In certain embodiments, the data sequence is about 1 or more nucleotides in length. The data sequence may be single-stranded.
In certain embodiments, the nucleic acid molecule comprises a read priming sequence, the read priming sequence is capable of causing the data sequence to be synthesized or transcribed into a strand to be sequenced, and the strand to be sequenced is complementary to the data sequence.
In certain embodiments, the read priming sequence comprises a promoter. For example, the read priming sequence is a T7 promoter. The read priming sequence enables transcription of the data sequence. By sequencing and decoding the transcription products, the data information contained in the data sequence can be read.
In certain embodiments, the system and/or method further comprises 3) a blocking chain, which is capable of preventing the data information from being read. In this application, the blocking chain may also be called a closed chain.
In certain embodiments, the blocking chain comprises a blocking sequence that is capable of being complementary to the read priming sequence, so that the data information is not read. In a specific embodiment, the blocking sequence comprises a promoter complementary sequence, which binds to the promoter serving as the read priming sequence so that transcription cannot be initiated.
In certain embodiments, the blocking strand comprises a blocking extension sequence located upstream and/or downstream of the blocking sequence, and the blocking extension sequence is substantially non-complementary to the nucleic acid molecule. In this application, the blocking extension sequence may also be referred to as an overhang (toehold). The toehold provides the energetic driving force and the specificity for the transcription activation reaction. In a specific embodiment, the blocking extension sequence is about 4-10 nucleotides in length, for example 6 nucleotides.
In this application, for the data information at each address, in the storage state or when reading is not required, the blocking strand is bound by complementarity to the read initiation sequence (that is, the read priming sequence) on the nucleic acid molecule, and transcription cannot be initiated.
In certain embodiments, the system and/or method further comprises 4) an activation strand; for example, the activation strand can be added when data needs to be read or written. The activation strand releases the blocking strand from the nucleic acid molecule, and the activation strand is complementary to both the blocking sequence and the blocking extension sequence. Each activation strand corresponds to a specific physical address. When the data information at a specific address needs to be read, the activation strand corresponding to that address can be added; it binds to the blocking strand so that the blocking strand detaches from the read initiation sequence of the nucleic acid molecule, thereby initiating reading, for example by activating the transcription function of that address. In a specific embodiment, an activation strand corresponding to a specific address is added to activate transcription at that address; under the action of T7 RNA polymerase, the data at that address is transcribed into RNA strands, while addresses that have not been activated have no transcription capability. The RNA strands obtained are collected and sequenced, and the nucleic acid sequence is decoded into data information, thereby achieving addressable data reading.
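The sequence logic of the blocking and activation strands can be sketched as below. This is a simplified model under stated assumptions: the promoter is the commonly cited 18-nt T7 promoter consensus, the 6-nt toehold is hypothetical, and the function name `design_read_switch` is invented for illustration. The actual sequences used in the application are not given in this passage.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_read_switch(promoter: str, toehold: str):
    """Return (blocking_strand, activation_strand) for one address.

    The blocking strand pairs with the promoter and carries an unpaired
    toehold; the activation strand pairs with the whole blocking strand.
    """
    blocking_strand = reverse_complement(promoter) + toehold
    activation_strand = reverse_complement(blocking_strand)
    return blocking_strand, activation_strand


T7_PROMOTER = "TAATACGACTCACTATAG"  # commonly cited consensus (assumption)
TOEHOLD = "GATTCA"                   # hypothetical 6-nt blocking extension

blocking_strand, activation_strand = design_read_switch(T7_PROMOTER, TOEHOLD)
pairs_with_promoter = len(T7_PROMOTER)         # blocking strand on the data strand
pairs_with_activator = len(activation_strand)  # blocking strand on the activation strand
print(pairs_with_promoter, pairs_with_activator)
```

In this model the activation strand offers the blocking strand more base pairs than the promoter does, which is one way to picture why the toehold drives the displacement; real designs also depend on thermodynamics and kinetics that are not captured here.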
In certain embodiments, the data reading process can be carried out at room temperature. At room temperature, the blocking strand bound to the nucleic acid molecule at a specific physical location can bind to the activation strand.
In certain embodiments, the nucleic acid molecule comprises an erase/write function sequence located upstream and/or downstream of the address complementary sequence, and the erase/write function sequence is substantially non-complementary to the address sequence on the carrier.
In certain embodiments, the system and/or method further comprises 5) an erasure strand; for example, the erasure strand can be added when data needs to be erased. The erasure strand releases the nucleic acid molecule from the carrier, and the erasure strand is complementary to both the address complementary sequence and the erase/write function sequence. In certain embodiments, the erasure strand can bind by complementarity to the erase/write function sequence of the nucleic acid molecule. Each erasure strand corresponds to a specific physical address. When the data information at a specific address needs to be erased, the erasure strand corresponding to that address can be added; it then binds the data sequence at that address, for example through a DNA strand displacement reaction, so that the data sequence is removed from the carrier and the address returns to its unwritten state.
In certain embodiments, the erasing process can be carried out at room temperature. At room temperature, after the erasure strand is added, the nucleic acid molecule at a specific physical location can bind to the erasure strand and is substantially no longer bound to the carrier.
In certain embodiments, once the original data information has been erased, that is, once the original data sequence has been removed by the erasure strand, a new data sequence can be written to that address, for example by adding a new nucleic acid molecule containing data information. This enables in-situ modification and/or editing of data.
In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the carrier is about 1:1 or higher. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the carrier is about 2:1 or higher. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the carrier is about 3:1 or higher. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the carrier is about 4:1 or higher. In certain embodiments, the molar ratio of the nucleic acid molecule to the address sequence in the carrier is about 5:1 or higher.
In another aspect, the application provides the nucleic acid molecule used in the system and/or method, the nucleic acid molecule comprising the data sequence, the address complementary sequence and/or the read priming sequence. In one embodiment, the nucleic acid molecule further comprises primers upstream and/or downstream of the data sequence, which are used to verify the feasibility of erasing/writing and reading. In other specific embodiments, the primer portion can be replaced with a data sequence containing data information, further increasing storage capacity. Figure 2 shows an example of the nucleic acid molecule described herein.
In another aspect, the application provides the carrier used in the system and/or method.
In another aspect, the application provides a storage medium comprising the method of the application.
In another aspect, the application provides a device comprising the storage medium of the application and a processor coupled to the storage medium, the processor being configured to execute a program stored in the storage medium so as to implement the method of the application.
In a specific embodiment, the inventors of the present application constructed a 6 nm DNA chip that supports multiple rounds of in-situ erasing and writing and/or multiple rounds of in-situ selective reading (WMRM). In a more specific embodiment, the addressability and programmability of DNA nanostructures are used to arrange data sequences that store information on DNA origami at a pitch of 6 nm, giving each data sequence a physical address at 6 nm spacing. By designing erasure strands and using the information processing and computing capabilities provided by DNA strand displacement reactions, DNA data sequences can be repeatedly erased and rewritten on the origami, and any data can be modified. In addition, a promoter (for example, T7 promoter) switch is placed on the DNA strand as the read initiation sequence. By designing blocking strands and/or activation strands, DNA strand displacement reactions are used to regulate the transcriptional activity of the DNA strand, so that the DNA information is selectively transcribed and then subjected to RNA sequencing (for example, using Illumina technology), achieving selective and repeated reading of the data information.
Compared with the prior art, the present application has at least the following features. It is based on DNA assembly technology and has nanoscale addressability, which current DNA storage systems lack. Using this nanoscale addressability, the areal density of DNA storage can be raised to 1 bit per square nanometer, exceeding existing inorganic storage architectures and DNA storage architectures. Based on controllable dynamic assembly on the DNA origami surface, a fully functional storage system is realized, including operations such as reading, writing and modifying data. The readout process uses in-situ transcription into RNA molecules followed by sequencing of the transcription products, which leaves the original storage system unchanged and achieves non-destructive DNA data readout. For example, the method and data carrier of the present application allow multiple rounds of reading and writing and multiple rounds of reading. For example, the structure of the data system of the present application remains substantially unchanged after being read multiple times. For example, the address information of the data carrier of the present application has high accuracy and specificity, enabling a higher level of discrimination. For example, the method and data system of the present application can read data information at temperatures from room temperature to about 42 degrees Celsius.
Those skilled in the art will readily appreciate other aspects and advantages of the present application from the detailed description below. Only exemplary embodiments of the present application are shown and described in the detailed description that follows. As those skilled in the art will recognize, the content of this application enables them to modify the specific embodiments disclosed without departing from the spirit and scope of the invention to which this application relates. Accordingly, the drawings and the description in this application are merely illustrative and not restrictive.
Description of the Drawings
The specific features of the invention to which this application relates are set forth in the appended claims. The features and advantages of the invention can be better understood by reference to the exemplary embodiments described in detail below and to the accompanying drawings. The drawings are briefly described as follows:
Figure 1 shows an exemplary operation flow of the data storage medium of the present application. For example, data of any form can be stored, including but not limited to Chinese characters; the storage capacity can be expanded without limit according to the length of the data sequence, including but not limited to 16 bits/site or 120 bits/site; the data can be read using sequencing technologies known in the art, including but not limited to high-throughput in-situ sequencing, transcription-RNA sequencing, and transcription-reverse transcription-amplification-DNA sequencing.
Figure 2 shows an exemplary structural composition of the nucleic acid molecule of the present application, that is, the data strand, which comprises a data sequence. Primer 1 and primer 2 are designed only to make it easier to verify the feasibility of selective erasing/writing and reading; this portion can be replaced with actual stored information to further increase storage capacity.
Figure 3 shows an exemplary process of selective reading and reversible erasing/writing of the data storage medium of the present application. Transcription activation: the key strand is the activation strand. When the nucleic acid molecule described in this application is sealed by the blocking strand, transcription cannot start. After the key strand is added, the blocking strand binds to the key strand instead and no longer binds the promoter; after the complete T7 promoter is added, RNA polymerase binds to the promoter and initiates transcription. Reversible erasing/writing: after the erase/write strand (that is, the erasure strand) is added, the data strand binds to the erasure strand and detaches from the carrier. After a data strand with the same address complementary sequence is added again, the data strand is written onto the carrier again.
Figure 4 shows the structural characterization results after data is written on the DNA origami surface. The upper part is a schematic diagram of the addresses at which data is written on the DNA origami surface. The lower part shows the AFM characterization results after data writing.
Figure 5 shows a heat map obtained from statistics of PAGE data on the orthogonality of readout activation. Data can be read out efficiently only when the activation strand matches the data strand.
Figure 6 shows the second-generation sequencing results after the data at seven addresses is read out. The sequence information obtained by sequencing at each address is fully consistent with the information written.
Figure 7 shows super-resolution fluorescence microscopy imaging results of addressed data erasing and writing. The upper part shows the site-specific data erasing and writing process and a schematic diagram of the corresponding data addresses. The lower part shows the fluorescence imaging structural characterization results for each step of the process.
Figure 8 shows fluorescence test results on the feasibility of repeated data modification.
Figure 9 shows single-spot TIRF (total internal reflection fluorescence microscopy) images of 10 repeated erase/write cycles of data at seven addresses.
Figure 10 shows TIRF (total internal reflection fluorescence microscopy) images and co-localization statistics for 10 repeated erase/write cycles of data at seven addresses; co-localization of red and green fluorescence indicates the presence of data on the origami.
Figure 11 shows the qPCR quantification of transcribed RNA during repeated reading of the data strand.
Figure 12 shows the writing results of data strands at four different matrix spacings.
Figure 13 shows the readout results of data strands at four different matrix spacings.
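The description of Figure 1 mentions storing Chinese characters at 16 bits per site. A minimal sketch of what such an encoding could look like is given below. The 2-bits-per-base mapping (00->A, 01->C, 10->G, 11->T) and the use of the 16-bit Unicode code point are assumptions made for illustration; the source does not specify its coding scheme, and the function names are hypothetical.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode_char(ch: str) -> str:
    """Encode one character (code point below 2**16) as 8 nucleotides."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("character does not fit in 16 bits")
    # Take the 16 bits two at a time, most significant pair first.
    return "".join(BASES[(code >> shift) & 0b11] for shift in range(14, -2, -2))


def decode_site(seq: str) -> str:
    """Decode an 8-nucleotide site back into one character."""
    code = 0
    for base in seq:
        code = (code << 2) | BASES.index(base)
    return chr(code)


site = encode_char("数")
print(site, decode_site(site))
```

Under this assumed mapping, a 16-bit site needs 8 nucleotides and a 120-bit site would need 60; practical schemes usually add constraints (for example on homopolymers or GC content) that this sketch ignores.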
Detailed Description
The embodiments of the invention of the present application are described below by way of specific examples. Those familiar with this technology can readily understand other advantages and effects of the invention from the content disclosed in this specification.
Definition of Terms
In this application, the term "addressability" generally refers to associating specific information with a location on a storage medium. For example, to selectively access, read and/or modify information at a specific location, the data carrier must be addressable. For example, when recording data information, the information carrier can at the same time record substantially unique index information (an address) for each piece of data information. For example, the address information of specific data information is recorded in the form of a physical location, a spatial location or the like on the data carrier. For example, when the data carrier is addressable, the information to be accessed, read and/or modified can be accessed selectively, without having to access the entries one by one.
In this application, the terms "nucleic acid molecule", "nucleic acid sequence" and "nucleic acid fragment" are used interchangeably and generally refer to deoxyribonucleotides or ribonucleotides of any length, or analogs thereof. Exemplary nucleotides include deoxyribonucleotides (DNA) or ribonucleotides (RNA), or non-standard nucleotides, nucleotide analogs and/or modified nucleotides.
In this application, the term "carrier" generally refers to a substance capable of carrying nucleic acid molecules. For example, the carrier may comprise a nucleic acid nanostructure. A nucleic acid nanostructure (also called a nanostructure) may be a two-dimensional or three-dimensional nanostructure made of nucleic acids (for example DNA, RNA, locked nucleic acid (LNA), peptide nucleic acid (PNA), or any combination thereof). For example, a single-stranded or double-stranded nucleic acid (for example, one having only a helical structure) may not be considered a "nanostructure". In some embodiments, the nucleic acid nanostructure serves as a scaffold for forming more complex structures, such as molecular complexes. In some embodiments, the nucleic acid nanostructure is a DNA origami structure assembled using a DNA origami method. For example, a nucleic acid origami nanostructure may refer to a nucleic acid nanostructure formed by assembling two or more "staple strands" with one or more "scaffold" strands into a prescribed shape. Staple strands are typically short (for example, 50 nucleotides or fewer) single-stranded nucleic acids; scaffold strands are typically longer (for example, longer than 200 nucleotides) single-stranded nucleic acids. The nucleic acid origami nanostructure may be a DNA origami nanostructure.
DNA origami nanostructures can fold (for example, by self-assembly) into discrete and unique geometric patterns, such as two-dimensional (2D) and three-dimensional (3D) shapes, which can further self-assemble to create larger nanostructures or microstructures containing two or more discrete origami nanostructures. In some embodiments, the scaffold strand has a sequence derived from M13 phage. Other scaffold strands can be used. In some embodiments, the staple strands are fluorophore-labeled staple strands. In some embodiments, the staple strand is 4 to 30 nucleotides in length (for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides). In some embodiments, for example at room temperature, the staple strand is stably bound to the scaffold strand (for longer than 10 seconds). In some embodiments, for example at room temperature, the staple strand is stably bound to the scaffold strand (for longer than one week). In some embodiments, the staple strand is longer than 30 nucleotides.
In this application, the term "in situ" generally refers to an operation carried out at the original location. For example, in-situ reading means that the data is read from the data-recording nucleic acid molecule at its original position on the carrier, without first releasing the nucleic acid molecule into solution. The term "amplification" generally refers to the production of copies of a nucleic acid molecule via repeated cycles of primed enzymatic synthesis. The reading step of the present application may comprise a polymerization step, including but not limited to polymerase chain reaction (PCR), transcription into RNA and the like, and also including, for example, any other nucleic acid amplification and/or transcription technique known to those skilled in the art.
In this application, the term "complementary" or "complementarity" generally refers to the ability of a nucleic acid to form hydrogen bonds with another nucleic acid sequence through traditional Watson-Crick base pairing or other non-traditional types of pairing. For example, the sequence A-G-T is complementary to the sequence T-C-A. Percent complementarity indicates the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (for example, Watson-Crick base pairs) with a second nucleic acid sequence (for example, 5, 6, 7, 8, 9 or 10 out of 10 correspond to 50%, 60%, 70%, 80%, 90% and 100% complementarity, respectively). For example, "perfectly complementary" means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. For example, "substantially complementary" refers to a degree of complementarity of at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions (that is, stringent hybridization conditions). For example, "binding" generally refers to one species binding uniquely to a particular species, for example in a sequence-specific manner.
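The percent complementarity defined above can be computed with a few lines of code. The sketch below assumes that only Watson-Crick pairs count and that the two sequences are written position by position as they face each other in the duplex, as in the A-G-T / T-C-A example; the function name is illustrative.

```python
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of positions in `first` that pair with the facing base in `second`.

    Both sequences must have the same length and be written so that
    position i of `first` faces position i of `second`.
    """
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(WATSON_CRICK.get(a) == b for a, b in zip(first, second))
    return 100.0 * paired / len(first)


print(percent_complementarity("AGT", "TCA"))                # all three positions pair
print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTCC"))  # 8 of 10 positions pair
```

If the second strand is instead supplied in its own 5'->3' orientation, it has to be reversed before calling the function.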
In this application, the term "comprising" generally means including the explicitly specified features without excluding other elements.
In this application, the term "about" generally refers to a variation within a range of 0.5% to 10% above or below the specified value, for example a variation within 0.5%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5% or 10% above or below the specified value.
Detailed Description of the Invention
In one aspect, the present application provides a nucleic acid molecule that can bind to a carrier carrying addressable information, wherein the data information that the nucleic acid molecule may contain can be randomly read, erased and written in situ on the carrier. For example, the nucleic acid molecule provided in this application can serve as a data strand that binds to a specific position on the carrier on the basis of base complementarity; combined with dynamic DNA assembly technology, this allows storage addresses and data to be joined and separated in a programmable way. For example, the data information can be read without changing the original storage system, achieving non-destructive data readout. For example, when the data information recorded in the nucleic acid molecule is read, the nucleic acid molecule does not first need to be released from the carrier, so the required information can be read randomly and in situ.
For example, address sequences at different physical locations on the carrier have different sequences; the nucleic acid molecule may comprise an address complementary sequence, which is complementary to the address sequence on the carrier. For example, the address complementary sequence may be an address recognition sequence that recognizes and specifically binds the address sequence on the carrier. For example, address information can be recorded by the address sequences at different physical locations on the carrier. For example, the carrier may have 2 or more coordinate points. For example, different coordinate points represent different physical locations. For example, a specific coordinate point can be located by the end of a staple strand at a specific position in the DNA origami. For example, address sequences with unique sequence compositions extend from different coordinate points. For example, the address sequence of the present application may have a linear single-stranded or branched structure. For example, because the address sequence at a specific physical location has a unique sequence, it can be used to record index information; when an address complementary sequence that is substantially uniquely complementary to it (carried by a specific data strand) binds to the address sequence, the data information is associated with the address information. In some embodiments, the data strand is stably bound to the address strand (for example, for longer than 10 seconds), for example at room temperature. In some embodiments, the data strand is stably bound to the address strand (for example, for longer than 10 seconds), for example at low temperature (about 4 degrees Celsius). In some embodiments, the data strand is stably bound to the address strand (for example, for longer than 10 seconds), for example when stored as a dry powder on a bacteriostatic, antioxidant and high-temperature-resistant plate.
For example, the address complementary sequence is about 15 or more nucleotides in length. For example, the address complementary sequence of the present application may have a linear single-stranded or branched structure. For example, the length of the address complementary sequence may be about 10 or more, about 11 or more, about 12 or more, about 13 or more, about 14 or more, about 15 or more, about 16 or more, about 17 or more, about 18 or more, about 19 or more, about 20 or more, about 25 or more, about 30 or more, about 40 or more, about 50 or more, or about 100 or more nucleotides. For example, the nucleotides may include natural nucleotides and/or artificially modified nucleotides, including but not limited to methyl modifications, amino modifications, fluoro modifications and the like.
For example, the nucleic acid molecule may further comprise an erase/write function sequence located upstream and/or downstream of the address complementary sequence, and the erase/write function sequence is substantially non-complementary to the address sequence on the carrier. For example, the erase/write function sequence may also be called the erase/write functional region. For example, the sequence formed by the erase/write function sequence together with the address complementary sequence need not be fully complementary to the address sequence on the carrier. For example, an exemplary address complementary sequence may be 20 nucleotides long and the erase/write function sequence 10 nucleotides long; the address sequence on the carrier may be complementary only to the 20 nucleotides of the address complementary sequence, and not to the full 30 nucleotides formed by the erase/write function sequence together with the address complementary sequence. For example, a strand with higher binding capacity (such as the erasure strand of the present application) can therefore be introduced so that the nucleic acid sequence of the data strand is removed (erased) from the specific position on the carrier.
For example, when the erasure strand is present, the nucleic acid molecule can be released from the carrier, and the erasure strand is complementary to both the address strand and the erase/write function sequence. For example, at room temperature, the nucleic acid molecule (data strand) of the present application can bind more strongly to the erasure strand. For example, the data strand may form more and/or stronger base pairs with the erasure strand than with the address sequence on the carrier.
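The 20-nucleotide / 10-nucleotide example can be made concrete by counting how many contiguous base pairs the data strand can form with each partner. The sketch below uses hypothetical sequences, and it uses contiguous pairing length only as a crude proxy for binding strength (an assumption; real designs rely on thermodynamic models).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_duplex(strand_a: str, strand_b: str) -> int:
    """Length of the longest contiguous region of strand_a that pairs with strand_b."""
    target = reverse_complement(strand_b)
    best = 0
    previous = [0] * (len(target) + 1)
    for base_a in strand_a:
        current = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, start=1):
            if base_a == base_t:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


# Hypothetical sequences, chosen only to match the 20 nt / 10 nt example.
address_complement = "ACGTTGCAAGGCTTAGCCAT"  # 20 nt
erase_function = "GGATCCTTAG"                # 10 nt
data_strand_part = address_complement + erase_function

carrier_address = reverse_complement(address_complement)  # pairs with 20 nt
erasure_strand = reverse_complement(data_strand_part)     # pairs with 30 nt

print(longest_duplex(data_strand_part, carrier_address))
print(longest_duplex(data_strand_part, erasure_strand))
```

In this model the erasure strand can pair with the full 30 nucleotides while the carrier address pairs with only 20, which is the asymmetry the erasing step relies on.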
For example, to achieve addressable erasing and writing, the address-complementary sequence of the nucleic acid molecule (data strand), the address sequence at the corresponding specific position on the carrier, and the sequence of the corresponding erase strand may be designed with specific base orders so that they match in a unique complementary manner. For example, the address sequence at a specific position on the carrier is uniquely complementary to the address-complementary sequence of a specific nucleic acid molecule (data strand), which enables addressable writing. For example, the address-complementary sequence of the specific nucleic acid molecule (data strand) bound at a specific position on the carrier is uniquely complementary to the sequence of the corresponding erase strand, which enables addressable erasing.
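As a non-limiting illustration of the 20-nucleotide and 30-nucleotide example above, the following Python sketch counts how many contiguous bases of the data strand can pair with the carrier address sequence and with the erase strand. The sequences are random placeholders rather than sequences disclosed in the present application, the helper names are hypothetical, and placing the erase/write function sequence downstream of the address-complementary sequence is an assumption made only for this sketch.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_duplex(strand_a: str, strand_b: str) -> int:
    """Longest contiguous stretch of strand_a that can base-pair with strand_b.

    Computed as the longest common substring of strand_a and the reverse
    complement of strand_b.
    """
    target = reverse_complement(strand_b)
    best = 0
    prev = [0] * (len(target) + 1)
    for base_a in strand_a:
        cur = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, start=1):
            if base_a == base_t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


rng = random.Random(1)  # fixed seed so the placeholder sequences are reproducible
address_complement = "".join(rng.choice("ACGT") for _ in range(20))  # placeholder
erase_function = "".join(rng.choice("ACGT") for _ in range(10))      # placeholder

# 30-nt region of the data strand: address-complementary + erase/write function
data_region = address_complement + erase_function
carrier_address = reverse_complement(address_complement)  # pairs with 20 nt only
erase_strand = reverse_complement(data_region)            # pairs with all 30 nt

bound_to_carrier = longest_duplex(data_region, carrier_address)
bound_to_erase = longest_duplex(data_region, erase_strand)
print(bound_to_carrier, bound_to_erase)
```

Under these assumptions the erase strand offers ten more paired bases than the carrier address, which is the difference in binding capacity that the erasing step relies on.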
For example, the nucleic acid molecule may comprise a data sequence, and the data information in the data sequence can be read while the nucleic acid molecule remains substantially unseparated from the carrier. For example, the data information can be read while the nucleic acid molecule (data strand) remains in situ where it is stored. For example, reading methods include, but are not limited to, in situ sequencing, in situ transcription into other nucleic acid molecules that are used to read the information, in situ read-out into other nucleic acid molecules that are used to read the information, and so on. For example, in the present application, exporting the information stored in the nucleic acid molecule (data strand) may be regarded as reading. For example, sequencing the exported (for example transcribed or amplified) other nucleic acid molecules may be regarded as a subsequent, optional additional reading step.
For example, the data sequence is about 1 or more nucleotides in length. For example, the data sequence of the present application may have a linear single-stranded structure or a branched structure. For example, the length of the data sequence may be about 1 or more, about 2 or more, about 3 or more, about 4 or more, about 5 or more, about 6 or more, about 7 or more, about 8 or more, about 9 or more, about 10 or more, about 15 or more, about 20 or more, about 30 or more, about 40 or more, about 50 or more, about 100 or more, about 120 or more, about 150 or more, about 200 or more, about 500 or more, about 700 or more, or about 1000 or more nucleotides. For example, the storage approach of the present application can be compatible with data sequences of any length.
For example, the nucleic acid molecule may comprise a read initiation sequence capable of initiating the reading of the data sequence into a strand to be sequenced, the strand to be sequenced being complementary to the data sequence. For example, the read initiation sequence of the present application may be a promoter, including but not limited to the T7 promoter. For example, the storage approach of the present application can be compatible with read initiation sequences of any length and type, where the specific sequence of the read initiation sequence can be used for binding and polymerization initiation by polymerases known in the art.
For example, the nucleic acid molecule (data strand) of the present application may support selective reading. For example, selective reading can be achieved by introducing a blocking strand whose blocking sequence is complementary to the read initiation sequence. For example, when a blocking strand is present, the data sequence can be kept from being read; the blocking strand may comprise a blocking sequence that is partially or fully complementary to the read initiation sequence. For example, the blocking sequence of the blocking strand is complementary to the read initiation sequence and to a stretch of about 5 nucleotides upstream/downstream of the read initiation sequence. For example, the blocking sequence is 22 nucleotides long, of which 17 nucleotides are complementary to the read initiation sequence and the other 5 nucleotides are complementary to a stretch of about 5 nucleotides upstream/downstream of the read initiation sequence. For example, the blocking strand has a higher binding capacity for the data strand than the T7 promoter strand has. For example, the blocking strand of the present application may bind to the read initiation sequence of the nucleic acid molecule (data strand), to the data strand itself, or to any position at which it can block the export of the information stored in the nucleic acid molecule (data strand).
For example, the blocking strand may further comprise a blocking extension sequence located upstream and/or downstream of the blocking sequence, the blocking extension sequence being substantially non-complementary to the nucleic acid molecule. For example, the blocking strand has a blocking sequence and a blocking extension sequence; when the blocking strand is bound to the nucleic acid molecule (data strand), the blocking extension sequence remains substantially unbound to the nucleic acid molecule (data strand), for example forming an overhang. For example, the blocking extension sequence may be about 8 nucleotides long. For example, the blocking extension sequence may act as a lever (toehold): by increasing the binding capacity between the activation strand and the blocking strand, it allows the activation strand to remove the blocking strand from the data strand.
For example, when an activation strand is present, the blocking strand can be left unbound from the nucleic acid molecule, and the activation strand is complementary to both the blocking sequence and the blocking extension sequence. For example, during annealing, the activation strand of the present application can bind more strongly to the blocking strand. For example, compared with the sequence on the nucleic acid molecule (data strand) that binds the blocking strand, the activation strand may form more and/or stronger base pairs with the blocking strand.
For example, to achieve addressable reading, the sequence of the nucleic acid molecule (data strand) that is used to block reading, the blocking sequence of the corresponding blocking strand, and the sequence of the corresponding activation strand may be designed with specific base orders so that they match in a unique complementary manner. For example, the sequence of the nucleic acid molecule (data strand) that is used to block reading is uniquely complementary to the blocking sequence of the corresponding blocking strand, which enables addressable locking of reading. For example, the sequence of the corresponding activation strand is uniquely complementary to the blocking sequence of the corresponding blocking strand, which enables addressable unlocking (activation) of reading. For example, when a specific nucleic acid molecule (data strand) is in the read-locked state, its data sequence may be substantially not read, transcribed and/or amplified. For example, when a specific nucleic acid molecule (data strand) is in the unlocked (activated) state, its data sequence can be read, transcribed and/or amplified.
For example, the carrier may comprise a DNA origami substrate, and the staple strands of the DNA origami may comprise address sequences carrying addressable information. For example, the carrier may have 2 or more coordinate points. For example, different coordinate points represent different physical positions. For example, a specific coordinate point may be located by the end of a staple strand at a specific position in the DNA origami. For example, address sequences with unique sequence compositions extend from different coordinate points. For example, because the address sequence at a specific physical position has a unique sequence, it can be used to record index information; when a substantially uniquely complementary address-complementary sequence (contained in a specific data strand) binds to the address sequence, the data information can be associated with the address information. For example, 2 or more of the staple strands are spaced about 6 nanometers or more apart. For example, adjacent staple strands are spaced about 6 nanometers or more, about 7 nanometers or more, about 8 nanometers or more, about 9 nanometers or more, about 10 nanometers or more, about 15 nanometers or more, about 20 nanometers or more, about 25 nanometers or more, or about 30 nanometers or more apart.
The present application provides a system, which may comprise the nucleic acid molecule of the present application and a carrier. For example, the system may further comprise the erase strand of the present application, the blocking strand of the present application and/or the activation strand of the present application.
The present application provides a data storage method, which may comprise providing the nucleic acid molecule of the present application and/or the system of the present application. The present application also provides a data editing and/or data reading method; the data editing method may comprise replacing the data information stored in the nucleic acid molecule of the present application. The present application also provides a data reading method, which may comprise determining the data information stored in the nucleic acid molecule of the present application.
For example, the method may further comprise providing a carrier; the carrier may comprise a DNA origami substrate, the staple strands of the DNA origami may comprise address sequences carrying addressable information, and the method may comprise providing the nucleic acid molecule and the address sequence at a molar ratio of about 2:1 or higher. For example, the method may comprise providing the nucleic acid molecule and the address sequence at a molar ratio of about 2:1 or higher, about 2.1:1 or higher, about 2.2:1 or higher, about 2.3:1 or higher, about 2.4:1 or higher, about 2.5:1 or higher, about 3:1 or higher, about 4:1 or higher, about 5:1 or higher, about 10:1 or higher, about 20:1 or higher, about 50:1 or higher, or about 100:1 or higher. For example, choosing a suitable molar ratio can increase the success rate of storing the data strand. For example, choosing a suitable molar ratio can improve the cost-effectiveness of storing the data strand.
For example, the method may further comprise providing the erase strand of the present application; at room temperature, the nucleic acid molecule at a specific physical position binds to the erase strand, and the nucleic acid molecule is substantially unbound from the carrier. The terms "room temperature" and "ambient temperature" generally refer to a temperature between about 16 degrees Celsius and about 40 degrees Celsius, for example a temperature between about 16 degrees Celsius and about 25 degrees Celsius, for example about 25 degrees Celsius.
For example, the method may further comprise providing the blocking strand of the present application; the blocking strand binds to the nucleic acid molecule, and the data information of the nucleic acid molecule substantially cannot be read. For example, the method may further comprise providing the activation strand of the present application; during heating and cooling, the blocking strand bound to the nucleic acid molecule at a specific physical position binds to the activation strand, and the blocking strand is substantially unbound from the nucleic acid molecule. For example, the heating and cooling process of the present application includes a process of annealing nucleic acids. For example, a process in which the partially double-stranded portion of the nucleic acid molecule of the present application is separated (for example by heating) and then restored to a partially double-stranded structure (for example by cooling) may be a nucleic acid annealing process. For example, heating at about 95 degrees Celsius for about 3 minutes and then cooling to room temperature at a rate of about 1.2 degrees Celsius per minute may be a nucleic acid annealing process.
In another aspect, the present application provides a storage medium on which is recorded a program capable of running the method of the present application.
In another aspect, the present application provides a device, which may comprise the storage medium of the present application. In another aspect, the present application provides a non-volatile computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement any one or more of the methods described in the present application. For example, the non-volatile computer-readable storage medium may include floppy disks, flexible disks, hard disks, solid-state storage (SSS) (for example solid-state drives (SSD), solid-state cards (SSC) and solid-state modules (SSM)), enterprise-grade flash drives, magnetic tape, or any other non-transitory magnetic medium. The non-volatile computer-readable storage medium may also include punched cards, paper tape, optical mark sheets (or any other physical medium having a pattern of holes or other optically recognizable marks), compact disc read-only memory (CD-ROM), compact disc rewritable (CD-RW), digital versatile discs (DVD), Blu-ray discs (BD) and/or any other non-transitory optical medium.
For example, the device of the present application may further comprise a processor coupled to the storage medium, the processor being configured to execute the program stored in the storage medium so as to implement the method of the present application. For example, the device may implement various mechanisms to ensure that the methods described in the present application, when executed on a database system, produce correct results. In the present application, the device may use magnetic disks as persistent data storage. In the present application, the device may provide database storage and processing services for multiple database clients. The device may store database data across multiple shared storage devices and/or may use one or more execution platforms having multiple execution nodes. The device may be organized so that storage and computing resources can be scaled effectively without limit.
The present application provides the following embodiments:
Preparation of a nano-addressable, fully functional DNA storage system, comprising the following steps:
(1) Design and assemble the nano-addressable DNA platform. Addresses on the DNA origami surface are represented as DNA sequences. A sequence screening algorithm is used to design a library of orthogonal sequences, for example 15 to 18 bases long or longer, so that each data strand can be stored at its specific address by stable hybridization. Using a single-layer rectangular DNA origami as the template, the staple strands are designed so that the address sequences extend from the corresponding sites on the origami surface. The address-bearing set of staple strands is mixed with the scaffold strand and assembled by annealing to form an addressable blank DNA storage platform.
For the DNA origami assembly process, reference may be made to DNA origami assembly techniques known in the art, for example Rothemund, P. W. K. Folding DNA to Create Nanoscale Shapes and Patterns. Nature 2006, 440 (7082), 297–302.
An exemplary overall storage workflow may be as shown in Figure 1, an exemplary structure of the data strand (the nucleic acid molecule described herein) may be as shown in Figure 2, and selective reading and reversible erasing and writing of data may be as shown in Figure 3.
(2) Encode the data for the corresponding address. The data to be written is split into fragments that match the capacity of a single address; the T7 promoter sequence and the address sequence are added to form the sequence of the data strand to be written, and the data strand is then synthesized. Methods for synthesizing nucleic acid strands are well known in the art; for example, the data strand may be produced by chemical synthesis.
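The present application does not disclose the sequence screening algorithm referred to in step (1). Purely as an illustrative sketch, a greedy random screen for an orthogonal address library could look like the following Python code. The GC window, the homopolymer limit, the maximum shared stretch of 7 bases, and all function names are assumptions introduced here, not parameters taken from the present application.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous stretch shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        cur = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_homopolymer(seq: str, max_run: int = 3) -> bool:
    run = 1
    for previous, current in zip(seq, seq[1:]):
        run = run + 1 if current == previous else 1
        if run > max_run:
            return True
    return False


def is_orthogonal(candidate: str, library: list, max_shared: int = 7) -> bool:
    """Reject candidates that share, or can pair over, a long stretch with
    any accepted address, or that are strongly self-complementary."""
    for other in library:
        if longest_common_substring(candidate, other) > max_shared:
            return False
        if longest_common_substring(candidate, reverse_complement(other)) > max_shared:
            return False
    return longest_common_substring(candidate, reverse_complement(candidate)) <= max_shared


def build_library(count: int, length: int = 20, seed: int = 0,
                  max_attempts: int = 100_000) -> list:
    rng = random.Random(seed)
    library = []
    for _ in range(max_attempts):
        if len(library) == count:
            return library
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not 0.4 <= gc_fraction(candidate) <= 0.6:  # assumed GC window
            continue
        if has_long_homopolymer(candidate):           # assumed run limit of 3
            continue
        if is_orthogonal(candidate, library):
            library.append(candidate)
    raise RuntimeError("could not find enough orthogonal sequences")


address_library = build_library(count=7, length=20)
for index, address in enumerate(address_library, start=1):
    print(index, address)
```

A production design would normally also score hybridization free energies and secondary structure; the substring test used here is only a simple stand-in for such checks.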
A data operation scheme for an addressable DNA storage system, which may comprise:
(1) A data writing (or storing) step, which may comprise using the principle of DNA complementary base pairing: a data strand containing the data sequence is added so that it binds at a specific address.
(2) A data erasing step, which may comprise adding the erase-operation DNA strand for a specific address (also called the Erase strand, that is, the erase strand), which binds the data sequence at that address through a DNA strand displacement reaction and restores the address to its unwritten state.
(3) A data modifying (or editing) step, which may be carried out by first erasing the original data and then writing new data.
(4) A data read-out (or reading) step, which may comprise adding the activation strand (also called the Input strand) corresponding to a specific address to activate the transcription function at that address; under the action of T7 RNA polymerase, the data at that address is transcribed into RNA strands, whereas addresses that have not been activated cannot be transcribed. The RNA strands obtained are collected for subsequent sequencing, thereby achieving addressable data reading.
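The four operations above can be summarized, at the level of bookkeeping only, by the following Python sketch. The class `AddressablePlatform` and its method names are hypothetical and model only which addresses hold data and which are unlocked for transcription; no chemistry is simulated. The assignment of the seven characters to addresses 1 to 7 in reading order, and the placeholder payload "NEW", are assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AddressablePlatform:
    """Hypothetical bookkeeping model of the carrier (not an API of the application)."""
    addresses: tuple
    stored: dict = field(default_factory=dict)   # address -> payload of the bound data strand
    unlocked: set = field(default_factory=set)   # addresses whose blocking strand was removed

    def write(self, address, payload):
        """Step (1): hybridize a data strand at an empty address (it arrives locked)."""
        if address not in self.addresses:
            raise KeyError(f"unknown address {address}")
        if address in self.stored:
            raise ValueError(f"address {address} is occupied; erase it first")
        self.stored[address] = payload
        self.unlocked.discard(address)

    def erase(self, address):
        """Step (2): the erase strand displaces the data strand from the address."""
        self.stored.pop(address, None)
        self.unlocked.discard(address)

    def edit(self, address, payload):
        """Step (3): erase the original data, then write the new data."""
        self.erase(address)
        self.write(address, payload)

    def activate(self, address):
        """Step (4a): the activation strand removes the blocking strand."""
        if address in self.stored:
            self.unlocked.add(address)

    def read(self):
        """Step (4b): only activated addresses are transcribed; reading keeps the data."""
        return {a: self.stored[a] for a in sorted(self.unlocked)}


platform = AddressablePlatform(addresses=tuple(range(1, 8)))
for address, character in enumerate("天地变而万物通", start=1):
    platform.write(address, character)

locked_read = platform.read()      # nothing has been activated yet
platform.activate(3)
selective_read = platform.read()   # only address 3 is exported
platform.edit(3, "NEW")            # placeholder payload
after_edit = platform.read()       # the rewritten strand is locked again
platform.activate(3)
reread = platform.read()
print(locked_read, selective_read, after_edit, reread)
```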
Without wishing to be bound by any theory, the following examples are intended only to illustrate the products, preparation methods, uses and the like of the present application, and are not intended to limit the scope of the invention of the present application.
Examples
Example 1 Preparation of a nano-addressable, fully functional DNA storage system
The nano-addressable DNA platform was designed as follows:
(1) The DNA origami scaffold strand is the M13mp18 single strand, and the set of staple strands is obtained according to the target template shape (a two-dimensional rectangular structure). In an exemplary approach, the correspondence between staple strands and address numbers is optionally marked according to the shape information of a seven-segment addressing structure. Seven orthogonal address sequences, each 20 bases long, are randomly generated and screened, and the staple strand sequences corresponding to addresses 1-7 are each extended with the designed address sequence, yielding the set of staple strands used to assemble the addressable DNA platform. As shown in Figure 1, the information that can be stored by the seven data strands corresponding to addresses 1-7 is the seven Chinese characters "天地变而万物通" ("Heaven and earth change, and all things flow freely").
(2) The switchable data strand adopts a partially complementary double-stranded structure. Each single Chinese character is converted into binary code and then encoded into a 10-base sequence using a cyclic encoding algorithm. The complete data strand is formed by joining the address-complementary sequence, the erase/write functional region, the T7 promoter sequence and the data sequence (including a pair of primers) (Figure 2). The blocking strand contains the sequence complementary to the 17 bases of the T7 promoter and the 5 bases downstream of it, together with an 8-base address-specific overhang sequence (called the toehold). The data strand and the blocking strand form a double-stranded structure through partial complementary pairing over 22 bases.
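The cyclic encoding algorithm mentioned in step (2) is not specified in the present application. As a hedged illustration only, the Python sketch below substitutes a rotating ternary code: the Unicode code point of a character is written as 10 base-3 digits, and each digit selects one of the three bases that differ from the preceding base, so that no base is ever repeated consecutively. The choice of this code, the virtual starting base "A", and the function names are assumptions; the sequences it produces are not the sequences used in the examples.

```python
BASES = "ACGT"
DIGITS = 10  # bases per character, matching the 10-base data sequence in step (2)


def encode_character(character: str, start: str = "A") -> str:
    """Encode one character as DIGITS bases with a rotating ternary code (assumed scheme)."""
    if len(character) != 1:
        raise ValueError("expected a single character")
    value = ord(character)
    if value >= 3 ** DIGITS:
        raise ValueError("code point does not fit into 10 ternary digits")
    trits = []
    for _ in range(DIGITS):
        value, trit = divmod(value, 3)
        trits.append(trit)
    trits.reverse()  # most significant digit first
    previous, bases = start, []
    for trit in trits:
        choices = [b for b in BASES if b != previous]  # three bases unlike the last one
        previous = choices[trit]
        bases.append(previous)
    return "".join(bases)


def decode_sequence(sequence: str, start: str = "A") -> str:
    """Invert encode_character."""
    previous, value = start, 0
    for base in sequence:
        choices = [b for b in BASES if b != previous]
        value = value * 3 + choices.index(base)
        previous = base
    return chr(value)


message = "天地变而万物通"
encoded = [encode_character(character) for character in message]
decoded = "".join(decode_sequence(sequence) for sequence in encoded)
for character, sequence in zip(message, encoded):
    print(character, sequence)
```

Ten ternary digits cover code points below 59049, which includes the seven characters of the example. Avoiding repeated bases is one common motivation for codes of this kind, since homopolymer runs tend to be harder to synthesize and sequence accurately.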
Based on the above design, the structure of the nano-addressable platform was assembled as follows:
(1) The M13 single strand and the addressable staple strands are mixed in 1× TAE-Mg²⁺ solution, at final concentrations of 10 nM for the scaffold strand and 50 nM for the staple strands.
(2) Annealing assembly is carried out in a PCR thermocycler with the following program: hold at 95 °C for 3 minutes, then cool to 25 °C at a rate of 1 °C per minute, and finally hold at 4 °C.
(3) The assembly product is purified by PEG precipitation, giving about 20 nM of purified assembled structure to serve as the substrate for information storage. For visualization purposes only, the resulting DNA structure can be characterized by atomic force microscopy (AFM).
(4) The data strand and the corresponding blocking strand are mixed in 1× TAE-Mg²⁺ solution at a final concentration of 10 μM and hybridized by annealing in a PCR thermocycler. The hybridization product is characterized by 10% polyacrylamide gel electrophoresis (PAGE).
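For planning purposes, the annealing program of step (2) and the strand excess of step (1) can be worked out with a short Python sketch such as the following. The `Hold` and `Ramp` classes are hypothetical helpers introduced here; the final 4 °C hold is open-ended and is therefore left out of the run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Ramp:
    start_c: float
    end_c: float
    rate_c_per_min: float

    @property
    def minutes(self) -> float:
        # time needed to move between the two temperatures at the given rate
        return abs(self.start_c - self.end_c) / self.rate_c_per_min


# Step (2): 95 °C for 3 min, then cool to 25 °C at 1 °C per minute
ANNEALING_PROGRAM = [Hold(temp_c=95, minutes=3), Ramp(start_c=95, end_c=25, rate_c_per_min=1.0)]
total_minutes = sum(step.minutes for step in ANNEALING_PROGRAM)

# Step (1): 50 nM staple strands against 10 nM scaffold strand
staple_excess = 50 / 10

print(f"program time before the 4 °C hold: {total_minutes:.0f} min")
print(f"staple-to-scaffold molar ratio: {staple_excess:.0f}:1")
```

The program therefore takes 73 minutes before the final hold, and each staple strand is present at a five-fold molar excess over the scaffold strand.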
Example 2 Data operation methods of the nano-addressable, fully functional DNA storage system
Writing and reading of data:
(1) 100 nM of data strands 1-7 is added to 10 nM DNA origami and hybridized at room temperature for 1 hour to write the complete addressable information. At this point the origami surface displays the shape "8", made up of 7 × 7 (49 in total) data strands 1-7. AFM is used to characterize the data writing success rate. Adding partial combinations of data strands 1-7 forms the shapes "0" to "9" on the origami surface; for example, the digit "1" is made up of two data strands, the digit "2" of 5 data strands, the digit "3" of 5 data strands, and so on. The robustness of data writing on the platform is verified visually by AFM topography. The results (Figure 4) show that the data strands at the target write positions can be verified visually in the AFM images.
(2) The activation strands for data 1-7 are added to the solutions of data 1-7, respectively; after reaction at room temperature for 1 hour, the complete T7 promoter strand is added, followed by reaction at room temperature for 1 hour. T7 transcription mix is then added and the reaction is carried out in a 42 °C water bath for 1 hour. PAGE is used to characterize the feasibility of transcription after activation and the orthogonality of addressable read-out. The results obtained from statistics of the PAGE data on read-out activation orthogonality (Figure 5) show that data can be exported only when the sequence of the activation strand matches the data strand.
(3) The RNA molecules obtained by transcription are reverse transcribed with a reverse transcription kit to obtain the corresponding DNA strands, which are quantified by real-time fluorescence quantitative PCR to confirm that the data has been exported.
(4) The exported data is sequenced on a next-generation sequencer and the sequencing results are decoded. Next-generation sequencing of the sequences read out from the seven addresses shows that the sequence information obtained for each address is fully consistent with the information written (Figure 6).
Addressable erasing and modification of data:
(1) After data strands 1-7 have been fully written, erase strands 5 and 6, which are complementary to data strands 5 and 6, are added and incubated at room temperature for 1 hour to erase the corresponding data. Characterization by super-resolution fluorescence microscopy shows that the target data strands 5 and 6 have been erased and that the digit "3" is displayed. Erase strands 1, 4 and 7, which are complementary to data strands 1, 4 and 7, are then added, followed by reaction and characterization; the target data strands 1, 4 and 7 are erased, giving the digit "1". This process corresponds to the transitions in Figure 7 from the digit "8" to the digit "3" and then to the digit "1".
(2) Data strand 1 is added and incubated at room temperature for 1 hour to write data at address 1 again; the result shows that data strand 1 has been added, and the digit "7" is displayed. Super-resolution fluorescence microscopy imaging shows that, throughout the addressable erasing and writing process, the fluorescence imaging result at each step corresponds to the target erase and write positions. This process corresponds to the transition in Figure 7 from the digit "1" to the digit "7".
(3) Address 1 is used as an example to test the capacity for repeated modification. The DNA origami is immobilized on the surface of magnetic beads, and the two data items (two data strands) to be written to address 1 are labeled with Alexa488 and Cy5 fluorophores, respectively. The Alexa488-labeled data strand is added, and a fluorescence spectrophotometer is used to measure the concentration of data strand remaining in the supernatant after writing; an erase operation is then performed and the erase efficiency is measured. The Cy5-labeled data strand is then written and the write efficiency is measured, followed by erasing and measurement of the erase efficiency. In the specific experiment, the 647 data (the Cy5-labeled data strand) was written first, at which point the first write efficiency of the 488 data (the Alexa488-labeled data strand) was essentially 0; the 647 data was then erased, at which point the erase efficiency of the 488 data was essentially 0. Next, the 488 data was written, with a write efficiency of about 54%, and the 488 data was then erased, with an erase efficiency of about 110%. This cycle was repeated more than 5 times to determine the capacity of the system for repeated addressable erasing and writing. The results show that after repeated erasing and writing, the fluorescence measurements correspond to the expected erase and write outcomes, indicating that repeated modification of data is feasible (Figure 8). In Figure 8, the upper part of the vertical axis represents the erase efficiency of the 488 data, the lower part represents the write efficiency of the 488 data, and the horizontal axis represents the number of erase or write operations.
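The sequence of displayed digits in steps (1) and (2) above can be traced with the following Python sketch. It assumes that addresses 1 to 7 correspond to the conventional seven-segment labels a to g (1 = top, 2 = upper right, 3 = lower right, 4 = bottom, 5 = lower left, 6 = upper left, 7 = middle). This mapping is not stated in the present application; it is merely one assignment that is consistent with the erase and write results described above and with the strand counts given for the digits "1", "2" and "3".

```python
# Assumed mapping of addresses to seven-segment positions (a..g -> 1..7)
SEGMENTS_BY_DIGIT = {
    "0": {1, 2, 3, 4, 5, 6},
    "1": {2, 3},
    "2": {1, 2, 4, 5, 7},
    "3": {1, 2, 3, 4, 7},
    "4": {2, 3, 6, 7},
    "5": {1, 3, 4, 6, 7},
    "6": {1, 3, 4, 5, 6, 7},
    "7": {1, 2, 3},
    "8": {1, 2, 3, 4, 5, 6, 7},
    "9": {1, 2, 3, 4, 6, 7},
}
DIGIT_BY_SEGMENTS = {frozenset(s): d for d, s in SEGMENTS_BY_DIGIT.items()}


def displayed_digit(occupied_addresses) -> str:
    """Return the digit shown by the occupied addresses, or '?' if none matches."""
    return DIGIT_BY_SEGMENTS.get(frozenset(occupied_addresses), "?")


occupied = set(range(1, 8))           # data strands 1-7 fully written
trace = [displayed_digit(occupied)]

occupied -= {5, 6}                    # erase strands 5 and 6
trace.append(displayed_digit(occupied))

occupied -= {1, 4, 7}                 # erase strands 1, 4 and 7
trace.append(displayed_digit(occupied))

occupied |= {1}                       # rewrite data strand 1
trace.append(displayed_digit(occupied))

print(" -> ".join(trace))
```

With this mapping the trace reads 8, 3, 1, 7, matching the transitions described for Figure 7.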
Repeated erasing and writing of data:
The 20 pM biotin-linked origami assembled in Example 1 was immobilized on a glass slide via PEG-biotin-avidin. After washing, 1 μM Cy5-labeled data strand 1 was added, incubated for 30 min, and washed, and fluorescence was detected by TIRF (total internal reflection fluorescence) microscopy. 1 μM erase strand 1 was then added, incubated for 1 h, and washed with buffer, and fluorescence was imaged again. The erase-write cycle was repeated 10 times, with TIRF recording the changes in fluorescence. Figure 9 shows single-spot TIRF images of the data at seven addresses over 10 repeated erase-write cycles; for a single fluorescent spot, the signal was stable across the 10 cycles. Figure 10 shows TIRF images and co-localization statistics for the data at seven addresses over 10 repeated erase-write cycles, where co-localization of red and green fluorescence indicates the presence of data on the origami. Over the 10 cycles, the co-localization ratio stayed at about 85% after writing and at about 10% after erasing.
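The co-localization statistic reported for Figure 10 can be illustrated with a minimal sketch. The source does not describe how spots were detected or matched, so the approach below (one channel marks origami, the other marks the data strand, and a spot counts as co-localized when a spot from the other channel lies within a distance threshold) is an assumption, as are the function name and the `max_dist` parameter.

```python
import math


def colocalization_ratio(origami_spots, data_spots, max_dist=1.0):
    """Fraction of origami spots that have at least one data-strand spot
    within max_dist (same units as the coordinates, e.g. pixels)."""
    if not origami_spots:
        return 0.0
    hits = 0
    for ox, oy in origami_spots:
        # A spot is co-localized if any data spot is close enough to it.
        if any(math.hypot(ox - dx, oy - dy) <= max_dist for dx, dy in data_spots):
            hits += 1
    return hits / len(origami_spots)


# Hypothetical spot coordinates for illustration only.
origami = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
data = [(0.5, 0.0), (10.0, 10.2), (20.0, 25.0)]
print(colocalization_ratio(origami, data))
```

Comparing this ratio after each write step and after each erase step gives the kind of per-cycle statistic described above.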
Repeated reading of data:
The data strand was assembled on the origami and activated by adding the activation strand. 2 nM origami was then adsorbed onto a mica sheet and washed with 1×TAE Mg2+, after which the transcription system was added onto the mica. The mica was incubated in a 42°C oven for 2 h, and the liquid on the mica was then collected and stored. This transcription process was repeated 11 times, and the RNA transcribed in each round was collected; finally, the RNA from the 11 rounds was quantified by reverse transcription-PCR. Figure 11 shows the qPCR quantification of repeated reading of the data strand. The results indicate that the storage medium of the present application can be read repeatedly at least 10 times.
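One common way to compare the RNA recovered from successive read rounds is relative quantification from qPCR threshold-cycle (Ct) values. The source does not state which quantification method was used, so the sketch below is an assumption: it takes the first round as the reference and assumes a fixed amplification factor per cycle (2.0 corresponds to ideal doubling). The function name and Ct values are hypothetical.

```python
def relative_yield(ct_values, amplification_factor=2.0):
    """Yield of each read round relative to the first round, computed as
    amplification_factor ** (Ct_first - Ct_round)."""
    if not ct_values:
        raise ValueError("ct_values must not be empty")
    reference = ct_values[0]
    return [amplification_factor ** (reference - ct) for ct in ct_values]


# Hypothetical Ct values for three read rounds, for illustration only.
print(relative_yield([20.0, 21.0, 22.0]))
```

A higher Ct means less template, so each additional cycle relative to the reference halves the estimated yield under the ideal-doubling assumption.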
Example 3 Read detection on origami platforms with different matrix spacings
DNA origami was designed with different matrix spacings, including 6 nm, 12 nm, 18 nm and 24 nm. The information-storing strands were arranged on the DNA origami at these spacings, giving each strand a physical address at the corresponding spacing, and the writing and reading of the data strands at the different matrix spacings were tested according to the method of Example 2. As shown in Figures 12 and 13, with matrix spacings of 6 nm, 12 nm, 18 nm and 24 nm, there was no significant difference in the write efficiency or the readout efficiency of the data strands.
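The idea of assigning each strand a physical address on a matrix with a given spacing can be sketched as a simple coordinate map. This is an illustration only: the grid dimensions, the row/column indexing, and the function name are assumptions and do not reflect the actual origami design.

```python
def address_grid(rows, cols, spacing_nm):
    """Map each (row, col) address to an (x, y) position in nanometres on a
    regular matrix whose neighbouring sites are spacing_nm apart."""
    return {
        (r, c): (c * spacing_nm, r * spacing_nm)
        for r in range(rows)
        for c in range(cols)
    }


# The four spacings tested in Example 3; the 2 x 3 grid size is hypothetical.
for spacing in (6.0, 12.0, 18.0, 24.0):
    grid = address_grid(rows=2, cols=3, spacing_nm=spacing)
    print(spacing, grid[(1, 2)])
```

Changing only `spacing_nm` keeps the logical addresses identical while scaling the physical layout, which mirrors the comparison made across the four spacings.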
The foregoing detailed description is provided by way of explanation and example, and is not intended to limit the scope of the appended claims. Various modifications of the embodiments described in the present application will be apparent to those of ordinary skill in the art and remain within the scope of the appended claims and their equivalents.

Claims (28)

  1. A system comprising 1) a nucleic acid molecule containing data information, and 2) a carrier having addressable information, wherein the nucleic acid molecule is capable of binding to the carrier, and the data information contained in the nucleic acid molecule can be read in situ on the carrier.
  2. The system of claim 1, wherein different physical positions of the carrier have address sequences with different sequences.
  3. The system of claim 1, wherein the carrier comprises a DNA origami substrate, the DNA origami substrate comprises staple strands, and the staple strands comprise address sequences having addressable information.
  4. The system of claim 3, wherein, in the carrier, the address sequences of the staple strands are arranged in a matrix.
  5. The system of claim 4, wherein the spacing between two adjacent staple strands is 6-24 nm.
  6. The system of any one of claims 1-5, wherein the nucleic acid molecule comprises an address complementary sequence that is complementary to the address sequence on the carrier.
  7. The system of claim 6, wherein the address complementary sequence is about 15 or more nucleotides in length.
  8. The system of any one of claims 1-7, wherein the nucleic acid molecule comprises a data sequence, and the data information in the data sequence can be read in a state where the nucleic acid molecule is substantially not separated from the carrier.
  9. The system of claim 8, wherein the data sequence is about 1 or more nucleotides in length.
  10. The system of any one of claims 1-9, wherein the nucleic acid molecule comprises a read initiation sequence, the read initiation sequence is capable of initiating synthesis of a strand to be sequenced from the data sequence, and the strand to be sequenced is complementary to the data sequence.
  11. The system of claim 10, wherein the read initiation sequence comprises a promoter.
  12. The system of any one of claims 1-11, further comprising 3) a blocking strand, the blocking strand being capable of preventing the data information from being read.
  13. The system of claim 12, wherein the blocking strand comprises a blocking sequence, the blocking sequence being capable of being complementary to the read initiation sequence so that the data information is not read.
  14. The system of claim 13, wherein the blocking strand comprises a blocking extension sequence, the blocking extension sequence is located upstream and/or downstream of the blocking sequence, and the blocking extension sequence is substantially not complementary to the nucleic acid molecule.
  15. The system of any one of claims 12-14, further comprising 4) an activation strand, wherein the activation strand is capable of preventing the blocking strand from binding to the nucleic acid molecule, and the activation strand is complementary to both the blocking sequence and the blocking extension sequence.
  16. The system of any one of claims 1-15, wherein the nucleic acid molecule comprises an erase function sequence, the erase function sequence is located upstream and/or downstream of the address complementary sequence, and the erase function sequence is substantially not complementary to the address sequence on the carrier.
  17. The system of claim 16, further comprising 5) an erase strand, wherein the erase strand is capable of preventing the nucleic acid molecule from binding to the carrier, and the erase strand is capable of being complementary to both the address complementary sequence and the erase function sequence.
  18. The system of claim 17, wherein the erase strand is capable of replacing the carrier in binding to the nucleic acid molecule.
  19. The system of claim 17, wherein the molar ratio of the nucleic acid molecule to the address sequence in the carrier is 2:1 or higher.
  20. The nucleic acid molecule in the system of any one of claims 1-19.
  21. The carrier in the system of any one of claims 1-19.
  22. A data storage method, the method comprising providing the system of any one of claims 1-19.
  23. A data editing method, the method comprising replacing the data information in the nucleic acid molecule in the system of any one of claims 1-19.
  24. The method of claim 23, comprising providing the erase strand, wherein, at room temperature, the nucleic acid molecule at a specific physical position binds to the erase strand, and the nucleic acid molecule substantially does not bind to the carrier.
  25. A data reading method, the method comprising determining the data information in the nucleic acid molecule in the system of any one of claims 1-19.
  26. The method of claim 25, further comprising providing the activation strand, wherein, at room temperature, the blocking strand bound to the nucleic acid molecule at a specific physical position binds to the activation strand, and the blocking strand substantially does not bind to the nucleic acid molecule.
  27. A storage medium comprising the method of any one of claims 22-26.
  28. A device comprising the storage medium of claim 27 and a processor coupled to the storage medium, the processor being configured to execute a program stored in the storage medium to implement the method of any one of claims 22-26.
PCT/CN2023/110132 2022-08-01 2023-07-31 Data storage medium and use thereof WO2024027620A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210913703.8A CN117542391A (en) 2022-08-01 2022-08-01 Data storage medium and application thereof
CN202210913703.8 2022-08-01

Publications (1)

Publication Number Publication Date
WO2024027620A1 true WO2024027620A1 (en) 2024-02-08

Family

ID=89782822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110132 WO2024027620A1 (en) 2022-08-01 2023-07-31 Data storage medium and use thereof

Country Status (2)

Country Link
CN (1) CN117542391A (en)
WO (1) WO2024027620A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100324600A1 (en) * 2009-06-18 2010-12-23 Ashok Biyani Unidirectional rotatory pedicle screw and spinal deformity correction device for correction of spinal deformity in growing children
US20140079592A1 (en) * 2012-09-19 2014-03-20 National Chiao Tung University Bio-nanowire device and method of fabricating the same
CN106845158A (en) * 2017-02-17 2017-06-13 苏州泓迅生物科技股份有限公司 A kind of method that information Store is carried out using DNA
CN112585152A (en) * 2018-07-11 2021-03-30 加利福尼亚大学董事会 Electrically readable read-only memory based on nucleic acids
CN113096742A (en) * 2021-04-14 2021-07-09 湖南科技大学 DNA information storage parallel addressing writing method and system

Also Published As

Publication number Publication date
CN117542391A (en) 2024-02-09

Similar Documents

Publication Publication Date Title
JP7085999B2 (en) Molecular programming tool
Lee et al. Photon-directed multiplexed enzymatic DNA synthesis for molecular digital data storage
JP6730525B2 (en) Chemical composition and method of using the same
JP2023071981A (en) Enzyme- and amplification-free sequencing
Hao et al. Data storage based on DNA
JP7364604B2 (en) Chemical methods for nucleic acid-based data storage
WO2017083177A1 (en) Error correction for nucleotide data stores
CN107922970A (en) It is enriched with by the target of Single probe primer extend
WO2015090879A1 (en) Oligonucleotide data storage on solid supports
CN107750361A (en) Relation DNA is operated
Xu et al. Uncertainties in synthetic DNA-based data storage
CN113066534A (en) Method for writing and reading information by using DNA sequence
CN111757934A (en) Target enrichment by one-way dual probe primer extension
JP2022540744A (en) Arrays and methods for detecting spatial information in nucleic acids
ES2965266T3 (en) Methods for polynucleotide sequencing
WO2024027620A1 (en) Data storage medium and use thereof
ES2789349T3 (en) Improved use of surface primers in groups
Skinner et al. Biocompatible writing of data into DNA
US20210171939A1 (en) Sample processing barcoded bead composition, method, manufacturing, and system
US20240052406A1 (en) Competitive methods and compositions for amplifying polynucleotides
JP5322141B2 (en) 5 'area differential display method
US20240035078A1 (en) Methods and compositions for amplifying polynucleotides
US20240093293A1 (en) Methods for increasing monoclonal nucleic acid amplification products
RU2756641C2 (en) Method for storing information using dna and information storage device
CN116386738A (en) DNA hybridization information storage encryption method based on probe card-issuing structure

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849328

Country of ref document: EP

Kind code of ref document: A1