WO2024076044A1 - Method and device for DNA encoding and decoding - Google Patents
- Publication number
- WO2024076044A1 (PCT/KR2023/014125)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- base
- binary data
- processor
- information
- base sequence
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Definitions
- The present invention relates to information storage technology based on DNA molecules, and more specifically, to encoding technology for DNA synthesis and decoding technology for data recovery.
- Magnetic tape, which is widely used as a long-term recording medium, has a data storage lifespan of about 10 years, so maintenance and management costs are incurred continuously.
- HDDs and SSDs are representative examples.
- The lifespan of an HDD is usually 5 years, or about 10 years if the data is accessed less than once per quarter, but HDDs are very vulnerable to shock and have a limited maximum capacity.
- SSDs are resistant to shock, but their lifespan is relatively shorter than that of HDDs. Recently, the explosive growth in the amount of data produced has exceeded the capacity of storage media, causing overload, and the data storage density limit of existing information storage media has been reached, creating a need for new types of storage devices.
- If DNA is used as a storage medium, it is possible to overcome the data storage density limit, which is a disadvantage of existing storage media, and information can be stored stably for a long period of time even when subjected to physical shock.
- DNA is contained in cells, the smallest unit of living organisms, and contains all genetic information. All living things grow and move as if programmed, according to the information contained in DNA.
- The DNA contained in a single cell consists of 3 billion base pairs, and the size of the decoded genetic information is approximately 1 TB.
- A single cell contains two strands of DNA that are 2 nm wide and 3 m long. Therefore, theoretically, DNA is a next-generation biostorage medium that can store more than an exabyte (EB, 10^18 bytes) and is very suitable as a biomaterial for ultra-dense information storage. In addition, its storage life is more than 1,000 years, and low-cost storage is expected to be possible.
- Figure 1 is a conceptual diagram of information storage using DNA base sequences.
- Binary data to be stored is encoded with the nucleotides A (adenine), T (thymine), G (guanine), and C (cytosine).
- DNA is synthesized according to the encoded base sequence, and the synthesized DNA molecule is stored. Afterwards, the stored DNA molecules are selected through retrieval, the nucleotide sequence of the selected DNA molecule is analyzed (Sequencing), and binary data is decoded according to the analyzed nucleotide sequence.
- One purpose of this specification is to provide a DNA encoding method and device that reduce the possibility of errors when synthesizing and restoring DNA.
- Another purpose of this specification is to provide a method and device for shuffling or restoring binary data that reduce the possibility of errors when synthesizing and restoring DNA.
- A further purpose of this specification is to provide a DNA encoding method and device that can improve the structural stability of synthesized DNA.
- The DNA encoding method according to the present specification for solving the above-described problem includes the steps of: (a) a processor performing a primary conversion of binary data into a base sequence; (b) the processor finding a point where the same base is repeated in the primary-converted base sequence; and (c) the processor performing a secondary conversion into a base sequence in which a dummy base is added at the point where the same base is repeated.
- Step (b) may be a step in which the processor finds a point where the same base is repeated a predetermined number of times.
- Step (b) may be a step in which the processor finds a point where the same base is repeated according to a number set for each base.
- Step (b) may include: (b-1) the processor generating a frequency table of repeated bases in the primary-converted base sequence; (b-2) the processor using the generated frequency table to determine the number of repetitions of the same base that requires the addition of a dummy base; and (b-3) the processor finding points where the same base is repeated in the primary-converted base sequence according to the number determined in step (b-2).
- Step (b-2) may be a step in which the processor determines the average value of the frequency as the number of repetitions of the same base that requires the addition of a dummy base.
- Step (b-2) may be a step in which the processor determines, for each base, the average frequency of that base as the number of repetitions that requires the addition of a dummy base.
- Step (c) may be a step in which the processor performs the secondary conversion by adding a dummy base having at least one predetermined sequence.
- Step (c) may be a step in which the processor performs the secondary conversion by adding, as dummy bases, bases that differ from the adjacent bases on both sides of the point where the same base is repeated.
- Step (c) may be a step in which the processor performs the secondary conversion by adding a dummy of two or more bases.
- The DNA encoding method according to the present specification may be implemented in the form of a computer program written to perform each step of the DNA encoding method on a computer and recorded on a computer-readable recording medium.
- The DNA encoding device for solving the above-mentioned problem may include a processor that performs a primary conversion of binary data into a base sequence, finds a point where the same base is repeated in the primary-converted base sequence, and performs a secondary conversion into a base sequence with a dummy base added at that point.
- The processor can find a point where the same base is repeated a predetermined number of times.
- The processor can find a point where the same base is repeated according to a number set for each base.
- The processor can generate a frequency table of repeated bases in the primary-converted base sequence, use the generated frequency table to determine the number of repetitions of the same base that requires the addition of a dummy base, and find the point where the same base is repeated in the primary-converted base sequence according to the determined number.
- The processor may determine the average value of the frequency as the number of repetitions of the same base that requires the addition of a dummy base.
- The processor may determine, for each base, the average frequency of that base as the number of repetitions that requires the addition of a dummy base.
- The processor may perform the secondary conversion by adding a dummy base having at least one predetermined sequence.
- The processor may perform the secondary conversion by adding, as dummy bases, bases that differ from the adjacent bases on both sides of the point where the same base is repeated.
- The processor can perform the secondary conversion by adding a dummy of two or more bases.
- A device according to the present specification may include: the DNA encoding device; and a DNA synthesis device that synthesizes DNA according to the nucleotide sequence output from the DNA encoding device.
- The binary data shuffling method according to the present specification may include the steps of: (a) a processor temporarily converting binary data into a base sequence; (b) the processor determining whether the temporarily converted base sequence contains a point where the same base is repeated more than a preset number of times (hereinafter referred to as an 'identical base repeat sequence'); (c) when an identical base repeat sequence exists in step (b), the processor inversely converting the temporarily converted base sequence back into binary data, shuffling the inversely converted binary data, and then returning to step (a); and (d) when no identical base repeat sequence exists in step (b), the processor inversely converting the temporarily converted base sequence back into binary data and storing the inversely converted binary data as the binary data to be converted into a base sequence.
- Step (b) may be a step in which the processor finds a point where the same base is repeated a predetermined number of times.
- Step (b) may be a step in which the processor finds a point where the same base is repeated according to a number set for each base.
- Step (c) may be a step in which the processor shuffles the binary data using a linear feedback shift register (LFSR) method.
- Step (c) may include: (c-1) the processor storing the number of shuffles each time it shuffles the binary data; and (c-2) when a preset maximum number of shuffles is reached, the processor moving to step (d).
- Step (a) may be a step in which the processor also stores the temporarily converted base sequence, and step (c) may be a step in which the processor further stores the number of identically repeated bases in the temporarily converted base sequence. In this case, step (c) may include: (c-1) when the inversely converted binary data is not the same as the original binary data, the processor shuffling the inversely converted binary data and then returning to step (a); and (c-2) when the inversely converted binary data is the same as the original binary data, the processor passing the base sequence with the smallest number of identically repeated bases among the temporarily converted base sequences to step (d).
- The binary data shuffling method according to the present specification may be implemented in the form of a computer program written to perform each step of the binary data shuffling method on a computer and recorded on a computer-readable recording medium.
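- The shuffle-and-check loop of steps (a) to (d) can be sketched as follows. This is a minimal illustration and not the implementation of the specification: the 2-bit mapping, the 16-bit LFSR polynomial (taps 16, 14, 13, 11), the use of the shuffle count as the LFSR seed, and the names `lfsr_keystream`, `scramble`, `longest_run`, and `shuffle_until_clean` are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical 2-bit mapping; the specification allows other mappings.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_bases(bits):
    """Temporary conversion of an even-length bit string into a base sequence."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def lfsr_keystream(seed, length):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (an assumed polynomial)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    stream = []
    for _ in range(length):
        stream.append(state & 1)
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return stream


def scramble(bits, seed):
    """XOR the bit string with the LFSR keystream; applying it twice undoes it."""
    stream = lfsr_keystream(seed, len(bits))
    return "".join(str(int(b) ^ k) for b, k in zip(bits, stream))


def longest_run(bases):
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(bases)), default=0)


def shuffle_until_clean(bits, max_run=3, max_shuffles=16):
    """Shuffle until no run exceeds max_run or the shuffle limit is reached.

    Returns the (possibly shuffled) bits and the number of shuffles applied.
    """
    count = 0
    while longest_run(bits_to_bases(bits)) > max_run and count < max_shuffles:
        count += 1
        bits = scramble(bits, seed=count)  # assumed: one seed per shuffle round
    return bits, count
```

- Because the scrambler here is a plain XOR with a keystream, the stored shuffle count is enough to undo it later by applying the same seeds again. A different scrambler design would need its own inverse.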
- A binary data shuffling device for solving the above-mentioned problem includes: a base sequence conversion unit that converts binary data into a base sequence or inversely converts a base sequence into binary data; a repetitive base analysis unit that determines whether the base sequence temporarily converted by the base sequence conversion unit contains a point where the same base is repeated more than a preset number of times (hereinafter referred to as an 'identical base repeat sequence'); a binary data scrambling unit that shuffles binary data and outputs it; and a control unit that controls the base sequence conversion unit, the repetitive base analysis unit, and the binary data scrambling unit. When the repetitive base analysis unit determines that an identical base repeat sequence exists, the control unit may control the units to inversely convert the temporarily converted base sequence back into binary data, shuffle the inversely converted binary data, and convert the shuffled binary data into a base sequence again. When the repetitive base analysis unit determines that no identical base repeat sequence exists, the control unit may control the units to inversely convert the temporarily converted base sequence back into binary data and store the inversely converted binary data as the binary data subject to base sequence conversion.
- The repetitive base analysis unit can find a point where the same base is repeated a predetermined number of times.
- The repetitive base analysis unit can find a point where the same base is repeated according to a number set for each base.
- The binary data scrambling unit can shuffle binary data using a linear feedback shift register (LFSR) method.
- The control unit may store the number of shuffles each time the binary data scrambling unit shuffles the binary data and, when the preset maximum number of shuffles is reached, may control the units to inversely convert the temporarily converted base sequence back into binary data and store the inversely converted binary data as the binary data subject to base sequence conversion.
- The control unit may store the base sequence each time it is temporarily converted by the base sequence conversion unit, together with the number of identically repeated bases that the repetitive base analysis unit finds in that base sequence. When the inversely converted binary data is not the same as the original binary data, the control unit may control the inversely converted binary data to be shuffled. When the inversely converted binary data is the same as the original binary data, the control unit may control the temporarily converted base sequence with the smallest number of identically repeated bases to be inversely converted and stored as the binary data subject to base sequence conversion.
- A device according to the present specification may include: the binary data shuffling device; and a DNA synthesis device that converts the binary data output from the binary data shuffling device into a base sequence and then synthesizes DNA.
- The binary data restoration method according to one embodiment includes the steps of: (a) a processor receiving and storing analyzed DNA base sequence information; (b) the processor inversely converting the base sequence information into binary data; (c) the processor separating the shuffle count information included in the inversely converted binary data; and (d) the processor restoring the binary data from which the shuffle count information has been separated, according to the shuffle count information.
- The binary data restoration method according to another embodiment includes the steps of: (a) a processor receiving and storing analyzed DNA base sequence information; (b) the processor inversely converting the base sequence information into binary data; (c) the processor reading shuffle count information from a storage unit; and (d) the processor restoring the inversely converted binary data according to the shuffle count information.
- The binary data restoration method according to the present specification may further include removing an error correction code from the inversely converted binary data after step (b).
- Step (d) of the binary data restoration method according to the present specification may be a step in which the processor restores the binary data using a linear feedback shift register (LFSR) method.
- The binary data restoration method according to the present specification may be implemented in the form of a computer program written to perform each step of the binary data restoration method on a computer and recorded on a computer-readable recording medium.
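- Steps (c) and (d) of the first restoration embodiment, in which the shuffle count travels inside the binary data, can be sketched as below. The layout is an assumption made for illustration: an 8-bit count field placed in front of the payload, an XOR scrambler driven by a 16-bit LFSR, and one seed per shuffle round. The names `COUNT_BITS`, `xor_with_keystream`, `attach_count`, and `separate_and_restore` are hypothetical.

```python
COUNT_BITS = 8  # assumed width of the shuffle-count field


def lfsr_keystream(seed, length):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (an assumed polynomial)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    stream = []
    for _ in range(length):
        stream.append(state & 1)
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return stream


def xor_with_keystream(bits, seed):
    """Scramble or descramble: XOR with the keystream is its own inverse."""
    stream = lfsr_keystream(seed, len(bits))
    return "".join(str(int(b) ^ k) for b, k in zip(bits, stream))


def attach_count(bits, count):
    """Writer side: prepend the shuffle count as a fixed-width binary field."""
    return format(count, f"0{COUNT_BITS}b") + bits


def separate_and_restore(bits_with_count):
    """Reader side: split off the count, then undo each shuffle round."""
    count = int(bits_with_count[:COUNT_BITS], 2)
    payload = bits_with_count[COUNT_BITS:]
    for round_number in range(count, 0, -1):
        payload = xor_with_keystream(payload, seed=round_number)
    return payload
```

- In the second embodiment the count would be read from a storage unit instead of being parsed from the data, so only the splitting step changes. Either way, the restoring side has to use exactly the same LFSR parameters as the shuffling side.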
- The binary data restoration device according to one embodiment for solving the above-described problem includes: a base sequence conversion unit that inversely converts analyzed DNA base sequence information into binary data; a binary data processing unit that separates the shuffle count information included in the inversely converted binary data; and a binary data descrambling unit that restores the binary data from which the shuffle count information has been separated, according to the shuffle count information.
- The binary data restoration device according to another embodiment for solving the above-described problem includes: a base sequence conversion unit that inversely converts analyzed DNA base sequence information into binary data; a storage unit that stores information on the number of times the data of each DNA was shuffled; and a binary data descrambling unit that restores the inversely converted binary data according to the shuffle count information stored in the storage unit.
- The base sequence conversion unit of the binary data restoration device can remove the error correction code from the inversely converted binary data.
- A device according to the present specification may include: the binary data restoration device; and a base sequence analysis unit that analyzes DNA and outputs base sequence information.
- The DNA encoding method according to another aspect includes the steps of: (a) a processor performing a primary conversion of binary data into base sequence information; and (b) the processor performing a secondary conversion into base sequence information in which dummy base information is added at every preset cycle of the primary-converted base sequence information.
- The dummy base information added in step (b) may be base information that differs from the adjacent base information in the primary-converted base sequence information.
- The dummy base information added in step (b) may consist of two or more pieces of base information.
- The dummy base information added in step (b) may be such that the base information at both ends of the added dummy differs from the adjacent base information in the primary-converted base sequence information.
- The DNA encoding method according to the present specification may further include the step of (c) the processor performing a tertiary conversion of the secondary-converted base sequence information into base sequence information with protective dummy base information added to both ends.
- The DNA encoding method according to the present specification may be implemented in the form of a computer program written to perform each step of the DNA encoding method on a computer and recorded on a computer-readable recording medium.
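- The periodic variant can be illustrated with a short sketch. It assumes a cycle of 4 bases, a single dummy base chosen to differ from both neighbors, and an arbitrary protective cap of 'AC'. None of these values comes from the specification, and the function names are hypothetical.

```python
BASES = "ACGT"


def pick_dummy(left, right):
    """Return the first base that differs from both neighboring bases."""
    return next(base for base in BASES if base != left and base != right)


def add_periodic_dummies(sequence, period=4):
    """Secondary conversion: insert one dummy base after every `period` bases."""
    out = []
    for index, base in enumerate(sequence):
        out.append(base)
        at_cycle_end = (index + 1) % period == 0
        has_next = index + 1 < len(sequence)
        if at_cycle_end and has_next:
            out.append(pick_dummy(base, sequence[index + 1]))
    return "".join(out)


def remove_periodic_dummies(encoded, period=4):
    """Inverse of add_periodic_dummies: drop every (period + 1)-th base."""
    return "".join(
        base for index, base in enumerate(encoded) if (index + 1) % (period + 1) != 0
    )


def add_protective_ends(sequence, cap="AC"):
    """Tertiary conversion: add protective dummy bases to both ends (cap is assumed)."""
    return cap + sequence + cap
```

- With a fixed cycle the decoder needs no search step: it only has to know the cycle length and the cap length to strip the dummy bases again.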
- The DNA encoding device according to another aspect for solving the above-mentioned problem may include a processor that performs a primary conversion of binary data into base sequence information and a secondary conversion that adds dummy base information to the primary-converted base sequence information at every preset cycle.
- The processor may add dummy base information that differs from the adjacent base information in the primary-converted base sequence information.
- The processor may add dummy base information consisting of two or more pieces of base information.
- The processor may add dummy base information in which the base information at both ends of the dummy differs from the adjacent base information in the primary-converted base sequence information.
- The processor may further perform a tertiary conversion of the secondary-converted base sequence information into base sequence information with protective dummy base information added to both ends.
- A device according to the present specification may include: the DNA encoding device; and a DNA synthesis device that synthesizes DNA according to the nucleotide sequence output from the DNA encoding device.
- According to the present specification, the possibility of errors when synthesizing and restoring DNA can be reduced. Additionally, the structural stability of synthesized DNA can be improved.
- Figure 1 is a conceptual diagram of information storage using DNA base sequences.
- FIG. 2 is a reference diagram for the DNA information storage system according to the present specification.
- Figure 3 is a flow chart of the DNA encoding method according to the present specification.
- Figure 4 is an exemplary diagram of a DNA encoding method according to the present specification.
- Figure 5 is an example of a dummy base that can be added depending on the point where the same base is repeated and the next adjacent base.
- Figure 6 is a schematic configuration diagram of a binary data shuffling device according to the present specification.
- Figure 7 is a flowchart of a binary data shuffling method according to an embodiment of the present specification.
- Figure 8 is a reference diagram for nucleotide sequence conversion of binary data.
- Figure 9 is a reference diagram to help understand the linear feedback shift register.
- Figure 10 is a flowchart of a binary data shuffling method according to another embodiment of the present specification.
- Figure 11 is a flowchart of a binary data shuffling method according to another embodiment of the present specification.
- Figure 12 is a block diagram schematically showing the configuration of a binary data recovery device according to an embodiment of the present specification.
- Figure 13 is a block diagram schematically showing the configuration of a binary data recovery device according to another embodiment of the present specification.
- Figure 14 is a flowchart of a binary data restoration method according to an embodiment of the present specification.
- Figure 15 is a flowchart of a binary data restoration method according to another embodiment of the present specification.
- Figure 16 is a flowchart of the DNA encoding method according to the present specification.
- Figure 17 is an exemplary diagram of a DNA encoding method according to an embodiment of the present specification.
- Figure 18 is a table of types of dummy bases that can be added according to adjacent bases.
- Figure 19 is an exemplary diagram of a DNA encoding method according to another embodiment of the present specification.
- Figure 20 is an exemplary diagram of a DNA encoding method according to another embodiment of the present specification.
- FIG. 2 is a reference diagram for the DNA information storage system according to the present specification.
- The DNA information storage system is largely divided into a controller and a DNA molecular unit.
- When the controller receives a request to store information (Write) from the host, it can compress the binary data and scramble the compressed data (Scrambler). The reason for shuffling the data is to prevent the same base from being repeated when the binary data is later converted into a base sequence.
- An error correction code (ECC) may be added to data that has completed the shuffling process.
- The binary data can then be converted into the corresponding base sequence information (DNA Library).
- The DNA molecular unit (DNA Molecular) can synthesize actual DNA molecules according to the data converted into a base sequence.
- The DNA molecular unit also analyzes the base sequence of a DNA molecule.
- The controller converts the analyzed base sequence back into binary data and corrects data errors using the error correction code (ECC) (Encoder).
- Binary data with errors corrected can be descrambled and decompressed to provide the original binary data (information).
- The DNA encoding method of the invention refers to the process of converting binary data into a base sequence within the procedure shown in FIG. 1.
- DNA encoding is the step that determines in what order the bases will be arranged.
- The base order is defined along the forward reading direction, from 5' to 3'.
- A repeated base sequence means that the same base is arranged repeatedly (consecutively). Therefore, a point where the same base is repeated means a point where the same base is arranged consecutively a predetermined number of times in the forward direction.
- The DNA encoding method according to the present specification may be implemented in the form of a computer program written to perform each step described below on a computer and recorded on a computer-readable recording medium.
- Each step can be executed by a processor.
- Figure 3 is a flow chart of the DNA encoding method according to the present specification.
- Figure 4 is an exemplary diagram of a DNA encoding method according to the present specification.
- The processor may first perform a primary conversion of binary data into a base sequence.
- The processor can then find a point where the same base is repeated in the primary-converted base sequence.
- The processor can then perform a secondary conversion into a base sequence in which a dummy base is added at the point where the same base is repeated.
- In the illustrated example, every 2 bits of data are matched to one base, but the DNA encoding method according to the present specification is not limited to the example shown.
- One base can correspond to 1 bit, 2 bits, 3 bits, or 4 bits, and the method of matching binary data to a base sequence can vary.
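- As a minimal sketch of the primary conversion with a 2-bits-per-base mapping, the following uses an assumed lookup table (00→A, 01→C, 10→G, 11→T). The specification does not fix this particular table here, and the names `BITS_TO_BASE`, `bits_to_bases`, and `bases_to_bits` are hypothetical.

```python
# Assumed 2-bit mapping for illustration; other tables are equally valid.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bits_to_bases(bits):
    """Primary conversion: map each pair of bits to one base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even for a 2-bit mapping")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bits(bases):
    """Inverse conversion: map each base back to its pair of bits."""
    return "".join(BASE_TO_BITS[base] for base in bases)


print(bits_to_bases("00011011"))  # ACGT
print(bases_to_bits("ACGT"))      # 00011011
```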
- Step S200 is a process of finding points where defects are likely to occur during the synthesis process.
- The processor can find a point where the same base is repeated a predetermined number of times.
- The predetermined number can be set in various ways, for example, from 4 to 70. The same standard may be applied to all four bases, or a different standard may be applied to each base. In the latter case, the processor finds points where the same base is repeated according to the number set for each base. For example, the number may be set to 5 for A, 6 for T, 7 for G, and 5 for C. Some bases may share the same number.
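- The per-base search can be sketched as follows, using the example limits from the text (A=5, T=6, G=7, C=5). Treating a run whose length reaches the limit as a hit is an assumption of this sketch; the names `RUN_LIMITS` and `find_repeat_points` are hypothetical.

```python
from itertools import groupby

# Example per-base limits quoted in the text.
RUN_LIMITS = {"A": 5, "T": 6, "G": 7, "C": 5}


def find_repeat_points(sequence, limits=RUN_LIMITS):
    """Return (start_index, base, run_length) for each run that reaches its limit."""
    points = []
    position = 0
    for base, group in groupby(sequence):
        run_length = len(list(group))
        if run_length >= limits[base]:  # assumed: the limit itself counts as a hit
            points.append((position, base, run_length))
        position += run_length
    return points


print(find_repeat_points("AAAAACGGGGGGT"))  # [(0, 'A', 5)]
```

- In this input the run of six G bases is not reported because the limit for G is 7, while the run of five A bases meets the limit for A.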
- the standard can be determined based on the number of repetitions in the primary converted base sequence.
- the processor may generate a frequency table (see example frequency table in FIG. 4) for repeated bases in the primary converted base sequence. Additionally, the processor can use the generated frequency table to determine the number of repetitions of the same base that require addition of a dummy base. And the processor can find the point where the same base is repeated in the primary converted base sequence according to the determined number of repetitions of the same base.
- the Frequency Count Table can be added to the device (DNA Library) that converts binary information into base sequence information in the controller in the DNA information storage system according to the present specification.
- the processor may determine the average value of the frequency as the number of repetitions of the same base that require the addition of a dummy base. In the example shown in Figure 4, the average value of the frequency is 1.125. Through the rounding operation, whenever the same base is repeated twice, a dummy base can be determined as the position to be added. Additionally, the processor may determine the average value of the frequency for each base as the number of repetitions of the same base that requires the addition of a dummy base for each base.
- the average of A is 0.5
- the average of G is 2
- the average of C is 1.25
- the average of T is 0.75.
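- The arithmetic behind these figures can be checked with a short sketch. It assumes that each base contributes equally to the overall average and that "rounding" means rounding up, which is the only reading under which 1.125 yields a standard of 2; the variable names are illustrative.

```python
import math

# Per-base averages quoted for the FIG. 4 example.
per_base_average = {"A": 0.5, "G": 2, "C": 1.25, "T": 0.75}

# Overall average, assuming the four bases are weighted equally.
overall = sum(per_base_average.values()) / len(per_base_average)

# Assumption: the standard is the average rounded up to the next integer.
threshold = math.ceil(overall)

print(overall, threshold)  # 1.125 2
```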
- in step S210, the processor may determine whether the number of repetitions of the same base exceeds the standard set according to the various embodiments described above. If the number of repeats of the same base does not exceed the standard ('NO' in step S210), there is no need to add a dummy base. On the other hand, if the number of repeats of the same base exceeds the standard ('YES' in step S210), the process can proceed to step S300 because the point requires the addition of a dummy base.
- the processor may perform secondary conversion by adding a dummy base having at least one predetermined sequence.
- the example shown in Figure 4 shows an example in which A is added as a dummy base sequence when four Gs are repeated.
- the DNA encoding method according to the present specification is not limited to the example shown in FIG. 4.
- although an example with a single dummy base 'A' is shown, the dummy base may instead be 'G', 'T', or 'C'. The number of dummy bases is also not limited to one: two or more may be added, such as 'AA', 'GG', 'TT', 'CC', 'AG', 'GT', 'TC', and 'CA', and their combinations may vary.
- the processor can perform secondary conversion by adding bases different from the adjacent bases on both sides of the point where the same base is repeated as dummy bases.
- Figure 5 is an example of a dummy base that can be added depending on the point where the same base is repeated and the next adjacent base. In the example shown in FIG. 5, an example with one dummy base is shown for simplicity of drawing and convenience of explanation.
- the processor can perform secondary conversion by adding dummy bases with two or more sequences. In this case, it is also possible to add a dummy base with two or more base sequences combined with different bases from the adjacent bases on both sides of the point where the same base is repeated.
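- The secondary conversion with a single dummy base can be sketched as follows. This is an illustration under stated assumptions rather than the method of FIG. 4 or FIG. 5: the names `pick_dummy` and `add_dummy_bases` are hypothetical, the standard is fixed at 4 repeats, and the dummy is simply the first base in the order A, T, G, C that differs from both neighbours.

```python
BASES = "ATGC"  # assumed preference order for choosing a dummy base

def pick_dummy(left, right):
    """Return a base that differs from both neighbours (right is None at the end)."""
    return next(b for b in BASES if b != left and b != right)

def add_dummy_bases(seq, limit=4):
    """Insert one dummy base after every run of `limit` identical bases."""
    out = []
    run = 0
    prev = None
    for i, base in enumerate(seq):
        run = run + 1 if base == prev else 1
        out.append(base)
        prev = base
        if run == limit:
            nxt = seq[i + 1] if i + 1 < len(seq) else None
            out.append(pick_dummy(base, nxt))
            run = 0  # counting restarts after the dummy base
    return "".join(out)

# Eight consecutive G's are split into two groups of four, each followed by 'A'.
print(add_dummy_bases("AC" + "G" * 8 + "T"))  # ACGGGGAGGGGAT
```

- Because the dummy differs from the bases on both sides, inserting it cannot create a new repeat at the insertion point.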
- binary data means data consisting of 1 and 0.
- base sequence refers to information consisting of A (adenine), T (thymine), G (guanine), and C (cytosine). In this specification, the base order determines the forward reading direction from 5' to 3'. Additionally, a repeated base sequence means that the same bases are arranged repeatedly (consecutively). Therefore, a point where the same base sequence is repeated means a point where the same base sequence is repeatedly arranged a predetermined number of times in the forward direction.
- Figure 6 is a schematic configuration diagram of a binary data shuffling device according to the present specification.
- the binary data shuffling device 100 may include a base sequence conversion unit 110, a repetitive base analysis unit 120, a binary data scram unit 130, and a control unit 140.
- the base sequence conversion unit 110 can convert binary data into a base sequence or reversely convert a base sequence into binary data.
- the repetitive base analysis unit 120 can determine whether there is a point in the base sequence temporarily converted by the base sequence conversion unit 110 where the same base is repeated more than a preset number (hereinafter referred to as 'same base repeat sequence').
- the binary data scram unit 130 can mix binary data and then output it.
- the control unit 140 can control the base sequence conversion unit 110, the repetitive base analysis unit 120, and the binary data scram unit 130.
- when a same base repeat sequence exists, the control unit 140 can control the units so that the temporarily converted base sequence is converted back into binary data, the inversely converted binary data is shuffled, and the shuffled binary data is converted into a base sequence again.
- when no same base repeat sequence exists, the control unit 140 can control the units so that the temporarily converted base sequence is converted back into binary data and the inversely converted binary data is stored as binary data subject to base sequence conversion. The operation of the control unit 140 will be explained through the binary data shuffling method according to the present specification.
- to execute the binary data shuffling method described below, the base sequence conversion unit 110, the repetitive base analysis unit 120, the binary data scram unit 130, and the control unit 140 may include processors, application-specific integrated circuits (ASICs), other chipsets, logic circuits, registers, communication modems, data processing devices, and the like known in the technical field to which the present invention belongs.
- when the control logic described below is implemented in software, the base sequence conversion unit 110, repetitive base analysis unit 120, binary data scram unit 130, and control unit 140 may be implemented as a set of program modules.
- the program module may be stored in the memory device and executed by the processor.
- the binary data shuffling method according to the present specification can be implemented in the form of a computer program written to perform each step described below on a computer and recorded on a computer-readable recording medium.
- the binary data shuffling method according to the present specification will be described on the assumption that it is executed by a processor.
- Figure 7 is a flowchart of a binary data shuffling method according to an embodiment of the present specification.
- in step S100, the processor may receive and store initial data.
- 'initial data' (also called 'original data') refers to the original binary data that has not been shuffled.
- in step S110, the processor may temporarily convert the initial data (binary data) into a base sequence.
- the reason the base sequence converted in step S110 is called a 'temporary base sequence' is to distinguish it from 'binary data subject to base sequence conversion', which will be explained later.
- the ‘temporary base sequence’ may not be the base sequence that will be synthesized into actual DNA later.
- 'binary data subject to base sequence conversion' is data corresponding to the base sequence to be synthesized into actual DNA.
- Figure 8 is a reference diagram for nucleotide sequence conversion of binary data.
- in the example shown in FIG. 8, 2 bits of data are matched 1:1 with bases, but the method of shuffling binary data according to the present specification is not limited to the example shown.
- One base can correspond to 1 bit, 2 bits, 3 bits, or 4 bits, and methods for matching binary data and base sequence can vary.
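- A 2-bit-per-base conversion and its inverse can be sketched as follows. The particular assignment of bit pairs to bases is an assumption made for illustration (FIG. 8 may use a different assignment), and the names `bits_to_bases` and `bases_to_bits` are hypothetical.

```python
# Assumed mapping; the assignment in FIG. 8 may differ.
PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}

def bits_to_bases(bits):
    """Convert a string of '0'/'1' characters into a base sequence, 2 bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def bases_to_bits(seq):
    """Inverse conversion: turn a base sequence back into a bit string."""
    return "".join(BASE_TO_PAIR[base] for base in seq)

print(bits_to_bases("00011011"))  # ACGT
print(bases_to_bits("ACGT"))      # 00011011
```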
- in step S120, the processor may determine whether there is a point in the temporarily converted base sequence where the same base is repeated more than a preset number (hereinafter referred to as 'same base repeat sequence'). As explained earlier, if a specific base is placed repeatedly (consecutively), there is a high possibility that defects will occur during the synthesis process. In other words, step S120 is a process of finding a point where defects are likely to occur during the DNA synthesis process.
- the processor can find a point where a predetermined number of identical bases are repeated.
- the predetermined number can be set in various ways, for example, from 4 to 70. The same standard may be applied to all four bases, or a different standard may be applied to each base. The processor may therefore find points where the same base is repeated according to the number set for each base. For example, the number may be 5 for A, 6 for T, 7 for G, and 5 for C; some bases may share the same number.
- in step S130, the processor may inversely convert the temporarily converted base sequence back into binary data.
- in step S140, the processor may shuffle the inversely converted binary data and then return to step S110. That is, if the number of repeats of the same base in the converted base sequence is more than a preset number, steps S110 to S140 may be repeatedly performed.
- in step S150, the processor may inversely convert the temporarily converted base sequence back into binary data and store the inversely converted binary data as binary data subject to base sequence conversion. This is because the binary data requires additional processing in the encoder shown in FIG. 2, so the base sequence is converted back into binary data.
- the processor may shuffle binary data using a linear feedback shift register (LFSR) method.
- a linear feedback shift register (LFSR) is a type of shift register and has a structure in which the value entered into the register is calculated as a linear function of the previous state values.
- the linear function used here is mainly exclusive OR (XOR).
- the initial bit value of LFSR is called the seed.
- LFSRs are used in applications such as pseudorandom number generation, pseudorandom noise (PRN) sequences, fast digital counters, and whitening sequences.
- in the present specification, the LFSR, which is conventionally used for pseudorandom numbers, is used as an element to solve the problem of repeated identical bases.
- Figure 9 is a reference diagram to help understand the linear feedback shift register.
- the tap sequence of an LFSR can be expressed as a polynomial modulo 2. This means that the coefficients of the polynomial must be 1 or 0. This polynomial is called the feedback polynomial or characteristic polynomial. For example, if the taps are the 16th, 14th, 13th, and 11th bits, the LFSR polynomial is: x^16 + x^14 + x^13 + x^11 + 1.
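- The tap example above can be made concrete with a small sketch of a 16-bit Fibonacci (external) LFSR whose taps are the 16th, 14th, 13th, and 11th bits. The function names and the seed value 0xACE1 are assumptions chosen for illustration; any non-zero seed may be used.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 by one step."""
    # With a right-shifting register, taps 16, 14, 13, 11 correspond to
    # shifts of 0, 2, 3 and 5 from the least significant bit.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr16_period(seed=0xACE1):
    """Count how many steps it takes for the register to return to the seed."""
    state = lfsr16_step(seed)
    period = 1
    while state != seed:
        state = lfsr16_step(state)
        period += 1
    return period

print(lfsr16_period())  # 65535
```

- The register visits every non-zero 16-bit state before repeating, so the period is 2^16 - 1 = 65535. The all-zero state is excluded because it maps to itself.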
- LFSRs are classified as 'External LFSR' or 'Internal LFSR' depending on the location of the XOR gate, and various methods known to those skilled in the art, such as the 'Galois LFSR', can be applied. As LFSR is known to those skilled in the art, further detailed description is omitted.
- an LFSR is deterministic. Therefore, the sequence of values generated by an LFSR is determined by the previous value. Additionally, because the number of values a register can have is finite, this sequence repeats with a specific period. Of course, if a good linear function is chosen, a long-period, seemingly random sequence can be created. However, if the values output from the LFSR are continuously input back into the LFSR, the sequence may repeat. That is, in some cases, when steps S110 to S140 are repeatedly executed, the initial data may be output again. Therefore, it is necessary to prevent infinite repetition of steps S110 to S140.
- One way to prevent repeated execution of the LFSR is to set the number of executions in advance.
- Figure 10 is a flowchart of a binary data shuffling method according to another embodiment of the present specification.
- in step S140, the processor may shuffle the binary data and then proceed to step S141.
- in step S141, the processor may store the number of shuffles each time the binary data is shuffled.
- in step S142, the processor can determine whether the number of shuffles exceeds the preset maximum number of shuffles (K). If the number of shuffles is less than the maximum number of shuffles (K) (“NO” in step S142), the process can proceed to step S110.
- steps S110 to S142 can be repeatedly executed until the number of shuffles reaches the maximum number of shuffles (K). On the other hand, if the number of shuffles is greater than the maximum number of shuffles (K) (“YES” in step S142), the process can proceed to step S150. In that case, without performing additional shuffling, the last shuffled binary data is stored as binary data subject to base sequence conversion.
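- The bounded loop of FIG. 10 can be sketched as follows. The specification does not state how the LFSR output is combined with the data, so this sketch assumes additive scrambling (XOR with an LFSR keystream) and a seed that changes with the round number. The 2-bit mapping, the seed 0xACE1, the repeat standard of 4, the limit K = 8, and all function names are illustrative assumptions.

```python
from itertools import groupby

PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping

def to_bases(bits):
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def longest_run(seq):
    """Length of the longest run of identical bases."""
    return max((len(list(g)) for _, g in groupby(seq)), default=0)

def keystream(seed, n):
    """First n output bits of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    state, out = seed, []
    for _ in range(n):
        out.append(state & 1)
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
    return out

def shuffle(bits, round_no, base_seed=0xACE1):
    """Assumed shuffle: XOR the data with a keystream seeded per round."""
    seed = ((base_seed + round_no) & 0xFFFF) or 1  # never seed with zero
    ks = keystream(seed, len(bits))
    return "".join(str(int(b) ^ k) for b, k in zip(bits, ks))

def shuffle_until_ok(bits, limit=4, max_shuffles=8):
    """Repeat the convert/check/shuffle cycle, giving up after max_shuffles rounds."""
    count = 0
    while longest_run(to_bases(bits)) >= limit and count < max_shuffles:
        count += 1                   # corresponds to storing the shuffle count
        bits = shuffle(bits, count)  # corresponds to shuffling the binary data
    return bits, count

shuffled, count = shuffle_until_ok("0" * 32)  # sixteen consecutive A's before shuffling
print(count, to_bases(shuffled))
```

- Because XOR with the same keystream is its own inverse, applying `shuffle` again with the same round numbers recovers the input, which is why the shuffle count is the only extra information the restoration side needs under this assumption.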
- Another way to prevent repeated execution of the LFSR is to find the binary data with the lowest number of base sequence repetitions among the converted binary data.
- Figure 11 is a flowchart of a binary data shuffling method according to another embodiment of the present specification.
- in step S110, the processor may store the temporarily converted base sequence. That is, each time step S110 is executed after binary data shuffling, the converted temporary base sequence can be stored. In step S120, the processor may further store the number of identically repeated bases in the temporarily converted base sequence. In other words, information on the actual number of identically repeated bases in the temporarily converted base sequence can additionally be stored.
- steps S130 and S140 are the same as previously described.
- in step S143, which follows step S140, the processor may determine whether the inversely converted binary data is the same as the original binary data, that is, the original data.
- if the inversely converted binary data is not identical to the original binary data (“NO” in step S143), the process may proceed to step S110. Thereafter, the processor may repeatedly execute steps S110 to S143. The repeated execution of steps S110 to S143 may be performed until the number of repeated sequences of the same base in step S120 is less than or equal to the standard number or until the shuffled binary data is identical to the original binary data.
- in step S144, the processor may select the base sequence with the smallest number of identically repeated bases among the temporarily converted base sequences. Then, in step S150, the processor may inversely convert the selected base sequence back into binary data and store the inversely converted binary data as binary data subject to base sequence conversion.
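- The selection in step S144 can be sketched as follows, assuming that each stored candidate is kept together with its shuffle count and that "the smallest number of identically repeated bases" is measured as the longest run of one base. The function names and the tie-breaking rule (fewer shuffles wins) are illustrative assumptions.

```python
from itertools import groupby

def longest_run(seq):
    """Length of the longest run of identical bases."""
    return max((len(list(g)) for _, g in groupby(seq)), default=0)

def pick_best(candidates):
    """candidates: list of (shuffle_count, base_sequence) stored during the loop."""
    # Prefer the shortest longest-run; on a tie, prefer fewer shuffles.
    return min(candidates, key=lambda c: (longest_run(c[1]), c[0]))

stored = [(0, "AAAAAC"), (1, "AAGGGT"), (2, "ACCGTT")]
print(pick_best(stored))  # (2, 'ACCGTT')
```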
- the processor may further add shuffle count information to the binary data subject to base sequence conversion. For example, if there is no same base repeat sequence in the initially temporarily converted base sequence, the shuffle count information may be '0'. If steps S120 to S140 are repeated twice before no same base repeat sequence remains, the shuffle count information may be '2'. In this way, information on how many times the data has been shuffled can be added to the binary data subject to base sequence conversion.
- the processor may store information on the number of shuffles of the binary data subject to nucleotide sequence conversion in a separate storage device.
- the previous example is an embodiment in which the information on the number of times of mixing is converted into a base sequence and recorded in the DNA itself, and this other embodiment is an embodiment in which the information on the number of times of mixing is stored in a separate storage device.
- the binary data to be converted to base sequence is synthesized into an actual DNA molecule according to the base sequence. Therefore, when a read request occurs for information stored as a DNA molecule, a process is required to unmix it and restore it to the original binary data.
- the role of the binary data restoration method and device according to the present specification is to unravel this mixing and restore the original binary data.
- the binary data recovery device may be a component of a DNA storage system.
- the DNA storage system may include a base sequence analysis unit that analyzes (sequences) DNA and outputs base sequence information. The binary data restoration device and method according to the present specification therefore assume that the base sequence of the actual DNA molecule has already been analyzed. What matters in the restoration process is how many times the data stored in the DNA molecule was shuffled, and the restoration must be carried out according to this shuffle count information. As previously explained, the shuffle count information may be stored within the DNA molecule or in a separate storage device.
- Figure 12 is a block diagram schematically showing the configuration of a binary data recovery device according to an embodiment of the present specification.
- the binary data recovery device 200 may include a base sequence conversion unit 210, a binary data processing unit 220, and a binary data disk drive unit 230.
- the base sequence conversion unit 210 can reversely convert the analyzed DNA base sequence information into binary data.
- the base sequence conversion unit 210 may be the same or similar to the base sequence conversion unit 110 of the binary data shuffling device 100 described above.
- the binary data processing unit 220 may separate the shuffle count information included in the inversely converted binary data.
- the binary data recovery device 200 according to an embodiment of the present specification is a device corresponding to an embodiment in which shuffle count information is included in a DNA molecule.
- the inversely converted binary data includes shuffle count information, and the shuffle count information must be removed to return to the original binary data when unshuffled and restored.
- the binary data disk drive unit 230 can restore binary data with the shuffle count information separated according to the shuffle count information.
- Figure 13 is a block diagram schematically showing the configuration of a binary data recovery device according to another embodiment of the present specification.
- the binary data recovery device 300 may include a base sequence conversion unit 310, a storage unit 320, and a binary data disk drive unit 330.
- the base sequence conversion unit 310 can reversely convert the analyzed DNA base sequence information into binary data.
- the base sequence conversion unit 310 may be the same or similar to the base sequence conversion unit 110 of the binary data shuffling device 100 described above.
- the storage unit 320 can store information on the number of times each DNA has been mixed.
- the binary data restoration device 300 according to another embodiment of the present specification is a device corresponding to an embodiment in which shuffle count information is not included in the DNA molecule.
- although the storage unit 320 is shown in this specification as being included in the binary data recovery device 300, the storage unit 320 may exist outside the binary data recovery device 300.
- the binary data disk drive unit 330 can restore the inversely converted binary data according to the shuffle count information stored in the storage unit 320.
- the base sequence conversion units 210 and 310 can remove the error correction code of the inversely converted binary data.
- binary data may be converted into a base sequence after a code for error correction has been added. Therefore, the error correction code also needs to be removed from the inversely converted binary data.
- the binary data disk drive units 230 and 330 can restore binary data using a linear feedback shift register (LFSR) method. This corresponds to the case where binary data was shuffled using the Linear Feedback Shift Register (LFSR) method.
- data shuffled with a Linear Feedback Shift Register (LFSR) can be restored to its original state by executing the LFSR in reverse, in the same manner as when shuffling. Since LFSR is known to those skilled in the art, a detailed description of the algorithm is omitted.
- to execute the binary data restoration method described below, the base sequence conversion units 210 and 310, the binary data processing unit 220, the binary data disk drive units 230 and 330, and the storage unit 320 may include processors, application-specific integrated circuits (ASICs), other chipsets, logic circuits, registers, communication modems, data processing devices, and the like known in the technical field to which the invention belongs.
- the base sequence conversion units 210 and 310, the binary data processing unit 220, and the binary data disk drive units 230 and 330 may be implemented as a set of program modules. In this case, the program modules may be stored in the memory device and executed by the processor.
- the binary data restoration method according to the present specification can be implemented in the form of a computer program written to perform each step described below on a computer and recorded on a computer-readable recording medium.
- the binary data restoration method according to the present specification will be described on the assumption that it is executed by a processor.
- Figure 14 is a flowchart of a binary data restoration method according to an embodiment of the present specification.
- the binary data restoration method according to an embodiment of the present specification is a method corresponding to an embodiment in which information on the number of shuffles is included in a DNA molecule.
- in step S210, the processor may receive and store the analyzed DNA sequence information.
- the processor may reversely convert the base sequence information into binary data.
- the processor may separate the shuffle count information included in the inversely converted binary data.
- the processor may restore binary data in which the shuffle count information is separated according to the shuffle count information.
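- The restoration path for the embodiment in which the shuffle count travels with the data can be sketched as follows. Everything concrete here is an assumption for illustration: the count is stored as an 8-bit prefix, shuffling is modeled as XOR with an LFSR keystream seeded per round, and the function names are hypothetical. Under these assumptions, unshuffling is the same XOR applied once for every recorded round.

```python
HEADER_BITS = 8  # assumed width of the shuffle count field

def keystream(seed, n):
    """First n output bits of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    state, out = seed, []
    for _ in range(n):
        out.append(state & 1)
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
    return out

def shuffle(bits, round_no, base_seed=0xACE1):
    """Assumed shuffle: XOR with a keystream seeded per round (its own inverse)."""
    seed = ((base_seed + round_no) & 0xFFFF) or 1
    return "".join(str(int(b) ^ k) for b, k in zip(bits, keystream(seed, len(bits))))

def attach_count(bits, count):
    """Prefix the payload with the shuffle count as a fixed-width binary field."""
    return format(count, "0{}b".format(HEADER_BITS)) + bits

def split_count(bits):
    """Separate the shuffle count information from the payload."""
    return int(bits[:HEADER_BITS], 2), bits[HEADER_BITS:]

def restore(bits_with_count):
    """Undo every recorded shuffle round, last round first."""
    count, payload = split_count(bits_with_count)
    for round_no in range(count, 0, -1):
        payload = shuffle(payload, round_no)
    return payload

original = "1100101000001111"
stored = attach_count(shuffle(shuffle(original, 1), 2), 2)  # shuffled twice
print(restore(stored) == original)  # True
```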
- Figure 15 is a flowchart of a binary data restoration method according to another embodiment of the present specification.
- the binary data restoration method according to another embodiment of the present specification is a method corresponding to an embodiment in which information on the number of shuffles is not included in the DNA molecule.
- in step S310, the processor may receive and store the analyzed DNA sequence information.
- the processor may reversely convert the base sequence information into binary data.
- the processor can read the shuffle count information from the storage unit.
- the processor may restore the inversely converted binary data according to the shuffle count information read from the storage unit.
- the processor may remove the error correction code of the inversely converted binary data.
- the processor may restore binary data using a linear feedback shift register (LFSR) method.
- Figure 16 is a flowchart of the DNA encoding method according to the present specification.
- the processor may perform a primary conversion of binary data into nucleotide sequence information.
- the processor may perform a secondary conversion of the primary converted nucleotide sequence information into nucleotide sequence information to which dummy nucleotide information is added at preset intervals.
- the DNA encoding method according to the present specification will be explained through examples of binary data and base sequence information.
- Figure 17 is an exemplary diagram of a DNA encoding method according to an embodiment of the present specification.
- in the example shown in FIG. 17, 2 bits of data are matched 1:1 with bases, but the DNA encoding method according to the present specification is not limited to the example shown.
- One base can correspond to 1 bit, 2 bits, 3 bits, or 4 bits, and methods for matching binary data and base information can vary.
- the base sequence information converted in this way is called ‘primary converted base sequence information’.
- the dummy base information added in step S200 may be base information that is different from adjacent base information in the primary converted base sequence information. It is known that defects may occur if the same base is synthesized repeatedly during the process of synthesizing DNA molecules. According to Poon and MacGregor (198) Biopolymers 45:427-434, when G (guanine) is synthesized repeatedly (consecutively) more than 4 times, aggregation in the form of a guanine tetraplex becomes a problem. Although this reference mentions a problem only with G (guanine), it does not rule out that the same or similar problems may occur with A (adenine), T (thymine), and C (cytosine).
- Figure 18 is a table of types of dummy bases that can be added according to adjacent bases.
- the added dummy base information may be information of at least two or more bases.
- Figure 19 is an exemplary diagram of a DNA encoding method according to another embodiment of the present specification.
- in the example shown, dummy bases each consisting of three bases, such as “AGT”, “GTA”, “TAC”, “AGT”, and “ACC”, were added.
- the number of bases constituting the dummy base information can be set in various ways.
- the base information located at both ends of the added dummy base information may be base information that is different from the adjacent base information in the primary converted base sequence information. This is also to prevent the same base from being synthesized repeatedly.
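- Adding dummy base information at preset intervals, and removing it again, can be sketched as follows. The interval of 3 bases, the single-base dummy, the preference order A, T, G, C, and the function names are all assumptions made for illustration; the figures of the specification may use different values.

```python
BASES = "ATGC"  # assumed preference order for choosing a dummy base

def pick_dummy(left, right):
    """Return a base that differs from both adjacent bases."""
    return next(b for b in BASES if b != left and b != right)

def add_periodic_dummies(seq, interval=3):
    """Insert one dummy base after every `interval` information bases."""
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        # No dummy is appended after the very last information base.
        if (i + 1) % interval == 0 and i + 1 < len(seq):
            out.append(pick_dummy(base, seq[i + 1]))
    return "".join(out)

def remove_periodic_dummies(seq, interval=3):
    """Drop every (interval + 1)-th base, which is where the dummies sit."""
    return "".join(b for i, b in enumerate(seq) if (i + 1) % (interval + 1) != 0)

encoded = add_periodic_dummies("G" * 8)
print(encoded)                           # GGGAGGGAGG
print(remove_periodic_dummies(encoded))  # GGGGGGGG
```

- Because the dummy positions are fixed by the interval, the decoder needs no marker to tell dummy bases from information bases, and no run of identical bases can be longer than the interval.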
- the examples shown in Figures 17 and 19 are examples of secondary conversion by adding a dummy base inside the primary converted base sequence.
- a base located inside the DNA molecule is bonded to other bases on both sides, but a base located at the very end of the DNA molecule is bonded on only one side and has no molecular bond on the other side.
- the bond at the end may therefore be broken, and the base may be lost.
- because a base corresponding to actual information at the end may be damaged, the ends also need to be protected.
- the processor may perform a tertiary conversion of the secondarily converted base sequence information into base sequence information with protective dummy base information added to both ends.
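- The tertiary conversion can be sketched as follows. The protective length of 3 bases per end and the rule for choosing the protective bases (each differs from the base next to it) are assumptions for illustration, as are the function names; the specification does not fix these details in the text above.

```python
BASES = "ATGC"  # assumed preference order

def make_cap(neighbor, length):
    """Build `length` bases, each different from the base beside it, starting next to `neighbor`."""
    cap, prev = [], neighbor
    for _ in range(length):
        nxt = next(b for b in BASES if b != prev)
        cap.append(nxt)
        prev = nxt
    return "".join(cap)

def add_end_caps(seq, length=3):
    """Add protective dummy bases to both ends of the sequence."""
    head = make_cap(seq[0], length)[::-1]  # reversed so its last base touches seq[0]
    tail = make_cap(seq[-1], length)
    return head + seq + tail

def strip_end_caps(seq, length=3):
    """Remove the protective dummy bases after sequencing."""
    return seq[length:-length]

protected = add_end_caps("ACGT")
print(protected)                  # TATACGTATA
print(strip_end_caps(protected))  # ACGT
```

- If bases are lost from either end, the loss falls on the protective bases first, so the information-bearing bases remain intact as long as fewer than `length` bases are lost per end.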
- Figure 20 is an exemplary diagram of a DNA encoding method according to another embodiment of the present specification.
- to execute the above-described calculations and various control logic, the processor may include a microprocessor, an application-specific integrated circuit (ASIC), other chipsets, logic circuits, registers, communication modems, data processing devices, and the like known in the technical field to which the present invention pertains. Additionally, when the above-described control logic is implemented as software, the processor may be implemented as a set of program modules. In this case, the program modules may be stored in the memory device and executed by the processor.
- the above-mentioned computer program may include code written in a computer language such as C/C++, C#, or JAVA that the processor (CPU) of the computer can read through the device interface of the computer, so that the computer can read the program and execute the methods implemented in the program.
- this code may further include memory reference-related code indicating at which location (address) in the computer's internal or external memory the additional information or media required for the computer's processor to execute the above functions should be referenced.
- the code may further include communication-related code regarding how the computer's communication module should be used to communicate with any other remote computer or server, and what information or media should be transmitted and received during communication.
- the storage medium refers to a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short period of time, such as a register, cache, or memory.
- examples of the storage medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc., but are not limited thereto. That is, the program may be stored in various recording media on various servers that the computer can access or on various recording media on the user's computer. Additionally, the medium may be distributed to computer systems connected to a network, and computer-readable code may be stored in a distributed manner.
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Chemical & Material Sciences (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Evolutionary Computation (AREA)
- Analytical Chemistry (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Bioethics (AREA)
- Biomedical Technology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Genetics & Genomics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to a method and a device for DNA encoding and decoding, designed to reduce the probability of errors during DNA synthesis and restoration. A DNA encoding method according to the invention comprises converting binary data into a nucleotide sequence and then adding dummy nucleotides at points in the sequence where the same nucleotide is repeated. Another DNA encoding method according to the invention comprises temporarily converting binary data into a nucleotide sequence and, if the same nucleotide is repeated more than a predefined number of times, converting the temporarily converted nucleotide sequence back into binary data, shuffling the inversely converted binary data, and then converting the shuffled data into a nucleotide sequence. In this case, the nucleotide sequence information can be converted back into binary data, and the inversely converted binary data can be restored to the original binary data on the basis of the number of times the data was shuffled. Another DNA encoding method according to the invention comprises converting binary data into nucleotide sequence information and adding dummy nucleotide information at predefined intervals in the converted nucleotide sequence information.
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020220127750A KR20240048218A (ko) | 2022-10-06 | 2022-10-06 | DNA encoding method and device with a low likelihood of defects |
KR10-2022-0127750 | 2022-10-06 | ||
KR10-2022-0129385 | 2022-10-11 | ||
KR1020220129385A KR20240049911A (ko) | 2022-10-11 | 2022-10-11 | Binary data shuffling method for preventing repeated arrangement of base sequences |
KR10-2022-0137061 | 2022-10-24 | ||
KR1020220137061A KR20240056939A (ko) | 2022-10-24 | 2022-10-24 | Method for restoring shuffled binary data to original binary data |
KR1020220138893A KR20240058289A (ko) | 2022-10-26 | 2022-10-26 | DNA encoding method and device for improving structural stability |
KR10-2022-0138893 | 2022-10-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024076044A1 true WO2024076044A1 (fr) | 2024-04-11 |
Family
ID=90608727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2023/014125 WO2024076044A1 (fr) | 2022-10-06 | 2023-09-19 | Method and device for DNA encoding and decoding |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024076044A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150037824A (ko) * | 2012-07-19 | 2015-04-08 | 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 | Method of storing information using nucleic acids |
KR20190116297A (ko) * | 2017-01-10 | 2019-10-14 | 로스웰 바이오테크놀로지스 인코포레이티드 | Methods and systems for DNA data storage |
KR20190118853A (ko) * | 2018-04-11 | 2019-10-21 | 서울대학교산학협력단 | DNA digital data storage device and storage method, and decoding method |
KR20200025430A (ko) * | 2018-08-30 | 2020-03-10 | 한동대학교 산학협력단 | Method and device for storing digital information in DNA molecules |
KR20200071720A (ko) * | 2017-07-25 | 2020-06-19 | 난징진시루이 사이언스 앤드 테크놀로지 바이올로지 코포레이션 | DNA-based data storage |
-
2023
- 2023-09-19 WO PCT/KR2023/014125 patent/WO2024076044A1/fr unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150037824A (ko) * | 2012-07-19 | 2015-04-08 | 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 | Method of storing information using nucleic acids |
KR20190116297A (ko) * | 2017-01-10 | 2019-10-14 | 로스웰 바이오테크놀로지스 인코포레이티드 | Methods and systems for DNA data storage |
KR20200071720A (ko) * | 2017-07-25 | 2020-06-19 | 난징진시루이 사이언스 앤드 테크놀로지 바이올로지 코포레이션 | DNA-based data storage |
KR20190118853A (ko) * | 2018-04-11 | 2019-10-21 | 서울대학교산학협력단 | DNA digital data storage device and storage method, and decoding method |
KR20200025430A (ko) * | 2018-08-30 | 2020-03-10 | 한동대학교 산학협력단 | Method and device for storing digital information in DNA molecules |
Non-Patent Citations (1)
Title |
---|
TAEJIN AHN, HAMIN BAN, HYUNSOO PARK: "Storing Digital Information in Long-Read DNA", GENOMICS & INFORMATICS, vol. 16, no. 4, 31 December 2018 (2018-12-31), pages e30, XP055680289, DOI: 10.5808/GI.2018.16.4.e30 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017209531A1 (fr) | Encoding apparatus and method with cyclic redundancy check and polar code | |
WO2009139574A2 (fr) | Memory device and method of managing memory data errors | |
WO2014012325A1 (fr) | Method and device for checking a NAND flash memory chip | |
EP0546863A2 (fr) | Data compression device | |
KR101531774B1 (ko) | Decoding of encoded data including integrated data and header protection | |
CN111352765B (zh) | Controller and memory system | |
WO2014137202A1 (fr) | Error correction processing circuit in a memory and error correction processing method | |
WO2021033981A1 (fr) | Flexible decoding method based on information of a DNA storage device, program, and apparatus | |
WO2024076044A1 (fr) | DNA encoding and decoding method and device | |
US7770010B2 (en) | Dynamically configurable interleaver scheme using at least one dynamically changeable interleaving parameter | |
JP2589957B2 (ja) | Encoding method and memory system for single sub-block error and single bit error detection | |
WO2013015548A2 (fr) | LDPC encoding/decoding method and device using same | |
JP4065425B2 (ja) | Variable-length coding packing architecture | |
US7098818B1 (en) | Encoder and decoder using run-length-limited code | |
WO2020231020A1 (fr) | Method and apparatus for fast decoding of linear code based on weighted decision | |
KR20190051245A (ko) | Polar code decoding apparatus and method | |
KR101874537B1 (ko) | Method and apparatus for parallel decoding of polar codes | |
US20120110416A1 (en) | Data storage apparatus with encoder and decoder | |
KR101543081B1 (ko) | Encoding and decoding of redundant bits to accommodate memory cells having stuck-at faults | |
WO2014073747A1 (fr) | Method for reducing power consumption of a flash memory and associated apparatus | |
JP2007323786A (ja) | Semiconductor device | |
WO2020179966A1 (fr) | Method and apparatus for fast decoding of linear code based on soft decision | |
WO2023018157A1 (fr) | Method for encoding and decoding DNA data using low-density parity-check code, program, and device | |
JP5336501B2 (ja) | Method and encoding system for encoding error control codes between bit strings | |
WO2014046395A1 (fr) | Encoding/decoding method and apparatus using complementary sparse inverse code |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23875112; Country of ref document: EP; Kind code of ref document: A1 |