WO2021033981A1 - Decoding method, program, and apparatus based on soft information of a DNA storage device - Google Patents


Info

Publication number
WO2021033981A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
base
storage device
decoding
computer
Application number
PCT/KR2020/010571
Other languages
English (en)
Korean (ko)
Inventor
김성환
박호성
노알버트
김재원
정재호
노종선
Original Assignee
울산대학교 산학협력단
서울대학교 산학협력단
전남대학교산학협력단
홍익대학교 산학협력단
Priority date: 2019-08-21 (the priority date is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Filing date: 2020-08-10
Publication date: 2021-02-25
Priority claimed from KR1020200080464A (published as KR102339723B1)
Application filed by 울산대학교 산학협력단, 서울대학교 산학협력단, 전남대학교산학협력단, 홍익대학교 산학협력단
Publication of WO2021033981A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00: Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/37: Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
    • H03M13/39: Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits

Definitions

  • The present invention relates to a decoding method, program, and apparatus based on soft information of a DNA storage device.
  • Such large amounts of data are stored in many places using numerous storage devices, such as the hardware device itself, external hard drives, and web hard drives.
  • DNA storage technology stores digital data by mapping the binary values 0 and 1 onto the bases of DNA. If DNA is used to store digital data rather than genetic information, about 455 billion GB of data can be stored in 1 gram of DNA, roughly 455 million times the capacity of a 1 TB hard disk.
  • The method of decoding DNA data into digital data that has been studied and used to date amplifies DNA containing binary information in large quantities and reads it several times to eliminate errors; regardless of the errors, it then selects at each position whichever of A (adenine), T (thymine), G (guanine), or C (cytosine) appears most often.
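For comparison, the conventional hard-decision approach described above (pick the most frequent base at each position) can be sketched as a per-position majority vote. This is a minimal illustration, assuming the reads are already aligned and of equal length; `majority_vote` is a hypothetical name, and quality scores are deliberately ignored, which is exactly the limitation the soft-information approach targets.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Consensus by picking the most frequent base at each position.

    Assumes the reads are already aligned and of equal length; real reads
    also carry insertions/deletions, which this sketch ignores.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) walks the reads column by column (one column per position);
    # Counter.most_common breaks ties by first occurrence in the column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


print(majority_vote(["ACGTA", "ACGTT", "AGGTA"]))  # ACGTA
```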
  • To solve the problem described above, the present invention provides a decoding method, program, and apparatus based on soft information of a DNA storage device that enables efficient decoding with a small number of reads.
  • The present invention also provides a decoding method, program, and apparatus based on soft information of a DNA storage device capable of correcting errors, not merely detecting them.
  • The present invention also provides a decoding method, program, and apparatus based on soft information of a DNA storage device capable of correcting errors while accounting for all four error types: substitution, insertion, deletion, and position errors.
  • A decoding method based on soft information of a DNA storage device for solving the above problem includes: obtaining, by a computer, a first sequence and a quality score for the first sequence; and calculating, by the computer, a base probability for each position of the first sequence based on the quality score, wherein the quality score includes the error rate of each base included in the first sequence.
  • The base probability may be the probability that the base at a specific position of the first sequence is adenine (A), guanine (G), cytosine (C), or thymine (T), computed for each of the four bases.
  • The first sequence may be a sequence of a target length obtained by stitching a plurality of oligo reads included in a sequencing read file.
  • Calculating the base probability may include: calculating, by the computer, a conditional position error rate for each position of the first sequence; and calculating, by the computer, the base probability based on the quality score and the conditional position error rate.
  • Calculating the conditional position error rate may comprise calculating it for each position of the first sequence by comparing the base sequences of the first, second, and third sequences, where the second sequence may be obtained using the edit distance from the first sequence.
  • The third sequence may be the correct (ground-truth) sequence for the first sequence.
  • The second sequence may be the sequence, among a plurality of candidate sequences, with the smallest edit distance to the first sequence.
  • The conditional position error rate may reflect at least one of substitution, insertion, deletion, and position errors.
  • The method may further include calculating, by the computer, the log-likelihood ratio (LLR) of the bits corresponding to each base of the first sequence, based on the calculated base probability.
  • LLR: log-likelihood ratio
  • Calculating the LLR may include calculating an LLR for each of the two bits corresponding to the base at a specific position of the first sequence.
  • The decoding program based on soft information of a DNA storage device is combined with a computer (hardware) and stored in a computer-readable recording medium so as to execute any one of the methods described above.
  • A decoding apparatus based on soft information of a DNA storage device according to another embodiment of the present invention obtains a first sequence and a quality score for the first sequence and calculates a base probability for each position of the first sequence based on the quality score. The quality score includes the error rate of each base included in the first sequence, and the base probability may be the probability that the base at a specific position of the first sequence is adenine (A), guanine (G), cytosine (C), or thymine (T), respectively.
  • A: adenine
  • G: guanine
  • C: cytosine
  • T: thymine
  • FIG. 1 is a flowchart illustrating a decoding method based on soft information of a DNA storage device according to an embodiment of the present invention.
  • FIG. 2 is a flowchart illustrating a method of calculating a base probability according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating a preprocessing and decoding algorithm according to an embodiment of the present invention.
  • FIG. 4 is an exemplary diagram illustrating a method of generating an H matrix during a preprocessing process according to an embodiment of the present invention.
  • The term “unit” or “module” used in the specification refers to software or a hardware component such as an FPGA or ASIC, and a “unit” or “module” performs certain roles. However, a “unit” or “module” is not limited to software or hardware.
  • A “unit” or “module” may be configured to reside in an addressable storage medium or configured to run on one or more processors.
  • A “unit” or “module” includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within components and “units” or “modules” may be combined into fewer components and “units” or “modules”, or further separated into additional components and “units” or “modules”.
  • The term “computer” includes all kinds of devices capable of performing arithmetic processing.
  • For example, a computer may be not only a desktop PC or a notebook but also a smartphone, a tablet PC, a cellular phone, a PCS phone, a synchronous/asynchronous International Mobile Telecommunication-2000 (IMT-2000) mobile terminal, a palm personal computer (PC), a personal digital assistant (PDA), or the like.
  • The computer may also be a server computer that receives information from a client.
  • A sequencing device that performs sequencing may also correspond to a computer.
  • An ‘oligo’ or ‘oligo read’ refers to a polymer synthesized from a plurality of nucleotide units, each containing a specific base (adenine, guanine, cytosine, or thymine).
  • A ‘sequence’ refers to the nucleotide sequence obtained by sequentially reading (sequencing) a specific oligo read.
  • ‘Sequence’ and ‘oligo read’ may be used interchangeably.
  • A ‘stitched sequence’ specifically means a ‘sequence of stitched oligo reads’.
  • The present invention relates to decoding data that has been encoded as DNA data by mapping specific data onto a DNA base sequence.
  • The present invention can be used to decode data encoded with a Luby transform (LT) code, a fountain code, a turbo code, a polar code, or a low-density parity-check (LDPC) code.
  • The following is an example of a general encoding method using a Luby transform code.
  • One image is divided into $K_{LT}$ packets of length $L$.
  • Each encoded packet is generated from the $K_{LT}$ packets through Luby transform encoding.
  • A seed obtained from a linear-feedback shift register (LFSR) is attached, and Reed-Solomon (RS) encoding bits corresponding to a specified value or more are attached.
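The LFSR seed and LT encoding steps above can be illustrated with a toy sketch. Everything here is an assumption rather than the patent's parameters: the 16-bit Galois LFSR with toggle mask `0xB400`, the uniform degree choice (practical LT encoders draw the degree from a soliton distribution), and Python's `random.Random` as the seed-driven neighbour selector. `lfsr_seeds` and `lt_encode_packet` are hypothetical names, and the RS step is omitted.

```python
from __future__ import annotations

import random


def lfsr_seeds(state: int, count: int, taps: int = 0xB400) -> list[int]:
    """Generate `count` successive states of a 16-bit Galois LFSR as seeds.

    The toggle mask 0xB400 (x^16 + x^14 + x^13 + x^11 + 1) is an assumed
    maximal-length polynomial; the source does not specify the LFSR taps.
    """
    if not 0 < state < 1 << 16:
        raise ValueError("state must be a non-zero 16-bit value")
    seeds = []
    for _ in range(count):
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        seeds.append(state)
    return seeds


def lt_encode_packet(packets: list[bytes], seed: int, max_degree: int = 3) -> tuple[list[int], bytes]:
    """XOR a seed-selected subset of equal-length source packets into one encoded packet.

    The degree is drawn uniformly from 1..max_degree as a placeholder only.
    """
    rng = random.Random(seed)  # the decoder can re-derive the same subset from the seed
    degree = rng.randint(1, min(max_degree, len(packets)))
    neighbours = sorted(rng.sample(range(len(packets)), degree))
    encoded = bytearray(len(packets[0]))
    for idx in neighbours:
        for pos, byte in enumerate(packets[idx]):
            encoded[pos] ^= byte
    return neighbours, bytes(encoded)


source = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # toy input: 3 packets of 2 bytes
for seed in lfsr_seeds(0xACE1, 2):
    print(seed, lt_encode_packet(source, seed))
```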
  • The final encoded bits are converted into a base sequence according to a predetermined rule.
  • For example, encoding oligo reads (encoding sequences) may be generated by mapping bits to bases according to the rule {00, 01, 10, 11} → {A, C, G, T}.
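The example bit-to-base rule can be written as a lookup table in both directions, one for encoding and its inverse for decoding. A minimal sketch; the function and table names are hypothetical.

```python
BIT_PAIR_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BIT_PAIR = {base: bits for bits, base in BIT_PAIR_TO_BASE.items()}


def bits_to_bases(bits: str) -> str:
    """Map each 2-bit group to one base using the example rule."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bits(seq: str) -> str:
    """Inverse mapping, used on the decoding side."""
    return "".join(BASE_TO_BIT_PAIR[base] for base in seq)


print(bits_to_bases("00011011"))  # ACGT
print(bases_to_bits("TGCA"))      # 11100100
```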
  • For each generated encoding oligo read (encoding sequence), the encoder checks whether the maximum homopolymer length is at most a predetermined length and whether the guanine-cytosine (GC) content falls within a predetermined range; sequences that satisfy both conditions are selected and the rest are discarded. DNA data is encoded by repeating this procedure until the target number of sequences has been selected.
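The screening loop described above can be sketched as follows. The thresholds (a maximum homopolymer length of 3 and a GC content of 45% to 55%) are hypothetical placeholders, since the text only says they are predetermined; the function names are also hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def select_oligos(candidates: Iterable[str], target: int, max_run: int = 3,
                  gc_range: tuple[float, float] = (0.45, 0.55)) -> list[str]:
    """Keep candidates that pass both screens until `target` have been selected."""
    low, high = gc_range
    selected = []
    for seq in candidates:
        if max_homopolymer(seq) <= max_run and low <= gc_content(seq) <= high:
            selected.append(seq)
            if len(selected) == target:
                break
    return selected


print(select_oligos(["AAAAGGCC", "ACGTACGT", "ACGGACTT"], target=2))  # ['ACGTACGT', 'ACGGACTT']
```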
  • FIG. 1 is a flowchart illustrating a decoding method based on soft information of a DNA storage device according to an embodiment of the present invention.
  • The method may include: obtaining, by a computer, a first sequence and a quality score (S100); and calculating, by the computer, a base probability for each position of the first sequence based on the quality score (S200).
  • The computer obtains the first sequence and a quality score for the first sequence (S100).
  • The first sequence may be obtained by receiving a sequencing read file of the sequences encoded with the encoding method described above.
  • The sequencing read file is a text-based standard nucleotide data format file representing DNA nucleotide sequences, for example, a FASTQ file.
  • The sequencing read file contains hard information and soft information.
  • The hard information may include identification information for each read (such as an identifier and length) and the sequence (i.e., the base sequence) of each read.
  • The soft information may include the quality scores of the sequence.
  • The quality score is an index of the quality of each base call in the sequence and may indicate the probability that each base call is erroneous. Specifically, when the probability that a specific base call is an error is P, the quality score Q for that base call may be calculated as shown in Equation 1.
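FASTQ quality scores are conventionally on the Phred scale, Q = -10 log10 P; assuming Equation 1 is this standard definition, the conversion in both directions looks like this. The ASCII offset of 33 (Sanger / Illumina 1.8+ encoding) is also an assumption about the input files, and the function names are hypothetical.

```python
from __future__ import annotations

import math

PHRED_OFFSET = 33  # assumed Sanger / Illumina 1.8+ ASCII encoding


def error_prob_to_q(p: float) -> float:
    """Phred quality score Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must be in (0, 1]")
    return -10.0 * math.log10(p)


def q_to_error_prob(q: float) -> float:
    """Inverse: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def quality_string_to_error_probs(qual: str) -> list[float]:
    """Per-base error probabilities from one FASTQ quality line."""
    return [q_to_error_prob(ord(ch) - PHRED_OFFSET) for ch in qual]


print(round(error_prob_to_q(0.001)))         # 30
print(quality_string_to_error_probs("I5+"))  # Q40, Q20, Q10
```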
  • The obtained sequence is decoded using the soft information in the sequencing read file together with the hard information.
  • Conventionally, during decoding, DNA containing binary information was sequenced, filtered with an RS code, amplified in large quantities, and read multiple times.
  • A decoding method was then used that selects whichever of adenine (A), guanine (G), cytosine (C), and thymine (T) occurs most often at each position.
  • By using soft information together with hard information, relatively few reads are required for decoding; as a result, decoding can be performed quickly and at lower cost.
  • Hard information can be extracted using improved color information derived from part of the color information used for the FASTQ file.
  • In one method of extracting the hard information, a DNA pair matching the encoded DNA is made and each base is subjected to a fluorescence reaction; because DNA errors change the DNA information, image information in various colors can be obtained.
  • The hard information may then be extracted by measuring, in the obtained image information, the color values in specific wavelength bands such as RGB.
  • The first sequence may be a sequence of the target length obtained by applying a stitching algorithm to the sequencing read file.
  • The stitching algorithm obtains sequences of the target length by evaluating and merging the overlap of paired-end reads of various lengths; one example is the Paired-End reAd mergeR (PEAR) algorithm. That is, encoding oligo reads (encoding sequences) of various lengths are stitched by the algorithm to obtain a plurality of first sequences of the target length (sequences of encoding oligo reads merged to the target length).
  • The computer may obtain a quality score for the stitched first sequence from the sequencing read file, that is, the error rate of each base call included in the first sequence.
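The stitching idea can be illustrated with a deliberately naive paired-end merger. This is not PEAR, which scores candidate overlaps statistically; this sketch, with hypothetical names and a hypothetical mismatch budget, takes the longest suffix/prefix overlap that stays within the budget and keeps the higher-quality base in the overlap, so each merged base keeps a quality score.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, q1: list[int], r2: str, q2: list[int],
               min_overlap: int = 4, max_mismatch_frac: float = 0.1):
    """Naively merge a read pair by its longest acceptable suffix/prefix overlap.

    r2/q2 are as sequenced (reverse strand). In the overlap, the base with the
    higher Phred score wins. Returns (merged_seq, merged_quals) or None.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2rc, q2rc = reverse_complement(r2), q2[::-1]
    for k in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-k:], r2rc[:k]))
        if mismatches <= max_mismatch_frac * k:
            break
    else:
        return None  # no overlap of acceptable quality
    seq, quals = list(r1[:-k]), list(q1[:-k])
    for i in range(k):
        j = len(r1) - k + i
        if q1[j] >= q2rc[i]:
            seq.append(r1[j])
            quals.append(q1[j])
        else:
            seq.append(r2rc[i])
            quals.append(q2rc[i])
    seq += r2rc[k:]
    quals += q2rc[k:]
    return "".join(seq), quals


fragment = "ACGTACGGTTCA"
print(merge_pair(fragment[:8], [30] * 8, reverse_complement(fragment[4:]), [20] * 8))
# ('ACGTACGGTTCA', [30, 30, 30, 30, 30, 30, 30, 30, 20, 20, 20, 20])
```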
  • The computer calculates the base probability for each position of the first sequence based on the quality score (S200).
  • The base probability may be the probability that the base at a specific position of the first sequence is adenine (A), guanine (G), cytosine (C), or thymine (T), computed for each of the four bases.
  • That is, the probabilities that the base at the m-th position of the first sequence (i.e., the m-th nucleotide of the oligo) is adenine (A), guanine (G), cytosine (C), and thymine (T) are each calculated.
  • A FASTQ file provides a quality score (the error rate of the base call) only for the single called base at each position, whereas the present invention calculates the probability of each of the four bases at each position, in effect providing a quality score for the uncalled bases as well.
  • FIG. 2 is a flowchart illustrating a method of calculating a base probability according to an embodiment of the present invention.
  • Step S200 may include: calculating, by the computer, a conditional position error rate for each position of the first sequence (S210); and calculating, by the computer, the base probability based on the quality score and the conditional position error rate (S220).
  • The computer calculates the conditional position error rate for each position of the first sequence (S210).
  • Errors in the sequences derived from the encoded sequences are of four types: substitution, insertion, deletion, and position errors.
  • The present invention corrects these errors and, in particular, corrects position errors.
  • The conditional position error rate may reflect at least one of substitution, insertion, deletion, and position errors.
  • Step S210 may calculate the conditional position error rate for each position of the first sequence by comparing the base sequences of the first, second, and third sequences.
  • The first sequence may be a sequence of the target length obtained by stitching sequences encoded with a specific encoding method.
  • The second sequence may be an estimated encoding sequence inferred from the first sequence.
  • The second sequence may be obtained using the edit distance from the first sequence.
  • Specifically, the second sequence may be the candidate encoding sequence with the smallest edit distance to the first sequence among a plurality of candidate encoding sequences.
  • From the path of the edit distance computed between the second and first sequences, it is possible to determine which type of error (substitution, insertion, deletion, or position error) occurred at which position, and error statistics can be generated from this information.
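An edit distance with a traceback shows how the error type at each position can be read off the alignment path, and how the closest candidate (the second sequence) can be chosen. This is a plain Levenshtein sketch with unit costs and hypothetical names; it labels substitutions, insertions, and deletions only, so position errors as defined in the patent are not modeled.

```python
from __future__ import annotations


def edit_ops(ref: str, read: str) -> tuple[int, list[tuple[str, int]]]:
    """Levenshtein distance from ref to read plus one optimal edit path.

    Ops are ('sub', i), ('del', i), ('ins', i) with i a 0-based index into ref
    (for 'ins', the ref position before which the read has an extra base).
    """
    n, m = len(ref), len(read)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == read[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion: ref base missing in read
                          d[i][j - 1] + 1,         # insertion: extra base in read
                          d[i - 1][j - 1] + cost)  # match or substitution
    ops: list[tuple[str, int]] = []
    i, j = n, m
    while i > 0 or j > 0:  # walk back along one optimal path
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != read[j - 1]):
            if ref[i - 1] != read[j - 1]:
                ops.append(("sub", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1))
            i -= 1
        else:
            ops.append(("ins", i))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def closest_candidate(read: str, candidates: list[str]) -> str:
    """Candidate encoding sequence with the smallest edit distance to the read."""
    return min(candidates, key=lambda c: edit_ops(c, read)[0])


print(edit_ops("ACGT", "ACT"))                                  # (1, [('del', 2)])
print(closest_candidate("ACGA", ["TTTT", "ACGT", "GGGG"]))      # ACGT
```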
  • The third sequence may be the correct (ground-truth) encoding sequence for the first sequence; that is, it corresponds to a specific first sequence and is the sequence obtained when the input data is encoded without error.
  • Both the second and third sequences may be obtained for each first sequence.
  • The computer compares the first sequence with the second and third sequences to calculate the conditional position error rate for each position of the first sequence.
  • The process of calculating the conditional position error rate is described in detail below.
  • $N_{\mathrm{read}}$ is the number of acquired first sequences.
  • When $j$ is the index of each first sequence, $j \in \mathbb{Z}[N_{\mathrm{read}}]$, and $g(j)$ is the index of the second sequence corresponding to the $j$-th first sequence. That is, the sequence obtained as closest in edit distance to the $j$-th first sequence is the $g(j)$-th second sequence.
  • $i$ is the index of a specific position in each sequence.
  • $x_{m,n}$ is the base at the $m$-th position of the $n$-th second sequence.
  • $y_{m,n'}$ is the base at the $m$-th position of the $n'$-th first sequence.
  • The $i$-th conditional position error rate of the $j$-th first sequence is defined as in Equation 2.
  • $N_i(b^{(k)})$ is the number of occurrences of each base $b^{(k)}$ (A, C, G, T) at the $i$-th position of the plurality of second sequences.
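Because Equation 2 is not reproduced here, the following is only a hypothetical stand-in for a position-dependent error statistic: for each position and each called base, it estimates how a wrong call distributes over the three other bases, from pairs of first sequences and their matched reference sequences. The estimator, the Laplace smoothing `alpha`, the choice of reference, and the assumption that the pairs are already aligned position by position are illustrative choices, not the patent's formula.

```python
from __future__ import annotations

BASES = "ACGT"


def conditional_error_table(pairs, length: int, alpha: float = 1.0):
    """Estimate table[i][called][true] = P(true | called, position i, call is wrong).

    pairs: (first_sequence, matched_sequence) tuples of equal length `length`,
    compared position by position (indels are assumed already resolved).
    alpha: Laplace smoothing so unseen substitutions keep a small probability.
    """
    counts = [{c: {t: 0.0 for t in BASES if t != c} for c in BASES} for _ in range(length)]
    for read, ref in pairs:
        for i, (called, true) in enumerate(zip(read, ref)):
            if called != true and called in BASES and true in BASES:
                counts[i][called][true] += 1
    table = []
    for row in counts:
        table.append({})
        for called, tally in row.items():
            total = sum(tally.values()) + alpha * len(tally)
            table[-1][called] = {t: (n + alpha) / total for t, n in tally.items()}
    return table


table = conditional_error_table([("ACGT", "ACGT"), ("AGGT", "ACGT")], length=4)
print(table[1]["G"])  # {'A': 0.25, 'C': 0.5, 'T': 0.25}
```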
  • The computer calculates the base probability based on the quality score and the conditional position error rate (S220).
  • The probability for each of the three bases $b^{(k)}$ other than the called base $b$ can be calculated as in Equation 3, where $k \in \mathbb{Z}[3]$, $b^{(k)} \in \{A, C, G, T\}$, and $b^{(k)} \neq b$.
  • In this way, all four probabilities, that the base at the specific position is adenine (A), guanine (G), cytosine (C), or thymine (T), can be calculated, and decoding can be performed based on them.
  • Because the probability of each of the four bases (adenine, guanine, cytosine, thymine) at a specific position of the oligo read is calculated from soft information (quality scores) and taken into account during decoding, the number of reads required for decoding can be reduced, enabling accurate, fast, and low-cost decoding.
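Equation 3 is likewise not reproduced here. A minimal sketch of the idea, assuming the called base receives 1 - P and the error probability P is shared among the other three bases, either evenly or in proportion to a per-position weight table such as one produced by a conditional error estimate. The Phred+33 decoding, the weighting rule, and all names are assumptions.

```python
from __future__ import annotations

BASES = "ACGT"
PHRED_OFFSET = 33  # assumed Sanger / Illumina 1.8+ encoding


def base_probabilities(called: str, error_prob: float,
                       weights: dict[str, float] | None = None) -> dict[str, float]:
    """Probability of A, C, G, T at one position.

    The called base gets 1 - P; the error mass P is split over the other three
    bases in proportion to `weights`, or evenly (P / 3) when no weights are given.
    """
    others = [b for b in BASES if b != called]
    if weights is None:
        weights = {b: 1.0 for b in others}
    total = sum(weights[b] for b in others)
    probs = {called: 1.0 - error_prob}
    for b in others:
        probs[b] = error_prob * weights[b] / total
    return probs


def read_probability_table(seq: str, qual: str, weight_table=None) -> list[dict[str, float]]:
    """Per-position base probabilities for one FASTQ record (sequence + quality line)."""
    table = []
    for i, (base, ch) in enumerate(zip(seq, qual)):
        p_err = 10.0 ** (-(ord(ch) - PHRED_OFFSET) / 10.0)
        weights = weight_table[i][base] if weight_table is not None else None
        table.append(base_probabilities(base, p_err, weights))
    return table


print(base_probabilities("A", 0.03))  # A gets 0.97, the other bases about 0.01 each
```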
  • FIGS. 3 and 4 illustrate a preprocessing method according to an embodiment of the present invention.
  • The decoding method based on soft information of the DNA storage device may further include a data preprocessing step (S300), performed before the first sequence whose base probabilities have been calculated is input into a decoding algorithm.
  • The preprocessing step (S300) may include the following steps.
  • Step S310: Select stitched first sequences of the target length that contain no uncalled bases.
  • Step S320: Compare the hard information of the seed values to match sequences with the same seed value, and merge the LLRs of the portions other than the seed and the RS part.
  • Step S330: When merging the RS parts, multiply the probabilities of A, G, C, and T across the sequences and take the base with the maximum value as the hard information.
  • Step S350: Generate an H matrix corresponding to the $L_m$ seed values.
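Steps S310 to S330 can be sketched as follows, under hypothetical data layouts: each read is a dict with `seq`, `seed`, and `llr` keys, and the RS part is given as one per-position probability table per read. Summing the LLRs of reads that share a seed is an assumed merging rule (it is the standard way to combine independent observations of the same bits); the text only says the LLRs are merged.

```python
from __future__ import annotations

import math
from collections import defaultdict

BASES = "ACGT"


def select_complete_reads(reads: list[dict], target_len: int) -> list[dict]:
    """S310: keep stitched reads of the target length with no uncalled base (N)."""
    return [r for r in reads if len(r["seq"]) == target_len and "N" not in r["seq"]]


def merge_llrs_by_seed(reads: list[dict]) -> dict[int, list[float]]:
    """S320: group reads sharing a seed and combine their payload bit LLRs.

    Summing LLRs assumes the reads are independent observations of the same bits.
    """
    groups: dict[int, list[list[float]]] = defaultdict(list)
    for r in reads:
        groups[r["seed"]].append(r["llr"])
    return {seed: [sum(col) for col in zip(*llrs)] for seed, llrs in groups.items()}


def merge_rs_part(prob_tables: list[list[dict[str, float]]]) -> str:
    """S330: per position, multiply each base's probability across reads; keep the max.

    Log probabilities are summed instead of multiplying raw values to avoid underflow.
    """
    merged = []
    for column in zip(*prob_tables):
        score = {b: sum(math.log(max(p[b], 1e-300)) for p in column) for b in BASES}
        merged.append(max(score, key=score.get))
    return "".join(merged)


print(merge_rs_part([[{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}],
                     [{"A": 0.4, "C": 0.5, "G": 0.05, "T": 0.05}]]))  # A
```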
  • The method may further include calculating, by the computer, the log-likelihood ratio (LLR) of the bits corresponding to each base of the first sequence based on the calculated base probabilities.
  • The LLR of the bits corresponding to each base is calculated because soft information is easier to evaluate in the logarithmic domain. The LLR is therefore calculated bit by bit to obtain the probability of each bit of each base.
  • Step S320 may include calculating an LLR for each of the two bits corresponding to the base at a specific position of the first sequence.
  • The log-likelihood ratio must be calculated for each bit.
  • $y^{1}_{i,j}$ and $y^{2}_{i,j}$ denote the first and second bits for the $i$-th position of the $j$-th first sequence.
  • The LLR of each bit of the base call at a specific position may be calculated as in Equation 4.
  • Because the LLR is computed as a ratio of probabilities, soft information can be evaluated more easily, and using LLRs allows each probability to be processed before error correction.
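Equation 4 is not reproduced here. Assuming the example mapping {00, 01, 10, 11} → {A, C, G, T} and the common convention LLR = ln P(bit = 0) / P(bit = 1), each bit's LLR can be obtained by summing the four base probabilities over the bases that share that bit value. The names and the `eps` clamp are hypothetical.

```python
import math

# Assumed bit mapping, matching the example rule {00, 01, 10, 11} -> {A, C, G, T}.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def bit_llrs(probs, eps=1e-12):
    """Return (LLR of first bit, LLR of second bit) for one position.

    LLR = ln P(bit = 0) / P(bit = 1); positive values favour 0.
    `probs` maps each base to its probability at this position.
    """
    llrs = []
    for k in (0, 1):
        p0 = sum(p for b, p in probs.items() if BASE_TO_BITS[b][k] == 0)
        p1 = sum(p for b, p in probs.items() if BASE_TO_BITS[b][k] == 1)
        llrs.append(math.log(max(p0, eps) / max(p1, eps)))
    return tuple(llrs)


print(bit_llrs({"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}))  # both close to ln 49 (about 3.89)
```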
  • Step S350 may be a step of generating an H matrix.
  • Step S350 may be performed when the sequencing read file is based on encoding with a Luby transform (LT) code or a fountain code.
  • FIG. 4 is an exemplary diagram illustrating a method of generating an H matrix according to an embodiment of the present invention.
  • The H matrix may be a parity-check matrix. When encoding adds parity bits (30) to information bits (10) so that errors occurring during information transfer can be detected, as with a Luby transform code or a fountain code, error checking is efficient.
  • The H matrix may be generated by assigning an LLR of 0 to each punctured information bit and the calculated LLR to each received parity bit.
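A sketch of H matrix construction for an LT code with punctured information bits, assuming each received encoded (parity) bit is the XOR of the information bits listed for it, so each row enforces u_a ⊕ u_b ⊕ ... ⊕ c_r = 0 and H has the form [A | I]. The neighbour-list input and the function names are hypothetical.

```python
from __future__ import annotations


def build_lt_parity_check(num_info: int, neighbours: list[list[int]]) -> list[list[int]]:
    """Build H = [A | I] for an LT code whose information bits are punctured.

    neighbours[r] lists the information-bit indices XORed into encoded bit r,
    so row r enforces  u_{n1} ^ u_{n2} ^ ... ^ c_r = 0.
    """
    num_parity = len(neighbours)
    H = []
    for r, nbrs in enumerate(neighbours):
        row = [0] * (num_info + num_parity)
        for i in nbrs:
            row[i] ^= 1           # A part: which information bits feed encoded bit r
        row[num_info + r] = 1     # I part: the encoded (parity) bit itself
        H.append(row)
    return H


def initial_llrs(num_info: int, parity_llrs: list[float]) -> list[float]:
    """Punctured information bits get LLR 0; received parity bits keep their LLRs."""
    return [0.0] * num_info + list(parity_llrs)


print(build_lt_parity_check(3, [[0], [0, 1], [1, 2]]))
```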
  • The decoding method based on soft information of the DNA storage device may further include a step (S400) of inputting the data that has undergone the preprocessing step (S300) into a decoding algorithm.
  • The decoding algorithm may be configured as follows.
  • Inputs may include $L_m$ merged sequences including seed values, the maximum number of re-decoding attempts $n_{re}$, the LLR values for the merged sequences, and the RS parity.
  • The output may be $K_{LT} \times L$ information bits.
  • Step S410: Perform soft decoding of the $L$ binary Luby transform codes.
  • Step S420: Perform RS decoding on each decoded oligo.
  • Step S430: If RS decoding succeeds for all oligos with no error at the seed positions, recover the punctured information bits using the decoded parity bits (encoded bits) and skip the next step.
  • Step S450: If $i \le n_{re}$ and there is an oligo with an error at the seed position or an oligo whose RS decoding failed, remove that oligo and reconstruct H. Then return to step S410 with the initial LLRs for the bits of the remaining oligos, setting $i \leftarrow i + 1$.
  • In step S440, the number of re-decoding attempts of the Luby transform code is limited. Because soft decoding of the Luby transform code is performed on the H matrix constructed from the seed values, the decoding result is unreliable when the Reed-Solomon (RS) decoder determines that there is an error at a seed position. Likewise, the decoding result is unreliable when there is an oligo whose RS decoding failed.
  • The oligos corresponding to the two cases in step S450 are removed, and the Luby transform code is re-decoded using the initial LLRs.
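For step S410 the text only says "soft decoding". A common choice for parity-check matrices like the LT-code H described above is belief propagation; the sketch below swaps in the simpler min-sum approximation, run on a toy H whose punctured information bits start with LLR 0. It is illustrative only, with hypothetical names, and omits the RS checks and oligo removal of steps S420 to S450.

```python
from __future__ import annotations


def min_sum_decode(H: list[list[int]], llr: list[float], max_iter: int = 50):
    """Flooding min-sum belief propagation on a binary parity-check matrix.

    llr[v] = ln P(v = 0) / P(v = 1); punctured bits enter with 0.0.
    Returns (hard-decision bits, True if every parity check is satisfied).
    """
    checks = [[v for v, h in enumerate(row) if h] for row in H]
    c2v = {(c, v): 0.0 for c, vs in enumerate(checks) for v in vs}

    def posterior() -> list[float]:
        total = list(llr)
        for (_, v), msg in c2v.items():
            total[v] += msg
        return total

    bits = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iter):
        total = posterior()
        # variable-to-check: everything a variable knows except that check's own message
        v2c = {edge: total[edge[1]] - msg for edge, msg in c2v.items()}
        for c, vs in enumerate(checks):
            for v in vs:
                others = [v2c[(c, u)] for u in vs if u != v]
                if not others:
                    continue
                # check-to-variable: product of signs times the smallest magnitude
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                c2v[(c, v)] = sign * min(abs(x) for x in others)
        bits = [0 if x >= 0 else 1 for x in posterior()]
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in checks):
            return bits, True
    return bits, False


H = [[1, 0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1, 0],
     [0, 1, 1, 0, 0, 1]]
llr = [0.0, 0.0, 0.0, -2.0, -2.0, -2.0]  # 3 punctured info bits, 3 received parity bits
bits, ok = min_sum_decode(H, llr)
print(bits, ok)  # [1, 0, 1, 1, 1, 1] True
```

In the toy example, the three punctured information bits are recovered purely from the parity bits' LLRs through the parity checks.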
  • The decoding method based on soft information of a DNA storage device may be implemented as a program (or application) that is combined with a computer (hardware) and stored in a computer-readable recording medium for execution.
  • For the computer to read the program and execute the methods implemented as the program, the program may include code written in a computer language, such as C, C++, JAVA, or machine language, that the computer's processor (CPU) can read through the computer's device interface.
  • Such code may include functional code related to the functions that define what is needed to execute the methods, and control code related to the execution procedure the computer's processor needs to execute those functions in a predetermined order.
  • The code may further include additional information the computer's processor needs to execute the functions, or memory-reference code indicating which location (address) in the computer's internal or external memory should be referenced.
  • The code may further include communication-related code specifying how to communicate using the computer's communication module and what information or media should be transmitted and received during communication.
  • The storage medium is not a medium that stores data briefly, such as a register, cache, or memory, but a medium that stores data semi-permanently and is readable by a device.
  • Examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. That is, the program may be stored on various recording media on various servers accessible to the computer, or on various recording media on the user's computer.
  • The medium may also be distributed over computer systems connected through a network, with computer-readable code stored in a distributed manner.
  • A decoding computing device based on soft information of the DNA storage device according to another embodiment of the present invention acquires a first sequence and a quality score for the first sequence and calculates a base probability for each position of the first sequence based on the quality score. The quality score includes the error rate of each base included in the first sequence, and the base probability is the probability that the base at a specific position of the first sequence is adenine (A), guanine (G), cytosine (C), or thymine (T), respectively.
  • The content described above with reference to FIGS. 1 to 4 applies equally to the decoding computing device based on soft information of the DNA storage device.
  • A software module implementing the method may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other type of computer-readable recording medium well known in the art to which the present invention pertains.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a decoding method, program, and apparatus based on soft information of a DNA storage device. According to one embodiment, the present invention provides a decoding method based on soft information of a DNA storage device, the method comprising steps in which a computer: acquires a first sequence and a quality score for the first sequence; and calculates a base probability for each position in the first sequence on the basis of the quality score, wherein the quality score includes the error rate of each base included in the first sequence, and the base probability may be the probability that the base at a specific position of the first sequence is adenine (A), guanine (G), cytosine (C), or thymine (T).
PCT/KR2020/010571 2019-08-21 2020-08-10 Decoding method, program, and apparatus based on soft information of a DNA storage device WO2021033981A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2019-0102191 2019-08-21
KR20190102191 2019-08-21
KR1020200080464A KR102339723B1 (ko) 2019-08-21 2020-06-30 Decoding method, program, and apparatus based on soft information of a DNA storage device
KR10-2020-0080464 2020-06-30

Publications (1)

Publication Number Publication Date
WO2021033981A1 (fr) 2021-02-25

Family

ID=74660513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/010571 WO2021033981A1 (fr) 2019-08-21 2020-08-10 Decoding method, program, and apparatus based on soft information of a DNA storage device

Country Status (1)

Country Link
WO (1) WO2021033981A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
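KR100609656B1 (ko) * 2005-02-04 2006-08-08 재단법인서울대학교산학협력재단 DNA sequence assembly method and recording medium therefor
KR20150059101A (ko) * 2013-11-18 2015-05-29 한국전자통신연구원 Method for calculating the location of a chromosomal translocation
KR20170096387A (ko) * 2016-02-16 2017-08-24 서울대학교산학협력단 Method for calculating the edit distance between homomorphically encrypted base sequences
KR20180130755A (ko) * 2017-05-30 2018-12-10 단국대학교 산학협력단 Method for updating a contig profile and method for forming contigs for DNA shotgun sequencing or RNA transcriptome assembly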

Non-Patent Citations (1)

Title
ANONYMOUS: "Can you store information in DNA?", ETRI WEBZIN, 1 December 2017 (2017-12-01), pages 1 - 3, XP055782632, Retrieved from the Internet <URL:https://www.etri.re.kr/webzine/20171201/sub04.html> [retrieved on 20210305] *

Cited By (7)

Publication number Priority date Publication date Assignee Title
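CN113314187A (zh) * 2021-05-27 2021-08-27 广州大学 Data storage method, decoding method, system, apparatus and storage medium
CN113314187B (zh) * 2021-05-27 2022-05-10 广州大学 Data storage method, decoding method, system, apparatus and storage medium
CN113539370A (zh) * 2021-06-29 2021-10-22 中国科学院深圳先进技术研究院 Encoding method, decoding method, apparatus, terminal device and readable storage medium
CN113539370B (zh) * 2021-06-29 2024-02-20 中国科学院深圳先进技术研究院 Encoding method, decoding method, apparatus, terminal device and readable storage medium
WO2023096672A1 (fr) * 2021-11-23 2023-06-01 Pleno, Inc. Multiplexed detection of target biomolecules
CN114218937A (zh) * 2021-11-24 2022-03-22 中国科学院深圳先进技术研究院 Data error correction method, apparatus and electronic device
CN114356222A (zh) * 2021-12-13 2022-04-15 深圳先进技术研究院 Data storage method, apparatus, terminal device and computer-readable storage medium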

Similar Documents

Publication Publication Date Title
WO2021033981A1 (fr) Soft-information-based decoding method, program and apparatus for DNA storage device
US20230326466A1 (en) Text processing method and apparatus, electronic device, and medium
US7199729B2 (en) Character code conversion methods and systems
WO2013065944A1 (fr) Sequence recombination method and apparatus for next-generation sequencing
US8156414B2 (en) String reconstruction using multiple strings
CN111144402A (zh) Method, apparatus, device and storage medium for calculating OCR recognition accuracy
KR102339723B1 (ko) Soft-information-based decoding method, program and apparatus for DNA storage device
WO2022220354A1 (fr) Fish-school ecosystem monitoring system device for detecting an anomaly in a fish-school ecosystem, and operating method therefor
Belbasi et al. The minimizer Jaccard estimator is biased and inconsistent
WO2022166808A1 (fr) Text restoration method and apparatus, and electronic device
WO2013015548A2 (fr) LDPC encoding/decoding method and device using same
WO2019045185A1 (fr) Mobile device, and method for correcting a character string entered via a virtual keyboard
WO2015056818A1 (fr) Counting Bloom filter
WO2020231020A1 (fr) Method and apparatus for fast decoding of linear codes based on weighted decision
WO2020179966A1 (fr) Method and apparatus for fast decoding of linear codes based on soft decision
US9448975B2 (en) Character data processing method, information processing method, and information processing apparatus
WO2022030805A1 (fr) Speech recognition system and method for automatically calibrating a data label
WO2022010064A1 (fr) Electronic device and associated control method
WO2021015403A1 (fr) Electronic apparatus and associated control method
WO2021230470A1 (fr) Electronic device and control method therefor
CN114417834A (zh) Text processing method, apparatus, electronic device and readable storage medium
US7496842B2 (en) Apparatus and method for automatic spelling correction
WO2014181937A1 (fr) System and method for genome sequence alignment taking read quality into account
WO2023018157A1 (fr) Method, program and device for encoding and decoding DNA data using a low-density parity-check code
CN109684437B (zh) Content alignment method, apparatus, storage medium and device for file comparison

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20855299; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20855299; country of ref document: EP; kind code of ref document: A1