WO2020179962A1 - Dna coding method and biomedical engineering application of same coding method - Google Patents
- Publication number
- WO2020179962A1 (application PCT/KR2019/003570)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- dna fragment
- sum
- codes
- code
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/10—Nucleic acid folding
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B30/20—Sequence assembly
- G16B50/50—Compression of genetic data
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Definitions
- The present invention also provides a computer program, stored in a computer-readable medium, for providing information on the position of a nucleotide sequence variation in a specific DNA fragment. The program causes a computer to perform the following steps: (a) assigning 00, 01, 10, and 11 to C, T, A, and G of the nucleotide sequence of the DNA fragment, respectively; and (b) locating the variant sequence by comparing the individual code values obtained in step (a).
- The four DNA bases C, T, A, and G, taken in ascending order of molecular weight, are assigned the codes 00, 01, 10, and 11, respectively. The codes are chosen so that when G pairs with C and A pairs with T, the ratio of the two pairs' summed molecular weights matches the ratio of their code sums.
- The aptamers specific to each compound are encoded with this code and used as big data. On that basis, the present invention builds a system, used together with SELEX, that identifies the specific patterns binding to the reactive groups of each compound and predicts such binding.
- The present invention standardizes a DNA sequence into a code, converts the value of each base to a decimal number, and derives their sum. This makes it possible to check each sequence for mutations and to determine quickly whether a disease-specific SNP is present.
- The present invention provides an easy method for identifying specific patterns in a nucleotide sequence by standardizing DNA into a code.
- DNA sequence patterns that bind to a specific target and chemical structure are identified and used as big data to predict aptamers that bind to the corresponding chemical structural unit. This provides the information needed to build a SELEX (Systematic evolution of ligands by exponential enrichment) simulation program.
- The present invention provides a method optimized for determining the molecular weight ratio between sequences and the proportion of each base by standardizing DNA into a code that reflects base molecular weight.
- The present invention standardizes DNA into a code that reflects base molecular weight and provides an optimized way to compare code sums and code sequences. This makes it easy to identify variations within a nucleotide sequence and, in turn, to predict disease from disease-specific mutations such as SNPs.
- The DNA code standardization method of the present invention makes it easy to identify specific patterns and variations in nucleotide sequences. It also facilitates disease prediction using disease-specific sequence variations such as SNPs.
- FIG. 2 shows how the designated binary codes are designed for the G-C and A-T base pairs: the code sums of the two pairs are in a 1:1 ratio, matching the approximately 1:1 ratio of the pairs' actual masses;
- FIG. 3 shows the code-converted values of six sequences and compares the code sum of each sequence with its molecular weight;
- FIG. 4 shows how the pattern of an exemplary sequence is checked using the DNA sequence code. Whether complementary binding is possible is determined from the code sums, and the stem-loop pattern is determined from the number of bonds and the number of linked bases; and
- FIG. 5 demonstrates the effectiveness of the code standardization of the present invention by applying the code to an SNP sequence identified in breast cancer patients.
- The SNP sequence, in which the A at position 14 from exon 2 is mutated to G, is converted into code and arranged as a series of binary codes.
- Example 1: Code standardization according to the molecular weight of each base
- Each of the four bases that determine the DNA sequence is expressed as a two-digit binary code, a form native to computers. The molecular weight of each base was analyzed and is shown in FIG.
- The deoxyribonucleotides carrying G, A, T, and C, each linked to one phosphate group, are denoted dGMP, dAMP, dTMP, and dCMP, respectively.
- The G-C pair has one more nitrogen (N) and one more hydrogen (H) available for hydrogen bonding than the other base pair.
- A and T lack this additional O or N capable of hydrogen bonding. They therefore form only two hydrogen bonds, a weaker bond than the G-C pair, which forms three.
- The code of each base was assigned to reflect the molecular structure of DNA and the mass ratio of its base pairs.
- C, T, A, and G were assigned the binary codes 00, 01, 10, and 11 in ascending order of molecular weight.
- The code sum is the sum of the code values after each base's code is converted to a decimal number.
- The code sums of the G-C pair (3 + 0) and of the A-T pair (2 + 1) are both 3.
- Example 2: Optimizing the code to reflect the molecular weight ratio of DNA fragments and aptamers
- The exemplary sequences are given only to confirm how well the code reflects molecular weight. The scope is not to be interpreted as limited to the sequences of SEQ ID NOs: 1 to 6.
- The sequences of SEQ ID NOs: 1 to 6 are as follows.
- The six exemplary sequences are all 32-mers. They have the same length but differ in base composition and order. The code-converted value of each base is shown in FIG. 3.
- The code sum was calculated by converting each base's code to a decimal number and summing the values.
- For comparison with the code sum, the molecular weight of each sequence was also calculated from its base composition.
- The code was assigned to reflect the molecular weight ratio, so the resulting code sums are suited to comparing the relative molecular weights of the sequences.
- The sequences of DNA fragments and aptamers were converted into binary base codes. The codes were then compared to identify specific patterns and secondary structures in each sequence. To illustrate this, a 9-base DNA sequence was used as the exemplary sequence (FIG. 4).
- The exemplary sequence of SEQ ID NO: 7 is as follows.
- Each base is coded so that its code sum with the complementary base it can hydrogen-bond with is 3. Runs of such complementary bases can form a stem structure in a DNA aptamer sequence (FIG. 4; Stem).
- In most DNA stem-loop structures, two or more bases at both ends form a stem. A loop can form when three or more bases in the center face partners whose code sums with them are greater or less than 3, so that they cannot form complementary bonds.
- The exemplary sequence can form two stem-loop structures, which can be confirmed simply from the arrangement of the nucleotide codes.
- The first code is 11. Skipping the adjacent 00, the base that can complementarily bond with it is the eighth code, 00 (FIG. 4; 1, red arrow).
- The second code is 00, and the base that can complementarily bond with it is the sixth code, 11 (FIG. 4; 3, green arrow).
- The seventh and ninth codes, both 11, are complementary to the eighth code, 00 (FIG. 4; 2, blue arrow).
- The stem of a stem-loop structure forms only when two or more consecutive bases are paired. The complementary bonds marked by the red arrow or by the blue arrow can therefore each form a stem (FIG. 4; dotted circle). The single complementary bond marked by the green arrow cannot form a stem on its own.
- A stem-loop structure can form because four bases capable of forming a loop lie in the middle.
- In the SNP sequence of the breast cancer patient, the A at position 14 from exon 2 is mutated to G. This position lies within the sequence located in the first intron (intron 1) of the gene. The sequence was converted into code and arranged as binary codes, the code sum was calculated, and the code sums of the normal and mutant sequences were compared (FIG. 5).
- The code sum of the normal sequence was 39.
- The code sum of the mutant sequence was 40.
- The mutant sequence therefore had a code sum 1 greater than the normal sequence, as expected for an A (10, i.e. 2) to G (11, i.e. 3) substitution.
- Depending on which base replaces which, a single substitution changes the code sum by 1 to 3.
- The position of the mutation can be located by comparing the individual code values of the two sequences.
- In this way, differences between sequences can be checked quickly and SNPs can be searched for easily.
- By applying the code sum to sequences, the method can be used for disease diagnosis.
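The range of 1 to 3 follows directly from the code values. A single substitution changes the code sum by the new base's value minus the old base's value. The short sketch below tabulates every possible single-base substitution. It is an illustration, not part of the patent, and the variable names are hypothetical. A to G comes out as +1, consistent with the normal (39) and mutant (40) sums reported above.

```python
from itertools import permutations

# Decimal values of the 2-bit codes: C=00, T=01, A=10, G=11
BASE_VALUE = {"C": 0, "T": 1, "A": 2, "G": 3}

# Change in the code sum for every single-base substitution old -> new
deltas = {
    f"{old}->{new}": BASE_VALUE[new] - BASE_VALUE[old]
    for old, new in permutations("CTAG", 2)
}

for change, delta in deltas.items():
    # Magnitude is always 1, 2, or 3; the sign gives the direction
    print(f"{change}: {delta:+d}")
```

The magnitude alone cannot tell, for example, T to A (+1) from A to G (+1). The position-wise code comparison is what identifies the actual change.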
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Biochemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for standardizing DNA into a code. The method comprises: (a) assigning the codes 00, 01, 10, and 11 to the four bases C, T, A, and G, respectively; and (b) coding the G-C and A-T base pairs, read in the 5' to 3' direction, as 1100 for G and C, 0011 for C and G, 1001 for A and T, and 0110 for T and A. The method makes it easy to detect specific patterns in nucleotide sequences such as DNA fragments or aptamers, including secondary structures and sequence mutations. It also makes it easy to predict diseases using disease-specific sequence variations such as SNPs.
Description
The present invention relates to a method for standardizing DNA into a code and to optimized biomedical engineering applications of the method.
DNA (deoxyribonucleic acid), which exists as the genetic material of living organisms, consists of genic regions that are expressed as proteins and non-genic regions. Chemically, a phosphate group is attached to the 5' carbon of the pentose sugar deoxyribose, and a base is attached to its 1' carbon, forming a unit called a nucleotide. The DNA sequence is determined by the type of base attached to each nucleotide.
Bases fall into two classes: purines, which have two rings, and pyrimidines, which have one. The purines are guanine (G) and adenine (A), and the pyrimidines include cytosine (C) and thymine (T). RNA differs from DNA in two ways: an -OH group is attached to the 2' carbon of the pentose, and uracil (U) replaces thymine. The purine G forms a complementary hydrogen-bonded pair with the pyrimidine C, and A pairs with T. The G-C pair is held together by three hydrogen bonds, so it is stronger than the A-T pair, which is held by two.
Within a DNA strand, the phosphate group on the 5' carbon of one nucleotide is joined to the 3'-OH group of the next by a phosphodiester bond, forming a single strand. Two complementary single strands, held together by hydrogen bonds between complementary bases, form a double helix. This double helix structure was introduced by Watson and Crick in 1953. [Watson, J. D., & Crick, F. H. (1953). Molecular structure of nucleic acids. Nature, 171(4356), 737-738.]
In genic regions, the nucleotide sequence plays an important role in protein synthesis: each group of three nucleotides is translated into one amino acid of the protein. DNA is transcribed into mRNA, which is then translated, in the order of its nucleotide sequence, into amino acids of 20 kinds. As the translated amino acids are linked via tRNA, proteins are formed. These proteins serve as cellular components and also act as enzymes that mediate various reactions in vivo.
Human DNA has about 3 billion base pairs (bp), which amounts to gigabytes of data per person. Scaled to the whole population, even petabytes are insufficient. For this reason, rather than analyzing entire human DNA sequences, disease prediction is performed on short DNA fragments by analyzing disease-specific sites such as SNPs (single nucleotide polymorphisms). Even so, the SNP sites of all genes have not yet been analyzed, and a variety of analysis programs still need to be developed.
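The arithmetic behind these figures can be checked roughly. The calculation assumes the 2-bit-per-base code proposed in this document, a single uncompressed haploid genome, and a world population of about 8 billion (an assumed round figure):

```latex
\begin{aligned}
3\times10^{9}\ \text{bases} \times 2\ \tfrac{\text{bits}}{\text{base}}
  &= 6\times10^{9}\ \text{bits} = 7.5\times10^{8}\ \text{bytes} \approx 0.75\ \text{GB per person},\\
0.75\ \text{GB} \times 8\times10^{9}\ \text{people}
  &= 6\times10^{9}\ \text{GB} = 6\times10^{18}\ \text{bytes} = 6\ \text{EB} = 6000\ \text{PB}.
\end{aligned}
```

Storing both copies of a diploid genome doubles these figures. Either way, the total lies well beyond the petabyte scale.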
[Prior patent literature]
Republic of Korea Patent Publication 10-2016-0001455
The present invention was conceived to solve the above problems and to meet the above need. An object of the present invention is to provide a method optimized for identifying specific patterns in a nucleotide sequence. To this end, DNA bases are standardized into a binary code (2 bits per base) that takes the molecular weight of each base into account.
Another object of the present invention is to provide an easy method for determining complementary binding and identifying patterns using the code sum of a nucleotide sequence. A further aim is an easy method for predicting the pattern and function of DNA fragments or DNA aptamers.
Yet another object of the present invention is to provide an easy method for determining the molecular weight ratio between sequences and the proportion of each base from the code of the nucleotide sequence alone.
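A minimal sketch of reading base composition, code sum, and an approximate molecular weight directly from the code is shown below. It rests on three assumptions, none of which come from this text:

- The dNMP molecular weights are standard free-acid values used for illustration.
- The strand mass is approximated as the sum of its dNMPs minus one water per phosphodiester bond.
- The example sequences are hypothetical.

```python
from collections import Counter

# Decimal code values in ascending order of base molecular weight
BASE_VALUE = {"C": 0, "T": 1, "A": 2, "G": 3}
VALUE_BASE = {v: b for b, v in BASE_VALUE.items()}

# Assumed free-acid dNMP molecular weights in g/mol (illustrative reference values)
DNMP_MW = {"C": 307.20, "T": 322.21, "A": 331.22, "G": 347.22}
WATER_MW = 18.02  # released per phosphodiester bond formed


def encode(seq: str) -> list[int]:
    """Decimal code value (0-3) of every base in a 5'->3' sequence."""
    return [BASE_VALUE[b] for b in seq.upper()]


def composition(codes: list[int]) -> dict[str, float]:
    """Fraction of each base, recovered from the code values alone."""
    counts = Counter(codes)
    return {VALUE_BASE[v]: counts[v] / len(codes) for v in range(4)}


def approx_mw(codes: list[int]) -> float:
    """Rough mass of a linear, 5'-phosphorylated strand: dNMPs minus condensation water."""
    return sum(DNMP_MW[VALUE_BASE[v]] for v in codes) - WATER_MW * (len(codes) - 1)


# The code order matches the molecular-weight order of the bases
assert sorted(BASE_VALUE, key=DNMP_MW.get) == ["C", "T", "A", "G"]

for seq in ["GCGCGCGC", "TATATATA", "GGGGGGGG"]:  # hypothetical 8-mers
    codes = encode(seq)
    print(seq, sum(codes), round(approx_mw(codes), 2), composition(codes))
```

GCGCGCGC and TATATATA share a code sum of 12 and have nearly equal masses, mirroring the 1:1 ratio of the G-C and A-T pairs. The relationship is only approximate, though: the code values step by 1, while the assumed masses step by about 15.0, 9.0, and 16.0 g/mol. Equal code sums therefore imply only roughly equal masses.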
Yet another object of the present invention is to provide an easy method for identifying variations in nucleotide sequences. A further aim is an easy method for predicting diseases using disease-specific sequence variations such as SNPs.
To achieve the above objects, the present invention provides a method for standardizing DNA into a code, comprising the following steps. (a) The four bases C, T, A, and G are assigned the codes 00, 01, 10, and 11, respectively. (b) When the bases form G-C and A-T base pairs, the pairs are read in the 5' to 3' direction and designated as follows: 1100 for G and C, 0011 for C and G, 1001 for A and T, and 0110 for T and A.
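Steps (a) and (b) can be sketched as below, assuming that a pair code is the 2-bit code of the base on the 5'->3' strand followed by the code of its partner. The two codes of a Watson-Crick pair always add up to 3. The partner's code is therefore 3 minus the base's code, which amounts to flipping both bits. The function names are hypothetical.

```python
BASE_CODE = {"C": "00", "T": "01", "A": "10", "G": "11"}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode(seq: str) -> str:
    """Step (a): concatenate the 2-bit code of every base (2 bits per base)."""
    return "".join(BASE_CODE[b] for b in seq.upper())


def decode(bits: str) -> str:
    """Inverse of encode(): read the bit string back two bits at a time."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(CODE_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def partner_code(code: str) -> str:
    """Code of the complementary base: the two values add up to 3 (e.g. 11 <-> 00)."""
    return format(3 - int(code, 2), "02b")


def pair_codes(seq: str) -> list[str]:
    """Step (b): 4-bit code of each base pair, strand base first, partner second."""
    return [BASE_CODE[b] + partner_code(BASE_CODE[b]) for b in seq.upper()]


print(encode("GCAT"))      # 11001001
print(pair_codes("GCAT"))  # ['1100', '0011', '1001', '0110']
```

Because each base maps to a distinct 2-bit code, decode(encode(seq)) recovers the original sequence.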
The present invention also provides a method of providing information optimized for identifying a specific pattern or secondary structure of a specific DNA fragment or aptamer using the standardized DNA code. The method comprises the following steps: (a) assigning 00, 01, 10, and 11 to C, T, A, and G of the nucleotide sequence of the DNA fragment, respectively; and (b) comparing the arrangement of the resulting codes with the arrangement of their code sums.
In one embodiment of the present invention, the comparing step begins by converting the binary codes 00, 01, 10, and 11 from step (a) to decimal. A stem structure is judged able to form when two or more pairs of codes summing to 3 are arranged at both ends. A loop structure is judged to form when three or more bases are connected in the center whose code sums with the facing bases are greater or less than 3, so that they cannot bind complementarily. This information-providing method, for identifying specific patterns or secondary structures of specific DNA fragments or aptamers using the standardized DNA code, is preferred but not limiting.
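The stem and loop rules of this embodiment can be sketched as a naive scan. The sketch assumes a single hairpin and pairing only between facing bases. It uses the thresholds stated above: at least two paired positions for a stem and at least three unpaired bases for a loop. The function name and its return format are hypothetical. There is no energy model, so the output is only a list of candidates. Note that among the values 0 to 3, only 0 + 3 and 1 + 2 reach 3, so "code sum equals 3" is exactly the Watson-Crick G-C/A-T rule.

```python
BASE_VALUE = {"C": 0, "T": 1, "A": 2, "G": 3}


def find_hairpins(seq: str, min_stem: int = 2, min_loop: int = 3):
    """List (start, end, stem_length, loop_length) candidates, 0-based and inclusive.

    Facing bases pair when their code values sum to 3. The stem is extended
    inward only while a loop of at least `min_loop` bases still remains.
    """
    v = [BASE_VALUE[b] for b in seq.upper()]
    n = len(v)
    hits = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Report only outermost stems: skip (i, j) if the pair can extend outward
            if i > 0 and j < n - 1 and v[i - 1] + v[j + 1] == 3:
                continue
            stem = 0
            while j - i - 2 * stem - 1 >= min_loop and v[i + stem] + v[j - stem] == 3:
                stem += 1
            loop = j - i - 2 * stem + 1  # unpaired bases between the stem arms
            if stem >= min_stem:
                hits.append((i, j, stem, loop))
    return hits


print(find_hairpins("GCTTTGC"))  # [(0, 6, 2, 3)]
print(find_hairpins("GCTTGC"))   # [] (a two-base loop is too short)
```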
The present invention also provides a method of providing information on whether a nucleotide sequence variation is present in a specific DNA fragment, using the standardized DNA code. The method comprises the following steps: (a) assigning 00, 01, 10, and 11 to C, T, A, and G of the nucleotide sequence of the DNA fragment, respectively; and (b) comparing the sums of the resulting codes.
In one embodiment of the present invention, the comparing step converts the binary codes 00, 01, 10, and 11 from step (a) to decimal and calculates their sum. A variation is determined to be present when this sum differs from that of the normal sequence by 1 to 3. This embodiment is preferred but not limiting.
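A minimal sketch of this embodiment follows. It assumes the normal (reference) and sample sequences are already aligned and of equal length, and the function names are hypothetical. It also exposes a limit of a sum-only test. Two substitutions can cancel out, so an unchanged sum does not prove that the sequences are identical.

```python
BASE_VALUE = {"C": 0, "T": 1, "A": 2, "G": 3}


def code_sum(seq: str) -> int:
    """Sum of the decimal code values of all bases."""
    return sum(BASE_VALUE[b] for b in seq.upper())


def variant_suspected(normal: str, sample: str) -> bool:
    """Flag a variant when the code sums differ by 1 to 3 (the single-substitution rule)."""
    if len(normal) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return 1 <= abs(code_sum(sample) - code_sum(normal)) <= 3


print(variant_suspected("CTAGGATC", "CTGGGATC"))  # True: A->G raises the sum by 1
print(variant_suspected("CTAGGATC", "CTAGGATC"))  # False
print(variant_suspected("AT", "TA"))              # False, although both bases differ
```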
In another embodiment of the present invention, the method can preferably locate the variant position by comparing, one by one, the values of the codes obtained by naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively. However, the invention is not limited thereto.
The present invention further provides a computer program, stored in a computer-readable medium, for providing information optimized for identifying a specific pattern or secondary structure of a specific DNA fragment or aptamer. The program causes a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) after converting the binary codes 00, 01, 10, and 11 from step (a) into decimal numbers, determining that a stem structure can form when two or more pairs of codes whose sum is 3 are arranged at both ends, and determining that a loop structure forms when three or more bases are connected in the center whose code sum with the facing base is greater or less than 3, so that they cannot bind complementarily.
The present invention also provides a computer program, stored in a computer-readable medium, for providing information on the presence or absence of a nucleotide sequence variation in a specific DNA fragment. The program causes a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) converting the binary codes from step (a) into decimal numbers, summing them, comparing the sum with that of the normal sequence, and determining that a variation exists when the two sums differ by 1 to 3.
The present invention also provides a computer program, stored in a computer-readable medium, for providing information on the position of a variant sequence in a specific DNA fragment. The program causes a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) locating the variant sequence by comparing, one by one, the values of the codes obtained in step (a).
The present invention is described below.
The present invention provides a method of naming the four DNA bases with the codes 00, 01, 10, and 11. The bases are taken in order of increasing molecular weight, C, T, A, and G. The codes are chosen so that, when the bases form G-C and A-T base pairs, the ratio of the pairs' summed molecular weights matches the ratio of their code sums.
The present invention also builds a predictive system as follows. Aptamers identified by SELEX as specific to each compound are standardized as codes. The specific patterns that bind to the reactive groups present in each compound are then identified and used as big data.
The present invention also provides a method that standardizes a DNA sequence into codes, converts each code to a decimal value, and sums the values. This makes it possible to check each sequence for variations and to determine quickly whether a disease-specific SNP is present.
By standardizing DNA into codes, the present invention provides a straightforward method for identifying specific patterns within a nucleotide sequence.
The present invention identifies DNA sequence patterns that bind to specific targets and chemical structures and uses them as big data. This allows prediction of aptamers that bind to the corresponding chemical structural units and provides the information needed to build a SELEX (systematic evolution of ligands by exponential enrichment) simulation program.
The present invention also standardizes DNA into codes that reflect base molecular weights. The resulting method is optimized for determining, from the codes alone, the molecular weight ratio between sequences and the proportion of each base.
In addition, by standardizing DNA into codes that reflect base molecular weights, the present invention provides a straightforward method for identifying variations within a nucleotide sequence. It also provides a method optimized for comparing code sums and code order. This makes it possible to identify disease-specific variations such as SNPs and facilitates disease prediction.
As described above, the DNA code standardization method of the present invention provides a straightforward way to identify variations in nucleotide sequences. It facilitates disease prediction using disease-specific sequence variations such as SNPs and simplifies the identification of specific patterns present in a nucleotide sequence.
FIG. 1 shows the codes assigned to reflect the molecular structure and pairing mass ratio of DNA: C, T, A, and G, in order of increasing molecular weight, are assigned the 2-bit binary values 00, 01, 10, and 11.
FIG. 2 shows that the binary codes are designed so that, when G pairs with C and A pairs with T, the ratio of the pairs' code sums is 1:1, matching their actual mass ratio.
FIG. 3 shows the code conversion values of six sequences, comparing the code sum of each sequence with its molecular weight.
FIG. 4 shows the pattern of an example sequence identified using DNA sequence codes. Whether complementary binding is possible is determined from the code sums, and stem-loop formation and patterns are determined from the number of pairs and the number of linked bases.
FIG. 5 demonstrates the efficiency of the code standardization of the present invention on an SNP sequence found in breast cancer patients, in which the A at the 14th base from exon 2 is mutated to G. The SNP sequence is converted into codes and arranged as binary numbers, and the code sums of the normal and variant sequences are compared.
The present invention is described in detail below through non-limiting examples. These examples are intended only to illustrate the present invention, and the scope of the present invention should not be construed as limited by them.
Example 1: Code standardization according to the molecular weight of each base
Each of the four bases that make up a DNA sequence was to be standardized as a code written as a two-digit binary number, as used in computing. To this end, the molecular weight of each base was analyzed; the results are shown in FIG. 1. The deoxyribonucleotides consisting of base G, A, T, or C linked to one phosphate group are denoted dGMP, dAMP, dTMP, and dCMP, respectively.
The molecular weights decrease in the order G, A, T, C. G pairs with C by hydrogen bonding, and A pairs with its complement T. Summing the molecular weights of each pair gives 654.4 (= 347.2 + 307.2) for G and C and 653.4 (= 331.2 + 322.2) for A and T, so the two base pairs have approximately equal molecular masses, a ratio of about 1:1. The A-T sum is 1 less than the G-C sum because of a difference in composition. The G≡C pair contains one more nitrogen (N), whereas the A=T pair contains one more carbon (C) and one more hydrogen (H). The difference in mass between N and C + H (14 vs. 12 + 1) accounts for the difference of 1 between the pair sums. Because A and T lack the O or N needed for a third hydrogen bond, they form only two hydrogen bonds. This is a weaker interaction than the three hydrogen bonds of the G≡C pair.
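The pair-mass arithmetic above can be checked directly. The minimal Python sketch below uses the dNMP masses stated in this example. It also compares the standard nucleobase formulas (guanine C5H5N5O, cytosine C4H5N3O, adenine C5H5N5, thymine C5H6N2O2), since the sugar-phosphate part is the same in all four nucleotides. The variable names are illustrative only, and nominal integer masses are assumed for N, C, and H.

```python
from collections import Counter

# dNMP masses (g/mol) as stated in Example 1
MASS = {"dGMP": 347.2, "dAMP": 331.2, "dTMP": 322.2, "dCMP": 307.2}
gc = MASS["dGMP"] + MASS["dCMP"]  # G≡C pair
at = MASS["dAMP"] + MASS["dTMP"]  # A=T pair
print(f"G+C = {gc:.1f}, A+T = {at:.1f}, ratio = {gc / at:.4f}")

# Nucleobase formulas; the deoxyribose-phosphate part is identical in all four dNMPs
BASE = {
    "G": Counter(C=5, H=5, N=5, O=1),
    "C": Counter(C=4, H=5, N=3, O=1),
    "A": Counter(C=5, H=5, N=5),
    "T": Counter(C=5, H=6, N=2, O=2),
}
gc_atoms = BASE["G"] + BASE["C"]
at_atoms = BASE["A"] + BASE["T"]
extra_gc = dict(gc_atoms - at_atoms)  # atoms only the G-C pair has in excess
extra_at = dict(at_atoms - gc_atoms)  # atoms only the A-T pair has in excess
print("extra in G+C:", extra_gc, "extra in A+T:", extra_at)

# Nominal masses: one N (14) versus one C plus one H (12 + 1)
N_MASS, C_MASS, H_MASS = 14, 12, 1
print("N - (C + H) =", N_MASS - (C_MASS + H_MASS))
```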
Accordingly, the code of each base was assigned to reflect the molecular structure and pairing mass ratio of DNA. In order of increasing molecular weight, C, T, A, and G were assigned the binary values 00, 01, 10, and 11 (FIG. 1).
The codes were designed so that, when G pairs with C and A pairs with T, the ratio of the two pairs' code sums is 1:1, the same as their actual mass ratio (FIG. 2).
The code sum is obtained by converting the code of each base to a decimal number and adding the values. The G-C and A-T pairs both have a code sum of 3.
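The code table and the pairing rule can be written out as a short lookup. The minimal Python sketch below builds the 4-bit pair codes read 5' to 3' (1100 for G and C, 0011 for C and G, 1001 for A and T, 0110 for T and A, as in claim 1). It also checks that every complementary pair sums to 3. The helper name `pair_code` is hypothetical. The final check is an observation derived from the table, not a statement in the source: because complementary codes sum to binary 11, the complement of a code equals its bitwise inversion.

```python
CODE = {"C": 0b00, "T": 0b01, "A": 0b10, "G": 0b11}
PAIRS = [("G", "C"), ("C", "G"), ("A", "T"), ("T", "A")]

def pair_code(a, b):
    """4-bit string for a base pair read 5'->3' (e.g. G,C -> '1100')."""
    return f"{CODE[a]:02b}{CODE[b]:02b}"

for a, b in PAIRS:
    print(a, b, pair_code(a, b), "code sum =", CODE[a] + CODE[b])

# Complementary codes sum to 3 (binary 11), so for 2-bit values the
# complement 3 - x is the same as flipping both bits: x ^ 0b11.
complement_is_bit_flip = all(3 - x == x ^ 0b11 for x in CODE.values())
print("complement == bit flip:", complement_is_bit_flip)
```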
Example 2: Optimizing the codes of DNA fragments and aptamers to reflect molecular weight ratios
Because the codes were assigned in order of increasing base molecular weight, the total code sum of a DNA fragment reflects the relative molecular weight of the sequence (FIG. 3). To check how well the codes reflect molecular weight, the code sums and molecular weights of six example sequences were compared.
These example sequences are given only to check how the codes reflect molecular weight. The scope is not to be construed as limited to the sequences of SEQ ID NOs: 1 to 6.
SEQ ID NOs: 1 to 6 are as follows.
5' AGAGCTCGCGCCGGAGTTCTCAATGCAAGAGC 3' (SEQ ID NO: 1)
5' GCGGCGGTGGCCTGAAGTCTGGCGGTGGCCCC 3' (SEQ ID NO: 2)
5' GCGGCGGTGGCCAGAAGTCTCGCGGTGGCGGC 3' (SEQ ID NO: 3)
5' GTGGAGGCGGTGGCCAGTCTCGCGGTGGCGGC 3' (SEQ ID NO: 4)
5' GTGGCGGTGGCCAGCATAGTGGCGGTGGCCAG 3' (SEQ ID NO: 5)
5' GTGGAGGCGGTGGCCGTGGAGGCGGAGGCCGC 3' (SEQ ID NO: 6)
The six example sequences are 32-mers. They all have the same length but differ in base composition and order. The code conversion value of each base is shown in FIG. 3. The code sum is obtained by converting the code of each base into a decimal number and adding the values, so it reflects each sequence's base composition and hence its molecular weight.
When compared with the molecular weight (Mw) of each sequence, sequences with a lower molecular weight had smaller code sums, and sequences with a higher molecular weight had larger code sums (FIG. 3).
Thus, because the codes were assigned to reflect molecular weight, the resulting code sums can be used directly to compare the relative molecular weights of sequences.
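The comparison with SEQ ID NOs: 1 to 6 can be reproduced in outline with the minimal Python sketch below. It computes each code sum and a rough mass proxy, then checks that both rank the six sequences the same way. The proxy is an assumption: it simply adds the dNMP masses from Example 1. For equal-length sequences, the water lost in forming phosphodiester bonds is the same constant, so the proxy does not affect the ranking. The sketch does not reproduce the exact Mw values of FIG. 3, and the function names are illustrative only.

```python
CODE = {"C": 0, "T": 1, "A": 2, "G": 3}
# dNMP masses from Example 1, used only as a ranking proxy (see lead-in)
MASS = {"C": 307.2, "T": 322.2, "A": 331.2, "G": 347.2}

SEQS = {
    1: "AGAGCTCGCGCCGGAGTTCTCAATGCAAGAGC",
    2: "GCGGCGGTGGCCTGAAGTCTGGCGGTGGCCCC",
    3: "GCGGCGGTGGCCAGAAGTCTCGCGGTGGCGGC",
    4: "GTGGAGGCGGTGGCCAGTCTCGCGGTGGCGGC",
    5: "GTGGCGGTGGCCAGCATAGTGGCGGTGGCCAG",
    6: "GTGGAGGCGGTGGCCGTGGAGGCGGAGGCCGC",
}

def code_sum(seq):
    """Sum of the decimal codes of all bases."""
    return sum(CODE[b] for b in seq)

def mass_proxy(seq):
    """Sum of dNMP masses (constant bond-formation correction omitted)."""
    return sum(MASS[b] for b in seq)

for sid, seq in SEQS.items():
    print(sid, len(seq), code_sum(seq), round(mass_proxy(seq), 1))

by_code = sorted(SEQS, key=lambda s: code_sum(SEQS[s]))
by_mass = sorted(SEQS, key=lambda s: mass_proxy(SEQS[s]))
print("same ranking:", by_code == by_mass)
```

The sketch checks only the ranking, which is the relationship this example describes.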
Example 3: Optimizing pattern identification in DNA fragments and aptamers
The sequences of DNA fragments and aptamers were converted into binary base codes and compared. This makes it straightforward to identify specific patterns and secondary structures within the sequences. To demonstrate this, a 9-base DNA sequence was used as an example (FIG. 4).
This example sequence is given only to illustrate code patterns, and the scope is not to be construed as limited to the example sequence of SEQ ID NO: 7.
The example sequence of SEQ ID NO: 7 is as follows.
5' GCGGTGGCG 3' (SEQ ID NO: 7)
Converting the example sequence into base codes gives the following:
11 00 11 11 01 11 11 00 11 (Example sequence code 1)
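The conversion above is a direct table lookup. The minimal Python sketch below produces the space-separated 2-bit codes for SEQ ID NO: 7 and their decimal values, which are the values used when summing codes. The helper names `encode` and `to_decimal` are hypothetical.

```python
CODE = {"C": "00", "T": "01", "A": "10", "G": "11"}

def encode(seq):
    """Return the space-separated 2-bit codes for a 5'->3' sequence."""
    return " ".join(CODE[b] for b in seq.upper())

def to_decimal(codes):
    """Convert space-separated 2-bit codes to their decimal values."""
    return [int(c, 2) for c in codes.split()]

code1 = encode("GCGGTGGCG")
print(code1)
print(to_decimal(code1))
```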
The codes are designed so that each base has a code sum of 3 with the complementary base it can hydrogen-bond with. Arrangements of such pairs can form a stem in a DNA aptamer sequence (FIG. 4; Stem).
Most DNA stem-loop patterns have two or more stem-forming bases connected at both ends. A loop can form when three or more bases in the center face bases with which their code sum is greater or less than 3, so that they cannot pair complementarily.
The example sequence can form two stem-loop structures, which can be identified simply from its code arrangement. Pairing partners for each code are as follows:
- The first code (11) can pair with the eighth code (00), skipping the adjacent 00 (FIG. 4; ① red arrow).
- The second code (00) can pair with the sixth (11; FIG. 4; ③ green arrow), seventh (11), and ninth (11) codes.
- Likewise, the third code (11) can pair with the eighth code (00; FIG. 4; ② blue arrow).

The stem of a stem-loop structure requires two or more connected base pairs. Therefore, either the pairing indicated by the red arrow or the pairing indicated by the blue arrow in FIG. 4 can form a stem (FIG. 4; dotted circle). The green-arrow pairing is a single pair and cannot form a stem. In both cases that can form a stem, four bases that can form a loop lie in the middle, so a stem-loop structure is predicted to be possible.
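The stem-loop rule walked through above can be sketched as a simple scan. The rule has two parts: a stem needs at least two consecutive facing pairs whose codes sum to 3, and at least three unpaired bases must remain in the loop. The minimal Python sketch below applies that counting rule only; it is not a thermodynamic folding model and does not check whether the loop bases could themselves pair. The function names and the `min_stem`/`min_loop` parameters are assumptions for illustration. Positions are 1-based.

```python
CODE = {"C": 0, "T": 1, "A": 2, "G": 3}

def can_pair(a, b):
    """Complementary under the code scheme: decimal codes sum to 3."""
    return CODE[a] + CODE[b] == 3

def find_stem_loops(seq, min_stem=2, min_loop=3):
    """Return (outer_i, outer_j, stem_pairs, loop_len) tuples, 1-based positions."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(seq[i], seq[j]):
                continue
            # Skip pairs that only extend a stem already starting further out.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            # Zip inward while the facing bases pair and a loop of min_loop remains.
            k = 0
            while (j - k) - (i + k) - 1 >= min_loop and can_pair(seq[i + k], seq[j - k]):
                k += 1
            if k >= min_stem:
                loop_len = j - i - 2 * k + 1
                hits.append((i + 1, j + 1, k, loop_len))
    return hits

print(find_stem_loops("GCGGTGGCG"))
```

For SEQ ID NO: 7 the sketch finds two candidates, each with a two-pair stem and a four-base loop. The first is outer pair 1-8 (red arrow); the second is outer pair 2-9, which contains the blue-arrow pair 3-8. The single green-arrow pair 2-6 is rejected.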
Standardizing each base as a code in this way lets the code sum predict whether two bases can pair complementarily. The secondary structure and patterns of a DNA sequence can then easily be predicted from the number of complementary pairs and the number of bases linked to them.
Example 4: Optimizing SNP identification through code standardization
By converting DNA sequences into codes and comparing their code sums, the method was optimized to determine whether the nucleotide sequence of a specific DNA fragment is mutated. An SNP sequence is a DNA fragment sequence in which one base is changed. Applying the codes to the SNP sequence and comparing it with the normal sequence therefore readily reveals whether a variation exists and where it is. The efficiency of code standardization was tested on one of many SNP sequences: an SNP in the CD44 gene reported in 84% of breast cancer patients [Zhou, J., Nagarkatti, P. S., Zhong, Y., Creek, K., Zhang, J., & Nagarkatti, M. (2010). Unique SNP in CD44 intron 1 and its role in breast cancer development. Anticancer Research, 30(4), 1263-1272.]
In this breast cancer SNP, the A at the 14th base from exon 2, within the first intron (intron 1) of the gene, is mutated to G. The normal and variant sequences were converted into codes and arranged as binary numbers. Their code sums were then computed and compared (FIG. 5).
When the codes of the normal and variant sequences were converted to decimal and summed, the normal sequence gave 39 and the variant sequence gave 40, a difference of 1. Thus, the presence of a variation in a DNA fragment can be determined from the code sum alone. Depending on which base is substituted, the code sum may differ by 1 to 3. In addition, the position of the variation can be located by comparing the individual code values.
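The two checks in this example can be sketched together: a code-sum difference to detect a variation, and a position-by-position code comparison to locate it. The minimal Python sketch below assumes equal-length, already-aligned sequences and does not handle insertions or deletions. The fragment used is hypothetical and is NOT the CD44 sequence of FIG. 5, which is not reproduced in this text. Likewise, the function names are illustrative only.

```python
CODE = {"C": 0, "T": 1, "A": 2, "G": 3}

def code_sum(seq):
    """Sum of the decimal codes of all bases."""
    return sum(CODE[b] for b in seq.upper())

def compare(normal, variant):
    """Return (code-sum difference, 1-based positions whose codes differ)."""
    if len(normal) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    diff = code_sum(variant) - code_sum(normal)
    positions = [
        i + 1
        for i, (a, b) in enumerate(zip(normal.upper(), variant.upper()))
        if CODE[a] != CODE[b]
    ]
    return diff, positions

# Hypothetical 20-base fragment (not the CD44 sequence) with an A>G change at position 9
normal = "TTCAGGCTACTTCATCAGGT"
variant = normal[:8] + "G" + normal[9:]
print(compare(normal, variant))
```

As in the CD44 case, an A-to-G change raises the code sum by 1, and the position comparison then pins down where the change occurred.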
Thus, DNA fragment sequences from a normal control group and specific variant sequences from a disease group can be converted into codes and their code sums compared. This quickly reveals differences between sequences and makes it easy to screen for SNPs. Applying code sums to identified SNP sequences can also support disease diagnosis.
Claims (9)
- A method for standardizing DNA as codes, comprising the following steps: (a) naming the four bases C, T, A, and G as 00, 01, 10, and 11, respectively; and (b) when the bases form G-C and A-T base pairs, naming the pairs, read in the 5' to 3' direction, 1100 for G and C, 0011 for C and G, 1001 for A and T, and 0110 for T and A.
- A method of providing information optimized for identifying a specific pattern or secondary structure of a specific DNA fragment or aptamer using DNA code standardization, comprising the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) comparing the arrangement of the codes so named with the arrangement of the code sums.
- The method of claim 2, wherein the step of comparing the arrangement of the codes with the arrangement of the code sums comprises: converting the binary codes 00, 01, 10, and 11 from step (a) into decimal numbers; determining that a stem structure can form when two or more pairs of codes whose sum is 3 are arranged at both ends; and determining that a loop structure forms when three or more bases are connected in the center whose code sum with the facing base is greater or less than 3, so that they cannot bind complementarily.
- A method of providing information on the presence or absence of a nucleotide sequence variation in a specific DNA fragment using DNA code standardization, comprising the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) comparing the sums of the codes so named.
- The method of claim 4, wherein the step of comparing the code sums comprises converting the binary codes 00, 01, 10, and 11 from step (a) into decimal numbers, summing them, comparing the sum with that of the normal sequence, and determining that a variation exists when the sums differ by 1 to 3.
- The method of claim 4, wherein the position of the variant sequence is located by comparing, one by one, the values of the codes obtained by naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively.
- A computer program stored in a computer-readable medium for providing information optimized for identifying a specific pattern or secondary structure of a specific DNA fragment or aptamer, the program causing a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) after converting the binary codes 00, 01, 10, and 11 from step (a) into decimal numbers, determining that a stem structure can form when two or more pairs of codes whose sum is 3 are arranged at both ends, and determining that a loop structure forms when three or more bases are connected in the center whose code sum with the facing base is greater or less than 3, so that they cannot bind complementarily.
- A computer program stored in a computer-readable medium for providing information on the presence or absence of a nucleotide sequence variation in a specific DNA fragment, the program causing a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) converting the binary codes from step (a) into decimal numbers, summing them, comparing the sum with that of the normal sequence, and determining that a variation exists when the sums differ by 1 to 3.
- A computer program stored in a computer-readable medium for providing information on the position of a variant sequence in a specific DNA fragment, the program causing a computer to perform the following steps: (a) naming C, T, A, and G in the nucleotide sequence of the specific DNA fragment as 00, 01, 10, and 11, respectively; and (b) locating the variant sequence by comparing, one by one, the values of the codes obtained in step (a).
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19918443.3A EP3937177A4 (en) | 2019-03-05 | 2019-03-27 | Dna coding method and biomedical engineering application of same coding method |
CN201980089597.2A CN113614834B (en) | 2019-03-05 | 2019-03-27 | DNA encoding method and medical life engineering application of encoding method |
JP2021553075A JP7275301B2 (en) | 2019-03-05 | 2019-03-27 | DNA encoding method and biotechnological application of the encoding method |
US17/434,122 US20220139500A1 (en) | 2019-03-05 | 2019-03-27 | Dna coding method and biomedical engineering application of same coding method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190025377A KR102252977B1 (en) | 2019-03-05 | 2019-03-05 | A method coding standardization of dna and a biotechnological use of the method |
KR10-2019-0025377 | 2019-03-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020179962A1 true WO2020179962A1 (en) | 2020-09-10 |
Family
ID=72338682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2019/003570 WO2020179962A1 (en) | 2019-03-05 | 2019-03-27 | Dna coding method and biomedical engineering application of same coding method |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220139500A1 (en) |
EP (1) | EP3937177A4 (en) |
JP (1) | JP7275301B2 (en) |
KR (1) | KR102252977B1 (en) |
CN (1) | CN113614834B (en) |
WO (1) | WO2020179962A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023085887A1 (en) * | 2021-11-15 | 2023-05-19 | 주식회사 넥스모스 | Novel aptamer, and composition for cognitive function improvement and anti-aging comprising aptamer as active ingredient |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116092575A (en) * | 2023-02-03 | 2023-05-09 | 中国科学院地理科学与资源研究所 | G DNA structure discrimination method based on GMNS rule |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040070438A (en) * | 2003-02-03 | 2004-08-09 | 삼성전자주식회사 | Apparatus for encoding DNA sequence and method of the same |
KR20130068185A (en) * | 2011-12-14 | 2013-06-26 | 한국전자통신연구원 | Genome sequence mapping device and genome sequence mapping method thereof |
KR20160001455A (en) | 2014-06-27 | 2016-01-06 | 한국생명공학연구원 | DNA Memory for Data Storage |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040153255A1 (en) * | 2003-02-03 | 2004-08-05 | Ahn Tae-Jin | Apparatus and method for encoding DNA sequence, and computer readable medium |
WO2005024562A2 (en) * | 2003-08-11 | 2005-03-17 | Eloret Corporation | System and method for pattern recognition in sequential data |
CN103336916B (en) * | 2013-07-05 | 2016-04-06 | 中国科学院数学与系统科学研究院 | A kind of sequencing sequence mapping method and system |
2019
- 2019-03-05 KR KR1020190025377A patent/KR102252977B1/en active IP Right Grant
- 2019-03-27 CN CN201980089597.2A patent/CN113614834B/en active Active
- 2019-03-27 US US17/434,122 patent/US20220139500A1/en active Pending
- 2019-03-27 JP JP2021553075A patent/JP7275301B2/en active Active
- 2019-03-27 EP EP19918443.3A patent/EP3937177A4/en not_active Withdrawn
- 2019-03-27 WO PCT/KR2019/003570 patent/WO2020179962A1/en unknown
Non-Patent Citations (6)
Title |
---|
RAPIN SUNTHORNWAT, J. MOORE ELVIN, TEMTANAPAT YAOWADEE: "Detecting and classifying mutations in genetic code with an application to beta-thalassaemia", SCIENCEASIA, vol. 37, 2011, pages 51 - 61, XP055737443 * |
SANCHEZ, R. ET AL.: "A genetic code Boolean structure. 1. The meaning of Boolean deductions", BULLETIN OF MATHEMATICAL BIOLOGY, vol. 67, 2005, pages 1 - 14, XP004728838, DOI: 10.1016/j.bulm.2004.05.005 * |
See also references of EP3937177A4 |
STAMBUK, N.: "Universal Metric Properties of the Genetic Code", CROATICA CHEMICA ACTA., vol. 73, no. 4, 2000, pages 1123 - 1139, XP055737446 * |
WATSON, J. D., & CRICK, F. H.: "Molecular structure of nucleic acids", NATURE, vol. 171, no. 4356, 1953, pages 737 - 738, XP036980645, DOI: 10.1038/171737a0 |
ZHOU, J., NAGARKATTI, P. S., ZHONG, Y., CREEK, K., ZHANG, J., NAGARKATTI, M.: "Unique SNP in CD44 intron 1 and its role in breast cancer development", ANTICANCER RESEARCH, vol. 30, no. 4, 2010, pages 1263 - 1272 |
Also Published As
Publication number | Publication date |
---|---|
CN113614834B (en) | 2024-06-25 |
JP7275301B2 (en) | 2023-05-17 |
JP2022525042A (en) | 2022-05-11 |
CN113614834A (en) | 2021-11-05 |
US20220139500A1 (en) | 2022-05-05 |
EP3937177A4 (en) | 2022-12-07 |
KR20200106761A (en) | 2020-09-15 |
EP3937177A1 (en) | 2022-01-12 |
KR102252977B1 (en) | 2021-05-17 |
Similar Documents
Publication | Title |
---|---|
EP2875173B1 (en) | System and methods for detecting genetic variation | |
Hernandez‐Alias et al. | Translational efficiency across healthy and tumor tissues is proliferation‐related | |
Atak et al. | Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning | |
WO2020179962A1 (en) | Dna coding method and biomedical engineering application of same coding method | |
CN103374518A (en) | Detecting and classifying copy number variation | |
Keel et al. | Genome‐wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds | |
Wang et al. | Integrative genome-wide analysis of long noncoding RNAs in diverse immune cell types of melanoma patients | |
Ajore et al. | Functional dissection of inherited non-coding variation influencing multiple myeloma risk | |
CN115295075A (en) | Construction method of complex disease genetic risk assessment model, model and application thereof | |
Selewa et al. | Single-cell genomics improves the discovery of risk variants and genes of atrial fibrillation | |
Porubsky et al. | A familial, telomere-to-telomere reference for human de novo mutation and recombination from a four-generation pedigree | |
US20220005547A1 (en) | Multiplexed droplet-based sequencing using natural genetic barcodes | |
Lang | NanoCoV19: An analytical pipeline for rapid detection of severe acute respiratory syndrome coronavirus 2 | |
KR102280758B1 (en) | A method coding standardization of dna and a biotechnological use of the method | |
KR20200136354A (en) | A method coding standardization of dna and a biotechnological use of the method | |
WO2022082199A1 (en) | Method for detecting amyotrophic lateral sclerosis | |
WO2021210859A1 (en) | Method and apparatus for designing artificial base sequence, and probe using same | |
He et al. | Human transcription factor combinations mapped by footprinting with deaminase | |
RU2799654C2 (en) | Sequence graph-based tool for determining variation in short tandem repeat areas | |
WO2023043097A1 (en) | Method for displaying paired sequence fragment merging for next-generation sequencing | |
WO2023018024A1 (en) | Method for diagnosing microsatellite instability by using variation rate of sequence lengths at microsatellite loci | |
Dorić et al. | In silico prediction of mtDNA gene expression based on codon usage bias in ants (Formicidae Latreille, 1802) that inhabit limestone quarry ecosystems | |
WO2019108014A1 (en) | Method for measuring integrity of uid nucleic acid sequence in nucleic acid sequencing analysis | |
Niu et al. | A novel strategy to identify the regulatory DNA-organized cooperations among transcription factors | |
Cáceres et al. | Understanding disease with omic data |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19918443; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2021553075; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2019918443; Country of ref document: EP; Effective date: 2021-10-05 |