WO2019194640A1 - Molecule-indexed bisulfite sequencing - Google Patents

Molecule-indexed bisulfite sequencing

Info

Publication number
WO2019194640A1
Authority
WO
WIPO (PCT)
Prior art keywords
adapter
dna
sequence
oligonucleotide
long
Prior art date
Application number
PCT/KR2019/004072
Other languages
French (fr)
Korean (ko)
Inventor
정상균
오수아
Original Assignee
한국한의학연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국한의학연구원
Publication of WO2019194640A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q2523/00: Reactions characterised by treatment of reaction samples
    • C12Q2523/10: Characterised by chemical treatment
    • C12Q2523/125: Bisulfite(s)
    • C12Q2525/00: Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
    • C12Q2525/10: Modifications characterised by
    • C12Q2525/191: Modifications characterised by incorporating an adaptor
    • C12Q2531/00: Reactions of nucleic acids characterised by
    • C12Q2531/10: Reactions of nucleic acids characterised by the purpose being amplify/increase the copy number of target nucleic acid
    • C12Q2531/113: PCR
    • C12Q2535/00: Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/122: Massive parallel sequencing

Definitions

  • The present invention provides a bisulfite sequencing method in which, before bisulfite treatment, the two strands of the DNA double helix in each cleaved genomic fragment are labeled differently and a molecular index is attached so that different DNA fragments can be distinguished.
  • Introducing this step greatly reduces the errors that arise in bisulfite sequencing analysis.
  • In addition to the base sequence of its DNA, an organism's genome carries higher-order information, or regulates the flow of genetic information, through methylation of cytosine or adenine bases.
  • In mammals in particular, cytosine methylation is an important regulatory mechanism: its pattern is inherited through cell division, and it determines the developmental or histological identity of cells by blocking expression of the target genes at the source. It also serves as a genome defense mechanism that suppresses the activity of harmful factors such as retroelements. When these regulatory mechanisms are impaired, cytosines of specific genes or regulatory sites are unnecessarily methylated or demethylated, which can cause diseases such as cancer.
  • Accurately determining the pattern of cytosine methylation in the genome provides important information for understanding the developmental and molecular genetic functions and roles of specific genes, groups of genes, or specific regulatory sequences. It can also be used to identify the causes of diseases such as cancer, to diagnose them, and to predict prognosis.
  • A classical method of DNA methylation analysis determines the degree of methylation at a specific restriction site from whether the DNA is cleaved by a methylation-sensitive restriction enzyme (one that cannot cut methylated sites). This method applies to only a small number of restriction sites, requires additional methods to quantify the ratio of cleaved to uncleaved DNA, and yields limited information.
  • In reduced representation bisulfite sequencing (RRBS), bisulfite sequencing is performed on short fragments selectively captured from genomic DNA cleaved with the restriction enzyme MspI. Because the captured fragments represent the CpG-dense regions characteristic of regulatory sites such as promoters, the method approximates the effect of whole-genome analysis.
  • In addition to deaminating cytosine, bisulfite randomly degrades DNA. It has been reported that more than 90% of the DNA is destroyed under reaction conditions that achieve sufficient deamination (Grunau C et al, 2001, Bisulfite genomic sequencing: systematic investigation of critical experimental parameters, Nucleic Acids Research, 29(13): E65). Thus, if the number of templates sequenced after bisulfite treatment is very small, the results may not reflect the true degree of methylation of the sample.
  • Cytosine methylation in mammals occurs predominantly in the CpG dinucleotide context. Methylation of a cytosine on one strand of the DNA duplex is usually, but not always, accompanied by methylation of the cytosine on the opposite strand, that is, the cytosine paired with the neighboring guanine. This asymmetry may carry important regulatory information. However, obtaining information on asymmetric methylation of the two strands is almost impossible with existing bisulfite analysis.
  • One object of the present invention is to provide a bisulfite sequencing method comprising the following first to fifth steps.
  • In the present invention, an adapter carrying a molecular label is attached to the cleaved DNA before bisulfite treatment. This makes it clear which template, and which strand of the double helix, each sequencing read originated from. By providing a clear means of assessing errors due to DNA sampling and of determining methylation symmetry, the method allows DNA methylation information to be analyzed more accurately.
  • Figure 1 shows the structure of the adapters A and B produced in the embodiment and a description thereof.
  • Figure 3 shows the results of electrophoresis of bisulfite sequencing library.
  • The left lane contains size markers, and the middle and right lanes show the electrophoresis results for bisulfite sequencing libraries prepared from different genomic DNAs.
  • Figure 4 shows the mapping of the nucleotide sequence and methylation call to the reference genome and the distribution of molecular labels.
  • Figure 5 shows the analysis of methylation call reflecting the molecular label.
  • Figure 6 shows a reduced representation bisulfite sequencing (RRBS) analysis of the mouse spleen genome at different ages.
  • (5) The bisulfite sequencing method comprises a fifth step of performing PCR with a primer pair that binds to both ends of the template, using the product prepared in the fourth step as the template.
  • the first to fifth steps may be provided as a step of preparing a library for Next Generation Sequencing (NGS).
  • Next Generation Sequencing (NGS) refers to a high-speed method for determining the nucleotide sequences of genomes. The term is used interchangeably with high-throughput sequencing, massively parallel sequencing, and second-generation sequencing.
  • The term "library" refers to a set of gene fragments obtained by cutting with a restriction enzyme or the like, and may be a set in which such fragments are introduced into a vector, but is not limited thereto. Specifically, in the present invention, the library may be prepared through the first to fifth steps.
  • The first step is a step of cutting genomic DNA extracted from an individual so that it has cut ends capable of binding to the adapters.
  • The term "individual" may mean any species, including humans, for which library preparation for next-generation sequencing is required.
  • In the present invention, a mouse was used as an example source of genomic DNA, but the source is not limited thereto.
  • Restriction enzymes may be used for cleavage of the DNA.
  • Restriction enzymes are endonucleases (a class of nucleases) that recognize specific nucleotide sequences in DNA and cut both strands; they are the enzymes used to make recombinant DNA in genetic engineering.
  • MspI was used as a representative example of a restriction enzyme, but the scope of the present invention is not limited thereto.
  • DNA can be cleaved with various enzymes other than restriction enzymes, or by physical force, and the first step can also include using a DNA polymerase or the like to create a specific overhang.
  • The term 'cut end capable of binding to an adapter' refers to the end of cleaved genomic DNA that can be joined to an adapter by covalent and/or complementary binding.
  • An overhang may be present at the cut end of the genomic DNA.
  • An 'overhang' is a structure in which a certain number of nucleotides protrude at the 5' end or 3' end of the cut DNA; the greater the complementarity of the overhangs, the greater the efficiency of DNA ligation.
  • Any method used in the art may be used without limitation to extract genomic DNA from the individual.
  • the second step provides a step of binding the double-stranded adapters A and B, which are two types of adapters having ends complementary to the cleaved surface of the cleaved DNA, to the cleaved DNA.
  • The term "adapter" refers to a partially double-stranded nucleotide sequence used to obtain an amplification product that includes the nucleotide sequence of a cleaved fragment, and may bind to both ends of the cleaved genomic DNA.
  • The adapters of the second step may be two different adapters, adapter A and adapter B.
  • One end of each adapter may include a sequence that binds complementarily to the cut end of the cleaved genomic DNA; specifically, adapter A may bind in the 5' direction and adapter B in the 3' direction.
  • mouse DNA was cleaved using MspI restriction enzyme, and an adapter capable of binding to the cleavage site of the restriction enzyme was attached to the cleaved genomic DNA.
  • the adapter A may be composed of complementary binding of two oligonucleotides, Long-A and Short-A (FIG. 1).
  • Adapter A may be composed of the complementary binding of a Long-A oligonucleotide and a Short-A oligonucleotide. The Long-A oligonucleotide contains a double-stranded site; a primer binding site for single-end reading on the Illumina sequencing platform; and a molecular label of four or more bases, specifically 4 to 20 bases, composed at random of the four bases methylcytosine, adenine, guanine and thymine, or of the three bases adenine, guanine and thymine. The Short-A oligonucleotide has a nucleotide sequence complementary to Long-A.
  • In the primer binding site, methylated cytosine is used in place of cytosine to prevent modification by bisulfite treatment.
  • The Long-A oligonucleotide may further comprise a shift, consisting of base sequences of different lengths, located between the molecular label and the double-stranded site or in front of the molecular label. Specifically, the shift may consist of the base sequence G, GT, GTG, or GTAG, but is not limited thereto.
  • The double-stranded site means the region where the top strand, Long-A, and the bottom strand, Short-A, form complementary bonds.
  • The molecular label is a label that, after sequencing, allows the template from which each read was derived to be identified: reads carrying the same label are taken to originate from the same template.
  • The shift is a sequence of 1 to 4 nucleotides located between the double-stranded site and the molecular label. Because adapters with different shifts differ in shift length, the same position of the construct is sequenced at different cycles. This prevents the side effect in which the Illumina sequencing platform, on reading the same nucleotide in every cluster during the initial sequencing cycles, judges the sample to be faulty and stops the reaction.
  • the adapter B may consist of complementary binding of two oligonucleotides, Long-B and Short-B (FIG. 1).
  • Adapter B includes a primer binding site for amplification, and may be composed of the complementary binding of a Long-B oligonucleotide, in which all cytosines are methylated, and a Short-B oligonucleotide.
  • the adapter may include a base sequence capable of attaching a primer when PCR is performed in the preparation of the amplification product.
  • As an example, the Long-A oligonucleotide constituting adapter A may consist of the sequence of SEQ ID NO: 1.
  • The Short-A oligonucleotide may consist of the sequence of SEQ ID NO: 2.
  • As an example, the Long-B oligonucleotide constituting adapter B may consist of the sequence of SEQ ID NO: 3.
  • The Short-B oligonucleotide may consist of the sequence of SEQ ID NO: 4.
  • In these sequences, 'x' means methylated cytosine.
  • 'D' means any one of the bases adenine, guanine and thymine.
  • Through the second step, a DNA-adapter conjugate may be produced.
  • The DNA-adapter conjugate is a structure in which the cleaved genomic DNA is joined to adapters, and is used as the amplification template for preparing a library.
  • Depending on which adapters bind to its two ends, each cleaved DNA can give three types of adapter-ligated product: adapter A at both ends, adapter B at both ends, or a different adapter at each end. These are formed in a quantitative ratio of 1:1:2.
  • When the same adapter is bound to both ends, a panhandle structure may form through complementary binding between the adapter ends during PCR, so that PCR amplification is suppressed.
  • When different adapters are bound to the two ends, the PCR amplification of the fifth step proceeds smoothly.
  • The DNA-adapter conjugates generated in the second step may be further processed, for example by capture with probes, so that only some specific sequences are selected for analysis.
  • the third step provides a step of performing a fill-in of the adapter terminal single strand using a DNA polymerase.
  • the DNA polymerase of the third step may be any known polymerase without limitation.
  • The term "fill-in" refers to synthesizing a double strand by a DNA polymerization reaction on the single-stranded region located at the end of the adapter.
  • The fill-in of the third step may be performed using methyl-dCTP instead of dCTP among the four dNTPs that serve as polymerase substrates. This prevents base modification by bisulfite treatment at the filled-in site.
  • Because the short oligonucleotides of the two adapters are dephosphorylated at the 5' end, they are not ligated to the cleaved DNA, and a sequence complementary to the long oligonucleotide is synthesized through the fill-in process. Base modification by bisulfite treatment does not occur in this sequence.
  • The double-stranded site of the Long-A oligonucleotide of adapter A contains unmethylated cytosine, so bisulfite treatment produces cytosine->thymine modification there, whereas the sequence complementary to that site is not modified. The site can therefore be used as a device for distinguishing the two strands of the cleaved DNA through sequencing.
  • The fourth step is a step of treating the product prepared in the third step with bisulfite to convert unmethylated cytosine into thymine.
  • Bisulfite, also known as hydrogen sulfite, is widely used as a reagent for detecting DNA modification. Specifically, when DNA is treated with bisulfite, unmethylated cytosine (C) bases are deaminated to uracil, which is read as thymine (T) after PCR amplification, whereas methylated cytosine is not deaminated and is not converted. Bisulfite can therefore be used to distinguish methylated from unmethylated cytosine.
  • The term bisulfite sequencing means a sequencing method that uses bisulfite to determine the sequence of DNA and the pattern of methylated bases. Techniques or devices known in the art for bisulfite sequencing may be used freely.
  • The fourth step may be performed before the second step, and it will be apparent to those skilled in the art that the same result as the method of the present invention is obtained in that case.
  • In the fifth step, the product prepared in the fourth step is used as a template, and PCR is performed using a pair of primers that bind to both ends of the template.
  • The term "amplification product" refers to the product of PCR performed with primers on a product in which adapters are bound to cleaved DNA, and may include the inserted DNA and the adapters.
  • The primer pair of the fifth step may bind to both ends of the product prepared in the fourth step.
  • the primers may be primers in which a base sequence suitable for next-generation sequencing is added, but is not limited thereto.
  • a library for NGS was prepared using primer pairs containing base sequences suitable for next generation sequencing (FIG. 3).
  • the NGS process may be further performed.
  • RRBS analysis of the mouse spleen genome at each age was performed in order to evaluate the effect of bisulfite sequencing with an adapter carrying a molecular label.
  • When the molecular label was taken into account, the methylation level was higher in samples of all ages than when it was ignored, and the decrease in methylation level with aging became apparent.
  • Adapters were prepared as shown in FIG. 1. Specifically, two partially double-stranded adapters with ends complementary to the restriction enzyme cut end were prepared; adapters A and B had the following characteristics.
  • The two adapters bind complementarily to the two ends of the cleaved DNA, adapter A in the 5' direction and adapter B in the 3' direction. One strand of each adapter forms a covalent bond with the end of the cleaved DNA through DNA ligation. This allows the cleaved DNA (insert) to be amplified with primers that bind to each adapter sequence.
  • Adapter A consists of the complementary binding of two oligonucleotides, Long-A and Short-A, and comprises a double-stranded site (DS-A), a shift (Sft), a molecular label (M-tag), and a primer binding site (PR-siteA).
  • All cytosine bases in PR-siteA were methylated cytosines, to prevent C->T conversion by the subsequent bisulfite treatment.
  • The M-tag region of adapter A is the molecular label: 8 bases drawn at random from the three bases other than cytosine, so that templates can be told apart by their labels. All four bases, including methylcytosine, may also be used, and the length is not limited to 8 bases.
  • The Sft, or shift, of adapter A places nucleotides of different lengths between the M-tag and DS-A. It is a device for preventing the side effect in which, when the same nucleotide is read in most clusters during the initial sequencing cycles of the Illumina sequencing platform, the instrument judges the sample to be faulty and stops the reaction.
  • Four Long-As were used, each carrying a G, GT, GTG, or GTAG sequence of different length at the shift position.
  • The DS-A site of adapter A was designed so that, on bisulfite treatment, cytosine is replaced with thymine (top strand, Long-A) or remains unchanged (bottom strand, Short-A), depending on the strand.
  • For the top strand, the sequence of the original top (OT) strand after bisulfite conversion is read; for the bottom strand, the sequence complementary to the original bottom (CTOB) strand after bisulfite conversion is read.
  • Short-A of adapter A contains only DS-A and binds complementarily to each of the four Long-As with different Sfts, giving four kinds of adapter A.
  • Adapter B is composed of two oligonucleotides, Long-B and Short-B, and has a primer binding site for amplification; the primer binding site may include the double-stranded site of the adapter.
  • The long and short oligonucleotides of each adapter were synthesized by Genotech. The long and short oligonucleotides were mixed in equal amounts at a concentration of 100 pmol/μl, held at 97 °C for 2 minutes, and then cooled to 25 °C at a rate of 1 °C per one-minute cycle so that the two sequences annealed, giving the partially double-stranded adapters A and B.
  • An RRBS library was constructed for sequencing.
  • 100 ng of each mouse genomic DNA was cleaved with the restriction enzyme MspI at 37 °C for 4 hours.
  • The cleaved DNA was purified with a purification kit (Expin™ PCR SV, GeneAll) and dissolved in 30 μl of water, and the entire eluate was used for adapter ligation. Specifically, ligation was performed with adapter A carrying four different shifts (Sft). Depending on which adapters bind to its two ends, each cleaved DNA can give three types of adapter-ligated product: adapter A at both ends, adapter B at both ends, or a different adapter at each end. These are formed in a quantitative ratio of 1:1:2.
  • The ligation product was purified with a purification kit (Expin™ PCR SV, GeneAll) and dissolved in 30 μl of water, and 15 μl was used for the end fill-in.
  • Among the four dNTPs used as polymerase substrates, dCTP was replaced with methylated dCTP (met-dCTP) to prevent base modification by bisulfite treatment.
  • Because the short oligonucleotides of the two adapters lack a phosphate group at the 5' end, they are not ligated to the cleaved DNA, and a sequence complementary to the long oligonucleotide is synthesized through the fill-in process. Base modification by bisulfite treatment does not occur in this sequence.
  • The double-stranded site of the Long-A oligonucleotide of adapter A contains unmethylated cytosine and therefore undergoes C->T modification on bisulfite treatment, whereas the sequence complementary to this site does not. The site can therefore be used as a device for distinguishing the two strands of the cleaved DNA through sequencing.
  • The DNAs ligated to the four kinds of adapter were pooled and purified with a purification kit (Expin™ PCR SV, GeneAll), and the bisulfite conversion reaction was then performed with a bisulfite kit (EpiTect Bisulfite Kit, Qiagen). Unmethylated cytosine was thereby deaminated and converted to thymine.
  • The bisulfite-treated DNA was purified and dissolved in 20 μl of water, and 7 μl was used for PCR amplification to prepare the NGS library.
  • The PCR amplification primers include an index sequence for distinguishing samples, and amplification proceeds through their binding to the PR-sites of the two adapters. The amplification products were designed so that the nucleotide sequences of the Mol-tag, Sft, DS-A and cleaved DNA can be determined on the Illumina sequencing platform.
  • A product with the same adapter at both ends carries relatively long sequences at its two ends that are complementary to each other.
  • PCR amplification of such a product is greatly suppressed because, in DNA separated into single strands during PCR, the two ends bind complementarily to each other and form a panhandle-like structure to which primers cannot bind.
  • When different adapters are attached to the two ends, normal amplification occurs, so such products account for most of the amplification products.
  • Single-end 150-base sequencing produced about 20 gigabases in 131 million reads. Of these, 101 million reads (77%) had the normal Mol-tag, Sft and DS-A structure, and the reads of each sample could be separated by their Sft sequences.
  • The SAM file was parsed using a Perl script, and sequence reads were aligned with their methylation call strings for each mapping site; the results for a specific genomic site are shown in FIGS. Specifically, the first line gives the chromosome of the mapped locus (chr6) and the start position of the sequence (90276000). The second and third lines show the top (T_Ref) and bottom (B_Ref) strand sequences of the reference genome, which are complementary to each other. The following lines show the read sequences (Seq) and their cytosine methylation calls (Met), in descending order of the number of duplicates.
  • The methylation state of each cytosine, that is, the methylation call, is indicated by the letter Z, X, or H according to the sequence context of the C (CG, CHG, or CHH, respectively). Following the methylation call, the same line gives the template origin of the sequence (OT or CTOB), the number of duplicate sequences, the molecular labels attached to the sequence, and the number of sequences with each label. In the listed nucleotide sequences, bases that differ from the sequence with the most duplicates are shown in red.
  • Among the aligned nucleotide sequences, those with the same molecular label are duplicate data originating from one template. Because the small number of base differences among them arise in reaction processes such as PCR or sequencing, a representative sequence for the template can be inferred from the number of duplicates supporting each base at each position. A representative methylation call can likewise be obtained from the number of duplicates of the representative sequence.
  • The data of a and b of FIG. 4 were rearranged by determining consensus sequences for OT and CTOB and indicating the number of duplicates for each molecular label, as shown in FIG. 5.
  • Methylation information for a total of three templates in the genomic region was determined from the nucleotide sequences: one template had information on the top and bottom strands simultaneously, and the other two had information on the bottom strand only.
  • In conventional analysis, the number of templates from which the reads at each locus are derived cannot be determined, and each sequenced read is therefore treated as independent.
  • If all 86 reads mapped to the site (a and b in FIG. 4, the total number of reads when the molecular label is ignored) are treated as information from 86 templates, the interpretation may be exaggerated or distorted.
  • Among the reads mapped to the same site, OT and CTOB reads carrying the same molecular label were found (AAGTATGG in FIG. 5).
  • This indicates that the top and bottom strands of the same template were both sequenced.
  • The method of the above example can be used to read the DNA sequence of the desired individual quickly and to determine easily and accurately whether a mutation has occurred.
  • Genomic DNA was extracted from spleen cells of mice aged 2 months, 6 months, 12 months and 23 months, and the methylation profile was determined through the procedure of Examples 1-3.
  • The degree of methylation of each CpG site was determined with and without taking the molecular labels of the NGS reads into account.
  • For the methylation level over all sites in the genome whose sequence was determined, there was a large difference between the results obtained with and without taking the molecular label into account (FIG. 6).
  • FIG. 6: First, samples of all ages showed higher methylation levels when the molecular label was taken into account than when it was not.
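
As an illustration of the label-aware analysis described above, the following Python sketch collapses reads that share a molecular label and strand into one consensus methylation call per template, then compares the methylation level counted per read with the level counted per template. It is a minimal sketch, not the procedure used in the patent (which parsed SAM files with a Perl script). The function names, the (label, strand, call string) input tuples, and the use of 'Z' for a methylated and 'z' for an unmethylated CpG call are assumptions made for the example, and the reads are invented for illustration; only the label AAGTATGG is taken from FIG. 5.

```python
from collections import Counter, defaultdict


def consensus(call_strings):
    """Majority vote at each position over equal-length methylation call strings."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*call_strings))


def collapse_by_label(reads):
    """Group reads by (molecular label, strand) and return one consensus call per template."""
    groups = defaultdict(list)
    for label, strand, calls in reads:
        groups[(label, strand)].append(calls)
    return {key: consensus(calls) for key, calls in groups.items()}


def methylation_level(call_strings):
    """Fraction of CpG calls that are methylated ('Z') among all CpG calls ('Z' or 'z')."""
    methylated = sum(s.count("Z") for s in call_strings)
    unmethylated = sum(s.count("z") for s in call_strings)
    total = methylated + unmethylated
    return methylated / total if total else None


# Hypothetical reads covering two CpG sites at one locus.
# Four reads are PCR duplicates of one template (one of them carries an error),
# and one read comes from a second template.
reads = [
    ("AAGTATGG", "OT", "zz"),
    ("AAGTATGG", "OT", "zz"),
    ("AAGTATGG", "OT", "zz"),
    ("AAGTATGG", "OT", "zZ"),  # duplicate with a PCR or sequencing error
    ("GATTGAGT", "OT", "ZZ"),  # hypothetical label built from A, G and T only
]

per_read_level = methylation_level([calls for _, _, calls in reads])
templates = collapse_by_label(reads)
per_template_level = methylation_level(list(templates.values()))

print("per read:", per_read_level)
print("per template:", per_template_level)
```

In this toy input the heavily duplicated template is unmethylated, so counting every read separately weights that template four times and lowers the apparent methylation level; counting each template once removes that bias.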

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for introduction of a molecular index, by which discrimination between double strands and between different templates can be made in a step of constructing a library in order to greatly improve the accuracy of bisulfite sequencing results for next generation sequencing (NGS).

Description

Molecule-Indexed Bisulfite Sequencing
The present invention relates to a bisulfite sequencing method in which, before bisulfite treatment, the two strands of the DNA double helix in each cleaved genomic fragment are labeled differently and a molecular index that allows different DNA fragments to be distinguished is attached, thereby greatly reducing the errors of bisulfite sequencing analysis.
In addition to the base sequence of its DNA, the genome of an organism carries higher-order information, or regulates the flow of genetic information, through methylation of cytosine or adenine bases. In mammals in particular, cytosine methylation is an important regulatory mechanism whose pattern is inherited through cell division and which determines the developmental or histological identity of cells by blocking expression of the target genes at the source. It also serves as a genome defense mechanism that suppresses the activity of harmful factors such as retroelements. When these regulatory mechanisms are impaired, cytosines of specific genes or regulatory sites are unnecessarily methylated or demethylated, which can act as a cause of diseases such as cancer.
Accurately determining the pattern of cytosine methylation in the genome not only provides important information for understanding the developmental and molecular genetic functions and roles of specific genes, gene groups, or specific regulatory sequences, but can also be used to identify the causes of diseases such as cancer, for diagnosis, and for predicting prognosis.
A classical method of DNA methylation analysis uses restriction enzymes that are sensitive to methylated bases (that is, unable to cleave them) and determines the degree of methylation at a particular restriction site from whether cleavage occurs. This method is applicable to only a small number of restriction sites, requires additional methods to quantify the ratio of cleaved to uncleaved DNA, and yields only limited information. When DNA is treated with bisulfite, the other bases do not react, but unmethylated cytosine is deaminated (to uracil, which is read as thymine after amplification and sequencing). Sequencing bisulfite-treated DNA therefore makes it possible to determine the methylation status of every cytosine in the sequence under analysis (Frommer M et al, 1992, A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands, PNAS 89(5): 1827-1831).
The development of high-throughput sequencing by NGS has made it possible to analyze the methylation level of most cytosines in the genome using bisulfite-treated whole genomes. Unlike ordinary sequence determination, however, measuring the degree of cytosine methylation requires information from a very large number of genomic fragments covering the same site, so whole-genome analysis remains very costly. To ease this cost while retaining much of the benefit of whole-genome analysis, reduced representation bisulfite sequencing (RRBS) was developed and is in use (Meissner A et al, 2005, Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis, Nucleic Acids Research, 33(18): 5868-77). In this method, only the short fragments of genomic DNA digested with the restriction enzyme MspI are selectively captured and subjected to bisulfite sequencing. Because these fragments represent regions densely populated with CpG dinucleotides, which are characteristic of regulatory regions such as promoters, the method approximates the effect of whole-genome analysis.
Besides deaminating cytosine, bisulfite compounds randomly degrade DNA. It has been reported that more than 90% of the DNA is destroyed under reaction conditions sufficient to drive the deamination to completion (Grunau C et al, 2001, Bisulfite genomic sequencing: systematic investigation of critical experimental parameters, Nucleic Acids Research, 29(13): E65-5). Consequently, if the reads obtained after bisulfite treatment derive from only a very small number of templates, the results may not reflect the true degree of methylation of the sample.
Cytosine methylation in mammals occurs mainly in the CpG dinucleotide context. Methylation of a cytosine on one strand of the DNA duplex is often accompanied by methylation of the cytosine on the opposite strand, which lies opposite the neighboring guanine, but this is not always the case, and such asymmetry may carry important regulatory information. Obtaining information on asymmetric methylation of the two strands, however, is nearly impossible with existing bisulfite analysis.
The present inventors made intensive efforts to develop a molecular device that retains the advantages of bisulfite sequencing while addressing its two shortcomings, namely the inability to determine the exact number of templates sequenced and the inability to detect asymmetric methylation of the two DNA strands. As a result, they confirmed that bisulfite sequencing using adapters carrying molecular labels makes it possible to prepare a library equipped with a molecular device that reveals the number of templates sequenced and the symmetry of methylation, and thereby completed the present invention.
One object of the present invention is to provide a bisulfite sequencing method comprising the following first to fifth steps:
(1) a first step of cleaving genomic DNA extracted from a subject so that it has cut ends capable of binding to adapters;
(2) a second step of ligating to the cleaved DNA two kinds of partially double-stranded adapters, adapter A and adapter B, each having an end complementary to the cut end of the cleaved DNA;
(3) a third step of performing fill-in of the single-stranded ends of the adapters using a DNA polymerase;
(4) a fourth step of treating the product of the third step with bisulfite to convert unmethylated cytosine to thymine; and
(5) a fifth step of performing PCR using the product of the fourth step as a template, with a primer pair that binds to both ends of the template.
To solve the fundamental problems of conventional bisulfite sequencing, namely its high error rate and its inability to determine, for an individual template, whether methylation is symmetric between the two strands, the present invention attaches adapters carrying molecular labels to the cleaved DNA before bisulfite treatment, so that each sequencing result clearly indicates which template it originated from and which strand of the double helix it corresponds to. Because this provides an explicit means of detecting errors caused by DNA sampling and of determining whether methylation is symmetric, DNA methylation information can be analyzed more accurately.
FIG. 1 shows the structures of adapters A and B prepared in the Examples, together with a description thereof.
FIG. 2 shows the process of preparing a bisulfite sequencing library.
FIG. 3 shows the results of electrophoresis of bisulfite sequencing libraries. The left lane is a size marker, and the middle and right lanes are bisulfite sequencing libraries prepared from different genomic DNAs.
FIG. 4 shows the mapping of sequences and methylation calls to the reference genome and the distribution of molecular labels.
FIG. 5 shows an analysis of methylation calls that takes the molecular labels into account.
FIG. 6 shows a reduced representation bisulfite sequencing (RRBS) analysis of mouse spleen genomes at different stages of aging.
This is described in detail below. Each description and embodiment disclosed herein may also be applied to every other description and embodiment. That is, all combinations of the various elements disclosed herein fall within the scope of the present invention. Furthermore, the scope of the present invention is not limited by the specific description below.
One aspect of the present invention for achieving the above object provides a bisulfite sequencing method comprising:
(1) a first step of cleaving genomic DNA extracted from a subject so that it has cut ends capable of binding to adapters;
(2) a second step of ligating to the cleaved DNA two kinds of partially double-stranded adapters, adapter A and adapter B, each having an end complementary to the cut end of the cleaved DNA;
(3) a third step of performing fill-in of the single-stranded ends of the adapters using a DNA polymerase;
(4) a fourth step of treating the product of the third step with bisulfite to convert unmethylated cytosine to thymine; and
(5) a fifth step of performing PCR using the product of the fourth step as a template, with a primer pair that binds to both ends of the template.
The first to fifth steps may be provided as steps for preparing a library for next generation sequencing (NGS).
As used herein, the term "next generation sequencing (NGS)" refers to a method for high-speed analysis of genomic sequences, and may be used interchangeably with high-throughput sequencing, massively parallel sequencing, or second-generation sequencing.
As used herein, the term "library" refers to a collection of gene fragments obtained by cleavage with a restriction enzyme or the like. It may be, but is not limited to, a collection in which the fragments have been introduced into a vector. Specifically, in the present invention the library may be prepared through the first to fifth steps.
The first step is a step of cleaving genomic DNA extracted from a subject so that it has cut ends capable of binding to adapters.
As used herein, the term "subject" may refer to any biological species, including humans, for which preparation of a library for next generation sequencing is required. In one embodiment of the present invention, a mouse was used as an example source of genomic DNA, but the invention is not limited thereto.
A restriction enzyme may be used to cleave the DNA. In the present invention, a restriction enzyme is an endonuclease (a type of nuclease) that recognizes a specific base sequence in DNA and cleaves the double strand; such enzymes are used in genetic engineering to make recombinant DNA. In a specific embodiment of the present invention, MspI was used as the restriction enzyme, but only as a representative example, and the scope of the present invention is not limited thereto. In addition to restriction enzymes, various other enzymes or physical force may be used to cleave the DNA, and the first step may also be carried out by creating a specific overhang at the cut end using a DNA polymerase or the like.
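As an illustration of the first step with MspI, the following minimal Python sketch digests a sequence in silico. It relies on the known specificity of MspI (recognition site CCGG, cut after the first C, leaving a 5'-CG overhang). The function names and the size-selection bounds are hypothetical and are not taken from the present disclosure.

```python
import re

MSPI_SITE = "CCGG"   # MspI recognition sequence; the enzyme cuts C^CGG
CUT_OFFSET = 1       # cut position relative to the start of the site (top strand)

def mspi_digest(seq):
    """Return the top-strand fragments produced by cutting at every C^CGG."""
    seq = seq.upper()
    # A lookahead is used so that every occurrence of the site is reported.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(f"(?={MSPI_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(fragments, min_len, max_len):
    """Keep fragments within a length window (bounds are the caller's choice)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

if __name__ == "__main__":
    frags = mspi_digest("AACCGGTTTTCCGGAA")
    print(frags)                      # internal fragments start with CGG
    print(size_select(frags, 4, 10))
```

Each internal fragment begins with CGG on the top strand, which corresponds to the cut end that the adapters of the second step are designed to match.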
The 'cut end capable of binding to an adapter' refers to the terminal region of cleaved genomic DNA that can be joined to an adapter by covalent bonding and/or complementary base pairing.
The first step may leave an overhang at the cut end of the genomic DNA. In the present invention, an 'overhang' refers to a structure in which a certain number of nucleotides protrude at the 5'-end or 3'-end of the cut end of the DNA. The greater the complementarity of the overhangs, the higher the efficiency of DNA ligation.
Any method used in the art may be used without limitation to extract the genomic DNA from the subject.
The second step is a step of ligating to the cleaved DNA two kinds of partially double-stranded adapters, adapter A and adapter B, each having an end complementary to the cut end of the cleaved DNA.
As used herein, the term "adapter" refers to a partially double-stranded oligonucleotide construct used to obtain an amplification product containing the sequence of the cleavage site, and it can bind to both ends of the cleaved genomic DNA.
The adapters of the second step may consist of two different adapters, adapter A and adapter B. One end of each adapter may contain a sequence that binds complementarily to the cut end of the cleaved genomic DNA. Specifically, adapter A may bind on the 5' side and adapter B on the 3' side. In one embodiment of the present invention, mouse DNA was cleaved with the restriction enzyme MspI, and adapters capable of binding to the cleavage site of this enzyme were attached to the cleaved genomic DNA.
Adapter A may consist of two oligonucleotides, Long-A and Short-A, joined by complementary base pairing (FIG. 1).
Specifically, adapter A may consist of a Long-A oligonucleotide and a Short-A oligonucleotide joined by complementary base pairing, wherein the Long-A oligonucleotide comprises a double-stranded region; a primer-binding site for single-end reading on the Illumina sequencing platform; and a molecular label consisting of 4 or more, specifically 4 to 20, bases drawn at random either from the four bases methylcytosine, adenine, guanine, and thymine or from the three bases adenine, guanine, and thymine; and wherein the Short-A oligonucleotide has a sequence complementary to Long-A. In the primer-binding site, methylated cytosine is used in place of cytosine to prevent modification by bisulfite treatment.
The Long-A oligonucleotide may further comprise a shift, located between the molecular label and the double-stranded region or in front of the molecular label, consisting of sequences of different lengths. Specifically, the shift may consist of the sequence G, GT, GTG, or GTAG, but is not limited thereto.
The double-stranded region is the region in which the top strand, Long-A, and the bottom strand, Short-A, are base-paired. The molecular label is a label that, after sequencing, allows reads to be assigned to the same or to different templates on the basis of whether their molecular labels are identical. The shift is a stretch of 1 to 4 nucleotides, differing in length among adapters, located between the double-stranded region and the molecular label. With adapters carrying different shifts, the double-stranded region is sequenced in cycles that are offset from one another by the difference in shift length. This is a device for preventing the problem in which the Illumina sequencing platform, if the same nucleotide is read across clusters during the initial sequencing reaction cycles, judges the sample to be faulty and halts the run.
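The staggering effect of the shift can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the read starts at the first base of an 8-base molecular label, followed by the shift and then the double-stranded region, and it uses the unconverted double-stranded sequence of SEQ ID NO: 1. The function names are hypothetical.

```python
SHIFTS = ["G", "GT", "GTG", "GTAG"]      # shift variants named in the text
DS_A = "ACACGAGCACACGTGACGT"             # double-stranded region of Long-A (SEQ ID NO: 1)
TAG_LEN = 8                              # assumed molecular label length

def ds_a_start_cycle(shift, tag_len=TAG_LEN):
    """1-based sequencing cycle at which the first base of DS_A is read."""
    return tag_len + len(shift) + 1

def bases_at_cycle(cycle, tag_len=TAG_LEN):
    """Set of fixed (non-label) bases read at a given cycle across all shift variants."""
    bases = set()
    for shift in SHIFTS:
        fixed = shift + DS_A
        i = cycle - tag_len - 1          # index into the fixed part of the read
        if 0 <= i < len(fixed):
            bases.add(fixed[i])
    return bases

if __name__ == "__main__":
    for s in SHIFTS:
        print(s, ds_a_start_cycle(s))
    for c in range(9, 15):
        print(c, sorted(bases_at_cycle(c)))
```

With a single shift variant every cluster would read the same fixed base at each of these cycles. Mixing the four variants offsets the double-stranded region by up to three cycles, so several different bases appear at most cycles.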
Adapter B may consist of two oligonucleotides, Long-B and Short-B, joined by complementary base pairing (FIG. 1).
Specifically, adapter B may consist of a Long-B oligonucleotide, which comprises a primer-binding site for amplification and in which all cytosines are methylated, and a Short-B oligonucleotide, joined by complementary base pairing.
The adapters may comprise a sequence to which a primer can anneal when PCR is performed in the step of preparing the amplification product.
By way of example, the Long-A oligonucleotide constituting adapter A may consist of the sequence of SEQ ID NO: 1, and the Short-A oligonucleotide may consist of the sequence of SEQ ID NO: 2. The Long-B oligonucleotide constituting adapter B may consist of the sequence of SEQ ID NO: 3, and the Short-B oligonucleotide may consist of the sequence of SEQ ID NO: 4.
SEQ ID NO: 1 - Long-A oligonucleotide
AxAxGAxGxTxTTxxGATxTDDDDDDDDACACGAGCACACGTGACGT
SEQ ID NO: 2 - Short-A oligonucleotide
CGACGTCACGTGTGCTCGTGT
SEQ ID NO: 3 - Long-B oligonucleotide
GTGAxTGGAGTTxAGAxGTGTGxTxTTxxGATxTT
SEQ ID NO: 4 - Short-B oligonucleotide
CGAAGATCGGAAGAGCACACG
In SEQ ID NOS: 1 to 4, 'x' denotes methylated cytosine and 'D' denotes any one of adenine, guanine, and thymine.
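The notation above can be checked with a short Python sketch. It reads 'x' as C for base-pairing purposes, fills each 'D' with a random A, G, or T, and tests whether each Short oligonucleotide, apart from its first two bases, is the reverse complement of the 3' end of the corresponding Long oligonucleotide. Treating the 5'-CG of the Short oligonucleotides as the overhang that matches an MspI cut end is an interpretation made here, and the helper names are hypothetical.

```python
import random

LONG_A = "AxAxGAxGxTxTTxxGATxTDDDDDDDDACACGAGCACACGTGACGT"   # SEQ ID NO: 1
SHORT_A = "CGACGTCACGTGTGCTCGTGT"                              # SEQ ID NO: 2
LONG_B = "GTGAxTGGAGTTxAGAxGTGTGxTxTTxxGATxTT"                 # SEQ ID NO: 3
SHORT_B = "CGAAGATCGGAAGAGCACACG"                              # SEQ ID NO: 4

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def plain(seq):
    """Write methylated cytosine ('x') as C; methylation does not change pairing."""
    return seq.replace("x", "C")

def revcomp(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def pairs_with_3prime_end(long_oligo, short_oligo, overhang=2):
    """True if the Short oligo, minus its 5' overhang, pairs with the 3' end of Long."""
    stem = short_oligo[overhang:]
    return plain(long_oligo).endswith(revcomp(stem))

def instantiate(seq, rng):
    """Replace every 'D' with a random base drawn from A, G, T."""
    return "".join(rng.choice("AGT") if b == "D" else b for b in seq)

if __name__ == "__main__":
    print(pairs_with_3prime_end(LONG_A, SHORT_A))
    print(pairs_with_3prime_end(LONG_B, SHORT_B))
    print(instantiate(LONG_A, random.Random(0)))
```

Both checks succeed for the listed sequences, which is consistent with the description of each adapter as a partially double-stranded construct.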
The second step may produce DNA-adapter ligation products. As used herein, the term "DNA-adapter ligation product" refers to a construct in which cleaved genomic DNA is joined to adapters, and it serves as the template for amplification in library preparation. Depending on which adapters are attached at the two ends, each cleaved DNA fragment can yield one of three types of product: adapter A at both ends, adapter B at both ends, or a different adapter at each end. In theory these form in a ratio of 1:1:2.
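The 1:1:2 ratio follows from a simple probability argument. The sketch below assumes, which the text does not state explicitly, that adapters A and B are present in equal amounts and ligate to each fragment end independently and with equal efficiency, so that each end receives adapter A with probability p = 1/2.

```latex
\[
P(\mathrm{AA}) = p^{2} = \tfrac{1}{4}, \qquad
P(\mathrm{BB}) = (1-p)^{2} = \tfrac{1}{4}, \qquad
P(\mathrm{AB}) = 2p(1-p) = \tfrac{1}{2},
\]
\[
P(\mathrm{AA}) : P(\mathrm{BB}) : P(\mathrm{AB}) = 1 : 1 : 2 .
\]
```

Under this assumption, half of the ligation products carry a different adapter at each end.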
When the same adapter is attached to both ends of a cleaved DNA fragment in the second step, complementary pairing between the two adapter ends during the PCR reaction can form a pan-holder (panhandle) structure, which can suppress PCR amplification in the fifth step. When different adapters are attached to the two ends, PCR amplification in the fifth step proceeds efficiently.
The adapter-DNA ligation products generated in the second step may be subjected to an additional process, such as capture with probes, to select only a subset of specific sequences for analysis.
The third step is a step of performing fill-in of the single-stranded ends of the adapters using a DNA polymerase. Any known DNA polymerase may be used in the third step without limitation.
As used herein, the term "fill-in" refers to the process of inducing DNA polymerization on the single strand located at the end of the adapter so that it is synthesized into a double strand.
The fill-in of the third step may be performed using methyl-dCTP in place of dCTP among the four dNTPs that serve as polymerase substrates. This prevents base conversion by bisulfite treatment in the filled-in region.
In addition, because the 5' ends of the Short oligonucleotides of both adapters are dephosphorylated, they are not ligated to the cleaved DNA. Instead, a sequence complementary to the Long oligonucleotide is synthesized during fill-in, and this sequence does not undergo base conversion by bisulfite treatment. Furthermore, the double-stranded region of the Long-A oligonucleotide of adapter A contains unmethylated cytosine and therefore undergoes cytosine-to-thymine conversion upon bisulfite treatment, whereas the complementary sequence of that region does not. As a result, this region serves as a device for distinguishing the two strands of the cleaved DNA by sequencing.
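The strand-discrimination principle described above can be sketched in Python. The sketch assumes that the double-stranded region of adapter A has already been extracted from each read and is written in the Long-A sense. The labels "top" and "bottom", the mismatch tolerance, and the function names are illustrative choices and are not part of the disclosed method.

```python
DS_A = "ACACGAGCACACGTGACGT"             # unconverted double-stranded region (SEQ ID NO: 1)
DS_A_CONVERTED = DS_A.replace("C", "T")  # same region after C-to-T conversion

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def classify_strand(ds_region, max_mismatch=2):
    """Assign a read to a strand from the state of its adapter A double-stranded region.

    'top'     : region is converted (the strand ligated to Long-A itself)
    'bottom'  : region is unconverted (the strand carrying the methyl-dCTP fill-in copy)
    'unknown' : region matches neither form within the tolerance
    """
    if len(ds_region) != len(DS_A):
        return "unknown"
    d_conv = hamming(ds_region, DS_A_CONVERTED)
    d_orig = hamming(ds_region, DS_A)
    if d_conv <= max_mismatch and d_conv < d_orig:
        return "top"
    if d_orig <= max_mismatch and d_orig < d_conv:
        return "bottom"
    return "unknown"

if __name__ == "__main__":
    print(classify_strand("ATATGAGTATATGTGATGT"))
    print(classify_strand("ACACGAGCACACGTGACGT"))
```

The two forms differ at the six cytosine positions of the region, so a tolerance of two mismatches still separates them unambiguously.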
The fourth step is a step of treating the product of the third step with bisulfite to convert unmethylated cytosine to thymine.
As used herein, the term "bisulfite" refers to a compound also called hydrogen sulfite, which is widely used as a reagent for determining whether DNA is modified. Specifically, when DNA is treated with bisulfite, unmethylated cytosine (C) bases in the DNA are deaminated and converted (via uracil, which is read as thymine after PCR) to thymine (T) bases, whereas methylated cytosine is not deaminated and is not converted. Bisulfite can therefore be used to distinguish methylated from unmethylated cytosine. As used herein, the term "bisulfite sequencing" refers to a sequencing method that uses bisulfite in this way to determine the DNA sequence and identify the pattern of methylated bases. Any technique or apparatus for bisulfite sequencing known in the art may be used freely.
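The conversion rule can be illustrated with a minimal Python sketch that reuses the 'x' notation of SEQ ID NOS: 1 to 4 for methylated cytosine. The function names are hypothetical, and the methylation caller assumes a read that is already aligned to the reference base for base, on the same strand, with no insertions or deletions.

```python
def bisulfite_convert(seq):
    """Simulate the read obtained after bisulfite treatment and PCR.

    'C' (unmethylated cytosine) is read as T; 'x' (methylated cytosine) is read as C.
    """
    return seq.translate(str.maketrans({"C": "T", "x": "C"}))

def methylation_calls(reference, read):
    """Return (position, is_methylated) for every reference cytosine covered by the read."""
    calls = []
    for pos, (ref, obs) in enumerate(zip(reference, read)):
        if ref != "C":
            continue
        if obs == "C":
            calls.append((pos, True))    # protected from conversion: methylated
        elif obs == "T":
            calls.append((pos, False))   # converted: unmethylated
        # any other base is treated as a sequencing error and skipped
    return calls

if __name__ == "__main__":
    template = "ACxGC"                   # second cytosine is methylated
    read = bisulfite_convert(template)
    print(read)
    print(methylation_calls("ACCGC", read))
```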
The fourth step may be performed before the second step, and it will be apparent to those skilled in the art that the same result as the method of the present invention is obtained even if the fourth step precedes the second step.
The fifth step is a step of performing PCR using the product of the fourth step as a template, with a primer pair that binds to both ends of the template.
As used herein, the term "amplification product" refers to the product of PCR performed with primers on the ligation product of adapters and cleaved DNA, and it may comprise the cleaved, inserted DNA and the adapters.
The primer pair of the fifth step can bind to both ends of the product prepared in the fourth step. The primers may be, but are not limited to, primers to which sequences suitable for next generation sequencing have been added.
In one embodiment of the present invention, a library for NGS was prepared using a primer pair containing sequences suitable for next generation sequencing (FIG. 3).
An NGS process may additionally be performed after the fifth step.
In a specific embodiment of the present invention, RRBS analysis of mouse spleen genomes at different stages of aging was performed to compare the effect of bisulfite sequencing using molecularly labeled adapters. Methylation levels appeared higher in samples of every age when the molecular labels were ignored than when they were taken into account, and the decline in methylation level with aging was more pronounced when the analysis took the molecular labels into account than when it did not (difference in correlation coefficient (R²): 0.033 vs 0.596). In addition, within the same age group the spread of methylation levels was much smaller when the molecular labels were taken into account than when they were not (SD: 0.012 to 0.020 vs 0.038 to 0.050) (FIG. 6). Thus, bisulfite sequencing with molecularly labeled adapters yields more homogeneous measurements and higher-quality data.
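One way in which an analysis might take the molecular labels into account is sketched below in Python. The text does not specify the procedure actually used, so the following is an assumption: reads sharing the same position, strand, and molecular label are treated as copies of one template and collapsed to a single majority-vote call before the methylation level is computed. The data layout and function name are hypothetical.

```python
from collections import defaultdict

def methylation_level(reads, use_tags):
    """Fraction of methylated calls at a site.

    reads    : non-empty list of (position, strand, tag, is_methylated) tuples
    use_tags : if True, collapse reads sharing (position, strand, tag) into one template
    """
    if not reads:
        raise ValueError("at least one read is required")
    if not use_tags:
        calls = [m for _, _, _, m in reads]
        return sum(calls) / len(calls)
    groups = defaultdict(list)
    for pos, strand, tag, m in reads:
        groups[(pos, strand, tag)].append(m)
    # One consensus call per template; a tie is counted as unmethylated.
    consensus = [2 * sum(v) > len(v) for v in groups.values()]
    return sum(consensus) / len(consensus)

if __name__ == "__main__":
    # Four PCR copies of one methylated template and one copy of an unmethylated one.
    reads = [(100, "top", "AGTTGAAG", True)] * 4 + [(100, "top", "TTGAGATA", False)]
    print(methylation_level(reads, use_tags=False))
    print(methylation_level(reads, use_tags=True))
```

In this toy input the uncollapsed estimate is dominated by the over-amplified template, while the collapsed estimate counts each template once.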
Hereinafter, the present invention is described in more detail by way of Examples. These Examples are intended only to illustrate the present invention, and the scope of the present invention is not limited to them.
Example 1: Preparation of adapters
1-1. Preparation of adapters A and B
도 1에 도시된 구조와 같이 어댑터를 준비하였다. 구체적으로, 제한효소 절단면과 상보적 말단을 갖는 부분적 이중가닥 어댑터 2종을 준비하였으며 각각의 어댑터 A 및 B는 다음과 같은 특징을 가지도록 하였다.Adapters were prepared as shown in FIG. 1. Specifically, two partial double-stranded adapters having a restriction enzyme cleavage plane and complementary ends were prepared, and each of the adapters A and B had the following characteristics.
먼저, 두 어댑터는 절단된 DNA의 양 말단에 상보적으로 결합하며, 어댑터 A는 5' 방향으로, 어댑터 B는 3' 방향으로 각각 결합한다. 이 때 DNA 결찰(ligation)을 통해 한쪽 가닥이 절단된 DNA의 말단과 공유결합을 형성하도록 하였다. 그에 따라, 각각의 어댑터 서열에 결합하는 primer를 통해 절단된 DNA (insert)의 증폭이 가능하도록 하였다.First, the two adapters complementarily bind to both ends of the cleaved DNA, with adapter A binding in the 5 'direction and adapter B in the 3' direction, respectively. At this time, one strand was formed to form covalent bonds with the ends of the cut DNA through DNA ligation. Accordingly, the amplification of the cut DNA (insert) through the primer binding to each adapter sequence was made possible.
어댑터 A는 두 개의 올리고뉴클레오티드 Long-A와 Short-A의 상보적 결합으로 구성되며, 이중가닥 부위 (DS-A), 시프트(Sft), 분자표지 (M-tag), 프라이머 결합부위 (PR-siteA)를 갖도록 구성되었다. Adapter A consists of complementary binding of two oligonucleotides Long-A and Short-A, double-stranded (DS-A), shift (Sft), molecular label (M-tag), primer binding site (PR- siteA).
어댑터 A의 Long-A에서, PR-siteA에 포함된 시토신(cytosine) 염기는 모두 메틸화(methylation)된 시토신을 사용함으로서 이후 바이설파이트(bisulfite) 처리에 의해 C->T 변이가 일어나지 않도록 조절하였다. In Long-A of adapter A, the cytosine bases contained in PR-siteA were all controlled by using methylated cytosine to prevent C-> T variation by subsequent bisulfite treatment. .
The M-tag region of adapter A carries the molecular label, an 8-base sequence composed randomly of the three bases other than cytosine; templates are distinguished from one another by the identity of their molecular labels. The molecular label may alternatively use all four bases, including methyl cytosine, and its length is not limited to 8 bases.
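The composition of the molecular label described above can be sketched in a few lines of Python. This is only an illustration: the function names `random_m_tag` and `tag_space` are hypothetical, and uniform sampling of the three bases is an assumption about how a degenerate oligonucleotide synthesis behaves.

```python
import random

TAG_ALPHABET = "AGT"  # cytosine is excluded from the M-tag in this Example
TAG_LENGTH = 8        # label length used in this Example


def random_m_tag(rng: random.Random, length: int = TAG_LENGTH) -> str:
    """Draw one molecular label from the cytosine-free alphabet."""
    return "".join(rng.choice(TAG_ALPHABET) for _ in range(length))


def tag_space(length: int = TAG_LENGTH, alphabet: str = TAG_ALPHABET) -> int:
    """Number of distinct labels available for a given length and alphabet."""
    return len(alphabet) ** length


rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
example_tag = random_m_tag(rng)
print(example_tag, tag_space())
```

With three bases and eight positions there are 3^8 = 6561 possible labels; using all four bases, as the text permits, would give 4^8 = 65536.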
The Sft (shift) of adapter A places nucleotide stretches of different lengths between the M-tag and DS-A. It is a device for preventing the side effect in which the Illumina sequencing platform, on reading the same nucleotide in most clusters during the initial sequencing reaction cycles, judges the sample to be faulty and stops the run. In this Example, four Long-A species were used, carrying G, GT, GTG, or GTAG, respectively, at the shift position.
PR-siteA of Long-A includes the primer binding site for single-end reading on the Illumina sequencing platform, so every digested DNA (insert) is sequenced starting from the end ligated to adapter A.
The DS-A region of adapter A is designed so that, depending on the strand, bisulfite treatment either converts its cytosines to thymine (top strand, Long-A) or leaves them unchanged (bottom strand, Short-A).
Owing to these structural features of adapter A, a read from the top strand gives the bisulfite-converted original top (OT) strand sequence, and a read from the bottom strand gives the sequence complementary to the bisulfite-converted original bottom strand (complementary to original bottom, CTOB).
Short-A of adapter A consists only of DS-A and anneals to each of the four Long-A species having different Sft sequences, giving four versions of adapter A in total.
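The layout of adapter A implies that each read begins with the M-tag, followed by the shift, DS-A, and then the insert. The following Python sketch splits a read on that assumption. The DS-A sequence used here (`DS_A`) is a hypothetical placeholder, not the sequence of SEQ ID NO: 1, and `parse_read` is an illustrative name; reads whose DS-A is only partially converted are not handled and return `None`.

```python
from typing import NamedTuple, Optional

M_TAG_LENGTH = 8
SHIFTS = ("G", "GT", "GTG", "GTAG")  # the four shifts used in this Example
DS_A = "CACTCAGC"  # hypothetical placeholder, NOT the real DS-A sequence


class ParsedRead(NamedTuple):
    m_tag: str
    shift: str
    ds_a: str
    insert: str


def parse_read(read: str, ds_a: str = DS_A) -> Optional[ParsedRead]:
    """Split a read into M-tag, shift, DS-A and insert, or return None."""
    m_tag, rest = read[:M_TAG_LENGTH], read[M_TAG_LENGTH:]
    if len(m_tag) < M_TAG_LENGTH or "C" in m_tag:
        return None  # the label is expected to be cytosine-free
    converted = ds_a.replace("C", "T")  # DS-A as read after full conversion
    for shift in SHIFTS:
        if not rest.startswith(shift):
            continue
        after_shift = rest[len(shift):]
        # DS-A anchors the parse, which resolves G / GT / GTG / GTAG prefixes
        for form in (ds_a, converted):
            if after_shift.startswith(form):
                return ParsedRead(m_tag, shift, form, after_shift[len(form):])
    return None


parsed = parse_read("AAGTATGG" + "GTG" + "TATTTAGT" + "CGGATT")
print(parsed)
```

Because the four shifts are prefixes of one another, the shift cannot be identified from its own bases alone; the sketch relies on finding DS-A immediately after it.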
Next, adapter B consists of two oligonucleotides, Long-B and Short-B, and has a primer binding site for amplification; this primer binding site may include the double-stranded region of the adapter.
Long-B of adapter B is covalently joined to the digested DNA (insert DNA) by DNA ligation, and all of its cytosines are methylated so that no bisulfite-induced base change (C->T) occurs.
1-2. Detailed Adapter Preparation Procedure
The Long and Short oligonucleotides of each adapter were synthesized to order by Genotech Corp. Equal amounts of the Long and Short oligonucleotides, each at a concentration of 100 pmol/μl, were mixed. The mixture was held at 97°C for 2 minutes and then cooled to 25°C at a rate of 1°C per cycle per minute to induce complementary annealing of the two sequences, yielding the partially double-stranded adapters A and B.
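The annealing program can be written out as a list of time and temperature points. The sketch below assumes that "1°C per cycle per minute" means one 1°C step per one-minute cycle; the function name `annealing_profile` and its parameters are illustrative only.

```python
def annealing_profile(start_c=97, end_c=25, hold_min=2, step_c=1, step_min=1):
    """Return (elapsed_minutes, temperature_C) points for the cooling ramp."""
    profile = [(0, start_c), (hold_min, start_c)]  # initial hold at 97 C
    elapsed, temp = hold_min, start_c
    while temp > end_c:
        temp -= step_c       # one 1 C step ...
        elapsed += step_min  # ... per one-minute cycle (assumed reading)
        profile.append((elapsed, temp))
    return profile


profile = annealing_profile()
total_minutes = profile[-1][0]
print(total_minutes)
```

Under this reading the ramp takes 72 steps, so the whole program lasts 2 + 72 = 74 minutes.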
Example 2: Preparation of a Reduced Representation Bisulfite Sequencing (RRBS) Library for Sequencing
An RRBS library for sequencing was prepared as shown in FIG. 2.
2-1. DNA Digestion and Adapter Ligation
First, 100 ng of each of four mouse genomic DNA samples was digested with the restriction enzyme MspI at 37°C for 4 hours.
The digested DNA was purified with a purification kit (Expin™ PCR SV, GeneAll) and dissolved in 30 μl of water, and the entire solution was used for adapter ligation. Specifically, each of the four digested DNA samples was ligated with an adapter A carrying a different shift (Sft). Depending on which adapters join its two ends, each digested DNA fragment yields one of three ligation products: adapter A at both ends, adapter B at both ends, or a different adapter at each end. In theory these form in a ratio of 1:1:2.
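The theoretical 1:1:2 ratio follows from each fragment end receiving adapter A or adapter B independently and with equal probability. The short Python sketch below enumerates the four end combinations under that assumption; `ligation_product_fractions` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ligation_product_fractions(p_a: Fraction = Fraction(1, 2)) -> dict:
    """Expected fraction of A/A, A/B and B/B products for independent ends."""
    p = {"A": p_a, "B": 1 - p_a}
    fractions = Counter()
    for left, right in product("AB", repeat=2):
        key = "".join(sorted(left + right))  # A/B and B/A are the same product
        fractions[key] += p[left] * p[right]
    return dict(fractions)


expected = ligation_product_fractions()
print(expected)
```

With equal adapter probabilities the A/A, B/B and A/B products occur at 1/4, 1/4 and 1/2, that is, 1:1:2.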
2-2. Fill-in of Adapter Ends
The ligation product was purified with the purification kit (Expin™ PCR SV, GeneAll) and dissolved in 30 μl of water, and 15 μl of this was used for end fill-in. Of the four dNTP substrates of the polymerase, dCTP was replaced with methylated met-dCTP so that the filled-in sequence would not undergo base conversion during the subsequent bisulfite treatment. Because the Short oligonucleotides of both adapters lack a 5' phosphate, they are not ligated to the digested DNA; instead, the fill-in reaction synthesizes a sequence complementary to the Long oligonucleotide, and this sequence is not changed by bisulfite treatment. Furthermore, the DS-A region of the Long-A oligonucleotide contains unmethylated cytosines and therefore undergoes C->T conversion on bisulfite treatment, whereas its complementary sequence does not. Sequencing of this region can thus be used to distinguish the two strands of the digested DNA.
2-3. Bisulfite Conversion and PCR
Next, the four adapter-ligated DNA samples were pooled and purified with the purification kit (Expin™ PCR SV, GeneAll), and bisulfite conversion was carried out with a bisulfite kit (EpiTect Bisulfite Kit, Qiagen). Unmethylated cytosines were thereby deaminated (to uracil, which is copied as thymine during PCR) and thus converted to thymine.
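The effect of bisulfite conversion on a sequence can be simulated in silico. The following Python sketch is a simplified model that assumes complete conversion of every unmethylated cytosine and full protection of methylated ones; `bisulfite_convert` and the 0-based `methylated_positions` argument are illustrative.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Replace unmethylated C with T; methylated C (by index) is kept."""
    converted = []
    for index, base in enumerate(seq):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated cytosine reads as thymine
        else:
            converted.append(base)
    return "".join(converted)


# Only the cytosine at index 1 is treated as methylated in this toy input.
print(bisulfite_convert("ACGTCCGG", {1}))
```

In the toy input the cytosine at index 1 is protected, so the result is ACGTTTGG.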
The bisulfite-treated DNA was purified and dissolved in 20 μl of water, and 7 μl of this was amplified by PCR to produce the NGS library. The PCR primers contain an index sequence for distinguishing samples and amplify the library by binding to the PR-sites of the two adapters. The amplification product was designed so that the sequences of the M-tag, Sft, DS-A, and digested DNA can be determined on the Illumina sequencing platform.
Of the three types of adapter ligation product, those with the same adapter at both ends acquire relatively long, mutually complementary sequences at their two ends during PCR. When such DNA is separated into single strands during PCR, the two ends anneal to each other and form a pan-holder structure to which the primers cannot bind, so PCR amplification is strongly suppressed. Products with a different adapter at each end are amplified normally and therefore make up most of the amplification product.
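Whether a single strand can fold back on itself in this way can be checked by testing if its 3' end is the reverse complement of its 5' end. The Python sketch below does this for toy sequences; the adapter and insert sequences are hypothetical placeholders, and exact complementarity over a fixed length is a simplification of real annealing behavior.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_form_pan_holder(strand: str, end_length: int) -> bool:
    """True if the last end_length bases can pair with the first end_length."""
    head = strand[:end_length]
    tail = strand[-end_length:]
    return tail == reverse_complement(head)


ADAPTER_A = "GATTACAGGT"  # hypothetical adapter sequences, for illustration
ADAPTER_B = "CCATGTTAGC"
INSERT = "CGGAATTTCCG"

same_ends = ADAPTER_A + INSERT + reverse_complement(ADAPTER_A)
mixed_ends = ADAPTER_A + INSERT + reverse_complement(ADAPTER_B)
print(can_form_pan_holder(same_ends, 10), can_form_pan_holder(mixed_ends, 10))
```

A strand flanked by the same adapter at both ends passes the check, whereas a strand flanked by two different adapters does not.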
The PCR was run for 25 cycles, each consisting of 95°C for 20 seconds, 58°C for 40 seconds, and 68°C for 60 seconds. Electrophoresis of the resulting PCR product confirmed that digested DNA fragments of different sizes were amplified evenly (FIG. 3, middle and right lanes). The NGS library DNA produced by this PCR was purified and sequenced on the Illumina NextSeq 500 platform.
Example 3: Analysis of Sequencing Results
Single-end reading of 150 bases produced about 20 gigabases in 131 million reads. Of these, 101 million reads (77%) had the normal M-tag, Sft, and DS-A structure, and the reads could be sorted by sample according to their Sft sequences.
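The reported yield figures are mutually consistent, as the following arithmetic check in Python shows. The variable names are illustrative, and the read counts are the rounded values given in the text.

```python
reads = 131_000_000        # total reads reported
read_length = 150          # bases per single-end read
valid_reads = 101_000_000  # reads with normal M-tag, Sft and DS-A structure

total_gigabases = reads * read_length / 1e9
valid_fraction = valid_reads / reads

print(f"{total_gigabases:.2f} Gb, {valid_fraction:.1%} valid")
```

The product of read count and read length is 19.65 gigabases, which rounds to the stated 20 gigabases, and 101/131 rounds to the stated 77%.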
Determining the DS-A sequence for each sample revealed reads in which all cytosines had been converted to thymine by bisulfite treatment (OT), reads in which all cytosines remained unchanged (CTOB), and reads in which only some cytosines were converted, at proportions of 44.4%, 42.2%, and 13.4%, respectively. OT reads represent the sequence of the strand to which Long-A of adapter A was ligated, and CTOB reads represent the sequence complementary to the bottom strand.
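The three DS-A categories can be assigned by comparing the observed DS-A bases with the designed sequence at its cytosine positions. The Python sketch below assumes a hypothetical DS-A sequence (`DS_A`, not the sequence of SEQ ID NO: 1) that contains at least one cytosine; `classify_ds_a` and the extra "mismatch" category for reads that differ elsewhere are illustrative additions.

```python
DS_A = "CACTCAGC"  # hypothetical placeholder, NOT the real DS-A sequence


def classify_ds_a(observed: str, ds_a: str = DS_A) -> str:
    """Return 'OT', 'CTOB', 'partial' or 'mismatch' for an observed DS-A."""
    if len(observed) != len(ds_a):
        return "mismatch"
    converted = unconverted = 0
    for designed, seen in zip(ds_a, observed):
        if designed == "C":
            if seen == "T":
                converted += 1
            elif seen == "C":
                unconverted += 1
            else:
                return "mismatch"
        elif designed != seen:
            return "mismatch"
    if unconverted == 0:
        return "OT"    # every cytosine converted: original top strand
    if converted == 0:
        return "CTOB"  # no cytosine converted: complement of bottom strand
    return "partial"


print(classify_ds_a("TATTTAGT"), classify_ds_a("CACTCAGC"), classify_ds_a("TACTTAGC"))
```

Applied to every read of a sample, the counts per category would give proportions of the kind reported above.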
Next, the adapter B sequence was removed from the reads of each sample with the Trim Galore tool, and the reads were mapped to the reference genome with the Bismark tool, with the results written as SAM output.
The SAM file was parsed with a Perl script, and for each mapped locus the sequence reads were aligned together with their methylation call strings; the results for a specific genomic locus are shown in FIGS. 4a and 4b. Specifically, the first line gives the chromosome (chr6) and the start position (90276000) of the mapped locus. The second and third lines give the top (T_Ref) and bottom (B_Ref) strand sequences of the reference genome, shown as complements of each other. The following lines list, in descending order of duplicate count, the distinct sequences (Seq) of the reads mapped to the locus, with duplicates collapsed, each followed by its cytosine methylation call (Met). The methylation call for each cytosine is written as Z, X, or H according to its sequence context (CG, CHG, or CHH, respectively), in upper case if methylated and in lower case if unmethylated. On the same line, the methylation call is followed by the template origin of the sequence (OT or CTOB), the duplicate count of the sequence, and each molecular label attached to the sequence together with the number of reads carrying that label. Bases that differ from the sequence with the highest duplicate count are shown in red.
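The methylation call notation can be illustrated with a short Python sketch. This is not the Perl script used in the Example and does not reproduce Bismark itself; it handles only reads aligned without gaps to the top reference strand (OT), writes "." for positions that are not callable (an assumption made here), and uses the illustrative name `methylation_calls`.

```python
def methylation_calls(reference: str, read: str) -> str:
    """Build a Z/X/H style call string for an ungapped OT-strand alignment."""
    calls = []
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C" or read_base not in "CT":
            calls.append(".")  # not a cytosine, or not a C/T observation
            continue
        context = reference[i + 1 : i + 3]  # up to two downstream bases
        if context[:1] == "G":
            symbol = "z"  # CG context
        elif len(context) == 2 and context[1] == "G":
            symbol = "x"  # CHG context
        elif len(context) == 2:
            symbol = "h"  # CHH context
        else:
            calls.append(".")  # context runs off the end of the reference
            continue
        # C retained in the read means methylated, hence upper case
        calls.append(symbol.upper() if read_base == "C" else symbol)
    return "".join(calls)


print(methylation_calls("ACGTCAGCTTA", "ACGTTAGTTTA"))
```

For this toy alignment the call string is `.Z..x..h...`: a methylated CG, an unmethylated CHG, and an unmethylated CHH cytosine.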
Among the aligned sequences, those carrying the same molecular label are duplicate data derived from a single template, and the few base differences among them arose during reactions such as PCR or sequencing. A representative sequence for the template can therefore be inferred from the duplicate counts of whole sequences or of the bases at each position. A representative methylation call can likewise be derived from duplicate counts for the representative sequence.
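One way to derive such a representative sequence is a per-position majority vote among reads that share a molecular label. The Python sketch below assumes that all reads of one label have the same length and are already aligned; `consensus_by_tag` is an illustrative name, and ties are resolved by whichever base is encountered first, which a real pipeline would need to handle more carefully.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads):
    """Map each molecular label to (consensus sequence, duplicate count).

    reads: iterable of (m_tag, sequence) pairs.
    """
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    result = {}
    for tag, seqs in groups.items():
        # zip(*seqs) yields one column of bases per aligned position
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
        result[tag] = (consensus, len(seqs))
    return result


toy_reads = [
    ("AAGTATGG", "TGGATT"),
    ("AAGTATGG", "TGGATT"),
    ("AAGTATGG", "TGAATT"),  # one read with a base change at position 3
    ("GTTAGAGT", "TGGGTT"),
]
print(consensus_by_tag(toy_reads))
```

In the toy data the single deviating base is outvoted, so the label AAGTATGG is represented by TGGATT with a duplicate count of 3.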
The data of FIGS. 4a and 4b were rearranged by determining the consensus sequences for OT and CTOB and indicating the duplicate count for each molecular label of those sequences; the result is shown in FIG. 5.
At this genomic locus, sequence-resolved methylation information was obtained for three templates in total: one carries information for both the top and bottom strands, and the other two carry information for the bottom strand only.
If, unlike the adapters of this Example, the adapters carried no molecular label, the number of templates from which the sequences at each locus were derived could not be determined, and each sequenced read would have to be treated and analyzed as independent. Information would then be regarded as having been obtained from 86 templates at this locus (the total number of reads mapped to the locus in FIGS. 4a and 4b when the molecular labels are ignored), so the data could be overinterpreted or distorted.
The results of this Example therefore show that, when reads mapped to the same locus carry the same molecular label on both the OT and CTOB strands (AAGTATGG in FIG. 5), the top and bottom strands of the same double-stranded template can be regarded as having been sequenced together, and features such as hemizygosity of methylation (one strand methylated, the other not) can be identified for that template (red box in FIG. 5). In addition, because identical molecular labels indicate origin from a single template, base changes introduced during the various reactions can be corrected on the basis of duplicate counts. Consequently, the method of the above Examples allows the DNA sequence of a subject to be read rapidly and the presence of mutations to be determined easily and accurately.
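Counting templates rather than reads, and finding labels seen on both strands, amounts to grouping the reads of a locus by molecular label. The Python sketch below shows this with invented toy reads; only the label AAGTATGG is taken from the text, the read counts do not reproduce FIG. 4 or FIG. 5, and `summarize_locus` is an illustrative name.

```python
from collections import defaultdict


def summarize_locus(reads):
    """Summarize one locus from (m_tag, strand) pairs, strand in {'OT','CTOB'}."""
    strands_by_tag = defaultdict(set)
    n_reads = 0
    for tag, strand in reads:
        strands_by_tag[tag].add(strand)
        n_reads += 1
    duplex_tags = sorted(
        tag for tag, strands in strands_by_tag.items()
        if strands == {"OT", "CTOB"}  # both strands of one template observed
    )
    return {
        "reads": n_reads,
        "templates": len(strands_by_tag),
        "duplex_tags": duplex_tags,
    }


toy_locus = (
    [("AAGTATGG", "OT")] * 2
    + [("AAGTATGG", "CTOB")] * 3
    + [("GTTAGAGT", "CTOB")]
    + [("TGAGGTTA", "CTOB")] * 2
)
print(summarize_locus(toy_locus))
```

Here eight reads collapse to three templates, one of which is covered on both strands and could therefore be examined for strand-specific methylation.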
Example 4: RRBS Analysis of Mouse Spleen Genomes at Different Stages of Aging
Genomic DNA was extracted from splenocytes of mice aged 2, 6, 12, and 23 months, and methylation profiles were determined by the procedures of Examples 1 to 3. To assess the effect of the molecular label, the methylation level of each CpG site was calculated from the NGS reads both with and without taking the molecular labels into account. When the methylation level of the whole genome was determined over all sequenced sites, the two analyses differed markedly (FIG. 6). First, in samples of every age, the methylation level was higher when the molecular labels were ignored than when they were taken into account. Second, the age-related decrease in methylation level was more pronounced when the molecular labels were taken into account (correlation coefficient R²: 0.033 vs. 0.596, the larger value corresponding to the label-aware analysis). Finally, the spread of methylation levels within each age group was much smaller when the molecular labels were taken into account than when they were not (SD: 0.012 to 0.020 vs. 0.038 to 0.050). These data show that the analysis that takes molecular labels into account secures better homogeneity of measurements within the same experimental group and provides data of sufficient quality to reveal biological meaning.
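The difference between the two analyses can be illustrated for a single CpG site. The Python sketch below computes a read-level methylation value (labels ignored) and a template-level value (one majority call per molecular label). The data are invented, `methylation_levels` is an illustrative name, and counting a tie within a label as unmethylated is an arbitrary choice of this sketch, not a rule from the Example.

```python
from collections import defaultdict


def methylation_levels(observations):
    """Return (read_level, template_level) for one CpG site.

    observations: iterable of (m_tag, is_methylated) pairs, one per read.
    """
    obs = list(observations)
    # Label-blind estimate: every read counts as an independent observation.
    read_level = sum(methylated for _, methylated in obs) / len(obs)
    calls_by_tag = defaultdict(list)
    for tag, methylated in obs:
        calls_by_tag[tag].append(methylated)
    # Label-aware estimate: one majority call per template.
    template_calls = [
        sum(calls) * 2 > len(calls) for calls in calls_by_tag.values()
    ]
    template_level = sum(template_calls) / len(template_calls)
    return read_level, template_level


toy_site = (
    [("AAGTATGG", True)] * 8   # one heavily amplified methylated template
    + [("GTTAGAGT", False)]    # two unmethylated templates, one read each
    + [("TGAGGTTA", False)]
)
read_level, template_level = methylation_levels(toy_site)
print(read_level, template_level)
```

In this toy case an over-amplified methylated template raises the read-level estimate to 0.8, whereas the template-level estimate is 1/3, which shows how uneven amplification can bias a label-blind analysis.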
From the above description, those skilled in the art to which the present invention pertains will understand that the invention can be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention should be construed as covering, rather than the above detailed description, the meaning and scope of the claims set out below and all changes or modifications derived from their equivalents.

Claims (12)

  1. A bisulfite sequencing method comprising the following first to fifth steps:
    (1) a first step of cleaving genomic DNA extracted from a subject so as to produce cut ends capable of ligating to adapters;
    (2) a second step of ligating two adapters, namely partially double-stranded adapters A and B each having an end complementary to the cut end of the cleaved DNA, to the cleaved DNA;
    (3) a third step of filling in the single-stranded ends of the adapters using a DNA polymerase;
    (4) a fourth step of treating the product of the third step with bisulfite to convert unmethylated cytosine to thymine; and
    (5) a fifth step of performing PCR on the product of the fourth step as a template, using a primer pair that binds to both ends of the template.
  2. The method of claim 1, wherein adapter A of the second step is formed by complementary annealing of a Long-A oligonucleotide and a Short-A oligonucleotide having a sequence complementary to Long-A, the Long-A oligonucleotide comprising: a double-stranded region; a primer binding site for sequence reading on an NGS sequencing platform; and a molecular label of 4 to 20 bases composed randomly either of the four bases methyl cytosine, adenine, guanine, and thymine or of the three bases adenine, guanine, and thymine, and wherein the primer binding site uses methylated cytosine in place of cytosine so that it is protected from modification by bisulfite treatment.
  3. The method of claim 2, wherein the Long-A oligonucleotide further comprises a shift located between the molecular label and the double-stranded region, or in front of the molecular label, the shift consisting of nucleotide sequences of different lengths.
  4. The method of claim 3, wherein the shift consists of the nucleotide sequence G, GT, GTG, or GTAG.
  5. The method of claim 1, wherein adapter B of the second step is formed by complementary annealing of a Long-B oligonucleotide, which comprises a primer binding site for amplification and in which all cytosines are methylated, and a Short-B oligonucleotide.
  6. The method of claim 2, wherein the Long-A oligonucleotide consists of the sequence of SEQ ID NO: 1, and the Short-A oligonucleotide consists of the sequence of SEQ ID NO: 2.
  7. The method of claim 5, wherein the Long-B oligonucleotide consists of the sequence of SEQ ID NO: 3, and the Short-B oligonucleotide consists of the sequence of SEQ ID NO: 4.
  8. The method of claim 1, wherein the fill-in of the third step is performed using methyl-dCTP in place of dCTP.
  9. The method of claim 1, wherein the fourth step is performed before the second step.
  10. The method of claim 1, wherein, when the same type of adapter is ligated to both ends of the cleaved DNA in the second step, a pan-holder structure is formed by complementary binding between the adapters.
  11. The method of claim 1, wherein the fifth step acts on strands in which different adapters are ligated to the two ends of the cleaved DNA.
  12. The method of claim 1, further comprising performing next-generation sequencing (NGS) after the fifth step.
PCT/KR2019/004072 2018-04-05 2019-04-05 Molecule-indexed bisulfite sequencing WO2019194640A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180039781A KR102342490B1 (en) 2018-04-05 2018-04-05 Molecularly Indexed Bisulfite Sequencing
KR10-2018-0039781 2018-04-05

Publications (1)

Publication Number Publication Date
WO2019194640A1 true WO2019194640A1 (en) 2019-10-10

Family

ID=68100898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/004072 WO2019194640A1 (en) 2018-04-05 2019-04-05 Molecule-indexed bisulfite sequencing

Country Status (2)

Country Link
KR (1) KR102342490B1 (en)
WO (1) WO2019194640A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240031934A (en) * 2022-09-01 2024-03-08 주식회사 키오믹스 Composition for selective amplifying multiple target DNA and the method of thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090047680A1 (en) * 2007-08-15 2009-02-19 Si Lok Methods and compositions for high-throughput bisulphite dna-sequencing and utilities
US20090148842A1 (en) * 2007-02-07 2009-06-11 Niall Gormley Preparation of templates for methylation analysis
KR20110111507A (en) * 2009-01-30 2011-10-11 옥스포드 나노포어 테크놀로지즈 리미티드 Adaptors for nucleic acid constructs in transmembrane sequencing
KR101651817B1 (en) * 2015-10-28 2016-08-29 대한민국 Primer set for Preparation of NGS library and Method and Kit for making NGS library using the same
KR20160111403A (en) * 2014-01-07 2016-09-26 푼다시오 프리바다 인스티튜트 데 메디시나 프레딕티바 이 페르소나리짜다 델 카세르 Methods for generating double stranded dna libraries and sequencing methods for the identification of methylated cytosines

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101254663B1 (en) 2010-04-14 2013-05-06 대한민국 The method for detecting resistance mutations of influenza virus gene against anti-viral agents using pyrosequencing
KR20170133270A (en) * 2016-05-25 2017-12-05 주식회사 셀레믹스 Method for preparing libraries for massively parallel sequencing using molecular barcoding and the use thereof

Also Published As

Publication number Publication date
KR20190116773A (en) 2019-10-15
KR102342490B1 (en) 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19780809

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19780809

Country of ref document: EP

Kind code of ref document: A1