WO2018110940A1 - Method for measuring complexity of library for next generation sequencing - Google Patents

Method for measuring complexity of library for next generation sequencing

Info

Publication number
WO2018110940A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
library
polynucleotide
complexity
pcr
Prior art date
Application number
PCT/KR2017/014549
Other languages
French (fr)
Korean (ko)
Inventor
정종석
손대순
박웅양
Original Assignee
삼성전자 주식회사
사회복지법인 삼성생명공익재단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자 주식회사, 사회복지법인 삼성생명공익재단
Publication of WO2018110940A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/6851: Quantitative amplification
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2525/00: Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
    • C12Q2525/10: Modifications characterised by
    • C12Q2525/191: Modifications characterised by incorporating an adaptor

Definitions

  • the present invention relates to a method for measuring complexity of a library for next generation nucleic acid sequencing and a device using the same.
  • NGS: next generation sequencing
  • QC: quality control
  • QC is performed prior to entering the nucleic acid sequencing, and it is determined whether to proceed with nucleic acid sequencing with the prepared library.
  • QC is performed by a method provided by the manufacturer of the library.
  • after nucleic acid sequencing, the generated nucleic acid sequence data (i.e., reads) are analyzed to measure the quality of the generated data before downstream analyses such as mutation, genetic variation, and gene expression analysis.
  • quality is measured at each stage of next-generation nucleic acid sequencing, and one of the factors that determines quality is the complexity of the library.
  • the complexity of the library can be measured after library preparation, during nucleic acid sequencing, or after nucleic acid sequencing is completed, in order to decide whether to perform or stop nucleic acid sequencing and whether to use the generated nucleic acid sequence data.
  • according to one aspect, the method comprises fragmenting nucleic acid extracted from a target sample;
  • the method for measuring the complexity of a library for nucleic acid sequencing comprises calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.
  • Nucleic acid sequencing of the “nucleic acid sequencing library” may be next generation sequencing (NGS).
  • the term “next generation sequencing (NGS)” may be used interchangeably with the term “massive parallel sequencing” or the term “second-generation sequencing”.
  • NGS refers to a technique in which a full-length genome is fragmented in a chip-based and PCR-based paired-end format, and the fragments are sequenced at very high speed based on hybridization.
  • NGS is a technique for sequencing a large number of nucleic acid fragments simultaneously, and can be used to perform NGS-based targeted sequencing or panel sequencing.
  • NGS may be performed using, for example, the 454 platform (Roche), GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer, Solexa platform, SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences Heliscope, Pacific Biosciences' single molecule real time (SMRT™) technology, or a combination thereof.
  • library refers to a collection of nucleic acid fragments.
  • the library is, for example, a genomic library, a complementary DNA library, or a randomized mutant library.
  • library complexity refers to the number of unique fragments that exist in that library. Complexity may be influenced by the amount of nucleic acid that is a starting material, the amount of nucleic acid lost during library preparation, the amount of nucleic acid amplified by PCR, and the like. The complexity of the library can be expressed at a relative level.
  • the method includes fragmenting the nucleic acid extracted from the target sample.
  • the target sample can be from an individual or a cell.
  • the subject may be a mammal, including humans, cattle, horses, pigs, sheep, goats, dogs, cats, and rodents.
  • the cell may be a cell or cell line derived from an individual.
  • the target sample may be a biological sample.
  • the biological sample may be obtained from, for example, blood, plasma, serum, urine, saliva, mucosal secretions, sputum, feces, tears, or a combination thereof.
  • the biological sample may be a sample of eukaryotic cells, prokaryotic cells, viruses, bacteriophage, or the like derived from various species.
  • the nucleic acid may be a genome or a fragment thereof.
  • the term “genome” is a term that collectively refers to the entirety of a chromosome, chromatin, or gene.
  • the genome or fragment thereof may be isolated DNA, e.g., cell-free DNA (cfDNA). Nucleic acids can be extracted or separated from a target sample by methods known to those skilled in the art.
  • the fragmentation may be a physical, chemical or enzymatic cleavage of the genome.
  • fragmentation is cleavage of the genome with restriction enzymes.
  • the method may further comprise selecting the size of the fragmented nucleic acid.
  • the step of size selection can be performed by electrophoresis, centrifugation, chromatography, or a combination thereof.
  • the fragmented nucleic acid has a length of about 10 bp (base pairs) to about 2000 bp, about 20 bp to about 1500 bp, about 50 bp to about 1000 bp, about 100 bp to about 800 bp, about 150 bp to about 600 bp, or about 300 bp to about 600 bp.
  • the method includes ligating a first polynucleotide at one or more ends of the fragmented nucleic acid to produce a first library for nucleic acid sequencing.
  • the preparing of the first library may further include end-repair and 3′-adenosine tailing of the fragmented nucleic acid.
  • the first polynucleotide may be an adapter.
  • the adapter may be a polynucleotide comprising a primer sequence for enriching a target nucleic acid in NGS.
  • the adapter may be a polynucleotide known to those skilled in the art.
  • the adapter may comprise a universal sequence for nucleic acid sequencing. For example, it is an adapter included in a library preparation kit for nucleic acid sequencing.
  • Ligation refers to the binding of ends between nucleic acid fragments.
  • the ligation can be performed using DNA ligase.
  • the first library may be a library prepared for nucleic acid sequencing.
  • the method includes preparing a second library by spiking a second polynucleotide to the first library.
  • the spike may be to mix the first library with a small amount of the second polynucleotide.
  • the second polynucleotide may comprise a first region comprising a nucleic acid sequence of at least two consecutive nucleotides identical to a target nucleic acid sequence, and, at at least one end of the first region, a second region comprising a nucleic acid sequence that differs from at least two consecutive nucleotides at the corresponding end of the target nucleic acid sequence.
  • the target nucleic acid sequence may comprise genetic variation used for companion diagnostics (CDx).
  • CDx: companion diagnostics
  • the length of the second polynucleotide is about 20 nucleotides (hereinafter referred to as 'nt') to about 500 nt, about 30 nt to about 450 nt, about 40 nt to about 400 nt, about 50 nt to about 350 nt, about 60 nt to about 300 nt, about 70 nt to about 250 nt, about 80 nt to about 200 nt, about 90 nt to about 190 nt, about 100 nt to about 180 nt, about 110 nt to about 170 nt, about 120 nt to about 160 nt, about 130 nt to about 150 nt, or about 150 nt.
  • nt: nucleotides
  • the first region may comprise a nucleic acid sequence wherein two or more consecutive nucleotides are the same as the target nucleic acid sequence.
  • the length of the first region is about 10 nt to about 490 nt, about 20 nt to about 440 nt, about 30 nt to about 390 nt, about 40 nt to about 340 nt, about 50 nt to about 290 nt, about 60 nt to about 240 nt, about 70 nt to about 150 nt, about 80 nt to about 180 nt, about 90 nt to about 170 nt, about 100 nt to about 160 nt, about 110 nt to about 150 nt, about 120 nt to about 150 nt, about 130 nt to about 150 nt, about 140 nt to about 150 nt, or about 142 nt.
  • the second region is located at both ends of the first region, each second region comprising a sequence different from at least 2 consecutive nucleotides from the 5 'end of the target nucleic acid sequence and a sequence different from at least 2 consecutive nucleotides from the 3' end.
  • the length of the second region is about 2 nt to about 15 nt, about 2 nt to about 13 nt, about 2 nt to about 10 nt, about 2 nt to about 8 nt, about 2 nt to about 6 nt, about 2 nt to about 4 nt, about 3 nt, or about 4 nt.
  • the second polynucleotide may comprise, for example, a second region, the first region, and a second region, in the 5' to 3' direction.
  • the second polynucleotide may further comprise two or more contiguous nucleotides identical to the first polynucleotide at one or more ends thereof.
  • the second polynucleotide may comprise, for example, in the 5' to 3' direction: two or more contiguous nucleotides identical to the first polynucleotide, a second region, the first region, a second region, and two or more contiguous nucleotides identical to the first polynucleotide.
  • the method includes performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to yield a first threshold cycle (Ct) value.
  • PCR: polymerase chain reaction
  • the PCR is, for example, quantitative PCR (qPCR), digital PCR (dPCR), hot start PCR, touchdown PCR, nested PCR, booster PCR, multiplex PCR, real-time PCR, differential display PCR (D-PCR), rapid amplification of cDNA ends (RACE), inverse PCR (IPCR), vectorette PCR, or thermal asymmetric interlaced PCR (TAIL-PCR).
  • the first primer set may be a polynucleotide complementary to the first polynucleotide.
  • the first primer set may be a universal primer set.
  • the second primer set may be a polynucleotide complementary to a second polynucleotide.
  • the second primer set may be a polynucleotide that is complementary to the second polynucleotide but not complementary to the first polynucleotide.
  • a probe complementary to the target nucleic acid can be further used for detection of the amplified nucleic acid.
  • the probe may be labeled at one or more of its ends with a fluorescent material, quantum dots, a FRET label, or the like.
  • the threshold cycle (Ct) value refers to the PCR cycle number at which the amplification signal first rises above the background signal. For quantitative PCR, it is the cycle number at which the fluorescence signal reaches the threshold. Since the Ct value is inversely correlated with the copy number of the nucleic acid present as starting material in the amplification reaction, the Ct value can be used to calculate the copy number of the nucleic acid in the target sample.
  • the first Ct value may represent the total reads of the second library.
  • read refers to nucleic acid sequence information of nucleic acid fragments obtained by nucleic acid sequence analysis.
  • the method includes performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to yield a second Ct value.
  • the second Ct value may represent the reads of the second polynucleotide in the second library.
  • the first PCR, the second PCR, or a combination thereof may be performed by quantitative PCR (qPCR) or digital PCR (dPCR).
  • the first PCR and the second PCR may be performed simultaneously or sequentially.
  • the method includes calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.
  • another aspect provides a method comprising fragmenting nucleic acid extracted from a target sample;
  • wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence of at least two consecutive nucleotides identical to a target nucleic acid sequence, and, at at least one end of the first region, a second region comprising a nucleic acid sequence that differs from at least two consecutive nucleotides at the corresponding end of the target nucleic acid sequence;
  • the target sample, nucleic acid, nucleic acid sequencing, fragmentation, first polynucleotide, ligation, addition (spiking), second polynucleotide, PCR, Ct value, library, and complexity of the library are as described above.
  • the method includes fragmenting the nucleic acid extracted from the target sample.
  • the method includes ligating a first polynucleotide at one or more ends of the fragmented nucleic acid to produce a first library for nucleic acid sequencing.
  • the method includes adding a second polynucleotide to the first library to prepare a second library.
  • the method includes performing nucleic acid sequencing using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library.
  • the first primer may be one primer or a set of primers.
  • the method includes selecting a read of the second polynucleotide from the total reads obtained to obtain a read of the second polynucleotide.
  • the method includes calculating the ratio of the number of reads of the second polynucleotide to the total number of reads to determine the complexity of the first library.
  • the method can monitor the complexity of the first library in real time during or after nucleic acid sequencing.
  • the complexity of the library can be measured in a simple and accurate manner, in real time during the preparation of the nucleic acid sequencing library, during the nucleic acid sequencing process after the library preparation, or after completion of the nucleic acid sequencing.
  • FIG. 1A is a schematic diagram illustrating the principle of a method of measuring the complexity of a library for NGS according to one aspect.
  • FIG. 1B is a schematic diagram showing the ratio of artificial sequence reads among the entire reads when the complexity of the library is high or low.
  • figure legend entries: adapter; artificial sequence
  • FIG. 2 is a graph showing Ct values in quantitative PCR according to library complexity.
  • FIG. 3 is a graph showing the ratio of artificial sequence reads among total reads according to library complexity.
  • the genes KRAS, IDH1, BRCA1, ALK, and ERBB2, which include variants known to be used in companion diagnostics (CDx), and regions of these genes were selected as target sequences for next generation sequencing (NGS). A reference sequence of about 150 bp was selected based on each selected position.
  • nucleic acid fragments were prepared by gene synthesis whose sequence is identical to the selected reference sequence, except that the 4 bp from the 5' end and the 4 bp from the 3' end are replaced with an artificial sequence, and which include the adapter nucleic acid sequence of the library at both ends (hereinafter called “artificial sequence-containing nucleic acid fragments”).
  • the selected genes, the reference sequences, and the nucleic acid sequences of the artificial sequence-containing nucleic acid fragments excluding the adapter nucleic acid sequence are shown in Table 1 below.
  • Table 1, No. 1: KRAS; chromosome 12, exon 3; position: chromosome 12: 25380168-25380346; reference sequence: 5'-AGGAATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACAGTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGATGAC-3' (SEQ ID NO: 1); artificial sequence-containing nucleic acid fragment: SEQ ID NO: 2
  • Table 1, No. 2: IDH1; position: chromosome 2: 209113048-209113359; reference sequence: 5'-AGATAATGGCTTCTCTGAAGACCGTGCCACCCAGAATATTTCGTATGGTGCCATTTGGTGATTTCCACATTTGTTTCAACTTGAACTCCTCAACCCTCTTCTCATCAGGAGTGATAGTGGCACATTTGACGCCAACATTATGCTTCTTTA-3' (SEQ ID NO: 3); artificial sequence-containing nucleic acid fragment: SEQ ID NO: 4
  • Table 1, No. 3: BRCA1; chromosome 17, exon 15; position: chromosome 17: 41222945-41223255; reference sequence: 5'-TCAATTCTGGCTTCTCCCTGCTCACACTTTCTTCCATTGCATTATACCCAGCAGTATCAGTAGTATGAGCAGCAGCTGGACTCTGGGCAGATTCTGCAACTTTCAACTTTCAATTGGGGAACTTTCAATGCAGAGGTTGAAGATGGTATG-3' (SEQ ID NO: 5); artificial sequence-containing nucleic acid fragment: SEQ ID NO: 6
  • Table 1, No. 4: ALK; position: chromosome 2: 29446208-29446394; reference sequence: 5'-GGTCACTGATGGAGGAGGTCTTGCCAGCAAAGCAGTAGTTGGGGTTGTAGTCGGTCATGATGGTCGAGGTGCGGAGCTTGCTCAGCTTGTACTCAGGGCTCTGCAGCTCCATCTGCATGGCTTGCAGCTCCTGGTGCTTCCGGCGGTACA-3' (SEQ ID NO: 7); artificial sequence-containing nucleic acid fragment: SEQ ID NO: 8
  • Table 1, No. 5: ERBB2; chromosome 17; position: chromosome 17: 37864574-37864787; reference sequence: 5'-CAGGGCTACGTGCTCATCGCTCACAACCAAGTGAGGCAGGTCCCACTGCAGAGGCTGCGGATTGTGCGAGGCACCCAGCTCTTTGAGGACAACTATGCCCTGGCCGTGCTAGACAATGGAGACCCGCTGAACAATACCACCCCTGTCACA-3' (SEQ ID NO: 9); artificial sequence-containing nucleic acid fragment: SEQ ID NO: 10
  • to prepare a library for NGS, 50 ng or 200 ng of a HapMap mixed sample was prepared, in which a total of 10 human genome samples (HapMap NA07014, NA10840, NA18595, NA18957, NA18488, NA18511, NA18867, NA18924, NA19108, and NA19114) were mixed at the same molar concentration ratio. The prepared mixed samples were subjected sequentially to fragmentation of the human genome, end-repair, 3'-adenosine tailing, and adapter ligation using the KAPA Hyper Prep Kit for Illumina (Kapa Biosystems) according to the method provided by the manufacturer.
  • 50 attomoles of each of the artificial sequence-containing nucleic acid fragments prepared in Example 1.1 were spiked into the library to which the adapter was ligated.
  • the library to which the artificial sequence-containing nucleic acid fragment was added was subjected to a pre-capture polymerase chain reaction (PCR), followed by target enrichment.
  • the target-enriched library was then subjected to post-capture PCR.
  • real-time PCR was performed using a KAPA quantitative PCR (qPCR) kit for measuring Illumina library concentrations, and Ct (cycle threshold) values were calculated from the real-time qPCR results.
  • the calculated Ct value represents the number of reads derived from the artificial sequence containing nucleic acid fragments.
  • libraries with altered complexity were prepared by changing the library preparation process.
  • the library prepared by purifying the ligated product once in the adapter ligation step and using 30 µM adapter was used as a negative control.
  • libraries with reduced complexity were prepared by purifying the ligated product twice in the ligation step, using 3 µM adapter (i.e., a 1/10 dilution), or a combination thereof.
  • the library of the negative control and the library of artificially reduced complexity were subjected to real-time qPCR in the same manner as above to calculate the Ct value.
  • the calculated Ct value according to the complexity of the library is shown in FIG. 2.
  • as the complexity of the library decreased, the Ct value of the artificial sequence-containing nucleic acid fragments decreased.
  • the Ct value of the total reads did not change significantly, regardless of the complexity of the library.
  • the nucleic acid sequences of the prepared libraries were analyzed, and the resulting raw read data were used to calculate the total number of reads and the ratio of artificial sequence reads among the total reads. The calculated total number of reads and the ratio of artificial sequence reads among the total reads, according to the change in complexity, are shown in FIG. 3.
  • “50 ng” means that the amount of the human genomic DNA HapMap mixed sample at the time of library preparation was 50 ng.
  • the number of total reads and the number of artificial sequence reads did not correlate with library complexity, but it was confirmed that the ratio of artificial sequence reads among all reads correlated inversely with library complexity.
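
The spike-in design and the read-ratio measurement described in the examples above can be sketched in Python as follows. This is a minimal illustration only, not the applicants' implementation: the function names `make_spike_in` and `spike_in_read_fraction`, the toy sequences, and the use of exact string matching to recognise spike-in reads are all assumptions made for the sketch (a real pipeline would more likely align reads and tolerate sequencing errors).

```python
from typing import Iterable


def make_spike_in(reference: str, artificial_5p: str, artificial_3p: str) -> str:
    """Replace the terminal bases of a reference sequence with artificial ones.

    Hypothetical helper: mirrors the example in which 4 bp at each end of a
    ~150 bp reference sequence are replaced by an artificial sequence.
    """
    n5, n3 = len(artificial_5p), len(artificial_3p)
    if n5 == 0 or n3 == 0:
        raise ValueError("both artificial ends must be non-empty")
    if len(reference) <= n5 + n3:
        raise ValueError("reference is too short for the requested replacement")
    # The artificial ends must differ from the reference, otherwise spike-in
    # reads could not be told apart from genuine target reads.
    if reference[:n5] == artificial_5p or reference[-n3:] == artificial_3p:
        raise ValueError("artificial ends must differ from the reference ends")
    return artificial_5p + reference[n5:len(reference) - n3] + artificial_3p


def spike_in_read_fraction(reads: Iterable[str], spike_ins: Iterable[str]) -> float:
    """Return (number of spike-in reads) / (total number of reads).

    Assumption: a read counts as a spike-in read only if it exactly equals
    one of the spike-in sequences.
    """
    spike_set = set(spike_ins)
    total = 0
    hits = 0
    for read in reads:
        total += 1
        if read in spike_set:
            hits += 1
    if total == 0:
        raise ValueError("no reads were supplied")
    return hits / total


if __name__ == "__main__":
    ref = "ACGTACGTACGT"                      # toy reference, not a real target
    spike = make_spike_in(ref, "TTTT", "GGGG")
    reads = [spike, ref, ref, ref]            # toy read set
    print(spike_in_read_fraction(reads, [spike]))
```

Per the results reported above, this fraction correlated inversely with library complexity, so a higher value for the same spike-in amount would suggest a less complex library.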

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for measuring the complexity of a library is provided. According to the present invention, the complexity of a library can simply and accurately be measured during a production process of the library for sequencing in real time, during a sequencing process after the production of the library, or after the completion of sequencing.

Description

Method for measuring the complexity of a library for next generation nucleic acid sequencing
The present invention relates to a method for measuring the complexity of a library for next generation nucleic acid sequencing and a device using the same.
Next generation sequencing (NGS) is widely used for research and diagnostic purposes. Although the details vary with the type of instrument, NGS can be broadly divided into three stages: sample collection, library preparation, and nucleic acid sequencing. After the library is prepared, quality control (QC) is performed before nucleic acid sequencing begins, and it is decided whether to proceed with nucleic acid sequencing of the prepared library. During nucleic acid sequencing, QC is also performed by a method provided by the manufacturer of the library, to check in real time whether sequencing is proceeding smoothly. After nucleic acid sequencing, the generated nucleic acid sequence data (i.e., reads) are analyzed to measure the quality of the generated data before downstream analyses such as mutation, genetic variation, and gene expression analysis. Quality is thus measured at each stage of next generation nucleic acid sequencing, and one of the factors that determines quality is the complexity of the library. By measuring the complexity of the library after library preparation, during nucleic acid sequencing, or after nucleic acid sequencing is completed, it can be decided whether to perform or stop nucleic acid sequencing and whether to use the generated nucleic acid sequence data.
A method is known in which minimum-range nucleic acid sequencing is performed twice before large-scale nucleic acid sequencing, and the data obtained are used to predict the complexity of the library by a statistical method (US Publication No. US20140324359 A1). However, to measure the complexity of the library, this method first requires two rounds of nucleic acid sequencing of the library, and it cannot measure the complexity of the library directly and in real time.
Therefore, there is a need for a simple and accurate method that can measure the complexity of a library in real time during nucleic acid sequencing or after nucleic acid sequencing.
Provided is a method for measuring the complexity of a library for nucleic acid sequencing.
According to one aspect, there is provided a method for measuring the complexity of a library for nucleic acid sequencing, the method comprising: fragmenting nucleic acid extracted from a target sample;
ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing;
preparing a second library by spiking a second polynucleotide into the first library;
performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to yield a first threshold cycle (Ct) value;
performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to yield a second Ct value; and
calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.
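
The final step of the method above (the ratio of the second Ct value to the first Ct value) can be sketched in Python as follows. This is a hedged illustration, not the applicants' implementation: the names `CtPair`, `ct_ratio`, and `rank_by_ratio` and all numeric values are hypothetical. The interpretation of the ratio as a relative measure, with a lower ratio pointing to lower complexity, is an assumption drawn from the reported observation that the spike-in Ct fell as complexity fell while the total Ct stayed roughly constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtPair:
    """Ct values measured on the same spiked (second) library."""
    first_ct: float   # first PCR: primer set complementary to the adapter
    second_ct: float  # second PCR: primer set complementary to the spike-in


def ct_ratio(pair: CtPair) -> float:
    """Return the ratio of the second Ct value to the first Ct value."""
    if pair.first_ct <= 0:
        raise ValueError("first Ct value must be positive")
    return pair.second_ct / pair.first_ct


def rank_by_ratio(samples: dict[str, CtPair]) -> list[str]:
    """Order library names by ascending Ct ratio.

    Assumption: with the same spike-in amount per library, a lower ratio
    suggests a lower relative complexity, so the first name returned is the
    library that most deserves a closer look.
    """
    return sorted(samples, key=lambda name: ct_ratio(samples[name]))


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    libraries = {
        "control": CtPair(first_ct=10.0, second_ct=20.0),
        "reduced": CtPair(first_ct=10.0, second_ct=15.0),
    }
    for name in rank_by_ratio(libraries):
        print(name, ct_ratio(libraries[name]))
```

Because the ratio is relative, it is only comparable between libraries that received the same amount of the second polynucleotide and were measured under the same PCR conditions.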
The nucleic acid sequencing of the "library for nucleic acid sequencing" may be next generation sequencing (NGS). The term "next generation sequencing (NGS)" may be used interchangeably with the term "massive parallel sequencing" or the term "second-generation sequencing". NGS refers to a technique in which a full-length genome is fragmented in a chip-based and PCR-based paired-end format, and the fragments are sequenced at very high speed based on hybridization. NGS is a technique for sequencing a large number of nucleic acid fragments simultaneously, and can be used to perform NGS-based targeted sequencing or panel sequencing. NGS may be performed using, for example, the 454 platform (Roche), GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer, Solexa platform, SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences Heliscope, Pacific Biosciences' single molecule real time (SMRT™) technology, or a combination thereof.
The term "library" refers to a collection of nucleic acid fragments. The library is, for example, a genomic library, a complementary DNA library, or a randomized mutant library.
The term "library complexity" refers to the number of unique fragments present in the library. Complexity may be influenced by the amount of nucleic acid used as starting material, the amount of nucleic acid lost during library preparation, the amount of nucleic acid amplified by PCR, and the like. The complexity of the library can be expressed as a relative level.
The method includes fragmenting nucleic acid extracted from a target sample.
The target sample may be derived from a subject or a cell. The subject may be a mammal, including a human, cow, horse, pig, sheep, goat, dog, cat, or rodent. The cell may be a cell derived from a subject or a cell line. The target sample may be a biological sample. The biological sample may be obtained from, for example, blood, plasma, serum, urine, saliva, mucosal secretions, sputum, feces, tears, or a combination thereof. The biological sample may be a sample of eukaryotic cells, prokaryotic cells, viruses, bacteriophages, or the like derived from various species.
The nucleic acid may be a genome or a fragment thereof. The term "genome" collectively refers to the entirety of the chromosomes, chromatin, or genes. The genome or fragment thereof may be isolated DNA, for example, cell-free DNA (cfDNA). Extraction or isolation of nucleic acid from the target sample may be performed by a method known to those skilled in the art.
Extraction of nucleic acid from the target sample may be performed by a method known to those skilled in the art.
The fragmentation may be physical, chemical, or enzymatic cleavage of the genome. For example, the fragmentation is cleavage of the genome with a restriction enzyme.
The method may further include selecting the fragmented nucleic acid by size. Size selection may be performed by electrophoresis, centrifugation, chromatography, or a combination thereof. The length of the fragmented nucleic acid may be about 10 bp (base pairs) to about 2000 bp, about 20 bp to about 1500 bp, about 50 bp to about 1000 bp, about 100 bp to about 800 bp, about 150 bp to about 600 bp, or about 300 bp to about 600 bp.
The method includes ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing.
Preparing the first library may further include end repair and 3'-adenosine tailing (3'-A tailing) of the fragmented nucleic acid.
The first polynucleotide may be an adapter. The adapter may be a polynucleotide comprising a primer sequence for enrichment of a target nucleic acid in NGS. The adapter may be a polynucleotide known to those skilled in the art. The adapter may comprise a universal sequence for nucleic acid sequencing. For example, the adapter is one included in a library preparation kit for nucleic acid sequencing.
Ligation refers to joining the ends of nucleic acid fragments to each other. The ligation may be performed using a DNA ligase.
The first library may be a library prepared for nucleic acid sequencing.
The method includes spiking a second polynucleotide into the first library to prepare a second library.
The spiking may consist of mixing the first library with a small amount of the second polynucleotide.
The second polynucleotide may comprise a first region comprising a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region that is located at one or more ends of the first region and comprises a nucleic acid sequence differing from two or more consecutive nucleotides from one or more ends of the target nucleic acid sequence.
The target nucleic acid sequence may comprise a genetic variation used in companion diagnostics (CDx).
The length of the second polynucleotide may be about 20 nucleotides (hereinafter, "nt") to about 500 nt, about 30 nt to about 450 nt, about 40 nt to about 400 nt, about 50 nt to about 350 nt, about 60 nt to about 300 nt, about 70 nt to about 250 nt, about 80 nt to about 200 nt, about 90 nt to about 190 nt, about 100 nt to about 180 nt, about 110 nt to about 170 nt, about 120 nt to about 160 nt, about 130 nt to about 150 nt, or about 150 nt.
The first region may comprise a nucleic acid sequence in which two or more consecutive nucleotides are identical to the target nucleic acid sequence. The length of the first region may be about 10 nt to about 490 nt, about 20 nt to about 440 nt, about 30 nt to about 390 nt, about 40 nt to about 340 nt, about 50 nt to about 290 nt, about 60 nt to about 240 nt, about 70 nt to about 150 nt, about 80 nt to about 180 nt, about 90 nt to about 170 nt, about 100 nt to about 160 nt, about 110 nt to about 150 nt, about 120 nt to about 150 nt, about 130 nt to about 150 nt, about 140 nt to about 150 nt, or about 142 nt.
Second regions may be located at both ends of the first region, and the second regions may respectively comprise a sequence differing from two or more consecutive nucleotides from the 5' end of the target nucleic acid sequence and a sequence differing from two or more consecutive nucleotides from the 3' end of the target nucleic acid sequence. The length of the second region may be about 2 nt to about 15 nt, about 2 nt to about 13 nt, about 2 nt to about 10 nt, about 2 nt to about 8 nt, about 2 nt to about 6 nt, about 2 nt to about 4 nt, about 3 nt, or about 4 nt.
The second polynucleotide may comprise, for example, a second region, a first region, and a second region in the 5' to 3' direction from its 5' end.
The second polynucleotide may further comprise, at one or more ends thereof, two or more consecutive nucleotides identical to the first polynucleotide. For example, the second polynucleotide may comprise, in the 5' to 3' direction from its 5' end, two or more consecutive nucleotides identical to the first polynucleotide, a second region, a first region, a second region, and two or more consecutive nucleotides identical to the first polynucleotide.
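The layout described above (adapter-identical nucleotides, second region, first region, second region, adapter-identical nucleotides) can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_spike_in`, the base-substitution rule `SWAP`, and the adapter arguments are hypothetical and are not taken from the specification, which does not say how the differing terminal nucleotides are chosen.

```python
# Illustrative sketch only. The substitution rule below is an assumption:
# it simply maps every base to a different base so that each terminal
# position is guaranteed to differ from the target sequence.
SWAP = str.maketrans("ACGT", "CATG")  # A->C, C->A, G->T, T->G


def build_spike_in(reference: str, flank: int = 4,
                   adapter_5p: str = "", adapter_3p: str = "") -> str:
    """Return adapter + second region + first region + second region + adapter."""
    reference = reference.upper()
    if set(reference) - set("ACGT"):
        raise ValueError("reference must contain only A, C, G, T")
    if flank < 2 or 2 * flank >= len(reference):
        raise ValueError("flank must be >= 2 and leave a non-empty first region")
    left = reference[:flank].translate(SWAP)    # 5' second region (differs from target)
    core = reference[flank:-flank]              # first region (identical to target)
    right = reference[-flank:].translate(SWAP)  # 3' second region (differs from target)
    return adapter_5p + left + core + right + adapter_3p


if __name__ == "__main__":
    # Short toy reference, not a sequence from the specification.
    print(build_spike_in("AGGAATCCTGAG", flank=4, adapter_5p="GG", adapter_3p="CC"))
```

Because only the terminal nucleotides differ, the first region still matches the target while the ends give the spiked molecule a signature that distinguishes it from genomic fragments of the same locus.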
The method includes performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to obtain a first Ct (threshold cycle) value.
The PCR is, for example, quantitative PCR (qPCR), digital PCR (dPCR), hot start PCR, touchdown PCR, nested PCR, booster PCR, multiplex PCR, real-time PCR, differential display PCR (D-PCR), rapid amplification of cDNA ends (RACE), inverse PCR (IPCR), vectorette PCR, or thermal asymmetric interlaced PCR (TAIL-PCR).
The first primer set may be polynucleotides complementary to the first polynucleotide. The first primer set may be a universal primer set.
The second primer set may be polynucleotides complementary to the second polynucleotide. The second primer set may be polynucleotides that are complementary to the second polynucleotide but not complementary to the first polynucleotide.
In the PCR, a probe complementary to the target nucleic acid may additionally be used to detect the amplified nucleic acid. The probe may be labeled at one or more of its ends with a fluorescent substance, a quantum dot, FRET, or the like.
The Ct (threshold cycle) value refers to the cycle number at which the amplification signal first exceeds the background signal in PCR. In quantitative PCR, it refers to the cycle number at which the fluorescence signal reaches the threshold. Because the Ct value is inversely correlated with the copy number of the initial nucleic acid used as starting material in the amplification reaction, the Ct value can be used to calculate the copy number of the nucleic acid in the target sample.
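The inverse relationship between Ct and starting copy number can be made concrete with a small Python sketch. It assumes the common idealization that template roughly doubles each cycle (amplification efficiency of 1.0); the function name `fold_difference` and the `efficiency` parameter are illustrative and do not appear in the specification.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Estimate how many times more starting template reaction A had than B.

    Assumes both reactions amplify by a factor of (1 + efficiency) per cycle,
    so each cycle of Ct difference corresponds to one such factor.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_b - ct_a)


if __name__ == "__main__":
    # A reaction crossing the threshold 3 cycles earlier started with
    # about 2**3 = 8 times more template under ideal doubling.
    print(fold_difference(ct_a=20.0, ct_b=23.0))
```

Real assays rarely reach perfect doubling, so a measured efficiency from a standard curve would normally replace the default value.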
The first Ct value may represent the total reads of the second library. The term "read" refers to the nucleic acid sequence information of a nucleic acid fragment obtained by nucleic acid sequencing.
The method includes performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to obtain a second Ct value.
The second Ct value may represent the reads of the second polynucleotide in the second library.
The first PCR, the second PCR, or a combination thereof may be performed by quantitative PCR (qPCR) or digital PCR (dPCR).
The first PCR and the second PCR may be performed simultaneously or sequentially.
The method includes calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.
The lower the ratio of the second Ct value to the first Ct value, the higher the complexity of the first library may be. The higher the ratio of the second Ct value to the first Ct value, the lower the complexity of the first library may be.
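As a minimal sketch, the ratio of the second Ct value to the first Ct value can be computed per library and the libraries ordered by that ratio. The function names and the example Ct values are hypothetical; the code only computes and sorts the ratio, and its interpretation is the one given in the text above.

```python
def ct_ratio(first_ct: float, second_ct: float) -> float:
    """Ratio of the second Ct value (spiked polynucleotide) to the first Ct value (whole library)."""
    if first_ct <= 0 or second_ct <= 0:
        raise ValueError("Ct values must be positive cycle numbers")
    return second_ct / first_ct


def order_by_ct_ratio(libraries):
    """Return (name, ratio) pairs sorted from lowest to highest ratio.

    `libraries` maps a library name to a (first_ct, second_ct) tuple.
    """
    ratios = {name: ct_ratio(first, second)
              for name, (first, second) in libraries.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up Ct values for illustration only.
    example = {"library_a": (10.0, 15.0), "library_b": (10.0, 12.0)}
    for name, ratio in order_by_ct_ratio(example):
        print(name, ratio)
```

Since the ratio is relative, it is most useful for comparing libraries prepared and spiked under the same conditions.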
Another aspect provides a method of measuring the complexity of a library for nucleic acid sequencing, the method comprising: fragmenting nucleic acid extracted from a target sample;
ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing;
adding a second polynucleotide to the first library to prepare a second library,
wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region that is located at one or more ends of the first region and comprises a nucleic acid sequence differing from two or more consecutive nucleotides from one or more ends of the target nucleic acid sequence;
performing nucleic acid sequencing using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library;
selecting the reads of the second polynucleotide from the obtained total reads to obtain the reads of the second polynucleotide; and
calculating the ratio of the number of reads of the second polynucleotide to the number of total reads to measure the complexity of the first library.
The method thus measures the complexity of a library for nucleic acid sequencing.
The target sample, nucleic acid, nucleic acid sequencing, fragmentation, first polynucleotide, ligation, spiking, second polynucleotide, PCR, Ct value, library, and library complexity are as described above.
The method includes fragmenting nucleic acid extracted from a target sample.
The method includes ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing.
The method includes adding a second polynucleotide to the first library to prepare a second library.
The method includes performing nucleic acid sequencing using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library.
The first primer may be a single primer or a primer set.
The method includes selecting the reads of the second polynucleotide from the obtained total reads to obtain the reads of the second polynucleotide.
The method includes calculating the ratio of the number of reads of the second polynucleotide to the number of total reads to measure the complexity of the first library.
The lower the ratio of the number of reads of the second polynucleotide to the number of total reads, the higher the complexity of the first library may be. The higher the ratio of the number of reads of the second polynucleotide to the number of total reads, the lower the complexity of the first library may be.
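The read selection and ratio steps above can be sketched in Python as follows. The approach shown, matching each read against a "signature" made of the artificial terminal nucleotides plus the adjacent target bases, is an assumption made for illustration; the specification does not prescribe how reads of the second polynucleotide are identified, and the function name and example strings are hypothetical.

```python
def spike_in_fraction(reads, signatures):
    """Return (spike-in reads) / (total reads).

    `reads` is any iterable of read sequences; `signatures` is a collection
    of substrings assumed to occur only in reads of the second polynucleotide.
    """
    total = 0
    spiked = 0
    for read in reads:
        total += 1
        if any(signature in read for signature in signatures):
            spiked += 1
    if total == 0:
        raise ValueError("no reads supplied")
    return spiked / total


if __name__ == "__main__":
    # Toy reads, not data from the specification.
    toy_reads = ["AAACTTCATCC", "GGGGGGGG", "TTTTTTTT", "CCCTTCATCCAA"]
    fraction = spike_in_fraction(toy_reads, signatures=["CTTCATCC"])
    # Per the text above, a lower fraction suggests a more complex library.
    print(fraction)
```

Exact substring matching ignores sequencing errors and reverse-complement reads, so a practical pipeline would more likely align reads to the spike-in sequences and count the alignments.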
The method can monitor the complexity of the first library in real time during nucleic acid sequencing or after nucleic acid sequencing.
According to the method of measuring library complexity according to one aspect or another aspect, the complexity of a library can be measured simply and accurately in real time during preparation of the library for nucleic acid sequencing, during nucleic acid sequencing after library preparation, or after nucleic acid sequencing is complete.
FIG. 1A is a schematic diagram illustrating the principle of a method of measuring the complexity of a library for NGS according to one aspect, and FIG. 1B is a schematic diagram showing the proportion of artificial sequence reads among all reads when library complexity is high or low (■ in a read: adapter; □ in a read: artificial sequence).
FIG. 2 is a graph showing Ct values in quantitative PCR according to library complexity.
FIG. 3 is a graph showing the proportion of artificial sequence reads among total reads according to library complexity.
Hereinafter, the present invention is described in more detail with reference to examples. However, these examples are intended only to illustrate the present invention, and the scope of the present invention is not limited to them.
Example 1. Measurement of the Complexity of a Library for Next Generation Sequencing
1. Preparation of Nucleic Acid Fragments Containing Artificial Sequences
For next generation sequencing (NGS), the genes KRAS, IDH1, BRCA1, ALK, and ERBB2, which contain variants known to be used in companion diagnostics (CDx), and regions of these genes were selected as target sequences. Reference sequences of about 150 bp were selected based on the selected positions.
Nucleic acid fragments (hereinafter, "artificial sequence-containing nucleic acid fragments") were prepared by gene synthesis. Each fragment has the same nucleic acid sequence as the selected reference sequence, except that 4 bp from the 5' end and 4 bp from the 3' end are replaced with an artificial sequence, and each fragment carries the adapter nucleic acid sequence of the library at both ends.
The selected genes, the reference sequences, and the nucleic acid sequences of the artificial sequence-containing nucleic acid fragments, excluding the adapter nucleic acid sequences, are shown in Table 1 below.
Table 1
1. KRAS (chromosome 12; exon 3; chromosome 12:25380168-25380346)
Reference sequence: 5'-AGGAATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACAGTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGATGAC-3' (SEQ ID NO: 1)
Artificial sequence-containing nucleic acid fragment: (SEQ ID NO: 2)
2. IDH1 (chromosome 2; exon 4; chromosome 2:209113048-209113359)
Reference sequence: 5'-AGATAATGGCTTCTCTGAAGACCGTGCCACCCAGAATATTTCGTATGGTGCCATTTGGTGATTTCCACATTTGTTTCAACTTGAACTCCTCAACCCTCTTCTCATCAGGAGTGATAGTGGCACATTTGACGCCAACATTATGCTTCTTTA-3' (SEQ ID NO: 3)
Artificial sequence-containing nucleic acid fragment: (SEQ ID NO: 4)
3. BRCA1 (chromosome 17; exon 15; chromosome 17:41222945-41223255)
Reference sequence: 5'-TCAATTCTGGCTTCTCCCTGCTCACACTTTCTTCCATTGCATTATACCCAGCAGTATCAGTAGTATGAGCAGCAGCTGGACTCTGGGCAGATTCTGCAACTTTCAACTTTCAATTGGGGAACTTTCAATGCAGAGGTTGAAGATGGTATG-3' (SEQ ID NO: 5)
Artificial sequence-containing nucleic acid fragment: (SEQ ID NO: 6)
4. ALK (chromosome 2; exon 20; chromosome 2:29446208-29446394)
Reference sequence: 5'-GGTCACTGATGGAGGAGGTCTTGCCAGCAAAGCAGTAGTTGGGGTTGTAGTCGGTCATGATGGTCGAGGTGCGGAGCTTGCTCAGCTTGTACTCAGGGCTCTGCAGCTCCATCTGCATGGCTTGCAGCTCCTGGTGCTTCCGGCGGTACA-3' (SEQ ID NO: 7)
Artificial sequence-containing nucleic acid fragment: (SEQ ID NO: 8)
5. ERBB2 (chromosome 17; exon 6; chromosome 17:37864574-37864787)
Reference sequence: 5'-CAGGGCTACGTGCTCATCGCTCACAACCAAGTGAGGCAGGTCCCACTGCAGAGGCTGCGGATTGTGCGAGGCACCCAGCTCTTTGAGGACAACTATGCCCTGGCCGTGCTAGACAATGGAGACCCGCTGAACAATACCACCCCTGTCACA-3' (SEQ ID NO: 9)
Artificial sequence-containing nucleic acid fragment: (SEQ ID NO: 10)
In Table 1, the artificial sequences of the artificial sequence-containing nucleic acid fragments are indicated in bold and underlined.
2. Preparation of the Library for NGS and Spiking of the Artificial Sequence-Containing Nucleic Acid Fragments
To prepare a library for NGS, 50 ng or 200 ng of a HapMap mixed sample was prepared by mixing a total of ten human genomic samples, HapMap NA07014, NA10840, NA18595, NA18957, NA18488, NA18511, NA18867, NA18924, NA19108, and NA19114, at equal molar concentrations. Using the KAPA Hyper Prep Kit for Illumina (Kapa Biosystems) and following the manufacturer's method, the prepared mixed sample was subjected in sequence to fragmentation of the human genome, end repair, 3'-adenosine tailing, and adapter ligation.
50 attomoles of each artificial sequence-containing nucleic acid fragment prepared in Example 1.1 was spiked into the adapter-ligated library. The library spiked with the artificial sequence-containing nucleic acid fragments was subjected to pre-capture polymerase chain reaction (PCR), followed by target enrichment. The target-enriched library was then subjected to post-capture PCR.
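For a sense of scale, the 50 attomoles spiked per fragment can be converted to a number of molecules with the Avogadro constant. The short Python sketch below is only a unit conversion; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def attomoles_to_molecules(amount_amol: float) -> float:
    """Convert an amount in attomoles (1e-18 mol) to a number of molecules."""
    if amount_amol < 0:
        raise ValueError("amount must be non-negative")
    return amount_amol * 1e-18 * AVOGADRO


if __name__ == "__main__":
    # 50 amol corresponds to roughly 3.0e7 molecules of each spiked fragment.
    print(f"{attomoles_to_molecules(50):.3e}")
```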
Real-time PCR was performed using the KAPA quantitative PCR (qPCR) kit for Illumina library quantification, and Ct (cycle threshold) values were calculated from the real-time qPCR results. Here, the calculated Ct value represents the total number of reads.
Meanwhile, to measure the number of reads of the artificial sequence-containing nucleic acid fragments included in the library, real-time qPCR was performed using the primer sets and probes of Table 2 below. Here, the calculated Ct value represents the number of reads derived from the artificial sequence-containing nucleic acid fragments.
Table 2
IDH1_artificial_forward: 5'-CCACCGAGATCTACACTCTTTC-3' (SEQ ID NO: 11)
IDH1_artificial_probe: 5'-ACGCTCTTCCGATCTCTTCAATGGC-3' (SEQ ID NO: 12)
IDH1_artificial_reverse: 5'-AAATCACCAAATGGCACCATAC-3' (SEQ ID NO: 13)
BRCA1_artificial_forward: 5'-GCGACCACCGAGATCTACA-3' (SEQ ID NO: 14)
BRCA1_artificial_probe: 5'-ACGACGCTCTTCCGATCTCTTCTTCT-3' (SEQ ID NO: 15)
BRCA1_artificial_reverse: 5'-GAAAGTGTGAGCAGGGAGAAG-3' (SEQ ID NO: 16)
ERBB2_artificial_forward: 5'-CCACCGAGATCTACACTCTTTC-3' (SEQ ID NO: 17)
ERBB2_artificial_probe: 5'-ATCTCTTCGCTACGTGCTCATCGC-3' (SEQ ID NO: 18)
ERBB2_artificial_reverse: 5'-CCTGCCTCACTTGGTTGT-3' (SEQ ID NO: 19)
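A quick consistency check on Table 2 is to confirm that a reverse primer is the reverse complement of a stretch of the corresponding reference sequence. The Python sketch below does this for the IDH1 reverse primer (SEQ ID NO: 13) against an excerpt of the IDH1 reference sequence (SEQ ID NO: 3); the excerpt boundaries and the function names are choices made here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def binding_site(template: str, reverse_primer: str) -> int:
    """Return the 0-based start of the primer's binding site on the template strand, or -1."""
    return template.upper().find(reverse_complement(reverse_primer))


# Excerpt of the IDH1 reference sequence (SEQ ID NO: 3) around the binding site.
IDH1_EXCERPT = "CCAGAATATTTCGTATGGTGCCATTTGGTGATTTCCACATTTG"
IDH1_REVERSE_PRIMER = "AAATCACCAAATGGCACCATAC"  # SEQ ID NO: 13

if __name__ == "__main__":
    print(reverse_complement(IDH1_REVERSE_PRIMER))
    print(binding_site(IDH1_EXCERPT, IDH1_REVERSE_PRIMER))
```

A result other than -1 indicates that the primer anneals within the excerpt; the forward primers and probes in Table 2 extend into the adapter sequence, which is not reproduced in Table 1, so they cannot be checked the same way from the reference sequences alone.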
In addition, to determine whether the proportion of artificial sequence reads among all reads changes with library complexity, libraries whose complexity was varied during library preparation were prepared. Following the library preparation method provided by the manufacturer of the KAPA Hyper Prep Kit for Illumina (Kapa Biosystems), the ligated product was purified once in the adapter ligation step and 30 μM adapter was used; the library prepared by this method was used as a negative control. To artificially reduce the complexity of the prepared library, reduced-complexity libraries were prepared by purifying the ligated product twice in the ligation step, by using 3 μM adapter (that is, a 1/10 dilution), or by a combination thereof.
Real-time qPCR was performed on the negative control library and the libraries with artificially reduced complexity in the same manner as above, and Ct values were calculated. The Ct values calculated according to library complexity are shown in FIG. 2. As shown in FIG. 2, the Ct value of the artificial sequence-containing nucleic acid fragments decreased as library complexity decreased. In contrast, the Ct value of the total reads did not change significantly despite the change in library complexity.
The nucleic acid sequences of the prepared libraries were analyzed, and the raw read data were used to calculate the total number of reads and the proportion of artificial sequence reads among all reads. The calculated total number of reads and the proportion of artificial sequence reads among all reads, according to the change in complexity, are shown in FIG. 3. In FIG. 3, "50 ng" means that 50 ng of the human genomic DNA HapMap mixed sample was used for library preparation. As shown in FIG. 3, the total number of reads and the number of artificial sequence reads did not correlate with library complexity, whereas the proportion of artificial sequence reads among all reads correlated inversely with library complexity.

Claims (19)

  1. A method of measuring the complexity of a library for nucleic acid sequencing, the method comprising: fragmenting nucleic acid extracted from a target sample;
    ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing;
    spiking a second polynucleotide into the first library to prepare a second library,
    wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region that is located at one or more ends of the first region and comprises a nucleic acid sequence differing from two or more consecutive nucleotides from one or more ends of the target nucleic acid sequence;
    performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to obtain a first Ct (threshold cycle) value;
    performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to obtain a second Ct value; and
    calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library,
    whereby the complexity of the library for nucleic acid sequencing is measured.
  2. The method of claim 1, wherein the nucleic acid sequencing is next generation sequencing (NGS).
  3. The method of claim 1, wherein the target sample is derived from an individual or a cell.
  4. The method of claim 1, wherein the nucleic acid is a genome or a fragment thereof.
  5. The method of claim 1, wherein the first polynucleotide is an adapter.
  6. The method of claim 1, wherein the target nucleic acid sequence comprises a genetic variation used in companion diagnostics (CDx).
  7. The method of claim 1, wherein the second polynucleotide is 20 nucleotides to 500 nucleotides in length.
  8. The method of claim 1, wherein the second region is located at both ends of the first region, and the respective second regions comprise a sequence differing from two or more consecutive nucleotides from the 5' end of the target nucleic acid sequence and a sequence differing from two or more consecutive nucleotides from the 3' end of the target nucleic acid sequence.
  9. The method of claim 1, wherein the second region is 2 nucleotides to 15 nucleotides in length.
  10. The method of claim 1, wherein the second polynucleotide further comprises, at one or more ends thereof, two or more consecutive nucleotides identical to the first polynucleotide.
  11. The method of claim 1, wherein the first PCR, the second PCR, or a combination thereof is performed by quantitative PCR (qPCR) or digital PCR (dPCR).
  12. The method of claim 1, wherein the first PCR and the second PCR are performed simultaneously or sequentially.
  13. The method of claim 1, wherein the first Ct value represents the total reads of the second library.
  14. The method of claim 1, wherein the second Ct value represents the reads of the second polynucleotide in the second library.
  15. The method of claim 1, wherein the lower the ratio of the second Ct value to the first Ct value, the higher the complexity of the first library, and the higher the ratio of the second Ct value to the first Ct value, the lower the complexity of the first library.
  16. A method of measuring the complexity of a library for nucleic acid sequencing, the method comprising:
    fragmenting nucleic acid extracted from a target sample;
    ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequencing;
    preparing a second library by adding a second polynucleotide to the first library,
    wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region that is located at one or more ends of the first region and comprises a nucleic acid sequence differing from two or more consecutive nucleotides from one or more ends of the target nucleic acid sequence;
    performing nucleic acid sequencing using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library;
    selecting the reads of the second polynucleotide from the obtained total reads to obtain the reads of the second polynucleotide; and
    measuring the complexity of the first library by calculating the ratio of the number of reads of the second polynucleotide to the number of total reads.
  17. The method of claim 16, wherein the nucleic acid sequencing is next generation sequencing (NGS).
  18. The method of claim 16, wherein the lower the ratio of the number of reads of the second polynucleotide to the number of total reads, the higher the complexity of the first library, and the higher the ratio of the number of reads of the second polynucleotide to the number of total reads, the lower the complexity of the first library.
  19. The method of claim 16, wherein the method monitors the complexity of the first library in real time during nucleic acid sequencing or after nucleic acid sequencing.
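
The Ct-ratio comparison recited in claims 1 and 15 can be sketched in Python as follows. This is an illustrative sketch only: the function names, sample labels, and Ct values are hypothetical, and the claims do not specify any particular software implementation.

```python
from typing import Dict, List, Tuple


def ct_ratio(first_ct: float, second_ct: float) -> float:
    """Ratio of the second Ct value to the first Ct value (claim 1)."""
    if first_ct <= 0:
        raise ValueError("first Ct value must be positive")
    return second_ct / first_ct


def rank_by_complexity(samples: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order sample names from highest to lowest inferred complexity.

    Per claim 15, a lower second-to-first Ct ratio indicates higher complexity.
    """
    return sorted(samples, key=lambda name: ct_ratio(*samples[name]))


# Hypothetical (first Ct, second Ct) pairs; not measured data.
samples = {"library_A": (10.0, 15.0), "library_B": (10.0, 12.0)}
print(rank_by_complexity(samples))
```

The same ranking logic would apply to the read-count ratio of claims 16 and 18 if the ratio function were replaced by one that divides spike-in reads by total reads.
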
PCT/KR2017/014549 2016-12-13 2017-12-12 Method for measuring complexity of library for next generation sequencing WO2018110940A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020160169752A KR102417999B1 (en) 2016-12-13 2016-12-13 Method for measuring library complexity for next generation sequencing
KR10-2016-0169752 2016-12-13

Publications (1)

Publication Number Publication Date
WO2018110940A1 true WO2018110940A1 (en) 2018-06-21

Family

ID=62559454

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2017/014549 WO2018110940A1 (en) 2016-12-13 2017-12-12 Method for measuring complexity of library for next generation sequencing

Country Status (2)

Country Link
KR (1) KR102417999B1 (en)
WO (1) WO2018110940A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4293126A3 (en) * 2018-11-30 2024-01-17 Illumina, Inc. Analysis of multiple analytes using a single assay

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130045875A1 (en) * 2011-07-29 2013-02-21 Bio-Rad Laboratories, Inc. Library characterization by digital assay
US20130065768A1 (en) * 2011-09-12 2013-03-14 Sequenta, Inc. Random array sequencing of low-complexity libraries
US20140324359A1 (en) * 2013-04-25 2014-10-30 University Of Southern California Predicting the molecular complexity of sequencing libraries
US20160122748A1 (en) * 2014-10-30 2016-05-05 The Board Of Trustees Of The Leland Stanford Junior University Scalable method for isolation and sequence-verification of oligonucleotides from complex libraries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HEAD, STEVEN R. ET AL.: "Library Construction for Next-generation Sequencing: Overviews and Challenges", BIOTECHNIQUES, vol. 56, no. 2, 2014, pages 61 - 77, XP055108708 *

Also Published As

Publication number Publication date
KR20180068118A (en) 2018-06-21
KR102417999B1 (en) 2022-07-06

Similar Documents

Publication Publication Date Title
US11453913B2 (en) Safe sequencing system
Chatterjee et al. Tools and strategies for analysis of genome-wide and gene-specific DNA methylation patterns
CN108410980B (en) Method and kit for screening target region for methylation PCR detection and application
WO2016167408A1 (en) Method for predicting organ transplant rejection using next-generation sequencing
US9567633B2 (en) Method for detecting hydroxylmethylation modification in nucleic acid and use thereof
CN106591441B (en) Alpha and/or beta-thalassemia mutation detection probe, method and chip based on whole gene capture sequencing and application
WO2016195382A1 (en) Next-generation nucleotide sequencing using adaptor comprising bar code sequence
CN111440896B (en) Novel beta coronavirus variation detection method, probe and kit
EP2917368A1 (en) Methods and systems for identifying contamination in samples
Yin et al. Challenges in the application of NGS in the clinical laboratory
WO2017204572A1 (en) Method for preparing library for highly parallel sequencing by using molecular barcoding, and use thereof
CN113337590B (en) Second generation sequencing method and library construction method
CN109280696B (en) Method for splitting mixed sample by SNP detection technology
WO2018110940A1 (en) Method for measuring complexity of library for next generation sequencing
WO2016080750A1 (en) Gene panel for detecting cancer genome mutant
WO2020252749A1 (en) Method and use for construction of sequencing library based on dna samples
WO2022181858A1 (en) Composition for improving molecular barcoding efficiency and use thereof
WO2019108014A1 (en) Method for measuring integrity of uid nucleic acid sequence in nucleic acid sequencing analysis
WO2023018024A1 (en) Method for diagnosing microsatellite instability by using variation rate of sequence lengths at microsatellite loci
WO2023018026A1 (en) Method for diagnosing microsatellite instability by using difference between maximum value and minimum value of sequence lengths of microsatellite loci
Benamozig et al. A detection method for the capture of genomic signatures: From disease diagnosis to genome editing
WO2021125854A1 (en) Primers for detecting trace amount of rare single-nucleotide variant and method for specifically and sensitively detecting trace amount of rare single-nucleotide variant by using same
WO2024049276A1 (en) Composition for selective amplification of multiple target dna, and amplification method using same
CN114790455B (en) Primer group, kit and method for amplifying GJB2 gene and SLC26A4 gene
US11920198B2 (en) Method and kit for identifying gene mutations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17881419

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17881419

Country of ref document: EP

Kind code of ref document: A1