KR20180068118A

KR20180068118A - Method for measuring library complexity for next generation sequencing

Info

Publication number: KR20180068118A
Application number: KR1020160169752A
Authority: KR
Inventors: 정종석; 손대순; 박웅양
Original assignee: 삼성전자주식회사; 사회복지법인 삼성생명공익재단
Priority date: 2016-12-13
Filing date: 2016-12-13
Publication date: 2018-06-21
Also published as: KR102417999B1; WO2018110940A1

Abstract

Provided is a method for measuring the complexity of a library. With this method, library complexity can be measured simply and accurately, in real time during preparation of a library for nucleic acid sequence analysis, during nucleic acid sequence analysis after library preparation, or after nucleic acid sequence analysis is complete. The method includes fragmenting nucleic acid extracted from a target sample, preparing a first library and a second library, calculating a first threshold cycle (Ct) value and a second Ct value, and measuring the complexity of the first library.

Description

METHOD FOR MEASURING LIBRARY COMPLEXITY FOR NEXT-GENERATION NUCLEIC ACID SEQUENCING

The present disclosure relates to a method for measuring the complexity of a library for next-generation nucleic acid sequence analysis, and to an apparatus using the same.

Next generation sequencing (NGS) is widely used for research and diagnostic purposes. Although the details differ by instrument, NGS can be broadly divided into three stages: sample collection, library preparation, and nucleic acid sequence analysis. After the library is prepared, quality control (QC) is carried out before nucleic acid sequence analysis to decide whether to proceed with analysis of the prepared library. During nucleic acid sequence analysis, QC is also performed in real time, using the method provided by the library manufacturer, to confirm that the analysis is proceeding smoothly. After nucleic acid sequence analysis, the generated nucleic acid sequence data (i.e., reads) are analyzed to assess the quality of data generation before downstream analyses such as mutation, genetic variation, and gene expression analysis. Quality is thus measured at each stage of NGS, and one of the factors that determines quality is the complexity of the library. By measuring library complexity after library preparation, during nucleic acid sequence analysis, or after nucleic acid sequence analysis is complete, one can decide whether to perform or stop the analysis and how to use the generated nucleic acid sequence data.

A method is known in which two minimal-scale nucleic acid sequence analyses are performed before large-scale nucleic acid sequence analysis, and the resulting data are used to predict library complexity statistically (U.S. Patent Application Publication No. US20140324359 A1). However, to measure library complexity, this method first requires nucleic acid sequence analysis of the library to be performed twice, and it cannot measure library complexity directly and in real time.

Therefore, there is a need for a method that can measure library complexity simply and accurately, either in real time during nucleic acid sequence analysis or after nucleic acid sequence analysis.

Provided is a method for measuring the complexity of a library for nucleic acid sequence analysis.

One aspect provides a method for measuring the complexity of a library for nucleic acid sequence analysis, the method comprising: fragmenting nucleic acid extracted from a target sample;

ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequence analysis;

spiking a second polynucleotide into the first library to prepare a second library;

performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to calculate a first threshold cycle (Ct) value;

performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to calculate a second Ct value; and

calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.

The nucleic acid sequence analysis of the "library for nucleic acid sequence analysis" may be next generation sequencing (NGS). The term "next generation sequencing (NGS)" may be used interchangeably with the terms "massively parallel sequencing" and "second-generation sequencing". NGS refers to a technique in which a whole genome is fragmented in a chip-based and PCR-based paired-end format, and the fragments are sequenced at very high speed on the basis of hybridization. NGS analyzes the sequences of a large number of nucleic acid fragments simultaneously, and NGS-based targeted sequencing or panel sequencing may be performed. NGS may be performed using, for example, the 454 platform (Roche), GS FLX Titanium, Illumina MiSeq, Illumina HiSeq, Illumina Genome Analyzer, the Solexa platform, the SOLiD System (Applied Biosystems), Ion Proton (Life Technologies), Complete Genomics, Helicos Biosciences HeliScope, Pacific Biosciences single-molecule real-time (SMRT™) technology, or a combination thereof.

The term "library" refers to a collection of nucleic acid fragments. The library is, for example, a genomic library, a complementary DNA (cDNA) library, or a randomized mutant library.

The term "library complexity" refers to the number of unique fragments present in a library. Complexity may be affected by the amount of starting nucleic acid, the amount of nucleic acid lost during library preparation, the amount of nucleic acid amplified by PCR, and the like. Library complexity may be expressed as a relative level.
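As a minimal illustrative sketch of the definition above, complexity can be pictured as the count of distinct fragments in a collection. The representation of a fragment as a (chromosome, start, end) tuple, the function names, and the toy data are assumptions made for illustration only and are not part of the disclosed method.

```python
def library_complexity(fragments):
    """Count the unique fragments in a library.

    Each fragment is a hypothetical (chromosome, start, end) tuple.
    """
    return len(set(fragments))


def duplication_rate(fragments):
    """Fraction of fragments that are copies of an already-seen fragment."""
    total = len(fragments)
    if total == 0:
        return 0.0
    return 1.0 - library_complexity(fragments) / total


# Toy library: four molecules, one of which is a duplicate.
example = [
    ("chr12", 100, 250),
    ("chr12", 100, 250),
    ("chr2", 500, 640),
    ("chr17", 30, 200),
]
print(library_complexity(example))  # 3
print(duplication_rate(example))    # 0.25
```

In practice the unique fragments cannot be enumerated directly before sequencing, which is why the method described here relies on a relative measure instead.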

The method includes fragmenting nucleic acid extracted from a target sample.

The target sample may be derived from a subject or a cell. The subject may be a mammal, including a human, cow, horse, pig, sheep, goat, dog, cat, or rodent. The cell may be a cell derived from a subject or a cell line. The target sample may be a biological sample. The biological sample may be obtained from, for example, blood, plasma, serum, urine, saliva, mucosal secretions, sputum, feces, tears, or a combination thereof. The biological sample may be a sample of eukaryotic cells, prokaryotic cells, viruses, bacteriophages, or the like derived from various species.

The nucleic acid may be a genome or a fragment thereof. The term "genome" refers collectively to the entirety of the chromosomes, chromatin, or genes. The genome or fragment thereof may be isolated DNA, for example cell-free DNA (cfDNA). Nucleic acid may be extracted or isolated from the target sample by methods known to those of ordinary skill in the art.

The fragmentation may be physical, chemical, or enzymatic cleavage of the genome. For example, the fragmentation is digestion of the genome with a restriction enzyme.

The method may further include size-selecting the fragmented nucleic acid. Size selection may be performed by electrophoresis, centrifugation, chromatography, or a combination thereof. The length of the fragmented nucleic acid may be about 10 bp (base pairs) to about 2000 bp, about 20 bp to about 1500 bp, about 50 bp to about 1000 bp, about 100 bp to about 800 bp, about 150 bp to about 600 bp, or about 300 bp to about 600 bp.

The method includes ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequence analysis.

Preparing the first library may further include end-repair and 3'-adenosine tailing (3'-A tailing) of the fragmented nucleic acid.

The first polynucleotide may be an adapter. The adapter may be a polynucleotide that includes a primer sequence for enrichment of target nucleic acid in NGS. The adapter may be a polynucleotide known to those of ordinary skill in the art. The adapter may include a universal sequence for nucleic acid sequence analysis. For example, the adapter is one included in a library preparation kit for nucleic acid sequence analysis.

Ligation refers to joining the ends of nucleic acid fragments to one another. Ligation may be performed using a DNA ligase.

The first library may be a library prepared for nucleic acid sequence analysis.

The method includes spiking a second polynucleotide into the first library to prepare a second library.

The spiking may be mixing the first library with a small amount of the second polynucleotide.

The second polynucleotide may include a first region containing a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region located at one or more ends of the first region and containing a nucleic acid sequence that differs from two or more consecutive nucleotides at one or more ends of the target nucleic acid sequence.

The target nucleic acid sequence may include a genetic variation used in companion diagnostics (CDx).

The length of the second polynucleotide may be about 20 nucleotides (hereinafter, "nt") to about 500 nt, about 30 nt to about 450 nt, about 40 nt to about 400 nt, about 50 nt to about 350 nt, about 60 nt to about 300 nt, about 70 nt to about 250 nt, about 80 nt to about 200 nt, about 90 nt to about 190 nt, about 100 nt to about 180 nt, about 110 nt to about 170 nt, about 120 nt to about 160 nt, about 130 nt to about 150 nt, or about 150 nt.

The first region may include a nucleic acid sequence in which two or more consecutive nucleotides are identical to the target nucleic acid sequence. The length of the first region may be about 10 nt to about 490 nt, about 20 nt to about 440 nt, about 30 nt to about 390 nt, about 40 nt to about 340 nt, about 50 nt to about 290 nt, about 60 nt to about 240 nt, about 70 nt to about 150 nt, about 80 nt to about 180 nt, about 90 nt to about 170 nt, about 100 nt to about 160 nt, about 110 nt to about 150 nt, about 120 nt to about 150 nt, about 130 nt to about 150 nt, about 140 nt to about 150 nt, or about 142 nt.

The second region may be located at both ends of the first region, and the respective second regions may include a sequence that differs from two or more consecutive nucleotides from the 5' end of the target nucleic acid sequence and a sequence that differs from two or more consecutive nucleotides from the 3' end of the target nucleic acid sequence. The length of the second region may be about 2 nt to about 15 nt, about 2 nt to about 13 nt, about 2 nt to about 10 nt, about 2 nt to about 8 nt, about 2 nt to about 6 nt, about 2 nt to about 4 nt, about 3 nt, or about 4 nt.

The second polynucleotide may include, for example, in the 5' to 3' direction, a second region, the first region, and a second region.

The second polynucleotide may further include, at one or more of its ends, two or more consecutive nucleotides identical to the first polynucleotide. For example, the second polynucleotide may include, in the 5' to 3' direction, two or more consecutive nucleotides identical to the first polynucleotide, a second region, the first region, a second region, and two or more consecutive nucleotides identical to the first polynucleotide.

The method includes performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to calculate a first Ct (threshold cycle) value.

The PCR is, for example, quantitative PCR (qPCR), digital PCR (dPCR), hot-start PCR, touchdown PCR, nested PCR, booster PCR, multiplex PCR, real-time PCR, differential display PCR (D-PCR), rapid amplification of cDNA ends (RACE), inverse PCR (IPCR), vectorette PCR, or thermal asymmetric interlaced PCR (TAIL-PCR).

The first primer set may be polynucleotides complementary to the first polynucleotide. The first primer set may be a universal primer set.

The second primer set may be polynucleotides complementary to the second polynucleotide. The second primer set may be polynucleotides that are complementary to the second polynucleotide but not complementary to the first polynucleotide.

In the PCR, a probe complementary to the target nucleic acid may additionally be used to detect the amplified nucleic acid. One or more ends of the probe may be labeled with a fluorescent substance, a quantum dot, a FRET label, or the like.

The Ct (threshold cycle) value refers to the number of cycles at which the amplification signal first exceeds the background signal in PCR. In quantitative PCR, it is the number of cycles at which the fluorescence signal reaches the threshold. Because the Ct value is inversely correlated with the initial copy number of the nucleic acid used as starting material in the amplification reaction, the Ct value can be used to calculate the copy number of nucleic acid in the target sample.
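The inverse relationship between Ct and starting copy number can be sketched with the usual exponential amplification model, in which the amount of product grows by a factor of (1 + efficiency) per cycle. The function name, the default efficiency of 1.0 (perfect doubling), and the example Ct values are illustrative assumptions, not values taken from this document.

```python
def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Estimate how many times more starting template reaction A had than reaction B.

    Assumes both reactions share the same threshold and the same per-cycle
    amplification efficiency (1.0 means the product doubles every cycle).
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


# A reaction that crosses the threshold 3 cycles earlier started with
# about 2**3 = 8 times more template under perfect doubling.
print(fold_difference(20.0, 23.0))  # 8.0
```

A lower Ct therefore corresponds to more starting template, and each cycle of difference corresponds to roughly a two-fold difference when the efficiency is close to 1.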

The first Ct value may represent the total reads of the second library. The term "read" refers to the nucleic acid sequence information of a nucleic acid fragment obtained by nucleic acid sequence analysis.

The method includes performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to calculate a second Ct value.

The second Ct value may represent the reads of the second polynucleotide in the second library.

The first PCR, the second PCR, or both may be performed by quantitative PCR (qPCR) or digital PCR (dPCR).

The first PCR and the second PCR may be performed simultaneously or sequentially.

The method includes calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.

The lower the ratio of the second Ct value to the first Ct value, the higher the complexity of the first library may be. The higher the ratio of the second Ct value to the first Ct value, the lower the complexity of the first library may be.
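The ratio calculation described above can be sketched as follows. The sketch simply follows the direction stated in the text (a lower ratio of the second Ct value to the first Ct value is read as higher complexity). The function names and the Ct values are hypothetical placeholders, not measurements from this document.

```python
def ct_ratio(first_ct, second_ct):
    """Ratio of the second Ct value (spike-in primers) to the first Ct value (adapter primers)."""
    if first_ct <= 0:
        raise ValueError("first_ct must be positive")
    return second_ct / first_ct


def rank_by_complexity(samples):
    """Order library names from highest to lowest inferred complexity.

    `samples` maps a library name to a (first_ct, second_ct) pair. Following
    the statement in the text, a lower ratio is treated as higher complexity.
    """
    return sorted(samples, key=lambda name: ct_ratio(*samples[name]))


# Made-up Ct values for two hypothetical libraries.
measurements = {
    "library_a": (12.0, 18.0),  # ratio 1.5
    "library_b": (12.0, 24.0),  # ratio 2.0
}
print(rank_by_complexity(measurements))  # ['library_a', 'library_b']
```

Because the result is a ratio, it gives a relative ranking between libraries rather than an absolute count of unique fragments.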

Another aspect provides a method for measuring the complexity of a library for nucleic acid sequence analysis, the method comprising: fragmenting nucleic acid extracted from a target sample;

adding a second polynucleotide to the first library to prepare a second library,

wherein the second polynucleotide includes a first region containing a nucleic acid sequence in which two or more consecutive nucleotides are identical to a target nucleic acid sequence, and a second region located at one or more ends of the first region and containing a nucleic acid sequence that differs from two or more consecutive nucleotides at one or more ends of the target nucleic acid sequence;

performing nucleic acid sequence analysis (sequencing) using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library;

selecting reads of the second polynucleotide from the obtained total reads to obtain the reads of the second polynucleotide; and

calculating the ratio of the number of reads of the second polynucleotide to the number of total reads to measure the complexity of the first library.

The target sample, nucleic acid, nucleic acid sequence analysis, fragmentation, first polynucleotide, ligation, spiking, second polynucleotide, PCR, Ct value, library, and library complexity are as described above.

The method includes adding a second polynucleotide to the first library to prepare a second library.

The method includes performing nucleic acid sequence analysis using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library.

The first primer may be a single primer or a primer set.

The method includes selecting reads of the second polynucleotide from the obtained total reads to obtain the reads of the second polynucleotide.

The method includes calculating the ratio of the number of reads of the second polynucleotide to the number of total reads to measure the complexity of the first library.

The lower the ratio of the number of reads of the second polynucleotide to the number of total reads, the higher the complexity of the first library may be. The higher the ratio of the number of reads of the second polynucleotide to the number of total reads, the lower the complexity of the first library may be.
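The read-based ratio can be sketched as a simple count. How reads are assigned to the second polynucleotide is not specified at this level of detail, so the matching rule below (a read of at least `min_len` nucleotides that is an exact prefix of a spike-in sequence) is an assumption chosen for brevity, as are the function names and the toy sequences.

```python
def is_spike_in_read(read, spike_ins, min_len=8):
    """Hypothetical rule: a read counts as a spike-in read if it is long
    enough and exactly matches the start of one of the spike-in sequences."""
    return len(read) >= min_len and any(s.startswith(read) for s in spike_ins)


def spike_in_fraction(reads, spike_ins, min_len=8):
    """Number of spike-in reads divided by the total number of reads."""
    if not reads:
        raise ValueError("no reads supplied")
    hits = sum(is_spike_in_read(r, spike_ins, min_len) for r in reads)
    return hits / len(reads)


# Toy data: one short spike-in sequence and four reads.
spike_ins = ["CTTCATCCTGAGGAAG"]
reads = ["CTTCATCCTG", "AGGAATCCTG", "CTTCATCCTGAG", "TTTTGGGGCCCC"]
print(spike_in_fraction(reads, spike_ins))  # 0.5
```

A real pipeline would more likely align reads with mismatch tolerance; exact prefix matching is used here only to keep the example self-contained.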

The method can monitor the complexity of the first library in real time during nucleic acid sequence analysis, or after nucleic acid sequence analysis.

According to the method for measuring library complexity of one aspect or another aspect, library complexity can be measured simply and accurately in real time during preparation of a library for nucleic acid sequence analysis, during nucleic acid sequence analysis after library preparation, or after nucleic acid sequence analysis is complete.

FIG. 1A is a schematic diagram illustrating the principle of a method for measuring the complexity of a library for NGS according to one aspect, and FIG. 1B is a schematic diagram illustrating the proportion of artificial-sequence reads among total reads when library complexity is high or low (■ in a read: adapter; □ in a read: artificial sequence).
FIG. 2 is a graph showing Ct values in quantitative PCR as a function of library complexity.
FIG. 3 is a graph showing the proportion of artificial-sequence reads among total reads as a function of library complexity.

Hereinafter, the present invention is described in more detail by way of examples. However, these examples are provided for illustration only, and the scope of the present invention is not limited to them.

Example 1. Measurement of library complexity for next-generation nucleic acid sequence analysis

1. Preparation of nucleic acid fragments containing artificial sequences

For next generation sequencing (NGS), the genes KRAS, IDH1, BRCA1, ALK, and ERBB2, which contain variants known to be used in companion diagnostics (CDx), and regions of these genes were selected as target sequences. A reference sequence of about 150 bp was selected around each chosen position.

Nucleic acid fragments (hereinafter, "artificial sequence-containing nucleic acid fragments") were produced by gene synthesis. Each fragment has the same nucleic acid sequence as the selected reference sequence, except that the 4 bp at the 5' end and the 4 bp at the 3' end are replaced with an artificial sequence, and each fragment carries the adapter nucleic acid sequence of the library at both ends.
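The construction described above can be sketched as a string operation: the four nucleotides at each end of the reference sequence are replaced with artificial flanks, and adapter sequences are attached outside them. The flanks CTTC and GAAG are the ones that appear in Table 1; the function name, the empty default adapters, and the shortened toy reference are illustrative assumptions.

```python
def make_spike_in(reference, flank5="CTTC", flank3="GAAG", adapter5="", adapter3=""):
    """Build an artificial sequence-containing fragment from a reference sequence.

    The first len(flank5) and last len(flank3) nucleotides of the reference
    are replaced by the artificial flanks. Adapter sequences are placeholders
    here; the real adapter sequences depend on the library kit.
    """
    n5, n3 = len(flank5), len(flank3)
    if len(reference) <= n5 + n3:
        raise ValueError("reference is too short to replace both ends")
    core = reference[n5:len(reference) - n3]
    return adapter5 + flank5 + core + flank3 + adapter3


# Shortened toy reference (16 nt), not one of the ~150 bp sequences in Table 1.
toy_reference = "AGGAATCCTGAGTGAC"
print(make_spike_in(toy_reference))  # CTTCATCCTGAGGAAG
```

Because only the ends differ from the reference, primers or read filters that cover an artificial flank can distinguish the spiked fragment from genuine genomic fragments covering the same region.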

선별된 유전자, 참조 서열, 및 인위적 서열 함유 핵산 단편의 핵산 서열에서 어댑터 핵산 서열을 제외한 나머지 핵산 서열을 하기 표 1에 나타내었다.The nucleic acid sequences other than the adapter nucleic acid sequences in the nucleic acid sequences of the selected genes, reference sequences, and artificial sequence-containing nucleic acid fragments are shown in Table 1 below.

번호number 참조 유전체의 위치Location of Reference Dielectric 참조 서열Reference sequence 인위적 서열 함유 핵산 단편An artificial sequence-containing nucleic acid fragment 1One KRAS : 염색체 번호 12 :
엑손 번호: 3 :
염색체 12:25380168-25380346KRAS: Chromosome number 12:
Exon number: 3:
Chromosome 12: 25380168-25380346 5'-AGGAATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACAGTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGATGAC-3' (서열번호 1)5 ' -AGGAATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACACTTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGATGAC-3 '(SEQ ID NO: 1) 5'- CTTC ATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACAGTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGA GAAG -3' (서열번호 2)5'- CTTC ATCCTGAGAAGGGAGAAACACAGTCTGGATTATTACAGTGCACCTTTTACTTCAAAAAAGGTGTTATATACAACTCAACAACAAAAAATTCAATTTAAAAATGGGCAAAGGACTTGAAAAGACATTGTTCCTGCTCCAAAGA GAAG -3 '(SEQ ID NO: 2) 22 IDH1 : 염색체 번호12
엑손 번호: 4 :
염색체 2:209113048-209113359IDH1: chromosome number 12
Exon number: 4:
Chromosome 2: 209113048-209113359
5'-AGATAATGGCTTCTCTGAAGACCGTGCCACCCAGAATATTTCGTATGGTGCCATTTGGTGATTTCCACATTTGTTTCAACTTGAACTCCTCAACCCTCTTCTCATCAGGAGTGATAGTGGCACATTTGACGCCAACATTATGCTTCTTTA-3' (SEQ ID NO: 3)
5'-CTTC AATGGCTTCTCTGAAGACCGTGCCACCCAGAATATTTCGTATGGTGCCATTTGGTGATTTCCACATTTGTTTCAACTTGAACTCCTCAACCCTCTTCTCATCAGGAGTGATAGTGGCACATTTGACGCCAACATTATGCTTC GAAG-3' (SEQ ID NO: 4)

3 BRCA1: chromosome number 17
Exon number: 15
Chromosome 17: 41222945-41223255
5'-TCAATTCTGGCTTCTCCCTGCTCACACTTTCTTCCATTGCATTATACCCAGCAGTATCAGTAGTATGAGCAGCAGCTGGACTCTGGGCAGATTCTGCAACTTTCAACTTTCAATTGGGGAACTTTCAATGCAGAGGTTGAAGATGGTATG-3' (SEQ ID NO: 5)
5'-CTTC TTCTGGCTTCTCCCTGCTCACACTTTCTTCCATTGCATTATACCCAGCAGTATCAGTAGTATGAGCAGCAGCTGGACTCTGGGCAGATTCTGCAACTTTCAACTTTCAATTGGGGAACTTTCAATGCAGAGGTTGAAGATGG GAAG-3' (SEQ ID NO: 6)

4 ALK: chromosome number 2
Exon number: 20
Chromosome 2: 29446208-29446394
5'-GGTCACTGATGGAGGAGGTCTTGCCAGCAAAGCAGTAGTTGGGGTTGTAGTCGGTCATGATGGTCGAGGTGCGGAGCTTGCTCAGCTTGTACTCAGGGCTCTGCAGCTCCATCTGCATGGCTTGCAGCTCCTGGTGCTTCCGGCGGTACA-3' (SEQ ID NO: 7)
5'-CTTC ACTGATGGAGGAGGTCTTGCCAGCAAAGCAGTAGTTGGGGTTGTAGTCGGTCATGATGGTCGAGGTGCGGAGCTTGCTCAGCTTGTACTCAGGGCTCTGCAGCTCCATCTGCATGGCTTGCAGCTCCTGGTGCTTCCGGCGG GAAG-3' (SEQ ID NO: 8)

5 ERBB2: chromosome number 17
Exon number: 6
Chromosome 17: 37864574-37864787
5'-CAGGGCTACGTGCTCATCGCTCACAACCAAGTGAGGCAGGTCCCACTGCAGAGGCTGCGGATTGTGCGAGGCACCCAGCTCTTTGAGGACAACTATGCCCTGGCCGTGCTAGACAATGGAGACCCGCTGAACAATACCACCCCTGTCACA-3' (SEQ ID NO: 9)
5'-CTTC GCTACGTGCTCATCGCTCACAACCAAGTGAGGCAGGTCCCACTGCAGAGGCTGCGGATTGTGCGAGGCACCCAGCTCTTTGAGGACAACTATGCCCTGGCCGTGCTAGACAATGGAGACCCGCTGAACAATACCACCCCTGT GAAG-3' (SEQ ID NO: 10)

In Table 1, the artificial sequences of the artificial sequence-containing nucleic acid fragments are shown in bold and underlined.
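The relationship between a reference sequence and its artificial sequence-containing fragment in Table 1 can be sketched in a few lines. In the sequences listed, the first four nucleotides of the reference are replaced with CTTC and the last four with GAAG. The function name `make_artificial_fragment` and its default arguments are illustrative assumptions, not part of the described method.

```python
def make_artificial_fragment(reference: str, head: str = "CTTC", tail: str = "GAAG") -> str:
    """Replace the 5'-most and 3'-most bases of a reference with artificial tags.

    The tag values mirror the pattern seen in Table 1; they are defaults for
    illustration only.
    """
    reference = reference.upper()
    if len(reference) <= len(head) + len(tail):
        raise ValueError("reference is too short to carry both artificial ends")
    # Keep the unchanged middle of the reference and swap only the two ends.
    core = reference[len(head):len(reference) - len(tail)]
    return head + core + tail


# IDH1 reference sequence as listed under SEQ ID NO: 3.
IDH1_REFERENCE = (
    "AGATAATGGCTTCTCTGAAGACCGTGCCACCCAGAATATTTCGTATGGTGCCATTTGGTG"
    "ATTTCCACATTTGTTTCAACTTGAACTCCTCAACCCTCTTCTCATCAGGAGTGATAGTGG"
    "CACATTTGACGCCAACATTATGCTTCTTTA"
)

idh1_artificial = make_artificial_fragment(IDH1_REFERENCE)
print(idh1_artificial[:10], "...", idh1_artificial[-10:])
```

The fragment keeps the length of the reference, so a read derived from it differs from a genomic read only at the two ends.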

2. Preparation of a library for NGS and addition of artificial sequence-containing nucleic acid fragments

To prepare a library for NGS, ten human genomic samples (HapMap NA07014, NA10840, NA18595, NA18957, NA18488, NA18511, NA18867, NA18924, NA19108, and NA19114) were mixed at equal molar ratios, and 50 ng or 200 ng of the resulting HapMap mixed sample was prepared. Using the Kapa Hyper Prep Kit for Illumina (Kapa Biosystems), fragmentation of the human genome, end repair, 3'-adenosine tailing, and adapter ligation were performed sequentially on the mixed sample according to the manufacturer's protocol.

Each of the artificial sequence-containing nucleic acid fragments prepared in Example 1.1 was spiked into the adapter-ligated library at 50 attomoles (amol). The spiked library was subjected to pre-capture polymerase chain reaction (PCR), followed by target enrichment. The target-enriched library was then subjected to post-capture PCR.
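As a rough worked example of the spike-in amount, the snippet below converts the stated quantity into a number of molecules. It assumes the amount is 50 attomoles (10^-18 mol) per fragment; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def attomoles_to_copies(amol: float) -> float:
    """Convert an amount in attomoles to an approximate number of molecules."""
    return amol * 1e-18 * AVOGADRO


copies = attomoles_to_copies(50)
print(f"{copies:.2e} copies per fragment")
```

Under that assumption, each fragment is added at roughly thirty million copies.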

Real-time PCR was performed using the KAPA quantitative PCR (qPCR) kit for Illumina library quantification, and the threshold cycle (Ct) value was calculated from the real-time qPCR results. Here, the calculated Ct value represents the total number of reads.

Separately, to measure the number of reads of the artificial sequence-containing nucleic acid fragments in the library, real-time qPCR was performed using the primer sets and probes shown in Table 2 below. Here, the calculated Ct value represents the number of reads derived from the artificial sequence-containing nucleic acid fragments.

Primer                      Nucleic acid sequence
IDH1_artificial_forward     5'-CCACCGAGATCTACACTCTTTC-3' (SEQ ID NO: 11)
IDH1_artificial_probe       5'-ACGCTCTTCCGATCTCTTCAATGGC-3' (SEQ ID NO: 12)
IDH1_artificial_reverse     5'-AAATCACCAAATGGCACCATAC-3' (SEQ ID NO: 13)
BRCA1_artificial_forward    5'-GCGACCACCGAGATCTACA-3' (SEQ ID NO: 14)
BRCA1_artificial_probe      5'-ACGACGCTCTTCCGATCTCTTCTTCT-3' (SEQ ID NO: 15)
BRCA1_artificial_reverse    5'-GAAAGTGTGAGCAGGGAGAAG-3' (SEQ ID NO: 16)
ERBB2_artificial_forward    5'-CCACCGAGATCTACACTCTTTC-3' (SEQ ID NO: 17)
ERBB2_artificial_probe      5'-ATCTCTTCGCTACGTGCTCATCGC-3' (SEQ ID NO: 18)
ERBB2_artificial_reverse    5'-CCTGCCTCACTTGGTTGT-3' (SEQ ID NO: 19)
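How the Table 2 oligonucleotides relate to a spiked fragment can be checked with a short sketch. It uses the first 70 nucleotides of the IDH1 artificial fragment (SEQ ID NO: 4) together with the IDH1 probe and reverse primer. The variable names are illustrative. The reading that the 5' portion of the probe anneals to adapter sequence is an inference from the fact that it does not occur in the fragment; the text does not state it.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# First 70 nt of the IDH1 artificial sequence-containing fragment (SEQ ID NO: 4).
IDH1_ARTIFICIAL_5PRIME = (
    "CTTCAATGGCTTCTCTGAAGACCGTGCCACCCAGAATATT"
    "TCGTATGGTGCCATTTGGTGATTTCCACAT"
)
IDH1_PROBE = "ACGCTCTTCCGATCTCTTCAATGGC"   # SEQ ID NO: 12
IDH1_REVERSE = "AAATCACCAAATGGCACCATAC"    # SEQ ID NO: 13

# The last 10 nt of the probe coincide with the artificial 5' end of the fragment.
probe_on_junction = IDH1_ARTIFICIAL_5PRIME.startswith(IDH1_PROBE[-10:])

# The reverse primer anneals where its reverse complement occurs in the fragment.
reverse_site = IDH1_ARTIFICIAL_5PRIME.find(reverse_complement(IDH1_REVERSE))

print("probe covers artificial 5' end:", probe_on_junction)
print("reverse primer site (0-based):", reverse_site)
```

Because the probe overlaps the CTTC end, a genomic IDH1 fragment, which begins with AGAT at that position, would not match the probe's 3' portion.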

In addition, to confirm whether the ratio of artificial sequence reads to total reads changes with library complexity, libraries of varying complexity were prepared by modifying the library preparation procedure. Following the library preparation method provided by the manufacturer of the Kapa Hyper Prep Kit for Illumina (Kapa Biosystems), the ligated product was purified once in the adapter ligation step and a 30 μM adapter was used; the library prepared in this way served as the negative control. To artificially reduce library complexity, libraries were prepared with two rounds of purification of the ligated product in the ligation step, with a 3 μM adapter (i.e., a 1/10 dilution), or with a combination of both.

Real-time qPCR was performed on the negative control library and on the reduced-complexity libraries in the same manner as described above, and Ct values were calculated. The Ct values calculated for each level of library complexity are shown in FIG. 2. As shown in FIG. 2, the Ct value of the artificial sequence-containing nucleic acid fragments decreased as library complexity decreased. In contrast, the Ct value of the total reads showed no significant change despite the change in library complexity.
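The Ct-based readout can be illustrated with a minimal sketch. The Ct values below are hypothetical and are not taken from FIG. 2. The function names are illustrative, and the exponential relation in `relative_abundance` is the usual qPCR approximation (template multiplied by a fixed factor each cycle), which the text does not state.

```python
def ct_ratio(first_ct: float, second_ct: float) -> float:
    """Ratio of the second Ct (spiked fragment) to the first Ct (total library)."""
    if first_ct <= 0:
        raise ValueError("first Ct must be positive")
    return second_ct / first_ct


def relative_abundance(first_ct: float, second_ct: float, efficiency: float = 2.0) -> float:
    """Approximate abundance of the spiked fragment relative to the total library.

    Assumes both assays amplify by `efficiency` per cycle; an idealisation.
    """
    return efficiency ** (first_ct - second_ct)


# Hypothetical values: total-library Ct unchanged, spiked-fragment Ct lower
# in the reduced-complexity library.
control = {"first_ct": 10.0, "second_ct": 20.0}
reduced = {"first_ct": 10.0, "second_ct": 18.0}

for name, cts in (("control", control), ("reduced", reduced)):
    print(name, round(ct_ratio(**cts), 3), relative_abundance(**cts))
```

With these illustrative numbers, a two-cycle drop in the spiked-fragment Ct at constant total Ct corresponds to about a fourfold larger share of spiked fragment under the idealised doubling assumption.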

The nucleic acid sequences of the prepared libraries were analyzed, and the raw read data were used to calculate the total number of reads and the ratio of artificial sequence reads to total reads. The total number of reads and the ratio of artificial sequence reads to total reads at each level of complexity are shown in FIG. 3. In FIG. 3, "50 ng" means that 50 ng of the human genomic DNA HapMap mixed sample was used for library preparation. As shown in FIG. 3, the total number of reads and the number of artificial sequence reads did not correlate with library complexity, whereas the ratio of artificial sequence reads to total reads was inversely correlated with library complexity.
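The read-based ratio can be sketched as follows. This is a simplified illustration, not the analysis pipeline used in the example: reads are classified by exact substring matching against a signature spanning the artificial 5' end of a spiked fragment, on either strand. The function names and the toy reads are assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def spike_in_fraction(reads, signatures):
    """Fraction of reads containing any signature on either strand."""
    if not reads:
        raise ValueError("no reads supplied")
    patterns = set()
    for signature in signatures:
        patterns.add(signature.upper())
        patterns.add(reverse_complement(signature))
    spike_reads = sum(
        1 for read in reads if any(p in read.upper() for p in patterns)
    )
    return spike_reads / len(reads)


# First 20 nt of the IDH1 artificial fragment (SEQ ID NO: 4).
IDH1_SIGNATURE = "CTTCAATGGCTTCTCTGAAG"

toy_reads = [
    "CTTCAATGGCTTCTCTGAAGACCGTGCC",                  # spiked fragment, forward strand
    reverse_complement("CTTCAATGGCTTCTCTGAAGACCG"),  # spiked fragment, reverse strand
    "AGATAATGGCTTCTCTGAAGACCGTGCC",                  # genomic IDH1 (reference 5' end)
    "GGTCACTGATGGAGGAGGTCTTGCC",                     # unrelated genomic read
    "CAGGGCTACGTGCTCATCGCTCACA",                     # unrelated genomic read
]

fraction = spike_in_fraction(toy_reads, [IDH1_SIGNATURE])
print(f"artificial sequence reads / total reads = {fraction:.2f}")
```

Exact matching ignores sequencing errors and reads that cover only the interior of a fragment; a real analysis would more likely align reads and inspect the fragment ends.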

<110> SAMSUNG ELECTRONICS CO., LTD
      Samsung Life Public Welfare Foundation
<120> Method for measuring library complexity for next generation sequencing
<130> PN115614-KR
<160> 19
<170> KopatentIn 2.0

<210> 1
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> KRAS reference sequence
<400> 1
aggaatcctg agaagggaga aacacagtct ggattattac agtgcacctt ttacttcaaa 60
aaaggtgtta tatacaactc aacaacaaaa aattcaattt aaaaatgggc aaaggacttg 120
aaaagacatt gttcctgctc caaagatgac 150

<210> 2
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> Nucleic acid fragment containing artificial sequence
<400> 2
cttcatcctg agaagggaga aacacagtct ggattattac agtgcacctt ttacttcaaa 60
aaaggtgtta tatacaactc aacaacaaaa aattcaattt aaaaatgggc aaaggacttg 120
aaaagacatt gttcctgctc caaagagaag 150

<210> 3
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> IDH1 reference sequence
<400> 3
agataatggc ttctctgaag accgtgccac ccagaatatt tcgtatggtg ccatttggtg 60
atttccacat ttgtttcaac ttgaactcct caaccctctt ctcatcagga gtgatagtgg 120
cacatttgac gccaacatta tgcttcttta 150

<210> 4
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> Nucleic acid fragment containing artificial sequence
<400> 4
cttcaatggc ttctctgaag accgtgccac ccagaatatt tcgtatggtg ccatttggtg 60
atttccacat ttgtttcaac ttgaactcct caaccctctt ctcatcagga gtgatagtgg 120
cacatttgac gccaacatta tgcttcgaag 150

<210> 5
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> BRCA1 reference sequence
<400> 5
tcaattctgg cttctccctg ctcacacttt cttccattgc attataccca gcagtatcag 60
tagtatgagc agcagctgga ctctgggcag attctgcaac tttcaacttt caattgggga 120
actttcaatg cagaggttga agatggtatg 150

<210> 6
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> Nucleic acid fragment containing artificial sequence
<400> 6
cttcttctgg cttctccctg ctcacacttt cttccattgc attataccca gcagtatcag 60
tagtatgagc agcagctgga ctctgggcag attctgcaac tttcaacttt caattgggga 120
actttcaatg cagaggttga agatgggaag 150

<210> 7
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> ALK reference sequence
<400> 7
ggtcactgat ggaggaggtc ttgccagcaa agcagtagtt ggggttgtag tcggtcatga 60
tggtcgaggt gcggagcttg ctcagcttgt actcagggct ctgcagctcc atctgcatgg 120
cttgcagctc ctggtgcttc cggcggtaca 150

<210> 8
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> Nucleic acid fragment containing artificial sequence
<400> 8
cttcactgat ggaggaggtc ttgccagcaa agcagtagtt ggggttgtag tcggtcatga 60
tggtcgaggt gcggagcttg ctcagcttgt actcagggct ctgcagctcc atctgcatgg 120
cttgcagctc ctggtgcttc cggcgggaag 150

<210> 9
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> ERBB2 reference sequence
<400> 9
cagggctacg tgctcatcgc tcacaaccaa gtgaggcagg tcccactgca gaggctgcgg 60
attgtgcgag gcacccagct ctttgaggac aactatgccc tggccgtgct agacaatgga 120
gacccgctga acaataccac ccctgtcaca 150

<210> 10
<211> 150
<212> DNA
<213> Artificial Sequence
<220>
<223> Nucleic acid fragment containing artificial sequence
<400> 10
cttcgctacg tgctcatcgc tcacaaccaa gtgaggcagg tcccactgca gaggctgcgg 60
attgtgcgag gcacccagct ctttgaggac aactatgccc tggccgtgct agacaatgga 120
gacccgctga acaataccac ccctgtgaag 150

<210> 11
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> IDH1_art_Forward primer
<400> 11
ccaccgagat ctacactctt tc 22

<210> 12
<211> 25
<212> DNA
<213> Artificial Sequence
<220>
<223> IDH1_art_Probe
<400> 12
acgctcttcc gatctcttca atggc 25

<210> 13
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> IDH1_art_Reverse primer
<400> 13
aaatcaccaa atggcaccat ac 22

<210> 14
<211> 19
<212> DNA
<213> Artificial Sequence
<220>
<223> BRCA1_art_Forward primer
<400> 14
gcgaccaccg agatctaca 19

<210> 15
<211> 26
<212> DNA
<213> Artificial Sequence
<220>
<223> BRCA1_art_Probe
<400> 15
acgacgctct tccgatctct tcttct 26

<210> 16
<211> 21
<212> DNA
<213> Artificial Sequence
<220>
<223> BRCA1_art_Reverse primer
<400> 16
gaaagtgtga gcagggagaa g 21

<210> 17
<211> 22
<212> DNA
<213> Artificial Sequence
<220>
<223> ERBB2_art_Forward primer
<400> 17
ccaccgagat ctacactctt tc 22

<210> 18
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> ERBB2_art_Probe
<400> 18
atctcttcgc tacgtgctca tcgc 24

<210> 19
<211> 18
<212> DNA
<213> Artificial Sequence
<220>
<223> ERBB2_art_Reverse primer
<400> 19
cctgcctcac ttggttgt 18

Claims

A method for measuring the complexity of a library for nucleic acid sequence analysis, the method comprising:
fragmenting nucleic acid extracted from a target sample;
ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequence analysis;
preparing a second library by spiking a second polynucleotide into the first library,
wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence identical to a target nucleic acid sequence over at least two consecutive nucleotides, and a second region comprising a nucleic acid sequence that differs from the target nucleic acid sequence in at least two consecutive nucleotides from one or more ends of the target nucleic acid sequence;
performing a first polymerase chain reaction (PCR) using the second library and a first primer set complementary to the first polynucleotide to calculate a first threshold cycle (Ct) value;
performing a second PCR using the second library and a second primer set complementary to the second polynucleotide to calculate a second Ct value; and
calculating the ratio of the second Ct value to the first Ct value to measure the complexity of the first library.

The method according to claim 1, wherein the nucleic acid sequence analysis is next generation sequencing (NGS).

The method according to claim 1, wherein the target sample is derived from an individual or a cell.

The method according to claim 1, wherein the nucleic acid is a genome or a fragment thereof.

The method of claim 1, wherein the first polynucleotide is an adapter.

The method of claim 1, wherein the target nucleic acid sequence comprises a genetic variation used in companion diagnostics (CDx).

The method of claim 1, wherein the length of the second polynucleotide is 20 nucleotides to 500 nucleotides.

The method of claim 1, wherein second regions are located at both ends of the first region, and the second regions respectively comprise a sequence that differs from the target nucleic acid sequence in two or more consecutive nucleotides from its 5' end and a sequence that differs from the target nucleic acid sequence in two or more consecutive nucleotides from its 3' end.

The method of claim 1, wherein the length of the second region is from 2 nucleotides to 15 nucleotides.

The method of claim 1, wherein the second polynucleotide further comprises at least two consecutive nucleotides identical to the first polynucleotide at one or more ends thereof.

The method of claim 1, wherein the first PCR, the second PCR, or a combination thereof is performed with quantitative PCR (qPCR) or digital PCR (dPCR).

The method according to claim 1, wherein the first PCR and the second PCR are performed simultaneously or sequentially.

The method of claim 1, wherein the first Ct value represents the total reads of the second library.

The method of claim 1, wherein the second Ct value represents the second polynucleotide of the second library.

The method of claim 1, wherein the lower the ratio of the second Ct value to the first Ct value, the higher the complexity of the first library, and the higher the ratio of the second Ct value to the first Ct value, the lower the complexity of the first library.

A method for measuring the complexity of a library for nucleic acid sequence analysis, the method comprising:
fragmenting nucleic acid extracted from a target sample;
ligating a first polynucleotide to one or more ends of the fragmented nucleic acid to prepare a first library for nucleic acid sequence analysis;
adding a second polynucleotide to the first library to prepare a second library,
wherein the second polynucleotide comprises a first region comprising a nucleic acid sequence identical to a target nucleic acid sequence over at least two consecutive nucleotides, and a second region comprising a nucleic acid sequence that differs from the target nucleic acid sequence in at least two consecutive nucleotides from one or more ends of the target nucleic acid sequence;
performing nucleic acid sequence analysis using the second library and a first primer complementary to the first polynucleotide to obtain the total reads of the second library;
selecting the reads of the second polynucleotide from the obtained total reads; and
calculating the ratio of the number of reads of the second polynucleotide to the total number of reads to measure the complexity of the first library.

The method of claim 16, wherein the nucleic acid sequence analysis is next generation sequencing (NGS).

The method of claim 16, wherein the lower the ratio of the number of reads of the second polynucleotide to the total number of reads, the higher the complexity of the first library, and the higher the ratio, the lower the complexity of the first library.

The method of claim 16, wherein the method monitors the complexity of the first library in real time during nucleic acid sequence analysis, or after nucleic acid sequence analysis.