KR101533792B1

KR101533792B1 - Method for Autosomal Analysing Human Subject of Analytes based on a Next Generation Sequencing Technology

Info

Publication number: KR101533792B1
Application number: KR1020150026015A
Authority: KR
Inventors: 신경진; 박수정; 이승환
Original assignee: Republic of Korea (대한민국)
Priority date: 2015-02-24
Filing date: 2015-02-24
Publication date: 2015-07-06

Abstract

The present invention relates to a method for analyzing the autosomes of a human subject based on next generation sequencing (NGS) technology. The method comprises the steps of: (a) performing multiplex amplification on a human subject DNA sample using primers that each bind complementarily to one of the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, von Willebrand factor A (vWA), D8S1179, human thyroid peroxidase gene (TPOX), human fibrinogen alpha chain (FGA), and amelogenin loci; and (b) determining the short tandem repeat (STR) allele at each locus using NGS data from the multiplex amplification products of step (a), thereby genetically identifying the human subject. The NGS-based method according to the present invention can be used in forensic science to analyze and interpret STR profiles from single-source samples, mixed samples, and degraded DNA samples.

Description

TECHNICAL FIELD

[0001] The present invention relates to a next generation sequencing (NGS)-based method for autosomal analysis of a human subject.

Short tandem repeats (STRs), which are widely used in forensic science, are located in non-coding regions of the human genome and consist of nucleotide sequences 2-7 bp long that are repeated in tandem. Because the number of repeats of the core repeat unit differs between individuals and is characteristic of each individual, STR analysis is used for personal identification and for confirming kinship [1-3]. In current forensic practice, amplification products obtained by the polymerase chain reaction (PCR) are separated by capillary electrophoresis (CE), and the STR genotype is determined from the repeat number inferred from differences in fragment length [3]. Multiplex PCR is commonly used so that amplification products for several STR loci can be obtained simultaneously.

CE-based analysis has a resolution capable of distinguishing a single-base difference, so amplicon lengths can be determined accurately, and amplification products can be detected easily and quickly on automated instruments using fluorescently labeled primers. However, this method cannot determine the nucleotide sequence of the amplification products, and it is limited both in the number of fluorescent labels that can be used and in amplicon size. Sanger sequencing, the conventional sequencing method, yields accurate sequence information but is inefficient in time, labor, and cost when applied to research fields that require large volumes of DNA sequence data, such as personal genome analysis. There has therefore been demand for a new technique that can obtain large volumes of DNA sequence information with high efficiency and at low cost.
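The repeat-number concept above can be illustrated with a short Python sketch. It is not part of the patent: it assumes the core repeat unit is already known (the tetranucleotide AATG below is an illustrative choice) and simply counts the longest run of consecutive, in-frame copies of that unit in a sequence. Interrupted repeats and partial units (microvariants) are deliberately ignored.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive, in-frame copies of `motif` in `seq`.

    Illustrative only: real STR callers also handle interrupted repeats and
    partial (microvariant) units, which this sketch ignores.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq) - k + 1):
        count, pos = 0, start
        # Walk forward one motif length at a time while the motif keeps matching.
        while seq[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical read: four AATG units between short flanking sequences.
read = "CCTG" + "AATG" * 4 + "TTAG"
print(longest_tandem_run(read, "AATG"))  # 4
```

A CE instrument infers this number indirectly from fragment length; sequencing lets it be read off the bases directly.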

Next generation sequencing (NGS), introduced in the mid-2000s, can rapidly generate large numbers of short reads from template DNA [4]. With the development of NGS instruments, improvements in reagents, and advances in bioinformatics, NGS offers several advantages that allow it to replace conventional Sanger-based methods, and it is now used in many research fields [5-9]. In forensic science, NGS has likewise been applied to STR analysis to determine what advantages it offers over conventional CE-based methods, and in particular whether it can overcome the limitations of existing STR analysis; the results of such studies have recently been published in quick succession [10-15]. However, neither the experimental methods (such as sample preparation and library construction for sequencing STR amplicons by NGS) nor the analytical methods for determining STR alleles from the resulting NGS data have yet been firmly established. This study therefore evaluates the usefulness of NGS-based STR genotyping for both single-source samples and 1:1 mixed samples. To do so, it presents an optimized experimental method for generating NGS data from amplification products obtained by multiplex PCR, the method most commonly used for STR analysis, together with a method for effectively analyzing the resulting large-volume NGS data to determine STR alleles, allelic repeat structures, and sequence variants.

NGS can also generate sequence information for STR alleles, which in turn yields information on sequence variants, namely single nucleotide polymorphisms (SNPs) and/or insertions-deletions (INDELs). Some forensic scientists focus on sequence variation rather than on length-based STR alleles alone, because it makes it possible to distinguish alleles of the same length that differ by a sequence variant or by repeat structure. In other words, exploiting the sequence variation present within STRs can increase the discriminatory power of STR loci [10-17].
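The gain in discriminatory power from same-length alleles can be made concrete with a toy Python sketch. The two repeat sequences below are hypothetical, invented only for illustration. Grouping reads by length, which is all CE can see, merges them into a single allele, whereas grouping by exact sequence keeps them apart.

```python
from collections import Counter


def alleles_by_length(reads):
    """Group reads the way length-based CE typing sees them: by length only."""
    return Counter(len(r) for r in reads)


def alleles_by_sequence(reads):
    """Group reads by exact sequence, as sequencing makes possible."""
    return Counter(reads)


# Two hypothetical 12-bp repeat regions of equal length; the second carries
# an A>G substitution at the first base of its third repeat unit.
allele_1 = "AGAT" * 3
allele_2 = "AGAT" + "AGAT" + "GGAT"
reads = [allele_1] * 5 + [allele_2] * 5

print(alleles_by_length(reads))    # Counter({12: 10})
print(alleles_by_sequence(reads))  # two distinct sequences, 5 reads each
```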

Previous studies have explored the potential of NGS for forensic STR analysis. In these studies, however, CE-based STR kits were used, and only four STR markers were tested on the GS Junior platform [10] and nine STR markers on the 454 GS FLX [5]. No in-house multiplex PCR system optimized for NGS analysis of STRs existed. Amplicon size is particularly important because read length must be taken into account: the read length achievable on an NGS platform is the most critical factor for accurate STR typing [17, 18], since a single read should span the entire STR repeat region. The present inventors therefore focused on generating amplicons small enough for NGS read lengths. Many researchers have already pointed out the need for a size-optimized multiplex PCR system for NGS analysis. Accordingly, the inventors constructed an in-house multiplex PCR system using 18 markers that produces small amplicons of 70-210 bp. This in-house multiplex PCR system can be used for forensic analysis of single-source, mixed, and degraded DNA samples.
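The read-length constraint can be expressed as a one-line check. This Python sketch assumes each read starts at one primer, so a read spans the whole repeat region only if it reaches the opposite primer; the amplicon names and the 200-base read length are hypothetical, while the lengths 70 and 210 bp are the bounds of the design range described above.

```python
from typing import Dict


def fully_spanned(amplicon_lengths: Dict[str, int], read_len: int) -> Dict[str, bool]:
    """For each amplicon, report whether a single read of `read_len` bases,
    starting at one primer, reaches the opposite primer and therefore covers
    the whole STR repeat region between them."""
    return {name: length <= read_len for name, length in amplicon_lengths.items()}


# Hypothetical amplicon names; lengths span the 70-210 bp design range.
panel = {"short_amplicon": 70, "mid_amplicon": 150, "long_amplicon": 210}
print(fully_spanned(panel, read_len=200))
# {'short_amplicon': True, 'mid_amplicon': True, 'long_amplicon': False}
```

Keeping every amplicon at or below the platform's typical read length is what makes each read individually informative.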

Numerous papers and patent documents are referenced and cited throughout this specification. The disclosures of the cited papers and patent documents are incorporated herein by reference in their entirety to describe more clearly the state of the art to which the present invention pertains and the content of the present invention.

The present inventors made intensive research efforts to develop an NGS-based STR genotyping method, and a multiplex PCR system optimized for it, using next generation sequencing (NGS), a technology now being applied in forensic genetic analysis. As a result, they devised a method for accurately and effectively analyzing the autosomes of a human subject by analyzing NGS data for 15 STR markers to determine STR alleles, allelic repeat structures, and sequence variants. In particular, they confirmed that STR genotyping is possible not only for single-source samples but also for 1:1 mixed samples, by performing multiplex amplification with non-labeled primers that each bind complementarily to a specific locus and analyzing the large volume of NGS data obtained from the amplification products. The inventors also devised a size-optimized multiplex PCR system for NGS analysis that uses 18 STR markers and generates amplicons of 70-210 bp. The present invention was completed upon confirming that a genetic analysis method using this system can be applied not only to single-source and mixed samples but also to artificially degraded DNA samples, and in particular that it can effectively generate STR allele calls and STR profiles without loss of information at the STR loci.

Accordingly, an object of the present invention is to provide a next generation sequencing (NGS)-based method for autosomal analysis of a human subject.

Another object of the present invention is to provide a multiplex gene amplification kit for next generation sequencing (NGS)-based autosomal analysis.

Other objects and advantages of the present invention will become more apparent from the following detailed description of the invention, the claims, and the drawings.

The present inventors made intensive research efforts to develop an NGS-based STR genotyping method, and a multiplex PCR system optimized for it, using next generation sequencing (NGS), a technology now being applied in forensic genetic analysis. As a result, they devised a method for accurately and effectively analyzing the autosomes of a human subject by analyzing NGS data for 15 STR markers to determine STR alleles, allelic repeat structures, and sequence variants. In particular, they confirmed that STR genotyping is possible not only for single-source samples but also for 1:1 mixed samples, by performing multiplex amplification with non-labeled primers that each bind complementarily to a specific locus and analyzing the large volume of NGS data obtained from the amplification products. The inventors also devised a size-optimized multiplex PCR system for NGS analysis that uses 18 STR markers and generates amplicons of 70-210 bp, and they confirmed that a genetic analysis method using this system can be applied not only to single-source and mixed samples but also to artificially degraded DNA samples, and in particular that it can effectively generate STR allele calls and STR profiles without loss of information at the STR loci.

I. Next generation sequencing (NGS)-based autosomal analysis method 1

According to one aspect of the present invention, there is provided a next generation sequencing (NGS)-based method for autosomal analysis of a human subject, comprising the following steps:

(a) performing multiplex amplification on a human subject DNA sample using primers that each bind complementarily to one of the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, vWA (von Willebrand factor A), D8S1179, TPOX (human thyroid peroxidase gene), FGA (human fibrinogen alpha chain), and amelogenin loci; and

(b) determining the short tandem repeat (STR) allele at each locus using NGS data from the multiplex amplification products of step (a), thereby genetically identifying the human subject.

Recently, next generation sequencing (NGS) has attracted attention as the ultimate genotyping method for overcoming the limitations of capillary electrophoresis (CE)-based STR analysis, such as the limited number of STR loci that can be typed simultaneously with fluorescently labeled primers and the maximum size of STR amplicons.

The present inventors analyzed 15 autosomal STR markers (the 15 STR loci D3S1358, TH01, D21S11, D18S51, Penta E, D5S818, D13S317, D7S820, D16S539, CSF1PO, Penta D, vWA, D8S1179, TPOX, and FGA, plus amelogenin) by NGS and evaluated the efficiency of the approach for STR analysis. Using male and female standard DNA as single-source samples and as a 1:1 mixture, sample amplicons were generated by multiplex PCR, DNA libraries were constructed by ligating adapters carrying multiplex identifiers (MIDs), and the DNA was sequenced on the Roche GS Junior platform. The sequencing data for each sample were analyzed by alignment to a pre-built set of reference sequences. Most STR alleles were called by applying a coverage threshold of 20% for the two single-source samples and of 10% for the 1:1 mixture. The repeat structure of each allele was determined precisely by analyzing the sequence of the target STR region. The mixing ratio of the mixed sample was estimated by analyzing the coverage ratio between the corresponding alleles at each locus and the reference/variant ratio at sequence variants. As a result, NGS data could be generated successfully using the experimental method of the present invention. In addition, the NGS analysis protocol enables accurate STR allele calling and repeat-structure determination at each locus. The NGS-based method of the present invention is therefore expected to be useful for analyzing and interpreting STR profiles from single-source and mixed samples in forensic investigations.
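The threshold-based allele calling and the coverage-ratio mixture estimate described above can be sketched in Python. This is a simplified illustration, not the inventors' pipeline. It assumes that each threshold is a share of all reads aligned to a locus, and it estimates the mixture ratio only at loci where the two known contributor genotypes share no allele; the reference/variant ratio approach is not shown. All read counts and allele labels are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

SINGLE_SOURCE_THRESHOLD = 0.20  # 20% of locus coverage (single-source samples)
MIXTURE_THRESHOLD = 0.10        # 10% of locus coverage (1:1 mixture)


def call_alleles(coverage: Dict[str, int], threshold: float) -> List[str]:
    """Return alleles (sorted as strings) whose share of the locus reads meets `threshold`."""
    total = sum(coverage.values())
    if total == 0:
        return []
    return sorted(a for a, n in coverage.items() if n / total >= threshold)


def contributor_fraction(coverage: Dict[str, int],
                         alleles_a: Iterable[str],
                         alleles_b: Iterable[str]) -> Optional[float]:
    """Estimate contributor A's share of a two-person mixture at one locus.

    Valid only when the known genotypes share no allele (a homozygote is given
    as a single allele); returns None otherwise, because reads from a shared
    allele cannot be split between the contributors.
    """
    set_a, set_b = set(alleles_a), set(alleles_b)
    if set_a & set_b:
        return None
    reads_a = sum(coverage.get(a, 0) for a in set_a)
    reads_b = sum(coverage.get(b, 0) for b in set_b)
    if reads_a + reads_b == 0:
        return None
    return reads_a / (reads_a + reads_b)


# Hypothetical read counts per allele at one locus of a 1:1 mixture.
locus = {"11": 60, "12": 480, "13": 520, "15": 510, "16": 490}
print(call_alleles(locus, MIXTURE_THRESHOLD))                   # ['12', '13', '15', '16']
print(contributor_fraction(locus, ["12", "13"], ["15", "16"]))  # 0.5
```

A lower threshold for mixtures reflects that each contributor's alleles carry roughly half the locus reads, so a 20% cut-off would risk dropping true alleles from the minor or heterozygous contributor.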

The DNA sample analyzed in the present invention is, without limitation, a DNA sample isolated from a tissue selected from the group consisting of blood, semen, vaginal cells, hair, saliva, urine, buccal cells, amniotic fluid containing placental or fetal cells, and mixtures thereof.

The DNA sample can be obtained by conventional methods known in the art. For example, DNA is isolated by treating the tissue with a DNA lysis buffer (e.g., one containing Tris-HCl, EDTA, EGTA, SDS, deoxycholate, and Triton X and/or NP-40).

According to one embodiment of the present invention, the DNA sample is a biological sample containing DNA. The DNA sample may be a single-source sample or a mixture from two or more sources.

Although DNA isolated from a biological sample can be used in the method of the present invention, direct PCR may also be performed on the nucleic acid molecules in the biological sample by using the sample itself as the template source (see Korean Patent Registration No. 10-0746372).

The genetic identification method of the present invention is described in detail below, step by step:

Step (a): Multiplex amplification

First, multiplex amplification is performed on the DNA sample to be analyzed using primers that each bind to one of the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, vWA (von Willebrand factor A), D8S1179, TPOX (human thyroid peroxidase gene), FGA (human fibrinogen alpha chain), and amelogenin loci.

The term "amplification" as used herein refers to a reaction that amplifies a nucleic acid molecule. Various amplification reactions have been reported in the art, including, but not limited to, the polymerase chain reaction (PCR) (U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159), reverse transcription-polymerase chain reaction (RT-PCR) (Sambrook et al., Molecular Cloning. A Laboratory Manual, 3rd ed. Cold Spring Harbor Press (2001)), the methods of Miller, H. I. (WO 89/06700) and Davey, C. et al. (EP 329,822), ligase chain reaction (LCR) (17, 18), Gap-LCR (WO 90/01069), repair chain reaction (EP 439,182), transcription-mediated amplification (TMA) (19) (WO 88/10315), self-sustained sequence replication (20) (WO 90/06995), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (CP-PCR) (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (AP-PCR) (U.S. Pat. Nos. 5,413,909 and 5,861,245), nucleic acid sequence-based amplification (NASBA) (U.S. Pat. Nos. 5,130,238, 5,409,818, 5,554,517, and 6,063,603), strand displacement amplification, and loop-mediated isothermal amplification (LAMP). Other usable amplification methods are described in U.S. Pat. Nos. 5,242,794, 5,494,810, and 4,988,617 and in U.S. Ser. No. 09/854,317.

PCR is the best-known nucleic acid amplification method, and many modifications and applications of it have been developed. For example, touchdown PCR, hot-start PCR, nested PCR, and booster PCR were developed by modifying the conventional PCR procedure to improve its specificity or sensitivity. In addition, multiplex PCR, real-time PCR, differential display PCR (D-PCR), rapid amplification of cDNA ends (RACE), inverse PCR (IPCR), vectorette PCR, and thermal asymmetric interlaced PCR (TAIL-PCR) were developed for specific applications. Details of PCR are described in McPherson, M.J., and Moller, S.G. PCR. BIOS Scientific Publishers, Springer-Verlag New York Berlin Heidelberg, N.Y. (2000), the teachings of which are incorporated herein by reference.

In the present invention, the multiplex amplification is multiplex PCR amplification or direct multiplex PCR amplification. According to one embodiment of the present invention, the multiplex amplification is emulsion multiplex PCR amplification.

According to one embodiment of the present invention, the multiplex PCR amplification uses an annealing temperature of 55-65°C; according to another embodiment, an annealing temperature of 57-62°C; and according to a specific embodiment, an annealing temperature of 60°C.

The multiplex PCR amplification requires an appropriate number of cycles. According to one embodiment of the present invention, the multiplex PCR amplification is performed for 32-36 cycles, 33-35 cycles, or 34 cycles.
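To put these cycle numbers in perspective, the textbook idealized PCR model multiplies the template by (1 + E) each cycle, where E is the per-cycle efficiency. The patent states no efficiency value, so E here is an assumption, and real multiplex reactions plateau well before the ideal yield; the Python sketch below should be read as an upper bound only.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy-number multiplier (1 + E)^n after n PCR cycles.

    E = 1.0 means perfect doubling every cycle. Real reactions run below that
    and eventually plateau, so treat the result as an upper bound.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


# Compare the cycle numbers of the embodiments under perfect and an assumed 90% efficiency.
for n in (32, 34, 36):
    print(n, f"{fold_amplification(n):.2e}", f"{fold_amplification(n, 0.9):.2e}")
```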

In the present invention, multiplex amplification is performed using primers that each bind complementarily to one of the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO, PentaD, vWA, D8S1179, TPOX, FGA, and amelogenin loci of the DNA sample.

The primers used in the present invention may carry a fluorescent dye or an additional adapter sequence, or they may be used as non-labeled primers.

According to one embodiment of the present invention, the primers used are non-labeled primers, that is, primers that are neither labeled with a fluorescent dye nor attached to an adapter. Unlike in CE-based STR analysis, the present inventors did not attach fluorescent labels to the primers, because they found that non-labeled primers gave more accurate allele calls and higher-quality NGS data sets than primers carrying a fluorescent label or an adapter.

The present inventors prepared three types of primer sets from the primer sequences in Table 1, differing in whether a fluorescent label or an adapter sequence was present (fluorescent label: Tests 1 and 6; adapter sequence: Tests 2 and 3; non-labeled: Tests 4 and 5), generated PCR amplification products with each set, and performed NGS. In terms of data quantity (total number of reads), the primers designed to include the adapter sequence (Tests 2 and 3) yielded about 80,000 reads, whereas the other primer sets (Tests 1, 4, 5, and 6) yielded about 140,000-150,000 reads. In the read-length distribution, Tests 1 and 6 showed a narrow range centered on 200 bp, whereas Tests 2-5 showed a relatively broad range between 50 bp and 400 bp.
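The comparison above rests on two summary figures per data set: the total read count and how widely read lengths are spread. A minimal Python sketch of such a summary follows, using the standard-library `statistics` module; the interquartile range is our choice of spread measure, not one named in the patent, and the read lengths are invented toy values rather than the inventors' data.

```python
import statistics
from typing import Dict, Sequence


def summarize_read_lengths(lengths: Sequence[int]) -> Dict[str, float]:
    """Total reads plus simple spread statistics for a read-length distribution.

    Needs at least two reads, because statistics.quantiles requires two points.
    """
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    return {
        "total_reads": len(lengths),
        "median": statistics.median(lengths),
        "iqr": q3 - q1,  # interquartile range: small for a narrow peak
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical toy data: one tight peak near 200 bp, one broad 50-400 bp spread.
narrow = [195, 198, 200, 200, 202, 205]
broad = [50, 120, 200, 260, 330, 400]
print(summarize_read_lengths(narrow))
print(summarize_read_lengths(broad))
```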

The NGS data sets were then analyzed for STRs using either the lobSTR program or the Bowtie 2 program. In terms of allele-calling accuracy, accuracy was highest for (i) Tests 4 and 5, in which PCR was performed with primers lacking both a fluorescent label and an adapter sequence, followed by (ii) Tests 1 and 6, in which PCR was performed with fluorescently labeled primers, and then (iii) Tests 2 and 3, in which PCR was performed with primers containing the adapter sequence (data not shown). The quality of an NGS data set can be assessed by how high its coverage is; by this measure, quality was highest for (i) Tests 4 and 5, followed by (ii) Tests 2 and 3 and then (iii) Tests 1 and 6 (data not shown). Under both criteria, therefore, the best results were obtained when PCR products were generated with primers lacking a fluorescent label and an adapter sequence and then sequenced (Tests 4 and 5).

Analyses with both the lobSTR program and the Bowtie 2 program gave the best results for Tests 4 and 5. In particular, with Bowtie 2, calling alleles at a share of 20% or more gave results identical to CE, and no alleles fell in the range from 10% to under 20%. In addition, lobSTR sometimes failed to call alleles at the Penta D and vWA loci, whereas Bowtie 2 called them correctly. Finally, the lobSTR results included alleles differing by +1 or -1 bp from the expected allele size, but this was not observed in the Bowtie 2 analysis, which used reference sequences.
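The 20% call threshold and the 10%-to-under-20% band mentioned above lend themselves to a simple triage, sketched here in Python. This does not reproduce the lobSTR or Bowtie 2 pipelines. It assumes each allele's share is computed over all reads at the locus, and it treats the intermediate band as a "review" bin. The read counts are hypothetical.

```python
from typing import Dict, List, Set


def triage_alleles(coverage: Dict[str, int],
                   call_at: float = 0.20,
                   review_at: float = 0.10) -> Dict[str, List[str]]:
    """Split alleles at one locus into called, review (gray zone) and noise
    bins by their share of the total locus reads."""
    total = sum(coverage.values())
    bins: Dict[str, List[str]] = {"called": [], "review": [], "noise": []}
    if total == 0:
        return bins
    for allele, n in sorted(coverage.items()):
        share = n / total
        if share >= call_at:
            bins["called"].append(allele)
        elif share >= review_at:
            bins["review"].append(allele)
        else:
            bins["noise"].append(allele)
    return bins


def concordant_with_ce(ngs_called: List[str], ce_alleles: Set[str]) -> bool:
    """True if the NGS call set matches the CE genotype exactly."""
    return set(ngs_called) == ce_alleles


# Hypothetical single-source locus: two true alleles plus a minor read
# population one repeat below the first, as stutter might produce.
locus = {"14": 900, "17": 850, "13": 90}
bins = triage_alleles(locus)
print(bins)  # {'called': ['14', '17'], 'review': [], 'noise': ['13']}
print(concordant_with_ce(bins["called"], {"14", "17"}))  # True
```

An empty review bin, as in this toy example, corresponds to the situation described above in which no alleles fall between 10% and 20%.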

The primers of the present invention were designed with degeneracy so that amplification still occurs when the primer binding site carries a sequence variant. The amplicons produced by multiplex PCR with these primers are 20-600 bp in size, which favors amplification of very small amounts of template or of degraded samples and thus contributes to improved detection sensitivity. According to one embodiment of the present invention, the amplicon size is 100-600 bp or 100-500 bp.
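Degenerate positions are conventionally written with IUPAC nucleotide codes, where each code stands for a set of bases. The Python sketch below expands a degenerate primer into the concrete sequences it represents and checks whether a binding site is covered. The primer shown is hypothetical; the patent does not disclose which positions of its primers are degenerate.

```python
from itertools import product
from typing import List

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


def primer_matches(primer: str, site: str) -> bool:
    """True if `site` is one of the sequences the degenerate primer encodes."""
    if len(primer) != len(site):
        return False
    return all(s in IUPAC[p] for p, s in zip(primer.upper(), site.upper()))


# Hypothetical primer with R (A or G) at a position assumed to carry a SNP.
primer = "ACGRTT"
print(expand_degenerate(primer))         # ['ACGATT', 'ACGGTT']
print(primer_matches(primer, "ACGGTT"))  # True
```

Each degenerate position multiplies the number of distinct oligos in the synthesized pool, which is why degeneracy is usually limited to known variant positions.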

According to one embodiment of the present invention, in step (a), the primers that complementarily bind to the D3S1358 locus are SEQ ID NOs: 1 and 2; those binding to the TH01 locus are SEQ ID NOs: 3 and 4; to the D21S11 locus, SEQ ID NOs: 5 and 6; to the D18S51 locus, SEQ ID NOs: 7 and 8; to the PentaE locus, SEQ ID NOs: 9 and 10; to the D5S818 locus, SEQ ID NOs: 11 and 12; to the D13S317 locus, SEQ ID NOs: 13 and 14; to the D7S820 locus, SEQ ID NOs: 15 and 16; to the D16S539 locus, SEQ ID NOs: 17 and 18; to the CSF1PO locus, SEQ ID NOs: 19 and 20; to the PentaD locus, SEQ ID NOs: 21 and 22; to the vWA locus, SEQ ID NOs: 23 and 24; to the D8S1179 locus, SEQ ID NOs: 25 and 26; to the TPOX locus, SEQ ID NOs: 27 and 28; to the FGA locus, SEQ ID NOs: 29 and 30; and to the amelogenin locus, SEQ ID NOs: 31 and 32.

The primers used in the genetic identification method of the present invention are combined at an optimized ratio that balances peak heights across loci.

According to embodiments of the present invention, the primers that complementarily bind to each locus are used at the following final concentrations (μM). The ranges narrow progressively from one embodiment, to another embodiment, to a particular embodiment.

Locus | One embodiment | Another embodiment | Particular embodiment
D3S1358 | 0.01-0.5 | 0.05-0.4 | 0.15-0.30
TH01 | 0.01-0.4 | 0.05-0.3 | 0.05-0.2
D21S11 | 0.4-0.8 | 0.45-0.75 | 0.5-0.7
D18S51 | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
PentaE | 1-1.4 | 1.05-1.35 | 1.1-1.3
D5S818 | 0.01-0.5 | 0.05-0.4 | 0.15-0.3
D13S317 | 0.2-0.6 | 0.25-0.55 | 0.3-0.5
D7S820 | 0.1-0.5 | 0.15-0.45 | 0.2-0.4
D16S539 | 0.2-0.6 | 0.25-0.55 | 0.3-0.5
CSF1PO | 0.1-0.5 | 0.15-0.45 | 0.2-0.4
PentaD | 1-1.4 | 1.05-1.35 | 1.1-1.3
vWA | 0.01-0.4 | 0.05-0.35 | 0.1-0.25
D8S1179 | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
TPOX | 0.01-0.4 | 0.05-0.35 | 0.1-0.25
FGA | 0.4-0.8 | 0.45-0.75 | 0.5-0.7
Amelogenin | 0.05-0.5 | 0.1-0.45 | 0.15-0.4
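For bench use, the particular-embodiment concentration ranges for method 1 can be kept as a lookup table and a planned primer mix checked against it. This is a hypothetical helper: the dictionary only restates the particular-embodiment values (final concentration, μM), and the example mix is illustrative.

```python
# Particular-embodiment final concentration ranges (uM) for analysis method 1
PARTICULAR_RANGES_UM = {
    "D3S1358": (0.15, 0.30), "TH01": (0.05, 0.2), "D21S11": (0.5, 0.7),
    "D18S51": (0.4, 0.6), "PentaE": (1.1, 1.3), "D5S818": (0.15, 0.3),
    "D13S317": (0.3, 0.5), "D7S820": (0.2, 0.4), "D16S539": (0.3, 0.5),
    "CSF1PO": (0.2, 0.4), "PentaD": (1.1, 1.3), "vWA": (0.1, 0.25),
    "D8S1179": (0.4, 0.6), "TPOX": (0.1, 0.25), "FGA": (0.5, 0.7),
    "Amelogenin": (0.15, 0.4),
}


def check_mix(mix_um, ranges=PARTICULAR_RANGES_UM):
    """List problems with a primer mix: missing, out-of-range or unknown loci."""
    problems = []
    for locus, (low, high) in ranges.items():
        if locus not in mix_um:
            problems.append(f"{locus}: missing")
        elif not low <= mix_um[locus] <= high:
            problems.append(f"{locus}: {mix_um[locus]} uM outside {low}-{high} uM")
    for locus in mix_um:
        if locus not in ranges:
            problems.append(f"{locus}: not part of this multiplex")
    return problems


# Hypothetical mix at the midpoint of every range, then one deliberate error
mix = {locus: (low + high) / 2 for locus, (low, high) in PARTICULAR_RANGES_UM.items()}
print(check_mix(mix))  # []
mix["PentaD"] = 0.8
print(check_mix(mix))  # ['PentaD: 0.8 uM outside 1.1-1.3 uM']
```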

The amelogenin locus is a locus on the sex chromosomes that is used to determine the sex of a human subject. The amelogenin locus on the Y chromosome, present in male DNA, is identified as HUMAMELY, and the locus on the X chromosome, present in female DNA, is identified as HUMAMELX.
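A minimal sketch of how the amelogenin result could be turned into a sex call from NGS read counts assigned to HUMAMELX and HUMAMELY. The read-count threshold and the function name are assumptions for illustration only; the text itself states only which chromosome each designation identifies.

```python
def amelogenin_sex(reads_x, reads_y, min_reads=20):
    """Infer sex from reads assigned to HUMAMELX and HUMAMELY.

    min_reads is a hypothetical detection threshold, not a value from the source.
    """
    has_x = reads_x >= min_reads
    has_y = reads_y >= min_reads
    if has_x and has_y:
        return "male (X, Y)"
    if has_x:
        return "female (X)"
    # No X signal (with or without Y) is unusual and is left for manual review
    return "inconclusive"


print(amelogenin_sex(510, 472))  # male (X, Y)
print(amelogenin_sex(930, 3))    # female (X)
```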

Step (b): Genetic identification step

Next, the allele genotype at each locus is determined using the multiplex amplification product of step (a), thereby genetically identifying the human subject.

According to one embodiment of the present invention, this step is performed by comparing and evaluating the nucleotide sequence of each STR allele amplified in the multiplex amplification product against a reference sequence, and the allele genotype at each STR locus is determined from the coverage values of the STR alleles.

The comparison and evaluation includes aligning the nucleotide sequences of the STR alleles to the reference sequence and calculating the coverage value of each STR allele.
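Once reads have been aligned to the reference (for example with Bowtie 2), the coverage of each STR allele can be tallied by assigning each read to the reference segment in which it lies. The sketch assumes that alignments have already been reduced to (start, end) coordinates on a concatenated reference and that the segment boundaries of each allele are known. Only reads lying entirely within one segment are counted. These data structures are assumptions for illustration, not the output format of any specific tool.

```python
from bisect import bisect_right
from collections import Counter


def allele_coverage(segments, alignments):
    """Count reads lying completely inside one allele segment.

    segments: list of (allele, start, end) with half-open [start, end)
              coordinates on the concatenated reference, sorted by start.
    alignments: iterable of (start, end) read coordinates, half-open.
    """
    starts = [start for _, start, _ in segments]
    counts = Counter()
    for read_start, read_end in alignments:
        # Index of the last segment that starts at or before the read start
        i = bisect_right(starts, read_start) - 1
        if i < 0:
            continue
        allele, _seg_start, seg_end = segments[i]
        if read_end <= seg_end:  # skip reads that run into the next segment
            counts[allele] += 1
    return counts


# Hypothetical segment table and read coordinates
segments = [("9", 0, 1136), ("10", 1136, 2276), ("11", 2276, 3420)]
reads = [(480, 640), (1700, 1850), (2200, 2350), (2800, 2950), (3000, 3150)]
print(allele_coverage(segments, reads))
```

The read at (2200, 2350) crosses the boundary between the "10" and "11" segments and is therefore not counted.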

In the present invention, the reference sequence is a custom reference sequence built so that methods other than the lobSTR program can be used for NGS-based STR analysis (for example, (i) the Illumina GAIIx system [3], Bowtie aligner [4], SAMtools [2], Picard (http://picard.sourceforge.net/), BEDtools [5] and the R program [6]; or (ii) the 454 GS Junior system with CLC Genomics Workbench software). The present invention builds on the method used by Bornman et al. [3] and uses Bowtie 2 [8], which has improved functionality compared with Bowtie.

When STRs are analyzed with the lobSTR program, all STR loci in the human genome GRCh37/hg19 are used as the reference. Because only 15 STRs are analyzed in the present invention, a custom reference sequence was constructed instead. It excludes unnecessary STR loci and makes STR allele designation easier.

The reference sequence was constructed as follows:

The repeat numbers and sequences known to date for each STR locus were obtained from STRbase [9]. The 5' and 3' flanking-region sequences of each STR were taken from the human genome GRCh37/hg19, and the flanking-sequence length can be set to 500-550 bp. Each STR allele consists, in order, of the 5' flanking sequence, the STR region sequence and the 3' flanking sequence. On this basis, the reference sequence for an STR locus is constructed by joining these allele sequences one after another in order of increasing repeat number, in the same way as an allelic ladder is assembled for a CE system (see FIG. 1).
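The ladder-style construction can be sketched as below. This is a simplified illustration under stated assumptions. The flanking sequences and the AATG motif are short made-up strings; in practice each flank is 500-550 bp from GRCh37/hg19 and the allele sequences, including interrupted repeats and microvariants, come from STRbase. Here each allele is modelled as an uninterrupted run of one motif. The function also records where each allele segment starts and ends, which is what is needed later to assign aligned reads to alleles.

```python
def build_ladder_reference(flank5, motif, flank3, repeat_numbers):
    """Join 5' flank + STR region + 3' flank for each allele, smallest first.

    Returns the concatenated reference and a list of (allele, start, end)
    half-open segment coordinates on it.
    """
    pieces, segments, pos = [], [], 0
    for n in sorted(repeat_numbers):
        allele_seq = flank5 + motif * n + flank3
        segments.append((str(n), pos, pos + len(allele_seq)))
        pieces.append(allele_seq)
        pos += len(allele_seq)
    return "".join(pieces), segments


# Toy example: hypothetical 10 bp flanks around an AATG repeat
reference, segments = build_ladder_reference(
    "GGCTAGCTAA", "AATG", "TTGACCTGAA", [7, 6, 8])
print(segments)  # [('6', 0, 44), ('7', 44, 92), ('8', 92, 144)]
```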

According to one embodiment of the present invention, the reference sequence comprises the sequence of the STR region of the allele, the 5' flanking sequence of the STR region, and the 3' flanking sequence of the STR region.

Because the reference sequence of the present invention consists of segments carrying STRs (short tandem repeats) with different repeat numbers, it is used to determine the correct allele by analyzing the exact size and sequence of the multiplex amplification products. The reference sequence can be obtained not only by the method described above but also by conventional methods known in the art.

As used herein, the term 'STR' refers to tandem repeat sequences that are known to be widely distributed within introns, which carry no genetic information such as traits in the human genome. These sequences are used worldwide as markers for genetic identification. Many loci in the human genome contain polymorphic STR regions. An STR locus consists of short repeat sequence elements 2 to 7 bp in length, and the human genome is estimated to contain about 2 million trimeric and tetrameric STRs, occurring about once every 15 kb. Because the number of short repeat units at a given locus varies, the DNA length at that locus differs between alleles and between individuals. STR loci can be amplified by the polymerase chain reaction (PCR) using specific primer sequences located in the regions flanking the repeat.

Alleles at these loci are subdivided according to the number of copies of the repeat sequence in the amplified region, the repeat structure of the allele, or sequence variation. As an NGS-based autosomal analysis method, the analysis method of the present invention can determine the repeat structure of STR alleles and detect sequence variation (Tables 5 and 6). It can therefore reveal cases in which (i) alleles at one locus appear to have the same length but differ in sequence, (ii) different samples show different repeat structures at the same locus, and (iii) the repeat structure of the STR region is the same but a sequence variation is present in the flanking region. These cases suggest that sequence-based analysis using NGS can further subdivide the STR alleles identified by conventional CE.
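To illustrate how sequence-based analysis separates alleles that CE reports as the same length, the sketch below writes an STR region in bracketed repeat-structure notation. It is a simplified greedy parser that assumes a known motif length and a region made of whole repeat units. It is not the method used to produce Tables 5 and 6, and the two example sequences are invented.

```python
def repeat_structure(region, unit_len=4):
    """Collapse consecutive identical repeat units into [UNIT]count blocks."""
    units = [region[i:i + unit_len] for i in range(0, len(region), unit_len)]
    blocks = []
    for unit in units:
        if blocks and blocks[-1][0] == unit:
            blocks[-1][1] += 1  # same unit as the previous block: extend it
        else:
            blocks.append([unit, 1])
    return " ".join(f"[{unit}]{count}" for unit, count in blocks)


# Two hypothetical 40 bp alleles: both "10 repeats" by length, different structure
a = "TCTA" * 8 + "TCTG" * 2
b = "TCTA" * 6 + "TCTG" * 4
print(repeat_structure(a))  # [TCTA]8 [TCTG]2
print(repeat_structure(b))  # [TCTA]6 [TCTG]4
```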

According to one embodiment of the present invention, to distinguish alleles of the same length, a step of identifying, at the STR locus, the number of copies of the repeat sequence, the repeat structure of the allele, or sequence variation may additionally be performed.

According to the present invention, a step of measuring the mixing ratio of a mixed sample using sequence variation may additionally be performed. The mixing ratio is determined by analyzing the coverage ratio between alleles at each STR locus and by analyzing the reference-base/variant-base ratio at positions of sequence variation (see FIG. 3).
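The two ratios mentioned above can be illustrated with a simple two-person mixture model. This is a hedged sketch, not the procedure behind FIG. 3. It assumes exactly two contributors and alleles or variant bases carried by only one contributor each. It also assumes that read counts are proportional to each contributor's DNA amount, and it ignores stutter and amplification bias. The function names are hypothetical.

```python
def contributor_fraction(coverage, contributor_alleles):
    """Share of a locus's reads on alleles attributed to one contributor."""
    total = sum(coverage.values())
    own = sum(coverage.get(allele, 0) for allele in contributor_alleles)
    return own / total if total else 0.0


def variant_fraction(ref_count, alt_count):
    """Fraction of reads carrying the variant base at a sequence-variant position."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def ratio_label(fraction):
    """Express a contributor fraction as major:minor, e.g. 0.25 -> '3.0:1'."""
    minor = min(fraction, 1 - fraction)
    if minor == 0:
        return "single source"
    return f"{(1 - minor) / minor:.1f}:1"


# Hypothetical locus where alleles 15 and 16 belong only to the minor contributor
coverage = {"12": 380, "13": 370, "15": 130, "16": 120}
f_locus = contributor_fraction(coverage, ["15", "16"])
f_variant = variant_fraction(ref_count=300, alt_count=100)
print(f_locus, ratio_label(f_locus))      # 0.25 3.0:1
print(f_variant, ratio_label(f_variant))  # 0.25 3.0:1
```

Estimates from several loci and variant positions would normally be averaged, since any single ratio is affected by stutter and amplification bias.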

The multiplex amplification products of the present invention are 20-600 bp in size. Small amplification products favor amplification of very small amounts of template or of degraded samples and thus improve detection sensitivity. According to one embodiment of the present invention, the multiplex amplification products are 100-600 bp or 100-500 bp in size (see FIGS. 2A to 2C).

The genetic identification method of the present invention is used for forensic typing or personal identification.

In addition, after step (b), the STR allele genotypes determined by the method of the present invention may be compared with the STR genotypes obtained by CE analysis to assess their accuracy.
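A minimal sketch of the concordance check between NGS-based and CE-based genotypes. It assumes that sequence-based NGS alleles have already been converted to the repeat-number designations used in CE, so that the two call sets are directly comparable. The function name and example genotypes are hypothetical.

```python
def compare_genotypes(ngs, ce):
    """Return loci whose NGS and CE genotypes differ.

    ngs, ce: mapping of locus -> collection of allele designations.
    A homozygote may be written as one allele or as two identical alleles.
    """
    discordant = {}
    for locus in sorted(set(ngs) | set(ce)):
        ngs_alleles = set(ngs.get(locus, ()))
        ce_alleles = set(ce.get(locus, ()))
        if ngs_alleles != ce_alleles:
            discordant[locus] = (sorted(ngs_alleles), sorted(ce_alleles))
    return discordant


ngs = {"D3S1358": ["15", "16"], "TH01": ["9.3"], "vWA": ["17", "18"]}
ce = {"D3S1358": ["16", "15"], "TH01": ["9.3", "9.3"], "vWA": ["17", "19"]}
print(compare_genotypes(ngs, ce))  # {'vWA': (['17', '18'], ['17', '19'])}
```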

According to another aspect, the present invention provides a multiplex gene amplification kit for NGS-based (next generation sequencing-based) autosomal analysis. The kit comprises primers that complementarily bind, respectively, to the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (Human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, vWA (von Willebrand factor A), D8S1179, TPOX (Human thyroid peroxidase gene), FGA (Human fibrinogen alpha chain) and amelogenin loci.

Because the kit of the present invention is used in autosomal analysis method 1 for a human subject, which uses the multiplex gene amplification described above, descriptions common to both are omitted to avoid undue complexity in this specification.

Ⅱ. NGS-based (next generation sequencing-based) autosomal analysis method 2

According to yet another aspect, the present invention provides an NGS-based (next generation sequencing-based) method for autosomal analysis of a human subject, comprising the following steps:

(a) performing multiplex amplification using primers that complementarily bind, respectively, to the D19S433, D5S818, Penta E, CSF1PO (Human c-fms proto-oncogene for CSF-1 receptor gene), D7S820, D18S51, TPOX (Human thyroid peroxidase gene), D16S539, D8S1179, amelogenin, FGA (Human fibrinogen alpha chain), D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01 and vWA (von Willebrand factor A) loci of a human subject DNA sample; and

NGS has been applied to the analysis of STR (short tandem repeat) markers, which are used mainly for human identification in the forensic field. Although some studies have shown that NGS systems can be applied successfully to STR typing, no multiplex PCR system optimized for NGS analysis of forensic STR markers has existed. The inventors therefore built an in-house multiplex PCR system that amplifies 18 markers (the 13 CODIS STRs, D2S1338, D19S433, Penta D, Penta E and amelogenin) and generates amplicons 50-300 bp in size (more specifically, 70-210 bp). Using this multiplex PCR method, PCR products were generated from single-source samples, mixed samples and artificially degraded DNA samples, and the resulting amplicons were used to prepare barcoded libraries for sequencing on the MiSeq. By performing NGS and analyzing the data, the inventors confirmed that all STR genotypes were concordant with the results of CE-based typing. Sequence variations were also detected in the target STR regions. In addition, with the in-house multiplex PCR system, STR allele calls and STR profiles were obtained without loss of information at the STR loci for which commercial kits generate large amplicons, and STR profiles were also obtained from the NGS data of artificially degraded DNA. The in-house multiplex PCR system of the present invention, which generates small amplicons, can therefore be applied successfully to STR NGS analysis of single-source, mixed and degraded DNA samples in forensic studies using NGS systems.

Step (a): Multiplex amplification step

First, multiplex amplification is performed on the DNA sample to be analyzed using primers that bind, respectively, to the D19S433, D5S818, Penta E, CSF1PO (Human c-fms proto-oncogene for CSF-1 receptor gene), D7S820, D18S51, TPOX (Human thyroid peroxidase gene), D16S539, D8S1179, amelogenin, FGA (Human fibrinogen alpha chain), D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01 and vWA (von Willebrand factor A) loci.

The DNA sample is DNA isolated from a tissue selected from the group consisting of blood, semen, vaginal cells, hair, saliva, urine, buccal cells, amniotic fluid containing placental or fetal cells, and mixtures thereof, and includes any biological sample that contains DNA. The DNA sample also includes degraded DNA samples.

In the present invention, the multiplex amplification is multiplex PCR (polymerase chain reaction) amplification or direct multiplex PCR amplification.

According to one embodiment of the present invention, the multiplex PCR amplification uses an annealing temperature of 57-61 °C. According to another embodiment, the annealing temperature is 58-60 °C, and according to a particular embodiment, it is 59 °C.

The multiplex PCR amplification requires an appropriate number of cycles. According to one embodiment of the present invention, the multiplex PCR amplification is performed for 31-38, 31-35, 32-34 or 33 cycles. According to another embodiment, when a degraded DNA sample is analyzed, the multiplex amplification is performed for 34-38 cycles, and according to a particular embodiment, for 36 cycles.
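The annealing temperature and cycle numbers given above can be collected into a small configuration helper. This sketch encodes only values stated in this section: annealing at 59 °C within the 57-61 °C range, 33 cycles for standard samples and 36 cycles for degraded samples. Denaturation and extension steps are not specified here and are deliberately left out. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Broadest stated annealing range for method 2, in degrees Celsius
ANNEALING_RANGE_C = (57.0, 61.0)


@dataclass(frozen=True)
class MultiplexPcrConditions:
    annealing_c: float
    cycles: int


def pcr_conditions(degraded=False, annealing_c=59.0):
    """Return conditions using 36 cycles for degraded DNA and 33 otherwise."""
    low, high = ANNEALING_RANGE_C
    if not low <= annealing_c <= high:
        raise ValueError(f"annealing temperature {annealing_c} C outside {low}-{high} C")
    cycles = 36 if degraded else 33
    return MultiplexPcrConditions(annealing_c=annealing_c, cycles=cycles)


print(pcr_conditions())
print(pcr_conditions(degraded=True))
```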

In the present invention, multiplex amplification is performed using primers that complementarily bind, respectively, to the D19S433, D5S818, Penta E, CSF1PO, D7S820, D18S51, TPOX, D16S539, D8S1179, amelogenin, FGA, D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01 and vWA loci of the DNA sample.

According to one embodiment of the present invention, the primers used are non-labeled primers, that is, primers that are neither labeled with a fluorescent dye nor attached to an adapter.

Degeneracy was also applied to the design of these primers, so that amplification still occurs when the primer binding site carries a sequence variant. The multiplex PCR products obtained with these primers are 50-300 bp in size. This favors amplification of very small amounts of template or of degraded samples and thus improves detection sensitivity. According to one embodiment of the present invention, the amplification products are 70-210 bp in size.

DNA sequence information for the STR markers can be collected from the UCSC Genome Browser (http://genome.ucsc.edu/). PCR primers for each marker are designed from the collected sequence information so that they basically satisfy the default conditions of Primer3 (http://frodo.wi.mit.edu/primer3/input.htm), with a final amplicon size of 50-300 bp (see Table 7). Each primer is placed as close as possible to the STR repeat without overlapping it, and so that no single nucleotide polymorphism (SNP) with a frequency of 1% or more lies within the primer binding site. From the candidate primers designed with Primer3, the multiplex PCR system is then built by selecting the optimal primer combination, free of primer-primer interference in PCR amplification, and the primer concentrations that keep peak heights balanced among the STR loci in electrophoresis.
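The positional rules for primer placement can be expressed as a filter over candidate primer pairs, for example pairs proposed by Primer3. The sketch below is hypothetical in its data structures. It assumes 0-based half-open coordinates on one reference strand, candidate pairs given as forward and reverse binding intervals, and SNPs given as (position, frequency) tuples. It checks only the rules stated above: no overlap with the repeat, no SNP with frequency of 1% or more under either primer, and an amplicon of 50-300 bp. It then prefers the pair closest to the repeat. Primer-primer interference and concentration balancing are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    """Binding intervals of a candidate pair, 0-based half-open [start, end)."""
    fwd_start: int
    fwd_end: int
    rev_start: int
    rev_end: int

    @property
    def amplicon_len(self):
        # From the 5' end of the forward primer to the 5' end of the reverse primer
        return self.rev_end - self.fwd_start


def has_common_snp(start, end, snps, min_freq=0.01):
    """True if a SNP with frequency >= min_freq lies in [start, end)."""
    return any(start <= pos < end and freq >= min_freq for pos, freq in snps)


def passes_rules(pair, repeat_start, repeat_end, snps, size_range=(50, 300)):
    """Apply the stated placement rules to one candidate pair."""
    if pair.fwd_end > repeat_start or pair.rev_start < repeat_end:
        return False  # a primer would overlap the STR repeat
    if has_common_snp(pair.fwd_start, pair.fwd_end, snps):
        return False
    if has_common_snp(pair.rev_start, pair.rev_end, snps):
        return False
    return size_range[0] <= pair.amplicon_len <= size_range[1]


def best_pair(candidates, repeat_start, repeat_end, snps):
    """Among passing pairs, prefer the one whose primers sit closest to the repeat."""
    passing = [p for p in candidates
               if passes_rules(p, repeat_start, repeat_end, snps)]
    if not passing:
        return None
    return min(passing,
               key=lambda p: (repeat_start - p.fwd_end) + (p.rev_start - repeat_end))


# Hypothetical repeat at [100, 148) and SNPs as (position, frequency)
snps = [(65, 0.002), (92, 0.05)]
candidates = [
    PrimerPair(60, 80, 170, 190),   # passes; 20 and 22 bp from the repeat
    PrimerPair(75, 95, 150, 170),   # common SNP at 92 under the forward primer
    PrimerPair(80, 102, 150, 170),  # forward primer overlaps the repeat
    PrimerPair(70, 90, 155, 175),   # passes; closest to the repeat
    PrimerPair(0, 20, 340, 360),    # 360 bp amplicon, too long
]
print(best_pair(candidates, 100, 148, snps))
```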

According to one embodiment of the present invention, in step (a), the primers that complementarily bind to the D19S433 locus are SEQ ID NOs: 33 and 34; those binding to the D5S818 locus are SEQ ID NOs: 35 and 36; to the Penta E locus, SEQ ID NOs: 37 and 38; to the CSF1PO locus, SEQ ID NOs: 39 and 40; to the D7S820 locus, SEQ ID NOs: 41 and 42; to the D18S51 locus, SEQ ID NOs: 43 and 44; to the TPOX locus, SEQ ID NOs: 45 and 46; to the D16S539 locus, SEQ ID NOs: 47 and 48; to the D8S1179 locus, SEQ ID NOs: 49 and 50; to the amelogenin locus, SEQ ID NOs: 51 and 52; to the FGA locus, SEQ ID NOs: 53 and 54; to the D13S317 locus, SEQ ID NOs: 55 and 56; to the D2S1338 locus, SEQ ID NOs: 57 and 58; to the D21S11 locus, SEQ ID NOs: 59 and 60; to the Penta D locus, SEQ ID NOs: 61 and 62; to the D3S1358 locus, SEQ ID NOs: 63 and 64; to the TH01 locus, SEQ ID NOs: 65 and 66; and to the vWA locus, SEQ ID NOs: 67 and 68.

According to one embodiment of the present invention, the primers that bind complementarily to each locus have the final concentrations listed in the column "One embodiment" below; according to another embodiment, they have the final concentrations in the column "Another embodiment"; and according to a particular embodiment, they have the final concentrations in the column "Particular embodiment". All values are final concentrations in μM.

Locus | One embodiment | Another embodiment | Particular embodiment
D19S433 | 0.4-0.8 | 0.45-0.75 | 0.5-0.7
D5S818 | 0.3-0.8 | 0.35-0.75 | 0.45-0.65
Penta E | 0.8-1.2 | 0.85-1.15 | 0.9-1.1
CSF1PO | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
D7S820 | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
D18S51 | 0.7-1.1 | 0.75-1.05 | 0.8-1.0
TPOX | 0.1-0.6 | 0.15-0.55 | 0.25-0.45
D16S539 | 0.2-0.7 | 0.25-0.65 | 0.35-0.55
D8S1179 | 0.7-1.1 | 0.75-1.05 | 0.8-1.0
Amelogenin | 0.6-1.0 | 0.65-0.95 | 0.7-0.9
FGA | 0.8-1.2 | 0.85-1.15 | 0.9-1.1
D13S317 | 0.6-1.0 | 0.65-0.95 | 0.7-0.9
D2S1338 | 0.6-1.0 | 0.65-0.95 | 0.7-0.9
D21S11 | 0.5-0.9 | 0.55-0.85 | 0.6-0.8
Penta D | 0.8-1.2 | 0.85-1.15 | 0.9-1.1
D3S1358 | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
TH01 | 0.3-0.7 | 0.35-0.65 | 0.4-0.6
vWA | 0.8-1.2 | 0.85-1.15 | 0.9-1.1
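When preparing a primer mix, it can help to check each locus against the concentration ranges above. The sketch below encodes only the particular-embodiment ranges and flags loci that are missing or out of range; the function name `out_of_range` and the shape of the proposed mix are illustrative assumptions, not part of the described method.

```python
# Final-concentration ranges (uM) of the particular embodiment described above.
PARTICULAR_RANGES_UM = {
    "D19S433": (0.5, 0.7), "D5S818": (0.45, 0.65), "Penta E": (0.9, 1.1),
    "CSF1PO": (0.4, 0.6), "D7S820": (0.4, 0.6), "D18S51": (0.8, 1.0),
    "TPOX": (0.25, 0.45), "D16S539": (0.35, 0.55), "D8S1179": (0.8, 1.0),
    "Amelogenin": (0.7, 0.9), "FGA": (0.9, 1.1), "D13S317": (0.7, 0.9),
    "D2S1338": (0.7, 0.9), "D21S11": (0.6, 0.8), "Penta D": (0.9, 1.1),
    "D3S1358": (0.4, 0.6), "TH01": (0.4, 0.6), "vWA": (0.9, 1.1),
}


def out_of_range(mix_um, ranges=PARTICULAR_RANGES_UM):
    """Return {locus: problem} for loci that are missing or outside their range."""
    problems = {}
    for locus, (lo, hi) in ranges.items():
        conc = mix_um.get(locus)
        if conc is None:
            problems[locus] = "missing"
        elif not (lo <= conc <= hi):
            problems[locus] = f"{conc} uM not in [{lo}, {hi}]"
    return problems
```

An empty result means every locus of the hypothetical mix falls inside the particular-embodiment ranges.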

Step (b): DNA identification step

In the present invention, the reference sequence is the sequence against which the large volume of sequence reads generated by NGS analysis is aligned so that they can be analyzed efficiently. Reference sequences can be produced by the following method:

First, the repeat numbers and sequences of the STR alleles known to date are collected from STRbase (http://www.cstl.nist.gov/biotech/strbase), together with the 5´ and 3´ flanking sequences of each STR as annotated in the human genome GRCh37/hg19 sequence. Based on this information, the flanking sequences around the STR repeat region are set to 500 bp - 550 bp so that NGS reads obtained with any primer combination can be aligned to the reference sequence. The STR 5´ flanking sequence, the STR repeat region sequence, and the STR 3´ flanking sequence are then combined for each allele using the macro function of Microsoft Excel.

According to one embodiment of the present invention, the reference sequence comprises the sequence of the STR region of the allele, the 5´ flanking sequence of the STR region, and the 3´ flanking sequence of the STR region, and the length of each flanking sequence is set to 500 bp - 550 bp.
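The described construction (5´ flank + STR region + 3´ flank for each allele) was done with an Excel macro; the same concatenation can be sketched in Python as a FASTA generator. This is an assumed alternative, not the authors' tool. The locus name, the placeholder flanks, and the simple uninterrupted repeat built by `repeat_region` are hypothetical; real allele sequences from STRbase may be compound or interrupted.

```python
def repeat_region(motif: str, count: int) -> str:
    """Uninterrupted repeat region (simplification; real alleles may be compound)."""
    return motif * count


def build_reference_fasta(locus, flank5, flank3, alleles,
                          min_flank=500, max_flank=550):
    """Emit one FASTA record per allele: 5' flank + STR region + 3' flank.

    `alleles` maps an allele label (e.g. "9") to its STR-region sequence.
    """
    for side, flank in (("5'", flank5), ("3'", flank3)):
        if not (min_flank <= len(flank) <= max_flank):
            raise ValueError(
                f"{side} flank is {len(flank)} bp; expected {min_flank}-{max_flank} bp")
    lines = []
    for label, str_seq in alleles.items():
        lines.append(f">{locus}_{label}")
        lines.append(flank5 + str_seq + flank3)
    return "\n".join(lines) + "\n"


# Usage with placeholder flanks (not real genomic sequence).
fasta = build_reference_fasta(
    "LOCUS_X", "ACGT" * 125, "TGCA" * 130,
    {"9": repeat_region("AATG", 9), "10": repeat_region("AATG", 10)},
)
```

Keeping the flanks identical across alleles means reads differ only in the STR region, so an aligner can discriminate alleles by repeat content.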

The NGS data obtained can be analyzed with programs such as STRait Razor, developed by Warshauer et al. [1] to determine the STR alleles mainly used in forensic science; NGS data analysis can also be performed with the Bowtie 2 program [2], SAMtools [3], BEDtools [4], and Microsoft Excel. The coverage value of each STR allele is calculated as the number of aligned reads that contain the sequence of the entire STR region defined by the reference sequence. Repeat units can be identified from the sequence of each STR allele using the Integrative Genomics Viewer program [5, 6].
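The coverage definition above counts only reads that contain the entire STR region of a reference allele. A minimal sketch of that counting step follows, assuming alignments have already been reduced to (reference name, start, end) intervals, for example from Bowtie 2 output processed with SAMtools/BEDtools; the `Alignment` tuple and the region dictionary are hypothetical inputs, and soft-clipping and indels inside the read are ignored.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    ref: str    # reference allele the read aligned to, e.g. "X_9"
    start: int  # 0-based leftmost aligned position
    end: int    # 0-based exclusive end of the alignment


def spanning_coverage(alignments: Iterable[Alignment], str_regions: dict) -> Counter:
    """Count, per reference allele, reads whose alignment spans the whole STR region.

    `str_regions` maps reference allele name -> (start, end) of its STR region.
    """
    cov = Counter()
    for aln in alignments:
        region = str_regions.get(aln.ref)
        if region is None:
            continue  # read aligned outside the STR reference set
        s, e = region
        if aln.start <= s and aln.end >= e:
            cov[aln.ref] += 1
    return cov
```

Reads that end inside the repeat are excluded because they cannot distinguish alleles that differ only in repeat number.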

According to one embodiment of the present invention, amplification products can be generated from a single-source sample or a mixture of such samples (e.g., mixed at ratios of 1:1, 1:3, 1:6, or 1:9) using the new multiplex PCR system of the present invention, and sequence information can be obtained with a suitable sequencing system (e.g., Roche's GS FLX or Illumina's MiSeq platform).

To distinguish alleles of the same length, the method of the present invention may additionally include a step of identifying, at the STR locus, the number of copies of the repeat sequence, the repeat structure of the allele, or sequence variations.
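Two alleles with the same repeat count can differ in repeat structure, which length-based CE cannot see. As an illustration, the sketch below writes an STR-region sequence in bracketed repeat notation by tokenizing it into fixed-length units and collapsing runs. It assumes the region starts exactly at a motif boundary and has a single motif length; the sequences in the usage lines are hypothetical, not alleles reported in this document.

```python
from itertools import groupby


def repeat_structure(seq: str, motif_len: int = 4) -> str:
    """Describe an STR region as bracketed runs, e.g. "[TCTA]1 [TCTG]2 [TCTA]12".

    Assumes the sequence starts at a motif boundary; a trailing partial unit
    is appended verbatim.
    """
    usable = len(seq) - len(seq) % motif_len
    units = [seq[i:i + motif_len] for i in range(0, usable, motif_len)]
    parts = [f"[{unit}]{sum(1 for _ in run)}" for unit, run in groupby(units)]
    remainder = seq[usable:]
    if remainder:
        parts.append(remainder)
    return " ".join(parts)


# Hypothetical same-length alleles (15 repeat units each) with different structures.
allele_a = "TCTA" + "TCTG" * 2 + "TCTA" * 12
allele_b = "TCTA" + "TCTG" * 3 + "TCTA" * 11
```

Both example alleles are 60 bp long and would share one CE allele designation, yet `repeat_structure` returns different strings for them.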

According to the present invention, a step of measuring the mixture ratio of a mixed sample using the sequence variation may additionally be performed. The mixture ratio is determined by analyzing the coverage ratio between alleles at each STR locus and the reference-base/variant-base ratio at sites of sequence variation.
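Both mixture-ratio estimates described above reduce to simple proportions. The sketch below computes (1) the share of locus coverage contributed by alleles assigned to one contributor and (2) the fraction of each base at a variant site. The function names and counts are hypothetical. Reading these fractions directly as contributor proportions assumes the contributors share no alleles at that locus and that the variant base is carried by only one contributor; heterozygosity and stutter are not modeled.

```python
def contributor_fraction(allele_cov: dict, contributor_alleles: set) -> float:
    """Share of total locus coverage from one contributor's alleles.

    Assumes the two contributors share no alleles at this locus.
    """
    total = sum(allele_cov.values())
    own = sum(c for allele, c in allele_cov.items() if allele in contributor_alleles)
    return own / total


def base_fractions(base_counts: dict) -> dict:
    """Fraction of each base observed at one sequence-variant position."""
    total = sum(base_counts.values())
    return {base: n / total for base, n in base_counts.items()}


# Hypothetical counts, for illustration only.
share = contributor_fraction({"9": 300, "10": 320, "11": 180, "12": 200}, {"9", "10"})
ref_vs_var = base_fractions({"A": 460, "T": 540})
```

Averaging such per-locus estimates over several informative loci would reduce the influence of amplification bias at any single locus.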

The autosomal analysis method of the present invention is used for forensic typing or personal identification.

After step (b), a step of comparing the STR genotypes determined by the method of the present invention with STR genotypes obtained by CE analysis may additionally be performed to assess their accuracy. That is, single-source and mixed samples are amplified by PCR with a commercial kit [e.g., AmpFℓSTR Identifiler (Applied Biosystems)] according to the manufacturer's instructions and analyzed with an analysis system [e.g., ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems)]. The STR genotypes determined with a program such as ABI GeneMapper ID software version 3.7 (Applied Biosystems) are then checked for agreement with the STR genotypes obtained by the NGS method.
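The concordance check between NGS and CE calls can be sketched as a per-locus comparison of allele sets. This assumes that NGS calls have already been collapsed to CE-style, length-based allele designations (sequence-distinct alleles of equal length count as one); the locus names and calls in the usage lines are hypothetical.

```python
def concordance(ngs_calls: dict, ce_calls: dict) -> dict:
    """Return {locus: (ngs_alleles, ce_alleles)} for loci where the calls differ.

    Both inputs map locus -> iterable of length-based allele designations.
    """
    discordant = {}
    for locus in sorted(set(ngs_calls) | set(ce_calls)):
        ngs = set(ngs_calls.get(locus, ()))
        ce = set(ce_calls.get(locus, ()))
        if ngs != ce:
            discordant[locus] = (sorted(ngs), sorted(ce))
    return discordant


# Hypothetical calls: L2 shows a possible NGS drop-out relative to CE.
diff = concordance({"L1": {"9", "10"}, "L2": {"12"}},
                   {"L1": {"9", "10"}, "L2": {"12", "13"}})
```

An empty result indicates full concordance between the two methods for the loci compared.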

According to yet another aspect of the present invention, there is provided a multiplex gene amplification kit for next generation sequencing (NGS)-based autosomal analysis, comprising primers that bind complementarily to each of the D19S433, D5S818, Penta E, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), D7S820, D18S51, TPOX (human thyroid peroxidase gene), D16S539, D8S1179, amelogenin, FGA (human fibrinogen alpha chain), D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01, and vWA (von Willebrand factor A) loci.

Because the kit of the present invention is used in the above-described method for autosomal analysis of a human subject by multiplex gene amplification, descriptions common to both are omitted to avoid undue complexity of this specification.

The features and advantages of the present invention are summarized as follows:

(a) The present invention provides a method for next generation sequencing (NGS)-based autosomal analysis of a human subject and a multiplex PCR system optimized for it.

(b) Using the method of the present invention, the autosomal STRs of a human subject can be analyzed accurately and effectively by analyzing NGS data of STR markers to determine STR genotypes, allele repeat structures, and sequence variations.

(c) The present invention provides a size-optimized multiplex PCR system for NGS analysis that produces amplicons of 50 bp - 300 bp.

(d) The method of the present invention using an NGS system can be useful for analyzing and interpreting STR profiles of single-source samples, mixed samples, and degraded DNA samples in forensic research.

Fig. 1 shows a schematic diagram of an STR reference sequence. The long flanking sequences of 500 bp - 550 bp in the STR reference sequence were designed so that sample reads generated by a given primer combination can be fully aligned.
Figs. 2a to 2c show quality verification of the constructed libraries on a high-sensitivity chip using the 2100 Bioanalyzer. Fragments smaller than 100 bp, including adaptor dimers, were successfully removed. a: standard male DNA 2800M; b: standard female DNA 9947A; c: 1:1 mixture.
Figs. 3a to 3d show mixture-ratio measurement based on the reference/variant ratio of a sequence variation observed at the D13S317 locus. An adenine (A) to thymine (T) sequence variation was observed in the 3´ flanking region of D13S317. The mixture ratio was measured as 46% (A) : 53% (T). 3a: standard male DNA 2800M; 3b: standard female DNA 9947A; 3c: 1:1 mixture; 3d: mixture ratio.
Fig. 4 shows the 18 markers of the in-house developed multiplex PCR system for NGS analysis.
Fig. 5 shows a coverage histogram of each allele from MiSeq data. The Y axis represents the percentage of total coverage, and the number at the top represents the number of reads for each marker.
Fig. 6 shows electropherograms of the amplicons produced by multiplex PCR. a: 2800M and b: 9947A control DNA.
Figs. 7a to 7e show sequence variations observed in the two control DNA samples (a: D2S1338, b: D3S1358, c: D8S1179, d: D21S11, e: vWA). Primer sequences and flanking regions are not shown. Blue letters = core repeat unit; orange letters = repeat units; pink letters = non-repeat region; bold red letters = sequence variation.

Hereinafter, the present invention will be described in more detail with reference to Examples. These Examples are provided only to describe the present invention more specifically, and it will be apparent to those skilled in the art that, in accordance with the gist of the present invention, the scope of the present invention is not limited by these Examples.

Example

Example 1: Sequence generation and genotyping of 15 autosomal STRs using next generation sequencing

A. Research methods

1. DNA samples

The DNA samples used were the commercial standard samples 2800M (male; Promega, Madison, WI, USA) and 9947A (female; Promega), which are used as controls in forensic genetics research. These DNA samples were quantified with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and prepared at a concentration of 1 ng/μl. The 1:1 mixed sample was prepared by mixing the two single-source samples (2800M and 9947A) to a final concentration of 1 ng/μl.

2. Generation and verification of STR amplification products

In this study, primers were prepared from the published information of the PowerPlex 16 system (Promega) to analyze 15 STR loci (D3S1358, TH01, D21S11, D18S51, Penta E, D5S818, D13S317, D7S820, D16S539, CSF1PO, Penta D, vWA, D8S1179, TPOX, and FGA) and amelogenin. Unlike conventional CE-based STR analysis, however, no fluorescent label was attached to the primers. Table 1 shows the sequences and final concentrations of the primers used in PCR. PCR was performed in a 25 μl reaction containing 2.5 μl of 10X Gold ST*R buffer (Promega), 4.0 units of AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, CA, USA), the primers, and 1 ng of template DNA, following the protocol recommended for the PowerPlex 16 system except that the number of thermal cycles was changed to 34. After PCR, polyacrylamide gel electrophoresis was used to confirm that the amplification products were generated uniformly. The amplification products were purified with the QIAquick PCR Purification Kit (QIAGEN, Hilden, Germany). The concentration of the purified products was measured with the Quant-iT™ PicoGreen dsDNA Assay Kit (Invitrogen, Carlsbad, CA, USA), and purity was assessed from the ratio of absorbance at 260 nm to that at 280 nm measured with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific).

Table 1. Primer sets and final concentrations of the multiplex PCR system*

No. | Locus | Forward primer (5'→3') | Reverse primer (5'→3') | Final conc. (μM)
1 | D3S1358 | D3-PP16-F: ACTGCAGTCCAATCTGGGT | D3-PP16-R: ATGAAATCAACAGAGGCTTGC | 0.20
2 | TH01 | TH01-PP16-F: GTGATTCCCATTGGCCTGTTC | TH01-PP16-R: ATTCCTGTGGGCTGAAAAGCTC | 0.10
3 | D21S11 | D21-PP16-F: ATATGTGAGTCAATTCCCCAAG | D21-PP16-R: TGTATTAGTCAATGTTCTCCAGAGAC | 0.60
4 | D18S51 | D18-PP16-F: TTCTTGAGCCCAGAAGGTTA | D18-PP16-R: ATTCTACCAGCAACAACACAAATAAAC | 0.50
5 | Penta E | PentaE-PP16-F: ATTACCAACATGAAAGGGTACCAATA | PentaE-PP16-R: TGGGTTATTAATTGAGAAAACTCCTTACAATTT | 1.20
6 | D5S818 | D5-PP16-F: GGTGATTTTCCTCTTTGGTATCC | D5-PP16-R: AGCCACAGTTTACAACATTTGTATCT | 0.20
7 | D13S317 | D13-PP16-F: ATTACAGAAGTCTGGGATGTGGAGGA | D13-PP16-R: GGCAGCCCAAAAAGACAGA | 0.40
8 | D7S820 | D7-PP16-F: ATGTTGGTCAGGCTGACTATG | D7-PP16-R: GATTCCACATTTATCCTCATTGAC | 0.30
9 | D16S539 | D16-PP16-F: GGGGGTCTAAGAGCTTGTAAAAAG | D16-PP16-R: GTTTGTGTGTGCATCTGTAAGCATGTATC | 0.40
10 | CSF1PO | CSF1PO-PP16-F: CCGGAGGTAAAGGTGTCTTAAAGT | CSF1PO-PP16-R: ATTTCCTGTGTCAGACCCTGTT | 0.30
11 | Penta D | PentaD-PP16-F: GAAGGTCGAAGCTGAAGTG | PentaD-PP16-R: ATTAGAATTCTTTAATCTGGACACAAG | 1.20
12 | Amelogenin | Amelo-PP16-F: CCCTGGGCTCTGTAAAGAA | Amelo-PP16-R: ATCAGAGCTTAAACTGGGAAGCTG | 0.25
13 | vWA | vWA-PP16-F: GCCCTAGTGGATGATAAGAATAATCAGTATGTG | vWA-PP16-R: GGACAGATGATAAATACATAGGATGGATGG | 0.15
14 | D8S1179 | D8-PP16-F: ATTGCAACTTATATGTATTTTTGTATTTCATG | D8-PP16-R: ACCAAATTGTGTTCATGAGTATAGTTTC | 0.50
15 | TPOX | TPOX-PP16-F: GCACAGAACAGGCACTTAGG | TPOX-PP16-R: CGCTCAAACGTGAGGTTG | 0.15
16 | FGA | FGA-PP16-F: GGCTGCAGGGCATAACATTA | FGA-PP16-R: ATTCTATGACTTTGCGCTTCAGGA | 0.60

* Each primer sequence is based on information from the PowerPlex 16 system, without fluorescent dye.

3. Construction of libraries from the amplification products

As the first step of NGS analysis, libraries were constructed by attaching specific adaptors to the amplification products using the GS Rapid Library Preparation Kit (Roche Diagnostics Corp., Branford, CT, USA) according to the manufacturer's instructions. Adaptors containing a Multiplex Identifier (MID) were used to distinguish the DNA samples. The libraries were purified with AMPure beads (Beckman Coulter, Brea, CA, USA) at a 2:1 ratio of amplification product to beads so that fragments smaller than 100 bp were removed. The size distribution and concentration of the final libraries were checked with a 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA).

4. Clonal amplification and massively parallel sequencing

Emulsion PCR for clonal amplification of the constructed libraries was performed with the GS Junior Titanium emPCR Kit (Lib-L; Roche Diagnostics Corp.) according to the manufacturer's instructions. For this, the number of molecules per μl of each library was calculated from its measured concentration (Eq. 1), and the library was diluted to 1 × 10⁷ molecules per μl. As suggested by Scheible et al.¹⁴⁾, the number of copies per bead was set to 1.0. Massively parallel sequencing of the emulsion PCR products was performed on a GS Junior instrument (Roche Diagnostics Corp.) with the GS Junior Titanium Sequencing Kit (Roche Diagnostics Corp.) according to the manufacturer's instructions.

Molecules/μl = … (Eq. 1)
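The explicit form of Eq. 1 is not given here. As a hedged sketch, the conversion below uses the commonly applied relation molecules/μl = (mass concentration in g/μl ÷ (mean fragment length × mass per base pair)) × Avogadro's number, with an assumed average of about 660 g/mol per double-stranded base pair; the kit manual's own constant may differ slightly. The dilution target of 1 × 10⁷ molecules/μl is taken from the text; the input values in the usage line are hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_per_ul(conc_ng_per_ul: float, mean_length_bp: float,
                     g_per_mol_per_bp: float = 660.0) -> float:
    """Approximate dsDNA molecules per ul from mass concentration and fragment length.

    g_per_mol_per_bp is an assumed average mass per base pair, not a value
    taken from Eq. 1.
    """
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mol = mean_length_bp * g_per_mol_per_bp
    return grams_per_ul / grams_per_mol * AVOGADRO


def dilution_factor(conc_ng_per_ul: float, mean_length_bp: float,
                    target_per_ul: float = 1e7, **kwargs) -> float:
    """Fold dilution needed to reach the target molecule concentration."""
    return molecules_per_ul(conc_ng_per_ul, mean_length_bp, **kwargs) / target_per_ul


# Hypothetical library: 1 ng/ul with a mean fragment length of 250 bp.
fold = dilution_factor(1.0, 250)
```

For the hypothetical library this gives roughly 3.6 × 10⁹ molecules/μl, i.e. a dilution of about 365-fold to reach 1 × 10⁷ molecules/μl.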

5. Analysis of NGS data

STR genotypes were determined following the protocol proposed by Bornman et al.¹⁵⁾ in the order of i) construction of reference sequences, ii) alignment of NGS reads to the reference sequences, iii) calculation of the coverage of each STR allele, and iv) genotype calling at each STR locus. To construct the reference sequences, the repeat numbers and sequences of the STR alleles known to date were obtained from STRbase (http://www.cstl.nist.gov/biotech/strbase), and the 5´ and 3´ flanking sequences of each STR were taken from the human genome GRCh37/hg19. The flanking sequences were set to 500-550 bp so that NGS reads obtained with any primer combination could be aligned to the reference sequences (Fig. 1). Finally, each reference sequence was assembled from the 5´ flanking sequence, the STR region sequence, and the 3´ flanking sequence using the macro function of Microsoft Excel.

NGS reads were aligned to the reference sequences with Bowtie 2¹⁶⁾ on a Linux operating system. SAMtools¹⁷⁾ and BEDTools¹⁸⁾ were then used, in that order, to convert the format of the resulting files.

The coverage of each STR allele was obtained by counting, among the reads aligned to that allele's reference sequence, the reads that span the entire STR region. Alleles were called at each STR locus using a threshold of 20% of the total locus coverage for single-source samples and 10% for mixed samples. The repeat structure of each called allele was then examined from its sequence with the Integrative Genomics Viewer.¹⁹⁾

The mixture ratio of the 1:1 mixed sample was estimated in two ways:
- by comparing the coverage ratios among the alleles at each STR locus;
- by identifying sequence variation at a specific position and determining the proportion of each base there.
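The spanning-read rule and the locus-level thresholds described above can be sketched in Python. This is a minimal illustration, not the study's pipeline. It assumes the alignments have already been reduced to (reference name, start, end) tuples with 0-based, end-exclusive coordinates, for example by reading a BAM file with pysam. The function names are hypothetical. The usage example uses the D3S1358 read counts of the 1:1 mixture from Table 3.

```python
from collections import Counter

SINGLE_SOURCE_THRESHOLD = 0.20  # 20% of total locus coverage
MIXTURE_THRESHOLD = 0.10        # 10% of total locus coverage


def count_spanning_reads(alignments, str_bounds):
    """Count, per reference allele, the reads that span the whole STR region.

    alignments: iterable of (ref_name, start, end), 0-based, end-exclusive.
    str_bounds: {ref_name: (str_start, str_end)} in the same coordinates.
    """
    counts = Counter()
    for ref, start, end in alignments:
        str_start, str_end = str_bounds[ref]
        if start <= str_start and end >= str_end:  # read covers the entire repeat
            counts[ref] += 1
    return counts


def call_alleles(allele_counts, threshold):
    """Return the alleles whose share of the locus coverage is >= threshold."""
    total = sum(allele_counts.values())
    if total == 0:
        return []
    return sorted(a for a, n in allele_counts.items() if n / total >= threshold)


if __name__ == "__main__":
    # D3S1358 read counts for the 1:1 mixture (Table 3).
    mixture = {11: 0, 12: 5, 13: 103, 14: 1519, 15: 2245,
               16: 541, 17: 4936, 18: 3906, 19: 33, 20: 21}
    print(call_alleles(mixture, MIXTURE_THRESHOLD))  # [14, 15, 17, 18]
```

With the 10% mixture threshold, allele 14 (about 11.4%) is kept and allele 16 (about 4.1%) is rejected. This matches the CE genotype in Table 4.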

6. STR analysis by capillary electrophoresis (CE)

To verify that the genotypes called from NGS data for the male and female standard samples and the 1:1 mixed sample were correct, the same samples were also genotyped by CE-based STR analysis. PCR was performed on 1 ng of each DNA sample using the PowerPlex 16 HS System (Promega) according to the manufacturer's instructions. The amplification products were detected on an ABI PRISM 3130xl Genetic Analyzer with GeneScan Software Version 3.7 (Applied Biosystems). Finally, they were analyzed with GeneMapper™ ID Software Version 3.1 (Applied Biosystems).

Results

1. Construction of libraries from PCR amplification products

PCR was performed with primers prepared from the PowerPlex 16 System information on three samples: the 2800M standard male DNA sample, the 9947A standard female DNA sample, and a 1:1 mixture of the two. After the amplification products were purified, their purity was measured at 1.92-1.95, and yields ranged from 1650 to 2220 ng. These values met Roche's criteria for library construction: at least 500 ng of input, with a purity of at least 1.70 and below 2.00. Libraries were therefore constructed from the amplification products of all three DNA samples.

Figs. 2A to 2C show the size distributions of the final library of each sample, obtained with the Bioanalyzer. Fragments shorter than 100 bp were rarely observed. This confirms that small fragments were selectively removed by the bead-based step during library construction.
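The input check quoted above amounts to two inequalities. A sketch follows; the function name is hypothetical, and the text does not say how purity was measured, so `purity` is simply the value the authors report.

```python
def meets_library_input_criteria(amount_ng, purity,
                                 min_amount_ng=500.0,
                                 purity_min=1.70, purity_max=2.00):
    """True if an amplicon pool meets the quoted criteria:
    at least 500 ng of DNA and 1.70 <= purity < 2.00."""
    return amount_ng >= min_amount_ng and purity_min <= purity < purity_max


if __name__ == "__main__":
    # The lowest reported yield paired with the lowest reported purity.
    print(meets_library_input_criteria(1650, 1.92))
```

The upper purity bound is exclusive, so a pool measuring exactly 2.00 would be rejected.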

2. Sorting of NGS data by sample and sequence alignment

NGS produced a total of 164,468 reads with a mean length of 183.64 bp. Sorting the reads by sample using the MID sequences gave:
- 51,475 reads for the 2800M standard;
- 33,213 reads for the 9947A standard;
- 76,943 reads for their 1:1 mixture;
- 2,837 unassigned reads.

Aligning these data to the reference sequences gave the number of reads obtained at each STR locus (Table 2). Of the 15 STR loci, D3S1358, D5S818, D13S317, and TH01 yielded more reads than the other loci. Relatively few reads were obtained at D16S539, D18S51, CSF1PO, FGA, Penta D, Penta E, and TPOX, whose amplification products were generally larger than 250 bp.

The percentage of reads spanning the entire STR region among all reads at each locus was below 50% for D18S51, FGA, Penta D, and Penta E (except Penta E in the 1:1 mixture, 54.9%). This likewise shows that the larger the amplification product, the fewer reads span the entire STR region.
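Sorting reads by MID can be sketched as a prefix match at the 5′ end of each read. Reads that match no MID, or more than one, are set aside as unassigned; this corresponds to the 2,837 unassigned reads above. The MID sequences below are hypothetical placeholders, not the ones used in the study. The sketch also assumes each read begins directly with its MID.

```python
def hamming(a, b):
    """Mismatched positions; sequences of unequal length count as fully mismatched."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, mids, max_mismatches=0):
    """Assign each read to the sample whose MID matches its 5' end and strip the MID.

    Reads matching no MID, or more than one, go to the "unassigned" bin.
    """
    bins = {name: [] for name in mids}
    bins["unassigned"] = []
    for read in reads:
        hits = [name for name, mid in mids.items()
                if hamming(read[:len(mid)], mid) <= max_mismatches]
        if len(hits) == 1:
            name = hits[0]
            bins[name].append(read[len(mids[name]):])
        else:
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    # Hypothetical MIDs for the three libraries.
    mids = {"2800M": "ACGTACGTAC", "9947A": "TGCATGCATG", "mix_1to1": "GATCGATCGA"}
    reads = ["ACGTACGTACTCTATCTA", "GATCGATCGAGATAGATA", "NNNNNNNNNNAGAT"]
    for sample, seqs in demultiplex(reads, mids).items():
        print(sample, len(seqs))
```

Allowing one mismatch (`max_mismatches=1`) rescues reads with a sequencing error in the MID. The ambiguity check still prevents such a read from being assigned to two samples.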

Table 2. Read counts of the 15 STR loci in each sample

| STR locus | Amplicon size range (bp) | 2800M All* | 2800M Entire STR† | 2800M Entire STR/All (%) | 9947A All* | 9947A Entire STR† | 9947A Entire STR/All (%) | 1:1 mixture All* | 1:1 mixture Entire STR† | 1:1 mixture Entire STR/All (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| D3S1358 | 115-147 | 9470 | 8743 | 92.3 | 6341 | 6012 | 94.8 | 14261 | 13306 | 93.3 |
| D5S818 | 119-155 | 9485 | 8705 | 91.8 | 5523 | 5011 | 90.7 | 9347 | 8531 | 91.3 |
| D7S820 | 215-247 | 3676 | 3476 | 94.6 | 1868 | 1780 | 95.3 | 4815 | 4603 | 95.6 |
| D8S1179 | 203-247 | 4458 | 4017 | 90.1 | 1967 | 1805 | 91.8 | 3368 | 3054 | 90.7 |
| D13S317 | 169-201 | 4897 | 4631 | 94.6 | 4060 | 3868 | 95.3 | 12839 | 12140 | 94.6 |
| D16S539 | 264-304 | 967 | 877 | 90.7 | 708 | 655 | 92.5 | 2497 | 2361 | 94.6 |
| D18S51 | 209-366 | 739 | 332 | **44.9** | 1284 | 546 | **42.5** | 1117 | 481 | **43.1** |
| D21S11 | 203-259 | 3045 | 2313 | 76.0 | 2996 | 2525 | 84.3 | 4873 | 3871 | 79.4 |
| CSF1PO | 321-357 | 291 | 244 | 83.8 | 596 | 522 | 87.6 | 862 | 742 | 86.1 |
| FGA | 322-444 | 956 | 460 | **48.1** | 666 | 255 | **38.3** | 3137 | 1440 | **45.9** |
| Penta D | 376-441 | 142 | 31 | **21.8** | 267 | 56 | **21.0** | 403 | 75 | **18.6** |
| Penta E | 379-474 | 193 | 84 | **43.5** | 356 | 116 | **32.6** | 563 | 309 | 54.9 |
| TH01 | 156-195 | 5503 | 4620 | 84.0 | 3324 | 2811 | 84.6 | 6712 | 5518 | 82.2 |
| TPOX | 262-290 | 269 | 230 | 85.5 | 215 | 183 | 85.1 | 679 | 576 | 84.8 |
| vWA | 123-171 | 3153 | 2782 | 88.2 | 1014 | 919 | 90.6 | 8565 | 7649 | 89.3 |
| AMEL | 106, 112 | 3416 | 3247 | 95.1 | 1773 | 1741 | 98.2 | 2334 | 2247 | 96.3 |
| Sum | - | 50550 | 44792 | 88.6 | 32958 | 28805 | 87.4 | 76372 | 66903 | 87.6 |

* All aligned reads, regardless of whether they contain the STR region.

† Aligned reads that contain the entire STR region. Entire STR/All values below 50% are shown in bold.

3. Determination of STR allele repeat structures and identification of sequence variation

Table 3 shows an example of how STR alleles were called from NGS data for the two single-source samples (2800M and 9947A) and their 1:1 mixture. In the same way, alleles at all 15 STR loci were called for these samples (Table 4).

To check whether the genotypes determined from NGS data were correct, they were compared with STR genotypes obtained by the conventional CE method. For the single-source samples, the genotypes agreed at all 15 STR loci. For the 1:1 mixed sample, they agreed at 13 loci (D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, CSF1PO, FGA, Penta D, TH01, TPOX) but not at two loci (Penta E, vWA). The coverage values showed why: allele 12 at Penta E (7.44%) and allele 18 at vWA (9.11%) fell below the 10% threshold used for mixed samples.

Table 3. Identification of D3S1358 alleles based on allele coverage percentage in the single-source samples and the 1:1 mixture

| Allele | 2800M read count | 2800M coverage* (%) | 9947A read count | 9947A coverage* (%) | 1:1 mixture read count | 1:1 mixture coverage* (%) |
|---|---|---|---|---|---|---|
| 11 | 0 | - | 2 | 0.03 | 0 | - |
| 12 | 0 | - | 12 | 0.20 | 5 | 0.04 |
| 13 | 2 | 0.02 | 217 | 3.61 | 103 | 0.77 |
| 14 | 13 | 0.15 | 2868 | **47.70** | 1519 | **11.41** |
| 15 | 71 | 0.81 | 2879 | **47.89** | 2245 | **16.87** |
| 16 | 495 | 5.66 | 34 | 0.57 | 541 | 4.06 |
| 17 | 4355 | **49.81** | 0 | - | 4936 | **37.09** |
| 18 | 3757 | **42.97** | 0 | - | 3906 | **29.35** |
| 19 | 24 | 0.27 | 0 | - | 33 | 0.25 |
| 20 | 26 | 0.30 | 0 | - | 21 | 0.16 |
| Sum | 8743 | - | 6012 | - | 13309 | - |

Shaded (bold) values in the table above indicate alleles called on the basis of the analytical threshold.

* Allele coverage (%) = allele read count / locus read count × 100

Table 4. STR genotyping results for the two single-source samples and the 1:1 mixture by CE and NGS

| STR locus | 2800M CE | 2800M NGS | 9947A CE | 9947A NGS | 1:1 mixture CE | 1:1 mixture NGS |
|---|---|---|---|---|---|---|
| D3S1358 | 17, 18 | 17, 18 | 14, 15 | 14, 15 | 14, 15, 17, 18 | 14, 15, 17, 18 |
| D5S818 | 12 | 12 | 11 | 11 | 11, 12 | 11, 12 |
| D7S820 | 8, 11 | 8, 11 | 10, 11 | 10, 11 | 8, 10, 11 | 8, 10, 11 |
| D8S1179 | 14, 15 | 14, 15 | 13 | 13 | 13, 14, 15 | 13, 14, 15 |
| D13S317 | 9, 11 | 9, 11 | 11 | 11 | 9, 11 | 9, 11 |
| D16S539 | 9, 13 | 9, 13 | 11, 12 | 11, 12 | 9, 11, 12, 13 | 9, 11, 12, 13 |
| D18S51 | 16, 18 | 16, 18 | 15, 19 | 15, 19 | 15, 16, 18, 19 | 15, 16, 18, 19 |
| D21S11 | 29, 31.2 | 29, 31.2 | 30 | 30 | 29, 30, 31.2 | 29, 30, 31.2 |
| CSF1PO | 12 | 12 | 10, 12 | 10, 12 | 10, 12 | 10, 12 |
| FGA | 20, 23 | 20, 23 | 23, 24 | 23, 24 | 20, 23, 24 | 20, 23, 24 |
| Penta D | 12, 13 | 12, 13 | 12 | 12 | 12, 13 | 12, 13 |
| Penta E | 7, 14 | 7, 14 | 12, 13 | 12, 13 | 7, 12, 13, 14 | 7, (12), 13, 14 |
| TH01 | 6, 9.3 | 6, 9.3 | 8, 9.3 | 8, 9.3 | 6, 8, 9.3 | 6, 8, 9.3 |
| TPOX | 11 | 11 | 8 | 8 | 8, 11 | 8, 11 |
| vWA | 16, 19 | 16, 19 | 17, 18 | 17, 18 | 16, 17, 18, 19 | 16, 17, (18), 19 |

Parentheses: true alleles whose coverage was below 10% of the total locus coverage.

From the NGS data of the male and female standard DNA samples, the sequences of the alleles called at the 15 STR loci were examined, and from them the repeat structure of each STR region was determined (Table 5). Comparing the sequences at each STR locus between the samples revealed three kinds of difference in repeat structure or sequence.

First, two alleles can have the same length but different sequences. At D8S1179, the 9947A sample appears homozygous (13, 13) by CE-based analysis. NGS, however, showed that one allele is "TCTA TCTG [TCTA]₁₁" and the other "[TCTA]₁₃", so the two alleles have different repeat structures. The sample is therefore heterozygous: its two STR regions have the same length but different sequences.

Second, samples can differ in repeat structure. At D3S1358 the core repeat unit is [TCTA], but [TCTG] units are also found, depending on the sample. Comparing the D3S1358 repeat structures of 2800M and 9947A showed that the [TCTG] block differed between them: [TCTG]₃ in 2800M versus [TCTG]₂ in 9947A (Table 5).

Third, sequence variation can occur in the flanking sequence outside the STR region. This was observed only at D13S317 in the 2800M sample, whose genotype is 9, 11. In the 3′ flanking sequence, allele 9 reads "AATC AATC", whereas allele 11 reads "TATC AATC", as if one extra [TATC] repeat had been added.
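The bracket notation of Table 5 can be generated mechanically for simple, in-frame repeats by run-length encoding consecutive repeat units. The sketch below illustrates this and shows why the two D8S1179 alleles of 9947A look identical by length but differ in sequence.

It assumes three things:
- the STR region has already been excised from the read;
- the region starts on a repeat-unit boundary;
- its length is a multiple of the unit length.

Partial units such as the ATG in TH01 9.3, or the AG blocks of D18S51, are therefore not handled. The function name is hypothetical.

```python
def repeat_structure(seq, unit_len=4):
    """Describe an STR region as runs of repeat units, e.g. 'TCTA TCTG [TCTA]11'.

    Runs of two or more identical units are written as [UNIT]n; single units
    are written bare. Raises ValueError if seq is not a whole number of units.
    """
    if len(seq) % unit_len:
        raise ValueError("sequence length is not a multiple of the unit length")
    blocks = []  # list of [unit, run_length]
    for i in range(0, len(seq), unit_len):
        unit = seq[i:i + unit_len]
        if blocks and blocks[-1][0] == unit:
            blocks[-1][1] += 1
        else:
            blocks.append([unit, 1])
    return " ".join(u if n == 1 else f"[{u}]{n}" for u, n in blocks)


if __name__ == "__main__":
    allele_13a = "TCTA" + "TCTG" + "TCTA" * 11   # D8S1179 13a of 9947A
    allele_13b = "TCTA" * 13                     # D8S1179 13b of 9947A
    print(len(allele_13a), repeat_structure(allele_13a))
    print(len(allele_13b), repeat_structure(allele_13b))
```

Both alleles are 52 bp long, so CE reports a single allele 13. Their descriptions, "TCTA TCTG [TCTA]11" and "[TCTA]13", differ.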

Table 5. Repeat structures of the 15 STR loci in the two standard samples determined from NGS data: 2800M

| STR locus | Genotype | Core repeat | Repeat structure |
|---|---|---|---|
| D3S1358 | 17, 18 | TCTA | 17: TCTA [TCTG]₃ [TCTA]₁₃; 18: TCTA [TCTG]₃ [TCTA]₁₄ |
| D5S818 | 12 | AGAT | 12: [AGAT]₁₂ |
| D7S820 | 8, 11 | GATA | 8: [GATA]₈; 11: [GATA]₁₁ |
| D8S1179 | 14, 15 | TCTA | 14: TCTA TCTG [TCTA]₁₂; 15: [TCTA]₂ TCTG [TCTA]₁₂ |
| D13S317 | 9, 11 | TATC | 9: [TATC]₉ [AATC]₂; 11: [TATC]₁₁ TATC AATC |
| D16S539 | 9, 13 | GATA | 9: [GATA]₉; 13: [GATA]₁₃ |
| D18S51 | 16, 18 | AGAA | 16: [AGAA]₁₆ AAAG [AG]₃; 18: [AGAA]₁₈ AAAG [AG]₃ |
| D21S11 | 29, 31.2 | TCTA | 29: [TCTA]₄ [TCTG]₆ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁; 31.2: [TCTA]₅ [TCTG]₆ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁ TA TCTA |
| CSF1PO | 12 | AGAT | 12: [AGAT]₁₂ |
| FGA | 20, 23 | CTTT | 20: [TTTC]₃ TTTT TTCT [CTTT]₁₂ CTCC [TTCC]₂; 23: [TTTC]₃ TTTT TTCT [CTTT]₁₅ CTCC [TTCC]₂ |
| Penta D | 12, 13 | AAAGA | 12: [AAAGA]₁₂; 13: [AAAGA]₁₃ |
| Penta E | 7, 14 | AAAGA | 7: [AAAGA]₇; 14: [AAAGA]₁₄ |
| TH01 | 6, 9.3 | AATG | 6: [AATG]₆; 9.3: [AATG]₆ ATG [AATG]₃ |
| TPOX | 11 | AATG | 11: [AATG]₁₁ |
| vWA | 16, 19 | TCTA | 16: TCTA [TCTG]₃ [TCTA]₁₂ TCCA TCTA; 19: TCTA [TCTG]₄ [TCTA]₁₄ TCCA TCTA |

Table 5 (continued). Repeat structures of the 15 STR loci in the two standard samples determined from NGS data: 9947A

| STR locus | Genotype | Core repeat | Repeat structure |
|---|---|---|---|
| D3S1358 | 14, 15 | TCTA | 14: TCTA [TCTG]₂ [TCTA]₁₁; 15: TCTA [TCTG]₂ [TCTA]₁₂ |
| D5S818 | 11 | AGAT | 11: [AGAT]₁₁ |
| D7S820 | 10, 11 | GATA | 10: [GATA]₁₀; 11: [GATA]₁₁ |
| D8S1179 | 13 | TCTA | 13a: TCTA TCTG [TCTA]₁₁; 13b: [TCTA]₁₃ |
| D13S317 | 11 | TATC | 11: [TATC]₁₁ [AATC]₂ |
| D16S539 | 11, 12 | GATA | 11: [GATA]₁₁; 12: [GATA]₁₂ |
| D18S51 | 15, 19 | AGAA | 15: [AGAA]₁₅ AAAG [AG]₃; 19: [AGAA]₁₉ AAAG [AG]₃ |
| D21S11 | 30 | TCTA | 30: [TCTA]₆ [TCTG]₅ [TCTA]₃ TA [TCTA]₃ TCA [TCTA]₂ TCCA TA [TCTA]₁₁ |
| CSF1PO | 10, 12 | AGAT | 10: [AGAT]₁₀; 12: [AGAT]₁₂ |
| FGA | 23, 24 | CTTT | 23: [TTTC]₃ TTTT TTCT [CTTT]₁₅ CTCC [TTCC]₂; 24: [TTTC]₃ TTTT TTCT [CTTT]₁₆ CTCC [TTCC]₂ |
| Penta D | 12 | AAAGA | 12: [AAAGA]₁₂ |
| Penta E | 12, 13 | AAAGA | 12: [AAAGA]₁₂; 13: [AAAGA]₁₃ |
| TH01 | 8, 9.3 | AATG | 8: [AATG]₈; 9.3: [AATG]₆ ATG [AATG]₃ |
| TPOX | 8 | AATG | 8: [AATG]₈ |
| vWA | 17, 18 | TCTA | 17: TCTA [TCTG]₄ [TCTA]₁₂ TCCA TCTA; 18: TCTA [TCTG]₄ [TCTA]₁₃ TCCA TCTA |

4. Estimation of the mixture ratio in the mixed sample

The mixture ratio of the 1:1 mixed sample was estimated in two ways. The first compared the coverage values among the alleles at each of the 15 STR loci analyzed. The second identified sequence variation at a specific position and determined the proportion of each base there.

Take D3S1358 as an example. 2800M carries alleles 17 and 18, and 9947A carries alleles 14 and 15, so the 1:1 mixture carries alleles 14, 15, 17, and 18. In theory their coverage values should be in a 1:1:1:1 ratio. The observed values, however, were 1519, 2245, 4936, and 3906, respectively (Table 3), giving a ratio of 1:1.5:3.3:2.6.

In addition, at D13S317 in the 2800M sample, an adenine-to-thymine variant relative to the human reference genome hg19 was found in the 3′ flanking sequence of allele 11 (Fig. 3). Counting the bases at this position gave a total coverage of 5683, of which thymine accounted for 3037 reads (53%) and adenine for 2642 (46%). The two bases were therefore present at nearly 1:1.
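The coverage-ratio approach can be made concrete with the D3S1358 counts from Table 3. The sketch below expresses each allele's coverage relative to the smallest one. It also computes the fraction of locus coverage attributable to one contributor, which is only meaningful when the contributors share no alleles at the locus, as is the case here. The function names are hypothetical.

```python
def relative_ratios(coverages):
    """Express each allele's coverage relative to the smallest coverage."""
    smallest = min(coverages.values())
    return {allele: cov / smallest for allele, cov in coverages.items()}


def contributor_share(coverages, contributor_alleles):
    """Fraction of locus coverage from one contributor's alleles.

    Assumes the contributors share no alleles at this locus.
    """
    total = sum(coverages.values())
    return sum(coverages[a] for a in contributor_alleles) / total


if __name__ == "__main__":
    d3s1358_mix = {14: 1519, 15: 2245, 17: 4936, 18: 3906}  # Table 3
    ratios = relative_ratios(d3s1358_mix)
    print({a: round(r, 2) for a, r in ratios.items()})
    share_2800m = contributor_share(d3s1358_mix, [17, 18])
    print(f"2800M share: {share_2800m:.3f} (0.5 expected for a 1:1 mixture)")
```

Under this simple reading, about 0.70 of the D3S1358 coverage comes from 2800M rather than the expected 0.50. This is consistent with the skew toward 2800M-derived alleles discussed later.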

At D13S317, however, 2800M is heterozygous (9, 11), whereas 9947A is homozygous (11, 11). The variant position on the allele 11 reference is therefore covered by one thymine-carrying copy from 2800M and two adenine-carrying copies from 9947A, so adenine and thymine should appear in a 2:1 ratio. The observed ratio thus differed from the expected mixture ratio.
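The 2:1 expectation, and the mixture proportion implied by the observed base counts, can be worked out with a simple copy-number model. The model is an assumption: every allele copy is taken to be amplified and sequenced with equal efficiency. In the D13S317 case, contributor A (2800M) adds one thymine-carrying copy of allele 11 at this position, and contributor B (9947A) adds two adenine-carrying copies. The function names are hypothetical.

```python
def expected_variant_fraction(p_a, copies_a, copies_b):
    """Expected fraction of reads carrying the variant base at one position.

    p_a: proportion of template from contributor A (B contributes 1 - p_a).
    copies_x: (variant copies, reference copies) that contributor x adds
              to the reads aligned at this position.
    Assumes equal amplification and sequencing efficiency for every copy.
    """
    variant = p_a * copies_a[0] + (1 - p_a) * copies_b[0]
    reference = p_a * copies_a[1] + (1 - p_a) * copies_b[1]
    return variant / (variant + reference)


def implied_share_one_vs_two(observed_fraction):
    """Solve f = p / (p + 2(1 - p)) = p / (2 - p) for p, i.e. p = 2f / (1 + f).

    Valid only for copies_a = (1, 0) and copies_b = (0, 2).
    """
    f = observed_fraction
    return 2 * f / (1 + f)


if __name__ == "__main__":
    # A 1:1 mixture should give thymine in one third of the reads (A:T = 2:1).
    print(expected_variant_fraction(0.5, (1, 0), (0, 2)))
    observed_t = 3037 / (3037 + 2642)  # thymine vs adenine reads at this position
    print(f"implied 2800M proportion: {implied_share_one_vs_two(observed_t):.2f}")
```

Under this model the observed thymine fraction (about 0.53) implies that roughly 0.70 of the template behaved as if it came from 2800M. This is close to the D3S1358 coverage estimate.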

Discussion

STR genotyping by NGS consists of four stages: generation of STR amplification products, library construction, massively parallel sequencing, and data analysis. In this study, a multiplex PCR system was built using the primer information of the PowerPlex 16 System, which is used for CE-based autosomal STR analysis in forensic science. Amplification products were generated from standard DNA samples, and the results obtained by NGS were analyzed. The approach is similar to NGS analyses of forensic STRs reported by several groups.¹⁰⁻¹⁵⁾

In the first study by Van Neste et al.,¹⁰⁾ amplification products from single-source and mixed samples were prepared with a commercial STR kit covering 9 STR loci and analyzed by NGS. Those authors used the fluorescently labeled primers as supplied. They observed a large difference between the numbers of forward and reverse reads and attributed it to the fluorescent labels. For this reason, the present study followed the second study by Van Neste et al.²⁰⁾ and prepared amplification products by multiplex PCR of 15 STR loci with unlabeled primers. Unlike that study, however, the products were analyzed on a different NGS platform, the GS Junior.

For library construction, Roche recommends one of two methods:
i) generating amplification products with primers in which an adapter sequence is joined to a template-specific sequence (fusion primers);
ii) fragmenting intact template DNA into small pieces (fragmentation) and then attaching adapters.

We had previously tried the fusion-primer method to generate amplification products and construct the library in a single step. However, amplification was uneven across loci, the total number of NGS reads was low, and alleles at some loci were not called correctly. The longer primers presumably reduced amplification efficiency during PCR and also affected the subsequent emulsion PCR, so this method was not used in the present study.

The fragmentation method is designed for long templates such as whole genomes and mitochondrial DNA. It is not suited to STR analysis, in which multiplex PCR produces amplification products of 100-450 bp. Instead, these amplification products were treated as already-fragmented DNA, and adapters were attached to them directly; this successfully produced NGS data. Constructing libraries in this way, while keeping the conventional multiplex PCR unchanged, should therefore be very useful for NGS-based studies of STR loci in forensic science.

When NGS data were generated for the 15 STR loci, the number of reads per locus was not uniform. Instead, it was inversely related to amplicon size (Table 2). Broadly, loci with amplification products shorter than about 250 bp yielded many reads, whereas loci with larger products yielded relatively few. At D18S51, FGA, Penta D, and Penta E, which generate amplification products of 300 bp or more, the total number of reads (All) was low. The proportion of reads spanning the entire STR region (Entire STR/All) was also below 50%, except for Penta E in the 1:1 mixture.

Because this study used the PowerPlex 16 System primer information, amplicon sizes spanned a wide range, 106-474 bp. Reducing amplicon sizes overall and confining them to a narrower range should therefore yield more uniform read numbers across STR loci, and thus more reliable downstream results. A new experimental design will be needed to obtain STR results optimized for NGS. Read lengths are also steadily increasing on comparable instruments that use other sequencing chemistries, such as the MiSeq (Illumina Inc., San Diego, CA, USA) and the Ion Torrent PGM (Life Technologies, Carlsbad, CA, USA). Such a design should therefore also be applicable to those instruments.
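The inverse relationship between amplicon size and read count in Table 2 can be quantified with a rank correlation. The sketch computes Spearman's rho as the Pearson correlation of average ranks, which handles ties, between amplicon size and total aligned reads for the 2800M sample.

Two choices here are assumptions, not part of the original analysis:
- the midpoint of each amplicon size range stands in for amplicon size;
- amelogenin is excluded.

The function names are hypothetical.

```python
# Amplicon size ranges (bp) and total aligned reads (All) for 2800M, from Table 2.
AMPLICON_RANGE_BP = {
    "D3S1358": (115, 147), "D5S818": (119, 155), "D7S820": (215, 247),
    "D8S1179": (203, 247), "D13S317": (169, 201), "D16S539": (264, 304),
    "D18S51": (209, 366), "D21S11": (203, 259), "CSF1PO": (321, 357),
    "FGA": (322, 444), "Penta D": (376, 441), "Penta E": (379, 474),
    "TH01": (156, 195), "TPOX": (262, 290), "vWA": (123, 171),
}
READS_2800M_ALL = {
    "D3S1358": 9470, "D5S818": 9485, "D7S820": 3676, "D8S1179": 4458,
    "D13S317": 4897, "D16S539": 967, "D18S51": 739, "D21S11": 3045,
    "CSF1PO": 291, "FGA": 956, "Penta D": 142, "Penta E": 193,
    "TH01": 5503, "TPOX": 269, "vWA": 3153,
}


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    loci = sorted(AMPLICON_RANGE_BP)
    midpoints = [sum(AMPLICON_RANGE_BP[locus]) / 2 for locus in loci]
    reads = [READS_2800M_ALL[locus] for locus in loci]
    print(f"Spearman rho (amplicon size vs reads): {spearman(midpoints, reads):.2f}")
```

A strongly negative rho supports the observation that larger amplicons yielded fewer reads. The same function could be applied to the Entire STR/All percentages.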

The STR alleles called by NGS were compared with the CE results. For the single-source samples, genotypes agreed at all STR loci. For the 1:1 mixed sample, the CE and NGS genotypes disagreed at two loci, Penta E and vWA (Table 4). At each of these loci, the coverage of one allele fell below the 10% threshold set for allele calling in this study. These alleles nevertheless had higher coverage than the alleles regarded as stutter, so excluding them from the results was judged inappropriate. Future NGS-based STR analyses of mixed samples should take such cases into account when calling alleles, so that the results are free of error.

NGS made it possible to determine the repeat structures of STR alleles and to observe sequence variation (Table 5). Three features were identified in the male and female standard samples (2800M and 9947A):
- alleles at one locus that appeared to have the same length had different sequences;
- different samples showed different repeat structures at the same locus;
- the repeat structure of the STR region was the same, but sequence variation was observed in the flanking sequence.

These findings suggest that sequence-based analysis by NGS can further subdivide the STR alleles identified by CE. Moreover, as suggested in a previous study,¹¹⁾ sequence information and frequency data for NGS-typed STR alleles in the Korean population would be useful in forensic practice such as paternity testing and criminal investigation.

When the mixing ratio of the 1:1 mixture was estimated from the ratio of STR allele coverage values obtained by NGS, the result differed from the expected ratio. Notably, when the coverage of alleles derived from 2800M and from 9947A was examined separately, the two were not equal: coverage was skewed toward the 2800M-derived alleles (Table 3). This pattern was observed at all 15 STR loci (data not shown). Likewise, when the mixing ratio was estimated from the numbers of adenine and thymine at the sequence variant observed at the D13S317 locus, more 2800M-derived thymine was observed than expected (Fig. 3). To investigate the cause, we estimated the mixing ratio from allele peak heights in the CE profile of the same 1:1 mixture. As with NGS, the CE results showed more of one sample's alleles than expected (data not shown). Because CE and NGS both begin with PCR amplification of the target sample, this imbalance is presumably caused by a difference in amplification efficiency between the two samples during PCR. It should therefore be taken into account whenever mixing ratios are estimated with NGS.
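A minimal sketch of the coverage-based ratio estimate discussed above, assuming per-allele read counts for one locus are already available. It only uses loci where both contributors are heterozygous and share no alleles, so every read can be attributed to exactly one contributor. The function name and the read counts in the usage example are hypothetical, not values from Table 3, and the estimate inherits any PCR amplification bias between the samples rather than correcting for it.

```python
def contributor_fraction(coverage, genotype_a, genotype_b):
    """Estimate contributor A's share of a two-person mixture at one locus.

    coverage: dict mapping allele designation (str) -> read count at this locus.
    genotype_a, genotype_b: the alleles of each known contributor.
    Returns None when the locus is uninformative for this simple estimate.
    """
    a, b = set(genotype_a), set(genotype_b)
    # Require two heterozygous contributors with no shared alleles, so that
    # every read can be attributed to exactly one contributor.
    if len(a) != 2 or len(b) != 2 or a & b:
        return None
    cov_a = sum(coverage.get(allele, 0) for allele in a)
    cov_b = sum(coverage.get(allele, 0) for allele in b)
    total = cov_a + cov_b
    return cov_a / total if total else None


# Hypothetical D2S1338 counts for a 1:1 mixture (2800M 22,25 vs 9947A 19,23).
d2 = {"22": 6000, "25": 5000, "19": 4000, "23": 5000}
print(contributor_fraction(d2, ("22", "25"), ("19", "23")))
# FGA is uninformative for this estimate because allele 23 is shared.
print(contributor_fraction({}, ("20", "23"), ("23", "24")))
```

Averaging this fraction over all informative loci gives a profile-wide estimate; a consistent skew toward one contributor across loci, as reported above, points to a systematic bias rather than locus-specific noise.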

In this study and in that of Bornman et al.,¹⁵⁾ NGS analysis was performed only on a 1:1 mixture of two single-source samples. This situation corresponds to interpreting samples from a crime scene involving one suspect and one victim. However, if one of the two contributors is present at a low proportion, data interpretation may become difficult. To reflect the actual characteristics of crime-scene samples, it is therefore necessary to investigate whether mixtures at a wider range of ratios, beyond 1:1, can also be interpreted effectively.

Van Neste et al. prepared mixtures from four samples at ratios of "10:20:30:40" and "93.40:5:1:0.5:0.1" and analyzed them by NGS.²⁰⁾ Because the minimum analysis threshold was set at 0.5%, contributors present at 1% should in theory have been detected, but in practice only contributors present at 5% or more were detected. Such studies provide important information on the question of greatest interest in NGS mixture analysis: down to what proportion (sensitivity) and how accurately (specificity) alleles from a minor contributor can be detected. Results of this kind should be incorporated into STR allele calling from NGS data so that the resulting data are interpreted accurately.

As its NGS data analysis strategy, this study proposed a new method in which a custom reference sequence is built so that alleles can be called for the STR loci mainly used in forensic science. The lobSTR program, developed previously by other researchers for STR analysis, has also been reported.²¹⁾ However, the method presented here gave better results than lobSTR (data not shown). Because the analysis used in this study involves a complicated and cumbersome procedure, practitioners may find it somewhat difficult to use. If an analysis program based on the newly built reference sequences is developed, NGS analysis could be performed more efficiently and extended to a wider range of STR analyses.

In this study, a single NGS run successfully generated sequence data for male and female standard single-source samples and their mixtures, and effective STR analysis could be performed from these data. This approach appears applicable as a model for analyzing crime-scene samples together with reference samples collected from suspects and victims. If NGS-based STR analysis is further optimized experimentally and analytically, it is expected to compensate for the shortcomings of existing CE-based methods and to serve as a useful complementary method in forensic science.

D. References

1. Thompson R, Zoppis S, McCord B. An overview of DNA typing methods for human identification: past, present, and future. Methods Mol Biol 2012;830:3-16.

2. Kayser M, de Knijff P. Improving human forensics through advances in genetics, genomics and molecular biology. Nat Rev Genet 2011;12:179-92.

3. Berglund EC, Kiialainen A, Syvänen AC. Next-generation sequencing technologies and applications for human genetic history and forensics. Investig Genet 2011;2:23.

4. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31-46.

5. Cho IS, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet 2012;13:260-70.

6. Bamshad MJ, Ng SB, Bigham AW, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 2011;12:745-55.

7. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 2011;12:87-98.

8. Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010;11:685-96.

9. Laird PW. Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 2010;11:191-203.

10. Van Neste C, Van Nieuwerburgh F, Van Hoofstat D, et al. Forensic STR analysis using massive parallel sequencing. Forensic Sci Int Genet 2012;6:810-8.

11. Rockenbauer E, Hansen S, Mikkelsen M, et al. Characterization of mutations and sequence variants in the D21S11 locus by next generation sequencing. Forensic Sci Int Genet 2014;8:68-72.

12. Fordyce SL, Ávila-Arcos MC, Rockenbauer E, et al. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform. Biotechniques 2011;51:127-33.

13. Dalsgaard S, Rockenbauer E, Buchard A, et al. Non-uniform phenotyping of D12S391 resolved by second generation sequencing. Forensic Sci Int Genet 2014;8:195-9.

14. Scheible M, Loreille O, Just R, et al. Short tandem repeat sequencing on the 454 platforms. Forensic Sci Int Genet Suppl Ser 2011;3:357-8.

15. Bornman DM, Hester ME, Schuetter JM, et al. Short-read, high-throughput sequencing technology for STR genotyping. Biotechniques 2012;0:1-6.

16. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357-9.

17. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078-9.

18. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26:841-2.

19. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24-6.

20. Van Neste C, Vandewoestyne M, Van Criekinge W, et al. My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing. Forensic Sci Int Genet 2014;9:1-8.

21. Gymrek M, Golan D, Rosset S, et al. lobSTR: a short tandem repeat profiler for personal genomes. Genome Res 2012;22:1154-62.

Example 2: Autosomal Analysis of Human Subjects in Mixed Samples Using Multiplex Gene Amplification

A. Research Methods

1. Single-Source and Mixed DNA Samples

The DNA samples were the male standard 2800M (Promega, Madison, WI, USA) and the female standard 9947A (Promega), both used as controls in forensic genetics research. These DNA samples were quantified with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and prepared at a concentration of 1 ng/μL. Mixtures were prepared by combining the two single-source samples (2800M and 9947A) at ratios of 1:1, 1:3, 1:6, 1:9, and 1:49 (male:female) to a final concentration of 1 ng/μL.
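Because both standards were diluted to the same concentration before mixing, each mixing ratio reduces to a simple volume split. The sketch below assumes an arbitrary 50 μL total mixture volume (the text does not state one) and also prints the male, i.e. minor, fraction, which is the proportion the coverage thresholds used later have to resolve.

```python
def mixture_volumes(male_parts, female_parts, total_ul):
    """Volumes of two equal-concentration stocks for a male:female mixture.

    Both stocks are at the same concentration (1 ng/uL here), so the mixture
    keeps that concentration and only the split of the volume changes.
    """
    parts = male_parts + female_parts
    male_ul = total_ul * male_parts / parts
    return male_ul, total_ul - male_ul


for female in (1, 3, 6, 9, 49):
    male_ul, female_ul = mixture_volumes(1, female, 50.0)  # 50 uL is assumed
    minor_pct = 100 * male_ul / 50.0
    print(f"1:{female}: {male_ul:.2f} uL 2800M + {female_ul:.2f} uL 9947A "
          f"(male fraction {minor_pct:.1f}%)")
```

At 1:49 the male contributor makes up only 2% of the DNA, which helps explain why a lower coverage threshold is needed for the most unbalanced mixtures.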

2. Amplification Method

In this study, single-source samples and mixtures were amplified with a new multiplex PCR system built for STR NGS analysis. The system includes 18 markers commonly used in forensic genetics: the 13 CODIS STRs, D2S1338, D19S433, Penta D, Penta E, and amelogenin. Primers for the STR regions were designed with the Primer3 program. Amplicon sizes ranged from 70 bp to 210 bp, and no fluorescent dyes were used. Based on NCBI SNP information (http://www.ncbi.nlm.nih.gov/SNP/), the primers were designed so that no variant with a frequency of 1% or more falls within a primer-binding site.

Table 7. PCR primers and final concentrations of the in-house multiplex PCR system for 18 markers

| No. | Locus | Primer | Primer sequence (5'→3') | Conc. (μM) |
|---|---|---|---|---|
| 1 | D19S433 | F053 | GCAAAAAGCTATAATTGTACCAC | 0.60 |
| | | R203 | AAAAATCTTCTCTCTTTCTTCCTCTC | 0.60 |
| 2 | D5S818 | F160 | TGATTTTCCTCTTTGGTATCCTT | 0.55 |
| | | R280 | CAACATTTGTATCTTTATCTGTATCCT | 0.55 |
| 3 | Penta E | F203 | GGCGACTGAGCAAGACTCA | 1.00 |
| | | R284m | TGGGTTATTAATTGAGAAAACTCCTT | 1.00 |
| 4 | CSF1PO | F191 | ACTGCCTTCATAGATAGAAGAT | 0.50 |
| | | R295 | GACCCTGTTCTAAGTACTTCCT | 0.50 |
| 5 | D7S820 | F171 | TGATAGAACACTTGTCATAGTTTAGAA | 0.50 |
| | | R344 | CTCATTGACAGAATTGCACCA | 0.50 |
| 6 | D18S51 | F197 | GTTGCTACTATTTCTTTTCTTTTTCTC | 0.90 |
| | | R340 | CTGAGTGACAAATTGAGACCTTG | 0.90 |
| 7 | TPOX | F112 | CAGAACAGGCACTTAGGGAAC | 0.35 |
| | | R198 | TCCTTGTCAGCGTTTATTTGC | 0.35 |
| 8 | D16S539 | F119 | AATACAGACAGACAGACAGGTG | 0.45 |
| | | R225 | AGCATGTATCTATCATCCATCTCTG | 0.45 |
| 9 | D8S1179 | F173 | TTTTTGTATTTCATGTGTACATTCGT | 0.90 |
| | | R275 | GTAGATTATTTTCACTGTGGGGAA | 0.90 |
| 10 | Amelogenin | F181 | CCTTTGAAGTGGTACCAGAGCAT | 0.80 |
| | | R262 | GCATGCCTAATATTTTCAGGGAATAA | 0.80 |
| 11 | FGA | F153 | AAATAAAATTAGGCATATTTACAAGC | 1.00 |
| | | R293 | GCCAGCAAAAAAGAAAGGAA | 1.00 |
| 12 | D13S317 | F183 | TCTAACGCCTATCTGTATTTACAA | 0.80 |
| | | R284 | AGACAGAAAGATAGATAGATGATTGA | 0.80 |
| 13 | D2S1338 | F128 | TGGAAACAGAAATGGCTTGG | 0.80 |
| | | R273 | AGTTATTCAGTAAGTTAAAGGATTGC | 0.80 |
| 14 | D21S11 | F161 | AATTCCCCAAGTGAATTGCC | 0.70 |
| | | F334 | GGTAGATAGACTGGATAGATAGACGA | 0.70 |
| 15 | Penta D | F153 | GCAAGACACCATCTCAAGAAAG | 1.00 |
| | | R318 | TGGTCATAACGATTTTTTTGAGA | 1.00 |
| 16 | D3S1358 | F145 | CAGTCCAATCTGGGTGACAG | 0.50 |
| | | R266 | ATCAACAGAGGCTTGCATGT | 0.50 |
| 17 | TH01 | F117 | GATTCCCATTGGCCTGTTC | 0.50 |
| | | R216 | CAGGTCACAGGGAACACAGA | 0.50 |
| 18 | vWA | F096 | GAATAATCAGTATGTGACTTGGATTG | 1.00 |
| | | R226 | TGATAAATACATAGGATGGATGG | 1.00 |

Multiplex PCR was performed in a 25 μL reaction volume containing 1 ng of template DNA, 4.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems), 2.5 μL of Gold ST*R 10× buffer (Promega), and 0.35 μM to 1.0 μM of each primer (Table 7). Thermal cycling was performed on a Veriti 96-Well Thermal Cycler (Applied Biosystems) under the following conditions: 95 °C for 11 min; 33 cycles of 94 °C for 20 s, 59 °C for 90 s, and 72 °C for 60 s; final extension at 60 °C for 45 min; hold at 4 °C.
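As a worked example of the reaction set-up, the sketch below computes how much of each primer stock goes into one 25 μL reaction to reach the final concentrations in Table 7, using C1V1 = C2V2. The 100 μM stock concentration is an assumption; the text does not state it.

```python
# Final concentration of each primer (uM) in the 25 uL reaction, from Table 7.
# Forward and reverse primers of a locus share the same concentration.
FINAL_UM = {
    "D19S433": 0.60, "D5S818": 0.55, "Penta E": 1.00, "CSF1PO": 0.50,
    "D7S820": 0.50, "D18S51": 0.90, "TPOX": 0.35, "D16S539": 0.45,
    "D8S1179": 0.90, "Amelogenin": 0.80, "FGA": 1.00, "D13S317": 0.80,
    "D2S1338": 0.80, "D21S11": 0.70, "Penta D": 1.00, "D3S1358": 0.50,
    "TH01": 0.50, "vWA": 1.00,
}


def primer_volume_ul(final_um, reaction_ul=25.0, stock_um=100.0):
    """C1*V1 = C2*V2: stock volume needed for one primer in one reaction.

    stock_um=100.0 is an assumed stock concentration, not from the source.
    """
    return final_um * reaction_ul / stock_um


def total_primer_volume_ul(final_um=FINAL_UM, **kwargs):
    # Two primers (forward + reverse) per locus.
    return sum(2 * primer_volume_ul(c, **kwargs) for c in final_um.values())


for locus, conc in FINAL_UM.items():
    print(f"{locus:<11} {primer_volume_ul(conc):.3f} uL of each primer")
print(f"total primer volume per reaction: {total_primer_volume_ul():.3f} uL")
```

Under that assumption the 36 primers together take up roughly 6.4 μL of the 25 μL reaction, which leaves room for template, buffer, and polymerase; with a different stock concentration, pass `stock_um=` accordingly.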

After PCR, 25 μL of PCR product was treated with 10 μL of ExoSAP-IT (USB, Cleveland, OH, USA) at 37 °C for 45 min, followed by heat inactivation at 80 °C for 15 min. The enzyme-treated PCR product was further purified with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany). Finally, the quality of the purified amplicons was assessed by analyzing the size distribution on a Bioanalyzer 2100 with the Agilent High Sensitivity DNA Kit (Agilent Technologies, Palo Alto, CA, USA), measuring purity with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and quantifying with the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen, Carlsbad, CA, USA).

3. PCR Amplification of Artificially Degraded DNA

An artificially degraded sample was prepared by adding 0.006 U of DNase I (New England Biolabs, Ipswich, MA, USA) to K562 high-molecular-weight DNA (Promega) and incubating at 37 °C for 15 min and then at 75 °C for 10 min.

DNA fragmentation was confirmed by agarose gel electrophoresis. Amplification was performed under the conditions described above, except that the 25 μL reaction contained 200 pg of the fragmented DNA and 6.0 U of AmpliTaq Gold DNA polymerase (Applied Biosystems) and was run for 36 cycles. All experiments were performed in duplicate, labeled AD01 and AD02.

4. Library Preparation and NGS Data Generation

Libraries were generated from 200 ng or more of purified PCR product with the TruSeq Nano DNA Sample Preparation Kit (Illumina). All steps followed the manufacturer's protocol, and the bead ratio used for size selection during bead purification was adjusted based on a previous study [19].

The completed libraries were quantified with the qPCR method recommended by Illumina. The Bioanalyzer was used to confirm the size distribution and to measure the total concentration of the final library. Samples were sequenced on a MiSeq™ (Illumina Inc., San Diego, CA, USA), and the reads were automatically sorted by sample (demultiplexed) by the MiSeq Reporter software.
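The Bioanalyzer reports a mass concentration and a size distribution, whereas sequencer loading is usually planned in molar units. A common approximation converts between the two using about 660 g/mol per base pair of double-stranded DNA. The sketch below assumes a single mean fragment length per library and is not presented as the authors' calculation; they quantified libraries by qPCR.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one dsDNA base pair


def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 ng/uL = 1e-3 g/L; dividing by the molar mass (660 g/mol per bp times
    the fragment length) gives mol/L, and 1 mol/L = 1e9 nM, hence 1e6 overall.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


# Hypothetical library: 2.0 ng/uL with a mean fragment size of 300 bp.
print(f"{ng_per_ul_to_nm(2.0, 300):.2f} nM")
```

Because the mean fragment length enters the denominator directly, an inaccurate size estimate from a broad or bimodal Bioanalyzer trace shifts the molarity proportionally, which is one reason qPCR is preferred for final library quantification.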

5. MiSeq Data Analysis

STR alleles were retrieved from the MiSeq data following the protocol described by Bornman et al. [6]. Briefly, STR reference sequences were constructed in FASTA format by joining currently known allele sequences to their 5' and 3' flanking sequences. Allele sequences were obtained from STRbase (http://www.cstl.nist.gov/strbase), and flanking sequences 500 bp to 550 bp long were obtained from the human reference genome GRCh37/hg19. Four programs were used for STR profiling from the NGS data: Bowtie 2 [20], SAMtools [21], BEDTools [22], and Microsoft Excel. To detect sequence reads spanning the entire STR region plus 5 bp of the 5' and 3' flanking sequences, a tab-delimited BED file was prepared listing, for each allele, i) the STR locus name, ii) the allele name, and iii) the start and end positions of the repeat region.
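The reference construction above can be sketched as follows: each known allele sequence is concatenated with its 5' and 3' flanks into its own contig, and a BED line records where the repeat lies within that contig. This is a minimal sketch under assumptions: the flank and repeat strings in the example are toy sequences rather than STRbase or GRCh37 data, and the contig and name-column scheme is hypothetical.

```python
def build_allele_reference(locus, allele, flank5, repeat_seq, flank3):
    """Build one FASTA record and one BED line for a single STR allele.

    The contig name "<locus>_<allele>" and the "<locus>|<allele>" BED name
    column are a hypothetical naming scheme. BED coordinates are 0-based,
    half-open, and mark only the repeat region inside the contig.
    """
    contig = f"{locus}_{allele}"
    sequence = flank5 + repeat_seq + flank3
    fasta = f">{contig}\n{sequence}\n"
    repeat_start = len(flank5)
    repeat_end = repeat_start + len(repeat_seq)
    bed = f"{contig}\t{repeat_start}\t{repeat_end}\t{locus}|{allele}"
    return fasta, bed


# Toy example: 10 bp flanks stand in for the 500-550 bp hg19 flanks.
fasta, bed = build_allele_reference(
    "TH01", "6", "ACGTACGTAC", "AATG" * 6, "GGCCGGCCGG")
print(fasta, end="")
print(bed)
```

Giving every allele its own contig means a read's best alignment already identifies its candidate allele, so allele calling reduces to counting reads per contig.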

Sequence reads from the MiSeq data were aligned to the STR reference sequences with Bowtie 2. SAMtools and BEDTools were then used to convert the format of the alignment output. Finally, STR alleles at each locus were determined in Microsoft Excel by counting, from the alignment results and the BED file, the number of reads that span the entire STR region. In this step, a threshold of 20% of the total coverage was used for STR allele calls in single-source samples, and thresholds of 10%, 5%, and 1% were applied to the mixtures according to their mixing ratios. These threshold values were determined empirically by analyzing the coverage of STR alleles at each locus across the entire dataset. To evaluate the STR allele calls, the MiSeq data were also analyzed with the STRait Razor program [17]. Sequence variation within the target STR regions was analyzed by comparing reference and sample sequences in the Integrative Genomics Viewer [23,24], and each variant was then searched in NCBI dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) to check whether it had been reported previously.
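The counting and threshold steps that were carried out in Excel can be expressed compactly. In this sketch, alignments are assumed to have already been reduced (for example with SAMtools/BEDTools) to (contig, start, end) tuples in 0-based, half-open reference coordinates. A read counts for an allele only if it covers the whole repeat plus 5 bp on each side, and an allele is called when its share of the locus total reaches the coverage threshold (0.20 for single-source samples). Reading "20% of the total coverage" as a fraction of the locus total is an interpretation, and all coordinates and counts in the example are hypothetical.

```python
from collections import Counter


def count_spanning_reads(alignments, repeat_regions, pad=5):
    """Count reads that cover an allele's full repeat plus `pad` bp of flank.

    alignments: iterable of (contig, start, end), 0-based half-open coordinates.
    repeat_regions: dict contig -> (repeat_start, repeat_end) from the BED file.
    """
    counts = Counter()
    for contig, start, end in alignments:
        region = repeat_regions.get(contig)
        if region is None:
            continue  # read aligned outside the STR reference set
        repeat_start, repeat_end = region
        if start <= repeat_start - pad and end >= repeat_end + pad:
            counts[contig] += 1
    return counts


def call_alleles(allele_counts, threshold):
    """Return alleles (sorted as strings) whose share of locus coverage >= threshold."""
    total = sum(allele_counts.values())
    if total == 0:
        return []
    return sorted(a for a, n in allele_counts.items() if n / total >= threshold)


# Hypothetical TH01 contigs: allele 6 = 24 bp repeat, allele 9.3 = 39 bp repeat.
regions = {"TH01_6": (500, 524), "TH01_9.3": (500, 539)}
reads = [("TH01_6", 490, 600), ("TH01_6", 498, 600), ("TH01_9.3", 400, 545)]
print(count_spanning_reads(reads, regions))
print(call_alleles({"6": 900, "9.3": 850, "8": 100}, threshold=0.20))
```

The second read is rejected because it starts only 2 bp before the repeat, short of the required 5 bp of flank; this is what keeps truncated reads from being counted as shorter alleles.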

6. CE Profile Data for Comparison

All samples were typed with the AmpFℓSTR Identifiler Kit (Applied Biosystems), run on an ABI 3130xl Genetic Analyzer with Data Collection Software Version 3.0, and analyzed with ABI GeneMapper ID Software Version 3.2. The STR profiles obtained in this way served as reference data for comparison with the STR typing results from the NGS data. For the artificially degraded DNA, 200 pg of template DNA was used and thermal cycling was adjusted to 33 cycles.

B. Results

1. Construction of an In-House Multiplex PCR System Optimized for NGS Analysis

The in-house multiplex PCR system was built for NGS analysis of 18 markers: the 13 CODIS STR loci, four loci used in common commercial forensic kits (D2S1338, D19S433, Penta D, and Penta E), and amelogenin. Amplicons were designed to be 70 bp to 210 bp long (Fig. 4), which is compatible with the read lengths of currently available NGS platforms. To assess primer interference, PCR amplicons were analyzed with dye-labeled primers. On the electropherogram, the final primer set showed balanced amplification efficiency without any primer interference (Fig. 6).

2. Sequencing Data

Table 8 shows the number of aligned reads spanning the entire repeat region of specific STR alleles in each sample. Of all bases, 75.8% had a quality score of at least 30 (Q30). Allele coverage was sorted by the 18 markers and aligned to the reference sequences (Fig. 5). Total coverage per STR locus ranged from a minimum of 29,818 reads at CSF1PO to a maximum of 107,477 reads at D2S1338. Consequently, alleles could be grouped by repeat number, which made it possible to calculate the proportion of stutter sequences, minor PCR products one repeat unit shorter than their parent allele, relative to the parent allele.

Table 8. Number of aligned reads per sample from the MiSeq data (paired-end)

| Sample | Read 1 | Read 2 |
|---|---|---|
| 2800M | 1,046,963 | 1,037,184 |
| 9947A | 1,109,040 | 1,100,166 |
| 1:1 mixture | 1,001,233 | 990,550 |
| 1:3 mixture | 1,008,585 | 1,000,252 |
| 1:6 mixture | 1,045,519 | 1,037,230 |
| 1:9 mixture | 1,006,843 | 997,143 |
| 1:19 mixture | 1,795,434 | 1,776,801 |
| 1:49 mixture | 2,302,059 | 2,285,333 |
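Once reads are grouped by repeat number, the stutter ratio described above is a simple quotient: reads one repeat shorter than a parent allele divided by reads at the parent allele. A minimal sketch, assuming per-repeat-number read counts for one locus. It does not separate stutter from a true allele that sits one repeat below another (for example 2800M at D3S1358, alleles 17 and 18), so such values overstate stutter. The counts are hypothetical.

```python
def stutter_ratios(counts):
    """Map each parent allele to reads(n - 1) / reads(n).

    counts: dict repeat number (float, e.g. 9.3) -> read count at one locus.
    round(..., 1) keeps microvariants stable, e.g. 9.3 -> 8.3.
    """
    ratios = {}
    for allele, parent_reads in counts.items():
        if parent_reads > 0:
            stutter_reads = counts.get(round(allele - 1, 1), 0)
            ratios[allele] = stutter_reads / parent_reads
    return ratios


# Hypothetical single-source locus: parent allele 12 with a 1-repeat stutter at 11.
print(stutter_ratios({12.0: 10000, 11.0: 800}))
```

Every observed allele is reported as a potential parent, so the small ratio returned for the stutter product itself (here allele 11) should simply be ignored.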

3. STR Genotyping of Single-Source and Mixed Samples

Using the NGS data obtained, most STR alleles and amelogenin could be called successfully by applying a 20% coverage threshold to the single-source samples. The STR genotypes of all 18 markers were identical between the CE-based and NGS-based analyses (Table 9).

Table 9. STR genotyping results for the two single-source samples from NGS data

| STR locus | 2800M (CE) | 2800M (NGS) | 9947A (CE) | 9947A (NGS) |
|---|---|---|---|---|
| D2S1338 | 22, 25 | 22, 25 | 19, 23 | 19, 23 |
| D3S1358 | 17, 18 | 17, 18 | 14, 15 | 14, 15 |
| D5S818 | 12 | 12 | 11 | 11 |
| D7S820 | 8, 11 | 8, 11 | 10, 11 | 10, 11 |
| D8S1179 | 14, 15 | 14, 15 | 13 | 13 |
| D13S317 | 9, 11 | 9, 11 | 11 | 11 |
| D16S539 | 9, 13 | 9, 13 | 11, 12 | 11, 12 |
| D18S51 | 16, 18 | 16, 18 | 15, 19 | 15, 19 |
| D19S433 | 13, 14 | 13, 14 | 14, 15 | 14, 15 |
| D21S11 | 29, 31.2 | 29, 31.2 | 30 | 30 |
| CSF1PO | 12 | 12 | 10, 12 | 10, 12 |
| FGA | 20, 23 | 20, 23 | 23, 24 | 23, 24 |
| Penta D | 12, 13 | 12, 13 | 12 | 12 |
| Penta E | 7, 14 | 7, 14 | 12, 13 | 12, 13 |
| TH01 | 6, 9.3 | 6, 9.3 | 8, 9.3 | 8, 9.3 |
| TPOX | 11 | 11 | 8 | 8 |
| vWA | 16, 19 | 16, 19 | 17, 18 | 17, 18 |
| Amelogenin | X, Y | X, Y | X | X |

The STR genotyping results for the mixtures obtained by NGS are shown in Table 10 and are largely consistent with the CE-based profiles. Interestingly, in the 1:1 mixture the alleles that fell below the coverage threshold (10%) and were observed as drop-outs were 9947A alleles. In contrast, in the 1:6 and 1:49 mixtures the alleles below the coverage threshold (#% and #%, respectively) were 2800M alleles. Drop-in alleles were also observed in the data, some of which exceeded the 15% stutter threshold.

Table 10. STR genotyping results for the mixtures from NGS (MiSeq STR data)

| STR locus | 1:1 | 1:3 | 1:6 | 1:9 | 1:19 | 1:49 |
|---|---|---|---|---|---|---|
| D2S1338 | 19, 22, (23)ᵃ, 25 | 19, 22, 23, 25 | 19, 22, 23, 25 | 18ᵇ, 19, 22, 23, 25 | 19, 22, 23, 25 | 18ᵇ, 19, 22, 23, (25)ᵃ |
| D3S1358 | 14, 15, 17, 18 | 14, 15, 17, 18 | 14, 15, 17, 18 | 14, 15, 17, 18 | 14, 15, 17, 18 | 14, 15, 17, (18)ᵃ |
| D5S818 | 11, 12 | 11, 12 | 11, 12 | 11, 12 | 11, 12 | 11, 12 |
| D7S820 | 8, 10, 11 | 8, 10, 11 | 8, 10, 11 | 8, 10, 11 | 8, 10, 11 | 8, 10, 11 |
| D8S1179 | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 | 11ᵇ, 13, 14, 15 |
| D13S317 | 9, 11 | 9, 11 | 9, 11 | 9, 11 | 9, 11 | 9, 11 |
| D16S539 | 9, (11)ᵃ, 12, 13 | 9, 11, 12, 13 | 9, 11, 12, 13 | 9, 11, 12, 13 | 9, 11, 12, 13 | 9, 11, 12, 13 |
| D18S51 | 15, 16, 18, 19 | 15, 16, 18, 19 | 15, 16, 18, 19 | 14ᵇ, 15, 16, 18, 19 | 14ᵇ, 15, 16, 18, 19 | 12ᵇ, 13ᵇ, 14ᵇ, 15, (16)ᵃ, 18, 19 |
| D19S433 | 13, 14, (15)ᵃ | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 | 13, 14, 15 |
| D21S11 | 29, 30, 31.2 | 29, 30, 31.2 | 29, 30, 31.2 | 29, 30, 31.2 | 29, 30, 30.2ᵇ, 31.2 | 29, 29.2ᵇ, 30, 30.2ᵇ, 31.2 |
| CSF1PO | (10)ᵃ, 12 | 10, 12 | 10, 12 | 10, 12 | 10, 12 | 10, 12 |
| FGA | 20, 23, 24 | 20, 23, 24 | (20)ᵃ, 23, 24 | 20, 23, 24 | 20, 23, 24 | (20)ᵃ, 23, 24 |
| Penta D | 12, 13 | 12, 13 | 12, 13 | 12, 13 | 12, 13 | 12, 13 |
| Penta E | 7, (12)ᵃ, 13, 14 | 7, 12, 13, 14 | 7, 12, 13, (14)ᵃ | 7, 12, 13, 14 | 7, 12, 13, 14 | (7)ᵃ, 12, 13, 14 |
| TH01 | 6, (8)ᵃ, 9.3 | 6, 8, 9.3 | 6, 8, 9.3 | 6, 8, 9.3 | 6, 8, 9.3 | 6, 8, 9.3 |
| TPOX | 8, 11 | 8, 11 | 8, 11 | 8, 11 | 8, 11 | 8, 11 |
| vWA | 16, 17, 18, 19 | 16, 17, 18, 19 | 16, 17, 18, 19 | 16, 17, 18, 19 | 16, 17, 18, 19 | 16, 17, 18, (19)ᵃ, 25ᵇ |
| Coverage threshold | 10% | 10% | 10% | 5% | 2.5% | 1% |

a Alleles in parentheses have coverage below the coverage threshold applied to that mixture ratio.

b Alleles marked b (underlined in the original table) are unexpected alleles whose coverage exceeds the stutter threshold.
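The two footnotes describe a coverage threshold and a stutter threshold. A minimal allele-calling sketch can make that logic concrete, but it is not the inventors' pipeline: it assumes the coverage threshold is a fraction of all reads at the locus, and it treats an allele one repeat shorter than a stronger neighbour as stutter when its reads fall below a hypothetical stutter ratio of 0.15. Called alleles are printed plainly and below-threshold alleles in parentheses, mirroring the table notation; the read counts are invented.

```python
from typing import Dict, List, Tuple


def call_alleles(
    reads: Dict[float, int],
    coverage_threshold: float,
    stutter_ratio: float = 0.15,  # hypothetical value, not taken from this document
) -> Tuple[List[float], List[float]]:
    """Return (called, below_threshold) alleles for one locus.

    Assumptions: coverage_threshold is a fraction of all reads at the locus,
    and an allele one repeat shorter than a neighbour is discarded as stutter
    when its reads are below stutter_ratio times the neighbour's reads.
    """
    total = sum(reads.values())
    if total == 0:
        return [], []
    called: List[float] = []
    below: List[float] = []
    for allele in sorted(reads):
        # Reads of the allele one repeat longer; round() makes 30.2 + 1 match 31.2.
        parent_reads = reads.get(round(allele + 1, 1), 0)
        if parent_reads and reads[allele] < stutter_ratio * parent_reads:
            continue  # treated as an n-1 stutter product
        if reads[allele] / total >= coverage_threshold:
            called.append(allele)
        else:
            below.append(allele)
    return called, below


def format_call(called: List[float], below: List[float]) -> str:
    """Render a call as in the table: below-threshold alleles in parentheses."""
    items = sorted([(a, False) for a in called] + [(a, True) for a in below])
    return ", ".join(f"({a:g})" if low else f"{a:g}" for a, low in items)


# Hypothetical read counts for one locus of a two-person mixture.
locus_reads = {15: 500, 16: 20, 17: 60, 18: 480}
print(format_call(*call_alleles(locus_reads, coverage_threshold=0.05)))  # 15, (16), 18
print(format_call(*call_alleles(locus_reads, coverage_threshold=0.01)))  # 15, 16, 18
```

Lowering the threshold moves the weak allele 16 out of parentheses, while allele 17 (60 reads next to 480 reads at 18) is removed as stutter at either threshold.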

4. Observed sequence variations

In the sequencing data, sequence variations were observed within the repeat regions of D2S1338, D3S1358, D8S1179, D21S11, and vWA. The D2S1338 allele sequence is written as [TGCC]a [TTCC]b. In 9947A, a D2S1338 variant was observed in the interconnecting unit 'GTCC' (Fig. 7a); this variant has been reported in NCBI as the single nucleotide polymorphism (SNP) rs9678338 (T>G). The D3S1358 allele sequence is written as TCTA [TCTG]a [TCTA]b, and variation was observed in the second sub-repeat unit 'TCTA'; these variants have been reported as rs77577482 (A>G) and rs71325067 (A>G). The D8S1179 allele sequence is written as [TCTA]a [TCTG]b [TCTA]c, and variation was observed in the interconnecting unit 'TCTG' and in the sub-repeat unit 'TCTA'; these variants have been reported as rs13265375 (G>A) and rs111782616 (A>G), respectively. Interestingly, 9947A was typed as heterozygous at D8S1179 by NGS but as homozygous by CE (Fig. 7c): the two alleles have the same length but different sequences, so NGS could distinguish alleles that CE reports as identical. At the D21S11 locus, three sequence variants previously reported in Danes [11,13] were identified; comparison with the NCBI SNP database matched them to rs13049099 (G>A), rs200026324 (G>A), and rs13050496 (A>G) (Fig. 7d). The vWA allele sequence can be written as TCTA [TCTG]a [TCTA]b TCCA TCTA, and variation was observed in the interconnecting unit 'TCTG'.
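Bracketed repeat structures such as TCTA [TCTG]a [TCTA]b can be derived from a sequenced repeat region by run-length encoding its repeat units. The sketch below assumes the read has already been trimmed to the repeat region, starts on a unit boundary, and contains no partial repeats; the two D3S1358-like alleles are hypothetical. It illustrates the principle behind the 9947A D8S1179 result: two alleles with the same CE length receive different sequence-based descriptions.

```python
from itertools import groupby


def bracket_notation(repeat_region: str, unit_len: int = 4) -> str:
    """Run-length encode a repeat region into bracketed motif notation.

    Assumes the region is already trimmed of flanking sequence, starts on a
    repeat-unit boundary and has a length that is a multiple of unit_len.
    """
    if len(repeat_region) % unit_len:
        raise ValueError("region length is not a multiple of the unit length")
    units = [repeat_region[i:i + unit_len] for i in range(0, len(repeat_region), unit_len)]
    parts = []
    for motif, run in groupby(units):
        count = len(list(run))
        parts.append(motif if count == 1 else f"[{motif}]{count}")
    return " ".join(parts)


# Two hypothetical D3S1358-like alleles of identical length (16 repeat units).
allele_1 = "TCTA" + "TCTG" * 2 + "TCTA" * 13
allele_2 = "TCTA" + "TCTG" * 3 + "TCTA" * 12

for seq in (allele_1, allele_2):
    # Length-based (CE-style) repeat count next to the sequence-based structure.
    print(len(seq) // 4, bracket_notation(seq))
```

Both alleles report 16 repeats by length, yet the notation separates TCTA [TCTG]2 [TCTA]13 from TCTA [TCTG]3 [TCTA]12.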

6. Artificially degraded DNA

We analyzed artificially degraded DNA samples. Non-specific peaks were observed in the Bioanalyzer electropherograms of the libraries, but their effect on the NGS analysis could not be determined (data not shown). The sample profiles were generally concordant between replicate amplifications and with the partial CE profiles obtained with the Identifiler kit (Table 11). However, some loci showed allele drop-out in the CE profiles (D13S317 and D18S51 in AD01; D2S1338 and D7S820 in AD02). Conversely, allele drop-in was observed at some STR loci when a 10% coverage threshold was used to call STR alleles.

STR genotyping results of artificially degraded DNA from NGS data

| STR locus | K562 profile | AD01 CE | AD01 NGS | AD02 CE | AD02 NGS |
|---|---|---|---|---|---|
| D2S1338 | 17 | 17 | 17 | - | 17 |
| D3S1358 | 16 | 16 | 15^b, 16 | 16 | 16 |
| D5S818 | 11, 12 | 11, 12 | 11, 12 | 11 | 11, 12 |
| D7S820 | 9, 11 | 9, 11 | 9, 11 | - | 9, 10^b, 11 |
| D8S1179 | 12 | 12 | 12 | 12 | 11^b, 12 |
| D13S317 | 8 | - | 8 | 8 | 8 |
| D16S539 | 11, 12 | 11, 12 | 11, 12 | 12 | 11, 12 |
| D18S51 | 15, 16 | - | 14^b, 15, 16 | 15, 16 | 14, 15, 16 |
| D19S433 | 14, 14.2 | 14, 14.2 | 14, 14.2 | 14.2 | 14, 14.2 |
| D21S11 | 29, 30, 31 | 31 | 29, (30)^a, 31 | 31 | 29, 30, 31 |
| CSF1PO | 9, 10 | 9 | 9, 10 | 10 | 9, 10 |
| FGA | 21, 24 | 21, 24 | 21, 24 | 21, 24 | 21, 24 |
| Penta D | 9, 13 | - | 9, 13 | - | 9, 13 |
| Penta E | 5, 14 | - | 5, 14 | - | 5, 14 |
| TH01 | 9.3 | 9.3 | 9.3 | 9.3 | 9.3 |
| TPOX | 8, 9 | 8, 9 | 8, 9 | 8, 9 | 8, 9 |
| vWA | 16 | 16 | 16 | 16 | 16, 18^b |
| Amelogenin | X | X | X | X | X |

a Alleles in parentheses have coverage below the 10% coverage threshold.

Based on the reads for each allele, we concluded that the alleles appearing as drop-ins were most likely stutter products of true alleles, because their coverage was only slightly above the stutter analysis threshold. Overall, the replicate NGS analyses, with their high coverage, recovered more complete STR profiles than CE.
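Drop-out and drop-in, as used above, amount to set differences between an observed profile and the reference profile. A minimal sketch, using three loci transcribed from the K562 and AD01 columns of the table (alleles in parentheses would be treated as uncalled); the function name and data layout are illustrative only and are not part of the described method.

```python
from typing import Dict, Set

Profile = Dict[str, Set[str]]


def compare_profiles(reference: Profile, observed: Profile) -> Dict[str, Dict[str, Set[str]]]:
    """List per-locus drop-out (missing) and drop-in (extra) alleles against a reference."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for locus, expected in reference.items():
        seen = observed.get(locus, set())
        drop_out, drop_in = expected - seen, seen - expected
        if drop_out or drop_in:
            report[locus] = {"drop_out": drop_out, "drop_in": drop_in}
    return report


# A few loci transcribed from the table above ("-" entries become empty sets).
k562 = {"D3S1358": {"16"}, "D13S317": {"8"}, "D18S51": {"15", "16"}}
ad01_ce = {"D3S1358": {"16"}, "D13S317": set(), "D18S51": set()}
ad01_ngs = {"D3S1358": {"15", "16"}, "D13S317": {"8"}, "D18S51": {"14", "15", "16"}}

print(compare_profiles(k562, ad01_ce))   # CE: drop-out at D13S317 and D18S51
print(compare_profiles(k562, ad01_ngs))  # NGS: drop-in at D3S1358 and D18S51
```

The drop-ins reported for NGS (15 at D3S1358, 14 at D18S51) are exactly the b-marked alleles that the text attributes to stutter.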

C. Discussion

Using an in-house multiplex PCR system, we demonstrated that NGS data sets can be generated, STR alleles determined, and sequence variations within STR regions detected. The system was designed for NGS analysis of forensic STR loci and uses unlabeled primers to produce small amplicons within a narrow size range of 70 to 210 bp. Because the MiSeq platform produces reads averaging 200 bp, the system yields high-quality STR NGS data, and it can also be applied to forensic samples. The panel of STR loci covered by our multiplex PCR system is substantial, considering that it yields consistent PCR products balanced across STR loci from 34 or more primers without primer interference.
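The link between amplicon size and read length is simple arithmetic: an allele can be sized from a single read only when the whole amplicon, primers and flanks included, fits inside that read. The sketch below assumes a hypothetical 80 bp of combined primer and flanking sequence; the real values differ per locus and are not given in this section.

```python
def amplicon_length(n_repeats: int, unit_len: int, flank_bp: int) -> int:
    """Amplicon size = non-repeat flanks (including primers) + repeat region."""
    return flank_bp + n_repeats * unit_len


def max_repeats_in_read(read_len: int, unit_len: int, flank_bp: int) -> int:
    """Largest repeat count whose complete amplicon still fits in one read."""
    return max(0, (read_len - flank_bp) // unit_len)


FLANK_BP = 80  # hypothetical combined primer + flanking length
for name, unit in (("tetranucleotide STR", 4), ("pentanucleotide STR", 5)):
    print(name, max_repeats_in_read(200, unit, FLANK_BP))
```

With these assumptions a 200 bp read spans tetranucleotide alleles up to 30 repeats and pentanucleotide alleles (such as the Penta loci) up to 24, which is why keeping flanks short matters.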

In this study, the NGS genotypes fully matched the CE profiles for single-source samples and for mixtures of two control DNA samples (1:1, 1:3, 1:6, 1:9, 1:19, and 1:49). In the CE profiles of the mixtures, the electropherograms could not distinguish PCR product peaks from noise, whereas the NGS data showed that true alleles could be identified from their read counts. Despite variation in PCR efficiency within samples, we obtained accurate genotypes at nearly half of the STR loci (D5S818, D7S820, D13S317, D16S539, D19S433, CSF1PO, Penta D, TH01, and TPOX) even in the 1:49 mixture. However, the mixture analysis was performed with only two donors using control DNA rather than casework samples, and we did not establish a lower detection limit for NGS beyond the 1:49 mixture ratio.
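Why read counts become harder to separate from noise as a mixture becomes more unbalanced can be seen from idealised arithmetic. Under the simplifying assumptions in the docstring (both donors heterozygous, no shared alleles, equal amplification, no stutter), which real samples violate as the PCR-efficiency caveat above notes, a minor-contributor allele in a 1:49 mixture carries only 1% of the locus reads.

```python
from fractions import Fraction


def expected_minor_signal(n: int):
    """Idealised minor-allele signal in a 1:n two-person mixture.

    Assumes both donors are heterozygous with no shared alleles, equal
    amplification at every allele and no stutter. Returns the minor allele's
    share of all reads at the locus and its ratio to one major allele.
    """
    minor_allele = Fraction(1, n + 1) / 2  # minor donor's DNA split over 2 alleles
    major_allele = Fraction(n, n + 1) / 2  # major donor's DNA split over 2 alleles
    return minor_allele, minor_allele / major_allele


for n in (1, 3, 6, 9, 19, 49):
    share, ratio = expected_minor_signal(n)
    print(f"1:{n}  share of locus reads {float(share):.2%}  ratio to major allele {float(ratio):.2%}")
```

Exact fractions avoid rounding noise; whether a given coverage threshold is compared against the share of locus reads or against the major allele is an assumption the reader should check against the described method.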

We identified sequence variations within STR regions (D2S1338, D3S1358, D8S1179, D21S11, and vWA), all of which have been reported in the SNP database. These variations make it possible to distinguish individuals carrying alleles of the same length, which means that the power of discrimination (PD) increases. In practice, in the mixture analysis of the two control DNAs, alleles of the same length could be told apart when they carried different sequence variants; within a mixture, therefore, alleles can be distinguished by the observed sequence variations.
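The claim that same-length sequence variants raise discrimination power can be checked with the standard definition PD = 1 - Σ P(genotype)², computed here under Hardy-Weinberg equilibrium. The allele frequencies are invented for illustration and do not come from this study.

```python
from itertools import combinations_with_replacement
from typing import Dict


def power_of_discrimination(freqs: Dict[str, float]) -> float:
    """PD = 1 - sum of squared genotype frequencies under Hardy-Weinberg equilibrium."""
    alleles = sorted(freqs)
    sum_sq = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        genotype = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        sum_sq += genotype ** 2
    return 1.0 - sum_sq


# Hypothetical allele frequencies: length-based calls versus the same locus
# with allele 16 split into two sequence variants of equal length.
length_based = {"15": 0.3, "16": 0.4, "17": 0.3}
sequence_based = {"15": 0.3, "16a": 0.25, "16b": 0.15, "17": 0.3}

print(round(power_of_discrimination(length_based), 4))    # 0.8106
print(round(power_of_discrimination(sequence_based), 4))  # 0.8802
```

Splitting one length allele into sequence variants can only refine the set of distinguishable genotypes, so PD never decreases; here it rises from about 0.81 to 0.88.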

To compare the sequence variation data obtained here, we searched previously published data. No nomenclature for sequence variation within STR regions has yet been established, a problem pointed out in one study [13]. A new nomenclature for sequenced STR profiles is needed to keep STR data generated by NGS consistent.

We also analyzed artificially degraded DNA as a model of forensic casework samples. Compared with the CE profiles of the artificially degraded DNA, the NGS data were more informative: because the amplification strategy targets small products, STR profiles were obtained successfully. However, we could not estimate the condition of genuine casework samples. If casework samples such as skeletal remains and soil-exposed samples are analyzed in future, this method may yield more useful information from forensically challenging casework samples and demonstrate its usefulness.
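Why a small-amplicon strategy helps with degraded DNA can be illustrated with a simple random-breakage model, a simplifying assumption introduced here and not part of the source: if each internal bond of a template breaks independently with probability p, an amplicon of L bp survives intact with probability (1 - p)^(L - 1). The break probability of 0.01 below is arbitrary.

```python
def intact_fraction(amplicon_bp: int, break_prob: float) -> float:
    """Fraction of templates with no break inside an amplicon of the given size.

    Assumes breaks occur independently at each of the (amplicon_bp - 1)
    internal bonds with probability break_prob.
    """
    return (1.0 - break_prob) ** (amplicon_bp - 1)


for size in (100, 200, 300, 400):
    print(size, f"{intact_fraction(size, 0.01):.1%}")
```

Under this model the amplifiable template falls off exponentially with amplicon length, which is consistent with shorter NGS amplicons recovering alleles that drop out of longer CE amplicons.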

Forensic researchers have also sought to analyze base composition by mass spectrometry-based sequencing, and such studies of forensic STRs have already been reported [25-28]. These studies also observed sequence variation within repeat regions, but could not determine where in the sequence the variants lay, so Sanger sequencing had to be performed as an additional step. For this reason, NGS has attracted increasing attention in forensics. NGS adds a tool alongside CE-based methods and is being studied more intensively; besides autosomal forensic STRs, mitochondrial DNA, Y-STRs, and X-STRs can also be analyzed on NGS platforms. This study therefore provides a reference for designing multiplex PCR systems optimized for NGS read length. We sequenced two control DNAs as single-source samples. In the earlier mass spectrometry studies of STRs, the additional information served to reveal further allelic variants; likewise, the variants observed here indicate increased discrimination power and usefulness.
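The limitation described above, that mass spectrometry detected variation without locating it, follows from the fact that a mass-based readout reflects base composition rather than base order. The toy comparison below uses hypothetical sequences and only the Python standard library.

```python
from collections import Counter

reference = "TCTA" * 3          # hypothetical three-unit repeat
variant = "TCTA" * 2 + "TCTG"   # same length, one A>G change in the last unit
isomer = "TCTG" + "TCTA" * 2    # the same change placed in the first unit


def composition(seq: str) -> Counter:
    """Base counts, the information a composition-only readout reflects."""
    return Counter(seq)


# Composition shows that a variant exists...
print(composition(variant) == composition(reference))  # False
# ...but not where it is: both placements give identical compositions.
print(composition(variant) == composition(isomer))     # True
# A sequence readout locates the change directly.
print([i for i, (r, v) in enumerate(zip(reference, variant)) if r != v])  # [11]
```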

Many researchers describe NGS workflows as simpler than before. However, NGS analysis of STRs still requires many steps, from amplification of 1 ng of sample DNA through sequencing. In our experience, library preparation with the TruSeq system on the MiSeq platform is more laborious and time-consuming than the CE method, and only a small number of libraries can be prepared at one time. If NGS is to be adopted as an additional powerful tool in public forensic laboratories, the experimental workflow needs to be simplified.

Although we used only the MiSeq platform, the in-house multiplex PCR system is expected to provide good-quality data on other NGS platforms as well, so its use should not be limited to a specific NGS platform. Further experiments, including sensitivity tests, on various NGS platforms are needed; if such experiments complement this STR NGS study, the approach could prove useful in forensic genetics.

In addition, NGS analysis of STRs with the multiplex PCR system developed in this study appears applicable not only to single-source samples but also to mixed samples. If sequence variations within STR regions are characterized in a larger number of samples and exploited, mixture analysis is expected to become easier.

D. References

[1] R. Thompson, S. Zoppis, B. McCord, An overview of DNA typing methods for human identification: past, present, and future, Methods Mol Biol 830 (2012) 3-16.

[2] M. Kayser, P. de Knijff, Improving human forensics through advances in genetics, genomics and molecular biology, Nat Rev Genet 12 (2011) 179-192.

[3] J.M. Butler, E. Buel, F. Crivellente, B.R. McCord, Forensic DNA typing by capillary electrophoresis using the ABI Prism 310 and 3100 genetic analyzers for STR analysis, Electrophoresis 25 (2004) 1397-1412.

[4] C. Phillips, M. Gelabert-Besada, L. Fernandez-Formoso, M. García-Magariños, C. Santos, M. Fondevila, D. Ballard, D. Syndercombe Court, A. Carracedo, M. Victoria Lareu, “New turns from old STaRs”: Enhancing the capabilities of forensic short tandem repeat analysis, Electrophoresis (2014) doi:10.1002/elps.201400095.

[5] C. Van Neste, F. Van Nieuwerburgh, D. Van Hoofstat, D. Deforce, Forensic STR analysis using massive parallel sequencing, Forensic Sci. Int. Genet. 6 (2012) 810-818.

[6] D.M. Bornman, M.E. Hester, J.M. Schuetter, M.D. Kasoji, A. Minard-Smith, C.A. Barden, S.C. Nelson, G.D. Godbold, C.H. Baker, B. Yang, J.E. Walther, I.E. Tornes, P.S. Yan, B. Rodriguez, R. Bundschuh, M.L. Dickens, B.A. Young, S.A. Faith, Short-read, high-throughput sequencing technology for STR genotyping, Biotechniques (2012) 1-6, doi:10.2144/000113857.

[7] E.C. Berglund, A. Kiialainen, A.C. Syvänen, Next-generation sequencing technologies and applications for human genetic history and forensics, Investig Genet 2 (2011) 23.

[8] T.C. Glenn, Field guide to next-generation DNA sequencers, Mol Ecol Resour 11 (2011) 759-769.

[9] L. Liu, Y. Li, S. Li, N. Hu, Y. He, R. Pong, D. Lin, L. Lu, M. Law, Comparison of next-generation sequencing systems, J Biomed Biotechnol (2012) doi:10.1155/2012/251364.

[10] S. Dalsgaard, E. Rockenbauer, C. Gelardi, C. Børsting, S.L. Fordyce, N. Morling, Characterization of mutations and sequence variations in complex STR loci by second generation sequencing, Forensic Sci. Int. Genet. Suppl. Series 4 (2013) e218-219.

[11] E. Rockenbauer, S. Hansen, M. Mikkelsen, C. Børsting, N. Morling, Characterization of mutations and sequence variants in the D21S11 locus by next generation sequencing, Forensic Sci. Int. Genet. 8 (2014) 68-72.

[12] S.L. Fordyce, M.C. Avila-Arcos, E. Rockenbauer, C. Børsting, R. Frank-Hansen, F.T. Petersen, E. Willerslev, A.J. Hansen, N. Morling, M.T. Gilbert, High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform, Biotechniques 51 (2011) 127-133.

[13] C. Gelardi, E. Rockenbauer, S. Dalsgaard, C. Børsting, N. Morling, Second generation sequencing of three complex STRs D3S1358, D21S11 and D12S391 in Danes and a proposal for nomenclature of sequenced STR alleles, Forensic Sci. Int. Genet. 12 (2014) 38-41.

[14] M. Schieble, O. Loreille, R. Just, J. Irwin, Short tandem repeat typing on the 454 platform: Strategies and considerations for targeted sequencing of common forensic markers, Forensic Sci. Int. Genet. 12 (2014) 107-109.

[15] C. Van Neste, M. Vandewoestyne, W. Van Criekinge, D. Deforce, F. Van Nieuwerburgh, My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing, Forensic Sci. Int. Genet. 9 (2014) 1-8.

[16] S.L. Fordyce, H.S. Mogensen, C. Børsting, R.E. Lagacé, C.W. Chang, N. Rajagopalan, N. Morling, Second-generation sequencing of forensic STRs using the Ion Torrent™ HID STR 10-plex and the Ion PGM™, Forensic Sci. Int. Genet. (2014) doi:10.1016/j.fsigen.2014.09.020.

[17] D.H. Warshauer, D. Lin, K. Hari, R. Jain, C. Davis, B. LaRue, J.L. King, B. Budowle, STRait Razor: A length-based forensic STR allele-calling tool for use with second generation sequencing data, Forensic Sci. Int. Genet. 7 (2013) 409-417.

[18] M. Gymrek, D. Golan, S. Rosset, Y. Erlich, lobSTR: A short tandem repeat profiler for personal genomes, Genome Res. 22 (2012) 1154-1162.

[19] I.F. Bronner, M.A. Quail, D.J. Turner, H. Swerdlow, Improved protocols for Illumina sequencing, Curr Protoc Hum Genet (2009) doi:10.1002/0471142905.hg1802s62.

[20] B. Langmead, S.L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nat Methods 9 (2012) 357-359.

[21] H. Li, B. Handsaker, A. Wysoker, et al., 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools, Bioinformatics 25 (2009) 2078-2079.

[22] A.R. Quinlan, I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics 26 (2010) 841-842.

[23] H. Thorvaldsdóttir, J.T. Robinson, J.P. Mesirov, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration, Brief Bioinform 14 (2013) 178-192.

[24] J.T. Robinson, H. Thorvaldsdóttir, W. Winckler, M. Guttman, E.S. Lander, G. Getz, J.P. Mesirov, Integrative Genomics Viewer, Nature Biotechnology 29 (2011) 24-26.

[25] J.V. Planz, K.A. Sannes-Lowery, D.D. Duncan, S. Manalili, B. Budowle, R. Chakraborty, S.A. Hofstadler, T.A. Hall, Automated analysis of sequence polymorphism in STR alleles by PCR and direct electrospray ionization mass spectrometry, Forensic Sci. Int. Genet. 6 (2012) 594-606.

[26] H. Oberacher, F. Pitterl, G. Huber, H. Niederstätter, M. Steinlechner, W. Parson, Increased forensic efficiency of DNA fingerprints through simultaneous resolution of length and nucleotide variability by high-performance mass spectrometry, Hum Mutat 29 (2008) 427-432.

[27] F. Pitterl, H. Niederstätter, G. Huber, B. Zimmermann, H. Oberacher, W. Parson, The next generation of DNA profiling: STR typing by multiplexed PCR-ion-pair RP LC-ESI time-of-flight MS, Electrophoresis 29 (2008) 4739-4750.

[28] F. Pitterl, K. Schmidt, G. Huber, B. Zimmermann, R. Delport, S. Amory, B. Ludes, H. Oberacher, W. Parson, Increasing the discrimination power of forensic STR testing by employing high-performance mass spectrometry, as illustrated in indigenous South African and Central Asian populations, Int J Legal Med 124 (2010) 551-558.

[29] J.M. Butler, Advanced topics in forensic DNA typing: methodology, Elsevier Academic Press, New York, 2011.

While specific aspects of the present invention have been described in detail above, it will be apparent to those of ordinary skill in the art that these specific descriptions are merely preferred embodiments and do not limit the scope of the present invention. The substantial scope of the present invention is therefore defined by the appended claims and their equivalents.

<110> Republic Of Korea (Supreme Public Prosecutor's Office)
<120> Method for Autosomal Analysing Human Subject of Analytes based on a Next Generation Sequencing Technology
<130> PN140529
<160> 68
<170> KopatentIn 2.0
<210> 1 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> D3S1358 forward primer <400> 1 actgcagtcc aatctgggt 19
<210> 2 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> D3S1358 reverse primer <400> 2 atgaaatcaa cagaggcttg c 21
<210> 3 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> TH01 forward primer <400> 3 gtgattccca ttggcctgtt c 21
<210> 4 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> TH01 reverse primer <400> 4 attcctgtgg gctgaaaagc tc 22
<210> 5 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> D21S11 forward primer <400> 5 atatgtgagt caattcccca ag 22
<210> 6 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D21S11 reverse primer <400> 6 tgtattagtc aatgttctcc agagac 26
<210> 7 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> D18S51 forward primer <400> 7 ttcttgagcc cagaaggtta 20
<210> 8 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> D18S51 reverse primer <400> 8 attctaccag caacaacaca aataaac 27
<210> 9 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> Penta E forward primer <400> 9 attaccaaca tgaaagggta ccaata 26
<210> 10 <211> 33 <212> DNA <213> Artificial Sequence <220> <223> Penta E reverse primer <400> 10 tgggttatta attgagaaaa ctccttacaa ttt 33
<210> 11 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> D5S818 forward primer <400> 11 ggtgattttc ctctttggta tcc 23
<210> 12 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D5S818 reverse primer <400> 12 agccacagtt tacaacattt gtatct 26
<210> 13 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D13S317 forward primer <400> 13 attacagaag tctgggatgt ggagga 26
<210> 14 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> D13S317 reverse primer <400> 14 ggcagcccaa aaagacaga 19
<210> 15 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> D7S820 forward primer <400> 15 atgttggtca ggctgactat g 21
<210> 16 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> D7S820 reverse primer <400> 16 gattccacat ttatcctcat tgac 24
<210> 17 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> D16S539 forward primer <400> 17 gggggtctaa gagcttgtaa aaag 24
<210> 18 <211> 29 <212> DNA <213> Artificial Sequence <220> <223> D16S539 reverse primer <400> 18 gtttgtgtgt gcatctgtaa gcatgtatc 29
<210> 19 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> CSF1PO forward primer <400> 19 ccggaggtaa aggtgtctta aagt 24
<210> 20 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> CSF1PO reverse primer <400> 20 atttcctgtg tcagaccctg tt 22
<210> 21 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> Penta D forward primer <400> 21 gaaggtcgaa gctgaagtg 19
<210> 22 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> Penta D reverse primer <400> 22 attagaattc tttaatctgg acacaag 27
<210> 23 <211> 33 <212> DNA <213> Artificial Sequence <220> <223> vWA forward primer <400> 23 gccctagtgg atgataagaa taatcagtat gtg 33
<210> 24 <211> 30 <212> DNA <213> Artificial Sequence <220> <223> vWA reverse primer <400> 24 ggacagatga taaatacata ggatggatgg 30
<210> 25 <211> 32 <212> DNA <213> Artificial Sequence <220> <223> D8S1179 forward primer <400> 25 attgcaactt atatgtattt ttgtatttca tg 32
<210> 26 <211> 28 <212> DNA <213> Artificial Sequence <220> <223> D8S1179 reverse primer <400> 26 accaaattgt gttcatgagt atagtttc 28
<210> 27 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> TPOX forward primer <400> 27 gcacagaaca ggcacttagg 20
<210> 28 <211> 18 <212> DNA <213> Artificial Sequence <220> <223> TPOX reverse primer <400> 28 cgctcaaacg tgaggttg 18
<210> 29 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> FGA forward primer <400> 29 ggctgcaggg cataacatta 20
<210> 30 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> FGA reverse primer <400> 30 attctatgac tttgcgcttc agga 24
<210> 31 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> Amelogenin forward primer <400> 31 ccctgggctc tgtaaagaa 19
<210> 32 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> Amelogenin reverse primer <400> 32 atcagagctt aaactgggaa gctg 24
<210> 33 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> D19S433 forward primer <400> 33 gcaaaaagct ataattgtac cac 23
<210> 34 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D19S433 reverse primer <400> 34 aaaaatcttc tctctttctt cctctc 26
<210> 35 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> D5S818 forward primer <400> 35 tgattttcct ctttggtatc ctt 23
<210> 36 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> D5S818 reverse primer <400> 36 caacatttgt atctttatct gtatcct 27
<210> 37 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> Penta E forward primer <400> 37 ggcgactgag caagactca 19
<210> 38 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> Penta E reverse primer <400> 38 tgggttatta attgagaaaa ctcctt 26
<210> 39 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> CSF1PO forward primer <400> 39 actgccttca tagatagaag at 22
<210> 40 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> CSF1PO reverse primer <400> 40 gaccctgttc taagtacttc ct 22
<210> 41 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> D7S820 forward primer <400> 41 tgatagaaca cttgtcatag tttagaa 27
<210> 42 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> D7S820 reverse primer <400> 42 ctcattgaca gaattgcacc a 21
<210> 43 <211> 27 <212> DNA <213> Artificial Sequence <220> <223> D18S51 forward primer <400> 43 gttgctacta tttcttttct ttttctc 27
<210> 44 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> D18S51 reverse primer <400> 44 ctgagtgaca aattgagacc ttg 23
<210> 45 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> TPOX forward primer <400> 45 cagaacaggc acttagggaa c 21
<210> 46 <211> 21 <212> DNA <213> Artificial Sequence <220> <223> TPOX reverse primer <400> 46 tccttgtcag cgtttatttg c 21
<210> 47 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> D16S539 forward primer <400> 47 aatacagaca gacagacagg tg 22
<210> 48 <211> 25 <212> DNA <213> Artificial Sequence <220> <223> D16S539 reverse primer <400> 48 agcatgtatc tatcatccat ctctg 25
<210> 49 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D8S1179 forward primer <400> 49 tttttgtatt tcatgtgtac attcgt 26
<210> 50 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> D8S1179 reverse primer <400> 50 gtagattatt ttcactgtgg ggaa 24
<210> 51 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> Amelogenin forward primer <400> 51 cctttgaagt ggtaccagag cat 23
<210> 52 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> Amelogenin reverse primer <400> 52 gcatgcctaa tattttcagg gaataa 26
<210> 53 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> FGA forward primer <400> 53 aaataaaatt aggcatattt acaagc 26
<210> 54 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> FGA reverse primer <400> 54 gccagcaaaa aagaaaggaa 20
<210> 55 <211> 24 <212> DNA <213> Artificial Sequence <220> <223> D13S317 forward primer <400> 55 tctaacgcct atctgtattt acaa 24
<210> 56 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D13S317 reverse primer <400> 56 agacagaaag atagatagat gattga 26
<210> 57 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> D2S1338 forward primer <400> 57 tggaaacaga aatggcttgg 20
<210> 58 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D2S1338 reverse primer <400> 58 agttattcag taagttaaag gattgc 26
<210> 59 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> D21S11 forward primer <400> 59 aattccccaa gtgaattgcc 20
<210> 60 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> D21S11 reverse primer <400> 60 ggtagataga ctggatagat agacga 26
<210> 61 <211> 22 <212> DNA <213> Artificial Sequence <220> <223> Penta D forward primer <400> 61 gcaagacacc atctcaagaa ag 22
<210> 62 <211> 23 <212> DNA <213> Artificial Sequence
<220> <223> Penta D reverse primer <400> 62 tggtcataac gatttttttg aga 23 <210> 63 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> D3S1358 forward primer <400> 63 cagtccaatc tgggtgacag 20 <210> 64 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> D3S1358 reverse primer <400> 64 atcaacagag gcttgcatgt 20 <210> 65 <211> 19 <212> DNA <213> Artificial Sequence <220> <223> TH01 forward primer <400> 65 gattcccatt ggcctgttc 19 <210> 66 <211> 20 <212> DNA <213> Artificial Sequence <220> <223> TH01 reverse primer <400> 66 caggtcacag ggaacacaga 20 <210> 67 <211> 26 <212> DNA <213> Artificial Sequence <220> <223> vWA forward primer <400> 67 gaataatcag tatgtgactt ggattg 26 <210> 68 <211> 23 <212> DNA <213> Artificial Sequence <220> <223> vWA reverse primer <400> 68 tgataaatac ataggatgga tgg 23 <110> Republic Of Korea (Supreme Public Prosecutor's Office) <120> Method for Autosomal Analysis Human Subject of Analytes based on a Next Generation Sequencing Technology <130> PN140529 <160> 68 <170> Kopatentin 2.0
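The listing follows the WIPO ST.25 numeric-identifier layout:

- `<210>` is the SEQ ID NO.
- `<211>` is the length.
- `<212>` is the molecule type.
- `<213>` is the organism.
- `<223>` is a free-text description.
- `<400>` is the sequence.

Below is a minimal Python sketch that splits the flattened one-line text into records and checks each stated length against the sequence itself. It assumes the exact spacing seen here. The names `parse_listing` and `length_mismatches` are hypothetical helpers, not part of the patent.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SeqEntry:
    seq_id: int       # <210> SEQ ID NO
    length: int       # <211> stated length in bases
    description: str  # <223> free text, e.g. "TH01 forward primer"
    sequence: str     # <400> sequence with the 10-base spacing removed


# One flattened entry: <210> n <211> len ... <223> desc <400> n seq len
_ENTRY = re.compile(
    r"<210>\s*(\d+)\s*<211>\s*(\d+).*?"
    r"<223>\s*(.*?)\s*<400>\s*\d+\s+([a-z][a-z ]*?)\s+\d+",
    re.S,
)


def parse_listing(text: str) -> List[SeqEntry]:
    """Split a flattened ST.25-style listing into entries (hypothetical helper)."""
    entries = []
    for m in _ENTRY.finditer(text):
        entries.append(
            SeqEntry(
                seq_id=int(m.group(1)),
                length=int(m.group(2)),
                description=m.group(3),
                sequence=m.group(4).replace(" ", "").upper(),
            )
        )
    return entries


def length_mismatches(entries: List[SeqEntry]) -> List[int]:
    """SEQ ID NOs whose <211> length disagrees with the sequence itself."""
    return [e.seq_id for e in entries if len(e.sequence) != e.length]


if __name__ == "__main__":
    sample = (
        "<210> 3 <211> 21 <212> DNA <213> Artificial Sequence <220> "
        "<223> TH01 forward primer <400> 3 gtgattccca ttggcctgtt c 21 "
        "<210> 4 <211> 22 <212> DNA <213> Artificial Sequence <220> "
        "<223> TH01 reverse primer <400> 4 attcctgtgg gctgaaaagc tc 22"
    )
    parsed = parse_listing(sample)
    for e in parsed:
        print(e.seq_id, e.description, e.sequence, e.length)
    print("length mismatches:", length_mismatches(parsed))
```

A length check like this is a cheap way to catch transcription errors when primer sequences are copied out of a listing.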

Claims

A next generation sequencing (NGS)-based method for autosomal analysis of a human subject, comprising the steps of:
(a) performing multiplex amplification of a human DNA sample using primers that complementarily bind, respectively, to the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, vWA (von Willebrand factor A), D8S1179, TPOX (human thyroid peroxidase gene), FGA (human fibrinogen alpha chain) and amelogenin loci, wherein:
- the primers that complementarily bind to the D3S1358 locus are SEQ ID NOs: 1 and 2;
- the primers for the TH01 locus are SEQ ID NOs: 3 and 4;
- the primers for the D21S11 locus are SEQ ID NOs: 5 and 6;
- the primers for the D18S51 locus are SEQ ID NOs: 7 and 8;
- the primers for the PentaE locus are SEQ ID NOs: 9 and 10;
- the primers for the D5S818 locus are SEQ ID NOs: 11 and 12;
- the primers for the D13S317 locus are SEQ ID NOs: 13 and 14;
- the primers for the D7S820 locus are SEQ ID NOs: 15 and 16;
- the primers for the D16S539 locus are SEQ ID NOs: 17 and 18;
- the primers for the CSF1PO locus are SEQ ID NOs: 19 and 20;
- the primers for the PentaD locus are SEQ ID NOs: 21 and 22;
- the primers for the vWA locus are SEQ ID NOs: 23 and 24;
- the primers for the D8S1179 locus are SEQ ID NOs: 25 and 26;
- the primers for the TPOX locus are SEQ ID NOs: 27 and 28;
- the primers for the FGA locus are SEQ ID NOs: 29 and 30; and
- the primers for the amelogenin locus are SEQ ID NOs: 31 and 32; and
(b) determining the short tandem repeat (STR) allele at each locus from NGS data of the multiplex amplification product of step (a), and genetically identifying the human subject.
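Step (b) ends in identification. In practice that means comparing the called STR profile with a reference profile, locus by locus.

Below is a minimal Python sketch. It assumes each profile is a mapping from locus name to the set of allele labels called there, with a homozygote stored as a one-element set. The names `compare_profiles` and `is_concordant` are illustrative assumptions, not part of the claims. So is the default 10-locus threshold. The sketch only checks concordance; it does not compute a random match probability.

```python
from typing import Dict, FrozenSet, List, Tuple

# Locus name -> set of allele labels called at that locus (hypothetical layout).
Profile = Dict[str, FrozenSet[str]]


def compare_profiles(query: Profile, reference: Profile) -> Tuple[List[str], List[str]]:
    """Return (loci typed in both profiles, loci whose genotypes differ)."""
    compared = sorted(set(query) & set(reference))
    differing = [locus for locus in compared if query[locus] != reference[locus]]
    return compared, differing


def is_concordant(query: Profile, reference: Profile, min_loci: int = 10) -> bool:
    """True if enough loci were compared and none differ (no match statistics)."""
    compared, differing = compare_profiles(query, reference)
    return len(compared) >= min_loci and not differing
```

For example, two profiles typed at TH01, vWA and amelogenin with identical genotypes compare as concordant only if `min_loci` is lowered to 3.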

The method of claim 1, wherein the DNA sample is DNA isolated from a tissue selected from the group consisting of blood, semen, vaginal cells, hair, saliva, urine, oral cells, placental cells, and amniotic fluid containing fetal cells.

The method according to claim 1, wherein the DNA sample is a biological sample containing DNA.

The method according to claim 1, wherein the multiplex gene amplification is a Polymerase Chain Reaction (PCR) amplification or a direct multiplex PCR amplification.

The method according to claim 4, wherein the multiplex amplification is performed for 32-36 cycles.

delete

The method according to claim 1, wherein the primers of step (a) are used at the following final concentrations:
- D3S1358, 0.01-0.5 μM;
- TH01, 0.01-0.4 μM;
- D21S11, 0.4-0.8 μM;
- D18S51, 0.3-0.7 μM;
- PentaE, 1-1.4 μM;
- D5S818, 0.01-0.5 μM;
- D13S317, 0.2-0.6 μM;
- D7S820, 0.1-0.5 μM;
- D16S539, 0.2-0.6 μM;
- CSF1PO, 0.1-0.5 μM;
- PentaD, 1-1.4 μM;
- vWA, 0.01-0.4 μM;
- D8S1179, 0.3-0.7 μM;
- TPOX, 0.01-0.4 μM;
- FGA, 0.4-0.8 μM; and
- amelogenin, 0.05-0.5 μM.
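The concentration ranges above are easier to check when written as data.

Below is a minimal Python sketch with two parts:

- It encodes the claimed final-concentration ranges for the first primer panel and flags loci that are missing or out of range in a planned mix.
- It applies the standard C1V1 = C2V2 dilution arithmetic for one primer.

The function names are hypothetical. The 10 uM stock and 25 uL reaction volume in the usage example are assumptions, not values taken from the patent.

```python
from typing import Dict, List, Tuple

# Claimed final concentration range (uM) for each primer pair in the first panel.
CLAIMED_RANGES_UM: Dict[str, Tuple[float, float]] = {
    "D3S1358": (0.01, 0.5), "TH01": (0.01, 0.4), "D21S11": (0.4, 0.8),
    "D18S51": (0.3, 0.7), "PentaE": (1.0, 1.4), "D5S818": (0.01, 0.5),
    "D13S317": (0.2, 0.6), "D7S820": (0.1, 0.5), "D16S539": (0.2, 0.6),
    "CSF1PO": (0.1, 0.5), "PentaD": (1.0, 1.4), "vWA": (0.01, 0.4),
    "D8S1179": (0.3, 0.7), "TPOX": (0.01, 0.4), "FGA": (0.4, 0.8),
    "amelogenin": (0.05, 0.5),
}


def out_of_range(mix_um: Dict[str, float]) -> List[str]:
    """Loci whose planned final concentration is missing or outside the claimed range."""
    return [
        locus
        for locus, (low, high) in CLAIMED_RANGES_UM.items()
        if locus not in mix_um or not low <= mix_um[locus] <= high
    ]


def stock_volume_ul(final_um: float, stock_um: float, reaction_ul: float) -> float:
    """Volume of primer stock to add: C1*V1 = C2*V2, so V1 = C2*V2/C1."""
    return final_um * reaction_ul / stock_um
```

Under those assumed values, `stock_volume_ul(0.5, 10.0, 25.0)` gives 1.25 uL of stock for a 0.5 uM final concentration.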

The method according to claim 1, wherein step (b) comprises comparing the nucleotide sequence of each STR allele amplified in the multiplex amplification product with a reference sequence to determine the STR allele.

The method according to claim 8, wherein the reference sequence comprises the sequence of the STR region of the allele, the 5' flanking sequence of the STR region, and the 3' flanking sequence of the STR region.
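The two claims above describe allele determination by comparing each sequence with a reference made of the STR region plus its 5' and 3' flanking sequences.

Below is a minimal Python sketch of that idea, with three stated assumptions:

- The flanks match exactly.
- The region between them consists only of the core motif plus any partial repeat.
- A table of known allele sequences may be supplied.

The function names and the flank strings in the test are hypothetical placeholders, not the patent's reference sequences. When no known sequence matches, the sketch falls back to a length-based designation: complete repeats plus leftover bases, for example 9.3.

```python
from typing import Dict, Optional


def extract_repeat_region(read: str, flank5: str, flank3: str) -> Optional[str]:
    """Return the bases between the 5' and 3' flanks, or None if either flank is missing."""
    start = read.find(flank5)
    if start < 0:
        return None
    region_start = start + len(flank5)
    end = read.find(flank3, region_start)
    if end < 0:
        return None
    return read[region_start:end]


def length_based_allele(region: str, motif_len: int) -> str:
    """Designate an allele as complete repeats plus leftover bases, e.g. '9.3'."""
    full, partial = divmod(len(region), motif_len)
    return f"{full}.{partial}" if partial else str(full)


def call_allele(read: str, flank5: str, flank3: str, motif_len: int,
                known: Dict[str, str]) -> Optional[str]:
    """Name the allele by exact match to a known sequence, else fall back to length."""
    region = extract_repeat_region(read, flank5, flank3)
    if region is None:
        return None
    return known.get(region, length_based_allele(region, motif_len))
```

For example, a region built as six AATG units, then ATG, then three more AATG units is 39 bases long. With a 4-base motif it is designated 9.3.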

The method of claim 8, further comprising, after step (b), identifying the copy number of the repeat sequence, the repeat structure of the alleles, or sequence variation at the STR locus.
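Repeat structure and sequence variation are where sequencing adds information beyond a length-based call.

Below is a minimal Python sketch. It assumes the repeat region has already been extracted and that the caller supplies the locus's core motif or motifs. `repeat_structure` and `variant_positions` are hypothetical helpers:

- `repeat_structure` writes the region in bracket notation, reporting any non-motif bases as an interruption.
- `variant_positions` lists positions where two equal-length alleles differ. Such alleles would share one length-based designation.

```python
from typing import List, Sequence


def repeat_structure(region: str, motifs: Sequence[str]) -> str:
    """Describe a repeat region in bracket notation, e.g. '[AATG]6 ATG [AATG]3'."""
    parts: List[str] = []
    interruption = ""
    i = 0
    while i < len(region):
        for motif in motifs:
            if region.startswith(motif, i):
                # Count consecutive copies of this motif starting at position i.
                count = 0
                while region.startswith(motif, i):
                    count += 1
                    i += len(motif)
                if interruption:
                    parts.append(interruption)
                    interruption = ""
                parts.append(f"[{motif}]{count}")
                break
        else:
            # No motif starts here, so the base belongs to an interruption.
            interruption += region[i]
            i += 1
    if interruption:
        parts.append(interruption)
    return " ".join(parts)


def variant_positions(observed: str, reference: str) -> List[int]:
    """0-based positions where two equal-length allele sequences differ."""
    if len(observed) != len(reference):
        raise ValueError("alleles differ in length; compare repeat structures instead")
    return [i for i, (o, r) in enumerate(zip(observed, reference)) if o != r]
```

Motifs are tried in the order given. For loci with compound repeats, such as TCTA/TCTG, pass every motif the locus is expected to contain.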

The method of claim 10, further comprising measuring the mixing ratio of a mixed sample using the sequence variation, wherein the mixing ratio is determined by analyzing the coverage ratio between alleles at each STR locus or between the variant base sequences.
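The mixing-ratio step can be illustrated with read counts.

Below is a minimal Python sketch, which assumes:

- The sample is a two-person mixture.
- Each allele at a locus has already been attributed to the major or minor contributor. Allele labels may be sequence-based, so isoalleles can be kept apart.
- Loci where the contributors share alleles have been excluded.

`minor_fraction`, `mixing_ratio` and the counts in the test are hypothetical. The minor contributor's share of reads is computed per locus and then averaged. Stutter and heterozygote imbalance are ignored.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# Per locus: (read count per allele label, allele labels attributed to the minor contributor).
LocusData = Tuple[Dict[str, int], Set[str]]


def minor_fraction(allele_reads: Dict[str, int], minor_alleles: Set[str]) -> float:
    """Share of a locus's reads that come from the minor contributor's alleles."""
    total = sum(allele_reads.values())
    if total == 0:
        raise ValueError("locus has no reads")
    minor = sum(n for allele, n in allele_reads.items() if allele in minor_alleles)
    return minor / total


def mixing_ratio(loci: Dict[str, LocusData]) -> Tuple[float, float]:
    """Average the minor fraction over loci; return (major share, minor share)."""
    f = mean(minor_fraction(reads, minor) for reads, minor in loci.values())
    return 1.0 - f, f
```

With 200 of 800 reads on the minor alleles at one locus and 400 of 1600 at another, the sketch returns shares of 0.75 and 0.25, a 3:1 mixture.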

The method of claim 1, wherein the multiplex amplification product of step (b) has a size of 20 bp to 600 bp.

The method of claim 1, wherein the autosomal analysis is used for forensic typing or identification.

The method of claim 1, wherein the primers are unlabeled primers.

A kit for next generation sequencing (NGS)-based autosomal multiplex amplification, comprising primers that complementarily bind, respectively, to the D3S1358, TH01, D21S11, D18S51, PentaE, D5S818, D13S317, D7S820, D16S539, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), PentaD, vWA (von Willebrand factor A), D8S1179, TPOX (human thyroid peroxidase gene), FGA (human fibrinogen alpha chain) and amelogenin loci, wherein:
- the primers that complementarily bind to the D3S1358 locus are SEQ ID NOs: 1 and 2;
- the primers for the TH01 locus are SEQ ID NOs: 3 and 4;
- the primers for the D21S11 locus are SEQ ID NOs: 5 and 6;
- the primers for the D18S51 locus are SEQ ID NOs: 7 and 8;
- the primers for the PentaE locus are SEQ ID NOs: 9 and 10;
- the primers for the D5S818 locus are SEQ ID NOs: 11 and 12;
- the primers for the D13S317 locus are SEQ ID NOs: 13 and 14;
- the primers for the D7S820 locus are SEQ ID NOs: 15 and 16;
- the primers for the D16S539 locus are SEQ ID NOs: 17 and 18;
- the primers for the CSF1PO locus are SEQ ID NOs: 19 and 20;
- the primers for the PentaD locus are SEQ ID NOs: 21 and 22;
- the primers for the vWA locus are SEQ ID NOs: 23 and 24;
- the primers for the D8S1179 locus are SEQ ID NOs: 25 and 26;
- the primers for the TPOX locus are SEQ ID NOs: 27 and 28;
- the primers for the FGA locus are SEQ ID NOs: 29 and 30; and
- the primers for the amelogenin locus are SEQ ID NOs: 31 and 32.

A next generation sequencing (NGS)-based method for autosomal analysis of a human subject, comprising the steps of:
(a) performing multiplex amplification of a human DNA sample using primers that complementarily bind, respectively, to the D19S433, D5S818, Penta E, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), D7S820, D18S51, TPOX (human thyroid peroxidase gene), D16S539, D8S1179, amelogenin, FGA (human fibrinogen alpha chain), D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01 and vWA (von Willebrand factor A) loci, wherein:
- the primers that complementarily bind to the D19S433 locus are SEQ ID NOs: 33 and 34;
- the primers for the D5S818 locus are SEQ ID NOs: 35 and 36;
- the primers for the Penta E locus are SEQ ID NOs: 37 and 38;
- the primers for the CSF1PO locus are SEQ ID NOs: 39 and 40;
- the primers for the D7S820 locus are SEQ ID NOs: 41 and 42;
- the primers for the D18S51 locus are SEQ ID NOs: 43 and 44;
- the primers for the TPOX locus are SEQ ID NOs: 45 and 46;
- the primers for the D16S539 locus are SEQ ID NOs: 47 and 48;
- the primers for the D8S1179 locus are SEQ ID NOs: 49 and 50;
- the primers for the amelogenin locus are SEQ ID NOs: 51 and 52;
- the primers for the FGA locus are SEQ ID NOs: 53 and 54;
- the primers for the D13S317 locus are SEQ ID NOs: 55 and 56;
- the primers for the D2S1338 locus are SEQ ID NOs: 57 and 58;
- the primers for the D21S11 locus are SEQ ID NOs: 59 and 60;
- the primers for the Penta D locus are SEQ ID NOs: 61 and 62;
- the primers for the D3S1358 locus are SEQ ID NOs: 63 and 64;
- the primers for the TH01 locus are SEQ ID NOs: 65 and 66; and
- the primers for the vWA locus are SEQ ID NOs: 67 and 68; and
(b) determining the short tandem repeat (STR) allele at each locus from NGS data of the multiplex amplification product of step (a), and genetically identifying the human subject.

The method of claim 16, wherein the DNA sample is DNA isolated from a tissue selected from the group consisting of blood, semen, vaginal cells, hair, saliva, urine, oral cells, placental cells, and amniotic fluid containing fetal cells.

The method according to claim 16, wherein the DNA sample is a biological sample containing DNA.

The method of claim 16, wherein the DNA sample is a degraded DNA sample.

The method of claim 16, wherein the multiplex gene amplification is Polymerase Chain Reaction (PCR) amplification or direct multiplex PCR amplification.

The method of claim 16, wherein the multiplex amplification is performed for 31-38 cycles.

The method of claim 19, wherein the multiplex amplification is performed for 34-38 cycles.

The method of claim 16, wherein the multiplex gene amplification is performed at an annealing temperature of 57-61 °C.

delete

The method according to claim 16, wherein the primers of step (a) are used at the following final concentrations:
- D19S433, 0.4-0.8 μM;
- D5S818, 0.3-0.8 μM;
- Penta E, 0.8-1.2 μM;
- CSF1PO, 0.3-0.7 μM;
- D7S820, 0.3-0.7 μM;
- D18S51, 0.7-1.1 μM;
- TPOX, 0.1-0.6 μM;
- D16S539, 0.2-0.7 μM;
- D8S1179, 0.7-1.1 μM;
- amelogenin, 0.6-1 μM;
- FGA, 0.8-1.2 μM;
- D13S317, 0.6-1 μM;
- D2S1338, 0.6-1 μM;
- D21S11, 0.5-0.9 μM;
- Penta D, 0.8-1.2 μM;
- D3S1358, 0.3-0.7 μM;
- TH01, 0.3-0.7 μM; and
- vWA, 0.8-1.2 μM.

The method according to claim 16, wherein step (b) comprises comparing the nucleotide sequence of each STR allele amplified in the multiplex amplification product with a reference sequence to determine the STR allele.

The method of claim 26, wherein the reference sequence comprises the sequence of the STR region of the allele, the 5' flanking sequence of the STR region, and the 3' flanking sequence of the STR region.

The method of claim 26, further comprising, after step (b), identifying the copy number of the repeat sequence, the repeat structure of the alleles, or sequence variation at the STR locus.

The method of claim 26, further comprising measuring the mixing ratio of a mixed sample using the sequence variation, wherein the mixing ratio is determined by analyzing the coverage ratio between alleles at each STR locus or between the variant base sequences.

The method of claim 16, wherein the multiplex amplification product of step (b) has a size of 50 bp to 300 bp.

The method of claim 16, wherein the autosomal analysis is used for forensic typing or identification.

The method of claim 16, wherein the primers are unlabeled primers.

A kit for next generation sequencing (NGS)-based autosomal multiplex amplification, comprising primers that complementarily bind, respectively, to the D19S433, D5S818, Penta E, CSF1PO (human c-fms proto-oncogene for CSF-1 receptor gene), D7S820, D18S51, TPOX (human thyroid peroxidase gene), D16S539, D8S1179, amelogenin, FGA (human fibrinogen alpha chain), D13S317, D2S1338, D21S11, Penta D, D3S1358, TH01 and vWA (von Willebrand factor A) loci, wherein:
- the primers that complementarily bind to the D19S433 locus are SEQ ID NOs: 33 and 34;
- the primers for the D5S818 locus are SEQ ID NOs: 35 and 36;
- the primers for the Penta E locus are SEQ ID NOs: 37 and 38;
- the primers for the CSF1PO locus are SEQ ID NOs: 39 and 40;
- the primers for the D7S820 locus are SEQ ID NOs: 41 and 42;
- the primers for the D18S51 locus are SEQ ID NOs: 43 and 44;
- the primers for the TPOX locus are SEQ ID NOs: 45 and 46;
- the primers for the D16S539 locus are SEQ ID NOs: 47 and 48;
- the primers for the D8S1179 locus are SEQ ID NOs: 49 and 50;
- the primers for the amelogenin locus are SEQ ID NOs: 51 and 52;
- the primers for the FGA locus are SEQ ID NOs: 53 and 54;
- the primers for the D13S317 locus are SEQ ID NOs: 55 and 56;
- the primers for the D2S1338 locus are SEQ ID NOs: 57 and 58;
- the primers for the D21S11 locus are SEQ ID NOs: 59 and 60;
- the primers for the Penta D locus are SEQ ID NOs: 61 and 62;
- the primers for the D3S1358 locus are SEQ ID NOs: 63 and 64;
- the primers for the TH01 locus are SEQ ID NOs: 65 and 66; and
- the primers for the vWA locus are SEQ ID NOs: 67 and 68.