KR20170000744A

KR20170000744A - Method and apparatus for analyzing gene

Info

Publication number: KR20170000744A
Application number: KR1020150168833A
Authority: KR
Inventors: 박웅양; 김상철; 남재용
Original assignee: 사회복지법인 삼성생명공익재단
Priority date: 2015-06-24
Filing date: 2015-11-30
Publication date: 2017-01-03
Also published as: CN107408163A; SG11201707649SA; KR101828052B1; SA517380741B1; CN107408163B

Abstract

The present invention relates to a method and an apparatus for analyzing genes. A reference data set is generated by performing deep sequencing on reference genes, the depths of test genes are analyzed by performing deep sequencing on the test genes, and the analyzed depths are compared with the depths of the reference genes included in the reference data set to determine whether a copy number variation (CNV) gene is present among the test genes.

Description

METHOD AND APPARATUS FOR ANALYZING GENE

The present disclosure relates to a method and an apparatus for analyzing genes and, more particularly, to a method and an apparatus for analyzing genes having a copy number variation (CNV).

A genome is the entirety of the genetic information that an organism has. Various technologies, such as DNA chips, next generation sequencing, and next-next generation sequencing, are being developed for sequencing the genome of an individual. Analysis of genetic information such as nucleic acid sequences and proteins is widely used to find genes that cause diseases such as diabetes and cancer, or to identify correlations between genetic diversity and the expression characteristics of individuals. In particular, genetic data collected from individuals is important for identifying the genetic characteristics of an individual that are associated with different symptoms or with the progression of a disease. Accordingly, genetic data such as an individual's nucleic acid sequences and proteins is key data that makes it possible to identify current and future disease-related information and thereby to prevent disease or to select an optimal treatment at an early stage of disease. Technologies that accurately analyze an individual's genetic data and diagnose the individual's diseases, using genome detection equipment that detects genetic information of an organism such as single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), are under study.

Provided are a method and an apparatus for analyzing genes. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

According to one aspect, a method of analyzing genes includes: generating, by performing deep sequencing on reference genes, a reference data set regarding the depths of reads aligned to each of the reference genes; analyzing, by performing the deep sequencing on test genes, the depths of reads aligned to each of the test genes; and determining whether a copy number variation (CNV) gene is present among the test genes by comparing the analyzed depths with the depths of the reference genes included in the reference data set.

In addition, the analyzing includes analyzing the depths of the reads aligned to exon regions of the test genes.

In addition, the determining includes determining the presence of the CNV gene by comparing the depths between the reference genes and the test genes for each identical exon region.

In addition, the determining includes determining that the CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the mutually corresponding exon regions of the reference genes and the test genes is not statistically significant.

In addition, the generating includes: obtaining, through the deep sequencing of genetic data of a plurality of people, read-depths corresponding to the reference genes for each of the people; clustering the people into different groups according to the distribution of the obtained read-depths; and obtaining standard depths of each of the reference genes that represent each of the groups by standardizing, for each group, the read-depths obtained for each of the reference genes, wherein the reference data set includes, for each of the groups, data representing the standard depths of each of the reference genes.

In addition, the determining includes: determining, among the groups, the group for which the statistical difference between the distribution of the analyzed depths and the distribution of the standard depths is smallest; and determining whether the CNV gene is present by comparing the analyzed depths with the standard depths corresponding to the determined group.
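By way of illustration only, the selection of the group with the smallest statistical difference might be sketched as follows. The source does not name the statistic; this sketch assumes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distributions). The function names `ks_statistic` and `closest_group` and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def ks_statistic(sample_a: List[float], sample_b: List[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # Empirical CDFs only change at sample points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


def closest_group(test_depths: List[float],
                  group_standard_depths: Dict[str, List[float]]) -> str:
    """Return the group whose standard depths differ least from the test depths.

    The choice of the KS statistic is an assumption of this sketch,
    not something stated in the source.
    """
    return min(
        group_standard_depths,
        key=lambda name: ks_statistic(test_depths, group_standard_depths[name]),
    )


# Hypothetical depths for four genes in a test sample and two reference groups.
test_sample = [10.0, 20.0, 30.0, 40.0]
groups = {
    "group_1": [11.0, 19.0, 31.0, 39.0],
    "group_2": [100.0, 200.0, 300.0, 400.0],
}
selected = closest_group(test_sample, groups)
```

Any other measure of distributional difference could be substituted for `ks_statistic` without changing the selection logic.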

In addition, the method further includes obtaining the genetic data of the people from public genome data or public HapMap data.

In addition, the reference genes or the test genes may be obtained from biopsy tissue or formalin-fixed, paraffin-embedded (FFPE) tissue.

In addition, the method further includes, when it is determined that the CNV gene is present among the test genes, performing annotation to identify a drug corresponding to the CNV gene.

According to another aspect, there is provided a computer-readable recording medium on which a program for executing the method on a computer is recorded.

According to still another aspect, an apparatus for analyzing genes includes: a reference data generation unit that generates, by performing deep sequencing on reference genes, a reference data set regarding the depths of reads aligned to each of the reference genes; an analysis unit that analyzes, by performing the deep sequencing on test genes, the depths of reads aligned to each of the test genes; and a determination unit that determines whether a copy number variation (CNV) gene is present among the test genes by comparing the analyzed depths with the depths of the reference genes included in the reference data set.

In addition, the analysis unit analyzes the depths of the reads aligned to exon regions of the test genes.

In addition, the determination unit determines the presence of the CNV gene by comparing the depths between the reference genes and the test genes for each identical exon region.

In addition, the determination unit determines that the CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the mutually corresponding exon regions of the reference genes and the test genes is not statistically significant.

In addition, the reference data generation unit obtains, through the deep sequencing of genetic data of a plurality of people, read-depths corresponding to the reference genes for each of the people, clusters the people into different groups according to the distribution of the obtained read-depths, and obtains standard depths of each of the reference genes that represent each of the groups by standardizing, for each group, the read-depths obtained for each of the reference genes, wherein the reference data set includes, for each of the groups, data representing the standard depths of each of the reference genes.

In addition, the determination unit determines, among the groups, the group for which the statistical difference between the distribution of the analyzed depths and the distribution of the standard depths is smallest, and determines whether the CNV gene is present by comparing the analyzed depths with the standard depths corresponding to the determined group.

In addition, the reference data generation unit obtains the genetic data of the people from public genome data or public HapMap data.

In addition, when it is determined that the CNV gene is present among the test genes, the determination unit performs annotation to identify a drug corresponding to the CNV gene.

According to the above, whether a copy number variation (CNV) gene is present among the test genes of a subject can be analyzed more accurately.

FIG. 1 is a diagram for describing a gene analysis apparatus according to an embodiment.
FIG. 2 is a block diagram illustrating hardware components of a gene analysis apparatus according to an embodiment.
FIG. 3 is a flowchart of a method of generating a reference data set according to an embodiment.
FIG. 4 is a diagram for describing obtaining read-depths corresponding to reference genes for each of a plurality of people (for example, normal persons) according to an embodiment.
FIG. 5 is a diagram for describing deep sequencing of exon regions according to an embodiment.
FIG. 6 is a diagram for describing clustering of people into different groups according to the distribution of read-depths obtained from a normal population 400 according to an embodiment.
FIG. 7 is a diagram for describing the standard depths of each of the reference genes, which represent a group, according to an embodiment.
FIG. 8 is a diagram for describing performing deep sequencing on test genes obtained from a biological sample of a subject according to an embodiment.
FIG. 9 is a flowchart of a method of determining whether a copy number variation (CNV) gene is present according to an embodiment.
FIG. 10 is a diagram for describing determining whether a copy number variation (CNV) gene is present according to an embodiment.
FIG. 11 is a flowchart of a method of analyzing genes according to an embodiment.
FIG. 12 is a block diagram illustrating hardware components of a computing device according to an embodiment.

The terms used in the present embodiments have been selected, as far as possible, from general terms that are currently in wide use, in consideration of their functions in the present embodiments; however, they may vary according to the intention of those working in the art, legal precedent, the emergence of new technologies, and the like. In certain cases a term has been selected arbitrarily, and in such a case its meaning is described in detail in the description of the corresponding embodiment. Accordingly, the terms used in the present embodiments should be defined based on the meaning of each term and on the overall content of the present embodiments, not simply on the name of the term.

In the descriptions of the embodiments, when a part is said to be connected to another part, this includes not only the case where the parts are directly connected but also the case where they are electrically connected with another component in between. In addition, when a part is said to include a component, this means that, unless specifically stated otherwise, the part may further include other components rather than excluding them. Further, the terms "... unit" and "... module" used in the embodiments refer to a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

Terms such as "consists of" or "includes" used in the present embodiments should not be construed as necessarily including all of the components or all of the operations described in the specification; rather, they should be construed to mean that some of the components or operations may not be included, or that additional components or operations may be further included.

The following description of the embodiments should not be construed as limiting the scope of rights, and anything that a person skilled in the art can easily infer from it should be construed as falling within the scope of rights of the embodiments. Embodiments provided solely for illustration are described in detail below with reference to the accompanying drawings.

FIG. 1 is a diagram for describing a gene analysis apparatus according to an embodiment.

Referring to FIG. 1, the gene analysis apparatus 10 may use genetic data 20 obtained from a normal population and genetic data 30 obtained from a subject to identify whether a copy number variation (CNV) gene is present among the test genes of the subject.

The genetic data 20 and the genetic data 30 received by the gene analysis apparatus 10 may be genetic data in the FASTQ file format obtained by next generation sequencing (NGS). The FASTQ format is a text-based format that stores a biological sequence, usually a nucleotide sequence, together with its corresponding quality scores. However, the gene analysis apparatus 10 according to the present embodiment is not limited to the FASTQ format and can also analyze genetic data 20 and 30 in other formats.
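As a minimal sketch of how such FASTQ input might be read, the following assumes the common four-line record layout (header beginning with "@", sequence, separator beginning with "+", quality string) and Phred+33 quality encoding. Neither assumption is stated in the source, and the names `FastqRecord` and `parse_fastq` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    Multi-line sequences and other encodings are not handled;
    the Phred+33 offset is an assumption of this sketch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        scores = [ord(ch) - phred_offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical single-read input.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
records = list(parse_fastq(example))
```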

The genetic data 20 of the normal population may be obtained from a database (DB) already known in the art, such as that of the National Center for Biotechnology Information (NCBI) or the Gene Expression Omnibus (GEO), or may be obtained from biological samples of people recruited for the purpose of analyzing the test genes of the subject. That is, the genetic data 20 may be obtained from public genome data or public HapMap data. Meanwhile, the reference genes included in the genetic data 20 or the test genes included in the genetic data 30 may be obtained from biopsy tissue or formalin-fixed, paraffin-embedded (FFPE) tissue.

A copy number variation (CNV) is known to refer to a variation within a gene in which, in comparison with a reference genome, a relatively large region of a particular chromosome is deleted, or is amplified and appears repeatedly. That is, the gene analysis apparatus 10 may determine whether an abnormally deleted or amplified gene is present in the genetic data 30 obtained from the subject, relative to the genetic data 20 obtained from the normal population. Here, a gene analyzed by the gene analysis apparatus 10 may refer to a nucleic acid such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).

In the present embodiments, the normal population refers to a population made up of ordinary people in whom a particular disease, such as cancer or a tumor, has not been found, and the subject may refer to a patient in whom a particular disease such as cancer or a tumor has been found. Meanwhile, in the present embodiments, the normal population and the subject may be animals other than humans.

The gene analysis apparatus 10 may be implemented with at least one processor having a data processing function that executes various instructions and algorithms for analyzing the genetic data 20 and 30 to identify a CNV gene.

FIG. 2 is a block diagram illustrating hardware components of a gene analysis apparatus according to an embodiment.

Referring to FIG. 2, the gene analysis apparatus 10 may include a reference data generation unit 110, an analysis unit 120, and a determination unit 130. Only the components related to the present embodiment are shown in FIG. 2 so as not to obscure the features of the present embodiment; the gene analysis apparatus 10 may therefore further include general-purpose components other than those shown in FIG. 2.

The reference data generation unit 110 receives the genetic data 20 obtained from the normal population, described above with reference to FIG. 1, and generates a reference data set using the received genetic data 20.

More specifically, the reference data generation unit 110 performs deep sequencing on the reference genes included in the genetic data 20, thereby generating a reference data set regarding the depths of the reads aligned to each of the reference genes. Deep sequencing is a technique for sequencing nucleic acids, such as DNA fragments and RNA fragments, by repeatedly aligning reads to them. As a result of deep sequencing, data on depths corresponding to the number of reads complementarily bound to the nucleic acids may be obtained. In the present embodiments, the term "depth" is used interchangeably with the term "read-depth".
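For illustration, the notion of depth as the number of reads covering a position might be computed as in the sketch below. It assumes reads have already been aligned and are given as half-open coordinate intervals, which the source does not specify; `per_base_depth` and `mean_depth` are hypothetical names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def per_base_depth(reads: List[Interval], region: Interval) -> List[int]:
    """Number of aligned reads covering each position of the region."""
    start, end = region
    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (end - start + 1)
    for read_start, read_end in reads:
        lo = max(read_start, start)
        hi = min(read_end, end)
        if lo < hi:  # the read overlaps the region
            diff[lo - start] += 1
            diff[hi - start] -= 1
    depths = []
    running = 0
    for delta in diff[:-1]:
        running += delta
        depths.append(running)
    return depths


def mean_depth(reads: List[Interval], region: Interval) -> float:
    """Average depth over the region (0.0 for an empty region)."""
    depths = per_base_depth(reads, region)
    return sum(depths) / len(depths) if depths else 0.0


# Hypothetical alignments of three reads against an 8-base region.
aligned_reads = [(0, 4), (2, 6), (5, 9)]
depth_profile = per_base_depth(aligned_reads, (0, 8))
average = mean_depth(aligned_reads, (0, 8))
```

In practice the intervals would come from an aligner's output rather than being listed by hand.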

The reference data generation unit 110 first obtains, through deep sequencing of the genetic data (20 in FIG. 1) of a plurality of people (for example, normal persons), read-depths corresponding to the reference genes for each of the people. The reference data generation unit 110 then clusters the people into different groups according to the distribution of the obtained read-depths. The reference data generation unit 110 standardizes, for each group, the read-depths obtained for each of the reference genes, thereby obtaining the standard depths of each of the reference genes that represent each of the groups. As a result, the reference data set generated by the reference data generation unit 110 may include, for each of the groups, data representing the standard depths of each of the reference genes.
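The source does not state how the read-depths are standardized. As one possible reading, the sketch below summarizes each gene within a group by the mean and population standard deviation of its depth across the group's members. Both the summary statistics and the name `standard_depths` are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

DepthProfile = Dict[str, float]  # gene name -> read-depth for one person


def standard_depths(
    groups: Dict[str, List[DepthProfile]]
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Per group, summarize each gene as (mean depth, standard deviation).

    Using mean and standard deviation as the "standard depth" is an
    assumption of this sketch, not the method stated in the source.
    """
    reference_data_set = {}
    for group_name, profiles in groups.items():
        genes = list(profiles[0].keys())
        summary = {}
        for gene in genes:
            values = [profile[gene] for profile in profiles]
            summary[gene] = (mean(values), pstdev(values))
        reference_data_set[group_name] = summary
    return reference_data_set


# Hypothetical group of two people with depths for two genes.
clustered = {
    "group_1": [
        {"gene_1": 100.0, "gene_2": 50.0},
        {"gene_1": 120.0, "gene_2": 70.0},
    ],
}
reference = standard_depths(clustered)
```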

The analysis unit 120 receives the genetic data 30 obtained from the subject, described above with reference to FIG. 1, and performs deep sequencing on the test genes included in the genetic data 30, thereby analyzing the depths of the reads aligned to each of the test genes.

Meanwhile, the deep sequencing performed by the reference data generation unit 110 and the analysis unit 120 may be performed on exon regions within the reference genes or the test genes. In other words, the deep sequencing results, namely the reference data set generated by the reference data generation unit 110 and the depth data analyzed by the analysis unit 120, may include only data on the depths at exon regions and may exclude data on the depths of reads aligned to intron regions. However, the present embodiments are not limited thereto, and depth data for intron regions may also be included.

The determination unit 130 compares the depths analyzed by the analysis unit 120 with the depths of the reference genes included in the reference data set generated by the reference data generation unit 110. The determination unit 130 then determines whether a CNV gene is present among the test genes. Here, the determination unit 130 may determine the presence of a CNV gene by comparing the depths between the reference genes and the test genes for each identical exon region.
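The exon-by-exon comparison might be sketched as below. The sketch assumes the reference data set stores, per exon, a mean depth and a standard deviation, and it uses a z-score as the comparison statistic; both are assumptions, since the source does not name a statistic. No decision threshold is applied here, and the name `exon_z_scores` is hypothetical.

```python
from typing import Dict, Tuple


def exon_z_scores(
    test_depths: Dict[str, float],
    reference: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Compare test and reference depths exon by exon.

    reference maps an exon name to (mean depth, standard deviation).
    The z-score is an illustrative choice of statistic only.
    """
    scores = {}
    for exon, depth in test_depths.items():
        if exon not in reference:
            continue  # only identical exon regions are compared
        ref_mean, ref_sd = reference[exon]
        if ref_sd == 0:
            continue  # no spread in the reference, so no z-score is defined
        scores[exon] = (depth - ref_mean) / ref_sd
    return scores


# Hypothetical depths for two exons of one gene.
test_sample_depths = {"exon_a1": 130.0, "exon_a2": 100.0}
reference_depths = {"exon_a1": (100.0, 10.0), "exon_a2": (100.0, 10.0)}
z_scores = exon_z_scores(test_sample_depths, reference_depths)
```

How the resulting per-exon statistics are turned into a CNV call is governed by the determination criterion described in the specification, which this sketch deliberately leaves out.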

As a determination criterion, the determination unit 130 may determine that a CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the mutually corresponding exon regions of the reference genes and the test genes is not statistically significant.

The determination unit 130 detects or identifies, as a CNV gene, the gene corresponding to an exon region for which the difference in depth between the mutually corresponding exon regions is not statistically significant. Furthermore, when it is determined that a CNV gene is present among the test genes, the determination unit 130 may perform annotation to identify a drug (for example, an anticancer agent) corresponding to the detected CNV gene.
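The annotation step could be as simple as a table lookup, as in the following sketch. The gene-to-drug table, its placeholder entries, and the name `annotate_cnv_genes` are all hypothetical; the source does not say where drug associations come from.

```python
from typing import Dict, List


def annotate_cnv_genes(
    cnv_genes: List[str],
    drug_table: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Attach known drug names to each detected CNV gene.

    Genes without an entry in the table are kept with an empty list so
    that the absence of an annotation is visible to the caller.
    """
    return {gene: list(drug_table.get(gene, [])) for gene in cnv_genes}


# Placeholder names only; not real gene-drug associations.
hypothetical_table = {"GENE_A": ["DRUG_X"]}
annotations = annotate_cnv_genes(["GENE_A", "GENE_B"], hypothetical_table)
```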

FIG. 3 is a flowchart of a method of generating a reference data set according to an embodiment. Referring to FIG. 3, the generation of the reference data set includes operations processed in time series by the reference data generation unit 110 described above.

In operation 301, the reference data generation unit 110 obtains read-depths corresponding to the reference genes for each of a plurality of people (for example, normal persons).

In operation 302, the reference data generation unit 110 clusters the people into different groups according to the distribution of the obtained read-depths.

In operation 303, the reference data generation unit 110 standardizes, for each group, the read-depths obtained for each of the reference genes.

In operation 304, the reference data generation unit 110 obtains the standard depths of each of the reference genes that represent each of the groups.
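The clustering of operation 302 might look like the sketch below. The source does not name a clustering algorithm; this example assumes a small k-means over each person's depth profile, normalized so that profiles are compared by shape rather than by overall sequencing yield. The seeding rule (the first k people in name order) and the name `cluster_people` are simplifications introduced here.

```python
from typing import Dict, List


def normalize(profile: List[float]) -> List[float]:
    """Scale a depth profile so that its entries sum to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("depth profile must have a positive total")
    return [depth / total for depth in profile]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_people(profiles: Dict[str, List[float]], k: int,
                   iterations: int = 20) -> Dict[str, int]:
    """Assign each person to one of k groups with a basic k-means loop."""
    names = sorted(profiles)
    points = {name: normalize(profiles[name]) for name in names}
    # Deterministic seeding for reproducibility; a real pipeline would
    # likely use a more robust initialization.
    centroids = [points[name] for name in names[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iterations):
        new_assignment = {}
        for name in names:
            distances = [squared_distance(points[name], c) for c in centroids]
            new_assignment[name] = distances.index(min(distances))
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for index in range(k):
            members = [points[n] for n in names if assignment[n] == index]
            if members:
                centroids[index] = [sum(col) / len(members)
                                    for col in zip(*members)]
    return assignment


# Hypothetical read-depths for three genes in four people.
people = {
    "p1": [100.0, 100.0, 100.0],
    "p2": [300.0, 100.0, 100.0],
    "p3": [210.0, 200.0, 190.0],
    "p4": [290.0, 100.0, 110.0],
}
group_of = cluster_people(people, k=2)
```

The resulting assignment can then be used to collect each group's profiles before the standardization of operations 303 and 304.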

FIG. 4 is a diagram for describing obtaining read-depths corresponding to reference genes for each of a plurality of people (for example, normal persons) according to an embodiment. The description of FIG. 4 may relate to the method performed in operation 301 of FIG. 3.

Referring to FIG. 4, the reference data generation unit 110 may obtain read-depths by performing deep sequencing using genetic data 401 obtained from a database (DB) 40.

The database (DB) 40 stores the genetic data 401 of each of a plurality of people (for example, normal persons) classified into the normal population 400. The genetic data 401 may be obtained from biological samples taken from the plurality of people using various sequencing means such as next generation sequencing (NGS) and microarrays. Meanwhile, the genetic data 401 may be data on a whole genome or data on a HapMap.

The database (DB) 40 may be a database already known in the art, such as NCBI or GEO, or may be one built to store the genetic data 401 of people recruited for the purpose of analyzing the test genes of the subject.

The reference data generation unit 110 performs deep sequencing on the genes of the individuals in the normal population 400 (that is, the reference genes) included in the genetic data 401. For example, the reference data generation unit 110 may perform deep sequencing on the reference genes 411 of "person 1" 410 included in the normal population 400. As a result of the deep sequencing of the reference genes 411, reads 415 are aligned to each of gene 1, ..., gene n (where n is a natural number) included in the reference genes 411, and data on the depths (read-depths) of the reads 415 aligned to each of the reference genes 411 is obtained. Likewise, the reference data generation unit 110 performs deep sequencing on the reference genes 421 of another person 420 included in the normal population 400 and obtains data on the depths (read-depths) of the reads 425 aligned to each of the reference genes 421. By performing deep sequencing on the reference genes of the individuals of the normal population 400 included in the genetic data 401, the reference data generation unit 110 may obtain the read-depth data.

FIG. 5 is a diagram for explaining deep sequencing of exon regions according to an embodiment.

Referring to FIG. 5, deep sequencing of the reference genes, which correspond to the genes of the individuals in the normal population 400, obtains the depths (read-depths) of the reads aligned to the exon regions, excluding the intron regions 505. For example, when the reference genes (nucleic acid 500) of an individual include gene a, gene b, and gene c, the result of the deep sequencing may include data on the depth of the reads 510 aligned to exon a1 and the depth of the reads aligned to exon a2 in gene a, the depth of the reads aligned to exon b1 and the depth of the reads aligned to exon b2 in gene b, and the depth of the reads aligned to exon c in gene c. However, the present embodiments are not limited thereto, and the result of the deep sequencing may also include data on the depths of reads aligned to the intron regions 505.
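The per-exon read depth described above can be illustrated with a minimal Python sketch. The function name `exon_depths`, the half-open coordinate convention, and the toy intervals are assumptions made for illustration and do not come from the specification; in practice, depths would usually be derived from alignment files with dedicated tools.

```python
def exon_depths(exons, reads):
    """Return the mean read depth of each exon.

    exons: dict mapping an exon name to (start, end), half-open coordinates.
    reads: list of (start, end) aligned read intervals, half-open.
    Reads that fall only in introns contribute nothing to any exon.
    """
    depths = {}
    for name, (ex_start, ex_end) in exons.items():
        covered_bases = 0
        for r_start, r_end in reads:
            # Number of bases of this read that overlap the exon.
            overlap = min(ex_end, r_end) - max(ex_start, r_start)
            if overlap > 0:
                covered_bases += overlap
        depths[name] = covered_bases / (ex_end - ex_start)
    return depths


# Hypothetical toy data: two exons of gene a and four aligned reads.
exons = {"a1": (100, 200), "a2": (400, 500)}
reads = [(90, 150), (120, 220), (450, 500), (250, 300)]
depths = exon_depths(exons, reads)
```

Here the read at (250, 300) lies entirely between the two exons, so it is ignored, which mirrors the exclusion of intron regions in FIG. 5.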

Meanwhile, the deep sequencing of exon regions shown in FIG. 5 applies not only to the reference genes but also to the test genes obtained from a subject. That is, the analyzer (120 in FIG. 2) may perform deep sequencing on the exon regions in the test genes and thereby analyze the depths of the reads aligned to each of those exon regions.

FIG. 6 is a diagram for explaining clustering of people into different groups according to the distribution of the read-depths obtained from the normal population 400, according to an embodiment. The description of FIG. 6 may relate to the method performed in step 302 of FIG. 3.

Because the individuals in the normal population 400 have different genes, the depth corresponding to a specific gene (or a specific exon), as analyzed by deep sequencing, may differ from individual to individual. In addition, the distribution of depths over the reference genes may differ between individuals because of factors such as whether the biological sample obtained from each individual was chemically treated (for example, formalin-fixed, paraffin-embedded (FFPE)) and deep sequencing error. Therefore, the reference data generator 110 groups together people whose depth distributions show a similar tendency, thereby clustering the individuals of the normal population 400 into different groups. Here, the clustering may be performed by statistically analyzing the distribution of the read-depth for each reference gene (exon) using a known trend analysis algorithm, clustering algorithm, or the like.
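The specification leaves the clustering algorithm open, so the following Python sketch uses a simple greedy, threshold-based grouping as a stand-in. The function names, the normalization by total depth, the L1 distance, and the tolerance `tol` are all assumptions chosen for illustration; any established clustering method could be substituted.

```python
def normalize(profile):
    """Scale a depth profile so that its entries sum to 1."""
    total = sum(profile)
    return [depth / total for depth in profile]


def cluster_by_profile(profiles, tol=0.1):
    """Greedily group people whose normalized depth profiles are similar.

    profiles: dict mapping a person to a list of per-exon depths
              (same exon order for everyone).
    A person joins the first cluster whose representative profile is
    within `tol` (L1 distance); otherwise a new cluster is started.
    """
    clusters = []  # list of (representative_profile, members)
    for person, profile in profiles.items():
        norm = normalize(profile)
        for representative, members in clusters:
            distance = sum(abs(a - b) for a, b in zip(norm, representative))
            if distance < tol:
                members.append(person)
                break
        else:
            clusters.append((norm, [person]))
    return [members for _, members in clusters]


# Hypothetical depth profiles over three exons.
profiles = {
    "person_1": [1000, 500, 250],
    "person_2": [2000, 1000, 520],
    "person_3": [300, 900, 800],
    "person_4": [310, 880, 790],
}
groups = cluster_by_profile(profiles)
```

Normalizing by total depth means two people sequenced at different overall coverage but with the same relative distribution land in the same group, which matches the idea of grouping by distribution tendency rather than by absolute depth.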

Referring to FIG. 6, when deep sequencing is performed on the reference genes of the people belonging to group 1, the distribution of gene-depth pairs may show a similar tendency across those people. The same holds for the other groups. For example, the reference genes of the people belonging to group 1 may have been obtained from biopsy samples of those people, and the reference genes of the people belonging to group M (M is a natural number) may have been obtained from FFPE samples of those people.

FIG. 7 is a diagram for explaining the standard depths of each of the reference genes, which represent a group, according to an embodiment. The description of FIG. 7 may relate to the methods performed in steps 303 and 304 of FIG. 3.

Referring to FIG. 7, once the clustering is complete, the reference data generator 110 standardizes, for each group, the read-depths obtained for each of the reference genes, and thereby obtains the standard depths of each of the reference genes that represent each group.

When the depth of a given reference gene (for example, "exon 1") takes different values among the people belonging to group x, the reference data generator 110 may standardize the depth of "exon 1" by calculating the average of those depths. Similarly, the reference data generator 110 may calculate the average of the various depths for each of the other reference genes (for example, "exon 43", "exon 3543", and "exon 5623"), and thereby calculate a standard depth for each gene (exon). In this way, the reference data generator 110 can obtain the standard depths of each of the reference genes that represent each of the clustered groups. Although the representative value is described here as the average of the depths for convenience of explanation, the representative value of the depths may also be calculated using other kinds of statistics.
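The averaging step can be sketched in a few lines of Python. The function name `standard_depths`, the nested-dictionary layout, and the sample values are assumptions for illustration only; as the text notes, another statistic (such as a median) could replace the mean.

```python
from statistics import mean


def standard_depths(group_profiles):
    """Return the mean depth of each exon over all people in one group.

    group_profiles: dict mapping a person to a dict of exon -> depth.
    Every person is assumed to have a depth for every exon.
    """
    exons = next(iter(group_profiles.values())).keys()
    return {
        exon: mean([profile[exon] for profile in group_profiles.values()])
        for exon in exons
    }


# Hypothetical depths for two people in group x.
group_x = {
    "person_1": {"exon 1": 900, "exon 43": 400},
    "person_2": {"exon 1": 1100, "exon 43": 600},
}
std = standard_depths(group_x)
```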

FIG. 8 is a diagram for explaining deep sequencing of the test genes obtained from a biological sample of a subject according to an embodiment.

Referring to FIG. 8, the analyzer (120 in FIG. 2) performs deep sequencing on the test genes based on the genetic data 30 of the subject 800, and thereby analyzes the depths of the reads aligned to each of the test genes.

The genetic data 30 of the subject 800 may be obtained through next generation sequencing (NGS) of a biopsy sample 810 or an FFPE sample 825 taken from part of the tissue of the subject 800. Here, the FFPE sample 825 is a sample produced by FFPE processing 820 of part of the tissue of the subject 800.

The analyzer (120 in FIG. 2) may obtain the depth data 830 of the test genes by analyzing the depths of the reads aligned to the test genes of the subject 800, following the deep sequencing schemes described above with reference to FIGS. 4 and 5.

FIG. 9 is a flowchart of a method of determining whether a copy number variation (CNV) gene is present according to an embodiment. Referring to FIG. 9, the determination of a CNV gene includes steps processed in time series by the determination unit 130 described above.

In step 901, the determination unit 130 determines, among the groups clustered by the reference data generator 110, the group for which the statistical difference between the distribution of the depths analyzed from the test genes and the distribution of the standard depths is smallest. That is, the determination unit 130 determines at least one of the clustered groups (for example, the groups of FIG. 6) that has a statistical tendency similar to the distribution of the depths analyzed from the test genes. Here, the determination unit 130 may determine the group for which the standard deviation between the distribution of the depths analyzed from the test genes and the distribution of the standard depths is smallest. However, the embodiments are not limited thereto, and statistics other than the standard deviation may be used to select a group whose tendency is similar to the distribution of the depths analyzed from the test genes.
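One possible reading of step 901 is sketched below in Python: for each group, take the per-exon differences between the test depths and that group's standard depths, and pick the group whose differences have the smallest standard deviation. This interpretation, the function name `closest_group`, and the sample values are assumptions; the specification does not fix the exact statistic.

```python
from statistics import pstdev


def closest_group(test_depths, group_standards):
    """Return the name of the group whose standard depths best match the test.

    test_depths: dict mapping exon -> depth measured for the subject.
    group_standards: dict mapping group name -> dict of exon -> standard depth.
    The match score is the population standard deviation of the per-exon
    differences (test depth minus standard depth); smaller is closer.
    """
    def spread(standards):
        differences = [test_depths[exon] - standards[exon] for exon in test_depths]
        return pstdev(differences)

    return min(group_standards, key=lambda name: spread(group_standards[name]))


# Hypothetical subject depths and two reference groups.
test_depths = {"exon 1": 1050, "exon 43": 480, "exon 3543": 210}
group_standards = {
    "group 1": {"exon 1": 1000, "exon 43": 500, "exon 3543": 200},
    "group M": {"exon 1": 400, "exon 43": 900, "exon 3543": 700},
}
selected = closest_group(test_depths, group_standards)
```

Using the spread of the differences rather than their mean makes the score insensitive to a constant offset between the subject and the group, at the cost of ignoring overall coverage differences; other statistics would make a different trade-off.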

In step 902, the determination unit 130 compares the depths analyzed from the test genes with the standard depths corresponding to the determined group. More specifically, the determination unit 130 compares the depth of each of the test genes (exons) with the depth of the corresponding reference gene (corresponding exon). For example, assuming that "exon 1" and "exon 43" are present in both the test genes and the reference genes, the determination unit 130 compares the depth of "exon 1" analyzed by the analyzer 120 with the standard depth of "exon 1", and compares the depth of "exon 43" analyzed by the analyzer 120 with the standard depth of "exon 43". Here, "exon 1" and "exon 43" are arbitrary labels used to indicate that the exons are different from each other.

In step 903, the determination unit 130 determines, based on the comparison result, whether a copy number variation (CNV) gene is present. Here, the determination unit 130 may determine that a CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the corresponding exon regions of the reference genes and the test genes is not statistically significant.

More specifically, assume that the threshold for judging that a depth difference is not significant is set to four times the standard depth. In that case, the determination unit 130 may determine that a CNV gene is present when the depth of any exon analyzed by the analyzer 120 exceeds four times the standard depth. The threshold is not limited to this value and may be varied. For example, when the standard depth of "exon 1" is 1000, the threshold for judging significance may be 4000. Accordingly, when the depth of "exon 1" of the subject, as analyzed by the analyzer 120, is 5000, the determination unit 130 may determine that the gene of "exon 1" is a CNV gene.
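The worked example above (standard depth 1000, threshold 4000, measured depth 5000) can be reproduced with a short Python sketch. The function name `cnv_exons` and the `fold` parameter are assumptions for illustration; the sketch only flags depths above the threshold, as in the example, and does not attempt to handle reduced copy numbers.

```python
def cnv_exons(test_depths, standard_depths, fold=4.0):
    """Return the exons whose test depth exceeds `fold` times the standard depth.

    test_depths: dict mapping exon -> depth measured for the subject.
    standard_depths: dict mapping exon -> standard depth of the selected group.
    Exons missing from the reference are skipped.
    """
    return [
        exon
        for exon, depth in test_depths.items()
        if exon in standard_depths and depth > fold * standard_depths[exon]
    ]


# Values taken from the example in the text, plus one hypothetical exon.
standard = {"exon 1": 1000, "exon 43": 500}
measured = {"exon 1": 5000, "exon 43": 900}
flagged = cnv_exons(measured, standard)
```

With these values, "exon 1" is flagged because 5000 exceeds the threshold of 4000, while "exon 43" is not because 900 is below its threshold of 2000.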

FIG. 10 is a diagram for explaining the determination of whether a copy number variation (CNV) gene is present according to an embodiment.

Referring to FIG. 10, the depths drawn with solid lines correspond to the reference genes (exons), and the depths drawn with dash-dot lines correspond to the test genes (exons).

As described above with reference to the preceding drawings, the determination unit 130 compares the depths of the exons analyzed by the analyzer 120 with the standard depths. When, among the exon regions of the test genes, there is an exon region ("exon a") for which the difference in depth between the corresponding exon regions of the reference genes and the test genes is not statistically significant, the test gene of "exon a" is identified as a CNV gene, and the determination unit 130 may therefore determine that a CNV gene is present.

Meanwhile, when it is determined that a CNV gene is present among the test genes, the determination unit 130 may perform annotation to identify a drug (for example, an anticancer agent) corresponding to the CNV gene.
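The specification does not describe how the annotation is carried out. As one minimal sketch, it could be a lookup in a curated gene-to-drug table. The table contents below are placeholders rather than real gene-drug associations, and the names `ANNOTATION_TABLE` and `annotate` are hypothetical.

```python
# Hypothetical annotation table: gene name -> candidate drugs.
# The entries are placeholders, not real gene-drug associations.
ANNOTATION_TABLE = {
    "GENE_A": ["drug_x", "drug_y"],
    "GENE_B": ["drug_z"],
}


def annotate(cnv_genes, table=ANNOTATION_TABLE):
    """Return the drugs listed for each CNV gene; unknown genes map to []."""
    return {gene: table.get(gene, []) for gene in cnv_genes}


annotations = annotate(["GENE_A", "GENE_C"])
```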

FIG. 11 is a flowchart of a method of analyzing genes according to an embodiment. Referring to FIG. 11, the gene analysis method includes steps processed in time series by the gene analysis apparatus 10 described with reference to the preceding drawings. Therefore, even where omitted below, the details described with reference to the preceding drawings also apply to the gene analysis method of FIG. 11.

In step 1101, the reference data generator 110 performs deep sequencing on the reference genes and thereby generates a reference data set relating to the depths of the reads aligned to each of the reference genes.

In step 1102, the analyzer 120 performs deep sequencing on the test genes and thereby analyzes the depths of the reads aligned to each of the test genes.

In step 1103, the determination unit 130 compares the analyzed depths with the depths of the reference genes included in the reference data set and thereby determines whether a copy number variation (CNV) gene is present among the test genes.

FIG. 12 is a block diagram showing the hardware configuration of a computing device according to an embodiment.

Referring to FIG. 12, the computing device 1 includes a gene analysis apparatus (processor) 10, a data interface 11, and a memory 12. Only the components related to the present embodiment are shown in the computing device 1 of FIG. 12, so as not to obscure the features of the embodiment; general-purpose components other than those shown in FIG. 12 may also be included.

The data interface 11 receives the genetic data 20 of the normal population and the genetic data 30 of the subject, described above with reference to FIG. 1. That is, the data interface 11 may be implemented as the hardware of a wired or wireless network interface through which the computing device 1 communicates with other external devices. The data interface 11 transmits the received genetic data 20 and 30 to the gene analysis apparatus (processor) 10.

The data interface 11 may receive the genetic data 20 of the normal population from the database (DB) (40 in FIG. 4). The data interface 11 may also receive the genetic data 30 of the subject from an external next generation sequencing device, a microarray, or the like used for sequencing the test genes of the subject.

The memory 12 is hardware for storing data to be processed in the computing device 1 and the results of completed processing, and includes memory chips such as random access memory (RAM) and read only memory (ROM), or storage such as a hard disk drive (HDD) or a solid state drive (SSD). That is, the memory 12 may store the genetic data 20 and 30 received by the data interface 11, and may also store the reference data set processed by the gene analysis apparatus (processor) 10, the deep sequencing data of the test genes, and data on the identified copy number variation (CNV) gene.

The gene analysis apparatus (processor) 10 is a module implemented with one or more processing units, and may be implemented as a combination of a microprocessor having an array of logic gates and a memory module storing a program executable on the microprocessor. The gene analysis apparatus (processor) 10 may also be implemented in the form of an application program module. The gene analysis apparatus (processor) 10 is a hardware device that performs the gene analysis described above with reference to FIGS. 1 to 11.

Information on the copy number variation (CNV) gene identified by the gene analysis apparatus (processor) 10 may be transmitted through the data interface 11 to another external device, such as a display device or another computing device, or to an external network, such as the Internet or a public database (DB) server.

According to the embodiments described above, even when normal blood of a subject (for example, a cancer patient) cannot be secured, a copy number variation (CNV) gene can be detected using only a biopsy sample or an FFPE sample of the subject's cancer tissue. Furthermore, even if the genes of the cancer tissue obtained from the subject (the test genes) are slightly damaged chemically by FFPE treatment, the presence of a CNV gene is determined with reference to reference genes obtained under similar conditions (FFPE treatment), so the CNV gene can be detected accurately.

An apparatus according to the present embodiments may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with external devices, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Examples of the computer-readable recording medium include magnetic storage media (for example, read-only memory (ROM), random-access memory (RAM), floppy disks, and hard disks) and optically readable media (for example, CD-ROM and Digital Versatile Disc (DVD)). The computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner. The medium is readable by a computer, can be stored in a memory, and can be executed on a processor.

The present embodiment may be represented by functional block configurations and various processing steps. These functional blocks may be implemented with any number of hardware and/or software configurations that perform particular functions. For example, an embodiment may employ integrated circuit configurations, such as memory, processing, logic, and look-up tables, that can perform various functions under the control of one or more microprocessors or other control devices. Just as the components can be implemented with software programming or software elements, the present embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms running on one or more processors. In addition, the present embodiment may employ conventional techniques for electronic environment setting, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical and physical configurations. These terms may include the meaning of a series of software routines in conjunction with a processor or the like.

The specific implementations described in this embodiment are examples and do not limit the technical scope in any way. For brevity of the specification, descriptions of conventional electronic configurations, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device, they may be represented as various replaceable or additional functional connections, physical connections, or circuit connections.

In this specification (particularly in the claims), the term "the" and similar referring terms may cover both the singular and the plural. In addition, when a range is stated, it includes the individual values falling within that range (unless stated otherwise), which is equivalent to stating each individual value constituting the range in the detailed description. Finally, unless an order is explicitly stated for the steps constituting a method, or there is a statement to the contrary, the steps may be performed in any suitable order. The embodiments are not necessarily limited to the order in which the steps are described.

The present invention has so far been described with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered from a descriptive point of view rather than a limiting one. The scope of the present invention is set out in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

Claims

1. A method of analyzing genes, the method comprising:
generating a reference data set relating to the depths of reads aligned to each of reference genes by performing deep sequencing on the reference genes;
analyzing the depths of reads aligned to each of test genes by performing deep sequencing on the test genes; and
determining whether a copy number variation (CNV) gene is present among the test genes by comparing the analyzed depths with the depths of the reference genes included in the reference data set.

2. The method of claim 1,
wherein the analyzing comprises
analyzing the depths of reads aligned to exon regions of the test genes.

3. The method of claim 2,
wherein the determining comprises
determining whether the CNV gene is present by comparing the depths of the reference genes and the test genes for the same exon region.

4. The method of claim 1,
wherein the determining comprises
determining that the CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the corresponding exon regions of the reference genes and the test genes is not statistically significant.

5. The method of claim 1,
wherein the generating comprises:
obtaining, through deep sequencing of genetic data of a plurality of people, read-depths corresponding to the reference genes for each of the people;
clustering the people into different groups according to the distribution of the obtained read-depths; and
obtaining standard depths of each of the reference genes, which represent each of the groups, by standardizing, for each group, the read-depths obtained for each of the reference genes,
wherein the reference data set
comprises, for each of the groups, data on the standard depths of each of the reference genes.

6. The method of claim 5,
wherein the determining comprises:
determining, among the groups, a group for which the statistical difference between the distribution of the analyzed depths and the distribution of the standard depths is smallest; and
determining whether the CNV gene is present by comparing the analyzed depths with the standard depths corresponding to the determined group.

7. The method of claim 5,
further comprising obtaining the genetic data of the people from public genome data or HapMap data.

8. The method of claim 1,
wherein the reference genes or the test genes
are obtained from biopsy tissue or formalin-fixed, paraffin-embedded (FFPE) tissue.

9. The method of claim 1,
further comprising performing annotation to identify a drug corresponding to the CNV gene when it is determined that the CNV gene is present among the test genes.

10. A computer-readable recording medium storing a program for causing a computer to execute the method of any one of claims 1 to 9.

11. An apparatus for analyzing genes, the apparatus comprising:
a reference data generator configured to generate a reference data set relating to the depths of reads aligned to each of reference genes by performing deep sequencing on the reference genes;
an analyzer configured to analyze the depths of reads aligned to each of test genes by performing deep sequencing on the test genes; and
a determination unit configured to determine whether a copy number variation (CNV) gene is present among the test genes by comparing the analyzed depths with the depths of the reference genes included in the reference data set.

12. The apparatus of claim 11,
wherein the analyzer
analyzes the depths of reads aligned to exon regions of the test genes.

13. The apparatus of claim 12,
wherein the determination unit
determines whether the CNV gene is present by comparing the depths of the reference genes and the test genes for the same exon region.

14. The apparatus of claim 11,
wherein the determination unit
determines that the CNV gene is present when, among the exon regions of the test genes, there is an exon region for which the difference in depth between the corresponding exon regions of the reference genes and the test genes is not statistically significant.

15. The apparatus of claim 11,
wherein the reference data generator
obtains, through deep sequencing of genetic data of a plurality of people, read-depths corresponding to the reference genes for each of the people,
clusters the people into different groups according to the distribution of the obtained read-depths, and
obtains standard depths of each of the reference genes, which represent each of the groups, by standardizing, for each group, the read-depths obtained for each of the reference genes,
wherein the reference data set
comprises, for each of the groups, data on the standard depths of each of the reference genes.

16. The apparatus of claim 15,
wherein the determination unit
determines, among the groups, a group for which the statistical difference between the distribution of the analyzed depths and the distribution of the standard depths is smallest, and
determines whether the CNV gene is present by comparing the analyzed depths with the standard depths corresponding to the determined group.

17. The apparatus of claim 15,
wherein the reference data generator
obtains the genetic data of the people from public genome data or HapMap data.

18. The apparatus of claim 11,
wherein the reference genes or the test genes
are obtained from biopsy tissue or formalin-fixed, paraffin-embedded (FFPE) tissue.

19. The apparatus of claim 11,
wherein the determination unit
performs annotation to identify a drug corresponding to the CNV gene when it is determined that the CNV gene is present among the test genes.