KR102236194B1

KR102236194B1 - Method for selecting function group marker of genes, system and method for disease prediction

Info

Publication number: KR102236194B1
Application number: KR1020200145453A
Authority: KR
Inventors: 이관수; 민범기; 박성용
Original assignee: 한국과학기술원
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2021-04-05
Also published as: KR20200127956A

Abstract

A method of operating a disease discrimination system includes: collecting functional groups, each of which bundles genes with similar functions; extracting multifunctional genes whose frequency of appearance in the functional groups is at or above a threshold; searching, based on functional similarity between multifunctional genes, for at least one common-function group marker composed of a combination of the multifunctional genes, and generating a base ontology having each common-function group marker as a node; expanding the group marker search range to all collected genes, searching, based on functional similarity between genes, for at least one detailed-function group marker composed of a combination of genes, and generating a functional ontology by adding each detailed-function group marker as a node on top of the base ontology; and selecting the nodes constituting the functional ontology as functional group markers.

Description

Method for selecting functional group markers of genes, disease discrimination system, and operating method thereof {METHOD FOR SELECTING FUNCTION GROUP MARKER OF GENES, SYSTEM AND METHOD FOR DISEASE PREDICTION}

The present invention relates to bioinformatics technology.

Various molecular markers, such as mRNA and proteins, are used to determine whether a patient has a disease, the state within a disease, and the mechanism that causes it. Recently, to find markers that discriminate disease states more accurately and consistently, molecular markers showing specific patterns are being discovered from various omics data for each disease state. Omics data measure the variation or expression level of all genes in a cell. Because all genes that can appear in a cell are compared against one another, omics data allow a fair selection that is not biased toward particular genes and an integrative interpretation of the association with disease mechanisms.

Representative omics data include the genome, transcriptome, proteome, and metabolome. Genomic variation data derive from millions of variants across human DNA, so finding accurate associations from a small number of patient samples is severely limited. Proteome and metabolome data have the major advantage of a direct link to cellular function, but the number of measurable genes is still several hundred to about one thousand, a major drawback compared with the transcriptome, which covers all tens of thousands of genes. The transcriptome can be measured for all genes and, because it measures genes expressed as mRNA, its association with disease mechanisms approaches that of protein expression measurements; it is therefore the most actively used.

The mechanism that determines a disease state is a complex composition of unit functions within a cell and higher-level functions that combine them. Intracellular functions include the process by which genes encoded in DNA are expressed as proteins via mRNA, and the process by which the unit functions of proteins determine various metabolites and the next stage of cellular functions. The organized, networked regulatory relationships among the genes that determine unit functions and combined higher-level functions are understood to be formed by the primary expression products of genes (mRNA, proteins) and their secondary products. At present, however, the causal relationships between individual biomolecules and these functions, or between biomolecules and disease, are known only to a very limited extent. Various informatics and modeling methods are therefore applied to make predictions as efficiently and accurately as possible from the available individual-gene or comprehensive omics experimental data. Transcriptome analysis data, such as DNA microarray or RNA-Seq data, cover the widest range of identifiable biomolecules, namely all genes, and because measurement is convenient, such data are available for a large number of patients of various kinds. Transcriptome analysis data can thus be regarded as the most efficient resource for predicting causal relationships between disease mechanisms and biomolecules in a way that reflects the complexity of biological mechanisms.

In selecting biomolecular markers for clinical purposes such as disease diagnosis, prognosis prediction, and companion diagnostics, applying techniques that reflect organized, networked biological mechanisms can secure the accuracy and stability/reproducibility of discrimination. Currently, most marker techniques based on disease transcriptomes select genes whose mRNA (gene transcript) expression level changes significantly with disease state, and then build markers from the expression of individual genes or from combinations of genes. Because a disease state is the combined result of many gene functions, markers based on expression differences of individual genes have a fundamental limitation: even patients in the same disease state show large deviations, so the reliability and stability/reproducibility of the selected markers drop considerably. It is also difficult to distinguish variation that is associated with an individual patient's disease state from variation that is not.

When the variations of genes within a functional group that makes up a disease mechanism are integrated into a single representation, variation unrelated to the disease state cancels out and related variation is reinforced, so a stable marker covering diverse patients in the same disease state can be derived. Recently attempted techniques include grouping several individual molecular markers with similar expression patterns into a single group marker, and grouping genes within previously known functional groups and then forming the final groups according to the similarity of their expression patterns.

However, group markers based on expression-pattern similarity are still affected by the patient group (samples) used for pattern discovery; they improve on individual gene markers, but accuracy and stability are not greatly improved. It is also difficult to determine which functional groups resemble these pattern groups, so until the number of patient samples grows considerably, it is hard to decide whether the pattern groups are meaningful.

Group markers centered on previously known functional groups have the fundamental limitation that, although diverse functional information exists, functional groups are determined using only limited functional information, typified by Gene Ontology. Millions of functional group combinations can currently be derived from various experimental and informatics analysis data, but no technique exists for synthesizing those millions of functional groups and reconstructing them into comprehensive and essential functional groups. As a result, analyses either use a single resource or simply merge all functional information within a resource.

In addition, neither group markers selected on the basis of expression-pattern similarity nor group markers selected around known functional groups have demonstrated the discrimination performance of group marker candidates across various disease states or disease mechanisms. The group markers attempted to date cannot be regarded as stable markers that discriminate disease states on the basis of disease mechanisms.

Accordingly, a new group marker is required that overcomes the limitations of existing individual gene expression markers, whose reliability and stability were low because disease-relevant variation was difficult to find, and that also overcomes the limitations of existing functional group analysis methods, which failed to use diverse functional information.

(Patent Document 1) KR10-1927910 B

(Patent Document 2) KR10-1990429 B

(Patent Document 3) KR10-1860061 B

The problem to be solved is to provide a system and method that (re)compose genes with similar functions into a single functional group marker and discriminate disease using group markers.

The problem to be solved is to provide a system and method that integrate and analyze diverse functional information, reconstruct genes with comprehensive and essential functions associated with disease to discover disease group marker candidates, and select disease group markers by verifying their power to discriminate disease.

The problem to be solved is to provide a system and method that reconstruct diverse functional groups to discover new functional group markers, score the disease association of each functional group marker to discover disease group markers, and construct a discrimination model capable of discriminating disease states based on the disease group markers.

A method of operating a disease discrimination system according to one embodiment includes: collecting functional groups, each of which bundles genes with similar functions; extracting multifunctional genes whose frequency of appearance in the functional groups is at or above a threshold; searching, based on functional similarity between multifunctional genes, for at least one common-function group marker composed of a combination of the multifunctional genes, and generating a base ontology having each common-function group marker as a node; expanding the group marker search range to all collected genes, searching, based on functional similarity between genes, for at least one detailed-function group marker composed of a combination of genes, and generating a functional ontology by adding each detailed-function group marker as a node on top of the base ontology; and selecting the nodes constituting the functional ontology as functional group markers.
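The extraction step can be illustrated with a minimal sketch. It assumes that "frequency of appearance" means the number of collected functional groups that contain a gene; the function name `extract_multifunctional_genes`, the parameter `min_groups`, and the toy group contents are hypothetical and are not taken from the specification.

```python
from collections import Counter

def extract_multifunctional_genes(functional_groups, min_groups):
    """Return genes that appear in at least `min_groups` functional groups.

    Assumption: frequency = number of groups containing the gene.
    """
    counts = Counter()
    for genes in functional_groups.values():
        counts.update(set(genes))  # count each gene once per group
    return {gene for gene, n in counts.items() if n >= min_groups}

# Toy functional groups (hypothetical names and memberships).
groups = {
    "G1": {"TP53", "MYC", "EGFR"},
    "G2": {"TP53", "MYC"},
    "G3": {"TP53", "BRCA1"},
}
multifunctional = extract_multifunctional_genes(groups, min_groups=2)
```

With these toy groups, only genes shared by two or more groups are kept as the starting set for the base ontology.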

Generating the base ontology may repeat a procedure of: extending a first gene network by connecting multifunctional gene pairs in descending order of functional similarity between the multifunctional genes; searching the first gene network for a maximal clique (maximum clique); and, if the gene set corresponding to the found maximal clique exists in the collected functional groups, generating the found maximal clique as a node of the ontology. The gene set corresponding to the maximal clique may be the common-function group marker.

Generating the base ontology may include, after adding each multifunctional gene pair to the first gene network, searching whether the added pair extends maximal cliques found with previously added multifunctional gene pairs or forms a new maximal clique.
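The incremental search described above can be sketched as follows. This is only an illustration under stated assumptions: the specification does not name a clique algorithm, so a plain Bron-Kerbosch enumeration restricted to the common neighbors of the newly connected pair is swapped in here, and "exists in the collected functional groups" is interpreted as "is a subset of at least one collected group". The names `maximal_cliques`, `grow_network`, `scored_pairs`, and `known_groups` are hypothetical.

```python
def maximal_cliques(adj, candidates):
    """Bron-Kerbosch enumeration of maximal cliques inside `candidates`."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(candidates), set())
    return found

def grow_network(scored_pairs, known_groups):
    """Add gene pairs in descending similarity order and collect the cliques
    that each new pair creates or extends."""
    adj, accepted = {}, []
    for (u, v), _score in sorted(scored_pairs.items(), key=lambda kv: -kv[1]):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        # Every clique containing the new edge lies in {u, v} plus common neighbors.
        for extra in maximal_cliques(adj, adj[u] & adj[v]):
            clique = frozenset({u, v}) | extra
            # Assumed reading of "exists in the collected functional groups".
            if any(clique <= g for g in known_groups) and clique not in accepted:
                accepted.append(clique)
    return accepted

# Toy similarity scores between hypothetical genes A to D.
pairs = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7, ("C", "D"): 0.4}
known = [frozenset("ABC"), frozenset("CD")]
accepted = grow_network(pairs, known)
```

In this toy run the pair A-C closes the triangle A-B-C, so the earlier two-gene cliques are extended into a three-gene clique, while C-D forms a new clique of its own.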

Generating the base ontology may include: determining the gene set corresponding to the found maximal clique as a node candidate; if the ontology contains a node that is a subset of the node candidate, adding the node candidate as a parent node of that subset node; and if the ontology contains no such subset node, adding the node candidate as a terminal node.
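The node insertion rule can be sketched with a small dictionary-based ontology. The representation (a mapping from each node to the set of its child nodes) and the function name `add_node` are assumptions for illustration. Linking the candidate only to its largest subset nodes, rather than to every subset node, is also a design choice of this sketch and is not stated in the specification.

```python
def add_node(ontology, candidate):
    """Insert `candidate` (a set of genes) into `ontology`.

    `ontology` maps each node (a frozenset of genes) to the set of its child nodes.
    """
    candidate = frozenset(candidate)
    if candidate in ontology:
        return
    subsets = [node for node in ontology if node < candidate]
    # Assumption: become the direct parent of the largest subset nodes only.
    children = {s for s in subsets if not any(s < t for t in subsets)}
    ontology[candidate] = children  # an empty set marks a terminal node

# Hypothetical gene sets, inserted in the order in which cliques might be found.
ontology = {}
for gene_set in ["AB", "BC", "ABC", "CD"]:
    add_node(ontology, gene_set)
```

Here the set {A, B, C} becomes the parent of {A, B} and {B, C}, and {C, D} is added as a terminal node because no existing node is a subset of it.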

Generating the functional ontology may repeat a procedure of: calculating the functional similarity between genes for the collected genes; extending a second gene network by connecting gene pairs in descending order of functional similarity; searching the second gene network for a maximal clique (maximum clique); and, if the gene set corresponding to the found maximal clique exists in the collected functional groups, generating the found maximal clique as a node of the ontology. The gene set corresponding to the maximal clique may be the detailed-function group marker.

Generating the functional ontology may include, after adding each gene pair to the second gene network, searching whether the added pair extends maximal cliques found with previously added gene pairs or forms a new maximal clique.

Generating the functional ontology may include: determining the gene set corresponding to the found maximal clique as a node candidate; if the ontology contains a node that is a subset of the node candidate, adding the node candidate as a parent node of that subset node; and if the ontology contains no such subset node, adding the node candidate as a terminal node.

The operating method may further include selecting, as a disease group marker, a functional group marker that, among the functional group markers, significantly includes disease markers and whose activation score differs significantly between disease and non-disease microarrays.
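The two selection criteria can be sketched as below. The passage above does not specify the statistical tests or the activation score formula, so everything here is an assumption made for illustration: the overlap with disease markers is tested with a hypergeometric tail probability, the activation score is taken to be the mean expression of the group's genes, and the disease versus non-disease difference is judged by Welch's t statistic against a fixed cutoff. All function names, parameters, thresholds, and data are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, variance

def enrichment_pvalue(n_universe, n_disease, n_group, n_overlap):
    """Hypergeometric tail P(overlap >= n_overlap) for a group of n_group genes."""
    tail = sum(
        comb(n_disease, k) * comb(n_universe - n_disease, n_group - k)
        for k in range(n_overlap, min(n_disease, n_group) + 1)
    )
    return tail / comb(n_universe, n_group)

def activation_score(sample, group):
    """Assumed scoring rule: mean expression of the group's genes in one sample."""
    return mean(sample[g] for g in group)

def welch_t(xs, ys):
    """Welch's t statistic for two lists of activation scores."""
    return (mean(xs) - mean(ys)) / sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))

def is_disease_group_marker(group, disease_genes, universe, disease_samples,
                            normal_samples, alpha=0.05, t_cutoff=2.0):
    p = enrichment_pvalue(len(universe), len(disease_genes), len(group),
                          len(group & disease_genes))
    if p >= alpha:  # criterion (a) fails: no significant overlap with disease markers
        return False
    d = [activation_score(s, group) for s in disease_samples]
    n = [activation_score(s, group) for s in normal_samples]
    return abs(welch_t(d, n)) > t_cutoff  # criterion (b), assumed cutoff

# Toy data: 20 hypothetical genes g0..g19, three of them known disease markers.
universe = {f"g{i}" for i in range(20)}
disease_genes = {"g0", "g1", "g2"}
disease_samples = [{"g0": 5.0, "g1": 5.2, "g2": 4.8},
                   {"g0": 5.5, "g1": 5.1, "g2": 5.3},
                   {"g0": 4.9, "g1": 5.0, "g2": 5.1}]
normal_samples = [{"g0": 1.0, "g1": 1.2, "g2": 0.9},
                  {"g0": 1.1, "g1": 0.8, "g2": 1.0},
                  {"g0": 0.9, "g1": 1.1, "g2": 1.2}]
selected = is_disease_group_marker({"g0", "g1", "g2"}, disease_genes, universe,
                                   disease_samples, normal_samples)
```

A real pipeline would likely replace the fixed t cutoff with a proper p-value and a multiple-testing correction across all functional group markers.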

The operating method may further include: generating training data by calculating activation scores of the disease group markers for each microarray sample of the disease state to be discriminated; training a discrimination model that discriminates a specific disease state based on the activation scores of the disease group markers; upon receiving a request to discriminate the disease state of a specific sample, inputting the activation scores of the disease group markers for that sample into the trained discrimination model; and outputting the disease state of the specific sample based on the discrimination value output by the discrimination model.
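The train-then-discriminate flow can be sketched with a deliberately simple stand-in model. The passage above does not state which model family is used, so a nearest-centroid classifier over activation-score vectors is assumed here purely for illustration; `train_centroids`, `predict`, and the toy scores are hypothetical.

```python
from math import dist
from statistics import mean

def train_centroids(features, labels):
    """Compute one centroid per disease state from activation-score vectors."""
    by_label = {}
    for vec, lab in zip(features, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the disease state whose centroid is closest to `vec`."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))

# Each row holds the activation scores of two hypothetical disease group markers.
X = [[5.0, 1.0], [5.2, 0.8], [1.1, 3.0], [0.9, 3.2]]
y = ["disease", "disease", "normal", "normal"]
model = train_centroids(X, y)
state = predict(model, [4.8, 1.1])
```

Any classifier that accepts a fixed-length vector of activation scores could take the place of the centroid model in this flow.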

A method of operating a disease discrimination system according to another embodiment includes: collecting functional groups, each of which bundles genes with similar functions; extending a gene network by connecting gene pairs based on the similarity of the functional groups that contain the collected genes, searching for functional group markers each composed of a combination of genes connected in the gene network, and generating an ontology having each functional group marker as a node; selecting the nodes constituting the ontology as functional group markers; and selecting, as a disease group marker, a functional group marker that, among the functional group markers, significantly includes disease markers and whose activation score differs significantly between disease and non-disease microarrays.
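One way to score gene pairs by "the similarity of the functional groups that contain the genes" is sketched below. The similarity measure is not defined in the passage above, so the Jaccard index of the two genes' group memberships is assumed; `gene_similarity` and the toy groups are hypothetical.

```python
from itertools import combinations

def gene_similarity(functional_groups):
    """Score each gene pair by the Jaccard index of its group memberships."""
    membership = {}
    for name, genes in functional_groups.items():
        for g in genes:
            membership.setdefault(g, set()).add(name)
    scores = {}
    for a, b in combinations(sorted(membership), 2):
        shared = membership[a] & membership[b]
        if shared:  # pairs that share no group are left unconnected
            scores[(a, b)] = len(shared) / len(membership[a] | membership[b])
    return scores

# Toy functional groups over hypothetical genes A to D.
toy_groups = {"G1": {"A", "B", "C"}, "G2": {"A", "B"}, "G3": {"C", "D"}}
scores = gene_similarity(toy_groups)
ranked = sorted(scores, key=scores.get, reverse=True)  # order for extending the network
```

Pairs at the front of `ranked` would be connected first when the gene network is extended.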

Generating the ontology may include: extracting, as multifunctional genes, genes whose frequency of appearance in the functional groups is at or above a threshold; searching, based on functional similarity between multifunctional genes, for at least one common-function group marker composed of a combination of the multifunctional genes, and generating a base ontology having each common-function group marker as a node; and expanding the group marker search range to all collected genes, searching, based on functional similarity between genes, for at least one detailed-function group marker composed of a combination of genes, and generating the ontology by adding each detailed-function group marker as a node on top of the base ontology.

Generating the ontology may repeat a procedure of: extending the gene network by connecting gene pairs in descending order of functional similarity between genes; searching the gene network for a maximal clique (maximum clique); and generating the gene set corresponding to the found maximal clique as a node of the ontology.

Collecting the functional groups may use a database that provides functional gene set information, together with databases that provide biological pathways contained in disease pathways, various regulator-target information, and gene interaction information, to collect functional groups in which genes with similar functions are bundled.

A disease discrimination system according to one embodiment includes: a functional group marker discovery device that collects functional groups in which genes with similar functions are bundled, extends a gene network by sequentially connecting gene pairs based on the similarity of the functional groups that contain the collected genes, and selects the gene sets of maximal cliques (maximum cliques) found in the gene network as functional group markers; and a disease discrimination model generation device that selects, as disease group markers, those functional group markers that significantly include disease markers and whose activation scores differ significantly between disease and non-disease microarrays, calculates activation scores of the disease group markers for each microarray sample, and trains a discrimination model that discriminates a specific disease state based on the activation scores of the disease group markers.

The functional group marker discovery device may extract multifunctional genes whose frequency of appearance in the functional groups is at or above a threshold, extend a first gene network by connecting multifunctional gene pairs sorted by functional similarity between multifunctional genes, and generate a base ontology whose nodes are the gene sets of maximal cliques found in the first gene network. The functional group marker discovery device may sort gene pairs by the functional similarity between genes over all collected genes, extend a second gene network by connecting the gene pairs in that order, and generate a final ontology by adding the gene sets of maximal cliques found in the second gene network to the base ontology. The functional group marker discovery device may select the nodes constituting the final ontology as functional group markers.

Upon receiving a request to discriminate the disease state of a specific sample, the disease discrimination model generation device may input the activation scores of the disease group markers for that sample into the trained discrimination model and output the disease state of the sample based on the discrimination value output by the discrimination model.

According to the embodiments, genes with similar functions can be reconstructed into comprehensive and essential functional groups. This makes it possible to reflect disease mechanisms, which are complex compositions of unit functions within a cell and higher-level functions that combine them, and thereby to increase the accuracy and stability/reproducibility of disease discrimination.

According to the embodiments, gene sets previously known as functional groups are integrated and reconstructed, so that a gene set with a known function-based causal relationship can be discovered as a single "functional group marker". In particular, by combining "common-function group markers" associated with diverse functions and "detailed-function group markers" associated with specific detailed functions, functional group markers that can explain disease and cellular functions can be selected.

According to the embodiments, analysis of disease markers and disease transcriptome data makes it possible to select "disease group markers" that have a functional causal relationship and show a disease-specific expression pattern. Because the discrimination model is built on the disease group markers, the reproducibility of disease state discrimination can be increased.

Disease group markers selected according to the embodiments can be used widely for disease diagnosis, prognosis prediction, companion diagnostics, and the like, and can be manufactured and used as a microarray or multiplex assay kit.

According to the embodiments, it is possible to overcome the limitations of existing individual gene expression markers, whose reliability and stability were low because disease-relevant variation was difficult to find, and to solve the problem of existing functional group analysis methods, which could not synthesize and use diverse functional information.

FIG. 1 is a block diagram of a disease discrimination system using group markers of functionally similar genes according to one embodiment.
FIG. 2 is a flowchart of a method of constructing a base ontology according to one embodiment.
FIG. 3 is a flowchart of a method of constructing a functional ontology according to one embodiment.
FIG. 4 is a flowchart of a method of selecting disease group markers according to one embodiment.
FIG. 5 is a flowchart of a method of generating a disease discrimination model according to one embodiment.
FIG. 6 is a diagram illustrating, by way of example, a method of constructing a base ontology according to one embodiment.
FIG. 7 is a diagram illustrating, by way of example, a method of constructing a functional ontology according to one embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

Throughout the specification, a group marker composed of a group of genes (gene set) with identical or similar functions is referred to simply as a "functional group marker". Depending on the range of functions associated with it, a functional group marker may be called a common-function group marker or a detailed-function group marker. Genes may be judged functionally similar by a similarity criterion, or information in which they are already known to be functionally similar may be used.

FIG. 1 is a block diagram of a disease determination system that uses group indicators of functionally similar genes, according to an embodiment.

Referring to FIG. 1, the disease determination system 10 is operated by at least one processor and includes a functional group indicator discovery device 100, which selects functionally similar genes as functional group indicators, and a disease discrimination model generation device 200, which evaluates the disease association of the functional group indicators to select disease group indicators and generates a discrimination model based on the disease group indicators. The disease determination system 10 may further include a database 300 that stores information needed for the operation of the functional group indicator discovery device 100 and the disease discrimination model generation device 200. The functional group indicator discovery device 100, the disease discrimination model generation device 200, and the database 300 interwork through a communication interface and can exchange the information needed for the operation of the present invention. At least part of the database 300 may be built into the disease determination system 10 or implemented on an external server.

First, depending on the type of data, the database 300 can be divided into a functional gene set database 310, a pathway and interaction database 320, a disease indicator database 330, and a microarray database 340. Naturally, the database 300 may be implemented in various ways depending on the design; its parts need not be physically separate, nor need they reside in the same physical location.

The functional gene set database 310 provides information on the relationships between functions and genes. For example, the functional gene set database 310 may provide functional gene set information from Gene Ontology, MSigDB, Enrichr, and the like.

The pathway and interaction database 320 provides biological pathways included in disease pathways, various regulator-target information, and gene interaction information. For example, the pathway and interaction database 320 may provide biological pathways included in disease pathways in the BioCarta, HumanCyc, KEGG Pathway, NCI-PID, Panther Pathway, PharmGKB, Reactome, and SMPDB databases; TF regulator-target information in the TRANSFAC database; E3 regulator-target information in the E3Net database; kinase regulator-target information in the PhosphoSitePlus database; phosphatase regulator-target information in the DEPOD database; and gene interaction information in the HIPPIE database.

The disease indicator database 330 provides known disease indicators. For example, the disease indicator database 330 may provide known disease gene information from the DisGeNet database and drug target information for diseases from the DrugCentral database.

The microarray database 340 provides microarray information. For example, the microarray database 340 may provide gene expression microarray information from the Gene Expression Omnibus and ArrayExpress databases.

The functional group indicator discovery device 100 selects functionally similar genes as functional group indicators using information from the functional gene set database 310 and the pathway and interaction database 320. To this end, the functional group indicator discovery device 100 may include a gene function integration extraction unit 110, a base ontology construction unit 130, and a functional group indicator selection unit 150.

The gene function integration extraction unit 110 obtains integrated gene function information by combining the functional information of genes contained in known functional gene sets with the functional information of genes contained in pathway and interaction information. The gene function integration extraction unit 110 may also add functional information obtained from an arbitrary user-defined list of gene sets to the integrated gene function information. From the integrated gene function information, the gene function integration extraction unit 110 obtains the multifunctional genes and the inter-gene functional similarity information.

From function-gene information such as biological function, molecular function, and intracellular location, the gene function integration extraction unit 110 may bundle genes having the same or similar functions and collect them as functional groups (modules). Here, the gene function integration extraction unit 110 may extract functional modules from a functional gene set database 310 such as Gene Ontology, MSigDB, or Enrichr.

In addition, the gene function integration extraction unit 110 may bundle genes that share the same pathway, based on the pathway information in the pathway and interaction database 320, and collect them as functional groups. When the pathway and interaction database 320 has no pathway but does have interaction information for regulators such as TFs, E3s, miRNAs, kinases, and phosphatases, the gene function integration extraction unit 110 may bundle genes regulated by the same regulator and collect them as a functional group. When the pathway and interaction database 320 has only plain interaction information, the gene function integration extraction unit 110 may bundle the 1-hop network of each gene into one gene set and collect it as a functional group.
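The two fallback groupings described above (genes sharing a regulator, and the 1-hop network of each gene) can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function names, the `(regulator, target)` and `(gene, gene)` tuple formats, and the choice to include the center gene itself in its 1-hop set are all assumptions.

```python
from collections import defaultdict


def groups_by_regulator(regulator_target_pairs):
    """Bundle target genes that are regulated by the same regulator.

    regulator_target_pairs: iterable of (regulator, target) tuples
    (hypothetical input format).
    """
    groups = defaultdict(set)
    for regulator, target in regulator_target_pairs:
        groups[regulator].add(target)
    return dict(groups)


def one_hop_groups(interactions):
    """Bundle each gene with its direct interaction partners.

    interactions: iterable of undirected (gene_a, gene_b) tuples.
    Including the center gene in its own set is an assumption.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: {gene} | partners for gene, partners in neighbors.items()}
```

Each returned dictionary maps a group key (the regulator, or the center gene) to a gene set, so both results can be merged into one collection of functional groups.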

Beyond this, the gene function integration extraction unit 110 may collect, as functional groups, an arbitrary user-defined list of gene sets, such as associated gene sets derived from omics data (for example, microarrays) and condition-specific expression gene sets.

From the integrated gene function information that combines the various collected functional groups, the gene function integration extraction unit 110 calculates, for each gene, how frequently it appears across the functional groups, and extracts the genes whose frequency is high according to a given criterion as multifunctional genes. The criterion may be set, for example, to the top 5% or to a p-value of 0.05 or less.
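The frequency-based extraction can be sketched as below, using the "top 5%" criterion mentioned above as the default. The function name, the dictionary input format, the alphabetical tie-breaking, and the rounding of the cutoff are assumptions made for illustration; the p-value variant is not shown.

```python
import math
from collections import Counter


def multifunctional_genes(functional_groups, top_fraction=0.05):
    """Return the genes that appear in the most functional groups.

    functional_groups: dict mapping group name -> iterable of genes
    (hypothetical format). top_fraction=0.05 corresponds to "top 5%".
    """
    # Count, for every gene, the number of groups it belongs to.
    counts = Counter(
        gene for genes in functional_groups.values() for gene in set(genes)
    )
    # Rank by descending frequency; ties are broken alphabetically (assumption).
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    cutoff = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:cutoff]
```

For example, with three groups `{"A","B"}`, `{"A","C"}`, and `{"A","B","D"}` and `top_fraction=0.5`, gene A appears three times and gene B twice, so the two are returned in that order.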

The gene function integration extraction unit 110 also performs a similarity analysis on every gene pair in the integrated gene function information, scoring how similar the sets of functional groups containing each gene are. The inter-gene functional similarity information calculated through this analysis can be expressed as an inter-gene functional similarity matrix. The gene function integration extraction unit 110 may, for example, use the Jaccard index to analyze the functional similarity between genes.
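As one way to read the Jaccard-based analysis, the similarity of two genes can be taken as the size of the intersection of the functional groups containing each gene divided by the size of their union. The sketch below assumes that reading; the function names and the sparse dictionary output (instead of a dense matrix) are illustrative choices.

```python
from itertools import combinations


def gene_memberships(functional_groups):
    """Map each gene to the set of functional group names that contain it."""
    memberships = {}
    for name, genes in functional_groups.items():
        for gene in genes:
            memberships.setdefault(gene, set()).add(name)
    return memberships


def jaccard_similarity(functional_groups):
    """Return {(gene_a, gene_b): Jaccard index} for every gene pair.

    Keys are ordered alphabetically so each unordered pair appears once.
    """
    memberships = gene_memberships(functional_groups)
    similarity = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = memberships[a] & memberships[b]
        union = memberships[a] | memberships[b]  # never empty: each gene is in a group
        similarity[(a, b)] = len(shared) / len(union)
    return similarity
```

Sorting the resulting pairs by descending value gives the order in which gene pairs would be added to the gene network in the later steps.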

Using the multifunctional gene information and the inter-gene functional similarity information extracted by the gene function integration extraction unit 110, the base ontology construction unit 130 and the functional group indicator selection unit 150 discover a gene set with a known function-based causal relationship as a single "functional group indicator". A functional group indicator consists of a gene set associated with a function.

Specifically, the base ontology construction unit 130 and the functional group indicator selection unit 150 each select functional group indicators based on a graph-theoretic gene network, in which the relationship of a gene pair is represented by gene nodes (vertices) joined by a connecting line (edge), and on a hierarchical ontology in which each maximal clique extracted from the gene network becomes an ontology node. The ontology is represented as a set of directed acyclic graphs. It is built up from base nodes, each composed of a small set of multifunctional genes, by connecting upper (parent) nodes composed of detailed-function gene sets related to more finely divided functions, or by adding new terminal nodes. In the ontology, an upper (parent) node contains the gene set of its lower node, and a lower node is a subset of the gene set contained in its upper node.

The base ontology construction unit 130 and the functional group indicator selection unit 150 need not be implemented independently, but they are described as separate components in order to explain, step by step, how the base ontology is constructed and how the final ontology is built on top of it. Likewise, functional group indicators may be described separately as "common functional group indicators", associated with multiple functions, and "detailed functional group indicators", associated with finely divided functions, but both may simply be called functional group indicators.

The base ontology construction unit 130 constructs the base ontology from the functional similarity information of the multifunctional genes extracted by the gene function integration extraction unit 110. The base ontology construction unit 130 extracts the functional similarity information between multifunctional genes from the inter-gene functional similarity information. Proceeding in order of decreasing functional similarity, it connects the two genes of each pair as nodes joined by an edge, thereby expanding the gene network step by step. As it adds gene pairs, the base ontology construction unit 130 searches the gene network for maximal cliques, that is, cliques to which no further gene node can be added, and turns the multifunctional genes contained in each maximal clique into a common functional group indicator in the ontology. In other words, the base ontology construction unit 130 creates a base ontology node from the multifunctional genes contained in a maximal clique, and the multifunctional genes making up a base ontology node are selected as a common functional group indicator. At each step of adding a gene pair to the gene network, the base ontology construction unit 130 checks for maximal cliques, and when the multifunctional genes contained in a maximal clique form a subset of at least one of the functional groups collected by the gene function integration extraction unit 110, it may create an ontology node, which is a common functional group indicator, from those genes.

The functional group indicator selection unit 150 takes the base ontology that the base ontology construction unit 130 built from multifunctional gene pairs and expands it using all gene pairs. In the same way as the base ontology construction method, the functional group indicator selection unit 150 expands the gene network with all gene pairs and adds the genes of each maximal clique as an upper node of a base ontology node. The genes making up an upper ontology node are selected as a detailed functional group indicator, related to a more specific function than a lower ontology node or a base ontology node. Because the functional group indicator selection unit 150 expands the gene network with all gene pairs, functional groups of genes that are not multifunctional genes may also arise. In that case, the functional group indicator selection unit 150 adds such a functional group to the ontology as a detailed functional group indicator.

In this way, the functional group indicator selection unit 150 integrates the functional information of many genes, calculates inter-gene functional similarity information that scores how similar the functional groups containing each gene are, and uses this to organize the functional group indicators into an ontology. The functional group indicator selection unit 150 can therefore restructure various functional gene sets and extract a range of functional group indicators, from common functional group indicators to detailed functional group indicators.

The disease discrimination model generation device 200 includes a disease group indicator selection unit 210, which evaluates the disease association of the functional group indicators to select disease group indicators, and a disease state discriminator 230, which generates a discrimination model based on the disease group indicators.

The disease group indicator selection unit 210 evaluates whether each functional group indicator significantly contains disease indicators and whether it shows a significant expression difference in the microarray data, and selects disease group indicators based on the evaluation result. Specifically, the disease group indicator selection unit 210 may filter, from all functional group indicators, those that significantly contain disease indicators (disease markers) extracted from the disease indicator database 330, and treat them as disease group indicator candidates. The disease group indicator selection unit 210 may then calculate a per-sample activation score for each disease group indicator candidate based on microarray gene expression data extracted from the microarray database 340, and select as final disease group indicators the candidates that show a significant difference between the disease states of interest. A disease group indicator consists of a disease-associated gene set.
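The specification does not name the statistical test used to decide that a functional group indicator "significantly contains" disease markers. As one plausible reading, the sketch below uses a one-sided hypergeometric test; that choice, the function names, and the `alpha=0.05` threshold are assumptions and not the patented method.

```python
from math import comb


def hypergeom_pvalue(population, markers_in_population, group_size, markers_in_group):
    """P(X >= markers_in_group) when drawing group_size genes without replacement."""
    total = comb(population, group_size)
    upper = min(group_size, markers_in_population)
    tail = sum(
        comb(markers_in_population, k)
        * comb(population - markers_in_population, group_size - k)
        for k in range(markers_in_group, upper + 1)
    )
    return tail / total


def disease_group_candidates(functional_groups, disease_markers, all_genes, alpha=0.05):
    """Keep the groups whose disease-marker overlap is significant (assumed test)."""
    all_genes = set(all_genes)
    markers = set(disease_markers) & all_genes
    selected = {}
    for name, genes in functional_groups.items():
        genes = set(genes) & all_genes
        p_value = hypergeom_pvalue(
            len(all_genes), len(markers), len(genes), len(genes & markers)
        )
        if p_value <= alpha:
            selected[name] = p_value
    return selected
```

In practice a multiple-testing correction would likely be needed when many groups are screened; it is omitted here to keep the sketch short.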

The disease state discriminator 230 generates training data by calculating the activation scores of the disease group indicators for each microarray sample of the disease states to be discriminated, and trains a discrimination model that can determine a specific disease state from those activation scores. The discrimination model may be a support vector machine (SVM) classifier capable of discriminating disease states in microarray data, but various other learning models may be used.

When asked to determine the disease state of a new sample, the disease state discriminator 230 calculates the activation scores of the disease group indicators for the new sample and feeds them into the trained discrimination model. The disease state discriminator 230 then determines the disease state of the new sample from the value output by the discrimination model. That is, the disease state discriminator 230 replaces the gene expression values in the sample with the activation scores of the disease group indicators used to train the discrimination model, and uses these scores as the model input.
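The substitution of gene expression values by group-level activation scores can be sketched as follows. The specification does not define how an activation score is computed, so the mean expression of the group's genes is used here purely as a placeholder assumption; the function names and input formats are also hypothetical.

```python
from statistics import mean


def activation_scores(sample_expression, disease_groups):
    """Replace per-gene expression with one score per disease group indicator.

    sample_expression: dict gene -> expression value for one sample.
    disease_groups: dict group name -> iterable of genes.
    The score is the mean expression of the group's measured genes
    (assumption); groups with no measured gene get 0.0.
    """
    scores = {}
    for name, genes in disease_groups.items():
        values = [sample_expression[g] for g in genes if g in sample_expression]
        scores[name] = mean(values) if values else 0.0
    return scores


def feature_vector(sample_expression, disease_groups):
    """Order the scores by group name so every sample yields the same layout."""
    scores = activation_scores(sample_expression, disease_groups)
    return [scores[name] for name in sorted(scores)]
```

The resulting fixed-length vectors are what a classifier such as an SVM would be trained on and later queried with for a new sample.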

In this way, the disease determination system 10 extracts integrated gene function information from functional information such as various functional gene sets and pathway and interaction information, and extracts functional group indicators based on inter-gene functional similarity. The functional group indicators include common functional group indicators and detailed functional group indicators: the disease determination system 10 first builds a base ontology of common functional group indicators from the multifunctional genes, then expands the base ontology by searching for detailed functional group indicators using all genes. From the functional group indicators obtained by this integrated analysis of diverse functional information, the disease determination system 10 selects disease group indicators associated with a disease, based on known disease indicators and microarray data, and then uses the selected disease group indicators to train a discrimination model capable of determining the disease state.

As described, the disease state discriminator of the present invention selects disease-associated gene sets, defined as disease group indicators, and builds the discrimination model from a combination of them. The present invention can therefore overcome the limited reproducibility of conventional single markers or marker sets and yield a more robust discrimination model. Moreover, by using group information that conventional gene set selection methods could not exploit, the present invention links directly to known functional information, so the underlying mechanism can be interpreted at the same time.

FIG. 2 is a flowchart of a method for constructing a base ontology according to an embodiment.

Referring to FIG. 2, based on various functional gene sets and pathway and interaction information, the functional group indicator discovery device 100 bundles genes (gene sets) having the same or similar function, pathway, or interaction into functional groups and extracts integrated gene function information (S110).

The functional group indicator discovery device 100 extracts multifunctional genes from the integrated gene function information in which the various functional groups are combined (S120). Multifunctional genes can be determined from how frequently each gene appears across the functional groups.

The functional group indicator discovery device 100 calculates the functional similarity between multifunctional genes and sorts the multifunctional gene pairs in order of decreasing functional similarity (S130). The inter-gene functional similarity is calculated by a similarity analysis that scores, for each gene pair, how similar the functional groups containing each gene are, and it can be expressed as an inter-gene functional similarity matrix.

The functional group indicator discovery device 100 adds the multifunctional gene pairs to the gene network in order of decreasing functional similarity and searches the gene network for maximal cliques (S140). To do so, the functional group indicator discovery device 100 lists all multifunctional gene pairs as candidate connections, creates an empty network to be grown by adding connections, and expands the gene network step by step, starting from the gene pairs with the highest functional similarity. Searching for all maximal cliques from scratch each time a gene pair is added would be computationally expensive. The functional group indicator discovery device 100 therefore starts from the maximal cliques found in the previous step and computes only whether the newly added gene can extend an existing maximal clique or form a new one. The functional group indicator discovery device 100 may also search for maximal cliques in parallel to reduce computation time.
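The incremental update described above can be sketched with a standard observation: after an edge (u, v) is added, every newly created maximal clique contains both u and v, so only the common neighbors of u and v need to be searched. The sketch below uses a basic Bron-Kerbosch recursion restricted to that neighborhood; this is one possible realization, not necessarily the procedure used in the specification, and the function names and data layout are assumptions. Parallel search is not shown.

```python
def _extend(adj, clique, candidates, excluded, found):
    """Basic Bron-Kerbosch recursion collecting maximal cliques into found."""
    if not candidates and not excluded:
        found.append(frozenset(clique))
        return
    for gene in list(candidates):
        _extend(adj, clique | {gene}, candidates & adj[gene], excluded & adj[gene], found)
        candidates = candidates - {gene}
        excluded = excluded | {gene}


def add_gene_pair(adj, cliques, u, v):
    """Add edge (u, v) and update the set of maximal cliques in place.

    adj: dict gene -> set of neighboring genes.
    cliques: set of frozensets, the maximal cliques found so far.
    Returns the maximal cliques created by this edge.
    """
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    found = []
    # Every new maximal clique contains u and v plus some common neighbors.
    _extend(adj, {u, v}, adj[u] & adj[v], set(), found)
    new_cliques = set(found) - cliques
    # Old cliques swallowed by a new, larger clique are no longer maximal.
    absorbed = {c for c in cliques if any(c < n for n in new_cliques)}
    cliques -= absorbed
    cliques |= new_cliques
    return new_cliques
```

Calling `add_gene_pair` once per gene pair, in order of decreasing similarity, keeps `cliques` equal to the maximal cliques of the current network without re-enumerating the whole graph.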

When a new maximal clique is found in the gene network, the functional group indicator discovery device 100 determines the gene set of that maximal clique to be a common functional group indicator and selects it as an ontology node candidate (S150). The functional group indicator discovery device 100 may select the gene set of a maximal clique as an ontology node candidate when the collected integrated gene function information contains a functional group that includes every gene of the maximal clique, or when the maximal clique can be composed from a combination of nodes (gene sets) already in the ontology.
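The two acceptance conditions for an ontology node candidate can be sketched as a single predicate. How a clique is "composed from a combination of nodes" is not spelled out in the text; the sketch assumes it means that the union of the existing ontology nodes contained in the clique equals the clique. The function name and input formats are likewise hypothetical.

```python
def is_node_candidate(clique, functional_groups, ontology_nodes):
    """Check whether a maximal clique qualifies as an ontology node candidate.

    clique: iterable of genes.
    functional_groups: dict group name -> iterable of genes.
    ontology_nodes: iterable of frozensets already in the ontology.
    """
    clique = frozenset(clique)
    # Condition 1: some collected functional group contains the whole clique.
    if any(clique <= set(genes) for genes in functional_groups.values()):
        return True
    # Condition 2 (assumed reading): existing nodes inside the clique cover it.
    covered = set()
    for node in ontology_nodes:
        if node <= clique:
            covered |= node
    return covered == clique
```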

The functional group indicator discovery device 100 searches the ontology for a node that is a subset of the ontology node candidate (S160). That is, the functional group indicator discovery device 100 looks for a node composed of a subset of the gene set that makes up the ontology node candidate.

If the ontology contains a node that is a subset of the ontology node candidate, the functional group indicator discovery device 100 adds the candidate as a parent node of that subset node; if there is no such node, it adds the candidate as a new terminal node (S170).
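Steps S160 and S170 amount to a subset lookup followed by an insertion. A minimal sketch is given below, storing for each node the set of its child (subset) nodes. The class name, the storage layout, and the choice to link a new parent only to its largest subset nodes (so that indirect descendants are not linked twice) are assumptions for illustration.

```python
class GeneSetOntology:
    """Hierarchy in which a parent node is a superset of each of its children."""

    def __init__(self):
        # node (frozenset of genes) -> set of direct child nodes
        self.children = {}

    def add(self, candidate):
        """Insert a candidate as a parent of its subset nodes, or as a terminal node."""
        candidate = frozenset(candidate)
        if candidate in self.children:
            return
        # S160: look for existing nodes that are proper subsets of the candidate.
        subsets = {node for node in self.children if node < candidate}
        # Link only the largest subsets directly (assumption).
        direct = {n for n in subsets if not any(n < m for m in subsets)}
        # S170: with no subset node, direct is empty and the node is terminal.
        self.children[candidate] = direct

    def terminal_nodes(self):
        """Nodes that have no subset node beneath them."""
        return {node for node, kids in self.children.items() if not kids}
```

Because gene pairs are added in order of decreasing similarity, cliques tend to grow over time, which is why new candidates are mostly inserted above existing nodes rather than below them.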

The functional group indicator discovery device 100 determines whether any of the multifunctional gene pairs remain to be added to the gene network (S180). Having listed all multifunctional gene pairs as candidate connections, the functional group indicator discovery device 100 can remove each pair from the list once it has been added to the gene network and searched, and thereby determine whether any pairs remain. If gene pairs remain to be added, the functional group indicator discovery device 100 repeats the step of expanding the gene network and searching for maximal cliques (S140).

When all gene pairs of the multifunctional genes have been connected, the functional group indicator discovery device 100 outputs the base ontology, in which each common functional group indicator is a node (S190). The functional group indicator discovery device 100 thus forms the nodes of the base ontology by searching for gene sets that are functionally connected among the multifunctional genes.

FIG. 3 is a flowchart of a method for constructing a functional ontology according to an embodiment.

Referring to FIG. 3, the functional group indicator discovery device 100 constructs the functional ontology by expanding the base ontology, which was generated from the multifunctional genes, to all genes. Starting from the common functional group indicators of the base ontology, the functional group indicator discovery device 100 searches for detailed functional group indicators, each composed of a gene set associated with a more specific function. The method of constructing the functional ontology is similar to the method of constructing the base ontology.

The functional group indicator discovery device 100 receives the base ontology generated from the multifunctional genes as input (S210).

From the integrated gene function information in which the various functional groups are combined, the functional group indicator discovery device 100 calculates the functional similarity between all genes (S220). The inter-gene functional similarity is calculated by a similarity analysis that scores, for each gene pair, how similar the functional groups containing each gene are, and it can be expressed as an inter-gene functional similarity matrix. The functional similarity between all genes may be calculated in advance.

The functional group indicator discovery device 100 sorts all gene pairs in order of decreasing functional similarity (S230).

The functional group indicator discovery device 100 adds gene pairs from the full set to the gene network in order of decreasing functional similarity and searches the gene network for maximal cliques (S240). To do so, the functional group indicator discovery device 100 lists all gene pairs as candidate connections, creates an empty network to be grown by adding connections, and expands the gene network step by step, starting from the gene pairs with the highest functional similarity. The functional group indicator discovery device 100 can shorten the search by starting from the maximal cliques found in the previous step and computing only whether the newly added gene can extend an existing maximal clique or form a new one. The functional group indicator discovery device 100 may also search for maximal cliques in parallel to reduce computation time.

When a new maximal clique is found in the gene network, the functional group indicator discovery apparatus 100 determines the gene set corresponding to the maximal clique as a detailed-function group indicator and selects it as an ontology node candidate (S250). The apparatus 100 may select the gene set of the maximal clique as an ontology node candidate when a functional group containing the entire gene set exists in the collected integrated gene function information, or when the maximal clique can be composed from a combination of nodes (gene sets) already in the ontology.
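The two admission conditions of step S250 can be sketched as a small predicate. Reading "composed from a combination of nodes" as "equal to the union of existing nodes that are proper subsets of the clique" is an assumption, and the function name is hypothetical.

```python
def is_node_candidate(clique, known_groups, ontology_nodes):
    """Return True if the clique qualifies as an ontology node candidate."""
    clique = frozenset(clique)
    # Condition 1: some collected functional group contains the whole clique.
    if any(clique <= group for group in known_groups):
        return True
    # Condition 2 (assumed reading): existing nodes together cover the clique.
    parts = [node for node in ontology_nodes if node < clique]
    return frozenset().union(*parts) == clique


known = [frozenset({"MDM2", "p53", "BAX"}), frozenset({"BAX", "CASP8", "BCL2"})]
accepted = is_node_candidate({"MDM2", "p53"}, known, [])
rejected = is_node_candidate({"BAX", "CDK2"}, known, [])
```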

The functional group indicator discovery apparatus 100 searches the ontology for nodes that are subsets of the ontology node candidate (S260).

If the ontology contains a node that is a subset of the ontology node candidate, the functional group indicator discovery apparatus 100 adds the candidate as a parent node of that subset node; otherwise, it adds the candidate as a new terminal node (S270). In this way, higher-level functional group concepts are stacked on the base ontology composed of multifunctional genes, while gene sets containing genes other than multifunctional genes are added as functional groups, so that group indicators of detailed functions can be discovered.
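Steps S260 and S270 can be sketched with the ontology held as a dictionary from each node's gene set to the set of its child nodes. This representation and the function name are assumptions for illustration; linking the candidate to every existing subset node is a simplification.

```python
def add_to_ontology(ontology, candidate):
    """Insert `candidate` as parent of its subset nodes, or as a terminal node.

    `ontology` maps a node (frozenset of genes) to the set of its child nodes.
    """
    candidate = frozenset(candidate)
    if candidate in ontology:
        return ontology
    children = {node for node in ontology if node < candidate}  # S260
    ontology[candidate] = children  # empty set means terminal node (S270)
    return ontology


ontology = {}
add_to_ontology(ontology, {"MDM2", "p53"})          # terminal node
add_to_ontology(ontology, {"p53", "BAX"})           # terminal node
add_to_ontology(ontology, {"MDM2", "p53", "BAX"})   # parent of both
```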

The functional group indicator discovery apparatus 100 determines whether any gene pairs remain to be added to the gene network (S280). Having listed all gene pairs as candidate edges, the apparatus 100 removes each pair from the list once it has been added to the gene network and searched, and thereby determines whether any pairs remain. If gene pairs remain, the apparatus 100 repeats the step of expanding the gene network and searching for maximal cliques (S240).

Once all gene pairs have been added to the gene network, the functional group indicator discovery apparatus 100 outputs the nodes of the functional ontology as functional group indicators (S290). The gene set of each node of the functional ontology constitutes a functional group indicator; the functional group indicators include common-function group indicators and detailed-function group indicators.

FIG. 4 is a flowchart of a method for selecting disease group indicators according to an embodiment.

Referring to FIG. 4, the disease discrimination model generation apparatus 200 selects, as disease group indicators, those functional group indicators that contain known disease indicators (markers) to a significant degree and whose scores differ significantly between disease and non-disease microarray data.

Through a significance analysis of disease indicators, the disease discrimination model generation apparatus 200 selects functional group indicators that significantly contain known disease indicators (markers) as disease group indicator candidates (S310). The apparatus 200 may extract the known disease indicators from the disease indicator database 330. The significance analysis may, for example, use Fisher's exact test to compute, for each functional group indicator, the significance of the degree to which it contains disease indicators, and functional group indicators with a p-value of 0.05 or less may be selected as disease group indicator candidates.
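The following sketch shows how such a p-value could be computed with the standard library only. It assumes a one-sided (over-representation) Fisher's exact test and assumes that both the group and the marker set are subsets of a gene universe of known size; the description specifies neither, and the names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(group, markers, universe_size):
    """One-sided Fisher's exact test p-value for marker over-representation.

    Sums the hypergeometric probabilities of seeing at least as many
    markers in the group as were observed.
    """
    k = len(group & markers)  # markers observed inside the group
    n = len(group)
    K = len(markers)
    N = universe_size
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: a 3-gene group made up entirely of the 3 known
# markers, drawn from a universe of 10 genes.
p_value = fisher_enrichment_p({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, 10)
is_candidate = p_value <= 0.05
```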

The disease discrimination model generation apparatus 200 calculates, for each sample, activation scores of the disease group indicator candidates based on microarray data (S320). The apparatus 200 selects, as disease group indicators, functional group indicators whose activation scores differ significantly between disease and non-disease microarray data. The apparatus 200 may integrate the gene expression values of a disease group indicator candidate and calculate the candidate's activation score from them. Through gene set enrichment analysis, the apparatus 200 converts the gene expression information of a disease-state sample into activation scores of the disease group indicator candidates for that sample. That is, gene set enrichment analysis transforms the sample-by-gene expression matrix into a sample-by-candidate activation score matrix. The gene set enrichment analysis may use, for example, single-sample GSEA or gene set variation analysis. The microarray data may use, for example, gene expression microarray information from the Gene Expression Omnibus or ArrayExpress databases, and may be extracted from the microarray database 340.
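To illustrate only the shape of this transformation (expression matrix in, activation score matrix out), the sketch below scores each gene set by the mean z-score of its member genes within a sample. This is a deliberately simplified stand-in and is neither single-sample GSEA nor gene set variation analysis; the function name and data are hypothetical.

```python
from statistics import mean, pstdev


def activation_scores(expression, gene_sets):
    """Map {sample: {gene: value}} to {sample: {gene_set_name: score}}.

    Stand-in scoring: mean per-gene z-score over the set's members.
    """
    genes = {g for profile in expression.values() for g in profile}
    stats = {}
    for g in genes:
        values = [p[g] for p in expression.values() if g in p]
        stats[g] = (mean(values), pstdev(values))

    scores = {}
    for sample, profile in expression.items():
        scores[sample] = {}
        for name, members in gene_sets.items():
            z = [
                (profile[g] - stats[g][0]) / stats[g][1]
                for g in members
                if g in profile and stats[g][1] > 0
            ]
            scores[sample][name] = mean(z) if z else 0.0
    return scores


# Hypothetical two-sample, two-gene expression matrix.
expression = {"s1": {"A": 1.0, "B": 2.0}, "s2": {"A": 3.0, "B": 6.0}}
score_matrix = activation_scores(expression, {"G": {"A", "B"}})
```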

The disease discrimination model generation apparatus 200 compares, for each disease group indicator candidate, the activation scores between disease states and selects candidates that show a significant difference as disease group indicators of the corresponding disease state (S330). Using the sample-by-candidate activation score matrix, the apparatus 200 analyzes the difference in activation scores between disease states and selects the candidates showing a significant difference as disease group indicators. The significance analysis may, for example, use a t-test to compute the significance of the difference in activation scores, and group indicators with a p-value of 0.05 or less may be selected as disease group indicators.
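As a sketch of the comparison in step S330, the test statistic for one candidate can be computed as below. The description says only "t-test", so the choice of Welch's unequal-variance form is an assumption, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(disease, control):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(disease) / len(disease)
    vb = variance(control) / len(control)
    t = (mean(disease) - mean(control)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(disease) - 1) + vb ** 2 / (len(control) - 1)
    )
    return t, df


# Hypothetical activation scores of one candidate in two sample groups.
t_stat, dof = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The p-value then follows from Student's t distribution with `dof` degrees of freedom; a statistics library would normally supply it (for instance, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same test).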

FIG. 5 is a flowchart of a method for generating a disease discrimination model according to an embodiment.

Referring to FIG. 5, the disease discrimination model generation apparatus 200 calculates the activation scores of the disease group indicators for each microarray sample of the disease state to be discriminated (S410). The apparatus 200 may calculate the activation scores by converting the sample-by-gene expression matrix into a sample-by-indicator activation score matrix.

The disease discrimination model generation apparatus 200 trains a discrimination model that can discriminate a specific disease state based on the activation scores of the disease group indicators (S420). The discrimination model may be a support vector machine (SVM) classifier capable of discriminating disease states in microarray data, but various other learning models may be used.

The disease discrimination model generation apparatus 200 receives a new sample whose disease state is to be discriminated (S430).

The disease discrimination model generation apparatus 200 converts the gene expression values (gene expression matrix) of the input sample into activation score information (activation score matrix) for the disease group indicators used to train the discrimination model (S440).

The disease discrimination model generation apparatus 200 inputs the activation score information of the input sample into the trained discrimination model (S450).

The disease discrimination model generation apparatus 200 outputs the discrimination value of the discrimination model for the new sample (S460). The disease state of the input sample is determined from this discrimination value.
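The train-then-discriminate flow of steps S420 to S460 can be sketched as follows. Since the description allows various learning models, a small nearest-centroid classifier is used here as a dependency-free stand-in for the SVM; the class, its methods, and the activation scores are all hypothetical.

```python
from statistics import mean


class NearestCentroid:
    """Stand-in classifier; the description names an SVM as one option."""

    def fit(self, rows, labels):
        # One centroid per disease state, averaged column by column.
        self.centroids = {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, row):
        def sq_dist(centroid):
            return sum((x - y) ** 2 for x, y in zip(row, centroid))

        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


# Hypothetical activation scores: rows are samples, columns are indicators.
train_scores = [[0.9, 0.8], [1.1, 0.7], [-0.8, -1.0], [-1.2, -0.9]]
train_labels = ["disease", "disease", "normal", "normal"]

model = NearestCentroid().fit(train_scores, train_labels)  # S420
new_sample_scores = [1.0, 0.5]                              # S430-S440
prediction = model.predict(new_sample_scores)               # S450-S460
```

The key design point carried over from the description is that the new sample must be scored on the same disease group indicators that were used for training.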

FIG. 6 is a diagram illustrating, by way of example, a method of constructing a base ontology according to an embodiment.

With reference to FIGS. 2 and 6, an example is described of how the functional group indicator discovery apparatus 100 adds gene pairs to the gene network one at a time, starting from the pair with the highest similarity in the functional similarity information between multifunctional genes extracted by the integrated gene function extraction unit 110, finds maximal cliques, and adds them to the ontology.

For example, when the functional group indicator discovery apparatus 100 finds six multifunctional genes, MDM2, p53, BAX, CASP8, BCL2, and CDK2, it calculates the functional similarity between the multifunctional genes as shown in Table 1. Sorting the multifunctional gene pairs of Table 1 by functional similarity gives the order shown in Table 2.

Table 1. Functional similarity between multifunctional genes

| | MDM2 | p53 | BAX | CASP8 | BCL2 | CDK2 |
|---|---|---|---|---|---|---|
| MDM2 | - | 0.236 | 0.190 | 0.125 | 0.105 | 0.163 |
| p53 | 0.236 | - | 0.195 | 0.137 | 0.124 | 0.165 |
| BAX | 0.190 | 0.195 | - | 0.200 | 0.186 | 0.168 |
| CASP8 | 0.125 | 0.137 | 0.200 | - | 0.187 | 0.100 |
| BCL2 | 0.105 | 0.165 | 0.186 | 0.187 | - | 0.092 |
| CDK2 | 0.163 | 0.124 | 0.168 | 0.100 | 0.092 | - |

Table 2. Multifunctional gene pairs sorted by functional similarity

| Order | Multifunctional gene pair | Functional similarity |
|---|---|---|
| 1 | MDM2-p53 | 0.236 |
| 2 | BAX-CASP8 | 0.200 |
| 3 | p53-BAX | 0.195 |
| 4 | MDM2-BAX | 0.190 |
| 5 | CASP8-BCL2 | 0.187 |
| 6 | BAX-BCL2 | 0.186 |
| 7 | BAX-CDK2 | 0.168 |
| … | … | … |

The construction of the base ontology is described using “known function 1: p53 pathway” and “known function 2: apoptosis”. It is assumed that the p53 pathway consists of {MDM2, p53, BAX}, that apoptosis consists of {BAX, CASP8, BCL2}, and that both exist in the integrated gene function information.

In step 1, the functional group indicator discovery apparatus 100 creates an empty gene network with no relationships and an empty ontology with no ontology nodes.

In step 2, the functional group indicator discovery apparatus 100 adds the MDM2-p53 relationship, which has the highest functional similarity in Table 2, to the gene network and searches for maximal cliques. At this point the two-gene set {MDM2, p53} is the newly found maximal clique. The apparatus 100 checks whether the found maximal clique exists in the collected integrated gene function information and, if it does, adds it to the ontology as a new node. Because {MDM2, p53} shares “known function 1: p53 pathway”, it is added to the ontology as a new ontology node (T1).

In step 3, the functional group indicator discovery apparatus 100 adds BAX-CASP8, which has the second highest functional similarity in Table 2, to the gene network. {BAX, CASP8} is a new maximal clique and shares “known function 2: apoptosis”, so it is added to the ontology as a new ontology node (T2).

In step 4, {p53, BAX}, which has the third highest functional similarity, is a new maximal clique and is added as ontology node T3.

In step 5, MDM2-BAX, which has the fourth highest functional similarity, is added to the gene network, and {MDM2, p53, BAX} is the new maximal clique. All three genes share “known function 1: p53 pathway”, and ontology node T1 {MDM2, p53} and ontology node T3 {p53, BAX} are subsets of {MDM2, p53, BAX}, so {MDM2, p53, BAX} is added as a new ontology node (T4) that is the parent node of T1 and T3.

In step 6, {CASP8, BCL2}, which has the fifth highest functional similarity, is added as a new ontology node (T5).

In step 7, adding BAX-BCL2, which has the sixth highest functional similarity, yields the maximal clique {BAX, BCL2, CASP8}, which is added as a new ontology node (T6) that is the parent node of ontology nodes T2 and T5.

In step 8, BAX-CDK2, which has the seventh highest functional similarity, is added to the gene network, and {BAX, CDK2} is found as a new maximal clique. However, {BAX, CDK2} is contained in neither “known function 1: p53 pathway” nor “known function 2: apoptosis”, so the pair is judged to have no shared (common) function. {BAX, CDK2} is therefore not added to the ontology.

Gene relationships are added one at a time in this way, and the ontology is expanded until all relationships have been added to the gene network; the ontology finally obtained is the base ontology. As shown in Table 3 below, the base ontology is generated from nodes composed of multifunctional gene pairs. Relationships between genes were added one at a time here for the sake of explanation, but the ontology may also be constructed by adding sets of relationships in batches of a given size or at given similarity score intervals.

Table 3. Base ontology nodes generated from the multifunctional gene pairs

| Order | Multifunctional gene pair | Functional similarity | Ontology node |
|---|---|---|---|
| 1 | MDM2-p53 | 0.236 | T1 |
| 2 | BAX-CASP8 | 0.200 | T2 |
| 3 | p53-BAX | 0.195 | T3 |
| 4 | MDM2-BAX | 0.190 | T4={T1, T3} |
| 5 | CASP8-BCL2 | 0.187 | T5 |
| 6 | BAX-BCL2 | 0.186 | T6={T2, T5} |
| 7 | BAX-CDK2 | 0.168 | - |
| … | … | … | … |
| 15 | BCL2-CDK2 | 0.092 | - |
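The eight steps of this example can be replayed with a short script. It is a sketch under stated assumptions: the ontology is a dictionary from gene set to child gene sets, maximal cliques are enumerated with a basic Bron-Kerbosch search among the common neighbours of each new pair, and only the first seven pairs of Table 2 are used. All function and variable names are hypothetical.

```python
def maximal_cliques(adj, candidates, clique=frozenset(), excluded=frozenset()):
    """Basic Bron-Kerbosch enumeration restricted to `candidates`."""
    if not candidates and not excluded:
        yield clique
        return
    for v in list(candidates):
        yield from maximal_cliques(
            adj, candidates & adj[v], clique | {v}, excluded & adj[v]
        )
        candidates = candidates - {v}
        excluded = excluded | {v}


def build_ontology(ranked_pairs, known_groups):
    adj, ontology = {}, {}  # step 1: empty gene network, empty ontology
    for a, b in ranked_pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        common = frozenset(adj[a] & adj[b])
        for extra in maximal_cliques(adj, common):
            clique = frozenset({a, b}) | extra
            if clique in ontology:
                continue
            if not any(clique <= group for group in known_groups):
                continue  # no shared known function, so not added
            # Existing subset nodes become children; none means terminal node.
            ontology[clique] = {node for node in ontology if node < clique}
    return ontology


known_groups = [
    frozenset({"MDM2", "p53", "BAX"}),    # known function 1: p53 pathway
    frozenset({"BAX", "CASP8", "BCL2"}),  # known function 2: apoptosis
]
table2_pairs = [
    ("MDM2", "p53"), ("BAX", "CASP8"), ("p53", "BAX"), ("MDM2", "BAX"),
    ("CASP8", "BCL2"), ("BAX", "BCL2"), ("BAX", "CDK2"),
]
base_ontology = build_ontology(table2_pairs, known_groups)
```

Under these assumptions the script produces six nodes, with {MDM2, p53, BAX} as the parent of T1 and T3 and {BAX, BCL2, CASP8} as the parent of T2 and T5, while {BAX, CDK2} is left out.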

FIG. 7 is a diagram illustrating, by way of example, a method of constructing a functional ontology according to an embodiment.

With reference to FIGS. 3 and 7, an example is described of how the functional group indicator discovery apparatus 100 generates a functional ontology from the base ontology of FIG. 6, which was generated from the multifunctional genes.

For example, assuming that the full set of genes is {MDM2, p53, BAX, CASP8, BAD, PLK3}, the functional similarity between genes is calculated as shown in Table 4. Sorting the gene pairs of Table 4 by functional similarity gives the order shown in Table 5.

Table 4. Functional similarity between all genes

| | MDM2 | p53 | BAX | CASP8 | BAD | PLK3 |
|---|---|---|---|---|---|---|
| MDM2 | - | 0.236 | 0.190 | 0.125 | 0.105 | 0.163 |
| p53 | 0.236 | - | 0.195 | 0.137 | 0.124 | 0.165 |
| BAX | 0.190 | 0.195 | - | 0.200 | 0.217 | 0.168 |
| CASP8 | 0.125 | 0.137 | 0.200 | - | 0.230 | 0.103 |
| BAD | 0.105 | 0.165 | 0.217 | 0.230 | - | 0.198 |
| PLK3 | 0.163 | 0.124 | 0.168 | 0.103 | 0.198 | - |

Table 5. Gene pairs sorted by functional similarity

| Order | Gene pair | Functional similarity | Ontology node |
|---|---|---|---|
| 1 | MDM2-p53 | 0.236 | Base ontology node T1 |
| 2 | BAD-CASP8 | 0.230 | |
| 3 | BAX-BAD | 0.217 | |
| 4 | BAX-CASP8 | 0.200 | Base ontology node T2 |
| 5 | BAD-PLK3 | 0.198 | |
| … | … | … | … |

The construction of the ontology is described using “known function 1: p53 pathway” and “known function 2: apoptosis”. As described with reference to FIG. 6, gene pairs are added to the gene network one at a time, starting from the pair with the highest functional similarity, maximal cliques are found, and they are added to the base ontology.

In step 1, the functional group indicator discovery apparatus 100 creates an empty gene network with no relationships and obtains the base ontology.

In step 2, the functional group indicator discovery apparatus 100 adds the relationship of the gene pair with the highest functional similarity in Table 5 (MDM2-p53) to the gene network and searches for maximal cliques. The maximal clique {MDM2, p53} is a multifunctional gene pair and therefore already exists as ontology node T1. Because {MDM2, p53} does not need to be added to the ontology, the process moves on to the next gene pair. In other words, in constructing the functional ontology, only maximal cliques that contain at least one gene that is not a multifunctional gene are added to the ontology.

In step 3, BAD-CASP8 is added to the gene network, and the new maximal clique {BAD, CASP8} is found. {BAD, CASP8} contains BAD, a gene that is not a multifunctional gene. In addition, {BAD, CASP8} shares “known function 2: apoptosis”, so it is added to the ontology as a new terminal node (T7).

In step 4, likewise, BAX-BAD is added to the gene network, and the new maximal clique {BAX, BAD} is found. {BAX, BAD} shares “known function 2: apoptosis”, so it is added to the ontology as a new terminal node (T8).

In step 5, BAX-CASP8 is added to the gene network, and the new maximal clique {BAD, BAX, CASP8} is found. The genes of {BAD, BAX, CASP8} all share “known function 2: apoptosis”, and ontology node T2 {BAX, CASP8}, ontology node T7 {BAD, CASP8}, and ontology node T8 {BAX, BAD} are subsets of {BAD, BAX, CASP8}, so {BAD, BAX, CASP8} is added to the ontology as the parent node (T9) of T2, T7, and T8.

In step 6, BAD-PLK3 is added to the gene network. The new maximal clique {PLK3, BAD} is contained in neither “known function 1: p53 pathway” nor “known function 2: apoptosis”, so the pair is judged to have no shared (common) function. {PLK3, BAD} is therefore not added to the ontology.

Gene relationships are added one at a time in this way, and the obtained base ontology is expanded until all relationships have been added to the gene network; the ontology finally obtained is the functional ontology. As shown in Table 6 below, the functional ontology is generated from nodes that extend the base ontology nodes. All ontology nodes in the functional ontology may be output as functional group indicators. Relationships between genes were added one at a time here for the sake of explanation, but the ontology may also be constructed by adding sets of relationships in batches of a given size or at given similarity score intervals.

Table 6. Functional ontology nodes extended from the base ontology

| Order | Gene pair | Functional similarity | Ontology node |
|---|---|---|---|
| 1 | MDM2-p53 | 0.236 | Base ontology node T1 |
| 2 | BAD-CASP8 | 0.230 | T7 |
| 3 | BAX-BAD | 0.217 | T8 |
| 4 | BAX-CASP8 | 0.200 | T9={T2, T7, T8} |
| 5 | BAD-PLK3 | 0.198 | - |
| … | … | … | … |
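The extension stage differs from the base stage in two ways: it starts from the base ontology rather than an empty one, and it skips cliques made up only of multifunctional genes. The sketch below replays the five pairs listed in Table 5 under assumptions: the apoptosis group is taken to contain BAD in addition to {BAX, CASP8, BCL2} (the example treats {BAD, CASP8} and {BAX, BAD} as sharing that function), and all names are hypothetical.

```python
def maximal_cliques(adj, cand, clique=frozenset(), excl=frozenset()):
    """Basic Bron-Kerbosch enumeration restricted to `cand`."""
    if not cand and not excl:
        yield clique
        return
    for v in list(cand):
        yield from maximal_cliques(adj, cand & adj[v], clique | {v}, excl & adj[v])
        cand, excl = cand - {v}, excl | {v}


def extend_ontology(base, ranked_pairs, known_groups, multifunctional):
    adj, ontology = {}, dict(base)  # empty network, start from base ontology
    for a, b in ranked_pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        for extra in maximal_cliques(adj, frozenset(adj[a] & adj[b])):
            clique = frozenset({a, b}) | extra
            if clique <= multifunctional or clique in ontology:
                continue  # multifunctional-only cliques belong to the base stage
            if any(clique <= group for group in known_groups):
                ontology[clique] = {n for n in ontology if n < clique}
    return ontology


fs = frozenset
T1, T2, T3 = fs({"MDM2", "p53"}), fs({"BAX", "CASP8"}), fs({"p53", "BAX"})
T4, T5 = fs({"MDM2", "p53", "BAX"}), fs({"CASP8", "BCL2"})
T6 = fs({"BAX", "BCL2", "CASP8"})
base = {T1: set(), T2: set(), T3: set(), T4: {T1, T3}, T5: set(), T6: {T2, T5}}

known_groups = [
    fs({"MDM2", "p53", "BAX"}),           # known function 1: p53 pathway
    fs({"BAX", "CASP8", "BCL2", "BAD"}),  # known function 2 (BAD assumed)
]
multifunctional = fs({"MDM2", "p53", "BAX", "CASP8", "BCL2", "CDK2"})
table5_pairs = [("MDM2", "p53"), ("BAD", "CASP8"), ("BAX", "BAD"),
                ("BAX", "CASP8"), ("BAD", "PLK3")]
functional = extend_ontology(base, table5_pairs, known_groups, multifunctional)
```

With these inputs the sketch adds three nodes to the six base nodes: {BAD, CASP8}, {BAX, BAD}, and their parent {BAD, BAX, CASP8}, which also takes T2 as a child.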

As described above, according to the embodiment, genes with similar functions can be reorganized into functional groups that are both comprehensive and essential. This makes it possible to reflect disease mechanisms, which are composed in complex ways of unit functions within the cell and the higher-level functions that combine them, and thereby to improve the accuracy and the stability and reproducibility of disease discrimination.

According to the embodiment, previously known functional gene sets are integrated and reorganized so that a gene set with a known function-based causal relationship can be discovered as a single "functional group indicator". Specifically, group indicators that can explain diseases and cellular functions can be selected as combinations of "common-function group indicators", which are associated with many functions, and "detailed-function group indicators", which are associated with specific detailed functions.

According to the embodiment, the limitations of conventional individual gene expression markers, whose reliability and stability suffered because disease-relevant variations were hard to find, can be overcome, and the problem of conventional functional group analysis methods, which could not combine and exploit diverse functional information, can be solved.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

Claims

An operating method of a functional group indicator discovery apparatus, the method comprising:
collecting functional groups in which genes with similar functions are grouped; and
expanding a gene network by sequentially connecting gene pairs based on the similarity of the functional groups containing the collected genes, and selecting the gene set of a maximal clique found in the gene network as a functional group indicator.

The operating method of claim 1,
wherein the selecting as the functional group indicator comprises
repeating a procedure of generating the gene set of a maximal clique found in the gene network as a node of an ontology if that gene set exists in the collected functional groups, and outputting the gene set corresponding to each node constituting the ontology as the functional group indicator.

The operating method of claim 2,
wherein the selecting as the functional group indicator comprises:
if the ontology contains a node that is a subset of the gene set corresponding to the found maximal clique, adding a node corresponding to the found maximal clique as a parent node of the subset node; and
if the ontology contains no node that is a subset of the gene set corresponding to the found maximal clique, adding a node corresponding to the found maximal clique as a terminal node.

The operating method of claim 1,
wherein the selecting as the functional group indicator comprises:
extracting multifunctional genes whose frequency of appearance in the functional groups is greater than or equal to a reference;
generating a base ontology by repeating a procedure of expanding a first gene network by connecting multifunctional gene pairs in descending order of functional similarity between multifunctional genes, searching the first gene network for a maximal clique, and, if the gene set corresponding to the found maximal clique exists in the collected functional groups, generating that gene set as an ontology node;
generating a functional ontology by calculating the functional similarity between genes for the collected genes and repeating a procedure of expanding a second gene network by connecting gene pairs in descending order of functional similarity, searching the second gene network for a maximal clique, and, if the gene set corresponding to the found maximal clique exists in the collected functional groups, adding the gene set of the maximal clique found in the second gene network to the base ontology; and
outputting the nodes constituting the functional ontology as functional group indicators.

(Deleted)