KR20180090680A

KR20180090680A - Genome analysis system

Info

Publication number: KR20180090680A
Application number: KR1020170015826A
Authority: KR
Inventors: 박수준
Original assignee: 한국전자통신연구원
Priority date: 2017-02-03
Filing date: 2017-02-03
Publication date: 2018-08-13

Abstract

The present invention relates to a genome analysis system. The genome analysis system according to an embodiment of the present invention includes a folksonomy construction system and an analysis unit. The folksonomy construction system receives omics data for genes corresponding to a clinical variable of a specific disease. It extracts genome annotations from the omics data and classifies them to generate a classification matrix. It then generates mapping data corresponding to a folksonomy structure based on the classification matrix. The analysis unit analyzes the disease association of the genes based on the mapping data. The genome analysis system of the present invention can efficiently analyze the relationship between a genome and a disease.

Description

GENOME ANALYSIS SYSTEM

The present invention relates to disease prediction and analysis, and more particularly to a genome analysis system.

A genome is the complete set of genetic information that an organism carries. Every cell of an organism has the same number of chromosomes and the same genetic information. Genome analysis used to rely on qualitative methods, but next-generation sequencing (NGS) now allows the genome to be analyzed quantitatively. Unlike conventional sequencing, NGS processes many DNA fragments in parallel and so determines base sequences at high speed. As a result, a wide variety of genome annotations can be produced with NGS.

As electronic communication technology advances, it is increasingly used in biotechnology and medicine. There is therefore a demand to use it to analyze genomes and to diagnose or predict human disease. In addition, advances in biotechnology have improved genetic information detection, which has created a demand to understand disease on the basis of diverse genome annotations.

To analyze the correlation between genome annotations and disease, and to improve the reliability of that correlation, genomic data must be classified effectively. Various algorithms for effectively classifying genomic data therefore need to be developed.

The present invention can provide a genome analysis system capable of effectively analyzing the association between a genome and a disease.

A genome analysis system according to an embodiment of the present invention includes a folksonomy construction system and an analysis unit.

The folksonomy construction system receives omics data for genes corresponding to a clinical variable of a specific disease. It extracts genome annotations from the omics data and classifies them to generate a classification matrix. Based on the classification matrix, it generates mapping data that links genes to genome annotations and genome annotations to the clinical variable.

The analysis unit analyzes the disease association of genes based on the mapping data. It analyzes the association between genes and genome annotations, between genome annotations and the clinical variable, or between genes and the clinical variable.

By mapping genes, genome annotations, and diseases onto a folksonomy structure, the genome analysis system according to an embodiment of the present invention can secure the reliability and efficiency of association analysis between a genome and a disease.

FIG. 1 is a block diagram of a genome analysis system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the folksonomy construction system of FIG. 1.
FIG. 3 is a diagram for explaining a structure that links genes, genome annotations, and clinical variables.
FIG. 4 is a flowchart of a genome analysis method using the genome analysis system.

Hereinafter, embodiments of the present invention are described clearly and in detail so that a person of ordinary skill in the art can readily practice the invention.

FIG. 1 is a block diagram of a genome analysis system according to an embodiment of the present invention.

Referring to FIG. 1, the genome analysis system (1000) includes a clinical information database (100), a folksonomy construction system (200), an analysis unit (300), and a disease association database (400).

The clinical information database (100) stores omics data (od) for a plurality of human bodies or living organisms. The omics data (od) includes data on various kinds of biological information, such as genome data, transcriptome data, or proteome data. The omics data (od) stored in the clinical information database (100) may include information about the subject from which it was obtained. For example, when the omics data (od) is generated from a genome or the like taken from a patient with a specific disease, it may include the patient's clinical information. The clinical information database (100) may be an external database such as TCGA (The Cancer Genome Atlas), COSMIC, GEO, ArrayExpress, GWAS Central, or CNVDB.

The folksonomy construction system (200) receives the omics data (od) from the clinical information database (100). It may selectively receive the omics data (od) for specific clinical information. That is, the folksonomy construction system (200) may specify a patient group for a specific disease, access the clinical information database (100), and receive the omics data (od) for the specified patient group based on the clinical information.

Based on the omics data (od), the folksonomy construction system (200) builds a classification scheme for genome data and clinical information. It defines the relationships among genes, genome annotations, and clinical variables so that they correspond to a folksonomy structure, which relies on the relationships among users, tags, and resources. The folksonomy construction system (200) extracts the genome annotations corresponding to a specific clinical variable from the genome data and classifies the extracted annotations to generate a classification matrix. Based on the classification matrix, it builds relationships that link genes to genome annotations and genome annotations to clinical variables, and it generates mapping data (md) from these relationships. Details are described later.

The analysis unit (300) analyzes the association between the genome and disease information based on the mapping data (md). The analysis unit (300) may receive the mapping data (md) produced by the folksonomy construction system (200), or it may receive the mapping data (md) stored in the disease association database (400). The association analysis may include predicting the genome annotations of a disease group, analyzing clinical trends, or predicting a set of related genes.

The analysis unit (300) can predict genes that are highly associated with a clinical variable. For example, the analysis unit (300) quantitatively analyzes the relationship between genes and clinical variables. It extracts what the genes linked to a specific clinical variable have in common, that is, consensus markers. It also extracts what distinguishes the genes linked to the specific clinical variable from other genes, that is, specific markers. Based on the intersection of the specific markers and the consensus markers, the analysis unit (300) can predict genes that are highly associated with the clinical variable.
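The marker extraction described above can be sketched with plain set operations. This is only an illustrative reading, not the patented implementation: it assumes each gene is described by a set of feature labels, that consensus markers are the features shared by every linked gene, and that specific markers are the features seen in linked genes but in no other gene. The function names and the toy data are hypothetical.

```python
def consensus_markers(features, linked):
    """Features shared by every gene linked to the clinical variable."""
    sets = [features[g] for g in linked]
    return set.intersection(*sets) if sets else set()


def specific_markers(features, linked):
    """Features found in linked genes but in none of the other genes."""
    linked_feats = set().union(*(features[g] for g in linked))
    other_feats = set().union(
        *(features[g] for g in features if g not in linked)
    )
    return linked_feats - other_feats


def candidate_markers(features, linked):
    """Intersection of consensus and specific markers."""
    return consensus_markers(features, linked) & specific_markers(features, linked)


# Toy data (hypothetical): three genes, two of them linked to the variable.
features = {
    "G1": {"a", "b", "c"},
    "G2": {"a", "b", "d"},
    "G3": {"b", "e"},
}
linked = {"G1", "G2"}
print(candidate_markers(features, linked))
```

In this toy case feature "b" is common to the linked genes but also appears in G3, so only "a" survives the intersection.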

The analysis unit (300) can predict genes that show a pattern of sharing a specific genome annotation. For example, the analysis unit (300) quantitatively analyzes the relationship between genes and genome annotations. It extracts what the genes linked to a specific genome annotation have in common, that is, consensus markers. It also extracts what distinguishes the genes linked to the specific genome annotation from other genes, that is, specific markers. Based on the intersection of the specific markers and the consensus markers, the analysis unit (300) can predict genes that show a pattern of sharing the specific genome annotation.

The analysis unit (300) can predict genes that have a common genome annotation for the same clinical variable. For example, the analysis unit (300) quantitatively analyzes the relationship between genome annotations and clinical variables, and extracts both what the genes linked to the same clinical variable and the same genome annotation have in common and how they differ from other genes. Based on these commonalities and differences, the analysis unit (300) can predict genes that have a common genome annotation for the same clinical variable.

Using the specific markers or consensus markers described above, the analysis unit (300) can analyze the association between genes and disease, and for quantitative analysis it may perform a permutation test, an enrichment test, or the like. Based on the mapping data (md), the analysis unit (300) generates analysis data that contains the results of analyzing the association between genes and disease.
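The text names a permutation test without specifying the statistic. As a hedged sketch, the following two-sided permutation test compares the mean of some per-gene score between genes linked to a clinical variable and all other genes. The choice of statistic, the score itself, and the names `permutation_p_value`, `n_perm`, and `seed` are assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Labels are shuffled n_perm times; the p-value is the share of shuffles
    whose absolute mean difference is at least the observed one, with the
    usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores for linked genes versus the remaining genes.
linked_scores = [5.1, 4.9, 5.3, 5.0, 5.2]
other_scores = [1.0, 1.2, 0.9, 1.1, 0.8]
print(permutation_p_value(linked_scores, other_scores))
```

A small p-value suggests the linked genes differ from the rest by more than random relabeling would explain; an enrichment test would answer a related question using category counts instead of scores.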

The disease association database (400) stores the mapping data (md) and the analysis data. It may include a mapping database, which stores the mapping data (md) on the relationships among genes, genome annotations, and clinical variables, and an analysis database, which stores the analysis data on the association between genes and disease.

The disease association database (400) may receive the mapping data (md) from the folksonomy construction system (200) and the analysis data from the analysis unit (300). It may periodically receive the mapping data (md) from the folksonomy construction system (200) to update the relationship information on genes, genome annotations, and clinical variables, and it may periodically receive the analysis data from the analysis unit (300) to update the analysis results. A user can access the disease association database (400) to look up the association between genes and disease.

A user can use the genome analysis system (1000) to predict the likelihood of disease or to perform a medical diagnosis. For example, a user such as a physician can detect the genes of a subject, such as a patient or a visiting client, and analyze them with the genome analysis system (1000). The genome annotations and clinical variables of samples with similar genes can be consulted, and the genome annotations of the subject's genes can be extracted to examine disease association. Based on the results of the disease association analysis, the user can advise the subject on disease likelihood and preventive measures.

FIG. 2 is a block diagram of the folksonomy construction system of FIG. 1.

Referring to FIG. 2, the folksonomy construction system (200) includes a communication interface (210), a system interface (220), a genome annotation extraction unit (230), a classification matrix generation unit (240), and a mapping unit (250).

The communication interface (210) receives the omics data (od) from the clinical information database (100) at the user's request. It sends a request signal for specific clinical information to the clinical information database (100) and receives the omics data (od) corresponding to that clinical information in return. The communication interface (210) receives the request signal for the specific clinical information from the system interface (220), and it passes the omics data (od) to the genome annotation extraction unit (230).

The system interface (220) controls the communication interface (210) so that it receives the omics data (od) from the clinical information database (100) at the user's request. The system interface (220) receives an external command (USER) from the user and controls the communication interface (210) based on that command. The external command (USER) may be a command to collect omics data for clinical information on a specific disease. It may also include a command to collect omics data for multiple pieces of clinical information so that the mapping data (md) can be analyzed from multiple angles. In this case, the communication interface (210) can receive the omics data (od) corresponding to the multiple pieces of clinical information.

The genome annotation extraction unit (230) receives the omics data (od) from the communication interface (210) and extracts genome annotations from it. It can extract the genome annotations by filtering various variables from the omics data (od). For example, the genome annotation extraction unit (230) may use copy number variation (CNV) filtering, single nucleotide variation (SNV) filtering, methylation filtering, miRNA filtering, or mRNA (protein) filtering. The extraction is not limited to these methods, and the genome annotation extraction unit (230) may use various other extraction schemes. The genome annotations may include SNV, CNV, methylation, miRNA, insertion-deletion (INDEL) of the base sequence, protein, or differentially expressed genes. The genome annotation extraction unit (230) generates genome annotation data based on the extracted genome annotations.
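The filtering step can be pictured as one predicate per annotation type applied to each omics record. The sketch below is a loose illustration only: the record layout (`gene`, `kind`, `value`), the `FILTERS` table, and every threshold in it are hypothetical and are not taken from the patent.

```python
# Hypothetical per-type filters; the thresholds are placeholders for illustration.
FILTERS = {
    "CNV": lambda v: abs(v) >= 1.0,    # assumed copy-number change cutoff
    "SNV": lambda v: v >= 1,           # assumed: at least one variant call
    "Methyl": lambda v: v >= 0.8,      # assumed methylation level cutoff
    "miRNA": lambda v: abs(v) >= 2.0,  # assumed expression change cutoff
}


def extract_annotations(omics_records, filters=FILTERS):
    """Return (gene, kind) pairs for records that pass their type's filter.

    Records whose kind has no registered filter are skipped.
    """
    annotations = []
    for record in omics_records:
        keep = filters.get(record["kind"])
        if keep is not None and keep(record["value"]):
            annotations.append((record["gene"], record["kind"]))
    return annotations


records = [
    {"gene": "G1", "kind": "CNV", "value": 1.5},
    {"gene": "G1", "kind": "Methyl", "value": 0.2},
    {"gene": "G2", "kind": "miRNA", "value": -2.4},
]
print(extract_annotations(records))
```

Keeping the filters in a table makes it straightforward to add further annotation types such as INDEL without changing the extraction loop.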

The classification matrix generation unit (240) receives the genome annotation data from the genome annotation extraction unit (230) and can classify it to generate a classification matrix. For example, the classification matrix generation unit (240) classifies the genome annotation data corresponding to a clinical variable, which is based on specific clinical information, into CNV, SNV, methylation, miRNA, and so on, and expresses the relationship between the genes corresponding to the clinical variable and the genome annotations as a matrix. That is, one matrix can be generated that represents the relationship between genes and genome annotations for one clinical variable.
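One simple way to picture such a matrix is a binary table with one row per gene and one column per annotation type, built for a single clinical variable. The sketch below assumes that layout; the binary encoding, the `KINDS` column order, and the function name are illustrative choices rather than details given in the text.

```python
KINDS = ("CNV", "SNV", "Methyl", "miRNA")  # assumed column order


def classification_matrix(records, kinds=KINDS):
    """Build a gene-by-annotation-type 0/1 matrix for one clinical variable.

    `records` is an iterable of (gene, kind) pairs. Returns the sorted gene
    list (row labels) and the matrix as a list of rows.
    """
    present = set(records)
    genes = sorted({gene for gene, _ in present})
    matrix = [
        [1 if (gene, kind) in present else 0 for kind in kinds]
        for gene in genes
    ]
    return genes, matrix


records = [("G1", "SNV"), ("G1", "CNV"), ("G2", "CNV"), ("G3", "Methyl")]
genes, matrix = classification_matrix(records)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Repeating the construction per clinical variable yields one matrix per variable, which matches the one-matrix-per-variable description above.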

Instead of building the matrix around a clinical variable, the classification matrix generation unit (240) may express as a matrix the relationship between genes and clinical variables for a given genome annotation, or the relationship between genome annotations and clinical variables for a given gene. In this case, to obtain data on multiple clinical variables, the folksonomy construction system (200) may send the clinical information database (100) a request signal for multiple pieces of clinical information and receive the corresponding omics data (od).

The classification matrix generation unit (240) may also generate the classification matrix using various statistical analysis methods, such as a t-test, survival analysis, or machine learning.

The mapping unit (250) receives the classification matrix from the classification matrix generation unit (240). Based on the classification matrix, the mapping unit (250) defines the relationships among genes, genome annotations, and clinical variables. These relationships correspond to a folksonomy structure, in which a user reaches a resource by tagging it. The mapping unit (250) can map the classification matrix onto a structure in which a gene reaches a clinical variable by tagging genome annotations. For example, the mapping unit (250) may build a network structure that links genome annotations to clinical variables based on the genome annotation entries in the classification matrix. It may build a network structure that links genes to genome annotations based on the intersection of the genome annotation entries and the gene entries in the classification matrix. The mapping unit (250) generates mapping data (md) describing the relationships that link genes, genome annotations, and clinical variables. The mapping data (md) is provided to the analysis unit (300) or the disease association database (400).
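The user-tag-resource analogy can be sketched as two edge sets: gene to annotation and annotation to clinical variable. The code below is a minimal illustration that assumes the classification data is available as a nested dictionary keyed by clinical variable and then by gene. The dictionary layout and the names `build_mapping` and `variables_for_gene` are hypothetical.

```python
def build_mapping(classification):
    """Turn {variable: {gene: {annotation, ...}}} into two edge sets."""
    gene_annotation = set()
    annotation_variable = set()
    for variable, rows in classification.items():
        for gene, annotations in rows.items():
            for annotation in annotations:
                gene_annotation.add((gene, annotation))
                annotation_variable.add((annotation, variable))
    return {
        "gene_annotation": gene_annotation,
        "annotation_variable": annotation_variable,
    }


def variables_for_gene(mapping, gene):
    """Clinical variables a gene reaches through its annotations (tags)."""
    tags = {a for g, a in mapping["gene_annotation"] if g == gene}
    return {v for a, v in mapping["annotation_variable"] if a in tags}


classification = {
    "R1": {"G1": {"T11", "T12"}, "G2": {"T12"}},
    "R2": {"G2": {"T21"}},
}
mapping = build_mapping(classification)
print(sorted(variables_for_gene(mapping, "G2")))
```

Storing the two edge sets separately mirrors the description: genes never link to clinical variables directly, only through the annotations they carry.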

FIG. 3 is a diagram for explaining a structure that links genes, genome annotations, and clinical variables.

Referring to FIG. 3, the folksonomy construction system (200) of FIG. 2 may have received three sets of omics data (od). That is, it may receive from the clinical information database (100) the omics data (od) for a first gene (G1), a second gene (G2), and a third gene (G3), each corresponding to one of three subjects.

The folksonomy construction system (200) extracts genome annotations from the first to third genes (G1 to G3). The extracted genome annotations include SNV, CNV, methylation (Methyl), and miRNA. The genome annotations corresponding to the first clinical variable (R1) are the first SNV (T11), the first CNV (T12), the first Methyl (T13), and the first miRNA (T14). Those corresponding to the second clinical variable (R2) are the second SNV (T21), the second CNV (T22), the second Methyl (T23), and the second miRNA (T24). Those corresponding to the third clinical variable (R3) are the third SNV (T31), the third CNV (T32), the third Methyl (T33), and the third miRNA (T34). The genome annotations extracted from the first gene (G1) are the first SNV (T11), the first CNV (T12), the first miRNA (T14), the second CNV (T22), and the third Methyl (T33). Those extracted from the second gene (G2) are the first CNV (T12), the second SNV (T21), the second miRNA (T24), and the third CNV (T32). Those extracted from the third gene (G3) are the first Methyl (T13), the second Methyl (T23), the third SNV (T31), the third Methyl (T33), and the third miRNA (T34).

The genome annotations and clinical variables may form a network structure built by the folksonomy construction system (200). In this case, FIG. 3 may serve to analyze disease association from the first to third genes (G1 to G3) taken from three subjects. The analysis unit (300) can analyze disease association from the genome annotations of the first to third genes (G1 to G3). For example, the first SNV (T11), the first CNV (T12), and the first miRNA (T14), which are among the genome annotations corresponding to the first clinical variable, are detected in the subject with the first gene (G1), so that subject can be assessed as more likely than the other subjects to have the disease expressed by the first clinical variable. Likewise, the third SNV (T31), the third Methyl (T33), and the third miRNA (T34), which are among the genome annotations corresponding to the third clinical variable, are detected in the subject with the third gene (G3), so that subject can be assessed as more likely than the other subjects to have the disease expressed by the third clinical variable.
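The FIG. 3 example can be reproduced by counting, for each gene, how many of its annotations belong to each clinical variable. The tag assignments below are copied from the description of FIG. 3. Using the raw overlap count as the ranking score is an assumption made for illustration, since the text does not state how the likelihood is quantified.

```python
# Annotation (tag) sets as described for FIG. 3.
TAGS_BY_VARIABLE = {
    "R1": {"T11", "T12", "T13", "T14"},
    "R2": {"T21", "T22", "T23", "T24"},
    "R3": {"T31", "T32", "T33", "T34"},
}
TAGS_BY_GENE = {
    "G1": {"T11", "T12", "T14", "T22", "T33"},
    "G2": {"T12", "T21", "T24", "T32"},
    "G3": {"T13", "T23", "T31", "T33", "T34"},
}


def overlap_counts(tags_by_gene, tags_by_variable):
    """For each gene, count its annotations under each clinical variable."""
    return {
        gene: {var: len(tags & var_tags) for var, var_tags in tags_by_variable.items()}
        for gene, tags in tags_by_gene.items()
    }


def top_gene(counts, variable):
    """Gene with the most annotations under the given clinical variable."""
    return max(counts, key=lambda gene: counts[gene][variable])


counts = overlap_counts(TAGS_BY_GENE, TAGS_BY_VARIABLE)
for variable in TAGS_BY_VARIABLE:
    print(variable, top_gene(counts, variable))
```

With these sets, G1 carries three of the four R1 annotations and G3 carries three of the four R3 annotations, which agrees with the two cases discussed above.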

The first to third genes (G1 to G3) may be genes corresponding to at least one of the first to third clinical variables (R1 to R3). In this case, FIG. 3 may serve to add to the database the relationships between clinical variables and genome annotations obtained from three subjects who have a disease expressed by at least one of the first to third clinical variables (R1 to R3). The analysis unit (300) can analyze the relationship between genome annotations and clinical variables from the genome annotations of the first to third genes (G1 to G3). For example, analyzing the genome annotations extracted from a subject who has the disease expressed by the first clinical variable (R1) and has the first gene (G1) may show that the likelihood of that disease is high when the first SNV (T11), the first CNV (T12), and the first miRNA (T14) are extracted from the gene. The correlation between the first Methyl (T13) and the first clinical variable can then be assessed as having a lower weight than that of the first SNV (T11), the first CNV (T12), and the first miRNA (T14).

When disease association is analyzed with a structure like that of FIG. 3, the association between genome annotations and clinical variables can change as data accumulates. The number of times the genome annotations linked to a specific clinical variable are accessed can be used to measure the degree of association between a genome annotation and a clinical variable. In addition, the genome annotations detected in a subject's genes can be analyzed against the accumulated data to predict the likelihood of disease.
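The idea of using access counts as an association measure can be sketched with a counter over observed (clinical variable, annotation) pairs. Normalizing each count by the total number of observations for that variable is an assumption made here for illustration; the text only says that the access count can be used, not how it is scaled.

```python
from collections import Counter


def association_weights(observations):
    """Share of each annotation among all observations of its variable.

    `observations` is a list of (clinical_variable, annotation) pairs, one
    per time an annotation was seen in a subject with that variable.
    """
    pair_counts = Counter(observations)
    totals = Counter(variable for variable, _ in observations)
    return {
        pair: count / totals[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical accumulated observations for clinical variable R1.
observations = [("R1", "T11"), ("R1", "T12"), ("R1", "T11"), ("R1", "T14")]
weights = association_weights(observations)
print(weights.get(("R1", "T11"), 0.0), weights.get(("R1", "T13"), 0.0))
```

Because the weights are recomputed from the accumulated observations, they shift as new subjects are added, which reflects the statement that the association changes with data accumulation.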

FIG. 4 is a flowchart of a genome analysis method using the genome analysis system.

Referring to FIG. 4, the genome analysis method (S1000) includes extracting genome annotations (S100), generating a classification matrix (S200), building a folksonomy structure (S300), and analyzing disease association (S400). The genome analysis method (S1000) can be performed by the genome analysis system (1000) of FIG. 1.

The step of extracting genome annotations (S100) may be performed by the folksonomy construction system (200) of FIG. 1 or the genome annotation extraction unit (230) of FIG. 2. In this step, the folksonomy construction system (200) extracts genome annotations from the omics data (od). The genome annotations may be extracted from the omics data (od) corresponding to a specific clinical variable.

The step of generating a classification matrix (S200) may be performed by the folksonomy construction system 200 of FIG. 1 or the classification matrix generation unit 240 of FIG. 2. In step S200, the folksonomy construction system 200 classifies the genome annotations to generate the classification matrix. For example, the classification matrix may represent the relationship between the genes corresponding to a specific clinical variable and the genome annotations.
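One possible shape for such a classification matrix is a binary gene-by-annotation table. The sketch below assumes that representation (rows are genes, columns are annotations, 1 marks that the gene carries the annotation); the function name `build_classification_matrix` and the binary encoding are illustrative choices, not details taken from the specification.

```python
def build_classification_matrix(pairs):
    """Build a binary gene x annotation matrix from (gene, annotation) pairs."""
    genes = sorted({gene for gene, _ in pairs})
    annotations = sorted({annotation for _, annotation in pairs})
    present = set(pairs)
    matrix = [
        [1 if (gene, annotation) in present else 0 for annotation in annotations]
        for gene in genes
    ]
    return genes, annotations, matrix

# Hypothetical pairs for a single clinical variable.
pairs = [
    ("BRCA1", "missense_variant"),
    ("BRCA1", "promoter_methylation"),
    ("TP53", "missense_variant"),
]
genes, annotations, matrix = build_classification_matrix(pairs)
```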

The step of constructing a folksonomy structure (S300) may be performed by the folksonomy construction system 200 of FIG. 1 or the mapping unit 250 of FIG. 2. In step S300, the folksonomy construction system 200 may generate, based on the classification matrix, mapping data (md) that defines the relationships linking genes, genome annotations, and clinical variables. That is, a folksonomy structure can be constructed in which genes are linked to genome annotations and genome annotations are linked to clinical variables.
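The two-level linkage described here (gene to annotation, annotation to clinical variable) could be held in a pair of dictionaries. The following sketch assumes a binary classification matrix as input and a single clinical variable per matrix; `build_mapping_data` and the dictionary layout are hypothetical.

```python
def build_mapping_data(genes, annotations, matrix, clinical_variable):
    """Derive gene->annotation and annotation->variable links from a matrix."""
    gene_to_annotations = {}
    annotation_to_variables = {}
    for gene, row in zip(genes, matrix):
        for annotation, cell in zip(annotations, row):
            if cell:
                gene_to_annotations.setdefault(gene, set()).add(annotation)
                annotation_to_variables.setdefault(annotation, set()).add(
                    clinical_variable
                )
    return gene_to_annotations, annotation_to_variables

# Hypothetical classification matrix for the variable "tumor_stage".
genes = ["BRCA1", "TP53"]
annotations = ["missense_variant", "promoter_methylation"]
matrix = [[1, 1], [1, 0]]
gene_to_annotations, annotation_to_variables = build_mapping_data(
    genes, annotations, matrix, "tumor_stage"
)
```

Repeating the call for further clinical variables and merging the resulting dictionaries would extend the structure, since each annotation maps to a set of variables.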

The step of analyzing disease associations (S400) may be performed by the analysis unit 300 of FIG. 1. In step S400, the analysis unit 300 may analyze the association between the genome and disease based on the mapping data (md). For example, the association between a disease group and genome annotations, between a disease group and clinical trends, or the trends between a disease group and genes may be analyzed. Specifically, in step S400, genes highly associated with a clinical variable can be predicted, genes that share a specific genome annotation can be predicted, and genes that have a common genome annotation for the same clinical variable can be predicted.
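Two of the queries named above can be sketched against dictionary-based mapping data. The example assumes gene-to-annotation and annotation-to-variable dictionaries and ranks genes by how many of their annotations link to the clinical variable; that ranking rule and both function names are assumptions for illustration, not the analysis method of the specification.

```python
def genes_sharing_annotation(gene_to_annotations, annotation):
    """Return genes that carry the given annotation."""
    return sorted(
        gene for gene, anns in gene_to_annotations.items() if annotation in anns
    )

def rank_genes_for_variable(gene_to_annotations, annotation_to_variables, variable):
    """Rank genes by the number of annotations linking them to a variable."""
    scores = {}
    for gene, anns in gene_to_annotations.items():
        n = sum(1 for a in anns if variable in annotation_to_variables.get(a, ()))
        if n:
            scores[gene] = n
    # Highest count first; ties broken alphabetically by gene name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical mapping data.
gene_to_annotations = {
    "BRCA1": {"missense_variant", "promoter_methylation"},
    "TP53": {"missense_variant"},
    "EGFR": {"copy_number_gain"},
}
annotation_to_variables = {
    "missense_variant": {"tumor_stage"},
    "promoter_methylation": {"tumor_stage"},
    "copy_number_gain": {"smoking_status"},
}
shared = genes_sharing_annotation(gene_to_annotations, "missense_variant")
ranked = rank_genes_for_variable(
    gene_to_annotations, annotation_to_variables, "tumor_stage"
)
```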

The genome analysis system 1000 and the genome analysis method (S1000) of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium includes recording media such as hard disks, magnetic media, and optical recording media, and may include storage devices such as ROM, RAM, and flash memory.

The foregoing describes specific examples for carrying out the present invention. The present invention includes not only the embodiments described above but also embodiments that can be obtained through simple or straightforward design changes. The present invention also includes techniques that can readily be derived in the future from the embodiments described above.

1000: Genome analysis system
100: Clinical information database
200: Folksonomy construction system
300: Analysis unit
400: Disease association database

Claims

A genome analysis system comprising: a folksonomy construction system that receives omics data for genes corresponding to a clinical variable of a specific disease, extracts genome annotations from the omics data, classifies the genome annotations to generate a classification matrix, and generates mapping data that connects the genes to the genome annotations and connects the genome annotations to the clinical variable; and
an analysis unit that analyzes, based on the mapping data, the association between the genes and the genome annotations, the association between the genome annotations and the clinical variable, or the association between the genes and the clinical variable.