KR100610240B1

KR100610240B1 - System and method for reverse-engineering genetic circuits using bayesian network learning

Info

Publication number: KR100610240B1
Application number: KR1020050064754A
Authority: KR
Inventors: 이도헌; 이필현; 김영훈
Original assignee: 한국과학기술원
Priority date: 2005-07-18
Filing date: 2005-07-18
Publication date: 2006-08-09

Abstract

The present invention provides a system and method for reverse-engineering genetic circuits using Bayesian network learning, enabling reliable inference of gene interactions even when experimental data are scarce relative to the number of genes. The system comprises: first and second similarity calculation units that calculate similarity between genes using biological annotation information and inter-gene mutual information, respectively; a seed gene selection unit that selects, from microarray expression data, genes showing specificity to an experimental condition as seed genes; a gene module forming unit that, based on the results of the first and second similarity calculation units, expands a region around each seed gene to form gene modules, that is, groups of similar genes; a learning unit that performs parallelized Bayesian network learning on each gene module formed around a seed gene; and an integration unit that merges the learned gene modules into a single network using intermediary genes shared among the gene modules.

Gene Interaction, Inference, Bayesian Network, Gene Network

Description

Genetic circuit reverse engineering system and method using Bayesian network learning {SYSTEM AND METHOD FOR REVERSE-ENGINEERING GENETIC CIRCUITS USING BAYESIAN NETWORK LEARNING}

FIG. 1 is a block diagram of a genetic circuit reverse engineering system using Bayesian network learning according to the present invention.

FIG. 2 is an overall operational flowchart of the present invention.

FIG. 3 is a diagram illustrating a method of calculating similarity between genes using biological annotation information in the present invention.

FIG. 4 is a diagram showing gene modules generated by the gene module forming unit of FIG. 1.

FIG. 5 is an algorithm showing a method of integrating the gene modules through intermediary genes in the present invention.

FIG. 6 is a diagram showing the gene interaction network obtained through the present invention.

<Explanation of symbols for the main parts of the drawings>

110, 120: first and second similarity calculation units

130: seed gene selection unit

140: gene module forming unit

150: learning unit

160: integration unit

The present invention relates to a system and method for inferring gene interactions, and more particularly to a system and method for reverse-engineering genetic circuits using Bayesian network learning that enable reliable interaction inference by minimizing the effect of false inferences caused by experimental data that are scarce relative to the number of genes.

The functions of a cell and their changes result from the combined interactions of thousands of genes and their products. These activities can be explained through the complex interaction networks that regulate gene expression.

Recently, microarray experiments have made genome-scale gene analysis possible, and attempts to uncover the functions and internal mechanisms of living organisms by inferring large-scale gene interactions from microarray data have become increasingly active.

Several methodologies have been proposed for this inference, including Boolean networks and Bayesian networks. Among them, Bayesian networks are regarded as well suited to inferring gene interactions because of their theoretical soundness and statistical stability.

However, because the amount of microarray experimental data is far too small relative to the thousands of genes involved, false inferences (false positives) become frequent.

In addition, when Bayesian networks are used, inferring all interactions among thousands of genes at once is not feasible.

Previous attempts include the sparse candidate method, which narrows the search space of interaction inference; the model averaging method, which builds a model by extracting what is common to multiple inferred models (Hartemink et al., Pacific Symposium on Biocomputing 7, pp. 437-449, 2002); and methods that infer interactions from previously known biological information. None of these, however, has made much progress on the fundamental problem of insufficient data.

The present invention has been made to solve these problems. An object of the present invention is to provide a system and method for reverse-engineering genetic circuits using Bayesian network learning that minimize false inferences arising from the lack of experimental data, so that a more reliable gene interaction network can be constructed.

Another object of the present invention is to provide a system and method for reverse-engineering genetic circuits using Bayesian network learning that can perform inference over all genes at genome-wide scale.

To achieve the above objects, a genetic circuit reverse engineering system using Bayesian network learning according to the present invention comprises: first and second similarity calculating means for calculating similarity between genes using biological annotation information and inter-gene mutual information, respectively; seed gene selection means for selecting, from microarray expression data, genes showing specificity to an experimental condition as seed genes; gene module forming means for forming gene modules, which are groups of similar genes, by expanding a region around each seed gene based on the results of the first and second similarity calculating means; learning means for performing parallelized Bayesian network learning on each gene module formed around a seed gene; and integration means for integrating the gene modules learned by the learning means into a single network using intermediary genes shared among the gene modules.

To achieve the above objects, a genetic circuit reverse engineering method using Bayesian network learning according to the present invention comprises: a first step of calculating similarity between genes using biological annotation information and inter-gene mutual information; a second step of selecting, from microarray expression data, genes showing specificity to an experimental condition as seed genes; a third step of forming gene modules, which are groups of similar genes, by expanding a region around each seed gene based on the results of the first step; a fourth step of performing parallelized Bayesian network learning on each gene module formed around a seed gene; and a fifth step of integrating the learned gene modules into a single network using intermediary genes shared among the gene modules.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The following embodiments merely illustrate the present invention, and the scope of the invention is not limited to them.

FIG. 1 is a block diagram of a genetic circuit reverse engineering system using Bayesian network learning according to the present invention.

As shown, the system comprises: a first similarity calculation unit 110 that calculates similarity between genes using biological annotation information; a second similarity calculation unit 120 that calculates similarity between genes using inter-gene mutual information; a seed gene selection unit 130 that selects, from microarray expression data, genes showing specificity to an experimental condition as seed genes; a gene module forming unit 140 that, based on the results obtained through the first and second similarity calculation units 110 and 120, expands a region around each seed gene to form gene modules (module 1 - module n), which are groups of similar genes; a learning unit 150 that performs parallelized Bayesian network learning on each gene module (module 1 - module n) built around a seed gene; and an integration unit 160 that integrates the gene modules (module 1 - module n) learned by the learning unit 150 into a single network using the intermediary genes shared among the gene modules.

The operation of the present invention configured as above is described with reference to the flowchart of FIG. 2.

First, the first similarity calculation unit 110, which measures similarity between genes using the biological annotation information, quantifies the functional similarity between genes as a score (S110).

As shown in FIG. 3, when the functions of two genes differ, their common parent node in the tree structure is found and its value is taken. The similarity is then obtained by applying the standard RSS (Resnik Semantic Similarity) to this value.

Looking more closely via Equation 1: if $S(f_i, f_j)$ is the negative logarithm of the value of the closest common parent node of the two nodes, then the similarity $AI(g_i, g_j)$ between two genes is obtained by summing the $S(f_i, f_j)$ values for all functions the genes have in common and adding to this the maximum $S(f_i, f_j)$ value over their differing functions.

Equation 1
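The formula for Equation 1 is not reproduced in this text, so the following is only a minimal Python sketch of the computation as the surrounding prose describes it. The toy annotation tree `PARENT`, the node values `P`, and the function names are hypothetical. Reading "differing functions" as the functions that only one of the two genes has is also an assumption.

```python
import math

# Hypothetical toy annotation tree: child term -> parent term.
PARENT = {"metabolism": "root", "signaling": "root",
          "glycolysis": "metabolism", "tca_cycle": "metabolism"}
# Hypothetical node values (e.g. fraction of genes annotated at or below each term).
P = {"root": 1.0, "metabolism": 0.4, "signaling": 0.3,
     "glycolysis": 0.1, "tca_cycle": 0.2}

def ancestors(term):
    """Return the term followed by its ancestors up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def S(f_i, f_j):
    """Negative log of the value at the closest common parent of two terms."""
    up_j = set(ancestors(f_j))
    common = next(t for t in ancestors(f_i) if t in up_j)
    return -math.log(P[common])

def annotation_similarity(funcs_i, funcs_j):
    """AI(g_i, g_j): sum of S over shared functions plus the largest S over differing ones."""
    shared = funcs_i & funcs_j
    score = sum(S(f, f) for f in shared)
    # Assumption: "differing functions" are those annotated to only one of the genes.
    cross = [S(a, b) for a in funcs_i - shared for b in funcs_j - shared]
    return score + (max(cross) if cross else 0.0)
```

Under these assumptions, two genes that share a specific term such as `glycolysis` score higher than two genes whose terms only meet at a broad parent such as `metabolism`.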

The second similarity calculation unit 120, which calculates similarity between genes using inter-gene mutual information, quantifies the similarity between genes based on their expression patterns (S110). Mutual information between genes is used as the measure for this purpose.

That is, when $g_i$ and $g_j$ are genes and $x_i$ and $x_j$ are the discretized expression values of $g_i$ and $g_j$, the mutual-information-based similarity between the genes, $MI(g_i, g_j)$, is obtained by Equation 2 below.

Equation 2

$$MI(g_i, g_j) = \sum_{x_i} \sum_{x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)}$$

In Equation 2, $p$ is a probability function.
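As an illustration of Equation 2, the following Python sketch estimates the mutual information of two genes from their discretized expression levels, using empirical frequencies as the probabilities. The function name and the use of the natural logarithm are assumptions; the text does not specify the logarithm base or the discretization scheme.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """MI between two equally long sequences of discretized expression levels."""
    n = len(x)
    count_x = Counter(x)            # marginal counts for gene i
    count_y = Counter(y)            # marginal counts for gene j
    count_xy = Counter(zip(x, y))   # joint counts over experiments
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi
```

Two genes with identical two-level expression profiles give the maximum value for that profile, while profiles that vary independently give a value of zero.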

The seed gene selection unit 130 extracts genes representative of each experimental condition based on gene expression patterns (S120).

That is, let $u_{ci}$ be the mean expression of gene $i$ under experimental condition $c$, $u_{\neg ci}$ the mean expression of gene $i$ under all conditions other than $c$, and $\sigma_{ci}$ and $\sigma_{\neg ci}$ the corresponding standard deviations. Genes whose specificity measure $D$, obtained by Equation 3 below, is at least the threshold $u_D + 3\sigma_D$ are selected as seed genes. Here, $u_D$ and $\sigma_D$ are the mean and standard deviation of the specificity measure, respectively.

Equation 3

$$D = \frac{\left| u_{ci} - u_{\neg ci} \right|}{\sigma_{ci} + \sigma_{\neg ci}}$$
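The selection rule around Equation 3 can be sketched in Python as follows. The function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether a sample or population estimate is meant. The sketch also assumes the two standard deviations are not both zero.

```python
from statistics import mean, pstdev

def specificity(in_cond, out_cond):
    """D = |mean_c - mean_not_c| / (sd_c + sd_not_c) for one gene and one condition."""
    return abs(mean(in_cond) - mean(out_cond)) / (pstdev(in_cond) + pstdev(out_cond))

def select_seed_genes(d_values, k=3.0):
    """Return the genes whose D is at least mean(D) + k * sd(D)."""
    values = list(d_values.values())
    threshold = mean(values) + k * pstdev(values)
    return [gene for gene, d in d_values.items() if d >= threshold]
```

With a cutoff three standard deviations above the mean, only strong outliers pass, so the rule selects few genes unless the input covers many genes.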

The gene module forming unit 140 then expands a region around each seed gene using the union of biological annotation information and mutual information, forming gene modules (module 1 - module n) as shown in FIG. 4 (S130).

For this expansion, related genes are selected using biological annotation information already established through experiments, while mutual information based on gene expression patterns is used so that genes not yet characterized are also included.

The reference values for region expansion when forming the gene modules (module 1 - module n) are $u_{AI} + 4\sigma_{AI}$ and $u_{MI} + 3\sigma_{MI}$, respectively. Here, $u_{AI}$ is the mean of the inter-gene similarity based on biological annotation information, $\sigma_{AI}$ is the standard deviation of that similarity, $u_{MI}$ is the mean of the inter-gene similarity based on mutual information, and $\sigma_{MI}$ is the standard deviation of that similarity.

These reference values serve to adjust the size of the resulting gene modules (module 1 - module n) and to balance the number of genes brought in by mutual information and by biological annotation information; they are empirical values determined through experiments.
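A minimal Python sketch of module formation under the two reference values might look like this. It assumes a single expansion pass in which a gene joins the module when either of its similarities to the seed reaches the corresponding cutoff; the text does not say whether expansion is one pass or iterative, and all names here are hypothetical.

```python
from statistics import mean, pstdev

def cutoff(scores, k):
    """Reference value mean + k * sd over a collection of similarity scores."""
    scores = list(scores)
    return mean(scores) + k * pstdev(scores)

def form_module(seed, genes, ai, mi, ai_cut, mi_cut):
    """Seed plus every gene whose AI or MI similarity to the seed reaches its cutoff."""
    module = {seed}
    for g in genes:
        if g == seed:
            continue
        # Union of the two criteria: annotation-based OR expression-based similarity.
        if ai(seed, g) >= ai_cut or mi(seed, g) >= mi_cut:
            module.add(g)
    return module
```

Here `ai` and `mi` are callables returning the two similarity scores for a gene pair, and `ai_cut` and `mi_cut` would be computed as `cutoff(all_ai_scores, 4)` and `cutoff(all_mi_scores, 3)` to match the reference values above.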

The learning unit 150 maximizes the speed of Bayesian network learning by processing in parallel the gene modules (module 1 - module n) built around the seed genes; to this end, each gene module centered on a seed gene is assigned to its own thread for learning (S140).
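The per-module parallelism can be sketched with Python's standard `concurrent.futures` thread pool. The learner below is a hypothetical placeholder that returns an empty edge list; a real structure-learning routine would take its place. In CPython, threads do not speed up CPU-bound pure-Python code, so a process pool may be the better choice for an actual learner.

```python
from concurrent.futures import ThreadPoolExecutor

def learn_module(module):
    """Placeholder learner: returns the module's genes with an empty edge list."""
    return {"genes": sorted(module), "edges": []}

def learn_all(modules, learner=learn_module, max_workers=4):
    """Learn one network per gene module, one task per module, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(learner, modules))
```

Because the modules are learned independently, no synchronization is needed between tasks until the integration step.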

For effective learning, hill climbing, the sparse candidate method, and model averaging are used, and the MDL (Minimum Description Length) score is used to evaluate network structures.
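The text does not give the scoring formula. One common form of the MDL score for discrete data is the negative log-likelihood plus a penalty of (log N)/2 per free parameter, and the Python sketch below computes that cost for a single node and a candidate parent set. The function name and the data layout (a list of dicts mapping gene name to discretized level) are assumptions.

```python
import math
from collections import Counter

def family_mdl(data, child, parents):
    """MDL cost of one node given its parents; lower is better."""
    n = len(data)
    # Counts of (parent configuration, child state) and of parent configurations alone.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per combination of observed parent states.
    child_states = len({row[child] for row in data})
    parent_configs = 1
    for p in parents:
        parent_configs *= len({row[p] for row in data})
    penalty = 0.5 * math.log(n) * (child_states - 1) * parent_configs
    return -loglik + penalty
```

A hill-climbing search would repeatedly try adding, removing, or reversing a single edge and keep the change whenever the summed cost over all nodes decreases, while the sparse candidate method would restrict which parents are tried for each node.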

Moreover, because inference is performed separately for each gene module (module 1 - module n) rather than over thousands of genes at once, the ratio of experimental data per gene improves greatly, which substantially reduces the likelihood of inference errors.

The integration unit 160 ensures that the gene modules (module 1 - module n) centered on the seed genes do not remain separate, disconnected network results but form a single integrated whole (S150). Conventional methods used clustering to produce multiple groups, but those groups had no links between them and remained merely independent groups.

Accordingly, the present invention integrates all the gene modules (module 1 - module n) into a single network using the intermediary genes shared among the gene modules. FIG. 5 shows the algorithm for this operation, which finally yields a gene interaction network such as that of FIG. 6.
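The algorithm of FIG. 5 is not reproduced in this text, so the following Python sketch shows only one simple way such an integration could work: find the genes that occur in more than one module, then take the union of the edges learned per module so that those shared genes link the module networks. The function names and the edge representation (pairs of gene names) are assumptions.

```python
from collections import Counter

def intermediary_genes(modules):
    """Genes that belong to two or more modules."""
    counts = Counter(g for m in modules for g in set(m))
    return {g for g, c in counts.items() if c >= 2}

def integrate(module_edges):
    """Union the directed edges learned for each module into one edge set."""
    merged = set()
    for edges in module_edges:
        merged.update(edges)
    return merged
```

In this sketch a shared gene keeps a single identity across modules, so an edge into it from one module and an edge out of it from another form a path in the merged network.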

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art may modify or vary the invention in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

As described above, the present invention minimizes the effect of false inferences caused by insufficient microarray experimental data, allowing a reliable gene interaction network to be constructed. It also makes analysis of large numbers of genes at genome scale possible, providing a comprehensive as well as a detailed view of gene interactions.

Claims

A genetic circuit reverse engineering system using Bayesian network learning, comprising:

first similarity calculating means for calculating similarity between genes using biological annotation information;

second similarity calculating means for calculating similarity between genes using mutual information between genes;

seed gene selection means for selecting, from microarray expression data, genes showing specificity to an experimental condition as seed genes;

gene module forming means for forming gene modules, which are groups of similar genes, by expanding a region around each seed gene based on the results of the first and second similarity calculating means;

learning means for performing parallelized Bayesian network learning on each of the gene modules formed around the seed genes; and

integration means for integrating the gene modules learned by the learning means into one network using intermediary genes shared among the gene modules.

The system of claim 1, wherein the first similarity calculating means calculates similarity between genes using RSS (Resnik Semantic Similarity) such that, when $S(f_i, f_j)$ is the negative logarithm of the value of the closest common parent node of two nodes, the similarity $AI(g_i, g_j)$ between two genes is obtained by summing the $S(f_i, f_j)$ values for all common functions and adding the maximum $S(f_i, f_j)$ value over different functions.

The system of claim 1, wherein the second similarity calculating means obtains the similarity between genes $MI(g_i, g_j)$ as

$$MI(g_i, g_j) = \sum_{x_i} \sum_{x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)}$$

where $g_i$ and $g_j$ are genes, $x_i$ and $x_j$ are the discretized expression levels of $g_i$ and $g_j$, and $p$ is a probability function.

The system of claim 1, wherein the seed gene selection means selects as seed genes those genes whose specificity measure $D$, obtained as

$$D = \frac{\left| u_{ci} - u_{\neg ci} \right|}{\sigma_{ci} + \sigma_{\neg ci}}$$

is greater than or equal to the reference value $u_D + 3\sigma_D$ ($u_D$: mean of the specificity measure, $\sigma_D$: standard deviation of the specificity measure), where $u_{ci}$ is the mean expression of gene $i$ under experimental condition $c$, $u_{\neg ci}$ is the mean expression of gene $i$ under all conditions other than $c$, and $\sigma_{ci}$ and $\sigma_{\neg ci}$ are the corresponding standard deviations.

The system of claim 1, wherein the gene module forming means forms gene modules centered on the seed genes using the union of the biological annotation information and the mutual information, and uses $u_{AI} + 4\sigma_{AI}$ and $u_{MI} + 3\sigma_{MI}$ as the reference values for forming the gene modules, where $u_{AI}$ is the mean of the inter-gene similarity based on biological annotation information, $\sigma_{AI}$ is the standard deviation of that similarity, $u_{MI}$ is the mean of the inter-gene similarity based on mutual information, and $\sigma_{MI}$ is the standard deviation of that similarity.

The system of claim 1, wherein the learning means processes the gene modules in parallel, with each gene module handled as a separate thread.

The system of claim 1, wherein the learning means uses hill climbing, the sparse candidate method, or model averaging for learning, and uses the MDL (Minimum Description Length) score as the measure for evaluating network structures.

A genetic circuit reverse engineering method using Bayesian network learning, comprising:

a first step of calculating similarity between genes using biological annotation information and mutual information between genes;

a second step of selecting, from microarray expression data, genes showing specificity to an experimental condition as seed genes;

a third step of forming gene modules, which are groups of similar genes, by expanding a region around each seed gene based on the results obtained through the first step;

a fourth step of performing parallelized Bayesian network learning on each of the gene modules formed around the seed genes; and

a fifth step of integrating the learned gene modules into one network using intermediary genes shared among the gene modules.