KR102051710B1

KR102051710B1 - Genome analysis method based on modularization

Info

Publication number: KR102051710B1
Application number: KR1020170150582A
Authority: KR
Inventors: 김진혁; 김혜영
Original assignee: 한양대학교 산학협력단
Priority date: 2017-11-13
Filing date: 2017-11-13
Publication date: 2019-12-04
Also published as: KR20190054386A

Abstract

A genome analysis method based on modularization includes: receiving, by a computer device, specific gene expression data; determining, by the computer device, the entropy of a gene based on the probabilities of the expression states of each of a plurality of genes constituting a genome; generating, by the computer device, a genomic module network consisting of a plurality of genomic modules to which the plurality of genes belong, based on the entropy; and determining, by the computer device, the probability of each gene with respect to the genomic modules. The entropy represents the non-uniformity of the expression states of each of the plurality of genes, and the computer device configures the genomic modules such that the sum of the entropies of the plurality of genomic modules is minimized.

Description

Genome analysis method based on modularization {GENOME ANALYSIS METHOD BASED ON MODULARIZATION}

The technology described below relates to genome analysis techniques.

Diseases such as malignant tumors have traditionally been assumed to originate in the genome, and research aimed at overcoming malignant tumors has therefore concentrated on the genome. Advances in molecular biology have made possible molecular targeted therapy, which reduces the side effects of conventional anticancer chemotherapy and selectively destroys only cancer cells. A complete cure for malignant tumors, however, is not yet possible. This stems from an insufficient understanding of the function and mechanism of action of the genome. Conventional methods of studying the genome are based on biochemical techniques and are limited in extending understanding beyond chemical function.

US Patent Application Publication US2013-0191098

The technology described below aims to provide a new genome analysis method for diagnosing or treating disease. It aims to provide a technique for interpreting gene expression data. It further aims to provide a technique that analyzes the genome by treating it as an information and communication device that generates and processes genetic information.

A genome analysis method based on modularization includes: receiving, by a computer device, gene expression data of a specific tissue; determining, by the computer device, an entropy for each of a plurality of genes constituting a genome using the data; dividing, by the computer device, the plurality of genes into a plurality of genomic modules based on the entropy; and generating, by the computer device, a genomic module network for the plurality of genes by determining edges that connect genomic modules based on the entropy of the plurality of genomic modules.

An apparatus for constructing a genomic module network includes: an input device that receives gene expression data of a specific tissue; a storage device that stores a program that divides a plurality of genes into a plurality of genomic modules based on the entropies of the plurality of genes determined using the data, and that determines edges connecting the genomic modules based on the entropy of the plurality of genomic modules; and a computing device that runs the program to determine the entropies of the plurality of genes included in the gene expression data and generates the genomic module network based on the determined entropies.

The entropy represents the interrelationship between at least two of the plurality of genes, based on the probabilities of the expression states of the genes.

The technology described below analyzes the genome as a network structure and analyzes the pathways of information, and can thereby provide a treatment method suited to a particular patient and disease.

FIG. 1 is an example of a genomic module network.
FIG. 2 is an example showing the basis states of a genome.
FIG. 3 is an example of a density matrix in genome space.
FIG. 4 is an example of genomic modules located in the basis state matrix of a genome.
FIG. 5 is an example of measuring, in sample space, gene expression in genome space.
FIG. 6 is an example showing the kernel modules for eight tissues.
FIG. 7 is an example showing the result of mapping the kernel modules of BRNO to the kernel modules of other tissues.
FIG. 8 is an example showing the result of mapping the modules of the CCDR domain of BRNO to the modules of other tissues.
FIG. 9 is an example showing the density matrix of an arbitrary module in sample space.
FIG. 10 is an example showing the perturbation of a module caused by excluding gene i from the gene network of an arbitrary module.
FIG. 11 is an example of inter-module networks constructed from TCGA data sets for eight tissues.
FIG. 12 is an example of the inter-module network of BRNO with various cutoff values applied.
FIG. 13 is an example of mapping the inter-module network of BRNO to the modules of other tissues.
FIG. 14 is an example of a flowchart of an analysis method using a genomic module network.
FIG. 15 is an example of a system for constructing a genomic module network.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology to particular embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

First, terms used throughout the following description are explained. (1) A genomic module is an object obtained by modularizing the genome according to a given criterion. One module consists of a plurality of genes. (2) A genomic module network is a network in which a plurality of genomic modules are connected by edges. (3) A gene network is a network in which the genes within one module are connected by edges. (4) Entropy represents the degree of functional cohesion of a plurality of genes and the activity of a property. Details are given in the relevant description below. Genomic modules, the inter-module network, and gene networks are constructed on the basis of entropy.

The technology described below aims to establish the relationship between a modularized genomic module network and phenotype. To this end, the flow of information about the transcriptional activity of genes in the genome is explored and then examined in terms of phenotype.

Most prior research on specific diseases (for example, malignant tumors) is based on biochemical techniques. The technology described below takes an entirely different viewpoint: it treats a living organism as a single system and analyzes it as such.

As living organisms evolved from anucleate cells to nucleated cells and from unicellular to multicellular forms, their structure became complex both vertically and horizontally. Vertically, they formed a multilayered structure; horizontally, they formed complex connections among many components, and so developed into evolved biological systems. In general, a system is a collection of components connected to one another in an organized way, in which one component or a subset of components affects the properties of the system. Ackoff (1972) and Checkland (1981) stated that a system expresses the properties of the system itself rather than the properties of its individual components or parts. By the same principle, a biological system expresses the properties of the system itself rather than the properties of its component proteins and genes, and the properties of the biological system itself become the phenotype. Proteins and genes can influence the phenotype of an organism, but they can hardly be regarded as the elements that build the biological system itself.

A biological system responds to changes in its internal and external environment by expressing an appropriate phenotype. The only place where such response scenarios can be encoded is the DNA chain. The biological system as a whole can therefore be said to be specified by information about its genes.

A malignant tumor is used as an example. The direct cause of death from a malignant tumor is not the behavior of specific genes and proteins within cancer cells but the phenotype that the cancer cells or the malignant tumor express. Because the expression of a phenotype arises from the properties of the cancer's biological system itself, it is difficult to regulate or block the phenotype. The technology described below aims to isolate the genetic information of the biological system that is associated with a particular phenotype of a malignant tumor and to establish the relationship between that genetic information and the expression of the phenotype. It uses a genomic module network in which phenotype-related genes are modularized. A genomic module network is a model in which phenotype-related genes are divided into modules according to a given criterion and the interrelationships between the modules are defined.

The technology described below uses gene expression data to construct a genomic module network associated with an organism or with a particular disease of the organism. The underlying concepts and techniques for constructing a genomic module network are described below. A genomic module network is itself a system.

FIG. 1 is an example of a genomic module network 100. Before the process of constructing the genomic module network 100 is described, its structure is described first. A genomic module network consists of a plurality of genomic modules and edges that connect particular modules.

Each module of the genomic module network 100 contains certain genes. In FIG. 1, modules are drawn as solid circles. The number on a module is an example of an identifier that distinguishes the module. The genes belonging to one module are genes that are associated with one another in the expression of a particular phenotype. Modules can in turn be grouped according to the function they perform in relation to the phenotype. In FIG. 1, dotted circles group modules by related function. A region that groups functionally related modules in this way is called a domain. FIG. 1 shows domains A, B, C, D and E. Module 84 acts as a relay connecting domain A and domain C. Each domain may perform a function such as cell cycle, DNA damage control, epithelial tissue, extracellular matrix, immunity, or angiogenesis.

FIG. 1 shows pairs of modules connected by solid lines (edges). Two modules connected by an edge have a certain association with each other. An edge can also be regarded as a channel through which information is transferred or exchanged between modules. For modules connected by an edge, one module can be regarded as being involved in the function of the other. When modules belonging to different domains are connected, one domain can be interpreted as influencing the function of the other. For example, a particular phenotype may be expressed through a single module, or it may be expressed through the direct or indirect involvement of several modules.

Module 27 is shown enlarged at the bottom of FIG. 1. As described above, each module consists of a plurality of genes. The genes belonging to module 27 are identified by letters. Pairs of genes belonging to one module are connected by edges. Just as the modules form an inter-module network, the genes within a module also form a network. This is called a gene network.

The process of constructing a genomic module network includes modularizing the genome, constructing the network between genomic modules, and constructing the gene network within each module. Each process is described below.

State of the genome

The concepts that define the state of a gene are explained first. The state of a gene is described at the level of a quantum system. A quantum system is represented in the form of a density matrix.

A single gene has two basis states, active and inactive. The active state means that the gene is active in transcription. At any given time a gene is either active or inactive, so the basis states are mutually exclusive and, mathematically, orthonormal in a vector space. The active state may be written as "1" or "on" and the inactive state as "0" or "off". For convenience, the active and inactive states are written as the basis state vectors $|1\rangle$ and $|0\rangle$, respectively. The real state vector $|g\rangle$ of a single gene is a linear combination of the two basis states, as in Equation 1 below.

$$|g\rangle = a_0|0\rangle + a_1|1\rangle \qquad (1)$$

In Equation 1, a₀ is the coefficient of the inactive state and a₁ is the coefficient of the active state. The amount of mRNA produced by the gene is determined by a₁. The active state vector of the gene can be expressed in the generalized form of Equation 2 below.

The basis state vectors of the active and inactive states are orthonormal. The two gene states, active and inactive, can be normalized with respect to the active state, so that the two states are expressed in a single normalized form and the coefficients of the gene state are normalized accordingly. The normalized coefficient represents the probability that the gene is in the active state.
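The normalization of the two coefficients a₀ and a₁ can be sketched in Python as follows. This is only an illustrative sketch, not the disclosed implementation: it assumes real-valued coefficients and the usual squared-amplitude convention for turning a normalized coefficient into a probability, and the function names `normalize` and `state_probabilities` are hypothetical.

```python
import math


def normalize(a0, a1):
    """Scale (a0, a1) so that a0**2 + a1**2 == 1 (real coefficients assumed)."""
    norm = math.hypot(a0, a1)
    if norm == 0:
        raise ValueError("at least one coefficient must be non-zero")
    return a0 / norm, a1 / norm


def state_probabilities(a0, a1):
    """Return the probabilities of the inactive ('off') and active ('on') states.

    Assumption: probability = squared normalized coefficient.
    """
    n0, n1 = normalize(a0, a1)
    return {"off": n0 ** 2, "on": n1 ** 2}


# Hypothetical coefficients: the active state dominates.
probs = state_probabilities(1.0, 3.0)
print(probs)
```

With these hypothetical inputs the active state receives most of the probability, which matches the statement that mRNA production is governed by the active-state coefficient.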

FIG. 2 is an example showing the basis states of genes. FIG. 2 shows the basis states of two genes, g₁ and g₂. Together, the two genes can take four basis states. FIG. 2 also shows the active state vector of each of the genes g₁ and g₂.

In a genome consisting of n genes, the genes as a whole can take 2ⁿ basis states. Assuming n = 2, each of the two genes has orthonormal basis state vectors, so the basis state vectors of the genome can be written in terms of the basis state vectors of the two genes. A genome consisting of two genes therefore has four orthonormal basis state vectors, one for each combination of active and inactive states of the two genes. The vector space grows with the number of genes. The space defined by the basis state vectors of the genome is a Hilbert space, and it is referred to below as the genome space.

The real state of a genome with two genes can be expressed as in Equation 3 below. The two genes are referred to as the first gene and the second gene.

The real state of a gene is determined by the basis states of all the genes in the genome. Equation 3 contains, for the first gene, a coefficient for its inactive state and a coefficient for its active state, and likewise a coefficient for the inactive state and a coefficient for the active state of the second gene. Because the two genes together can take four basis states, Equation 3 also contains four joint coefficients: one for the case in which both genes are inactive, one for the case in which the first gene is inactive and the second gene is active, one for the case in which the first gene is active and the second gene is inactive, and one for the case in which both genes are active.
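The four basis states and their joint coefficients can be illustrated with a short Python sketch. It makes a simplifying assumption that the source does not state: the two genes are treated as uncorrelated, so each joint coefficient is the product of the two single-gene coefficients. The names `basis_labels` and `product_state` and the numeric coefficients are hypothetical.

```python
from itertools import product


def basis_labels(n):
    """Return the 2**n basis-state labels of n genes.

    Each label has one character per gene: '0' = inactive, '1' = active.
    """
    return ["".join(bits) for bits in product("01", repeat=n)]


def product_state(gene1, gene2):
    """Joint coefficients of two genes given as (a_inactive, a_active) pairs.

    Assumption: the genes are uncorrelated, so every joint coefficient is
    the product of the corresponding single-gene coefficients.
    """
    return {
        label: gene1[int(label[0])] * gene2[int(label[1])]
        for label in basis_labels(2)
    }


labels = basis_labels(2)
# Hypothetical coefficients: gene 1 mostly active, gene 2 fully inactive.
joint = product_state((0.6, 0.8), (1.0, 0.0))
print(labels)
print(joint)
```

Correlated genes would not factor in this way; the joint coefficients would then have to be specified directly, which is what makes the non-uniformity of the joint distribution informative.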

Gene i has an active state and a corresponding level of transcription. As in Equation 4 below, the state of the two genes can be expressed as the probability of staying in each basis state.

Equation 4 gives the probability that gene i stays in a given basis state. These probabilities can be obtained by an operation that extracts the diagonal elements of a matrix. The probability that gene i is in the active state is determined in the same way. The probability that the genome as a whole is in a given basis state is the sum of the probabilities that each gene in the genome is in that state.

The probability distribution over the states of the genes in a genome reflects the characteristic relationships between the genes. If the gene states follow a uniform distribution, the genes can be said to be active at random, with no association between them. As the association between genes increases, however, the unevenness of the state distribution increases. The unevenness of the probability distribution of gene states is therefore information that indicates the degree of association of the genes in the genome.
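The link between unevenness and entropy can be made concrete with a small Python sketch. It uses the Shannon entropy of a probability distribution over four basis states as a stand-in measure; the two example distributions are hypothetical and are not taken from the source.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over the four basis states of two genes.
uniform = [0.25, 0.25, 0.25, 0.25]   # unrelated genes, random activity
uneven = [0.85, 0.05, 0.05, 0.05]    # strongly associated genes

h_uniform = shannon_entropy(uniform)
h_uneven = shannon_entropy(uneven)
print(h_uniform, h_uneven)
```

The uniform distribution reaches the maximum of 2 bits for four states, while the uneven distribution has lower entropy, so a drop in entropy signals increased association between genes.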

A genome with n genes has a total of 2ⁿ basis states. Each basis state is identified by its index among all the basis states, and all the basis state vectors are mutually orthonormal. The active state of gene i can therefore be expressed as in Equation 5 below.

The degree of mRNA production of gene i follows the corresponding coefficient. The genome controls mRNA production by controlling the corresponding genes.

The dyad of the active state of gene i, normalized so that its trace is 1, is called the density matrix of gene i. Because the square of this density matrix equals the density matrix itself, the density matrix represents a pure state of the genome. In a quantum system, a pure state is a state that is known exactly. Given the stochastic nature of the genome system, it is useful to use density matrices to describe the mixed state of the genome as an ensemble of pure states of genes. The mixed-state density matrix ρ of the genome is thus a combination of the density matrices of the individual genes, each weighted by its probability. If every probability has the same value 1/n, ρ is the average of the gene density matrices, as expressed in Equation 6 below.
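The construction of a mixed-state density matrix from per-gene dyads can be sketched in Python as follows. The sketch assumes real-valued state vectors and equal weights of 1/n for all genes; the function name `mixed_density_matrix` and the two example vectors are hypothetical and serve only to show the mechanics.

```python
def mixed_density_matrix(states):
    """Average the trace-normalized dyads |g><g| of real state vectors.

    Each gene gets the same weight 1/n (an assumption of this sketch).
    """
    n = len(states)
    dim = len(states[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for v in states:
        norm2 = sum(x * x for x in v)  # dividing by this gives the dyad trace 1
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += v[r] * v[c] / (norm2 * n)
    return rho


# Hypothetical active-state vectors over the basis states 00, 01, 10, 11.
g1 = [0.0, 0.0, 1.0, 1.0]  # gene 1 is active in states 10 and 11
g2 = [0.0, 1.0, 0.0, 1.0]  # gene 2 is active in states 01 and 11

rho = mixed_density_matrix([g1, g2])
diagonal = [rho[k][k] for k in range(4)]
trace = sum(diagonal)
print(diagonal, trace)
```

The trace stays equal to 1 because every dyad is normalized before averaging, and the diagonal entries can be read as the probabilities of the individual basis states.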

Because the genome space is a Hilbert space, the probability of a unit vector with respect to the density matrix can be defined according to Gleason's theorem, as in Equation 7 below.

The probability that the genome stays in a given basis state is computed in this way. The probabilities that the genome is in particular basis states lie on the diagonal of the genome's density matrix. The density matrix of a genome consisting of n genes is a square matrix of size 2ⁿ×2ⁿ. It has 2ⁿ eigenvectors and eigenvalues. The eigenvectors represent eigenstates, and each eigenvalue represents the probability of the corresponding state.
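The following Python sketch shows how a probability can be read off a density matrix for a given unit vector. It assumes that the probability takes the standard trace form associated with Gleason's theorem, that is, the quadratic form of the unit vector with the density matrix, and it assumes real-valued entries; the function name `probability` and the example matrix are hypothetical.

```python
import math


def probability(rho, v):
    """Return the quadratic form u^T rho u, where u is v scaled to unit length.

    Assumption: rho is a real symmetric density matrix (trace 1).
    """
    norm = math.sqrt(sum(x * x for x in v))
    u = [x / norm for x in v]
    dim = len(u)
    return sum(u[r] * rho[r][c] * u[c] for r in range(dim) for c in range(dim))


# Hypothetical density matrix of a single two-state system.
rho = [[0.75, 0.25],
       [0.25, 0.25]]

p_first = probability(rho, [1.0, 0.0])   # first basis vector
p_second = probability(rho, [0.0, 1.0])  # second basis vector
p_mixed = probability(rho, [1.0, 1.0])   # a direction between the two axes
print(p_first, p_second, p_mixed)
```

For basis vectors the result is simply the matching diagonal entry of the matrix, which is why the basis-state probabilities lie on the diagonal.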

The probability that the genome system stays in each eigenstate is non-uniform. In the genome system this non-uniformity corresponds to genetic information. FIG. 3 is an example of a density matrix in genome space. The density matrix takes an elliptical form in a two-dimensional genome space. The 2ⁿ axes drawn as dotted arrows represent the basis state vectors of the genome. The 2ⁿ axes drawn as solid arrows represent the eigenvectors. The length of each bold arrow represents the probability of the corresponding eigenvector. Black dots represent genes.

The eigenvectors of the mixed-state density matrix ρ of the genome represent the emergent properties of the genome system, and the eigenvalue of each eigenvector determines the probability that the property is expressed. The non-uniformity can be expressed as the entropy S(ρ). If the genome activates genes in no interaction at all, or activates many genes simultaneously across very many interactions, the entropy is high. As the entropy increases, the ellipse of the density matrix in genome space loses its particular orientation and approaches a circle. Conversely, if the genome concentrates its activity on a small number of specific targets, the entropy is low. Genetic information generated in genome space is delivered to the protein network in real space. mRNA is the channel that connects genome space and protein space.
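The behavior of S(ρ) can be illustrated with a minimal Python sketch. It assumes that S(ρ) is the von Neumann entropy computed from the eigenvalues of the density matrix, and it is restricted to real symmetric 2×2 matrices so that the eigenvalues have a closed form; the function names and example matrices are hypothetical.

```python
import math


def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix, in closed form."""
    (a, b), (_, d) = rho
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius


def von_neumann_entropy(rho):
    """S(rho) = -sum(l * log2(l)) over the non-zero eigenvalues l, in bits."""
    return -sum(l * math.log2(l) for l in eigenvalues_2x2(rho) if l > 1e-12)


# Hypothetical density matrices.
focused = [[0.5, 0.5], [0.5, 0.5]]   # one dominant direction (elongated ellipse)
diffuse = [[0.5, 0.0], [0.0, 0.5]]   # no preferred direction (circle)

s_focused = von_neumann_entropy(focused)
s_diffuse = von_neumann_entropy(diffuse)
print(s_focused, s_diffuse)
```

The focused matrix has a single non-zero eigenvalue and therefore zero entropy, while the diffuse matrix spreads probability evenly over both eigenvectors and reaches the maximum of one bit.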

Genome modularization

Higher eukaryotes run different protein networks simultaneously even within a single cell. Suppose that the genes involved in a particular interaction belong to one group. The genes in that group act in association with one another to produce the proteins that exhibit the phenotype related to the interaction. The genes belonging to the group can therefore be defined as one module. A module is a set of genes involved in producing the proteins that take part in a particular phenotype. When the genes of the whole genome are analyzed, the genome can be divided into a plurality of modules. The genes belonging to a module may be directly involved in producing the proteins for a particular phenotype, or they may be indirectly involved in the process of producing particular proteins.

A researcher divides the whole genome into modules that are as independent as possible and analyzes the associations between the independent modules to identify the links between them. A network in which the whole genome is defined by modules and links in this way is a genomic module network. The aim is to construct a genomic module network for a particular genome and to analyze the genome at the level of that network.

A plurality of modules may cooperate in the expression of a particular phenotype. The modules can be said to communicate with one another through the links between them.

In principle, genomic modules can be separated by suitably ordering the gene indices and the basis states. The probability that the genome stays in each basis state is close to zero for most states and varies within the regions of the genomic modules.

In a unicellular organism, a gene cannot be in several different states at the same time and can therefore perform only one role. In other words, because mRNA maintains a single level in a physically continuous space, one gene must belong to one genomic module.

In a multicellular organism, by contrast, a gene is expressed in each of several physically separated spaces. Much as a computer's central processing unit multitasks through time division, a gene in a nucleated organism can multitask through space division, which suggests one reason why nucleated organisms evolved into multicellular organisms.

FIG. 4 is an example of genomic modules located in the basis state matrix of the genome. The vertical axis represents the index of the basis state, and the horizontal axis represents the gene index. Module c and module b partially overlap.

Modules a, b and d, or modules a, c and d, can be active in a single cell (a single genome space). Modules b and c, however, share genes and must therefore be active in different cells (multiple genome spaces).

Modules a and b partially overlap in their basis states, but their eigenvectors point in different directions. The two modules are therefore involved in different protein networks and phenotypes. The mutual information of modules a and b is I(a:b) = S(ρ_a) + S(ρ_b) - S(ρ_ab), and it quantifies the interdependence between the two modules. As the number of basis states shared by the two modules increases, S(ρ_ab) decreases and the mutual information increases. The number of shared basis states can thus affect how strongly two genomic modules are connected. However, once each genomic module is complex enough to express its own characteristics, this connection acts as a parameter in the execution of the genomic module.
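For reference, the quantities in this paragraph can be written out explicitly. The block below assumes that S is the von Neumann entropy of a density matrix (an assumption, though consistent with the entropy values in nats reported later in this description). The symbol λ_k, the k-th eigenvalue of ρ, is introduced only for this illustration.

```latex
% Von Neumann entropy of a density matrix \rho with eigenvalues \lambda_k (assumed definition)
S(\rho) = -\operatorname{Tr}(\rho \ln \rho) = -\sum_k \lambda_k \ln \lambda_k

% Mutual information of modules a and b, as stated in the text
I(a{:}b) = S(\rho_a) + S(\rho_b) - S(\rho_{ab})

% Limiting case: fully independent modules
\rho_{ab} = \rho_a \otimes \rho_b
\;\Longrightarrow\; S(\rho_{ab}) = S(\rho_a) + S(\rho_b)
\;\Longrightarrow\; I(a{:}b) = 0

% General case (subadditivity of the von Neumann entropy)
S(\rho_{ab}) \le S(\rho_a) + S(\rho_b)
\;\Longrightarrow\; I(a{:}b) \ge 0
```

Read this way, a larger I(a:b) corresponds to a joint entropy that falls further below the sum of the individual module entropies.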

In a genome system, a gene takes various expression levels depending on the active state of the genome, in a space whose axis is time or samples. The active state of a gene, which determines its expression level, was defined in Equation 5. Equation 5 can be rewritten in terms of the basis vectors of the genome space, as in Equation 8.

Equation 8 shows that the expression level of any one gene in a genome system can change according to its pattern of interaction with all the genes in the genome.

For gene i in a prokaryote, all but one of the terms in Equation 8 are zero, because there is no gene-to-gene interaction within the genome. According to Equation 8, the active state of gene i in a prokaryote is then a null vector, which means that there is no genome space. The prokaryotic genome is therefore subordinate to the proteome, and the expression level of a gene is determined by an active state that is a scalar quantity.

In eukaryotes, however, these terms take nonzero values for gene i. The active state of gene i therefore depends on the active state of the whole genome, is multilinear, and lies in the genome space. Even when the expression level of a gene is numerically the same, it carries a different meaning depending on the active state of the genome. For eukaryotes, therefore, instead of assessing the activity of individual genes, one should use the entropy S(ρ) of a genomic module, which represents the functional cohesion of its genes and the activity and interrelation of their characteristics.

Real space and genome space are fundamentally different. Genome space is a 2ⁿ-dimensional space whose unit vectors are the basis states of the genome. Real space is the three-dimensional physical world in which chemical reactions, such as the production of specific proteins through gene activity, take place in an organism. It is therefore impossible to access the genome space directly in order to assess genome activity, and a method is needed that transforms the genome space into the sample space of gene expression data for analysis. Sample space is an m-dimensional space whose unit vectors are the individual samples.

High-throughput technologies such as cDNA microarrays can measure the expression levels of tens of thousands of genes simultaneously. Because mRNA carries information from genome space to real space, high-throughput expression measurement offers a view into the genome space. Measuring gene expression is a mapping of the genome's state from genome space to sample space: high-throughput measurement over m samples transforms the information carried by mRNA in genome space into an m-dimensional sample space. Using a transformation matrix, the density matrix of the genome space can be converted into the density matrix of the sample space, as in Equation 9.

Equation 9 uses the transformation matrix together with a pseudo-inverse matrix. The mixed state of a gene in sample space is given by Equation 10.

Determining the state of the genome or of a gene directly from the measured expression levels would require the corresponding transformation to be known. The measurement process, however, involves the selection of samples in time or sample space, the measurement method, data processing and so on. The transformation matrix is therefore affected by many factors, and even under identical conditions each gene is affected differently. As a result, the measured values depend on the experimental conditions and environment. This explains why gene expression data are inconsistent in principle, and statistical or experimental methods are severely limited in overcoming this inherent weakness of the data.

Equations 7, 9 and 10 together imply the following. For a genomic module represented by a density matrix, the probability of gene i in genome space equals its probability in sample space. Moreover, because the transformation is unitary, the entropy of the genome space equals the entropy of the sample space. This shows that the probability and the entropy are parameters unaffected by the measurement environment, which otherwise alters gene expression levels. Although it is impossible to obtain a perfect transformation matrix for measuring a eukaryotic genome, the entropy and the probabilities can be obtained without taking the measurement process into account.
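The invariance claimed here can be checked numerically. The sketch below rests on assumptions not taken from this description: each gene is a unit vector, the density matrix is the average of the outer products of those vectors, a gene's probability is the quadratic form of its vector with the density matrix, and the transformation is a square orthogonal (real unitary) matrix. The actual map from the 2ⁿ-dimensional genome space to the m-dimensional sample space is not square, so this is only a toy check. All names are hypothetical.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy -Tr(rho ln rho) in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                      # drop numerically zero eigenvalues
    return float(-(w * np.log(w)).sum())

def gene_probabilities(genes, rho):
    """Quadratic form g_i^T rho g_i for each unit row vector g_i."""
    return np.einsum("ij,jk,ik->i", genes, rho, genes)

rng = np.random.default_rng(0)
genes = rng.normal(size=(6, 4))                        # 6 hypothetical genes in 4 dimensions
genes /= np.linalg.norm(genes, axis=1, keepdims=True)  # normalize each gene vector
rho = genes.T @ genes / len(genes)                     # density matrix with unit trace

q, _ = np.linalg.qr(rng.normal(size=(4, 4)))           # random orthogonal matrix
genes_t = genes @ q.T                                  # transformed gene vectors
rho_t = q @ rho @ q.T                                  # transformed density matrix

same_entropy = bool(np.isclose(entropy(rho), entropy(rho_t)))
same_probs = bool(np.allclose(gene_probabilities(genes, rho),
                              gene_probabilities(genes_t, rho_t)))
```

Both flags come out true because an orthogonal change of basis leaves the eigenvalues of the density matrix and every quadratic form unchanged.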

FIG. 5 is an example of measuring, in sample space, gene expression that takes place in genome space. It shows the density matrix of a genomic module and a gene state in the 2ⁿ-dimensional genome space being transformed into their counterparts in the m-dimensional sample space, each space having its own basis vectors. Because the gene's probability with respect to the module is the same in both spaces, it is unaffected by the transformation from genome space to sample space that measurement entails.

If the vectors of all genes point in the same direction in sample space, the entropy is zero: the ellipse representing the density matrix collapses to a straight line along the first eigenvector. If a gene's probability is the same along every eigenvector, so that the density matrix becomes a perfect circle (or sphere), the entropy takes its maximum value.
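These two limiting cases can be reproduced with a minimal sketch. It assumes, as an illustration only, that the density matrix of a module is the average of the outer products of its normalized gene expression vectors and that the entropy is the von Neumann entropy in nats. The function names are hypothetical.

```python
import numpy as np

def density_matrix(expr):
    """expr: (genes, samples) array with no all-zero rows; returns a unit-trace matrix."""
    g = expr / np.linalg.norm(expr, axis=1, keepdims=True)  # unit vector per gene
    return g.T @ g / g.shape[0]

def entropy(rho):
    """Von Neumann entropy -Tr(rho ln rho) in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                                        # ignore zero eigenvalues
    return float(-(w * np.log(w)).sum())

# Three genes whose expression vectors all point in the same direction.
aligned = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
# Three genes spread evenly over three orthogonal directions.
spread = np.eye(3)

s_aligned = entropy(density_matrix(aligned))   # zero up to rounding
s_spread = entropy(density_matrix(spread))     # ln 3, the maximum in 3 dimensions
```

In m dimensions the maximum under these assumptions is ln m, reached when all eigenvalues of the density matrix equal 1/m.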

An example of building the genomic module network described above from actual gene expression data follows. A tumor can be regarded as a small independent system living inside a large host system, so the genomic module network is built here from the genetic information of tumors.

The gene expression data sets comprised six cancer tissues, namely breast cancer (BRCA), colon cancer (COAD), rectal cancer (READ), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and ovarian cancer (OV); two normal tissues, namely normal breast tissue (BRNO) and normal colon tissue (CONO); and three mixtures, namely a random mix of the six cancer data sets (X6CA), a random mix of the two normal data sets (X2NO), and a random mix of the six cancer and two normal data sets (X6C2N). BRCA and the other labels denote data sets of gene expression measured from the corresponding tissues by The Cancer Genome Atlas (TCGA) and released publicly for academic research. To reduce computation time, 36 samples were selected at random from each of BRCA, COAD, LUSC and OV, which contain more than 36 samples. Genomic modules were then isolated from these data sets.

The process of isolating the genomic modules from a gene expression data set is described next, beginning with the points above that the modularization process relies on. Let ρ_1, ρ_2, ..., ρ_n be the density matrices of n completely independent modules (modules with no connectivity at all). Since each module's space is a Hilbert space, the overall density matrix is the tensor product ρ = ρ_1 ⊗ ρ_2 ⊗ ... ⊗ ρ_n. The total entropy therefore equals the sum of the module entropies, that is, S(ρ) = S(ρ_1) + S(ρ_2) + ... + S(ρ_n).

If the n modules are not independent of one another (that is, if there is connectivity between modules), the total entropy is smaller than the sum of the module entropies, that is, S(ρ) < S(ρ_1) + S(ρ_2) + ... + S(ρ_n).

Any one module influences other modules and may exchange information with them. A module is a set of genes, and the same gene may appear in different modules. Modules in a genomic module network therefore rarely act completely independently. The criterion for separating modules is then to find the modules that minimize the difference between the sum of the module entropies and the total entropy.

However, the actual number of modules active in a genome is unknown, as are the range of participating genes and which genes take part in several modules at once. Searching for the combination of modules in this way is therefore practically infeasible. As a workaround, the local optimal points of the true modules that actually exist can be identified, and estimated modules can be completed by building modules around those points.

In this process, transitions to other local optimal points are prevented so that the estimated modules stay close to the true modules. The outer boundary of estimated modules obtained in this way, which overlap one another substantially, coincides with a domain: a connected group of true modules that expresses a phenotype of a broad category. This is because a domain governing a broad category of phenotype necessarily has few information exchange channels with other domains (max-flow min-cut).

Table 1 below is an example of pseudocode for the algorithm that finds local optimal points for genomic modules. The genes are divided into arbitrary sets, and for each set (module) genes are removed one at a time, lowering the entropy to a target value, to find a local optimal point. Because the local optimal point must lie inside an actual module, the entropy target is set sufficiently low. In Table 1, "th" is the threshold that serves as this target, and the backslash denotes the operation of removing the element on the right from the set on the left.
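The removal loop described above can be sketched in Python as follows. This is not a reproduction of Table 1. It assumes that the module entropy is the von Neumann entropy of the averaged outer products of normalized gene vectors, and it assumes a greedy rule in which the gene whose removal lowers the entropy most is dropped at each step. The names `module_entropy`, `find_local_optimum` and `min_size` are hypothetical.

```python
import numpy as np

def module_entropy(expr, genes):
    """Entropy (nats) of the module formed by the given gene rows of expr."""
    g = expr[list(genes)]
    g = g / np.linalg.norm(g, axis=1, keepdims=True)    # unit vector per gene
    w = np.linalg.eigvalsh(g.T @ g / len(g))
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def find_local_optimum(expr, genes, th, min_size=2):
    """Remove genes one at a time until the module entropy is at most th."""
    module = list(genes)
    while len(module) > min_size and module_entropy(expr, module) > th:
        # Assumed rule: drop the gene g for which module \ {g} has the lowest entropy.
        drop = min(module,
                   key=lambda g: module_entropy(expr, [x for x in module if x != g]))
        module.remove(drop)
    return module

# Toy data: genes 0 to 3 share one direction, genes 4 and 5 point elsewhere.
expr = np.array([[2, 0, 0], [1, 0, 0], [3, 0, 0], [5, 0, 0],
                 [0, 4, 0], [0, 0, 1]], dtype=float)
core = find_local_optimum(expr, range(6), th=0.05)      # keeps genes 0, 1, 2, 3
```

The `min_size` guard is a design choice for the sketch: without it, any set would eventually reach zero entropy by shrinking to a single gene.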

The genomic modules are finally determined from the local optimal points found by the procedure of Table 1. Table 2 below is an example of the algorithm that completes a genomic module.

The procedure of Table 2 strengthens a module by adding outside genes j one at a time, on the condition that the entropy does not increase. To keep the center of the module from shifting during this strengthening, the change in direction of the principal eigenvector v₁ is limited. In Table 2, "th" is the threshold on the angle by which the principal eigenvector may change.
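A minimal sketch of this strengthening step follows. It is not a reproduction of Table 2. It assumes the same density-matrix construction as an illustration, measures the angle against the principal eigenvector of the starting module (whether the reference axis should instead be updated after each addition is not stated here), and visits candidates in the order given. The names `grow_module` and `angle_th` are hypothetical.

```python
import numpy as np

def density_matrix(rows):
    g = rows / np.linalg.norm(rows, axis=1, keepdims=True)  # unit vector per gene
    return g.T @ g / g.shape[0]

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def principal_axis(rho):
    _, vecs = np.linalg.eigh(rho)       # eigh returns eigenvalues in ascending order
    return vecs[:, -1]                  # eigenvector of the largest eigenvalue

def grow_module(expr, core, candidates, angle_th):
    """Add outside genes one by one if entropy does not rise and v1 barely turns."""
    module = list(core)
    rho = density_matrix(expr[module])
    s, v_ref = entropy(rho), principal_axis(rho)
    for j in candidates:
        if j in module:
            continue
        rho_try = density_matrix(expr[module + [j]])
        s_try = entropy(rho_try)
        # abs() removes the arbitrary sign of an eigenvector.
        cos = abs(float(v_ref @ principal_axis(rho_try)))
        angle = np.arccos(min(cos, 1.0))
        if s_try <= s and angle <= angle_th:
            module.append(j)
            s = s_try
    return module

expr = np.array([[1, 0.3, 0], [1, -0.3, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
module = grow_module(expr, core=[0, 1], candidates=[2, 3], angle_th=0.1)
```

In this toy case gene 2 lies along the module's principal axis and is accepted, while gene 3 is orthogonal to it, raises the entropy and is rejected.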

The optimal parameters, such as the entropy target in Table 1 and the angle threshold for the principal eigenvector in Table 2, depend on the characteristics of the gene expression data. It may therefore be necessary to build the genomic module network and identify the domains for a range of parameter values, and to take as optimal the parameters that give consistent results.

In general, low entropy indicates a system focused on a specific goal, represented by the first eigenvector of the density matrix. In the genome system of a eukaryotic cell, a genomic module with low entropy can be said to generate the information needed to express a specific phenotype.

Some of the genes that make up genomic modules appear in more than one module. This is because the modules were constructed under the condition that the change in the first eigenvector stays below a given threshold.

Every tissue that produced the data sets used in the experiment consists of several cell types derived from the endoderm, mesoderm and ectoderm, so an expression data set is the accumulated signal of many cell types. Because cells of the same type, and even individual cells, each carry out their own functions, many genes can be involved in different functions at the tissue level simultaneously, depending on the cell they belong to. Analyzing the complex expression profile of a single gene in a tissue is therefore difficult.

The genomic modules obtained from the six cancer data sets (BRCA, COAD, READ, LUAD, LUSC, OV), the two normal data sets (BRNO, CONO) and the three mixed data sets (X6CA, X2NO, X6C2N) have a range of entropy levels. The experiments isolated modules with extremely low entropy. The module with the lowest entropy was (i) the second module (m2) in most tissues and (ii) the first module (m1) in breast cancer tissue (BRCA), normal colon tissue (CONO) and ovarian cancer tissue (OV).

FIG. 6 shows the gene networks of this module as found in eight tissues, using the TCGA data sets BRNO, CONO, BRCA, COAD, READ, LUAD, LUSC and OV. Each module consists of a plurality of genes, and the genes in a module form a network. An edge between genes denotes a channel through which the genes pass or exchange information. In FIG. 6 the size of each node is proportional to its number of edges. Although the modules come from different tissues, their main genes are shared. For example, TYR, HBE1, F2, GDF3 and AHSG are included in every tissue, and each of the remaining genes is included in at least half of the tissues. In most tissues the gene network is also organized around TYR and AHSG. The module is thus not composed of exactly the same genes in every tissue, but its gene membership is largely common across tissues.

That a particular module has very low entropy in every tissue means that it is active in all cells regardless of phenotype and function. The module therefore performs a function common to all cell types and can be regarded as a core component of the eukaryotic genome system. This module, which has extremely low entropy in every tissue and consists of common genes, is hereinafter called a kernel module. There may be several kernel modules, and a set of kernel modules is called a kernel domain. Kernel modules are presumed to play an important role in activating the genome system by producing by-products of gene expression such as non-coding RNA, rather than proteins tied to a specific protein network.

To check experimentally whether the kernel module is common to different tissues, the kernel module was mapped to each of the other tissues. FIG. 7 shows the results of mapping the kernel module of BRNO to other tissues. Mapping means extracting, from the gene expression data of another tissue, the data for the genes that belong to a given module, and performing the same calculation on them.

In FIG. 7, (A) is the original kernel module of BRNO. Panels (B) to (H) show the BRNO kernel module mapped to CONO, LUSC, BRCA, COAD, READ, X6CA and X2NO, respectively. Yellow nodes in the gene network indicate genes whose log odds ratio (LOR) is close to zero, and red nodes indicate genes whose LOR is negative. In each module, the first 10 and the last 10 genes in order of decreasing LOR were selected, and their expression profiles are plotted beneath the module: black for the first 10 genes and red for the last 10.

When the BRNO kernel is mapped to another tissue, the entropy increases if the mapped genes are absent from the kernel region of that tissue or if the kernel region of that tissue has low complexity. Mapping the BRNO kernel to another tissue means extracting the data for the genes in the BRNO kernel from the other tissue's gene expression data and performing the necessary calculation. The entropy computed when mapping the BRNO kernel to CONO is 0.091 nats, far lower than the 0.515 nats obtained for randomly selected genes in CONO (normal colon tissue). The kernel modules of the two different tissues are therefore highly similar. FIG. 7(B) likewise shows that the LOR is close to zero for most genes when the BRNO kernel is mapped to CONO. When the BRNO kernel was mapped to the tumor data sets (BRCA, COAD, READ, LUAD, LUSC, OV), by contrast, the entropy was relatively high at 0.224 to 0.601 nats. This higher entropy in mapping from BRNO (normal breast tissue) to tumor tissue means that the directionality of the kernel region is more dispersed in tumors than in normal tissue. Mapping the CONO kernel module to the other data sets gave similar results.
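The mapping step and the comparison against randomly selected genes can be sketched as follows. The sketch uses synthetic data and placeholder gene names (`g0`, `g1`, ...), not TCGA values. It assumes the entropy is computed from the averaged outer products of normalized gene vectors, that genes missing from the target data are skipped, and that the random baseline is the mean over repeated random draws of the same size. How the random entropy was actually obtained is not specified in this description. All function names are hypothetical.

```python
import numpy as np

def module_entropy(rows):
    """Entropy (nats) of a module given as a (genes, samples) array."""
    g = rows / np.linalg.norm(rows, axis=1, keepdims=True)  # unit vector per gene
    w = np.linalg.eigvalsh(g.T @ g / g.shape[0])
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def map_module(module_genes, target_expr, target_genes):
    """Entropy of a module's genes evaluated on another tissue's expression data."""
    index = {name: k for k, name in enumerate(target_genes)}
    rows = [index[g] for g in module_genes if g in index]   # assumed: skip absent genes
    return module_entropy(target_expr[rows])

def random_entropy(target_expr, size, n_draws=200, seed=0):
    """Mean entropy of randomly drawn gene sets of the same size (assumed baseline)."""
    rng = np.random.default_rng(seed)
    n = target_expr.shape[0]
    draws = [module_entropy(target_expr[rng.choice(n, size=size, replace=False)])
             for _ in range(n_draws)]
    return float(np.mean(draws))

# Synthetic target tissue: 20 genes by 5 samples; g0, g1 and g2 co-vary perfectly.
rng = np.random.default_rng(1)
target_genes = [f"g{k}" for k in range(20)]
target_expr = rng.normal(size=(20, 5))
target_expr[:3] = np.outer([1.0, 2.0, 0.5], target_expr[0])

mapped = map_module(["g0", "g1", "g2"], target_expr, target_genes)
baseline = random_entropy(target_expr, size=3)
```

A mapped entropy well below the baseline, as in this toy case, corresponds to the situation described for the BRNO kernel mapped to CONO.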

There are domains with many different functions. The one described here is the domain related to the cell cycle and DNA repair (hereinafter CCDR), which is important in relation to tumors.

Cell division is essential for the development of a multicellular organism from fertilized egg to somatic cells and for the population growth of unicellular organisms. It is precisely regulated through cell-cycle arrest and the repair of DNA damage, and failure of this regulation can lead to abnormal cell growth.

In normal breast tissue, the CCDR domain consists of a plurality of modules. The 12 modules that make up the CCDR domain consist of genes that take part in cell division (for example, BUB1) and were found to be strongly connected to one another through edges. Many modules composed of such genes were also found in the other normal data sets (CONO, X2NO), whereas only a few were found in tumor tissues.

When the 12 CCDR modules of normal breast tissue are mapped to other normal tissues, the entropy values are broadly similar to the original values. In contrast, when the CCDR modules are mapped to tumor data sets, the entropy rises to the level of the random entropy computed for each data set.

FIG. 8 shows an example of mapping modules of the BRNO CCDR domain to other tissues. It shows the gene networks of several modules in the CCDR domain, namely m3, m41 and m49. Part A of FIG. 8 shows the BRNO CCDR modules mapped to CONO, BRCA, LUAD and LUSC; node colors are as described for FIG. 7. Part B of FIG. 8 shows, for the BRNO CCDR modules and for the same modules mapped to other tissues, the distribution of the probabilities with which the module contributes to the expression of its genes. The distributions in part B are shifted to the left, which indicates a deformation of the density matrix.

FIG. 8A shows the gene networks within the modules that make up the CCDR domain of normal breast tissue. The probability with which a module contributes to a gene's expression also indicates how closely the gene is associated with the module. When the CCDR modules of normal breast tissue were mapped to normal colon tissue (CONO), most genes showed a high probability for the module. In contrast, the probabilities dropped sharply when the CCDR modules were mapped to cancer data sets: for m3, m41 and m49 the entropy exceeds 1.0 nats in most tumor data sets. This means that the collapse or deformation of the CCDR modules in a tumor not only leads to cancer cell proliferation but also lets it escape the control of the kernel region, so that it is no longer balanced with cellular events in the parenchyma and stroma.

Taken together, the results of mapping the CCDR modules of normal breast tissue to other normal and tumor tissues indicate that normal cells are under strict control of the CCDR program, whereas in tumor tissue the collapse or deformation of the CCDR modules allows the cell cycle to continue in cells with DNA damage. The entropy obtained when mapping from BRNO to LUAD is more than twice that obtained when mapping from BRNO to LUSC. This is consistent with earlier findings that LUSC progresses faster and has a higher mutation probability than LUAD.

Gene network

As described above, a genomic module consists of a plurality of genes. The genes in a module form a network through which they exchange information. This network is called a gene network.

A genomic module represents a program unit in the eukaryotic genome. As described above, a module forms a specific unit within the overall program of an organism, where the program means the processes needed to run the system that is the organism. The gene network within a genomic module is what connects the genes from a particular point of view.

FIG. 9 is an example of the density matrix of an arbitrary module in sample space. It shows the density matrix of the module and the expression vector of an arbitrary gene j in the module. The solid red line represents the module's density matrix, and the solid blue line represents the locus of the probability of a unit vector with respect to that density matrix. The dotted lines represent the perturbation caused by excluding gene i. In dynamics, a perturbation is the disturbance of a motion governed by a principal force that arises from the influence of a secondary force. Because the expression vector of gene j is normalized, its probability with respect to the density matrix lies on the corresponding locus.

An arbitrary module is perturbed by excluding gene i. Consider the density matrix of the perturbed module. When gene i is excluded, the density matrix rotates slightly in sample space and its elliptical shape becomes slightly narrower or wider. The probability of another gene j with respect to the density matrix changes from its value before the exclusion to its value after. If gene j is strongly connected to gene i, the perturbation greatly reduces the probability of gene j.

FIG. 10 is an example showing module perturbation caused by excluding gene i from the gene network of an arbitrary module. FIG. 10(A) shows the gene network before gene i is excluded from the module. FIG. 10(B) shows the gene network after gene i is excluded. When gene i is removed from the module, a gene j that is connected only to gene i becomes isolated in the module, so the probability of gene j drops sharply in the module without gene i. In contrast, if gene j is not directly connected to gene i, or is connected to gene i but also to other genes, the drop in its probability is small. The connection from gene i to gene j can therefore be estimated from the log odds ratio (LOR) between the probability of gene j before gene i is removed and its probability afterward.

The odds ratio of the probabilities quantifies the influence between two genes. The log odds ratio (LOR) of the probabilities equals the difference in information content. Equation 11 gives the degree of change in the probability of gene j when gene i, which belongs to the same module, is excluded from the module.
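As a rough illustration of the perturbation measure, the following Python sketch computes the LOR of the probability of gene j before and after gene i is excluded. It rests on assumptions that this section does not state: the density matrix of a module is taken to be the average of the outer products of the normalized expression vectors, the probability of a gene is taken to be the quadratic form of its unit vector with that matrix, and the sign convention (before minus after) is chosen so that a large positive value indicates a strong link. The function names and the toy expression values are hypothetical, and Equation 11 may differ in detail.

```python
import math

def normalize(v):
    """Scale an expression vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def probability(unit_vec, members):
    """p = v^T rho v, assuming rho = (1/n) * sum_k v_k v_k^T.

    With that assumed rho, the quadratic form reduces to the mean of the
    squared dot products between v and every member vector.
    """
    total = sum(sum(a * b for a, b in zip(unit_vec, m)) ** 2 for m in members)
    return total / len(members)

def log_odds(p):
    return math.log(p / (1.0 - p))

def perturbation_lor(genes, i, j):
    """LOR of gene j's probability before vs. after excluding gene i."""
    units = {name: normalize(vec) for name, vec in genes.items()}
    before = probability(units[j], list(units.values()))
    after = probability(units[j], [v for name, v in units.items() if name != i])
    return log_odds(before) - log_odds(after)

# Toy module: g2 is closely aligned with g1, while g3 and g4 are not.
genes = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [4.0, 3.0, 0.0],
    "g3": [0.0, 0.0, 1.0],
    "g4": [0.0, 3.0, 4.0],
}

print(perturbation_lor(genes, "g1", "g2"))  # positive: g2 loses probability without g1
print(perturbation_lor(genes, "g1", "g3"))  # negative: g3 does not depend on g1
```

In this toy module, removing g1 lowers the probability of g2 but not that of g3, which matches the qualitative behavior described for strongly and weakly connected genes.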

This LOR is computed for every possible gene pair in the genomic module. If the LOR for two genes i and j exceeds a given threshold, gene i and gene j are judged to be strongly connected, and strongly connected genes are joined by an edge. Computing the LOR between all genes in a genomic module in this way yields the gene network. For example, FIG. 6 shows the kernel modules for eight tissues, and the gene network of each module was completed by this method.

Table 3 below is pseudocode for constructing the gene network within a module. In brief, the LOR is computed for each gene pair in a module as described above, and an adjacency matrix is generated from these values. For each gene i, the LORs between gene i and all other genes are extracted from the adjacency matrix, their quartiles are computed, and the internal threshold th_i of gene i is calculated using the cutoff value. An edge is assigned to every gene pair whose LOR is at or above the internal threshold, and this process is repeated for each gene.
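The per-gene thresholding step can be sketched in Python as follows. This is only an illustration: the text says that th_i is derived from the quartiles and the cutoff but does not give the formula here, so the rule th_i = Q3 + cutoff × (Q3 − Q1) is an assumption, as are the function name `build_gene_network` and the toy LOR values.

```python
from statistics import quantiles

def build_gene_network(lor, cutoff=1.0):
    """Return undirected edges for gene pairs whose LOR reaches th_i.

    lor maps each gene i to a dict {gene j: LOR from i to j}.
    """
    edges = set()
    for gene_i, row in lor.items():
        q1, _, q3 = quantiles(row.values(), n=4, method="inclusive")
        # Assumed rule for the internal threshold; Table 3 defines the real one.
        th_i = q3 + cutoff * (q3 - q1)
        for gene_j, value in row.items():
            if value >= th_i:
                edges.add(tuple(sorted((gene_i, gene_j))))
    return edges

# Hypothetical LOR values: g1-g2 and g3-g4 stand out from the background.
lor = {
    "g1": {"g2": 2.0, "g3": 0.1, "g4": 0.2, "g5": 0.15},
    "g2": {"g1": 1.9, "g3": 0.12, "g4": 0.1, "g5": 0.2},
    "g3": {"g1": 0.1, "g2": 0.12, "g4": 1.8, "g5": 0.2},
    "g4": {"g1": 0.2, "g2": 0.1, "g3": 1.7, "g5": 0.1},
    "g5": {"g1": 0.15, "g2": 0.2, "g3": 0.2, "g4": 0.1},
}

print(sorted(build_gene_network(lor, cutoff=1.0)))
```

Because each gene gets its own threshold, a gene whose LORs are uniformly low (g5 here) receives no edges even though its values are comparable to the background of the other genes.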

Intermodular network

If a gene network is a program run by genes, the program structure of an organism can be represented as an intermodular network. As described above, the genomic module network contains edges between modules. An edge here means that two modules have a certain association or connectivity with each other. An edge can also be viewed as a channel through which modules pass or exchange information.

The process of constructing the intermodular network is as follows. Relative entropy is measured for every possible pair of modules (module i and module j) extracted from the gene data set.

Relative entropy is the information gain of module i with respect to module j. It is expressed in terms of the density matrices of module i and module j. Relative entropy is always non-negative and is non-commutative: the relative entropy of module i with respect to module j generally differs from that of module j with respect to module i. Relative entropy serves as the information from which the intermodular network is constructed. If the two density matrices are identical, the relative entropy is zero. The larger the difference between the two density matrices, the larger the relative entropy.
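The properties listed above can be checked numerically with a short Python sketch. The formula used, S(ρ_i ‖ ρ_j) = Tr[ρ_i (log ρ_i − log ρ_j)], is the standard relative entropy for density matrices and is an assumption here, since the expression in this section is not reproduced. The sketch also assumes full-rank symmetric matrices so that the matrix logarithm exists, and the example matrices are hypothetical.

```python
import numpy as np

def matrix_log(rho):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(rho)
    return eigvecs @ np.diag(np.log(eigvals)) @ eigvecs.T

def relative_entropy(rho_i, rho_j):
    """S(rho_i || rho_j) = Tr[rho_i (log rho_i - log rho_j)], in nats."""
    return float(np.trace(rho_i @ (matrix_log(rho_i) - matrix_log(rho_j))))

# Two hypothetical 2x2 density matrices (symmetric, trace 1, full rank).
rho_a = np.array([[0.7, 0.1], [0.1, 0.3]])
rho_b = np.array([[0.5, 0.0], [0.0, 0.5]])

print(relative_entropy(rho_a, rho_a))  # zero up to rounding for identical matrices
print(relative_entropy(rho_a, rho_b))  # positive
print(relative_entropy(rho_b, rho_a))  # positive, but a different value
```

The two directions give different values, which is why the pair (module i, module j) has to be treated as ordered when the relative entropies are collected.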

Relative entropy is also used to assess the similarity between modules isolated from different tissues. Modules isolated from different tissues have entirely different sample spaces and cannot be compared directly. The relative entropy calculated after mapping the modules of one tissue onto another tissue represents the difference between density matrices in the same sample space.

If module i has a low information gain with respect to module j, module i and module j can be regarded as closely associated. In this case, an edge is drawn between module i and module j to build the network.

To increase the resolution of the relative entropy at low values, a negative logarithm can be applied to it. The log-transformed relative entropy is defined for positive relative entropy and for i ≠ j. Table 4 below is an example of pseudocode for the algorithm that builds the intermodular network.

For a given module i, a threshold is used to decide its association with each other module j. If the relative entropy measure between module i and module j does not exceed the threshold, an edge is drawn between module i and module j. An appropriate threshold for the intermodular edges has to be found in relation to the cutoff (C_f). For example, as in Table 4, the first quartile (Q1), second quartile (Q2), and third quartile (Q3) of the measure can be used.

The pattern of information exchange between genomic modules can be represented as an intermodular network. To determine the connectivity between modules, the relative entropy between genomic modules is measured in the sample space.

Relative entropy is measured for all possible pairs of genomic modules, and an adjacency matrix is prepared from the relative entropy values that do not exceed the threshold calculated using the cutoff. The intermodular network can be constructed from this adjacency matrix. When intermodular networks are constructed from adjacency matrices calculated with a progressively lower cutoff, the order in which modules become connected depends on the tissue type.
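The thresholding of relative entropies can be sketched in Python as follows. This is a simplified illustration and not the algorithm of Table 4: the threshold is taken directly as a quartile of all pairwise relative entropies, the relation between the threshold and the cutoff C_f is left out because it is not specified in this section, and an undirected edge is added as soon as either direction of a pair falls at or below the threshold. The names `intermodular_edges`, `neg_log`, and the module values are hypothetical.

```python
import math
from statistics import quantiles

def intermodular_edges(rel_entropy, quartile=1):
    """Connect modules whose relative entropy does not exceed the chosen quartile.

    rel_entropy maps ordered pairs (module i, module j) to S(i || j).
    """
    cuts = quantiles(rel_entropy.values(), n=4, method="inclusive")
    threshold = cuts[quartile - 1]  # quartile=1 -> Q1, 2 -> Q2, 3 -> Q3
    return {tuple(sorted(pair)) for pair, s in rel_entropy.items() if s <= threshold}

def neg_log(s):
    """Negative log transform; spreads out small relative entropies (s > 0)."""
    return -math.log(s)

# Hypothetical relative entropies (nats) for four modules, both directions.
rel_entropy = {
    ("m1", "m2"): 0.05, ("m2", "m1"): 0.06,
    ("m1", "m3"): 0.9,  ("m3", "m1"): 1.1,
    ("m1", "m4"): 1.5,  ("m4", "m1"): 1.4,
    ("m2", "m3"): 0.8,  ("m3", "m2"): 0.7,
    ("m2", "m4"): 1.2,  ("m4", "m2"): 1.3,
    ("m3", "m4"): 0.1,  ("m4", "m3"): 0.12,
}

print(sorted(intermodular_edges(rel_entropy, quartile=1)))
print(sorted(intermodular_edges(rel_entropy, quartile=2)))
```

Relaxing the threshold from Q1 to Q2 adds the m2-m3 edge and joins the two initially separate pairs, which mirrors how the earliest connections act as seeds that later merge into a larger network.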

The modules that connect first can be said to form the seed of each region of the intermodular network. FIG. 11 is an example of intermodular networks built from the TCGA data sets for eight tissues. Red arrows mark the seeds of the kernel domain, and blue arrows mark the seeds of the cell cycle and DNA repair (CCDR) domain. Module entropy is shown in gray scale, with lighter shades indicating lower entropy. In FIG. 11, each node represents one module, and each module is identified by the number on its node.

For the BRNO and CONO data sets, the seed of the kernel domain appeared at cutoff (C_f) 4.0. The first edge of the kernel domain in BRCA did not appear until C_f reached 2.2. In the intermodular networks of the LUAD, COAD, and READ data sets, the first edge of the kernel domain appeared at C_f 3.0, 2.8, and 3.0, respectively. In LUSC and OV, the first edge of the kernel domain appeared at C_f 1.9. These results suggest that, with respect to the kernel domain, the intermodular network of tumors may differ from that of normal tissue.

The intermodular networks were reconstructed for the TCGA data sets while varying C_f experimentally. Both the total number of edges and the number of edges per module were higher in normal tissue than in cancer tissue. This suggests that the genomic system of a tumor is simpler than that of normal tissue.

FIG. 12 is an example of the intermodular network of BRNO at various cutoff values. In FIG. 12, kn is the kernel domain, cc is the CCDR domain, pr is the parenchyma domain, and st is the stroma domain. Each node in FIG. 12 has a gray scale value, and a lighter shade means lower entropy. Lowering the cutoff C_f increases the number of modules that have edges and reduces the fragmentation of the network. The domains become fully connected before C_f reaches 1.0. At C_f 4.0, the seeds of all domains (kn, cc, pr, and st) are already identifiable.

By mapping each module onto the gene expression data sets of other tissues and examining the functions of the underlying genes with gene ontology, the specific biological function of each region delimited by the intermodular network can be inferred. The intermodular network of BRNO constructed at C_f 1.0 clearly shows the relationships between the domain regions. The kernel domain (kn) can control the modules that carry out parenchymal functions (pr) through modules m52 and m60. Module m3 relays the flow of information between the kernel domain and the CCDR domain.

The function of the stroma of the normal breast appears in the modules of the st region. At C_f 4.0, the st region is divided into two regions. The region containing m38, m64, and m79 is associated with adipocytes, and the region containing m27 and m50 can be interpreted as relaying between the stroma domain and the kernel domain.

The stroma domain st has six seeds at C_f 2.5. Analysis of these modules suggests that they handle angiogenesis, immune function (macrophages), extracellular matrix production, and adipocyte activity. They can also be presumed to relay between the kernel domain and the CCDR domain.

The intermodular networks obtained at different C_f values make it possible to interpret the functions of each domain and module.

Several modules located in the central region of the intermodular network connect all domains of the genomic system and can be regarded as a kind of meta-program. These modules suggest that the extracellular matrix and vasculature are regulated by communication among modules belonging to the stroma, parenchyma, and kernel domains. In normal breast tissue, the genomic system related to immune function appears to be suppressed by other systems. The node color represents the change in entropy of a BRNO genomic module when it is mapped to another tissue. Yellow indicates that the entropy is nearly the same in BRNO and the other tissue. Red indicates that the entropy is higher in the other tissue than in BRNO.

FIG. 13 shows examples of mapping the intermodular network of BRNO onto the modules of other tissues. In FIG. 13, (A) maps BRNO to CONO, (B) maps BRNO to BRCA, (C) maps BRNO to LUAD, and (D) maps BRNO to LUSC. Thus (A) is a mapping to another normal tissue, and (B) through (D) are mappings to tumor tissues. The intermodular network can be mapped on the basis of the domain types of the genomic module network and the difference in entropy between modules. In FIG. 13, f denotes the adipose tissue domain.

Referring to FIG. 13(A), in the intermodular network mapped from BRNO to CONO, the kernel domain (kn) shows an entropy of 0.091-0.182 nats. This is similar to the entropy of the base BRNO modules (0.017-0.109 nats). However, the CONO kernel domain mapped to BRNO has an entropy of 0.144-0.289 nats, slightly higher than CONO's own entropy (0.016-0.043 nats). This difference may arise because CONO has a slightly broader kernel domain than BRNO. It can therefore be presumed that colon tissue consists of more cell types than breast tissue. In the other regions, some of the modules in the parenchyma (pr) and adipose tissue (f) domains of the normal breast show increased mapping entropy. This indicates that much of the biological program of normal breast tissue must also be active in the normal colon, but that the degree of functional activity can change with the environment and with parameter inputs from other modules. Except for the kernel domain, the modules mapped from CONO to BRNO are less active than CONO's own modules. This result suggests that the genomic system of the normal colon is more complex than that of the normal breast.

FIGS. 13(B) through 13(D) show the results of mapping the intermodular network of BRNO onto the tumor data sets BRCA, LUAD, and LUSC, respectively. Although the tumor types differ, the entropy distribution pattern of the mapped intermodular network was similar in all three cancer data sets.

When all modules in the parenchyma domain (pr) of BRNO were mapped onto each tumor tissue, the entropy was 0.890-1.493 nats. This is far higher than the original entropy in BRNO (0.109 nats) or the entropy when BRNO is mapped to CONO (0.263 nats). The parenchyma domain (pr) in tumor tissue can therefore be regarded as altered to the point that it cannot perform its normal function.

For the CCDR domain, the entropy of the mapped modules was 0.754-1.507 nats. The domain shows a different disruption pattern for each tumor type.

In particular, the meta-modules that link the domains were inactivated in tumor tissue. A meta-module is a module that connects different domains. When BRNO was mapped to LUSC, the entropy of module m3 was 0.795-1.407 nats, the highest degree of deformation (collapse) after the CCDR domain.

The kernel, CCDR, and parenchyma domains of the genomic system pass information to the stroma domain to control extracellular matrix formation, including angiogenesis (c), immune function (d), and adipose tissue (f) formation. The a and e regions, which link the stroma domain (st) with the parenchyma (pr), kernel (kn), and CCDR (cc) domains, are also greatly weakened in tumor tissue. This suggests that the stroma domain in tumor tissue has difficulty communicating with the other domains. In other words, the stroma domain cannot exchange information with other domains and so cannot take part in the corresponding functions. This is consistent with the observation that the stroma in tumor tissue is built up abnormally without proper regulation.

Various analyses can be performed with the genomic module network described above. FIG. 14 is an example of a flowchart of an analysis method (200) that uses the genomic module network. First, gene expression data is received (210). The gene expression data is preferably extracted from a plurality of samples and is obtained with a technique such as a cDNA microarray. Next, the genes are grouped into specific modules on the basis of the gene expression data (230). In this step the gene expression data is interpreted and divided into modules as described above. An intermodular network is then built among the plurality of modules (240), and a gene network is built among the genes belonging to each module (250). The gene network only needs to be built after modularization, so the order need not match the flowchart of FIG. 14. The intermodular network can be analyzed to study the genome at the module level (260). The relationships between modules can be analyzed from the intermodular network, and the relationships between different samples can be analyzed by mapping the intermodular networks of those samples onto one another. As described above, this can reveal that the activity of a specific module or domain is weakened in tumor tissue compared with normal tissue. The genome can also be analyzed at the gene level (270). The gene network can be used to analyze the relationships between genes, and gene networks from different samples can be mapped onto one another to analyze gene function in a particular sample. For example, analyses can determine whether the function of a specific gene is active or inactive in a tumor patient and can detect genes associated with the tumor. In this way, genes (markers) associated with a specific disease can be identified.

FIG. 15 shows examples of physical systems for building a genomic module network.

FIG. 15(A) shows a system (400) that builds a genomic module network from previously prepared gene expression data. The system (400) includes a gene expression DB (410) and a computer device (450).

The gene expression DB (410) stores data related to the gene expression of a specific organism. As described above, gene expression data is generated with a technique such as a cDNA microarray. The gene expression DB (410) may hold expression data associated with a specific disease (for example, a malignant tumor).

The computer device (450) accesses the expression data stored in the gene expression DB (410). The computer device (450) analyzes the expression data to build a genomic module network for a specific organism. As described above, the computer device (450) interprets the information on transcript expression as entropy and builds the modules and networks on the basis of the entropy between genes.

A researcher can analyze the genome or individual genes on the basis of the genomic module network built by the computer device (450). A genomic module network can also be built for a specific patient and compared with a reference normal genomic module network to diagnose the patient. The genome analysis or patient diagnosis can be performed by a researcher or physician reviewing the given material (the genomic module network). Alternatively, the computer device (450) can automatically perform a specific analysis and output the result.

FIG. 15(B) shows another example of a system (500) for building a genomic module network. The system (500) includes a researcher terminal (510), a central server (520), and a computer device (550).

The researcher performs a gene expression experiment for a patient and enters the results into the researcher terminal (510). For example, the researcher performs a microarray experiment on the patient's mRNA and stores the expression data in the researcher terminal (510). The data may take forms such as text or images. For image data, image analysis software may be used. The researcher terminal (510) transmits the entered data to the central server (520). The central server (520) stores and manages the gene expression data for a particular patient.

The central server (520) can build a genomic module network from the gene expression data. The central server (520) can also provide information on the diagnosis or treatment of the patient on the basis of the genomic module network it has built. In this case, the computer device (550) acts as a client device that connects to the central server (520) to view the genomic module network or the diagnostic information.

Alternatively, the central server (520) may only store and manage the gene expression data for a particular patient. In this case, the computer device (550) analyzes the expression data stored on the central server (520) to build a genomic module network for a specific organism. The computer device (550) can also provide information on the diagnosis or treatment of the patient on the basis of the genomic module network it has built.

FIG. 15(C) shows an example of a device (600) for building a genomic module network, in which a computer device runs a specific program to build a genomic module network or perform an analysis. Although FIG. 15(C) illustrates a device such as a PC, the device (600) may also be a server device such as the central server (520). The genomic module network building device (600) includes an input device (610), a computing device (620), a storage device (630), and an output device (640).

The input device (610) receives the gene expression data. The input device (610) may be a physical interface device such as a keyboard, mouse, or touchpad. It may also be a device that receives stored gene expression data from an external storage medium (such as a USB drive), or a communication module that receives gene expression data from an external network.

The storage device (630) stores a program for building the genomic module network. It may also store a program that performs a specific analysis using the genomic module network. The program stored in the storage device (630) includes source code for building a genomic module network for genes according to the description above.

The computing device (620) builds the genomic module network using the input gene expression data and the program stored in the storage device (630). The computing device (620) may also analyze the constructed genomic module network using an analysis program stored in the storage device (630). The computing device (620) is a processor device, such as a CPU or an application processor (AP), that performs specific operations by running a program.

The output device (640) outputs the constructed genomic module network and the analysis results. The output device (640) may be a display device that outputs images, a printer that outputs text, or the like. It may also be a communication module that transmits the generated genomic module network or analysis data to another device.

The method of building a genomic module network described above may be implemented as a program (or application) containing an executable algorithm that can be run on a computer. The program may be provided stored on a non-transitory computer readable medium.

A non-transitory readable medium is not a medium that stores data for a short time, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the applications or programs described above may be provided stored on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

The embodiments and the drawings attached to this specification merely illustrate part of the technical idea of the technology described above. Modifications and specific embodiments that a person skilled in the art could readily infer within the scope of the technical idea contained in the specification and drawings are all evidently included in the scope of rights of the technology described above.

100: Genome module network
400: Genome module network construction system
410: Gene expression DB
450: Computer device
500: Genome module network construction system
510: Researcher terminal
520: Central server
550: Computer device
600: Genome module network construction device
610: Input device
620: Computing device
630: Storage device
640: Output device

Claims

A modularization-based genome analysis method comprising:
receiving, by a computer device, gene expression data of a specific tissue;
determining, by the computer device, entropy for a plurality of gene sets constituting a genome using the data;
dividing, by the computer device, the plurality of gene sets into a plurality of genome modules based on the entropy; and
generating, by the computer device, a genome module network for the plurality of gene sets by determining edges connecting the genome modules based on relative entropy for the plurality of genome modules,
wherein the entropy for a gene set represents a correlation between at least two of the plurality of genes based on probabilities of the expression states of the genes, and the relative entropy represents a relative information gain between the two modules of a pair among the plurality of genome modules.

The method of claim 1,
wherein the computer device represents the expression level as a mixed-state vector of the genes in a sample space, and the entropy for the gene set is defined by a density matrix in the sample space.

The method of claim 2,
wherein the probability is determined using the density matrix and a mixed-state unit vector of the gene.
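As an illustration of the quantities named in the two claims above, the following is a minimal sketch and not the patented implementation. It assumes that rows of the expression matrix are genes and columns are samples, that every gene has a nonzero profile, that the mixed state is an equal-weight mixture of the normalised gene vectors, that the entropy is the von Neumann entropy of the resulting density matrix, and that a gene's probability is the overlap of its unit vector with that matrix. All function names are hypothetical.

```python
import numpy as np

def unit_vectors(expr):
    """Normalise each gene's expression profile (one row per gene) to unit length."""
    expr = np.asarray(expr, dtype=float)
    return expr / np.linalg.norm(expr, axis=1, keepdims=True)

def density_matrix(expr):
    """Assumed form: equal-weight mixture of the gene projectors |g><g| (trace 1)."""
    g = unit_vectors(expr)
    return g.T @ g / g.shape[0]

def von_neumann_entropy(rho):
    """S(rho) = -sum(lambda * ln(lambda)) over the nonzero eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(lam * np.log(lam)))

def gene_probabilities(expr):
    """Assumed form: <g|rho|g> for each gene unit vector g."""
    g = unit_vectors(expr)
    rho = density_matrix(expr)
    return np.einsum("is,st,it->i", g, rho, g)

# Two perfectly correlated genes give zero entropy; two orthogonal genes give ln 2.
correlated = von_neumann_entropy(density_matrix([[1, 2, 3], [2, 4, 6]]))
orthogonal = von_neumann_entropy(density_matrix([[1, 0], [0, 1]]))
```

Under these assumptions a low entropy means the genes in the set share one expression direction, which matches the claim's reading of entropy as a measure of correlation between genes.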

The method of claim 2,
wherein the computer device divides the plurality of gene sets into the genome modules such that the entropy of the gene sets assigned to the genome modules is kept below a threshold value and the direction of the density matrix differs from a reference direction by no more than a threshold value.

The method of claim 1,
wherein the computer device divides a plurality of genes into an arbitrary plurality of gene sets, and
the dividing into the plurality of genome modules comprises:
removing a gene from each of the plurality of gene sets so that the entropy of all genes belonging to each set is adjusted to be smaller than a threshold value; and
generating a plurality of final gene sets by adding, to each of the adjusted gene sets, a gene that does not belong to the set, provided that the entropy of all genes belonging to the set is equal to or less than the threshold value and the variation of the eigenvector is equal to or less than a reference value,
wherein the plurality of final gene sets correspond to the plurality of genome modules.
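The remove-then-add procedure of this claim can be sketched as a greedy refinement. This is only one possible reading, under stated assumptions: the entropy is the von Neumann entropy of an equal-weight density matrix, the claim's vector variation is taken as the change in the principal eigenvector of that matrix (measured as one minus the absolute overlap), genes are removed one at a time until the threshold is met, and the gene to drop is the one whose removal leaves the lowest entropy. The function names `shrink`, `grow`, and `refine` are hypothetical, and a gene may end up in more than one set.

```python
import numpy as np

def density_matrix(vecs):
    """Equal-weight mixture of normalised gene vectors (rows = genes)."""
    g = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return g.T @ g / len(g)

def entropy(vecs):
    """Von Neumann entropy of the set's density matrix."""
    lam = np.linalg.eigvalsh(density_matrix(vecs))
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

def principal_direction(vecs):
    """Eigenvector of the density matrix with the largest eigenvalue."""
    return np.linalg.eigh(density_matrix(vecs))[1][:, -1]

def shrink(expr, members, threshold):
    """Remove genes until the set's entropy is smaller than the threshold."""
    members = list(members)
    while len(members) > 1 and entropy(expr[members]) >= threshold:
        # Assumed heuristic: drop the gene whose removal leaves the lowest entropy.
        drop = min(members,
                   key=lambda g: entropy(expr[[m for m in members if m != g]]))
        members.remove(drop)
    return members

def grow(expr, members, threshold, max_shift):
    """Add outside genes that keep entropy and eigenvector shift within bounds."""
    members = list(members)
    for g in range(len(expr)):
        if g in members:
            continue
        trial = members + [g]
        before = principal_direction(expr[members])
        after = principal_direction(expr[trial])
        shift = 1.0 - abs(before @ after)  # sign of an eigenvector is arbitrary
        if entropy(expr[trial]) <= threshold and shift <= max_shift:
            members = trial
    return members

def refine(expr, initial_sets, threshold, max_shift):
    """Apply shrink then grow to every initial gene set."""
    expr = np.asarray(expr, dtype=float)
    return [sorted(grow(expr, shrink(expr, s, threshold), threshold, max_shift))
            for s in initial_sets]

# Genes 0 and 1 are nearly parallel, as are genes 2 and 3.
expr = [[1, 0, 0], [1, 0.1, 0], [0, 1, 0], [0, 1, 0.1]]
modules = refine(expr, [[0, 1, 2], [3]], threshold=0.1, max_shift=0.05)
```

In this toy input the arbitrary initial split is corrected: gene 2 is removed from the first set because it raises the entropy, and is then added to the second set because it barely changes that set's principal direction.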

The method of claim 1,
wherein the computer device connects, among all module pairs of the plurality of genome modules, an edge between the two modules of each pair whose relative entropy is less than or equal to a reference value.

The method of claim 6,
wherein the entropy for the gene set is defined by a density matrix of the sample space, and
the relative entropy represents a difference between the density matrices of the two modules.
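The edge rule of the two claims above can be sketched as follows. This is a hedged illustration: it assumes the relative entropy is the quantum relative entropy S(rho || sigma) = Tr[rho (ln rho - ln sigma)] between the two modules' density matrices, floors eigenvalues at a small `eps` so the logarithm stays finite for rank-deficient matrices, and symmetrises the two directions before comparing with the reference value. The flooring, the symmetrisation, and the function names are assumptions, not details taken from the claims.

```python
import numpy as np
from itertools import combinations

def _log_psd(rho, eps):
    """Matrix logarithm of a symmetric PSD matrix, eigenvalues floored at eps."""
    lam, vec = np.linalg.eigh(rho)
    return (vec * np.log(np.clip(lam, eps, None))) @ vec.T

def relative_entropy(rho, sigma, eps=1e-9):
    """S(rho || sigma) = Tr[rho (ln rho - ln sigma)]."""
    return float(np.trace(rho @ (_log_psd(rho, eps) - _log_psd(sigma, eps))))

def module_edges(rhos, reference):
    """Connect module pairs whose symmetrised relative entropy is <= reference."""
    edges = []
    for a, b in combinations(sorted(rhos), 2):
        d = 0.5 * (relative_entropy(rhos[a], rhos[b])
                   + relative_entropy(rhos[b], rhos[a]))
        if d <= reference:
            edges.append((a, b))
    return edges

# Hypothetical density matrices for three modules in a two-sample space.
rhos = {
    "A": np.diag([0.5, 0.5]),
    "B": np.diag([0.6, 0.4]),
    "C": np.diag([0.99, 0.01]),
}
edges = module_edges(rhos, reference=0.1)
```

Modules A and B have similar density matrices and are linked, while C differs strongly from both and stays unconnected at this reference value.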

The method of claim 1,
further comprising constructing, by the computer device, for each of the plurality of genome modules, a gene network of the plurality of genes belonging to the module.

The method of claim 8,
wherein the entropy for the gene set is defined by a density matrix of the sample space, and
the computer device constructs the gene network by determining an edge between two genes based on an odds ratio of the probabilities, on the density matrix, of the genes belonging to the module.

The method of claim 8,
wherein the entropy for the gene set is defined by a density matrix of the sample space, and
the computer device constructs the gene network by connecting an edge between a first gene and a second gene belonging to the module if, when the first gene is removed from the module, the variation in the probability of the second gene on the density matrix exceeds a threshold value.

A computer-readable recording medium on which is recorded a program for executing, on a computer, the genome analysis method according to any one of claims 1 to 10.

A genome module network construction device comprising:
an input device that receives gene expression data of a specific tissue;
a storage device that stores a program which divides a plurality of genes into a plurality of genome modules based on entropy for a plurality of gene sets determined using the data, and which determines edges connecting the genome modules based on relative entropy for the plurality of genome modules; and
a computing device that runs the program to determine the entropy for the plurality of genes included in the gene expression data and to generate a genome module network based on the determined entropy,
wherein the entropy for a gene set represents a correlation between at least two of the plurality of genes based on probabilities of the expression states of the genes, and the relative entropy represents a relative information gain between the two modules of a pair among the plurality of genome modules.

The device of claim 12,
wherein the computing device determines the plurality of gene sets by dividing the plurality of genes into an arbitrary plurality of sets, removes a gene from each of the plurality of gene sets so that the entropy of each set is adjusted to be smaller than a threshold value, and
constitutes the modules by adding, to each of the adjusted gene sets, a gene that does not belong to the set, provided that the entropy of all genes belonging to the set is equal to or less than the threshold value and the variation of the eigenvector is equal to or less than a reference value.

The device of claim 12,
wherein the computing device connects, among all module pairs of the plurality of genome modules, an edge between the two modules of each pair whose relative entropy is equal to or less than a reference value.

The device of claim 12,
wherein the computing device constructs, for each of the plurality of genome modules, a gene network of the plurality of genes belonging to the module by determining an edge between two genes based on an odds ratio of the probabilities, on a density matrix, of the genes belonging to the module.