KR101274144B1

KR101274144B1 - Method and apparatus for extracting core protein network for disease research

Info

Publication number: KR101274144B1
Application number: KR1020110099175A
Authority: KR
Inventors: 유석종
Original assignee: 한국과학기술정보연구원
Priority date: 2011-09-29
Filing date: 2011-09-29
Publication date: 2013-06-13
Also published as: KR20130034988A

Abstract

To solve the above technical problem, a core protein network extraction method for disease research according to an embodiment of the present invention includes: receiving data on protein interactions; detecting, among the received data, protein interaction data related to a particular disease; generating a first protein network based on protein correlation information included in the detected protein interaction data; generating a second protein network having a tree structure in which a specific protein within the first protein network is the root protein node; calculating weights of the edges between the protein nodes included in the second protein network; creating two or more sub-networks by separating one or more specific edges within the second protein network; recalculating the weights of the edges in each sub-network; and determining, for each sub-network, the sum of the recalculated weights as the network power of that sub-network.

Description

Core Protein Network Extraction Method and Apparatus for Disease Research {METHOD AND APPARATUS FOR EXTRACTING CORE PROTEIN NETWORK FOR DISEASE RESEARCH}

The present invention relates to a method and apparatus for generating a protein network for protein research. More specifically, the invention relates to a method and apparatus for generating a core protein network that shows the protein interaction relationships that cause a particular disease, where the disease arises through protein interactions.

Proteins, which are produced by the linear binding of 20 kinds of amino acids, form various three-dimensional conformations or quaternary complexes. They are classified into many kinds and have correspondingly varied functions.

Diseases are known to occur through interactions between various types of proteins, variations in the three-dimensional or quaternary structure of the proteins themselves, and/or abnormalities in the proteins themselves. Therefore, in order to cure diseases occurring in human beings, it is necessary to first identify phenomena such as interactions between proteins, structural changes in proteins, and malfunctions.

However, in the past, attention was not paid to the construction of a network showing protein correlations related to specific diseases.

In particular, the interaction between proteins can be represented by a vast amount of data depending on the type of protein interacting and the structure of each protein. Therefore, there is a problem that it is not known which protein correlation among the vast amount of data is necessary information for studying a specific disease.

Meanwhile, a skyline algorithm may be used as a method of determining which of the objects having two or more attributes meets the needs of the researcher, compared to other objects.

FIG. 1 is a diagram illustrating a conventional skyline algorithm.

For example, consider a situation where a traveler would like to book a hotel at a destination. In general, a hotel farther from the destination costs less, and a hotel closer to the destination costs more. Comparing a large number of hotels one by one by distance and accommodation cost is inefficient; the choice becomes easier if the user is presented with a short list of the best candidate hotels to focus on. The skyline algorithm can be used as one of the techniques that suggest an optimal solution based on the considerations the user cares about.

The x-axis of the two-dimensional graph of FIG. 1 represents the distance from the travel destination to the hotel, and the y-axis represents the hotel's accommodation cost. In the graph, hotels a to m are compared. For example, hotel e is the same distance from the destination as hotel k, but hotel k is cheaper, so a guest would consider hotel k and exclude hotel e. Comparing all the hotels in this way by price and distance, hotels a, i, and k are determined to be the optimal hotels, and a line connecting them can be drawn. This line is called the skyline, and the guest will choose one of the hotels that lie on it. In other words, the skyline provides a criterion for optimal selection.
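The hotel comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `skyline_min` and the (distance, price) values are assumptions made for the example and are not read from FIG. 1. A hotel is kept when no other hotel is at least as good on both attributes and strictly better on one.

```python
from typing import Dict, List, Tuple


def skyline_min(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of points that no other point dominates.

    Smaller is better on both attributes (here: distance and price).
    """
    def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
        # p dominates q if it is no worse on both axes and better on one.
        return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])

    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical (distance, price) pairs; hotels e and k share the same distance.
hotels = {
    "a": (1.0, 9.0),
    "e": (5.0, 6.0),
    "i": (3.0, 4.0),
    "k": (5.0, 2.0),
    "m": (8.0, 7.0),
}

print(skyline_min(hotels))  # ['a', 'i', 'k']
```

The pairwise check is quadratic in the number of hotels, which is acceptable for a sketch; larger data sets would normally use a sort-based sweep.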

The present invention is intended to solve the above-mentioned problems of the prior art. A technical problem to be achieved by the present invention is to provide an efficient research sample group for researchers studying the relationship between proteins and diseases. Another technical problem of the present invention is to provide a criterion for establishing an efficient research plan for the resulting protein-disease relationships.

Preferably, the first protein network may include two or more protein nodes, and may include an edge indicating an association between the two or more protein nodes.

Preferably, the weight of an edge may be calculated by taking the node to which the edge is connected in the tree-structured network, counting the nodes connected to it in the lower level, and dividing that count by the level value.

Preferably, the generating of the two or more sub-networks may include separating the one or more edges for which the sum of the weights of the edges to be separated is the smallest.

Preferably, the core protein network extraction method for disease research according to an embodiment of the present invention may further include determining, as the core protein network for disease research, the sub-network having the largest value obtained by dividing the determined network power by the number of protein nodes in each sub-network.

Preferably, the core protein network extraction method for disease research according to an embodiment of the present invention may further include determining the core protein network for disease research based on the determined network power and the number of nodes included in each sub-network.

Preferably, the determining of the core protein network may include forming a two-dimensional graph whose two axes are the network power and the number of nodes included in each sub-network, forming a skyline in the direction in which both values are large, and determining the sub-networks included in the formed skyline as the core protein network for disease research.

Preferably, the detecting of the protein interaction data related to a specific disease among the received data may include converting the received protein interaction data into text-based data, performing a query on the text-based data using a field or disease term related to the specific disease, and mining the queried data.

In addition, the core protein network extraction device for disease research according to an embodiment of the present invention may include a functional unit for performing each of the above-described methods.

Effects of the core protein network extraction method and apparatus for disease research according to the present invention are as follows.

According to the present invention, there is an effect of providing a core disease-related protein network (delivery network) in research activities such as analyzing the cause mechanisms related to disease and selecting a protein for drug development.

In addition, according to the present invention, research designs for studying the relationship between proteins and diseases can be planned efficiently.

FIG. 1 is a diagram illustrating a conventional skyline algorithm.
FIG. 2 is a diagram illustrating a biological network according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an exponential network according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a scale-free network according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating separating a basic network into sub-networks according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a process of forming a sub-network according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a part of a process of forming a sub-network starting with a specific node in a basic network according to an embodiment of the present invention.
FIG. 8 is a diagram of the basic network of FIG. 7 reorganized into a tree-structured network according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating a network in which the tree structure of FIG. 8 is divided into two sub-networks according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a skyline for determining an optimal protein subnetwork for disease research according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a process of extracting an optimal protein subnetwork for disease research according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating a protein network extraction device for disease research according to an embodiment of the present invention.
FIG. 13 is a flow chart showing a core protein network extraction method for disease research according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, but the present invention is not limited by the embodiments.

The terms used in this specification are, as far as possible, general terms that are currently in wide use, selected in consideration of their functions in the present invention; however, they may vary depending on the intention or custom of a person skilled in the art or the emergence of new technologies. In addition, in certain cases a term may have been arbitrarily selected by the applicant, in which case its meaning will be described in the corresponding part of the description. Therefore, the terms used herein should be interpreted based on their actual meaning and on the entire contents of the specification, rather than on their names alone.

FIG. 2 illustrates a biological network, in accordance with an embodiment of the present invention.

An important criterion for measuring the importance of protein interactions is to measure the connectivity of the edges that express the association between each protein.

For example, referring to the G1 network shown in FIG. 2, there are associations between node 1 and node 2, node 1 and node 4, and node 1 and node 6, and each association is represented by an edge. The G1 network and the G2 network include the same number of nodes, and it can be seen that they have the same network density.

However, in biological research, the G2 network is more valuable than the G1 network. The reason is that the protein at node 3 has high connectivity with other proteins (e.g., the proteins of nodes 1, 2, 4, 5, and 6).

In this embodiment, because the protein networks being compared are relatively simple, it is easy to tell which one has the higher research value. However, since a vast number of proteins are related to one another as described above, many protein networks can arise, and it is sometimes impossible to determine which of them has high research value.

FIG. 3 is a diagram illustrating an exponential network according to an embodiment of the present invention.

An example of an exponential network is a road network, in which the nodes have nearly equal numbers of edges. That is, the relevance of the nodes included in the exponential network does not differ significantly from node to node, and neither does the number of edges connected to each node. It can be said that homogeneity exists between these nodes.

FIG. 4 illustrates a scale-free network according to an embodiment of the present invention.

A scale-free network is a network in which hub nodes exist, which makes the network robust. Each node constituting the network represents a protein. Scale-free networks contain hub nodes, which represent proteins that are biologically involved with many other proteins. It can be seen that such a hub node is connected to a plurality of nodes. Nodes other than hub nodes have fewer edges than hub nodes. It can be said that heterogeneity exists between these nodes.

An important criterion for determining the hub node can be the number of edges connected to each node, which can be expressed as node degree. Each edge shows that the proteins corresponding to the nodes connected to the edges interact with each other.
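As a minimal sketch of the node-degree criterion, the snippet below counts the edges attached to each node and reports the node with the highest degree as the hub candidate. The edge list is a hypothetical star-shaped example in which node 3 is linked to nodes 1, 2, 4, 5, and 6; it is not the actual edge set of any figure, and `node_degrees` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, Tuple


def node_degrees(edges: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many edges touch each node of an undirected network."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical interaction edges: node 3 interacts with five other proteins.
edges = [(3, 1), (3, 2), (3, 4), (3, 5), (3, 6)]

degrees = node_degrees(edges)
hub, hub_degree = degrees.most_common(1)[0]
print(hub, hub_degree)  # 3 5
```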

Such scale-free networks appear in biology. Networks of human relationships are also scale-free networks. This indicates that the study of protein networks can also be applied in the humanities and social sciences.

In FIG. 4 the scale-free network is shown in a limited diagram and its size is limited, but the actual protein network is larger in size. Thus, multiple hub nodes may be included and there may be sub-networks included in the scale-free network. Each subnetwork may each be associated with specific diseases. In addition, multiple sub-networks may be associated with one disease. Therefore, it is necessary to be able to determine a subnetwork that has the greatest influence on the disease among the subnetworks associated with the particular disease, that is, the core subnetwork. In addition, prior to determining the core sub-networks, there is also a need for criteria for how to distinguish sub-networks in an exponential or scale-free network.

The larger the number of nodes included in the subnetwork among the sub networks, and the higher the network power of each sub network, the more likely it is to become a core sub network. In this case, the network power may be defined as a sum of weights assigned to edges based on connectivity of nodes constituting the network. More details about the network power will be described later.

If the basic network, meaning the protein network of the largest scope, whether an exponential network or a scale-free network, is defined as G, a sub-network belonging to G can be expressed as g. When there are two or more sub-networks, they can be represented by g1, g2, g3, and so on. The nodes and edges included in the basic network may be expressed in the form G(VG, EG), where VG indicates the nodes included in the basic network and EG indicates the edges included in the basic network. The nodes and edges included in a sub-network may be expressed in a similar manner, in the form g(Vg, Eg). Here, Vg represents the nodes belonging to the corresponding sub-network, and Eg represents the edges belonging to the corresponding sub-network.

NG represents the number of nodes included in the basic network, and Ng represents the number of nodes included in the sub-network. P(G) represents the network power of the basic network, and P(g) represents the network power of the sub-network. Network power is calculated based on the weights of the edges. The calculation of the network power will be described later.

An edge cut-set means a set of edges whose removal divides one network into two sub-networks. The edge cut-set may consist of one edge, or may consist of two or more edges.

S(G) means the set of sub-networks included in the basic network. Each sub-network g may be represented by Ng, the number of nodes included in the sub-network, and P(g), its network power.

For example, suppose there are sub-networks g1 and g2 belonging to S(G). In any of the following cases, namely Ng1 > Ng2 and P(g1) > P(g2), Ng1 = Ng2 and P(g1) > P(g2), or Ng1 > Ng2 and P(g1) = P(g2), it may be determined that the sub-network g1, rather than the sub-network g2, is the core sub-network. That is, in these cases, it may be determined that g1 rather than g2 is the key sub-network for studying the relationship between diseases and proteins.
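The three cases listed above amount to a dominance test on the pair (number of nodes, network power). A small sketch follows; the `SubNetwork` type and the function name `is_more_core` are illustrative assumptions, and the numbers are placeholders.

```python
from typing import NamedTuple


class SubNetwork(NamedTuple):
    name: str
    n_nodes: int   # Ng: number of nodes in the sub-network
    power: float   # P(g): network power of the sub-network


def is_more_core(g1: SubNetwork, g2: SubNetwork) -> bool:
    """True when g1 is at least as large and as powerful as g2,
    and strictly better on at least one of the two measures."""
    no_worse = g1.n_nodes >= g2.n_nodes and g1.power >= g2.power
    strictly_better = g1.n_nodes > g2.n_nodes or g1.power > g2.power
    return no_worse and strictly_better


# Placeholder values chosen only to exercise the three cases.
big_strong = SubNetwork("g1", 12, 34.0)
small_weak = SubNetwork("g2", 6, 5.0)
same_size_weak = SubNetwork("g3", 12, 5.0)
small_strong = SubNetwork("g4", 6, 34.0)

print(is_more_core(big_strong, small_weak))      # True  (Ng1 > Ng2, P(g1) > P(g2))
print(is_more_core(big_strong, same_size_weak))  # True  (Ng1 = Ng2, P(g1) > P(g2))
print(is_more_core(big_strong, small_strong))    # True  (Ng1 > Ng2, P(g1) = P(g2))
print(is_more_core(small_strong, same_size_weak))  # False (neither dominates)
```

When one sub-network is larger but the other is more powerful, neither dominates, which is the situation the skyline described later is meant to handle.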

FIG. 5 is a diagram illustrating separating a basic network into sub-networks according to an embodiment of the present invention.

In addition to the basic network, sub-networks may additionally be divided into lower sub-networks. For convenience, the lower subnetwork is also called a subnetwork.

According to an embodiment of the present invention, a basic network or a sub-network is divided into sub-networks by breaking the edges with the lowest connectivity. That is, when the basic network is separated into sub-networks, the split is made where the number of edges that must be cut is smallest.
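The edge-count criterion can be illustrated with a brute-force search over all two-way splits of a small network. This is a sketch for tiny examples only (the search is exponential in the number of nodes); the graph below, two triangles joined by a single edge, is hypothetical and is not the network of FIG. 5 or FIG. 6, and `min_edge_cut` is an illustrative name.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def min_edge_cut(nodes: Sequence[int], edges: Sequence[Edge]) -> List[Edge]:
    """Return a smallest set of edges whose removal splits the network in two."""
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    best: List[Edge] = list(edges)
    # Keep `first` on side A so that each split is examined only once;
    # side A never contains every node, so side B is never empty.
    for extra_count in range(len(rest)):
        for extra in combinations(rest, extra_count):
            side_a = {first, *extra}
            cut = [e for e in edges if (e[0] in side_a) != (e[1] in side_a)]
            if len(cut) < len(best):
                best = cut
    return best


# Hypothetical network: triangle 1-2-3 and triangle 4-5-6 joined by edge (3, 4).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]

print(min_edge_cut(nodes, edges))  # [(3, 4)]
```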

FIG. 6 is a diagram illustrating a process of forming a sub-network according to an embodiment of the present invention.

In the basic network G, the least connected edge is e6,9, which connects nodes 6 and 9. Thus, e6,9 can be separated to form the sub-networks g1 and g2. In the sub-network g1, the smallest set of edges that must be cut to generate further sub-networks is e2,5 and e4,7. When these are broken, g1 is separated into the sub-networks g3 and g4.

However, in biological research, an edge may have high research or academic value even when its connectivity is low, that is, even when it is a single edge, for example when a specific protein is the source of expression of a specific disease or function. In the example of FIG. 6, e6,9, which was determined to be the weakest edge in the basic network, may in fact be the most important association in disease and protein research. In this case, if sub-networks are formed simply on the basis of the number of edges, very important research data will be missed. Therefore, further consideration is needed when forming sub-networks.

FIG. 7 is a diagram illustrating a part of a process of forming a sub-network starting with a specific node in a basic network according to an embodiment of the present invention.

A basic network is generated by extracting protein correlations from articles and other data related to protein interactions and building the network from those relationships.

As shown in FIG. 7, the basic network may include a plurality of nodes and edges, and a specific node and/or edge may be selected to determine a root node. For example, a node and/or edge known to contain critical information about the disease may be selected. If a node is selected, that node is determined as the root node; if an edge is selected, either of the nodes connected to that edge can be determined as the root node. Alternatively, the researcher can select a node of interest and determine it as the root node.

In FIG. 7, the node indicated by the arrow is selected as the root node.

FIG. 8 is a diagram of a basic network of FIG. 7 reorganized into a tree structured network according to an embodiment of the present invention.

According to one embodiment of the present invention, the concept of weight is introduced, in addition to the connectivity of each node, as a criterion for separating the basic network into sub-networks. That is, when a specific edge of the basic network is cut to form a sub-network, the edge to cut is chosen in consideration of its weight value as well as the number of edges separated. As a result, relatively important protein interactions are not overlooked merely because the number of edges to be separated is small.

The weight is calculated starting from the root node, taking into account the number of edges connected to each node and the depth in the tree structure to which each node belongs. In an embodiment of the present invention, the depth is expressed as a level for convenience of description.
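The text does not say how the basic network is traversed to assign levels. One plausible sketch, assuming a breadth-first traversal from the chosen root with the root at level 1, is shown below. The adjacency list is hypothetical, and `bfs_levels` is an illustrative name.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def bfs_levels(
    adjacency: Dict[str, List[str]], root: str
) -> Tuple[Dict[str, int], Dict[str, Optional[str]]]:
    """Assign each reachable node a level (root = 1) and a tree parent.

    Breadth-first search is an assumption made for this sketch.
    """
    level: Dict[str, int] = {root: 1}
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in level:  # first visit fixes level and parent
                level[neighbour] = level[node] + 1
                parent[neighbour] = node
                queue.append(neighbour)
    return level, parent


# Hypothetical basic network with a cycle r-a-b; the tree drops edge a-b.
adjacency = {
    "r": ["a", "b"],
    "a": ["r", "b", "c"],
    "b": ["r", "a"],
    "c": ["a"],
}

level, parent = bfs_levels(adjacency, "r")
print(level)   # {'r': 1, 'a': 2, 'b': 2, 'c': 3}
print(parent)  # {'r': None, 'a': 'r', 'b': 'r', 'c': 'a'}
```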

The weight of each edge is calculated by dividing the number of edges connected to a specific node by the level of that node. For example, since the number of edges connected to the root node is five and the root node is at level 1, the weight of each edge connected to the root node is 5/1 = 5. Among the nodes belonging to level 2, node a has two connected edges and is at level 2, so the weight of each of those edges is 2/2 = 1. Likewise, since the number of edges connected to node b is 3 and b belongs to level 2, the weight of each edge connected to node b is 3/2 = 1.5. Similarly, an edge connected to node c has a weight of 1/2 = 0.5. The weight of every edge can be calculated in this way.
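The worked numbers can be reproduced with a short sketch. It rests on one interpretive assumption: the "edges connected to a node" are taken to be the edges leading from that node to the next deeper level, so the weight of an edge from a parent to a child is the parent's number of children divided by the parent's level. The tree below is hypothetical and only mirrors the counts quoted in the text (five edges at the root, two at node a, three at node b, one at node c); the node names other than a, b, and c are invented.

```python
from typing import Dict, List, Tuple


def edge_weights(
    children: Dict[str, List[str]], root: str
) -> Dict[Tuple[str, str], float]:
    """Weight of edge (parent, child) = number of children of parent / level of parent.

    The root is at level 1. Counting only downward edges is an assumption.
    """
    weights: Dict[Tuple[str, str], float] = {}
    stack = [(root, 1)]
    while stack:
        node, level = stack.pop()
        kids = children.get(node, [])
        for kid in kids:
            weights[(node, kid)] = len(kids) / level
            stack.append((kid, level + 1))
    return weights


# Hypothetical tree: root has 5 children; a has 2, b has 3, c has 1.
children = {
    "root": ["a", "b", "c", "x", "y"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1"],
}

w = edge_weights(children, "root")
print(w[("root", "a")])  # 5.0
print(w[("a", "a1")])    # 1.0
print(w[("b", "b1")])    # 1.5
print(w[("c", "c1")])    # 0.5
```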

According to the present invention, there is one edge between node d and node e, and one edge between node e and node f, but the weight of the edge between node e and node f is lower than the weight of the edge between node d and node e. Therefore, of these two, the edge between node e and node f is the one to break to form the sub-network.

In addition, the edges between level 6 and level 7 and the edges between level 7 and level 8 each have a lower weight than the edge between node e and node f. However, separating the network between level 6 and level 7 requires breaking two edges, each with a weight of 0.16, for a total weight of 0.32, which is higher than the weight broken when the edge between node e and node f is cut. Therefore, the edge between node e and node f, which has a weight of 0.25, should be broken to form the sub-network. The same applies to the edges between level 7 and level 8.
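The comparison in this paragraph reduces to picking the candidate cut whose edge weights have the smallest sum. A minimal sketch using the two candidates quoted above follows; the labels and the function name `cheapest_cut` are illustrative.

```python
from typing import Dict, List, Tuple


def cheapest_cut(candidates: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the label and total weight of the candidate cut with the smallest total."""
    label = min(candidates, key=lambda name: sum(candidates[name]))
    return label, sum(candidates[label])


# Weights quoted in the text: one edge of 0.25 versus two edges of 0.16 each.
candidates = {
    "edge e-f": [0.25],
    "level 6 to level 7": [0.16, 0.16],
}

label, total = cheapest_cut(candidates)
print(label, total)  # edge e-f 0.25
```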

The division into sub-networks according to the present invention starts from a node known to contain key information about the disease, or a node that the researcher is interested in or expects to be important, and considers both the level of each node relative to that node (the difference in depth, or relational distance) and the number of edges connected to each node. It can therefore distinguish relatively important edges from non-essential ones, which provides an important criterion for forming sub-networks.

FIG. 9 is a diagram illustrating a network in which the tree structure of FIG. 8 is divided into two sub-networks according to an embodiment of the present invention.

For each subnetwork separated from the underlying network, the weight is recalculated. That is, the weight of each edge is recalculated in the above-described manner in each of the subnetwork g1 and the subnetwork g2.

In one embodiment, the network power of each sub-network may be defined as the sum of the weights of the edges included in each sub-network.

In another embodiment, the network power of each subnetwork may be defined as a value obtained by dividing the sum of weights of edges included in each subnetwork by the number of nodes belonging to the subnetwork.

In either case, the sub-network with the larger network power may be determined as the core sub-network.

For example, when the network power of each sub-network is defined as the sum of the weights of the edges included in the sub-network divided by the number of nodes belonging to the sub-network, the network power of g1 is 34.33 / 12 = about 2.86, and the network power of g2 is 5.66 / 6 = about 0.94, indicating that g1 is a more important sub-network than g2.
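The arithmetic for the node-normalized definition can be checked directly. The weight sums and node counts below are the ones quoted in the text; the function name `normalized_power` is an illustrative choice.

```python
def normalized_power(weight_sum: float, n_nodes: int) -> float:
    """Network power defined as the sum of edge weights divided by the node count."""
    return weight_sum / n_nodes


power_g1 = normalized_power(34.33, 12)
power_g2 = normalized_power(5.66, 6)

print(round(power_g1, 2))  # 2.86
print(round(power_g2, 2))  # 0.94

# The sub-network with the larger value is taken as the core sub-network.
core = "g1" if power_g1 > power_g2 else "g2"
print(core)  # g1
```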

In this embodiment of the present invention, the description stops at dividing the basic network into the two sub-networks g1 and g2, but the sub-networks may be separated again to form further sub-networks. For example, the sub-network g1 can be separated to generate sub-networks g3 and g4. In this case as well, the weight of each edge of g3 and g4 is newly calculated, and the network power of each sub-network can also be calculated.

FIG. 10 illustrates a skyline for determining an optimal protein subnetwork for disease research, in accordance with an embodiment of the invention.

As shown in FIG. 10, the x-axis of the two-dimensional graph represents the number of nodes (or vertices) included in the sub-network, and the y-axis represents the network power of the sub-network. Each point in the two-dimensional graph may represent a respective subnetwork.

In one embodiment, the number of nodes and the network power of every sub-network generated in the process of separating the basic network (basic protein network) into sub-networks (sub-protein networks), and of separating those sub-networks into further sub-networks, can be identified and displayed as points on the two-dimensional graph.

The larger the network power of a protein sub-network and the larger the number of nodes it includes, the more suitable it is as a sub-network for disease research. The points for which these values are large are connected to form a skyline. Sub-networks included in the formed skyline may be strong candidate protein sub-networks for disease research.
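A sketch of this selection, assuming that larger is better on both axes, is given below. It uses a sort-and-sweep approach: sub-networks are sorted by node count (largest first) and one is kept only if its network power exceeds every power seen so far. The entries for g1 and g2 reuse the node counts and weight sums quoted earlier for FIG. 9; g3 and g4 are hypothetical values added so that the skyline has more than one point, and `skyline_max` is an illustrative name.

```python
from typing import Dict, List, Tuple


def skyline_max(subnets: Dict[str, Tuple[int, float]]) -> List[str]:
    """Return sub-networks not dominated on (number of nodes, network power).

    Larger is better on both measures. Exact duplicates of a skyline
    point are reported only once.
    """
    # Largest node count first; ties broken by largest power first.
    ordered = sorted(subnets.items(), key=lambda kv: (-kv[1][0], -kv[1][1]))
    best_power = float("-inf")
    result: List[str] = []
    for name, (_n_nodes, power) in ordered:
        # Everything seen so far has at least as many nodes, so this
        # sub-network survives only if it is strictly more powerful.
        if power > best_power:
            result.append(name)
            best_power = power
    return result


subnets = {
    "g1": (12, 34.33),
    "g2": (6, 5.66),
    "g3": (8, 36.0),   # hypothetical
    "g4": (4, 3.0),    # hypothetical
}

print(skyline_max(subnets))  # ['g1', 'g3']
```

Adding a third consideration factor, as the text allows, would require a general dominance check rather than this two-dimensional sweep.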

Although FIG. 10 illustrates the formation of the skyline as a two-dimensional graph, when another consideration factor is needed to determine an optimal protein network for disease research, a three-dimensional graph may be used.

FIG. 11 is a diagram illustrating a process of extracting an optimal protein subnetwork for disease research, according to an embodiment of the present invention.

The process of extracting the optimal protein subnetwork for disease research begins with collecting previously published article information 11010 related to proteins and diseases, and PubMed data 11012 related to various diseases.

The data importer 11020 processes the collected data into a form the system can use. For example, the data importer 11020 converts the collected data into a text format. This facilitates data retrieval and/or the collection of data related to specific issues.

Data mining 11030 classifies the data. The data mining 11030 receives the specific conditions set in the ruleset 11032 and detects data describing a relationship between protein interactions and a disease.

The ruleset 11032 sets a corresponding condition in order to query protein-disease related data corresponding to a specific condition. For example, the name of a particular disease, the organ from which the disease occurs, the name of a protein and / or the phenomenon of the disease may be set as a condition of the query.
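As a rough illustration of how such a ruleset might filter text-format records, the sketch below keeps a record only if it mentions at least one term from every condition field. The field names, terms, sample sentences, and the function name `matches_ruleset` are all hypothetical; the patent does not specify the matching logic, and plain case-insensitive substring matching is used here only for brevity.

```python
from typing import Dict, List


def matches_ruleset(text: str, ruleset: Dict[str, List[str]]) -> bool:
    """True if the text contains at least one term from every ruleset field."""
    lowered = text.lower()
    return all(
        any(term.lower() in lowered for term in terms)
        for terms in ruleset.values()
    )


# Hypothetical query conditions: a disease name and candidate protein names.
ruleset = {
    "disease": ["breast cancer"],
    "protein": ["BRCA1", "TP53"],
}

# Hypothetical text records standing in for converted article data.
records = [
    "BRCA1 interacts with BARD1 in breast cancer cells.",
    "TP53 mutations are frequent in lung cancer.",
    "Road networks are nearly homogeneous.",
]

selected = [r for r in records if matches_ruleset(r, ruleset)]
print(selected)  # ['BRCA1 interacts with BARD1 in breast cancer cells.']
```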

The protein interaction database 11040 stores data describing relationships between protein interactions and diseases.

The bioreaction network generator 11050 plays a role in generating a basic network related to protein interaction. For example, the underlying network can be the largest network involved in protein interactions.

The core network recommender 11060 separates the sub-networks from the basic network, classifies the networks effective for the study of disease and protein interaction among the separated sub-networks, and designates them as core networks.

FIG. 12 is a view showing a protein network extraction device for disease research, according to an embodiment of the present invention.

The protein network extraction device for disease research according to an embodiment of the present invention may include a data collection module 12010, a data analysis module 12020, a protein interaction database 12030, a basic network generation module 12040, a core network analysis module 12050, and/or a network visualization module 12060.

The data collection module 12010 collects data related to proteins and / or diseases. The data collected may be article data, or may be data related to proteins and / or diseases published in a manner other than the article. In addition, private data may also be collected depending on the situation.

The data analysis module 12020 analyzes, among the collected data, the data representing protein interactions and their resulting relationship with a disease. That is, the data analysis module 12020 detects protein interaction data related to a specific disease among the collected data.

Protein interaction database 12030 stores protein interaction data associated with a particular disease.

The basic network generation module 12040 generates the basic protein network based on the protein correlation information included in the protein interaction data. For example, the basic protein network may be the largest network containing the protein correlations associated with a particular disease. For example, the basic network generation module 12040 may generate a basic network similar to the protein network shown in FIG. 4 or FIG. 7.

The core network analysis module 12050 analyzes the basic protein network and extracts core sub-protein networks for disease research. For example, the core network analysis module 12050 creates a tree-structured protein network having a particular protein in the basic protein network as the root protein node, and calculates the weights of the edges between the protein nodes included in the tree-structured protein network. For example, the core network analysis module 12050 may calculate the weight of each edge as described above with reference to FIG. 8. The core network analysis module 12050 generates two or more sub-networks by separating one or more specific edges within the tree-structured protein network, recalculates the weights of the edges within each sub-network, and may determine, for each sub-network, the sum of the recalculated weights as the network power of that sub-network. For example, as shown in FIG. 9, the core network analysis module 12050 separates the basic network into the sub-networks g1 and g2, recalculates the weight of each edge within g1 and g2, and can determine the network power of each.

The core network analysis module 12050 may extract the core network in consideration of the number of nodes included in each sub-network and/or the network power. For example, the core network analysis module 12050 may extract g1 as the core network because, of the sub-networks g1 and g2 shown in FIG. 9, g1 has the larger network power. Alternatively, the core network analysis module 12050 may consider both the number of nodes included in each sub-network and the network power, and extract as core networks the sub-networks for which both values are high.

The network visualization module 12060 serves to present the core network to the user. For example, network visualization module 12060 may display a skyline created for core network recommendation and allow a user to select a desired subnetwork. For example, the network visualization module 12060 may create a graph as shown in FIG. 10 and display a skyline in the graph to display to a user.

FIG. 13 is a flow chart showing a core protein network extraction method for disease research, according to an embodiment of the present invention.

According to an embodiment of the present invention, the protein network extraction apparatus receives data on protein interaction (S13010). The data received is raw data, which may be protein and / or disease articles, other data, and the like.

The protein network extraction apparatus detects data including protein interactions associated with a specific disease (S13020).

The protein network extraction apparatus generates a first protein network based on the protein correlation information included in the protein interaction data (S13030). For example, the first protein network is the basic protein network, or basic network, and may be defined as the top-level network for protein interaction. For example, the first protein network generated here may have a form similar to the protein network shown in FIG. 4 or FIG. 7.

The protein network extracting apparatus generates a second protein network having a tree structure having a specific protein in the first protein network as a root protein node (S13040). For example, the second protein network may have a form similar to that shown in FIG. 8.

The protein network extraction apparatus calculates the weight of the edges between the protein nodes included in the second protein network (S13050). Here, the calculation of the weight is as described above.
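The sketch below follows one possible reading of the weight definition in the claims: the weight of an edge is the number of nodes at the level of the edge's lower (child) node, divided by that level's value. The claim wording is ambiguous, so this reading is an assumption, as is the convention that the root is level 0 and its children are level 1. The function name and sample tree are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def edge_weights(parent: Dict[str, Optional[str]], level: Dict[str, int]) -> Dict[Tuple[str, str], float]:
    """Weight each (parent, child) edge as nodes_at_child_level / child_level."""
    nodes_per_level = Counter(level.values())
    weights = {}
    for child, par in parent.items():
        if par is None:
            continue  # the root has no incoming edge
        weights[(par, child)] = nodes_per_level[level[child]] / level[child]
    return weights


# Hypothetical tree: P1 is the root, P2 and P3 are its children, P4 hangs off P3.
parent = {"P1": None, "P2": "P1", "P3": "P1", "P4": "P3"}
level = {"P1": 0, "P2": 1, "P3": 1, "P4": 2}
weights = edge_weights(parent, level)
```

Under this reading, edges near the root of a wide tree receive larger weights than edges deep in the tree.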

The protein network extraction apparatus generates two or more sub-networks by separating one or more specific edges in the second protein network (S13060). The protein network extraction apparatus may select the edges to separate so that the sum of the weights of the separated edges is the smallest.
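For a tree with positive edge weights, the cheapest way to obtain two sub-networks is to remove the single lightest edge. The sketch below implements only that simplest case; removing several edges at once is not covered. The function name and sample weights are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def split_on_lightest_edge(weights: Dict[Edge, float]) -> Tuple[Edge, List[Set[str]]]:
    """Remove the minimum-weight edge and return it with the resulting components."""
    cut = min(weights, key=weights.get)
    nodes = {n for edge in weights for n in edge}
    adj = {n: set() for n in nodes}
    for a, b in weights:
        if (a, b) != cut:
            adj[a].add(b)
            adj[b].add(a)

    seen: Set[str] = set()
    parts: List[Set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over the remaining edges
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        parts.append(component)
    return cut, parts


# Hypothetical tree edge weights.
weights = {("P1", "P2"): 2.0, ("P1", "P3"): 2.0, ("P3", "P4"): 0.5}
cut, parts = split_on_lightest_edge(weights)
```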

The protein network extraction apparatus recalculates the weights of the edges in each sub-network, and determines the sum of the recalculated weights for each sub-network as the network power of that sub-network (S13070). For example, the protein network extraction apparatus separates the underlying network into the sub-networks g1 and g2 as shown in FIG. 9, recalculates the weight of each edge in g1 and in g2, and can thereby determine the network power of each. In another embodiment, as described above, the network power may be defined as the sum of the weights of the edges of the sub-network divided by the number of nodes of the sub-network. For example, because g1 has the larger network power among the sub-networks g1 and g2 shown in FIG. 9, the protein network extraction apparatus can extract g1 as the core network. Alternatively, the protein network extraction apparatus may consider both the number of nodes included in each sub-network and the network power, and extract a sub-network for which both values are high as the core network.
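Both definitions of network power can be sketched in a few lines. The example assumes the recalculated edge weights of each sub-network are already available as a mapping from edge to weight; the function name, the `normalize` flag, and the numeric values are hypothetical and do not come from FIG. 9.

```python
from typing import Dict, Tuple


def network_power(weights: Dict[Tuple[str, str], float], normalize: bool = False) -> float:
    """Sum the edge weights; optionally divide by the number of distinct nodes."""
    total = sum(weights.values())
    if not normalize:
        return total
    nodes = {n for edge in weights for n in edge}
    return total / len(nodes) if nodes else 0.0


# Hypothetical recalculated weights for two sub-networks.
g1_weights = {("P1", "P2"): 2.0, ("P1", "P3"): 2.0}
g2_weights = {("P5", "P6"): 1.0}

power_g1 = network_power(g1_weights)                     # 2.0 + 2.0
power_g2 = network_power(g2_weights)                     # 1.0
power_g1_per_node = network_power(g1_weights, normalize=True)  # 4.0 over 3 nodes
```

Note that a sub-network consisting of a single isolated node has no edges, so its node count cannot be recovered from the edge map alone; a fuller implementation would pass the node set explicitly.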

As described above, according to the present invention, a core disease-related protein network (delivery network) is provided for research activities such as analysis of disease-related causal mechanisms and selection of proteins for drug development, which has the effect of increasing the efficiency of those research activities.

In addition, according to the present invention, using a core protein network related to a specific issue has the effect of making it possible to plan research on that issue efficiently.

The present invention described so far is not limited to the above-described embodiments, and can be modified by those skilled in the art as can be seen from the appended claims; such modifications fall within the scope of the present invention.

Claims

A core protein network extraction method for disease research, comprising:
receiving data about protein interactions;
detecting, among the received data, data about protein interactions that cause a specific disease;
generating a first protein network based on protein correlation information included in the detected data about protein interactions;
generating a second protein network of a tree structure having a specific protein in the first protein network as a root protein node;
calculating a weight of edges between protein nodes included in the second protein network;
separating one or more specific edges in the second protein network to create two or more sub-networks; and
recalculating the weights of the edges in the respective sub-networks and, for each sub-network, determining the sum of the recalculated weights as the network power of that sub-network.

The method of claim 1, wherein the first protein network comprises at least two protein nodes and an edge connecting the at least two protein nodes.

The method of claim 1, wherein the weight is calculated by dividing the number of nodes included in the lower level to which the node connected to the edge whose weight is being calculated belongs in the tree structure, by the level value of that lower level.

The method of claim 1, wherein creating the two or more sub-networks comprises separating one or more edges such that the sum of the weights of the separated edges is the smallest, thereby creating the two or more sub-networks.

The method of claim 1, further comprising determining, as a core protein network for disease research, the sub-network having the largest value obtained by dividing the determined network power by the number of protein nodes in each sub-network.

The method of claim 1, further comprising determining a core protein network for disease research based on the determined network power and the number of nodes included in each sub-network.

The method of claim 6, wherein determining the core protein network comprises forming a two-dimensional graph having the network power value and the number of nodes included in the sub-network each on the x-axis or the y-axis, setting coordinates corresponding to each sub-network in the formed two-dimensional graph, forming a skyline with respect to the set coordinates, and determining a sub-network included in the formed skyline as a core protein network for disease research.

The method of claim 1, wherein detecting the data about protein interactions that cause a specific disease among the received data further comprises:
converting the received data about protein interactions into text-based data;
performing a query on the text-based data with the field or disease term of the specific disease; and
mining the queried data.

A core protein network extraction system for disease research, comprising:
a data collector for receiving data on protein interactions;
a data analyzer for detecting, among the received data, data on protein interactions that cause a specific disease;
a basic network generator configured to generate a first protein network based on protein correlation information included in the detected protein interaction data; and
a core network generator configured to generate a second protein network having a tree structure having a specific protein in the first protein network as a root protein node, calculate the weight of edges between protein nodes included in the second protein network, separate one or more specific edges within the second protein network to create two or more sub-networks, recalculate the weights of the edges within each sub-network, and, for each sub-network, determine the sum of the recalculated weights as the network power of that sub-network.

The system of claim 9, wherein the first protein network comprises at least two protein nodes and an edge connecting the at least two protein nodes.

The system of claim 9, wherein the weight is calculated by dividing the number of nodes included in the lower level to which the node connected to the edge whose weight is being calculated belongs in the tree structure, by the level value of that lower level.

The system of claim 9, wherein the core network generator separates one or more edges such that the sum of the weights of the separated edges is the smallest, thereby creating the two or more sub-networks.

The system of claim 9, wherein the core network generator determines, as a core protein network for disease research, the sub-network having the largest value obtained by dividing the determined network power by the number of protein nodes in each sub-network.

The system of claim 9, wherein the core network generator determines a core protein network for disease research based on the determined network power and the number of nodes included in each sub-network.

The system of claim 14, wherein the core network generator forms a two-dimensional graph having the network power value and the number of nodes included in the sub-network each on the x-axis or the y-axis, sets coordinates corresponding to each sub-network in the formed two-dimensional graph, forms a skyline with respect to the set coordinates, and determines a sub-network included in the formed skyline as a core protein network for disease research.

The system of claim 9, wherein the data analyzer converts the received data about protein interactions into text-based data, performs a query on the text-based data with the field or disease term of the specific disease, and mines the queried data.