KR20030019682A

KR20030019682A - Apparatus and method for analysing protein expression profile based on spot intensity information

Info

Publication number: KR20030019682A
Application number: KR1020010052556A
Authority: KR
Inventors: 인용호; 김지협; 김용욱; 김형용; 김진희; 채수진; 정재은; 엄태진
Original assignee: 바이오인포메틱스 주식회사
Priority date: 2001-08-29
Filing date: 2001-08-29
Publication date: 2003-03-07

Abstract

PURPOSE: A protein expression profile analysis system and method are provided to analyze changes in protein expression across experimental conditions obtained by two-dimensional electrophoresis, to search for similar expression profile patterns, and to cluster the proteins hierarchically. CONSTITUTION: The system comprises an interface (10), a proteome database (20), and a protein expression profile analyzer (50). The proteome database (20) stores a large amount of verified proteome data. The interface (10) receives the data needed for analyzing the protein expression profile from servers, transmits the data to the protein expression profile analyzer (50), and outputs the profile analysis result to the servers. The protein expression profile analyzer (50) includes a ratio variation analyzer (60), a similar expression pattern retrieval module (70), and a clustering module (80). The ratio variation analyzer (60) extracts intensity information from the proteome database (20) for the experimental conditions input by a user, calculates the variation ratios, and retrieves those calculated ratios that satisfy the input variation ratio. The similar expression pattern retrieval module (70) calculates the Euclidean distance between the intensity information selected by the user and the intensity information extracted from the proteome database (20), and searches for proteins having a similar expression pattern. The clustering module (80) extracts from the proteome database (20) the intensity information of the protein profiles that satisfy an experimental condition input by a user, and uses the extracted intensity information to perform hierarchical clustering of the proteins having similar expression patterns.

Description

Apparatus and method for analysing protein expression profile based on spot intensity information

The present invention relates to a proteome analysis system, and in particular to an apparatus and method that analyze changes in protein expression across experimental conditions obtained by two-dimensional electrophoresis, one of the methods of biological research, search for similar expression profile patterns, and cluster the proteins hierarchically.

Two-dimensional electrophoresis, long used in protein research, is now used mainly for proteome research. It requires a database (DB) for processing large amounts of data and for data mining, that is, exploring and selecting useful information from vast amounts of data.

The two-dimensional electrophoresis technique in most common use today is two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), which uses polyacrylamide gel electrophoresis to separate proteins by isoelectric point (pI) and molecular weight (MW). The 2D-PAGE proteome databases, whose construction was begun by Julio Celis and others, provide detailed information on proteins annotated using amino acid sequences, peptide mass fingerprinting (PMF), isoelectric point (pI), molecular weight (MW), and the like. Research that identifies this protein information as a whole, and that comprehensively characterizes the expression levels, modifications, and intracellular locations of these proteins and the interactions among them, is revealing the full set of proteins expressed in the cell. As the networks among these proteins are identified, biological phenomena as a whole, from the genome to the proteins that actually carry them out, are being elucidated.

Among the analyses of protein information, expression profile analysis comprehensively examines protein expression patterns under different experimental conditions. It derives from gene microarray technology, in which genetic material carrying the information needed for analysis is densely integrated on a small chip. Microarrays overcame the limits of early analyses, which measured the expression of only one or two genes, and made it possible to analyze large-scale changes in gene expression using a variety of statistical techniques. As the analysis of such protein expression profiles has become possible, the mechanisms of complex diseases are being elucidated, and new drug development based on them is gaining momentum.

In biological phenomena, however, the substances directly involved in biological reactions are not genes (mRNA) but functional molecules, namely proteins. In fact, the expression level of a gene (mRNA) and that of the corresponding protein are known not to match exactly, and in some cases to differ completely. For a more accurate expression profile analysis, therefore, the subject must be the protein rather than the gene (mRNA); only then can comprehensive changes in complex biological phenomena be observed and the mechanisms of disease be understood. To this end, expression profile analysis using protein spot intensity information from 2D-PAGE experiments has been attempted in various ways, but because of the poor reproducibility of 2D-PAGE experiments and the difficulty of protein identification, its potential has not been properly exploited. As a result, most proteome databases are used for nothing more than simple information retrieval.

The technical problem to be solved by the present invention is to provide an apparatus and method that implement, in real time, analysis of protein expression changes by experimental condition, similar expression profile pattern search, and hierarchical clustering on a proteome database.

FIG. 1 is a block diagram showing the structure of a protein expression profile analysis system that uses the protein intensity information of a database, according to a preferred embodiment of the present invention.

FIG. 2 is a block diagram schematically showing the operations performed in the protein expression profile analysis server shown in FIG. 1.

FIG. 3 is a flowchart showing a method of searching for ratio changes by condition in a protein expression profile, according to a preferred embodiment of the present invention.

FIG. 4 shows a user interface screen for the ratio change search by condition shown in FIG. 3.

FIG. 5 shows an example of the protein intensity information used for protein expression profile analysis.

FIG. 6 shows the results of a ratio change search by condition for protein expression, according to an embodiment of the present invention.

FIG. 7 is a flowchart showing a method of searching for proteins with a similar expression profile, according to a preferred embodiment of the present invention.

FIG. 8 is a flowchart showing a hierarchical clustering method for protein expression profiles, according to a preferred embodiment of the present invention.

FIG. 9 shows the result of hierarchical clustering of protein expression profiles by the method shown in FIG. 8.

Description of Symbols for Main Parts of the Drawings

10: interface
20: proteome DB
60: ratio change analysis unit
70: similar expression pattern search unit
80: clustering unit
90: Internet
1: protein expression profile analysis system
50: protein expression profile analysis unit
100: protein expression profile analysis server
200a-200z: client servers

To achieve the above object, a protein expression profile analysis apparatus according to the present invention comprises: a database storing a large amount of verified proteome data; and a protein expression profile analysis unit that analyzes changes in the expression ratio of the proteins stored in the database by experimental condition, searches for proteins whose expression profile is similar to that of a designated protein, and hierarchically clusters proteins with similar expression patterns.

To achieve the above object, a method of analyzing protein expression changes by experimental condition according to the present invention comprises: (a) inputting at least two experimental conditions and a range of protein expression ratios; (b) extracting, from a proteome database storing a large amount of verified proteome-related data, the intensity information of the proteins corresponding to the experimental conditions; (c) calculating the change ratio between experimental conditions for the extracted intensity information; and (d) retrieving, from the calculated change ratios, the results that fall within the input ratio range.

To achieve the above object, a method of searching for similar expression patterns in protein expression profiles according to the present invention comprises: (a) selecting a protein for which a similar expression pattern search is desired; (b) extracting a plurality of protein intensity information items from a proteome database storing a large amount of verified proteome-related data; (c) calculating the Euclidean distance between the profile intensity information of the protein to be searched and each profile intensity information item extracted from the database; (d) sorting the proteins in ascending order of the calculated Euclidean distance; and (e) displaying the sorted result as the similar expression pattern search result for the protein expression profile.

To achieve the above object, a hierarchical clustering method for protein expression profiles according to the present invention comprises: (a) receiving the experimental conditions on which clustering is to be performed; (b) extracting, from a proteome database storing a large amount of verified proteome-related data, the intensity information of the proteins corresponding to the experimental conditions; (c) calculating the Euclidean distance for every combination of the extracted intensity information; (d) performing hierarchical clustering that groups together proteins with small calculated Euclidean distances; and (e) displaying the clustered result.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the structure of a protein expression profile analysis system (1) according to a preferred embodiment of the present invention. Referring to FIG. 1, the protein expression profile analysis system (1) consists of at least one client (200a, 200b, …, 200z) connected to the Internet (90) and a protein expression profile analysis server (100) that provides a protein expression profile analysis service to the clients (200a, 200b, …, 200z) over the web in real time. The protein expression profile analysis server (100) consists of an interface (10), a proteome database (20), and a protein expression profile analysis unit (50). The protein expression profile analysis unit (50) in turn consists of a ratio change analysis unit (60), a similar expression pattern search unit (70), and a clustering unit (80).

The proteome database (20) stores a large amount of verified proteome-related data. Various specialized annotation databases could be included in the proteome database (20). The present invention uses the SWISS-PROT database, which stores most protein-related information: in addition to protein sequence information, it covers protein function and structure, information on each domain, and post-translational modifications.

The interface (10) receives the search information needed for protein expression profile analysis from the client servers (200a, 200b, …, 200z) and passes it to the protein expression profile analysis unit (50). It also outputs the analysis results of the protein expression profile analysis unit (50) to the client servers (200a, 200b, …, 200z).

The ratio change analysis unit (60), included in the protein expression profile analysis unit (50), extracts intensity information from the proteome database (20) for the two experimental conditions input by the user, calculates their change ratios, and retrieves and outputs those calculated ratios that satisfy the input change ratio. The similar expression pattern search unit (70) calculates the Euclidean distance between the expression profile intensity information of the protein selected by the user and the protein profile intensity information extracted from the proteome database (20), and thereby finds proteins with a similar expression pattern. When the user inputs the experimental conditions to be clustered, the clustering unit (80) extracts from the proteome database (20) the profile intensity information of the proteins that satisfy those conditions, and uses it to perform hierarchical clustering of proteins with similar expression patterns. The operations performed in the protein expression profile analysis server (100) configured in this way are as follows.

FIG. 2 is a block diagram schematically showing the operations performed in the protein expression profile analysis server (100) shown in FIG. 1. Referring to FIG. 2, protein intensity information is first extracted (52) from the proteome database (20). The extracted intensity information is then input to the ratio change analysis unit and used to analyze how protein expression changes across experimental conditions (60), input to the similar expression pattern search unit and used to find proteins with a similar expression pattern (70), or input to the clustering unit and used to perform hierarchical clustering of the protein expression profiles (80). The analysis result (54) produced by the ratio change analysis unit (60), the similar expression pattern search unit (70), or the clustering unit (80) is linked to the proteome database (20), and the details of each protein in the analysis result (54) are retrieved from the proteome database (20) and displayed. The protein expression profile analysis process is described in detail below.

FIG. 3 is a flowchart showing a method of analyzing protein expression changes by experimental condition according to a preferred embodiment of the present invention. FIG. 4 shows a user interface screen for the ratio change search by experimental condition shown in FIG. 3, and FIG. 5 shows an example of the protein intensity information used for protein expression profile analysis. FIG. 6 shows the results of a ratio change search by condition for protein expression according to an embodiment of the present invention.

The method of analyzing ratio changes in protein expression by condition according to the present invention is described below with reference to FIGS. 3 to 6.

Referring first to FIG. 3, the experimental conditions of the protein expression to be compared and the ratio range to be searched are input to the ratio change analysis unit (60) (step 61). The experimental conditions and the search range are entered through the search input window provided on the ratio change search screen shown in FIG. 4.

Here, the experimental conditions are selected from a drop-down menu for ease of input, and the range of the protein expression change ratio can be entered freely as any numeric values the user wants.

Once the experimental conditions and the ratio range to be searched have been entered through the search input window, the protein intensity information corresponding to the experimental conditions is extracted from the proteome database (20) (step 62). Protein intensity information means the concentration (intensity) of a protein present on the gel image, and it can be represented in various data forms, as shown in FIG. 5.

Next, the change ratio of the protein intensity information between the two experimental conditions being compared is calculated (step 63). Here, the change ratio means the ratio between the two experimental conditions and is expressed by

[Equation 1]

The change ratio calculated here represents the protein expression ratio when the experimental conditions are changed.
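As a rough illustration of step 63, the change ratio can be sketched in Python. Equation 1 itself is not reproduced in this text, so the direction of the ratio (second condition divided by first) is an assumption, as are the function name `change_ratio` and the intensity values. Spot numbers 3207 and 8118 are borrowed from the FIG. 9 discussion, and 4411 is hypothetical.

```python
def change_ratio(intensity_a: float, intensity_b: float) -> float:
    """Ratio of a spot's intensity under condition B to that under condition A.

    Assumption: B / A. The source only says the value is the ratio between
    the two experimental conditions.
    """
    if intensity_a <= 0:
        raise ValueError("reference intensity must be positive")
    return intensity_b / intensity_a


# Hypothetical spot intensities (spot number -> intensity) for two conditions.
condition_a = {"3207": 120.0, "8118": 80.0, "4411": 50.0}
condition_b = {"3207": 240.0, "8118": 40.0, "4411": 50.0}

# One change ratio per spot that appears under both conditions.
ratios = {
    spot: change_ratio(condition_a[spot], condition_b[spot])
    for spot in condition_a
    if spot in condition_b
}
```

With these made-up values, a ratio above 1 indicates higher intensity under the second condition and a ratio below 1 indicates lower intensity.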

Once the change ratios of the protein intensity information have been calculated by [Equation 1], those that fall within the search ratio range entered by the user are output as shown in FIG. 6 (step 64). The output shows protein spot information such as the protein's unique number and name, the change ratio of the intensity information obtained by comparing the two experimental conditions, and information on how that ratio changes. This information can be presented as a table. In the present invention in particular, increases and decreases are displayed in different colors so that the degree of change in the protein expression ratio can be distinguished at a glance. With this color coding, the user can easily tell apart proteins that increase at a similar rate from proteins that decrease at a similar rate.
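Step 64 amounts to a range filter followed by labeling each hit as an increase or a decrease. The sketch below assumes that a ratio of 1.0 means no change, and that the range bounds are inclusive. The function name `search_by_ratio`, the color mapping, and the sample ratios are hypothetical, since the source does not say which colors are used.

```python
def search_by_ratio(ratios, low, high):
    """Return (spot, ratio, direction) rows whose ratio lies in [low, high]."""
    rows = []
    for spot, ratio in sorted(ratios.items()):
        if low <= ratio <= high:
            if ratio > 1.0:
                direction = "increase"
            elif ratio < 1.0:
                direction = "decrease"
            else:
                direction = "unchanged"
            rows.append((spot, ratio, direction))
    return rows


# Hypothetical display colors keyed by direction of change.
COLORS = {"increase": "red", "decrease": "green", "unchanged": "gray"}

# Hypothetical precomputed change ratios (spot number -> ratio).
ratios = {"3207": 2.0, "8118": 0.5, "4411": 1.0}

up_hits = search_by_ratio(ratios, 1.5, 3.0)
down_hits = search_by_ratio(ratios, 0.0, 0.9)
up_colors = [COLORS[direction] for _, _, direction in up_hits]
```

A result table could then render each row in the color looked up from `COLORS`, which is one way to make increases and decreases distinguishable at a glance.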

In addition to this function, the ratio change search method according to the present invention links the extracted results to the proteome database (20) (step 65) and determines whether detailed information on the extracted results is to be retrieved from the proteome database (step 66). It then retrieves from the proteome database (20) the detailed information on each protein displayed as an analysis result, as in FIG. 5 (step 67).

The user can therefore analyze protein expression patterns by experimental condition by calculating the change ratios of the protein intensity information and, if necessary, check the detailed information on each retrieved protein. The detailed information is retrieved when the user clicks the desired protein in the search results. It is supplied by the provider of the proteome database (20), which offers a large amount of verified protein-related information that the user can put to use in analyzing the protein.

FIG. 7 is a flowchart showing a method of searching for proteins with a similar expression profile according to a preferred embodiment of the present invention. Referring to FIG. 7, the similar expression pattern search unit (70) first selects the protein to be searched (step 71) and determines whether an expression profile for that protein has been input (step 72). If the protein expression profile to be searched has been input, protein profile intensity information is extracted from the proteome database (20) (step 73), and the Euclidean distance between the intensity information of the profile input by the user and each profile extracted from the proteome database (20) is calculated (step 74). The proteins are then sorted in ascending order of the calculated Euclidean distance (step 75), and the sorted result is displayed as the similar protein search result (step 76). Euclidean distance is the shortest distance between two points in N-dimensional space; the smaller the calculated Euclidean distance, the more similar the expression patterns of the two proteins.
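Steps 73 to 75 can be sketched as a distance computation followed by an ascending sort. In this sketch each profile is assumed to be a list of intensities, one value per experimental condition, so that N is the number of conditions. The function names and the sample profiles are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length intensity profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same conditions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def find_similar(query_id, profiles):
    """Rank all other spots by distance to the query, closest first."""
    query = profiles[query_id]
    ranked = [
        (euclidean(query, profile), spot)
        for spot, profile in profiles.items()
        if spot != query_id
    ]
    ranked.sort()  # ascending: smallest distance means most similar
    return [(spot, distance) for distance, spot in ranked]


# Hypothetical profiles: spot number -> intensity under three conditions.
profiles = {
    "3207": [1.0, 2.0, 3.0],
    "8118": [1.0, 2.0, 4.0],
    "4411": [4.0, 6.0, 3.0],
}

result = find_similar("3207", profiles)
```

Here spot 8118 ranks first because its profile differs from the query in only one condition.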

Next, the similar protein search method according to the present invention links the similar protein results to the proteome database (20) (step 77) and determines whether a proteome database search is to be performed for the similar protein search results (step 78). If so, the detailed information on each similar protein found is retrieved from the proteome database (20) (step 79).

The user can therefore search for proteins with a similar expression profile using the intensity information stored in the proteome database (20) and, if necessary, check the detailed information on each protein found.

FIG. 8 is a flowchart showing a hierarchical clustering method for protein expression profiles according to a preferred embodiment of the present invention. Referring to FIG. 8, the method first receives from the user the experimental conditions for which hierarchical clustering is desired (step 81) and extracts the protein profile intensity information corresponding to those conditions (step 82). It then calculates the Euclidean distance for every combination of the extracted protein profile intensity information (step 83), uses the calculated distances to perform hierarchical clustering that groups together proteins with similar expression profiles (step 84), and displays the result (step 85).
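Step 83, the distance for every combination of extracted profiles, can be sketched with `itertools.combinations`. The dictionary layout, the function name `pairwise_distances`, and the sample profiles are assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(profiles):
    """Map each unordered pair of spots to the Euclidean distance between them."""
    distances = {}
    for a, b in combinations(sorted(profiles), 2):
        distances[(a, b)] = math.sqrt(
            sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
        )
    return distances


# Hypothetical profiles: spot number -> intensity under three conditions.
profiles = {
    "3207": [1.0, 2.0, 3.0],
    "8118": [1.0, 2.0, 4.0],
    "4411": [4.0, 6.0, 3.0],
}

distances = pairwise_distances(profiles)
closest_pair = min(distances, key=distances.get)
```

For n proteins this produces n(n-1)/2 distances, and the pair with the smallest distance is the first candidate for merging in step 84.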

The clustered result is then linked to the proteome database (20) (step 86), and it is determined whether a proteome database search is to be performed for the individual clustered proteins (step 87). If so, the detailed information on each individual protein is retrieved from the proteome database (20) (step 88).

The user can therefore cluster protein expression profiles hierarchically by similarity using the intensity information stored in the proteome database (20) and, if necessary, check the detailed information on each clustered protein.

FIG. 9 shows the result of hierarchical clustering of protein expression profiles by the method shown in FIG. 8. Hierarchical clustering is a statistical analysis technique that divides multivariate data into groups according to the similarity of their characteristics, classifying individuals or variables so that similar ones gather in the same group according to a predetermined criterion. Clustering methods include the following. The single linkage method groups the two closest individuals and then adds to the group the third individual that is closest to either of the two. The complete linkage method groups the closest individuals, takes the maximum distance between this group and each other individual as the entry in the distance matrix, and clusters the individuals for which this maximum distance is smallest. The average linkage method takes the average distance over all combinations of the included individuals as the distance between groups and clusters accordingly. The present invention performs hierarchical clustering using the average linkage method.
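The three linkage rules differ only in how the distance between two groups is derived from the distances between their members. The sketch below uses the common textbook definitions: minimum for single, maximum for complete, and the mean over all cross-group pairs for average. It then runs a simple agglomeration loop. It is an illustration under those assumptions, not the patent's actual implementation, and the function names and sample profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length intensity profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def linkage_distance(group_a, group_b, profiles, method="average"):
    """Distance between two groups of spots under the chosen linkage rule."""
    dists = [euclidean(profiles[a], profiles[b]) for a in group_a for b in group_b]
    if method == "single":
        return min(dists)
    if method == "complete":
        return max(dists)
    if method == "average":
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(profiles, method="average"):
    """Repeatedly merge the two closest groups; return the merge history."""
    clusters = [(spot,) for spot in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current groups with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(
                clusters[ij[0]], clusters[ij[1]], profiles, method
            ),
        )
        height = linkage_distance(clusters[i], clusters[j], profiles, method)
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical profiles: spot number -> intensity under three conditions.
profiles = {
    "3207": [1.0, 2.0, 3.0],
    "8118": [1.0, 2.0, 4.0],
    "4411": [4.0, 6.0, 3.0],
}

merges = agglomerate(profiles)
```

Each entry in `merges` records the two groups joined and the distance at which they were joined. This is the information a tree view needs in order to draw branch lengths.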

Referring to FIG. 9, the hierarchical clustering result is visualized as an image so that the user can readily understand the relationships among protein expression profiles. The distance between the expression patterns of two adjacent proteins is represented in a tree view, and the degree of similarity between the two expression patterns is expressed by the x-axis length of the tree structure. The similarity of the expression patterns of two proteins can therefore be checked by comparing x-axis lengths. For example, the x-axis length between protein 3207 and protein 8118 is the shortest, so the expression patterns of these two proteins are the most similar. Hierarchical clustering thus shows at a glance the hierarchical similarity of expression patterns across all proteins stored in the database, and it also allows the degree of similarity between any two proteins to be compared and analyzed through their x-axis lengths. When protein information in the proteome database is added or changed, clustering is performed again in real time on the entire updated set of proteins. As a result, the user can obtain on the web a clustering result that changes in real time as the information in the database changes.
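The x-axis length in such a tree view corresponds to the distance at which two branches are joined. The following is a minimal sketch of average-linkage agglomerative clustering in Python that records this joining distance for each merge. It is an illustration under stated assumptions, not the implementation used in the invention: the protein identifiers and intensity values are invented, and the naive search over all cluster pairs is chosen for readability rather than speed.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage_clustering(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a protein ID to its intensity vector.
    Returns the merges in order as (members_a, members_b, height),
    where height is the average distance at which the two groups join.
    """
    clusters = [(pid,) for pid in profiles]  # every protein starts alone
    merges = []

    def avg_distance(a, b):
        pairs = list(product(a, b))
        return sum(dist(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda ab: avg_distance(*ab))
        merges.append((a, b, avg_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented intensity profiles over three experimental conditions.
profiles = {
    "A": (1.0, 2.0, 3.0),
    "B": (1.0, 2.0, 4.0),
    "C": (5.0, 5.0, 5.0),
    "D": (9.0, 9.0, 9.0),
}

merges = average_linkage_clustering(profiles)
for members_a, members_b, height in merges:
    print(members_a, members_b, round(height, 3))
```

In this sample, A and B are joined first because their profiles are closest, which is the situation the text describes for the pair of proteins with the shortest x-axis length. Rerunning the function after the `profiles` dictionary changes yields an updated tree.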

Through the analysis of changes in protein expression quantitative information by experimental condition, the search for similar expression patterns, and the hierarchical clustering described above, protein expression profiles can be analyzed from multiple angles, and proteins having similar expression profiles can be retrieved from the proteome database 20 on the basis of the analysis results.

The foregoing has specifically illustrated, as an embodiment of the present invention, an apparatus and method for analyzing protein expression profiles in a web environment using the Swiss-Prot database as the reference proteome database. Various other proteome databases can also be applied to the present invention, and the present invention can also be applied in a local environment rather than a web environment.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner.

As described above, with the apparatus and method for analyzing protein expression profiles according to the present invention, changes in protein expression by experimental condition are analyzed using the protein quantitative information in the database, similar expression patterns of protein expression profiles are searched, and proteins having similar expression patterns are hierarchically clustered from an overall perspective, so that comprehensive information on protein expression profiles can be obtained.

Claims

A protein expression profile analysis apparatus comprising:

a database storing a large amount of verified proteome data; and

a protein expression profile analysis unit that analyzes, by experimental condition, changes in the expression ratio of the proteins stored in the database, searches for proteins having an expression profile similar to that of a designated specific protein, and hierarchically clusters proteins having similar expression patterns.

The apparatus of claim 1, further comprising an interface that is connected to a user through a communication network, receives the experimental conditions and the range of protein expression ratios to be analyzed for the specific protein, passes the experimental conditions and the range of expression ratios to the protein expression profile analysis unit, and transmits the analysis result of the protein expression profile analysis unit to the server through the communication network.

The apparatus of claim 2, wherein the protein profile quantitative information is the intensity of a protein present on a gel image obtained by two-dimensional electrophoresis.

The apparatus of claim 2, wherein the protein expression profile analysis unit comprises:

a conditional ratio change analysis unit that extracts profile quantitative information for at least two experimental conditions from the database, calculates from the extracted quantitative information the change in the expression ratio of each protein by experimental condition, and extracts from the database the proteins whose calculated expression ratio falls within the range of expression ratios;

a similar expression pattern search unit that calculates the Euclidean distance between the profile quantitative information of the specific protein and the profile quantitative information extracted from the database, and searches for proteins having an expression pattern similar to that of the specific protein on the basis of the calculation result; and

a clustering unit that extracts from the database the protein profile quantitative information satisfying the experimental conditions and hierarchically clusters proteins having similar expression patterns on the basis of the extracted profile quantitative information.

The apparatus of claim 4, wherein the expression ratio for each experimental condition is [formula not reproduced in the source text].

The apparatus of claim 4, wherein the ratio change analysis unit displays increases and decreases in the rate of change of the protein profile quantitative information in different colors.

The apparatus of claim 4, wherein the similar expression pattern search unit sorts the proteins in ascending order of Euclidean distance.

The apparatus of claim 4, wherein the clustering unit calculates the Euclidean distance for every combination of the extracted profile quantitative information and performs hierarchical clustering that groups together the proteins with the smallest calculated Euclidean distances.

The apparatus of claim 4, wherein the clustering unit performs hierarchical clustering by the average linkage method.

The apparatus of claim 4, wherein the clustering unit outputs, as the clustering result, a tree-view image in which the degree of similarity between the expression patterns of two proteins is expressed as an x-axis length.

A method for analyzing changes in protein expression by experimental condition, comprising:

(a) receiving at least two experimental conditions and a range of protein expression ratios;

(b) extracting quantitative information of the proteins corresponding to the experimental conditions from a database storing a large amount of verified proteome-related data;

(c) calculating, from the extracted quantitative information, the rate of change for each experimental condition; and

(d) searching the calculated rates of change for results that fall within the input range.
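Steps (a) to (d) can be sketched in Python as follows. The source text does not reproduce the formula for the expression ratio, so this sketch assumes the simplest reading, namely the intensity under the second condition divided by the intensity under the first. That definition, the function name `find_by_ratio`, the in-memory dictionary standing in for the database, and all sample values are assumptions made for illustration.

```python
def find_by_ratio(intensities, cond_a, cond_b, low, high):
    """Return {protein_id: ratio} for proteins whose cond_b / cond_a
    intensity ratio lies within [low, high].

    ASSUMPTION: ratio = intensity(cond_b) / intensity(cond_a).
    The formula in the claims is not available in the source text.
    """
    hits = {}
    for pid, by_condition in intensities.items():
        base = by_condition.get(cond_a)
        other = by_condition.get(cond_b)
        if not base or other is None:
            continue  # skip proteins with a missing value or a zero baseline
        ratio = other / base
        if low <= ratio <= high:
            hits[pid] = ratio
    return hits


# Invented spot intensities for two experimental conditions.
intensities = {
    "P1": {"control": 100.0, "treated": 250.0},
    "P2": {"control": 80.0, "treated": 88.0},
    "P3": {"control": 50.0, "treated": 10.0},
}

increased = find_by_ratio(intensities, "control", "treated", 2.0, 10.0)
decreased = find_by_ratio(intensities, "control", "treated", 0.0, 0.5)
print(increased)
print(decreased)
```

Running the search once with a range above 1 and once with a range below 1 separates increased from decreased proteins, which is the distinction a color-coded display of the results would draw.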

The method of claim 11, wherein the expression ratio for each experimental condition is [formula not reproduced in the source text].

The method of claim 11, wherein the search result displays increases and decreases in the rate of change in different colors.

The method of claim 11, further comprising:

(e) linking the search result with the database; and

(f) retrieving detailed information on each protein constituting the search result from the database.

A method for searching for similar expression patterns of a protein expression profile, comprising:

(a) selecting a protein for which similar expression patterns are to be searched;

(b) extracting quantitative information of a plurality of proteins from a database storing a large amount of verified proteome-related data;

(c) calculating the Euclidean distance between the profile quantitative information of the selected protein and the profile quantitative information extracted from the database;

(d) sorting the proteins in ascending order of the calculated Euclidean distance, smallest first; and

(e) displaying the sorted result as the similar expression pattern search result of the protein expression profile.
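The search steps above can be sketched in Python as follows. The sketch ranks the closest profiles first, which is the ordering the apparatus claim for the similar expression pattern search unit describes. The function name `similar_profiles`, the dictionary standing in for the database, and the sample intensities are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def similar_profiles(query_id, profiles, top_n=None):
    """Rank all other proteins by Euclidean distance to the query profile.

    profiles: dict mapping a protein ID to its intensity vector.
    Returns a list of (distance, protein_id), smallest distance first.
    """
    query = profiles[query_id]
    ranked = sorted(
        (dist(query, vector), pid)
        for pid, vector in profiles.items()
        if pid != query_id
    )
    return ranked[:top_n]  # top_n=None keeps the full ranking


# Invented intensity profiles over three experimental conditions.
profiles = {
    "query": (1.0, 2.0, 3.0),
    "near": (1.0, 2.0, 4.0),
    "far": (9.0, 9.0, 9.0),
}

for distance, pid in similar_profiles("query", profiles):
    print(pid, round(distance, 3))
```

A distance of zero would mean an identical profile across all compared conditions, so the first entries in the ranking are the candidates with the most similar expression pattern.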

The method of claim 15, further comprising:

(f) linking the search result with the database; and

(g) retrieving detailed information on each protein constituting the search result from the database.

A hierarchical clustering method for protein expression profiles, comprising:

(a) receiving an experimental condition for which clustering is to be performed;

(b) extracting quantitative information of the proteins corresponding to the experimental condition from a database storing a large amount of verified proteome-related data;

(c) calculating the Euclidean distance for every combination of the extracted quantitative information;

(d) performing hierarchical clustering that groups together the proteins with the smallest calculated Euclidean distances; and

(e) displaying the clustering result.

The method of claim 17, wherein step (d) is performed by the average linkage method.

The method of claim 17, wherein the clustering result is a tree-view image in which the degree of similarity between the expression patterns of two proteins is expressed as an x-axis length.

The method of claim 17, further comprising:

(f) linking the clustering result with the database; and

(g) retrieving detailed information on each protein constituting the clustering result from the database.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 11 to 20 on a computer.