JP3476310B2

JP3476310B2 - Protein database system and method for displaying protein names and functions

Info

Publication number: JP3476310B2
Application number: JP20357596A
Authority: JP
Inventors: 洋文土居; 正人北島; 勇渡部
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-08-01
Filing date: 1996-08-01
Publication date: 2003-12-10
Anticipated expiration: 2016-08-01
Also published as: JPH1045795A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] In the Human Genome Project and similar efforts, the DNA sequences of various organisms, including humans and pathogens, are being read at an increasing pace. As a result, amino acid sequence information for proteins of unknown function is accumulating rapidly and has reached an enormous volume. Attempts are being made to use these data to predict diseases and their onset and to identify disease-causing genes. Doing so, however, requires estimating the function and functional sites of proteins of unknown function. If such estimation becomes possible, identifying disease-causing genes will become easier, and the development of drugs that act on them is expected to advance.

[0002] The present invention relates to a protein database system, and to a method for displaying protein names and functions, that can be used to estimate the function and functional sites of a protein of unknown function, or the functional sites of a protein whose function is known but whose functional sites are unknown. It can be used widely in fields such as biochemistry, molecular biology, and drug development.

[0003]

[Prior Art] Conventionally, the function and functional sites of a protein of unknown function have been estimated by searching a database with an algorithm called a homology search, which retrieves proteins similar to the query protein. At present, however, if the protein of unknown function has no homology with the protein amino acid sequence data in the database, neither its function nor its functional sites can be estimated. Moreover, even when the function is known biologically, if there is no homology with the amino acid sequence data in the database, researchers have had to determine functional sites experimentally by trial and error, for example by making random amino acid substitutions.

[0004]

[Problem to Be Solved by the Invention] As described above, the function and functional sites of a protein have conventionally been estimated from homology with the protein amino acid sequence data in a database. When a protein of unknown function has no homology with those data, however, estimating its function and functional sites has been difficult. The present invention was made in view of these circumstances. Its object is to decompose the amino acid sequence of a protein whose function or functional sites are unknown into oligopeptides and to search a database with them, so that function and functional sites can be estimated even for proteins that a conventional homology search could not handle.

[0005]

[Means for Solving the Problem] FIG. 1 shows the principle of the present invention. In the figure, 1 is a means for decomposing an input amino acid sequence into oligopeptides of a certain length (for example, 4 to 7 amino acids). For example, the amino acid sequence MSKGEELF... is decomposed into oligopeptides of length 5 such as MSKGE, SKGEE, KGEEL, ... (M, S, K, and so on are the one-letter symbols for amino acids). 2 is a protein database that stores protein data such as protein ID, protein name, function, and amino acid sequence.
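The decomposition performed by means 1 can be sketched as a sliding window over the one-letter sequence. This is only an illustration under assumed names: the function `decompose` and its `length` parameter do not appear in the patent.

```python
def decompose(sequence: str, length: int = 5) -> list[str]:
    """Return every overlapping oligopeptide of `length`, N-terminus first.

    `decompose` is a hypothetical helper name; the patent only describes
    the behaviour (a fixed-length window shifted one residue at a time).
    """
    if length < 1:
        raise ValueError("length must be positive")
    # A sequence shorter than the window yields no oligopeptides.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


print(decompose("MSKGEELF"))
# ['MSKGE', 'SKGEE', 'KGEEL', 'GEELF']
```

A sequence of n residues yields n - length + 1 oligopeptides, so the eight-residue prefix above gives four of length 5.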

[0006] 3 is a means for extracting protein data; it extracts from the protein database 2 the data of proteins that contain the oligopeptides produced by the decomposing means 1. 4 is a frequency analysis means that performs frequency analysis of the oligopeptides. 5 is an output means that displays the extraction results of means 3, the frequency analysis results of means 4, and so on.

[0007] Analyzing the amino acid sequence of a protein of unknown function with the protein database system described above makes it possible to estimate function and functional sites even for proteins that a conventional homology search could not handle. Oligopeptides that contain a protein's functional site are also used frequently in many other proteins in the database. Performing frequency analysis of the oligopeptides and extracting the functions of the proteins in which frequently occurring oligopeptides appear therefore makes it possible to estimate the protein's functional sites. Conversely, an oligopeptide whose frequency of occurrence in the database is extremely low or zero can be presumed to be a functional site unique to the protein of unknown function.
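A minimal sketch of the frequency analysis follows. It assumes that "frequency of occurrence" means the number of database proteins whose sequence contains the oligopeptide; the text does not define the count precisely, so this is one possible reading. The toy database, its IDs, and the function names are invented for illustration.

```python
def frequency_analysis(query: str, sequences: dict[str, str],
                       length: int = 5) -> dict[str, int]:
    """Map each oligopeptide of `query` to the number of proteins containing it.

    Assumption: frequency = number of database sequences in which the
    oligopeptide occurs at least once.
    """
    oligos = [query[i:i + length] for i in range(len(query) - length + 1)]
    return {o: sum(o in seq for seq in sequences.values()) for o in oligos}


# Made-up toy database; IDs and sequences are illustrative only.
toy_db = {
    "P1": "AAMSKGEQQ",
    "P2": "MSKGEELW",
    "P3": "TTSKGEEV",
}
print(frequency_analysis("MSKGEELF", toy_db))
# {'MSKGE': 2, 'SKGEE': 2, 'KGEEL': 1, 'GEELF': 0}
```

In this toy run GEELF has frequency zero, which is the kind of oligopeptide the paragraph above treats as a candidate for a functional site unique to the query protein.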

[0008] As described above, the inventions of claims 1 to 4 solve the above problem as follows.

(1) A protein database system is composed of: a means that takes the amino acid sequence of a protein as input and decomposes it into oligopeptides of a certain length; a means that accesses a protein database storing protein data such as protein name, function, and amino acid sequence and, for each oligopeptide, extracts the data of the proteins containing that oligopeptide; a means that accesses the protein database, performs frequency analysis of the oligopeptides, and finds the oligopeptides that occur with high frequency in the protein database; and a means that displays the high-frequency oligopeptides together with at least the name and function from the data of the proteins containing them.

[0009] (2) A protein database system is composed of: a means that takes the amino acid sequence of a protein as input and decomposes it into oligopeptides of a certain length; a means that accesses a protein database storing protein data such as protein name, function, and amino acid sequence and, for each oligopeptide, extracts the data of the proteins containing that oligopeptide; a means that accesses the protein database, performs frequency analysis of the oligopeptides, and finds the oligopeptides whose frequency of occurrence in the protein database is extremely low or zero; and a means that displays the low-frequency oligopeptides together with the names and functions of the proteins containing them, and also displays the oligopeptides whose frequency of occurrence is zero.

[0010] (3) In a protein database system comprising a protein database storing protein data such as protein name, function, and amino acid sequence, a data input means, an input data processing means that processes the input data, a protein database search means, a search data processing means that processes the data retrieved from the protein database, and a data output means that displays the processed data, the amino acid sequence of a protein is taken as input and decomposed into oligopeptides of a certain length, the protein database is accessed and frequency analysis of the oligopeptides is performed, and at least the name and function are displayed from the data of the proteins containing the high-frequency oligopeptides.

(4) In a protein database system comprising a protein database, a data input means, an input data processing means that processes the input data, a protein database search means, a search data processing means that processes the data retrieved from the protein database, and a data output means that displays the processed data, the amino acid sequence of a protein is taken as input and decomposed into oligopeptides of a certain length, the protein database is accessed and frequency analysis of the oligopeptides is performed, at least the name and function are displayed from the data of the proteins containing oligopeptides of extremely low frequency, and the oligopeptides whose frequency of occurrence is zero are also displayed.

[0011]

[Embodiments of the Invention] FIG. 2 shows the configuration of a system according to an embodiment of the present invention. In the figure, 11 is an input device for entering an amino acid sequence and 12 is a processing unit. Within the processing unit 12, 12a is an oligopeptide decomposing means that decomposes the input amino acid sequence into oligopeptides (chains of amino acids) of a certain length, and 12b is a search means that accesses the protein database 13 and retrieves data such as the name, function, and amino acid sequence of the proteins containing the oligopeptides produced by the oligopeptide decomposing means 12a.

[0012] 12c is a frequency analysis means that determines the frequency of occurrence of each oligopeptide from the search results of the search means 12b. 12d is a function extraction means that accesses the protein database according to the results of the frequency analysis means and extracts the function and other data of the proteins containing the oligopeptides. 13 is the protein database mentioned above; it stores protein data such as protein ID, protein name, function, amino acid sequence, and references. This embodiment uses SWISS-PROT version 33 as the protein database 13, but another database may be used. 14 is an output device consisting of a CRT, liquid crystal display, printer, or the like, which outputs the results of the frequency analysis means 12c and the function extraction means 12d.
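The record layout of the protein database 13 and the lookup done by the search means 12b might be modelled as below. The class `ProteinRecord`, its field names, and the function `search` are hypothetical; the patent lists only the kinds of data stored (ID, name, function, amino acid sequence, references). The sample entries are invented and are not real SWISS-PROT records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    """One database entry; field names are assumptions for illustration."""
    protein_id: str
    name: str
    function: str
    sequence: str
    references: tuple[str, ...] = ()


def search(oligo: str, database: list[ProteinRecord]) -> list[ProteinRecord]:
    """Return the records whose sequence contains the oligopeptide (cf. 12b)."""
    return [rec for rec in database if oligo in rec.sequence]


# Invented entries used only to exercise the lookup.
database = [
    ProteinRecord("ID_A", "example protein A", "function A", "AAMSKGEQQ"),
    ProteinRecord("ID_B", "example protein B", "function B", "TTSKGEEV"),
]
print([rec.protein_id for rec in search("MSKGE", database)])
# ['ID_A']
```

A linear scan like this is enough for a sketch; a real system would more likely index the database by oligopeptide so that each lookup does not reread every sequence.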

[0013] An example of analyzing protein function and functional sites with the above system is described below. In the following description, amino acids are written with the one-letter symbols A, R, N, ... shown in FIG. 3. Because there are 20 kinds of amino acids, as shown in FIG. 3, an oligopeptide length of about 4 to 7 is desirable when search efficiency, coverage, and similar factors are considered; the examples below use a length of 5.
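The trade-off behind the 4 to 7 range can be made concrete by counting the distinct oligopeptides that 20 amino acids allow at each length. The function name below is hypothetical, and the interpretation in the closing note is a reading of "search efficiency, coverage", not something the patent spells out.

```python
AMINO_ACID_KINDS = 20  # the 20 one-letter symbols referred to in FIG. 3


def possible_oligopeptides(length: int) -> int:
    """Number of distinct oligopeptides of the given length (20 ** length)."""
    return AMINO_ACID_KINDS ** length


for k in range(4, 8):
    print(k, possible_oligopeptides(k))
# 4 160000
# 5 3200000
# 6 64000000
# 7 1280000000
```

One plausible reading is that very short windows match too many proteins to be informative, while long windows have so many possible variants that few of them recur in any database of finite size.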

[0014] (1) As shown in FIG. 4, the amino acid sequence of a protein of unknown function is taken as input. The oligopeptide decomposing means 12a moves a unit of fixed length (hereinafter, the window) along the input sequence from the N-terminus (NH₂ side) to the C-terminus (COOH side), shifting by one amino acid at a time, and cuts out an oligopeptide of the window length at each position. FIG. 5 shows an example in which the 30 amino acids from the N-terminus of Green Fluorescent Protein, which emits green fluorescence, are decomposed into oligopeptides of length 5.

[0015] (2) As shown in FIG. 6, the search means 12b searches the protein database 13 for each oligopeptide, the frequency analysis means 12c counts the frequency of occurrence of each oligopeptide, and the frequencies are displayed as a graph on the output device 14. FIG. 7 shows the result of decomposing the N-terminal 30 amino acids of the Green Fluorescent Protein into oligopeptides of length 5, searching for them in the protein database SWISS-PROT version 33, and graphing their frequencies of occurrence. The vertical axis shows the oligopeptides and the horizontal axis shows the frequency of occurrence.
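A graph with oligopeptides on the vertical axis and frequency on the horizontal axis can be approximated with a plain-text bar chart. This sketch does not reproduce FIG. 7: the function name is hypothetical, and the counts are placeholders apart from the value 6 for MSKGE, which the description gives elsewhere.

```python
def render_bar_chart(frequencies: dict[str, int], width: int = 40) -> str:
    """Render one row per oligopeptide, with bar length scaled to the largest count."""
    peak = max(frequencies.values(), default=0)
    lines = []
    for oligo, count in frequencies.items():
        bar_len = round(width * count / peak) if peak else 0
        lines.append(f"{oligo} | {'#' * bar_len} {count}")
    return "\n".join(lines)


# Placeholder counts, except MSKGE (six proteins per the description).
print(render_bar_chart({"MSKGE": 6, "SKGEE": 3, "KGEEL": 0}, width=12))
# MSKGE | ############ 6
# SKGEE | ###### 3
# KGEEL |  0
```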

[0016] (3) As shown in FIG. 8, for each oligopeptide, the search means 12b searches the protein database 13 for the proteins containing that oligopeptide, and the data of those proteins are displayed on the output device 14. FIG. 9 shows the proteins in SWISS-PROT version 33 that contain MSKGE, the length-5 oligopeptide at the N-terminal end of the 30-amino-acid sequence. In the figure, FKB2 BOVIN and the like are protein ID names in SWISS-PROT version 33.

[0017] (4) As shown in FIG. 10, for each oligopeptide, the search means 12b searches the protein database 13 for the proteins containing it, the function extraction means 12d extracts their names and functions, and the results are displayed on the output device 14. FIG. 11 shows the functions of the proteins in SWISS-PROT version 33 that contain MSKGE, the length-5 oligopeptide at the N-terminal end of the 30-amino-acid sequence (because a protein's name expresses its function, the name is displayed). Six proteins in SWISS-PROT version 33 contain the oligopeptide MSKGE; the figure shows the two protein functions that occur most frequently among them.
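Selecting the most frequent functions among the proteins that share an oligopeptide could look like the sketch below. The six hits are placeholders ("function A" and so on), because the text does not list the actual names and functions found for MSKGE; `top_functions` is a hypothetical helper.

```python
from collections import Counter


def top_functions(hits: list[tuple[str, str]], n: int = 2) -> list[tuple[str, int]]:
    """Tally the function of each (protein_id, function) hit and keep the n most common."""
    return Counter(function for _, function in hits).most_common(n)


# Six placeholder hits standing in for the six proteins that contain MSKGE.
hits = [
    ("ID1", "function A"), ("ID2", "function A"), ("ID3", "function B"),
    ("ID4", "function A"), ("ID5", "function B"), ("ID6", "function C"),
]
print(top_functions(hits))
# [('function A', 3), ('function B', 2)]
```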

[0018] (5) As shown in FIG. 12, the search means 12b searches the protein database 13 for each oligopeptide, the frequency analysis means 12c counts the occurrences, and the oligopeptides are sorted in descending order of frequency. The search means 12b then retrieves, in that order, the proteins containing each oligopeptide, the function extraction means 12d extracts their functions, and the results are output to the output device 14 with the most frequent oligopeptide first. FIGS. 13 to 17 show the result of performing frequency analysis in SWISS-PROT version 33 on the oligopeptides obtained with a window of length 5 from the 30-amino-acid sequence, sorting them in descending order of frequency, and displaying, for the top four oligopeptides, the IDs and functions of the proteins containing them (because a protein's name expresses its function, the name is displayed).
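Step (5) amounts to sorting the oligopeptides by how many proteins contain them and keeping the top few. In this sketch the input maps each oligopeptide to the IDs of the proteins that contain it; the mapping, the IDs, and the function name are invented for illustration.

```python
def top_oligopeptides(hits_by_oligo: dict[str, list[str]],
                      top: int = 4) -> list[tuple[str, list[str]]]:
    """Return the `top` oligopeptides with the most matching proteins, most frequent first."""
    ranked = sorted(hits_by_oligo.items(),
                    key=lambda item: len(item[1]), reverse=True)
    return ranked[:top]


# Invented hit lists; real ones would come from the database search.
hits_by_oligo = {
    "MSKGE": ["ID1", "ID2"],
    "SKGEE": ["ID3"],
    "KGEEL": ["ID1", "ID4", "ID5"],
    "GEELF": [],
}
for oligo, ids in top_oligopeptides(hits_by_oligo, top=2):
    print(oligo, len(ids), ids)
# KGEEL 3 ['ID1', 'ID4', 'ID5']
# MSKGE 2 ['ID1', 'ID2']
```

The default of four mirrors the number of oligopeptides shown in FIGS. 13 to 17. Python's sort is stable, so oligopeptides with equal counts keep their N-terminus to C-terminus order.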

[0019] (6) As shown in FIG. 18, the search means 12b searches the protein database 13 for each oligopeptide, the frequency analysis means 12c counts the occurrences, and the oligopeptides are sorted in ascending order of frequency. The search means 12b then retrieves, in that order, the proteins containing each oligopeptide, the function extraction means 12d extracts their functions, and the results are output to the output device 14 with the least frequent oligopeptide first. FIG. 19 shows the result of performing frequency analysis in SWISS-PROT version 33 on the oligopeptides obtained with a window of length 5 from the 30-amino-acid sequence, sorting them in ascending order of frequency, and displaying, for the two lowest-frequency oligopeptides, the IDs and functions of the proteins containing them (because a protein's name expresses its function, the name is displayed), together with the oligopeptides whose frequency of occurrence is zero. The description above covers estimating the function and functional sites of a protein of unknown function, but the invention can of course also be used to estimate the functional sites of a protein whose function is known but whose functional sites are unknown.
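For step (6), the zero-frequency oligopeptides have no proteins to list, so they are naturally reported separately from the lowest non-zero ones. The sketch below assumes that split; the function name and all counts are made up.

```python
def low_and_zero(frequencies: dict[str, int],
                 bottom: int = 2) -> tuple[list[tuple[str, int]], list[str]]:
    """Split oligopeptides into the `bottom` lowest non-zero counts and the zero-count ones."""
    zero = [oligo for oligo, count in frequencies.items() if count == 0]
    nonzero = sorted(((o, c) for o, c in frequencies.items() if c > 0),
                     key=lambda pair: pair[1])
    return nonzero[:bottom], zero


# Made-up counts for illustration.
low, zero = low_and_zero({"MSKGE": 6, "SKGEE": 1, "KGEEL": 0, "GEELF": 2})
print(low)   # [('SKGEE', 1), ('GEELF', 2)]
print(zero)  # ['KGEEL']
```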

[0020]

[Effects of the Invention] As described above, the present invention decomposes the amino acid sequence of a protein into oligopeptides of a certain length, accesses a protein database, and performs frequency analysis of the oligopeptides. It then displays the frequencies as a graph, displays the names and functions of the proteins containing high-frequency oligopeptides, or displays the names and functions of the proteins containing oligopeptides of extremely low frequency together with the oligopeptides whose frequency is zero. A high-frequency oligopeptide can therefore be presumed to be a functional site of the protein of unknown function, and an oligopeptide of extremely low or zero frequency can be presumed to be a functional site unique to that protein.

[0021] Consequently, even when there is no homology with the proteins in the protein database, it is possible to estimate the function and functional sites of a protein of unknown function, or the functional sites of a protein whose function is known but whose functional sites are unknown. Experiments that researchers previously carried out by trial and error, for example by making random amino acid substitutions, can be performed considerably more efficiently. The method is therefore expected to be widely adopted in fields such as biochemistry, molecular biology, and drug development.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is a diagram showing the system configuration of an embodiment of the present invention.

FIG. 3 is a diagram showing the correspondence between one-letter symbols and amino acids.

FIG. 4 is a diagram explaining the process of decomposing an amino acid sequence into oligopeptides.

FIG. 5 is a diagram showing an example in which a 30-amino-acid sequence is decomposed into oligopeptides.

FIG. 6 is a diagram explaining the process of displaying frequencies of occurrence as a graph.

FIG. 7 is a diagram showing an example graph of the frequencies of occurrence for the 30-amino-acid sequence.

FIG. 8 is a diagram explaining the process of displaying protein data.

FIG. 9 is a diagram showing the displayed data of the proteins containing the oligopeptide MSKGE.

FIG. 10 is a diagram explaining the process of displaying proteins with their names and functions.

FIG. 11 is a diagram showing the displayed functions of the proteins containing the oligopeptide MSKGE.

FIG. 12 is a diagram explaining the process of displaying the functions of the proteins containing high-frequency oligopeptides.

FIG. 13 is a diagram (part 1) showing the displayed IDs and functions of the proteins containing high-frequency oligopeptides.

FIG. 14 is a diagram (part 2) showing the displayed IDs and functions of the proteins containing high-frequency oligopeptides.

FIG. 15 is a diagram (part 3) showing the displayed IDs and functions of the proteins containing high-frequency oligopeptides.

FIG. 16 is a diagram (part 4) showing the displayed IDs and functions of the proteins containing high-frequency oligopeptides.

FIG. 17 is a diagram (part 5) showing the displayed IDs and functions of the proteins containing high-frequency oligopeptides.

FIG. 18 is a diagram explaining the process of displaying the functions of the proteins containing low-frequency oligopeptides.

FIG. 19 is a diagram showing the displayed IDs and functions of the proteins containing low-frequency oligopeptides, together with the oligopeptides whose frequency of occurrence is zero.

[Explanation of symbols]

1: Means for decomposing an amino acid sequence into oligopeptides
2: Protein database
3: Means for extracting protein data
4: Frequency analysis means
5: Output means
11: Input device
12: Processing unit
12a: Oligopeptide decomposing means
12b: Search means
12c: Frequency analysis means
12d: Function extraction means
13: Protein database
14: Output device

Continuation of front page

(72) Inventor: 渡部 勇, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-4-75582 (JP, A); JP-A-7-105224 (JP, A); JP-A-8-110910 (JP, A); STN International (ed.), CAS ONLINE Pocket Guide, Japan Association for International Chemical Information, Japan, April 1996, p. 30; Protein Engineering, 1993, vol. 6, no. 4, pp. 391-395

(58) Fields searched (Int. Cl.7, database names): C12N 15/00 - 15/90; JICST file (JOIS); PubMed

Claims

(57) [Claims]

1. A protein database system comprising: means for taking the amino acid sequence of a protein as input and decomposing it into oligopeptides of a certain length; means for accessing a protein database in which protein data such as protein name, function, and amino acid sequence are stored and, for each of the oligopeptides, extracting the data of the proteins having that oligopeptide; means for accessing the protein database, performing frequency analysis of the oligopeptides, and obtaining the oligopeptides that occur with high frequency in the protein database; and means for displaying the high-frequency oligopeptides and at least the name and function from the data of the proteins having those oligopeptides.

2. A protein database system comprising: means for taking the amino acid sequence of a protein as input and decomposing it into oligopeptides of a certain length; means for accessing a protein database in which protein data such as protein name, function, and amino acid sequence are stored and, for each of the oligopeptides, extracting the data of the proteins having that oligopeptide; means for accessing the protein database, performing frequency analysis of the oligopeptides, and obtaining the oligopeptides whose frequency of occurrence in the protein database is extremely low or zero; and means for displaying the low-frequency oligopeptides and at least the name and function from the data of the proteins having those oligopeptides, and for displaying the oligopeptides whose frequency of occurrence is zero.

3. A method for displaying protein names and functions in a protein database system comprising a protein database storing protein data such as protein name, function, and amino acid sequence, data input means, input data processing means for processing input data, protein database search means, search data processing means for processing data retrieved from the protein database, and data output means for displaying the processed data, the method comprising: taking the amino acid sequence of a protein as input and decomposing it into oligopeptides of a certain length; accessing the protein database and performing frequency analysis of the oligopeptides; and displaying at least the name and function from the data of the proteins having oligopeptides that occur with high frequency in the database, for use in estimating the functional site of a protein of unknown function.

4. A method for displaying protein names and functions in a protein database system comprising a protein database storing protein data such as protein name, function, and amino acid sequence, data input means, input data processing means for processing input data, protein database search means, search data processing means for processing data retrieved from the protein database, and data output means for displaying the processed data, the method comprising: taking the amino acid sequence of a protein as input and decomposing it into oligopeptides of a certain length; accessing the protein database and performing frequency analysis of the oligopeptides; displaying at least the name and function from the data of the proteins having oligopeptides of extremely low frequency; and displaying the oligopeptides whose frequency of occurrence is zero.