JP2000222421A

JP2000222421A - Protein analysis aiding device and storage medium storing program for allowing computer to execute processing in the device

Info

Publication number: JP2000222421A
Application number: JP11022997A
Authority: JP
Inventors: Masato Kitajima; 正人北島; Michihiro Oya; 倫宏大屋
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1999-01-29
Filing date: 1999-01-29
Publication date: 2000-08-11
Anticipated expiration: 2019-01-29
Also published as: JP3370942B2

Abstract

PROBLEM TO BE SOLVED: To provide a protein analysis support device that can extract the amino acid residue portions of a protein that interact with a low-molecular-weight compound, in a form that retains more information about the structure of the protein. SOLUTION: Using information stored in the Protein Data Bank (PDB), the device searches for amino acid residues of a specified protein that contain an atom of a specified type lying within a prescribed distance range of any atom of a specified type in a specified compound. It then generates a sequence pattern by arranging the retrieved residues in the order of the primary sequence of the amino acid residues constituting the specified protein. This pattern is obtained as information expressing the primary sequence structure of the part of the protein that includes the residues interacting with the specified compound.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a protein analysis support apparatus that obtains information useful for analyzing the properties of a protein. More particularly, it relates to an apparatus that works from the sequence of amino acid residues describing the primary sequence of a protein and extracts the sequence pattern of the residues forming the portion expected to interact with a low-molecular-weight compound.

[0002] The present invention also relates to a storage medium storing a program for causing a computer to perform the processing of the protein analysis support apparatus described above.

[0003]

2. Description of the Related Art

Predicting which proteins interact with a low-molecular-weight compound is important in pharmaceutical and biochemical research. For example, it makes it possible to predict the protein (enzyme) needed to transform a low-molecular-weight compound (substrate). Databases are available that describe the three-dimensional structures of proteins together with the low-molecular-weight compounds entangled within their intricately folded, thread-like chains, for example the PDB (Protein Data Bank).

[0004] Conventionally, an apparatus has been proposed that uses such a database to search for the amino acid residue sequence of a protein that interacts (binds) with a low-molecular-weight compound (binding substance) (JP-A-7-056931). The user specifies a protein and a low-molecular-weight compound. The apparatus then reads the three-dimensional structural data of both from the database. Finally, it extracts the amino acid residue sequence of the protein that lies within a predetermined distance of each element of the compound as the binding site for that compound.

[0005]

Problems to be Solved by the Invention

For example, as shown in FIG. 1, protein P has a thread-like, intricately folded structure. The low-molecular-weight compound C is entangled in this structure. The amino acid residue portions C1 and C2 of protein P that lie close to compound C (within the predetermined distance) therefore appear at scattered, non-contiguous positions in the primary amino acid residue sequence. As a result, the conventional apparatus described above only obtains the residue sequences C1 and C2, extracted as binding sites, as separate pieces.

[0006] The properties of a protein depend on its three-dimensional structure. With only the separately extracted residue sequences C1 and C2, the order in which C1 and C2 occur and the conditions between them are unknown. This is not sufficiently useful information for analyzing protein properties that depend on three-dimensional structure. For instance, the primary sequence of the protein may contain C1 before C2, or C2 before C1, and the extracted pieces do not say which. Nor do they say what lies between C1 and C2.

[0007] Accordingly, a first object of the present invention is to provide a protein analysis support apparatus that can extract the amino acid residue portion of a protein interacting with a low-molecular-weight compound in a form that contains more information about the structure of the protein. A second object is to provide a storage medium storing a program for causing a computer to perform the processing of such a protein analysis support apparatus.

[0008]

Means for Solving the Problems

To achieve the first object, the present invention, as recited in claim 1, provides a protein analysis support apparatus. The apparatus uses a database describing three-dimensional structural data of proteins and of compounds present in their vicinity. From it, the apparatus obtains information on the amino acid residue portion of a specified protein predicted to interact with a specified compound. The apparatus comprises:

- search means for using information in the database to search for amino acid residues of the specified protein that contain an atom of a specified type lying within a predetermined distance range of any atom of a specified type in the specified compound; and
- sequence pattern generation means for generating a sequence pattern by arranging the amino acid residues obtained by the search means, in accordance with a predetermined notation rule, in the order of the primary sequence of the amino acid residues constituting the specified protein.

The sequence pattern is obtained as information representing the primary sequence structure of the part of the protein that includes the amino acid residues interacting with the specified compound.

[0009] In such a protein analysis support apparatus, the residues obtained as search results are arranged, in accordance with the predetermined notation rule, in the order of the primary sequence of the residues constituting the specified protein, and are thereby patterned. The resulting sequence pattern expresses the residues as a single, integrated pattern even when they are scattered along the primary sequence. From this pattern, one can grasp two things: the order of the scattered residues, and the approximate primary sequence structure of the part of the protein that includes the portion interacting with the compound.

[0010] The predetermined notation rule is not particularly limited. It should preferably satisfy the requirements of the system that uses the generated sequence pattern, such as readability on screen and simplicity of the pattern search algorithm. To obtain a more simply written sequence pattern, the sequence pattern generation means may be configured as recited in claim 2. In that configuration, it arranges the amino acid residues, written in one-letter notation, in the order of the primary sequence of the residues constituting the specified protein. It then inserts a predetermined pattern notation element between residues that are not adjacent in that primary sequence, thereby generating the sequence pattern information.

[0011] With such an apparatus, the result is a sequence pattern in which letters representing amino acid residues and predetermined pattern notation elements are arranged consecutively, in the order of the primary sequence of the residues constituting the protein. The database may be held within the protein analysis support apparatus or obtained from an external communication network such as the Internet. When it is obtained from an external network, the entire database can be downloaded into the apparatus, or it can be accessed while the communication line remains connected.

[0012] The generated sequence pattern can help analyze the properties of a newly discovered protein. A search system can scan the new protein's amino acid residue sequence for residue sequences that match the pattern. Alternatively, the generated sequence pattern can simply be displayed for use by researchers and others.

[0013] Some embodiments hold the database inside the apparatus, as described above, and display the generated sequence pattern. For this case, the present invention, as recited in claim 3, provides a protein analysis support apparatus. The apparatus uses a database describing three-dimensional structural data of proteins and of compounds present in their vicinity. From it, the apparatus obtains information on the amino acid residue portion of a specified protein predicted to interact with a specified compound. The apparatus comprises:

- database storage means for storing the database;
- search means for searching, using database information read from the storage means, for amino acid residues of the specified protein that contain an atom of a specified type lying within a predetermined distance range of any atom of a specified type in the specified compound;
- sequence pattern generation means for generating a sequence pattern by arranging the amino acid residues obtained by the search means, in accordance with a predetermined notation rule, in the order of the primary sequence of the amino acid residues constituting the specified protein; and
- display means for displaying the sequence pattern as information representing the primary sequence structure of the part of the protein that includes the amino acid residues interacting with the specified compound.

[0014] To achieve the second object, the present invention, as recited in claim 4, provides a storage medium storing a program. The program causes a computer to perform the processing of a protein analysis support apparatus. That apparatus uses a database describing three-dimensional structural data of proteins and of compounds present in their vicinity. From it, the apparatus obtains information on the amino acid residue portion of a specified protein predicted to interact with a specified compound. The program comprises:

- a search procedure for using information in the database to search for amino acid residues of the specified protein that contain an atom of a specified type lying within a predetermined distance range of any atom of a specified type in the specified compound; and
- a sequence pattern generation procedure for generating a sequence pattern by arranging the amino acid residues obtained by the search procedure, in accordance with a predetermined notation rule, in the order of the primary sequence of the amino acid residues constituting the specified protein.

The program obtains this sequence pattern as information representing the primary sequence structure of the part of the protein that includes the amino acid residues interacting with the specified compound.

[0015] The storage medium is not particularly limited, as long as a computer can read the program from it. Usable media include:

- media such as CD-ROM, from which a mechanically written program is read optically;
- media such as magnetic disks, on which programs are recorded and reproduced magnetically;
- media such as semiconductor memory, from which programs are read and to which they are written electrically; and
- media such as magneto-optical disks, on which programs are read and written using light and magnetism.

[0016]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows an example of the basic configuration of a system to which the protein analysis support apparatus of this embodiment is applied. The system is a client-server system in which client terminals 10, 11, and 12 are connected to a server 20 via a LAN. The server 20 holds a database (PDB) describing the three-dimensional structures of proteins and of the low-molecular-weight compounds contained within the proteins' thread-like chains. Each client terminal and the server 20 is a general-purpose computer system, configured for example as shown in FIG. 3.

[0017] In FIG. 3, the computer system comprises the following units, all connected by a bus B: a CPU 101, a memory unit 102, a LAN unit 103, an input unit 104, a display unit 105, a CD-ROM drive unit 106, and an auxiliary storage device 107 (for example, a disk device). The CPU 101 controls the entire system in accordance with programs stored in the memory unit 102 and executes the protein analysis processing described later. The memory unit 102 includes ROM and RAM. It stores the programs executed by the CPU 101, tables, and various information produced by the processing in the CPU 101. The LAN unit 103 controls client-server data communication over the LAN.

[0018] The input unit 104 has a mouse and a keyboard, which the user uses to enter information into the system. The display unit 105 shows the data needed for the protein analysis processing and its results. The CD-ROM drive unit 106 reads information from a loaded CD-ROM 200. Under control of the CPU 101, programs and databases stored on the CD-ROM 200 are read by the CD-ROM drive unit 106 and stored in the auxiliary storage device 107. When the system starts, programs stored in the auxiliary storage device 107 are transferred to the memory unit 102. The CPU 101 then executes processing according to the programs in the memory unit 102. On the server 20 in particular, the PDB (Protein Data Bank) is supplied on, for example, the CD-ROM 200 and stored in the auxiliary storage device 107. The PDB can also be supplied to the server 20 via an external communication network such as the Internet.

[0019] Information such as that shown in FIGS. 4 and 5 is registered in the PDB in advance.

FIG. 4 shows information on proteins (ATOM records). It includes the name of each amino acid residue constituting the protein (MET, THR, GLU, and so on, in three-letter notation) and the atoms making up each residue. It also includes the three-dimensional coordinates of each atom (the three-dimensional structure) and an ID identifying the protein.

FIG. 5 shows information on low-molecular-weight compounds known to be present within the thread-like chain of protein P, as illustrated in FIG. 1 (HETATM records). It includes the compound name (GNP and so on, in three-letter notation) and the atoms making up the compound. It also includes the three-dimensional coordinates of each atom and an ID identifying the compound.

[0020] Suppose the user performs protein analysis on, for example, the client terminal 10. The CPU 101 of the client terminal 10 then processes according to the procedure shown in FIG. 6. When the user starts the analysis via the input unit 104, the PDB is read (S1). In this step, the client first requests the PDB from the server 20 via the LAN unit 103 and the LAN. The server 20 returns the PDB in response. When the LAN unit 103 receives it, the PDB is stored in the auxiliary storage device 107.

[0021] Once the PDB has been read into the client terminal 10, the display unit 105 shows the names of the proteins and low-molecular-weight compounds described in the PDB (see FIGS. 4 and 5). The user then selects the target protein and compound from those displayed, using the input unit 104, and the selections are acquired (S2). After the CPU 101 acquires the user-specified protein and compound, the display unit 105 shows a list of the atom types that make them up.

[0022] From these atom types, the user specifies, via the input unit 104, the atom type of the compound to be examined for interaction and the atom type of the protein. This pair of atom types is acquired (S3, S4). The user then specifies, via the input unit 104, the range of interatomic distances at which interaction is expected, and this range is acquired (S5).

[0023] Once the atom types of the protein and compound and the interatomic distance range have been acquired, a search condition is created from this information and registered, that is, stored in a predetermined area of the memory unit 102 (S6). An example search condition is:

  Protein       Low-molecular-weight compound   Interatomic distance range
  C (carbon)    O (oxygen)                      3.5 Å or less

In this case the search condition is "carbon atoms C of the protein whose distance from an oxygen atom O of the compound is 3.5 Å or less."

[0024] Next, the system checks whether the user has ended the search condition setting (S7). If not, steps S3 to S6 are repeated according to the user's input. Each further search condition, described by a protein atom type, a compound atom type, and an interatomic distance range as above, is registered in turn.

[0025] When the user ends the search condition setting, a condition file describing every search condition is read from the memory unit 102. It is sent from the LAN unit 103 over the LAN to the server 20 (S8). The CPU 101 of the client terminal 10 then waits for a response from the server 20 (S9). When the server 20 receives the condition file from the client terminal 10, the CPU 101 of the server 20 processes it according to, for example, the procedure of FIG. 7.

[0026] In FIG. 7, the LAN unit 103 receives the condition file (S21), which is then loaded into the memory unit 102. The first condition (atom type pair and interatomic distance range) is read from this file (S22). Next, the PDB stored in the auxiliary storage device 107 (see FIGS. 4 and 5) is consulted for the target compound and protein. Three things are identified: the compound atoms and the protein atoms that satisfy the condition, and the amino acid residues to which those protein atoms belong (S23). This processing is performed, for example, as follows.

[0027] First, all atoms of the compound that belong to the atom type in the search condition are extracted, for example the oxygen atoms O1A, O2*, O2B, and so on. All atoms of the protein that belong to the atom type in the search condition are extracted as well, for example the carbon atoms CA, CB, CE, and so on. The distance from each extracted compound atom to each extracted protein atom is then computed from the three-dimensional coordinates of the atoms in the PDB. Next, the atom pairs that satisfy the distance range in the search condition (for example, 3.5 Å or less) are identified, along with their interatomic distances. Finally, the amino acid residue containing the protein atom of each pair is identified.

[0028] After the atom pairs satisfying a search condition have been extracted, the server checks whether the received condition file contains further search conditions (S24). If it does, steps S22 and S23 are executed for the next condition. This is repeated for every search condition in the received condition file.

[0029] When all search conditions in the received condition file have been processed, a result file is created (S25). As shown in FIG. 8, for example, this file records the following for each extracted atom pair: the interatomic distance, the protein atom of the pair, the amino acid residue containing that atom and its residue number, and the compound atom of the pair. Based on the contents of this result file, a motif pattern is generated (S26). The motif pattern is the sequence pattern of the amino acid residues containing protein atoms predicted to interact with the specified atoms of the compound.

[0030] The motif pattern is generated as follows. Suppose an extraction result such as that shown in FIG. 8 is obtained. Writing each amino acid residue in one-letter notation and arranging the residues in order of residue number gives GKSA...F..D..T. The elements of this string, in order, are:

- the contiguous residue run "GKSA", ending at residue number (18);
- residue F at residue number (28);
- residue D at residue number (30); and
- residue T at residue number (35).

[0031] The residues between residue numbers (18) and (28) contain no carbon atom C predicted to interact with an oxygen atom O of the compound. The condition between residue A (residue number (18)) and residue F (residue number (28)) is therefore written, for example, as ".*", formed by concatenating "." and "*". Here "." denotes any single amino acid residue and "*" denotes repetition. ".*" thus means a run of arbitrary residues (characters) of any length. The same symbols express the other two gaps as ".*" as well: between residues F (residue number (28)) and D (residue number (30)), and between residues D (residue number (30)) and T (residue number (35)). The residue sequence "GKSA...F..D..T" extracted above is thus described as the amino acid residue sequence pattern (motif pattern) "GKSA.*F.*D.*T".

[0032] This motif pattern describes the following arrangement, in order:

- the contiguous residue run GKSA;
- a run of any number of arbitrary residues;
- residue F;
- a run of any number of arbitrary residues;
- residue D;
- a run of any number of arbitrary residues; and
- residue T.

The motif pattern therefore records three kinds of information: the order of the individually extracted residues G, K, S, A, F, D, and T; which residues are contiguous; and the condition between residues that are not adjacent.

[0033] When the motif pattern has been generated as described above, it is sent, together with the extraction result file (see FIG. 8), from the LAN unit 103 over the LAN to the client 10 (S27). This ends the series of processes on the server 20. Returning to FIG. 6, when the CPU 101 of the client 10, which has been waiting for a response from the server 20, determines that the LAN unit 103 has received the extraction result file and the motif pattern from the server 20, it performs display processing of the received information (S10). In this display processing, the contents of the extraction result file (see FIG. 8) and the motif pattern (for example, "GKSA.*F.*D.*F") are displayed in a predetermined window of the display unit 105.

[0034] The user can, for example, visually check whether the motif pattern is present in a new protein whose primary amino acid sequence has been determined by analysis. If the motif pattern is present, the new protein can also be predicted to interact with the low-molecular-weight compound specified above. In the above example, the processing for obtaining the motif pattern is distributed between the client and the server, but it can also be executed on the client alone (stand-alone). In that case, the PDB may be received from the server, or it may be supplied directly to the client on a CD-ROM or similar medium.
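The visual check described above could also be done programmatically. A minimal sketch, assuming the motif is used directly as a Python regular expression (`.` and `*` carry the same meaning there); the test sequence is made up, and `motif_hit` is a hypothetical helper returning the 1-based start and end positions of the first match.

```python
import re
from typing import Optional, Tuple


def motif_hit(motif: str, sequence: str) -> Optional[Tuple[int, int]]:
    """1-based (start, end) of the first occurrence of the motif, or None if absent."""
    m = re.search(motif, sequence)
    return None if m is None else (m.start() + 1, m.end())


new_protein = "MTGKSAWWFYDPPPPF"  # made-up primary sequence, for illustration only
print(motif_hit("GKSA.*F.*D.*F", new_protein))  # (3, 16)
print(motif_hit("GKSA.*F.*D.*F", "MTGKSAWWYDPPPP"))  # None: no F after GKSA
```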

[0035] Alternatively, the PDB may be held on the client side, and when the client sends the condition file to the server (S6 in FIG. 6), it may also send the information described in the PDB on the specified low-molecular-weight compound and protein (including their three-dimensional structures). In this case, the server extracts the amino acid residues and the like that satisfy the search condition from the received compound and protein information.
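The server-side extraction (steps S22 to S25 in FIG. 7) is essentially a distance search between two atom sets. The sketch below assumes the PDB records have already been parsed into simple `Atom` tuples (PDB parsing is omitted, and the `Atom` layout and `search_residues` name are hypothetical). The carbon/oxygen pairing follows the example in the text; the 2.5 to 4.0 Å range and all coordinates are illustrative values only.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """One parsed atom record (hypothetical layout)."""
    element: str
    x: float
    y: float
    z: float
    res_num: int = 0    # residue number; only meaningful for protein atoms
    res_code: str = ""  # one-letter residue code; only meaningful for protein atoms


def search_residues(protein: List[Atom], compound: List[Atom],
                    prot_elem: str, comp_elem: str,
                    d_min: float, d_max: float) -> List[Tuple[int, str]]:
    """Residues having a prot_elem atom within [d_min, d_max] of any comp_elem atom."""
    targets = [c for c in compound if c.element == comp_elem]
    hits = set()
    for p in protein:
        if p.element != prot_elem:
            continue
        for c in targets:
            if d_min <= math.dist((p.x, p.y, p.z), (c.x, c.y, c.z)) <= d_max:
                hits.add((p.res_num, p.res_code))
                break  # one qualifying compound atom is enough for this protein atom
    return sorted(hits)  # primary-sequence order by residue number


compound = [Atom("O", 0.0, 0.0, 0.0), Atom("C", 1.2, 0.0, 0.0)]
protein = [
    Atom("C", 3.0, 0.0, 0.0, 18, "A"),   # 3.0 from the O atom: inside the range
    Atom("C", 10.0, 0.0, 0.0, 19, "G"),  # too far
    Atom("N", 2.0, 0.0, 0.0, 20, "K"),   # wrong element
    Atom("C", 0.0, 3.5, 0.0, 28, "F"),   # 3.5: inside the range
    Atom("C", 0.0, 0.0, 1.0, 30, "D"),   # 1.0: closer than d_min
]
result = search_residues(protein, compound, "C", "O", 2.5, 4.0)
print(result)  # [(18, 'A'), (28, 'F')]
```

The nested loop is quadratic in the number of atoms, which is adequate for one protein and one ligand; a spatial index would be the usual choice for larger batches.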

[0036] In the above example, the processing result (see FIG. 8) was obtained under a single search condition (a pair of atoms of the protein and the low-molecular-weight compound, and a range of interatomic distance). The processing result under multiple search conditions is the superposition of the processing results under each condition. Furthermore, with the character string search system disclosed in Japanese Patent Application No. 10-109139, previously filed by the present applicant, a motif pattern such as the one described above can also be searched for automatically in the amino acid residue sequence of a new protein.
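One reading of "superposition" is the union of the residues found under each search condition, after which the motif is rebuilt from the merged set. The sketch below assumes that reading; `superpose` and `build_motif` are hypothetical helpers, and the two condition results are invented for illustration.

```python
def superpose(*condition_hits):
    """Union of the (residue_number, code) hits from several search conditions."""
    merged = set()
    for hits in condition_hits:
        merged.update(hits)
    return sorted(merged)


def build_motif(hits):
    """Contiguous residues are concatenated; each gap in residue numbers becomes '.*'."""
    parts, prev = [], None
    for number, code in sorted(set(hits)):
        if prev is not None and number != prev + 1:
            parts.append(".*")
        parts.append(code)
        prev = number
    return "".join(parts)


cond_1 = [(15, "G"), (16, "K"), (28, "F")]                      # invented result, condition 1
cond_2 = [(17, "S"), (18, "A"), (28, "F"), (30, "D"), (35, "F")]  # invented result, condition 2
merged = superpose(cond_1, cond_2)
print(build_motif(cond_1), build_motif(cond_2))  # GK.*F SA.*F.*D.*F
print(build_motif(merged))                       # GKSA.*F.*D.*F
```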

[0037] In the character string search processing executed by that previously filed character string search system, the sequence conditions on the characters representing amino acid residues can be specified, for example, according to the following rules (1) to (8).
(1) A specific character. Example: "A". Meaning: the character A.
(2) One of several characters. Example: "[ABC]". Meaning: any one of the characters A, B and C.
(3) A character other than a given character. Example: "[^A]". Meaning: any character other than A.
(4) Any character. Example: ".". Meaning: any single character.
(5) Repetition. Example: "*". Meaning: the immediately preceding pattern (character) repeated zero or more times.
(6) Repetition a specified number of times. Example: "\{3\}". Meaning: the immediately preceding pattern (character) repeated three times.
(7) Repetition within a specified range of counts. Example: "\{3,5\}". Meaning: the immediately preceding pattern (character) repeated three to five times.
(8) One of several character strings. Example: "(XXX|YYY|ZZZ)". Meaning: any one of the character strings XXX, YYY and ZZZ.
When character strings are specified according to these rules (sequence conditions), search condition information describing the search conditions can specify, for example, the following character strings.
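Rules (1) to (8) closely resemble regular-expression syntax. As a hedged sketch (the internals of the previously filed system are not described here), the conditions can be tested with Python's `re` module after rewriting the `\{m\}` and `\{m,n\}` intervals of rules (6) and (7) into Python's `{m}` and `{m,n}`, and treating "specified" as a whole-string match (`re.fullmatch`). The helper names `to_python_regex` and `specifies` are hypothetical.

```python
import re


def to_python_regex(condition: str) -> str:
    r"""Rewrite the \{m\} and \{m,n\} intervals of rules (6) and (7) as {m} and {m,n}."""
    return condition.replace("\\{", "{").replace("\\}", "}")


def specifies(condition: str, text: str) -> bool:
    """True if the whole of `text` is one of the strings the condition specifies."""
    return re.fullmatch(to_python_regex(condition), text) is not None


# (search condition, a string it should specify), taken from paragraphs [0038] to [0042]
examples = [
    ("ABCDE", "ABCDE"),            # rule (1)
    ("AB[CDE]FG", "ABDFG"),        # rules (1), (2)
    ("ABC[^D]E", "ABCAE"),         # rules (1), (3)
    ("AB.CD", "ABACD"),            # rules (1), (4)
    ("AB*CD", "ACD"),              # rules (1), (5)
    ("AB\\{4\\}CD", "ABBBBCD"),    # rules (1), (6)
    ("AB\\{2,4\\}CD", "ABBBCD"),   # rules (1), (7)
    ("AB(CDE|F)GH", "ABFGH"),      # rules (1), (8)
]
for condition, text in examples:
    print(condition, text, specifies(condition, text))  # each line ends with True
```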

[0038] Entering the search condition information "ABCDE" according to rule (1) specifies the character string "ABCDE". Entering the search condition information "AB[CDE]FG" according to rules (1) and (2) specifies the character strings "ABCFG", "ABDFG" and "ABEFG".

[0039] Entering the search condition information "ABC[^D]E" according to rules (1) and (3) specifies the character strings "ABCAE", "ABCBE", "ABCCE", "ABCEE", and so on. Entering the search condition information "AB.CD" according to rules (1) and (4) specifies the character strings "ABACD", "ABBCD", "ABCCD", "ABDCD", and so on.
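In a regex engine, `[^D]` and `.` range over every character, not only amino acid letters. The hypothetical helper below restricts the single varying position to the 20 standard one-letter amino acid codes (an assumed alphabet, for illustration) and counts how many strings each condition from this paragraph then specifies.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes (assumed alphabet)


def fill_one_position(prefix: str, suffix: str, pattern: str) -> list:
    """All strings prefix + X + suffix, X an amino acid letter, that the pattern specifies."""
    candidates = (prefix + x + suffix for x in AMINO_ACIDS)
    return [s for s in candidates if re.fullmatch(pattern, s)]


negated = fill_one_position("ABC", "E", "ABC[^D]E")  # every letter except D
wildcard = fill_one_position("AB", "CD", "AB.CD")    # every letter
print(len(negated), len(wildcard))  # 19 20
```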

[0040] Entering the search condition information "AB*CD" according to rules (1) and (5) specifies the character strings "ACD", "ABCD", "ABBCD", "ABBBCD", and so on. Entering the search condition information "AB.*CD" according to rules (1), (4) and (5) specifies the character strings "ABCD", "ABXCD", "ABXYZCD", and so on.
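The difference between "AB*CD" and "AB.*CD" matters for motif patterns: "*" repeats only the preceding character, whereas ".*" admits any run of residues. A quick check with Python's `re` module on a few sample strings (`specified_by` is a hypothetical helper):

```python
import re

TEXTS = ["ACD", "ABCD", "ABBBCD", "ABXYZCD"]


def specified_by(pattern: str) -> list:
    """The sample strings that the pattern specifies as a whole."""
    return [t for t in TEXTS if re.fullmatch(pattern, t)]


print(specified_by("AB*CD"))   # ['ACD', 'ABCD', 'ABBBCD']
print(specified_by("AB.*CD"))  # ['ABCD', 'ABBBCD', 'ABXYZCD']
```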

[0041] Entering the search condition information "AB\{4\}CD" according to rules (1) and (6) specifies the character string "ABBBBCD". Entering the search condition information "AB\{2,4\}CD" according to rules (1) and (7) specifies the character strings "ABBCD", "ABBBCD" and "ABBBBCD".

[0042] Entering the search condition information "AB(CDE|F)GH" according to rules (1) and (8) specifies the character strings "ABCDEGH" and "ABFGH". When a motif pattern is generated as input to a character string search system of this kind, the amino acid residue portions individually extracted as predicted to interact with the low-molecular-weight compound can be integrated into a motif pattern that takes various regularities into account.
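As one possible example of exploiting such regularities (an illustrative variant, not a method stated in this document), the exact gap lengths known from the residue numbers could be written with rule (6) instead of ".*". The sketch reuses the hypothetical `(residue_number, one_letter_code)` input and the inferred residue numbers of the GKSA example; `build_counted_motif` is a hypothetical name.

```python
import re


def build_counted_motif(hits):
    r"""Like the '.*' motif, but each gap records its exact length as .\{n\} (rule (6))."""
    parts = []
    prev = None
    for number, code in sorted(set(hits)):
        gap = 0 if prev is None else number - prev - 1
        if gap > 0:
            parts.append(".\\{" + str(gap) + "\\}")
        parts.append(code)
        prev = number
    return "".join(parts)


hits = [(15, "G"), (16, "K"), (17, "S"), (18, "A"), (28, "F"), (30, "D"), (35, "F")]
motif = build_counted_motif(hits)
print(motif)  # GKSA.\{9\}F.\{1\}D.\{4\}F

# The same pattern in Python re syntax, checked against a synthetic 21-residue stretch.
py_motif = motif.replace("\\{", "{").replace("\\}", "}")
stretch = "GKSA" + "X" * 9 + "F" + "X" + "D" + "X" * 4 + "F"
print(re.fullmatch(py_motif, stretch) is not None)  # True
```

A tighter pattern like this rejects occurrences with different spacing, so it trades recall for specificity compared with the ".*" form.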

[0043] In the above example, the processing in steps S22 to S25 of FIG. 7 corresponds to the search means, and the processing in step S26 corresponds to the sequence pattern generating means. The auxiliary storage device 107 corresponds to the database storage means, and the processing in step S10 of FIG. 6 together with the display unit 105 corresponds to the display means.

[0044]

[Effect of the Invention] As described above, according to the invention of claims 1 to 3, the sequence pattern obtained by the patterning expresses the retrieved amino acid residues as one integrated pattern even when they are scattered (discrete) along the primary sequence. From that pattern one can grasp, for example, the order of the scattered residues and the approximate primary-sequence structure of the residue portion of the protein that includes the part interacting with the compound. The amino acid residue portions of a protein that interact with a low-molecular-weight compound can therefore be extracted in a form that carries more information about the structure of the protein.

[0045] According to the invention of claim 4, a storage medium can be provided that stores a program for causing a computer to execute the processing of the protein analysis support apparatus described above.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of the relationship between a protein and a low-molecular-weight compound present in its vicinity.

FIG. 2 is a block diagram showing an example of a system to which the protein analysis support apparatus according to an embodiment of the present invention is applied.

FIG. 3 is a block diagram showing a basic configuration example of each client and the server shown in FIG. 2.

FIG. 4 is a diagram showing an example of information on a protein registered in the PDB.

FIG. 5 is a diagram showing an example of information on a compound registered in the PDB.

FIG. 6 is a flowchart showing the procedure of the process for creating a condition file, executed on a client terminal.

FIG. 7 is a flowchart showing the procedure of the process for creating a motif pattern, executed on the server.

FIG. 8 is a diagram showing an example of information extracted according to a search condition.

[Explanation of symbols]

10, 11, 12 Client terminals
20 Server
101 CPU
102 Memory unit
103 LAN unit
104 Input unit
105 Display unit
106 CD-ROM drive unit
107 Auxiliary storage device
200 CD-ROM

Continuation of the front page
(72) Inventor: Michihiro Oya, c/o Fujitsu Kyushu System Engineering Ltd., 2-2-1 Momochihama, Sawara-ku, Fukuoka-shi, Fukuoka
F-terms (reference): 2G045 AA34 DA36 JA01 JA04 JA05 JA07; 5B075 ND04 NK39

Claims

[Claims]

1. A protein analysis support apparatus for obtaining, by using a database describing three-dimensional structure data of proteins and of compounds present in the vicinity of the proteins, information on the amino acid residue portion of a specified protein predicted to interact with a specified compound, the apparatus comprising: search means for searching, using information in the database, for amino acid residues of the specified protein that contain an atom of a specified type located within a predetermined distance range of an atom of a specified type in the specified compound; and sequence pattern generating means for generating a sequence pattern by arranging, in accordance with a predetermined notation rule, the amino acid residues obtained as search results by the search means in the order of the primary sequence of amino acid residues constituting the specified protein, wherein the sequence pattern is obtained as information representing the primary sequence structure of a part that includes the amino acid residue portion of the protein interacting with the specified compound.

2. The protein analysis support apparatus according to claim 1, wherein the sequence pattern generating means arranges the amino acid residues, each represented by one letter, in the order of the primary sequence of amino acid residues constituting the specified protein, and generates the sequence pattern information by inserting a predetermined pattern notation between amino acid residues that are discrete (non-contiguous) in the order of the primary sequence.

3. A protein analysis support apparatus for obtaining, by using a database describing three-dimensional structure data of proteins and of compounds present in the vicinity of the proteins, information on the amino acid residue portion of a specified protein predicted to interact with a specified compound, the apparatus comprising: database storage means for storing the database; search means for searching, using information of the database read from the database storage means, for amino acid residues of the specified protein that contain an atom of a specified type located within a predetermined distance of each atom of a specified type in the specified compound; sequence pattern generating means for generating a sequence pattern by arranging, in accordance with a predetermined notation rule, the amino acid residues obtained as search results by the search means in the order of the primary sequence of amino acid residues constituting the specified protein; and display means for displaying the sequence pattern as information representing the primary sequence structure of a part that includes the amino acid residue portion of the protein interacting with the specified compound.

4. A storage medium storing a program for causing a computer to execute processing in a protein analysis support apparatus that obtains, by using a database describing three-dimensional structure data of proteins and of compounds present in the vicinity of the proteins, information on the amino acid residue portion of a specified protein predicted to interact with a specified compound, the program comprising: a search procedure for searching, using information of the database, for amino acid residues of the specified protein that contain an atom of a specified type located within a predetermined distance of each atom of a specified type in the specified compound; and a sequence pattern generating procedure for generating a sequence pattern by arranging, in accordance with a predetermined notation rule, the amino acid residues obtained as search results of the search procedure in the order of the primary sequence of amino acid residues constituting the specified protein, the program obtaining the sequence pattern as information representing the primary sequence structure of a part that includes the amino acid residue portion of the protein interacting with the specified compound.