JP2930851B2 - Method and apparatus for calculating prediction accuracy of three-dimensional structure of protein

Info

Publication number: JP2930851B2
Application number: JP6002461A
Authority: JP
Inventors: Yasushi Kubota (窪田綏)
Original assignee: ADOBANSUDO TEKUNOROJII INSUTEITEYUUTO KK
Current assignee: ADOBANSUDO TEKUNOROJII INSUTEITEYUUTO KK
Priority date: 1994-01-14
Filing date: 1994-01-14
Publication date: 1999-08-09
Anticipated expiration: 2014-08-09
Also published as: JPH07206894A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a method and an apparatus for calculating the prediction accuracy of the three-dimensional structure of a protein, for use in biotechnology fields such as protein engineering.

[0002]

[Prior Art] A protein consists of tens to hundreds of amino acids, drawn from the 20 kinds shown in Table 1 (such as glutamic acid and lysine), joined by peptide bonds in a chain such as methionine-glutamic acid-glycine-lysine-proline-.... This chain is called the amino acid sequence, or the primary structure, of the protein. The amino acid sequence carries the most essential information about the protein, in particular information about its three-dimensional structure and function. Each amino acid unit that is joined to its neighbors by peptide bonds to make up the protein is called an amino acid residue.

[0003]

[Table 1]
─────────────────────────────────────────────
Amino acid       3-letter code   1-letter code
─────────────────────────────────────────────
Glycine          GLY             G
Alanine          ALA             A
Serine           SER             S
Cysteine         CYS             C
Methionine       MET             M
Lysine           LYS             K
Valine           VAL             V
Threonine        THR             T
Isoleucine       ILE             I
Leucine          LEU             L
Aspartic acid    ASP             D
Asparagine       ASN             N
Glutamic acid    GLU             E
Glutamine        GLN             Q
Arginine         ARG             R
Proline          PRO             P
Histidine        HIS             H
Phenylalanine    PHE             F
Tyrosine         TYR             Y
Tryptophan       TRP             W
─────────────────────────────────────────────
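Table 1 can be carried into code as a simple lookup. The sketch below is illustrative only: the names `THREE_TO_ONE` and `to_one_letter` are hypothetical and do not appear in the specification.

```python
# Table 1 as a mapping from 3-letter code to 1-letter code.
THREE_TO_ONE = {
    "GLY": "G", "ALA": "A", "SER": "S", "CYS": "C", "MET": "M",
    "LYS": "K", "VAL": "V", "THR": "T", "ILE": "I", "LEU": "L",
    "ASP": "D", "ASN": "N", "GLU": "E", "GLN": "Q", "ARG": "R",
    "PRO": "P", "HIS": "H", "PHE": "F", "TYR": "Y", "TRP": "W",
}


def to_one_letter(residues):
    """Convert a list of 3-letter residue codes into a 1-letter sequence string."""
    return "".join(THREE_TO_ONE[code.upper()] for code in residues)
```

For the example chain in paragraph [0002], `to_one_letter(["MET", "GLU", "GLY", "LYS", "PRO"])` gives the string `MEGKP`.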

[0004] In general, determining the amino acid sequence of a protein is relatively easy, and the sequences of about 20,000 proteins have been determined to date and compiled into databases. The three-dimensional structures of these proteins, on the other hand, have conventionally been determined by, for example, X-ray crystallography or NMR analysis, which yield the spatial coordinates (x, y, z) of each atom of the amino acid residues making up the protein. The protein database holding these spatial coordinates is hereinafter referred to as the PDB (Protein Data Bank). Statistical data on the relative distances between the 20 kinds of amino acid residues can therefore be calculated from proteins whose three-dimensional structures are known.

[0005] In recent years, as the volume of three-dimensional structure data determined by X-ray analysis has grown, it has become increasingly likely that, for a protein whose structure is to be predicted (hereinafter, the target protein), one can find a protein of known three-dimensional structure whose amino acid sequence is similar, that is, homologous, to that of the target and that can serve as a reference (hereinafter, the reference protein). Against this background, homology modeling has been proposed: the three-dimensional structure of the target protein is predicted by referring to that of the reference protein. The approach rests on the assumption that proteins with similar amino acid sequences will also have similar three-dimensional structures.

[0006] As one example of homology modeling, a method has been proposed that uses the three-dimensional coordinates of the α-carbon atoms (hereinafter, Cα atoms), which are the principal atoms of the reference protein and lie at the center of each amino acid residue. However, because the absolute coordinates of the Cα atoms depend on the orientation and position of the protein molecule, calculating these three-dimensional coordinates becomes very complicated.

[0007] Meanwhile, G. M. Crippen and T. Havel proposed a method based on so-called distance geometry: a metric matrix is derived from the relative distances between Cα atoms, its eigenvalue problem is solved, and the three-dimensional coordinates of the Cα atoms are obtained from the three largest eigenvalues. This method, however, has the problem that the resulting protein structure is shrunk relative to the actual one.

[0008] To solve this problem, a method that predicts the three-dimensional structure of the target protein from the relative distances between pairs of Cα atoms in reference proteins was proposed by Hiroshi Wako and Harold A. Scheraga in the following reference (hereinafter, the Wako-Scheraga method): "Distance-Constraint Approach to Protein Folding. I. Statistical Analysis of Protein Conformations in Terms of Distances Between Residues", Journal of Protein Chemistry, Vol. 1, No. 1, pp. 5-45, 1982 (hereinafter, Reference 1).

[0009] However, the Wako-Scheraga method disclosed in Reference 1 uses averaged statistical data based on the three-dimensional structures of reference proteins obtained by X-ray crystallography and NMR analysis. The predicted coordinates therefore deviate greatly from those of the actual protein, and the actual three-dimensional structure could not be predicted.

[0010] To solve this problem, the present inventor proposed, in Japanese Patent Application No. Hei 5-252895, a method and an apparatus for predicting the three-dimensional structure of a protein. That method and apparatus select as the reference protein at least one protein, from among proteins whose amino acid residue atom coordinates are known, that has a high degree of homology with the target protein whose three-dimensional structure is to be predicted, and predict the coordinates of the atoms of the amino acid residues of the target protein from the coordinates of the atoms of the amino acid residues of the selected reference protein. The method is characterized by: comparing the amino acid sequences of the reference protein and the target protein to cut out a plurality of homologous residue pairs; calculating, based on the cut-out residue pairs, the relative distances between the atoms of residues with different residue numbers in the reference protein, and setting them as the corresponding first relative distances of the target protein; calculating, from the coordinates of a predetermined initial three-dimensional structure of the target protein, second relative distances between the atoms of residues with different residue numbers in that initial structure; and optimizing so as to minimize the sum of the squares of the differences between the first relative distances and the corresponding second relative distances for the atoms of residues with different residue numbers in the target protein, thereby predicting the coordinates of the atoms of the amino acid residues of the target protein.

[0011]

[Problems to Be Solved by the Invention] When the three-dimensional structure of a protein is predicted by a technique such as homology modeling, for example by the present inventor's prediction method described above, the deviation from the actual three-dimensional structure determined by X-ray crystallography or the like is known not to be uniform along the amino acid sequence. Because results of X-ray crystallography and the like are normally unavailable for the target protein whose structure is to be predicted, it is difficult to estimate local prediction accuracy from amino acid sequence information. On the other hand, when a reference protein with a sequence homologous (similar) to that of the target protein is found, the three-dimensional structure is predicted by a technique such as homology modeling. The general expectation that similar amino acid sequences have similar three-dimensional structures suggests that local prediction accuracy could be estimated from amino acid sequence information.

[0012] In the prior art, however, the degree of homology is not defined along the amino acid sequence in a way that is quantitative and reflects the three-dimensional structure, so the local prediction accuracy of the three-dimensional structure along the amino acid sequence could not be estimated quantitatively. In other words, homology has not been defined quantitatively along the amino acid sequence in such a way that, if the sequence homology of a region is high, the three-dimensional structure of that region is also similar.

[0013] As for this problem of a homology definition that is neither quantitative nor reflective of the three-dimensional structure, the present inventor Yasushi Kubota and co-workers showed in the following reference that it can be solved by using a correlation function (hereinafter, the method of Kubota et al.): "Correspondence of Homologies in Amino Acid Sequence and Tertiary Structure of Protein Molecules", Biochimica et Biophysica Acta, Vol. 701, pp. 242-252 (1982) (hereinafter, Reference 2).

[0014] However, the method of Kubota et al. disclosed in Reference 2 lays out the two amino acid sequences two-dimensionally in matrix form, and so has the drawback that homology cannot be expressed quantitatively in one dimension along the amino acid sequence.

[0015] An object of the present invention is to solve these problems and to provide a method and an apparatus for calculating the prediction accuracy of the three-dimensional structure of a protein that can quantitatively estimate, along the amino acid sequence, the prediction accuracy of a target protein's three-dimensional structure predicted by a predetermined method.

[0016]

[Means for Solving the Problems] The method according to claim 1 of the present invention calculates the prediction accuracy of the three-dimensional structure of a target protein, where at least one protein with a high degree of homology to the target protein is selected as a reference protein from among proteins whose amino acid residue atom coordinates are known, and the coordinates of the atoms of the amino acid residues of the target protein have been predicted by a predetermined method from the coordinates of the atoms of the amino acid residues of the selected reference protein. The method is characterized in that, based on physicochemical parameters of the plural kinds of amino acids contained in the target protein, and in the alignment of the reference protein and the target protein, the residue number of the reference protein and the residue number of the target protein are varied together so that the amino acid residues of the reference protein correspond to those of the target protein; the value of a correlation function indicating the correlation at each pair of corresponding amino acid residues is calculated for each of at least one selected physicochemical parameter chosen in advance from the physicochemical parameters of the plural kinds of amino acids; and the mean of the calculated correlation function values is used as the prediction accuracy of the predicted three-dimensional structure of the target protein. The method according to claim 2 is the method of claim 1, characterized in that the atom of the amino acid residue is an α-carbon atom. The method according to claim 3 is the method of claim 1 or 2, characterized in that the selected physicochemical parameters include at least one of polarity, partial specific volume, degree of turn formation, the pK value of the α-amino group, the pK value of the α-carboxyl group, and degree of mutation.

[0017] The apparatus according to claim 4 of the present invention calculates the prediction accuracy of the three-dimensional structure of a target protein, where at least one protein with a high degree of homology to the target protein is selected as a reference protein from among proteins whose amino acid residue atom coordinates are known, and the coordinates of the atoms of the amino acid residues of the target protein have been predicted by a predetermined method from the coordinates of the atoms of the amino acid residues of the selected reference protein. The apparatus comprises: first calculating means which, based on physicochemical parameters of the plural kinds of amino acids contained in the target protein, and in the alignment of the reference protein and the target protein, varies the residue number of the reference protein and the residue number of the target protein together so that the amino acid residues of the reference protein correspond to those of the target protein, and calculates the value of a correlation function indicating the correlation at each pair of corresponding amino acid residues for each of at least one selected physicochemical parameter chosen in advance from the physicochemical parameters of the plural kinds of amino acids; and second calculating means which averages the correlation function values calculated by the first calculating means to obtain the mean of the correlation function. The apparatus is characterized in that the mean calculated by the second calculating means is used as the prediction accuracy of the predicted three-dimensional structure of the target protein. The apparatus according to claim 5 is the apparatus of claim 4, characterized in that the atom of the amino acid residue is an α-carbon atom. The apparatus according to claim 6 is the apparatus of claim 4 or 5, characterized in that the selected physicochemical parameters include at least one of polarity, partial specific volume, degree of turn formation, the pK value of the α-amino group, the pK value of the α-carboxyl group, and degree of mutation.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an apparatus for predicting the three-dimensional structure of a protein and calculating the prediction accuracy, according to one embodiment of the present invention.

[0019] The method used in the prediction and prediction accuracy calculation apparatus of this embodiment broadly comprises the following steps.
(a) Based on the PDB, search for reference proteins that have a sequence similar to the amino acid sequence of the target protein and a known three-dimensional structure. From the candidates whose degree of homology (the fraction of the whole accounted for by corresponding homologous residue pairs) is high, for example 30% or more, select the protein with the highest (or a relatively high) degree of homology as the reference protein.
(b) Compare the amino acid sequences of the selected reference protein and the target protein, and cut out a plurality of corresponding homologous residue pairs that contain no insertion or deletion of five or more residues.
(c) Based on the cut-out residue pairs, calculate the relative distances between the atoms of residues with different residue numbers across the corresponding homologous regions of the reference protein, and set them as the corresponding relative distances ⟨d_ij⟩ of the target protein.
(d) Generate random numbers from a predetermined random seed IR, set values proportional to the random numbers as the coordinates of the initial three-dimensional structure of the target protein (hereinafter, the initial structure), and from these calculate the relative distances d_ij in the initial structure that correspond to the relative distances between the atoms of residues with different residue numbers across the homologous regions.
(e) Using, for example, the quasi-Newton method, optimize the function F so that the sum of the squares of the differences between d_ij and ⟨d_ij⟩ over the atoms of residues with different residue numbers in the target protein is minimized, that is, so that the value of the function F of the target protein given by Equation 1 below is minimized, and calculate the predicted coordinates of the Cα atoms of the target protein, which are the variables of F.
(f) Convert the amino acid sequences into numerical sequences using physicochemical parameters p specific to the 20 amino acids of Table 1, such as polarity, partial specific volume, degree of turn formation, the pK value of the α-amino group, the pK value of the α-carboxyl group, and degree of mutation. In the alignment showing the correspondence between the amino acid sequences of the target protein X and the reference protein Y, calculate the correlation function Cp(i) between residue number i of the target protein X and the corresponding residue number j of the reference protein Y, as defined by Equation 5 below. Then, to reduce noise and improve the signal-to-noise ratio S/N, calculate the mean ⟨C(i)⟩ of the correlation function Cp(i) over a plurality of parameters p as shown in Equation 6 below, thereby estimating and calculating the prediction accuracy.
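Step (a) can be sketched as follows, under two assumptions that the text does not fix: that a "homologous" residue pair means an identical pair in a gap-aligned pair of sequences, and that the degree of homology is taken relative to the number of target residues. The function names and the `-` gap character are hypothetical.

```python
def percent_identity(aligned_target, aligned_reference, gap="-"):
    """Percentage of target residues aligned to an identical reference residue."""
    if len(aligned_target) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    target_residues = sum(1 for a in aligned_target if a != gap)
    matches = sum(
        1
        for a, b in zip(aligned_target, aligned_reference)
        if a == b and a != gap
    )
    return 100.0 * matches / target_residues


def select_reference(aligned_candidates, threshold=30.0):
    """Pick the candidate with the highest identity at or above the threshold.

    aligned_candidates maps a candidate name to a pair
    (aligned_target, aligned_reference). Returns None if none qualifies.
    """
    best_name, best_score = None, threshold
    for name, (target, reference) in aligned_candidates.items():
        score = percent_identity(target, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

The 30% default mirrors the example threshold given in step (a); a real search over the PDB would also need an alignment routine, which is outside this sketch.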

[0020] First, the basic principle of the prediction calculation and the prediction accuracy calculation for the three-dimensional structure of the target protein is explained. In this calculation, a target protein is modeled by residues each represented by its Cα atom alone. The three-dimensional structure of a target protein made up of M residues (M a natural number) is therefore represented by M points. The three-dimensional structure, consisting of the spatial positions of these M points, can be obtained by optimizing the function given in Equation 1 below.

[0021]

[Equation 1]
$$F = \frac{1}{m}\,\mathop{\sum\sum}_{i<j}\; w_{ij}\,\bigl(d_{ij} - \langle d_{ij} \rangle\bigr)^{2}$$

[0022] As Equation 1 shows, the function F is a sum over i and j, subject to i < j, taken over the atoms of residues with different residue numbers. If each point is expressed in an XYZ Cartesian coordinate system, F becomes a function of 3M coordinate variables, as shown in Equation 2 below.

[0023]

[Equation 2]
$$F = F(x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_M, y_M, z_M)$$
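As a minimal sketch of Equations 1 and 2, the function below evaluates F for a list of M Cα points against a table of target pair distances. The names `pair_distances` and `objective_f` are hypothetical, every weight defaults to 1, and every one of the M(M−1)/2 pairs is assumed to carry a target distance.

```python
import math


def pair_distances(coords):
    """Distances d_ij for all pairs i < j of the given 3-D points."""
    n = len(coords)
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


def objective_f(coords, target, weights=None):
    """Equation 1: weighted mean squared difference between d_ij and its target.

    target maps each pair (i, j), i < j, to the target distance.
    weights uses the same keys; None means every weight is 1.
    """
    distances = pair_distances(coords)
    m = len(distances)  # number of pairs, M(M-1)/2
    total = 0.0
    for pair, d in distances.items():
        w = 1.0 if weights is None else weights[pair]
        total += w * (d - target[pair]) ** 2
    return total / m
```

Because F depends only on relative distances, translating or rotating the whole structure leaves its value unchanged, which is what removes the dependence on molecular position and orientation noted for absolute coordinates.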

[0024] In Equation 1, m is the total number of corresponding residue pairs between the target protein and the reference protein, M(M−1)/2. w_ij is a weighting coefficient: the conventional Wako-Scheraga method uses weights based on averaged statistical data, whereas this embodiment always sets w_ij = 1. d_ij is the relative distance between the i-th residue and the j-th residue. In the conventional Wako-Scheraga method, ⟨d_ij⟩ is the relative distance assigned to the corresponding i-th and j-th residue pair from averaged statistical data obtained from many proteins in the PDB. In this embodiment, by contrast, ⟨d_ij⟩ is the relative distance assigned to the i-th and j-th residue pair, corresponding to the target protein, in a reference protein of known three-dimensional structure selected, as described above, to have the highest degree of homology with the target protein.

[0025] Once the amino acid sequence is given, the function F of Equation 1 is uniquely determined. In this embodiment, random numbers are generated from a predetermined random seed IR, and values proportional to them are used as the coordinates of an initial structure of the target protein. The relative distances between the i-th and j-th residues in this initial structure are substituted for d_ij in F, F is optimized so that it is minimized, using for example the quasi-Newton method, and the coordinates of the Cα atoms of the target protein are calculated from the optimized F. In other words, the three-dimensional structure of the target protein is obtained by finding the coordinate variables x_1, y_1, z_1, x_2, y_2, z_2, ..., x_M, y_M, z_M of the target protein that minimize the sum of the squares of the differences between the relative distance between the i-th and j-th residues of the target protein and the relative distance assigned to the corresponding i-th and j-th residue pair of the reference protein. For residues in regions with an insertion or deletion, or with no homology, the relative distances from the statistical data of the conventional method are used.
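The specification names the quasi-Newton method only as an example. The sketch below swaps in plain gradient descent with a backtracking step, purely to keep the example free of external dependencies; it is not the optimizer of the embodiment. All names are hypothetical, every weight is taken as 1, and `target[i][j]` is assumed to hold the target distance for every pair.

```python
import math
import random


def objective(coords, target):
    """Equation 1 with all weights equal to 1."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total / (n * (n - 1) / 2)


def gradient(coords, target):
    """Partial derivatives of the objective with respect to each coordinate."""
    n = len(coords)
    m = n * (n - 1) / 2
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            scale = 2.0 * (d - target[i][j]) / (d * m)
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def minimize(coords, target, iterations=200):
    """Gradient descent; each step is halved until the objective decreases."""
    coords = [list(p) for p in coords]
    value = objective(coords, target)
    for _ in range(iterations):
        grad = gradient(coords, target)
        step = 1.0
        while step > 1e-12:
            trial = [
                [p[k] - step * g[k] for k in range(3)]
                for p, g in zip(coords, grad)
            ]
            trial_value = objective(trial, target)
            if trial_value < value:
                coords, value = trial, trial_value
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return coords, value
```

Since only distances enter the objective, the result is determined up to rotation, translation, and mirror image; comparing it with a known structure would require a superposition step not shown here.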

[0026] In the conventional Wako-Scheraga method, by contrast, the relative distances ⟨d_ij⟩ assigned to the corresponding i-th and j-th residue pairs come from averaged statistical data obtained from many proteins in the PDB. Because these data are very coarse, the three-dimensional structure of the target protein obtained from F is far from the actual one.

[0027] In this embodiment, when the initial structure of the target protein is obtained by generating random numbers from a predetermined seed IR and using values proportional to them as coordinates, two relations are used: Equation 3 below, which gives the radius of gyration Rg (Å) of a protein, obtained empirically by linear regression analysis of 14 globular proteins based on statistical data on proteins; and Equation 4 below, which gives the 3M coordinates (x_1, y_1, z_1, x_2, y_2, z_2, ..., x_M, y_M, z_M) of the initial structure (hereinafter denoted collectively by q_lm, where l = 1, 2, 3 and m = 1, 2, ..., M).

[0028]

[Equation 3]
$$\log R_g = 0.824 + 0.367 \log M$$

[Equation 4]
$$q_{lm} = 2 \times R_g \times \mathrm{rand}(IR)$$

[0029] Here, rand() is a known random number generating function that, given a random seed as its initial value, produces random values (0 ≤ value < 1). In Equation 4, to bring the coordinates of the initial structure of the target protein closer to the actual coordinates, the random value rand(IR) is multiplied by the radius of gyration Rg of the protein and then doubled to convert from radius to diameter scale, giving the coordinate values of the initial structure.
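Equations 3 and 4 can be sketched as below. The text does not state the base of the logarithm; the sketch assumes the natural logarithm, which gives roughly 12 Å for a 100-residue protein, whereas base 10 would give about 36 Å. The function names are hypothetical, and Python's `random.Random` stands in for the unspecified rand() generator.

```python
import math
import random


def radius_of_gyration(m_residues):
    """Equation 3, reading 'log' as the natural logarithm (an assumption)."""
    return math.exp(0.824 + 0.367 * math.log(m_residues))


def initial_structure(m_residues, seed):
    """Equation 4: each coordinate q_lm is 2 * Rg * (random value in [0, 1))."""
    rng = random.Random(seed)  # the seed plays the role of IR
    rg = radius_of_gyration(m_residues)
    return [
        [2.0 * rg * rng.random() for _ in range(3)]
        for _ in range(m_residues)
    ]
```

Using a seeded generator makes the initial structure reproducible, so different seeds can be tried as different starting points for the optimization.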

【００３０】Next, the method of estimating and calculating, along the amino acid sequence, the prediction accuracy of the three-dimensional structure of the target protein predicted by the above prediction method is described in detail. Because an amino acid sequence is represented as a "symbol string" using the one-letter codes shown in Table 1, the homology conventionally used is, so to speak, qualitative: it rests on similarities in the physical and chemical properties of amino acid residues, for example that isoleucine (I) resembles leucine (L) and glutamic acid (E) resembles aspartic acid (D). In the method of the present invention, the amino acid sequence is converted into a numerical sequence using a certain parameter, and the correlation function between the target protein and the reference protein is obtained using Equation 5 described later, so that homology is defined quantitatively as −1 ≦ degree of homology ≦ 1. That is, by defining the homology between the target protein and the reference protein quantitatively along the amino acid sequence and in a way that reflects the three-dimensional structure, the method estimates the local prediction accuracy of the three-dimensional structure from how high or low the homology is.

【００３１】As is well known, an amino acid sequence can be converted into a numerical sequence, as shown in Tables 2 to 7, using physicochemical parameters p specific to the 20 amino acids shown in Table 1, such as the polarity, partial specific volume, propensity to form a reverse turn, pK value of the α-amino group, pK value of the α-carboxyl group, and relative mutability listed below. In Tables 2 to 7, the values are listed in alphabetical order of the one-letter codes of the 20 amino acids shown in Table 1 (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y), and the two items in parentheses in each table are the name of the author of the paper that presented the values of the parameter and its year of publication.
(a) Polarity: the degree to which an amino acid residue is exposed on the protein surface.
(b) Partial specific volume: the change in volume of a solution when a unit mass of the amino acid is dissolved in an infinite amount of solution.
(c) Propensity to form a reverse turn: the degree to which an amino acid residue bends within a protein.
(d) pK value of the α-amino group (pK-N): the negative logarithm of the dissociation constant K of the α-amino group of the amino acid, that is, −log K.
(e) pK value of the α-carboxyl group (pK-C): the negative logarithm of the dissociation constant K of the α-carboxyl group of the amino acid, that is, −log K.
(f) Relative mutability: the relative substitution frequency of an amino acid residue.

【００３２】[0032]

【表２】 (Table 2) Polarity (Grantham, 1974)
────────────────────────────
8.10, 5.50, 13.00, 12.30, 5.20, 9.00, 10.40, 5.20, 11.30, 4.90,
5.70, 11.60, 8.00, 10.50, 10.50, 9.20, 8.60, 5.90, 5.40, 6.20
────────────────────────────

【００３３】[0033]

【表３】 (Table 3) Partial specific volume (Cohn and Edsall, 1943)
────────────────────────────
0.75, 0.61, 0.60, 0.66, 0.77, 0.64, 0.67, 0.90, 0.82, 0.90,
0.75, 0.61, 0.76, 0.67, 0.70, 0.68, 0.70, 0.86, 0.74, 0.71
────────────────────────────

【００３４】[0034]

【表４】 (Table 4) Propensity to form a reverse turn (Levitt, 1978)
────────────────────────────
0.77, 0.81, 1.41, 0.99, 0.59, 1.64, 0.68, 0.51, 0.96, 0.58,
0.41, 1.28, 1.91, 0.98, 0.88, 1.32, 1.04, 0.47, 0.76, 1.05
────────────────────────────

【００３５】[0035]

【表５】 (Table 5) pK value of the α-amino group (Sober, 1970)
────────────────────────────
9.69, 8.35, 9.60, 9.67, 9.18, 9.78, 9.17, 9.68, 9.18, 9.60,
9.21, 8.80, 10.64, 9.13, 8.99, 9.21, 9.10, 9.62, 9.44, 9.11
────────────────────────────

【００３６】[0036]

【表６】 (Table 6) pK value of the α-carboxyl group (Sober, 1970)
────────────────────────────
2.34, 1.92, 1.88, 2.10, 2.16, 2.35, 1.82, 2.36, 2.16, 2.36,
2.28, 2.02, 1.95, 2.17, 1.82, 2.19, 2.09, 2.32, 2.43, 2.20
────────────────────────────

【００３７】[0037]

【表７】 (Table 7) Relative mutability (Dayhoff, 1978)
────────────────────────────
1.00, 0.20, 1.06, 1.02, 0.41, 0.49, 0.66, 0.96, 0.56, 0.40,
0.94, 1.34, 0.56, 0.93, 0.65, 1.20, 0.97, 0.74, 0.18, 0.41
────────────────────────────
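The conversion of an amino acid sequence into a numerical sequence can be sketched as a table lookup. The example below uses the polarity values of Table 2 in the alphabetical one-letter order stated in the text; the names `AMINO_ACIDS`, `parameter_table`, `encode`, and `table_mean` are hypothetical helpers introduced only for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabetical one-letter codes, as in Tables 2 to 7

# Table 2: polarity (Grantham, 1974), listed in the order of AMINO_ACIDS.
POLARITY = [8.10, 5.50, 13.00, 12.30, 5.20, 9.00, 10.40, 5.20, 11.30, 4.90,
            5.70, 11.60, 8.00, 10.50, 10.50, 9.20, 8.60, 5.90, 5.40, 6.20]


def parameter_table(values):
    """Map each one-letter code to its value for one physicochemical parameter."""
    return dict(zip(AMINO_ACIDS, values))


def encode(sequence, table):
    """Convert a one-letter amino acid sequence into a list of parameter values."""
    return [table[residue] for residue in sequence]


def table_mean(table):
    """Mean of the parameter over the 20 amino acids."""
    return sum(table.values()) / len(table)
```

For instance, `encode("VVGG", parameter_table(POLARITY))` looks up valine and glycine in Table 2; the other five tables can be handled the same way.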

【００３８】Accordingly, in the alignment showing the correspondence between the amino acid sequences of the target protein X and the reference protein Y, a correlation function Cp(i), which expresses the correlation between residue number i of the target protein X and the corresponding residue number j of the reference protein Y, is defined by the following Equation 5.

【００３９】[0039]

【数５】 (Equation 5)

【００４０】Here, x_p(m) and y_p(n) are the values of the physicochemical parameter p for the amino acid residues at residue number m of the target protein X and at residue number n of the reference protein Y, respectively; <c_p> is, for example, the mean value of the parameter p over the 20 amino acids of Table 1; and k is set to 5, for example. At positions that have no corresponding residue pair because of an insertion or deletion, the value of the correlation function is set to 0.

【００４１】The target protein X and the reference protein Y are each converted into a numerical sequence using one physicochemical parameter, and the value of the correlation function Cp(i) is obtained from Equation 5 over a segment (fragment) of 11 residues when k = 5, with the correspondence shown in FIG. 5. The sum Σ in Equation 5 runs from l = −k to l = +k; when k = 5, for example, the value of the correlation function Cp(i) is obtained as follows.
(a) When i = 1 (N-terminus), as shown at 31 in FIG. 5, Cp(i) is calculated between the six residues i, i+1, …, i+5 of the target protein X and the six residues j, j+1, …, j+5 of the reference protein Y, respectively.
(b) When i = 2, as shown at 32 in FIG. 5, Cp(i) is calculated between the seven residues i−1, i, i+1, …, i+5 of the target protein X and the seven residues j−1, j, j+1, …, j+5 of the reference protein Y, respectively.
(c) Conversely, when i is the last residue (C-terminus), Cp(i) is calculated between the six residues i−5, …, i−1, i of the target protein X and the six residues j−5, …, j−1, j of the reference protein Y, respectively.
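The windowing described in (a) to (c) can be sketched as follows. Equation 5 itself is not reproduced in this text, so the normalized form used here (a sum of products of deviations from <c_p>, divided by the square root of the product of the two sums of squares) is an assumption, chosen because it yields values between −1 and 1 as the description requires. The function name, the use of alignment columns with `None` at gaps, and the skipping of gapped columns inside a window are likewise illustrative assumptions.

```python
import math


def windowed_correlation(x, y, mean_p, k=5):
    """Correlation value per alignment column.

    x, y: parameter values of target and reference along the alignment
    columns, with None where that chain has a gap.
    """
    n = len(x)
    result = []
    for i in range(n):
        if x[i] is None or y[i] is None:
            result.append(0.0)  # no corresponding residue pair: value set to 0
            continue
        sxy = sxx = syy = 0.0
        # The window l = -k .. +k is truncated at the N- and C-termini.
        for col in range(max(0, i - k), min(n, i + k + 1)):
            if x[col] is None or y[col] is None:
                continue  # assumption: gapped columns inside the window are skipped
            dx, dy = x[col] - mean_p, y[col] - mean_p
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
        denom = math.sqrt(sxx * syy)
        result.append(sxy / denom if denom > 0.0 else 0.0)
    return result
```

With k = 5 the first column uses six columns and the second uses seven, matching cases (a) and (b) above when the window contains no gaps.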

【００４２】Further, in the present invention, in order to remove noise and improve the S/N ratio, the average of the calculated correlation function Cp(i) over a plurality of physicochemical parameters p (hereinafter referred to as the average value of the correlation function), <C(i)>, is given by the following Equation 6.

【００４３】[0043]

【数６】 (Equation 6)

【００４４】Accordingly, in the present invention, Equation 6 is applied to parameters that reflect the three-dimensional structure and are as independent as possible, that is, mutually uncorrelated, for example the six physicochemical parameters above (n = 6 in this case), to calculate a correlation value that indicates the degree of one-dimensional homology along the amino acid sequence. An example in which the target protein is elastase and the reference protein is chymotrypsinogen is described below.
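The averaging over parameters can be sketched as below. Equation 6 is not reproduced in this text; the sketch assumes it is the plain arithmetic mean of Cp(i) over the n parameters, as the wording "average value" suggests, and the function name is hypothetical.

```python
def average_correlation(per_parameter):
    """Average Cp(i) over parameters.

    per_parameter: list of n lists, each holding Cp(i) for one parameter p
    over the same residue positions i.
    """
    n = len(per_parameter)
    return [sum(values) / n for values in zip(*per_parameter)]
```

With the six parameters of Tables 2 to 7, `per_parameter` would hold six lists of equal length, one per table.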

【００４５】FIG. 6 is a stereo diagram in which the three-dimensional structure of the target protein elastase predicted by the above prediction method (solid line) is optimally superimposed on the three-dimensional structure measured by X-ray crystallography (dotted line). FIG. 7, based on FIG. 6, plots against residue number the deviation of each Cα atom of the predicted structure of the target protein elastase from the corresponding Cα atom of the X-ray crystal structure, that is, the distance l_ii between the positions of the two Cα atoms (hereinafter simply referred to as the deviation), calculated using the following Equation 7.

【００４６】[0046]

【数７】 (Equation 7)
$$l_{ii} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

【００４７】Here, (x₁, y₁, z₁) are the three-dimensional coordinates of the target protein, and (x₂, y₂, z₂) are the three-dimensional coordinates of the reference protein. As is clear from FIG. 7, the deviation l_ii is small in regions where the prediction accuracy is high and large in regions where the prediction accuracy is low.
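Equation 7 applied residue by residue can be sketched as follows. The sketch assumes the two coordinate sets have already been optimally superimposed, as in FIG. 6; the superposition step itself is not shown, and the function name is hypothetical.

```python
import math


def ca_deviations(predicted, observed):
    """Equation 7 per residue.

    predicted, observed: lists of (x, y, z) Cα coordinates in the same
    residue order, assumed to be superimposed beforehand.
    """
    if len(predicted) != len(observed):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(p, q) for p, q in zip(predicted, observed)]
```

Plotting the returned list against residue number gives a profile of the kind shown in FIG. 7.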

【００４８】FIG. 8 plots the average value of the correlation function <C(i)>, calculated using Equation 6, against residue number. Comparison with FIG. 7 shows that, between the target protein elastase and the reference protein chymotrypsinogen, regions of residue numbers where the homology is high (that is, the correlation value is high) have high prediction accuracy for the three-dimensional structure (the deviation l_ii is small), whereas regions where the homology is low (the correlation value is low) have low prediction accuracy (the deviation l_ii is large). Thus, with the definition of homology based on Equation 6, the local prediction accuracy of a three-dimensional structure predicted from amino acid sequence information alone can be estimated.

【００４９】In the above embodiment, the six physicochemical parameters above are used to estimate the local prediction accuracy of the three-dimensional structure. The present inventor found them empirically, so to speak, by searching for parameters that give high correlation values in the regions of residue numbers corresponding to structurally similar parts. As the physicochemical parameters used in the present invention, at least one of these six physicochemical parameters is preferably used.

【００５０】In addition to the relationship between the deviation l_ii and the average value of the correlation function <C(i)> for the target protein elastase and the reference protein chymotrypsinogen described above, the present inventor has confirmed that the local prediction accuracy of the three-dimensional structure can be estimated for the following pairs of target and reference proteins.
(a) Target protein hemoglobin α-chain and reference protein hemoglobin β-chain.
(b) Target protein α-lytic protease and reference protein protease A.
(c) Target protein endothiapepsin and reference protein penicillopepsin.

【００５１】Next, the configuration of the apparatus of this embodiment for predicting the three-dimensional structure of a target protein and calculating the prediction accuracy is described with reference to FIG. 1. As shown in FIG. 1, the following devices are connected via a bus to a microprocessing unit (hereinafter referred to as the MPU) 10, an arithmetic and control unit that executes the processing of FIGS. 2 and 3.
(a) ROM 11: stores the program data of FIGS. 2 and 3 and the data needed to execute the program.
(b) RAM 12: memory providing the storage area used as a work area for executing the processing of FIGS. 2 and 3.
(c) Keyboard 13: an input device for entering instruction commands for executing the processing of FIGS. 2 and 3, the name and amino acid sequence of the target protein, the value of the random number seed for the predicted coordinate calculation processing, and the like.
(d) CRT display 14: a display device that displays the processing in progress and the calculation results of FIGS. 2 and 3.
(e) Printer 15: a printing device that prints the calculation results of the processing of FIGS. 2 and 3.
(f) Calculation result file 20: a storage device, composed of a hard disk for example, for storing the calculation result data of the processing of FIGS. 2 and 3.
(g) Protein database (PDB) file 21: a storage device, composed of a hard disk for example, holding a database file that contains the spatial coordinates (x, y, z) determined for each amino acid residue of various proteins by known X-ray crystallography and NMR analyses. For each protein that can serve as a reference protein in this embodiment, it stores the protein name, the amino acid sequence (in three-letter and one-letter codes), the number of residues, and the spatial coordinates of each residue.

【００５２】FIG. 2 is a flowchart showing the protein three-dimensional structure prediction and accuracy calculation processing executed by the apparatus of FIG. 1. This processing is described below with reference to an example in which the target protein is elastase and the reference protein is chymotrypsinogen. For convenience in editing the specification, Tables 8 to 24, which are printed output results, are placed at the end of the embodiment, and any table whose data exceed one page in length is divided into several tables.

【００５３】In FIG. 2, first, in step S1, the name and amino acid sequence of the target protein whose coordinates are to be predicted are entered using the keyboard 13, and then homology search processing is executed in step S2. In the homology search processing, the PDB is searched for reference proteins that have a sequence similar to the amino acid sequence of the target protein and a known three-dimensional structure. Among the reference protein candidates whose degree of homology, defined as the proportion of corresponding residue pairs in the whole, is at least 30%, for example, the protein with the highest degree of homology is selected as the reference protein in this embodiment. Next, in step S3, the following items are printed, as shown in Tables 8 to 11.
(a) Table 8: the name and number of residues of the target protein.
(b) Tables 9 to 11: for each of the reference protein candidates found above:
(b-1) the name of the reference protein;
(b-2) the corresponding residue numbers of the target protein and the reference protein;
(b-3) the degree of homology (%);
(b-4) the correspondence between the amino acid sequences of the target protein and the reference protein. Here amino acids are shown in one-letter code, with the target protein on the upper line and the reference protein on the lower line, and every tenth residue number is marked with "*". A ":" marks positions where the amino acids of the target protein and the reference protein match, and a "-" marks positions where an amino acid is deleted.

【００５４】In this example, three reference protein candidates are found: [1] chymotrypsinogen A, [2] trypsinogen, and [3] chymotrypsinogen*A. In this embodiment, chymotrypsinogen*A (hereinafter simply called chymotrypsinogen), which has the highest degree of homology among these candidates, is selected as the reference protein.

【００５５】Next, in step S4, the amino acid sequences of the selected reference protein (chymotrypsinogen) and the target protein (elastase) are compared, and in this embodiment the corresponding homologous residue pairs, each a stretch of five or more residues with no insertion or deletion (-), are cut out. In step S5, the number of corresponding residue pairs and a table of corresponding residue numbers of the target and reference proteins are printed, as shown in Table 12. As is clear from Table 11, the residue pairs that are cut out are as follows.
(a) Residues 1-23 of the target protein are homologous to residues 16-38 of the reference protein; this is called residue pair RP1.
(b) Residues 28-53 of the target protein are homologous to residues 40-65 of the reference protein; this is called residue pair RP2.
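The cutting-out step can be sketched as a scan over the two aligned strings. The sketch assumes that a residue pair is a run of at least five aligned columns with no "-" in either chain; the function name, argument names, and tuple layout are hypothetical.

```python
def gap_free_segments(target_aln, ref_aln, t_start=1, r_start=1, min_len=5):
    """Return (t_first, t_last, r_first, r_last) for each gap-free run.

    target_aln, ref_aln: aligned one-letter sequences of equal length,
    with "-" marking a deleted residue. t_start and r_start are the residue
    numbers of the first residue of each chain in the alignment.
    """
    segments = []
    t_pos, r_pos = t_start, r_start  # residue number of the next residue in each chain
    run = 0                          # length of the current gap-free run
    for a, b in zip(target_aln, ref_aln):
        if a != "-" and b != "-":
            run += 1
            t_pos += 1
            r_pos += 1
            continue
        # A gap column closes the current run; keep it only if it is long enough.
        if run >= min_len:
            segments.append((t_pos - run, t_pos - 1, r_pos - run, r_pos - 1))
        run = 0
        if a != "-":
            t_pos += 1
        if b != "-":
            r_pos += 1
    if run >= min_len:
        segments.append((t_pos - run, t_pos - 1, r_pos - run, r_pos - 1))
    return segments
```

Applied to the first 32 columns of the Table 9 alignment (target starting at residue 1, reference at residue 16), the scan yields target residues 1-23 against reference residues 16-38 as its first segment, the same ranges as RP1, and drops the isolated W/F pair between two gaps.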

【００５６】Residue pairs RP3 to RP8 are obtained in the same way. These corresponding residue pairs RP1 to RP8 are shown in FIG. 4. In this embodiment, as shown in FIG. 4, the relative distance between the Cα atom of residue i' and the Cα atom of residue j' of the reference protein, whose coordinates are known, is assigned as the relative distance <d_ij> between the Cα atoms of the corresponding residues i and j of the target protein, and the three-dimensional structure of the target protein is predicted on that basis. Then, in step S6, the predicted coordinate calculation processing of FIG. 3 is executed to calculate the coordinates of the Cα atoms of the target protein, and the results are displayed on the CRT display 14 and printed using the printer 15.
【００５７】Next, in step S7, based on the values of the physicochemical parameters p specific to the 20 amino acids, the value of the correlation function Cp(i) between the target protein and the reference protein is calculated for one physicochemical parameter p using Equation 5, and then likewise for each of the other five physicochemical parameters. After Cp(i) has thus been calculated for all six physicochemical parameters p, the average value of the correlation function <C(i)> is calculated for each residue number i of the target protein using Equation 6.

【００５８】Then, in step S8, the average value of the correlation function <C(i)> calculated in step S7 is displayed on the CRT display 14 and printed using the printer 15 in the format of FIG. 8, that is, as a line graph of <C(i)> against residue number i of the target protein. From FIG. 8, the local prediction accuracy of the predicted coordinates of the three-dimensional structure of the target protein can be estimated one-dimensionally along the residue numbers, based on amino acid sequence information alone. Primary structures can be compared in this way to extract the parts that are likely to have similar three-dimensional structures. The method also makes it possible to examine how high the value of the correlation function must be, and how long the chain must be, for agreement of the three-dimensional structures to be expected.

【００５９】FIG. 3 is a flowchart showing the predicted coordinate calculation processing (step S6), a subroutine of FIG. 2. As shown in FIG. 3, first, in step S11, based on the residue pairs RP1 to RP8 cut out in step S4, the relative distance d_i'j' between corresponding homologous regions of the reference protein is calculated and set as the corresponding relative distance <d_ij> of the target protein; this is repeated over i and j under the condition i < j. Next, in step S12, the random number seed IR is entered. In step S13, random values (0 ≦ random value < 1) are generated from the entered seed IR, and Equation 4 is applied: so that the coordinates of the initial structure of the target protein approximate the actual coordinates more closely, each random value is multiplied by the radius of gyration Rg of the protein and then doubled to put it on the scale of the diameter, giving the coordinates q_lm of the initial structure of the target protein. The relative distances d_ij of the initial structure are calculated from these coordinates, again repeating over i and j under the condition i < j. Then, in step S14, the value of the function F of Equation 1 is calculated for the initial structure.
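Step S11 can be sketched as follows. The sketch assumes that a distance is assigned to every pair i < j of target residues that have a counterpart in the reference protein, including pairs lying in different segments; the function name and the data layout (a dictionary of reference Cα coordinates keyed by residue number, and segment tuples of the form (t_first, t_last, r_first, r_last)) are hypothetical.

```python
import math


def target_distances(ref_ca, segments):
    """Assign reference Cα-Cα distances to the corresponding target pairs.

    ref_ca: dict mapping reference residue number -> (x, y, z).
    segments: tuples (t_first, t_last, r_first, r_last) of equal-length ranges.
    Returns a dict mapping (i, j) with i < j to the assigned distance <d_ij>.
    """
    mapping = {}  # target residue number -> reference residue number
    for t_first, t_last, r_first, _r_last in segments:
        for offset in range(t_last - t_first + 1):
            mapping[t_first + offset] = r_first + offset
    targets = sorted(mapping)
    return {
        (i, j): math.dist(ref_ca[mapping[i]], ref_ca[mapping[j]])
        for a, i in enumerate(targets)
        for j in targets[a + 1:]
    }
```

The resulting dictionary holds the values <d_ij> that the function F of Equation 1 compares with the distances d_ij of the trial structure.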

【００６０】Next, in step S15, the function F of Equation 1 for the target protein is optimized so that its value is minimized, using for example the known quasi-Newton method, which minimizes a function by calculating only first-order partial derivatives and not second-order partial derivatives, and the function value F is calculated. Further, in step S16, the predicted coordinate variables x₁, y₁, z₁, x₂, y₂, z₂, …, x_M, y_M, z_M of the target protein, which are the variables of the function F, are calculated based on the optimized function F. The function value F of the initial structure and that of the optimized structure are calculated in steps S14 and S15 in order to assess the degree of optimization: when the function value F after optimization is sufficiently small compared with the function value F of the initial structure, the degree of optimization can be judged to be high.

【００６１】Further, in step S17, the output results are printed in the standard output format shown in Tables 13 and 14, and then, in step S18, the calculated predicted coordinates of the target protein shown in Tables 15 to 24 are printed.

【００６２】In the standard output format, the following information is printed, as shown in Tables 13 and 14.
(a) The name and amino acid sequence (one-letter code) of the target protein.
(b) The name and amino acid sequence (three-letter code) of the reference protein.
(c) The corresponding homologous residue numbers of the target and reference proteins.
(d) The value of the random number seed.
(e) The number of iterations of the optimization calculation in step S15.
(f) The function value F of the initial structure of the target protein calculated in step S14.
(g) The function value F of the predicted structure of the target protein after optimization, calculated in step S15.

【００６３】Further, in the calculation results of the predicted coordinates, the following information is printed, as shown in Tables 15 to 24.
(a) The name of the target protein.
(b) The following data for each residue of the target protein:
(b-1) the amino acid (three-letter code);
(b-2) the residue number;
(b-3) the predicted x, y, z coordinate values.

【００６４】As described above, according to this embodiment, a protein with the highest or a relatively high degree of homology is selected from the PDB as the reference protein, the amino acid sequences of the selected reference protein and the target protein are compared, and a plurality of homologous residue pairs are cut out. Based on these residue pairs, the relative distances between the atoms of residues with different residue numbers in the reference protein are calculated and set as the corresponding relative distances <d_ij> of the target protein. Random numbers are generated from a predetermined seed IR, values proportional to them are set as the coordinate values of the initial structure of the target protein, and the relative distances d_ij of the initial structure are calculated. The function F of the target protein is then optimized so that its value is minimized, giving the predicted coordinates of the target protein, which are the variables of F. Consequently, the calculation is simpler than in conventional methods, and the coordinates of the target protein can be calculated with higher accuracy, that is, closer to the actual coordinates, than with conventional methods such as that of Wako and Scheraga.

【００６５】[0065] Furthermore, by calculating and displaying the average value <C(i)> of the correlation function for residue number i of the target protein using Formulas 5 and 6 above, the local prediction accuracy of the predicted three-dimensional coordinates of the target protein can be estimated one-dimensionally along the residue numbers, based only on amino acid sequence information. With this method, primary structures can be compared and regions whose three-dimensional structures are likely to be similar can be extracted. The method also makes it possible to examine how large the correlation function value must be, and how long the chain must be, for agreement of the three-dimensional structures to be expected.
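A per-residue averaged correlation of this kind might be sketched as below. Formulas 5 and 6 are not reproduced in this passage, so the sketch assumes a Pearson correlation over a sliding window of aligned residues, computed once per physicochemical parameter and then averaged. The window size, the function names, and both parameter tables are illustrative placeholders.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(target, reference, scales, half_window=2):
    """Average windowed correlation per residue position (assumed form of <C(i)>).

    target and reference are equal-length, gap-free aligned sequences;
    scales is a list of dicts mapping a one-letter code to a parameter value.
    """
    if len(target) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for i in range(len(target)):
        lo = max(0, i - half_window)
        hi = min(len(target), i + half_window + 1)
        per_parameter = []
        for scale in scales:
            xs = [scale[a] for a in target[lo:hi]]
            ys = [scale[a] for a in reference[lo:hi]]
            per_parameter.append(pearson(xs, ys))
        profile.append(sum(per_parameter) / len(per_parameter))
    return profile


# Placeholder parameter tables for a five-letter alphabet (illustrative only).
scale_a = {"A": 1.8, "G": -0.4, "K": -3.9, "L": 3.8, "V": 4.2}
scale_b = {"A": 1.0, "G": 0.0, "K": 4.0, "L": 3.0, "V": 2.0}

same = mean_correlation("AGKLVGA", "AGKLVGA", [scale_a, scale_b])
different = mean_correlation("AGKLVGA", "VLAGKAG", [scale_a, scale_b])
print([round(c, 2) for c in same])
print([round(c, 2) for c in different])
```

Plotting such a profile against the residue number gives a one-dimensional curve in which high values mark segments where the two sequences vary in the same way.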

【００６６】[0066]

【表８】[Table 8]
Target protein: elastase porcine ec 3.4.21.36
No. of residues: 240
Fragmentation from 1 to 240

【００６７】[0067]

【表９】[Table 9]
[ 1] ( 1 - 240) 240 elastase porcine ec 3.4.21.36
     ( 16 - 226) 226 CHYMOTRYPSINOGEN A code name: 1CHG
     cor. coef.= 0.415 ( 37.9%)

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
VVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNNGTEQ
 : : ::   ::: : :::   :    : ::: ::  ::: :::::     :  :::
IVNGEEAVPGSWPWQVSLQDKTG--F-HFCGGSLINENWVVTAAHC-GVT-TSDVVVAGEKIQKLK-IAK

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
YVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQ
 :   :     : :        :: :: :         :    :: :    :    :  ::::  :
-V-F-KNS-K-Y-NSLTINN--DITLLKLSTAASFSQTVSAVCLPSASDDFAAGTTCVTTGWG--R----

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
LAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGNGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVS
: :  :   ::      :    :::   :  : :::  :: : : ::::::: :  ::     :  :  :
L-Q--QAS-LPLLSNTNCKK--YWGTKIKDAMICAGASGV-SSCMGDSGGPLVCKKNGAWTLVGIVSWGS

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
RLGCNVTRKPTVFTRVSAYISWINNVIASN
   :  :  : :  :: :   :     : :
S-TCS-TSTPGVYARVTALVNWVQQTLAAN

【００６８】[0068]

【表１０】[Table 10]
[ 2] ( 17 - 240) 240 elastase porcine ec 3.4.21.36
     ( 10 - 222) 222 TRYPSINOGEN code name: 1TGN
     cor. coef.= 0.452 ( 35.7%)

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
SLQYRSGSSWA-HTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNNGTEQYVGVQKIVVHPYWNT
   :        : ::: ::   ::  ::::        :  :: : :   : ::     :  :::  :
TVPYQVSLNSGYHFCGGSLINSQWVVSAAHCY-KS-GIQVRLGEDNINVVEGNEQFISASKSIVHPSYNS

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
DDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQ-LAQTLQQAYLPTVD
       :: :  :     ::: :    ::           : : ::: :   :      :     :
NTLNN--DIMLIKLKSAASLNSRVASISLP-TSCA-SAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILS

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
YAICSSSSYWGSTVKNSMVCAG-GNGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGCNVTRKPTVF
   : :  :        : :::   :    ::::::::  :   :     :  :  :  ::    :: :
DSSCKSA-Y-PGQITSNMFCAGYLEGGKDSCQGDSGGPVVC--SGK--LQGIVSWGS--GCAQKNKPGVY

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
TRVSAYISWINNVIASN
: :  : :::   ::::
TKVCNYVSWIKQTIASN

【００６９】[0069]

【表１１】[Table 11]
[ 3] ( 1 - 240) 240 elastase porcine ec 3.4.21.36
     ( 16 - 245) 245 CHYMOTRYPSINOGEN *A code name: 2CGA
     cor. coef.= 0.502 ( 39.0%)

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
VVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNNGTEQ
 : : ::   ::: : :::   :    : ::: ::  ::: :::::        :: ::          :
IVNGEEAVPGSWPWQVSLQDKTG--F-HFCGGSLINENWVVTAAHCGVTTSDV-VVAGEFDQGSSSEKIQ

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
YVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTR-TNG
     :                :: :: :         :    :: :    :    :  ::::::: ::
KLKIAKVFKNS--KYNSLTINNDITLLKLSTAASFSQTVSAVCLPSASDDFAAGTTCVTTGWGLTRYTNA

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
QLAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGNGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFV
     :::: ::      :    :::   :  : :::  ::: : : ::::::: :  ::     :  :
NTPDRLQQASLPLLSNTNCKK--YWGTKIKDAMICAGASGV-SSCMGDSGGPLVCKKNGAWTLVGIVSWG

    +    *    +    *    +    *    +    *    +    *    +    *    +    *
SRLGCNVTRKPTVFTRVSAYISWINNVIASN
:   :  :  : :  :: :   :     : :
SS-TCS-TSTPGVYARVTALVNWVQQTLAAN
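The ":" marks in Tables 9 to 11 fall on columns where the two aligned residues are identical. Counting them gives 91 of 240, 81 of 227, and 94 of 241 columns, which is consistent with the printed 37.9%, 35.7%, and 39.0%. The following minimal sketch regenerates a match line and the percentage for the last row of Table 10; the function names are illustrative, and reading the percentage as identical columns over alignment length is an inference from these counts rather than a definition stated in the text.

```python
def match_line(target_row, reference_row):
    """Return ':' under identical residues and a space elsewhere (gaps never match)."""
    if len(target_row) != len(reference_row):
        raise ValueError("aligned rows must have equal length")
    return "".join(":" if a == b and a != "-" else " "
                   for a, b in zip(target_row, reference_row))


def percent_identity(target_row, reference_row):
    """Identical columns as a percentage of all alignment columns."""
    marks = match_line(target_row, reference_row)
    return 100.0 * marks.count(":") / len(marks)


# Last alignment row of Table 10 (elastase versus trypsinogen 1TGN).
target_row = "TRVSAYISWINNVIASN"
reference_row = "TKVCNYVSWIKQTIASN"

marks = match_line(target_row, reference_row)
print(target_row)
print(marks)
print(reference_row)
print(f"{percent_identity(target_row, reference_row):.1f}%")
```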

【００７０】[0070]

【表１２】 [Table 12]

【００７１】[0071]

【表１３】[Table 13]
Protein name: elastase porcine ec 3.4.21.36
Sequence:
vvggteaqrn swpsqislqy rsgsswahtc ggtlirqnwv mtaahcvdre ltfrvvvgeh
nlnqnngteq yvgvqkivvh pywntddvaa gydiallrla qsvtlnsyvq lgvlpragti
lannspcyit gwgltrtngq laqtlqqayl ptvdyaicss ssywgstvkn smvcaggngv
rsgcqgdsgg plhclvngqy avhgvtsfvs rlgcnvtrkp tvftrvsayi swinnviasn

【００７２】[0072]

【表１４】[Table 14]
[Suplementary information]
Reference protein: CHYMOTRYPSINOGEN *A 2CGA 4
Sequence:
CYS GLY VAL PRO ALA ILE GLN PRO VAL LEU SER GLY LEU SER ARG
ILE VAL ASN GLY GLU GLU ALA VAL PRO GLY SER TRP PRO TRP GLN
VAL SER LEU GLN ASP LYS THR GLY PHE HIS PHE CYS GLY GLY SER
LEU ILE ASN GLU ASN TRP VAL VAL THR ALA ALA HIS CYS GLY VAL
THR THR SER ASP VAL VAL VAL ALA GLY GLU PHE ASP GLN GLY SER
SER SER GLU LYS ILE GLN LYS LEU LYS ILE ALA LYS VAL PHE LYS
ASN SER LYS TYR ASN SER LEU THR ILE ASN ASN ASP ILE THR LEU
LEU LYS LEU SER THR ALA ALA SER PHE SER GLN THR VAL SER ALA
VAL CYS LEU PRO SER ALA SER ASP ASP PHE ALA ALA GLY THR THR
CYS VAL THR THR GLY TRP GLY LEU THR ARG TYR THR ASN ALA ASN
THR PRO ASP ARG LEU GLN GLN ALA SER LEU PRO LEU LEU SER ASN
THR ASN CYS LYS LYS TYR TRP GLY THR LYS ILE LYS ASP ALA MET
ILE CYS ALA GLY ALA SER GLY VAL SER SER CYS MET GLY ASP SER
GLY GLY PRO LEU VAL CYS LYS LYS ASN GLY ALA TRP THR LEU VAL
GLY ILE VAL SER TRP GLY SER SER THR CYS SER THR SER THR PRO
GLY VAL TYR ALA ARG VAL THR ALA LEU VAL ASN TRP VAL GLN GLN
THR LEU ALA ALA ASN
Residue pairs:
1- 23: 16- 38 | 28- 53: 40- 65 | 55- 81: 66- 92 | 84-136: 93-145 |
137-160:147-170 | 163-180:171-188 | 182-211:189-218 | 217-240:222-245 |
Random number generation: 2
Minop has exhausted 300calls of calcfg
** Initial conformation: F= 0.521134E+07
** Final conformation: F= 0.323426E+04
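The "Residue pairs" entries in Table 14 list each residue range of the target protein followed by the matching range of the reference protein. A minimal sketch of turning that printout into a per-residue lookup is shown below; the function name and the regular expression are illustrative choices, and only the range text itself comes from the table.

```python
import re

# Residue-pair text as printed in Table 14 (target range : reference range).
RESIDUE_PAIRS = (
    "1- 23: 16- 38 | 28- 53: 40- 65 | 55- 81: 66- 92 | 84-136: 93-145 | "
    "137-160:147-170 | 163-180:171-188 | 182-211:189-218 | 217-240:222-245 |"
)

RANGE_PAIR = re.compile(r"(\d+)\s*-\s*(\d+)\s*:\s*(\d+)\s*-\s*(\d+)")


def parse_residue_pairs(text):
    """Map every paired target residue number to its reference residue number."""
    mapping = {}
    for groups in RANGE_PAIR.findall(text):
        t_start, t_end, r_start, r_end = (int(g) for g in groups)
        if t_end - t_start != r_end - r_start:
            raise ValueError(f"range lengths differ: {groups}")
        for offset in range(t_end - t_start + 1):
            mapping[t_start + offset] = r_start + offset
    return mapping


pairs = parse_residue_pairs(RESIDUE_PAIRS)
print(len(pairs), "paired residues")
print("target 1 ->", pairs[1], "| target 240 ->", pairs[240])
```

Target residues that fall between the listed ranges, such as 24 to 27, have no entry in the mapping and therefore contribute no reference distance.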

【００７３】[0073]

【表１５】[Table 15]
HEADER elastase porcine ec 3.4.21.36
COMPND
ATOM 1 CA VAL 1 18.851 -2.532 20.987
ATOM 2 CA VAL 2 15.303 -2.327 19.641
ATOM 3 CA GLY 3 14.268 1.317 20.378
ATOM 4 CA GLY 4 16.678 2.285 23.284
ATOM 5 CA THR 5 15.808 2.957 26.986
ATOM 6 CA GLU 6 14.409 5.605 29.237
ATOM 7 CA ALA 7 16.874 8.289 30.295
ATOM 8 CA GLN 8 17.404 9.265 33.926
ATOM 9 CA ARG 9 15.443 12.444 34.524
ATOM 10 CA ASN 10 17.432 15.682 34.242
ATOM 11 CA SER 11 20.388 13.904 32.727
ATOM 12 CA TRP 12 20.195 15.463 29.240
ATOM 13 CA PRO 13 19.813 19.086 30.382
ATOM 14 CA SER 14 20.126 20.787 26.964
ATOM 15 CA GLN 15 17.127 18.902 25.531
ATOM 16 CA ILE 16 14.106 21.076 24.752
ATOM 17 CA SER 17 10.689 20.574 23.259
ATOM 18 CA LEU 18 9.093 22.827 20.699
ATOM 19 CA GLN 19 5.414 23.298 20.964
ATOM 20 CA TYR 20 2.907 25.510 19.275
ATOM 21 CA ARG 21 0.476 28.157 20.613
ATOM 22 CA SER 22 -2.005 25.426 21.582
ATOM 23 CA GLY 23 0.535 22.932 22.760
ATOM 24 CA SER 24 0.467 21.756 19.096
ATOM 25 CA SER 25 2.042 22.804 15.793

【００７４】[0074]

【表１６】[Table 16]
ATOM 26 CA TRP 26 3.969 19.750 14.634
ATOM 27 CA ALA 27 2.807 17.611 17.585
ATOM 28 CA HIS 28 4.789 18.927 20.563
ATOM 29 CA THR 29 6.194 19.127 17.020
ATOM 30 CA CYS 30 9.957 19.301 17.210
ATOM 31 CA GLY 31 12.964 19.010 19.458
ATOM 32 CA GLY 32 15.944 21.369 20.022
ATOM 33 CA THR 33 19.182 21.917 22.057
ATOM 34 CA LEU 34 20.207 24.855 24.311
ATOM 35 CA ILE 35 23.569 26.252 23.416
ATOM 36 CA ARG 36 23.218 28.969 26.009
ATOM 37 CA GLN 37 20.595 30.780 28.048
ATOM 38 CA ASN 38 19.501 32.806 24.995
ATOM 39 CA TRP 39 19.762 30.486 22.041
ATOM 40 CA VAL 40 18.339 27.129 20.770
ATOM 41 CA MET 41 19.501 25.071 17.786
ATOM 42 CA THR 42 16.988 23.035 15.819
ATOM 43 CA ALA 43 16.242 21.851 12.273
ATOM 44 CA ALA 44 15.217 24.288 9.484
ATOM 45 CA HIS 45 12.578 21.796 8.167
ATOM 46 CA CYS 46 10.856 22.238 11.555
ATOM 47 CA VAL 47 9.278 25.470 10.248
ATOM 48 CA ASP 48 9.424 27.290 13.625
ATOM 49 CA ARG 49 7.681 30.673 14.011
ATOM 50 CA GLU 50 7.681 33.203 16.785

【００７５】[0075]

【表１７】[Table 17]
ATOM 51 CA LEU 51 4.274 31.867 17.943
ATOM 52 CA THR 52 5.811 28.550 18.988
ATOM 53 CA PHE 53 7.071 28.006 22.591
ATOM 54 CA ARG 54 8.397 24.718 21.106
ATOM 55 CA VAL 55 10.286 26.413 23.928
ATOM 56 CA VAL 56 9.932 24.014 26.848
ATOM 57 CA VAL 57 12.973 23.250 28.996
ATOM 58 CA GLY 58 13.598 20.851 31.890
ATOM 59 CA GLU 59 11.025 18.298 30.786
ATOM 60 CA HIS 60 11.118 14.540 31.293
ATOM 61 CA ASN 61 7.653 13.062 31.184
ATOM 62 CA LEU 62 5.258 14.966 29.043
ATOM 63 CA ASN 63 2.358 13.384 30.844
ATOM 64 CA GLN 64 3.344 15.002 34.151
ATOM 65 CA ASN 65 2.043 18.438 35.406
ATOM 66 CA ASN 66 4.203 18.063 38.430
ATOM 67 CA GLY 67 7.441 18.996 36.718
ATOM 68 CA THR 68 9.139 22.298 37.292
ATOM 69 CA GLU 69 9.719 23.150 33.610
ATOM 70 CA GLN 70 10.445 26.522 31.865
ATOM 71 CA TYR 71 8.236 27.651 29.000
ATOM 72 CA VAL 72 10.232 30.118 26.973
ATOM 73 CA GLY 73 9.090 32.602 24.329
ATOM 74 CA VAL 74 10.871 33.193 21.048
ATOM 75 CA GLN 75 12.145 36.637 19.897

【００７６】[0076]

【表１８】[Table 18]
ATOM 76 CA LYS 76 13.419 35.909 16.359
ATOM 77 CA ILE 77 13.963 32.885 14.160
ATOM 78 CA VAL 78 17.058 32.573 11.944
ATOM 79 CA VAL 79 16.921 29.883 9.269
ATOM 80 CA HIS 80 20.294 29.135 7.395
ATOM 81 CA PRO 81 19.837 30.674 3.980
ATOM 82 CA TYR 82 16.953 28.351 4.957
ATOM 83 CA TRP 83 20.573 27.735 5.984
ATOM 84 CA ASN 84 21.686 27.864 2.292
ATOM 85 CA THR 85 18.733 25.451 3.256
ATOM 86 CA ASP 86 17.599 23.567 0.387
ATOM 87 CA ASP 87 14.123 22.580 0.848
ATOM 88 CA VAL 88 14.430 20.145 -2.002
ATOM 89 CA ALA 89 17.363 18.113 -0.848
ATOM 90 CA ALA 90 16.890 18.918 2.901
ATOM 91 CA GLY 91 20.567 19.922 2.948
ATOM 92 CA TYR 92 21.984 22.543 5.411
ATOM 93 CA ASP 93 19.056 21.731 7.820
ATOM 94 CA ILE 94 19.470 24.142 10.697
ATOM 95 CA ALA 95 17.722 27.016 12.452
ATOM 96 CA LEU 96 18.585 29.300 15.403
ATOM 97 CA LEU 97 16.026 30.577 17.739
ATOM 98 CA ARG 98 16.756 33.629 19.888
ATOM 99 CA LEU 99 14.620 33.510 23.091
ATOM 100 CA ALA 100 12.579 36.612 24.198

【００７７】[0077]

【表１９】[Table 19]
ATOM 101 CA GLN 101 12.921 35.234 27.696
ATOM 102 CA SER 102 16.256 33.781 28.839
ATOM 103 CA VAL 103 16.598 30.229 30.219
ATOM 104 CA THR 104 17.429 29.900 33.841
ATOM 105 CA LEU 105 20.290 27.410 34.114
ATOM 106 CA ASN 106 19.975 24.821 36.900
ATOM 107 CA SER 107 20.655 21.250 37.796
ATOM 108 CA TYR 108 18.309 20.142 35.019
ATOM 109 CA VAL 109 18.669 22.991 32.486
ATOM 110 CA GLN 110 22.024 23.750 30.864
ATOM 111 CA LEU 111 23.957 23.834 27.556
ATOM 112 CA GLY 112 25.474 21.338 25.072
ATOM 113 CA VAL 113 28.934 22.164 23.644
ATOM 114 CA LEU 114 29.414 23.002 19.950
ATOM 115 CA PRO 115 32.334 21.340 17.893
ATOM 116 CA ARG 116 35.074 23.456 16.017
ATOM 117 CA ALA 117 34.676 23.489 12.202
ATOM 118 CA GLY 118 37.846 21.310 12.129
ATOM 119 CA THR 119 36.637 18.684 14.544
ATOM 120 CA ILE 120 36.853 15.237 13.204
ATOM 121 CA LEU 121 34.484 12.657 14.608
ATOM 122 CA ALA 122 35.483 9.192 13.197
ATOM 123 CA ASN 123 33.124 6.458 11.904
ATOM 124 CA ASN 124 32.277 3.845 14.454
ATOM 125 CA SER 125 32.462 6.308 17.289

【００７８】[0078]

【表２０】[Table 20]
ATOM 126 CA PRO 126 29.443 5.603 19.607
ATOM 127 CA CYS 127 27.161 8.681 20.144
ATOM 128 CA TYR 128 23.801 9.364 21.797
ATOM 129 CA ILE 129 20.457 10.524 20.417
ATOM 130 CA THR 130 17.632 11.463 22.732
ATOM 131 CA GLY 131 13.996 12.472 22.429
ATOM 132 CA TRP 132 10.244 11.899 22.536
ATOM 133 CA GLY 133 10.010 10.720 18.922
ATOM 134 CA LEU 134 8.160 7.549 19.898
ATOM 135 CA THR 135 5.145 9.670 20.637
ATOM 136 CA ARG 136 4.723 10.340 16.882
ATOM 137 CA THR 137 -0.358 8.075 17.391
ATOM 138 CA ASN 138 -2.041 4.683 17.584
ATOM 139 CA GLY 139 0.642 3.241 19.679
ATOM 140 CA GLN 140 0.073 6.394 21.641
ATOM 141 CA LEU 141 3.334 5.402 23.381
ATOM 142 CA ALA 142 5.564 6.033 26.461
ATOM 143 CA GLN 143 5.945 9.802 26.947
ATOM 144 CA THR 144 9.266 9.810 28.849
ATOM 145 CA LEU 145 12.605 10.935 27.345
ATOM 146 CA GLN 146 14.358 7.949 25.590
ATOM 147 CA GLN 147 18.018 7.543 24.736
ATOM 148 CA ALA 148 19.880 5.281 22.299
ATOM 149 CA TYR 149 23.601 4.723 21.595
ATOM 150 CA LEU 150 24.381 4.597 17.895

【００７９】[0079]

【表２１】[Table 21]
ATOM 151 CA PRO 151 27.725 4.413 15.936
ATOM 152 CA THR 152 28.655 7.144 13.428
ATOM 153 CA VAL 153 28.940 6.048 9.799
ATOM 154 CA ASP 154 31.047 7.478 6.999
ATOM 155 CA TYR 155 28.950 9.044 4.203
ATOM 156 CA ALA 156 30.379 6.460 1.703
ATOM 157 CA ILE 157 28.888 3.802 3.705
ATOM 158 CA CYS 158 25.671 5.686 4.243
ATOM 159 CA SER 159 25.080 5.897 0.451
ATOM 160 CA SER 160 24.542 2.092 0.480
ATOM 161 CA SER 161 23.473 1.851 4.162
ATOM 162 CA SER 162 20.506 3.770 5.660
ATOM 163 CA TYR 163 21.189 2.940 2.015
ATOM 164 CA TRP 164 20.008 6.191 0.847
ATOM 165 CA GLY 165 22.043 6.686 -2.295
ATOM 166 CA SER 166 22.599 10.145 -3.735
ATOM 167 CA THR 167 20.332 11.720 -1.147
ATOM 168 CA VAL 168 23.467 11.834 1.098
ATOM 169 CA LYS 169 25.090 15.334 0.833
ATOM 170 CA ASN 170 28.351 16.462 2.426
ATOM 171 CA SER 171 26.602 18.580 5.035
ATOM 172 CA MET 172 24.920 15.473 6.321
ATOM 173 CA VAL 173 26.021 12.759 8.769
ATOM 174 CA CYS 174 24.491 9.298 9.375
ATOM 175 CA ALA 175 24.378 7.183 12.460

【００８０】[0080]

【表２２】[Table 22]
ATOM 176 CA GLY 176 22.974 3.824 13.593
ATOM 177 CA GLY 177 21.787 0.767 11.650
ATOM 178 CA ASN 178 23.430 -1.515 14.182
ATOM 179 CA GLY 179 20.439 -3.047 15.884
ATOM 180 CA VAL 180 18.859 -0.111 17.378
ATOM 181 CA ARG 181 19.056 3.286 18.939
ATOM 182 CA SER 182 16.589 2.327 16.140
ATOM 183 CA GLY 183 16.885 6.146 15.520
ATOM 184 CA CYS 184 13.181 7.048 15.645
ATOM 185 CA GLN 185 13.352 8.263 19.312
ATOM 186 CA GLY 186 15.002 11.524 18.037
ATOM 187 CA ASP 187 12.469 14.253 17.449
ATOM 188 CA SER 188 13.025 16.375 14.353
ATOM 189 CA GLY 189 15.565 18.979 15.251
ATOM 190 CA GLY 190 16.815 17.051 18.336
ATOM 191 CA PRO 191 20.443 16.249 19.282
ATOM 192 CA LEU 192 23.005 13.724 18.326
ATOM 193 CA HIS 193 25.791 14.108 20.861
ATOM 194 CA CYS 194 29.203 12.596 21.186
ATOM 195 CA LEU 195 31.753 13.081 23.894
ATOM 196 CA VAL 196 34.846 15.124 23.079
ATOM 197 CA ASN 197 37.452 15.597 25.869
ATOM 198 CA GLY 198 34.806 14.392 28.396
ATOM 199 CA GLN 199 32.250 16.930 27.291
ATOM 200 CA TYR 200 29.016 16.171 25.402

【００８１】[0081]

【表２３】[Table 23]
ATOM 201 CA ALA 201 29.237 17.882 22.017
ATOM 202 CA VAL 202 26.477 18.407 19.386
ATOM 203 CA HIS 203 27.599 16.495 16.297
ATOM 204 CA GLY 204 24.149 15.989 14.545
ATOM 205 CA VAL 205 20.649 17.525 14.312
ATOM 206 CA THR 206 17.833 14.988 13.502
ATOM 207 CA SER 207 17.004 15.670 9.868
ATOM 208 CA PHE 208 15.361 12.670 8.212
ATOM 209 CA VAL 209 15.198 8.921 7.541
ATOM 210 CA SER 210 12.840 6.012 7.980
ATOM 211 CA ARG 211 9.065 6.721 8.446
ATOM 212 CA LEU 212 9.207 10.363 9.487
ATOM 213 CA GLY 213 10.513 8.176 12.286
ATOM 214 CA CYS 214 14.299 8.540 12.751
ATOM 215 CA ASN 215 14.022 5.057 14.226
ATOM 216 CA VAL 216 16.089 2.589 12.218
ATOM 217 CA THR 217 16.970 -0.988 13.315
ATOM 218 CA ARG 218 18.852 -1.719 10.187
ATOM 219 CA LYS 219 18.625 1.675 8.439
ATOM 220 CA PRO 220 20.773 4.709 9.586
ATOM 221 CA THR 221 19.186 7.973 10.532
ATOM 222 CA VAL 222 20.240 11.136 8.710
ATOM 223 CA PHE 223 21.377 14.250 10.543
ATOM 224 CA THR 224 22.588 17.709 9.840
ATOM 225 CA ARG 225 26.434 17.861 10.202

【００８２】[0082]

【表２４】[Table 24]
ATOM 226 CA VAL 226 26.987 20.558 12.851
ATOM 227 CA SER 227 30.732 20.894 12.035
ATOM 228 CA ALA 228 29.767 22.035 8.475
ATOM 229 CA TYR 229 27.315 24.546 9.917
ATOM 230 CA ILE 230 29.176 25.852 12.929
ATOM 231 CA SER 231 30.676 28.924 11.237
ATOM 232 CA TRP 232 27.267 30.374 10.371
ATOM 233 CA ILE 233 26.138 29.758 14.035
ATOM 234 CA ASN 234 29.038 31.790 15.523
ATOM 235 CA ASN 235 28.410 34.403 12.987
ATOM 236 CA VAL 236 24.670 34.745 13.693
ATOM 237 CA ILE 237 25.176 34.793 17.433
ATOM 238 CA ALA 238 28.014 37.384 17.237
ATOM 239 CA SER 239 25.674 39.775 15.407
ATOM 240 CA ASN 240 22.208 39.280 16.899
END
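The coordinate listing in Tables 15 to 24 can be read back with a few lines of code. The sketch below assumes whitespace-separated fields in the order shown in the tables (record name, serial number, atom name, residue name, residue number, x, y, z) rather than the fixed-column PDB layout, and the function names are illustrative. As a quick plausibility check it computes the distance between consecutive α-carbons for the first three records of Table 15.

```python
import math

# First three records of Table 15, one record per line.
RECORDS = """\
ATOM 1 CA VAL 1 18.851 -2.532 20.987
ATOM 2 CA VAL 2 15.303 -2.327 19.641
ATOM 3 CA GLY 3 14.268 1.317 20.378
"""


def parse_ca_records(text):
    """Parse whitespace-separated ATOM lines into dictionaries."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] != "ATOM":
            continue  # skip HEADER, COMPND, END and malformed lines
        atoms.append({
            "serial": int(fields[1]),
            "atom": fields[2],
            "residue": fields[3],
            "number": int(fields[4]),
            "xyz": tuple(float(v) for v in fields[5:8]),
        })
    return atoms


def consecutive_distances(atoms):
    """Distances between each atom and the next one in the list."""
    return [math.dist(a["xyz"], b["xyz"]) for a, b in zip(atoms, atoms[1:])]


atoms = parse_ca_records(RECORDS)
distances = consecutive_distances(atoms)
print([round(d, 2) for d in distances])
```

Both distances come out close to 3.8, the value typically seen between neighbouring α-carbons when coordinates are given in ångströms.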

【００８３】[0083] In the above embodiments, the coordinates of the α-carbon atoms of the target protein are predicted. However, the present invention is not limited to this, and may likewise be configured to predict the coordinates of all atoms of the amino acid residues constituting the protein.

【００８４】[0084]

【発明の効果】[Effects of the Invention] As described in detail above, the present invention provides a method and an apparatus for calculating the prediction accuracy of a three-dimensional protein structure, for use where at least one protein having a high degree of homology with the target protein whose three-dimensional structure is to be predicted is selected as a reference protein from among proteins whose amino acid residue atomic coordinates are known, and the atomic coordinates of the amino acid residues of the target protein are predicted by a predetermined method based on the atomic coordinates of the amino acid residues of the selected reference protein. Based on the physicochemical parameters of the plural types of amino acids contained in the target protein, and using the aligned sequences of the reference protein and the target protein, the residue number of the reference protein and the residue number of the target protein are varied together so that the amino acid residues of the reference protein correspond to the amino acid residues of the target protein. The value of a correlation function indicating the correlation for each pair of corresponding amino acid residues is calculated for each of at least one physicochemical parameter selected in advance from the physicochemical parameters of the plural types of amino acids, and the average of the calculated correlation function values is used as the prediction accuracy of the predicted three-dimensional structure of the target protein. Therefore, according to the present invention, the local prediction accuracy of the predicted three-dimensional coordinates of the target protein can be estimated one-dimensionally along the residue numbers, based only on amino acid sequence information. With this method, primary structures can be compared and regions whose three-dimensional structures are likely to be similar can be extracted. The method also makes it possible to examine how large the correlation function value must be, and how long the chain must be, for agreement of the three-dimensional structures to be expected.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an apparatus for predicting the three-dimensional structure of a protein and calculating the prediction accuracy, according to one embodiment of the present invention.

FIG. 2 is a flowchart showing the protein three-dimensional structure prediction and accuracy calculation process executed by the apparatus of FIG. 1.

FIG. 3 is a flowchart showing the predicted coordinate calculation process, which is a subroutine of FIG. 2.

FIG. 4 is a diagram showing the homologous residue pairs and the relative distance <d_ij> according to the present invention when the target protein is elastase and the reference protein is chymotrypsinogen.

FIG. 5 is a diagram for explaining Formula 5, which gives the correlation function Cp(i) between residue number i of target protein X and residue number j of reference protein Y used in the present invention.

FIG. 6 is a stereo view showing the three-dimensional structure of the target protein elastase predicted by the apparatus of FIG. 1 (solid line) and the three-dimensional structure measured by X-ray crystallography (broken line).

FIG. 7 is a graph showing, as a function of the residue number of the target protein elastase, the deviation l_ii between the three-dimensional structure of elastase predicted by the apparatus of FIG. 1 and the three-dimensional structure measured by X-ray crystallography.

FIG. 8 is a graph showing, as a function of the residue number of the target protein elastase, the average value <C(i)> of the correlation function Cp(i), which represents the prediction accuracy coefficient calculated by the apparatus of FIG. 1.

[Explanation of symbols]

10: microprocessing unit (MPU); 11: ROM; 12: RAM; 13: keyboard; 14: CRT display; 15: printer; 20: calculation result file; 21: protein database (PDB) file; RP1 to RP8: corresponding residue pairs.

Continuation of front page: (58) Fields searched (Int. Cl.⁶, DB name): C06F 17/30, C07K 1/00, JICST file (JOIS)

Claims

(57) [Claims]

1. A method for calculating the prediction accuracy of the three-dimensional structure of a protein, in which at least one protein having high homology with a target protein whose three-dimensional structure is to be predicted is selected as a reference protein from among proteins for which the coordinates of the atoms of the amino acid residues are known, the coordinates of the atoms of the amino acid residues of the target protein are predicted by a predetermined method based on the coordinates of the atoms of the amino acid residues of the selected reference protein, and the prediction accuracy of the predicted three-dimensional structure of the target protein is calculated, the method comprising: calculating, based on physicochemical parameters of the plural kinds of amino acids contained in the target protein, and while varying both the residue number of the reference protein and the residue number of the target protein so that the amino acid residues of the reference protein and the amino acid residues of the target protein correspond to each other in the aligned sequences of the reference protein and the target protein, the value of a correlation function indicating the correlation between corresponding amino acid residues, for each of at least one selected physicochemical parameter chosen in advance from the physicochemical parameters of the plural kinds of amino acids; averaging the calculated values of the correlation function; and using the resulting average value of the correlation function as the prediction accuracy of the predicted three-dimensional structure of the target protein.

2. The method according to claim 1, wherein the atom of the amino acid residue is an α-carbon atom.

3. The method according to claim 1, wherein the selected physicochemical parameters include at least one of a degree of polarity, a partial specific volume, a degree of turn formation, a pK value of an α-amino group, a pK value of an α-carboxyl group, and a degree of mutation.

4. An apparatus for calculating the prediction accuracy of the three-dimensional structure of a protein, in which at least one protein having high homology with a target protein whose three-dimensional structure is to be predicted is selected as a reference protein from among proteins for which the coordinates of the atoms of the amino acid residues are known, the coordinates of the atoms of the amino acid residues of the target protein are predicted by a predetermined method based on the coordinates of the atoms of the amino acid residues of the selected reference protein, and the prediction accuracy of the predicted three-dimensional structure of the target protein is calculated, the apparatus comprising: first calculating means for calculating, based on physicochemical parameters of the plural kinds of amino acids contained in the target protein, and while varying both the residue number of the reference protein and the residue number of the target protein so that the amino acid residues of the reference protein and the amino acid residues of the target protein correspond to each other in the aligned sequences of the reference protein and the target protein, the value of a correlation function indicating the correlation between corresponding amino acid residues, for each of at least one selected physicochemical parameter chosen in advance from the physicochemical parameters of the plural kinds of amino acids; and second calculating means for averaging the values of the correlation function calculated by the first calculating means to obtain an average value of the correlation function, wherein the average value of the correlation function calculated by the second calculating means is used as the prediction accuracy of the predicted three-dimensional structure of the target protein.

5. The apparatus according to claim 4, wherein the atom of the amino acid residue is an α-carbon atom.

6. The apparatus according to claim 4 or 5, wherein the selected physicochemical parameters include at least one of a degree of polarity, a partial specific volume, a degree of turn formation, a pK value of an α-amino group, a pK value of an α-carboxyl group, and a degree of mutation.