JP3012441B2

JP3012441B2 - Protein three-dimensional structure prediction method

Info

Publication number: JP3012441B2
Application number: JP24680593A
Authority: JP
Inventors: 拓馬見塚
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-10-01
Filing date: 1993-10-01
Publication date: 2000-02-21
Anticipated expiration: 2015-02-21
Also published as: JPH07105179A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a method for predicting the three-dimensional structure of a protein from the amino acid sequence of a protein whose three-dimensional structure is unknown.

【０００２】[0002]

[Prior Art] Local three-dimensional structures of proteins include secondary structures, typified by the α-helix and the β-sheet, and motifs, typified by the zinc finger and the leucine zipper. It is generally believed that if these local structures could be predicted for a protein amino acid sequence of unknown structure, the three-dimensional structure of the whole protein could also be predicted.

[0003] For example, the protein secondary structure prediction problem has been worked on for more than 20 years. It has conventionally been treated as the problem of predicting which of three (or four) types of secondary structure each residue in the primary structure of a protein corresponds to (hereinafter, the residue being predicted is called the central residue). Conventional methods for predicting protein secondary structure include the following: the paper "Prediction of protein conformation" by Chou and Fasman, published in the US journal Biochemistry, Vol. 23, pp. 222-245, 1974 (hereinafter, the CF method); the paper "Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins" by Garnier et al., published in the US journal Journal of Molecular Biology, Vol. 120, pp. 97-120, 1978 (hereinafter, the GOR method); the paper "Further developments of protein secondary structure prediction using information theory: New parameters and consideration of residue pairs" by Gibrat et al., published in the Journal of Molecular Biology, Vol. 198, pp. 425-443, 1987 (hereinafter, the GGR method); the paper "Predicting the secondary structure of globular proteins using neural network models" by Qian et al., published in the Journal of Molecular Biology, Vol. 202, pp. 865-884, 1988 (hereinafter, the QS method); and the paper "Protein α-Helix Region Prediction Based on Stochastic-Rule Learning" by Mamitsuka et al., in the proceedings of the 1993 US conference Hawaii International Conference of System Sciences, Vol. 1, pp. 659-668 (hereinafter, the MY method).

[0004] The CF method derives the statistical frequency of occurrence of each amino acid in each secondary structure from a protein structure database and uses the resulting frequency table to make predictions based on empirical rules. The GOR method computes, for the secondary structure of the central residue, the sum of the information contributed independently by residues up to several positions away from it, and predicts from the relative values. The GGR method predicts the secondary structure of the central residue from the sum of the information contributed by that residue together with residues several positions away from it. The QS method uses a three-layer feed-forward network whose input is the sequence made up of the central residue and the eight residues on either side of it, and predicts by using the neural network to extract the contributions of the central and neighboring residues to the secondary structure. The MY method computes, as a probability distribution, the likelihood that each amino acid at each residue position of the training sequences belongs to an α-helix, and from this computes the likelihood of an α-helix for each region of a test sequence.

【０００５】[0005]

[Problems to be Solved by the Invention] In a protein amino acid sequence, the amino acid residues are thought to depend on one another and thereby to form local three-dimensional structures and functional sites. Expressing the dependencies among the residues within a local three-dimensional structure is therefore considered important for representing and predicting that structure. However, no method had previously been studied or established for automatically extracting the dependencies among residue positions in the form of a network, or for using those dependencies as rules to make predictions on unknown data.

【０００６】[0006]

[Means for Solving the Problems] The present invention is a protein three-dimensional structure prediction method having a step of learning the structure of a stochastic rule for predicting protein structure, wherein the step includes: a step of inputting training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive and negative examples, with the dependencies among the amino acids expressed as conditional probabilities; and a step of constructing and outputting the structure of a stochastic rule expressing the dependencies, by using an information criterion defined by the real-valued parameters to determine the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the stochastic rule, and the three-dimensional structure of the protein is predicted on the basis of the calculated activity.

[0007] The present invention is also a protein three-dimensional structure prediction method having a step of learning the structure of a stochastic rule for predicting protein structure, wherein the step includes: a step of inputting, as real-valued attributes, training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive and negative examples, with the dependencies among the amino acids expressed as conditional probabilities; and a step of constructing and outputting the structure of a stochastic rule expressing the dependencies, by using an information criterion defined by the real-valued parameters to determine the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the stochastic rule, and the three-dimensional structure of the protein is predicted on the basis of the calculated activity.

[0008] The present invention is also a protein three-dimensional structure prediction method having a step of learning the structure of a stochastic rule for predicting protein structure, wherein the step includes: a step of inputting training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive and negative examples, with the dependencies among the amino acids expressed as conditional probabilities; and a step of constructing and outputting the structure of a stochastic rule expressing the dependencies, by using an information criterion defined by the real-valued parameters to determine, under the constraint that the dependencies do not form a cycle, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the stochastic rule, and the three-dimensional structure of the protein is predicted on the basis of the calculated activity.

[0009] The present invention is also a protein three-dimensional structure prediction method having a step of learning the structure of a stochastic rule for predicting protein structure, wherein the step includes: a step of inputting, as real-valued attributes, training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive and negative examples, with the dependencies among the amino acids expressed as conditional probabilities; and a step of constructing and outputting the structure of a stochastic rule expressing the dependencies, by using an information criterion defined by the real-valued parameters to determine, under the constraint that the dependencies do not form a cycle, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the stochastic rule, and the three-dimensional structure of the protein is predicted on the basis of the calculated activity.

【００１０】[0010]

[Examples] Next, the present invention will be described in detail with reference to the drawings.

[0011] FIG. 1 is a flowchart illustrating an embodiment of the protein three-dimensional structure prediction method of the present invention. In this embodiment, the α-helix is treated as the local three-dimensional structure of interest.

[0012] In step 11, proteins of the same family (for example, the same protein from different species) are aligned against the amino acid sequence of a protein whose α-helix regions are known, and the subsequences corresponding to the α-helices are extracted as positive examples of the α-helix.

[0013] For example, for the β chain of the protein hemoglobin, the positions of the α-helices in human hemoglobin have been determined by X-ray crystallography, and the chain is known to have eight α-helix regions. The hemoglobin β chains of other species, such as chimpanzee and horse, are therefore aligned against the human hemoglobin β chain, and the regions corresponding to the eight α-helices are extracted as positive examples of the α-helix.
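As a rough illustration of step 11, the sketch below slices helix regions out of a set of sequences that has already been aligned. The function name `extract_positive_examples`, the column-coordinate convention (0-based, end exclusive), and the short toy strings are assumptions made for this example and are not taken from the patent; the alignment itself is treated as given.

```python
from typing import Dict, List, Tuple


def extract_positive_examples(
    aligned: Dict[str, str],
    helix_regions: List[Tuple[int, int]],
) -> List[List[str]]:
    """Return, for each helix region, the matching subsequence of every aligned sequence.

    helix_regions holds (start, end) alignment columns, 0-based and end-exclusive,
    taken from the reference protein whose helix positions are known.
    """
    examples = []
    for start, end in helix_regions:
        examples.append([seq[start:end] for seq in aligned.values()])
    return examples


# Toy aligned strings of equal length (illustrative only).
aligned = {
    "human": "MVHLTPEEKSAV",
    "chimp": "MVHLTPEEKSAV",
    "horse": "MVQLSGEEKAAV",
}
regions = [(3, 9)]  # one hypothetical helix region
positives = extract_positive_examples(aligned, regions)
print(positives)
```

Each inner list holds the positive examples for one helix region, one subsequence per species.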

[0014] In step 12, each sequence in an amino acid sequence database with known α-helix positions is aligned against the subsequences corresponding to the α-helices of the protein whose α-helix positions are known, and the subsequences that do not correspond to an α-helix are extracted as negative examples relative to the positive examples of the α-helix extracted in step 11.

[0015] In the hemoglobin β chain example, the subsequences corresponding to the eight α-helices are aligned against a number of sequences in a protein structure database such as PDB (Protein Data Bank), and each subsequence obtained from the alignment is extracted as a negative example if its structure is not an α-helix. In the alignment used for negative-example extraction, one option is to take as negative examples those subsequences that retain at least a certain degree of homology. Specifically, subsequences whose homology in the alignment is 30% or more may be taken as negative examples.
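The 30% threshold can be sketched as a simple filter. The example below assumes that "homology" is measured as percent identity over an ungapped, equal-length alignment, which is a simplification; the function names and the candidate tuples are hypothetical.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def select_negative_examples(
    helix_segment: str,
    candidates: List[Tuple[str, bool]],
    threshold: float = 30.0,
) -> List[str]:
    """Keep candidates that are not helical yet are at least `threshold` % identical.

    Each candidate is (subsequence, is_helix), where is_helix comes from the
    structure database annotation.
    """
    return [
        seq
        for seq, is_helix in candidates
        if not is_helix and percent_identity(helix_segment, seq) >= threshold
    ]


candidates = [
    ("LTPEEKSAVT", True),   # helical: never a negative example
    ("LTPGGKSGGT", False),  # 60 % identical, not helical: kept
    ("GGGGGGGGGT", False),  # 10 % identical: dropped
]
negatives = select_negative_examples("LTPEEKSAVT", candidates)
print(negatives)
```

Requiring a minimum similarity keeps the negative examples close to the positive ones, so the learned rule has to separate sequences that look alike but fold differently.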

[0016] As for the number of data items to extract, one option is to make the ratio of positive to negative examples the same for every region that serves as a positive example of the α-helix. More specifically, the numbers of positive and negative examples may be made equal.

[0017] Step 13 calculates the real-valued parameters of the stochastic rule from the positive and negative examples extracted in steps 11 and 12.

[0018] Here, a stochastic rule is a probability distribution that gives, for any given region of a sequence, the probability that the region corresponds to an α-helix. Let each X_i (i = 1, ..., n) be a space of attribute values, and let X denote their direct product, that is, X = X_1 × X_2 × ... × X_n.

[0019] For example, X may be a single set consisting of the 20 kinds of amino acids, or X = X_1 × X_2, where X_1 is a range of numerical values representing hydrophobicity and X_2 is a range of numerical values representing molecular weight.

[0020] For a window W of length L in a positive example of the α-helix, the likelihood that an arbitrary region S of length L in a test sequence corresponds to W is obtained as follows. First, let X_t (hereinafter called a variable) denote the t-th residue position counted from the left of the sequence S, and let π_t be the set of residue positions on which X_t depends in region W (hereinafter, π_t is called the set of parent variables of X_t, and X_t is called a child variable of π_t). Here, let

【００２１】[0021]

【数１】 (Equation 1)

[0022] be the conditional probability with which X_t depends on π_t in region W, and assume that the likelihood P_W(S) that region S corresponds to W can be written as follows.

【００２３】[0023]

【数２】 (Equation 2)

[0024] The right-hand side of Equation (1) corresponds to a network structure in which the variables are nodes and an arc is drawn from each parent variable to its child. For example, if region S consists of three residues and the joint probability of the residues of region S can be written concretely as the following equation, that equation corresponds to the network of FIG. 3.

【００２５】[0025]

【数３】 (Equation 3)
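The factorization described here, a product of one conditional probability per residue position with each position conditioned on its parent positions, can be sketched as follows. The three-node structure, the probability values, and the table layout are hypothetical and are not necessarily those of FIG. 3 or of the equation images, which are not reproduced in this text.

```python
from math import prod
from typing import Dict, Tuple

# parents[t] lists the positions on which X_t depends (hypothetical structure).
parents: Dict[int, Tuple[int, ...]] = {1: (), 2: (1,), 3: (1,)}

# cpt[t][(cell of X_t, cells of its parents)] -> conditional probability.
# Only the entries needed for the example are filled in (illustrative values).
cpt: Dict[int, Dict[Tuple[int, Tuple[int, ...]], float]] = {
    1: {(2, ()): 0.5},
    2: {(1, (2,)): 0.4},
    3: {(3, (2,)): 0.25},
}


def region_likelihood(cells, parents, cpt) -> float:
    """Multiply P(X_t | parents of X_t) over all positions t of the region."""
    return prod(
        cpt[t][(cells[t], tuple(cells[s] for s in parents[t]))]
        for t in parents
    )


# Residues of S fall, in order, into cells 2, 1 and 3.
cells = {1: 2, 2: 1, 3: 3}
likelihood = region_likelihood(cells, parents, cpt)
print(likelihood)  # 0.5 * 0.4 * 0.25
```

Storing the parent cells as part of the table key keeps the lookup identical whether a position has zero, one, or several parents.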

[0026] Further, each

【００２７】[0027]

【数４】 (Equation 4)

[0028] is determined from the given case data, consisting of positive and negative examples, for example as follows.

[0029] First, at the t-th residue position, the range of real values that the attribute can take is partitioned into a finite number of non-overlapping subregions (hereinafter called cells); let m_t be the total number of cells and C_i the i-th cell.

[0030] When the residue at the t-th position falls in cell C_i among the m_t cells, let the occurrence probability of X_t be P(X_t = i) = p_i(t). Here,

【００３１】[0031]

【数５】 (Equation 5)

[0032] holds, and p_i(t) is called a probability parameter. FIG. 4 shows an example of the structure of such a finite partition, for the case in which the probability parameters are estimated from a single attribute whose values range from 0 to 1.
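A minimal sketch of such a finite partition is given below for one attribute ranging from 0 to 1. It assumes equal-width cells and 1-based cell indices; the patent text does not state how the cell boundaries are chosen, so both choices, along with the function name `cell_index`, are illustrative.

```python
def cell_index(value: float, lower: float = 0.0, upper: float = 1.0,
               num_cells: int = 4) -> int:
    """Return the 1-based index i of the cell C_i that contains `value`.

    The range [lower, upper] is split into `num_cells` equal-width,
    non-overlapping cells; the upper bound is assigned to the last cell.
    """
    if not lower <= value <= upper:
        raise ValueError("value lies outside the attribute range")
    width = (upper - lower) / num_cells
    index = int((value - lower) / width)
    return min(index, num_cells - 1) + 1


print([cell_index(v) for v in (0.0, 0.3, 0.5, 1.0)])
```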

[0033] The probability parameters are estimated using the numbers of positive and negative examples contained in each cell.

【００３４】[0034]

【数６】 (Equation 6)

[0035] Let the above denote the number of positive examples contained in the i-th cell at the t-th position, let N_i(t) be the sum of the numbers of positive and negative examples contained in the i-th cell at the t-th position, and let p_i(t) be the estimate for the i-th cell at the t-th position. For example, the probability parameter for each cell is calculated by the Laplace estimator of the following equation.

【００３６】[0036]

【数７】 (Equation 7)
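The equation image is not reproduced in this text, so the exact form used in the patent cannot be confirmed here. As a hedged sketch, the code below uses the common add-one form of the Laplace estimator for a two-class (positive or negative) count, (positives + 1) / (total + 2); the function name and the counts are hypothetical.

```python
def laplace_estimate(n_pos: int, n_total: int) -> float:
    """Add-one (Laplace) estimate of the positive rate in one cell.

    Assumed form: (n_pos + 1) / (n_total + 2). The estimate stays strictly
    between 0 and 1 even for an empty cell.
    """
    if not 0 <= n_pos <= n_total:
        raise ValueError("need 0 <= n_pos <= n_total")
    return (n_pos + 1) / (n_total + 2)


# Hypothetical counts per cell at one residue position.
positives_per_cell = [3, 0, 5]
totals_per_cell = [4, 0, 10]
estimates = [
    laplace_estimate(p, n) for p, n in zip(positives_per_cell, totals_per_cell)
]
print(estimates)
```

The smoothing matters because many cells contain few or no training examples, and an unsmoothed ratio of 0 or 1 would drive the product of conditional probabilities to zero.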

[0037] However, the estimator is not limited to the Laplace estimator; many other estimators can be used. Next, the joint probability P(X_t, π_t) of X_t and π_t can likewise be calculated using an estimator. For example, when the only element of π_t is the variable X_s, X_s is partitioned, like X_t, into m_s non-overlapping subregions, and when the s-th residue falls in cell C_j among the m_s cells, we set

【００３８】[0038]

【数８】 (Equation 8)

[0039] and estimate the probability parameter p_{i,j}(t, s).

[0040] At the t-th and s-th positions, based on the numbers of positive and negative examples contained in each cell, let

【００４１】[0041]

【数９】 (Equation 9)

[0042] be the number of positive examples contained in the i-th and j-th cells at the respective positions, and let N_{i,j}(t, s) be the sum of the numbers of positive and negative examples contained in the i-th and j-th cells at the respective positions. From these, the probability parameter is estimated, for example, by the Laplace estimator of the following equation.

【００４３】[0043]

【数１０】 (Equation 10)

[0044] Finally, using these estimated probability parameters, the conditional probability of X_t given π_t,

【００４５】[0045]

【数１１】 [Equation 11]

[0046] is calculated as a probability parameter. As described above, when the only element of π_t is the variable X_s, the t-th and s-th positions are partitioned into m_t and m_s cells respectively, and the residues at these positions fall in cells C_i and C_j, we set

【００４７】[0047]

【数１２】 (Equation 12)

[0048] and the probability parameter

【００４９】[0049]

【数１３】 (Equation 13)

[0050] is calculated by the following equation.

【００５１】[0051]

【数１４】 [Equation 14]

[0052] In step 14, the structure of the stochastic rule is determined. That is, for each variable X_t, the set π_t of its parent variables is determined using an information criterion. This step corresponds to the first and third inventions of the present invention.

[0053] The following describes an example of a network construction method in which the minimum description length (MDL) principle is applied as the information criterion. The MDL principle is described in detail in the paper "Modeling by shortest data description" by Rissanen, published in the US journal Automatica, Vol. 14, pp. 465-471, 1978.

[0054] According to the MDL principle, the optimal rule is the one that minimizes the sum of the description length of the data, calculated from the given case data, and the description length of the rule. Accordingly, the data description length and the rule description length of the stochastic rule are calculated from the positive examples, negative examples, and real-valued parameters obtained in steps 11, 12, and 13.

[0055] In the example described here, the positions on which each residue position depends are determined one residue position at a time. That is, the parent variables are determined independently for each variable.

[0056] First, consider the t-th residue position. From the stochastic rule of Equation (1), the dependency between the variable X_t and its set of parent variables π_t is expressed by the conditional probability

【００５７】[0057]

【数１５】 (Equation 15)

[0058] Here, let k be the number of parent variables, let the residue positions of the parent variables be t_1 to t_k in order, and let the total numbers of cells at residue positions t and t_1 to t_k be m_t, m_{t_1}, ..., m_{t_k}, respectively. Further, let the number of positive examples whose attributes are such that the residue of variable X_t falls in the i-th cell and the residues of variables X_{t_1} to X_{t_k} fall in the j_1-th, ..., j_k-th cells, respectively, be

【００５９】[0059]

【数１６】 (Equation 16)

[0060] let the sum of the numbers of positive and negative examples whose attributes are such that the residue of variable X_t falls in the i-th cell and the residues of variables X_{t_1} to X_{t_k} fall in the j_1-th, ..., j_k-th cells, respectively, be N_{i,j_1,...,j_k}(t, t_1, ..., t_k), and let the probability parameter of the conditional probability that the residue of variable X_t falls in the i-th cell and the residues of variables X_{t_1} to X_{t_k} fall in the j_1-th, ..., j_k-th cells, respectively, be

【００６１】[0061]

【数１７】 [Equation 17]

【００６２】とする。It is assumed that

[0063] Then the data description length of the stochastic rule is given by the following equation, obtained by taking the negative of the log-likelihood of the rule using the probability parameters calculated in step 13.

【００６４】[0064]

【数１８】 (Equation 18)

[0065] Further, the rule description length of the stochastic rule is given by the following equation.

【００６６】[0066]

【数１９】 [Equation 19]

[0067] Accordingly, the structure of the stochastic rule is determined by selecting, for the variable X_t corresponding to the t-th residue position, the set of parent variables that minimizes the sum of Equations (6) and (7).
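The selection of a parent set by minimum description length can be sketched as below. Equations (6) and (7) are not reproduced in this text, so the sketch substitutes a common MDL form as an assumption: the data description length is the negative log-likelihood of the positive/negative labels under add-one (Laplace) estimates per cell combination, and the rule description length is (number of parameters / 2) × log(number of examples). All names, the 0-based position indices, the equal cell count at every position, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (cell index per position, label 1 or 0)


def description_length(examples: List[Example], t: int,
                       parent_set: Sequence[int], num_cells: int) -> float:
    """Assumed MDL score: negative log-likelihood plus (k / 2) * log(N)."""
    pos, tot = Counter(), Counter()
    for cells, label in examples:
        key = (cells[t],) + tuple(cells[s] for s in parent_set)
        tot[key] += 1
        pos[key] += label
    data_len = 0.0
    for key, n in tot.items():
        p = (pos[key] + 1) / (n + 2)  # Laplace estimate, never 0 or 1
        data_len -= pos[key] * log(p) + (n - pos[key]) * log(1 - p)
    num_params = num_cells ** (1 + len(parent_set))
    model_len = 0.5 * num_params * log(len(examples))
    return data_len + model_len


def select_parents(examples: List[Example], t: int, candidates: Sequence[int],
                   num_cells: int, max_parents: int = 2) -> Tuple[int, ...]:
    """Return the candidate parent set of position t with the smallest score."""
    parent_sets = (
        ps for k in range(max_parents + 1) for ps in combinations(candidates, k)
    )
    return min(parent_sets,
               key=lambda ps: description_length(examples, t, ps, num_cells))


# Toy data: an example is positive only when positions 0 and 1 share a cell,
# so position 0 is informative only together with position 1.
examples_joint = [
    ((a, b), 1 if a == b else 0)
    for a in (1, 2) for b in (1, 2) for _ in range(10)
]
print(select_parents(examples_joint, 0, [1], num_cells=2))
```

The rule description length grows exponentially with the number of parents, so a parent is added only when it reduces the data description length by more than the cost of the extra parameters.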

[0068] In step 15, the stochastic rule constructed in step 14 is used to calculate the activity of each region of a given test sequence.

[0069] Here, the likelihood is used as the activity.

[0070] First, an arbitrary region W of length L is taken from a positive example in the training sequences. For the variable corresponding to each residue position of W, the parent variables have been determined in step 14. In addition, the real-valued parameters of the conditional probabilities expressing the dependency between each variable and its parent variables have been calculated in step 13.

[0071] Next, region W is fitted to an arbitrary subsequence S of length L of the test sequence, and the α-helix likelihood of S is calculated.

[0072] For example, suppose that the structure of the stochastic rule for a region W with L = 3 has been determined as in Equation (2), and that in a region S with L = 3 of the test sequence the residues have real-valued attributes falling, in order, in the second, first, and third cells. The likelihood that region S is W can then be calculated by the following equation.

【００７３】[0073]

【数２０】 (Equation 20)

[0074] This operation is carried out for every subsequence of length L in the test sequence, using the stochastic rules constructed from every possible region of length L in the positive examples of the training sequences.

[0075] In step 16, for an arbitrary region S of length L in the test sequence, the largest of the likelihoods calculated from all possible regions of length L of the positive examples in the training sequences is selected and taken as the α-helix likelihood of region S.

[0076] The operations of steps 15 and 16 can be summarized as follows. Let A be the set of all regions of length L in the positive examples of the training sequences; the α-helix likelihood P(S) of a subsequence S of length L of the test sequence is calculated by the following equation.

【００７７】[0077]

【数２１】 (Equation 21)
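Read together, steps 15 and 16 amount to scoring every length-L subsequence of the test sequence with the rule of every training window W in A and keeping the largest value. The sketch below shows only that control flow; each rule is represented by a hypothetical callable that returns a likelihood for a segment, and the two toy rules stand in for learned stochastic rules.

```python
from typing import Callable, List, Sequence


def scan_sequence(sequence: str, length: int,
                  window_rules: Sequence[Callable[[str], float]]) -> List[float]:
    """For each length-`length` subsequence S, return the maximum over W of P_W(S)."""
    return [
        max(rule(sequence[i:i + length]) for rule in window_rules)
        for i in range(len(sequence) - length + 1)
    ]


# Toy stand-ins for the rules learned from two training windows.
def rule_a(segment: str) -> float:
    return segment.count("A") / len(segment)


def rule_l(segment: str) -> float:
    return segment.count("L") / len(segment)


scores = scan_sequence("AAAGL", 3, [rule_a, rule_l])
print(scores)  # one score per start position
```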

[0078] By repeating this operation for each region of the test sequence, the α-helix likelihood is calculated for every region of the test sequence.

[0079] Further, if there are multiple α-helix regions, the same likelihood calculation is performed for each α-helix region, and the maximum likelihood over all the α-helix regions is selected as the optimal value.

[0080] Further, for the regions of the test sequence to which likelihoods have been assigned, the change in likelihood over the whole test amino acid sequence is output using a method such as taking the maximum likelihood as the optimal value for each residue in the region, or taking, for each residue, the average of the likelihoods obtained for the regions containing that residue as its optimal value.
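The two aggregation options, maximum or average over the regions that contain a residue, can be sketched as follows. The function name, the `mode` parameter, and the convention that `window_scores[i]` is the likelihood of the region starting at position i are assumptions made for this example.

```python
from typing import List


def per_residue_scores(window_scores: List[float], length: int,
                       mode: str = "max") -> List[float]:
    """Turn one score per length-`length` region into one score per residue.

    window_scores[i] belongs to the region covering residues i .. i + length - 1.
    mode "max" takes the largest covering score, mode "mean" their average.
    """
    seq_len = len(window_scores) + length - 1
    result = []
    for r in range(seq_len):
        first = max(0, r - length + 1)          # earliest region containing r
        last = min(r, len(window_scores) - 1)   # latest region containing r
        covering = window_scores[first:last + 1]
        if mode == "max":
            result.append(max(covering))
        else:
            result.append(sum(covering) / len(covering))
    return result


window_scores = [1.0, 0.5, 0.25]  # hypothetical likelihoods, L = 3, 5 residues
max_profile = per_residue_scores(window_scores, 3, mode="max")
mean_profile = per_residue_scores(window_scores, 3, mode="mean")
print(max_profile)
print(mean_profile)
```

The maximum gives a sharper profile dominated by the best-matching region, while the average smooths the profile across overlapping regions.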

[0081] The learning and prediction method of FIG. 1 described above can be applied to feature extraction, and further to prediction, for local regions other than the α-helix, such as other secondary structures and motifs. FIG. 2 is a flowchart illustrating an embodiment of the protein three-dimensional structure prediction method of the present invention. In this embodiment, the α-helix is treated as the secondary structure of interest.

[0082] Step 21 performs the same processing as step 11 in FIG. 1 and extracts the positive examples needed for α-helix region prediction.

[0083] Step 22 performs the same processing as step 12 in FIG. 1 and extracts the negative examples needed for α-helix region prediction.

[0084] Step 23 performs the same processing as step 13 in FIG. 1 and estimates the real-valued parameters of the stochastic rule from the positive and negative examples extracted in steps 21 and 22.

[0085] Step 24 is a step in which a restriction is imposed on the construction of the probabilistic rule, so that the rule constructed is consistent as a joint probability distribution over all variables of the local structure region. This step is included in the second and fourth inventions of the present invention. The restriction is that, when the probabilistic rule is represented as a network structure such as that of FIG. 3, the directions of the arcs must be acyclic so that no inconsistency arises in the probability distribution. For example, FIG. 3 shows an acyclic network. If the arc extending from X₁ to X₃ in this figure were instead drawn from X₃ to X₁, the network would become cyclic, and generation of such a network is not permitted.
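The acyclicity restriction can be checked with a standard depth-first search over the arcs. The sketch below is illustrative only: the function name `has_cycle` and the three-node arc lists are assumptions, and the arcs are a hypothetical stand-in chosen to mirror the X₁/X₃ example, not the actual arcs of FIG. 3.

```python
def has_cycle(arcs):
    """Return True if the directed graph given as (parent, child) arcs has a cycle."""
    children = {}
    for parent, child in arcs:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in children}

    def visit(node):
        color[node] = GRAY
        for nxt in children[node]:
            if color[nxt] == GRAY:  # back edge: nxt is on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(children))


# Hypothetical arcs (not taken from FIG. 3): X1 -> X2, X2 -> X3, X1 -> X3.
acyclic_arcs = [("X1", "X2"), ("X2", "X3"), ("X1", "X3")]
# Same network with the X1 -> X3 arc reversed to X3 -> X1.
reversed_arcs = [("X1", "X2"), ("X2", "X3"), ("X3", "X1")]
```

Under these assumed arcs, `has_cycle(acyclic_arcs)` is `False` and `has_cycle(reversed_arcs)` is `True`, which is the situation the paragraph rules out.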

[0086] Possible ways of imposing the restriction include, for example, assigning an order to the variables and allowing each variable to take as parent variables only those of lower order, or rejecting a dependency only when establishing it would create a cycle among the arcs.
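Both ways of imposing the restriction can be sketched briefly. The names `allowed_parents` and `try_add_arc` and the `child -> set of parents` dictionary are assumptions made for this illustration and do not appear in the specification.

```python
def allowed_parents(order):
    """Method 1: a variable may only take parents that come earlier in `order`."""
    return {var: order[:i] for i, var in enumerate(order)}


def try_add_arc(parents, parent, child):
    """Method 2: add the arc parent -> child unless it would close a cycle.

    `parents` maps each variable to the set of its current parent variables.
    Returns True if the arc was added, False if it was rejected.
    """
    # The new arc closes a cycle exactly when `child` is already an ancestor
    # of `parent` (or is `parent` itself), so walk upward from `parent`.
    stack, seen = [parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    parents.setdefault(child, set()).add(parent)
    return True
```

The first method guarantees acyclicity by construction but depends on the chosen ordering. The second leaves the search free to pick any parent set and only vetoes the arcs that would introduce a cycle.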

[0087] Step 25 performs the same processing as step 15 of FIG. 1. Using the probabilistic rule whose structure was optimized in step 24, it calculates the activity of each region of the test amino acid sequence data.
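As a rough illustration of scoring every region of a test sequence with a rule of this kind, the sketch below slides a fixed-length window along the sequence and sums the log conditional probabilities of each position given its parent positions. This is an assumption-laden stand-in: the definition of activity used by the method is given elsewhere in the specification, and the names `log_joint`, `window_scores`, `parents`, and `cond_prob` are hypothetical.

```python
import math


def log_joint(window, parents, cond_prob):
    """Sum of log P(x_i | values at the parent positions of i) over the window.

    parents:   dict mapping a position index to a tuple of parent positions.
    cond_prob: callable (position, residue, parent_residues) -> probability;
               a placeholder for the learned real-valued parameters.
    """
    total = 0.0
    for i, residue in enumerate(window):
        parent_residues = tuple(window[j] for j in parents.get(i, ()))
        total += math.log(cond_prob(i, residue, parent_residues))
    return total


def window_scores(sequence, window_len, parents, cond_prob):
    """Score every window of length `window_len`; returns (start, score) pairs."""
    return [
        (start, log_joint(sequence[start:start + window_len], parents, cond_prob))
        for start in range(len(sequence) - window_len + 1)
    ]
```

Because the parent structure is acyclic, the product of these conditional probabilities is a valid joint distribution over the window, which is why the sum of logs can be used as a single score per region.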

[0088] Step 26 performs the same processing as step 16 of FIG. 1 and, from the plurality of activities obtained in step 25, outputs the change in activity over the entire sequence.

[0089] The learning and prediction method of FIG. 2 described above can also be applied to feature extraction and prediction for local regions other than the α-helix, such as other secondary structures and motifs.

[0090]

[Effects of the Invention] From the amino acid sequence information of proteins whose three-dimensional structure is known, the local three-dimensional structure of a protein for which even the local structure is unknown can be predicted with higher accuracy than with the prior art. For example, the MY method, one of the conventional methods, took no account of dependencies between residue positions within a local region. Constructing probabilistic rules that reflect the dependencies between residue positions makes more accurate feature extraction and prediction of local regions possible. In addition, optimization based on an information criterion makes it possible to optimize the structure of the probabilistic rule on a theoretical basis.

[Brief description of the drawings]

FIG. 1 is a flowchart showing one embodiment of the protein three-dimensional structure prediction method of the present invention.

FIG. 2 is a flowchart showing one embodiment of the protein three-dimensional structure prediction method of the present invention.

FIG. 3 is a schematic diagram showing the dependencies between variables of the probabilistic rule used in the present invention.

FIG. 4 is a schematic diagram showing a specific example of the finite partitioning performed at each residue position in the present invention.

[Explanation of symbols]

11 Positive example extraction
12 Negative example extraction
13 Real-valued parameter estimation
14 Determination of the structure of the probabilistic rule
15 Activity calculation for the test sequence
16 Predicted value calculation for the test sequence
21 Positive example extraction
22 Negative example extraction
23 Real-valued parameter estimation
24 Determination of the structure of the probabilistic rule
25 Activity calculation for the test sequence
26 Predicted value calculation for the test sequence

Continuation of the front page
(56) References: Proceedings of the IPSJ National Convention, 45th, pp. 1-343 to 1-344, Mamizuka, Yamazaki, "Helix region prediction of proteins using stochastic rules" (October 11, 1992)
(58) Fields searched (Int. Cl.⁷, DB name): G06F 15/20; G06F 15/40 530

Claims

(57) [Claims]

1. A protein three-dimensional structure prediction method having a step of learning the structure of a probabilistic rule for predicting protein structure, wherein said step comprises: a step of inputting training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive examples and negative examples, with the dependencies of each amino acid expressed as conditional probabilities; and a step of constructing and outputting the structure of a probabilistic rule expressing the dependencies, by determining, using an information criterion defined by the real-valued parameters, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the probabilistic rule, and the three-dimensional structure of the protein is predicted based on the calculated activity.

2. A protein three-dimensional structure prediction method having a step of learning the structure of a probabilistic rule for predicting protein structure, wherein said step comprises: a step of inputting, as real-valued attributes, training data of amino acid sequences consisting of positive examples and negative examples; a step of calculating real-valued parameters of the positive examples and negative examples, with the dependencies of each amino acid expressed as conditional probabilities; and a step of constructing and outputting the structure of a probabilistic rule expressing the dependencies, by determining, using an information criterion defined by the real-valued parameters, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the probabilistic rule, and the three-dimensional structure of the protein is predicted based on the calculated activity.

3. A protein three-dimensional structure prediction method having a step of learning the structure of a probabilistic rule for predicting protein structure, wherein said step comprises: a step of inputting training data; a step of calculating real-valued parameters of the positive examples and negative examples for the dependencies of each amino acid; and a step of constructing and outputting the structure of a probabilistic rule expressing the dependencies, by determining, using an information criterion defined by the real-valued parameters and under the restriction that the dependencies do not cycle, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the probabilistic rule, and the three-dimensional structure of the protein is predicted based on the calculated activity.

4. A protein three-dimensional structure prediction method having a step of learning the structure of a probabilistic rule for predicting protein structure, wherein said step comprises: a step of inputting training data as real-valued attributes; a step of calculating real-valued parameters of the positive examples and negative examples, with the dependencies of each amino acid expressed as conditional probabilities; and a step of constructing and outputting the structure of a probabilistic rule expressing the dependencies, by determining, using an information criterion defined by the real-valued parameters and under the restriction that the dependencies do not cycle, the other residue positions on which each residue position of the amino acid sequence depends; and wherein the activity of test data is calculated using the structure of the probabilistic rule, and the three-dimensional structure of the protein is predicted based on the calculated activity.