JP3129202B2 - Sequence secondary structure prediction method and sequence secondary structure prediction device

Info

Publication number: JP3129202B2
Application number: JP22989896A
Authority: JP
Inventors: 稔麻生川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-08-30
Filing date: 1996-08-30
Publication date: 2001-01-29
Anticipated expiration: 2016-08-30
Also published as: JPH1074189A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a sequence secondary structure prediction method and a sequence secondary structure prediction device, and more particularly to a method and device for predicting the sequence secondary structure of proteins and of nucleic acid sequences such as RNA (ribonucleic acid) whose three-dimensional structure is unknown.

[0002]

2. Description of the Related Art

A protein is a polypeptide in which 20 kinds of residues are linked in a linear chain by peptide bonds: G (glycine), A (alanine), V (valine), I (isoleucine), L (leucine), F (phenylalanine), P (proline), M (methionine), S (serine), T (threonine), Y (tyrosine), W (tryptophan), N (asparagine), Q (glutamine), C (cysteine), D (aspartic acid), E (glutamic acid), K (lysine), R (arginine) and H (histidine).

[0003] A linear polypeptide folds to form a complex three-dimensional structure. This happens because main-chain atoms that are not contiguous in the sequence form weak bonds with each other, such as hydrogen bonds and disulfide bonds (dehydration bonds between cysteine side chains). In a hydrogen bond in the main chain, an oxygen and a hydrogen bonded to nitrogen bind, respectively, to a hydrogen bonded to nitrogen and an oxygen in another part of the sequence. Consequently, one residue bonds directly with one residue.

[0004] The structures determined by these pairs include a helical structure called the α-helix and a linear structure called the β-sheet. The α-helix is a stable structure, a helix with a period of 3.5 to 6 residues, formed by hydrogen bonding between the main chains of the i-th residue and the (i+4)-th residue. In a β-sheet, main chains hydrogen-bond to each other to form pairs, and these pairs continue one after another to form a membrane-like structure. There are two kinds of β-sheet: the parallel β-sheet, in which the strands bond running in the same direction, and the antiparallel β-sheet. The analysis of which parts contribute to the three-dimensional structure of a protein is described in detail in Japanese Unexamined Patent Publication No. 5-282381 (Matsuo, "Protein molecule three-dimensional structure analysis device"; hereinafter referred to as Document 1).

[0005] In a β-sheet, residues that are adjacent in the sequence have hydrogen bonds pointing in exactly opposite directions, so one residue is apparently adjacent to two residues. In addition, the side-chain molecules of the bonded residues interfere with each other, which determines which residue pairs pair easily and which do not. These points are described in detail in Document 2 (Branden, C. and Tooze, J., "Introduction to Protein Structure", Garland Publishing (1991); Japanese translation: "Introduction to Protein Structure", Kyoikusha (1992)).

[0006] Conventional systems for predicting protein β-sheets fall broadly into two kinds. In the first, the statistical tendency of each residue to be included in a β-sheet is obtained in advance from a database, and this is used to predict whether several consecutive residues in the sequence are included in a β-sheet. This approach is described in detail in Document 3 (Chou, P.Y. and Fasman, G.D., "Prediction of protein conformation", Biochemistry, 13:222-245 (1974)).

[0007] In the other kind of method, the statistical tendency to form pairs is obtained in advance from a database of proteins whose three-dimensional structure is known, or is learned by a neural network, and the method predicts for every pair in the sequence whether it forms a β-sheet. This approach is described in detail in Document 4 (Hubbard T., "Use of β-strand Interaction Pseudo-Potentials in Protein Structure Prediction and Modelling", Proceedings of the 27th Annual Hawaii International Conference on System Sciences, pp. 336-334 (1994)).

[0008] An RNA sequence, on the other hand, is composed of four kinds of bases: adenine (A), uracil (U), guanine (G) and cytosine (C). These bases pair complementarily: A with U, G with C, and, although less stably, G with U. Through this pairing, a structure resembling the β-sheet of a protein (a stem structure) appears in RNA. RNA differs from proteins in that constraints between bases rule out parallel sheets, and in that each base can be adjacent to at most one other base (in proteins, up to two).
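The pairing rule in the preceding paragraph can be sketched as a small lookup. This is only an illustration: the names `PAIRS` and `can_pair` are hypothetical and do not appear in the patent, and the sketch ignores the relative stability of the pairs.

```python
# Hypothetical helper; names are illustrative and not taken from the patent.
# The three unordered base pairs named in the text: A-U, G-C and the weaker G-U.
PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(a: str, b: str) -> bool:
    """Return True if RNA bases a and b can pair complementarily."""
    return frozenset((a.upper(), b.upper())) in PAIRS
```

A call such as `can_pair("G", "U")` is accepted, while `can_pair("A", "G")` is rejected because that pair is not in the set.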

[0009] Because RNA is simpler than protein, the binding energies between bases and the stabilization energies that depend on stem length are known experimentally. A well-known method uses dynamic programming to find the shape with the lowest energy (the Zuker method), but it cannot handle the shape known as a pseudoknot. These points are described in detail in Document 5 (Gribskov M. and Devereux J., "Sequence Analysis Primer", Stockton Press (1991)).

[0010]

PROBLEMS TO BE SOLVED BY THE INVENTION

None of the conventional secondary structure prediction methods described above determines whether the predicted structure can actually be realized in three dimensions, so the system output may include structures that cannot be realized. Document 1 describes analyzing how interactions between the individual local structures of a protein molecule contribute to the formation of the three-dimensional structure, but it says nothing about estimating the three-dimensional structure.

[0011] The present invention has been made in view of the above points. Its object is to provide a sequence secondary structure prediction method and a sequence secondary structure prediction device that can make plausible three-dimensional structure predictions by deciding which parts of the sequence form pairs while determining whether the three-dimensional structure can actually be realized.

[0012]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the method of the present invention uses the binding energies between the constituent units of proteins and nucleic acid sequences to calculate the binding energy for every combination of two subsequences of an input protein or nucleic acid sequence whose three-dimensional structure is unknown. It then uses a Hopfield neural network that expresses the constraints of three-dimensional structure to find the combination of subsequences for which the total binding energy is lowest, and thereby predicts the sequence secondary structure of the input protein or nucleic acid sequence.

[0013] Because the method of the present invention predicts the sequence secondary structure of a protein or nucleic acid sequence of unknown three-dimensional structure while taking the constraints of three-dimensional structure into account, it can predict the sequence secondary structure more accurately than conventional methods that do not consider those constraints.

[0014] In addition to the above, the method of the present invention is combined with a simulation of whether the result obtained by the Hopfield neural network can be folded. This allows the sequence secondary structure to be predicted more accurately.

[0015] To achieve the above object, the device of the present invention comprises: a binding energy holding unit that holds the binding energies between the constituent units of proteins and nucleic acid sequences; a subsequence combination generating unit that generates all combinations of two subsequences of an input protein or nucleic acid sequence whose three-dimensional structure is unknown; a binding energy calculation unit that uses the binding energies from the binding energy holding unit to calculate the binding energy for every combination of two subsequences from the subsequence combination generating unit; and a Hopfield neural network unit in which the binding energies calculated by the binding energy calculation unit are set as initial values, and which predicts, as the sequence secondary structure of the input protein or nucleic acid sequence, the combination of subsequences that yields the lowest binding energy while satisfying the constraints of three-dimensional structure.

[0016] In this device, the Hopfield neural network unit predicts, as the sequence secondary structure of the input protein or nucleic acid sequence of unknown three-dimensional structure, the combination of subsequences that yields the lowest binding energy while satisfying the constraints of three-dimensional structure. The device can therefore predict the sequence secondary structure more accurately than conventional prediction devices that do not consider those constraints.

[0017] The binding energy holding unit of the device of the present invention is characterized in that, for β-sheets expressed by combinations of protein subsequences, it classifies the pairs into three types according to whether they are parallel or antiparallel and according to the hydrogen bond pattern, holds the binding energies for each type, and has the binding energy calculation unit calculate the binding energy for each of the three types.

[0018] The Hopfield neural network unit of the device of the present invention is characterized in that, for β-sheets expressed by combinations of protein subsequences, it operates using energy functions that express the constraints on each of the three types obtained by classifying the pairs according to whether they are parallel or antiparallel and according to the hydrogen bond pattern.

[0019] Furthermore, for more accurate prediction it is desirable that the Hopfield neural network unit represent the start and end of each β-sheet and operate using an energy function that expresses the constraint on β-sheet continuity.

[0020] It is also desirable that the Hopfield neural network unit represent the start and end of each α-helix expressed by combinations of protein subsequences and operate using an energy function that expresses the constraint on α-helix continuity, because this allows sequence secondary structure prediction that takes the constraints of three-dimensional structure into account more fully.

[0021] Furthermore, it is desirable, because it further improves prediction accuracy, that the device of the present invention also have a folding simulation unit that receives the values of the operating Hopfield neural network unit as input, performs a folding simulation, and feeds the result back to the values of the Hopfield neural network unit.

[0022]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention will now be described with reference to the drawings.

[0023] FIG. 1 shows the overall configuration of a first embodiment of the sequence secondary structure prediction device according to the present invention. As shown in the figure, the device comprises a binding energy holding unit 11 that holds the binding energies between constituent units (amino acids in the case of proteins, nucleic acids in the case of RNA), a binding energy calculation unit 12 that calculates the binding energy for each combination of subsequences, a subsequence combination generating unit 13 that generates all combinations of two subsequences of a sequence 15 whose three-dimensional structure is unknown, and a Hopfield neural network unit 14 that expresses the constraints of three-dimensional structure.

[0024] The RNA binding energies held by the binding energy holding unit 11 have been obtained by biochemical experiments, as shown in Document 5. For proteins, little is known experimentally about the binding energies, so they are derived from statistics on proteins whose three-dimensional structure is actually known. The method for doing so is known from Document 4.

[0025] In proteins, the secondary structures, which are partial three-dimensional structures, include a helical structure called the α-helix and a linear structure called the β-sheet. For these secondary structures, the binding energies between residues are estimated from statistics. For the β-sheet, for example, the binding energy is obtained for each pair consisting of one 3-residue subsequence and another 3-residue subsequence. The length of three residues was chosen provisionally in view of the number of entries in the three-dimensional structure database; subsequences of other lengths can also be used.

[0026] Pairs of 3-residue sequences that span two strands in a protein β-sheet are classified into three types. The first, type H, is a pair of 3-residue sequences in an antiparallel β-sheet with hydrogen bonds between the residues at both ends, as shown in FIG. 2(a). The second, type A, is a pair of 3-residue sequences in an antiparallel β-sheet with a hydrogen bond only between the central residues, as shown in FIG. 2(b). The third, type P, is a pair of 3-residue sequences in a parallel β-sheet, as shown in FIG. 2(c). The binding energy holding unit 11 of FIG. 1 holds the binding energies classified into these three types.

[0027] In the example that uses 3-residue sequences for the β-sheet, the subsequence combination generating unit 13 of FIG. 1 finds every run of three consecutive residues in the sequence 15 of unknown three-dimensional structure and generates all combinations of two such 3-residue sequences. The binding energy calculation unit 12 calculates the binding energy for each of the combinations generated by the subsequence combination generating unit 13, using the binding energies stored in the binding energy holding unit 11.
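The enumeration and lookup described in the two preceding paragraphs might be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names, the `table` layout keyed by `(type, triplet_i, triplet_j)`, and the `default` energy for missing entries are all hypothetical. The sketch also pairs overlapping windows, which a real system would probably exclude.

```python
from itertools import combinations


def triplets(seq):
    """All contiguous 3-residue windows, keyed by the index of the centre residue."""
    return {i: seq[i - 1:i + 2] for i in range(1, len(seq) - 1)}


def pair_energies(seq, table, default=0.0):
    """Look up an energy for every unordered pair of distinct windows and each type.

    `table` (hypothetical layout) maps (type, triplet_i, triplet_j) to an energy;
    the two windows are treated as interchangeable, and missing entries get `default`.
    """
    wins = triplets(seq)
    out = {}
    for i, j in combinations(sorted(wins), 2):
        for t in ("H", "A", "P"):
            key = (t, wins[i], wins[j])
            rev = (t, wins[j], wins[i])
            out[(t, i, j)] = table.get(key, table.get(rev, default))
    return out
```

For the five-residue sequence `"GAVIL"` this yields three windows (`GAV`, `AVI`, `VIL`), three unordered pairs, and nine entries once the three types are applied.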

[0028] This calculation yields (N−1)²/2 binding energies. The factor of 1/2 is applied because the two 3-residue sequences in a pair are treated as interchangeable; if they are not treated as interchangeable, the factor is omitted and (N−1)² binding energies are calculated. The count uses (N−1) because a 3-residue sequence can never form a pair with itself, so that case is omitted in advance. For proteins, the pairs are classified into the three types H, A and P described above, so 3 × (N−1)²/2 binding energies are calculated.

[0029] In the Hopfield neural network unit 14 of FIG. 1, the results calculated by the binding energy calculation unit 12 (for proteins, 3 × (N−1)²/2 binding energies) are set as initial values, and the unit operates using the same number of cells as there are initial values. In addition, 2 × N cells are used for the α-helix and the β-sheet.

[0030] FIG. 3 shows part of the Hopfield neural network unit 14: FIG. 3(a) shows type H, FIG. 3(b) type A, and FIG. 3(c) type P. H_{i,j} of the Hopfield neural network unit 14 shown in FIG. 3(a) expresses, as a real number in the range [0.0, 1.0] for example, whether the 3-residue sequence (residue i−1, residue i, residue i+1) and the 3-residue sequence (residue j−1, residue j, residue j+1) have a type H bond. Hereinafter, the 3-residue sequence (residue i−1, residue i, residue i+1) is called residue sequence i, and the 3-residue sequence (residue j−1, residue j, residue j+1) is called residue sequence j.

[0031] Similarly, A_{i,j} shown in FIG. 3(b) and P_{i,j} shown in FIG. 3(c) express whether residue sequence i and residue sequence j have a type A bond and a type P bond, respectively. That is, each circle in FIG. 3, for example "H_{i,j}", denotes the pair formed by 3-residue sequence i and 3-residue sequence j (the same applies to FIG. 4, described later).

[0032] Using the 3-residue sequence pairs H_{i,j}, A_{i,j} and P_{i,j}, each constraint imposed by the three-dimensional structure of the protein is expressed as one of the energy functions shown below.

[0033] For each 3-residue sequence i, at most one 3-residue sequence can form a type H pair with it. The type H energy function U₁ is given by the following equation.

[0034]

(Equation 1)

For each 3-residue sequence i, at most one 3-residue sequence can form a type A pair with it. This constraint is realized with the type A energy function U₂ given by the following equation.

[0035]

(Equation 2)

For each 3-residue sequence i, at most two 3-residue sequences can form a type P pair with it. This constraint is realized with the type P energy function U₃ given by the following equation.

[0036]

(Equation 3)

For each 3-residue sequence i, at most two 3-residue sequences can pair with it within a β-sheet. The energy function U₄ expressing this constraint is given by the following equation.

[0037]

(Equation 4)

Every 3-residue sequence pair is allowed at most one of type H, type A and type P. The energy function U₅ for this constraint is given by the following equation.

[0038]

(Equation 5)

In a parallel β-sheet, type P pairs occur consecutively. Accordingly, when a type P 3-residue sequence pair (i−1, j−1) lies adjacent to a type P 3-residue sequence pair (i, j), the β-sheet is longer and chemically more stable. This constraint is realized as follows.

[0039]

(Equation 6)

In an antiparallel β-sheet, the bond types of the 3-residue sequence pairs alternate: type H, type A, type H, type A, and so on. Accordingly, when a type A 3-residue sequence pair (i−1, j+1) lies adjacent to a type H 3-residue sequence pair (i, j), the β-sheet is longer and chemically more stable. The energy function U₆ for this case is given by the following equation.

[0040]

(Equation 7)

Because the bond types of the 3-residue sequence pairs in an antiparallel β-sheet alternate (type H, type A, type H, type A, and so on), consecutive type H pairs or consecutive type A pairs in a β-sheet are prohibited. This constraint is realized with the energy function U₇ given by the following equation.

[0041]

(Equation 8)

The initial values derived from the 3-residue sequence pair propensity index are mostly correct, so the state is kept from moving far away from them. This is realized by equations (10) to (12). In these equations, H^initial_{i,j}, A^initial_{i,j} and P^initial_{i,j} are the binding energies for all combinations of the 3-residue sequences generated by the subsequence combination generating unit 13 of FIG. 1, as calculated by the binding energy calculation unit 12 using the binding energies stored in the binding energy holding unit 11; they are the initial values given to the Hopfield neural network unit 14.

[0042]

(Equation 9)

The state energy function E of the Hopfield neural network is the weighted sum of these energy functions U_k, as shown in the following equation.

[0043]

(Equation 10)

Here, α_k in the above equation is a weighting coefficient.
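The equation images are not reproduced in this text, so the exact forms of the U_k are unknown. The sketch below only illustrates the general idea of a weighted sum of constraint penalties. Everything in it is an assumption: the quadratic penalty form, the function names, the use of symmetric NumPy matrices for H, A and P (so that a row sum counts the partners of one 3-residue sequence), and the choice of just four terms.

```python
import numpy as np


def at_most(x, k):
    """Assumed penalty: zero while each row sum of x is <= k, quadratic above that."""
    excess = np.maximum(x.sum(axis=1) - k, 0.0)
    return float(np.sum(excess ** 2))


def total_energy(H, A, P, alpha):
    """Weighted sum E = sum_k alpha[k] * U_k over four illustrative constraint terms."""
    terms = [
        at_most(H, 1),          # at most one type H partner per 3-residue sequence
        at_most(A, 1),          # at most one type A partner per 3-residue sequence
        at_most(P, 2),          # at most two type P partners per 3-residue sequence
        at_most(H + A + P, 2),  # at most two partners in total within a beta-sheet
    ]
    return sum(a * u for a, u in zip(alpha, terms))
```

With this form, a state that satisfies every constraint has energy zero, and raising a coefficient in `alpha` makes violations of the corresponding constraint more costly.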

[0044] The steepest descent method is used to make the Hopfield neural network operate so as to minimize the energy function E. To do this, the energy function E is partially differentiated with respect to H_{i,j}, and the change ΔmH_{i,j} in the membrane potential of H_{i,j} is determined from that partial derivative, as shown in the following equation.

[0045]

(Equation 11)

Because the state energy function E of the Hopfield neural network is nearly linear in the cell activities H_{i,j}, A_{i,j} and P_{i,j}, the operating equations are easy to derive. The cell activity H_{i,j} is determined by the following equation from the type H membrane potential mH_{i,j}, which is updated according to the change ΔmH_{i,j} described above.

[0046]

(Equation 12)

The membrane potentials mA_{i,j} and mP_{i,j} for type A and type P are obtained in the same way, and the activities A_{i,j} and P_{i,j} of those cells are updated on the basis of the partial derivatives.

[0047] By operating the Hopfield neural network unit 14 with the above operating equations and with the equations for the cell activities A_{i,j} and P_{i,j} that correspond to equations (14) and (15), the secondary structure of the sequence can be predicted with higher reliability than before, with the constraints of three-dimensional structure taken into account.
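The update cycle described above (differentiate E with respect to each activity, change the membrane potential accordingly, then recompute the activity) can be sketched as follows. The patent's own update and activation equations are not reproduced in this text, so this is a sketch under assumptions: the sigmoid activation, the `gain` and `step` parameters, and the numerical (central difference) gradient are all choices made here for illustration.

```python
import numpy as np


def sigmoid(m, gain=1.0):
    """Assumed activation: map a membrane potential to an activity in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-gain * m))


def descend(energy, m, step=0.1, iters=200, eps=1e-5):
    """Lower energy(sigmoid(m)) by steepest descent on the membrane potentials m."""
    m = np.asarray(m, dtype=float).copy()
    for _ in range(iters):
        x = sigmoid(m)
        grad = np.zeros_like(x)
        # Estimate dE/dx for every cell by a central difference.
        for idx in np.ndindex(x.shape):
            hi, lo = x.copy(), x.copy()
            hi[idx] += eps
            lo[idx] -= eps
            grad[idx] = (energy(hi) - energy(lo)) / (2 * eps)
        m -= step * grad  # change in membrane potential: -step * dE/dx
    return sigmoid(m)
```

Any callable that maps an activity array to a scalar can be passed as `energy`; an analytic gradient would be faster, but the numerical one keeps the sketch independent of the exact energy terms.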

[0048] Next, a second embodiment of the present invention will be described. The overall configuration of the second embodiment is the same as in FIG. 1, but the configuration of the Hopfield neural network unit 14 differs. FIG. 4 shows part of the Hopfield neural network unit 14 of the second embodiment: FIG. 4(a) shows type H, FIG. 4(b) type A, and FIG. 4(c) type P. In the second embodiment, in addition to the configuration shown in FIG. 3, the Hopfield neural network unit 14 uses 2 × (N−1)²/2 cells, as shown in FIG. 4, to represent the breaks in parallel β-sheets and antiparallel β-sheets.

[0049] As shown in FIG. 4, the cells that represent breaks in a parallel β-sheet are denoted SP, and the cells that represent breaks in an antiparallel β-sheet are denoted SA. In the second embodiment, the Hopfield neural network unit 14 uses these cells to represent the start and end of each β-sheet, and uses energy functions that express the constraint on sheet continuity. The energy functions of the first embodiment are therefore extended and modified as follows.

[0050] For the parallel β-sheet, a new variable (cell) SP relating to breaks is introduced into the energy function for the continuity of type P. SP has the following property. When SP_{i,j} is 1.0, the 3-residue sequence pair (i−1, j−1) and the 3-residue sequence pair (i, j) are discontinuous with respect to the parallel β-sheet: one of the two pairs belongs to the parallel β-sheet and the other does not. When SP_{i,j} is 0.0, the two pairs are continuous in the sense of the parallel β-sheet. Using SP, the continuity of the parallel β-sheet is realized by the following equation in place of equation (7).

[0051]

(Equation 13)

For the antiparallel β-sheet, the following equation is used in place of equation (8).

[0052]

(Equation 14)

The state energy function E of the Hopfield neural network is the weighted sum of these energies U_k, as shown in the following equation.

[0053]

(Equation 15)

Here, α_k in the above equation is a weighting coefficient.

[0054] In this embodiment, the steepest descent method is used to make the Hopfield neural network unit 14 operate so as to minimize the state energy function E of the Hopfield neural network. To do this, the energy function E is partially differentiated with respect to the activity of each type of cell, and the Hopfield neural network unit 14 is operated on the basis of those partial derivatives. This yields a more plausible prediction of the secondary structure of the sequence, with the constraints of three-dimensional structure taken into account. Because this embodiment uses the cells SA and SP that represent breaks, as shown in FIG. 4, the extent of each β-sheet is known more accurately than in the first embodiment, so the reliability of the secondary structure prediction can be improved further.

[0055] Next, a third embodiment of the present invention will be described. The overall configuration of this embodiment is the same as that of FIG. 1, except that the binding energy holding unit 11 also holds the binding energy of the α helix in addition to the binding energies of the first and second embodiments, and the Hopfield neural network unit 14 uses an energy function that expresses the start and end of the spiral α helix and expresses a constraint on the continuity of the α helix.

[0056] The frequency with which each 7-residue partial sequence appears in α helices is determined, the binding energy of the α helix is calculated from that frequency, and the result is held in the binding energy holding unit 11 of FIG. 1. The length of 7 residues was chosen provisionally in view of the size of the three-dimensional structure database; partial sequences of other lengths can also be used.
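The frequency counting described above could be sketched as follows. This passage does not give the formula that converts frequency into binding energy, so the negative log of a pseudocount-smoothed relative frequency used here is an assumption, as are the function names and the toy sequence and helix annotation.

```python
from collections import Counter
import math

K = 7  # window length used in the text; other lengths are possible

def helix_kmer_counts(sequences, helix_masks, k=K):
    """Count k-residue windows that lie entirely inside annotated helices."""
    counts = Counter()
    for seq, mask in zip(sequences, helix_masks):
        for i in range(len(seq) - k + 1):
            if all(mask[i:i + k]):
                counts[seq[i:i + k]] += 1
    return counts

def kmer_energy(kmer, counts, pseudocount=1.0):
    """Assumed form: energy = -log(relative frequency), with a pseudocount."""
    total = sum(counts.values())
    n_possible = 20 ** len(kmer)  # 20 residue types per position
    p = (counts.get(kmer, 0) + pseudocount) / (total + pseudocount * n_possible)
    return -math.log(p)

# Hypothetical toy data: one sequence with a per-residue helix annotation.
seqs = ["AEELLKKAGSG"]
masks = [[True] * 8 + [False] * 3]
counts = helix_kmer_counts(seqs, masks)
```

With this assumed form, windows seen more often in helices receive a lower energy than windows never observed there.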

[0057] In this embodiment, when the partial sequence combination generation unit 13 of FIG. 1 receives a sequence of length N whose three-dimensional structure is unknown, the binding energy calculation unit 12 calculates N binding energies for the α helix using the values from the binding energy holding unit 11, and sets the results in the Hopfield neural network unit 14 as the initial values Helix^initial_{i,j}. Here, whether a given residue is included in an α helix is expressed by a real number in, for example, [0.0, 1.0].

[0058] In this embodiment, in addition to the configuration of the Hopfield neural network unit 14 of the second embodiment, N cells are used for each of the α helix and the β sheet, and a further N cells are used to represent breaks in the α helix. FIG. 5 shows part of the Hopfield neural network unit 14 of this embodiment. In the figure, Sheet_i is a variable indicating the degree to which a residue belongs to a β sheet, Helix_i is the variable for the α helix, and SHelix_i is a variable indicating a break in the α helix.

[0059] As the energy function used in the Hopfield neural network unit 14 of this embodiment, the following energy functions are added to the energy functions of the second embodiment.

[0060] Because the predicted value Helix_{i,j} for the α helix is correct in most cases, the energy function U₁₄ of the following equation is used to keep it from deviating greatly from the initial value Helix^initial_{i,j}.

[0061]

[Equation 16]

Further, to prevent α helices and β sheets from overlapping at the residue level, the energy function U₁₅ of the following equation is used.

[0062]

[Equation 17]

Here, Sheet_i in the above equation is, as shown in FIG. 5, a variable indicating the degree to which a residue belongs to a β sheet, and is obtained from the following equation.

[0063]

[Equation 18]

To make the α helix continuous, the energy function U₁₆ expressed by the following equation is used, with SHelix representing breaks in the α helix.

[0064]

[Equation 19]

Further, the energy function U₁₇ of the following equation is used to make the length of each α helix 4 or more.

[0065]

[Equation 20]

The Hopfield neural network state energy function E is the weighted sum of the energy functions used in the second embodiment and the energy functions U₁₄ to U₁₇, as expressed by the following equation.

[0066]

[Equation 21]

Here, α_k in the above equation is a weighting coefficient.
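To make the roles of two of these terms concrete, the sketch below shows one plausible form for U₁₄ and U₁₅ and their weighted sum. The equations themselves are not reproduced in this text, so the squared-deviation and product forms, the per-residue indexing, the function names, and the sample values are all assumptions rather than the definitions of this document.

```python
import numpy as np

def u14_stay_near_initial(helix, helix_initial):
    """Assumed form: squared deviation of Helix from its initial value."""
    return float(np.sum((helix - helix_initial) ** 2))

def u15_no_overlap(helix, sheet):
    """Assumed form: product penalty, large when a residue is both helix and sheet."""
    return float(np.sum(helix * sheet))

def total_energy(terms, weights):
    """Weighted sum of whichever energy terms are supplied."""
    return float(sum(a * u for a, u in zip(weights, terms)))

# Hypothetical per-residue activities in [0.0, 1.0].
helix_initial = np.array([0.9, 0.9, 0.8, 0.1, 0.0])
helix = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
sheet = np.array([0.0, 0.0, 0.5, 1.0, 1.0])

u14 = u14_stay_near_initial(helix, helix_initial)
u15 = u15_no_overlap(helix, sheet)
e = total_energy([u14, u15], weights=[1.0, 2.0])
```

In this toy case only the third residue is partly helix and partly sheet, so it alone contributes to the overlap penalty.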

[0067] The Hopfield neural network unit 14 uses the steepest descent method so that it operates to minimize the energy function E. To this end, the Hopfield neural network unit 14 partially differentiates the Hopfield neural network state energy function E with respect to the activities of the cells and operates on the basis of these partial derivatives. This allows a more likely sequence secondary structure to be predicted for protein and RNA sequences whose three-dimensional structure is unknown, with the constraints of the three-dimensional structure taken into account.

[0068] Next, a fourth embodiment of the present invention will be described. FIG. 6 is an overall configuration diagram of the fourth embodiment of the sequence secondary structure prediction device according to the present invention. In the figure, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted. As shown in FIG. 6, this embodiment is characterized by having a folding simulation unit 17 that simulates the folding of the three-dimensional structure of the sequence.

[0069] The folding simulation unit 17 runs a molecular model in a computer on the basis of the α helix and β sheet predictions obtained by the Hopfield neural network unit 14, and simulates whether that fold is possible. That is, the folding simulation unit 17 receives the values of the operating Hopfield neural network unit 14 as input, performs a folding simulation, and feeds the result back to the values of the Hopfield neural network unit 14. Folding simulation itself is known from, for example, Reference 6 (Hirosawa M., Feldmann R., Rawn D., Ishikawa M., Hoshida M., and Michaels G., "Folding Simulation using Temperature Parallel Simulated Annealing", Proceedings of FGCS 1992, pp. 300-306 (1992)).
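The interplay between the network and the folding simulation can be sketched as an alternating loop. This passage does not specify how the feedback is applied, so the damping of rejected activities, the `fold_check` callback standing in for the folding simulation, and all names and parameter values below are hypothetical.

```python
import numpy as np

def hopfield_step(acts, grad, step=0.05):
    """One steepest-descent update of the cell activities, clipped to [0, 1]."""
    return np.clip(acts - step * grad(acts), 0.0, 1.0)

def run_with_feedback(acts, grad, fold_check, rounds=20, steps_per_round=10, damp=0.5):
    """Alternate network updates with a folding check that feeds back.

    fold_check(acts) is a hypothetical stand-in for the folding simulation:
    it returns a boolean array marking cells whose current assignment the
    simulated fold cannot accommodate. Those activities are damped.
    """
    acts = np.asarray(acts, dtype=float).copy()
    for _ in range(rounds):
        for _ in range(steps_per_round):
            acts = hopfield_step(acts, grad)
        infeasible = fold_check(acts)
        acts[infeasible] *= damp  # feedback onto the network's values
    return acts

# Toy stand-ins (assumptions): the energy pulls activities toward `target`,
# and the "simulation" rejects cell 2 whenever it is active.
target = np.array([1.0, 1.0, 1.0, 0.0])
grad = lambda a: 2.0 * (a - target)
fold_check = lambda a: np.array([False, False, a[2] > 0.5, False])

result = run_with_feedback(np.full(4, 0.5), grad, fold_check)
```

In the toy run, the energy alone would drive cell 2 toward 1.0, but the repeated feedback holds it below 0.5, which is the kind of correction the folding check is meant to supply.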

[0070] According to this embodiment, it is possible to know in advance that, for example, residues in a β sheet that would otherwise be separated by a sufficient distance can no longer pair within the β sheet because an α helix inserted between them shortens the distance between the residues.

[0071] The present invention is not limited to the above embodiments, and is also applicable to predicting the sequence secondary structure of nucleic acid sequences other than RNA, such as DNA (deoxyribonucleic acid).

[0072]

[Effects of the Invention] As described above, the present invention takes the constraints of the three-dimensional structure into account. It therefore avoids the problems of the conventional approach, in which the amount of computation for the search grows in proportion to the fourth power of the sequence length, making the approach difficult to apply to long sequences and unable to predict β sheets formed between distant segments. The sequence secondary structure can be predicted more accurately than before, even for long sequences and for β sheets formed between distant segments.

[0073] Further, according to the present invention, a folding simulation is performed with the values of the operating Hopfield neural network unit as input, and the result is fed back to the values of the Hopfield neural network unit. This further improves the accuracy of the prediction and greatly improves the reliability of predicting the sequence secondary structure of a protein or nucleic acid sequence whose three-dimensional structure is unknown.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of a first embodiment of the device of the present invention.

FIG. 2 is an explanatory diagram of the three types of pairs of three-residue sequences that span strands in a protein β sheet.

FIG. 3 is a diagram showing a first example of the Hopfield neural network unit of FIG. 1.

FIG. 4 is a diagram showing a second example of the Hopfield neural network unit of FIG. 1.

FIG. 5 is a diagram showing a third example of the Hopfield neural network unit of FIG. 1.

FIG. 6 is an overall configuration diagram of a fourth embodiment of the device of the present invention.

[Explanation of symbols]

11 Binding energy holding unit
12 Binding energy calculation unit
13 Partial sequence combination generation unit
14 Hopfield neural network unit
15 Sequence of unknown three-dimensional structure
17 Folding simulation unit

Continuation of the front page
(58) Fields searched (Int. Cl.⁷, DB name): G06N 3/10; G01N 33/48; G01N 33/68; G06F 19/00 110; INSPEC (DIALOG); JICST file (JOIS); WPI (DIALOG)

Claims

(57) [Claims]

1. A sequence secondary structure prediction method in which binding energies relating to bonds between the constituent units of protein and nucleic acid sequences are used to calculate the binding energy for all combinations of two partial sequences of an input protein or nucleic acid sequence whose three-dimensional structure is unknown, after which the combination of partial sequences giving the lowest sum of the binding energies is obtained by a Hopfield neural network expressing constraints of the three-dimensional structure and is predicted as the sequence secondary structure of the input protein or nucleic acid sequence whose three-dimensional structure is unknown, the method being characterized in that whether the result obtained by the Hopfield neural network can be folded is simulated, and a combination of foldable partial sequences is predicted as the sequence secondary structure of the input protein or nucleic acid sequence whose three-dimensional structure is unknown.

2. A sequence secondary structure prediction device comprising: a binding energy holding unit that holds binding energies relating to bonds between the constituent units of protein and nucleic acid sequences; a partial sequence combination generation unit that generates all combinations of two partial sequences of an input protein or nucleic acid sequence whose three-dimensional structure is unknown; a binding energy calculation unit that uses the binding energies from the binding energy holding unit to calculate the binding energy for all combinations of two partial sequences from the partial sequence combination generation unit; and a Hopfield neural network unit in which the binding energies calculated by the binding energy calculation unit are set as initial values and which predicts the combination of partial sequences yielding the lowest binding energy as the sequence secondary structure of the input protein or nucleic acid sequence whose three-dimensional structure is unknown, the device being characterized in that the binding energy holding unit holds, for β sheets expressed by combinations of partial sequences of proteins, the binding energy of each of three types distinguished by parallel type, antiparallel type, and hydrogen bond pattern, and the binding energy calculation unit calculates the binding energy for each of the three types of binding energy.

3. The sequence secondary structure prediction device according to claim 2, wherein the Hopfield neural network unit operates using an energy function expressing constraints on β sheets that are expressed by combinations of partial sequences of proteins and classified into three types by parallel type, antiparallel type, and hydrogen bond pattern.

4. The sequence secondary structure prediction device according to claim 2, wherein the Hopfield neural network unit operates using an energy function that expresses the start and end of the β sheet and expresses a constraint on the continuity of the β sheet.

5. The sequence secondary structure prediction device according to any one of claims 2 to 4, wherein the Hopfield neural network unit operates using an energy function that expresses the start and end of an α helix expressed by a combination of partial sequences of proteins and expresses a constraint on the continuity of the α helix.

6. The sequence secondary structure prediction device according to any one of claims 2 to 5, further comprising a folding simulation unit that receives the values of the operating Hopfield neural network unit as input, performs a folding simulation, and feeds the result back to the values of the Hopfield neural network unit.