JP2010134514A

JP2010134514A - Protein three-dimensional structure prediction method

Info

Publication number: JP2010134514A
Application number: JP2008307164A
Authority: JP
Inventors: Kentaro Onizuka; 健太郎鬼塚
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2008-12-02
Filing date: 2008-12-02
Publication date: 2010-06-17

Abstract

PROBLEM TO BE SOLVED: To provide a method for protein three-dimensional structure prediction that decides which sequence fragment is subjected to structure optimization first. The decision uses the entropies of sequence fragments of every length at every position, calculated from the sequence of the protein whose structure is to be predicted.

SOLUTION: Short sequence fragments are optimized before long ones. Among fragments of the same length, those with smaller entropy are optimized first. The optimization is scheduled so that a fragment optimized later is energy-optimized together with the sections of fragments optimized before it. Each fragment is optimized by finding the structure that minimizes its total energy. The total energy is the sum of the potentials of mean force over all residue pairs in the fragment plus the sum of molecular dynamics potentials over all atoms in the fragment. In computing it, all residues are treated as glycine and CO and NH groups are added to the residues at both ends of the fragment.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a protein three-dimensional structure prediction method based on variable-length protein fragment entropy analysis. The method predicts the three-dimensional structure of a natural or non-natural protein from its amino acid residue sequence.

A natural protein is an organic polymer: a chain of several tens to several thousands of residues drawn from the 20 biological amino acids, folded into a specific structure. Because a protein is identified by its amino acid sequence, the number of possible proteins is practically unlimited, given the diversity of both length and sequence.

Proteins are found throughout living organisms and serve many functions:
- they form the structural elements of biological structures (structural proteins);
- they act as catalysts that accelerate chemical reactions in the body (enzymes);
- they recognize and bind specific molecules, as exemplified by the immune system.

The amino acid sequence of a natural protein is encoded in the corresponding gene as a DNA base sequence. When needed, amino acids are joined according to that base sequence to synthesize a peptide. The synthesized peptide eventually adopts a stable three-dimensional structure and thereby acquires a specific function. Some proteins, however, have structures that are partly or wholly unfixed and nevertheless have specific functions.

The function of each protein is closely tied to its three-dimensional structure, so knowing that structure is indispensable for understanding how the protein works. Determining a protein's three-dimensional structure by X-ray crystallography or NMR, however, takes far more effort and time than determining its amino acid sequence. A method that infers the three-dimensional structure directly from amino acid sequence information is therefore needed; this is a protein three-dimensional structure prediction method. Developing an accurate prediction method requires understanding how a protein's amino acid sequence relates to its structure. That understanding comes from elucidating the mechanism by which the three-dimensional structure forms.

Research on protein structure prediction began in the early 1960s, when the structure of myoglobin was first solved by X-ray crystallography. Early work focused on two regular structures:
- the helix, a regular spiral stabilized by internal hydrogen bonds;
- the sheet, in which several linearly extended strands are joined side by side into a plate by hydrogen bonds.

Secondary structure prediction methods were studied extensively. These methods try to infer, from patterns in the amino acid residue sequence, which parts of the sequence form helices or strands. Turn structures were also found among irregular structures that are not regular but are still stabilized by hydrogen bonds, and the types of characteristic turns were studied actively as well.

In the 1980s, researchers in statistical physics applied its theory to protein folding. They analyzed the stability of protein three-dimensional structures and studied the statistical-physical mechanism of folding using simplified protein models. This work showed that the amino acid sequences of proteins with stable three-dimensional structures are tuned to form their structures readily. Most random sequences, by contrast, never reach a stable three-dimensional structure.

In the 1990s, the number of proteins with known three-dimensional structures exceeded 1000, and prediction methods that exploit these known structures were proposed.

In Non-Patent Document 1, the sequence of the target protein is threaded onto the three-dimensional structure of a protein of known structure. The fitness between sequence and structure is then evaluated with a potential of mean force extracted statistically from proteins of known structure. When the fitness is high, the target protein is likely to have a structure similar to that of the matching known protein. Structure prediction was thus reinterpreted as structure recognition, and at this stage it became possible in many cases to predict protein three-dimensional structures with practical accuracy.

Most current mainstream methods substantially extend this approach. They first roughly predict the structure of a protein of unknown structure as a possibly similar structure found by structure recognition. They then refine that model in detail by molecular dynamics or similar calculations. Today, the predicted structure is said to resemble the experimentally determined structure of a newly solved protein in 60% or more of cases. Prediction technology is therefore considered to have reached a practically usable stage.

On the other hand, the mechanism of protein structure formation is far from understood. Consider a newly solved protein whose three-dimensional structure has little similarity to any known structure and whose sequence is entirely different from those of proteins of known structure. For such proteins, structure prediction has almost never succeeded. Even counting as successes the cases in which a somewhat similar structure was predicted, the accuracy is only 10% to 20%. What is truly needed is not a method that infers structure from known protein structures by structure recognition or sequence similarity. It is a rigorous prediction method built on an understanding of how protein structures form.

Against this background, Patent Document 1 suggested that the order in which a protein's three-dimensional structure forms may be determined by the entropy of partial fragments of its sequence. That document examined a small number of proteins whose folding order had been established experimentally. For each, it calculated the entropy of the structural diversity of partial sequence fragments. In many cases, folding could be regarded as proceeding in order from the lowest-entropy parts.

These low-entropy parts are mostly not secondary-structure elements, such as the regular helices and sheets that are ultimately stabilized by hydrogen bonds. Instead, they often correspond to irregular regions at the breaks between secondary-structure elements and to regions that form turns. This explains why the secondary structure prediction methods of the 1970s, which tried to locate helix and sheet segments, met with little success. Helices and sheets are common because these regular, stable structures form spontaneously regardless of the amino acid sequence pattern. Most sequences can become either a strand or a helix. Which of the two forms is thought to depend on the sequence of the irregular segments located between these stable segments.

Because this finding is rooted in the principles of statistical mechanics, it has made it possible to elucidate a very important part of the mechanism of protein structure formation.

The low-entropy segments contain many residues that are rather unusual among the 20 amino acids, such as glycine, proline, tryptophan, and cysteine.
- Glycine lacks a side chain, which allows it to adopt very flexible conformations.
- Proline has a five-membered ring that includes the main chain, so its main-chain dihedral angle φ is restricted to around −60°.
- Tryptophan has a very large side chain, which likewise restricts φ and ψ considerably.

Conversely, the high-entropy segments are dominated by very ordinary amino acids, typified by alanine. Alanine itself is frequently found in helices, but entropy calculations show that it does not actively drive helix formation. It is simply not energetically disfavored when it happens to be in a helix. It therefore often ends up in helical structure without being what triggers the helix to form.

The entropy analysis above suggests the following picture.

A high-entropy fragment of a protein sequence changes its local energy little, whatever conformation it adopts. It can therefore take almost any structure, depending on nearby residues outside the fragment or on segments far away in the sequence.

A low-entropy fragment behaves differently. Its local energy is very low in one of a few specific conformations and very high otherwise. It therefore tends to adopt a low-energy local structure on its own, even without influence from its surroundings.

In other words, segments whose entropy is lower than that of their surroundings take a specific structure before the rest of the chain. The surrounding higher-entropy segments, influenced by them, then settle into relatively stable helix or sheet structures. This means that protein structure formation follows the Boltzmann principle of statistical mechanics.
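The link between a narrow energy landscape and low entropy can be illustrated with a toy calculation. The sketch below assumes a fragment has a handful of discrete conformations with made-up energies. It computes the Gibbs entropy −Σ p ln p (in units of k_B) of their Boltzmann distribution. This only illustrates the argument above; it is not the pair-entropy calculation of S^s used in this method. The energies and the kT value (0.593 kcal/mol, roughly room temperature) are assumptions.

```python
import math
from typing import Sequence


def boltzmann_entropy(energies: Sequence[float], kT: float = 0.593) -> float:
    """Gibbs entropy -sum(p ln p), in units of k_B, of a Boltzmann distribution.

    energies: energies of discrete conformational states (kcal/mol, assumed).
    kT: thermal energy; 0.593 kcal/mol is roughly room temperature (assumption).
    """
    e_min = min(energies)
    # Shift by the minimum before exponentiating to avoid overflow/underflow.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return -sum((w / z) * math.log(w / z) for w in weights if w > 0.0)


# Hypothetical fragment whose conformations all have similar energies.
flat = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.0]
# Hypothetical fragment with one conformation far below the others.
funnel = [0.0, 4.0, 5.0, 4.5, 6.0, 5.5, 4.0, 5.0]

print(f"flat:   {boltzmann_entropy(flat):.2f}")
print(f"funnel: {boltzmann_entropy(funnel):.2f}")
```

The flat landscape gives an entropy close to ln 8, the maximum for eight states. The funnel-shaped landscape, where one conformation dominates, gives a value near zero.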

The sequence-dependent entropy S^s of a sequence fragment is calculated as follows. Take the i-th and j-th amino acid residues of the fragment. Let k = j − i be their relative position in the sequence, and let a and b be the types of the i-th and j-th residues, respectively. The sequence-dependent pair entropy S^{s,ab}_k is determined by k, a, and b. S^s is the sum of S^{s,ab}_k over all pairs in the fragment that satisfy i ≤ j:

[Equation 1]
$$S^{s} = \sum_{i \le j} S^{s,ab}_{k=j-i}$$
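Equation 1 is a double sum over the residue pairs in a fragment. The minimal Python sketch below assumes the pair entropies S^{s,ab}_k are supplied by a caller-provided function. That function, `pair_entropy(k, a, b)`, is hypothetical; the actual pair-entropy table from Patent Document 1 is not reproduced here.

```python
from typing import Callable, Sequence

# Hypothetical signature: pair_entropy(k, a, b) returns S^{s,ab}_k for residue
# types a (at position i) and b (at position j) separated by k = j - i.
PairEntropy = Callable[[int, str, str], float]


def fragment_entropy(fragment: Sequence[str], pair_entropy: PairEntropy) -> float:
    """Sequence-dependent entropy S^s of one fragment (Equation 1)."""
    n = len(fragment)
    total = 0.0
    for i in range(n):
        for j in range(i, n):  # i <= j, so the diagonal k = 0 terms are included
            total += pair_entropy(j - i, fragment[i], fragment[j])
    return total


# Toy lookup for illustration only: these are not real pair-entropy values.
def toy_pair_entropy(k: int, a: str, b: str) -> float:
    return -1.0 if "G" in (a, b) else 0.1 * k


print(fragment_entropy("AGPA", toy_pair_entropy))
```

Because Equation 1 sums over i ≤ j, the diagonal pairs i = j (k = 0) are included.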

Here the sequence fragment entropy S^s is the sequence-dependent part of the fragment's total entropy S = S^c + S^s. The sequence-independent entropy S^c cannot be calculated in principle. However, it takes the same value for all fragments of the same length, regardless of sequence. To compare entropies across sequences, it therefore suffices to consider S^s, provided the fragments being compared have the same length.

Accordingly, an entropy map is built for the protein whose three-dimensional structure is to be predicted. The map gives S^s for fragments of every length at every position in the whole sequence. The folding order of the protein is then decided from this map. In this way, the three-dimensional structure of the protein is expected to be predictable, including its folding process.
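An entropy map covering every length and position can reuse a running sum. Extending a fragment by one residue at its end adds only the pairs that involve the new residue. The sketch below assumes the same hypothetical `pair_entropy(k, a, b)` lookup. It keys the map by (length, start) so that values are only compared within one length, as the S^c argument above requires. The helper `lowest_at_length` is likewise illustrative.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical lookup: pair_entropy(k, a, b) -> S^{s,ab}_k.
PairEntropy = Callable[[int, str, str], float]


def entropy_map(seq: Sequence[str], pair_entropy: PairEntropy,
                max_length: Optional[int] = None) -> Dict[Tuple[int, int], float]:
    """Return {(length, start): S^s} for every fragment of seq (0-based start).

    Extending the fragment [start, end-1] to [start, end] adds exactly the
    pairs (i, end) with start <= i <= end, so each start needs one running sum.
    """
    n = len(seq)
    limit = n if max_length is None else min(max_length, n)
    result: Dict[Tuple[int, int], float] = {}
    for start in range(n):
        running = 0.0
        for end in range(start, min(n, start + limit)):
            for i in range(start, end + 1):
                running += pair_entropy(end - i, seq[i], seq[end])
            result[(end - start + 1, start)] = running
    return result


def lowest_at_length(emap: Dict[Tuple[int, int], float], length: int) -> int:
    """Start of the lowest-S^s fragment, comparing only fragments of one length."""
    return min((s for (l, s) in emap if l == length), key=lambda s: emap[(length, s)])
```

With n residues and no length cap, this evaluates O(n^3) pair terms rather than recomputing each fragment from scratch.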

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-146529.
Non-Patent Document 1: Sippl M.J., "Calculation of Conformational Ensemble from Potentials of Mean Force: An Approach to the Knowledge-based Prediction of Local Structure in Globular Proteins", Journal of Molecular Biology, 213, pp. 850-883, 1990.
Non-Patent Document 2: Bowie J.U., et al., "A Method to Identify Protein Sequence That Fold into a Known Three-Dimensional Structure", Science, 256, pp. 164-170, 1991.
Non-Patent Document 3: Onizuka K., et al., "Using Data Compression for Multi-dimensional Distribution Analysis", IEEE Intelligent Systems, 17(3), pp. 48-54, 2002.

A prediction method of this kind must be able to calculate, for a sequence fragment of any length, the total energy of the fragment when it adopts a specific three-dimensional conformation. Furthermore, the prediction algorithm requires that this total energy depend only on the three-dimensional structure of the main chain and not on side-chain conformation. The first problem is how to calculate a total energy for structural fragments that meets these requirements.

The next problem is how to find the optimal structure of a structural fragment during its energy optimization. The structure starts from an initial structure and is changed step by step until it reaches the final optimal structure. Two methods are therefore needed: one for setting the initial structure, and one for deforming the fragment at each step. Setting the initial structure of a structural fragment is the second problem. Deforming structural fragments is the third problem.

To predict the overall structure, the entropies of sequence fragments of every length at every position in the protein sequence must be examined. Local structure optimization must then be performed in order from the lowest entropy. The entropy calculation imposes a constraint, however. The entropies of fragments of different lengths cannot be compared, because their sequence-independent entropies (which cannot be calculated) differ. Entropy values must therefore be compared only among fragments of the same sequence length. The fourth problem is in what order to optimize the structures of the sequence fragments under these constraints.

An object of the present invention is to solve the above problems by providing a protein three-dimensional structure prediction method that:
- calculates the total energy of protein structural fragments more accurately than the prior art;
- sets initial structures and deforms structural fragments appropriately;
- optimizes the structures of sequence fragments.

The protein three-dimensional structure prediction method according to the present invention uses a prediction algorithm with the following steps:
1. Calculate the sequence-dependent entropy S^s of every possible partial sequence fragment of the protein whose structure is to be predicted.
2. From the magnitudes of these values, decide the order in which partial fragments are folded (structure optimization scheduling).
3. At each folding step, vary the variable-length fragment to optimize its total energy (structural fragment optimization).
4. Finally, predict the overall structure.

This algorithm is intended to solve the protein three-dimensional structure prediction problem.

The protein three-dimensional structure prediction method according to the present invention first calculates the sequence-dependent entropy of every sequence fragment in the entire sequence of the protein whose three-dimensional structure is to be predicted. According to the relative magnitudes of these entropies, it then schedules two things:
- the order in which the structures of the corresponding structural fragments are predicted or optimized;
- the conditions under which the surrounding sequence is taken into account.

In the above method, shorter structural fragments are optimized before longer ones. Fragments of the same length are optimized in increasing order of the sequence-dependent entropy of their corresponding sequence fragments. Sometimes the fragment to be optimized at a given step overlaps a fragment of the same length that was optimized earlier. In that case, the fragment is extended to include all previously optimized fragments that are contiguous with it in the sequence. Structure optimization then minimizes the total energy of this extended structural fragment.
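The scheduling rule can be sketched as follows. This is an illustrative reading of the rule, not the patent's own code. It makes three assumptions:
- the entropy map is a dict keyed by (length, start);
- two fragments overlap only when they share at least one residue (the text does not say whether merely adjacent fragments should also be merged);
- entropy ties are broken by sequence position.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]            # inclusive (first, last) residue indices
Step = Tuple[int, Interval, Interval]  # (length, fragment, span actually optimized)


def schedule(entropies: Dict[Tuple[int, int], float]) -> List[Step]:
    """Order fragment optimizations from an entropy map keyed by (length, start).

    Shorter fragments come first; within one length, lower S^s comes first
    (ties broken by position). If a fragment overlaps blocks already optimized
    at the same length, the optimized span is extended to cover them.
    """
    steps: List[Step] = []
    for length in sorted({l for l, _ in entropies}):
        starts = sorted((s for l, s in entropies if l == length),
                        key=lambda s: (entropies[(length, s)], s))
        blocks: List[Interval] = []  # disjoint spans already optimized at this length
        for s in starts:
            lo, hi = s, s + length - 1
            kept: List[Interval] = []
            for b_lo, b_hi in blocks:
                if b_lo <= hi and lo <= b_hi:  # shares at least one residue
                    lo, hi = min(lo, b_lo), max(hi, b_hi)
                else:
                    kept.append((b_lo, b_hi))
            blocks = kept + [(lo, hi)]
            steps.append((length, (s, s + length - 1), (lo, hi)))
    return steps
```

Each step records both the fragment chosen by entropy and the extended span whose total energy is minimized. This keeps the two notions separate for the optimizer.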

Also in the above method, the structural fragment optimization schedule starts from the shortest sequence fragments. At length 1, the optimal solutions for all residues are assumed to be independent of one another. For lengths of 2 or more, structure optimization at each length starts from the whole structure optimized at the preceding, one-residue-shorter length.
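The level-by-level initialization can be sketched as a driver loop. The names are hypothetical placeholders:
- `best_single` stands in for the length-1 optimization;
- `refine` stands in for the extended-fragment energy minimization;
- `copy_first` is a toy refinement used only to make the chaining visible.

Representing each residue by a (φ, ψ) pair is likewise an assumption made for brevity.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]                 # inclusive residue range
Structure = List[Tuple[float, float]]  # hypothetical per-residue (phi, psi), degrees


def run_levels(n_res: int,
               spans_by_length: Dict[int, List[Span]],
               best_single: Callable[[int], Tuple[float, float]],
               refine: Callable[[Structure, Span], Structure]) -> Structure:
    """Level-by-level optimization following the initialization rule above.

    best_single(i): hypothetical length-1 optimizer for residue i alone.
    refine(structure, span): hypothetical minimizer for one (extended) span,
    returning a new whole-chain structure.
    spans_by_length: spans in optimization order for each length >= 2.
    """
    # Length 1: every residue is optimized independently of its neighbours.
    structure: Structure = [best_single(i) for i in range(n_res)]
    # Length >= 2: each level starts from the whole structure left by the previous one.
    for length in sorted(l for l in spans_by_length if l >= 2):
        for span in spans_by_length[length]:
            structure = refine(structure, span)
    return structure


# Toy refine for illustration only: copies the span's first residue across the span.
def copy_first(structure: Structure, span: Span) -> Structure:
    lo, hi = span
    return [structure[lo] if lo <= i <= hi else x for i, x in enumerate(structure)]
```

Because `structure` is reassigned after every refinement, each level automatically uses the previous level's result as its initial value.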

Further, in the above method, the total energy of an extended structural fragment is defined as the sum of four terms:
- the potentials of mean force over all residue pairs in the extended fragment;
- the Lennard-Jones and electrostatic potentials over all atom pairs in the fragment, calculated with CO and NH groups added to the residues at both ends and with all residues treated as glycine;
- the dihedral potentials associated with the main-chain dihedral angles;
- a non-glycine correction, based on the glycine-glycine potential of mean force, for the energy difference introduced by treating all residues as glycine.

The extended fragment is optimized by deforming its structure so as to minimize this total energy.
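The energy terms listed above can be assembled as in the sketch below. Several details are assumptions, noted again in the code comments:
- the CHARMM-style functional forms: Lennard-Jones as ε[(Rmin/r)^12 − 2(Rmin/r)^6] with geometric-mean ε and additive Rmin/2, bare Coulomb with 332.0636 kcal·Å/(mol·e²), and dihedral terms K(1 + cos(nχ − δ));
- the use of CA-CA distance as the PMF variable r;
- most importantly, reading the non-glycine correction as subtracting the Gly-Gly potential of mean force for each residue pair.

The caller is assumed to supply the nonbonded atom pairs (with bonded neighbours already excluded) and the current dihedral values. The `pmf` lookup is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, float, float]
COULOMB = 332.0636  # kcal*Å/(mol*e^2), the conversion used by CHARMM-style force fields


@dataclass
class Atom:
    xyz: Vec
    charge: float     # partial charge (e)
    epsilon: float    # LJ well depth (kcal/mol), stored here as a positive number
    rmin_half: float  # LJ Rmin/2 (Å)


def nonbonded(atoms: Sequence[Atom], pairs: Sequence[Tuple[int, int]]) -> float:
    """Lennard-Jones + Coulomb energy over caller-supplied nonbonded atom pairs."""
    e = 0.0
    for i, j in pairs:
        a, b = atoms[i], atoms[j]
        r = math.dist(a.xyz, b.xyz)
        eps = math.sqrt(a.epsilon * b.epsilon)  # geometric-mean well depth
        x6 = ((a.rmin_half + b.rmin_half) / r) ** 6
        e += eps * (x6 * x6 - 2.0 * x6)         # minimum of -eps at r = Rmin
        e += COULOMB * a.charge * b.charge / r  # no cutoff, dielectric 1 (assumption)
    return e


def dihedral_energy(terms: Sequence[Tuple[float, float, int, float]]) -> float:
    """Sum of K * (1 + cos(n*chi - delta)) for (chi_deg, K, n, delta_deg) terms."""
    return sum(k * (1.0 + math.cos(math.radians(n * chi - delta)))
               for chi, k, n, delta in terms)


# Hypothetical PMF lookup: pmf(k, a, b, r) for residue types a, b at separation k.
Pmf = Callable[[int, str, str, float], float]


def fragment_energy(residues: Sequence[str], ca_xyz: Sequence[Vec], pmf: Pmf,
                    atoms: Sequence[Atom], pairs: Sequence[Tuple[int, int]],
                    dihedrals: Sequence[Tuple[float, float, int, float]]) -> float:
    """Total energy of an (extended) fragment.

    atoms/pairs/dihedrals describe the all-glycine backbone with the end CO/NH
    groups already attached. The PMF uses CA-CA distance as r, and the
    non-glycine correction subtracts the Gly-Gly PMF per pair; both are assumptions.
    """
    e_pmf = 0.0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            r = math.dist(ca_xyz[i], ca_xyz[j])
            k = j - i
            e_pmf += pmf(k, residues[i], residues[j], r) - pmf(k, "G", "G", r)
    return e_pmf + nonbonded(atoms, pairs) + dihedral_energy(dihedrals)
```

Under this reading, a chain made entirely of glycine contributes no PMF term. Its energy then comes from the molecular dynamics terms alone.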

Furthermore, in the above method, optimizing an extended structural fragment proceeds in sets of dihedral angles. The main-chain dihedral angles within the part of the fragment that existed before extension are grouped into sets of three consecutive angles, either (ω, φ, ψ) or (ψ, ω, φ). Each set involves a partial fragment of length 1 or 2. The optimal angles of the sets are determined in increasing order of the sequence fragment entropy of those partial fragments.
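The ordering of dihedral triplets can be sketched as below. The indexing is an assumption:
- triplet ("res", i) stands for (ω_{i−1}, φ_i, ψ_i), tied to the length-1 fragment at residue i;
- triplet ("link", i) stands for (ψ_i, ω_i, φ_{i+1}), tied to the length-2 fragment (i, i+1).

Entropies of different lengths must not be compared directly. This sketch therefore sorts each kind separately and places length-1 triplets first, borrowing the shorter-first rule. The text does not state how the two kinds are interleaved.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[str, int]  # ("res", i) or ("link", i); see angle_names below


def angle_names(t: Triplet) -> Tuple[str, str, str]:
    """The three consecutive main-chain dihedrals a triplet stands for (assumed indexing)."""
    kind, i = t
    if kind == "res":  # (omega_{i-1}, phi_i, psi_i): length-1 fragment at residue i
        return (f"omega{i - 1}", f"phi{i}", f"psi{i}")
    return (f"psi{i}", f"omega{i}", f"phi{i + 1}")  # length-2 fragment (i, i+1)


def triplet_order(lo: int, hi: int, s1: Sequence[float], s2: Sequence[float]) -> List[Triplet]:
    """Order the triplets inside the pre-extension fragment [lo, hi].

    s1[i]: S^s of the length-1 fragment at residue i.
    s2[i]: S^s of the length-2 fragment (i, i+1).
    Lengths are never mixed in one sort; length-1 triplets go first (assumption).
    """
    singles = sorted((("res", i) for i in range(lo, hi + 1)),
                     key=lambda t: (s1[t[1]], t[1]))
    links = sorted((("link", i) for i in range(lo, hi)),
                   key=lambda t: (s2[t[1]], t[1]))
    return singles + links
```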

The protein three-dimensional structure prediction method according to the present invention therefore:
- calculates the total energy of protein structural fragments more accurately than the prior art;
- sets initial structures and deforms structural fragments appropriately;
- optimizes the structures of sequence fragments.

Embodiments of the present invention are described below with reference to the drawings.

To solve the first problem, calculating the total energy of a sequence fragment, the potential of mean force proposed in Non-Patent Document 1 is used. It is defined as a two-body potential that statistically captures the force field acting between any two amino acid residues in protein three-dimensional structures. It is calculated from the distribution density of the spatial separation r between two residues. This density is conditioned on the residue types a and b and on their relative sequence position k = j − i, with type a at position i and type b at position j. Because it is defined without regard to side-chain conformation, the potential does not depend on side-chain conformation.

For probabilistic reasons, however, the net potential of mean force proposed in Non-Patent Document 1 must be used. The net potential does not capture the physicochemical interactions between the main-chain atoms of residues, so those interactions must be accounted for separately.
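The net potential of mean force is commonly written as ΔE^{ab}_k(r) = −kT ln[f^{ab}_k(r) / f_k(r)]. Here f^{ab}_k(r) is the observed distance density for residue types a and b at separation k, and f_k(r) is the density for all residue pairs at that separation. A minimal histogram-based sketch follows. The bin width, distance cutoff, and kT are assumptions. So is the small pseudo-count, which is a simple stand-in for the sparse-data correction of Non-Patent Document 1, not that correction itself.

```python
import math
from collections import Counter
from typing import Iterable, List


def _density(distances: Iterable[float], n_bins: int, bin_width: float,
             pseudo: float) -> List[float]:
    """Normalized histogram; distances past the last bin are lumped into it."""
    counts = Counter(min(int(d / bin_width), n_bins - 1) for d in distances)
    total = sum(counts.values())
    return [(counts.get(b, 0) + pseudo) / (total + pseudo * n_bins) for b in range(n_bins)]


def net_pmf(pair_distances: Iterable[float], all_distances: Iterable[float],
            bin_width: float = 1.0, r_max: float = 20.0,
            kT: float = 0.593, pseudo: float = 1e-3) -> List[float]:
    """Per-bin net potential of mean force -kT ln(f_ab(r) / f(r)).

    pair_distances: distances observed for one (a, b, k) class.
    all_distances: distances for all residue pairs at the same k (reference state).
    """
    n_bins = int(math.ceil(r_max / bin_width))
    f_ab = _density(pair_distances, n_bins, bin_width, pseudo)
    f_all = _density(all_distances, n_bins, bin_width, pseudo)
    return [-kT * math.log(p / q) for p, q in zip(f_ab, f_all)]
```

Distances that a residue pair visits more often than the reference state get negative (favourable) values. Distances it avoids get positive values.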

For these main-chain interactions, molecular dynamics force-field potentials are used:
- the classical electrostatic interaction;
- the Lennard-Jones potential, which approximates the van der Waals (intermolecular) interaction;
- the dihedral potential acting on the main-chain dihedral angles.

The required parameters, such as atomic partial charges, van der Waals radii, and dihedral potential parameters, are taken from a proven molecular dynamics system (for example, the CHARMM22 parameter set).

A new issue arises at this stage: how to handle the side-chain CB atom bonded to the main-chain CA atom. The CB atom interacts strongly not only with CA but also with other main-chain atoms, and it strongly affects the main-chain dihedral angle φ. To resolve this, all amino acids are treated as glycine, without a CB atom, whenever the molecular dynamics potential is used. As a result, main-chain interactions are represented by the molecular dynamics interaction between glycine residues, which is not appropriate for the main-chain interactions of other amino acids.

To compensate, the potential of mean force is corrected using the potential of mean force of the glycine pair.

The fragment ends raise a further issue. The algorithm evaluates the total energy at intermediate stages for a sequence fragment in a particular conformation, so conditions at the two ends of the fragment can distort the result. When amino acids join to form a peptide, the C-terminal carboxyl group −COOH of one residue and the N-terminal amino group −NH2 of the next condense, releasing water, to form −CONH−. The bond between C and N has double-bond character. The dihedral angle ω can therefore only be near 0° or near 180°, and in most cases it is 180°. The CONH group forms a strong dipole, with a positive partial charge on H and a negative partial charge on O, and this dipole strongly influences the structure of the surrounding amino acids.

The boundary between two consecutive amino acids in a sequence is usually placed between CO and NH. For a structural fragment, however, the energy is calculated more accurately when CO and NH are added to the C-terminal and N-terminal sides, respectively, so that a complete CONH group lies at each end.

As described above, in the embodiment of the present invention the total energy of a sequence fragment is calculated as follows:
(1) the mean force potential of Non-Patent Document 1 is introduced as the potential between amino acid residues;
(2) for the main-chain interactions not captured by (1), the MD potential is introduced with all residues treated as glycine;
(3) to compensate for treating every residue as glycine, the mean force potential is corrected with that of the glycine pair; and
(4) CO and NH are added at the N-terminal and C-terminal ends of the fragment, respectively, so that a CONH unit sits at each end.
Steps (1) to (4) resolve the first problem: how to calculate the total energy of a sequence fragment in a specific conformation.
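The four energy terms can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the mean force potential is reduced to a hypothetical callable `pmf(a, b, k, r)` of residue types, sequence separation, and a single inter-residue distance (the document uses a three-dimensional relative configuration), single-residue (k = 0) terms are omitted, the glycine correction is assumed to take the form W_ab - W_GG, and the atom tuples, exclusion set, and dihedral parameters stand in for values that would come from a parameter set such as CHARMM22.

```python
import math
from typing import Callable, Iterable, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical atom record of the all-glycine, CO/NH-capped model:
# (position in Angstrom, partial charge in e, LJ well depth eps, LJ Rmin)
Atom = Tuple[Vec3, float, float, float]

COULOMB_K = 332.06  # approx. kcal*Angstrom/(mol*e^2), usual force-field units


def nonbonded(a: Atom, b: Atom) -> float:
    """Lennard-Jones (Rmin/eps form) plus Coulomb energy of one atom pair."""
    (pa, qa, ea, ra), (pb, qb, eb, rb) = a, b
    r = math.dist(pa, pb)
    eps = math.sqrt(ea * eb)   # geometric-mean well depth
    rmin = 0.5 * (ra + rb)     # arithmetic-mean Rmin
    s6 = (rmin / r) ** 6
    return eps * (s6 * s6 - 2.0 * s6) + COULOMB_K * qa * qb / r


def total_energy(res_types: Sequence[str], res_centers: Sequence[Vec3],
                 pmf: Callable[[str, str, int, float], float],
                 atoms: Sequence[Atom], excluded: Set[Tuple[int, int]],
                 dihedrals: Iterable[Tuple[float, float, int, float]] = (),
                 gly: str = "G") -> float:
    e = 0.0
    # (1) + (3): residue-pair potential for every pair i < j, corrected by
    # subtracting the glycine-pair potential (assumed form of the correction)
    for i in range(len(res_types)):
        for j in range(i + 1, len(res_types)):
            k = j - i
            r = math.dist(res_centers[i], res_centers[j])
            e += pmf(res_types[i], res_types[j], k, r) - pmf(gly, gly, k, r)
    # (2) + (4): nonbonded MD terms over all atom pairs of the all-glycine
    # model, which already includes the CO/NH caps; bonded pairs are skipped
    for p in range(len(atoms)):
        for q in range(p + 1, len(atoms)):
            if (p, q) not in excluded:
                e += nonbonded(atoms[p], atoms[q])
    # (2): backbone dihedral terms K * (1 + cos(n * chi - delta)), in degrees
    for chi, kchi, mult, delta in dihedrals:
        e += kchi * (1.0 + math.cos(math.radians(mult * chi - delta)))
    return e
```

Keeping the residue-level and atom-level sums separate mirrors items (1) to (4) one term at a time, so each can be swapped out independently.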

The second problem is solved naturally by the means for solving the fourth problem.

The means of solving the third problem assumes that the fragment is deformed by changing the main-chain dihedral angles ω, φ, and ψ within it. In the total-energy calculation devised for the first problem, it is undesirable to change the bond lengths or bond angles between covalently bonded main-chain atoms; these are therefore fixed at standard values, and the structure is deformed only by changing the dihedral angles. Because it is considered impossible to move from the initial structure to the final optimum by changing all dihedral angles in the fragment at once, the dihedral angles are changed step by step so that the structure gradually approaches the optimum.

The question is then how to change the structure in a single deformation step. It was confirmed that changing only one of the variable dihedral angles in the fragment per deformation does not reach the final structure, whether the angle is perturbed slightly (by about 1° to 5°) or allowed to change by up to 180°. Therefore, two kinds of deformation are combined: in one, the angles ω, φ, ψ of a single residue in the fragment are changed simultaneously and by large amounts (ω is regarded as belonging to the residue on its C-terminal side); in the other, the angles ψ, ω, φ between two consecutive residues are changed simultaneously and by large amounts. Which residue's dihedral angles (ω, φ, ψ), or which inter-residue dihedral angles (ψ, ω, φ), to change in a given deformation is decided from the relative magnitudes of two entropies closely tied to these angles: $S^{s,aa}_{0}$, based on the internal structure of a single residue, and $S^{s,ab}_{1}$, based on the relative arrangement (ψ, ω, φ) of two consecutive residues. For two consecutive residues in the fragment, with residue a on the N-terminal side and residue b on the C-terminal side, the two-residue entropy $S^{s,ab}$ is given by the following equation.

[Equation 2]
$$S^{s,ab} = S^{s,aa}_{0} + S^{s,bb}_{0} + S^{s,ab}_{1}$$

This entropy is calculated for every pair of consecutive residues in the fragment, and the pairs are modified in ascending order of entropy. For each pair, the angles ω, φ, ψ of the residue with the smaller of $S^{s,aa}_{0}$ and $S^{s,bb}_{0}$ are changed first, then those of the other residue, and finally the inter-residue angles ψ, ω, φ. To find the best simultaneous change of consecutive dihedral angles, ω is set to 0° or 180°, and φ and ψ are varied over all combinations on a grid with a step of 15° to 60°. The total energy of the fragment is computed for each combination, the angles are first set to the combination with the lowest energy, and the result is then refined by small deformations. For example, with a coarse step of 60°, each of the best ψ and φ values is changed by -32°, 0°, or +32° (9 small deformations in total); the best of these is taken, the step is reduced to 16°, and the process is repeated, halving the step each time down to 1°. If the total energy of the fragment after the deformation is lower than before, the deformation is accepted as the optimum at that stage. Otherwise it is rejected, and the structure before the deformation is taken as the optimum at that stage.
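The coarse-to-fine search described above might look like this. It is a sketch under assumptions: `energy` is any callable returning the fragment's total energy for a trial (omega, phi, psi) triple (for an inter-residue (psi, omega, phi) move the caller would reorder the tuple), the coarse grid uses the 60° step of the example, and refinement starts at ±32° and halves down to 1°.

```python
from typing import Callable, Tuple

Angles = Tuple[float, float, float]  # (omega, phi, psi) in degrees


def wrap(a: float) -> float:
    """Map an angle into (-180, 180]."""
    a = (a + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a


def coarse_to_fine(energy: Callable[[Angles], float], current: Angles,
                   step: int = 60, first_delta: int = 32) -> Angles:
    # Coarse stage: omega is 0 or 180; phi and psi on a regular grid
    grid = range(-180, 180, step)
    best = min(((w, p, s) for w in (0.0, 180.0) for p in grid for s in grid),
               key=energy)
    # Fine stage: try the 3 x 3 = 9 shifts of phi and psi by -d, 0, +d,
    # keep the best, then halve d (32, 16, 8, 4, 2, 1)
    delta = first_delta
    while delta >= 1:
        w, p, s = best
        trials = [(w, wrap(p + dp), wrap(s + ds))
                  for dp in (-delta, 0, delta) for ds in (-delta, 0, delta)]
        best = min(trials, key=energy)
        delta //= 2
    # Accept the deformation only if it lowers the energy
    return best if energy(best) < energy(current) else current
```

Because the zero shift is among the 9 trials, each refinement round can never make the result worse than the previous round.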

As described above, the third problem is solved as follows:
(1) The initial structure is transformed into the optimum structure by repeatedly deforming the structural fragment through changes of dihedral angles.
(2) In one deformation, the three dihedral angles (ω, φ, ψ) of one residue in the fragment, or the three dihedral angles (ψ, ω, φ) between two consecutive residues, are changed simultaneously.
(3) Which residue's dihedral angles, or which inter-residue dihedral angles, to change is determined by the sequence-fragment entropy of each pair of consecutive residues; the pair with the smallest entropy is changed first.
(4) For each pair of consecutive residues, the angles ω, φ, ψ of the residue with the smaller entropy are changed first, then those of the residue with the larger entropy, and finally the inter-residue angles ψ, ω, φ.
(5) To change neighboring dihedral angles simultaneously, the energy of the fragment is calculated for all angle combinations on a grid with a step of 15° to 60°, and the best combination is adopted. Small deformations of about half the grid step are then tried, and the step of the small deformations is halved repeatedly until the optimum combination is obtained at 1° resolution.
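Items (3) and (4) fix an order of moves within a fragment. A sketch, assuming hypothetical dictionaries `s0` (single-residue entropy per residue type) and `s1` (adjacent-pair entropy per ordered type pair), with residues indexed from 0:

```python
from typing import Dict, List, Tuple

# ("res", i):  the (omega, phi, psi) set of residue i
# ("link", i): the (psi, omega, phi) set between residues i and i + 1
Move = Tuple[str, int]


def move_order(seq: str, s0: Dict[str, float],
               s1: Dict[Tuple[str, str], float]) -> List[Move]:
    if len(seq) == 1:
        return [("res", 0)]
    # Equation 2 for each consecutive pair (i, i + 1); ascending entropy
    pairs = sorted((s0[seq[i]] + s0[seq[i + 1]] + s1[(seq[i], seq[i + 1])], i)
                   for i in range(len(seq) - 1))
    moves: List[Move] = []
    for _, i in pairs:
        # the residue with the smaller single-residue entropy goes first
        if s0[seq[i]] <= s0[seq[i + 1]]:
            first, second = i, i + 1
        else:
            first, second = i + 1, i
        moves += [("res", first), ("res", second), ("link", i)]
    return moves
```

A residue can appear in two pairs, so its angle set may be visited twice per pass, as items (3) and (4) imply.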

Next, the means for solving the fourth and second problems are described. Sequence-fragment entropies can be compared among fragments of the same length (the same number of residues), because their sequence-independent entropy is the same, but they cannot be compared among fragments of different lengths. On the other hand, protein folding can be assumed to proceed from local structure formation toward formation of the overall structure. An appropriate algorithm therefore starts from very short fragments, for example single residues, and obtains the overall structure while gradually lengthening the fragments. First consider fragments of length 1, that is, single residues. The entropy to consider in this case is $S^{s,aa}_{0}$ ($K = j - i = 0$). Because the entropy of a single residue does not depend on the types of the surrounding residues, the optimum structure of each residue of the target protein can be determined independently. Taking into account the CONH dipole described above, however, each residue is capped with CO and NH on its two sides, and the conformation (combination of ω, φ, ψ) with the lowest energy is selected. This determines the initial structure for the prediction and solves the second problem.

Next, consider $K = j - i = 1$. The entropy in this case, $S^{s}_{i,i+1}$, is the entropy of a fragment of two consecutive residues, so, as the sum of its component pair entropies, it is given by the following equation.

[Equation 3]
$$S^{s}_{i,i+1} = S^{s,aa}_{0} + S^{s,bb}_{0} + S^{s,ab}_{1}$$
where a and b are the types of residues i and i + 1.

The entropies $S^{s}_{i,i+1}$ of all pairs of consecutive residues (residues i and i + 1) in the sequence are examined, and optimum structures are determined in ascending order of entropy. If $S^{s}_{i,i+1}$ is a local minimum, the optimum structure of that pair can be determined independently of its surroundings. If, however, the optimum structure of the N-terminal neighbor ($S^{s}_{i-1,i}$) or of the C-terminal neighbor ($S^{s}_{i+1,i+2}$) has already been determined, the optimum structure of this pair is affected by one or both of them. In that case, the total energy is computed over a fragment that includes the adjacent parts whose optimum structures have already been determined, and the optimum is sought for that fragment. In this way, for K = 1 the optimum structures of the fragments are determined in ascending order of entropy: the structure of a fragment at a local entropy minimum is determined independently of its surroundings, while the structure of a fragment that is not a local minimum is determined taking into account the already-determined structures around it. Furthermore, when the fragment with the largest $S^{s}_{i,i+1}$ is optimized, the optimum structure is determined taking the entire protein sequence into account, and a provisional overall structure is obtained at this stage.

Repeating this procedure until K equals the sequence length minus 1, that is, until the fragment coincides with the entire sequence, optimizes the overall structure. For K = 0 the optimum structure of each residue is obtained independently; using that result as the initial state, the optimum structures for K = 1 are obtained; using the K = 1 result as the initial state, those for K = 2 are obtained; and so on until K reaches the sequence length minus 1, which finally yields the overall structure. This makes it possible to determine the structure in ascending order of entropy while taking into account the entropies of fragments of all lengths, and it provides the best method for determining the folding order from their relative magnitudes, which solves the fourth problem.

With this method, the order of structure optimization can be determined strictly from the magnitudes of the entropies of sequence fragments at all lengths and positions of the entire sequence of the protein whose structure is to be predicted, which is the problem to be solved by the present invention. This represents a major step toward highly accurate protein tertiary structure prediction.

The embodiment according to the present invention is a method that predicts the final three-dimensional structure by:
(1) using a mean force potential between amino acid residues based on statistics extracted from proteins of known structure;
(2) calculating the sequence-dependent entropy of sequence fragments at all possible lengths and positions in the amino acid sequence of the given protein whose structure is to be predicted;
(3) determining, from the magnitudes of the fragment entropies, the order in which the fragments are optimized and the dependence of each fragment on other fragments; and
(4) when predicting the structure of each fragment, optimizing it so as to minimize the total energy over the range on which its structure determination depends, the total energy being the sum of the mean force potentials of all residue pairs in that range, the electrostatic and Lennard-Jones potentials between all atoms in that range with every residue assumed to be glycine, and the correction terms of the mean force potential for residues other than glycine.

Accordingly, the embodiment according to the present invention consists of four main parts:
(1) a mean force potential calculation unit 1;
(2) a sequence entropy map calculation unit 2 that calculates the sequence-dependent entropy of all sequence fragments of the given protein sequence;
(3) a structure optimization scheduling unit 3 that, using the calculated fragment entropies, determines the order of structure optimization of the fragments and, for each fragment optimization, its dependence on other partial sequences; and
(4) a fragment structure optimization unit 4 that, when optimizing each fragment, deforms the partial structure, calculates the total energy of the relevant portion, and minimizes it.

The mean force potential calculation unit 1 uses a known method as it is and therefore does not constitute the gist of the present invention. However, to calculate pair entropies accurately in the embodiment, it is indispensable that the relative configuration r in the mean force potential be defined by three degrees of freedom: the distance d between the two residues in space and the direction of one residue as seen from the other (two degrees of freedom, the zenith angle θ and the azimuth angle φ). In other words, a multidimensional mean force potential is used. As the statistical processing, a histogram of r must be built from the residue pairs in a three-dimensional structure database, classified by the amino acid types a and b of the pair and their sequence separation k. For dividing r into bins, a step of 100 pm (picometers) for the distance and a step of 30° to 60° for the angles θ and φ give a good balance between the amount of data and statistical accuracy.
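The binning of the relative configuration can be sketched as follows. The document says only that the mean force potential comes from a known method; this sketch assumes the common inverse-Boltzmann form W = -kT ln(P_ab / P_ref), with a reference distribution pooled over all residue types, a pseudocount for bins empty in the pair data, and kT of about 0.593 kcal/mol (roughly 298 K). All names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Bin = Tuple[int, int, int]


def spatial_bin(d_pm: float, theta_deg: float, phi_deg: float,
                d_step: float = 100.0, ang_step: float = 30.0) -> Bin:
    """Bin (distance d, zenith theta, azimuth phi) with 100 pm / 30 deg steps."""
    n_theta = math.ceil(180.0 / ang_step)
    return (int(d_pm // d_step),
            min(int(theta_deg // ang_step), n_theta - 1),  # theta = 180 -> last bin
            int((phi_deg % 360.0) // ang_step))


def inverse_boltzmann(pair_obs: Iterable[Bin], ref_obs: Iterable[Bin],
                      kT: float = 0.593, pseudo: float = 1.0) -> Dict[Bin, float]:
    """W(bin) = -kT ln(P_ab(bin) / P_ref(bin)) for one (a, b, k) class."""
    f, g = Counter(pair_obs), Counter(ref_obs)
    nf, ng = sum(f.values()), sum(g.values())
    return {b: -kT * math.log(((f[b] + pseudo) / (nf + pseudo * len(g)))
                              / (g[b] / ng))
            for b in g}
```

Bins where a residue pair is seen more often than the reference get a negative (favourable) potential, and depleted bins a positive one.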

The mean force potential must also be defined for a sequence separation of k = 0 between the two residues, that is, for a single amino acid rather than a pair. Because a relative configuration within a single amino acid cannot be defined, the set of its main-chain dihedral angles ω, φ, ψ is taken to represent its internal state, and the mean force potential is determined from the frequencies of (ω, φ, ψ). Similarly, for k = 1 it is the mean force potential of two adjacent amino acids; since their relative configuration is determined almost exactly by the main-chain dihedral angles ψ, ω, φ between them, these angles are used to represent the relative configuration in defining the mean force potential. When building the histograms for these dihedral-angle statistics, ω takes only 0° or 180° because of the nature of the chemical bond it represents, and φ and ψ are each divided with a step of 15° to 60°, producing a three-dimensional histogram.

For k > 1, the three-dimensional mean force potential is used. For proteins with very long sequences k can become arbitrarily large, but beyond a certain value the distribution $f^{ab}_{k}(r)$ shows no large difference, so all k greater than a fixed value K may be pooled together statistically. Considering statistical error, K is preferably about 5 to 20.
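For the dihedral classes (k = 0 and k = 1) and the pooling of large separations, the histogram keys could be formed as below. The 30° default step and K = 10 are illustrative choices within the ranges the text gives (15° to 60°, and K of about 5 to 20).

```python
from typing import Tuple


def snap_omega(omega: float) -> int:
    """Treat omega as either cis (0 deg) or trans (180 deg)."""
    folded = abs((omega + 180.0) % 360.0 - 180.0)  # |omega| in [0, 180]
    return 0 if folded < 90.0 else 180


def dihedral_bin(omega: float, phi: float, psi: float,
                 step: float = 30.0) -> Tuple[int, int, int]:
    """3-D bin for the k = 0 (omega, phi, psi) and k = 1 (psi, omega, phi) classes."""
    return (snap_omega(omega),
            int((phi % 360.0) // step),
            int((psi % 360.0) // step))


def k_class(k: int, K: int = 10) -> int:
    """Sequence separations k > K are pooled into one class, K + 1."""
    return k if k <= K else K + 1
```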

Because the mean force potential is determined statistically as described above, frequencies are observed by dividing an inherently continuous space into bins, and the resulting potential is discontinuous at the bin boundaries. Such discontinuities should be avoided when the potential is used in a prediction method that seeks the optimum structure by repeated small deformations. In the embodiment of the present invention, the mean force potential is therefore smoothed by quadratic Bézier interpolation.
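The text does not specify how the quadratic Bézier interpolation is set up. One plausible one-dimensional reading, given here as an assumption, uses each bin value as the middle control point and the midpoints to the neighbouring bins as the segment endpoints (equivalently, a uniform quadratic B-spline). This makes the potential and its first derivative continuous across bin boundaries; a multidimensional potential would apply it axis by axis.

```python
import math
from typing import Sequence


def bezier_pmf(values: Sequence[float], x: float, h: float,
               periodic: bool = False) -> float:
    """Smoothed potential at x; bin m covers [m*h, (m+1)*h)."""
    n = len(values)

    def v(idx: int) -> float:
        # periodic for angle axes, clamped at the ends for distance axes
        return values[idx % n] if periodic else values[min(max(idx, 0), n - 1)]

    u = x / h
    m = math.floor(u)
    t = u - m
    p0 = 0.5 * (v(m - 1) + v(m))  # segment start, on the left bin boundary
    p1 = v(m)                     # middle control point: this bin's value
    p2 = 0.5 * (v(m) + v(m + 1))  # segment end, on the right bin boundary
    return (1 - t) ** 2 * p0 + 2 * t * (1 - t) * p1 + t ** 2 * p2
```

This smooths rather than interpolates: a single peak of 2.0 between zeros reaches only 1.5 at its bin centre, in exchange for a potential with no jumps at bin edges.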

The method of calculating sequence-fragment entropy is not the gist of the present invention, but it is related, so it is outlined here. The system used in the present invention to calculate sequence-fragment entropy consists of three parts: a pair entropy calculation unit 8, a sequence fragment entropy calculation unit 10, and the sequence entropy map calculation unit 2, which computes the sequence-fragment entropy at all lengths and positions of the entire sequence of the given protein.

The pair entropy calculation unit 8 has a step S1 that computes the mean force potential over various r for a given amino acid pair a, b separated by k in the sequence and, from these values, computes the pair entropy $S^{s,ab}_{k}(r)$. It then has a step S2 that performs step S1 for all combinations of a, b, and k; step S2 is a triple loop over the amino acid types a and b and the separation k. The unit further has a step S3 that stores the results: the computed $S^{s,ab}_{k}(r)$ is saved as the pair entropy table 9. When the invention is carried out on an ordinary computer system, the pair entropy table 9 must be stored on a nonvolatile storage medium. Once computed, the stored table need not be recalculated until an update becomes necessary, namely when the statistical data underlying the mean force potential are updated or the method of calculating the mean force potential is changed.
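Steps S1 to S3 might be sketched as below. How the pair entropy is derived from the mean force potential is not stated in this section; the sketch assumes it is the Shannon entropy (in units of k_B) of the Boltzmann distribution exp(-W/kT) over the configuration bins, and `pmf_bins` is a hypothetical callable returning the binned potential for (a, b, k). Keys are strings so that the table can be written as JSON.

```python
import json
import math
from itertools import product
from typing import Callable, Dict, Sequence


def pair_entropy(energies: Sequence[float], kT: float = 0.593) -> float:
    """Entropy (units of k_B) of the Boltzmann weights exp(-W/kT) over bins."""
    w0 = min(energies)  # shift for numerical stability
    weights = [math.exp(-(w - w0) / kT) for w in energies]
    z = sum(weights)
    return -sum((x / z) * math.log(x / z) for x in weights if x > 0.0)


def build_pair_entropy_table(types: Sequence[str], k_max: int,
                             pmf_bins: Callable[[str, str, int], Sequence[float]],
                             kT: float = 0.593) -> Dict[str, float]:
    table = {}
    for a, b, k in product(types, types, range(k_max + 1)):          # step S2
        table[f"{a}|{b}|{k}"] = pair_entropy(pmf_bins(a, b, k), kT)  # step S1
    return table


def save_table(table: Dict[str, float], path: str) -> None:
    """Step S3: keep the table on nonvolatile storage (JSON needs str keys)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(table, fh)
```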

The sequence fragment entropy calculation unit 10 calculates the sequence-dependent entropy of a given sequence fragment. It has a step S4 that, before the calculation, reads the pair entropy table 9 computed and stored by the pair entropy calculation unit 8. It then has a step S5 that, for the i-th and j-th residues of the input fragment with i ≤ j, determines their types a and b and looks up the pair entropy $S^{s,ab}_{k}$ for a, b, and k = j - i, and a step S6 that sums $S^{s,ab}_{k}$ over all i, j with i ≤ j in the fragment. Step S6 yields the desired sequence-fragment entropy.
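Steps S5 and S6 amount to a double loop over residue pairs. A sketch with a hypothetical in-memory table keyed by (a, b, k), where `k_pool` is the index of the pooled class for large separations (an assumption carried over from the pooling rule for k > K):

```python
from typing import Dict, Tuple

PairTable = Dict[Tuple[str, str, int], float]  # (a, b, k) -> pair entropy


def fragment_entropy(frag: str, table: PairTable, k_pool: int) -> float:
    """Sum the pair entropy over all residue pairs i <= j of the fragment."""
    total = 0.0
    for i in range(len(frag)):
        for j in range(i, len(frag)):
            k = min(j - i, k_pool)  # separations beyond k_pool share one class
            total += table[(frag[i], frag[j], k)]  # a = type of i, b = type of j
    return total
```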

The sequence entropy map calculation unit 2 calculates the sequence-fragment entropy for every possible fragment of the entire sequence of the protein whose structure is to be predicted (with N residues), that is, for every fragment from residue i to residue j satisfying 1 ≤ i ≤ N and i ≤ j ≤ N, and outputs the results as the map $S^{s}_{ij}$.
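Summing every fragment from scratch costs O(N^4) table lookups. An equivalent incremental form, offered as an implementation choice rather than the document's method, grows each fragment by one residue: S[i][j] equals S[i][j-1] plus the pair terms whose C-terminal member is residue j, giving O(N^2) lookups in total. Indices are 0-based and the table is the same hypothetical (a, b, k) dictionary as above.

```python
from typing import Dict, List, Tuple

PairTable = Dict[Tuple[str, str, int], float]  # (a, b, k) -> pair entropy


def entropy_map(seq: str, table: PairTable, k_pool: int) -> List[List[float]]:
    """S[i][j] (i <= j) = entropy of fragment seq[i..j]."""
    n = len(seq)
    S = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = 0.0  # running sum of pair terms (i', j) for i' in [i, j]
        for i in range(j, -1, -1):
            col += table[(seq[i], seq[j], min(j - i, k_pool))]
            S[i][j] = (S[i][j - 1] if j > i else 0.0) + col
    return S
```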

The structure optimization scheduling unit 3 uses the sequence-fragment entropy map obtained by unit 2 to decide which fragments to optimize, in what order, and with what dependence on other parts. Let $S^{s}_{i,j}$ be the sequence-fragment entropy of the fragment from the i-th to the j-th amino acid residue in the entire residue sequence 5 of the protein whose structure is to be predicted. This is the sequence-dependent entropy of a fragment of length k + 1 (where k = j - i), namely the sum of $S^{s,ab}_{k'}$ with $k' = j' - i'$ over all i', j' satisfying i ≤ i' ≤ j' ≤ j, where a is the type of residue i' and b is the type of residue j'.

First, the simplest case k = 0, that is, optimization of structural fragments of length 1 consisting of a single residue, is performed. Here each residue is assumed to be unaffected by the other residues, so the optimum structures of all residues can be obtained independently. Each fragment is capped at both ends with the CO and NH of the adjacent residues, and the optimum structure with the minimum total energy is determined provisionally. That is, the structure optimization scheduling unit 3 has a step S11 that, for k = 0, schedules each residue of the sequence to be optimized independently. Next, for k = K (K > 0), i.e., fragments of length K + 1, the optimization order and the influence of other parts of the sequence are determined as follows. Among all fragments of length K + 1, the one with the smallest entropy $S^{s}_{i,i+K}$ is assumed to have a structure that can be determined independently of the other parts, and the fragment consisting of residues i through i + K is optimized to minimize its total energy. When the total energy of this fragment is calculated, CO is added on the N-terminal side if residue i is not the N-terminus, and NH is added on the C-terminal side if residue i + K is not the C-terminus.

Next, when the entropy $S^{s}_{i,i+K}$ of the fragment from residue i to residue i + K is larger than the entropies of the surrounding fragments of the same length that overlap it, the range over which the total energy is calculated is extended toward the N-terminal and C-terminal sides for as long as the overlapping fragments continue along the sequence.

FIG. 2 shows, for the calculation system of FIG. 1, the range of the structure (sequence) to be considered when calculating the total energy during fragment structure optimization. The circles arranged horizontally in FIG. 2 represent residues; range 15 is the fragment to be optimized at the current step, and fragments 11 to 14 are fragments optimized at earlier steps. Of regions 11 to 14, fragment 12 does not overlap continuously with the newly optimized fragment 15 and is excluded; fragments 11, 13, and 14 are combined with fragment 15 to form the extended fragment 16, which is the range used for the total energy calculation. Let this extended range run from residue I to residue J. At the stage of fragment length K + 1, the fragment with the largest entropy $S^{s}_{i,i+K}$ is optimized last, and for a protein of length N its range is I = 1 (the N-terminus) to J = N (the C-terminus).

When calculating the total energy, CO is attached to the NH of residue I if residue I is not the N-terminus, and NH is attached to the CO of residue J if residue J is not the C-terminus. That is, the structure optimization scheduling unit 3 of FIG. 1 has a step S12 that, from the fragment entropies for k = K (K > 0), determines the order in which the fragments are optimized and the range I, J over which the total energy is calculated in each optimization. The optimization order is determined by increasing K by 1 from 0 to N - 1. Given the entire sequence of the protein whose structure is to be predicted, the structure optimization scheduling unit 3 thus fully determines which part is optimized first and how far along the sequence the energy is considered at each step. Accordingly, the structure optimization scheduling unit 3 has a step S13 that first performs step S11 and then performs step S12 for each K from K = 1 to K = N - 1 (the full length minus 1), determining the fragment optimization order and the energy ranges I, J. This determines in what order and in what way every fragment of the entire sequence of the protein is to be optimized.
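The scheduling of steps S11 to S13, together with the energy-range extension illustrated by FIG. 2, could be sketched as below. Assumptions: indices are 0-based; `S` is the entropy map (only entries with i <= j are read); a fragment already optimized at the current length joins the energy range when it shares at least one residue with it, directly or through a chain of such overlaps; ties in entropy are broken by position.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive, 0-based residue range


def extend(cur: Interval, done: List[Interval]) -> Interval:
    """Grow cur by every optimized fragment linked to it by a chain of overlaps."""
    lo, hi = cur
    grown = True
    while grown:
        grown = False
        for a, b in done:
            if a <= hi and lo <= b and (a < lo or b > hi):  # overlaps, sticks out
                lo, hi = min(lo, a), max(hi, b)
                grown = True
    return lo, hi


def schedule(S: Sequence[Sequence[float]]) -> List[Tuple[int, int, int, int, int]]:
    """Return (K, i, j, I, J) tuples in optimization order."""
    n = len(S)
    plan = []
    for K in range(n):  # K = 0 is step S11; K >= 1 is step S12, looped by S13
        done: List[Interval] = []
        for i in sorted(range(n - K), key=lambda s: (S[s][s + K], s)):
            I, J = extend((i, i + K), done)
            plan.append((K, i, i + K, I, J))
            done.append((i, i + K))
    return plan
```

With overlapping fragments of the same length tiling the sequence, the last fragment at each K > 0 always receives the full range, matching I = 1, J = N in the text.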

Next, the structural fragment optimization unit 4 handles the case where the sequence fragment to be optimized spans the i-th to j-th residues (i ≤ j) and the range over which its total energy is calculated spans the I-th (1 ≤ I ≤ i) to J-th (j ≤ J ≤ N) residues. It varies the main-chain dihedral angles ω, φ, ψ associated with residues i to j so as to minimize the total energy of the structural fragment consisting of residues I to J with CO and NH attached at its two ends. When the angles ω, φ, ψ of a residue are varied, all three are varied simultaneously; likewise, the three angles ψ, ω, φ lying between two residues that are consecutive in the sequence are varied simultaneously. Varying residues i to j therefore means varying the ω, φ, ψ sets of all j−i+1 residues in the range and the ψ, ω, φ sets at all j−i inter-residue positions in the range, and the order in which these sets are varied must be determined. Here too, that order is decided from the relative magnitudes of the entropies of the individual residues and of the sub-fragments consisting of two consecutive residues. That is, the structural fragment optimization unit 4 has a step S21 of calculating the entropy of each sub-fragment of length 2 (that is, k = 1) within the sequence fragment from residue i to residue j, followed by a step S22 of optimizing the sub-fragments one after another, starting from the one with the smallest entropy. In step S22, each length-2 sub-fragment has three variable angle sets: the ω, φ, ψ of each of its two residues and the single ψ, ω, φ set between them (the sets share angles). Following the principle that angles should be optimized starting from the most local level possible, the angles of the two residues in a k = 1 sub-fragment are optimized in this order: first the ω, φ, ψ of whichever residue has the smaller entropy, then the ω, φ, ψ of the residue with the larger entropy, and finally the ψ, ω, φ between the two residues. Because a single pass of step S22 is not expected to reach the final optimal structure, step S22 is repeated until convergence, that is, until the structure can no longer be deformed. Since entropy does not change when the structure changes, step S21 needs to be performed only once.
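
The visiting order of one pass of step S22 can be sketched as below. This is an illustrative Python sketch under stated assumptions: `s1(m)` (entropy of residue m) and `s2(m)` (entropy of the length-2 sub-fragment m, m+1) are hypothetical precomputed inputs standing in for step S21, ties are broken by position, and a residue shared by two sub-fragments is visited once per sub-fragment, since the text notes that the sets overlap but does not say to skip repeats.

```python
from typing import Callable, List, Tuple

# ("res", m): the omega/phi/psi set of residue m
# ("link", m): the psi/omega/phi set between residues m and m + 1
AngleSet = Tuple[str, int]


def angle_set_order(i: int, j: int,
                    s1: Callable[[int], float],
                    s2: Callable[[int], float]) -> List[AngleSet]:
    """Order in which angle sets of residues i..j are varied in one S22 pass."""
    if i == j:  # no length-2 sub-fragment: only the residue's own set
        return [("res", i)]
    order: List[AngleSet] = []
    # length-2 sub-fragments (p, p + 1), lowest entropy first
    for m in sorted(range(i, j), key=lambda p: (s2(p), p)):
        # lower-entropy residue first, then the other, then the link between them
        low, high = sorted((m, m + 1), key=lambda r: (s1(r), r))
        order += [("res", low), ("res", high), ("link", m)]
    return order
```

Because the entropies do not depend on the structure, this order can be computed once and the pass repeated until no deformation is accepted, mirroring the single S21 followed by repeated S22.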

The structure deformation in step S22 uses a step S23 that judges from the total energy obtained for each candidate which angles are optimal, with ω restricted to either 0° or 180° and φ and ψ each taken on a grid with a step of 15° to 60°, and deforms the structure to the optimal angles ω, φ, ψ. When the grid step for φ and ψ is 60°, a step S24 first finds the best combination of small adjustments in which φ and ψ are each changed by 0° or ±32° (three options for each of the two angles, giving nine combinations); step S24 is then repeated with a deformation width of 16°, and step S25 continues this, halving the width each time, until it reaches 1°. When the grid step for φ and ψ in step S23 is 15°, the initial deformation width in step S24 is 8°. In general, when the grid step is L°, the initial width is the largest power of 2 that does not exceed L. Taking the stability of chemical bonds into account, a grid step L° of 60°, 30° or 15° is appropriate; for 30°, the initial deformation width is 16°. By repeating steps S23 and S24 in this way, the optimal set of angles ω, φ, ψ, or of the inter-residue angles ψ, ω, φ, is determined. In step S25, if the resulting total energy of the structural fragment from the I-th to the J-th residue is lower than when the structure is left unchanged, the deformation is accepted; otherwise it is discarded and the original angle set is restored. In this way, the angle combinations giving the optimal deformation are found one after another. When the best combination of ω, φ, ψ (or ψ, ω, φ) on the L° grid is sought in step S23, the number of combinations could also be reduced, and the calculation sped up, by excluding combinations that are impossible because of steric hindrance or similar effects, as determined by statistical analysis of proteins of known structure.
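
Steps S23 to S25 for a single angle set can be sketched as a coarse grid search followed by halving refinement and an accept/reject test. This is a hedged Python sketch, not the patent's code: `energy(omega, phi, psi)` is a hypothetical stand-in for the total energy of residues I to J with only this set changed, the grid is assumed to start at −180°, angles are not wrapped (the energy is assumed periodic), and the same routine would apply to an inter-residue ψ, ω, φ set. The initial width follows the rule in the text: the largest power of 2 not exceeding L.

```python
import itertools
from typing import Callable, Tuple

Angles = Tuple[float, float, float]  # (omega, phi, psi) in degrees


def initial_width(step: int) -> int:
    """Largest power of 2 not exceeding the grid step L (60 -> 32, 30 -> 16, 15 -> 8)."""
    w = 1
    while w * 2 <= step:
        w *= 2
    return w


def optimize_angle_set(energy: Callable[[float, float, float], float],
                       current: Angles, step: int = 60) -> Angles:
    """Coarse grid search, halving refinement, then accept/reject."""
    grid = range(-180, 180, step)  # assumed grid origin at -180 degrees
    # S23: omega in {0, 180}; phi and psi on the coarse step-degree grid
    omega, phi, psi = min(itertools.product((0, 180), grid, grid),
                          key=lambda a: energy(*a))
    # S24/S25: try 0 and +/-w on phi and psi (9 combinations), halve w down to 1
    w = initial_width(step)
    while w >= 1:
        phi, psi = min(((phi + dp, psi + dq)
                        for dp in (-w, 0, w) for dq in (-w, 0, w)),
                       key=lambda a: energy(omega, a[0], a[1]))
        w //= 2
    candidate = (omega, phi, psi)
    # S25: keep the deformation only if it beats the undeformed structure
    return candidate if energy(*candidate) < energy(*current) else current
```

With a toy energy whose minimum is at ω = 180°, φ = −63°, ψ = −42°, a start from (180, 0, 0) converges to that minimum. Including the zero offset in each refinement round means a round never raises the energy relative to the previous one.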

As described above, the embodiment of the present invention comprises the structure optimization scheduling unit 3, which determines the order of structure optimization based on the results of the pair entropy calculation unit 8 and the sequence fragment entropy calculation unit 10, and the structural fragment optimization unit 4, which optimizes each structural fragment in the order produced by the structure optimization scheduling unit 3. The pair entropy calculation unit 8 and the sequence fragment entropy calculation unit 10 calculate the sequence-dependent entropy of each sequence fragment in steps S1 to S6; the structure optimization scheduling unit 3 determines, in steps S11 to S13, which structural fragments are optimized, in what order and under what conditions; and the structural fragment optimization unit 4 optimizes the structure of each sequence fragment in steps S21 to S25 according to the schedule and conditions (the range of the sequence over which the total energy is calculated) determined by the structure optimization scheduling unit 3. The overall (three-dimensional) structure obtained once the structural fragment optimization has been performed at the final stage, where the sequence fragment length equals the total sequence length, is taken as the prediction result 6 for the overall (three-dimensional) structure of the given protein, which completes the structure prediction.

In FIG. 1, the pair entropy table 9 and the statistical data 7 for calculating the mean force field potential are stored in a storage device such as a hard disk, and the pair entropy calculation unit 8, the sequence fragment entropy calculation unit 10, the sequence entropy map calculation unit 2, the structure optimization scheduling unit 3, the structural fragment optimization unit 4 and the mean force field potential calculation unit 1 are implemented on a digital computer.

FIG. 3 shows a result predicted with the protein tertiary structure prediction method using the calculation system according to the embodiment of the present invention. FIG. 3(a) shows the partial structure 21 predicted by the calculation system of FIG. 1, and FIG. 3(b) shows the corresponding actual structure 22.

In FIG. 3(a), 21 is the predicted three-dimensional structure of a portion of POIA1, and in FIG. 3(b), 22 is the actual three-dimensional structure of the corresponding portion of POIA1. Predicting the whole structure still takes time and is difficult to do with high accuracy, but the result shows that a portion excised from the whole protein can be predicted with high accuracy. In this example, the turn formed by proline and glycine adopts almost the same structure as in the actual protein (the corresponding portion of POIA1), and this turn induces the formation of a hairpin sheet structure. The pairing of hydrogen-bonded residues in the sheet also agrees with the actual three-dimensional structure of the protein. With this method, the structure shown as 21 was obtained at the stage K = 8 (sequence fragment length 9).

As described in detail above, the present invention improves the accuracy of protein three-dimensional structure prediction and makes it possible to predict the order in which the three-dimensional structure forms. This makes it possible not only to predict the structures of proteins of unknown structure but also, when designing non-natural artificial proteins with novel sequences or changing the sequences of existing natural proteins, to estimate which design or sequence change will yield the desired function and a structure that realizes it. This helps elucidate detailed mechanisms of biological activity and, from there, to develop methods of controlling physicochemical reactions inside living organisms. It also provides substantial insight at the stage of deciding the details of a design for a new artificial protein.

This invention is already elucidating the mechanism of protein secondary structure formation. For helices, it was found that a low-entropy sequence fragment two to four residues long adopts a local structure in which φ and ψ are both negative; the electrostatic potential created by the dipoles of the CONH groups in this portion makes the region mainly downstream on the C-terminal side prone to become helical, producing a helix-like structure, which then extends further toward the C-terminus to form a long helix. For sheets, it was found that a region containing low-entropy residues such as glycine and proline mainly creates a hairpin turn, which induces hydrogen bonding between the regions on either side and forms an antiparallel sheet. It was also found that residue pairs in sequence-distant regions that form hydrogen bonds in the sheet do not require any special combination of residues: even when neither residue readily forms a sheet, the pair can be incorporated into the sheet structure by its surrounding environment.

Furthermore, the region that forms a strand of the final sheet sometimes has low entropy. In that case, the region firmly holds its strand conformation and does not become helical however helical its surroundings become, which allows other strands to approach nearby and form a sheet. As for helices, it was also found that a helix extends toward the C-terminal side or the N-terminal side, but its extension is eventually stopped when low-entropy regions located at both ends of the formed helix create irregular turn or loop structures.

The knowledge of structure formation obtained through structure prediction according to the present invention differs greatly from the knowledge that conventional protein structure prediction methods have sought to exploit. Using this new knowledge should further advance the understanding of how protein structures form, and this knowledge is considered indispensable for designing artificial proteins with novel sequences and for modifying existing natural proteins.

FIG. 1 is a block diagram showing the configuration of a calculation system that calculates entropy using the protein tertiary structure prediction method according to an embodiment of the present invention.

FIG. 2 shows, for the calculation system of FIG. 1, the range of structure (sequence) to be considered when calculating the total energy during sequence fragment structure optimization.

FIG. 3(a) shows the partial structure 21 predicted by the calculation system of FIG. 1, and FIG. 3(b) shows the corresponding actual structure 22.

Explanation of symbols

1 ... Mean force field potential calculation unit,
2 ... Sequence entropy map calculation unit,
3 ... Structure optimization scheduling unit,
4 ... Structural fragment optimization unit,
5 ... Protein residue sequence,
6 ... Protein tertiary structure prediction result,
7 ... Statistical data for calculating the mean force field potential,
8 ... Pair entropy table calculation unit,
9 ... Pair entropy table,
10 ... Sequence fragment entropy calculation unit.

Claims

A protein tertiary structure prediction method for predicting the three-dimensional structure of a protein, characterized in that the sequence-dependent entropy of every sequence fragment in the entire sequence of the protein whose three-dimensional structure is to be predicted is calculated, and the order in which the structures of the structural fragments corresponding to the sequence fragments are predicted or optimized, together with the conditions governing how the surrounding sequence is taken into account, is scheduled.

The protein tertiary structure prediction method according to claim 1, wherein, in the order of optimizing the structural fragments of the protein, shorter fragments are optimized first; structural fragments of the same length are optimized in ascending order of the sequence-dependent entropy of their corresponding sequence fragments; and, when a structural fragment to be optimized at a given point in the order overlaps structural fragments of the same length optimized earlier in the order, the optimization condition for that fragment is that its structure is optimized so as to minimize the total energy of the extended structural fragment that includes all of those structural fragments.

The protein tertiary structure prediction method, wherein, in the structural fragment optimization scheduling in which shorter sequence fragments are given priority, the optimal solution for every residue is assumed to be independent at length 1, and, for lengths of 2 or more, the structure at each fragment length is optimized using as its initial value the whole structure optimized at the stage of the next shorter length.

The protein tertiary structure prediction method according to claim 2, wherein, in optimizing the total energy of the extended structural fragment of the protein, the total energy of the extended structural fragment is taken as the sum of: the mean force field potentials over all residue pairs in the extended structural fragment; the Lennard-Jones and electrostatic potentials between all atoms of the structural fragment, with CO and NH attached at both ends of the extended structural fragment and all residues assumed to be glycine; and the main-chain dihedral angle potentials; the energy difference introduced by treating all residues as glycine being corrected for non-glycine residues by the mean force field potential of the glycine–glycine pair; and the structure of the extended structural fragment is optimized by deforming it so as to minimize this total energy.

The protein tertiary structure prediction method according to claim 4, wherein, in optimizing the extended structural fragment of the protein, for each set of three consecutive main-chain dihedral angles ω, φ, ψ or ψ, ω, φ contained in the portion of the extended structural fragment that corresponds to the structural fragment before extension, the optimal angles ω, φ, ψ or ψ, ω, φ are determined in ascending order of entropy, based on the magnitude of the sequence fragment entropy of the sub-fragment of sequence length 1 or 2 associated with that set of dihedral angles.