JP2003524831A

JP2003524831A - System and method for exploring combinatorial space

Info

Publication number: JP2003524831A
Application number: JP2001540691A
Authority: JP
Inventors: Amiram Goldblum; Meir Glick
Original assignee: Yissum Research Development Co of Hebrew University of Jerusalem
Current assignee: Yissum Research Development Co of Hebrew University of Jerusalem
Priority date: 1999-11-22
Filing date: 2000-11-22
Publication date: 2003-08-19
Also published as: AU780941B2; WO2001039098A3; AU1546901A; CA2391987A1; EP1266337A2; WO2001039098A8; WO2001039098A2

Abstract

(57) [Abstract] A method and system for searching through a combinatorial space without causing a combinatorial explosion. The search is performed over different combinations of basic elements according to at least one desired property of the combinations, a property that can be converted into a quantitative measure of the success of the search. Because the number of variables, and therefore the number of combinations, can be very large, preferably a sample of the combinations is examined. Elements of combinations that consistently maximize and/or improve the quantitative measure are retained, and the other elements are discarded. This process is repeated until some minimum number of combinations is found; these may optionally be evaluated further according to similar parameters and/or other parameters or properties.
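The loop described in the abstract (sample combinations, score them, keep the elements that score well, discard the rest, repeat until few combinations remain) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how "consistently" good elements are identified, so the sketch assumes a simple rule (drop the value with the lowest average sampled score for each variable in each round). The names `sample_and_discard`, `count_combinations`, `sample_size`, and `max_combinations` are hypothetical.

```python
import random
from statistics import mean


def count_combinations(domains):
    """Number of combinations spanned by the remaining values."""
    total = 1
    for values in domains.values():
        total *= len(values)
    return total


def sample_and_discard(domains, score, sample_size=200, max_combinations=50, seed=0):
    """Shrink each variable's value list until few combinations remain.

    domains: dict mapping a variable name to its list of candidate values
    score:   callable taking {variable: value} and returning a float
             (higher is better); stands in for the quantitative measure
    """
    rng = random.Random(seed)
    # Work on a copy so the caller's lists are not modified.
    domains = {var: list(values) for var, values in domains.items()}

    while count_combinations(domains) > max_combinations:
        # Draw random combinations and record each score under every value used.
        seen = {var: {v: [] for v in values} for var, values in domains.items()}
        for _ in range(sample_size):
            combo = {var: rng.choice(values) for var, values in domains.items()}
            s = score(combo)
            for var, v in combo.items():
                seen[var][v].append(s)

        # ASSUMED rule: per variable, discard the sampled value with the lowest
        # average score. Values that were never sampled are kept for a later round.
        for var, values in domains.items():
            sampled = [v for v in values if seen[var][v]]
            if len(values) > 1 and sampled:
                worst = min(sampled, key=lambda v: mean(seen[var][v]))
                values.remove(worst)
    return domains
```

Once the loop stops, the surviving combinations are few enough to be enumerated and scored one by one, which corresponds to the optional further evaluation mentioned in the abstract.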

Description

Detailed Description of the Invention

FIELD OF THE INVENTION

[0001] The present invention discloses a system and method for searching through a combinatorial space, and in particular a system and method capable of locating, within the combinatorial space, one or more combinations of basic elements that have a desired property. The desired property must have a numerical basis, or must at least be convertible to some form of numerical measure and/or an equivalent. The present invention makes it possible to search the combinatorial space rapidly and efficiently according to this property without causing a combinatorial explosion. It achieves this by examining each value of each basic element at least once, and preferably several times, during the search process. Each value can therefore be said to be searched exhaustively, without any need to search exhaustively through all combinations of values of the basic elements. In this way, the present invention combines the benefits of a non-exhaustive stochastic search process with those of an exhaustive search.

BACKGROUND OF THE INVENTION

[0002] For ease of description, this section is divided into a number of subsections. Briefly, the first subsection describes the general problem of combinatorial spaces and of searching within them. The following subsections describe solutions attempted so far for a number of different biological problems; these also serve as examples of the inadequacy of background-art solutions for handling combinatorial search spaces in biological problems. These subsections cover the placement of polar protons on biological molecules such as proteins, the placement of side chains on the amino acids of a protein, and the prediction of loop structures in proteins.

Combinatorial Space

[0003] A combinatorial space is defined as a space having a plurality of combinations of basic elements. The combinations may differ in the form of the values of the elements, in the structure of the resulting combination of elements, or in both. At a more basic level, each combination can be regarded as being composed of variables, each of which can take one or more values. In this specification, each variable preferably takes one value from a set of discrete values, but alternatively each variable may take, for example, a value from a continuous range or a value of a function.

[0004] Combinatorial spaces arise frequently in biology because many basic biological materials are themselves built from combinations of relatively basic building blocks, even when the resulting structure and/or function is very complex. Examples of combinatorial spaces include, but are not limited to, proteins. A protein is formed from a combination of amino acids as building blocks, and ultimately folds into a three-dimensional conformation known as the "tertiary structure", which is one set of values in the combinatorial space. Searching through such a combinatorial space to find this single structure can be termed a "combinatorial search". However, a folded protein in a biological environment is not fixed in a single "tertiary structure" and may exist in many metastable conformational states in equilibrium. The combinatorial space must therefore be searched thoroughly, preferably so that more than one solution is found.

[0005] Although these various types of building blocks are apparently simple, the number and complexity of the resulting combinations are so great that an exhaustive search of the combinatorial space is beyond the capacity of state-of-the-art computers, which makes searching the combinatorial space a difficult problem. For example, even a simple, small protein with a relatively short amino acid sequence has an enormous number of different potential structures, even though only one or a few are actually stable. For protein structure, the problem comprises several subproblems: the placement of amino acid side chains, the placement of polar protons (which determine hydrogen bonding within the protein and, optionally, between the protein and other molecules), and the location and type of larger structures within the protein, such as loops. Searching through combinatorial spaces of this type, particularly for biological problems, has therefore typically proven resistant to modeling and prediction by a variety of computational approaches.

[0006] Many attempts have been made to address the problem of searching through a combinatorial space. In general, these attempts have included transforming a large search space into smaller ones, for example by subdividing the combinatorial space; changing the representation of the space; making the criteria for locating potential solutions less deterministic; and directed searches, in which the search strategy is guided toward the desired solution. However, none of these approaches has proven suitable for biological problems such as the prediction of protein structure. Directed searches are not useful for such problems, because many "dead ends" are possible within protein structure and because no definitive set of rules for protein folding is known. Similarly, the search space cannot currently be reduced, because the rules governing protein structure are not known. Nor can the representation of the combinatorial space be changed, because protein structure has fixed basic building blocks, the amino acids. Finally, identifying less deterministic solutions has not proven useful for the overall prediction of protein structure, because any clear "rules" that can be derived for the folding of one protein are, at present, specific to that protein and have not been generalized. No effective search strategy exists that can locate a population of suitable sets of values in a combinatorial space.

First Biological Problem: Polar Protons

[0007] The general approaches described above can be illustrated further by specific solutions attempted for particular biological problems. For example, the positions of polar protons (hydrogens) are important for determining the pattern and specificity of hydrogen bonding within biologically important molecules such as proteins and DNA, and for determining hydrogen bonding between such molecules and other molecules. Including all hydrogen atoms in protein and nucleic acid models is necessary for a more accurate representation of the biological system during energy minimization and molecular dynamics simulations, and for understanding molecular recognition (Jones et al., J. Mol. Biol., vol. 24, pp. 43-53, 1995). Polar hydrogens play an important role in determining secondary structure and protein packing, and placing them accurately so that hydrogen bonds are formed correctly is very important for energy evaluation. A single misplaced polar hydrogen in the active site has been shown to alter the conformation of the substrate dramatically during a molecular dynamics simulation (Bass et al., Proteins, vol. 12, pp. 266-277, 1992).

[0008] At present, the main source of high-resolution data on biomolecules is X-ray crystallography. However, X-ray crystallography is useful for locating heavy atoms, while the positions of protons are not determined. Neutron diffraction studies can locate protons, but so far only a few combined X-ray/neutron diffraction studies have been deposited in the Protein Data Bank (PDB).

[0009] Several computational methods have been proposed for placing polar hydrogens in proteins. In the first, common to most molecular modeling software packages, hydrogens are placed in a non-specific manner, and the structure can then be optimized by an energy minimization algorithm, which is subject to the multiple-minima problem. Energy minimization algorithms do not take into account the alternative positions of flexible polar hydrogens, many of which can form hydrogen bonds.

[0010] A second method, proposed by Brunger and Karplus (Proteins, vol. 4, pp. 148-156, 1998), searches for a local minimum conformation of each polar proton in turn by rotation about its torsion angle. This iterative process continues until convergence. Because the method does not take into account the influence of neighboring rotatable hydrogens, it can be expected to be accurate only for systems in which there are no close contacts between such hydrogens.

[0011] A third method, proposed by Bass et al. (Proteins, vol. 12, pp. 266-277, 1992), is based on dividing the system into networks of interacting hydrogen-bond donors and acceptors. The algorithm attempts to maximize the number of hydrogen bonds that can be formed in each network and to minimize the total donor-acceptor distance. Because each network is examined rigorously for the most probable set of hydrogen bonds, the number of comparisons (between options) is proportional to the factorial of the number of elements in the network, which restricts the calculation to small networks (Bass et al., Proteins, vol. 12, pp. 266-277, 1992). No energy evaluation is used to select the best structure. As a result, the output may contain high-energy interactions between the placed hydrogens and their environment.

[0012] Richardson et al. (J. Mol. Biol., vol. 285, pp. 1711-1733, 1999) and Word et al. (J. Mol. Biol., vol. 285, pp. 1735-1747, 1999) have recently extended the "network" approach by considering Asn/Gln "flips", the protonation state of histidine rings, and a simple model of water interactions. For rotatable protons, the set of local hydrogen bonds is optimized with respect to distance and van der Waals overlap. Unfortunately, none of these attempted solutions to the problem of placing polar protons in biomolecules such as proteins can be generalized, even to a single class of such molecules, as being both accurate within the class and efficient to carry out.

Second Biological Problem: Placement of Amino Acid Side Chains

[0013] Another example of the problem of searching through a combinatorial space is the placement of amino acid side chains. This is itself a problem of searching a combinatorial space, yet it is only one part of the general problem of protein structure prediction. Nevertheless, it has proven intractable for the currently available methods that attempt to predict side-chain positions.

[0014] Accurate placement of protein side chains is essential for both theoretical and experimental purposes. From the theoretical point of view, it is a subproblem of de novo protein structure prediction. It is also an unavoidable problem in structure-based drug design (Defay and Cohen, Proteins, vol. 23, pp. 431-445, 1995), inverse folding and threading algorithms (Bahar and Jernigan, J. Mol. Biol., vol. 266, pp. 195-214, 1997), understanding of the folding process and of structural stability (Zhukov et al., Protein Sci., vol. 9, pp. 273-279, 2000), ab initio prediction of protein tertiary structure (Huang et al., Proteins, vol. 33, pp. 204-217, 1998), and homology-based modeling (Blundell et al., Nature, vol. 326, pp. 347-352, 1987). From the point of view of the X-ray crystallographer, it could speed up the positioning of side chains using the electron density map of the main chain, which is carried out before refinement calculations. The main limitation is the large number of possible conformations that each side chain can adopt (Lee and Subbiah, J. Mol. Biol., vol. 217, pp. 373-388, 1991). An exhaustive search of all possible protein conformations is beyond the capacity of state-of-the-art computers.

[0015] X-ray crystallography usually provides a single structure, characterized by its R factor. A crystal structure represents the biomolecule in a highly ordered crystal lattice, in contrast to the physiologically more relevant solution environment of NMR structures. An X-ray crystal structure may also be biased toward particular conformational substates in the crystal that may not be found in the ensemble of conformations in solution (Brunger, Nat. Struct. Biol., 4 suppl., pp. 862-865, 1997). Observing alternative rotamers is beyond the detection limit of conventional X-ray crystallography unless the resolution is very high. At least 10% of all side chains in proteins adopt multiple discrete conformations in carefully refined crystal structures (Smith et al., Biochemistry, vol. 25, pp. 5018-5027, 1986). MacArthur and Thornton (Acta Cryst. D Biol. Cryst., D55, pp. 994-1004, 1999) found a marked and unexpected correlation between mean χ1 values and resolution, mainly for small, flexible side chains. All of the data support the hypothesis that this observation reflects local conformational flexibility and disorder, which at low resolution would be interpreted as a single distorted conformer. The results of all these studies point to a dynamic rather than static picture of protein structure, and to the need to extract this dynamic information from NMR ensembles in order to gain a more detailed understanding of protein function (Philipopoulous and Lim, Proteins, vol. 36, pp. 87-110, 1999). Protein function and molecular recognition depend on structural plasticity (Garcia et al., Science, vol. 279, pp. 1166-1172, 1998), and the conformational flexibility of receptor proteins is one of the major factors affecting ligand docking (Desmet et al., FASEB J., vol. 11, pp. 164-172, 1997). However, because of the large number of minimum-energy conformers on the potential energy surface, accurately determining the positions of protein side chains by computation is a complex task even with a rigid backbone. Conventional methods for side-chain addition usually yield a single protein structure, which is compared with the X-ray structure when one is available; conformational space is ignored. Recent NMR studies of many proteins have suggested many conformations for each protein (Schneider et al., J. Mol. Biol., vol. 285, pp. 727-740, 1997). However, it is not clear whether such conformations represent alternative solutions to the distance restraints that emerge from two- and three-dimensional coupling maps, or whether they are real conformations that can contribute to the overall population at equilibrium. The classical molecular dynamics (MD) approach is the technique of choice for simulating biomolecules. With current technology, MD simulations spanning a few nanoseconds for systems of tens of thousands of atoms are becoming routine (Sagui and Darden, Ann. Rev. Biophys. Biomol. Struct., vol. 28, pp. 155-179, 1999). However, the time scales relevant to biomolecular function range from nanoseconds to beyond seconds. The time needed to reach equilibrium between different conformers of a protein is prohibitive for such MD simulations, which therefore appear to offer only a glimpse of the behavior of a protein in its environment.

[0016] Current strategies for side-chain addition are classified according to three distinct categories. The first category is the conformational space of each side chain. In continuous-space methods (Eisenmenger, J. Mol. Biol., vol. 231, pp. 849-860, 1993; Roitberg and Elber, J. Chem. Phys., vol. 95, pp. 9277-9287, 1991), any side-chain torsion angle can be sampled. Discrete-space methods are based on the assumption that side chains exist in energetically favorable conformations called rotamers, which are local-minimum conformers collected by statistical analysis of known structures (Chandrasekaran and Ramachandran, Int. J. Protein Res., vol. 2, pp. 223-233, 1970; Sasisekharan and Ponnuswamy, Biopolymers, vol. 9, pp. 1249-1256, 1970; Sasisekharan and Ponnuswamy, Biopolymers, vol. 10, pp. 583-592, 1971; Ponder and Richards, J. Mol. Biol., vol. 193, pp. 775-791, 1987; Gelin and Karplus, Biochemistry, vol. 18, pp. 1256-1268, 1979; Dunbrack and Karplus, Nat. Struct. Biol., vol. 1, pp. 334-340, 1994). Discrete-space methods cannot predict conformations that are absent from the rotamer database. However, a large rotamer database that includes very rare conformations does not necessarily give better predictions than a small one (Holm and Sander, Proteins, vol. 14, pp. 213-223, 1992; Laughton, J. Mol. Biol., vol. 235, pp. 1088-1097, 1994; Tanimura et al., Protein Sci., vol. 3, pp. 2358-2365, 1994; Vasquez, Biopolymers, vol. 36, pp. 53-70, 1995). Databases can also be classified as backbone-dependent or backbone-independent: the former are based on the relationship between side-chain conformation and local backbone conformation, whereas the latter are not.
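To make the size of a discrete rotamer space concrete, the short sketch below multiplies the number of rotamers available at each residue for a fixed backbone. The function name `conformation_count` and the rotamer counts in the example are hypothetical and are not taken from any rotamer library cited above.

```python
from math import prod


def conformation_count(rotamers_per_residue):
    """Size of the discrete side-chain space for a fixed backbone.

    rotamers_per_residue: one integer per residue, giving how many
    rotamers that residue may adopt.
    """
    return prod(rotamers_per_residue)


# Hypothetical rotamer counts for a six-residue fragment (illustrative only).
example_fragment = [1, 3, 9, 27, 3, 81]
print(conformation_count(example_fragment))
```

Because the count is a product, adding one residue with k rotamers multiplies the whole space by k, which is why exhaustive enumeration quickly becomes impractical as chain length grows.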

[0017] The second category is the cost function used to evaluate solutions. Energy-based methods rely on non-bonded terms (Laughton, J. Mol. Biol., vol. 235, pp. 1088-1097, 1994; Vasquez, Biopolymers, vol. 36, pp. 53-70, 1995; Wilson et al., J. Mol. Biol., vol. 229, pp. 996-1006, 1993; Vasquez, Curr. Opin. Struct. Biol., vol. 6, pp. 217-221, 1996). They assume that the lower the energy, the more accurate the prediction.

[0018] Knowledge-based methods have also been proposed. Sutcliffe et al. (Protein Eng., vol. 1, pp. 385-392, 1987) suggested building side chains using spatial information from side chains at topologically equivalent positions (when such a correlation is observed) together with the most probable side-chain conformation for each type of secondary structure. Sali and Blundell (J. Mol. Biol., vol. 234, pp. 779-815, 1993) described a comparative protein modeling method designed to find the most probable structure for a sequence given its alignment with related structures. Bower et al. (J. Mol. Biol., vol. 267, pp. 1268-1282, 1997) placed residues in their most favorable backbone-dependent rotamers and systematically analyzed the clashes arising from the resulting structure.

[0019] The third category is the search strategy. A wide variety of search strategies have been used, including Metropolis Monte Carlo (Holm and Sander, Proteins, vol. 14, pp. 213-223, 1992), Gibbs sampling Monte Carlo (Vasquez, Biopolymers, vol. 36, pp. 53-70, 1995), neural networks (Hwang and Liao, Protein Eng., vol. 8, pp. 363-370, 1995), genetic algorithms (Tuffery et al., J. Biomol. Struct. Dynam., vol. 8, pp. 1267-1289, 1991; Tuffery et al., J. Comput. Chem., vol. 14, pp. 790-798, 1993), simulated annealing (Lee and Subbiah, J. Mol. Biol., vol. 217, pp. 273-288, 1991), mean-field optimization (Koehl and Delarue, J. Mol. Biol., vol. 239, pp. 249-275, 1994), and the Locally Enhanced Sampling (LES) method (Roitberg and Elber, J. Chem. Phys., vol. 95, pp. 9277-9287, 1991).

[0020] Combinatorial search (Dunbrack and Karplus, J. Mol. Biol., vol. 230, pp. 543-574, 1993; Tuffery et al., J. Biomol. Struct. Dynam., vol. 8, pp. 1267-1289, 1991; Wilson et al., J. Mol. Biol., vol. 229, pp. 996-1006, 1993) is applied to discrete conformers and may be followed by continuous minimization in the final stage of refinement. None of the above methods is guaranteed to converge to a valid solution. Another widely used method is dead-end elimination (DEE). It is based on identifying rotamers that are absolutely incompatible with the global minimum energy conformation, and on eliminating rotamers that cannot contribute to local energy minima above a certain order. A conformation containing such a rotamer can be regarded as a dead end (Desmet et al., Nature, vol. 356, pp. 539-542, 1992; Desmet et al., FASEB J., vol. 11, pp. 164-172, 1997; Lasters and Desmet, Protein Eng., vol. 6, pp. 717-722, 1993). If enough rotamers are eliminated by iterative application, the global minimum can be found (Goldstein, Biophys. J., vol. 66, pp. 1335-1340, 1994). However, DEE cannot find a population of low-energy solutions.
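As a rough illustration of the elimination idea described above, the following sketch applies the single-rotamer criterion usually attributed to the original DEE publication: a rotamer r of a residue is discarded when its best-case energy (self energy plus the minimum pair energy with every other residue) is still higher than the worst-case energy of some other rotamer t of the same residue. The patent text does not spell out this inequality, so it is stated here as an assumption; the function name `dee_eliminate` and the data layout are hypothetical.

```python
def dee_eliminate(self_energy, pair_energy):
    """Return the rotamers that survive repeated single-rotamer DEE passes.

    self_energy: {residue: {rotamer: self energy}}
    pair_energy: callable(res_i, rot_r, res_j, rot_s) -> pairwise energy
    """
    alive = {res: set(rots) for res, rots in self_energy.items()}
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]

            def best_case(r):
                # Lowest energy rotamer r could possibly contribute.
                return self_energy[i][r] + sum(
                    min(pair_energy(i, r, j, s) for s in alive[j]) for j in others)

            def worst_case(t):
                # Highest energy rotamer t could possibly contribute.
                return self_energy[i][t] + sum(
                    max(pair_energy(i, t, j, s) for s in alive[j]) for j in others)

            for r in list(alive[i]):
                # r is a dead end if some alternative t always does better.
                if any(best_case(r) > worst_case(t) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive
```

The rotamer with the lowest best-case energy at each residue can never satisfy the inequality, so every residue keeps at least one rotamer. The sketch returns only the surviving rotamers; it does not rank alternative low-energy solutions, which matches the limitation noted in the text.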

[0021] The A* algorithm finds the optimal path from the root node of a search tree to a goal node using a cost function denoted f* (Leach and Lemon, Proteins, vol. 33, pp. 227-239, 1998). Each node has a single f* value composed of the cost of reaching that node from the start node and the estimated cost of reaching the goal node. The search proceeds iteratively: the node with the lowest f* value is expanded, and new f* values are calculated for its successor nodes.
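The expansion rule described above can be sketched generically as follows. This is a minimal illustration on an abstract graph, not the side-chain implementation of Leach and Lemon; the names `a_star`, `neighbors` and `h` are hypothetical, and the estimate `h` is assumed never to overestimate the remaining cost.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Return (cost, path) of a cheapest path, or None if the goal is unreachable.

    neighbors(n) yields (next_node, step_cost) pairs;
    h(n) estimates the remaining cost from n to the goal.
    """
    # Heap entries are (f, g, node, path) with f = g + h(node).
    open_heap = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)   # node with the lowest f
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                                   # stale heap entry
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(open_heap, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None
```

The quality of `h` governs how many nodes are expanded, which is why a poor cost estimate limits the size of problem that can be treated.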

[0022] The best method currently known for identifying low-energy side-chain conformations of proteins combines DEE with the A* algorithm, where the A* algorithm has been used to construct a partition function. Although the A* approach can find the best N solutions, it is limited to relatively small proteins. The largest protein analyzed by this algorithm to date contains 68 amino acids, corresponding to about 10^43 combinations depending on the complexity of the rotamer library, yet proteins with far more combinations are common. As a "stand-alone" algorithm (without the DEE preprocessing stage), the A* algorithm can handle at most about 10^21 combinations. An efficient A* search requires a good estimate of the cost of reaching the goal node, which is problematic because of interactions between residues that have not yet been assigned. These limitations call for new and powerful algorithms for finding the global minimum and the lowest-energy conformations in larger systems. Unfortunately, no such algorithm is currently available.
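To give a feeling for these magnitudes, the following short calculation (an illustrative sketch; the helper names are hypothetical) treats the number of combinations as the product of the number of rotamers available at each residue. Under that assumption, 10^43 combinations over 68 residues corresponds to a geometric mean of a little over four rotamers per residue.

```python
import math

def log10_combinations(options_per_position):
    """log10 of the product of the option counts (avoids huge integers)."""
    return sum(math.log10(n) for n in options_per_position)

def mean_options(log10_total, n_positions):
    """Geometric-mean number of options per position for a given total."""
    return 10 ** (log10_total / n_positions)

# About 4.3 rotamers per residue on average for 10^43 combinations over 68 residues.
avg = mean_options(43, 68)
```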

[0023] Third biological problem: prediction of loop structure
As mentioned above, predicting protein structure requires a search within combinatorial space, for which no adequate solution currently exists. Protein structure prediction itself divides into many smaller problems, each of which also requires a search within combinatorial space. One example of such a problem is the prediction of highly complex loop structures.

[0024] Structural genomics projects are under way to provide experimental structures or good models for the newly discovered sequences emerging from the various genome projects. Brenner and Levitt (Protein Sci., vol. 9, pp. 197-200, 2000) suggest, based on an analysis of sequence similarity databases, that the number of novel folds is steadily decreasing and that the most common folds will soon be found. Homology modeling may therefore become a powerful tool for predicting the three-dimensional structure of a protein sequence when the protein shows reasonably high sequence similarity to another protein of known tertiary structure (the "template"). In that case, the secondary structure elements are transferred from the template to the target protein. The "loop" or "coil" stretches, however, are not determined and must be predicted. Homology modeling has long been used, and with success. Lessons from landmark predictions, such as that of HIV-1 protease based on the aspartic protease of Rous sarcoma virus (Weber, Science, vol. 243, pp. 928-931, 1989) and that of the amyloid precursor protease inhibitor domain from bovine pancreatic trypsin inhibitor (Struthers et al., Proteins, vol. 9, pp. 1-11, 1991), proved useful for the subsequent sequence homology modeling of many structures. Most predicted loops vary in both length and sequence within a protein family, and modeling them requires handling insertions and deletions in three dimensions. Even when such sequence and length mismatches are confined to short stretches of the protein, substantial rearrangement of the backbone conformation can occur. The other regions of the two proteins, which are highly similar in both sequence and length, are called "framework regions". The aperiodic structures that connect two consecutive secondary structures are called "loops" and are described as having an "irregular conformation" or "random coil" (Oliva et al., J. Mol. Biol., vol. 266, pp. 814-830, 1997). Loop construction, that is, finding suitable coordinates for the variable regions in protein homology building through insertions and deletions, is an important problem for globular proteins (Alwyn and Thirup, EMBO J., vol. 5, pp. 819-822, 1986; Brooks et al., J. Comput. Chem., vol. 4, pp. 187-217, 1983; Bruccoleri et al., Nature, vol. 335, pp. 564-568, 1988; Bruccoleri and Karplus, Biopolymers, vol. 26, pp. 137-168, 1987; Geer, Proteins, vol. 7, pp. 317-334, 1990; Palmer and Scheraga, J. Comp. Chem., vol. 12, pp. 505-526, 1991; Summers and Karplus, J. Mol. Biol., vol. 216, pp. 991-1016, 1990). Loop construction is also a major problem in other areas of structural biology. It is an enormously complex combinatorial problem that requires finding fragments that can be properly inserted between the two end points of a loop and evaluating their energies.

[0025] Extensive structural studies have been carried out on immunoglobulins, which have nearly identical sequences and structures yet markedly different binding specificities (Bruccoleri et al., Nature, vol. 335, pp. 564-568, 1988; Chothia et al., Science, vol. 233, pp. 755-758, 1986; Chothia and Lesk, J. Mol. Biol., vol. 196, pp. 901-917, 1987; Fine et al., Proteins, vol. 1, pp. 342-362, 1986; Tramentano and Lesk, Proteins, vol. 13, pp. 231-245, 1992). The specificity is due to six hypervariable loops called complementarity-determining regions (CDRs). Understanding the structure of these loops teaches us a great deal about antigen binding, catalytic antibodies, and molecular recognition.

[0026] Another example is the loops that connect the transmembrane helices within G protein-coupled receptors (GPCRs), forming either a potential ligand-binding domain on the extracellular side or a G protein-binding domain on the intracellular side of the membrane (Kazmi et al., Biochemistry, vol. 39, pp. 3734-3744, 2000; Iieymann et al., J. Struct. Biol., vol. 128, pp. 243-249, 1999). Predicting the three-dimensional conformation of the intracellular loops is important both for understanding the mechanism (Gather et al., Nature, vol. 362, pp. 345-348, 1993; Kyle et al., J. Med. Chem., vol. 37, pp. 1347-1352, 1994; Cypess et al., J. Biol. Chem., vol. 274, pp. 19455-19464, 1999; Zhang et al., Protein Sci., vol. 3, pp. 493-506, 1999; Lee et al., J. Biol. Chem., vol. 275, pp. 9284-9289, 2000) and for the subsequent design of GPCR-related drugs (Mukherjee et al., J. Biol. Chem., vol. 274, pp. 12984-12989, 1999).

[0027] A similar problem arises in the modeling of chemical rings, where the constraints stem from the need to close the ring with chemically reasonable bond lengths and angles (Go and Scheraga, Macromolecules, vol. 3, pp. 178-187, 1970; Bruccoleri and Karplus, Macromolecules, vol. 18, p. 2767, 1985; Shenkin et al., Biopolymers, vol. 26, pp. 2053-2085, 1987; Palmer and Scheraga, J. Comp. Chem., vol. 12, pp. 505-526, 1991).

[0028] Many current strategies divide the problem into two subproblems. First, geometrically acceptable conformations must be found for inserting a polypeptide backbone fragment of appropriate length between the two end points of a loop within the framework of a known protein structure (Go and Scheraga, Macromolecules, vol. 3, pp. 178-187, 1970; Weiner et al., J. Amer. Chem. Soc., vol. 106, pp. 765-784, 1984; Bruccoleri and Karplus, Macromolecules, vol. 18, p. 2767, 1985; Shenkin et al., Biopolymers, vol. 26, pp. 2053-2085, 1987). This step usually yields multiple solutions. Second, the appropriate polypeptide is selected from among the solutions suggested in the first step, usually by energy criteria.

[0029] A grid search over most of the backbone dihedral angles of the loop has been suggested (Bruccoleri and Karplus, Biopolymers, vol. 26, pp. 137-168, 1987; Moult and James, Proteins, vol. 1, pp. 146-163, 1986), in which the initial grid points are selected from the allowed regions of the Ramachandran map. The allowed regions are determined from the distribution of backbone torsion angles observed in a set of protein structures determined by high-resolution crystallography. Because the size of a grid search grows exponentially with the number of degrees of freedom of the system, such methods are limited to relatively short loops.
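The exponential growth mentioned above can be made concrete with a short count. The sketch below is an illustration under stated assumptions, not part of the cited methods: it assumes two sampled backbone dihedral angles per residue and a uniform grid over the full 360 degrees, which is an upper bound because restricting the grid to the allowed Ramachandran regions reduces the base but not the exponent. The function name `grid_points` and the 30-degree default step are hypothetical.

```python
def grid_points(n_residues, step_deg=30, angles_per_residue=2):
    """Number of grid points for a uniform dihedral-angle grid over a loop."""
    per_angle = 360 // step_deg                  # sampled values per angle
    return per_angle ** (angles_per_residue * n_residues)

# One residue: 12 * 12 = 144 points; five residues: 12**10 points.
sizes = {n: grid_points(n) for n in (1, 3, 5)}
```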

[0030] Database searching was first proposed by Jones and Thirup (EMBO J., vol. 5, pp. 819-822, 1986) and extended by Summers and Karplus (J. Mol. Biol., vol. 216, pp. 991-1016, 1990). A database of high-resolution X-ray structures is searched for amino acid segments whose geometric descriptors and size are similar to those of the desired loop. The selected segments are docked into the protein, and their feasibility is examined using geometric and energy criteria. Koehl and Delarue (Nat. Struct. Biol., vol. 2, pp. 163-170, 1995) used a database search combined with self-consistent mean field theory to add side chains. Loop length has been shown to limit the reliability of database searching: as Summers and Karplus (J. Mol. Biol., vol. 216, pp. 991-1016, 1990) pointed out, the method is limited to loops of up to six residues. In addition, because the database must cover most test cases, the method requires a fairly large database of well-resolved structures. More recently, Deane and Blundell (Proteins, vol. 40, pp. 135-144, 2000) applied an exhaustive ab initio search to computer-generated fragments to produce loops of up to eight residues.

[0031] In tree search methods (Brooks et al., J. Comput. Chem., vol. 4, pp. 187-217, 1983; Bruccoleri et al., Nature, vol. 335, pp. 564-568, 1988), the search strategy is based on nodes, which may be expanded during the conformational search. The yield is low for medium-sized structures and prohibitively low for large ones (Shenkin et al., Biopolymers, vol. 26, pp. 2053-2085, 1987).

[0032] In the random tweak method (Shenkin et al., Biopolymers, vol. 26, pp. 2053-2085, 1987), an unconstrained structure is generated in which all dihedral angles are set to random values. The loop constraints are then enforced geometrically by tweaking all dihedral angles together in an iterative procedure. Fine et al. (Proteins, vol. 1, pp. 342-362, 1986) likewise generate a large number of random conformations for the backbone of the desired loop and then perform minimization and/or molecular dynamics while keeping the rest of the molecule fixed. Kyle et al. (J. Med. Chem., vol. 37, pp. 1347-1352, 1994) use homology modeling based on the known transmembrane structure of bacteriorhodopsin, combined with a two-stage conformational search involving energy minimization, molecular dynamics, and docking simulations. These techniques explore the conformational space only in the vicinity of the starting point, toward a local energy minimum.

[0033] The bond scaling-relaxation procedure satisfies the geometric and energetic requirements simultaneously (Zheng et al., J. Comp. Chem., vol. 14, pp. 556-565, 1993). Random initial conformations with standard bond lengths and angles are generated. The bond lengths of each initial conformation are scaled up or down to fit the distance constraint imposed by the loop, and the system is relaxed to a local energy minimum. The method was later enhanced by combining it with the multiple copy sampling method (Zheng et al., Protein Sci., vol. 2, pp. 1242-1248, 1993; Zheng et al., Protein Sci., vol. 3, pp. 493-506, 1994). The improved method was used to treat loops of up to 12 residues.

[0034] A serious problem with all of the above methods is the difficulty of generating a large number of different closures covering a large conformational space, which is needed to maximize the chance that the correct (or a very close) structure is included (Zheng et al., J. Comp. Chem., vol. 14, pp. 556-565, 1993). Current approaches are either limited to small loops or address only part of the conformational space, so potentially good solutions may be overlooked. A more effective search strategy is needed that considers the entire conformational space and finds all possible conformations that satisfy the geometric criteria for loop closure. These solutions can later be evaluated by more refined criteria.

[0035] Other biological problems
In addition, many other biological problems are believed to require a combinatorial search, for example those related to rational drug design. The basic assumption of rational drug design is that drug action results from the molecular binding of one molecule (the ligand) in a pocket of another, usually larger, molecule (the receptor, generally a protein). In their active or binding conformations, the molecules exhibit geometric and chemical complementarity, both of which are essential for successful drug action. By binding to such macromolecules, a drug can modulate signaling pathways, for example by altering sensitivity to hormone action, or alter metabolism, for example by interfering with the catalytic activity of an enzyme. Most commonly, this is achieved by binding within a specific cavity (the active site) of the enzyme that catalyzes the reaction, thereby blocking access of the natural substrate. In other cases, such as transmembrane proteins, an "antagonist" may be designed to block the binding of an "agonist" (a natural molecule that activates signal transduction); alternatively, if the biological response is diminished, a more strongly binding agonist may be required as a drug.

[0036] Modeling molecular structure is a complex task, because most molecules are flexible and can adopt many different conformations of similar or close energy. Modeling the binding process is also complex, because it must take into account the properties of the receptor, the ligand, and the solvent in which they are found. Chemists strive to obtain models that are as accurate as possible, but in practice some approximations must be made. Clearly, the more accurate the model, the better the chemist's chances of predicting molecular interactions. Nevertheless, many predictions made with approximate models have been confirmed by experimental observation, and several drugs have recently been designed by computational methods. Encouraged by this, researchers have built tools that use approximate models and have examined how useful those tools are. Such approximate models raise difficult algorithmic problems. More accurate molecular modeling, obtained through deeper theoretical understanding or increased computing power, can only improve the techniques developed with simpler models.

[0037] The problems that arise can be divided into two broad categories, depending on whether the chemical and geometric structure of the receptor is known. When the receptor is known, chemists are interested in finding out whether a ligand can be placed in the binding pocket of the receptor in a conformation that gives the complex a low energy. This is called the docking problem. It has several variations: an accurate description of the binding interaction may be desired, or an approximate assessment may be sought of which ligands, among those in a large database, fit into the receptor.

[0038] Very often the binding pocket is unknown. In fact, relatively few three-dimensional structures have been determined by X-ray crystallography or NMR techniques, although their number is growing rapidly. In such cases an indirect approach must be taken, using a number of ligands that interact with the specific receptor. These ligands have been discovered mainly by experiment. Using the geometric structure and chemical properties of these molecules, chemists attempt to infer information about the receptor. In particular, chemists are interested in identifying the pharmacophore present in these ligands. A pharmacophore is a set of features in a specific three-dimensional arrangement that is contained in the active conformations of all the molecules considered. The prevailing hypothesis is that the pharmacophore is the part of the molecule responsible for drug activity, while the rest of the molecule serves as a scaffold for the pharmacophore features. Once the pharmacophore has been determined, chemists can use it to design more potent drugs by examining the different activities, relative shapes, and chemical structures of the starting molecules.

[0039] Techniques that have been used so far in computer-aided drug design include robotics (kinematics and planning), graphics algorithms (molecular visualization), computational geometry (surface calculation), numerical methods (energy minimization), graph-theoretic methods (identification of invariants), randomized algorithms (conformational search), computer vision methods (docking), and various other techniques such as genetic algorithms and simulated annealing. Many tools for performing complex geometric and energy calculations are now available, and the success of these computer-aided methods is currently being evaluated.

[0040] The other general problem in drug design arises from biomolecular targets. Advances in genomics, proteomics, and bioinformatics are rapidly producing new therapeutic targets for drug discovery research. Because the number of compounds that could be tested for activity against these targets is practically unlimited, biopharmaceutical research increasingly relies on synergistic approaches to accelerate drug discovery. Novel computational methods are required to improve our ability to process the wealth of information emerging from new protein sequences and to convert sequence into structure. Moreover, most X-ray studies yield a single structure that lacks important information about the proton positions of the protein and solvent, and even detailed protein structural studies produce at best a limited set of static conformations, so their information content is limited. Even when the complete structure of a target is known, designing drug candidates that may interact with it is a very difficult task. Improved methods that facilitate this task are therefore greatly needed in the field of structure-based drug design.

[0041] The most severe limitation in most of these problems is the very high complexity caused by the large number of variables. Any search for a "real" molecular structure requires the "optimization" of this large number of variables. Such a complex potential energy surface (PES) has many minima, one of which is the "global minimum", presumably related to the native structure of the molecule.

[0042] Despite significant recent progress, the general problem of global optimization remains unsolved (Wales and Scheraga, Science, vol. 285, p. 1368, 1999). Conventional minimization techniques are time-consuming and tend to converge to a local minimum. Sampling the "phase space" of biomolecules (Berne and Straub, Curr. Opin. Struct. Biol., vol. 7, p. 181, 1997) may help to locate the regions of the minima or to shorten the time needed to reach them. Some of the main global minimization algorithms are presented below, in decreasing order of their applicability to computational biology.

[0043] Because the number of conformations grows exponentially with the number of residues, protein structure prediction can be shown to be an NP-hard problem. The native conformation of a protein occupies only a very small subset of these conformations, so powerful search algorithms are required to explore them.

[0044] Simulated annealing (SA) is a generalization of the Monte Carlo method for examining the equations of state and frozen states of many-body systems (Metropolis, J. Chem. Phys., vol. 21, p. 1087, 1953). In the annealing process, a melt that is initially disordered at high temperature is cooled slowly. As cooling proceeds, the system becomes more ordered and approaches a "frozen" ground state at T = 0. The initial structure is perturbed and the energy change dE is calculated. If the energy change is negative, the new structure is accepted. If the energy change is positive, the new structure is accepted with a probability given by the Boltzmann factor exp(-dE/kT). This process is repeated enough times to give good sampling statistics for the current temperature; the temperature is then lowered and the whole process is repeated until the frozen state at T = 0 is reached. SA is suited to large-scale optimization problems, especially those in which the desired global minimum is hidden among many poorer local minima (Holm and Sander, Proteins, vol. 14, p. 213, 1992; Lee and Subbiah, J. Mol. Biol., vol. 217, p. 373, 1991; Hwang and Liao, Protein Eng., vol. 8, p. 363, 1995; Press et al., "Numerical Recipes", p. 326, Cambridge University Press, New York, NY, 1986).
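The acceptance rule and cooling loop described above may be sketched as follows. This is a minimal illustration under stated assumptions: the geometric cooling schedule, the parameter defaults, the treatment of dE = 0 as an accepted move, and the names `metropolis_accept` and `simulated_annealing` are all hypothetical choices, and the schedule only approaches T = 0 rather than reaching it.

```python
import math
import random

def metropolis_accept(dE, kT, rng):
    """Accept downhill moves always; uphill moves with probability exp(-dE/kT)."""
    if dE <= 0:
        return True
    if kT <= 0:
        return False                      # frozen state: no uphill moves
    return rng.random() < math.exp(-dE / kT)

def simulated_annealing(energy, perturb, x0, kT0=1.0, cooling=0.9,
                        n_temps=50, steps_per_temp=100, seed=0):
    """Return the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    kT = kT0
    for _ in range(n_temps):
        for _ in range(steps_per_temp):   # sample at the current temperature
            y = perturb(x, rng)
            ey = energy(y)
            if metropolis_accept(ey - e, kT, rng):
                x, e = y, ey
                if e < best_e:
                    best_x, best_e = x, e
        kT *= cooling                     # lower the temperature
    return best_x, best_e
```

A call such as `simulated_annealing(lambda x: (x - 2.0) ** 2, lambda x, rng: x + rng.gauss(0.0, 0.5), -8.0)` exercises the loop on a one-dimensional test function.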

【００４５】Genetic algorithms (GA) have been applied to many optimization problems, with success in some of them (Tuffery et al., J. Comput. Chem., vol. 14, p. 790, 1993). GA is inspired by Darwin's theory of evolution, according to which the fittest are selected and survive in nature (Forrest, S., Science, vol. 261, p. 872, 1993). In each GA iteration, competitive selection weeds out weak solutions. Solutions with high "fitness" are "recombined" with other solutions by swapping parts of one solution for parts of another. Solutions are also "mutated" by making a small change to a single element of a solution. GA is simple, is not prone to becoming stuck in local minima, and can often find the globally optimal solution. It requires no derivatives or other problem-specific calculations. However, there is no guarantee that it will converge to a valid solution, and many iterations are needed to reach the convergence condition.
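
For illustration, the selection, recombination and mutation steps described above may be sketched as follows. The sketch assumes tournament selection and one-point crossover, which are only one of many possible choices, and a hypothetical toy fitness (the number of 1s in a bit string); the function name and all parameters are illustrative.

```python
import random


def genetic_algorithm(fitness, n_genes, alphabet, pop_size=40,
                      generations=60, mutation_rate=0.05, seed=0):
    """Maximize `fitness` over fixed-length lists of symbols from `alphabet`."""
    rng = random.Random(seed)
    pop = [[rng.choice(alphabet) for _ in range(n_genes)]
           for _ in range(pop_size)]

    def tournament():
        # Competitive selection: the fitter of two random solutions survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_genes)       # one-point crossover
            child = p1[:cut] + p2[cut:]           # "recombination"
            for i in range(n_genes):
                if rng.random() < mutation_rate:  # "mutation" of one element
                    child[i] = rng.choice(alphabet)
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)     # keep the best seen so far
    return best


# Hypothetical toy problem: maximize the number of 1s in a 20-bit string.
best = genetic_algorithm(sum, n_genes=20, alphabet=(0, 1))
```

No derivative of the fitness is used anywhere, which matches the property noted in the text; equally, nothing in the loop guarantees convergence to the optimum.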

【００４６】Tabu search (TBS) (Glover, Computers and Operations Research, vol. 5, p. 533, 1986) is superior to SA both in the time required to obtain a solution and in the quality of the solution (Cvijovic and Klinowski, Science, vol. 267, p. 664, 1995). At initialization, the aim is to examine the solution space coarsely; as candidate locations are identified, the search focuses on obtaining local optima. TBS is problem-independent and can be applied to a wide range of tasks. TBS is easy to implement, with the entire procedure occupying only a few lines of code, and it is conceptually much simpler than SA or GA. However, TBS cannot guarantee that a problem with multiple minima will be solved in a limited number of steps, and computation times can be long.
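
A minimal sketch of a tabu search is given below for illustration. It assumes a fixed-length tabu list of recently visited states and a hypothetical one-dimensional cost table with a local minimum at index 2 and the global minimum at index 8; none of these names or values come from the cited references.

```python
from collections import deque


def tabu_search(cost, neighbors, start, tabu_size=5, iterations=100):
    """Move to the best non-tabu neighbor each step, even if it is uphill."""
    current = best = start
    tabu = deque([start], maxlen=tabu_size)  # recently visited states
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break  # every neighbor is tabu
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best


# Hypothetical toy landscape: local minimum at index 2, global minimum at index 8.
COSTS = [8, 6, 4, 5, 7, 9, 6, 3, 1, 2, 5]


def toy_cost(x):
    return COSTS[x]


def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(COSTS)]


best = tabu_search(toy_cost, toy_neighbors, start=0)
```

A plain descent from index 0 would stop at index 2; because the move back to index 1 is tabu, the search is forced uphill through indices 3 to 6 and then reaches index 8.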

【００４７】The group of H. Scheraga has been actively researching methods for global optimization. Potential function deformation and smoothing methods (Piela et al., J. Phys. Chem., vol. 93, p. 3339, 1989; Pillardy and Piela, J. Phys. Chem., vol. 99, p. 11805, 1993; Pillardy et al., J. Phys. Chem., vol. 103, p. 7353, 1999) deform the energy "landscape" of a biomolecule, making it possible to study the parts of the PES that are more relevant to finding the global minimum. However, the deformed surface contains too many "catchment basins", and even smoothing by the diffusion equation method does not guarantee that the lowest energy minimum will be isolated in multidimensional problems. Conformational space annealing (Lee et al., J. Comput. Chem., vol. 18, p. 1222, 1997) is likewise limited to a small number of variables. Conformational space annealing narrows the search from the entire conformational space to low-energy regions and starts the search from a "pool" of minimized conformations. These minimized conformations are later modified by randomly drawing variables from the "pool".

【００４８】The dead-end elimination (DEE) method is based on identifying solutions that are entirely incompatible with the global minimum (Desmet et al., Nature, vol. 356, p. 539, 1992; Lasters et al., J. Prot. Chem., vol. 16, p. 449, 1997). Solutions that cannot contribute to local energy minima of a certain order or higher are eliminated. The energy (cost) function must be written as a sum of terms, each of which is a function of at most two variables. A value x_i of the i-th variable cannot be consistent with the globally optimal solution if another value x'_i of the same variable can be found that satisfies the following condition.

【００４９】[Equation 2]

Repeating this process can eliminate enough solutions for the global minimum to be found (Goldstein, Biophys. J., vol. 66, p. 1335, 1994). Other methods combine discrete search strategies with continuous minimization in the final stage of refinement (Dunbrack and Karplus, Mol. Biol., vol. 230, p. 543, 1993; Vasquez, Biopolymers, vol. 36, p. 53, 1995). The best-known application of this algorithm is determining the conformations of protein side chains. If the DEE method cannot arrive at a unique structure, additional steps, such as a brute-force combinatorial search or an agglomerative approach, are required for the remaining conformations (Becker, Proteins, vol. 27, p. 213, 1997). DEE faces a serious practical problem: the above condition requires minimization over all possible variables, whereas DEE can minimize only one per criterion. A further drawback is that it cannot find a population of low-energy solutions.
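
The elimination condition itself (Equation 2) is not reproduced in this text. As a sketch only, the following code assumes the commonly cited form of the original dead-end elimination criterion: a value a of variable i is discarded if some alternative value c of the same variable has a worst-case energy that is still lower than the best-case energy of a. The names self_e, table and pair, and the random toy energies, are hypothetical.

```python
import itertools
import random


def dead_end_elimination(self_e, pair):
    """self_e[i][a] is the single-variable energy of value a of variable i;
    pair(i, a, j, b) is the pairwise energy. Returns surviving value sets."""
    n = len(self_e)
    alive = [set(range(len(values))) for values in self_e]
    changed = True
    while changed:                      # repeat until nothing more is removed
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for a in sorted(alive[i]):
                best_case_a = self_e[i][a] + sum(
                    min(pair(i, a, j, b) for b in alive[j]) for j in others)
                for c in alive[i] - {a}:
                    worst_case_c = self_e[i][c] + sum(
                        max(pair(i, c, j, b) for b in alive[j]) for j in others)
                    if best_case_a > worst_case_c:
                        alive[i].discard(a)   # a can never beat c: dead end
                        changed = True
                        break
    return alive


# Hypothetical toy problem: three variables with three values each.
rng = random.Random(1)
sizes = [3, 3, 3]
self_e = [[rng.uniform(-1, 1) for _ in range(s)] for s in sizes]
table = {(i, j): [[rng.uniform(-1, 1) for _ in range(sizes[j])]
                  for _ in range(sizes[i])]
         for i in range(3) for j in range(i + 1, 3)}


def pair(i, a, j, b):
    return table[(i, j)][a][b] if i < j else table[(j, i)][b][a]


def total_energy(combo):
    return (sum(self_e[i][v] for i, v in enumerate(combo))
            + sum(pair(i, combo[i], j, combo[j])
                  for i in range(3) for j in range(i + 1, 3)))


alive = dead_end_elimination(self_e, pair)
# Brute-force reference: the global minimum must survive the elimination.
global_min = min(itertools.product(*(range(s) for s in sizes)), key=total_energy)
```

Because the comparison is strict, a value belonging to the global minimum is never removed, but several values per variable may survive, in which case a further search over the remaining values is still needed, as the text notes.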

【００５０】Statistical methods (SM) use a model of the objective function to bias the selection of new sample points. These methods are justified by Bayesian arguments, which posit that the particular objective function being optimized comes from a class of functions modeled by a particular probability function (Mockus, J. Global Optim., vol. 4, p. 347, 1994). Information from previous samples of the objective function can be used to estimate parameters, and the refined model can then be used to bias the selection of points in the search domain. The problem in using SM is whether the statistical model is appropriate for the problem at hand. Moreover, the mathematical complexity makes it difficult to write computer code for high-dimensional optimization problems. SM often rely on dividing the search region into partitions, which limits these methods to problems of moderate dimensionality.

【００５１】Unfortunately, none of the attempted solutions described above provides an adequate answer to the specific problems, described above, that lie within the larger problem of protein structure prediction, let alone a more general solution for searching a combinatorial space.

【００５２】There is therefore a need for, and it would be useful to have, a solution for searching a combinatorial space that is efficient, rapid and easy to implement, and that is useful for various types of biological problems, such as problems within the larger problem of protein structure prediction.

【００５３】(Summary of the Invention) The present invention discloses a system and method for thoroughly searching a combinatorial space without causing a combinatorial explosion. At a more basic level, each combination is considered to be composed of variables, each of which can take at least one value. According to the present invention, each variable preferably takes one value from a set of discrete values, although each variable may instead take, for example, one value from a continuous range of values or from a function. These variables interact with one another in a manner that is known for each individual interaction. Preferably, each individual interaction can be described for a pair of variables, so that the interactions are pairwise. The search is preferably performed by sampling one value for each variable to obtain a combination, and this process is typically repeated many times. Each combination is evaluated by a quantitative measure. The quantitative measure is preferably a cost function, and in the process of determining which combinations best satisfy the cost function, the desired outcome with respect to the cost function is generally maximized, or at least increased. For example, if the cost function is an energy minimization function, combinations having a low energy cost or value are preferably selected.

【００５４】Next, the present invention attempts to determine which elements do not contribute to combinations that give at least some minimum desired value of the quantitative measure, and/or which elements contribute to combinations whose value of the quantitative measure falls below a certain cutoff or threshold for the desired value. In other words, these elements do not contribute to the "best" or most satisfactory combinations for the system. These elements are then preferably eliminated, or at least separated from the remaining candidate elements for building combinations.

【００５５】The process of eliminating variable values is preferably repeated until only a predetermined number of combinations, composed of elements that have not been eliminated and/or separated, remains. At this point, an exhaustive search is most preferably performed according to the quantitative measure and/or according to some other measurement parameter.

【００５６】The present invention accomplishes the above by examining each value of each basic element at least once, and preferably several times, during the search process. Each value can therefore be said to have been searched exhaustively, without any need to search exhaustively through all combinations of the values of the basic elements. The present invention thus combines the benefits of a non-exhaustive stochastic search process with those of an exhaustive search.

【００５７】According to the present invention, there is provided a method for thoroughly searching a combinatorial space characterized by a plurality of combinations, each combination being composed of at least one element, the steps of the method being performed by a data processor, the method comprising: (a) providing a quantitative parameter, measurable for each combination, for determining whether the result of searching the combinatorial space is successful; (b) dividing the combinations in the combinatorial space into populations, each population featuring at least one combination; (c) calculating a value of the quantitative parameter for at least one combination of each population; (d) determining the effect of each element on the value of the quantitative parameter; and (e) retaining at least one combination according to the effect, thereby providing the result of thoroughly searching the combinatorial space.

【００５８】Hereinafter, the term "amino acid" refers to both natural and synthetic molecules that are capable of forming a peptide bond with another such molecule.
The present invention is described by way of example with reference to the accompanying drawings.
(Description of the Preferred Embodiments) The present invention discloses a system and method for thoroughly searching a combinatorial space without causing a combinatorial explosion. The search is performed according to at least one desired property of the combinations which, for the various combinations of basic elements, can be converted into a quantitative measure of the success of the search. Next, the present invention attempts to determine which elements do not contribute to combinations that give at least some minimum desired value of the quantitative measure, and/or which elements contribute to combinations whose value of the quantitative measure falls below a certain cutoff or threshold for the desired value. Preferably, the elements selected are those that contribute only to combinations that fail to satisfy the minimum threshold for the desired value. In other words, these elements do not contribute to the "best" or most satisfactory combinations for the system. These elements are then preferably eliminated, or at least separated from the remaining candidate elements for building combinations.

【００５９】The process of sorting out the elements is preferably repeated until only a predetermined number of combinations, composed of elements that have not been eliminated and/or separated, remains. Such a predetermined number may optionally be, and more preferably is, an actual numerical value for the total number of combinations; alternatively, it may be a threshold for the minimum desired value of the quantitative measure that a combination must satisfy in order to be included among the remaining combinations. At this point, an exhaustive search is most preferably performed according to the quantitative measure and/or according to some other measurement parameter.

【００６０】Because the number of combinations is enormous, preferably a sample of the combinations is examined for the effect of the elements on the quantitative measure used to evaluate the combinations. Elements of combinations that consistently maximize and/or enhance the quantitative measure are retained, and the other elements are eliminated. This process is repeated until a certain maximum number of combinations is found, after which these combinations may also be evaluated according to similar parameters and/or some other parameter or characteristic. Such a group of combinations may optionally be viewed as a population of combinations having a particular minimum value of the quantitative measure.

【００６１】At a more basic level, each combination is considered to be composed of variables, each of which takes at least one value. According to the present invention, each variable preferably takes one value from a set of discrete values, although each variable may instead take, for example, one value from a continuous range of values or from a function. These variables interact with one another in a manner that is known for each individual interaction. Preferably, each individual interaction can be described for a pair of variables, so that the interactions are pairwise. The quantitative measure for a combination of variables is preferably a cost function, and in the process of determining which combinations best satisfy the cost function, the desired outcome with respect to the cost function is generally maximized, or at least increased. For example, if the cost function is an energy minimization function, combinations having a low energy cost or value are preferably selected.

【００６２】Various properties can optionally be used to construct a cost function for searching within the combinatorial space. For example, to search among various protein structures according to an amino acid sequence, the cost function can optionally be minimization of the energy of the combination, so that the selected structure represents an energy minimum or lies near one. Such an energy cost function is also useful for more specific, or "sub", problems within the larger problem of protein structure prediction. For example, minimizing the energy associated with the predicted positions of polar protons and of amino acid side chains provides a useful quantitative parameter for these types of combinatorial search. Note, however, that in this case the desired maximization of the quantitative parameter is actually achieved by minimizing the value of the energy calculated for the combination.

【００６３】However, virtually any cost function can optionally be used with the method of the present invention. The cost function need not be related to a biological problem; it could instead be related to other types of problems, such as optimizing a cost function with respect to monetary value (a literal economic "cost").

【００６４】The present invention accomplishes the above by examining each value of each basic element at least once, and preferably several times, during the search process. Each value can therefore be said to have been searched exhaustively, without any need to search exhaustively through all combinations of the values of the basic elements. The present invention thus combines the benefits of a non-exhaustive stochastic search process with those of an exhaustive search.

【００６５】As noted above, after the present invention has been carried out, an additional exhaustive search may optionally be performed, for example to identify the absolute minimum rather than only a plurality of local minima. Such an additional exhaustive search is particularly preferred when the initial search process according to the present invention includes a stochastic search and/or comparison component, as in preferred embodiments of the invention. The present invention is clearly distinguished from the background art search methods in a number of respects. First, the present invention is not based on any method known in the art, nor is it a modification of any such method. Second, unlike other stochastic search methods, which cannot guarantee that every individual value in the combinatorial search space is tested, in the present invention every individual value of every variable in the combinatorial search space must be tested to determine whether it should be eliminated from the search space. Third, the present invention can also optionally and preferably obtain a population of local minima in addition to the global minimum. As described in detail below, and as demonstrated below by comparing the results of the present invention with those obtained by a full exhaustive search alone, the present invention can achieve these aims through a stochastic search while providing the effectiveness of an exhaustive search.

【００６６】The principles and operation of the present invention may be better understood with reference to the drawings and the accompanying description, which is divided into several sections. The first part of the description (in this section) centers on an exemplary general method according to the present invention and a basic exemplary system for carrying out that method. Each subsequent section describes a specific biological problem and is titled with the type of problem it addresses. These sections describe examples of suitable implementations and applications of the invention and are not intended to be limiting in any way.

【００６７】Referring now to the drawings, FIG. 1 is a flowchart of an exemplary but preferred general method for thoroughly searching a combinatorial space according to the present invention. As shown, in step 1 a combinatorial space is provided. Such a combinatorial space features a plurality of combinations of basic elements. The combinatorial space is optionally constructed, for example by creating a plurality of structures having the basic elements according to some pattern, plan and/or scheme. Alternatively, the combinatorial space may optionally be predefined. For example, for a biological problem, the combinatorial space may already be determined by the type of biological structure to be analyzed.

【００６８】In step 2, according to a preferred embodiment of the present invention, each combination is optionally and preferably constructed from variables, each of which can take at least one value. According to the present invention, each variable preferably takes one value from a set of discrete values, although each variable may instead take, for example, one value from a continuous range of values or from a function. These variables interact with one another in a manner that is known for each individual interaction. Preferably, each individual interaction can be described for a pair of variables, so that the interactions are pairwise.
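
As an illustration of this representation, the sketch below stores a combination as one chosen value per variable and evaluates a cost made of single-variable terms plus pairwise interaction terms. The names total_cost, self_cost and pair_cost, and the toy numbers, are hypothetical and serve only to show the pairwise structure.

```python
from itertools import combinations


def total_cost(assignment, self_cost, pair_cost):
    """Cost of one combination: per-variable terms plus all pairwise terms.

    assignment[i] is the value chosen for variable i,
    self_cost[i][v] is the cost of value v of variable i, and
    pair_cost(i, a, j, b) is the interaction between variables i and j.
    """
    cost = sum(self_cost[i][v] for i, v in enumerate(assignment))
    cost += sum(pair_cost(i, assignment[i], j, assignment[j])
                for i, j in combinations(range(len(assignment)), 2))
    return cost


# Hypothetical toy problem: three variables, two discrete values each.
self_cost = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]


def clash_penalty(i, a, j, b):
    # Assumed pairwise term: two variables taking the same value cost 1.0.
    return 1.0 if a == b else 0.0


cost = total_cost((0, 1, 0), self_cost, clash_penalty)
```

For the assignment (0, 1, 0) all three single-variable terms are zero and only variables 0 and 2 share a value, so the total is 1.0.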

【００６９】In step 3, a quantitative parameter is determined, according to which the success of the search is measured. The quantitative parameter must be measurable for each combination in the combinatorial space. Typically, the quantitative parameter is calculated according to the basic elements of each combination, and the effects of structural features and/or interactions may optionally also be taken into account in this measure. For biological problems such as protein structure prediction, the type of quantitative parameter suited to a particular problem may already be known. For example, for predicting the positions of polar protons within a protein, the best quantitative parameter is preferably the minimization of the energy of the combination, determined according to equations that are well known in the art and are described in more detail below with regard to Section 1.

【００７０】The quantitative measure for a combination of variables is preferably a cost function, and in the process of determining which combinations best satisfy the cost function, the desired outcome with respect to the cost function is generally maximized, or at least increased. For example, if the cost function is an energy minimization function, combinations having a lower energy cost or value are preferably selected.

【００７１】In step 4, the contribution of each element or variable is assessed, and the effect of the value of a particular element or variable in each combination on the quantitative parameter or cost function is determined. Such an effect is preferably determined from both the values of the variables and the interactions between the variables, as assessed by the cost function.

【００７２】The preferred effect is consistent maximization of the cost function. Consistent maximization is optionally measured according to the distribution of cost function values over a large group of combinations, or "configurations", of the full set of variables. According to a preferred embodiment of the present invention, particularly where a large number of variables is involved, the effect of these different values is preferably determined by stochastic analysis, because an exhaustive analysis would be orders of magnitude less efficient and more time-consuming. The stochastic analysis is preferably performed by randomly selecting a value for each variable in order to form a combination, and more preferably a plurality of different combinations. Most preferably, a predetermined number of such combinations is formed as part of the sampling process. The result or value of the cost function for each combination is then calculated according to both the values of the variables and the interactions between those variables.
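
The sampling described above may be sketched as follows, purely for illustration. The sketch draws a fixed number of random combinations, scores each with a cost function, and records, for every value of every variable, the costs of the sampled combinations in which it appeared. The function names, the toy domains and the toy cost are hypothetical.

```python
import random


def sample_combinations(domains, cost, n_samples, seed=0):
    """Draw random combinations (one value per variable) and score each one."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        combo = tuple(rng.choice(values) for values in domains)
        samples.append((cost(combo), combo))
    samples.sort(key=lambda s: s[0])  # lowest cost first
    return samples


def costs_per_value(samples):
    """Map (variable index, value) to the costs of the samples containing it."""
    seen = {}
    for c, combo in samples:
        for i, v in enumerate(combo):
            seen.setdefault((i, v), []).append(c)
    return seen


# Hypothetical toy problem: three variables with small discrete domains.
domains = [(0, 1, 2), (0, 1), (0, 1, 2, 3)]


def toy_cost(combo):
    # Per-value term plus one pairwise term (penalty if variables 0 and 1 agree).
    return float(sum(combo)) + (2.0 if combo[0] == combo[1] else 0.0)


samples = sample_combinations(domains, toy_cost, n_samples=200)
effects = costs_per_value(samples)
```

The per-value cost lists give the distribution against which the consistency of each value's contribution can be judged; here an energy-like cost is minimized, which corresponds to maximizing the desired outcome.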

【００７３】In step 5, optionally and preferably, values of elements or variables that do not contribute to consistent maximization of the desired outcome of the cost function, as described above, are eliminated. More preferably, values of variables are eliminated if they contribute only to less desirable outcomes of the cost function, or to outcomes below a certain minimum threshold, and do not contribute to outcomes that exceed a certain threshold for the desired outcome. For example, for a cost function involving energy minimization, values of variables that are found only in combinations whose energy cost is above a certain threshold (a less desirable outcome), and that are not found in combinations whose energy cost is below another, low-energy threshold (a more desirable outcome), are preferably eliminated. Alternatively, rather than being eliminated, these values may optionally be "flagged" and/or set aside for further analysis.
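
One possible reading of this elimination rule is sketched below for illustration. It assumes that scored samples are available as (cost, combination) pairs, that two cutoffs are supplied by the caller, and that values seen only in the middle band between the cutoffs are left untouched; the function name and the toy data are hypothetical.

```python
def values_to_eliminate(samples, low_cut, high_cut):
    """Return (variable index, value) pairs seen in high-cost combinations
    (cost >= high_cut) but never in low-cost ones (cost <= low_cut)."""
    in_low, in_high = set(), set()
    for cost, combo in samples:
        keys = {(i, v) for i, v in enumerate(combo)}
        if cost <= low_cut:
            in_low |= keys
        elif cost >= high_cut:
            in_high |= keys
    return in_high - in_low


# Hypothetical scored samples for two variables with values 0 and 1.
toy_samples = [
    (1.0, (0, 0)),
    (2.0, (0, 1)),
    (9.0, (1, 1)),
    (10.0, (1, 0)),
]
doomed = values_to_eliminate(toy_samples, low_cut=2.0, high_cut=9.0)
```

In the toy data, value 1 of variable 0 appears only in the two high-cost combinations, so it is the only value selected; both values of variable 1 also appear in low-cost combinations and are kept. Instead of deleting the returned values, a caller could equally flag them for later analysis.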

【００７４】In step 6, if the total number of combinations has reached a certain minimum, these combinations are optionally and more preferably analyzed further according to the cost function and/or some other parameter to determine the result of the combinatorial search. Optionally, an exhaustive search can also be performed among this minimum number of combinations to find the combinations of interest, as assessed according to the cost function and/or some other parameter of interest. Such a group of combinations can optionally be viewed as a population of combinations having a particular minimum value for the desired outcome of the quantitative measure.

【００７５】Otherwise, steps 4 and 5 are preferably repeated until the number of combinations reaches the above minimum number. The "minimum number" optionally refers to an absolute number of combinations, and/or to a minimum value of the cost function that such combinations must satisfy as a "cutoff" threshold.
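
For illustration only, the repetition of the sampling and elimination steps followed by a final exhaustive search may be sketched as below. The sketch assumes that the stopping criterion is an absolute number of remaining combinations, that the low-cost and high-cost groups are fixed fractions of each sample, and that a cap on the number of rounds is acceptable; all names, defaults and the toy cost are hypothetical and are not the parameters of the disclosed method.

```python
import itertools
import math
import random


def iterative_elimination(domains, cost, max_exhaustive=200, n_samples=200,
                          frac=0.2, max_rounds=50, seed=0):
    """Sample, eliminate values seen only among the worst samples, repeat,
    then search the surviving combinations exhaustively (lower cost is better)."""
    rng = random.Random(seed)
    domains = [list(d) for d in domains]
    k = max(1, int(frac * n_samples))
    for _ in range(max_rounds):
        if math.prod(len(d) for d in domains) <= max_exhaustive:
            break  # few enough combinations remain
        samples = []
        for _ in range(n_samples):
            combo = tuple(rng.choice(d) for d in domains)
            samples.append((cost(combo), combo))
        samples.sort(key=lambda s: s[0])
        low = {(i, v) for _, c in samples[:k] for i, v in enumerate(c)}
        high = {(i, v) for _, c in samples[-k:] for i, v in enumerate(c)}
        for i, v in high - low:
            if len(domains[i]) > 1:
                domains[i].remove(v)
    # Exhaustive search over whatever survived the elimination rounds.
    return min(((cost(c), c) for c in itertools.product(*domains)),
               key=lambda s: s[0])


# Hypothetical toy problem: five variables with six values each (7776 combinations).
toy_domains = [range(6)] * 5


def toy_cost(combo):
    single = sum((v - 3) ** 2 for v in combo)
    pairwise = sum(0.5 * abs(a - b) for a, b in zip(combo, combo[1:]))
    return single + pairwise


best_cost, best_combo = iterative_elimination(toy_domains, toy_cost)
```

Every variable keeps at least one value, because each low-cost sample contributes one value per variable to the protected set. If elimination stalls, the round cap ends the loop and the final exhaustive search runs over a larger remainder, which is acceptable for a toy problem but would need a different policy at scale.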

【００７６】FIG. 2 shows an exemplary system according to the present invention for carrying out the method of FIG. 1. As shown, system 10 features a computing device 12. In this embodiment, computing device 12 operates a number of functional modules that together carry out the method of FIG. 1. These functional modules are optionally and preferably implemented as software modules, but may alternatively be implemented as hardware, firmware or a combination thereof.

【００７７】As shown, one such module is a combination storage module 14, which holds the combinations currently under consideration in each population. A quantitative parameter calculation module 16 then calculates the value of the quantitative parameter for at least one combination in each population from combination storage module 14. An evaluation module 18 creates a sample of a plurality of combinations from the elements of the combinations and evaluates the effect of each element on the value of the quantitative parameter for its combination, preferably so that an element is retained if it consistently contributes to a maximized value of the quantitative parameter. These modules preferably interact until combination storage module 14 holds a certain minimum number of combinations, which represent the result of the search in the combinatorial space.

【００７８】The foregoing is a general description of the method and system of the present invention. The following sections describe specific model systems, each a concrete problem for which the present invention can provide a solution. These sections describe exhaustively searching the combinatorial space to locate polar protons (Section 1), locating amino acid side chains in proteins (Section 2), predicting loop structures in proteins (Section 3), and other miscellaneous biological problems solved by the present invention (Section 4).

【００７９】Section 1: Locating Polar Protons
The present invention is useful for solving the problem of accurately locating polar protons within biological molecules (for example, protein molecules or DNA). Locating such polar protons in turn determines the positions of hydrogen bonds within the biological molecule itself, or between the biological molecule and another molecule. This particular implementation of the present invention therefore solves an important scientific problem.

【００８０】The particular implementation of the invention described under "Methods" in this section was also tested against other methods known in the art, as described under "Results". Note that these methods and results are given for illustration only and are not intended to be limiting in any way. The interpretation of these results is then given under "Discussion".

【００８１】Methods
For testing, the method of the present invention was implemented as a computer software program written in C++. The program operates as shown in the flowchart of FIG. 3. In step 1, the program optionally reads a file in Protein Data Bank coordinate format (a PDB file) or receives its input from another source. The program uses auxiliary ASCII files that serve as a database for parameterizing the atoms of the system. These files contain the connectivity of all atoms, the atomic charges, the A and B parameters of the Lennard-Jones function, and the bond lengths between hydrogens and heavy atoms. The user can easily add, delete, and modify residue types by editing these files. In step 2, the atoms are parameterized with these values, which are either read from the files or supplied from another source.

【００８２】In step 3, the hydrogens and lone pairs to be added are divided into three categories: (1) trivial hydrogens, whose positions can be determined from the heavy atom coordinates and the bond hybridization (for example, aliphatic and aromatic hydrogens); (2) nontrivial hydrogens, which are polar hydrogens with a rotatable degree of freedom (for example, the hydroxyls of serine, threonine, and tyrosine); and (3) nontrivial lone pairs, which have the same geometric properties as nontrivial hydrogens.

【００８３】In step 4, the trivial hydrogens are added first. Their coordinates are calculated from the heavy atom coordinates, the bond lengths and angles from the database, and standard dihedral angles.

【００８４】In step 5, the nontrivial hydrogens and nontrivial lone pairs are divided into ensembles; their coordinates are not yet calculated. An ensemble is defined as a collection of nontrivial hydrogens or nontrivial lone pairs that interact among themselves. The ensemble cutoff is defined by the user. The user can assign a large ensemble cutoff value so that the system is handled as one large ensemble. Because the nontrivial atoms have not yet been located, the ensemble cutoff is measured from the coordinates of the heavy atoms to which the nontrivial atoms are bonded. An ensemble consists of "segments". Each segment involves rotation about a bond connecting two heavy atoms, one of which is bonded to a polar proton. Each segment can adopt various positions in space that satisfy the hydrogen-bonding conditions.

【００８５】In addition to the ensemble cutoff, two other cutoff conditions are optionally and preferably used. The first is an energy cutoff in the usual sense of nonbonded energy calculations; the default is no cutoff. The second is used to locate hydrogen-bonding partners around a rotatable segment (see below). It may be larger or smaller than the ensemble cutoff, but it should always be > 3 Å, so that all nearby hydrogen-bonding partners are included and there is no risk of losing a solution for the segment. If this cutoff exceeds 4.5 Å, many unrealistic and unnecessary partners are generated and the time needed to search for a solution increases. The ensemble cutoff is used to build the collection of related heavy atoms (hydroxyl oxygens, water oxygens, NH₃⁺, amines, and so on) whose relationships must be solved jointly for all members. Thus, if this cutoff is 4 Å, the distances for the pairs atom A and atom B, and atom A and atom C, may each be less than 4 Å while $R_{B,C}$ may be > 4 Å, and all three atoms are nevertheless part of the same ensemble.
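The chaining behaviour in the A, B, C example (A lies within the cutoff of both B and C, B and C are farther apart, yet all three share one ensemble) corresponds to what is commonly called single-linkage grouping. A minimal Python sketch follows, assuming heavy atoms are given as coordinate tuples; the helper name `group_into_ensembles` and the union-find bookkeeping are choices made for this illustration and are not taken from the program described here.

```python
import math

def group_into_ensembles(coords, cutoff):
    """Group atom indices so that any two atoms closer than `cutoff` end up
    in the same ensemble, directly or through a chain of neighbours."""
    parent = list(range(len(coords)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                parent[find(i)] = find(j)   # merge the two groups

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Atoms A, B, C, D: A-B and A-C are 3.5 apart, B-C is 7.0 apart, D is far away.
atoms = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (-3.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
ensembles = group_into_ensembles(atoms, cutoff=4.0)
print(ensembles)  # [[0, 1, 2], [3]]
```

Passing a very large cutoff merges everything into a single group, which matches the option of treating the whole system as one large ensemble.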

【００８６】Preferably, each ensemble is processed separately. To calculate the coordinates of the nontrivial hydrogens and nontrivial lone pairs, a two-dimensional (2D) matrix is formed in step 6. This 2D matrix is a list of all the hydrogen bonds that can be formed between donors and acceptors. The larger the hydrogen-bond cutoff, the more options there are for forming hydrogen-bond connections, and the larger the 2D matrix of alternative interactions becomes.
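How the list of alternative interactions grows with the hydrogen-bond cutoff can be sketched as follows. This is an illustration under stated assumptions: donors and acceptors are represented by the positions of their heavy atoms, the function name `candidate_hbonds`, the atom labels, and the coordinates are hypothetical, and no angular criterion is applied.

```python
import math

def candidate_hbonds(donors, acceptors, hbond_cutoff):
    """List every donor->acceptor pair whose heavy atoms lie within the cutoff.

    donors, acceptors : dicts mapping a label to an (x, y, z) heavy-atom position
    """
    pairs = []
    for d_label, d_xyz in donors.items():
        for a_label, a_xyz in acceptors.items():
            if d_label != a_label and math.dist(d_xyz, a_xyz) <= hbond_cutoff:
                pairs.append((d_label, a_label))
    return pairs

donors = {"SER_OG": (0.0, 0.0, 0.0)}
acceptors = {"O1": (2.8, 0.0, 0.0), "O2": (0.0, 3.4, 0.0), "O3": (0.0, 0.0, 4.3)}

tight = candidate_hbonds(donors, acceptors, 3.0)   # one alternative
loose = candidate_hbonds(donors, acceptors, 4.5)   # three alternatives
```

With the larger cutoff the same donor acquires three alternatives instead of one, which is the growth of the 2D matrix referred to above.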

【００８７】As an example, the ensemble shown in FIG. 4 contains just two carbonyls (1, 2), one amide, and one hydroxyl, which together form a single ensemble. The hydroxyl contributes one nontrivial hydrogen (3) and two nontrivial lone pairs (4, 5), and the amide contributes one trivial hydrogen (6). A segment is defined as the collection of nontrivial hydrogens and nontrivial lone pairs bonded to one heavy atom. For example, atom 3 and lone pairs 4 and 5 are attached to the same oxygen and therefore form one segment. Assuming that the hydroxyl hydrogen (3) can form a hydrogen bond with either carbonyl and that the hydroxyl lone pairs (4, 5) can form a hydrogen bond with the N-H, the complete 2D matrix has the form shown in FIG. 5A. The two lone pairs are degenerate with respect to forming a hydrogen bond to the amide, so one of them (4->6 or 5->6) can be omitted when forming the first alternative combinations of the 2D matrix. The omitted lone pair is added automatically after the hydrogen and the first lone pair have been located. The initial 2D matrix therefore has the form shown in FIG. 5B. A module then refines the 2D matrix: positions that produce high energy values (bumps) are deleted. The energy threshold is defined by the user, and the nonbonded energy expression is used.

【００８８】Using the refined 2D matrix, a three-dimensional (3D) matrix is formed in step 7. Here every combination in the ensemble is uniquely defined; that is, in any given combination each nontrivial (rotatable) hydrogen and each nontrivial lone pair has exactly one choice. In this example, the 3D matrix has the form shown in FIG. 5C. Each pair of lines constitutes one contribution. Each combination is evaluated, and the best combination is the result for the ensemble. If there is more than one ensemble, this process is repeated for each ensemble.
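Expanding the per-item alternatives into explicit combinations amounts to a Cartesian product. The Python sketch below uses the FIG. 4 example as the text describes it (the figures themselves are not reproduced here), with the entry 5->6 already omitted as degenerate. The dictionary layout and the function names are assumptions made for illustration.

```python
import itertools

# Initial 2D matrix as described for FIG. 5B: one list of alternatives
# per rotatable item (hydrogen 3 and lone pair 4).
alternatives = {
    3: ["3->1", "3->2"],   # hydroxyl hydrogen to either carbonyl
    4: ["4->6"],           # lone pair to the amide hydrogen
}

def enumerate_combinations(alternatives):
    """Yield every combination with exactly one choice per rotatable item."""
    keys = sorted(alternatives)
    for choice in itertools.product(*(alternatives[k] for k in keys)):
        yield choice

def best_combination(alternatives, energy):
    """Exhaustively evaluate all combinations and return the lowest-energy one."""
    return min(enumerate_combinations(alternatives), key=energy)

combos = list(enumerate_combinations(alternatives))
print(combos)  # [('3->1', '4->6'), ('3->2', '4->6')]
```

The number of combinations is the product of the list lengths, which is why this exhaustive expansion is practical only for small ensembles.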

【００８９】The energy criterion used to assess the quality of each combination is the pairwise "nonbonded" energy function:

【００９０】

【数３】[Equation 3]

where $A_{i,j}$ is the repulsion parameter for the two atoms $(i, j)$, $B_{i,j}$ is their polarizability (attraction) parameter, $q_i$ is the partial charge, and $r_{ij}$ is the distance between the atoms. $\varepsilon$ is the dielectric constant, chosen to be 4 in the tests of the algorithm. The code is flexible, and the force field can easily be modified in any way desired.
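The equation image itself is not reproduced in this text. Given the parameters just defined and the earlier reference to the A and B parameters of the Lennard-Jones function, one form consistent with them is a 12-6 Lennard-Jones term plus a Coulomb term, $E = \sum_{i<j} \left[ A_{i,j}/r_{ij}^{12} - B_{i,j}/r_{ij}^{6} + k\,q_i q_j/(\varepsilon r_{ij}) \right]$. The Python sketch below implements that assumed form; the exponents, the unit conversion constant $k$, and the function name `nonbonded_energy` are assumptions, and the original expression may differ.

```python
import math

def nonbonded_energy(coords, charges, A, B, epsilon=4.0, coulomb_constant=332.06):
    """Pairwise nonbonded energy, ASSUMING a 12-6 Lennard-Jones plus Coulomb form:

        E = sum_{i<j} A[i][j]/r**12 - B[i][j]/r**6 + k*q_i*q_j/(epsilon*r)

    coulomb_constant (k) is a commonly used factor for converting
    e^2/Angstrom to kcal/mol; it is an assumption, not a value from the text.
    """
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += A[i][j] / r**12 - B[i][j] / r**6
            total += coulomb_constant * charges[i] * charges[j] / (epsilon * r)
    return total

# Two atoms one unit apart, with made-up parameters and k set to 1 for clarity.
e = nonbonded_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, -1.0],
                     A=[[0.0, 1.0], [1.0, 0.0]], B=[[0.0, 2.0], [2.0, 0.0]],
                     epsilon=4.0, coulomb_constant=1.0)
print(e)  # -1.25
```

Because no distance cutoff appears in the loop, every pair contributes, in line with the recommendation in the text to avoid a cutoff so that long-range electrostatics are included.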

【００９１】The energy calculation extends beyond the "boundaries" of each ensemble. The cutoff distance for calculating $E(r_{i,j})$ is defined by the user, but it is recommended to avoid a cutoff so that long-range electrostatic interactions are accounted for. The main problem with the ensemble approach is calculating the interactions between nontrivial atoms in one ensemble and nontrivial atoms in another. In this case, the coordinates of a nontrivial hydrogen in the second ensemble, whose position has not yet been determined, are assumed to coincide with the coordinates of the heavy atom to which it is bonded. This is a "unified atom" approximation, justified by the relatively long distance between the known position of one atom and the still-undetermined position of the other. The user can, however, avoid this approximation by forcing the program to treat the system as one giant ensemble, in which case all nontrivial hydrogens and nontrivial lone pairs are added simultaneously at their exact positions.

【００９２】For a large biological system forming a single ensemble, a very large combinatorial problem clearly arises. For example, RNase A (5RSA) has $1.76 \times 10^{59}$ alternative combinations for all of its rotatable hydrogens. Any attempt to build the 3D matrix from the 2D matrix exceeds the capacity of a computer. To reduce the size of this problem, a unique stochastic approach was developed. The algorithm switches from exhaustive calculation of an ensemble to stochastic calculation once the number of combinations exceeds a user-defined threshold. In such an ensemble, the positions in $d_0$ segments are unknown. Usually each nontrivial hydrogen or nontrivial lone pair has more than one possible position, but only one position gives the minimum energy. Nontrivial hydrogens and nontrivial lone pairs influence one another: once the position of a lone pair is determined, it determines the position of the hydrogen bonded to the same heavy atom, and vice versa.

【００９３】Let $X = (X_1, X_2, \ldots, X_{d_0})$ be a configuration of the $d_0$ segments in an ensemble. For each configuration $X$, the energy $E = E(X)$ is calculated according to the energy function above. The goal is to find the configuration that minimizes $E$. Because the number of combinations is too large for all alternative configurations to be evaluated, the combinations are evaluated as follows after the preceding steps, as one example of how to carry out step 8.

【００９４】1. Randomly sample $n$ configurations from the large population of combinations: $X_1 = (X_{11}, X_{12}, \ldots, X_{1d_0}), \ldots, X_n = (X_{n1}, X_{n2}, \ldots, X_{nd_0})$, where $X_{11}$ is the first randomly selected conformation of the first segment and $X_{n1}$ is the $n$-th randomly selected conformation of that segment. FIG. 6A shows the first three configurations and the $n$-th configuration sampled from the 2D matrix. Calculate the corresponding energy values, for configuration $X_1$:

【００９５】

【数４】[Equation 4]

and for configuration $X_n$:

【００９６】

【数５】[Equation 5]

2. Construct the distribution $F^n_E$ ($n \approx 10^3$). $F^n_E$ is the set of energies corresponding to the $n$ sampled configurations of the whole protein. Define cutoff points $H$ and $L$ in $F^n_E$: $H$ contains all configurations satisfying $E_i > F^n_E(1-\alpha)$, where $F^n_E(\alpha)$ is the $\alpha$-th percentile of $F^n_E$, whereas $L$ contains all configurations satisfying $E_i < F^n_E(\alpha)$. The number of configurations in each of $H$ and $L$ is $n_0 = n \times \alpha$. With $n = 1000$ configurations and $\alpha = 1\%$ for the highest-energy and lowest-energy configurations, $n_0 = \alpha \times n = 0.01 \times 1000 = 10$, so $L$ and $H$ each contain 10 configurations. In other words, $H$ represents the 10 highest-energy systems (FIG. 6B), whereas $L$ represents the 10 systems with the lowest energy.

【００９７】3. Construct a vector $h$ of the positions in the configurations whose energies fall in $H$. The vector $h$ is the element-wise intersection of all configurations in $H$: if all configurations in $H$ share the same value at component $j$ (corresponding to $X_{nj}$ of configuration $X_n$), say 5->1, then $h_j$ = 5->1; otherwise $h_j = 0$ (no position of segment $j$ is common to all high-energy configurations). For example, in FIG. 6B all configurations in $H$ share the values 5->1 and 23->34, so these values are part of the vector $h$ = (5->1, 23->34, ..., 0). This is the vector constructed for the $n \times \alpha$ high-energy configurations of the $d_0$ segments; it shows that the value 5->1 in segment 1 and the value 23->34 in segment 2 appear in all high-energy configurations (FIG. 6C). No common position was found for the last segment, $d_0$, in the high-energy region.

【００９８】4. Construct a vector $l$ of the positions in the configurations whose energies fall in $L$. The vector $l$ is the union of all configurations in $L$, as shown in FIG. 6D. Unlike in $h$, more than one value can appear in each segment of $l$.

【００９９】5. Compare $h$ and $l$. If $h_j$ and $l_j$ share the same value at component $j$, that value also contributes to low energy values and is kept as a viable position for the segment. If, however, $h_j \neq l_j$, the corresponding segment component $h_j$ is removed from subsequent iterations. Note that in a segment of size 1, $h_j$ is the only available solution and is therefore not removed from subsequent iterations. FIG. 6E shows that the value 5->1 is removed from further calculations because it is present only in the high-energy vector $h$, whereas the value 23->34 in segment 2 is not removed because it is also present in the vector $l$. The new 2D matrix does not contain the pair 5->1, as shown in FIG. 6F. To avoid configurations with very high energies that could distort the results, the number of configurations $n$ and the percentile value $\alpha$ were chosen according to statistical formulas that deal specifically with the probabilities of justified and unjustified removal of positions from a large set of combinations. The number of wrongly excluded cases can be minimized by increasing $\alpha$ and $n$. The expected number of correctly excluded cases then also decreases, but with a smaller slope. The values $n = 500$ and $\alpha = 0.008$ were selected as a suitable compromise (step 8 in FIG. 3).

【０１００】6. Repeat steps 1 to 4 on the reduced position space until the number of possible configurations falls below a user-defined threshold (step 9 in FIG. 3).
7. To find the best configuration, calculate $E$ for all remaining configurations (exhaustive search; step 10 in FIG. 3).
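Steps 1 to 7 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the C++ program described in the text: each segment is a list of allowed position labels, $H$ and $L$ are taken as the $n_0$ highest-ranked and lowest-ranked samples rather than by an explicit percentile function, and the names `stochastic_reduce`, `options`, `seed`, and `max_iter` are hypothetical. The toy energy at the end is made up purely to exercise the code.

```python
import itertools
import math
import random

def stochastic_reduce(options, energy, n=1000, alpha=0.01,
                      threshold=3000, seed=0, max_iter=1000):
    """Prune per-segment position lists, then search the remainder exhaustively.

    options   : list of sequences; options[j] holds the allowed positions of segment j
    energy    : callable mapping a configuration (tuple) to a number
    n, alpha  : sample size and tail fraction (n0 = n * alpha samples per tail)
    threshold : switch to exhaustive search at or below this many combinations
    """
    options = [list(o) for o in options]
    rng = random.Random(seed)
    n0 = max(1, round(n * alpha))
    for _ in range(max_iter):
        if math.prod(len(o) for o in options) <= threshold:
            break
        # Steps 1-2: draw n random configurations and rank them by energy.
        samples = sorted(
            (tuple(rng.choice(o) for o in options) for _ in range(n)),
            key=energy)
        low, high = samples[:n0], samples[-n0:]
        for j, opts in enumerate(options):
            if len(opts) == 1:
                continue                      # the only available position stays
            shared = {x[j] for x in high}     # step 3: h_j exists only if unanimous
            if len(shared) != 1:
                continue
            h_j = next(iter(shared))
            l_j = {x[j] for x in low}         # step 4: union over the L set
            if h_j not in l_j:                # step 5: seen only at high energy
                opts.remove(h_j)
    else:
        raise RuntimeError("position space did not shrink below the threshold")
    # Step 7: exhaustive search over what is left.
    best = min(itertools.product(*options), key=energy)
    return best, options

# Toy usage: 4 segments with 6 positions each and a made-up additive energy.
weights = (1000, 100, 10, 1)

def toy_energy(x):
    return sum(w * v for w, v in zip(weights, x))

best, remaining = stochastic_reduce([range(6)] * 4, toy_energy, threshold=16)
```

A position is discarded only when it is common to every high-energy sample and absent from every low-energy sample, so the pruning is deliberately conservative; the price is that many iterations may be needed before the exhaustive search becomes feasible.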

【０１０１】Results
The algorithm was tested on five high-resolution crystal structures from the Brookhaven Protein Data Bank (Bernstein et al., J. Mol. Biol. 1997; 112: 535-542): bovine pancreatic trypsin inhibitor (5PTI), RNase A (5RSA), trypsin (1NTP), and carbonmonoxy myoglobin (2MB5), for which neutron diffraction coordinates of the proton positions are available, and phosphate-binding protein (1IXH), for which an ultra-high-resolution X-ray structure has been reported. All hydrogen atoms were removed from the PDB files, and the algorithm was run to reconstruct the hydrogen positions on the assumption that the hydrogens occupy optimal positions in the crystal.

【０１０２】Each system was processed with the following two variations of the method.
1. Combined "ensemble-stochastic approach": each system is divided into ensembles, and each ensemble is processed separately. All possible combinations in an ensemble are evaluated, and the combination with the minimum energy is the result. For ensembles with a very large number of combinations, the "stochastic approach" was invoked to reduce the number of combinations to one that can be evaluated exhaustively. The advantage of this approach is the short CPU time required for the calculation. As an example, the calculation for 3INS and its water layer by this method is interactive on a Silicon Graphics R10000 machine and takes about 4 minutes. However, as described above, this approach requires an approximation of the distances between nontrivial hydrogens and nontrivial lone pairs in different ensembles, so some accuracy is lost.

【０１０３】2. Pure "stochastic approach": the program is forced to treat the system as one giant ensemble. The algorithm reduces the number of combinations to one that can be evaluated exhaustively. All hydrogens in the protein are added simultaneously in each combination, so no approximation is applied during energy evaluation. This matters when the energy differences between a small number of combinations are slight and the accumulation of many long-range electrostatic interactions can make an important contribution to the final result. This approach is more demanding of CPU time: the calculation for the same system takes about 15 minutes on a Silicon Graphics R10000 machine.

【０１０４】A test was devised to determine whether, given a system and an energy function, the pure "stochastic approach" can find the global energy minimum among a large number of possible combinations. To overcome the time limitation of this kind of calculation, a virtual protein was constructed. As shown in FIG. 7, this protein has 1186 amino acids, of which 13 are serines (shown as CPK models; 13 segments) and 1173 are glycines (0 segments). It is globular, with dimensions of 64 Å × 64 Å × 61 Å. The serine hydroxyl oxygens were placed at least 10 Å apart. In this case the interactions between hydroxyls are negligible, and each segment can be treated as a separate ensemble. All possible combinations in these ensembles can then be evaluated to obtain the global minimum of the system. In this way the pure "stochastic approach" was compared with the ensemble approach, which in this special case is approximately equivalent to a full exhaustive evaluation. The stochastic search started from a total of $5.02 \times 10^{10}$ combinations, reached $2.7 \times 10^3$ combinations after 204 iterations, and then evaluated those $2.7 \times 10^3$ combinations exhaustively. The ensemble method required only 1 + 4 + 15 + 12 + 10 + 11 + 2 + 10 + 6 + 12 + 5 + 8 + 11 = 107 calculations (the sum of the positions of all segments). The two methods gave exactly the same results for the energy and the proton positions.
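The two counts quoted for the virtual protein follow from the same thirteen numbers: treated as one ensemble the alternatives multiply, whereas treated as thirteen independent ensembles they add. A short Python check, using only the per-segment position counts given in the text:

```python
import math

# Number of alternative positions for each of the 13 serine segments,
# taken from the sum quoted in the text.
positions = [1, 4, 15, 12, 10, 11, 2, 10, 6, 12, 5, 8, 11]

joint_combinations = math.prod(positions)   # one giant ensemble: product
independent_evaluations = sum(positions)    # separate ensembles: sum

print(f"{joint_combinations:.3e}")   # 5.018e+10
print(independent_evaluations)       # 107
```

The product, 50,181,120,000, agrees with the quoted starting point of about $5.02 \times 10^{10}$ combinations for the stochastic search.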

【０１０５】Five protein systems with high-resolution coordinates from a combination of X-ray and neutron diffraction were analyzed. Only 5PTI, 5RSA, and 2MB5 have many water molecules (including proton positions) in their solvation shells.

【０１０６】Bovine pancreatic trypsin inhibitor (5PTI, 1.8 Å resolution)
The structure of trypsin inhibitor was determined by joint X-ray (1.0 Å resolution) and neutron diffraction (1.8 Å resolution) (Wlodawer et al., J. Mol. Biol. 1987; 193: 145-156). The PDB file contains the coordinates of 58 amino acid residues and 63 water molecules. A 2.5 Å water layer containing 54 water molecules was included in the calculation. The potassium ion and the PO₄³⁻ ion from the PDB file were also included. The side-chain atoms of residues GLU7 and MET52 were found to occupy two main sites; the "A" form was selected for the calculation. A collection of rotatable atoms separated by less than 4.5 Å was defined as one ensemble. There were 21 ensembles and 256 possible positions in total.

【０１０７】The combined "ensemble-stochastic approach" was used. In Table I, the number of possible combinations in an ensemble is the product of the number of combinations in each segment. The total number of combinations for ensemble 9, obtained by multiplying together the numbers of positions of its segments, thus reaches 6,403,320. This ensemble was solved stochastically, whereas the other ensembles were solved exhaustively. The "total energy" is the sum over all ensembles. The minimum energy over all combinations in the separate ensembles was -121.0 kcal/mol, whereas the maximum energy was 2.9E+16 kcal/mol. This maximum energy value results from "bumps" between rotatable hydrogens that could not be eliminated at the preprocessing stage, because only single-proton bumps are tested at that stage.

【０１０８】The behavior of the system under the pure "stochastic approach" is shown in FIGS. 8 and 9a. FIG. 8 plots ln(total number of possible combinations) against the number of iterations. The initial number of combinations is $1.19 \times 10^{30}$, of which only 2690 remain for exhaustive calculation after 443 iterations.
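As a rough reading of these figures, the average slope of the ln(combinations) curve can be estimated from its two end points. The Python lines below do this for the 5PTI numbers quoted above; the assumption that an average over all 443 iterations is meaningful is ours, since the actual curve in FIG. 8 need not be a straight line.

```python
import math

initial = 1.19e30    # combinations before the first iteration (5PTI)
remaining = 2690     # combinations left for the exhaustive search
iterations = 443

# Average drop of ln(combinations) per iteration, from the end points only.
mean_log_drop = (math.log(initial) - math.log(remaining)) / iterations
mean_factor = math.exp(mean_log_drop)   # average shrink factor per iteration
```

On these numbers the logarithm falls by about 0.14 per iteration, that is, the space shrinks by a factor of about 1.15 (roughly 13% of the remaining combinations) per iteration on average.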

【０１０９】FIG. 9a shows the energy distributions at the first and fourth iterations. The x-axis does not carry the same energy values for all iterations, because the mean energy of the drawn samples decreases as the iterations progress. The samples are therefore divided among 30 columns: the lowest-energy samples are in column 1 and the highest-energy samples are in column 30. The number of samples drawn is the same in every iteration. The algorithm can be seen to eliminate energy bumps during the iterative process, so the energy distribution becomes increasingly bell-shaped.

【０１１０】RNase A (5RSA, 2.0 Å resolution)
The structure of ribonuclease A was determined by joint X-ray and neutron diffraction (2.0 Å resolution) (Wlodawer et al., Acta Crystallogr. B 1986; 42: 379-387). The PDB file contains the coordinates of 124 amino acid residues, a PO₄³⁻ ion, and 128 water molecules. A 2.5 Å water layer containing 90 water molecules was included in the calculation. During the calculation, the four histidine residues of 5RSA were kept in the protonated form found in the PDB file.

【０１１１】A set of rotatable atoms lying within 4.5 Å of one another was defined as one cluster. A total of 37 clusters and 485 possible positions were obtained (Table II). The "combined cluster-stochastic approach" was used. Clusters 2, 7, 10, and 29, which contain many combinations, were solved stochastically, whereas the other clusters were solved exhaustively. The minimum energy, which is the sum of the minimum-energy combinations of the separate clusters, was -60.8 Kcal/mole, whereas the maximum energy was 261.3 Kcal/mole. Combinations with high energy values were excluded at an early stage by the preprocessing "spike" calculation.
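The grouping rule above (rotatable atoms closer than 4.5 Å belong to the same cluster) can be sketched in Python as follows. The sketch assumes single-linkage grouping, that is, atoms are joined transitively through chains of neighbors; the specification does not state whether this or a stricter criterion was used, and the function name and coordinates are hypothetical.

```python
import math


def cluster_atoms(coords, cutoff=4.5):
    """Group atoms into clusters using a distance cutoff (single linkage).

    coords: list of (x, y, z) tuples in angstroms.
    Returns a sorted list of clusters, each a sorted list of atom indices.
    """
    n = len(coords)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical coordinates: three atoms in a chain and one isolated atom.
print(cluster_atoms([(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]))
```

Each cluster can then be solved on its own, exhaustively when it is small and stochastically when it contains many combinations.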

【０１１２】The behavior of the system under the pure "stochastic approach" is shown in FIGS. 8 and 9b. The initial number of combinations is 1.76 × 10^59, of which only 2772 remain for exhaustive computation after 668 iterations.

【０１１３】FIG. 9b shows the energy distributions at the first and fourth iterations. Because there are no energy spikes, the energy distribution remains bell-shaped throughout the minimization.

Myoglobin (2MB5)
The structure of myoglobin was determined by neutron diffraction (1.8 Å resolution) (Cheng and Schoenborn, Acta Crystallogr. B 1990; 46: 195-208). The PDB file contains the coordinates of 153 amino acid residues and 89 water molecules (including their protons). It also contains an Fe-containing protoporphyrin, an ammonium ion, and a sulfate ion. All waters, ions, and the protoporphyrin moiety were included in the calculation. The HEM CO atoms are disordered; the A form was selected for the calculation.

【０１１４】As shown in Table III, the "combined cluster-stochastic approach" was used. A set of rotatable atoms lying within 4.5 Å of one another was defined as one cluster. A total of 43 clusters was obtained.

【０１１５】The behavior of the system under the pure "stochastic approach" is shown in FIGS. 8 and 9c. The initial number of combinations is 4.98 × 10^52, of which only 2400 remain for exhaustive computation after 552 iterations. FIG. 9c shows the energy distributions at the first and fourth iterations.

【０１１６】Trypsin (1NTP, 1.8 Å resolution)
The structure of trypsin was determined by neutron diffraction (1.8 Å resolution) (Kossiakoff, Basic Life Sci 1984; 27: 281-304). The enzyme is inhibited by a monoisopropylphosphoryl derivative, which was taken into account in the calculation. A calcium ion with a 2+ charge was added as indicated in the PDB file and placed near GLU70, ASN72, VAL75, and GLU80. The structure contains no crystallographic water. A set of rotatable atoms lying within 4.5 Å of one another is defined as one cluster.

【０１１７】Again, the "combined cluster-stochastic approach" was used. Table IV shows a total of 33 clusters with a minimum energy of 483.9 Kcal/mole. The behavior of the system under the pure "stochastic approach" is shown in FIGS. 6 and 7d. The initial number of combinations is 9.63 × 10^10, of which only 1152 remain for exhaustive computation after 14 iterations.

【０１１８】Phosphate-binding protein (1IXH, 0.98 Å resolution)
The structure of the phosphate-binding protein was determined by X-ray diffraction (Wang et al., Nat. Struct. Biol. 1997; 4: 519-522). The PDB file contains 321 amino acid residues. No water coordinates are reported. The protein forms a complex with a PO4 phosphate ion carrying a charge of -3; this ion was included in the calculation. The entry contains six disordered residues: Glu1, Ser3, Thr162, Pro216, Ser234, and Lys245. The A form was selected for all of them. As shown in Table V, the "combined cluster-stochastic approach" was used. A set of rotatable atoms lying within 4.5 Å of one another was defined as one cluster. A total of 45 clusters was obtained.

【０１１９】The behavior of the system under the pure "stochastic approach" is shown in FIG. 8. The initial number of combinations is 1.18 × 10^21, of which only 2400 remain for exhaustive computation after 51 iterations.

【０１２０】Discussion
The five systems should be divided into two categories. The first category comprises systems that lack experimental data on the coordinates of water molecules, namely trypsin (1NTP) and the phosphate-binding protein (1IXH). FIG. 10 shows a ribbon representation of 1NTP and its polar residues. Many of the polar hydrogens would be expected to form hydrogen bonds with water molecules. However, this PDB entry contains no water coordinates. In this case, the method of the invention lacks data that are essential for correctly positioning the polar protons of residues on the protein surface.

【０１２１】5RSA, 5PTI, and 2MB5 are systems with abundant experimental data on water positions. These are the three most important systems in this study, and a good algorithm is expected to yield accurate proton predictions for them.

【０１２２】The results of a method for locating protons in biomolecular structures should be evaluated by a small number of criteria. First, the quality of the results should be evaluated in comparison with previously described methods and with respect to the ultimate goal of achieving a negligible RMS deviation of the theoretical proton coordinates from the experimentally determined proton coordinates.

【０１２３】As shown in Table VI, the results of the "combined cluster-stochastic approach" and of the pure "stochastic approach" were compared with the experimental results, with CVFF minimization using the MSI Discover/InsightII software package, with the method of Brunger and Karplus, and with the method of Bass et al. The CVFF minimization used the "steepest descent" algorithm for the first 100 iterations, followed by the conjugate gradient method until convergence to a maximum derivative of less than 0.001 Kcal/Å was achieved.

【０１２４】Compared with proton positioning by standard programs, for example Insight with further optimization by Discover/CVFF (BIOSYM/Molecular Simulations, Discover 2.9.7 Forcefield Simulations User Guide, Part 1, 1995; BIOSYM/Molecular Simulations, InsightII 95.0 Molecular Modeling System User Guide, 1995), the improvement achieved by the methods of the invention (the cluster-stochastic method and the "pure stochastic" method) was clearly demonstrated. Self-consistent algorithms, such as that of Brunger and Karplus (Proteins 1988; 4: 148-156), usually give better results than non-specific methods. However, they are not as accurate as the method of the invention. The method of the invention gives better results than Bass et al. for the experimentally more accurate systems 5RSA and 5PTI (see the RMS values for Ser, Tyr, and water in Table VI), and similar results for the less accurate system 1NTP.

【０１２５】The present invention has two further advantages over Bass et al. First, there is no limit on cluster size, since a system with as many as 92 segments (5RSA) can be treated as one large cluster (the "pure stochastic" approach), whereas the method of Bass et al. (Proteins 1992; 12: 266-277) is restricted to very small sizes. It is apparent from Tables I-V that the short distances between rotatable protons in some regions of a protein, taken together with the number of options for positioning each proton, would require extremely long computations if all options had to be considered. Because special attention must be paid to the need to locate protons in large molecules within a relatively short time, the stochastic method is better equipped to handle fairly large molecules. Second, the method of the invention is energy-based, whereas the method of Bass et al. (Proteins 1992; 12: 266-277) is not.

【０１２６】No consistency was found in terms of improved prediction of one residue type compared with the others. This is also true of other methods for locating protons. In some cases, however, the inventors found a correlation between the magnitude of their RMS results and that of the RMS results of Bass et al. for a given residue type. This may be related to the spatial distribution of these residue types in each protein (i.e., some residues are close to the protein core while others are close to the protein surface), and the predictions may be inaccurate owing to the lack of information on water positions.

【０１２７】The pure stochastic approach might be expected to lose some low-energy solutions during the iterative calculations that eliminate solutions, and hence to give inaccurate results. It is therefore worth establishing how well the pure stochastic approach performs compared with the cluster-stochastic method, which solves most of the system exhaustively. To compare the "pure" stochastic approach with an exhaustive search, the "virtual protein" described in the Results section was used. Both approaches yield the same minimum out of 5.02 × 10^10 possible combinations. This is a further indication of the reliability of the "pure" stochastic approach as a tool for finding the global minimum.

【０１２８】Information about the characteristics of a system can be obtained by examining the energy distributions in detail (FIG. 9). A bell-shaped distribution at the first iteration indicates that there are no spikes between rotatable hydrogens. An "ordered" bell shape of the energy distribution for the positions of rotatable protons, obtained after a few iterations, may be an indication of the protein density near those protons. That is, a "dense" protein should raise the barriers to rotation, so its energies should be skewed toward the upper end of the energy spectrum. A bell shape may be evidence of relatively "free rotation" of these protons in less crowded surroundings.

【０１２９】Section 2: Positioning of amino acid side chains
The present invention is also particularly useful for solving the problem of correctly determining the positions of amino acid side chains within a protein. This particular implementation of the invention solves this difficult problem by allowing such positions to be determined with considerable accuracy, without undue assumptions and without a combinatorial explosion.

【０１３０】The particular implementation of the invention described under "Methods" in this section was also tested against other methods well known in the art, as described under "Results". It should be noted that these methods and results are presented for illustration only and are not intended to be limiting in any way. The interpretation of these results is then given under "Discussion".

【０１３１】Methods
Search approach
The code uses a backbone-dependent rotamer library (Bower et al., J. Mol. Biol. 1997; 267: 1268-1282; Dunbrack and Karplus, Nat. Struct. Biol. 1994; 1: 334-340; Dunbrack and Karplus, J. Mol. Biol. 1993; 230: 543-574). For testing purposes only, and without any intention of being limiting, the August 1997 update of the Dunbrack and Karplus rotamer library was used in the tests described below. A united-atom model is used (Weiner et al., J. Amer. Chem. Soc. 1984; 106: 765-784). The energy is calculated by Equation 1, using the AMBER non-bonded 12-6 Lennard-Jones and electrostatic energy terms, where A_ij is the repulsion parameter for the atom pair (i, j), B_ij is the polarizability (attraction) parameter for the atom pair (i, j), q_i is the partial charge, r_ij is the interatomic distance, and ε is the dielectric constant. A distance-dependent dielectric constant, ε = r, is used. V_n is the torsional potential barrier height for torsion angle φ, n is the multiplicity, and γ is the phase factor. The V_n potentials are taken from the AMBER force-field parameters. Non-bonded energies are calculated for the interactions of each rotamer with the backbone and with the rotamers of other residues. The torsional energy term is calculated for all dihedral angles of each rotamer of each residue. If the non-bonded energy term for a particular atom pair exceeds 10 Kcal/mole, it is truncated to 10 Kcal/mole.
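The energy terms described above can be sketched in Python as follows. Because Equation 1 itself is not reproduced in this text, the sketch assumes the usual AMBER functional form, E = Σ [A_ij/r_ij^12 - B_ij/r_ij^6 + q_i q_j/(ε r_ij)] + Σ (V_n/2)[1 + cos(nφ - γ)], with ε = r_ij. The Coulomb conversion constant of 332 (Kcal·Å/mole per squared electron charge), the choice to apply the 10 Kcal/mole cap to the sum of the Lennard-Jones and electrostatic terms of a pair, and all function names are assumptions made for illustration.

```python
import math

ENERGY_CAP = 10.0  # Kcal/mole, per-pair cap on the non-bonded term (from the text)


def pair_energy(a_ij, b_ij, q_i, q_j, r_ij, coulomb_const=332.0):
    """Non-bonded energy of one atom pair, capped at ENERGY_CAP.

    Uses a 12-6 Lennard-Jones term and a Coulomb term with the
    distance-dependent dielectric epsilon = r_ij, so the electrostatic
    term falls off as 1 / r_ij**2.
    """
    lennard_jones = a_ij / r_ij**12 - b_ij / r_ij**6
    electrostatic = coulomb_const * q_i * q_j / (r_ij * r_ij)
    return min(lennard_jones + electrostatic, ENERGY_CAP)


def torsion_energy(v_n, n, phi, gamma):
    """Torsional term (V_n / 2) * [1 + cos(n * phi - gamma)], angles in radians."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


# A very close contact is truncated to the cap rather than dominating the sum.
print(pair_energy(a_ij=1.0, b_ij=0.0, q_i=0.0, q_j=0.0, r_ij=0.5))
```

Capping each pair term keeps a single severe clash from swamping the total energy of a sampled conformation.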

【０１３２】
【数６】[Equation 1: equation image not reproduced]

As suggested by Bower et al. (J. Mol. Biol. 1997; 267: 1268-1282) and implemented in the SCWRL algorithm, every rotamer is also assigned a local energy based on its probability in the backbone-dependent rotamer library. The energy is obtained from the library probabilities as -ln(p_rotamer / p_0), where p_0 is the probability of the most probable rotamer and p_rotamer is the probability of the particular rotamer (assuming kT = 1). The search strategy comprises the following steps.
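The local rotamer energy -ln(p_rotamer / p_0) can be written as a one-line Python function. The function name and the example probabilities are hypothetical; only the formula comes from the text.

```python
import math


def rotamer_local_energy(p_rotamer, p_most_probable):
    """Local energy -ln(p_rotamer / p_0) in units of kT (kT = 1 assumed)."""
    return -math.log(p_rotamer / p_most_probable)


# Hypothetical library probabilities for one residue at given phi/psi angles.
probabilities = [0.5, 0.3, 0.2]
p0 = max(probabilities)
print([rotamer_local_energy(p, p0) for p in probabilities])
```

The most probable rotamer receives an energy of zero, and less probable rotamers receive a positive penalty.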

【０１３３】(I) Steric-clash elimination and preliminary rotamer positioning: The input to the calculation is the backbone (N, Cα, C, O) coordinates of a protein of known structure. These coordinates and the backbone φ and ψ angles are used to build the initial set of possible rotamers for each residue. Possible disulfide bonds between cysteine residues are identified from the distance between the sulfur atoms. All rotamers that clash with the backbone are excluded. If all rotamers of a residue clash with the backbone, the rotamer with the lowest "clash energy" is retained. The algorithm treats residues that are left with a single rotamer as part of the backbone (i.e., other rotamers that clash with these residues are also excluded). The algorithm also searches for all side-chain clashes between rotamer i of amino acid j and rotamer k of amino acid l. It excludes such pairs from being part of a solution, so such pairs are not sampled in the stochastic stage (see below).
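The backbone-clash filter of step (I) can be sketched in Python as follows. The sketch assumes that a clash energy with the backbone has already been computed for every rotamer and that a rotamer "clashes" when this energy exceeds a threshold; the text does not give the clash criterion, so the threshold, the data layout, and the function name are hypothetical.

```python
def filter_backbone_clashes(clash_energy, threshold):
    """Remove rotamers that clash with the backbone.

    clash_energy: dict mapping residue id -> dict of rotamer id -> energy
                  of that rotamer against the backbone.
    threshold:    energies above this value count as a clash (assumed).
    If every rotamer of a residue clashes, the one with the lowest clash
    energy is kept so that the residue always has at least one rotamer.
    """
    kept = {}
    for residue, energies in clash_energy.items():
        survivors = [rot for rot, e in energies.items() if e <= threshold]
        if not survivors:
            survivors = [min(energies, key=energies.get)]
        kept[residue] = survivors
    return kept


# Hypothetical input: residue "B" clashes in both rotamers.
example = {"A": {1: 0.5, 2: 50.0}, "B": {1: 30.0, 2: 20.0}}
print(filter_backbone_clashes(example, threshold=10.0))
```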

【０１３４】(II) Stochastic stage: For large biological systems such as proteins, a very large combinatorial problem clearly arises. For example, hydrolase (1arb) (Tsunasawa et al., J. Biol. Chem. 1989; 264: 3832-3839) has 2.29 × 10^105 alternative arrangements after step I. To reduce the size of the problem, a novel stochastic algorithm is used. In the protein, the side-chain rotamers of d_0 amino acids are unknown. For each amino acid there is usually more than one rotamer, but only one rotamer gives the minimum energy. Let X_j = (X_j1, X_j2, ..., X_jd0) be a conformation of the protein containing randomly selected rotamers of the d_0 amino acids. For each conformation X_j, the energy E_j = E(X_j) can be calculated with the energy function described above. The goal is to find the conformation that minimizes E. Because the number of combinations is too large for all alternative conformations to be evaluated, the following steps are performed.

【０１３５】1. Randomly sample n conformations from the large population of combinations: X_1 = (X_11, X_12, ..., X_1d0), ..., X_n = (X_n1, X_n2, ..., X_nd0), where X_11 is the randomly selected rotamer of the first amino acid in the first conformation and X_n1 is the randomly selected rotamer of the same amino acid in the n-th conformation. The inventors used n = 1000 to generate a sufficiently large number of protein conformations, and calculated the corresponding energy values E_1 = E(X_1) through E_n = E(X_n).

【０１３６】2. Construct the distribution F^n_E (n ≈ 10^3). F^n_E is the set of energies of all n sampled conformations of the whole protein. Define cutoff points H and L on F^n_E. H contains all variable values satisfying E_i > F^n_E(1 - α), where F^n_E(α) is the α-th percentile of F^n_E, whereas L contains all variable values satisfying E_i < F^n_E(α). The number of conformations in each of H and L is n_0 = n × α. For n = 1000 conformations and α = 0.01 (1%), n_0 = α × n = 0.01 × 1000 = 10, so L and H each contain 10 conformations. In other words, H represents the 10 highest-energy conformations, whereas L represents the 10 lowest-energy conformations.

【０１３７】3. Construct a vector h over all rotamer variables from the conformations in H. The vector h is the element-wise intersection of all rotamer states in H: if all conformations in H share the same rotamer at component j (corresponding to X_nj of conformation X_n), then h_j is the number of that rotamer; otherwise h_j = 0 (no rotamer is common to component j across all high-energy conformations).

【０１３８】4. Construct a vector l over the rotamer variables from the conformations in L. Unlike the vector h, for each amino acid j more than one rotamer may appear in l_j, up to a maximum of n_0 values. l_j is the union of all rotamers of component j that appear in the low-energy conformations of L.

【０１３９】5. Compare h and l. If the rotamer h_j also appears in l_j, it contributes to low energy values as well, and it is retained as a viable rotamer state. If, however, h_j matches no element of l_j, the corresponding rotamer h_j is removed from subsequent iterations. If an amino acid has only one rotamer, that rotamer is the only remaining solution and is not removed from subsequent iterations.

【０１４０】6. Repeat steps 1-4 on the reduced set of variable values until the number of possible combinations of all variables falls below the user-defined "stochastic-stage threshold".
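Steps 1 to 6 above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the inventors' code: the energy function is supplied by the caller, the pairwise side-chain exclusions from step (I) are ignored, at most one rotamer per residue is removed per iteration, and the names `stochastic_elimination`, `limit`, and `max_iter` are hypothetical.

```python
import math
import random


def stochastic_elimination(options, energy, n=1000, alpha=0.01,
                           limit=1e5, rng=None, max_iter=1000):
    """Iteratively discard rotamers that appear only in high-energy samples.

    options: list (one entry per residue) of lists of allowed rotamer ids.
    energy:  callable taking a tuple of rotamer ids and returning a float.
    Returns the reduced list of allowed rotamers per residue.
    """
    rng = rng or random.Random(0)
    options = [list(o) for o in options]  # work on a copy
    n0 = max(1, int(n * alpha))           # size of H and of L
    for _ in range(max_iter):
        if math.prod(len(o) for o in options) < limit:
            break                         # small enough for exhaustive search
        # Step 1: sample n random conformations and rank them by energy.
        sample = [tuple(rng.choice(o) for o in options) for _ in range(n)]
        sample.sort(key=energy)
        low, high = sample[:n0], sample[-n0:]  # Step 2: L and H
        for j, allowed in enumerate(options):
            if len(allowed) == 1:
                continue                  # a lone rotamer is never removed
            high_states = {conf[j] for conf in high}
            if len(high_states) != 1:
                continue                  # Step 3: h_j = 0, nothing shared
            (h_j,) = high_states
            l_j = {conf[j] for conf in low}    # Step 4: union over L
            if h_j not in l_j:
                allowed.remove(h_j)       # Step 5: drop the rotamer
    return options


# Toy usage with a hypothetical additive energy (sum of rotamer ids).
reduced = stochastic_elimination([list(range(5))] * 8, energy=sum,
                                 n=200, alpha=0.05, limit=100, max_iter=50)
print([len(o) for o in reduced])
```

The loop stops either when the remaining number of combinations falls below `limit` or after `max_iter` iterations, the latter being a safeguard added here for the case in which no further rotamer can be eliminated.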

【０１４１】The value of α used to determine n_0 should be chosen with care. If α is too large, no rotamers are eliminated. If α is too small, rotamers may be wrongly eliminated. Ideally, α should be adjusted according to the number of possible rotamers of each amino acid, so as to give the same probability of rotamer elimination. To illustrate how α is determined, assume that no rotamer is affected by interactions with any other amino acid in its environment. FIG. 14 shows the α values, for residues with 2 to 29 possible rotamers, that eliminate a rotamer correctly with > 99.983% certainty. These values were calculated as follows. Given a residue with three rotamers, if one rotamer is to be excluded with more than 99.99% certainty (P_correct), the probability of error (P_error) must be less than 0.01% (0.0001). For a rotamer to be eliminated wrongly, it must first appear in all of the high-energy conformations; the probability of this is (1/3)^α. In addition, it must not appear in any of the low-energy conformations; the probability of this is (2/3)^α. The total error probability is P_error = (1/3)^α (2/3)^α. The calculation can therefore be tuned to nearly 100% confidence by using the general formula P_error = (1/m)^α ((m - 1)/m)^α, where m is the number of variable values (rotamers). If m = 1 (a single rotamer), P_error = 0. Setting P_error = 0.0001 and solving the equation gives α = 6.12. If α is very large, P_error approaches 0, but the probability of eliminating any variable value is very low. Therefore, α values that allow variable values to be eliminated with P_correct = 99.983% to 99.9988% are preferably taken from FIG. 14.
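The worked example above (three rotamers, P_error = 0.0001, α = 6.12) can be reproduced by solving the general formula for α in closed form: since P_error = ((m - 1)/m^2)^α, it follows that α = ln(P_error) / ln((m - 1)/m^2). The Python sketch below implements this; the function names are hypothetical, and the formula presupposes, as the text does, that rotamers are sampled independently and with equal probability.

```python
import math


def error_probability(m, alpha):
    """P_error = (1/m)**alpha * ((m - 1)/m)**alpha for a residue with m rotamers."""
    return (1.0 / m) ** alpha * ((m - 1.0) / m) ** alpha


def alpha_for_error(m, p_error=1e-4):
    """Solve P_error = ((m - 1) / m**2)**alpha for alpha (requires m >= 2)."""
    if m < 2:
        raise ValueError("a residue with one rotamer is never eliminated")
    return math.log(p_error) / math.log((m - 1.0) / (m * m))


print(round(alpha_for_error(3), 2))  # three rotamers, P_error = 0.0001
```

In this formula the exponent plays the role of the number of conformations inspected in each of the high-energy and low-energy sets.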

【０１４２】(III) Termination of the search: Once fewer than M combinations remain (M is about 10^5), an exhaustive search is performed to obtain the N lowest-energy conformations of the protein.
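The final exhaustive stage can be sketched in Python with the standard library alone: every remaining combination is enumerated and the N lowest-energy conformations are kept. The function name and the toy energy are hypothetical, and the energy function is again supplied by the caller.

```python
import heapq
import itertools


def exhaustive_lowest(options, energy, n_best=10):
    """Enumerate all remaining combinations and return the n_best of lowest energy.

    options: list (one entry per residue) of lists of allowed rotamer ids.
    energy:  callable taking a tuple of rotamer ids and returning a float.
    """
    all_conformations = itertools.product(*options)  # lazy enumeration
    return heapq.nsmallest(n_best, all_conformations, key=energy)


# Toy usage with a hypothetical additive energy.
best = exhaustive_lowest([[0, 1], [0, 1, 2]], energy=sum, n_best=2)
print(best)
```

Because `itertools.product` is lazy and `heapq.nsmallest` keeps only `n_best` candidates, memory use stays small even when about 10^5 combinations are enumerated.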

【０１４３】Results
The stochastic algorithm was applied to ten proteins of various sizes (46-263 residues) and complexities (1.04 × 10^14 to 2.29 × 10^105 possible combinations after elimination of the rotamers that clash with the backbone), chosen to cover a wide range of protein fold families. Six of these ten proteins (46-48 residues) were also selected by Leach and Lemon (Proteins 1998; 33: 227-239) using the DEE/A* algorithm, and serve to compare the stochastic algorithm with the DEE/A* algorithm. These proteins are: crambin (PDB entry 1crn) (Teeter et al., J Mol Biol. 1993; 230: 292-311), ribosomal protein (1ctf) (Leijonmarck and Liljas, J Mol Biol. 1987; 195: 555-579), complement control protein (1hcc) (Norman et al., J Mol Biol. 1991; 219: 717-725), ovomucoid third domain (2ovo) (Empie and Laskowski, Biochemistry 1982; 21: 2274-2284), erabutoxin B (3ebx) (Smith et al., Acta Crystallogr A. 1988; 44: 357-368), and rubredoxin (5rxn) (Watenpaugh et al., J Mol Biol. 1980; 138: 615-633). The remaining proteins selected are larger (129-263 residues) and have high-resolution X-ray structures (resolution < 1.5 Å, R factor < 0.17): lysozyme (2ihl), ribosomal protein (1whi) (Davies et al., Structure 1996; 4: 55-66), endonuclease (2end) (Morikawa et al., Science 1992; 256: 523-526), and hydrolase (1arb) (Tsunasawa et al., J. Biol. Chem. 1989; 264: 3832-3839).

Table VII summarizes the results of applying the stochastic algorithm to the ten proteins. For each protein, it gives the number of combinations (after the initial elimination of rotamers that clash with the backbone), several values for the single conformation (the global minimum of each protein), and the averages over a "population" of 1000 low-energy conformations. The best attainable RMS for each protein is shown. Finally, the (unweighted) average energy gap of the 1000 conformations is shown. The RMS of the side-chain atoms (excluding Cβ) of the global-energy-minimum conformation was calculated relative to the X-ray conformation. The RMS range for the global minima is 1.32 to 2.60. The average RMS value over the 1000 low-energy conformations is somewhat larger than that of the global minimum, but for every protein, conformations were found that have a higher energy than the global minimum and a smaller RMS. The energy values of the 1000 lowest-energy conformations range up to 5.52 Kcal/mole above the global minimum. The average energy gap of the 1000 lowest-energy conformations from the global minimum is always small (2.20 Kcal/mole for all proteins).

【０１４４】Testing the Validity of the Search Method
To test the validity of the stochastic search, and in view of the values reported in Table VII, several questions were raised. The first is whether, given a particular rotamer library, the stochastic search reproduces the results that an exhaustive search would give. The second is whether such a search identifies the crystal structure of the protein when the rotamer library includes the original X-ray rotamers.

【０１４５】The first question requires testing a relatively small protein for which such an exhaustive search is feasible. Given the constraints of the energy function and the rotamer library, our stochastic algorithm was tasked with finding the lowest-energy combinations in the test protein, and these were compared with the results of the exhaustive search. The protein chosen was crambin (PL form), which has a high-quality X-ray structure (1.05 Å resolution, R factor = 0.105), Brookhaven Protein Data Bank (Bernstein et al., J. Mol. Biol. 1977; 112:535-542) file 1cnr (Teeter et al., J Mol Biol. 1993; 230:292-311). Crambin is large enough to be a reliable test case, yet not so large that the exhaustive search requires a prohibitively long computation. The entry contains 46 amino acid residues (see Figure 11) and the coordinates of an ethanol molecule. There are eight disordered residues (Thr1, Thr2, Ile7, Val8, Arg10, Asn12, Ile34, Thr39). To evaluate this protein in a reasonable time, Arg10 (in the A form of this disordered residue), Arg17, Glu23, Ile33, and Ile35 were fixed in their original positions. The initial number of combinations (after the steric-clash elimination step) was 6.79 × 10^8. Figure 12 compares the results of the stochastic search and the exhaustive search over a wide range of N lowest-energy conformations. It shows the energy value of each of the 10,000 conformations and the difference (%) between the two searches. The 485 lowest-energy conformations of this protein were found to be exactly the same in the stochastic and exhaustive searches. For many further conformations the difference is slight. The energy of conformation number 10,000 is 4.71 Kcal/mole above the global minimum in the exhaustive search and 4.80 Kcal/mole above it in the stochastic search. The difference between the two searches for this conformation is therefore only 0.56%. The test was repeated with three different seeds for the random number generator (100000, 200000, and 300000), with similar results.
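The rank-by-rank comparison behind Figure 12 can be pictured with a short Python sketch. It is illustrative only: the function name `compare_ranked_energies`, the tolerance, and the toy energies are assumptions rather than the inventors' code, and the percent difference is taken here relative to the exhaustive-search energy at each rank, which may not be the definition behind the 0.56% figure.

```python
def compare_ranked_energies(exhaustive, stochastic, tol=1e-6):
    """Compare two energy lists rank by rank after sorting each in ascending order.

    Returns (identical_prefix, percent_diffs): identical_prefix is the number of
    leading ranks whose energies agree within tol, and percent_diffs[k] is
    100 * |E_stochastic - E_exhaustive| / |E_exhaustive| at rank k.
    """
    exh = sorted(exhaustive)
    sto = sorted(stochastic)
    n = min(len(exh), len(sto))
    percent_diffs = []
    identical_prefix = 0
    prefix_open = True
    for k in range(n):
        diff = abs(sto[k] - exh[k])
        if prefix_open and diff <= tol:
            identical_prefix += 1
        else:
            prefix_open = False  # the first mismatch ends the identical prefix
        denom = abs(exh[k])
        percent_diffs.append(100.0 * diff / denom if denom > 0 else 0.0)
    return identical_prefix, percent_diffs


# Toy data: hypothetical total energies in Kcal/mole, not values from Figure 12.
exhaustive = [-120.0, -119.5, -119.2, -118.0]
stochastic = [-120.0, -119.5, -119.0, -117.9]
prefix, diffs = compare_ranked_energies(exhaustive, stochastic)
```

In this toy case the two searches agree on the first two ranks and diverge slightly afterwards, which mirrors the qualitative pattern reported for crambin (identical for the lowest-energy conformations, small differences further up).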

【０１４６】To test its ability to reproduce the X-ray coordinates, the stochastic algorithm was used with an expanded rotamer library (into which the crystal rotamers of 1cnr were introduced). No residues were fixed during this search. The energy was calculated by Equation 1 without the probability term (the probability term is not available for the crystal coordinates). The following residues were not included: four Gly (no side chain), five Ala (only one possible rotamer), and six Cys (all of which form S-S bonds and therefore have no rotamers). Thus 31 of the 46 amino acids in the sequence remained for this comparison. The energy of the protein at the crystal-structure coordinates was 3.41 Kcal/mole higher than the global minimum found by the stochastic algorithm. For 20% of the residues (6 amino acids: Ser6, Val8, Thr21, Tyr29, Ile33, Ile34), the search result superimposed perfectly on the X-ray result. For 58% of the residues (18 amino acids), a high-quality superposition was found, that is, the absolute deviation of every torsion angle was less than 40°. Thus, with the expanded rotamer library (for which the best possible RMS should be 0.0), the positions of about 80% of the side chains were located accurately. For example, the CG atom of Leu18 was 0.18 Å away and the CG atom of Asn14 was 0.23 Å away. The RMS value between the global minimum structure and the crystal structure for the side-chain atoms (excluding C_β) was 1.16.
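As a minimal sketch of how an RMS value of this kind can be computed, the Python function below takes two lists of paired atom coordinates and returns their root-mean-square distance. It assumes that both structures share the same fixed backbone, so no superposition step is applied; the function name and the toy coordinates are hypothetical, and the selection of side-chain atoms (excluding C_β) is left to the caller.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square distance between paired atoms (no superposition step)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: two atoms displaced by 0.18 and 0.23 (hypothetical coordinates, in angstroms).
predicted = [(0.18, 0.0, 0.0), (5.0, 0.23, 0.0)]
crystal = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
value = rms_deviation(predicted, crystal)
```

Because the squared distances are averaged before the square root is taken, a few badly placed side chains can dominate the RMS even when most atoms lie close to their crystal positions.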

【０１４７】To test the limits of the original rotamer library (without the crystallographic rotamers), each rotamer was placed as close as possible to the corresponding side chain in the crystal structure. The resulting RMS value was 1.15. The RMS value between the global energy minimum of the stochastic search and the crystal structure was found to be 1.97.

【０１４８】Comparison of the Algorithm with Results from X-ray, NMR, and MD
To compare the results for the 1000 lowest-energy conformations with experimental and theoretical methods that probe the conformations each side chain can adopt under different conditions (X-ray crystallography, NMR, and MD), the conformational space of E. coli ribonuclease HI was searched using the method of the invention.

【０１４９】An ensemble of eight NMR structures (PDB entry 1rch) was reported on the basis of distance restraints from experiment. Philippopoulos and Lim (Proteins 1999; 36:87-110) compared an expanded set of NMR results with the high-resolution (2rn2, 1.48 Å) crystal structure (Katayanagi et al., J Mol Biol. 1992; 223:1029-1052), the low-resolution (1rnh, 2.05 Å) crystal structure (Yang et al., Science 1990; 249:1398-1401), and their MD simulations. NMR and MD simulations seldom yield a single result for each torsion angle, and the resulting conformations were classified as ensembles. Each ensemble is described by the mean value of the dihedral angle and by an order parameter S (Hyberts et al., Protein Sci. 1992; 1:736-751), which expresses the deviation of each dihedral angle from its mean value. The S parameter of each dihedral angle in each residue is then calculated over the whole ensemble. The order parameter S(α_i) of angle α_i of residue i (where α = φ, ψ, χ_1, χ_2, etc.) is defined as

【０１５０】

【数７】 [Equation 7]

where N is the total number of structures in the ensemble, α_i^j (j = 1, ..., N) is a 2D unit vector with phase equal to the dihedral angle α_i, i denotes the residue number, and j denotes the number of the ensemble member. If the angle is the same in all structures, S has a value of 1, whereas values of S much smaller than 1 indicate disordered regions of the structure. Philippopoulos and Lim restricted these classifications to S values greater than 0.8.
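The order parameter described above can be sketched in Python as the length of the mean of the N unit vectors, that is, S = (1/N) |Σ_j α_i^j|. This form is inferred from the definitions of N and α_i^j given in the text and is an assumption, not a transcription of Equation 7; the function name `order_parameter` is likewise hypothetical.

```python
import math


def order_parameter(angles_deg):
    """Length of the mean 2D unit vector built from a set of dihedral angles.

    Returns 1.0 when every angle is identical and values near 0 when the
    angles are spread evenly around the circle.
    """
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    sum_x = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_x, sum_y) / n


# Hypothetical chi1 values of one residue across an 8-member ensemble (degrees).
ordered = order_parameter([-62, -58, -65, -60, -61, -59, -63, -60])
disordered = order_parameter([-60, 60, 180, -60, 60, 180, -60, 60])
```

Working with unit vectors rather than raw angle values avoids the wrap-around problem at ±180°: two angles of 179° and -179° are treated as nearly identical, as they should be.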

【０１５１】The stochastic algorithm was applied to the backbone of 2rn2, which has the higher X-ray accuracy. The calculation started from 1.61 × 10^87 possible combinations. The algorithm was used to refine the conformational space down to the 1.3 × 10^33 best conformations by removing high-energy conformations, leaving enough conformations to assess the conformational flexibility of the protein.

【０１５２】Table VIII compares the stochastic algorithm with the results of X-ray crystallography, NMR, and MD. The table focuses on residues that adopt highly probable conformations, according to the following assumptions. In some cases a torsion angle adopts a single conformation in the MD ensemble and several conformations in the NMR ensemble, and in other cases the opposite was found. We assume a high probability for an experimentally derived rotamer if it obeys one or more of the following rules: (1) it appears in the high-resolution crystal structure (2rn2); (2) it is found in at least two of the following three: the low-resolution crystal structure (1rnh), the NMR models, and the MD simulations. A "hit" was taken to be any result of the stochastic algorithm that deviates by no more than ±30° from the "correct" conformation. Each such hit is marked "+" in the table. In some cases an angle, such as χ1 of M47, is shown in the table with one rotamer and marked "(+)". Such angles have additional values that do not obey the two rules above. Other angles are considered to have low probability and are not shown in Table VIII. Of the 115 dihedral angles in Table VIII, seven are missing from the rotamer library (see Figure 13A) and two others are off by about 40°, and these were therefore not included as "hits" in our evaluation. We can thus expect at most 106 "hits" in comparison with X-ray, NMR, and MD. The stochastic algorithm correctly predicts 87 of these angles (82%) (see Figure 13B).
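The hit criterion can be illustrated with a small Python sketch. The function names, the dictionary layout, and the example angles are hypothetical; the only things taken from the text are the ±30° tolerance and the counts 115, 7, 2, 106, and 87.

```python
def angular_deviation(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def score_hits(predicted, reference, tolerance=30.0):
    """Count predicted torsion angles lying within `tolerance` degrees of the reference.

    Angles absent from `predicted` (for example, rotamers missing from the
    library) are skipped rather than counted as misses.
    """
    scored = [key for key in reference if key in predicted]
    hits = sum(
        1 for key in scored
        if angular_deviation(predicted[key], reference[key]) <= tolerance
    )
    percent = 100.0 * hits / len(scored) if scored else 0.0
    return hits, len(scored), percent


# Hypothetical torsion angles in degrees, keyed by (residue, angle name).
reference = {("M47", "chi1"): -60.0, ("L49", "chi1"): 180.0, ("L49", "chi2"): 60.0}
predicted = {("M47", "chi1"): -75.0, ("L49", "chi1"): -170.0, ("L49", "chi2"): 110.0}
hits, total, percent = score_hits(predicted, reference)

# Arithmetic reported in the text: 115 - 7 - 2 = 106 scorable angles, 87 hits.
reported_percent = 100.0 * 87 / (115 - 7 - 2)
```

The wrap-around handling matters for angles near ±180°: the 180° reference and the -170° prediction above differ by only 10° and therefore count as a hit.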

【０１５３】Comparison of the Algorithm with the DEE/A* Algorithm
Leach and Lemon (Proteins 1998; 33:227-239) searched conformational space with the DEE/A* algorithm for a set of eight proteins chosen to cover a wide range of protein fold families. The method of the invention was then applied to six of these proteins (1crn, 1ctf, 1hcc, 2ovo, 3ebx, 5rxn). Snake venom neurotoxin (1nxb) (Tsernoglou et al., Mol Pharmacol. 1978; 14:710-716) was excluded because of an unknown residue type (residue 59). Bovine pancreatic trypsin inhibitor (5pti) (Wlodawer et al., J Mol. Biol. 1987; 193:145-156) was excluded because residues Glu7 and Met52 occupy two major sites. Leach and Lemon also explored the effect of "united-atom" and "all-atom" models together with "standard" and "reduced" electrostatic representations. Unfortunately, they did not report the RMS value of each system separately, but only the average over all eight systems for each search method. Table IX compares the stochastic method with DEE/A*. The largest number of combinations solved by the stochastic algorithm is 2.29 × 10^105, whereas DEE/A* reached only 2.48 × 10^34 combinations. The largest protein system solved by the stochastic algorithm has 263 amino acids, whereas DEE/A* solved at most 68 residues. Next, to assess the accuracy of the model, the average RMS over the side-chain atoms (excluding C_β) between the predicted conformation and the X-ray conformation was calculated. The best possible RMS of the current rotamer library is also shown. For the systems that were also calculated by DEE/A*, the RMS values of the stochastic algorithm were found to range from 1.32 to 2.48, with an average of 2.07. The best possible RMS with our rotamer library is 1.18 over all proteins in our test cases. Leach and Lemon reported average RMS values of 1.77 to 1.92, depending on the atom model and rotamer library, and a best possible RMS of the rotamer library of 0.75 to 0.83. For the larger systems, for which DEE/A* could not be used because of combinatorial explosion, the stochastic algorithm found RMS values ranging from 2.22 to 2.60, with an average of 2.40. The best possible RMS of the rotamer library was 1.23.
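Combination counts such as 2.29 × 10^105 are products of per-residue rotamer counts and are far too large to enumerate. A minimal Python sketch of how such a count can be tallied in log space is shown below; the function names and the rotamer counts are hypothetical and are not taken from Table IX.

```python
import math


def log10_combinations(rotamer_counts):
    """log10 of the product of per-residue rotamer counts (each count must be >= 1)."""
    if any(count < 1 for count in rotamer_counts):
        raise ValueError("every residue needs at least one allowed rotamer")
    return sum(math.log10(count) for count in rotamer_counts)


def format_scientific(log10_value, digits=2):
    """Render 10**log10_value as 'm x 10^e' without building the huge number."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.{digits}f} x 10^{exponent}"


# Hypothetical rotamer counts remaining after backbone-clash elimination.
counts = [10, 10, 10, 10, 10, 2]
log_total = log10_combinations(counts)
label = format_scientific(log_total)
```

Summing logarithms instead of multiplying counts keeps the arithmetic cheap and makes it easy to see how strongly the search space grows with each added residue.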

【０１５４】Discussion
The foregoing describes the application of a novel stochastic search approach to exploring the conformational space of protein side chains. It develops and improves on the earlier example of the preceding section, which searched for the positions of polar protons in proteins. The algorithm successfully searches protein conformational spaces of various sizes and can handle a very large number of combinations after eliminating rotamers that clash with the backbone.

【０１５５】The robustness of the stochastic algorithm in handling complex combinatorial searches is clearly demonstrated in Tables VII and IX. The comparison between the stochastic and exhaustive searches (Figure 12) demonstrated the reliability of the stochastic algorithm in finding increasing numbers of lowest-energy conformations. For the 485 lowest-energy conformations of this protein, no difference was found between the stochastic and exhaustive searches. Approaching the limit of 10,000 lowest-energy conformations, a slight deviation of 0.56% was detected. Because this large set of conformations spans an energy gap of up to 4.71 Kcal/mole above the global minimum, the population includes the main contributors to molecular properties that follow a Boltzmann distribution. These are the main contributions to the molecular partition function, which can be used to compute conformational entropy.

【０１５６】With no difference found between the stochastic and exhaustive searches for the population of low-energy conformations, a separate question is how our search results compare with experiment. This is shown in Table VII, by comparing our global minima for 10 proteins with the crystallographic results, and in Table VIII, by comparing our low-energy population for one protein with detailed X-ray, NMR, and MD results. The RMS values of the global minima for the 10 proteins are strongly affected by the rotamer library, but not entirely (that is, the energy expression also limits the reproduction of the structure). This was demonstrated by including the original crystallographic rotamers in the library: even then, only about 80% of the residues were calculated with high accuracy. The energy of the crystal coordinates of 1cnr was found to be 3.41 Kcal/mole above the global minimum found by our stochastic algorithm, yet the RMS value of the global minimum structure is only 1.16. The stochastic search on 1cnr without the crystallographic rotamers gives an RMS of 1.97. The limitation imposed by the rotamer library is shown in Table VII, for each protein, by the column giving the best possible RMS of the library. These values account for 50-75% of the error in the RMS values of the global energy conformations. Table VII shows that the global minimum energy conformation is not necessarily the one with the smallest RMS value: higher-energy conformations exist whose structures are closer to the X-ray result.

【０１５７】Table VIII contains 106 angles of E. coli ribonuclease HI that were expected to be detected by comparison with X-ray, NMR, or MD. The algorithm correctly detected 87 of these angles, 82% of the total. Part of the deviation from 100% accuracy may be due to the quality of the rotamer library, but most of it is due to the energy function. Mendes et al. (Proteins 1999; 37:530-543) represented a rotamer as a continuous ensemble of conformations clustered around the classical fixed rotamer. Such an approach can increase the effectiveness of a rotamer library. These results (an RMS of 1.16 with the crystallographic rotamers and 1.97 without them) support the claim that a larger rotamer library does not guarantee a dramatic improvement in RMS values (Proteins 1992; 14:213-223; J. Mol. Biol. 1994; 235:1088-1097; Tanimura et al., Protein Sci. 1994; 3:2358-2365; Vasquez, Biopolymers 1995; 36:53-70).

【０１５８】At present there are four main methods for studying the conformational space of a particular protein: X-ray crystallography, NMR, MD, and rotamer-library-based methods. X-ray crystallography usually suggests a single structure, which may be biased toward a particular conformational metastable state in the crystal (Brunger, Nat. Struct. Biol. 1997; 4 Suppl: 862-865). Observation of alternative conformations may be possible only at the highest resolutions. The advantage of our algorithm is straightforward: it extends from a single conformation to a family of viable conformations.

【０１５９】Unlike X-ray crystallography, NMR suggests alternative conformations through the interpretation of 2D and 3D coupling maps. NMR does not reveal the shape of the energy minima on the potential energy surface. Protein NMR is a long and tedious experiment that is limited by the timescale of conformational changes, particularly for large proteins. In such cases the method of the invention may serve as an additional tool for suggesting alternative conformations. Where NMR structures are available, the method of the invention can be used to extend this information by allowing the energy content of the conformations to be determined, and hence their contributions to the overall population at equilibrium to be evaluated.

【０１６０】MD simulations of biomolecules require large amounts of CPU time, which prevents a complete search of conformational space. MD suggests conformations that cannot be detected by either NMR or X-ray crystallography. The timescales and barrier-crossing ability of MD are not yet reliable enough to detect the global minimum, or the population of lowest-energy conformations, in large biomolecules. The reliability of our stochastic algorithm in finding both is demonstrated in this document. However, MD trajectories show the mechanism of conformational interconversion, whereas the stochastic approach focuses on products rather than pathways.

【０１６１】Dill and Chan (Nature Struct. Biol. 1997; 4:10-19; Chan and Dill, Proteins 1998; 30:2-33) stated that the native state of a particular protein corresponds to the global minimum in free energy (not necessarily the global minimum in potential energy). An algorithm that adds side chains should therefore produce most of the lowest-energy conformations, so that entropy can be evaluated. The method of the invention currently meets this requirement. In the "mean field" approximation, each side chain "feels" the average of all possible conformations of its neighboring side chains. The conformational entropy is then evaluated from the probabilities of the side chains at their particular possible positions (Vasquez, Biopolymers 1995; 36:53-70; Koehl and Delarue, Nat. Struct. Biol. 1995; 2:163-170; Koehl and Delarue, Curr. Opin. Struct. Biol. 1996: 222-226). The stochastic search of the invention is unique in that, in addition to finding the global minimum, it provides the N best solutions closest to it for the rotamers of large proteins without any mean-field approximation. The stochastic search of the invention can therefore be used to study the thermodynamic properties of complex molecular systems. The stochastic algorithm can handle more than 250 residues (the maximum at this stage is 2.29 × 10^105 combinations). The DEE/A* algorithm handled up to 68 residues, with a maximum number of combinations (before elimination of backbone clashes) of 10^44. After application of the DEE algorithm, the size of the remaining space to be searched by the A* algorithm can be reduced to at most 10^21.

【０１６２】The quality of the method of the invention (using the energy expression of the method and a backbone-dependent rotamer library) was compared with the results of the combined DEE/A* algorithm (using a different energy expression and two different libraries) (Leach and Lemon, Proteins 1998; 33:227-239). Because RMS is affected by the rotamer library, comparing each approach with experiment by RMS alone is of limited value. For example, an RMS value of 2.0 obtained with a rotamer library whose best possible RMS for the protein is 1.9 indicates a better search approach than an RMS value of 1.5 obtained with a library whose best possible RMS is 0.1. RMS values should be compared with the best RMS value achievable within the framework of the rotamer library. Table IX shows that the best possible RMS value of the library correlates with the RMS of the global energy conformation. This may explain the difference between the present results and those of Leach and Lemon. Another advantage is that the stochastic algorithm can be used in "stand-alone" form, without any preprocessing algorithm (such as DEE in the case of the A* algorithm). The A* algorithm requires a good estimate of the cost of reaching the goal node. This can be difficult to achieve, because the interactions between residues that have not yet been assigned a position cannot easily be calculated. It should also be noted that the numbers of combinations shown for the stochastic algorithm in Tables VII and IX refer to the combinations remaining after removal of rotamers that clash with the backbone. The true number of possible combinations is therefore considerably larger.
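The 2.0-versus-1.5 example above can be restated as a small Python calculation. Taking the difference between the observed RMS and the library's best possible RMS is only one simple way to express the comparison (an assumption; the source does not prescribe a specific metric), and the function name is hypothetical.

```python
def excess_over_library_limit(rms, best_possible_rms):
    """RMS error remaining after subtracting the best RMS the library allows."""
    if rms < best_possible_rms:
        raise ValueError("observed RMS cannot be below the library's best possible RMS")
    return rms - best_possible_rms


# The two hypothetical cases from the text.
search_a = excess_over_library_limit(2.0, 1.9)  # coarse library, search close to its limit
search_b = excess_over_library_limit(1.5, 0.1)  # fine library, search far from its limit
```

Search A leaves about 0.1 of avoidable error and search B about 1.4, so A is the better search even though its raw RMS is larger.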

【０１６３】Section 3: Prediction of Loop Structures in Proteins
The present invention is also particularly useful for solving the problem of correctly predicting loop structures in proteins. This particular implementation of the invention solves this difficult problem by enabling such predictions to be made with considerable accuracy, without excessive assumptions and without combinatorial explosion.

【０１６４】The particular implementation of the invention described under "Methods" in this section was also tested against other methods known in the art, as described under "Results". It should be noted that these methods and results are given for illustrative purposes only and are not intended to be limiting in any way. The interpretation of these results is then given under "Discussion".

【０１６５】Methods
Loop construction can be accomplished by several strategies. Most of these use standard bond lengths and bond angles and vary only the dihedral angles. This particular implementation of the method of the invention follows this general policy but departs from it in some steps.

【０１６６】Geometric Premises
Figure 15 shows an example with six residues (0 to 5). Residues 0 and 5 are in the fixed part of the protein. The search is carried out over the conformations of residues 1 to 4. The loop is built simultaneously from both the N-terminus and the C-terminus (Moult and James, Proteins 1986; 1:146-163), and loop closure is tested between residue 2 and residue 3. This construction strategy reduces the accumulation of errors (that is, when a loop is built by dihedral angles from one end to the other, an error in the first residues causes increasingly large deviations in later residues).

【０１６７】FIG. 16 shows the definition of the dihedral angles for a given residue: in the construction strategy, ψ of residue n is the ψ of the preceding residue on the N-terminal side. The idea behind this definition is that φ_n and ψ_n together determine the positions of the N and C atoms of residue n. When the loop is built from the N-terminus (starting at residue 1 in FIG. 15), the position of the nitrogen of residue 1, the first atom to be placed, must be determined by the angle ψ of the preceding residue. The exemplary method of the invention described below assumes a trans (180°) Cα-C-N-Cα geometry, so the position of Cα in residue 1 follows from this assumption. The position of the carbonyl carbon of residue 1 is determined by φ_1, which is drawn from the search (see below). The position of the nitrogen of residue 2 is determined by ψ_2 (conventionally defined as ψ_1), and so on. When the loop is built from the C-terminus, the position of the carbonyl carbon of residue 4 is determined by φ_5. The Cα of residue 4 is placed at 180° with respect to the Cα of residue 5. The position of the N of residue 4 is determined by ψ_5. Likewise, the position of residue 3 is determined from φ_4 and ψ_4 as defined in FIG. 16. The values of φ_3 and ψ_3 are therefore not needed.

【０１６８】Assignment of possible (φ; ψ) angles to residues
The probability of finding a complete peptide sequence longer than six residues in the structural database is extremely low (Oliva et al., J. Mol. Biol. 1997; 266: 814-830). The method of the invention therefore searches SWISS-PROT (Bairoch and Apweiler, Nucleic Acids Res. 2000; 28: 45-48) for overlapping three-residue segments of each loop. Given a protein with the sequence ...ACGDEIL..., where "A" is residue 0 of FIG. 15 and CGDE is the loop, the method searches for the segments ACG, CGD, GDE, DEI and EIL. The Brookhaven Protein Data Bank (Bernstein et al., J. Mol. Biol. 1997; 112: 535-542) is then searched for all (φ; ψ) angles of the relevant residues in the segments found by the SWISS-PROT search. The search is carried out only for the second and third residues of each triplet, so that every φ; ψ combination found is associated with the sequential order of the loop structure. Such a search yields multiple allowed conformations (including rare ones) and can produce several hundred φ; ψ pairs for a given residue. These are stored as a database for later processing. A pair is discarded from the database if both its φ and ψ angles differ by less than 2° from those of another pair for the same residue. To avoid any bias, the dihedral angle pairs of any protein being tested were excluded from the database used to test that protein.
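The segment enumeration and the 2° filter described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, loop positions are given as 0-based indices into the sequence string, angles are in degrees, and differences are taken modulo 360° (the text does not say how the wrap-around at ±180° was handled).

```python
def loop_triplets(sequence, first, last):
    """Overlapping three-residue segments for a loop whose residues occupy
    sequence[first..last] (0-based, inclusive). The first segment starts at
    the anchor residue before the loop, the last one at the final loop residue."""
    return [sequence[i:i + 3] for i in range(first - 1, last + 1)]

def angle_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def drop_near_duplicates(pairs, tolerance=2.0):
    """Keep a (phi, psi) pair only if no already-kept pair is closer than
    `tolerance` degrees in BOTH angles."""
    kept = []
    for phi, psi in pairs:
        duplicate = any(angle_difference(phi, k_phi) < tolerance and
                        angle_difference(psi, k_psi) < tolerance
                        for k_phi, k_psi in kept)
        if not duplicate:
            kept.append((phi, psi))
    return kept
```

For the example in the text, `loop_triplets("ACGDEIL", 1, 4)` enumerates the five segments ACG, CGD, GDE, DEI and EIL. Note that the result of `drop_near_duplicates` depends on the order of the input pairs, which is a property of this sketch rather than something stated in the source.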

【０１６９】Search of conformational space by a stochastic algorithm
Medium-sized and large loops clearly pose a large combinatorial problem. For example, the number of combinations for the second loop of the bacteriorhodopsin complex at 1.15 Å resolution (Luecke et al., J. Mol. Biol. 1999; 291: 899-911; Brookhaven file 1c3w.pdb) is 5.5 × 10^28. Only some of these database conformations can close a loop that satisfies the geometric criteria. The method of the invention is used to reduce the size of the problem. Given a loop with d_0 unknown angle pairs, only a small fraction take part in possible structures of minimal cost function (see below). Let X_j = (x_j1, x_j2, ..., x_jd0) be a loop conformation comprising the randomly selected angles ((φ_j1; ψ_j1), (φ_j2; ψ_j2), ..., (φ_jd0; ψ_jd0)). For each conformation X_j, a cost function C_j = C(X_j) can be calculated. The goal is to find all conformations that minimize C. The method was described in detail in the two preceding sections. Briefly, the following steps are followed.

【０１７０】1. Randomly select a value for each angle pair; together these values make up the conformation of the whole loop.
2. Use a "cost function" to calculate the value of this conformation.

【０１７１】3. Continue calculating the values of n such conformations (each conformation has all of its variable values chosen at random).
4. Construct a histogram of the distribution of values for all sampled energies (n is about 1000).

【０１７２】5. Compare all variable components that contribute to the portion α (α = number of conformations) of the complete histogram at its upper end.
6. Remove every component that contributes to the maximum-value conformations, but none that contributes to the minimum-value conformations.

【０１７３】7. Repeat the process iteratively until the remaining combinations can be evaluated exhaustively.
At the end of this stage, many conformations remain that satisfy the geometric loop-closure criterion, but they are not necessarily free of clashes. In the next stage, therefore, side chains are added and the loops are evaluated by energy criteria. In various tests of this algorithm, 10^2 to 10^5 conformations were left for further processing at the end of this stage.
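Steps 1 to 7 can be sketched roughly as below. This is a simplified illustration, not the patented implementation: the names `reduce_pools` and `rank_exhaustively`, the default values, and the exact elimination rule (drop a value if it occurs among the worst-scoring samples and never among the best-scoring ones) are assumptions made for the sketch. Each "pool" is the list of candidate values for one variable, for example the stored (φ; ψ) pairs of one residue; values must be hashable.

```python
import itertools
import math
import random

def reduce_pools(pools, cost, n_samples=1000, alpha=0.05,
                 max_combinations=10_000, max_rounds=50, rng=None):
    """Iteratively discard candidate values that show up only in
    high-cost random samples, until few enough combinations remain."""
    rng = rng or random.Random()
    pools = [list(p) for p in pools]          # work on copies
    tail = max(1, int(alpha * n_samples))     # size of each histogram tail
    for _ in range(max_rounds):
        if math.prod(len(p) for p in pools) <= max_combinations:
            break                             # small enough for exhaustive search
        # Steps 1-3: sample n random conformations and score them.
        samples = [tuple(rng.choice(p) for p in pools) for _ in range(n_samples)]
        samples.sort(key=cost)                # step 4: order by cost
        best, worst = samples[:tail], samples[-tail:]
        removed_any = False
        for i, pool in enumerate(pools):      # steps 5-6, one variable at a time
            protected = {s[i] for s in best}
            doomed = {s[i] for s in worst} - protected
            if doomed:
                pools[i] = [v for v in pool if v not in doomed]
                removed_any = True
        if not removed_any:                   # nothing left to discard safely
            break
    return pools

def rank_exhaustively(pools, cost, top=10):
    """Step 7: score every remaining combination and return the `top` best."""
    return sorted(itertools.product(*pools), key=cost)[:top]
```

In this sketch a value that appears in at least one of the best samples is never removed, so every pool stays non-empty. If a round removes nothing, the loop stops early and the caller has to decide whether the remaining space is small enough to enumerate; a smaller `alpha` makes the rule more selective.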
Add side chains and evaluate loops by energy criteria. In various tests of this algorithm, at the end of this step, 10 ² -10 ⁵ conformations were left for further processing.

【０１７４】Score function in the stochastic stage
The purpose of the stochastic stage is to create a population of loops that can potentially be closed. Using Equation 2, loops that remain open are eliminated. The method of the invention searches the conformational space using the cost function of Equation 2.

【０１７５】

【数８】(Equation 2)
where the distances d_i are shown in FIG. 15. They are calculated after the last joining residues from the N-terminus and the C-terminus have been placed. The values r_i^experimental are standard values such as the N-C bond length (Shenkin et al., Biopolymers 1987; 26: 2053-85).
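The formula of Equation 2 is not reproduced in this text, so its exact form is unknown here. The sketch below assumes one plausible form, a root-mean-square deviation of the closure distances d_i from their reference values r_i; this form, the function name `closure_cost` and the example numbers are all assumptions, chosen only to show how a closure score of this kind could be computed.

```python
import math

def closure_cost(distances, reference):
    """Assumed closure score: RMS deviation of the measured closure
    distances d_i from their reference values r_i (both in angstroms)."""
    if len(distances) != len(reference) or not distances:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((d - r) ** 2 for d, r in zip(distances, reference))
    return math.sqrt(squared / len(distances))

# Illustrative values only: a measured N-C closure distance of 1.60
# against a reference peptide bond length of about 1.33.
example = closure_cost([1.60], [1.33])
```

Under this assumed form the cost is zero when every closure distance matches its reference value and grows as the two halves of the loop fail to meet.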

【０１７６】Scoring of loops
Once fewer than M combinations remain (M is about 10^2 to 10^5), an exhaustive search is run to obtain the N lowest-energy conformations of the loop. To score the energy of the remaining loops, side chains were added to the loop using a recently updated version of the backbone-dependent rotamer library (Dunbrack and Karplus, J. Mol. Biol. 1993; 230: 543-574; Dunbrack and Karplus, Nat. Struct. Biol. 1994; 1: 334-340). A united-atom model was used for the atoms (Weiner et al., J. Amer. Chem. Soc. 1984; 106: 765-784). The backbone N-H and the polar hydrogens of the side chains are represented explicitly. The AMBER (Weiner and Kollman, J. Comp. Chem. 1981; 2: 287-303; Weiner et al., J. Amer. Chem. Soc. 1984; 106: 765-784) bonded and non-bonded energy terms were used with a distance-dependent dielectric constant of ε = 2r (Equation 3). Non-bonded energies are calculated for interactions with the backbone and with the rotamers of other residues. A 12-10 potential was used for hydrogen bonds between all polar hydrogens and possible acceptors. So as not to lose solutions that are satisfactory but may show some van der Waals clashes, the Lennard-Jones repulsion energy was truncated at 30 kcal/mol for any given atom pair.
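Two of the non-bonded ingredients named above, the capped Lennard-Jones repulsion and the distance-dependent dielectric ε = 2r, can be illustrated as follows. The functional forms are assumptions (a generic 12-6 expression written with the minimum-energy distance `r_min` and well depth, and a Coulomb term with the conversion constant 332.06 kcal·Å/(mol·e²)); the patent's own Equation 3 is not reproduced in this text, and the function names and parameter values are hypothetical.

```python
def lennard_jones_capped(r, r_min, well_depth, cap=30.0):
    """12-6 Lennard-Jones energy in kcal/mol, truncated at `cap` so that a
    single close contact cannot dominate the score of a loop."""
    ratio6 = (r_min / r) ** 6
    energy = well_depth * (ratio6 * ratio6 - 2.0 * ratio6)
    return min(energy, cap)

def coulomb_distance_dependent(r, q_i, q_j, factor=2.0):
    """Coulomb energy in kcal/mol with dielectric eps = factor * r,
    which makes the energy fall off as 1 / r**2."""
    return 332.06 * q_i * q_j / (factor * r * r)
```

With the cap in place, a conformation with a mild clash receives a bounded penalty and can still be retained for later refinement, which is the motivation given in the text.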

【０１７７】

【数９】(Equation 3)
The function was used in mean-field form. As suggested by Bower et al. (J. Mol. Biol. 1997; 267: 1268-1282) and implemented in the SCWRL algorithm, every rotamer is given its probability from the backbone-dependent rotamer library. The interaction energy from Equation 3 is multiplied by the probability (p) assigned to the relevant rotamer (the rotamer probabilities sum to 1 for each residue). Rotamer-rotamer, rotamer-backbone and backbone-backbone interactions were all taken into account. The subset of residues having at least one atom within 10 Å of the native loop was included as a "template". If an atom in Equation 3 belongs to the backbone, its probability is p = 1. The bonded energy terms comprised stretching (Equation 4), bending (Equation 5) and torsional energy (Equation 6). The stretching energy (Equation 4) is calculated between the carbonyl carbon and the nitrogen that close the loop (d_1 between residues 2 and 3 in FIG. 15).
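The mean-field weighting described above can be illustrated with a small sketch. It assumes that a pair interaction is weighted by the product of the probabilities of the two partners (with p = 1 for backbone atoms) and that the contribution of a residue is the probability-weighted sum over its rotamers; the function names are hypothetical and the exact weighting used in the patent's Equation 3 is not reproduced in this text.

```python
def weighted_interaction(energy, p_a=1.0, p_b=1.0):
    """Pair energy weighted by the probabilities of the two interacting
    partners; a backbone atom keeps the default probability of 1."""
    return p_a * p_b * energy

def expected_residue_energy(rotamer_energies, rotamer_probabilities):
    """Probability-weighted mean of the energies of one residue's rotamers.
    The probabilities must sum to 1 for the residue."""
    if len(rotamer_energies) != len(rotamer_probabilities):
        raise ValueError("one probability is needed per rotamer")
    if abs(sum(rotamer_probabilities) - 1.0) > 1e-6:
        raise ValueError("rotamer probabilities must sum to 1")
    return sum(p * e for p, e in zip(rotamer_probabilities, rotamer_energies))
```

For a residue with three rotamers of energies -2.0, 1.0 and 4.0 kcal/mol and probabilities 0.5, 0.3 and 0.2, `expected_residue_energy` evaluates -1.0 + 0.3 + 0.8.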

【０１７８】

【数１０】(Equation 4)
The parameter k_b controls the stiffness of the bond spring, while r_0 defines its equilibrium length. To soften the energy function, a value of k_b = 100 was assigned. The bending energy is calculated as follows (Equation 5).

【０１７９】

【数１１】(Equation 5)
The parameter k_θ controls the stiffness of the angle spring, while θ_0 defines its equilibrium angle. Angle-bending parameters are assigned to each bonded triplet of atoms according to the atom types. Two kinds of triplets were used. The first was Cα-N-C (d_2 in FIG. 15), where Cα belongs to the preceding residue. The second was Cα-C-N (d_3 in FIG. 15), where Cα and C belong to the preceding residue.

【０１８０】The torsional energy is modeled by a periodic function (Equation 6).

【０１８１】

【数１２】(Equation 6)
The parameter A controls the amplitude of the curve, the parameter n controls its periodicity, and φ shifts the whole curve along the rotation-angle axis (τ). Torsion parameters are assigned to each bonded quartet of atoms according to the atom types. The torsional energy of the angle between the Cα-N-C-Cα atoms (d_4 in FIG. 15), that is, the energy "price" of deviating from a planar (180°) amide bond, is then calculated.
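The formulas of Equations 4 to 6 are not reproduced in this text. The sketch below assumes the harmonic and cosine forms commonly used in AMBER-type force fields, which match the parameter descriptions given (spring stiffness and equilibrium value for stretching and bending; amplitude, periodicity and phase for torsion). The function names and the numerical parameters in the usage lines are hypothetical illustrations, apart from k_b = 100, which is the value stated in the text.

```python
import math

def stretch_energy(r, r0, k_b=100.0):
    """Assumed harmonic bond term: k_b * (r - r0)**2."""
    return k_b * (r - r0) ** 2

def bend_energy(theta_deg, theta0_deg, k_theta):
    """Assumed harmonic angle term: k_theta * (theta - theta0)**2,
    with the deviation converted to radians."""
    delta = math.radians(theta_deg - theta0_deg)
    return k_theta * delta ** 2

def torsion_energy(tau_deg, amplitude, periodicity, phase_deg):
    """Assumed periodic torsion term: A * (1 + cos(n * tau - phase))."""
    angle = math.radians(periodicity * tau_deg - phase_deg)
    return amplitude * (1.0 + math.cos(angle))

# Illustrative use: penalty for an amide torsion of 170 degrees when the
# term has its minimum at the planar trans value of 180 degrees.
amide_penalty = torsion_energy(170.0, amplitude=10.0, periodicity=2, phase_deg=180.0)
```

With periodicity 2 and a phase of 180°, the assumed torsion term is zero at 180° (and at 0°) and largest at ±90°, so it penalizes departures from a planar amide bond as described in the text.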

【０１８２】All results were evaluated by comparison with the loop coordinates from high-resolution X-ray crystallography, using the coordinate root-mean-square deviation (cRMS) algorithm. The comparison was made only for the backbone N, Cα and C atoms, to allow comparison with other methods (van Vlijmen and Karplus, J. Mol. Biol. 1997; 267: 975-1001; Deane and Blundell, Proteins 2000; 40: 135-144).
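As a minimal illustration of the deviation measure, the sketch below computes a coordinate RMS over matched atoms (for example the backbone N, Cα and C of each loop residue). It deliberately omits the superposition step: both coordinate sets are assumed to be already expressed in the same frame, which is a simplification of this sketch and not a statement about the cRMS procedure used in the tests. The function name is hypothetical.

```python
import math

def coordinate_rms(predicted, reference):
    """RMS deviation between two equally long lists of (x, y, z) atom
    coordinates that are already in a common frame (no fitting is done)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((p - q) ** 2
                  for atom_p, atom_q in zip(predicted, reference)
                  for p, q in zip(atom_p, atom_q))
    return math.sqrt(squared / len(predicted))
```

Restricting the input lists to the same atom types for every method being compared keeps the resulting values comparable, which is why the text limits the comparison to N, Cα and C.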

【０１８３】Results and Discussion
The tests described above were intended to establish whether the new stochastic search method is also applicable to loop construction and whether it can be used to reconstruct structurally known loops of various sizes. The example used was a transmembrane protein. The only extensively studied experimental example is bacteriorhodopsin, which contains seven transmembrane helices and was recently studied by high-resolution crystallography (Luecke et al., J. Mol. Biol. 1999; 291: 899-911). The search was applied to this structure (X-ray result at 1.55 Å resolution, PDB file 1c3w). The six loops of bacteriorhodopsin are shown in Table X. Loop 3 (CD, intracellular) and loop 4 (DE, extracellular) contain two residues and one residue, respectively, and are not interesting test cases. For loop 5 (EF, intracellular), the coordinates are not included in the entry, so the quality of the result cannot be assessed clearly. The remaining loops, 1 (AB, intracellular), 2 (BC, extracellular) and 6 (FG, extracellular), are attractive test cases of 4 to 16 residues. To avoid bias, the 1c3w.pdb entry was not included when building the database of residue (φ; ψ) angles used in the stochastic search. The RMS values ranged from 0.28 to 2.46 (Table XI), with a mean of 1.35. It is encouraging that the algorithm gives a very small RMS for small loops and an RMS no greater than 2.46 even for very large loops.

【０１８４】Next, the effectiveness of stochastic loop prediction for globular proteins was compared with the recent reports of Deane and Blundell (Proteins 2000; 40: 135-144) and of van Vlijmen and Karplus (J. Mol. Biol. 1997; 267: 975-1001). Deane and Blundell used an ab initio loop construction method. Their algorithm selects polypeptide fragments from a computer-generated database; each fragment is defined by a representative set of eight (φ; ψ) pairs. The set of fragments is scored and ranked using the RMS fit to the anchor regions and a knowledge-based energy function. van Vlijmen and Karplus searched a database of 130 loops from 21 proteins. The best loop among the many candidates was determined by the CHARMM (Gunsteren and Karplus, J. Comp. Chem. 1980; 1: 266-274) non-bonded energy function (without electrostatics) applied to the backbone and C(β) atoms. The method of the invention was tested on the seven longest of their 11 loops. Table XII compares our results with those reported for the two methods. The mean RMS value was 1.86, with a range of 1.06 to 2.99, compared with a mean of 2.3 (0.3 to 5.2) for van Vlijmen and Karplus and a mean of 2.1 (1.3 to 3.2) for Deane and Blundell. These lower mean RMS values clearly demonstrate the superior quality of the method of the invention.

【０１８５】In addition, the algorithm yields a large collection of low-energy loop conformations, which can further be used to assess loop properties (for example, flexibility) and, when known loops from the PDB are reconstructed, to compare with the crystallographic temperature factors of the loop.

【０１８６】Several basic questions arose during this study. The first concerned the accuracy of the approximation of using standard bond lengths and angles. To examine it, the method of the invention was applied to the first loop of bacteriorhodopsin (see below). The RMS value between the predicted and experimental backbones was 0.280. The true experimental dihedral angles were then added to the angle database and all other dihedral angles were deleted, so that the only option left to the method was to build according to the experimental dihedral angles. If the remaining angles and bond lengths were close to the experimental ones, an RMS value of 0 would be expected. However, an RMS value of 0.204 was obtained. This shows that the approximation has a small but non-negligible effect. This must be taken into account, especially when building large loops, where the accumulation of errors can distort the results.

【０１８７】The second question concerned the accuracy of the approximation of removing φ; ψ angle pairs that differ by less than 2° (in both angles) from another pair for the same residue. The above test was repeated with a slight change, namely with all experimental dihedral angles increased by 2°. Surprisingly, an RMS value of 0.198 was obtained. Repeating the same test with all dihedral angles decreased by 2° gave an RMS value of 0.220. Such small differences indicate that the approximation is adequate.

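【０１８８】Global optimization of loop structures is a difficult task because of the strong dependence between variables: a change in a single φ or ψ angle can cause a substantial conformational change in the whole loop. The question therefore arose whether the method of the invention, which had successfully positioned protons and side chains, could create a population of loops that satisfy the geometric criterion defined in Equation 2. The first loop of bacteriorhodopsin was used again. It is not too large for an exhaustive search, yet it still poses a combinatorial challenge. The 10,000 "lowest cost function" conformations of Equation 2 are shown in FIG. 17. Both searches started from 54,330,000 combinations. The two search approaches reached the same global minimum. The first 66 conformations had identical cost values (RMS). The worst error (RMS, at the 2018th solution) was 3.36%, a difference in cost value of 0.006721 Å. This test demonstrates that the method of the invention can search small and large loops effectively and obtain meaningful results for them. These results strongly support the reliability of the method of the invention in solving biomolecular problems of this kind.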
This is a difficult task due to the strong dependencies between variables. That is, one φ or ψ
Changes in angle can cause significant conformational changes throughout the loop. Therefore, the question raised concerns whether the method of the present invention, which successfully locates the protons and side chains, can generate a population of loops according to the geometric criteria defined in equation 2. did. Again, the first loop of bacteriorhodopsin was used. This is not extremely large for exhaustive searches, while still presenting combinatorial challenges. The 10,000 "minimum cost function" conformations of Equation 2 are shown in FIG. Both searches began with 54,330,000 combinations. The same global minimum was achieved by two different search approaches. The first 66 conformations had the same cost value (RMS). The worst error (RMS, 2018th solution) was 3.36%, and the difference in cost value was 0.006721Å. This test demonstrates that the method of the invention can effectively search for small and large loops and obtain significant results for these loops. These results strongly support the certainty of the method of the invention in solving this type of biomolecular problem.

【０１８９】The quality of the results should be examined with respect to the goal of achieving a negligible RMS relative to experiment. The basic assumption here is that the lower the energy, the better the RMS. Like any other tool, RMS has its own limitations. The user must consider which atoms to superimpose. Table XIII compares RMS values obtained when different atoms were chosen for superposition. Calculating the loop RMS over N, Cα and C gave a mean RMS value of 1.86. Adding the carbonyl oxygen raised the mean RMS value to 2.10. Adding the protein residues attached to the loop raised the mean RMS value to 2.62. One might expect that including these two residues would lower the RMS value, since their coordinates are "correct". However, the opposite is observed, at least in our test cases. In other words, when the atoms of the loop are superimposed, RMS ignores the rest of the protein, and geometric factors (for example, the bond lengths and dihedral angles between the loop and the protein) are disregarded. A small RMS value does not necessarily indicate that the internal loop geometry is acceptable; one should verify that the predicted loop has a proper geometry (that it is not left open) and does not clash with the "known" part of the protein. This phenomenon can be explained by the RMS superposition mechanism. The RMS function translates and rotates the predicted loop to overlap the known loop while ignoring the rest of the protein. If the protein structure is thought of as made of wire, the "wire loop" can be bent (by the prediction) without changing the internal loop coordinates, so RMS = 0.0 can be achieved for residues m to n. On the other hand, this bending can cause a large deviation from the protein structure, so that when the "correctly predicted loop" is attached to the protein, the RMS increases considerably because of the deviation of other residues from their positions in the protein.

【０１９０】Section 4: Examples of Other Biological Problems
The preceding sections gave detailed test results illustrating the effectiveness of the invention in solving a number of difficult biological problems. This section describes a number of other such problems and how they can be solved by the invention.

【０１９１】Homology modeling. Homology modeling of an unknown protein structure based on a protein known from X-ray or NMR studies requires "insertions" and "deletions" of peptide fragments, as well as mutations, relative to the known structure. The homologous parts of the target (the protein to be built) are superimposed residue by residue on the homologous parts of the known protein. Other parts may differ in length; such differences are regularly encountered in the loops, β-hairpins and random-coil parts of known proteins. Each such operation requires a re-evaluation of the backbone coordinates in the non-homologous parts, because of the differences in length ("insertions" and "deletions") and, at the least, of the positions of side chains near the parts of the structure that were added or removed. Building a model with the initial, unmodified fixed backbone helps in any planning of mutations in a known protein structure. Substantial progress toward solving this important problem has already been achieved by the method of the invention.

【０１９２】In the present invention, the effects of "insertions" and "deletions" can be handled by applying the above approach to loop construction, with or without simultaneous determination of the side-chain positions.

【０１９３】For this problem, several further aspects (for example, the use of various force fields to evaluate the energy, and the use of alternative rotamer libraries with or without statistical weighting) can also be incorporated into the invention.

【０１９４】Ring closure of peptides, peptidomimetics and other cyclic structures. Many studies have shown the potential therapeutic importance of cyclic ("conformationally constrained") peptides, as opposed to biologically unstable linear peptides (Hruby, Life Sci. 1982; 31: 189; Alstein et al., J. Biol. Chem. 1999; 274: 17573). Theoretical studies of the conformations of such peptides (Keasar and Rosenfeld, Folding and Design 1998; 3: 379; Tieleman et al., Biophys. J. 1999; 76: 1757) depend heavily on the choice of the "correct" ring closure, because of the high barriers between peptide conformations. Many such cyclic peptides have been studied in solution by NMR (Baysal and Meirovitch, Biopolymers 1999; 50: 329).

【０１９５】Cyclization of active peptides and other linear molecules is one of the methods of choice for enhancing binding to biological receptors through the expected reduction in entropy loss, for increasing stability against digestion, and for strengthening specificity and selectivity. Preliminary modeling of the alternatives for ring closure greatly assists the design of such cyclic structures. This is a function of many variables, such as ring size, bond lengths, bond angles and other factors.

【０１９６】This problem is very similar to the loop structure prediction problem as treated by the invention. In general, cyclic peptides are smaller than loops, and very little "freedom" can be introduced into the conformational flexibility of the backbone and side chains. In addition, an exhaustive search of the ring-closure options requires relatively small increments of the phi and psi (backbone) angles.

【０１９７】Flexible docking of drug candidates to active sites. Computational methods for predicting the binding of a ligand to its target are generally based on a search for the most stable binding conformation of the complex (Strynadka et al., Nature Struct. Biol. 1996; 3: 233). In theory, this corresponds to the binding conformation observed crystallographically (Rosenfeld et al., Annu. Rev. Biophys. Biomol. Struct. 1995; 24: 677; Clark and Westhead, Comput. Aided Mol. Des. 1996; 10: 337; Abagyan and Totrov, J. Mol. Biol. 1994; 235: 983; Trosset and Scheraga, Proc. Nat. Acad. Sci. USA 1998; 95: 8011). However, low-energy conformations other than the global energy minimum of the complex may contribute to the binding affinity. Newer docking algorithms incorporate this assumption (Head et al., J. Phys. Chem. 1997; 101: 1609). Nevertheless, most flexible docking software, such as DOCK, AUTODOCK, FLEXX, GOLD and others, does not take the flexibility of the target protein or biomolecule into account, nor potential changes in the protonation (pKa) state or water content of the active site.

【０１９８】Flexible docking is essential for testing the ability of most molecules to bind to biomolecular targets at different positions and in variable binding modes. In the interaction of a flexible drug with a protein, structural (mainly conformational) changes can occur both in the drug and in the interaction site of the protein (the active site of an enzyme or the binding site of a receptor protein).

【０１９９】With respect to the present invention, this problem is an extension of the side-chain positioning and protein loop structure determination problems, but differs in that the drug must be moved with six degrees of freedom (translation plus rotation) relative to the active site of the biomolecule. The present invention must handle both the side-chain positioning and the loop (backbone change) prediction described above, with the further need to optimize the relative position of the biomolecular target and the ligand, the optimization being applied simultaneously to both the biomolecular target and the ligand.

【０２００】These additional degrees of freedom can optionally be introduced as variables, but they have special requirements. Moreover, optionally and more preferably, this problem is analyzed according to the method of the present invention by adding a further variable, namely the relative distance between the entities (for example, the drug and the active site of the biomolecule).

【０２０１】The variables therefore include distance and angle variables, for a total of six additional variables for translation and rotation. The present invention must handle both the side-chain positioning and the loop (backbone change) prediction described above, with the further need to optimize the relative position of the biomolecular target and the ligand, the optimization being applied simultaneously to both the biomolecular target and the ligand. The variables therefore preferably include distance and angle variables for a total of six such variables: three translations along the X, Y and Z coordinate axes and three rotations about the same axes.
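The six rigid-body variables can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the function names `rotation_matrix`, `apply_pose` and `pose_domains` are hypothetical, the rotations are taken about the X, Y and Z coordinate axes through the origin and applied in that order, and the discretization into value lists is a free choice of this example.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 matrix for rotation about X, then Y, then Z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Product Rz * Ry * Rx, written out element by element.
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Rotate points about the origin, then translate them.

    pose holds the six variables (tx, ty, tz, rx, ry, rz).
    """
    tx, ty, tz, rx, ry, rz = pose
    r = rotation_matrix(rx, ry, rz)
    return [
        (
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        )
        for x, y, z in coords
    ]


def pose_domains(shifts, angles_deg):
    """Six discrete value lists: three translations, then three rotations."""
    angles = [math.radians(a) for a in angles_deg]
    return [list(shifts) for _ in range(3)] + [list(angles) for _ in range(3)]
```

Treating each of the six pose variables as one more discrete variable lets a search handle them in the same way as side-chain rotamers or backbone angles; the relative distance mentioned in the text could be added as a seventh list in the same manner.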

【０２０２】Structural comparison of flexible molecules. The traditional RMS approach and other superposition methods (Lemmen et al., Pac. Symp. Biocomput. 1999; 482) are not suitable for comparing the very large range of conformations of flexible molecules.

【０２０３】Such comparisons allow an assessment of the probability that different molecules can bind to the same biomolecular site or target. Two different molecules may show similar binding affinities for an enzyme active site or a receptor. To discover candidates for the "bioactive conformation" of both such molecules, the method of the present invention enables the structural difference between such molecules to be optimized.

【０２０４】For the present invention, this problem is another conformational search, except that the function or quantitative parameter to be minimized is the RMS difference between the spatial positions of selected atoms in the two molecules.
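The quantity to be minimized can be written down directly. The following Python sketch assumes that both molecules are already expressed in the same coordinate frame and that the caller supplies the list of matched atom indices; the function name `rms_difference` and its argument layout are hypothetical and are not taken from the patent.

```python
import math


def rms_difference(coords_a, coords_b, atom_pairs):
    """Root-mean-square distance between selected atoms of two molecules.

    coords_a, coords_b: lists of (x, y, z) tuples, one per atom.
    atom_pairs: list of (index_in_a, index_in_b) for the selected atoms.
    """
    if not atom_pairs:
        raise ValueError("at least one atom pair is required")
    total = 0.0
    for i, j in atom_pairs:
        # Squared Euclidean distance between the paired atoms.
        total += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(atom_pairs))
```

In a search of the kind described here, this value would serve as the cost of one combination of conformational variables for the two molecules, with lower values indicating a closer structural match.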

【０２０５】Construction of molecules from fragments. This is a classical problem in structure-based drug design (Krygeer et al., Structure 1999; 7: 297) that has received considerable attention (Mizutani et al., J. Mol. Biol. 1994; 243: 310; Tomioka and Itai, J. Comput. Aided Mol. Des. 1994; 8: 347; Leach and Lewis, J. Comp. Chem. 1994; 15: 233). Several excellent approaches have been developed for studying the affinity of molecular fragments for biomolecules (for example, GRID (Wade et al., J. Med. Chem. 1993; 36: 140; Wade and Goodford, J. Med. Chem. 1993; 36: 148; Boobbyer et al., J. Med. Chem. 1989; 32: 1083)), but combining fragments into molecules that could become drug candidates requires an enormous computational effort. Here, too, a population of ligands should be followed rather than a single ligand.

【０２０６】This is a major problem in drug design: the positions of some molecular fragments are known from other studies, but combining those fragments into a specific and selective ligand is a complex task. A considerable number of programs (for example, GRID) (Wade and Goodford, J. Med. Chem. 1993; 36: 148-156) can indicate the best interaction positions between certain molecular fragments (for example, hydroxyl, amine and carbonyl) and the active site of a known protein structure. However, the number of potential combinations of such fragments for constructing a drug candidate or lead compound is very large and requires an optimization process. In this case the process must also be guided by chemical knowledge, in order to arrive at structures that can be synthesized and that are constrained by molecular weight, lipophilicity and the like.

【０２０７】This problem differs from all the others in that chemical knowledge of synthesis and molecular stability must be introduced into the evaluation of structures. In this case, the variables are the fragments whose positions in the "active site" have been optimized in advance, together with the spatial positions and orientations of the molecular "linking" fragments (for example, aliphatic, alicyclic and aromatic) that are used to assemble the fragments into a viable complete structure and to evaluate its interaction energy with the "active site" as well as its internal and hydration energies.

【０２０８】Folding of small proteins. In a recent short review on "The energy landscape in non-biological and biological molecules" (Fraunfelder and Leeson, Nature Struct. Biol. 1998; 5: 757), the authors conclude that "folding proteins have been selected by evolution so that their energy landscape resembles a funnel." These funnels are often steep, with small internal barriers (Wales et al., Nature 1998; 394: 758), but other funnel shapes exist. For those proteins that can fold without chaperonins (Horovitz, Curr. Opin. Struct. Biol. 1998; 8: 93), there may be many accessible conformations near the "global minimum" that could be important for studies of dynamics and function. The exploration of these funnels (Dill and Chan, Nature Struct. Biol. 1997; 4: 10) has become the "protein folding" problem, one of the major challenges of modern biology. After small models had been studied for many years, mainly by on-lattice simulation (Shaknovitch, Curr. Opin. Struct. Biol. 1997; 7: 29), more modern simulations attempt to fold peptide fragments or entire proteins (Dobson and Karplus, Curr. Opin. Struct. Biol. 1999; 9: 92). A very long simulation (1 μs) was recently carried out (Duan and Kollman, Science 1998; 282: 740), enabling the folding of a 36-residue protein. Monte Carlo methods (Hansmann and Okamoto, Curr. Opin. Struct. Biol. 1999; 9: 177) and other stochastic dynamics (Sanderowitz and Still, J. Comput. Chem. 1998; 19: 1294) remain in use. These energy-based methods do not readily find the native fold of proteins with more than 35 to 40 residues.

【０２０９】Protein folding has been a key problem in biophysics for the past twenty years. The method of the present invention can be applied to a set of relatively small proteins of 50 to 80 residues, depending on the primary structure of the protein. In addition to the "global" minimum, this approach can produce many other low-energy conformations that are close in energy to the global minimum and that contribute to the overall properties of the protein.
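The sample, score and discard cycle summarized in the abstract, followed by exhaustive evaluation of what remains, can be sketched in Python as follows. This is a minimal illustration under loudly stated assumptions: the function name `eliminate_and_enumerate`, its parameters, and in particular the discard rule (for each variable, drop the value whose sampled combinations have the worst mean cost) are simplifications chosen for this example and are not the criterion defined in the patent.

```python
import heapq
import itertools
import math
import random


def eliminate_and_enumerate(domains, cost, sample_size=2000,
                            max_exhaustive=1000, keep=10, seed=0):
    """Shrink per-variable value lists by sampling, then enumerate survivors.

    domains: list of lists, the allowed discrete values of each variable.
    cost:    function mapping a tuple of values to a number (lower is better).
    Returns the `keep` lowest-cost (cost, combination) pairs among survivors.
    """
    rng = random.Random(seed)
    domains = [list(d) for d in domains]  # work on copies

    while math.prod(len(d) for d in domains) > max_exhaustive:
        # Sample random combinations and accumulate cost per variable value.
        totals = [dict.fromkeys(d, 0.0) for d in domains]
        counts = [dict.fromkeys(d, 0) for d in domains]
        for _ in range(sample_size):
            combo = tuple(rng.choice(d) for d in domains)
            c = cost(combo)
            for i, v in enumerate(combo):
                totals[i][v] += c
                counts[i][v] += 1
        # Assumed discard rule: per variable, remove the sampled value
        # with the highest (worst) mean cost.
        for i, d in enumerate(domains):
            seen = [v for v in d if counts[i][v] > 0]
            if len(d) > 1 and seen:
                worst = max(seen, key=lambda v: totals[i][v] / counts[i][v])
                d.remove(worst)

    # Few enough combinations remain: evaluate every one of them.
    survivors = itertools.product(*domains)
    return heapq.nsmallest(keep, ((cost(c), c) for c in survivors))
```

A toy call such as `eliminate_and_enumerate([list(range(6)) for _ in range(8)], cost=sum)` starts from 6^8 combinations and returns a small population of low-cost combinations rather than a single answer, which mirrors the emphasis in the text on obtaining many low-energy conformations near the global minimum.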

【０２１０】In a small protein of about 50 residues, the variables are the phi and psi angles along the backbone (6 or 12 rotations, with the values of each angle differing by 60° or 30°, respectively) and the side-chain rotamers. For the backbone alone, with six rotations for each phi and psi angle, the size of the problem is 6^100, or about 10^78. With the additional rotamers that must be positioned simultaneously, the size of the problem increases to about 10^100. The resulting computation may therefore be complex, but it can be performed using the method of the present invention.
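The size of the backbone-only search space can be checked with a few lines of Python. The sketch assumes, as the text does, 100 backbone angles (phi and psi for about 50 residues) with an equal number of discrete values per angle; the helper name `log10_combinations` is hypothetical.

```python
import math


def log10_combinations(values_per_variable, n_variables):
    """Base-10 logarithm of values_per_variable ** n_variables."""
    return n_variables * math.log10(values_per_variable)


# 100 backbone angles with 6 values each (60 degree steps).
print(round(log10_combinations(6, 100), 1))
# 100 backbone angles with 12 values each (30 degree steps).
print(round(log10_combinations(12, 100), 1))
```

Evaluating the expression directly gives an exponent of roughly 78 for six values per angle and roughly 108 for twelve values per angle, which shows why exhaustive enumeration is out of reach for either discretization.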

【０２１１】The above description is intended only to serve as an example, and it will be appreciated that many other embodiments are possible within the spirit and scope of the present invention.

[Table 1]

[Table 2]

[Table 3]

[Table 4]

[Table 5]

[Table 6]

[Table 7]

[Table 8]

[Table 9]

[Table 10]

[Table 11]

[Table 12]

[Table 13]

[Brief Description of the Drawings]

[FIG. 1] A flowchart of an exemplary method according to the present invention.

[FIG. 2] A schematic block diagram of an exemplary system according to the present invention.

[FIG. 3] A flowchart of the hydrogen placement algorithm.

[FIG. 4] A molecule containing two carbonyls, one sp^2 amide and one hydroxyl, which together form one population. The two carbonyls (1, 2) act as acceptors; the hydroxyl contributes one non-trivial hydrogen (3) and two non-trivial lone pairs (4, 5); and the amide contributes one trivial hydrogen (6). Atom 3 and lone pairs 4 and 5 form one segment because they are bonded to the same oxygen.

[FIG. 5A] An exemplary initial two-dimensional matrix for the system of FIG. 4. The hydroxyl hydrogen (3) can form a hydrogen bond with either of the carbonyls (1, 2), and the hydroxyl lone pairs (4, 5) can form a hydrogen bond with the trivial hydrogen (6).

[FIG. 5B] The two-dimensional matrix after refinement. Because the two lone pairs of the hydroxyl are degenerate, either one of them can be omitted (5→6). The omitted lone pair is added automatically after the positions of the hydrogen and the first lone pair have been determined.

[FIG. 5C] A three-dimensional matrix created from the two-dimensional matrix in order to hold all possible combinations. Each combination is evaluated, and the best combination is the result.

[FIG. 6] An example of a "large" system: the initial two-dimensional matrix for a large biological system (for example, a protein). Because an attempt to create the three-dimensional matrix would probably exceed the capacity of the computer, the two-dimensional matrix is refined by removing high-energy components.

[FIG. 7] A "test" protein with 1186 amino acids. Of the 1186 amino acids, 13 are serines (shown as CPK models; 13 segments) and 1173 are glycines (0 segments). The stochastic search started with a total of 5.02 × 10^10 combinations, which were reduced to 2.7 × 10^3 after 204 iterations. These were then evaluated exhaustively, and the global minimum for the hydrogen positions was found.

[FIG. 8] A graph of the natural logarithm of the total number of possible combinations against the number of iterations in the pure "stochastic approach." Five proteins are shown.

[FIG. 9] Graphs of the energy distribution in the first to fourth iterations for 5PTI (A), 5RSA (B), 2MB5 (C) and 1NTP (D).

[FIG. 10] A ribbon diagram of trypsin (1NTP) and its polar residues. Many polar hydrogens form hydrogen bonds with water molecules, but the PDB file contains no coordinates for any water molecule.

[FIG. 11] A model of crambin (46 amino acid residues) used as a test case for comparing a full exhaustive search with a stochastic search in finding the 10,000 lowest-energy conformations. The crambin backbone is shown as a ribbon. Non-hydrogen atoms are shown as a ball-and-stick model.

[FIG. 12] A comparison of the stochastic search and the exhaustive search in finding the lowest-energy conformations for conformers 1 to 10,000. The deviation (%) between the two searches is the bottom curve.

[FIG. 13A] The percentage of angles in E. coli ribonuclease HI that can be detected. Of the 115 dihedral angles, 7 are not found in the rotamer library.

[FIG. 13B] The percentage of angles in E. coli ribonuclease HI detected by the stochastic algorithm.

[FIG. 14] The α values of a single residue with 2 to 29 possible rotamers for which elimination is highly probable. An α value (△) is assigned to each number of rotamers; the larger the number of rotamers, the smaller α becomes. For each given number of rotamers and α, the percentage of certainty has been calculated (□).

[FIG. 15] An example of a six-residue (0 to 5) loop. Residues 0 to 5 are part of a transmembrane helix. The search is conducted for the conformations of residues 1 to 4. The method of the present invention is used to search the conformational space of the loop and to find all possible loop-closure conformations as defined by Equation 2.

[FIG. 16] The definition of the dihedral angles. In the construction strategy, ψ of residue n is the ψ of the preceding residue on the N-terminal side.

[FIG. 17] The 10,000 "lowest cost function" conformations in the four-residue test case. The stochastic and exhaustive searches reached the same global minimum. The first 66 conformations are identical.

Continuation of front page
(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW
(72) Inventor: Goldblum, Amiram; Shimshon 20, 93501 Jerusalem, Israel
(72) Inventor: Glick, Meir; Sheshet Hayamin 2, 59503 Bat Yam, Israel
F-terms (reference): 5B056 BB64 BB65; 5B075 ND20 UU18

Claims

[Claims]

1. A method for thoroughly searching a combinatorial space, the space featuring a plurality of combinations, each combination consisting of a plurality of elements, the steps of the method being performed by a data processor, the method comprising the steps of: (a) providing a measurable quantitative parameter for each combination for determining whether a result of thoroughly searching the combinatorial space is successful; (b) selecting a plurality of combinations in the combinatorial space to form selected combinations; (c) calculating a value of the quantitative parameter for each of the plurality of selected combinations; (d) determining the effect of each element on the value of the quantitative parameter; and (e) retaining at least one combination according to the effect, to provide the result of thoroughly searching the combinatorial space.

2. The method of claim 1, further comprising the step, performed before step (a), of determining a structure for the plurality of combinations of the combinatorial space such that interactions exist between the elements.

3. The method of claim 2, wherein each element is a variable having a value, and the quantitative parameter is calculated according to the values of the variables for each combination and the interactions between the variables.

4. The method of claim 3, wherein the quantitative parameter is a cost function.

5. The method of claim 4, wherein each variable has one discrete value for any particular combination.

6. The method of claim 1 or 5, wherein step (e) further comprises: (i) rejecting a value if the value does not consistently improve the cost function; and (ii) rejecting each combination characterized by that value for the variable.

7. The method of claim 6, wherein step (e) further comprises: (iii) determining whether the number of remaining combinations is below a minimum number; and (iv) if the number of remaining combinations is above the minimum number, repeating steps (c) to (e) at least once.

8. The method of claim [...], wherein step (e) further comprises: (v) evaluating each remaining combination according to a parameter if the number of remaining combinations is below the minimum number.

9. The method of claim 8, wherein step (v) is performed by an exhaustive search of the remaining combinations.

10. The method of claim 6, wherein step (i) is performed by: (1) creating a plurality of combinations by randomly assigning values to the variables; (2) calculating the value of the cost function for each combination; and (3) determining that the effect of a value is negative if the value is found in a plurality of combinations having a value of the cost function that is below a predetermined minimum value.

11. The cost function, wherein the value for the variable is found in a plurality of combinations having a value for the cost function below a predetermined minimum value, and the value for the variable is above the predetermined desired value. 11. The step (3) is performed if found in a combination having the value for and the value for the cost function is determined to be below the predetermined minimum value. the method of.

12. The method of claim 6, wherein each value relates to a position of a polar proton in a biomolecule, and the cost function is a minimized energy calculation for the polar protons.

13. The method of claim 12, wherein the combinations are determined according to the steps of: (i) parameterizing the atoms of the biomolecule; (ii) classifying hydrogen atoms and lone pairs of electrons into trivial hydrogen atoms, polar hydrogen atoms and non-trivial lone pairs of electrons; and (iii) adding the trivial hydrogen atoms to each combination.

14. The method of claim 13, wherein the cost function is a pairwise non-bonded energy function calculated according to:
[Equation 1]
where A_{i,j} is the repulsion parameter for two atoms (i, j), B_{i,j} is the polarizability attraction parameter, q_i is the partial charge, r_{i,j} is the interatomic distance, and ε is the dielectric constant.

15. The method of claim 6, wherein each combination of the combinatorial space includes positions for the side chains of amino acids in a protein, and the cost function is a minimized energy calculation for the side chains of the amino acids.

16. The method of claim 15, wherein each combination is formed from rotamers for each side chain, and the step of forming the combinations includes eliminating rotamers that clash with the backbone of the protein.

17. The method of claim 15, wherein each element is a rotamer, and the effect of each element on the combinations is determined by sampling a plurality of combinations and, for each population, examining the distribution of the quantitative parameter across the sampled plurality of combinations.

18. The method of claim 6, wherein each combination of the combinatorial space includes a structure for a loop of a protein, and the cost function is a minimized energy calculation for the structure of the loop.

19. The method of claim 18, wherein the structure for the loop is determined according to multiple angle pairs between residues of the protein.

20. The method of claim 19, wherein a value for each angle pair is randomly selected to create each combination, each combination has an associated value of the cost function, and a value is removed if it contributes only to combinations whose associated value exceeds a predetermined threshold.

21. The method of claim 20, wherein step (e) further comprises evaluating each combination according to the presence or absence of clashes between side chains of amino acids in the loop, and rejecting the combination if such a clash exists.

22. The method of claim 6, wherein each combination of the combinatorial space includes structures for all loops in a target protein, the cost function is a minimized energy calculation for the structures of the loops, and the structures are determined according to multiple angle pairs between residues of the protein; step (a) comprises providing a predetermined structure for a known protein having homology to the target protein; and step (e) further comprises, after the loops have been evaluated by comparison with the structure of the known protein, evaluating each combination according to the presence or absence of clashes between the side chains of amino acids within the loops of the target protein and between the remaining portion of the target protein and the loops, and rejecting the combination if such a clash exists, such that the structure for the target protein is determined according to the structure for the known protein.

23. The method of claim 22, wherein a value for each angle pair is randomly selected to form each combination, each combination has an associated value of the cost function, and a value is eliminated if it contributes only to combinations whose associated value is higher than a predetermined threshold.

24. The method of claim 6, wherein each combination of the combinatorial space includes positions for a plurality of moieties within a cyclized molecule, the structure of the cyclized molecule is determined from the structure of a linear molecule, and the cost function is a minimized energy calculation for the positions of the moieties, obtained by comparison with the structure of the linear molecule.

25. The method of claim 6, wherein each combination of the combinatorial space includes an assembly of molecular fragments forming one molecule, the assembly connecting each molecular fragment at a predetermined position to at least one other molecular fragment, and the cost function is a minimized energy calculation for the predetermined positions of the molecular fragments in the structure of the assembly.

26. The method of claim 6, wherein each combination of the combinatorial space includes at least a portion of the structure of a first entity and of a second entity, each portion being defined by a variable for rotation about an angle, and the relation between the portions being defined by a relative distance between the first entity and the second entity, the relative distance being defined by variables for translation along coordinate axes, and the cost function is a minimized energy calculation for the interaction between the first entity and the second entity, for the distance, and for at least a portion of the first entity and the second entity.