JP3843260B2

JP3843260B2 - Method for constructing protein three-dimensional structures including induced fit, and use thereof

Info

Publication number: JP3843260B2
Application number: JP2002558170A
Authority: JP
Inventors: 秀明梅山; 克一郎小松
Original assignee: IN-SILICO SCIENCES, INC.
Current assignee: IN-SILICO SCIENCES, INC.
Priority date: 2001-01-19
Filing date: 2002-01-17
Publication date: 2006-11-08
Anticipated expiration: 2022-01-17
Also published as: JPWO2002057954A1; WO2002057954A1

Description

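Technical Field
The present invention relates to a method for constructing the three-dimensional structure of a protein including induced fit, and to uses thereof. More specifically, it relates to a protein structure construction method in which a plurality of structure sets of a target protein is created by using, as the reference protein structures, the structure of a reference protein together with a plurality of structures obtained by displacing its atomic coordinates. It also relates to a method for constructing the structure of a protein-ligand complex using such a structure set, and to a method for identifying the ligand-binding site of a protein, among others.
The three-dimensional structure of the target protein provided by the method of the present invention includes induced fit and is extremely useful for the molecular design of pharmaceuticals and agrochemicals.
Background Art
Information on a protein of known three-dimensional structure can be used to obtain an alignment with a target protein of unknown structure. On the basis of this alignment, the structure of the target protein can then be built by computer, a technique usually called homology modeling. The accuracy of structures built by homology modeling has improved remarkably in recent years, but many problems remain to be solved.
When this method is used to build the structure of a receptor protein, it is essential to preserve the space in which the ligand binds. In conventional structure-construction methods, however, the main chain or side chains of the built structure are packed into the space or binding site where the ligand should reside. The space is thereby blocked, so the ligand clashes with the receptor protein and cannot occupy its binding site.
Furthermore, in conventional methods for constructing the structure of a protein-ligand complex, the target receptor structure was sometimes not determined experimentally. In that case, the complex structure was obtained simply by docking the ligand into a receptor structure built by homology modeling and optimizing the result by molecular mechanics or molecular dynamics calculations. Likewise, studies using the Multiple Copy Simultaneous Search (MCSS) method did not consider normal vibration modes for the receptor protein structure. In particular, they ignored the long-period thermal fluctuations of the molecule, which consist mainly of vibrations on the picosecond time scale (hereinafter sometimes simply called "thermal fluctuation" or "molecular fluctuation").
In addition, some methods identify a protein's ligand-binding site from the electrostatic potential, which acts over long distances. Others construct protein-ligand complex structures on the basis of similar compounds. Both approaches have low reliability, and when no similar compound exists it has been difficult to derive a reliable protein-ligand complex structure.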
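Disclosure of the Invention
In view of the above situation, an object of the present invention is to provide methods for accurately constructing the three-dimensional structure of an arbitrary protein and of a protein-ligand complex, among others.
As a result of intensive studies, the present inventors found a way to greatly improve the accuracy of receptor protein structures. Suppose the coordinates of a reference protein are displaced along eigenvector directions obtained from normal mode analysis, and the receptor structure is built with reference to the displaced coordinates. The main chain and side chains are then not packed into the space or binding site where the ligand resides, and that space is not blocked. In other words, they found that multiple receptor protein models that take the thermal fluctuation of the molecule into account can be built on the basis of normal vibration modes.
They also found that a highly accurate protein-ligand complex structure accounting for thermal fluctuation can be constructed. This is done by docking a ligand into the receptor models built in this way and applying the molecular mechanics and molecular dynamics calculations of the Multiple Copy Simultaneous Search (MCSS) method to the docked ligand structure.
Furthermore, since protein-ligand complexes form in aqueous solution, the inventors concluded that hydrophobic interactions may be more important than electrostatic forces. They therefore placed solvent around and inside the protein and analyzed solvent behavior (diffusion and accumulation) by molecular dynamics. The sites where solvent accumulated on the protein, or where it diffused away only with difficulty, coincided with the ligand-binding sites.
The present invention was accomplished on the basis of these findings.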
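That is, the present invention provides (1) a method for constructing the three-dimensional structure of a protein including induced fit. In this method, an alignment between a reference protein and a target protein is derived, and the structure of the target protein is built on the basis of the alignment and the structural information of the reference protein. The method is characterized in that a plurality of structure sets of the target protein is created by using, as the reference protein structures, the structure of the reference protein and a plurality of structures obtained by displacing its atomic coordinates.
According to preferred embodiments of this invention, there are provided:
(2) the method according to (1) above, wherein the atomic coordinates of the reference protein are displaced by normal mode analysis; and
(3) the method according to (1) or (2) above, wherein the structure is built by (i) taking the Cα coordinates of the amino acids from the reference protein structure and optimizing them so as to minimize an objective function, (ii) adding the other main-chain atoms to the optimized Cα coordinates and optimizing the main-chain coordinates so as to minimize an objective function, and (iii) adding the other side-chain atoms to the optimized main-chain coordinates and optimizing them so as to minimize an objective function.
According to another aspect of the present invention, there is provided (4) a method for constructing the three-dimensional structure of a protein-ligand complex, characterized by the following steps: (i) a ligand is docked into the plurality of target protein structures obtained by the method according to any one of (1) to (3) above; (ii) an empirical molecular energy calculation of one target protein structure together with the ligand structure is performed as many times as there are target protein structures, wherein (iii) on the target protein side, the atomic coordinates of each structure are moved according to that structure's own potential energy gradient, and (iv) on the ligand side, the ligand's atomic coordinates are moved along the average of the calculated potential energy gradients; and (v) a ligand structure based on the plurality of target protein structures is thereby obtained.
According to a preferred embodiment of this invention, there is provided (5) the method according to (4) above, wherein the empirical molecular energy calculation adds either the initial Cα coordinates of the target protein as an optional harmonic function or a potential function restraining the main-chain torsion angles of the target protein.
According to another aspect of the present invention, there are provided two methods.
(6) A method for identifying the ligand-binding site of a protein, characterized by: (i) placing low-molecular-weight compounds around the protein structure; (ii) placing water molecules further around them and performing an empirical molecular energy calculation in aqueous solvent to obtain atomic coordinates of the protein and the compounds; and (iii) analyzing, from the obtained coordinates, the behavior of the compounds around and inside the protein to determine the ligand-binding site.
(7) A method for identifying the binding site of a protein-ligand complex, characterized by: (i) placing low-molecular-weight compounds around the structures of a protein and a ligand; (ii) placing water molecules further around them and performing an empirical molecular energy calculation in aqueous solvent to obtain atomic coordinates of the protein and the compounds; and (iii) analyzing, from the obtained coordinates, the behavior of the compounds around and inside the protein and the ligand to determine the binding site of the complex.
According to a preferred embodiment of this invention, there is provided (8) the method according to (6) or (7) above, wherein the behavior of the low-molecular-weight compounds is analyzed by cluster analysis of those compounds. Candidate ligand-binding sites are then ranked by the size of the resulting clusters to determine the binding site.
According to another aspect of the present invention, there is provided (9) a method for constructing the three-dimensional structure of a protein-ligand complex. A ligand is docked into the ligand-binding site identified by the method according to any one of (6) to (8) above, and the complex structure is obtained by empirical molecular energy calculation.
According to another aspect of the present invention, there are provided (10) a computer-readable recording medium, and a database, containing atomic coordinates that define the structure of a protein and/or a protein-ligand complex obtained by the method according to any one of (1) to (5) and (9) above.
According to another aspect of the present invention, there is provided (11) a drug molecule design method. Atomic coordinates defining a protein structure, taken from the recording medium or database according to (10) above, are used to identify, search for, evaluate or design a desired drug molecule on the basis of its interaction with the structures of drug candidate molecules.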
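Best Mode for Carrying Out the Invention
The present invention is described in more detail below. Unless otherwise stated, the following terms used in this specification have the meanings given here.
"Target protein" means any protein whose complete three-dimensional structure has not been determined by X-ray crystallography, NMR analysis or the like, and whose structure is to be constructed in the present invention. This includes proteins for which a partial structure has been solved but no complete structure is available. Receptor proteins, enzymes and the like of unknown structure are preferred as target proteins. "X-ray crystallography" here also covers electron and neutron diffraction analysis.
"Receptor protein" means a protein present in a cell that recognizes an exogenous substance or a physical stimulus and induces a response in the cell. A receptor protein can bind a ligand specifically. "Ligand" means a substance that binds specifically to a protein. Ligands include low-molecular-weight substances such as pharmaceutical and agrochemical molecules. They also include macromolecules such as antibodies and particular peptides or proteins that interact with proteins.
"Reference protein" means a protein whose detailed three-dimensional structure has already been determined by X-ray crystallography, NMR analysis or the like. It is consulted in order to construct the atomic coordinates defining the structure of the target protein. "Alignment" means establishing the correspondence between the amino acid sequences of two or more proteins.
"Atomic coordinates" describe a three-dimensional structure in space. They are the relative distances along three mutually perpendicular directions from an origin at some point in space, i.e. a vector of three numbers for each atom in the protein, excluding hydrogen atoms.
"Induced fit" means that the structure of a protein is flexible: when it binds a ligand such as a pharmaceutical or agrochemical molecule, its structure changes so as to bind the ligand better. A "three-dimensional structure including induced fit" is a structure generated by adding eigenvectors to the structure before induced fit. This assumes that the conformational change caused by induced fit can be expressed by eigenvectors, for example those obtained from normal mode analysis.
"Target protein-ligand complex" means a protein-ligand complex whose complete structure has not been solved by X-ray crystallography, NMR analysis or the like, and whose structure is to be constructed in the present invention. The protein component may itself have a structure obtained by X-ray crystallography, NMR analysis or the like. The term includes complexes for which a partial structure has been solved but no complete structure is available. It refers to the complex of a protein and the ligand bound to it.
"Multiple Copy Simultaneous Search (MCSS) method" originally means a method that determines the structure of a target protein-ligand complex from the structures of multiple ligand copies. In it, the receptor protein structure is obtained by empirical molecular energy calculation, i.e. molecular mechanics and molecular dynamics. In the present invention the term is used conversely, for a method that determines the complex structure from multiple protein structures and the structure of a single ligand.
"Empirical molecular energy calculation" means molecular mechanics and molecular dynamics calculations, both of which compute molecular energies from empirical potentials.
"MSAS (Maximum Solvent Accessibility of Side chain)" is a side-chain solvent accessibility ratio. For each amino acid in the protein, it is the ratio of its side chain's solvent-accessible surface area to the same area when the amino acid exists alone, outside any protein. Details of MSAS are described in K. Akahane, Y. Nagano and H. Umeyama, Chem. Pharm. Bull., 1989, 37(1), 86-92.
Methods I to III below can be carried out on a suitable computer capable of homology modeling, using a suitable program that executes the methods.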
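I. Method for Constructing a Three-Dimensional Structure Including Induced Fit
First, the method of the present invention for constructing a three-dimensional structure including induced fit is described.
Fig. 1 is a flowchart showing an example of this method.
The method proceeds in four steps:
Step I-10: The sequence of the target protein is input and a reference protein is selected for building the target structure. Atomic coordinates are obtained from the reference protein structure and optimized so as to minimize an objective function.
Step I-20: Normal mode analysis is performed on the optimized coordinates.
Step I-30: The reference protein's atomic coordinates are displaced along the eigenvector directions, and the resulting structures are added to the reference protein to form a reference protein set.
Step I-40: A set of target protein structures is built from the alignment information and each structure in the reference protein set, using a suitable homology modeling program such as FAMS.
In this way, the structure of the target protein including induced fit can be constructed accurately. Each step is described in more detail below.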
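Step I-10: Optimization of the initial coordinates of the reference protein
First, the amino acid sequence of the target protein is input and a reference protein is selected. The selection uses commonly used alignment software known per se. The atomic coordinates of the reference protein are then obtained from a suitable structure database. These coordinates lack hydrogen atoms, for example those bonded to the nitrogen atoms of the backbone, so hydrogens are generated if the normal mode analysis of step I-20 needs them. The coordinates are then optimized using an objective function constructed from the reference protein's coordinates.
The amino acid sequence of the target protein may be of any origin, for example a sequence registered in a database or one newly determined. Usable sequence databases include the following, which hold sequences from human (H. sapiens), fruit fly (D. melanogaster), nematode (C. elegans), yeast (S. cerevisiae), Arabidopsis (A. thaliana) and other organisms:
- GCRDb (The G-protein-coupled Receptor Database): http://www.gcrdb.uthscsa.edu/, described in detail in Bloom FE, "An Internet review: the complete neuroscientist scours the World Wide Web." Science 1996; 274(5290): 1104-9
- GPCRDB: http://www.gpcr.org/7tm/
- ExPASy: http://www.expasy.ch/cgi-bin/sm-gpcr.pl
- ORDB: http://ycmi.med.yale.edu/senselab/ordb/
- GenBank: ftp://ncbi.nlm.nih.gov/genbank/genomes/
- PIR: http://www-nbrf.georgetown.edu/pir/ (National Biomedical Research Foundation (NBRF))
- SwissProt: http://www.expasy.ch/sprot/sprot-top.html (Swiss Institute of Bioinformatics (SIB), European Bioinformatics Institute (EBI))
- TrEMBL and TrEMBLNEW (same URL and administrators as SwissProt)
- DAD: ftp://ftp.ddbj.nig.ac.jp (DNA Data Bank of Japan)
These databases are merely examples; any database in which protein amino acid sequences are registered can be used.
Structure databases from which the reference protein's atomic coordinates can be obtained include:
- PDB (Protein Data Bank): http://www.rcsb.org/pdb/
- CCDC (Cambridge Crystallographic Data Centre): http://www.ccdc.cam.au.uk/
- SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop
- CATH: http://www.biochem.ucl.ac.uk/bsm/cath
These databases can be used alone or in combination. SCOP and CATH divide structures into domain units (the units of tertiary structure within a protein).
FASTA or PSI-BLAST (Position-Specific Iterated BLAST) is preferably used as the alignment software. FASTA searches a structure database for sequences highly similar to the query sequence. It reports the match between the query sequence and each reference protein as an E-value, which expresses the statistical significance of the match. Details of FASTA are described in Pearson WR, "Effective protein sequence comparison." Methods Enzymol (1996); 266: 227-58.
PSI-BLAST performs profile alignment. Details of PSI-BLAST are described in Schaffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L and Altschul SF, "Matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices." Bioinformatics 1999, 12, 1000-11.
The method, coordinate system, objective function and so on used to optimize the reference protein's coordinates are not particularly limited. The steepest descent, conjugate gradient or Newton-Raphson method is preferably used.
- Steepest descent uses the numerically computed first derivative of the objective function to optimize the atomic coordinates against that function.
- Conjugate gradient has many variants. The Fletcher-Reeves method (Fletcher, R., and Reeves, C. M. (1964) Function Minimization by Conjugate Gradients. Comput J, 7: 149-154) is the standard one. It uses the first derivative of the objective function. When the objective function is an exact quadratic in n variables and exact line searches are used, it is guaranteed to reach the optimum in at most n iterations.
- Newton-Raphson uses second derivatives as well as first derivatives and is efficient when the initial structure is close to the optimized structure.
Details of these methods are given in 江口至洋, "Physical and Chemical Foundations of Protein Engineering" (Kyoritsu Shuppan, 1991), and the references therein.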
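The sketch below illustrates the Fletcher-Reeves conjugate gradient method described above. It is a minimal sketch that assumes a quadratic objective f(x) = 0.5 x^T A x - b^T x with a symmetric positive-definite matrix A, because that gives the line search an exact closed form. The function names and the 3-variable A and b are hypothetical test values, not parameters from this specification. A real force-field objective would need a numerical line search instead.

```python
# Fletcher-Reeves conjugate gradient on a quadratic objective
# f(x) = 0.5 * x^T A x - b^T x  (A symmetric positive definite).
# With exact line searches it reaches the minimum in at most n iterations.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def fletcher_reeves(A, b, x0, tol=1e-20):
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    d = [-gi for gi in g]                              # first direction: steepest descent
    iterations = 0
    for _ in range(len(x)):                            # n steps suffice for a quadratic
        gg = dot(g, g)
        if gg < tol:
            break
        Ad = matvec(A, d)
        alpha = gg / dot(d, Ad)                        # exact minimiser along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / gg                  # Fletcher-Reeves coefficient
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        iterations += 1
    return x, iterations


# Hypothetical 3-variable example
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, n_iter = fletcher_reeves(A, b, [0.0, 0.0, 0.0])
residual = max(abs(r - bi) for r, bi in zip(matvec(A, x), b))
print(n_iter, residual)
```

On this quadratic the loop stops after at most three iterations, which matches the n-step property quoted above.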
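Hereinafter, the structure and coordinates optimized as described above are called the optimized structure and the optimized coordinates, respectively.
Step I-20: Normal mode analysis of the optimized coordinates
The optimized coordinates created in step I-10 are used to displace the atomic coordinates. The displacement is preferably obtained by normal mode analysis, which gives an eigenvector for each eigenvalue. A coordinate system whose degrees of freedom are a subset of the optimized ones may be used; the optimization then also holds for that subset.
"Normal mode analysis" here means approximating the potential energy as a quadratic function of the displacements, solving the equations of motion exactly, and analyzing small vibrations around the optimized structure. An "eigenvalue" characterizes the frequency (and hence the period) of a small vibration, and an "eigenvector" gives its direction.
The eigenvalue equation to be solved in normal mode analysis is equation (1) or (2) below.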

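Here, ω_k is the eigenvalue, U_ik is the eigenvector, and δ_ij is the Kronecker delta. T_ij and V_ij are related to the kinetic energy E_k and the potential energy V, respectively, as given by equations (3) and (4) below.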

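Here, the derivatives in equations (3) and (4) are taken with respect to the coordinates q_j. A_jk is the coefficient linking the collective motion Q_k to the individual atomic motion q_j, as given by equation (5) below.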

Here, α_k and δ_k are determined by the initial conditions.
Details of the above normal mode analysis are described in Wilson, E. B., Decius, J. C., and Cross, P. C. 1955. Molecular Vibrations. McGraw-Hill.
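The eigenvalue problem can be sketched numerically as follows. Equations (1) to (5) are not reproduced in this text, so the sketch assumes the usual generalized form V U = T U Ω². Here V is the second-derivative matrix of the potential and T is the kinetic-energy matrix, and the problem is solved by Cholesky reduction. The function name and the three-mass toy system are hypothetical. A real protein calculation in dihedral coordinates, as in Example 1, would build V and T from the force field.

```python
import numpy as np


def normal_modes(V, T):
    """Solve V u = omega^2 T u.

    V: symmetric second-derivative (Hessian) matrix of the potential energy.
    T: symmetric positive-definite kinetic-energy matrix (the mass matrix in
       Cartesian coordinates; a non-diagonal matrix in dihedral coordinates).
    Returns angular frequencies (ascending) and eigenvectors as columns,
    normalised so that U^T T U = I.
    """
    L = np.linalg.cholesky(T)          # T = L L^T
    L_inv = np.linalg.inv(L)
    A = L_inv @ V @ L_inv.T            # equivalent standard symmetric problem
    lam, Y = np.linalg.eigh(A)         # eigenvalues omega^2, ascending
    U = L_inv.T @ Y                    # back-transform to the original coordinates
    omega = np.sqrt(np.clip(lam, 0.0, None))  # clip round-off below zero (rigid modes)
    return omega, U


# Toy system: three unit masses on a line joined by two unit springs.
V = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
T = np.eye(3)
omega, U = normal_modes(V, T)
print(omega)  # a zero (translation) mode plus two vibrations
```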
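Step I-30: Generation of new reference proteins
The eigenvalues and eigenvectors from step I-20 are used to calculate the positional fluctuation of the Cα atoms at a given temperature for a given eigenvalue, giving one set of fluctuations per eigenvalue. The temperature factors of the reference protein's Cα atoms are converted into positional fluctuations. For each Cα atom, the ratio between this fluctuation and the normal-mode fluctuation is calculated, and the ratios are averaged. There is one average ratio for each eigenvalue used. The eigenvector belonging to that eigenvalue is multiplied by the average ratio and added to the reference protein's coordinates from before structure optimization. The structure formed by these displaced coordinates, i.e. a structure including induced fit, becomes one of the reference protein structures. Such proteins, structures and coordinates are hereinafter called induced-fit reference proteins, structures and coordinates.
Induced-fit reference structures are likewise created with twice the average ratio. An eigenvector has a forward and a reverse direction, so each displacement is also made in reverse, with the eigenvector multiplied by -1. The number of induced-fit structures is therefore four times the number of eigenvalues used. The induced-fit and non-induced-fit reference protein structures together form the reference protein structure set.
The relationship between the temperature factor and the positional fluctuation is given by equation (6) below.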

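Here, B_i is the temperature factor of the atom taken from the PDB file, π is the circular constant, and D_i corresponds to the positional fluctuation. In the present invention, only Cα atoms are considered.
The ratio between the positional fluctuation from normal mode analysis and the fluctuation converted from the PDB temperature factor is given by equation (7) below.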

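Here, F_i^v is the positional fluctuation of the i-th atom for the v-th eigenvalue, obtained from normal mode analysis. In the present invention, it is computed only for Cα atoms.
The average of the ratios is given by equation (8) below.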

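Here, N is the number of atoms and the sum runs over atoms. M^v is the average ratio for the v-th eigenvalue. In the present invention, it is computed for Cα atoms.
The atomic coordinates of the induced-fit reference protein structures are given by equations (9) and (10) below.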

Here, C_ik^0 denotes the atomic coordinates of the reference protein, and V_ik^v denotes the components of the eigenvector belonging to the v-th eigenvalue.
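The generation of the induced-fit reference set in step I-30 can be sketched as below. Equations (6) to (10) are not reproduced in this text, so the sketch makes three assumptions: (i) the isotropic relation B_i = (8π²/3) D_i²; (ii) a per-atom ratio D_i / F_i^v, averaged into M^v; and (iii) displaced coordinates C^0 ± M^v V^v and C^0 ± 2M^v V^v. The function names and the two-atom data are hypothetical, and the coordinates passed in are meant to be the pre-optimization Cα coordinates.

```python
import math


def b_to_fluctuation(B):
    # ASSUMPTION: isotropic relation B = (8 * pi**2 / 3) * D**2, i.e. D is the
    # root-mean-square positional fluctuation in Angstrom.
    return math.sqrt(3.0 * B / (8.0 * math.pi ** 2))


def mean_ratio(b_factors, nma_fluct):
    # ASSUMPTION: per-atom ratio D_i / F_i^v, averaged over the N C-alpha atoms (M^v).
    ratios = [b_to_fluctuation(B) / F for B, F in zip(b_factors, nma_fluct)]
    return sum(ratios) / len(ratios)


def induced_fit_set(c0, modes, b_factors):
    """c0: reference C-alpha coordinates [(x, y, z), ...].
    modes: [(nma_fluct, eigvec), ...], eigvec given per atom as (x, y, z).
    Returns the non-induced-fit structure followed by 4 displaced ones per mode."""
    structures = [c0]
    for nma_fluct, eigvec in modes:
        m = mean_ratio(b_factors, nma_fluct)
        for scale in (m, -m, 2.0 * m, -2.0 * m):
            structures.append([
                tuple(c + scale * v for c, v in zip(atom, vec))
                for atom, vec in zip(c0, eigvec)
            ])
    return structures


# Hypothetical two-atom example: B chosen so that D = 1, NMA fluctuation 0.5 -> M = 2.
B = 8.0 * math.pi ** 2 / 3.0
c0 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
mode = ([0.5, 0.5], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
ensemble = induced_fit_set(c0, [mode], [B, B])
print(len(ensemble), ensemble[1][0])
```

With 118 modes the function returns 1 + 4 × 118 = 473 structures. This matches the 472 induced-fit structures plus one non-induced-fit structure reported in Example 1.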
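Step I-40: Modeling of the target protein
A set of target protein structures is built with reference to the reference structure set from step I-30, using a suitable homology modeling program such as FAMS. One target structure is built for each reference structure. The result is the non-induced-fit target structure plus induced-fit target structures numbering four times the eigenvalues used. Together these form the target protein structure set, i.e. the three-dimensional structures including induced fit.
Next, each step of FAMS is described as a suitable example of a modeling (structure-building) technique. The numbers of iterations, constants, cutoff values and so on in steps I-41 to I-43 below are examples of the parameters the inventors consider most preferable. They do not limit the scope of the present invention in any way. Details of FAMS are described in Koji Ogata and Hideaki Umeyama, "An automatic homology modeling method consisting of database searches and simulated annealing", Journal of Molecular Graphics and Modelling 18, 258-272, 2000.
Step I-41: Construction and optimization of initial Cα coordinates
The reference protein set from step I-30 and the alignment information are used to find the target residues that have insertions or deletions relative to the reference proteins. Gap-free regions in which three or more consecutive residues are aligned are selected. Within those regions, each target Cα atom is given the same coordinates as the corresponding reference residue. Where Cα coordinates cannot be obtained this way, coordinates are fitted from a previously prepared fragment database (see Fig. 2).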
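Below is a minimal sketch of selecting the gap-free aligned regions (three or more consecutive aligned residues) and copying the reference Cα coordinates into them. Alignments are assumed to be equal-length strings with '-' for gaps. The function names are hypothetical, and the fragment-database fallback for the remaining residues is not shown.

```python
def ungapped_blocks(target_aln, ref_aln, min_len=3):
    """Find runs of at least `min_len` consecutive aligned residue pairs.

    target_aln, ref_aln: equal-length aligned sequences with '-' for gaps.
    Returns (target_start, ref_start, length) using 0-based residue indices
    of the ungapped sequences.
    """
    blocks = []
    t_idx = r_idx = 0
    run_start, run_len = None, 0
    for a, b in zip(target_aln, ref_aln):
        if a != "-" and b != "-":
            if run_len == 0:
                run_start = (t_idx, r_idx)
            run_len += 1
        else:
            if run_len >= min_len:
                blocks.append((run_start[0], run_start[1], run_len))
            run_len = 0
        t_idx += int(a != "-")   # advance residue counters past real residues only
        r_idx += int(b != "-")
    if run_len >= min_len:
        blocks.append((run_start[0], run_start[1], run_len))
    return blocks


def copy_ca(blocks, ref_ca, n_target):
    """Assign reference C-alpha coordinates inside the blocks; None elsewhere."""
    target_ca = [None] * n_target
    for t0, r0, length in blocks:
        for k in range(length):
            target_ca[t0 + k] = ref_ca[r0 + k]
    return target_ca


blocks = ungapped_blocks("ACDE-FGHIK", "ACD-LFG-IK")
print(blocks)  # [(0, 0, 3)]
```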
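In this specification, the Cα atom is the central carbon atom of each amino acid's backbone. The Cβ atom is the carbon bonded to Cα on the side-chain side, and the Cγ atom is the carbon bonded to Cβ on the side-chain side. The C atom is the carbonyl carbon.
Step I-41(1): Construction of Cα atoms by simulated annealing
The Cα atoms created in step I-41 are optimized by simulated annealing, using a function constructed from the reference protein coordinates. This objective function is given by equation (11) below.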

Here, U_len concerns the Cα-Cα distances between sequence-adjacent residues and between Cys residue pairs, and is defined by equation (12) below.

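Here, D_{i,i+1} is the Cα-Cα distance between residues i and i+1, and D_i^ss is the distance between a pair of Cys residues forming a disulfide bond. K_l and K_ss are constants set to 2 and 5, respectively.
U_ang is a function of the Cα bond angles, given by equation (13) below.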

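Here, θ_i (rad) is the angle formed by the Cα atoms of residues i, i+1 and i+2. θ_0 is set to (100/180)·π rad on the basis of X-ray structures in the PDB. K_a is a constant set to 1.
U_pos is a function of the Cα positions, given by equation (14) below.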

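Here, ‖·‖ denotes the norm, and M_i is the mean distance between Cα atoms at structurally equivalent positions in the structure-based alignment. When M_i cannot be determined for residue i, it is set to 10. The averaged Cα coordinate used in equation (14) is given by equation (15) below.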

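Here, X^j_i is the Cα coordinate of the i-th residue of the j-th reference protein, and W^j_i is the weight of that residue. The weight is an important parameter because it determines the overall shape of the target protein. It is set from a local value called the local space homology (LSH), computed over the spatial neighborhood within 12 Å of the site of interest (see Fig. 3). As Fig. 4 shows, LSH correlates very strongly with the proportion of residue pairs located in structurally conserved regions (SCRs). Statistically, a high LSH value therefore means that the Cα atom lies within 1.0 Å of its position in the reference structure.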
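Equation (15) is not reproduced here, so the following sketch assumes it is the W-weighted mean of the reference Cα positions, <X_i> = Σ_j W_i^j X_i^j / Σ_j W_i^j. The LSH-based computation of the weights is not shown, and the function name is hypothetical.

```python
def weighted_mean_position(coords, weights):
    """coords[j]: (x, y, z) of residue i's C-alpha in reference protein j.
    weights[j]: the weight W_i^j of that reference (ASSUMED weighted-mean form)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one reference must carry a non-zero weight")
    return tuple(
        sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3)
    )


print(weighted_mean_position([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, 2.0]))
```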
U_vdw is given by equation (16) below.

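Here, K_vdw is set to 0.01 when D_i,j < 3.2 Å and to 0.001 when D_i,j > 3.2 Å, with a cutoff of 6 Å.
The Cα atoms are optimized by simulated annealing according to equation (11). The perturbation of each Cα atom is limited to within 1.0 Å. The annealing step is performed 100 times for every Cα atom. The parameter corresponding to temperature starts at 25, is multiplied by 0.5 at each step until it reaches 0.01, and is then held constant.
These two major stages, acquisition of structural information and construction of the Cα atoms, are repeated 10 times. The Cα coordinates with the lowest objective function value are taken as the optimal solution.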
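The annealing protocol of step I-41(1) can be sketched as Metropolis simulated annealing with the temperature schedule given above and a best-of-10 restart. Equation (12) is not reproduced here, so the toy objective assumes a harmonic U_len = K_l Σ (D_{i,i+1} - 3.8)² with K_l = 2, 3.8 Å being the usual Cα-Cα spacing. It omits the other terms of equation (11). The 1.0 Å limit is read as a bound on each trial move. The function names and the five-residue starting chain are hypothetical.

```python
import math
import random


def temperature_schedule(t0=25.0, factor=0.5, t_min=0.01, n_steps=100):
    """Temperature parameter: starts at t0, multiplied by `factor` each step,
    then held at t_min once it would fall below it."""
    temps, t = [], t0
    for _ in range(n_steps):
        temps.append(t)
        t = max(t * factor, t_min)
    return temps


def u_len(chain, k_l=2.0, d0=3.8):
    # ASSUMPTION: harmonic form K_l * (d - d0)**2 over sequence-adjacent C-alpha pairs.
    return sum(k_l * (math.dist(a, b) - d0) ** 2 for a, b in zip(chain, chain[1:]))


def anneal(chain, energy, rng, max_step=1.0, n_steps=100):
    chain = [tuple(p) for p in chain]
    e = energy(chain)
    half = max_step / math.sqrt(3.0)  # per-axis bound keeps each move within max_step
    for temp in temperature_schedule(n_steps=n_steps):
        for i in range(len(chain)):
            old = chain[i]
            chain[i] = tuple(c + rng.uniform(-half, half) for c in old)
            e_new = energy(chain)
            # Metropolis criterion: always accept downhill, sometimes uphill
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
                e = e_new
            else:
                chain[i] = old
    return chain, e


def best_of_runs(initial, energy, n_runs=10, seed=0):
    """Repeat the annealing n_runs times and keep the lowest-energy result."""
    rng = random.Random(seed)
    runs = [anneal(initial, energy, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[1])


start = [(3.0 * i, 0.0, 0.0) for i in range(5)]  # hypothetical 5-residue chain
best_chain, best_e = best_of_runs(start, u_len)
print(u_len(start), best_e)
```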
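Step I-42: Construction and optimization of main-chain atomic coordinates
The other main-chain atoms are added to the Cα coordinates from step I-41(1), and the objective function is minimized by simulated annealing. First, the Cα atoms are superposed, and residues whose Cα-Cα distance is 2.5 Å or less are selected. The main-chain atoms other than Cα are taken from the reference protein so as to minimize the Cα-Cα distance, forming the model structure.
When the reference proteins contain no corresponding residue, the main-chain coordinates are built from a corresponding four-residue protein fragment in the database. The main-chain atoms of residue i are taken from the fragment with the lowest rmsd over the Cα atoms of residues i-1 to i+2. For the N-terminal residue, the Cα superposition range is residues i to i+3. For the C-terminal residue and the residue preceding it, the ranges are i-3 to i and i-2 to i+1, respectively.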
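The choice of superposition window for the four-residue database fragments can be written as a small helper. Residue indices are assumed to be 0-based, and the function name is hypothetical.

```python
def superposition_window(i, n_res):
    """Inclusive 0-based residue range whose C-alpha atoms are superposed when
    choosing a 4-residue database fragment for the main chain of residue i."""
    if n_res < 4:
        raise ValueError("need at least 4 residues")
    if i == 0:                 # N-terminal residue: i .. i+3
        return (0, 3)
    if i == n_res - 1:         # C-terminal residue: i-3 .. i
        return (i - 3, i)
    if i == n_res - 2:         # residue preceding the C terminus: i-2 .. i+1
        return (i - 2, i + 1)
    return (i - 1, i + 2)      # general case: i-1 .. i+2


print([superposition_window(i, 6) for i in range(6)])
```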
The main-chain atoms are optimized by simulated annealing against the main-chain objective function.
The objective function is given by equation (17) below.

U_bond is given by equation (18) below.

Here, b_i^0 is the standard bond length, which depends on the type of chemical bond. K_b is a constant set to 225.
U_ang is a function of the bond angles, given by equation (19) below.

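Here, θ_i is the i-th bond angle, whose reference value depends on the type of chemical bond. K_a is a constant set to 45.
U_{non-bond} is a function of the non-bonded interactions, given by equation (20) below.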

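Here, ε_i,j and r_i,j^* are constants that depend on the atom types.
K_non is a constant set to 0.25, and the cutoff is 8 Å.
U_ss is a function of the disulfide bonds formed by Cys residues, given by equation (21) below.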

Here, K^ss_Cα and K^ss_Cβ are constants set to 7.5.
U_pos is a function of the atomic positions, given by equation (22) below.

Here, <W_iX_i> is given by equation (23) below.

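<W_iX_i> in equation (22) is obtained from the structural superposition between the target protein and the reference proteins.
K_pos is a constant set to 0.3.
U_tor concerns the main-chain torsion angles and is given by equation (24) below.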

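Here, φ_i^0 and ψ_i^0 are the φ_i and ψ_i of the nearest torsion angles on the Ramachandran map. ω_i^0 is 0, except for cis-Pro residues, where it is π (radian). K_t and K_ω are constants set to 10 and 50, respectively.
U_chi concerns the chirality at Cα and is given by equation (25) below.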

Here, τ_i is the torsion angle defined by N-Cα-Cβ-C, and K_chi is set to 50.
U_hydr concerns main-chain hydrogen bonds conserved among homologous proteins and is defined by equation (26) below.

A hydrogen bond is assigned when the N-O distance is 2.9 ± 0.5 Å.
When there are multiple reference proteins, a hydrogen bond is judged to exist if it is present in 75% or more of them. K_hydr is a constant set to 0.6.
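The hydrogen-bond criteria above can be sketched as follows: an N-O distance of 2.9 ± 0.5 Å, present in at least 75% of the reference proteins. The function names and coordinates are hypothetical, and equation (26) itself is not reproduced.

```python
import math


def is_hbond(n_xyz, o_xyz, d0=2.9, tol=0.5):
    """N-O distance within d0 +/- tol Angstrom."""
    return abs(math.dist(n_xyz, o_xyz) - d0) <= tol


def conserved_hbond(pairs, threshold=0.75):
    """pairs: one (N, O) coordinate pair per reference protein for the same
    residue pair; conserved if present in >= threshold of the references."""
    present = sum(is_hbond(n, o) for n, o in pairs)
    return present / len(pairs) >= threshold


yes = ((0.0, 0.0, 0.0), (3.0, 0.0, 0.0))
no = ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0))
print(conserved_hbond([yes, yes, yes, no]))
```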
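Next, the main-chain atoms including Cβ are optimized by simulated annealing. During this annealing, the main-chain and Cβ atoms are kept within 1.0 Å of their initial positions. The annealing step is performed 200 times for these atoms. The temperature parameter starts at 50 or 25, is multiplied by 0.5 at each step until it reaches 0.01, and is then held constant.
To sample main-chain conformations widely, the method of the present invention preferably performs the above procedure six times. The main-chain coordinates with the lowest objective function value are taken as the optimal solution. The temperature parameter starts at 50 for the first two runs and at 25 from the third run onward.
Step I-43: Construction and optimization of side-chain atomic coordinates
Side-chain construction has two main stages: "side-chain construction at structurally conserved sites" (step I-43(1)) and "construction of all side chains" (step I-43(2)).
Step I-43(1): Side-chain construction at structurally conserved sites
For the computed main-chain atoms, side-chain torsion angles are obtained from homologous proteins using a method from the inventors' previous work. Details of this method are described in Ogata K and Umeyama H, "The role played by environmental residues in side-chain torsional angles within homologous families of proteins: A new method of side chain modeling." Prot. Struct. Funct. Genet. 1998, 31, 255-369.
This method calculates the proportion of conserved side chains among the homologous proteins and models the side chains from this information. At sites where side chains are conserved, the side-chain atoms are placed on the fixed main-chain atoms. For example, if the χ^1 angle of an arginine residue is conserved among homologous proteins, its Cγ atom can be placed. If both χ^1 and χ^2 of a Phe residue are conserved, all of its side-chain atoms can be placed. The simulated annealing with equation (17) is then applied only to the main-chain and Cβ atoms, with atomic perturbations limited to 1.0 Å. This annealing is performed 200 times. The temperature parameter starts at 25 and is multiplied by 0.5 at each step until it reaches 0.01. U_{non-bond} in equation (17) is evaluated over the main-chain atoms and the partially built side-chain atoms, and the side-chain coordinates are kept unchanged throughout the optimization.
The structural information M_i and the N-O pairs of hydrogen bonds are used in the optimization. To obtain the main-chain arrangement, the process is repeated three times, and the main-chain coordinates with the lowest objective function value are taken as the result.
Step I-43(2): Construction of all side chains
Side chains are built on the fixed main-chain and Cβ atoms. This uses the work disclosed in Ogata K and Umeyama H, Prot. Struct. Funct. Genet. 1998, 31, 255-369, cited above, which yields accurate models in a short time. Next, the main chain is optimized by a low-temperature Monte Carlo method at a temperature of 0.001. The objective function is U_{non-bond} of equation (17), evaluated over all main-chain and side-chain atoms. While the N, Cα, C and Cβ atoms are optimized, the side-chain coordinates are repositioned so that the side-chain torsion angles stay in their optimized state. Atomic perturbations are limited to 0.5 Å. The side chains are then deleted and the side-chain construction is repeated. This cycle continues until no atom pair is closer than 2.4 Å and every N-Cα-Cβ-C torsion angle lies within -120 ± 15°.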
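The termination test for the side-chain rebuilding loop can be sketched with a standard dihedral-angle routine. The test requires that no atom pair is closer than 2.4 Å and that every N-Cα-Cβ-C torsion lies within -120 ± 15°. The function names are hypothetical, and the caller is assumed to decide which atom pairs count as non-bonded.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = [a - b for a, b in zip(p0, p1)]
    b1 = [a - b for a, b in zip(p2, p1)]
    b2 = [a - b for a, b in zip(p3, p2)]
    n1 = math.sqrt(sum(c * c for c in b1))
    b1 = [c / n1 for c in b1]
    # components of b0 and b2 perpendicular to the central bond
    d0 = sum(x * y for x, y in zip(b0, b1))
    d2 = sum(x * y for x, y in zip(b2, b1))
    v = [a - d0 * c for a, c in zip(b0, b1)]
    w = [a - d2 * c for a, c in zip(b2, b1)]
    cross = [b1[1] * v[2] - b1[2] * v[1],
             b1[2] * v[0] - b1[0] * v[2],
             b1[0] * v[1] - b1[1] * v[0]]
    x = sum(a * c for a, c in zip(v, w))
    y = sum(a * c for a, c in zip(cross, w))
    return math.degrees(math.atan2(y, x))


def no_clashes(pairs, cutoff=2.4):
    """pairs: non-bonded atom pairs ((x, y, z), (x, y, z)) chosen by the caller."""
    return all(math.dist(a, b) >= cutoff for a, b in pairs)


def torsions_ok(angles, target=-120.0, tol=15.0):
    # wrap each difference into [-180, 180) so that e.g. 240 deg equals -120 deg
    return all(abs((a - target + 180.0) % 360.0 - 180.0) <= tol for a in angles)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```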
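Step I-44: Construction of the final structure
In this way, atomic coordinates defining the non-induced-fit and induced-fit structures of any target protein can be obtained.
II. Method for Constructing the Three-Dimensional Structure of a Protein-Ligand Complex
Next, the method for constructing the structure of a protein-ligand complex, another aspect of the present invention, is described with reference to the drawings. Fig. 5 is a flowchart showing an example of this method, i.e. of constructing a complex structure that includes induced fit.
First, in step II-10, modeled atomic coordinates of the target protein are obtained. Normal mode analysis of the optimized reference protein yields the normal modes. The atomic coordinates, obtained mainly by experiment, are displaced along the eigenvector directions to create a set of multiple reference proteins. The structure of the target protein is then built by homology modeling with reference to those coordinates.
In step II-20, the ligand is docked into the resulting target protein structures. In step II-30, empirical molecular energy calculations by the MCSS method are performed on the ligand docked into the set of target proteins, simulating the structure of the desired protein-ligand complex. The resulting complex structure includes the induced fit of the target protein, i.e. its periodic thermal motion (molecular fluctuation). It can be used for high-accuracy molecular design of pharmaceuticals and agrochemicals.
Each step is described in more detail below.
Step II-10: Modeling of the target protein
Modeling of the target protein consists of three steps: II-11, optimization of the initial coordinates of the reference protein; II-12, normal mode analysis of the optimized coordinates; and II-13, modeling of the target protein. These steps are carried out in the same manner as steps I-10 to I-44 above. The result is a set of structures based on the normal vibration modes, i.e. target protein structures that include induced fit.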
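Step II-20: Docking of the ligand into the target protein
The ligand is docked into the multiple structural models of the target protein that take normal modes into account. It is placed at the position considered to be the target protein's ligand-binding site. This step uses commercial software that can read and write PDB-format files, for example BIOCES (NEC), Cerius2 (Accelrys), SYBYL (Tripos) or HyperChem (Hypercube). Docking is generally done by rotating and translating the ligand on a display capable of stereo viewing. Docking that includes a simplified energy calculation may also be used.
The ligand-binding site used is not particularly limited; either a known binding site or a newly identified one can be used. For proteins whose ligand-binding site is unknown, the site can be identified by the method described in III below.
Step II-30: Optimization of the target protein-ligand complex structure
For the complex models obtained in step II-20, an empirical molecular energy calculation of one target protein structure with the ligand is performed once for each target protein structure. On the target protein side, each structure's atomic coordinates move according to its own potential energy gradient. On the ligand side, the ligand's atomic coordinates move along the average of the calculated potential energy gradients. This yields a ligand structure based on the multiple target protein structures.
Step II-30 is performed, for example, by the Multiple Copy Simultaneous Search (MCSS) method. First, the complex structures sharing the ligand are optimized simultaneously by empirical molecular energy calculation (molecular mechanics). Their coordinates are then relaxed by empirical molecular energy calculation (molecular dynamics), for example at 300 K for 10 ps. Finally, the coordinates are optimized again by molecular mechanics. The temperature and duration may of course vary with the system being calculated.
The MCSS method was proposed by A. Miranker and M. Karplus (Proteins, 1991, 11, 29-34) as a technique for optimizing the structures of both the receptor protein and the ligands using multiple ligand copies. Empirical molecular energy calculations for each ligand and the protein are performed simultaneously. Because the gradients on the receptor protein are averaged, the receptor moves as a single structure.
In contrast, the method of the present invention uses multiple structures on the protein side and a single structure on the ligand side, and determines the ligand structure from the multiple protein structures. In the empirical molecular energy calculation, one protein structure and the ligand are calculated together, once for each protein structure. The ligand's atomic coordinates move along the average of the calculated potential energy gradients. Each target protein structure, meanwhile, moves according to its own potential energy gradient. The result is a ligand structure based on the multiple target protein structures.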
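The difference from the original MCSS scheme can be illustrated with a one-dimensional steepest-descent toy. Each protein copy moves independently, while the single ligand takes the averaged move. The energies E_k = ½(p_k - a_k)² + ½(x - p_k)² are hypothetical stand-ins for the force field. This is not the apricot-MCSS program, and all names are hypothetical.

```python
def mcss_variant(prot, lig, grad_p, grad_x, step=0.05, n_iter=2000):
    """Each protein copy k follows the gradient of its own energy E_k;
    the single ligand follows the average of the K ligand gradients."""
    prot = list(prot)
    for _ in range(n_iter):
        gx = [grad_x(k, p, lig) for k, p in enumerate(prot)]  # one per copy
        gp = [grad_p(k, p, lig) for k, p in enumerate(prot)]
        prot = [p - step * g for p, g in zip(prot, gp)]        # independent moves
        lig -= step * sum(gx) / len(gx)                        # averaged ligand move
    return prot, lig


anchors = [0.0, 1.0, 2.0]  # hypothetical starting positions of three protein copies


def grad_p(k, p, x):
    # dE_k/dp_k for E_k = 0.5*(p - a_k)**2 + 0.5*(x - p)**2
    return (p - anchors[k]) - (x - p)


def grad_x(k, p, x):
    # dE_k/dx for the same toy energy
    return x - p


prot, lig = mcss_variant(anchors, 5.0, grad_p, grad_x)
print(prot, lig)
```

For this toy energy the stationary point is x = mean(a_k) = 1 and p_k = (a_k + x)/2, which the loop reaches.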
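The empirical molecular energy calculation is not particularly limited and may use any method known per se. The apricot-MCSS program is preferred; it is an improved version of the apricot program developed by the inventors (Yoneda, S., and Umeyama, H., J Chem Phys 1992; 97: 6730-6736). An AMBER-type potential function (S. J. Weiner, P. A. Kollman, D. A. Case, U. Chandra Singh, C. Ghio, G. Alagona, S. Profeta, Jr., P. Weiner, J. Am. Chem. Soc., 1984, 106, 765-784) with the parm89a RevA parameters is preferably used as the empirical potential. Other empirical potentials can of course also be used.
In the molecular dynamics calculation, a harmonic restraint on the Cα positions, for example as in equation (27) below, is preferably added to the usual energy terms. It keeps the initial structure of the target protein from deteriorating significantly. This matters because it compensates for the coarseness of the computational approximations. The restraint is not limited to this form and may, for example, be extended to the entire main chain.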

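Here, Uxyz is the restraint potential energy applied to the Cα positions of the target protein. x0 is the original Cα coordinate, x is the updated coordinate, and Kxyz is a parameter that controls how strongly the atoms are restrained. Kxyz = 10.0 kcal/mol/Å^2 was used here. This value and the form of the equation are only examples and do not limit the scope of the present invention.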
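Equation (27) is not reproduced in this text. The sketch below assumes the common harmonic form U_xyz = K_xyz Σ_i |x_i - x0_i|² and returns the energy and its gradient. The function name is hypothetical.

```python
def restraint(coords, ref, k_xyz=10.0):
    """ASSUMED form U = K * sum_i |x_i - x0_i|**2 (kcal/mol, coordinates in Angstrom).
    Returns the energy and the gradient dU/dx_i = 2 K (x_i - x0_i) per atom."""
    energy = 0.0
    grad = []
    for x, x0 in zip(coords, ref):
        d = [a - b for a, b in zip(x, x0)]
        energy += k_xyz * sum(c * c for c in d)
        grad.append(tuple(2.0 * k_xyz * c for c in d))
    return energy, grad


e, g = restraint([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)])
print(e, g)
```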
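Instead of restraining the X, Y and Z coordinates of the Cα atoms, the main-chain torsion angles of the target protein may be restrained, as in equation (24). Adding such a torsion-angle potential to the empirical molecular energy calculation can likewise keep the initial structure from deteriorating significantly.
Thus, using induced-fit structural models of the target protein yields atomic coordinates of the target protein-ligand complex that take molecular fluctuation into account.
When the ligand is itself a protein, induced fit on the ligand side can be handled in the same way. A ligand-protein complex structure is then built from multiple ligand structures that include its normal modes and a single protein structure.
III. Method for Identifying the Ligand-Binding Site of a Protein
Next, the method for identifying the ligand-binding site of a protein, another aspect of the present invention, is described. Fig. 6 is a flowchart showing an example of this method. It also shows how the ligand is bound at the identified site to construct the protein-ligand complex structure.
In step III-10, the binding site between the protein and the ligand is identified (predicted). Low-molecular-weight compounds, for example nonpolar solvent molecules, are generated around and inside the protein and/or ligand, for example on hydrophobic surfaces. Many water molecules are added around them, and a molecular dynamics calculation is performed in what is effectively aqueous solution. The binding site is then searched for from the behavior of the compounds (e.g., nonpolar solvent) on the protein and/or ligand surfaces. In step III-20, the protein and ligand are docked using the putative binding sites from step III-10 as a guide, which gives initial atomic coordinates for the complex. In step III-30, water molecules are generated around the initial complex structure from step III-20. The complex structure is then refined by molecular mechanics and molecular dynamics in effectively aqueous solvent.
Each step is described in more detail below.
Step III-10: Identification of the ligand-binding site of the protein
Identification of the protein-ligand binding site consists of three steps:
III-11: generation of low-molecular-weight compounds around the protein and/or ligand;
III-12: exploration of the behavior of the compounds (e.g., nonpolar solvent) by empirical molecular energy calculation (molecular mechanics and molecular dynamics) of the protein and/or ligand in aqueous solvent; and
III-13: determination, from the behavior of the compounds, of the ligand-binding site on the protein and/or the protein-binding site on the ligand.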
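Step III-11: Generation of low-molecular-weight compounds around the protein and/or ligand
First, water molecules are generated around the protein and/or ligand. Those around the protein, around the ligand, and in interior regions that the compounds can enter are then replaced with low-molecular-weight compounds. The compounds may be placed over the entire surroundings. Alternatively, they may be placed only around amino acids or functional groups that are hydrophobic or capable of hydrogen bonding. When the ligand is a macromolecule such as a peptide or protein, compounds are also generated around the ligand, and their behavior is analyzed by empirical molecular energy calculation just as for the protein. When the ligand has a low molecular weight, as pharmaceutical and agrochemical molecules do, its hydrophobic regions and so on can be recognized directly, so its binding site usually need not be identified. When the ligand is a macromolecule, however, its binding site must be analyzed in the same way as the protein's to identify the binding site of the complex.
The low-molecular-weight compound is not particularly limited. It may be a nonpolar solvent such as ethane, cyclopentane or benzene, a hydrogen-bonding solvent such as N-methylacetamide or benzamide, or a pharmaceutical or agrochemical compound. Because their orientation is arbitrary, symmetric compounds are preferred. A nonpolar solvent identifies binding sites of proteins and ligands that have hydrophobic portions. A hydrogen-bonding solvent with an acid amide group identifies portions that can hydrogen-bond to that group, i.e. ligand-binding sites that include exposed β-sheet edges or oxyanion holes. A pharmaceutical or agrochemical molecule identifies portions to which that molecule can bind specifically.
Specifically, when a nonpolar solvent such as benzene is placed around a protein, water molecules within 3.5 Å of the surface formed by residues with an MSAS of 30% or more may be replaced with benzene. A water molecule need not be replaced if the replacement would bring two benzene molecules within 1.5 Å of each other. All water molecules not replaced by benzene are then deleted. These replacement criteria are one example for the case of benzene and do not limit the scope of the present invention.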
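The benzene-placement rule for step III-11 can be sketched as below. The sketch makes three assumptions: "within 3.5 Å of the surface" is measured to the atoms of residues with MSAS ≥ 30%; MSAS values are precomputed; and each replaced water's position becomes a benzene center. The function names and coordinates are hypothetical.

```python
import math


def place_benzenes(waters, residues, msas_min=0.30, surf_cut=3.5, min_sep=1.5):
    """waters: list of water positions (x, y, z).
    residues: list of (msas, [atom (x, y, z), ...]) with MSAS as a fraction.
    Returns the benzene centres; every water not replaced is simply dropped."""
    surface = [a for msas, atoms in residues if msas >= msas_min for a in atoms]
    benzenes = []
    for w in waters:
        near_surface = any(math.dist(w, a) <= surf_cut for a in surface)
        crowded = any(math.dist(w, b) < min_sep for b in benzenes)
        if near_surface and not crowded:
            benzenes.append(w)
    return benzenes


residues = [(0.5, [(0.0, 0.0, 0.0)]), (0.1, [(10.0, 0.0, 0.0)])]
waters = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 1.0, 0.0), (10.0, 3.0, 0.0), (0.0, 3.0, 0.0)]
print(place_benzenes(waters, residues))
```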
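Step III-12: Exploration of compound behavior by empirical molecular energy calculation of the protein and/or ligand in aqueous solvent
Water molecules are generated under periodic boundary conditions around the protein (and/or ligand) and compound coordinates created in step III-11. The structure is optimized by molecular mechanics, an empirical molecular energy calculation, and a molecular dynamics calculation follows. Afterwards the water molecules are removed, giving the atomic coordinates of the protein (and/or ligand) and the compounds. For example, when benzene is used as the compound, about 10 to 20 ps of molecular dynamics at 300 K suffices. During this run the compounds diffuse or accumulate around and inside the protein. Analyzing this behavior with the method of step III-13 below identifies the ligand-binding site on the protein side and the protein-binding site on the ligand side.
The empirical molecular energy calculation is not particularly limited, but the apricot program developed by the inventors, with an AMBER-type empirical potential function, is preferred. Other empirical potentials can of course also be used.
Step III-13: Determination of the ligand-binding site from compound behavior
Cluster analysis is performed on the distribution of the compounds (e.g., nonpolar solvent) around the protein and/or ligand obtained in step III-12. Sites where the ligand docks readily into the protein are then judged from the sizes of the resulting clusters.
Cluster analysis is a multivariate method that groups a data set in a multidimensional space into clusters according to the similarity (or dissimilarity) between its members. Here, the Euclidean distances between the centers of the nonpolar solvent molecules are calculated in three-dimensional space; for benzene, the center is the average of the six carbon coordinates. Molecules within a threshold distance of each other are clustered, starting from the closest pairs. Unlike ordinary cluster analysis, the decision to merge existing clusters does not use distance from the cluster centroid. Instead, two clusters are merged if their closest pair of molecules lies within the threshold. A threshold of 6 Å was used for benzene, but this value is merely an example and does not limit the scope of the present invention.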
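The clustering described above merges groups whenever their closest members are within the threshold, which is single-linkage clustering. At a fixed threshold, the result equals the connected components of the graph linking centers within 6 Å. The sketch below therefore uses a union-find over all pairs instead of an explicit closest-pair-first merge order. The function names and coordinates are hypothetical.

```python
import math


def centroid(atoms):
    """Centre of a solvent molecule, e.g. the mean of benzene's six carbons."""
    n = len(atoms)
    return tuple(sum(a[k] for a in atoms) / n for k in range(3))


def single_linkage(points, threshold=6.0):
    """Connected components of the graph joining points within `threshold`,
    i.e. single-linkage clusters cut at `threshold`; largest cluster first."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


pts = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(single_linkage(pts))  # [[0, 1, 2], [3]]
```

The first and third points are 10 Å apart, yet they share a cluster through the middle point, which is the chaining behavior of single linkage.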
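When benzene is used, for example, the molecules fall into several clusters. The larger a cluster, the more likely it is to be a docking site for a ligand or protein. Each cluster's shape can be represented by an ellipsoid, whose long and short axes are found by solving an eigenvalue problem on the coordinates. Protein-side and ligand-side clusters are docked to each other with reference to these axes, generating several protein-ligand complex models. Complexes in which the protein and ligand overlap are removed automatically. The protein and ligand arrangement in each docked model is then fine-tuned with the software described in step II-20.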
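The ellipsoid axes of a cluster can be obtained from the eigenvectors of the covariance matrix of its benzene centers. The text says only that an eigenvalue problem on the coordinates is solved, so using the covariance matrix here is an assumption. The function name is hypothetical.

```python
import numpy as np


def cluster_axes(centres):
    """Principal axes of a cluster of solvent centres (sequence of (x, y, z)).
    Returns variances along the axes (descending) and unit axes as columns;
    column 0 is the long axis of the fitted ellipsoid."""
    X = np.asarray(centres, dtype=float)
    X = X - X.mean(axis=0)
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


vals, axes = cluster_axes([(-5.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)])
print(vals, axes[:, 0])
```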
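Step III-20: Docking of the ligand into the protein
Sites that formed large clusters of the compound (e.g., benzene) in step III-13 are docked to each other, giving the initial data for the protein-ligand complex structure. The compound (e.g., benzene) data are removed for docking.
This step can be performed with commercial software that reads and writes PDB-format files. Docking is generally done by rotating and translating the ligand on a display capable of stereo viewing. Docking that includes a simplified energy calculation may also be used.
Step III-30: Construction of the protein-ligand complex structure
Water molecules are generated under periodic boundary conditions around the initial complex coordinates from step III-20. The initial structure is optimized by molecular mechanics, and a molecular dynamics calculation follows. Removing the water molecules from the final frame of the trajectory gives the three-dimensional structure of the protein-ligand complex.
The molecular dynamics method is not particularly limited; it may be run, for example, at 300 K for about 10 to 20 ps. The program is not particularly limited either, but apricot, developed by the inventors, with an AMBER-type empirical force field is preferred. The program and force field are merely examples and do not limit the scope of the present invention.
In short, this approach reflects the fact that protein-ligand complexes form in aqueous solution. The accumulation and diffusion of the compounds (e.g., nonpolar solvent) in water reveal the hydrophobic surfaces of the protein and ligand. Docking these surfaces to each other yields more precise atomic coordinates of the complex than before.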
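IV. Recording Medium and Database Containing Atomic Coordinates that Define Protein Three-Dimensional Structures
A structure database of target proteins can be built by storing, on a suitable recording medium in a computer-usable format, the atomic coordinates that define the protein or protein-ligand complex structures obtained by the above methods. The database of the present invention may preferably also contain the alignment information between the reference and target proteins. If desired, it may further contain code numbers, information on the reference regions of the reference protein, information on the target protein, Cα-Cα distances and the like.
In the present invention, "database" also means a computer system in which the atomic coordinates are written to a suitable recording medium and searched according to a predetermined program. Suitable recording media include magnetic media such as floppy disks, hard disks and magnetic tape; optical discs such as CD-ROM, MO, CD-R and CD-RW; and semiconductor memory.
V. Drug Molecule Design Method
Drug molecules such as pharmaceuticals and agrochemicals can be designed on a computer running a suitable program. The program uses all or part of the structural coordinates, obtained by the above methods, of a protein targeted by drug molecules (hereinafter sometimes called the "target protein"). It may also use all or part of the coordinates stored in a database or recording medium. With these, drug molecules (antagonists or agonists) that interact with the target protein can be identified, searched for, evaluated or designed.
Drug molecules are identified, searched for, evaluated or designed on the basis of whether, and how strongly, they interact with the structural coordinates obtained by the method of the present invention. In this specification, these activities are sometimes simply called molecular design of drugs.
Any computer configured to run a suitable program can be used for molecular design based on the interaction between protein coordinates and drug candidate coordinates, and its storage medium is not particularly limited. Programs for molecular design include, for example, InsightII from Accelrys. Programs developed specifically for this purpose, such as Ludi and DOCK, used alone or in combination, make it easier to identify, search for, evaluate or design drug molecules. Docking of drug molecules against the protein coordinates can be evaluated with software such as NEC's BIOCES, described in step II-20 above.
Any drug molecule whose three-dimensional structure can be obtained can be used in the method of the present invention, whether known or newly synthesized with a novel chemical structure. Its structural coordinates may come from any method, such as X-ray crystallography or modeling. Already determined coordinates can be obtained from suitable databases, for example CCDC (Cambridge Crystallographic Data Centre: http://www.ccdc.cam.ac.uk/) or PDB (Protein Data Bank: http://www.rcsb.org/pdb/).
Drug molecules can also be designed from the target protein structure by, for example, the method described in Japanese Unexamined Patent Publication No. 2000-178209. The protein coordinates obtained by the method of the present invention thus enable computer-based molecular design of drugs. The molecular design method of the present invention is not, however, limited to these programs or techniques.
Molecular design of drugs usually involves two conceptual stages: finding lead compounds, then optimizing them. Both stages can be carried out by methods known per se using the structural coordinates of the target protein, yielding optimal pharmaceutical or agrochemical candidate molecules.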
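VI. Method for Screening Pharmaceutical and Agrochemical Candidate Molecules Obtained by the Molecular Design Method
Candidate molecules identified, searched for, evaluated or designed by the above methods can be obtained according to their properties, for example by chemical synthesis methods known per se. The drug molecule may be a natural or a synthetic compound, and a high- or a low-molecular-weight compound. The activity of each candidate is then examined by in vitro or in vivo pharmacological or physiological tests using methods known per se. Selecting candidates with the desired activity yields molecules that can actually be applied as pharmaceuticals or agrochemicals.
VII. Method for Producing Pharmaceutical and Agrochemical Compositions
Drug molecules selected by the above screening method, for example pharmaceutical molecules, can be administered alone to patients with the disease to be treated. One or more such active ingredients can also be administered in combination. Preferably, the substance is formulated into a pharmaceutical composition with pharmacologically acceptable additives and administered in that form. Oral forms include tablets (sugar-coated if necessary), capsules, granules, fine granules, powders, pills, microcapsules, liposome preparations, troches, sublingual tablets, solutions, elixirs, emulsions and suspensions. Parenteral forms include injections prepared as sterile aqueous or oily solutions, suppositories, ointments and patches. These can be produced by mixing the substance with physiologically acceptable carriers, flavoring agents, excipients, vehicles, preservatives, stabilizers, binders and the like, in the unit dosage form required by accepted pharmaceutical practice. Methods well known in the art, such as filling or tableting, are then used. The amount of active ingredient in these compositions is chosen so that a suitable dose within the indicated range is obtained.
An agrochemical molecule for practical use is mixed by known methods with carriers or diluents, additives, adjuvants and the like. It is then prepared in formulations (compositions) commonly used for agrochemicals, such as dusts, granules, wettable powders, emulsifiable concentrates, water-soluble powders and flowables.
Examples
The present invention is described more concretely below with reference to examples. The examples are intended as an aid to a concrete understanding of the invention and do not limit its scope in any way.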
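Example 1: Construction of the three-dimensional structure of the β2-adrenergic receptor
Following the method detailed in steps I-10 to I-40 above, a three-dimensional structure including induced fit was built for the human β2-adrenergic receptor as follows. Fig. 7 shows the flowchart.
The structural models were built on an NEC workstation (model: Express5800/120Rc-2; CPU: Pentium III 933 MHz × 2; OS: Red Hat Linux 6.2J; memory: 1024 Mbytes). The amino acid sequence of the target β2-adrenergic receptor was obtained from PIR (http://www-nbrf.georgetown.edu/pir/), ID: QRHUB2.
Using this sequence as the target protein sequence, an alignment was performed with PSI-BLAST (Position-Specific Iterated BLAST). All 892 sequences in GCRDb (http://www.gcrdb.uthscsa.edu/) were used for the motif profile. The amino acid sequence of the β2-adrenergic receptor is shown in SEQ ID NO: 1.
The reference protein structure was chain B of PDB (http://www.rcsb.org/pdb/) entry 1F88 (rhodopsin), and the alignment against this chain was obtained. The sequence of chain B of 1F88 is shown in SEQ ID NO: 2, and the alignment in Fig. 8. The 1F88 crystal lattice contains a dimer of chains A and B with nearly identical structures, and chain B was used as the reference. The coordinates of both chains have large gaps and are incomplete. The 1F88 structure was therefore modeled with FAMS, described in step I-40 above, and the resulting structure served as the reference protein structure for the β2-adrenergic receptor.
PDB files and FAMS do not include hydrogen atoms on the relevant residues. Hydrogens were therefore generated on the appropriate residues of this reference structure, giving the initial coordinates used as input for normal mode analysis.
As in steps I-10 to I-20 above, the initial coordinates were first optimized in Cartesian coordinates. They were then re-optimized in Cartesian coordinates with some of the SS-bond potential parameters set to zero. Finally, normal mode analysis in a dihedral-angle coordinate system gave the eigenvalues and eigenvectors.
The AMBER parm89a RevA parameters were used. The nonbonded cutoff was 9.0 Å (inner) and 10.0 Å (outer). For 1-4 interactions, the nonbonded parameters were scaled by 1/2, and a distance-dependent dielectric (1/r) was used. Optimization used the Fletcher-Reeves conjugate gradient method. After the Cartesian optimization, the re-optimization used the same conditions except that the SS-bond angle and dihedral parameters were set to zero. Normal mode analysis in the dihedral coordinate system then gave the eigenvalues and eigenvectors.
The optimization conditions followed Sumikawa, H., Suzuki, E.-I., Fukuhara, K.-I., Nakajima, Y., Kamiya, K., and Umeyama, H. 1998. Dynamic structure of granulocyte colony-stimulating factor proteins studied by normal mode analysis: Two domain-type motions in low frequency modes. Chem Pharm Bull 46: 1069-1077. Normal mode analysis in the dihedral coordinate system followed Noguti, T., and Go, N. 1983. Dynamics of native globular proteins in terms of dihedral angles. J Phys Soc Jpn 52: 3283-3288; and Noguti, T., and Go, N. 1983. A method of rapid calculation of a second derivative matrix of conformational energy for large molecules. J Phys Soc Jpn 52: 3685-3690.
As in step I-30 above, the Cα fluctuations at 300 K were computed for each eigenvalue of 30 cm^-1 or less. These were compared with the Cα fluctuations converted from the average temperature factors of chains A and B of PDB entry 1F88, giving the average ratio for each eigenvalue. Each eigenvector was multiplied by its average ratio and added to the reference coordinates, giving the coordinates of an induced-fit reference structure. Further displacements were made with the eigenvector multiplied by -1, by twice the average ratio, and by minus twice the average ratio. The eigenvectors added here had been converted from dihedral to Cartesian coordinates. Each eigenvalue/eigenvector pair thus yields four induced-fit reference structures. There were 118 eigenvalues of 30 cm^-1 or less, giving 472 induced-fit reference proteins. As an example, Fig. 9 shows the fluctuation for the lowest eigenvalue, 4.47 cm^-1, multiplied by M^v (= 26.4), together with the converted temperature factors.
As in step I-40 above, the structure of the target protein, the β2-adrenergic receptor, was modeled by FAMS from the non-induced-fit reference structure and the induced-fit reference structure set. Each target structure corresponds to one reference structure. This gave 472 induced-fit target structures and one non-induced-fit target structure of the kind obtained by the conventional method. As an example, Fig. 10 shows part of the non-induced-fit target structure built from the non-induced-fit reference protein. It also shows the induced-fit target structures built from the reference structures in which the lowest-eigenvalue eigenvector was multiplied by ±2 × M^v (±2 × 26.4). The central structure in the figure is the non-induced-fit target protein.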
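Example 2: Construction of the complex structure from trypsin alone and trypsin inhibitor alone
In this example, the protein-ligand complex construction method of the present invention was validated with bovine pancreatic β-trypsin (trypsin) and trypsin inhibitor (BPTI). X-ray crystal structures of the receptor, the ligand and the receptor-ligand complex are all known for this system. Trypsin is the receptor protein (target protein) and BPTI is the ligand.
The amino acid sequence of the trypsin used is shown in SEQ ID NO: 3, and that of BPTI in SEQ ID NO: 4. Trypsin residues are numbered according to the sequence of chymotrypsinogen (the precursor of chymotrypsin), so trypsin comprises 223 residues numbered 16 to 245. Numbers 35, 36, 68, 128, 131, 188, 205, 206, 207 and 208 are missing, and numbers 184, 188 and 221 are duplicated (written as 184A, 188A and 221A).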

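Following the method detailed in steps II-10 to II-30 above, a structural model of the trypsin-BPTI complex was built as follows. The position of the complex's active site was then compared with the X-ray crystallographic data.
The complex models were built on a Dell personal computer (model: Dimension XPS B866; CPU: Pentium III 864 MHz; OS: Red Hat Linux 6.2J; memory: 512 Mbytes). The X-ray coordinates were obtained from the Protein Data Bank (PDB; http://www.rcsb.org/pdb/): 1TLD (trypsin alone), 4PTI (BPTI) and 2PTC (trypsin-BPTI complex).
To make the complex results easy to compare, the 1TLD and 4PTI coordinates were superposed onto the 2PTC frame by least-squares fitting. Hydrogen atoms were generated on the heteroatoms of trypsin and BPTI, and the initial coordinates of each were optimized separately. Normal mode analysis of trypsin was then performed without BPTI, giving a vibration vector for each frequency.
Five trypsin structures were taken from the long-period (low-frequency) vibration vectors. BPTI was docked into them, and MCSS calculations with the apricot-MCSS program refined the trypsin-BPTI complex structure. The MCSS calculation began with 1000 steps of molecular mechanics optimization of the complex. A 10 ps molecular dynamics run at 300 K with a 1 fs time step then relaxed the complex. During the dynamics, the Cα restraint of equation (27) with Kxyz = 10.0 kcal/mol/Å^2 kept the complex structure from deteriorating significantly. The complex coordinates after 10 ps were saved in PDB format.
Fig. 11 shows the trypsin structures in the trypsin-BPTI complex after the MCSS calculation. In some parts, both main and side chains vary widely; in others, they vary little. The trypsin active-site residues His57, Asp102 and Gly193-Asp194-Ser195 (the oxyanion hole) agreed well in both main and side chains. This property can be used to find the receptor sites important for ligand binding, which is very helpful when designing new ligands.
Fig. 12 shows the initial trypsin-BPTI complex structure before the MCSS calculation, and Fig. 13 the structure after it, each together with the X-ray crystal structure of the complex. Only the active-site residues are shown: His57, Asp102 and the oxyanion hole (Gly193-Asp194-Ser195) on the trypsin side, and Lys15 on the BPTI side. Black lines show the X-ray crystal structure. Gray lines show the initial complex model assembled by the present invention (Fig. 12) and the refined result (Fig. 13).
The trypsin active-site residues His57 and Asp102 and the oxyanion hole agree well, in both main and side chains, in the initial structure before the MCSS calculation (Fig. 12) and in the refined structure after it (Fig. 13). The main chain of BPTI Lys15 also agrees well before and after the MCSS calculation. This is because its carbonyl oxygen is held by two hydrogen bonds to the peptide NH groups of Gly193 and Ser195 in the oxyanion hole. The Lys15 side chain, by contrast, lies outside the trypsin active pocket before the MCSS calculation. After refinement, it enters the pocket and agrees well with the X-ray crystal structure of the complex.
This shows the value of using multiple model structures of the target protein that include its normal modes. Docking the ligand into these models and simulating the resulting initial complex structures by MCSS calculation is useful for constructing the structure of the desired protein-ligand complex.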
実施例３トリプシン、トリプシン・インヒビターそれぞれの結合部位の特定
前記ステップＩＩＩ−１０〜ＩＩＩ−３０で詳述した方法に従って、次の手順でトリプシンおよびＢＰＴＩの結合部位をそれぞれ特定し、それら部位を複合体のＸ線結晶解析データと比較検討した。本例では、タンパク質−リガンド複合体Ｘ線結晶解析が既知である牛膵臓由来のβ−Ｔｒｙｐｓｉｎ（トリプシン）とトリプシン・インヒビター（ＢＰＴＩ）の系を用いた。ここではトリプシンが受容体タンパク質（目的タンパク質）、ＢＰＴＩがリガンドであるが、ＢＰＴＩもタンパク質であるので、タンパク質側だけでなく、リガンド側の結合部位の特定も行った。用いたトリプシンおよびトリプシン・インヒビター（ＢＰＴＩ）アミノ酸配列は、それぞれＳＥＱＩＤＮｏ．３およびＳＥＱＩＤＮｏ．４に示した通りである。
トリプシン−ＢＰＴＩ複合体の立体構造座標は、ＰｒｏｔｅｉｎＤａｔａＢａｎｋ（ＰＤＢ）；ｈｔｔｐ：／／ｗｗｗ．ｒｃｓｂ．ｏｒｇ／ｐｄｂ／より２ＰＴＣを得た。２ＰＴＣのトリプシン−ＢＰＴＩ複合体のＸ線結晶解析の立体構造を第１４図に示す。
タンパク質ならびにリガンドの結合部位の検索には、ＤＥＬ社製パーソナルコンピュータ（機種：ＤｉｍｅｎｓｉｏｎＸＰＳＢ８６６、ＣＰＵ：ＰｅｎｔｉｎｕｍＩＩＩ８６４ＭＨｚ、ＯＳ：ＲｅｄＨａｔＬｉｎｕｘ６．２Ｊ、メモリ：５１２Ｍｂｙｔｅｓ）を用いた。
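The structure coordinates of trypsin and BPTI were treated separately. Hydrogen atoms were added to the heteroatoms, and water solvent was then generated around each molecule. Next, water molecules within 3.5 Å of the surface formed by amino acid residues with an MSAS of 30% or more were replaced by benzene molecules. A water molecule was not replaced when the resulting benzene would lie within 1.5 Å of another benzene. Once the replacement was complete, all remaining water molecules were deleted. A periodic box filled with water molecules was then generated around the trypsin and BPTI coordinates, now including the benzene molecules. Empirical molecular energy calculations were run with the apricot program under periodic boundary conditions for the water. These calculations consisted first of a 1,000-step molecular mechanics optimization of the structure. This was followed by a 10 ps molecular dynamics run at 300 K with a time step of 1 fs to explore the behavior of the benzene molecules. During the molecular dynamics run, a restraint of K_xyz = 10.0 kcal/mol/Å² according to equation (27) was applied to the Cα atoms of all amino acid residues, so that the protein structure would not deform significantly.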
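The water-to-benzene replacement rule above can be sketched as a greedy pass over the water oxygens. The sketch makes two assumptions that the source does not specify. First, each benzene is placed at the oxygen position of the water it replaces. Second, the "surface" is approximated by the atoms of residues whose MSAS is at least 30%. The function name and parameter names are hypothetical; the 3.5 Å and 1.5 Å values come from the text.

```python
import math


def replace_water_with_benzene(water_oxygens, exposed_atoms,
                               surface_cutoff=3.5, min_separation=1.5):
    """Greedy water -> benzene replacement (distances in Å).

    A water is replaced when it lies within `surface_cutoff` of any exposed
    atom, unless a benzene already placed lies within `min_separation` of it.
    Returns the benzene centres; all other waters are discarded afterwards.
    """
    benzenes = []
    for w in water_oxygens:
        if not any(math.dist(w, a) <= surface_cutoff for a in exposed_atoms):
            continue  # too far from the exposed surface
        if any(math.dist(w, b) <= min_separation for b in benzenes):
            continue  # would overlap an already placed benzene
        benzenes.append(w)
    return benzenes
```

The result depends on the order in which the waters are visited. Because the source does not state an ordering, the sketch simply uses the input order.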
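When these empirical molecular energy calculations had finished, the water molecules in the periodic box were deleted for both trypsin and BPTI. The atomic coordinates of trypsin with benzene, and of BPTI with benzene, after 10 ps of molecular dynamics were then saved in PDB format. From these coordinates, a cluster analysis with a threshold of 6 Å was performed separately on the distribution of the benzene molecules, with trypsin or BPTI excluded. Of the 94 and 40 benzene molecules placed around trypsin and BPTI, respectively, the largest clusters contained 29 and 11 molecules. The distributions of the benzene molecules around trypsin and BPTI are shown, together with the proteins, in FIG. 15 and FIG. 16, respectively.
These figures are viewed from the same direction as FIG. 14. In them, the hexagons drawn with black lines make up the largest benzene cluster.
FIGS. 14 to 16 show that the largest benzene cluster around trypsin and the largest benzene cluster around BPTI agree well in direction. Candidate ligand-binding sites of a protein can therefore be identified in three steps. Benzene molecules are placed around the hydrophobic residues of the protein, molecular dynamics is run in aqueous solvent, and large benzene clusters are then located by cluster analysis. Moreover, docking the protein and the ligand on a graphics display so that these clusters overlap is expected to give a rough prediction of the initial configuration of the protein-ligand complex. After adjustment by hand or with molecular design software, this initial configuration becomes a strong candidate for the configuration of the protein-ligand complex.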
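The cluster analysis of the benzene distribution can be sketched as follows. The source gives only the 6 Å threshold and does not name a linkage criterion. The sketch therefore assumes single linkage on the benzene centres: two benzenes share a cluster whenever a chain of neighbours closer than the threshold connects them. The function name is hypothetical. Clusters are returned largest first, matching the idea of ranking candidate binding sites by cluster size.

```python
import math


def single_linkage_clusters(points, threshold=6.0):
    """Group points so that any two within `threshold` Å share a cluster.

    Returns clusters largest first, each as a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        stack, members = [seed], [seed]
        while stack:  # flood fill through neighbours within the threshold
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= threshold]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append(sorted(members))
    clusters.sort(key=len, reverse=True)
    return clusters
```

With single linkage, benzenes 10 Å apart can still end up in one cluster if a third benzene lies within 6 Å of both. A complete-linkage or centroid-based criterion would behave differently.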
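Industrial applicability
As described above, the method of the present invention constructs protein structures that are closer to the true structure than those produced by conventional methods. This is especially so in the vicinity of the ligand-binding site. The method is therefore highly useful for the design of pharmaceutical and agrochemical molecules.
The method of the present invention for constructing three-dimensional structures including induced fit uses multiple sets of coordinate data obtained by normal mode analysis of a model structure of the target protein. An average model structure that accounts for molecular vibration can therefore be built accurately. In particular, when the structure of a target protein-ligand complex is predicted, induced fit (which is important for that purpose) can be included, so a precise model structure of the complex that accounts for it can be built. The protein-ligand complex can also be simulated with the Multiple Copy Simultaneous Search (MCSS) method, which optimizes the structures of multiple receptor protein copies against the structure of a single ligand. This yields a time-averaged structure of the complex.
The method of the present invention for constructing protein-ligand complex structures also examines, after the MCSS calculation, the scatter of the receptor-side atomic coordinates in the target protein-ligand complex model. Sites important for activity show relatively small scatter, while other sites show larger scatter. This difference can be exploited to design new ligands, and so can be used effectively in pharmaceutical and agrochemical molecular design.
Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.
This application is based on Japanese Patent Application No. 2001-011783 filed on January 19, 2001, the contents of which are incorporated herein by reference. The contents of the documents cited in this specification are also incorporated herein by reference.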
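[Sequence Listing]

[Brief Description of the Drawings]
FIG. 1 is a flowchart showing an example of the method of the present invention for constructing a protein three-dimensional structure including induced fit.
FIG. 2 illustrates how the Cα atom coordinates are constructed in step I-41. Coordinates for aligned regions are taken from the reference protein. For regions without a counterpart, the fragment is taken from a database: the one whose superposition on the two overlapping residues at each of the N- and C-terminal ends gives the smallest rmsd.
FIG. 3 illustrates local space homology (LSH). In the calculation for residue T in the figure, the shaded (grey) residues are taken into account. In the alignment at the bottom of the figure, the boxed positions are the residue pairs considered, and the proportion of positions marked with * is the LSH. In this case the LSH is 56.2%.
FIG. 4 shows the relationship between LSH and the proportion of residues in structurally conserved regions (SCRs). LSH is calculated from the superposition of the Cα atoms of the target protein and the reference protein. The proportion in SCRs is the number of residues in SCRs divided by the total number of residues of the target protein.
FIG. 5 is a flowchart showing an example of the method of the present invention for constructing the three-dimensional structure of a protein-ligand complex.
FIG. 6 is a flowchart showing an example of the method of the present invention for identifying a ligand binding site. It also shows the method for constructing the three-dimensional structure of a protein-ligand complex using the binding site identified in this way.
FIG. 7 is a flowchart showing an example of an embodiment of the method of the present invention for constructing a protein three-dimensional structure including induced fit.
FIG. 8 shows the alignment of QRHUB2 (β2-adrenergic receptor) obtained with 1F88 (rhodopsin) as the reference protein. The numbers to the right of QRHUB2 and 1F88 are the numbers of amino acids in each sequence covered by the alignment. The upper sequence is QRHUB2 (β2-adrenergic receptor) and the lower sequence is 1F88 (rhodopsin). Both are given in one-letter amino acid codes.
FIG. 9 compares two Cα fluctuation profiles. The solid line is the Cα fluctuation converted from the temperature factors averaged over chains A and B of PDB ID: 1F88. The dotted line is the Cα positional fluctuation for the lowest eigenvalue, 4.47 cm⁻¹, obtained by normal mode analysis and multiplied by M^v (= 26.4).
FIG. 10 is a photograph of a display printout showing part of the three-dimensional structures of the target protein. The induced-fit target proteins shown were built from the induced-fit reference proteins displaced by ±2×M^v (±2×26.4). The structure in the center is the non-induced-fit (no induced fit) target protein.
FIG. 11 is a photograph of a display printout showing the trypsin structures of the trypsin-BPTI complex system after the MCSS calculation.
FIG. 12 is a photograph of a display printout showing the initial structure of the trypsin-BPTI complex before the MCSS calculation. Only the active-site residues of the complex are displayed: His57, Asp102 and the oxyanion hole (Gly193-Asp194-Ser195) on the trypsin side, and Lys15 on the BPTI side. The black lines show the X-ray crystal structure of the trypsin-BPTI complex. The grey lines show the initial structure of the assembled complex model.
FIG. 13 is a photograph of a display printout showing the structure of the trypsin-BPTI complex after the MCSS calculation. Only the active-site residues of the complex are displayed: His57, Asp102 and the oxyanion hole (Gly193-Asp194-Ser195) on the trypsin side, and Lys15 on the BPTI side. The black lines show the X-ray crystal structure of the trypsin-BPTI complex. The grey lines show the refined structure of the assembled complex model.
FIG. 14 is a photograph of a display printout showing the X-ray crystal structure coordinates of the trypsin-BPTI complex.
FIG. 15 is a photograph of a display printout showing the distribution of benzene molecules around trypsin. The hexagons drawn with black lines make up the largest benzene cluster.
FIG. 16 is a photograph of a display printout showing the distribution of benzene molecules around BPTI. The hexagons drawn with black lines make up the largest benzene cluster.
Technical field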
The present invention relates to a method for constructing protein three-dimensional structures that include induced fit, and to uses of that method. More specifically, it relates to the following:
- a method for constructing the three-dimensional structure of a protein, in which multiple three-dimensional structure sets of a target protein are created using, as the reference-protein structures, the structure of a reference protein together with a set of structures obtained by displacing its atomic coordinates;
- a method for constructing the three-dimensional structure of a protein-ligand complex using such a structure set;
- a method for identifying a ligand binding site of a protein; and the like.
The three-dimensional structure of the target protein provided by the method of the present invention includes induced fit and is extremely useful for the molecular design of pharmaceuticals and agrochemicals.
Background art
Using information on a protein with a known three-dimensional structure, it is possible to obtain an alignment with a target protein with an unknown three-dimensional structure, and create a three-dimensional structure of the target protein using a computer based on this alignment information. This method is usually called homology modeling. The accuracy of the three-dimensional structure constructed by homology modeling has improved remarkably in recent years, but there are still many problems to be solved.
When the three-dimensional structure of a receptor protein is constructed by this method, space for the ligand must be kept open. In conventional construction methods, however, the main chain or side chains of the built structure are packed into the space that the ligand occupies or into its binding site. The space is thereby blocked, so the ligand clashes with the receptor protein and cannot sit in the binding site.
Conventional methods for constructing the three-dimensional structure of a protein-ligand complex have a further limitation when the structure of the target receptor protein has not been determined experimentally. The ligand was simply docked into the receptor structure built by homology modeling. The receptor protein-ligand complex structure was then obtained by optimizing the two with molecular force-field and molecular dynamics calculations. Likewise, studies using the Multiple Copy Simultaneous Search (MCSS) method did not consider normal vibration modes for the receptor protein structure. In particular, they ignored the long-period thermal fluctuations of the molecule, which consist mainly of vibrations on the picosecond time scale (hereinafter sometimes referred to simply as "thermal fluctuations" or "molecular fluctuations").
Furthermore, two kinds of conventional methods have been used: identifying the ligand binding site of a protein from its long-range electrostatic potential, and constructing protein-ligand complex structures on the basis of similar compounds. Both have low reliability. When no similar compound exists, it has been difficult to derive a reliable three-dimensional structure of a protein-ligand complex.
Disclosure of the invention
In view of the above situation, the present invention aims to provide a method for accurately constructing the three-dimensional structure of any protein, a method for accurately constructing the three-dimensional structure of a protein-ligand complex, and the like.
As a result of diligent studies to solve the above problems, the present inventors found an improved way of building the receptor protein structure. The atomic coordinates of a reference protein are displaced along the eigenvector directions obtained by normal mode analysis, and the receptor structure is built with reference to these displaced coordinates. The main chain or side chains are then not packed into the space occupied by the ligand or into the binding site, the space is not closed, and the accuracy of the receptor protein structure improves markedly. That is, the inventors found that multiple receptor protein models can be constructed that account for molecular thermal fluctuations on the basis of normal vibration modes.
The inventors also found a way to build protein-ligand complexes accurately. A ligand structure is docked to the receptor protein models constructed in this way. The molecular mechanics and molecular dynamics calculations of the Multiple Copy Simultaneous Search (MCSS) method are then applied to take molecular thermal fluctuations into account. This yields the three-dimensional structure of the protein-ligand complex with high accuracy.
Furthermore, the present inventors concluded that, for protein-ligand complexes in aqueous solution, hydrophobic interactions are more important than electrostatic forces. They therefore placed a solvent around and inside the protein and analyzed its behavior (diffusion and accumulation) by molecular dynamics. They found that the ligand binding site coincides with the sites where the solvent accumulates on the protein, or from which it diffuses only with difficulty.
The present invention has been accomplished based on these findings.
That is, the present invention provides (1) a method for constructing a protein three-dimensional structure including induced fit. In this method, an alignment between a reference protein and a target protein is derived, and the three-dimensional structure of the target protein is constructed from the alignment and the three-dimensional structure information of the reference protein. The method is characterized in that a plurality of three-dimensional structures of the target protein are created using, as the three-dimensional structures of the reference protein, the structure of the reference protein and structures in which its atomic coordinates have been displaced.
According to a preferred embodiment, the present invention provides the method according to (1) above, in which (2) the atomic coordinates of the reference protein are displaced by normal mode analysis. It also provides the method according to (1) or (2) above, in which (3) the three-dimensional structure is constructed in three stages:
(i) the coordinates of the Cα atoms of the amino acids are obtained from the structure of the reference protein and optimized so as to minimize an objective function;
(ii) the other main-chain atoms are added to the optimized Cα coordinates, and the main-chain atomic coordinates are optimized so as to minimize an objective function; and
(iii) the other side-chain atoms are added to the optimized main-chain coordinates, and the coordinates are optimized so as to minimize an objective function.
According to another aspect, the present invention provides (4) a method for constructing the three-dimensional structure of a protein-ligand complex, characterized by the following steps:
(i) a ligand is docked to the plurality of three-dimensional structures of the target protein obtained by the method according to any one of (1) to (3) above;
(ii) empirical molecular energy calculations for the one ligand structure and the target protein structures are performed as many times as there are target protein structures;
(iii) on the target protein side, the atomic coordinates of each of the plurality of structures are moved according to its own potential energy gradient;
(iv) on the ligand side, the ligand atomic coordinates are moved in the direction that averages the plurality of calculated potential energy gradients; and
(v) the ligand structure is determined on the basis of the target protein structures.
According to a preferred embodiment, the present invention provides (5) the method according to (4) above, in which a restraint is optionally added to the empirical molecular energy calculation. The restraint is either a harmonic function that restrains the initial Cα atom positions of the target protein, or a potential function that restrains the main-chain torsion angles of the target protein.
According to another aspect, the present invention provides (6) a method for identifying a ligand binding site of a protein, characterized by the following steps:
(i) low-molecular-weight compounds are placed around the three-dimensional structure of a protein;
(ii) water molecules are further placed around them, and empirical molecular energy calculations are performed in aqueous solvent to obtain the atomic coordinates of the protein and the low-molecular-weight compounds; and
(iii) the behavior of the low-molecular-weight compounds around and inside the protein is analyzed from the obtained coordinates to determine the ligand binding site.
It also provides (7) a method for identifying the binding sites of a protein-ligand complex, characterized by the following steps:
(i) low-molecular-weight compounds are placed around the three-dimensional structures of the protein and the ligand;
(ii) water molecules are further placed around them, and empirical molecular energy calculations are performed in aqueous solvent to obtain the atomic coordinates of the protein and the low-molecular-weight compounds; and
(iii) the behavior of the low-molecular-weight compounds around and inside the protein and the ligand is analyzed from the obtained coordinates to determine the binding sites of the protein-ligand complex.
According to a preferred embodiment, the present invention provides (8) the method according to (6) or (7) above, in which the behavior of the low-molecular-weight compounds is analyzed by cluster analysis. The binding site is determined by treating the size of each resulting cluster as the rank of the corresponding potential ligand binding site.
According to another aspect, the present invention provides (9) a method for constructing the three-dimensional structure of a protein-ligand complex. A ligand is docked to the ligand binding site of a protein identified by the method according to any one of (6) to (8) above, and the protein-ligand complex is calculated by empirical molecular energy calculation.
According to another aspect, the present invention provides (10) a computer-readable recording medium on which atomic coordinates are recorded, or a database containing those atomic coordinates. The coordinates define the three-dimensional structure of a protein and/or a protein-ligand complex obtained by the method according to any one of (1) to (5) and (9) above.
According to another aspect, the present invention provides (11) a method for designing drug molecules. Atomic coordinates defining the three-dimensional structure of a protein are obtained from the recording medium or database according to (10) above. A target drug molecule is identified, searched for, evaluated or designed on the basis of the interaction between those coordinates and the three-dimensional structure of a drug candidate molecule.
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described in more detail. In this specification, several terms are used, and have the following meanings unless otherwise specified.
The "target protein" means any protein whose three-dimensional structure has not been determined by X-ray crystallography, NMR analysis or the like, and whose three-dimensional structure is to be constructed in the present invention. This includes proteins whose partial structure has been analyzed but whose complete three-dimensional structure has not been obtained. In the present invention, the target protein is preferably a receptor protein, enzyme or the like of unknown three-dimensional structure. Here, "X-ray crystallography" includes not only X-ray analysis but also electron-beam and neutron-beam analysis.
"Receptor protein" means a protein that is present in a cell and that recognizes an exogenous substance or a physical stimulus to induce a response in the cell. A receptor protein can bind a ligand specifically. A "ligand" means a substance capable of binding specifically to a protein. Ligands include low-molecular-weight substances such as pharmaceutical and agrochemical molecules. They also include high-molecular-weight substances such as specific peptides and proteins that interact with antibodies or proteins.
The "reference protein" means a protein whose three-dimensional structure has been determined in detail by X-ray crystallography, NMR analysis or the like, and which is referred to in constructing the atomic coordinates that define the three-dimensional structure of the target protein. "Alignment" means establishing the correspondence between the amino acid sequences of two or more proteins.
"Atomic coordinates" describe a three-dimensional structure in three-dimensional space. They are relative distances along three mutually perpendicular directions, measured from a point in space taken as the origin. They form a vector quantity of three numbers per atom, for each atom of the protein other than hydrogen.
"Induced fit" means that the conformation of a protein is flexible and, on binding a ligand such as a drug molecule, changes so as to bind better. A "three-dimensional structure including induced fit" is a structure generated by adding eigenvectors to the coordinates. It rests on the premise that the structural change of a protein caused by induced fit can be expressed by, for example, eigenvectors obtained by normal mode analysis.
The "target protein-ligand complex" means a protein-ligand complex whose complete three-dimensional structure has not been elucidated by X-ray crystallography, NMR analysis or the like, and whose three-dimensional structure is to be constructed in the present invention. The protein itself may, of course, have a three-dimensional structure obtained by X-ray crystallography, NMR analysis or the like. The term also covers complexes whose partial structure has been analyzed but whose complete three-dimensional structure has not been obtained. In all cases it means a complex in which a ligand is bound to a protein.
The "Multiple Copy Simultaneous Search (MCSS) method" ordinarily places the structures of multiple ligand copies against the structure of a receptor protein. It then obtains the structure of the target protein-ligand complex by empirical molecular energy calculation, i.e., molecular mechanics and molecular dynamics calculation. In the present invention the roles are reversed: the term means a method for obtaining the structure of the target protein-ligand complex from the structures of multiple proteins against the structure of one ligand.
“Empirical molecular energy calculation” means molecular mechanics calculation and molecular dynamics calculation. Both are molecular energy calculations using empirical potential.
"MSAS (Maximum Solvent Accessibility of Sidechain)" means the ratio of two solvent-accessible surface areas for each amino acid constituting the protein. The numerator is the side-chain solvent-accessible surface area within the protein. The denominator is the solvent-accessible surface area of the same side chain when the amino acid exists alone rather than as part of the protein. Details of MSAS are given in K. Akahane, Y. Nagano and H. Umeyama, Chem. Pharm. Bull. 1989, 37(1), 86-92.
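As defined above, MSAS is a simple ratio, and Example 3 selects residues with an MSAS of 30% or more. The sketch below shows that selection. It assumes that the per-residue side-chain surface areas have already been computed by some external tool, which is not shown. The function names, residue labels and area values are hypothetical.

```python
def msas(sasa_in_protein, sasa_isolated):
    """MSAS as a fraction: side-chain SASA in the protein / SASA of the isolated residue."""
    if sasa_isolated <= 0.0:
        raise ValueError("isolated side-chain SASA must be positive")
    return sasa_in_protein / sasa_isolated


def exposed_residues(sasa_table, threshold=0.30):
    """Residues whose MSAS is at least `threshold`.

    sasa_table maps a residue label to (SASA in protein, SASA isolated), both in Å^2.
    """
    return [res for res, (in_prot, alone) in sasa_table.items()
            if msas(in_prot, alone) >= threshold]
```

For instance, a side chain that keeps 55 Å² of its 110 Å² isolated surface has an MSAS of 0.5 and would be counted as exposed.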
Methods I to III described below can be carried out on a suitable computer capable of homology modeling, using a suitable program that executes those methods.
I. Method for constructing a three-dimensional structure including induced fit
First, the method of the present invention for constructing a three-dimensional structure including induced fit is described.
FIG. 1 is a flowchart showing an example of the method of the present invention for constructing a three-dimensional structure including induced fit.
The method proceeds in four steps:
- Step I-10: the sequence of the target protein is input and the reference protein used to construct the target structure is selected. Atomic coordinates are obtained from the reference protein's three-dimensional structure and optimized so as to minimize an objective function.
- Step I-20: normal mode analysis is performed on the optimized atomic coordinates.
- Step I-30: the atomic coordinates of the reference protein are displaced along the eigenvector directions. The resulting structures are added to the reference protein to create a reference protein set.
- Step I-40: a set of three-dimensional structures of the target protein is constructed by a suitable homology modeling program such as FAMS. The program uses the alignment information and the three-dimensional structure of each member of the reference protein set.
In this way, a three-dimensional structure of the target protein that includes induced fit can be constructed accurately. Each step is described in more detail below.
Step I-10: Optimization of the initial coordinates of the reference protein
First, to construct the three-dimensional structure of the target protein, the amino acid sequence of the target protein is input and the protein to be referred to (the reference protein) is selected. The reference protein is selected with commonly used alignment software known per se. The atomic coordinates of the reference protein are obtained from a suitable three-dimensional structure database. These coordinates contain no hydrogen atoms bonded to, for example, the nitrogen atoms of the amino acid backbone. Hydrogen atoms are therefore generated where they are required for the normal mode analysis of step I-20. The atomic coordinates are then optimized using an objective function composed of the atomic coordinates of the reference protein.
The amino acid sequence of the target protein may come from any source, for example a sequence registered in a database or one analyzed for the first time. Amino acid sequence databases are described in detail in, for example, Bloom FE, "An Internet review: the complete neuroscientists the World Wide Web." Science 1996; 274(5290): 1104-9. Examples include sequences from human (H. sapiens), Drosophila (D. melanogaster), nematode (C. elegans), yeast (S. cerevisiae), Arabidopsis (A. thaliana) and other organisms, registered in databases such as the following:
- G-Protein-coupled Receptor Database: http://www.gcrdb.usthscsa.edu/
- GPCRDB: http://www.gpcr.org/7tm/
- ExPASy: http://www.expasy.ch/cgi-bin/sm-gpcr.pl
- ORDB: http://ycmi.med.yale.edu/senselab/ordb/
- GenBank: ftp://ncbi.nlm.nih.gov/genbank/genomes/
- PIR: http://www-nbrf.georgetown.edu/pir/ (National Biomedical Research Foundation (NBRF))
- Swiss-Prot: http://www.expasy.ch/sprot/sprot-top.html (Swiss Institute of Bioinformatics (SIB), European Bioinformatics Institute (EBI))
- TrEMBL (URL and administrator the same as Swiss-Prot)
- TrEMBLNEW (URL and administrator the same as Swiss-Prot)
- DDBJ: ftp.ddbj.nig.ac.jp (DNA Data Bank of Japan)
These databases are merely examples; any database in which protein amino acid sequences are registered can be used.
Examples of three-dimensional structure databases for obtaining the atomic coordinates of the reference protein include:
- PDB (Protein Data Bank): http://www.rcsb.org/pdb/
- CCDC (Cambridge Crystallographic Data Centre): http://www.ccdc.cam.au.uk/
- SCOP (Structural Classification of Proteins): …/scop
- CATH: http://www.biochem.ucl.ac.uk/bsm/cath
These databases can be used alone or in combination. SCOP and CATH are three-dimensional structure databases divided into domains, the three-dimensional structural units of proteins.
For example, FASTA or PSI-BLAST (Position-Specific Iterated BLAST) is preferably used as the alignment software. FASTA searches a three-dimensional structure database for sequences highly similar to the query sequence. It reports the final degree of similarity between the query sequence and each reference protein as an E-value. Details of FASTA are given in Pearson WR, "Effective protein sequence comparison." Methods Enzymol. 1996; 266: 227-58.
PSI-BLAST performs profile alignment. Details of PSI-BLAST are given in Schaffer AA, Wolf YI, et al., "Matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices," pp. 1000-11.
The method, coordinate system, objective function and so on used to optimize the reference protein's atomic coordinates are not particularly limited. For example, the steepest descent method, the conjugate gradient method or the Newton-Raphson method is preferably used.
- Steepest descent optimizes the objective function of the atomic coordinates using its numerically computed first derivative.
- The conjugate gradient method has many variants, of which the Fletcher-Reeves method is standard (Fletcher, R., and Reeves, C. M. (1964) Function minimization by conjugate gradients. Comput. J. 7: 149-154). It uses the first derivative of the objective function. When the objective function is exactly quadratic in n variables, it is guaranteed to reach the optimum in at most n iterations, in exact arithmetic with exact line searches.
- The Newton-Raphson method uses the second derivative in addition to the first, and is efficient when the initial structure is close to the optimized structure.
Details of these methods are given in Yoshihiro Eguchi, "Physical and Chemical Basics of Protein Engineering" (Kyoritsu Shuppan, 1991) and the references therein.
Hereinafter, the structure and coordinates optimized as described above are referred to as the optimized structure and the optimized coordinates, respectively.
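The n-iteration property of the Fletcher-Reeves conjugate gradient method can be checked on a small quadratic. The Python sketch below minimizes f(x) = ½xᵀAx − bᵀx for a symmetric positive-definite A, using exact line searches. It illustrates the method itself, not the patent's objective function for atomic coordinates, which is not quadratic in general. The function name is hypothetical.

```python
def fletcher_reeves_quadratic(a, b, x0, tol=1e-24):
    """Minimise f(x) = 1/2 x^T A x - b^T x (A symmetric positive definite)
    with Fletcher-Reeves conjugate gradients and exact line searches.
    Returns (x, iterations); at most n iterations are taken.
    """
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(x), b)]  # gradient of f is A x - b
    d = [-gi for gi in g]                          # first direction: steepest descent
    iterations = 0
    while iterations < n and dot(g, g) > tol:
        ad = matvec(d)
        alpha = -dot(g, d) / dot(d, ad)            # exact minimiser along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, ad)]
        beta = dot(g_new, g_new) / dot(g, g)       # Fletcher-Reeves coefficient
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        iterations += 1
    return x, iterations
```

For A = [[4, 1], [1, 3]] and b = [1, 2], starting from [2, 1], the method reaches the exact minimizer [1/11, 7/11] in two iterations, up to rounding.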
Step I-20: Normal mode analysis of the optimized coordinates
The atomic coordinates are displaced starting from the optimized coordinates created in step I-10. The displacement is preferably obtained by performing normal mode analysis and computing the eigenvector for each eigenvalue. A coordinate system that uses only part of the optimized degrees of freedom may be used; in that case the optimization likewise holds for those degrees of freedom.
Here, "normal mode analysis" means the following method: the potential energy is approximated as a quadratic function of the displacements, the equations of motion are solved exactly, and the small vibrations about the optimized structure are analyzed. An "eigenvalue" characterizes the period of a small vibration, and an "eigenvector" gives its direction.
The eigenvalue equation solved in normal mode analysis is equation (1) or (2) below.

where ω_k is the eigenvalue, U_ik is the eigenvector and δ_ij is the Kronecker delta. T_ij and V_ij are related to the kinetic energy E_k and the potential energy V as shown in equations (3) and (4) below.

Here the derivatives are partial derivatives with respect to the coordinates. A_jk is the coefficient that relates the collective motion Q_k to the individual atomic motion q_j, as in equation (5) below.

where α_k and δ_k are determined by the initial conditions.
Details of normal mode analysis are given in Wilson, E. B., Decius, J. C., and Cross, P. C. (1955) Molecular Vibrations. McGraw-Hill.
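Equations (1) to (5) are not reproduced in this section, so the following Python sketch shows only the standard Cartesian form of the problem. Because the kinetic-energy matrix is diagonal in the atomic masses, the generalized eigenproblem reduces to an ordinary eigenproblem of the mass-weighted Hessian V_ij/√(m_i m_j), whose eigenvalues are ω². A small Jacobi routine keeps the sketch dependency-free. The function names are hypothetical. The cm⁻¹ conversion assumes V in kcal/mol/Å² and masses in amu, which are the units used elsewhere in this description.

```python
import math

KCAL_J = 4184.0              # J per kcal
AVOGADRO = 6.02214076e23     # 1/mol
AMU_KG = 1.66053906660e-27   # kg per atomic mass unit
C_CM_S = 2.99792458e10       # speed of light, cm/s


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a real symmetric matrix (ascending)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns of A and of the accumulated rotations V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    pairs = sorted(((a[k][k], [v[i][k] for i in range(n)]) for k in range(n)),
                   key=lambda pair: pair[0])
    return [p[0] for p in pairs], [p[1] for p in pairs]


def normal_modes(hessian, dof_masses):
    """omega^2 values and mass-weighted eigenvectors.

    hessian: V_ij in kcal/mol/Å^2; dof_masses: mass (amu) of each coordinate,
    i.e. each atom's mass repeated three times for Cartesian coordinates.
    """
    n = len(hessian)
    hmw = [[hessian[i][j] / math.sqrt(dof_masses[i] * dof_masses[j]) for j in range(n)]
           for i in range(n)]
    return jacobi_eigh(hmw)


def wavenumber_cm1(omega2):
    """omega^2 in kcal/mol/Å^2/amu -> wavenumber in cm^-1 (sign kept for negative modes)."""
    omega2_si = omega2 * KCAL_J / AVOGADRO / 1e-20 / AMU_KG  # s^-2
    return math.copysign(math.sqrt(abs(omega2_si)), omega2_si) / (2.0 * math.pi * C_CM_S)
```

For a real protein the Hessian has 3N rows, and six eigenvalues close to zero correspond to overall translation and rotation. The long-period modes of interest, such as the 4.47 cm⁻¹ mode cited for FIG. 9, are the smallest eigenvalues after those six.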
Step I-30: Generation of a new reference protein
Using the eigenvalues and eigenvectors obtained in step I-20, the positional fluctuation of the Cα atoms is calculated at a given temperature for each eigenvalue. This gives as many positional fluctuations as there are eigenvalues. The temperature factors of the reference protein's Cα atoms are also converted into positional fluctuations. For each Cα atom, the ratio of this converted fluctuation to the fluctuation from normal mode analysis is calculated, and the ratios are averaged, giving one average ratio per eigenvalue used. The eigenvector belonging to each eigenvalue is multiplied by its average ratio and added to the atomic coordinates of the reference protein before structure optimization. The structure made of the displaced coordinates, i.e., a structure including induced fit, is taken as one of the reference protein structures. Hereinafter these are called the induced-fit reference protein, induced-fit structure and induced-fit coordinates.
The average ratio is also doubled to generate further induced-fit structures of the reference protein in the same way. Because an eigenvector has both a forward and a reverse direction, the displacement is also applied in the reverse direction, obtained by multiplying the eigenvector by -1. The induced-fit structures therefore number four times the number of eigenvalues used. The induced-fit structures and the non-induced-fit (no induced fit) structure of the reference protein together form the reference protein structure set.
Here, the relationship between the temperature factor and the position fluctuation is as shown in the following formula (6).

where B_i is the temperature factor of atom i taken from the PDB file, π is the ratio of a circle's circumference to its diameter, and D_i corresponds to the positional fluctuation. In the present invention, only Cα atoms are considered.
The ratio of the positional fluctuation converted from the PDB temperature factor to the positional fluctuation obtained by normal mode analysis is given by equation (7) below.

where F_i^v is the positional fluctuation of the i-th atom for the v-th eigenvalue, obtained by normal mode analysis. In the present invention, this is evaluated only for Cα atoms.
The average of the ratio is as shown in the following formula (8).

Here N is the number of atoms, the sum runs over the atoms, and M^v is the average ratio for the v-th eigenvalue. In the present invention, the average is taken over Cα atoms.
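Equations (6) to (8) are not reproduced in this section, so the following Python sketch relies on two common textbook relations, both of which are assumptions. First, an isotropic B-factor is taken to relate to the mean-square displacement by B = (8π²/3)⟨Δr²⟩, and D_i is treated as the RMS displacement. Second, the fluctuation of one atom in one mode is taken from classical equipartition as ⟨Δr²⟩ = kT|u|²/(ω²m). If the source's D_i is instead a mean-square quantity, the ratios change accordingly. The function names are hypothetical.

```python
import math


def bfactor_to_rms(b):
    """RMS positional fluctuation (Å) from an isotropic B-factor (Å^2),
    assuming B = (8 pi^2 / 3) <dr^2>."""
    return math.sqrt(3.0 * b / (8.0 * math.pi ** 2))


def mode_rms(omega2, mw_vec_atom, mass, kT):
    """RMS fluctuation (Å) of one atom from one normal mode, assuming
    classical equipartition: <dr^2> = kT / omega^2 * |u|^2 / m.

    omega2 in kcal/mol/Å^2/amu, kT in kcal/mol, mass in amu; mw_vec_atom is
    the atom's three components of the mass-weighted unit eigenvector.
    """
    u2 = sum(c * c for c in mw_vec_atom)
    return math.sqrt(kT / omega2 * u2 / mass)


def mean_scale_factor(b_factors, mode_fluctuations):
    """M^v: average over Cα atoms of (B-factor fluctuation) / (mode fluctuation)."""
    ratios = [bfactor_to_rms(b) / f for b, f in zip(b_factors, mode_fluctuations)]
    return sum(ratios) / len(ratios)
```

If the B-factor fluctuations are uniformly twice the mode fluctuations, M^v comes out as 2. For the 4.47 cm⁻¹ mode of 1F88, the description reports M^v = 26.4.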
The atomic coordinates of the induced-fit reference protein structures are given by equations (9) and (10) below.

where C_ik⁰ is the atomic coordinate of the reference protein and V_ik^v is the component of the eigenvector belonging to the v-th eigenvalue.
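The text says that each eigenvector, scaled by M^v and by 2M^v in both directions, is added to the reference coordinates. That yields four induced-fit structures per eigenvalue plus the undisplaced structure. The following Python sketch builds such a set under the assumption that equations (9) and (10), which are not shown here, take the form C⁰ ± M^v·V^v and C⁰ ± 2M^v·V^v. The function name and the labels are hypothetical.

```python
def induced_fit_set(ref_coords, modes, scales):
    """Reference structure plus structures displaced along each eigenvector.

    ref_coords: list of (x, y, z) before optimisation; modes[v]: eigenvector v
    as per-atom (dx, dy, dz); scales[v]: the average ratio M^v for that mode.
    Each mode contributes displacements of +M, -M, +2M and -2M.
    """
    structures = [("no_induced_fit", [list(p) for p in ref_coords])]
    for v, (mode, m_v) in enumerate(zip(modes, scales)):
        for factor in (1.0, -1.0, 2.0, -2.0):
            coords = [[c + factor * m_v * d for c, d in zip(atom, vec)]
                      for atom, vec in zip(ref_coords, mode)]
            structures.append((f"mode{v}_{factor:+g}M", coords))
    return structures
```

With k eigenvalues, the set holds 4k + 1 structures. This count matches the number of target protein structures that step I-40 builds, one per reference structure.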
Step I-40: Modeling the target protein
With reference to the reference protein structure set obtained in step I-30, a structure set of the target protein is constructed by a suitable homology modeling program such as FAMS. One target protein structure is built for each reference protein structure. The result comprises the induced-fit target protein structures, four times as many as the eigenvalues used, and the non-induced-fit target protein structure. Together these form the target protein structure set, i.e., a three-dimensional structure including induced fit.
Next, each step of FAMS is described as a suitable example of modeling (three-dimensional structure construction). The numbers of iterations, constants, cut-off values and so on given in steps I-41 to I-43 below are examples of the parameters the inventors consider most preferable, and the invention is not limited to them in any way. Details of FAMS are given in Koji Ogata and Hideaki Umeyama, "An automatic homology modeling method consisting of database searches and simulated annealing," J. Mol. Graph. Model. 18: 258-272 (2000).
Step I-41: Construction and optimization of initial coordinates of Cα atom
Based on the reference protein set and the alignment information from step I-30, information on the amino acid residues inserted into or deleted from the reference protein is obtained. In the alignment, regions with no gap over three or more consecutive amino acids are selected, and within these regions each Cα atom of the target protein is given the coordinates of the Cα atom of the paired reference protein residue. Where a Cα atom cannot be obtained in this way, its coordinates are taken from a database of fragments prepared in advance (see FIG. 2).
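A minimal sketch of the Cα transfer rule, assuming the alignment is given as two gapped strings of equal length ('-' marks a gap) and that reference Cα coordinates are listed in residue order. The function name and data layout are illustrative, not part of FAMS; residues left unassigned would then be filled from the fragment database, which is not modeled here.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def transfer_ca(target_aln: str, ref_aln: str, ref_ca: List[Vec3],
                min_run: int = 3) -> Dict[int, Vec3]:
    """Map target residue index -> reference Cα coordinate inside gap-free runs."""
    if len(target_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    columns: List[Optional[Tuple[int, int]]] = []
    t = r = 0  # running residue indices in target and reference
    for a, b in zip(target_aln, ref_aln):
        columns.append((t, r) if a != "-" and b != "-" else None)
        if a != "-":
            t += 1
        if b != "-":
            r += 1
    assigned: Dict[int, Vec3] = {}
    run: List[Tuple[int, int]] = []
    for col in columns + [None]:  # trailing None flushes the last run
        if col is not None:
            run.append(col)
            continue
        if len(run) >= min_run:  # only runs of >= 3 aligned residues qualify
            for ti, ri in run:
                assigned[ti] = ref_ca[ri]
        run = []
    return assigned
```

For example, aligning "ACDE-FG" against "ACD-HFG" assigns coordinates only to the first three target residues; the two-residue run "FG" is too short.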
In this specification, the Cα atom means the carbon atom at the center of the backbone of each amino acid; the Cβ atom means the carbon atom bonded to the Cα atom on the side-chain side; the Cγ atom means the carbon atom bonded to the Cβ atom on the side-chain side; and the C atom means the carbonyl carbon atom.
Step I-41 (1): Construction of the Cα atom by the simulated annealing method
The Cα atoms built in step I-41 are optimized by simulated annealing using an objective function constructed from the coordinates of the reference proteins. This objective function is given by the following formula (11).

Here, U_len is a term on the distances between the Cα atoms of sequence-adjacent residue pairs and of Cys residue pairs, given by the following formula (12).

Here, D_{i,i+1} is the distance between the Cα atoms of residues i and i+1, and D_i^ss is the distance between a pair of Cys residues forming a disulfide bond. K_l and K_ss are constants set to 2 and 5, respectively.
U_ang is a function of the Cα bond angles, given by the following formula (13).

Here, θ_i (rad) is the angle formed by the Cα atoms of residues i, i+1 and i+2. θ₀ is set to (100/180)·π rad based on X-ray structures in the PDB. K_a is a constant set to 1.
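Formulas (12) and (13) are missing from this text. The sketch below assumes simple harmonic forms, K_l(D_{i,i+1} - D0)^2 and K_a(θ_i - θ0)^2, using the constants given above. The ideal Cα-Cα distance D0 = 3.8 Å is an assumption (a typical value for trans peptides), and the disulfide part of U_len is omitted.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

K_L = 2.0                           # K_l (from the text)
K_A = 1.0                           # K_a (from the text)
THETA0 = (100.0 / 180.0) * math.pi  # θ0 in rad (from the text)
D0 = 3.8                            # ASSUMED ideal Cα-Cα distance in Å


def u_len_adjacent(ca: Sequence[Vec3]) -> float:
    """Assumed harmonic penalty on sequence-adjacent Cα-Cα distances."""
    return sum(K_L * (math.dist(ca[i], ca[i + 1]) - D0) ** 2
               for i in range(len(ca) - 1))


def u_ang(ca: Sequence[Vec3]) -> float:
    """Assumed harmonic penalty on Cα(i)-Cα(i+1)-Cα(i+2) angles."""
    total = 0.0
    for i in range(len(ca) - 2):
        a, b, c = ca[i], ca[i + 1], ca[i + 2]
        u = [a[k] - b[k] for k in range(3)]
        v = [c[k] - b[k] for k in range(3)]
        cos_t = sum(u[k] * v[k] for k in range(3)) / (math.dist(a, b) * math.dist(c, b))
        theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding noise
        total += K_A * (theta - THETA0) ** 2
    return total
```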
U_pos is a function of the Cα atom positions, given by the following formula (14).

Here, ‖·‖ denotes the norm, and M_i is the average distance between Cα atoms at structurally equivalent positions in the structure-based alignment. If no value of M_i is found for residue i, M_i is set to 10. The average Cα coordinate used in formula (14) is given by the following formula (15).

Here, X^j_i is the Cα coordinate of the i-th residue of the j-th reference protein, and W^j_i is the weight of the i-th residue of the j-th reference protein. This weight is an important parameter for determining the approximate shape of the target protein; it is determined from a local value, called local space homology (LSH), computed within 12 Å of the region of interest (see FIG. 3). As shown in FIG., LSH correlates very highly with the ratio of residue pairs located in structurally conserved regions (SCRs). This means that, for residues with a high LSH value, the Cα position is statistically within 1.0 Å of its position in the reference protein structure.
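Formula (15) is not reproduced in this text. A minimal sketch, assuming the average coordinate is the weighted mean of the reference Cα coordinates X_i^j with weights W_i^j normalized by their sum; computing the LSH-based weights themselves is outside this sketch.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def weighted_mean_ca(coords: Sequence[Vec3], weights: Sequence[float]) -> Vec3:
    """Weighted mean of X_i^j over reference proteins j for one residue i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return tuple(sum(w * x[k] for w, x in zip(weights, coords)) / total
                 for k in range(3))
```

A reference protein with a higher LSH-derived weight pulls the average toward its own Cα position.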
U_vdw is given by the following formula (16).

Here, K_vdw is 0.01 for D_{i,j} < 3.2 Å and 0.001 for D_{i,j} > 3.2 Å, and the cutoff is set to 6 Å.
The Cα atoms are optimized by simulated annealing using the objective function of formula (11). At this stage, the perturbation of each Cα atom is kept within 1.0 Å. The annealing step is performed 100 times over all Cα atoms. The temperature parameter starts at 25 and is multiplied by 0.5 at each step until it reaches 0.01, after which it is held constant.
The two major stages, acquisition of structural information and construction of the Cα atoms, are repeated 10 times, and the Cα coordinates with the smallest objective-function value are taken as the optimal solution.
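The annealing loop of Step I-41 (1) can be sketched as Metropolis simulated annealing over atomic coordinates. This is a generic sketch, not the FAMS code: the energy function is supplied by the caller, the 1.0 Å perturbation limit is interpreted as the maximum length of each trial move (an assumption), and each of the 10 repetitions is modeled as an independent annealing run from the same start, keeping the lowest-energy result. The 100 sweeps and the schedule starting at 25, halved each sweep and held at 0.01, follow the text.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Energy = Callable[[List[Vec3]], float]


def _trial_move(rng: random.Random, max_step: float) -> Vec3:
    """Random displacement whose length is at most max_step (rejection sampling)."""
    while True:
        d = (rng.uniform(-max_step, max_step),
             rng.uniform(-max_step, max_step),
             rng.uniform(-max_step, max_step))
        if d[0] ** 2 + d[1] ** 2 + d[2] ** 2 <= max_step ** 2:
            return d


def anneal(coords: Sequence[Vec3], energy: Energy, rng: random.Random,
           n_sweeps: int = 100, t_start: float = 25.0, factor: float = 0.5,
           t_min: float = 0.01, max_step: float = 1.0) -> Tuple[List[Vec3], float]:
    cur = list(coords)
    e_cur = energy(cur)
    best, e_best = list(cur), e_cur
    t = t_start
    for _ in range(n_sweeps):
        for i in range(len(cur)):  # one sweep = one trial move per atom
            old = cur[i]
            d = _trial_move(rng, max_step)
            cur[i] = (old[0] + d[0], old[1] + d[1], old[2] + d[2])
            e_new = energy(cur)
            delta = e_new - e_cur
            if delta <= 0 or rng.random() < math.exp(-delta / t):  # Metropolis test
                e_cur = e_new
                if e_cur < e_best:
                    best, e_best = list(cur), e_cur
            else:
                cur[i] = old  # reject the move
        t = max(t * factor, t_min)  # halve the temperature, then hold it at t_min
    return best, e_best


def anneal_best_of(coords: Sequence[Vec3], energy: Energy,
                   restarts: int = 10, seed: int = 0) -> Tuple[List[Vec3], float]:
    """Repeat the annealing and keep the lowest-energy coordinates."""
    rng = random.Random(seed)
    runs = [anneal(coords, energy, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])
```

In use, `energy` would be an implementation of formula (11) evaluated on the Cα trace.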
Step I-42: Construction and optimization of main chain atomic coordinates
The remaining main-chain atoms are added to the Cα coordinates obtained in step I-41 (1), and the objective function is minimized by simulated annealing. First, the Cα atoms are superposed in three dimensions, and residues whose Cα-Cα distance is 2.5 Å or less are picked out. For these residues, the coordinates of the main-chain atoms other than Cα are taken from the reference protein so as to minimize the distance between the Cα atoms, and are used as the model structure.
If the reference protein has no corresponding residue, the main-chain coordinates are built from the corresponding four-residue protein fragment in the database. In this process, the main-chain atoms of residue i are taken from the fragment with the smallest Cα rmsd over residues (i−1) to (i+2). For the N-terminal residue the Cα overlap range is residues i to i+3, and for the C-terminal residue and the residue immediately before it the ranges are i−3 to i and i−2 to i+1, respectively.
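The window rule above can be sketched as follows (0-based residue indices, chain of at least four residues). For brevity, the RMSD here is computed without rotational superposition, which is a simplification; a real implementation would superpose the Cα atoms first. The function names and the fragment list layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fragment_window(i: int, n: int) -> Tuple[int, int]:
    """Inclusive 0-based Cα window used to pick a fragment for residue i."""
    if n < 4:
        raise ValueError("chain must have at least 4 residues")
    if i == 0:
        return 0, 3              # N-terminal residue: i .. i+3
    if i == n - 1:
        return i - 3, i          # C-terminal residue: i-3 .. i
    if i == n - 2:
        return i - 2, i + 1      # residue before the C terminus: i-2 .. i+1
    return i - 1, i + 2          # general case: i-1 .. i+2


def rmsd_no_fit(a: Sequence[Vec3], b: Sequence[Vec3]) -> float:
    """RMSD without superposition (simplification for this sketch)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def best_fragment(model_ca: Sequence[Vec3], i: int,
                  fragments: List[Sequence[Vec3]]) -> int:
    """Index of the 4-residue fragment whose Cα atoms best match the window."""
    lo, hi = fragment_window(i, len(model_ca))
    window = model_ca[lo:hi + 1]
    return min(range(len(fragments)), key=lambda k: rmsd_no_fit(window, fragments[k]))
```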
The main-chain atoms are then optimized by simulated annealing using the objective function given by the following formula (17).

U_bond is given by the following formula (18).

Here, b_i^0 is the standard bond length, which depends on the type of chemical bond, and K_b is a constant set to 225.
U_ang is a function of the bond angles, given by the following formula (19).

Here, θ_i is the i-th bond angle, which depends on the type of chemical bond, and K_a is a constant set to 45.
U_non-bond is a function of the non-bonded interactions, given by the following formula (20).

Here, ε_{i,j} and r*_{i,j} are constants that depend on the atom types.
K_non is a constant set to 0.25, and the cutoff is 8 Å.
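Formula (20) is missing from this text. As an illustration only, the sketch below assumes a Lennard-Jones 12-6 form written in terms of the well depth ε_{i,j} and the minimum-energy distance r*_{i,j}, ε[(r*/r)^12 - 2(r*/r)^6], scaled by K_non = 0.25 and truncated at the 8 Å cutoff. The exclusion set for bonded pairs is also an assumption.

```python
import math
from typing import AbstractSet, Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
K_NON = 0.25   # from the text
CUTOFF = 8.0   # Å, from the text


def u_nonbond(coords: Sequence[Vec3],
              params: Callable[[int, int], Tuple[float, float]],
              excluded: AbstractSet[Tuple[int, int]] = frozenset()) -> float:
    """Assumed LJ-type non-bonded energy; params(i, j) returns (eps, r_star)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in excluded:  # e.g. bonded pairs (assumption)
                continue
            r = math.dist(coords[i], coords[j])
            if r > CUTOFF:
                continue
            eps, r_star = params(i, j)
            s6 = (r_star / r) ** 6
            total += K_NON * eps * (s6 * s6 - 2.0 * s6)  # minimum of -eps at r = r*
    return total
```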
U_ss is a function of the disulfide bonds formed by Cys residues, given by the following formula (21).

Here, K^ss_Cα and K^ss_Cβ are constants, both set to 7.5.
U_pos is a function of the atom positions, given by the following formula (22).

Here, <W_iX_i> is given by the following formula (23).

<W_iX_i> in formula (22) is determined from the structural superposition of the target protein and the reference protein.
K_pos is a constant set to 0.3.
U_tor is a function of the main-chain torsion angles, given by the following formula (24).

Here, φ_i^0 and ψ_i^0 are the φ_i and ψ_i values of the nearest helix angle on the Ramachandran map. ω_i^0 is 0, and is π (rad) only for cis-Pro residues. K_t and K_ω are constants set to 10 and 50, respectively.
U_chi concerns the chirality of the Cα atom, as shown in the following formula (25).

Here, τ_i is the torsion angle defined by N-Cα-Cβ-C, and K_chi is 50.
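The torsion angle τ_i (N-Cα-Cβ-C) used by U_chi, and again in the termination check of Step I-43 (2), can be computed with the standard dihedral formula sketched below. The sign convention (IUPAC: positive when the near bond must turn clockwise to eclipse the far bond, viewed along the central bond) is an assumption, since the text does not state it.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For τ_i the arguments would be the N, Cα, Cβ and C coordinates of residue i, in that order.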
U_hydr concerns main-chain hydrogen bonds conserved among homologous proteins and is defined by the following formula (26).

A hydrogen bond is assigned when the distance between the N atom and the O atom is 2.9 ± 0.5 Å.
When there are multiple reference proteins, a hydrogen bond is judged to be present if it is found in 75% or more of them. K_hydr is a constant set to 0.6.
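A minimal sketch of the conservation rule for U_hydr, assuming that residue numbers in each reference protein have already been mapped onto common alignment positions and that hydrogen bonds are keyed by (N-side residue, O-side residue). The 2.9 ± 0.5 Å window and the 75% threshold come from the text; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Vec3 = Tuple[float, float, float]
Pair = Tuple[int, int]

HB_MIN, HB_MAX = 2.4, 3.4   # 2.9 ± 0.5 Å (from the text)
CONSERVED_FRACTION = 0.75   # present in >= 75% of references (from the text)


def hbond_pairs(n_atoms: Dict[int, Vec3], o_atoms: Dict[int, Vec3]) -> Set[Pair]:
    """(N residue, O residue) pairs whose N-O distance is 2.9 ± 0.5 Å."""
    return {(i, j) for i, n in n_atoms.items() for j, o in o_atoms.items()
            if i != j and HB_MIN <= math.dist(n, o) <= HB_MAX}


def conserved_hbonds(per_reference: Iterable[Set[Pair]]) -> Set[Pair]:
    """Keep hydrogen bonds found in at least 75% of the reference proteins."""
    sets = list(per_reference)
    counts = Counter(p for s in sets for p in s)
    needed = CONSERVED_FRACTION * len(sets)
    return {p for p, c in counts.items() if c >= needed}
```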
Next, the main-chain atoms including Cβ are optimized by simulated annealing. In this annealing process, the perturbation of the main-chain and Cβ atoms is kept within 1.0 Å of their initial positions. The annealing step is performed 200 times for the main-chain and Cβ atoms. The temperature parameter starts at 50 or 25 and is multiplied by 0.5 at each step until it reaches 0.01, after which it is held constant.
To sample main-chain conformations widely, the method of the present invention preferably performs the above procedure six times and takes the main-chain coordinates with the smallest objective-function value as the optimal solution. The temperature parameter starts at 50 in the first two runs and at 25 from the third run onward.
Step I-43: Construction and optimization of side chain atomic coordinates
The construction of side chains is roughly divided into two stages: “Side chain construction at structure-conserved sites” (Step I-43 (1)) and “Overall side chain construction” (Step I-43 (2)).
Step I-43 (1): Construction of side chains at structure-conserved sites
For the calculated main-chain atoms, side-chain torsion angles are obtained from the homologous proteins using a previously reported method. Details of this method can be found in Ogata K and Umeyama H, “The role played by environmental residues on sidechain torsional angles within homologous families of proteins: a new method of side-chain modeling,” Proteins: Struct. Funct. Genet. 1998, 31, 355-369.
In this method, the proportion of side chains conserved among the homologous proteins is calculated, and the side chains are modeled on the basis of this information. The side-chain atoms of the conserved portions are placed onto the fixed main-chain atoms. For example, if the χ¹ angle of an arginine residue is conserved among the homologous proteins, the Cγ atom can be placed, and if the χ¹ and χ² angles are conserved, all side-chain atoms can be placed. Simulated annealing with formula (17) is then performed for the main-chain and Cβ atoms only, with atomic perturbations kept within 1.0 Å. This annealing stage for the main-chain and Cβ atoms is performed 200 times. The temperature parameter starts at 25 and is multiplied by 0.5 at each step until it reaches 0.01. U_non-bond in formula (17) is evaluated over the main-chain atoms and the partially built side-chain atoms, and the side-chain coordinates are retained throughout the optimization.
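Placing a side-chain atom from a conserved torsion angle, for example Cγ from χ¹ given N, Cα and Cβ, is a conversion from internal to Cartesian coordinates. The sketch below uses the common NeRF construction; the bond length and bond angle in the usage note are illustrative assumptions, not values from the text.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a: Vec3, b: Vec3, c: Vec3, bond: float,
               angle_deg: float, torsion_deg: float) -> Vec3:
    """NeRF: put atom d on c with |cd| = bond, angle b-c-d and torsion a-b-c-d."""
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal of the a-b-c plane
    m = _cross(n, bc)                  # completes an orthonormal frame
    th, ph = math.radians(angle_deg), math.radians(torsion_deg)
    d2 = (-bond * math.cos(th),
          bond * math.sin(th) * math.cos(ph),
          bond * math.sin(th) * math.sin(ph))
    return tuple(c[k] + d2[0] * bc[k] + d2[1] * m[k] + d2[2] * n[k] for k in range(3))
```

For Cγ one would call `place_atom(N, CA, CB, bond, angle, chi1)` with a C-C bond length of roughly 1.5 Å and a tetrahedral-like bond angle (assumed typical values).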
M_i and the hydrogen-bonded N-O pairs are used as structural information in the optimization. To sample main-chain arrangements, the above process is repeated three times, and the main-chain coordinates with the smallest objective-function value are used as the calculated structure.
Step I-43 (2): Construction of the entire side chain
Side chains are built with the main-chain and Cβ atoms held fixed, using the method disclosed in Ogata K and Umeyama H, Proteins: Struct. Funct. Genet. 1998, 31, 355-369, which gives an accurate model in a short time. Next, the main-chain structure is optimized by a low-temperature Monte Carlo method with the temperature set to 0.001, evaluating U_non-bond of the objective function in formula (17) over all main-chain and side-chain atoms. During the optimization of the N, Cα, C and Cβ atoms, the side-chain coordinates are rearranged so that the side-chain torsion angles remain optimal, and atomic perturbations are kept within 0.5 Å. The side chains are then deleted and the side-chain construction above is repeated. This cycle continues until no atomic collisions (interatomic distances below 2.4 Å) remain and the N-Cα-Cβ-C torsion angle falls within −120 ± 15°.
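The termination test described above can be sketched as a simple check. It assumes that bonded or otherwise excluded atom pairs are supplied by the caller (the text does not say which pairs are exempt from the 2.4 Å criterion) and that the N-Cα-Cβ-C torsions have already been computed in degrees.

```python
import math
from itertools import combinations
from typing import AbstractSet, Iterable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
CLASH_DIST = 2.4                     # Å, from the text
TAU_TARGET, TAU_TOL = -120.0, 15.0   # degrees, from the text


def has_clash(coords: Sequence[Vec3],
              excluded: AbstractSet[Tuple[int, int]] = frozenset()) -> bool:
    """True if any non-excluded atom pair is closer than 2.4 Å."""
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in excluded:
            continue
        if math.dist(coords[i], coords[j]) < CLASH_DIST:
            return True
    return False


def torsions_ok(taus_deg: Iterable[float]) -> bool:
    # wrap each difference into [-180, 180) before comparing with the tolerance
    return all(abs((t - TAU_TARGET + 180.0) % 360.0 - 180.0) <= TAU_TOL
               for t in taus_deg)


def side_chain_converged(coords: Sequence[Vec3], taus_deg: Iterable[float],
                         excluded: AbstractSet[Tuple[int, int]] = frozenset()) -> bool:
    return not has_clash(coords, excluded) and torsions_ok(taus_deg)
```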
Step I-44: Construction of the final structure
In this way, atomic coordinates defining induced-fit three-dimensional structures of any target protein can be obtained.
II. Method for constructing three-dimensional structure of protein-ligand complex
Next, a method for constructing a three-dimensional structure of a protein-ligand complex which is another embodiment of the present invention will be described with reference to the drawings. FIG. 5 is a flowchart showing an example of a method for constructing a target protein-ligand complex, that is, a method for constructing a complex including an induced fit.
First, in Step II-10, modeled atomic coordinates of the target protein are obtained. Normal modes are obtained by normal mode analysis of the optimized reference protein. The atomic coordinates of the reference protein, mainly obtained experimentally, are then displaced along the eigenvector directions to create a set of reference protein structures, and the three-dimensional structure of the target protein is constructed by homology modeling with reference to these coordinates.
In Step II-20, the ligand is docked to the obtained three-dimensional structures of the target protein. In Step II-30, empirical molecular energy calculations by the MCSS method are carried out on the ligand docked to the target protein set, and the three-dimensional structure of the target protein-ligand complex is simulated. The complex structure obtained in this way includes induced fit of the target protein, that is, periodic thermal motion (molecular fluctuation), and can be used as such.
Hereinafter, each step will be described in more detail.
Step II-10: Modeling the target protein
Modeling of the target protein is divided into three steps: II-11, optimization of the initial coordinates of the reference protein; II-12, normal mode analysis of the optimized coordinates; and II-13, modeling of the target protein. These steps are performed in the same manner as Steps I-10 to I-44. In this way, three-dimensional structures of the target protein based on the normal modes, that is, structures including induced fit, can be constructed.
Step II-20: Docking the ligand to the target protein
Ligand docking is performed on the multiple three-dimensional structure models of the target protein that take the normal modes into account. The ligand is docked at the position considered to be the ligand binding site of the target protein. This step is performed with commercially available software that can read and write PDB-format files, such as BIOCES (NEC), Cerius 2 (Accelrys), SYBYL (TRIPOS) and HyperChem (Hypercube). Docking is generally performed by rotating and translating the ligand on a display capable of stereo display. Docking that includes a simple energy calculation may also be performed.
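The manual rotate-and-translate operation used in docking amounts to a rigid-body transformation of the ligand coordinates. A minimal sketch using Rodrigues' rotation formula about the ligand centroid; it does not reproduce any feature of the commercial programs named above.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_translate(coords: Sequence[Vec3], axis: Vec3, angle_deg: float,
                     shift: Vec3) -> List[Vec3]:
    """Rotate a rigid ligand about its centroid (Rodrigues' formula), then translate."""
    n = len(coords)
    cen = tuple(sum(p[k] for p in coords) / n for k in range(3))
    norm = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / norm for a in axis)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = []
    for p in coords:
        vx, vy, vz = p[0] - cen[0], p[1] - cen[1], p[2] - cen[2]
        dot = kx * vx + ky * vy + kz * vz
        cx, cy, cz = ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx  # k x v
        out.append((cen[0] + vx * c + cx * s + kx * dot * (1 - c) + shift[0],
                    cen[1] + vy * c + cy * s + ky * dot * (1 - c) + shift[1],
                    cen[2] + vz * c + cz * s + kz * dot * (1 - c) + shift[2]))
    return out
```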
The ligand binding site used is not particularly limited; either an already known binding site or a newly identified one can be used. For a protein whose ligand binding site is unknown, the site can also be identified by the method described in III below.
Step II-30: Optimization of conformation of target protein-ligand complex
For the protein-ligand complex models obtained in Step II-20, an empirical molecular energy calculation of one target protein structure with the ligand structure is performed for each of the target protein structures. On the target protein side, the atomic coordinates move according to the potential energy gradient of each structure; on the ligand side, the atomic coordinates move in the direction of the average of the calculated potential energy gradients. A ligand structure based on the multiple three-dimensional structures of the target protein is thereby obtained.
Step II-30 is carried out, for example, by the Multiple Copy Simultaneous Search (MCSS) method: the multiple complex structures sharing one ligand are optimized simultaneously by empirical molecular energy calculation (molecular force field method), relaxed by empirical molecular energy calculation (molecular dynamics method), for example for 10 ps at 300 K, and their atomic coordinates are then optimized by the molecular force field method. The temperature and time may of course be varied depending on the system being calculated.
The MCSS method, reported by A. Miranker and M. Karplus (Proteins, 1991, 11, 29-34), optimizes the three-dimensional structures of both a receptor protein and a ligand using multiple copies of the ligand. In that method, empirical molecular energy calculations of the individual ligand copies with the protein are performed simultaneously, and the receptor protein gradients are averaged, so that the receptor protein moves as a single three-dimensional structure.
In contrast, the method of the present invention uses multiple molecular structures on the protein side and a single molecular structure on the ligand side to obtain a ligand structure based on the multiple protein structures. In this empirical molecular energy calculation, one protein structure and the one ligand structure are calculated together for each of the protein structures. On the ligand side, the atomic coordinates of the ligand move in the direction of the average of the calculated potential energy gradients; on the target protein side, the atomic coordinates move according to the potential energy gradient of each structure. A ligand structure based on the multiple three-dimensional structures of the target protein is thereby obtained.
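The update rule that distinguishes this scheme from MCSS can be sketched as one steepest-descent step: each protein copy follows its own gradient, while the shared ligand follows the gradient averaged over copies. The gradient function, step size and flattened coordinate vectors are assumptions for illustration; apricot-MCSS itself is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# grad_fn(protein, ligand) -> (dE/dprotein, dE/dligand) for ONE protein copy
GradFn = Callable[[Sequence[float], Sequence[float]], Tuple[Vector, Vector]]


def multicopy_step(proteins: Sequence[Sequence[float]], ligand: Sequence[float],
                   grad_fn: GradFn, step: float = 0.01) -> Tuple[List[Vector], Vector]:
    """One descent step: proteins move independently, the ligand by the mean gradient."""
    new_proteins: List[Vector] = []
    lig_grads: List[Vector] = []
    for p in proteins:
        g_p, g_l = grad_fn(p, ligand)
        new_proteins.append([x - step * g for x, g in zip(p, g_p)])  # own gradient
        lig_grads.append(g_l)
    mean_g = [sum(g[i] for g in lig_grads) / len(lig_grads) for i in range(len(ligand))]
    new_ligand = [x - step * g for x, g in zip(ligand, mean_g)]  # averaged gradient
    return new_proteins, new_ligand
```

Iterating this step relaxes every protein copy toward the ligand while the single ligand settles at a compromise position across all copies.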
The empirical molecular energy calculation method is not particularly limited and may be any known method, but the apricot-MCSS program, based on the apricot program developed by the inventors (Yoneda, S., and Umeyama, H., J. Chem. Phys. 1992, 97, 6730-6736), is preferably used. As the empirical potential function, an AMBER-type potential function (S. J. Weiner, P. A. Kollman, D. A. Case, U. Chandra Singh, C. Ghio, G. Alagona, S. Profeta, Jr., P. Weiner, J. Am. Chem. Soc., 1984, 106, 765-784) is preferably used, with parameter set 89a Rev A. Other empirical potentials can of course be used.
In the molecular dynamics calculation, it is preferable to add, in addition to the normal energy terms, a harmonic restraint potential on the Cα atom positions, for example as in the following formula (27), so that the initial three-dimensional structure of the target protein is not greatly distorted. This is important for compensating for the coarseness of the approximations in the calculation; the restraint is not limited to Cα atoms and may be extended to the entire main chain.

Here, Uxyz is the restraint potential energy applied to the Cα atom positions of the target protein; x0 is the original Cα coordinate, x is the updated coordinate, and Kxyz is a parameter that determines how strongly the atom is restrained. Kxyz = 10.0 kcal/mol/Å² is used as an example, which does not limit the scope of the present invention, including the form of the formula.
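Formula (27) is missing from this text. Assuming the usual harmonic form Uxyz = Kxyz Σ|x - x0|² over the restrained Cα atoms (the text does not say whether a factor of 1/2 is included), the energy and its gradient can be sketched as follows, with the example value Kxyz = 10.0 kcal/mol/Å².

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
K_XYZ = 10.0  # kcal/mol/Å^2, example value from the text


def restraint_energy(x: Sequence[Vec3], x0: Sequence[Vec3], k: float = K_XYZ) -> float:
    """Assumed form: U = K * sum |x - x0|^2 over the restrained Cα atoms."""
    return k * sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                   for a, b in zip(x, x0))


def restraint_gradient(x: Sequence[Vec3], x0: Sequence[Vec3],
                       k: float = K_XYZ) -> List[Vec3]:
    """dU/dx = 2K (x - x0) for each restrained atom."""
    return [tuple(2.0 * k * (a[i] - b[i]) for i in range(3)) for a, b in zip(x, x0)]
```

The gradient would be added to the force-field gradient at each molecular dynamics step, pulling each Cα back toward its starting position.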
Alternatively, instead of restraining the X, Y and Z coordinates of the Cα atoms, the main-chain torsion angles of the target protein may be restrained as in formula (24); that is, adding to the empirical molecular energy calculation a potential function that restrains the main-chain torsion angles of the target protein can likewise prevent the initial three-dimensional structure from being greatly distorted.
Thus, by using induced-fit three-dimensional structure models for the target protein, atomic coordinates of the target protein-ligand complex that take molecular fluctuation into account can be obtained.
When the ligand molecule is itself a protein, a three-dimensional structure of the ligand-protein complex that accounts for induced fit on the ligand side can likewise be constructed, in the same manner as above, from multiple ligand structures including its normal modes and a single three-dimensional structure of the protein.
III. Method for identifying ligand binding site of protein
Next, a method for identifying a ligand binding site of a protein, which is another embodiment of the present invention, will be described. FIG. 6 is a flowchart showing an example of a method for identifying a ligand binding site of a protein and a method for constructing a three-dimensional structure of a protein-ligand complex by binding a ligand to the obtained binding site.
In Step III-10, the protein-ligand binding site is identified (predicted). In this step, low-molecular-weight compounds such as nonpolar solvents are generated around and inside the protein and/or ligand, for example on hydrophobic surfaces, and many water molecules are added around them so that a molecular dynamics calculation can be performed in an effectively aqueous solution. From the results, the binding site between the protein and the ligand is searched for based on the behavior of the low-molecular-weight compound, for example the nonpolar solvent, on the protein and/or ligand surface. In Step III-20, the protein and ligand are docked with reference to the binding site estimated in Step III-10 to determine the initial atomic coordinates of the protein-ligand complex. In Step III-30, water molecules are generated around the initial three-dimensional structure obtained in Step III-20, and the three-dimensional structure of the protein-ligand complex is refined in this effectively aqueous environment.
Hereinafter, each step will be described in more detail.
Step III-10: Identifying the ligand binding site of the protein
Identification of the protein-ligand binding site is divided into three steps: III-11, generation of low-molecular-weight compounds around the protein and/or ligand; III-12, search for the behavior of the low-molecular-weight compounds (e.g., nonpolar solvents) by empirical molecular energy calculation (molecular mechanics and molecular dynamics) of the protein and/or ligand in an aqueous solvent; and III-13, determination of the ligand binding site on the protein and/or the protein binding site on the ligand from the behavior of the low-molecular-weight compounds.
Step III-11: Generation of low molecular weight compounds around proteins and / or around ligands
First, water molecules are generated around the protein and/or ligand, and the water molecules around the protein and the ligand, including those inside the protein, can then be replaced with the low-molecular-weight compound. The replacement may place the compound over the entire periphery, or only around amino acids or functional groups that are hydrophobic or capable of hydrogen bonding. When the ligand is a macromolecule such as a peptide or protein, the low-molecular-weight compound is also generated around the ligand, and its behavior is analyzed by empirical molecular energy calculation as for the protein. When the ligand is a small molecule, such as a pharmaceutical or agrochemical, its hydrophobic regions can be determined directly, so there is usually no need to identify a binding site on it. When the ligand is a macromolecule, however, the binding site on the ligand side must be analyzed in the same way as on the protein side to identify the binding site of the complex.
The low-molecular-weight compound may be, for example, a nonpolar solvent such as ethane, cyclopentane or benzene, a hydrogen-bonding solvent such as N-methylacetamide or benzamide, or a pharmaceutical or agrochemical compound, and is not particularly limited. However, because the choice is otherwise arbitrary, objectively justifiable compounds are preferred. A nonpolar solvent allows binding sites of a protein or ligand with hydrophobic portions to be identified. A compound with an acid amide group, which is a hydrogen-bonding solvent, allows portions that can hydrogen-bond with an acid amide group to be identified, that is, exposed portions of β-sheet structures or ligand binding sites including an oxyanion hole. A pharmaceutical or agrochemical molecule allows portions to which that molecule can bind specifically to be identified.
Specifically, for example, when a nonpolar solvent such as benzene is placed around a protein, water molecules within 3.5 Å of the surface formed by amino acid residues with an MSAS value of 30% or more may be replaced with the nonpolar solvent (benzene). Where benzene molecules would lie within 1.5 Å of each other, the water molecule need not be replaced. All water molecules not replaced by the nonpolar solvent are then deleted. These replacement criteria are an example for benzene and do not limit the scope of the present invention.
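The replacement rule above can be sketched as follows. It assumes that the surface atoms of residues with an MSAS value of 30% or more have been collected beforehand, represents each benzene by its centroid placed at the replaced water position, and reads the 1.5 Å criterion as a minimum spacing between placed benzenes; all three points are interpretations, not details given in the text.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def place_benzene(waters: Sequence[Vec3], surface_atoms: Sequence[Vec3],
                  near: float = 3.5, min_sep: float = 1.5) -> List[Vec3]:
    """Return benzene centroid positions; every other water is then deleted."""
    placed: List[Vec3] = []
    for w in waters:
        if not any(math.dist(w, a) <= near for a in surface_atoms):
            continue  # water is not close to the selected surface
        if any(math.dist(w, b) < min_sep for b in placed):
            continue  # too close to a benzene that is already placed
        placed.append(w)
    return placed
```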
Step III-12: Search for the behavior of low-molecular-weight compounds by empirical molecular energy calculation of proteins and/or ligands in an aqueous medium
Using the atomic coordinates of the protein (and/or ligand) and the low-molecular-weight compound created in Step III-11 above, water molecules are generated under periodic boundary conditions, the three-dimensional structure is optimized by molecular mechanics calculation as the empirical molecular energy calculation, and a molecular dynamics calculation is then performed. After the molecular dynamics calculation, the water molecules are removed to obtain the atomic coordinates of the protein (and/or ligand) and the low-molecular-weight compound. For example, when a nonpolar solvent (benzene) is used as the low-molecular-weight compound, a molecular dynamics calculation of about 10 to 20 ps at 300 K may be performed. This causes the low-molecular-weight compound to diffuse and accumulate around and inside the protein. By analyzing this diffusion and accumulation, that is, the behavior of the low-molecular-weight compound, by the method of Step III-13 described below, the ligand binding site on the protein side and the protein binding site on the ligand side can be identified.
The empirical molecular energy calculation method is not particularly limited, but the apricot program developed by the present inventors is preferably used, with an AMBER-type potential function as the empirical potential function. Other empirical potentials can of course be used.
Step III-13: Determination of ligand binding site from the behavior of low molecular weight compounds
Cluster analysis is performed on the distribution of the low-molecular-weight compound (for example, the nonpolar solvent) around the protein and/or ligand obtained in Step III-12 above, and the sizes of the resulting clusters are used to determine which parts of the protein the ligand docks to most readily.
Here, cluster analysis is a multivariate analysis method that groups a data set given in a multidimensional space according to the similarity (or dissimilarity) between individuals. In this case, the Euclidean distances between the centroids of the nonpolar solvent molecules in three-dimensional space (for benzene, the average coordinate of its six carbon atoms) are calculated, and nonpolar solvent molecules within a threshold distance of one another are clustered together. Unlike ordinary cluster analysis, whether a molecule joins an existing cluster is decided not by its distance from the cluster centroid but by whether its distance to the nearest nonpolar solvent molecule in the cluster is within the threshold. For benzene, a threshold of 6 Å was used; this value is merely an example and does not limit the scope of the present invention.
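The clustering described above is single-linkage clustering. A minimal union-find sketch, assuming each benzene is represented by the centroid of its six carbon atoms and that "within the threshold" means a distance of 6 Å or less:

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid(atoms: Sequence[Vec3]) -> Vec3:
    """Mean coordinate, e.g. of the six benzene carbons."""
    n = len(atoms)
    return tuple(sum(a[k] for a in atoms) / n for k in range(3))


def single_linkage(points: Sequence[Vec3], threshold: float = 6.0) -> List[List[int]]:
    """Merge points whose nearest-neighbour distance is <= threshold; largest cluster first."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Because linkage uses the nearest pair, benzenes at 0, 5 and 10 Å along a line form one cluster even though the end members are 10 Å apart.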
For example, when a nonpolar solvent (benzene) is used, the molecules are classified into several clusters, and the larger a cluster, the more likely it is to be a docking site for a ligand or protein. Each cluster of nonpolar solvent molecules can be approximated by an ellipsoid, and its long-axis direction can be obtained by solving the eigenvalue problem of the cluster coordinates. Several protein-ligand complex models are then created by docking clusters on the protein side with clusters on the ligand side, using the long and short axes of the ellipsoids as guides. Complex structures in which the protein and ligand overlap are removed automatically. The placement of protein and ligand in each docked model is then fine-tuned with the software described in Step II-20.
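The long axis of a cluster is the eigenvector of the covariance matrix of its centroid coordinates with the largest eigenvalue. The sketch below obtains it by power iteration to stay dependency-free; in practice a symmetric eigensolver such as `numpy.linalg.eigh` would be the usual choice. The start vector (1, 1, 1) is an assumption and fails only if it happens to be orthogonal to the long axis.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def principal_axis(points: Sequence[Vec3], iters: int = 200) -> Vec3:
    """Unit vector along the long axis of a point cluster (power iteration)."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    d = [[p[k] - c[k] for k in range(3)] for p in points]
    # 3x3 covariance matrix of the centred coordinates
    cov = [[sum(q[r] * q[s] for q in d) / n for s in range(3)] for r in range(3)]
    v = [1.0, 1.0, 1.0]  # start vector; must not be orthogonal to the long axis
    for _ in range(iters):
        w = [sum(cov[r][s] * v[s] for s in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate cluster: all points coincide")
        v = [x / norm for x in w]
    return (v[0], v[1], v[2])
```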
Step III-20: Docking of ligand to protein
The sites that formed large clusters of the low-molecular-weight compound, for example a nonpolar solvent (benzene), in Step III-13 are docked together to obtain initial data for the protein-ligand complex structure. The coordinates of the low-molecular-weight compound, such as the nonpolar solvent (benzene), are removed at docking.
This step can be performed using commercially available software capable of inputting / outputting a PDB format file. In general, docking is performed by rotating or translating a ligand on a display capable of stereo display. Further, docking including a simple energy calculation method may be performed.
Step III-30: Construction of three-dimensional structure of protein-ligand complex
For the initial atomic coordinates of the protein-ligand complex obtained in Step III-20 above, water molecules are generated around the complex under periodic boundary conditions, the initial three-dimensional structure is optimized by molecular mechanics calculation, and a molecular dynamics calculation is performed. The conformation of the protein-ligand complex is obtained by removing the water molecules from the coordinates of the final step of the trajectory.
The molecular dynamics method is not particularly limited and may be run, for example, at 300 K for about 10 to 20 ps. The program is not particularly limited either, but the apricot program developed by the inventors is preferably used with an AMBER-type empirical force field. The program and force field are merely examples and do not limit the scope of the present invention.
Thus, taking into account that the protein-ligand complex forms in aqueous solution, the hydrophobic surfaces of the protein and ligand are found from the accumulation and diffusion of low-molecular-weight compounds such as nonpolar solvents in an aqueous solvent and are docked together, so that the atomic coordinates of the protein-ligand complex can be obtained more accurately than before.
IV. Recording media and database on which atomic coordinates that define the three-dimensional structure of proteins are recorded
By storing the three-dimensional structure of the protein or of the protein-ligand complex obtained by the above method on a suitable recording medium in a predetermined computer-readable format, a database of target protein three-dimensional structures can be constructed. The database of the present invention preferably includes the alignment information of the reference protein and the target protein together with the atomic coordinates, and may also include, as desired, a code number, information on the reference region of the reference protein, information on the target protein, Cα-Cα distances, and the like.
In the present invention, the database also means a computer system that writes the atomic coordinates on an appropriate recording medium and performs a search according to a predetermined program. Examples of suitable recording media include magnetic media such as floppy disks, hard disks, and magnetic tapes; optical disks such as CD-ROM, MO, CD-R, and CD-RW, and semiconductor memories.
V. Drug molecular design method
On a computer running an appropriate program capable of designing drug molecules such as pharmaceuticals and agrochemicals, all or part of the structural coordinates of a protein that is the target of a drug molecule (hereinafter sometimes referred to as “target protein”) obtained by the above method, or all or part of the structural coordinates in a database or recording medium in which they are recorded, can be used to identify, search for, evaluate or design drug molecules (antagonists or agonists) that interact with the target protein.
The identification, search, evaluation or design of the drug molecule is performed based on the presence or absence of the interaction between the three-dimensional structure coordinate obtained by the method of the present invention and the three-dimensional structure coordinate of the drug molecule. In the present specification, identification, search, evaluation, or design of drug molecules may be simply referred to as drug molecular design.
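The text decides drug molecular design on the basis of whether the protein and drug-molecule coordinates interact. A crude geometric version of that test is to count close protein-ligand atom pairs. The sketch below does this with NumPy; the 2.2 Å clash and 4.0 Å contact cutoffs are illustrative assumptions, and real docking programs such as DOCK or Ludi use proper scoring functions instead.

```python
import numpy as np


def contact_summary(protein_xyz, ligand_xyz, clash_cutoff=2.2, contact_cutoff=4.0):
    """Classify protein-ligand atom pairs by distance (Å).

    Cutoffs are illustrative assumptions, not values from the patent.
    A pose "interacts" here if it has at least one contact and no clashes.
    """
    p = np.asarray(protein_xyz, dtype=float)
    l = np.asarray(ligand_xyz, dtype=float)
    # All pairwise distances, shape (n_protein, n_ligand).
    d = np.linalg.norm(p[:, None, :] - l[None, :, :], axis=-1)
    n_clash = int(np.count_nonzero(d < clash_cutoff))
    n_contact = int(np.count_nonzero((d >= clash_cutoff) & (d < contact_cutoff)))
    return {"clashes": n_clash, "contacts": n_contact,
            "interacts": n_contact > 0 and n_clash == 0}
```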
The computer used for designing the molecule based on the interaction between the three-dimensional structure coordinates of the protein and the three-dimensional structure coordinates of the drug candidate molecule is not particularly limited as long as the computer is adjusted so that an appropriate program can be operated. There is no. There is no particular limitation on the storage medium of the computer. Examples of the program used for molecular design include the computer program Insight II manufactured by Accelrys. In particular, by using a program such as Ludi or DOCK specially created for this purpose alone or in combination, a drug molecule can be identified, searched, evaluated or designed more easily. In addition, docking evaluation between the three-dimensional structure coordinates of a protein and a drug molecule can be performed using software such as BIOCES manufactured by NEC described in Step II-20, for example.
Here, even if the drug molecule is a known one or a newly synthesized drug molecule having a new chemical structure, any drug molecule can be used as long as the three-dimensional structure can be obtained. It can be used in the inventive method. The three-dimensional coordinate of the drug molecule may be obtained by any method such as X-ray crystallography or modeling. Those whose three-dimensional structural coordinates are determined include an appropriate database such as CCDC (Cambridge Crystallographic Data Center: http://www.ccdc.cam.ac.uk/) and PDB (Protein Data Bank: http: // www.rcsb.org/pdb/) and the like.
Furthermore, a drug molecule can also be designed from the three-dimensional structure of the target protein by, for example, the method described in JP-A No. 2000-178209. Thus, the three-dimensional structure coordinates of the protein obtained by the method of the present invention make computer-based molecular design of drug molecules possible. However, the molecular design method of the present invention is not limited to these programs and methods.
Drug molecular design usually has two conceptual stages: first, finding a lead compound, and second, optimizing it. Both stages can be carried out by methods known per se using the three-dimensional structure coordinates of the target protein, yielding optimal candidate molecules for pharmaceuticals and agrochemicals.
VI. Screening method for pharmaceutical and agrochemical candidate molecules obtained by the molecular design method
Pharmaceutical and agrochemical candidate molecules identified, searched for, evaluated, or designed by the above method can be obtained, depending on the nature of the molecule, by, for example, a chemical synthesis method known per se. The drug molecule may be a natural or a synthetic compound, and may be either a high-molecular-weight or a low-molecular-weight compound. The candidate molecules obtained are further examined in pharmacological or physiological tests in vitro or in vivo by methods known per se, and those with the desired activity are selected. In this way, molecules that can actually be used as pharmaceuticals or agrochemicals can be obtained.
VII. Method for producing pharmaceutical and agrochemical composition
A drug molecule selected by the above screening method, for example a pharmaceutical molecule, can be administered alone to a patient having the disease or condition to be treated, or one or more such active ingredients can be administered as a mixture. Preferably, the substance is formulated as a pharmaceutical composition with pharmacologically acceptable formulation additives before administration. Examples include oral preparations such as tablets, capsules, granules, fine granules, powders, pills, microcapsules, liposome preparations, troches, sublingual tablets, liquids, elixirs, emulsions, and suspensions, and parenteral preparations such as injections prepared as sterile aqueous or oily liquids, suppositories, ointments, and patches. These are produced, for example, by mixing the active ingredient with physiologically acceptable carriers, flavoring agents, excipients, vehicles, preservatives, stabilizers, binders, and the like in the unit dosage form required by accepted pharmaceutical practice, using methods well known in the art such as filling or tableting. The amount of active ingredient in these compositions is chosen so that an appropriate dose within the indicated range is obtained.
When the agrochemical molecule is actually used as an agrochemical, it is mixed by known methods with carriers or diluents, additives, adjuvants, and the like to form a formulation (composition) commonly used for agrochemicals, such as powders, granules, wettable powders, emulsions, aqueous solutions, or flowables.
Example
Hereinafter, the present invention will be described more specifically with reference to examples. However, the following examples are intended only to aid a concrete understanding of the present invention and do not limit its scope in any way.
Example 1 Construction of three-dimensional structure of β2 adrenergic receptor
Following the method detailed in Steps I-10 to I-40 of the embodiment described above, a three-dimensional structure of the human β2 adrenergic receptor including induced fit was constructed as follows. A flowchart is shown in FIG. 7.
The three-dimensional structure model was constructed on an NEC workstation (model: Express 5800/120Rc-2; CPU: 2 × Pentium III 933 MHz; OS: Red Hat Linux 6.2J; memory: 1024 MB). The amino acid sequence of the β2 adrenergic receptor was obtained from PIR (http://www-nbrf.georgetown.edu/pir/), ID: QRHUB2.
Using this amino acid sequence as the target-protein sequence, an alignment was computed with PSI-BLAST (Position-Specific Iterated BLAST). The motif profile was built from a total of 892 sequences from GCRDb (http://www.gcrdb.usthscsa.edu/). The amino acid sequence of the β2 adrenergic receptor is shown as SEQ ID NO: 1.
Chain B of PDB (http://www.rcsb.org/pdb/) entry 1F88 (rhodopsin) was used as the reference protein three-dimensional structure, and an alignment with this chain was obtained. The sequence of chain B of 1F88 is given in the Sequence Listing, and the alignment is shown in FIG. 8. The crystal lattice of 1F88 contains a dimer of chains A and B with nearly identical three-dimensional structures; chain B was used as the reference structure. Because the coordinates of both chains contain large gaps and are incomplete, the 1F88 structure was first modeled with the modeling program FAMS, detailed in Step I-40, and the resulting structure was used as the reference protein three-dimensional structure for the β2 adrenergic receptor.
Because hydrogen atoms are not present on the relevant residues in the PDB file or in the FAMS output, hydrogen atoms were generated on the appropriate residues of this reference protein structure to give the initial atomic coordinates used as input for normal mode analysis.
As in Steps I-10 to I-20, the initial atomic coordinates were optimized in the Cartesian coordinate system, re-optimized in the Cartesian coordinate system with some of the potential parameters for the S-S bonds set to zero, and then subjected to normal mode analysis in the dihedral-angle coordinate system to obtain eigenvalues and eigenvectors.
AMBER parm89a Rev A was used as the parameter set. Non-bonded interactions were cut off with an inner cutoff of 9.0 Å and an outer cutoff of 10.0 Å; 1-4 interactions were scaled by 1/2 of the non-bonded interaction; and a distance-dependent dielectric constant (1/r, r in Å) was used. Optimization used the Fletcher-Reeves conjugate gradient method. After the initial atomic coordinates were optimized in the Cartesian coordinate system, they were re-optimized in the Cartesian coordinate system under the same conditions except that the S-S bond-angle and dihedral-angle parameters were set to zero, and normal mode analysis was then performed in the dihedral-angle coordinate system to obtain eigenvalues and eigenvectors.
The optimization conditions were those described in Sumikawa, H., Suzuki, E.-I., Fukuhara, K.-I., Nakajima, Y., Kamiya, K., and Umeyama, H. (1998). Dynamics structure of granulocyte colony-stimulating factor proteins prepared by normal mode analysis: Two-domain-type of symposium. Chem. Pharm. Bull. 46: 1069-1077. The normal mode analysis in the dihedral-angle coordinate system followed the methods described in Noguti, T., and Go, N. (1983). Dynamics of native globular proteins in terms of dihedral angles. J. Phys. Soc. Jpn. 52: 3283-3288, and Noguti, T., and Go, N. (1983). A method of rapid calculation of a second derivative matrix of conformational energy for large molecules. J. Phys. Soc. Jpn. 52: 3685-3690.
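The patent performs normal mode analysis in dihedral-angle coordinates (the Noguti-Go method). As a simpler illustration of how eigenvalues and eigenvectors arise and how eigenvalues are expressed in cm^-1, here is a sketch of Cartesian normal mode analysis from a mass-weighted Hessian. This is a swapped-in technique, not the dihedral-space method; units (kcal/mol/Å^2 for the Hessian, amu for masses) are assumptions.

```python
import numpy as np

# Unit conversions (SI).
KCAL_PER_MOL = 4184.0 / 6.02214076e23   # J per (kcal/mol) per molecule
ANGSTROM = 1e-10                        # m
AMU = 1.66053906660e-27                 # kg
C_CM_PER_S = 2.99792458e10              # speed of light, cm/s


def normal_modes(hessian, masses):
    """Cartesian normal mode analysis.

    hessian: (3N, 3N) second-derivative matrix in kcal/mol/Å^2.
    masses:  (N,) atomic masses in amu.
    Returns (wavenumbers in cm^-1, ascending; negative means imaginary,
             Cartesian displacement directions as unit-norm columns).
    """
    m3 = np.repeat(np.asarray(masses, dtype=float), 3)
    inv_sqrt_m = 1.0 / np.sqrt(m3)
    # Mass-weighted Hessian: H_ij / sqrt(m_i m_j).
    mw = np.asarray(hessian, dtype=float) * inv_sqrt_m[:, None] * inv_sqrt_m[None, :]
    eigvals, eigvecs = np.linalg.eigh(mw)
    omega2 = eigvals * KCAL_PER_MOL / ANGSTROM**2 / AMU       # rad^2 / s^2
    omega = np.sign(omega2) * np.sqrt(np.abs(omega2))
    wavenumbers = omega / (2.0 * np.pi * C_CM_PER_S)          # cm^-1
    # Undo mass weighting to get Cartesian directions, then renormalize.
    cart = eigvecs * inv_sqrt_m[:, None]
    cart /= np.linalg.norm(cart, axis=0)
    return wavenumbers, cart
```

For a protein, the six lowest values are near zero (overall translation and rotation) and the low-frequency modes that follow are the long-period motions used to build the induced fit structures.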
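The non-bonded settings above can be sketched for the electrostatic term as follows: Coulomb energy with a distance-dependent dielectric (ε = r, so the energy goes as 1/r^2), 1-4 pairs scaled by 1/2, and smoothing between the 9.0 Å inner and 10.0 Å outer cutoffs. The patent does not say which smoothing function is used; the CHARMM-style switching polynomial here is an assumption, and van der Waals terms are omitted.

```python
import numpy as np

COULOMB = 332.0636  # kcal*Å/(mol*e^2)


def switch(r, r_on=9.0, r_off=10.0):
    """Switching function: 1 inside r_on, 0 beyond r_off, smooth between (assumed form)."""
    r = np.asarray(r, dtype=float)
    r2, on2, off2 = r * r, r_on**2, r_off**2
    s = (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3
    return np.where(r <= r_on, 1.0, np.where(r >= r_off, 0.0, s))


def electrostatic_energy(xyz, charges, pairs, pairs14, scale14=0.5, r_on=9.0, r_off=10.0):
    """Electrostatic energy (kcal/mol) with epsilon = r.

    pairs:   index pairs separated by more than three bonds.
    pairs14: 1-4 index pairs, scaled by scale14.
    """
    xyz = np.asarray(xyz, dtype=float)
    q = np.asarray(charges, dtype=float)

    def term(idx, scale):
        if len(idx) == 0:
            return 0.0
        i, j = np.asarray(idx).T
        r = np.linalg.norm(xyz[i] - xyz[j], axis=1)
        # Distance-dependent dielectric: q_i q_j / (eps * r) with eps = r.
        e = COULOMB * q[i] * q[j] / r**2 * switch(r, r_on, r_off)
        return scale * float(np.sum(e))

    return term(pairs, 1.0) + term(pairs14, scale14)
```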
As in Step I-30, at a temperature of 300 K, the positional fluctuation of the Cα atoms was obtained for each eigenvalue of 30 cm^-1 or less. For each eigenvalue, the ratio of the Cα fluctuation converted from the average temperature factor of chains A and B of PDB ID 1F88 (rhodopsin) to the Cα fluctuation of that mode was taken, and the average of this ratio over the Cα atoms was computed. The eigenvector belonging to the eigenvalue was multiplied by this average ratio and added to the atomic coordinates of the reference protein, giving coordinates that define an induced fit reference protein three-dimensional structure. Likewise, displacements by the eigenvector multiplied by −1 times the average ratio, by 2 times the average ratio, and by −2 times the average ratio were applied. The eigenvectors added here were converted from dihedral-angle coordinates to Cartesian coordinates. Each eigenvalue/eigenvector pair therefore yields four induced fit reference protein three-dimensional structures. There were 118 eigenvalues of 30 cm^-1 or less, giving 472 (= 118 × 4) induced fit reference protein structures. As an example, FIG. 9 shows the temperature factors converted from the fluctuations of the lowest eigenvalue, 4.47 cm^-1, multiplied by M^v (= 26.4).
As in Step I-40, the three-dimensional structure of the target protein, the β2 adrenergic receptor, was modeled with FAMS from the no-induced-fit reference protein structure and from the set of induced fit reference protein structures. Because target-protein structures correspond one-to-one with reference-protein structures, 472 induced fit target protein structures and one no-induced-fit target protein structure (as obtained by the conventional method) were obtained. As an example, FIG. 10 shows part of the no-induced-fit target protein structure built from the no-induced-fit reference protein, together with the induced fit target protein structures built from the reference protein structures displaced by ±2 × M^v (± 2 × 26.4) times the eigenvector of the lowest eigenvalue. In the figure, the central structure is the no-induced-fit target protein.
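The displacement step can be sketched as follows. Crystallographic temperature factors are converted to RMS fluctuations with the standard isotropic relation ⟨Δr^2⟩ = 3B/(8π^2), the average ratio M over Cα atoms is formed, and the Cartesian eigenvector is added with factors +1, −1, +2, −2 times M. Treating "fluctuation" as RMS fluctuation, and taking the per-atom mode fluctuation as a precomputed input, are assumptions; Formula 1 or 2 of the claims is not reproduced here.

```python
import numpy as np


def bfactor_to_rmsf(b):
    """Isotropic B (Å^2) -> RMS positional fluctuation (Å), using <dr^2> = 3B / (8 pi^2)."""
    return np.sqrt(3.0 * np.asarray(b, dtype=float) / (8.0 * np.pi ** 2))


def average_ratio(exp_rmsf, mode_rmsf):
    """Mean over Cα atoms of (experimental fluctuation / single-mode fluctuation)."""
    exp_rmsf = np.asarray(exp_rmsf, dtype=float)
    mode_rmsf = np.asarray(mode_rmsf, dtype=float)
    return float(np.mean(exp_rmsf / mode_rmsf))


def induced_fit_sets(coords, eigvec_cart, mode_rmsf, exp_rmsf, factors=(1, -1, 2, -2)):
    """Build displaced reference structures coords + f * M * v for each factor f.

    coords:      (N, 3) reference-protein coordinates.
    eigvec_cart: (N, 3) eigenvector of one mode, already converted to Cartesian.
    mode_rmsf:   (n_CA,) Cα fluctuations of this mode at the chosen temperature.
    exp_rmsf:    (n_CA,) Cα fluctuations from the crystallographic B factors.
    """
    m = average_ratio(exp_rmsf, mode_rmsf)
    coords = np.asarray(coords, dtype=float)
    v = np.asarray(eigvec_cart, dtype=float)
    return {f: coords + f * m * v for f in factors}
```

Applied to each of the 118 modes below 30 cm^-1, this gives the 4 × 118 = 472 displaced reference structures described above.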
Example 2 Construction of 3D structure of trypsin and trypsin inhibitor
In this example, the method of the present invention for constructing the three-dimensional structure of a protein-ligand complex was verified using the bovine pancreatic β-trypsin (trypsin)/trypsin inhibitor (BPTI) system, for which X-ray crystal structures of the receptor, the ligand, and the receptor-ligand complex are known. Here, trypsin is the receptor protein (target protein) and BPTI is the ligand.
The amino acid sequence of the trypsin used is shown as SEQ ID NO: 3, and that of the trypsin inhibitor (BPTI) as SEQ ID NO: 4. Because trypsin residues are numbered according to the sequence of chymotrypsinogen (the precursor of chymotrypsin), the 223 residues of trypsin carry numbers 16 to 245. Within this range, numbers 35, 36, 68, 128, 131, 188, 205, 206, 207, and 208 are absent, and 184, 188, and 221 are duplicated (denoted 184A, 188A, and 221A).
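The chymotrypsinogen-based numbering can be checked with a short script: 230 numbers from 16 to 245, minus the 10 absent numbers, plus the 3 insertion codes, gives 223 residues. The function name and the choice to place each insertion code immediately after its base number are illustrative assumptions.

```python
def trypsin_numbering(first=16, last=245,
                      missing=(35, 36, 68, 128, 131, 188, 205, 206, 207, 208),
                      inserted_after=(184, 188, 221)):
    """Residue labels for trypsin in chymotrypsinogen numbering.

    Numbers in `missing` are skipped; each number in `inserted_after`
    is followed by an insertion-code residue such as "184A".
    """
    labels = []
    for n in range(first, last + 1):
        if n not in missing:
            labels.append(str(n))
        if n in inserted_after:
            labels.append(f"{n}A")
    return labels


labels = trypsin_numbering()
print(len(labels))  # 223
```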

Following the method detailed in Steps II-10 to II-30, a three-dimensional structure model of the trypsin-BPTI complex was constructed by the following procedure, and the positions of the active-site residues in the complex were compared with the X-ray crystallographic data.
The three-dimensional model of the receptor protein-ligand complex was constructed on a Dell personal computer (model: Dimension XPS B866; CPU: Pentium III 864 MHz; OS: Red Hat Linux 6.2J; memory: 512 MB). X-ray crystallographic coordinates of trypsin alone, BPTI alone, and the trypsin-BPTI complex were obtained from the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) as entries 1TLD (trypsin), 4PTI (BPTI), and 2PTC (trypsin-BPTI complex), respectively.
To make comparison with the trypsin-BPTI complex straightforward, the coordinates of trypsin (1TLD) and BPTI (4PTI) were superimposed onto the coordinate frame of the complex (2PTC) by a least-squares fit. Hydrogen atoms were added to the heteroatoms of trypsin and BPTI, and the initial coordinates of each molecule were then optimized separately. Next, normal mode analysis was performed on trypsin in a system without BPTI, and a vibration vector was obtained for each mode.
The three-dimensional structure of BPTI was then docked to the trypsin structures generated from the long-period (low-frequency) vibration vectors, and the structure of the trypsin-BPTI complex was refined by MCSS calculation with the apricot-MCSS program. The MCSS calculation proceeded as follows. First, the complex was optimized by 1,000 steps of molecular mechanics; then the structure of the trypsin-BPTI complex was relaxed by molecular dynamics at 300 K for 10 ps with a 1 fs time step. During the molecular dynamics, a positional restraint of Kxyz = 10.0 kcal/mol/Å² (formula (27)) was applied to the Cα atoms so that the complex structure would not be grossly distorted. The coordinates of the trypsin-BPTI complex after 10 ps were saved in PDB format.
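A least-squares superposition of this kind is commonly done with the Kabsch algorithm. The sketch below fits matched atoms (for example, corresponding Cα atoms of 1TLD and of trypsin in 2PTC) and returns the fitted coordinates, rotation, and RMSD; the patent does not name the algorithm it used, so Kabsch is an assumption.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares superposition of matched atoms (Kabsch algorithm).

    mobile, target: (N, 3) coordinates of corresponding atoms.
    Returns (fitted mobile coordinates, rotation matrix R, RMSD in input units).
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    # Covariance of the centered coordinate sets.
    H = (P - pc).T @ (Q - qc)
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = (P - pc) @ R.T + qc
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - Q) ** 2, axis=1))))
    return fitted, R, rmsd
```

To move a whole molecule into the complex frame, apply the same transform, `(X - pc) @ R.T + qc`, to all of its atoms using the centroids of the fitted subset.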
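Formula (27) is not reproduced in this section, so the following is only a sketch assuming a plain harmonic positional restraint on the selected (Cα) atoms, E = K Σ |r_i − r_i^0|^2 with K = 10.0 kcal/mol/Å^2; the actual formula may differ, for example by a factor of 1/2. It returns the energy and the gradient that would be added to the force field during molecular dynamics.

```python
import numpy as np


def positional_restraint(xyz, ref_xyz, mask, k=10.0):
    """Harmonic positional restraint (assumed form of formula (27)).

    E = k * sum_i |r_i - r_i0|^2 over atoms where mask is True,
    with k in kcal/mol/Å^2. Returns (energy, gradient dE/dr).
    """
    xyz = np.asarray(xyz, dtype=float)
    ref = np.asarray(ref_xyz, dtype=float)
    m = np.asarray(mask, dtype=bool)
    # Only restrained atoms contribute; others get zero displacement.
    diff = np.where(m[:, None], xyz - ref, 0.0)
    energy = k * float(np.sum(diff ** 2))
    grad = 2.0 * k * diff
    return energy, grad
```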
FIG. 11 shows the three-dimensional structures of trypsin in the trypsin-BPTI complex system after the MCSS calculation. In the trypsin atomic coordinates, some regions showed large scatter in both main chain and side chains, while others showed little. In particular, the main chains and side chains of the trypsin active-site residues His57, Asp102, and Gly193-Asp194-Ser195 (oxyanion hole) agreed closely across the models. This behavior can be used to find sites on the receptor protein that are important for ligand binding, which is very helpful when designing new ligands.
FIG. 12 shows the initial three-dimensional structure of the trypsin-BPTI complex before the MCSS calculation, and FIG. 13 shows the structure after the MCSS calculation, each together with the X-ray crystal structure of the complex. In these figures, only His57, Asp102, and the oxyanion hole (Gly193-Asp194-Ser195), which form the active site, are shown on the trypsin side, and only Lys15 on the BPTI side. Black lines show the X-ray crystal structure of the trypsin-BPTI complex; gray lines show the initial structure of the complex model assembled according to the present invention (FIG. 12) and its refined structure (FIG. 13).
The trypsin active-site residues His57, Asp102, and the oxyanion hole agree well with the crystal structure, including their side chains, both in the initial structure before the MCSS calculation (FIG. 12) and in the refined structure after it (FIG. 13). The main chain of BPTI Lys15 also agrees well before and after the MCSS calculation, because its carbonyl oxygen is held by two hydrogen bonds to the peptide NH groups of Gly193 and Ser195 in the oxyanion hole. In contrast, the Lys15 side chain of BPTI does not point into the active pocket of trypsin before the MCSS calculation; after refinement by MCSS calculation it enters the pocket and agrees well with the X-ray crystal structure of the trypsin-BPTI complex.
These results show that using a plurality of model three-dimensional structures that incorporate the normal vibration modes of the target protein, and refining the initial target protein-ligand complex obtained by docking to them through MCSS simulation, is useful for constructing the three-dimensional structure of protein-ligand complexes.
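The observation that active-site atoms scatter little across the refined models can be quantified as a per-atom spread: the RMS deviation of each atom from its ensemble-mean position, after the models have been superposed. The sketch below computes this and lists atoms below a cutoff; the 0.5 Å cutoff and the label strings are hypothetical choices for illustration.

```python
import numpy as np


def per_atom_spread(models):
    """RMS deviation (Å) of each atom from its ensemble mean.

    models: (n_models, n_atoms, 3) superposed coordinates.
    """
    X = np.asarray(models, dtype=float)
    mean = X.mean(axis=0)
    return np.sqrt(np.mean(np.sum((X - mean) ** 2, axis=2), axis=0))


def conserved_atoms(models, labels, cutoff=0.5):
    """Labels of atoms whose spread is below `cutoff` (hypothetical threshold)."""
    spread = per_atom_spread(models)
    return [lab for lab, s in zip(labels, spread) if s < cutoff]
```

Applied to the receptor side of the MCSS output, low-spread atoms would be candidates for the activity-critical residues, such as the catalytic residues noted above.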
Example 3 Identification of trypsin and trypsin inhibitor binding sites
Following the method detailed in Steps III-10 to III-30, the binding sites of trypsin and BPTI were identified by the procedure below and compared with the X-ray crystallographic data of the complex. This example again used the bovine pancreatic β-trypsin (trypsin)/trypsin inhibitor (BPTI) system, whose protein-ligand complex structure is known from X-ray crystallography. Trypsin is the receptor protein (target protein) and BPTI the ligand; because BPTI is also a protein, the binding site was identified on the ligand side as well as on the protein side. The amino acid sequences of trypsin and BPTI are shown as SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
The three-dimensional structure coordinates of the trypsin-BPTI complex were obtained from the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) as entry 2PTC. FIG. 14 shows the X-ray crystal structure of the 2PTC trypsin-BPTI complex.
The binding sites of the protein and ligand were searched on a Dell personal computer (model: Dimension XPS B866; CPU: Pentium III 864 MHz; OS: Red Hat Linux 6.2J; memory: 512 MB).
The coordinates of trypsin and BPTI were handled separately: hydrogen atoms were added to the heteroatoms, and aqueous solvent was generated around each molecule. Next, for both trypsin and BPTI, water molecules within 3.5 Å of the surface formed by amino acid residues with an MSAS of 30% or more were replaced with benzene molecules; a water molecule was not replaced if a benzene molecule already lay within 1.5 Å of it. Once the replacement was complete, all water molecules were deleted. A periodic box filled with water molecules was then generated around each benzene-containing structure of trypsin and BPTI, and empirical molecular energy calculations were performed with the apricot program under periodic boundary conditions. These calculations consisted of 1,000 steps of structure optimization followed by molecular dynamics at 300 K for 10 ps with a 1 fs time step to explore the behavior of the benzene molecules. During the molecular dynamics, a positional restraint of Uxyz = 10.0 kcal/mol/Å² (formula (27)) was applied to the Cα atoms of all amino acid residues so that the protein structure would not be grossly distorted.
After these calculations, the water molecules in the periodic box were deleted for both trypsin and BPTI, and the atomic coordinates of trypsin with benzene and of BPTI with benzene after the molecular dynamics were saved in PDB format. Cluster analysis with a threshold of 6% was then performed on each benzene distribution, excluding trypsin and BPTI themselves. Of the 94 and 40 benzene molecules placed around trypsin and BPTI, respectively, the largest clusters contained 29 and 11 molecules. FIGS. 15 and 16 show the distributions of benzene molecules around trypsin and BPTI together with the proteins.
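The benzene placement rule can be sketched as a greedy selection: a water is a candidate if it lies within 3.5 Å of any atom of a residue with MSAS of 30% or more, and it is skipped if an already chosen site lies within 1.5 Å. Placing the benzene center at the water oxygen, processing waters in input order, and computing MSAS beforehand are all assumptions of this sketch.

```python
import numpy as np


def select_benzene_sites(water_xyz, surface_atom_xyz, shell=3.5, min_sep=1.5):
    """Indices of waters to replace with benzene (greedy, in input order).

    water_xyz:        (W, 3) water oxygen positions.
    surface_atom_xyz: (S, 3) atoms of residues with MSAS >= 30% (precomputed).
    A water is a candidate if it is within `shell` Å of any surface atom,
    and it is skipped if a chosen site is within `min_sep` Å.
    """
    W = np.asarray(water_xyz, dtype=float)
    S = np.asarray(surface_atom_xyz, dtype=float)
    d = np.linalg.norm(W[:, None, :] - S[None, :, :], axis=-1)
    near = np.where(d.min(axis=1) <= shell)[0]
    chosen = []
    for w in near:
        if all(np.linalg.norm(W[w] - W[c]) > min_sep for c in chosen):
            chosen.append(int(w))
    return chosen
```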
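The text does not define what the "threshold of 6%" in the cluster analysis refers to, so the following only illustrates the general idea with a different, explicitly swapped-in criterion: single-linkage clustering of benzene centers under a distance cutoff, with clusters returned largest first. The cutoff value is a free parameter chosen by the user.

```python
import numpy as np


def single_linkage_clusters(centers, cutoff):
    """Single-linkage clusters of points linked by distances below `cutoff` (Å).

    Returns a list of sorted index lists, largest cluster first.
    """
    X = np.asarray(centers, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    unvisited = set(range(len(X)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        # Flood-fill all points reachable through short links.
        while stack:
            i = stack.pop()
            nbrs = [j for j in list(unvisited) if d[i, j] < cutoff]
            for j in nbrs:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append(sorted(members))
    clusters.sort(key=len, reverse=True)
    return clusters
```

The largest cluster on each partner would then mark the candidate binding surface, as with the 29- and 11-molecule clusters reported for trypsin and BPTI.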
These figures are viewed from the same direction as FIG. In the figure, the black hexagon is the largest benzene cluster.
FIGS. 14 to 16 show that the largest benzene clusters around trypsin and around BPTI are well matched in orientation. That is, binding-site candidates for a protein and its ligand can be identified by placing benzene molecules around the hydrophobic residues of the proteins, performing molecular dynamics in aqueous solvent, and finding large benzene clusters by cluster analysis. Moreover, the initial configuration of the protein-ligand complex can probably be predicted roughly by docking the protein and ligand on a graphics display so that these clusters overlap. After adjustment by hand or with molecular design software, this configuration becomes a promising candidate for the structure of the protein-ligand complex.
Industrial applicability
As described above, the method of the present invention constructs protein structures, particularly in the region that binds a ligand, more accurately and closer to the true structure than conventional methods. The method is therefore extremely useful for designing pharmaceutical and agrochemical molecules.
That is, the method of the present invention for constructing a three-dimensional structure including induced fit uses a plurality of coordinate sets obtained from normal mode analysis of a model three-dimensional structure of the target protein, and can thereby construct with high accuracy an average model structure that accounts for molecular vibration. In particular, because induced fit, which is important when predicting the structure of a target protein-ligand complex, can be incorporated, a precise three-dimensional model of the complex can be built. Furthermore, by simulating the protein-ligand complex with the Multiple Copy Simultaneous Search (MCSS) method, which optimizes the structures of multiple receptor proteins against a single ligand, a complex structure averaged over time across the multiple receptor structures is obtained.
In addition, in the method of the present invention for constructing the three-dimensional structure of a protein-ligand complex, the variation of the receptor-side atomic coordinates in the target protein-ligand complex models after MCSS calculation can be examined. New ligands can be designed by exploiting the fact that sites important for activity show relatively small variation in their atomic coordinates while other sites vary widely, which makes the method effective in the molecular design of pharmaceuticals and agrochemicals.
Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.
This application is based on a Japanese patent application (Japanese Patent Application No. 2001-011783) filed on January 19, 2001, the contents of which are incorporated herein by reference. The contents of documents cited in this specification are also incorporated herein by reference.
[Sequence Listing]

[Brief description of the drawings]
FIG. 1 is a flowchart showing an example of the method of the present invention for constructing a protein three-dimensional structure including induced fit.
FIG. 2 is a diagram showing the construction of Cα atomic coordinates in Step I-41. Aligned (matching) portions are taken from the reference protein; unaligned portions are taken from the database fragment that gives the smallest rmsd when superimposed on the two overlapping residues at each of the N- and C-terminal ends.
FIG. 3 is a diagram illustrating local space homology (LSH). The calculation for residue T in the figure takes the shaded (gray) residues into account. In the lower alignment, the boxed positions are the residue pairs considered, and LSH is the fraction of these pairs marked with *. In this case, LSH is 56.2%.
FIG. 4 is a diagram showing the relationship between LSH and the proportion of residues in structurally conserved regions (SCRs). LSH is calculated from the superposition of the Cα atoms of the target protein and the reference protein, and the proportion in SCRs is the number of residues in SCRs relative to the total number of residues of the target protein.
FIG. 5 is a flowchart showing an example of a method for constructing the three-dimensional structure of the protein-ligand complex of the present invention.
FIG. 6 is a flowchart showing an example of a method for identifying a ligand binding site of the present invention and a method for constructing a three-dimensional structure of a protein-ligand complex using the binding site identified by the method.
FIG. 7 is a flowchart showing an example of the method of the present invention for constructing a protein three-dimensional structure including induced fit.
FIG. 8 is a diagram showing an alignment of QRHUB2 (β2 adrenergic receptor) obtained using 1F88 (rhodopsin) as a reference protein. In the figure, the numbers on the right side of QRHUB2 and 1F88 are the number of amino acids targeted for alignment in the amino acid sequence of each protein. The upper sequence shows QRHUB2 (β2 adrenergic receptor), and the lower sequence shows 1F88 (rhodopsin). The amino acid sequence of each protein is indicated by a one-letter code.
FIG. 9 is a diagram showing temperature factors converted from the Cα fluctuations of the lowest eigenvalue, 4.47 cm^-1, multiplied by M^V (= 26.4). The solid line is the Cα fluctuation converted from the average temperature factor of chains A and B of PDB ID 1F88, and the dotted line is the Cα positional fluctuation of the 4.47 cm^-1 mode obtained by normal mode analysis, multiplied by M^V (= 26.4).
FIG. 10 is a photograph of a display printout showing part of the conformations of the no-induced-fit target protein and of induced fit target proteins constructed from induced fit reference proteins displaced by ±2 × M^V (± 2 × 26.4) times the eigenvector. The central structure is the no-induced-fit target protein.
FIG. 11 is a photograph of a display printout showing the three-dimensional structure of trypsin in the trypsin-BPTI complex system after MCSS calculation.
FIG. 12 is a photograph of a display printout showing the initial three-dimensional structure of trypsin-BPTI complex before MCSS calculation. In this figure, His57, Asp102, and oxyanion holes (Gly193-Asp194-Ser195) corresponding to the active site of the trypsin-BPTI complex are extracted on the trypsin side, and only Lys15 is extracted on the BPTI side. In the figure, the black line represents the three-dimensional structure of X-ray crystallographic analysis of the trypsin-BPTI complex, and the gray line represents the initial three-dimensional structure of the complex model assembled.
FIG. 13 is a photograph of a display printout showing the three-dimensional structure of the trypsin-BPTI complex after MCSS calculation. In this figure, His57, Asp102, and the oxyanion hole (Gly193-Asp194-Ser195) corresponding to the active site of the trypsin-BPTI complex are extracted on the trypsin side, and only Lys15 is extracted on the BPTI side. In the figure, the black line is the three-dimensional structure of the X-ray crystallographic analysis of the trypsin-BPTI complex, and the gray line is the refined three-dimensional structure of the assembled complex model.
FIG. 14 is a photograph of a printout of a display showing the three-dimensional structure coordinates of X-ray crystallographic analysis of trypsin-BPTI complex.
FIG. 15 is a photograph of a display printout showing the distribution of benzene molecules around trypsin. In the figure, the black hexagon is the largest benzene cluster.
FIG. 16 is a photograph of a display printout showing the distribution of benzene molecules around BPTI. In the figure, the black hexagon is the largest benzene cluster.

Claims

A method for deriving, using a computer, an alignment between a reference protein and a target protein and constructing a three-dimensional structure of the target protein based on the alignment and the three-dimensional structure information of the reference protein, wherein:
The computer includes at least a CPU and storage means.
The storage means stores at least three-dimensional structure information describing atomic coordinates of the three-dimensional structure of the reference protein,
the following step is executed by the CPU:
creating a plurality of three-dimensional structure sets of the target protein, using as the three-dimensional structures of the reference protein both the three-dimensional structure information of the reference protein stored in the storage means and a plurality of items of induced fit three-dimensional structure information obtained by displacing its atomic coordinates in the directions of the eigenvectors obtained by normal mode analysis, and storing the sets in the storage means;
wherein the induced fit three-dimensional structure information consists of atomic coordinates calculated by the following Formula 1 or 2, using: a first positional fluctuation of the Cα atoms, calculated at a predetermined temperature for each eigenvalue from the eigenvalues and eigenvectors obtained by normal mode analysis of the atomic coordinates; a second positional fluctuation obtained by converting the temperature factor of each Cα atom into a positional fluctuation; and the average, over all Cα atoms, of the ratio of the second positional fluctuation to the first positional fluctuation.
A method for constructing a three-dimensional structure of a protein including induced fit, characterized by the above.

The three-dimensional structure construction step executed by the CPU comprises:
(i) obtaining, for the Cα atom of each amino acid, coordinates from the three-dimensional structure information of the reference protein stored in the storage means, and optimizing the Cα atom coordinates so as to minimize an objective function;
(ii) adding the remaining main-chain atoms to the optimized Cα atom coordinates and optimizing the main-chain atom coordinates so as to minimize the objective function; and
(iii) adding the remaining side-chain atoms to the optimized main-chain atom coordinates, optimizing them so as to minimize the objective function, and storing the result in the storage means.
The method according to claim 1, comprising each of the above steps.
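Claim 2 fixes only the order of optimization (Cα atoms, then the main chain, then the side chains); it does not define the objective function. The sketch below illustrates that staged scheme under a hypothetical objective made of harmonic restraints to template coordinates plus harmonic bond-length terms, minimized by plain steepest descent. The atom names, force constants, step size and the objective itself are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

Coords = Dict[str, List[float]]
Bond = Tuple[str, str, float]  # (atom_i, atom_j, ideal length in angstroms)


def objective(coords: Coords, ref: Coords, bonds: List[Bond],
              k_ref: float = 1.0, k_bond: float = 10.0) -> float:
    """Hypothetical objective: template restraints + bond-length terms,
    evaluated only over atoms that have been placed so far."""
    e = 0.0
    for name, x in coords.items():
        if name in ref:
            e += k_ref * sum((a - b) ** 2 for a, b in zip(x, ref[name]))
    for i, j, d0 in bonds:
        if i in coords and j in coords:
            e += k_bond * (math.dist(coords[i], coords[j]) - d0) ** 2
    return e


def gradient(coords: Coords, ref: Coords, bonds: List[Bond],
             k_ref: float = 1.0, k_bond: float = 10.0) -> Coords:
    """Analytic gradient of `objective` with respect to every placed atom."""
    g = {name: [0.0, 0.0, 0.0] for name in coords}
    for name, x in coords.items():
        if name in ref:
            for c in range(3):
                g[name][c] += 2.0 * k_ref * (x[c] - ref[name][c])
    for i, j, d0 in bonds:
        if i in coords and j in coords:
            xi, xj = coords[i], coords[j]
            d = math.dist(xi, xj)
            if d == 0.0:
                continue
            f = 2.0 * k_bond * (d - d0) / d
            for c in range(3):
                g[i][c] += f * (xi[c] - xj[c])
                g[j][c] -= f * (xi[c] - xj[c])
    return g


def minimize(coords: Coords, ref: Coords, bonds: List[Bond],
             steps: int = 1000, lr: float = 0.005) -> Coords:
    """Steepest descent over all atoms placed so far (modifies coords in place)."""
    for _ in range(steps):
        g = gradient(coords, ref, bonds)
        for name in coords:
            coords[name] = [x - lr * gx for x, gx in zip(coords[name], g[name])]
    return coords


def staged_build(stages: List[Coords], ref: Coords, bonds: List[Bond]) -> Coords:
    """Add each stage's atoms (e.g. C-alpha, then main chain, then side chain)
    at their initial guesses and re-minimize everything placed so far."""
    coords: Coords = {}
    for stage in stages:
        coords.update({name: list(x) for name, x in stage.items()})
        minimize(coords, ref, bonds)
    return coords
```

Here each stage re-minimizes every atom placed so far; a variant that freezes earlier stages would be equally consistent with the claim wording.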

In a method for constructing a three-dimensional structure of a protein-ligand complex using a computer,
The computer includes at least a CPU and storage means.
The storage means stores at least reference protein three-dimensional structure information describing atomic coordinates of the three-dimensional structure of the reference protein and ligand three-dimensional structure information describing atomic coordinates of the three-dimensional structure of the ligand,
the method comprising the following steps, executed by the CPU:
(i) performing docking between the three-dimensional structure information of the target protein created by the method according to claim 1 or 2 and stored in the storage means, and the ligand three-dimensional structure information;
(ii) performing an empirical molecular energy calculation on the structure formed by one piece of three-dimensional structure information of the target protein and the ligand three-dimensional structure information stored in the storage means, repeated as many times as there are pieces of three-dimensional structure information of the target protein;
(iii) on the target protein side, moving the atomic coordinates of each piece of three-dimensional structure information of the target protein according to its own potential energy gradient;
(iv) on the ligand side, moving the atomic coordinates of the ligand three-dimensional structure information in the direction obtained by averaging the calculated potential energy gradients; and
(v) determining the ligand structure information on the basis of the plurality of pieces of three-dimensional structure information of the target protein, and storing it in the storage means.
A method for constructing a three-dimensional structure of a protein-ligand complex, comprising each of the above steps.
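Steps (ii) to (iv) amount to a multiple-copy update: each receptor copy relaxes along its own energy gradient, while the single ligand moves along the average of the ligand gradients taken over all copies. The sketch below shows that update under assumptions the claim does not state: a hypothetical pairwise energy k(|p - q| - r0)² stands in for the empirical molecular energy, "moving according to the gradient" is read as a steepest-descent step, and the names `pair_energy_and_grads` and `multicopy_step` are illustrative.

```python
import math
from typing import List, Tuple

Atoms = List[List[float]]


def pair_energy_and_grads(protein: Atoms, ligand: Atoms, k: float = 1.0,
                          r0: float = 4.0) -> Tuple[float, Atoms, Atoms]:
    """Hypothetical empirical energy sum_{p,q} k*(|p - q| - r0)^2 and its
    gradients with respect to the protein and ligand atoms."""
    e = 0.0
    gp = [[0.0, 0.0, 0.0] for _ in protein]
    gq = [[0.0, 0.0, 0.0] for _ in ligand]
    for a, p in enumerate(protein):
        for b, q in enumerate(ligand):
            d = math.dist(p, q)
            if d == 0.0:
                continue
            e += k * (d - r0) ** 2
            f = 2.0 * k * (d - r0) / d
            for c in range(3):
                gp[a][c] += f * (p[c] - q[c])
                gq[b][c] -= f * (p[c] - q[c])
    return e, gp, gq


def multicopy_step(copies: List[Atoms], ligand: Atoms,
                   step: float = 0.01) -> Tuple[List[Atoms], Atoms, float]:
    """One update: each receptor copy descends its own gradient (step iii);
    the ligand descends the gradient averaged over all copies (step iv).
    Returns the new copies, the new ligand and the mean energy before the step."""
    n = len(copies)
    avg_gq = [[0.0, 0.0, 0.0] for _ in ligand]
    new_copies, energies = [], []
    for protein in copies:  # step (ii): one energy evaluation per copy
        e, gp, gq = pair_energy_and_grads(protein, ligand)
        energies.append(e)
        new_copies.append([[x - step * g for x, g in zip(p, g_p)]
                           for p, g_p in zip(protein, gp)])
        for b, g_b in enumerate(gq):
            for c in range(3):
                avg_gq[b][c] += g_b[c] / n
    new_ligand = [[x - step * g for x, g in zip(q, g_b)]
                  for q, g_b in zip(ligand, avg_gq)]
    return new_copies, new_ligand, sum(energies) / n
```

Because the ligand only feels the force averaged over the ensemble, it settles into a pose that is acceptable across all receptor copies rather than tailored to one of them; that is one way to read step (v).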

The method according to claim 3, wherein, in the empirical molecular energy calculation executed by the CPU, either an arbitrary harmonic function restraining the Cα atoms of the target protein to their initial coordinates is added, or a potential function restraining the main-chain torsion angles of the target protein is added.
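The two optional restraint terms of this claim can be written down directly. The sketch assumes the harmonic form k|r - r0|² for the Cα positional restraint and a harmonic penalty on the periodic (wrapped) difference between each main-chain torsion angle and its initial value; the force constants and function names are hypothetical, and the claim itself leaves the functional form of the torsion restraint open.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _cross(a: Vec3, b: Vec3) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ca_position_restraint(ca: Sequence[Vec3], ca0: Sequence[Vec3],
                          k: float = 1.0) -> float:
    """Harmonic restraint of each C-alpha atom to its initial coordinates."""
    return k * sum(_dot(_sub(r, r0), _sub(r, r0)) for r, r0 in zip(ca, ca0))


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle (radians, in [-pi, pi]) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.atan2(y, x)


def torsion_restraint(phis: Sequence[float], phis0: Sequence[float],
                      k: float = 1.0) -> float:
    """Harmonic penalty on the periodic difference between each torsion
    angle and its initial value."""
    total = 0.0
    for phi, phi0 in zip(phis, phis0):
        d = (phi - phi0 + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        total += k * d * d
    return total
```

Wrapping the angle difference matters: without it, a torsion that moves from just below +π to just above -π would be penalized as a rotation of nearly 2π instead of a small one.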