JP2004500614A

JP2004500614A - Receptor selectivity mapping

Info

Publication number: JP2004500614A
Application number: JP2000614100A
Authority: JP
Inventors: マイヤク，デービッド，エム．; ゼッペテロ，レニー，エー．; チェン，ハオ; ウェイッスマン，アーサー，ディー．; ラング，ギャリー，エル．
Original assignee: NovaScreen Biosciences Corporation
Priority date: 1999-04-26
Filing date: 2000-04-26
Publication date: 2004-01-08
Also published as: WO2000065421A2; AU4661300A; EP1360560A2; EP1360560A4; CA2371093A1; WO2000065421A3

Abstract

[Solution] A computer system comprising a first database containing records on compounds and on the effects of those compounds on human and animal biological systems, and a second database containing records on molecular targets. The system further includes a third database containing records of tests of binding, reactivity, or other interaction between the compounds of the first database and the molecular targets of the second database. The tests include information on the effect a compound has on the interaction between a particular molecular target in the second database and a compound known to interact with that target. The system also includes means for setting a threshold for the tests and means for selecting compounds that meet the interaction test threshold. A user interface lets a user view, manipulate, or analyze the information in the first, second, and third databases.
[Selected drawing] FIG. 5

Description

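[0001]
[Technical field of the invention]
The present invention relates generally to building multidimensional databases by combining cheminformatics and bioinformatics with data on interactions between chemical compounds and molecular targets. More particularly, it relates to databases containing compounds, molecular targets, and biological or clinical information, in which patterns or relationships in the interactions between compounds and molecular targets are determined and compared with other information in the database to yield conclusions useful in drug discovery and development and in related fields.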
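[0002]
[Prior art]
The worldwide pharmaceutical industry spends $30 billion a year on research and development, roughly one third of it in the discovery and early development phases, that is, the period leading up to the selection of drug candidates for preclinical and clinical development. The critical steps in drug discovery are: (1) sequencing DNA comprising segments of the human genome; (2) identifying genes in the genome that are associated with a particular disease or biological function; (3) producing the proteins, such as receptors or enzymes, that are associated with or encoded by those functional genes and that subsequently serve as biological or molecular targets for drug discovery; (4) screening compound libraries for activity against the molecular target; (5) screening the most active compounds against other biological targets to assess their selectivity or specificity for the intended biological or molecular target and the likelihood of undesirable side effects arising from activity at other targets; (6) evaluating the most potent and selective compounds in a range of assays that test properties such as toxicity, absorption, distribution, metabolism, and excretion; (7) ranking the most promising compounds by empirical judgment based on this information and passing the information to a chemical synthesis group, which prepares analogs of the initial active compounds; (8) retesting the analogs in steps (4), (5), and (6) and repeating step (7) until an optimal lead compound or group of compounds is identified; and (9) taking the optimal lead compound into further preclinical and clinical testing.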
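[0003]
In this discovery and development process, compounds that pass through a narrowing filter are selected for the more expensive preclinical and clinical development. Unfortunately, compounds frequently fail in those later stages and never reach the market. Because of these failures, the average cost of developing and launching a single new drug is estimated to exceed $300 million. If, however, the optimal drug candidate could be accurately identified early in discovery and development and went on to pass preclinical and clinical testing, development costs could be reduced by as much as 75%. A major goal of pharmaceutical research and development (R&D) is therefore to improve the predictive power of these early-stage tests.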
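[0004]
Innovation in biotechnology and the development of instruments that automate many laboratory processes have produced two major trends with a significant impact on pharmaceutical R&D. First, progress in sequencing the human genome continues to rapidly increase the number of molecular targets (such as novel receptors and enzymes) available to drug discovery screening programs. About 400 molecular targets are under investigation for drug discovery, and the number of potential molecular targets expected to emerge from the Human Genome Project is estimated at several thousand to more than ten thousand. Second, new technologies such as automation and combinatorial chemistry have expanded the compound libraries available to drug discovery screening programs roughly tenfold (many pharmaceutical companies hold more than one million compounds). These two factors hold great promise for drug discovery, but they also create a serious potential problem with undesirable consequences for the cost of drug development. More targets and more compounds will lead to the discovery of more bioactive compounds. This makes it harder to select the optimal drug candidates to advance into preclinical testing, and as more compounds enter preclinical and clinical testing there will be more failures at those stages, so development costs will rise.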
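[0005]
These factors create a growing need for rapid, low-cost in vitro assays ("test tube" or microplate based) for selecting, optimizing, and validating lead compounds. Such rapid assays would help identify the most promising active compounds before they move on to the expensive later stages of drug development. The same factors also call for more effective ways to manage and interpret the vast body of data on genes and gene products (molecular targets), chemical structures, and screening results.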
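[0006]
One application of in vitro assays of growing importance in pharmaceutical R&D is "profiling." The assignee of this patent application pioneered the concept of profiling in the late 1980s. Pharmaceutical companies have large arrays of in vitro assays for characterizing the pharmacological activity and potential side effects of compounds being developed as new drugs. More than 200 different assays are now run routinely, based on molecular targets, namely the relevant receptors and enzymes, that play important roles in a wide range of human diseases, including central nervous system disorders, immune diseases, pain and inflammation, infectious diseases, cancer, metabolic or growth factor disorders, cardiovascular function, and diseases of the endocrine system. Drugs that act through interactions with cellular receptors account for more than half of the world market. In addition, the side effects of many drugs are mediated by their interactions with receptors and enzymes.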
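[0007]
In profiling, a pharmaceutical company's lead compounds, typically in preclinical development, are tested against a battery of receptor and enzyme assays. The information obtained on how the company's compounds interact with specific receptors is important for lead optimization and selection, and it also points to possible side effects or secondary indications of a compound. With this knowledge the company can potentially save millions of dollars and considerable time in the preclinical and/or clinical development of the compound.
[0008]
Although profiling services have been offered for many years, pharmaceutical companies have generally used the resulting data empirically. Many drugs, including highly selective ones, interact with numerous receptors or other molecular targets. Data generated by profiling is therefore interpreted by the company's scientists on the basis of their experience and knowledge, taking into account both the chemical structure of the compound and its binding at particular receptors. Unfortunately, even the most experienced pharmacologist has incomplete knowledge of how the many drugs interact with the broad range of receptors relevant to drug development.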
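[0009]
The need for more effective ways to manage, collate, interpret, and exploit the vast body of data on genes and gene products (molecular targets), chemical structures, and screening results has created new opportunities in bioinformatics and cheminformatics, that is, in the management of biological and chemical data. The stages that generate this vast pool of information for drug discovery are: (1) DNA sequencing (the genetic material, or gene coding, that serves as the blueprint from which cells produce gene products or proteins); (2) functional genomics (the process by which DNA sequences are converted, via mRNA, into the associated gene products or proteins, particularly in response to drugs or changes in biological function); (3) proteomics (identification of the amino acid sequence and/or three-dimensional structure of gene products or proteins, such as receptors, encoded by genes); (4) small-molecule pharmacology and toxicology (the molecular binding or interaction between gene products such as receptors and small organic compounds that are potential drugs); and (5) chemical structure (of small molecules and drug-like compounds).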
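[0010]
Databases for DNA sequences (group 1) are well established and include GenBank and those of the genome centers. Chemical structure databases (group 5) are likewise well known and are supplied by vendors such as MDL (ISIS) and Oxford Molecular. Proteomics databases (group 3), such as SWISS-PROT, ProLink, and PDB, have also been built. Each of these databases can be regarded as having a single component, because it holds structural information that can be used to determine patterns in one dimension, that is, within one component of structural or sequence information. Databases for groups 2 and 4 are not yet well developed but will be valuable additions to the pool of information for drug discovery and development. These latter two types of database contain data on the interaction between two structures, gene to protein (group 2) and protein to compound (group 4), and are therefore two-component, or two-dimensional. Relationships in such databases carry an added level of complexity compared with single-component databases.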
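[0011]
Partial databases for the protein-to-compound relationships of group 4 currently exist. For example, the binding profile of a single compound against a broad range of receptor targets, which the assignee provides to clients, is a partial data set of a group 4 type database. Likewise, data generated by high-throughput screening projects, in which thousands to hundreds of thousands of compounds such as those in a chemical structure database (group 5) are screened for activity against a particular receptor target (a single point in group 3), would form part of a group 4 database. Such partial group 4 databases are useful in drug discovery and development, but they have two major drawbacks. First, they concern specific two-component analyses: either the binding selectivity of a single compound or limited set of compounds across a range of receptors (a profile), or many compounds against a single receptor target (a high-throughput screening project). In neither case is the data set broad enough to support statistical correlations between multiple receptor targets and multiple chemical structures. Second, and importantly, these partial data sets are generated on compounds chosen for their structural novelty, that is, specifically for their potential as new drugs. Because the compounds are novel, no biological information exists on their activity in animals or humans. Such approaches therefore suffer from the same limitation as the pharmacologist who tries to interpret profile data empirically, as described above.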
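[0012]
[Problems to be solved by the invention]
Accordingly, an object of the present invention is to meet the above needs by providing systems and methods for analyzing data relevant to drug discovery and development. A full-rank screening database is provided that contains both the positive and the negative data from testing many compounds against many molecular targets. The number of compound and molecular target combinations must be large enough that a person of ordinary skill in statistical or other data-mining methods can use this screening database, together with the associated compound and molecular target databases, to predict reliably which compounds are suited to clinical testing and have a high likelihood of becoming safe and effective drugs.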
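[0013]
[Means for solving the problems]
The present invention discloses, among other things, systems and methods that meet these needs. The system includes a computer system comprising a first database having records relating to a plurality of compounds and to the effects of those compounds on human and animal biological systems, and a second database having records relating to a plurality of molecular targets. The computer system further includes a third database having records relating to tests of binding, reactivity, or other interaction between compounds in the first database and molecular targets in the second database. The tests include information on the effect that a compound selected from the first database has on the interaction between a particular molecular target selected from the second database and a compound known to interact with that target (for example, a reference agent or reference standard), and the tests are run against a plurality of the molecular targets in the second database. The computer system also includes means for setting a threshold for the interaction test with respect to that effect, and means for selecting compounds, sets of compounds, and/or information about those compounds whose effect test results meet the interaction test threshold. A user interface lets the user view, manipulate, or analyze the information in the first and second databases, together with the information in the third database that is associated with one or more compound records in the first database and/or one or more molecular target records in the second database, in particular for compounds, molecular targets, or other database records associated with results that meet the interaction test threshold.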
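[0014]
The invention further relates to applying statistical and other data-mining techniques to these multidimensional databases to determine correlations or patterns relevant to drug discovery and development.
[0015]
Both the foregoing general description and the following detailed description are exemplary and explanatory only, and they do not restrict the scope of the claims.
[0016]
The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments of the invention and, together with the description, explain its advantages and principles.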
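[0017]
[Embodiments of the invention]
Preferred embodiments of the invention are now described. Examples are illustrated in the accompanying drawings and will also be apparent from the detailed description. Wherever possible, the same reference numbers in different drawings denote the same or like elements.
[0018]
Systems and methods consistent with the invention analyze data relevant to drug discovery and development to predict, for example, whether a new compound is likely to become a safe or effective drug and should advance to preclinical and clinical testing. In what follows, the systems and methods are described with reference to a relational database containing several main tables, and to the use of binding between compounds and molecular targets as the measure of their interaction. The description also applies generally to other database structures with several main components and to other measures of the interaction between compounds and molecular targets.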
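[0019]
The invention relates to the novel design, structure, and application of an information-rich database of compounds, molecular targets (in particular proteins and other macromolecules), and the biological activities of those compounds. It further relates to using known drugs and drug candidates that failed preclinical or clinical testing as a source for the database's compound library, together with the preclinical and clinical data on those compounds, including side effects, mechanisms of action, and other medical data. The invention also determines the binding or other interactions between the compounds and molecular targets in the database and, using relational analysis and data-mining techniques, determines correlations between these interaction patterns and particular biological responses relevant to drug discovery and development, or particular chemical structures, substructures, or other features of the interacting compounds, or biochemical, structural, or other features of the interacting molecular targets. Examples of such data-mining techniques can be found in the following references, all of which are incorporated herein by reference.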
a) Chen et al., "Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors," Journal of Chemical Information and Computer Sciences, October 1998;
b) Hawkins et al., "Analysis of a Large Structure-Activity Data Set Using Recursive Partitioning," Quant. Struct.-Act. Relat., 16:296-302 (1997);
c) DePriest et al., "3D-QSAR of angiotensin-converting enzyme and thermolysin inhibitors: a comparison of CoMFA models based on deduced and experimentally determined active site geometries," J. Am. Chem. Soc., 115:5372-84 (1993);
d) Good et al., in Reviews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B. (eds.), VCH, New York, Vol. 7, pp 67-117 (1996);
e) Marshal et al., "Computer-Assisted Drug Design," ACS Symposium Series 112; American Chemical Society: Washington, DC, 1979; pp 205-226;
f) Molec et al., "A three-dimensional structure activity relationship and biological receptor mapping," Mathematics and Computational Concepts in Chemistry; Ellis Horwood: Chichester, 1985; pp 225-251;
g) Mayer et al., "A unique geometry of the active site of angiotensin-converting enzyme consistent with structure activity studies," J. Comput. Aided Mol. Des., 1:3-16 (1987);
h) Sheridan et al., "The ensemble approach to distance geometry: application to the nicotinic pharmacophore," J. Med. Chem., 29:899-906 (1986);
i) Martin et al., "A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists," J. Comput. Aided Mol. Des., 7:83-102 (1993);
j) Catalyst/Hypo Tutorial, version 2.0, BioCAD Corp., Mountain View, CA, 1993;
k) Sprague, P. W., "Automated chemical hypothesis generation and database searching with Catalyst," Perspect. Drug Discov. Des., 3:1-20 (1995);
l) Barnum et al., "Identification of common functional configurations among molecules," J. Chem. Inf. Comput. Sci., 36:563-71 (1996);
m) HipHop Tutorial, version 2.3; Molecular Simulation Inc.; Sunnyvale, CA, 1995;
n) Davies, K. and Upinn, R., "3D pharmacophore searching," Net. Sci. (http://www.org/Science/Cheminform/feature02.html);
o) Golender, V. and Vesterman, B., "APEX 3D expert system for drug design," Net. Sci. (http://awod.com/netsci/science/compchem/feature09.html);
p) Van Drie, J., "Strategies for the determination of pharmacophoric 3D database queries," J. Comput. Aided Mol. Des., 11:39-52 (1997);
q) Van Drie, J. and Nugent, R., "Addressing the challenges posed by combinatorial chemistry: 3D databases, pharmacophore recognition and beyond," SAR QSAR Environ. Res., 9:1-21 (1998);
r) Finn et al., "Pharmacophore discovery using the inductive logic programming Progol," Machine Learning, Special Issue on Applications and Knowledge Discovery, Kluwer Academic Publishers: Boston, 1998, pp 1-33;
s) Jain et al., "Compass: a shape-based machine learning tool for drug design," J. Comput. Aided Mol. Des., 8:635-52 (1994).
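[0020]
The background section suggests that, in contrast to standard operating practice in the pharmaceutical industry, a group 4 database should be built as a database of more than two components and should cover a substantial range of both receptor or enzyme targets and compounds. To build a three-component database, for example, one first selects a broad set of compounds that are rich in information directly relevant to drug discovery and development. The most relevant information often comes from actual experience of testing such compounds: in humans, in clinical trials and/or post-marketing surveillance, and in animals, in preclinical studies. Other relevant biological information can be obtained from natural products that exhibit one or more biological activities and from the chemical reference standards used in the field to study the biological characteristics of receptors. One embodiment of the information-rich compounds selected for such a group 4 database therefore includes marketed drugs, drugs that failed clinical or preclinical testing, bioactive natural products or natural extracts, and the reference agents used in receptor binding assays.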
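[0021]
Such a database could be built from screening data in the scientific literature. That approach would produce a partial data set, but it has limitations. First, literature references generally report only positive information (for example, a report that a particular compound inhibits binding at a particular receptor). For informative comparisons, negative data is as important as positive data. Moreover, certain statistical analyses cannot be applied to a data set that lacks either kind. Second, a quantitative report of one compound's binding at a receptor in one article cannot be compared with a separate report of another compound's binding at the same receptor, because the assays are run differently. One embodiment for generating a group 4 three-component database therefore screens a broad array of compounds against a broad range of receptor or enzyme targets, which yields consistent, comparable results and secures both positive and negative data.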
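[0022]
Compound component: selection of the compound library and inclusion of compound data
The invention relates to a database that includes, as one component, compounds with known biological activity relevant to pharmaceutical research and development. Information on biological activity can be included in the compound database or table.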
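[0023]
For example, these information-rich compounds include the following.
(a) Compounds that serve as pharmacological reference agents or reference standards for measuring the interaction or molecular binding between an unknown compound and a particular molecular target such as a receptor or enzyme. Examples include the compounds used to characterize binding between a test compound and a molecular target, including receptors or enzymes. Other reference agents may include compounds selected from the catalog of Research Biochemicals Inc. (RBI), a unit of Sigma-Aldrich Corp., or from other sources known in the field. Because these pharmacological reference agents have previously been tested and/or are marketed as drugs or are highly bioactive natural products, they may overlap with the following three categories.
(b) Compounds that are known drugs, currently or formerly marketed for medical use, for which a substantial amount of biological information is available. These compounds are well known and are listed in publications available from U.S. government agencies such as the Food and Drug Administration (FDA), as well as in publications from private companies and nonprofit organizations. One such nonprofit publication is the USP DI series from the United States Pharmacopeial Convention Inc., of which Volume I, Drug Information for the Health Care Professional, is updated monthly by the USP DI Update. Newly approved drugs will fall into this category. Because drugs that are marketed, or approved by the FDA or an equivalent foreign regulatory body, are a matter of public record, a person of ordinary skill can readily identify compounds in this category.
(c) Compounds cleared for testing in humans, such as compounds that were granted IND (Investigational New Drug) status as potential new drugs but failed to show sufficient efficacy or safety in the clinical trials required for FDA approval, or that otherwise never reached marketed status. This category may also include compounds approved for marketing by the FDA and later withdrawn from the market. These compounds also carry a substantial amount of useful biological information and are particularly useful for the purposes of the invention. The identities of failed drugs can be obtained from many sources, including announcements by pharmaceutical or biotechnology companies, publications such as the "Pink Sheets," and lists maintained by the FDA.
(d) Compounds with biological activity obtained from natural sources such as plants, microorganisms, and animals. These natural products include toxins, antimicrobials, behavioral modifiers, defensive agents, and other categories of compounds that provide information relevant to drug discovery and development. The identities of natural products can be obtained from many publications, including but not limited to the RBI and Sigma-Aldrich compound catalogs.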
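[0024]
For each compound in the database, the chemical structure, chemical formula, physicochemical properties, spatial configuration and other stereochemical information (for example, SMILES codes), solubility, and other relevant data are recorded in database fields to the extent available. A person of ordinary skill will recognize other parameters that could be recorded. Compounds can be organized in the database by chemical structure relationships or by other relationships.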
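[0025]
FIG. 1A shows compound table 300 of the relational database. Table 300 lists a plurality of compounds and contains records for N compounds (rows 1 to N). For each compound, columns 301 to 307 hold information about it. In FIG. 1A, for example, column 301 holds the compound name; column 302 the compound type (for example, a compound cleared for testing in humans); column 303 information on the chemical structure, such as a hyperlink that opens a screen containing the structure diagram (see snapshot 310 in FIG. 1B); column 304 the chemical formula; column 305 information on physicochemical properties; column 306 the spatial configuration of the compound; and column 307 information on solubility.
[0026]
Further columns may be added to hold other relevant data on each compound 301 listed in table 300. By including the compounds' biological activity in such added columns, the compound database can be made a two-component database (see database 500).
[0027]
FIG. 1B shows snapshot 310, which contains information associated with a record in table 300. For example, the snapshot can include the compound's chemical formula 304 together with its structure 303.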
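As a rough illustration of compound table 300, the sketch below builds an in-memory SQLite table whose columns follow reference numerals 301 to 307 of FIG. 1A. The column names, the `build_compound_table` and `names_by_type` helpers, and the sample rows are all hypothetical; the specification names the kinds of information held but does not give a schema.

```python
import sqlite3

# Hypothetical schema for compound table 300. The comments give the
# reference numerals of FIG. 1A; the column names are illustrative only.
COMPOUND_DDL = """
CREATE TABLE compound (
    name           TEXT PRIMARY KEY,  -- 301: compound name
    compound_type  TEXT,              -- 302: e.g. cleared for human testing
    structure_link TEXT,              -- 303: link to the structure diagram
    formula        TEXT,              -- 304: chemical formula
    physchem       TEXT,              -- 305: physicochemical properties
    configuration  TEXT,              -- 306: spatial configuration
    solubility     TEXT               -- 307: solubility
)
"""


def build_compound_table(rows):
    """Create the table in memory and load (301..307) tuples into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(COMPOUND_DDL)
    conn.executemany("INSERT INTO compound VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def names_by_type(conn, compound_type):
    """Return the names of all compounds of one type, sorted by name."""
    cur = conn.execute(
        "SELECT name FROM compound WHERE compound_type = ? ORDER BY name",
        (compound_type,),
    )
    return [row[0] for row in cur]


# Placeholder records, not real compound data.
SAMPLE_ROWS = [
    ("compound-A", "marketed drug", "structures/A", None, None, None, None),
    ("compound-B", "failed clinical trial", "structures/B", None, None, None, None),
    ("compound-C", "marketed drug", "structures/C", None, None, None, None),
]
```

Fields that are not available for a compound are simply left empty (`None`), which matches the statement in [0024] that data is recorded "to the extent available."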
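[0028]
Molecular target component: selection of receptors, enzymes, and other molecular targets and inclusion of molecular target data
The second component of the database comprises molecular targets relevant to drug discovery and development, including receptors, enzymes, other proteins, nucleic acids, carbohydrates, and other macromolecules. In one embodiment, receptors and enzymes are the principal molecular targets. Receptors mediate much of the molecular-level communication among cells and organs in the body. Enzymes often amplify that communication, for example through second messenger systems and cell signaling pathways.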
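[0029]
Receptors include the classical receptor groups such as dopamine, serotonin, opiate, muscarinic, adrenergic, and adenosine receptors. These groups include subtypes of the receptor type (such as the dopamine-1, dopamine-2, dopamine-3, dopamine-4, and dopamine-5 receptors). Some subtypes have further variants (such as dopamine 4.2, dopamine 4.4, and dopamine 4.7) or occur in different forms (such as dopamine 2 short and dopamine 2 long). Mutations in the gene encoding a particular receptor can give rise to subsets of receptors whose binding to drugs and other compounds differs slightly from that of the normal receptor, and splice variants of receptors can also occur. Receptors can be classified by family, superfamily, or subfamily. Classifications include G-protein-coupled receptors, transmembrane receptors, and nuclear receptors. Receptors can also be classified by the degree of similarity in the DNA sequences of the associated genes, by amino acid sequence and the associated three-dimensional conformation, and by where they are expressed within tissues or across different cell types.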
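[0030]
Enzymes include proteases, carbohydrases, kinases, phosphatases, DNA-modifying enzymes, transferases, P450s, and other enzymes known to those skilled in the art.
[0031]
The applicant continually develops additional receptors, receptor sources, and associated assays and adds them to the database content. The receptors and receptor assays added are well known to those skilled in the art. Lists and descriptions of receptors relevant to drug discovery and development are available in many publications known in the art, including the RBI Handbook of Receptor Classification and the IUPHAR receptor classification book. As new receptors and receptor subtypes are discovered, they too are added to the database content.
[0032]
Enzymes and enzyme assays are well known to those skilled in the art. Lists and descriptions of receptors relevant to drug discovery and development are available in many publications known in the art.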
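[0033]
FIG. 2 shows tables 400, 410, and 420, which form part of a relational database system that can be used to access molecular target information. Table 400 lists the targets and contains records for M targets (rows 1 to M). Column 401 lists the target name, and column 402 specifies the target type for each name.
[0034]
The table structure may vary with the target type specified in column 402. Table 410 holds information on the targets listed in table 400 that are classified as receptors. A record in table 410 can be accessed by querying the database for a particular receptor name. The receptor names in table 410 can be reached by querying table 400 for the target names marked "Receptor" in column 402.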
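[0035]
Column 411 of table 410 holds the receptor name, which is also the target name in column 401 of table 400. Column 412 holds the receptor's family, column 413 its superfamily, column 414 its subfamily, column 415 information on the degree of similarity of the DNA sequences of related genes, and column 416 information on the amino acid sequence. The amino acid sequence is one of many molecular descriptors held in the database. Other molecular descriptors 417 may include, for example, a hydropathy plot associated with the amino acid sequence. Because the molecular target database shown in tables 400, 410, and 420 contains target information, and biological information associated with the targets is also held in the database (table 600), this database is considered to have two components. The columns shown illustrate the types of information that may be held and should not be construed as limiting the invention.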
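[0036]
Table 420 holds information on the targets classified as enzymes in table 400. A record in table 420 can be accessed by querying the database for a particular enzyme name. The enzyme names in table 420 can be reached by querying table 400 for the target names marked "Enzyme" in column 402.
[0037]
Column 421 of table 420 holds the enzyme name, which also appears among the target names in column 401 of table 400. Column 422 holds information on the enzyme type. Column 423 is labeled "other relevant information" and indicates that further columns may be added to table 420, for example when users want access to other enzyme information, including amino acid sequences and molecular descriptors.
[0038]
Only tables 410 and 420 are shown, to illustrate access to molecular target information by target type, but further tables can be added to the relational database system according to the number of molecular target types available to the database.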
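The relationship between target table 400 and the type-specific tables 410 and 420 can be sketched as follows. This is a minimal illustration under assumed names: the table and column identifiers and the helper functions are hypothetical, and only a few of the columns of FIG. 2 are modeled.

```python
import sqlite3

# Hypothetical schema for tables 400, 410 and 420 of FIG. 2.
SCHEMA = """
CREATE TABLE target (               -- table 400
    name        TEXT PRIMARY KEY,   -- 401: target name
    target_type TEXT NOT NULL       -- 402: 'Receptor' or 'Enzyme'
);
CREATE TABLE receptor (             -- table 410
    name        TEXT PRIMARY KEY REFERENCES target(name),  -- 411
    family      TEXT,               -- 412
    superfamily TEXT,               -- 413
    subfamily   TEXT                -- 414
);
CREATE TABLE enzyme (               -- table 420
    name        TEXT PRIMARY KEY REFERENCES target(name),  -- 421
    enzyme_type TEXT                -- 422
);
"""


def open_target_db():
    """Create an empty in-memory target database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_receptor(conn, name, family=None, superfamily=None, subfamily=None):
    """Register a receptor in table 400 and store its details in table 410."""
    conn.execute("INSERT INTO target VALUES (?, 'Receptor')", (name,))
    conn.execute(
        "INSERT INTO receptor VALUES (?, ?, ?, ?)",
        (name, family, superfamily, subfamily),
    )


def add_enzyme(conn, name, enzyme_type=None):
    """Register an enzyme in table 400 and store its details in table 420."""
    conn.execute("INSERT INTO target VALUES (?, 'Enzyme')", (name,))
    conn.execute("INSERT INTO enzyme VALUES (?, ?)", (name, enzyme_type))


def receptor_records(conn):
    """Query table 400 for 'Receptor' rows and fetch their table 410 records."""
    cur = conn.execute(
        "SELECT r.name, r.family FROM target AS t "
        "JOIN receptor AS r ON r.name = t.name "
        "WHERE t.target_type = 'Receptor' ORDER BY r.name"
    )
    return cur.fetchall()
```

Adding a further target type, as [0038] allows, would mean one more type-specific table keyed on the same target name.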
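[0039]
Biological information component: biological and chemical information parameters
The biological information that forms part of the database includes, for example, side effects, mechanism of pharmacological action, drug metabolism, toxicity, absorption, distribution, and excretion. This information can be obtained from the FDA-approved labeling of marketed drugs or from literature and publications on drugs that failed clinical trials. Specific parameters include: toxicity, LD50, LD50/ED50, teratogenicity, mechanism of toxicity, target organs of toxicity, in vitro toxicity battery, induction of apoptosis, bioavailability, absorption, blood-brain barrier, oral absorption, mucosal absorption, percent absorbed, distribution, blood protein binding, half-life, onset of action, duration of action, peak blood concentration, metabolism, major pathways, minor pathways, active metabolites, excretion, primary route of excretion, secondary route of excretion, in vitro efficacy, therapeutic indication, effects on animal behavior, side effects, known primary target, other target organs or systems, and known receptor interactions.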
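[0040]
FIG. 3 shows table 500, which contains some of the biological information parameters above. Table 500 has N rows (1 to N), one for each compound that may be present in the first database. Column 501 holds the compound name, column 502 the therapeutic indication (for marketed drugs or drugs that failed testing), column 503 toxicity information, column 504 side effect information, and column 505 information on the drug's mechanism of action. Relating table 500 to table 300, for example, forms a two-component table of compounds and biological activity.
[0041]
FIG. 3 also shows table 600, which contains biological information parameters associated with the molecular targets in the database. Table 600 has P rows (1 to P), one for each target that may be present in the second database. Column 601 holds the target name, column 602 the therapeutic indication (for marketed drugs or drugs that failed testing), column 603 toxicity information, and column 604 side effect information. Likewise, relating table 600 to table 400 forms a two-component table of molecular targets and biological activity. Together, tables 500 and 600 can form a full-rank database (for example, one covering every combination of the compounds and molecular targets that may be present in the relational database system) containing molecular target information, compound information, and the biological activity information associated with each target and each compound, and they can be regarded as a multidimensional database. Additional columns can be added to tables 500 and 600 without departing from the scope of the invention.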
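[0042]
Determining binding information
A principal feature of the invention is the assembly of multiple information components, comprising compounds, molecular targets, and biological information, together with the evaluation of binding, reactivity, or other interaction between compounds and molecular targets. Relating this binding or reactivity information back to the known biological information makes it possible to pick out patterns or relationships that can be used in drug discovery and development. An important aspect of the invention is the generation of broad and consistent binding or reactivity data between compounds and molecular targets, which provides as complete a data set as possible for identifying the required patterns or relationships and supplies both positive and negative binding or reactivity information. In one embodiment, the binding data is structured as a numerical descriptor indicating whether a threshold set is met for a particular molecular target or set of targets. The descriptor can also be related to the presence or absence of reactivity of each compound at each receptor or other molecular target, evaluated at a concentration close to a threshold appropriate to the biological system or biological information set. For example, a compound can be tested for inhibition of binding between a receptor and its corresponding specific compound at a concentration of 10^-5 M (10 micromolar) with a threshold of 30%. Other initial concentrations or percent-inhibition thresholds can be set. In one embodiment, compounds whose inhibition of binding exceeds the threshold in this initial yes/no test are tested further. These active compounds are tested across a concentration series, for example 7 to 14 different concentrations in the range 10^-5 to 10^-9 M, to determine the IC50 and/or Ki of the active compound at the particular receptor. More or fewer concentrations may be used, and concentration ranges above or below 10^-5 to 10^-9 M may be needed. These data yield a matrix of the relative activity or relative potency of each active compound at each molecular target.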
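The two-stage screen of [0042] (a yes/no test at 10^-5 M against a 30% inhibition threshold, followed by a concentration series for the actives) can be sketched as below. The specification does not say how the IC50 or Ki is computed, so two assumptions are made here and should be read as such: the IC50 is estimated by linear interpolation on log concentration between the two points that bracket 50% inhibition, and the Ki is derived with the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), which the specification does not name. All function names are hypothetical.

```python
import math

PRIMARY_CONC_M = 1e-5             # single concentration of the yes/no test
INHIBITION_THRESHOLD_PCT = 30.0   # example threshold from the description


def passes_primary_screen(pct_inhibition, threshold=INHIBITION_THRESHOLD_PCT):
    """Yes/no call: True when inhibition exceeds the threshold."""
    return pct_inhibition > threshold


def estimate_ic50(concs_m, pct_inhibition):
    """Estimate the IC50 (in M) from a concentration series.

    Assumption: linear interpolation on log10(concentration) between the
    two adjacent points that bracket 50% inhibition. Returns None when
    the series never crosses 50% from below.
    """
    pairs = sorted(zip(concs_m, pct_inhibition))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(pairs, pairs[1:]):
        if i_lo < 50.0 <= i_hi:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


def cheng_prusoff_ki(ic50_m, ligand_conc_m, ligand_kd_m):
    """Assumed conversion from IC50 to Ki (Cheng-Prusoff relation)."""
    return ic50_m / (1.0 + ligand_conc_m / ligand_kd_m)
```

In practice a curve fit over all 7 to 14 points would normally replace the two-point interpolation; the interpolation is used here only to keep the sketch short and dependency-free.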
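[0043]
To generate this screening data, the compounds are first solubilized in a suitable solvent system, for example 4% DMSO, although other DMSO concentrations or other solvents can be used. Stock solutions of the compounds are then diluted to suitable concentrations and made available as a repository. The reagents and procedures differ for each assay used to measure the interaction between compounds and molecular targets. Each assay must be characterized and made routine so that it is consistent, and appropriate controls must be run every time an assay is performed. Any assay format that produces information of the desired type and accuracy can be used. Many detection systems are available, including radiolabeling, fluorescence, fluorescence polarization, time-resolved fluorescence, fluorescence correlation spectroscopy, chemiluminescence, UV absorbance, and colorimetry.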
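[0044]
In one embodiment, receptor binding assays or enzyme activity assays are used to generate the molecular interaction data. In a receptor binding assay, for example, compounds from the repository are tested for their ability to inhibit the binding interaction between a receptor and a reference agent selected for that receptor. Receptors can be obtained from animal or human tissue, or from cell lines transfected to contain the receptor gene. The receptor source for the assay can be prepared, for example, as a cell fraction containing the receptor, and the receptor may be partially purified. The reference compound or ligand is preferably chosen for potent and/or specific binding to the particular receptor, and it may carry iodine-125, tritium, carbon-14, or another radioactive tracer so that bound ligand can be distinguished from unbound ligand. In parallel with testing the compounds whose binding data is to enter the database, positive and negative controls are run, and reference curves at various concentrations of the reference (radio)ligand are used to assure the quality of each assay.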
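[0045]
Those skilled in the art will appreciate that the interaction between targets and compounds can be measured by many methods and systems. To let the binding reaction reach equilibrium, the radioligand, receptor preparation, and test compound are incubated for a suitable time in a suitable buffer at a suitable temperature. The amount of bound relative to unbound radioligand is determined by a separation step, such as filtration or a scintillation proximity assay (SPA), and measured by liquid scintillation or gamma counting. The specific binding in the presence of the test compound is then determined by comparing the test compound's result with the positive and negative controls. The percent inhibition by the test compound is calculated from these data.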
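The percent-inhibition calculation mentioned at the end of [0045] is not spelled out in the specification. A common convention, assumed here, treats the control without test compound as total binding and the control with excess unlabeled ligand as nonspecific binding, so that percent inhibition = 100 x (1 - (test - nonspecific) / (total - nonspecific)). The function and parameter names are hypothetical.

```python
def percent_inhibition(test_counts, total_counts, nonspecific_counts):
    """Percent inhibition of specific radioligand binding.

    Assumed convention (not stated in the specification):
      total_counts       -- binding with no test compound present
      nonspecific_counts -- binding that remains with excess unlabeled ligand
      test_counts        -- binding measured with the test compound present
    """
    window = total_counts - nonspecific_counts  # specific binding of the control
    if window <= 0:
        raise ValueError("total binding must exceed nonspecific binding")
    specific_with_test = test_counts - nonspecific_counts
    return 100.0 * (1.0 - specific_with_test / window)
```

With this convention a compound that leaves binding unchanged scores 0%, and one that reduces binding to the nonspecific level scores 100%.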
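[0046]
FIG. 4 shows table 200, which represents the screening results and assay database. The compounds in database 300 (compounds 1 to N) are tested for their effect on the molecular targets in database 400. Table 200 can take many forms. In table 210, for example, the screening result for each compound tested against each molecular target is entered as "yes" or "no" according to whether it is above or below the test result threshold chosen for each measurement set.
[0047]
In another example, the screening result is entered in table 220 as a numerical descriptor specifying the potency or extent of binding or other effect (for example, the Ki of the compound-receptor interaction) for each compound tested against each molecular target. In a preferred embodiment, every point of the "compound" x "target" matrix is determined in tables 210 and 220, producing a full-rank database. The screening results and assay database 200 may also contain other measurements of compound-target interaction, including raw screening data and values derived from it, assay procedures and characteristics, and other relevant information.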
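Tables 210 and 220 and the full-rank condition of [0047] can be illustrated with plain dictionaries keyed by (compound, target). This is a sketch under assumed names and an assumed input format; the specification does not prescribe a data structure.

```python
def build_screening_tables(compounds, targets, results, threshold_pct=30.0):
    """Build the yes/no table (210) and the potency table (220).

    results maps (compound, target) -> (percent_inhibition, ki_or_None).
    Pairs missing from results were not tested and are left out, which
    leaves a hole in the compound x target matrix.
    """
    yes_no = {}    # table 210: "yes" / "no" against the threshold
    potency = {}   # table 220: numerical descriptor such as Ki
    for compound in compounds:
        for target in targets:
            if (compound, target) not in results:
                continue
            pct, ki = results[(compound, target)]
            yes_no[(compound, target)] = "yes" if pct > threshold_pct else "no"
            potency[(compound, target)] = ki
    return yes_no, potency


def is_full_rank(table, compounds, targets):
    """True when every compound x target point has an entry."""
    return all((c, t) in table for c in compounds for t in targets)
```

Keeping the "no" entries is the point of the design: a missing pair (never tested) is distinguishable from a negative result, which is what makes the negative data usable in later analysis.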
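[0048]
FIGS. 5A and 5B show database 100, exemplified here as a receptor selectivity database, used as part of a screening process either to discover and select new compounds as drug candidates for further development (FIG. 5A), or to discover and select new targets as potentially useful targets for finding drug candidates for a particular disease (FIG. 5B). Database 100 can include compound component 300, molecular target component 400, biological information components 500 and 600, and screening results and assay database 200.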
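[0049]
A new compound or set of compounds is introduced into screening process 102 to determine whether it can inhibit binding between a particular compound (for example, a reference agent) and a molecular target (see FIG. 5A). The screening process can use target information from molecular target component 400.
[0050]
The results of screening process 102 can be stored in an intermediate database or entered in screening results and assay database 200 of receptor selectivity database 100. The results can also be stored as particular parameters (for example, cytotoxicity) in biological information database 500, or in compound database 300 (for example, as a compound name).
[0051]
The complete set of results from screening process 102 can be stored in screening results and assay database 200. Database 200 can then be queried for new compounds that inhibit binding between a compound (for example, a reference agent) and a molecular target, and those new compounds can be tested further.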
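[0052]
Alternatively, a new molecular target, such as an "orphan" receptor, can be introduced into the screening process and tested against the compounds in compound database 300 (see FIG. 5B). An orphan receptor is one whose structure is known but whose function and disease relevance are not. The results of the screening process, including the identities of the compounds that interact with the new molecular target, are incorporated into screening results database 200. Database 100 is then queried to identify the function of the new molecular target and/or to confirm its association with disease.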
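One way to picture the query of [0052] is to tally the known therapeutic indications (column 502 of table 500) of the compounds that score "yes" at the new target. The specification does not describe the query itself, so the approach, the function name, and the input format below are assumptions made for illustration.

```python
from collections import Counter


def indication_profile(target, yes_no, indications):
    """Tally indications of the compounds that interact with a target.

    yes_no      -- maps (compound, target) -> "yes" / "no" (as in table 210)
    indications -- maps compound -> list of indications (column 502)
    Returns (indication, count) pairs, most frequent first.
    """
    counts = Counter()
    for (compound, hit_target), call in yes_no.items():
        if hit_target == target and call == "yes":
            counts.update(indications.get(compound, []))
    return counts.most_common()
```

A frequently recurring indication among the hits would only be a hint about the orphan receptor's role, to be followed up experimentally; the tally is not evidence of function on its own.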
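[0053]
FIG. 6A shows the use of database 100 to predict the drug potential of a new compound. Table 710 draws on the compound (300), molecular target (400), biological information (500 and 600), and screening results (200) databases. When the user supplies information about a new compound to database 100, an automated query script runs and retrieves this information, so that table 710 contains information from one or more of these databases (or tables).
[0054]
Given the new compound's information, the query script that generates table 710 can select compounds from compound database 300. The selection can be based on similarity in chemical structure or other properties between the new compound and the compounds already in database 300.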
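[0055]
After selecting compounds, the query script selects from target database 400 the targets known to react with (bind) the selected compounds. Finally, the selected compounds and molecular targets are used to query biological information databases 500 and 600, and the biological information for each compound-target pair is inserted into table 710. Alternatively, the user can enter a particular category of biological information of interest (for example, toxicity) to limit the biological information in table 710 to that category.
[0056]
The user can query table 710 for information relevant to predicting the new compound's potential for use as a drug. Examples include querying for the molecular targets known to react with compounds related to the new compound, and for the side effects those targets are known to produce with such compounds.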
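The three steps of the query script in [0054] and [0055] can be sketched as below. The specification leaves the similarity measure open, so this sketch assumes Tanimoto similarity over sets of structural features; that choice, the 0.5 cutoff, the function names, and the input formats are all hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity of two feature sets (an assumed measure)."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_table_710(new_features, compound_features, yes_no, bio_info,
                    min_similarity=0.5, category=None):
    """Assemble rows (compound, target, category, value) for table 710.

    compound_features -- maps compound -> feature set (from database 300)
    yes_no            -- maps (compound, target) -> "yes" / "no" (database 200)
    bio_info          -- maps compound -> {category: value} (database 500)
    category          -- optional filter, e.g. "toxicity"
    """
    rows = []
    # Step 1 ([0054]): stored compounds similar to the new compound.
    similar = sorted(
        name for name, feats in compound_features.items()
        if tanimoto(new_features, feats) >= min_similarity
    )
    for compound in similar:
        # Step 2 ([0055]): targets the selected compound is known to bind.
        targets = sorted(
            t for (c, t), call in yes_no.items()
            if c == compound and call == "yes"
        )
        for target in targets:
            # Step 3: attach biological information, optionally one category.
            for cat, value in sorted(bio_info.get(compound, {}).items()):
                if category is None or cat == category:
                    rows.append((compound, target, cat, value))
    return rows
```

Passing `category="toxicity"` corresponds to the user restricting table 710 to a single kind of biological information, as described at the end of [0055].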
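[0057]
FIG. 6B shows the use of database 100, by an approach similar to that for predicting the drug potential of a new compound, to confirm the disease relevance and/or biological function of a new molecular target using the data entry and queries shown in FIG. 6B.
[0058]
All patents, patent applications, and publications mentioned in this specification are incorporated herein by reference.
[0059]
The foregoing description of embodiments of the invention is presented for illustration and description. It is not exhaustive and does not limit the invention to the form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the invention.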
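[Brief description of the drawings]
[FIG. 1]
FIG. 1A shows the compound table of the receptor selectivity mapping database in one embodiment of the invention.
FIG. 1B shows a snapshot of a compound record, including the compound's spatial configuration, from the receptor selectivity mapping database in one embodiment of the invention.
[FIG. 2]
Shows several logical tables of the receptor selectivity mapping database that can be used to access molecular target information in one embodiment of the invention.
[FIG. 3]
Shows the biological information tables of the receptor selectivity mapping database in one embodiment of the invention.
[FIG. 4]
Shows a method of using the receptor selectivity mapping database as part of a screening process in one embodiment of the invention.
[FIG. 5]
FIG. 5A shows a method of discovering and selecting new compounds as drug candidates by using the receptor selectivity mapping database as part of a screening process.
FIG. 5B shows a method of identifying new targets as candidate targets for discovering drug candidates for a particular disease by using the receptor selectivity mapping database as part of a screening process.
[FIG. 6]
FIG. 6A shows the use of the database to predict the drug potential of a new compound.
FIG. 6B shows the use of the database to confirm the disease relevance and/or biological function of a new molecular target.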
TECHNICAL FIELD OF THE INVENTION
The present invention generally relates to a technique for constructing a multidimensional database by combining chemical informatics and bioinformatics with data on the interaction of chemical molecular targets. In particular, the present invention relates to a database containing compounds, molecular targets, and biological or clinical information, in which patterns or relationships of interactions between the compounds and molecular targets are determined and stored in a database. Comparisons with other information have led to conclusions that will be useful in drug discovery and development and related areas.
[0002]
[Prior art]
The global pharmaceutical industry spends $ 30 billion a year on research and development, of which approximately one-third is the time it takes to select drug candidates for preclinical and clinical development. Spent during the early stages of development. The important clinical steps for drug discovery consist of the following steps. (1) a sequence of DNA containing a segment of the human genome, (2) identification of a gene having a genome associated with a particular disease or biological function, (3) associated with or by the functional gene. Generation of proteins such as receptors or enzymes that are encoded and later become biological or molecular targets for drug discovery, (4) screening from compound libraries for activity against molecular targets, (5) other biology To screen for compounds that are most active against a specific target, assess the selectivity or specificity of the compound for the biological / molecular target of interest, and the potential for unwanted side effects due to activity against other targets (6) To test characteristics such as toxicity, absorption, distribution, metabolism, and excretion (7) The most promising compound is evaluated based on empirical judgment using the above information, and the information is sent to the chemical synthesis group. (8) After retesting the chemical analogs in steps (4), (5) and (6) until the optimal derivative or group of compounds is identified (8) Repeat 7), (9) Use this optimal derivative for further preclinical and clinical trials.
[0003]
In this discovery and development process, compounds that pass through narrow filters are screened for more expensive preclinical and clinical development. Unfortunately, in preclinical and clinical development that follows this screening process, compounds often fail to pass these stages and do not reach commercialization. Due to these failures, the average cost of developing and launching one new drug is estimated to be over $ 300 million. However, if the best drug candidates can be accurately identified early in the discovery and development process, and if the drug clears preclinical and clinical trials, development costs can be reduced by as much as 75%. Can be. Clearly, a major objective in pharmaceutical research and development (R & D) is to improve the predictability of such early drug development tests.
[0004]
Innovations in biotechnology and the development of instruments that can automate many of the experimental processes have created two major trends that have a significant impact on pharmaceutical R & D. First, due to advances in the sequencing of the human genome, the number of molecular targets (such as new receptors and enzymes) that can be used in new drug discovery screening programs continues to increase rapidly. Approximately 400 molecular targets are being investigated for new drug discovery, and the number of potential molecular targets that may be elucidated by the human genome project is estimated to be in the thousands to more than 10,000. Second, new technologies such as automation and combinatorial chemistry have increased the size of compound libraries that can be used in new drug discovery screening programs by approximately 10-fold (more than one million compounds found in many pharmaceutical companies). ) Is expanding. While these two factors offer great promise for new drug discovery, they also create significant potential problems with undesirable cost consequences for new drug development. More targets and compounds lead to the discovery of additional bioactive compounds, resulting in greater difficulties in selecting the best drug candidate to proceed to preclinical trials, and more compounds are needed for preclinical trials and Development costs will increase as going to clinical trials creates more failures at these stages.
[0005]
These factors increasingly require rapid and low cost ("tube" or microplate-based) in vitro assays in the selection, optimization and detection of derived compounds. Such a rapid assay would help identify the most promising of these active compounds before moving on to later expensive drug development stages. These factors further require more effective methods for managing and interpreting vast amounts of data on genes and gene products (molecular targets), chemical structures, and screening results.
[0006]
One application of in vitro assays, which has become increasingly important in pharmaceutical R & D, is "profiling." The patentee of this patent application pioneered the concept of profiling in the late 1980s. Pharmaceutical companies have a vast array of in vitro assays to characterize the pharmacological activity and potential side effects of compounds developed as new drugs. Currently, molecular targets play important roles in a wide range of human diseases, including diseases involving central nervous system disorders, immune diseases, pain and inflammation, infectious diseases, cancer, metabolic or growth factors, cardiovascular function, and endocrine system There are over 200 different assays that are routinely performed, based on the required receptors and enzymes. Drugs account for more than half of the global market function due to their interaction with cell receptors. In addition, the side effects of many drugs have been mitigated by their interaction with receptors and enzymes.
[0007]
In profiling, derived compounds of a pharmaceutical company, which are generally in the preclinical development phase, are tested with receptors and enzyme assay devices. Information on the interaction of this company's compounds with specific receptors, obtained from the profiling process, is important in the optimization and selection of derived compounds and suggests any side effects or potential second effects of the compounds. Become. With this knowledge, the pharmaceutical company can potentially save millions of dollars in the time and cost of preclinical and / or clinical development of the compound.
[0008]
Although profiling services have been available for many years, pharmaceutical companies have typically used data from these tests empirically. Many drugs interact with many receptors or other molecular targets, including highly selective drugs. Therefore, the data generated by profiling should be interpreted by researchers at the pharmaceutical company based on experience and knowledge, taking into account both the data on the chemical structure of the compound and the binding activity of the compound to a particular receptor. It is. Unfortunately, even the most experienced pharmacologists have incomplete knowledge of the interaction of various drugs with a wide range of receptors associated with new drug development.
[0009]
The need for more effective methods to manage, collate, interpret, and utilize vast amounts of data on genes and gene products (molecular targets), chemical structures, and screening results has created new opportunities in bioinformatics and chemical informatics, that is, the management of biological and chemical data. Building the vast pool of information for new drug discovery involves the following groups: (1) DNA sequencing (the coding of the genetic material, or genes, that serves as the blueprint for cells to produce gene products or proteins); (2) functional genomics (the process of converting a DNA sequence into the relevant gene product or protein via the mRNA product, particularly in response to a drug or a change in biological function); (3) proteomics (identification of the amino acid sequence and/or three-dimensional structure of gene products, such as receptors, encoded in the genes); (4) pharmacology/toxicology of small molecules (molecular binding or interaction between gene products such as receptors and small organic compounds that can be drugs); and (5) chemical structure (for small molecules and drug analogs).
[0010]
Databases for DNA sequencing (Group 1) have been established and include Genbank, Genome Center and others. Similarly, chemical structure databases (Group 5) are also well known and are provided by vendors such as MDL (Isis) and Oxford Molecular. Proteomics databases (Group 3), such as SWISS-PROT, ProLink and PDB, have also been constructed. These databases contain structural information and can be used to determine patterns in one dimension, that is, in one component of structural or sequence information, so each can be considered a one-component database. The databases of Groups 2 and 4 are not yet well established, but they will provide valuable additional information in the information pool for drug discovery and development. These latter two types of databases contain data on the interaction between two structures, such as gene-to-protein (Group 2) and protein-to-compound (Group 4), and are therefore two-dimensional. Such database relationships have an added level of complexity compared with a one-component database.
[0011]
Partial databases for Group 4 protein-to-compound relationships are currently being constructed. For example, the binding profile of a single compound against a wide range of receptor targets, which the assignee provides to its clients, is a partial dataset of a Group 4 type database. Similarly, the data generated by a high-throughput screening project, in which thousands to hundreds of thousands of compounds, such as those contained in a chemical structure database (Group 5), are screened for activity against a particular receptor target (a single point in Group 3), form part of a Group 4 database. While such partial Group 4 databases are useful in new drug discovery and development, they have two major drawbacks. First, each covers only a narrow slice of the two components: either the binding selectivity of a single compound or a limited set of compounds against a range of receptors (a profile), or the analysis of many compounds against a single receptor target (a high-throughput screening project). In either case, the dataset is not broad enough to support statistical correlation between receptor targets and chemical structures. Second, and importantly, these partial datasets are generated for compounds selected for structural novelty, i.e., for their potential as new drugs. Because these are new compounds, there is no biological information about their activity in animals or humans. Thus, such an approach suffers from the same limitations as pharmacologists who attempt to interpret profile data empirically, as described above.
[0012]
[Problems to be solved by the invention]
Accordingly, it is an object of the present invention to meet the above needs by providing systems and methods for data analysis related to drug discovery and development. A full classification screening database is provided that includes positive and negative data from test results for multiple compounds and multiple molecular targets. The number of combinations of compounds and molecular targets must be large enough that one of ordinary skill in statistical techniques or other data mining methods, using this screening database and the associated compound and molecular target databases, can make reliable predictions about whether a compound is suitable for clinical trials and has a high likelihood of being a safe and effective drug.
[0013]
[Means for Solving the Problems]
The present invention discloses, inter alia, systems and methods that meet the above needs. The computer system includes a first database having records relating to a plurality of compounds and records relating to the effects of those compounds on human and animal biological systems, and a second database having records relating to a plurality of molecular targets. The computer system further includes a third database having records associated with tests for binding, reactivity, and other interactions between the compounds in the first database and the molecular targets in the second database. The tests include information about the effect of a compound selected from the plurality of compounds in the first database on the interaction between a particular molecular target selected from the plurality of molecular targets in the second database and a compound known to interact with that molecular target (e.g., a control agent or reference standard), and the tests are performed on a plurality of molecular targets in the second database. The computer system also includes means for setting thresholds for the interaction tests associated with said effects, and means for selecting compounds, sets of compounds, and/or information on compounds when the test results meet the interaction test thresholds. A user interface is provided that allows the user to view, manipulate, or analyze the information in the first and second databases and the records in the third database associated with one or more compounds in the first database and/or one or more molecular targets in the second database, in particular the compounds, molecular targets, and other database records related to results meeting the interaction test threshold.
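As a rough illustration of the arrangement just described, the following minimal sketch models the three databases as tables in an in-memory SQLite database and selects the compound/target pairs whose test result meets a user-set threshold. The table names, column names, sample rows, and the `select_hits` helper are all hypothetical and chosen only for illustration; the specification does not prescribe a schema.

```python
import sqlite3

# Hypothetical schema: names and sample values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compounds (compound_id INTEGER PRIMARY KEY, name TEXT, known_effect TEXT);
CREATE TABLE targets   (target_id INTEGER PRIMARY KEY, name TEXT, target_type TEXT);
CREATE TABLE interaction_tests (
    compound_id        INTEGER REFERENCES compounds(compound_id),
    target_id          INTEGER REFERENCES targets(target_id),
    percent_inhibition REAL,
    PRIMARY KEY (compound_id, target_id)
);
""")
conn.executemany("INSERT INTO compounds VALUES (?, ?, ?)", [
    (1, "compound A", "sedation"),
    (2, "compound B", "none reported"),
])
conn.executemany("INSERT INTO targets VALUES (?, ?, ?)", [
    (1, "dopamine-2", "Receptor"),
    (2, "serotonin", "Receptor"),
])
# Every compound/target pair has a result, positive or negative.
conn.executemany("INSERT INTO interaction_tests VALUES (?, ?, ?)", [
    (1, 1, 72.0), (1, 2, 12.0), (2, 1, 5.0), (2, 2, 41.0),
])


def select_hits(connection, threshold):
    """Return (compound, target, inhibition) rows that meet the threshold."""
    query = """
        SELECT c.name, t.name, i.percent_inhibition
        FROM interaction_tests AS i
        JOIN compounds AS c ON c.compound_id = i.compound_id
        JOIN targets   AS t ON t.target_id = i.target_id
        WHERE i.percent_inhibition >= ?
        ORDER BY c.name, t.name
    """
    return connection.execute(query, (threshold,)).fetchall()


hits = select_hits(conn, 30.0)
```

Keeping the test results in their own table, keyed by compound and target, is one way to let a user interface join any result back to the compound and target records.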
[0014]
In addition, the invention relates to applying statistical and other data mining techniques to these multidimensional databases to determine correlations or patterns related to new drug discovery and development.
[0015]
Both the foregoing general description and the following detailed description provide examples and explanations, but are not intended to limit the scope of the invention.
[0016]
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, explain the advantages and principles of the invention.
[0017]
[Best Mode for Carrying Out the Invention]
Preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, will now be described in detail. Wherever possible, the same reference numbers are used in different drawings to identify the same or similar elements.
[0018]
Systems and methods consistent with the present invention analyze data related to new drug discovery and development and can predict, for example, whether a new compound is likely to become a safe or effective new drug and whether its development should proceed. In the following description, the system and method of the present invention are explained in relation to a relational database containing a plurality of main tables, and in relation to the use of binding between a compound and a molecular target as a metric for the interaction between the two. This description may also apply generally to other database structures having multiple main components, and to the measurement of other interactions between compounds and molecular targets.
[0019]
The present invention relates to novel designs, structures, and applications of information-rich databases on compounds, molecular targets (particularly proteins and other macromolecules), and the biological activities of these compounds. The present invention further relates to methods of using known drugs, and drug candidates that failed clinical or preclinical trials, together with the preclinical and clinical data on these compounds held in a database, including side effects, mechanisms of action, and other medical data, as a source of information for the library. The present invention also relates to determining the binding and other interactions between the compounds in the database and the molecular targets, and to using relationship analysis and data mining techniques to identify patterns in these interactions that are relevant to new drug discovery and development. Such patterns include the specific chemical structure, substructure, or other characteristic of a compound that is associated with a particular biological response, the biochemical characteristics or structure of a molecular target that interacts with such a compound, and their correlation with other features. The following references are examples of such data mining techniques. All of these references are incorporated herein by reference.
a) Chen et al., "Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors";
b) Hawkins et al., "Analysis of a Large Structure-Activity Data Set Using Recursive Partitioning," Quant. Struct.-Act. Relat., 16: 296-302 (1997);
c) DePriest et al., "3D-QSAR of angiotensin-converting enzyme and thermolysin inhibitors: a comparison of CoMFA models based on deduced and experimentally determined active site geometries," J. Am. Chem. Soc., 115: 5372-84 (1993);
d) Good et al., in Reviews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B. (Eds.), VCH, New York, Vol. 7, pp 67-117 (1996);
e) Marshall et al., Computer-Assisted Drug Design, ACS Symposium Series 112; American Chemical Society: Washington, DC, 1979; pp 205-226;
f) Molec et al., "Three-dimensional structure-activity relationships and biological receptor mapping," Mathematics and Computational Concepts in Chemistry; Ellis Horwood: Chichester, 1985; pp 225-251;
g) Mayer et al., "A unique geometry of the active site of angiotensin-converting enzyme consistent with structure-activity studies," J. Comput.-Aided Mol. Des., 1: 3-16 (1987);
h) Sheridan et al., "The ensemble approach to distance geometry: application to the nicotinic pharmacophore," J. Med. Chem., 29: 899-906 (1986);
i) Martin et al., "A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists," J. Comput.-Aided Mol. Des., 7: 83-102 (1993);
j) Catalyst/Hypo Tutorial, version 2.0, BioCAD Corp., Mountain View, CA, 1993;
k) Sprague, P. W., "Automated chemical hypothesis generation and database searching with Catalyst," Perspect. Drug Discov. Des., 3: 1-20 (1995);
l) Barnum et al., "Identification of common functional configurations among molecules," J. Chem. Inf. Comput. Sci., 36: 563-71 (1996);
m) HipHop Tutorial, version 2.3; Molecular Simulations Inc., Sunnyvale, CA, 1995;
n) Davies, K. and Uppin, R., "3D pharmacophore searching," Net. Sci. (Http://www.org/Science/Cheminform/feature02.html);
o) Goldener, V. A. and Vesterman, B., "APEX 3D expert system for drug design," Net. Sci. (Http://awod.com/netsci/science/compchem/feature09.html);
p) Van Drie, J., "Strategies for the determination of pharmacophoric 3D database queries," J. Comput.-Aided Mol. Des., 11: 39-52 (1997);
q) Van Drie, J. and Nugent, R., "Addressing the challenges posed by combinatorial chemistry: 3D databases, pharmacophore recognition and beyond," SAR QSAR Environ. Res., 9: 1-21 (1998);
r) Finn et al., "Pharmacophore discovery using the inductive logic programming system Progol," Machine Learning, Special Issue on Applications and Knowledge Discovery, Kluwer Academic Publishers: Boston, 1998, pp 1-33;
s) Jain et al., "Compass: a shape-based machine learning tool for drug design," J. Comput.-Aided Mol. Des., 8: 635-52 (1994).
[0020]
In contrast to standard practice in the pharmaceutical industry, the background section suggests that a Group 4 database must be constructed as a database of more than two components and must cover a considerable range of both receptor or enzyme targets and compounds. As an example, to build a three-component database, one first selects a broad set of compounds for which a wealth of information directly relevant to drug discovery and development is available. The most relevant information is often obtained from actual experience of testing such compounds in humans, in clinical trials and/or post-marketing surveillance, and in animals, in preclinical trials. Other relevant biological information can be obtained from natural substances that exhibit one or more biological activities, or from chemical reference standards used in the art to study the biological characteristics of receptors. Thus, in one embodiment, the information-rich compounds selected for such a Group 4 database include marketed drugs, drugs that failed clinical or preclinical studies, natural bioactive substances or natural extracts, and control agents used in receptor binding assays.
[0021]
Such a database could be constructed using screening data obtained from the scientific literature. This approach could produce a partial dataset, but it has limitations. First, literature references generally provide only positive information (e.g., a report that a particular compound inhibits binding at a particular receptor). Negative data are as important as positive data for making useful comparisons. In addition, certain statistical analyses may not be applicable to datasets that lack either positive or negative data. Second, a quantitative report in one article on the binding of one compound to a receptor cannot be compared with a separate report on the binding of another compound to the same receptor, because the assays are performed differently. Thus, one embodiment for generating a Group 4 three-component database is to screen a wide array of compounds against a wide range of receptor or enzyme targets, thereby obtaining consistent, comparable results and securing both positive and negative data.
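To illustrate why negative data matter, the following minimal sketch builds a 2x2 contingency table relating binding at one target to the presence of one side effect. The function name, compound labels, and values are hypothetical. With literature-only data, the two "does not bind" cells would typically be empty, so no association could be assessed.

```python
def contingency_table(binds_target, has_side_effect):
    """Count compounds in each cell of a 2x2 table.

    Both arguments map compound name -> bool. The returned counts are ordered
    (binds and effect, binds and no effect,
     no binding and effect, no binding and no effect).
    """
    cells = [0, 0, 0, 0]
    for compound, binds in binds_target.items():
        effect = has_side_effect[compound]
        index = (0 if binds else 2) + (0 if effect else 1)
        cells[index] += 1
    return tuple(cells)


# Hypothetical screening results (positive and negative) and side-effect records.
binds = {"A": True, "B": True, "C": False, "D": False}
effect = {"A": True, "B": False, "C": False, "D": False}
table = contingency_table(binds, effect)
```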
[0022]
Compound components: selection of compound libraries and inclusion of compound data
The present invention relates to a database containing, as one component, compounds having known biological activities related to pharmaceutical research and development. Information about biological activity can be included in a database or table of compounds.
[0023]
For example, these information-rich compounds include:
(A) Pharmacological control or reference compounds used in measuring the interaction or molecular binding between an unknown compound and a particular molecular target such as a receptor or enzyme. Examples include compounds used to characterize the binding of a test compound to a molecular target, including a receptor or an enzyme. Other control agents may include compounds selected from the catalogs of Research Biochemicals Inc. (RBI), a member of the Sigma Aldrich Corp. group, and from other sources known in the art. Because these pharmacological control compounds may have been previously tested and/or marketed as medicaments, or may be highly bioactive natural substances, they may overlap with the following three categories.
(B) Compounds that are known drugs, currently or previously marketed for medical use, for which substantial amounts of biological information are available. These compounds are well known and appear in publications available from U.S. government agencies such as the Food and Drug Administration (FDA), as well as in publications issued by private companies or nonprofit organizations. One such publication, issued by a nonprofit organization, is the USP DI series of the United States Pharmacopeial Convention Inc., of which Volume I, Drug Information for the Health Care Professional, is updated monthly by the USP DI Update. Newly approved drugs fall into this category. Because marketed drugs and drugs approved by the FDA or equivalent foreign regulatory bodies are matters of public record, those of ordinary skill in the art can readily identify compounds belonging to this category.
(C) Compounds approved for testing in humans, such as compounds that were given IND (Investigational New Drug) status as potential new drugs but failed to show sufficient efficacy or safety in clinical trials to obtain FDA approval, or that otherwise did not reach the market. This category may also include compounds that were approved for marketing by the FDA but were subsequently withdrawn from the market. A significant amount of useful biological information is also available for these compounds, which makes them particularly useful for the purposes of the present invention. The identity of failed drugs can be obtained from a number of sources, including official publications from pharmaceutical or biotechnology companies, publications such as "Pink Sheets", and lists maintained by the FDA.
(D) Compounds exhibiting biological activity that are obtained from natural sources such as plants, microorganisms, and animals. These natural substances may include toxins, antibacterials, behavioral modifiers, defensive agents, and other categories of compounds that provide information relevant to new drug discovery and development. The identity of natural substances can be obtained from many publications, including, but not limited to, the compound catalogs of RBI and Sigma Aldrich.
[0024]
For each of the compounds included in this database, the chemical structure, chemical formula, physicochemical properties, spatial configuration, other structural information (e.g., SMILES codes), solubility, and other relevant data are recorded in database fields. Those of ordinary skill will recognize other parameters that can be recorded. Compounds can be organized in the database by chemical structure relationships or by other relationships.
[0025]
FIG. 1A shows a compound table 300 of a relational database. Table 300 lists a plurality of compounds and includes records for N compounds (rows 1 to N). For each compound, columns 301-307 contain information about the compound. For example, in FIG. 1A, column 301 contains the name of the compound; column 302 contains the type of compound (e.g., a compound approved for testing in humans); column 303 contains information about the chemical structure, for example a hyperlink that opens a screen showing a structural diagram (see snapshot 310 in FIG. 1B); column 304 contains the chemical formula; column 305 contains information on the physicochemical properties; column 306 contains the spatial configuration; and column 307 contains information on the solubility of the compound.
[0026]
Additional columns may be added so that other relevant data can be included for each compound listed in table 300. By including the biological activity of the compounds in these added columns, the compound database can also be a two-component database (see table 500).
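One row of table 300 might be represented as follows. This is only a sketch: the field names are hypothetical labels for columns 301-307, the values are placeholders rather than real compound data, and the `extra` dictionary stands in for the optional added columns.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """One row of compound table 300 (field names are illustrative)."""
    name: str                   # column 301
    compound_type: str          # column 302
    structure_link: str         # column 303, e.g. a link to a structure diagram
    formula: str                # column 304
    physicochemical: str        # column 305
    spatial_configuration: str  # column 306
    solubility: str             # column 307
    extra: dict = field(default_factory=dict)  # additional columns, if any


record = CompoundRecord(
    name="compound X",
    compound_type="approved for testing in humans",
    structure_link="structures/compound_x.html",  # placeholder path
    formula="(placeholder)",
    physicochemical="(placeholder)",
    spatial_configuration="(placeholder)",
    solubility="(placeholder)",
)
# Adding biological activity as an extra column gives a two-component record.
record.extra["side_effects"] = "(placeholder)"
```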
[0027]
FIG. 1B shows a snapshot 310 that includes information related to the records in table 300. For example, the chemical formula 304 of the compound can be included in the snapshot along with the structure 303 of the compound.
[0028]
Molecular target components: selection of receptors, enzymes, and other molecular targets and inclusion of molecular target data
The second component of the database of the present invention includes, as molecular targets, receptors, enzymes, other proteins, nucleic acids, carbohydrates, other macromolecules, and the like that are relevant to drug discovery and development. In one embodiment of the invention, receptors and enzymes are the primary molecular targets. Receptors mediate most of the molecular communication in the cells and organs of the body. Enzymes often amplify such communication, for example through second messenger systems and cellular signaling pathways.
[0029]
Receptors include typical receptors such as dopamine receptors, serotonin receptors, opiate receptors, muscarinic receptors, adrenergic receptors, adenosine receptors, and the like. These receptor groups include receptor subtypes (such as the dopamine-1, dopamine-2, dopamine-3, dopamine-4 and dopamine-5 receptors). Certain subtypes have additional variants (such as dopamine 4.2, dopamine 4.4 and dopamine 4.7), and some exist in different forms (such as dopamine 2 short and dopamine 2 long). Mutations in the gene encoding a particular receptor, or splice variants of the receptor, may give rise to a subset of receptors whose binding to drugs and other compounds differs slightly from that of the normal receptor. Receptors can be categorized by family, superfamily, or subfamily; classifications include G-protein-coupled receptors, transmembrane receptors, and nuclear receptors. Receptors can be classified according to the degree of similarity of the DNA sequences of the related genes. Receptors can also be classified by amino acid sequence and the related three-dimensional conformation. Receptors can be classified by their location in tissue or across different cell types.
[0030]
Enzymes include proteases, carbohydrases, kinases, phosphatases, DNA-modifying enzymes, transferases, P450s, and other enzymes known to those skilled in the art.
[0031]
Applicants are continually developing other receptors, receptor sources, and related assays and adding them to the database content. Additional receptors and receptor assays are well known to those skilled in the art. Lists and descriptions of receptors relevant to drug discovery and development can be obtained from many publications known to those skilled in the art. These publications include the RBI Handbook of Receptor Classification and the IUPHAR Receptor Classification Book. Further, as new receptors and receptor subtypes are discovered, they are also added to the database content.
[0032]
Enzymes and enzyme assays are well known to those skilled in the art. Lists and descriptions of enzymes relevant to drug discovery and development can be obtained from many publications known to those skilled in the art.
[0033]
FIG. 2 shows tables 400, 410 and 420 that form part of a relational database system that can be used to access molecular target information. Table 400 lists the targets and includes a record of multiple targets M (lines 1-M). Column 401 lists the target names, and column 402 specifies the type of target for each target name.
[0034]
The table structure may vary depending on the type of target identified in column 402. Table 410 contains information about targets that are classified as receptors and listed in table 400. By querying the database for a particular receptor name, the records in table 410 can be accessed. The receptor names in table 410 can be accessed by querying table 400 for a target name indicated as “Receptor” in column 402.
[0035]
Column 411 of table 410 contains the receptor name, which is also the target name in column 401 of table 400. Column 412 contains receptor family information, column 413 contains receptor superfamily information, column 414 contains receptor subfamily information, column 415 contains information about the degree of similarity of the DNA sequences of related genes, and column 416 contains information about the amino acid sequence. The amino acid sequence is one of many molecular descriptors contained in the database. Other molecular descriptors 417 may include, for example, hydropathy plots related to the amino acid sequence. Because the molecular target database shown in tables 400, 410 and 420 contains target information, and the biological information associated with the targets is also included in the database (table 600), the database can be considered to have two components. The columns shown here indicate the types of information that can be included in the database and should not be construed as limiting the invention.
[0036]
The table 420 includes information on targets classified as enzymes in the table 400. By querying the database for a particular enzyme name, the records in table 420 can be accessed. The enzyme names in table 420 can be accessed by querying table 400 for the target name indicated in column 402 as "Enzyme".
[0037]
Column 421 of table 420 contains the enzyme name, which is also the target name in column 401 of table 400. Column 422 contains information about the type of enzyme. Column 423 is labeled "other related information" and indicates that additional columns may be added to table 420, for example when a user wants to access other enzyme information such as amino acid sequences and molecular descriptors.
[0038]
Although only tables 410 and 420 are shown to illustrate access to molecular target information by type of target, additional tables can be added to the relational database system according to the number of molecular target types available in the database.
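The relationship between tables 400, 410 and 420 can be sketched with SQLite as follows. The SQL table names, column names, and sample rows are hypothetical; only the layout (a master target table with one detail table per target type, linked by target name) follows the description of FIG. 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table 400: all targets, with the target type in the second column (402).
CREATE TABLE targets (name TEXT PRIMARY KEY, target_type TEXT);
-- Table 410: details for targets of type 'Receptor'.
CREATE TABLE receptors (
    name TEXT PRIMARY KEY REFERENCES targets(name),
    family TEXT, superfamily TEXT, subfamily TEXT
);
-- Table 420: details for targets of type 'Enzyme'.
CREATE TABLE enzymes (
    name TEXT PRIMARY KEY REFERENCES targets(name),
    enzyme_type TEXT
);
""")
conn.executemany("INSERT INTO targets VALUES (?, ?)", [
    ("dopamine-2", "Receptor"),
    ("serotonin", "Receptor"),
    ("example kinase", "Enzyme"),
])
conn.executemany("INSERT INTO receptors VALUES (?, ?, ?, ?)", [
    ("dopamine-2", "dopamine", "(placeholder)", "(placeholder)"),
    ("serotonin", "serotonin", "(placeholder)", "(placeholder)"),
])
conn.execute("INSERT INTO enzymes VALUES (?, ?)", ("example kinase", "kinase"))


def names_of_type(connection, target_type):
    """Query table 400 for the names of all targets of the given type."""
    rows = connection.execute(
        "SELECT name FROM targets WHERE target_type = ? ORDER BY name",
        (target_type,),
    )
    return [row[0] for row in rows]


def receptor_record(connection, name):
    """Look up one receptor's row in table 410 by its name."""
    return connection.execute(
        "SELECT name, family, superfamily, subfamily FROM receptors WHERE name = ?",
        (name,),
    ).fetchone()


receptor_names = names_of_type(conn, "Receptor")
d2 = receptor_record(conn, "dopamine-2")
```

Supporting a further target type under this layout would mean adding one more detail table keyed by the same target name.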
[0039]
Biological Information Component: Biological / Chemical Information Parameters
Biological information that forms part of the database includes, for example, items related to side effects, mechanisms of pharmacological action, drug metabolism, toxicity, absorption, distribution, and elimination. This information can be obtained from FDA-approved labels for marketed drugs or from the literature and publications on drugs that failed clinical trials. Specific examples of parameters include toxicity, LD₅₀, LD₅₀/ED₅₀, teratogenicity, mechanism of toxicity, toxicity target organ, in vitro toxicity battery, apoptosis induction, bioavailability, absorption, blood-brain barrier, oral absorption, mucosal absorption, % absorption, distribution, critical blood protein, half-life, onset of action, duration of action, peak blood concentration, metabolism, major pathway, minor pathway, active metabolites, excretion, primary elimination mode, secondary elimination mode, in vitro utilities, therapeutic indications, effects on animal behavior, side effects, known key targets, other target organs/systems, and known receptor interactions.
[0040]
FIG. 3 shows a table 500 containing some of the biological information parameters described above. Table 500 consists of N rows (1 to N), one for each compound that may be present in the first database. Column 501 contains the compound name, column 502 contains therapeutic indications (for marketed drugs or drugs that failed trials), column 503 contains information about toxicity, column 504 contains information about side effects, and column 505 contains information about the mechanism of action of the drug. For example, associating table 500 with table 300 forms a two-component table of compounds and biological activity.
[0041]
FIG. 3 also shows a table 600 containing biological information parameters associated with the molecular targets in the database. Table 600 consists of P rows (1 to P), one for each target that may be present in the second database. Column 601 contains the target name, column 602 contains therapeutic indications (for marketed drugs or drugs that failed trials), column 603 contains information about toxicity, and column 604 contains information about side effects. As above, associating table 600 with table 400, for example, forms a two-component table of molecular targets and biological activity. Together, tables 500 and 600 provide a complete taxonomy database (e.g., all combinations of compounds and molecular targets that may exist in the relational database system) that includes molecular target information, compound information, and the biological activity information associated with each molecular target and each compound, and that can be regarded as a multidimensional database. Additional columns can be added to tables 500 and 600 without departing from the scope of the invention.
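Associating table 500 with table 300 amounts to a join on the compound name. The sketch below uses plain dictionaries keyed by compound name; the `associate` helper, the field names, and the placeholder values are hypothetical.

```python
def associate(structure_table, biology_table):
    """Inner join of two tables keyed by name.

    Each table maps a name to a dict of column values. Only names present in
    both tables appear in the result, with the columns of both rows merged.
    """
    return {
        name: {**structure_table[name], **biology_table[name]}
        for name in structure_table
        if name in biology_table
    }


# Stand-in for table 300 (compound data); values are placeholders.
table_300 = {
    "compound X": {"formula": "(placeholder)", "solubility": "(placeholder)"},
    "compound Y": {"formula": "(placeholder)", "solubility": "(placeholder)"},
}
# Stand-in for table 500 (biological information for compounds).
table_500 = {
    "compound X": {
        "indication": "(placeholder)",
        "toxicity": "(placeholder)",
        "side_effects": "(placeholder)",
        "mechanism": "(placeholder)",
    },
}

two_component = associate(table_300, table_500)
```

The same helper could be applied to stand-ins for tables 400 and 600 to form the target-side two-component table.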
[0042]
Determination of binding information
A key feature of the present invention is the combination of multiple information components, consisting of compounds, molecular targets, and biological information, with an evaluation of the binding, reactivity, and other interactions between the compounds and the molecular targets. By relating this binding or reactivity information back to the known biological information, patterns or relationships useful for drug discovery and development can be identified. An important aspect of the present invention is to generate binding or reactivity data between compounds and molecular targets that are as extensive and consistent as possible, so as to provide as complete a dataset as possible for identifying the required patterns or relationships, and to provide both positive and negative binding or reactivity information. In one embodiment of the present invention, for example, the binding data are recorded as a numerical descriptor indicating whether or not a set threshold is met for a specific molecular target or set of molecular targets. This numerical descriptor can also represent the presence or absence of reactivity for each compound at each receptor or other molecular target, evaluated at a concentration near a threshold suitable for the biological system or set of biological information. For example, a compound can be tested at a concentration of 10^-5 M (10 micromolar) for inhibition of binding between the receptor and the corresponding reference agent, with a threshold of 30% inhibition. Other initial concentrations or inhibition thresholds can also be set. Also, in one embodiment of the present invention, compounds that cause binding inhibition above the threshold in this first yes/no test are further tested for their ability to inhibit binding.
For these active compounds, tests are performed at a series of concentrations, for example 7 to 14 different concentrations within the range of 10^-5 to 10^-9 M, to determine the IC₅₀ and/or Ki value of the active compound at a particular receptor. More or fewer concentrations may be used for this determination, and it may be necessary to use a concentration range higher or lower than 10^-5 to 10^-9 M. From these data, a matrix of the relative activity or relative potency of each active compound for each molecular target is obtained.
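The two-stage procedure above can be sketched as follows. The specification does not say how IC₅₀ or Ki is computed, so two assumptions are made here and should be read as such: IC₅₀ is estimated by linear interpolation on a log-concentration scale, and Ki is derived with the Cheng-Prusoff relation for competitive binding, Ki = IC₅₀ / (1 + [L]/Kd). All function names and data values are hypothetical.

```python
import math


def primary_screen(inhibition_by_pair, threshold=30.0):
    """Yes/no descriptor: 1 if percent inhibition meets the threshold, else 0."""
    return {pair: int(value >= threshold) for pair, value in inhibition_by_pair.items()}


def estimate_ic50(concentrations_molar, percent_inhibitions):
    """Estimate IC50 by interpolating linearly in log10(concentration).

    Assumption: inhibition increases with concentration around the 50% point.
    Returns None if the data never bracket 50% inhibition.
    """
    points = sorted(zip(concentrations_molar, percent_inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            fraction = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


def cheng_prusoff_ki(ic50, ligand_concentration, ligand_kd):
    """Ki from IC50 for competitive binding (Cheng-Prusoff; an assumption here)."""
    return ic50 / (1.0 + ligand_concentration / ligand_kd)


# Stage 1: single-concentration screen (e.g. 10 micromolar), 30% threshold.
screen = primary_screen({("A", "dopamine-2"): 72.0, ("A", "serotonin"): 12.0})

# Stage 2: concentration series for the active pair (hypothetical values).
concentrations = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
inhibitions = [5.0, 20.0, 40.0, 60.0, 95.0]
ic50 = estimate_ic50(concentrations, inhibitions)
ki = cheng_prusoff_ki(ic50, ligand_concentration=1e-9, ligand_kd=1e-9)
```

A curve fit to a sigmoidal dose-response model would normally be preferred over interpolation; interpolation is used here only to keep the sketch short.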
[0043]
To generate these screening data, the compounds are first solubilized in a suitable solvent system. As the solvent system, for example, 4% DMSO can be used, but other concentrations of DMSO or another solvent can also be used. The stock solutions of these compounds are then diluted to an appropriate concentration and kept available as a repository. Each assay for measuring the interaction between a compound and a molecular target uses different reagents and procedures. Each such assay needs to be characterized and validated so that it performs consistently in routine use. An appropriate control test must be performed each time the assay is run. Any assay format that can produce information of the desired type and accuracy can be used. Numerous assay detection systems can be used, such as radiolabeling, fluorescence analysis, fluorescence polarization analysis, time-resolved fluorescence analysis, fluorescence correlation spectroscopy, chemiluminescence, UV absorption, colorimetry, and the like.
[0044]
In one embodiment of the invention, data regarding molecular interactions is generated using a receptor binding assay or an enzymatic activity assay. For example, in a receptor binding assay, compounds in a stock solution are tested for their ability to inhibit the binding interaction between the receptor and a reference agent selected for the receptor. The receptor can be obtained from animal or human tissues or the like, or can be obtained from a cell line transfected to contain the gene for the receptor. The receptor source for the assay can be provided, for example, as a cell fraction containing the receptor. The receptor may also be partially purified. A reference compound or ligand is preferably selected based on its potency and/or specificity of binding to a particular receptor, and is labeled with iodine-125, tritium, carbon-14, or another radioactive tracer so that bound and unbound ligand can be distinguished. Positive and negative control tests are performed in parallel with the tests that generate the compound binding data to be included in the database, and reference curves at various concentrations of the reference (radioactive) ligand ensure the quality of the assay performed.
[0045]
One of skill in the art will appreciate that numerous methods and systems allow the interaction between a target and a compound to be measured. To reach equilibrium in the binding reaction, the radioligand, the receptor preparation, and the test compound are incubated for an appropriate time in an appropriate buffer solution and at an appropriate temperature. The amount of bound radioligand relative to unbound radioligand is determined by filtration or another separation step, or by a method such as SPA (scintillation proximity assay), and is measured by liquid scintillation or gamma counting. The specific binding in the presence of the test compound is then determined by comparing the test compound assay results with the positive and negative control tests. From these data, the percent inhibition of the test compound is calculated.
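The percent inhibition calculation can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of the disclosure: it assumes that one control gives total binding (radioligand and receptor with no test compound) and the other gives nonspecific binding (radioligand in the presence of a saturating excess of unlabeled reference agent). The function names and the count values are hypothetical.

```python
def specific_binding(counts, nonspecific_counts):
    """Specific binding: measured counts minus the nonspecific-binding control."""
    return counts - nonspecific_counts

def percent_inhibition(test_counts, total_counts, nonspecific_counts):
    """Percent inhibition of specific binding caused by the test compound.

    total_counts:       control with no test compound (assumed control definition)
    nonspecific_counts: control with excess unlabeled reference agent (assumed)
    """
    control = specific_binding(total_counts, nonspecific_counts)
    if control <= 0:
        raise ValueError("total binding must exceed nonspecific binding")
    remaining = specific_binding(test_counts, nonspecific_counts)
    return 100.0 * (1.0 - remaining / control)

# Invented counts: 10000 total, 1000 nonspecific, 5500 with the test compound.
example = percent_inhibition(5500, 10000, 1000)
```

With these invented counts, half of the specific binding remains in the presence of the test compound, so the function returns 50.0.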
[0046]
FIG. 4 shows a table 200 representing a screening results and assay database, in which the compounds contained in database 300 (comprising compounds 1 to N) are tested for their action on the molecular targets contained in database 400. Table 200 can take many forms. For example, in table 210, the screening result for each of the plurality of compounds tested against each of the plurality of molecular targets can be entered as a "yes" or "no" entry, according to whether the result is above or below a test result threshold selected for each measurement set.
[0047]
In other examples, the screening results are entered in table 220 as numerical descriptors that identify the potency or degree of binding or other effect (e.g., the Ki of the compound-receptor interaction) for each of the plurality of compounds tested against each of the plurality of molecular targets. In a preferred embodiment, all such "compound" x "target" matrix points are determined in tables 210 and 220, so that a comprehensive database is generated. In addition, the screening results and assay database 200 may include measurements of other compound-target interactions, the raw screening data, results derived from the raw data, assay procedures and characteristics, and other relevant information.
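The two table forms can be sketched in Python as dictionaries keyed by (compound, target) pairs. This is an illustrative layout only: the compound and target names, the inhibition and Ki values, and the use of "meets or exceeds" as the threshold comparison are assumptions, and the actual storage format of tables 210 and 220 is not specified in the description.

```python
THRESHOLD_PERCENT = 30.0  # example threshold from the description; other values may be set

COMPOUNDS = ["compound_1", "compound_2"]  # hypothetical entries of database 300
TARGETS = ["receptor_A", "receptor_B"]    # hypothetical entries of database 400

# Invented percent-inhibition values from a single-concentration screen.
PERCENT_INHIBITION = {
    ("compound_1", "receptor_A"): 72.0,
    ("compound_1", "receptor_B"): 8.0,
    ("compound_2", "receptor_A"): 29.0,
    ("compound_2", "receptor_B"): 55.0,
}

# Invented Ki values (molar) for the pairs that were followed up.
KI_MOLAR = {
    ("compound_1", "receptor_A"): 2.5e-8,
    ("compound_2", "receptor_B"): 4.0e-7,
}

def build_table_210(results, threshold):
    """Yes/no table: 'yes' when inhibition meets the threshold (>= is an assumption)."""
    return {pair: ("yes" if value >= threshold else "no")
            for pair, value in results.items()}

def build_table_220(compounds, targets, ki_values):
    """Numeric table: Ki for every compound x target pair, None where not determined."""
    return {(c, t): ki_values.get((c, t)) for c in compounds for t in targets}

def is_complete(table, compounds, targets):
    """True when every compound x target matrix point has an entry."""
    return all((c, t) in table for c in compounds for t in targets)

table_210 = build_table_210(PERCENT_INHIBITION, THRESHOLD_PERCENT)
table_220 = build_table_220(COMPOUNDS, TARGETS, KI_MOLAR)
```

Keeping an explicit entry for every pair, including "no" results and undetermined Ki values, mirrors the stated preference for a dataset in which every matrix point is accounted for.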
[0048]
FIGS. 5A and 5B illustrate the use of the database 100, shown here as a receptor selectivity database, as part of a screening process either to discover and select new compounds as new drug candidates for further development (FIG. 5A), or to find and select new targets as potentially effective targets for use in finding new drug candidates for a specific disease (FIG. 5B). The database 100 may include a compound component 300, a molecular target component 400, biological information components 500 and 600, and a screening results and assay database 200.
[0049]
New compounds or sets of compounds are introduced into the screening process 102 to determine whether they have the ability to inhibit binding between specific compounds (e.g., reference agents) and molecular targets (see FIG. 5A). In the screening process, target information obtained from the molecular target component 400 can be used.
[0050]
The results of the screening process 102 can be stored in an intermediate database or entered into the screening results and assay database 200 of the receptor selectivity database 100. In addition, the result can be stored as a specific parameter (for example, cytotoxicity) in the biological information database 500 or stored in the compound database 300 (for example, as a compound name).
[0051]
The complete set of results from the screening process 102 can be stored in the screening results and assay database 200. New compounds can then be selected for further testing by querying the database 200 for those that show an ability to inhibit binding between a compound (e.g., a reference agent) and a molecular target.
[0052]
Alternatively, new molecular targets, such as "orphan" receptors, can be introduced into the screening process and tested against the compounds in the compound database 300 (see FIG. 5B). An orphan receptor is one whose structure is known but whose function and relation to disease are not. The results of the screening process, including the identification of compounds that interact with the new molecular target, are incorporated into the screening results database 200. The database 100 is then queried to identify the function of the new molecular target and/or to confirm the relevance of the new molecular target to disease.
[0053]
FIG. 6A shows the prediction of the potential of a new compound as a drug using the database 100. Table 710 draws on information from the databases of compounds (300), molecular targets (400), biological information (500 and 600), and screening results (200). When a user provides information about a new compound to the database 100, an automatic query script is executed to retrieve this information, so that table 710 includes information from one or more of these databases (or tables).
[0054]
A query script for generating the table 710 can select a compound from the compound database 300 when given information on a new compound. This selection can be made based on the similarity in chemical structure and other properties between the new compound and the compounds already included in database 300.
[0055]
After selecting the compound, the query script selects from the target database 400 targets that are known to react (bind) with the selected compound. Finally, the biological information databases 500 and 600 are queried using the selected compound and molecular target, and the biological information for the compound-molecular target pair is inserted into the table 710. Alternatively, the biological information contained in the table 710 can be limited to a particular biological information category of interest (e.g., toxicity) by the user entering the category.
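The three query steps described for table 710 (select similar compounds, select their known targets, collect the associated biological information) can be sketched as follows. This is a hypothetical illustration, not the query script of the disclosure: the in-memory dictionaries stand in for databases 300, 400, 500, and 600, Jaccard similarity over invented feature sets stands in for the unspecified structural similarity measure, and all names and records are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (stand-in for structural similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical compound records (database 300): name -> structural features.
COMPOUNDS_300 = {
    "compound_1": {"amine", "phenyl", "ether"},
    "compound_2": {"amine", "phenyl"},
    "compound_3": {"carboxyl", "alkyl"},
}

# Hypothetical known compound-target binding used to pick targets (database 400).
KNOWN_BINDING = {
    ("compound_1", "receptor_A"): True,
    ("compound_1", "receptor_B"): False,
    ("compound_2", "receptor_B"): True,
    ("compound_3", "receptor_A"): True,
}

# Hypothetical biological information records (databases 500 and 600).
BIO_INFO = [
    {"compound": "compound_1", "target": "receptor_A", "category": "toxicity", "note": "placeholder"},
    {"compound": "compound_1", "target": "receptor_A", "category": "side effect", "note": "placeholder"},
    {"compound": "compound_2", "target": "receptor_B", "category": "side effect", "note": "placeholder"},
    {"compound": "compound_3", "target": "receptor_A", "category": "toxicity", "note": "placeholder"},
]

def build_table_710(new_features, min_similarity=0.5, category=None):
    """Return biological-information rows for compounds similar to the new compound."""
    similar = [name for name, feats in COMPOUNDS_300.items()
               if jaccard(new_features, feats) >= min_similarity]
    rows = []
    for compound in similar:
        targets = [t for (c, t), binds in KNOWN_BINDING.items() if c == compound and binds]
        for target in targets:
            for rec in BIO_INFO:
                if (rec["compound"] == compound and rec["target"] == target
                        and (category is None or rec["category"] == category)):
                    rows.append(rec)
    return rows

new_compound = {"amine", "phenyl", "ether", "halogen"}
all_rows = build_table_710(new_compound)
toxicity_rows = build_table_710(new_compound, category="toxicity")
```

Passing a category restricts the rows to that biological information category, which corresponds to the option of limiting table 710 to a category of interest such as toxicity.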
[0056]
The user can query table 710 to obtain information relevant to predicting the potential of a new compound for use as a drug. Examples include queries for molecular targets that are known to react with compounds related to the new compound, and queries for side effects that are known to arise when those molecular targets interact with such compounds.
[0057]
FIG. 6B shows a method for confirming the disease relevance and/or biological function of a novel molecular target, using the data entry and queries shown in FIG. 6B in an approach similar to that used to predict the drug potential of new compounds with database 100.
[0058]
All patents, patent applications and publications mentioned herein are hereby incorporated by reference.
[0059]
The above description of the embodiments of the present invention is for the purpose of illustration and description, is not exhaustive, and does not limit the invention to the forms disclosed herein. Modifications and changes may be made to the invention in light of the above disclosure or upon practicing the invention.
[Brief description of the drawings]
FIG. 1
FIG. 1A shows a compound table of a receptor selectivity mapping database in one embodiment of the present invention.
FIG. 1B shows a snapshot of a compound record including the spatial arrangement of compounds in a receptor selectivity mapping database in one embodiment of the present invention.
FIG. 2
FIG. 2 illustrates several logical tables of a receptor selectivity mapping database that can be used to access molecular target information in one embodiment of the invention.
FIG. 3
FIG. 3 shows a table of biological information in a receptor selectivity mapping database in one embodiment of the present invention.
FIG. 4
FIG. 4 illustrates a method of using a receptor selectivity mapping database as part of a screening process in one embodiment of the present invention.
FIG. 5
FIG. 5A shows how to use the receptor selectivity mapping database as part of the screening process to find and select new compounds that are new drug candidates.
FIG. 5B shows how a receptor selectivity mapping database is used as part of the screening process to identify new targets that are candidate targets for finding new drug candidates for a particular disease.
FIG. 6
FIG. 6A shows the use of a database to predict the potential of a new compound as a drug.
FIG. 6B illustrates the use of a database to identify disease relevance and/or biological function of novel molecular targets.

Claims

A computer system comprising:
a first database comprising records relating to a plurality of compounds and records relating to biological information regarding the effects of the compounds on biological systems;
a second database comprising records relating to a plurality of molecular targets;
a third database comprising records relating to tests of interactions between the compounds of the first database and the molecular targets of the second database, the tests including information on the effect of one of the plurality of compounds on the interaction between one of the plurality of molecular targets and a compound known to interact with that molecular target; and
a user interface that allows a user to browse a selected compound and to selectively browse information from the first database, the second database, and the third database according to its association with the compound records in the first database or the molecular target records in the second database.

The computer system according to claim 1, wherein the interaction includes binding, and the effect includes an inhibitory effect.

The computer system according to claim 1, wherein the compound includes a compound having no known biological activity or a compound that does not pass a test.

The computer system of claim 1, wherein the compound comprises a compound that has been tested in an animal.

The computer system according to claim 1, wherein the compound includes a compound known to act on the environment.

The computer system according to claim 1, wherein the compound includes a pharmacological reference agent.

The computer system of claim 1, wherein the compound includes a drug known in the clinical drug market and having a significant amount of available biological information.

The computer system of claim 1, wherein the compounds include compounds that have been approved for human testing.

The computer system according to claim 1, wherein the compound includes a compound exhibiting biological activity obtained from a natural resource.

The computer system according to claim 1, wherein the molecular target includes a receptor.

The computer system according to claim 1, wherein the molecular target includes an enzyme.

The computer system according to claim 1, wherein the molecular target includes a nucleic acid.

The computer system according to claim 1, wherein the molecular target includes a carbohydrate.

The computer system of claim 1, wherein the first database records associated with a plurality of compounds are organized by categories relating to the description and properties of the compounds.

The computer system of claim 14, wherein the categories include compound name, compound type, physicochemical properties, descriptor of chemical spatial arrangement or chemical structure, and solubility.

The computer system according to claim 1, wherein the first database includes a database of natural substances.

The computer system according to claim 1, wherein the first database includes a database of drugs that did not pass the test.

The computer system according to claim 1, wherein the first database includes a chemical registry database.

The computer system according to claim 1, wherein the second database includes a three-dimensional structure database.

The computer system according to claim 1, wherein the second database includes a sequence/mutation database.

The computer system according to claim 1, wherein the second database includes a gene database.

The computer system according to claim 1, wherein the third database records relating to biological information regarding the effects of the compounds on biological targets are systematically organized by categories including compound name, target name, toxicity, side effects, and mechanism of pharmacological action.

The computer system according to claim 1, further comprising: means for setting an interaction test threshold associated with the effect; and means for selecting the compound if use of the compound results in the interaction test threshold being met.

A method comprising:
selecting compounds from a first database containing records relating to a plurality of compounds;
selecting molecular targets from a second database containing records relating to a plurality of molecular targets;
generating information related to an interaction between each of the selected compounds and each of the selected molecular targets;
selecting a biological activity from a third database containing records relating to biological information regarding the effects of the compounds on biological targets; and
using the generated information to associate a pattern of interaction between a compound and a molecular target with the selected biological activity.

The step of generating the information includes:
generating binding data for the binding between each of the selected compounds and each of the selected molecular targets by monitoring the inhibitory effect of an unknown compound on the binding;
setting a binding test threshold for the inhibitory effect; and
generating information about which combinations of unknown compound, molecular target, and compound meet or do not meet the binding test threshold.

The method of claim 25, wherein the binding data includes positive and negative binding information.