JP2002073626A

JP2002073626A - Pharmacophore analysis method using decision tree

Info

Publication number: JP2002073626A
Application number: JP2000257273A
Authority: JP
Inventors: Kazuto Yamazaki; 一人山崎; Shoji Kaneoka; 昌治金岡
Original assignee: Sumitomo Pharmaceuticals Co Ltd
Current assignee: Sumitomo Pharmaceuticals Co Ltd
Priority date: 2000-08-28
Filing date: 2000-08-28
Publication date: 2002-03-12

Abstract

PROBLEM TO BE SOLVED: To provide a pharmacophore analysis method that greatly improves prediction accuracy in virtual screening. SOLUTION: "N-center pharmacophore components", the minimum units representing the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which represent the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis, are used as explanatory variables in a "decision tree analysis", a nonparametric and nonlinear statistical analysis technique.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a new pharmacophore analysis method for virtual screening, performed on a computer using the large volume of pharmacological activity data obtained from high-throughput screening. More specifically, it relates to a statistical model that identifies the presence or absence of pharmacological activity with high accuracy by performing "decision tree analysis" with two kinds of explanatory variables: "N-center pharmacophore components" and "various structural descriptors", the latter expressing the physicochemical properties of compounds and being in general use in quantitative structure-activity relationship analysis.

[0002]

[Prior Art] Recent advances in computational chemistry have produced virtual screening, in which the presence or absence of pharmacological activity is predicted on a computer for a virtual library of compounds that are readily available or synthesizable. The technique is expected to become useful and indispensable for making drug discovery screening more efficient and more likely to succeed. One conventional virtual screening technique is the docking study against a three-dimensional structural model of a target biological macromolecule such as an enzyme or receptor. Docking achieves relatively high prediction accuracy, but it is computationally expensive, and in practice the three-dimensional structure of the target macromolecule is often unknown.

[0003] When the three-dimensional structure of the biological macromolecule is unknown, virtual screening is performed by building a pharmacophore model from compound structures and known pharmacological activity data and then evaluating how well each compound fits the model. One long-established pharmacophore analysis technique analyzes pharmacological activity quantitatively by multiple regression, using structural descriptors that represent the physicochemical properties of compounds as explanatory variables. This technique is useful for correlating local substructures with pharmacological activity, but it is difficult to apply to global correlation analysis. Another known technique analyzes the spatial arrangement of the factors contributing to intermolecular binding that are shared by several compounds having the same pharmacological action. A model obtained from this analysis can pick out compounds with a specific pharmacological activity from a large number of compounds. In contrast to quantitative structure-activity relationship analysis, however, this technique is not well suited to correlating local substructures with pharmacological activity.

[0004] Three-dimensional quantitative structure-activity relationship analysis, which combines the two techniques above in stages, compensates for the weaknesses of each and correlates both the global and the local structure of a compound with its pharmacological activity. However, because it evaluates the commonality of the spatial arrangement of binding factors and the physicochemical properties separately, it cannot be said to achieve sufficient prediction accuracy. Its pharmacophore representation is also limited, so it is inadequate for correlating structurally diverse sets of compounds, such as the results of high-throughput screening, with pharmacological activity. In addition, because of the heavy computation and cumbersome operation involved, the amount of data it can handle is limited to a few to a few tens of compounds. In summary, the main analysis techniques known to date, including the quantitative structure-activity relationship method and the superposition analysis method, have achieved some success but do not fully satisfy the accuracy required for virtual screening. Several issues remain to be improved, such as prediction accuracy, the expressive power of the model, computation speed, and the efficiency of data management, and none of these techniques can be called a satisfactory analysis method.

[0005]

[Problem to be Solved by the Invention] An object of the present invention is to provide a pharmacophore analysis method that achieves a large improvement in prediction accuracy in virtual screening. A further object is a method that shows high predictive ability across a wide range of target pharmacological activities and that allows fast and simple data management.

[0006]

[Means for Solving the Problem] The present inventors studied pharmacophore analysis techniques intensively, since these play the most important role in the success or failure of virtual screening. They found that "decision tree analysis", a nonlinear and nonparametric statistical analysis technique, makes possible a pharmacophore analysis that encompasses both the quantitative structure-activity relationship method and the superposition analysis method. Specifically, they found that a statistical model able to identify the presence or absence of pharmacological activity with high accuracy is obtained by performing "decision tree analysis" with two kinds of explanatory variables: "N-center pharmacophore components", which express the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in the quantitative structure-activity relationship method. The present invention was completed on the basis of this finding.

[0007] That is, the gist of the present invention is as follows.

(1) A method for selecting lead candidate compounds having pharmacological activity, at high speed and with high prediction accuracy, from a database containing structural information on compounds, characterized in that pharmacophore analysis is carried out by performing "decision tree analysis", a nonparametric and nonlinear statistical analysis technique, using as explanatory variables "N-center pharmacophore components", which express the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis.

(2) The method according to (1) above, wherein the CART (Classification and Regression Tree) method is adopted as the decision tree analysis technique.

(3) The method according to (1) above, wherein "two-center pharmacophore components", the minimum units expressing the three-dimensional spatial arrangement of binding factors, are adopted as the "N-center pharmacophore components".

(4) The method according to (1) above, wherein the "structural descriptors" adopted are descriptors reflecting the entropic contribution to intermolecular binding, such as the octanol-water partition coefficient, the free energy change accompanying a change in hydration state, and the conformational freedom of the ligand, or descriptors reflecting shape and size, such as molecular weight, volume, and surface area.

(5) A method for automatically creating and updating structure-activity relationship rules, using a computer, from a database containing structural information and pharmacological activity information on compounds, characterized in that pharmacophore analysis is carried out by performing "decision tree analysis", a nonparametric and nonlinear statistical analysis technique, using as explanatory variables "N-center pharmacophore components", which express the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis.

(6) A database of compound structure-activity relationship rules obtained by the method for creating structure-activity relationship rules according to (5) above.

(7) A method for searching, using a computer, for lead candidate compounds having a specific pharmacological activity by registering the structural formula and pharmacological activity data of specific compounds in the database of structure-activity relationship rules according to (5) above.

(8) A method for searching, using a computer, for lead candidate compounds having a specific pharmacological activity by supplying the database of structure-activity relationship rules according to (5) above with either an SQL search expression running on a computer or specific pharmacological activity information on a compound.

[0008]

[Embodiments of the Invention] The "decision tree analysis" used in the present invention is a nonlinear and nonparametric statistical analysis technique that partitions records optimally according to the class of the objective variable and represents the resulting statistical model as a tree. Because the computation is fast, decision tree analysis can handle large numbers of records and explanatory variables. Many variants have been proposed, such as "CART", "C5.0 (ID3)", and "CHAID", which differ in the splitting method used while the tree is grown and in the pruning method used to ensure robustness (Atsushi Otaki, Yuji Horie, and Dan Steinberg, Nikkagiren, 1998).

[0009] The "CART (Classification and Regression Tree) method" used in the present invention is the method described in the reference above (Atsushi Otaki, Yuji Horie, and Dan Steinberg, Nikkagiren, 1998). It restricts tree growth to binary splits and prunes aggressively, and is a decision tree technique that emphasizes the robustness of the statistical model. CART analysis is carried out either with a purpose-written program or with commercially available application software such as "Answer Tree 2.1J" (SPSS), the decision tree module of the statistical analysis package "SPSS 10.0J".
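The binary splitting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited reference or of Answer Tree: it assumes the Gini impurity as the split criterion (a common choice for CART-style classification trees; the text does not name one), it omits pruning entirely, and the toy feature columns and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_binary_split(rows, labels):
    """Search every feature for the threshold t that minimizes the weighted
    Gini impurity of the two children defined by x[f] <= t and x[f] > t.
    Returns (feature_index, threshold, weighted_impurity)."""
    n = len(rows)
    best = (None, None, gini(labels))  # no split beats the parent yet
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy data: column 0 is a presence flag (1/0) for one two-center
# pharmacophore component, column 1 is an octanol-water partition coefficient.
rows = [(1, 2.1), (1, 3.0), (0, 2.5), (0, 0.4), (1, 1.8), (0, 3.2)]
labels = ["active", "active", "inactive", "inactive", "active", "inactive"]

split = best_binary_split(rows, labels)
print(split)
```

In this toy set the presence flag separates the two classes completely, so the search settles on column 0. A full tree would apply the same search recursively to each child and then prune the result.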

[0010] "Pharmacophore analysis" in the present invention means clarifying the mechanism by which a specific pharmacological activity is expressed. The pharmacophore knowledge obtained from the analysis can be used to predict, for compounds whose pharmacological activity is unknown, whether they have that activity.

[0011] The "N-center pharmacophore components" of the present invention represent the relative arrangement in three-dimensional space of the binding factors that play an important role in the binding between a macromolecule and a ligand, and are the same as the generally known three-dimensional molecular descriptors (Erin K. Bradley et al., J. Med. Chem. 2000, 43, 2770-2774). Binding factors include hydrogen bond acceptors, hydrogen bond donors, hydrophobic centers, and the like. The relative arrangement is expressed as arbitrarily defined discrete values. An "N-center pharmacophore component" therefore consists of the types of N binding factors and the discretized values of their relative distances. Likewise, a "two-center pharmacophore component" is the minimum unit of an "N-center pharmacophore component" and consists of the types of two binding factors and the discretized value of the distance between them. Using hydrogen bond acceptors, hydrogen bond donors, hydrophobic centers, aromatic ring centers, positively chargeable groups, and negatively chargeable groups as the binding factors, and defining eight discrete values for the relative distance, 180 types of "two-center pharmacophore components" are defined.
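As an illustration of how two-center pharmacophore components could be derived from conformer coordinates, the following Python sketch bins the distance between every pair of binding factors. The short feature labels, the bin edges in angstroms, and the example coordinates are all hypothetical; the text specifies six factor types and eight distance bins but not the bin boundaries, and commercial tools such as Catalyst use their own definitions.

```python
from itertools import combinations
from math import dist

# The six binding-factor types named in the text; the labels are hypothetical.
FEATURE_TYPES = ("HBA", "HBD", "HYD", "ARO", "POS", "NEG")

# Hypothetical upper bin edges in angstroms. Seven edges give eight bins,
# the last of which is open-ended.
BIN_EDGES = (2.0, 3.5, 5.0, 6.5, 8.0, 10.0, 13.0)


def distance_bin(d):
    """Map a distance to one of eight discrete bin indices (0 to 7)."""
    for i, edge in enumerate(BIN_EDGES):
        if d < edge:
            return i
    return len(BIN_EDGES)


def two_center_components(features):
    """features: list of (type, (x, y, z)) tuples for a single conformer.
    Returns the set of (type_a, type_b, bin) keys that occur, with the two
    types sorted so that the key does not depend on pair order."""
    keys = set()
    for (ta, pa), (tb, pb) in combinations(features, 2):
        a, b = sorted((ta, tb))
        keys.add((a, b, distance_bin(dist(pa, pb))))
    return keys


# Hypothetical conformer with three binding factors.
conformer = [
    ("HBA", (0.0, 0.0, 0.0)),
    ("HBD", (3.0, 0.0, 0.0)),
    ("ARO", (0.0, 6.0, 0.0)),
]

components = two_center_components(conformer)
print(sorted(components))
```

Each key can then serve as one binary explanatory variable (present or absent) for a compound. Taking the union of the keys over all of a compound's conformers is one plausible way to combine conformers, but that choice is an assumption here.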

[0012] The "structural descriptors" of the present invention are those in general use in quantitative structure-activity relationship methods. Examples include the water-octanol partition coefficient and the number of freely rotatable bonds, which reflect the thermodynamic entropy change accompanying ligand binding, and the molecular weight and molar refractivity, which represent the shape and size of the molecule.

[0013] To prepare the data for the present invention, stable and metastable conformations are first generated from the structural formula of each compound. Conformations are generated either with a purpose-written program based on generally known methods or with commercially available application software such as Catalyst ver. 4.5 (MSI). The "two-center pharmacophore component" data are easily derived from the coordinate data of the conformations, again using either a purpose-written program or commercially available application software such as Catalyst ver. 4.5 (MSI). The "structural descriptors" are easily derived from the structural formula of the compound or from the coordinate data of its conformations, using either a purpose-written program or commercially available application software such as Cerius2 ver. 4.0 (MSI). Because these data are shared across analyses of different pharmacological activities, they are held inside the compound database or linked to it, so they never need to be recalculated.

[0014] The settings used for the CART method are shown in Table 1.

[0015]

[Table 1]

These settings are considered suitable for a wide variety of pharmacophore analyses and generally applicable, and they were used in all of the examples below. The procedure of the decision-tree-based pharmacophore analysis method of the present invention is shown in the flowchart of FIG. 1.

[0016]

[Examples] The present invention is described concretely below by way of examples, but it is in no way limited to these examples.

Example 1: Analysis and validation of the pharmacological activity of 5-HT1a antagonists

From the 429 5-HT1a antagonist compounds registered in the commercial drug database MDDR (MDL), 100 compounds were selected at random; 75 were used for analysis and the remaining 25 for validation. As compounds inactive as 5-HT1a antagonists, 10,000 compounds were, for convenience, selected at random from commercially available screening compounds; 7,500 were used for analysis and the remaining 2,500 for validation. The "prior probability parameter" was set to be equal for the active and inactive classes. The "cost parameter" was not used. The CART analysis gave the results in the table below. The prediction rate for the active group was high and the misclassification rate for the inactive group was low, so the presence or absence of 5-HT1a antagonist activity was identified with very high accuracy. In addition, the prediction and misclassification rates differed very little between the analysis data and the validation data, indicating that a highly robust statistical model was obtained. The results are shown in Table 2.
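The effect of setting equal priors on such an imbalanced analysis set can be illustrated with a short calculation. The Python sketch below assumes that equal priors are realized as per-record weights, so that each class carries the same total weight; this is one common way to obtain equal priors, and the text does not say how the software implements the parameter. The class sizes are the analysis-set sizes given in Example 1.

```python
def equal_prior_weights(class_counts):
    """Per-record weight for each class such that every class carries the
    same total weight (1 / number_of_classes)."""
    k = len(class_counts)
    return {cls: 1.0 / (k * n) for cls, n in class_counts.items()}


# Analysis-set sizes from Example 1: 75 actives, 7,500 inactives.
counts = {"active": 75, "inactive": 7500}
weights = equal_prior_weights(counts)

# How many inactive records one active record counts for.
ratio = weights["active"] / weights["inactive"]
print(weights, ratio)
```

Under this assumption each active record counts about as much as 100 inactive records, which keeps the tree from reaching a low overall error simply by labelling every compound inactive.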

[0017]

[Table 2]

[0018] Example 2: Analysis and validation of the pharmacological activity of SSRIs (5-HT reuptake inhibitors)

From the 393 SSRI compounds registered in the commercial drug database MDDR (MDL), 100 compounds were selected at random; 75 were used for analysis and the remaining 25 for validation. As SSRI-inactive compounds, 10,000 compounds were, for convenience, selected at random from commercially available screening compounds; 7,500 were used for analysis and the remaining 2,500 for validation. The "prior probability parameter" was set to be equal for the active and inactive classes. The "cost parameter" was not used. The CART analysis gave the results in the table below. As in Example 1, the prediction rate for the active group was high and the misclassification rate for the inactive group was low, so the presence or absence of SSRI activity was identified with very high accuracy. In addition, the prediction and misclassification rates differed very little between the analysis data and the validation data, indicating that a highly robust statistical model was obtained. The results are shown in Table 3.

[0019]

[Table 3]

[0020] Example 3: Inhibitory activity of COX-1 (cyclooxygenase-1) inhibitors and its validation

From the 688 COX-1 inhibitor compounds registered in the commercial drug database MDDR (MDL), 100 compounds were selected at random; 75 were used for analysis and the remaining 25 for validation. As compounds inactive as COX-1 inhibitors, 10,000 compounds were, for convenience, selected at random from commercially available screening compounds; 7,500 were used for analysis and the remaining 2,500 for validation. The "prior probability parameter" was set to be equal for the active and inactive classes. The "cost parameter" was not used. The CART analysis gave the results in the table below. As in Examples 1 and 2, the prediction rate for the active group was high and the misclassification rate for the inactive group was low, so the presence or absence of COX-1 inhibitory activity was identified with very high accuracy. The results are shown in Table 4.

[0021]

[Table 4]

[0022]

[Effects of the Invention] The present invention shows that pharmacophore analysis can be carried out at high speed and with high prediction accuracy by performing "decision tree analysis", a nonparametric and nonlinear statistical analysis technique, using as explanatory variables "N-center pharmacophore components", the minimum units expressing the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis. Specifically, a statistical analysis that adopts the CART (Classification and Regression Tree) method, which among decision tree techniques has an established reputation for emphasizing the robustness of the statistical model, and that uses "two-center pharmacophore components" and a limited set of "structural descriptors" as explanatory variables was shown to predict pharmacological activity with high accuracy. The analysis method of the present invention thus achieves a large improvement in prediction accuracy in virtual screening. Improved accuracy can be achieved not only in predicting pharmacological activity but also in predicting physical properties, pharmacokinetics, toxicity, and the like. The method can also be used as a knowledge discovery tool and can provide reliable insight into the direction of research. Furthermore, while maintaining prediction accuracy equal to or higher than that of the prior art, the method of the present invention can:

- achieve general applicability across pharmacological actions;
- speed up descriptor calculation and statistical analysis;
- reduce the volume of data that has to be retained.

These features of the method of the present invention can be summarized as follows.

- A pharmacophore statistical analysis model can be obtained from the large volume of pharmacological activity data obtained from high-throughput screening.
- Virtual screening can be performed to find compounds that exhibit a specific pharmacological activity.
- The above can be offered as services using this method.
- A "compound database system" can be built in which non-statisticians such as synthetic chemists can search on the presence or absence of a specific pharmacological activity merely by carrying out ordinary registration work, such as entering structural formulas and pharmacological activity values of compounds.
- The above system can be built to run at high speed both in a client-server configuration and as a standalone application on a personal computer.
- The above system can be built into existing compound database systems such as ISIS (MDL) and Accord (MSI).
- Compounds available for supply can be registered in the above system and published on the WWW, giving general users an environment in which they can search on the presence or absence of a specific pharmacological activity and order hit compounds directly. In addition, a search can be performed by supplying either an SQL search expression produced by the above system running on the general user's own computer or the pharmacological activity information of compounds registered in a published database.
- By registering compound data and pharmacological activity data accumulated in-house in the above system, a database of structure-activity relationship rules can be created and updated automatically. This rule database can be applied to the search for new drug targets, side-effect prediction, mechanism-of-action analysis, and the like.

[Brief Description of the Drawings]

[FIG. 1] FIG. 1 is a flowchart of the analysis method of the present invention. Structural formulas or pharmacological activity data are entered as needed into a database containing the structural formulas of compounds, pharmacological activity data, and the explanatory variables used in the analysis (S2). When a structural formula is entered, feasible conformations are generated automatically (S4), and the explanatory variables required for the analysis are calculated (S6) and stored in the database. Within the database, a decision tree analysis is performed for each pharmacological activity item, and the results are stored in SQL form (S10). This computation is run periodically or when manually triggered, and the data are updated. When the user specifies a pharmacological activity item, lead candidate compounds are selected from the compounds registered in the database on the basis of the corresponding SQL (S12).
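Storing a decision tree result "in SQL form" (S10) and using it to select candidates (S12) can be sketched as follows. This is an illustrative Python example using the standard `sqlite3` module, not the system described in the patent: the table name, column names, leaf conditions, and compound records are all hypothetical, and building a parameterized query is a design choice made here.

```python
import sqlite3


def path_to_sql(table, conditions):
    """Turn the root-to-leaf conditions of one 'active' leaf into a SELECT.
    conditions: list of (column, operator, value) triples.
    Returns (sql_text, parameter_list)."""
    allowed_ops = {"<=", ">", "="}
    if not table.isidentifier():
        raise ValueError("unsupported table name")
    parts = []
    for column, op, _ in conditions:
        if op not in allowed_ops or not column.isidentifier():
            raise ValueError("unsupported condition")
        parts.append(f"{column} {op} ?")
    sql = f"SELECT compound_id FROM {table} WHERE " + " AND ".join(parts)
    return sql, [value for _, _, value in conditions]


# Hypothetical schema: a presence flag for one two-center pharmacophore
# component and one structural descriptor per compound.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compounds (compound_id TEXT, hba_hbd_bin1 INTEGER, logp REAL)"
)
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?, ?)",
    [("C001", 1, 2.4), ("C002", 0, 2.9), ("C003", 1, 5.1)],
)

# Hypothetical leaf: the component is present AND logP <= 4.0.
sql, params = path_to_sql(
    "compounds", [("hba_hbd_bin1", "=", 1), ("logp", "<=", 4.0)]
)
hits = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(hits)
```

Because the explanatory variables are already stored with each compound, selecting candidates for a given activity reduces to running the stored query, with no descriptor recalculation.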

Continued from the front page: (51) Int. Cl.7 G06F 17/50, identification symbol 638; FI G06F 17/50 638

Claims

1. A method for selecting lead candidate compounds having pharmacological activity, at high speed and with high prediction accuracy, from a database containing structural information on compounds, characterized in that pharmacophore analysis is carried out by performing "decision tree analysis", a nonparametric and nonlinear statistical analysis technique, using as explanatory variables "N-center pharmacophore components", which express the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis.

2. The method according to claim 1, wherein the CART (Classification and Regression Tree) method is adopted as the decision tree analysis technique.

3. The method according to claim 1, wherein "two-center pharmacophore components", the minimum units expressing the three-dimensional spatial arrangement of binding factors, are adopted as the "N-center pharmacophore components".

4. The method according to claim 1, wherein the "structural descriptors" adopted are descriptors reflecting the entropic contribution to intermolecular binding, such as the octanol-water partition coefficient, the free energy change accompanying a change in hydration state, and the conformational freedom of the ligand, or descriptors reflecting shape and size, such as molecular weight, volume, and surface area.

5. A method for automatically creating and updating structure-activity relationship rules, using a computer, from a database containing structural information and pharmacological activity information on compounds, characterized in that pharmacophore analysis is carried out by performing "decision tree analysis", a nonparametric and nonlinear statistical analysis technique, using as explanatory variables "N-center pharmacophore components", which express the three-dimensional spatial arrangement of binding factors, and "structural descriptors", which express the physicochemical properties of compounds and are in general use in quantitative structure-activity relationship analysis.

6. A database of compound structure-activity relationship rules obtained by the method for creating structure-activity relationship rules according to claim 5.

7. A method for searching, using a computer, for a compound having a specific pharmacological activity by registering the structural formula and pharmacological activity data of a specific compound in the database of structure-activity relationship rules according to claim 5.

8. A method for searching, using a computer, for a lead candidate compound having a specific pharmacological activity by supplying the database of structure-activity relationship rules according to claim 5 with either an SQL search expression running on a computer or specific pharmacological activity information on a compound.