WO2007004546A1 - Method for quantitatively predicting physiological activity of compound - Google Patents
Method for quantitatively predicting physiological activity of compound
- Publication number
- WO2007004546A1 (application PCT/JP2006/313076)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- descriptor
- physiological activity
- index
- partial structure
- activity
- Prior art date
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
Definitions
- the present invention relates to a method for quantitatively predicting the physiological activity of compounds, useful for investigating and designing structures of beneficial compounds, such as physiologically active substances including pharmaceuticals and agrochemicals, and for avoiding harmful compounds, such as those with toxicity or environmental impact.
- Orbit Merged Markush Service MMS
- MMS Orbit Merged Markush Service
- SAR structure-activity relationship
- QPR structure-property relationship
- the literature information relating to the compound is searched using a database in which the existing compound is registered.
- values estimated from compound structures are limited to the structure-activity relationship (SAR) and structure-property relationship (QPR) methods that the system provides, and a search means for predicting and estimating the physical properties and physiological activities required by users of the search system has not been realized.
- the object of the present invention is to provide a method for quantitatively predicting and estimating, from a compound database system in which the structures and general-formula structures of existing compounds are registered, the physical properties and physiological activities of compounds for which neither measured nor estimated values are registered in the database.
- the problem is solved by a method for quantitatively predicting physiological activity from a database in which compound structures or general-formula structures are registered, characterized in that it comprises a step of assembling a search formula.
- FIG. 1 is a flowchart of a method of the present invention.
- Quantitative structure activity (physical property) correlation analysis is used as a method for quantitatively predicting and estimating physical properties and physiological activities.
- the analysis is performed and its results are used in a compound database system.
- the key steps are converting the partial structure index that the search system uses for compound registration into descriptors, and converting the descriptors back into a partial structure index to form a search formula.
- training set: a group of compounds whose chemical structures and physiological activities (physical properties) have been measured (hereinafter the "training set") is prepared.
- the structural characteristic components are totalized and digitized (DESC step).
- Structural characteristic components have hierarchical characteristics and numerical designations.
- aggregation includes adding the aggregation values set for each chemical fragmentation code in the conversion table, truncating at an upper limit, selecting the maximum, mean, or minimum value, adding after an operation such as square root, logarithm, or exponentiation, and applying such operations to the summed value.
- the upper code is halogen atom CO
- the lower code is the code of each halogen atom type.
- the code-descriptor conversion table can be set as shown in Table 1 below, for example.
- the descriptor HAL gives the number of kinds of halogen atoms in the molecule, and HALW can be used as a descriptor giving the contribution of the halogen atoms in the molecule to the molecular weight.
- HETE3-4 is a descriptor indicating the number of 3- to 4-membered rings containing one heteroatom.
- the conversion table can be set as shown in Table 3 below, and F4 is a descriptor that changes according to the size of the heterocycle containing nitrogen.
- the conversion table can be created according to each component and used in the DESC step.
- the ring system contains various components such as single ring or condensed ring, number of rings, aromatic property of the ring, hetero ring or carbo ring, and number of heteroatoms.
- if only the presence or absence of each code is used, every descriptor becomes a dummy variable of 0 or 1, so the contribution of a descriptor that depends on a numerical value such as the number of substituents, and the effects of structural components grouped under a higher-level concept, cannot be analyzed.
- in the DESC step, the structural components included in the partial structure indexing are extracted as numerical information such as the number of substituents and, for ring structures, as hierarchical structural information such as the type of heteroatom and the state of condensation.
- QSAR step: a quantitative structure-activity (property) relationship analysis of the physiological activity is performed.
- descriptors and biological activity (physical properties) values of each compound in the training set can be correlated by multiple regression methods, PLS methods, discriminant analysis methods, neural networks, and other methods.
- physiological activity (physical properties) which is an objective variable
- descriptor weights (coefficients) which are explanatory variables, as shown in the following equation.
- programs for multiple regression are described in detail in Toshiro Haga and Shigeji Hashimoto, "Regression Analysis and Principal Component Analysis", Statistical Analysis Program Course, Vol. 2, JUSE Press.
- descriptors that are highly correlated among descriptors created by the DESC step must be excluded from the descriptors that make up the model expression.
- the model formula is constructed by selecting the descriptor to be used in the model formula (such as the variable increase / decrease method).
- as a rule of thumb, the number of descriptors used in quantitative structure-activity (property) relationship analysis is 1/5 to 1/10 of the number of training set compounds. Once the descriptors have been created in the DESC step, the QSAR step can build the model equation by standard methods.
- a search formula that quantitatively predicts physiological activity (physical property) is assembled from the descriptor contributions (sign and absolute value of the coefficients) (QUERY step).
- the model formula descriptors are arranged in the order of the sign and value of the coefficients, and the descriptors are converted into substructure indexes using the conversion table used in the DESC step, as shown in Table 5 below. Since the possible values of the descriptor are determined depending on the setting of the partial structure index, an estimated value corresponding to the search condition is obtained based on the model formula. If the search user sets a threshold for the physiological activity (physical properties) to be searched, it is possible to set a search condition for a partial structure index that searches for compounds that are above or below the threshold.
- S indicates use as a search condition that should hit; Not indicates use in the search formula as a NOT condition (must not hit).
- Pharmacokinetics plays an important role in the creation and development of pharmaceuticals. Transporters, which carry drugs, are attracting attention as in vivo molecules that affect pharmacokinetics, and knowing the substrate specificity of drug transporters is important for creating drugs with excellent pharmacokinetics. Thirty-six structurally diverse compounds were selected from commercially available drugs as training set compounds, and the substrate specificity of P-glycoprotein was analyzed by the ATPase screening method.
- Chemical fragmentation codes and CPI manual codes are indexed for structure searching of WPI, the international patent database produced by Thomson Derwent, and can be used in commercial database systems such as Derwent Innovations Index, DIALOG, STN, and Questel.Orbit.
- the indexing rules for chemical fragmentation codes and CPI manual codes are published on the website as mentioned above.
- a compound library of already synthesized, structurally diverse compounds is used. Sixty structurally diverse compounds were selected from a commercially available compound library as training set compounds, and the substrate specificity of P-glycoprotein was analyzed by the ATPase screening method. The descriptors used in the analysis were based on hierarchically arranged chemical fragmentation codes, with a conversion table that aggregates the lower-level chemical fragmentation codes under each upper-level structural component. The assignment of chemical fragmentation codes and the generation of descriptors were carried out as in Example 1. The hierarchically aggregated descriptors were created, and after excluding cases where highly correlated descriptors would be used simultaneously, 159 candidate descriptors were analyzed.
- a program was created to convert a model formula descriptor into a chemical fragmentation code and to obtain search conditions above a threshold.
- P-glycoprotein substrate properties Relative activity with respect to verapamil More than 110%
- Gleevec, one of the compounds in the set retrieved by this search formula, was confirmed to show high P-glycoprotein substrate activity according to a report that is not recorded in the database.
Landscapes
- Chemical & Material Sciences (AREA)
- Crystallography & Structural Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This invention provides a method for quantitatively predicting and estimating the properties and physiological activity of compounds not registered as measured values and estimated values with a database in compound database system with which the structures and general formula structures of existing compounds have been registered. This method is a method for quantitatively predicting physiological activity from database with which the structures and general formula structures of compounds have been registered and is characterized by comprising the steps of imparting a partial structure index used in a search system to compounds of which the physiological activity has been measured, bringing the partial structure indexes to descriptors for totalizing and quantifying each structure property component, analyzing, using the descriptors, quantitative structure activity correlation of compounds of which the physiological activity has been measured, and composing a search formula for obtaining search results on the quantitative prediction of physiological activity from the results of contribution of the descriptor to the physiological activity determined by the analysis of the quantitative structure activity correlation.
Description
Specification
Method for quantitatively predicting the physiological activity of compounds
Technical field
[0001] The present invention relates to a method for quantitatively predicting the physiological activity of compounds. The method is useful for investigating and designing structures of beneficial compounds, such as physiologically active substances including pharmaceuticals and agrochemicals, and for avoiding harmful compounds, such as those with toxicity or environmental impact.
Background art
[0002] Compounds with useful properties, such as pharmaceuticals and agrochemicals, have been classified and searched by assigning (indexing) systematic compound names (IUPAC nomenclature and the like), partial-structure keywords, or codes classified systematically by partial structure (chemical fragmentation codes, CPI manual codes, and the like). Practice then moved to registering this indexing in database systems that support text search (DIALOG, STN, and the like). In addition to these text databases, systems are now in use in which compound structures and general formulas are registered as chemical bond graphs (connection tables) and which can be searched by graphically specifying a partial structure, a structural formula that must match exactly, or the range of structures expressed by a general formula (STN CAS registry file, MARPAT, Questel.Orbit Merged Markush Service (MMS), and the like). Compound databases provide, in addition to the structure of a compound, information such as measured values of physical properties and physiological activities and the literature in which the compound is described. In recent years, structure-activity relationship (SAR) and structure-property relationship (QPR) techniques, which predict and estimate physical properties and physiological activities from the structure of a compound, have also come into use, and estimated values are now registered alongside measured values.
[0003] When seeking a compound with useful properties, literature information on compounds is searched using a database in which existing compounds are registered. However, measured values of the desired physical properties and physiological activities are not registered for all existing compounds, and values estimated from compound structures are limited to the structure-activity relationship (SAR) and structure-property relationship (QPR) methods that the system provides. A search means that predicts and estimates the physical properties and physiological activities that users of the search system actually require has not been realized.
Disclosure of the invention
Problems to be solved by the invention
[0004] Accordingly, an object of the present invention is to provide a method for quantitatively predicting and estimating, from a compound database system in which the structures and general-formula structures of existing compounds are registered, the physical properties and physiological activities of compounds for which neither measured nor estimated values are registered in that database.
Means for solving the problems
[0005] The present invention solves the above problem by a method for quantitatively predicting physiological activity from a database in which compound structures and general-formula structures are registered, the method comprising: a step of assigning the partial structure index used by the search system to compounds whose physiological activity has been measured; a step of converting the partial structure index into descriptors by aggregating and quantifying it for each structural characteristic component; a step of analyzing, using the descriptors, the quantitative structure-activity relationship of the compounds whose physiological activity has been measured; and a step of assembling a search formula for obtaining search results in which physiological activity is quantitatively predicted from the contributions of the descriptors to the physiological activity determined by the quantitative structure-activity relationship analysis.
Effect of the invention
[0006] According to the present invention, a database in which existing compounds are registered can be used to obtain search results that quantitatively predict and estimate the physical properties and physiological activities of compounds for which neither measured nor estimated values are registered in the database, making it possible to create beneficial compounds.
Brief description of the drawings
[0007] [FIG. 1] Flowchart of the method of the present invention.
Best mode for carrying out the invention
[0008] Quantitative structure-activity (property) relationship analysis is used to quantitatively predict and estimate physical properties, physiological activities, and the like. To carry out this analysis and use its results in a compound database system, the present invention performs by computer, as its key steps, a step of converting the partial structure index that the search system uses for compound registration into descriptors, and a step of converting the descriptors back into the partial structure index to form a search formula.
[0009] The method of the present invention is described below with reference to FIG. 1, which shows its flow.
[0010] First, a group of compounds whose chemical structures and physiological activities (physical properties) have been measured (hereinafter the "training set") is prepared.
Next, a partial structure index is assigned to (indexed for) each compound in the training set using what the search system uses: systematic compound names (IUPAC nomenclature and the like), partial-structure keywords, or codes classified systematically by partial structure (chemical fragmentation codes, CPI manual codes, and the like) (INDEX step). The assignment can be carried out by following published indexing guides. The indexing rules for chemical fragmentation codes and CPI manual codes are published on the following websites of Thomson Derwent and Thomson Scientific.
http://thomsonderwent.com/media/support/userguides/chemindguide.pdf
http://www.thomsonscientific.vo/support/code/mc/cpi/index.shtml
For systematic compound nomenclature, the rules of the International Union of Pure and Applied Chemistry (IUPAC) define a scheme in which the structure and composition of a compound can be understood from its name, and many explanatory books are available (for example, "Compound Nomenclature" by Nakahara and Inamoto, Shokabo New Chemistry Series).
Software that performs the above indexing automatically when a compound structure is entered graphically can also be used. For chemical fragmentation codes, the commercially available software Markush TopFrag can be used (http://thomsonscientific.jp/products/mtf/index.shtml); for nomenclature, ChemDraw Ultra from CambridgeSoft can be used.
[0011] Next, to turn the partial structure index into descriptors, it is aggregated and quantified for each structural characteristic component (DESC step). Structural characteristic components involve hierarchical characteristics and numerical designations, and for each of them a conversion table can be set that maps partial structure index entries to the descriptor items under which they are aggregated. In the present invention, aggregation includes adding the aggregation values set for each chemical fragmentation code in the conversion table, truncating at an upper limit, selecting the maximum, mean, or minimum value, adding after an operation such as square root, logarithm, or exponentiation, and applying such operations to the summed value.
As a hierarchical example using chemical fragmentation codes, halogen atoms have the upper-level code CO, below which are the codes for the individual halogen atom species.
[0012] A code-to-descriptor conversion table can be set, for example, as in Table 1 below.
[0013] [Table 1]
The descriptor HAL can be used as a descriptor giving the number of kinds of halogen atoms in the molecule, and HALW as a descriptor giving the contribution of the halogen atoms in the molecule to the molecular weight.
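The contents of Table 1 did not survive extraction, so the following is only a minimal sketch of how the two halogen descriptors could be aggregated from a compound's index codes. The code names (`HAL-F` and so on) are hypothetical placeholders rather than actual chemical fragmentation codes, and HALW is assumed here to be the sum of the standard atomic weights of the halogen kinds present.

```python
# Hypothetical conversion table: placeholder code names mapped to the
# standard atomic weight of the halogen they stand for.
HALOGEN_TABLE = {
    "HAL-F": 18.998,
    "HAL-Cl": 35.45,
    "HAL-Br": 79.904,
    "HAL-I": 126.904,
}


def halogen_descriptors(codes):
    """Aggregate a compound's index codes into the HAL and HALW descriptors."""
    kinds = {c for c in codes if c in HALOGEN_TABLE}
    return {
        "HAL": len(kinds),                             # number of halogen kinds
        "HALW": sum(HALOGEN_TABLE[c] for c in kinds),  # assumed weight contribution
    }


# Example: a compound indexed with fluorine and chlorine codes plus one
# unrelated code, which the halogen table simply ignores.
example = halogen_descriptors(["HAL-F", "HAL-Cl", "M412"])
```

Because index codes record presence rather than atom counts, a weight contribution computed this way reflects the kinds of halogen present, not how many atoms of each occur.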
[0014] For the structural components of rings, a conversion table that gives priority to ring size is set as in Table 2 below.
[0015] [Table 2]
Descriptor    Codes to aggregate        Aggregation value
HETE3-4       F100, F200, F400, F410    1
HETE3-4 is a descriptor giving the number of 3- to 4-membered rings that contain one heteroatom.
[0016] If heteroatoms are given priority instead, the conversion table can be set as in Table 3 below, and F4 becomes a descriptor that varies with the size of the nitrogen-containing heterocycle.
[0017] [Table 3]
[0018] In this way, when an indexed partial structure has several structural components, a conversion table can be created for each component and used in the DESC step. Ring systems contain many components, for example single or condensed ring, ring size, aromaticity of the ring, heterocycle or carbocycle, and the kinds and number of heteroatoms.
[0019] For chemical fragmentation codes that carry a numerical designation, such as the number of a particular substituent, a conversion table like Table 4 below can be created, which specifies the value used to count the substituents.
[0020] [Table 4]
Descriptor    Code to aggregate       Aggregation value
-OH           H401 (one)              1
              H402 (two)              2
              H403 (three)            3
              H404 (four)             4
              H405 (five or more)     5
[0021] If only the presence or absence of each chemical fragmentation code were used, every descriptor would be a dummy variable taking the value 0 or 1, so neither the contribution of a descriptor that depends on a numerical value such as the number of substituents nor the effect of structural components grouped under a higher-level concept could be analyzed. In the DESC step, as described above, the structural components contained in the partial structure indexing are extracted and aggregated into descriptors: numerical information such as the number of substituents and, for ring structures, hierarchical structural information such as the type of heteroatom and the state of condensation. This makes it possible to analyze contributions to biological activity in response to numerical changes such as the number of substituents or counts of hierarchically organized structures.
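The difference between a presence/absence dummy variable and an aggregated count can be sketched as follows. This is an illustration only: it assumes the hydroxy-count codes of Table 4 read H401 to H405 with aggregation values 1 to 5, and the function names and the optional `cap` parameter (standing in for truncation at an upper limit) are hypothetical.

```python
# Aggregation values for the hydroxy-count codes, as assumed from Table 4.
OH_COUNT = {"H401": 1, "H402": 2, "H403": 3, "H404": 4, "H405": 5}


def presence_descriptor(codes, code):
    """Dummy variable: 1 if the code is assigned to the compound, else 0."""
    return 1 if code in codes else 0


def count_descriptor(codes, table, cap=None):
    """Sum the aggregation values of the assigned codes found in the table,
    optionally truncating the total at an upper limit."""
    total = sum(table[c] for c in codes if c in table)
    return total if cap is None else min(total, cap)


compound = ["H403", "M412"]                             # three OH groups, one other code
dummy = presence_descriptor(compound, "H403")           # only says the code is present
n_oh = count_descriptor(compound, OH_COUNT)             # carries the number of OH groups
n_oh_capped = count_descriptor(compound, OH_COUNT, 2)   # truncated at an upper limit of 2
```

With the count descriptor, a regression coefficient can express how activity changes per additional hydroxy group, which a set of 0/1 dummies for H401 to H405 cannot do directly.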
[0022] Next, the descriptors are used to perform a quantitative structure-activity (property) relationship analysis of the physiological activity (QSAR step). In this QSAR step, the descriptors and the biological activity (physical property) values of each compound in the training set can be correlated by methods such as multiple regression, the PLS method, discriminant analysis, and neural networks. In multiple regression in particular, the objective variable, physiological activity (physical property), is expressed as the sum of the explanatory variables, the descriptors, each weighted by a coefficient, plus a constant term, as in the following equation.
[0023] Physiological activity (physical property) = Σ (coefficient × descriptor) + constant term (model equation)
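The model equation can be fitted by ordinary least squares. The sketch below uses only the Python standard library and solves the normal equations by Gauss-Jordan elimination; it is one possible implementation under that assumption, not the program referred to in the specification, and the function names and the small data set are invented for illustration.

```python
def fit_model(X, y):
    """Ordinary least squares via the normal equations.

    X: list of descriptor rows (one row per compound); y: list of activities.
    Returns (coefficients, constant_term).
    """
    rows = [list(r) + [1.0] for r in X]          # trailing 1.0 models the constant term
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for k in range(p):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    beta = [A[i][p] for i in range(p)]
    return beta[:-1], beta[-1]


def predict(coeffs, const, descriptors):
    """Evaluate the model equation: sum(coefficient * descriptor) + constant."""
    return sum(c * d for c, d in zip(coeffs, descriptors)) + const


# Invented data generated from activity = 2*d1 - 3*d2 + 5 (not from the patent).
X = [[0, 1], [1, 0], [1, 1], [2, 1], [0, 0]]
y = [2, 7, 4, 6, 5]
coeffs, const = fit_model(X, y)
```

The `ValueError` on a vanishing pivot is the numerical counterpart of the requirement, stated in the following paragraph, that highly correlated descriptors be removed before the model is built.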
[0024] Programs for multiple regression are described in detail in Toshiro Haga and Shigeji Hashimoto, "Regression Analysis and Principal Component Analysis", Statistical Analysis Program Course, Vol. 2, JUSE Press. To build a model equation by multiple regression, descriptors that are highly correlated with one another must be removed from those created in the DESC step before they are used in the model equation. The remaining descriptors are then treated as candidates, and the descriptors used in the model equation are selected (for example, by stepwise forward-backward selection) to build the model equation. As a rule of thumb, the number of descriptors used in quantitative structure-activity (property) relationship analysis is 1/5 to 1/10 of the number of training set compounds. Thus, once the descriptors have been created in the DESC step, the QSAR step can build the model equation by standard methods.
[0025] Next, based on the model equation obtained in the quantitative structure-activity (property) relationship analysis (QSAR step), a search formula that quantitatively predicts physiological activity (physical property) is assembled from the contributions of the descriptors (the sign and absolute value of each coefficient) (QUERY step). Arranging the descriptors of the model equation in order of the sign and value of their coefficients, and converting them back into partial structure index entries with the conversion table used in the DESC step, gives Table 5 below. Because the way the partial structure index entries are set determines the values the descriptors can take, an estimated value for a given search condition is obtained from the model equation. If the search user sets a threshold for the physiological activity (physical property) of interest, search conditions on the partial structure index can be set that retrieve compounds at or above (or at or below) the threshold.
[0026] [Table 5]
S indicates that the entry is used as a search condition that should hit; Not indicates that it is used in the search formula as a NOT condition (must not hit).
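The QUERY step can be sketched as sorting the model descriptors by coefficient, flagging positive contributors as S and negative contributors as Not, and mapping each descriptor back to its index codes. Everything concrete below is an assumption for illustration: the descriptor names, coefficients, and code assignments are invented, and the `AND`/`OR`/`NOT` syntax is a generic placeholder rather than the query language of any particular database system.

```python
def build_query(coeffs, descriptor_to_codes):
    """Order descriptors by coefficient and flag their codes as S or Not."""
    ordered = sorted(coeffs.items(), key=lambda kv: kv[1], reverse=True)
    return [
        ("S" if coef > 0 else "Not", name, descriptor_to_codes[name], coef)
        for name, coef in ordered
    ]


def to_search_expression(query):
    """Join S entries with AND and append each Not entry as a NOT clause."""
    hits = [" OR ".join(codes) for flag, _, codes, _ in query if flag == "S"]
    nots = [" OR ".join(codes) for flag, _, codes, _ in query if flag == "Not"]
    expr = " AND ".join(f"({h})" for h in hits)
    for n in nots:
        expr += f" NOT ({n})"
    return expr.strip()


# Invented model: D1 raises the predicted activity, D2 lowers it.
coeffs = {"D1": 12.0, "D2": -8.5}
codes = {"D1": ["H401", "H402"], "D2": ["F100"]}
expression = to_search_expression(build_query(coeffs, codes))
```

A threshold-driven version would additionally evaluate the model equation for each admissible combination of descriptor values and keep only the combinations whose estimate clears the threshold set by the user.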
Examples
[0027] The method of the present invention is further described below with reference to examples.
[0028] Example 1
Pharmacokinetics occupies an important place in the creation and development of pharmaceuticals. Transporters, which carry drugs, are attracting attention as in vivo molecules that affect pharmacokinetics, and knowing the substrate specificity of drug transporters is important for creating pharmaceuticals with excellent pharmacokinetics. Thirty-six structurally diverse compounds were selected from commercially available drugs as training set compounds, and the substrate specificity of P-glycoprotein was analyzed by the ATPase screening method.
First, chemical fragmentation codes were assigned to each structural formula according to the indexing rules, as follows.
[0029]
D330 F653 H182 H201 J211 J321 M412 H511 M621 M530 M540 M210 281 311 M313 M321 M332 M342 M270 M272 M380 M381 M383 M391 M392 D013 FO H102 J012 M212 M349 D330 F653 H182 H201 J211 J321 M412 H511 M621 M530 M540 M210 281 311 M313 M321 M332 M342 M270 M272 M380 M381 M383 M391 M392 D013 FO H102 J012 M212 M349
[0030] Chemical fragmentation codes and CPI manual codes are indexed for structure searching of WPI, the international patent database produced by Thomson Derwent, and can be used in commercial database systems such as Derwent Innovations Index, DIALOG, STN, and Questel.Orbit.
The indexing rules for chemical fragmentation codes and CPI manual codes are published on the websites noted above.
[0031] Next, a conversion table was created for aggregating the numerically specified codes according to the content of the chemical fragmentation codes.
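The conversion-table step can be illustrated with a minimal sketch. It assumes that a "numerically specified" code is one that states how many times a partial structure occurs, so that each code contributes a count to a named descriptor. The codes, descriptor names, and the `build_descriptors` helper are hypothetical placeholders and are not taken from the actual indexing rules or from the program described in this example.

```python
from collections import Counter

# Hypothetical conversion table: fragmentation code -> (descriptor name, count).
# These codes and meanings are placeholders, not real indexing codes.
CONVERSION_TABLE = {
    "X101": ("halogen_atoms", 1),
    "X102": ("halogen_atoms", 2),
    "X103": ("halogen_atoms", 3),
    "Y201": ("hydroxy_groups", 1),
    "Y202": ("hydroxy_groups", 2),
}

def build_descriptors(codes):
    """Sum the numeric value that each indexed code contributes to its descriptor."""
    totals = Counter()
    for code in codes:
        if code in CONVERSION_TABLE:  # codes without a table entry are skipped
            name, count = CONVERSION_TABLE[code]
            totals[name] += count
    return dict(totals)

# One compound indexed with three codes, one of which has no table entry.
example = build_descriptors(["X102", "Y201", "Z999"])
```

Under these assumptions each compound becomes a row of integer descriptor values that can be fed to a regression.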
In addition, a program running on a personal computer was written to create descriptors based on this conversion table. Aggregating the chemical fragmentation codes produced 137 descriptors. The correlations among descriptors were calculated with Spearman's rank correlation coefficient, and highly correlated descriptors were excluded so that they would not be included together in the multiple regression model. Descriptors with a low frequency of occurrence, three or fewer (6% of the number of compounds), were also removed, leaving 126 candidate descriptors for the calculation. With the relative activity of ATPase at a drug concentration of 10 μM (specific activity relative to verapamil) as the objective variable, linear multiple regression was performed, and the model formula shown in Table 6 below was obtained. Using descriptors created by aggregating numerically specified chemical fragmentation codes, a model formula was thus obtained that identifies the P-glycoprotein substrate property of the training set compounds with good correlation.
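The two screening steps described above (dropping rarely occurring descriptors and avoiding highly correlated pairs) might be sketched as follows. This is a simplification under stated assumptions: the cutoff of 0.9 for "high" correlation is assumed because the text gives no value, and a correlated descriptor is simply dropped in favor of the one seen first, whereas the text only says that correlated descriptors were kept out of the same model. All function names are illustrative.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_candidates(columns, min_count=4, max_rho=0.9):
    """columns maps descriptor name -> list of values, one per compound.

    Keeps descriptors that are nonzero in at least min_count compounds
    (i.e. occur in more than three) and are not highly correlated with
    a descriptor that was already kept. max_rho is an assumed cutoff.
    """
    frequent = {name: col for name, col in columns.items()
                if sum(1 for v in col if v != 0) >= min_count}
    kept = []
    for name, col in frequent.items():
        if all(abs(spearman(col, frequent[other])) < max_rho for other in kept):
            kept.append(name)
    return kept
```

The surviving descriptors would then serve as candidate explanatory variables for the linear multiple regression.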
[Table 6]
Identification results for the training set compounds
Correlation coefficient R = 0.92480; number of compounds n = 36; F-test value F(6,29) = 28.56; standard deviation s = 18.07
Example 2
At the drug discovery stage, libraries of already-synthesized compounds with diverse structures are used. From a commercially available compound library, 60 structurally diverse compounds were selected as training set compounds, and the substrate specificity of P-glycoprotein was analyzed by the ATPase screening method. The descriptors used in the analysis were based on hierarchically organized chemical fragmentation codes: the lower-level chemical fragmentation codes were aggregated for each upper-level structural component, and a conversion table was created for this purpose. The chemical fragmentation codes were assigned and the descriptors generated by the same procedure as in Example 1. Hierarchically aggregated descriptors were created, and the analysis was performed with 159 candidate descriptors, excluding the simultaneous use of highly correlated descriptors.
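The hierarchical aggregation described here can be sketched in the same spirit. The example assumes that each upper-level structural component owns a list of lower-level codes and that the descriptor is the number of those codes assigned to a compound. The hierarchy, code names, and `hierarchical_descriptors` helper are hypothetical and are not the conversion table used in this example.

```python
from collections import Counter

# Hypothetical hierarchy: upper structural component -> lower-level codes.
HIERARCHY = {
    "ring_system": ["R110", "R120", "R130"],
    "nitrogen_function": ["N210", "N220"],
}

# Reverse lookup built once: lower-level code -> upper component.
LOWER_TO_UPPER = {low: up for up, lows in HIERARCHY.items() for low in lows}

def hierarchical_descriptors(codes):
    """Count, per upper component, how many of its lower-level codes are present."""
    counts = Counter(LOWER_TO_UPPER[c] for c in codes if c in LOWER_TO_UPPER)
    # Report every upper component, including those with a count of zero.
    return {upper: counts.get(upper, 0) for upper in HIERARCHY}
```

Keeping zero-valued components in the output gives every compound the same descriptor columns, which is convenient for regression.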
With the relative activity of ATPase at a drug concentration of 10 μM (specific activity relative to verapamil) as the objective variable, linear multiple regression was performed, and the model formula shown in Table 7 below was obtained. Using descriptors created by aggregating hierarchically specified chemical fragmentation codes, a model formula was thus obtained that identifies the P-glycoprotein substrate property of the training set compounds with good correlation.
[Table 7]
Correlation coefficient R = 0.8945; number of compounds n = 60; F-test value F(12,47) = 15.68; standard deviation s = 13.69
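The statistics reported for the two models can be cross-checked against each other with the standard relation between the multiple correlation coefficient and the overall F statistic of a linear regression, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes that k, the number of descriptors in each model, equals the first degree of freedom of the reported F value (6 and 12); the text does not state this number directly.

```python
def f_statistic(r, n, k):
    """Overall F for a linear regression with k descriptors fitted to n compounds."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Example 1 statistics: R = 0.92480, n = 36, F(6,29)
f_example1 = f_statistic(0.92480, 36, 6)
# Example 2 statistics: R = 0.8945, n = 60, F(12,47)
f_example2 = f_statistic(0.8945, 60, 12)

print(round(f_example1, 2), round(f_example2, 2))  # 28.56 15.68
```

Both values reproduce the reported F-test values, so the reported R, n, and degrees of freedom are mutually consistent.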
[0035] A program was written that converts the descriptors of the model formula back into chemical fragmentation codes and determines the search conditions that give an activity at or above a threshold.
The following search expression was obtained for P-glycoprotein substrates with a relative activity of 110% or more with respect to verapamil:
[0036] S (F014 F553) /M0, M2, M3, M4
S L1 (NOTP) (H103 or H600 or H601 or H602 or H603 or H604 or H641 or L910 or M113 or M142) /M2, M3, M4
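How a model formula might be turned into a search expression can be sketched as follows. The text does not describe the conversion program in detail, so this is only an illustration under simple assumptions: descriptors with a positive regression coefficient contribute required codes, descriptors with a negative coefficient contribute excluded codes, and the activity threshold is ignored. The mapping, the code names, and the output notation are hypothetical; the output is a generic boolean string, not the query syntax of any particular database system.

```python
# Hypothetical mapping from model descriptors back to the codes they aggregate.
DESCRIPTOR_TO_CODES = {
    "desc_a": ["A001", "A002"],
    "desc_b": ["B010"],
    "desc_c": ["C100", "C200"],
}

def build_query(coefficients):
    """coefficients maps descriptor name -> regression coefficient.

    Positive coefficients yield required codes, negative ones excluded codes.
    Assumes at least one descriptor has a positive coefficient.
    """
    required, excluded = [], []
    for name, coef in coefficients.items():
        if coef == 0:
            continue  # a zero coefficient carries no information
        target = required if coef > 0 else excluded
        target.extend(DESCRIPTOR_TO_CODES[name])
    query = "(" + " AND ".join(required) + ")"
    if excluded:
        query += " NOT (" + " OR ".join(excluded) + ")"
    return query
```

For instance, `build_query({"desc_a": 1.5, "desc_c": -2.0})` returns `(A001 AND A002) NOT (C100 OR C200)`, which mirrors the required-plus-NOT shape of the search expression above.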
[0037] Searching an existing compound database with this search expression yielded the following compound.
[Chemical Formula 2]
Gleevec
The P-glycoprotein substrate property of Gleevec, a compound in the set retrieved by this search expression, is not recorded in the database, but published reports confirm that it is a compound with a high substrate property. This means that the method of the present invention enables a search that predicts quantitative activity (physical properties). Collecting a library of structurally diverse compounds and evaluating the target physiological activity is very costly, but with the method of the present invention, the compounds to be evaluated can be selected and collected at low cost from the enormous number of compounds stored in patent and other databases.
Claims
[1] A method for quantitatively predicting physiological activity from a database in which compound structures or general formula structures are registered, the method comprising: a step of assigning partial structure indexes used in a search system to compounds whose physiological activity has been measured; a step of aggregating the partial structure indexes for each structural characteristic component and quantifying them to form descriptors; a step of using the descriptors to analyze the quantitative structure-activity relationship of the compounds whose physiological activity has been measured; and a step of assembling, from the contributions of the descriptors to the physiological activity determined by the quantitative structure-activity relationship analysis, a search expression for obtaining search results in which physiological activity is quantitatively predicted.
[2] The method according to claim 1, wherein, in the step of aggregating the partial structure indexes for each structural characteristic component and quantifying them to form descriptors, when the partial structure indexes are hierarchized by structural component, a conversion table is used that takes the upper structural components as aggregation items and defines their correspondence with the lower partial structure indexes.
[3] The method according to claim 1, wherein, in the step of aggregating the partial structure indexes for each structural characteristic component and quantifying them to form descriptors, when the partial structure indexes can be hierarchized by a structural component other than the one by which they are already hierarchized, a conversion table is used that takes the new upper structural components as aggregation items and defines their correspondence with the lower partial structure indexes.
[4] The method according to claim 1, wherein, in the step of aggregating the partial structure indexes for each structural characteristic component and quantifying them to form descriptors, when a partial structure index is a numerical specification of a partial structure, a conversion table is used that sets an aggregation item corresponding to the numerically specified partial structure and defines the correspondence by which the numerical values specified by the partial structure indexes are aggregated.
[5] The method according to any one of claims 1 to 4, wherein a chemical fragmentation code, a CPI manual code, and a partial structure keyword are used as the partial structure index.
[6] The method according to claim 1, wherein, in the step of assembling a search expression for obtaining search results in which physiological activity is quantitatively predicted, when the descriptors used in the structure-activity relationship model formula were aggregated with the conversion table according to any one of claims 2 to 4, the conversion table is used to convert the descriptors into partial structure indexes for use in the search expression.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005196168 | 2005-07-05 | ||
JP2005-196168 | 2005-07-05 | ||
JP2006-167002 | 2006-06-16 | ||
JP2006167002A JP5075362B2 (en) | 2005-07-05 | 2006-06-16 | Method for quantitative prediction of physiological activity of compounds |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007004546A1 true WO2007004546A1 (en) | 2007-01-11 |
Family
ID=37604411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/313076 WO2007004546A1 (en) | 2005-07-05 | 2006-06-30 | Method for quantitatively predicting physiological activity of compound |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP5075362B2 (en) |
WO (1) | WO2007004546A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112189211A (en) * | 2018-08-08 | 2021-01-05 | 松下知识产权经营株式会社 | Material descriptor generation method, generation device, generation program, prediction model construction method, construction device, and construction program |
CN113454728A (en) * | 2019-02-12 | 2021-09-28 | Jsr株式会社 | Data processing method, data processing device and data processing system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002526863A (en) * | 1998-10-05 | 2002-08-20 | スペクス アンド バイオスペクス,ビー.ブイ. | System for classification and generation of compounds |
JP2004537085A (en) * | 2001-03-15 | 2004-12-09 | バイエル アクチェンゲゼルシャフト | Method for generating a hierarchical topological tree of 2D or 3D-compound structures for compound property optimization |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1167969A2 (en) * | 2000-06-14 | 2002-01-02 | Pfizer Inc. | Method and system for predicting pharmacokinetic properties |
JP2003028857A (en) * | 2001-03-21 | 2003-01-29 | Sumitomo Pharmaceut Co Ltd | Dorpamine receptor ligand model |
-
2006
- 2006-06-16 JP JP2006167002A patent/JP5075362B2/en not_active Expired - Fee Related
- 2006-06-30 WO PCT/JP2006/313076 patent/WO2007004546A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
ISHKAWA T.: "Pharmacogenomics of drug transporters: a new approach to functional analysis of the genetic polymorphisms of ABCB1 (P-glycoprotein/MDR1)", BIOLOGICAL & PHARMACEUTICAL BULLETIN, vol. 27, no. 7, 2004, pages 939 - 948, XP003002733 * |
KURUP A.: "C-QSAR: a database of 18,000 QSARs and associated biological and physical data", JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, vol. 17, 2003, pages 187 - 196, XP003002734 * |
Also Published As
Publication number | Publication date |
---|---|
JP5075362B2 (en) | 2012-11-21 |
JP2007039437A (en) | 2007-02-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 06767684 Country of ref document: EP Kind code of ref document: A1 |