WO2007004546A1

WO2007004546A1 - Method for quantitatively predicting physiological activity of compound

Info

Publication number: WO2007004546A1
Application number: PCT/JP2006/313076
Authority: WO
Inventors: Toshihisa Ishikawa; Noboru Tsujikawa; Hiroyuki Hirano
Original assignee: Toshihisa Ishikawa; Noboru Tsujikawa; Hiroyuki Hirano
Priority date: 2005-07-05
Filing date: 2006-06-30
Publication date: 2007-01-11
Also published as: JP5075362B2; JP2007039437A

Abstract

This invention provides a method for quantitatively predicting and estimating the properties and physiological activity of compounds whose measured or estimated values are not registered in a compound database system in which the structures and general formula structures of existing compounds are registered. The method quantitatively predicts physiological activity from a database in which the structures and general formula structures of compounds are registered, and is characterized by comprising the steps of: assigning the partial structure indexes used in a search system to compounds whose physiological activity has been measured; aggregating and quantifying the partial structure indexes for each structural characteristic component to form descriptors; using the descriptors to analyze the quantitative structure-activity relationship of the compounds whose physiological activity has been measured; and composing a search formula that yields search results quantitatively predicting physiological activity from the contributions of the descriptors to the physiological activity determined by the quantitative structure-activity relationship analysis.

Description

Specification

Method for quantitative prediction of physiological activity of compounds

Technical field

[0001] The present invention relates to a method for quantitatively predicting the physiological activity of a compound, which is useful for searching for and designing useful compounds, such as physiologically active substances including pharmaceuticals and agrochemicals, and for identifying structures to avoid, such as compounds that are toxic or have an environmental impact.

Background art

[0002] As a means of searching for compounds with useful properties, such as pharmaceuticals and pesticides, compounds have been classified and investigated by assigning (indexing) systematic compound names (such as IUPAC nomenclature), partial structure keywords, and codes that systematically classify substructures (chemical fragmentation codes, CPI manual codes, etc.). These indexes are registered in database systems that support text search (DIALOG, STN, etc.), where the search is then performed. In addition to these text databases, systems are now available in which the structures and general formulas of compounds are registered as chemical bond graphs, so that partial structures, exactly matching structural formulas, and the range of structures represented by general formulas can be searched (the STN CAS Registry file, MARPAT, the Questel Orbit Merged Markush Service (MMS), etc.). In a compound database, information such as measured values of physical properties and physiological activities, and the literature describing each compound, can be examined in addition to the compound's structure. In recent years, structure-activity relationship (SAR) and quantitative structure-property relationship (QSPR) techniques have been used to predict and estimate physical properties and physiological activities from compound structures, and estimated values are now registered in addition to measured values.

[0003] When trying to obtain a compound having useful properties, the literature relating to the compound is searched using a database in which existing compounds are registered. However, not all of the measured values of physical properties and physiological activities that are needed are registered for existing compounds, and the estimated values that the system derives from compound structures are limited to those of quantitative structure-property relationship (QSPR) methods. A search means that predicts and estimates the physical properties and physiological activities required by users of the search system has not yet been realized.

Disclosure of the invention

Problems to be solved by the invention

[0004] Therefore, the problem addressed by the present invention is to provide a method for quantitatively predicting and estimating, from a compound database system in which the structures and general formula structures of existing compounds are registered, the physical properties and physiological activities of compounds whose measured or estimated values are not registered in the database.

[0005] This problem is solved by a method for quantitatively predicting the physiological activity of a compound from a database in which the structures or general formula structures of compounds are registered, characterized by comprising: a step of assigning the partial structure indexes used in a search system to compounds whose physiological activity has been measured; a step of aggregating the partial structure indexes for each structural characteristic component to form descriptors; a step of analyzing, using the descriptors, the quantitative structure-activity relationship of the compounds whose physiological activity has been measured; and a step of assembling a search formula that yields search results quantitatively predicting the physiological activity from the contributions of the descriptors to the physiological activity obtained by the quantitative structure-activity relationship analysis.

The invention's effect

[0006] According to the present invention, a database in which existing compounds are registered can be used to obtain search results that quantitatively predict and estimate the physical properties and physiological activities of compounds whose measured or estimated values are not registered in the database, which makes it possible to create useful compounds.

Brief Description of Drawings

[0007] FIG. 1 is a flowchart of a method of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

[0008] Quantitative structure-activity (property) relationship analysis is used to quantitatively predict and estimate physical properties and physiological activities. In the present invention, this analysis is performed on compound data, and the problem is solved by two key steps: converting the partial structure indexes that the search system uses to register compounds into descriptors, and converting the descriptors in the analysis results back into partial structure indexes to form a search formula.

[0009] Hereinafter, the method of the present invention will be described with reference to FIG. 1.

First, a group of compounds (hereinafter the "training set") whose chemical structures and physiological activities (physical properties) have been measured is prepared.

Next, for each compound in the training set, partial structure indexes are assigned (indexed) using the schemes the search system uses: systematic compound names (such as IUPAC nomenclature), substructure keywords, and codes that systematically classify each substructure (chemical fragmentation codes, CPI manual codes, etc.) (INDEX step). The indexes can be assigned by following the published indexing guides. The indexing guides for chemical fragmentation codes and CPI manual codes are published on the following Thomson Derwent and Thomson Scientific websites.

http: //thomsonderwent.comZ meaiaZ support / userguides / c emmd guide, pdi

http: //www/thomsonscientific.vo/support/code/mc/cpi/index,shtml

For systematic compound nomenclature, the International Union of Pure and Applied Chemistry (IUPAC) has established rules under which the structure and composition of a compound can be understood from its name, and many explanatory works exist, such as those by Katsumi Nakahara and Naosuke Inamoto. Software that performs the above indexing automatically when a compound structure is entered graphically can also be used: chemical fragmentation codes can be obtained with the commercially available Markush TopFrag software (http://thomsonscientific.jp/products/mtf/index.shtml), and CambridgeSoft's ChemDraw Ultra can be used for nomenclature.

[0011] Next, to use the partial structure indexes as descriptors, they are aggregated and digitized for each structural characteristic component (DESC step). Structural characteristic components include hierarchical features and numerical designations. A conversion table can be set that specifies which partial structure indexes are aggregated into which descriptors. In the present invention, aggregation includes adding the aggregation values set for each chemical fragmentation code in the conversion table, truncating at an upper limit, selecting the maximum, average, or minimum value, adding after applying operations such as square root, logarithm, or exponentiation, and applying arithmetic processing to the summed value.
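The aggregation operations listed in [0011] can be sketched as a single function applied to one compound's code list. This is a minimal illustration, not the program used in the examples: the function name `aggregate`, its parameters, and the codes `X01` to `X03` are hypothetical placeholders for a real conversion table.

```python
import math
from typing import Callable, Dict, Iterable, Optional


def aggregate(
    codes: Iterable[str],
    table: Dict[str, float],
    mode: str = "sum",
    cap: Optional[float] = None,
    transform: Optional[Callable[[float], float]] = None,
) -> float:
    """Aggregate one compound's substructure codes into a single descriptor value.

    table maps each code to the aggregation value set in the conversion table;
    codes absent from the table do not contribute.
    """
    values = [table[c] for c in codes if c in table]
    if transform is not None:
        # e.g. math.sqrt or math.log applied to each value before combining
        values = [transform(v) for v in values]
    if not values:
        return 0.0
    if mode == "sum":
        result = sum(values)
    elif mode == "max":
        result = max(values)
    elif mode == "min":
        result = min(values)
    elif mode == "mean":
        result = sum(values) / len(values)
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    if cap is not None:
        result = min(result, cap)  # truncate at the upper limit
    return result


# Hypothetical codes and values, for illustration only.
table = {"X01": 1.0, "X02": 2.0, "X03": 4.0}
codes = ["X01", "X03", "Y99"]
print(aggregate(codes, table))                       # 5.0
print(aggregate(codes, table, mode="max"))           # 4.0
print(aggregate(codes, table, cap=3.0))              # 3.0
print(aggregate(codes, table, transform=math.sqrt))  # 3.0
```

Keeping the mode, cap, and transform as parameters lets one conversion table serve several descriptors that differ only in how the looked-up values are combined.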

Taking chemical fragmentation codes as an example of a hierarchical structure, the upper code is CO (halogen atom), and the lower codes are the codes for the individual types of halogen atom.

[0012] The code-descriptor conversion table can be set as shown in Table 1 below, for example.

[0013] [Table 1]

The descriptor HAL can be used as a descriptor that gives the contribution of the halogen atoms in the molecule to the molecular weight.

[0014] For the structural components of rings, setting a conversion table that gives priority to ring size yields Table 2 below.

[0015] [Table 2]

Descriptor   Codes to be aggregated     Aggregation value
HETE3-4      F100, F200, F400, F410     1

HETE3-4 is a descriptor indicating the number of 3- to 4-membered rings containing heteroatoms.

[0016] If heteroatoms are given priority, the conversion table can be set as shown in Table 3 below; F4 is a descriptor whose value changes according to the size of the nitrogen-containing heterocycle.

[0017] [Table 3]

[0018] Thus, when an indexed partial structure has multiple structural components, a conversion table can be created for each component and used in the DESC step. A ring system, for example, has various components: monocyclic or fused, the number of rings, aromaticity, heterocyclic or carbocyclic, and the number of heteroatoms.
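Applying one conversion table per structural component to the same code list can be sketched as follows. The descriptor name HETE3-4 comes from Table 2, but the ring codes (`RING3_N` and so on), the descriptor `N_RING`, and its values are hypothetical placeholders rather than real chemical fragmentation codes.

```python
from typing import Dict, Iterable

# descriptor name -> {code: aggregation value}
ConversionTable = Dict[str, Dict[str, int]]

# Hypothetical ring codes; real chemical fragmentation codes would replace them.
RING_SIZE_TABLE: ConversionTable = {
    # ring-size priority: count of 3- to 4-membered heteroatom-containing rings
    "HETE3-4": {"RING3_N": 1, "RING4_O": 1},
}
HETEROATOM_TABLE: ConversionTable = {
    # heteroatom priority: value grows with the size of N-containing rings
    "N_RING": {"RING3_N": 3, "RING5_N": 5, "RING6_N": 6},
}


def build_descriptors(codes: Iterable[str], *tables: ConversionTable) -> Dict[str, int]:
    """Apply every conversion table to the same code list, summing the values."""
    code_list = list(codes)
    descriptors: Dict[str, int] = {}
    for table in tables:
        for name, mapping in table.items():
            descriptors[name] = sum(mapping.get(c, 0) for c in code_list)
    return descriptors


compound = ["RING3_N", "RING5_N", "RING6_C"]
print(build_descriptors(compound, RING_SIZE_TABLE, HETEROATOM_TABLE))
# {'HETE3-4': 1, 'N_RING': 8}
```

The same code (here `RING3_N`) feeds both descriptors, which is how one indexed ring can contribute to a ring-size component and a heteroatom component at once.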

[0019] For chemical fragmentation codes that carry a numerical designation, such as the number of a specific substituent, a conversion table such as Table 4 below can be created in which numerical values for counting the substituents are specified.

[0020] [Table 4]

Descriptor   Code to be aggregated   Meaning        Aggregation value
-OH          H401                    One            1
             H402                    Two            2
             H403                    Three          3
             H404                    Four           4
             H405                    Five or more   5

[0021] If only the presence or absence of a chemical fragmentation code is used, the descriptor is a dummy variable taking the value 0 or 1, so neither the contribution of a descriptor according to numerical values such as the number of substituents nor the effect of structural components that can be grouped hierarchically can be analyzed. In the DESC step, as described above, descriptors are extracted from the structural components contained in the partial structure indexes: from numerical information such as the number of substituents, and, for ring structures, from hierarchical information such as the types of heteroatoms and the state of fusion. With these descriptors, the contribution to biological activity can be analyzed according to numerical changes such as the number of substituents and the number of hierarchically organized structures.
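The difference between a presence/absence dummy variable and a numerically designated descriptor can be sketched with the -OH codes of Table 4, reading the fifth code as H405 ("five or more", aggregated as 5). The function names and the example compounds are hypothetical, and the sketch assumes each compound carries at most one of H401 to H405.

```python
from typing import Dict, List

# Table 4: H401-H405 designate one to five-or-more -OH substituents.
OH_COUNT_TABLE: Dict[str, int] = {
    "H401": 1, "H402": 2, "H403": 3, "H404": 4, "H405": 5,  # H405 = five or more
}


def oh_count(codes: List[str]) -> int:
    """Numerical descriptor: -OH count read from the numerically designated code."""
    # Assumes a compound carries at most one of H401-H405.
    return sum(OH_COUNT_TABLE.get(c, 0) for c in codes)


def oh_dummy(codes: List[str]) -> int:
    """Presence/absence dummy variable (0 or 1), shown for comparison."""
    return int(any(c in OH_COUNT_TABLE for c in codes))


compounds = {"a": ["H401"], "b": ["H403", "M210"], "c": ["M210"]}
for name, codes in compounds.items():
    print(name, oh_dummy(codes), oh_count(codes))
# a 1 1
# b 1 3
# c 0 0
```

Compounds a and b are indistinguishable under the dummy variable but differ under the count descriptor, which is what allows a regression coefficient to express a per-substituent contribution.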

Next, a quantitative structure-activity (property) relationship analysis of the physiological activity is performed using the descriptors (QSAR step). In this step, the descriptors and the biological activity (physical property) values of the training set compounds can be correlated by multiple regression, the PLS method, discriminant analysis, neural networks, and other methods. In multiple regression in particular, the physiological activity (physical property), which is the objective variable, is expressed as the sum of the products of the descriptors (explanatory variables) and their weights (coefficients), plus a constant term, as shown in the following equation.

[0023] Physiological activity (physical property) = Σ (coefficient × descriptor) + constant term   (model equation)
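Fitting the model equation by ordinary least squares can be sketched in plain Python by solving the normal equations. This is an illustrative sketch, not the regression program cited in [0024]; the function names and the six-compound data set (generated from activity = 2·d1 − 3·d2 + 10) are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_model(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[float], float]:
    """Least-squares fit of activity = sum(coefficient * descriptor) + constant.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a column of
    ones appended for the constant term.
    """
    rows = [list(r) + [1.0] for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * t for r, t in zip(rows, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: remove collinear descriptors first")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [u - f * v for u, v in zip(m[i], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return beta[:-1], beta[-1]


def predict(coefficients: Sequence[float], constant: float, x: Sequence[float]) -> float:
    """Evaluate the model equation for one compound's descriptor values."""
    return sum(c * v for c, v in zip(coefficients, x)) + constant


# Hypothetical descriptors generated from activity = 2*d1 - 3*d2 + 10.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
y = [10, 12, 7, 9, 11, 10]
coefs, const = fit_linear_model(X, y)
print([round(c, 6) for c in coefs], round(const, 6))  # [2.0, -3.0] 10.0
```

The singular-matrix check is one reason highly correlated descriptors must not enter the same model: exact collinearity leaves the normal equations without a unique solution.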

[0024] Multiple regression programs are described in detail in Toshino Haga and Shigeshi Hashimoto, Statistical Analysis Program Lecture 2, "Regression Analysis and Principal Component Analysis" (Nikkatsu Rensha Publishing Co., Ltd.). To construct a model formula by multiple regression, descriptors created in the DESC step that are highly correlated with one another must be excluded so that they do not appear together in the model formula. The model formula is then constructed by selecting, from the remaining candidate descriptors, those to be used (for example, by the stepwise variable increase/decrease method). As a rule of thumb, the number of descriptors used in a quantitative structure-activity (property) relationship analysis is said to be 1/5 to 1/10 of the number of training set compounds. Thus, once the descriptors have been created in the DESC step, the model formula can be constructed in the QSAR step by standard methods.
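The 1/5 to 1/10 rule of thumb can be turned into a whole-number range of descriptor counts. In this sketch, rounding the lower bound up and the upper bound down is an assumption. The examples later in this document report F(6, 29) for 36 compounds and F(12, 47) for 60 compounds, i.e. models with 6 and 12 descriptors, and both fall inside the resulting ranges.

```python
import math
from typing import Tuple


def descriptor_budget(n_compounds: int) -> Tuple[int, int]:
    """Whole-number descriptor counts between n/10 and n/5 (rule of thumb)."""
    return math.ceil(n_compounds / 10), math.floor(n_compounds / 5)


def residual_df(n_compounds: int, n_descriptors: int) -> int:
    """Residual degrees of freedom of a model with a constant term."""
    return n_compounds - n_descriptors - 1


print(descriptor_budget(36), residual_df(36, 6))    # (4, 7) 29
print(descriptor_budget(60), residual_df(60, 12))   # (6, 12) 47
```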

[0025] Next, based on the model formula obtained by the quantitative structure-activity (property) relationship analysis (QSAR step), a search query that quantitatively predicts the physiological activity (physical property) is built from the contributions of the descriptors, that is, the sign and absolute value of each coefficient (QUERY step). The descriptors of the model formula are arranged in order of the sign and value of their coefficients and are converted back into substructure indexes using the conversion table used in the DESC step, as shown in Table 5 below. Since the values a descriptor can take are determined by the settings of the partial structure indexes, an estimated value corresponding to each search condition is obtained from the model formula. If the search user sets a threshold for the physiological activity (physical property) being searched, search conditions on the partial structure indexes can be set to retrieve compounds above or below that threshold.
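The QUERY step can be sketched as enumerating the possible descriptor values, keeping the combinations whose predicted activity exceeds the threshold, and turning each one back into S (must hit) and Not (must not hit) conditions. The model, the threshold, the helper names, and the ring codes are hypothetical; the -OH codes follow Table 4. The output format only imitates the S/Not notation and is not a real search-system syntax.

```python
from itertools import product
from typing import Dict, List

# descriptor -> {possible descriptor value: substructure codes that produce it}
ValueMap = Dict[str, Dict[int, List[str]]]


def enumerate_hits(
    coefficients: Dict[str, float], constant: float, values: ValueMap, threshold: float
) -> List[Dict[str, int]]:
    """Descriptor-value combinations whose predicted activity exceeds the threshold."""
    # Order descriptors by the size of their contribution (absolute coefficient).
    names = sorted(coefficients, key=lambda d: -abs(coefficients[d]))
    hits = []
    for combo in product(*(sorted(values[d]) for d in names)):
        predicted = constant + sum(coefficients[d] * v for d, v in zip(names, combo))
        if predicted > threshold:
            hits.append(dict(zip(names, combo)))
    return hits


def to_conditions(hit: Dict[str, int], values: ValueMap) -> List[str]:
    """Convert one combination back into S (must hit) / Not (must not hit) lines."""
    lines = []
    for name, value in hit.items():
        if value == 0:
            excluded = sorted({c for codes in values[name].values() for c in codes})
            if excluded:
                lines.append("Not (" + " or ".join(excluded) + ")")
        else:
            lines.append("S (" + " or ".join(values[name][value]) + ")")
    return lines


# Hypothetical model: activity = 15*HETE3-4 - 10*OH + 90, threshold 100.
coefs = {"OH": -10.0, "HETE3-4": 15.0}
values: ValueMap = {
    "OH": {0: [], 1: ["H401"], 2: ["H402"]},
    "HETE3-4": {0: [], 1: ["RING3_N", "RING4_O"]},
}
for hit in enumerate_hits(coefs, 90.0, values, 100.0):
    print(hit, to_conditions(hit, values))
# {'HETE3-4': 1, 'OH': 0} ['S (RING3_N or RING4_O)', 'Not (H401 or H402)']
```

Exhaustive enumeration is only practical for a handful of descriptors with few possible values each; it is used here to make the threshold logic explicit.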

[0026] [Table 5]

In Table 5, S indicates use as a search condition that must be hit, and Not indicates that the search expression is used as a NOT condition (compounds matching it are not hit).

Example

[0027] The method of the present invention will be further described below with reference to examples.

[0028] Example 1

Pharmacokinetics plays an important role in the creation and development of pharmaceuticals. Drug transporters, as in vivo molecules that affect pharmacokinetics, are attracting attention, and knowing the substrate specificity of drug transporters is important for creating drugs with excellent pharmacokinetics. Thirty-six structurally diverse compounds were selected from commercially available drugs as the training set, and their substrate specificity for P-glycoprotein was analyzed by the ATPase screening method.

First, the chemical fragmentation code of the structural formula was assigned as follows according to the indexing rules.

[0029]

D330 F653 H182 H201 J211 J321 M412 H511 M621 M530 M540 M210 281 311 M313 M321 M332 M342 M270 M272 M380 M381 M383 M391 M392 D013 FO H102 J012 M212 M349

[0030] Chemical fragmentation codes and CPI manual codes are indexed for structure searching in the international patent database WPI created by Thomson Derwent, and can be used with commercial database systems such as Derwent Innovations Index, DIALOG, STN, and Questel Orbit.

The sets of chemical fragmentation codes and CPI manual codes are published on the websites mentioned above.

[0031] Next, a conversion table was created that aggregates the numerically specified codes according to the contents of the chemical fragmentation codes.

Furthermore, a personal computer program was written to create descriptors based on this conversion table. Aggregating the chemical fragmentation codes yielded 137 descriptors. Correlations between descriptors were calculated with Spearman's rank correlation coefficient, and highly correlated descriptors were not allowed to enter the multiple regression model together. In addition, infrequent descriptors occurring in 3 or fewer compounds were excluded, leaving 126 candidate descriptors for the calculation. Linear multiple regression was performed with the relative ATPase activity (specific activity relative to verapamil) at a drug concentration of 10 μM as the objective variable, giving the model formula shown in Table 6 below. Using descriptors created by aggregating numerically specified chemical fragmentation codes, a model formula was obtained that identifies the P-glycoprotein substrate properties of the training set compounds with good correlation.
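The descriptor screening described in Example 1 (dropping descriptors that occur in 3 or fewer compounds, then flagging highly correlated pairs by Spearman's rank correlation) can be sketched as follows. The cut-off `max_abs_rho=0.9`, the function names, and the descriptor data are hypothetical; the example does not state the correlation threshold it used.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: correlation undefined, treat as uncorrelated
    return cov / (vx * vy) ** 0.5


def screen(
    descriptors: Dict[str, List[float]], min_count: int = 4, max_abs_rho: float = 0.9
) -> Tuple[Dict[str, List[float]], List[Tuple[str, str]]]:
    """Drop rare descriptors, then list highly correlated pairs.

    min_count=4 excludes descriptors that are non-zero in 3 or fewer compounds;
    max_abs_rho is an assumed cut-off. Returned pairs must not both enter the model.
    """
    kept = {k: v for k, v in descriptors.items() if sum(1 for x in v if x != 0) >= min_count}
    names = sorted(kept)
    pairs = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(spearman(kept[a], kept[b])) >= max_abs_rho
    ]
    return kept, pairs


# Hypothetical descriptor values for six compounds.
data = {
    "D1": [1, 2, 3, 4, 5, 0],
    "D2": [2, 4, 6, 8, 10, 0],  # same ranking as D1
    "D3": [5, 0, 1, 0, 3, 2],
    "D4": [0, 0, 0, 1, 0, 0],  # occurs in only one compound
}
kept, pairs = screen(data)
print(sorted(kept), pairs)  # ['D1', 'D2', 'D3'] [('D1', 'D2')]
```

A rank correlation suits these descriptors because many of them are small integer counts with ties, where a rank-based measure is less sensitive to a few large values.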

[Table 6]

Identification results for the training set compounds

Correlation coefficient R = 0.92480; number of compounds n = 36; F-test value F(6, 29) = 28.56

Standard deviation s = 18.07

Example 2

In the drug discovery stage, libraries of already synthesized, structurally diverse compounds are used. Sixty structurally diverse compounds were selected from a commercially available compound library as the training set, and their substrate specificity for P-glycoprotein was analyzed by the ATPase screening method. For the descriptors used in the analysis, hierarchically arranged chemical fragmentation codes were used, and the conversion table was created by aggregating the lower chemical fragmentation codes for each upper structural component. Chemical fragmentation codes were assigned and descriptors generated in the same way as in Example 1. Hierarchically aggregated descriptors were created, and 159 candidate descriptors were analyzed under the condition that highly correlated descriptors are not used together.

Linear multiple regression was performed with the relative ATPase activity (specific activity relative to verapamil) at a drug concentration of 10 μM as the objective variable, giving the model formula in Table 7 below. Using descriptors created by aggregating hierarchically designated chemical fragmentation codes, a model formula was obtained that identifies the P-glycoprotein substrate properties of the training set compounds with good correlation.

[Table 7]

Correlation coefficient R = 0.8945; number of compounds n = 60; F-test value F(12, 47) = 15.68

Standard deviation s = 13.69
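For a least-squares model with p descriptors and a constant term fitted to n compounds, the overall F value follows from the correlation coefficient as F = (R²/p) / ((1 − R²)/(n − p − 1)). Plugging in the statistics reported for Examples 1 and 2 reproduces the reported F values, which gives a quick consistency check on the tables. The function name is illustrative.

```python
def f_statistic(r: float, n: int, p: int) -> float:
    """Overall F for a least-squares model with p descriptors and a constant term."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


print(round(f_statistic(0.92480, 36, 6), 2))   # Example 1: 28.56
print(round(f_statistic(0.8945, 60, 12), 2))   # Example 2: 15.68
```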

[0035] A program was written to convert the model formula descriptors into chemical fragmentation codes and to obtain search conditions for activity above a threshold.

For the condition "P-glycoprotein substrate property: relative activity with respect to verapamil of more than 110%", the following search formula was obtained:

[0036] S (F014 F553) / M0, M2, M3, M4

S L1 (NOTP) (H103 or H600 or H601 or H602 or H603 or H604 or H641 or L910 or M113 or M142) / M2, M3, M4

[0037] Searching the existing compound database with this search formula retrieved the following compound.

[Chemical 2]

Gleevec

Gleevec, a compound in the set retrieved by this search formula, was confirmed from a report not recorded in the database to be a compound with high P-glycoprotein substrate properties. This shows that the method of the present invention achieved a search that quantitatively predicts activity (physical properties). Collecting a structurally diverse compound library and evaluating the target biological activity is very expensive, but with the method of the present invention, the compounds to be evaluated can be selected and collected at little cost from the huge number of compounds stored in patent databases and elsewhere.

Claims

The scope of the claims

[1] A method for quantitatively predicting the physiological activity of a compound from a database in which the structures or general formula structures of compounds are registered, characterized by comprising: a step of assigning a partial structure index used in a search system to compounds whose physiological activity has been measured; a step of aggregating the partial structure index for each structural characteristic component and quantifying it as a descriptor; a step of analyzing, using the descriptor, the quantitative structure-activity relationship of the compounds whose physiological activity has been measured; and a step of assembling a search expression for obtaining a search result that quantitatively predicts the physiological activity from the contribution of the descriptor to the physiological activity obtained by the quantitative structure-activity relationship analysis.

[2] The method according to claim 1, wherein, in the step of aggregating the partial structure index for each structural characteristic component and digitizing it into a descriptor, when the partial structure index is hierarchized by structural component, a conversion table is used that takes the upper structural component as the aggregation item and defines its correspondence with the lower partial structure indexes.

[3] The method according to claim 1, wherein, in the step of aggregating the partial structure index for each structural characteristic component and digitizing it into a descriptor, when the partial structure index can be hierarchized by a structural component other than those already hierarchized, a conversion table is used that takes a new upper structural component as the aggregation item and defines its correspondence with the lower partial structure indexes.

[4] The method according to claim 1, wherein, in the step of aggregating the partial structure index for each structural characteristic component and converting it into a numerical descriptor, when the partial structure index numerically specifies a substructure, a conversion table is set that defines a correspondence for summing the numerical values specified by the partial structure index.

[5] The method according to any one of claims 1 to 4, wherein a chemical fragmentation code, a CPI manual code, and a partial structure keyword are used as the partial structure index.

[6] The method according to claim 1, wherein, in the step of assembling a search expression for obtaining a search result that quantitatively predicts physiological activity, when the descriptors used in the structure-activity relationship model formula have been aggregated by the conversion table according to any one of claims 2 to 4, that conversion table is used to convert the descriptors into partial structure indexes for use in the search expression.