JP2019020791A

JP2019020791A - Toxicity predicting method and utilization thereof

Info

Publication number: JP2019020791A
Application number: JP2017135877A
Authority: JP
Inventors: Toshihiko Sawada; Hiroaki Wasada; Tomohiro Hashimoto
Original assignee: Gifu University NUC
Current assignee: Gifu University NUC
Priority date: 2017-07-12
Filing date: 2017-07-12
Publication date: 2019-02-07
Anticipated expiration: 2037-07-12
Also published as: JP6941353B2

Abstract

PROBLEM TO BE SOLVED: To provide a means of predicting compound toxicity that yields prediction results that are highly accurate and reliable and easy to evaluate. SOLUTION: The toxicity of a compound is predicted by executing (1) a step of receiving structural information of a test compound input by a user; (2) a step of generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized; (3) a step of calculating, using the three-dimensional molecular structure, the values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors; (4) a step in which a toxicity prediction model uses the molecular descriptor values to calculate the probabilities that the test compound is toxic and non-toxic, the probability of toxicity and the probability of non-toxicity adding up to 100%; and (5) a step of outputting the calculated probabilities. SELECTED DRAWING: FIG. 1

Description

The present invention relates to the prediction of compound toxicity. Specifically, it relates to a method, a system, and a program for predicting the toxicity of a compound.

The toxicity of a compound is evaluated by in vitro and in vivo tests on the basis of various toxicity indicators (for example, hERG inhibition, biodegradability, and mutagenicity). Each toxicity indicator has its own criteria, and the presence or absence of toxicity is determined according to those criteria. Because toxicity tests take considerable time and money, it is desirable to predict toxicity beforehand and narrow down the compounds to be tested (candidate compounds). In other words, there is a need to predict the toxicity of a compound without performing actual tests. If toxicity can be predicted in advance, the number of candidate compounds decreases, and the time and cost required for testing fall accordingly. In addition, it becomes possible to assess the toxicity of compounds that cannot be tested, or are difficult to test, in practice, such as virtual compounds. This advantage is particularly important in the development of new compounds: it increases design efficiency and contributes to a higher success rate and lower development costs. In response to the worldwide trend toward restricting animal experiments in compound development (the REACH regulation), demand for compound toxicity prediction is growing further.

Toxicity prediction systems and programs developed so far generally output a determination that the test compound is or is not toxic, together with a prediction accuracy (see, for example, Patent Documents 1 to 3 and Non-Patent Documents 1 to 5). The concordance rate in cross-validation or external validation is often used as the prediction accuracy; the higher the concordance rate, the higher the prediction accuracy is judged to be. The concordance rate is defined as (the number of compounds for which the toxicity prediction result agrees with the toxicity test result) / (the total number of compounds for which toxicity was predicted). In cross-validation, the data are divided into a training set and a test set, a prediction method is constructed using the training set, and the prediction accuracy of the constructed method is verified using the test set. In external validation, the prediction accuracy of the constructed method is verified using data independent of the data used for cross-validation. Even when cross-validation or external validation is used, the determination remains qualitative (that is, toxic or non-toxic), and comparison between compounds (ranking them) is difficult. Furthermore, because the determination is not made in relation to the structure of the compound, the reliability of the result cannot be said to be high.
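As a minimal illustration of the concordance (match) rate defined above, the following Python sketch computes it for a small set of made-up labels and shows one simple way to form cross-validation splits. The function names, the interleaved fold layout, and the example labels are hypothetical and are not taken from the cited documents.

```python
from typing import List, Sequence, Tuple


def concordance_rate(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of compounds whose predicted label matches the test result."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_items-1 into k (training, test) pairs."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        splits.append((train, test))
    return splits


# Made-up labels: True = toxic, False = non-toxic.
predicted = [True, False, True, True, False]
observed = [True, True, True, False, False]
print(concordance_rate(predicted, observed))  # 3 of 5 agree -> 0.6
print(k_fold_indices(5, 2))
```

A single number of this kind summarizes agreement over a whole data set; it says nothing about how confident the prediction is for any individual compound, which is the limitation the paragraph above points out.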

Patent Document 1: International Publication No. 2009/025045 pamphlet
Patent Document 2: International Publication No. 2009/078096 pamphlet
Patent Document 3: International Publication No. 2010/016109 pamphlet

Non-Patent Document 1: Wang S., et al., "Recent developments in computational prediction of HERG blockage", Current Topics in Medicinal Chemistry, (The United Arab Emirates), Bentham Science Publishers, 2013, vol. 13, iss. 11, p. 1317-1326, DOI: 10.2174/15680266113139990036
Non-Patent Document 2: Blay V., et al., "Biodegradability Prediction of Fragrant Molecules by Molecular Topology", ACS Sustainable Chemistry and Engineering, (The United States of America), The American Chemical Society Publications, June 2016, vol. 4, iss. 8, p. 4224-4231, DOI: 10.1021/acssuschemeng.6b00717
Non-Patent Document 3: Jolly R., et al., "An evaluation of in-house and off-the-shelf in silico models: Implications on guidance for mutagenicity assessment", Regulatory Toxicology and Pharmacology, (The Netherlands), Elsevier B.V., April 2015, vol. 71, iss. 3, p. 388-397, DOI: 10.1016/j.yrtph.2015.01.010
Non-Patent Document 4: Greene N., et al., "A practical application of two in silico systems for identification of potentially mutagenic impurities", Regulatory Toxicology and Pharmacology, (The Netherlands), Elsevier B.V., May 2015, vol. 72, iss. 2, p. 335-349, DOI: 10.1016/j.yrtph.2015.05.008
Non-Patent Document 5: Ferrari T., et al., "An open source multistep model to predict mutagenicity from statistical analysis and relevant structural alerts", Chemistry Central Journal, (The United Kingdom), Springer Open, July 2010, vol. 4, Suppl. 1, S2, DOI: 10.1186/1752-153X-4-S1-S2
Non-Patent Document 6: Lazar, in silico toxicology gmbh, website https://lazar.in-silico.ch/predict
Non-Patent Document 7: PASS, Vladimir Poroikov et al., website http://www.pharmaexpert.ru/passonline/
Non-Patent Document 8: HazardExpert Pro, CompuDrug International, Inc., website http://www.compudrug.com/hazardexpertpro
Non-Patent Document 9: CompuDrug International, Inc., website http://www.compudrug.com/faq

Methods and systems that determine the toxicity of a compound using structural information have also been developed. One of them, the compound toxicity prediction software Lazar (Non-Patent Document 6), calculates and displays both the probability of toxicity and the probability of non-toxicity from the structural information of a compound entered by the user. However, the two probabilities do not add up to 100%, which makes the prediction result difficult to evaluate; comparison between compounds is particularly difficult. Another compound toxicity prediction program, PASS (Non-Patent Document 7), has the same problem as Lazar. Software in which the probability of toxicity and the probability of non-toxicity do add up to 100% (HazardExpert Pro; Non-Patent Document 8) has also been developed. It uses the structural information of a compound entered by the user, focuses on toxic fragment structures, and calculates and displays the probability of toxicity. However, its accuracy and reliability are not high: CompuDrug International, Inc., the developer of HazardExpert Pro, itself acknowledges that "the probability of being toxic is not an accurate value" (Non-Patent Document 9).

An object of the present invention is therefore to provide a means of predicting compound toxicity that yields prediction results that are highly accurate and reliable and easy to evaluate.

In order to solve the above problems, the following inventions are provided.
[1] A method for predicting the toxicity of a compound, comprising:
(1) a step of receiving structural information of a test compound input by a user;
(2) a step of generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
(3) a step of generating, using the three-dimensional molecular structure, the values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors;
(4) a step in which a toxicity prediction model calculates, using the molecular descriptor values, the probabilities that the test compound is toxic and non-toxic, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
(5) a step of outputting the calculated probabilities.
[2] The prediction method according to [1], wherein step (4) consists of the following steps:
(4-1) a step of normalizing the molecular descriptor values; and
(4-2) a step of calculating the probabilities that the test compound is toxic and non-toxic using the normalized values.
[3] The prediction method according to [1] or [2], wherein the three-dimensional molecular structure is one or more three-dimensional molecular structures selected from the group consisting of: a three-dimensional molecular structure optimized by a semi-empirical molecular orbital method; a three-dimensional molecular structure optimized by an ab initio (non-empirical) molecular orbital method; a three-dimensional molecular structure optimized by a density functional method; a three-dimensional molecular structure obtained by a conformational search using a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, or a density functional method; and a three-dimensional molecular structure optimized by any combination of a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, and a density functional method.
[4] The prediction method according to [1] or [2], wherein the three-dimensional molecular structure is two or more three-dimensional molecular structures whose structures are optimized by a semi-empirical molecular orbital method.
[5] The prediction method according to any one of [1] to [4], wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors and one or more quantum chemical molecular descriptors.
[6] The prediction method according to any one of [1] to [4], wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors.
[7] The prediction method according to any one of [1] to [6], wherein the toxicity prediction model is constructed by machine learning using the normalized molecular descriptor values of a plurality of compounds whose toxicity or non-toxicity is known.
[8] The prediction method according to [7], wherein the machine learning is one or more machine learning methods selected from the group consisting of a support vector machine, a Bayesian network, a neural network, AdaBoost, random forest, and active learning.
[9] The prediction method according to any one of [1] to [8], further comprising a step of generating a chemical formula of the test compound, wherein in step (5) the generated chemical formula and the probabilities are output in association with each other.
[10] The prediction method according to any one of [1] to [9], wherein there are two or more test compounds, and in step (5) the probabilities are output for each test compound.
[11] The prediction method according to any one of [1] to [10], wherein in step (5), the determination result of the presence or absence of toxicity of the test compound is output together with the probability.
[12] The prediction method according to any one of [1] to [11], wherein the output in step (5) is a display in a tabular format.
[13] The prediction method according to any one of [1] to [12], wherein the toxicity is mutagenicity determined by a reverse mutation test using bacteria.
[14] A system for predicting the toxicity of a compound, comprising:
input means for inputting structural information of a test compound;
receiving means for receiving the structural information of the test compound input by a user;
first generation means for generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
first calculation means for calculating, using the three-dimensional molecular structure, the values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors;
second calculation means by which a toxicity prediction model calculates, using the molecular descriptor values, the probabilities that the test compound is toxic and non-toxic, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
output means for outputting the calculated probabilities.
[15] The system according to [14], comprising:
an input device that functions as the input means;
an arithmetic unit that functions as the first generation means, the first calculation means, and the second calculation means;
an output device that functions as the output means;
a main storage device; and
a control device that controls the system.
[16] The system according to [15], further including an auxiliary storage device in which the program is stored.
[17] A program for causing a computer to execute:
a process of receiving structural information of a test compound input by a user;
a process of generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
a process of calculating, using the three-dimensional molecular structure, the values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors;
a process in which a toxicity prediction model calculates, using the molecular descriptor values, the probabilities that the test compound is toxic and non-toxic, wherein the probability of toxicity and the probability of non-toxicity add up to 100%; and
a process of outputting the calculated probabilities.
[18] A computer-readable storage medium storing the program according to [17].

According to the present invention, subjecting the structural information of a compound input by the user to the specific processing described here yields a prediction result with high accuracy and reliability. In the prediction result, the probability of toxicity and the probability of non-toxicity add up to 100%. The prediction result is therefore easy to evaluate; that is, test compounds are easy to compare. For example, suppose a non-toxic compound is wanted and the present invention identifies one compound with a 54% probability of non-toxicity (in other words, a 46% probability of toxicity) and another with a 63% probability of non-toxicity (in other words, a 37% probability of toxicity). A simple comparison of the numerical values is then enough to select the latter compound as the stronger candidate.
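The comparison in this example can be written out in a few lines of Python. This is only a sketch: the class name, the function name, and the labels "compound A" and "compound B" are hypothetical, while the percentages are the ones quoted in the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    name: str
    p_toxic: float  # probability of toxicity, in percent

    @property
    def p_non_toxic(self) -> float:
        # The two probabilities are complementary and sum to 100%.
        return 100.0 - self.p_toxic


def rank_by_non_toxicity(predictions: List[Prediction]) -> List[Prediction]:
    """Order candidates from most to least likely to be non-toxic."""
    return sorted(predictions, key=lambda p: p.p_non_toxic, reverse=True)


candidates = [Prediction("compound A", 46.0), Prediction("compound B", 37.0)]
best = rank_by_non_toxicity(candidates)[0]
print(best.name, best.p_non_toxic)  # compound B 63.0
```

Because the two probabilities are complementary, ranking by either one gives the same order, which is what makes the direct numerical comparison possible.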

Flowchart of the toxicity prediction method of the present invention.
Configuration example of the toxicity prediction system.
Output example of the Ames mutagenicity prediction results of Example 1.
Ames mutagenicity prediction results of Example 2.
Structural information, in SMILES format, of the three pesticides (anthraquinone, diquat, and chlormequat) whose Ames mutagenicity was predicted in Example 3.
Three-dimensional molecular structure of anthraquinone, displayed using internal coordinates.
Three-dimensional molecular structure of diquat, displayed using internal coordinates.
Three-dimensional molecular structure of chlormequat, displayed using internal coordinates.
Values of (some of) the molecular descriptors used in Example 3.
Ames mutagenicity prediction results for the three pesticides (anthraquinone, diquat, and chlormequat).
Ames mutagenicity prediction results of Comparative Examples 1 and 2.

1. Method for predicting the toxicity of a compound
A first aspect of the present invention relates to a method for predicting the toxicity of a compound (hereinafter also referred to as the "prediction method of the present invention"). The prediction method of the present invention can evaluate the toxicity of a test compound (a compound whose toxicity is to be evaluated) without using cells or animals. Combining the present invention with toxicity evaluation using cells or animals enables extremely efficient toxicity evaluation.

In general, the "toxicity of a compound" is evaluated by indicators such as acute toxicity (oral), acute toxicity (dermal), acute toxicity (inhalation), skin corrosion, skin irritation, eye damage/irritation, genotoxicity, carcinogenicity, reproductive toxicity, neurotoxicity, respiratory sensitization, skin sensitization, germ cell mutagenicity, ecotoxicity, bioconcentration, and biodegradability. The evaluation indicator that defines the "toxicity of a compound" in the prediction method of the present invention is not particularly limited. In a preferred embodiment, the prediction method of the present invention is applied to predicting toxicity using, as the indicator, mutagenicity determined by the bacterial reverse mutation test (Ames test). The method and the judgment rules of the Ames test are specified in OECD TG 471.

In the present invention, the following steps (1) to (5) are performed.
(1) A step of receiving structural information of a test compound input by a user
(2) A step of generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized
(3) A step of calculating, using the three-dimensional molecular structure, the values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors
(4) A step in which a toxicity prediction model calculates, using the molecular descriptor values, the probabilities that the test compound is toxic and non-toxic, wherein the probability of toxicity and the probability of non-toxicity add up to 100%
(5) A step of outputting the calculated probabilities
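Steps (4) and (5), together with the normalization described in item [2], can be sketched in Python as follows. Min-max scaling and a logistic function are used here purely as stand-ins: the source does not specify the normalization, and it names other learners (for example, support vector machines and random forests) for the model. The descriptor names, ranges, weights, and bias are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict, Tuple


def min_max_normalize(values: Dict[str, float],
                      ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Scale each descriptor to [0, 1] using reference (min, max) ranges."""
    normalized = {}
    for name, value in values.items():
        low, high = ranges[name]
        normalized[name] = (value - low) / (high - low)
    return normalized


def toxicity_probabilities(normalized: Dict[str, float],
                           weights: Dict[str, float],
                           bias: float) -> Tuple[float, float]:
    """Return (P(toxic), P(non-toxic)) in percent; the pair sums to 100."""
    score = bias + sum(weights[name] * x for name, x in normalized.items())
    p_toxic = 100.0 / (1.0 + math.exp(-score))
    return p_toxic, 100.0 - p_toxic


# Hypothetical descriptor values, reference ranges, and model parameters.
descriptors = {"dipole_moment": 2.0, "homo_energy": -9.0}
ranges = {"dipole_moment": (0.0, 10.0), "homo_energy": (-12.0, -6.0)}
weights = {"dipole_moment": 1.5, "homo_energy": -0.8}

scaled = min_max_normalize(descriptors, ranges)
p_toxic, p_non_toxic = toxicity_probabilities(scaled, weights, bias=0.1)
print(f"toxic: {p_toxic:.1f}%  non-toxic: {p_non_toxic:.1f}%")
```

The point of the sketch is the output contract: the non-toxic probability is derived as the complement of the toxic one, so the pair adds up to 100% by construction, whatever model produces the score.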

The prediction method of the present invention is described in detail below with reference to the flowchart shown in FIG. 1. The prediction method of the present invention can be executed by, for example, the toxicity prediction system described later.

First, the structural information of the test compound input by the user is received (step (1)). The user prepares the structural information of the test compound in advance. The format of the structural information is, for example, SMILES (sometimes abbreviated as the smi format) (References 1 and 2), MDL MOL (Reference 3), SDF (Reference 3), or CDX (a binary file created by PerkinElmer, Inc.'s software ChemDraw (registered trademark) and ChemBioDraw (registered trademark)). The number of test compounds is one, or two or more; in the latter case, structural information is input for each test compound.
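When two or more test compounds are submitted, each one carries its own structural information. The following Python sketch shows one way such input might be received, assuming a plain-text layout with one SMILES string per line, optionally followed by a compound name. That layout, the function name, and the generated fallback names are assumptions made for illustration; the source does not describe the input at this level of detail.

```python
from typing import List, Tuple


def read_smiles_lines(text: str) -> List[Tuple[str, str]]:
    """Return (smiles, name) pairs, one per non-empty input line."""
    compounds = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        smiles = parts[0]
        # Fall back to a generated name when none is given.
        name = parts[1] if len(parts) > 1 else f"compound_{number}"
        compounds.append((smiles, name))
    return compounds


example = """O=C1c2ccccc2C(=O)c2ccccc12 anthraquinone
CCO
"""
for smiles, name in read_smiles_lines(example):
    print(name, smiles)
```

The sketch only separates records; it does not check that a string is valid SMILES, which a real implementation would leave to a cheminformatics toolkit.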

Suitable test compounds are organic compounds with a molecular weight of 800 or less, rather than macromolecular compounds such as proteins, antibodies, long-chain DNA or RNA, polystyrene, and polyacrylates. Compounds other than heavy metals are also preferable as test compounds.
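A simple pre-screen for the molecular-weight guideline can be sketched as below. The sketch assumes the molecular formula of each compound is already available as a plain string without brackets or charges, and it uses a deliberately small table of rounded standard atomic weights; elements outside the table (including metals) are rejected rather than guessed. The function names are hypothetical.

```python
import re
from typing import Dict

# Standard atomic weights (rounded) for a few common elements only.
ATOMIC_WEIGHTS: Dict[str, float] = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}
MAX_MOLECULAR_WEIGHT = 800.0  # upper limit suggested in the text


def molecular_weight(formula: str) -> float:
    """Molecular weight of a simple formula such as 'C14H8O2' (no brackets)."""
    total = 0.0
    matched = 0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"element not in table: {symbol}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        matched += len(symbol) + len(count)
    if not formula or matched != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return total


def is_suitable_test_compound(formula: str) -> bool:
    """True when the molecular weight is at or below the 800 limit."""
    return molecular_weight(formula) <= MAX_MOLECULAR_WEIGHT


print(round(molecular_weight("C14H8O2"), 3))  # anthraquinone
print(is_suitable_test_compound("C14H8O2"))
```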

Next, based on the received structural information, a three-dimensional molecular structure whose structure is optimized is generated (step (2)). For the structure optimization, for example, a semi-empirical molecular orbital method, an ab initio (non-empirical) molecular orbital method, or a density functional method can be used. The structure may also be optimized through a conformational search by a molecular mechanics method, a semi-empirical molecular orbital method, an ab initio molecular orbital method, or a density functional method. In this step, one or more optimized three-dimensional structures are generated by using these optimization techniques alone or in combination of two or more. Examples of optimizing the structure by combining the molecular mechanics method, the semi-empirical molecular orbital method, the ab initio molecular orbital method, and the density functional method are shown below. When deciding whether to combine two or more structure optimization techniques, it is advisable to consider the processing time and the load placed on each device (arithmetic unit, control device, main storage device, and so on).
<Combination example 1>
First, the structure is optimized by a semi-empirical molecular orbital method, and the resulting three-dimensional molecular structure is further optimized by an ab initio molecular orbital method or a density functional method.
<Combination example 2>
First, the structure is optimized by the Hartree-Fock method, and the resulting three-dimensional molecular structure is further optimized by the density functional method, the Møller-Plesset perturbation method, the configuration interaction method, or the coupled cluster (cluster expansion) method.
<Combination example 3>
First, the structure is optimized by a semi-empirical molecular orbital method; the resulting three-dimensional molecular structure is further optimized by the Hartree-Fock method; and that structure is then further optimized by the density functional method, the Møller-Plesset perturbation method, the configuration interaction method, or the coupled cluster (cluster expansion) method.
<Combination example 4>
The structure is optimized by the Quantum Mechanics/Molecular Mechanics (QM/MM) method or the our own N-layered integrated molecular orbital and molecular mechanics (ONIOM) method.

In general, many compounds adopt multiple stable three-dimensional structures depending on conditions. Generating a plurality of (that is, two or more) optimized three-dimensional structures reflects this and leads to more useful prediction results. In a preferred embodiment, two or more three-dimensional molecular structures optimized by a semi-empirical molecular orbital method are generated before proceeding to the next step.

“Structure optimization” means minimizing the energy of a molecule by changing the positions of the atoms constituting the molecule (Reference 4). The “semi-empirical molecular orbital method” calculates the energy of the electronic state of a molecule from an approximation of the Hartree-Fock equations that uses empirical parameters (Reference 5). The “non-empirical molecular orbital method” calculates the energy of the electronic state of a molecule by the Hartree-Fock method, the Møller-Plesset perturbation method, the configuration interaction method, the coupled cluster method, or the like (Reference 6). The “density functional method” calculates the energy of the electronic state of a molecule from a functional of the electron density (Reference 7). The “molecular mechanics method” calculates the potential energy of a molecule from a classical-mechanical internuclear potential energy function (Reference 8). “Conformational search” systematically generates a large number of conformations of a molecule and then optimizes the structure of each conformation (Reference 9). The Quantum Mechanics/Molecular Mechanics method and the our own N-layered integrated molecular orbital and molecular mechanics method calculate the energy of a molecule by combining (hybridizing) two or more of the above energy calculation methods.
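As a minimal illustration of the definition of structure optimization given above, the following Python sketch lowers the energy of a toy diatomic molecule by moving its two atoms along the energy gradient. The harmonic bond potential, the force constant `k`, the equilibrium length `r0`, and the step size are illustrative assumptions and do not come from this specification; an actual optimization would use one of the quantum chemical or molecular mechanics methods named above.

```python
def bond_energy(r, k=500.0, r0=1.10):
    """Toy harmonic bond energy E = 0.5 * k * (r - r0)**2 (assumed potential)."""
    return 0.5 * k * (r - r0) ** 2


def optimize_diatomic(x1, x2, k=500.0, r0=1.10, step=2e-4, tol=1e-8, max_iter=10000):
    """Steepest-descent minimization for two atoms on one axis.

    Returns the optimized positions and the final energy.
    """
    for _ in range(max_iter):
        r = abs(x2 - x1)
        de_dr = k * (r - r0)          # derivative of the energy w.r.t. bond length
        if abs(de_dr) < tol:          # converged: gradient is essentially zero
            break
        direction = 1.0 if x2 >= x1 else -1.0
        # Move each atom against its own gradient component.
        x1 += step * de_dr * direction
        x2 -= step * de_dr * direction
    return x1, x2, bond_energy(abs(x2 - x1), k, r0)


if __name__ == "__main__":
    a, b, energy = optimize_diatomic(0.0, 1.5)
    print(f"optimized bond length: {abs(b - a):.4f}, energy: {energy:.3e}")
```

The loop stops once the gradient falls below `tol`, which corresponds to reaching an energy minimum of the assumed potential.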

Computer software that can be used to generate a structure-optimized three-dimensional molecular structure includes, for example, CORINA Classic (Reference 10), SYBYL (registered trademark)-X Suite (Reference 11), Open Babel (Reference 12), The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Chem3D™ (Reference 15), ChemBio3D (registered trademark) (Reference 16), MarvinSketch (Reference 17), Balloon (Reference 18), TINKER (Reference 19), Amber (Reference 20), AmberTools (Reference 21), CHARMM (Reference 22), NAMD (Reference 23), BOSS (Reference 24), VEGA ZZ/VEGA Command line (Reference 25), GROMOS™ (Reference 26), GROMACS (Reference 27), MOPAC (registered trademark) (Reference 28), GAMESS (Reference 29), Firefly (Reference 30), Gaussian (registered trademark) (Reference 31), Spartan (Reference 32), Q-Chem (Reference 33), HyperChem (Reference 34), Molecular Operating Environment (Reference 35), BIOVIA (registered trademark) Discovery Studio (Reference 36), BIOVIA (registered trademark) Material Studio (Reference 37), ConfGen (Reference 38), LigPrep (Reference 39), Desmond Molecular Dynamics System (Reference 40), Jaguar (Reference 41), MacroModel (Reference 42), MOLGEN (Reference 43), CONFLEX (registered trademark) (Reference 44), OMEGA (Reference 45), VConf (Reference 46), Key3D (Reference 47), Molpro (Reference 48), Molcas (Reference 49), ADF (Reference 50), TURBOMOLE (Reference 51), PQS (Reference 52), MPQC (Reference 53), Dalton (Reference 54), LSDalton (Reference 55), COLUMBUS (Reference 56), NWChem (Reference 57), PSI4 (Reference 58), CFOUR (Reference 59), ACES (Reference 60), ORCA (Reference 61), SMASH (Reference 62), ABINIT-MP (Reference 63), NTChem (Reference 64), and PAICS (Reference 65). The structure-optimized three-dimensional molecular structure may also be generated by issuing commands to such software from outside.

Subsequently, the values of one or more molecular descriptors are calculated using the structure-optimized three-dimensional molecular structure (step (3)). At least one of the molecular descriptors used is a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, or a quantum chemical molecular descriptor. Because the calculation is based on a structure-optimized three-dimensional molecular structure, the resulting descriptor values (for three-dimensional, four-dimensional, and quantum chemical molecular descriptors, among others) are accurate and reliable, which makes it possible to output highly accurate prediction results.

Preferably, the values of two or more kinds of molecular descriptors selected from the group consisting of three-dimensional molecular descriptors, four-dimensional molecular descriptors, and quantum chemical molecular descriptors are calculated (for example, three-dimensional and four-dimensional molecular descriptors together; three-dimensional and quantum chemical molecular descriptors together; or three-dimensional, four-dimensional, and quantum chemical molecular descriptors together). For example, when three-dimensional molecular descriptors are used alone or in combination with other molecular descriptors (that is, four-dimensional and/or quantum chemical molecular descriptors), the values of preferably 20 or more, more preferably 30 or more, and even more preferably 50 or more three-dimensional molecular descriptors are calculated. The upper limit on the number of three-dimensional molecular descriptors whose values are calculated is not particularly limited; however, in view of processing time and the load placed on each device (arithmetic device, control device, main storage device, etc.), the upper limit can be set to, for example, 3,084. The same applies to four-dimensional molecular descriptors: the values of preferably 20 or more, more preferably 30 or more, and even more preferably 50 or more are calculated (the upper limit being, for example, 6,480). The same also applies to quantum chemical molecular descriptors: the values of preferably 3 or more, more preferably 5 or more, and even more preferably 10 or more are calculated (the upper limit being, for example, 171).

In addition to three-dimensional, four-dimensional, and/or quantum chemical molecular descriptors, the values of two-dimensional, one-dimensional, zero-dimensional, and other molecular descriptors may also be calculated. In this case as well, the combination of molecular descriptors whose values are calculated is not particularly limited. Examples of such combinations are as follows.
Example 1) A combination of one or more three-dimensional molecular descriptors, one or more four-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors
Example 2) A combination of one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors
Example 3) A combination of one or more three-dimensional molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors

The total number of molecular descriptors whose values are calculated is not particularly limited, but the values of preferably 800 or more, more preferably 1,000 or more, and even more preferably 1,500 or more molecular descriptors are calculated. In principle, increasing the number of molecular descriptors yields more reliable prediction results. On the other hand, too many molecular descriptors cause problems such as excessive processing time and excessive load on each device (arithmetic device, control device, main storage device, etc.), so the total number of molecular descriptors is preferably within the range of 1,000 to 10,000.

“Three-dimensional molecular descriptors” include the Radial Distribution Function descriptor, the Weighted Holistic Invariant Molecular descriptor, charged partial surface area, and the like (Reference 66). “Four-dimensional molecular descriptors” include Comparative Molecular Field Analysis, GRID, conformational descriptors, four-dimensional molecular fingerprints, and the like (References 66 and 67). “Quantum chemical molecular descriptors” include the highest occupied molecular orbital energy, the lowest unoccupied molecular orbital energy, ionization potential, electron affinity, dipole moment, and the like (References 66 and 68). “Two-dimensional molecular descriptors” include graph invariants such as the Walk Count Descriptor and the Path Count Descriptor, and topological descriptors such as the Topological Distance Matrix Descriptor and the Zagreb Index descriptor (Reference 66). “One-dimensional molecular descriptors” include the number of functional groups, the number of fragment structures, molecular fingerprints, and the like (Reference 66). “Zero-dimensional molecular descriptors” include molecular weight, the number of carbon atoms, the number of freely rotatable single bonds, and the like (Reference 66).
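To make the simplest descriptor class concrete, the following Python sketch computes two of the zero-dimensional descriptors named above (molecular weight and number of carbon atoms) from a molecular formula. The function name, the small atomic-weight table, and the formula parser are illustrative assumptions; the specification itself relies on dedicated descriptor software. The formula C7H3Cl2N used in the demonstration is that of dichlobenil, whose SMILES string appears in Example 1.

```python
import re

# Rounded standard atomic weights; only the elements needed for this sketch.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007,
                 "O": 15.999, "F": 18.998, "Cl": 35.45}


def zero_dimensional_descriptors(formula):
    """Return molecular weight, carbon count and atom count for a simple formula.

    Handles formulas such as "C7H3Cl2N" (no parentheses, charges or isotopes).
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    weight = sum(ATOMIC_WEIGHT[s] * n for s, n in counts.items())
    return {
        "molecular_weight": weight,
        "n_carbon": counts.get("C", 0),
        "n_atoms": sum(counts.values()),
    }


if __name__ == "__main__":
    print(zero_dimensional_descriptors("C7H3Cl2N"))
```

Descriptors of higher dimension need connectivity or three-dimensional coordinates and cannot be derived from the formula alone.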

Computer software that can be used to calculate the values of molecular descriptors includes, for example, DRAGON (Reference 69), CODESSA PRO (Reference 70), ADAPT (Reference 71), ADMET Predictor (Reference 72), CORINA Symphony (Reference 73), Pentacle (Reference 74), VolSurf+ (Reference 75), ISIDA Fragmentor (Reference 76), JOELib (Reference 77), Molconn-Z (Reference 78), PowerMV (Reference 79), PreADMET (Reference 80), PaDEL-Descriptor (Reference 81), cinfony (Reference 82), Chemopy (Reference 83), The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Open Babel (Reference 12), ToMoCoMD-CARDD (Reference 84), QuaSAR-Descriptor (Reference 85), Molecular Operating Environment (Reference 35), SYBYL (registered trademark)-X Suite (Reference 11), BIOVIA (registered trademark) Discovery Studio (Reference 36), BIOVIA (registered trademark) Material Studio (Reference 37), QikProp (Reference 86), Jaguar (Reference 41), MacroModel (Reference 42), VCharge (Reference 87), MarvinSketch (Reference 17), Spartan (Reference 32), MOPAC (registered trademark) (Reference 28), GAMESS (Reference 29), Gaussian (registered trademark) (Reference 31), HyperChem (Reference 34), Q-Chem (Reference 33), BOSS (Reference 24), Firefly (Reference 30), Molpro (Reference 48), Molcas (Reference 49), ADF (Reference 50), TURBOMOLE (Reference 51), PQS (Reference 52), MPQC (Reference 53), Dalton (Reference 54), LSDalton (Reference 55), COLUMBUS (Reference 56), NWChem (Reference 57), PSI4 (Reference 58), CFOUR (Reference 59), ACES (Reference 60), ORCA (Reference 61), SMASH (Reference 62), ABINIT-MP (Reference 63), NTChem (Reference 64), PAICS (Reference 65), and Mold2 (Reference 88). The values of the molecular descriptors may also be calculated by issuing commands to such software from outside.

Following step (3), the probabilities that the test compound is toxic and non-toxic are calculated (step (4)). A toxicity prediction model is used for this calculation. The toxicity prediction model calculates the probabilities from the molecular descriptor values obtained in step (3). One of the most important features of the present invention is that the probabilities are calculated so that the probability of toxicity and the probability of non-toxicity sum to 100%. When there are two or more test compounds, the probabilities are calculated for each test compound.
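The 100% constraint described above can be sketched in Python as follows. The specification does not state how the model's output is converted into probabilities; mapping a classifier's decision score through a logistic function (as in Platt scaling for support vector machines) is an assumption made here for illustration, and the parameters `a` and `b` are hypothetical calibration constants.

```python
import math


def toxicity_probabilities(score, a=1.0, b=0.0):
    """Map a decision score to (P(toxic), P(non-toxic)) in percent.

    The two values are complementary by construction, so they sum to 100%.
    """
    z = a * score + b
    # Numerically stable logistic function for large positive or negative z.
    if z >= 0:
        p_toxic = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p_toxic = e / (1.0 + e)
    return 100.0 * p_toxic, 100.0 * (1.0 - p_toxic)


if __name__ == "__main__":
    for s in (-2.0, 0.0, 3.5):
        toxic, non_toxic = toxicity_probabilities(s)
        print(f"score={s:+.1f}: toxic={toxic:.1f}%, non-toxic={non_toxic:.1f}%")
```

Because the second value is derived from the first, the pair can be compared directly between compounds without renormalization.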

The toxicity prediction model is preferably one constructed by machine learning using the normalized molecular descriptor values of a plurality of compounds whose toxicity or non-toxicity is known. Examples of machine learning methods are support vector machines, Bayesian networks, neural networks, AdaBoost, random forests, and active learning. Two or more of these may be used in combination. Computer software that can be used for machine learning includes, for example, LibSVM (Reference 89), TensorFlow™ (Reference 90), Chainer (registered trademark) (Reference 91), Jubatus (registered trademark) (Reference 92), Caffe (Reference 93), Theano (Reference 94), Torch (Reference 95), neon™ (Reference 96), MXNet (Reference 97), The Microsoft Cognitive Toolkit (Reference 98), R (Reference 99), MATLAB (registered trademark) (Reference 100), Mathematica (registered trademark) (Reference 101), SAS (registered trademark) (Reference 102), RapidMiner (registered trademark) (Reference 103), KNIME (registered trademark) (Reference 104), Weka (Reference 105), shogun-toolbox/shogun (Reference 106), Orange (Reference 107), Apache Mahout™ (Reference 108), scikit-learn (Reference 109), mlpy (Reference 110), XGBoost (Reference 111), and Deeplearning4j (Reference 112). Machine learning may also be executed by issuing commands to such software from outside.

Basically, the more known compounds are used to construct the toxicity prediction model, the more reliable the model becomes. Preferably 300 or more, more preferably 3,000 or more, and even more preferably 7,500 or more known compounds are used to construct the toxicity prediction model.

As step (4), the following two steps are preferably performed.
(4-1) A step of normalizing the values of the molecular descriptors
(4-2) A step of calculating the probabilities that the test compound is toxic and non-toxic using the normalized values

Step (4-1) makes the descriptor values of the test compound comparable with the corresponding molecular descriptors of the plurality of compounds whose toxicity or non-toxicity is known. For example, the same calculation (processing) that is used to normalize the molecular descriptor values of the known compounds is applied. Using the normalized values obtained in this step, the probabilities that the test compound is toxic and non-toxic are calculated in step (4-2).
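Step (4-1) can be sketched as follows, under the assumption that normalization means z-score standardization; the specification does not name a particular normalization scheme. The key point illustrated is that the scaling parameters are derived from the compounds of known toxicity and then reused unchanged for the test compound. The function names `fit_scaler` and `normalize` are hypothetical.

```python
import statistics


def fit_scaler(training_rows):
    """Compute per-descriptor mean and standard deviation from known compounds.

    training_rows: list of equal-length lists, one row per known compound.
    """
    columns = list(zip(*training_rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant descriptor has zero spread; fall back to 1.0 to avoid division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs


def normalize(row, scaler):
    """Apply the training-set scaling to one compound's descriptor values."""
    means, stdevs = scaler
    return [(v - m) / s for v, m, s in zip(row, means, stdevs)]


if __name__ == "__main__":
    known = [[1.0, 10.0], [3.0, 10.0]]      # placeholder descriptor values
    scaler = fit_scaler(known)
    print(normalize([5.0, 12.0], scaler))
```

Reusing the stored parameters keeps the test compound on the same scale the prediction model was trained on.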

The probabilities calculated in step (4) are output in a predetermined format (step (5)). Various output formats are possible; for example, the probabilities are displayed as a table or a graph. Preferably, they are output so that they can be read and displayed by general-purpose software such as Excel (registered trademark) (Reference 117), LibreOffice (Reference 118), or Apache OpenOffice™ (Reference 119). When there are two or more test compounds, the probabilities for each test compound are output. Typically the probabilities for all test compounds are displayed as a list, but the display mode is not limited to this.

Together with the probabilities, a judgment as to whether the test compound is toxic (typically “toxic” or “non-toxic”, or an equivalent expression) may be output. Such a judgment makes it possible, for example, to select candidate compounds (promising compounds expected to have low or no toxicity) from a plurality of test compounds more efficiently.
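A tabular output of the kind described above, readable by general-purpose spreadsheet software, might be produced as in the following sketch. CSV is chosen here only as one convenient format; the column names, the 50% decision threshold, and the compound names and probabilities in the demonstration are placeholder assumptions rather than values from this specification.

```python
import csv
import io


def results_to_csv(results, threshold=50.0):
    """Render (compound name, P(toxic) in percent) pairs as CSV text.

    A judgment column is added by comparing P(toxic) with the threshold.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["compound", "p_toxic_percent", "p_nontoxic_percent", "judgment"])
    for name, p_toxic in results:
        judgment = "toxic" if p_toxic >= threshold else "non-toxic"
        writer.writerow([name, f"{p_toxic:.1f}", f"{100.0 - p_toxic:.1f}", judgment])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(results_to_csv([("compound A", 12.5), ("compound B", 80.0)]))
```

Listing all test compounds in one table supports the side-by-side comparison that the judgment column is meant to speed up.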

In one embodiment of the present invention, a step of generating a chemical formula of the test compound (step (6)) is also performed, and in step (5) the chemical formula generated in that step and the probabilities calculated in step (4) are output in association with each other (for example, integrated into one table). Such output shows the relationship between the structure of a compound and its toxicity and is therefore a more useful prediction result. A chemical structural formula, a molecular formula, or the like can be adopted as the “chemical formula”, but a chemical structural formula is preferred because it allows the user to recognize the geometric structure of the compound. Computer software that can be used to generate the chemical formula includes, for example, The Chemistry Development Kit (Reference 13), RDKit (Reference 14), Open Babel (Reference 12), MedChem Designer™ (Reference 113), ChemBioDraw (registered trademark) (Reference 114), ChemDraw (registered trademark) (Reference 115), MarvinSketch (Reference 17), and BIOVIA (registered trademark) Draw (Reference 116). The chemical formula may also be generated by issuing commands to such software from outside.

The prediction results of the present invention are typically used in the next stage of toxicity evaluation. Specifically, compounds to be subjected to toxicity evaluation using cells or animals are selected on the basis of the prediction results. Using the present invention in this way makes highly efficient toxicity evaluation possible.

2. Toxicity Prediction System
FIG. 2 conceptually shows a configuration example of the toxicity prediction system of the present invention. The toxicity prediction system 1 in this example is a computer system including an input device 2, an arithmetic device 3, an output device 4, a main storage device 5, a control device 6, and an auxiliary storage device 7. The solid arrows in FIG. 2 indicate the direction of data flow, and the broken arrows indicate the direction of control signal flow. The toxicity prediction system of the present invention can also be constructed using any general-purpose computer.

The input device 2 is, for example, a keyboard, a mouse, or a touch panel; the user operates the input device to enter the structural information of one or more test compounds. The main storage device (main memory) 5 is a RAM and/or a ROM. Programs and data stored in the auxiliary storage device 7 are loaded into and held in the main storage device 5. The auxiliary storage device 7 is a hard disk drive, an optical disk device, an SSD, or the like. The system may be configured so that the program is installed from a computer-readable recording medium or from another computer or server on a network or in the cloud.

The control device 6 controls the other devices in accordance with the program loaded into the main storage device 5. The auxiliary storage device 7 can store the output of the computer system. The output device 4 is, for example, a display, through which the user can view the output of the computer system. The arithmetic device 3 reads data stored in the main storage device 5, performs operations based on the instructions sent from the control device 6, and returns the results to the main storage device 5.

3. Program, Storage Medium
The present invention also provides a computer program for use in the toxicity prediction system. The computer program of the present invention causes a computer to execute the following processes (i) to (v). The computer program of the present invention is provided, for example, stored on a computer-readable storage medium such as a CD (Compact Disc)-ROM, CD-R, CD-RW, DVD (Digital Versatile Disc), DVD-RAM, BD (Blu-ray (registered trademark) Disc), MO (Magneto Optical disc), SSD, magnetic tape, or any of various memory cards (USB flash memory, SD memory card, etc.), or in a form downloaded from a cloud computer or the like. It is also possible to store the computer program of the present invention in the auxiliary storage device of a computer connected via a network, or to transfer it to another computer through the network.
(i) A process of receiving structural information of a test compound input by a user
(ii) A process of generating a structure-optimized three-dimensional molecular structure based on the received structural information
(iii) A process of calculating the values of one or more molecular descriptors from the three-dimensional molecular structure
(iv) A process in which a toxicity prediction model calculates, using the values of the molecular descriptors, the probabilities that the test compound is toxic and non-toxic, the probability of toxicity and the probability of non-toxicity summing to 100%
(v) A process of outputting the calculated probabilities

<Example 1: Toxicity prediction of two pesticides>
In outline, the toxicity prediction method of the present invention (FIG. 1) was executed using the general-purpose computer system shown in FIG. 2. The structures of the compounds were optimized with the software GAMESS, which can execute the PM3 method, one of the semi-empirical molecular orbital methods. From the structure-optimized three-dimensional molecular structures, the values of 35 three-dimensional molecular descriptors, 4 quantum chemical molecular descriptors, 121 zero-dimensional molecular descriptors, 907 one-dimensional molecular descriptors, and 160 two-dimensional molecular descriptors were calculated. Except for some molecular descriptors (which were calculated with reference to References 68 and 120), the descriptor values were calculated using the software PaDEL-Descriptor (References 81 and 121). Separately, a toxicity prediction model was constructed by machine learning with a support vector machine, using the normalized molecular descriptor values of about 8,000 compounds of known Ames mutagenicity.

The structural information of two pesticides (dichlobenil and teflubenzuron) was input in SMILES format (Clc1cccc(Cl)c1C#N and Fc1cccc(F)c1C(=O)NC(=O)Nc1cc(Cl)c(F)c(Cl)c1F, respectively), and the chemical structural formulas of the compounds were integrated with the prediction results and output in tabular form. SMILES-format structural information is a one-dimensional (linear) notation of a compound's structure, and its syntax is well established (References 1 and 2). SMILES-format structure files can be created easily with existing software (for example, MarvinSketch (Reference 17), ChemDraw (registered trademark) (Reference 115), or BIOVIA (registered trademark) Draw (Reference 116)).
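To show how much structural information a SMILES string carries even before any three-dimensional structure is generated, the following Python sketch counts the heavy atoms in the two input strings given above. It is a deliberately limited tokenizer written for this illustration (organic-subset atoms only, no bracket atoms) and is not a substitute for the cheminformatics toolkits cited in this specification; the function name is hypothetical.

```python
import re
from collections import Counter

# "Cl" and "Br" are listed first so they are not split into "C" + "l" or "B" + "r".
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a bracket-free SMILES string.

    Lowercase (aromatic) symbols are folded into their element, e.g. "c" -> "C".
    """
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled by this sketch")
    return Counter(token.capitalize() for token in _ATOM.findall(smiles))


if __name__ == "__main__":
    dichlobenil = "Clc1cccc(Cl)c1C#N"
    teflubenzuron = "Fc1cccc(F)c1C(=O)NC(=O)Nc1cc(Cl)c(F)c(Cl)c1F"
    print(dict(heavy_atom_counts(dichlobenil)))
    print(dict(heavy_atom_counts(teflubenzuron)))
```

Ring-closure digits, bond symbols, and parentheses are skipped by the pattern, so only atom symbols contribute to the counts.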

FIG. 3 shows the results of predicting, for the two pesticides, the mutagenicity that is determined by a reverse mutation test using bacteria. The reverse mutation test using bacteria is called the Ames test. The two pesticides used as test compounds are known to have no Ames mutagenicity. As shown in FIG. 3, the probabilities of Ames mutagenicity for the two pesticides were calculated appropriately and with high accuracy. Moreover, because the probability of Ames mutagenicity and the probability of no Ames mutagenicity sum to 100%, compounds can be compared with one another easily.

<Example 2: Toxicity prediction of 724 pesticides (verification of prediction accuracy)>
To verify the prediction accuracy of the system of the present invention, the Ames mutagenicity of 724 pesticides whose Ames mutagenicity is known was predicted. The rate of agreement between the predictions of the system and the Ames test results was 657/724 × 100 = 90.7 (%), confirming that the prediction accuracy of the system is high (FIG. 4). In addition, the prediction system of the present invention was able to output a prediction result for all 724 pesticides. The system was thus shown to be highly practical.
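The agreement rate reported above is the number of pesticides for which prediction and Ames test result coincide, divided by the number tested. The short Python sketch below reproduces that arithmetic from the two counts stated in this example; the function name is an illustrative assumption.

```python
def agreement_rate(n_agree, n_total):
    """Percentage of compounds whose prediction matches the experimental result."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_agree / n_total


if __name__ == "__main__":
    rate = agreement_rate(657, 724)
    print(f"agreement: {rate:.1f}%  (disagreements: {724 - 657})")
```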

<Example 3: Comparison of prediction accuracy using three pesticides>
The Ames mutagenicity of three pesticides (anthraquinone, diquat, and chlormequat) was predicted by the method of the present invention. Points not specifically mentioned, such as the method of structure optimization, are the same as in the above examples.

FIG. 5 shows the SMILES-format structural information of the three pesticides whose Ames mutagenicity was evaluated in this example. FIGS. 6 to 8 show the three-dimensional molecular structures of the three pesticides, generated from the structural information, expressed in internal coordinates. Internal coordinates consist of bond lengths defined by the positions of two atoms, bond angles defined by the positions of three atoms, and torsion angles defined by the positions of four atoms. Bond lengths are given in angstroms; bond angles and torsion angles are given in degrees (°).
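The three kinds of internal coordinate defined above can be computed from Cartesian atom positions as in the following Python sketch. The helper names and the test points are illustrative assumptions; the torsion angle uses the standard atan2 formulation, and its sign depends on the chosen convention, so only magnitudes are discussed here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two atoms (angstroms if the inputs are in angstroms)."""
    return _norm(_sub(p1, p2))


def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 at the vertex."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion_angle(p1, p2, p3, p4):
    """Torsion (dihedral) angle p1-p2-p3-p4 in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    a, b, c, d = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0)
    print(bond_length(a, b), bond_angle(a, b, c), abs(torsion_angle(a, b, c, d)))
```

A full internal-coordinate listing repeats these three calculations for each atom relative to previously defined atoms.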

Based on the three-dimensional molecular structures, the values of 35 three-dimensional molecular descriptors, 4 quantum chemical molecular descriptors, 119 zero-dimensional molecular descriptors, 795 one-dimensional molecular descriptors, and 149 two-dimensional molecular descriptors were calculated. Some of the calculated values are shown in FIG. 9. The toxicity prediction model was constructed using the normalized molecular descriptor values of 9,719 compounds of known Ames mutagenicity.

The output of the prediction results for the three pesticides is shown in FIG. 10. For each pesticide, the probability that it is Ames mutagenic and the probability that it is not (the two probabilities sum to 100%) are calculated and output. The three pesticides used as test compounds have been shown by the Ames test to have no Ames mutagenicity.
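The complementary pair of probabilities can be illustrated with a two-class model. The sketch below assumes, hypothetically, that the model yields a real-valued score that is mapped to a probability with a logistic function; this passage does not specify the model type, and the names `toxicity_probabilities` and `predicted_label` are introduced here only for illustration.

```python
import math


def toxicity_probabilities(score):
    """Return (P(mutagenic), P(non-mutagenic)) in percent; the two sum to 100."""
    # Numerically stable logistic function.
    if score >= 0:
        p = 1.0 / (1.0 + math.exp(-score))
    else:
        e = math.exp(score)
        p = e / (1.0 + e)
    p_pos = 100.0 * p
    return p_pos, 100.0 - p_pos


def predicted_label(p_pos, p_neg):
    """Assign the class with the higher probability."""
    return "mutagenic" if p_pos > p_neg else "non-mutagenic"


# Hypothetical score for one compound; not a value from the specification.
p_pos, p_neg = toxicity_probabilities(-2.0)
print(f"mutagenic: {p_pos:.1f} %, non-mutagenic: {p_neg:.1f} %")
print(predicted_label(p_pos, p_neg))  # prints "non-mutagenic"
```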

As comparative examples, prediction results were obtained with different sets of calculated molecular descriptors.
<Comparative Example 1>
The values of 119 0D molecular descriptors, 795 1D molecular descriptors, and 149 2D molecular descriptors were calculated (the 35 3D molecular descriptors and the 4 quantum chemical molecular descriptors were excluded), and Ames mutagenicity was predicted by the same procedure as above. The difference from the above system (Example 3) is that the 35 3D molecular descriptors and the 4 quantum chemical molecular descriptors are not included.
<Comparative Example 2>
Based on a three-dimensional molecular structure that was not structurally optimized, the values of 29 3D molecular descriptors, 119 0D molecular descriptors, 795 1D molecular descriptors, and 149 2D molecular descriptors were calculated, and Ames mutagenicity was predicted by the same procedure as above. The differences from the above system (Example 3) are that the values of the 3D molecular descriptors are calculated from a three-dimensional structure that has not been structurally optimized, and that the 4 quantum chemical molecular descriptors are not included.
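The descriptor sets of Example 3 and the two comparative examples can be tallied as follows. The counts are those stated in the text; the dictionary layout and the name `total_descriptors` are only an illustrative way to organize them.

```python
# Number of descriptors of each class used in each configuration.
DESCRIPTOR_SETS = {
    "Example 3": {
        "3D": 35, "quantum chemical": 4, "0D": 119, "1D": 795, "2D": 149,
    },
    "Comparative Example 1": {
        "0D": 119, "1D": 795, "2D": 149,
    },
    "Comparative Example 2": {
        "3D (non-optimized structure)": 29, "0D": 119, "1D": 795, "2D": 149,
    },
}


def total_descriptors(name):
    """Total number of descriptors calculated in the named configuration."""
    return sum(DESCRIPTOR_SETS[name].values())


for name in DESCRIPTOR_SETS:
    print(name, total_descriptors(name))
# Example 3 1102
# Comparative Example 1 1063
# Comparative Example 2 1092
```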

The prediction results of Comparative Examples 1 and 2 are shown in FIG. 11. Compared with the output of Example 3 (FIG. 10), both Comparative Examples 1 and 2 give a higher probability of Ames mutagenicity for all three pesticides, indicating that they are inferior in prediction accuracy and correctness. Notably, in Comparative Examples 1 and 2, the probability that diquat is Ames mutagenic was higher than the probability that it is not, a prediction that contradicts the result of the Ames test.

According to the present invention, it is possible to predict the toxicity not only of known compounds but also of compounds to be developed in the future (including virtual compounds). It is also possible to predict the toxicity of compounds for which actual testing is impossible or difficult. The present invention is applicable to the toxicity evaluation of chemical products, pharmaceuticals, agricultural chemicals, veterinary drugs (for livestock and pets), cosmetics, detergents, dyes, inks, additives, and other materials for which the development of non-toxic or low-toxicity compounds is required.

The present invention is in no way limited to the description of the embodiments and examples of the invention given above. Various modifications that those skilled in the art can readily conceive, and that do not depart from the scope of the claims, are also included in the present invention. The contents of the papers, published patent applications, patent publications, and the like specified in this specification are incorporated herein by reference in their entirety.

<References>
1. Weininger D., "SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, February 1988, vol. 28, iss.1, p. 31-36, DOI: 10.1021 / ci00057a005
2. Weininger D., ET AL, "SMILES. 2. Algorithm for generation of unique SMILES notation", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, May 1989, vol. 29 , iss. 2, p. 97-101, DOI: 10.1021 / ci00062a008
3. Dalby A., ET AL, "Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 32, iss. 3, May 1992, p. 244-255. DOI: 10.1021 / ci00007a012
4. Schlegel HB, "Geometry optimization", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Ltd., May 2011, vol. 1, iss. 5, p.790-809 , DOI: 10.1002 / wcms.34
5. Zerner MC, ET AL (Eds), "Semiempirical Molecular Orbital Methods", Reviews in Computational Chemistry, (Germany), Wiley-VCH, Inc., 1991, vol. 2, p. 313-365, DOI: 10.1002 / 9780470125793.ch8
6. Friesner RA, "Ab initio quantum chemistry: Methodology and applications", Proceedings of the National Academy of Sciences of the United States of America, (The United States of America), The United States National Academy of Sciences, May 2005, vol 102, no. 19, p. 6648-6653, DOI: 10.1073 / pnas.0408036102
7. Parr RG, "Density Functional Theory", Annual Review of Physical Chemistry, (The United States of America), Annual Reviews, Inc., 1983, vol. 34, p. 631-656, https://doi.org /10.1146/annurev.pc.34.100183.003215
8. Bowen JP ET AL, Lipkowitz KB ET AL (Eds), "Molecular Mechanics: The Art and Science of Parameterization", Reviews in Computational Chemistry, (Germany), WILEY-VCH, Inc., 1991, vol. 2, p 81-97, DOI: 10.1002 / 9780470125793
9. Mazzanti A., ET AL, "Recent trends in conformational analysis", Wiley Interdisciplinary Reviews: Computational Molecular Science, (The United States of America), John Wiley & Sons, Inc., November 2011, vol. 2, iss. 4, p. 613-641, DOI: 10.1002 / wcms.96
10. CORINA Classic, Molecular Networks GmbH and Altamira, LLC., Website https://www.mn-am.com/products/corina
11. SYBYL®-X Suite, Certara USA, Inc., website https://www.certara.com/software/molecular-modeling-and-simulation/sybyl-x-suite/
12. O'Boyle NM, ET AL, "Open Babel: An open chemical toolbox", Journal of Cheminformatics, (The United States of America), Springer Publishing, 3:33, Oct 2011, DOI: 10.1186 / 1758-2946- 3-33
13. Steinbeck C., ET AL, "The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics", Journal of Chemical Information and Computer Sciences, (The United States of America), The American Chemical Society Publications, vol. 43, iss. 2, February 2003, p. 493-500, DOI: 10.1021 / ci025584y
14. RDKit, Landrum G., "RDKit Documentation Release 2017.03.1", Online Documentation, website http://www.rdkit.org/docs/Overview.html
15. Chem3D ^TM , PerkinElmer, Inc., website http://www.cambridgesoft.com/Ensemble_for_Chemistry/ChemOffice/ChemOfficeProfessional/
16. ChemBio3D®, PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
17. MarvinSketch, ChemAxon, website https://www.chemaxon.com/products/calculator-plugins/molecular-modelling/
18. Balloon, Vainio M., website http://users.abo.fi/mivainio/balloon/
19. TINKER, Ponder J., ET AL, website https://dasher.wustl.edu/tinker/
20. Amber, Case DA, ET AL, website http://ambermd.org/
21. AmberTools, website http://ambermd.org/#AmberTools
22. CHARMM, Karplus M., ET AL, website https://www.charmm.org/charmm/
23. NAMD, Theoretical and Computational Biophysics Group, Beckman Institute, University of Illinois, website http://www.ks.uiuc.edu/Research/namd/
24. BOSS, Jorgensen WL, ET AL, website http://zarbi.chem.yale.edu/software.html
25. VEGA ZZ / VEGA Command line, Pedretti A., ET AL: Website http://www.vegazz.net/
26. GROMOS ^TM , van Gunsteren WF, ET AL, website http://www.igc.ethz.ch/gromos.html
27. GROMACS, van der Spoel D., ET AL, website http://www.gromacs.org/
Website http://openmopac.net/
29. GAMESS, Schmidt MW, ET AL, Website http://www.msg.ameslab.gov/gamess/
30. Firefly, Granovsky AA, ET AL, website http://classic.chem.msu.su/gran/gamess/index.html
31. Gaussian (registered trademark), Gaussian, Inc., website http://gaussian.com/citation/
32. Spartan, Wavefunction, Inc., website https://www.wavefun.com/products/spartan.html
33. Q-Chem, Q-Chem, Inc., website http://www.q-chem.com/
34. HyperChem, Hypercube, Inc., website http://www.hyper.com/
35. Molecular Operating Environment, Chemical Computing Group ULC, website https://www.chemcomp.com/MOE-Molecular_Operating_Environment.htm
36. BIOVIA® Discovery Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-discovery-studio/
37. BIOVIA® Material Studio, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-materials-studio/
38. ConfGen, Schrodinger (R), LLC, Schrodinger Release 2017-1: New York, NY, 2017 Website https://www.schrodinger.com/confgen
39. LigPrep, Schrodinger (R), LLC, Schrodinger Release 2017-1: New York, NY, 2017, website https://www.schrodinger.com/ligprep
40. Desmond Molecular Dynamics System, DE Shaw Research, New York, NY, 2017. Website https://www.schrodinger.com/desmond
41. Jaguar, Schrodinger (R), LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/jaguar
42. MacroModel, Schrodinger (R), LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/macromodel
43. MOLGEN, Wassermann A., ET AL, website http://www.molgen.de/
44. CONFLEX (registered trademark), CONFLEX Corporation, website http://www.conflex.net/
45. OMEGA, OpenEye Scientific Software, website https://www.eyesopen.com/omega
46. VConf, VeraChem, LLC, website http://www.verachem.com/products/vconf/
47. Key3D, IMMD, Inc., website http://www.immd.co.jp/en/product_2.html
48. Molpro, TTI GmbH, website https://www.molpro.net/
49. Molcas, Veryazov. V. ET AL, website http://www.molcas.org/
50. ADF (R), Scientific Computing & Modeling NV, Website https://www.scm.com/product/adf/
51. TURBOMOLE, TURBOMOLE GmbH, Website http://www.turbomole.com/
52. PQS, Parallel Quantum Solutions, website http://www.pqs-chem.com/
53. MPQC, Valeev E. ET AL, website http://www.mpqc.org/
54. Dalton, Dalton / LSDalton developers, website http://daltonprogram.org
55. LSDalton, Dalton / LSDalton developers website http://daltonprogram.org
56. COLUMBUS, Lischka H. ET AL, Website https://www.univie.ac.at/columbus/
57. NWChem, Valiev M., ET AL, website http://www.nwchem-sw.org/index.php/Main_Page
58. PSI4, Sherrill CD, ET AL, website http://www.psicode.org/
59. CFOUR, Stanton JF, ET AL, website http://www.cfour.de/
60. ACES, Bartlett RJ, ET AL, Website http://www.qtp.ufl.edu/ACES/
61. ORCA, Neese, F., ET AL, Website https://orcaforum.cec.mpg.de/
62. SMASH, Ishimura K., Website http://smash-qc.sourceforge.net/
63. ABINIT-MP (registered trademark), Mochizuki Y., ET AL, website http://www.cenav.org/abinitmpopen1/
64. NTChem, Nakajima, T., ET AL, Website http://labs.aics.riken.jp/nakajimat_top/ntchem_e.html
65. PAICS, Ishikawa T., Website http://www.paics.net/index_e.html
66. Johann Gasteiger J., ET AL (Editor), "Chemoinformatics: A Textbook", WILEY-VCH Verlag GmbH & Co. KgaA, Weinheim, 2003, ISBN 3-527-30681-1
67. Wicker JGP, ET AL, "Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor", Journal of Chemical Information and Modeling, (The United States of America), The American Chemical Society Publications, vol. 56, iss 12, November 2016, p. 2347-2352, DOI: 10.1021 / acs.jcim.6b00565
68. Karelson M., ET AL, "Quantum-Chemical Descriptors in QSAR / QSPR Studies", Chemical Reviews, (The United States of America), The American Chemical Society Publications, vol. 96, iss. 3, May 1996, p 1027-1044, DOI: 10.1021 / cr950202r
69. DRAGON, Kode srl, website https://chm.kode-solutions.net/products_dragon.php
70. CODESSA PRO, CompuDrug International, Inc., website http://www.compudrug.com/codessa_pro
71. ADAPT, Jurs PC, ET AL, website http://research.chem.psu.edu/pcjgroup/adapt.html
72. ADMET Predictor, Simulations Plus, Inc., website http://www.simulations-plus.com/software/admet-property-prediction-qsar/
73. CORINA Symphony, Molecular Networks GmbH and Altamira, LLC., Website https://www.mn-am.com/products/corinasymphony
74. Pentacle, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/pentacle/
75. VolSurf +, Molecular Discovery Ltd, website http://www.moldiscovery.com/software/vsplus/
76. ISIDA Fragmentor, Varnek A., ET AL, website http://infochim.u-strasbg.fr/spip.php?rubrique41
77. JOELib, Zell A., ET AL, website http://www.ra.cs.uni-tuebingen.de/software/joelib/index.html
78. Molconn-Z, eduSoft, LC, website http://www.edusoft-lc.com/molconn/
79. PowerMV, Liu J., ET AL, website https://www.niss.org/research/software/powermv
80. PreADMET, Bioinformatics & Molecular Design Research Center (BMDRC), website https://preadmet.bmdrc.kr/preadmet-pc-version-2-0/
81. PaDEL-Descriptor, Yap CW, Website http://www.yapcwsoft.com/dd/padeldescriptor/
82. Cinfony, O'Boyle NM, ET AL, website http://cinfony.github.io/
83. Cao D.-S., ET AL, "ChemoPy: freely available python package for computational biology and chemoinformatics", Bioinformatics, (The United Kingdom), Oxford University Press, vol. 29, iss. 8, March 2013, p 1092-1094, DOI: https://doi.org/10.1093/bioinformatics/btt105
84. ToMoCoMD-CARDD, Ponce YM, Website http://tomocomd.com/
85. QuaSAR-Descriptor, Chemical Computing Group ULC, website https://www.chemcomp.com/journal/descr.htm
86. QikProp, Schrodinger (R), LLC, Schrodinger Release 2017-1: New York, NY, 2017, Website https://www.schrodinger.com/qikprop
87. VCharge, VeraChem, LLC, website http://www.verachem.com/products/vcharge/
88. Mold2, Hong, H., ET AL, Website https://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/ucm144528.htm
89.LibSVM, Chang C.-C., ET AL, website https://www.csie.ntu.edu.tw/~cjlin/libsvm/
90. TensorFlow ^TM , Google Inc., website https://www.tensorflow.org/
91. Chainer (R), Preferred Networks, Inc., website http://chainer.org/
92. Jubatus (registered trademark), Preferred Networks, Inc. and Nippon Telegraph and Telephone Corporation, website https://jubat.us/en/
93. Caffe, Jia Y., ET AL, website http://caffe.berkeleyvision.org/
94. Theano, Theano Development Team, website http://deeplearning.net/software/theano/
95. Torch, Ronan Collobert ET AL, website http://torch.ch/
96.neon ^TM , Nervana Systems, website https://www.nervanasys.com/technology/neon/
97. MXNet, website http://mxnet.io/
98. The Microsoft Cognitive Toolkit, Microsoft Corporation, website https://www.microsoft.com/en-us/cognitive-toolkit/
99. R (c), The R Foundation, website https://www.r-project.org/
100. MATLAB®, The MathWorks, Inc., website https://www.mathworks.com/products/matlab.html?s_tid=hp_products_matlab
101. Mathematica®, Wolfram Research, website http://www.wolfram.com/mathematica/
102. SAS®, SAS Institute Inc., website https://www.sas.com/en_us/home.html
103. RapidMiner (R), RapidMiner, Inc., website https://rapidminer.com/
104. KNIME (registered trademark), KNIME.com AG, website https://www.knime.org/
105. Witten IH, ET AL, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", (The United States of America), Morgan Kaufmann, Fourth Edition, 2016, ISBN13: 978-0128042915. Site http://www.cs.waikato.ac.nz/ml/weka/
106.shogun-toolbox / shogun, website http://www.shogun-toolbox.org/, Sonnenburg S., ET AL, "shogun-toolbox / shogun: Shogun 6.0.0-Baba Nobuharu", April 2017, DOI : 10.5281 / zenodo.556748
107. Orange, Demsar J., ET AL, "Orange: Data Mining Toolbox in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 14, Aug 2013, p. 2349-2353. Website https://orange.biolab.si/
108. Apache Mahout ^TM , The Apache Software Foundation, website http://mahout.apache.org/
109. scikit-learn, website http://scikit-learn.org/stable/. Pedregosa F., ET AL, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research, (The United States of America), JMLR, Inc. and Microtome Publishing, vol. 12, Oct 2011, p. 2825-2830
110.mlpy, Albanese D., ET AL, website http://mlpy.sourceforge.net/
111. Chen T., ET AL, "XGBoost: A Scalable Tree Boosting System", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug 2016, DOI: 10.1145 / 2939672.2939785
112. Deeplearning4j, Skymind, Deeplearning4j Development Team, "Deeplearning4j: Open-source distributed deep learning for the JVM", Apache Software Foundation License 2.0. Website https://deeplearning4j.org/
113. MedChem Designer ^™ , Simulations Plus, Inc., website http://www.simulations-plus.com/software/medchem-designer/
114. ChemBioDraw (R), PerkinElmer, Inc., website https://www.cambridgesoft.com/Ensemble_for_Chemistry/details/Default.aspx?fid=17&pid=660
115. ChemDraw®, PerkinElmer, Inc., website http://www.cambridgesoft.com/software/overview.aspx
116. BIOVIA® Draw, Dassault Systemes, website http://accelrys.com/products/collaborative-science/biovia-draw/
117. Excel (registered trademark), Microsoft Corporation, website https://products.office.com/en-us/excel
118. Libre Office, The Document Foundation, website https://www.libreoffice.org/
119. Apache Open Office™, The Apache Software Foundation, website https://www.openoffice.org/
120. Sakuratani Y, ET AL, "Molecular size as a limiting characteristic for bioconcentration in fish", Journal of Environmental Biology, January 2008, vol. 29, iss. 1, p. 89-92. Website http://www.jeb.co.in/journal_issues/200801_jan08/paper_15.pdf
121. Yap CW, "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints", Journal of Computational Chemistry, Volume 32, Issue 7, p. 1466-1474, May 2011. DOI: 10.1002/jcc.21707. Supporting Information: JCCT_21707_sm_suppinformation.xls, website http://onlinelibrary.wiley.com/doi/10.1002/jcc.21707/suppinfo

DESCRIPTION OF SYMBOLS

1 Toxicity prediction system
2 Input device
3 Arithmetic device
4 Output device
5 Main storage device
6 Control device
7 Auxiliary storage device

Claims

A method for predicting the toxicity of a compound, comprising:
(1) receiving structural information of a test compound input by a user;
(2) generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
(3) calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
(4) calculating, by a toxicity prediction model using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity sum to 100%; and
(5) outputting the calculated probability.
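
As an illustration of step (3), the following minimal sketch computes one simple three-dimensional descriptor, the mass-weighted radius of gyration, from a set of atomic coordinates. The choice of descriptor, the function name `radius_of_gyration`, the mass table, and the example coordinates are all assumptions made for illustration; the claims do not specify which descriptors are used, and the geometry here is not the output of any structure optimization.

```python
import math

# Hypothetical mass table covering only the elements used in the example.
ATOMIC_MASS = {"H": 1.008, "O": 15.999}

def radius_of_gyration(atoms):
    """Mass-weighted radius of gyration of (element, x, y, z) tuples."""
    total_mass = sum(ATOMIC_MASS[el] for el, *_ in atoms)
    # Center of mass along each Cartesian axis.
    center = [
        sum(ATOMIC_MASS[el] * xyz[i] for el, *xyz in atoms) / total_mass
        for i in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(
        ATOMIC_MASS[el] * sum((xyz[i] - center[i]) ** 2 for i in range(3))
        for el, *xyz in atoms
    ) / total_mass
    return math.sqrt(msd)

# Illustrative water-like geometry in angstroms (assumed values).
water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
rg = radius_of_gyration(water)
```

Because the value depends only on interatomic geometry, it is unchanged when the whole molecule is translated, which is why such descriptors require a three-dimensional structure rather than a connection table alone.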

The prediction method according to claim 1, wherein step (4) includes the following steps:
(4-1) normalizing the values of the molecular descriptors; and
(4-2) calculating the probability of the presence or absence of toxicity of the test compound using the normalized values.
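
Steps (4-1) and (4-2) can be sketched as follows, assuming z-score normalization against a set of reference compounds and a logistic function as a stand-in for the toxicity prediction model. The function names, the weights, the bias, and the descriptor values are hypothetical; the claims do not fix the normalization scheme or the model form.

```python
import math
from statistics import mean, pstdev

def fit_scaler(rows):
    """Return per-descriptor (mean, std) computed from reference compounds."""
    columns = list(zip(*rows))
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]

def normalize(values, scaler):
    """Step (4-1): z-score each descriptor value."""
    return [(v - m) / s for v, (m, s) in zip(values, scaler)]

def toxicity_probabilities(z, weights, bias):
    """Step (4-2): return (toxic %, non-toxic %), which sum to 100."""
    score = bias + sum(w * x for w, x in zip(weights, z))
    p_toxic = 1.0 / (1.0 + math.exp(-score))
    return 100.0 * p_toxic, 100.0 * (1.0 - p_toxic)

# Hypothetical descriptor values for three reference compounds.
reference = [[1.0, 200.0], [2.0, 220.0], [3.0, 240.0]]
scaler = fit_scaler(reference)
z = normalize([2.0, 220.0], scaler)
p_toxic, p_nontoxic = toxicity_probabilities(z, weights=[0.8, -0.3], bias=0.0)
```

Computing the non-toxic probability as the complement of the toxic probability is one straightforward way to guarantee that the two values add up to 100%.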

The prediction method according to claim 1 or 2, wherein the three-dimensional molecular structure is one or more three-dimensional molecular structures selected from the group consisting of:
a three-dimensional molecular structure whose structure is optimized by a semi-empirical molecular orbital method;
a three-dimensional molecular structure whose structure is optimized by a non-empirical (ab initio) molecular orbital method;
a three-dimensional molecular structure whose structure is optimized by a density functional method; and
a three-dimensional molecular structure obtained by a conformational search using a molecular mechanics method, a semi-empirical molecular orbital method, a non-empirical molecular orbital method, or a density functional method, and whose structure is then optimized by any combination of the molecular mechanics method, the semi-empirical molecular orbital method, the non-empirical molecular orbital method, and the density functional method.

The prediction method according to claim 1 or 2, wherein the three-dimensional molecular structure is two or more three-dimensional molecular structures whose structures are optimized by a semi-empirical molecular orbital method.

The prediction method according to any one of claims 1 to 4, wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors and one or more quantum chemical molecular descriptors.

The prediction method according to claim 1, wherein the one or more molecular descriptors include one or more three-dimensional molecular descriptors, one or more quantum chemical molecular descriptors, one or more two-dimensional molecular descriptors, one or more one-dimensional molecular descriptors, and one or more zero-dimensional molecular descriptors.

The prediction method according to any one of claims 1 to 6, wherein the toxicity prediction model is constructed by machine learning using normalized molecular descriptor values of a plurality of compounds whose presence or absence of toxicity is known.

The prediction method according to claim 7, wherein the machine learning is at least one machine learning selected from the group consisting of a support vector machine, a Bayesian network, a neural network, Adaboost, random forest, and active learning.

The prediction method according to claim 1, further comprising generating a chemical formula of the test compound, wherein in step (5), the generated chemical formula and the probability are output in association with each other.

The prediction method according to claim 1, wherein there are two or more test compounds, and in step (5), the probability is output for each test compound.

The prediction method according to any one of claims 1 to 10, wherein in step (5), a determination result of the presence or absence of toxicity of the test compound is output together with the probability.

The prediction method according to any one of claims 1 to 11, wherein the output of step (5) is a display in a tabular format.

The prediction method according to any one of claims 1 to 12, wherein the toxicity is mutagenicity determined by a reverse mutation test using bacteria.

A system for predicting the toxicity of a compound, comprising:
input means for inputting structural information of a test compound;
receiving means for receiving the structural information of the test compound input by a user;
first generation means for generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
first calculation means for calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
second calculation means by which a toxicity prediction model calculates, using the values of the molecular descriptors, the probability of the presence or absence of toxicity of the test compound, wherein the probability of toxicity and the probability of non-toxicity sum to 100%; and
output means for outputting the calculated probability.

The system according to claim 14, comprising:
an input device that functions as the input means;
an arithmetic device that functions as the first generation means, the first calculation means, and the second calculation means;
an output device that functions as the output means;
a main storage device; and
a control device that controls the system.

The system according to claim 15, further comprising an auxiliary storage device in which the program is stored.

A program that causes a computer to execute:
a process of receiving structural information of a test compound input by a user;
a process of generating, based on the received structural information, a three-dimensional molecular structure whose structure is optimized;
a process of calculating, using the three-dimensional molecular structure, values of one or more molecular descriptors including at least one molecular descriptor selected from the group consisting of a three-dimensional molecular descriptor, a four-dimensional molecular descriptor, and a quantum chemical molecular descriptor;
a process of calculating the probability of the presence or absence of toxicity of the test compound using the values of the molecular descriptors, wherein the probability of toxicity and the probability of non-toxicity sum to 100%; and
a process of outputting the calculated probability.

A computer-readable storage medium storing the program according to claim 17.