JP5484946B2

JP5484946B2 - Fast graph match search apparatus and method for evaluating similarity between molecules

Info

Publication number: JP5484946B2
Application number: JP2010031526A
Authority: JP
Inventors: 剛白井
Original assignee: EDUCATIONAL CORP KANSAI BUNRI SOUGOUGAKUEN
Current assignee: EDUCATIONAL CORP KANSAI BUNRI SOUGOUGAKUEN
Priority date: 2010-02-16
Filing date: 2010-02-16
Publication date: 2014-05-07
Anticipated expiration: 2030-02-16
Also published as: JP2011170444A; WO2011102384A1

Description

The present invention relates to a fast graph match search apparatus and method that use a fast graph match search algorithm to find the atom correspondence between two molecules, virtually superimpose the two molecules based on that correspondence, and compute and evaluate the similarity between them.

In the molecular design of pharmaceuticals and agrochemicals, the molecular structures of two molecules are frequently superimposed in a virtual space. FIG. 13 schematically shows such a superposition of two molecules (cholic acid [CHD] and corticosterone [COR]) in a virtual space. However, searching for and determining the optimal superposition of two molecules is a very difficult problem.

For example, consider the problem of superimposing molecule A and molecule B where molecule A is fixed as "CMP", and ask which target molecules B are searchable by superposition. Here, "searchable" means that an exhaustive search is expected to finish if it is carried out for about 50 years at 8 working hours a day with a two-day weekend. With calculation by hand, when molecule B is cysteine, about 1.3 × 10^7 superpositions can be evaluated to find the optimal one (FIG. 14(a)). Similarly, with a desktop computer, when molecule B is diaminopimelate, about 1.5 × 10^15 superpositions can be evaluated to find the optimal one (FIG. 14(b)). Even with a supercomputer, the limit is reached when molecule B is AMP, for which 8.3 × 10^21 superpositions must be evaluated to find the optimal one (FIG. 14(c)). Finding the optimal superposition of molecule A and molecule B by exhaustive search therefore takes an enormous amount of time and is not necessarily a practical method.

Accordingly, in graph matching, which finds the atom correspondence between two molecules and realizes their optimal superposition based on that correspondence, a heuristic method is required that runs fast while tolerating some errors.

The following six documents are cited as prior art relating to compound search algorithms.

Japanese Patent No. 4001657
Japanese Patent No. 3928000
International Application No. 01/097094
International Application No. 02/41184
International Application No. 2007/004643

J. Computer-Aided Molecular Design, 13: 499-512, 1999, "Estimation of active conformations of drugs by a new molecular superposing procedure"

An object of the present invention is to provide a graph match search apparatus and method that, for molecular graphs in which atoms are represented as nodes and chemical bonds as edges, find the atom correspondence between two molecules and superimpose the two molecules based on that correspondence at high speed.

The present invention has been made to achieve the above object. The fast graph match search apparatus for evaluating similarity between molecules according to claim 1 of the present invention
reads, from a storage unit, coordinate data for each of the atoms (Ai, Aj, ...) constituting a first molecule A and coordinate data for each of the atoms (Bk, Bl, ...) constituting a second molecule B; determines, in accordance with a computer program loaded into a calculation unit and in a virtual memory space constructed in the calculation unit and the storage unit, a correspondence (m(Ai) = Bk) between the atoms (Ai, Aj, ...) of the first molecule A and the atoms (Bk, Bl, ...) of the second molecule B and performs superposition (i, j, k, l are all natural numbers); and outputs to an output unit data on the optimal interatomic correspondence between the first molecule A and the second molecule B and on the similarity between the first molecule A and the second molecule B. The apparatus is characterized by comprising:
first calculation means for obtaining, for every pair of an atom Ai and an atom Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a first similarity index S1(Ai, Bk) indicating how similar the surrounding environments are as seen from each atom of the pair Ai, Bk;
second calculation means for obtaining, for every such pair of an atom Ai and an atom Bk, a second similarity index S2(Ai, Bk) calculated by summing the first similarity index S1(Aj, Bl) over all pairs of surrounding atoms Aj, Bl that lie at equal bond distances from the respective atoms of the pair Ai, Bk, wherein the first similarity index S1(Aj, Bl) is further multiplied by a coefficient before being summed if the surrounding atoms Aj, Bl at equal bond distances are the same element;
third calculation means for obtaining, for every such pair of an atom Ai and an atom Bk, a third similarity index S3(Ai, Bk) whose value is the graph match score M(A, B) calculated when an overall correspondence is created by starting from the pair Ai, Bk and successively associating atoms of the first molecule A with atoms of the second molecule B, on the condition that, when the correspondence is created, an atom directly bonded to an already associated atom is selected next and priority is given to selecting a pair with a high second similarity index S2;
fourth calculation means for obtaining the graph match score M(A, B) of the overall correspondence obtained by starting from the starting pair of atoms (Ai, Bk) for which the third calculation means calculated the maximum S3(Ai, Bk) and repeatedly associating the pair with the largest S3(Aj, Bl) among the not yet associated pairs of atoms until no pair of atoms that can be associated remains; and
fifth output means for outputting, if the graph match score M(A, B) from the fourth calculation means is greater than a threshold, the interatomic correspondence and the graph match score M(A, B) calculated by the fourth calculation means for the first molecule A and the second molecule B.

According to the present invention, for molecular graphs in which atoms are represented as nodes and chemical bonds as edges, the optimal superposition can be found quickly and accurately when the atoms of two molecules are matched and the two molecules are superimposed based on that correspondence.

FIG. 1 is a diagram showing an example of the configuration of a computer system that implements the fast algorithm for molecular structures by graph matching according to an embodiment of the present invention.
FIG. 2 is a flowchart of a program for superimposing molecular structures with the fast graph match search algorithm according to the embodiment of the present invention and for displaying the result.
FIG. 3 shows a diagram schematically illustrating the atoms (nodes) and bonds (edges) of molecule A and molecule B (FIG. 3(1)), and an example of the molecular graph match score calculated for the molecules A and B shown in FIG. 3(1) (FIG. 3(2)).
FIG. 4 is a flowchart of the fast graph match search algorithm that obtains, in step S10 shown in FIG. 2, the atom correspondence {m(Ai)} between molecule A and molecule B and the graph match score M(A, B).
FIG. 5 is a diagram showing part of molecule A and part of molecule B, for explaining the calculation of the index S1 for atoms Ai and Bk.
FIG. 6 is a diagram showing part of molecule A and part of molecule B, for explaining the calculation of the index S2 for atoms Ai and Bk.
FIG. 7 is a diagram showing part of molecule A and part of molecule B, for explaining the calculation of the index S3 for atoms Ai and Bk.
FIG. 8 is a diagram showing part of molecule A and part of molecule B, for explaining the calculation of the graph match score M0 in step S1008 of FIG. 4.
FIG. 9 is a diagram showing part of molecule A and part of molecule B, for explaining the fine adjustment in step S1012 of FIG. 4.
FIG. 10 is a diagram schematically showing how the torsion angles are adjusted so that molecule A and molecule B are displayed more closely superimposed.
FIG. 11 shows the atomic coordinates output after superposition using the set {m(Ai)} corresponding to the query atoms {Ai} (FIG. 11(a)), and the atomic coordinates of the common skeleton output from the query structure (FIG. 11(b)).
FIG. 12 shows, for problems with a total number of combinations of 10^12 or less, for which the maximum score was obtained by exhaustive search and compared with the solution of the fast graph match search algorithm according to the embodiment of the present invention, a graph of calculation time against the number of exhaustive-search combinations (FIG. 12(1)), a graph of the correct-answer rate against the number of exhaustive-search combinations (FIG. 12(2)), and a graph of the relationship between the difference from the correct score and the cumulative correct-answer rate (FIG. 12(3)).
FIG. 13 is a diagram schematically showing the superposition of two molecules (cholic acid [CHD] and corticosterone [COR]) in a virtual space.
FIG. 14 is a diagram showing, for the problem of superimposing molecule A and molecule B with molecule A fixed as "CMP", examples of target molecules B that are searchable by superposition.

Preferred embodiments of the present invention are described below with reference to the drawings.

The fast algorithm for molecular structures by graph matching according to this embodiment is executed on a computer. It is realized by running, on the computer, a program written in a suitable programming language such as C, and by loading coordinate data for the atoms constituting various molecules (described later) into a virtual memory space constructed on the computer.

FIG. 1 shows an example of the configuration of a computer system 2 that implements the fast algorithm for molecular structures by graph matching according to this embodiment. The computer system 2 consists of an output unit 12 such as a display, input units such as a keyboard 16 and a mouse 18, and a central processing unit 14 that includes a calculation unit, a storage unit, a communication control unit, and the like. The central processing unit 14 is configured so that it can connect to an external server 8 and an external database 10 via an external network such as the Internet 4 and exchange data with them.

The atomic coordinate data for the various molecules used in this embodiment are in PDB (Protein Data Bank) format and are usually provided by external commercial or public databases 10 and the like. For example, PDB-format atomic coordinate data for various molecules are downloaded from an external commercial or public database 10 via the external network 4 and stored in the storage unit attached to the computer system 2. These data are read from the storage unit and used when the processing of the flowchart shown in FIG. 2 is executed.
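
As an illustration of the input step, the following Python sketch reads element symbols and coordinates from PDB-format `ATOM`/`HETATM` records using the fixed column layout of that format. The function name `read_pdb_atoms`, the fallback to the first letter of the atom name when the element columns are blank, and the sample record are assumptions made for this example; the patent does not describe its own reader.

```python
def read_pdb_atoms(lines):
    """Return a list of (element, (x, y, z)) from PDB ATOM/HETATM records."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 31-38, 39-46 and 47-54 hold x, y and z in angstroms.
            x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
            # Columns 77-78 hold the element symbol. The fallback to the first
            # letter of the atom name is a crude assumption of this sketch.
            element = line[76:78].strip() or line[12:16].strip()[:1]
            atoms.append((element, (x, y, z)))
    return atoms

# A made-up HETATM record, built with a format string so the columns line up.
sample = "HETATM%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s" % (
    1, "C1", "LIG", "A", 1, 1.0, 2.5, -3.25, 1.0, 0.0, "C")
atoms = read_pdb_atoms([sample, "END"])
```

Only the coordinates and the element are kept here because those are the quantities the later steps (bond detection and the similarity indices) rely on.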

1. Superposition of molecular structures by the fast graph match search algorithm
FIG. 2 is a flowchart of the superposition of molecular structures by the fast graph match search algorithm according to this embodiment and of the associated processing. The superposition processing is described with reference to FIG. 2. First, the PDB-format atomic coordinates of one of the molecules to be superimposed (molecule A) are read (step S02). Based on the atomic coordinates read, the bond distances, bond orders, and rotatable bonds of molecule A are set (step S04). How bond distances, bond orders, and rotatable bonds are set is described later.

In parallel with steps S02 and S04, the PDB-format atomic coordinates of the other molecule to be superimposed (molecule B) are read (step S06). There may be more than one molecule B. Next, the bond distances, bond orders, and rotatable bonds are set for one of the molecules B (step S08).

The fast graph match search algorithm is then run to obtain the correspondence {m(Ai)} from the atoms (Ai) of molecule A to the atoms (Bk) of molecule B and the graph match score M(A, B) for that correspondence (i and k are natural numbers) (step S10). Here, the graph match score M(A, B) is an index of how good the match is in graph matching, which finds the atom correspondence between two molecules and realizes their optimal superposition based on that correspondence. The graph match score M(A, B), the correspondence {m(Ai)}, and the fast graph match search algorithm are each described in detail later.

It is then checked whether the graph match score M(A, B) is greater than a threshold (step S12). A score greater than the threshold means that the correspondence {m(Ai)} used for the superposition that achieves this score is sufficiently appropriate (Yes in step S12). In that case, the torsion angles of molecule B relative to molecule A are adjusted (step S14), and the atom alignment and the structural superposition of molecule A and molecule B are output (step S16). The adjustment of the torsion angles and the output of the atom alignment and structural superposition are also described later.

Next, it is determined whether there is another molecule B (step S18). If there is (Yes in step S18), the processing from the setting of bond distances, bond orders, and rotatable bonds (step S08) onward is repeated for the next molecule B.

When no molecule B remains (No in step S18), the basic skeleton (or common skeleton) is output to the output unit 12 (step S20) and the processing ends.
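
The control flow of steps S06 to S18 can be pictured as a screening loop over the candidate molecules B. The Python sketch below shows only that loop; the names `screen` and `dummy_match` are hypothetical, the matcher is passed in as a function, and the torsion adjustment and output of steps S14 and S16 are reduced to collecting the hits.

```python
def screen(query, candidates, match, threshold):
    """Keep every candidate whose graph match score exceeds the threshold."""
    hits = []
    for b in candidates:                      # loop over molecules B (S08 to S18)
        mapping, score = match(query, b)      # step S10: correspondence and score
        if score > threshold:                 # step S12
            hits.append((b, mapping, score))  # stands in for steps S14 and S16
    return hits

def dummy_match(a, b):
    """Placeholder matcher for the demo: scores by shared characters.
    This is NOT the patent's algorithm, only something to drive the loop."""
    shared = sorted(set(a) & set(b))
    return {x: x for x in shared}, len(shared)

hits = screen("CCO", ["CCN", "OCC", "NNN"], dummy_match, threshold=1)
```

Passing the matcher as a parameter keeps the loop independent of how the correspondence and score are actually computed in step S10.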

2. Setting of bond distance, bond order, and rotatable bonds
This section describes the "setting of bond distance, bond order, and rotatable bonds" performed in steps S04 and S08 of FIG. 2.

(2.1) Bond distance
In a molecular structure given as PDB-format data, the bonds between atoms may not be defined. In this embodiment, therefore, a chemical bond is set in the data between atom i and atom j of a molecule whenever their interatomic distance is shorter than 2.00 Å (i and j are natural numbers). This "interatomic distance" is calculated from the atomic coordinates read from the PDB data. Furthermore, for any two atoms of a molecule, the number of chemical bonds connecting them is called their "bond distance". If several paths connect the two atoms, the smallest number is taken. By extending the bonds one at a time, a bond distance is set between all atoms of a molecule.
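
The two rules above (a bond wherever two atoms are closer than 2.00 Å, and bond distance as the smallest number of bonds between two atoms) can be sketched in Python as follows. The function names and the toy coordinates are assumptions for illustration; "extending the bonds one at a time" is realised here as a breadth-first search from each atom, which is one natural reading of the text.

```python
from collections import deque
from math import dist

BOND_CUTOFF = 2.00  # angstroms, the threshold stated in section (2.1)

def detect_bonds(coords, cutoff=BOND_CUTOFF):
    """Adjacency list: atoms i and j are bonded if they are closer than cutoff."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def bond_distances(adj):
    """table[i][j] = fewest bonds on any path from i to j (None if unreachable)."""
    n = len(adj)
    table = [[None] * n for _ in range(n)]
    for start in range(n):
        table[start][start] = 0
        queue = deque([start])
        while queue:                      # breadth-first search, one bond at a time
            u = queue.popleft()
            for v in adj[u]:
                if table[start][v] is None:
                    table[start][v] = table[start][u] + 1
                    queue.append(v)
    return table

# Toy input: four atoms on a line, 1.5 angstroms apart (not real molecule data).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
adj = detect_bonds(coords)
d = bond_distances(adj)
```

Breadth-first search visits atoms in order of increasing bond count, so the first value written for a pair is already the minimum over all connecting paths.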

(2.2) Bond order
In a molecular structure given as PDB-format data, the bond orders between atoms are not defined, and hydrogen atoms are generally not included. The bond order is therefore determined from the interatomic distance according to the rules shown in Table 1 below.

(2.3) Rotatable bonds
For every directly bonded pair of atoms (atom i and atom j), the definition process of "(2.1) Bond distance" above is executed as if the bond between that pair did not exist. If, as a result, no bond distance is set between the pair (that is, no other path connects atom i and atom j) and the bond between atom i and atom j is a single bond, that bond is set as a "rotatable bond".
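
The rotatable-bond rule can be sketched as a connectivity test with one bond ignored. In the Python sketch below, the function names are hypothetical and the bond orders are supplied as an input dictionary, because the distance rules of Table 1 are not reproduced in this text.

```python
from collections import deque

def reachable(adj, src, dst, skip):
    """Breadth-first search from src to dst that ignores the bond `skip`."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj[u]:
            if frozenset((u, v)) == skip or v in seen:
                continue
            seen.add(v)
            queue.append(v)
    return False

def rotatable_bonds(adj, bond_order):
    """Bonds that are single and whose removal leaves i and j unconnected."""
    result = set()
    for i, neighbours in enumerate(adj):
        for j in neighbours:
            if i < j:
                bond = frozenset((i, j))
                if bond_order[bond] == 1 and not reachable(adj, i, j, bond):
                    result.add(bond)
    return result

# Toy graph: a three-membered ring (atoms 0, 1, 2) with a tail 2-3-4.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]
order = {frozenset(b): 1 for b in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]}
rot = rotatable_bonds(adj, order)
```

Ring bonds are never marked as rotatable under this rule, since the rest of the ring still connects the two atoms when the bond is ignored.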

3. Definition of the molecular graph match score
The fast algorithm for molecular structures by graph matching according to this embodiment defines a molecular graph match score M(A, B). Here, {M(A, B)} denotes the molecular graph match score between molecule A and molecule B. FIG. 3(1) schematically shows the atoms (nodes) and bonds (edges) of molecule A and molecule B.
The definitions ((Definition 1), (Definition 2), (Definition 3), and (Definition 4)) of the molecular graph match score M(A, B) used in this embodiment are described below.

(Definition 1): "Ai" denotes the i-th atom of molecule A. "Ai-Aj" denotes the bond between Ai and Aj.

(Definition 2): The fact that atom i of molecule A (Ai) corresponds to atom k of molecule B (Bk) is written "m(Ai) = Bk". That is, m(Ai) denotes the atom of molecule B to which atom i of molecule A corresponds.

(Definition 3)
The molecular graph match score M(A, B) is defined by the following formula (Equation 1).

Each term of Equation 1 is defined as follows. Note that E(Ai, Aj) takes different values depending on the execution mode. The execution mode can be set externally, after the fact, via the input unit shown in FIG. 1 or the like.

FIG. 3(2) is an example of the molecular graph match score calculated, according to the above definitions, for the molecules A and B shown in FIG. 3(1). Atoms drawn with the same pattern are the same element, and all edges are assumed to be single bonds, so the values shown in FIG. 3(2) (in particular, M(A, B) = 14) are obtained regardless of the execution mode.

4. Fast graph match search algorithm
FIG. 4 is a flowchart of the fast graph match search algorithm that obtains, in step S10 shown in FIG. 2, the atom correspondence {m(Ai)} between molecule A and molecule B and the graph match score M(A, B). The algorithm is described concretely below with reference to this flowchart.

[Step S1002]: First, for all combinations (Ai, Bk) of the atoms constituting molecule A and the atoms constituting molecule B, "S1(Ai, Bk)", defined by Equation 2 and Table 3 below, is obtained.

S1(Ai, Bk) is an index of how similar the surrounding environments of the pair of atoms Ai and Bk are (that is, whether atoms of the same kind lie at the same bond distance).

For example, in the part of molecule A and the part of molecule B shown in FIG. 5, if the atom Aj at a bond distance of 2 from atom Ai and the atom Bl at a bond distance of 2 from atom Bk, which corresponds to Ai, are the same element, the value of s1(Aj, Bl) is "1". Centered on the pair Ai, Bk, every pair {j, l} of an atom of molecule A and an atom of molecule B at the same bond distance is checked for whether the two atoms are the same element, and "1" or "0" is assigned and summed. The resulting S1(Ai, Bk) becomes larger the more often the same element is found at the same bond distance as seen from the corresponding atoms Ai and Bk.
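
The verbal description of S1 can be sketched in Python as a count over atom pairs at equal bond distance. Equation 2 and Table 3 are not reproduced in this text, so the sketch follows the prose only; the function name `s1_index`, the decision to skip bond distance 0 (the atoms Ai and Bk themselves), and the toy chains are assumptions.

```python
def s1_index(elem_a, dist_a, elem_b, dist_b):
    """s1[i][k] = number of pairs (j, l) with the same element and with
    bond distance d(Ai, Aj) equal to d(Bk, Bl)."""
    na, nb = len(elem_a), len(elem_b)
    s1 = [[0] * nb for _ in range(na)]
    for i in range(na):
        for k in range(nb):
            total = 0
            for j in range(na):
                dij = dist_a[i][j]
                if not dij:   # skips j == i (distance 0) and unreachable atoms (None)
                    continue
                for l in range(nb):
                    # Contribution 1 for the same element at the same distance, else 0.
                    if dist_b[k][l] == dij and elem_a[j] == elem_b[l]:
                        total += 1
            s1[i][k] = total
    return s1

# Toy chains (hypothetical data): A = C-C-O and B = C-C-N.
elem_a = ["C", "C", "O"]
elem_b = ["C", "C", "N"]
chain = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # bond distances along a 3-atom chain
s1 = s1_index(elem_a, chain, elem_b, chain)
```

The bond distance tables are taken as inputs so that the index can be computed once the distances of section (2.1) are available for both molecules.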

[Step S1004]: Next, "S2(Ai, Bk)", defined by Equation 3 and Table 4 below, is obtained for all pairs {i, k}.

S2(Ai, Bk) is an index that, for the pair of corresponding atoms Ai and Bk, sums S1(Aj, Bl), the index of how similar the surrounding environments are, over all pairs of surrounding atoms Aj and Bl that lie at equal bond distances from the respective atoms of the pair. If the surrounding atoms Aj and Bl at equal bond distances are the same element, S1(Aj, Bl) is further multiplied by a coefficient (12 in the above table) before being summed. S2 is thus an index that becomes larger when, for the atoms of the pair Ai, Bk, the surrounding environments are similar and the environments of those surroundings are similar as well.

For example, consider S2(Ai, Bk) for the part of molecule A containing atom Ai and the part of molecule B containing atom Bk shown in FIG. 6. For every pair of an atom Aj at some bond distance from atom Ai (2 in FIG. 6) and an atom Bl at the same bond distance from atom Bk, s2(Aj, Bl), that is, either S1(Aj, Bl) or S1(Aj, Bl) × 12, is summed. In particular, if Aj and Bl are the same element, S1(Aj, Bl) is multiplied by a predetermined factor (here 12) before being summed to obtain S2. The bond distance from atoms Ai and Bk is assumed to range from 1 to its maximum value (that is, the bond distance from Ai or Bk to the farthest atom). As stated above, S1(Aj, Bl) is an index of how similar the surrounding environments (Am, Bn, and so on in FIG. 6) of the pair of atoms Aj, Bl are.

In the above S2(Ai, Bk), S1(Aj, Bl) is summed over the pairs of atoms (Aj, Bl) at the same bond distance from the two corresponding atoms Ai and Bk, and S1(Aj, Bl) is multiplied by the predetermined factor (12) before being summed if Aj and Bl are the same element. S2(Ai, Bk) therefore also becomes larger when the atoms are associated so that the composition of the surrounding atoms is similar. The coefficient "12" may be replaced by another value.
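
Under the same reading, S2 can be sketched as a weighted sum of S1 values over atom pairs at equal bond distance. Equation 3 and Table 4 are not reproduced in this text, so the Python sketch below follows the prose only; the function name `s2_index`, the skipping of bond distance 0, and the toy inputs (including the S1 table, which is simply given) are assumptions.

```python
SAME_ELEMENT_WEIGHT = 12  # coefficient quoted in the text; other values are allowed

def s2_index(s1, elem_a, dist_a, elem_b, dist_b, weight=SAME_ELEMENT_WEIGHT):
    """s2[i][k] = sum of S1(Aj, Bl) over pairs (j, l) at equal bond distance
    from Ai and Bk, multiplied by `weight` when Aj and Bl are the same element."""
    na, nb = len(elem_a), len(elem_b)
    s2 = [[0] * nb for _ in range(na)]
    for i in range(na):
        for k in range(nb):
            total = 0
            for j in range(na):
                dij = dist_a[i][j]
                if not dij:   # skips j == i (distance 0) and unreachable atoms (None)
                    continue
                for l in range(nb):
                    if dist_b[k][l] != dij:
                        continue
                    factor = weight if elem_a[j] == elem_b[l] else 1
                    total += s1[j][l] * factor
            s2[i][k] = total
    return s2

# Toy chains (hypothetical data): A = C-C-O and B = C-C-N, with a given S1 table.
elem_a = ["C", "C", "O"]
elem_b = ["C", "C", "N"]
chain = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
s1 = [[1, 1, 1], [1, 1, 1], [1, 1, 2]]
s2 = s2_index(s1, elem_a, chain, elem_b, chain)
```

Keeping the coefficient as a parameter reflects the remark that "12" may be replaced by another value.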

[ステップＳ１００６]；次に、以下の数４及び表５で定義される「Ｓ３（Ａｉ，Ａｋ）」を、全ての｛ｉ，ｋ｝の組について求める。

[Step S1006]; Next, “S3 (Ai, Ak)” defined in the following Equation 4 and Table 5 is obtained for all {i, k} pairs.

Ｓ３（Ａｉ，Ｂｋ）は、原子Ａｉ、Ｂｋの対を始点とし、次々に分子Ａの原子と分子Ｂの原子を対応付けして全体の対応を作成し、そのときのグラフマッチスコアＭ（Ａ，Ｂ）を値とする指標である。ここで、対応付け作成時には、既に対応付け済みの原子に直接結合する原子を次に選択すること、及び指標Ｓ２が高い対を選択するのを優先することを、条件としている。 S3 (Ai, Bk) starts from a pair of atoms Ai and Bk, and successively creates an overall correspondence by associating the atoms of molecule A and atoms of molecule B, and the graph match score M (A , B). Here, when creating a correspondence, it is a condition that priority is given to the next selection of an atom that directly binds to an already associated atom, and the selection of a pair having a high index S2.

例えば、図７に示される、原子Ａｉを含む分子Ａの一部、及び原子Ｂｋを含む分子Ｂの一部において、Ｓ３（Ａｉ，Ｂｋ）を検討する。始点は、原子Ａｉ、Ｂｋの対である。原子Ａｉには、原子Ａｊ、Ａｐ、Ａｒが直接結合する。原子Ｂｋには、原子Ｂｌ、Ｂｑ、Ｂｓが直接結合する。｛Ａｊ、Ａｐ、Ａｒ｝と｛Ｂｌ、Ｂｑ、Ｂｓ｝とから形成され得る原子同士の（３×３＝９通りの）対のうちから、（Ａｊ、Ｂｌ）の対のＳ２が最大であるとすると、原子Ａｊと原子Ｂｌを対応付けすることになる。 For example, S3 (Ai, Bk) is examined in a part of the molecule A including the atom Ai and a part of the molecule B including the atom Bk shown in FIG. The starting point is a pair of atoms Ai and Bk. The atoms Aj, Ap, and Ar are directly bonded to the atom Ai. The atoms Bk, Bq, and Bs are directly bonded to the atom Bk. Of the (3 × 3 = 9) pairs of atoms that can be formed from {Aj, Ap, Ar} and {Bl, Bq, Bs}, S2 of the pair (Aj, Bl) is the largest. Then, the atom Aj and the atom Bl are associated with each other.

次に、分子Ａにおいて対応付けが済んだＡｉ−Ａｊには、原子Ａｐ、Ａｒ、Ａｔ、Ａｖが直接結合する。分子Ｂにおいて対応付けが済んだＢｋ−Ｂｌには、原子Ｂｑ、Ｂｓ、Ｂｕ、Ｂｗが直接結合する。｛Ａｐ、Ａｒ、Ａｔ、Ａｖ｝と｛Ｂｑ、Ｂｓ、Ｂｕ、Ｂｗ｝とから形成され得る原子同士の（４×４＝１６通りの）対のうちから、（Ａｐ、Ｂｑ）の対のＳ２が最大であるとすると、原子Ａｐと原子Ｂｑを対応付けすることになる。これにより、分子Ａにおいては、原子Ａｉ、Ａｊ、Ａｐの対応付けが完了し、分子Ｂにおいては、原子Ｂｋ、Ｂｌ、Ｂｑの対応付けが完了する。 Next, atoms Ap, Ar, At, and Av are directly bonded to Ai-Aj that has been associated in molecule A. The atoms Bq, Bs, Bu, and Bw are directly bonded to Bk-Bl that has been associated in the molecule B. Among the (4 × 4 = 16) pairs of atoms that can be formed from {Ap, Ar, At, Av} and {Bq, Bs, Bu, Bw}, S2 of the pair (Ap, Bq) Is the maximum, the atom Ap and the atom Bq are associated with each other. Thereby, in the molecule A, the association of the atoms Ai, Aj, and Ap is completed, and in the molecule B, the association of the atoms Bk, B1, and Bq is completed.

This association is repeated until no pair of atoms that can be associated remains. Once the association is finished, the graph match score M under that association is calculated. The association and the calculation of the graph match score M are carried out for every {i, k} pair.

In short, for every combination {Ai, Bk}, S3(Ai, Bk) associates atoms of molecule A with atoms of molecule B by working gradually outward from the starting pair {Ai, Bk}, guided by the magnitude of S2 (an index that becomes larger for a pair of atoms Ai and Bk when their surrounding environments are similar and the environments surrounding those are similar as well), and the graph match score is then calculated.
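The bond-constrained growth used for S3 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `grow_correspondence` and `s3_table`, the dictionary-based bond tables, and the `score` callable (standing in for the graph match score M(A, B)) are all assumptions introduced here, and ties in S2 are broken arbitrarily.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Atom = Hashable
Bonds = Dict[Atom, Set[Atom]]              # atom -> atoms directly bonded to it
Mapping = Dict[Atom, Atom]                 # atom of A -> atom of B, i.e. m(Ai) = Bk
PairScores = Dict[Tuple[Atom, Atom], float]


def grow_correspondence(bonds_a: Bonds, bonds_b: Bonds,
                        s2: PairScores, start: Tuple[Atom, Atom]) -> Mapping:
    """Grow a correspondence outward from the starting pair (Ai, Bk)."""
    ai, bk = start
    mapping: Mapping = {ai: bk}
    used_b = {bk}
    while True:
        # Unassociated atoms directly bonded to any already associated atom.
        frontier_a = {n for a in mapping for n in bonds_a[a]} - set(mapping)
        frontier_b = {n for b in used_b for n in bonds_b[b]} - used_b
        candidates = [(a, b) for a in frontier_a for b in frontier_b]
        if not candidates:
            return mapping
        # Prefer the candidate pair with the largest S2 (missing pairs count as 0).
        a, b = max(candidates, key=lambda pair: s2.get(pair, 0.0))
        mapping[a] = b
        used_b.add(b)


def s3_table(bonds_a: Bonds, bonds_b: Bonds, s2: PairScores,
             score: Callable[[Mapping], float]) -> PairScores:
    """S3(Ai, Bk) for every starting pair; `score` is a hypothetical stand-in for M."""
    return {
        (ai, bk): score(grow_correspondence(bonds_a, bonds_b, s2, (ai, bk)))
        for ai in bonds_a
        for bk in bonds_b
    }
```

Each bond table is assumed to list every atom of its molecule as a key, so that `s3_table` visits all a × b starting pairs.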

[Step S1008]; Next, starting from the pair of starting atoms (Ai, Bk) that gave the maximum S3(Ai, Bk) in step S1006, the as yet unassociated pair of atoms with the largest S3(Aj, Bl) is associated, and this is continued until no pair of atoms that can be associated remains. The graph match score M0(A, B) of the resulting overall correspondence is then calculated. Here, between one associated pair and the next, the atoms of molecule A need not be directly bonded to each other, and likewise the atoms of molecule B need not be directly bonded to each other.

For example, consider the association of atom pairs performed in step S1008 for the part of molecule A containing atom Ai and the part of molecule B containing atom Bk shown in FIG. 8. First, suppose that, among the a × b pairs that can be formed from all (say, a) atoms of molecule A and all (say, b) atoms of molecule B, the pair of atoms Ai and Bk has an S3 (obtained in step S1006) larger than that of any other pair, that is, the maximum. The pair of atoms Ai and Bk is then associated first.
Next, suppose that, among the (a-1) × (b-1) pairs that can be formed from the (a-1) atoms of molecule A other than Ai and the (b-1) atoms of molecule B other than Bk, the pair of atoms Aj and Bl has an S3 larger than that of any other pair. The pair of atoms Aj and Bl is then associated. Aj is not necessarily directly bonded to Ai, and Bl is not necessarily directly bonded to Bk (the same applies below).

Next, suppose that, among the (a-2) × (b-2) pairs that can be formed from the (a-2) atoms of molecule A other than Ai and Aj and the (b-2) atoms of molecule B other than Bk and Bl, the pair of atoms Aj2 and Bl2 has an S3 larger than that of any other pair. The pair of atoms Aj2 and Bl2 is then associated.
Next, suppose that, among the (a-3) × (b-3) pairs that can be formed from the (a-3) atoms of molecule A other than Ai, Aj, and Aj2 and the (b-3) atoms of molecule B other than Bk, Bl, and Bl2, the pair of atoms Aj3 and Bl3 has an S3 larger than that of any other pair. The pair of atoms Aj3 and Bl3 is then associated.
Next, suppose that, among the (a-4) × (b-4) pairs that can be formed from the (a-4) atoms of molecule A other than Ai, Aj, Aj2, and Aj3 and the (b-4) atoms of molecule B other than Bk, Bl, Bl2, and Bl3, the pair of atoms Aj4 and Bl4 has an S3 larger than that of any other pair. The pair of atoms Aj4 and Bl4 is then associated.

This association is repeated until no pair of atoms that can be associated remains. Once the association is finished, the graph match score M0 under that association is calculated.

In step S1008, an association that can serve as the final candidate is thus built from the many (for example, a × b) values of S3 obtained in step S1006, each of which is a graph match score M(A, B), and the graph match score M0 under that association is calculated.
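The selection in step S1008 can be sketched as a greedy pass over the S3 values. This is a minimal sketch under stated assumptions: the function name `greedy_by_s3` and the dictionary layout of `s3` are hypothetical, and visiting pairs in descending order of S3 while skipping atoms that are already associated is used here as an equivalent of repeatedly picking the largest remaining pair (ties aside).

```python
from typing import Dict, Hashable, Tuple

Atom = Hashable


def greedy_by_s3(s3: Dict[Tuple[Atom, Atom], float]) -> Dict[Atom, Atom]:
    """Associate pairs in descending order of S3, with no bond constraint."""
    mapping: Dict[Atom, Atom] = {}
    used_b = set()
    # The first pair visited is the starting pair with the maximum S3.
    for (a, b), _value in sorted(s3.items(), key=lambda kv: kv[1], reverse=True):
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
    return mapping
```

The graph match score M0(A, B) would then be evaluated on the returned mapping with whatever scoring function implements M.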

[Step S1010]; It is checked whether the calculated M0(A, B) is the largest value that can be expected. Specifically, it is checked whether M0(A, B) is equal to M(A, A) or M(B, B). As shown in FIG. 3(2), M(A, A) (or M(B, B)) is considered to be the maximum value, so step S1010 serves to determine whether any further calculation of the graph match score is unnecessary.

If they are equal (step S1010: Yes), the atom correspondence {m(Ai)} and the graph match score M0(A, B) are output in step S1016 and the process ends. If they are not equal (step S1010: No), the process proceeds to step S1012.

[Step S1012]; In step S1012, the association that can serve as the final candidate is fine-tuned.

For the pair {Bk, Bl} of molecule B that corresponds to one bonded pair {Ai, Aj} of molecule A, one of the two atoms is exchanged with another atom Bn, the correspondence to atoms of molecule A is changed only for the exchanged atoms, and the graph match score M1(A, B) is calculated. Atom Bn may be an atom that has no corresponding atom in molecule A.

For example, the change of association performed in step S1012 is described for the part of molecule A containing atoms Ai, Aj, and Am and the part of molecule B containing atoms Bk, Bl, and Bn shown in FIG. 9. Suppose that Ai and Bk, Aj and Bl, and Am and Bn are associated, and that Ai and Aj are bonded. Here Bl, one of {Bk, Bl}, is exchanged with Bn, so that Aj is associated with Bn and, at the same time, Am is associated with Bl. That is, m(Aj) = Bl becomes m(Aj) = Bn, and m(Am) = Bn becomes m(Am) = Bl. The associations of all other atoms are left unchanged. The graph match score M1(A, B) is calculated on the basis of this partially changed association.

The association may also be changed by, for example, exchanging atom Bk in the example of FIG. 9 with an atom Bp (not shown).

[Step S1014]; It is checked whether the calculated M1(A, B) is larger than M0(A, B). That is, the change in the graph match score M1(A, B) calculated from the atom correspondence fine-tuned in step S1012 is checked. If the calculated M1(A, B) is larger than M0(A, B) (step S1014: Yes), M0(A, B) is overwritten with the value of M1(A, B) (step S1015), and the process returns to step S1012, where the graph match score M1(A, B) is calculated from a further fine-tuned atom correspondence.

[Step S1016]; If the calculated M1(A, B) is not larger than M0(A, B) (step S1014: No), the atom correspondence and the graph match score M0(A, B) are output and the process ends.
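Steps S1010 to S1016 can be read as a simple hill-climbing loop, sketched below. The description does not state in which order candidate exchanges are tried, so this sketch assumes that every single exchange is tried in turn and that the loop stops once no exchange raises the score. The names `swap_partner` and `refine`, the `score` callable (standing in for M), and `max_score` (standing in for M(A, A) or M(B, B)) are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Atom = Hashable
Mapping = Dict[Atom, Atom]


def swap_partner(mapping: Mapping, a: Atom, bn: Atom) -> Mapping:
    """Return a copy in which m(a) = bn; an atom previously mapped to bn takes a's old partner."""
    new = dict(mapping)
    old = new[a]
    for am, b in mapping.items():
        if b == bn:
            new[am] = old        # m(Am) = Bn becomes m(Am) = Bl
            break
    new[a] = bn                  # m(Aj) = Bl becomes m(Aj) = Bn
    return new


def refine(mapping: Mapping, bonds_a: Iterable[Tuple[Atom, Atom]],
           atoms_b: Iterable[Atom], score: Callable[[Mapping], float],
           max_score: float) -> Tuple[Mapping, float]:
    """Fine-tune a correspondence by single exchanges (assumed search order)."""
    bonds_a, atoms_b = list(bonds_a), list(atoms_b)
    best, m0 = dict(mapping), score(mapping)
    improved = True
    while m0 < max_score and improved:       # S1010: stop at the expected maximum
        improved = False
        for ai, aj in bonds_a:               # one bonded pair {Ai, Aj} of molecule A
            for a in (ai, aj):
                if a not in best:
                    continue
                for bn in atoms_b:           # S1012: exchange the partner with Bn
                    if bn == best[a]:
                        continue
                    trial = swap_partner(best, a, bn)
                    m1 = score(trial)
                    if m1 > m0:              # S1014 / S1015: keep the improvement
                        best, m0, improved = trial, m1, True
    return best, m0                          # S1016: output correspondence and M0
```

Accepting the first improving exchange keeps the sketch short; trying all exchanges and keeping the best one would be an equally plausible reading.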

5. Structural superposition of molecules
The display of a structural superposition based on the atom correspondence obtained by the flowcharts shown in FIGS. 4 and 2 is described here. When the molecular structures of molecule A and molecule B are superposed, the atoms {m(Ai)} corresponding to the atoms {Ai} of molecule A are superposed on them as appropriate and displayed. Kabsch's method may be used for this (McLachlan, A. D. Gene duplications in the structural evolution of chymotrypsin. Journal of Molecular Biology, 128, 49-79, 1979; Kabsch, W. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica, 32A, 922-923, 1976).

At this time, corresponding torsion angles of the two molecules are aligned as follows.
(1) If bonded atoms {Ai, Aj, Ak, Al} of molecule A correspond, through the graph match, to similarly bonded atoms {m(Ai), m(Aj), m(Ak), m(Al)} of molecule B, and both the bond Aj-Ak and the bond m(Aj)-m(Ak) are rotatable bonds, the torsion angle {m(Ai), m(Aj), m(Ak), m(Al)} of molecule B is set to the same value as the corresponding torsion angle {Ai, Aj, Ak, Al} of molecule A.

FIG. 6 schematically illustrates (1) above. Whether a bond is a "rotatable bond" is determined using the data set in steps S04 and S08 of FIG. 2 and in "(2.3) Rotatable bonds".
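The torsion alignment in (1) can be illustrated with a small geometric sketch. It assumes that atoms are given as (x, y, z) tuples and that the caller supplies the atoms on the far side of the rotatable bond (the list `moving`, whose first element plays the role of m(Al)); the function names `torsion` and `set_torsion` are hypothetical and are not taken from the embodiment.

```python
import math


def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion(p0, p1, p2, p3):
    """Signed torsion angle of p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


def set_torsion(p0, p1, p2, moving, target_deg):
    """Rotate `moving` about the p1->p2 bond so that the torsion equals target_deg."""
    theta = math.radians(target_deg - torsion(p0, p1, p2, moving[0]))
    axis = _sub(p2, p1)
    norm = math.sqrt(_dot(axis, axis))
    k = tuple(c / norm for c in axis)          # unit vector along the bond
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for p in moving:
        v = _sub(p, p2)
        kxv, kv = _cross(k, v), _dot(k, v)
        # Rodrigues' rotation formula about the bond axis through p2.
        r = tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kv * (1 - cos_t)
                  for i in range(3))
        rotated.append(tuple(r[i] + p2[i] for i in range(3)))
    return rotated
```

Under these assumptions, molecule B would be adjusted by calling `set_torsion` on the coordinates of m(Ai), m(Aj), m(Ak) and the atoms beyond m(Ak), with `target_deg` set to `torsion(Ai, Aj, Ak, Al)` measured on molecule A.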

6. Example of superposition of molecular structures by the fast graph match search algorithm and its processing
A program implementing the flowchart of FIG. 2, which superposes molecular structures by the fast graph match search algorithm, was written, and a calculation was run with G39 (Tamiflu) as the query (molecule A) and all ligands in the PDB (9445 species) as the search target database. A desktop computer with a clock frequency of 2.4 GHz was used. The calculation took 8 minutes 56 seconds.

In Table 6 below, the atoms {m(Ai)} found by the above calculation to correspond to the query atoms {Ai} are listed aligned beneath them. The leftmost column gives examples of the query (molecule A) and of the search target molecules (molecule B). The second column from the left gives the number of atoms. The third column from the left gives the graph match score between molecule A (G39 (Tamiflu)) and molecule B. The fourth column from the left gives the graph match score of each molecule with itself. The right-hand part shows the atom alignment. Positions at which the atomic species aligned in a (vertical) column agree belong to the common skeleton. A column in which 90% or more of the atomic species agree is marked "**" in the bottom row, and a column in which 50% or more agree is marked "++".

FIG. 11 shows the atomic coordinates output after superposition using the pairs of query atoms {Ai} and their corresponding atoms {m(Ai)} (FIG. 11(a)), and the atomic coordinates of the common skeleton output from the query structure (FIG. 11(b)).

7. Algorithm performance evaluation
The performance of the fast graph match search algorithm according to the present embodiment was evaluated.

(7.1 Algorithm performance evaluation (1))
The present embodiment gives approximate solutions to an NP-hard problem for which no polynomial-time algorithm is known. For problems with a total number of combinations of 10^12 or less, the maximum score (= the correct answer) was obtained by exhaustive search and compared with the solution given by the fast graph match search algorithm according to the present embodiment. FIG. 12(1) plots computation time against the number of combinations in the exhaustive search and shows, from the bottom, the computation time of the present embodiment, the computation time of the exhaustive search, and the ratio of the latter to the former. Over the range plotted, the algorithm according to the present embodiment completes in 10^-4 to 10^-3 seconds, whereas the exhaustive search takes 10^-4 to 10^6 seconds.
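For reference, an exhaustive-search baseline of the kind used to obtain the correct answer might look like the sketch below. It assumes, for simplicity, that every atom of the smaller molecule A is associated with a distinct atom of B; whether the evaluation also enumerated partial correspondences is not stated here. The name `exhaustive_best` and the `score` callable (standing in for M) are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, Hashable, Sequence, Tuple

Atom = Hashable
Mapping = Dict[Atom, Atom]


def exhaustive_best(atoms_a: Sequence[Atom], atoms_b: Sequence[Atom],
                    score: Callable[[Mapping], float]) -> Tuple[Mapping, float, int]:
    """Try every one-to-one assignment of A's atoms to B's atoms."""
    if len(atoms_a) > len(atoms_b):
        raise ValueError("this sketch assumes molecule A is the smaller molecule")
    best_mapping: Mapping = {}
    best_score = float("-inf")
    tried = 0
    for image in permutations(atoms_b, len(atoms_a)):
        mapping = dict(zip(atoms_a, image))
        tried += 1
        s = score(mapping)
        if s > best_score:
            best_mapping, best_score = mapping, s
    return best_mapping, best_score, tried
```

The number of assignments grows factorially with the number of atoms, which is consistent with the exhaustive search becoming impractical beyond small molecules.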

FIG. 12(2) plots the rate of correct answers against the number of combinations in the exhaustive search. Over the range plotted, the correct answer was found in 97% of cases on average. FIG. 12(3) plots the relationship between the score difference from the correct answer and the cumulative rate of correct answers. Even when the answer was wrong, the score differed from the correct answer by at most 2 points. The graphs and figures in FIGS. 12(1) to 12(3) indicate that the fast graph match search algorithm according to the present embodiment performs well.

(7.2 Algorithm performance evaluation (2))
The results were compared with those of the simcomp method (Hattori, M., Okuno, Y., Goto, S. & Kanehisa, M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. Journal of American Chemical Society, 125, 11853-11865, 2003), which performs graph matching heuristically using the Bron-Kerbosch clique search algorithm (Bron C. & Kerbosch J. Algorithm 457: Finding all cliques of an undirected graph. Communications of the Association for Computing Machinery, 16, 575-577, 1973).

All-against-all graph matching was performed on the same randomly chosen set of 50 molecules, and the score as defined in the present embodiment and the execution time (wall-clock time required for the graph match) were compared for all 1225 comparisons (all) and for the 136 cases in which either method found a subgraph, which is certain to be a correct answer (partial). The maximum number of trials (Rmax) of the simcomp method was varied from 1.5 × 10^4 (the default value) to 10^8.

As a result (see Table 7 below), the fast graph match search algorithm according to the present embodiment found all 136 subgraphs (partial), whereas the simcomp method failed in 10 cases (7%). The present method was faster in all but one case, by 48 milliseconds on average. Increasing Rmax did not reverse the outcome in terms of the subgraphs found; it only increased the execution time of the simcomp method. In the full comparison (all) as well, the fast graph match search algorithm according to the present embodiment found higher-scoring graph matches in less time at every Rmax, except at Rmax = 1.5 × 10^4, where it was 96 milliseconds slower (although the graph matches it found still scored higher). These figures indicate that the fast graph match search algorithm according to the present embodiment performs well.

2: computer system, 4: Internet, 8: external server, 10: external database, 12: output unit, 14: central processing unit, 16: keyboard, 18: mouse.

Claims

A fast graph match search device for evaluating the similarity between a first molecule A and a second molecule B, in which coordinate data for each of the atoms (Ai, Aj, ...) constituting the first molecule A and coordinate data for each of the atoms (Bk, Bl, ...) constituting the second molecule B are input from a storage unit; in accordance with a computer program loaded into a calculation unit, each atom (Ai, Aj, ...) of the first molecule A is associated with an atom (Bk, Bl, ...) of the second molecule B (m(Ai) = Bk) and the molecules are superposed in a virtual memory space constructed in the calculation unit and the storage unit (i, j, k, and l are all natural numbers); and data on the optimal interatomic correspondence between the first molecule A and the second molecule B and on the similarity between the first molecule A and the second molecule B are output to an output unit, the device including:
first calculation means for obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a first similarity index S1(Ai, Bk) indicating how similar the surrounding environments are as seen from each atom of the pair of atoms Ai and Bk;
second calculation means for obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a second similarity index S2(Ai, Bk) by accumulating the first similarity index S1(Aj, Bl) over all pairs of surrounding atoms Aj and Bl that lie at equal bond distances from the respective atoms of the pair of atoms Ai and Bk, wherein, when the surrounding atoms Aj and Bl at equal bond distances from the respective atoms of the pair are the same element, the first similarity index S1(Aj, Bl) is further multiplied by a coefficient before being accumulated;
third calculation means for obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a third similarity index S3(Ai, Bk) whose value is the graph match score M(A, B) calculated when an overall correspondence is created by starting from the pair of atoms Ai and Bk and successively associating atoms of the first molecule A with atoms of the second molecule B, on the conditions that, when the correspondence is created, an atom directly bonded to an already associated atom is selected next and selection of a pair with a high second similarity index S2 is given priority;
fourth calculation means for obtaining the graph match score M(A, B) of the overall correspondence produced by starting from the pair of starting atoms (Ai, Bk) for which the third calculation means calculated the maximum S3(Ai, Bk) and repeatedly associating the unassociated pair of atoms with the largest S3(Aj, Bl) until no pair of atoms that can be associated remains; and
fifth output means for outputting, when the graph match score M(A, B) obtained by the fourth calculation means is larger than a threshold value, the interatomic correspondence and the graph match score M(A, B) calculated by the fourth calculation means for the first molecule A and the second molecule B.

The fast graph match search device, further including
sixth calculation means for obtaining, after the graph match score M(A, B) has been calculated by the fourth calculation means, a fine-tuned graph match score M(A, B) by exchanging one atom of the pair {Bk, Bl} of the second molecule B that corresponds to one bonded pair {Ai, Aj} of the first molecule A with another atom Bn and changing the correspondence to atoms of the first molecule A only for the exchanged atoms,
wherein, when the fine-tuned graph match score M(A, B) calculated by the sixth calculation means is larger than the graph match score M(A, B) calculated by the third calculation means, the fifth output means overwrites the graph match score M(A, B) with the fine-tuned graph match score M(A, B) and outputs it.

The fast graph match search device according to claim 1, wherein the graph match score M(A, B) is defined by Formula 1 below, each term of Formula 1 is defined by Table 1 below, and the value of the execution mode in Table 1 is set from outside via input means.

The fast graph match search device according to any one of claims 1 to 3, wherein the first similarity index S1(Ai, Bk) in the first calculation means is defined by Formula 2 and Table 2 below,
the second similarity index S2(Ai, Bk) in the second calculation means is defined by Formula 3 and Table 3 below, and
the third similarity index S3(Ai, Bk) in the third calculation means is defined by Formula 4 and Table 4 below.

5. The fast graph match search device according to claim 4, wherein the coefficient in Table 3 is 12.

A fast graph match search method for evaluating the similarity between a first molecule A and a second molecule B using a computer, in which coordinate data for each of the atoms (Ai, Aj, ...) constituting the first molecule A and coordinate data for each of the atoms (Bk, Bl, ...) constituting the second molecule B, both stored in a storage unit, are input; in accordance with a given computer program loaded into a calculation unit, each atom (Ai, Aj, ...) of the first molecule A is associated with an atom (Bk, Bl, ...) of the second molecule B (m(Ai) = Bk) and the molecules are superposed in a virtual memory space constructed on the computer (i, j, k, and l are all natural numbers); and the optimal interatomic correspondence between the first molecule A and the second molecule B and the similarity between the first molecule A and the second molecule B are output to an output unit, the method including:
a first step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a first similarity index S1(Ai, Bk) indicating how similar the surrounding environments are as seen from each atom of the pair of atoms Ai and Bk;
a second step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a second similarity index S2(Ai, Bk) by accumulating the first similarity index S1(Aj, Bl) over all pairs of surrounding atoms Aj and Bl that lie at equal bond distances from the respective atoms of the pair of atoms Ai and Bk, wherein, when the surrounding atoms Aj and Bl at equal bond distances from the respective atoms of the pair are the same element, the first similarity index S1(Aj, Bl) is further multiplied by a coefficient before being accumulated;
a third step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a third similarity index S3(Ai, Bk) whose value is the graph match score M(A, B) calculated when an overall correspondence is created by starting from the pair of atoms Ai and Bk and successively associating atoms of the first molecule A with atoms of the second molecule B, on the conditions that, when the correspondence is created, an atom directly bonded to an already associated atom is selected next and selection of a pair with a high second similarity index S2 is given priority;
a fourth step of obtaining the graph match score M(A, B) of the overall correspondence produced by starting from the pair of starting atoms (Ai, Bk) for which the maximum S3(Ai, Bk) was calculated in the third step and repeatedly associating the unassociated pair of atoms with the largest S3(Aj, Bl) until no pair of atoms that can be associated remains; and
a fifth step of outputting, when the graph match score M(A, B) obtained in the fourth step is larger than a threshold value, the interatomic correspondence and the graph match score M(A, B) calculated in the fourth step for the first molecule A and the second molecule B.

A computer program that causes a computer to execute a process in which coordinate data for each of the atoms (Ai, Aj, ...) constituting a first molecule A and coordinate data for each of the atoms (Bk, Bl, ...) constituting a second molecule B, both stored in a storage unit, are input; each atom (Ai, Aj, ...) of the first molecule A is associated with an atom (Bk, Bl, ...) of the second molecule B (m(Ai) = Bk) and the molecules are superposed in a virtual memory space constructed on the computer (i, j, k, and l are all natural numbers); and the optimal interatomic correspondence between the first molecule A and the second molecule B is obtained and the similarity between the first molecule A and the second molecule B is evaluated, the process including:
a first calculation step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a first similarity index S1(Ai, Bk) indicating how similar the surrounding environments are as seen from each atom of the pair of atoms Ai and Bk;
a second calculation step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a second similarity index S2(Ai, Bk) by accumulating the first similarity index S1(Aj, Bl) over all pairs of surrounding atoms Aj and Bl that lie at equal bond distances from the respective atoms of the pair of atoms Ai and Bk, wherein, when the surrounding atoms Aj and Bl at equal bond distances from the respective atoms of the pair are the same element, the first similarity index S1(Aj, Bl) is further multiplied by a coefficient before being accumulated;
a third calculation step of obtaining, for every pair of atoms Ai and Bk formed from all atoms Ai of the first molecule A and all atoms Bk of the second molecule B, a third similarity index S3(Ai, Bk) whose value is the graph match score M(A, B) calculated when an overall correspondence is created by starting from the pair of atoms Ai and Bk and successively associating atoms of the first molecule A with atoms of the second molecule B, on the conditions that, when the correspondence is created, an atom directly bonded to an already associated atom is selected next and selection of a pair with a high second similarity index S2 is given priority;
a fourth calculation step of obtaining the graph match score M(A, B) of the overall correspondence produced by starting from the pair of starting atoms (Ai, Bk) for which the maximum S3(Ai, Bk) was calculated in the third calculation step and repeatedly associating the unassociated pair of atoms with the largest S3(Aj, Bl) until no pair of atoms that can be associated remains; and
a fifth output step of outputting, when the graph match score M(A, B) obtained in the fourth calculation step is larger than a threshold value, the interatomic correspondence and the graph match score M(A, B) calculated in the fourth calculation step for the first molecule A and the second molecule B.