JP6777903B2

JP6777903B2 - Search device, search method and search program

Info

Publication number: JP6777903B2
Application number: JP2017230089A
Authority: JP
Inventors: Junya Arai; Yasuhiro Fujiwara; Makoto Onizuka
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2017-11-30
Filing date: 2017-11-30
Publication date: 2020-10-28
Anticipated expiration: 2037-11-30
Also published as: JP2019101610A

Description

The present invention relates to a search device, a search method, and a search program.

The problem of finding, within a graph structure, a specific substructure (a subgraph) given as a query is called subgraph matching. Conventionally, subgraph matching has been performed with a search algorithm called backtracking (see Non-Patent Documents 1 and 2). Backtracking takes a query graph Q and a data graph G as input and reports every embedding of Q contained in G. To do so, it extracts the candidate vertices of G, that is, the vertices to which each vertex of Q may be assigned, and enumerates embeddings by searching while changing the data vertex assigned to each query vertex among its candidates.

Non-Patent Document 1: Huahai He, Ambuj K. Singh, "Graphs-at-a-time: Query Language and Access Methods for Graph Databases", Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), 2008, pp. 405-418.
Non-Patent Document 2: Jinsoo Lee, Wook-Shin Han, Romans Kasperovics, Jeong-Hoon Lee, "An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases", Proceedings of the 39th International Conference on Very Large Data Bases, 2012, pp. 133-144.

However, conventional backtracking searches exhaustively, even in parts of the data graph G that contain no embedding, and therefore requires long processing times. As a result, it has sometimes been difficult, for example, to work interactively with a graph database (a database designed for graph data) using queries expressed as subgraph matching, or to provide services that depend on such a database. Likewise, data mining that relies on subgraph matching, for example using the number of occurrences of a specific subgraph as a feature of a graph, could not always be completed in a practical amount of time.

The present invention has been made in view of the above, and its object is to speed up subgraph matching, that is, the process of searching a data graph for parts isomorphic to a query graph.

To solve the above problems and achieve the object, a search device according to the present invention operates on graphs consisting of labeled vertices and edges connecting adjacent vertices, and comprises: a pattern extraction unit that, when a data graph (the graph of the data to be searched) is searched for parts isomorphic to a query graph (the graph of the query used for the search), extracts, based on search failures that occur during the search, a set of combinations of query-graph vertices and data-graph vertices as a failure pattern representing the cause of a search failure; and a search unit that prunes search states matching the failure patterns extracted by the pattern extraction unit and searches the data graph for parts isomorphic to the query graph.

According to the present invention, subgraph matching, the process of searching a data graph for parts isomorphic to a query graph, can be sped up.

FIG. 1 is a diagram showing definitions related to subgraph matching.
FIG. 2 is a flowchart showing the processing procedure of the recursive function.
FIG. 3 is an explanatory diagram of the subgraph matching process.
FIG. 4 is an explanatory diagram of the subgraph matching process.
FIG. 5 is an explanatory diagram outlining the processing of the search device of the present embodiment.
FIG. 6 is an explanatory diagram outlining the processing of the search device of the present embodiment.
FIG. 7 is a schematic diagram showing the configuration of the search device of the present embodiment.
FIG. 8 is a diagram showing definitions and a theorem related to the search process of the search device of the present embodiment.
FIG. 9 is a diagram showing theorems related to the search process of the search device of the present embodiment.
FIG. 10 is a diagram showing theorems related to the search process of the search device of the present embodiment.
FIG. 11 is a diagram showing theorems related to the search process of the search device of the present embodiment.
FIG. 12 is a flowchart showing the search processing procedure of the present embodiment.
FIG. 13 is an explanatory diagram of the search process of another embodiment.
FIG. 14 is a diagram illustrating a computer that executes the search program.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. The present invention is not limited to this embodiment. In the drawings, identical parts are denoted by identical reference numerals.

[Conventional subgraph matching process]
First, the conventional subgraph matching process is described. The following description deals with an undirected, vertex-labeled graph G = (V_G, E_G, Σ, l), hereinafter called the data graph. Here, V_G is the set of vertices, E_G ⊆ V_G × V_G is the set of edges, Σ is the set of labels, and l is the function that maps each vertex to its label. Similarly, a query graph Q = (V_Q, E_Q, Σ, l) is processed. The vertices of the query graph are numbered u_1, u_2, …, u_{|V_Q|}.

Subgraph matching is defined as the problem of enumerating, for a given data graph G and query graph Q, every embedding M of Q onto a part of G that is subgraph-isomorphic to Q.

Subgraph matching is described here with reference to FIG. 1, which shows definitions related to subgraph matching. In the following description, an embedding is represented as a set, as in Definition 1 of FIG. 1. Subgraph matching searches for embeddings M ⊆ V_Q × V_G from the vertices of the query graph (hereinafter, query vertices) to the vertices of the data graph (hereinafter, data vertices).

During the search, it is also necessary to consider the partial embedding of Definition 2 of FIG. 1, that is, an embedding in which data vertices are assigned to only some of the query vertices. In the present embodiment, an embedding whose domain is all of V_Q is sometimes called a complete embedding.

Q is said to be subgraph-isomorphic to G when an embedding M: V_Q → V_G can be defined that satisfies the three constraints of equations (1) to (3), referred to below as the label constraint, the edge constraint, and the injective constraint.
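Equations (1) to (3) are not reproduced in this text, so the sketch below assumes the usual form of the three constraints named later (label, edge, injective). The graph encoding (a dict of adjacency sets plus a dict of labels) and the function name `is_embedding` are our own illustrative choices, and the embedding is held as a dict from query vertex to data vertex rather than as the set of pairs of Definition 1.

```python
def is_embedding(M, q_adj, q_label, g_adj, g_label):
    """Check whether the mapping M (query vertex -> data vertex) satisfies the
    assumed label, edge, and injective constraints of subgraph isomorphism.

    Graphs are undirected: q_adj / g_adj map each vertex to its set of
    neighbours; q_label / g_label map each vertex to its label (the function l).
    """
    if set(M) != set(q_adj):
        return False  # M must be complete: its domain is all of V_Q
    # Label constraint: l(u) == l(M(u)) for every query vertex u.
    if any(q_label[u] != g_label[v] for u, v in M.items()):
        return False
    # Edge constraint: every query edge (u, w) maps to a data edge (M(u), M(w)).
    if any(M[w] not in g_adj[M[u]] for u in q_adj for w in q_adj[u]):
        return False
    # Injective constraint: distinct query vertices map to distinct data vertices.
    return len(set(M.values())) == len(M)
```

For a query edge u_1-u_2 labeled A-B, mapping it onto a data edge with the same labels passes; mapping u_2 onto a B-labeled data vertex that is not adjacent to M(u_1) fails the edge constraint.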

The mainstream approach to conventional subgraph matching is a search algorithm based on backtracking. As described above, backtracking mainly consists of two processes: extraction of candidate vertices and enumeration of embeddings.

First, candidate-vertex extraction computes, for each query vertex u_i, the set C[u_i] ⊆ V_G of data vertices to which u_i can be assigned (its candidate vertices). Whether u_i can be assigned to a data vertex v is determined, for example, with equations (4) to (6) (see Non-Patent Document 1): u_i is judged assignable to v when all of equations (4) to (6) hold.

Equation (4) first checks that the labels match, so every data vertex in C[u_i] satisfies the label constraint. Equation (5) compares the degrees, and equation (6) compares the number of adjacent vertices for each label of the adjacent vertices. These conditions exploit the fact that the neighbors of u_i cannot be assigned to neighbors of v unless v has at least as many neighbors as u_i. By using conditions that are cheap to check to exclude combinations of query and data vertices that can never be assigned, the subsequent search is sped up.
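The filter can be sketched as follows. Because equations (4) to (6) themselves are not shown here, the code follows only the prose above: label equality, a degree comparison, and a per-label count of neighbours. The adjacency-set/label-dict encoding and the name `extract_candidates` are assumptions made for illustration.

```python
from collections import Counter


def extract_candidates(q_adj, q_label, g_adj, g_label):
    """Compute C[u] for each query vertex u with filters modelled on (4)-(6).

    Assumed encoding: *_adj maps vertex -> set of neighbours and
    *_label maps vertex -> label.
    """
    def neighbour_labels(adj, label, x):
        # Number of neighbours of x carrying each label.
        return Counter(label[y] for y in adj[x])

    C = {}
    for u in q_adj:
        need = neighbour_labels(q_adj, q_label, u)
        C[u] = set()
        for v in g_adj:
            if g_label[v] != q_label[u]:
                continue  # (4) labels must match
            if len(g_adj[v]) < len(q_adj[u]):
                continue  # (5) v needs at least as many neighbours as u
            have = neighbour_labels(g_adj, g_label, v)
            if all(have[s] >= n for s, n in need.items()):
                C[u].add(v)  # (6) enough neighbours for every label
    return C
```

A `Counter` returns 0 for a missing label, so a data vertex that lacks any neighbour with a label the query vertex needs is rejected by the final check.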

Next, enumeration of embeddings is described with reference to FIG. 2, a flowchart of the processing procedure of the recursive function Search, which is called to enumerate embeddings. In the flowchart of FIG. 2, Search(φ) is called first, where φ denotes the empty set. The recursive function Search(P) receives a partial embedding P as its argument and first checks whether P is a complete embedding (step S100), that is, whether k = |V_Q| where k = |P|. If P is a complete embedding (step S100, Yes), P is reported as an embedding (step S180), and processing returns to the caller.

If P is not a complete embedding (step S100, No), processing proceeds to step S110, where the query vertex u_{k+1} is assigned to a data vertex v. Choosing v from C[u_{k+1}] guarantees that the label constraint is satisfied. At the same time, v must satisfy the edge constraint with respect to the query vertices already in P and the data vertices assigned to them. Specifically, a vertex v ∈ C[u_{k+1}] that satisfies the edge constraint of equation (7) is selected.

Next, a partial embedding P^+ is created by adding the assignment (u_{k+1}, v) to P, and the injective constraint is checked for P^+ (step S120). If P^+ is injective (step S120, Yes), Search(P^+) is called recursively with P^+ as its argument (step S140). If P^+ is not injective (step S120, No), Search(P^+) is not called, and processing returns toward the caller (step S150).
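The recursive procedure of FIG. 2 can be sketched as below. This is a minimal illustration under our own assumptions: graphs are adjacency-set dicts, a partial embedding is a dict from query vertex to data vertex, the candidate sets C come from a filter such as equations (4) to (6), and, as in the walk-through of FIGS. 3 and 4, a rejected candidate simply moves the loop on to the next one. The names `enumerate_embeddings` and `search` are illustrative, not the patent's.

```python
def enumerate_embeddings(order, q_adj, g_adj, C):
    """Conventional backtracking in the style of FIG. 2.

    order        : query vertices u_1, ..., u_|V_Q| in assignment order
    q_adj, g_adj : vertex -> set of neighbours (undirected)
    C            : query vertex -> candidate data vertices (label constraint applied)
    """
    results = []

    def search(P):  # P: partial embedding, query vertex -> data vertex
        k = len(P)
        if k == len(order):               # S100: is P complete?
            results.append(dict(P))       # S180: report the embedding
            return
        u = order[k]                      # the next query vertex u_{k+1}
        for v in sorted(C[u]):            # S110: try each candidate v
            # Edge constraint (7): every assigned query neighbour w of u must
            # be mapped to a data vertex adjacent to v.
            if any(w in P and P[w] not in g_adj[v] for w in q_adj[u]):
                continue
            if v in P.values():           # S120: P+ would not be injective
                continue
            P[u] = v                      # P+ = P ∪ {(u_{k+1}, v)}
            search(P)                     # S140: recursive call
            del P[u]                      # undo, then try the next candidate

    search({})
    return results
```

Mutating one dict and undoing each assignment after the recursive call keeps the sketch short; `dict(P)` copies the mapping at the moment a complete embedding is reported.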

Next, the conventional subgraph matching process is described concretely with reference to FIGS. 3 and 4, which are explanatory diagrams of the process. Conventional subgraph matching first extracts the candidate-vertex sets with equations (4) to (6). In the example of FIG. 3, C[u_1] = {v_1}, C[u_2] = {v_2, …, v_5}, C[u_3] = {v_6, …, v_9}, and C[u_4] = {v_1, v_10} are extracted.

In this case, following the backtracking search tree of FIG. 4, u_1 is first assigned to v_1, since C[u_1] = {v_1}. Here, the candidates for u_2 onward are assumed to be tried in ascending order of vertex number. That is, v_2 is selected from C[u_2], v_6 from C[u_3], and v_10 from C[u_4]. Since every query vertex is now assigned to a data vertex, {(u_1, v_1), (u_2, v_2), (u_3, v_6), (u_4, v_10)} is reported as an embedding.

The search then continues for other embeddings. Returning to the choice for u_2, u_2 is assigned to v_3 and u_3 to v_7. The candidate vertex v_1 of u_4, which is adjacent to v_7, is already assigned to u_1, so assigning u_4 to v_1 would violate the injective constraint. Because v_7 has no other neighbor that is a candidate of u_4, the search fails.

Likewise, for the remaining candidates v_8 and v_9 of u_3, u_4 cannot be assigned to v_1, and the search fails. Returning again to the choice for u_2, the search also fails when u_2 is assigned to v_4 and when it is assigned to v_5. At this point every candidate vertex has been tried, and the search ends.

Conventional subgraph matching is thus based on backtracking, which repeats recursive calls while changing the data vertex assigned to each query vertex.

[Outline of search device processing]
Next, an outline of the processing of the search device according to the present embodiment is described with reference to FIGS. 5 and 6. For the example of FIG. 3, the search device of the present embodiment first reports {(u_1, v_1), (u_2, v_2), (u_3, v_6), (u_4, v_10)} as an embedding, in the same way as the conventional subgraph matching described above.

Next, when backtracking assigns u_2 to v_3 and u_3 to v_7 and the search fails, the search device records this as a failure pattern. That is, the fact that "the search fails whenever u_1 is assigned to v_1 and u_3 is assigned to v_7 at the same time" is extracted and recorded as a failure pattern. Similarly, when u_3 is assigned to v_8 or v_9 and the search fails, the fact that "the search fails whenever u_1 is assigned to v_1 and u_3 is assigned to v_8 (or v_9) at the same time" is extracted and recorded as a failure pattern.

Then, when the search device changes the assignment of u_2 to v_4, every partial embedding that assigns u_3 to one of the neighbors v_7, v_8, v_9 matches a recorded failure pattern. The search is therefore known to fail without further exploring partial embeddings in which u_3 is assigned to these candidate vertices. The same holds when the assignment of u_2 is changed to v_5.
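The matching test just described can be sketched as follows. Since Definition 4 is given only in FIG. 8, we assume that a partial embedding matches a failure pattern D exactly when every pair in D also appears in the partial embedding (D ⊆ P). Query vertex u_i and data vertex v_j are written as the integers i and j, and the function name `matches` is hypothetical.

```python
def matches(P, patterns):
    """Return True if partial embedding P (dict) contains some failure pattern.

    A failure pattern is a frozenset of (query vertex, data vertex) pairs; we
    assume P matches D when D is a subset of P's assignments.
    """
    assignments = set(P.items())
    return any(D <= assignments for D in patterns)


# Failure patterns recorded after u_2 = v_3 failed (vertices written as ints).
patterns = [frozenset({(1, 1), (3, 7)}),
            frozenset({(1, 1), (3, 8)}),
            frozenset({(1, 1), (3, 9)})]

print(matches({1: 1, 2: 4, 3: 7}, patterns))  # True: pruned once u_2 = v_4
print(matches({1: 1, 2: 2, 3: 6}, patterns))  # False: this branch yields the embedding
```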

Thus, as the search tree of FIG. 5 shows, the search device of the present embodiment explores a smaller search space than the conventional search tree of FIG. 4. Conventionally, as FIG. 4 shows, u_3 was assigned to v_7, v_8, and v_9 and searched for each of u_2 = v_3, v_4, v_5, even though no embedding assigns u_2 to any of v_3, v_4, v_5. In other words, the conventional method continues the backtracking search from a partial embedding P even when no complete embedding M with P ⊂ M exists. The search device of the present embodiment prunes such searches, which cannot lead to an embedding, and therefore enables fast subgraph matching.

As the flowchart of FIG. 6 shows, embedding enumeration by the search device of the present embodiment differs from the recursive-function procedure of conventional subgraph matching in FIG. 2 in steps S130 and S160 to S170. In steps S160 to S170, failure patterns are extracted and recorded. Specifically, if the recursive call Search(P^+) reported no embedding (step S160, Yes), a failure pattern is extracted from P and recorded (step S170). If an embedding was reported (step S160, No), processing returns to the caller.

In step S130, the partial embedding is checked against the extracted failure patterns. Specifically, if the created partial embedding P^+ matches none of the recorded failure patterns (step S130, Yes), Search(P^+) is called recursively (step S140). If P^+ matches any failure pattern (step S130, No), processing proceeds to step S150. The remaining steps are the same as in the conventional procedure of FIG. 2 and are not described again.
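Below is a hedged sketch of how steps S130 and S160 to S170 could be grafted onto plain backtracking. How a failure pattern is generalized here is our own assumption, because Definitions 3 and 4 and Theorems 1 to 5 appear only in figures not reproduced in this text. When every candidate of u_{k+1} fails, the returned pattern collects three things: the assignments of u_{k+1}'s already-mapped query neighbours (which explain edge-constraint rejections), the earlier assignment that blocks injectivity, and each child's pattern with the new pair (u_{k+1}, v) removed. Each pattern is therefore a subset of P, as equation (8) requires. All names are hypothetical, and the data graph in the usage part is invented only so that its candidate sets equal those listed for FIG. 3.

```python
def enumerate_with_pruning(order, q_adj, g_adj, C):
    """Backtracking with failure-pattern pruning, in the spirit of FIG. 6.

    A failure pattern D is a frozenset of (query vertex, data vertex) pairs that
    no complete embedding contains; a partial embedding matches D if D ⊆ it.
    Returns (embeddings, recorded failure patterns, number of search calls).
    """
    results, patterns = [], []
    calls = 0

    def search(P):
        """Return None if an embedding was reported below P, else a pattern D ⊆ P."""
        nonlocal calls
        calls += 1
        k = len(P)
        if k == len(order):
            results.append(dict(P))                       # S180
            return None
        u = order[k]
        # Assignments of u's already-mapped query neighbours: they alone decide
        # which candidates pass the edge constraint (7).
        nbrs = {(w, P[w]) for w in q_adj[u] if w in P}
        D = set(nbrs)
        found = False
        for v in sorted(C[u]):
            if any(x not in g_adj[v] for _, x in nbrs):
                continue                                  # S110: edge constraint fails
            owner = next((w for w, x in P.items() if x == v), None)
            if owner is not None:                         # S120: not injective
                D.add((owner, v))                         # from D+ = {(owner, v), (u, v)}
                continue
            p_plus = frozenset(P.items()) | {(u, v)}
            hit = next((Dp for Dp in patterns if Dp <= p_plus), None)
            if hit is not None:                           # S130: P+ matches a pattern
                D |= hit - {(u, v)}
                continue
            P[u] = v
            child = search(P)                             # S140: recursive call
            del P[u]
            if child is None:
                found = True                              # S160: embedding reported
            else:
                D |= child - {(u, v)}
        if found:
            return None
        pattern = frozenset(D)                            # S170: record D ⊆ P
        patterns.append(pattern)
        return pattern

    search({})
    return results, patterns, calls


# Hypothetical data graph whose candidate sets match those listed for FIG. 3
# (the actual edges of FIG. 3 are not reproduced in this text).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (6, 10)]
edges += [(a, b) for a in (3, 4, 5) for b in (7, 8, 9)] + [(b, 1) for b in (7, 8, 9)]
g_adj = {v: set() for v in range(1, 11)}
for a, b in edges:
    g_adj[a].add(b)
    g_adj[b].add(a)
q_adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # query path u_1 - u_2 - u_3 - u_4
C = {1: {1}, 2: {2, 3, 4, 5}, 3: {6, 7, 8, 9}, 4: {1, 10}}
results, patterns, calls = enumerate_with_pruning([1, 2, 3, 4], q_adj, g_adj, C)
print(results)  # [{1: 1, 2: 2, 3: 6, 4: 10}]
print(calls)    # 11 (plain backtracking makes 17 calls on this graph)
```

On this graph the sketch records {(u_1, v_1), (u_3, v_7)} after u_2 = v_3 fails and then skips the u_3 subtrees under u_2 = v_4 and u_2 = v_5, mirroring the behaviour described for FIG. 5.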

[Search device configuration]
Next, a schematic configuration of the search device according to the present embodiment is described with reference to FIG. 7. As shown in FIG. 7, the search device 1 according to the present embodiment is implemented on a general-purpose computer such as a workstation or a personal computer and includes an input unit 11, an output unit 12, a control unit 13, and a storage unit 14. The search device 1 executes the search process described below, performing subgraph matching while pruning search states that match failure patterns.

The input unit 11 is implemented with input devices such as a keyboard and a mouse and inputs various instructions to the control unit 13 in response to operator input. In the present embodiment, the input unit 11 also receives graph data, including the data graph G and the query graph Q that are the targets of the search process described below, and passes them to the control unit 13.

The output unit 12 is implemented with a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like, and outputs to the operator, for example, the subgraph matching result produced by the search process described below.

The search device 1 also includes a communication control unit (not shown). The communication control unit is implemented with a NIC (Network Interface Card) or the like and controls communication between the control unit 13 and external devices such as servers over telecommunication lines such as a LAN (Local Area Network) or the Internet. For example, the graph data may be received from an external device via the communication control unit, and the subgraph matching result may be output to an external device via the same unit.

The storage unit 14 is implemented with a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disc. It stores, either in advance or temporarily during each run, the processing program that operates the search device 1 and the data used while that program runs. For example, the failure patterns extracted in the search process described below are recorded here. The storage unit 14 may also be configured to communicate with the control unit 13 via the communication control unit.

The control unit 13 functions as a vertex extraction unit 13a, a pattern extraction unit 13b, and a search unit 13c, as illustrated in FIG. 7, when an arithmetic processing device such as a CPU (Central Processing Unit) executes the processing program stored in memory. The control unit 13 may instead be organized into functional units other than these, provided the search process described below is achieved.

In the present embodiment, the control unit 13 may also be implemented with multiple CPU cores. This allows the processing for each backtracking branch, and the processing for each data-graph vertex in the search process described below, to be executed in parallel on multiple CPUs.

Specifically, the control unit 13 executes the following search process when it searches a data graph (the graph of the data to be searched) for parts matching a query graph (the graph of the query used for the search), both being graphs that consist of labeled vertices and edges connecting adjacent vertices.

That is, based on the combination of labels of each data-graph vertex and its adjacent vertices, the vertex extraction unit 13a extracts, as candidate vertices, the data-graph vertices that can match each query-graph vertex: among the vertices adjacent to a data vertex that can match a query vertex, those that can match the vertices adjacent to that query vertex. Specifically, as in conventional backtracking, the vertex extraction unit 13a extracts candidate vertices that satisfy equations (4) to (6).

The pattern extraction unit 13b extracts, based on search failures that occur during the search, a set of combinations of query-graph vertices and data-graph vertices as a failure pattern representing the cause of a search failure. Specifically, as described below, the pattern extraction unit 13b extracts failure patterns using the extracted candidate vertices.

The search unit 13c prunes search states that match the failure patterns extracted by the pattern extraction unit 13b and searches the data graph for parts isomorphic to the query graph.

FIG. 8 shows definitions and a theorem related to the search process of the search device of the present embodiment; they are applied in the search process described below. Definition 3 of FIG. 8 defines a failure pattern, Definition 4 defines when a pattern matches, and Theorem 1 gives the criterion for determining a failure pattern.

As shown in FIG. 6, when the search fails, the pattern extraction unit 13b extracts a failure pattern from the partial embedding (step S170). To match more partial embeddings and thus prune the search more effectively, the pattern extraction unit 13b should preferably extract more general failure patterns, that is, failure patterns D containing a small number |D| of query-vertex/data-vertex combinations. The pattern extraction unit 13b therefore creates a failure pattern D by taking only some of the query-vertex/data-vertex combinations of a partial embedding P that is known to fail. The created failure pattern D is consequently a subset of its source partial embedding P and satisfies equation (8).

Failure patterns are created according to the cause of the search failure. As shown in FIG. 6, search failures fall into two broad causes: no candidate vertex satisfies the edge constraint (step S110), or every P^+ created by adding a new assignment to P fails (steps S120 to S140).

First, the method of extracting a failure pattern when no candidate vertex satisfies the edge constraint is described. Let C_P[u_i] be the set of candidate vertices v that satisfy equation (7), which is evaluated in step S110 of FIG. 6; then equation (9) holds.

FIG. 9 shows theorems related to the search process of the search device of the present embodiment. The absence of candidate vertices for a query vertex u_i is represented by C_P[u_i] being the empty set. The search also fails when a query vertex to which no data vertex has yet been assigned runs out of candidate vertices; in other words, equation (10) holds, as stated in Theorem 2 of FIG. 9.

Furthermore, Theorem 3 of FIG. 9 holds for the case where C_P[u_i] is the empty set.

That is, when the edge constraints leave no candidate vertices, there exists a u_i for which C_P[u_i] is the empty set. For such an i, there exists a D_i with D_i ⊆ P and Dead(D_i), and this D_i satisfies equation (8) above. The pattern extraction unit 13b therefore registers, as a failure pattern, any D_i for which C_P[u_i] is the empty set.
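The exact forms of equation (7) and of D_i appear only in the figures, so the following Python sketch rests on two explicit assumptions:

- C_P[u] keeps those label-filtered candidates of u that are adjacent in the data graph to P[w], for every query neighbour w of u already assigned by P.
- D_i is the set of those neighbours' assignments.

Under these assumptions D_i ⊆ P. Moreover, no complete embedding can contain D_i, because the image of u_i would have to lie in the empty set C_P[u_i]. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]  # (query vertex, data vertex)


def constrained_candidates(p: Dict[int, int], u: int, cand: Sequence[List[int]],
                           q_adj: Sequence[Set[int]],
                           g_adj: Sequence[Set[int]]) -> List[int]:
    """Assumed form of C_P[u]: static candidates of u that are adjacent in the
    data graph to P[w] for every query neighbour w of u already assigned in P."""
    return [v for v in cand[u]
            if all(p[w] in g_adj[v] for w in q_adj[u] if w in p)]


def empty_candidate_pattern(p: Dict[int, int], cand: Sequence[List[int]],
                            q_adj: Sequence[Set[int]], g_adj: Sequence[Set[int]]
                            ) -> Optional[FrozenSet[Pair]]:
    """Theorem 2 check plus an assumed D_i (Theorem 3): if some unassigned u_i has
    an empty C_P[u_i], return the assignments of u_i's assigned neighbours."""
    for u_i in range(len(cand)):
        if u_i in p:
            continue  # only unassigned query vertices can run out of candidates
        if not constrained_candidates(p, u_i, cand, q_adj, g_adj):
            return frozenset((w, p[w]) for w in q_adj[u_i] if w in p)
    return None  # every unassigned query vertex still has a candidate
```

Take a triangle query against a data graph that is the path 0-1-2 plus an isolated vertex 3. With P = {0: 0, 1: 1}, no data vertex is adjacent to both 0 and 1. The sketch therefore returns {(0, 0), (1, 1)} as the failure pattern.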

Next, consider how a failure pattern is extracted when every P⁺ fails. FIG. 10 is a diagram showing theorems on the search process performed by the search device of the present embodiment. When every P⁺ fails, Theorems 4 and 5 shown in FIG. 10 hold.

Specifically, equation (13) of Theorem 4 holds when every P⁺ fails, so the set of failure patterns shown in equation (15) can be defined. Equation (16) of Theorem 5 then holds for the failure pattern in this case. The pattern extraction unit 13b therefore registers the failure pattern created using equation (16).

Next, consider how D⁺ is extracted from P⁺ to obtain the set of failure patterns in equation (15) of Theorem 4. As shown in FIG. 6, P⁺ fails for one of three reasons:
- P⁺ is not injective (step S120);
- P⁺ matches one of the failure patterns (step S130); or
- the recursively called function fails (step S140).

When P⁺ matches a failure pattern, that pattern has already been extracted and recorded. When the recursively called function fails, the failure pattern is extracted and recorded inside that call. The pattern extraction unit 13b therefore extracts a new failure pattern only when P⁺ is not injective.

A failure caused by the partial embedding P⁺ not being injective, that is, by multiple query vertices being assigned to the same data vertex, is described by Theorems 6 and 7 shown in FIG. 11. Equation (18) of Theorem 6 holds when P⁺ is not injective. As equation (20) of Theorem 7 shows, D_i⁺ in equation (19) is itself a failure pattern.

In this search algorithm, only partial embeddings already confirmed to be injective are passed to the recursive function Search (step S120). If P⁺ is not injective, it must be because the most recently added query vertex u_k and one other query vertex have been assigned to the same data vertex. Hence, if P⁺ is not injective, |D_k⁺| = 2. In this case, the pattern extraction unit 13b takes D_k⁺ as the failure pattern extracted from P⁺.
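The non-injective case can be sketched in a few lines of Python, assuming P is a dict from query vertex to data vertex. P is already injective, so at most one earlier query vertex can share the data vertex v. The resulting pattern therefore always has exactly two pairs. `injectivity_pattern` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Pair = Tuple[int, int]  # (query vertex, data vertex)


def injectivity_pattern(p: Dict[int, int], u_k: int, v: int) -> Optional[FrozenSet[Pair]]:
    """P is injective, so P+ = P plus (u_k, v) is non-injective iff v is in val(P).
    In that case return D_k+ = {(w, v), (u_k, v)}, a two-element failure pattern."""
    for w, x in p.items():
        if x == v:
            return frozenset({(w, v), (u_k, v)})
    return None  # P+ is still injective; no new pattern from this cause
```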

[Search processing]
次に、図１２を参照して、探索装置１の探索処理について説明する。図１２は、本実施形態の探索処理手順を示すフローチャートである。図１２のフローチャートは、図６に示したフローチャートに、失敗パターンを抽出するための処理が追加されたものである。ただし、失敗パターンの集合は、関数Ｓｅａｒｃｈのスコープ外で定義された大域変数であり、空集合で初期化されているものとする。 [Search processing]
Next, the search process of the search device 1 is described with reference to FIG. 12, which is a flowchart of the search processing procedure of the present embodiment. The flowchart of FIG. 12 adds failure-pattern extraction to the flowchart shown in FIG. 6. The set of failure patterns is assumed to be a global variable, defined outside the scope of the function Search and initialized to the empty set.

In this search process, the pattern extraction unit 13b creates a failure pattern D based on Theorems 3, 5, and 7 above and adds D to the failure pattern set. The pattern extraction unit 13b also returns the failure pattern extracted from P as the return value of the function Search. If an embedding containing the argument P has been found, it returns the empty set instead.

First, as in the conventional subgraph matching process shown in FIG. 2, the search unit 13c checks whether P is a complete embedding (step S100). If it is (step S100, Yes), the search unit 13c reports P as an embedding (step S180). Because P is then not a failure, the search unit 13c sets the failure pattern D extracted from P to the empty set (step S161).

If P is not a complete embedding (step S100, No), the search unit 13c checks whether P is a failure according to Theorem 2 shown in FIG. 9 (step S163). If it is (step S163, Yes), the pattern extraction unit 13b extracts a failure pattern based on Theorem 3 shown in FIG. 9 (step S164). If it is not (step S163, No), the search unit 13c goes on to process each P⁺ obtained by adding a new assignment to P.

The search unit 13c first initializes to the empty set the set of failure patterns D⁺ extracted from the individual P⁺ (step S165). It then loops over the vertices v ∈ C_P[u_{k+1}] that satisfy the edge constraints (step S110). In each iteration, the search unit 13c creates P⁺ (step S121) and checks whether it is injective (step S122). Because P is injective, P⁺ is non-injective exactly when v ∈ val(P).

If P⁺ is not injective (step S122, Yes), the pattern extraction unit 13b adds a failure pattern, based on Theorem 7 shown in FIG. 11, to the set of failure patterns (step S131). If P⁺ is injective (step S122, No), the search unit 13c checks whether P⁺ matches a failure pattern recorded in the set of failure patterns (step S132).

If a matching failure pattern D⁺ exists (step S132, Yes), this D⁺ serves as the failure pattern for P⁺. The pattern extraction unit 13b therefore adds this D⁺ to the set of failure patterns (step S133).

If no failure pattern matches (step S132, No), the search unit 13c recursively calls Search(P⁺) and assigns the return value to D⁺ (step S141). The search unit 13c then adds the return value D⁺ to the set of failure patterns (step S142).

After this processing has been executed for every v ∈ C_P[u_{k+1}], the search unit 13c checks whether any P⁺ succeeded. The function Search returns the empty set when the search succeeds, so any success causes the empty set to be added in step S142. The search unit 13c therefore checks whether the set of failure patterns contains the empty set (step S166).

If the set of failure patterns contains the empty set (step S166, Yes), a complete embedding containing P exists, so the search unit 13c sets the failure pattern D to the empty set (step S167). If it does not (step S166, No), P is a search failure. The pattern extraction unit 13b then assigns to D the failure pattern extracted based on Theorem 5 shown in FIG. 10 (step S168).

After the above processing, if D is not the empty set, that is, if P is a failure (step S169, Yes), the pattern extraction unit 13b adds D to the set of failure patterns (step S170). After that step, or directly if D is the empty set (step S169, No), the function Search returns D to its caller.
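The flow of FIG. 12 can be sketched end to end in Python. This is an illustrative sketch under stated assumptions, not the patented implementation:

- Query vertices are matched in the fixed order 0, 1, ..., n-1.
- Candidates are filtered by vertex label only.
- C_P[u] keeps the label-matching data vertices that are adjacent to the images of u's assigned query neighbours.
- D_i (step S164) is the set of assignments of u_i's assigned neighbours.

Equation (16) is given only in FIG. 10, so step S168 uses a substitute rule in the style of failing-set pruning:

- If some child pattern D⁺ does not involve u_{k+1}, that D⁺ is returned.
- Otherwise the result is the union of every D⁺ with its u_{k+1} pair removed, plus the assignments of u_{k+1}'s assigned neighbours.

This rule is an assumption and is not claimed to be equation (16). Step numbers are given in comments.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Pair = Tuple[int, int]          # (query vertex, data vertex)
Pattern = FrozenSet[Pair]


def search_all(q_adj: Sequence[Set[int]], q_lab: Sequence[str],
               g_adj: Sequence[Set[int]], g_lab: Sequence[str]) -> List[Dict[int, int]]:
    """Enumerate embeddings of Q in G with failure-pattern pruning (sketch).

    Assumptions: query vertices are matched in the order 0..n-1, and candidates
    are filtered by vertex label only."""
    n = len(q_lab)
    cand = [[v for v in range(len(g_lab)) if g_lab[v] == q_lab[u]] for u in range(n)]
    failures: List[Pattern] = []    # global failure-pattern set, initially empty
    found: List[Dict[int, int]] = []

    def c_p(p: Dict[int, int], u: int) -> List[int]:
        # Assumed C_P[u]: candidates adjacent to the images of u's assigned neighbours.
        return [v for v in cand[u] if all(p[w] in g_adj[v] for w in q_adj[u] if w in p)]

    def nbr_pairs(p: Dict[int, int], u: int) -> Set[Pair]:
        # Assignments in P of the query neighbours of u.
        return {(w, p[w]) for w in q_adj[u] if w in p}

    def search(p: Dict[int, int]) -> Pattern:
        if len(p) == n:                                   # S100: complete embedding
            found.append(dict(p))                         # S180
            return frozenset()                            # S161: not a failure
        for u_i in range(len(p), n):                      # S163: Theorem 2 check
            if not c_p(p, u_i):
                d = frozenset(nbr_pairs(p, u_i))          # S164: assumed D_i
                break
        else:
            u = len(p)                                    # u_{k+1}
            owner = {v: w for w, v in p.items()}          # val(P) -> query vertex
            child: List[Pattern] = []                     # S165
            for v in c_p(p, u):                           # S110
                p_plus = {**p, u: v}                      # S121
                if v in owner:                            # S122: P+ not injective
                    child.append(frozenset({(owner[v], v), (u, v)}))   # S131
                    continue
                items = frozenset(p_plus.items())
                hit = next((f for f in failures if f <= items), None)  # S132
                # S133 if a recorded pattern matches, else recurse (S141-S142).
                child.append(hit if hit is not None else search(p_plus))
            if any(not dp for dp in child):               # S166: some P+ succeeded
                d = frozenset()                           # S167
            else:                                         # S168: substitute rule
                free = next((dp for dp in child if all(w != u for w, _ in dp)), None)
                d = free if free is not None else frozenset(nbr_pairs(p, u)).union(
                    *({(w, x) for w, x in dp if w != u} for dp in child))
        # S169-S170. An empty D is never recorded; if D happens to be empty after
        # a failure, this only forgoes pruning and never drops an embedding.
        if d:
            failures.append(d)
        return d

    search({})
    return found
```

Every recorded pattern is contained in the partial embedding it came from and can never be extended to a complete embedding, so pruning with it loses no embeddings. For example, for a triangle query against a 5-cycle with uniform labels, the sketch records the single-pair pattern {(0, 0)}. That pattern rules out mapping query vertex 0 to data vertex 0 anywhere later in the search.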

As described above, the search device 1 of the present embodiment works on graphs composed of labeled vertices and edges connecting adjacent vertices. It searches a data graph, which is the graph of the data to be searched, for portions isomorphic to a query graph, which is the graph of the query used for the search.

In doing so, the pattern extraction unit 13b uses the search failures that occur during the search to extract failure patterns. Each failure pattern is a set of combinations of query-graph vertices and data-graph vertices that represents the cause of a search failure. The search unit 13c prunes search states that match the extracted failure patterns while searching the data graph for portions isomorphic to the query graph.

This speeds up subgraph matching, that is, searching a data graph for portions containing the query graph. It therefore becomes easy to offer interactive work with queries expressed as subgraph matching against a graph database, which is a database designed for graph data. It also becomes easy to offer services that depend on such a database. In addition, data mining that applies subgraph matching to a graph can be completed in a realistic amount of time, for example when the number of occurrences of a specific subgraph is used as a feature of the graph.

The invention can also be applied when labels are attached to edges. That is, the search device 1 may work on graphs composed of vertices and labeled edges connecting adjacent vertices, searching a data graph for portions isomorphic to a query graph. In this case, the pattern extraction unit 13b likewise extracts failure patterns from the search failures that occur during the search. Each failure pattern is a set of combinations of query-graph vertices and data-graph vertices that represents the cause of a search failure. This broadens the range of graphs to which the search process can be applied.

[Other Embodiments]
上記実施形態において、探索装置１は、失敗パターンの集合の中から部分埋め込みに合致する失敗パターンを探索している（図１２のステップＳ１３２）が、これに限定されない。図１３は、他の実施形態の探索処理を説明するための説明図である。集合の中から特定の条件を満たす要素を探索する作業は、一つ一つの要素を確認する方法では要素の数に比例した処理時間を要し、処理時間が増大する。また、集合に追加される失敗パターンが膨大な数になり得るため、記憶装置に全ての情報を記録できないおそれがある。 [Other Embodiments]
In the above embodiment, the search device 1 searches the set of failure patterns for a failure pattern that matches the partial embedding (step S132 in FIG. 12), but the invention is not limited to this. FIG. 13 is an explanatory diagram of the search process of another embodiment.

Searching a set for an element that satisfies a given condition by checking each element in turn takes time proportional to the number of elements, so processing time grows with the set. Moreover, the number of failure patterns added to the set can become enormous, and the storage device may be unable to hold all of them.

Therefore, as shown in FIG. 13, the failure patterns may be held in a hash table. In this case, the search device 1 stores in the storage unit 14 the assignment (u_k, P[u_k]) most recently added to the partial embedding P, in association with the failure pattern extracted from P.

As a result, only one failure pattern is checked in step S132 of FIG. 12, which shortens processing time. In addition, step S170 overwrites the value in the hash table and does not retain the old value, so fewer failure patterns are recorded.

There are at most |V_Q||V_G| possible combinations of a query vertex and a data vertex, so the number of retained failure patterns cannot exceed this bound. That is far fewer than when every extracted failure pattern is kept. The search process can therefore be executed even with limited storage.
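A minimal Python sketch of the hash-table variant follows. It assumes that a partial embedding is a dict and that a failure pattern is a frozenset of (query vertex, data vertex) pairs. Under this sketch:

- Each failure pattern is stored under the assignment (u_k, P[u_k]) last added to the partial embedding it came from.
- A new pattern overwrites the old one.
- A lookup checks only the single pattern stored under that key.

`FailureTable` and its methods are hypothetical names. Because keys are (query vertex, data vertex) pairs, the table can never hold more than |V_Q||V_G| entries.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Pair = Tuple[int, int]  # (query vertex, data vertex)


class FailureTable:
    """Failure patterns keyed by the assignment last added to P (cf. FIG. 13).
    At most one pattern is kept per key; recording again overwrites it."""

    def __init__(self) -> None:
        self._table: Dict[Pair, FrozenSet[Pair]] = {}

    def record(self, last: Pair, pattern: FrozenSet[Pair]) -> None:
        # Step S170 in this variant: overwrite, the old value is dropped.
        self._table[last] = pattern

    def lookup(self, partial: Dict[int, int], last: Pair) -> Optional[FrozenSet[Pair]]:
        # Step S132 in this variant: test only the one pattern stored under `last`.
        d = self._table.get(last)
        if d is not None and d <= frozenset(partial.items()):
            return d
        return None

    def __len__(self) -> int:
        # Keys are (u, v) pairs, so this never exceeds |V_Q| * |V_G|.
        return len(self._table)
```

In a search loop, step S170 would call `record` with P's last assignment. Step S132 would call `lookup` with P⁺ and its last assignment (u_{k+1}, v).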

[Program]
上記実施形態に係る探索装置１が実行する処理をコンピュータが実行可能な言語で記述したプログラムを作成することもできる。一実施形態として、探索装置１は、パッケージソフトウェアやオンラインソフトウェアとして上記の探索処理を実行する探索プログラムを所望のコンピュータにインストールさせることによって実装できる。例えば、上記の探索プログラムを情報処理装置に実行させることにより、情報処理装置を探索装置１として機能させることができる。ここで言う情報処理装置には、デスクトップ型またはノート型のパーソナルコンピュータが含まれる。また、その他にも、情報処理装置にはスマートフォン、携帯電話機やＰＨＳ（Personal Handyphone System）などの移動体通信端末、さらには、ＰＤＡ（Personal Digital Assistants）などのスレート端末などがその範疇に含まれる。また、ユーザが使用する端末装置をクライアントとし、当該クライアントに上記の探索処理に関するサービスを提供するサーバ装置として実装することもできる。例えば、探索装置１は、グラフデータを入力とし、サブグラフマッチング結果を出力する探索処理サービスを提供するサーバ装置として実装される。この場合、探索装置１は、Ｗｅｂサーバとして実装することとしてもよいし、アウトソーシングによって上記の探索処理に関するサービスを提供するクラウドとして実装することとしてもかまわない。以下に、探索装置１と同様の機能を実現する探索プログラムを実行するコンピュータの一例を説明する。 [program]
The processing executed by the search device 1 of the above embodiment can also be written as a program in a computer-executable language. As one embodiment, the search device 1 can be implemented by installing on a desired computer a search program, delivered as packaged or online software, that executes the above search process. Causing an information processing device to execute the search program makes it function as the search device 1.

The information processing device referred to here includes:
- desktop and notebook personal computers;
- mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals; and
- slate terminals such as PDAs (Personal Digital Assistants).

The search device 1 can also be implemented as a server device that treats a user's terminal device as a client and provides that client with services related to the above search process. For example, the search device 1 may be a server device offering a search processing service that takes graph data as input and outputs subgraph matching results. In this case, it may be implemented as a Web server, or as a cloud that provides these services through outsourcing.

An example of a computer that executes a search program realizing the same functions as the search device 1 is described below.

FIG. 14 is a diagram showing an example of a computer that executes the search program. The computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041, into which a removable storage medium such as a magnetic disk or an optical disk is inserted. For example, a mouse 1051 and a keyboard 1052 are connected to the serial port interface 1050, and a display 1061 is connected to the video adapter 1060.

The hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each table described in the above embodiment is stored in, for example, the hard disk drive 1031 or the memory 1010.

The search program is stored in the hard disk drive 1031 as, for example, a program module 1093 that describes instructions executed by the computer 1000. Specifically, the hard disk drive 1031 stores a program module 1093 describing each process that the search device 1 of the above embodiment executes.

Data used for information processing by the search program is stored as program data 1094 in, for example, the hard disk drive 1031. The CPU 1020 reads the program module 1093 and program data 1094 from the hard disk drive 1031 into the RAM 1012 as needed and executes each of the procedures described above.

The program module 1093 and program data 1094 of the search program need not be stored in the hard disk drive 1031. For example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, they may be stored in another computer connected via a network such as a LAN or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.

Embodiments applying the invention made by the present inventors have been described above. However, the present invention is not limited by the description and drawings that form part of this disclosure. All other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of the present embodiment fall within the scope of the present invention.

1 Search device
11 Input unit
12 Output unit
13 Control unit
13a Vertex extraction unit
13b Pattern extraction unit
13c Search unit
14 Storage unit

Claims

A search device, comprising:
a pattern extraction unit that, when a data graph, which is a graph of data to be searched, is searched for a portion isomorphic to a query graph, which is a graph of a query used for the search, among graphs composed of labeled vertices and edges connecting adjacent vertices, extracts a set of combinations of vertices of the query graph and vertices of the data graph as a failure pattern representing a cause of a search failure, based on the search failure occurring in the course of the search; and
a search unit that prunes a search state matching the failure pattern extracted by the pattern extraction unit and searches the data graph for a portion isomorphic to the query graph.

The search device according to claim 1, further comprising a vertex extraction unit that, based on combinations of labels of each vertex of the data graph and its adjacent vertices, extracts, as candidate vertices corresponding to each vertex of the query graph, those vertices of the data graph that can match a vertex of the query graph and whose adjacent vertices match the vertices adjacent to that vertex of the query graph,
wherein the pattern extraction unit extracts the failure pattern using the extracted candidate vertices.

A search device, comprising:
a pattern extraction unit that, when a data graph, which is a graph of data to be searched, is searched for a portion isomorphic to a query graph, which is a graph of a query used for the search, among graphs composed of vertices and labeled edges connecting adjacent vertices, extracts a set of combinations of vertices of the query graph and vertices of the data graph as a failure pattern representing a cause of a search failure, based on the search failure occurring in the course of the search; and
a search unit that prunes a search state matching the failure pattern extracted by the pattern extraction unit and searches the data graph for a portion isomorphic to the query graph.

The search device according to any one of claims 1 to 3, wherein
the pattern extraction unit records the failure pattern in a storage unit as a hash table, and
the search unit refers to the storage unit to prune a search state matching the failure pattern and searches the data graph for a portion isomorphic to the query graph.

A search method executed by a search device, comprising:
a pattern extraction step of, when a data graph, which is a graph of data to be searched, is searched for a portion isomorphic to a query graph, which is a graph of a query used for the search, among graphs composed of labeled vertices and edges connecting adjacent vertices, extracting a set of combinations of vertices of the query graph and vertices of the data graph as a failure pattern representing a cause of a search failure, based on the search failure occurring in the course of the search; and
a search step of pruning a search state matching the failure pattern extracted in the pattern extraction step and searching the data graph for a portion isomorphic to the query graph.

A search program causing a computer to execute:
a pattern extraction step of, when a data graph, which is a graph of data to be searched, is searched for a portion isomorphic to a query graph, which is a graph of a query used for the search, among graphs composed of labeled vertices and edges connecting adjacent vertices, extracting a set of combinations of vertices of the query graph and vertices of the data graph as a failure pattern representing a cause of a search failure, based on the search failure occurring in the course of the search; and
a search step of pruning a search state matching the failure pattern extracted in the pattern extraction step and searching the data graph for a portion isomorphic to the query graph.