JP2011039838A

JP2011039838A - Pattern classification device and pattern classification method

Info

Publication number: JP2011039838A
Application number: JP2009187377A
Authority: JP
Inventors: Kyoshi Iizuka; 京士飯塚; Takahiko Murayama; 隆彦村山; Tomohide Yamamoto; 具英山本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-12
Filing date: 2009-08-12
Publication date: 2011-02-24
Anticipated expiration: 2029-08-12
Also published as: JP5277111B2

Abstract

PROBLEM TO BE SOLVED: To provide a pattern classification device and a pattern classification method capable of satisfactorily classifying a plurality of patterns.
SOLUTION: A query issuing part 14 generates N×M retrieval patterns by including M mutually different keywords in a first node of each of N patterns used for retrieving subgraphs in a graph G (S11), and retrieves from the graph G the subgraphs that match each retrieval pattern (S15). A pattern classification part 15 classifies the N patterns based on the instances in a second node of the retrieved subgraphs (S19).
COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a pattern classification device and a pattern classification method.

In recent years, a large number of data sources have become available on computer networks, and technologies such as data warehouses, which combine multiple data sources to extract information that cannot be obtained from any single data source, have been attracting attention.

Meanwhile, Semantic Web technology using RDF (Resource Description Framework), a data model that can be represented as a graph, is also attracting attention as a framework for handling information obtained from multiple different data sources in a unified manner.

For the Semantic Web, RDF search techniques have been proposed that retrieve the required information by matching search patterns (hereinafter simply called patterns) written in an RDF query language such as SPARQL.

Non-Patent Document 1 discloses a technique for a system that searches natural-language text containing a keyword string: keywords whose search results are similar are treated as similar queries, the queries are classified into groups of similar queries, and similar queries are suggested to the user. However, this approach has the following problems.

The queries referred to in these techniques correspond to the search keyword variable of a pattern, and nothing is said about the part that corresponds to the pattern itself. Pattern classification therefore cannot be performed.

For example, Non-Patent Document 2 proposes a structural similarity that reflects the characteristics of the target graph set: structural similarity is defined using characteristic substructures and is used to judge the similarity of subgraphs.

However, this technique targets unlabeled undirected graphs and cannot be applied as-is to labeled directed graphs such as RDF.

When selecting a pattern to use as a query for searching graph-structured data whose nodes and arcs carry labels, such as RDF, a complex graph structure makes it difficult to find a pattern that performs the intended search, so patterns need to be efficiently selectable.

In particular, a graph built by combining multiple different data sources contains semantic duplication, and many patterns exist that are semantically similar but structurally different, so choosing among them becomes cumbersome.

Non-Patent Document 1: Toru Onoda, Takayuki Yumoto, Kazutoshi Kakutani, "Topic clustering based on partial similarity of search tendency", Transactions of the Database Society of Japan, Vol. 7, No. 3, pp. 49-54, December 2008
Non-Patent Document 2: Takahisa Wada, Hiroyuki Ohno, Hiromasa Inazumi, "Proposal of Structural Similarity Reflecting Characteristics of Object Graph Sets", Database Society of Japan Letters, Vol. 6, No. 1, pp. 185-188, June 2007

The present invention has been made in view of the above, and its object is to provide a pattern classification device and a pattern classification method capable of classifying a plurality of patterns.

To solve the above problems, a pattern classification device according to the present invention comprises: a graph database storing a graph in which nodes having instances are connected by arcs; graph search means for generating N×M search patterns by including M mutually different keywords in a first node of each of N patterns used to search for subgraphs in the graph, and for searching the graph for subgraphs matching each search pattern; and pattern classification means for obtaining, using the instances in a second node of the retrieved subgraphs, a degree of similarity for each combination of two patterns among the N patterns, and for classifying the N patterns based on the degree of similarity.

A pattern classification method according to the present invention is performed by a pattern classification device having a graph database storing a graph in which nodes having instances are connected by arcs. In the method, graph search means of the pattern classification device generates N×M search patterns by including M mutually different keywords in a first node of each of N patterns used to search for subgraphs in the graph, and searches the graph for subgraphs matching each search pattern; and pattern classification means of the pattern classification device obtains, using the instances in a second node of the retrieved subgraphs, a degree of similarity for each combination of two patterns among the N patterns, and classifies the N patterns based on the degree of similarity.

The pattern classification means may classify the N patterns, based on the degree of similarity, into pattern clusters each containing one or more patterns. For each keyword, let A be the set of instances in the second node of the one or more subgraphs matching the search pattern obtained by including that keyword in one of two patterns, and let B be the set of instances in the second node of the one or more subgraphs matching the search pattern obtained by including that keyword in the other of the two patterns. The following value is then calculated:
similarity determination value = |A∩B| ÷ |A∪B|
where
A and B are both non-empty,
|A∩B| is the number of instances in the intersection of A and B, and
|A∪B| is the number of instances in the union of A and B.
Two patterns for which k or more (0 < k ≦ K, where K is the number of keywords) of the similarity determination values corresponding to the keywords are equal to or greater than a predetermined threshold may be treated as mutually similar patterns and included in the same pattern cluster.
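
The similarity determination value and the k-of-K rule described above can be sketched in Python as follows. This is only an illustration, not the implementation of the invention: the function names `similarity` and `are_similar` and the dictionary layout (keyword mapped to a set of second-node instances) are assumptions made for the example, as is returning `None` when either set is empty, since the text only defines the value for non-empty A and B.

```python
def similarity(a, b):
    """Similarity determination value |A∩B| ÷ |A∪B|.

    Returns None when either set is empty (an assumption of this sketch;
    the text only defines the value for non-empty A and B).
    """
    a, b = set(a), set(b)
    if not a or not b:
        return None
    return len(a & b) / len(a | b)


def are_similar(results_p, results_q, threshold, k):
    """Return True if at least k keywords give a value >= threshold.

    results_p and results_q map each keyword to the set of second-node
    instances retrieved for one pattern (hypothetical data layout).
    """
    hits = 0
    for keyword in results_p.keys() & results_q.keys():
        value = similarity(results_p[keyword], results_q[keyword])
        if value is not None and value >= threshold:
            hits += 1
    return hits >= k
```

With k = 1 a single agreeing keyword is enough to treat two patterns as similar, while k = K requires every keyword to agree.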

Alternatively, the pattern classification means may perform the following processing.
First, from the N×M search patterns, it excludes the search patterns used in searches that yielded no matching subgraph.
Next, when subgraphs are obtained from both of the search patterns produced by including a common keyword in each of two patterns, it associates those two patterns. The one or more keywords contained in the non-excluded search patterns obtained from a plurality of mutually associated patterns are defined as a keyword cluster, and the one or more keywords contained in the non-excluded search patterns obtained from a single pattern not associated with any other pattern are also defined as a keyword cluster. The keywords are thereby classified into one or more keyword clusters.
Next, it selects one keyword from each keyword cluster such that the number of non-excluded search patterns containing the selected keyword is largest.
Next, for each keyword cluster, it selects the one or more patterns that were used to generate the non-excluded search patterns containing the selected keyword.
Next, among the selected patterns, a plurality of mutually similar patterns are defined as a pattern cluster, and a single pattern not similar to any other pattern is also defined as a pattern cluster, so that the selected patterns are classified into one or more pattern clusters. For any two patterns contained in a pattern cluster of the former kind, let A be the set of instances in the second node of the one or more subgraphs matching the search pattern obtained by including the selected keyword in one of the two patterns, and let B be the set of instances in the second node of the one or more subgraphs matching the search pattern obtained by including the selected keyword in the other of the two patterns. The classification is made so that
similarity determination value = |A∩B| ÷ |A∪B|
where
A and B are both non-empty,
|A∩B| is the number of instances in the intersection of A and B, and
|A∪B| is the number of instances in the union of A and B,
is equal to or greater than a predetermined threshold.
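
The five steps of this alternative procedure can be sketched as below. The sketch rests on several assumptions that are not stated in the text: search results are given as a dictionary keyed by (pattern, keyword), keyword clusters are found with a small union-find over pattern and keyword nodes, ties between equally covered keywords are broken alphabetically, and the final grouping is greedy (a pattern joins a cluster only if it meets the threshold against every member, which guarantees the pairwise condition but is just one possible way to satisfy it). The name `classify` is hypothetical.

```python
def jaccard(a, b):
    """Similarity determination value for two non-empty sets."""
    return len(a & b) / len(a | b)


def classify(results, threshold):
    """results maps (pattern, keyword) -> set of second-node instances."""
    # Step 1: exclude search patterns that retrieved no subgraph.
    kept = {key: inst for key, inst in results.items() if inst}

    # Step 2: patterns sharing a kept keyword are associated, so connected
    # components over pattern/keyword nodes give the keyword clusters.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pattern, keyword in kept:
        parent[find(("P", pattern))] = find(("K", keyword))
    keyword_clusters = {}
    for pattern, keyword in kept:
        keyword_clusters.setdefault(find(("K", keyword)), set()).add(keyword)

    pattern_clusters = []
    for keywords in keyword_clusters.values():
        # Step 3: choose the keyword carried by the most kept search patterns.
        chosen = max(sorted(keywords),
                     key=lambda kw: sum(1 for _, k in kept if k == kw))
        # Step 4: patterns that produced a kept search pattern for it.
        selected = sorted(p for p, k in kept if k == chosen)
        # Step 5: greedy grouping under the pairwise threshold condition.
        clusters = []
        for p in selected:
            for cluster in clusters:
                if all(jaccard(kept[(p, chosen)], kept[(q, chosen)]) >= threshold
                       for q in cluster):
                    cluster.append(p)
                    break
            else:
                clusters.append([p])
        pattern_clusters.extend(clusters)
    return pattern_clusters
```

Because only one keyword per keyword cluster is evaluated, the number of set comparisons is much smaller than in the k-of-K variant, at the cost of ignoring the other keywords in the cluster.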

According to the present invention, patterns can be classified for the case where a subgraph matching the search pattern obtained by including a keyword in the first node of a pattern is retrieved and the instances in the second node of that subgraph are obtained as the search result, which substantially reduces the number of patterns. In addition, when a user is asked to select a pattern, for example, the user can select one easily.

A configuration diagram of the graph search device according to the present embodiment.
A diagram illustrating part of the graph G.
A diagram illustrating data in RDF/XML format, its original data, and the subgraph formed by data in this format.
A diagram illustrating patterns in graph form.
A sequence diagram showing the operation of classifying patterns.
A diagram showing the screen displayed when selecting classes and specifying a threshold for pattern classification.
A diagram showing the search patterns generated when classifying patterns.
A diagram showing the subgraphs retrieved when classifying patterns.
A sequence diagram showing the operation of searching for the instances of a node.
A diagram showing the screen displayed when selecting classes for an instance search.
A diagram showing the screen displayed when selecting a pattern and entering a keyword for an instance search.
A diagram showing the screen that displays the retrieved instances.
A diagram showing the patterns and keywords used in describing another method of classifying patterns.
A flowchart showing the operation of step S19 in that method.
A diagram showing how patterns are excluded and associated and how keywords are selected in that method.
A diagram showing how the sets of instances are formed in that method.
A diagram showing the result of classifying 330 patterns with that method.
A diagram showing the patterns contained in one pattern cluster obtained by that classification.

Embodiments of the present invention are described below with reference to the drawings. Identical or similar elements are given the same reference signs, and duplicate description is omitted.

FIG. 1 is a configuration diagram of the graph search device according to the present embodiment.
The graph search device 1 is connected to a user terminal 2, and a display device 3 is connected to the user terminal 2.

The graph search device 1 comprises: a user interface 11 that generates the input and output interfaces displayed on the display device 3 and transmits them to the user terminal 2; a graph database 12 storing a graph in which nodes having instances are connected by arcs; a pattern database 13 storing the patterns used to search for subgraphs; a query issuing unit 14 that searches for subgraphs and patterns; a pattern classification unit 15 that classifies patterns; and a pattern cluster database 16 storing the pattern clusters obtained by classification.
Because it includes the pattern classification unit 15, the graph search device 1 also functions as a pattern classification device.

FIG. 2 illustrates part of the graph G that can be displayed using all of the data stored in the graph database 12.

Using all of the data stored in the graph database 12, it is possible to display the graph G partly illustrated in FIG. 2, that is, a graph in which nodes having mutually different instances are connected by labeled arcs and in which the class of each instance is defined. Put the other way around, the graph database 12 stores exactly the data needed to display the graph G, no more and no less. For convenience, this data is hereinafter itself called the graph G. Likewise, the data needed, no more and no less, to display a graph, a subgraph (a graph itself or a graph contained in it), a path (a graph with no branches and no closed loops), or the like together with its classes is for convenience called a graph, a subgraph, a path, and so on.

A label is an identifier that identifies the type of an arc; a class is a node representing the concept to which instances belong; and an instance is a node representing an individual thing other than a class.

In the graph G, for example, nodes having instances such as "politics" and "Sachiko Yamamoto" are connected by an arc having a label such as "theme: person in charge". The graph G also defines, for a node, a class such as "theme", which is the concept to which an instance such as "politics" belongs.
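
As a rough illustration of this data model, the following Python sketch stores labeled arcs as triples and records the class of each instance in a separate mapping. The `Graph` class, its method names, and the direction of the example arc (from "politics" to "Sachiko Yamamoto") are assumptions made for the example; the text does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Arcs stored as (source instance, label, target instance) triples.
    arcs: set = field(default_factory=set)
    # Class (concept) defined for each instance, where known.
    classes: dict = field(default_factory=dict)

    def add_arc(self, source, label, target):
        """Connect two instance nodes with a labeled arc."""
        self.arcs.add((source, label, target))

    def labels_between(self, source, target):
        """Labels of all arcs leading from source to target."""
        return {l for s, l, t in self.arcs if s == source and t == target}


g = Graph()
g.classes["politics"] = "theme"  # instance "politics" belongs to class "theme"
g.add_arc("politics", "theme: person in charge", "Sachiko Yamamoto")
```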

As shown in FIG. 3, the original data denoted "Paper F", whose author is Taro Yamada, whose title is "Introduction to B Technology", and whose keyword is B technology, is stored in the graph database 12 as data in RDF/XML format and forms a subgraph of the graph G. The item entitled "RDF graph representation" is a graphical rendering of this subgraph. RDF is described in the following documents.

"Resource Description Framework (RDF) Model and Syntax Specification", Ora Lassila and Ralph R. Swick (eds.), [online], Internet <URL: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/>
"RDF Vocabulary Description Language 1.0: RDF Schema", Dan Brickley and R. V. Guha (eds.), [online], Internet <URL: http://www.w3.org/TR/rdf-schema/>
Returning to FIG. 1, the pattern database 13 stores the patterns used to search for subgraphs.

FIG. 4 illustrates, in graph form, three of the patterns stored in the pattern database 13.

A pattern is the same kind of data as the data forming part of the graph G stored in the graph database 12, and since it can be drawn as a graph as in FIG. 4, it may for convenience be called a graph. A pattern, however, is not something to be displayed; it is used to search for the graphs that are displayed. Because explaining actual patterns, which are data, item by item would be tedious, patterns are described below in their graph form.

In general, some of the nodes and arcs in a pattern have instances or labels and the rest do not. A variable is set for each node or arc that has no instance or label. As shown in the figure, a variable consists of "?" followed by a word.

Here it is assumed that the pattern database 13 stores patterns P1, P2, and P3, each of which has a node with the class "theme" defined at one end and a node with the class "organization" defined at the other end, and in each of which no node has an instance and every arc has a label. All of these are patterns for finding an organization from a theme: pattern P1 means "the organization to which the theme belongs", pattern P2 means "the organization to which the person responsible for the theme belongs", and pattern P3 means "the organization to which the person in charge of the theme belongs".

A subgraph retrieved from a graph by such a pattern satisfies the following conditions.

That is, what is retrieved (1) is the graph itself or one of its subgraphs, (2) has exactly the structure of the pattern, and (3) has exactly the instances and labels of the pattern; in other words, each node or arc located at the same position as a node or arc that has an instance or label in the pattern has that same instance or label.

To elaborate on condition (3): if, for example, the node at one end of the pattern has the instance "A", then the node at that end of a retrieved subgraph must also have the instance "A"; and if the only arc connected to the node at one end of the pattern has the label "B", then the only arc connected to the node at that end of the subgraph must also have the label "B". Such matching of instances and labels is required for every node and arc in the pattern that has an instance or label.

Searching for a subgraph with a pattern in this way is referred to as searching for a subgraph that matches the pattern.
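
The matching conditions can be illustrated with a small backtracking matcher. This is a simplified sketch under stated assumptions: a pattern is a list of (node, label, node) triples, a term starting with "?" is a variable, classes are ignored, and the arc labels used in the example ("person in charge", "member of", "belongs to") are hypothetical. The function names are likewise invented for the example.

```python
def is_variable(term):
    """A variable is written as '?' followed by a word."""
    return term.startswith("?")


def match(pattern, triples, binding=None):
    """Yield every variable binding under which all pattern triples
    occur among the graph triples."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_variable(p_term):
                # A variable must keep the same value wherever it appears.
                if candidate.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                # A fixed instance or label must match exactly.
                break
        else:
            yield from match(rest, triples, candidate)


graph = [
    ("politics", "belongs to", "XX Group"),
    ("politics", "person in charge", "Sachiko Yamamoto"),
    ("Sachiko Yamamoto", "member of", "YY Group"),
]
pattern = [("politics", "person in charge", "?person"),
           ("?person", "member of", "?org")]
bindings = list(match(pattern, graph))
```

Each binding corresponds to one matching subgraph; the value bound to `?org` plays the role of the instance in the node at the far end of the pattern.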

Returning to FIG. 1, the query issuing unit 14 retrieves patterns from the pattern database 13. The query issuing unit 14 also searches for subgraphs of the graph G.

In the graph search device 1, it suffices that the units (including the databases) can exchange data with one another. That is, the units may be placed on the same computer or distributed across multiple computers. A computer program that causes such computers to operate as the graph search device or the pattern classification device may be transmitted and received over a communication line. The computer program may also be recorded on a recording medium such as a semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk, or a magnetic tape, and the recording medium may be distributed.

(Operation of this embodiment)
FIG. 5 is a sequence diagram showing the operation of classifying patterns in the graph search device 1.

In the graph search device 1, the user interface 11 generates an input interface and transmits it to the user terminal 2 (S1), which displays it as shown in FIG. 6 (S3).

Suppose that the user, for example at a newspaper company, wants to know which internal organizations are related to the theme "politics". Suppose also that the user wants the similarity judgment between patterns to be fairly strict and considers the appropriate level to be "0.7" on a scale whose maximum is 1.

In this example, it is assumed that, through the user's operation, the information "theme" (referred to as the class "theme") and the information "organization" (referred to as the class "organization") are selected from the information contained in the input interface.

It is also assumed that, at the user terminal 2, the value "0.7" (referred to as the threshold "0.7") is specified through the user's operation.

The user terminal 2 transmits these parameters to the graph search device 1 (S5).

In the graph search device 1, the query issuing unit 14 sends the graph database 12 a query, that is, a search expression containing the class "theme", and thereby retrieves from the graph database 12 the instances in the nodes for which the class "theme" is defined (here the instances "politics", "history", and "science", hereinafter called keywords K1, K2, and K3, respectively) (S7).

The query issuing unit 14 sends the pattern database 13 a query, that is, a search expression containing the class "theme" and the class "organization", and thereby retrieves patterns P1, P2, and P3 from the pattern database 13 (S9).
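
Steps S7 and S9 amount to two lookups by class, which can be sketched as follows. The in-memory dictionaries stand in for the graph database 12 and the pattern database 13 and are assumptions of this sketch, as are the function names and the extra pattern "P9", which is included only to show that patterns with other end classes are filtered out.

```python
def instances_of(classes, wanted_class):
    """S7: instances whose class is wanted_class, in stored order."""
    return [inst for inst, cls in classes.items() if cls == wanted_class]


def patterns_between(pattern_db, first_class, second_class):
    """S9: names of patterns whose two end nodes carry the given classes."""
    return [name for name, ends in pattern_db.items()
            if ends == (first_class, second_class)]


# Hypothetical stand-ins for the graph database and the pattern database.
classes = {
    "politics": "theme",
    "history": "theme",
    "science": "theme",
    "XX Group": "organization",
}
pattern_db = {
    "P1": ("theme", "organization"),
    "P2": ("theme", "organization"),
    "P3": ("theme", "organization"),
    "P9": ("person", "organization"),  # illustrative only, not in the text
}

keywords = instances_of(classes, "theme")
patterns = patterns_between(pattern_db, "theme", "organization")
```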

Next, the query issuing unit 14 includes each of the keywords K1, K2, and K3 in the node for which the class "theme" is defined (called the first node) in each of the patterns P1, P2, and P3, thereby generating 3 (number of patterns) × 3 (number of keywords) = 9 patterns (hereinafter called search patterns P11 to P33) (S11).

FIG. 7 shows these search patterns.

Classes are omitted from this figure. For example, search pattern P11 is pattern P1 with keyword K1 "politics" included as the instance of its first node.
Search pattern P12 is pattern P1 with keyword K2 "history" included as the instance of its first node.
Search pattern P13 is pattern P1 with keyword K3 "science" included as the instance of its first node.
Search pattern P21 is pattern P2 with keyword K1 "politics" included as the instance of its first node.
Search pattern P22 is pattern P2 with keyword K2 "history" included as the instance of its first node.
Search pattern P23 is pattern P2 with keyword K3 "science" included as the instance of its first node.
Search pattern P31 is pattern P3 with keyword K1 "politics" included as the instance of its first node.
Search pattern P32 is pattern P3 with keyword K2 "history" included as the instance of its first node.
Search pattern P33 is pattern P3 with keyword K3 "science" included as the instance of its first node.
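
Generating the N×M search patterns is a Cartesian product of patterns and keywords, as the following sketch shows. The triple representation of P1 to P3, the variable names `?theme`, `?p`, and `?org`, and the arc labels are hypothetical; only the naming scheme P11 to P33 and the keyword order follow the text.

```python
from itertools import product


def bind_first_node(pattern, first_node, keyword):
    """Replace the first-node variable with the keyword in every triple."""
    return [tuple(keyword if term == first_node else term for term in triple)
            for triple in pattern]


def make_search_patterns(patterns, keywords, first_node="?theme"):
    """Build N×M search patterns named P<pattern index><keyword index>."""
    return {
        f"P{i}{j}": bind_first_node(pattern, first_node, keyword)
        for (i, pattern), (j, keyword)
        in product(enumerate(patterns, 1), enumerate(keywords, 1))
    }


# Hypothetical encodings of P1, P2, P3 (arc labels are illustrative).
patterns = [
    [("?theme", "belongs to", "?org")],
    [("?theme", "responsible person", "?p"), ("?p", "member of", "?org")],
    [("?theme", "person in charge", "?p"), ("?p", "member of", "?org")],
]
keywords = ["politics", "history", "science"]  # K1, K2, K3
search_patterns = make_search_patterns(patterns, keywords)
```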

Returning to FIG. 5, the query issuing unit 14 converts the search patterns P11 to P33 into queries and sends them to the graph database 12, thereby obtaining from the graph G the subgraphs that match each search pattern (S15).
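
The text does not say which query language the search patterns are converted into. Assuming SPARQL, which the background section names as an example of an RDF query language, the conversion could look like the sketch below. The function `to_sparql`, the `ex:` prefix, the example IRI, and the property names are all hypothetical, and the terms are expected to be written already as SPARQL tokens (prefixed names or variables).

```python
def to_sparql(search_pattern, result_variable, prefix_iri="http://example.org/"):
    """Render a list of (subject, label, object) triples as a SELECT query.

    Terms must already be SPARQL tokens such as ex:politics or ?org.
    """
    lines = [f"PREFIX ex: <{prefix_iri}>",
             f"SELECT {result_variable} WHERE {{"]
    for subject, label, obj in search_pattern:
        lines.append(f"  {subject} {label} {obj} .")
    lines.append("}")
    return "\n".join(lines)


query = to_sparql(
    [("ex:politics", "ex:personInCharge", "?p"), ("?p", "ex:memberOf", "?org")],
    "?org",
)
```

Selecting only the variable of the second node returns exactly the instances that the classification step later compares.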

Here it is assumed that the subgraphs SG(P11) to SG(P33) shown in FIG. 8 are obtained from the search patterns P11 to P33, respectively.

Returning to FIG. 5, the query issuing unit 14 passes the patterns P1, P2, and P3, the subgraphs SG(P11) to SG(P33), the threshold "0.7", and the class "organization" to the pattern classification unit 15 (S17).

The pattern classification unit 15 classifies the patterns based on the subgraphs, the threshold, and the class (S19), and stores the classification result in the pattern cluster database 16 (S21).

Step S19 is now described in detail.
The pattern classification unit 15 classifies the patterns P1 to P3 into one or more pattern clusters. Here, a pattern cluster is either a set containing a plurality of similar patterns or a single pattern that is not similar to any other pattern. Thus the pattern classification unit 15 generates, for example, a pattern cluster containing patterns P1 and P2 and a pattern cluster containing pattern P3.
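
Grouping patterns into pattern clusters from pairwise similarity verdicts can be sketched as below. The text does not specify how verdicts for different pairs are combined, so this sketch assumes that similarity is applied transitively (clusters of similar pairs are merged); `cluster_patterns` and the `similar` callback are hypothetical names.

```python
from itertools import combinations


def cluster_patterns(patterns, similar):
    """Group patterns so that every pair judged similar shares a cluster.

    similar(p, q) returns the pairwise verdict. Merging is transitive,
    which is an assumption of this sketch.
    """
    cluster_of = {p: {p} for p in patterns}
    for p, q in combinations(patterns, 2):
        if similar(p, q) and cluster_of[p] is not cluster_of[q]:
            merged = cluster_of[p] | cluster_of[q]
            for member in merged:
                cluster_of[member] = merged
    # Emit each distinct cluster once, in order of first appearance.
    clusters, seen = [], set()
    for p in patterns:
        if id(cluster_of[p]) not in seen:
            seen.add(id(cluster_of[p]))
            clusters.append(sorted(cluster_of[p]))
    return clusters


# Example verdicts: only P1 and P2 are judged similar.
result = cluster_patterns(["P1", "P2", "P3"],
                          lambda p, q: {p, q} == {"P1", "P2"})
```

A pattern that is similar to no other pattern ends up alone, which corresponds to the single-pattern clusters mentioned above.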

このとき、パターン分類部１５は、パターンＰ１〜Ｐ３における全てのパターンの組み合わせにつき、以下のような処理を行う。 At this time, the pattern classification unit 15 performs the following processing for all combinations of patterns in the patterns P1 to P3.

The pair of patterns P1 and P2 is taken as an example.
Let R11 be the set of instances (here, only “○○ Group”) in the node for which the class “organization” is defined (referred to as the second node) in the one or more subgraphs (here, subgraph SG(P11)) that match the search pattern P11, which is obtained by including the keyword K1 in the pattern P1. Likewise, let R21 be the set of instances (here, “○○ Group”) in the second node in the one or more subgraphs (here, subgraph SG(P21)) that match the search pattern P21, which is obtained by including the keyword K1 in the pattern P2.
If neither R11 nor R21 is an empty set, the pattern classification unit 15 calculates, for the patterns P1 and P2 and the keyword K1,
similarity determination value T(P1, P2, K1) = |R11 ∩ R21| ÷ |R11 ∪ R21|
where
|R11 ∩ R21| is the number of instances in the intersection of R11 and R21, and
|R11 ∪ R21| is the number of instances in the union of R11 and R21,
and determines whether the similarity determination value T(P1, P2, K1) is equal to or greater than the threshold value “0.7”.
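The calculation of a single similarity determination value can be sketched as follows. This is a minimal illustration, not the implementation of the pattern classification unit 15: the function name `similarity_value`, the constant `THRESHOLD`, and the use of `None` for the empty-set case are assumptions made for the example.

```python
from typing import Optional, Set

THRESHOLD = 0.7  # threshold value used in the example above


def similarity_value(set_a: Set[str], set_b: Set[str]) -> Optional[float]:
    """Return |A ∩ B| ÷ |A ∪ B|, or None if either set is empty."""
    if not set_a or not set_b:
        return None  # the value is only calculated when both sets are non-empty
    return len(set_a & set_b) / len(set_a | set_b)


# Instance sets from the example: both searches return only "○○ Group".
r11 = {"○○ Group"}
r21 = {"○○ Group"}

t_p1_p2_k1 = similarity_value(r11, r21)
meets_threshold = t_p1_p2_k1 is not None and t_p1_p2_k1 >= THRESHOLD
print(t_p1_p2_k1, meets_threshold)
```

Python sets are used because the formula only depends on which instances occur, not on how often or in what order they were retrieved.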

Here, the arguments P1, P2, and K1 of the similarity determination value T(P1, P2, K1) indicate that the value relates to the patterns P1 and P2 and the keyword K1. The same notation applies to the similarity determination values described below.

In this example, the similarity determination value T(P1, P2, K1) is calculated for the patterns P1 and P2 and the keyword K1, and it is determined whether the value is equal to or greater than the threshold value “0.7”.

R11 ∩ R21 contains only “○○ Group”, so |R11 ∩ R21| = 1.
R11 ∪ R21 contains only “○○ Group”, so |R11 ∪ R21| = 1.
Therefore, the similarity determination value T(P1, P2, K1) = 1 ÷ 1 = 1, which is determined to be equal to or greater than the threshold value “0.7”.

Similarly, the pattern classification unit 15 calculates the similarity determination value T(P1, P2, K2) for the patterns P1 and P2 and the keyword K2, and determines whether it is equal to or greater than the threshold value “0.7”.
The intersection contains only “○○ Group”, so the number of instances in the intersection is 1.
The union contains only “○○ Group”, so the number of instances in the union is 1.
Therefore, the similarity determination value T(P1, P2, K2) = 1 ÷ 1 = 1, which is determined to be equal to or greater than the threshold value “0.7”.

Similarly, the pattern classification unit 15 calculates the similarity determination value T(P1, P2, K3) for the patterns P1 and P2 and the keyword K3, and determines whether it is equal to or greater than the threshold value “0.7”.
The intersection is an empty set, so the number of instances in the intersection is 0.
The union contains “×× Group” and “△△ Group”, so the number of instances in the union is 2.
Therefore, the similarity determination value T(P1, P2, K3) = 0 ÷ 2 = 0, which is determined to be less than the threshold value “0.7”.

Next, the pattern classification unit 15 determines, for example, whether at least k of the three similarity determination values (0 < k ≤ K, where K is the number of keywords, here “3”) are equal to or greater than the threshold value “0.7”. If k or more of them are, it determines that the patterns P1 and P2 are similar. The value of k may be supplied as a parameter or may be a predetermined default.

For example, if k = 1, the patterns P1 and P2 are determined to be similar when at least one of the three similarity determination values is equal to or greater than the threshold value.
For example, if k = 3, the patterns P1 and P2 are determined to be similar only when all three similarity determination values are equal to or greater than the threshold value.

When the similarity determination values are calculated, the calculation for a keyword is skipped if one or both of the two sets involved are empty. Consequently, all three similarity determination values may be calculated without any skipping, or only one or two of them may be calculated.
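The k-of-K decision, including the skipping of empty sets, might look like the sketch below. The function name `patterns_similar` and the dictionary layout (keyword to instance set) are hypothetical. The text does not say which of P1 and P2 returns “×× Group” and which returns “△△ Group” for keyword K3, so that assignment is also an assumption.

```python
from typing import Dict, Set


def patterns_similar(results_a: Dict[str, Set[str]],
                     results_b: Dict[str, Set[str]],
                     threshold: float,
                     k: int) -> bool:
    """Return True if at least k per-keyword similarity values reach the threshold."""
    passed = 0
    for keyword in results_a.keys() & results_b.keys():
        a, b = results_a[keyword], results_b[keyword]
        if not a or not b:
            continue  # calculation is skipped when either set is empty
        if len(a & b) / len(a | b) >= threshold:
            passed += 1
    return passed >= k


# Instances found in the second node per keyword (K3 assignment is assumed).
p1 = {"K1": {"○○ Group"}, "K2": {"○○ Group"}, "K3": {"×× Group"}}
p2 = {"K1": {"○○ Group"}, "K2": {"○○ Group"}, "K3": {"△△ Group"}}

similar_k1 = patterns_similar(p1, p2, threshold=0.7, k=1)
similar_k3 = patterns_similar(p1, p2, threshold=0.7, k=3)
print(similar_k1, similar_k3)
```

With these inputs the values for K1 and K2 are 1 and the value for K3 is 0, so the pair counts as similar for k = 1 or k = 2 but not for k = 3.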

In this way, the pattern classification unit 15 generates, for example, a pattern cluster containing the patterns P1 and P2 and a pattern cluster containing the pattern P3 (S19).

The pattern classification unit 15 stores each pattern cluster (the classification result) in the pattern cluster database 16 (S21). Both pattern clusters containing two or more patterns and pattern clusters containing only one pattern are stored in the pattern cluster database 16.

FIG. 9 is a sequence diagram showing the operation by which the graph search device 1 retrieves node instances from the graph G.

The user interface 11 generates an input interface and sends it to the user terminal 2 (S51), which displays it as shown in FIG. 10 (S53).

Here, the user is, for example, the same user as at the time of pattern classification, and is assumed to want to know which organizations (for example, within the company) relate to the theme “politics”.

Here, it is assumed that, through the user's operation, the information “theme” (referred to as the class “theme”) and the information “organization” (referred to as the class “organization”) are selected from the information included in the input interface.

The user terminal 2 sends these parameters to the graph search device 1 (S55).

In the graph search device 1, the query issuing unit 14 sends to the pattern database 13 a query, that is, a search expression containing the class “theme” and the class “organization”, and thereby retrieves the patterns P1, P2, and P3 from the pattern database 13 (S59).

Next, the query issuing unit 14 refers to the pattern cluster database 16. Because a pattern cluster containing the patterns P1 and P2 exists, it excludes one of the retrieved patterns P1 and P2, for example the pattern P2 (S60), and passes the two remaining patterns P1 and P3 to the user interface 11.

In step S60, a pattern cluster containing a plurality of patterns is retrieved from the pattern database 13; from the plurality of patterns retrieved in step S59, the patterns that match patterns in that pattern cluster are selected; and one or more of the selected patterns are kept while the rest are excluded.

Specifically, for example, the longest pattern may be kept, because it is a simple and easy-to-understand pattern.
Alternatively, a pattern whose search result is not empty and that has many keywords may be kept, because such a pattern readily yields search results.
Alternatively, a pattern whose search result is not empty and that has few keywords may be kept, because such a pattern readily yields precise search results.
Alternatively, an operator may decide which patterns to keep.
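The exclusion in step S60 can be sketched as below, assuming patterns are identified by name and pattern clusters are sets of names. The function `drop_redundant_patterns` and the `choose` callback are hypothetical; the callback stands in for whichever selection rule (longest pattern, keyword count, or operator decision) is applied, and this sketch keeps exactly one pattern per cluster although the description allows keeping more.

```python
from typing import Callable, Iterable, List, Set


def drop_redundant_patterns(retrieved: List[str],
                            clusters: Iterable[Set[str]],
                            choose: Callable[[List[str]], str]) -> List[str]:
    """Keep one retrieved pattern per multi-pattern cluster and drop the others."""
    removed: Set[str] = set()
    for cluster in clusters:
        hits = [p for p in retrieved if p in cluster]
        if len(hits) > 1:
            keep = choose(hits)  # selection rule supplied by the caller
            removed.update(p for p in hits if p != keep)
    return [p for p in retrieved if p not in removed]


# Patterns retrieved in S59 and the clusters stored in S21.
retrieved = ["P1", "P2", "P3"]
clusters = [{"P1", "P2"}, {"P3"}]

remaining = drop_redundant_patterns(retrieved, clusters, choose=lambda hits: hits[0])
print(remaining)  # ['P1', 'P3']
```

Passing the selection rule as a function keeps the exclusion logic independent of the criterion used to pick the representative.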

The user interface 11 generates an input interface and sends it to the user terminal 2 (S61), which displays it as shown in FIG. 11 (S63). The patterns P1 and P3 are displayed here; FIG. 11 shows them in simplified form, omitting the classes, arc labels, and the like.

In response, the user enters an instruction selecting the pattern P1 and the keyword “politics” at the user terminal 2, and the user terminal 2 sends these parameters to the graph search device 1 (S65).

In the graph search device 1, the query issuing unit 14 includes the keyword “politics” in the node for which the class “theme” is defined in the selected pattern P1, thereby generating a search pattern (that is, the search pattern P11 in FIG. 7) (S67).

The query issuing unit 14 converts the search pattern P11 into a query and sends it to the graph database 12, thereby obtaining from the graph G the subgraph that matches the search pattern (that is, the subgraph SG(P11) in FIG. 8) (S71).

Next, the query issuing unit 14 extracts the instance (that is, “○○ Group”) in the node for which the class “organization” is defined in the subgraph SG(P11), and passes it to the user interface 11 (S73).

The user interface 11 generates an output interface and sends it to the user terminal 2 (S75), which displays it as shown in FIG. 12 (S77).

Thus, the user only has to choose one of the two patterns P1 and P3 rather than one of the three patterns P1, P2, and P3, which improves convenience.

Although one pattern is selected here, two or more patterns may be selected. Even then, the number of candidates is reduced and the user only has to select two or more patterns from the smaller set, so convenience is improved.

The pattern classification unit 15 may use a different method when classifying a plurality of patterns into one or more pattern clusters.

One such method is described below. Points not explicitly described are the same as in the embodiment above.

Here, for convenience, as shown in FIG. 13, five patterns P101 to P105 are used in place of the patterns P1 and so on described above, and five keywords K11 to K15 are used in place of the keywords K1 and so on.

In step S11 of FIG. 5, the query issuing unit 14 includes each of the keywords K11 to K15 in the first node of each of the patterns P101 to P105, thereby generating 5 (number of patterns) × 5 (number of keywords) = 25 search patterns (referred to as search patterns P1011 to P1055) (S11).
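The generation of the N × M search patterns can be illustrated with a short sketch. Representing a search pattern as a (pattern, keyword) pair and deriving its name by appending the keyword's position to the pattern name are assumptions chosen to reproduce the names P1011 to P1055; the function name `build_search_patterns` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

patterns = ["P101", "P102", "P103", "P104", "P105"]
keywords = ["K11", "K12", "K13", "K14", "K15"]


def build_search_patterns(patterns: List[str],
                          keywords: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pair every pattern with every keyword, giving N x M search patterns."""
    return {
        f"{pattern}{index}": (pattern, keyword)
        for pattern, (index, keyword) in product(patterns, enumerate(keywords, start=1))
    }


search_patterns = build_search_patterns(patterns, keywords)
print(len(search_patterns))  # 25
```

In an actual system each pair would be turned into a graph query in which the keyword constrains the first node of the pattern.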

The query issuing unit 14 converts the search patterns P1011 to P1055 into queries and sends them to the graph database 12, thereby obtaining from the graph G the subgraphs that match each search pattern (S15).

The query issuing unit 14 passes the patterns P101 to P105, the subgraphs, the threshold value “0.7”, and the class “organization” to the pattern classification unit 15 (S17).

Step S19 is now described in detail. In step S19, the pattern classification unit 15 classifies the patterns P101 to P105 into one or more pattern clusters.

FIG. 14 is a flowchart showing the operation of step S19.

The pattern classification unit 15 first excludes, from the search patterns P1011 to P1055, every search pattern whose search yielded no matching subgraph (S191).

As shown in FIG. 15, the pattern classification unit 15 excludes the search patterns corresponding to the cells marked with ×.
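Step S191 amounts to filtering out search patterns with empty results, as in the sketch below. The result table is hypothetical, since FIG. 15 is not reproduced here, and the representation of a subgraph as a dictionary is likewise an assumption.

```python
from typing import Dict, List


def drop_empty_searches(results: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Keep only the search patterns whose search returned at least one subgraph."""
    return {name: subgraphs for name, subgraphs in results.items() if subgraphs}


# Hypothetical search results: search pattern name -> matching subgraphs.
results = {
    "P1011": [{"organization": "A department"}],
    "P1012": [],  # no subgraph found, so this search pattern is excluded
    "P1055": [{"organization": "E department"}],
}

remaining = drop_empty_searches(results)
print(sorted(remaining))  # ['P1011', 'P1055']
```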

Next, as indicated by the frame Y1 in FIG. 15, when subgraphs are obtained from both of the search patterns (for example, search patterns P1011 and P1021) produced by including a common keyword (for example, keyword K11) in each of two patterns (for example, patterns P101 and P102), the pattern classification unit 15 associates those two patterns (patterns P101 and P102) with each other (S193).

As indicated by the frame Y2 in FIG. 15, the four patterns P101 to P104 are associated with one another.

The pattern classification unit 15 also defines as a keyword cluster the one or more keywords contained in the non-excluded search patterns obtained from a plurality of mutually associated patterns (S195). In FIG. 15, the keywords K11 to K14 are defined as a keyword cluster (referred to as keyword cluster C1).

The pattern classification unit 15 likewise defines as a keyword cluster the one or more keywords contained in the non-excluded search patterns obtained from a single pattern that is not associated with any other pattern (S195). In FIG. 15, the keyword K15 is defined as a keyword cluster (referred to as keyword cluster C2).
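Steps S193 and S195 can be sketched as a grouping of patterns that share at least one keyword with a non-excluded search pattern. The `hits` table below is hypothetical (FIG. 15 is not reproduced here) and was chosen only to be consistent with the clusters C1 and C2 described above. Treating the association as transitive, so that P104 joins the group through P103, is also an assumption.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for FIG. 15: pattern -> keywords whose search pattern
# was not excluded in S191.
hits: Dict[str, Set[str]] = {
    "P101": {"K11", "K12"},
    "P102": {"K11", "K13"},
    "P103": {"K11", "K14"},
    "P104": {"K14"},
    "P105": {"K15"},
}


def keyword_clusters(hits: Dict[str, Set[str]]) -> List[Tuple[Set[str], Set[str]]]:
    """Group associated patterns; return (patterns, keyword cluster) per group."""
    groups: List[Tuple[Set[str], Set[str]]] = []
    for pattern, keywords in hits.items():
        if not keywords:
            continue  # every search pattern of this pattern was excluded
        merged_patterns, merged_keywords = {pattern}, set(keywords)
        untouched = []
        for group_patterns, group_keywords in groups:
            if group_keywords & merged_keywords:
                # a shared keyword associates the patterns (S193)
                merged_patterns |= group_patterns
                merged_keywords |= group_keywords
            else:
                untouched.append((group_patterns, group_keywords))
        groups = untouched + [(merged_patterns, merged_keywords)]
    return groups


clusters = keyword_clusters(hits)
for group_patterns, group_keywords in clusters:
    print(sorted(group_patterns), sorted(group_keywords))
```

With this table the first group holds P101 to P104 with keywords K11 to K14, and the second holds P105 with keyword K15.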

Next, the pattern classification unit 15 selects one keyword from each of the keyword clusters C1 and C2 (S197). In doing so, it chooses the keyword contained in the largest number of non-excluded search patterns. In FIG. 15, the keywords K11 and K15 are selected from the keyword clusters C1 and C2, respectively.

Next, for the keyword cluster C1, the pattern classification unit 15 selects the one or more patterns P101, P102, and P103 that were used to generate the non-excluded search patterns P1011, P1021, and P1031 containing the selected keyword K11 (S199).

Likewise, for the keyword cluster C2, the pattern classification unit 15 selects the one or more patterns (here, the pattern P105) used to generate the non-excluded search pattern P1055 containing the selected keyword K15 (S199).
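The selection in steps S197 and S199 can be sketched as follows. The `hits` table is again a hypothetical stand-in for FIG. 15, and breaking ties by keyword name is an assumption, since the description does not say how ties are resolved. The function name `select_for_cluster` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for FIG. 15: pattern -> keywords with non-excluded searches.
hits: Dict[str, Set[str]] = {
    "P101": {"K11", "K12"},
    "P102": {"K11", "K13"},
    "P103": {"K11", "K14"},
    "P104": {"K14"},
    "P105": {"K15"},
}


def select_for_cluster(cluster_patterns: Set[str],
                       cluster_keywords: Set[str],
                       hits: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Pick the keyword used by the most non-excluded search patterns (S197)
    and return it together with the patterns that produced them (S199)."""
    def users(keyword: str) -> List[str]:
        return sorted(p for p in cluster_patterns if keyword in hits[p])

    # sorted() makes the tie-break deterministic: the first name among equals wins
    best = max(sorted(cluster_keywords), key=lambda kw: len(users(kw)))
    return best, users(best)


c1 = select_for_cluster({"P101", "P102", "P103", "P104"},
                        {"K11", "K12", "K13", "K14"}, hits)
c2 = select_for_cluster({"P105"}, {"K15"}, hits)
print(c1, c2)
```

For cluster C1 this yields keyword K11 with patterns P101, P102, and P103, leaving P104 unselected; for cluster C2 it yields the cluster's single keyword with pattern P105.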

For the keyword cluster C1, the patterns P101, P102, and P103 were selected (S199) and the pattern P104 was not. When a plurality of patterns are left unselected in this way, those patterns may themselves be classified into one or more pattern clusters in the same manner as in this step S19.

Next, among the selected patterns P101, P102, and P103, the pattern classification unit 15 defines a plurality of mutually similar patterns as a pattern cluster and defines a single pattern that is not similar to any other pattern as a pattern cluster, thereby classifying the patterns P101, P102, and P103 into one or more pattern clusters (S1911).

Here, the pattern classification unit 15 ensures that, for any two patterns contained in a pattern cluster that holds a plurality of patterns (such as the patterns P101, P102, and P103), the similarity determination value obtained for those two patterns satisfies a predetermined condition.

The patterns P101 and P102 are taken as an example.

Let R101 be the set of instances in the node for which the class “organization” is defined (referred to as the second node) in the one or more subgraphs that match the search pattern P1011, which is obtained by including the keyword K11 in the pattern P101. Likewise, let R102 be the set of instances in the second node in the one or more subgraphs that match the search pattern P1021, which is obtained by including the keyword K11 in the pattern P102. The pattern classification unit 15 calculates
similarity determination value T(P101, P102) = |R101 ∩ R102| ÷ |R101 ∪ R102|
where
|R101 ∩ R102| is the number of instances in the intersection of R101 and R102, and
|R101 ∪ R102| is the number of instances in the union of R101 and R102,
and ensures that the similarity determination value is equal to or greater than the threshold value “0.7”.

Here, the arguments P101 and P102 of the similarity determination value T(P101, P102) indicate that the value relates to the patterns P101 and P102.

If the similarity determination value is less than the threshold value, the patterns P101 and P102 are not placed in the same pattern cluster.

For example, as shown in FIG. 16, if R101 contains the instances “A department” and “B department” and R102 contains the instances “A department”, “B department”, and “C department”, then R101 ∩ R102 contains “A department” and “B department”, so |R101 ∩ R102| = 2.

R101 ∪ R102 contains “A department”, “B department”, and “C department”, so |R101 ∪ R102| = 3.

Therefore, the similarity determination value T(P101, P102) = 2 ÷ 3 ≈ 0.67. If the threshold value is, for example, “0.6”, the value is determined to be equal to or greater than the threshold, and the patterns P101 and P102 are placed in one pattern cluster. If the threshold value is, for example, “0.9”, the value is determined to be less than the threshold, and the patterns P101 and P102 are not placed in one pattern cluster.

In this way, the pattern classification unit 15 generates, for example, a pattern cluster containing the two patterns P101 and P102 and a pattern cluster containing the single pattern P103.
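One way to satisfy the requirement that every pair of patterns in a cluster meets the threshold is a greedy grouping, sketched below. The description does not prescribe a particular grouping procedure, so the greedy strategy is an assumption, as are the function names. The instance sets for P101 and P102 follow FIG. 16 as described above; the set for P103 is hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| ÷ |A ∪ B| for two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_patterns(instances: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Greedily place each pattern in the first cluster whose members are all similar to it."""
    clusters: List[List[str]] = []
    for name, found in instances.items():
        for cluster in clusters:
            if all(jaccard(found, instances[member]) >= threshold for member in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])  # not similar to any existing cluster
    return clusters


instances = {
    "P101": {"A department", "B department"},
    "P102": {"A department", "B department", "C department"},
    "P103": {"D department"},  # hypothetical; not given in the description
}

at_06 = cluster_patterns(instances, threshold=0.6)
at_09 = cluster_patterns(instances, threshold=0.9)
print(at_06, at_09)
```

At a threshold of 0.6 the value 2 ÷ 3 for P101 and P102 is sufficient, so they share a cluster; at 0.9 each pattern forms its own cluster. The result of a greedy pass can depend on the order in which patterns are visited.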

The pattern classification unit 15 performs the same processing for the pattern P105; in this case, the pattern P105 by itself forms a pattern cluster.

The description so far has, for convenience, used examples in which three or four patterns are classified; in practice, on the order of 300 patterns are often classified.

When 330 patterns were classified using the alternative method described above with the threshold value set to “0.8”, the result shown in FIG. 17, for example, was obtained.

First, 172 pattern clusters were obtained from the 330 patterns. There were 116 pattern clusters containing one pattern and 35 pattern clusters containing two patterns. For the remaining sizes, the relationship between the number of patterns in a pattern cluster and the number of such pattern clusters is as shown in the figure.

FIG. 18 is a diagram showing the nine patterns in the pattern cluster indicated by the arrow Y3 in FIG. 17.

Because these patterns are similar to one another, if the query issuing unit 14 retrieves these 330 patterns in step S59, it selects one of the nine patterns in the figure that are among the 330. In the same way, the query issuing unit 14 narrows the 330 patterns down to 172.

Then, when the 172 patterns are displayed in step S63, the user only has to select the desired pattern from those 172 patterns rather than from 330, which improves convenience.

As described above, the present embodiment comprises: the graph database 12, which stores the graph G in which nodes having instances are connected by arcs; graph search means (the query issuing unit 14), which generates N × M search patterns by including each of M mutually different keywords in the first node of each of N patterns for searching for subgraphs in the graph G (S11) and searches the graph G for subgraphs that match each search pattern (S15); and pattern classification means (the pattern classification unit 15), which uses the instances in the second node of the retrieved subgraphs to obtain a degree of similarity for each combination of two patterns among the N patterns and classifies the N patterns based on that degree of similarity (S19). As a result, in a setting where a subgraph matching a search pattern (obtained by including a keyword in the first node of a pattern) is retrieved and the instances in the second node of that subgraph are obtained as the search result, the patterns can be classified and the number of patterns can be substantially reduced. In addition, when the user is asked to select a pattern, for example, the user can select one easily.

The pattern classification means classifies the N patterns, based on the degree of similarity, into pattern clusters each containing one or more patterns. For each keyword, let A be the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including that keyword in one of two patterns, and let B be the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including that keyword in the other of the two patterns. The means calculates
similarity determination value = |A ∩ B| ÷ |A ∪ B|
where
A and B are both non-empty,
|A ∩ B| is the number of instances in the intersection of A and B, and
|A ∪ B| is the number of instances in the union of A and B,
and places the two patterns in the same pattern cluster when k or more (0 < k ≤ K, where K is the number of keywords) of the similarity determination values corresponding to the keywords are equal to or greater than a predetermined threshold value. The number of patterns can therefore be substantially reduced. In addition, when the user is asked to select a pattern, for example, the user can select one easily.

Alternatively, the pattern classification means excludes, from the N × M search patterns, the search patterns whose searches yielded no matching subgraph (S191); associates two patterns with each other when subgraphs are obtained from both of the search patterns produced by including a common keyword in each of the two patterns (S193); defines as a keyword cluster the one or more keywords contained in the non-excluded search patterns obtained from a plurality of mutually associated patterns (S195); defines as a keyword cluster the one or more keywords contained in the non-excluded search patterns obtained from a single pattern that is not associated with any other pattern (S195), thereby classifying the keywords into one or more keyword clusters; selects one keyword from each keyword cluster such that the number of non-excluded search patterns containing the selected keyword is maximized (S197); selects, for each keyword cluster, the one or more patterns used to generate the non-excluded search patterns containing the selected keyword (S199); and, among the selected patterns, defines a plurality of mutually similar patterns as a pattern cluster and a single pattern that is not similar to any other pattern as a pattern cluster, thereby classifying the selected patterns into one or more pattern clusters (S1911). For any two patterns contained in a pattern cluster of the former kind, let A be the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the selected keyword in one of the two patterns, and let B be the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the selected keyword in the other of the two patterns. The means ensures that
similarity determination value = |A ∩ B| ÷ |A ∪ B|
where
A and B are both non-empty,
|A ∩ B| is the number of instances in the intersection of A and B, and
|A ∪ B| is the number of instances in the union of A and B,
is greater than a predetermined threshold value. The number of patterns can therefore be substantially reduced. In addition, when the user is asked to select a pattern, for example, the user can select one easily.

DESCRIPTION OF SYMBOLS
1 … Graph search device
2 … User terminal
3 … Display device
11 … User interface
12 … Graph database
13 … Pattern database
14 … Query issuing unit
15 … Pattern classification unit
16 … Pattern cluster database

Claims

A pattern classification device comprising:
a graph database storing a graph in which nodes having instances are connected by arcs;
graph search means for generating N × M search patterns by including each of M mutually different keywords in a first node of each of N patterns for searching for subgraphs in the graph, and for searching the graph for subgraphs that match each search pattern; and
pattern classification means for obtaining, using the instances in a second node of the retrieved subgraphs, a degree of similarity for each combination of two patterns among the N patterns, and for classifying the N patterns based on the degree of similarity.

The pattern classification device according to claim 1, wherein the pattern classification means
classifies the N patterns into pattern clusters each containing one or more patterns based on the degree of similarity, and,
where, for each keyword, A is the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the keyword in one of the two patterns, and B is the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the keyword in the other of the two patterns, and
similarity determination value = |A ∩ B| ÷ |A ∪ B|
where
A and B are both non-empty,
|A ∩ B| is the number of instances in the intersection of A and B, and
|A ∪ B| is the number of instances in the union of A and B,
is calculated,
places in the same pattern cluster the two patterns for which k or more (0 < k ≤ K, where K is the number of the keywords) of the similarity determination values corresponding to the keywords are equal to or greater than a predetermined threshold value.

A pattern classification method performed by a pattern classification device including a graph database storing a graph in which nodes having instances are connected by arcs, the method comprising:
generating, by graph search means of the pattern classification device, N × M search patterns by including each of M mutually different keywords in a first node of each of N patterns for searching for subgraphs in the graph, and searching the graph for subgraphs that match each search pattern; and
obtaining, by pattern classification means of the pattern classification device, using the instances in a second node of the retrieved subgraphs, a degree of similarity for each combination of two patterns among the N patterns, and classifying the N patterns based on the degree of similarity.

The pattern classification method according to claim 3, wherein the pattern classification means
classifies the N patterns into pattern clusters each containing one or more patterns based on the degree of similarity, and,
where, for each keyword, A is the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the keyword in one of the two patterns, and B is the set of instances in the second node in the one or more subgraphs that match the search pattern obtained by including the keyword in the other of the two patterns, and
similarity determination value = |A ∩ B| ÷ |A ∪ B|
where
A and B are both non-empty,
|A ∩ B| is the number of instances in the intersection of A and B, and
|A ∪ B| is the number of instances in the union of A and B,
is calculated,
places in the same pattern cluster the two patterns for which k or more (0 < k ≤ K, where K is the number of the keywords) of the similarity determination values corresponding to the keywords are equal to or greater than a predetermined threshold value.

A computer program for causing a computer to operate as the pattern classification device according to claim 1.