JP2008181331A

JP2008181331A - Relation extraction method, relation extraction system

Info

Publication number: JP2008181331A
Application number: JP2007014167A
Authority: JP
Inventors: Hiroyuki Sato; 宏之佐藤; Kyoshi Iizuka; 京士飯塚; Pramudiono Iko; プラムディオノイコ; Kenji Otomo; 健治大友; Takahiko Murayama; 隆彦村山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-01-24
Filing date: 2007-01-24
Publication date: 2008-08-07
Anticipated expiration: 2027-01-24
Also published as: JP4698618B2

Abstract

PROBLEM TO BE SOLVED: To obtain the strength of the relationship between nodes of interest from graph structure data.

SOLUTION: In a relation extraction system 1, a relation extraction device 10 is connected to a graph structure database 30 that stores graph-structured data composed of various nodes and various arcs. The device receives a plurality of nodes whose relationship strength is to be obtained. A query issuing unit 11 searches a query graph pattern database 12, which stores many query graph patterns, for a query graph pattern relevant to the input nodes. For each input node value, a subgraph is extracted from the graph structure database 30 using the retrieved query graph pattern, and a coefficient is obtained from the number of nodes of the same concept that overlap between the extracted subgraphs. The strength of the relationship between the input nodes is thereby obtained. By using a plurality of query graph patterns, the strength of the relationship between the input nodes can also be obtained from a plurality of viewpoints.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a graph mining technique for obtaining the strength of relationships between nodes from data expressed as a graph structure.

In recent years, database systems have been developed that represent data having various structures as graphs and obtain search results by matching graph patterns against that data. Non-Patent Document 1 is a specification for searching RDF (Resource Description Framework) data by graph pattern matching. A system based on that specification can use a graph-structured query to extract structurally matching data from a large amount of graph-structured data composed of various nodes and various arcs, and can obtain the matched portion as a subgraph. Patent Document 1 describes a technique for generating a graph-structured query graph pattern based on information such as keywords and concepts. A query graph pattern is a graph used to extract structurally matching data from data having a graph structure.

In addition, research is progressing on techniques such as graph mining, which discovers characteristic relationships in data expressed as graphs. Non-Patent Document 2 describes a technique that performs Internet searches using keywords for the matters whose relationship is of interest, determines the overlap between the search results, and from that overlap obtains a coefficient between those matters.

Patent Document 1: JP 2006-313501 A
Non-Patent Document 1: "SPARQL Query Language for RDF", [online], World Wide Web Consortium, [retrieved January 5, 2007], Internet <URL: http://www.w3c.org/TR/rdf-sparql-query/>
Non-Patent Document 2: Yutaka Matsuo and 5 others, "Researcher Network Extraction Search System", [online], Japanese Society for Artificial Intelligence, [retrieved January 5, 2007], Internet <URL: http://www-kasm.nii.ac.jp/jsai2005/schedule/pdf/000137.pdf>

However, although Non-Patent Document 1 describes extracting subgraphs that match a query graph pattern, it does not describe discovering characteristic relationships from the extraction results.

The technique described in Non-Patent Document 2 uses information existing on the Internet as the search target for the matters whose relationship is to be obtained. It does not contemplate searching data that has a graph structure composed of various nodes and various arcs, and it describes no method for doing so.

The present invention has been made in view of the above, and its object is to obtain, from a plurality of viewpoints, the strength of the relationship between nodes of interest from graph structure data composed of various nodes and various arcs.

The relation extraction method according to the first aspect of the present invention comprises: a step in which input means accepts a selection of a plurality of nodes from among the nodes stored in a graph structure database that stores data having a graph structure composed of various nodes and various arcs; a step in which query issuing means, for each of the selected nodes, substitutes the selected node into a variable node of a query graph pattern for extracting structurally matching data from data having a graph structure, and acquires data matching the query graph pattern from the graph structure database; and a step in which coefficient calculation means compares the data acquired for each of the selected nodes in the data acquiring step to obtain the number of overlapping nodes, and calculates a coefficient between the selected nodes based on that number.

In the present invention, a selection of a plurality of nodes is accepted from data having a graph structure composed of various nodes and various arcs. For each selected node, data matching the query graph pattern in which the selected node has been substituted into the variable node is extracted, and a coefficient is calculated based on the number of nodes that overlap between the data extracted for the respective selected nodes. The strength of the relationship between the selected nodes can thereby be obtained.

In the above relation extraction method, the step of acquiring data acquires a query graph pattern including a variable node corresponding to the selected node from a query graph pattern database that stores query graph patterns.

In the present invention, the query graph pattern to be used is acquired from a query graph pattern database that stores query graph patterns, so that various query graph patterns can be acquired quickly and the strength of the relationship between the selected nodes can be obtained from various viewpoints.

In the above relation extraction method, the step of acquiring data generates, based on the selected node, a query graph pattern including a variable node corresponding to the selected node.

In the present invention, the query graph pattern to be used is generated based on the selected node, which allows flexible use of query graph patterns suited to the selected node.

In the above relation extraction method, the accepting step receives as input a query graph pattern including a variable node corresponding to the selected node.

In the present invention, the query graph pattern to be used is supplied as input, which makes it possible to obtain the strength of the relationship between nodes from a desired viewpoint.

The above relation extraction method further comprises a step of displaying the relationship between the selected nodes as a graph structure based on the coefficient calculated in the calculating step.

In the present invention, the strength of the relationship between nodes is displayed as a graph structure based on the calculated coefficient, so that the user can easily grasp the relationship between the selected nodes visually.

The relation extraction system according to the second aspect of the present invention is connected to a graph structure database that stores data having a graph structure composed of various nodes and various arcs, and comprises: input means for accepting a selection of a plurality of nodes from among the nodes stored in the graph structure database; query issuing means for, with respect to each of the selected nodes, substituting the selected node into a variable node of a query graph pattern for extracting structurally matching data from data having a graph structure, and acquiring data matching the query graph pattern from the graph structure database; and coefficient calculation means for comparing the data that the query issuing means acquired for each of the selected nodes to obtain the number of overlapping nodes, and calculating a coefficient between the selected nodes based on that number.

The above relation extraction system further comprises a query graph pattern database that stores query graph patterns, and the query issuing means acquires a query graph pattern including a variable node corresponding to the selected node from the query graph pattern database.

In the above relation extraction system, the query issuing means generates, based on the selected node, a query graph pattern including a variable node corresponding to the selected node.

In the above relation extraction system, the input means receives as input a query graph pattern including a variable node corresponding to the selected node.

The above relation extraction system further comprises display means for displaying the relationship between the selected nodes as a graph structure based on the coefficient calculated by the coefficient calculation means.

According to the present invention, the strength of the relationship between nodes of interest can be obtained from a plurality of viewpoints from graph structure data composed of various nodes and various arcs.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a relation extraction system 1 that uses a relation extraction device 10 according to the present embodiment. The relation extraction system 1 shown in the figure includes the relation extraction device 10, a user interface providing unit 20, and a graph structure database 30. The relation extraction device 10 is connected so that it can access the user interface providing unit 20 and the graph structure database 30.

The relation extraction device 10 includes a query issuing unit 11 that searches for or generates a query graph pattern based on the nodes selected through the user interface providing unit 20, a query graph pattern database 12 that stores query graph patterns, and a coefficient calculation unit 13 that calculates a coefficient representing the relationship from the results of a search using a query graph pattern. The relation extraction device 10 can be implemented by a computer having an arithmetic processing unit, a storage device, memory, and the like, and the processing of each unit is executed by a program.

The user interface providing unit 20 provides a GUI for selecting the nodes whose relationship is to be obtained, and displays the relationship between the selected nodes.

The graph structure database 30 stores data having a graph structure composed of various nodes and various arcs, as shown in FIG. 2. In the figure, arcs drawn without a label have the label "技術キーワード" (technical keyword), and namespaces are partly omitted. The graph structure data in the present embodiment is labeled directed graph data, and RDF defines a data model for data that can be expressed as a labeled directed graph.

FIG. 3 is an explanatory diagram showing how this graph structure data is generated from data managed in an existing relational database or the like and turned into a graph. The figure shows part of the data in FIG. 2: the data giving "Paper G", its author "Taro Yamada", its title "Introduction to B Technology", and its keyword "B technology", expressed both as RDF/XML data and as an RDF graph. Here, following the RDF specification, the arcs of the graph are called properties.
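As a minimal sketch of the data model described above, the "Paper G" record can be written as subject-property-object triples, one per arc. The Python below is illustrative only: the identifiers `Paper:G`, `rm:author`, `rm:title` and `rm:technical_keyword` are ASCII stand-ins for the labels in FIG. 3 (the property used for the title is not named in the text and is an assumption), and `outgoing` is a hypothetical helper.

```python
# Each arc of the labeled directed graph is one (subject, property, object) triple.
# All identifiers are illustrative stand-ins, not the actual labels of FIG. 3.
PAPER_G_TRIPLES = [
    ("Paper:G", "rm:author", "Person:Taro Yamada"),
    ("Paper:G", "rm:title", "Introduction to B Technology"),  # property name assumed
    ("Paper:G", "rm:technical_keyword", "B technology"),
]


def outgoing(subject, triples):
    """Return {property: object} for every arc leaving `subject`."""
    return {prop: obj for subj, prop, obj in triples if subj == subject}


if __name__ == "__main__":
    for prop, obj in outgoing("Paper:G", PAPER_G_TRIPLES).items():
        print(f"Paper:G --{prop}--> {obj}")
```

Storing the graph as a flat list of triples keeps the sketch close to the RDF view of the data, in which every property is simply a labeled arc between two nodes.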

Next, the operation of the relation extraction system 1 configured as described above will be described with reference to the drawings.

FIG. 4 is a sequence diagram showing the operation of the relation extraction system 1 in the present embodiment. First, the user interface providing unit 20 generates a GUI for user operation (S1). In the present embodiment, a GUI is generated that prompts the user to enter keywords or the like by which the nodes to be selected can be identified.

The generated GUI is then presented to the user, and the user selects a plurality of nodes whose relationship is to be obtained (S2). The nodes selected by the user, the URIs (Uniform Resource Identifiers) that are the IDs of those nodes, and the like are transmitted to the relation extraction device 10. Search instructions may also be transmitted to the relation extraction device 10. The search instructions may directly specify a query graph pattern, described later, or may be information useful for generating a query graph pattern. Although the present embodiment describes a relation extraction system 1 in which the user is prompted for input and selects the nodes, the system may instead omit the user interface providing unit 20, with another device connected to the relation extraction device 10 selecting and inputting the nodes without user involvement.

The query issuing unit 11 searches the query graph pattern database 12 for a query graph pattern based on the received input values, such as node keywords and URIs, and on the search instructions (S3). The query graph pattern database 12 stores a large number of query graph patterns, and the query issuing unit 11 searches among them for a query graph pattern related to the input values. For example, suppose that the specific technology names "A technology", "B technology", "C technology", and "D technology" are entered as keywords. In this case, a query graph pattern having a variable node "?keyword" that can take a technology name is searched for in the query graph pattern database 12 and read out.

As examples, FIG. 5 and FIG. 6 show query graph patterns having the variable node "?keyword", which can take a technology name. FIG. 5 is a query graph pattern with two variable nodes "?x" and "?y" that each point to the variable node "?keyword" through the property "rm:技術キーワード" (technical keyword). Each of "?x" and "?y" also points to the variable node "?person" through the property "rm:著者" (author), and "?person" has the property "rdf:type" leading to the node "Person:人" (person).

FIG. 6 is a query graph pattern that differs from the one shown in FIG. 5 in that the variable node "?x" points to the variable node "?person" through the property "pj:担当者" (person in charge).
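The two patterns can be sketched as lists of triple patterns in which terms beginning with `?` are variable nodes. This is only one possible reading of FIG. 5 and FIG. 6 as described in the text: the property names are ASCII stand-ins for the Japanese labels, and `PATTERN_DB`, `variable_nodes` and `find_patterns` are hypothetical names used to illustrate looking up stored patterns by variable node.

```python
# Query graph pattern A (FIG. 5): ?x and ?y share a keyword and an author.
PATTERN_A = [
    ("?x", "rm:technical_keyword", "?keyword"),
    ("?y", "rm:technical_keyword", "?keyword"),
    ("?x", "rm:author", "?person"),
    ("?y", "rm:author", "?person"),
    ("?person", "rdf:type", "Person:person"),
]

# Query graph pattern B (FIG. 6): only the arc from ?x to ?person differs.
PATTERN_B = [
    ("?x", "rm:technical_keyword", "?keyword"),
    ("?y", "rm:technical_keyword", "?keyword"),
    ("?x", "pj:person_in_charge", "?person"),
    ("?y", "rm:author", "?person"),
    ("?person", "rdf:type", "Person:person"),
]

# Hypothetical stand-in for the query graph pattern database 12.
PATTERN_DB = {"A": PATTERN_A, "B": PATTERN_B}


def variable_nodes(pattern):
    """Collect every term of the pattern that is a variable node."""
    return {term for triple in pattern for term in triple if term.startswith("?")}


def find_patterns(db, variable):
    """Return the names of stored patterns that contain the given variable node."""
    return sorted(name for name, pattern in db.items()
                  if variable in variable_nodes(pattern))


if __name__ == "__main__":
    print(find_patterns(PATTERN_DB, "?keyword"))
```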

In the present embodiment, a query graph pattern is extracted from those stored in the query graph pattern database 12 based on the user's input values. Alternatively, the query graph pattern database 12 may be omitted, and a query graph pattern may be generated by analyzing the data stored in the graph structure database 30 based on the user's input values, using the method shown in Patent Document 1.

Next, the query issuing unit 11 substitutes an input value into the variable node of the obtained query graph pattern to form a query, and requests the graph structure database 30 to return the data (subgraphs) that match the pattern (S4). For example, if the keywords "A technology", "B technology", "C technology", and "D technology" have been entered and the query graph pattern A shown in FIG. 5 has been obtained, four queries are created by substituting "A technology", "B technology", "C technology", and "D technology" in turn for the variable node "?keyword" of query graph pattern A.
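The substitution step can be sketched as follows, again treating a query graph pattern as a list of triple patterns. The property names are ASCII stand-ins and `substitute` is a hypothetical helper, not a function named in the source.

```python
# Query graph pattern A, with "?"-prefixed terms as variable nodes
# (property names are illustrative stand-ins).
PATTERN_A = [
    ("?x", "rm:technical_keyword", "?keyword"),
    ("?y", "rm:technical_keyword", "?keyword"),
    ("?x", "rm:author", "?person"),
    ("?y", "rm:author", "?person"),
    ("?person", "rdf:type", "Person:person"),
]


def substitute(pattern, variable, value):
    """Return a copy of `pattern` with each occurrence of `variable` replaced by `value`."""
    return [tuple(value if term == variable else term for term in triple)
            for triple in pattern]


KEYWORDS = ["A technology", "B technology", "C technology", "D technology"]

# One query per input node, as in step S4.
QUERIES = {kw: substitute(PATTERN_A, "?keyword", kw) for kw in KEYWORDS}

if __name__ == "__main__":
    for keyword, query in QUERIES.items():
        print(keyword, query[0])
```

Because `substitute` returns a new list, the stored pattern stays unchanged and can be reused for every input node.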

The graph structure database 30 then returns the data that matches each query (S5). FIG. 7 shows, in bold lines, the result obtained from the graph structure of FIG. 2 using the query graph pattern in which "B technology" has been substituted for the variable node "?keyword" of query graph pattern A shown in FIG. 5. FIG. 7 shows three places where the structure matches the query, and the values of the subgraph nodes corresponding to the variable nodes "?x", "?y", and "?person" of query graph pattern A can be extracted. For example, the values corresponding to the variable node "?person" in the matches for this query are the three values "Person:Taro Yamada", "Person:Ichiro Tanaka", and "Person:Hanako Suzuki".
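To illustrate what matching a query against the graph involves, here is a small backtracking matcher over a list of triples. It is a sketch under several assumptions: `TOY_GRAPH` is made-up data and is not the graph of FIG. 2, the property names are ASCII stand-ins, `match` and `substitute` are hypothetical helpers, and different variable nodes (such as `?x` and `?y`) are allowed to bind to the same node, which the source does not specify.

```python
PATTERN_A = [
    ("?x", "rm:technical_keyword", "?keyword"),
    ("?y", "rm:technical_keyword", "?keyword"),
    ("?x", "rm:author", "?person"),
    ("?y", "rm:author", "?person"),
    ("?person", "rdf:type", "Person:person"),
]

# Made-up example data (NOT the contents of FIG. 2).
TOY_GRAPH = [
    ("Paper:G", "rm:technical_keyword", "B technology"),
    ("Paper:G", "rm:author", "Person:Taro Yamada"),
    ("Paper:H", "rm:technical_keyword", "B technology"),
    ("Paper:H", "rm:author", "Person:Ichiro Tanaka"),
    ("Paper:I", "rm:technical_keyword", "A technology"),
    ("Paper:I", "rm:author", "Person:Sachiko Yamamoto"),
    ("Person:Taro Yamada", "rdf:type", "Person:person"),
    ("Person:Ichiro Tanaka", "rdf:type", "Person:person"),
    ("Person:Sachiko Yamamoto", "rdf:type", "Person:person"),
]


def substitute(pattern, variable, value):
    """Replace a variable node with a concrete node value."""
    return [tuple(value if t == variable else t for t in triple) for triple in pattern]


def _unify(term, value, binding):
    """Bind a variable to `value`, or check that a bound/constant term equals it."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(pattern, triples, binding=None):
    """Yield every variable binding under which all triple patterns occur in `triples`."""
    binding = {} if binding is None else binding
    if not pattern:
        yield dict(binding)
        return
    head, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)  # copy so failed attempts leave no trace
        if all(_unify(term, value, candidate) for term, value in zip(head, triple)):
            yield from match(rest, triples, candidate)


def persons_for(keyword):
    """Values bound to ?person when `keyword` is substituted for ?keyword."""
    query = substitute(PATTERN_A, "?keyword", keyword)
    return {b["?person"] for b in match(query, TOY_GRAPH)}


if __name__ == "__main__":
    print(sorted(persons_for("B technology")))
```

A real deployment would send the query to the graph database (for example as a SPARQL query) rather than scan triples in memory; the scan is used here only to keep the example self-contained.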

Next, the results obtained with the queries are passed to the coefficient calculation unit 13 (S6), and the coefficient calculation unit 13 calculates the strength of the relationship between the input nodes based on those results (S7). In the present embodiment, the coefficient is obtained by counting the nodes of the same concept that overlap between the data obtained with the queries into which the respective input node values were substituted. The data passed to the coefficient calculation unit 13 is, for example, the obtained graph structure data or the values of the variable nodes of the query graph pattern.

As an example, consider the case in which the values corresponding to the variable node "?person" of query graph pattern A are referenced when obtaining the coefficient. The values of "?person" obtained by substituting "A technology" are "Person:Sachiko Yamamoto", "Person:Jiro Nakamura", and "Person:Taro Yamada". The values obtained by substituting "B technology" are, as already shown, "Person:Taro Yamada", "Person:Ichiro Tanaka", and "Person:Hanako Suzuki". The values obtained by substituting "C technology" are "Person:Taro Yamada" and "Person:Ichiro Tanaka", and the values obtained by substituting "D technology" are "Person:Ichiro Tanaka" and "Person:Hanako Suzuki".

From these results, the values of "?person" that overlap between each pair of input nodes are as follows. For "A technology" and "B technology", "Person:Taro Yamada" overlaps. For "A technology" and "C technology", "Person:Taro Yamada" overlaps. For "A technology" and "D technology", no value overlaps. For "B technology" and "C technology", the two values "Person:Taro Yamada" and "Person:Ichiro Tanaka" overlap. For "B technology" and "D technology", the two values "Person:Ichiro Tanaka" and "Person:Hanako Suzuki" overlap. For "C technology" and "D technology", "Person:Ichiro Tanaka" overlaps.
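The pairwise overlaps listed above can be reproduced with set intersections. The sketch below uses the "?person" values given in the text for query graph pattern A, with names romanized and the "Person:" prefix dropped for brevity; `PERSONS` and `overlaps` are hypothetical names.

```python
from itertools import combinations

# ?person values per input node, as listed in the text for query graph pattern A.
PERSONS = {
    "A technology": {"Sachiko Yamamoto", "Jiro Nakamura", "Taro Yamada"},
    "B technology": {"Taro Yamada", "Ichiro Tanaka", "Hanako Suzuki"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
    "D technology": {"Ichiro Tanaka", "Hanako Suzuki"},
}


def overlaps(values_by_node):
    """Map each unordered pair of input nodes to the values they have in common."""
    return {
        (a, b): values_by_node[a] & values_by_node[b]
        for a, b in combinations(sorted(values_by_node), 2)
    }


OVERLAPS = overlaps(PERSONS)

if __name__ == "__main__":
    for pair, shared in OVERLAPS.items():
        print(pair, len(shared), sorted(shared))
```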

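Once the number of overlapping nodes between each pair of input nodes has been obtained, the strength of the relationship is calculated using the Simpson coefficient of equation (1). The equation is written here in the standard form of the Simpson coefficient, which agrees with the worked example that follows; the symbols are chosen for this text.

$$R(a, b) = \frac{|X_a \cap X_b|}{\min(|X_a|,\ |X_b|)} \qquad (1)$$

Here $X_a$ and $X_b$ are the sets of values obtained with the queries into which input nodes $a$ and $b$, respectively, were substituted.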

For example, consider the coefficient between "A technology" and "B technology". Among the values of the variable node "?person" in the results of the queries for "A technology" and "B technology", one value overlaps. The query for "A technology" yields three values of "?person", and the query for "B technology" also yields three. The Simpson coefficient is therefore 1 / min(3, 3) = 0.33. FIG. 8 shows a table of the coefficients obtained in the same way for every pair of input nodes.
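The calculation in this example can be sketched in a few lines. The code uses the "?person" values given in the text for query graph pattern A (names romanized); the function name `simpson_coefficient` and the choice to return 0.0 when either set is empty are assumptions made for this sketch, and the resulting table is computed from the text rather than copied from FIG. 8.

```python
from itertools import combinations

# ?person values per input node, as listed in the text for query graph pattern A.
PERSONS = {
    "A technology": {"Sachiko Yamamoto", "Jiro Nakamura", "Taro Yamada"},
    "B technology": {"Taro Yamada", "Ichiro Tanaka", "Hanako Suzuki"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
    "D technology": {"Ichiro Tanaka", "Hanako Suzuki"},
}


def simpson_coefficient(x, y):
    """Return |x & y| / min(|x|, |y|), or 0.0 if either set is empty (assumed convention)."""
    smaller = min(len(x), len(y))
    if smaller == 0:
        return 0.0
    return len(x & y) / smaller


def coefficient_table(values_by_node):
    """Simpson coefficient for every unordered pair of input nodes."""
    return {
        (a, b): simpson_coefficient(values_by_node[a], values_by_node[b])
        for a, b in combinations(sorted(values_by_node), 2)
    }


if __name__ == "__main__":
    for (a, b), value in coefficient_table(PERSONS).items():
        print(f"{a} / {b}: {value:.2f}")
```

Dividing by the smaller set size means a pair scores 1.0 whenever one result set is contained in the other, which is why "B technology" and "C technology" reach the maximum here even though their sets differ in size.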

FIG. 9 shows a table of the coefficients obtained by the processing of S7 from the results of substituting "A technology", "B technology", "C technology", and "D technology" in turn for the variable node "?keyword" of the query graph pattern B shown in FIG. 6.

As FIGS. 8 and 9 show, the results obtained with query graph pattern A differ from those obtained with query graph pattern B. The difference arises because the graph patterns differ. If the relation extraction system 1 presents the query graph pattern together with the coefficients it has obtained, the user can tell from which viewpoint the coefficients were calculated. Query graph pattern A represents the viewpoint of being the author of two papers that have the technology as a keyword. Query graph pattern B represents the viewpoint of being the author of a paper that has the technology as a keyword and also the person in charge of a project that has the technology as a keyword.

The obtained coefficients are sent to the user interface providing unit 20, which generates a result display screen and presents it to the user (S8). The result is displayed as a graph, for example as shown in FIGS. 10 and 11. This allows the user to grasp the relationship between the specified nodes easily and visually. Furthermore, because using a plurality of query graph patterns yields a result for each query graph pattern (viewpoint), the correlation between the query graph pattern (viewpoint) and the strength of the relationship between nodes can be shown.
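One way to turn the coefficients into a graph display is to emit Graphviz DOT text in which each edge is labeled with its coefficient and drawn thicker when the relationship is stronger. This is only a sketch: the actual layout of FIGS. 10 and 11 is not described in the text, the coefficients are computed from the "?person" values listed for query graph pattern A, and `to_dot`, the `penwidth` scaling, and the choice to omit zero-valued edges are assumptions.

```python
from itertools import combinations

PERSONS = {
    "A technology": {"Sachiko Yamamoto", "Jiro Nakamura", "Taro Yamada"},
    "B technology": {"Taro Yamada", "Ichiro Tanaka", "Hanako Suzuki"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
    "D technology": {"Ichiro Tanaka", "Hanako Suzuki"},
}


def simpson_coefficient(x, y):
    """Return |x & y| / min(|x|, |y|), or 0.0 if either set is empty (assumed convention)."""
    smaller = min(len(x), len(y))
    return len(x & y) / smaller if smaller else 0.0


COEFFICIENTS = {
    (a, b): simpson_coefficient(PERSONS[a], PERSONS[b])
    for a, b in combinations(sorted(PERSONS), 2)
}


def to_dot(coefficients, threshold=0.0):
    """Render coefficients as an undirected DOT graph; edges at or below `threshold` are omitted."""
    lines = ["graph relations {"]
    for (a, b), value in sorted(coefficients.items()):
        if value > threshold:
            # Line width grows with the coefficient so stronger relations stand out.
            lines.append(
                f'  "{a}" -- "{b}" [label="{value:.2f}", penwidth={1 + 4 * value:.1f}];'
            )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(COEFFICIENTS))
```

The printed text can be passed to any DOT renderer; generating one such graph per query graph pattern would give the side-by-side comparison of viewpoints that the text describes.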

Thus, according to the present embodiment, the relation extraction device 10, which is connected to the graph structure database 30 storing graph-structured data composed of various nodes and various arcs, receives a plurality of nodes whose relationship strength is to be obtained. The query issuing unit 11 searches the query graph pattern database 12, which stores a large number of query graph patterns, for a query graph pattern related to the input nodes. Using the retrieved query graph pattern, a subgraph is extracted from the graph structure database 30 for each input node value, and a coefficient is obtained from the number of nodes of the same concept that overlap between the extracted subgraphs. The strength of the relationship between the input nodes can thereby be obtained. In addition, by using a plurality of query graph patterns, the strength of the relationship between the input nodes can be obtained from a plurality of viewpoints.

FIG. 1 is a block diagram showing the configuration of the relation extraction system in one embodiment.
FIG. 2 is an explanatory diagram showing data in the graph structure database connected to the relation extraction device in one embodiment.
FIG. 3 is an explanatory diagram showing part of the data of FIG. 2: the data in the original relational database and the same data expressed in RDF using the XML format and the graph format.
FIG. 4 is a sequence diagram showing the operation of the relation extraction system shown in FIG. 1.
FIG. 5 is an explanatory diagram showing a query graph pattern used by the relation extraction system shown in FIG. 1.
FIG. 6 is an explanatory diagram showing another query graph pattern used by the relation extraction system shown in FIG. 1.
FIG. 7 is an explanatory diagram showing, in bold lines, the subgraphs obtained by applying the query graph pattern of FIG. 5 to the graph structure data of FIG. 2.
FIG. 8 is an explanatory diagram showing the coefficients calculated by applying the query graph pattern of FIG. 5 to the graph structure data of FIG. 2.
FIG. 9 is an explanatory diagram showing the coefficients calculated by applying the query graph pattern of FIG. 6 to the graph structure data of FIG. 2.
FIG. 10 is an explanatory diagram showing the coefficient calculation results of FIG. 8 as a graph structure.
FIG. 11 is an explanatory diagram showing the coefficient calculation results of FIG. 9 as a graph structure.

Explanation of symbols

1 ... Relation extraction system
10 ... Relation extraction device
11 ... Query issuing unit
12 ... Query graph pattern database
13 ... Coefficient calculation unit
20 ... User interface providing unit
30 ... Graph structure database

Claims

1. A relation extraction method comprising:
a step in which input means accepts a selection of a plurality of nodes from among the nodes stored in a graph structure database that stores data having a graph structure composed of various nodes and various arcs;
a step in which query issuing means, for each of the selected nodes, substitutes the selected node into a variable node of a query graph pattern for extracting structurally matching data from the data having the graph structure, and acquires data matching the query graph pattern from the graph structure database; and
a step in which coefficient calculating means compares the data acquired for each of the selected nodes in the step of acquiring the data to determine the number of overlapping nodes, and calculates a coefficient between the selected nodes based on the number of overlaps.

2. The relation extraction method according to claim 1, wherein the step of acquiring the data acquires a query graph pattern including a variable node corresponding to the selected node from a query graph pattern database storing query graph patterns.

3. The relation extraction method according to claim 1, wherein the step of acquiring the data generates, based on the selected node, a query graph pattern including a variable node corresponding to the selected node.

4. The relation extraction method according to claim 1, wherein the accepting step also accepts input of a query graph pattern including a variable node corresponding to the selected node.

5. The relation extraction method according to claim 1, further comprising a step of displaying the relation between the selected nodes as a graph structure based on the coefficient calculated in the calculating step.

6. A relation extraction system connected to a graph structure database that stores data having a graph structure composed of various nodes and various arcs, the system comprising:
input means for accepting a selection of a plurality of nodes from among the nodes stored in the graph structure database;
query issuing means for, for each of the selected nodes, substituting the selected node into a variable node of a query graph pattern for extracting structurally matching data from the data having the graph structure, and acquiring data matching the query graph pattern from the graph structure database; and
coefficient calculation means for comparing the data acquired by the query issuing means for each of the selected nodes to determine the number of overlapping nodes, and calculating a coefficient between the selected nodes based on the number of overlaps.

7. The relation extraction system according to claim 6, further comprising a query graph pattern database storing query graph patterns, wherein the query issuing means acquires a query graph pattern including a variable node corresponding to the selected node from the query graph pattern database.

8. The relation extraction system according to claim 6, wherein the query issuing means generates, based on the selected node, a query graph pattern including a variable node corresponding to the selected node.

9. The relation extraction system according to claim 6, wherein the input means also accepts input of a query graph pattern including a variable node corresponding to the selected node.

10. The relation extraction system according to claim 6, further comprising display means for displaying a relation between the selected nodes as a graph structure based on the coefficient calculated by the coefficient calculation means.