JP2012038023A

JP2012038023A - Query generation device and operation method therefor

Info

Publication number: JP2012038023A
Application number: JP2010176378A
Authority: JP
Inventors: Rie Sakai; 理江酒井; Kyoshi Iizuka; 京士飯塚; Hiroyuki Sato; 宏之佐藤; Takahiko Murayama; 隆彦村山; Toru Kobayashi; 透小林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-08-05
Filing date: 2010-08-05
Publication date: 2012-02-23

Abstract

PROBLEM TO BE SOLVED: To provide a query generation device that can generate queries having diversity. SOLUTION: As the number of representative paths increases, the number of queries relating to a group G13 rises to a threshold value (1, for example) or more. Once the number of queries for every group is equal to or greater than the threshold value, queries consisting solely of merged representative paths are retrieved.

Description

The present invention relates to a query generation device and an operation method thereof.

In recent years, the Semantic Web and Linked Data have come into use in a growing range of fields. These technologies combine heterogeneous data sets that were held at separate sites, making data searchable that could not be searched before.

To obtain, for example, words related to a keyword from such a data set, the keyword is inserted into a query generated from the data set, data matching the query is retrieved from the data set, and the target words are extracted from the retrieved data.

Because a large number of queries are generated for an integrated data set, the number of queries has to be narrowed down.

However, if the queries are simply narrowed down, the target words may be obtained when one keyword is used but not when another keyword is used. In other words, the narrowed-down queries lack diversity.

Moreover, even when diversity is ensured, the words obtained are not necessarily reliable.

One conventional technique ranks a large number of query graph patterns and displays those that match the user's interests at the top (Non-Patent Document 1).

However, it leaves unresolved the question of who performs the weighting that reflects the user's interests, and under what rules.

Another technique narrows down the number of query graph patterns by limiting the length of the paths that make up each pattern (Non-Patent Document 2).

However, information that could be obtained only by searching with a query graph pattern made up of long paths may then become unobtainable.

In addition, information obtained under a path-length limit is highly relevant to the keyword but is not necessarily reliable.

Patent Document 1: JP 2006-313501 A

Non-Patent Document 1: "SemRank: ranking complex relationship search results on the semantic web", [online], [retrieved July 13, 2010], Internet <URL: http://lsdis.cs.uga.edu/library/download/AMS05-WWW2005.pdf>
Non-Patent Document 2: "What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content", [online], [retrieved July 13, 2010], Internet <URL: http://www.informatik.uni-leipzig.de/~auer/publication/ExtractingSemantics.pdf>

The present invention has been made in view of the above problems, and its object is to provide a query generation device capable of generating queries that have diversity, and an operation method thereof.

A further object is to make it possible to generate queries that are suited to obtaining reliable words and the like and that also have diversity.

To solve the above problems, a first aspect of the present invention provides a query generation device comprising: query storage means for storing queries, each obtained by acquiring, from a data set produced by merging two data sets each composed of data having two or more nodes that each have an instance and a class indicating the instance, data that connects a node having a class predetermined as indicating a keyword for searching data with a node having a class predetermined as indicating a target instance in the retrieved data, and by replacing the instances in the acquired data with variables; path generation means for dividing a path, which consists of only a single route through nodes included in a query, at a node corresponding to a node common to the two data sets; path storage means for storing the divided paths; path classification means for classifying the paths in the path storage means into groups, one for each pair of classes indicating the instances of the nodes at both ends of a path; data search means for substituting a predetermined search word, as the instance of the node, into the node at one end of a path in the path storage means and searching the merged data set for data that matches the path in everything other than the variables; representative path selection means for taking each path in each group in turn as a target path, obtaining the set of instances of the other-end node of the data retrieved with the target path and the set of instances of the other-end node of the data retrieved with a non-target path, calculating the value obtained by dividing the number of instances in the former set by the number of instances in the union of the sets, and selecting from the group, as a representative path, a path whose value is higher than a predetermined threshold; path merging means for merging representative paths of different groups such that the merged path connects the node corresponding to the keyword with the node corresponding to the target instance; representative path storage means for storing the merged paths; first query search means for searching the query storage means for queries that include a merged path and instructing the representative path selection means to adjust the threshold so that, for each combination of groups to which the paths belonged before merging, the number of retrieved queries is equal to or greater than a predetermined number; and second query search means for searching, once the number of retrieved queries has become equal to or greater than the predetermined number, the query storage means for queries consisting only of the merged paths.

For example, the second query search means searches the query storage means for queries consisting only of two or more merged paths.

A second aspect of the present invention provides an operation method for a query generation device that includes query storage means for storing queries, each obtained by acquiring, from a data set produced by merging two data sets each composed of data having two or more nodes that each have an instance and a class indicating the instance, data that connects a node having a class predetermined as indicating a keyword for searching data with a node having a class predetermined as indicating a target instance in the retrieved data, and by replacing the instances in the acquired data with variables, the method comprising: a path generation step of dividing a path, which consists of only a single route through nodes included in a query, at a node corresponding to a node common to the two data sets, and storing the divided paths in path storage means; a path classification step of classifying the paths in the path storage means into groups, one for each pair of classes indicating the instances of the nodes at both ends of a path; a data search step of substituting a predetermined search word, as the instance of the node, into the node at one end of a path in the path storage means and searching the merged data set for data that matches the path in everything other than the variables; a representative path selection step of taking each path in each group in turn as a target path, obtaining the set of instances of the other-end node of the data retrieved with the target path and the set of instances of the other-end node of the data retrieved with a non-target path, calculating the value obtained by dividing the number of instances in the former set by the number of instances in the union of the sets, and selecting from the group, as a representative path, a path whose value is higher than a predetermined threshold; a path merging step of merging representative paths of different groups such that the merged path connects the node corresponding to the keyword with the node corresponding to the target instance, and storing the merged paths in representative path storage means; a first query search step of searching the query storage means for queries that include a merged path and instructing the representative path selection means to adjust the threshold so that, for each combination of groups to which the paths belonged before merging, the number of retrieved queries is equal to or greater than a predetermined number; and a second query search step of searching, once the number of retrieved queries has become equal to or greater than the predetermined number, the query storage means for queries consisting only of the merged paths.

For example, in the second query search step, the query storage means is searched for queries consisting only of two or more merged paths.

According to the present invention, the number of retrieved queries is made equal to or greater than a predetermined number for each combination of groups to which the representative paths belonged before merging, so that queries having diversity can be obtained.

FIG. 1 is a block diagram of a data search device that includes the components of the query graph pattern generation device according to the present embodiment.
FIG. 2 is a diagram showing data constituting the data set D1.
FIG. 3 is a diagram in which the connection relationships between RDF instances in the data set D3 are represented as relationships between classes.
FIG. 4 is a diagram showing an example of a query.
FIG. 5 is a diagram showing the parts of a query after division.
FIG. 6 is a diagram showing how a query is divided.
FIG. 7 is a diagram showing how the paths generated from the data set D1 are classified.
FIG. 8 is a diagram showing how the paths generated from the data set D2 are classified.
FIG. 9 is a diagram showing paths with search words inserted.
FIG. 10 is a diagram showing a path that includes a search word and the data retrieved with it.
FIG. 11 is a diagram showing how a representative path is selected from the group G13.
FIG. 12 is a diagram showing how paths are merged.
FIG. 13 is a diagram showing the tallied results of the first search using merged representative paths.
FIG. 14 is a diagram showing the tallied results of a search using merged arbitrary paths.
FIG. 15 is a diagram showing the tallied results of the second search using merged representative paths.
FIG. 16 is a diagram showing paths H1 and H2 and a query Q101 retrieved from these paths.
FIG. 17 is a diagram showing how a keyword is inserted into a query.
FIG. 18 is a diagram showing the query Q101 with "Internet" inserted, including all class values, properties, and arc directions.
FIG. 19 is a diagram showing an example of data retrieved using a query into which a keyword has been inserted.

Embodiments of the present invention are described below with reference to the drawings. Some of the terminology used is that of RDF (Resource Description Framework).

FIG. 1 is a block diagram of a data search device that includes the components of the query generation device according to the present embodiment.

The data search device 1 includes: a data set storage unit 11 that stores a data set D1; a data set storage unit 12 that stores a data set D2; a data set merge unit 13 that merges (joins) the data sets D1 and D2; a data set storage unit 14 that stores the merged data set D3; a query generation unit 15 that generates query graph patterns (hereinafter simply "queries") from the data set D3 on the basis of an externally input parameter; a query storage unit 16 in which the generated queries are stored; a path generation unit 17 that generates paths from the queries; a path storage unit 18 in which the generated paths are stored; a path classification unit 19 that classifies the paths into a plurality of groups; a data search unit 101 that searches the data set D3 for data matching a path to which a search word taken from the data set D3 has been applied; a representative path selection unit 102 that selects representative paths from the groups on the basis of the retrieved data; a path merge unit 103 that merges paths with one another; a representative path storage unit 104 in which merged representative paths and single representative paths that could not be merged are stored; a path storage unit 105 in which merged arbitrary paths and single arbitrary paths that could not be merged are stored; a query search unit 106 that searches the query storage unit 16 for queries including a path that matches a path in the representative path storage unit 104 or the path storage unit 105; a query search unit 107 that searches the query storage unit 16 for queries consisting only of a plurality of paths in the representative path storage unit 104; a query storage unit 108 in which the retrieved queries are stored; and a data search unit 109 that substitutes a keyword into a query in the query storage unit 108 and searches the data set D3 for data matching the query after substitution.

FIG. 2 is a diagram showing data constituting the data set D1.

As shown in FIGS. 2(a) and 2(b), the smallest unit of data is labeled directed graph data consisting of information indicating one of two nodes (referred to as a node), information indicating the other node (also referred to as a node), and information indicating the arc connecting the nodes (referred to as an arc).

A node has a value called an instance ("Internet", "graduation research project", "Ichiro Tanaka", and so on). A node also has a value called a class, which indicates the type of the instance ("technical term", "research project", "faculty member", and so on). An instance of a node having the class "research project" is, for example, "graduation research project", which belongs to the type "research project". An instance of a node having the class "faculty member" is, for example, "Ichiro Tanaka", which belongs to the type "faculty member".

The instance of a node having the class "technical term" (a keyword node) is what is checked for a match against the externally input keyword, and the data search device 1 treats the keyword node as a special node.

An arc is directed, that is, it points toward one of the two nodes, and it has a value called a property. The direction of the arc reflects the relationship between the nodes.

The data in FIG. 2(a) was obtained from the fact that "there is a graduation research project concerning the Internet", and the direction of the arc corresponds to the act of deriving "Internet" from "graduation research project". The property is "technical term" because "there is a graduation research project concerning the Internet" can be rephrased as "Internet is a technical term of the graduation research project".

FIG. 2(c) shows the result of merging (joining) the data of FIGS. 2(a) and 2(b).

In the data set D1, when two pieces of data contain the same node (here, the "graduation research project" node), the two pieces of data are associated with each other at that node. For convenience, the two pieces of data are then said to have been merged. FIG. 2(c) shows the data after merging.

The data set D1 contains a large number of mergeable pieces of data, and repeated merging gives it a mesh-like structure.
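As a rough illustration of the data model described above, the following Python sketch represents one unit of data as two nodes (each with a class and an instance) joined by a directed arc that carries a property. The names `Node`, `Triple`, and `shared_nodes` are assumptions made for this sketch, and so is the content of the second piece of data (the property "supervisor" and its direction); the text only spells out the data of FIG. 2(a).

```python
from typing import NamedTuple


class Node(NamedTuple):
    cls: str       # class: the type of the instance, e.g. "technical term"
    instance: str  # instance: the concrete value, e.g. "Internet"


class Triple(NamedTuple):
    source: Node   # node the arc leaves
    prop: str      # property carried by the arc
    target: Node   # node the arc points to


project = Node("research project", "graduation research project")

# FIG. 2(a): the arc leads from "graduation research project" to "Internet"
# and carries the property "technical term".
fig_2a = Triple(project, "technical term", Node("technical term", "Internet"))

# Hypothetical second piece of data; property name and direction are assumed.
fig_2b = Triple(project, "supervisor", Node("faculty member", "Ichiro Tanaka"))


def shared_nodes(t1: Triple, t2: Triple) -> set:
    """Nodes (same class and same instance) that appear in both triples."""
    return {t1.source, t1.target} & {t2.source, t2.target}


# The two pieces of data are linked ("merged") at the node they share.
print(shared_nodes(fig_2a, fig_2b))
```

Because a node is compared by both class and instance, two pieces of data join only where both values agree, which is the sense in which the text says they are merged.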

The data set D1 is, for example, university-related data.

The data set D2, on the other hand, is, for example, data from an SNS (Social Networking Service). Although not illustrated, the data set D2 also consists of data of the same kind as the data set D1 and has a mesh-like structure.

Returning to FIG. 1, the data set merge unit 13 first unifies the nodes that exist in both data sets D1 and D2 and whose class and instance are both equal, and merges the data sets D1 and D2 at those nodes. The data set merge unit 13 stores the merged data set D3 in the data set storage unit 14.
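The merging of D1 and D2 can be sketched as follows, under the assumption that each data set is held as a set of (source node, property, target node) triples and each node as a (class, instance) pair. The function names, the class "community", and the properties "topic" and "member" are hypothetical; only the rule that nodes are unified when both class and instance are equal comes from the text.

```python
def nodes_of(dataset):
    """All (class, instance) nodes that appear in a set of triples."""
    return {node for source, _prop, target in dataset for node in (source, target)}


def merge_datasets(d1, d2):
    """Return the merged data set and the nodes at which d1 and d2 join."""
    merge_nodes = nodes_of(d1) & nodes_of(d2)  # equal class AND equal instance
    return d1 | d2, merge_nodes


# Triples are (source node, property, target node); nodes are (class, instance).
d1 = {  # university-related data
    (("research project", "graduation research project"), "technical term",
     ("technical term", "Internet")),
}
d2 = {  # SNS data; the class "community" and both properties are made up
    (("community", "network study group"), "topic", ("technical term", "Internet")),
    (("community", "network study group"), "member", ("faculty member", "Ichiro Tanaka")),
}

d3, merge_nodes = merge_datasets(d1, d2)
print(len(d3), merge_nodes)
```

Keeping the set of merge nodes alongside D3 corresponds to one of the options the description mentions later, where nodes recorded as merge nodes at merge time are used to decide where queries are divided.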

FIG. 3 is a diagram in which the connection relationships between RDF instances in the data set D3 are represented as relationships between classes. The figure shows, for example, that there exists an RDF triple in which an instance whose class is "technical term" and an instance whose class is "research project" are connected by an arc.

FIG. 3 shows the data set D3, merged at the classes "technical term" and "faculty member", with the connection relationships between RDF instances replaced by relationships between classes. Each of these nodes (merge nodes) was present in both of the data sets D1 and D2.

Returning to FIG. 1, before looking up the name of a "faculty member" who is knowledgeable about the keyword "Internet" (for example, "Ichiro Tanaka"), the data search device 1 receives the keyword "Internet" and the parameter "faculty member" from outside.

The query generation unit 15 generates queries from the data set D3 on the basis of the parameter "faculty member" and stores the generated queries in the query storage unit 16.

Specifically, the query generation unit 15 first acquires from the data set D3 all data consisting of the special node, that is, a node having the class "technical term" (keyword node), a node having the class "faculty member", which equals the parameter (goal node), and either one or more arcs connecting these nodes or the arcs and nodes that connect these nodes through one or more routes.

Next, the query generation unit 15 replaces the instance of each node in each piece of data with a variable corresponding to the node's class. The query generation unit 15 leaves the value indicating the direction of each arc and the property as they are, without replacing them. Everything obtained in this way is a query.
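The variable-replacement step can be illustrated with a minimal sketch. It assumes the same (source node, property, target node) representation as before and names each variable "?" followed by the class, as in the query Q1 described later; the function name `to_query` and the "supervisor" arc are hypothetical. The sketch does not distinguish two different nodes that share a class, which a fuller implementation would have to do.

```python
def to_query(data):
    """Replace every node instance with a variable named after its class."""
    def var(node):
        cls, _instance = node
        return (cls, "?" + cls)
    # Properties and arc directions (source -> target) are kept unchanged.
    return [(var(source), prop, var(target)) for source, prop, target in data]


data = [
    (("research project", "graduation research project"), "technical term",
     ("technical term", "Internet")),
    # Hypothetical arc: the property "supervisor" is an assumption.
    (("research project", "graduation research project"), "supervisor",
     ("faculty member", "Ichiro Tanaka")),
]

for triple in to_query(data):
    print(triple)
```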

Returning to FIG. 1, the path generation unit 17 generates as many paths as possible from all the queries stored in the query storage unit 16 and stores the generated paths in the path storage unit 18.

FIG. 4 is a diagram showing an example of a query.

Specifically, the path generation unit 17 first searches for queries, such as the one shown in FIG. 4, in which the keyword node and the goal node are connected by a plurality of routes (each consisting of an arc, or of arcs and nodes).

In the query Q1, the node having the variable "?technical term", which corresponds to the class "technical term", is the keyword node, and the node having the variable "?faculty member", which corresponds to the class "faculty member", is the goal node. The other nodes likewise have variables made up of "?" followed by the class value (such as "research project").

FIG. 5 is a diagram showing the parts of a query after division.

Next, as shown in FIGS. 5(a) and 5(b), the path generation unit 17 divides each such query by route. Each query after division connects the keyword node and the goal node by a single route.
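Dividing a multi-route query into single-route queries amounts to enumerating the routes between the keyword node and the goal node. The sketch below does this with a depth-first walk that ignores arc direction while traversing but keeps each arc intact in the output. The function name, the node "?paper", and the properties "supervisor" and "author" are assumptions for illustration and are not taken from FIG. 4.

```python
def split_into_routes(edges, start, goal):
    """Enumerate every route from start to goal that visits no node twice."""
    routes = []

    def walk(node, visited, route):
        if node == goal:
            routes.append(route)
            return
        for edge in edges:
            source, _prop, target = edge
            if node in (source, target):
                nxt = target if node == source else source
                if nxt not in visited:
                    walk(nxt, visited | {nxt}, route + [edge])

    walk(start, {start}, [])
    return routes


# A query with two routes between the keyword node and the goal node.
e0 = ("?research project", "technical term", "?technical term")
e1 = ("?research project", "supervisor", "?faculty member")
e2 = ("?paper", "technical term", "?technical term")
e3 = ("?paper", "author", "?faculty member")

for route in split_into_routes([e0, e1, e2, e3], "?technical term", "?faculty member"):
    print(route)
```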

Next, the path generation unit 17 further divides the queries obtained by this division, as well as the queries that originally connected the keyword node and the goal node by a single route.

FIG. 6 is a diagram showing how a query is divided.

Specifically, as shown in FIG. 6, the path generation unit 17 first searches for queries having a node that is neither the keyword node nor the goal node (an intermediate node) and that also corresponds to a merge node (a node having the class "technical term" or "faculty member").

Here, the path generation unit 17 searches, for example, for queries having as an intermediate node a node that was recorded as a merge node when the data sets D1 and D2 were merged.

Alternatively, if, for example, identification information (such as a name) indicating the data set has been assigned in advance to the nodes of each data set, the path generation unit 17 searches for queries having as an intermediate node a node that carries both pieces of identification information, that is, a merge node.

The node having the variable "?faculty member" in FIG. 6 is obtained by replacing the instance of the merge node having the class "faculty member" in FIG. 3 with the variable "?faculty member", and is therefore a merge node.

Next, as shown in FIG. 6, the path generation unit 17 divides each such query at the merge node. A query after division connects the keyword node and the merge node by a single route.

If a query after division still contains a merge node, the path generation unit 17 divides the query again at that merge node. The queries that never had a node that is both an intermediate node and a merge node, together with the queries after division, are all paths.
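A minimal sketch of the division at merge nodes follows. It assumes a single-route query is held as the list of node classes along the route plus the list of arcs between consecutive nodes; this representation, the function name, the class "community", and every property and direction in the example are placeholders rather than the contents of FIG. 6. A single pass over the intermediate nodes gives the same result as the repeated division described in the text.

```python
def split_at_merge_nodes(nodes, arcs, merge_classes):
    """Cut a single-route query at every intermediate merge node.

    nodes: class of each node along the route, keyword node first.
    arcs:  (property, direction) of the arc between nodes[i] and nodes[i + 1].
    """
    paths = []
    start = 0
    for i in range(1, len(nodes) - 1):      # intermediate nodes only
        if nodes[i] in merge_classes:
            paths.append((nodes[start:i + 1], arcs[start:i]))
            start = i                       # the merge node starts the next path
    paths.append((nodes[start:], arcs[start:]))
    return paths


merge_classes = {"technical term", "faculty member"}
nodes = ["technical term", "research project", "faculty member",
         "community", "faculty member"]
# Property names and directions below are placeholders, not taken from FIG. 6.
arcs = [("technical term", "<-"), ("supervisor", "->"),
        ("member", "<-"), ("member", "->")]

for path_nodes, path_arcs in split_at_merge_nodes(nodes, arcs, merge_classes):
    print(path_nodes, path_arcs)
```

A query with no intermediate merge node comes back unchanged as a single path, matching the statement that such queries are themselves paths.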

Returning to FIG. 1, the path classification unit 19 next classifies all the paths in the path storage unit 18 into a plurality of groups.

Specifically, the path classification unit 19 first classifies all the paths into a group of paths generated from the data set D1 and a group of paths generated from the data set D2.

Next, the path classification unit 19 classifies the paths in each group according to the combination of class values of the nodes at both ends.

FIG. 7 is a diagram showing how the paths generated from the data set D1 are classified.

For the paths generated from the data set D1, the class value of each end node is either "technical term" or "faculty member", so the paths are classified into a group G11, in which the class values at both ends are "technical term"; a group G12, in which the class values at both ends are "faculty member"; and a group G13, in which the class value at one end is "technical term" and that at the other end is "faculty member".

The group G11 contains paths G111 and G112. The group G12 contains paths G121 and G122. The group G13 contains paths G131, G132, and G133.

FIG. 8 is a diagram showing how the paths generated from the data set D2 are classified.

The paths generated from the data set D2 are likewise classified into three groups G21, G22, and G23.

The group G21 contains paths G211, G212, and G213. The group G22 contains paths G221 and G222. The group G23 contains paths G231, G232, and G233.

The paths are thus classified into six groups.

The path classification unit 19 assigns to each path the ID or the like of the group to which it belongs.
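The two-stage classification can be sketched as a single grouping keyed by the source data set and the pair of end-node classes. The dictionary layout of a path, the function name `classify`, and the placeholder intermediate node "x" are assumptions; the path IDs and the resulting six groups follow the description of FIGS. 7 and 8.

```python
from collections import defaultdict

T, F = "technical term", "faculty member"


def classify(paths):
    """Group path IDs by source data set and by the classes of both end nodes."""
    groups = defaultdict(list)
    for path in paths:
        # Sort the two end classes so that the orientation of the path is ignored.
        ends = tuple(sorted((path["nodes"][0], path["nodes"][-1])))
        groups[(path["dataset"], ends)].append(path["id"])
    return dict(groups)


# Path IDs follow FIGS. 7 and 8; the intermediate node "x" is a placeholder.
layout = {
    "D1": {(T, T): ["G111", "G112"], (F, F): ["G121", "G122"],
           (T, F): ["G131", "G132", "G133"]},
    "D2": {(T, T): ["G211", "G212", "G213"], (F, F): ["G221", "G222"],
           (T, F): ["G231", "G232", "G233"]},
}
paths = [{"id": pid, "dataset": ds, "nodes": [a, "x", b]}
         for ds, by_ends in layout.items()
         for (a, b), ids in by_ends.items()
         for pid in ids]

groups = classify(paths)
print(len(groups))  # number of groups
```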

図１に戻り、データ検索部１０１は、パス記憶部１８のパスにデータセットＤ３内の検索ワードを適用し、適用後のパスに合致するデータをデータセットＤ３から検索する。 Returning to FIG. 1, the data search unit 101 applies the search word in the data set D3 to the path of the path storage unit 18, and searches the data set D3 for data that matches the path after application.

データ検索部１０１は、まず、パスにおける端のノードのクラスの値ごとに、該値と同じクラスを有するノードのインスタンスをデータセットＤ３から重複しないように１以上（便宜的に２とする）取得する。取得されたインスタンスを以下、検索ワードという。 First, for each class value of the end node in the path, the data search unit 101 obtains one or more (for convenience, 2) instances of a node having the same class as the value so as not to be duplicated from the data set D3. To do. The acquired instance is hereinafter referred to as a search word.

図７、図８の例では、クラスの値は、「技術用語」、「教職員」の２つなので、データ検索部１０１は、「技術用語」から２つの検索ワード（以下、検索ワードＳＷ１１、ＳＷ１２）を取得し、「教職員」から２つの検索ワード（以下、検索ワードＳＷ２１、ＳＷ２２）を取得する。 In the example of FIGS. 7 and 8, there are two class values, “technical term” and “faculty / staff”. Therefore, the data search unit 101 uses two search words (hereinafter referred to as search words SW11 and SW12) from “technical term”. ) And two search words (hereinafter referred to as search words SW21 and SW22) are acquired from the “faculty and staff”.

The data search unit 101 then performs the following for each group.

FIG. 9 is a diagram showing paths in which search words have been included.

As shown in FIG. 9, for the group G11, the data search unit 101 generates paths in which either of the search words SW11 and SW12 is included as the instance at one end. In this way, the paths P11 and P12 are generated from the path G111, and the paths P21 and P22 are generated from the path G112. The data search unit 101 performs the same processing for the other groups G12 to G23.

FIG. 10 is a diagram showing a path that includes a search word and the data retrieved with it.

Next, the data search unit 101 searches the data set D3 for all data that matches the path P11, an example of which is shown in FIG. 10, in every respect other than its variables (such as "?research project"). In FIG. 10, the search word SW11 is "patent". FIG. 10 also shows the class values, properties, and directionality that are omitted from FIG. 9. The data search unit 101 performs the same search for the other paths P12, P21, and P22.
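Matching a path against the data "in every respect other than its variables" can be sketched as a small pattern matcher over directed triples. This is an illustration under stated assumptions: the triple representation, the property names `keyword_of` and `led_by`, the sample data, and the omission of class checks are all hypothetical simplifications and are not taken from FIG. 10.

```python
def match(patterns, triples, binding=None):
    """Return every variable binding under which all patterns occur in triples.

    A pattern is (subject, property, object); terms starting with "?" are variables.
    """
    binding = binding or {}
    if not patterns:
        return [binding]
    (s, p, o), rest = patterns[0], patterns[1:]
    results = []
    for ts, tp, to in triples:
        if tp != p:                       # property and direction must match exactly
            continue
        new = dict(binding)
        ok = True
        for term, value in ((s, ts), (o, to)):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:           # constants (e.g. the search word) must be equal
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, new))
    return results

# Hypothetical data and a path with the search word "patent" at one end.
triples = [
    ("patent", "keyword_of", "Project A"),
    ("Project A", "led_by", "Ichiro Tanaka"),
    ("Internet", "keyword_of", "Project B"),
]
p11 = [("patent", "keyword_of", "?research_project"),
       ("?research_project", "led_by", "?faculty")]
matches = match(p11, triples)
```

The instance bound to the variable at the far end of the path (`?faculty` in the sketch) is what the later selection step counts when comparing paths.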

The data search unit 101 performs the same processing for the other groups G12 to G23.

The data search unit 101 then sends the retrieved data to the representative path selection unit 102.

Returning to FIG. 1, the representative path selection unit 102 selects a representative path from each group on the basis of the data.

In each group that contains a plurality of paths, the representative path selection unit 102 calculates a set M of values for each path belonging to that group. The set M consists of characteristic values m, each indicating a characteristic that the path under consideration (the target path) has with respect to one of the other paths in the same group (a non-target path).

For example, in the group G11, the set M for the path G111 consists solely of the characteristic value m that the path G111 has with respect to the path G112. Likewise, the set M for the path G112 consists solely of the characteristic value m that the path G112 has with respect to the path G111.

In the group G13, for example, the set M for the path G131 consists of the characteristic value m that the path G131 has with respect to the path G132 and the characteristic value m that it has with respect to the path G133. The set M for the path G132 consists of the characteristic value m that the path G132 has with respect to the path G131 and the characteristic value m that it has with respect to the path G133. The set M for the path G133 consists of the characteristic value m that the path G133 has with respect to the path G132 and the characteristic value m that it has with respect to the path G131.

The representative path selection unit 102 calculates each such characteristic value m by equation (1), given here in the form implied by the definitions that follow.

$$ m = \frac{1}{|K|} \sum_{k_i \in K} \frac{|Q_1(k_i)|}{|Q_1(k_i) \cup Q_2(k_i)|} \tag{1} $$

where
$k_i$ is each search word used for the group concerned,
$Q_1(k_i)$ is the set of data retrieved with the target path after the search word $k_i$ is applied,
$Q_2(k_i)$ is the set of data retrieved with the non-target path after the search word $k_i$ is applied,
$Q_1(k_i) \cup Q_2(k_i)$ is the union of the sets $Q_1(k_i)$ and $Q_2(k_i)$,
$|Q_1(k_i) \cup Q_2(k_i)|$ is the number of data items in the union $Q_1(k_i) \cup Q_2(k_i)$,
$|Q_1(k_i)|$ is the number of data items in the set $Q_1(k_i)$,
$\Sigma$ is the sum, over the search words, of $|Q_1(k_i)|$ divided by $|Q_1(k_i) \cup Q_2(k_i)|$, and
$|K|$ is the number of search words in the set $K$ of the search words $k_i$.

In other words, the characteristic value m indicates the extent to which the set of data retrieved with the target path covers the set of data retrieved with the non-target path.
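The calculation of a characteristic value m can be sketched as below. The sketch assumes that the per-word ratios are averaged over the |K| search words, which is how the definitions of Σ and |K| read; the function name, the input layout, the sample sets, and the handling of an empty union (counted as 0) are assumptions of this example.

```python
def characteristic_value(q1_by_word, q2_by_word):
    """Average, over the search words, of |Q1| / |Q1 union Q2|.

    q1_by_word: search word -> set of data retrieved with the target path
    q2_by_word: search word -> set of data retrieved with the non-target path
    """
    words = list(q1_by_word)
    total = 0.0
    for k in words:
        q1 = q1_by_word[k]
        q2 = q2_by_word.get(k, set())
        union = q1 | q2
        if union:                     # an empty union contributes 0 (assumption)
            total += len(q1) / len(union)
    return total / len(words)

# Hypothetical retrieved data for two search words.
q1 = {"SW11": {"a", "b", "c"}, "SW12": {"x"}}
q2 = {"SW11": {"b", "c", "d"}, "SW12": {"x", "y"}}
m = characteristic_value(q1, q2)      # (3/4 + 1/2) / 2
```

A value of m close to 1 means the target path alone retrieves nearly everything that the two paths retrieve together, which is what makes it a candidate representative.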

Next, in the group G11 for example, the representative path selection unit 102 selects, from the two sets M (the one for the path G111 and the one for the path G112), a set M whose characteristic value m is greater than a predetermined threshold, and selects the path corresponding to the selected set M as the representative of the group G11.

In the group G13, for example, the representative path selection unit 102 selects, from the three sets M (those for the paths G131, G132, and G133), a set M consisting solely of characteristic values m greater than the threshold, and selects the path corresponding to the selected set M as the representative of the group G13.

FIG. 11 is a diagram showing how a representative path is selected from the group G13.

The set M for the path G131 consists of a characteristic value of 29.3% with respect to the path G132 and a characteristic value of 64.7% with respect to the path G133.

The set M for the path G132 consists of a characteristic value of 71.0% with respect to the path G131 and a characteristic value of 82.3% with respect to the path G133.

The set M for the path G133 consists of a characteristic value of 36.0% with respect to the path G131 and a characteristic value of 20.3% with respect to the path G132.

Accordingly, when the threshold is 70%, the set M for the path G132 is selected, and the path G132 is selected as the representative.

The representative path selection unit 102 attaches to each representative path a flag or the like indicating that it is a representative.

Alternatively, the representative path selection unit 102 may obtain the average or the maximum of the characteristic values m in each set M and select, as the representative path, the path whose set M gives the largest such value.
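The selection rule can be checked against the values given for FIG. 11. The following sketch uses those percentages; the function names and the dictionary layout are assumptions made for the example.

```python
from statistics import mean

# Characteristic values from FIG. 11: target path -> {non-target path: m in percent}.
FIG11 = {
    "G131": {"G132": 29.3, "G133": 64.7},
    "G132": {"G131": 71.0, "G133": 82.3},
    "G133": {"G131": 36.0, "G132": 20.3},
}

def select_representatives(sets_m, threshold):
    """Paths whose set M consists solely of values greater than the threshold."""
    return [path for path, m in sets_m.items()
            if all(value > threshold for value in m.values())]

def select_by_score(sets_m, score=max):
    """Alternative rule: the path whose set M has the largest maximum (or average)."""
    return max(sets_m, key=lambda path: score(sets_m[path].values()))

at_70 = select_representatives(FIG11, 70.0)   # only G132 clears 70%
at_25 = select_representatives(FIG11, 25.0)   # G131 and G132 clear 25%
best_by_max = select_by_score(FIG11, max)
best_by_mean = select_by_score(FIG11, mean)
```

Lowering the threshold from 70% to 25% adds G131 as a second representative, while G133 is still excluded because its value of 20.3% with respect to G132 stays below the threshold.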

Returning to FIG. 1, the path merge unit 103 first takes two representative paths from the path storage unit 18. If the class of the node at one end of one path is the same as the class of the node at one end of the other path, the class of the node at the other end of the one path is "technical term", and the class of the node at the other end of the other path is "faculty and staff", the path merge unit 103 merges the two paths at the nodes of the same class. It does this for all possible combinations, and stores in the representative path storage unit 104 all of the merged paths, together with every single path that originally has the class "technical term" at one end and the class "faculty and staff" at the other end.

FIG. 12 is a diagram showing how paths are merged.

For example, as shown in FIG. 12, if the path G131 and the path G122 are representatives, the path merge unit 103 merges them at the node having the class "faculty and staff" to generate a path H1.

Note that, for example, the classes at both ends of the paths of the group G11 are "technical term" and the classes at both ends of the paths of the group G12 are "faculty and staff", so these paths cannot be merged with each other and are not stored in the representative path storage unit 104.

Likewise, because the classes at both ends of the paths of the group G11 are "technical term", these paths are not stored in the representative path storage unit 104 as single paths either.
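The merge condition can be sketched as a check on the end classes of two paths. The function names and the tuple representation are assumptions; the end classes of the group G13 ("technical term" at one end, "faculty and staff" at the other) are inferred from the merge examples rather than stated explicitly in this passage.

```python
KEYWORD_CLASS = "technical term"
GOAL_CLASS = "faculty and staff"

def merge_class(ends_a, ends_b):
    """Return the shared end class at which two paths can be merged, or None.

    ends_a, ends_b: (class at one end, class at the other end) of each path.
    """
    for i in (0, 1):
        for j in (0, 1):
            shares_end = ends_a[i] == ends_b[j]
            # The two remaining ends must be the keyword class and the goal class.
            remaining = {ends_a[1 - i], ends_b[1 - j]}
            if shares_end and remaining == {KEYWORD_CLASS, GOAL_CLASS}:
                return ends_a[i]
    return None

def is_standalone(ends):
    """A single path is kept if it already runs from keyword class to goal class."""
    return set(ends) == {KEYWORD_CLASS, GOAL_CLASS}

G11 = (KEYWORD_CLASS, KEYWORD_CLASS)   # both ends "technical term"
G12 = (GOAL_CLASS, GOAL_CLASS)         # both ends "faculty and staff"
G13 = (KEYWORD_CLASS, GOAL_CLASS)      # inferred for this sketch
```

With these definitions, a G13 path and a G12 path merge at "faculty and staff" (as the paths G131 and G122 do to form H1), a G11 path and a G13 path merge at "technical term", and G11 with G12 or G11 with itself cannot be merged.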

Returning to FIG. 1, the path merge unit 103 performs the same processing on arbitrary paths, regardless of whether they are representatives.

The path merge unit 103 takes two arbitrary paths from the path storage unit 18. If the class of the node at one end of one path is the same as the class of the node at one end of the other path, the class of the node at the other end of the one path is "technical term", and the class of the node at the other end of the other path is "faculty and staff", the path merge unit 103 merges the two paths at the nodes of the same class. It does this for all possible combinations, and stores in the path storage unit 105 all of the merged paths, together with every single path that originally has the class "technical term" at one end and the class "faculty and staff" at the other end. The merging method is the same as for the representative paths.

Next, the query search unit 106 searches the query storage unit 16 for queries that include a path matching a path in the representative path storage unit 104, and tallies the number of retrieved queries for each group, or combination of groups, to which the path concerned belonged.

FIG. 13 is a diagram showing the tallied results of the first search using merged representative paths.

As shown in FIG. 13, the paths of the group G11 cannot be merged with each other, for example, so no search is performed for the pairing of the group G11 with itself. This is indicated by the cross at the intersection of the G11 column and the G11 row.

Similarly, the paths of the group G11 and the paths of the group G12 cannot be merged, so no search is performed for the pairing of the groups G11 and G12. This is indicated by the cross at the intersection of the G11 column and the G12 row.

In addition, because the classes at both ends of the paths of the group G11 are "technical term", the intersection of the G11 column and the bottom row is marked with a cross.

In contrast, the paths of the group G11 and the paths of the group G13 can be merged, and the number of queries retrieved was 2.

The paths of the group G12 and the paths of the group G13 can also be merged, and the number of queries retrieved was 0.

FIG. 13 likewise shows, for the other groups and combinations of groups, whether a search was performed and, if so, the number of queries retrieved.

The query search unit 106 also searches the query storage unit 16 for queries that include a path matching a path in the path storage unit 105, and tallies the number of retrieved queries for each group, or combination of groups, to which the path concerned belonged.

FIG. 14 is a diagram showing the tallied results of the search using merged arbitrary paths.

FIG. 14 shows that the paths of the group G12 and the paths of the group G13 can be merged and that the number of queries retrieved was 41.

The query search unit 106 identifies a group from the cells, such as those in FIG. 13, in which the number of retrieved queries is 0. In FIG. 13 there are two such cells, and the group common to both is G13, so the group G13 is identified.

If the number of queries in FIG. 14 that corresponds to a cell in which the number of retrieved queries is 0 is equal to or greater than a predetermined threshold (for example, 1), the query search unit 106 instructs the representative path selection unit 102 to lower, for the identified group G13, the threshold used to select the representative path and to select representative paths again by the same method.

In FIG. 13 there are two cells in which the number of queries is 0, and in FIG. 14 the numbers of queries in the corresponding cells are 41 and 15. If the threshold is 1, for example, the query search unit 106 therefore issues the instruction to the representative path selection unit 102.
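The decision to relax the selection threshold can be sketched as a comparison of the two tallies. The function name and the tally layout are assumptions. The counts 2, 0, 41, and 15 come from the description of FIGS. 13 and 14, but the text names only one of the two zero cells (G12 with G13), so the label "Gx" below is a placeholder for the unnamed partner of G13.

```python
def groups_to_relax(rep_counts, any_counts, min_queries=1):
    """Groups whose representative-selection threshold should be lowered.

    rep_counts: group pair -> queries found with merged representative paths
    any_counts: group pair -> queries found with merged arbitrary paths
    """
    zero_pairs = [pair for pair, n in rep_counts.items() if n == 0]
    # Only pairs for which arbitrary paths do find enough queries justify a retry.
    retry = [pair for pair in zero_pairs if any_counts.get(pair, 0) >= min_queries]
    if not retry:
        return set()
    common = set(retry[0])
    for pair in retry[1:]:
        common &= set(pair)       # keep the group shared by all such pairs
    return common

# "Gx" is a placeholder: the second zero cell is not named in the text.
fig13 = {("G11", "G13"): 2, ("G12", "G13"): 0, ("G13", "Gx"): 0}
fig14 = {("G12", "G13"): 41, ("G13", "Gx"): 15}
relax = groups_to_relax(fig13, fig14)
```

In this sketch the result is the group G13, which matches the group for which the representative path selection unit 102 is asked to lower its threshold.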

The representative path selection unit 102 changes the threshold in the example of FIG. 11 to, for example, 25%, whereby the paths G131 and G132 are selected as representatives. That is, the number of representatives increases to two.

When the representative paths have been reselected, the path merge unit 103 merges the representative paths again.

When the representative paths have been merged again, the query search unit 106 performs the tally again.

FIG. 15 is a diagram showing the tallied results of the second search using merged representative paths.

Because the number of representative paths has increased, the numbers of queries in the cells relating to the group G13 have increased and are now equal to or greater than the threshold (for example, 1).

Returning to FIG. 1, once the numbers of queries have all become equal to or greater than the threshold in this way, the query search unit 107 searches the query storage unit 16 for queries consisting solely of a plurality of paths in the representative path storage unit 104, and stores the retrieved queries in the query storage unit 108.

FIG. 16 is a diagram showing the paths H1 and H2 and the query Q101 retrieved from these paths.

The query Q101 connects a keyword node and a goal node by a plurality of routes (each consisting of an arc, or of arcs and nodes); one route corresponds to the path H1 and the other to the path H2. There may be three or more routes.

Just as it retrieves the query Q101 from the paths H1 and H2, the query search unit 107 retrieves every query that satisfies this condition.
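The condition applied by the query search unit 107 can be sketched as a subset test. The function name and the representation of a query as a list of route IDs are assumptions; the query Q102 and the path X9 are hypothetical and serve only as a counterexample.

```python
def queries_of_representative_paths(queries, representative_paths, min_routes=2):
    """IDs of queries made up solely of representative paths, with at least min_routes routes."""
    reps = set(representative_paths)
    return [qid for qid, routes in queries.items()
            if len(routes) >= min_routes and set(routes) <= reps]

queries = {
    "Q101": ["H1", "H2"],   # both routes are merged representative paths
    "Q102": ["H1", "X9"],   # hypothetical: X9 is not a representative path
    "Q103": ["H1"],         # hypothetical: a single route only
}
selected = queries_of_representative_paths(queries, ["H1", "H2"])
```

Requiring at least two routes reflects the statement that the retrieved queries consist of a plurality of paths, so that the goal instance is reached by more than one route.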

Returning to FIG. 1, the data search unit 109 next substitutes the keyword "Internet" into a query in the query storage unit 108, and searches the data set D3 for data that matches the query after the substitution.

Specifically, the data search unit 109 reads out the query, includes "Internet", for example, in the query as the instance of the keyword node, and searches the data set D3 for data that matches the query after the substitution.

FIG. 17 is a diagram showing how a keyword is included in a query.

As shown in FIG. 17, the data search unit 109 includes "Internet", for example, as the instance of the keyword node of the query Q101.

FIG. 18 shows the query Q101 with "Internet" included, together with all of its class values, properties, and directionality.

FIG. 19 is a diagram showing an example of data retrieved with a query in which a keyword has been included.

The data search unit 109 searches the data set D3 for data, such as the data D203 in FIG. 19, that matches the query Q101 of FIG. 18 in every respect other than its variables (such as "?paper"). This data shows that the name of a member of "faculty and staff" who is knowledgeable about the "Internet" is "Ichiro Tanaka".

The data search unit 109 therefore outputs the instance of the goal node to the outside.

As a result, it can be learned externally that the name of a member of "faculty and staff" who is knowledgeable about the "Internet" is "Ichiro Tanaka".

As described above, the data search device 1 includes:
data set merge means (13) for merging two data sets each composed of data having two or more nodes, each node having an instance and a class indicating the instance;
data set storage means (14) in which the merged data set is stored;
query generation means (15) for acquiring, from the data set, data that connects a node having a class predetermined as indicating a keyword for searching for data (a keyword node) with a node having a class predetermined as indicating the target instance in the retrieved data (a goal node), and generating a query in which the instances of the data are replaced with variables;
query storage means (16) in which the query is stored;
path generation means (17) for dividing a path, consisting of only a single route through nodes included in the query, at a node corresponding to a node common to the two data sets;
path storage means (18) in which the divided paths are stored;
path classification means (19) for classifying the paths in the path storage means into groups, one for each pair of classes indicating the instances of the nodes at the two ends of a path;
data search means (101) for substituting a predetermined search word, as the instance of the node, into the node at one end of a path in the path storage means, and searching the merged data set for data that matches the path in every respect other than its variables;
representative path selection means (102) for taking each path in each group as a target, acquiring the set of instances of the other-end node of the data retrieved with the target path and the set of instances of the other-end node of the data retrieved with a non-target path, calculating by equation (1) the value obtained by dividing the number of instances in the former set by the number of instances in the union of the two sets, and selecting from the group, as a representative path, a path corresponding to a value higher than a predetermined threshold;
path merge means (103) for merging representative paths of different groups such that the merged path connects the node corresponding to the keyword with the node corresponding to the target instance;
representative path storage means (104) in which the merged paths are stored;
first query search means (106) for searching the query storage means for queries that include a merged path, and instructing the representative path selection means to adjust the threshold so that, for each set of groups to which the paths before merging belonged, the number of retrieved queries becomes equal to or greater than a predetermined number; and
second query search means (107) for searching the query storage means, once the number of retrieved queries has become equal to or greater than the predetermined number, for queries consisting solely of merged paths.

In other words, because the number of retrieved queries is made equal to or greater than a predetermined number for each set of groups to which the representative paths before merging belonged, queries having diversity can be obtained.

Furthermore, because the second query search means searches the query storage means for queries consisting solely of two or more merged paths, queries suited to obtaining a reliable target instance can be obtained.

A computer program for causing a computer to function as the query graph pattern generation device according to the present embodiment can be recorded on a computer-readable recording medium such as a semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk, or a magnetic tape, and can also be distributed widely by transmission over a communication network such as the Internet.

DESCRIPTION OF SYMBOLS
1 … Data search device
11 … Data set storage unit
12 … Data set storage unit
13 … Data set merge unit
14 … Data set storage unit
15 … Query generation unit
16 … Query storage unit
17 … Path generation unit
18 … Path storage unit
19 … Path classification unit
101 … Data search unit
102 … Representative path selection unit
103 … Path merge unit
104 … Representative path storage unit
105 … Path storage unit
106 … Query search unit
107 … Query search unit
108 … Query storage unit
109 … Data search unit

Claims

1. A query generation device comprising:
query storage means for storing a query obtained by acquiring, from a data set produced by merging two data sets each composed of data having two or more nodes, each node having an instance and a class indicating the instance, data that connects a node having a class predetermined as indicating a keyword for searching for data with a node having a class predetermined as indicating a target instance in the retrieved data, and replacing the instances of the data with variables;
path generation means for dividing a path, consisting of only a single route through nodes included in the query, at a node corresponding to a node common to the two data sets;
path storage means for storing the divided paths;
path classification means for classifying the paths in the path storage means into groups, one for each pair of classes indicating the instances of the nodes at the two ends of a path;
data search means for substituting a predetermined search word, as the instance of the node, into the node at one end of a path in the path storage means, and searching the merged data set for data that matches the path in every respect other than its variables;
representative path selection means for taking each path in each group as a target, acquiring the set of instances of the other-end node of the data retrieved with the target path and the set of instances of the other-end node of the data retrieved with a non-target path, calculating the value obtained by dividing the number of instances in the former set by the number of instances in the union of the two sets, and selecting from the group, as a representative path, a path corresponding to a value higher than a predetermined threshold;
path merge means for merging representative paths of different groups such that the merged path connects the node corresponding to the keyword with the node corresponding to the target instance;
representative path storage means for storing the merged paths;
first query search means for searching the query storage means for queries that include a merged path, and instructing the representative path selection means to adjust the threshold so that, for each set of groups to which the paths before merging belonged, the number of retrieved queries becomes equal to or greater than a predetermined number; and
second query search means for searching the query storage means, once the number of retrieved queries has become equal to or greater than the predetermined number, for queries consisting solely of merged paths.

2. The query generation device according to claim 1, wherein the second query search means searches the query storage means for queries consisting solely of two or more merged paths.

3. An operation method of a query generation device including query storage means for storing a query obtained by acquiring, from a data set produced by merging two data sets each composed of data having two or more nodes, each node having an instance and a class indicating the instance, data that connects a node having a class predetermined as indicating a keyword for searching for data with a node having a class predetermined as indicating a target instance in the retrieved data, and replacing the instances of the data with variables, the method comprising:
a path generation step of dividing a path, consisting of only a single route through nodes included in the query, at a node corresponding to a node common to the two data sets, and storing the divided paths in path storage means;
a path classification step of classifying the paths in the path storage means into groups, one for each pair of classes indicating the instances of the nodes at the two ends of a path;
a data search step of substituting a predetermined search word, as the instance of the node, into the node at one end of a path in the path storage means, and searching the merged data set for data that matches the path in every respect other than its variables;
a representative path selection step of taking each path in each group as a target, acquiring the set of instances of the other-end node of the data retrieved with the target path and the set of instances of the other-end node of the data retrieved with a non-target path, calculating the value obtained by dividing the number of instances in the former set by the number of instances in the union of the two sets, and selecting from the group, as a representative path, a path corresponding to a value higher than a predetermined threshold;
a path merge step of merging representative paths of different groups such that the merged path connects the node corresponding to the keyword with the node corresponding to the target instance, and storing the merged paths in representative path storage means;
a first query search step of searching the query storage means for queries that include a merged path, and instructing that the threshold used for representative path selection be adjusted so that, for each set of groups to which the paths before merging belonged, the number of retrieved queries becomes equal to or greater than a predetermined number; and
a second query search step of searching the query storage means, once the number of retrieved queries has become equal to or greater than the predetermined number, for queries consisting solely of merged paths.

4. The operation method of the query generation device according to claim 3, wherein the second query search step searches the query storage means for queries consisting solely of two or more merged paths.

5. A computer program for causing a computer to function as the query generation device according to claim 1.