JP2020060981A

JP2020060981A - Node search method and node search program

Info

Publication number: JP2020060981A
Application number: JP2018192131A
Authority: JP
Inventors: Shuya Abe (阿部 修也)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-10-10
Filing date: 2018-10-10
Publication date: 2020-04-16
Anticipated expiration: 2038-10-10
Also published as: US20200117662A1; US11436226B2; JP7087904B2

Abstract

To identify, in a data set described based on a plurality of ontologies, data portions relating to other data sets.

SOLUTION: A computer identifies, from among nodes included in a first data set described based on a plurality of ontologies, nodes corresponding to each of a plurality of data included in a second data set (step 701). On a path tracing links included in the first data set from each of the identified nodes, the computer determines a relationship between the ontologies of a moving source node and a moving destination node (step 702). Based on the determination result, the computer traces a link between the moving source node and the moving destination node and thereby searches for a common node where a first path tracing links from a first node intersects a second path tracing links from a second node (step 703). The computer outputs a search result indicating the common node, the first node, and the second node (step 704).

SELECTED DRAWING: Figure 7

Description

The present invention relates to a node search method and a node search program.

In recent years, a wide variety of data sets have been described in RDF (Resource Description Framework), and different data sets have been linked to one another through RDF. Different data sets can therefore be combined and analyzed.

In RDF, three elements, namely a subject, a predicate, and an object, form the minimum unit, which is called a triple. For example, in the triple (FJ Company, industry, electrical equipment), the subject is "FJ Company", the predicate is "industry", and the object is "electrical equipment". This triple describes the information that "the industry of FJ Company is electrical equipment".

The subject and the predicate are each expressed as a URI (Uniform Resource Identifier), and the object is expressed as a URI or a literal (character string). A URI is written enclosed in <>, and a literal is written enclosed in "". A URL (Uniform Resource Locator) may be used as the URI. A predicate is sometimes called an attribute or a property, and an object is sometimes called an attribute value or a property value.

A data set described in RDF is a set of triples. When the data set is drawn as a graph, the subject and the object of each triple are called nodes, and the predicate is called a link or an edge.

FIG. 1 illustrates an example of a data set described in RDF. A value that begins with an underscore "_" followed by a colon ":", such as _:f, represents an arbitrary URI. _:f, "FJ Company", "Kawasaki City, Kanagawa Prefecture", and "electrical equipment" represent nodes, and the arrows <name>, <location>, and <industry> represent links. The arrow of a link indicates the reference direction: the node at the start of the arrow refers, through the link, to the node at the end of the arrow. This data set indicates that a certain company has the name "FJ Company", the location "Kawasaki City, Kanagawa Prefecture", and the industry "electrical equipment".

When the data set of FIG. 1 is written in the N-Triples format, it reads as follows.
(_:f, <name>, "FJ Company")
(_:f, <location>, "Kawasaki City, Kanagawa Prefecture")
(_:f, <industry>, "electrical equipment")

When the data set of FIG. 1 is written in the Turtle format, it reads as follows.
_:f <name> "FJ Company" ;
<location> "Kawasaki City, Kanagawa Prefecture" ;
<industry> "electrical equipment" .
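As an illustrative sketch (not part of the specification), the three triples of FIG. 1 could be held in Python as plain tuples and grouped into a graph keyed by subject. The names `TRIPLES` and `build_graph` are hypothetical, and ordinary strings stand in for URIs and literals.

```python
from collections import defaultdict

# Triples of FIG. 1 as (subject, predicate, object) tuples.
# "_:f" stands for an arbitrary URI; the quoted values are literals.
TRIPLES = [
    ("_:f", "<name>", "FJ Company"),
    ("_:f", "<location>", "Kawasaki City, Kanagawa Prefecture"),
    ("_:f", "<industry>", "electrical equipment"),
]


def build_graph(triples):
    """Group triples by subject: subject node -> [(link, object node), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(TRIPLES)
for link, target in graph["_:f"]:
    print(f"_:f --{link}--> {target}")
```

Each tuple corresponds to one line of the N-Triples listing, and the grouped form mirrors the Turtle listing, in which the subject is written once and followed by its predicate and object pairs.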

In RDF, information is described in accordance with the definitions of some ontology. For example, when a corporation is described in RDF, an ontology concerning corporations is used. Multiple ontologies may exist for the same kind of information. For example, ontologies concerning corporations include the Organization ontology and the common vocabulary base. Information described based on different ontologies uses different structures and links even when the information itself is the same.

FIG. 2 illustrates examples of data sets described based on different ontologies. FIG. 2(a) illustrates an example of a data set described based on the Organization ontology. _:f, ●, "FJ Company", and "Kanagawa Prefecture" represent nodes, and the arrows <label>, <registered address>, <place>, and <region> represent links. ● represents a blank node. This data set indicates that a certain company has the name "FJ Company" and that its address is in "Kanagawa Prefecture".

FIG. 2(b) illustrates an example of a data set described based on the common vocabulary base. _:f, ●, "FJ Company", and "Kanagawa Prefecture" represent nodes, and the arrows <name>, <notation>, <address>, and <prefecture> represent links. This data set describes the same information as the data set of FIG. 2(a).

A single data set may be described by combining multiple ontologies, and a composite data set may be generated by connecting multiple data sets through links.

FIG. 3 illustrates an example of a composite data set. The data set 311 is described based on ontologies 321 to 324, and the data set 312 is described based on ontologies 321, 325, and 326. The data set 301 is generated by connecting multiple data sets including the data set 311 and the data set 312.

Some ontologies are specialized and are used only to describe a specific data set, while others are basic and are used to describe a wide variety of data sets.

With regard to RDF, the following are known: a similarity calculation device that calculates similarity between drugs using open data, a computer device that integrates non-conceptual data items into a data graph, and a method of obtaining hierarchical information from flat data (see, for example, Patent Documents 1 to 3).

Patent Document 1: JP 2016-212853 A
Patent Document 2: JP 2016-15124 A
Patent Document 3: JP 2012-141955 A

As described above, multiple data sets described in RDF can be connected by links, and those data sets can be combined and analyzed.

However, not all data sets are described in RDF. For example, a customer company list containing attributes such as the official name, location, and telephone number of each company may be described in CSV (Comma-Separated Values), while the open data of each company is described in RDF. In this case, the correspondence between the customer company list described in CSV and the data set described in RDF is unknown, and it is difficult to combine and analyze these data.

This problem is not limited to associating a data set described in RDF with a data set described in CSV. It also arises when a data set described based on a plurality of ontologies is associated with another data set.

In one aspect, an object of the present invention is to identify, in a data set described based on a plurality of ontologies, a data portion related to another data set.

In one proposal, a computer executes the following processing.
(1) The computer identifies, from among the nodes included in a first data set described based on a plurality of ontologies, a node corresponding to each of a plurality of data included in a second data set.
(2) On a path that traces links included in the first data set from each of the identified nodes, the computer determines the relevance between the ontology of a source node and the ontology of a destination node.
(3) Based on the result of determining the relevance, the computer traces the link between the source node and the destination node and thereby searches for a common node at which a first path and a second path intersect. The first path is a path that traces links from a first node among the identified nodes, and the second path is a path that traces links from a second node among the identified nodes.
(4) The computer outputs a search result that includes information indicating the common node, information indicating the first node, and information indicating the second node.

According to the embodiment, a data portion related to another data set can be identified in a data set described based on a plurality of ontologies.

FIG. 1 is a diagram illustrating a data set described in RDF.
FIG. 2 is a diagram illustrating data sets described based on different ontologies.
FIG. 3 is a diagram illustrating a composite data set.
FIG. 4 is a diagram illustrating a process of associating a customer company list with open data of a company.
FIG. 5 is a diagram illustrating a search process in a composite RDF data set.
FIG. 6 is a functional configuration diagram of a node search device.
FIG. 7 is a flowchart of a node search process.
FIG. 8 is a functional configuration diagram illustrating a specific example of the node search device.
FIG. 9 is a diagram illustrating an RDF data set included in an RDF data set group.
FIG. 10 is a diagram illustrating an RDF data set used for calculating PMI(x, y).
FIG. 11 is a diagram illustrating an RDF data set used for calculating tfidf(i, j).
FIG. 12 is a flowchart illustrating a specific example of the node search process.
FIG. 13 is a flowchart of a node detection process.
FIG. 14 is a diagram illustrating a search node queue.
FIG. 15 is a flowchart of a node movement process.
FIG. 16 is a diagram illustrating paths.
FIG. 17 is a diagram illustrating a path list.
FIG. 18 is a diagram illustrating a path list that includes a common node.
FIG. 19 is a configuration diagram of an information processing apparatus.

Hereinafter, embodiments are described in detail with reference to the drawings.
In the following, a data set described in RDF may be referred to as an RDF data set, and a data set not described in RDF may be referred to as a non-RDF data set.

FIG. 4 illustrates an example of a process of associating a customer company list with open data of a company. FIG. 4(a) illustrates an example of a customer company list described in CSV. In this example, the name, location, and telephone number of FJ Company are recorded in the customer company list.

FIG. 4(b) illustrates an example of open data of a company described in RDF. _:f, _:n1, _:n2, "FJ Company", "Kawasaki City, Kanagawa Prefecture", and "aaaa-bbb-ccc" represent nodes, and the arrows <name>, <head office>, <location>, and <telephone number> represent links. This RDF data set indicates that the company has the name "FJ Company", that its head office is located in "Kawasaki City, Kanagawa Prefecture", and that the telephone number of the head office is "aaaa-bbb-ccc".

FIG. 4(c) illustrates an example of the result of associating the customer company list of FIG. 4(a) with the open data of FIG. 4(b). A corresponding node is a node in the RDF data set that has a literal corresponding to data included in the customer company list, and a common node is a node that refers, directly or indirectly, to two or more corresponding nodes.

When a common node refers to a corresponding node directly, a single link leads from the common node to the corresponding node. When a common node refers to a corresponding node indirectly, two or more links lead from the common node to the corresponding node by way of one or more other nodes.

For example, the common node '_:n1' refers directly to the corresponding node "Kawasaki City, Kanagawa Prefecture" through the link <location>, and refers indirectly to the corresponding node "aaaa-bbb-ccc" by way of the node '_:n2'.

The common node '_:f', on the other hand, refers directly to the corresponding node "FJ Company" through the link <name>, and refers indirectly to the corresponding node "Kawasaki City, Kanagawa Prefecture" by way of the node '_:n1'. The common node '_:f' also refers indirectly to the corresponding node "aaaa-bbb-ccc" by way of the nodes '_:n1' and '_:n2'.

The computer can detect the corresponding nodes by checking whether the character string of each data item in the customer company list of FIG. 4(a) matches the literal character string held by each node in the RDF data set of FIG. 4(b). As a result, three corresponding nodes, "FJ Company", "Kawasaki City, Kanagawa Prefecture", and "aaaa-bbb-ccc", are detected in the RDF data set.
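The string comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the CSV text and the triples are transcribed from FIG. 4(a) and FIG. 4(b), the function name `find_corresponding_nodes` is hypothetical, and `<link>` is a placeholder for the link from '_:n1' to '_:n2', which the text does not name.

```python
import csv
import io

# Customer company list of FIG. 4(a), written as CSV (one row per company).
CSV_TEXT = 'FJ Company,"Kawasaki City, Kanagawa Prefecture",aaaa-bbb-ccc\n'

# RDF data set of FIG. 4(b). "<link>" is a placeholder: the source does not
# name the link from _:n1 to _:n2.
TRIPLES = [
    ("_:f", "<name>", "FJ Company"),
    ("_:f", "<head office>", "_:n1"),
    ("_:n1", "<location>", "Kawasaki City, Kanagawa Prefecture"),
    ("_:n1", "<link>", "_:n2"),
    ("_:n2", "<telephone number>", "aaaa-bbb-ccc"),
]


def find_corresponding_nodes(rows, triples):
    """Return object nodes whose string exactly matches a cell of the list."""
    values = {cell for row in rows for cell in row}
    found = []
    for _subject, _link, obj in triples:
        if obj in values and obj not in found:
            found.append(obj)
    return found


rows = list(csv.reader(io.StringIO(CSV_TEXT)))
corresponding = find_corresponding_nodes(rows, TRIPLES)
print(corresponding)
```

Exact matching is the only comparison the text describes; normalization of spelling variants would be an extension beyond this sketch.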

Next, the computer can detect common nodes in the RDF data set by recursively performing a search process that traces links from each corresponding node in the direction opposite to the reference direction, until no traceable link remains. In the search process, when paths that trace links from different corresponding nodes intersect, the node at the intersection is detected as a common node. For example, the computer performs the search process in the following steps.
(P1) On the path starting from the corresponding node "FJ Company", the computer traces the link <name> that refers to "FJ Company" in the reverse direction and detects the node '_:f'.
(P2) On the path starting from the corresponding node "Kawasaki City, Kanagawa Prefecture", the computer traces the link <location> that refers to "Kawasaki City, Kanagawa Prefecture" in the reverse direction and detects the node '_:n1'.
(P3) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link <telephone number> that refers to "aaaa-bbb-ccc" in the reverse direction and detects the node '_:n2'.
(P4) On the path starting from the corresponding node "Kawasaki City, Kanagawa Prefecture", the computer traces the link <head office> that refers to '_:n1' in the reverse direction and detects the node '_:f'.
(P5) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link that refers to '_:n2' in the reverse direction and detects the node '_:n1'.
(P6) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link <head office> that refers to '_:n1' in the reverse direction and detects the node '_:f'.

In this case, the three paths starting from "FJ Company", "Kawasaki City, Kanagawa Prefecture", and "aaaa-bbb-ccc" intersect at the node '_:f', so the node '_:f' is detected as a common node. The two paths starting from "Kawasaki City, Kanagawa Prefecture" and "aaaa-bbb-ccc" also intersect at the node '_:n1', so the node '_:n1' is detected as a common node as well.
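Steps (P1) to (P6) and the resulting common nodes can be reproduced with a short sketch. It is an assumption-laden illustration rather than the embodiment's code: the text describes a recursive search, whereas this version uses an explicit stack, and the names `find_common_nodes` and `START_NODES` as well as the placeholder `<link>` (the unnamed link from '_:n1' to '_:n2') are hypothetical.

```python
from collections import defaultdict

# RDF data set of FIG. 4(b); "<link>" is a placeholder for the unnamed link.
TRIPLES = [
    ("_:f", "<name>", "FJ Company"),
    ("_:f", "<head office>", "_:n1"),
    ("_:n1", "<location>", "Kawasaki City, Kanagawa Prefecture"),
    ("_:n1", "<link>", "_:n2"),
    ("_:n2", "<telephone number>", "aaaa-bbb-ccc"),
]
START_NODES = ["FJ Company", "Kawasaki City, Kanagawa Prefecture", "aaaa-bbb-ccc"]


def find_common_nodes(triples, start_nodes):
    """Trace links backwards from each start node; return nodes reached from 2+ starts."""
    referrers = defaultdict(list)  # node -> nodes that refer to it
    for subject, _link, obj in triples:
        referrers[obj].append(subject)

    reached_from = defaultdict(set)  # node -> start nodes whose path reaches it
    for start in start_nodes:
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for parent in referrers.get(node, []):
                reached_from[parent].add(start)
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return {n: sorted(s) for n, s in reached_from.items() if len(s) >= 2}


common = find_common_nodes(TRIPLES, START_NODES)
for node, starts in common.items():
    print(node, "<-", starts)
```

The node '_:n2' is reached only from "aaaa-bbb-ccc" and is therefore not reported, which matches the result stated in the text.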

By identifying the corresponding nodes and the common nodes in the RDF data set in this way, the range of nodes referred to directly or indirectly from a common node can be identified as the data portion related to the non-RDF data set. The non-RDF data set is thereby associated with the RDF data set, and the two data sets can be combined and analyzed.

As an example, suppose that in the RDF data set of FIG. 4(b) the node '_:f', the node '_:n1', or the node '_:n2' refers to another node through another link (not illustrated). The information held by that other node may concern a sales office, a business office, a factory, or the like that is not included in the customer company list. In this case, the computer can acquire and analyze the information held by the other node by tracing the other link in the reference direction.
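Tracing links in the reference direction from a common node can be sketched as a simple forward traversal. This is only an illustration: the triple with `<factory>` and the literal "Factory X" are invented to play the role of the "other link (not illustrated)", `<link>` is a placeholder for the unnamed link from '_:n1' to '_:n2', and `referenced_nodes` is a hypothetical name.

```python
from collections import defaultdict

TRIPLES = [
    ("_:f", "<name>", "FJ Company"),
    ("_:f", "<head office>", "_:n1"),
    ("_:n1", "<location>", "Kawasaki City, Kanagawa Prefecture"),
    ("_:n1", "<link>", "_:n2"),          # placeholder link name
    ("_:n2", "<telephone number>", "aaaa-bbb-ccc"),
    ("_:f", "<factory>", "Factory X"),   # hypothetical extra link, not in FIG. 4(b)
]


def referenced_nodes(triples, root):
    """Return every node that `root` refers to, directly or indirectly."""
    targets = defaultdict(list)  # node -> nodes it refers to
    for subject, _link, obj in triples:
        targets[subject].append(obj)

    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in targets.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


portion = referenced_nodes(TRIPLES, "_:f")
print(sorted(portion))
```

Starting from '_:f', the traversal returns the three corresponding nodes together with "Factory X", that is, information absent from the customer company list but attached to the same company.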

In a composite RDF data set such as the one illustrated in FIG. 3, the data portion related to a non-RDF data set is often confined to at most one or two of the connected data sets. If all the connected data sets are searched in this case, unrelated data sets are searched as well, and the time complexity and space complexity become enormous.

FIG. 5 illustrates an example of a search process in a composite RDF data set that includes the RDF data set of FIG. 4(b). In the RDF data set of FIG. 5, information from a company DB (database) and information from an abbreviation DB are mixed. "Abbreviation DB", "FJ", and "F" represent nodes, and the arrows <abbreviation> represent links.

In this case, the computer performs the search process in the following steps to identify the data portion related to the customer company list of FIG. 4(a).
(P11) On a first path starting from the corresponding node "FJ Company", the computer traces the link <formal name> that refers to "FJ Company" in the reverse direction and detects the blank node ●.
(P12) On a second path starting from the corresponding node "FJ Company", the computer traces the link <name> that refers to "FJ Company" in the reverse direction and detects the node '_:f'.
(P13) On the path starting from the corresponding node "Kawasaki City, Kanagawa Prefecture", the computer traces the link <location> that refers to "Kawasaki City, Kanagawa Prefecture" in the reverse direction and detects the node '_:n1'.
(P14) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link <telephone number> that refers to "aaaa-bbb-ccc" in the reverse direction and detects the node '_:n2'.
(P15) On the first path starting from the corresponding node "FJ Company", the computer traces the link that refers to ● in the reverse direction.
(P16) On the path starting from the corresponding node "Kawasaki City, Kanagawa Prefecture", the computer traces the link <head office> that refers to '_:n1' in the reverse direction and detects the node '_:f'.
(P17) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link that refers to '_:n2' in the reverse direction and detects the node '_:n1'.
(P18) On the first path starting from the corresponding node "FJ Company", the computer traces, in the reverse direction, the link that follows the link referring to ●.
(P19) On the path starting from the corresponding node "aaaa-bbb-ccc", the computer traces the link <head office> that refers to '_:n1' in the reverse direction and detects the node '_:f'.
(P20) On the first path starting from the corresponding node "FJ Company", the computer traces yet another link in the reverse direction.

As with the RDF data set of FIG. 4(b), the range of nodes referred to directly or indirectly from the common node '_:f' is the data portion related to the customer company list. Steps (P11), (P15), (P18), and (P20), which trace links on the path that includes the blank node ●, are therefore unnecessary. In particular, when the path that includes the blank node ● contains many links, the amount of computation needed to trace them becomes enormous. If the search process could be restricted to the data set to which the corresponding nodes belong, there would be no need to search unrelated data sets.

As illustrated in FIG. 3, each data set is described based on a specific set of ontologies. One conceivable method is therefore to determine the data set to which a corresponding node belongs, determine the set of ontologies frequently used in that data set, and select only nodes belonging to that set of ontologies as search targets. However, it is difficult to identify the set of ontologies frequently used in each data set.

FIG. 6 illustrates an example of the functional configuration of the node search device according to the embodiment. The node search device 601 of FIG. 6 includes a storage unit 611, a search unit 612, and an output unit 613. The storage unit 611 stores a first data set 621, which is described based on a plurality of ontologies, and a second data set 622. The search unit 612 performs a node search process using the first data set 621 and the second data set 622.

FIG. 7 is a flowchart illustrating an example of the node search process performed by the node search device 601 of FIG. 6. First, the search unit 612 identifies, from among the nodes included in the first data set 621, a node corresponding to each of the plurality of data included in the second data set 622 (step 701).

Next, on a path that traces links included in the first data set 621 from each of the identified nodes, the search unit 612 determines the relevance between the ontology of the source node and the ontology of the destination node (step 702).

Next, based on the result of determining the relevance, the search unit 612 traces the link between the source node and the destination node and thereby searches for a common node at which the first path and the second path intersect (step 703). The first path is a path that traces links from a first node among the identified nodes, and the second path is a path that traces links from a second node among the identified nodes.

Next, the output unit 613 outputs a search result that includes information indicating the common node, information indicating the first node, and information indicating the second node (step 704).

With the node search device 601 of FIG. 6, a data portion related to another data set can be identified in a data set described based on a plurality of ontologies.

FIG. 8 illustrates a specific example of the node search device 601 of FIG. 6. The node search device 801 of FIG. 8 includes a storage unit 811, a calculation unit 812, a search unit 813, and an output unit 814. The storage unit 811, the search unit 813, and the output unit 814 correspond to the storage unit 611, the search unit 612, and the output unit 613 of FIG. 6, respectively.

The storage unit 811 stores an RDF data set group 821, an ontology group 822, an RDF data set 825, and a non-RDF data set 826. The RDF data set 825 and the non-RDF data set 826 correspond to the first data set 621 and the second data set 622 of FIG. 6, respectively.

The non-RDF data set 826 may be a data set described in CSV, TSV (Tab-Separated Values), SSV (Space-Separated Values), XML (Extensible Markup Language), a relational data model, or the like.

For example, the non-RDF data set 826 may be financial information on a borrower company held by a bank, and the RDF data set 825 may be the open data of that company. In this case, associating these data sets makes it possible to analyze the financial information and the open data of the borrower company in an integrated manner and to decide whether the bank should grant a loan.

Alternatively, the non-RDF data set 826 may be personal information on the officers of a company, and the RDF data set 825 may be the open data of that company. In this case, associating these data sets makes it possible to analyze the personal information of the officers and the open data of the company in an integrated manner and to decide whether to do business with the company.

The RDF data set group 821 consists of a plurality of RDF data sets used for calculating statistics on links, and the ontology group 822 consists of a plurality of ontologies used to describe those RDF data sets. Each link included in the RDF data set group 821 is defined by one of the ontologies included in the ontology group 822.

For each combination of two links included in the RDF data sets of the RDF data set group 821, the calculation unit 812 obtains the number of times the two links appear together. Using the obtained number of appearances, the calculation unit 812 calculates a co-occurrence statistic 823 for those links and stores the co-occurrence statistic 823 in the storage unit 811.
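One way to picture the co-occurrence statistic is pointwise mutual information (PMI), which the description of FIG. 10 mentions. The sketch below is an assumption: this passage does not give the formula, so the textbook definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) is used, with probabilities estimated as the fraction of data sets in which the links appear. The link sets in `DATASETS` and the name `cooccurrence_pmi` are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical RDF data set group: each entry is the set of links used in one data set.
DATASETS = [
    {"<name>", "<location>"},
    {"<name>", "<location>", "<industry>"},
    {"<abbreviation>"},
    {"<abbreviation>", "<name>"},
]


def cooccurrence_pmi(datasets):
    """Return {(x, y): PMI} for every link pair that co-occurs in some data set."""
    n = len(datasets)
    single = Counter()  # number of data sets containing each link
    pair = Counter()    # number of data sets containing both links of a pair
    for links in datasets:
        single.update(links)
        pair.update(combinations(sorted(links), 2))
    return {
        (x, y): math.log2((count / n) / ((single[x] / n) * (single[y] / n)))
        for (x, y), count in pair.items()
    }


pmi = cooccurrence_pmi(DATASETS)
for links, score in sorted(pmi.items()):
    print(links, round(score, 3))
```

With this toy input, <location> and <name> co-occur more often than chance (positive PMI), while <abbreviation> and <name> co-occur less often than chance (negative PMI), which is the kind of signal a relevance judgment between ontologies could draw on.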

また、計算部８１２は、オントロジー群８２２に含まれる各オントロジーから、各リンクのラベル及びコメントに含まれる単語を抽出し、各リンクのラベル及びコメントに出現する各単語の出現回数を求める。そして、計算部８１２は、求めた出現回数を用いて、各単語の重要度を示す重要度統計量８２４を計算し、重要度統計量８２４を記憶部８１１に格納する。 The calculation unit 812 also extracts words included in the label and comment of each link from each ontology included in the ontology group 822, and obtains the number of appearances of each word that appears in the label and comment of each link. Then, the calculating unit 812 calculates the importance statistic 824 indicating the importance of each word using the obtained number of appearances, and stores the importance statistic 824 in the storage unit 811.

探索部８１３は、ＲＤＦデータセット８２５に含まれるノードの中から、非ＲＤＦデータセット８２６に含まれる複数のデータそれぞれに対応する対応ノードを特定する。そして、探索部８１３は、特定した複数の対応ノードを含む探索ノードキュー８２７を生成し、探索ノードキュー８２７を記憶部８１１に格納する。探索ノードキュー８２７に登録されたノードは、移動元ノードとして用いられる。 The search unit 813 identifies a corresponding node corresponding to each of the plurality of data included in the non-RDF data set 826 from the nodes included in the RDF data set 825. Then, the search unit 813 generates a search node queue 827 including the identified corresponding nodes, and stores the search node queue 827 in the storage unit 811. The node registered in the search node queue 827 is used as a movement source node.

Next, starting from each of the corresponding nodes included in the search node queue 827, the search unit 813 determines the relevance between the ontologies of the source node and the destination node on a route that follows the links included in the RDF data set 825 in the direction opposite to the reference direction. At this time, the search unit 813 determines the relevance between the ontologies based on the co-occurrence statistic 823 and the importance statistic 824.

When the search unit 813 determines that the ontologies are related, it follows the link between the source node and the destination node in the reverse direction and registers the destination node in the search node queue 827 as a new source node. The search unit 813 then continues the search on the route that includes the new source node.

The search unit 813 generates a route list 828 containing the routes that follow the links included in the RDF data set 825 from each of the corresponding nodes, and stores the route list 828 in the storage unit 811. By continuing the search while updating the search node queue 827 and the route list 828, the search unit 813 searches for a common node at which a first route following links from a first corresponding node intersects a second route following links from a second corresponding node.

探索部８１３は、探索の途中で、オントロジー間に関連性がないと判定した場合、移動元ノードを含む経路上の探索を打ち切る。そして、探索部８１３は、探索された共通ノードを示す情報、第１ノードを示す情報、及び第２ノードを示す情報を含む探索結果８２９を生成し、探索結果８２９を記憶部８１１に格納する。出力部８１４は、探索結果８２９を出力する。 If the search unit 813 determines that there is no relationship between the ontologies during the search, the search unit 813 terminates the search on the route including the source node. Then, the search unit 813 generates a search result 829 including information indicating the searched common node, information indicating the first node, and information indicating the second node, and stores the search result 829 in the storage unit 811. The output unit 814 outputs the search result 829.

オントロジーは、どのようなリンクをどのような構造で用いるかを定義する情報であると考えられる。そこで、計算部８１２は、オントロジー及びオントロジー間の関係を直接扱う代わりに、リンク及びリンク間の関係を用いて、共起統計量８２３及び重要度統計量８２４を計算する。これにより、共起統計量８２３及び重要度統計量８２４に基づいて、同じデータセットにおいて一緒に用いることが可能なオントロジーの集合を特定することが可能になる。 An ontology is considered to be information that defines what kind of link is used in what kind of structure. Therefore, the calculation unit 812 calculates the co-occurrence statistic 823 and the importance statistic 824 by using the link and the relationship between the links, instead of directly handling the ontology and the relationship between the ontologies. This makes it possible to identify a set of ontologies that can be used together in the same data set based on the co-occurrence statistic 823 and the importance statistic 824.

When following a link on a route, the search unit 813 estimates, based on the co-occurrence statistic 823 and the importance statistic 824, whether the ontology of the source node and the ontology of the destination node can be used together in the same data set. Only when those ontologies can be used together does the search unit 813 follow the link between the nodes and update the routes in the route list 828.

According to the node search device 801 of FIG. 8, identifying the corresponding nodes and the common node in the RDF data set 825 makes it possible to identify the range from the common node to the corresponding nodes as the data portion related to the non-RDF data set 826. This in turn makes it possible to acquire data that is referenced from the common node but lies outside the related data portion.

As an example, assume that in the RDF data set 825 the common node directly or indirectly refers to another node other than the corresponding nodes, and that this other node holds information not included in the non-RDF data set 826. In this case, following the links from the common node in the reference direction makes it possible to acquire and analyze the information held by the other node.

In addition, when it is determined during the search on a route that the ontologies of the source node and the destination node are not related, the search on that route is terminated, so the search of data portions unrelated to the non-RDF data set 826 is skipped. This reduces the amount of calculation needed to associate the RDF data set 825 with the non-RDF data set 826.

例えば、図５のＲＤＦデータセットにおいて、対応ノード“ＦＪ社”及びブランクノード●のオントロジー間に関連性がないと判定された場合、それらのノードを含む経路上の探索が打ち切られる。これにより、ブランクノード●を含む経路上のリンクを辿る処理が省略されるため、ＲＤＦデータセットと顧客企業一覧とを対応付けるための計算量が削減される。 For example, in the RDF data set of FIG. 5, when it is determined that there is no relationship between the ontology of the corresponding node “FJ company” and the blank node ●, the search on the route including those nodes is terminated. As a result, the process of following the links on the route including the blank node ● is omitted, so that the amount of calculation for associating the RDF data set and the customer company list is reduced.

FIG. 9 shows an example of an RDF data set included in the RDF data set group 821. _:b, _:a, l4a:100001100xxxx, "ABC", "ABCLibrary", "100001100xxxx", "national institution", and "new" represent nodes. The arrows labeled dbo:aaa, dbo:nnn, skos:ppp, org:iii, dct:sss, and org:ccc represent links.

ＲＤＦのTurtle形式において、“ｄｂｏ：”、“ｓｋｏｓ：”、“ｏｒｇ：”、及び“ｄｃｔ：”はプレフィクスと呼ばれ、ＵＲＩの先頭部分を省略するために用いられる。慣習として、同じオントロジーには同じプレフィクスが使用されるため、図９の例では、プレフィクスによってオントロジーを区別することができる。したがって、図９のＲＤＦデータセットは、“ｄｂｏ：”、“ｓｋｏｓ：”、“ｏｒｇ：”、及び“ｄｃｔ：”が示す４つのオントロジーに基づいて記述されていることが分かる。 In the Turtle format of RDF, “dbo:”, “skos:”, “org:”, and “dct:” are called prefixes and are used to omit the beginning part of the URI. By convention, the same prefix is used for the same ontology, so in the example of FIG. 9, the ontologies can be distinguished by the prefix. Therefore, it can be seen that the RDF data set of FIG. 9 is described based on four ontologies indicated by “dbo:”, “skos:”, “org:”, and “dct:”.
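The prefix convention described above can be illustrated with a short sketch. It is only an illustration under the stated convention (one prefix per ontology); the helper name `ontology_prefix` is hypothetical and does not appear in the embodiment.

```python
def ontology_prefix(prefixed_name: str) -> str:
    """Return the prefix part of a Turtle prefixed name such as 'org:iii'."""
    prefix, sep, _local = prefixed_name.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {prefixed_name!r}")
    return prefix

# Link names taken from the description of FIG. 9.
fig9_links = ["dbo:aaa", "dbo:nnn", "skos:ppp", "org:iii", "dct:sss", "org:ccc"]

# Assumption: each distinct prefix stands for one ontology.
ontologies = sorted({ontology_prefix(link) for link in fig9_links})
print(ontologies)  # four distinct prefixes: dbo, dct, org, skos
```

A real Turtle document would also need its `@prefix` declarations resolved to full URIs; the sketch deliberately ignores that step.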

As the co-occurrence statistic 823, for example, the pointwise mutual information (PMI) given by the following equation can be used.
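The equation itself is not reproduced in this text. Based on the standard definition of PMI and on the symbols c(x), c(y), c(x, y), N, and K defined for Equation (1), it presumably has the following form. The assignment of N and K to the individual probabilities is an assumption.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}, \qquad
p(x) = \frac{c(x)}{N}, \quad
p(y) = \frac{c(y)}{N}, \quad
p(x, y) = \frac{c(x, y)}{K}
\tag{1}
```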

式（１）において、ＰＭＩ（ｘ，ｙ）は、リンクｘ及びリンクｙが共起する確率を表し、ｐ（ｘ）は、リンクｘが出現する出現確率を表し、ｐ（ｙ）は、リンクｙが出現する出現確率を表す。ｃ（ｘ）は、リンクｘが出現する出現回数を表し、ｃ（ｙ）は、リンクｙが出現する出現回数を表し、ｃ（ｘ，ｙ）は、リンクｘ及びリンクｙが同時に出現する出現回数を表す。Ｎは、すべてのリンクの出現回数を表し、Ｋは、２つのリンクのすべての組み合わせ（すべての共起）の出現回数を表す。 In Expression (1), PMI (x, y) represents the probability that the link x and the link y co-occur, p (x) represents the appearance probability that the link x appears, and p (y) is the link It represents the appearance probability that y appears. c (x) represents the number of appearances of the link x, c (y) represents the number of appearances of the link y, and c (x, y) represents the number of appearances of the link x and the link y at the same time. Indicates the number of times. N represents the number of appearances of all links, and K represents the number of appearances of all combinations (all co-occurrence) of two links.

まず、計算部８１２は、ＲＤＦデータセット群８２１に含まれるすべてのＲＤＦデータセットを用いて、ｃ（ｘ）、ｃ（ｙ）、及びＮを求める。次に、計算部８１２は、リンクｘ及びリンクｙを含む単一のＲＤＦデータセットを用いて、ｃ（ｘ，ｙ）を求める。そして、計算部８１２は、式（１）を用いてＰＭＩ（ｘ，ｙ）を計算する。 First, the calculation unit 812 calculates c (x), c (y), and N using all RDF data sets included in the RDF data set group 821. Next, the calculation unit 812 obtains c (x, y) using the single RDF data set including the link x and the link y. Then, the calculation unit 812 calculates PMI (x, y) using the equation (1).

FIG. 10 shows examples of RDF data sets used for the calculation of PMI(x, y). FIG. 10(a) shows an example of an RDF data set including the node n01. This RDF data set includes nodes n01 to n07 and links L1 to L3. FIG. 10(b) shows an example of an RDF data set including the node n11. This RDF data set includes nodes n11 to n17, link L2, link L4, and link L5.

In this case, the number of appearances of each link and the number of times each pair of links appears together are as follows.
c(L1) = 2
c(L2) = 4
c(L3) = 2
c(L4) = 2
c(L5) = 2
c(L1, L2) = 4
c(L1, L3) = 4
c(L1, L4) = 0
c(L1, L5) = 0
c(L2, L3) = 4
c(L2, L4) = 4
c(L2, L5) = 4
c(L3, L4) = 0
c(L3, L5) = 0
c(L4, L5) = 4

Since N = 12, when a base-10 logarithm is used as the log in Equation (1), PMI(L1, L2), PMI(L1, L3), and PMI(L1, L4) are calculated as in the following equations.

NaN in Equation (4) denotes not-a-number. PMI(x, y) for the other combinations of links is calculated in the same manner as in Equations (2) to (4).
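Because Equations (2) to (4) are not reproduced in this text, the following sketch recomputes PMI from the counts listed for FIG. 10. It assumes the standard PMI form with p(x) = c(x)/N and p(x, y) = c(x, y)/K, where K is taken as the sum of all pair counts. It returns NaN for a pair that never co-occurs, following the statement about Equation (4). The resulting values are therefore illustrative and may differ from those in the original equations.

```python
import math

# Counts listed in the text for FIG. 10.
link_counts = {"L1": 2, "L2": 4, "L3": 2, "L4": 2, "L5": 2}
pair_counts = {
    ("L1", "L2"): 4, ("L1", "L3"): 4, ("L1", "L4"): 0, ("L1", "L5"): 0,
    ("L2", "L3"): 4, ("L2", "L4"): 4, ("L2", "L5"): 4,
    ("L3", "L4"): 0, ("L3", "L5"): 0, ("L4", "L5"): 4,
}
N = sum(link_counts.values())  # 12, as stated in the text
K = sum(pair_counts.values())  # assumption: K is the total over all pairs

def pmi(x: str, y: str) -> float:
    """Base-10 PMI of two links under the assumptions stated above."""
    c_xy = pair_counts.get((x, y), pair_counts.get((y, x), 0))
    if c_xy == 0:
        return math.nan  # the text reports NaN for a pair that never co-occurs
    p_xy = c_xy / K
    p_x = link_counts[x] / N
    p_y = link_counts[y] / N
    return math.log10(p_xy / (p_x * p_y))

for pair in [("L1", "L2"), ("L1", "L3"), ("L1", "L4")]:
    print(pair, pmi(*pair))
```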

各ＲＤＦデータセットにおいてリンクｘ及びリンクｙが同時に出現する出現回数が多いほど、式（１）のＰＭＩ（ｘ，ｙ）が大きくなる。このため、ＰＭＩ（ｘ，ｙ）が大きいほど、リンクｘを定義するオントロジーとリンクｙを定義するオントロジーとが、同じＲＤＦデータセットにおいて一緒に用いられる可能性が高くなる。したがって、ＰＭＩ（ｘ，ｙ）を利用して、それらのオントロジー間の関連性を判定することができる。 The greater the number of appearances of the link x and the link y at the same time in each RDF data set, the larger the PMI (x, y) of the equation (1). Therefore, the larger PMI (x, y), the more likely the ontology defining link x and the ontology defining link y are used together in the same RDF dataset. Therefore, PMI (x, y) can be utilized to determine the relevance between those ontologies.

重要度統計量８２４としては、例えば、次式のｔｆｉｄｆ（ｉ，ｊ）を用いることができる。 As the importance statistic 824, for example, tfidf (i, j) in the following equation can be used.
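Equations (11) to (13) are not reproduced in this text. Based on the symbol definitions given for them and on the standard definition of tf-idf, they presumably have the following form. The summation index k is introduced here for illustration only.

```latex
\mathrm{tfidf}(i, j) = \mathrm{tf}(i, j) \cdot \mathrm{idf}(i) \tag{11}

\mathrm{tf}(i, j) = \frac{n(i, j)}{\sum_{k} n(k, j)} \tag{12}

\mathrm{idf}(i) = \log \frac{|D|}{|\{ d : d \ni t(i) \}|} \tag{13}
```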

In Equation (12), n(i, j) denotes the number of occurrences of the word t(i) in the link d(j), and $\sum_{k} n(k, j)$ denotes the sum of the occurrence counts of all words in the link d(j). In Equation (13), |D| denotes the number of distinct links included in the RDF data set group 821, and |{d : d ∋ t(i)}| denotes the number of links that contain the word t(i).

まず、計算部８１２は、ＲＤＦデータセット群８２１に含まれるすべてのＲＤＦデータセットからリンクｄ（ｊ）を抽出し、オントロジー群８２２の中から、リンクｄ（ｊ）を定義するオントロジーを選択する。次に、計算部８１２は、選択したオントロジーから、リンクｄ（ｊ）のラベル及びコメントに含まれる単語を抽出し、抽出したラベル及びコメントに出現する単語ｔ（ｉ）の出現回数を求める。そして、計算部８１２は、求めた出現回数をｎ（ｉ，ｊ）として用いて、式（１１）〜式（１３）により、ｔｆｉｄｆ（ｉ，ｊ）を計算する。 First, the calculation unit 812 extracts the links d (j) from all the RDF datasets included in the RDF dataset group 821, and selects the ontology defining the links d (j) from the ontology group 822. Next, the calculation unit 812 extracts words included in the label and comment of the link d (j) from the selected ontology, and obtains the number of appearances of the word t (i) that appears in the extracted label and comment. Then, the calculation unit 812 calculates tfidf (i, j) from Expressions (11) to (13) using the obtained number of appearances as n (i, j).

FIG. 11 shows an example of an RDF data set used for the calculation of tfidf(i, j). l4a:100001100xxxx, "ABCLibrary", "100001100xxxx", and "new" represent nodes, and the arrows labeled skos:ppp, org:iii, and org:ccc represent links. As the label and comment of each link, for example, the following texts are extracted.
skos:ppp
label: ppp lll
comment: skos:ppp, skos:aaa and skos:hhh are p1 d1 p2. T12 r3 of skos:ppp is t12 c4 of R5 p6 l7. A r8 h13 no m14 t15 one v9 of skos:ppp p16 l10 t11.

org:iii
label: iii
comment: G1 an iii, s28 as a c2 r3 n4, t29 c30 be u31 to u32 to u5 i6 the organization. M7 d7 n9 and i10 iii s11 are a12. The o13 o14 is n15 to w33 s16 are u36. The p17 iii s11 s37 be i18 by the d19 of the iii v20. U38 d19 to d21 the n22 s11 u39 is c23 w40 r24 b25 p25 for ‘skos:nnn’ of w34 t35 p26 is a s27.

org:ccc
label: ccc bbb
comment: I1 a c2 event w3 r4 in a c5 to t6 organization. D7 on the event the organization may or may not h12 c8 to e9 a10 the event. I11 of ‘org:ooo’.

Note that these labels and comments are hypothetical texts used only to explain the embodiment.

If the RDF data set group 821 contains only the RDF data set of FIG. 11, the link d(j) is org:ccc, and the word t(i) is "organization", then n(i, j) is 2. The label of org:ccc contains 2 words and its comment contains 32 words, so the sum of the occurrence counts of all words is 34.

The RDF data set of FIG. 11 contains three distinct links, and two of them, org:iii and org:ccc, contain "organization" in their label or comment. Therefore, when the natural logarithm is used as the log in Equation (13), tfidf(i, j) is calculated as in the following equation.

When the link d(j) is org:ccc and the word t(i) is "event", n(i, j) is 3, and org:ccc is the only link that contains "event" in its label or comment. In this case, tfidf(i, j) is calculated as in the following equation.

他のリンク及び単語の組み合わせに対するｔｆｉｄｆ（ｉ，ｊ）も、式（１４）及び式（１５）と同様にして計算される。 The tfidf (i, j) for other link and word combinations are also calculated in the same manner as the equations (14) and (15).
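Equations (14) and (15) are not reproduced in this text, so the two worked examples can be recomputed with the following sketch. It assumes the standard tf-idf form (term frequency multiplied by the natural-log inverse document frequency) and takes its inputs from the counts stated above for org:ccc. The function and parameter names are hypothetical.

```python
import math

def tfidf(n_ij: int, total_words: int, num_links: int, links_with_word: int) -> float:
    """tf-idf of one word in one link, using the natural logarithm for idf."""
    tf = n_ij / total_words                      # share of the link's words that are t(i)
    idf = math.log(num_links / links_with_word)  # rarer words across links score higher
    return tf * idf

# org:ccc has 34 words in total; FIG. 11 contains 3 distinct links.
organization = tfidf(n_ij=2, total_words=34, num_links=3, links_with_word=2)
event = tfidf(n_ij=3, total_words=34, num_links=3, links_with_word=1)
print(organization, event)
```

Under these assumptions "event" scores higher than "organization" for org:ccc, because it occurs more often in that link and appears in no other link.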

式（１１）のｔｆｉｄｆ（ｉ，ｊ）が大きいほど、リンクｄ（ｊ）における単語ｔ（ｉ）の重要度が大きくなる。このため、同じ単語のｔｆｉｄｆ（ｉ，ｊ）が２つのリンクにおいて大きな値を示す場合、それらのリンクを定義する２つのオントロジーが同じＲＤＦデータセットにおいて一緒に用いられる可能性が高いと考えられる。したがって、ｔｆｉｄｆ（ｉ，ｊ）を利用して、それらのオントロジー間の関連性を判定することができる。 The larger tfidf (i, j) in Expression (11), the greater the importance of the word t (i) in the link d (j). Thus, if the same word tfidf (i, j) shows a large value in two links, then it is likely that the two ontologies that define those links will be used together in the same RDF dataset. Therefore, tfidf (i, j) can be used to determine the relevance between those ontologies.

次に、図１２から図１８までを参照しながら、図８のノード探索装置８０１の動作について、より詳細に説明する。 Next, the operation of the node search device 801 in FIG. 8 will be described in more detail with reference to FIGS. 12 to 18.

図１２は、図８のノード探索装置８０１が行うノード探索処理の具体例を示すフローチャートである。まず、計算部８１２は、ＲＤＦデータセット群８２１の各ＲＤＦデータセットに含まれる２つのリンクの組み合わせについて、共起統計量８２３を計算する（ステップ１２０１）。 FIG. 12 is a flowchart showing a specific example of the node search processing performed by the node search device 801 of FIG. First, the calculation unit 812 calculates the co-occurrence statistic 823 for a combination of two links included in each RDF data set of the RDF data set group 821 (step 1201).

次に、計算部８１２は、ＲＤＦデータセット群８２１に含まれる各リンクのラベル及びコメントに含まれる単語を抽出し、各単語について重要度統計量８２４を計算する（ステップ１２０２）。そして、探索部８１３は、共起統計量８２３及び重要度統計量８２４を用いて、ノード検出処理を行う（ステップ１２０３）。 Next, the calculation unit 812 extracts the words included in the label and the comment of each link included in the RDF data set group 821 and calculates the importance statistic 824 for each word (step 1202). Then, the search unit 813 uses the co-occurrence statistic 823 and the importance statistic 824 to perform node detection processing (step 1203).

図１３は、図１２のステップ１２０３におけるノード検出処理の例を示すフローチャートである。まず、探索部８１３は、ＲＤＦデータセット８２５に含まれるノードの中から、非ＲＤＦデータセット８２６に含まれる複数のデータそれぞれに対応する対応ノードを特定する（ステップ１３０１）。 FIG. 13 is a flowchart showing an example of the node detection processing in step 1203 of FIG. First, the search unit 813 identifies the corresponding node corresponding to each of the plurality of data included in the non-RDF data set 826 from the nodes included in the RDF data set 825 (step 1301).

このとき、探索部８１３は、非ＲＤＦデータセット８２６に含まれる各データの文字列と、ＲＤＦデータセット８２５に含まれる各ノードが有するリテラルの文字列とが一致するか否かをチェックする。そして、探索部８１３は、各データの文字列と一致するリテラルを有するノードを、対応ノードとして特定し、複数の対応ノードを含む探索ノードキュー８２７を生成する（ステップ１３０２）。 At this time, the search unit 813 checks whether or not the character string of each data included in the non-RDF data set 826 and the literal character string of each node included in the RDF data set 825 match. Then, the search unit 813 identifies a node having a literal that matches the character string of each data as a corresponding node, and generates a search node queue 827 including a plurality of corresponding nodes (step 1302).
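Steps 1301 and 1302 can be sketched as an exact string match between the records and the node literals. The data layout below (a dictionary from node identifier to literal, or `None` for a node without a literal) and all sample values are assumptions made for illustration.

```python
from collections import deque

def find_corresponding_nodes(node_literals, records):
    """Return the nodes whose literal exactly matches one of the record strings."""
    wanted = set(records)
    return [node for node, literal in node_literals.items() if literal in wanted]

# Hypothetical RDF nodes and their literals (None: the node has no literal).
node_literals = {
    "n1": "FJ company",
    "n2": "electrical equipment",
    "n3": None,
    "n4": "ABC",
}
# Hypothetical strings taken from the non-RDF data set.
records = ["FJ company", "ABC"]

# Step 1302: the corresponding nodes become the initial search node queue.
search_queue = deque(find_corresponding_nodes(node_literals, records))
print(list(search_queue))
```

Exact matching is the only comparison the text describes; normalization or fuzzy matching would be an extension beyond it.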

図１４は、探索ノードキュー８２７の例を示している。図１４の探索ノードキュー８２７には、対応ノードｎ１〜対応ノードｎ４が登録されている。 FIG. 14 shows an example of the search node queue 827. Corresponding nodes n1 to n4 are registered in the search node queue 827 of FIG.

次に、探索部８１３は、探索ノードキュー８２７が空であるか否かをチェックする（ステップ１３０３）。探索ノードキュー８２７が空でない場合（ステップ１３０３，ＮＯ）、探索部８１３は、探索ノードキュー８２７から１つのノードを移動元ノードとして抽出し（ステップ１３０４）、ノード移動処理を行う（ステップ１３０５）。 Next, the search unit 813 checks whether the search node queue 827 is empty (step 1303). When the search node queue 827 is not empty (step 1303, NO), the search unit 813 extracts one node from the search node queue 827 as a movement source node (step 1304) and performs node movement processing (step 1305).

Next, the search unit 813 refers to the route list 828 and checks whether there is a common node that directly or indirectly refers to all the corresponding nodes identified in step 1301 (step 1306). If no such common node exists (step 1306, NO), the search unit 813 repeats the processing from step 1303 onward.

一方、すべての対応ノードを直接又は間接的に参照する共通ノードが存在する場合（ステップ１３０６，ＹＥＳ）、探索部８１３は、それまでに検出された共通ノードを示す情報を含む探索結果８２９を生成する。そして、出力部８１４は、探索結果８２９を出力する（ステップ１３０７）。この場合、探索結果８２９には、それまでに検出されたすべての共通ノードについて、共通ノードを示す情報と、その共通ノードが直接又は間接的に参照する対応ノードを示す情報とが含まれる。 On the other hand, when there is a common node that directly or indirectly refers to all the corresponding nodes (step 1306, YES), the search unit 813 generates a search result 829 including information indicating the common nodes detected so far. To do. Then, the output unit 814 outputs the search result 829 (step 1307). In this case, the search result 829 includes information indicating the common node and information indicating the corresponding node that the common node directly or indirectly refers to, for all the common nodes detected so far.

探索ノードキュー８２７が空である場合（ステップ１３０３，ＹＥＳ）、ノード探索装置８０１は、ステップ１３０７の処理を行う。 When the search node queue 827 is empty (step 1303, YES), the node search device 801 performs the process of step 1307.
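The loop of FIG. 13 can be sketched as a reverse breadth-first traversal. This is a simplified sketch, not the embodiment itself: instead of a full route list it records, for each visited node, the set of corresponding nodes it refers to, and the relevance check is a pluggable `related` callback that accepts everything by default. The triple layout and the link name "L" are hypothetical.

```python
from collections import deque

def detect_common_nodes(triples, corresponding, related=lambda src_link, dst_links: True):
    """triples: (subject, link, object), meaning subject refers to object via link."""
    # incoming[node] lists (link, referring node) pairs so links can be followed in reverse.
    incoming = {}
    for subject, link, obj in triples:
        incoming.setdefault(obj, []).append((link, subject))

    # reached[node]: corresponding nodes that node refers to directly or indirectly.
    reached = {node: {node} for node in corresponding}
    queue = deque(corresponding)                      # step 1302
    while queue:                                      # step 1303
        source = queue.popleft()                      # step 1304
        for link, dest in incoming.get(source, []):   # step 1305 (node movement)
            dst_links = [l for l, _ in incoming.get(dest, [])]
            if not related(link, dst_links):
                continue                              # unrelated ontologies: prune
            found = reached.setdefault(dest, set())
            if not reached[source] <= found:
                found |= reached[source]
                queue.append(dest)                    # dest becomes a new source node
        if any(len(found) == len(corresponding) for found in reached.values()):
            break                                     # step 1306, YES
    # Common nodes: nodes where routes from at least two corresponding nodes meet.
    return {node: sorted(found) for node, found in reached.items() if len(found) >= 2}

triples = [("n00", "L", "n0"), ("n11", "L", "n1"), ("n000", "L", "n00"),
           ("n001", "L", "n00"), ("n002", "L", "n00"), ("n001", "L", "n11")]
result = detect_common_nodes(triples, ["n0", "n1"])
print(result)  # {'n001': ['n0', 'n1']}
```

Re-queueing a node only when its reached set grows keeps the loop finite even if the graph contains cycles.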

FIG. 15 is a flowchart showing an example of the node movement processing in step 1305 of FIG. 13. First, in the RDF data set 825, the search unit 813 takes a link that refers to the source node and identifies the node at the other end of that link as the destination node (step 1501). The search unit 813 then determines whether to move the search position from the source node to the destination node (step 1502).

For example, the search unit 813 determines that the ontologies of the source node and the destination node are related when either of the following conditions (C1) and (C2) is satisfied. In this case, it is determined that the search position is to be moved from the source node to the destination node.
(C1) The co-occurrence statistic 823 for the link that refers to the source node and the link that refers to the destination node is larger than a predetermined value α.

For example, α is a value specified by the user and may be α = 0. When no link refers to the destination node, the co-occurrence statistic 823 is set to a very large value. In this case, the maximum value of a 64-bit floating-point number may be used as the co-occurrence statistic 823.
(C2) An important word included in the label and comment of the link that refers to the source node also appears among the important words included in the label and comment of the link that refers to the destination node.

リンクのラベル及びコメントに含まれる重要単語は、そのラベル及びコメントに含まれる単語のうち、所定値βよりも大きな重要度統計量８２４を有する単語である。移動元ノードを参照するリンクの少なくとも１つの重要単語が、移動先ノードを参照するリンクの少なくとも１つの重要単語と同じである場合、それらのリンクの重要単語が重複していると判定される。例えば、βは、ユーザにより指定される値である。 The important word included in the label and comment of the link is a word having the importance statistic 824 larger than the predetermined value β among the words included in the label and the comment. If at least one important word of the link that refers to the source node is the same as at least one important word of the link that refers to the destination node, it is determined that the important words of those links are duplicated. For example, β is a value specified by the user.

一方、条件（Ｃ１）又は条件（Ｃ２）のいずれも満たされない場合、探索部８１３は、移動元ノード及び移動先ノードのオントロジー間に関連性がないと判定する。この場合、移動元ノードから移動先ノードへ探索位置を移動させないと判定される。 On the other hand, when neither the condition (C1) nor the condition (C2) is satisfied, the search unit 813 determines that there is no relationship between the ontology of the source node and the ontology of the destination node. In this case, it is determined that the search position is not moved from the source node to the destination node.

Note that when a plurality of links refer to the source node in the RDF data set 825, the search unit 813 determines, for each destination node at the other end of those links, whether to move the search position.

また、複数のリンクが移動先ノードを参照している場合、探索部８１３は、それらのリンクと移動元ノードを参照するリンクのすべての組み合わせについて、条件（Ｃ１）又は条件（Ｃ２）が満たされるか否かを判定する。そして、探索部８１３は、いずれかの組み合わせについて条件（Ｃ１）又は条件（Ｃ２）が満たされる場合、探索位置を移動させると判定する。 When a plurality of links refer to the movement-destination node, the search unit 813 satisfies the condition (C1) or the condition (C2) for all combinations of those links and the links referencing the movement-source node. Or not. Then, the search unit 813 determines to move the search position when the condition (C1) or the condition (C2) is satisfied for any combination.
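The decision in step 1502 can be sketched as a predicate over link pairs. The sketch assumes that the PMI values and the per-link sets of important words (words whose importance statistic exceeds β) have been computed beforehand. The function name, the data layout, and all sample values are hypothetical.

```python
import sys

# Very large value used when no link refers to the destination node
# (the text suggests the maximum 64-bit floating-point value).
NO_LINK_PMI = sys.float_info.max

def is_related(src_link, dst_links, pmi, important_words, alpha=0.0):
    """Return True if (C1) or (C2) holds for at least one link combination."""
    if not dst_links:
        return NO_LINK_PMI > alpha                    # (C1) with the very large value
    for dst_link in dst_links:
        score = pmi.get(frozenset((src_link, dst_link)), float("nan"))
        if score > alpha:                             # (C1); a NaN score never passes
            return True
        shared = important_words.get(src_link, set()) & important_words.get(dst_link, set())
        if shared:                                    # (C2): at least one shared word
            return True
    return False

# Hypothetical precomputed statistics.
pmi = {frozenset(("org:iii", "org:ccc")): 0.5}
important_words = {"skos:ppp": {"label"}, "dct:sss": {"label"}, "dbo:aaa": set()}

print(is_related("org:iii", ["org:ccc"], pmi, important_words))   # True via (C1)
print(is_related("skos:ppp", ["dct:sss"], pmi, important_words))  # True via (C2)
print(is_related("dbo:aaa", ["dct:sss"], pmi, important_words))   # False
print(is_related("dbo:aaa", [], pmi, important_words))            # True, no referring link
```

Keying the PMI table by an unordered pair reflects that co-occurrence is symmetric in the two links.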

探索位置を移動させる場合（ステップ１５０２，ＹＥＳ）、探索部８１３は、経路リスト８２８を更新し（ステップ１５０３）、更新された経路リスト８２８に基づいて共通ノードを検出する（ステップ１５０４）。そして、探索部８１３は、探索ノードキュー８２７に移動先ノードを追加する（ステップ１５０５）。追加された移動先ノードは、図１３のステップ１３０４において、新たな移動元ノードとして抽出される。 When moving the search position (step 1502, YES), the search unit 813 updates the route list 828 (step 1503) and detects a common node based on the updated route list 828 (step 1504). Then, the search unit 813 adds the destination node to the search node queue 827 (step 1505). The added destination node is extracted as a new source node in step 1304 of FIG.

一方、探索位置を移動させない場合（ステップ１５０２，ＮＯ）、探索部８１３は、経路リスト８２８を更新することなく、処理を終了する。 On the other hand, when the search position is not moved (step 1502, NO), the search unit 813 ends the process without updating the route list 828.

The route list 828 contains an entry indicating the route of each node detected by the node detection processing. The entry for a node m contains the identification information of one or more nodes that lie between one of the corresponding nodes and the node m in the RDF data set 825. If the node m is itself a corresponding node, the entry contains only the identification information of the node m. In step 1503, the search unit 813 updates the route list 828 by, for example, the following procedure.
(P31) The search unit 813 acquires the entry E0 indicating the route of the source node from the route list 828.
(P32) The search unit 813 deletes the entry E0 from the route list 828.
(P33) The search unit 813 adds the identification information of the destination node to the entry E0 to generate an entry E1 indicating the route of the destination node. If there are multiple destination nodes, an entry E1 is generated for each destination node.
(P34) The search unit 813 adds the entry E1 to the route list 828.
(P35) The search unit 813 checks the entries included in the route list 828 and, if there are multiple entries indicating the route of the same node, deletes those entries from the route list 828. The search unit 813 then combines the deleted entries to generate a new entry E2 and adds it to the route list 828.

図１６は、ＲＤＦデータセット８２５に含まれる経路の例を示している。ｎ０、ｎ１、ｎ００、ｎ１１、ｎ０００、ｎ００１、及びｎ００２はノードを表し、ノード間の矢印は、リンクを表す。ｎ０及びｎ１は、対応ノードである。 FIG. 16 shows an example of paths included in the RDF data set 825. n0, n1, n00, n11, n000, n001, and n002 represent nodes, and arrows between the nodes represent links. n0 and n1 are corresponding nodes.

図１７は、図１６の経路を示す経路リスト８２８の例を示している。図１７（ａ）は、経路リスト８２８から取得されるエントリＥ０の例を示している。エントリ１７０１は、移動元ノードｎ００の経路を示しており、ｐ（ｎ０，ｎ００）は、対応ノードｎ０から移動元ノードｎ００までの経路上に存在するノードを表す。 FIG. 17 shows an example of the route list 828 showing the routes of FIG. FIG. 17A shows an example of the entry E0 acquired from the route list 828. The entry 1701 indicates the route of the source node n00, and p (n0, n00) represents a node existing on the route from the corresponding node n0 to the source node n00.

図１７（ｂ）は、図１７（ａ）のエントリＥ０から生成されるエントリＥ１の例を示している。この場合、移動元ノードｎ００から移動可能な移動先ノードは、ノードｎ０００〜ノードｎ００２である。このうち、移動先ノードｎ００１及び移動先ノードｎ００２が探索位置の移動先に決定された場合、エントリＥ１として、これらの移動先ノードの経路を示すエントリ１７０２及びエントリ１７０３が生成される。 FIG. 17B shows an example of the entry E1 generated from the entry E0 in FIG. 17A. In this case, the destination nodes that can be moved from the source node n00 are the nodes n000 to n002. Among them, when the movement destination node n001 and the movement destination node n002 are determined as the movement destinations of the search position, the entry 1702 and the entry 1703 indicating the routes of these movement destination nodes are generated as the entry E1.

In the entry 1702, p(n0, n00, n001) represents the nodes on the route from the corresponding node n0 to the destination node n001. In the entry 1703, p(n0, n00, n002) represents the nodes on the route from the corresponding node n0 to the destination node n002.

図１８は、共通ノードを含む経路リスト８２８の例を示している。図１８（ａ）は、経路リスト８２８から取得されるエントリＥ０の例を示している。エントリ１８０１は、移動元ノードｎ００の経路を示し、エントリ１８０２は、移動元ノードｎ１１の経路を示している。エントリ１８０１のｐ（ｎ０，ｎ００）は、対応ノードｎ０から移動元ノードｎ００までの経路上に存在するノードを表し、エントリ１８０２のｐ（ｎ１，ｎ１１）は、対応ノードｎ１から移動元ノードｎ１１までの経路上に存在するノードを表す。 FIG. 18 shows an example of the route list 828 including the common node. FIG. 18A shows an example of the entry E0 acquired from the route list 828. The entry 1801 shows the route of the source node n00, and the entry 1802 shows the route of the source node n11. The p (n0, n00) of the entry 1801 represents a node existing on the path from the corresponding node n0 to the source node n00, and the p (n1, n11) of the entry 1802 is from the corresponding node n1 to the source node n11. Represents a node existing on the path of.

図１８（ｂ）は、図１８（ａ）のエントリＥ０から生成されるエントリＥ１の例を示している。移動元ノードｎ００から移動可能な移動先ノードのうち、移動先ノードｎ０００及び移動先ノードｎ００１が探索位置の移動先に決定された場合、エントリＥ１として、これらの移動先ノードの経路を示すエントリ１８０３及びエントリ１８０４が生成される。 FIG. 18B shows an example of the entry E1 generated from the entry E0 of FIG. 18A. Among the destination nodes that can be moved from the source node n00, if the destination node n000 and the destination node n001 are determined as the destinations of the search positions, the entry 1803 indicating the route of these destination nodes is set as the entry E1. And an entry 1804 is created.

In entry 1803, p(n0, n00, n000) represents the nodes on the route from the corresponding node n0 to the destination node n000. Likewise, in entry 1804, p(n0, n00, n001) represents the nodes on the route from the corresponding node n0 to the destination node n001.

Furthermore, when the destination node n001, which is reachable from the source node n11, is selected as a destination of the search position, entry 1805, which indicates the route to the destination node n001, is generated as an entry E1. In entry 1805, p(n1, n11, n001) represents the nodes on the route from the corresponding node n1 to the destination node n001.

FIG. 18(c) shows an example of a new entry E2 generated in procedure (P35) from entries 1804 and 1805 of FIG. 18(b). Because entries 1804 and 1805 both indicate routes to the same node n001, they are deleted from the route list 828, and entry 1806 is generated by combining them.

Entry 1806 includes the identification information of the node n001, p(n0, n00, n001) indicating the route from the corresponding node n0 to the node n001, and p(n1, n11, n001) indicating the route from the corresponding node n1 to the node n001.

Therefore, by checking entry 1806 in step 1504, the search unit 813 can detect that the route that follows links from the corresponding node n0 and the route that follows links from the corresponding node n1 intersect at the node n001. The node n001 is thereby detected as a common node. In this case, the search unit 813 generates a search result 829 that includes the identification information (n001, n0, n1) of the common node n001, the corresponding node n0, and the corresponding node n1.
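The merging of entries that indicate the same node, and the detection of a common node from the merged entry, can be sketched as follows. This is a minimal illustration under assumed data structures: entries are `(node, route)` tuples, and the function names `merge_entries` and `find_common_nodes` are hypothetical rather than taken from the embodiment.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Route = Tuple[str, ...]

def merge_entries(entries: List[Tuple[str, Route]]) -> Dict[str, List[Route]]:
    """Combine entries that indicate the same node into one entry holding all routes."""
    merged: Dict[str, List[Route]] = defaultdict(list)
    for node, route in entries:
        merged[node].append(route)
    return dict(merged)

def find_common_nodes(merged: Dict[str, List[Route]]) -> List[Tuple[str, str, str]]:
    """Report (common node, first node, second node) when routes from different
    corresponding nodes meet at the same node."""
    results = []
    for node, routes in merged.items():
        origins = sorted({route[0] for route in routes})
        for first, second in combinations(origins, 2):
            results.append((node, first, second))
    return results

entries_e1 = [
    ("n000", ("n0", "n00", "n000")),  # entry 1803
    ("n001", ("n0", "n00", "n001")),  # entry 1804
    ("n001", ("n1", "n11", "n001")),  # entry 1805
]
merged = merge_entries(entries_e1)
search_result = find_common_nodes(merged)
```

With the entries of FIG. 18(b) as input, the merged entry for n001 plays the role of entry 1806, and `search_result` contains the triple (n001, n0, n1).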

The configurations of the node search device 601 of FIG. 6 and the node search device 801 of FIG. 8 are merely examples, and some components may be omitted or changed depending on the use or conditions of the node search device. For example, in the node search device 801 of FIG. 8, the calculation unit 812 can be omitted when the co-occurrence statistic 823 and the importance statistic 824 are calculated by an external device, or when they are not used.

The flowcharts of FIGS. 7, 12, 13, and 15 are merely examples, and some of the processing may be omitted or changed depending on the configuration or conditions of the node search device. For example, in the node search process of FIG. 12, the processing of steps 1201 and 1202 can be omitted when the co-occurrence statistic 823 and the importance statistic 824 are calculated by an external device. The processing of steps 1201 and 1202 can likewise be omitted when the co-occurrence statistic 823 and the importance statistic 824 are not used.

In step 1502 of FIG. 15, the search unit 813 may determine whether to move the search position using only one of condition (C1) and condition (C2), or using a different condition. For example, the user may store in the storage unit 811 a list of combinations of two links whose ontologies are related. In this case, the search unit 813 decides whether to move the search position by checking whether the combination of the link that refers to the source node and the link that refers to the destination node is included in that list.
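The list-based check described above can be sketched in a few lines of Python. The link names (`ex:industry` and so on) and the function name `may_move` are hypothetical, and treating a combination as unordered is an assumption; the embodiment does not state whether the order of the two links matters.

```python
from typing import FrozenSet, Set

# Hypothetical user-supplied list of link combinations whose ontologies are related.
# Each combination is stored as an unordered pair (an assumption of this sketch).
related_link_pairs: Set[FrozenSet[str]] = {
    frozenset({"ex:industry", "ex:sector"}),
    frozenset({"ex:headquarters", "ex:locatedIn"}),
}

def may_move(link_to_source: str, link_to_destination: str,
             related_pairs: Set[FrozenSet[str]]) -> bool:
    """Return True when the two links form a combination found in the list."""
    return frozenset((link_to_source, link_to_destination)) in related_pairs

move_ok = may_move("ex:sector", "ex:industry", related_link_pairs)
move_ng = may_move("ex:industry", "ex:locatedIn", related_link_pairs)
```

A set of `frozenset` pairs gives a constant-time membership test; if the order of the two links were significant, a set of ordered tuples could be used instead.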

The RDF data sets shown in FIGS. 1 to 3, 4(b), 5, and 9 to 11 are merely examples; an RDF data set varies with the information described in RDF and the ontologies used for the description. The non-RDF data set shown in FIG. 4(a) is merely an example; a non-RDF data set varies with the information described.

The search node queue 827 shown in FIG. 14 is merely an example, and the corresponding nodes included in the search node queue 827 vary with the RDF data set 825 and the non-RDF data set 826. The routes shown in FIG. 16 and the route list 828 shown in FIGS. 17 and 18 are merely examples, and the routes and the route list 828 vary with the RDF data set 825 and the non-RDF data set 826.

Expressions (1) to (15) are merely examples, and the search unit 813 may calculate the co-occurrence statistic 823 and the importance statistic 824 using other calculation expressions.
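As one illustration of what another calculation expression might look like, the sketch below computes a Jaccard-style co-occurrence value for two links from the number of data sets in which they appear together. This is a substituted formula chosen purely for illustration; it is not any of expressions (1) to (15), and the function name `cooccurrence_jaccard` and the representation of each data set as a set of link identifiers are assumptions.

```python
from typing import List, Set

def cooccurrence_jaccard(datasets: List[Set[str]], link_a: str, link_b: str) -> float:
    """Ratio of data sets containing both links to data sets containing either link."""
    both = sum(1 for links in datasets if link_a in links and link_b in links)
    either = sum(1 for links in datasets if link_a in links or link_b in links)
    return both / either if either else 0.0

# Hypothetical data sets, each reduced to the set of links it uses
datasets = [
    {"a", "b"},
    {"a"},
    {"b", "c"},
    {"a", "b", "c"},
]
score_ab = cooccurrence_jaccard(datasets, "a", "b")
score_ac = cooccurrence_jaccard(datasets, "a", "c")
```

A value close to 1 would suggest that the two links tend to be used together, which could then be compared against a threshold when judging the relevance between ontologies.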

FIG. 19 shows a configuration example of an information processing device (computer) used as the node search device 601 of FIG. 6 and the node search device 801 of FIG. 8. The information processing device of FIG. 19 includes a CPU (Central Processing Unit) 1901, a memory 1902, an input device 1903, an output device 1904, an auxiliary storage device 1905, a medium drive device 1906, and a network connection device 1907. These components are connected to one another by a bus 1908.

The memory 1902 is, for example, a semiconductor memory such as a ROM (Read Only Memory), a RAM (Random Access Memory), or a flash memory, and stores the programs and data used for processing. The memory 1902 can be used as the storage unit 611 of FIG. 6 or the storage unit 811 of FIG. 8.

The CPU 1901 (processor) operates as the search unit 612 of FIG. 6, and as the calculation unit 812 and the search unit 813 of FIG. 8, by, for example, executing a program using the memory 1902.

The input device 1903 is, for example, a keyboard or a pointing device, and is used to input instructions or information from an operator or a user. The output device 1904 is, for example, a display device, a printer, or a speaker, and is used to output inquiries or instructions to the operator or user and to output processing results. The output device 1904 can be used as the output unit 613 of FIG. 6 or the output unit 814 of FIG. 8. The processing result may be the search result 829.

The auxiliary storage device 1905 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or a tape device. The auxiliary storage device 1905 may also be a hard disk drive or a flash memory. The information processing device can store programs and data in the auxiliary storage device 1905 and load them into the memory 1902 for use. The auxiliary storage device 1905 can be used as the storage unit 611 of FIG. 6 or the storage unit 811 of FIG. 8.

The medium drive device 1906 drives a portable recording medium 1909 and accesses its recorded contents. The portable recording medium 1909 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like. The portable recording medium 1909 may also be a CD-ROM (Compact Disk Read Only Memory), a DVD (Digital Versatile Disk), a USB (Universal Serial Bus) memory, or the like. The operator or user can store programs and data on the portable recording medium 1909 and load them into the memory 1902 for use.

Thus, the computer-readable recording medium that stores the programs and data used for processing is a physical (non-transitory) recording medium such as the memory 1902, the auxiliary storage device 1905, or the portable recording medium 1909.

The network connection device 1907 is a communication interface circuit that is connected to a communication network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and performs the data conversion that accompanies communication. The information processing device can receive programs and data from an external device via the network connection device 1907 and load them into the memory 1902 for use. The network connection device 1907 can be used as the output unit 613 of FIG. 6 or the output unit 814 of FIG. 8.

The information processing device can also receive the RDF data set 825, the non-RDF data set 826, and a processing request from a user terminal via the network connection device 1907, and transmit the search result 829 to the user terminal.

Note that the information processing device need not include all the components of FIG. 19, and some components may be omitted depending on the use or conditions. For example, when the information processing device receives processing requests from a user terminal, the input device 1903 and the output device 1904 may be omitted. When the portable recording medium 1909 or the communication network is not used, the medium drive device 1906 or the network connection device 1907 may be omitted.

Although the disclosed embodiments and their advantages have been described in detail, those skilled in the art will be able to make various changes, additions, and omissions without departing from the scope of the invention as explicitly set forth in the claims.

Regarding the embodiment described with reference to FIGS. 4 to 19, the following supplementary notes are further disclosed.
(Supplementary Note 1)
A node search method executed by a computer, wherein
the computer:
identifies, from among nodes included in a first data set described based on a plurality of ontologies, nodes corresponding to each of a plurality of data items included in a second data set;
determines, on a route that follows links included in the first data set from each of the identified plurality of nodes, a relevance between the ontology of a source node and the ontology of a destination node;
searches, by following a link between the source node and the destination node based on a result of determining the relevance, for a common node at which a first route that follows links from a first node among the plurality of nodes intersects a second route that follows links from a second node among the plurality of nodes; and
outputs a search result including information indicating the common node, information indicating the first node, and information indicating the second node.
(Supplementary Note 2)
The node search method according to Supplementary Note 1, wherein, when the computer determines that there is a relevance between the ontology of the source node and the ontology of the destination node, the computer follows the link between the source node and the destination node, sets the destination node as a new source node, and continues the search on the route including the new source node, and, when the computer determines that there is no relevance between the ontology of the source node and the ontology of the destination node, the computer terminates the search on the route including the source node.
(Supplementary Note 3)
The node search method according to Supplementary Note 1 or 2, wherein the computer obtains, for each combination of two links included in each of a plurality of data sets, the number of times the two links appear together, calculates a co-occurrence statistic for the two links using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node based on the co-occurrence statistic for a first link that refers to the source node and a second link that refers to the destination node.
(Supplementary Note 4)
The node search method according to Supplementary Note 1 or 2, wherein the computer extracts, from the ontologies that define the links included in a plurality of data sets, the words included in the label and comment of each link, obtains the number of appearances of each word that appears in the labels and comments of the links, calculates an importance statistic indicating the importance of each word using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node using a first importance statistic indicating the importance of each word included in the label and comment of a first link that refers to the source node and a second importance statistic indicating the importance of each word included in the label and comment of a second link that refers to the destination node.
(Supplementary Note 5)
A node search program that causes a computer to execute a process comprising:
identifying, from among nodes included in a first data set described based on a plurality of ontologies, nodes corresponding to each of a plurality of data items included in a second data set;
determining, on a route that follows links included in the first data set from each of the identified plurality of nodes, a relevance between the ontology of a source node and the ontology of a destination node;
searching, by following a link between the source node and the destination node based on a result of determining the relevance, for a common node at which a first route that follows links from a first node among the plurality of nodes intersects a second route that follows links from a second node among the plurality of nodes; and
outputting a search result including information indicating the common node, information indicating the first node, and information indicating the second node.
(Supplementary Note 6)
The node search program according to Supplementary Note 5, wherein, when the computer determines that there is a relevance between the ontology of the source node and the ontology of the destination node, the computer follows the link between the source node and the destination node, sets the destination node as a new source node, and continues the search on the route including the new source node, and, when the computer determines that there is no relevance between the ontology of the source node and the ontology of the destination node, the computer terminates the search on the route including the source node.
(Supplementary Note 7)
The node search program according to Supplementary Note 5 or 6, wherein the computer obtains, for each combination of two links included in each of a plurality of data sets, the number of times the two links appear together, calculates a co-occurrence statistic for the two links using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node based on the co-occurrence statistic for a first link that refers to the source node and a second link that refers to the destination node.
(Supplementary Note 8)
The node search program according to Supplementary Note 5 or 6, wherein the computer extracts, from the ontologies that define the links included in a plurality of data sets, the words included in the label and comment of each link, obtains the number of appearances of each word that appears in the labels and comments of the links, calculates an importance statistic indicating the importance of each word using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node using a first importance statistic indicating the importance of each word included in the label and comment of a first link that refers to the source node and a second importance statistic indicating the importance of each word included in the label and comment of a second link that refers to the destination node.
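The overall flow of the notes above (start from the identified nodes, follow a link only when the relevance check passes, and report nodes where routes from different starting nodes meet) can be sketched as a breadth-first search. This is a simplified illustration, not the procedure of FIG. 15: the graph layout, the `is_related` callback, the `max_hops` limit, the rule that the first hop from a starting node is always allowed, and the skipping of nodes already reached from the same starting node are all assumptions of the sketch.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical graph layout: node -> list of (link, destination node)
Graph = Dict[str, List[Tuple[str, str]]]

def search_common_nodes(graph: Graph, start_nodes: List[str],
                        is_related: Callable[[str, str], bool],
                        max_hops: int = 3) -> List[Tuple[str, str, str]]:
    """Return sorted (common node, first node, second node) triples."""
    reached = {s: {s} for s in start_nodes}   # node -> starting nodes that reached it
    seen = {(s, s) for s in start_nodes}      # (starting node, visited node)
    queue = deque((s, s, None, 0) for s in start_nodes)
    while queue:
        origin, node, in_link, hops = queue.popleft()
        if hops == max_hops:
            continue
        for link, dest in graph.get(node, []):
            # Do not follow the link when the ontologies are judged unrelated
            if in_link is not None and not is_related(in_link, link):
                continue
            if (origin, dest) in seen:
                continue
            seen.add((origin, dest))
            reached.setdefault(dest, set()).add(origin)
            queue.append((origin, dest, link, hops + 1))
    results = []
    for node, origins in reached.items():
        for first, second in combinations(sorted(origins), 2):
            results.append((node, first, second))
    return sorted(results)

graph = {
    "n0": [("p", "n00")],
    "n00": [("q", "n001"), ("r", "n000")],
    "n1": [("s", "n11")],
    "n11": [("t", "n001")],
}
# Hypothetical relevance rule: only the combination (p, r) is judged unrelated
found = search_common_nodes(graph, ["n0", "n1"], lambda a, b: (a, b) != ("p", "r"))
```

In this toy graph, the move from n00 to n000 is not followed because the pair (p, r) fails the relevance check, while the routes from n0 and n1 both reach n001, so `found` contains the triple (n001, n0, n1). In practice the `is_related` callback could be backed by a co-occurrence statistic or an importance statistic as in Supplementary Notes 3 and 4.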

301, 311, 312 Data set
321 to 326 Ontology
601, 801 Node search device
611, 811 Storage unit
612, 813 Search unit
613, 814 Output unit
621 First data set
622 Second data set
812 Calculation unit
821 RDF data set group
822 Ontology group
823 Co-occurrence statistic
824 Importance statistic
825 RDF data set
826 Non-RDF data set
827 Search node queue
828 Route list
829 Search result
1701 to 1703, 1801 to 1806 Entry
1901 CPU
1902 Memory
1903 Input device
1904 Output device
1905 Auxiliary storage device
1906 Medium drive device
1907 Network connection device
1908 Bus
1909 Portable recording medium

Claims

A node search method executed by a computer, wherein
the computer:
identifies, from among nodes included in a first data set described based on a plurality of ontologies, nodes corresponding to each of a plurality of data items included in a second data set;
determines, on a route that follows links included in the first data set from each of the identified plurality of nodes, a relevance between the ontology of a source node and the ontology of a destination node;
searches, by following a link between the source node and the destination node based on a result of determining the relevance, for a common node at which a first route that follows links from a first node among the plurality of nodes intersects a second route that follows links from a second node among the plurality of nodes; and
outputs a search result including information indicating the common node, information indicating the first node, and information indicating the second node.

The node search method according to claim 1, wherein, when the computer determines that there is a relevance between the ontology of the source node and the ontology of the destination node, the computer follows the link between the source node and the destination node, sets the destination node as a new source node, and continues the search on the route including the new source node, and, when the computer determines that there is no relevance between the ontology of the source node and the ontology of the destination node, the computer terminates the search on the route including the source node.

The node search method according to claim 1, wherein the computer obtains, for each combination of two links included in each of a plurality of data sets, the number of times the two links appear together, calculates a co-occurrence statistic for the two links using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node based on the co-occurrence statistic for a first link that refers to the source node and a second link that refers to the destination node.

The node search method according to claim 1, wherein the computer extracts, from the ontologies that define the links included in a plurality of data sets, the words included in the label and comment of each link, obtains the number of appearances of each word that appears in the labels and comments of the links, calculates an importance statistic indicating the importance of each word using the obtained number of appearances, and determines the relevance between the ontology of the source node and the ontology of the destination node using a first importance statistic indicating the importance of each word included in the label and comment of a first link that refers to the source node and a second importance statistic indicating the importance of each word included in the label and comment of a second link that refers to the destination node.

A node search program that causes a computer to execute a process comprising:
identifying, from among nodes included in a first data set described based on a plurality of ontologies, nodes corresponding to each of a plurality of data items included in a second data set;
determining, on a route that follows links included in the first data set from each of the identified plurality of nodes, a relevance between the ontology of a source node and the ontology of a destination node;
searching, by following a link between the source node and the destination node based on a result of determining the relevance, for a common node at which a first route that follows links from a first node among the plurality of nodes intersects a second route that follows links from a second node among the plurality of nodes; and
outputting a search result including information indicating the common node, information indicating the first node, and information indicating the second node.