JP6702035B2

JP6702035B2 - Class estimating device, class estimating method, and class estimating program

Info

Publication number: JP6702035B2
Application number: JP2016132825A
Authority: JP
Inventors: 成司岡嶋
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-07-04
Filing date: 2016-07-04
Publication date: 2020-05-27
Anticipated expiration: 2036-07-04
Also published as: JP2018005632A

Description

The present invention relates to a class estimation device, a class estimation method, and a class estimation program.

RDF (Resource Description Framework), which has attracted attention in recent years, is a data format that describes relationships between resources on the Web using three elements: a subject, a predicate, and an object. A resource represents an entity such as a person or a thing and is uniquely identified by a URI (Uniform Resource Identifier). The subject and the predicate are resources, and the object is either a resource or a character string (called a "literal"). RDF is generally represented as a directed graph whose nodes are resources; edges weighted by predicates connect the resources and thereby express the relationship between subject and object.

Each RDF resource also belongs to a set of resources called a class. The class to which a resource belongs is described by the predicate "rdf:type". For example, consider the two relationships (subject "http://xxx/Einstein", predicate "rdf:type", object "http://xxx/person") and (subject "http://xxx/Einstein", predicate "rdf:type", object "http://xxx/scientist"). In this case, "Einstein" belongs to the "person" class and the "scientist" class; that is, Einstein is an "instance" of both classes.
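The class membership described above can be illustrated with a minimal Python sketch. It assumes triples are held as plain (subject, predicate, object) tuples; the helper name `classes_of` is hypothetical and is not part of the patent.

```python
RDF_TYPE = "rdf:type"

# The two rdf:type triples from the Einstein example in the text.
triples = [
    ("http://xxx/Einstein", RDF_TYPE, "http://xxx/person"),
    ("http://xxx/Einstein", RDF_TYPE, "http://xxx/scientist"),
]


def classes_of(resource, triples):
    """Return the set of classes that `resource` is declared an instance of."""
    return {o for s, p, o in triples if s == resource and p == RDF_TYPE}


einstein_classes = classes_of("http://xxx/Einstein", triples)
```

A resource with no rdf:type triple yields an empty set, which is the situation the class estimation device is meant to address.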

Providing appropriate schema information for the enormous number of RDF resources on the Web is expected to promote their smooth use.

For example, there is a technique that extracts the subject, predicate, and object from a document to generate metadata for the document. There is also a technique for searching a large amount of graph-structured data for information that matches a query graph pattern; it generates a query graph pattern that retrieves information whose structure is semantically related to information entered by the user.

JP 2005-258659 A
JP 2006-313501 A

However, some existing RDF resources have class information attached and others do not. Consequently, when RDF resources are retrieved in bulk by class, for example, resources without class information are missing from the result. This makes resources harder to find and prevents users from using them smoothly.

In one aspect, an object is to provide a class estimation device, a class estimation method, and a class estimation program that learn a class classification rule for appropriately assigning class information to, for example, an RDF resource whose class is unknown.

In one proposal, for example, a class estimation device calculates, for each predicate in RDF learning data that expresses relationships between resources with the three elements subject, predicate, and object, an index indicating the diversity of the classes of the objects corresponding to that predicate, based on the appearance probabilities of those classes. The class estimation device then acquires a second predicate, namely a predicate that appears when the object corresponding to a first predicate, whose index exceeds a predetermined threshold, is taken as the subject. The class estimation device combines the first predicate and the second predicate and generates features that include the predicates corresponding to the class of each subject and the combined predicate formed from the first and second predicates. Finally, based on the correspondence between the class of each subject and the features, the class estimation device counts the appearance frequency with which each feature appears for each class, and from the counted frequencies learns a class classification rule for classifying the class to be assigned to each feature.

In one aspect, for example, a class classification rule for appropriately assigning class information to an RDF resource whose class is unknown can be learned.

FIG. 1A is a diagram illustrating RDF.
FIG. 1B is a diagram showing a graph representation of RDF.
FIG. 1C is a diagram illustrating resource classes.
FIG. 2 is a block diagram illustrating an example of the configuration of the class estimation device according to the embodiment.
FIG. 3 is a flowchart illustrating an example of the learning phase process according to the embodiment.
FIG. 4 is a flowchart illustrating an example of the ambiguity calculation process according to the embodiment.
FIG. 5 is a flowchart illustrating an example of the feature generation process according to the embodiment.
FIG. 6 is a diagram illustrating an example of the acquired learning data according to the embodiment.
FIG. 7 is a diagram illustrating an example of the classes of the objects in the learning data according to the embodiment.
FIG. 8 is a diagram illustrating an example of the list L1, which enumerates without duplication the predicates included in the learning data according to the embodiment.
FIG. 9 is a diagram illustrating an example of the calculation of the appearance probabilities of the object classes according to the embodiment.
FIG. 10 is a diagram illustrating an example of the expansion predicate storage list according to the embodiment.
FIG. 11 is a diagram illustrating an example of the list L2, which enumerates without duplication the subjects included in the learning data according to the embodiment.
FIG. 12 is a diagram illustrating an example of graph expansion according to the embodiment.
FIG. 13 is a diagram illustrating an example of the feature list according to the embodiment.
FIG. 14 is a diagram illustrating an example of the class classification rule (the appearance frequency of each feature in each class) according to the embodiment.
FIG. 15 is a flowchart illustrating an example of the classification phase process according to the embodiment.
FIG. 16 is a diagram illustrating an example of the acquired class estimation target data according to the embodiment.
FIG. 17 is a diagram illustrating an example of the feature list of the class estimation target data according to the embodiment.
FIG. 18 is a diagram illustrating an example of class estimation according to the embodiment.
FIG. 19 is a diagram illustrating an example of the calculation of the appearance probabilities of the object classes according to another application example of the embodiment.
FIG. 20 is a diagram illustrating an example of graph expansion according to another application example of the embodiment.
FIG. 21 is a diagram illustrating an example of the class classification rule (the appearance frequency of each feature in each class) according to another application example of the embodiment.

A class estimation device, a class estimation method, and a class estimation program according to an embodiment are described below with reference to the accompanying drawings. The following embodiment does not limit the disclosed technology. Embodiments may be combined as appropriate to the extent that they do not contradict one another. The description below covers only configurations related to the disclosed technology and omits other configurations. It also omits repeated description of configurations or processes that are identical or similar to ones already described.

As shown in FIG. 1A, in the following embodiment, RDF (Resource Description Framework) is a data format that describes relationships between resources on the Web as combinations of a subject, a predicate, and an object. FIG. 1A is a diagram illustrating RDF.

A resource represents an entity such as a person or a thing and is uniquely identified by a URI (Uniform Resource Identifier). The subject and the predicate are resources, and the object is either a resource or a character string (called a "literal"). FIG. 1A shows three subject-predicate-object relationships: ("http://xxx/Einstein", "http://xxx/name", "Albert Einstein"), ("http://xxx/Einstein", "http://xxx/affiliation", "http://xxx/ZU University"), and ("http://xxx/Einstein", "http://xxx/field", "http://xxx/physics"). The object "Albert Einstein" is a literal.

As shown in FIG. 1B, the RDF data of FIG. 1A can be drawn with the resources (subjects and objects) as nodes, connected by directed edges weighted by the predicates. RDF that expresses subject-predicate-object relationships is thus represented as a directed graph showing the relationship between subject and object. FIG. 1B is a diagram showing a graph representation of RDF.
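As a rough illustration of this graph view, the sketch below turns the three triples of FIG. 1A into an adjacency mapping from each subject node to its outgoing (predicate, object) edges. The function name `build_graph` and the dictionary layout are assumptions made for this example only.

```python
from collections import defaultdict

# The three triples shown in FIG. 1A; the first object is a literal.
triples = [
    ("http://xxx/Einstein", "http://xxx/name", "Albert Einstein"),
    ("http://xxx/Einstein", "http://xxx/affiliation", "http://xxx/ZU University"),
    ("http://xxx/Einstein", "http://xxx/field", "http://xxx/physics"),
]


def build_graph(triples):
    """Map each subject node to its outgoing edges as (predicate, object) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
```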

As shown in FIG. 1C, in the following embodiment, each RDF resource belongs to a set of resources called a class, and a resource that belongs to a class is an instance of that class. FIG. 1C is a diagram illustrating resource classes.

FIG. 1C shows two subject-predicate-object relationships: ("http://xxx/Einstein", "rdf:type", "http://xxx/person") and ("http://xxx/Einstein", "rdf:type", "http://xxx/scientist"). FIG. 1C indicates that the subject "http://xxx/Einstein" belongs to the classes "person" and "scientist". Note that rdf:type is an abbreviation of "http://www.w3.org/1999/02/22-rdf-syntax-ns#type".

(Class estimation device according to the embodiment)
FIG. 2 is a block diagram illustrating an example of the configuration of the class estimation device according to the embodiment. The class estimation device 10 according to the embodiment includes an ambiguity calculation unit 11, an expansion predicate storage unit 12, a feature generation unit 13, a classification rule learning unit 14, a class classification rule storage unit 15, and a class estimation unit 16. The expansion predicate storage unit 12 and the class classification rule storage unit 15 are volatile or non-volatile storage devices.

For the input learning data (RDF resources), the ambiguity calculation unit 11 calculates the ambiguity of the object with respect to each predicate, determines which predicates are to be expanded, and stores them in the expansion predicate storage unit 12. The expansion predicate storage unit 12 stores the predicates that the ambiguity calculation unit 11 has selected for expansion. The feature generation unit 13 refers to the expansion predicate storage unit 12 to determine which nodes to expand and generates features from the input learning data (RDF resources).

A feature is an input in machine learning, in which the output for a given input is learned from actual data. For example, when learning data is created from a resource that is a subject, the class is estimated using the predicates associated with that resource as features. That is, the predicates other than "rdf:type" are given as input features, and each class that appears as the object of the predicate "rdf:type" is learned as the corresponding output.
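The following sketch shows one way to read this input/output pairing, using the Einstein triples from FIG. 1A and FIG. 1C. The function name `to_training_example` and the use of sorted lists are assumptions for illustration; the patent does not prescribe a data layout.

```python
RDF_TYPE = "rdf:type"

# Triples from FIG. 1A and FIG. 1C.
triples = [
    ("http://xxx/Einstein", "http://xxx/name", "Albert Einstein"),
    ("http://xxx/Einstein", "http://xxx/affiliation", "http://xxx/ZU University"),
    ("http://xxx/Einstein", "http://xxx/field", "http://xxx/physics"),
    ("http://xxx/Einstein", RDF_TYPE, "http://xxx/person"),
    ("http://xxx/Einstein", RDF_TYPE, "http://xxx/scientist"),
]


def to_training_example(subject, triples):
    """Split a subject's triples into features (predicates) and labels (classes)."""
    features = sorted({p for s, p, _ in triples if s == subject and p != RDF_TYPE})
    labels = sorted({o for s, p, o in triples if s == subject and p == RDF_TYPE})
    return features, labels


features, labels = to_training_example("http://xxx/Einstein", triples)
```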

The classification rule learning unit 14 learns a class classification rule from the features generated by the feature generation unit 13 and stores the learned rule in the class classification rule storage unit 15. The class classification rule storage unit 15 stores the class classification rule learned by the classification rule learning unit 14.
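A minimal sketch of such a rule, assuming it is simply a table of how often each feature appears for each class, could look like the following. The training examples, the abbreviated predicate names, and the function name `learn_rule` are all hypothetical; the actual rule format is the one shown in FIG. 14.

```python
from collections import Counter, defaultdict


def learn_rule(examples):
    """Count, for every class, how often each feature appears with it.

    `examples` is a list of (features, classes) pairs, one per subject.
    """
    rule = defaultdict(Counter)
    for features, classes in examples:
        for cls in classes:
            rule[cls].update(features)
    return {cls: dict(counts) for cls, counts in rule.items()}


# Hypothetical examples; names are abbreviated (the "http://xxx/" prefix is omitted).
examples = [
    (["name", "affiliation"], ["person"]),
    (["name", "field", "affiliation"], ["person", "scientist"]),
]
rule = learn_rule(examples)
```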

The feature generation unit 13 likewise refers to the expansion predicate storage unit 12 to determine which nodes to expand and generates features from the input class estimation target data (RDF resources). The class estimation unit 16 uses the class classification rule stored in the class classification rule storage unit 15 to estimate a class from the features that the feature generation unit 13 generated for the class estimation target data, and outputs the class estimation result.

The ambiguity calculation unit 11 is an example of a calculation unit that calculates an index indicating the diversity of the classes of the objects corresponding to each predicate, based on the appearance probabilities of those classes in RDF learning data that expresses relationships between resources with at least the three elements subject, predicate, and object. The feature generation unit 13 is an example of a generation unit that acquires a second predicate that appears when the object corresponding to a first predicate, whose index exceeds a predetermined threshold, is taken as the subject, combines the first predicate and the second predicate, and generates features that include the predicates corresponding to the class of each subject and the combined predicate formed from the first and second predicates. The classification rule learning unit 14 is an example of a learning unit that, based on the correspondence between the class of each subject and the features, counts the appearance frequency with which each feature appears for each class, and from the counted frequencies learns a class classification rule for classifying the class to be assigned to each feature. The class estimation unit 16 is an example of an estimation unit that refers to the class classification rule, calculates for each class the sum of the appearance frequencies of the input features, and outputs each class whose sum exceeds a threshold as a class estimated from the features.
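The estimation step described for the class estimation unit 16 can be sketched as a sum-and-threshold check. The rule table, the threshold value of 2, and the function name `estimate_classes` below are hypothetical values chosen only to make the example concrete.

```python
def estimate_classes(features, rule, threshold):
    """Return the classes whose summed feature frequencies exceed `threshold`."""
    estimated = []
    for cls, frequencies in rule.items():
        # Features never seen with this class contribute 0.
        score = sum(frequencies.get(feature, 0) for feature in features)
        if score > threshold:
            estimated.append(cls)
    return estimated


# Hypothetical rule: appearance frequency of each feature in each class.
rule = {
    "http://xxx/person": {"http://xxx/name": 3, "http://xxx/affiliation": 3},
    "http://xxx/scientist": {"http://xxx/field": 1, "http://xxx/affiliation": 1},
}
result = estimate_classes(
    ["http://xxx/name", "http://xxx/affiliation"], rule, threshold=2
)
```

With these made-up numbers the "person" class scores 6 and is output, while the "scientist" class scores 1 and is not.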

(Learning phase process according to the embodiment)
FIG. 3 is a flowchart illustrating an example of the learning phase process according to the embodiment. First, the ambiguity calculation unit 11 acquires learning data (RDF resources) (step S11). The learning data acquired in step S11 is, for example, the learning data D1 shown in FIG. 6. FIG. 6 is a diagram illustrating an example of the acquired learning data according to the embodiment.

Next, for the learning data D1 acquired in step S11, the ambiguity calculation unit 11 acquires the known classes of the objects in the records whose predicate is "http://xxx/field", "http://xxx/nationality", or "http://xxx/affiliation", that is, the predicates whose object classes are not yet determined (step S12). The objects of these records are "http://xxx/physics", "http://xxx/Japan", "http://xxx/KY University", "http://xxx/A Music", and "http://xxx/S Manufacturing". FIG. 7 is a diagram illustrating an example of the classes of the objects in the learning data according to the embodiment.

The class data D2 shown in FIG. 7 lists the classes acquired by taking "http://xxx/physics", "http://xxx/Japan", "http://xxx/KY University", "http://xxx/A Music", and "http://xxx/S Manufacturing" as subjects. The classes are "http://xxx/discipline", "http://xxx/country", "http://xxx/university", and "http://xxx/company".

Next, for the resources in the learning data D1, the ambiguity calculation unit 11 calculates the ambiguity of the object with respect to each predicate, determines which predicates are to be expanded, and stores the determined predicates in the expansion predicate storage unit 12 (ambiguity calculation process, step S13). The ambiguity calculation process is described in detail later with reference to FIG. 4.

Next, for the resources in the learning data D1, the feature generation unit 13 refers to the expansion predicate storage unit 12 and expands the objects to generate features (feature generation process, step S14). The feature generation process is described in detail later with reference to FIG. 5.
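One possible reading of this expansion step is sketched below: each predicate of a subject becomes a feature, and when a predicate is in the expansion list, the predicates of its object are appended as combined features. The tuple representation of a combined predicate, the "http://xxx/location" triple, and the function name `generate_features` are hypothetical and are not taken from the patent figures.

```python
RDF_TYPE = "rdf:type"


def generate_features(subject, triples, expansion_predicates):
    """Collect a subject's predicates, expanding objects of selected predicates."""
    features = []
    for s, first, obj in triples:
        if s != subject or first == RDF_TYPE:
            continue
        if first not in features:
            features.append(first)
        if first in expansion_predicates:
            # Treat the object as a subject and pick up its predicates.
            for s2, second, _ in triples:
                combined = (first, second)
                if s2 == obj and combined not in features:
                    features.append(combined)
    return features


# Hypothetical data: the last triple is invented to give the object a predicate.
triples = [
    ("http://xxx/Hideki Yukawa", "http://xxx/name", "Hideki Yukawa"),
    ("http://xxx/Hideki Yukawa", "http://xxx/affiliation", "http://xxx/KY University"),
    ("http://xxx/KY University", "http://xxx/location", "Kyoto"),
]
features = generate_features(
    "http://xxx/Hideki Yukawa", triples, {"http://xxx/affiliation"}
)
```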

Next, the classification rule learning unit 14 learns a class classification rule from the features generated in step S14 and stores it in the class classification rule storage unit 15 (step S15). When step S15 ends, the class estimation device 10 ends the learning phase process according to the embodiment. The class classification rule is described in detail later with reference to FIG. 14.

(Ambiguity calculation process according to the embodiment)
FIG. 4 is a flowchart illustrating an example of the ambiguity calculation process according to the embodiment. First, the ambiguity calculation unit 11 enumerates, without duplication, the predicates included in the learning data D1 acquired in step S11 of FIG. 3 and stores them in the list L1 (step S13-1).

FIG. 8 is a diagram illustrating an example of the list L1, which enumerates without duplication the predicates included in the learning data according to the embodiment. As shown in FIG. 8, the ambiguity calculation unit 11 takes the predicates in the learning data D1 other than "rdf:type", which represents the class, removes duplicates, and stores the resulting predicates "http://xxx/field", "http://xxx/nationality", "http://xxx/affiliation", and "http://xxx/name" in the list L1.
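Step S13-1 amounts to an order-preserving de-duplication of predicates. The sketch below assumes a small stand-in for the learning data D1; the actual rows of FIG. 6 are not reproduced in the text, so which subject carries which triple, and the rdf:type row in particular, are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Hypothetical stand-in for D1, built from the subjects, predicates, and
# objects named in the text. The assignment to subjects is an assumption.
D1 = [
    ("http://xxx/Hideki Yukawa", RDF_TYPE, "http://xxx/person"),
    ("http://xxx/Hideki Yukawa", "http://xxx/field", "http://xxx/physics"),
    ("http://xxx/Hideki Yukawa", "http://xxx/nationality", "http://xxx/Japan"),
    ("http://xxx/Hideki Yukawa", "http://xxx/affiliation", "http://xxx/KY University"),
    ("http://xxx/Hideki Yukawa", "http://xxx/name", "Hideki Yukawa"),
    ("http://xxx/Ryuji Sakamoto", "http://xxx/affiliation", "http://xxx/A Music"),
    ("http://xxx/Ichiro Tanaka", "http://xxx/affiliation", "http://xxx/S Manufacturing"),
]


def list_predicates(triples):
    """Enumerate predicates other than rdf:type, without duplicates, in first-seen order."""
    return list(dict.fromkeys(p for _, p, _ in triples if p != RDF_TYPE))


L1 = list_predicates(D1)
```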

Next, the ambiguity calculation unit 11 determines whether steps S13-3 to S13-5, described below, have been performed for all the predicates stored in the list L1 in step S13-1 (step S13-2). If all the predicates in the list L1 have been processed (step S13-2: Yes), the ambiguity calculation unit 11 ends the ambiguity calculation process and proceeds to step S14 of FIG. 3. If not all the predicates in the list L1 have been processed (step S13-2: No), the ambiguity calculation unit 11 proceeds to step S13-3.

In step S13-3, the ambiguity calculation unit 11 selects one unprocessed predicate P from the list L1 and determines whether the object of the predicate P is a literal. If the object of the predicate P is a literal (step S13-3: Yes), the ambiguity calculation unit 11 returns to step S13-2. If the object of the predicate P is not a literal (step S13-3: No), the ambiguity calculation unit 11 proceeds to step S13-4.

In step S13-4, for the unprocessed predicate P, the ambiguity calculation unit 11 calculates the appearance probabilities of the classes of the corresponding objects in the learning data D1 and calculates a threshold from the number of classes that appear. The processing of step S13-4 is described with reference to FIG. 9. FIG. 9 is a diagram illustrating an example of the calculation of the appearance probabilities of the object classes according to the embodiment.

The appearance probabilities of the object classes are calculated in order to decide for which predicates the object should be expanded, by computing the ambiguity of the object with respect to each predicate over all resources. First, for each predicate, the appearance probabilities of the classes of the corresponding objects are examined. Because actual data also contains objects whose class is unknown, only objects whose class is known are included in the calculation.

In the example shown in FIG. 9, for the predicate "field" (short for "http://xxx/field" with the "http://xxx/" prefix omitted; the same applies below), the corresponding object is not a literal and the only class that appears is "discipline", so the appearance probability of "discipline" is 1. For the predicate "affiliation", the corresponding objects are not literals and the classes that appear are "university", "company", and "company", three in total, so the appearance probability of "university" is 1/3 and that of "company" is 2/3. For the predicate "nationality", the corresponding object is not a literal and the only class that appears is "country", so the appearance probability of "country" is 1.
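The probabilities quoted for FIG. 9 follow from a simple count, as the sketch below shows. The function name `class_probabilities` is hypothetical, and exact fractions are used only so that 1/3 and 2/3 appear without rounding.

```python
from collections import Counter
from fractions import Fraction


def class_probabilities(object_classes):
    """Return the appearance probability of each class among the given objects."""
    counts = Counter(object_classes)
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}


# Classes of the objects of "affiliation" and "field", as stated in the text.
affiliation = class_probabilities(["university", "company", "company"])
field = class_probabilities(["discipline"])
```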

The maximum entropy for each predicate is log₂ N, where N is the number of classes that appear, so a value such as (log₂ N)/2 is used as the threshold for the entropy test described below. Setting the threshold to, for example, (log₂ N)/2 changes it dynamically according to the characteristics of the learning data D1, which makes it possible to decide appropriately whether to expand the object.

Next, in step S13-5, the ambiguity calculation unit 11 calculates the entropy S of the class appearance probabilities, S = −Σ p log₂ p, where p is the appearance probability of each class and Σ denotes the sum over all classes. If the entropy S is greater than the threshold calculated in step S13-4, the ambiguity calculation unit 11 stores the predicate P currently being processed in the expansion predicate storage unit 12. The entropy S indicates the ambiguity of the object's class: the larger its value, the more ambiguous the object is for that predicate. When the entropy S exceeds the threshold, the corresponding object is expanded.
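The entropy test can be written in a few lines. This sketch uses the example threshold (log₂ N)/2 given in the text; the function names are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: S = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def threshold(num_classes):
    """Half of the maximum entropy log2(N) for N appearing classes."""
    return math.log2(num_classes) / 2


def should_expand(probabilities):
    """True when the entropy strictly exceeds the threshold."""
    return entropy(probabilities) > threshold(len(probabilities))


s_affiliation = entropy([1 / 3, 2 / 3])
```

When only one class appears, both the entropy and the threshold are 0, so the strict comparison fails and the predicate is not expanded.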

From the appearance probabilities of the object classes for each predicate, the ambiguity calculation unit 11 calculates the ambiguity of the object class (the entropy S based on the appearance probability of each class). The entropy S calculated by the ambiguity calculation unit 11 is explained using the example of FIG. 9. In FIG. 9, the only class that appears for the predicate "field" is "discipline", so the ambiguity calculation unit 11 calculates the entropy as S = −1 × log₂ 1 = 0. In this case, it calculates the threshold as (log₂ N)/2 = (log₂ 1)/2 = 0. For the predicate "field", the entropy S therefore equals the threshold, and the condition that S be greater than the threshold is not satisfied, so the ambiguity calculation unit 11 does not store the predicate "field" in the expansion predicate storage unit 12. In other words, the objects reached through the predicate "field" show no diversity in the classes they can take, and their ambiguity is small.

In FIG. 9, for the predicate "affiliation", the class "university" appears with probability 1/3 and the class "company" appears with probability 2/3. The ambiguity calculation unit 11 therefore calculates the entropy as S = −{(1/3) × log₂ (1/3) + (2/3) × log₂ (2/3)} ≈ 0.92. In this case, it calculates the threshold as (log₂ N)/2 = (log₂ 2)/2 = 0.5. For the predicate "affiliation", S is greater than the threshold and the condition is satisfied, so the ambiguity calculation unit 11 stores the predicate "affiliation" in the expansion predicate storage list L of the expansion predicate storage unit 12. In other words, the objects reached through the predicate "affiliation" are diverse in the classes they can take, and their ambiguity is large. FIG. 10 is a diagram illustrating an example of the expansion predicate storage list according to the embodiment.

また、図９では、「述語」“国籍”において、“国”のクラスのみが出現することから、曖昧性計算部１１は、エントロピーＳ＝−１×log_２１＝０と計算する。また、この場合、曖昧性計算部１１は、閾値は（log_２Ｎ）／２＝（log_２１）／２＝０と計算する。よって、曖昧性計算部１１は、「述語」“国籍”は、エントロピーＳ＝閾値となり、エントロピーＳが閾値より大きいという条件が満たされないので、「述語」“国籍”を展開述語保存部１２に保存しない。つまり、「述語」“国籍”の先の「目的語」のクラスは、取り得るクラスのリソースの多様性がなく、曖昧性が小さい。 Further, in FIG. 9, only the class “country” appears for the predicate “nationality”, so the ambiguity calculation unit 11 calculates the entropy as S = −1 × log₂ 1 = 0. In this case, the ambiguity calculation unit 11 calculates the threshold as (log₂ N)/2 = (log₂ 1)/2 = 0. For the predicate “nationality”, the entropy S therefore equals the threshold, and the condition that the entropy S is larger than the threshold is not satisfied, so the ambiguity calculation unit 11 does not store the predicate “nationality” in the expansion predicate storage unit 12. In other words, the classes of the objects beyond the predicate “nationality” show no diversity, and the ambiguity is small.
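The entropy test walked through for FIG. 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, N is taken to be the number of distinct classes that appear (as in the worked values above), and the class observations are hypothetical counts chosen to reproduce the probabilities quoted for FIG. 9.

```python
import math
from collections import Counter


def class_entropy(object_classes):
    """Shannon entropy (base 2) of the classes observed for one predicate's objects."""
    counts = Counter(object_classes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def expansion_threshold(object_classes):
    """Threshold (log2 N) / 2, with N the number of distinct classes that appear."""
    return math.log2(len(set(object_classes))) / 2


def should_expand(object_classes):
    """A predicate is stored for expansion only if entropy is strictly above the threshold."""
    return class_entropy(object_classes) > expansion_threshold(object_classes)


# Hypothetical observations consistent with the FIG. 9 probabilities.
observed = {
    "field": ["scholarship"],
    "affiliation": ["university", "company", "company"],
    "nationality": ["country", "country", "country"],
}
expansion_list = [p for p, classes in observed.items() if should_expand(classes)]
print(expansion_list)  # ['affiliation']
```

Because the comparison is strict, a predicate with a single object class (entropy 0, threshold 0) is never stored, which matches the treatment of “field” and “nationality” above.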

以上のようにして、ステップＳ１３−５が終了すると、曖昧性計算部１１は、ステップＳ１３−２へ処理を移す。 When step S13-5 ends as described above, the ambiguity calculation unit 11 shifts the processing to step S13-2.

（実施例に係る素性生成処理） (Feature generation processing according to the embodiment)
図５は、実施例に係る素性生成処理の一例を示すフローチャートである。先ず、素性生成部１３は、図３のステップＳ１１で取得した学習データＤ１に含まれる主語を重複なしで列挙し、リストＬ２に格納する（ステップＳ１４−１）。 FIG. 5 is a flowchart illustrating an example of the feature generation processing according to the embodiment. First, the feature generation unit 13 enumerates the subjects included in the learning data D1 acquired in step S11 of FIG. 3 without duplication and stores them in the list L2 (step S14-1).

図１１は、実施例に係る学習データに含まれる主語を重複を排除して列挙したリストＬ２の一例を示す図である。図１１に示すように、素性生成部１３は、学習データＤ１のうち、「主語」の重複を排除した“http://xxx/湯川秀樹”“http://xxx/坂本竜二”“http://xxx/田中一郎”を学習対象リソースとしてリストＬ２に格納する。 FIG. 11 is a diagram illustrating an example of the list L2, in which the subjects included in the learning data according to the embodiment are listed without duplication. As shown in FIG. 11, the feature generation unit 13 stores the deduplicated subjects of the learning data D1, namely “http://xxx/Hideki Yukawa”, “http://xxx/Ryuji Sakamoto”, and “http://xxx/Ichiro Tanaka”, in the list L2 as learning target resources.

次に、素性生成部１３は、ステップＳ１４−１で学習対象リソースが格納されたリストＬ２中に格納されているすべての学習対象リソースについて、後述のステップＳ１４−３〜ステップＳ１４−４の処理を行ったか否かを判定する（ステップＳ１４−２）。素性生成部１３は、リストＬ２中のすべての学習対象リソースについて処理した場合（ステップＳ１４−２：Ｙｅｓ）、素性生成処理を終了し、図３のステップＳ１５へ処理を移す。一方、素性生成部１３は、リストＬ２中のすべての学習対象リソースについて処理していない場合（ステップＳ１４−２：Ｎｏ）、ステップＳ１４−３へ処理を移す。 Next, the feature generation unit 13 performs the processes of steps S14-3 to S14-4 described below for all the learning target resources stored in the list L2 in which the learning target resources are stored in step S14-1. It is determined whether or not it has been performed (step S14-2). When the feature generation unit 13 has processed all the learning target resources in the list L2 (step S14-2: Yes), the feature generation process ends and the process proceeds to step S15 in FIG. On the other hand, when the feature generation unit 13 has not processed all the learning target resources in the list L2 (step S14-2: No), the process proceeds to step S14-3.

ステップＳ１４−３では、素性生成部１３は、リストＬ２から未処理の学習対象リソースを１つ選択し、この学習対象リソースＲが、展開述語保存部１２に含まれる述語を持つ場合に展開して述語を取得する。すなわち、素性生成部１３は、述語に対する目的語のクラスの曖昧性が高い場合に、ＲＤＦのグラフ表現における該当目的語の先のグラフを展開する。 In step S14-3, the feature generation unit 13 selects one unprocessed learning target resource from the list L2 and, if this learning target resource R has a predicate stored in the expansion predicate storage unit 12, expands that predicate and acquires the predicates beyond it. That is, when the ambiguity of the object class for a predicate is high, the feature generation unit 13 expands the graph beyond the corresponding object in the RDF graph representation.

図１２を参照して、グラフの展開について説明する。図１２は、実施例に係るグラフの展開の一例を示す図である。図１２では、「述語」“所属”が展開対象であり、“分野”“国籍”は展開対象ではない。図１２に示すように、「主語」“湯川秀樹”“坂本竜二”“田中一郎”のそれぞれについて「述語」“所属”以下を展開し、展開後の各述語を取得する。図１２の例では、「主語」“湯川秀樹”について「述語」“所属”の展開後の述語は“学長”“学部”である。また、「主語」“坂本竜二”について「述語」“所属”の展開後の述語は“社長”“作品”である。また、「主語」“田中一郎”について「述語」“所属”の展開後の述語は“社長”“製品”である。 Graph expansion is described with reference to FIG. 12. FIG. 12 is a diagram illustrating an example of graph expansion according to the embodiment. In FIG. 12, the predicate “affiliation” is an expansion target, whereas “field” and “nationality” are not. As shown in FIG. 12, for each of the subjects “Hideki Yukawa”, “Ryuji Sakamoto”, and “Ichiro Tanaka”, the graph beyond the predicate “affiliation” is expanded and the predicates found there are acquired. In the example of FIG. 12, the predicates obtained by expanding “affiliation” for the subject “Hideki Yukawa” are “university president” and “faculty”. For the subject “Ryuji Sakamoto” they are “company president” and “work”, and for the subject “Ichiro Tanaka” they are “company president” and “product”.

なお、図１２の例では、「目的語」の先の展開を行うのは１ノード先までとするが、再帰的に複数ノード先まで展開してよい。 Note that in the example of FIG. 12, the expansion of the “object” is performed up to one node ahead, but it may be expanded recursively up to multiple nodes ahead.
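The recursive variant mentioned here could look roughly like the following sketch. The graph layout (a dictionary from a resource to its (predicate, object) pairs), the `max_depth` parameter, and the way longer paths are named by chaining predicates with “+” are all assumptions made for illustration; the text only states that expansion may continue recursively over several nodes.

```python
def expand_predicates(graph, resource, expand_set, max_depth=1, prefix=""):
    """Return combination features found by following expandable predicates.

    graph maps a resource to a list of (predicate, object) pairs.
    """
    if max_depth <= 0:
        return []
    features = []
    for predicate, obj in graph.get(resource, []):
        if predicate not in expand_set:
            continue  # low-ambiguity predicates are never expanded
        path = f"{prefix}{predicate}+"
        # Predicates one node beyond the object become combination features.
        features.extend(f"{path}{nxt}" for nxt, _ in graph.get(obj, []))
        # Continue further out, up to the remaining depth.
        features.extend(expand_predicates(graph, obj, expand_set, max_depth - 1, path))
    return features


# Hypothetical toy graph; "dean" and the expandable "faculty" are invented here.
graph = {
    "Hideki Yukawa": [("field", "physics"), ("affiliation", "university X")],
    "university X": [("university president", "person A"), ("faculty", "faculty Y")],
    "faculty Y": [("dean", "person B")],
}
one_hop = expand_predicates(graph, "Hideki Yukawa", {"affiliation"})
two_hops = expand_predicates(graph, "Hideki Yukawa", {"affiliation", "faculty"}, max_depth=2)
print(one_hop)
print(two_hops)
```

The depth limit also guarantees termination if the graph contains cycles, at the cost of possibly revisiting a node along different paths.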

次に、素性生成部１３は、処理対象の学習対象リソースＲの述語及びステップＳ１４−３で取得した展開先の述語を組合せ、この組合せから素性を生成し、素性リストＬ３に格納する（ステップＳ１４−４）。すなわち、素性生成部１３は、展開前後のグラフの述語を組合せた組合せ素性を生成し、クラス分類のための学習データを作成する。図１２及び図１３を参照して、素性の生成について説明する。図１３は、実施例に係る素性リストの一例を示す図である。 Next, the feature generation unit 13 combines the predicate of the learning target resource R to be processed and the predicate of the expansion destination acquired in step S14-3, generates a feature from this combination, and stores the feature in the feature list L3 (step S14). -4). That is, the feature generation unit 13 generates a combination feature that combines the predicates of the graph before and after the expansion, and creates learning data for class classification. Generation of features will be described with reference to FIGS. 12 and 13. FIG. 13 is a diagram illustrating an example of the feature list according to the embodiment.

例えば、素性生成部１３は、処理対象の学習対象リソースＲが図１２に示す「主語」“湯川秀樹”である場合、「述語」“所属”の先について展開を行い、“所属”の展開結果である“学長”“学部”を取得する。そして、素性生成部１３は、「述語」“分野”“所属”“国籍”“名前”と、“学長”“学部”とから、クラス“人物”“科学者”それぞれの素性として「分野，所属，国籍，名前，所属＋学長，所属＋学部」を生成し、図１３に示す素性リストＬ３に格納する。 For example, when the learning target resource R being processed is the subject “Hideki Yukawa” shown in FIG. 12, the feature generation unit 13 expands the graph beyond the predicate “affiliation” and acquires “university president” and “faculty” as the expansion result. From the predicates “field”, “affiliation”, “nationality”, and “name” together with “university president” and “faculty”, the feature generation unit 13 then generates “field, affiliation, nationality, name, affiliation+university president, affiliation+faculty” as the features for each of the classes “person” and “scientist”, and stores them in the feature list L3 shown in FIG. 13.

また、素性生成部１３は、処理対象の学習対象リソースＲが図１２に示す「主語」“坂本竜二”である場合、「述語」“所属”の先について展開を行い、“所属”の展開結果である“社長”“作品”を取得する。そして、素性生成部１３は、「述語」“所属”“国籍”“名前”と、“社長”“作品”とから、クラス“人物”“作曲家”それぞれの素性として「所属，国籍，名前，所属＋社長，所属＋作品」を生成し、図１３に示す素性リストＬ３に格納する。 Similarly, when the learning target resource R being processed is the subject “Ryuji Sakamoto” shown in FIG. 12, the feature generation unit 13 expands the graph beyond the predicate “affiliation” and acquires “company president” and “work” as the expansion result. From the predicates “affiliation”, “nationality”, and “name” together with “company president” and “work”, the feature generation unit 13 then generates “affiliation, nationality, name, affiliation+company president, affiliation+work” as the features for each of the classes “person” and “composer”, and stores them in the feature list L3 shown in FIG. 13.

また、素性生成部１３は、処理対象の学習対象リソースＲが図１２に示す「主語」“田中一郎”である場合、「述語」“所属”の先について展開を行い、“所属”の展開結果である“社長”“製品”を取得する。そして、素性生成部１３は、「述語」“所属”“国籍”“名前”と、“社長”“製品”とから、クラス“人物”“会社員”それぞれの素性として「所属，国籍，名前，所属＋社長，所属＋製品」を生成し、図１３に示す素性リストＬ３に格納する。 Likewise, when the learning target resource R being processed is the subject “Ichiro Tanaka” shown in FIG. 12, the feature generation unit 13 expands the graph beyond the predicate “affiliation” and acquires “company president” and “product” as the expansion result. From the predicates “affiliation”, “nationality”, and “name” together with “company president” and “product”, the feature generation unit 13 then generates “affiliation, nationality, name, affiliation+company president, affiliation+product” as the features for each of the classes “person” and “company employee”, and stores them in the feature list L3 shown in FIG. 13.
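The three cases above follow one pattern: keep the resource's own predicates, and for each predicate that is on the expansion list, append one combination feature per predicate found beyond its object. A minimal Python sketch of that pattern follows. The function name, the dictionary-based inputs, and the English labels are assumptions for illustration; in particular 学長 and 社長 are rendered as distinct labels (“university president”, “company president”) so that the two features stay distinguishable.

```python
def generate_features(predicates, expansions, expand_set):
    """Combine a resource's own predicates with predicates found after expansion.

    predicates: predicates attached directly to the resource
    expansions: predicate -> predicates found one node beyond its object
    expand_set: predicates stored for expansion (high object-class ambiguity)
    """
    features = list(predicates)
    for predicate in predicates:
        if predicate in expand_set:
            features.extend(f"{predicate}+{nxt}" for nxt in expansions.get(predicate, []))
    return features


expand_set = {"affiliation"}

# (classes, features) entries in the spirit of the feature list L3 of FIG. 13.
feature_list = [
    (["person", "scientist"],
     generate_features(["field", "affiliation", "nationality", "name"],
                       {"affiliation": ["university president", "faculty"]}, expand_set)),
    (["person", "composer"],
     generate_features(["affiliation", "nationality", "name"],
                       {"affiliation": ["company president", "work"]}, expand_set)),
    (["person", "company employee"],
     generate_features(["affiliation", "nationality", "name"],
                       {"affiliation": ["company president", "product"]}, expand_set)),
]
for classes, features in feature_list:
    print(classes, features)
```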

ステップＳ１４−４が終了すると、素性生成部１３は、ステップＳ１４−２へ処理を移す。 When step S14-4 ends, the feature generation unit 13 moves the process to step S14-2.

（実施例に係るクラス分類規則） (Class classification rules according to the embodiment)
図１４は、実施例に係るクラス分類規則（各クラスにおける各素性の出現頻度）の一例を示す図である。クラス分類規則は、図３のステップＳ１５で学習及び生成される。図１４に示すクラス分類規則Ｒ１は、図１３に示す素性リストＬ３における個々の素性の各クラスにおける出現頻度から生成される。 FIG. 14 is a diagram illustrating an example of the class classification rule (the appearance frequency of each feature in each class) according to the embodiment. The class classification rule is learned and generated in step S15 of FIG. 3. The class classification rule R1 shown in FIG. 14 is generated from the appearance frequency, in each class, of the individual features in the feature list L3 shown in FIG. 13.

先ず、分類規則学習部１４は、素性リストＬ３で出現する素性を重複を排除して列挙する。素性リストＬ３で出現する素性は、図１４に示すように、“分野”“所属”“国籍”“名前”“所属＋学長”“所属＋学部”“所属＋社長”“所属＋作品”“所属＋製品”である。なお、“所属＋学長”“所属＋学部”“所属＋社長”“所属＋作品”“所属＋製品”が、目的語の先のグラフを展開することにより追加取得された素性である。 First, the classification rule learning unit 14 lists the features that appear in the feature list L3 without duplication. As shown in FIG. 14, the features appearing in the feature list L3 are “field”, “affiliation”, “nationality”, “name”, “affiliation+university president”, “affiliation+faculty”, “affiliation+company president”, “affiliation+work”, and “affiliation+product”. Of these, “affiliation+university president”, “affiliation+faculty”, “affiliation+company president”, “affiliation+work”, and “affiliation+product” are the features additionally acquired by expanding the graph beyond the object.

そして、分類規則学習部１４は、それぞれの素性が、クラスにおいて出現する頻度を集計し、各集計結果をスコアとする。図１４に示す例では、“分野”は“人物”“科学者”のクラスにそれぞれ１回ずつ出現する。よって、“分野”の“人物”クラスのスコアは「１」、“科学者”クラスのスコアは「１」である。また、“所属”は“人物”のクラスに３回、“科学者”のクラスに１回、“作曲家”のクラスに１回、“会社員”のクラスに１回ずつ出現する。よって、“所属”の“人物”クラスのスコアは「３」、“科学者”クラスのスコアは「１」、“作曲家”クラスのスコアは「１」、“会社員”クラスのスコアは「１」である。 The classification rule learning unit 14 then counts how often each feature appears in each class and uses each count as a score. In the example shown in FIG. 14, “field” appears once in each of the “person” and “scientist” classes. The score of “field” is therefore 1 for the “person” class and 1 for the “scientist” class. “Affiliation” appears three times in the “person” class and once each in the “scientist”, “composer”, and “company employee” classes. The score of “affiliation” is therefore 3 for the “person” class, 1 for the “scientist” class, 1 for the “composer” class, and 1 for the “company employee” class.

“国籍”“名前”“所属＋学長”“所属＋学部”“所属＋社長”“所属＋作品”“所属＋製品”についても同様である。このように、分類規則学習部１４は、素性と各クラスにおけるスコアとを対応付けたクラス分類規則Ｒ１を生成して、クラス分類規則保存部１５に保存する。 The same applies to “nationality”, “name”, “affiliation+university president”, “affiliation+faculty”, “affiliation+company president”, “affiliation+work”, and “affiliation+product”. In this way, the classification rule learning unit 14 generates the class classification rule R1, in which each feature is associated with its score in each class, and stores it in the class classification rule storage unit 15.
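The counting step described above can be sketched as follows. This is an illustrative reading, not the embodiment's implementation: the function name is invented, and the `feature_list` literal restates the three entries described for FIG. 13 with English labels.

```python
from collections import Counter, defaultdict


def learn_classification_rule(feature_list):
    """Count, for every feature, how often it appears under each class."""
    rule = defaultdict(Counter)
    for classes, features in feature_list:
        for feature in features:
            rule[feature].update(classes)  # one count per class of the subject
    return rule


# Entries restated from the description of the feature list L3 (FIG. 13).
feature_list = [
    (["person", "scientist"],
     ["field", "affiliation", "nationality", "name",
      "affiliation+university president", "affiliation+faculty"]),
    (["person", "composer"],
     ["affiliation", "nationality", "name",
      "affiliation+company president", "affiliation+work"]),
    (["person", "company employee"],
     ["affiliation", "nationality", "name",
      "affiliation+company president", "affiliation+product"]),
]
rule = learn_classification_rule(feature_list)
print(dict(rule["field"]))        # {'person': 1, 'scientist': 1}
print(dict(rule["affiliation"]))  # person 3; scientist, composer, company employee 1 each
```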

すなわち、クラス分類規則Ｒ１とは、各主語のクラスと、各素性との対応関係をもとに、各素性が各クラスに対応して出現する出現頻度を集計したものをスコアとし、素性ごとに各素性が所属する可能性があるクラスの分類をスコアに基づいて学習したものである。ここで、各素性は、少なくとも主語、述語、目的語の三要素でリソース間の関係情報を示すＲＤＦの学習データにおける各述語に対応する各目的語のクラスの出現確率をもとに、各述語に対応する各目的語のクラスの多様性を示す指標が所定閾値を超える第１の述語に対応する目的語を主語とする場合の第２の述語を取得し、第１の述語及び第２の述語を組合せ、各主語のクラスに対応する述語と、第１の述語及び第２の述語を組合せた組合せ述語とを含むものである。 That is, the class classification rule R1 takes as scores the totals of how often each feature appears for each class, based on the correspondence between the class of each subject and each feature, and learns from those scores, feature by feature, the classes to which the feature may belong. The features are obtained as follows. The RDF learning data expresses relationships between resources with at least the three elements subject, predicate, and object. Based on the appearance probability of the object classes for each predicate in that data, a first predicate is identified as one whose index of object-class diversity exceeds a predetermined threshold, and a second predicate is acquired as a predicate that applies when the object of the first predicate is taken as a subject. The first and second predicates are then combined. The features comprise the predicates corresponding to the class of each subject and the combined predicates formed from the first and second predicates.

このようなクラス分類規則Ｒ１は、各述語に対する目的語のクラスの曖昧性が高いときに、グラフを展開して素性を追加するので、素性の増加を抑制し、学習及び分類の速度低下を抑制しつつ、学習及び分類の精度を向上させることができる。 Such a class classification rule R1 expands a graph and adds features when the ambiguity of the object class for each predicate is high, and thus suppresses an increase in features and a decrease in learning and classification speed. At the same time, the accuracy of learning and classification can be improved.

（実施例に係る分類フェーズ処理） (Classification phase processing according to the embodiment)
図１５は、実施例に係る分類フェーズ処理の一例を示すフローチャートである。先ず、素性生成部１３は、クラス推定対象データ（ＲＤＦリソース）を取得する（ステップＳ２１）。ステップＳ２１で素性生成部１３が取得するクラス推定対象データは、例えば図１６に示すクラス推定対象データＤ３である。図１６は、実施例に係る取得したクラス推定対象データの一例を示す図である。 FIG. 15 is a flowchart illustrating an example of the classification phase processing according to the embodiment. First, the feature generation unit 13 acquires class estimation target data (RDF resources) (step S21). The class estimation target data acquired by the feature generation unit 13 in step S21 is, for example, the class estimation target data D3 shown in FIG. 16. FIG. 16 is a diagram illustrating an example of the acquired class estimation target data according to the embodiment.

次に、素性生成部１３は、ステップＳ２１で取得したクラス推定対象データＤ３中のリソースについて展開述語保存部１２を参照し、目的語を展開して素性を生成する（素性生成処理、ステップＳ２２）。ステップＳ２２の素性生成処理の詳細は、図５を参照して上述したステップＳ１４の素性生成処理において、“学習データＤ１”“学習対象リソースＲ”をそれぞれ「クラス推定対象データＤ３」「クラス推定対象リソースＲ」と読み換えたものと同一である。 Next, the feature generation unit 13 refers to the expansion predicate storage unit 12 for the resources in the class estimation target data D3 acquired in step S21, expands the objects, and generates features (feature generation processing, step S22). The details of the feature generation processing of step S22 are the same as those of step S14 described above with reference to FIG. 5, with “learning data D1” and “learning target resource R” read as “class estimation target data D3” and “class estimation target resource R”, respectively.

素性生成部１３は、ステップＳ２２の処理により、例えば、図１６に示すクラス推定対象データＤ３から、図１７の素性リストＤ４に示す素性及び組合せ素性を得る。図１７は、実施例に係るクラス推定対象データの素性リストの一例を示す図である。図１７に示す例では、クラス推定対象データＤ３から得られた素性は、“所属”“国籍”“名前”と、“所属”の展開結果から得られた“所属＋学長”“所属＋学部”である。 Through the processing of step S22, the feature generation unit 13 obtains, for example, the features and combination features shown in the feature list D4 of FIG. 17 from the class estimation target data D3 shown in FIG. 16. FIG. 17 is a diagram illustrating an example of a feature list of class estimation target data according to the embodiment. In the example shown in FIG. 17, the features obtained from the class estimation target data D3 are “affiliation”, “nationality”, and “name”, plus “affiliation+university president” and “affiliation+faculty” obtained from the expansion of “affiliation”.

次に、クラス推定部１６は、ステップＳ２２で生成された素性リストＤ４に含まれる素性に、図３のステップＳ１５で分類規則学習部１４により学習されクラス分類規則保存部１５に保存されたクラス分類規則Ｒ１を適用する。クラス推定部１６は、素性にクラス分類規則Ｒ１を適用することにより、入力素性からクラスを推定し、推定結果を出力する（ステップＳ２３）。 Next, the class estimation unit 16 classifies the features included in the feature list D4 generated in step S22 by the classification rule learning unit 14 in step S15 of FIG. Rule R1 applies. The class estimating unit 16 estimates the class from the input features by applying the class classification rule R1 to the features, and outputs the estimation result (step S23).

図１８は、実施例に係るクラスの推定の一例を示す図である。例えば、図１８に示すクラス分類規則Ｒ１は、図１４に示すクラス分類規則Ｒ１と同一である。図１８に示す素性リストＤ４には“所属”“国籍”“名前”“所属＋学長”“所属＋学部”の素性が含まれる。クラス推定部１６は、クラス分類規則Ｒ１を参照し、“所属”“国籍”“名前”“所属＋学長”“所属＋学部”それぞれの素性の各クラスのスコアを計算する。図１８に示す例では、“人物”クラスのスコアは、“所属”の素性で「３」、“国籍”の素性で「３」、“名前”の素性で「３」、“所属＋学長”の素性で「１」、“所属＋学部”の素性で「１」であるので、スコアの合計が３＋３＋３＋１＋１＝１１となる。 FIG. 18 is a diagram illustrating an example of class estimation according to the embodiment. The class classification rule R1 shown in FIG. 18 is the same as the class classification rule R1 shown in FIG. 14. The feature list D4 shown in FIG. 18 contains the features “affiliation”, “nationality”, “name”, “affiliation+university president”, and “affiliation+faculty”. The class estimation unit 16 refers to the class classification rule R1 and, for each of these features, looks up the score of each class. In the example shown in FIG. 18, the “person” class scores 3 for the feature “affiliation”, 3 for “nationality”, 3 for “name”, 1 for “affiliation+university president”, and 1 for “affiliation+faculty”, so its total score is 3+3+3+1+1=11.

同様に、“科学者”クラスのスコアは、“所属”の素性で「１」、“国籍”の素性で「１」、“名前”の素性で「１」、“所属＋学長”の素性で「１」、“所属＋学部”の素性で「１」であるので、スコアの合計が１＋１＋１＋１＋１＝５となる。また、“会社員”クラスのスコアは、“所属”の素性で「１」、“国籍”の素性で「１」、“名前”の素性で「１」、“所属＋学長”の素性で「０」、“所属＋学部”の素性で「０」であるので、スコアの合計が１＋１＋１＋０＋０＝３となる。また、“作曲家”クラスのスコアは、“所属”の素性で「１」、“国籍”の素性で「１」、“名前”の素性で「１」、“所属＋学長”の素性で「０」、“所属＋学部”の素性で「０」であるので、スコアの合計が１＋１＋１＋０＋０＝３となる。 Similarly, the “scientist” class scores 1 for the feature “affiliation”, 1 for “nationality”, 1 for “name”, 1 for “affiliation+university president”, and 1 for “affiliation+faculty”, so its total score is 1+1+1+1+1=5. The “company employee” class scores 1 for “affiliation”, 1 for “nationality”, 1 for “name”, 0 for “affiliation+university president”, and 0 for “affiliation+faculty”, so its total score is 1+1+1+0+0=3. The “composer” class likewise scores 1 for “affiliation”, 1 for “nationality”, 1 for “name”, 0 for “affiliation+university president”, and 0 for “affiliation+faculty”, so its total score is 1+1+1+0+0=3.

そして、クラス推定部１６は、例えばスコア閾値をスコア４と設定し、スコア閾値である４を超えるスコアの“人物”クラス及び“科学者”クラスを、クラス推定対象データに対する推定クラスとして出力する。 Then, the class estimation unit 16 sets, for example, the score threshold to score 4, and outputs the “person” class and the “scientist” class having a score exceeding 4 which is the score threshold, as the estimation class for the class estimation target data.
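The scoring and thresholding of step S23 can be sketched as below. The function name and data layout are assumptions for illustration; the rule table restates only the rows of the class classification rule R1 that the FIG. 18 walk-through uses, and the score threshold of 4 is the example value given above.

```python
def estimate_classes(features, rule, score_threshold):
    """Sum per-class scores over the input features and keep classes above the threshold."""
    totals = {}
    for feature in features:
        for cls, score in rule.get(feature, {}).items():
            totals[cls] = totals.get(cls, 0) + score
    estimated = sorted(cls for cls, total in totals.items() if total > score_threshold)
    return totals, estimated


# Rows of the class classification rule R1 that apply to the feature list D4.
base = {"person": 3, "scientist": 1, "composer": 1, "company employee": 1}
rule = {
    "affiliation": dict(base),
    "nationality": dict(base),
    "name": dict(base),
    "affiliation+university president": {"person": 1, "scientist": 1},
    "affiliation+faculty": {"person": 1, "scientist": 1},
}
feature_list_d4 = ["affiliation", "nationality", "name",
                   "affiliation+university president", "affiliation+faculty"]

totals, estimated = estimate_classes(feature_list_d4, rule, score_threshold=4)
print(totals)     # person 11, scientist 5, composer 3, company employee 3
print(estimated)  # ['person', 'scientist']
```

Features that do not occur in the rule simply contribute nothing, which is one reasonable way to treat predicates unseen during learning; the text does not say how such features are handled.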

（実施例の他の適用例） (Other application examples of the embodiment)
図１９は、実施例の他の適用例に係る目的語のクラスの出現確率の算出の一例を示す図である。図１９に示す例では、「述語」“所在地”については、対応する「目的語」はリテラルではなく、出現するクラスは“市区町村”が９つ、“都道府県”が１つであることから、“市区町村”の出現確率が９／１０であり、“都道府県”の出現確率が１／１０である。また、「述語」“祭神”については、対応する「目的語」はリテラルではなく、出現するクラスは“皇族”が３つ、“神”が３つであることから、“皇族”“神”の出現確率はともに３／６である。 FIG. 19 is a diagram illustrating an example of calculating the appearance probability of object classes according to another application example of the embodiment. In the example shown in FIG. 19, for the predicate “location”, the corresponding objects are not literals, and the classes that appear are “municipality” nine times and “prefecture” once, so the appearance probability of “municipality” is 9/10 and that of “prefecture” is 1/10. For the predicate “festival deity”, the corresponding objects are not literals, and the classes that appear are “royal family” three times and “god” three times, so the appearance probabilities of “royal family” and “god” are both 3/6.

同様に、「述語」“本尊”については、対応する「目的語」はリテラルではなく、出現するクラスは全て“仏”であることから、“仏”の出現確率は１である。また、「述語」“開基”については、対応する「目的語」はリテラルではなく、出現するクラスは全て“僧”であることから、“僧”の出現確率は１である。 Similarly, for the predicate “honson”, the corresponding objects are not literals and every class that appears is “Buddha”, so the appearance probability of “Buddha” is 1. For the predicate “kaiki”, the corresponding objects are not literals and every class that appears is “monk”, so the appearance probability of “monk” is 1.

よって、「述語」“所在地”において、出現するクラスは“市区町村”“都道府県”の２つであり、それぞれの出現確率が９／１０、１／１０である。よって、曖昧性計算部１１は、エントロピーＳ＝−（９／１０）×log_２（９／１０）−（１／１０）×log_２（１／１０）≒０．４７と計算する。また、この場合、曖昧性計算部１１は、閾値は（log_２Ｎ）／２＝（log_２２）／２＝１／２＝０．５と計算する。よって、曖昧性計算部１１は、「述語」“所在地”は、エントロピーＳ＜閾値となり、エントロピーＳが閾値より大きいという条件が満たされないので、「述語」“所在地”を展開述語保存部１２に保存しない。つまり、「述語」“所在地”の先の「目的語」のクラスは、取り得るクラスのリソースの多様性がなく、曖昧性が小さい。 Accordingly, two classes, “municipality” and “prefecture”, appear for the predicate “location”, with appearance probabilities of 9/10 and 1/10, respectively. The ambiguity calculation unit 11 therefore calculates the entropy as S = −(9/10) × log₂(9/10) − (1/10) × log₂(1/10) ≈ 0.47. In this case, the ambiguity calculation unit 11 calculates the threshold as (log₂ N)/2 = (log₂ 2)/2 = 1/2 = 0.5. For the predicate “location”, the entropy S is below the threshold, and the condition that the entropy S is larger than the threshold is not satisfied, so the ambiguity calculation unit 11 does not store the predicate “location” in the expansion predicate storage unit 12. In other words, the classes of the objects beyond the predicate “location” show little diversity, and the ambiguity is small.

同様に、曖昧性計算部１１は、「述語」“祭神”において、出現するクラスは“皇族”“神”の２つであり、出現確率はともに３／６であることから、エントロピーＳ＝−（３／６）×log_２（３／６）−（３／６）×log_２（３／６）＝１と計算する。また、この場合、曖昧性計算部１１は、閾値は（log_２Ｎ）／２＝（log_２２）／２＝１／２＝０．５と計算する。よって、曖昧性計算部１１は、「述語」“祭神”は、エントロピーＳ＞閾値となり、エントロピーＳが閾値より大きいという条件が満たされるので、「述語」“祭神”を展開述語保存部１２に保存する。つまり、「述語」“祭神”の先の「目的語」のクラスは、取り得るクラスのリソースの多様性があり、曖昧性が大きい。 Similarly, two classes, “royal family” and “god”, appear for the predicate “festival deity”, each with an appearance probability of 3/6, so the ambiguity calculation unit 11 calculates the entropy as S = −(3/6) × log₂(3/6) − (3/6) × log₂(3/6) = 1. In this case, the ambiguity calculation unit 11 calculates the threshold as (log₂ N)/2 = (log₂ 2)/2 = 1/2 = 0.5. For the predicate “festival deity”, the entropy S exceeds the threshold, and the condition that the entropy S is larger than the threshold is satisfied, so the ambiguity calculation unit 11 stores the predicate “festival deity” in the expansion predicate storage unit 12. In other words, the classes of the objects beyond the predicate “festival deity” are diverse, and the ambiguity is large.

同様に、曖昧性計算部１１は、「述語」“本尊”において、出現するクラスは“仏”のみであり、出現確率は１であることから、エントロピーＳ＝−１×log_２１＝０と計算する。また、この場合、曖昧性計算部１１は、閾値は（log_２Ｎ）／２＝（log_２１）／２＝０と計算する。よって、曖昧性計算部１１は、「述語」“本尊”は、エントロピーＳ＝閾値となり、エントロピーＳが閾値より大きいという条件が満たされないので、「述語」“本尊”を展開述語保存部１２に保存しない。つまり、「述語」“本尊”の先の「目的語」のクラスは、取り得るクラスのリソースの多様性がなく、曖昧性が小さい。 Similarly, only the class “Buddha” appears for the predicate “honson”, with an appearance probability of 1, so the ambiguity calculation unit 11 calculates the entropy as S = −1 × log₂ 1 = 0. In this case, the ambiguity calculation unit 11 calculates the threshold as (log₂ N)/2 = (log₂ 1)/2 = 0. For the predicate “honson”, the entropy S equals the threshold, and the condition that the entropy S is larger than the threshold is not satisfied, so the ambiguity calculation unit 11 does not store the predicate “honson” in the expansion predicate storage unit 12. In other words, the classes of the objects beyond the predicate “honson” show no diversity, and the ambiguity is small.

同様に、曖昧性計算部１１は、「述語」“開基”において、出現するクラスは“僧”のみであり、出現確率は１であることから、エントロピーＳ＝−１×log_２１＝０と計算する。また、この場合、曖昧性計算部１１は、閾値は（log_２Ｎ）／２＝（log_２１）／２＝０と計算する。よって、曖昧性計算部１１は、「述語」“開基”は、エントロピーＳ＝閾値となり、エントロピーＳが閾値より大きいという条件が満たされないので、「述語」“開基”を展開述語保存部１２に保存しない。つまり、「述語」“開基”の先の「目的語」のクラスは、取り得るクラスのリソースの多様性がなく、曖昧性が小さい。 Similarly, only the class “monk” appears for the predicate “kaiki”, with an appearance probability of 1, so the ambiguity calculation unit 11 calculates the entropy as S = −1 × log₂ 1 = 0. In this case, the ambiguity calculation unit 11 calculates the threshold as (log₂ N)/2 = (log₂ 1)/2 = 0. For the predicate “kaiki”, the entropy S equals the threshold, and the condition that the entropy S is larger than the threshold is not satisfied, so the ambiguity calculation unit 11 does not store the predicate “kaiki” in the expansion predicate storage unit 12. In other words, the classes of the objects beyond the predicate “kaiki” show no diversity, and the ambiguity is small.
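For reference, the four determinations for FIG. 19 can be collected into one worked summary. The symbols p_i (appearance probability of the i-th class) and θ (the threshold) are notation introduced here for compactness; the values are those stated above.

```latex
\begin{align*}
S &= -\sum_{i=1}^{N} p_i \log_2 p_i, \qquad \theta = \frac{\log_2 N}{2} \\[4pt]
\text{location:} \quad S &= -\tfrac{9}{10}\log_2\tfrac{9}{10} - \tfrac{1}{10}\log_2\tfrac{1}{10} \approx 0.47 < \theta = 0.5 && \text{(not expanded)} \\
\text{festival deity:} \quad S &= -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1 > \theta = 0.5 && \text{(expanded)} \\
\text{honson:} \quad S &= -1 \cdot \log_2 1 = 0 = \theta = 0 && \text{(not expanded)} \\
\text{kaiki:} \quad S &= -1 \cdot \log_2 1 = 0 = \theta = 0 && \text{(not expanded)}
\end{align*}
```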

よって、図１９の例では、図２０に示すように、“祭神”を展開し、“所在地”“本尊”“開基”は展開しない。図２０は、実施例の他の適用例に係るグラフの展開の一例を示す図である。図２０に示すように、「主語」“明治神宮”“吉野神宮”“住吉神社”“赤間神宮”“厳島神社”“出雲大社”のそれぞれについて「述語」“祭神”以下を展開し、展開後の各述語を取得する。図２０の例では、「主語」“明治神宮”“吉野神宮”“赤間神宮”のそれぞれについて「述語」“祭神”の展開後の述語は“在位”“元号”である。また、「主語」“住吉神社”“厳島神社”“出雲大社”について「述語」“祭神”の展開後の述語は“正式名”である。 Therefore, in the example of FIG. 19, as shown in FIG. 20, “festival deity” is expanded, while “location”, “honson”, and “kaiki” are not. FIG. 20 is a diagram illustrating an example of graph expansion according to another application example of the embodiment. As shown in FIG. 20, for each of the subjects “Meiji Jingu”, “Yoshino Jingu”, “Sumiyoshi Shrine”, “Akama Jingu”, “Itsukushima Shrine”, and “Izumo Taisha”, the graph beyond the predicate “festival deity” is expanded and the predicates found there are acquired. In the example of FIG. 20, the predicates obtained by expanding “festival deity” for each of the subjects “Meiji Jingu”, “Yoshino Jingu”, and “Akama Jingu” are “reign” and “gengo”. For the subjects “Sumiyoshi Shrine”, “Itsukushima Shrine”, and “Izumo Taisha”, the predicate obtained by expanding “festival deity” is “formal name”.

そして、上述したように、素性生成部１３は、素性リストＬ３を生成する。素性生成部１３は、「主語」“明治神宮”である場合、「述語」“祭神”の先について展開を行い、“祭神”の展開結果である“在位”“元号”を取得する。そして、素性生成部１３は、「述語」“所在地”“祭神”と、“祭神”の展開結果である“在位”“元号”から、クラス“神宮”の素性として「所在地，祭神，祭神＋在位，祭神＋元号」を生成し、図２１に示す素性リストＬ３に格納する。図２１に示すその他の「主語」についても同様である。 Then, as described above, the feature generation unit 13 generates the feature list L3. When the subject is “Meiji Jingu”, the feature generation unit 13 expands the graph beyond the predicate “festival deity” and acquires “reign” and “gengo” as the expansion result. From the predicates “location” and “festival deity” together with the expansion results “reign” and “gengo”, the feature generation unit 13 then generates “location, festival deity, festival deity+reign, festival deity+gengo” as the features of the class “jingu”, and stores them in the feature list L3 shown in FIG. 21. The same applies to the other subjects shown in FIG. 21.

そして、分類規則学習部１４は、上述と同様に、個々の素性の各クラスにおける出現頻度からクラス分類規則Ｒ１を生成する。図２１は、実施例の他の適用例に係るクラス分類規則（各クラスにおける各素性の出現頻度）の一例を示す図である。分類規則学習部１４は、“所在地”“祭神”“本尊”“開基”“祭神＋在位”“祭神＋元号”“祭神＋正式名”について、素性と各クラスにおけるスコアとを対応付けたクラス分類規則Ｒ１を生成して、クラス分類規則保存部１５に保存する。なお、“祭神＋在位”“祭神＋元号”“祭神＋正式名”が、目的語の先のグラフを展開することにより追加取得された素性である。 Then, as described above, the classification rule learning unit 14 generates the class classification rule R1 from the appearance frequency of the individual features in each class. FIG. 21 is a diagram illustrating an example of a class classification rule (the appearance frequency of each feature in each class) according to another application example of the embodiment. For “location”, “festival deity”, “honson”, “kaiki”, “festival deity+reign”, “festival deity+gengo”, and “festival deity+formal name”, the classification rule learning unit 14 generates the class classification rule R1, in which each feature is associated with its score in each class, and stores it in the class classification rule storage unit 15. Of these, “festival deity+reign”, “festival deity+gengo”, and “festival deity+formal name” are the features additionally acquired by expanding the graph beyond the object.

また、素性生成部１３は、上述と同様に、クラス推定対象データに基づく素性及び組合せ素性を得る。そして、クラス推定部１６は、クラス分類規則Ｒ１を参照し、“所在地”“祭神”“本尊”“開基”“祭神＋在位”“祭神＋元号”“祭神＋正式名”それぞれの素性の各クラスのスコアを計算する。そして、クラス推定部１６は、スコアがスコア閾値を超えるクラスを、クラス推定対象データに対する推定クラスとして出力する。 Further, as described above, the feature generation unit 13 obtains the features and combination features based on the class estimation target data. The class estimation unit 16 then refers to the class classification rule R1 and calculates the score of each class for each of the features “location”, “festival deity”, “honson”, “kaiki”, “festival deity+reign”, “festival deity+gengo”, and “festival deity+formal name”. The class estimation unit 16 then outputs each class whose score exceeds the score threshold as an estimated class for the class estimation target data.

以上の実施例では、ＲＤＦグラフにおいて、各述語に対する目的語のクラスの曖昧性（多様性）が閾値判定により所定より高いと判定されるときに、その先のグラフを展開し、展開前後のグラフの述語を組合せた組合せ素性を生成する。そして、組合せ素性に基づくクラス分類のための学習データを生成し、この学習データからクラス分類規則を学習する。 In the above embodiment, when threshold determination finds that the ambiguity (diversity) of the object classes for a predicate in the RDF graph is higher than a predetermined level, the graph beyond that predicate's objects is expanded, and combination features are generated by combining the predicates of the graph before and after the expansion. Learning data for class classification is then generated from the combination features, and the class classification rule is learned from this learning data.

そして、実施例では、クラスを推定したいクラス推定対象リソースを入力とし、クラス分類規則の学習時と同様にグラフを展開して組合せ素性を生成し、生成した組合せ素性に対してクラス分類規則を適用することで、入力したリソースのクラスを推定する。これにより、素性の増加を抑制することで処理負荷及び計算コストを抑制しつつ、クラス推定の精度を向上させることができる。 Then, in the embodiment, a class estimation target resource whose class is to be estimated is input, a graph is expanded in the same manner as when learning the class classification rule to generate a combination feature, and the class classification rule is applied to the generated combination feature. By doing so, the class of the input resource is estimated. As a result, it is possible to improve the accuracy of class estimation while suppressing the processing load and the calculation cost by suppressing the increase in features.

例えば、ウェブ上の膨大なリソースのなかには、述語が同一でも目的語のクラスが異なるリソースが存在するため、このようなリソースのクラスを精度よく判別することは容易ではない。クラスを精度よく判別するためには、判別の手がかりとなる素性の数を増やすことが考えられる。しかし、単純に素性の数を増加させることは、処理負荷が増大し、計算速度が低下する。 For example, among the vast amount of resources on the Web, there are resources that have the same predicate but different object classes, so it is not easy to accurately determine such resource classes. In order to discriminate a class with high accuracy, it is conceivable to increase the number of features serving as a clue for discrimination. However, simply increasing the number of features increases the processing load and reduces the calculation speed.

そこで、実施例は、各述語に対する目的語のクラスの曖昧性が閾値を超える場合にのみ目的語を展開し、クラスを特徴付ける素性を増加させたクラス分類規則を学習する。これにより、処理負荷の増大及び計算速度の低下を抑制し、精度よくクラス分類できる。 Therefore, in the embodiment, the object is expanded only when the ambiguity of the class of the object with respect to each predicate exceeds the threshold value, and the class classification rule in which the features that characterize the class are increased is learned. As a result, it is possible to suppress an increase in processing load and a decrease in calculation speed, and perform class classification with high accuracy.

また、ウェブ上の膨大なリソースについて、ＲＤＦとして、データ構造のスキーマを定義し、リソース同士をリンクさせて公開することにより、あるリソースを手がかりに他のリソースを機械探索できるとされている。これは、“西出頼継他、「日本のOpen Data活用を目的としたデータセットのスキーマ分析とリンク関係の調査」、研究報告情報基礎とアクセス技術（IFAT）、1-8、一般社団法人電子情報通信学会、２０１３年９月１９日、2013-IFAT-112（4）”に示される。例えば、リソースにクラス情報を付与することで，データの円滑な利用を行うことが期待されている。 Further, it is said that, for the vast amount of resources on the Web, defining a data structure schema in RDF and publishing the resources linked to one another makes it possible to search mechanically for other resources using a given resource as a clue. This is described in Yoshitsugu Nishide et al., “Schema analysis of datasets and link relationship survey for the purpose of utilizing Open Data in Japan”, Research report information basics and access technology (IFAT), 1-8, Institute of Electronics, Information and Communication Engineers, September 19, 2013, 2013-IFAT-112(4). For example, assigning class information to resources is expected to make the use of data smoother.

しかし、ウェブ上で公開されている多くのリソースが、クラス情報が付与されていない等、スキーマ定義が不十分である。このため、ウェブ上で公開されている多くのリソースは、スキーマに基づいた機械的アクセスができず、活用が困難である。 However, many resources published on the Web have insufficient schema definitions; for example, no class information is assigned to them. As a result, many published resources cannot be accessed mechanically on the basis of a schema and are difficult to utilize.

しかし、実施例によるクラス推定の結果を用いると、ＲＤＦの異なるリソースを、推定クラスに基づいて適切に結びつけることができる。よって、実施例は、他のリソースをもとに目的のリソースの探索が容易でないという不都合を補完し、リソース探索を容易にすることで、リソース活用の利便性を向上させる。 However, with the class estimation results of the embodiment, different RDF resources can be linked appropriately on the basis of their estimated classes. The embodiment thus compensates for the difficulty of searching for a target resource starting from other resources and, by making resource search easier, improves the convenience of resource utilization.

以上の実施例において図示した各装置の各構成要素は、必ずしも物理的に図示のように構成されていることを要しない。すなわち、各部の分散又は統合の具体的形態は図示に限られず、その全部又は一部を、各種の負荷や使用状況等に応じて、任意の単位で機能的又は物理的に分散・統合して構成することができる。 The components of each device illustrated in the above embodiments do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution or integration of the units is not limited to that illustrated; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

例えば、展開述語保存部１２及びクラス分類規則保存部１５は、クラス推定装置１０に接続される外部記憶装置であってもよい。また、クラス推定装置１０は、曖昧性計算部１１、素性生成部１３、分類規則学習部１４を含む学習装置と、素性生成部１３、クラス推定部１６を含む推定装置とに分散実装されてもよい。 For example, the expansion predicate storage unit 12 and the class classification rule storage unit 15 may be external storage devices connected to the class estimation device 10. Further, the class estimation device 10 may be implemented in a distributed manner as a learning device including the ambiguity calculation unit 11, the feature generation unit 13, and the classification rule learning unit 14, and an estimation device including the feature generation unit 13 and the class estimation unit 16.
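
The split into a learning device and an estimation device could look like the sketch below. It is a hedged illustration of one possible arrangement only. The class names `RuleStore`, `LearningDevice`, and `EstimationDevice` are hypothetical, and an in-memory dictionary stands in for the class classification rule storage unit 15. The frequency-sum scoring follows the claims' description of the estimation unit. Feature generation is assumed to have already been performed on each side, so both devices receive sets of feature strings, and the toy features are invented for the example.

```python
from collections import Counter, defaultdict


class RuleStore:
    """Hypothetical stand-in for the class classification rule storage unit 15."""

    def __init__(self):
        # class -> feature -> aggregated appearance frequency
        self.freq = defaultdict(Counter)


class LearningDevice:
    """Aggregates how often each feature appears with each subject class."""

    def __init__(self, store):
        self.store = store

    def learn(self, labeled_features):
        # labeled_features: iterable of (subject class, set of features)
        for cls, features in labeled_features:
            self.store.freq[cls].update(features)


class EstimationDevice:
    """Outputs every class whose summed feature frequency exceeds a threshold."""

    def __init__(self, store, threshold):
        self.store = store
        self.threshold = threshold

    def estimate(self, features):
        scores = {
            cls: sum(counts[f] for f in features)
            for cls, counts in self.store.freq.items()
        }
        return sorted(cls for cls, score in scores.items()
                      if score > self.threshold)


# The two devices share only the rule store, which could live on external storage.
store = RuleStore()
LearningDevice(store).learn([
    ("ex:Scientist", {"ex:bornIn", "ex:knownFor/ex:field"}),
    ("ex:Scientist", {"ex:bornIn", "ex:knownFor/ex:field", "ex:award"}),
    ("ex:Artist", {"ex:bornIn", "ex:knownFor/ex:medium"}),
])
estimator = EstimationDevice(store, threshold=2)
estimated = estimator.estimate({"ex:bornIn", "ex:knownFor/ex:field"})
```

Because the estimation side reads only the stored frequency table, it can run on a separate machine from the learning side, as long as both sides generate features in the same way.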

また、クラス推定装置１０の曖昧性計算部１１、素性生成部１３、分類規則学習部１４、クラス推定部１６の各種処理機能は、ＣＰＵ（Central Processing Unit）及びメモリの協働により、その全部又は任意の一部が実現される。または、クラス推定装置１０の各種処理機能は、ＭＰＵ、ＭＣＵ、ＡＳＩＣ、ＦＰＧＡ等のマイクロコンピュータにより、その全部又は任意の一部が実現されてもよい。ＭＰＵはMicro Processing Unitであり、ＭＣＵはMicro Controller Unitであり、ＡＳＩＣはApplication Specific Integrated Circuitであり、ＦＰＧＡはField-Programmable Gate Arrayである。 Further, the various processing functions of the ambiguity calculation unit 11, the feature generation unit 13, the classification rule learning unit 14, and the class estimation unit 16 of the class estimation device 10 are realized, in whole or in any part, by cooperation of a CPU (Central Processing Unit) and a memory. Alternatively, the various processing functions of the class estimation device 10 may be realized, in whole or in any part, by a microcomputer such as an MPU, MCU, ASIC, or FPGA. MPU stands for Micro Processing Unit, MCU for Micro Controller Unit, ASIC for Application Specific Integrated Circuit, and FPGA for Field-Programmable Gate Array.

また、クラス推定装置１０の各種処理機能は、ＣＰＵ（またはＭＰＵ、ＭＣＵ等のマイクロコンピュータ）により解析実行されるプログラム又はワイヤードロジック等によるハードウェアで、その全部又は任意の一部が実現されてもよい。 Further, the various processing functions of the class estimation device 10 may be realized, in whole or in any part, by a program that is analyzed and executed by a CPU (or a microcomputer such as an MPU or MCU), or by hardware such as wired logic.

１０クラス推定装置 10 Class estimation device
１１曖昧性計算部 11 Ambiguity calculation unit
１２展開述語保存部 12 Expansion predicate storage unit
１３素性生成部 13 Feature generation unit
１４分類規則学習部 14 Classification rule learning unit
１５クラス分類規則保存部 15 Class classification rule storage unit
１６クラス推定部 16 Class estimation unit

Claims

A class estimating device comprising: a calculation unit that calculates an index indicating the diversity of the object classes corresponding to each predicate, based on the appearance probability of the class of each object corresponding to each predicate in learning data of RDF (Resource Description Framework), which indicates relation information between resources with at least three elements of subject, predicate, and object;
a generation unit that acquires a second predicate for which the object corresponding to a first predicate whose index exceeds a predetermined threshold is the subject, combines the first predicate and the second predicate, and generates each feature including a predicate corresponding to the class of each subject and a combined predicate combining the first predicate and the second predicate; and
a learning unit that, based on the correspondence between the class of each subject and each feature, aggregates the appearance frequency of each feature corresponding to each class and learns, from the aggregated appearance frequency, a class classification rule that classifies the class assigned to each feature.

The class estimating device according to claim 1, further comprising an estimation unit that refers to the class classification rule, calculates, for each class, the sum of the appearance frequencies with which the input features appear in that class, and outputs a class whose sum exceeds a threshold as an estimated class estimated from the features,
wherein the generation unit, when a predicate in RDF class estimation target data corresponds to the first predicate, acquires a second predicate for which the object corresponding to the first predicate is the subject, combines the first predicate and the second predicate, and generates each feature including a predicate corresponding to the class of each subject and a combined predicate combining the first predicate and the second predicate, and
the estimation unit estimates the class of the subject corresponding to the feature in the class estimation target data by using the feature generated by the generation unit as an input.

The class estimation device according to claim 1 or 2, wherein the index is entropy based on the appearance probability.

The class estimating device according to claim 1, 2 or 3, wherein the index is variable according to the number of appearances of a class of each object corresponding to each predicate.

A class estimation method in which a computer executes processes of:
calculating, based on the appearance probability of the class of each object corresponding to each predicate in learning data of RDF (Resource Description Framework), which indicates relation information between resources with at least three elements of subject, predicate, and object, an index indicating the diversity of the object classes corresponding to each predicate;
acquiring a second predicate for which the object corresponding to a first predicate whose index exceeds a predetermined threshold is the subject;
combining the first predicate and the second predicate;
generating each feature including a predicate corresponding to the class of each subject and a combined predicate combining the first predicate and the second predicate; and
aggregating, based on the correspondence between the class of each subject and each feature, the appearance frequency of each feature corresponding to each class, and learning, from the aggregated appearance frequency, a class classification rule that classifies the class assigned to each feature.

A class estimation program that causes a computer to execute processes of:
calculating, based on the appearance probability of the class of each object corresponding to each predicate in learning data of RDF (Resource Description Framework), which indicates relation information between resources with at least three elements of subject, predicate, and object, an index indicating the diversity of the object classes corresponding to each predicate;
acquiring a second predicate for which the object corresponding to a first predicate whose index exceeds a predetermined threshold is the subject;
combining the first predicate and the second predicate;
generating each feature including a predicate corresponding to the class of each subject and a combined predicate combining the first predicate and the second predicate; and
aggregating, based on the correspondence between the class of each subject and each feature, the appearance frequency of each feature corresponding to each class, and learning, from the aggregated appearance frequency, a class classification rule that classifies the class assigned to each feature.