JP5622310B2

JP5622310B2 - Mutual machine learning device, mutual machine learning method, and program

Info

Publication number: JP5622310B2
Application number: JP2010184356A
Authority: JP
Inventors: Jong-Hoon Oh; Ichiro Yamada; Kentaro Torisawa; Stijn De Saeger
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-08-19
Filing date: 2010-08-19
Publication date: 2014-11-12
Anticipated expiration: 2030-08-19
Also published as: JP2012043225A

Description

The present invention relates to a mutual machine learning device and the like that performs mutual machine learning using two machine learners.

Conventionally, for machine learning that uses teacher data, a technique called mutual machine learning, which combines a plurality of machine learners, has been proposed (see, for example, Non-Patent Documents 1 and 2). In the technique of Non-Patent Document 1, the features used for learning over the same word pairs are divided by hand to generate a plurality of machine learners, and reliable results obtained from one machine learner are used as learning data for another machine learner. In Non-Patent Document 2, a machine learner is generated for each of several different languages, and reliable results obtained from one machine learner are used as learning data for another machine learner.

Non-Patent Document 1: Avrim Blum, Tom Mitchell, "Combining Labeled and Unlabeled Data with Co-Training", In COLT '98: Proceedings of the eleventh annual conference on Computational learning theory, pp. 92-100, 1998
Non-Patent Document 2: Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa, "Bilingual Co-Training for Monolingual Hyponymy-Relation Acquisition", In Proc. of ACL-09: IJCNLP, pp. 432-440, 2009

However, the technique of Non-Patent Document 1 has a problem in that the plurality of machine learners must handle the same processing targets and cannot handle different processing targets. In addition, mutual machine learning with higher accuracy than conventional mutual machine learning has been desired.

The present invention has been made to solve the above problems, and an object thereof is to provide a mutual machine learning device and the like in which a plurality of machine learners can handle different processing targets and which can realize highly accurate machine learning.

To achieve the above object, a mutual machine learning device according to the present invention includes: a common pair storage unit that stores genuine common pairs and virtual common pairs, where a genuine common pair is a pair common to a plurality of first relationship pair candidates, which are candidates for pairs of language expressions having a semantic relationship and are extracted from a first corpus by a first method, and a plurality of second relationship pair candidates, which are candidates for pairs of language expressions having the semantic relationship and are extracted from a second corpus by a second method different from the first method, and where a virtual common pair is either a pair common to a plurality of first unrelated pair candidates, which are candidates for pairs of language expressions not having the semantic relationship and are extracted from the first corpus, and the plurality of second relationship pair candidates, or a pair common to a plurality of second unrelated pair candidates, which are candidates for pairs of language expressions not having the semantic relationship and are extracted from the second corpus, and the plurality of first relationship pair candidates; a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether a first relationship pair candidate has the semantic relationship; a first classification unit that performs machine learning using the first learning data and uses the result of that machine learning to classify whether the genuine common pairs and the virtual common pairs have the semantic relationship; a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether a second relationship pair candidate has the semantic relationship; a second classification unit that performs machine learning using the second learning data and uses the result of that machine learning to classify whether the genuine common pairs and the virtual common pairs have the semantic relationship; and an addition unit that, according to at least one of the classification result and the confidence of a common pair as classified by the first classification unit, adds that common pair and its classification result to the second learning data, and, according to at least one of the classification result and the confidence of a common pair as classified by the second classification unit, adds that common pair and its classification result to the first learning data. The machine learning and classification by the first and second classification units and the addition of learning data by the addition unit are executed repeatedly.
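
As a non-limiting sketch of the repeated learning, classification, and addition described above, the following Python fragment alternates between two learners over a shared list of common pairs. The `ThresholdClassifier` class, its one-dimensional `score` feature, the `min_conf` threshold, the fixed number of `rounds`, and the example pairs are all assumptions made for illustration; the description above does not fix the learning algorithm, the features, or the stopping condition.

```python
from statistics import mean


class ThresholdClassifier:
    """Toy stand-in for a classification unit (hypothetical, not from the source).

    It learns a 1-D threshold halfway between the mean scores of the
    positive and negative training pairs.
    """

    def __init__(self, score):
        self.score = score        # dict: pair -> float feature (assumed given)
        self.threshold = 0.0

    def fit(self, labeled):
        # Assumes the training data holds at least one positive and one negative pair.
        pos = [self.score[p] for p, y in labeled.items() if y]
        neg = [self.score[p] for p, y in labeled.items() if not y]
        self.threshold = (mean(pos) + mean(neg)) / 2

    def classify(self, pair):
        margin = self.score[pair] - self.threshold
        return margin > 0, abs(margin)    # (classification result, confidence)


def co_train(clf1, clf2, train1, train2, common_pairs, min_conf=0.2, rounds=3):
    """Repeat learning, classification of common pairs, and addition."""
    train1, train2 = dict(train1), dict(train2)
    for _ in range(rounds):
        clf1.fit(train1)
        clf2.fit(train2)
        # Classify every common pair with both learners before adding anything.
        judged = [(p, clf1.classify(p), clf2.classify(p)) for p in common_pairs]
        for pair, (label1, conf1), (label2, conf2) in judged:
            if conf1 >= min_conf and pair not in train2:
                train2[pair] = label1     # learner 1 teaches learner 2
            if conf2 >= min_conf and pair not in train1:
                train1[pair] = label2     # learner 2 teaches learner 1
    return train1, train2


# Example data (invented for illustration only).
score1 = {("drink", "coffee"): 0.9, ("car", "tire"): 0.1,
          ("fruit", "apple"): 0.8, ("rain", "flood"): 0.2}
score2 = {("drink", "coffee"): 0.7, ("car", "tire"): 0.3,
          ("fruit", "apple"): 0.55, ("rain", "flood"): 0.1}
seed = {("drink", "coffee"): True, ("car", "tire"): False}
common = [("fruit", "apple"), ("rain", "flood")]

train1, train2 = co_train(ThresholdClassifier(score1), ThresholdClassifier(score2),
                          seed, seed, common)
```

In this toy run only the first learner is confident about ("fruit", "apple"), so that pair reaches the second learner's training data but is not added to the first learner's.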

With this configuration, the first and second classification units can handle different processing targets, namely pairs of language expressions extracted by the first method and pairs of language expressions extracted by the second method, which differs from the first. Because mutual machine learning is also performed using the virtual common pairs, more accurate machine learning can be realized. As a result, by classifying pairs of language expressions using the result of that machine learning, pairs of language expressions having the semantic relationship can be acquired with high accuracy.

Further, a mutual machine learning device according to the present invention includes: a common pair storage unit that stores genuine common pairs and virtual common pairs, where a genuine common pair is a pair common to a plurality of first relationship pair candidates, which are candidates for pairs of language expressions having a semantic relationship and are extracted from a first corpus by a first method, and a plurality of second relationship pair candidates, which are candidates for pairs of language expressions having the semantic relationship and are extracted from a second corpus by a second method different from the first method, and where a virtual common pair is a pair that is not a genuine common pair among the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, a plurality of first unrelated pair candidates, which are candidates for pairs of language expressions not having the semantic relationship and are extracted from the first corpus, and a plurality of second unrelated pair candidates, which are candidates for pairs of language expressions not having the semantic relationship and are extracted from the second corpus; a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether a first relationship pair candidate has the semantic relationship; a first classification unit that performs machine learning using the first learning data and uses the result of that machine learning to classify whether the genuine common pairs and the virtual common pairs have the semantic relationship; a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether a second relationship pair candidate has the semantic relationship; a second classification unit that performs machine learning using the second learning data and uses the result of that machine learning to classify whether the genuine common pairs and the virtual common pairs have the semantic relationship; and an addition unit that, according to at least one of the classification result and the confidence of a common pair as classified by the first classification unit, adds that common pair and its classification result to the second learning data, and, according to at least one of the classification result and the confidence of a common pair as classified by the second classification unit, adds that common pair and its classification result to the first learning data. The machine learning and classification by the first and second classification units and the addition of learning data by the addition unit are executed repeatedly.
With this configuration, as with the mutual machine learning device described above, different processing targets can be handled and highly accurate machine learning can be realized. In addition, processing can use more virtual common pairs than in the mutual machine learning device described above.
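
A minimal sketch of this broader definition of virtual common pairs, assuming each candidate set is held as a Python set of (expression, expression) tuples. The function name and the example pairs are hypothetical.

```python
def split_common_pairs(rel1, rel2, unrel1, unrel2):
    """Return (genuine, virtual) under the broader definition described above."""
    genuine = rel1 & rel2
    # Every candidate pair from either corpus that is not a genuine common pair.
    virtual = (rel1 | rel2 | unrel1 | unrel2) - genuine
    return genuine, virtual


# Example candidate sets (invented for illustration only).
rel1 = {("drink", "coffee"), ("fruit", "apple")}
unrel1 = {("car", "tire")}
rel2 = {("drink", "coffee"), ("car", "tire")}
unrel2 = {("rain", "flood")}

genuine, virtual = split_common_pairs(rel1, rel2, unrel1, unrel2)
```

Under this reading, a pair that appears in only one of the four candidate sets still counts as a virtual common pair, which is why more virtual common pairs become available.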

In the mutual machine learning device according to the present invention, the addition unit may add, to the second learning data, a common pair classified with high confidence by the first classification unit together with its classification result, and may add, to the first learning data, a common pair classified with high confidence by the second classification unit together with its classification result.
A common pair classified with high confidence by one classification unit is considered reliable, so this configuration is expected to increase the learning data appropriately.

In the mutual machine learning device according to the present invention, the addition unit may add, to the second learning data, a common pair that the first classification unit classifies with high confidence and for which the first and second classification units give the same classification result, together with its classification result, and may add, to the first learning data, a common pair that the second classification unit classifies with high confidence and for which the first and second classification units give the same classification result, together with its classification result.

A common pair for which the first and second classification units give the same classification result and which one classification unit classifies with high confidence is considered reliable regardless of the confidence of the other classification unit. Adding such a common pair to the learning data of the other classification unit is therefore expected to increase the learning data appropriately.

In the mutual machine learning device according to the present invention, the addition unit may add, to the second learning data, a common pair that the first classification unit classifies with high confidence and the second classification unit classifies with low confidence, together with its classification result, and may add, to the first learning data, a common pair that the second classification unit classifies with high confidence and the first classification unit classifies with low confidence, together with its classification result.

For a common pair that one classification unit classifies with high confidence and the other classifies with low confidence, the classification by the former is considered reliable. Adding such a common pair to the learning data of the latter classification unit is therefore expected to increase the learning data appropriately.
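
The three addition criteria described above can be compared side by side as predicates. This is a sketch only: the `Judgement` type, the numeric thresholds `hi` and `lo`, and the sample values are assumptions, since the text above speaks only of high and low confidence without fixing values. In each function, `src` is the judgement of the classification unit whose result would be added, and `dst` is the judgement of the classification unit whose learning data would receive it.

```python
from typing import NamedTuple


class Judgement(NamedTuple):
    label: bool          # classification result: has the semantic relationship?
    confidence: float    # confidence of that result (assumed to lie in [0, 1])


def high_confidence(src, dst, hi=0.8, lo=0.2):
    """Add when the source classifier is highly confident."""
    return src.confidence >= hi


def high_confidence_and_agree(src, dst, hi=0.8, lo=0.2):
    """Add when the source is highly confident and both results are the same."""
    return src.confidence >= hi and src.label == dst.label


def high_vs_low_confidence(src, dst, hi=0.8, lo=0.2):
    """Add when the source is highly confident and the destination is not."""
    return src.confidence >= hi and dst.confidence <= lo


# Hypothetical judgements of one common pair by the two classification units.
by_first = Judgement(label=True, confidence=0.9)
by_second = Judgement(label=False, confidence=0.1)

decisions = {
    policy.__name__: policy(by_first, by_second)
    for policy in (high_confidence, high_confidence_and_agree, high_vs_low_confidence)
}
```

Keeping the criteria as interchangeable predicates makes it straightforward to swap one for another inside the addition step without touching the learners.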

The mutual machine learning device according to the present invention may further include: a first relationship pair candidate storage unit that stores the plurality of first relationship pair candidates; a first unrelated pair candidate storage unit that stores the plurality of first unrelated pair candidates; a second relationship pair candidate storage unit that stores the plurality of second relationship pair candidates; a second unrelated pair candidate storage unit that stores the plurality of second unrelated pair candidates; and an acquisition unit that acquires the genuine common pairs using the plurality of first relationship pair candidates and the plurality of second relationship pair candidates and accumulates them in the common pair storage unit, and that acquires the virtual common pairs using the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, the plurality of first unrelated pair candidates, and the plurality of second unrelated pair candidates and accumulates them in the common pair storage unit.
With this configuration, the mutual machine learning device can also perform the processing of acquiring the genuine common pairs and the virtual common pairs.
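
As an illustrative sketch, the acquisition unit's processing can be expressed with set operations, here following the definition in which a virtual common pair is shared between one side's unrelated pair candidates and the other side's relationship pair candidates. The function name, the use of Python sets, and the sample pairs are assumptions for illustration, and the sketch assumes that the relationship and unrelated candidates extracted from the same corpus do not overlap.

```python
def acquire_common_pairs(rel1, unrel1, rel2, unrel2):
    """Return (genuine, virtual) common pairs from the four candidate sets."""
    # Genuine: extracted as relationship pair candidates from both corpora.
    genuine = rel1 & rel2
    # Virtual: an unrelated candidate on one side that is also a
    # relationship pair candidate on the other side.
    virtual = (unrel1 & rel2) | (unrel2 & rel1)
    return genuine, virtual


# Example candidate sets (invented for illustration only).
first_relation = {("drink", "coffee"), ("fruit", "apple")}
first_unrelated = {("car", "tire")}
second_relation = {("drink", "coffee"), ("car", "tire")}
second_unrelated = {("fruit", "apple"), ("rain", "flood")}

genuine_pairs, virtual_pairs = acquire_common_pairs(
    first_relation, first_unrelated, second_relation, second_unrelated)
```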

The mutual machine learning device according to the present invention may further include: a first corpus storage unit that stores the first corpus; a second corpus storage unit that stores the second corpus; a first extraction unit that extracts the plurality of first relationship pair candidates from the first corpus and accumulates them in the first relationship pair candidate storage unit, and that extracts the plurality of first unrelated pair candidates from the first corpus and accumulates them in the first unrelated pair candidate storage unit; and a second extraction unit that extracts the plurality of second relationship pair candidates from the second corpus and accumulates them in the second relationship pair candidate storage unit, and that extracts the plurality of second unrelated pair candidates from the second corpus and accumulates them in the second unrelated pair candidate storage unit.
With this configuration, the mutual machine learning device can also perform the processing of extracting the first relationship pair candidates and the like from the first and second corpora.

In the mutual machine learning device according to the present invention, after the repetition of machine learning, classification, and addition of learning data, the first classification unit may classify the plurality of first relationship pair candidates, and the second classification unit may classify the plurality of second relationship pair candidates.
With this configuration, the first and second relationship pair candidates are classified using the result of machine learning on learning data to which common pairs have been added as described above, so more accurate classification can be performed.
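
A sketch of this final pass, assuming a trained classification unit is available as a function that returns a (classification result, confidence) tuple. The function names and the stub classifier are hypothetical.

```python
def select_relation_pairs(classify, candidates):
    """Keep the candidates that the trained classifier judges to have the relationship."""
    return [pair for pair in candidates if classify(pair)[0]]


def stub_classify(pair):
    """Stub standing in for a classification unit after the repeated learning."""
    known = {("drink", "coffee"), ("fruit", "apple")}
    return pair in known, 1.0


# First relationship pair candidates (invented for illustration only).
first_candidates = [("drink", "coffee"), ("car", "tire"), ("fruit", "apple")]
first_relation_pairs = select_relation_pairs(stub_classify, first_candidates)
```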

In the mutual machine learning device according to the present invention, the first corpus may be a structured corpus, and the second corpus may be a corpus of unstructured natural language text.
With this configuration, pairs of language expressions acquired from a structured corpus and pairs of language expressions acquired from an unstructured corpus can be handled as the different processing targets.

In the mutual machine learning device according to the present invention, the semantic relationship may be a hypernym-hyponym relationship.

According to the mutual machine learning device and the like of the present invention, different processing targets can be handled, and more accurate machine learning can be realized.

Block diagram showing the configuration of the mutual machine learning device according to Embodiment 1 of the present invention
Flowchart showing the operation of the mutual machine learning device according to the embodiment
Flowchart showing the operation of the mutual machine learning device according to the embodiment
Diagram for explaining common pairs in the embodiment
Diagram for explaining a structured corpus in the embodiment
Diagram showing experimental results in the embodiment
Diagram showing experimental results in the embodiment
Schematic diagram showing an example of the appearance of a computer system in the embodiment
Diagram showing an example of the configuration of the computer system in the embodiment

Hereinafter, a mutual machine learning device according to the present invention will be described using an embodiment. In the following embodiment, components and steps denoted by the same reference numerals are identical or equivalent, and repeated description of them may be omitted.

(Embodiment 1)
A mutual machine learning device according to Embodiment 1 of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of the mutual machine learning device 1 according to the present embodiment. The mutual machine learning device 1 includes a first corpus storage unit 11, a second corpus storage unit 12, a first extraction unit 13, a second extraction unit 14, a first relationship pair candidate storage unit 15, a first unrelated pair candidate storage unit 16, a second relationship pair candidate storage unit 17, a second unrelated pair candidate storage unit 18, an acquisition unit 19, a common pair storage unit 20, a first learning data storage unit 21, a second learning data storage unit 22, a first classification unit 23, a second classification unit 24, an addition unit 25, a first relationship pair storage unit 26, and a second relationship pair storage unit 27.

The first corpus storage unit 11 stores a first corpus, and the second corpus storage unit 12 stores a second corpus. The first and second corpora may be of different types or of the same type. In the former case, for example, the first corpus may be a structured corpus and the second corpus may be a corpus of unstructured natural language text. The present embodiment mainly describes that case. Here, a structured corpus is a corpus whose documents have some structure, such as a hierarchical structure or a tree structure. A structured corpus may be, for example, encyclopedia information or other information, in which a hierarchical or tree structure is formed by, for example, titles, sections, subsections, and lists. An example of an encyclopedia corpus is WIKIPEDIA (registered trademark), which is published on the web. A corpus of unstructured natural language text is a corpus that has no structure such as a hierarchical or tree structure, and may be, for example, newspaper text, novel text, or unstructured web text. Information that contains natural language documents can normally serve as a corpus of unstructured natural language text. Accordingly, even a corpus that has structure can be used as an unstructured corpus by ignoring that structure. One example of unstructured web information is the large-scale corpus of Japanese web documents covered by the "Search Engine Research Infrastructure TSUBAKI" and similar resources.

How the corpora come to be stored in the first corpus storage unit 11 and the second corpus storage unit 12 does not matter. For example, a corpus may be stored in the first corpus storage unit 11 or the like via a recording medium, or a corpus transmitted over a communication line or the like may be stored there.

The first extraction unit 13 extracts a plurality of first relationship pair candidates from the first corpus stored in the first corpus storage unit 11 and accumulates them in the first relationship pair candidate storage unit 15. The first extraction unit 13 also extracts a plurality of first unrelated pair candidates from the first corpus and accumulates them in the first unrelated pair candidate storage unit 16. A first relationship pair candidate is a candidate for a pair of language expressions having a certain semantic relationship. Because it is only a candidate, a first relationship pair candidate does not necessarily have that semantic relationship. A first unrelated pair candidate is a candidate for a pair of language expressions not having that semantic relationship. Likewise, because it is only a candidate, a first unrelated pair candidate does not necessarily lack that semantic relationship. The semantic relationship may be, for example, a hypernym-hyponym relationship (e.g., drink and coffee), a cause-effect relationship (e.g., heavy rain and flood), a whole-part relationship (e.g., person and hand, or car and tire), a rival or antonym relationship (e.g., up and down), a product-manufacturer relationship (e.g., vacuum cleaner and Company A), an event-method relationship (e.g., explosion and bomb), an event-tool relationship (e.g., class and textbook), a relationship between an event and something that prevents it (e.g., illness and medicine), an object-material relationship (e.g., can and aluminum), a relationship between a landmark or building and its location (e.g., Nijo Castle and Kyoto), or another type of relationship. A language expression may be, for example, a word (morpheme) or a phrase, which is a sequence of words. A language expression may also be a run of consecutive words (e.g., a compound noun). The first extraction unit 13 normally extracts first relationship pair candidates for one particular semantic relationship. The present embodiment mainly describes the case where the first extraction unit 13 extracts first relationship pair candidates that are candidates for pairs of language expressions having a hypernym-hyponym relationship. As first unrelated pair candidates, the first extraction unit 13 may extract pairs of language expressions considered not to have the semantic relationship attributed to the first relationship pair candidates, or may extract pairs of language expressions considered to have some other semantic relationship (a pair having a different semantic relationship is unlikely to have the semantic relationship attributed to the first relationship pair candidates). The present embodiment mainly describes the case where the first extraction unit 13 extracts first unrelated pair candidates that are candidates for pairs of language expressions not having a hypernym-hyponym relationship.

第２の抽出部１４は、第２のコーパス記憶部１２で記憶されている第２のコーパスから複数の第２関係ペア候補を抽出して第２関係ペア候補記憶部１７に蓄積する。また、第２の抽出部１４は、第２のコーパス記憶部１２で記憶されている第２のコーパスから複数の第２無関係ペア候補を抽出して第２無関係ペア候補記憶部１８に蓄積する。第２関係ペア候補は、第１関係ペア候補が有していると考えられる意味的関係と同じ意味的関係を有する言語表現のペアの候補である。なお、第２関係ペア候補は、その意味的関係を有する言語表現のペアに関する候補であるため、必ずしもその意味的関係を有しているとは限らない。第２無関係ペア候補は、その意味的関係を有さない言語表現のペアの候補である。また、第２無関係ペア候補は、その意味的関係を有さない言語表現のペアに関する候補であるため、必ずしもその意味的関係を有さないとは限らない。第２の抽出部１４は、通常、いずれか一つの意味的関係を有する言語表現のペアの候補である第２関係ペア候補を抽出する。その意味的関係は、前述のように、第１関係ペア候補が有していると考えられる意味的関係と同じ意味的関係である。本実施の形態では、第２の抽出部１４が、上位下位の関係を有する言語表現のペアの候補である第２関係ペア候補を抽出する場合について主に説明する。また、第２の抽出部１４は、第２関係ペア候補が有しているとされる意味的関係を有していないと考えられる言語表現のペアを、第２無関係ペア候補として抽出してもよく、あるいは、第２関係ペア候補が有しているとされる意味的関係ではない意味的関係を有していると考えられる言語表現のペアを、第２無関係ペア候補として抽出してもよい。本実施の形態では、第２の抽出部１４が、上位下位の関係ではない意味的関係を有する言語表現のペアの候補である第２無関係ペア候補を抽出する場合について主に説明する。 The second extraction unit 14 extracts a plurality of second relationship pair candidates from the second corpus stored in the second corpus storage unit 12 and accumulates them in the second relationship pair candidate storage unit 17. Further, the second extraction unit 14 extracts a plurality of second irrelevant pair candidates from the second corpus stored in the second corpus storage unit 12 and accumulates them in the second irrelevant pair candidate storage unit 18. The second relationship pair candidate is a linguistic expression pair candidate having the same semantic relationship as the semantic relationship considered to be possessed by the first relationship pair candidate. The second relationship pair candidate is a candidate for a linguistic expression pair having the semantic relationship, and thus does not necessarily have the semantic relationship. The second irrelevant pair candidate is a linguistic expression pair candidate that does not have the semantic relationship. 
Moreover, since a second unrelated pair candidate is only a candidate for a pair of linguistic expressions lacking the semantic relationship, it does not necessarily lack the semantic relationship. The second extraction unit 14 typically extracts second relationship pair candidates that are candidate pairs of linguistic expressions having one particular semantic relationship. As described above, that semantic relationship is the same as the one the first relationship pair candidates are considered to have. In the present embodiment, the case in which the second extraction unit 14 extracts second relationship pair candidates that are candidate pairs of linguistic expressions having a hypernym-hyponym (upper-lower) relationship will mainly be described. Further, the second extraction unit 14 may extract, as a second unrelated pair candidate, a pair of linguistic expressions considered not to have the semantic relationship that the second relationship pair candidates are considered to have. Alternatively, it may extract, as a second unrelated pair candidate, a pair of linguistic expressions considered to have a semantic relationship other than that one. In the present embodiment, the case in which the second extraction unit 14 extracts second unrelated pair candidates that are candidate pairs of linguistic expressions having a semantic relationship other than a hypernym-hyponym relationship will mainly be described.

第１及び第２の抽出部１３，１４は、言語表現のペアを抽出する元となるコーパスが異なる以外に、その抽出方法が異なるものとする。すなわち、第１の抽出部１３は、第１の方法によって第１コーパスから複数の第１関係ペア候補を抽出し、第２の抽出部１４は、第２の方法によって第２のコーパスから複数の第２関係ペア候補を抽出する。なお、第１の方法と第２の方法とは異なるものとする。したがって、第１及び第２のコーパスの種類が一緒であったとしても、各コーパスから第１及び第２関係ペア候補を抽出する方法が異なるため、第１関係ペア候補と第２関係ペア候補とは異なる種類のものとなる。本実施の形態では、第１の方法は、第１のコーパスが有する構造を用いて第１関係ペア候補を抽出する方法であり、第２の方法は、レキシコシンタクティックパターン（Ｌｅｘｉｃｏ−ｓｙｎｔａｃｔｉｃｐａｔｔｅｒｎｓ）を用いて第２関係ペア候補を抽出する方法である場合について説明する。それらの抽出方法の詳細については後述する。 The first and second extraction units 13 and 14 differ not only in the corpus from which they extract pairs of linguistic expressions but also in their extraction methods. That is, the first extraction unit 13 extracts a plurality of first relationship pair candidates from the first corpus by a first method, and the second extraction unit 14 extracts a plurality of second relationship pair candidates from the second corpus by a second method. The first method and the second method are different. Therefore, even if the first and second corpora are of the same type, the first relationship pair candidates and the second relationship pair candidates are of different kinds, because they are extracted by different methods. In the present embodiment, the case is described in which the first method extracts first relationship pair candidates using the structure of the first corpus, and the second method extracts second relationship pair candidates using lexico-syntactic patterns. Details of these extraction methods will be described later.

取得部１９は、複数の第１関係ペア候補と複数の第２関係ペア候補とを用いて、ジェニュイン（ｊｅｎｕｉｎｅ）共通ペアを取得して共通ペア記憶部２０に蓄積する。また、取得部１９は、複数の第１関係ペア候補と複数の第２関係ペア候補と複数の第１無関係ペア候補と複数の第２無関係ペア候補とを用いて、バーチャル（ｖｉｒｔｕａｌ）共通ペアを取得して共通ペア記憶部２０に蓄積する。ジェニュイン共通ペアとは、第１関係ペア候補記憶部１５で記憶されている複数の第１関係ペア候補と、第２関係ペア候補記憶部１７で記憶されている複数の第２関係ペア候補とに共通するペアである。したがって、ある第１関係ペア候補と、ある第２関係ペア候補とが同じ言語表現のペアである場合に、その第１関係ペア候補（その第２関係ペア候補）は、ジェニュイン共通ペアとなる。また、バーチャル共通ペアとは、第１無関係ペア候補記憶部１６で記憶されている複数の第１無関係ペア候補と、第２関係ペア候補記憶部１７で記憶されている複数の第２関係ペア候補とに共通するペア、及び、第２無関係ペア候補記憶部１８で記憶されている複数の第２無関係ペア候補と、第１関係ペア候補記憶部１５で記憶されている複数の第１関係ペア候補とに共通するペアである。したがって、ある第１無関係ペア候補と、ある第２関係ペア候補とが同じ言語表現のペアである場合に、その第１無関係ペア候補（その第２関係ペア候補）は、バーチャル共通ペアとなる。また、ある第２無関係ペア候補と、ある第１関係ペア候補とが同じ言語表現のペアである場合に、その第２無関係ペア候補（その第１関係ペア候補）は、バーチャル共通ペアとなる。なお、ジェニュイン共通ペアとバーチャル共通ペアとをあわせて共通ペアと呼ぶ。 The obtaining unit 19 obtains a genuine common pair by using the plurality of first relationship pair candidates and the plurality of second relationship pair candidates, and accumulates them in the common pair storage unit 20. In addition, the acquisition unit 19 uses the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, the plurality of first irrelevant pair candidates, and the plurality of second irrelevant pair candidates to generate a virtual common pair. Acquired and stored in the common pair storage unit 20. Genuine common pairs are a plurality of first relationship pair candidates stored in the first relationship pair candidate storage unit 15 and a plurality of second relationship pair candidates stored in the second relationship pair candidate storage unit 17. It is a common pair. Therefore, when a certain first relationship pair candidate and a certain second relationship pair candidate are pairs of the same language expression, the first relationship pair candidate (the second relationship pair candidate) becomes a genuine common pair. 
A virtual common pair is a pair common to the plurality of first unrelated pair candidates stored in the first unrelated pair candidate storage unit 16 and the plurality of second relationship pair candidates stored in the second relationship pair candidate storage unit 17, or a pair common to the plurality of second unrelated pair candidates stored in the second unrelated pair candidate storage unit 18 and the plurality of first relationship pair candidates stored in the first relationship pair candidate storage unit 15. Therefore, when a certain first unrelated pair candidate and a certain second relationship pair candidate are the same pair of linguistic expressions, that first unrelated pair candidate (equivalently, that second relationship pair candidate) is a virtual common pair. Likewise, when a certain second unrelated pair candidate and a certain first relationship pair candidate are the same pair of linguistic expressions, that second unrelated pair candidate (equivalently, that first relationship pair candidate) is a virtual common pair. Genuine common pairs and virtual common pairs are collectively referred to as common pairs.

第１の学習データ記憶部２１では、第１関係ペア候補が意味的関係を有しているかどうかの分類に関する機械学習で用いられる教師データである第１の学習データが記憶される。
第２の学習データ記憶部２２では、第２関係ペア候補が意味的関係を有しているかどうかの分類に関する機械学習で用いられる教師データである第２の学習データが記憶される。 The first learning data storage unit 21 stores first learning data that is teacher data used in machine learning related to the classification of whether or not the first relationship pair candidate has a semantic relationship.
The second learning data storage unit 22 stores second learning data that is teacher data used in machine learning related to the classification of whether or not the second relationship pair candidate has a semantic relationship.

なお、第１の学習データ記憶部２１、第２の学習データ記憶部２２に第１の学習データや第２の学習データが記憶される過程は問わない。例えば、記録媒体を介して第１の学習データ等が第１の学習データ記憶部２１等で記憶されるようになってもよく、あるいは、通信回線等を介して送信された第１の学習データ等が第１の学習データ記憶部２１等で記憶されるようになってもよい。 The process by which the first learning data and the second learning data come to be stored in the first learning data storage unit 21 and the second learning data storage unit 22 does not matter. For example, the first learning data or the like may be stored in the first learning data storage unit 21 or the like via a recording medium, or the first learning data or the like transmitted via a communication line or the like may be stored in the first learning data storage unit 21 or the like.

第１の分類部２３は、第１の学習データを用いて機械学習を行い、機械学習の結果を用いて、ジェニュイン共通ペア及びバーチャル共通ペアが意味的関係を有しているかどうか分類する。その分類によって、第１の分類部２３は、分類結果（意味的関係を有するかどうか）と、その分類結果の確信度とを得ることができる。なお、後述するように、追加部２５によって第１の学習データが追加された場合には、第１の分類部２３は、その追加された第１の学習データをも用いて学習を行うものとする。また、第１の分類部２３は、機械学習及び分類と学習データの追加との繰り返しの後に、第１関係ペア候補記憶部１５で記憶されている複数の第１関係ペア候補に対して分類を行う。そして、第１の分類部２３は、意味的関係を有すると判断した第１関係ペア候補である第１関係ペアを、第１関係ペア記憶部２６に蓄積する。 The first classification unit 23 performs machine learning using the first learning data and, using the result of the machine learning, classifies whether each genuine common pair and each virtual common pair has the semantic relationship. Through this classification, the first classification unit 23 obtains a classification result (whether the pair has the semantic relationship) and a certainty factor for that result. As will be described later, when the adding unit 25 adds to the first learning data, the first classification unit 23 also uses the added first learning data for learning. After the repetition of machine learning, classification, and addition of learning data, the first classification unit 23 classifies the plurality of first relationship pair candidates stored in the first relationship pair candidate storage unit 15. The first classification unit 23 then accumulates, in the first relationship pair storage unit 26, the first relationship pairs, that is, the first relationship pair candidates determined to have the semantic relationship.

第２の分類部２４は、第２の学習データを用いて機械学習を行い、機械学習の結果を用いて、ジェニュイン共通ペア及びバーチャル共通ペアが意味的関係を有しているかどうか分類する。その分類によって、第２の分類部２４は、分類結果（意味的関係を有するかどうか）と、その分類結果の確信度とを得ることができる。なお、後述するように、追加部２５によって第２の学習データが追加された場合には、第２の分類部２４は、その追加された第２の学習データをも用いて学習を行うものとする。また、第２の分類部２４は、機械学習及び分類と学習データの追加との繰り返しの後に、複数の第２関係ペア候補に対して分類を行う。そして、第２の分類部２４は、意味的関係を有すると判断した第２関係ペア候補である第２関係ペアを、第２関係ペア記憶部２７に蓄積する。 The second classification unit 24 performs machine learning using the second learning data and, using the result of the machine learning, classifies whether each genuine common pair and each virtual common pair has the semantic relationship. Through this classification, the second classification unit 24 obtains a classification result (whether the pair has the semantic relationship) and a certainty factor for that result. As will be described later, when the adding unit 25 adds to the second learning data, the second classification unit 24 also uses the added second learning data for learning. After the repetition of machine learning, classification, and addition of learning data, the second classification unit 24 classifies the plurality of second relationship pair candidates. The second classification unit 24 then accumulates, in the second relationship pair storage unit 27, the second relationship pairs, that is, the second relationship pair candidates determined to have the semantic relationship.

ここで、第１及び第２の分類部２３，２４による機械学習を用いた分類について簡単に説明する。第１及び第２の分類部２３，２４は、機械学習を用いて、第１及び第２関係ペア候補を、意味的関係を有するものと、そうでないものとに分類する。この機械学習の入力は、第１及び第２関係ペア候補である。また、その機械学習の出力は、その第１及び第２関係ペア候補が意味的関係を有するかどうかである。また、その機械学習で用いられる教師データとしての学習データ（訓練データ）は、２個の言語表現のペアと、そのペアの意味的関係の有無を示す情報（すなわち、意味的関係を有しているか、有していないかの情報）とである。学習データを用いた学習の後に、分類の対象となる第１関係ペア候補や第２関係ペア候補を入力すると、その第１関係ペア候補等に関する素性の各値が取得され、その第１関係ペア候補等が意味的関係を有するかどうかと、その確信度とが出力される。その機械学習で用いられる素性については後述する。 Here, the classification using machine learning by the first and second classification units 23 and 24 will be briefly described. The first and second classification units 23 and 24 classify the first and second relationship pair candidates into those having a semantic relationship and those not having them using machine learning. The machine learning inputs are first and second relationship pair candidates. The output of the machine learning is whether or not the first and second relationship pair candidates have a semantic relationship. In addition, learning data (training data) as teacher data used in the machine learning includes two language expression pairs and information indicating presence / absence of a semantic relationship between the pairs (that is, having a semantic relationship). Or not). After the learning using the learning data, when the first relationship pair candidate or the second relationship pair candidate to be classified is input, each feature value regarding the first relationship pair candidate is acquired, and the first relationship pair Whether the candidate or the like has a semantic relationship and its certainty are output. The features used in the machine learning will be described later.

なお、第１及び第２の分類部２３，２４は、例えば、機械学習として、ＳＶＭ（ＳｕｐｐｏｒｔＶｅｃｔｏｒＭａｃｈｉｎｅ）を用いてもよく、その他のものを用いてもよい。本実施の形態では、機械学習としてＳＶＭを用いる場合について説明する。 The first and second classification units 23 and 24 may use, for example, SVM (Support Vector Machine) or other things as machine learning. In the present embodiment, a case where SVM is used as machine learning will be described.

追加部２５は、第１の分類部２３の分類による確信度が高い共通ペアと、その共通ペアに関する分類結果とを第２の学習データに追加する。また、追加部２５は、第２の分類部２４の分類による確信度が高い共通ペアと、その共通ペアに関する分類結果とを第１の学習データに追加する。具体的には、追加部２５は、第１の分類部２３の分類による確信度が高く、第１及び第２の分類部２３，２４の分類結果が同じである共通ペアと、その共通ペアに関する分類結果とを第２の学習データに追加してもよく、第２の分類部２４の分類による確信度が高く、第１及び第２の分類部２３，２４の分類結果が同じである共通ペアと、その共通ペアに関する分類結果とを第１の学習データに追加してもよい。また、追加部２５は、第１の分類部２３の分類による確信度が高く、第２の分類部２４の分類による確信度が低い共通ペアと、その共通ペアに関する分類結果とを第２の学習データに追加してもよく、第２の分類部２４の分類による確信度が高く、第１の分類部２３の分類による確信度が低い共通ペアと、その共通ペアに関する分類結果とを第１の学習データに追加してもよい。ここで、バーチャル共通ペアは、一方のコーパスにおいては意味的関係の候補とされているが、実際に意味的関係を有している可能性は低いと考えられる。したがって、バーチャル共通ペアは、負例として追加される可能性が高い。一方、ジェニュイン共通ペアは、バーチャル共通ペアよりも意味的関係を有している可能性が高いと考えられ、正例として追加される可能性もある。 The adding unit 25 adds, to the second learning data, common pairs that the first classification unit 23 classified with a high certainty factor, together with their classification results. Likewise, the adding unit 25 adds, to the first learning data, common pairs that the second classification unit 24 classified with a high certainty factor, together with their classification results. Specifically, the adding unit 25 may add to the second learning data those common pairs for which the certainty factor of the first classification unit 23 is high and the classification results of the first and second classification units 23 and 24 agree, together with their classification results, and may add to the first learning data those common pairs for which the certainty factor of the second classification unit 24 is high and the classification results of the first and second classification units 23 and 24 agree, together with their classification results. 
Further, the adding unit 25 may add to the second learning data those common pairs for which the certainty factor of the first classification unit 23 is high and that of the second classification unit 24 is low, together with their classification results, and may add to the first learning data those common pairs for which the certainty factor of the second classification unit 24 is high and that of the first classification unit 23 is low, together with their classification results. Here, a virtual common pair is treated as a candidate for the semantic relationship in only one of the corpora, so it is considered unlikely to actually have the semantic relationship. Therefore, virtual common pairs are likely to be added as negative examples. A genuine common pair, on the other hand, is considered more likely to have the semantic relationship than a virtual common pair, and may also be added as a positive example.
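
The selection rules described for the adding unit 25 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `select_additions`, the thresholds `alpha` and `beta`, and the toy certainty values are hypothetical and do not appear in the patent, which says only that the certainty factor is "high" or "low".

```python
def select_additions(results_src, results_dst, alpha, beta=None, require_agreement=False):
    """Pick labelled common pairs from one classifier to add to the other's learning data.

    results_src / results_dst: dict mapping pair -> (class label, certainty factor).
    alpha: minimum certainty of the source classifier (hypothetical threshold).
    beta: if given, the destination classifier's certainty must be below it.
    require_agreement: if True, both classifiers must give the same class label.
    """
    additions = {}
    for pair, (label, certainty) in results_src.items():
        if certainty < alpha:
            continue                      # source classifier is not confident enough
        dst_label, dst_certainty = results_dst[pair]
        if require_agreement and dst_label != label:
            continue                      # variant 1: classification results must agree
        if beta is not None and dst_certainty >= beta:
            continue                      # variant 2: destination must be unsure
        additions[pair] = label           # the pair is added with the source's label
    return additions


# Toy classification results (hypothetical values)
res_s = {("fruit", "apple"): ("yes", 1.4), ("animal", "dog"): ("no", 0.2), ("apple", "pear"): ("no", 1.1)}
res_u = {("fruit", "apple"): ("yes", 0.3), ("animal", "dog"): ("no", 1.5), ("apple", "pear"): ("yes", 0.1)}

to_second = select_additions(res_s, res_u, alpha=1.0, beta=0.5)               # high/low variant
to_first = select_additions(res_u, res_s, alpha=1.0, require_agreement=True)  # agreement variant
```

In this toy run, two pairs flow from the first classifier to the second learning data, and one pair flows in the opposite direction as a negative example.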

ある共通ペアと分類結果とを第１の学習データに追加するとは、その共通ペア等を第１の学習データ記憶部２１に蓄積することであってもよく、あるいは、その共通ペア等をも第１の分類部２３が第１の学習データとして使用するように設定することであってもよい。後者の場合には、第１の学習データに追加された、共通ペア記憶部２０で記憶されている共通ペアと、その分類結果とを、第１の分類部２３が第１の学習データとして読み出すように設定することであってもよい。ここで、その分類結果は、第２の分類部２４によって共通ペア記憶部２０に蓄積されてもよい。また、ある共通ペアと分類結果とを第２の学習データに追加するとは、その共通ペア等を第２の学習データ記憶部２２に蓄積することであってもよく、あるいは、その共通ペア等をも第２の分類部２４が第２の学習データとして使用するように設定することであってもよい。後者の場合には、第２の学習データに追加された、共通ペア記憶部２０で記憶されている共通ペアと、その分類結果とを、第２の分類部２４が第２の学習データとして読み出すように設定することであってもよい。ここで、その分類結果は、第１の分類部２３によって共通ペア記憶部２０に蓄積されてもよい。本実施の形態では、追加部２５が、第１の学習データへの追加対象である共通ペア等を第１の学習データ記憶部２１に蓄積し、第２の学習データへの追加対象である共通ペア等を第２の学習データ記憶部２２に蓄積する場合について説明する。なお、第１の学習データ記憶部２１及び第２の学習データ記憶部２２であらかじめ記憶されている学習データをそれぞれ、初期の第１の学習データ、初期の第２の学習データと呼ぶこともある。その初期の第１の学習データ及び初期の第２の学習データは、それぞれ異なったものであってもよく、あるいは、同じものであってもよい。 To add a certain common pair and classification result to the first learning data may be to accumulate the common pair or the like in the first learning data storage unit 21 or to add the common pair or the like to the first learning data. One classification unit 23 may be set to be used as the first learning data. In the latter case, the first classification unit 23 reads the common pair added to the first learning data and stored in the common pair storage unit 20 and the classification result as the first learning data. It may be set as follows. Here, the classification result may be accumulated in the common pair storage unit 20 by the second classification unit 24. Further, adding a certain common pair and classification result to the second learning data may be to accumulate the common pair or the like in the second learning data storage unit 22, or to add the common pair or the like. Alternatively, the second classification unit 24 may be set to be used as the second learning data. 
In the latter case, the setting may be such that the second classification unit 24 reads, as second learning data, the common pairs stored in the common pair storage unit 20 that have been added to the second learning data, together with their classification results. Here, the classification results may be accumulated in the common pair storage unit 20 by the first classification unit 23. In the present embodiment, the case is described in which the adding unit 25 accumulates the common pairs and the like to be added to the first learning data in the first learning data storage unit 21, and accumulates the common pairs and the like to be added to the second learning data in the second learning data storage unit 22. The learning data stored in advance in the first learning data storage unit 21 and the second learning data storage unit 22 may be referred to as the initial first learning data and the initial second learning data, respectively. The initial first learning data and the initial second learning data may differ from each other or may be the same.

なお、第１及び第２の分類部２３，２４による機械学習及び分類と、追加部２５による学習データの追加とは繰り返して実行される。その繰り返しの際に、第１及び第２の分類部２３，２４は、追加部２５による追加が行われた後の学習データを用いて、機械学習を行うことになる。 The machine learning and classification by the first and second classification units 23 and 24 and the addition of learning data by the adding unit 25 are repeatedly executed. During the repetition, the first and second classification units 23 and 24 perform machine learning using the learning data after the addition by the adding unit 25 is performed.

なお、第１のコーパス記憶部１１、第２のコーパス記憶部１２、第１関係ペア候補記憶部１５、第１無関係ペア候補記憶部１６、第２関係ペア候補記憶部１７、第２無関係ペア候補記憶部１８、共通ペア記憶部２０、第１の学習データ記憶部２１、第２の学習データ記憶部２２、第１関係ペア記憶部２６、第２関係ペア記憶部２７での記憶は、ＲＡＭ等における一時的な記憶でもよく、あるいは、長期的な記憶でもよい。また、これらの記憶部は、所定の記録媒体（例えば、半導体メモリや磁気ディスク、光ディスクなど）によって実現されうる。 The first corpus storage unit 11, the second corpus storage unit 12, the first related pair candidate storage unit 15, the first irrelevant pair candidate storage unit 16, the second related pair candidate storage unit 17, the second irrelevant pair candidate Storage in the storage unit 18, common pair storage unit 20, first learning data storage unit 21, second learning data storage unit 22, first relationship pair storage unit 26, and second relationship pair storage unit 27 is RAM or the like. It may be a temporary memory or a long-term memory. These storage units can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.).

また、第１のコーパス記憶部１１、第２のコーパス記憶部１２、第１関係ペア候補記憶部１５、第１無関係ペア候補記憶部１６、第２関係ペア候補記憶部１７、第２無関係ペア候補記憶部１８、共通ペア記憶部２０、第１の学習データ記憶部２１、第２の学習データ記憶部２２、第１関係ペア記憶部２６、第２関係ペア記憶部２７のうち、任意の２以上の記憶部は、同一の記録媒体によって実現されてもよく、あるいは、別々の記録媒体によって実現されてもよい。前者の場合には、例えば、第１のコーパスを記憶している領域が第１のコーパス記憶部１１となり、第２のコーパスを記憶している領域が第２のコーパス記憶部１２となる。 Moreover, any two or more of the first corpus storage unit 11, the second corpus storage unit 12, the first relationship pair candidate storage unit 15, the first unrelated pair candidate storage unit 16, the second relationship pair candidate storage unit 17, the second unrelated pair candidate storage unit 18, the common pair storage unit 20, the first learning data storage unit 21, the second learning data storage unit 22, the first relationship pair storage unit 26, and the second relationship pair storage unit 27 may be realized by the same recording medium or by separate recording media. In the former case, for example, the area storing the first corpus serves as the first corpus storage unit 11, and the area storing the second corpus serves as the second corpus storage unit 12.

次に、本実施の形態による相互機械学習装置１の動作について、図２のフローチャートを用いて説明する。ここで、第１のコーパスを「Ｓ」とし、第２のコーパスを「Ｕ」とし、第１関係ペア候補の集合を「Ｘ_Ｓ」とし、第１無関係ペア候補の集合を「Ｒ_Ｓ」とし、第２関係ペア候補の集合を「Ｘ_Ｕ」とし、第２無関係ペア候補の集合を「Ｒ_Ｕ」とし、共通ペアの集合を「Ｙ」とし、ジェニュイン共通ペアの集合を「Ｇ」とし、バーチャル共通ペアの集合を「Ｖ」とする。Ｘ_ＳやＸ_Ｕ、Ｇ、Ｖ等の関係は、図４で示されるようになる。なお、Ｙ＝Ｇ∪Ｖである。また、あらかじめ第１の学習データ記憶部２１で記憶されている第１の学習データを「Ｌ^０ _Ｓ」とし、あらかじめ第２の学習データ記憶部２２で記憶されている第２の学習データを「Ｌ^０ _Ｕ」とする。 Next, the operation of the mutual machine learning device 1 according to the present embodiment will be described using the flowchart of FIG. 2. Here, the first corpus is denoted "S", the second corpus "U", the set of first relationship pair candidates "X_S", the set of first unrelated pair candidates "R_S", the set of second relationship pair candidates "X_U", the set of second unrelated pair candidates "R_U", the set of common pairs "Y", the set of genuine common pairs "G", and the set of virtual common pairs "V". The relationships among X_S, X_U, G, V, and so on are as shown in FIG. 4. Note that Y = G ∪ V. In addition, the first learning data stored in advance in the first learning data storage unit 21 is denoted "L^0_S", and the second learning data stored in advance in the second learning data storage unit 22 is denoted "L^0_U".

（ステップＳ１０１）第１の抽出部１３は、第１のコーパスＳから複数の第１関係ペア候補の集合Ｘ_Ｓを抽出して第１関係ペア候補記憶部１５に蓄積する。 (Step S101) The first extraction unit 13 extracts the set X_S of a plurality of first relationship pair candidates from the first corpus S and accumulates it in the first relationship pair candidate storage unit 15.

（ステップＳ１０２）第２の抽出部１４は、第２のコーパスＵから複数の第２関係ペア候補の集合Ｘ_Ｕを抽出して第２関係ペア候補記憶部１７に蓄積する。 (Step S102) The second extraction unit 14 extracts the set X_U of a plurality of second relationship pair candidates from the second corpus U and accumulates it in the second relationship pair candidate storage unit 17.

（ステップＳ１０３）第１の抽出部１３は、第１のコーパスＳから複数の第１無関係ペア候補の集合Ｒ_Ｓを抽出して第１無関係ペア候補記憶部１６に蓄積する。なお、Ｘ_Ｓ∩Ｒ_Ｓは空集合である。 (Step S103) The first extraction unit 13 extracts the set R_S of a plurality of first unrelated pair candidates from the first corpus S and accumulates it in the first unrelated pair candidate storage unit 16. Note that X_S ∩ R_S is the empty set.

（ステップＳ１０４）第２の抽出部１４は、第２のコーパスＵから複数の第２無関係ペア候補の集合Ｒ_Ｕを抽出して第２無関係ペア候補記憶部１８に蓄積する。なお、Ｘ_Ｕ∩Ｒ_Ｕは空集合である。 (Step S104) The second extraction unit 14 extracts the set R_U of a plurality of second unrelated pair candidates from the second corpus U and accumulates it in the second unrelated pair candidate storage unit 18. Note that X_U ∩ R_U is the empty set.

（ステップＳ１０５）取得部１９は、複数の第１関係ペア候補の集合Ｘ_Ｓ、複数の第１無関係ペア候補の集合Ｒ_Ｓ、複数の第２関係ペア候補の集合Ｘ_Ｕ、複数の第２無関係ペア候補の集合Ｒ_Ｕを用いて、ジェニュイン共通ペアの集合Ｇと、バーチャル共通ペアの集合Ｖとを取得し、それらを共通ペア記憶部２０に蓄積する。図４で示されるように、ジェニュイン共通ペアの集合Ｇは、第１関係ペア候補の集合Ｘ_Ｓと、第２関係ペア候補の集合Ｘ_Ｕとの共通部分である。すなわち、Ｇ＝Ｘ_Ｓ∩Ｘ_Ｕとなる。また、バーチャル共通ペアの集合Ｖは、第１関係ペア候補の集合Ｘ_Ｓと、第２無関係ペア候補の集合Ｒ_Ｕとの共通部分、及び、第２関係ペア候補の集合Ｘ_Ｕと、第１無関係ペア候補の集合Ｒ_Ｓとの共通部分である。すなわち、Ｖ＝（Ｘ_Ｓ∩Ｒ_Ｕ）∪（Ｒ_Ｓ∩Ｘ_Ｕ）となる。 (Step S105) The acquisition unit 19 uses the set X_S of first relationship pair candidates, the set R_S of first unrelated pair candidates, the set X_U of second relationship pair candidates, and the set R_U of second unrelated pair candidates to acquire the set G of genuine common pairs and the set V of virtual common pairs, and accumulates them in the common pair storage unit 20. As shown in FIG. 4, the set G of genuine common pairs is the intersection of the set X_S of first relationship pair candidates and the set X_U of second relationship pair candidates. That is, G = X_S ∩ X_U. The set V of virtual common pairs is the union of the intersection of X_S with the set R_U of second unrelated pair candidates and the intersection of X_U with the set R_S of first unrelated pair candidates. That is, V = (X_S ∩ R_U) ∪ (R_S ∩ X_U).
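
The set relations of step S105 map directly onto set operations. The following Python sketch is a minimal illustration, assuming each pair of linguistic expressions is represented as a tuple of two strings; the function name `common_pairs` and the toy data are hypothetical and not taken from the patent.

```python
def common_pairs(x_s, r_s, x_u, r_u):
    """Return (G, V, Y) for candidate sets given as sets of (expression, expression) tuples."""
    g = x_s & x_u                    # genuine common pairs: G = X_S ∩ X_U
    v = (x_s & r_u) | (r_s & x_u)    # virtual common pairs: V = (X_S ∩ R_U) ∪ (R_S ∩ X_U)
    return g, v, g | v               # common pairs: Y = G ∪ V


# Toy candidate sets (hypothetical); X_S ∩ R_S and X_U ∩ R_U are empty, as in steps S103 and S104
x_s = {("fruit", "apple"), ("animal", "dog"), ("city", "tokyo")}
r_s = {("apple", "pear")}
x_u = {("fruit", "apple"), ("apple", "pear")}
r_u = {("animal", "dog")}

g, v, y = common_pairs(x_s, r_s, x_u, r_u)
```

Here ("fruit", "apple") is a genuine common pair because both extraction units proposed it, while ("animal", "dog") and ("apple", "pear") are virtual common pairs because one side proposed them as related and the other as unrelated.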

（ステップＳ１０６）追加部２５は、カウンタｉを０に設定する。このカウンタｉは、ステップＳ１０７〜Ｓ１１３のサイクルをカウントするためのカウンタである。 (Step S106) The adding unit 25 sets the counter i to 0. This counter i is a counter for counting the cycles of steps S107 to S113.

（ステップＳ１０７）第１の分類部２３は、第１の学習データ記憶部２１で記憶されている第１の学習データＬ^ｉ _Ｓを用いて機械学習を行う。その機械学習によって得られた分類器をｃ^ｉ _Ｓとする。なお、機械学習を行う際に用いる素性は、例えば、あらかじめ第１の学習データ記憶部２１で記憶されていてもよく、あるいは、第１のコーパスを参照して取得してもよい。 (Step S107) The first classification unit 23 performs machine learning using the first learning data L^i_S stored in the first learning data storage unit 21. Let c^i_S be the classifier obtained by that machine learning. The features used for the machine learning may, for example, be stored in advance in the first learning data storage unit 21 or be acquired by referring to the first corpus.

（ステップＳ１０８）第２の分類部２４は、第２の学習データ記憶部２２で記憶されている第２の学習データＬ^ｉ _Ｕを用いて機械学習を行う。その機械学習によって得られた分類器をｃ^ｉ _Ｕとする。なお、機械学習を行う際に用いる素性は、例えば、あらかじめ第２の学習データ記憶部２２で記憶されていてもよく、あるいは、第２のコーパスを参照して取得してもよい。 (Step S108) The second classification unit 24 performs machine learning using the second learning data L ⁱ _U stored in the second learning data storage unit 22. Let c ⁱ _U be the classifier obtained by the machine learning. Note that the features used when performing machine learning may be stored in advance in the second learning data storage unit 22, or may be acquired with reference to the second corpus, for example.

（ステップＳ１０９）第１の分類部２３は、機械学習の結果である分類器ｃ^ｉ _Ｓを用いて、共通ペアの集合Ｙに含まれる各共通ペアに対して分類を行う。この分類の結果、意味的関係を有するかどうかを示すクラスラベルｃｌ∈｛ｙｅｓ、ｎｏ｝と、確信度ｒ∈Ｒ^＋とを得ることができる。なお、クラスラベルｃｌ「ｙｅｓ」は、意味的関係を有すると分類されたことを示し、クラスラベルｃｌ「ｎｏ」は、意味的関係を有さないと分類されたことを示す。また、「Ｒ^＋」は、負でない実数である。分類器ｃによるｙ∈Ｙの分類結果を、ｃ（ｙ）＝（ｙ、ｃｌ、ｒ）と記述することがある。なお、この分類の際に、Ｙに含まれる共通ペアのうち、第１の学習データＬ^ｉ _Ｓ、または、第２の学習データＬ^ｉ _Ｕに含まれる共通ペアについては、分類を行わなくてもよい。また、この分類の際に用いる各共通ペアの素性は、例えば、あらかじめ共通ペア記憶部２６で記憶されていてもよく、あるいは、第１のコーパスを参照して取得してもよい。 (Step S109) The first classification unit 23 classifies each common pair in the set Y of common pairs using the classifier c^i_S obtained by the machine learning. This classification yields a class label cl ∈ {yes, no}, indicating whether the pair has the semantic relationship, and a certainty factor r ∈ R^+. The class label cl = "yes" indicates that the pair was classified as having the semantic relationship, and cl = "no" indicates that it was classified as not having it. "R^+" denotes the non-negative real numbers. The classification result for y ∈ Y by a classifier c may be written as c(y) = (y, cl, r). In this classification, common pairs in Y that are already included in the first learning data L^i_S or the second learning data L^i_U need not be classified. The features of each common pair used in this classification may, for example, be stored in advance in the common pair storage unit 26 or be acquired by referring to the first corpus.
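
As an illustration of the notation c(y) = (y, cl, r), the following Python sketch builds such a triple from a signed classifier score. It assumes, as is common practice with SVMs but not stated in this passage, that the sign of the decision value gives the class label and its absolute value serves as the certainty factor; the names `Classification` and `to_classification` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Classification(NamedTuple):
    pair: Tuple[str, str]   # the common pair y
    cl: str                 # class label, "yes" or "no"
    r: float                # certainty factor, a non-negative real number


def to_classification(pair, decision_value):
    """Turn a signed decision value into (y, cl, r); the mapping itself is an assumption."""
    cl = "yes" if decision_value >= 0 else "no"
    return Classification(pair, cl, abs(decision_value))


result = to_classification(("fruit", "apple"), -0.8)
```

With this representation the certainty factor is always non-negative, matching r ∈ R^+, and the class label carries the direction of the decision.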

（ステップＳ１１０）第２の分類部２４は、機械学習の結果である分類器ｃ^ｉ _Ｕを用いて、共通ペアの集合Ｙに含まれる各共通ペアに対して分類を行う。この分類の結果、意味的関係を有するかどうかを示すクラスラベルｃｌと、確信度ｒとを得ることができることは、第１の分類部２３の場合と同様である。なお、この分類の際に、Ｙに含まれる共通ペアのうち、第１の学習データＬ^ｉ _Ｓ、または、第２の学習データＬ^ｉ _Ｕに含まれる共通ペアについては、分類を行わなくてもよい。また、この分類の際に用いる各共通ペアの素性は、例えば、あらかじめ共通ペア記憶部２６で記憶されていてもよく、あるいは、第２のコーパスを参照して取得してもよい。 (Step S110) The second classifying unit 24 classifies each common pair included in the common pair set Y by using the classifier c ⁱ _U that is the result of machine learning. As a result of this classification, the class label cl indicating whether or not there is a semantic relationship and the certainty factor r can be obtained as in the case of the first classification unit 23. In this classification, among the common pairs included in Y, the common pairs included in the first learning data L ⁱ _S or the second learning data L ⁱ _U need not be classified. Good. In addition, the features of each common pair used in this classification may be stored in advance in the common pair storage unit 26, or may be acquired with reference to the second corpus, for example.

（ステップＳ１１１）追加部２５は、分類結果を用いて、所定の条件を満たす共通ペアを、第１の学習データＬ^{（ｉ＋１）} _Ｓや第２の学習データＬ^{（ｉ＋１）} _Ｕに追加する。また、第１の学習データＬ^{（ｉ＋１）} _Ｓは、Ｌ^ｉ _Ｓのすべての要素を含むものであり、第２の学習データＬ^{（ｉ＋１）} _Ｕは、Ｌ^ｉ _Ｕのすべての要素を含むものである。なお、この学習データの追加の処理の詳細については、図３のフローチャートを用いて後述する。 (Step S111) Using the classification result, the adding unit 25 adds a common pair that satisfies a predetermined condition to the first learning data L ^{(i + 1)} _S and the second learning data L ^{(i + 1)} _U. The first learning data L ^{(i + 1)} _S includes all elements of L ⁱ _S , and the second learning data L ^{(i + 1)} _U includes all elements of L ⁱ _U. Details of the learning data addition process will be described later with reference to the flowchart of FIG.

（ステップＳ１１２）追加部２５は、ステップＳ１０７〜Ｓ１１３のサイクルの繰り返しの終了条件が満たされるかどうか判断する。そして、その終了条件が満たされる場合には、ステップＳ１１４に進み、そうでない場合には、ステップＳ１１３に進む。 (Step S112) The adding unit 25 determines whether or not a condition for ending the cycle of steps S107 to S113 is satisfied. If the end condition is satisfied, the process proceeds to step S114. If not, the process proceeds to step S113.

その終了条件は、例えば、ｄ^ｉ＝｜σ^ｉ−σ^{（ｉ−１）}｜／｜σ^{（ｉ−１）}｜の値が、連続した所定回数（例えば、３回であってもよい）のサイクルだけ、あらかじめ決められたしきい値「ε」未満であることであってもよい。なお、σ^ｉは、カウンタｉのサイクルにおけるステップＳ１０９，Ｓ１１０において分類された各ｙ∈Ｙの第１の分類部２３による分類の確信度をｒ１とし、第２の分類部２４による分類の確信度をｒ２とした場合に、
σ^ｉ＝Σ｜ｒ１−ｒ２｜
で示される値である。なお、その和は、すべてのｙ∈Ｙに対してとられるものである。ｄ^ｉ＜εであるということは、前回のサイクルと比較して、学習結果である超平面がほとんど変化していないこと、すなわち、新たな学習データの追加を行っても、学習結果がほとんど変化していないことを意味する。なお、そのようになるようにしきい値εが選択されることが好適である。そのしきい値εは、例えば、０．００１等であってもよい。 The end condition may be, for example, that the value of d^i = |σ^i − σ^{(i−1)}| / |σ^{(i−1)}| remains below a predetermined threshold ε for a predetermined number of consecutive cycles (for example, three). Here, letting r1 be the certainty factor of the classification by the first classification unit 23 and r2 that of the classification by the second classification unit 24 for each y ∈ Y classified in steps S109 and S110 in cycle i, σ^i is the value given by
σ^i = Σ |r1 − r2|,
where the sum is taken over all y ∈ Y. That d^i < ε means that the hyperplane obtained as the learning result has hardly changed from the previous cycle, that is, adding new learning data hardly changes the learning result. The threshold ε is preferably chosen so that this holds. The threshold ε may be, for example, 0.001.
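
The end condition based on d^i can be sketched in Python as follows. This is a minimal illustration, assuming the certainty factors of each cycle are held in dictionaries keyed by common pair; the function names `sigma` and `should_stop`, the parameter name `patience`, and the handling of a zero denominator are assumptions, not part of the patent.

```python
def sigma(conf_s, conf_u):
    """Compute σ^i = Σ |r1 − r2| over all common pairs y ∈ Y for one cycle."""
    return sum(abs(conf_s[y] - conf_u[y]) for y in conf_s)


def should_stop(sigmas, eps=0.001, patience=3):
    """Return True if d^i < eps held for the last `patience` consecutive cycles.

    sigmas: list of σ values, one per completed cycle, oldest first.
    """
    if len(sigmas) < patience + 1:
        return False                      # not enough cycles to compute `patience` values of d
    recent = sigmas[-(patience + 1):]
    for prev, cur in zip(recent, recent[1:]):
        if prev == 0:
            return False                  # d undefined; treated as not converged (assumption)
        if abs(cur - prev) / abs(prev) >= eps:
            return False
    return True


s = sigma({"a": 1.0, "b": 0.5}, {"a": 0.4, "b": 0.9})
stop_now = should_stop([10.0, 5.0, 5.001, 5.0012, 5.0013])
stop_early = should_stop([10.0, 5.0, 5.001, 5.0012])
```

The default values mirror the examples given in the text (three consecutive cycles, ε = 0.001).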

なお、異なる終了条件を用いてもよいことは言うまでもない。例えば、経験則によって、ステップＳ１０７〜Ｓ１１３のサイクルの繰り返し回数が所定の回数になった場合（例えば、カウンタｉ＝Ａとなった場合。ただし、Ａは１以上の整数である）に、新たな学習データの追加を行っても学習結果がほとんど変化していないことが分かっている場合には、終了条件は、カウンタｉ＝Ａとなったことであってもよい。 Needless to say, different termination conditions may be used. For example, according to an empirical rule, when the number of repetitions of the cycle of steps S107 to S113 reaches a predetermined number (for example, when counter i = A, where A is an integer of 1 or more), a new If it is known that the learning result has hardly changed even when learning data is added, the end condition may be that the counter i = A.

その終了条件を示す情報は、図示しない記録媒体で記憶されており、追加部２５は、その記録媒体から終了条件を示す情報を読み出し、その終了条件が満たされるかどうかの判断を行ってもよい。また、ここでは、追加部２５が終了条件に関する判断を行う場合について説明したが、その判断を行うのは追加部２５以外の構成要素であってもよいことは言うまでもない。 Information indicating the end condition is stored in a recording medium (not shown), and the adding unit 25 may read information indicating the end condition from the recording medium and determine whether the end condition is satisfied. . Although the case where the adding unit 25 makes a determination regarding the end condition has been described here, it is needless to say that the component other than the adding unit 25 may make the determination.

（ステップＳ１１３）追加部２５は、カウンタｉを１だけインクリメントする。そして、ステップＳ１０７に戻る。 (Step S113) The adding unit 25 increments the counter i by 1. Then, the process returns to step S107.

（ステップＳ１１４）第１の分類部２３は、その時点の学習結果である分類器を用いて、第１関係ペア候補記憶部１５で記憶されている各第１関係ペア候補の分類を行い、その分類によって意味的関係を有するとされた第１関係ペア候補である第１関係ペアを、第１関係ペア記憶部２６に蓄積する。なお、この分類の際に用いる各第１関係ペア候補の素性は、例えば、あらかじめ第１関係ペア候補記憶部１５で記憶されていてもよく、あるいは、第１のコーパスを参照して取得してもよい。 (Step S114) The first classification unit 23 classifies each first relationship pair candidate stored in the first relationship pair candidate storage unit 15 using the classifier that is the learning result at that point, and accumulates in the first relationship pair storage unit 26 the first relationship pairs, that is, the first relationship pair candidates classified as having the semantic relationship. The features of each first relationship pair candidate used in this classification may, for example, be stored in advance in the first relationship pair candidate storage unit 15 or be acquired by referring to the first corpus.

(Step S115) Using the classifier that is the learning result at that point, the second classification unit 24 classifies each second relation pair candidate stored in the second relation pair candidate storage unit 17, and accumulates in the second relation pair storage unit 27 the second relation pairs, that is, the second relation pair candidates classified as having the semantic relation. The features of each second relation pair candidate used in this classification may, for example, be stored in advance in the second relation pair candidate storage unit 17, or may be obtained by referring to the second corpus.

This completes the series of processes consisting of mutual learning and classification using its learning result. In the flowchart of FIG. 2, the processes of steps S101 to S104 may of course be performed in any order. Likewise, the order of steps S107 and S108, of steps S109 and S110, and of steps S114 and S115 does not matter. Processes that can be executed in parallel may be executed in parallel. The flowchart of FIG. 2 may thus be modified in various ways as long as the purpose of the series of processes is achieved.

FIG. 3 is a flowchart showing details of the learning data addition process (step S111) in the flowchart of FIG. 2.

(Step S201) From the classification results of the first classification unit 23 in step S109, the adding unit 25 identifies the set $CR_S^i$ of classification results for common pairs not included in $L_S^i \cup L_U^i$. $CR_S^i$ is given by the following equation. If only common pairs not included in $L_S^i \cup L_U^i$ were classified in step S109, the classification results of the first classification unit 23 are themselves $CR_S^i$. The process of identifying the set $CR_S^i$ may be any process that makes identified elements distinguishable from unidentified ones. For example, it may consist of storing the identified set $CR_S^i$ in a recording medium (not shown), or of setting, on each element of the identified set $CR_S^i$, a flag or the like indicating that it has been identified. The same applies to the other identification processes.

(Step S202) From the classification results of the second classification unit 24 in step S110, the adding unit 25 identifies the set $CR_U^i$ of classification results for common pairs not included in $L_S^i \cup L_U^i$. $CR_U^i$ is given by the following equation. If only common pairs not included in $L_S^i \cup L_U^i$ were classified in step S110, the classification results of the second classification unit 24 are themselves $CR_U^i$.

(Step S203) From the set of classification results $CR_S^i$, the adding unit 25 identifies the set $\mathrm{TopN}(CR_S^i)$ of the N classification results selected in descending order of confidence r. N is a predetermined integer of 1 or more and may be, for example, 900. The adding unit 25 may sort the set $CR_S^i$ in descending order of confidence r and select the top N classification results as $\mathrm{TopN}(CR_S^i)$.

(Step S204) The adding unit 25 sets the counter j to 1.

(Step S205) Using the classification result $(y^j, cl_S^j, r_S^j) \in \mathrm{TopN}(CR_S^i)$ of the j-th common pair $y^j$ in the set $\mathrm{TopN}(CR_S^i)$ identified in step S203, together with the classification result $(y^j, cl_U^j, r_U^j) \in CR_U^i$ of the second classification unit 24 for the same common pair $y^j$, the adding unit 25 determines whether the common pair $y^j$ is to be added to the second learning data $L_U^{(i+1)}$. If it is, the process proceeds to step S206; otherwise, it proceeds to step S207. The classification result $(y^j, cl_S^j, r_S^j)$ of the j-th common pair in $\mathrm{TopN}(CR_S^i)$ may be the j-th classification result obtained when the set $CR_S^i$ is sorted in descending order of confidence r.

Specifically, the adding unit 25 determines that the common pair $y^j$ is to be added to the second learning data $L_U^{(i+1)}$ when either (Condition 1) or (Condition 2) below is satisfied.
(Condition 1): $r_S^j > \alpha$ and $r_U^j < \beta$
(Condition 2): $r_S^j > \alpha$ and $cl_S^j = cl_U^j$

Condition 1 corresponds to the case where the confidence of the classification by the first classification unit 23 is high and that of the classification by the second classification unit 24 is low. Condition 2 corresponds to the case where the confidence of the first classification unit 23 is high and the first and second classification units 23 and 24 produce the same classification result. The values of α and β are assumed to be set appropriately in advance. In the present embodiment, the determination uses both Conditions 1 and 2, but it may use only one of them.
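The decision based on Conditions 1 and 2 can be written as a small predicate. The sketch below is illustrative only: the `Result` record, its field names, and the function name `should_add` are hypothetical, and confidence is treated simply as a number that can be compared with α and β.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One classification result (y, cl, r) for a common pair."""
    pair: str     # identifier of the common pair y (hypothetical type)
    label: int    # classification result cl
    conf: float   # confidence r


def should_add(teacher: Result, student: Result, alpha: float, beta: float) -> bool:
    """Return True if Condition 1 or Condition 2 holds.

    teacher: result from the classifier whose label would be transferred
    student: result for the same common pair from the other classifier
    """
    if teacher.pair != student.pair:
        raise ValueError("results must refer to the same common pair")
    # Condition 1: teacher is confident, student is not.
    cond1 = teacher.conf > alpha and student.conf < beta
    # Condition 2: teacher is confident and both classifiers agree.
    cond2 = teacher.conf > alpha and teacher.label == student.label
    return cond1 or cond2
```

Because the predicate only distinguishes a "teacher" and a "student" result, the same function also covers the opposite direction by swapping the two arguments.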

(Step S206) The adding unit 25 adds the common pair $y^j$ and its classification result $cl_S^j$ to the second learning data $L_U^{(i+1)}$ used in the machine learning of the next cycle. That is, it performs
$L_U^{(i+1)} \leftarrow L_U^{(i+1)} \cup (y^j, cl_S^j)$
It is assumed that $L_U^{(i+1)} \leftarrow L_U^i$ has been performed before the addition of common pairs in the cycle of steps S205 to S208 starts.

(Step S207) The adding unit 25 increments the counter j by 1.

(Step S208) The adding unit 25 determines whether the set $\mathrm{TopN}(CR_S^i)$ identified in step S203 contains a classification result $(y^j, cl_S^j, r_S^j)$ for a j-th common pair $y^j$. If it does, the process returns to step S205; otherwise, it proceeds to step S209. Because $\mathrm{TopN}(CR_S^i)$ contains N elements, the adding unit 25 may instead determine whether j ≤ N, returning to step S205 if so and proceeding to step S209 otherwise.

(Step S209) From the set of classification results $CR_U^i$, the adding unit 25 identifies the set $\mathrm{TopN}(CR_U^i)$ of the N classification results selected in descending order of confidence r.

(Step S210) The adding unit 25 sets the counter j to 1.

(Step S211) Using the classification result $(y^j, cl_U^j, r_U^j) \in \mathrm{TopN}(CR_U^i)$ of the j-th common pair $y^j$ in the set $\mathrm{TopN}(CR_U^i)$ identified in step S209, together with the classification result $(y^j, cl_S^j, r_S^j) \in CR_S^i$ of the first classification unit 23 for the same common pair $y^j$, the adding unit 25 determines whether the common pair $y^j$ is to be added to the first learning data $L_S^{(i+1)}$. If the common pair $y^j$ is to be added to the first learning data $L_S^{(i+1)}$, the process proceeds to step S212; otherwise, it proceeds to step S213. The classification result $(y^j, cl_U^j, r_U^j)$ of the j-th common pair in $\mathrm{TopN}(CR_U^i)$ may be the j-th classification result obtained when the set $CR_U^i$ is sorted in descending order of confidence r.

Specifically, the adding unit 25 determines that the common pair $y^j$ is to be added to the first learning data $L_S^{(i+1)}$ when either (Condition 3) or (Condition 4) below is satisfied.
(Condition 3): $r_U^j > \alpha$ and $r_S^j < \beta$
(Condition 4): $r_U^j > \alpha$ and $cl_U^j = cl_S^j$

Condition 3 corresponds to the case where the confidence of the classification by the second classification unit 24 is high and that of the classification by the first classification unit 23 is low. Condition 4 corresponds to the case where the confidence of the second classification unit 24 is high and the first and second classification units 23 and 24 produce the same classification result. In the present embodiment, the determination uses both Conditions 3 and 4, but it may use only one of them.

(Step S212) The adding unit 25 adds the common pair $y^j$ and its classification result $cl_U^j$ to the first learning data $L_S^{(i+1)}$ used in the machine learning of the next cycle. That is, it performs
$L_S^{(i+1)} \leftarrow L_S^{(i+1)} \cup (y^j, cl_U^j)$
It is assumed that $L_S^{(i+1)} \leftarrow L_S^i$ has been performed before the addition of common pairs in the cycle of steps S211 to S214 starts.

(Step S213) The adding unit 25 increments the counter j by 1.

(Step S214) The adding unit 25 determines whether the set $\mathrm{TopN}(CR_U^i)$ identified in step S209 contains a classification result $(y^j, cl_U^j, r_U^j)$ for a j-th common pair $y^j$. If it does, the process returns to step S211; otherwise, it returns to the flowchart of FIG. 2. Because $\mathrm{TopN}(CR_U^i)$ contains N elements, the adding unit 25 may instead determine whether j ≤ N, returning to step S211 if so and returning to the flowchart of FIG. 2 otherwise.
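The whole addition process of FIG. 3 can be sketched in a few lines, since steps S203 to S208 and steps S209 to S214 are the same procedure with the roles of the two classifiers swapped. The dictionary representations and the names `transfer` and `add_learning_data` are assumptions made for illustration, not part of the described device.

```python
from typing import Dict, Tuple

Result = Tuple[int, float]    # (cl, r) for one common pair
Results = Dict[str, Result]   # CR: common pair y -> (cl, r)
Labeled = Dict[str, int]      # learning data: common pair y -> label


def transfer(cr_from: Results, cr_to: Results, target: Labeled,
             n: int, alpha: float, beta: float) -> Labeled:
    """Steps S203 to S208 (or S209 to S214 with the roles swapped)."""
    new_target = dict(target)  # L^(i+1) <- L^i before any pair is added
    # TopN: the n results with the largest confidence r.
    top_n = sorted(cr_from.items(), key=lambda kv: kv[1][1], reverse=True)[:n]
    for pair, (cl_from, r_from) in top_n:
        if pair not in cr_to:
            continue  # no result from the other classifier to compare with
        cl_to, r_to = cr_to[pair]
        # Equivalent to "Condition 1 or Condition 2" (or Conditions 3 and 4).
        if r_from > alpha and (r_to < beta or cl_from == cl_to):
            new_target[pair] = cl_from
    return new_target


def add_learning_data(cr_s: Results, cr_u: Results, l_s: Labeled, l_u: Labeled,
                      n: int, alpha: float, beta: float) -> Tuple[Labeled, Labeled]:
    """Return the learning data (L_S, L_U) for the next cycle."""
    new_l_u = transfer(cr_s, cr_u, l_u, n, alpha, beta)  # S203 to S208
    new_l_s = transfer(cr_u, cr_s, l_s, n, alpha, beta)  # S209 to S214
    return new_l_s, new_l_u
```

Both directions read the classification results of the current cycle and write only to copies, so the order in which the two transfers run does not affect the outcome.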

Next, a method of extracting the first and second relation pair candidates and the first and second unrelated pair candidates is described for the case where the first corpus is structured and the second corpus is unstructured. The semantic relation is assumed to be the upper-lower (hypernym-hyponym) relation. An encyclopedia corpus is used here as the structured first corpus. In the encyclopedia corpus, as shown in FIG. 5(a) for example, the title "Tiger" has the sections "Range" and "Taxonomy", the section "Taxonomy" has the subsection "Subspecies", and the subsection "Subspecies" has the list items "Bengal tiger", "Malayan tiger", and "Siberian tiger".

If the encyclopedia corpus is written in a markup language such as HTML or XML, titles, sections, subsections, lists, and so on can be identified by means of title tags, section tags, subsection tags, list tags, and the like, and their tree structure can be obtained as shown in FIG. 5(b). The tree structure can be obtained, for example, as follows. Among the tags that follow a section tag A, belong to a level lower than a section (for example, subsection tags and list tags), and appear before a subsection tag or a tag of a level higher than a subsection (for example, a title tag) appears, the tag closest to section tag A (referred to as "tag B") becomes a node one level below, connected to the node of section tag A. Tags that follow tag B and appear before a tag of a level different from that of tag B appears (these tags are at the same level as tag B) also become nodes one level below, connected to the node of section tag A. It is assumed to be determined in advance that the levels are, from the top, title, section, subsection, and list.

Once the tree structure shown in FIG. 5(b) has been obtained in this way, each pair consisting of an upper node and a node directly or indirectly below it is a first relation pair candidate. In the case of FIG. 5(b), for example, (Tiger, Range), (Tiger, Taxonomy), (Tiger, Subspecies), (Tiger, Bengal tiger), (Taxonomy, Subspecies), (Taxonomy, Bengal tiger), and so on are first relation pair candidates. Each first relation pair candidate has the form (upper linguistic expression, lower linguistic expression). In the tree structure of the encyclopedia corpus, a pair that is not a pair of an upper node and a node directly or indirectly below it, for example a pair of nodes having the same parent node, is a first unrelated pair candidate. In the case of FIG. 5(b), for example, (Range, Taxonomy), (Bengal tiger, Malayan tiger), and so on are first unrelated pair candidates. The first extraction unit 13 may use different parts of the first corpus for extracting the first relation pair candidates and for extracting the first unrelated pair candidates. For a method of extracting upper-lower relations from the Japanese WIKIPEDIA (registered trademark), see Non-Patent Document 2 mentioned above. Because the method described in that document also reveals the tree structure of WIKIPEDIA (registered trademark), relations that are not upper-lower relations can be extracted as described above.
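The extraction of pair candidates from a tree such as the one in FIG. 5(b) can be illustrated as follows. This is a simplified sketch, not the extraction unit itself: the input is assumed to be a list of `(level, text)` items in document order (0 = title, 1 = section, 2 = subsection, 3 = list item), node texts are assumed to be unique, and only sibling pairs are produced as unrelated candidates, which is the example given in the description.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical input: (level, text) in document order,
# level 0 = title, 1 = section, 2 = subsection, 3 = list item.
Heading = Tuple[int, str]
Parents = Dict[str, Optional[str]]


def build_parents(headings: List[Heading]) -> Parents:
    """Map each node to its parent using a stack of open ancestors."""
    parents: Parents = {}
    stack: List[Heading] = []
    for level, text in headings:
        # Close every open node that is not strictly above this level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents[text] = stack[-1][1] if stack else None
        stack.append((level, text))
    return parents


def related_candidates(parents: Parents) -> List[Tuple[str, str]]:
    """(upper, lower) pairs: each node paired with each of its ancestors."""
    pairs: List[Tuple[str, str]] = []
    for node in parents:
        ancestor = parents[node]
        while ancestor is not None:
            pairs.append((ancestor, node))
            ancestor = parents[ancestor]
    return pairs


def unrelated_candidates(parents: Parents) -> List[Tuple[str, str]]:
    """Pairs of nodes that share the same parent node (siblings)."""
    children: Dict[str, List[str]] = {}
    for node, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)
    return [pair for sibs in children.values() for pair in combinations(sibs, 2)]
```

Applied to the items Tiger, Range, Taxonomy, Subspecies, Bengal tiger, Malayan tiger, and Siberian tiger with levels 0, 1, 1, 2, 3, 3, 3, the related candidates include (Tiger, Bengal tiger) and (Taxonomy, Subspecies), and the unrelated candidates include (Range, Taxonomy) and (Bengal tiger, Malayan tiger).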

Next, a method of extracting the second relation pair candidates and the second unrelated pair candidates from the second corpus is described. Web information is used here as the unstructured second corpus. The second extraction unit 14 can extract the second relation pair candidates and the second unrelated pair candidates by using lexico-syntactic patterns. In the second corpus, the second extraction unit 14 identifies, for example, passages matching lexico-syntactic patterns that correspond to the upper-lower relation, such as 「ＡというＢ」 ("B called A") and 「ＡなどのＢ」 ("B such as A"), and extracts the linguistic expressions A and B, thereby obtaining a second relation pair candidate (A, B). Likewise, the second extraction unit 14 identifies passages matching lexico-syntactic patterns that correspond to relations other than the upper-lower relation (for example, a causal relation), such as 「Ｃが原因となるＤ」 ("D caused by C") and 「Ｃに使用されるＤ」 ("D used for C"), and extracts the linguistic expressions C and D, thereby obtaining a second unrelated pair candidate (C, D). For a method of extracting candidate pairs of linguistic expressions having a semantic relation by means of lexico-syntactic patterns, see, for example, the following document.
Document: Maya Ando, Satoshi Sekine, Shun Ishizaki, "Automatic extraction of hyponyms from Japanese newspaper using lexico-syntactic patterns", In Proc. of LREC'04, 2004
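Pattern-based extraction of this kind can be approximated with regular expressions. The sketch below is deliberately crude and rests on an assumption that is not in the description: a linguistic expression is taken to be a run of kanji or katakana characters, whereas a practical system would rely on morphological analysis. The names `TERM`, `RELATED_PATTERNS`, `UNRELATED_PATTERNS`, and `extract` are hypothetical.

```python
import re
from typing import List, Pattern, Tuple

# Crude stand-in for a linguistic expression: a run of kanji or katakana.
# This is an illustrative assumption, not the method of the cited document.
TERM = r"([一-龥ァ-ヶー]+)"

# Patterns corresponding to the upper-lower relation.
RELATED_PATTERNS: List[Pattern[str]] = [
    re.compile(TERM + "という" + TERM),  # 「ＡというＢ」
    re.compile(TERM + "などの" + TERM),  # 「ＡなどのＢ」
]

# Patterns corresponding to other relations (for example, causal).
UNRELATED_PATTERNS: List[Pattern[str]] = [
    re.compile(TERM + "が原因となる" + TERM),  # 「Ｃが原因となるＤ」
    re.compile(TERM + "に使用される" + TERM),  # 「Ｃに使用されるＤ」
]


def extract(text: str, patterns: List[Pattern[str]]) -> List[Tuple[str, str]]:
    """Return every (first, second) expression pair matched by any pattern."""
    pairs: List[Tuple[str, str]] = []
    for pattern in patterns:
        pairs.extend(pattern.findall(text))
    return pairs
```

For instance, `extract("トラという動物が好きだ。", RELATED_PATTERNS)` yields the single candidate pair of 「トラ」 and 「動物」.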

Next, the features used in the machine learning of the first and second classification units 23 and 24 are described for the case where the first corpus is structured, the second corpus is unstructured, and the semantic relation is the upper-lower relation. The case where the first corpus is WIKIPEDIA (registered trademark) and the second corpus is web text is described here.

The features used in machine learning by the first classification unit 23 include the two linguistic expressions contained in a first relation pair candidate or in the learning data (referred to as linguistic expressions A and B) themselves, and the morphemes and parts of speech of each of A and B. The morpheme of the head of each of A and B may also be included. The head is the principal morpheme that connects to other parts; in Japanese, it is usually the last morpheme. For example, in the linguistic expression 「ＸＹＺ大学」 ("XYZ University"), the last morpheme 「大学」 ("University") is the head. The distance (depth in the hierarchy) between A and B in the tree structure may also be included as a feature. In FIG. 5, for example, the distance between "Tiger" and "Range" is 1, and the distance between "Tiger" and "Bengal tiger" is 3.

In addition, any one or more of the following may be included as features: information on whether A and B match any of several patterns for sections in which enumerations or lists of items appear (for example, 「〜の一覧」 or 「〜のリスト」, both meaning "list of ..."); information on whether A and B match expressions that occur frequently as WIKIPEDIA (registered trademark) headings (titles, section titles, and subsection titles, but not lists), that is, expressions appearing in WIKIPEDIA (registered trademark) more often than a predetermined frequency, such as "References" and "External links"; the layout type of A and B (for example, title, section, or list); the node type of A and B in the tree structure (for example, root node, leaf node, or intermediate node; in FIG. 5, "Tiger" is the root node, "Bengal tiger" is a leaf node, and "Range" is an intermediate node); and the parent nodes and child nodes of A and B. Attributes and attribute values obtained from the Infobox of WIKIPEDIA (registered trademark) may also be included as features. For these features, see Non-Patent Document 2 mentioned above.

The features used in machine learning by the second classification unit 24 include the two linguistic expressions contained in a second relation pair candidate or in the learning data (referred to as linguistic expressions A and B) themselves, and the morphemes and parts of speech of each of A and B. In addition, any one or more of the following may be included as features: the identifier of the lexico-syntactic pattern used to obtain the two linguistic expressions; the PMI (point-wise mutual information) score between the two linguistic expressions and the pattern; the PMI score between A and B; and the noun classes of A and B. The noun classes are obtained by dividing 5 × 10^5 nouns into 500 classes using the EM-based clustering described in the following document. For example, noun class C_311 is a class containing nouns related to biology and chemistry, such as "polysaccharide" and "organic compound".
Document: Jun'ichi Kazama, Kentaro Torisawa, "Inducing gazetteers for named entity recognition by large-scale clustering of dependency relations", In Proceedings of ACL-08: HLT, p. 407-415, 2008
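As a reminder of what a PMI score measures, the standard definition $\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$ can be computed from co-occurrence counts as in the sketch below. How the counts are collected and which logarithm base is used are not stated in the description; the natural logarithm and the function name `pmi` are assumptions.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Point-wise mutual information log(p(x, y) / (p(x) * p(y))).

    count_xy: number of observations containing both x and y
    count_x, count_y: number of observations containing x, respectively y
    total: total number of observations
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) p(y)) = (count_xy * total) / (count_x * count_y)
    return math.log((count_xy * total) / (count_x * count_y))
```

A score of zero means that x and y co-occur exactly as often as independence would predict; positive scores indicate association.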

Needless to say, the features used in machine learning and classification by the first and second classification units 23 and 24 are not limited to those described above. Other features may be used, and some of the features described above need not be used. When the semantic relation is a relation other than the upper-lower relation, features appropriate to that semantic relation are preferably used.

[Experimental example]
Next, an experimental example of the mutual machine learning device 1 according to the present embodiment is described. The July 2009 version of the Japanese WIKIPEDIA (registered trademark) was used as the first corpus, and 1.9 × 10^7 first relation pair candidates were obtained from it. These first relation pair candidates were extracted from the main articles of WIKIPEDIA (registered trademark). From them, 24000 first relation pair candidates were extracted at random, and whether each had the semantic relation (upper-lower relation) was judged manually. Of these, 20000 pairs were used as learning data, and the remaining 4000 pairs were divided equally into development data and test data. The development data are used to select the optimum parameters, and the test data are used for evaluation. The learning data, development data, and test data each contain manually judged positive and negative examples.

As the second corpus, 5 × 10^7 pages of web text from the aforementioned TSUBAKI were used. From this web text, 6 × 10^6 second relation pair candidates were obtained using lexico-syntactic patterns for the upper-lower relation. From the web text, 9500 second relation pair candidates were extracted at random, and whether each had the semantic relation (upper-lower relation) was judged manually. Of these, 7500 pairs were used as learning data, and the remaining 2000 pairs were divided equally into development data and test data. The learning data, development data, and test data each contain manually judged positive and negative examples.
In this experimental example, the proportion of candidates that actually had the semantic relation (upper-lower relation) in the set of first relation pair candidates and in the set of second relation pair candidates was not very high, at about 25 to 30%.

The first unrelated pair candidates were extracted using the category system of WIKIPEDIA (registered trademark). That is, because the first relation pair candidates were extracted from the main articles and the first unrelated pair candidates were extracted using the category system, the part of the first corpus used to extract the first relation pair candidates differs from the part used to extract the first unrelated pair candidates. As described above, a first unrelated pair candidate is a pair in which neither linguistic expression is an ancestor of the other. The second unrelated pair candidates were extracted from the TSUBAKI pages using lexico-syntactic patterns for causal relations and the like.

In this experimental example, TinySVM (http://chasen.org/~taku/software/TinySVM/) with a polynomial kernel of degree d = 2 was used as the first and second classification units 23 and 24. The parameters α, β, and N were determined by experiments using the development data; in this experimental example, α = 1.0, β = 0.3, and N = 900. Evaluation was performed using precision (P), recall (R), and F-measure (F).
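For reference, the three evaluation measures can be computed from gold and predicted labels as sketched below. The description does not state which variant of the F-measure was used; the balanced form F = 2PR / (P + R) is assumed here, and the function name `precision_recall_f` is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f(gold: Sequence[int],
                       predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Return (P, R, F) for binary labels, where 1 marks a positive example."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g != 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure (assumed); defined as 0 when P + R is 0.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```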

In this experimental example, six systems were compared. Three of them, B1, B2, and B3, serve to show the effect of different feature sets and different learning data. In B1 and B2, two classification units were trained separately, whereas in B3 an integrated feature set and integrated learning data were used to train a single classification unit.

B1 consists of completely independent classification units. The S and U classification units were each trained and evaluated using their own features and learning data. That is, the WIKIPEDIA (registered trademark) features and learning data were used for the S classification unit, and the web features and learning data were used for the U classification unit.

B2 is the same as B1 except that the two classification units were trained on the integrated learning data. That is, each of the two classification units performed machine learning with 27,500 pieces of learning data, while each used its own separate features. In the machine learning of the U classification unit, features such as distance do not exist for learning data acquired from the web text; training was performed treating those features as absent.

B3 adds a master classification unit to B1. Like B2, it was trained on the integrated learning data. Machine learning was performed using all available features; that is, the same features were used in both of the two classification units. In addition, the SVM scores that the two classification units of B1 assigned to each pair were included as features.

The other three systems, BICO, Co-B, and Co-STAR (the mutual machine learning device 1 according to the present embodiment), are for comparing bilingual mutual machine learning (BICO) with mutual machine learning (Co-B and Co-STAR). In particular, Co-B and Co-STAR are compared to evaluate the effect of using or not using virtual common pairs. Co-B and Co-STAR were run with the same initial learning data as B1 and with the same initial learning data as B2. Runs using the same initial learning data as B1 are denoted Co-B and Co-STAR, and runs using the same initial learning data as B2 are denoted Co-B* and Co-STAR*. That is, for Co-B and Co-STAR, the first learning data consists of 20,000 pieces (extracted from WIKIPEDIA (registered trademark)) and the second learning data consists of 7,500 pieces (extracted from the web), whereas for Co-B* and Co-STAR*, the first and second learning data each consist of 27,500 pieces (those extracted from WIKIPEDIA (registered trademark) combined with those extracted from the web).

ＢＩＣＯは、前述の非特許文献２に記載されている二言語相互機械学習アルゴリズムを用いたものである。そのアルゴリズムでは、二言語の上位下位の意味的関係が協同的に２個の処理によって取得されていく。そのＢＩＣＯのために、２００００個の英語の学習データと、２００００個の日本語の学習データとを用意した。なお、その２００００個の日本語の学習データは、前述のＷＩＫＩＰＥＤＩＡ（登録商標）から取得した学習データと同じものである。 BICO uses a bilingual mutual machine learning algorithm described in Non-Patent Document 2 described above. In the algorithm, the upper and lower semantic relationships of two languages are acquired cooperatively by two processes. For the BICO, 20000 pieces of English learning data and 20000 pieces of Japanese learning data were prepared. Note that the 20,000 pieces of Japanese learning data are the same as the learning data acquired from the aforementioned WIKIPEDIA (registered trademark).

Co-B is a variant of the mutual machine learning device 1 (Co-STAR) according to the present embodiment and, as described above, uses only genuine common pairs. In this experimental example, 67,000 genuine common pairs were used.

Ｃｏ−ＳＴＡＲは、本実施の形態による相互機械学習装置１であり、ジェニュイン共通ペアとバーチャル共通ペアとの両方を用いた。それらの共通ペアの総数は６４３０００個であった。 Co-STAR is a mutual machine learning apparatus 1 according to the present embodiment, and uses both a genuine common pair and a virtual common pair. The total number of these common pairs was 643000.

The experimental results are shown in FIG. 6. They were obtained by starting from the initial learning data and features described above (all of the features described earlier), repeating machine learning, classification of the common pairs, and addition of learning data in sequence until the termination condition was satisfied, and then classifying the test data. A classification of a test item was judged correct or incorrect according to whether the result of the classification unit matched the manual judgment. The termination condition adopted was that the value of d^i be less than 0.001 three times in succession. Because BICO processed WIKIPEDIA (registered trademark) data in two languages, it has no result for the web data. In the results of FIG. 6, WebSet denotes the results obtained with the web text (that is, the results of classification by the second classification unit 24), and WikiSet denotes the results obtained with WIKIPEDIA (registered trademark) (that is, the results of classification by the first classification unit 23).
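The termination condition can be sketched as a small check over the per-iteration values of d^i. This is only an illustration: how d^i is computed is defined elsewhere in the description and is treated here as an input, and the names `should_stop` and `run_until_converged` as well as the `max_iterations` safety cap are assumptions that do not appear in the original.

```python
def should_stop(d_history, epsilon=0.001, patience=3):
    """True when the last `patience` values of d^i are all below epsilon."""
    if len(d_history) < patience:
        return False
    return all(d < epsilon for d in d_history[-patience:])


def run_until_converged(step, max_iterations=1000):
    """Call step() once per iteration until the termination condition holds.

    step: hypothetical callable that performs one round (machine learning,
          classification of common pairs, addition of learning data) and
          returns that round's d^i.
    max_iterations: safety cap, an assumption not taken from the text.
    """
    d_history = []
    for _ in range(max_iterations):
        d_history.append(step())
        if should_stop(d_history):
            break
    return d_history


# Hypothetical d^i values; the loop stops after the third small value.
values = iter([0.5, 0.0009, 0.0008, 0.0007, 0.1])
history = run_until_converged(lambda: next(values))
print(len(history))
```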

Comparing B1 to B3 shows that B2 and B3 are superior to B1 in F value. Because B2 and B3 used more learning data (27,500 pieces), they gave better results than B1, which used less learning data (7,500 and 20,000 pieces). Although B2 and B3 differ in the number of classification units, and those classification units were trained with different features and learning data, their F values are similar.

Ｃｏ−ＳＴＡＲは、Ｂ１〜Ｂ３よりもより性能が優れていることが分かる。また、Ｃｏ−ＳＴＡＲは、ＢＩＣＯに対しても、より少ない学習データで、よりよい性能であることが分かる。なお、Ｃｏ−ＳＴＡＲの学習データは全部で２７５００個であり、ＢＩＣＯの学習データは全部で４００００個である。Ｃｏ−ＢとＣｏ−ＳＴＡＲとの性能の違いは、バーチャル共通ペアの使用の有無の効果を示している。Ｃｏ−ＢよりもＣｏ−ＳＴＡＲのほうがＦ値が高いことによって、ジェニュイン共通ペアと共にバーチャル共通ペアを用いた方が、２個の分類部のより効果的な協同を実現できることが分かる。 It can be seen that Co-STAR has better performance than B1 to B3. It can also be seen that Co-STAR has better performance with less learning data than BICO. The total learning data for Co-STAR is 27500, and the total learning data for BICO is 40000. The difference in performance between Co-B and Co-STAR shows the effect of using or not using a virtual common pair. Since Co-STAR has a higher F value than Co-B, it can be seen that using the virtual common pair together with the genuine common pair can realize more effective cooperation between the two classification units.

Thus, the mutual machine learning device 1 (Co-STAR) according to the present embodiment achieves an F value 1.4 to 8.5% higher than the other methods, which shows that it learns with higher performance than they do. With the mutual machine learning device 1 trained in this way, 4.3 × 10^5 first relationship pairs (pairs in an upper-lower relation) could be acquired from the web text, and 4.6 × 10^6 second relationship pairs (pairs in an upper-lower relation) could be acquired from WIKIPEDIA (registered trademark). In addition, a precision of 90% could be obtained by setting the SVM threshold to 0.23 for the web data and to 0.1 for WIKIPEDIA (registered trademark).
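The per-corpus thresholds can be applied as in the following sketch. The threshold values 0.23 and 0.1 come from the text; the dictionary keys, the function name `select_pairs`, the sample scores, and the choice of "greater than or equal" as the comparison are assumptions for illustration.

```python
# Thresholds on the SVM score reported in the text for 90% precision.
THRESHOLDS = {"web": 0.23, "wikipedia": 0.1}


def select_pairs(scored_pairs, source):
    """Keep the pairs whose SVM score reaches the threshold for `source`.

    scored_pairs: iterable of (pair, svm_score).
    Assumption: a pair is accepted when score >= threshold.
    """
    threshold = THRESHOLDS[source]
    return [pair for pair, score in scored_pairs if score >= threshold]


# Hypothetical scored candidates.
scored = [
    (("infection", "novel influenza"), 0.50),
    (("vehicle", "novel influenza"), 0.15),
    (("fruit", "hammer"), -0.30),
]
print(select_pairs(scored, "web"))
print(select_pairs(scored, "wikipedia"))
```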

Next, an experimental example evaluating the robustness of the mutual machine learning device 1 according to the present embodiment will be described. In this experimental example, the same manually prepared learning data as in the preceding experimental example was used for the structured corpus (WIKIPEDIA (registered trademark)), and automatically acquired, noisy (that is, not necessarily highly accurate) learning data was used for the unstructured corpus (web text). That learning data is briefly described here. The positive examples were acquired as follows. First, two kinds of upper-lower (hypernym-hyponym) relation pairs were acquired: pairs obtained from the definition sentences of WIKIPEDIA (registered trademark) (the first sentence of each article) using patterns such as "(lower language expression) is (upper language expression)" and "(lower language expression) is a kind of (upper language expression)", and pairs obtained using the categories of WIKIPEDIA (registered trademark). To acquire pairs using the categories, candidate pairs were first formed with an article title as the lower language expression and a category of that title as the upper language expression. Such a candidate was accepted as an upper-lower relation pair only when an upper-lower relation pair having that title as its lower language expression had been obtained from the definition sentence using the patterns, and the head word of the upper language expression of that pair matched the head word of the category. For example, from the title "novel influenza" (新型インフルエンザ) and the definition sentence "Novel influenza is an influenza infection whose pathogen is an influenza virus that has newly acquired the ability to transmit from human to human", the definition-sentence pattern yields the upper-lower relation (influenza infection, novel influenza) as a positive pair. If the categories of "novel influenza" include "viral infection" (ウイルス感染症), the head word of that category, "infection" (感染症), matches the head word of the upper language expression obtained with the definition-sentence pattern, so "viral infection" is also acquired as an upper language expression of "novel influenza". That is, the upper-lower relation (viral infection, novel influenza) is acquired from the category as a positive pair.
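The head-word check described above can be sketched as follows. This is a simplified illustration, not the extraction procedure itself: it assumes each expression is already segmented into words and that the head word is the last word, and the names `head` and `positive_pairs` and the second category in the example are hypothetical.

```python
def head(tokens):
    """Head word of a segmented expression.

    Assumption: the head is the final word of the expression.
    """
    return tokens[-1]


def positive_pairs(title, definition_hypernym, categories):
    """Collect (upper expression, lower expression) positive pairs.

    title:               article title, used as the lower expression.
    definition_hypernym: segmented upper expression obtained from the
                         definition sentence with a pattern.
    categories:          segmented category names of the article.
    """
    pairs = [(" ".join(definition_hypernym), title)]
    for category in categories:
        # Accept a category only if its head word matches the head word
        # of the upper expression found in the definition sentence.
        if head(category) == head(definition_hypernym):
            pairs.append((" ".join(category), title))
    return pairs


pairs = positive_pairs(
    "novel influenza",
    ("influenza", "infection"),
    [("viral", "infection"), ("influenza", "viruses")],
)
print(pairs)
```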

Of the upper-lower relations acquired in this way, those in common with the second relationship pair candidates were used as positive examples of the learning data. Negative examples were taken from the virtual common pairs, specifically from R_S ∩ X_U. Because the learning data acquired in this way is very large, 7,500 pieces were selected at random so as to match the size used in the preceding experimental example. The selection was made so that the ratio of positive to negative examples was 1:4.
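The random selection can be sketched as below: with 7,500 pieces and a 1:4 ratio, 1,500 positive and 6,000 negative examples are drawn. The function name `sample_learning_data`, the fixed seed, and the synthetic candidate pairs are assumptions for illustration.

```python
import random


def sample_learning_data(positives, negatives, total=7500, neg_per_pos=4, seed=0):
    """Randomly pick `total` labeled pairs with a 1:neg_per_pos ratio.

    positives, negatives: lists of candidate pairs.
    seed: fixed only to make this sketch reproducible (an assumption).
    """
    n_pos = total // (1 + neg_per_pos)  # 7500 // 5 = 1500
    n_neg = total - n_pos               # 6000
    rng = random.Random(seed)
    sample = [(pair, True) for pair in rng.sample(positives, n_pos)]
    sample += [(pair, False) for pair in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample


# Synthetic candidates standing in for the automatically acquired pairs.
positives = [(f"upper{i}", f"lower{i}") for i in range(3000)]
negatives = [(f"x{i}", f"y{i}") for i in range(10000)]
data = sample_learning_data(positives, negatives)
print(len(data), sum(label for _, label in data))
```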

An experiment similar to the preceding experimental example was performed with this learning data, and the results are shown in FIG. 7. The results of FIG. 7 show that the mutual machine learning device 1 (Co-STAR) according to the present embodiment is robust to learning data of low accuracy. Although its performance is slightly lower than in the results of FIG. 6, the mutual machine learning device 1 (Co-STAR) still performs better than B1 to B3. Therefore, when a slight drop in performance is acceptable, the labor of preparing learning data by hand can be reduced.

As described above, the mutual machine learning device 1 according to the present embodiment performs mutual machine learning using common pairs, that is, both genuine common pairs and virtual common pairs, and can thereby realize mutual machine learning with higher performance. Because Non-Patent Documents 1 and 2 use only genuine common pairs, the mutual machine learning device 1 according to the present embodiment, in which two machine learners cooperate through virtual common pairs, achieved higher performance than the methods of Non-Patent Documents 1 and 2. The mutual machine learning device 1 according to the present embodiment is also robust to learning data of low accuracy, so the manual work involved in preparing learning data can be reduced. Furthermore, the mutual machine learning device 1 according to the present embodiment allows the first and second classification units 23 and 24 to handle different processing targets, such as structured data and unstructured data.

このようにして、本実施の形態による相互機械学習装置１を用いて取得された意味的関係は、例えば、ウェブ検索などの情報検索システムや、機械翻訳システムなどで用いることができる。具体的には、ウェブ検索において、意味的関係を用いたクエリの拡張が可能となる。例えば、辞書に登録されていない未知語が入力された場合に、その未知語を下位語とする上位下位の関係が本実施の形態による相互機械学習装置１によって取得されているのであれば、その未知語の上位語による検索を行うことができる。また、機械翻訳システムにおいても、意味的関係を有することによって、より適切な訳語を選択することができると共に、訳語の登録がなくても、その上位語を用いて翻訳するなどの柔軟な翻訳を行うことができる。なお、本実施の形態による相互機械学習装置１を用いて取得された意味的関係の使用方法はこれらに限定されるものではなく、他の種々の活用方法があることは言うまでもない。 Thus, the semantic relationship acquired using the mutual machine learning device 1 according to the present embodiment can be used in, for example, an information search system such as a web search, a machine translation system, or the like. Specifically, in web search, it is possible to expand a query using a semantic relationship. For example, when an unknown word that is not registered in the dictionary is input, if the upper-lower relationship with the unknown word as a lower word is acquired by the mutual machine learning device 1 according to the present embodiment, It is possible to perform a search using a broader term of the unknown word. Also in the machine translation system, it is possible to select a more appropriate translated word by having a semantic relationship, and to perform flexible translation such as translating using the broader word without registering the translated word. It can be carried out. It should be noted that the method of using the semantic relationship acquired using the mutual machine learning apparatus 1 according to the present embodiment is not limited to these, and it goes without saying that there are various other utilization methods.
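The query expansion described above can be sketched as follows. The dictionary, the table of acquired relations, and the function name `expand_query` are hypothetical; a real retrieval system would apply the expansion inside its own query pipeline.

```python
def expand_query(terms, dictionary, hypernyms):
    """Add acquired upper terms for query terms missing from the dictionary.

    terms:      query terms in order.
    dictionary: set of known terms.
    hypernyms:  mapping from a lower term to its acquired upper terms.
    """
    expanded = []
    for term in terms:
        expanded.append(term)
        if term not in dictionary:
            # Unknown word: also search by its upper terms, if any were acquired.
            expanded.extend(hypernyms.get(term, []))
    return expanded


dictionary = {"symptoms"}
hypernyms = {"novel influenza": ["influenza infection", "viral infection"]}
print(expand_query(["novel influenza", "symptoms"], dictionary, hypernyms))
```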

In the present embodiment, the virtual common pairs may be extended. That is, the set V of virtual common pairs may also include pairs outside the region V shown in FIG. 4. For example, a virtual common pair may be any pair, among the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, the plurality of first irrelevant pair candidates, and the plurality of second irrelevant pair candidates, that is not a genuine common pair. In that case, the set V of virtual common pairs also includes the unshaded portions of X_S, R_S, X_U, and R_U in FIG. 4. Strictly speaking, the pairs in the unshaded portions of X_S, R_S, X_U, and R_U are not common pairs, but because the virtual common pairs are being extended here, those pairs are also called common pairs for convenience. That is, in this case, the set V of virtual common pairs is a set of pairs that includes the common pairs in the original sense (pairs common to two sets).
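The extended set can be written with plain set operations, as in this sketch. It assumes that the genuine common pairs are the pairs shared by R_S and R_U; that definition is given elsewhere in the description and is restated here only as an assumption, and the function name and sample pairs are hypothetical.

```python
def extended_virtual_common_pairs(r_s, x_s, r_u, x_u):
    """Return (genuine, extended_virtual) sets of pairs.

    r_s, r_u: first and second relationship pair candidates.
    x_s, x_u: first and second irrelevant pair candidates.
    Assumption: genuine common pairs are those in both r_s and r_u.
    """
    genuine = r_s & r_u
    # Extended V: every candidate pair that is not a genuine common pair.
    extended_virtual = (r_s | x_s | r_u | x_u) - genuine
    return genuine, extended_virtual


r_s = {("a", "b"), ("c", "d"), ("e", "f")}
x_s = {("g", "h")}
r_u = {("a", "b"), ("g", "h")}
x_u = {("c", "d"), ("i", "j")}
genuine, virtual = extended_virtual_common_pairs(r_s, x_s, r_u, x_u)
print(sorted(genuine), sorted(virtual))
```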

In the present embodiment, the case has been described in which the adding unit 25 adds to the learning data the common pairs that the first and second classification units 23 and 24 classified with a high certainty factor, together with their classification results, but the adding unit 25 may perform other processing. That is, the adding unit 25 may add a common pair and its classification result to the second learning data according to at least one of the classification result and the certainty factor obtained for that common pair by the first classification unit 23, and may add a common pair and its classification result to the first learning data according to at least one of the classification result and the certainty factor obtained for that common pair by the second classification unit 24. Here, "adding a common pair and its classification result to the learning data according to at least one of the classification result and the certainty factor of the common pair" means that the common pair and its classification result are added to the learning data when at least one of the classification result and the certainty factor satisfies a predetermined condition, and are not added when that condition is not satisfied. The predetermined condition may concern only the classification result, only the certainty factor, or both. For example, when the first and second classification units 23 and 24 give the same classification result for a common pair, the adding unit 25 may add that common pair and its classification result to the first and second learning data. As another example, the adding unit 25 may add to the second learning data common pairs selected at random from those that the first classification unit 23 classified with a high certainty factor, together with their classification results, and may add to the first learning data common pairs selected at random from those that the second classification unit 24 classified with a high certainty factor, together with their classification results. As another example, the adding unit 25 may add to the second learning data the positive-example common pairs among those that the first classification unit 23 classified with a high certainty factor, together with their classification results, and may add to the first learning data the positive-example common pairs among those that the second classification unit 24 classified with a high certainty factor, together with their classification results. As yet another example, the adding unit 25 may add to the second learning data the common pairs that the first classification unit 23 classified as positive examples, together with their classification results, and may add to the first learning data the common pairs that the second classification unit 24 classified as positive examples, together with their classification results.
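Some of the alternative conditions for the adding unit 25 can be sketched as simple filters over classification results. The representation of a result as a `(pair, label, confidence)` tuple, the function names, and the numeric confidence values are assumptions for illustration; the random-selection variant is omitted.

```python
def select_for_addition(results, min_confidence=None, positives_only=False):
    """Pick (pair, label) items that satisfy the chosen condition.

    results: list of (pair, label, confidence) from one classification unit.
    min_confidence: if given, keep only results at or above this value.
    positives_only: if True, keep only results classified as positive.
    """
    selected = []
    for pair, label, confidence in results:
        if min_confidence is not None and confidence < min_confidence:
            continue
        if positives_only and not label:
            continue
        selected.append((pair, label))
    return selected


def agreeing(results_1, results_2):
    """Pairs for which both classification units gave the same label."""
    labels_2 = {pair: label for pair, label, _ in results_2}
    return [(pair, label) for pair, label, _ in results_1
            if labels_2.get(pair) == label]


results_1 = [
    (("fruit", "apple"), True, 0.9),
    (("fruit", "car"), False, 0.8),
    (("animal", "dog"), True, 0.2),
]
results_2 = [(("fruit", "apple"), True, 0.4), (("fruit", "car"), True, 0.3)]

# Items selected from unit 1 would be added to the second learning data.
print(select_for_addition(results_1, min_confidence=0.5, positives_only=True))
print(agreeing(results_1, results_2))
```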

The present embodiment has mainly been described for the case where the first corpus has structure and the second corpus does not, but this need not be so. For example, both may have structure. Even in that case, the first relationship pair candidates and the like may be extracted from the first corpus using its structure, as in the present embodiment, and the second relationship pair candidates and the like may be extracted from the second corpus using lexico-syntactic patterns, as in the present embodiment. When the semantic relation is not an upper-lower relation, the first relationship pair candidates and the like and the second relationship pair candidates and the like are preferably extracted using a structure or the like suited to that semantic relation. For example, the first relationship pair candidates and the like may be extracted using document structure, inter-document structure, table structure, or the like.

In the present embodiment, the case has been described in which the first extraction unit 13 extracts the first relationship pair candidates and the first irrelevant pair candidates and the second extraction unit 14 extracts the second relationship pair candidates and the second irrelevant pair candidates, but this need not be so. In that case, the mutual machine learning device 1 need not include the first corpus storage unit 11, the second corpus storage unit 12, the first extraction unit 13, or the second extraction unit 14. Also in that case, the first relationship pair candidates, the first irrelevant pair candidates, the second relationship pair candidates, and the second irrelevant pair candidates may have been extracted in the same way as by the first and second extraction units 13 and 14, or may have been extracted by another method (for example, manually). The process by which the first relationship pair candidates and the like come to be stored in the first relationship pair candidate storage unit 15, the first irrelevant pair candidate storage unit 16, the second relationship pair candidate storage unit 17, and the second irrelevant pair candidate storage unit 18 does not matter. For example, the first relationship pair candidates and the like may be stored in the first relationship pair candidate storage unit 15 and the like via a recording medium, or first relationship pair candidates and the like transmitted via a communication line or the like may be stored there. Because features are required for machine learning and classification, when the mutual machine learning device 1 does not include the first and second corpus storage units 11 and 12, feature information is preferably associated in advance with the pairs of language expressions such as the first relationship pair candidates. The first and second classification units 23 and 24 can perform machine learning and classification using that feature information.

In the present embodiment, the case has been described in which the device includes the first and second relationship pair storage units 26 and 27, in which the first and second relationship pairs resulting from classification of the first and second relationship pair candidates by the first and second classification units 23 and 24 are accumulated. However, when the first and second classification units 23 and 24 do not classify the first and second relationship pair candidates, or when they do not accumulate the first and second relationship pairs (for example, when they instead set a flag on the first and second relationship pairs stored in the first and second relationship pair candidate storage units 15 and 17), the mutual machine learning device 1 need not include the first and second relationship pair storage units 26 and 27.

In the present embodiment, the case has been described in which the acquisition unit 19 acquires the common pairs, but this need not be so. In that case, the mutual machine learning device 1 need not include the first relationship pair candidate storage unit 15, the first irrelevant pair candidate storage unit 16, the second relationship pair candidate storage unit 17, the second irrelevant pair candidate storage unit 18, or the acquisition unit 19. Also in that case, the process by which the common pairs (genuine common pairs and virtual common pairs) come to be stored in the common pair storage unit 20 does not matter. For example, the common pairs may be stored in the common pair storage unit 20 via a recording medium, or common pairs transmitted via a communication line or the like may be stored in the common pair storage unit 20.

また、本実施の形態による相互機械学習装置１は、当該装置内で生成された情報を出力する図示しない出力部をさらに備えてもよい。その出力対象の情報は、例えば、第１関係ペア記憶部２６で記憶される第１関係ペアであってもよく、第２関係ペア記憶部２７で記憶される第２関係ペアであってもよく、第１の分類部２３による学習結果の情報であってもよく、第２の分類部２４による学習結果の情報であってもよく、その他の情報であってもよい。その図示しない出力部による出力は、例えば、所定の機器への通信回線を介した送信でもよく、記録媒体への蓄積でもよい。なお、その図示しない出力部は、出力を行うデバイス（例えば、通信デバイスなど）を含んでもよく、あるいは含まなくてもよい。また、その図示しない出力部は、ハードウェアによって実現されてもよく、あるいは、それらのデバイスを駆動するドライバ等のソフトウェアによって実現されてもよい。 In addition, the mutual machine learning device 1 according to the present embodiment may further include an output unit (not shown) that outputs information generated in the device. The output target information may be, for example, the first relationship pair stored in the first relationship pair storage unit 26 or the second relationship pair stored in the second relationship pair storage unit 27. The information of the learning result by the first classification unit 23, the information of the learning result by the second classification unit 24, or other information may be used. The output from the output unit (not shown) may be, for example, transmitted via a communication line to a predetermined device or may be stored in a recording medium. The output unit (not shown) may or may not include a device (for example, a communication device) that performs output. The output unit (not shown) may be realized by hardware, or may be realized by software such as a driver that drives these devices.

In the present embodiment, the case has been described in which, after machine learning, the first and second classification units 23 and 24 classify the first relationship pair candidates and the second relationship pair candidates, but the first and second classification units 23 and 24 may also classify the first irrelevant pair candidates and the second irrelevant pair candidates.

In the present embodiment, the case has been described in which mutual machine learning is performed using the first corpus and the second corpus, but it goes without saying that mutual machine learning similar to that of the mutual machine learning device 1 according to the present embodiment may be performed using three or more corpora. Even in that case, when attention is paid to two of the three or more corpora, the same processing as in the mutual machine learning device 1 according to the present embodiment is performed.

また、上記実施の形態では、相互機械学習装置１がスタンドアロンである場合について説明したが、相互機械学習装置１は、スタンドアロンの装置であってもよく、サーバ・クライアントシステムにおけるサーバ装置であってもよい。後者の場合には、出力部等は、例えば、通信回線を介して情報を出力してもよい。 Moreover, although the case where the mutual machine learning device 1 is a stand-alone has been described in the above embodiment, the mutual machine learning device 1 may be a stand-alone device or a server device in a server / client system. Good. In the latter case, the output unit or the like may output information via a communication line, for example.

In the above embodiment, each process or function may be realized by centralized processing on a single device or a single system, or by distributed processing across a plurality of devices or a plurality of systems.

In the above embodiment, information related to the processing executed by each component (for example, information that each component has accepted, acquired, selected, generated, transmitted, or received, and information such as threshold values, formulas, and addresses that each component uses in its processing) may be held temporarily or over a long period on a recording medium (not shown), even where the above description does not say so explicitly. Each component, or an accumulation unit (not shown), may store the information on that recording medium. Likewise, each component, or a reading unit (not shown), may read the information from that recording medium.

In the above embodiment, where information used by each component (for example, threshold values, addresses, and various setting values used in processing) may be changed by the user, the user may or may not be allowed to change that information as appropriate, even where the above description does not say so explicitly. Where the user can change the information, the change may be realized by, for example, a reception unit (not shown) that accepts a change instruction from the user and a change unit (not shown) that changes the information in accordance with the instruction. The reception unit may accept the change instruction from an input device, may receive it as information transmitted via a communication line, or may accept it as information read from a predetermined recording medium.

In the above embodiment, when two or more components of the mutual machine learning device 1 each have a communication device, an input device, or the like, those components may share a physically single device or may have separate devices.

In the above embodiment, each component may be configured by dedicated hardware, or, for components that can be realized in software, may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. The software that realizes the mutual machine learning device 1 in the above embodiment is the following program. The program runs on a computer that can access a common pair storage unit, a first learning data storage unit, and a second learning data storage unit. The common pair storage unit stores genuine common pairs and virtual common pairs. A genuine common pair is a common pair shared by a plurality of first relationship pair candidates, extracted from a first corpus by a first method as candidates for pairs of linguistic expressions having a semantic relationship, and a plurality of second relationship pair candidates, extracted from a second corpus by a second method different from the first method as candidates for pairs of linguistic expressions having the semantic relationship. A virtual common pair is either a common pair shared by a plurality of first unrelated pair candidates, extracted from the first corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of second relationship pair candidates, or a common pair shared by a plurality of second unrelated pair candidates, extracted from the second corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of first relationship pair candidates. The first learning data storage unit stores first learning data, which is teacher data used in machine learning for classifying whether the first relationship pair candidates have the semantic relationship. The second learning data storage unit stores second learning data, which is teacher data used in machine learning for classifying whether the second relationship pair candidates have the semantic relationship. The program causes the computer to function as: a first classification unit that performs machine learning using the first learning data and, using the result of that machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; a second classification unit that performs machine learning using the second learning data and, using the result of that machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; and an adding unit that, according to at least one of the classification result and the certainty factor of a common pair as classified by the first classification unit, adds the common pair and its classification result to the second learning data, and, according to at least one of the classification result and the certainty factor of a common pair as classified by the second classification unit, adds the common pair and its classification result to the first learning data. The machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.
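The repeated cycle of learning, classification, and addition that this program carries out can be illustrated with a minimal Python sketch. It is an illustration only and is not taken from the patent text. The names `co_train`, `Trainer`, `Classifier`, `threshold`, and `rounds` are hypothetical, the fixed threshold is just one assumed way of acting on the certainty factor, and skipping pairs that the receiving learning data already contains is likewise an assumption.

```python
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]                             # a pair of linguistic expressions
Labeled = Dict[Pair, bool]                         # pair -> has the semantic relationship?
Classifier = Callable[[Pair], Tuple[bool, float]]  # pair -> (label, certainty factor)
Trainer = Callable[[Labeled], Classifier]          # learning data -> trained classifier


def co_train(train1: Trainer, train2: Trainer,
             data1: Labeled, data2: Labeled,
             common_pairs: Iterable[Pair],
             threshold: float = 0.8,
             rounds: int = 10) -> Tuple[Labeled, Labeled]:
    """Hypothetical sketch of the repeated learn / classify / add cycle."""
    data1, data2 = dict(data1), dict(data2)        # work on copies of the inputs
    common = list(common_pairs)
    for _ in range(rounds):
        # First and second classification units: each learns from its own data.
        clf1, clf2 = train1(data1), train2(data2)
        new1: Labeled = {}
        new2: Labeled = {}
        for pair in common:
            label1, conf1 = clf1(pair)
            label2, conf2 = clf2(pair)
            # Adding unit: a confident result from one side becomes learning
            # data for the other side (assumed threshold rule).
            if conf1 >= threshold and pair not in data2:
                new2[pair] = label1
            if conf2 >= threshold and pair not in data1:
                new1[pair] = label2
        if not new1 and not new2:
            break                                  # nothing left to exchange
        data1.update(new1)
        data2.update(new2)
    return data1, data2
```

Any learner that returns a label and a certainty factor for a pair can be plugged in as `train1` and `train2`. The additions are collected per round and applied after both classifiers have seen every common pair, so neither classifier is influenced by the other within a single round.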

The software that realizes the mutual machine learning device 1 in the above embodiment may instead be the following program. The program runs on a computer that can access a common pair storage unit, a first learning data storage unit, and a second learning data storage unit. The common pair storage unit stores genuine common pairs and virtual common pairs. A genuine common pair is a common pair shared by a plurality of first relationship pair candidates, extracted from a first corpus by a first method as candidates for pairs of linguistic expressions having a semantic relationship, and a plurality of second relationship pair candidates, extracted from a second corpus by a second method different from the first method as candidates for pairs of linguistic expressions having the semantic relationship. A virtual common pair is a common pair that is not a genuine common pair among the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, a plurality of first unrelated pair candidates, extracted from the first corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and a plurality of second unrelated pair candidates, extracted from the second corpus as candidates for pairs of linguistic expressions not having the semantic relationship. The first learning data storage unit stores first learning data, which is teacher data used in machine learning for classifying whether the first relationship pair candidates have the semantic relationship. The second learning data storage unit stores second learning data, which is teacher data used in machine learning for classifying whether the second relationship pair candidates have the semantic relationship. The program causes the computer to function as: a first classification unit that performs machine learning using the first learning data and, using the result of that machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; a second classification unit that performs machine learning using the second learning data and, using the result of that machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; and an adding unit that, according to at least one of the classification result and the certainty factor of a common pair as classified by the first classification unit, adds the common pair and its classification result to the second learning data, and, according to at least one of the classification result and the certainty factor of a common pair as classified by the second classification unit, adds the common pair and its classification result to the first learning data. The machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.
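The two program variants differ only in how the virtual common pairs are defined. A minimal sketch in Python, treating each candidate list as a set, may make the difference easier to see. The function names are hypothetical, and reading "common pair" in the second variant as a pair found among the candidates of both corpora is an assumption rather than something the text states in these terms.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # a pair of linguistic expressions


def genuine_common(rel1: Set[Pair], rel2: Set[Pair]) -> Set[Pair]:
    """Pairs that are relationship pair candidates in both corpora."""
    return rel1 & rel2


def virtual_common_v1(rel1: Set[Pair], unrel1: Set[Pair],
                      rel2: Set[Pair], unrel2: Set[Pair]) -> Set[Pair]:
    """First variant: an unrelated pair candidate from one corpus that is
    a relationship pair candidate in the other corpus."""
    return (unrel1 & rel2) | (unrel2 & rel1)


def virtual_common_v2(rel1: Set[Pair], unrel1: Set[Pair],
                      rel2: Set[Pair], unrel2: Set[Pair]) -> Set[Pair]:
    """Second variant (assumed reading): any pair found among the candidates
    of both corpora that is not a genuine common pair."""
    return ((rel1 | unrel1) & (rel2 | unrel2)) - (rel1 & rel2)
```

Under this reading, the second variant additionally admits pairs that are unrelated pair candidates in both corpora, which the first variant excludes.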

Note that the functions realized by the program do not include functions that can be realized only by hardware. For example, functions that can be realized only by hardware such as a modem or an interface card, in an acquisition unit that acquires information or an output unit that outputs information, are at least not included in the functions realized by the program.

The program may be executed after being downloaded from a server or the like, or may be executed by being read from a predetermined recording medium (for example, an optical disc such as a CD-ROM, a magnetic disk, or a semiconductor memory). The program may also be used as a program constituting a program product.

The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.

FIG. 8 is a schematic diagram showing an example of the external appearance of a computer that executes the program to realize the mutual machine learning device 1 according to the above embodiment. The above embodiment can be realized by computer hardware and a computer program executed on it.

In FIG. 8, a computer system 900 includes a computer 901, which includes a CD-ROM (Compact Disc Read-Only Memory) drive 905 and an FD (Floppy (registered trademark) Disk) drive 906, together with a keyboard 902, a mouse 903, and a monitor 904.

FIG. 9 is a diagram showing the internal configuration of the computer system 900. In FIG. 9, in addition to the CD-ROM drive 905 and the FD drive 906, the computer 901 includes an MPU (Micro Processing Unit) 911; a ROM 912 for storing programs such as a boot-up program; a RAM (Random Access Memory) 913 that is connected to the MPU 911, temporarily stores the instructions of application programs, and provides temporary storage space; a hard disk 914 that stores application programs, system programs, and data; and a bus 915 that interconnects the MPU 911, the ROM 912, and the other components. The computer 901 may also include a network card (not shown) that provides a connection to a LAN.

The program that causes the computer system 900 to execute the functions of the mutual machine learning device 1 according to the above embodiment may be stored on a CD-ROM 921 or an FD 922, which is inserted into the CD-ROM drive 905 or the FD drive 906, and then transferred to the hard disk 914. Alternatively, the program may be transmitted to the computer 901 via a network (not shown) and stored on the hard disk 914. The program is loaded into the RAM 913 at execution time. The program may also be loaded directly from the CD-ROM 921, the FD 922, or the network.

The program need not include an operating system (OS), a third-party program, or the like that causes the computer 901 to execute the functions of the mutual machine learning device 1 according to the above embodiment. The program may include only those instructions that call the appropriate functions (modules) in a controlled manner so that the desired results are obtained. How the computer system 900 operates is well known, and a detailed description is omitted.

Needless to say, the present invention is not limited to the above embodiment; various modifications are possible, and these are also included within the scope of the present invention.

As described above, the mutual machine learning device and the like according to the present invention make it possible to realize more accurate machine learning, and are useful as devices and the like that perform machine learning.

Description of Symbols
1 Mutual machine learning device
11 First corpus storage unit
12 Second corpus storage unit
13 First extraction unit
14 Second extraction unit
15 First relationship pair candidate storage unit
16 First unrelated pair candidate storage unit
17 Second relationship pair candidate storage unit
18 Second unrelated pair candidate storage unit
19 Acquisition unit
20 Common pair storage unit
21 First learning data storage unit
22 Second learning data storage unit
23 First classification unit
24 Second classification unit
25 Adding unit
26 First relationship pair storage unit
27 Second relationship pair storage unit

Claims

A mutual machine learning device comprising:
a common pair storage unit that stores (i) genuine common pairs, which are common pairs shared by a plurality of first relationship pair candidates, extracted from a first corpus by a first method as candidates for pairs of linguistic expressions having a semantic relationship, and a plurality of second relationship pair candidates, extracted from a second corpus by a second method different from the first method as candidates for pairs of linguistic expressions having the semantic relationship, and (ii) virtual common pairs, which are common pairs shared by a plurality of first unrelated pair candidates, extracted from the first corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of second relationship pair candidates, and common pairs shared by a plurality of second unrelated pair candidates, extracted from the second corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of first relationship pair candidates;
a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether the first relationship pair candidates have the semantic relationship;
a first classification unit that performs machine learning using the first learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship;
a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether the second relationship pair candidates have the semantic relationship;
a second classification unit that performs machine learning using the second learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; and
an adding unit that, according to at least one of the classification result and the certainty factor of a common pair as classified by the first classification unit, adds the common pair and the classification result for the common pair to the second learning data, and, according to at least one of the classification result and the certainty factor of a common pair as classified by the second classification unit, adds the common pair and the classification result for the common pair to the first learning data,
wherein the machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.

A mutual machine learning device comprising:
a common pair storage unit that stores (i) genuine common pairs, which are common pairs shared by a plurality of first relationship pair candidates, extracted from a first corpus by a first method as candidates for pairs of linguistic expressions having a semantic relationship, and a plurality of second relationship pair candidates, extracted from a second corpus by a second method different from the first method as candidates for pairs of linguistic expressions having the semantic relationship, and (ii) virtual common pairs, which are common pairs that are not genuine common pairs among the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, a plurality of first unrelated pair candidates, extracted from the first corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and a plurality of second unrelated pair candidates, extracted from the second corpus as candidates for pairs of linguistic expressions not having the semantic relationship;
a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether the first relationship pair candidates have the semantic relationship;
a first classification unit that performs machine learning using the first learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship;
a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether the second relationship pair candidates have the semantic relationship;
a second classification unit that performs machine learning using the second learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; and
an adding unit that, according to at least one of the classification result and the certainty factor of a common pair as classified by the first classification unit, adds the common pair and the classification result for the common pair to the second learning data, and, according to at least one of the classification result and the certainty factor of a common pair as classified by the second classification unit, adds the common pair and the classification result for the common pair to the first learning data,
wherein the machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.

The mutual machine learning device according to claim 1, wherein the adding unit
adds, to the second learning data, a common pair having a high certainty factor in the classification by the first classification unit, together with the classification result for the common pair, and adds, to the first learning data, a common pair having a high certainty factor in the classification by the second classification unit, together with the classification result for the common pair.

The mutual machine learning device according to claim 3, wherein the adding unit
adds, to the second learning data, a common pair that has a high certainty factor in the classification by the first classification unit and for which the first and second classification units give the same classification result, together with the classification result for the common pair, and adds, to the first learning data, a common pair that has a high certainty factor in the classification by the second classification unit and for which the first and second classification units give the same classification result, together with the classification result for the common pair.

The mutual machine learning device according to claim 4, wherein the adding unit
adds, to the second learning data, a common pair that has a high certainty factor in the classification by the first classification unit and a low certainty factor in the classification by the second classification unit, together with the classification result for the common pair, and adds, to the first learning data, a common pair that has a high certainty factor in the classification by the second classification unit and a low certainty factor in the classification by the first classification unit, together with the classification result for the common pair.

The mutual machine learning device according to claim 1, further comprising:
a first relationship pair candidate storage unit that stores the plurality of first relationship pair candidates;
a first unrelated pair candidate storage unit that stores the plurality of first unrelated pair candidates;
a second relationship pair candidate storage unit that stores the plurality of second relationship pair candidates;
a second unrelated pair candidate storage unit that stores the plurality of second unrelated pair candidates; and
an acquisition unit that acquires the genuine common pairs using the plurality of first relationship pair candidates and the plurality of second relationship pair candidates and accumulates them in the common pair storage unit, and that acquires the virtual common pairs using the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, the plurality of first unrelated pair candidates, and the plurality of second unrelated pair candidates and accumulates them in the common pair storage unit.

The mutual machine learning device according to claim 6, further comprising:
a first corpus storage unit that stores the first corpus;
a second corpus storage unit that stores the second corpus;
a first extraction unit that extracts the plurality of first relationship pair candidates from the first corpus and accumulates them in the first relationship pair candidate storage unit, and that extracts the plurality of first unrelated pair candidates from the first corpus and accumulates them in the first unrelated pair candidate storage unit; and
a second extraction unit that extracts the plurality of second relationship pair candidates from the second corpus and accumulates them in the second relationship pair candidate storage unit, and that extracts the plurality of second unrelated pair candidates from the second corpus and accumulates them in the second unrelated pair candidate storage unit.

The mutual machine learning device according to claim 6 or 7, wherein
the first classification unit classifies the plurality of first relationship pair candidates after the machine learning and classification and the addition of learning data have been repeated, and
the second classification unit classifies the plurality of second relationship pair candidates after the machine learning and classification and the addition of learning data have been repeated.

The mutual machine learning device according to claim 1, wherein
the first corpus is a structured corpus, and
the second corpus is a corpus of unstructured natural language text.

The mutual machine learning device according to claim 1, wherein the semantic relationship is a hypernym-hyponym relationship.

A mutual machine learning method performed using: a common pair storage unit that stores (i) genuine common pairs, which are common pairs shared by a plurality of first relationship pair candidates, extracted from a first corpus by a first method as candidates for pairs of linguistic expressions having a semantic relationship, and a plurality of second relationship pair candidates, extracted from a second corpus by a second method different from the first method as candidates for pairs of linguistic expressions having the semantic relationship, and (ii) virtual common pairs, which are common pairs shared by a plurality of first unrelated pair candidates, extracted from the first corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of second relationship pair candidates, and common pairs shared by a plurality of second unrelated pair candidates, extracted from the second corpus as candidates for pairs of linguistic expressions not having the semantic relationship, and the plurality of first relationship pair candidates; a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether the first relationship pair candidates have the semantic relationship; a first classification unit; a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether the second relationship pair candidates have the semantic relationship; a second classification unit; and an adding unit, the method comprising:
a first classification step in which the first classification unit performs machine learning using the first learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship;
a second classification step in which the second classification unit performs machine learning using the second learning data and, using the result of the machine learning, classifies whether the genuine common pairs and the virtual common pairs have the semantic relationship; and
an adding step in which the adding unit, according to at least one of the classification result and the certainty factor of a common pair as classified in the first classification step, adds the common pair and the classification result for the common pair to the second learning data, and, according to at least one of the classification result and the certainty factor of a common pair as classified in the second classification step, adds the common pair and the classification result for the common pair to the first learning data,
wherein the machine learning and classification in the first and second classification steps and the addition of learning data in the adding step are executed repeatedly.

A plurality of first relationship pair candidates that are extracted from the first corpus by the first method and that are candidates for pairs of linguistic expressions having a semantic relationship, and a second method different from the first method. Genuine common pairs extracted from two corpora and common to a plurality of second relationship pair candidates that are candidate language expression pairs having a semantic relationship, and the plurality of first relationship pairs A plurality of second irrelevant pair candidates extracted from the first corpus, a plurality of first irrelevant pair candidates that are extracted from the first corpus and are linguistic expression pair candidates that do not have the semantic relationship, Birch that is a common pair that is a pair that is not a genuine common pair among a plurality of second irrelevant pair candidates that are extracted from the corpus of 2 and that are candidate language expression pairs that do not have a semantic relationship A common pair storage unit that stores a common pair, and first learning data that is teacher data used in machine learning regarding classification of whether the first relationship pair candidate has the semantic relationship is stored A first learning data storage unit, a first classifying unit, and a second learning which is teacher data used in machine learning regarding classification of whether the second relationship pair candidate has the semantic relationship A mutual machine learning method processed using a second learning data storage unit in which data is stored, a second classification unit, and an addition unit,
a first classification step in which the first classification unit performs machine learning using the first learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship;
a second classification step in which the second classification unit performs machine learning using the second learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship; and
an adding step in which the adding unit adds a common pair and the classification result for that common pair to the second learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification in the first classification step, and adds a common pair and the classification result for that common pair to the first learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification in the second classification step,
wherein the machine learning and classification in the first and second classification steps and the addition of learning data in the adding step are executed repeatedly.
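
For illustration only, the repeated classification and adding steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `make_trainer` and `mutual_learning`, the majority-vote toy learner, the certainty threshold of 0.8, the round limit, and the rule of never overwriting an existing label are all hypothetical choices that the claim does not specify.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]                       # a pair of language expressions
Labeled = Dict[Pair, bool]                   # pair -> has the semantic relationship?
Classifier = Callable[[Pair], Tuple[bool, float]]   # pair -> (label, certainty)
Trainer = Callable[[Labeled], Classifier]


def make_trainer(feature: Callable[[Pair], str]) -> Trainer:
    """Hypothetical toy learner: majority vote over pairs sharing one feature."""
    def train(data: Labeled) -> Classifier:
        votes: Dict[str, List[bool]] = {}
        for pair, label in data.items():
            votes.setdefault(feature(pair), []).append(label)

        def classify(pair: Pair) -> Tuple[bool, float]:
            seen = votes.get(feature(pair))
            if not seen:
                return False, 0.0            # unseen feature: no certainty
            positive = sum(seen) / len(seen)
            label = positive >= 0.5
            return label, (positive if label else 1.0 - positive)
        return classify
    return train


def mutual_learning(common_pairs: Iterable[Pair], train1: Trainer, train2: Trainer,
                    data1: Labeled, data2: Labeled,
                    threshold: float = 0.8, max_rounds: int = 10):
    """Repeat: learn, classify the common pairs, exchange confident results."""
    data1, data2 = dict(data1), dict(data2)  # work on copies of the learning data
    pairs = list(common_pairs)
    for _ in range(max_rounds):
        clf1, clf2 = train1(data1), train2(data2)    # machine learning
        results1 = {p: clf1(p) for p in pairs}       # first classification step
        results2 = {p: clf2(p) for p in pairs}       # second classification step
        added = False
        for p in pairs:                              # adding step
            label1, certainty1 = results1[p]
            label2, certainty2 = results2[p]
            # Assumed rule: a confident result of one classifier becomes
            # learning data for the other; existing labels are kept.
            if certainty1 >= threshold and p not in data2:
                data2[p] = label1
                added = True
            if certainty2 >= threshold and p not in data1:
                data1[p] = label2
                added = True
        if not added:                                # nothing new: stop early
            break
    return data1, data2
```

In this sketch each classifier sees a pair through a different feature, so a pair that only one classifier can judge with certainty is handed to the other as new learning data, and the next round can then build on it.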

A program for causing a computer that can access: a common pair storage unit that stores genuine common pairs, each being a pair common to a plurality of first relationship pair candidates, which are extracted from a first corpus by a first method and are candidates for pairs of language expressions having a semantic relationship, and a plurality of second relationship pair candidates, which are extracted from a second corpus by a second method different from the first method and are candidates for pairs of language expressions having the semantic relationship, that stores virtual common pairs, each being a pair common to a plurality of first unrelated pair candidates, which are extracted from the first corpus and are candidates for pairs of language expressions not having the semantic relationship, and the plurality of second relationship pair candidates, and that stores virtual common pairs, each being a pair common to a plurality of second unrelated pair candidates, which are extracted from the second corpus and are candidates for pairs of language expressions not having the semantic relationship, and the plurality of first relationship pair candidates; a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether a first relationship pair candidate has the semantic relationship; and a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether a second relationship pair candidate has the semantic relationship, to function as:
a first classification unit that performs machine learning using the first learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship;
a second classification unit that performs machine learning using the second learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship; and
an adding unit that adds a common pair and the classification result for that common pair to the second learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification of the first classification unit, and adds a common pair and the classification result for that common pair to the first learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification of the second classification unit,
wherein the machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.
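
As a hedged illustration of how the contents of the common pair storage unit recited above might be derived, the sketch below uses plain set operations. The function name `build_common_pairs` and the representation of each pair as a tuple of strings are assumptions made for this example, and the sketch further assumes that the relationship pair candidates and the unrelated pair candidates taken from the same corpus do not overlap.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]   # a pair of language expressions


def build_common_pairs(rel1: Iterable[Pair], rel2: Iterable[Pair],
                       unrel1: Iterable[Pair], unrel2: Iterable[Pair]):
    """Return (genuine, virtual) common pairs from the four candidate sets.

    rel1 / rel2:     relationship pair candidates from the first / second corpus
    unrel1 / unrel2: unrelated pair candidates from the first / second corpus
    """
    r1, r2, u1, u2 = set(rel1), set(rel2), set(unrel1), set(unrel2)
    # Genuine common pairs: candidates found by both extraction methods.
    genuine: Set[Pair] = r1 & r2
    # Virtual common pairs: a candidate from one corpus that appears only
    # among the unrelated pair candidates of the other corpus.
    virtual: Set[Pair] = (u1 & r2) | (u2 & r1)
    return genuine, virtual
```

For example, a pair extracted as a relationship candidate from the second corpus that also appears among the unrelated pair candidates of the first corpus becomes a virtual common pair, so both classifiers can be asked to classify it.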

A program for causing a computer that can access: a common pair storage unit that stores genuine common pairs, each being a pair common to a plurality of first relationship pair candidates, which are extracted from a first corpus by a first method and are candidates for pairs of language expressions having a semantic relationship, and a plurality of second relationship pair candidates, which are extracted from a second corpus by a second method different from the first method and are candidates for pairs of language expressions having the semantic relationship, and that stores virtual common pairs, each being a common pair that is not a genuine common pair among the plurality of first relationship pair candidates, the plurality of second relationship pair candidates, a plurality of first unrelated pair candidates, which are extracted from the first corpus and are candidates for pairs of language expressions not having the semantic relationship, and a plurality of second unrelated pair candidates, which are extracted from the second corpus and are candidates for pairs of language expressions not having the semantic relationship; a first learning data storage unit that stores first learning data, which is teacher data used in machine learning for classifying whether a first relationship pair candidate has the semantic relationship; and a second learning data storage unit that stores second learning data, which is teacher data used in machine learning for classifying whether a second relationship pair candidate has the semantic relationship, to function as:
a first classification unit that performs machine learning using the first learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship;
a second classification unit that performs machine learning using the second learning data and, using the result of the machine learning, classifies whether each of the genuine common pairs and the virtual common pairs has the semantic relationship; and
an adding unit that adds a common pair and the classification result for that common pair to the second learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification of the first classification unit, and adds a common pair and the classification result for that common pair to the first learning data in accordance with at least one of the classification result and the certainty factor obtained for that common pair by the classification of the second classification unit,
wherein the machine learning and classification by the first and second classification units and the addition of learning data by the adding unit are executed repeatedly.