JP2018156332A

JP2018156332A - Generation device, generation method, and generation program

Info

Publication number: JP2018156332A
Application number: JP2017051952A
Authority: JP
Inventors: ウィボルカノジア; Zia Wiboru Kano
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-03-16
Filing date: 2017-03-16
Publication date: 2018-10-04
Anticipated expiration: 2037-03-16
Also published as: JP6705763B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of distributed representations generated from triples. SOLUTION: A generation device of the present disclosure comprises: an extraction unit configured to extract a first element and relation information included in a predetermined triple; a determination unit configured to stochastically determine whether to limit the selection source of a second element; a selection unit configured to select, based on a result of the determination, at least a second element that is not included in the predetermined triple from among second elements included in arbitrary triples; and a generation unit configured to generate each distributed representation such that the sum of the distributed representation of the extracted first element and the distributed representation of the relation information is not similar to the distributed representation of the selected second element. SELECTED DRAWING: Figure 1

Description

The present invention relates to a generation device, a generation method, and a generation program.

Conventionally, a technique is known for constructing data called triples, such as RDF (Resource Description Framework) data, each consisting of two related elements and relation information indicating the relationship between them. To facilitate searching such triples, a technique is also known that learns distributed representations of the elements and the relation information so that the sum of the distributed representation of the first element and that of the relation information becomes the distributed representation of the second element. When a first element and relation information are received as a search query, the second element that completes the triple is found by calculating the sum of their distributed representations.

Antoine Bordes et al., "Translating Embeddings for Modeling Multi-relational Data".

However, the conventional technique described above may fail to ensure the accuracy of the distributed representations.

For example, the conventional technique learns the distributed representations of the elements and relation information so that the sum of the distributed representation of the first element of a triple and the distributed representation of its relation information is similar to the distributed representation of that triple's second element and dissimilar to the distributed representations of the second elements of other triples. Consequently, if the triples to be learned are unevenly distributed across fields, learning does not progress in fields that contain few triples, and the accuracy of the finally obtained distributed representations decreases.
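The learning objective just described is the one in the cited Bordes et al. paper (TransE). As a reference, and not as a formula stated in this publication, that paper scores a triple by the distance between s + p and o and minimizes a margin-based ranking loss over the correct triples S and the negative triples S′. Below, the negatives are restricted to triples with a replaced second element o′, since that is the only element this publication resamples; γ > 0 is the margin, [x]₊ = max(0, x), and the paper uses either the L1 or the L2 norm:

```latex
d(\mathbf{s}+\mathbf{p},\,\mathbf{o}) = \lVert \mathbf{s}+\mathbf{p}-\mathbf{o} \rVert,
\qquad
\mathcal{L} = \sum_{(s,p,o)\in S}\ \sum_{(s,p,o')\in S'_{(s,p,o)}}
\bigl[\gamma + d(\mathbf{s}+\mathbf{p},\,\mathbf{o}) - d(\mathbf{s}+\mathbf{p},\,\mathbf{o}')\bigr]_{+}
```

In this notation, the publication's contribution concerns how o′ is drawn when building S′, not the form of the loss.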

The present application has been made in view of the above, and an object thereof is to improve the accuracy of distributed representations generated from triples.

A generation device according to the present application includes: an extraction unit that extracts a first element and relation information included in a predetermined triple; a determination unit that stochastically determines whether to limit the selection source of the second element; a selection unit that, based on the determination result, selects at least a second element not included in the predetermined triple from among second elements included in arbitrary triples; and a generation unit that generates each distributed representation so that the sum of the distributed representation of the extracted first element and the distributed representation of the relation information is not similar to the distributed representation of the selected second element.

According to one aspect of the embodiment, the accuracy of distributed representations generated from triples can be improved.

FIG. 1 is a diagram illustrating an example of processing executed by the information providing apparatus according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the information providing apparatus according to the embodiment.
FIG. 3 is a diagram illustrating an example of information registered in the entity database according to the embodiment.
FIG. 4 is a diagram illustrating an example of information registered in the relation information database according to the embodiment.
FIG. 5 is a diagram illustrating an example of information registered in the distributed representation database according to the embodiment.
FIG. 6 is a diagram illustrating an example of second elements selected as incorrect answer data by the information providing apparatus according to the embodiment.
FIG. 7 is a flowchart illustrating an example of the flow of the generation process executed by the information providing apparatus according to the embodiment.
FIG. 8 is a diagram illustrating an example of a hardware configuration.

Hereinafter, modes for carrying out the generation device, generation method, and generation program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The embodiments do not limit the generation device, generation method, and generation program according to the present application. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

[Embodiment]
[1. Processing provided by the information providing apparatus]
First, an example of the generation process executed by an information providing apparatus, which is an example of the generation device, will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of processing executed by the information providing apparatus according to the embodiment. The following description covers two processes executed by the information providing apparatus 10: a generation process that generates distributed representations of information registered as triples, and a search process that searches for information using the distributed representations produced by the generation process. Although the description refers to the information providing apparatus 10 "generating" the distributed representation of each piece of information, this processing is understood to "generate" distributed representations by "learning" appropriate values of the distributed representation corresponding to each piece of data.

[1-1. Overview of the information providing apparatus]
The information providing apparatus 10 is an information processing apparatus that can communicate with user terminals 100 via a predetermined network N such as the Internet (see, for example, FIG. 2), and is realized by, for example, a server apparatus or a cloud system. The information providing apparatus 10 may communicate with any number of user terminals 100 via the network N.

The user terminal 100 is an information processing apparatus used by a user who requests an information search, and is realized by an information processing apparatus such as a PC (Personal Computer), a server apparatus, or a smart device. For example, the user terminal 100 transmits a search query to the information providing apparatus 10. In this case, the information providing apparatus 10 searches the information stored in RDF format, that is, the triples, for information corresponding to the search query, and provides the search result to the user terminal 100.

The triples searched by the information providing apparatus 10 will now be described. The information providing apparatus 10 stores, as a triple, a three-part set of information consisting of a first element and a second element, which are called entities or the like, and relation information indicating the relationship between the two elements. For example, the information providing apparatus 10 stores, as a triple, a set consisting of a subject (S) entity as the first element, an object (O) entity as the second element, and relation information serving as the predicate (P).

For example, as entities, the information providing apparatus 10 stores information corresponding to various things in the world: things that can be subjects, such as people, objects, and buildings in the real world; attributes such as occupation and nationality; and various states and events. By also storing relation information between entities, it can manage various kinds of knowledge systematically. For example, if entity #1 corresponding to a given person #A, entity #2 corresponding to the occupation "politician", and relation information indicating "occupation" form a triple, the triple indicates that the "occupation" of "person #A" is "politician". In this way, the information providing apparatus 10 manages knowledge systematically using triples.
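A minimal sketch of such a triple store, assuming an in-memory list of records; the `Triple` class, the `objects_of` helper, and the entity identifiers are hypothetical names chosen to mirror the person #A example (and the separate name entity described next), not structures defined in the publication:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One RDF-style fact: subject (first element), predicate (relation), object (second element)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers mirroring the example in the text.
facts = [
    Triple("entity#1", "occupation", "entity#2"),  # person #A's occupation is "politician"
    Triple("entity#1", "name", "entity#3"),         # person #A's name, held as a separate entity
]


def objects_of(triples, subject, predicate):
    """Return every second element linked to `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(objects_of(facts, "entity#1", "occupation"))  # ['entity#2']
```

This exact-match lookup is what the embedding-based search described below replaces with a nearest-neighbor query over vectors.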

An entity corresponding to a person and an entity corresponding to that person's name may be separate entities. In that case, a person's name can be held by storing, as a triple, the entity corresponding to the person, relation information indicating "name", and the entity corresponding to the person's name. Such systematic management of knowledge using triples is used in, for example, knowledge databases.

An entity may be not only text indicating a thing but any information, such as a still image, moving image, audio, web content, or the URL (Uniform Resource Locator) of web content indicating the thing. An entity also need not be information that itself indicates a thing; it may be set as an entity representing a concept.

[1-2. Generation process]
To facilitate searching triples, a technique is known that generates distributed representations of the elements and relation information so that the sum of the distributed representation of the first element and that of the relation information becomes the distributed representation of the second element; when a first element and relation information are received as a search query, the second element that completes the triple is found by calculating the sum of their distributed representations. With such distributed representations, when the information providing apparatus 10 receives a first element and relation information as a search query, it calculates the sum of their distributed representations and searches for distributed representations similar to that sum. A nearest-neighbor search thereby finds the second element of the triple that contains the queried first element and relation information.

However, the prior art may fail to ensure the accuracy of distributed representations. For example, the prior art learns distributed representations so that the sum of the distributed representation of the first element in a triple and the distributed representation of its relation information is similar to the distributed representation of the second element in that triple (the correct answer data) and not similar to the distributed representations of second elements not in that triple (incorrect answer data). With such a technique, however, the distributed representation of another second element that resembles the correct second element may end up nearest to the sum of the first-element and relation-information representations in the distributed representation space. For this reason, the prior art can guarantee the accuracy of distributed representations only up to a certain level.

In addition, the prior art selects incorrect answer data by choosing second elements at random from the stored triples. Therefore, when fields are defined for second elements (any criterion by which second elements can be classified, such as personal names or occupations) and the number of second elements per field is uneven, learning of distributed representations does not progress in fields with few second elements, and accuracy may deteriorate.

The information providing apparatus 10 therefore generates the distributed representation of each element by executing the following generation process. First, the information providing apparatus 10 extracts the first element and relation information included in a predetermined triple. Next, it stochastically determines whether to limit the selection source of the second element.

Then, based on the determination result, the information providing apparatus 10 selects at least a second element that is not included in the predetermined triple from among the second elements included in arbitrary triples. For example, when it determines to limit the selection source, the information providing apparatus 10 selects, from among the second elements included in arbitrary triples, a second element that is not included in the predetermined triple and that belongs to a predetermined field.

More specifically, when it determines to limit the selection source of the second element, the information providing apparatus 10 selects, from among the second elements included in arbitrary triples, a second element that belongs to the same field as, or a field similar to, the second element of the predetermined triple but is not itself included in the predetermined triple. When it determines not to limit the selection source, the information providing apparatus 10 selects a second element, among those included in arbitrary triples, that is not included in the predetermined triple. The information providing apparatus 10 then generates the distributed representations of the elements and relation information so that the sum of the distributed representation of the extracted first element and that of the relation information is not similar to the distributed representation of the selected second element.
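The two candidate pools described above can be sketched as follows, assuming each second element carries a precomputed field label; `candidate_pool`, `category_of`, and the label "C2" are hypothetical, and the sketch models only the "same field" case, not "similar field":

```python
def candidate_pool(correct, all_objects, category_of, limit):
    """Second elements eligible as incorrect answer data for one correct triple.

    correct:     second element of the correct triple (the correct answer data)
    all_objects: distinct second elements appearing in any stored triple
    category_of: hypothetical mapping from second element to its field label
    limit:       outcome of the stochastic decision to restrict the pool
    """
    # A negative must never be the correct answer itself.
    others = [o for o in all_objects if o != correct]
    if not limit:
        return others
    field = category_of[correct]
    return [o for o in others if category_of.get(o) == field]


# Mirrors FIG. 1: O1, O2, O4 form category C1; "C2" is a made-up label for the rest.
objects = ["O1", "O2", "O3", "O4", "O5"]
categories = {"O1": "C1", "O2": "C1", "O4": "C1", "O3": "C2", "O5": "C2"}
print(candidate_pool("O1", objects, categories, limit=False))  # ['O2', 'O3', 'O4', 'O5']
print(candidate_pool("O1", objects, categories, limit=True))   # ['O2', 'O4']
```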

More specifically, the information providing apparatus 10 performs positive sampling and negative sampling when generating the distributed representations. Positive sampling is the process of selecting positive examples for learning distributed representations; for example, it selects, as a correct pair, the first element and relation information included in a predetermined triple together with the second element included in that triple (that is, the correct answer data). Negative sampling is the process of selecting negative examples for learning; for example, it selects, as an incorrect pair, the first element and relation information included in the predetermined triple together with a second element not included in that triple (that is, incorrect answer data).
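A small sketch of the pair structure, assuming a pair is stored as the (first element, relation information) query, a second element, and a 1/0 label; the `Pair` type and `make_pairs` helper are hypothetical illustrations, not data formats from the publication:

```python
from typing import NamedTuple, Tuple


class Pair(NamedTuple):
    query: Tuple[str, str]  # (first element, relation information)
    second: str             # second element paired with the query
    label: int              # 1 = correct pair (positive), 0 = incorrect pair (negative)


def make_pairs(triple, negative):
    """Build the correct pair and one incorrect pair for a correct triple (s, p, o)."""
    s, p, o = triple
    if negative == o:
        raise ValueError("incorrect answer data must differ from the correct second element")
    return Pair((s, p), o, 1), Pair((s, p), negative, 0)


pos_pair, neg_pair = make_pairs(("S1", "P1", "O1"), "O4")
print(pos_pair)  # Pair(query=('S1', 'P1'), second='O1', label=1)
print(neg_pair)  # Pair(query=('S1', 'P1'), second='O4', label=0)
```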

When performing such negative sampling, the information providing apparatus 10 stochastically determines whether to limit the selection source of the second element. In other words, when generating (that is, training) the distributed representations, the information providing apparatus 10 applies a stochastic bias to the pool from which second elements are drawn for training. When it determines not to limit the selection source, it randomly selects, as incorrect answer data, a second element not included in the predetermined triple from among the second elements included in all stored triples. When it determines to limit the selection source, it randomly selects, as incorrect answer data, a second element that belongs to the same field as, or a field similar to, the second element of the predetermined triple but is not included in that triple.
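The stochastic bias can be sketched as a Bernoulli(β) coin flip followed by a uniform draw from whichever pool the flip selects. This assumes both pools are precomputed and already exclude the correct answer; `sample_negative` is a hypothetical name, and falling back to the full pool when the restricted pool is empty is an assumption the publication does not address:

```python
import random


def sample_negative(rng, beta, restricted_pool, full_pool):
    """Draw one incorrect answer with a stochastic bias toward the correct answer's field.

    With probability `beta` the draw is limited to `restricted_pool`; otherwise it is a
    uniform draw from `full_pool`. Returns (second_element, drawn_from_restricted_pool).
    Assumption: an empty restricted pool falls back to the full pool.
    """
    limit = rng.random() < beta  # Bernoulli(beta) decision, made once per draw
    if limit and restricted_pool:
        return rng.choice(restricted_pool), True
    return rng.choice(full_pool), False


rng = random.Random(0)
draws = [sample_negative(rng, 0.4, ["O2", "O4"], ["O2", "O3", "O4", "O5"]) for _ in range(10_000)]
share_limited = sum(limited for _, limited in draws) / len(draws)
print(round(share_limited, 2))  # close to 0.4
```

Drawing the coin per sample, rather than per batch, keeps the expected share of restricted draws at β regardless of batch size.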

As a result, in negative sampling the information providing apparatus 10 selects, as incorrect answer data, different data from the same field as, or a field similar to, the correct answer data. That is, it can select as incorrect answer data items that are semantically close to the correct answer data yet wrong. When distributed representations are learned with such incorrect answer data, the relationships among second elements belonging to the same or similar fields are captured in the distributed representation space. In other words, for a given pair of a first element and relation information, the relationship between the correct second element and a second element that resembles it but is incorrect is captured in the distributed representation space. As a result, the information providing apparatus 10 can improve the accuracy of the distributed representations it generates.

[1-3. Example of the generation process]
Next, an example of the generation process executed by the information providing apparatus 10 will be described with reference to FIG. 1. In the following description, the triple that includes the second element serving as correct answer data, that is, the predetermined triple being processed, is referred to as the correct triple.

First, the information providing apparatus 10 selects a correct triple from the stored triples and extracts the first element and relation information included in it (step S1). For example, when the triple T1 including the first element "S1", the relation information "P1", and the second element "O1" is the correct triple, the information providing apparatus 10 extracts the first element "S1" and the relation information "P1".

Next, the information providing apparatus 10 calculates a certainty d indicating the similarity between second elements (step S2). For example, using the second element "O1" of the correct triple as a reference, the information providing apparatus 10 extracts the second elements "O2", "O3", and "O4" included in other triples. It then calculates a certainty d12 indicating the similarity between "O1" and "O2", a certainty d13 between "O1" and "O3", and a certainty d14 between "O1" and "O4".

The information providing apparatus 10 may calculate the certainty using any index, as long as the certainty reflects semantic similarity. For example, it may calculate a certainty indicating how similar the types of two second elements are, such as whether each is a name, an occupation, or a position. It may also calculate the certainty from fields assigned to the second elements in advance, or from whether the relation information that appears with each second element in its triple is the same. For example, when the relation information in the triples of two second elements is identical, the information providing apparatus 10 may calculate a certainty indicating that they are similar or belong to the same field. In short, the information providing apparatus 10 may calculate any value from any information as the certainty d, as long as it indicates whether a second element serving as incorrect answer data belongs to the same field as (is similar to) the second element serving as correct answer data.
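One way to realize the relation-identity option is to score two second elements by how much their sets of incoming relations overlap. The sketch below uses Jaccard overlap, which is a swapped-in choice of index rather than one named in the publication; the helper names and the relation labels "occupation" and "name" are hypothetical:

```python
from collections import defaultdict


def relations_by_object(triples):
    """Map each second element to the set of relations it appears under."""
    rels = defaultdict(set)
    for _subject, relation, obj in triples:
        rels[obj].add(relation)
    return rels


def certainty(rels, a, b):
    """Certainty d as Jaccard overlap of relation sets (one hypothetical choice of index)."""
    ra, rb = rels.get(a, set()), rels.get(b, set())
    union = ra | rb
    return len(ra & rb) / len(union) if union else 0.0


triples = [
    ("S1", "occupation", "O1"),
    ("S2", "occupation", "O2"),
    ("S3", "name", "O3"),
    ("S4", "occupation", "O4"),
    ("S5", "name", "O5"),
]
rels = relations_by_object(triples)
print([certainty(rels, "O1", o) for o in ("O2", "O3", "O4")])  # [1.0, 0.0, 1.0]
```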

Next, the information providing apparatus 10 stochastically determines whether to limit the field from which incorrect answer data is selected (step S3). For example, each time it learns the distributed representations, the information providing apparatus 10 decides to limit the field with probability β. For instance, when β is 0.4, it decides to limit the field with a probability of 40 percent. The information providing apparatus 10 may use, as β, a value predetermined within the range 0 to 0.4, or it may change β dynamically within any range (for example, 0 to 1) as learning of the distributed representations progresses.
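The publication leaves the dynamic adjustment of β unspecified. As a purely illustrative assumption, a linear ramp from 0 up to 0.4 keeps β inside the stated 0 to 0.4 range while limiting the field more often as training proceeds; `beta_schedule`, `start`, and `end` are hypothetical names:

```python
def beta_schedule(step, total_steps, start=0.0, end=0.4):
    """Hypothetical linear ramp of beta from `start` to `end` over `total_steps` steps."""
    if total_steps <= 0:
        return end
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp training progress to [0, 1]
    return start + (end - start) * frac


print([beta_schedule(t, 100) for t in (0, 50, 100, 150)])  # [0.0, 0.2, 0.4, 0.4]
```

A fixed β corresponds to calling this with `start == end`.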

For example, in the example illustrated in FIG. 1, the all-category set CA contains the second elements "O1" to "O5". In this case, the information providing apparatus 10 categorizes the second elements based on the calculated certainties d. For example, when the certainty d12 between "O1" and "O2" and the certainty d14 between "O1" and "O4" exceed a predetermined threshold, the information providing apparatus 10 classifies the second elements "O1", "O2", and "O4" into category C1. The threshold on the certainty d used to decide whether elements belong to the same field may be a fixed value, or it may be changed dynamically as learning progresses.
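The thresholding step can be sketched as grouping the correct answer with every second element whose certainty exceeds the threshold. The certainty values and the 0.5 threshold below are made up so that d12 and d14 clear it as in FIG. 1; `category_of_correct` is a hypothetical helper:

```python
def category_of_correct(correct, others, certainty, threshold):
    """Return `correct` plus every other second element whose certainty d exceeds `threshold`."""
    return [correct] + [o for o in others if certainty(correct, o) > threshold]


# Made-up certainty values chosen so that d12 and d14 exceed the threshold.
d = {("O1", "O2"): 0.9, ("O1", "O3"): 0.2, ("O1", "O4"): 0.8, ("O1", "O5"): 0.1}
c1 = category_of_correct("O1", ["O2", "O3", "O4", "O5"], lambda a, b: d[(a, b)], threshold=0.5)
print(c1)  # ['O1', 'O2', 'O4']
```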

The information providing apparatus 10 then generates an incorrect pair and a correct pair and learns the distributed representations (step S4). For example, it selects the second element included in the correct triple as the correct answer data. When it has decided not to limit the field, it randomly selects the second element serving as incorrect answer data from "O2" to "O5", that is, the second elements among "O1" to "O5" in the all-category set CA that are not included in the correct triple. When it has decided to limit the field, it randomly selects a second element not included in the correct triple from among the second elements belonging to a given category. More specifically, it randomly selects the incorrect answer data from the second elements "O2" and "O4", which belong to the same category C1 as the correct answer data.
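With the pools used in this example (the restricted pool is the correct answer's category C minus the correct answer, and the unrestricted pool is all N distinct second elements minus the correct answer), the probability that a drawn incorrect answer comes from the correct answer's category follows directly. This is a derived illustration, not a formula given in the publication, and it assumes |C| ≥ 2:

```latex
P(\text{negative} \in C) = \beta \cdot 1 + (1-\beta)\,\frac{|C|-1}{N-1},
\qquad
\beta = 0.4,\ N = 5,\ |C_1| = 3:\quad 0.4 + 0.6 \cdot \tfrac{2}{4} = 0.7
```

With β = 0 (purely random selection) the same probability is 2/4 = 0.5 here; in a large store where C is a small field, the (1 − β) term is close to zero, so almost all same-field negatives come from the β branch.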

The information providing apparatus 10 then learns the distributed representations. For example, it generates, as a correct pair, the pair of the first element "S1" and the relation information "P1" together with the second element "O1". It also generates, as an incorrect pair, the pair of the first element "S1" and the relation information "P1" together with the second element "O4". It then learns the value of each distributed representation so that the sum of the distributed representation S1 of the first element "S1" and the distributed representation P1 of the relation information "P1" is similar to the distributed representation O1 of the second element "O1" and not similar to the distributed representation O4 of the second element "O4".
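The publication does not specify the loss or optimizer. A minimal sketch under stated assumptions: one plain SGD step on a margin hinge loss using squared Euclidean distance (a swapped-in choice; TransE itself uses an unsquared L1 or L2 distance), with embeddings held as Python lists. `margin_step`, `sq_dist`, and the toy vectors are hypothetical:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_step(emb, s, p, o_pos, o_neg, margin=1.0, lr=0.05):
    """One SGD step on max(0, margin + ||s+p-o_pos||^2 - ||s+p-o_neg||^2).

    `emb` maps element/relation names to lists of floats and is updated in place.
    Returns the hinge loss measured before the update.
    """
    h = [a + b for a, b in zip(emb[s], emb[p])]  # translated query: s + p
    u = [x - y for x, y in zip(h, emb[o_pos])]   # residual to the correct answer
    v = [x - y for x, y in zip(h, emb[o_neg])]   # residual to the incorrect answer
    loss = margin + sum(x * x for x in u) - sum(x * x for x in v)
    if loss <= 0:
        return 0.0  # pair already separated by the margin: no gradient
    g = [2 * (a - b) for a, b in zip(u, v)]      # d(loss)/ds == d(loss)/dp
    emb[s] = [x - lr * gi for x, gi in zip(emb[s], g)]
    emb[p] = [x - lr * gi for x, gi in zip(emb[p], g)]
    emb[o_pos] = [x + lr * 2 * ui for x, ui in zip(emb[o_pos], u)]  # d(loss)/do_pos = -2u
    emb[o_neg] = [x - lr * 2 * vi for x, vi in zip(emb[o_neg], v)]  # d(loss)/do_neg = +2v
    return loss


emb = {"S1": [0.0, 0.0], "P1": [0.0, 0.0], "O1": [1.0, 0.0], "O4": [0.0, 0.5]}
losses = [margin_step(emb, "S1", "P1", "O1", "O4") for _ in range(20)]
print(losses[0], losses[-1])  # 1.75 0.0
```

After a few steps the correct pair (S1, P1, O1) sits at least the margin closer to S1 + P1 than the incorrect pair (S1, P1, O4), and the step becomes a no-op.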

The information providing apparatus 10 executes steps S1 to S4 repeatedly. As a result, it can generate distributed representations that stochastically reflect the existence of incorrect answer data similar to the correct answer data, improving the accuracy of the distributed representations of each element and each piece of relation information.

Next, an example of the search processing executed by the information providing apparatus 10 will be described. First, the information providing apparatus 10 receives a search query from the user terminal 100 (step S5). For example, it receives the first element "S1" and the relationship information "P1" as the search query.

In this case, the information providing apparatus 10 adds together the distributed representations of the search query and searches for a similar distributed representation (step S6). For example, among the generated distributed representations it identifies "S1" (for the first element "S1") and "P1" (for the relationship information "P1"), and calculates their sum. It then performs a nearest-neighbor search for distributed representations similar to that sum.

Through the generation processing described above, the information providing apparatus 10 can incorporate into the distributed representations the relationship between the correct answer data and incorrect answer data similar to it. As a result, the distributed representation most similar to the sum of "S1" and "P1" is highly likely to belong to the correct second element. That is the second element of the correct triple containing the first element "S1" and the relationship information "P1".

The information providing apparatus 10 therefore searches for the distributed representation most similar to the sum of "S1" and "P1" (for example, "O1") and identifies the corresponding second element. It then outputs that second element to the user terminal 100 as the search result (step S7).

Alternatively, it may identify a predetermined number of distributed representations in order of closeness to the sum of "S1" and "P1". It then outputs the corresponding second elements in ranking format, that is, in order of closeness to the sum.
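
Steps S6 and S7 amount to ranking candidate second elements by their distance from the query vector S1 + P1. The sketch below uses an exhaustive scan with squared Euclidean distance. Both choices are assumptions: the text only says "similar" and "neighborhood search", and a production system would likely use an approximate nearest-neighbor index. The function name `search_second_elements` is hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def search_second_elements(emb: Dict[str, Vector], first: str, relation: str,
                           candidates: List[str], k: int = 1) -> List[str]:
    """Return the k candidates closest to emb[first] + emb[relation] (brute force)."""
    query = [a + b for a, b in zip(emb[first], emb[relation])]

    def dist(c: str) -> float:
        # squared Euclidean distance from the query vector (assumed similarity measure)
        return sum((q - x) ** 2 for q, x in zip(query, emb[c]))

    return sorted(candidates, key=dist)[:k]
```

Calling it with `k=1` gives the single answer of step S7, and a larger `k` gives the ranking-format output.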

[1-4. Dynamic Setting of the Probability]
In the above description, the information providing apparatus 10 decides with a predetermined probability "β" whether to limit the field from which incorrect answer data is selected. It may instead change "β" dynamically according to the progress of learning.

For example, it may set "β" to "0" until the distributed representations have been learned a predetermined number of times, and raise "β" after that. In other words, once the number of learning iterations exceeds a predetermined threshold, it starts deciding stochastically whether to limit that field. It may also increase "β" as the number of learning iterations grows.
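
One way to realize such a schedule is a warm-up period with β = 0 followed by a linear ramp up to a cap. The sketch assumes this shape. The parameter names `warmup`, `ramp` and `beta_max` are hypothetical, and the example cap of 0.4 echoes the "40% or less" mentioned later for the determination unit 43.

```python
def beta_schedule(step: int, warmup: int, ramp: int, beta_max: float) -> float:
    """Probability of limiting the negative-sampling field at a given learning step."""
    if step < warmup:
        return 0.0       # never limit the field during warm-up
    if ramp <= 0:
        return beta_max  # no ramp: jump straight to the cap
    progress = (step - warmup + 1) / ramp
    return min(beta_max, beta_max * progress)  # linear ramp, then constant
```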

The information providing apparatus 10 may also base this stochastic decision on the accuracy of the distributed representations. For example, it may define the accuracy as the proportion of correct triples in which the second element's distributed representation is the one closest to the sum of the first element's and the relationship information's distributed representations. It recalculates this accuracy every time learning is performed.

When the accuracy satisfies a predetermined condition, the apparatus stochastically decides whether to limit the selection source of the incorrect answer data. When it does not, the selection source need not be limited.
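
The accuracy defined above behaves like a top-1 hit rate. Below is a sketch of it, assuming squared Euclidean distance and an explicit list of candidate second elements. The function name `top1_accuracy` is hypothetical, and ties go to the first candidate in list order.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (first element, relationship information, second element)


def top1_accuracy(emb: Dict[str, Vector], triples: List[Triple],
                  candidates: List[str]) -> float:
    """Share of triples whose own second element is the candidate closest to first + relation."""
    if not triples:
        return 0.0
    hits = 0
    for s, p, o in triples:
        query = [a + b for a, b in zip(emb[s], emb[p])]
        best = min(candidates,
                   key=lambda c: sum((q - x) ** 2 for q, x in zip(query, emb[c])))
        hits += int(best == o)
    return hits / len(triples)
```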

Several conditions can trigger this stochastic decision on whether to limit the field from which incorrect answer data is selected. For example:

- the accuracy has not exceeded a predetermined threshold even though learning of the distributed representations has continued;
- the accuracy has exceeded a predetermined threshold;
- the rate of increase in accuracy has not changed for a predetermined period despite continued learning.
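
The last condition (improvement stalling for a predetermined period) could be checked against a history of accuracy values, as sketched below. The sketch interprets "the rate of increase does not change" as "accuracy gained less than a tolerance over the last `window` evaluations". That interpretation, and the names `accuracy_plateaued`, `window` and `tol`, are assumptions.

```python
from typing import List


def accuracy_plateaued(history: List[float], window: int, tol: float = 1e-3) -> bool:
    """True if accuracy improved by less than `tol` over the last `window` evaluations."""
    if window <= 0 or len(history) <= window:
        return False  # not enough history to judge
    return history[-1] - history[-1 - window] < tol
```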

[1-5. Setting the Confidence]
In the above description, when the information providing apparatus 10 limits the field from which incorrect answer data is selected, it selects as incorrect answer data a second element whose confidence with respect to the correct answer data exceeds a predetermined threshold. The embodiment is not limited to this, however. For example, it may select incorrect answer data from second elements of the same type as the correct answer data, for example entities such as names or face photographs.

The information providing apparatus 10 may also adopt a classification based on any criterion. For example, suppose the correct answer data indicates "the name of a politician's wife". The apparatus may then treat any of the following as belonging to the same field as the correct answer data: second elements indicating a "name", second elements indicating the "name of a wife", or only second elements indicating the "name of a politician's wife".

In other words, the fields may be defined by any criterion. The only requirement is that incorrect answer data is drawn from second elements in the same field as the correct answer data, or in a similar one.

The information providing apparatus 10 may also narrow the field from which incorrect answer data is selected, for example each time learning progresses or when the improvement in accuracy stalls. That is, it may set the breadth of that field dynamically.

The information providing apparatus 10 may also define these fields by whether the relationship information in each second element's triple is the same. For example, the second elements of triples whose relationship information indicates the "name" of the first element may be treated as belonging to the same field (for example, the field "name"). Similarly, second elements of triples whose relationship information resembles that of the correct triple may be treated as belonging to a field similar to that of the correct answer data.

In short, when the apparatus decides to limit the selection source, it may pick the incorrect answer data from triples that share the correct triple's relationship information. The chosen second element must not be one that appears in the correct triple.
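
Restricting the pool to second elements that co-occur with the same relationship information can be sketched as a simple filter over triples. The function name `same_relation_pool` and the example triples, which mirror the FIG. 6 setup described later, are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (first element, relationship information, second element)


def same_relation_pool(triples: List[Triple], correct: Triple) -> List[str]:
    """Second elements of triples sharing the correct triple's relation, excluding its own."""
    _, rel, obj = correct
    return sorted({o for _, p, o in triples if p == rel and o != obj})
```

With triples in which "O3" and "O5" share relation "P1" with the correct answer "O1", while "O4" and "O2" appear only with other relations, the pool is `["O3", "O5"]`.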

The various processes described above may also be realized by calculating a confidence between pairs of second elements. For example, the information providing apparatus 10 may use a binary confidence that simply indicates whether two second elements belong to the same field.

Alternatively, it may calculate the confidence from two signals: whether the types of the second elements are identical or similar, and whether the relationship information in their triples is identical or similar. It then decides whether two second elements belong to the same or a similar field according to whether the confidence exceeds a predetermined threshold.

When the apparatus decides to limit the selection source, it may therefore pick a second element that satisfies two conditions. It must not appear in the correct triple, and its confidence with respect to the correct triple's second element must fall within a predetermined range.
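
Given precomputed confidences relative to the correct second element, selection within a predetermined range reduces to filtering and a random draw. The sketch assumes the confidences are supplied as a dictionary and the range is closed at both ends. The name `select_by_confidence` and the bounds `low` and `high` are hypothetical.

```python
import random
from typing import Dict, Optional


def select_by_confidence(correct_obj: str, confidence: Dict[str, float],
                         low: float, high: float,
                         rng: random.Random) -> Optional[str]:
    """Randomly pick a second element whose confidence w.r.t. the correct one lies in [low, high]."""
    pool = sorted(o for o, c in confidence.items()
                  if o != correct_obj and low <= c <= high)
    return rng.choice(pool) if pool else None
```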

The information providing apparatus 10 may calculate the confidence by any method, provided it reflects the semantic similarity between second elements. The method need not rely only on the type of each second element or the relationship information in its triple; any measure of semantic similarity between the second elements may be used.

When the information providing apparatus 10 decides to limit the selection source, it may also tighten the selection as the accuracy of the distributed representations rises. It then picks incorrect answer data from fields increasingly similar to the field of the correct triple's second element.

Concretely, it may raise the confidence threshold used to judge whether a second element is in the same class as the correct answer data. This gradually narrows the field from which incorrect answer data is selected.
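
One simple way to raise the threshold with accuracy is a linear map from accuracy in [0, 1] onto a threshold range. The sketch assumes that mapping. The bounds `t_min=0.3` and `t_max=0.9` and both function names are hypothetical.

```python
from typing import Dict, List


def confidence_threshold(accuracy: float, t_min: float = 0.3, t_max: float = 0.9) -> float:
    """Map accuracy (clamped to [0, 1]) linearly onto a threshold in [t_min, t_max]."""
    a = min(max(accuracy, 0.0), 1.0)
    return t_min + (t_max - t_min) * a


def narrowed_pool(correct_obj: str, confidence: Dict[str, float], accuracy: float) -> List[str]:
    """Candidates whose confidence clears the accuracy-dependent threshold."""
    t = confidence_threshold(accuracy)
    return sorted(o for o, c in confidence.items() if o != correct_obj and c >= t)
```

As accuracy rises from 0 toward 1, the pool shrinks toward the candidates most similar to the correct answer.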

The information providing apparatus 10 may also favor fields that contain fewer second elements than other fields. For example, it first treats the second elements of triples sharing the correct triple's relationship information as belonging to the same class as the correct answer data. It then assigns each of these second elements to a field at a predetermined granularity and counts the second elements in each field.

Finally, it selects the incorrect answer data from the field with the smallest count. In this way it can preferentially learn fields that would otherwise rarely be selected.
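
Choosing the least-populated field can be sketched with `collections.Counter`. The names `rarest_field_pool` and `field_of`, the example field labels, and the tie-break by field name are assumptions added for determinism.

```python
from collections import Counter
from typing import Dict, List


def rarest_field_pool(candidates: List[str], field_of: Dict[str, str]) -> List[str]:
    """Candidates in the field with the fewest candidates (ties broken by field name)."""
    if not candidates:
        return []
    counts = Counter(field_of[c] for c in candidates)
    rarest = min(counts, key=lambda f: (counts[f], f))
    return sorted(c for c in candidates if field_of[c] == rarest)
```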

[2. Configuration of the Information Providing Apparatus]
An example of the functional configuration of the information providing apparatus 10 will now be described. FIG. 2 is a diagram illustrating a configuration example of the information providing apparatus according to the embodiment. As illustrated in FIG. 2, the information providing apparatus 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

The communication unit 20 is realized by, for example, a NIC (Network Interface Card). It is connected to the network N by wire or wirelessly and exchanges information with the user terminal 100.

The storage unit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk. The storage unit 30 stores an entity database 31, a relationship information database 32, and a distributed representation database 33.

An example of the information registered in each of the databases 31 to 33 will now be described with reference to FIGS. 3 to 5. The entity database 31 registers information about entities, that is, the first and second elements contained in triples. FIG. 3 is a diagram illustrating an example of the information registered in the entity database according to the embodiment. As shown in FIG. 3, each record in the entity database 31 has items such as "entity ID", "data type", and "data".

Here, "entity ID" is an identifier of an entity. "Data type" indicates the type of information corresponding to the entity indicated by the associated "entity ID". "Data" is the information corresponding to that entity.

For example, in FIG. 3 the entity ID "S1", the data type "person name", and the data "name #1" are registered in association with one another. This indicates that the entity with ID "S1" corresponds to a "person name" and that the name is "name #1".

FIG. 3 shows conceptual values such as "name #1". In practice, the entity database 31 registers as data various information about the person corresponding to each entity, such as the name, photograph and date of birth.

The relationship information database 32 registers relationship information between entities, that is, information indicating the relationship between the first and second elements of a triple. FIG. 4 is a diagram illustrating an example of the information registered in the relationship information database according to the embodiment. In the example shown in FIG. 4, each record in the relationship information database 32 has items such as "relationship information ID", "type", "first element", and "second element".

Here, "relationship information ID" is an identifier of the relationship information. "Type" indicates what kind of relationship between elements that relationship information represents. "First element" and "second element" are the two elements whose relationship it describes.

For example, in FIG. 4 the relationship information ID "P1", the type "spouse", the first element "S1", and the second element "O1" are registered in association with one another. This indicates that the relationship information "P1" describes the relationship between the first element "S1" and the second element "O1", namely that "O1" is the "spouse" of "S1". In other words, the relationship information "P1", the first element "S1", and the second element "O1" form a triple.

The distributed representation database 33 registers the distributed representations of the entities and the relationship information. FIG. 5 is a diagram illustrating an example of the information registered in the distributed representation database according to the embodiment. As shown in FIG. 5, each record has the items "element ID/relationship information ID" and "distributed representation".

"Element ID/relationship information ID" is the entity ID or relationship information ID to which the distributed representation belongs. "Distributed representation" is the distributed representation of the entity or relationship information indicated by that ID.

For example, in FIG. 5 the element ID/relationship information ID "S1" and the distributed representation "distributed representation #1" are registered in association with each other. This indicates that the distributed representation of the element with ID "S1" is "distributed representation #1".

FIG. 5 shows conceptual values such as "distributed representation #1". In practice, the distributed representation database 33 registers the multidimensional quantities that constitute the distributed representations.
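
The three tables of FIGS. 3 to 5 can be modeled as simple records. The sketch below is illustrative only. All class, field and alias names are hypothetical, and storing vectors as Python lists of floats is an assumption; the figures show only conceptual values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityRecord:
    """One row of the entity database 31 (FIG. 3)."""
    entity_id: str  # e.g. "S1"
    data_type: str  # e.g. "person name"
    data: str       # e.g. "name #1"


@dataclass(frozen=True)
class RelationRecord:
    """One row of the relationship information database 32 (FIG. 4)."""
    relation_id: str     # e.g. "P1"
    kind: str            # e.g. "spouse"
    first_element: str   # e.g. "S1"
    second_element: str  # e.g. "O1"

    def as_triple(self) -> Tuple[str, str, str]:
        """Return (first element, relationship information, second element)."""
        return (self.first_element, self.relation_id, self.second_element)


# Distributed representation database 33 (FIG. 5): element/relation ID -> vector.
EmbeddingTable = Dict[str, List[float]]
```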

Returning to FIG. 2, the description continues. The control unit 40 is a controller. It is realized, for example, by a processor such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information providing apparatus 10, using a RAM or the like as a work area. The control unit 40 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 40 includes a calculation unit 41, an extraction unit 42, a determination unit 43, a selection unit 44, a generation unit 45, and a search unit 46.

The calculation unit 41 calculates a confidence indicating the similarity between the second elements contained in triples, more specifically their semantic similarity. For example, it refers to the relationship information database 32 and identifies the entities registered as second elements. It then looks up each identified entity's data type and data in the entity database 31. It also looks up, in the relationship information database 32, the relationship information contained in the same triples as that entity.

The calculation unit 41 then calculates the semantic-similarity confidence between second elements. It does so based on, for example, whether the data types of the entities are identical or similar, and whether the types of relationship information in their triples are identical or similar. The calculation unit 41 may limit this calculation to confidences between the second element of the correct triple selected by the extraction unit 42 and the other second elements.
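
A confidence of this kind could, for example, combine a type-identity indicator with the overlap of the relations each second element appears with. The sketch assumes an equal-weight mix of those two signals and Jaccard overlap for relations. It handles only identical relations, not "similar" ones such as "P2" resembling "P1". The function name `semantic_confidence` and the weights are hypothetical.

```python
from typing import Dict, Set


def semantic_confidence(o_a: str, o_b: str,
                        type_of: Dict[str, str],
                        relations_of: Dict[str, Set[str]],
                        w_type: float = 0.5, w_rel: float = 0.5) -> float:
    """Weighted mix of type identity and relation overlap between two second elements."""
    t_a, t_b = type_of.get(o_a), type_of.get(o_b)
    same_type = 1.0 if t_a is not None and t_a == t_b else 0.0
    r_a, r_b = relations_of.get(o_a, set()), relations_of.get(o_b, set())
    union = r_a | r_b
    # Jaccard overlap of the relations each element co-occurs with (assumed measure)
    rel_overlap = len(r_a & r_b) / len(union) if union else 0.0
    return w_type * same_type + w_rel * rel_overlap
```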

The extraction unit 42 extracts the first element and the relationship information contained in a predetermined triple. For example, it refers to the relationship information database 32 and selects one of the triples to be processed as the correct triple. It then identifies the first-element entity and the relationship information in that triple. That is, it extracts the entity serving as the subject of the correct triple as the first element, and the element serving as the predicate as the relationship information.

The determination unit 43 stochastically decides whether to limit the selection source of the second element serving as incorrect answer data. For example, it makes this decision each time the extraction unit 42 selects a correct triple. It may decide to limit the selection source with a probability of, for example, 40% or less.

Once learning has been repeated, the determination unit 43 also refers to the distributed representation database 33 and calculates the accuracy of the distributed representations. If the accuracy satisfies a predetermined condition, it makes the stochastic decision; if not, it may decide not to limit the selection source. One such condition is that the rate of increase in the accuracy has remained unchanged for a predetermined period.

If the selection source of the second element is not limited, the selection unit 44 selects as incorrect answer data any second element, from any triple, that is not in the correct triple. If it is limited, the candidate must additionally belong to a predetermined field. In either case, the selection unit 44 selects as incorrect answer data an element that appears in a triple as the object, that is, in the second-element position.

When the selection source is limited, the selection unit 44 may apply either of the following rules. Under the first, it selects a second element, from any triple, that belongs to the same field as the correct triple's second element, or to a similar field, and that is not in the correct triple. Under the second, it selects a second element that is not in the correct triple from the triples sharing the correct triple's relationship information.

As a more concrete example, suppose the determination unit 43 decides not to limit the selection source. The selection unit 44 then refers to the relationship information database 32 and randomly selects the incorrect answer data from the second elements other than the one in the correct triple.

Suppose instead that the determination unit 43 decides to limit the selection source. The selection unit 44 then uses the confidences from the calculation unit 41 to identify the other second elements whose confidence with respect to the correct answer data falls within a predetermined range. It randomly selects the incorrect answer data from among them.

Once learning has been repeated, the selection unit 44 refers to the distributed representation database 33 and calculates the accuracy of the distributed representations. As the accuracy rises, it may select incorrect answer data from fields increasingly similar to the field of the correct answer data. For example, it may restrict the choice to second elements with a progressively higher confidence with respect to the correct answer data. The selection unit 44 may also select second elements from a field that contains fewer second elements than other fields.

FIG. 6 is a diagram illustrating an example of second elements selected as incorrect answer data by the information providing apparatus according to the embodiment. FIG. 6 shows the field to which each of the second elements “O1” to “O5” belongs, together with an example of the number of second elements (data amount) in each field.

In the example illustrated in FIG. 6, the second element “O1” is included in the same triple as the relationship information “P1”. The second element “O2” is an entity of a different type from the other second elements “O1”, “O3”, and “O4”, and is assumed not to be included in any triple containing the same relationship information as “P1”. The second elements “O3” and “O5” are entities of the same type as “O1” and are assumed to be included in triples containing the same relationship information as “P1”. The second element “O4” is assumed to be included in a triple together with relationship information “P2”, which is similar to “P1”.

In this case, the extraction unit 42 selects, as the correct triple CT, the triple containing the first element “S”, the relationship information “P1”, and the second element “O1”. The selection unit 44 then selects the second element “O1” included in the correct triple CT as the correct answer data CS1.

Next, when the determination unit 43 determines not to limit the selection source, the selection unit 44 considers the second elements “O1” to “O5” belonging to all categories CA. From those not included in the correct triple CT, namely “O2” to “O5”, it randomly selects one second element as incorrect answer data. For example, the selection unit 44 selects the second element “O2” as the incorrect answer data NS1.

On the other hand, when the determination unit 43 determines to limit the selection source, the selection unit 44 identifies other second elements belonging to a field identical or similar to that of the second element “O1” selected as correct answer data; this field serves as the predetermined field. For example, the selection unit 44 identifies the second elements “O3” and “O5”, which, like “O1” in the correct triple CT, are included in triples containing the relationship information “P1”.

Here, the selection unit 44 counts the data amount of the field to which each of the second elements “O3” and “O5” belongs. In the example illustrated in FIG. 6, the data amount of category CX, to which “O3” belongs, is smaller than that of category CZ, to which “O5” belongs. The selection unit 44 therefore selects “O3”, which belongs to the field with the smaller data amount, as the incorrect answer data NS2.
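
The limited-mode walk-through of FIG. 6 can be reproduced with a small sketch. The subjects “A” to “D”, the relation “P3” attached to “O2”, and the field sizes are invented for illustration (FIG. 6 gives no numbers); only the relative order CX < CZ follows the text.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (first element, relationship information, second element)


def same_relation_candidates(triples: List[Triple], correct: Triple) -> List[str]:
    """Second elements of other triples that share the correct triple's relation."""
    _, rel, obj = correct
    return sorted({o for (_, r, o) in triples if r == rel and o != obj})


def pick_from_smallest_field(
    candidates: List[str], field_of: Dict[str, str], field_size: Dict[str, int]
) -> str:
    """Return the candidate whose field holds the fewest second elements."""
    return min(candidates, key=lambda o: field_size[field_of[o]])


# Hypothetical data loosely modelled on FIG. 6.
triples = [
    ("S", "P1", "O1"),
    ("A", "P1", "O3"),
    ("B", "P1", "O5"),
    ("C", "P2", "O4"),
    ("D", "P3", "O2"),
]
field_of = {"O3": "CX", "O5": "CZ"}
field_size = {"CX": 3, "CZ": 12}

cands = same_relation_candidates(triples, ("S", "P1", "O1"))
print(cands)  # ['O3', 'O5']
print(pick_from_smallest_field(cands, field_of, field_size))  # O3
```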

The selection unit 44 may also widen the set of candidate triples by selecting second elements from triples containing relationship information “P2” that is similar to the relationship information “P1” of the correct triple. For example, the selection unit 44 may identify a triple containing “P2” and select the second element “O4” included in that triple as the incorrect answer data NS3.
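
Widening the pool with similar relations amounts to accepting any relation in a set of related relations. How two relations are judged similar is not specified in this passage, so the `similar` lookup table below is a hypothetical stand-in, as are the toy triples.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def candidates_with_similar_relations(
    triples: List[Triple], correct: Triple, similar: Dict[str, Set[str]]
) -> List[str]:
    """Second elements from triples whose relation equals, or is listed as
    similar to, the relation of the correct triple (excluding its own object)."""
    _, rel, obj = correct
    allowed = {rel} | similar.get(rel, set())
    return sorted({o for (_, r, o) in triples if r in allowed and o != obj})


triples = [
    ("S", "P1", "O1"),
    ("A", "P1", "O3"),
    ("B", "P1", "O5"),
    ("C", "P2", "O4"),
    ("D", "P3", "O2"),
]
similar = {"P1": {"P2"}}  # hypothetical similarity table
print(candidates_with_similar_relations(triples, ("S", "P1", "O1"), similar))
# ['O3', 'O4', 'O5']
```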

Returning to FIG. 2, the description continues. The generation unit 45 generates each distributed representation so that the sum of the distributed representations of the extracted first element and relationship information is not similar to the distributed representation of the selected second element. For example, consider the sum of the distributed representations of the first element and the relationship information in the correct triple. The generation unit 45 generates the distributed representations of each element and of the relationship information so that this sum becomes the distributed representation of the second element serving as correct answer data, while differing from the distributed representation of the second element serving as incorrect answer data. The generation unit 45 then registers the generated distributed representations in the distributed representation database 33.

Any specific method may be used to generate the distributed representations, provided that it is a generation method that uses negative sampling.
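
Since any negative-sampling method may be used, the sketch below picks one familiar option purely as an assumption: a TransE-style margin ranking (hinge) loss over squared Euclidean distance, optimised with plain SGD on Python lists. The names `init_embeddings`, `sq_dist` and `sgd_step`, the margin and the learning rate are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vec = List[float]


def init_embeddings(names: List[str], dim: int, seed: int) -> Dict[str, Vec]:
    rng = random.Random(seed)
    return {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in names}


def sq_dist(s: Vec, p: Vec, o: Vec) -> float:
    """Squared L2 distance between s + p and o."""
    return sum((a + b - c) ** 2 for a, b, c in zip(s, p, o))


def sgd_step(
    emb: Dict[str, Vec],
    triple: Tuple[str, str, str],
    neg_obj: str,
    lr: float = 0.05,
    margin: float = 1.0,
) -> float:
    """One hinge-loss update: pull s + p toward o, push it away from neg_obj."""
    s, p, o = triple
    vs, vp, vo, vn = emb[s], emb[p], emb[o], emb[neg_obj]
    loss = margin + sq_dist(vs, vp, vo) - sq_dist(vs, vp, vn)
    if loss <= 0.0:
        return 0.0  # margin already satisfied: no gradient
    r_pos = [a + b - c for a, b, c in zip(vs, vp, vo)]
    r_neg = [a + b - c for a, b, c in zip(vs, vp, vn)]
    for i in range(len(vs)):
        g = 2.0 * (r_pos[i] - r_neg[i])  # dL/ds_i == dL/dp_i
        vs[i] -= lr * g
        vp[i] -= lr * g
        vo[i] += lr * 2.0 * r_pos[i]  # dL/do_i = -2 * r_pos_i
        vn[i] -= lr * 2.0 * r_neg[i]  # dL/dn_i = +2 * r_neg_i
    return loss


emb = init_embeddings(["S", "P1", "O1", "O2"], dim=4, seed=0)
for _ in range(200):
    sgd_step(emb, ("S", "P1", "O1"), "O2")
```

After training, s + p sits at least `margin` closer (in squared distance) to the correct second element than to the negative one, which is the "not similar" condition in concrete form.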

The search unit 46 executes search processing using the distributed representations. For example, the search unit 46 receives a first element and relationship information from the user terminal 100 as a search query. In this case, the search unit 46 refers to the distributed representation database 33 and identifies the distributed representations of the first element and the relationship information received as the search query. The search unit 46 then calculates the sum of these distributed representations and identifies, in the distributed representation database 33, the distributed representation closest to that sum in the distributed representation space. Finally, the search unit 46 reads the data of the entity corresponding to the identified distributed representation from the entity database 31 and transmits it to the user terminal 100.
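
The search step reduces to a nearest-neighbour lookup around the vector sum. The sketch below is a brute-force linear scan over hypothetical two-dimensional toy vectors; a large deployment would more likely use an approximate nearest-neighbour index, which this passage does not discuss.

```python
from typing import Dict, Iterable, List

Vec = List[float]


def search(emb: Dict[str, Vec], subject: str, relation: str, candidates: Iterable[str]) -> str:
    """Return the candidate whose vector is nearest to emb[subject] + emb[relation]."""
    target = [a + b for a, b in zip(emb[subject], emb[relation])]

    def sq_dist(name: str) -> float:
        # Squared Euclidean distance has the same arg-min as Euclidean distance.
        return sum((t - x) ** 2 for t, x in zip(target, emb[name]))

    return min(candidates, key=sq_dist)


emb = {
    "S": [1.0, 0.0],
    "P1": [0.0, 1.0],
    "O1": [0.9, 1.1],
    "O2": [-1.0, 0.0],
}
print(search(emb, "S", "P1", ["O1", "O2"]))  # O1
```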

When the search unit 46 receives, as a search query, a first element or relationship information that is not registered in the entity database 31 or the distributed representation database 33, it may first search for the registered first element most similar to the received first element, or the registered relationship information most similar to the received relationship information. It may then search for the corresponding second element using the distributed representations of the first element and relationship information found in this way.
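
The fallback can be sketched as a resolution step performed before the vector search. The passage does not say how "most similar" is measured, so this sketch swaps in plain string similarity from Python's standard `difflib.get_close_matches`; the function name `resolve` and the example terms are hypothetical.

```python
import difflib
from typing import List, Optional


def resolve(name: str, registered: List[str]) -> Optional[str]:
    """Map a query term to a registered term, falling back to the closest string match."""
    if name in registered:
        return name
    # cutoff=0.0 accepts every candidate, so the best match is returned if any exist.
    matches = difflib.get_close_matches(name, registered, n=1, cutoff=0.0)
    return matches[0] if matches else None


print(resolve("birthplace", ["birthPlace", "spouse"]))  # birthPlace
```

The resolved first element and relationship information would then be passed to the ordinary vector-sum search.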

[3. Example of the Flow of Processing Executed by the Information Providing Apparatus]
Next, the flow of the generation processing executed by the information providing apparatus 10 will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an example of the flow of the generation processing executed by the information providing apparatus according to the embodiment. The information providing apparatus 10 may execute the processing shown in FIG. 7 in any unit and at any timing.

First, the information providing apparatus 10 extracts the first element and the relationship information from the correct triple and sets the second element of the correct triple as correct answer data (step S101). Next, the information providing apparatus 10 probabilistically determines whether to limit the selection source of the incorrect answer data (step S102). If it determines not to limit the selection source (step S103: No), the information providing apparatus 10 randomly selects incorrect answer data from the second elements of triples other than the predetermined triple (step S104). If it determines to limit the selection source (step S103: Yes), it selects incorrect answer data from second elements belonging to a predetermined field determined according to the accuracy score (step S105).

The information providing apparatus 10 then generates a correct pair and an incorrect pair using the correct answer data, the incorrect answer data, and the extracted first element and relationship information (step S106). It learns the distributed representations using the correct pair and the incorrect pair (step S107) and ends the processing.
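
Steps S101 to S106 of FIG. 7 can be condensed into one function that returns the correct and incorrect pairs for a single correct triple; step S107 would pass them to whatever negative-sampling update is used. The names `make_pairs`, `p_limit` and `window`, and the choice to return `None` when the limited pool is empty, are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def make_pairs(
    correct: Triple,
    triples: List[Triple],
    accuracy: Callable[[str, str], float],
    p_limit: float,
    window: Tuple[float, float],
    rng: random.Random,
) -> Optional[Tuple[Triple, Triple]]:
    """Steps S101-S106 of FIG. 7 for one correct triple (hypothetical sketch)."""
    s, p, o = correct  # S101: first element, relation, correct answer data
    limit = rng.random() < p_limit  # S102: probabilistic decision; S103 branches on it
    pool = sorted({t[2] for t in triples if t[2] != o})
    if limit:  # S105: keep candidates whose accuracy score is in range
        low, high = window
        pool = [x for x in pool if low <= accuracy(o, x) <= high]
    if not pool:
        return None
    neg = rng.choice(pool)  # S104 or S105: random pick from the pool
    return (s, p, o), (s, p, neg)  # S106: correct pair and incorrect pair
```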

[4. Modifications]
An example of the generation processing and search processing performed by the information providing apparatus 10 has been described above. However, the embodiment is not limited to this example. Variations of the generation processing and search processing executed by the information providing apparatus 10 are described below.

[4-1. Entity Types]
The example above described triples that associate a person with that person's name. However, the embodiment is not limited to this. That is, the information providing apparatus 10 may generate distributed representations by executing the generation processing described above on triples that associate entities representing any kind of thing or event.

[4-2. Device Configuration]
The databases 31 to 33 registered in the storage unit 30 may instead be held in an external storage server. The information providing apparatus 10 may also be implemented as a front-end server that performs the search processing and a back-end server that performs the generation processing. In that case, the search unit 46 shown in FIG. 2 is placed in the front-end server, and the calculation unit 41, extraction unit 42, determination unit 43, selection unit 44, and generation unit 45 are placed in the back-end server.

[4-3. Others]
Of the processes described in the above embodiment, all or part of those described as performed automatically may be performed manually; conversely, all or part of those described as performed manually may be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in this document and the drawings may be changed arbitrarily unless otherwise specified. For example, the various types of information illustrated in each drawing are not limited to the illustrated information.

Each component of each illustrated device is functionally conceptual and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that shown in the drawings; all or part of each device may be functionally or physically distributed or integrated in any unit according to various loads, usage conditions, and the like.

The embodiments described above may be combined as appropriate as long as their processing contents do not contradict each other.

[4-4. Program]
The information providing apparatus 10 according to the embodiment described above is implemented by, for example, a computer 1000 configured as shown in FIG. 8. FIG. 8 is a diagram illustrating an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020. In the computer 1000, an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected via a bus 1090.

The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as RAM, that temporarily stores data used by the arithmetic device 1030 for various computations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various computations and various databases are registered. It is implemented by ROM (Read Only Memory), an HDD (Hard Disk Drive), flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, such as a monitor or printer, which outputs various types of information. It is implemented by a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, keyboard, and scanner, and is implemented by, for example, USB.

The input device 1020 may be, for example, a device that reads information from an optical recording medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or semiconductor memory. The input device 1020 may also be an external storage medium such as USB memory.

The network IF 1080 receives data from other devices via the network N and sends it to the arithmetic device 1030, and transmits data generated by the arithmetic device 1030 to other devices via the network N.

The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 into the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the information providing apparatus 10, the arithmetic device 1030 of the computer 1000 implements the functions of the control unit 40 by executing the program loaded into the primary storage device 1040.

[5. Effects]
As described above, the information providing apparatus 10 extracts the first element and the relationship information included in the correct triple. The information providing apparatus 10 also probabilistically determines whether to limit the selection source of the second element serving as incorrect answer data. Based on the result of this determination, it selects at least a second element that is not included in the predetermined triple from among the second elements included in arbitrary triples. It then generates each distributed representation so that the sum of the distributed representations of the extracted first element and relationship information is not similar to the distributed representation of the selected second element.

For example, when the information providing apparatus 10 determines to limit the selection source of the second element serving as incorrect answer data, it selects, from among the second elements included in arbitrary triples, a second element that is not included in the correct triple and that belongs to a predetermined field. More specifically, in that case it selects as incorrect answer data a second element that belongs to a field identical or similar to that of the second element in the correct triple but is not itself included in the correct triple. When the information providing apparatus 10 determines not to limit the selection source, it selects as incorrect answer data any second element, from among those included in arbitrary triples, that is not included in the correct triple.

In this way, the information providing apparatus 10 learns the distributed representations using incorrect answer data that, with a certain probability, belongs to a predetermined field. It can therefore learn the distributed representations using, as incorrect answer data, second elements similar to the second element serving as correct answer data, and thereby improve the accuracy of the generated distributed representations.

In addition, the information providing apparatus 10 probabilistically determines whether to limit the selection source when the accuracy of the distributed representations satisfies a predetermined condition, and determines not to limit the selection source when it does not. For example, the information providing apparatus 10 makes the probabilistic determination when the rate of increase in the accuracy of the distributed representations has not changed for a predetermined period. It may also make it when the accuracy exceeds a predetermined threshold. In the early stage of learning, the information providing apparatus 10 therefore performs the same learning process as the conventional technique. Once the accuracy exceeds what conventional learning is estimated to achieve, or once conventional learning stops improving it, the apparatus probabilistically selects incorrect answer data from a predetermined field, performing learning that further improves accuracy. As a result, the information providing apparatus 10 can learn more efficiently.
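
The switching rule can be sketched as follows. Two points are assumptions: "the rate of increase has not changed for a predetermined period" is read here as "accuracy has stopped improving over the last `patience` measurements", and the values of `threshold`, `patience` and `eps` are illustrative only. The default `p_limit=0.4` reflects the embodiment's separate statement that limiting is chosen with a probability of 40% or less.

```python
import random
from typing import Sequence


def should_limit(
    history: Sequence[float],
    rng: random.Random,
    p_limit: float = 0.4,
    threshold: float = 0.8,
    patience: int = 3,
    eps: float = 1e-3,
) -> bool:
    """Decide whether to restrict the negative-candidate pool for this step.

    history holds accuracy measurements of the embeddings, oldest first.
    """
    if not history:
        return False
    above = history[-1] > threshold
    # Assumed reading of the plateau condition: the gain over the last
    # `patience` measurements is below eps.
    stalled = len(history) > patience and history[-1] - history[-1 - patience] < eps
    if above or stalled:
        return rng.random() < p_limit  # probabilistic decision
    return False  # early phase: behave like ordinary random negative sampling
```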

The information providing apparatus 10 also calculates an accuracy score indicating the similarity between second elements included in triples. When it determines to limit the selection source of the second element serving as incorrect answer data, it selects a second element from among those included in arbitrary triples. The selected element is not included in the correct triple, and its accuracy score relative to the second element in the correct triple falls within a predetermined range. The information providing apparatus 10 can thereby use, for example, a second element similar to the correct second element as incorrect answer data, which improves the accuracy of the generated distributed representations.

The information providing apparatus 10 also calculates an accuracy score indicating the semantic similarity between second elements. Because semantic similarity is thereby reflected in the distributed representation space, the information providing apparatus 10 can realize, for example, semantic search processing.

In addition, as the accuracy of the distributed representations rises, the information providing apparatus 10 selects as incorrect answer data second elements from fields increasingly similar to the field of the second element in the correct triple; these fields serve as the predetermined field. Because finer-grained similarities are reflected in the distributed representations as learning progresses, the information providing apparatus 10 can improve their accuracy and generate them efficiently.

The information providing apparatus 10 also selects, as incorrect answer data, a second element belonging to a field that contains fewer second elements than other fields. This prevents the accuracy of the distributed representations from deteriorating even when the distribution of second elements across fields is skewed.

The information providing apparatus 10 extracts, as the first element, the element included in a triple as information corresponding to the subject, and extracts, as the relationship information, the element included as information corresponding to the predicate. It then selects, as the second element, the element included in a triple as information corresponding to the object. The information providing apparatus 10 can therefore generate distributed representations from triples consisting of information corresponding to S, P, and O, such as RDF triples.

The information providing apparatus 10 determines to limit the selection source of the second element with a probability of 40% or less. This allows the information providing apparatus 10 to improve the accuracy of the generated distributed representations.

When the information providing apparatus 10 determines to limit the selection source of the second element, it selects a second element that is not included in the correct triple from among the second elements included in arbitrary triples containing the relationship information of the correct triple. As a result, the information providing apparatus 10 can expand the selection range of the incorrect answer data, which prevents the accuracy of the distributed representations from deteriorating even when the data amount is small.

Several embodiments of the present application have been described above in detail with reference to the drawings. However, these are merely examples. The present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure section of the invention.

The term "unit" (section, module) used above can be read as "means", "circuit", and the like. For example, the determination unit can be read as determination means or a determination circuit.

DESCRIPTION OF SYMBOLS
10 Information providing apparatus
20 Communication unit
30 Storage unit
31 Entity database
32 Relationship information database
33 Distributed representation database
40 Control unit
41 Calculation unit
42 Extraction unit
43 Determination unit
44 Selection unit
45 Generation unit
46 Search unit
100 User terminal

Claims

1. A generation device comprising:
an extraction unit that extracts a first element and relationship information included in a predetermined triple;
a determination unit that probabilistically determines whether to limit a selection source of a second element;
a selection unit that, based on a result of the determination, selects at least a second element that is not included in the predetermined triple from among second elements included in arbitrary triples; and
a generation unit that generates each distributed representation so that a sum of a distributed representation of the extracted first element and a distributed representation of the relationship information is not similar to a distributed representation of the selected second element.

2. The generation device according to claim 1, wherein, when it is determined that the selection source of the second element is to be limited, the selection unit selects, from among the second elements included in arbitrary triples, a second element that is not included in the predetermined triple and that belongs to a predetermined field.

3. The generation device according to claim 2, wherein, when it is determined that the selection source of the second element is to be limited, the selection unit selects, from among the second elements included in arbitrary triples, a second element that belongs to a field identical or similar to that of the second element included in the predetermined triple and that is not included in the predetermined triple.

4. The generation device according to any one of claims 1 to 3, wherein, when it is determined that the selection source of the second element is not to be limited, the selection unit selects, from among the second elements included in arbitrary triples, a second element that is not included in the predetermined triple.

5. The generation device according to claim 1, wherein the determination unit probabilistically determines whether to limit the selection source when the accuracy of the distributed representations satisfies a predetermined condition, and determines not to limit the selection source when the accuracy of the distributed representations does not satisfy the predetermined condition.

6. The generation device according to claim 5, wherein the determination unit probabilistically determines whether to limit the selection source when the rate of increase in the accuracy of the distributed representations does not change for a predetermined period.

7. The generation device according to claim 5 or 6, wherein the determination unit probabilistically determines whether to limit the selection source when the accuracy of the distributed representations exceeds a predetermined threshold.

The generation device according to any one of claims 1 to 7, further comprising a calculation unit that calculates an accuracy (degree of certainty) indicating the similarity between second elements included in the triples,
wherein, when it is determined that the selection source of the second element is to be limited, the selection unit selects, from among the second elements included in arbitrary triples, a second element that is not included in the predetermined triple and whose accuracy with respect to the second element of the predetermined triple falls within a predetermined range.

The generation device according to claim 8, wherein the calculation unit calculates an accuracy indicating a semantic similarity between the second elements.
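For claims 8 and 9, one assumption is to take the similarity between second elements as the cosine similarity of their current distributed representations, and to accept negatives whose score lies in a band `[lo, hi]`. The band bounds and the names `cosine` and `select_in_band` are hypothetical. A semantic similarity obtained from another source (claim 9) could replace `cosine` without changing the selection logic.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_in_band(emb, second_elements, target, lo=0.3, hi=0.9, rng=random):
    """Pick a negative whose similarity to the true second element is in [lo, hi]."""
    _, _, t = target
    band = [s for s in sorted(second_elements)
            if s not in target and lo <= cosine(emb[s], emb[t]) <= hi]
    return rng.choice(band) if band else None
```

One plausible reason for an upper bound is that candidates almost identical to the true second element may themselves be valid answers, so treating them as negatives could mislead training.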

The generation device according to any one of the preceding claims, wherein the selection unit selects a second element belonging to a field that is more similar to the field of the second element included in the predetermined triple as the accuracy of the distributed representations increases.

The generation device according to any one of claims 1 to 10, wherein the selection unit selects a second element belonging to a field that contains fewer second elements than other fields.
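Claims 10 and 11 describe two field-selection heuristics, sketched below as hypothetical helpers. The sketch assumes that the similarity of each candidate field to the true second element's field has been precomputed into a dictionary, and that the accuracy of the distributed representations is normalized to [0, 1]. Low accuracy maps to the least similar field and high accuracy to the most similar one, which gives a simple curriculum from easy to hard negatives.

```python
def pick_field(field_sim, accuracy):
    """Claim 10 sketch: the higher the accuracy, the closer the chosen field.

    field_sim : dict mapping field -> similarity to the true second element's field
    accuracy  : current accuracy, assumed to lie in [0, 1]
    """
    ranked = sorted(field_sim, key=lambda f: (field_sim[f], f))  # least -> most similar
    acc = min(max(accuracy, 0.0), 1.0)
    idx = min(int(acc * len(ranked)), len(ranked) - 1)
    return ranked[idx]


def smallest_field(field_members):
    """Claim 11 sketch: the field containing the fewest second elements."""
    return min(field_members, key=lambda f: (len(field_members[f]), f))
```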

The generation device according to any one of claims 1 to 11, wherein the extraction unit extracts, as the first element, an element included in the triple as information corresponding to the subject, and extracts, as the relationship information, an element included in the triple as information corresponding to the predicate, and
the selection unit selects, as the second element, an element included in the triple as information corresponding to the object.

The generation device according to any one of claims 1 to 12, wherein, when it is determined that the selection source of the second element is to be limited, the selection unit selects a second element that is not included in the predetermined triple from among the second elements included in arbitrary triples that include the relationship information extracted by the extraction unit.
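Claims 12 and 13 fix the roles within a triple (subject, predicate, object) and restrict the candidate pool to triples that share the extracted relationship information. A minimal sketch, using a hypothetical `Triple` named tuple and a hypothetical `candidates_same_relation` helper:

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # first element
    predicate: str  # relationship information
    obj: str        # second element (the object)


def candidates_same_relation(triples: List[Triple], target: Triple) -> List[str]:
    """Second elements of triples sharing the target's predicate, excluding the target's own elements."""
    return sorted({tr.obj for tr in triples
                   if tr.predicate == target.predicate and tr.obj not in target})
```

Restricting by predicate keeps negatives type-compatible with the relation (for `capital_of`, only countries remain), so they are plausible but wrong answers.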

A generation method executed by a generation device, the method comprising:
an extraction step of extracting a first element and relationship information included in a predetermined triple;
a determination step of probabilistically determining whether to limit the selection source of a second element;
a selection step of selecting, based on the determination result, at least a second element not included in the predetermined triple from among the second elements included in arbitrary triples; and
a generation step of generating each distributed representation so that the sum of the distributed representation of the extracted first element and the distributed representation of the relationship information is not similar to the distributed representation of the selected second element.

A generation program for causing a computer to execute: an extraction procedure of extracting a first element and relationship information included in a predetermined triple;
a determination procedure of probabilistically determining whether to limit the selection source of a second element;
a selection procedure of selecting, based on the determination result, at least a second element not included in the predetermined triple from among the second elements included in arbitrary triples; and
a generation procedure of generating each distributed representation so that the sum of the distributed representation of the extracted first element and the distributed representation of the relationship information is not similar to the distributed representation of the selected second element.