JP2017059077A

JP2017059077A - Information provision device, information provision method, and information provision program

Info

Publication number: JP2017059077A
Application number: JP2015184649A
Authority: JP
Inventors: Yu Miyazaki; Kaori Tanio; Hayato Kobayashi; Masaki Noguchi; Kohei Sugawara
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2015-09-18
Filing date: 2015-09-18
Publication date: 2017-03-23
Anticipated expiration: 2035-09-18
Also published as: JP6552353B2

Abstract

PROBLEM TO BE SOLVED: To output information that assists the creative work of a user.
SOLUTION: An information provision device of the present invention includes: a reception unit that receives input information; a pattern identification unit that extracts, from the input information, a specific syntax and a group of words embedded in that syntax; a similarity computation unit that computes a similarity using distributed representations of the identified group of words included in the identified pattern; and an output unit that outputs information extracted based on the similarity computed by the similarity computation unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information providing apparatus, an information providing method, and an information providing program.

Conventionally, techniques are known that search for or generate information related to input information based on an analysis of that input, and output the retrieved or generated information as a response. One example is a natural language processing technique that converts the words, sentences, and context contained in input text into multidimensional vectors, analyzes them, infers from the analysis result text similar to the input text or text that follows the input text, and outputs the inferred result.

JP 2006-127077 A

"Natural Language Processing with word2vec", Yasukazu Nishio, May 2014, ISBN 978-4-87311-683-9
"Research on hypothetical knowledge generation support for creative design", The Japan Society of Mechanical Engineers, No. 03-27, Proceedings of the 13th Design Engineering and Systems Division Conference

However, the conventional techniques described above may be unable to output information that assists the user's creative work. For example, they merely output information the user can predict, such as text similar to the input text or text that follows it, and have difficulty outputting information that is related to the input text yet unpredictable to the user. As a result, they cannot provide information that inspires the user.

The present application has been made in view of the above, and its object is to provide an information providing apparatus, an information providing method, and an information providing program capable of outputting information that assists the user's creative work.

The information providing apparatus according to the present application includes: a reception unit that receives input information; a pattern identification unit that extracts, from the input information, a specific syntax and a word group embedded in that syntax; a similarity calculation unit that calculates a similarity using distributed representations of the identified word group included in the identified pattern; and an output unit that outputs information extracted based on the similarity calculated by the similarity calculation unit.

According to one aspect of the embodiment, information that assists the user's creative work can be output.

FIG. 1 is a diagram illustrating an example of the information providing apparatus according to the embodiment.
FIG. 2 is a diagram showing the equivalent conversion theory, which is one of the invention discovery techniques.
FIG. 3 is a diagram illustrating a first example of input and output variations of the information providing apparatus according to the embodiment.
FIG. 4 is a diagram illustrating a second example of input and output variations of the information providing apparatus according to the embodiment.
FIG. 5 is a diagram illustrating a third example of input and output variations of the information providing apparatus according to the embodiment.
FIG. 6 is a diagram illustrating a fourth example of input and output variations of the information providing apparatus according to the embodiment.
FIG. 7 is a diagram illustrating an example of the functional configuration of the information providing apparatus according to the embodiment.
FIG. 8 is a diagram for explaining an example of words extracted by the information providing apparatus according to the embodiment.
FIG. 9 is a diagram illustrating an example of word sets extracted by the information providing apparatus according to the embodiment.
FIG. 10 is a diagram for explaining an example of processing, executed by the information providing apparatus according to the embodiment, that expands the distributed representation space.
FIG. 11 is a diagram for explaining the limitation and expansion of the distributed representation space executed by the information providing apparatus according to the embodiment.
FIG. 12 is a diagram illustrating an example of processing in which the information providing apparatus according to the embodiment extracts similar words.
FIG. 13 is a diagram for explaining an example of the relationships between word vectors.
FIG. 14 is a diagram for explaining an example of the concept of a word proposed by the information providing apparatus.
FIG. 15 is a flowchart illustrating the flow of the extraction process executed by the information providing apparatus according to the embodiment.
FIG. 16 is a flowchart illustrating the specific flow of the extraction process executed by the information providing apparatus according to the embodiment.
FIG. 17 is a flowchart illustrating the specific flow of the process in which the information providing apparatus according to the embodiment selects word sets that are likely to trigger serendipity.
FIG. 18 is a diagram illustrating an example of processing that outputs information related to both information in the input field and information in a different field.
FIG. 19 is a hardware configuration diagram illustrating an example of a computer that implements the extraction process.

Hereinafter, modes for carrying out the information providing apparatus, the information providing method, and the information providing program according to the present application (hereinafter referred to as "embodiments") are described in detail with reference to the drawings. The embodiments do not limit the information providing apparatus, the information providing method, or the information providing program according to the present application. In the following embodiments, identical parts are given the same reference signs, and redundant description is omitted.

[1. Example of the information providing apparatus]
First, an example of the processing executed by the information providing apparatus 10 is described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the information providing apparatus according to the embodiment. In the example illustrated in FIG. 1, while a plurality of users are holding a meeting such as a brainstorming session, the information providing apparatus 10 acquires a user's remark as input information, generates from it a remark that would not come up in ordinary thinking, converts the generated remark into speech, and has a robot or the like output it, thereby supporting the users' thinking.

The information providing apparatus 10 illustrated in FIG. 1 is realized by an information processing apparatus such as a server apparatus. The information providing apparatus 10 may be realized by a single information processing apparatus or, for example, by a plurality of information processing apparatuses on a cloud network operating in cooperation. The information providing apparatus 10 converts a user's remark into text data and analyzes the text data by natural language processing. Based on the analysis result, it then generates a remark that supports the meeting and the users' thinking, and outputs the generated remark.

Here, the prior art uses distributed representations, in which the words constituting the input text are expressed as multidimensional word vectors, to infer text similar to the input text or text that follows it. However, the prior art merely outputs text whose distributed representation is similar to that of the input text, that is, text the user can predict. The prior art therefore could not provide information the user had never thought of or information that gives the user a fresh inspiration, that is, information that can trigger the user's serendipity (inspiration, awareness, surprise). Conversely, simply outputting text whose distributed representation is dissimilar to that of the input text would output text unrelated to the user's thinking, which may hinder that thinking.

The information providing apparatus 10 therefore executes the following processing. First, the information providing apparatus 10 receives a user's remark or the like as input information. On receiving the input information, it identifies a specific syntax in the input information and extracts the word group embedded in that syntax. It then uses the distributed representations of the word group contained in the specific syntax to calculate the similarity to the distributed representations of other words. Finally, it outputs information extracted based on the calculated similarity.

Specifically, the information providing apparatus 10 identifies the field to which the input information belongs. It then identifies, from information belonging to the identified field, a plurality of pieces of information having a predetermined relationship. Finally, it extracts information having a concept similar to the concept formed by the identified pieces of information from information belonging to a field different from the identified field.

Here, the information providing apparatus 10 uses the distributed representations of a plurality of pieces of information having a predetermined relationship to extract information whose attribute has a similarity, to the attribute formed by those pieces of information, that satisfies a predetermined condition. The extraction source is information belonging to a field whose similarity to the field of the input information is at or below a predetermined threshold. For example, the information providing apparatus 10 identifies a plurality of words having a predetermined relationship from documents in the field to which the input information belongs, and extracts words having the same attribute as that formed by the identified words from documents belonging to fields whose similarity to the identified field is at or below a predetermined threshold. The information providing apparatus 10 then outputs the extracted information.

Here, the concept formed by a plurality of pieces of information is, for example, the attributes that each piece of information has. As a more specific example, when words are used as the information, the set of attributes such as what language each word is in and what meaning each word has is treated as the attributes of the plurality of words, that is, as the concept of the word set. In the following description, the attributes of a word, a piece of biometric information, a content item, or the like handled as information are referred to as the "concept" of that information, and the attributes of a plurality of words, pieces of biometric information, content items, or the like are referred to as the "concept" of that plurality of pieces of information.

That is, the information providing apparatus 10 identifies the concept formed by information having a predetermined relationship in the category to which the input information belongs, and extracts information that can form a common concept from information belonging to a category different from that of the input information. As a result, the information providing apparatus 10 can provide information that keeps an implicit connection with the input information while being explicitly discontinuous from it, that is, information that at first glance seems unrelated because it belongs to a different category.

Information that keeps an implicit connection with the input information while being explicitly discontinuous from it, that is, information that at first glance seems unrelated because it belongs to a different category, can be information the user had never thought of or information that gives the user a fresh inspiration.

For example, the information providing apparatus 10 identifies, from information in the field to which the input information belongs, a plurality of pieces of information indicating the key point of an invention or idea in that field, and identifies the concept those pieces of information indicate, that is, the concept representing the key point of the invention or idea. The information providing apparatus 10 then extracts information similar to the identified concept from information in other fields. Information extracted in this way satisfies the concept representing the key point of an invention or idea in a field different from that of the input information. In other words, it suggests to the user the key point of an invention or idea in another field and can trigger the user's serendipity. As a result, the information providing apparatus 10 can assist the user's creative work.

For example, FIG. 2 is a diagram showing the equivalent conversion theory, which is one of the invention discovery techniques. As shown at (A) in FIG. 2, an original concrete event Ao (for example, an original invention) is, as shown at (B) in FIG. 2, established from a viewpoint vi suited to the development purpose. As shown at (C) in FIG. 2, Ao also includes special conditions Σa specific to Ao, such as the technical attributes to which Ao belongs. Therefore, as shown at (D) in FIG. 2, the concept cε obtained by removing Σa from Ao can be the concept that forms the core of Ao. When special conditions Σb different from Σa are then applied to the concept cε, as shown at (E) in FIG. 2, a new invention Bτ can conceptually be derived, as shown at (F) in FIG. 2.

Accordingly, through the extraction process described later, the information providing apparatus 10 extracts the concept of an idea in the field to which the input information belongs, that is, the concept cε that forms the core of Ao, and provides the user with information from which a new idea Bτ can be derived by applying the extracted concept cε to the conditions Σb of another field.

An example of the specific extraction process executed by the information providing apparatus 10 is described below using the example of FIG. 1. First, the information providing apparatus 10 receives remark A and remark B made by users in the meeting as input (step S1). For example, the information providing apparatus 10 converts remark A uttered by a user into text data and acquires the converted text data as input information.

In this case, the information providing apparatus 10 executes an extraction process that extracts words that belong to a field different from the field of the input information and that indicate a concept similar to the concept formed by a plurality of words having a predetermined relationship in the field of the input information (step S2). The flow of this process is described below, divided into steps S3 to S8.

First, the information providing apparatus 10 identifies the field to which the input information belongs (step S3). For example, the information providing apparatus 10 analyzes the text data of the user's remark and identifies the field in which the words contained in the text data are used. When the text contains words such as "glasses type", "watch type", "display", and "smart device", the information providing apparatus 10 sets the field of the input information to "wearable device". The information providing apparatus 10 may, for example, store in advance associations between words likely to appear in input information and the fields of input information containing those words, and identify the field associated with the words contained in the input information as its field. Alternatively, it may use a web search service or the like to search for fields related to the words contained in the input information and treat the field obtained as the search result as the field of the input information.
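
The table-lookup variant of step S3 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the table `KEYWORD_TO_FIELD`, its entries other than the wearable-device keywords quoted above, and the function name `identify_field` are all assumptions made for the example, as is the choice to pick the field matched by the most keywords.

```python
from collections import Counter

# Hypothetical keyword-to-field table (illustrative entries only).
KEYWORD_TO_FIELD = {
    "glasses type": "wearable device",
    "watch type": "wearable device",
    "smart device": "wearable device",
    "display": "wearable device",
    "compiler": "programming",
}


def identify_field(text, table=KEYWORD_TO_FIELD):
    """Return the field matched by the most keywords in text, or None."""
    lowered = text.lower()
    votes = Counter()
    for keyword, field in table.items():
        if keyword in lowered:
            votes[field] += 1  # one vote per distinct keyword found
    if not votes:
        return None
    return votes.most_common(1)[0][0]


remark = "How about a glasses type smart device with a small display?"
print(identify_field(remark))
```

A production system would match on morphemes rather than substrings; substring matching is used here only to keep the sketch short.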

Subsequently, the information providing apparatus 10 extracts, from document data in the identified field, word sets each containing a plurality of words having a predetermined relationship (step S4). For example, the information providing apparatus 10 acquires document data in the identified field from a document database describing various technologies, such as published patent applications and patent gazettes. It then performs morphological analysis on the acquired document data and extracts word sets whose words appear in a predetermined order. More specifically, the information providing apparatus 10 identifies sentences having a structure used to explain the features of an idea or invention, and extracts the word sets contained in the identified sentences.

As a specific example, a sentence that explains the features of an idea or invention is expected to contain a noun indicating the target of processing, a state expression word or adverbial phrase indicating the state of the target indicated by the noun, and a verb indicating the content of the processing. If the function words that can appear between these words, such as particles, are enclosed in curly braces, such a sentence is expected to have the structure "(noun) {を} (state expression word or adverbial phrase) {で, によって, にして} (verb) {する, できる}", where を marks the object, で, によって, and にして correspond roughly to "with", "by means of", and "in the state of", and する and できる correspond to "do" and "can do". Hereinafter, this syntactic structure may be referred to as the cε dictionary method.

The information providing apparatus 10 therefore extracts sentences having the above structure from the document data in the identified field, and extracts the three words contained in each such sentence, namely the noun, the state expression word or adverbial phrase, and the verb, as a word set indicating an idea. In this way, the information providing apparatus 10 extracts a specific syntax from the input information and extracts the word group embedded in it. Here, the specific syntax is a syntax having a predetermined pattern, that is, the structure of the cε dictionary method. Because a sentence expressing an invention or idea can be reproduced from a word set extracted in this way simply by adding the appropriate case particles, the word set can represent the concept of the sentence from which it was extracted, that is, the concept of the invention or idea.
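
The pattern match described above can be sketched on a sentence that has already been morphologically analyzed. The coarse tags `NOUN`, `STATE`, `VERB`, and `FUNC` (for particles and auxiliaries), the function name `extract_word_sets`, and the sample sentence are assumptions for illustration; a real system would take its tags from a morphological analyzer and would also check which particles appear between the words.

```python
def extract_word_sets(tagged_sentence):
    """Return (noun, state, verb) triples that occur in this order,
    separated only by function words (tag "FUNC")."""
    # Drop particles and auxiliaries, keeping only content words.
    content = [(word, tag) for word, tag in tagged_sentence if tag != "FUNC"]
    triples = []
    for i in range(len(content) - 2):
        window = content[i:i + 3]
        if [tag for _, tag in window] == ["NOUN", "STATE", "VERB"]:
            triples.append(tuple(word for word, _ in window))
    return triples


# 眼鏡 を 着用 して 視聴 する  (glasses / wear / view)
sentence = [
    ("眼鏡", "NOUN"), ("を", "FUNC"),
    ("着用", "STATE"), ("して", "FUNC"),
    ("視聴", "VERB"), ("する", "FUNC"),
]
print(extract_word_sets(sentence))
```

Filtering out the function words first keeps the matcher simple, at the cost of not verifying that the particles are the ones listed in the pattern.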

Here, document data belonging to a given field is expected to contain many word sets indicating concepts that are taken for granted in that field. For example, document data in the programming field is expected to contain many sentences expressing the concept of "realizing" some idea by "creating" a "program". Among the word sets extracted from document data in the programming field, a word set containing words such as "program, create, realize" therefore cannot be said to contain a concept that can trigger the user's serendipity.

The information providing apparatus 10 therefore selects, from the extracted word sets, those that can trigger serendipity (step S5). That is, it selects word sets indicating a concept that is hard for the user to think of yet easy to understand when presented. Specifically, the information providing apparatus 10 selects such word sets based on the number of other extracted word sets indicating a similar concept and on the relationships between the words contained in each word set.

For example, to select word sets indicating a concept that is hard for the user to think of, the information providing apparatus 10 selects word sets for which the number of word sets indicating a similar concept is at or below a predetermined number. Then, to select word sets indicating a concept that is easy to understand when presented, it selects from among those word sets the ones whose words are related words of one another, or the ones for which, when related words are traced starting from one word in the set, the number of related words traced before reaching another word in the same set is at or below a predetermined threshold.
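
The two selection criteria of step S5 can be sketched as below. Everything named here is an assumption for illustration: the related-word graph, the precomputed `similar_counts`, the function names, and in particular the reading that every pair of words in a set must be reachable within `max_hops` links (the text does not say whether all pairs or only one pair must satisfy the threshold).

```python
from collections import deque


def hops_between(graph, start, goal):
    """Breadth-first search: number of related-word links from start
    to goal, or None if goal cannot be reached."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def select_candidates(word_sets, similar_counts, graph, max_similar, max_hops):
    """Keep word sets that are rare (few similar sets) yet coherent
    (all word pairs within max_hops related-word links)."""
    selected = []
    for ws in word_sets:
        if similar_counts.get(ws, 0) > max_similar:
            continue  # concept is too common in this field
        pairs = [(a, b) for i, a in enumerate(ws) for b in ws[i + 1:]]
        dists = [hops_between(graph, a, b) for a, b in pairs]
        if all(d is not None and d <= max_hops for d in dists):
            selected.append(ws)
    return selected


graph = {"glasses": ["wear"], "wear": ["glasses", "view"], "view": ["wear"]}
sets_ = [("glasses", "wear", "view"), ("program", "create", "realize")]
counts = {sets_[0]: 1, sets_[1]: 50}
print(select_candidates(sets_, counts, graph, max_similar=5, max_hops=2))
```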

Even among word sets that can trigger serendipity, a word set used across many documents is easy for the user to predict and is therefore less likely to trigger serendipity. The information providing apparatus 10 therefore removes, from the selected word sets, those that are used across documents (step S6).

For example, the information providing apparatus 10 calculates the document frequency (DF), which is the frequency with which each selected word set is contained in the documents. More specifically, for each word set, the information providing apparatus 10 divides the number of documents in which the word set appears by the total number of documents, and selects the word sets for which this value is at or below a predetermined threshold. The documents used to calculate the document frequency may or may not be limited to documents in the field identified in step S3.
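
The document-frequency filter of step S6 reduces to a ratio and a comparison. In the sketch below, each document is assumed to be represented as the set of word sets extracted from it; that representation, the function names, and the sample data are illustrative assumptions.

```python
def document_frequency(word_set, documents):
    """Number of documents containing word_set divided by the
    total number of documents."""
    if not documents:
        return 0.0
    hits = sum(1 for doc_sets in documents if word_set in doc_sets)
    return hits / len(documents)


def filter_by_df(word_sets, documents, threshold):
    """Keep word sets whose document frequency is at or below threshold."""
    return [ws for ws in word_sets
            if document_frequency(ws, documents) <= threshold]


common = ("program", "create", "realize")
rare = ("glasses", "wear", "view")
documents = [{common, rare}, {common}, {common}, {common}]

print(document_frequency(common, documents))  # 4 of 4 documents
print(document_frequency(rare, documents))    # 1 of 4 documents
print(filter_by_df([common, rare], documents, threshold=0.5))
```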

Restricting the selection to word sets whose document frequency is at or below the threshold, however, reduces the number of selected word sets. The information providing apparatus 10 therefore expands the selected word sets using verbs that indicate various operations performed on the object indicated by a noun (hereinafter referred to as operational verbs) (step S7). That is, for a set containing a plurality of pieces of information having a predetermined relationship, the information providing apparatus 10 generates new sets by combining it with a plurality of pieces of information indicating predetermined operations, thereby expanding the concept of the original set.

For example, the information providing apparatus 10 stores the operational verbs of Koberg and Bagnall in advance. These are verbs such as "increase", "divide", "remove", "subdue", "reverse", "separate", "transpose", "unify", "twist", "rotate", "flatten", "squeeze", "supplement", "submerge", "freeze", "soften", "inflate", "detour", "add", "subtract", "lighten", "repeat", "thicken", "stretch", "push out", "repel", "defend", "pull apart", "integrate", "symbolize", "abstract", and "cut".

続いて、情報提供装置１０は、選択された単語組に含まれる動詞を、上述した操作的動詞のそれぞれに置き換えた単語組を新たに生成する。例えば、情報提供装置１０は、「眼鏡、着用、視聴」といった単語組から、「眼鏡、着用、増やす」、「眼鏡、着用、分割する」等といった単語組を新たに生成する。この結果、情報提供装置１０は、利用者に対してセレンディピティを起こす可能性が高い概念のバリエーションを生成することができる。 Subsequently, the information providing apparatus 10 newly generates a word set in which the verb included in the selected word set is replaced with each of the above-described operational verbs. For example, the information providing apparatus 10 newly generates a word set such as “glasses, wear, increase”, “glasses, wear, divide” from a word set such as “glasses, wear, view”. As a result, the information providing apparatus 10 can generate a concept variation that is highly likely to cause serendipity for the user.
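The verb replacement of step S7 can be sketched as below. The sketch assumes a word set is a (noun, modifier, verb) triple and uses only a three-verb excerpt of the operational verb list; the names `OPERATIONAL_VERBS` and `expand_word_set` are hypothetical. The original word set is kept in the result, in line with the later description of the word space expansion unit.

```python
from typing import List, Sequence, Tuple

WordSet = Tuple[str, str, str]  # (noun, modifier, verb)

# Excerpt only; the full list is stored in advance by the apparatus.
OPERATIONAL_VERBS = ["increase", "divide", "remove"]


def expand_word_set(word_set: WordSet,
                    verbs: Sequence[str] = OPERATIONAL_VERBS) -> List[WordSet]:
    """Return the original word set plus one variant per operational verb."""
    noun, modifier, original_verb = word_set
    expanded = [word_set]
    for verb in verbs:
        if verb != original_verb:  # avoid duplicating the original
            expanded.append((noun, modifier, verb))
    return expanded
```

For the word set ("glasses", "wear", "view") this yields the original followed by the variants ending in "increase", "divide", and "remove".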

Subsequently, for each word set, the information providing apparatus 10 maps a vector representing the word set onto a space in which words from various fields are expressed as vectors, that is, a distributed representation space containing the distributed representations of words. The information providing apparatus 10 then extracts from the distributed representation space words that belong to a field different from the field of the word set and that indicate a concept similar to the concept of the word set (step S8).

For example, the information providing apparatus 10 uses W2V (word2vec) to store in advance a distributed representation space containing distributed representations of words belonging to various fields. This space also contains the distributed representations of words found in documents belonging to the field identified in step S3. Subsequently, the information providing apparatus 10 uses W2V to convert each word in a word set into its distributed representation and calculates the sum of those representations. That is, the information providing apparatus 10 calculates the sum of the vectors representing the words in the word set.
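As a small illustration of the vector sum, the sketch below adds up the vectors of the words in a word set. It assumes the distributed representations are already available as a plain dictionary from word to list of floats; in practice they would come from a trained word2vec model, which is not shown here. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def word_set_vector(word_set: Sequence[str],
                    embeddings: Dict[str, List[float]]) -> List[float]:
    """Element-wise sum of the vectors of all words in the word set."""
    vectors = [embeddings[word] for word in word_set]
    return [sum(components) for components in zip(*vectors)]
```

With toy two-dimensional vectors such as glasses = [1.0, 0.0], wear = [0.0, 1.0], and view = [1.0, 1.0], the word set vector is [2.0, 2.0].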

Here, in the distributed representation space, the direction of the sum of the vectors representing the words in a word set (hereinafter referred to as the word set vector) is considered to suggest the concept indicated by the word set. A word represented by a vector whose direction is similar to that of the word set vector is therefore considered to have a concept similar to the concept of the word set. However, even among words whose concept is similar to that of the word set, a word in the same field as the word set is one that the user can anticipate and is unlikely to induce serendipity.

The information providing apparatus 10 therefore extracts words whose concept is similar to the concept of the word set and that appear in documents of a field different from the field of the word set. More specifically, the information providing apparatus 10 extracts words represented by vectors whose direction is similar to that of the word set vector but that lie far from the word set vector in the distributed representation space, that is, words that indicate a concept similar to that of the word set while belonging to a different field.

For example, the more closely the directions of two vectors align, the larger the value of their cosine distance becomes (the term is used here for a measure that grows with alignment, which is usually called cosine similarity). Further, as a composite vector and another vector move farther apart, the cosine distance values between that other vector and each word vector making up the composite vector become smaller. The information providing apparatus 10 therefore calculates the cosine distance between the word set vector and each vector in the distributed representation space. The information providing apparatus 10 also calculates the cosine distance between the vector of each word in the word set and each vector in the distributed representation space.

The information providing apparatus 10 then extracts the vectors for which the cosine distance value to the word set vector is larger than a predetermined threshold and the sum of the cosine distances to the vectors of the individual words in the word set is smaller than a predetermined threshold, and identifies the words those vectors represent. That is, the information providing apparatus 10 extracts words whose concept is similar to the concept of the word set and that belong to a field different from the field of the word set. In other words, based on equivalent transformation theory, the information providing apparatus 10 extracts words that share an analogy with the word group extracted from the specific syntax and that belong to a field different from that of the extracted word group.
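The two-threshold extraction of step S8 can be sketched as follows. The sketch treats the text's "cosine distance" as cosine similarity, since the value is said to grow as directions align. The function names, the parameter names `sim_min` and `sum_max`, and the choice to skip words that are themselves members of the word set are assumptions for illustration; the text does not specify threshold values.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_suggestions(word_set: Sequence[str],
                        embeddings: Dict[str, List[float]],
                        sim_min: float,
                        sum_max: float) -> List[str]:
    """Words aligned with the word set vector but not close to its members."""
    members = [embeddings[w] for w in word_set]
    set_vec = [sum(c) for c in zip(*members)]  # word set vector (sum)
    suggestions = []
    for word, vec in embeddings.items():
        if word in word_set:
            continue  # assumption: never suggest a member of the word set
        aligned = cosine(vec, set_vec) > sim_min
        distant = sum(cosine(vec, m) for m in members) < sum_max
        if aligned and distant:
            suggestions.append(word)
    return suggestions
```

A candidate that points in nearly the same direction as every member word passes the first test but fails the second, which is how same-field words are screened out in this sketch.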

The information providing apparatus 10 then outputs the extracted word as a suggestion (step S9). For example, when the word "rotation" has been extracted for the word set "graph, batch, add" by the processing of step S8, the information providing apparatus 10 generates a sentence that contains the extracted word and proposes the concept it indicates, such as "How about rotating it?", and has a robot or the like read the generated sentence aloud as utterance C, thereby making a suggestion to the user.

When such a suggestion is made, the user is likely to arrive at a new inspiration based on the concept of the suggested word. The information providing apparatus 10 can thus induce serendipity in the user.

[2. Variations of the concepts provided by the information providing apparatus]
The example shown in FIG. 1 described processing that, from the words contained in a user's utterance, suggests a word indicating a concept likely to induce serendipity in the user. For example, FIG. 3 is a diagram illustrating a first example of input and output variations of the information providing apparatus according to the embodiment. As shown in (A) of FIG. 3, the information providing apparatus 10 uses, as documents, patent specifications such as published patent applications and maps the words they contain onto a distributed representation space using W2V, thereby generating a distributed representation space that contains the analogies common to old technologies and technologies in different fields.

Subsequently, as shown in (B) of FIG. 3, the information providing apparatus 10 identifies a service or technical field from the input information. Then, as shown in (C) of FIG. 3, by executing the extraction processing described above, the information providing apparatus 10 extracts, from among the analogies of fields different from the identified field, an analogy whose concept is close to the concept of an analogy in the identified field. As a result, the information providing apparatus 10 can output a hint for a new idea, as shown in (D) of FIG. 3.

このように出力されたヒントは、入力情報が属する分野におけるアイデアが有する概念と類似する概念であって、異なる分野における概念を利用者に想起させることができると予測される。このため、情報提供装置１０は、図３中（Ｅ）に示すように、古い技術の発想を新しい環境に展開したアイデアや、図３中（Ｆ）に示すように、異分野の技術を活用した新たな技術の創出を助けることができる。 The hint output in this way is a concept similar to the concept of the idea in the field to which the input information belongs, and it is predicted that the concept in a different field can be recalled by the user. For this reason, the information providing apparatus 10 utilizes an idea obtained by developing the idea of the old technology in a new environment as shown in (E) of FIG. 3 or a technology in a different field as shown in (F) of FIG. Can help create new technologies.

However, the embodiment is not limited to this. For example, instead of words, the information providing apparatus 10 may take a business model or a field of business as input and use a distributed representation space in which the contents of business models and contracts in various industries are converted into distributed representations, thereby outputting information that serves as a hint for inducing in the user the serendipity needed to create a new business model.

例えば、図４は、実施形態にかかる情報提供装置の入力と出力のバリエーションの第２の例を説明する図である。例えば、図４中（Ａ）に示すように、情報提供装置１０は、文献として、過去のビジネスが有する特徴をＷ２Ｖにより分散表現空間にマッピングすることで、古いビジネスや異分野技術の共通アナロジーを含む分散表現空間を生成する。 For example, FIG. 4 is a diagram illustrating a second example of input and output variations of the information providing apparatus according to the embodiment. For example, as shown in FIG. 4 (A), the information providing apparatus 10 maps, as a document, features of past business to a distributed representation space by W2V, so that a common analogy of old business and different field technologies can be obtained. Create a distributed representation space containing.

Subsequently, as shown in (B) of FIG. 4, the information providing apparatus 10 identifies, from the input information, the industry to which a service or business belongs. The information providing apparatus 10 then extracts the features of the services and businesses in the identified field and converts the extracted features into distributed representations. The information providing apparatus 10 also extracts, from businesses in other fields, the features of a business whose features are similar to the sum of the converted features. In this way, as shown in (C) of FIG. 4, it extracts, from among the analogies of industries different from the identified industry, an analogy whose concept is close to the concept of an analogy in the identified industry. As a result, the information providing apparatus 10 can output a hint for a new business, as shown in (D) of FIG. 4.

このように出力されたヒントは、入力情報が属する業種におけるビジネスの概念と類似する概念であって、異なる業種における概念を利用者に想起させることができると予測される。このため、情報提供装置１０は、図４中（Ｅ）に示すように、古いビジネスの発想を新しい環境に展開したビジネスや、図４中（Ｆ）に示すように、異業種のビジネスモデルを活用した新たなビジネスモデルの創出を助けることができる。 The hint output in this way is a concept similar to the business concept in the industry to which the input information belongs, and it is predicted that the concept in a different industry can be recalled by the user. For this reason, as shown in FIG. 4 (E), the information providing apparatus 10 can develop a business model in which an old business idea has been developed in a new environment, or a business model of a different industry as shown in FIG. 4 (F). It can help create new business models.

Further, for example, instead of words, the information providing apparatus 10 may take arbitrary content such as music, images, or advertisements as input and use a distributed representation space of content, thereby outputting content that induces in the user the serendipity needed to create new content.

例えば、図５は、実施形態にかかる情報提供装置の入力と出力のバリエーションの第３の例を説明する図である。例えば、図５中（Ａ）に示すように、情報提供装置１０は、文献に代えて、過去のコンテンツが有する特徴を分散表現空間にマッピングすることで、古いコンテンツや異分野のコンテンツの共通アナロジーを含む分散表現空間を生成する。 For example, FIG. 5 is a diagram illustrating a third example of input and output variations of the information providing apparatus according to the embodiment. For example, as illustrated in FIG. 5A, the information providing apparatus 10 maps the features of past contents to the distributed expression space instead of the literature, so that the common analogy of old contents and contents in different fields is mapped. Generate a distributed representation space containing.

Subsequently, as shown in (B) of FIG. 5, upon receiving content such as audio or an image as input information, the information providing apparatus 10 identifies the field to which the content belongs. The information providing apparatus 10 then extracts a plurality of features of the content in the identified field and converts the extracted features into distributed representations. The information providing apparatus 10 also extracts, from content in other fields, content similar to the sum of the converted features. In this way, as shown in (C) of FIG. 5, it extracts, from among the analogies of content in fields different from the identified field, an analogy whose concept is close to the concept of a content analogy in the identified field. As a result, the information providing apparatus 10 can output a hint for new content, as shown in (D) of FIG. 5.

このように出力されたヒントは、入力情報が属する分野のコンテンツが有する概念と類似する概念であって、異なる分野のコンテンツの概念を利用者に想起させることができると予測される。このため、情報提供装置１０は、図５中（Ｅ）に示すように、古いコンテンツの発想を新しい環境に展開したコンテンツや、図５中（Ｆ）に示すように、異分野のコンテンツを活用した新たなコンテンツの創出を助けることができる。また、情報提供装置１０は、図５中（Ｇ）に示すように、例えば、出力が音であった場合は、例えば、作曲の元となるフレーズの作成を助けることができる。 The hint output in this manner is a concept similar to the concept of the content in the field to which the input information belongs, and it is predicted that the user can recall the concept of the content in a different field. For this reason, the information providing apparatus 10 utilizes content obtained by developing the idea of old content in a new environment as shown in (E) of FIG. 5 or content in a different field as shown in (F) of FIG. Can help create new content. Further, as shown in FIG. 5G, for example, when the output is a sound, the information providing apparatus 10 can assist in creating a phrase that is a source of composition.

また、例えば、情報提供装置１０は、単語に代えて、利用者から取得した五感などの生体情報を入力とし、各種生体情報を分散表現した分散表現空間を用いることで、セレンディピティを利用者に起こさせるようなヒントとなる情報を出力してもよい。 In addition, for example, the information providing apparatus 10 causes the user to generate serendipity by using biometric information such as the five senses acquired from the user instead of words and using a distributed expression space in which various biometric information is expressed in a distributed manner. Information that serves as a hint may be output.

例えば、図６は、実施形態にかかる情報提供装置の入力と出力のバリエーションの第４の例を説明する図である。例えば、図６中（Ａ）に示すように、情報提供装置１０は、文献にかえて、過去に取得された生体情報が有する特徴を分散表現空間にマッピングすることで、古い生体情報や異種別の生体情報が有する共通アナロジー（例えば、パターン）を含む分散表現空間を生成する。 For example, FIG. 6 is a diagram illustrating a fourth example of input and output variations of the information providing apparatus according to the embodiment. For example, as shown in FIG. 6A, the information providing apparatus 10 maps old biometric information or different types of information by mapping features of biometric information acquired in the past to a distributed expression space instead of the literature. A distributed expression space including a common analogy (for example, a pattern) included in the biometric information is generated.

Subsequently, as shown in (B) of FIG. 6, the information providing apparatus 10 identifies the type of the user's biometric information from the input information. The information providing apparatus 10 then extracts the features of a plurality of pieces of biometric information of the identified type and converts the extracted features into distributed representations. The information providing apparatus 10 also extracts, from biometric information of other types, biometric information whose features are similar to the sum of the converted features. In this way, as shown in (C) of FIG. 6, it extracts, from among the analogies of biometric information of types different from the identified type, an analogy whose concept is close to the concept of an analogy of the identified type. As a result, the information providing apparatus 10 can output a hint for evoking new biometric information, as shown in (D) of FIG. 6.

このように出力されたヒントは、図６中（Ｆ）に示すように、例えば、生体情報の元となる利用者にとって心地よい感覚の特定に用いることができる。例えば、情報提供装置１０は、利用者から取得した生体情報が好む味覚であった場合に、利用者が好む色彩や音程等を出力することができる。 As shown in FIG. 6F, the hint output in this way can be used, for example, for specifying a comfortable sensation for the user who is the source of the biological information. For example, the information providing apparatus 10 can output a color, a pitch, or the like that the user likes when the biological information acquired from the user has a taste that the user likes.

To execute the processing described above, content other than words must be mapped onto the distributed representation space. The information providing apparatus 10 may therefore use a technique such as a neural network or deep learning to extract the features of contracts, business books, various kinds of content, and biometric information, and map vectors representing the extracted features onto the distributed representation space. That is, the information providing apparatus 10 can apply the extraction processing described above to any information whose features can be mapped onto a distributed representation space and that can be classified into a plurality of types.

[3. Configuration of the information providing apparatus]
Next, the configuration of the information providing apparatus 10 that executes the extraction processing shown in FIG. 1 will be described with reference to FIG. 7. The following description covers an example of the information providing apparatus 10 that maps the words contained in a plurality of documents onto a distributed representation space and outputs, as a hint, a word that belongs to a field different from the technical field of the user's utterance and whose concept is similar to the concept of an idea in the field of the user's utterance.

図７は、実施形態にかかる情報提供装置が有する機能構成の一例を示す図である。図７に示すように、情報提供装置１０は、入力装置３０および出力装置３１と接続されている。また、情報提供装置１０は、通信部１１、記憶部１２、および制御部１６を有する。 FIG. 7 is a diagram illustrating an example of a functional configuration of the information providing apparatus according to the embodiment. As shown in FIG. 7, the information providing device 10 is connected to an input device 30 and an output device 31. In addition, the information providing apparatus 10 includes a communication unit 11, a storage unit 12, and a control unit 16.

通信部１１は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１は、マイクやキーボード等の入力装置３０と、モニタやプリンタ、音声を発声することができるロボット等の出力装置３１と接続され、各種情報の送受信を行う。 The communication unit 11 is realized by, for example, a NIC (Network Interface Card). The communication unit 11 is connected to an input device 30 such as a microphone and a keyboard and an output device 31 such as a monitor, a printer, and a robot that can utter voice, and transmits and receives various types of information.

記憶部１２は、例えば、ＲＡＭ（Random Access Memory)、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。また、記憶部１２は、文献データベース１３、分散表現空間データベース１４、拡張単語データベース１５を有する。 The storage unit 12 is realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 12 includes a document database 13, a distributed expression space database 14, and an extended word database 15.

文献データベース１３には、各種分野に属する文献が登録されている。例えば、文献データベース１３には、公開特許公報、公表特許公報、特許公報、論文等、過去になされたアイデアの概念を含む文献が登録されている。 Documents belonging to various fields are registered in the document database 13. For example, in the document database 13, documents including concepts of ideas made in the past, such as published patent publications, published patent publications, patent publications, and papers, are registered.

The distributed representation space database 14 stores distributed representations of words belonging to various fields. For example, the distributed representation space database 14 stores vectors representing the words contained in the documents registered in the document database 13. Since the distributed representation of each word is generated based on the relationships among words, the directions of and distances between word vectors correspond to the similarity of the concepts and co-occurrence of the words.

The extended word database 15 stores the operational verbs used when expanding word sets. For example, an operational verb list containing the Koberg & Bagnall operational verbs is registered in the extended word database 15 in advance.

The control unit 16 is realized, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs stored in a storage device inside the information providing apparatus 10, using the RAM as a work area. The control unit 16 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 7, the control unit 16 includes a reception unit 17, a field identification unit 18, a pattern extraction unit 19, a related word extraction unit 20, a word space limiting unit 21, a word space expansion unit 22, a suggested word extraction unit 23, an output unit 24, and a learning unit 25.

受付部１７は、利用者の発言を入力情報として受付ける。例えば、受付部１７は、マイクやキーボード等により実現される入力装置３０から利用者の発言を取得する。かかる場合、受付部１７は、受付けた利用者の発言をテキストデータに変換する。そして、受付部１７は、変換後のテキストデータを分野特定部１８に出力する。 The reception unit 17 receives a user's remarks as input information. For example, the reception unit 17 acquires a user's remarks from the input device 30 realized by a microphone, a keyboard, or the like. In such a case, the reception unit 17 converts the received user's remarks into text data. Then, the accepting unit 17 outputs the converted text data to the field specifying unit 18.

分野特定部１８は、入力情報が属する分野を特定する。例えば、分野特定部１８は、受付部１７から受付けたテキストデータの形態素解析を行い、テキストデータに含まれる単語がどのような分野において使用されている単語であるかを特定する。そして、分野特定部１８は、特定した分野をパターン抽出部１９に通知する。 The field specifying unit 18 specifies the field to which the input information belongs. For example, the field specifying unit 18 performs a morphological analysis of the text data received from the receiving unit 17 and specifies in which field the word included in the text data is used. Then, the field specifying unit 18 notifies the pattern extracting unit 19 of the specified field.

The pattern extraction unit 19 extracts word sets, each containing a plurality of words that have a predetermined relationship in the identified field. For example, upon receiving notification of a field from the field identification unit 18, the pattern extraction unit 19 extracts the documents belonging to that field from the document database 13. The pattern extraction unit 19 then identifies sentences having a predetermined structure in the extracted documents, extracts the plurality of words that appear in a predetermined order in each identified sentence, and outputs a word set containing the extracted words to the related word extraction unit 20.

For example, FIG. 8 is a diagram for explaining an example of the words extracted by the information providing apparatus according to the embodiment. As shown in (A) of FIG. 8, a sentence expressing the feature of an idea is expected to have the structure "(noun) {を} (state expression or adverbial phrase) {で, によって, にして} (verb) {する, できる}", which corresponds roughly to "(verb) the (noun) by or with (state expression or adverbial phrase)". The pattern extraction unit 19 therefore extracts the documents belonging to the notified field from the document database 13 and, using a technique such as morphological analysis, extracts the sentences that have this structure. Then, as shown in (B) of FIG. 8, the pattern extraction unit 19 extracts the noun in each extracted sentence as word #1, the state expression or adverbial phrase as word #2, and the verb as word #3, and generates a word set containing words #1 to #3. As shown in (C) of FIG. 8, a word set generated in this way contains the feature of the idea expressed by the extracted sentence, that is, the concept cε.
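The structure matching described for FIG. 8 can be sketched as below. The sketch assumes that morphological analysis has already been done and that each token arrives as a (surface, part-of-speech) pair; the tag names "noun", "state", "adverbial", and "verb" and the function name are hypothetical simplifications, not the output format of any particular analyzer.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (surface form, simplified part-of-speech tag)

CONNECTIVES = {"で", "によって", "にして"}
ENDINGS = {"する", "できる"}


def extract_word_set(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Find noun を modifier {で|によって|にして} verb {する|できる}."""
    for i in range(len(tokens) - 5):
        (noun, noun_pos), (particle, _), (modifier, mod_pos), \
            (connective, _), (verb, verb_pos), (ending, _) = tokens[i:i + 6]
        if (noun_pos == "noun" and particle == "を"
                and mod_pos in {"state", "adverbial"}
                and connective in CONNECTIVES
                and verb_pos == "verb" and ending in ENDINGS):
            return (noun, modifier, verb)  # words #1, #2, #3
    return None
```

For a token sequence corresponding to "アイコンをクリックで表示する", the function returns the triple ("アイコン", "クリック", "表示").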

Returning to FIG. 7, the description continues. From the word sets extracted by the pattern extraction unit 19, the related word extraction unit 20 selects word sets indicating concepts that the user is less likely to think of but that are easy to understand once presented. For example, FIG. 9 is a diagram illustrating an example of the word sets extracted by the information providing apparatus according to the embodiment. In the example shown in FIG. 9, a plurality of published patent applications were morphologically analyzed, and the word sets extracted from sentences having the predetermined structure, that is, word sets containing a plurality of words having a predetermined relationship, are listed in order of their appearance in the documents.

For example, as shown in (A) of FIG. 9, word sets such as "program creation realization" and "trellis structure degeneration calculation" appear in many documents. However, such word sets are commonly used and well established in the identified field, so the idea concepts they carry are taken for granted in that field. As a result, word sets with a high appearance frequency have low serendipity.

On the other hand, as shown in (B) of FIG. 9, a word set such as "glasses, wear, view" is not used very frequently in the identified field, so it is expected to indicate a concept that is not commonplace in that field, that is, to be a word set that is likely to induce serendipity. Meanwhile, as shown in (C) of FIG. 9, a word set whose appearance frequency is too low, that is, lower than a predetermined threshold, indicates a concept that is hard to understand and is therefore less likely to induce serendipity in the user.

Therefore, from the word sets extracted by the pattern extraction unit 19, the related word extraction unit 20 extracts those for which the number of word sets indicating the same concept in the documents of the identified field is equal to or less than a first threshold and equal to or greater than a second threshold. As a result, the related word extraction unit 20 can extract word sets containing concepts that are likely to induce serendipity in the identified field, that is, the field to which the user's utterance belongs.
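The two-sided frequency filter can be sketched as follows. As a simplification, the sketch counts identical word sets rather than word sets "indicating the same concept", which would need a notion of concept equivalence that the text does not define; the function and parameter names are hypothetical, with `upper` standing for the first threshold and `lower` for the second.

```python
from collections import Counter
from typing import List, Tuple

WordSet = Tuple[str, ...]


def select_by_frequency_band(occurrences: List[WordSet],
                             lower: int,
                             upper: int) -> List[WordSet]:
    """Keep word sets whose occurrence count lies in [lower, upper]."""
    counts = Counter(occurrences)
    return [ws for ws, n in counts.items() if lower <= n <= upper]
```

Word sets that occur more often than `upper` are treated as commonplace, and those that occur less often than `lower` as too obscure to be understood.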

The related word extraction unit 20 may also use a related word dictionary, which indicates how words are related to one another, to count the number of related words that lie between any word in a word set and another word in the same word set. By extracting the word sets for which the counted number of related words falls within a predetermined range, the related word extraction unit 20 may extract word sets containing concepts that are likely to induce serendipity.
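One way to read "the number of related words between two words" is the number of intermediate entries on the shortest chain through the related word dictionary. The sketch below takes that reading, which is an assumption, and models the dictionary as a plain adjacency mapping; the function name and the data layout are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def related_words_between(start: str, goal: str,
                          related: Dict[str, List[str]]) -> Optional[int]:
    """Count intermediate words on the shortest chain, or None if unlinked."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (word, intermediates needed beyond it)
    while queue:
        word, steps = queue.popleft()
        for neighbor in related.get(word, ()):
            if neighbor == goal:
                return steps  # words strictly between start and goal
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, steps + 1))
    return None
```

A word set could then be kept when this count, for the chosen pair of its words, falls inside the predetermined range.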

図７に戻り、説明を続ける。単語空間限定部２１は、関連単語抽出部２０が抽出した単語組に含まれる各単語によって形成される分散表現空間をさらに限定する。具体的には、単語空間限定部２１は、文献データベース１３に格納された全ての文献のうち、関連単語抽出部２０が抽出した単語組が含まれる文章を含む文献の数を計数する。そして、単語空間限定部２１は、計数した値を文献の数で除算した値、すなわち、単語組のＤＦを算出し、算出したＤＦの値が所定の閾値以下となる単語組を抽出する。この結果、情報提供装置１０は、全ての分野を基準としてよりセレンディピティを起こされる可能性が高い概念を含む単語組を抽出することができる。 Returning to FIG. 7, the description will be continued. The word space limiting unit 21 further limits the distributed expression space formed by each word included in the word set extracted by the related word extracting unit 20. Specifically, the word space limiting unit 21 counts the number of documents including a sentence including the word set extracted by the related word extraction unit 20 among all the documents stored in the document database 13. Then, the word space limiting unit 21 calculates a value obtained by dividing the counted value by the number of documents, that is, a DF of the word set, and extracts a word set whose calculated DF value is equal to or less than a predetermined threshold. As a result, the information providing apparatus 10 can extract a word set including a concept that is more likely to cause serendipity based on all fields.

The word space expansion unit 22 generates a plurality of word sets in which the verb included in a word set is replaced with predetermined operational verbs. In this way, the word space expansion unit 22 expands the distributed representation space limited by the word space limiting unit 21 without greatly changing the concept contained in the word set.

For example, FIG. 10 is a diagram for explaining an example of the processing, executed by the information providing apparatus according to the embodiment, that expands the distributed representation space. Suppose the word space expansion unit 22 acquires the word set "icon click display" as a word set extracted by the word space limiting unit 21. In this case, the word space expansion unit 22 generates word sets in which the verb "display" included in the word set, shown in (A) of FIG. 10, is replaced with each word of the operational verb list registered in the expanded word database 15, shown in (B) of FIG. 10.

As a result, as shown in (C) of FIG. 10, the word space expansion unit 22 newly generates word sets such as "icon click increase", "icon click divide", and "icon click remove". The word space expansion unit 22 also outputs the original word set "icon click display" to the suggested word extraction unit 23 as one of the newly generated word sets.
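The verb replacement can be sketched as follows. The sketch assumes the verb of the word set has already been identified (for example by morphological analysis) and is passed in explicitly; the function name `expand_word_set` is hypothetical.

```python
def expand_word_set(word_set, verb, operational_verbs):
    """Return the original word set followed by one variant per operational verb."""
    variants = [tuple(word_set)]  # the original set is also passed downstream
    for op in operational_verbs:
        variant = tuple(op if w == verb else w for w in word_set)
        if variant not in variants:  # skip duplicates
            variants.append(variant)
    return variants


variants = expand_word_set(
    ("icon", "click", "display"),
    verb="display",
    operational_verbs=["increase", "divide", "remove"],
)
for v in variants:
    print(" ".join(v))
```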

In this way, the word space limiting unit 21 and the word space expansion unit 22 calculate the value obtained by dividing the number of sentences that include a word set extracted by the related word extraction unit 20 by the number of sentences in all fields, and generate new word sets by combining the word sets for which the calculated value is smaller than a predetermined threshold with operational verbs. Because a word set newly generated in this way is obtained by converting the verb of the original word set, it is expected to map to a vector pointing in a direction similar to that of the original word set in the distributed representation space, and such a vector is expected to represent a similar concept. As a result, even when only a small number of word sets have been extracted, the information providing apparatus 10 can generate word sets containing concepts similar to those of the word sets extracted by the word space limiting unit 21, that is, word sets containing concepts that readily give rise to serendipity.

When the information providing apparatus 10 processes information other than words, the word space expansion unit 22 may generate a plurality of new sets of information by combining a set of information having a predetermined relevance with information indicating a predetermined operation, in the same way as with operational verbs. For example, the word space expansion unit 22 may generate sets in which information of a predetermined type, among the information included in a set of information having a predetermined relevance, is changed to information indicating a predetermined operation.

Returning to FIG. 7, the description continues. The suggested word extraction unit 23 extracts, from among words belonging to fields different from the identified field, words having a concept similar to the concept produced by an extracted word set. For example, the suggested word extraction unit 23 inputs each word included in a word set generated by the word space expansion unit 22 into W2V and acquires the distributed representation of each word. The suggested word extraction unit 23 then calculates the sum of the acquired distributed representations, that is, the distributed representation of the word set. Next, from among the distributed representations stored in the distributed representation space database 14, the suggested word extraction unit 23 identifies those that point in a direction similar to the distributed representation of the word set but lie at a distance from it. In other words, using the distributed representations of the words included in the word set, the suggested word extraction unit 23 extracts, from words belonging to fields different from the field to which the input information belongs, words having a concept whose similarity to the concept produced by the word set satisfies a predetermined condition. More specifically, the suggested word extraction unit 23 extracts words corresponding to distributed representations whose similarity to the distributed representation of the word set is equal to or less than a predetermined threshold.

For example, FIG. 11 is a diagram for explaining the limitation and expansion of the distributed representation space executed by the information providing apparatus according to the embodiment. As shown in (A) of FIG. 11, the related word extraction unit 20 extracts word sets #1 to #N as word sets containing concepts likely to cause serendipity. In this case, as shown in (B) of FIG. 11, the word space limiting unit 21 calculates the DF of each of the word sets #1 to #N over all documents stored in the document database 13 and filters the word sets by the calculated DF. As a result, as shown in (C) of FIG. 11, the word space limiting unit 21 extracts, for example, word sets #1, #3, and #N as word sets whose DF is equal to or less than a predetermined threshold.

As shown in (D) of FIG. 11, the word space expansion unit 22 expands the word sets using the Koberg & Bagnall checklist. For example, the word space expansion unit 22 generates word sets #1-1 to #1-m in which the verb included in word set #1 is replaced with the operational verbs included in the Koberg & Bagnall checklist. Similarly, the word space expansion unit 22 generates word sets #3-1 to #3-m in which the verb included in word set #3 is replaced with operational verbs, and word sets #N-1 to #N-m in which the verb included in word set #N is replaced with operational verbs.

Next, as shown in (E) of FIG. 11, the suggested word extraction unit 23 executes the following processing for each of the word sets #1-1 to #1-m, #3-1 to #3-m, and #N-1 to #N-m. That is, the suggested word extraction unit 23 inputs the words included in the word set into W2V to acquire their vectors, and acquires from the distributed representation space words that are similar to the sum of the acquired vectors and that belong to a field different from the identified field, that is, the field to which the user's utterance belongs.

An example of the processing by which the suggested word extraction unit 23 acquires, from the distributed representation space, words that are similar to the sum of the vectors of a word set and that belong to a field different from the identified field is described below with reference to the drawings. FIG. 12 is a diagram for explaining an example of the processing by which the information providing apparatus according to the embodiment extracts similar words. For example, as shown in (A) of FIG. 12, the suggested word extraction unit 23 acquires the word set "glasses wearing viewing". In this case, as shown in (B) of FIG. 12, the suggested word extraction unit 23 inputs each word included in the word set, namely "glasses", "wearing", and "viewing", into W2V to calculate the distributed representation, that is, the vector, of each word.

Next, as shown in (C) of FIG. 12, the suggested word extraction unit 23 extracts from the distributed representation space database 14 vectors similar to the sum of the vectors of the words in the word set, and acquires the words indicated by the extracted vectors. For example, as shown in (D) of FIG. 12, the suggested word extraction unit 23 acquires the words indicated by vectors having a large cosine distance value with respect to the vector of the word set.
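The vector sum and the ranking can be sketched with plain Python lists. Two assumptions are made here: the value the description calls "cosine distance" is read as the cosine of the angle between two vectors (larger means more similar), and the two-dimensional vectors below are made-up toy values rather than real W2V output.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vector_sum(vectors):
    """Component-wise sum, used as the vector of the whole word set."""
    return [sum(components) for components in zip(*vectors)]


def rank_by_cosine(query, vocabulary):
    """Words of the vocabulary sorted from largest to smallest cosine value."""
    return sorted(vocabulary, key=lambda w: cosine(query, vocabulary[w]), reverse=True)


# Toy vectors standing in for "glasses", "wearing", and "viewing".
word_set_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = vector_sum(word_set_vectors)

# Toy vocabulary standing in for the distributed representation space database.
vocabulary = {
    "acting": [1.0, 0.0],
    "moderator": [-1.0, 0.0],
    "elderly": [1.0, 1.0],
}

ranked = rank_by_cosine(query, vocabulary)
print(ranked)
```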

In the example shown in (D) of FIG. 12, words such as "elderly", "spouse", "appearance", "acting", "eye examination", "moderator", and "real estate_ownership" are acquired in descending order of cosine distance value. Here, "real estate_ownership" is a word that morphological analysis extracted as a single word because the word "real estate" and the word "ownership" appear consecutively and are used as one compound.

A word with a large cosine distance value can be said to have a concept similar to the concept indicated by the word set. However, a word that simply has a large cosine distance value does not offer a concept the user had not anticipated, and is therefore expected to have low serendipity. Words with a small cosine distance value, on the other hand, include words whose concept differs from that of the word set, but are also expected to include words that offer a concept the user would not think of at first glance, that is, words with high serendipity.

For example, FIG. 13 is a diagram for explaining an example of the relationships between word vectors. The example in FIG. 13 shows a distributed representation space onto which the words "MAN", "WOMAN", "UNCLE", "AUNT", "KING", and "QUEEN" are mapped. As shown in (A) of FIG. 13, the vector from "MAN" to "WOMAN", the vector from "UNCLE" to "AUNT", and the vector from "KING" to "QUEEN" are each considered to express the same concept, namely "remove the concept of male from the word and add the concept of female". Therefore, as shown in (B) of FIG. 13, the arrows of these vectors point in the same direction.
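The idea that these offsets share a direction even when the words themselves are far apart can be checked on toy coordinates. The two-dimensional positions below are invented for illustration and are not taken from FIG. 13 or from a trained model.

```python
import math


def offset(a, b):
    """Vector pointing from a to b."""
    return [y - x for x, y in zip(a, b)]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Made-up 2-D positions: the second axis loosely encodes male/female.
space = {
    "MAN": [1.0, 1.0], "WOMAN": [1.0, 3.0],
    "UNCLE": [2.0, 1.0], "AUNT": [2.0, 3.0],
    "KING": [9.0, 1.0], "QUEEN": [9.0, 3.0],
}

man_to_woman = offset(space["MAN"], space["WOMAN"])
king_to_queen = offset(space["KING"], space["QUEEN"])

direction_match = cosine(man_to_woman, king_to_queen)
king_uncle_gap = math.dist(space["KING"], space["UNCLE"])
print(direction_match, king_uncle_gap)
```

In this toy layout "KING" and "UNCLE" are far apart, yet the offsets from each male word to its female counterpart point the same way.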

On the other hand, when the word "KING" and the word "UNCLE" are far apart, "UNCLE" cannot necessarily be recalled easily from "KING", so, as shown in (C) of FIG. 13, the co-occurrence between the concept indicated by "KING" and the concept indicated by "UNCLE" is considered to be low. However, even concepts with such low co-occurrence can express a similar concept when their vectors point in the same direction, and are therefore considered highly likely to cause serendipity, as shown in (D) of FIG. 13.

For this reason, as a word highly likely to cause serendipity in the user, the information providing apparatus 10 should provide a word whose concept is similar to that of an idea having high co-occurrence with the words in the user's utterance, but which itself has low co-occurrence with the words in the user's utterance. Here, a word with low co-occurrence is a word whose vector is far away; for example, words belonging to a different field are expected to qualify.

Accordingly, the suggested word extraction unit 23 calculates the distance between the vector of the word set and the vectors of other words, and identifies the vectors whose calculated distance exceeds a predetermined threshold. The words identified in this way are expected to contain a concept similar to that of an idea having high co-occurrence with the user's utterance while themselves having low co-occurrence with the user's utterance.

For example, FIG. 14 is a diagram for explaining an example of the concept held by a word suggested by the information providing apparatus. As shown in FIG. 14, words #1 to #3, which make up word set #1 belonging to field #1, have concepts #1 to #3, respectively. Concept #1 is formed by element group #1, concept #2 by element group #2, and concept #3 by element group #3, each element group being a plurality of elements that form the concept. The concept of word set #1 is therefore considered to be the sum of concepts #1 to #3, that is, the sum of element groups #1 to #3.

As shown in (A) of FIG. 14, the greater the distance between vectors, the higher the possibility of causing serendipity. However, if the distance is simply large, the word is too far removed from the concept of word set #1, and serendipity becomes less likely. Therefore, as shown in (B) of FIG. 14, the suggested word extraction unit 23 extracts words that share enough with word set #1 to guarantee that they are meaningful enough to cause serendipity. More specifically, the suggested word extraction unit 23 extracts word #4, which has concept #4 formed by element group #4, an element group similar to the concept of word set #1 (that is, the sum of element groups #1 to #3), and which belongs to field #2, a field sufficiently dissimilar to field #1.

For example, the suggested word extraction unit 23 extracts vectors whose cosine distance from the vector of the word set is greater than a predetermined threshold (or falls within a predetermined range), and calculates the cosine distance between each extracted vector and the vector of each word included in the word set. The suggested word extraction unit 23 then identifies the vectors for which the sum of the calculated cosine distances is smaller than a predetermined threshold, and takes the words indicated by the identified vectors as the words to suggest. In other words, the suggested word extraction unit 23 extracts words corresponding to distributed representations whose cosine distance from the sum of the distributed representations of the word set is greater than a predetermined threshold and whose summed cosine distance from the distributed representations of the individual words is smaller than a predetermined threshold.

In this way, the suggested word extraction unit 23 calculates the sum of the distributed representations of the words included in the word set, and extracts words corresponding to distributed representations whose similarity to the calculated sum satisfies a predetermined condition. For example, the suggested word extraction unit 23 extracts words corresponding to distributed representations that point in the same direction as the sum of the distributed representations of the words in the word set (that is, the vector of the word set) and whose distance from that sum is equal to or greater than a predetermined threshold.

For example, when the suggested word extraction unit 23 receives the word set "graph batch add", it calculates the sum of the vector of the word "graph", the vector of the word "batch", and the vector of the word "add". Next, the suggested word extraction unit 23 identifies the vector of the word "rotation" as a vector whose cosine distance from the calculated vector sum is greater than a predetermined threshold. In this case, the suggested word extraction unit 23 calculates the cosine distance between the vector of "rotation" and the vector of "graph", between the vector of "rotation" and the vector of "batch", and between the vector of "rotation" and the vector of "add", and determines whether the sum of the calculated cosine distances is smaller than a predetermined threshold.

That is, the suggested word extraction unit 23 determines whether the distance from each of the word vectors that make up the vector of the word set is smaller than a predetermined threshold. When the suggested word extraction unit 23 determines that, for the vector of the word "rotation", the distance from each of the word vectors that make up the vector of the word set is smaller than the predetermined threshold, it takes the word "rotation" as a word to suggest.
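The two-condition check can be sketched as follows. This is only one reading of the passage: the "cosine distance" is again treated as the cosine of the angle between vectors, the first condition requires that value against the summed vector to exceed one threshold, and the second requires the per-word values to sum to less than another threshold. The threshold values, the three-dimensional toy vectors, and the function name `suggest_words` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest_words(word_vectors, candidates, min_sum_cosine, max_total_cosine):
    """Keep candidates passing both checks described in the text."""
    total = [sum(components) for components in zip(*word_vectors)]
    suggestions = []
    for word, vec in candidates.items():
        # Condition 1: cosine value against the word-set vector exceeds a threshold.
        if cosine(vec, total) <= min_sum_cosine:
            continue
        # Condition 2: cosine values against each word, summed, stay below a threshold.
        per_word_total = sum(cosine(vec, w) for w in word_vectors)
        if per_word_total < max_total_cosine:
            suggestions.append(word)
    return suggestions


# Toy vectors standing in for "graph", "batch", and "add".
word_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy candidate words from other fields.
candidates = {
    "rotation": [1.0, 0.2, 0.0],
    "merge": [1.0, 1.0, 1.0],
    "noise": [-1.0, 0.0, 0.0],
}

suggested = suggest_words(word_vectors, candidates, min_sum_cosine=0.5, max_total_cosine=1.5)
print(suggested)
```

With these toy values, "merge" passes the first check but is too close to every word of the set, "noise" fails the first check, and only "rotation" remains.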

Returning to FIG. 7, the description continues. The output unit 24 outputs the words extracted by the suggested word extraction unit 23 as suggestions. For example, when the suggested word extraction unit 23 extracts the word "rotation", the output unit 24 generates a sentence that makes a suggestion, such as "How about rotating it?", and transmits the generated sentence to the output device 31. As a result, the output device 31 can output the word extracted by the information providing apparatus 10, that is, a word that gives rise to serendipity in the user.

The learning unit 25 learns the distributed representation space stored in the distributed representation space database 14. For example, the learning unit 25 performs morphological analysis on each document included in the document database 13 and, using the W2V technique, learns distributed representations based on the relationships between the words included in each document. The learning unit 25 then registers the learning result in the distributed representation space database 14.

[4. Flow of processing executed by the information providing apparatus 10]
Next, the flow of processing executed by the information providing apparatus 10 will be described with reference to FIGS. 15 to 17. First, the flow of the extraction processing executed by the information providing apparatus 10 will be described with reference to FIG. 15. FIG. 15 is a flowchart illustrating the flow of the extraction processing executed by the information providing apparatus according to the embodiment.

As shown in FIG. 15, upon acquiring input information (step S101), the information providing apparatus 10 identifies the field to which the acquired input information belongs (step S102). Next, the information providing apparatus 10 extracts word sets satisfying a predetermined relationship from the document data belonging to the identified field (step S103). That is, the information providing apparatus 10 extracts word sets from specific syntactic patterns having the structure of the cε dictionary method. The information providing apparatus 10 then selects, from the extracted word sets, word sets that can cause serendipity (step S104), and uses the distributed representation of each word set to extract words whose similarity to the concept produced by the word set satisfies a predetermined condition (step S105). That is, the information providing apparatus 10 extracts words that belong to a field other than the field to which the word set belongs and that have a concept similar to that of the word set. Finally, the information providing apparatus 10 outputs the extracted words as hints (step S106) and ends the processing.

Next, the flow of the processing for extracting words that belong to a field other than the field to which the word set belongs and that have a concept similar to that of the word set will be described more specifically with reference to FIG. 16. FIG. 16 is a flowchart illustrating the specific flow of the extraction processing executed by the information providing apparatus according to the embodiment. Steps S201 to S206 shown in FIG. 16 are a more specific version of steps S103 to S105 shown in FIG. 15.

For example, the information providing apparatus 10 performs morphological analysis on the document data belonging to the field to which the input information belongs (step S201) and identifies character strings that match a predetermined pattern (step S202). The information providing apparatus 10 then generates word sets from the words included in the identified character strings (step S203).

The information providing apparatus 10 also selects, from the generated word sets, word sets likely to cause serendipity (step S204), and generates variations of the selected word sets using operational verbs (step S205). The information providing apparatus 10 then inputs the selected word sets into W2V and extracts words for which the similarity that takes direction into account, that is, the cosine distance from the vector of the word set, is within a predetermined range, and for which the similarity that does not take direction into account, that is, the sum of the cosine distances from the vectors of the individual words in the word set, is equal to or less than a predetermined threshold (step S206).

Next, the flow of the processing for selecting word sets likely to cause serendipity and generating variations of the selected word sets will be described more specifically with reference to FIG. 17. FIG. 17 is a flowchart illustrating the specific flow of the processing by which the information providing apparatus according to the embodiment selects word sets likely to cause serendipity. Steps S301 to S303 shown in FIG. 17 are a more specific version of steps S204 and S205 shown in FIG. 16.

For example, the information providing apparatus 10 calculates the DF of each generated word set (step S301). The information providing apparatus 10 then selects the word sets whose DF is equal to or less than a predetermined threshold (step S302). Finally, the information providing apparatus 10 generates variations of the selected word sets using the operational verb list (step S303).

[5. Modifications]
The above described an example of the extraction processing executed by the information providing apparatus 10, using the mode illustrated in FIG. 1. However, the embodiment is not limited to this. Variations of the extraction processing executed by the information providing apparatus 10 are described below.

[5-1. Parameters]
In order to extract words that can cause serendipity, the information providing apparatus 10 described above extracted, for example, from the extracted word sets, those whose DF is equal to or less than a predetermined threshold (hereinafter, the first threshold). The information providing apparatus 10 also extracted vectors whose cosine distance from the vector of the word set is within a predetermined range (hereinafter, the first range) and whose summed cosine distance from the vectors of the individual words in the word set is equal to or less than a predetermined threshold (hereinafter, the second threshold). Any values may be adopted for the thresholds used by the information providing apparatus 10.

For example, to output words that are hard to grasp at first glance but that readily give rise to serendipity on reflection, the information providing apparatus 10 may lower the first threshold and may lower the second threshold. The information providing apparatus 10 may also set the first range more narrowly.

[5-2. Word sets that readily cause serendipity]
In the example described above, to increase the likelihood of causing serendipity, the information providing apparatus 10 selected, from the extracted word sets, words highly likely to cause serendipity. For example, the related word extraction unit 20 selected word sets based on the appearance frequency of each word set, the relationships between words, and so on. However, the embodiment is not limited to this.

For example, the information providing apparatus 10 may select in advance, for each field, word sets that can cause serendipity. This selection may also be performed in advance by a person. For example, the information providing apparatus 10 extracts from the document data word sets containing a plurality of words having a predetermined relationship and presents the extracted word sets to an operator. The information providing apparatus 10 may then register the word sets chosen by the operator in advance as word sets that can cause serendipity.

[5-3. Processing executed by the information providing apparatus]
In the description above, the information providing apparatus 10 identified a plurality of pieces of information having a predetermined relationship from the information in the field to which the input information belongs, and extracted, from information belonging to fields different from the identified field, information having a concept similar to the concept produced by the identified pieces of information. However, the embodiment is not limited to this. That is, as long as the information providing apparatus 10 can output, from information in a field different from the field to which the input information belongs, information that keeps an implicit connection with the input information while having an explicitly discontinuous relationship to it, the information to be output may be extracted by different processing.

For example, FIG. 18 is a diagram illustrating an example of processing that outputs information related to both information in the input field and information in a different field. The example in FIG. 18 shows, as a variation of the processing executed by the information providing apparatus 10, processing that analyzes both the information of the field to which the input information belongs and the information of a field different from that field, and outputs information relevant to both.

For example, as shown in FIG. 18, when a user is working out a new idea, the fields containing potentially helpful information are wide-ranging, including education, power supply, banking, and the like. For this reason, as shown in (A) in FIG. 18, it is difficult for the user to take the information of all these fields into account when devising a new idea.

Therefore, the information providing apparatus 10 executes the following processing. First, the information providing apparatus 10 receives input information from the user. Upon receiving it, the information providing apparatus 10 identifies the field to which the input information belongs. The information providing apparatus 10 also extracts a field different from the identified field (a different field). Then, the information providing apparatus 10 analyzes both the information belonging to the field of the input information and the information belonging to the different field, and identifies information relevant to both. Finally, the information providing apparatus 10 outputs the identified information.

For example, as shown in (B) in FIG. 18, the user enters arbitrary information, such as a word that came to mind, into the information providing apparatus 10 as input information. In such a case, the information providing apparatus 10 identifies the field to which the input information belongs, as shown in (C) in FIG. 18. For example, if the input information is "propane gas" or the like, the information providing apparatus 10 identifies "gas supply industry" as the field to which the input information belongs.

Subsequently, the information providing apparatus 10 extracts a field different from the identified field. For example, the information providing apparatus 10 extracts "advertising industry" as a field different from the identified field "gas supply industry". The information providing apparatus 10 may select a plurality of fields different from the field to which the input information belongs. Further, using the various processes described above, the information providing apparatus 10 may select a field whose meaning is remote from the field to which the input information belongs (for example, a field that is far away in the distributed representation space).
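The selection of a semantically remote field can be sketched as follows. This is a minimal illustration, not the method of the embodiment: it assumes that each field is represented by the centroid of the word vectors belonging to it and that remoteness is measured by low cosine similarity. The function names, the field vocabulary, and the two-dimensional vectors are all hypothetical toy values.

```python
import math

# Toy distributed representations (hypothetical values for illustration only).
EMBEDDINGS = {
    "propane gas": [1.0, 0.1], "pipeline": [0.9, 0.2],
    "power plant": [0.8, 0.3], "grid": [0.9, 0.4],
    "banner": [0.1, 1.0], "slogan": [0.2, 0.9],
}
# Hypothetical assignment of words to fields.
FIELD_WORDS = {
    "gas supply": ["propane gas", "pipeline"],
    "power supply": ["power plant", "grid"],
    "advertising": ["banner", "slogan"],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def most_distant_fields(input_field, field_words, embeddings, k=1):
    """Return the k fields least similar to input_field (assumed criterion)."""
    centroids = {
        field: centroid([embeddings[w] for w in words])
        for field, words in field_words.items()
    }
    base = centroids[input_field]
    scored = [
        (cosine_similarity(base, c), field)
        for field, c in centroids.items()
        if field != input_field
    ]
    scored.sort()  # lowest similarity first
    return [field for _, field in scored[:k]]
```

With these toy values, calling `most_distant_fields("gas supply", FIELD_WORDS, EMBEDDINGS)` picks "advertising" over "power supply", because the advertising centroid points in a clearly different direction. Passing a larger `k` corresponds to selecting a plurality of different fields.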

Then, the information providing apparatus 10 analyzes the information of both the field to which the input information belongs and the extracted field, and identifies information related to both. For example, the information providing apparatus 10 compares the attributes, concepts, meanings, and the like of the words belonging to each field using the distances and directions between words in the distributed representation space, and identifies information related to both fields. Also, for example, the information providing apparatus 10 extracts pairs of words whose directions in the distributed representation space are close but whose distance from each other is large. The information providing apparatus 10 then outputs the extracted words, as shown in (F) in FIG. 18.
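The extraction of words that are close in direction but far apart in distance can be sketched as follows. This is an illustrative reading only: it assumes that "direction" is compared by cosine similarity and "distance" by Euclidean distance, and the thresholds `min_cos` and `min_dist`, the word lists, and the vectors are hypothetical.

```python
import math

# Toy distributed representations (hypothetical values for illustration only).
EMBEDDINGS = {
    "propane gas": [1.0, 0.5], "meter": [0.2, 1.0],
    "billboard": [3.0, 1.6], "flyer": [1.1, 0.5], "jingle": [-1.0, 2.0],
}
INPUT_FIELD_WORDS = ["propane gas", "meter"]            # e.g. gas supply industry
OTHER_FIELD_WORDS = ["billboard", "flyer", "jingle"]    # e.g. advertising industry

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (closeness of direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def cross_field_pairs(words_a, words_b, embeddings, min_cos, min_dist):
    """Pairs (a, b) with similar direction but large Euclidean distance."""
    pairs = []
    for a in words_a:
        for b in words_b:
            cos = cosine_similarity(embeddings[a], embeddings[b])
            dist = math.dist(embeddings[a], embeddings[b])
            if cos >= min_cos and dist >= min_dist:
                pairs.append((dist, a, b))
    pairs.sort(reverse=True)  # farthest pairs first
    return [(a, b) for _, a, b in pairs]
```

With these toy values and `min_cos=0.95`, `min_dist=1.0`, only the pair ("propane gas", "billboard") is kept: "flyer" points the same way as "propane gas" but lies too close to it, and the remaining pairs differ in direction.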

As a result of this processing, the information providing apparatus 10 can output combinations of words that would not come to mind in ordinary thinking, and can therefore cause the user to experience serendipity. Besides the processing described above, the information providing apparatus 10 can employ any analysis method as long as it outputs information related to both the field to which the input information belongs and the different field.

Note that the information providing apparatus 10 described above can accept any kind of information as input information. For example, the information providing apparatus 10 may accept text, business features, content such as sound and images, biometric information, and the like as input information, and may output text, business features, content, biometric information, and the like as the information related to both the field to which the input information belongs and the different field. In addition, the information providing apparatus 10 can use any of the processes described in the above embodiments as long as they do not contradict one another.

[5-4. Others]
Among the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various types of information shown in each drawing are not limited to the illustrated information.

Further, each component of each illustrated apparatus is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each apparatus is not limited to that illustrated, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the word space limiting unit 21 and the word space extending unit 22 illustrated in FIG. 7 may be integrated.

Further, the embodiments described above can be combined as appropriate as long as the processing contents do not contradict one another.

[5-5. Program]
The information providing apparatus 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 19. FIG. 19 is a hardware configuration diagram illustrating an example of a computer that implements the extraction processing. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices.

The CPU 1100 controls output devices such as a display and a printer and input devices such as a keyboard and a mouse via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600, and outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program, such as the information providing program, or data stored in a recording medium 1800, which is an example of a non-transitory computer-readable storage medium, and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information providing apparatus 10 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 16 by executing the program loaded on the RAM 1200. The HDD 1400 stores the data in the storage unit 12, that is, the document database 13, the distributed representation space database 14, and the extended word database 15. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another apparatus.

[6. Effects]
As described above, upon receiving input information, the information providing apparatus 10 identifies a specific syntax in the input information and extracts the word group embedded in the identified specific syntax. The information providing apparatus 10 also uses the distributed representations of the word group included in the specific syntax to calculate their similarity to the distributed representations of other words. The information providing apparatus 10 then outputs information extracted based on the calculated similarity. Because the information providing apparatus 10 can thus provide information that would not occur to the user, it can output information that may cause the user to experience serendipity.

The information providing apparatus 10 also identifies the specific syntax using the cε dictionary method. From the word group contained in a sentence whose pattern has the potential to cause serendipity in a certain field, the information providing apparatus 10 can therefore output words whose features resemble those of the word group. As a result, it can provide information that is harder for the user to think of while preserving the possibility of causing serendipity.

Further, based on equivalent transformation theory, the information providing apparatus 10 outputs words that have an analogy similar to that of the word group and belong to a field different from that of the word group. Because it can thereby provide information that is harder for the user to think of while preserving the concept produced by the word group, it can output information that may cause the user to experience serendipity.

Further, the information providing apparatus 10 extracts words corresponding to distributed representations similar to the sum of the distributed representations of the word group, or information corresponding to distributed representations whose similarity to the sum of the distributed representations of the word group falls within a predetermined range. The information providing apparatus 10 can therefore output information that may cause the user to experience serendipity.
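The two extraction variants just described, nearest words to the summed representation and words whose similarity falls within a predetermined range, can be sketched as follows. This is a minimal sketch under stated assumptions: cosine similarity is assumed as the similarity measure, and the function names, the range bounds, and the toy vectors are hypothetical rather than taken from the embodiment.

```python
import math

# Toy distributed representations (hypothetical values for illustration only).
EMBEDDINGS = {
    "gas": [1.0, 0.0], "kitchen": [0.0, 1.0],
    "stove": [0.9, 1.0], "campfire": [1.0, 0.4], "ledger": [-1.0, 0.2],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def sum_vectors(vectors):
    """Component-wise sum of the word group's distributed representations."""
    return [sum(col) for col in zip(*vectors)]

def rank_candidates(word_group, embeddings):
    """Score every other word against the summed vector, best first."""
    query = sum_vectors([embeddings[w] for w in word_group])
    scored = [
        (cosine_similarity(query, vec), word)
        for word, vec in embeddings.items()
        if word not in word_group
    ]
    scored.sort(reverse=True)
    return scored

def words_in_range(word_group, embeddings, low, high):
    """Words whose similarity to the summed vector lies in [low, high]."""
    return [
        word
        for score, word in rank_candidates(word_group, embeddings)
        if low <= score <= high
    ]
```

With the toy values, the word group ("gas", "kitchen") ranks "stove" first, while a range of 0.8 to 0.95 keeps only "campfire": the range variant drops both the nearly identical candidate and the unrelated one.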

Some embodiments of the present application have been described above in detail with reference to the drawings, but these are examples, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

Further, the "unit (section, module)" described above can be read as "means", "circuit", or the like. For example, the control unit can be read as control means or a control circuit.

DESCRIPTION OF SYMBOLS
10 Information providing apparatus
11 Communication unit
12 Storage unit
13 Document database
14 Distributed representation database
15 Extended word database
16 Control unit
17 Reception unit
18 Field identification unit
19 Pattern extraction unit
20 Related word extraction unit
21 Word space limiting unit
22 Word space extending unit
23 Proposed word extraction unit
24 Output unit
25 Learning unit
30 Input device
31 Output device

Claims

1. An information providing apparatus comprising:
a reception unit that receives input information;
a pattern identification unit that extracts, from the input information, a specific syntax and a word group embedded in the specific syntax;
a similarity calculation unit that calculates a similarity using distributed representations of the identified word group included in the identified pattern; and
an output unit that outputs information extracted based on the similarity calculated by the similarity calculation unit.

2. The information providing apparatus according to claim 1, wherein the pattern identification unit uses the cε dictionary method.

3. The information providing apparatus according to claim 1, wherein the output unit outputs, based on equivalent transformation theory, a word that has an analogy similar to that of the word group and belongs to a field different from that of the word group.

4. The information providing apparatus according to claim 1, wherein the output unit outputs a word corresponding to a distributed representation similar to the sum of the distributed representations of the word group.

5. The information providing apparatus according to claim 1, wherein the output unit outputs a word corresponding to a distributed representation whose similarity to the sum of the distributed representations of the word group falls within a predetermined range.

6. An information providing method comprising:
a reception step of receiving input information;
a pattern identification step of extracting, from the input information, a specific syntax and a word group embedded in the specific syntax;
a similarity calculation step of calculating a similarity using distributed representations of the identified word group included in the identified pattern; and
an output step of outputting information extracted based on the similarity calculated in the similarity calculation step.

7. An information providing program for causing a computer to execute:
a reception procedure of receiving input information;
a pattern identification procedure of extracting, from the input information, a specific syntax and a word group embedded in the specific syntax;
a similarity calculation procedure of calculating a similarity using distributed representations of the identified word group included in the identified pattern; and
an output procedure of outputting information extracted based on the similarity calculated in the similarity calculation procedure.