JP6679360B2 - Information providing apparatus and information providing method

Info

Publication number: JP6679360B2
Application number: JP2016055545A
Authority: JP
Inventors: 祐宮崎; 隼人小林; 香里谷尾; 正樹野口; 晃平菅原
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-03-18
Filing date: 2016-03-18
Publication date: 2020-04-15
Anticipated expiration: 2036-03-18
Also published as: JP2017173874A

Description

The present invention relates to an information providing apparatus and an information providing method.

Conventionally, techniques are known that search for or generate information related to input information based on an analysis of that input, and output the retrieved or generated information as a response. One example is natural language processing that converts the words, sentences, and context contained in an input text into multidimensional vectors, analyzes them, and, based on the analysis result, infers and outputs text similar to the input text or text that follows the input text.

JP 2015-28625 A

"FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem", Michael Montemerlo, Sebastian Thrun, Daphne Koller, Ben Wegbreit, [online], [retrieved March 9, 2016], Internet <http://ai.stanford.edu/~koller/Papers/Montemerlo+al:AAAI02.pdf>

However, when the above conventional technique is applied as an aid to a conference, the conference may not proceed efficiently.

For example, the conventional technique described above could be used to have a robot or the like output an utterance in response to an utterance made in a conference. However, the conventional technique merely outputs information that the users could already predict, such as text similar to the input text or text that follows the input text. With the conventional technique, therefore, the conference may not proceed efficiently.

The present application has been made in view of the above, and its object is to make a conference proceed efficiently.

An information providing apparatus according to the present application includes: an acquisition unit that acquires the utterance content of a plurality of users in a conference; a projection unit that projects the utterance content acquired by the acquisition unit onto a vector space; a specifying unit that specifies a direction in which to guide the conference, based on the positions of the history of utterance content projected onto the vector space and the position of new utterance content projected onto the vector space; and an output unit that outputs a response for guiding the conference in the specified direction.

According to one aspect of the embodiment, a conference can be made to proceed efficiently.

FIG. 1 is a diagram illustrating an example of an information providing apparatus according to an embodiment.
FIG. 2 is a diagram illustrating an example of the functional configuration of the information providing apparatus according to the embodiment.
FIG. 3 is a diagram illustrating an example of the process by which the information providing apparatus according to the embodiment specifies the direction in which to guide a conference.
FIG. 4 is a diagram illustrating an example of a route along which the information providing apparatus according to the embodiment guides a conference.
FIG. 5 is a diagram illustrating an example of the deep reinforcement learning executed by the information providing apparatus according to the embodiment.
FIG. 6 is a flowchart illustrating the flow of the learning process executed by the information providing apparatus according to the embodiment.
FIG. 7 is a diagram illustrating an example of the hardware configuration.

Hereinafter, modes for carrying out the information providing apparatus and the information providing method according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The embodiments do not limit the information providing apparatus and the information providing method according to the present application. In the following embodiments, identical parts are given identical reference numerals, and duplicate descriptions are omitted.

[1. Example of information providing apparatus]
First, an example of the response process executed by the information providing apparatus 10 will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the information providing apparatus according to the embodiment. In the example shown in FIG. 1, the response process, performed while a plurality of users hold a conference such as a brainstorming session, is described in two parts. The first is a learning process that acquires the users' utterances as input information and, based on that input, learns the individual opinions of the participants and the direction of the topics in the conference by deep reinforcement learning (DQN: Deep Q-Network), thereby learning the collective intelligence of the users participating in the conference. The second is a guidance process that maps the utterance content of the conference onto a vector space and guides the direction of the topics using SLAM (Simultaneous Localization and Mapping), a technique used in autonomous driving and the like.

More specifically, the following description presents, as the learning process, an example of a process that learns the collective intelligence of the users participating in the conference and, based on the learning result, outputs a response for controlling the current situation of the conference. It also presents, as the guidance process, an example of a process that steers the conference away from discussions already held in the past or toward topics over which a past conference became lively.

The information providing apparatus 10 shown in FIG. 1 is realized by an information processing apparatus such as a server. The information providing apparatus 10 may be realized by a single information processing apparatus or, for example, by a plurality of information processing apparatuses on a cloud network operating in cooperation. The information providing apparatus 10 converts a user's utterance into text data and analyzes the text data by natural language processing. Based on the analysis result, it then generates an utterance that supports the conference and the users' thinking, and outputs the generated utterance.

[1-1. Example of response processing]
In the conventional technique, text similar to the input text, or text that follows the input text, is inferred using a distributed representation, that is, multidimensional word vectors for the words making up the input text. Such a technique could be used to assist the progress of a conference by having a robot or the like output an utterance in response to an utterance made in the conference. However, the conventional technique merely outputs information that the users could already predict, such as text similar to the input text or text that follows it. As a result, it may cause the conference to stagnate or to diverge, so that the conference cannot proceed efficiently. The information providing apparatus 10 therefore executes, as the response process, a learning process that learns collective intelligence and a guidance process that guides the content of the conference. The learning process and the guidance process are described separately below, but in practice the information providing apparatus 10 executes them concurrently.

[1-1-1. Learning process]
First, an example of the learning process executed by the information providing apparatus 10 will be described. The information providing apparatus 10 acquires the content of a user's utterance in the conference. It then determines a response to the input utterance content that guides the content of subsequent utterances toward the purpose of the conference. More specifically, the information providing apparatus 10 determines a response such that the utterances following the output response come closer to the purpose of the conference. For example, the information providing apparatus 10 determines the response using a learner that has been trained by deep reinforcement learning to respond to utterances so that the utterances that follow approach the purpose of the conference. The information providing apparatus 10 then outputs the determined response as the response to the utterance.

Here, deep reinforcement learning is a learning method that combines deep learning with reinforcement learning, which observes the current situation resulting from the previous output and decides the action to take according to the observation. More specifically, the information providing apparatus 10 holds, as the learner, a DNN (Deep Neural Network), that is, a neural network having an input layer, a plurality of intermediate layers, and an output layer, and determines the response based on the information output from the output layer when the input information is fed to the input layer. Further, the information providing apparatus 10 acquires the environment after the determined response has been output, that is, the users' evaluation of and reaction to the response, the content of the conference, and so on, treats the acquired evaluation and the like as a reward, and corrects (retrains) the learner based on that reward. In other words, the information providing apparatus 10 treats a user's utterance in the conference as the state S in deep reinforcement learning and determines the action a based on the state S and the policy π. The information providing apparatus 10 then calculates the state value function using the user's subsequent utterance as the result of the policy π (that is, the reward R), and updates the policy π based on the calculated state value function.
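The loop described above (observe the state S, choose a response, receive the reward R, update the learner) can be sketched as follows. This is only an illustrative sketch under stated assumptions: a tabular Q-learning table stands in for the DNN learner, it updates an action-value function rather than the state value function mentioned in the text, and the class name, the state and action labels, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict


class TabularQLearner:
    """Toy stand-in for the DNN learner: a Q-table instead of a network."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.0, seed=0):
        self.actions = list(actions)   # candidate responses (hypothetical labels)
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor
        self.epsilon = epsilon         # exploration probability
        self.q = defaultdict(float)    # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of a response (action) for an utterance (state)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One Q-learning step: move Q(s, a) toward r + gamma * max Q(s', a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


learner = TabularQLearner(["ask_for_detail", "change_topic"])
# The users' next utterance came back on topic, so the response is rewarded.
learner.update("off_topic", "change_topic", reward=1.0, next_state="on_topic")
print(learner.choose("off_topic"))
```

In a DQN, the table would be replaced by a neural network that maps a vector representation of the utterance to a value for each candidate response.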

For example, the information providing apparatus 10 newly acquires the content of a user's utterance made in reply to the output response. Based on the newly acquired utterance content, the information providing apparatus 10 calculates a value that evaluates the response, that is, the state value function, and based on the calculated state value function determines the response to the newly acquired utterance content.

Here, the information providing apparatus 10 sets the value of the reward R so that the users' utterances approach the purpose of the conference. For example, the smaller the deviation between the previously acquired utterance content and the newly acquired utterance content, the larger the information providing apparatus 10 makes the reward R. This can be realized, for example, by comparing the previous utterance content with the new utterance content using a text analysis technique such as morphological analysis, and making the reward R smaller as the difference between the two grows.
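As a rough illustration of a reward that shrinks as the new utterance drifts away from the previous one, the sketch below scores the overlap of the two word sets. It rests on assumptions not found in the text: whitespace tokenization stands in for morphological analysis, Jaccard similarity is used as the comparison, and the function names are hypothetical.

```python
def tokenize(text):
    """Whitespace tokenization, a crude stand-in for morphological analysis."""
    return set(text.lower().split())


def drift_reward(previous, new):
    """Reward in [0, 1]: 1.0 for identical word sets, 0.0 for no shared words."""
    prev_tokens, new_tokens = tokenize(previous), tokenize(new)
    union = prev_tokens | new_tokens
    if not union:
        return 0.0
    return len(prev_tokens & new_tokens) / len(union)


print(drift_reward("reduce server cost", "reduce cloud cost"))
print(drift_reward("reduce server cost", "lunch menu ideas"))
```

A larger difference between the two utterances gives a smaller reward, which is the only property the text requires; any other monotone difference measure would serve equally well.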

The value of the reward R may also be determined, for example, based on evaluation values entered by the users participating in the conference after an utterance. For example, after outputting a response, the information providing apparatus 10 accepts from the participating users an evaluation of whether the content of the conference is approaching its purpose. Such an evaluation is obtained, for example, through an input device such as a slider given to each user, or through a BMI (Brain-machine Interface). The information providing apparatus 10 then sets the value of the reward R based on the acquired evaluation and corrects the learner based on the set value of the reward R.

[1-1-2. Guidance process]
Even if a response is output that simply brings the users' utterances closer to the purpose of the conference, the conference may not be guided appropriately. In a conference, for example, it may be desirable to keep the discussion moving by having each user come up with ideas different from last time, or by steering the discussion toward topics that were lively in the past.

The information providing apparatus 10 therefore executes the following guidance process. The information providing apparatus 10 acquires the utterance content of a plurality of users in the conference and projects the acquired utterance content onto a vector space. For example, the information providing apparatus 10 projects the meaning of the users' utterance content onto the vector space using any technique, such as W2V (Word 2 Vector), that converts the meaning or concept of words, sentences, and the like into a multidimensional quantity (a distributed representation). That is, the information providing apparatus 10 projects utterance content so that utterances on the same topic fall within a predetermined range of the vector space. The information providing apparatus 10 then determines the direction in which to guide the conference based on the positions of the history of utterance content projected onto the vector space and the position of the new utterance content projected onto the vector space.
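The projection step can be pictured with the minimal sketch below, which averages per-word vectors to place an utterance in the vector space. The two-dimensional embedding table, its values, and the function names are made up for illustration; a real system would use vectors learned by a technique such as W2V and a proper tokenizer.

```python
import math

# Hypothetical 2-D word vectors; real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "price": (1.0, 0.0),
    "cost": (0.9, 0.1),
    "design": (0.0, 1.0),
    "color": (0.1, 0.9),
}


def project(utterance, embeddings=TOY_EMBEDDINGS):
    """Map an utterance to the mean of its known word vectors (None if none)."""
    vectors = [embeddings[w] for w in utterance.lower().split() if w in embeddings]
    if not vectors:
        return None
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))


def distance(a, b):
    """Euclidean distance between two projected utterances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


money = project("price cost")
print(distance(money, project("the price")) < distance(money, project("design color")))
```

Utterances on the same topic land close together, which is the property the guidance process relies on.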

That is, by replacing the meaning of utterance content with a position in the vector space, the information providing apparatus 10 expresses the relationship between the content of past conferences and the content of the current conference as positions in the vector space. The information providing apparatus 10 then uses a technique such as SLAM, as used in autonomous driving and the like, to specify the direction in which to guide the conference in the vector space, and outputs a response that guides the conference in the specified direction. In other words, by projecting the meaning of utterance content onto the vector space, the information providing apparatus 10 generates an idea map in which the closeness of meanings and ideas is mapped onto a metric space. By projecting the latest utterance content, or a composition of the vectors of the utterance content so far, onto the vector space, the information providing apparatus 10 identifies the position in the idea map that represents the current state of the conference, so that the center of the discussion can be viewed against the discussion as a whole.

The information providing apparatus 10 then generates and outputs a response that guides the conference in the specified direction. For example, in the learning process described above, the information providing apparatus 10 updates the learner so that the users' utterance content moves in the specified direction in the vector space, and thereby outputs responses that guide the conference in that direction. More specifically, the information providing apparatus 10 projects the new utterance content that follows the response onto the vector space and, when the position of the new utterance content is closer to the specified direction than the position of the previous utterance content, updates the reward R to a larger value and corrects the learner.
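One possible reading of "closer to the specified direction" is sketched below: the reward is the signed length of the move from the previous position to the new position, measured along the guidance direction. This interpretation, the function name, and the use of plain tuples for positions are assumptions made for illustration.

```python
import math


def progress_reward(previous_pos, new_pos, direction):
    """Signed distance travelled along `direction` between two utterances.

    Positive when the conversation moved the way it is being guided,
    negative when it moved the opposite way, zero for a sideways move.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return 0.0
    step = [n - p for p, n in zip(previous_pos, new_pos)]
    return sum(s * d for s, d in zip(step, direction)) / norm


print(progress_reward((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)))   # moved with the guidance
print(progress_reward((0.0, 0.0), (-1.0, 0.0), (2.0, 0.0)))  # moved against it
```

Such a value could be passed as the reward R when the learner is corrected, so that responses producing movement in the guided direction are reinforced.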

The information providing apparatus 10 may specify any direction as the direction in which to guide the conference. For example, the information providing apparatus 10 specifies the direction of the positions onto which the users' utterance content was projected when a past conference became lively, and outputs a response that guides the conference in that direction. As a result, the information providing apparatus 10 can prevent the conference from stagnating and keep it moving smoothly. In another example, the information providing apparatus 10 specifies a direction leading away from the positions onto which utterance content from past conferences was projected, that is, away from the positions of the utterance history, and outputs a response that guides the conference in that direction. As a result, the information providing apparatus 10 can prevent the conference from going around in circles on the same topic and can advance the conference in a direction that serves its purpose.
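The two strategies above, heading toward lively past topics or away from the utterance history, can be sketched as simple vector arithmetic on the idea map. Representing each region by its centroid is an assumption of this sketch, as are the function names.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equally sized points."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))


def direction_toward(current, lively_points):
    """Vector from the current position to the centre of lively past utterances."""
    goal = centroid(lively_points)
    return tuple(g - c for c, g in zip(current, goal))


def direction_away_from(current, history_points):
    """Vector pointing from the centre of the utterance history to the current position."""
    past = centroid(history_points)
    return tuple(c - h for c, h in zip(current, past))


print(direction_toward((0.0, 0.0), [(2.0, 0.0), (4.0, 0.0)]))
print(direction_away_from((1.0, 1.0), [(0.0, 0.0), (0.0, 2.0)]))
```

Either vector can then serve as the guidance direction against which later utterances are scored.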

Any setting or strategy may be applied to the process of specifying the direction in which to guide the conference, depending on the purpose of the conference, its current state, the content of past conferences, and so on. For example, the information providing apparatus 10 may regard the regions of the vector space onto which the users' utterance content was projected when a conference failed to become lively as obstacles, use SLAM technology to specify a route through the vector space that avoids those obstacles, and guide the conference along the specified route.
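To make the obstacle-avoidance idea concrete, the sketch below searches for a route on a small grid in which dull regions are blocked cells. This is not SLAM, which also covers building the map and localizing within it; it is only a breadth-first route search over an already built, discretized two-dimensional idea map, and the grid representation and function name are hypothetical.

```python
from collections import deque


def find_path(start, goal, obstacles, width, height):
    """Shortest 4-connected route from start to goal avoiding obstacle cells.

    Returns the list of cells from start to goal, or None if no route exists.
    """
    blocked = set(obstacles)
    queue = deque([start])
    came_from = {start: None}          # cell -> predecessor on the route
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:    # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None


# The cell (1, 0) marks a topic region where past conferences fell flat.
print(find_path((0, 0), (2, 0), [(1, 0)], width=3, height=2))
```

Each step of the returned route could then be used in turn as the next guidance target.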

To process a high-dimensional vector space quickly, the information providing apparatus 10 may first reduce the dimensionality of the vector space using a technique such as MDS (Multi Dimensional Scaling), which reduces the number of dimensions while preserving the distances between points, and then execute the process of specifying the guidance direction.

[1-2. Example of response processing]
Next, an example of the response process executed by the information providing apparatus 10 will be described with reference to FIG. 1. In the example shown in FIG. 1, the information providing apparatus 10 accepts utterance A and utterance B made by users in the conference as input (step S1). More specifically, the information providing apparatus 10 converts utterance A spoken by a user into text data and acquires the converted text data as input information.

In this case, the information providing apparatus 10 executes the response process (step S2). First, the information providing apparatus 10 projects the utterance content onto the vector space (step S3). Next, it identifies the positional relationship in the vector space between the positions of past utterance content and the position of the new utterance content (step S4). Based on the identified positional relationship, it then specifies the direction in which to guide the conference using a technique such as SLAM (step S5).

Subsequently, the information providing apparatus 10 generates a response to the utterance content using the learner trained by deep reinforcement learning to bring the conference closer to its purpose (step S6). The information providing apparatus 10 then outputs the generated response (step S7). For example, the information providing apparatus 10 guides the conference by having a robot or the like read out the generated response as utterance C.

The information providing apparatus 10 also accepts the users' evaluation of utterance C (step S8). For example, a user participating in the conference who feels that utterance C brought the conference closer to its purpose enters an evaluation indicating that utterance C was useful. The information providing apparatus 10 then performs deep reinforcement learning using the entered evaluation as the reward for the current policy (step S9). That is, the information providing apparatus 10 corrects the learner so that a better reward can be obtained.

The example above describes, as the response process, a process that outputs the combined result of the learning process and the guidance process as the response. However, the embodiment is not limited to this. For example, the information providing apparatus 10 may output a response obtained by executing only the guidance process described above, or may output a response generated using the learner obtained by deep reinforcement learning without executing the guidance process.

[2. Configuration of information providing apparatus]
Next, the configuration of the information providing apparatus 10 that executes the learning process and the guidance process shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the functional configuration of the information providing apparatus according to the embodiment. As shown in FIG. 2, the information providing apparatus 10 is connected to an input device 30 and an output device 31. The information providing apparatus 10 includes a communication unit 11, a storage unit 12, and a control unit 16.

The communication unit 11 is realized by, for example, a NIC (Network Interface Card). The communication unit 11 is connected to the input device 30, such as a microphone or a keyboard, and to the output device 31, such as a monitor, a printer, or a robot capable of speech, and transmits and receives various kinds of information.

The storage unit 12 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 12 holds a model database 13 and a vector space database 14.

The model database 13 stores the model trained by the learning process, that is, the data of the learner that has learned, by deep reinforcement learning, responses that bring the content of the users' subsequent utterances closer to the purpose of the conference. For example, the model database 13 stores the connection relationships between the neurons included in the learner, the connection coefficients, and the like.

The vector space database 14 stores the vector space onto which the users' utterance content in the conference has been projected. For example, the vector space database 14 stores the history of the users' utterance content converted into multidimensional quantities using a technique such as W2V. Because the vector (distributed representation) representing each utterance is generated based on the relationships among the utterances, the directions of and distances between utterances correspond to the similarity of their meanings, concepts, and co-occurrence.
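The statement that vector direction reflects similarity can be illustrated with cosine similarity, a common measure for distributed representations. The text does not name a specific measure, so its use here, like the function name and the sample vectors, is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity((1.0, 0.0), (2.0, 0.0)))  # same direction
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # unrelated directions
```

Utterances whose vectors point the same way score near 1, while unrelated utterances score near 0.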

The control unit 16 is a controller and is realized, for example, by a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information providing apparatus 10, using a RAM or the like as a work area. The control unit 16 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 16 includes an acquisition unit 17, a projection unit 18, a position identification unit 19, a guidance direction identification unit 20, a response determination unit 21, a response output unit 22, an evaluation acquisition unit 23, and a model update unit 24. Units 17 to 20 are the functional configuration that implements the guidance process described above, and units 21 to 24 are the functional configuration that implements the learning process. The information providing device 10 may therefore be realized, for example, by a guidance device having units 17 to 20 and a learning device having units 21 to 24 operating in cooperation.

[2-1. Configuration example for implementing the guidance process]
The acquisition unit 17 acquires the content of user utterances in the conference. For example, the acquisition unit 17 acquires a user's utterance from the input device 30, which is realized by a microphone, a keyboard, or the like. In that case, the acquisition unit 17 converts the received utterance into text data.

The projection unit 18 projects the acquired utterance content onto the vector space. For example, the projection unit 18 uses a technique such as morphological analysis to extract the group of words contained in the text data, and converts the concept and meaning of the extracted words, that is, the concept and meaning of the utterance, into a multidimensional quantity, thereby projecting the utterance onto the vector space. The projection unit 18 then registers the converted utterance content in the vector space database 14.
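A minimal sketch of such a projection, under stated assumptions: whitespace splitting stands in for morphological analysis, the word vectors come from a small hypothetical lookup table (`TOY_VECTORS`) rather than a trained W2V model, and the utterance vector is taken as the average of its word vectors, which is one simple option among several.

```python
def project_utterance(text, word_vectors, dim):
    """Average the vectors of the known words in an utterance.

    Whitespace splitting is a stand-in for morphological analysis;
    words missing from the table are ignored.
    """
    words = [w for w in text.lower().split() if w in word_vectors]
    if not words:
        return [0.0] * dim
    return [
        sum(word_vectors[w][i] for w in words) / len(words)
        for i in range(dim)
    ]


# Hypothetical two-dimensional word vectors (illustrative numbers only).
TOY_VECTORS = {
    "budget": [1.0, 0.0],
    "cost": [0.8, 0.2],
    "lunch": [0.0, 1.0],
}

projected = project_utterance("Budget cost overview", TOY_VECTORS, 2)
```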

The position identification unit 19 identifies the relationship between the positions of the utterance history projected onto the vector space and the position of a new utterance projected onto the vector space. For example, the position identification unit 19 refers to the vector space database 14 and identifies the positional relationship between the position representing the current state of the conference and the positions in the vector space onto which past utterances were projected. As the position representing the current state of the conference, the position identification unit 19 uses, for example, the position onto which the most recent utterance was projected, or the sum of the vectors onto which the utterances in the conference were projected, and identifies its relationship to the positions of the past utterances.
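The two choices of conference state mentioned above (the latest utterance vector, or the sum of all utterance vectors) can be sketched as follows. The function names and the representation of the "relationship" as per-utterance offset vectors are assumptions made for illustration.

```python
def conference_state(history, mode="last"):
    """Return the vector used as the current state of the conference.

    mode="last" uses the most recent utterance vector;
    mode="sum" uses the element-wise sum of all utterance vectors.
    """
    if not history:
        raise ValueError("history must contain at least one vector")
    if mode == "last":
        return list(history[-1])
    if mode == "sum":
        return [sum(column) for column in zip(*history)]
    raise ValueError("mode must be 'last' or 'sum'")


def offsets_from_state(history, state):
    """Offset of each past utterance position relative to the state."""
    return [[h - s for h, s in zip(vector, state)] for vector in history]


history = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
latest = conference_state(history, "last")
total = conference_state(history, "sum")
relations = offsets_from_state(history, latest)
```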

The guidance direction identification unit 20 identifies the direction in which to guide the conference based on the positional relationship identified by the position identification unit 19. For example, the guidance direction identification unit 20 treats the position in the vector space onto which the most recent utterance was projected as the current position and the positions onto which past utterances were projected as past positions, and uses SLAM (Simultaneous Localization and Mapping) to identify the direction in which to guide the conference. As that direction, the guidance direction identification unit 20 identifies, for example, a direction away from the positions onto which the utterance history was projected, or the direction onto which utterances were projected when the conference became lively.

For example, FIG. 3 is a diagram showing an example of the process by which the information providing device according to the embodiment identifies the direction in which to guide the conference. As shown in (A) of FIG. 3, the projection unit 18 projects the content of user utterances in the conference onto the vector space, thereby generating an idea map in which the relationships among the meanings and concepts of the utterances are projected onto a metric space. More specifically, as shown in (B) of FIG. 3, the projection unit 18 projects each utterance to a position that represents its meaning and concept relative to the others. The guidance direction identification unit 20 then uses SLAM to determine how to move within the idea map shown in (A) of FIG. 3, and thereby determines how to guide the conference.

For example, as shown in (C) of FIG. 3, the guidance direction identification unit 20 identifies the regions onto which user utterances in past conferences were projected as obstacles. When guiding the conference toward a topic different from past topics, the guidance direction identification unit 20 identifies a route on the idea map that avoids the obstacles, as shown in (D) of FIG. 3, and thereby identifies the direction in which to guide the conference. In other words, the guidance direction identification unit 20 builds an idea map onto which the user utterances in the conference are projected while guiding the conference to its purpose along a path different from that of past conferences.
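The idea of heading for the purpose of the conference while steering around past-topic regions can be illustrated with a simple potential-field rule. This is a swapped-in technique chosen for brevity, not the SLAM-based method of the description; the function name, the point-shaped obstacles, and the `repulsion` weight are all assumptions.

```python
import math


def guidance_direction(current, goal, obstacles, repulsion=1.0):
    """Unit vector pulled toward the goal and pushed away from obstacles.

    Each obstacle is a point in the vector space; its push decays
    with the squared distance from the current position.
    """
    dim = len(current)
    direction = [g - c for g, c in zip(goal, current)]  # attraction to goal
    for obstacle in obstacles:
        away = [c - o for c, o in zip(current, obstacle)]
        dist_sq = sum(a * a for a in away)
        if dist_sq == 0.0:
            continue  # obstacle at the current position gives no direction
        for i in range(dim):
            direction[i] += repulsion * away[i] / dist_sq
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return [0.0] * dim
    return [d / norm for d in direction]


# With no obstacles the direction points straight at the goal.
straight = guidance_direction([0.0, 0.0], [1.0, 0.0], [])
# A past-topic region below the path bends the direction upward.
bent = guidance_direction([0.0, 0.0], [1.0, 0.0], [[0.5, -0.5]], repulsion=0.5)
```

A larger `repulsion` value trades directness for a wider berth around past topics.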

For example, FIG. 4 is a diagram showing an example of the route along which the information providing device according to the embodiment guides the conference. When the guidance direction identification unit 20 acquires a user's utterance S_1 as shown in (A) of FIG. 4, it identifies the direction in which to guide each user's utterances so that, as shown in (B) of FIG. 4, the utterances S_2 to S_t proceed toward the purpose of the conference and the utterances z_1 to z_t, which stray from the purpose, are not made. When utterances that stray from the purpose of the conference, such as θ_1, θ_2, and u_1 to u_t, are made, the guidance direction identification unit 20 takes the direction in which the utterances s_2 to s_t would be made as the direction in which to guide each user's utterances.

Here, an example of how the guidance direction identification unit 20 determines the direction in which to guide the conference using SLAM is described. Let "s^t" be the position in the vector space, at time "t", of the conference being guided. For example, the guidance direction identification unit 20 takes as "s^t" the sum of the vectors obtained by projecting all utterances in the conference onto the vector space, or the vector obtained by projecting the last utterance. Let "θ" be the direction in the vector space in which to guide. Further, let "z^t" be the vector position of the utterance content mapped onto the vector space (corresponding to the measurement), "u^t" be the composite vector of the user utterances within a predetermined period of the conference (corresponding to the control), and "n^t" be the distance moved in the vector space. In this case, the guidance direction identification unit 20 uses formula (1) to identify, based on SLAM, the guidance direction at the next time.
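Formula (1) itself is not reproduced in this text. For orientation only, the FastSLAM paper cited in the background factors the SLAM posterior as shown below, using the same symbols. In that paper θ denotes the landmark positions (with θ_k the k-th landmark) and n^t the data associations, so the mapping to guidance direction and movement distance, and the assumption that formula (1) takes this form, are both unconfirmed.

```latex
p\left(s^{t}, \theta \mid z^{t}, u^{t}, n^{t}\right)
  = p\left(s^{t} \mid z^{t}, u^{t}, n^{t}\right)
    \prod_{k} p\left(\theta_{k} \mid s^{t}, z^{t}, u^{t}, n^{t}\right)
```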

The guidance direction identification unit 20 may also, for example, clarify the standpoint of each user participating in the conference based on the positions of the utterances projected onto the vector space. By projecting the utterances onto the vector space, the guidance direction identification unit 20 may also determine whether the users are merely saying the same thing in different words, and whether the discussion in the conference is going in circles. The guidance direction identification unit 20 may then identify the direction in which to guide the conference based on the result of that determination.
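One way to picture the loop check is to test whether a new utterance vector lands close to an earlier, non-adjacent utterance. The sketch below does this with a cosine-similarity threshold; the threshold value, the `skip_recent` parameter, and the function names are illustrative assumptions, not details from the description.

```python
import math


def _cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_looping(history, new_vector, threshold=0.95, skip_recent=1):
    """True if the new utterance nearly repeats an older utterance.

    The most recent `skip_recent` utterances are ignored so that a
    direct follow-up on the current topic is not flagged as a loop.
    """
    older = history[:-skip_recent] if skip_recent else history
    return any(_cosine(v, new_vector) >= threshold for v in older)


history = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
repeats_first_topic = is_looping(history, [0.99, 0.05])
opens_new_topic = is_looping(history, [-1.0, -1.0])
```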

[2-2. Configuration example for implementing the learning process using DQN]
Returning to FIG. 2, the description continues. The response determination unit 21 determines a response to the acquired utterance using a learner that has learned, through deep reinforcement learning, responses that bring subsequent user utterances closer to the purpose of the conference. For example, the response determination unit 21 obtains from the model database 13 a learner on which deep reinforcement learning has been performed, inputs the acquired utterance content into the learner, and determines the response to the user's utterance according to the learner's output. The response output unit 22 then outputs the response determined by the response determination unit 21 from the output device 31, such as a speaker.

As described later, each time a response is output, the learner registered in the model database 13 is updated by the model update unit 24 based on the evaluation acquired by the evaluation acquisition unit 23. Consequently, when a new utterance is acquired, the response determination unit 21 outputs a new response using the learner as updated on the basis of the response to the previous utterance.

The evaluation acquisition unit 23 acquires an evaluation of the response output by the response output unit 22 via the input device 30, which has a predetermined interface such as a slider or a BMI. The evaluation acquisition unit 23 may also acquire the users' evaluation of the response by, for example, acquiring the content of the user utterances made when the response was output and analyzing it. In other words, the evaluation acquisition unit 23 acquires the state of the conference after the response has been output.

The evaluation acquisition unit 23 may also determine whether the conference is approaching its purpose based on the user utterances acquired before the response and those acquired after it, and acquire an evaluation according to the result. For example, the evaluation acquisition unit 23 compares the meaning of the utterances acquired before the response with the meaning of those acquired after it, and may acquire an evaluation that the response was useful when the meaning has not drifted or when the utterances are heading in the direction identified by the guidance direction identification unit 20, that is, toward the purpose of the conference.
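A minimal sketch of such an automatic evaluation, assuming the evaluation is expressed as a number: the movement of the conference state from before to after the response is compared with the guidance direction, and the cosine of the angle between them is used as the score. The function name and the choice of cosine as the score are assumptions.

```python
import math


def movement_score(before, after, guidance):
    """Score in [-1, 1]: how well the shift in utterance content
    (after minus before) aligns with the guidance direction."""
    move = [a - b for a, b in zip(after, before)]
    dot = sum(m * g for m, g in zip(move, guidance))
    norm = math.sqrt(sum(m * m for m in move)) * math.sqrt(
        sum(g * g for g in guidance)
    )
    if norm == 0.0:
        return 0.0  # no movement, or no guidance direction
    return dot / norm


toward = movement_score([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
away = movement_score([0.0, 0.0], [-1.0, 0.0], [1.0, 0.0])
unchanged = movement_score([0.5, 0.5], [0.5, 0.5], [1.0, 0.0])
```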

The model update unit 24 updates the learner based on the state of the conference after the response has been output. That is, based on the state of the conference after the learner's response has been output, the model update unit 24 performs deep reinforcement learning on the learner so that it learns responses that bring subsequent user utterances closer to the purpose of the conference. Specifically, the model update unit 24 performs deep reinforcement learning on the learner based on the evaluation that the evaluation acquisition unit 23 acquired after the response output unit 22 output the previous response.

For example, FIG. 5 is a diagram illustrating an example of the deep reinforcement learning executed by the information providing device according to the embodiment. Let "s" denote a user's utterance in the conference, "π" the policy that indicates what response the learner outputs to an utterance, and "π(s)" the response output under policy "π" when utterance "s" is made. The utterances and responses in the conference can then be represented schematically as in (A) of FIG. 5. More specifically, when user A makes utterance "s_0", the information providing device 10 outputs response "π(s_0)". When user B makes utterance "s_1" in reply to response "π(s_0)", the information providing device 10 outputs response "π(s_1)", and when user C makes utterance "s_2" in reply to response "π(s_1)", it outputs response "π(s_2)".

When such utterances and responses occur, the information providing device 10 calculates, for policy "π", the state-action value function "Q^π(s, a)" based on the evaluations, as shown in (B) of FIG. 5. The information providing device 10 then updates policy "π" based on the state-action value function "Q^π(s, a)". More specifically, the information providing device 10 updates policy "π" so as to maximize the value of the state-action value function "Q^π(s, a)". By repeatedly outputting a response and updating the policy based on the evaluation of that output, the information providing device 10 carries out deep reinforcement learning of the learner.
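The cycle of responding, receiving an evaluation, and updating can be illustrated with tabular Q-learning, used here as a much simpler stand-in for the deep network of a DQN. The state and action labels, the learning rate `alpha`, and the discount factor `gamma` are hypothetical values chosen for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward
    reward + gamma * max over a' of Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_policy(q, state, actions):
    """Pick the response with the highest estimated value."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical discrete states (conference situations) and responses.
actions = ["ask_about_goal", "summarize"]
q = defaultdict(float)

# A response that brought the conference back on topic earns reward 1.0.
updated = q_update(q, "off_topic", "ask_about_goal", 1.0, "on_topic", actions)
chosen = greedy_policy(q, "off_topic", actions)
```

In a DQN the table `q` would be replaced by a neural network that estimates the same values from the utterance vector.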

More specifically, treating the value of the response "π(s)" under the policy as "a", as in formula (2), the information providing device 10 determines the next policy "π^*(s)" according to the reward based on maximization of the value of the state-action value function, as in formula (3). That is, the information providing device 10 calculates the state-action value function according to the reward so that the reward expected to be obtained next is maximized, and updates the policy based on the calculated state-action value function.

The state-action value function can be obtained by formula (4). Here, R(s_0, a_0, s_1) is a variable indicating the reward for the previous response. The subscripts of "s" and "a" are values indicating the time (order) of the policy and of the response to the policy. For example, "s_t" is the value of "s" at time "t", and "s_{t+1}" is the value of "s" at time "t+1".
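Formulas (2) to (4) are not reproduced in this text. The standard reinforcement-learning forms that match the surrounding definitions are given below as a reading aid. The discount factor γ and the expectation over trajectories are assumptions taken from the usual textbook definition, so the exact form of the original formulas may differ.

```latex
\begin{align}
a &= \pi(s) \tag{2} \\
\pi^{*}(s) &= \operatorname*{arg\,max}_{a} Q^{*}(s, a) \tag{3} \\
Q^{\pi}(s, a) &= \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t}
  R(s_{t}, a_{t}, s_{t+1}) \;\middle|\; s_{0} = s,\ a_{0} = a,\ \pi \right] \tag{4}
\end{align}
```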

In this way, the information providing device 10 updates the learner based on the users' evaluation of the output response, and uses the updated learner to output responses that guide the conference toward its purpose. The information providing device 10 can therefore make the conference proceed efficiently.

[3. Flow of the response process executed by the information providing device 10]
Next, the flow of the response process executed by the information providing device 10 is described with reference to FIG. 6. FIG. 6 is a flowchart illustrating the flow of the response process executed by the information providing device according to the embodiment. The information providing device 10 repeatedly executes steps S101 to S108 described below.

First, the information providing device 10 acquires the content of a user's utterance (step S101). The information providing device 10 then projects the acquired utterance content onto the vector space (step S102) and identifies the relationship between the positions of the utterance history and the position of the current utterance (step S103). The information providing device 10 then uses SLAM to identify the direction in which to guide the conference (step S104).

The information providing device 10 also determines a response to the utterance using a learner that has learned, through deep reinforcement learning, responses that guide user utterances toward the purpose of the conference (step S105), and outputs the determined response (step S106). The information providing device 10 then sets a reward based on the users' evaluation of the response (step S107) and executes deep reinforcement learning based on the set reward (step S108).
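The order of steps S101 to S108 can be sketched as a single function whose processing stages are supplied by the caller. Every parameter name here is hypothetical, and the stand-in callables in the usage lines only show how data flows between the steps, not how each unit actually works.

```python
def response_cycle(utterance, history, project, find_direction,
                   decide, evaluate, learn, output=print):
    """One pass through steps S101 to S108 using caller-supplied stages."""
    vector = project(utterance)                      # S101-S102: acquire and project
    relation = [(past, vector) for past in history]  # S103: relate history to new position
    direction = find_direction(relation)             # S104: identify guidance direction
    response = decide(utterance)                     # S105: learner chooses a response
    output(response)                                 # S106: output the response
    reward = evaluate(response)                      # S107: set reward from evaluation
    learn(utterance, response, reward)               # S108: reinforcement-learning update
    history.append(vector)
    return direction, response, reward


# Usage with trivial stand-ins for each stage.
log, outputs, history = [], [], [[1.0]]
direction, response, reward = response_cycle(
    "hello", history,
    project=lambda text: [float(len(text))],
    find_direction=lambda relation: len(relation),
    decide=lambda text: "reply:" + text,
    evaluate=lambda resp: 1.0,
    learn=lambda u, r, w: log.append((u, r, w)),
    output=outputs.append,
)
```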

[4. Modifications]
An example of the processing executed by the information providing device 10 has been described above using the mode illustrated in FIG. 1. However, the embodiment is not limited to this. Variations of the extraction process executed by the information providing device 10 are described below.

[4-1. Execution mode of the processing]
In the example described above, in order to make the conference proceed efficiently, the information providing device 10 projects the utterance content onto the vector space, identifies the direction in which to guide the conference based on the positional relationships among the utterances, and generates and outputs a response using a learner that has learned, through deep reinforcement learning, responses that guide user utterances toward the purpose of the conference. However, the embodiment is not limited to this.

For example, the information providing device 10 may identify the direction in which to guide the conference as a result of the guidance process described above and output information indicating the identified direction. The information providing device 10 may also generate and output a response using the learner described above. That is, the information providing device 10 may be a device that executes the learning process and the guidance process independently of each other and outputs the results.

[4-2. Others]
Of the processes described in the above embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and the drawings may be changed arbitrarily unless otherwise specified. For example, the various kinds of information shown in each drawing are not limited to the information illustrated.

Each component of each device illustrated is a functional concept and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to that illustrated, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above may be combined as appropriate to the extent that their processing does not conflict.

[4-3. Hardware configuration]
The information providing device 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 7. FIG. 7 is a diagram showing an example of the hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 connected by a bus 1090.

The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic device 1030 for various computations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various computations and various databases are registered, and is realized by a ROM (Read Only Memory), an HDD, a flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, such as a monitor or a printer, which outputs various kinds of information. It is realized by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner, and is realized by, for example, USB.

The input device 1020 may be, for example, a device that reads information from an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, a semiconductor memory, or the like. The input device 1020 may also be an external storage medium such as a USB memory.

The network IF 1080 receives data from other devices via the network N and sends it to the arithmetic device 1030, and transmits data generated by the arithmetic device 1030 to other devices via the network N.

The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the information providing device 10, the arithmetic device 1030 of the computer 1000 realizes the functions of the control unit 16 by executing the program loaded onto the primary storage device 1040.

[5. Effects]
As described above, the information providing device 10 acquires the content of utterances by a plurality of users in a conference and projects the acquired utterance content onto a vector space. The information providing device 10 then identifies the direction in which to guide the conference based on the positions of the utterance history projected onto the vector space and the position of a new utterance projected onto the vector space, and outputs a response for guiding the conference in the identified direction. Because the information providing device 10 thus projects the relationship between the utterances so far and the current conference onto positional relationships in the vector space and identifies the guidance direction from those positional relationships, it can guide the conference in an appropriate direction and, as a result, make the conference proceed efficiently.

The information providing device 10 also uses SLAM to identify the direction in the vector space in which to guide the conference. For example, as that direction, the information providing device 10 identifies the direction onto which user utterances were projected when the conference became lively, or a direction away from the positions onto which the utterance history was projected. The information providing device 10 can therefore liven up the conference, prevent situations such as the same topic being repeated, and make the conference proceed efficiently.

The information providing device 10 also projects a plurality of utterances belonging to the same topic into a predetermined region of the vector space. The information providing device 10 can therefore appropriately identify the direction in which to guide the conference.

Several embodiments of the present application have been described above in detail with reference to the drawings, but these are examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the modes described in the disclosure of the invention.

The "units" (sections, modules) described above can be read as "means", "circuits", and the like. For example, the control unit can be read as control means or a control circuit.

10 Information providing apparatus
11 Communication unit
12 Storage unit
13 Model database
14 Vector space database
16 Control unit
17 Acquisition unit
18 Projection unit
19 Position identification unit
20 Guidance direction identification unit
21 Response determination unit
22 Response output unit
23 Evaluation acquisition unit
24 Model update unit
30 Input device
31 Output device

Claims

An information providing apparatus comprising:
an acquisition unit that acquires the content of utterances made by a plurality of users in a meeting;
a projection unit that projects the concept of an utterance onto a vector space by vectorizing the group of words included in the user's utterance acquired by the acquisition unit, using a conversion method that converts sentences with similar concepts into similar vectors, and registering the resulting vector in a predetermined storage device;
a specifying unit that determines whether a new utterance, which is the most recently output utterance, has deviated from the purpose of the meeting, based on the similarity between the direction, among the vectors registered in the storage device, from the position in the vector space onto which the concept of a past utterance is projected to the position in the vector space onto which the concept of the new utterance is projected, and the direction to the position in the vector space onto which the concept of a predetermined utterance based on utterances in past meetings is projected, and that, when the new utterance is determined to have deviated from the purpose of the meeting, specifies the direction from the position in the vector space onto which the concept of the new utterance is projected to the position in the vector space onto which the concept of the predetermined utterance is projected as the direction in which to guide the meeting; and
an output unit that outputs, as a response for guiding the meeting in the direction specified by the specifying unit, a response generated by a model trained by deep reinforcement learning in which a larger reward is set the closer the user's utterance after the response is output comes to the direction in which the meeting is guided, and a smaller reward is set the smaller the difference between the position in the vector space onto which the concept of the past utterance is projected and the position in the vector space onto which the concept of the new utterance made after the response is output is projected.

The information providing apparatus according to claim 1, wherein the specifying unit uses a SLAM (Simultaneous Localization and Mapping) technique to specify the direction from the position in the vector space onto which the concept of the new utterance is projected to the position in the vector space onto which a concept that the user is to be prompted to utter next is projected.

The information providing apparatus according to claim 1 or 2, wherein the specifying unit specifies, as the direction in which to guide the meeting, the direction in which the concept of an utterance predetermined as a user's utterance made while the meeting was lively is projected.

The information providing apparatus according to claim 1, wherein the specifying unit specifies, as the direction in which to guide the meeting, the direction leading away from the position onto which the concept of the previously acquired utterance was projected.

An information providing method executed by an information providing apparatus, the method comprising:
an acquisition step of acquiring the content of utterances made by a plurality of users in a meeting;
a projection step of projecting the concept of an utterance onto a vector space by vectorizing the group of words included in the user's utterance acquired in the acquisition step, using a conversion method that converts sentences with similar concepts into similar vectors, and registering the resulting vector in a predetermined storage device;
a specifying step of determining whether a new utterance, which is the most recently output utterance, has deviated from the purpose of the meeting, based on the similarity between the direction, among the vectors registered in the storage device, from the position in the vector space onto which the concept of a past utterance is projected to the position in the vector space onto which the concept of the new utterance is projected, and the direction to the position in the vector space onto which the concept of a predetermined utterance based on utterances in past meetings is projected, and, when the new utterance is determined to have deviated from the purpose of the meeting, specifying the direction from the position in the vector space onto which the concept of the new utterance is projected to the position in the vector space onto which the concept of the predetermined utterance is projected as the direction in which to guide the meeting; and
an output step of outputting, as a response for guiding the meeting in the direction specified in the specifying step, a response generated by a model trained by deep reinforcement learning in which a larger reward is set the closer the user's utterance after the response is output comes to the direction in which the meeting is guided, and a smaller reward is set the smaller the difference between the position in the vector space onto which the concept of the past utterance is projected and the position in the vector space onto which the concept of the new utterance made after the response is output is projected.