JP2019215483A

JP2019215483A - Learning device, learning method, and learning program

Info

Publication number: JP2019215483A
Application number: JP2018113605A
Authority: JP
Inventors: Chikara Hashimoto (橋本 力); Manabu Satsusano (颯々野 学)
Original assignee: Z Holdings Corp
Current assignee: LY Corp
Priority date: 2018-06-14
Filing date: 2018-06-14
Publication date: 2019-12-19
Anticipated expiration: 2038-06-14
Also published as: JP7013329B2

Abstract

To improve the accuracy of interactions. SOLUTION: A learning device according to the present application comprises: an acquisition unit configured to acquire, from among utterances made to an interactive model that provides an interactive service by generating a response to an input utterance, an indicator utterance that serves as an evaluation indicator of the interactive service; and a learning unit configured to perform reinforcement learning of the interactive model by setting a reward based on the indicator utterance acquired by the acquisition unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a learning method, and a learning program.

Conventionally, dialogue systems are known that, upon receiving a voice or text message from a user, provide the user with a message corresponding to the received message. In such a dialogue system, a technique is applied that, in response to a message received from the user (hereinafter collectively referred to as an "utterance"), provides the user with messages of various contents corresponding to the utterance (hereinafter collectively referred to as a "response"). Also known for such dialogue systems is a technique for evaluating a dialogue based on the flow of utterances and responses.

JP 2011-253053 A

However, with the conventional technique described above, it may be difficult to improve the accuracy of the dialogue.

For example, in a dialogue system, one conceivable approach is to train the model that generates responses from the dialogue based on evaluations of the dialogue. However, such an approach requires a large number of evaluations, so improving the dialogue is not necessarily easy.

The present application has been made in view of the above, and its object is to improve the accuracy of dialogue.

A learning device according to the present application includes: an acquisition unit that acquires, from among utterances made to a dialogue model that realizes a dialogue service by generating a response to an input utterance, an index utterance that serves as an index for evaluating the dialogue service; and a learning unit that performs reinforcement learning of the dialogue model by setting a reward based on the index utterance acquired by the acquisition unit.

According to one aspect of the embodiment, the accuracy of dialogue can be improved.

FIG. 1 is a diagram illustrating an example of processing executed by the information providing device according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the information providing device according to the embodiment.
FIG. 3 is a diagram illustrating an example of information registered in the session log database according to the embodiment.
FIG. 4 is a diagram illustrating an example of information registered in the session evaluation database according to the embodiment.
FIG. 5 is a diagram illustrating an example of information registered in the index utterance database according to the embodiment.
FIG. 6 is a diagram illustrating an example of information registered in the engagement database according to the embodiment.
FIG. 7 is a diagram illustrating an example of information registered in the model database according to the embodiment.
FIG. 8 is a diagram illustrating the concept of processing for extracting a new index utterance from a seed index utterance.
FIG. 9 is a diagram illustrating the concept of processing for extracting a new index utterance based on the usage mode of the dialogue service.
FIG. 10 is a diagram illustrating the concept of processing for extracting a new index utterance based on a session evaluation result.
FIG. 11 is a diagram illustrating an example of the functional configuration of the index utterance extraction processing unit according to the embodiment.
FIG. 12 is a diagram illustrating the concept of processing for evaluating a session based on an index utterance.
FIG. 13 is a diagram illustrating the concept of processing for evaluating a session based on an index utterance.
FIG. 14 is a diagram illustrating an example of processing for evaluating a session using a ternary evaluation model trained on the features of three types of dialogue.
FIG. 15 is a diagram illustrating an example of processing for evaluating a session based on engagement.
FIG. 16 is a diagram illustrating the concept of processing for evaluating a session based on repeated utterances.
FIG. 17 is a diagram illustrating the concept of processing for evaluating a session using a model trained on the features of automatically generated data.
FIG. 18 is a diagram illustrating the concept of evaluating repeated utterances using images.
FIG. 19 is a diagram illustrating the concept of processing for evaluating a session based on an apology response.
FIG. 20 is a diagram illustrating the concept of processing for evaluating a session by combining a plurality of evaluation methods.
FIG. 21 is a diagram illustrating the concept of processing for automatically expanding labeled sessions.
FIG. 22 is a diagram illustrating an example of the functional configuration of the session evaluation processing unit according to the embodiment.
FIG. 23 is a diagram illustrating the concept of processing for predicting a future usage mode based on an index utterance.
FIG. 24 is a diagram illustrating the concept of processing for predicting a future usage mode based on an evaluation result.
FIG. 25 is a diagram illustrating an example of the functional configuration of the engagement prediction processing unit according to the embodiment.
FIG. 26 is a diagram illustrating an example of the relationships among the processes executed by the information providing device according to the embodiment.
FIG. 27 is a diagram illustrating the concept of processing for executing reinforcement learning of the dialogue model.
FIG. 28 is a diagram illustrating an example of the functional configuration of the reinforcement learning processing unit according to the embodiment.
FIG. 29 is a flowchart illustrating an example of the flow of processing that the information providing device according to the embodiment executes based on a usage mode.
FIG. 30 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment extracts an index utterance.
FIG. 31 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment evaluates a session.
FIG. 32 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment evaluates a session based on an image.
FIG. 33 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment evaluates a session based on repetition of utterances.
FIG. 34 is a flowchart illustrating an example of the flow of reinforcement learning that the information providing device according to the embodiment executes using utterances as rewards.
FIG. 35 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment extracts an index utterance based on co-occurrence.
FIG. 36 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment evaluates a session using a ternary evaluation model.
FIG. 37 is a flowchart illustrating an example of the flow of processing in which the information providing device according to the embodiment evaluates a session using a plurality of models.
FIG. 38 is a diagram illustrating an example of a hardware configuration.
FIG. 39 is a diagram illustrating an example of the result of labeling session data.
FIG. 40 is a diagram showing the results of the first experiment.
FIG. 41 is a diagram showing the results of the second experiment.
FIG. 42 is a diagram showing the result of Prop_D in the third experiment.
FIG. 43 is a diagram showing the result of Prop_S in the fourth experiment.
FIG. 44 is a diagram showing the result of Prop_E in the fifth experiment.

Hereinafter, modes for carrying out the learning device, the learning method, and the learning program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the learning device, the learning method, or the learning program according to the present application. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

[Embodiment]
[1. Outline of the information providing device]
First, with reference to FIG. 1, an example of processing executed by an information providing device, which is an example of an extraction device, an evaluation device, and a learning device, will be described. FIG. 1 is a diagram illustrating an example of processing executed by the information providing device according to the embodiment.

In FIG. 1, the information providing device 10 can communicate with the interactive device 200 via a predetermined network N such as the Internet (see, for example, FIG. 2). The information providing device 10 may also be able to communicate with the terminal device 100 used by the user. In addition to the terminal device 100 and the interactive device 200, the information providing device 10 is assumed to be able to communicate with various external servers.

The terminal device 100 is a smart device such as a smartphone or a tablet, and is a portable terminal device that can communicate with an arbitrary server device via a wireless communication network such as 3G (3rd Generation) or LTE (Long Term Evolution). The terminal device 100 is not limited to a smart device and may be an information processing device such as a desktop PC (Personal Computer) or a notebook PC. Any device, such as a smart speaker, can be adopted as the terminal device 100 as long as it acquires the voice uttered by the user (a so-called "utterance"), transmits the acquired utterance audio to the interactive device 200, and, upon receiving a response transmitted from the interactive device 200, outputs that response as audio.

The interactive device 200 is an information processing device that executes dialogue processing for realizing a dialogue with the user, and is implemented by, for example, a server device or a cloud system. For example, upon receiving an utterance from the terminal device 100, the interactive device 200 generates a response to the received utterance and transmits, to the terminal device 100, data for outputting the generated response as audio.

Here, an example of the dialogue processing realized by the interactive device 200 will be described. The dialogue processing described below is merely an example and does not limit the embodiment. For example, the terminal device 100 acquires the voice uttered by the user as an utterance (step S1). More specifically, the terminal device 100 acquires the audio data of the utterance using various audio extraction techniques such as beamforming. In this case, the terminal device 100 transmits the acquired audio data to a predetermined conversion server (not shown). The conversion server generates text data of the utterance from the audio data and transmits the generated text data to the terminal device 100. The terminal device 100 then transmits the text data to the interactive device 200 as the utterance (step S2).

Meanwhile, the interactive device 200 analyzes the text data received as the utterance using various intent analysis techniques and estimates the user's intent indicated by the utterance. The interactive device 200 then executes various processes according to the estimated intent and generates text data indicating the execution result. That is, the interactive device 200 generates a response (step S3). The interactive device 200 then transmits the text data generated as the response to the terminal device 100 (step S4). As a result, the terminal device 100 provides the user with a response to the utterance by reading the text data aloud or displaying it on a screen (step S5).

In the dialogue processing realized by the interactive device 200, it is important to evaluate whether the user was satisfied with the content of a response. For example, information on whether the user was satisfied with a given response can be used to improve the accuracy of the dialogue processing executed by the interactive device 200. However, asking the user whether he or she was satisfied with a response risks impairing the user experience.

To appropriately evaluate whether the user is satisfied with a dialogue, one conceivable approach is to prepare labeled dialogue data, in which each dialogue carries a label indicating whether the user was satisfied, train various models on the features of the labeled dialogue data by supervised or unsupervised learning, and evaluate dialogues using the trained models. However, training such models requires a relatively large amount of labeled dialogue data, and creating labeled dialogue data is laborious.

For example, labeled dialogue data may be created manually, such as by crowdsourcing. However, some dialogues contain many utterances and responses (that is, long dialogues), so assigning labels manually can be difficult. Therefore, if the labeled dialogue data used for training could be acquired automatically, training could be performed efficiently and the accuracy of the dialogue processing could be improved.

Therefore, the information providing device 10 executes evaluation processing that uses the user's utterances and the responses in the dialogue processing to automatically evaluate whether the user was satisfied with the dialogue processing. The information providing device 10 also executes extraction processing for extracting the utterances used in the evaluation processing. The information providing device 10 further executes learning processing for training the model used to conduct the dialogue. Based on the extraction processing, the evaluation processing, and the learning processing, the information providing device 10 improves the accuracy of the dialogue processing performed by the interactive device 200.

[1-1. Example of processing executed by the information providing device]
Hereinafter, an example of the processing executed by the information providing device 10 will be described. First, the information providing device 10 acquires, from the interactive device 200, a dialogue history, which is a history of the utterances and responses in the dialogue processing (step S6). The information providing device 10 then evaluates the dialogue based on the dialogue history (step S7).

More specifically, the information providing device 10 extracts, as a session, a dialogue history containing a series of consecutive utterances and responses, such as an utterance by the user, a response to that utterance, and the user's utterance in reply to that response. Here, the information providing device 10 may adopt, as one session, a plurality of utterances and responses delimited by any criterion.

For example, when no utterance has been received from the user for a predetermined time (for example, 30 minutes) or longer, the information providing device 10 may treat the utterances received from the user and the responses provided to the user up to that point as the utterances and responses of one session. Alternatively, the information providing device 10 may treat the utterances and responses acquired between the time the user launches the application for dialogue processing and the time the application is put to sleep or terminated as one session. That is, a session may contain utterances and responses grouped in any unit, as long as it is a series of exchanges that can serve as a unit of evaluation by the user.
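The inactivity-based session criterion described above can be sketched as follows. This is a minimal illustration, not the implementation in the embodiment: the `(timestamp, speaker, text)` log format and the function name `split_into_sessions` are assumptions, and the silence is measured between consecutive log entries.

```python
from datetime import datetime, timedelta

def split_into_sessions(events, gap=timedelta(minutes=30)):
    """Group time-ordered (timestamp, speaker, text) events into sessions.

    A new session starts whenever the silence between two consecutive
    events is at least `gap` (30 minutes here, as in the example above).
    """
    sessions = []
    current = []
    last_time = None
    for ts, speaker, text in events:
        if current and ts - last_time >= gap:
            sessions.append(current)
            current = []
        current.append((ts, speaker, text))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

# Hypothetical log: two events close together, then one 45 minutes later.
t0 = datetime(2018, 6, 14, 9, 0)
log = [
    (t0, "user", "jobs at an inn"),
    (t0 + timedelta(minutes=1), "system", "I will show you accommodation facilities."),
    (t0 + timedelta(minutes=46), "user", "weather tomorrow"),
]
sessions = split_into_sessions(log)
```

An application-lifecycle criterion (launch to sleep or exit) would replace the time comparison with explicit start and end markers in the log.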

The information providing device 10 then evaluates the dialogue in the session. More specifically, the information providing device 10 estimates, based on the responses and utterances in the session, whether the user was satisfied with the dialogue in the session. For example, the information providing device 10 evaluates the session using the evaluation results of a plurality of models, each of which evaluates the session based on a different type of feature among the features that serve as indexes of the user's evaluation of the session.

For example, when the utterances in one session include utterances with identical or similar content, it is presumed that the user is not satisfied with the dialogue in that session. For example, when a user says "jobs at an inn" intending to look for work at an inn, the interactive device 200 may wrongly determine from the word "inn" that the user intends to search for accommodation and return a response such as "I will show you accommodation facilities." When such an unintended response is output, the user may assume that speech recognition failed and say "jobs at an inn" again, and the interactive device 200 may again return "I will show you accommodation facilities." Repetition of similar utterances within one session is thus considered to suggest that the user is not very satisfied with that session.

Therefore, the information providing device 10 generates a model that evaluates a session based on the repetition of identical or similar utterances (step S7-1). For example, the information providing device 10 generates a repetition evaluation model M1 that, when the character strings of the utterances and responses in a session are input, outputs a first dissatisfaction degree, a score indicating how dissatisfied the user is with the session, based on the repetition of utterances in that session. For example, when a session containing identical utterances (or identical character strings) or similar utterances (or similar character strings) is input, the repetition evaluation model M1 outputs a higher first dissatisfaction degree than when a session containing no such utterances is input. Here, "similar utterances" are utterances with similar meanings, such as "jobs at an inn" and "jobs at a hotel." Note that the repetition evaluation model M1 evaluates a session based not on the character strings of the session themselves but on the features of an image generated from those character strings; the details of this processing will be described later.
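As a rough illustration of the signal that a repetition-based evaluation relies on, the following sketch scores a session by the fraction of user utterances that nearly duplicate an earlier utterance. It is a stand-in under stated assumptions, not the repetition evaluation model M1: M1 is described as a learned model operating on images generated from the character strings, whereas this sketch uses plain string similarity from Python's `difflib`, which does not capture semantic similarity (for example, "inn" versus "hotel"). The function name and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def repetition_score(user_utterances, threshold=0.8):
    """Return the fraction of utterances that closely repeat an earlier one.

    Two utterances count as a repetition when their character-level
    similarity ratio is at least `threshold` (an assumed value).
    """
    if not user_utterances:
        return 0.0
    repeats = 0
    for i, utterance in enumerate(user_utterances):
        earlier = user_utterances[:i]
        if any(SequenceMatcher(None, utterance, prev).ratio() >= threshold
               for prev in earlier):
            repeats += 1
    return repeats / len(user_utterances)

score = repetition_score(["jobs at an inn", "jobs at an inn", "thanks"])
```

A higher value plays the role of a higher first dissatisfaction degree; a session with no repeated utterances scores 0.0.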

In addition, for example, in reply to an utterance such as "that's wrong," a response expressing an apology, such as "I'm sorry" or "My apologies," may be provided. When a session contains such a response expressing an apology (hereinafter sometimes collectively referred to as an "apology response"), it is presumed that the user is not very satisfied with that session. Therefore, the information providing device 10 generates a model that evaluates a session based on whether it contains apology responses (step S7-2). For example, the information providing device 10 generates an apology evaluation model M2 that outputs a higher second dissatisfaction degree for a session containing more apology responses and a lower second dissatisfaction degree for a session containing fewer apology responses.
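The monotonic relationship described above (more apology responses, higher second dissatisfaction degree) can be illustrated with a simple counter. This is only a sketch: the apology evaluation model M2 is a generated model whose internals are not given here, and the phrase list and function name below are hypothetical.

```python
# Hypothetical apology markers; a real system would need a curated list.
APOLOGY_PHRASES = ("sorry", "apolog")

def apology_count(responses):
    """Count system responses that contain an apology marker."""
    return sum(
        any(phrase in response.lower() for phrase in APOLOGY_PHRASES)
        for response in responses
    )

count = apology_count([
    "I'm sorry, I did not understand.",
    "I will show you accommodation facilities.",
    "My apologies.",
])
```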

The user's utterances also include utterances that suggest the user's impression of the session, and such utterances can be adopted as utterances that serve as an index for evaluating the session, that is, as index utterances. For example, user utterances such as "useless" or "uninstall" are considered to indicate the user's unfavorable evaluation of a response, that is, unfavorable feedback. A session containing many utterances indicating such unfavorable evaluations (hereinafter referred to as "unfavorable index utterances") is presumed to have low user satisfaction. Conversely, user utterances such as "awesome" or "well done" are considered to indicate the user's favorable evaluation of a response, that is, favorable feedback. A session containing many utterances indicating such favorable evaluations (hereinafter referred to as "favorable index utterances") is presumed to have high user satisfaction.

That is, such favorable index utterances and unfavorable index utterances (hereinafter sometimes collectively referred to as "index utterances") can serve as an index of the user's satisfaction with a session. Therefore, the information providing device 10 generates models that evaluate a session based on various index utterances, such as favorable index utterances and unfavorable index utterances.

For example, the information providing device 10 generates a model that evaluates a session based on unfavorable index utterances (step S7-3). For example, the information providing device 10 generates an unfavorable index utterance evaluation model M3 that outputs a higher third dissatisfaction degree for a session whose utterances include more preset unfavorable index utterances, such as "useless" or "uninstall," and a lower third dissatisfaction degree for a session that includes fewer unfavorable index utterances.

The information providing device 10 also generates a model that evaluates a session based on favorable index utterances (step S7-4). For example, the information providing device 10 generates a favorable index utterance evaluation model M4 that outputs a higher first satisfaction degree for a session whose utterances include more preset favorable index utterances, such as "awesome" or "well done," and a lower first satisfaction degree for a session that includes fewer favorable index utterances. The unfavorable and favorable index utterances described above may be utterances extracted automatically from sessions, as appropriate, by processing described later.
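The idea of scoring a session by preset index utterances can be sketched as a pair of counters. This is an illustrative simplification rather than the models M3 and M4 themselves: the seed sets below loosely mirror the examples in the text, exact matching after lowercasing is an assumption, and the function name is hypothetical.

```python
# Hypothetical seed lists of index utterances (assumed, for illustration).
UNFAVORABLE_INDEX = {"useless", "uninstall"}
FAVORABLE_INDEX = {"awesome", "well done"}

def index_utterance_counts(user_utterances):
    """Return (unfavorable_count, favorable_count) for one session."""
    normalized = [u.strip().lower() for u in user_utterances]
    unfavorable = sum(u in UNFAVORABLE_INDEX for u in normalized)
    favorable = sum(u in FAVORABLE_INDEX for u in normalized)
    return unfavorable, favorable

counts = index_utterance_counts(["Useless", "awesome", "uninstall", "hello"])
```

The first count corresponds to the evidence behind a third dissatisfaction degree and the second to the evidence behind a first satisfaction degree; the text notes that the index utterance sets may also be extended automatically.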

Note that the information providing device 10 may train each model using any known technique. For example, the information providing device 10 may train each model using techniques such as backpropagation, reinforcement learning, or a GAN (Generative Adversarial Network).

The information providing device 10 then evaluates the input session based on the dissatisfaction degrees and the satisfaction degree output by the models (step S7-5). For example, the information providing device 10 extracts a session to be evaluated and inputs the extracted session to each of the repetition evaluation model M1, the apology evaluation model M2, the unfavorable index utterance evaluation model M3, and the favorable index utterance evaluation model M4. The information providing device 10 then calculates, as a dissatisfaction score, the value obtained by subtracting the first satisfaction degree from the sum of the first, second, and third dissatisfaction degrees. That is, the information providing device 10 calculates, as the dissatisfaction score, a score that becomes higher the less satisfied the user is with the session. Note that "dissatisfaction" is a concept covering any state in which the user holds an unfavorable subjective view of the session, not only the case where the user is frustrated with the responses in the session but also, for example, the case where the user is exasperated.
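The score combination in step S7-5 can be written directly from the description above. The function and parameter names are chosen here for illustration; the arithmetic (sum of the three dissatisfaction degrees minus the first satisfaction degree) follows the text.

```python
def dissatisfaction_score(d1, d2, d3, s1):
    """Combine the model outputs for one session.

    d1: first dissatisfaction degree (repetition evaluation model M1)
    d2: second dissatisfaction degree (apology evaluation model M2)
    d3: third dissatisfaction degree (unfavorable index utterance model M3)
    s1: first satisfaction degree (favorable index utterance model M4)
    """
    return d1 + d2 + d3 - s1

example = dissatisfaction_score(0.5, 0.25, 0.25, 0.5)
```

Because the satisfaction degree is subtracted, a session with strong favorable feedback can yield a negative score.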

Further, the information providing apparatus 10 calculates a dissatisfaction score for each session extracted from the dialogue history. Then, the information providing apparatus 10 extracts a predetermined number of sessions in descending order of dissatisfaction score and treats the extracted sessions as low-satisfaction sessions, that is, sessions with which the user's satisfaction is low. The information providing apparatus 10 also extracts a predetermined number of sessions in ascending order of dissatisfaction score and treats the extracted sessions as high-satisfaction sessions, that is, sessions with which the user's satisfaction is high. Note that the information providing apparatus 10 may instead treat a session whose dissatisfaction score exceeds a predetermined threshold as a low-satisfaction session and a session whose dissatisfaction score falls below a predetermined threshold as a high-satisfaction session. By executing such processing, the information providing apparatus 10 realizes a process of evaluating whether or not the user is satisfied with a session.
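The two selection strategies described above (a fixed number of sessions from each end of the ranking, or thresholds on the score) can be sketched as follows. The function names, the dictionary representation of scores, and the parameter names are assumptions made for illustration only.

```python
def select_by_rank(scores: dict[str, float], n: int) -> tuple[list[str], list[str]]:
    """Return (low_satisfaction_ids, high_satisfaction_ids) using the top/bottom n scores.

    Note: if 2 * n exceeds the number of sessions, a session can land in both lists.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest dissatisfaction first
    low_satisfaction = ranked[:n]
    high_satisfaction = ranked[-n:][::-1] if n > 0 else []  # lowest dissatisfaction first
    return low_satisfaction, high_satisfaction


def select_by_threshold(
    scores: dict[str, float], upper: float, lower: float
) -> tuple[list[str], list[str]]:
    """Return (low_satisfaction_ids, high_satisfaction_ids) using score thresholds."""
    low_satisfaction = [sid for sid, s in scores.items() if s > upper]
    high_satisfaction = [sid for sid, s in scores.items() if s < lower]
    return low_satisfaction, high_satisfaction
```

The rank-based variant always yields the same number of sessions regardless of the score distribution, whereas the threshold-based variant may yield none or many, so the choice depends on how much labeled data later processing needs.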

Then, the information providing apparatus 10 updates the dialogue model based on the evaluation result (step S8). For example, the information providing apparatus 10 updates the dialogue model used to generate responses so that, when an utterance and a response to that utterance are included in a low-satisfaction session, the probability of outputting that response to that utterance is lowered, and, when an utterance and a response to that utterance are included in a high-satisfaction session, the probability of outputting that response to that utterance is raised. That is, the information providing apparatus 10 updates the dialogue model so that the user becomes more satisfied.
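The direction of the update in step S8 can be illustrated with a deliberately simple stand-in. The sketch below assumes a tabular model that keeps one weight per (utterance, response) pair and turns the weights into probabilities with a softmax; the specification does not prescribe this representation, and an actual dialogue model would more likely be a neural network trained by a method such as reinforcement learning. All names and the step size are hypothetical.

```python
import math
from collections import defaultdict


class TabularDialogueModel:
    """Toy dialogue model: one logit per (utterance, response) pair."""

    def __init__(self) -> None:
        self.logits: dict[str, dict[str, float]] = defaultdict(dict)

    def add(self, utterance: str, response: str) -> None:
        """Register a candidate response with a neutral initial logit."""
        self.logits[utterance].setdefault(response, 0.0)

    def probabilities(self, utterance: str) -> dict[str, float]:
        """Softmax over the logits of the candidate responses for one utterance."""
        row = self.logits[utterance]
        normalizer = sum(math.exp(v) for v in row.values())
        return {resp: math.exp(v) / normalizer for resp, v in row.items()}

    def update(self, utterance: str, response: str, session_type: str, step: float = 0.5) -> None:
        """Raise the logit for pairs seen in high-satisfaction sessions, lower it for low ones."""
        self.add(utterance, response)
        if session_type == "high":
            self.logits[utterance][response] += step
        elif session_type == "low":
            self.logits[utterance][response] -= step
```

For instance, after one `update(..., "low")` call on one of two equally weighted responses, that response's probability drops below 0.5 and the other's rises above it.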

このように、情報提供装置１０は、対話サービスにおける利用者の発話と応答とを含むセッションを取得し、セッションに対する利用者の評価の指標となる複数の特徴のうちそれぞれ異なる種別の特徴に基づいてセッションを評価する複数のモデルＭ１〜Ｍ４の評価結果を用いて、セッションを評価する。この結果、情報提供装置１０は、複数の観点から総合的にセッションを評価するので、評価の精度を向上させることができる。また、情報提供装置１０は、クラウドソージング等といった人手によるセッションの評価を行わずとも、セッションの評価を自動で実現できるので、対話に対する評価を容易にすることができる。 As described above, the information providing apparatus 10 acquires the session including the utterance and the response of the user in the interactive service, and based on the different types of features among the plurality of features serving as indexes of the user's evaluation of the session. The session is evaluated using the evaluation results of the plurality of models M1 to M4 for evaluating the session. As a result, the information providing apparatus 10 comprehensively evaluates the session from a plurality of viewpoints, so that the accuracy of the evaluation can be improved. Further, the information providing apparatus 10 can automatically realize the session evaluation without manually evaluating the session such as crowd sourcing, and thus can easily evaluate the dialogue.

[1-2. Processing performed by the information providing apparatus]
The example above described, as one example of the processing executed by the information providing apparatus 10, a process that evaluates a session according to whether the session contains repeated utterances, whether it contains an apology response, and whether it contains favorable or unfavorable index utterances. However, the embodiment is not limited to this.

例えば、以下に説明するように、情報提供装置１０は、様々な観点からセッションの評価を行ってよい。また、情報提供装置１０は、このようなセッションの評価を行うため、セッションに含まれる発話から各種の指標発話を自動的に抽出してもよい。また、情報提供装置１０は、このようなセッションの評価を行うため、利用者が対話サービスを利用する期間や頻度（以下、「エンゲージメント」と記載する場合がある。）を推定してもよい。また、情報提供装置１０は、対話の履歴に基づいて、発話から応答を生成するための対話モデルの学習を行ってもよい。 For example, as described below, the information providing apparatus 10 may evaluate a session from various viewpoints. Further, the information providing apparatus 10 may automatically extract various index utterances from utterances included in the session in order to evaluate such a session. In order to evaluate such a session, the information providing apparatus 10 may estimate a period and a frequency (hereinafter, sometimes referred to as “engagement”) in which the user uses the interactive service. In addition, the information providing apparatus 10 may learn a dialog model for generating a response from an utterance based on the history of the dialog.

すなわち、情報提供装置１０は、以下に説明する各種の処理を組み合わせることで、図１に示すセッションの評価を実現すればよい。なお、以下に説明する各種の処理は、任意の組み合わせで実施してよい。 That is, the information providing apparatus 10 may realize the evaluation of the session illustrated in FIG. 1 by combining various processes described below. Note that the various processes described below may be performed in any combination.

[1-3. User's subjectivity]
In the above description, the information providing apparatus 10 evaluates a session using various index utterances that indicate the user's favorable or unfavorable impression of a response. Here, “favorable impression” and “unfavorable impression” are concepts that cover not only whether the user likes or dislikes the response, but also the user's various positive and negative impressions of the response, such as the impression that the response is correct or incorrect.

Further, in the above description, the information providing apparatus 10 evaluates whether or not the user was satisfied with a session. Here, the information providing apparatus 10 may perform not only a binary evaluation of whether or not the user was satisfied, but an evaluation in any form. For example, the information providing apparatus 10 may use the value of the dissatisfaction score itself as the evaluation result. Further, instead of dissatisfaction degrees, the information providing apparatus 10 may use models M1 to M3 that have been trained to calculate a satisfaction degree, that is, a score whose value is higher the more satisfied the user is and lower the less satisfied the user is. In addition, the information providing apparatus 10 may calculate, for each session, a satisfaction score indicating the degree to which the user was satisfied, instead of the dissatisfaction score.

In the above description, the information providing apparatus 10 evaluates whether the user was “satisfied” or “dissatisfied” with a session. Here, “satisfied” and “dissatisfied” are concepts that also cover the user's various favorable or unfavorable impressions of the session, such as whether or not the user is frustrated with the session.

That is, expressions such as “favorable” and “satisfied” above denote various favorable impressions held by the user, and expressions such as “unfavorable” and “dissatisfied” denote various unfavorable impressions held by the user. As long as the information providing apparatus 10 evaluates a session based on index utterances that indicate the user's impression of responses, it may, regardless of the type of impression, realize an evaluation along any impression axis by means of the processes described in the embodiment.

[1-4. Evaluation based on index utterances]
In the above description, the information providing apparatus 10 evaluates whether or not the user is satisfied with a session based on various index utterances, such as favorable index utterances and unfavorable index utterances. Here, an index utterance is an utterance that indicates whether or not the user has a favorable impression of a target, such as the dialogue or the responses up to the point where the index utterance appears, and that can therefore serve as an index for evaluating the session.

Here, regardless of the target of the user's impression that it expresses, any utterance that indicates the user's impression of some target can be adopted as an index utterance, as long as it can serve as an index for evaluating the session. For example, the information providing apparatus 10 may use, as an index utterance, an utterance indicating the user's impression of a single response (that is, an utterance that serves as feedback), or an utterance indicating the user's impression of a series of dialogues or of the entire session. That is, as long as an utterance indicating the user's evaluation of the interactive service is used as the index utterance, the information providing apparatus 10 may adopt an utterance indicating the user's impression of any target in the interactive service.

[2. Configuration of the information providing apparatus]
Hereinafter, an example of the functional configuration of the information providing apparatus 10 that realizes the various processes described above will be described. FIG. 2 is a diagram illustrating a configuration example of the information providing apparatus according to the embodiment. As shown in FIG. 2, the information providing apparatus 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

[2-1. Communication unit]
The communication unit 20 is realized by, for example, a NIC (Network Interface Card). The communication unit 20 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the terminal device 100 and the dialogue device 200.

[2-2. Storage unit]
The storage unit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 30 stores a session log database 31, a session evaluation database 32, an index utterance database 33, an engagement database 34, and a model database 35.

Hereinafter, an example of each of the databases 31 to 35 stored in the storage unit 30 will be described with reference to FIGS. 3 to 7. The dialogue history is registered in the session log database 31 for each session. For example, FIG. 3 is a diagram illustrating an example of information registered in the session log database according to the embodiment. As shown in FIG. 3, information having items such as “session ID (Identifier)”, “dialogue time”, “user ID”, and “dialogue content” is registered in the session log database 31. In addition to the information shown in FIG. 3, any information relating to utterances and responses in the dialogue processing may be registered in the session log database 31.

ここで、「セッションＩＤ」とは、セッションを識別するための情報である。また、「対話時刻」とは、「セッションＩＤ」が示すセッションに含まれる対話が行われた日時を示す情報であり、例えば、セッションに含まれる最初の発話が取得された日時を示す情報である。また、「利用者ＩＤ」とは、「セッションＩＤ」が示すセッションに含まれる発話を行った利用者を識別する識別子である。また、「対話内容」とは、「セッションＩＤ」が示すセッションに含まれる発話および応答の内容を時系列順に並べた情報である。 Here, the “session ID” is information for identifying a session. The “dialogue time” is information indicating the date and time when the dialogue included in the session indicated by the “session ID” is performed, and is, for example, information indicating the date and time when the first utterance included in the session is acquired. . The “user ID” is an identifier for identifying a user who has made an utterance included in the session indicated by the “session ID”. Further, the “interaction content” is information in which the contents of utterances and responses included in the session indicated by the “session ID” are arranged in chronological order.

For example, in the example shown in FIG. 3, information such as the session ID “session #1”, the dialogue time “dialogue time #1”, the user ID “user #1”, and the dialogue content “Usr: inn jobs sys: I will show you accommodation facilities ... (omitted) ...” is registered. Such information indicates that the dialogue included in the session indicated by the session ID “session #1” took place at “dialogue time #1”, and that the user who spoke in that session is the user indicated by “user #1”. It also indicates that the session indicated by the session ID “session #1” includes the user's utterance “inn jobs”, and that the response to that utterance was “I will show you accommodation facilities”.

Although conceptual values such as “session #1”, “dialogue time #1”, and “user #1” are shown in the example of FIG. 3, in practice a character string or numerical value identifying the session, a numerical value indicating the date and time, a character string or numerical value identifying the user, and the like are registered in the session log database 31.

Evaluation results for sessions are registered in the session evaluation database 32. For example, FIG. 4 is a diagram illustrating an example of information registered in the session evaluation database according to the embodiment. As shown in FIG. 4, information having items such as “session ID”, “session type”, and “dissatisfaction score” is registered in the session evaluation database 32. The information registered in the session evaluation database 32 is not limited to this, and any information relating to the evaluation of sessions may be registered. Further, the session evaluation database 32 may be integrated with, for example, the session log database 31.

Here, the “session type” is information indicating whether or not the user is presumed to have been satisfied with the dialogue in the session indicated by the associated “session ID”, and is, for example, a label assigned to the session. For example, the “session type” is set to “low-satisfaction session” when the user is presumed not to be satisfied (for example, when the dissatisfaction score is high), to “high-satisfaction session” when the user is presumed to be satisfied (for example, when the dissatisfaction score is low), and to “neutral session” when neither is presumed (for example, when the dissatisfaction score falls within a predetermined range). The “dissatisfaction score” is the value of the dissatisfaction score assigned to the session indicated by the associated “session ID”.

例えば、図４に示す例では、セッション評価データベース３２には、セッションＩＤ「セッション＃１」、セッション種別「低満足セッション」、および不満足度スコア「１５０」が対応付けて登録されている。このような情報は、セッションＩＤ「セッション＃１」が示すセッションのセッション種別が「低満足セッション」であり、不満足度スコアが「１５０」であった旨を示す。なお、セッションの評価が行われていない場合は、セッション種別が「未評価」となり、不満足度スコアが算出されていない場合は、不満足度スコアが「未評価」となる。 For example, in the example illustrated in FIG. 4, the session ID “session # 1”, the session type “low satisfaction session”, and the dissatisfaction score “150” are registered in the session evaluation database 32 in association with each other. Such information indicates that the session type of the session indicated by the session ID “session # 1” is “low satisfaction session” and the dissatisfaction score is “150”. When the session is not evaluated, the session type is “unrated”, and when the dissatisfaction score is not calculated, the dissatisfaction score is “unrated”.
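The mapping from a dissatisfaction score to a session type label can be sketched as follows. The threshold parameters, the use of `None` for a score that has not been calculated, and the function name are assumptions for illustration; the specification only states that the label depends on whether the score is high, low, or within a predetermined range.

```python
from typing import Optional


def session_type(score: Optional[float], lower: float, upper: float) -> str:
    """Label a session from its dissatisfaction score (thresholds are hypothetical)."""
    if score is None:
        return "unrated"                    # score not yet calculated
    if score > upper:
        return "low-satisfaction session"   # high dissatisfaction
    if score < lower:
        return "high-satisfaction session"  # low dissatisfaction
    return "neutral session"                # score within [lower, upper]
```

With `lower=0` and `upper=100`, the score “150” from the FIG. 4 example would be labeled a low-satisfaction session, which matches the registered session type.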

指標発話データベース３３には、各種の指標発話が登録される。例えば、図５は、実施形態に係る指標発話データベースに登録される情報の一例を示す図である。図５に示すように、指標発話データベース３３には、「指標発話」、「指標種別」、「指標スコア」および「登録元」といった項目を有する情報が登録される。なお、図５に示す情報以外にも、指標発話データベース３３には、指標発話に関する各種の情報が登録されていてよい。 Various index utterances are registered in the index utterance database 33. For example, FIG. 5 is a diagram illustrating an example of information registered in the index utterance database according to the embodiment. As shown in FIG. 5, information having items such as “index utterance”, “index type”, “index score”, and “registration source” is registered in the index utterance database 33. Note that, in addition to the information illustrated in FIG. 5, various kinds of information regarding the index utterance may be registered in the index utterance database 33.

Here, the “index utterance” is the character string of an utterance, such as an unfavorable index utterance or a favorable index utterance, that serves as an index of whether or not the user is satisfied with the dialogue in the session containing that utterance. The “index type” is information indicating whether the associated index utterance is an unfavorable index utterance or a favorable index utterance. The “index score” is a score indicating the degree to which the user is not satisfied with the dialogue in a session that includes the associated “index utterance”. The “registration source” is information indicating whether the associated index utterance is a preset index utterance, that is, a seed index utterance, or an index utterance automatically extracted from a session by the extraction process; for an index utterance extracted from a session, the session ID of the source session is registered.

For example, in the example shown in FIG. 5, the index utterance “well done”, the index type “favorable”, the index score “−150”, and the registration source “pre-registration” are registered in association with one another in the index utterance database 33. Such information indicates that the character string “well done” is registered as an index utterance, that its type is “favorable”, that is, it is a favorable index utterance, and that its index score is “−150”. It also indicates that the favorable index utterance “well done” is an index utterance registered in advance, that is, a seed index utterance.

エンゲージメントデータベース３４には、利用者による対話サービスの利用態様を示す情報が登録される。より具体的には、エンゲージメントデータベース３４には、利用期間や利用頻度といった情報、すなわち、利用者がどれくらい対話サービスを利用するかを示すエンゲージメント情報が登録される。例えば、図６は、実施形態に係るエンゲージメントデータベースに登録される情報の一例を示す図である。図６に示すように、エンゲージメントデータベース３４には、「利用者ＩＤ」、「エンゲージメント履歴」、および「予測エンゲージメント」といった項目を有する情報が登録される。なお、図６に示す情報以外にも、エンゲージメントに関する情報であれば、任意の情報がエンゲージメントデータベース３４に登録されていてよい。 In the engagement database 34, information indicating the manner in which the user uses the interactive service is registered. More specifically, information such as a use period and a use frequency, that is, engagement information indicating how much the user uses the interactive service is registered in the engagement database 34. For example, FIG. 6 is a diagram illustrating an example of information registered in the engagement database according to the embodiment. As shown in FIG. 6, information having items such as “user ID”, “engagement history”, and “predicted engagement” is registered in the engagement database 34. Any information other than the information shown in FIG. 6 may be registered in the engagement database 34 as long as the information is related to the engagement.

Here, the “engagement history” is information indicating the “history period”, which is the history of the period over which the user indicated by the associated “user ID” has used the interactive service, and the “history frequency”, which is the history of how frequently that user has used the interactive service. The “predicted engagement” is information indicating the “predicted period”, which is the future period of use by the user indicated by the associated “user ID”, and the “predicted frequency”, which is the future frequency of use by that user.

The “history period” is, for example, information indicating the period during which the user continuously used the interactive service within a predetermined past period, for example, the longest period during which the user used the interactive service without a gap of a predetermined length (for example, several days). The “history frequency” is information indicating how frequently the user used the interactive service during the history period, for example, the number of times the user used the interactive service during the history period (for example, the number of sessions) divided by the history period. The “predicted period” is information indicating the period during which the user is presumed to use the interactive service continuously within a predetermined future period. The “predicted frequency” is information indicating the frequency with which the user is presumed to use the interactive service within a predetermined future period.
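The definitions of the history period and history frequency can be made concrete with a small sketch. It assumes that each session is represented by the calendar date on which it took place, that the period is counted in days inclusive of both end dates, and that a gap of more than `max_gap_days` ends a continuous period; these representation choices and all names are assumptions, not details from the specification.

```python
from datetime import date


def history_period_and_frequency(
    session_dates: list[date], max_gap_days: int = 3
) -> tuple[int, float]:
    """Return (longest continuous usage period in days, sessions per day in that period)."""
    days = sorted(session_dates)
    if not days:
        return 0, 0.0
    start, prev, count = days[0], days[0], 1
    best_period, best_count = 1, 1
    for current in days[1:]:
        if (current - prev).days > max_gap_days:
            start, count = current, 0      # gap too long: a new continuous period begins
        count += 1
        prev = current
        period = (current - start).days + 1
        if period > best_period:
            best_period, best_count = period, count
    return best_period, best_count / best_period
```

For sessions on January 1, 2, and 4 followed by sessions on January 20 and 21, the longest continuous period is four days containing three sessions, giving a frequency of 0.75 sessions per day.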

For example, in the example shown in FIG. 6, the user ID “user #1”, the engagement history “history period #1, history frequency #1”, and the predicted engagement “predicted period #1, predicted frequency #1” are registered in association with one another. Such information indicates that the engagement history of the user indicated by the user ID “user #1” is “history period #1” and “history frequency #1”, and that the predicted engagement is “predicted period #1” and “predicted frequency #1”.

なお、図６に示す例では、「履歴期間＃１」、「履歴頻度＃１」、「予測期間＃１」、「予測頻度＃１」といった概念的な値を記載したが、実際には、各種の期間や頻度を示す数値が登録されることとなる。 In the example illustrated in FIG. 6, conceptual values such as “history period # 1”, “history frequency # 1”, “prediction period # 1”, and “prediction frequency # 1” are described. Numerical values indicating various periods and frequencies are registered.

モデルデータベース３５には、各種のモデルが登録される。例えば、図７は、実施形態に係るモデルデータベースに登録される情報の一例を示す図である。図７に示すように、モデルデータベース３５には、「モデルＩＤ」、「モデル種別」、および「モデルデータ」といった項目を有する情報が登録される。なお、図７に示す情報以外にも、モデルデータベース３５には、モデルに関する各種の情報が登録されていてよい。 Various models are registered in the model database 35. For example, FIG. 7 is a diagram illustrating an example of information registered in the model database according to the embodiment. As shown in FIG. 7, information having items such as “model ID”, “model type”, and “model data” is registered in the model database 35. Note that, in addition to the information shown in FIG. 7, various types of information regarding the model may be registered in the model database 35.

ここで、「モデルＩＤ」とは、モデルを識別する識別子である。また、「モデル種別」とは、モデルがどのようなモデルであるかを示す情報である。また、「モデルデータ」とは、モデルのデータである。なお、図７に示す例では、「モデルデータ＃１」といった概念的な値を記載したが、実際には、モデルを構成する各種のパラメータ等の情報が登録される。 Here, the “model ID” is an identifier for identifying a model. The “model type” is information indicating what kind of model the model is. “Model data” is model data. Note that, in the example shown in FIG. 7, a conceptual value such as “model data # 1” is described. However, information such as various parameters constituting the model is actually registered.

For example, in the example shown in FIG. 7, the following models are registered in the model database 35: a “repetition evaluation model” that evaluates a session based on whether or not the session includes repetition, an “apology evaluation model” that evaluates a session based on whether or not the session includes an apology response, an “unfavorable index utterance evaluation model” that evaluates a session based on whether or not the session includes unfavorable index utterances, and a “favorable index utterance evaluation model” that evaluates a session based on whether or not the session includes favorable index utterances.

The model database 35 also holds a “ternary evaluation model” that has learned the features of dialogues with which the user is presumed to be satisfied, dialogues with which the user is presumed not to be satisfied, and dialogues that are neither. The model database 35 also holds an “engagement session evaluation model” that has learned the features of sessions containing utterances of users whose usage of the interactive service satisfies a predetermined condition. The model database 35 further holds a “repeated utterance evaluation model” that has learned the features of pseudo-generated sessions containing identical or similar utterances.

また、モデルデータベース３５には、対話装置２００が使用するモデルであって、発話に対する応答を生成するための「対話モデル」が登録されている。また、モデルデータベース３５には、利用者を疑似的に再現するモデルであって、応答が入力された場合に、その応答に対する発話を出力する「利用者模型モデル」が登録されている。 In the model database 35, a "dialog model" for generating a response to the utterance, which is a model used by the dialog device 200, is registered. In the model database 35, a "user model model" that is a model that simulates a user and that outputs an utterance in response to a response when the response is input is registered.

なお、モデルデータベース３５には、各モデルのモデルデータとして、任意の形式のモデルの情報が登録される。例えば、各モデルは、ＳＶＭ（Support Vector Machine）やＤＮＮ（Deep Neural Network）により実現されてもよい。ここで、ＤＮＮは、ＣＮＮ（Convolutional Neural Network）やＲＮＮ（Recurrent Neural Network）であってもよい。また、ＲＮＮは、ＬＳＴＭ（Long short-term memory）等であってもよい。すなわち、各モデルは、任意の形式のモデルが採用可能である。また、各モデルは、例えば、ＣＮＮとＲＮＮとを組み合わせたモデル等、複数のモデルを組み合わせることで実現されるモデルであってもよい。 In the model database 35, model information of an arbitrary format is registered as model data of each model. For example, each model may be realized by an SVM (Support Vector Machine) or a DNN (Deep Neural Network). Here, the DNN may be a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network). Further, the RNN may be an LSTM (Long Short-Term Memory) or the like. That is, each model can adopt a model of an arbitrary format. In addition, each model may be a model realized by combining a plurality of models, such as a model combining CNN and RNN.

[2-3. Control unit]
Returning to FIG. 2, the description will be continued. The control unit 40 is a controller and is realized by, for example, a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information providing apparatus 10, using a RAM or the like as a work area. Alternatively, the control unit 40 may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図２に示すように、制御部４０は、セッション取得部４１、利用態様特定部４２、指標発話抽出処理部５０、セッション評価処理部６０、エンゲージメント予測処理部７０、および強化学習処理部８０を有する。 As illustrated in FIG. 2, the control unit 40 includes a session acquisition unit 41, a use mode specifying unit 42, an index utterance extraction processing unit 50, a session evaluation processing unit 60, an engagement prediction processing unit 70, and a reinforcement learning processing unit 80. .

The session acquisition unit 41 acquires sessions. For example, upon acquiring the dialogue history from the dialogue device 200, the session acquisition unit 41 divides the acquired dialogue history into sessions. For example, the session acquisition unit 41 selects a user to be processed and extracts the dialogue history containing the utterances of the selected user. Subsequently, the session acquisition unit 41 determines the time between each response and the following utterance in the extracted history. When the time from a response to the next utterance is longer than a predetermined period, the session acquisition unit 41 determines that the response is the last response in a session and, based on this determination, divides the dialogue with the user into sessions. Then, the session acquisition unit 41 assigns a session ID and the user's user ID to each session and registers the sessions in the session log database 31.
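The time-gap rule used by the session acquisition unit 41 can be sketched as follows. The `Turn` record, the speaker labels `"usr"` and `"sys"`, and the 30-minute default gap are assumptions for illustration; the specification only refers to "a predetermined period".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Turn:
    time: datetime
    speaker: str  # "usr" for a user utterance, "sys" for a system response
    text: str


def split_into_sessions(
    turns: list[Turn], max_gap: timedelta = timedelta(minutes=30)
) -> list[list[Turn]]:
    """Split one user's dialogue history wherever a response is followed by a long silence."""
    sessions: list[list[Turn]] = []
    current: list[Turn] = []
    for turn in sorted(turns, key=lambda t: t.time):
        long_silence_after_response = (
            bool(current)
            and current[-1].speaker == "sys"
            and turn.speaker == "usr"
            and turn.time - current[-1].time > max_gap
        )
        if long_silence_after_response:
            sessions.append(current)  # the previous response was the last one in its session
            current = []
        current.append(turn)
    if current:
        sessions.append(current)
    return sessions
```

Each returned list would then receive a session ID and the user ID before being registered in the session log database 31.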

The usage mode specifying unit 42 specifies the usage mode of a user. For example, it selects a user to be processed and specifies how that user used the interactive service within a predetermined period (for example, the past six months). More specifically, it specifies the history period and the history frequency within the predetermined period, and registers them in the engagement database 34 as the engagement history, in association with the user's user ID.

[2-4. Extraction Processing]
The index utterance extraction processing unit 50 executes extraction processing that extracts, from sessions, the index utterances used to evaluate sessions. The concept of the processing executed by the index utterance extraction processing unit 50 and an example of its functional configuration are described below with reference to FIGS. 8 to 11.

[2-4-1. Extraction Processing Based on Co-occurrence]
For example, the index utterance extraction processing unit 50 extracts new index utterances based on co-occurrence with known index utterances. FIG. 8 illustrates the concept of extracting new index utterances from seed index utterances. If a session contains a favorable index utterance, the same session is considered likely to contain other favorable index utterances. Likewise, if a session contains an unfavorable index utterance, the same session is presumed likely to contain other unfavorable index utterances. In other words, within a single session, favorable index utterances tend to co-occur with one another, and unfavorable index utterances tend to co-occur with one another.

Therefore, the index utterance extraction processing unit 50 identifies, in the history of user utterances and responses in the interactive service, preset index utterances, that is, utterances that serve as indices of the user's evaluation of the interactive service. It then extracts new index utterances from the user's utterances based on co-occurrence with the identified index utterances. For example, based on co-occurrence with pre-registered seed unfavorable index utterances and seed favorable index utterances, the index utterance extraction processing unit 50 extracts new index utterances from the utterances and registers them in the index utterance database 33.

For example, when the character string "すごい" ("Wow") is registered in the index utterance database 33 as a seed favorable index utterance, the index utterance extraction processing unit 50 extracts from the session log database 31 every session in which an utterance contains that string. Then, for each utterance contained in the extracted sessions, it counts the number of sessions that contain the utterance. In other words, it counts, for each utterance, the number of co-occurrences with the seed favorable index utterance.

Similarly, when the character string "馬鹿野郎" ("Idiot") is registered in the index utterance database 33 as a seed unfavorable index utterance, the index utterance extraction processing unit 50 extracts from the session log database 31 every session in which an utterance contains that string. Then, for each utterance contained in the extracted sessions, it counts the number of sessions that contain the utterance. In other words, it counts, for each utterance, the number of co-occurrences with the seed unfavorable index utterance.

The index utterance extraction processing unit 50 then calculates, for each utterance, the ratio of the counted co-occurrences. More specifically, it divides the number of co-occurrences with the seed unfavorable index utterance by the number of co-occurrences with the seed favorable index utterance. It extracts the N utterances with the highest ratio (N being an arbitrary number) as new unfavorable index utterances and the N utterances with the lowest ratio as new favorable index utterances, and registers the extracted utterances in the index utterance database 33.
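The counting and ratio steps described above might look like the following minimal sketch. Several points are assumptions for illustration only: each session is modeled as a list of utterance strings, seeds are matched by exact equality rather than by substring, and an additive `smoothing` term is introduced to avoid division by zero (the description does not say how a zero favorable count is handled).

```python
from collections import Counter

def cooccurrence_counts(sessions, seeds, exclude):
    """For each non-seed utterance, count the sessions it shares with a seed."""
    counts = Counter()
    for session in sessions:
        utterances = set(session)
        if utterances & seeds:
            counts.update(utterances - exclude)
    return counts

def extract_by_ratio(sessions, favorable_seeds, unfavorable_seeds, n,
                     smoothing=1.0):
    """Return (new_unfavorable, new_favorable): top-N and bottom-N by ratio."""
    all_seeds = favorable_seeds | unfavorable_seeds
    fav = cooccurrence_counts(sessions, favorable_seeds, all_seeds)
    unfav = cooccurrence_counts(sessions, unfavorable_seeds, all_seeds)
    candidates = set(fav) | set(unfav)
    # ratio = unfavorable co-occurrences / favorable co-occurrences
    # (smoothing is an assumption, not part of the description)
    ratio = {u: (unfav[u] + smoothing) / (fav[u] + smoothing)
             for u in candidates}
    ranked = sorted(candidates, key=lambda u: (ratio[u], u), reverse=True)
    return ranked[:n], ranked[::-1][:n]
```

The returned lists could be fed back in as additional seeds for a further round of extraction.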

Here, the index utterance extraction processing unit 50 may extract new index utterances from the seed index utterances based on spreading activation over a co-occurrence network. For example, it generates a co-occurrence network of the utterances contained in the same sessions as the seed index utterances: each utterance is a node, and utterances contained in the same session are connected by links. Any known technique may be used to generate the co-occurrence network.

Next, the index utterance extraction processing unit 50 identifies the nodes in the co-occurrence network that correspond to the seed index utterances and calculates, for each node, its distance from those seed nodes (for example, the number of hops along links). An utterance whose node is close to a node of a seed unfavorable index utterance and far from a node of a seed favorable index utterance is taken as a new unfavorable index utterance. Conversely, an utterance whose node is close to a node of a seed favorable index utterance and far from a node of a seed unfavorable index utterance is taken as a new favorable index utterance.

For example, the index utterance extraction processing unit 50 identifies nodes whose distance from a seed favorable index utterance is less than a predetermined threshold and whose distance from a seed unfavorable index utterance is equal to or greater than a predetermined threshold, and extracts the corresponding utterances as new favorable index utterances. Likewise, it identifies nodes whose distance from a seed favorable index utterance is equal to or greater than a predetermined threshold and whose distance from a seed unfavorable index utterance is less than a predetermined threshold, and extracts the corresponding utterances as new unfavorable index utterances. It then registers each extracted index utterance in the index utterance database 33.
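The distance-based selection can be illustrated with a breadth-first search over hop counts. The sketch below assumes the co-occurrence network is given as an adjacency dictionary mapping each utterance to the set of utterances it is linked to, and it uses a single `threshold` for both distances; the description allows separate thresholds, so this is a simplification.

```python
from collections import deque

def hop_distances(graph, sources):
    """Multi-source BFS: hops from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def classify_nodes(graph, favorable_seeds, unfavorable_seeds, threshold):
    """Return (new_favorable, new_unfavorable) sets of utterances."""
    inf = float("inf")  # unreachable nodes count as infinitely far
    d_fav = hop_distances(graph, favorable_seeds)
    d_unf = hop_distances(graph, unfavorable_seeds)
    new_fav, new_unf = set(), set()
    for node in graph:
        if node in favorable_seeds or node in unfavorable_seeds:
            continue
        f, u = d_fav.get(node, inf), d_unf.get(node, inf)
        if f < threshold <= u:      # near favorable seed, far from unfavorable
            new_fav.add(node)
        elif u < threshold <= f:    # near unfavorable seed, far from favorable
            new_unf.add(node)
    return new_fav, new_unf
```

Nodes that are close to both kinds of seed, or far from both, are left unclassified in this sketch.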

Note that the index utterance extraction processing unit 50 may treat the index utterances extracted by the above processing as new seed index utterances and execute the extraction processing again with them, thereby extracting still more index utterances.

[2-4-2. Extraction Processing Based on Usage Mode]
As another example, the index utterance extraction processing unit 50 extracts new index utterances based on how users use the interactive service. FIG. 9 illustrates the concept of extracting new index utterances based on the usage mode of the interactive service.

For example, a user who has continued to use the interactive service for a long period, or who uses it frequently, is presumed to be satisfied with the service. Such users are therefore presumed to make many favorable remarks, that is, to utter many favorable index utterances. Conversely, users with a short usage period or a low usage frequency are presumed to make many unfavorable remarks, that is, to utter many unfavorable index utterances.

Therefore, the index utterance extraction processing unit 50 acquires sessions containing utterances of users whose usage mode of the interactive service satisfies a predetermined condition and, based on appearance frequency in the acquired sessions, extracts index utterances that serve as indices of the users' evaluation of the interactive service. For example, it refers to the engagement database 34 and identifies users whose engagement history satisfies a predetermined condition.

For example, the index utterance extraction processing unit 50 identifies users whose history period and history frequency each exceed a predetermined threshold as favorable users, and users whose history period and history frequency are each below a predetermined threshold as unfavorable users. It then identifies the sessions containing utterances of favorable users and the sessions containing utterances of unfavorable users, and extracts the utterances contained in those sessions.
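A minimal sketch of this user classification is shown below. The `Engagement` record, its field names, and its units (days, sessions per week) are hypothetical; the description only states that a history period and a history frequency are compared against predetermined thresholds.

```python
from dataclasses import dataclass

@dataclass
class Engagement:
    user_id: str
    history_days: int         # history period within the window (assumed unit)
    history_frequency: float  # history frequency, e.g. sessions per week

def classify_users(records, min_days, min_frequency):
    """Return (favorable_user_ids, unfavorable_user_ids)."""
    favorable, unfavorable = set(), set()
    for r in records:
        if r.history_days > min_days and r.history_frequency > min_frequency:
            favorable.add(r.user_id)      # both values exceed their thresholds
        elif r.history_days < min_days and r.history_frequency < min_frequency:
            unfavorable.add(r.user_id)    # both values fall below them
    return favorable, unfavorable
```

Users who exceed only one of the two thresholds fall into neither group here, which matches the "each" wording above.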

Then, for each extracted utterance, the index utterance extraction processing unit 50 calculates its appearance frequency in favorable users' sessions and its appearance frequency in unfavorable users' sessions. For example, it calculates the appearance frequency in favorable users' sessions as the number of sessions containing favorable users' utterances in which the utterance appeared, divided by the total number of sessions containing favorable users' utterances. Likewise, it calculates the appearance frequency in unfavorable users' sessions as the number of sessions containing unfavorable users' utterances in which the utterance appeared, divided by the total number of sessions containing unfavorable users' utterances.

The index utterance extraction processing unit 50 also calculates, for each utterance, the ratio of appearance frequencies, that is, the appearance frequency in unfavorable users' sessions divided by the appearance frequency in favorable users' sessions. It then extracts the N utterances with the highest ratio as new unfavorable index utterances and the N utterances with the lowest ratio as new favorable index utterances.
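The appearance-frequency ratio can be sketched as follows. Sessions are modeled as sets of utterance strings, and the small constant `eps` added to the denominator is an assumption introduced to avoid division by zero; the description does not specify how utterances absent from favorable users' sessions are treated.

```python
def appearance_frequency(utterance, sessions):
    """Fraction of the given sessions in which the utterance appears."""
    if not sessions:
        return 0.0
    return sum(utterance in s for s in sessions) / len(sessions)

def rank_by_frequency_ratio(favorable_sessions, unfavorable_sessions, n,
                            eps=1e-6):
    """Return (new_unfavorable, new_favorable): top-N and bottom-N by ratio."""
    candidates = {u for s in favorable_sessions + unfavorable_sessions
                  for u in s}
    # ratio = frequency in unfavorable sessions / frequency in favorable ones
    ratio = {
        u: appearance_frequency(u, unfavorable_sessions)
           / (appearance_frequency(u, favorable_sessions) + eps)
        for u in candidates
    }
    ranked = sorted(candidates, key=lambda u: (ratio[u], u), reverse=True)
    return ranked[:n], ranked[::-1][:n]
```

Because the function only needs two groups of sessions, the same ranking could be applied to any other pair of contrasting session groups.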

[2-4-3. Extraction Processing Based on Evaluation Results]
As a further example, the index utterance extraction processing unit 50 extracts new index utterances based on the evaluation results of sessions. FIG. 10 illustrates the concept of extracting new index utterances based on session evaluation results.

For example, a session may have been given an evaluation result by the evaluation processing described later. A session may also have been given a manual evaluation result, for instance through crowdsourcing. Such an evaluation result for a session can serve as a guide for extracting index utterances from the utterances contained in that session.

For example, a session with which the user is highly satisfied (a high-satisfaction session) is presumed to contain many favorable index utterances, whereas a session with which the user is not satisfied (a low-satisfaction session) is presumed to contain many unfavorable index utterances.

Therefore, from the utterances contained in sessions whose evaluation satisfies a predetermined condition, the index utterance extraction processing unit 50 extracts those whose appearance frequency satisfies a predetermined condition as index utterances, that is, as indices of whether the user's evaluation of the session containing the utterance is favorable, in other words whether the user is satisfied with the session. For example, it extracts the utterances contained in high-satisfaction sessions and in low-satisfaction sessions and, for each extracted utterance, calculates its appearance frequency in low-satisfaction sessions and its appearance frequency in high-satisfaction sessions.

The index utterance extraction processing unit 50 then calculates the ratio of appearance frequencies as the appearance frequency in low-satisfaction sessions divided by the appearance frequency in high-satisfaction sessions. It extracts the N utterances with the highest ratio as new unfavorable index utterances and the N utterances with the lowest ratio as new favorable index utterances.

[2-4-4. Functional Configuration of the Index Utterance Extraction Processing Unit]
Next, an example of the functional configuration of the index utterance extraction processing unit 50 is described with reference to FIG. 11. FIG. 11 is a diagram illustrating an example of the functional configuration of the index utterance extraction processing unit according to the embodiment. As illustrated in FIG. 11, the index utterance extraction processing unit 50 includes an index utterance specifying unit 51, an utterance extraction unit 52, and a network generation unit 53.

The index utterance specifying unit 51 specifies, from the history of user utterances and responses in the interactive service, preset index utterances, that is, utterances that serve as indices of the user's evaluation of the interactive service. For example, it refers to the index utterance database 33 and identifies the index utterances. The index utterance specifying unit 51 may identify index utterances registered in advance or index utterances previously extracted from sessions.

The index utterance specifying unit 51 then refers to the session log database 31 and searches for sessions containing the identified index utterances. That is, from the session history it specifies the sessions in which a user utterance contains a favorable index utterance or an unfavorable index utterance. It then notifies the utterance extraction unit 52 of the specified sessions, that is, the sessions containing index utterances.

When index utterances are to be specified from the co-occurrence network illustrated in FIG. 8, the index utterance specifying unit 51 notifies the network generation unit 53 of the sessions containing index utterances. Then, among the nodes of the co-occurrence network generated by the network generation unit 53, it specifies the nodes corresponding to the identified index utterances, that is, the nodes corresponding to the favorable and unfavorable index utterances registered in the index utterance database 33. It then notifies the utterance extraction unit 52 of the co-occurrence network and the specified nodes.

The utterance extraction unit 52 extracts index utterances from the sessions. For example, as illustrated in FIG. 8, it extracts new index utterances from the utterances contained in the session history based on co-occurrence with the seed index utterances.

For example, the utterance extraction unit 52 extracts from the session log database 31 the sessions specified by the index utterance specifying unit 51, that is, the sessions containing index utterances. For each utterance, it determines the co-occurrence with favorable index utterances and extracts utterances whose co-occurrence exceeds a predetermined threshold as new favorable index utterances. Likewise, for each utterance, it determines the co-occurrence with unfavorable index utterances and extracts utterances whose co-occurrence exceeds a predetermined threshold as new unfavorable index utterances.

For example, for each utterance contained in the extracted sessions, the utterance extraction unit 52 counts the number of appearances in sessions containing a favorable index utterance and the number of appearances in sessions containing an unfavorable index utterance. That is, for each utterance it counts how many times the utterance appears in the same session as a favorable index utterance and how many times it appears in the same session as an unfavorable index utterance. It then extracts, as new index utterances, the utterances for which the ratio of these counts satisfies a predetermined condition. For example, it calculates the ratio as the number of appearances in sessions containing an unfavorable index utterance divided by the number of appearances in sessions containing a favorable index utterance, and takes the N utterances with the highest ratio as new unfavorable index utterances and the N utterances with the lowest ratio as new favorable index utterances.

Alternatively, the utterance extraction unit 52 may simply take, as new favorable index utterances, utterances whose number of appearances in sessions containing a favorable index utterance exceeds a predetermined threshold. Likewise, it may take, as new unfavorable index utterances, utterances whose number of appearances in sessions containing an unfavorable index utterance exceeds a predetermined threshold.

When the utterance extraction unit 52 is notified of a co-occurrence network and the nodes corresponding to index utterances, it extracts, based on the co-occurrence network, utterances whose co-occurrence with the index utterances satisfies a predetermined condition as index utterances. That is, it extracts, as new index utterances, utterances whose distance from an index utterance on the co-occurrence network satisfies a predetermined condition.

For example, the utterance extraction unit 52 may extract, as new favorable index utterances, utterances whose distance from a favorable index utterance on the co-occurrence network is equal to or less than a predetermined threshold. It may also extract, as new unfavorable index utterances, utterances whose distance from an unfavorable index utterance on the co-occurrence network is equal to or less than a predetermined threshold. Alternatively, it may extract, as new favorable index utterances, utterances whose distance from a favorable index utterance is equal to or less than a predetermined threshold and whose distance from an unfavorable index utterance is equal to or greater than a predetermined threshold. Similarly, it may extract, as new unfavorable index utterances, utterances whose distance from an unfavorable index utterance is equal to or less than a predetermined threshold and whose distance from a favorable index utterance is equal to or greater than a predetermined threshold.

For example, for each node in the co-occurrence network, the utterance extraction unit 52 determines its distance from the nodes corresponding to favorable index utterances and its distance from the nodes corresponding to unfavorable index utterances. It takes as new unfavorable index utterances the utterances of nodes whose distance from a node of an unfavorable index utterance is equal to or less than a predetermined threshold and whose distance from a node of a favorable index utterance exceeds a predetermined threshold. It takes as new favorable index utterances the utterances of nodes whose distance from a node of a favorable index utterance is equal to or less than a predetermined threshold and whose distance from a node of an unfavorable index utterance exceeds a predetermined threshold.

When executing the processing illustrated in FIG. 9, the utterance extraction unit 52 acquires sessions containing the utterances and responses of users whose usage mode of the interactive service satisfies a predetermined condition, and extracts index utterances based on appearance frequency in the acquired sessions. For example, it refers to the engagement database 34 and specifies each user's engagement history within a predetermined period. As a more specific example, it specifies the history period and the history frequency over the several months immediately preceding the processing date and time.

The utterance extraction unit 52 then specifies, as favorable users, users whose history period is longer than a predetermined threshold and whose history frequency is higher than a predetermined threshold. It specifies, as unfavorable users, users whose history period is shorter than a predetermined threshold and whose history frequency is lower than a predetermined threshold. The utterance extraction unit 52 may instead specify favorable and unfavorable users based on whether the predicted period and predicted frequency registered as the predicted engagement exceed predetermined thresholds.

Then, for each utterance, the utterance extraction unit 52 calculates the appearance frequency in unfavorable users' sessions divided by the appearance frequency in favorable users' sessions. It extracts the N utterances with the highest calculated value as new unfavorable index utterances and the N utterances with the lowest calculated value as new favorable index utterances.

In other words, the utterance extraction unit 52 extracts, as new favorable index utterances, utterances whose appearance frequency exceeds a predetermined threshold in sessions containing utterances of users whose usage mode satisfies the predetermined condition, that is, favorable users. Likewise, it extracts, as new unfavorable index utterances, utterances whose appearance frequency exceeds a predetermined threshold in sessions containing utterances of users whose usage mode does not satisfy the predetermined condition, that is, unfavorable users.

When executing the processing illustrated in FIG. 10, the utterance extraction unit 52 extracts, as index utterances, utterances whose appearance frequency satisfies a predetermined condition from sessions whose evaluation satisfies a predetermined condition. For example, it refers to the session evaluation database 32 to specify the high-satisfaction sessions and the low-satisfaction sessions, and then refers to the session log database 31 to extract those sessions.

Then, for each utterance, the utterance extraction unit 52 calculates the ratio between its appearance frequency in the low-satisfaction sessions and its appearance frequency in the high-satisfaction sessions, and extracts utterances whose ratio satisfies a predetermined condition as new index utterances. For example, the utterance extraction unit 52 divides the appearance frequency in the low-satisfaction sessions by the appearance frequency in the high-satisfaction sessions, extracts the N utterances with the highest resulting values as new unfavorable index utterances, and extracts the N utterances with the lowest resulting values as new favorable index utterances.
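The ratio-based extraction described above can be sketched as follows. This is a minimal illustration, not the implementation in the specification: the function name `extract_index_utterances`, the representation of a session as a list of utterance strings, and the additive `smoothing` term (used here only to avoid division by zero, which the text does not address) are all assumptions.

```python
from collections import Counter

def extract_index_utterances(low_sessions, high_sessions, n, smoothing=1.0):
    """Rank utterances by (frequency in low-satisfaction sessions) /
    (frequency in high-satisfaction sessions).

    Returns (new unfavorable index utterances, new favorable index utterances).
    `smoothing` is an assumption added to keep the denominator non-zero.
    """
    low = Counter(u for session in low_sessions for u in session)
    high = Counter(u for session in high_sessions for u in session)
    ratio = {
        u: (low[u] + smoothing) / (high[u] + smoothing)
        for u in set(low) | set(high)
    }
    # Highest ratio first; ties are broken alphabetically for determinism.
    ranked = sorted(ratio, key=lambda u: (-ratio[u], u))
    return ranked[:n], ranked[-n:]
```

Calling the function with the sessions of unfavorable and favorable users in place of the low- and high-satisfaction sessions gives the user-based variant of the same ranking.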

The utterance extraction unit 52 registers the index utterances extracted by the various processes described above in the index utterance database 33. Index utterances registered in the index utterance database 33 are used in subsequent processing as seed index utterances or as index utterances for evaluating sessions.

The network generation unit 53 generates, from the history of utterances and responses, a co-occurrence network that connects utterances or responses having co-occurrence. For example, the network generation unit 53 refers to the session log database 31 and identifies each utterance included in the sessions. The network generation unit 53 then sets a node for each utterance and, based on the co-occurrence of the utterances, generates a co-occurrence network in which the corresponding nodes are connected. The network generation unit 53 provides the co-occurrence network to the index utterance identification unit 51.
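A co-occurrence network of this kind might be built as in the sketch below. The text does not define how co-occurrence is measured, so the rule used here (two utterances are connected when they appear together in at least `min_count` sessions) is an assumption, as are the function name `build_cooccurrence_network` and the adjacency-set representation.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_network(sessions, min_count=2):
    """Return {utterance: set of co-occurring utterances}.

    Assumption: an edge is created when two distinct utterances appear
    in the same session at least `min_count` times across the log.
    """
    pair_counts = defaultdict(int)
    for session in sessions:
        # Count each unordered pair of distinct utterances once per session.
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)
```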

[2-5. Evaluation process]
Returning to FIG. 2, the description continues. The session evaluation processing unit 60 executes an evaluation process for evaluating sessions. The concept of the evaluation performed by the session evaluation processing unit 60 and an example of its functional configuration are described below with reference to FIGS. 12 to 21.

[2-5-1. Evaluation process based on index utterances]
For example, the session evaluation processing unit 60 evaluates a session based on the index utterances included in the session. FIG. 12 is a diagram illustrating the concept of a process of evaluating a session based on index utterances. When a session contains many favorable index utterances, the user is presumed to have a good impression of the responses in that session, and the user's satisfaction with the session is therefore presumed to be high. Conversely, when a session contains many unfavorable index utterances, the user is presumed to have a bad impression of the responses in that session, and the user's satisfaction with the session is therefore presumed to be low.

Therefore, the session evaluation processing unit 60 evaluates a session based on the appearance frequency of unfavorable index utterances and the appearance frequency of favorable index utterances in the session. For example, the session evaluation processing unit 60 refers to the session evaluation database 32 and selects a session that has not yet been evaluated as the evaluation target. The session evaluation processing unit 60 may also select an already evaluated session as the evaluation target. For example, it may reselect a session for which a predetermined period has elapsed since its last evaluation.

Subsequently, the session evaluation processing unit 60 extracts the session to be evaluated from the session log database 31 and refers to the index utterance database 33 to identify the index utterances included in the extracted session. The session evaluation processing unit 60 then calculates the ratio between the total appearance frequency of unfavorable index utterances and the total appearance frequency of favorable index utterances. For example, among the utterances included in the session being processed, the session evaluation processing unit 60 identifies all utterances registered as unfavorable index utterances and takes their total number as the total appearance frequency of unfavorable index utterances. Similarly, it identifies all utterances registered as favorable index utterances and takes their total number as the total appearance frequency of favorable index utterances.

The session evaluation processing unit 60 then calculates, as the ratio, the value obtained by dividing the total appearance frequency of unfavorable index utterances by the total appearance frequency of favorable index utterances. The session evaluation processing unit 60 calculates the ratio for the other sessions in the same way, extracts the N sessions with the highest values as low-satisfaction sessions, and extracts the N sessions with the lowest values as high-satisfaction sessions.
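The count-based session ranking above can be illustrated with the following sketch. The names `unfavorable_ratio` and `rank_sessions`, the dictionary of sessions keyed by a session identifier, and the `smoothing` term (added so that sessions without favorable index utterances do not cause division by zero) are assumptions for illustration.

```python
def unfavorable_ratio(session, unfavorable, favorable, smoothing=1.0):
    """(# unfavorable index utterances) / (# favorable index utterances).

    `smoothing` is an assumption; the text does not say how an empty
    denominator is handled.
    """
    n_unfav = sum(1 for u in session if u in unfavorable)
    n_fav = sum(1 for u in session if u in favorable)
    return (n_unfav + smoothing) / (n_fav + smoothing)

def rank_sessions(sessions, unfavorable, favorable, n):
    """Return (low-satisfaction session ids, high-satisfaction session ids)."""
    ranked = sorted(
        sessions,
        key=lambda sid: -unfavorable_ratio(sessions[sid], unfavorable, favorable),
    )
    # Top N ratios are low-satisfaction, bottom N are high-satisfaction.
    return ranked[:n], ranked[-n:]
```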

Here, the degree of user satisfaction suggested by each index utterance is considered to differ. For example, a user who utters "awesome" can be considered more satisfied with the interactive service than a user who utters "nice". Therefore, the session evaluation processing unit 60 may evaluate a session in consideration of the degree of user satisfaction suggested by each index utterance.

For example, each index utterance in the index utterance database 33 is assigned an index score indicating how satisfied, or how dissatisfied, a user who makes that index utterance is with the interactive service (see, for example, FIG. 5). Such an index score may be provided manually, for example through crowdsourcing, or may be calculated automatically from the relationship between the dissatisfaction scores assigned to sessions and the appearance frequency of each index utterance. For example, the information providing apparatus 10 may calculate a higher index score for an unfavorable index utterance that appears more often in sessions with higher dissatisfaction scores, and a higher index score for a favorable index utterance that appears more often in sessions with lower dissatisfaction scores.

When such index scores have been assigned, the session evaluation processing unit 60 executes the following process. First, the session evaluation processing unit 60 extracts the session to be evaluated from the session log database 31 and refers to the index utterance database 33 to identify the index utterances included in the extracted session.

The session evaluation processing unit 60 then calculates the ratio between the sum of the index scores of the unfavorable index utterances and the sum of the index scores of the favorable index utterances included in the session being processed. For example, it calculates, as the ratio, the value obtained by dividing the sum of the index scores of the unfavorable index utterances by the sum of the index scores of the favorable index utterances. The session evaluation processing unit 60 may calculate the ratio from the sums of the absolute values of the index scores. It calculates the ratio for the other sessions in the same way, treats the N sessions with the highest values as low-satisfaction sessions, and treats the N sessions with the lowest values as high-satisfaction sessions. The session evaluation processing unit 60 then registers the evaluation results in the session evaluation database 32.
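A score-weighted version of the ratio might look like the sketch below. The contents of `INDEX_DB` are hypothetical values chosen for illustration (the actual scores of FIG. 5 are not reproduced here), and the signed scores, the `smoothing` term, and the function name `score_ratio` are assumptions. The absolute-value option mentioned above is used so that signed scores can be summed per polarity.

```python
# Hypothetical index utterance database: utterance -> (polarity, index score).
INDEX_DB = {
    "awesome": ("favorable", 0.9),
    "nice": ("favorable", 0.4),
    "makes no sense": ("unfavorable", -0.8),
}

def score_ratio(session, index_db, smoothing=1.0):
    """Sum of |index score| over unfavorable index utterances divided by
    the same sum over favorable index utterances.

    `smoothing` is an assumption that keeps the denominator non-zero.
    """
    unfav = sum(
        abs(index_db[u][1])
        for u in session
        if u in index_db and index_db[u][0] == "unfavorable"
    )
    fav = sum(
        abs(index_db[u][1])
        for u in session
        if u in index_db and index_db[u][0] == "favorable"
    )
    return (unfav + smoothing) / (fav + smoothing)
```

With these hypothetical scores, a session containing "awesome" receives a lower ratio than one containing "nice", which reflects the idea that the former suggests stronger satisfaction.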

[2-5-2. Evaluation process based on index utterances]
In addition, for example, the session evaluation processing unit 60 evaluates a session using a model that evaluates sessions based on the index utterances they contain. FIG. 13 is a diagram illustrating the concept of a process of evaluating a session based on index utterances.

For example, the session evaluation processing unit 60 refers to the session log database 31 and extracts high-satisfaction sessions and sessions other than high-satisfaction sessions as training data. For example, based on the evaluation results of the process illustrated in FIG. 12, the session evaluation processing unit 60 extracts from the session log database 31 the sessions evaluated as high-satisfaction sessions. It also extracts from the session log database 31 the sessions other than those evaluated as high-satisfaction sessions.

The session evaluation processing unit 60 then trains the favorable index utterance evaluation model M4 using the extracted training data. For example, the session evaluation processing unit 60 trains the favorable index utterance evaluation model M4 so that it outputs a first degree of satisfaction higher than a predetermined threshold when the character strings of the utterances and responses in a high-satisfaction session are input, and outputs a first degree of satisfaction lower than the predetermined threshold when the character strings of the utterances and responses in a session other than a high-satisfaction session are input. Alternatively, the session evaluation processing unit 60 may train the favorable index utterance evaluation model M4 so that it outputs a value indicating a high-satisfaction session in the former case and a value indicating that the session is not a high-satisfaction session in the latter case.

High-satisfaction sessions are presumed to contain many favorable index utterances. Therefore, when trained as described above, the favorable index utterance evaluation model M4 outputs a higher first degree of satisfaction the more favorable index utterances the target session contains, and a lower first degree of satisfaction the fewer favorable index utterances it contains.

The session evaluation processing unit 60 also refers to the session log database 31 and extracts low-satisfaction sessions and sessions other than low-satisfaction sessions as training data. It then trains the unfavorable index utterance evaluation model M3 using the extracted training data. For example, the session evaluation processing unit 60 trains the unfavorable index utterance evaluation model M3 so that it outputs a third degree of dissatisfaction higher than a predetermined threshold when the character strings of the utterances and responses in a low-satisfaction session are input, and outputs a third degree of dissatisfaction lower than the predetermined threshold when the character strings of the utterances and responses in a session other than a low-satisfaction session are input. Alternatively, the session evaluation processing unit 60 may train the unfavorable index utterance evaluation model M3 so that it outputs a value indicating a low-satisfaction session in the former case and a value indicating that the session is not a low-satisfaction session in the latter case.

Low-satisfaction sessions are presumed to contain many unfavorable index utterances. Therefore, when trained as described above, the unfavorable index utterance evaluation model M3 outputs a higher third degree of dissatisfaction the more unfavorable index utterances the target session contains, and a lower third degree of dissatisfaction the fewer unfavorable index utterances it contains.

The session evaluation processing unit 60 then evaluates sessions using the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4. For example, the session evaluation processing unit 60 inputs the character strings of the utterances and responses of the target session into the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4, and obtains the third degree of dissatisfaction and the first degree of satisfaction. It then evaluates the session based on the third degree of dissatisfaction and the first degree of satisfaction.

For example, the session evaluation processing unit 60 may evaluate the target session as a low-satisfaction session when the third degree of dissatisfaction exceeds a predetermined threshold, and may evaluate the target session as a high-satisfaction session when the first degree of satisfaction exceeds a predetermined threshold. The session evaluation processing unit 60 may also evaluate the target session as a low-satisfaction session when the third degree of dissatisfaction exceeds a predetermined threshold and the first degree of satisfaction is less than a predetermined threshold, and as a high-satisfaction session when the third degree of dissatisfaction is less than a predetermined threshold and the first degree of satisfaction is equal to or greater than a predetermined threshold. The session evaluation processing unit 60 may also evaluate the target session by the process illustrated in FIG. 1.
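The combined threshold rule in the preceding paragraph can be sketched as a small decision function. The two scores are assumed to be the outputs of models M3 and M4 for one target session; the function name `classify_session`, the default threshold values, and the `"undetermined"` fallback for cases the text does not cover are assumptions.

```python
def classify_session(dissatisfaction, satisfaction,
                     dissat_threshold=0.5, sat_threshold=0.5):
    """Combine the third degree of dissatisfaction (from M3) and the
    first degree of satisfaction (from M4) into a session label."""
    if dissatisfaction > dissat_threshold and satisfaction < sat_threshold:
        return "low"   # low-satisfaction session
    if dissatisfaction < dissat_threshold and satisfaction >= sat_threshold:
        return "high"  # high-satisfaction session
    # Assumption: the text does not specify the remaining cases.
    return "undetermined"
```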

[2-5-3. Evaluation process based on ternary learning]
Here, in dialogues of all kinds and not only in sessions, a dialogue that precedes a favorable index utterance is presumed to have high user satisfaction, and a dialogue that precedes an unfavorable index utterance is presumed to have low user satisfaction. Moreover, the user's impression expressed by an index utterance is considered to concern the dialogue immediately before the index utterance, not the dialogue after it.

Therefore, the session evaluation processing unit 60 may extract the series of dialogue turns corresponding to an index utterance and use the extracted dialogue to train a model that estimates the user's evaluation of an input dialogue. For example, the session evaluation processing unit 60 may extract the utterances and responses exchanged before an index utterance, have the model learn the features of the extracted utterances, and evaluate sessions using the trained model. The session evaluation processing unit 60 may additionally have the model learn the features of neutral dialogues, that is, dialogues of which the user's impression is neither favorable nor unfavorable (the user is neither satisfied nor dissatisfied).

For example, FIG. 14 is a diagram illustrating an example of a process of evaluating a session using a ternary evaluation model that has learned the features of three types of dialogue. The session evaluation processing unit 60 refers to the session log database 31 and identifies sessions that include an unfavorable index utterance. It then extracts from each such session, as an unfavorable dialogue example UD, the unfavorable index utterance together with a predetermined number of utterances and responses made immediately before it. For example, the session evaluation processing unit 60 extracts the two utterances made immediately before the unfavorable index utterance "I don't understand what you mean" and the responses to those utterances, and treats them, together with the unfavorable index utterance, as an unfavorable dialogue example UD.

The session evaluation processing unit 60 also refers to the session log database 31 and identifies sessions that include a favorable index utterance. It then extracts from each such session, as a favorable dialogue example FD, the favorable index utterance together with a predetermined number of utterances and responses made immediately before it. For example, the session evaluation processing unit 60 extracts the two utterances made immediately before the favorable index utterance "You're smart" and the responses to those utterances, and treats them, together with the favorable index utterance, as a favorable dialogue example FD.

In addition, the session evaluation processing unit 60 selects one utterance that is not an index utterance as a neutral utterance and, from a session that includes the selected neutral utterance, extracts a predetermined number of utterances and responses made immediately before the neutral utterance as a neutral dialogue example ND. For example, the session evaluation processing unit 60 extracts the two utterances made immediately before the neutral utterance "Tomorrow's weather" and the responses to those utterances, and treats them, together with the neutral utterance, as a neutral dialogue example ND.
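The extraction of the three kinds of dialogue examples can be sketched as follows. A session is assumed to be an ordered list of `(utterance, response)` turns, and the function name `extract_dialogue_examples` is hypothetical. As a simplification, this sketch labels every non-index utterance as a neutral utterance, whereas the text describes selecting one such utterance.

```python
def extract_dialogue_examples(session, unfavorable, favorable, k=2):
    """Return a list of (label, preceding turns, anchor utterance).

    label is "UD" (unfavorable), "FD" (favorable), or "ND" (neutral).
    `k` is the number of preceding turns kept as context.
    """
    examples = []
    for i, (utterance, _response) in enumerate(session):
        if utterance in unfavorable:
            label = "UD"
        elif utterance in favorable:
            label = "FD"
        else:
            label = "ND"  # simplification: any non-index utterance
        context = session[max(0, i - k):i]
        if context:  # skip anchors with no preceding dialogue
            examples.append((label, context, utterance))
    return examples
```

The resulting labeled examples could then serve as training data for a three-class text classifier such as the ternary evaluation model M5.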

The session evaluation processing unit 60 then has the ternary evaluation model M5 learn the features of the unfavorable dialogue examples UD, the neutral dialogue examples ND, and the favorable dialogue examples FD. For example, the session evaluation processing unit 60 trains the ternary evaluation model M5 so that it outputs a value indicating an unfavorable dialogue example when the character strings of the utterances and responses in an unfavorable dialogue example UD are input, a value indicating a neutral dialogue example when those in a neutral dialogue example ND are input, and a value indicating a favorable dialogue example when those in a favorable dialogue example FD are input. In other words, the session evaluation processing unit 60 trains a model that classifies an input dialogue into one of three classes: the user is satisfied (a favorable dialogue), the user is not satisfied (an unfavorable dialogue), or neither (a neutral dialogue).

The session evaluation processing unit 60 then evaluates the target session using the generated ternary evaluation model M5. For example, the session evaluation processing unit 60 inputs the character strings of the utterances and responses in the target session into the ternary evaluation model M5 and, based on the output of the model, evaluates whether the user is satisfied with the target session (a high-satisfaction session), not satisfied (a low-satisfaction session), or neither (a neutral session).

When extracting each dialogue example, the session evaluation processing unit 60 may extract any number of utterances and responses made immediately before the index utterance or the neutral utterance. Having the model learn the features of the utterances and responses made immediately before an index utterance or a neutral utterance improves the accuracy of the model's evaluation of dialogues and, in turn, of sessions.

[2-5-4. Evaluation process based on engagement]
FIG. 15 is a diagram illustrating an example of a process of evaluating a session based on engagement. A user who is highly satisfied with the dialogue is expected to use the service for a longer period and more frequently, whereas a user with low satisfaction is expected to use it for a shorter period and less frequently. Accordingly, a session that includes utterances of a user with high engagement, such as usage period and usage frequency, is presumed to have high satisfaction, and a session that includes utterances of a user with low engagement is presumed to have low satisfaction. Therefore, the session evaluation processing unit 60 evaluates a session based on the engagement of the user who made the utterances included in the session.

For example, the session evaluation processing unit 60 refers to the engagement database 34 and identifies users whose history period and history frequency within a predetermined period exceed predetermined thresholds. It then extracts sessions that include utterances of the identified users from the session log database 31 and treats the extracted sessions as sessions of high-engagement users.

Conversely, the session evaluation processing unit 60 refers to the engagement database 34 and identifies users whose history period and history frequency within the predetermined period are below the predetermined thresholds. It then extracts sessions that include utterances of the identified users from the session log database 31 and treats the extracted sessions as sessions of low-engagement users.
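The selection of high- and low-engagement sessions could be sketched as below. The data layout (sessions as `(user_id, session)` pairs, engagement records as `(history period, history frequency)` tuples), the function name `label_sessions_by_engagement`, and the choice to skip users who fall on neither side of both thresholds are assumptions, since the text leaves those cases open.

```python
def label_sessions_by_engagement(sessions, engagement, min_period, min_freq):
    """Return (session, label) pairs: 1 = high engagement, 0 = low engagement.

    sessions:   iterable of (user_id, session)
    engagement: {user_id: (history period, history frequency)}
    """
    labeled = []
    for user_id, session in sessions:
        period, freq = engagement[user_id]
        if period > min_period and freq > min_freq:
            labeled.append((session, 1))
        elif period < min_period and freq < min_freq:
            labeled.append((session, 0))
        # Assumption: users meeting only one condition are left unlabeled.
    return labeled
```

The labeled pairs correspond to the training data from which a binary session classifier can be learned.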

The session evaluation processing unit 60 then has the engagement session evaluation model M6 learn the features of the sessions of high-engagement users and the sessions of low-engagement users. For example, the session evaluation processing unit 60 trains the engagement session evaluation model M6 so that it outputs a value indicating high user satisfaction with the session when the utterances and responses in a session of a high-engagement user are input, and a value indicating low user satisfaction with the session when the utterances and responses in a session of a low-engagement user are input.

For example, the session evaluation processing unit 60 trains the engagement session evaluation model M6 so that it outputs an evaluation score higher than a predetermined threshold when the utterances and responses in a session of a high-engagement user are input, and an evaluation score lower than the predetermined threshold when the utterances and responses in a session of a low-engagement user are input. In other words, using training data extracted on the basis of engagement, the session evaluation processing unit 60 trains the engagement session evaluation model M6 to classify an input session as either a session with high user satisfaction or a session with low user satisfaction.

The session evaluation processing unit 60 then evaluates the target session using the engagement session evaluation model M6. For example, the session evaluation processing unit 60 inputs the character strings of the utterances and responses in the target session into the engagement session evaluation model M6 and, based on the evaluation score that the model outputs, evaluates whether the user is satisfied with the target session.

[2-5-5. Evaluation process based on repeated utterances]
FIG. 16 is a diagram illustrating the concept of a process of evaluating a session based on repeated utterances. When the same or similar utterances are repeated within a session, the user's satisfaction with that session is presumed to be low. When the same or similar utterances are not repeated within a session, the user's satisfaction with that session is presumed to be high. Therefore, the session evaluation processing unit 60 may evaluate a session based on the repeated utterances included in the session.

For example, the session evaluation processing unit 60 counts, among the utterances included in the target session, the number of pairs whose notation is similar, as the number of notation-similar pairs. For example, the session evaluation processing unit 60 counts the number of pairs whose similarity, such as the Jaccard coefficient, the Dice coefficient, or the Simpson coefficient, exceeds a predetermined threshold, or the number of pairs whose similarity based on an edit distance such as the Levenshtein distance exceeds a predetermined threshold. The session evaluation processing unit 60 may also use word2vec, GloVe, or the like to calculate the number of pairs whose semantic similarity exceeds a predetermined threshold, as the number of meaning-similar pairs.

Then, when the number of notation-similar pairs or the number of meaning-similar pairs extracted from the target session exceeds a predetermined threshold, the session evaluation processing unit 60 evaluates the target session as a low-satisfaction session, and when the number of notation-similar pairs or the number of meaning-similar pairs extracted from the target session falls below the predetermined threshold, it evaluates the target session as a high-satisfaction session.
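The pair counting and threshold test described above can be sketched as follows. This is only a minimal illustration, not the claimed implementation: the function names, the character-set form of the Jaccard coefficient, the default thresholds, and the treatment of a count equal to the threshold as "high" are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient between the character sets of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty strings count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def count_similar_pairs(utterances, sim_threshold=0.6):
    """Count utterance pairs whose similarity exceeds sim_threshold."""
    return sum(
        1 for a, b in combinations(utterances, 2)
        if jaccard(a, b) > sim_threshold
    )


def evaluate_by_repetition(utterances, sim_threshold=0.6, pair_threshold=1):
    """Return "low" when the pair count exceeds pair_threshold, else "high".

    The source does not say how a count equal to the threshold is handled;
    treating it as "high" is an assumption of this sketch.
    """
    n_pairs = count_similar_pairs(utterances, sim_threshold)
    return "low" if n_pairs > pair_threshold else "high"
```

An edit-distance-based or embedding-based similarity could be substituted for `jaccard` without changing the counting logic.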

The session evaluation processing unit 60 may also train a repetitive utterance evaluation model M7 for evaluating a target session, by causing the model to learn the features of sessions that include repeated utterances. For example, the session evaluation processing unit 60 may train the repetitive utterance evaluation model M7 so that, when a session including repeated utterances (for example, a session in which each pair count exceeds a predetermined threshold) is input, the model outputs an indication that the input session is a low-satisfaction session. The session evaluation processing unit 60 may then evaluate the target session using the repetitive utterance evaluation model M7.

[2-5-6. Automatic generation of repeated utterances]
Here, the session evaluation processing unit 60 may automatically generate sessions that include repeated utterances, and cause the repetitive utterance evaluation model M7 to learn the features of the generated sessions. For example, FIG. 17 is a diagram illustrating the concept of a process of evaluating a session using a model that has learned the features of automatically generated data. For example, the session evaluation processing unit 60 refers to the session log database 31 and extracts one of the utterances as a seed utterance. For example, the session evaluation processing unit 60 extracts an utterance such as "今日の天気は" ("Today's weather is") as the seed utterance.

Subsequently, the session evaluation processing unit 60 generates, from the extracted seed utterance, a plurality of utterances similar to the seed utterance. For example, in order to imitate speech recognition errors, typing errors, and the like, the session evaluation processing unit 60 generates a plurality of utterances in which words or characters included in the character string of a seed utterance such as "今日の天気は" ("Today's weather is") are randomly deleted or replaced with similar-sounding words or character strings. As a more specific example, from the seed utterance "今日の天気は", the session evaluation processing unit 60 generates character strings such as "今日の電気は" ("Today's electricity is", with 天気 "tenki" replaced by the similar-sounding 電気 "denki"), "業の電気は", and "今日を電気は" as duplicate utterances.

The session evaluation processing unit 60 also extracts, from the session log database 31, one or more utterances different from the seed utterance as mixed utterances. For example, the session evaluation processing unit 60 extracts an utterance such as "牡羊座の運勢" ("Aries horoscope") as a mixed utterance. Then, the session evaluation processing unit 60 randomly reorders the seed utterance, the plurality of duplicate utterances generated from the seed utterance, and the mixed utterances, and generates a session consisting of these pseudo utterances as a repetitive session RS. For example, in the example shown in FIG. 17, the session evaluation processing unit 60 generates a repetitive session RS that includes the seed utterance "今日の天気は" ("Today's weather is") as the user utterance "Usr: 今日の天気は", the duplicate utterance "今日の電気は" as the user utterance "Usr: 今日の電気は", the mixed utterance "牡羊座の運勢" as the user utterance "Usr: 牡羊座の運勢", the seed utterance "今日の天気は" again as the user utterance "Usr: 今日の天気は", and the duplicate utterances "業の天気は" and "今日を電気は" as the user utterances "Usr: 業の天気は" and "Usr: 今日を電気は".

That is, the session evaluation processing unit 60 generates a pseudo session that includes repeated utterances. Note that the session evaluation processing unit 60 may extract a plurality of seed utterances and generate duplicate utterances from each seed utterance.

The session evaluation processing unit 60 also refers to the session log database 31 and randomly extracts a plurality of utterances. Then, the session evaluation processing unit 60 generates a session in which the extracted utterances are randomly reordered, as a non-repetitive session NRS. Then, the session evaluation processing unit 60 trains the repetitive utterance evaluation model M7 using the repetitive session RS and the non-repetitive session NRS as learning data. For example, the session evaluation processing unit 60 trains the repetitive utterance evaluation model M7 so that the model outputs a score indicating that the user is not satisfied when the repetitive session RS is input, and outputs a score indicating that the user is satisfied when the non-repetitive session NRS is input.
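The generation of the repetitive session RS and the non-repetitive session NRS can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `substitutions` table of similar-sounding characters, the one-edit-per-duplicate rule, and the 50% choice between substitution and deletion are all hypothetical and are not specified in the source.

```python
import random


def make_duplicates(seed, n, rng, substitutions):
    """Create n noisy copies of a non-empty seed utterance.

    Each copy has one randomly chosen character either replaced with a
    similar-sounding one (when listed in substitutions) or deleted.
    """
    duplicates = []
    for _ in range(n):
        chars = list(seed)
        i = rng.randrange(len(chars))
        if chars[i] in substitutions and rng.random() < 0.5:
            chars[i] = substitutions[chars[i]]
        else:
            del chars[i]
        duplicates.append("".join(chars))
    return duplicates


def make_repetitive_session(seed, mixed, rng, n_dup=3, substitutions=None):
    """Shuffle the seed, its duplicates, and the mixed utterances into an RS."""
    utterances = [seed]
    utterances += make_duplicates(seed, n_dup, rng, substitutions or {})
    utterances += list(mixed)
    rng.shuffle(utterances)
    return utterances


def make_non_repetitive_session(log, k, rng):
    """Draw k distinct log entries in random order as an NRS."""
    return rng.sample(log, k)
```

Passing a seeded `random.Random` instance as `rng` keeps the generated learning data reproducible.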

Then, the session evaluation processing unit 60 evaluates the target session using the repetitive utterance evaluation model M7. For example, the session evaluation processing unit 60 inputs the target session and evaluates whether or not the user is satisfied with the target session on the basis of the score output by the repetitive utterance evaluation model M7.

[2-5-7. Evaluation of repeated utterances using images]
Here, the session evaluation processing unit 60 may evaluate repeated utterances using images. For example, FIG. 18 is a diagram illustrating the concept of evaluating repeated utterances using images. For example, suppose that a square image with the same number of regions in the vertical and horizontal directions is prepared, that each character of the utterance character strings included in a session is associated with both the vertical direction and the horizontal direction of the image, and that each region whose vertical and horizontal positions are associated with the same character is filled in. The resulting image is a distribution image indicating the distribution of identical characters in the utterance character strings. When the utterance character strings contain repetition, such a distribution image includes filled regions that form diagonal lines. That is, when a session includes similar utterances, diagonal lines appear in the distribution image.
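The construction of a distribution image can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and concatenating all utterances of a session into one character string before building the grid is an assumption, since the source does not fix how the characters are laid out along the axes.

```python
def distribution_image(utterances):
    """Build a square 0/1 grid in which cell (i, j) is 1 when the i-th and
    j-th characters of the concatenated utterances are the same."""
    text = "".join(utterances)  # assumption: one axis for the whole session
    n = len(text)
    return [
        [1 if text[i] == text[j] else 0 for j in range(n)]
        for i in range(n)
    ]
```

With the two utterances "abc" and "abc", the grid contains the main diagonal plus a parallel diagonal offset by three cells, which is the kind of line pattern that a repeated utterance produces.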

In recent years, advances in image analysis technology have improved the accuracy with which models using DNNs and the like determine similar images. For this reason, learning the features of the distribution image is considered to capture the features of sessions that include repeated utterances more accurately than simply learning the features of the utterance character strings included in the session.

Therefore, the session evaluation processing unit 60 may cause the repetition evaluation model M1 to learn the features of distribution images, and evaluate the target session using the repetition evaluation model M1. For example, as shown in FIG. 16, the session evaluation processing unit 60 extracts from the session log database 31, as sessions including repeated utterances, sessions in which the number of notation-similar pairs or meaning-similar pairs exceeds a predetermined threshold. Note that the session evaluation processing unit 60 may instead acquire the repetitive session RS generated by the method shown in FIG. 17 as a session including repeated utterances. Then, the session evaluation processing unit 60 generates a distribution image from the utterances of the session including repeated utterances.

On the other hand, as shown in FIG. 16, the session evaluation processing unit 60 extracts from the session log database 31, as sessions not including repeated utterances, sessions in which the number of notation-similar pairs or meaning-similar pairs falls below a predetermined threshold. Note that the session evaluation processing unit 60 may instead acquire the non-repetitive session NRS generated by the method shown in FIG. 17 as a session not including repeated utterances. Then, the session evaluation processing unit 60 generates a distribution image from the utterances of the session not including repeated utterances.

Then, the session evaluation processing unit 60 causes the repetition evaluation model M1 to learn the features of the generated distribution images. For example, the session evaluation processing unit 60 trains the repetition evaluation model M1 so that the model outputs a score indicating that the user is not satisfied when a distribution image generated from the utterances of a session including repeated utterances is input, and outputs a score indicating that the user is satisfied when a distribution image generated from the utterances of a session not including repeated utterances is input.

Subsequently, the session evaluation processing unit 60 generates a distribution image from the utterances included in the target session, and inputs the generated distribution image to the repetition evaluation model M1. Then, the session evaluation processing unit 60 evaluates whether or not the user is satisfied with the target session on the basis of the score output by the repetition evaluation model M1.

[2-5-8. Evaluation based on apology responses]
FIG. 19 is a diagram illustrating the concept of a process of evaluating a session on the basis of apology responses. As shown in FIG. 19, the user's satisfaction with a session that includes an apology response is estimated to be low, and the user's satisfaction with a session that does not include an apology response is estimated to be high. Therefore, the session evaluation processing unit 60 may evaluate a session on the basis of apology responses.

For example, the session evaluation processing unit 60 refers to the session log database 31 and extracts sessions that include a predetermined apology response registered in advance. The session evaluation processing unit 60 also extracts sessions that do not include the predetermined apology response. Then, the session evaluation processing unit 60 causes the apology evaluation model M2 to learn the features of the extracted sessions. For example, the session evaluation processing unit 60 trains the apology evaluation model M2 so that the model outputs a score indicating that the user is not satisfied when a session including an apology response is input, and outputs a score indicating that the user is satisfied when a session not including an apology response is input.
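The preparation of learning data for the apology evaluation model M2 can be sketched as follows. This is a minimal illustration under stated assumptions: a session is represented as a list of `(speaker, text)` turns, the speaker tag `"sys"` marks responses, and an apology response is detected by exact match against the registered set. None of these representations is specified in the source.

```python
def split_by_apology(sessions, apology_responses):
    """Label each session "low" if any system response is a registered
    apology response, and "high" otherwise."""
    labeled = []
    for session in sessions:
        has_apology = any(
            speaker == "sys" and text in apology_responses
            for speaker, text in session
        )
        labeled.append((session, "low" if has_apology else "high"))
    return labeled
```

The resulting `(session, label)` pairs could then serve as learning data for a classifier of any suitable architecture.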

Subsequently, the session evaluation processing unit 60 inputs the target session to the apology evaluation model M2. Then, the session evaluation processing unit 60 evaluates whether or not the user is satisfied with the target session on the basis of the score output by the apology evaluation model M2.

[2-5-9. Combination of evaluation criteria]
Note that the session evaluation processing unit 60 may evaluate the target session by combining the evaluation methods described above. For example, FIG. 20 is a diagram illustrating the concept of a process of evaluating a session by combining a plurality of evaluation methods. As shown in FIG. 20, the session evaluation processing unit 60 generates the repetition evaluation model M1, the apology evaluation model M2, the unfavorable index utterance evaluation model M3, the favorable index utterance evaluation model M4, the ternary evaluation model M5, the engagement session evaluation model M6, and the repetitive utterance evaluation model M7. Then, the session evaluation processing unit 60 inputs the target session, or the distribution image generated from the target session, to each of the models M1 to M7, and evaluates the target session.

For example, the repetition evaluation model M1 outputs an evaluation indicating that user satisfaction is low when the distribution image of the target session is similar to the distribution image of a repetitive session, that is, when utterances are repeated in the target session. The apology evaluation model M2 outputs an evaluation indicating that user satisfaction is low when the target session includes an apology utterance. The unfavorable index utterance evaluation model M3 outputs an evaluation indicating that user satisfaction is low when the target session includes an unfavorable index utterance. The favorable index utterance evaluation model M4 outputs an evaluation indicating that user satisfaction is high when the target session includes a favorable index utterance.

The ternary evaluation model M5 outputs an evaluation indicating that user satisfaction is high when the target session includes a dialogue whose features are similar to those of dialogues with which users are satisfied. The engagement session evaluation model M6 outputs an evaluation indicating that user satisfaction is high when the features of the target session are similar to those of sessions including utterances of users with high engagement, that is, sessions including utterances of users with high satisfaction. The repetitive utterance evaluation model M7 outputs an evaluation indicating that user satisfaction is low when utterances are repeated in the target session.

The session evaluation processing unit 60 evaluates the target session on the basis of a combination of the evaluation results of the models M1 to M7. For example, the session evaluation processing unit 60 may count the number of models that determined that user satisfaction with the target session is high and the number of models that determined that user satisfaction is low, and, when the former is larger than the latter, determine that user satisfaction with the target session is high. That is, the session evaluation processing unit 60 may evaluate the target session by majority vote.
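The majority vote described above can be sketched as follows. This is a minimal illustration: the dictionary representation of the model judgments and the labels `"high"` and `"low"` are assumptions, and returning `"low"` when the high count does not exceed the low count (including ties) is also an assumption, because the source only states the case in which the high count is larger.

```python
def majority_vote(judgments):
    """Combine per-model judgments ("high" or "low") by majority vote."""
    high = sum(1 for j in judgments.values() if j == "high")
    low = sum(1 for j in judgments.values() if j == "low")
    # Assumption: anything other than a strict "high" majority is "low".
    return "high" if high > low else "low"
```

Judgments other than `"high"` or `"low"` (for example, an abstaining model) are simply not counted on either side.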

The session evaluation processing unit 60 may also evaluate the target session using the evaluation results of only some of the models M1 to M7. Further, for example, the session evaluation processing unit 60 may integrate the scores output by the models M1 to M7. For example, the session evaluation processing unit 60 may calculate a score by subtracting the scores of the models M4 to M6 from the sum of the scores of the models M1, M2, M3, and M7, and evaluate the target session on the basis of the calculated score.
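The score integration described above can be sketched as follows. This is a minimal illustration: the function and constant names are hypothetical, and it is assumed that each model's score grows with the strength of the signal that the model detects, so that a larger integrated score suggests lower satisfaction. The source does not state the score direction or the threshold applied to the integrated score.

```python
# Models whose detection suggests low satisfaction and high satisfaction,
# following the grouping given in the text.
LOW_SIGNAL_MODELS = ("M1", "M2", "M3", "M7")
HIGH_SIGNAL_MODELS = ("M4", "M5", "M6")


def integrated_score(scores):
    """Sum of the M1, M2, M3, and M7 scores minus the M4 to M6 scores."""
    low_part = sum(scores[m] for m in LOW_SIGNAL_MODELS)
    high_part = sum(scores[m] for m in HIGH_SIGNAL_MODELS)
    return low_part - high_part
```

For example, with scores of 1 for M1, M3, M4, and M7 and 0 for the other models, the integrated score is 3 - 1 = 2.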

The session evaluation processing unit 60 may further learn from the evaluation results obtained using the models M1 to M7. For example, the session evaluation processing unit 60 may train a model using, as learning data, sessions evaluated as having high user satisfaction and sessions evaluated as having low user satisfaction, and evaluate the target session using that model. The index utterance extraction processing unit 50 may also extract index utterances on the basis of the evaluation results of the session evaluation processing unit 60. For example, the index utterance extraction processing unit 50 may extract favorable index utterances from sessions that the session evaluation processing unit 60 evaluated as having high satisfaction, that is, high-satisfaction sessions, and extract unfavorable index utterances from sessions that the session evaluation processing unit 60 evaluated as having low satisfaction, that is, low-satisfaction sessions.

[2-5-10. Automatic expansion of labeled data]
Here, the session evaluation processing unit 60 may automatically expand the session data used for training the models M1 to M7. For example, FIG. 21 is a diagram illustrating the concept of a process of automatically expanding labeled sessions.

For example, the session evaluation processing unit 60 extracts each utterance included in a session with low user satisfaction, that is, a low-satisfaction session, and searches the session log database 31 for other utterances similar to each extracted utterance. The session evaluation processing unit 60 also extracts each response included in the low-satisfaction session, and searches the session log database 31 for other responses similar to each extracted response.

Then, the session evaluation processing unit 60 may generate a new pseudo session by combining the retrieved utterances and responses, and adopt the generated session as a low-satisfaction session in the learning data used for training the models M1 to M7.
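The expansion of a low-satisfaction session can be sketched as follows. This is a minimal illustration under stated assumptions: a session is represented as a list of `(utterance, response)` pairs, similarity search uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever retrieval the system uses, and a turn with no sufficiently similar candidate keeps its original text. None of these choices is specified in the source.

```python
from difflib import get_close_matches


def most_similar(text, candidates, cutoff=0.6):
    """Return the closest candidate other than text itself, or None."""
    others = [c for c in candidates if c != text]
    matches = get_close_matches(text, others, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def augment_low_session(session, log_utterances, log_responses, cutoff=0.6):
    """Build a pseudo low-satisfaction session from similar log entries."""
    new_session = []
    for utterance, response in session:
        new_u = most_similar(utterance, log_utterances, cutoff)
        new_r = most_similar(response, log_responses, cutoff)
        # Assumption: fall back to the original text when nothing is similar.
        new_session.append((new_u or utterance, new_r or response))
    return new_session
```

The generated session inherits the low-satisfaction label of the session from which it was derived.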

[2-5-11. Functional configuration of session evaluation processing unit]
Next, an example of the functional configuration of the session evaluation processing unit 60 will be described with reference to FIG. 22. FIG. 22 is a diagram illustrating an example of the functional configuration of the session evaluation processing unit according to the embodiment. As shown in FIG. 22, the session evaluation processing unit 60 includes a learning data extraction unit 61, a learning unit 62, an image generation unit 63, an utterance group generation unit 64, and an evaluation unit 65.

The learning data extraction unit 61 extracts sessions to be used as learning data. For example, as shown in FIG. 12, the learning data extraction unit 61 extracts sessions in which the appearance frequency of favorable index utterances or the sum of the index scores of favorable index utterances satisfies a predetermined condition, as high-satisfaction sessions serving as learning data for the favorable index utterance evaluation model M4. The learning data extraction unit 61 also extracts the remaining sessions as sessions other than high-satisfaction sessions, which likewise serve as learning data for the favorable index utterance evaluation model M4.

Further, for example, as shown in FIG. 12, the learning data extraction unit 61 extracts sessions in which the appearance frequency of unfavorable index utterances or the sum of the index scores of unfavorable index utterances satisfies a predetermined condition, as low-satisfaction sessions serving as learning data for the unfavorable index utterance evaluation model M3. The learning data extraction unit 61 also extracts the remaining sessions as sessions other than low-satisfaction sessions, which likewise serve as learning data for the unfavorable index utterance evaluation model M3.

Note that the learning data extraction unit 61 may refer to the session evaluation database 32 to identify high-satisfaction sessions and low-satisfaction sessions, and extract the identified high-satisfaction sessions and low-satisfaction sessions from the session log database 31 as learning data for the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4.

Further, as shown in FIG. 15, the learning data extraction unit 61 may extract sessions serving as learning data for the engagement session evaluation model M6 on the basis of user engagement. For example, the learning data extraction unit 61 refers to the engagement database 34 and identifies the usage mode of each user, that is, the history period and the history frequency. Then, the learning data extraction unit 61 identifies users whose usage mode satisfies a predetermined condition, and extracts sessions including utterances of the identified users from the session log database 31.

For example, the learning data extraction unit 61 identifies users whose history period or history frequency exceeds a predetermined threshold, and extracts sessions including utterances of the identified users from the session log database 31 as sessions of users with high engagement. The learning data extraction unit 61 also identifies users whose history period or history frequency falls below a predetermined threshold, and extracts sessions including utterances of the identified users from the session log database 31 as sessions of users with low engagement. Then, the learning data extraction unit 61 may use the extracted sessions as learning data for the engagement session evaluation model M6.

That is, the learning data extraction unit 61 extracts sessions including utterances of users whose usage mode satisfies the predetermined condition, as learning data having a first attribute indicating that user satisfaction is high. The learning data extraction unit 61 also extracts sessions including utterances of users whose usage mode does not satisfy the predetermined condition, as learning data having a second attribute indicating that user satisfaction is low.
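The engagement-based labeling described above can be sketched as follows. This is a minimal illustration: the function name and the `(history_period, history_frequency)` tuple layout are hypothetical, and the rule that both values must exceed their thresholds for "high", and both must fall below them for "low", is one possible reading of the predetermined condition, which the source leaves open.

```python
def label_users_by_engagement(users, period_threshold, frequency_threshold):
    """Map user IDs to "high" or "low" engagement.

    users maps a user ID to (history_period, history_frequency).
    Users that fit neither rule are left out of the learning data.
    """
    labels = {}
    for user_id, (period, frequency) in users.items():
        if period > period_threshold and frequency > frequency_threshold:
            labels[user_id] = "high"  # first attribute: satisfaction high
        elif period < period_threshold and frequency < frequency_threshold:
            labels[user_id] = "low"  # second attribute: satisfaction low
    return labels
```

Sessions containing utterances of a labeled user would then inherit that user's label when they are extracted from the session log.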

Further, as shown in FIG. 16, the learning data extraction unit 61 extracts from the session log database 31 sessions with a large number of notation-similar pairs or meaning-similar pairs as low-satisfaction sessions, and sessions with a small number of notation-similar pairs or meaning-similar pairs as high-satisfaction sessions. Then, the learning data extraction unit 61 may use the extracted sessions as the source of learning data for the repetition evaluation model M1 and the repetitive utterance evaluation model M7.

Further, as shown in FIG. 19, the learning data extraction unit 61 extracts from the session log database 31 sessions that include an apology response and sessions that do not include an apology response, as learning data for the apology evaluation model M2.

Note that the learning data extraction unit 61 is not limited to the processes described above, and may extract any sessions useful for training the models M1 to M7 as learning data. For example, the learning data extraction unit 61 may extract high-satisfaction sessions and low-satisfaction sessions as learning data for the models M1 to M7 on the basis of the session evaluation results registered in the session evaluation database 32.

学習部６２は、各モデルＭ１〜Ｍ７の学習を行う。例えば、学習部６２は、図１３に示すように、学習データ抽出部６１によって抽出された学習データとして、非好意的指標発話評価モデルＭ３および好意的指標発話評価モデルＭ４の学習を行う。 The learning unit 62 learns each of the models M1 to M7. For example, as illustrated in FIG. 13, the learning unit 62 learns the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4 using the learning data extracted by the learning data extraction unit 61.

例えば、学習部６２は、図１３に示すように、評価が所定の条件を満たすセッションが有する特徴を非好意的指標発話評価モデルＭ３や好意的指標発話評価モデルＭ４に学習させる。例えば、学習部６２は、学習データ抽出部６１により抽出された高満足セッションと、高満足セッション以外のセッションを学習データとして好意的指標発話評価モデルＭ４の学習を行う。すなわち、学習部６２は、入力された対象セッションを高満足セッションとそれ以外のセッションとに分類する好意的指標発話評価モデルＭ４を学習する。 For example, as illustrated in FIG. 13, the learning unit 62 causes the non-favorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4 to learn features of a session whose evaluation satisfies a predetermined condition. For example, the learning unit 62 learns the favorable index utterance evaluation model M4 using the high satisfaction session extracted by the learning data extraction unit 61 and a session other than the high satisfaction session as learning data. That is, the learning unit 62 learns a favorable index utterance evaluation model M4 that classifies the input target session into a high satisfaction session and other sessions.

また、学習部６２は、学習データ抽出部６１により抽出された低満足セッションと、低満足セッション以外のセッションを学習データとして非好意的指標発話評価モデルＭ３の学習を行う。すなわち、学習部６２は、入力された対象セッションを低満足セッションとそれ以外のセッションとに分類する非好意的指標発話評価モデルＭ３を学習する。 The learning unit 62 learns the unfavorable index utterance evaluation model M3 using the low satisfaction session extracted by the learning data extraction unit 61 and sessions other than the low satisfaction session as learning data. That is, the learning unit 62 learns the unfavorable index utterance evaluation model M3 that classifies the input target session into low-satisfaction sessions and other sessions.

また、学習部６２は、発話群生成部６４により生成された対話例を学習データとして、図１４に示すように、３値評価モデルＭ５の学習を行う。すなわち、学習部６２は、指標発話と対応する一連の対話を学習データとして、３値評価モデルＭ５の学習を行う。例えば、発話群生成部６４は、非好意的指標発話の直前に行われた所定の数の発話および応答を非好意的対話例ＵＤとして抽出し、好意的指標発話の直前に行われた所定の数の発話および応答とを好意的対話例ＦＤとして抽出し、中性発話の直前に行われた所定の数の発話および応答とを中性対話例ＮＤとして抽出する。このような場合、学習部６２は、非好意的対話例ＵＤ、好意的対話例ＦＤ、および中性対話例ＮＤを学習データとして、対象セッションの評価を行う３値評価モデルＭ５の学習を行う。すなわち、学習部６２は、対象セッションにおける対話に対する利用者の評価を推定するモデルの学習を行う。 Further, as shown in FIG. 14, the learning unit 62 learns the ternary evaluation model M5 using the dialogue examples generated by the utterance group generation unit 64 as learning data. That is, the learning unit 62 learns the ternary evaluation model M5 using a series of dialogues corresponding to an index utterance as learning data. For example, the utterance group generation unit 64 extracts a predetermined number of utterances and responses made immediately before an unfavorable index utterance as an unfavorable dialogue example UD, extracts a predetermined number of utterances and responses made immediately before a favorable index utterance as a favorable dialogue example FD, and extracts a predetermined number of utterances and responses made immediately before a neutral utterance as a neutral dialogue example ND. In such a case, the learning unit 62 learns the ternary evaluation model M5, which evaluates the target session, using the unfavorable dialogue examples UD, the favorable dialogue examples FD, and the neutral dialogue examples ND as learning data. That is, the learning unit 62 learns a model that estimates the user's evaluation of the dialogue in the target session.

例えば、学習部６２は、非好意的対話例ＵＤを、利用者による評価が非好意的である対話の学習データとする。また、学習部６２は、好意的対話例ＦＤを、利用者による評価が好意的である対話の学習データとする。また、学習部６２は、中性対話例ＮＤを、利用者による評価が中立的である対話の学習データとする。そして、学習部６２は、学習データが有する特徴を３値評価モデルＭ５に学習させることで、利用者による評価が非好意的な対話、好意的な対話、あるいは中立的な対話へと入力された対話を分類するモデルを実現する。 For example, the learning unit 62 treats the unfavorable dialogue example UD as learning data for a dialogue that the user evaluates unfavorably. The learning unit 62 treats the favorable dialogue example FD as learning data for a dialogue that the user evaluates favorably, and treats the neutral dialogue example ND as learning data for a dialogue that the user evaluates neutrally. Then, by causing the ternary evaluation model M5 to learn the features of the learning data, the learning unit 62 realizes a model that classifies an input dialogue as a dialogue that the user evaluates unfavorably, favorably, or neutrally.

また、学習部６２は、図１５に示すように、学習データ抽出部６１により抽出されたエンゲージメントが高いユーザーのセッションやエンゲージメントが低いユーザーのセッションを学習データとして、エンゲージメントセッション評価モデルＭ６の学習を行う。すなわち、学習部６２は、入力された対象セッションを、エンゲージメントが高いユーザーによるセッションと類似するセッション（以下、「第１属性のセッション」と記載する。）、もしくは、エンゲージメントが低いユーザーによるセッションと類似するセッション（以下、「第２属性のセッション」と記載する。）に分類するエンゲージメントセッション評価モデルＭ６の学習を行う。 Further, as shown in FIG. 15, the learning unit 62 learns the engagement session evaluation model M6 using, as learning data, the sessions of users with high engagement and the sessions of users with low engagement extracted by the learning data extraction unit 61. That is, the learning unit 62 learns the engagement session evaluation model M6, which classifies an input target session as either a session similar to sessions by users with high engagement (hereinafter referred to as a “first attribute session”) or a session similar to sessions by users with low engagement (hereinafter referred to as a “second attribute session”).

ここで、エンゲージメントが高いユーザーによるセッションは、利用者による満足度が高いと推定され、エンゲージメントが低いユーザーによるセッションは、利用者による満足度が低いと推定される。このため、第１属性のセッションは、利用者による評価が好意的なセッション、すなわち、高満足セッションであり、第２属性のセッションは、利用者による評価が非好意的なセッション、すなわち、低満足セッションであると言える。このため、学習部６２は、対象コンテンツに対する利用者の印象が好意的であるか非好意的であるかを判定するエンゲージメントセッション評価モデルＭ６を学習することができる。 Here, a session by a user with high engagement is estimated to have a high degree of user satisfaction, and a session by a user with low engagement is estimated to have a low degree of user satisfaction. For this reason, the session of the first attribute is a session in which the evaluation by the user is favorable, that is, a high satisfaction session, and the session of the second attribute is a session in which the evaluation by the user is unfavorable, that is, low satisfaction. It can be said that it is a session. For this reason, the learning unit 62 can learn the engagement session evaluation model M6 that determines whether the user's impression of the target content is favorable or unfavorable.

また、学習部６２は、図１７や図１８に示すように、繰り返し発話を含むセッションを学習データとして、反復発話評価モデルＭ７や繰り返し評価モデルＭ１の学習を行う。例えば、学習部６２は、図１７に示す手法により発話群生成部６４がシード発話や混在発話から生成した反復セッションＲＳと、非反復セッションＮＲＳとを取得すると、取得した反復セッションＲＳと、非反復セッションＮＲＳとを学習データとして、反復発話評価モデルＭ７の学習を行う。すなわち、学習部６２は、対象セッションを、繰り返し発話を含むセッション、または、繰り返し発話を含まないセッションに分類する反復発話評価モデルＭ７の学習を行う。 Further, as illustrated in FIGS. 17 and 18, the learning unit 62 learns the repetitive utterance evaluation model M7 and the repetition evaluation model M1 using sessions including repeated utterances as learning data. For example, when the learning unit 62 acquires the repetitive session RS and the non-repetitive session NRS that the utterance group generation unit 64 generated from the seed utterance and the mixed utterances by the method illustrated in FIG. 17, the learning unit 62 learns the repetitive utterance evaluation model M7 using the acquired repetitive session RS and non-repetitive session NRS as learning data. That is, the learning unit 62 learns the repetitive utterance evaluation model M7, which classifies the target session as a session including repeated utterances or a session not including repeated utterances.

ここで、繰り返し発話を含むセッションは、利用者の印象が非好意的であり、繰り返し発話を含まないセッションは、利用者の印象が好意的であると推定される。このため、学習部６２は、対象コンテンツに対する利用者の印象が好意的であるか非好意的であるかを判定する反復発話評価モデルＭ７を学習することができる。 Here, it is presumed that a session including a repeated utterance has a negative impression of the user, and a session not including a repeated utterance has a favorable impression of the user. For this reason, the learning unit 62 can learn the repetitive utterance evaluation model M7 that determines whether the user's impression of the target content is favorable or unfavorable.

また、学習部６２は、画像生成部６３が繰り返し発話を含むセッションから生成した分布画像、および画像生成部６３が繰り返し発話を含まないセッションから生成した分布画像を学習データとして取得する。このような場合、学習部６２は、これらの分布画像を学習データとして、繰り返し評価モデルＭ１の学習を行う。より具体的には、学習部６２は、入力された画像を、繰り返し発話を含むセッションから生成した分布画像に類似する画像、もしくは、繰り返し発話を含まないセッションから生成した分布画像に類似する画像に分類する繰り返し評価モデルＭ１の学習を行う。 Further, the learning unit 62 acquires, as learning data, distribution images that the image generation unit 63 generated from sessions including repeated utterances and distribution images that the image generation unit 63 generated from sessions not including repeated utterances. In such a case, the learning unit 62 learns the repetition evaluation model M1 using these distribution images as learning data. More specifically, the learning unit 62 learns the repetition evaluation model M1, which classifies an input image as either an image similar to the distribution images generated from sessions including repeated utterances or an image similar to the distribution images generated from sessions not including repeated utterances.

ここで、繰り返し発話を含むセッションの分布画像には、斜線が現れやすく、繰り返し発話を含まないセッションの分布画像には、斜線が表れにくい。さらに、繰り返し発話を含むセッションは、利用者の印象が非好意的であり、繰り返し発話を含まないセッションは、利用者の印象が好意的であると推定される。このため、学習部６２は、対象セッションの発話文字列を縦横に対応付けた分布画像を分類することで、対象セッションに対する利用者の印象が好意的であるか非好意的であるかを判定する繰り返し評価モデルＭ１を学習することができる。 Here, diagonal lines tend to appear in the distribution image of a session that includes repeated utterances and tend not to appear in the distribution image of a session that does not. Further, it is presumed that the user's impression of a session including repeated utterances is unfavorable and that the user's impression of a session not including repeated utterances is favorable. For this reason, the learning unit 62 can learn the repetition evaluation model M1, which determines whether the user's impression of the target session is favorable or unfavorable by classifying a distribution image in which the utterance character strings of the target session are associated with the vertical and horizontal directions.

また、学習部６２は、図１９に示すように、学習データ抽出部６１が抽出したセッションを学習データとして、謝罪評価モデルＭ２の学習を行う。例えば、学習部６２は、入力された対象セッションを、謝罪応答が含まれるセッションと類似するセッション、若しくは、謝罪応答が含まれないセッションと類似するセッションに分類する謝罪評価モデルＭ２の学習を行う。 Further, as shown in FIG. 19, the learning unit 62 learns the apology evaluation model M2 using the session extracted by the learning data extraction unit 61 as learning data. For example, the learning unit 62 learns an apology evaluation model M2 that classifies the input target session into a session similar to a session including an apology response or a session similar to a session not including an apology response.

ここで、謝罪応答を含むセッションは、利用者の印象が非好意的であり、謝罪応答を含まないセッションは、利用者の印象が好意的であると推定される。このため、学習部６２は、謝罪応答に基づいて、対象セッションに対する利用者の印象が好意的であるか非好意的であるかを判定する謝罪評価モデルＭ２を学習することができる。 Here, it is presumed that the session including the apology response has the unfavorable impression of the user, and the session not including the apology response has the favorable impression of the user. For this reason, the learning unit 62 can learn the apology evaluation model M2 that determines whether the user's impression of the target session is favorable or unfavorable based on the apology response.

なお、学習部６２は、図２１に示すように、学習データとして取得した低満足セッションから、新たな低満足セッションを生成し、学習データに加えてもよい。例えば、学習部６２は、学習データとして取得したセッションに含まれる各発話および各応答と類似する発話および応答をそれぞれセッションログデータベース３１から抽出し、抽出した発話および応答を組み合わせることで、新たな学習データを生成してもよい。 The learning unit 62 may generate a new low-satisfaction session from the low-satisfaction sessions acquired as the learning data and add the new low-satisfaction session to the learning data as shown in FIG. For example, the learning unit 62 extracts, from the session log database 31, utterances and responses similar to the utterances and responses included in the session acquired as the learning data, and combines the extracted utterances and responses to generate new learning. Data may be generated.

ここで、学習部６２は、新たな学習データの生成に対し、生成元となった学習データと同じラベルを付与すればよい。例えば、学習部６２は、好意的指標発話が多いセッション、エンゲージメントが高いユーザーのセッション、好意的対話例ＦＤ、非反復セッションＮＲＳ等、利用者の印象が好意的な対話例から、利用者の印象が好意的な対話例を新たに生成し、学習データとしてもよい。また、例えば、学習部６２は、非好意的指標発話が多いセッション、エンゲージメントが低いユーザーのセッション、非好意的対話例ＮＤ、反復セッションＲＳ等、利用者の印象が非好意的な対話例から、利用者の印象が非好意的な対話例を新たに生成し、学習データとしてもよい。 Here, the learning unit 62 may assign to the newly generated learning data the same label as the learning data from which it was generated. For example, the learning unit 62 may newly generate dialogue examples for which the user's impression is favorable from dialogue examples for which the user's impression is favorable, such as sessions with many favorable index utterances, sessions of users with high engagement, favorable dialogue examples FD, and non-repetitive sessions NRS, and use them as learning data. Similarly, the learning unit 62 may newly generate dialogue examples for which the user's impression is unfavorable from dialogue examples for which the user's impression is unfavorable, such as sessions with many unfavorable index utterances, sessions of users with low engagement, unfavorable dialogue examples UD, and repetitive sessions RS, and use them as learning data.

画像生成部６３は、分布画像の生成を行う。例えば、画像生成部６３は、学習データ抽出部６１によって抽出された繰り返し発話を含むセッションや、繰り返し発話を含まないセッションを取得する。また、画像生成部６３は、発話群生成部６４により、図１７に示すように生成された反復セッションＲＳや非反復セッションＮＲＳを取得する。なお、画像生成部６３は、図２１に示すように、取得したセッションから新たなセッションを生成してもよい。 The image generation unit 63 generates distribution images. For example, the image generation unit 63 acquires the sessions including repeated utterances and the sessions not including repeated utterances extracted by the learning data extraction unit 61. The image generation unit 63 also acquires the repetitive session RS and the non-repetitive session NRS generated by the utterance group generation unit 64 as illustrated in FIG. 17. Note that the image generation unit 63 may generate a new session from an acquired session, as shown in FIG. 21.

そして、画像生成部６３は、取得したセッションに含まれる発話の文字が相互に同一か否かを示す分布画像を生成する。すなわち、画像生成部６３は、セッションにおける同一文字の分布を示す画像を生成する。例えば、画像生成部６３は、セッションに含まれる発話の文字列を画像の縦方向の各領域と横方向の各領域とに対応付けた場合に、縦方向と横方向とに同一の文字が対応付けられた領域に対して所定の色彩を付した画像を生成する。 Then, the image generation unit 63 generates a distribution image indicating whether or not the characters of the utterances included in the acquired session are identical to one another. That is, the image generation unit 63 generates an image indicating the distribution of identical characters in the session. For example, when the character string of the utterances included in the session is associated with each region in the vertical direction and each region in the horizontal direction of the image, the image generation unit 63 generates an image in which a predetermined color is applied to each region whose vertically and horizontally associated characters are the same.
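The distribution image described above can be illustrated with a small sketch. The function name `distribution_image`, the plain concatenation of utterances into one character sequence, and the use of 1/0 cells in place of a "predetermined color" are assumptions made for illustration and are not taken from the specification.

```python
from typing import List


def distribution_image(utterances: List[str]) -> List[List[int]]:
    """Return a square 0/1 matrix; cell (i, j) is 1 when characters i and j match."""
    # Assumption: the utterances of a session are concatenated into one sequence
    # that is laid out along both the vertical and the horizontal axis.
    chars = "".join(utterances)
    return [[1 if a == b else 0 for b in chars] for a in chars]


# A session in which the same utterance is repeated.
image = distribution_image(["ab", "ab"])
for row in image:
    print("".join("#" if cell else "." for cell in row))
```

In this toy session the repeated utterance produces marked cells parallel to the main diagonal, which corresponds to the diagonal lines that the description associates with repeated utterances.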

発話群生成部６４は、非好意的対話例ＵＤ、好意的対話例ＦＤおよび中性対話例ＮＤの抽出を行う。すなわち、発話群生成部６４は、対話サービスにおける履歴から指標発話を特定し、特定した指標発話と対応する一連の対話を対話群として抽出する。例えば、発話群生成部６４は、指標発話から所定の数だけ前の応答および発話を学習データとして抽出する。なお、指標発話よりも前の応答や発話から、いくつの応答や発話を対話例として抽出するかについては、任意の設定が可能である。 The utterance group generation unit 64 extracts an unfavorable dialogue example UD, a favorable dialogue example FD, and a neutral dialogue example ND. That is, the utterance group generation unit 64 specifies the index utterance from the history in the dialog service, and extracts a series of dialogs corresponding to the specified index utterance as the dialog group. For example, the utterance group generation unit 64 extracts responses and utterances that are a predetermined number before the index utterance as learning data. Note that it is possible to arbitrarily set how many responses and utterances are extracted as examples of dialogues from responses and utterances before the index utterance.

例えば、発話群生成部６４は、セッションログデータベース３１を参照し、好意的指標発話を含むセッションを特定する。続いて、発話群生成部６４は、好意的指標発話から所定の数だけ前の応答および発話を好意的対話例ＦＤとして抽出する。同様に、発話群生成部６４は、非好意的指標発話を含むセッションを特定し、非好意的指標発話から所定の数だけ前の応答および発話を非好意的対話例ＵＤとして抽出する。また、発話群生成部６４は、中性発話を含むセッションを特定し、中性発話から所定の数だけ前の応答および発話を中性対話例ＮＤとして抽出する。 For example, the utterance group generation unit 64 refers to the session log database 31 and specifies a session including a favorable index utterance. Subsequently, the utterance group generation unit 64 extracts a predetermined number of responses and utterances before the favorable index utterance as a favorable dialogue example FD. Similarly, the utterance group generation unit 64 specifies a session that includes the unfavorable index utterance, and extracts a predetermined number of responses and utterances before the unfavorable index utterance as the unfavorable dialogue example UD. Further, the utterance group generation unit 64 specifies a session including a neutral utterance, and extracts a predetermined number of responses and utterances before the neutral utterance as a neutral dialogue example ND.
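The extraction of dialogue examples from the turns preceding an index utterance might be sketched as follows. The representation of a session as a flat list of alternating utterance and response strings, the exact-match lookup of index utterances, and the names `extract_dialogue_examples` and `window` are all hypothetical choices for this sketch.

```python
from typing import Dict, List, Tuple


def extract_dialogue_examples(
    session: List[str],
    index_utterances: Dict[str, str],
    window: int = 2,
) -> List[Tuple[str, List[str]]]:
    """Collect (label, preceding turns) for every index utterance in the session."""
    examples = []
    for pos, turn in enumerate(session):
        label = index_utterances.get(turn)  # e.g. "UD", "FD" or "ND"
        if label is None:
            continue
        # Take up to `window` turns immediately before the index utterance.
        context = session[max(0, pos - window):pos]
        if context:
            examples.append((label, context))
    return examples


session = [
    "weather tomorrow",
    "Here is the news.",
    "no, the weather",
    "Sorry, I did not understand.",
    "you are useless",
]
index = {"you are useless": "UD", "thank you": "FD"}
examples = extract_dialogue_examples(session, index, window=2)
print(examples)
```

The `window` parameter reflects the statement that the number of preceding responses and utterances to extract can be set arbitrarily.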

また、発話群生成部６４は、反復セッションＲＳ（すなわち、第１発話群）および非反復セッションＮＲＳ（すなわち、第２発話群）を生成する。例えば、発話群生成部６４は、対話サービスにおける履歴からいずれかの発話を選択し、選択した発話と同一又は類似する発話を複数含む発話群を生成する。 Further, the utterance group generation unit 64 generates a repetition session RS (that is, a first utterance group) and a non-repetition session NRS (that is, a second utterance group). For example, the utterance group generation unit 64 selects any utterance from the history in the interactive service, and generates an utterance group including a plurality of utterances that are the same as or similar to the selected utterance.

より具体的な例を挙げると、発話群生成部６４は、シード発話をセッションログデータベース３１から抽出し、シード発話と類似する複製発話を複数生成する。また、発話群生成部６４は、セッションログデータベース３１から混在発話を抽出する。そして、発話群生成部６４は、シード発話と、複数の複製発話と、混在発話とをランダムに並び替えて、疑似的な発話からなるセッションを反復セッションＲＳとして生成する。また、発話群生成部６４は、セッションログデータベース３１を参照し、ランダムに複数の発話を抽出し、抽出した発話をランダムに並び替えたセッションを非反復セッションＮＲＳとして生成する。 As a more specific example, the utterance group generation unit 64 extracts a seed utterance from the session log database 31 and generates a plurality of duplicate utterances similar to the seed utterance. In addition, the utterance group generation unit 64 extracts mixed utterances from the session log database 31. Then, the utterance group generation unit 64 randomly rearranges the seed utterance, the plurality of duplicate utterances, and the mixed utterance, and generates a session including a pseudo utterance as the repetitive session RS. In addition, the utterance group generation unit 64 refers to the session log database 31, extracts a plurality of utterances at random, and generates a session in which the extracted utterances are randomly rearranged as a non-repetitive session NRS.
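A minimal sketch of this pseudo-session generation is shown below. How the duplicate utterances are derived from the seed is not specified in this passage, so the `perturb` helper (dropping one random character) is purely a stand-in, and the function names are hypothetical.

```python
import random
from typing import List


def perturb(utterance: str, rng: random.Random) -> str:
    """Stand-in for generating an utterance similar to the seed (assumption)."""
    if len(utterance) < 2:
        return utterance
    i = rng.randrange(len(utterance))
    return utterance[:i] + utterance[i + 1:]


def make_repetitive_session(seed: str, mixed: List[str], n_copies: int,
                            rng: random.Random) -> List[str]:
    """Shuffle the seed, its near-duplicates and unrelated mixed utterances."""
    session = [seed] + [perturb(seed, rng) for _ in range(n_copies)] + list(mixed)
    rng.shuffle(session)
    return session


def make_non_repetitive_session(log: List[str], size: int,
                                rng: random.Random) -> List[str]:
    """Draw distinct utterances from the log in random order."""
    return rng.sample(log, size)


rng = random.Random(0)
log = ["play some jazz", "what time is it", "set an alarm", "tell me a joke"]
rs = make_repetitive_session("turn on the light", ["what time is it", "set an alarm"], 3, rng)
nrs = make_non_repetitive_session(log, 3, rng)
print(rs)
print(nrs)
```

The resulting lists can be labeled as the repetitive session RS and the non-repetitive session NRS respectively when used as learning data.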

なお、発話群生成部６４は、非好意的対話例ＵＤ、好意的対話例ＦＤもしくは中性対話例ＮＤ（以下、「対話例」と総称する）に含まれる発話および応答と類似する発話および応答を抽出し、抽出した発話および応答を組み合わせることで、新たな対話例を生成してもよい。例えば、発話群生成部６４は、図２１に示す手法により、非好意的対話例ＵＤから新たな非好意的対話例ＵＤを生成してもよく、好意的対話例ＦＤから新たな好意的対話例ＦＤを生成してもよく、中性対話例ＮＤから新たな中性対話例ＮＤを生成してもよい。また、発話群生成部６４は、反復セッションＲＳから新たな反復セッションＲＳを生成してもよく、非反復セッションＮＲＳから新たな非反復セッションＮＲＳを生成してもよい。 Note that the utterance group generation unit 64 may extract utterances and responses similar to the utterances and responses included in an unfavorable dialogue example UD, a favorable dialogue example FD, or a neutral dialogue example ND (hereinafter collectively referred to as “dialogue examples”), and may generate a new dialogue example by combining the extracted utterances and responses. For example, by the method shown in FIG. 21, the utterance group generation unit 64 may generate a new unfavorable dialogue example UD from an unfavorable dialogue example UD, a new favorable dialogue example FD from a favorable dialogue example FD, or a new neutral dialogue example ND from a neutral dialogue example ND. The utterance group generation unit 64 may also generate a new repetitive session RS from a repetitive session RS, or a new non-repetitive session NRS from a non-repetitive session NRS.

評価部６５は、セッションの評価を行う。例えば、評価部６５は、セッションログデータベース３１から評価対象となる対象セッションを選択する。そして、評価部６５は、指標発話データベース３３を参照し、指標発話抽出処理部５０により抽出された指標発話やシードとなる指標発話の出現頻度に基づいて、対象セッションを評価する。 The evaluation unit 65 evaluates the session. For example, the evaluation unit 65 selects a target session to be evaluated from the session log database 31. Then, the evaluation unit 65 refers to the index utterance database 33 and evaluates the target session based on the appearance frequency of the index utterance extracted by the index utterance extraction processing unit 50 or the index utterance serving as a seed.

例えば、評価部６５は、非好意的指標発話や好意的指標発話が含まれているか否かに基づいて、対象セッションを評価する。例えば、評価部６５は、図１２に示すように、対象セッションについて、非好意的指標発話の出現頻度と好意的指標発話の出現頻度との比に基づいて、対象セッションが低満足セッションであるか高満足セッションであるかを評価する。また、評価部６５は、非好意的指標発話の指標スコアの合計と、好意的指標発話の指標スコアの合計との比に基づいて、対象セッションが低満足セッションであるか高満足セッションであるかを評価してもよい。 For example, the evaluation unit 65 evaluates the target session based on whether or not an unfavorable index utterance or a favorable index utterance is included. For example, as shown in FIG. 12, the evaluation unit 65 determines whether the target session is a low-satisfaction session based on the ratio between the frequency of appearance of the unfavorable index utterance and the frequency of appearance of the favorable index utterance for the target session. Evaluate if it is a high satisfaction session. In addition, the evaluation unit 65 determines whether the target session is a low-satisfaction session or a high-satisfaction session based on the ratio of the total of the index scores of the unfavorable index utterances to the total of the index scores of the favorable index utterances. May be evaluated.
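The frequency-ratio evaluation can be sketched as follows. The add-one smoothing, the single `threshold` on the ratio, and the "undetermined" outcome are assumptions introduced so that the sketch is well defined; the description only states that the ratio of appearance frequencies is used.

```python
from typing import Iterable, Set


def evaluate_by_index_ratio(session: Iterable[str], unfavorable: Set[str],
                            favorable: Set[str], threshold: float = 1.0) -> str:
    """Label a session "low", "high" or "undetermined" from index utterance counts."""
    turns = list(session)
    n_unfav = sum(1 for u in turns if u in unfavorable)
    n_fav = sum(1 for u in turns if u in favorable)
    if n_unfav == 0 and n_fav == 0:
        return "undetermined"  # no index utterance appears at all
    # Add-one smoothing avoids division by zero (an assumption of this sketch).
    ratio = (n_unfav + 1) / (n_fav + 1)
    if ratio > threshold:
        return "low"
    if ratio < threshold:
        return "high"
    return "undetermined"


unfavorable = {"you are useless", "that is wrong"}
favorable = {"thank you", "great"}
print(evaluate_by_index_ratio(["hi", "that is wrong", "you are useless"], unfavorable, favorable))
print(evaluate_by_index_ratio(["hi", "thank you"], unfavorable, favorable))
```

The same structure applies when index scores are summed instead of counted: replace the two counts with the sums of the scores of the matching index utterances.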

ここで、セッションの評価に用いられる指標発話には、指標発話抽出処理部５０により抽出された指標発話、すなわち、シードとなる指標発話との共起性により抽出された指標発話が含まれる。このため、評価部６５は、例えば、非好意的指標発話との共起性が所定の条件を満たす発話の出現頻度や、好意的指標発話との共起性が所定の条件を満たす発話の出現頻度に基づいて、対象セッションの評価を行うこととなる。 Here, the index utterance used for the evaluation of the session includes the index utterance extracted by the index utterance extraction processing unit 50, that is, the index utterance extracted by co-occurrence with the seed utterance. For this reason, the evaluation unit 65 determines, for example, the appearance frequency of the utterance whose co-occurrence with the unfavorable index utterance satisfies a predetermined condition, and the appearance frequency of the utterance whose co-occurrence with the favorable index utterance satisfies a predetermined condition. The target session will be evaluated based on the frequency.

また、セッションの評価に用いられる指標発話には、指標発話抽出処理部５０により利用者のエンゲージメントに基づいて抽出された指標発話が含まれる。すなわち、指標発話には、利用態様が所定の条件を満たす利用者のセッションでの出現頻度や、利用態様が所定の条件を満たさない利用者のセッションでの出現頻度に基づいて抽出された指標発話が含まれる。このため、評価部６５は、利用態様が所定の条件を満たす利用者の発話と、利用態様が所定の条件を満たさない利用者の発話とに基づいて、対象セッションを評価することとなる。 The index utterances used for evaluating a session also include index utterances extracted by the index utterance extraction processing unit 50 based on user engagement. That is, the index utterances include index utterances extracted based on their appearance frequency in the sessions of users whose use mode satisfies a predetermined condition and their appearance frequency in the sessions of users whose use mode does not satisfy the predetermined condition. Therefore, the evaluation unit 65 evaluates the target session based on the utterances of users whose use mode satisfies the predetermined condition and the utterances of users whose use mode does not satisfy the predetermined condition.

また、セッションの評価に用いられる指標発話には、低満足セッションにおける出現頻度と高満足セッションにおける出現頻度との比に基づいて抽出された指標発話が含まれる。このため、評価部６５は、低満足セッションにおける出現頻度が所定の条件を満たす発話と、高満足セッションにおける出現頻度が所定の条件を満たす発話とに基づいて、対象セッションを評価することとなる。 The index utterance used for evaluating the session includes an index utterance extracted based on the ratio of the frequency of appearance in a low-satisfaction session to the frequency of appearance in a high-satisfaction session. Therefore, the evaluation unit 65 evaluates the target session based on the utterance whose appearance frequency in the low satisfaction session satisfies the predetermined condition and the utterance whose appearance frequency in the high satisfaction session satisfies the predetermined condition.

また、評価部６５は、図１３に示すように、高満足セッションが有する特徴を学習した好意的指標発話評価モデルＭ４を用いて、対象セッションを評価してもよい。また、評価部６５は、低満足セッションが有する特徴を学習した非好意的指標発話評価モデルＭ３を用いて、対象セッションを評価してもよい。例えば、評価部６５は、非好意的指標発話評価モデルＭ３や好意的指標発話評価モデルＭ４をモデルデータベース３５から読出し、非好意的指標発話評価モデルＭ３や好意的指標発話評価モデルＭ４に対象セッションを入力する。そして、評価部６５は、非好意的指標発話評価モデルＭ３や好意的指標発話評価モデルＭ４が出力するスコアに基づいて、対象セッションの評価を行ってもよい。例えば、評価部６５は、非好意的指標発話評価モデルＭ３により、非好意的指標発話が含まれるセッションに類似すると分類された場合は、対象セッションを低満足セッションと評価し、好意的指標発話評価モデルＭ４により、好意的指標発話が含まれるセッションに類似すると分類された場合は、対象セッションを高満足セッションと評価してもよい。 In addition, as shown in FIG. 13, the evaluation unit 65 may evaluate the target session using the favorable index utterance evaluation model M4, which has learned the features of high-satisfaction sessions. The evaluation unit 65 may also evaluate the target session using the unfavorable index utterance evaluation model M3, which has learned the features of low-satisfaction sessions. For example, the evaluation unit 65 reads the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4 from the model database 35 and inputs the target session into the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4. The evaluation unit 65 may then evaluate the target session based on the scores output by the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4. For example, the evaluation unit 65 may evaluate the target session as a low-satisfaction session when the unfavorable index utterance evaluation model M3 classifies it as similar to sessions including unfavorable index utterances, and as a high-satisfaction session when the favorable index utterance evaluation model M4 classifies it as similar to sessions including favorable index utterances.

また、評価部６５は、図１４に示すように、非好意的対話例ＵＤ、中性対話例ＮＤ、および好意的対話例ＦＤの特徴を学習した３値評価モデルＭ５を用いて、対象セッションの評価を行ってもよい。例えば、評価部６５は、モデルデータベース３５から読み出した３値評価モデルＭ５に対象セッションを入力し、３値評価モデルＭ５による分類結果に基づいて、対象セッションを評価してもよい。例えば、評価部６５は、３値評価モデルＭ５により、対象セッションが非好意的対話例ＵＤに類似すると分類された場合は、対象セッションを低満足セッションと評価し、好意的対話例ＦＤに類似すると分類された場合は、対象セッションを高満足セッションと評価してもよい。 In addition, as shown in FIG. 14, the evaluation unit 65 may evaluate the target session using the ternary evaluation model M5, which has learned the features of the unfavorable dialogue examples UD, the neutral dialogue examples ND, and the favorable dialogue examples FD. For example, the evaluation unit 65 may input the target session into the ternary evaluation model M5 read from the model database 35 and evaluate the target session based on the classification result of the ternary evaluation model M5. For example, the evaluation unit 65 may evaluate the target session as a low-satisfaction session when the ternary evaluation model M5 classifies it as similar to the unfavorable dialogue examples UD, and as a high-satisfaction session when the model classifies it as similar to the favorable dialogue examples FD.

また、評価部６５は、図１５に示すように、利用態様が所定の条件を満たす利用者の発話を含むセッションとの類似性に基づいて、対象セッションを評価してもよい。例えば、評価部６５は、エンゲージメントが高いユーザのセッションと、エンゲージメントが低いユーザーのセッションとの特徴を学習したエンゲージメントセッション評価モデルＭ６に対象セッションを入力し、エンゲージメントセッション評価モデルＭ６の分類結果に基づいて、対象セッションを評価してもよい。例えば、評価部６５は、エンゲージメントセッション評価モデルＭ６により、対象セッションがエンゲージメントが高いユーザーのセッションに類似すると分類された場合は、対象セッションを高満足セッションと評価してもよい。 In addition, as shown in FIG. 15, the evaluation unit 65 may evaluate the target session based on its similarity to sessions including utterances of users whose use mode satisfies the predetermined condition. For example, the evaluation unit 65 may input the target session into the engagement session evaluation model M6, which has learned the features of the sessions of users with high engagement and the sessions of users with low engagement, and evaluate the target session based on the classification result of the engagement session evaluation model M6. For example, when the engagement session evaluation model M6 classifies the target session as similar to the sessions of users with high engagement, the evaluation unit 65 may evaluate the target session as a high-satisfaction session.

また、評価部６５は、図１６〜１８に示すように、繰り返し発話に基づいて、対象セッションを評価してもよい。例えば、評価部６５は、対象セッションに同一又は類似の文字列が繰り返し出現している場合は、対象セッションを低満足セッションと評価し、対象セッションに同一又は類似の文字列が繰り返し出現していない場合は、対象セッションを高満足セッションと評価してもよい。 In addition, as shown in FIGS. 16 to 18, the evaluation unit 65 may evaluate the target session based on repeated utterances. For example, the evaluation unit 65 may evaluate the target session as a low-satisfaction session when the same or similar character strings appear repeatedly in the target session, and as a high-satisfaction session when they do not.

また、評価部６５は、文字列の類似度や意味的な類似度が所定の閾値を超える発話の対の数に基づいて、対象セッションを評価してもよい。例えば、評価部６５は、図１６に示すように、対象セッションに含まれる表記類似ペア数や意味類似ペア数の数が所定の閾値を超える場合は、対象セッションを低満足セッションと評価し、表記類似ペア数や意味類似ペア数の数が所定の閾値を下回る場合は、対象セッションを高満足セッションと評価してもよい。 Further, the evaluation unit 65 may evaluate the target session based on the number of utterance pairs whose character-string similarity or semantic similarity exceeds a predetermined threshold. For example, as shown in FIG. 16, the evaluation unit 65 may evaluate the target session as a low-satisfaction session when the number of notation-similar pairs or meaning-similar pairs included in the target session exceeds a predetermined threshold, and as a high-satisfaction session when that number falls below the predetermined threshold.
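A minimal sketch of the pair-count evaluation, covering only surface (notation) similarity, is given below. The use of `difflib.SequenceMatcher` as the similarity measure, the two threshold values, and the function names are assumptions; semantic similarity would require some sentence-embedding model and is not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def count_surface_similar_pairs(utterances: List[str], sim_threshold: float = 0.8) -> int:
    """Count utterance pairs whose character-level similarity reaches the threshold."""
    return sum(
        1
        for a, b in combinations(utterances, 2)
        if SequenceMatcher(None, a, b).ratio() >= sim_threshold
    )


def evaluate_by_repetition(utterances: List[str], sim_threshold: float = 0.8,
                           pair_threshold: int = 1) -> str:
    """Label a session from the number of similar pairs it contains."""
    pairs = count_surface_similar_pairs(utterances, sim_threshold)
    if pairs > pair_threshold:
        return "low"
    if pairs < pair_threshold:
        return "high"
    return "undetermined"  # the description does not cover the equal case


repeated = ["turn on the light", "turn on the light", "turn on the light", "what time is it"]
varied = ["hello", "what time is it"]
print(count_surface_similar_pairs(repeated), evaluate_by_repetition(repeated))
print(count_surface_similar_pairs(varied), evaluate_by_repetition(varied))
```

Comparing every pair is quadratic in the number of utterances, which is usually acceptable for the short sessions assumed here.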

また、評価部６５は、図１７に示すように、繰り返し発話を含むセッションの特徴を学習した反復発話評価モデルＭ７を用いて、対象セッションを評価してもよい。例えば、評価部６５は、反復発話評価モデルＭ７に対象セッションを入力し、反復発話評価モデルＭ７が、対象セッションを反復セッションＲＳに類似するセッションに分類した場合は、対象セッションを低満足セッションと評価し、対象セッションを非反復セッションＮＲＳに類似するセッションに分類した場合は、対象セッションを高満足セッションと評価してもよい。 In addition, as shown in FIG. 17, the evaluation unit 65 may evaluate the target session using the repetitive utterance evaluation model M7, which has learned the features of sessions including repeated utterances. For example, the evaluation unit 65 may input the target session into the repetitive utterance evaluation model M7, evaluate the target session as a low-satisfaction session when the model classifies it as similar to the repetitive session RS, and evaluate it as a high-satisfaction session when the model classifies it as similar to the non-repetitive session NRS.

また、評価部６５は、図１８に示すように、繰り返し発話を含むセッションにおける発話から生成した分布画像の特徴を学習することで、繰り返し発話を含むセッションの特徴を学習した繰り返し評価モデルＭ１を用いて、対象セッションを評価してもよい。例えば、画像生成部６３は、対象セッションに含まれる発話の分布画像を生成する。そして、画像生成部６３は、分布画像に基づいて、対象セッションの評価を行う。 In addition, as shown in FIG. 18, the evaluation unit 65 may evaluate the target session using the repetition evaluation model M1, which has learned the features of sessions including repeated utterances by learning the features of distribution images generated from the utterances in such sessions. For example, the image generation unit 63 generates a distribution image of the utterances included in the target session. Then, the image generation unit 63 evaluates the target session based on the distribution image.

例えば、評価部６５は、対象画像を繰り返し評価モデルＭ１に入力し、繰り返し評価モデルＭ１が、対象セッションの分布画像を、繰り返し発話を含むセッションの分布画像と類似する画像に分類した場合は、対象セッションを低満足セッションと評価してもよい。また、例えば、評価部６５は、対象画像を繰り返し評価モデルＭ１に入力し、繰り返し評価モデルＭ１が、対象セッションの分布画像を、繰り返し発話を含まないセッションの分布画像と類似する画像に分類した場合は、対象セッションを高満足セッションと評価してもよい。 For example, the evaluation unit 65 may input the target image into the repetition evaluation model M1 and, when the repetition evaluation model M1 classifies the distribution image of the target session as similar to the distribution images of sessions including repeated utterances, evaluate the target session as a low-satisfaction session. Conversely, when the repetition evaluation model M1 classifies the distribution image of the target session as similar to the distribution images of sessions not including repeated utterances, the evaluation unit 65 may evaluate the target session as a high-satisfaction session.

Note that the evaluation unit 65 may use a known image analysis technique to determine whether the distribution image of the target session contains a diagonal line and, when it determines that a diagonal line is contained, may evaluate the target session as a low-satisfaction session. That is, the evaluation unit 65 may determine from the distribution image of the target session whether the target session contains repeated utterances, without using the repetition evaluation model M1, and may evaluate the target session based on the determination result.

The evaluation unit 65 may also evaluate the target session based on whether it contains apology responses. For example, when the number of apology responses included in the target session exceeds a predetermined threshold, the evaluation unit 65 may evaluate the target session as a low-satisfaction session. In addition, as shown in FIG. 19, the evaluation unit 65 may input the target session into the apology evaluation model M2 and, when the apology evaluation model M2 classifies the target session as a session similar to sessions that contain apology responses, may evaluate the target session as a low-satisfaction session. When the apology evaluation model M2 classifies the target session as a session similar to sessions that do not contain apology responses, the evaluation unit 65 may evaluate the target session as a high-satisfaction session.
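The threshold rule on apology responses described above can be sketched as follows. This is a minimal illustration only: the marker phrases in `APOLOGY_MARKERS`, the substring matching, and the function names are assumptions introduced here, since the text specifies only that the number of apology responses is compared against a predetermined threshold. The text does not say how a session at or below the threshold is treated, so the sketch returns "undetermined" in that case.

```python
from typing import Iterable, Sequence

# Hypothetical apology markers; the description only refers to
# "apology responses" without listing concrete phrases.
APOLOGY_MARKERS: Sequence[str] = ("sorry", "i apologize")


def count_apology_responses(
    responses: Iterable[str], markers: Sequence[str] = APOLOGY_MARKERS
) -> int:
    """Count system responses that contain at least one apology marker."""
    return sum(
        1
        for response in responses
        if any(marker in response.lower() for marker in markers)
    )


def evaluate_by_apologies(responses: Iterable[str], threshold: int) -> str:
    """Label the session low-satisfaction when apologies exceed the threshold."""
    if count_apology_responses(responses) > threshold:
        return "low_satisfaction"
    # The behaviour at or below the threshold is not stated in the text.
    return "undetermined"
```

For instance, a session whose responses contain two apologies would be labelled low-satisfaction with a threshold of 1 and left undetermined with a threshold of 2.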

The evaluation unit 65 may also evaluate the target session using a plurality of models. For example, as shown in FIG. 20, the evaluation unit 65 may evaluate the target session using the evaluation results of a plurality of models M1 to M7, each of which evaluates a session based on a different type of feature among the features that serve as indicators of the user's evaluation of the session.

When a session is evaluated using the evaluation results of the models M1 to M7 in this way, any combination of models may be adopted. For example, as shown in FIG. 1, the evaluation unit 65 inputs the target session (and the target image of the target session) into each of the repetition evaluation model M1, the apology evaluation model M2, the unfavorable index utterance evaluation model M3, and the favorable index utterance evaluation model M4, and calculates the value obtained by subtracting the score output by the favorable index utterance evaluation model M4 from the sum of the scores output by the models M1 to M3. The evaluation unit 65 may then evaluate a predetermined number of target sessions, taken in descending order of the calculated value, as low-satisfaction sessions, and a predetermined number of target sessions, taken in ascending order of the calculated value, as high-satisfaction sessions.

Note that when the score output by the favorable index utterance evaluation model M4 is subtracted from the sum of the scores output by the models M1 to M3, a negative value is obtained when the user is satisfied with the target session, and a positive value is obtained when the user is not satisfied with the target session. The evaluation unit 65 may therefore evaluate the target session according to whether the calculated value is positive or negative.
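The score combination described in the two paragraphs above can be sketched as follows, assuming each model's output is already available as a number keyed by its label. The function names, the dictionary layout, and the handling of a value of exactly zero are assumptions made for illustration; the text covers only the positive and negative cases.

```python
from typing import List, Mapping, Tuple


def combined_value(scores: Mapping[str, float]) -> float:
    """Sum of the M1 to M3 scores minus the M4 score."""
    return scores["M1"] + scores["M2"] + scores["M3"] - scores["M4"]


def evaluate_by_sign(scores: Mapping[str, float]) -> str:
    """Positive value: low satisfaction. Negative value: high satisfaction."""
    value = combined_value(scores)
    if value > 0:
        return "low_satisfaction"
    if value < 0:
        return "high_satisfaction"
    return "undetermined"  # a value of zero is not addressed in the text


def rank_sessions(
    values: Mapping[str, float], n: int
) -> Tuple[List[str], List[str]]:
    """Return (n highest-valued session ids, n lowest-valued session ids).

    The two lists overlap when 2 * n exceeds the number of sessions.
    """
    ordered = sorted(values, key=values.get, reverse=True)
    low_satisfaction = ordered[:n]
    high_satisfaction = ordered[max(0, len(ordered) - n):]
    return low_satisfaction, high_satisfaction
```

`evaluate_by_sign` corresponds to the sign-based evaluation, and `rank_sessions` to the variant that labels a predetermined number of sessions from each end of the ranking.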

In addition, the evaluation unit 65 may use at least the evaluation result of the repetition evaluation model M1 or the repetitive utterance evaluation model M7, each of which evaluates a session based on whether the utterances in the session repeatedly include the same or similar utterances. The evaluation unit 65 may also use at least the evaluation result of the apology evaluation model M2, which evaluates a session based on whether the responses in the session include a predetermined response indicating an apology.

The evaluation unit 65 may also use at least the evaluation result of the unfavorable index utterance evaluation model M3, which evaluates a session based on whether the utterances in the session include an unfavorable index utterance. Likewise, the evaluation unit 65 may use at least the evaluation result of the favorable index utterance evaluation model M4, which evaluates a session based on whether the utterances in the session include a favorable index utterance.

Here, the unfavorable index utterance evaluation model M3 and the favorable index utterance evaluation model M4 require index utterances to be set or extracted in advance when the sessions that serve as training data are extracted. Setting or extracting index utterances in advance, however, takes effort. The evaluation unit 65 may therefore evaluate the target content based only on the evaluation results of the repetition evaluation model M1 and the apology evaluation model M2. For example, the evaluation unit 65 may calculate the sum of the score of the repetition evaluation model M1 and the score of the apology evaluation model M2, evaluate a predetermined number of target sessions in descending order of the sum as low-satisfaction sessions, and evaluate a predetermined number of target sessions in ascending order of the sum as high-satisfaction sessions.

When the target session is evaluated using the models M1 to M7, the evaluation unit 65 may set a predetermined weight for the evaluation result of each of the models M1 to M7. For example, the evaluation unit 65 may multiply the score output by each of the models M1 to M7 by a predetermined coefficient for that model and combine the resulting weighted scores. After evaluating the target session, the evaluation unit 65 registers the evaluation result in the session evaluation database 32. Such an evaluation result may be used, for example, to extract index utterances or to evaluate other sessions.
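The weighting described above can be sketched as a per-model coefficient applied before the scores are combined. The text does not say how the weighted scores are combined or what the coefficients are; summation and the example weights below are assumptions. Giving M4 a negative coefficient is likewise an assumption, chosen so that the sign convention matches the earlier subtraction example.

```python
from typing import Mapping


def weighted_value(
    scores: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Multiply each model's score by its coefficient and sum the products.

    Raises KeyError if a model that produced a score has no coefficient.
    """
    return sum(weights[name] * score for name, score in scores.items())


# Hypothetical coefficients; the text only calls them "predetermined".
EXAMPLE_WEIGHTS = {"M1": 0.5, "M2": 0.25, "M4": -0.5}
```

Only the models present in `scores` contribute, which fits the statement that any combination of models may be adopted.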

[2-6. Prediction Processing]
Returning to FIG. 2, the description continues. The engagement prediction processing unit 70 executes prediction processing that predicts how a user will use the interactive service in the future. The concept of the processing executed by the engagement prediction processing unit 70 and an example of its functional configuration are described below with reference to FIGS. 23 to 25.

[2-6-1. Prediction Processing Based on Index Utterances]
For example, the engagement prediction processing unit 70 executes the prediction processing based on the index utterances included in the sessions of a user whose future use of the interactive service is to be predicted (hereinafter referred to as the "target user"). FIG. 23 is a diagram illustrating the concept of processing that predicts the future usage mode based on index utterances.

As shown in FIG. 23, a user who makes many favorable index utterances is considered to have a favorable impression of the interactive service. A user with such a favorable impression will naturally keep using the interactive service and is therefore presumed to become a user with high engagement, as measured by usage period, usage frequency, and the like. Conversely, a user who makes many unfavorable index utterances is considered to have an unfavorable impression of the interactive service. A user with such an unfavorable impression is presumed to stop using the interactive service and to become a user with low engagement.

The engagement prediction processing unit 70 therefore predicts a user's future engagement based on the appearance frequency of index utterances in the sessions that include that user's utterances. For example, the engagement prediction processing unit 70 extracts the sessions that include utterances of the target user and that include dialog conducted within a predetermined period from the time at which the prediction processing is executed (hereinafter referred to as the "observation period").

Next, the engagement prediction processing unit 70 extracts the favorable index utterances and the unfavorable index utterances included in the sessions in the observation period, and calculates the total appearance frequency of favorable index utterances in the observation period and the total appearance frequency of unfavorable index utterances in the observation period. The engagement prediction processing unit 70 then calculates a score by dividing the total appearance frequency of favorable index utterances by the total appearance frequency of unfavorable index utterances, and determines that the higher the calculated score, the higher the user's future engagement.

Note that the engagement prediction processing unit 70 may take the index scores of the favorable and unfavorable index utterances into account. For example, the engagement prediction processing unit 70 calculates the total of the index scores of the favorable index utterances in the observation period and the total of the index scores of the unfavorable index utterances in the observation period. The engagement prediction processing unit 70 may then calculate a score by dividing the former total by the latter total, and determine that the higher the calculated score, the higher the user's future engagement. The engagement prediction processing unit 70 then registers the prediction result in the engagement database 34.
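The ratio-based score described in the two paragraphs above can be sketched as follows. The `IndexUtterance` record, its field names, and the polarity labels are hypothetical. The text does not say what happens when no unfavorable index utterance is observed, so the zero-denominator handling below is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class IndexUtterance:
    """Hypothetical record for one index utterance in the observation period."""
    polarity: str              # "favorable" or "unfavorable"
    index_score: float = 1.0


def polarity_totals(
    utterances: Iterable[IndexUtterance], use_index_score: bool = False
) -> Tuple[float, float]:
    """Total favorable and unfavorable amounts over the observation period.

    With use_index_score=False each utterance counts once (appearance
    frequency); with True, the index scores are summed instead.
    """
    favorable = unfavorable = 0.0
    for utterance in utterances:
        amount = utterance.index_score if use_index_score else 1.0
        if utterance.polarity == "favorable":
            favorable += amount
        elif utterance.polarity == "unfavorable":
            unfavorable += amount
    return favorable, unfavorable


def engagement_score(favorable_total: float, unfavorable_total: float) -> float:
    """Favorable total divided by unfavorable total; higher means more engaged."""
    if unfavorable_total == 0:
        # Assumption: not covered by the text. Any favorable evidence with
        # no unfavorable evidence is treated as maximally favorable.
        return math.inf if favorable_total > 0 else 0.0
    return favorable_total / unfavorable_total
```

A user with two favorable and one unfavorable index utterance would score 2.0 by frequency; the same utterances scored by index score can yield a different ranking.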

The future engagement predicted by this processing is used for extracting index utterances and for evaluating sessions. For example, in the processing shown in FIG. 9, the index utterance extraction processing unit 50 may treat users whose future engagement predicted by the engagement prediction processing unit 70 is high as favorable users, treat users whose predicted future engagement is low as unfavorable users, and extract index utterances from the utterances of each user. In addition, for example, in the processing shown in FIG. 15, the session evaluation processing unit 60 may train the engagement session evaluation model M6 using the sessions of users whose future engagement predicted by the engagement prediction processing unit 70 is high and the sessions of other users, and may evaluate the target session using the trained engagement session evaluation model M6.

[2-6-2. Prediction Processing Based on Evaluation Results]
The engagement prediction processing unit 70 may also execute the prediction processing based on the evaluation results of sessions. FIG. 24 is a diagram illustrating the concept of processing that predicts the future usage mode based on evaluation results.

For example, as shown in FIG. 24, a user with many sessions evaluated by the evaluation processing as having high satisfaction, that is, many high-satisfaction sessions, is presumed to have a favorable impression of the interactive service, and the user's future engagement is therefore considered likely to be high. Conversely, a user with many sessions evaluated as having low satisfaction, that is, many low-satisfaction sessions, is presumed to have an unfavorable impression of the interactive service, and the user's future engagement is therefore considered likely to be low.

The engagement prediction processing unit 70 therefore predicts the target user's future engagement based on the evaluation results of the sessions that include the target user's utterances. For example, the engagement prediction processing unit 70 extracts the sessions that include utterances of the target user and that include dialog conducted during the observation period. The engagement prediction processing unit 70 then acquires the evaluation results for the extracted sessions. For example, the engagement prediction processing unit 70 refers to the session evaluation database 32 and acquires the evaluation results produced by the session evaluation processing unit 60.

The engagement prediction processing unit 70 then predicts the target user's engagement in the prediction period based on statistics of the evaluation results. For example, the engagement prediction processing unit 70 calculates the proportion of high-satisfaction sessions and the proportion of low-satisfaction sessions among the extracted sessions. The engagement prediction processing unit 70 may then determine that the higher the proportion of high-satisfaction sessions, the higher the target user's future engagement. Alternatively, for example, when the number of high-satisfaction sessions among the extracted sessions exceeds a predetermined threshold, the engagement prediction processing unit 70 may determine that the target user is a user with high future engagement.

In addition, for example, the engagement prediction processing unit 70 assigns a positive score to each high-satisfaction session and a negative score to each low-satisfaction session, and sets a larger coefficient for newer sessions. The engagement prediction processing unit 70 may then estimate the target user's future engagement based on whether the total of the products of each session's score and coefficient exceeds a predetermined threshold. Alternatively, the engagement prediction processing unit 70 may assign a negative score to each high-satisfaction session and a positive score to each low-satisfaction session, set a larger coefficient for newer sessions, and estimate the target user's future engagement based on whether the total of the products of each session's score and coefficient is equal to or less than a predetermined threshold.
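The session-based statistics above can be sketched as follows. The text states only that high-satisfaction sessions receive a positive score, low-satisfaction sessions a negative score, and newer sessions a larger coefficient. The scores of +1 and -1, the exponential form of the coefficient, the `decay` parameter, and the default threshold are all assumptions for illustration.

```python
from typing import Sequence

# Assumed scores: the text requires only that the signs differ.
SESSION_SCORE = {"high": 1.0, "low": -1.0}


def high_satisfaction_ratio(labels: Sequence[str]) -> float:
    """Proportion of high-satisfaction sessions among the extracted sessions."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "high") / len(labels)


def recency_weighted_score(
    labels_oldest_first: Sequence[str], decay: float = 0.5
) -> float:
    """Sum of score * coefficient, where newer sessions get larger coefficients.

    The newest session has coefficient 1, the one before it `decay`,
    then `decay ** 2`, and so on (an assumed weighting scheme).
    """
    newest = len(labels_oldest_first) - 1
    return sum(
        SESSION_SCORE[label] * decay ** (newest - position)
        for position, label in enumerate(labels_oldest_first)
    )


def predicts_high_engagement(
    labels_oldest_first: Sequence[str], threshold: float = 0.0, decay: float = 0.5
) -> bool:
    """True when the recency-weighted total exceeds the threshold."""
    return recency_weighted_score(labels_oldest_first, decay) > threshold
```

With this weighting, a user whose most recent session was satisfying is rated above a user with the same sessions in the opposite order, which is the effect the larger coefficient for newer sessions is meant to produce.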

[2-6-3. Functional Configuration of the Engagement Prediction Processing Unit]
Next, an example of the functional configuration of the engagement prediction processing unit 70 is described with reference to FIG. 25. FIG. 25 is a diagram illustrating an example of the functional configuration of the engagement prediction processing unit according to the embodiment. As shown in FIG. 25, the engagement prediction processing unit 70 includes an estimation unit 71.

The estimation unit 71 estimates how the target user will use the interactive service in the future, based on the evaluation results of the sessions that include the target user's utterances. For example, the estimation unit 71 calculates the total of the index scores of the favorable index utterances in the observation period, that is, a favorable score indicating how favorable the user's evaluation is. The estimation unit 71 also calculates the total of the index scores of the unfavorable index utterances in the observation period, that is, an unfavorable score indicating how unfavorable the user's evaluation is. The estimation unit 71 then estimates the target user's future engagement, that is, the usage mode, based on the ratio of the favorable score to the unfavorable score.

The estimation unit 71 also identifies whether each session of the target user in the observation period is a high-satisfaction session or a low-satisfaction session. The estimation unit 71 then estimates the target user's future usage mode based on the numbers, proportions, and the like of the high-satisfaction sessions and low-satisfaction sessions.

[2-7. Relationship Between the Processes]
The relationship among the extraction processing executed by the index utterance extraction processing unit 50, the evaluation processing executed by the session evaluation processing unit 60, and the prediction processing executed by the engagement prediction processing unit 70 is described here with reference to FIG. 26. FIG. 26 is a diagram illustrating an example of the relationship among the processes executed by the information providing apparatus according to the embodiment.

First, the relationship between the extraction processing and the evaluation processing is described. Sessions with a high evaluation are presumed likely to contain favorable index utterances, and sessions with a low evaluation are presumed likely to contain unfavorable index utterances. The evaluation results of sessions can therefore be used in the extraction processing for index utterances. More specifically, by evaluating sessions through the evaluation processing described above, the information providing apparatus 10 can increase the number of index utterances extracted and improve the extraction accuracy. Conversely, sessions that contain favorable index utterances are presumed to have a high evaluation, and sessions that contain unfavorable index utterances are presumed to have a low evaluation. The extraction results for index utterances can therefore be used in the evaluation processing for sessions. More specifically, by extracting index utterances through the extraction processing described above, the information providing apparatus 10 can improve the accuracy of session evaluation.

Next, the relationship between the extraction processing and the prediction processing is described. For example, it is presumed that users with high engagement utter favorable index utterances with high frequency, and that users with low engagement utter unfavorable index utterances with low frequency. The engagement prediction results can therefore be used in the extraction processing for index utterances. More specifically, by predicting engagement, the information providing apparatus 10 can increase the number of index utterances extracted and improve the extraction accuracy. Conversely, users with many favorable index utterances are presumed to have high engagement, and users with many unfavorable index utterances are presumed to have low engagement. The extraction results for index utterances can therefore be used in the prediction processing for engagement. More specifically, by extracting index utterances, the information providing apparatus 10 can improve the accuracy of engagement prediction.

Next, the relationship between the evaluation processing and the prediction processing is described. For example, users with many highly evaluated sessions are presumed to have high engagement, and users with many poorly evaluated sessions are presumed to have low engagement. The evaluation results of sessions can therefore be used in the prediction processing for engagement. More specifically, by evaluating sessions, the information providing apparatus 10 can improve the accuracy of engagement prediction. Conversely, users with high engagement are presumed to have highly evaluated sessions, and users with low engagement are presumed to have poorly evaluated sessions. The engagement prediction results can therefore be used in the evaluation processing for sessions. More specifically, by predicting engagement, the information providing apparatus 10 can improve the accuracy of session evaluation.

As described above, the extraction processing, the evaluation processing, and the prediction processing stand in a mutual bootstrapping relationship: the output of each can be used to improve the others. The information providing apparatus 10 can therefore improve the accuracy of each process progressively by executing the processes repeatedly in turn. For example, the information providing apparatus 10 may alternately execute the extraction processing and the prediction processing, the extraction processing and the evaluation processing, or the evaluation processing and the prediction processing. The information providing apparatus 10 may also execute all of the processes repeatedly in order.

[2-8. Reinforcement Learning Processing]
Returning to FIG. 2, the description continues. The reinforcement learning processing unit 80 executes reinforcement learning of the dialog model used by the dialog device 200.

[2-8-1. Example of Reinforcement Learning]
FIG. 27 is a diagram illustrating the concept of processing that executes reinforcement learning of the dialog model. The dialog device 200 uses the dialog model M10 to generate a response to a user's utterance. In the interactive service, the user may follow a response with an utterance whose content is entirely new (a sentence different from the preceding sentences), or with an utterance that contains an evaluation of the response. For example, the favorable index utterances and unfavorable index utterances described above are considered to be evaluations of the response given immediately before them. Accordingly, when the dialog model M10 is regarded as the agent in reinforcement learning and the user is regarded as the environment, the various index utterances uttered by the user can be regarded as rewards for the agent.

The reinforcement learning processing unit 80 therefore realizes reinforcement learning of the dialog model M10 by setting rewards based on the index utterances directed at the dialog model M10. More specifically, the reinforcement learning processing unit 80 acquires, from among the utterances directed at the response model M10, the index utterances that serve as indicators of the evaluation of the interactive service, and performs reinforcement learning of the response model M10 by setting rewards based on the acquired index utterances.

For example, the reinforcement learning processing unit 80 refers to the session log database 31 and identifies the favorable index utterances and unfavorable index utterances included in the sessions. When a favorable index utterance is identified, the reinforcement learning processing unit 80 sets a positive reward as the reward for the response immediately preceding that favorable index utterance, and trains the dialog model M10. When an unfavorable index utterance is identified, the reinforcement learning processing unit 80 sets a negative reward as the reward for the response immediately preceding that unfavorable index utterance, and trains the dialog model M10.
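The reward assignment from logged sessions described above can be sketched as follows. The session representation as a list of (speaker, text) turns, the example phrase sets, and the reward magnitudes of +1 and -1 are assumptions made for illustration. The text specifies only that the response immediately preceding a favorable index utterance receives a positive reward and that the response immediately preceding an unfavorable index utterance receives a negative reward.

```python
from typing import List, Sequence, Tuple

# Hypothetical index utterances (lowercased); in the described system these
# would come from the index utterance extraction processing.
FAVORABLE_INDEX_UTTERANCES = {"thank you", "that is great"}
UNFAVORABLE_INDEX_UTTERANCES = {"that is wrong", "that is not what i asked"}

Turn = Tuple[str, str]  # (speaker, text), where speaker is "user" or "system"


def rewards_from_session(turns: Sequence[Turn]) -> List[Tuple[int, float]]:
    """Return (index of rewarded system response, reward) pairs for one session.

    A reward is attached only when the index utterance directly follows
    a system response.
    """
    rewards: List[Tuple[int, float]] = []
    for position, (speaker, text) in enumerate(turns):
        if speaker != "user" or position == 0:
            continue
        if turns[position - 1][0] != "system":
            continue
        normalized = text.strip().lower()
        if normalized in FAVORABLE_INDEX_UTTERANCES:
            rewards.append((position - 1, 1.0))
        elif normalized in UNFAVORABLE_INDEX_UTTERANCES:
            rewards.append((position - 1, -1.0))
    return rewards
```

The returned pairs could then be passed to whatever reinforcement learning update is used for the dialog model M10; that update is outside the scope of this sketch.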

The reinforcement learning processing unit 80 may also generate a user simulation model M20 that imitates a user's utterances, cause the user simulation model M20 and the dialog model M10 to conduct a virtual dialog, set rewards according to the index utterances output by the user simulation model M20, and perform reinforcement learning of the dialog model M10. For example, the reinforcement learning processing unit 80 generates the user simulation model M20 using predetermined dialog rules and the sessions registered in the session log database 31 as training data. The reinforcement learning processing unit 80 then inputs the utterances generated by the user simulation model M20 into the dialog model M10, and inputs the responses output by the dialog model M10 into the user simulation model M20.

The reinforcement learning processing unit 80 causes the dialog model M10 and the user simulation model M20 to conduct the virtual dialog repeatedly. When the user simulation model M20 outputs an index utterance as its utterance, the reinforcement learning processing unit 80 sets a reward according to that index utterance and performs reinforcement learning of the dialog model M10 based on the set reward.
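The virtual dialog loop above can be sketched with a deliberately small toy. The text does not name a reinforcement learning algorithm, so an epsilon-greedy action-value update over a fixed set of candidate responses stands in for it here. The candidate responses, the rule-based simulator standing in for M20, the index utterance strings, and all parameter values are hypothetical, and the toy ignores dialog context entirely.

```python
import random
from typing import Dict, Optional, Sequence

FAVORABLE = "Thank you"
UNFAVORABLE = "That is not what I asked"

# Hypothetical candidate responses available to the toy dialog model.
CANDIDATES = [
    "Sorry, I did not understand.",
    "Please say that again.",
    "Here is the forecast: sunny.",
]


def user_simulator(response: str) -> str:
    """Rule-based stand-in for the user simulation model (an assumption)."""
    return FAVORABLE if "sunny" in response.lower() else UNFAVORABLE


def reward_for(utterance: str) -> Optional[float]:
    """Positive reward for a favorable index utterance, negative for unfavorable."""
    if utterance == FAVORABLE:
        return 1.0
    if utterance == UNFAVORABLE:
        return -1.0
    return None  # not an index utterance: no reward is set


def train(
    candidates: Sequence[str], steps: int = 200, epsilon: float = 0.1, seed: int = 0
) -> Dict[str, float]:
    """Repeat the virtual dialog and keep a running mean reward per response."""
    rng = random.Random(seed)
    values = {candidate: 0.0 for candidate in candidates}
    counts = {candidate: 0 for candidate in candidates}
    for _ in range(steps):
        if rng.random() < epsilon:
            response = rng.choice(list(candidates))      # explore
        else:
            response = max(candidates, key=values.get)   # exploit
        reward = reward_for(user_simulator(response))
        if reward is None:
            continue
        counts[response] += 1
        values[response] += (reward - values[response]) / counts[response]
    return values
```

After training, the response that elicits the favorable index utterance from the simulator ends up with the highest estimated value, which is the behaviour the reward setting is intended to reinforce. A real dialog model would condition on the dialog history and would be updated with a method suited to sequence generation.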

〔２−８−２．強化学習処理部が有する機能構成の一例〕
次に、図２８を用いて、強化学習処理部８０が有する機能構成の一例について説明する。図２８は、実施形態に係る強化学習処理部の機能構成の一例を示す図である。図２８に示すように、強化学習処理部８０は、指標発話取得部８１、報酬設定部８２、および強化学習部８３を有する。 [2-8-2. Example of Functional Configuration of Reinforcement Learning Processing Unit]
Next, an example of a functional configuration of the reinforcement learning processing unit 80 will be described with reference to FIG. 28. FIG. 28 is a diagram illustrating an example of a functional configuration of the reinforcement learning processing unit according to the embodiment. As illustrated in FIG. 28, the reinforcement learning processing unit 80 includes an index utterance acquisition unit 81, a reward setting unit 82, and a reinforcement learning unit 83.

指標発話取得部８１は、対話モデルＭ１０に対する発話のうち、対話サービスに対する評価の指標となる指標発話を取得する。より具体的には、指標発話取得部８１は、利用者や利用者模型モデルＭ２０が出力した発話から、好意的指標発話や非好意的指標発話を取得する。例えば、指標発話取得部８１は、セッションログデータベース３１を参照し、指標発話を取得する。また、指標発話取得部８１は、利用者模型モデルＭ２０が出力した発話のうち、指標発話を取得する。 The index utterance acquisition unit 81 acquires, from among the utterances addressed to the dialog model M10, index utterances that serve as an index for evaluating the dialog service. More specifically, the index utterance acquisition unit 81 acquires favorable index utterances and unfavorable index utterances from the utterances output by users or by the user model model M20. For example, the index utterance acquisition unit 81 acquires index utterances by referring to the session log database 31. The index utterance acquisition unit 81 also acquires the index utterances among the utterances output by the user model model M20.

報酬設定部８２は、指標発話取得部８１が取得した指標発話に応じた報酬を設定する。例えば、報酬設定部８２は、取得された指標発話の指標スコアに応じた報酬を設定する。より具体的には、報酬設定部８２は、取得された指標発話が好意的指標発話である場合は、正の報酬を設定し、取得された指標発話が非好意的指標発話である場合は、負の報酬を設定する。 The reward setting unit 82 sets a reward according to the index utterance acquired by the index utterance acquisition unit 81. For example, the reward setting unit 82 sets a reward according to the index score of the acquired index utterance. More specifically, the reward setting unit 82 sets a positive reward when the acquired index utterance is a favorable index utterance, and sets a negative reward when the acquired index utterance is an unfavorable index utterance.

強化学習部８３は、報酬設定部８２が設定した報酬を用いて、対話モデルＭ１０の強化学習を行う。例えば、強化学習部８３は、指標発話取得部８１がセッションからある指標発話を抽出した場合、その指標発話の直前の応答を特定し、特定した応答を対話モデルＭ１０が出力した際の報酬が、その指標発話に基づく報酬であるものとして、対話モデルＭ１０の強化学習を行う。 The reinforcement learning unit 83 performs reinforcement learning of the interactive model M10 using the reward set by the reward setting unit 82. For example, when the index utterance acquisition unit 81 extracts a certain index utterance from the session, the reinforcement learning unit 83 specifies the response immediately before the index utterance, and the reward when the specified response is output by the dialog model M10 is: The reinforcement learning of the dialog model M10 is performed as a reward based on the index utterance.
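
The reward assignment described above (find an index utterance in a logged session, then attach a reward to the response immediately before it) can be sketched as follows. This is a minimal illustration, not the patented implementation: the phrase sets `FAVORABLE` and `UNFAVORABLE`, the `(speaker, text)` session layout, and the reward values of +1.0 and -1.0 are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical index-utterance lexicons (assumed for illustration only).
FAVORABLE = {"thank you", "great"}
UNFAVORABLE = {"that's wrong", "useless"}


def assign_rewards(session: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (response, reward) pairs for a logged session.

    `session` is a list of (speaker, text) turns, with speaker being
    "user" or "system". A reward is attached only to a system response
    that is immediately followed by a user index utterance.
    """
    rewarded = []
    for i in range(1, len(session)):
        speaker, text = session[i]
        prev_speaker, prev_text = session[i - 1]
        if speaker != "user" or prev_speaker != "system":
            continue
        key = text.strip().lower()
        if key in FAVORABLE:
            rewarded.append((prev_text, 1.0))   # positive reward
        elif key in UNFAVORABLE:
            rewarded.append((prev_text, -1.0))  # negative reward
    return rewarded


log = [
    ("user", "weather?"), ("system", "Sunny."), ("user", "Thank you"),
    ("user", "play music"), ("system", "Calling mom."), ("user", "That's wrong"),
]
print(assign_rewards(log))
```

The resulting pairs would then be passed to whatever reinforcement learning update the dialog model uses, which this sketch deliberately leaves out.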

また、例えば、強化学習部８３は、対話モデルＭ１０が生成した応答に対して指標発話を出力する発話モデル、すなわち、利用者模型モデルＭ２０が出力した指標発話に基づく報酬を用いて、対話モデルＭ１０の強化学習を行う。例えば、強化学習部８３は、セッションログデータベース３１に登録された各セッションの内容や、対話規則を用いて、利用者模型モデルＭ２０の学習を行う。このような利用者模型モデルＭ２０は、例えば、対話モデルＭ１０の学習と同様の学習手法が採用可能である。 Further, for example, the reinforcement learning unit 83 performs reinforcement learning of the dialog model M10 using a reward based on an index utterance output by an utterance model that outputs index utterances in reaction to the responses generated by the dialog model M10, that is, by the user model model M20. For example, the reinforcement learning unit 83 trains the user model model M20 using the contents of each session registered in the session log database 31 and the dialog rules. The user model model M20 can be trained with, for example, a learning method similar to that used for the dialog model M10.

続いて、強化学習部８３は、利用者模型モデルＭ２０と対話モデルＭ１０とに仮想的な対話を実行させる。そして、強化学習部８３は、利用者模型モデルＭ２０が指標発話を出力した場合は、報酬設定部８２が設定した報酬に基づいて、対話モデルＭ１０の強化学習を実行する。 Subsequently, the reinforcement learning unit 83 causes the user model model M20 and the dialog model M10 to execute a virtual dialog. Then, when the user model model M20 outputs the index utterance, the reinforcement learning unit 83 executes the reinforcement learning of the dialog model M10 based on the reward set by the reward setting unit 82.
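
The simulated dialogue loop can be illustrated with toy stand-ins for both models. The sketch below is an assumption-laden illustration only: `ToyDialogModel` replaces M10 with an epsilon-greedy choice over two fixed responses and a simple value-table update (a bandit-style rule chosen for brevity, not the learning method of the patent), and `toy_user_model` replaces M20 with a hard-coded reaction. All names and values are hypothetical.

```python
import random
from collections import defaultdict

RESPONSES = ["Sunny today.", "I don't understand."]  # hypothetical action set
REWARD = {"thanks": 1.0, "that's wrong": -1.0}       # assumed reward values


class ToyDialogModel:
    """Stand-in for the dialog model M10 (value table, epsilon-greedy)."""

    def __init__(self, epsilon=0.1, lr=0.5, seed=0):
        self.q = defaultdict(float)
        self.epsilon, self.lr = epsilon, lr
        self.rng = random.Random(seed)

    def respond(self, utterance):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(RESPONSES)  # explore
        return max(RESPONSES, key=lambda r: self.q[(utterance, r)])

    def update(self, utterance, response, reward):
        key = (utterance, response)
        self.q[key] += self.lr * (reward - self.q[key])


def toy_user_model(response):
    """Stand-in for the user model model M20: reacts with an index utterance."""
    return "thanks" if response == "Sunny today." else "that's wrong"


def simulate(model, utterance, episodes):
    """Repeat the virtual dialogue and learn from index-utterance rewards."""
    for _ in range(episodes):
        response = model.respond(utterance)
        reaction = toy_user_model(response)
        if reaction in REWARD:  # only index utterances carry a reward
            model.update(utterance, response, REWARD[reaction])
    return model


trained = simulate(ToyDialogModel(), "weather?", episodes=50)
print(dict(trained.q))
```

After training, the value stored for the response that draws "thanks" is higher than the value for the response that draws "that's wrong", which is the behavior the reward signal is meant to encourage.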

〔３．情報提供装置が実行する処理の流れの一例〕
続いて、図２９〜図３７を用いて、情報提供装置１０が実行する各処理の流れの一例を説明する。 [3. Example of flow of processing executed by information providing apparatus]
Subsequently, an example of the flow of each process executed by the information providing apparatus 10 will be described with reference to FIGS. 29 to 37.

〔３−１．利用態様に基づく処理の流れの一例〕
まず、図２９を用いて、情報提供装置１０が利用態様に基づいて実行する処理の流れの一例を説明する。図２９は、実施形態に係る情報提供装置が利用態様に基づいて実行する処理の流れの一例を示すフローチャートである。 [3-1. Example of processing flow based on usage mode]
First, an example of a flow of processing executed by the information providing apparatus 10 based on the usage mode will be described with reference to FIG. 29. FIG. 29 is a flowchart illustrating an example of a flow of a process performed by the information providing apparatus according to the embodiment based on a usage mode.

まず、情報提供装置１０は、各利用者の対話サービスの利用態様を特定する（ステップＳ１０１）。続いて、情報提供装置１０は、特定結果が所定の条件を満たす利用者を特定する（ステップＳ１０２）。そして、情報提供装置１０は、特定した利用者の発話を含むセッションを抽出する（ステップＳ１０３）。また、情報提供装置１０は、このようにして抽出したセッションを学習データとして、エンゲージメントセッション評価モデルＭ６の学習を行い（ステップＳ１０４）、処理を終了する。 First, the information providing apparatus 10 specifies a usage mode of each user's interactive service (step S101). Subsequently, the information providing apparatus 10 specifies a user whose specified result satisfies a predetermined condition (step S102). Then, the information providing apparatus 10 extracts a session including the utterance of the specified user (Step S103). Further, the information providing apparatus 10 learns the engagement session evaluation model M6 using the session thus extracted as learning data (step S104), and ends the process.

なお、情報提供装置１０は、抽出したセッションの評価を行ってもよい（ステップＳ１０５）。例えば、情報提供装置１０は、エンゲージメントが高い利用者のセッションを抽出した場合、それらのセッションを高満足セッションであると評価してもよい。また、情報提供装置１０は、このようにして抽出したセッションから指標発話を抽出してもよい（ステップＳ１０６）。例えば、情報提供装置１０は、エンゲージメントが高い利用者のセッションにおける出現頻度が所定の条件を満たす発話を好意的指標発話として抽出し、エンゲージメントが低い利用者のセッションにおける出現頻度が所定の条件を満たす発話を非好意的指標発話として抽出してもよい。 Note that the information providing apparatus 10 may evaluate the extracted sessions (step S105). For example, when sessions of users with high engagement are extracted, the information providing apparatus 10 may evaluate those sessions as high-satisfaction sessions. The information providing apparatus 10 may also extract index utterances from the sessions extracted in this way (step S106). For example, the information providing apparatus 10 may extract, as favorable index utterances, utterances whose appearance frequency in sessions of users with high engagement satisfies a predetermined condition, and may extract, as unfavorable index utterances, utterances whose appearance frequency in sessions of users with low engagement satisfies a predetermined condition.

〔３−２．指標発話を抽出する処理の流れの一例〕
次に、図３０を用いて、情報提供装置１０が指標発話を抽出する処理の流れの一例を説明する。図３０は、実施形態に係る情報提供装置が指標発話を抽出する処理の流れの一例を示すフローチャートである。 [3-2. Example of processing flow for extracting index utterance]
Next, an example of a flow of a process in which the information providing apparatus 10 extracts index utterances will be described with reference to FIG. 30. FIG. 30 is a flowchart illustrating an example of a flow of a process in which the information providing apparatus according to the embodiment extracts index utterances.

まず、情報提供装置１０は、利用態様が所定の条件を満たす利用者のセッションを取得し（ステップＳ２０１）、取得したセッションにおける出現頻度に基づいて、指標発話を抽出する（ステップＳ２０２）。そして、情報提供装置１０は、指標発話との共起性に基づいて、新たな指標発話をさらに抽出し（ステップＳ２０３）、処理を終了する。 First, the information providing apparatus 10 acquires a session of a user whose use mode satisfies a predetermined condition (step S201), and extracts an index utterance based on the appearance frequency in the acquired session (step S202). Then, the information providing apparatus 10 further extracts a new index utterance based on the co-occurrence with the index utterance (step S203), and ends the process.

なお、情報提供装置１０は、抽出した指標発話に基づいて、セッションの評価を行ってもよい（ステップＳ２０４）。例えば、情報提供装置１０は、好意的指標発話を多く含むセッションを高満足セッションと評価し、非好意的指標発話を多く含むセッションを低満足セッションと評価してもよい。また、情報提供装置１０は、指標発話の出現頻度に基づいて、利用者の利用態様を推定してもよい（ステップＳ２０５）。例えば、情報提供装置１０は、好意的指標発話を多く発話する利用者を、エンゲージメントが高い利用者と推定し、非好意的指標発話を多く含む利用者を、エンゲージメントが低い利用者と推定してもよい。 Note that the information providing apparatus 10 may evaluate sessions based on the extracted index utterances (step S204). For example, the information providing apparatus 10 may evaluate a session containing many favorable index utterances as a high-satisfaction session, and a session containing many unfavorable index utterances as a low-satisfaction session. The information providing apparatus 10 may also estimate the usage mode of a user based on the appearance frequency of index utterances (step S205). For example, the information providing apparatus 10 may estimate that a user who often makes favorable index utterances is a user with high engagement, and that a user who often makes unfavorable index utterances is a user with low engagement.
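
The frequency-based extraction of steps S201 and S202 can be sketched as below. The "predetermined condition" is not specified in this section, so the sketch assumes one: an utterance qualifies if it appears in at least `min_count` sessions of one user group and more than `ratio` times as often as in the other group. The function name, parameters, and sample data are hypothetical.

```python
from collections import Counter


def extract_index_utterances(high_sessions, low_sessions, min_count=2, ratio=2.0):
    """Return (favorable, unfavorable) candidate sets.

    `high_sessions` / `low_sessions` are lists of sessions from users with
    high / low engagement; each session is a list of user utterance strings.
    Counts are per session (an utterance is counted once per session).
    """
    hi = Counter(u for s in high_sessions for u in set(s))
    lo = Counter(u for s in low_sessions for u in set(s))
    favorable = {u for u, c in hi.items()
                 if c >= min_count and c > ratio * lo[u]}
    unfavorable = {u for u, c in lo.items()
                   if c >= min_count and c > ratio * hi[u]}
    return favorable, unfavorable


high = [["thanks", "weather"], ["thanks", "news"], ["weather"]]
low = [["no", "weather"], ["no"], ["weather", "that's wrong"]]
print(extract_index_utterances(high, low))
```

In this toy data "weather" is frequent in both groups and is therefore rejected, which is why the comparison against the other group is included in the assumed condition.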

〔３−３．セッションを評価する処理の流れの一例〕
次に、図３１を用いて、情報提供装置１０がセッションを評価する処理の流れの一例を説明する。図３１は、実施形態に係る情報提供装置がセッションを評価する処理の流れの一例を示すフローチャートである。 [3-3. Example of processing flow for evaluating session]
Next, an example of a flow of a process in which the information providing apparatus 10 evaluates a session will be described with reference to FIG. 31. FIG. 31 is a flowchart illustrating an example of the flow of a process in which the information providing apparatus according to the embodiment evaluates a session.

まず、情報提供装置１０は、評価対象となるセッションを取得し（ステップＳ３０１）、取得したセッションにおける指標発話の出現頻度に基づいて、セッションを評価する（ステップＳ３０２）。例えば、情報提供装置１０は、好意的指標発話を多く含むセッションを高満足セッションと評価し、非好意的指標発話を多く含むセッションを低満足セッションと評価する。そして、情報提供装置１０は、評価結果に対応するラベルが付与された学習データとして、セッションの特徴を学習し（ステップＳ３０３）、処理を終了する。例えば、情報提供装置１０は、高満足セッションや低満足セッションの特徴を、各モデルＭ３、Ｍ４に学習させる。 First, the information providing apparatus 10 acquires a session to be evaluated (step S301), and evaluates the session based on the appearance frequency of the index utterance in the acquired session (step S302). For example, the information providing apparatus 10 evaluates a session including many favorable index utterances as a high satisfaction session, and evaluates a session including many unfavorable index utterances as a low satisfaction session. Then, the information providing apparatus 10 learns the characteristics of the session as learning data to which a label corresponding to the evaluation result is added (step S303), and ends the processing. For example, the information providing apparatus 10 causes the models M3 and M4 to learn the features of the high satisfaction session and the low satisfaction session.

なお、情報提供装置１０は、セッションの評価結果に基づいて、セッションから新たな指標発話の抽出を行ってもよい（ステップＳ３０４）。例えば、情報提供装置１０は、高満足セッションでの出現頻度が高い発話を好意的指標発話として抽出し、低満足セッションでの出現頻度が高い発話を非好意的指標発話として抽出してもよい。また、情報提供装置１０は、セッションの評価結果に基づいて、利用者の利用態様を推定してもよい（ステップＳ３０５）。例えば、情報提供装置１０は、高満足セッションが多い利用者をエンゲージメントが高い利用者とし、低満足セッションが多い利用者をエンゲージメントが低い利用者と推定してもよい。 Note that the information providing apparatus 10 may extract a new index utterance from the session based on the evaluation result of the session (step S304). For example, the information providing apparatus 10 may extract an utterance having a high appearance frequency in a high satisfaction session as a favorable index utterance, and may extract an utterance having a high appearance frequency in a low satisfaction session as a non-favorable index utterance. Further, the information providing apparatus 10 may estimate a usage mode of the user based on the evaluation result of the session (Step S305). For example, the information providing apparatus 10 may estimate a user with many high-satisfaction sessions as a user with high engagement and a user with many low-satisfaction sessions as a user with low engagement.
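
As a rough illustration of steps S302 and S305, the sketch below labels a session by comparing counts of favorable and unfavorable index utterances, then estimates a user's engagement from that user's session labels. The majority rule and the "undetermined" fallback are assumptions; the source only says that sessions containing "many" index utterances of one kind are evaluated accordingly.

```python
from collections import Counter


def evaluate_session(utterances, favorable, unfavorable):
    """Label a session as "high" or "low" satisfaction by index-utterance counts."""
    pos = sum(1 for u in utterances if u in favorable)
    neg = sum(1 for u in utterances if u in unfavorable)
    if pos > neg:
        return "high"
    if neg > pos:
        return "low"
    return "undetermined"  # assumed fallback, not taken from the source


def estimate_engagement(session_labels):
    """Estimate a user's engagement from the labels of that user's sessions."""
    counts = Counter(session_labels)
    if counts["high"] > counts["low"]:
        return "high engagement"
    if counts["low"] > counts["high"]:
        return "low engagement"
    return "undetermined"


labels = [
    evaluate_session(["weather", "thanks", "thanks"], {"thanks"}, {"no"}),
    evaluate_session(["no", "call mom", "no"], {"thanks"}, {"no"}),
    evaluate_session(["thanks"], {"thanks"}, {"no"}),
]
print(labels, estimate_engagement(labels))
```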

〔３−４．画像に基づいてセッションを評価する処理の流れの一例〕
次に、図３２を用いて、情報提供装置１０が画像に基づいてセッションを評価する処理の流れの一例を説明する。図３２は、実施形態に係る情報提供装置が画像に基づいてセッションを評価する処理の流れの一例を示すフローチャートである。 [3-4. Example of process flow for evaluating session based on image]
Next, an example of the flow of a process in which the information providing apparatus 10 evaluates a session based on an image will be described with reference to FIG. 32. FIG. 32 is a flowchart illustrating an example of a flow of processing in which the information providing apparatus according to the embodiment evaluates a session based on an image.

まず、情報提供装置１０は、評価対象となるセッションを取得し（ステップＳ４０１）、取得したセッションに含まれる発話の文字が相互に同一か否かを示す画像、すなわち、分布画像を生成する（ステップＳ４０２）。そして、情報提供装置１０は、生成した分布画像に基づいて、セッションを評価する（ステップＳ４０３）。例えば、情報提供装置１０は、分布画像に斜線が含まれている場合は、繰り返し評価モデルＭ１によって繰り返しが含まれると分類された場合は、対象セッションを低満足セッションと評価する。そして、情報提供装置１０は、評価結果に対応するラベルが付与された学習データとして、セッションの特徴を学習し（ステップＳ４０４）、処理を終了する。 First, the information providing apparatus 10 acquires a session to be evaluated (step S401), and generates an image indicating whether or not the characters of the utterances included in the acquired session are identical to one another, that is, a distribution image (step S402). Then, the information providing apparatus 10 evaluates the session based on the generated distribution image (step S403). For example, the information providing apparatus 10 evaluates the target session as a low-satisfaction session when the distribution image contains diagonal lines and is classified by the repetition evaluation model M1 as containing repetition. Then, the information providing apparatus 10 learns the characteristics of the session as learning data to which a label corresponding to the evaluation result is attached (step S404), and ends the processing.
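
To make the idea of a distribution image concrete, the sketch below builds a character-by-character match matrix for the concatenated utterances of a session and measures the longest diagonal run away from the main diagonal. Repeated text shows up as such a diagonal line. Two points are assumptions: concatenating the utterances into one string is only one possible way to build the image, and the run-length heuristic stands in for the image classification model M1, which the sketch does not attempt to reproduce.

```python
def distribution_image(text):
    """Return an n x n 0/1 matrix: cell (i, j) is 1 if text[i] == text[j]."""
    n = len(text)
    return [[1 if text[i] == text[j] else 0 for j in range(n)] for i in range(n)]


def longest_off_diagonal_run(image):
    """Length of the longest run of 1s on any diagonal above the main one."""
    n = len(image)
    best = 0
    for offset in range(1, n):       # skip offset 0 (the trivial main diagonal)
        run = 0
        for i in range(n - offset):
            if image[i][i + offset]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


repeated = "play jazz" + "play jazz"   # the user said the same thing twice
varied = "hello world"
print(longest_off_diagonal_run(distribution_image(repeated)))
print(longest_off_diagonal_run(distribution_image(varied)))
```

A long run for the repeated session and a short one for the varied session is the visual cue (a diagonal line) that a classifier trained on such images could pick up.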

なお、情報提供装置１０は、セッションの評価結果に基づいて、セッションから新たな指標発話の抽出を行ってもよい（ステップＳ４０５）。また、情報提供装置１０は、セッションの評価結果に基づいて、利用者の利用態様を推定してもよい（ステップＳ４０６）。 The information providing apparatus 10 may extract a new index utterance from the session based on the evaluation result of the session (step S405). Further, the information providing apparatus 10 may estimate a usage mode of the user based on the evaluation result of the session (Step S406).

〔３−５．発話の繰り返しに基づいてセッションを評価する処理の流れの一例〕
次に、図３３を用いて、情報提供装置１０が発話の繰り返しに基づいてセッションを評価する処理の流れの一例を説明する。図３３は、実施形態に係る情報提供装置が発話の繰り返しに基づいてセッションを評価する処理の流れの一例を示すフローチャートである。 [3-5. Example of processing flow for evaluating session based on repetition of utterance]
Next, an example of the flow of a process in which the information providing apparatus 10 evaluates a session based on repetition of utterances will be described with reference to FIG. 33. FIG. 33 is a flowchart illustrating an example of a flow of processing in which the information providing apparatus according to the embodiment evaluates a session based on repetition of utterances.

まず、情報提供装置１０は、評価対象となるセッションを取得し（ステップＳ５０１）、取得したセッションに含まれる発話の文字列の繰り返しに基づいて、セッションを評価する（ステップＳ５０２）。例えば、情報提供装置１０は、文字列が同一または類似する発話の数や、意味が同一又は類似する発話の数が所定の閾値を超える場合は、低満足セッションであると評価する。そして、情報提供装置１０は、評価結果に対応するラベルが付与された学習データとして、セッションの特徴を学習し（ステップＳ５０３）、処理を終了する。 First, the information providing apparatus 10 acquires a session to be evaluated (step S501), and evaluates the session based on repetition of a character string of an utterance included in the acquired session (step S502). For example, if the number of utterances having the same or similar character strings or the number of utterances having the same or similar meaning exceeds a predetermined threshold, the information providing apparatus 10 evaluates the session as a low-satisfaction session. Then, the information providing apparatus 10 learns the characteristics of the session as learning data to which a label corresponding to the evaluation result is added (step S503), and ends the processing.
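
The threshold test of step S502 can be sketched with the standard-library `difflib.SequenceMatcher`. This covers only string similarity; the semantic similarity also mentioned above would need a separate model and is not shown. The similarity threshold of 0.8 and the repeat limit of 1 are assumed values chosen for the example.

```python
from difflib import SequenceMatcher


def count_repeats(utterances, similarity=0.8):
    """Count utterances whose string is identical or similar to an earlier one."""
    repeats = 0
    for i, current in enumerate(utterances):
        if any(SequenceMatcher(None, current, earlier).ratio() >= similarity
               for earlier in utterances[:i]):
            repeats += 1
    return repeats


def is_low_satisfaction(utterances, max_repeats=1, similarity=0.8):
    """Assumed rule: more than `max_repeats` repeated utterances means low satisfaction."""
    return count_repeats(utterances, similarity) > max_repeats


session = ["call mom", "call mom", "call mom!", "weather"]
print(count_repeats(session), is_low_satisfaction(session))
```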

なお、情報提供装置１０は、セッションの評価結果に基づいて、セッションから新たな指標発話の抽出を行ってもよい（ステップＳ５０４）。また、情報提供装置１０は、セッションの評価結果に基づいて、利用者の利用態様を推定してもよい（ステップＳ５０５）。 The information providing apparatus 10 may extract a new index utterance from the session based on the evaluation result of the session (step S504). Further, the information providing apparatus 10 may estimate a usage mode of the user based on the evaluation result of the session (Step S505).

〔３−６．発話を報酬とした強化学習の処理の流れの一例〕
次に、図３４を用いて、情報提供装置１０が発話を報酬として実行する強化学習の流れの一例を説明する。図３４は、実施形態に係る情報提供装置が発話を報酬として実行する強化学習の流れの一例を示すフローチャートである。 [3-6. Example of reinforcement learning process flow with utterance as reward]
Next, an example of a flow of reinforcement learning that the information providing apparatus 10 executes using utterances as rewards will be described with reference to FIG. 34. FIG. 34 is a flowchart illustrating an example of the flow of reinforcement learning that the information providing apparatus according to the embodiment executes using utterances as rewards.

まず、情報提供装置１０は、対話モデルＭ１０に対して発話を入力し（ステップＳ６０１）、対話モデルＭ１０が出力した応答を利用者模型モデルＭ２０に入力することで、応答に対する指標発話を取得する（ステップＳ６０２）。そして、情報提供装置１０は、利用者模型モデルＭ２０から取得した指標発話を報酬として、対話モデルＭ１０の強化学習を実行し（ステップＳ６０３）、処理を終了する。 First, the information providing apparatus 10 inputs an utterance to the dialog model M10 (step S601), and acquires an index utterance for the response by inputting the response output by the dialog model M10 to the user model model M20 (step S602). Then, the information providing apparatus 10 executes reinforcement learning of the dialog model M10 using the index utterance acquired from the user model model M20 as a reward (step S603), and ends the process.

〔３−７．共起性に基づいて指標発話を抽出する処理の流れの一例〕
次に、図３５を用いて、情報提供装置１０が共起性に基づいて指標発話を抽出する処理の流れの一例を説明する。図３５は、実施形態に係る情報提供装置が共起性に基づいて指標発話を抽出する処理の流れの一例を示すフローチャートである。 [3-7. An example of the flow of processing for extracting an index utterance based on co-occurrence]
Next, an example of a flow of a process in which the information providing apparatus 10 extracts an index utterance based on co-occurrence will be described with reference to FIG. 35. FIG. 35 is a flowchart illustrating an example of a flow of a process in which the information providing apparatus according to the embodiment extracts an index utterance based on co-occurrence.

まず、情報提供装置１０は、セッションの履歴から指標発話を特定し（ステップＳ７０１）、発話ごとに、各セッションにおける指標発話との共起性を特定する（ステップＳ７０２）。そして、情報提供装置１０は、特定した共起性に基づいて、新たな指標発話を抽出する（ステップＳ７０３）。例えば、情報提供装置１０は、好意的指標発話との共起性が高い発話を新たな好意的指標発話として抽出し、非好意的指標発話との共起性が高い発話を新たな非好意的指標発話として抽出する。 First, the information providing apparatus 10 identifies index utterances from the session history (step S701), and determines, for each utterance, its co-occurrence with the index utterances in each session (step S702). Then, the information providing apparatus 10 extracts new index utterances based on the determined co-occurrence (step S703). For example, the information providing apparatus 10 extracts an utterance that co-occurs strongly with favorable index utterances as a new favorable index utterance, and extracts an utterance that co-occurs strongly with unfavorable index utterances as a new unfavorable index utterance.

なお、情報提供装置１０は、このようにして新たに抽出した指標発話との共起性に基づいて、さらに新たな指標発話を抽出してもよい（ステップＳ７０４）。また、情報提供装置１０は、このようにして新たに抽出した指標発話に基づいて、セッションを評価を行ってもよく（ステップＳ７０５）、指標発話の出現頻度に基づいて、利用者の利用態様を推定してもよい（ステップＳ７０６）。 The information providing apparatus 10 may further extract a new index utterance based on the co-occurrence with the newly extracted index utterance (step S704). In addition, the information providing apparatus 10 may evaluate the session based on the index utterance newly extracted in this manner (step S705), and determine the usage mode of the user based on the appearance frequency of the index utterance. It may be estimated (step S706).
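
The co-occurrence expansion of steps S702 to S704 can be sketched as follows. The co-occurrence measure is not defined in this section, so the sketch assumes a simple one: the fraction of sessions containing a candidate utterance that also contain at least one known index utterance. The thresholds `min_sessions` and `min_ratio` are hypothetical. Calling the function again with the enlarged seed set corresponds to the optional repeated extraction.

```python
from collections import Counter


def expand_by_cooccurrence(sessions, seeds, min_sessions=2, min_ratio=0.8):
    """Return new index-utterance candidates that co-occur with `seeds`.

    `sessions` is a list of sessions, each a list of user utterance strings;
    `seeds` is the set of index utterances already known.
    """
    appear = Counter()    # sessions in which a candidate appears
    together = Counter()  # of those, sessions that also contain a seed
    for session in sessions:
        unique = set(session)
        has_seed = bool(unique & seeds)
        for utterance in unique - seeds:
            appear[utterance] += 1
            if has_seed:
                together[utterance] += 1
    return {u for u in appear
            if appear[u] >= min_sessions and together[u] / appear[u] >= min_ratio}


sessions = [["thanks", "nice"], ["thanks", "nice", "weather"],
            ["weather"], ["nice", "thanks"]]
new_favorable = expand_by_cooccurrence(sessions, {"thanks"})
print(new_favorable)
```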

〔３−８．３値評価モデルを用いて、セッションを評価する処理の流れの一例〕
次に、図３６を用いて、３値評価モデルＭ５を用いてセッションを評価する処理の流れの一例を説明する。図３６は、実施形態に係る情報提供装置が３値評価モデルを用いてセッションを評価する処理の流れの一例を示すフローチャートである。 [3-8. Example of Process Flow for Evaluating Session Using Ternary Evaluation Model]
Next, an example of the flow of processing for evaluating a session using the ternary evaluation model M5 will be described with reference to FIG. 36. FIG. 36 is a flowchart illustrating an example of a flow of processing in which the information providing apparatus according to the embodiment evaluates a session using a ternary evaluation model.

まず、情報提供装置１０は、セッションの履歴から好意的指標発話、非好意的指標発話および中性発話を特定し（ステップＳ８０１）、各指標発話から所定の数だけ前の発話および応答を学習データとして抽出する（ステップＳ８０２）。すなわち、情報提供装置１０は、好意的対話例ＦＤ、中性対話例ＮＤ、および非好意的対話例ＵＤを生成する。また、情報提供装置１０は、抽出した学習データに含まれる指標発話に応じたラベルを付与し（ステップＳ８０３）、学習データを用いて、３値評価モデルＭ５の学習を行う（ステップＳ８０４）。その後、情報提供装置１０は、３値評価モデルＭ５を用いて、セッションの評価を行い（ステップＳ８０５）、処理を終了する。 First, the information providing apparatus 10 identifies favorable index utterances, unfavorable index utterances, and neutral utterances from the session history (step S801), and extracts, as learning data, the predetermined number of utterances and responses preceding each index utterance (step S802). That is, the information providing apparatus 10 generates favorable dialogue examples FD, neutral dialogue examples ND, and unfavorable dialogue examples UD. The information providing apparatus 10 then attaches a label corresponding to the index utterance to each item of extracted learning data (step S803), and trains the ternary evaluation model M5 using the learning data (step S804). Thereafter, the information providing apparatus 10 evaluates sessions using the ternary evaluation model M5 (step S805), and ends the processing.
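
Steps S801 to S803 amount to building labeled context windows. The sketch below is one possible reading, under stated assumptions: a session is a flat list of turn texts in order, `context` is the "predetermined number" of preceding turns, and the label names are chosen for the example. Training the ternary evaluation model M5 itself (step S804) is outside the sketch.

```python
def build_examples(session, favorable, unfavorable, neutral, context=2):
    """Return (preceding_turns, label) pairs for the ternary training set.

    For every turn that is a favorable, unfavorable, or neutral utterance,
    the `context` turns immediately before it become one training example.
    """
    examples = []
    for i, turn in enumerate(session):
        if turn in favorable:
            label = "favorable"
        elif turn in unfavorable:
            label = "unfavorable"
        elif turn in neutral:
            label = "neutral"
        else:
            continue
        preceding = session[max(0, i - context):i]
        if preceding:  # skip a marker utterance with no preceding context
            examples.append((preceding, label))
    return examples


turns = ["weather?", "Sunny.", "thanks", "call mom", "Calling bob.", "no"]
for example in build_examples(turns, {"thanks"}, {"no"}, {"ok"}):
    print(example)
```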

〔３−９．複数のモデルを用いて、セッションを評価する処理の流れの一例〕
次に、図３７を用いて、複数のモデルＭ１〜Ｍ７を用いてセッションを評価する処理の流れの一例を説明する。図３７は、実施形態に係る情報提供装置が複数のモデルを用いてセッションを評価する処理の流れの一例を示すフローチャートである。 [3-9. Example of process flow for evaluating session using multiple models]
Next, an example of a flow of a process of evaluating a session using a plurality of models M1 to M7 will be described with reference to FIG. 37. FIG. 37 is a flowchart illustrating an example of a flow of a process in which the information providing apparatus according to the embodiment evaluates a session using a plurality of models.

例えば、情報提供装置１０は、評価対象となるセッションを取得し（ステップＳ９０１）、それぞれ異なる特徴を学習した複数のモデルＭ１〜Ｍ７にセッションを評価させる（ステップＳ９０２）。そして、情報提供装置１０は、各モデルＭ１〜Ｍ７の評価結果を統合し、セッションを評価して（ステップＳ９０３）、処理を終了する。 For example, the information providing apparatus 10 acquires a session to be evaluated (step S901), and causes a plurality of models M1 to M7 that have learned different characteristics to evaluate the session (step S902). Then, the information providing apparatus 10 integrates the evaluation results of the models M1 to M7, evaluates the session (step S903), and ends the processing.

〔４．その他〕
また、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、逆に、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 [4. Others]
Further, among the processes described in the above embodiment, all or some of the processes described as being performed automatically can be performed manually, and conversely, all or some of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each drawing is not limited to the information shown.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Each component of each illustrated device is a functional concept and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to that shown in the figures, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

また、上記してきた各実施形態は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Further, the respective embodiments described above can be appropriately combined within a range that does not contradict processing contents.

〔５．プログラム〕
また、上述した実施形態に係る情報提供装置１０は、例えば図３８に示すような構成のコンピュータ１０００によって実現される。図３８は、ハードウェア構成の一例を示す図である。コンピュータ１０００は、出力装置１０１０、入力装置１０２０と接続され、演算装置１０３０、一次記憶装置１０４０、二次記憶装置１０５０、出力ＩＦ（Interface）１０６０、入力ＩＦ１０７０、ネットワークＩＦ１０８０がバス１０９０により接続された形態を有する。 [5. Program]
The information providing apparatus 10 according to the above-described embodiment is realized by, for example, a computer 1000 having a configuration as illustrated in FIG. 38. FIG. 38 is a diagram illustrating an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

演算装置１０３０は、一次記憶装置１０４０や二次記憶装置１０５０に格納されたプログラムや入力装置１０２０から読み出したプログラム等に基づいて動作し、各種の処理を実行する。一次記憶装置１０４０は、ＲＡＭ等、演算装置１０３０が各種の演算に用いるデータを一次的に記憶するメモリ装置である。また、二次記憶装置１０５０は、演算装置１０３０が各種の演算に用いるデータや、各種のデータベースが登録される記憶装置であり、ＲＯＭ(Read Only Memory)、ＨＤＤ（Hard Disk Drive）、フラッシュメモリ等により実現される。 The arithmetic device 1030 operates based on a program stored in the primary storage device 1040 or the secondary storage device 1050, a program read from the input device 1020, or the like, and executes various processes. The primary storage device 1040 is a memory device such as a RAM that temporarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is realized by a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or the like.

出力ＩＦ１０６０は、モニタやプリンタといった各種の情報を出力する出力装置１０１０に対し、出力対象となる情報を送信するためのインタフェースであり、例えば、ＵＳＢ（Universal Serial Bus）やＤＶＩ（Digital Visual Interface）、ＨＤＭＩ（登録商標）（High Definition Multimedia Interface）といった規格のコネクタにより実現される。また、入力ＩＦ１０７０は、マウス、キーボード、およびスキャナ等といった各種の入力装置１０２０から情報を受信するためのインタフェースであり、例えば、ＵＳＢ等により実現される。 The output IF 1060 is an interface for transmitting information to be output to an output device 1010 that outputs various types of information such as a monitor and a printer. For example, a USB (Universal Serial Bus), a DVI (Digital Visual Interface), This is realized by a connector of a standard such as HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner, and is realized by, for example, a USB.

なお、入力装置１０２０は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等から情報を読み出す装置であってもよい。また、入力装置１０２０は、ＵＳＢメモリ等の外付け記憶媒体であってもよい。 Note that the input device 1020 may be, for example, a device that reads information from an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, a semiconductor memory, or the like. The input device 1020 may also be an external storage medium such as a USB memory.

ネットワークＩＦ１０８０は、ネットワークＮを介して他の機器からデータを受信して演算装置１０３０へ送り、また、ネットワークＮを介して演算装置１０３０が生成したデータを他の機器へ送信する。 The network IF 1080 receives data from another device via the network N and sends the data to the arithmetic device 1030, and transmits the data generated by the arithmetic device 1030 to the other device via the network N.

演算装置１０３０は、出力ＩＦ１０６０や入力ＩＦ１０７０を介して、出力装置１０１０や入力装置１０２０の制御を行う。例えば、演算装置１０３０は、入力装置１０２０や二次記憶装置１０５０からプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行する。 The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040, and executes the loaded program.

例えば、コンピュータ１０００が情報提供装置１０として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラムを実行することにより、制御部４０の機能を実現する。また、例えば、コンピュータ１０００が端末装置１００として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラムを実行することにより、制御部４０の機能を実現する。 For example, when the computer 1000 functions as the information providing apparatus 10, the arithmetic unit 1030 of the computer 1000 implements the function of the control unit 40 by executing a program loaded on the primary storage device 1040. For example, when the computer 1000 functions as the terminal device 100, the arithmetic device 1030 of the computer 1000 implements the function of the control unit 40 by executing a program loaded on the primary storage device 1040.

〔６．実験結果について〕
次に、図３９〜図４７を用いて、上述した情報提供装置１０の実験結果の一例について説明する。 [6. Experimental results]
Next, an example of experimental results for the above-described information providing apparatus 10 will be described with reference to FIGS. 39 to 47.

〔６−１．データについて〕
まず、情報提供装置１０の実験に用いたセッションデータの一例について説明する。以下の実験においては、少なくとも２つ以上の発話や応答を含む４０００個のセッションに対し、クラウドソージングによりラベルの付与を行った。具体的には、トレーニング用セッション（Ｔｒａｉｎｉｎｇ）を２０００個、調整用セッション（Ｖａｌｉｄａｔｉｏｎ）を１０００個、およびテスト用セッション（Ｔｅｓｔ）を１０００個準備した。また、７人の判定者に対して各セッションを提示し、利用者がイライラ（frustrated）していると判断される場合は、「Ｙｅｓ」、利用者がイライラしていないと判断される場合は、「Ｎｏ」のラベルを各セッションに付与させた。 [6-1. About data]
First, an example of the session data used for the experiment of the information providing apparatus 10 will be described. In the following experiments, labels were assigned by crowdsourcing to 4,000 sessions, each containing at least two utterances and responses. Specifically, 2,000 training sessions (Training), 1,000 adjustment sessions (Validation), and 1,000 test sessions (Test) were prepared. Each session was presented to seven judges, who labeled it "Yes" if the user was judged to be frustrated and "No" if the user was judged not to be frustrated.

例えば、図３９は、セッションデータに対するラベル付の結果の一例を示す図である。例えば、図３９に示す例では、少なくとも１人の判定者が「Ｙｅｓ」のラベルを付与したセッションをＦＬ（Frustration Level）１、少なくとも２人の判定者が「Ｙｅｓ」のラベルを付与したセッションをＦＬ２、少なくとも３人の判定者が「Ｙｅｓ」のラベルを付与したセッションをＦＬ３、少なくとも４人の判定者が「Ｙｅｓ」のラベルを付与したセッションをＦＬ４と記載した。また、図３９では、トレーニング用セッション、調整用セッション、およびテスト用セッションのうち、ＦＬ１〜ＦＬ４に分類されたセッションの数を記載した。 For example, FIG. 39 is a diagram illustrating an example of a result of labeling session data. For example, in the example shown in FIG. 39, a session in which at least one judge has attached a label of “Yes” is FL (Frustration Level) 1, and a session in which at least two judge are given a label of “Yes” is FL2, a session in which at least three judges have labeled "Yes" is described as FL3, and a session in which at least four judges have labeled "Yes" is described as FL4. FIG. 39 shows the number of sessions classified into FL1 to FL4 among the training session, the adjustment session, and the test session.
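
The FL1 to FL4 grouping follows directly from the number of "Yes" votes a session received. The short sketch below shows that mapping; the function name and the list-of-strings vote format are assumptions made for illustration. Because each level is defined as "at least k judges", the levels are cumulative: a session counted in FL3 is also counted in FL1 and FL2.

```python
def frustration_levels(votes, max_level=4):
    """Return the FL levels a session belongs to, given the judges' labels.

    `votes` is a list of "Yes"/"No" labels, one per judge. FLk holds when
    at least k judges answered "Yes".
    """
    yes_count = sum(1 for v in votes if v == "Yes")
    return [k for k in range(1, max_level + 1) if yes_count >= k]


votes = ["Yes", "No", "Yes", "No", "No", "No", "Yes"]  # 3 of 7 judges said "Yes"
print(frustration_levels(votes))
```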

〔６−２．第１実験について〕
続いて、上述したセッションデータを用いた第１実験の内容について説明する。第１実験においては、実験対象としてＰｒｏｐ＿Ｄで示す方式およびＰｒｏｐ＿Ｕで示す方式を設定し、比較対象となるベースラインとして、ＢＬ＿Ｓ１、ＢＬ＿Ｓ２、およびＢＬ＿ＳＳで示す方式を設定した。 [6-2. About the first experiment]
Subsequently, the contents of the first experiment using the above-described session data will be described. In the first experiment, a method indicated by Prop_D and a method indicated by Prop_U were set as experimental objects, and methods indicated by BL_S1, BL_S2, and BL_SS were set as baselines to be compared.

ここで、Ｐｒｏｐ＿Ｄは、図１に示すように、繰り返し評価モデルＭ１、謝罪評価モデルＭ２、非好意的指標発話評価モデルＭ３、および好意的指標発話評価モデルＭ４を用いた判定手法である。例えば、Ｐｒｏｐ＿Ｄにおいては、テストセッションを各モデルＭ１〜Ｍ４に入力して不満足度スコアを算出した。また、Ｐｒｏｐ＿Ｄにおいては、算出した不満足度スコアの値が高い方から順に５０万のセッションを低満足セッション（すなわち、ラベルが「Ｙｅｓ」）であると判定し、不満足度スコアの値が高い方から順に５０万のセッションを高満足セッション（すなわち、ラベルが「Ｎｏ」）であると判定した。 Here, Prop_D is a determination method using a repetition evaluation model M1, an apology evaluation model M2, an unfavorable index utterance evaluation model M3, and a favorable index utterance evaluation model M4, as shown in FIG. For example, in Prop_D, a test session was input to each of the models M1 to M4 to calculate a dissatisfaction score. In Prop_D, 500,000 sessions are determined to be low-satisfaction sessions (that is, the label is “Yes”) in descending order of the calculated value of the dissatisfaction score. 500,000 sessions in turn were determined to be high satisfaction sessions (ie, the label was "No").

ここで、Ｐｒｏｐ＿Ｄにおいては、繰り返し評価モデルＭ１の学習を行う際に、図１７に示した手法を用いて、学習データを生成し、図１８に示すように、生成した学習データの分布画像が有する特徴を繰り返し評価モデルＭ１に学習させた。また、Ｐｒｏｐ＿Ｄにおいては、謝罪評価モデルＭ２の学習を行う際に、図１９に示した手法を用いて、学習データを生成し、学習データが有する特徴を謝罪評価モデルＭ２に学習させた。また、Ｐｒｏｐ＿Ｄにおいては、図１３に示した手法を用いて、好意的指標発話評価モデルＭ３および非好意的指標発話評価モデルＭ４の学習を行った。また、Ｐｒｏｐ＿Ｄにおいては、繰り返し評価モデルＭ１として画像分類を行うモデルを採用し、モデルＭ２〜Ｍ４として、ＬＳＴＭを用いて入力された情報の分類を行うモデルを採用した。 Here, in Prop_D, when learning the repetition evaluation model M1, learning data is generated by using the method shown in FIG. 17, and as shown in FIG. 18, a distribution image of the generated learning data has The feature was repeatedly learned by the evaluation model M1. In Prop_D, when learning the apology evaluation model M2, learning data was generated using the method shown in FIG. 19, and the features of the learning data were learned by the apology evaluation model M2. In the case of Prop_D, learning of the favorable index utterance evaluation model M3 and the non-favorable index utterance evaluation model M4 was performed using the method shown in FIG. In Prop_D, a model for performing image classification is adopted as the repetition evaluation model M1, and a model for classifying information input using LSTM is adopted as models M2 to M4.

また、Ｐｒｏｐ＿Ｕは、図１に示す各モデルＭ１〜Ｍ４のうち、繰り返し評価モデルＭ１および謝罪評価モデルＭ２のみを用いた判定手法である。このような判定手法は、モデルの学習に指標発話が必要ないため、予め人手でデータを作成する必要がなく、Ｕｎｓｕｐｅｒｖｉｓｅｄな手法となる。例えば、Ｐｒｏｐ＿Ｕでは、対象セッションを繰り返し評価モデルＭ１および謝罪評価モデルＭ２に入力し、各モデルＭ１、Ｍ２が出力した満足度の和を不満足度スコアとして算出した。また、Ｐｒｏｐ＿Ｕでは、算出した不満足度スコアの値が高い方から順に５０万のセッションを低満足セッション（すなわち、ラベルが「Ｙｅｓ」）であると判定し、不満足度スコアの値が高い方から順に５０万のセッションを高満足セッション（すなわち、ラベルが「Ｎｏ」）であると判定した。 Prop_U is a determination method that uses only the repetition evaluation model M1 and the apology evaluation model M2 among the models M1 to M4 shown in FIG. 1. Because training these models does not require index utterances, no data has to be created manually in advance, which makes this an unsupervised method. For example, in Prop_U, the target session was input to the repetition evaluation model M1 and the apology evaluation model M2, and the sum of the values output by the models M1 and M2 was calculated as the dissatisfaction score. In Prop_U, the 500,000 sessions with the highest dissatisfaction scores were determined to be low-satisfaction sessions (that is, the label is "Yes"), and the 500,000 sessions with the lowest dissatisfaction scores were determined to be high-satisfaction sessions (that is, the label is "No").
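The score-based labeling used by Prop_D and Prop_U can be illustrated with a minimal Python sketch. The function names, the dictionary of per-session scores, and the choice to take the high-satisfaction group from the low end of the ranking are assumptions made for illustration; the description does not give an implementation.

```python
from typing import Dict, List, Tuple


def combined_score(model_outputs: List[float]) -> float:
    """Dissatisfaction score as the sum of the individual model outputs (assumed)."""
    return sum(model_outputs)


def label_by_dissatisfaction(
    scores: Dict[str, float], n: int
) -> Tuple[List[str], List[str]]:
    """Return (low_satisfaction_ids, high_satisfaction_ids).

    The n sessions with the highest scores get the label "Yes" (low
    satisfaction); the n sessions with the lowest scores get the label "No".
    """
    if 2 * n > len(scores):
        raise ValueError("n is too large: the two groups would overlap")
    ranked = sorted(scores, key=scores.get, reverse=True)
    low_satisfaction = ranked[:n]                  # label "Yes"
    high_satisfaction = ranked[-n:] if n else []   # label "No"
    return low_satisfaction, high_satisfaction
```

In the experiments n would be 500,000; the sketch works for any n up to half the number of scored sessions.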

また、ＢＬ＿Ｓ１は、上述したトレーニング用セッションを用いて学習を行い、調整用セッションを用いて、調整を行ったモデルを用いた。また、ＢＬ＿Ｓ１では、上述したＰｒｏｐ＿ＤやＰｒｏｐ＿Ｕと同様に、ＬＳＴＭを用いて入力された情報の分類を行うモデルを採用した。 BL_S1 used a model that was learned using the training session described above and adjusted using the adjustment session. In the case of BL_S1, a model for classifying information input using LSTM is employed, similarly to Prop_D and Prop_U described above.

また、ＢＬ＿Ｓ２では、トレーニング用セッションを用いて学習を行い、調整用セッションを用いて調整を行ったＳＶＭを用いた。また、ＢＬ＿Ｓ２では、上述したＢＬ＿Ｓ１で利用したモデルのＬＳＴＭ層の出力をＳＶＭに入力し、ＳＶＭの出力を分類結果として採用した。 Further, in BL_S2, SVM in which learning was performed using a training session and adjustment was performed using an adjustment session was used. In BL_S2, the output of the LSTM layer of the model used in BL_S1 is input to SVM, and the output of SVM is adopted as the classification result.

また、ＢＬ＿ＳＳでは、トレーニング用セッションに加え、ラベルが付与されていない百万個のセッションを用いて学習が行われたトランスダクティブＳＶＭ（ＴＳＶＭ）を用いた。また、調整用セッションおよびテスト用セッションは、ラベルを無視することで、ラベルが付与されていないセッションとして学習に利用した。 In addition, in BL_SS, in addition to the training session, a transductive SVM (TSVM) trained using one million unlabeled sessions was used. Further, the adjustment session and the test session were used for learning as sessions without labels by ignoring the labels.

ここで、第１実験においては、上述した各種の手法によりＦＬ１〜ＦＬ４に分類されたセッションの分類を行い、分類結果のアキュラシー（正確性）、プレシジョン（適合率）、リコール（再現率）、および適合率と再現率との調和平均であるＦ１値を算出した。図４０は、第１実験の結果を示す図である。なお、図４０に示す例では、Ｐｒｏｐ＿Ｄとの有意差がある手法にアスタリスクを、Ｐｒｏｐ＿Ｕとの有意差がある手法にシャープを付した。図４０に示すように、Ｆ１値を比較対象とした場合、ＦＬ１、ＦＬ２、およびＦＬ３に分類されたセッションデータの分類結果においては、Ｐｒｏｐ＿Ｄによる分類結果の精度が最も良くなった。 Here, in the first experiment, the sessions classified into FL1 to FL4 were classified by each of the methods described above, and the accuracy, precision, recall, and F1 value (the harmonic mean of precision and recall) of the classification results were calculated. FIG. 40 is a diagram showing the results of the first experiment. In FIG. 40, an asterisk marks methods that differ significantly from Prop_D, and a sharp sign (#) marks methods that differ significantly from Prop_U. As shown in FIG. 40, when the F1 value is used for comparison, Prop_D gave the most accurate classification results for the session data classified into FL1, FL2, and FL3.
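The four metrics named above have standard definitions. The following sketch computes them for a binary task, assuming "Yes" (low satisfaction) is the positive class; the function name and the label lists are illustrative only.

```python
from typing import Dict, List


def binary_metrics(
    gold: List[str], pred: List[str], positive: str = "Yes"
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = sum(1 for g, p in pairs if g != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```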

〔６−３．第２実験について〕 [6-3. Second experiment]
続いて、上述した各モデルが精度にどれくらい寄与しているかを判断するため、以下の第２実験を行った。第２実験においては、Ｐｒｏｐ＿Ｄにおいて繰り返し評価モデルＭ１を取り除いたｗ／ｏ−Ｒｅｐ、Ｐｒｏｐ＿Ｄにおいて謝罪評価モデルＭ２を取り除いたｗ／ｏ−Ａｐｏ、Ｐｒｏｐ＿Ｄにおいて非好意的指標発話評価モデルＭ３を取り除いたｗ／ｏ−Ｕｎｆ、Ｐｒｏｐ＿Ｄにおいて好意的指標発話評価モデルＭ４を取り除いたｗ／ｏ−Ｆａｖという方式を設定した。 Next, the following second experiment was performed to determine how much each of the models described above contributes to the accuracy. In the second experiment, four variants of Prop_D were set up: w/o-Rep, in which the repetition evaluation model M1 is removed; w/o-Apo, in which the apology evaluation model M2 is removed; w/o-Unf, in which the unfavorable index utterance evaluation model M3 is removed; and w/o-Fav, in which the favorable index utterance evaluation model M4 is removed.

また、第２実験においては、図１に示す４つのモデルＭ１〜Ｍ４を利用しないｗ／ｏ−ＲＡＵＦという方式を設定した。ｗ／ｏ−ＲＡＵＦでは、５０万個の高満足セッションと５０万個の低満足セッションとを学習データとし、入力されたセッションを高満足セッション（すなわち、ラベルが「Ｎｏ」のセッション）もしくは低満足セッション（すなわち、ラベルが「Ｙｅｓ」のセッション）に分類するモデルの学習を行い、学習したモデルを用いてセッションの分類を行った。 In the second experiment, a variant called w/o-RAUF, which uses none of the four models M1 to M4 shown in FIG. 1, was also set up. In w/o-RAUF, 500,000 high-satisfaction sessions and 500,000 low-satisfaction sessions were used as learning data to train a model that classifies an input session as either a high-satisfaction session (that is, a session labeled "No") or a low-satisfaction session (that is, a session labeled "Yes"), and the trained model was used to classify the sessions.

また、第２実験においては、ｗ／ｏ−ＣａｎｄＲｅｔという方式を設定した。ｗ／ｏ−ＣａｎｄＲｅｔでは、予め取得されたセッションの履歴に対し、上述した各モデルＭ１〜Ｍ４を用いて不満足度スコアを算出し、算出したスコアが多い方から順に５０万個のセッションをラベルが「Ｙｅｓ」のセッションとして抽出し、算出したスコアが低い方から順に５０万個のセッションをラベルが「Ｎｏ」のセッションとして抽出した。そして、ｗ／ｏ−ＣａｎｄＲｅｔでは、生成した学習データを用いて、入力されたセッションを分類するモデルの学習を行い、学習したモデルを用いてセッションの分類を行った。 In the second experiment, a variant called w/o-CandRet was also set up. In w/o-CandRet, dissatisfaction scores were calculated for a previously acquired session history using the models M1 to M4 described above; the 500,000 sessions with the highest scores were extracted as sessions labeled "Yes", and the 500,000 sessions with the lowest scores were extracted as sessions labeled "No". Then, in w/o-CandRet, a model that classifies input sessions was trained on the learning data generated in this way, and the trained model was used to classify the sessions.

そして、第２実験においても、上述した各種の手法によりＦＬ１〜ＦＬ４に分類されたセッションの分類を行い、分類結果のアキュラシー（正確性）、プレシジョン（適合率）、リコール（再現率）、およびＦ１値を算出した。図４１は、第２実験の結果を示す図である。図４１に示すように、Ｆ１値を比較対象とした場合、Ｐｒｏｐ＿Ｄの精度が最も高く、また、各モデルＭ１〜Ｍ４のいずれかを利用しない場合に、精度が低下してしまうことが分かった。この結果、いずれのモデルＭ１〜Ｍ４も、セッションの分類精度の向上に寄与していることが分かった。 In the second experiment as well, the sessions classified into FL1 to FL4 were classified by each of the methods described above, and the accuracy, precision, recall, and F1 value of the classification results were calculated. FIG. 41 is a diagram showing the results of the second experiment. As shown in FIG. 41, when the F1 value is used for comparison, Prop_D has the highest accuracy, and the accuracy drops whenever any one of the models M1 to M4 is not used. This result shows that every one of the models M1 to M4 contributes to the session classification accuracy.

〔６−４．第３実験について〕 [6-4. Third experiment]
続いて、上述したＰｒｏｐ＿Ｄにおいて、分類精度に対する学習データの量の寄与を判断するため、以下の第３実験を行った。なお、以下の説明では、使用した学習データの量を添え字のｋで示すものとする。例えば、Ｐｒｏｐ＿Ｄ＿ｋは、ｋ／２個の高満足セッションとｋ／２個の低満足セッションとを用いて各モデルＭ１〜Ｍ４を学習し、学習した各モデルＭ１〜Ｍ４を用いて、テストセッションの分類を行った旨を示す。また、第３実験においては、ｋの値を５０万（５００ｋ）、１００万（１Ｍ）、２００万（２Ｍ）、３００万（３Ｍ）、および４００万（４Ｍ）と変更することで、学習データの量の各手法における精度への寄与を調べた。 Next, the following third experiment was performed to determine how the amount of learning data contributes to the classification accuracy of Prop_D described above. In the following description, the amount of learning data used is indicated by the subscript k. For example, Prop_D_k indicates that the models M1 to M4 were trained using k/2 high-satisfaction sessions and k/2 low-satisfaction sessions, and that the trained models M1 to M4 were used to classify the test sessions. In the third experiment, the value of k was varied among 500,000 (500k), 1,000,000 (1M), 2,000,000 (2M), 3,000,000 (3M), and 4,000,000 (4M) to examine how the amount of learning data contributes to the accuracy.

図４２は、第３実験におけるＰｒｏｐ＿Ｄの結果を示す図である。なお、図４２では、ｋを１Ｍとした際との有意差を有する結果にアスタリスクを付与した。図４２に示すように、Ｐｒｏｐ＿Ｄにおいては、テストセッションの分類に係わらず、ｋ＝２Ｍとした場合に最も精度が高くなることがわかった。これは、学習データが２００万よりも少ないと精度の向上の余地があり、学習データが２００万を超えると過学習が生じてしまうためとも考えられる。 FIG. 42 is a diagram showing the result of Prop_D in the third experiment. In FIG. 42, an asterisk is added to a result having a significant difference from the case where k is set to 1M. As shown in FIG. 42, in Prop_D, it was found that the accuracy was highest when k = 2M, regardless of the classification of the test session. This may be because if the learning data is less than 2 million, there is room for improvement in accuracy, and if the learning data exceeds 2 million, over-learning occurs.

〔６−５．第４実験について〕 [6-5. Fourth experiment]
ここで、上述したＰｒｏｐ＿Ｄでは、繰り返し発話や謝罪、および指標発話に基づく評価を行うモデルとして、ニューラルネットワークを利用していた。ここで、各認識にニューラルネットワークを利用することの効果を検査するため、以下の第４実験を行った。例えば、第４実験では、Ｐｒｏｐ＿Ｄの手法において、ニューラルネットワークの代わりに、編集距離を用いて繰り返し発話に基づくセッションの評価を行い、繰り返し発話や指標発話に基づく評価を文字列マッチングにより行う手法をＰｒｏｐ＿Ｓとして実行した。 Here, in Prop_D described above, neural networks were used as the models that perform evaluation based on repeated utterances, apologies, and index utterances. To examine the effect of using a neural network for each of these recognition tasks, the following fourth experiment was performed. Specifically, in the fourth experiment, a variant of the Prop_D method was run as Prop_S in which, instead of neural networks, sessions were evaluated for repeated utterances using edit distance, and evaluation based on repeated utterances and index utterances was performed by character string matching.
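As a rough illustration of the edit-distance baseline in Prop_S, the sketch below computes the Levenshtein distance between consecutive user utterances and counts near-duplicates. The normalization by the longer utterance and the 0.3 threshold are assumptions for illustration; the description does not state how Prop_S turns edit distance into a session evaluation.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def count_repeats(utterances: List[str], max_ratio: float = 0.3) -> int:
    """Count consecutive utterance pairs that are near-duplicates.

    max_ratio is a hypothetical threshold on distance / longer length.
    """
    repeats = 0
    for first, second in zip(utterances, utterances[1:]):
        longest = max(len(first), len(second))
        if longest == 0:
            continue
        if edit_distance(first, second) / longest <= max_ratio:
            repeats += 1
    return repeats
```

A session with many such repeats would be scored as more likely unsatisfying, in line with the reasoning the description gives for repeated utterances.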

図４３は、第４実験におけるＰｒｏｐ＿Ｓの結果を示す図である。なお、図４３では、ＦＬ２、ＦＬ３、ＦＬ４において、Ｐｒｏｐ＿Ｄと有意差を有する結果が得られた。図４０、図４１に示すＰｒｏｐ＿Ｄの結果と図４３に示すＰｒｏｐ＿Ｓの結果とを比較すると、Ｐｒｏｐ＿Ｓにおいては、Ｐｒｏｐ＿Ｄよりも適合率が高く、再現率およびＦ１値が低くなっている。すなわち、Ｐｒｏｐ＿Ｓは、Ｐｒｏｐ＿Ｄよりも精度が低くなっていると考えられる。このため、Ｐｒｏｐ＿Ｄにおいて、セッションを評価する際にニューラルネットワークを利用することで、精度の向上を実現することができる。 FIG. 43 is a diagram showing the result of Prop_S in the fourth experiment. Note that in FIG. 43, results having significant differences from Prop_D were obtained in FL2, FL3, and FL4. Comparing the results of Prop_D shown in FIGS. 40 and 41 with the results of Prop_S shown in FIG. 43, Prop_S has a higher precision and lower recall and F1 value than Prop_D. That is, Prop_S is considered to have lower accuracy than Prop_D. For this reason, in Prop_D, accuracy can be improved by using a neural network when evaluating a session.

〔６−６．第５実験について〕 [6-6. Fifth experiment]
次に、Ｐｒｏｐ＿Ｄにおいて評価に用いられる各モデルの出力を融合させ、汎化能力（未学習データに対する予測能力）を向上させたモデルを用いて、セッションの評価を行う第５実験を行った。より具体的には、第５実験においては、アンサンブルの手法を用いて、Ｐｒｏｐ＿Ｄの精度をさらに向上させた手法をＰｒｏｐ＿Ｅとして実験した。 Next, a fifth experiment was performed in which sessions were evaluated using a model that fuses the outputs of the models used for evaluation in Prop_D in order to improve generalization ability (the ability to make predictions on unseen data). More specifically, in the fifth experiment, a method that uses an ensemble technique to further improve the accuracy of Prop_D was tested as Prop_E.

図４４は、第５実験におけるＰｒｏｐ＿Ｅの結果を示す図である。なお、図４４では、Ｐｒｏｐ＿Ｅの結果と、Ｐｒｏｐ＿Ｄ＿２Ｍの結果とを並べて記載した。図４４に示すように、Ｐｒｏｐ＿Ｅの結果は、Ｐｒｏｐ＿Ｄ＿２Ｍの結果と比較して、Ｆ１値が高くなっている。このため、Ｐｒｏｐ＿Ｅのように、アンサンブルの手法を用いることで、セッションを評価する精度をさらに向上させることができる。 FIG. 44 is a diagram showing the result of Prop_E in the fifth experiment. In FIG. 44, the result of Prop_E and the result of Prop_D_2M are described side by side. As shown in FIG. 44, the result of Prop_E has a higher F1 value than the result of Prop_D_2M. For this reason, by using an ensemble method like Prop_E, the accuracy of evaluating a session can be further improved.
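The description does not say which ensemble technique Prop_E uses. As one common possibility only, the sketch below averages the per-session scores of several models; the function name and the choice of simple averaging are assumptions, not the method of the experiment.

```python
from typing import List


def ensemble_average(scores_per_model: List[List[float]]) -> List[float]:
    """Average per-session scores across models (one inner list per model)."""
    if not scores_per_model:
        return []
    n_sessions = len(scores_per_model[0])
    if any(len(scores) != n_sessions for scores in scores_per_model):
        raise ValueError("every model must score the same sessions")
    n_models = len(scores_per_model)
    # zip(*...) regroups the scores so that each tuple belongs to one session.
    return [sum(session) / n_models for session in zip(*scores_per_model)]
```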

〔７．実施形態の効果について〕 [7. Effects of the embodiment]
以下、上述した各処理を実行する情報提供装置１０の効果の一例について説明する。 Hereinafter, an example of the effects of the information providing apparatus 10 that executes the processes described above will be described.

〔７−１．利用態様に基づく学習データの抽出について〕 [7-1. Extraction of learning data based on usage mode]
上述したように、情報提供装置１０は、利用者による対話サービスの利用態様を特定する。そして、情報提供装置１０は、対話サービスにおける発話と応答とを含むセッションを評価するための学習データとして、特定された利用態様が所定の条件を満たす利用者の発話を含むセッションを対話サービスの履歴から抽出する。 As described above, the information providing apparatus 10 specifies the usage mode of the interactive service by each user. Then, as learning data for evaluating sessions that contain utterances and responses in the interactive service, the information providing apparatus 10 extracts, from the history of the interactive service, sessions that contain utterances of users whose specified usage mode satisfies a predetermined condition.

このように、情報提供装置１０は、各利用者のエンゲージメントに基づいて、セッションを評価するモデルを学習するための学習データを抽出する。このため、情報提供装置１０は、例えば、セッションに対して利用者が満足しているか否かを判定するモデルを学習するための学習データを、人手を介さずに自動で抽出することができる。このような処理の結果、情報提供装置１０は、多くの学習データを容易に準備することができるので、モデルの精度を向上させる結果、セッションの評価精度を改善することができる。 As described above, the information providing apparatus 10 extracts learning data for learning a model for evaluating a session based on the engagement of each user. Therefore, the information providing apparatus 10 can automatically extract, for example, learning data for learning a model for determining whether or not the user is satisfied with the session without human intervention. As a result of such processing, the information providing apparatus 10 can easily prepare a large amount of learning data. As a result, the accuracy of the model can be improved, and as a result, the evaluation accuracy of the session can be improved.

また、情報提供装置１０は、利用態様として、利用者による対話サービスの利用期間を特定し、利用期間が所定の条件を満たす利用者の発話を含むセッションを学習データとして抽出する。また、情報提供装置１０は、利用態様として、利用者による対話サービスの利用頻度を特定し、利用頻度が所定の条件を満たす利用者の発話を含むセッションを学習データとして抽出する。このため、情報提供装置１０は、例えば、利用者の満足度が低いセッションを抽出することを目的として学習データを抽出する場合、エンゲージメントが高い利用者、すなわち、対話サービスに対する評価が高い利用者のセッションを不例として抽出し、エンゲージメントが低い利用者、すなわち、対話サービスに対する評価が低い利用者のセッションを正例として抽出するので、学習に適したデータを容易に準備することができる結果、セッションの評価精度を改善することができる。 In addition, the information providing apparatus 10 specifies, as the usage mode, the period over which a user has used the interactive service, and extracts sessions containing utterances of users whose usage period satisfies a predetermined condition as learning data. The information providing apparatus 10 also specifies, as the usage mode, how frequently a user uses the interactive service, and extracts sessions containing utterances of users whose usage frequency satisfies a predetermined condition as learning data. Therefore, for example, when extracting learning data for the purpose of identifying sessions with low user satisfaction, the information providing apparatus 10 extracts the sessions of highly engaged users, that is, users who rate the interactive service highly, as negative examples, and extracts the sessions of users with low engagement, that is, users who rate the interactive service poorly, as positive examples. Data suitable for learning can thus be prepared easily, and as a result the evaluation accuracy of sessions can be improved.
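The engagement-based extraction can be sketched as follows. The `UserLog` record, its fields, the thresholds, and the decision to require both conditions at once are hypothetical; the description only says that usage period and usage frequency are compared against predetermined conditions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UserLog:
    user_id: str
    usage_days: int            # length of the usage period
    sessions_per_week: float   # usage frequency
    sessions: List[str]        # session identifiers for this user


def split_by_engagement(
    logs: List[UserLog], min_days: int = 30, min_per_week: float = 3.0
) -> Tuple[List[str], List[str]]:
    """Return (positive_examples, negative_examples).

    The target class is "low satisfaction", so sessions of low-engagement
    users are positives and sessions of high-engagement users are negatives.
    """
    positives: List[str] = []
    negatives: List[str] = []
    for log in logs:
        engaged = (log.usage_days >= min_days
                   and log.sessions_per_week >= min_per_week)
        (negatives if engaged else positives).extend(log.sessions)
    return positives, negatives
```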

また、情報提供装置１０は、抽出されたセッションが有する特徴をモデルに学習させる。このため、情報提供装置１０は、セッションの評価精度を改善することができる。 Further, the information providing apparatus 10 causes the model to learn features of the extracted session. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of the session.

また、情報提供装置１０は、利用態様が所定の条件を満たす利用者の発話を含むセッションを第１属性を有する学習データとして抽出するとともに、利用態様が所定の条件を満たさない利用者の発話を含むセッションを第２属性を有する学習データとして抽出する。そして、情報提供装置１０は、抽出された学習データを用いて、入力されたセッションを第１属性を有するデータ若しくは第２属性を有するデータのいずれかに分類するモデルを学習する。 In addition, the information providing apparatus 10 extracts sessions containing utterances of users whose usage mode satisfies a predetermined condition as learning data having a first attribute, and extracts sessions containing utterances of users whose usage mode does not satisfy the predetermined condition as learning data having a second attribute. Then, using the extracted learning data, the information providing apparatus 10 trains a model that classifies an input session as either data having the first attribute or data having the second attribute.

例えば、情報提供装置１０は、利用態様が所定の条件を満たす利用者の発話を含むセッションを、その利用者による評価が好意的な学習データとして抽出するとともに、利用態様が所定の条件を満たさない利用者の発話を含むセッションをその利用者による評価が非好意的な学習データとして抽出する。そして、情報提供装置１０は、抽出された学習データを用いて、入力されたセッションに対する利用者の評価が好意的であるか非好意的であるかを判定するモデルを学習する。このため、情報提供装置１０は、セッションの評価精度を改善することができる。 For example, the information providing apparatus 10 extracts a session including an utterance of a user whose use mode satisfies a predetermined condition as learning data favorably evaluated by the user, and the use mode does not satisfy the predetermined condition. The session including the utterance of the user is extracted as learning data whose evaluation by the user is unfavorable. Then, the information providing apparatus 10 learns a model that determines whether the user's evaluation of the input session is favorable or unfavorable using the extracted learning data. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of the session.

また、情報提供装置１０は、特定された利用態様に基づいて、利用者の発話を含むセッションを評価する。そして、情報提供装置１０は、評価が所定の条件を満たすセッションを学習データとして抽出する。このため、情報提供装置１０は、セッションの評価精度を改善することができる。 In addition, the information providing apparatus 10 evaluates the session including the utterance of the user based on the specified usage mode. Then, the information providing apparatus 10 extracts, as learning data, a session whose evaluation satisfies a predetermined condition. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of the session.

また、情報提供装置１０は、評価が所定の条件を満たすセッションに含まれる発話と類似する発話、若しくは、そのセッションに含まれる応答と類似する応答を対話サービスの履歴から抽出し、抽出した発話若しくは応答を用いた新たなセッションを評価が所定の条件を満たすセッションとして生成する。そして、情報提供装置１０は、生成された新たなセッションが有する特徴をモデルに学習させる。このように、情報提供装置１０は、利用態様が所定の条件を満たすセッションから、類似するセッションを学習データとして生成し、生成した学習データを用いてモデルの学習を行う。このため、情報提供装置１０は、モデルの判定精度を向上させる結果、セッションの評価精度を改善することができる。 Further, the information providing apparatus 10 extracts, from the history of the interactive service, an utterance similar to an utterance included in a session whose evaluation satisfies a predetermined condition, or a response similar to a response included in the session, and extracts the extracted utterance or A new session using the response is generated as a session whose evaluation satisfies a predetermined condition. Then, the information providing apparatus 10 causes the model to learn features of the generated new session. As described above, the information providing apparatus 10 generates a similar session as learning data from a session whose use mode satisfies a predetermined condition, and performs model learning using the generated learning data. For this reason, the information providing apparatus 10 can improve the session evaluation accuracy as a result of improving the model determination accuracy.

また、情報提供装置１０は、評価が所定の条件を満たすセッションを対話サービスの履歴から抽出し、抽出したセッションに含まれる発話のうち出現頻度が所定の条件を満たす発話を、その発話が含まれるセッションに対する利用者の評価が好意的であるか否かの指標となる指標発話として抽出する。例えば、情報提供装置１０は、評価が所定の条件を満たすセッションにおける出現頻度と、評価が所定の条件を満たさないセッションにおける出現頻度との比が所定の条件を満たす発話を指標発話として抽出する。このため、情報提供装置１０は、利用者の利用態様に基づいて、指標発話の抽出を行うことができる。なお、このような指標発話は、セッションの評価に利用することができる。このため、情報提供装置１０は、指標発話に基づいたセッションの評価精度を改善することができる。 In addition, the information providing apparatus 10 extracts a session whose evaluation satisfies a predetermined condition from the history of the interactive service, and among the utterances included in the extracted session, the utterance whose appearance frequency satisfies the predetermined condition includes the utterance. It is extracted as an index utterance that is an index of whether or not the user's evaluation of the session is favorable. For example, the information providing apparatus 10 extracts, as an index utterance, an utterance in which a ratio of an appearance frequency in a session whose evaluation satisfies a predetermined condition and an appearance frequency in a session whose evaluation does not satisfy a predetermined condition satisfies a predetermined condition. For this reason, the information providing apparatus 10 can extract the index utterance based on the usage mode of the user. Note that such an index utterance can be used for evaluating a session. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of the session based on the index utterance.
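The frequency-ratio criterion for index utterances can be illustrated with a short sketch. The add-one smoothing, the minimum count, and the ratio threshold are assumptions added so that the example is well defined; the description only requires that the ratio of the two appearance frequencies satisfy a predetermined condition.

```python
from collections import Counter
from typing import List


def extract_index_utterances(
    satisfying: List[List[str]],
    other: List[List[str]],
    min_ratio: float = 3.0,
    min_count: int = 2,
    smoothing: float = 1.0,
) -> List[str]:
    """Utterances that are much more frequent in `satisfying` sessions.

    Each session is a list of user utterances. An utterance is kept when
    (count_in + smoothing) / (count_out + smoothing) >= min_ratio.
    """
    in_counts = Counter(u for session in satisfying for u in session)
    out_counts = Counter(u for session in other for u in session)
    selected = []
    for utterance, count in in_counts.items():
        if count < min_count:
            continue  # too rare to be a reliable indicator
        ratio = (count + smoothing) / (out_counts[utterance] + smoothing)
        if ratio >= min_ratio:
            selected.append(utterance)
    return sorted(selected)
```

Passing low-satisfaction sessions as `satisfying` would yield candidates for unfavorable index utterances, and passing high-satisfaction sessions would yield candidates for favorable ones.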

また、情報提供装置１０は、利用者の利用態様に加えて、謝罪を示す所定の応答がセッションに含まれているか否かに基づいて、そのセッションを評価する。また、情報提供装置１０は、利用者の利用態様に加えて、セッションに対する利用者の評価が好意的であるか否かの指標となる指標発話が含まれているか否かに基づいて、そのセッションを評価する。このため、情報提供装置１０は、セッションの評価精度をさらに改善することができる。 In addition to the usage mode of the user, the information providing apparatus 10 evaluates a session based on whether the session contains a predetermined response indicating an apology. Likewise, in addition to the usage mode of the user, the information providing apparatus 10 evaluates a session based on whether the session contains an index utterance, that is, an utterance that indicates whether the user's evaluation of the session is favorable. For this reason, the information providing apparatus 10 can further improve the evaluation accuracy of sessions.

また、情報提供装置１０は、利用者による対話サービスの利用態様を特定し、特定された利用態様に基づいて、その利用者の発話を含むセッションを評価する。このため、情報提供装置１０は、人手を介さずにセッションを適切に評価することができる。 In addition, the information providing apparatus 10 specifies a usage mode of the interactive service by the user, and evaluates a session including the utterance of the user based on the specified usage mode. For this reason, the information providing apparatus 10 can appropriately evaluate the session without manual intervention.

〔７−２．利用態様に基づく指標発話の抽出について〕 [7-2. Extraction of index utterances based on usage mode]
また、情報提供装置１０は、対話サービスの利用態様が所定の条件を満たす利用者の発話とその発話に対する応答とを含むセッションを取得する。そして、情報提供装置１０は、取得されたセッションにおける出現頻度に基づいて、対話サービスに対する利用者の評価の指標となる指標発話を抽出する。このような指標発話は、セッションの評価に利用することができる。このため、情報提供装置１０は、セッションの評価精度を改善することができる。 In addition, the information providing apparatus 10 acquires sessions that contain utterances of users whose usage mode of the interactive service satisfies a predetermined condition, together with the responses to those utterances. Then, based on appearance frequencies in the acquired sessions, the information providing apparatus 10 extracts index utterances, which serve as indicators of the user's evaluation of the interactive service. Such index utterances can be used to evaluate sessions. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of sessions.

また、情報提供装置１０は、対話サービスの利用期間が所定の条件を満たす利用者の発話を含むセッションを取得する。また、情報提供装置１０は、対話サービスの利用頻度が所定の条件を満たす利用者の発話を含むセッションを取得する。そして、情報提供装置１０は、利用態様が所定の条件を満たす利用者の発話を含むセッションにおける出現頻度と、利用態様が所定の条件を満たさない利用者の発話を含むセッションにおける出現頻度との比が所定の条件を満たす発話を指標発話として抽出する。 In addition, the information providing apparatus 10 acquires sessions containing utterances of users whose usage period of the interactive service satisfies a predetermined condition. The information providing apparatus 10 also acquires sessions containing utterances of users whose usage frequency of the interactive service satisfies a predetermined condition. Then, the information providing apparatus 10 extracts, as an index utterance, any utterance for which the ratio between its appearance frequency in sessions of users whose usage mode satisfies the predetermined condition and its appearance frequency in sessions of users whose usage mode does not satisfy the predetermined condition itself satisfies a predetermined condition.

また、情報提供装置１０は、利用態様が所定の条件を満たす利用者の発話を含むセッションにおける出現頻度が所定の閾値を超える発話を、対話サービスを利用者が好意的に評価している旨を示す好意的指標発話として抽出する。また、情報提供装置１０は、利用態様が所定の条件を満たさない利用者の発話を含むセッションにおける出現頻度が所定の閾値を超える発話を、対話サービスを利用者が非好意的に評価している旨を示す非好意的指標発話として抽出する。これらの処理の結果、情報提供装置１０は、利用者の発話から指標発話を適切に抽出することができるので、セッションの評価精度を改善することができる。 Further, the information providing apparatus 10 indicates that the user favorably evaluates the interactive service for an utterance whose appearance frequency in a session including the utterance of the user whose usage mode satisfies the predetermined condition exceeds a predetermined threshold. It is extracted as a favorable index utterance shown. In the information providing apparatus 10, the user unfavorably evaluates the dialogue service for an utterance whose appearance frequency in a session including an utterance of a user whose use mode does not satisfy the predetermined condition exceeds a predetermined threshold. It is extracted as an unfavorable index utterance indicating the effect. As a result of these processes, the information providing apparatus 10 can appropriately extract the index utterance from the utterance of the user, so that the evaluation accuracy of the session can be improved.

また、情報提供装置１０は、抽出された指標発話の出現頻度に基づいて、セッションを評価する。このため、情報提供装置１０は、セッションの評価精度を改善することができる。 Further, the information providing apparatus 10 evaluates the session based on the appearance frequency of the extracted index utterance. For this reason, the information providing apparatus 10 can improve the evaluation accuracy of the session.

また、情報提供装置１０は、評価対象となるセッションについて、利用態様が所定の条件を満たす利用者の発話を含むセッションにおける出現頻度が所定の閾値を超える発話と、利用態様が所定の条件を満たさない利用者の発話を含むセッションにおける出現頻度が所定の閾値を超える発話と基づいて、評価対象となるセッションを評価する。このため、情報提供装置１０は、適切にセッションを評価することができる。 In addition, the information providing apparatus 10 evaluates a session to be evaluated based on two kinds of utterances: utterances whose appearance frequency exceeds a predetermined threshold in sessions of users whose usage mode satisfies the predetermined condition, and utterances whose appearance frequency exceeds a predetermined threshold in sessions of users whose usage mode does not satisfy the predetermined condition. For this reason, the information providing apparatus 10 can evaluate sessions appropriately.

また、情報提供装置１０は、評価が所定の条件を満たすセッションが有する特徴をモデルに学習させる。このため、情報提供装置１０は、セッションを評価するモデルを実現することができる。 Further, the information providing apparatus 10 causes the model to learn features of a session whose evaluation satisfies a predetermined condition. For this reason, the information providing apparatus 10 can realize a model for evaluating a session.

また、情報提供装置１０は、対話サービスにおける履歴から抽出された指標発話を特定し、特定した指標発話に対応する一連の対話を学習データとして抽出する。そして、情報提供装置１０は、抽出された学習データの特徴をモデルに学習させる。例えば、情報提供装置１０は、非好意的対話例ＵＤや好意的対話例ＦＤの特徴をモデルに学習させる。このため、情報提供装置１０は、セッションを評価するモデルを実現することができる。 Further, the information providing apparatus 10 specifies the index utterance extracted from the history in the dialog service, and extracts a series of dialogs corresponding to the specified index utterance as learning data. Then, the information providing apparatus 10 causes the model to learn the features of the extracted learning data. For example, the information providing apparatus 10 causes the model to learn characteristics of the unfavorable dialogue example UD and the favorable dialogue example FD. For this reason, the information providing apparatus 10 can realize a model for evaluating a session.

〔７−３．指標発話に基づくセッションの評価について〕 [7-3. Evaluation of sessions based on index utterances]
また、情報提供装置１０は、対話サービスにおける利用者の発話と応答とを含むセッションを取得する。そして、情報提供装置１０は、対話サービスに対する利用者の評価の指標となる指標発話に基づいて、取得されたセッションを評価する。このため、情報提供装置１０は、セッションを自動的に精度良く学習することができる。 In addition, the information providing apparatus 10 acquires sessions that contain user utterances and responses in the interactive service. Then, the information providing apparatus 10 evaluates the acquired sessions based on index utterances, which serve as indicators of the user's evaluation of the interactive service. For this reason, the information providing apparatus 10 can learn sessions automatically and accurately.

また、情報提供装置１０は、対話サービスに対する利用者の評価が好意的である旨を示す好意的指標発話と、対話サービスに対する利用者の評価が非好意的である旨を示す非好意的指標発話とに基づいて、セッションを評価する。そして、例えば、情報提供装置１０は、好意的指標発話の出現頻度と、非好意的指標発話の出現頻度との比に基づいて、セッションを評価する。また、例えば、情報提供装置１０は、好意的指標発話が含まれるセッションにおいて対話サービスに対する利用者の評価がどれくらい好意的であるかを示す好意的スコアと、非好意的指標発話が含まれるセッションにおいて対話サービスに対する利用者の評価がどれくらい非好意的であるかを示す非好意的スコアとの比に基づいて、セッションを評価する。このため、情報提供装置１０は、セッションを自動的に精度良く学習することができる。 In addition, the information providing apparatus 10 evaluates a session based on favorable index utterances, which indicate that the user's evaluation of the interactive service is favorable, and unfavorable index utterances, which indicate that the user's evaluation of the interactive service is unfavorable. For example, the information providing apparatus 10 evaluates the session based on the ratio between the appearance frequency of favorable index utterances and the appearance frequency of unfavorable index utterances. As another example, the information providing apparatus 10 evaluates the session based on the ratio between a favorable score, which indicates how favorable the user's evaluation of the interactive service is in a session containing a favorable index utterance, and an unfavorable score, which indicates how unfavorable the user's evaluation of the interactive service is in a session containing an unfavorable index utterance. For this reason, the information providing apparatus 10 can learn sessions automatically and accurately.
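A frequency-ratio evaluation of this kind might look like the sketch below. Exact string matching against the index utterance sets and the add-one smoothing (which avoids division by zero) are assumptions; the description does not specify how the ratio is computed or thresholded.

```python
from typing import Iterable, Set


def favorability_ratio(
    utterances: Iterable[str],
    favorable: Set[str],
    unfavorable: Set[str],
    smoothing: float = 1.0,
) -> float:
    """Ratio of favorable to unfavorable index utterances in one session.

    Values above 1.0 lean favorable, values below 1.0 lean unfavorable.
    """
    utterance_list = list(utterances)
    n_favorable = sum(1 for u in utterance_list if u in favorable)
    n_unfavorable = sum(1 for u in utterance_list if u in unfavorable)
    return (n_favorable + smoothing) / (n_unfavorable + smoothing)
```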

また、情報提供装置１０は、利用者による評価が好意的な高満足セッションであるか否かを評価する。そして、情報提供装置１０は、高満足セッションが有する特徴をモデルに学習させる。また、情報提供装置１０は、利用者による評価が非好意的な低満足セッションであるか否かを評価する。そして、情報提供装置１０は、低満足セッションが有する特徴をモデルに学習させる。 In addition, the information providing apparatus 10 evaluates whether the evaluation by the user is a favorable high satisfaction session. Then, the information providing apparatus 10 causes the model to learn the features of the high satisfaction session. Further, the information providing apparatus 10 evaluates whether or not the evaluation by the user is an unfavorable low satisfaction session. Then, the information providing apparatus 10 causes the model to learn the features of the low satisfaction session.

また、情報提供装置１０は、対話サービスにおける履歴から指標発話を特定し、特定した指標発話から所定の数だけ前の応答および発話を学習データとして抽出する。そして、情報提供装置１０は、抽出された学習データの特徴をモデルに学習させる。例えば、情報提供装置１０は、対話サービスに対する利用者の評価が好意的である旨を示す好意的指標発話から所定の数だけ前の応答および発話を好意的対話例として抽出し、対話サービスに対する利用者の評価が非好意的である旨を示す非好意的指標発話から所定の数だけ前の応答および発話を非好意的対話例として抽出し、対話サービスに対する利用者の評価が好意的でも非好意的でもない旨を示す中性発話から所定の数だけ前の応答および発話を中性対話例として抽出する。そして、情報提供装置１０は、抽出された好意的対話例と、非好意的対話例と、中性対話例とが有する特徴をモデルに学習させる。 In addition, the information providing apparatus 10 identifies index utterances in the history of the interactive service, and extracts a predetermined number of responses and utterances preceding each identified index utterance as learning data. Then, the information providing apparatus 10 causes the model to learn the features of the extracted learning data. For example, the information providing apparatus 10 extracts the predetermined number of responses and utterances preceding a favorable index utterance, which indicates that the user's evaluation of the interactive service is favorable, as a favorable dialogue example; extracts the predetermined number of responses and utterances preceding an unfavorable index utterance, which indicates that the user's evaluation is unfavorable, as an unfavorable dialogue example; and extracts the predetermined number of responses and utterances preceding a neutral utterance, which indicates that the user's evaluation is neither favorable nor unfavorable, as a neutral dialogue example. Then, the information providing apparatus 10 causes the model to learn the features of the extracted favorable, unfavorable, and neutral dialogue examples.
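The extraction of the turns that precede an index utterance can be sketched as follows. The `(speaker, text)` turn representation, the speaker label `"user"`, and the default window of two turns are hypothetical; the description only refers to a predetermined number of preceding responses and utterances.

```python
from typing import List, Set, Tuple

Turn = Tuple[str, str]  # (speaker, text), speaker is "user" or "system"


def extract_dialogue_examples(
    turns: List[Turn], index_utterances: Set[str], window: int = 2
) -> List[List[Turn]]:
    """Collect the `window` turns before each user turn that is an index utterance."""
    examples = []
    for i, (speaker, text) in enumerate(turns):
        if speaker == "user" and text in index_utterances:
            start = max(0, i - window)
            if start < i:  # skip index utterances with no preceding context
                examples.append(turns[start:i])
    return examples
```

Running this once with favorable, once with unfavorable, and once with neutral utterance sets would give the three kinds of dialogue examples described above.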

また、情報提供装置１０は、指標発話と、謝罪を示す所定の応答がセッションに含まれているか否かに基づいて、そのセッションを評価する。上述した各種の処理の結果、情報提供装置１０は、様々な観点からセッションを適切に評価するモデルを学習することができる。この結果、情報提供装置１０は、セッションの評価精度を向上させることができる。 Further, the information providing apparatus 10 evaluates the session based on whether the session includes an index utterance and a predetermined response indicating an apology. As a result of the various processes described above, the information providing apparatus 10 can learn a model that appropriately evaluates a session from various viewpoints. As a result, the information providing apparatus 10 can improve the evaluation accuracy of the session.

また、情報提供装置１０は、利用者の発話を含むセッションに対する評価結果に基づいて、その利用者による将来の対話サービスの利用態様を推定する。例えば、情報提供装置１０は、評価として、利用者による評価がどれくらい好意的であるかを示す好意的スコア、若しくは、利用者による評価がどれくらい非好意的であるかを示す非好意的スコアを算出し、好意的スコアと非好意的スコアとの比に基づいて、利用者による将来の対話サービスの利用態様を推定する。このため、情報提供装置１０は、利用者の利用態様を適切に推定することができる。 In addition, the information providing apparatus 10 estimates a future use mode of the interactive service by the user based on the evaluation result of the session including the utterance of the user. For example, the information providing apparatus 10 calculates, as the evaluation, a favorable score indicating how favorable the evaluation by the user is, or an unfavorable score indicating how unfavorable the evaluation by the user is. Then, based on a ratio between the favorable score and the non-favorable score, a future use mode of the interactive service by the user is estimated. For this reason, the information providing apparatus 10 can appropriately estimate the usage mode of the user.

The information providing apparatus 10 also extracts, as index utterances, utterances whose appearance frequency satisfies a predetermined condition in sessions whose evaluation satisfies a predetermined condition. For example, the information providing apparatus 10 evaluates whether the user's evaluation of each session is favorable or unfavorable, and extracts as an index utterance any utterance for which the ratio between its appearance frequency in sessions evaluated as favorable and its appearance frequency in sessions evaluated as unfavorable satisfies a predetermined condition. The information providing apparatus 10 can therefore appropriately extract new index utterances.
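A minimal sketch of this frequency-ratio extraction is given below, assuming sessions are lists of utterance strings that have already been labeled favorable or unfavorable. The function name, the add-one smoothing, and the `min_ratio` and `min_count` parameters are illustrative assumptions; the description only requires that the frequency ratio satisfy "a predetermined condition".

```python
from collections import Counter


def extract_index_utterances(favorable_sessions, unfavorable_sessions,
                             min_ratio=3.0, min_count=2):
    """Return (new_favorable, new_unfavorable) index utterance candidates."""
    # Count in how many sessions of each kind an utterance appears.
    fav = Counter(u for s in favorable_sessions for u in set(s))
    unfav = Counter(u for s in unfavorable_sessions for u in set(s))

    new_fav, new_unfav = [], []
    for u in sorted(set(fav) | set(unfav)):
        f, n = fav[u], unfav[u]
        # Add-one smoothing (an assumption) keeps the ratio finite.
        if f >= min_count and (f + 1) / (n + 1) >= min_ratio:
            new_fav.append(u)
        elif n >= min_count and (n + 1) / (f + 1) >= min_ratio:
            new_unfav.append(u)
    return new_fav, new_unfav


if __name__ == "__main__":
    favorable = [["thanks", "weather"], ["thanks", "news"], ["thanks"]]
    unfavorable = [["no that's wrong", "weather"], ["no that's wrong"],
                   ["weather", "no that's wrong"]]
    print(extract_index_utterances(favorable, unfavorable))
```

An utterance such as "weather", which appears in both kinds of session, is not extracted because its ratio stays close to 1.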

[7-4. Session evaluation based on distribution images]
The information providing apparatus 10 also acquires a session that includes a user's utterances in the interactive service and the responses to those utterances. The information providing apparatus 10 generates an image indicating whether the characters of the utterances included in the session are identical to one another, and evaluates the acquired session based on the generated image.

When a session repeatedly contains identical or similar utterances, the image described above contains a predetermined pattern. Conversely, when the image is estimated to contain the predetermined pattern, the session can be estimated to repeatedly contain identical or similar utterances. A session that repeatedly contains identical or similar utterances is presumed to be rated poorly by the user. The information providing apparatus 10 can therefore appropriately evaluate the acquired session based on the generated image.

As the image, the information providing apparatus 10 generates an image showing the distribution of identical characters in the session. For example, when the character string of the utterances included in the session is mapped to both the vertical regions and the horizontal regions of the image, the information providing apparatus 10 generates an image in which a predetermined color is applied to each region whose vertical and horizontal positions are mapped to the same character. Based on the image, the information providing apparatus 10 determines whether identical or similar character strings appear repeatedly in the session, and evaluates the session based on the determination result. For example, when identical or similar character strings appear repeatedly in a session, the information providing apparatus 10 assigns the session an evaluation indicating that the user's evaluation is unfavorable. The information providing apparatus 10 can therefore appropriately evaluate the session.
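The image construction described above resembles a character-level dot plot (self-similarity matrix). The following sketch builds such a matrix in plain Python. The function names are hypothetical, and `off_diagonal_density` is only a simple stand-in statistic introduced here for illustration; in the description, the image is judged by a learned model rather than by a hand-written density measure.

```python
def same_character_image(utterances):
    """Build a binary image: cell (i, j) is 1 when characters i and j match.

    The utterances of a session are concatenated and laid out along both
    the vertical and the horizontal axis.
    """
    chars = list("".join(utterances))
    return [[1 if a == b else 0 for b in chars] for a in chars]


def off_diagonal_density(image):
    """Fraction of marked cells off the main diagonal (illustrative only)."""
    n = len(image)
    if n < 2:
        return 0.0
    marked = sum(image[i][j] for i in range(n) for j in range(n) if i != j)
    return marked / (n * (n - 1))


if __name__ == "__main__":
    repeated = same_character_image(["ab", "ab"])
    varied = same_character_image(["abcd"])
    for row in repeated:
        print("".join("#" if cell else "." for cell in row))
    print(off_diagonal_density(repeated), off_diagonal_density(varied))
```

Repeated utterances show up as diagonal stripes parallel to the main diagonal, which is the kind of pattern the description refers to.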

The information providing apparatus 10 also causes a model to learn the features of images generated from sessions in which identical or similar character strings appear repeatedly. Using the trained model, the information providing apparatus 10 then evaluates, from an image, the session from which that image was generated. It is known that determination accuracy is easier to improve with a model trained on images than with a model trained on characters. The information providing apparatus 10 can therefore improve the accuracy of session evaluation.

The information providing apparatus 10 also selects an utterance from the history of the interactive service and generates an utterance group containing a plurality of utterances identical or similar to the selected utterance. The information providing apparatus 10 then generates an image from the generated utterance group and causes the model to learn the features of the generated image. Because this increases the amount of learning data for the model, the information providing apparatus 10 can improve the accuracy of the model.

[7-5. Session evaluation based on repeated utterances]
The information providing apparatus 10 also acquires a session that includes a user's utterances in the interactive service and the responses to those utterances. The information providing apparatus 10 then evaluates the acquired session based on repetition of the character strings of the utterances included in the session. The information providing apparatus 10 can therefore evaluate sessions accurately without manual intervention.

For example, the information providing apparatus 10 extracts from a session pairs of utterances whose character-string similarity exceeds a predetermined threshold, and evaluates the session based on the number of extracted pairs. The information providing apparatus 10 may also extract from a session pairs of utterances whose semantic similarity exceeds a predetermined threshold, and evaluate the session based on the number of extracted pairs. When the number of pairs extracted from a session exceeds a predetermined threshold, the information providing apparatus 10 assigns the session an evaluation indicating that the user's evaluation is unfavorable. The information providing apparatus 10 can therefore appropriately evaluate the session.
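The pair-counting evaluation above can be sketched with Python's standard `difflib.SequenceMatcher` as the character-string similarity measure. The choice of `SequenceMatcher`, the function names, and both threshold values are assumptions made for illustration; the description does not specify a particular similarity measure or thresholds.

```python
from difflib import SequenceMatcher
from itertools import combinations


def count_similar_pairs(utterances, similarity_threshold=0.8):
    """Count utterance pairs whose string similarity exceeds the threshold."""
    return sum(
        1
        for a, b in combinations(utterances, 2)
        if SequenceMatcher(None, a, b).ratio() > similarity_threshold
    )


def is_unfavorable(utterances, similarity_threshold=0.8, pair_threshold=1):
    """Flag the session as unfavorable when too many similar pairs exist."""
    return count_similar_pairs(utterances, similarity_threshold) > pair_threshold


if __name__ == "__main__":
    session = ["play some jazz", "play some jazz", "play some jazz!",
               "what time is it"]
    print(count_similar_pairs(session), is_unfavorable(session))
```

A semantic variant would replace the string ratio with a similarity between sentence embeddings, keeping the same pair-counting logic.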

The information providing apparatus 10 also evaluates a session using a model that has learned the features of sessions in which identical or similar character strings appear repeatedly. For example, the information providing apparatus 10 selects an utterance from the history of the interactive service and generates an utterance group containing a plurality of utterances identical or similar to the selected utterance. The information providing apparatus 10 causes the model to learn the features of the generated utterance group, and then evaluates sessions using the trained model.

The information providing apparatus 10 also generates a first utterance group containing a plurality of utterances identical or similar to the selected utterance, selects a plurality of mutually different utterances from the history of the interactive service to generate a second utterance group consisting of the selected utterances, and causes the model to learn the features of the first utterance group and the features of the second utterance group. By causing the model to learn the features of these utterance groups, the information providing apparatus 10 trains a model that determines whether the user's evaluation of an input session is unfavorable. The information providing apparatus 10 can therefore evaluate sessions accurately.
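One way to picture the first and second utterance groups is as automatically generated positive and negative training examples. The sketch below is a rough illustration under stated assumptions: the function name, the labels (1 for the repeated group, 0 for the group of distinct utterances), and the use of appended punctuation as a stand-in for "similar" utterances are all hypothetical and not taken from the description.

```python
import random


def make_utterance_groups(history, group_size=3, n_groups=4, seed=0):
    """Generate labeled utterance groups from an utterance history.

    Returns a list of (group, label) pairs, where label 1 marks a first
    utterance group (repetition) and label 0 marks a second utterance
    group (mutually different utterances).
    """
    rng = random.Random(seed)
    distinct = sorted(set(history))
    if len(distinct) < group_size:
        raise ValueError("history needs at least group_size distinct utterances")

    examples = []
    for _ in range(n_groups):
        base = rng.choice(distinct)
        # First group: the selected utterance plus trivially varied copies.
        first = [base] + [base + rng.choice(["", "!", "?"])
                          for _ in range(group_size - 1)]
        examples.append((first, 1))
        # Second group: utterances that all differ from one another.
        second = rng.sample(distinct, group_size)
        examples.append((second, 0))
    return examples


if __name__ == "__main__":
    history = ["play jazz", "call mom", "set alarm", "weather", "news"]
    for group, label in make_utterance_groups(history, n_groups=2):
        print(label, group)
```

The resulting pairs could then be fed to any binary classifier; the classifier itself is outside the scope of this sketch.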

The information providing apparatus 10 also extracts, from sessions whose evaluation satisfies a predetermined condition, utterances whose appearance frequency satisfies a predetermined condition, as index utterances that serve as an index of the user's evaluation of the interactive service. The information providing apparatus 10 can therefore extract new index utterances automatically.

[7-6. Reinforcement learning with utterances as rewards]
The information providing apparatus 10 also acquires, from among the utterances directed to a dialogue model that realizes an interactive service by generating responses to input utterances, an index utterance that serves as an index for evaluating the interactive service, and performs reinforcement learning of the dialogue model by setting a reward based on the acquired index utterance. The information providing apparatus 10 can therefore train the dialogue model that realizes the interactive service without manual intervention, and can improve the accuracy of the dialogue model.

The information providing apparatus 10 also acquires a favorable index utterance, which indicates that the user's evaluation of the interactive service is favorable, or an unfavorable index utterance, which indicates that the user's evaluation of the interactive service is unfavorable. When a favorable index utterance is acquired, the information providing apparatus 10 sets a positive reward for the dialogue model; when an unfavorable index utterance is acquired, it sets a negative reward for the dialogue model. The information providing apparatus 10 can therefore appropriately realize reinforcement learning of the dialogue model.
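The reward setting above can be sketched as a lookup from the user's next utterance to a signed reward. The index utterance sets, the reward magnitudes of +1 and -1, the zero reward for other utterances, and the toy preference update are all assumptions for illustration; the description only specifies positive rewards for favorable index utterances and negative rewards for unfavorable ones, and does not fix a particular reinforcement learning algorithm.

```python
# Hypothetical index utterance sets; real ones would come from the
# index utterance database described in the embodiment.
FAVORABLE = {"thank you", "great"}
UNFAVORABLE = {"that's wrong", "useless"}


def reward_for(user_utterance: str) -> float:
    """Map the utterance that follows a response to a reward."""
    text = user_utterance.strip().lower()
    if text in FAVORABLE:
        return 1.0
    if text in UNFAVORABLE:
        return -1.0
    return 0.0  # assumption: utterances that are not index utterances give no reward


def update_preferences(prefs: dict, response: str, reward: float,
                       lr: float = 0.1) -> dict:
    """Toy update: nudge a per-response preference value by the reward."""
    prefs[response] = prefs.get(response, 0.0) + lr * reward
    return prefs


if __name__ == "__main__":
    prefs = {}
    update_preferences(prefs, "Playing jazz.", reward_for("Thank you"))
    update_preferences(prefs, "Sorry?", reward_for("useless"))
    print(prefs)
```

A real dialogue model would replace the preference table with a policy whose parameters are updated from the same reward signal.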

The information providing apparatus 10 also acquires index utterances output by an utterance model, which outputs an index utterance in reaction to a response generated by the dialogue model. For example, the information providing apparatus 10 acquires index utterances output by an utterance model that has learned the features of the history of dialogue between users and the dialogue model in the interactive service. The information providing apparatus 10 can therefore realize reinforcement learning of the dialogue model even when, for example, no session history is available.

[7-7. Extraction of index utterances based on co-occurrence]
The information providing apparatus 10 also identifies, in the history of users' utterances in the interactive service and the responses to those utterances, preset index utterances, that is, utterances that serve as an index of the user's evaluation of the interactive service. The information providing apparatus 10 then extracts new index utterances from the utterances included in the history, based on their co-occurrence with the identified index utterances. As described above, such index utterances can be used to evaluate sessions. As a result, the information providing apparatus 10 can improve the accuracy of session evaluation.

The information providing apparatus 10 also identifies a favorable index utterance, which indicates that the user's evaluation of the interactive service is favorable, and extracts utterances whose co-occurrence with the favorable index utterance exceeds a predetermined threshold as new favorable index utterances. Similarly, the information providing apparatus 10 identifies an unfavorable index utterance, which indicates that the user's evaluation of the interactive service is unfavorable, and extracts utterances whose co-occurrence with the unfavorable index utterance exceeds a predetermined threshold as new unfavorable index utterances.

The information providing apparatus 10 also counts, for each user utterance, the number of times a favorable index utterance appears in the same session and the number of times an unfavorable index utterance appears in the same session, and extracts as a new index utterance any utterance for which the ratio of these counts satisfies a predetermined condition.
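The per-utterance counting described above can be sketched as follows, assuming each session is a list of user utterance strings and that seed sets of favorable and unfavorable index utterances are already known. The function names, the add-one smoothing, and the `min_ratio` value are illustrative assumptions standing in for the "predetermined condition".

```python
from collections import defaultdict


def cooccurrence_counts(sessions, favorable_seeds, unfavorable_seeds):
    """For each non-seed utterance, count co-occurring seed index utterances.

    Returns {utterance: [favorable_count, unfavorable_count]}.
    """
    counts = defaultdict(lambda: [0, 0])
    seeds = favorable_seeds | unfavorable_seeds
    for session in sessions:
        fav_hits = sum(1 for u in session if u in favorable_seeds)
        unfav_hits = sum(1 for u in session if u in unfavorable_seeds)
        for u in set(session) - seeds:
            counts[u][0] += fav_hits
            counts[u][1] += unfav_hits
    return dict(counts)


def classify_by_ratio(counts, min_ratio=2.0):
    """Label utterances whose smoothed count ratio is sufficiently skewed."""
    result = {}
    for u, (f, n) in counts.items():
        if (f + 1) / (n + 1) >= min_ratio:
            result[u] = "favorable"
        elif (n + 1) / (f + 1) >= min_ratio:
            result[u] = "unfavorable"
    return result


if __name__ == "__main__":
    sessions = [["nice one", "thanks"], ["nice one", "thanks", "thanks"],
                ["not that", "wrong"], ["not that", "wrong"],
                ["hello", "thanks"], ["hello", "wrong"]]
    counts = cooccurrence_counts(sessions, {"thanks"}, {"wrong"})
    print(classify_by_ratio(counts))
```

Utterances such as "hello" that co-occur equally with both kinds of seed are left unlabeled.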

The information providing apparatus 10 also generates, from the history of utterances and responses, a co-occurrence network that connects utterances or responses that co-occur, identifies index utterances in the co-occurrence network, and extracts as new index utterances those utterances whose distance from an index utterance on the co-occurrence network satisfies a predetermined condition. For example, the information providing apparatus 10 identifies a favorable index utterance in the co-occurrence network and extracts utterances whose distance from the favorable index utterance on the co-occurrence network is at or below a predetermined threshold as new favorable index utterances. The information providing apparatus 10 may also extract, as a new favorable index utterance, an utterance whose distance from the favorable index utterance is at or below a predetermined threshold and whose distance on the co-occurrence network from an unfavorable index utterance is at or above a predetermined threshold.

The information providing apparatus 10 also identifies an unfavorable index utterance in the co-occurrence network and extracts utterances whose distance from the unfavorable index utterance on the co-occurrence network is at or below a predetermined threshold as new unfavorable index utterances. For example, the information providing apparatus 10 extracts, as a new unfavorable index utterance, an utterance whose distance from the unfavorable index utterance is at or below a predetermined threshold and whose distance on the co-occurrence network from a favorable index utterance is at or above a predetermined threshold. As a result of these processes, the information providing apparatus 10 can accurately extract index utterances that reflect the user's impression.
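The network-distance extraction in the two preceding paragraphs can be sketched with an undirected graph and breadth-first search. Several simplifications here are assumptions: two utterances are linked when they appear in the same session, distance is the number of hops, and the `near` and `far` thresholds and all function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_network(sessions):
    """Link every pair of utterances that appear in the same session."""
    graph = defaultdict(set)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def distances_from(graph, sources):
    """Multi-source BFS: hop distance from the nearest source utterance."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def extract_new_index(graph, same_kind, opposite_kind, near=1, far=2):
    """Utterances close to same_kind seeds and far from opposite_kind seeds."""
    d_same = distances_from(graph, same_kind)
    d_opp = distances_from(graph, opposite_kind)
    inf = float("inf")
    return sorted(
        u for u in list(graph)
        if u not in same_kind and u not in opposite_kind
        and d_same.get(u, inf) <= near and d_opp.get(u, inf) >= far
    )


if __name__ == "__main__":
    sessions = [["thanks", "nice"], ["nice", "cool"],
                ["wrong", "ugh"], ["ugh", "cool"]]
    g = build_cooccurrence_network(sessions)
    print(extract_new_index(g, {"thanks"}, {"wrong"}))  # new favorable
    print(extract_new_index(g, {"wrong"}, {"thanks"}))  # new unfavorable
```

The same function serves both directions: swapping the seed sets yields new unfavorable index utterances instead of favorable ones.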

The information providing apparatus 10 also evaluates a session that includes utterances and responses in the interactive service, based on whether the session contains an extracted index utterance. The information providing apparatus 10 can therefore improve the accuracy of session evaluation.

[7-8. Session evaluation using dialogue examples]
The information providing apparatus 10 also extracts, from the history of the interactive service, a series of dialogue turns corresponding to an index utterance that serves as an index of the user's evaluation of the interactive service. In other words, the information providing apparatus 10 extracts various dialogue examples. For example, the information providing apparatus 10 extracts a predetermined number of utterances and responses preceding an index utterance as a dialogue example. Using the extracted dialogue examples, the information providing apparatus 10 then trains a model that estimates the user's evaluation of an input dialogue.

In this way, the information providing apparatus 10 extracts dialogue examples rather than whole sessions and causes the model to learn the features of the extracted dialogue examples. Provided that index utterances are available, extracting such dialogue examples is easier than extracting sessions. As a result, the information providing apparatus 10 can easily realize session evaluation.

The information providing apparatus 10 also extracts a predetermined number of utterances and responses preceding a favorable index utterance, which indicates that the user's evaluation of the interactive service is favorable, and trains the model using the extracted utterances and responses as learning data for dialogue that the user evaluates favorably. Likewise, the information providing apparatus 10 extracts a predetermined number of utterances and responses preceding an unfavorable index utterance, which indicates that the user's evaluation is unfavorable, and trains the model using them as learning data for dialogue that the user evaluates unfavorably. The information providing apparatus 10 further extracts a predetermined number of utterances and responses preceding a neutral index utterance, which indicates that the user's evaluation is neutral, and trains the model using them as learning data for dialogue that the user evaluates neutrally. The information providing apparatus 10 can therefore improve the accuracy with which the model evaluates sessions.
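Extracting the turns that precede an index utterance can be sketched as a sliding window over the dialogue history. The representation of the history as `(speaker, text)` tuples, the function name, the `window` parameter, and the label strings are assumptions for illustration; the description only speaks of "a predetermined number" of preceding utterances and responses.

```python
def extract_dialogue_examples(history, index_utterances, window=2):
    """Collect (preceding_turns, label) pairs from a dialogue history.

    history: list of (speaker, text) tuples in chronological order.
    index_utterances: dict mapping an index utterance to its label,
        for example "favorable", "unfavorable", or "neutral".
    window: number of preceding turns to keep (hypothetical parameter).
    """
    examples = []
    for i, (speaker, text) in enumerate(history):
        # Only user utterances can act as index utterances.
        label = index_utterances.get(text) if speaker == "user" else None
        if label is not None:
            examples.append((history[max(0, i - window):i], label))
    return examples


if __name__ == "__main__":
    history = [
        ("user", "play jazz"),
        ("system", "Playing jazz."),
        ("user", "thanks"),
        ("user", "call mom"),
        ("system", "Sorry, I did not understand."),
        ("user", "that's wrong"),
    ]
    index = {"thanks": "favorable", "that's wrong": "unfavorable"}
    for turns, label in extract_dialogue_examples(history, index):
        print(label, turns)
```

Each extracted pair is one labeled training example; the index utterance itself is excluded from the window so that the model learns from the dialogue that led up to it.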

The information providing apparatus 10 also acquires a session that includes a user's utterances and the responses in the interactive service, and evaluates the acquired session using the trained model. The information providing apparatus 10 can therefore realize session evaluation.

The information providing apparatus 10 also extracts, as index utterances, utterances whose appearance frequency satisfies a predetermined condition in sessions whose evaluation satisfies a predetermined condition. For example, the information providing apparatus 10 evaluates whether the user's evaluation of each session is favorable or unfavorable, and extracts as an index utterance any utterance for which the ratio between its appearance frequency in sessions evaluated as favorable and its appearance frequency in sessions evaluated as unfavorable satisfies a predetermined condition. The information providing apparatus 10 can therefore extract new index utterances automatically.

[7-9. Session evaluation using multiple models]
The information providing apparatus 10 also acquires a session that includes a user's utterances and the responses in the interactive service. The information providing apparatus 10 then evaluates the session using the evaluation results of a plurality of models, each of which evaluates the session based on a different type of feature among a plurality of features that serve as indices of the user's evaluation of the session. The information providing apparatus 10 can therefore improve the accuracy of session evaluation.

The information providing apparatus 10 uses at least the evaluation result of a model that evaluates a session based on whether the utterances included in the session repeatedly contain identical or similar utterances. The information providing apparatus 10 also uses at least the evaluation result of a model that evaluates a session based on whether the responses included in the session contain a predetermined response indicating an apology. The information providing apparatus 10 also uses at least the evaluation result of a model that evaluates a session based on whether the utterances included in the session contain an unfavorable index utterance, which indicates that the user evaluates the interactive service unfavorably. The information providing apparatus 10 also uses at least the evaluation result of a model that evaluates a session based on whether the utterances included in the session contain a favorable index utterance, which indicates that the user evaluates the interactive service favorably. The information providing apparatus 10 can therefore improve the accuracy of session evaluation.

The information providing apparatus 10 also evaluates a session based on the sum of two values: the evaluation value output by a model that evaluates the session based on whether its utterances repeatedly contain identical or similar utterances, and the evaluation value output by a model that evaluates the session based on whether its responses contain a predetermined response indicating an apology. The information providing apparatus 10 can therefore evaluate a session using a plurality of models, each of which can evaluate sessions without manual intervention.

The information providing apparatus 10 also evaluates a session based on a value obtained as follows. It first sums (i) the evaluation value output by a model that evaluates the session based on whether its utterances repeatedly contain identical or similar utterances, (ii) the evaluation value output by a model that evaluates the session based on whether its responses contain a predetermined response indicating an apology, and (iii) the evaluation value output by a model that evaluates the session based on whether its utterances contain an unfavorable index utterance. From this sum it subtracts the evaluation value output by a model that evaluates the session based on whether its utterances contain a favorable index utterance. For example, the information providing apparatus 10 evaluates the session according to whether the resulting value is positive or negative. The information providing apparatus 10 can therefore evaluate sessions accurately.
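The combination rule above amounts to adding the three unfavorable-leaning evaluation values and subtracting the favorable one. The sketch below assumes each model outputs a non-negative score; the function names, the label strings, the mapping of a positive value to "unfavorable", and the handling of an exactly zero value as "neutral" are assumptions, since the description only states that the sign of the value is used.

```python
def combined_score(repetition: float, apology: float,
                   unfavorable: float, favorable: float) -> float:
    """Sum the three unfavorable-leaning values and subtract the favorable one."""
    return repetition + apology + unfavorable - favorable


def evaluate_session(repetition: float, apology: float,
                     unfavorable: float, favorable: float) -> str:
    """Evaluate a session by the sign of the combined value."""
    value = combined_score(repetition, apology, unfavorable, favorable)
    if value > 0:
        return "unfavorable"
    if value < 0:
        return "favorable"
    return "neutral"  # assumption: the description does not cover zero


if __name__ == "__main__":
    print(evaluate_session(0.9, 0.5, 0.2, 0.1))
    print(evaluate_session(0.0, 0.1, 0.0, 0.8))
```

Weighting the individual model outputs before summing would be a natural extension, but no such weights appear in the description.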

Several embodiments of the present application have been described in detail above with reference to the drawings. These are examples only, and the present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

The term "unit" (section, module, unit) used above can be read as "means", "circuit", or the like. For example, the extraction unit can be read as extraction means or an extraction circuit.

Description of symbols
10 Information providing apparatus
20 Communication unit
30 Storage unit
31 Session log database
32 Session evaluation database
33 Index utterance database
34 Engagement database
35 Model database
40 Control unit
41 Session acquisition unit
42 Usage mode identification unit
50 Index utterance extraction processing unit
51 Index utterance identification unit
52 Utterance extraction unit
53 Network generation unit
60 Session evaluation processing unit
61 Learning data extraction unit
62 Learning unit
63 Image generation unit
64 Utterance group generation unit
65 Evaluation unit
70 Engagement prediction processing unit
71 Estimation unit
80 Reinforcement learning processing unit
81 Index utterance acquisition unit
82 Reward setting unit
83 Reinforcement learning unit
100 Terminal device
200 Interactive device

Claims

A learning device comprising: an acquisition unit that acquires, from among utterances directed to a dialogue model that realizes an interactive service by generating a response to an input utterance, an index utterance that serves as an index for evaluating the interactive service; and
a learning unit that performs reinforcement learning of the dialogue model by setting a reward based on the index utterance acquired by the acquisition unit.

The learning device according to claim 1, wherein the acquisition unit acquires a favorable index utterance indicating that the user's evaluation of the interactive service is favorable, or an unfavorable index utterance indicating that the user's evaluation of the interactive service is unfavorable, and
the learning unit sets a positive reward for the dialogue model when the favorable index utterance is acquired, and sets a negative reward for the dialogue model when the unfavorable index utterance is acquired.

The learning device according to claim 1, wherein the acquisition unit acquires an index utterance output by an utterance model that outputs the index utterance in reaction to a response generated by the dialogue model.

The learning device, wherein the acquisition unit acquires an index utterance output by the utterance model, the utterance model being a model that has learned the features of the history of dialogue between a user and the dialogue model in the interactive service.

A learning method performed by a learning device, the method comprising:
an acquisition step of acquiring, from among utterances directed to a dialogue model that realizes an interactive service by generating a response to an input utterance, an index utterance that serves as an index for evaluating the interactive service; and
a learning step of performing reinforcement learning of the dialogue model by setting a reward based on the index utterance acquired in the acquisition step.

A learning program for causing a computer to execute: an acquisition procedure of acquiring, from among utterances directed to a dialogue model that realizes an interactive service by generating a response to an input utterance, an index utterance that serves as an index for evaluating the interactive service; and
a learning procedure of performing reinforcement learning of the dialogue model by setting a reward based on the index utterance acquired by the acquisition procedure.