JP2016090776A - Response generation apparatus, response generation method, and program

Info

Publication number: JP2016090776A
Application number: JP2014224168A
Authority: JP
Inventors: 佐和樋口; Sawa Higuchi; 生聖渡部; Seisho Watabe
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2014-11-04
Filing date: 2014-11-04
Publication date: 2016-05-23

Abstract

PROBLEM TO BE SOLVED: To provide a response generation apparatus, a response generation method, and a program that estimate an emotion by a different method depending on the degree of intimacy with a user, and generate a response accordingly.

SOLUTION: A response generation apparatus 100 includes: a speech recognition unit 101 for recognizing the utterance content of a user; an emotion estimation unit 104 for estimating the type of the user's emotion on the basis of the utterance content; a response generation unit 106 for generating a response sentence in accordance with the estimated type of emotion; and an emotion classification database 105 that stores degrees of intimacy with the user, each associated with a combination of plural emotion types. In the emotion classification database 105, the emotion types included in a combination differ depending on the degree of intimacy with the user. Referring to the emotion classification database 105, the emotion estimation unit 104 estimates the emotion of the user by selecting, from the combination corresponding to the degree of intimacy, the emotion type that matches the utterance content.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a response generation apparatus, a response generation method, and a program, and more particularly to a technique for estimating emotion according to intimacy and generating a response in a voice interactive robot or the like.

Patent Document 1 discloses a robot apparatus that converses with emotion. The robot apparatus holds an intimacy database and updates the intimacy between the user and the robot apparatus according to the user's interactions. An interaction is a way in which the user engages with the robot apparatus, such as stroking it, hitting it, feeding it, talking to it, or playing ball with it. The robot apparatus updates the intimacy based on the number and duration of these interactions.

The robot apparatus also recognizes the user's utterance and selects a response sentence corresponding to the intimacy from a response sentence list defined in a dialogue database. It then responds with the sentence ending and intonation of the response sentence varied according to the emotion estimated from the content of the user's utterance.

Here, in the response sentence list used by the robot apparatus, the types and number of emotions are constant regardless of the intimacy.

Patent Document 1: JP 2004-090109 A

The robot apparatus described in Patent Document 1 selects, from a list, a response sentence defined according to the intimacy and the user's emotion. Here, the types and number of emotions associated with each intimacy are constant regardless of the intimacy. As a result, the variety of responses the robot apparatus can give is limited.

That is, because the types and number of emotions that the robot apparatus can estimate are fixed, it cannot respond in a way that engages with the user's emotions beyond a certain depth, even after it has become more intimate with the user. Conversely, it may engage too deeply with the user's emotions despite not being very intimate and make the user uncomfortable, so it was difficult to hold a conversation that maintains a sense of distance appropriate to the intimacy. The underlying problem is that the degree of empathy with the user's emotion could not be varied according to the intimacy, so an appropriate response could not be selected.

The response generation apparatus according to the present invention includes a speech recognition unit that recognizes the content of a user's utterance, an emotion estimation unit that estimates the type of the user's emotion based on the utterance content, and a response generation unit that generates a response sentence according to the estimated type of emotion. The apparatus further includes an emotion classification database in which an intimacy with the user is associated with a combination of a plurality of emotion types, and the emotion types included in the combination differ according to the intimacy with the user. The emotion estimation unit refers to the emotion classification database and estimates the user's emotion by selecting, from the combination corresponding to the intimacy, the emotion type corresponding to the utterance content.

That is, the response generation apparatus of the present invention estimates the intimacy with the user, changes the emotion classification used for emotion estimation according to the intimacy, and thereby changes the response. Specifically, when the intimacy with the user is high, it uses more detailed emotion types to estimate emotion more deeply and respond. When the intimacy is low, it performs shallow, surface-level emotion estimation or impression estimation and responds.

According to the present invention, it is possible to provide a response generation apparatus, a response generation method, and a program that estimate emotion by a different method depending on the intimacy with a user and generate a response.

FIG. 1 is a diagram showing the configuration of the response generation apparatus 100 according to the embodiment.
FIG. 2 is a diagram showing the operation of the response generation apparatus 100 according to the embodiment.

Embodiments of the present invention will be described below with reference to the drawings.
First, the configuration of the response generation apparatus 100 according to the embodiment of the present invention will be described using the block diagram of FIG. 1.

The response generation apparatus 100 is an apparatus that recognizes a user's utterance and returns a response, and is typically a voice interactive robot apparatus. The response generation apparatus 100 includes, for example, a central processing unit (CPU), a volatile or nonvolatile memory, and voice input/output devices (a microphone, a speaker, A/D and D/A converters, and the like). The CPU executes information processing according to a program stored in the memory, thereby realizing the various intended functions.

The response generation apparatus 100 includes at least a speech recognition unit 101, an intimacy calculation unit 102, an intimacy database 103, an emotion estimation unit 104, an emotion classification database 105, and a response generation unit 106. It may further include a speech synthesis unit 107.

The speech recognition unit 101 recognizes the content of the user's utterance. That is, the speech recognition unit 101 receives the user's utterance as voice data and generates text data from the voice data. Typically, a microphone acquires the user's utterance as an analog voice signal, and an A/D converter converts the analog voice signal into voice data and inputs it to the speech recognition unit 101. Text data can be generated from voice data by any of various known speech recognition techniques.

The emotion estimation unit 104 analyzes the content of the user's utterance using the text data generated by the speech recognition unit 101 and estimates the user's emotion. The intimacy calculation unit 102 calculates the intimacy with the user using information about the dialogue with the user, including the emotion estimation result from the emotion estimation unit 104. The intimacy calculation unit 102 records the information about the dialogue with the user and the intimacy in the intimacy database 103 described later.

Alternatively, the intimacy calculation unit 102 may calculate the intimacy with the user according to interactions with the user, as in the related art described in Patent Document 1. In this case, the intimacy calculation unit 102 can update the intimacy without requiring emotion estimation by the emotion estimation unit 104.

The intimacy database 103 is a storage unit that records information about the dialogue with the user and the intimacy.

After the intimacy calculation unit 102 has calculated the intimacy, the emotion estimation unit 104 refers to the emotion classification database 105 described later and estimates the user's emotion again. That is, it identifies the type of emotion. The higher the intimacy, the deeper the emotion estimation it performs; the lower the intimacy, the more superficial the emotion estimation or impression estimation it performs.

The emotion classification database 105 is a storage unit that defines a different combination of emotion types for each intimacy. As for the emotion types, deeper and more detailed emotions are defined for higher intimacy, and more superficial emotions or impressions are defined for lower intimacy.

The response generation unit 106 generates a response sentence for the user according to the intimacy with the user calculated by the intimacy calculation unit 102 and the user's emotion estimated by the emotion estimation unit 104. The response sentence is typically text data.

The speech synthesis unit 107 converts the response sentence generated by the response generation unit 106 into voice data. Voice data can be generated from text data by any of various known speech synthesis techniques. After that, typically, a D/A converter converts the voice data into an analog voice signal, and a speaker outputs the analog voice signal as sound.

Next, the operation of the response generation apparatus 100 according to the embodiment of the present invention will be described using the flowchart of FIG. 2.

S101: Speech recognition
The user speaks to the response generation apparatus 100. The speech recognition unit 101 receives and recognizes the content of the user's utterance and generates text data.

Specifically, the microphone acquires the user's utterance as an analog voice signal, and the A/D converter converts the analog voice signal into voice data. The speech recognition unit 101 receives this voice data and converts it into text data using any of various known speech recognition techniques. For example, when the user says “I got a present from a friend yesterday”, the speech recognition unit 101 generates text data reading “I got a present from a friend yesterday”.

S102: Emotion estimation (1)
The emotion estimation unit 104 analyzes the text data generated in S101 and attempts to estimate the user's emotion. Various methods are known for analyzing the text data of a user's utterance and estimating the user's emotion, and the present invention is not limited to any particular emotion estimation method. One such method is “Emotion estimation based on an emotion-provoking factor corpus acquired from the Web” (Ryoko Tokuhisa et al., Proceedings of the 14th Annual Meeting of the Association for Natural Language Processing, March 2008).

Through the above emotion estimation process, the emotion estimation unit 104 estimates the type of the user's emotion. For example, from the text data “I got a present from a friend yesterday”, the emotion type “happy” is estimated. Depending on the content of the text data, various other emotion types such as “fun” and “dislike” can be estimated.

Note that the emotion estimation in S102 does not necessarily require the precise emotion estimation that uses the emotion classification database 105 described later. It is sufficient that some emotion can be estimated by a known emotion estimation method such as the one above. The precise emotion estimation using the emotion classification database 105 may of course be used instead.

S103: Intimacy calculation
The intimacy calculation unit 102 calculates the intimacy based on the information already recorded in the intimacy database 103 and the user's emotion estimated in S102.

Here, the intimacy database 103 cumulatively records the user's utterance count and the number of times each emotion type has been estimated (the emotion appearance count) up to the present. The intimacy database 103 may also record intimacy values calculated in the past. When multiple users speak, the intimacy database 103 can record this information separately for each user.

The intimacy calculation unit 102 acquires from the intimacy database 103 the user's utterance count to date and the appearance count of each emotion type. If any emotion was estimated in S102, it increments the appearance count of that emotion type by 1. It also increments the utterance count by 1. The intimacy calculation unit 102 then calculates the intimacy from the updated utterance count and emotion appearance counts, for example by the following formula:
intimacy = number of emotion appearances / number of utterances
For example, if there have been 20 utterances so far and the emotion types “happy”, “fun”, and “dislike” have been estimated 12 times, 3 times, and 2 times respectively, the intimacy is
(12 + 3 + 2) / 20 = 0.85

According to this formula, the intimacy is a value from 0 to 1.0. The closer the value is to 1.0, the higher the intimacy with the user.

The intimacy calculation unit 102 records the utterance count and emotion appearance counts updated in S103 in the intimacy database 103. The intimacy calculated in S103 may be recorded as well.
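
The counter update and formula of S103 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `IntimacyRecord`, its fields, and the choice to return 0.0 before any utterance are hypothetical and do not come from the specification.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntimacyRecord:
    """Per-user counters; a hypothetical stand-in for one entry of the intimacy database 103."""
    utterance_count: int = 0
    emotion_counts: Counter = field(default_factory=Counter)

    def update(self, emotion: Optional[str]) -> float:
        """Count one utterance (and its emotion, if one was estimated) and return the new intimacy."""
        self.utterance_count += 1
        if emotion is not None:
            self.emotion_counts[emotion] += 1
        return self.intimacy()

    def intimacy(self) -> float:
        """intimacy = number of emotion appearances / number of utterances."""
        if self.utterance_count == 0:
            return 0.0  # assumption: no utterances yet is treated as zero intimacy
        return sum(self.emotion_counts.values()) / self.utterance_count


# Worked example from the text: 20 utterances, "happy" x12, "fun" x3, "dislike" x2.
record = IntimacyRecord(
    utterance_count=20,
    emotion_counts=Counter({"happy": 12, "fun": 3, "dislike": 2}),
)
print(record.intimacy())  # 0.85
```

Because an utterance with no estimated emotion still increases the denominator, the value stays within the range 0 to 1.0.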

Instead of the processing of S102 and S103 described above, the intimacy calculation unit 102 may calculate the intimacy with the user based on interactions with the user, as in the related art described in Patent Document 1. Alternatively, the intimacy with the user may be calculated using any other known technique.

S104: Emotion estimation (2)
Using the intimacy calculated by the intimacy calculation unit 102, the emotion estimation unit 104 refers to the emotion classification database 105 and estimates the user's emotion again. The emotion estimation here uses a different combination of emotion types depending on the intimacy with the user. The correspondence between intimacy and combinations of emotion types is defined in the emotion classification database 105.

The emotion classification database 105 holds a different combination of emotion types for each level of intimacy. For example, “happy”, “like”, “relieved”, “sad”, “dislike”, “anxious”, “painful”, “lonely”, and so on are set as the emotion types used when the intimacy is high (for example, 0.8 or more). “Fun”, “interesting”, “dull”, “boring”, “beautiful”, “dirty”, and so on are set as the emotion types used when the intimacy is medium (for example, 0.5 or more). “Good”, “nice”, “great”, and so on are set as the emotion types used when the intimacy is low (less than 0.5).

In the emotion classification database 105, it is preferable that deeper and more detailed emotions be set for higher intimacy, and more superficial emotions or impressions for lower intimacy. That is, the higher the intimacy, the larger the number of emotion types that are set. Also, the higher the intimacy, the more the emotion types used are ones that express deeper inner feelings.
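
As a rough illustration, the lookup from intimacy level to emotion types might be held in memory as below. The structure `EMOTION_CLASSIFICATION` and the function `candidate_emotions` are hypothetical names; the thresholds are the example values from the text, the emotion names are English renderings of the Japanese examples, and the lists are not exhaustive because the text ends each one with "and so on".

```python
from typing import List, Tuple

# Hypothetical in-memory stand-in for the emotion classification database 105.
# Each entry is (minimum intimacy, emotion types), ordered from high to low intimacy.
EMOTION_CLASSIFICATION: List[Tuple[float, List[str]]] = [
    (0.8, ["happy", "like", "relieved", "sad", "dislike", "anxious", "painful", "lonely"]),
    (0.5, ["fun", "interesting", "dull", "boring", "beautiful", "dirty"]),
    (0.0, ["good", "nice", "great"]),
]


def candidate_emotions(intimacy: float) -> List[str]:
    """Return the combination of emotion types for the first level whose threshold is met."""
    for threshold, emotions in EMOTION_CLASSIFICATION:
        if intimacy >= threshold:
            return emotions
    raise ValueError("intimacy is expected to be between 0 and 1.0")


print(candidate_emotions(0.85))  # the high-intimacy combination
print(candidate_emotions(0.3))   # ['good', 'nice', 'great']
```

In this sketch the high-intimacy level holds the most emotion types and the low-intimacy level the fewest, which matches the preference stated above.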

The emotion estimation unit 104 analyzes the text data generated in S101 again and estimates the user's emotion. Various methods are known for analyzing the text data of a user's utterance and estimating the user's emotion, and the present invention is not limited to any particular emotion estimation method. One such method is “Emotion estimation based on an emotion-provoking factor corpus acquired from the Web” (Ryoko Tokuhisa et al., Proceedings of the 14th Annual Meeting of the Association for Natural Language Processing, March 2008).

Here, however, the emotion estimation unit 104 generates the emotion estimation result by selecting an appropriate one from the emotion types defined in the emotion classification database 105. That is, the emotion estimation unit 104 first refers to the emotion classification database 105 and acquires the combination of emotion types associated with the intimacy calculated by the intimacy calculation unit 102. The emotion estimation unit 104 then estimates the emotion using a known emotion estimation method, with that combination of emotion types as the candidates. In other words, it selects an appropriate emotion from the combination of emotion types.

For example, if the intimacy calculated by the intimacy calculation unit 102 is 0.85, the emotion estimation unit 104 first acquires the combination of emotion types associated in the emotion classification database 105 with high intimacy (0.8 or more): “happy”, “like”, “relieved”, “sad”, “dislike”, “anxious”, “painful”, and “lonely”. It then performs emotion estimation by a known method with this combination of emotion types as the candidate estimation results. As a result, the emotion estimation unit 104 can estimate, for example, “happy” from this combination as the user's emotion.
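
The selection in S104 can be sketched as picking the best-scoring emotion from the candidates only. The specification leaves the scoring to a known emotion estimation method, so the cue-word scorer below is a toy placeholder and not the cited technique; `estimate_emotion`, `keyword_score`, and `CUE_WORDS` are hypothetical names.

```python
from typing import Callable, Dict, Sequence, Tuple


def estimate_emotion(
    text: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate emotion type that scores highest for the utterance text."""
    return max(candidates, key=lambda emotion: score(text, emotion))


# Toy cue-word table (an assumption for illustration only).
CUE_WORDS: Dict[str, Tuple[str, ...]] = {
    "happy": ("present", "gift"),
    "sad": ("lost", "cried"),
    "good": ("present",),
}


def keyword_score(text: str, emotion: str) -> float:
    """Count how many cue words for the emotion occur in the text."""
    lowered = text.lower()
    return float(sum(word in lowered for word in CUE_WORDS.get(emotion, ())))


utterance = "I got a present from a friend yesterday"
high = ["happy", "like", "relieved", "sad", "dislike", "anxious", "painful", "lonely"]
low = ["good", "nice", "great"]
print(estimate_emotion(utterance, high, keyword_score))  # happy
print(estimate_emotion(utterance, low, keyword_score))   # good
```

The same utterance yields a different emotion type depending on which combination is passed in, which is the effect of switching combinations by intimacy.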

S105: Response generation
The response generation unit 106 generates a response sentence according to the user's emotion estimated in S104. For example, if the user's emotion estimation result is “happy”, it can create the response sentence “That must have made you happy.” If the user's emotion estimation result is “good”, it can create the response sentence “That's nice.”

Furthermore, for example when the intimacy is at or above a predetermined level, it may generate a response that includes a follow-up question to draw out more of the user's story. For example, it can generate the response “That must have made you happy. What did you get?”

Typically, the response generation unit 106 generates a response sentence by referring to a response sentence table held in a storage unit (not shown) and selecting an appropriate response sentence from it. For example, the response sentence table contains response sentences each associated with an emotion type, and the response generation unit 106 selects from the table the response sentence associated in advance with the type of the user's emotion. For example, it selects the response sentence “That must have made you happy.”, which is associated with the emotion “happy”.
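
A minimal sketch of the table lookup in S105, including the optional follow-up question, might look like this. The names `RESPONSE_TABLE` and `generate_response`, the fallback sentence, the fixed follow-up question, and the threshold value 0.8 are all assumptions made for illustration; the specification only says "a predetermined level". The sentences are English renderings of the Japanese examples.

```python
from typing import Dict

# Hypothetical response sentence table keyed by emotion type.
RESPONSE_TABLE: Dict[str, str] = {
    "happy": "That must have made you happy.",
    "good": "That's nice.",
}
DEFAULT_RESPONSE = "I see."               # assumption: fallback for emotions not in the table
FOLLOW_UP_QUESTION = "What did you get?"  # fixed here; a real system would vary it by context
FOLLOW_UP_THRESHOLD = 0.8                 # assumption: the "predetermined level" of intimacy


def generate_response(emotion: str, intimacy: float) -> str:
    """Select the response sentence for the emotion, adding a follow-up question at high intimacy."""
    response = RESPONSE_TABLE.get(emotion, DEFAULT_RESPONSE)
    if intimacy >= FOLLOW_UP_THRESHOLD:
        response = f"{response} {FOLLOW_UP_QUESTION}"
    return response


print(generate_response("happy", 0.85))  # That must have made you happy. What did you get?
print(generate_response("good", 0.3))    # That's nice.
```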

S106: Speech synthesis
The speech synthesis unit 107 converts the response sentence generated in S105 into voice data using a known speech synthesis technique or the like. After that, typically, the voice data of the response sentence is D/A converted and output as sound by the speaker.

According to the present embodiment, the response generation apparatus 100 changes the types of emotions and impressions it estimates according to the intimacy, and thereby gives a response that fits both the intimacy with the user and the user's feelings. Specifically, for a partner with lower intimacy, it estimates shallower emotions or impressions and responds in a way that keeps a sense of distance. For a partner with high intimacy, it performs finer-grained emotion estimation and gives a response that engages with the partner's emotions.

As a result, when the intimacy is low, the response is based on shallow emotion estimation and therefore does not intrude on the user's emotions. When the intimacy is high, deep emotion estimation makes it possible to give a response that engages more closely with the user's emotions. That is, because an appropriate response is given according to the intimacy, smoother communication between the user and the response generation apparatus 100 can be realized.

Description of symbols
100 Response generation apparatus
101 Speech recognition unit
102 Intimacy calculation unit
103 Intimacy database
104 Emotion estimation unit
105 Emotion classification database
106 Response generation unit
107 Speech synthesis unit

Claims

1. A response generation apparatus comprising:
a speech recognition unit that recognizes the content of a user's utterance;
an emotion estimation unit that estimates the type of the user's emotion based on the utterance content; and
a response generation unit that generates a response sentence according to the estimated type of emotion,
the response generation apparatus further comprising an emotion classification database in which an intimacy with the user is associated with a combination of a plurality of emotion types,
wherein the emotion types included in the combination differ according to the intimacy with the user, and
the emotion estimation unit refers to the emotion classification database and estimates the user's emotion by selecting, from the combination corresponding to the intimacy, an emotion type corresponding to the utterance content.

2. The response generation apparatus according to claim 1, wherein, in the emotion classification database, the higher the intimacy, the more emotion types the combination includes.

3. The response generation apparatus according to claim 1, wherein, in the emotion classification database, the higher the intimacy, the deeper the emotions indicated by the emotion types that the combination includes.

4. A response generation method comprising:
a speech recognition step of recognizing the content of a user's utterance;
an emotion estimation step of estimating the type of the user's emotion based on the utterance content; and
a response generation step of generating a response sentence according to the estimated type of emotion,
wherein the emotion estimation step comprises:
referring to an emotion classification database in which an intimacy with the user is associated with a combination of a plurality of emotion types, the emotion types included in the combination differing according to the intimacy with the user; and
estimating the user's emotion by selecting, from the combination corresponding to the intimacy, an emotion type corresponding to the utterance content.

5. A program for causing a computer to execute the method according to claim 4.